CoralCDN Lesson: The interface was right -or- Programming elastic CDN services

While my previous post argued that CoralCDN’s architectural design might not be ideal for its deployment, CoralCDN has proven successful from the simple perspective of real-world use. Rather than any technical argument, we believe that the central reason for its adoption has been its simple user interface: any URL can be requested through CoralCDN by appending nyud.net to its hostname (e.g., http://example.com/file becomes http://example.com.nyud.net/file).
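As a minimal illustration of how mechanical this transformation is, the following Python sketch Coralizes a URL with the standard library's urllib.parse. The function name coralize is hypothetical, and the sketch deliberately refuses URLs with explicit ports or credentials rather than guessing how CoralCDN mapped them.

```python
from urllib.parse import urlsplit, urlunsplit

CORAL_SUFFIX = ".nyud.net"


def coralize(url: str) -> str:
    """Append .nyud.net to a URL's hostname, leaving path, query, and fragment intact."""
    parts = urlsplit(url)
    host = parts.hostname  # lowercased by urlsplit; None if the URL has no host
    if host is None:
        raise ValueError(f"no hostname in {url!r}")
    # Hypothetical restriction: this sketch does not guess how ports or credentials map.
    if parts.port is not None or parts.username is not None:
        raise ValueError(f"explicit port or credentials not handled: {url!r}")
    if host == "nyud.net" or host.endswith(CORAL_SUFFIX):
        return url  # already Coralized; appending again would nest the suffix
    return urlunsplit((parts.scheme, host + CORAL_SUFFIX, parts.path, parts.query, parts.fragment))


if __name__ == "__main__":
    print(coralize("http://example.com/file"))  # http://example.com.nyud.net/file
```

Returning already-Coralized URLs unchanged keeps the function idempotent, so a link-rewriting tool can apply it repeatedly without stacking suffixes.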

Interface design

While superficially obvious, this interface design achieves several important deployment goals:

  • Transparency: Work with unmodified, unconfigured, and unaware web clients and web servers.
  • Deep caching: Support the automatic retrieval of embedded images or links through CoralCDN as well, when appropriate.
  • Server control: Not interfere with sites’ ability to perform usage logging or otherwise control how their content is served (e.g., via CoralCDN or directly).
  • Ad-friendly: Not interfere with third-party advertising, analytics, or other tools incorporated into a site.
  • Forward compatible: Be amenable to future end-to-end security mechanisms for content integrity or other end-host deployed mechanisms.

Consider an alternative, even simpler, interface design. RedSwoosh, Dijjer, FreeCache, and CoBlitz, among others, all embedded origin URLs within the URL’s path, e.g., http://nyud.net/example.com/file. Not only is HTTP parsing simpler, but their nameservers do not need to synthesize DNS records on the fly (unlike CoralCDN’s DNS servers for *.nyud.net) and can take better advantage of client-side DNS caching. Unfortunately, while such an interface supports the distribution of specifically named files, it fails to transparently load an HTML webpage: relative embedded links, particularly those relative to the site root (e.g., /images/logo.png), would lack the example.com prefix, and a proxy would thus be unable to identify the origin domain to which they refer. (One alternative might be to rewrite pages to add such prefixes, although active content such as JavaScript makes this notoriously difficult, even ignoring the goal of not modifying server content.)

Under CoralCDN’s approach, however, browsers resolve relative links with respect to a page’s Coralized hostname, and thus transparently request these objects through CoralCDN as well. But because CoralCDN does not modify body content, all absolute URLs continue to point to their origin sites. Thus, third-party advertisements are largely unaffected, and origin servers can use simple web beacons to log clients. Origin sites retain control over how their content is displayed and, down the line, content may be amenable to verification through end-to-end content signatures (as in RFC 2660) or web tripwire tricks.
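To make the difference concrete, the following Python sketch resolves a few embedded links the way a browser would, using the standard library's urljoin, against the same page served under each interface. The page and link URLs are illustrative. A document-relative link such as logo.png happens to keep the origin in the path-embedding scheme, but a root-relative link such as /img/banner.png loses it; the Coralized hostname survives both, and the absolute ad URL is left pointing at its origin.

```python
from urllib.parse import urljoin

# Illustrative page URLs: origin embedded in the path vs. appended to the hostname.
PATH_STYLE_PAGE = "http://nyud.net/example.com/news/index.html"
CORAL_STYLE_PAGE = "http://example.com.nyud.net/news/index.html"


def resolve(page_url: str, link: str) -> str:
    """Resolve an embedded link relative to the page that contains it."""
    return urljoin(page_url, link)


if __name__ == "__main__":
    for link in ["logo.png", "/img/banner.png", "http://ads.example.org/ad.js"]:
        print(link)
        print("  path-style :", resolve(PATH_STYLE_PAGE, link))
        print("  coral-style:", resolve(CORAL_STYLE_PAGE, link))
```

The root-relative case is the one that breaks the path-embedding design: the browser requests http://nyud.net/img/banner.png, and nothing in that URL tells a proxy which origin to contact.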

An API for dynamic adoption

CoralCDN was envisioned with manual URL manipulation in mind, whether by publishers editing HTML, users typing Coralized URLs, or third-party posters to Usenet groups or web portals submitting Coralized URLs. After deployment, however, users soon began treating CoralCDN’s interface as an API for accessing CDN services.

On the client side, these techniques included simple browser extensions that offer “right-click” options to Coralize links or that provide a CoralCDN link when a page appears unavailable. They also extended to more complex integration into frameworks like Firefox’s Greasemonkey. Greasemonkey allows third-party developers to write site-specific JavaScript code that, once installed by users, manipulates a site’s HTML content (usually through the DOM interface) whenever the user accesses it. CoralCDN scripts for Greasemonkey include ones that automatically rewrite links, or that add Coralized links (in text or via tooltips) to posted articles on Slashdot, digg, or other portals. CoralCDN was also integrated directly into a number of client-side podcasting applications, such as Plone’s Plodcasting, Juice Receiver, and Easypodcast. Given the view that podcasting served to “democratize” Internet radio broadcasting, this seemed to fit quite well with CoralCDN’s stated aim of “democratizing content publication”.

But perhaps the more interesting cases of CoralCDN integration are those on the server side. In flash-crowd scenarios, smaller websites might become overloaded for a variety of reasons: bandwidth-limited when serving larger files (especially due to hosting contracts), CPU-limited given expensive scripts (e.g., PHP), or disk-limited given expensive database queries. At the same time, their webserver(s) can often still handle the network interrupt and processing overhead of simple HTTP requests. Furthermore, websites often still want complete logs of all page accesses, especially their Referer headers. In such scenarios, a common use of CoralCDN is for origin servers to receive each HTTP request directly, but respond with an HTTP redirect (302) to a Coralized URL that will serve the actual content.

This is as simple as installing a server plugin and writing a few lines of configuration. For example, the complete dynamic redirection rule using Apache’s mod_rewrite module is the following:

   RewriteEngine on
   RewriteCond %{HTTP_USER_AGENT} !^CoralWebPrx
   RewriteCond %{QUERY_STRING} !(^|&)coral-no-serve$
   RewriteRule ^(.*)$ http://%{HTTP_HOST}.nyud.net%{REQUEST_URI} [R,L]

while similar plugins and scripts exist for other web platforms (e.g., the WordPress blogging suite).
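The same decision can also live in application code, much as those plugins and scripts do. The following Python sketch ports the mod_rewrite rule's two exemptions: requests from a CoralWebPrx user agent, and requests whose query string ends in coral-no-serve, are served locally; everything else gets a 302 to the Coralized URL. The function name and signature are hypothetical (not taken from any existing plugin), and the sketch assumes the Host value carries no port.

```python
import re
from typing import Optional

# The two exemptions from the mod_rewrite rule in this post.
CORAL_PROXY_UA = re.compile(r"^CoralWebPrx")         # RewriteCond %{HTTP_USER_AGENT} !^CoralWebPrx
NO_SERVE_FLAG = re.compile(r"(^|&)coral-no-serve$")  # RewriteCond %{QUERY_STRING} !(^|&)coral-no-serve$


def coral_redirect(host: str, path: str, query: str, user_agent: str) -> Optional[str]:
    """Return the Location for a 302 to the Coralized URL, or None to serve locally.

    Assumes `host` is the Host header value without a port.
    """
    if CORAL_PROXY_UA.search(user_agent):
        return None  # a CoralCDN proxy is fetching the content: serve it, or we loop
    if NO_SERVE_FLAG.search(query):
        return None  # the request explicitly asks not to be served via CoralCDN
    location = f"http://{host}.nyud.net{path}"
    # mod_rewrite passes the original query string through unchanged by default.
    return f"{location}?{query}" if query else location
```

A web framework would call this once per request and, when it returns a URL, reply with status 302 and that Location header, while still writing the request (and its Referer) to the local log.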

Still, redirection rules must be crafted with some care. In the mod_rewrite example above, the second line checks whether the client is a CoralCDN proxy, whose requests should be served directly; without this check, a redirection loop would form. Numerous server misconfigurations have omitted such checks, so CoralCDN proxies check for potential loops and return errors if present. Amusingly, some early users during CoralCDN’s deployment caused recursion in a different way, by submitting URLs with many copies of nyud.net appended to the hostname suffix:

   http://example.com.nyud.net.nyud.net....nyud.net/

they created a form of amplification attack against CoralCDN. This single request caused a proxy to issue a number of requests, stripping the last instance of nyud.net off in each iteration. Such requests are now rejected.
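We don't reproduce CoralCDN's actual check here, but a minimal Python sketch of one way a proxy might map a Coralized Host header back to its origin, and refuse nested suffixes, looks like this. The function name origin_host is hypothetical, and the sketch assumes the Host header carries no port.

```python
from typing import Optional

CORAL_SUFFIX = ".nyud.net"


def origin_host(coral_host: str) -> Optional[str]:
    """Map a Coralized Host header to the origin hostname, or None to reject the request.

    Assumes the Host header carries no port.
    """
    host = coral_host.strip().lower().rstrip(".")  # tolerate case and a trailing root dot
    if not host.endswith(CORAL_SUFFIX):
        return None  # not a Coralized name at all
    origin = host[: -len(CORAL_SUFFIX)]
    # Reject nested suffixes such as example.com.nyud.net.nyud.net: serving them
    # would make the proxy fetch through CoralCDN again, once per extra suffix.
    if not origin or origin == "nyud.net" or origin.endswith(CORAL_SUFFIX):
        return None
    return origin
```

Rejecting at parse time means an amplified URL costs the proxy a single string check rather than a chain of outbound requests.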

While the above dynamic rewriting rules apply to all content, other sites incorporate URL Coralization in more inventive ways:

   RewriteCond %{HTTP_REFERER} slashdot\.org [NC,OR]
   RewriteCond %{HTTP_REFERER} digg\.com [NC,OR]
   RewriteCond %{HTTP_REFERER} blogspot\.com [NC]

Combined with the rule above, these conditions redirect clients to CoralCDN only when their Referer header names one of these high-traffic portals. In Apache, such rules can be specified in .htaccess files and thus do not require administrative privileges. Other sites have even combined such tools with server plugins that monitor server load and bandwidth use, so that their servers only start rewriting requests under high load conditions.
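The decision encoded by the combined rules can be sketched in Python as follows. The function name is hypothetical; the patterns are copied from the rules above, with [NC] mapped to re.IGNORECASE and, as in mod_rewrite, unanchored regular-expression searches.

```python
import re

# Portals from the RewriteCond lines above; [NC] corresponds to re.IGNORECASE.
PORTAL_REFERERS = [
    re.compile(p, re.IGNORECASE) for p in (r"slashdot\.org", r"digg\.com", r"blogspot\.com")
]
CORAL_PROXY_UA = re.compile(r"^CoralWebPrx")
NO_SERVE_FLAG = re.compile(r"(^|&)coral-no-serve$")


def should_coralize(referer: str, user_agent: str, query: str) -> bool:
    """True only for non-CoralCDN clients arriving from one of the listed portals."""
    if CORAL_PROXY_UA.search(user_agent) or NO_SERVE_FLAG.search(query):
        return False  # same loop and opt-out exemptions as the site-wide rule
    # The [OR] flags chain the Referer conditions: any one portal match suffices.
    return any(p.search(referer) for p in PORTAL_REFERERS)
```

Note that in mod_rewrite the OR flag joins a condition to the next one, which is why the last Referer condition carries only [NC]: otherwise it would be OR-ed with the user-agent check that follows.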

These examples show how users have innovated with CoralCDN’s simple interface, which can be accessed like any other URL resource. We have even recently seen Coralized URLs dynamically constructed within client-side Flash ActionScript. Indeed, CoralCDN’s most popular domain as of January 2009 was a Tamil imitation of YouTube that loads Coralized URLs from Flash animations of “recently watched” videos.

An Elastic Computing Resource

One of the most interesting aspects of these developments has been the adoption of CoralCDN as an elastic resource for content distribution, long before the term “cloud computing” was popularized and Amazon began offering CDN and other “surge” services. Through completely automated means, work can be dynamically expanded out to CoralCDN when websites require additional bandwidth resources, and contracted back when flash crowds abate. Still without prior registration, sites can even choose among several options for how CoralCDN should handle their requests. X-Coral-Control headers returned by webservers provide this in-band signaling and are saved as cache metadata; for example, they can specify whether to “redirect home” when domains exceed their bandwidth limits (per our previous post). Again, this type of control illustrates how CoralCDN’s interface serves as a programmable API.
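The expand-and-contract behavior that load-monitoring plugins provide can be sketched as a small controller. This Python sketch is an illustration under stated assumptions, not any particular plugin: the class name and both thresholds are hypothetical, and it uses two thresholds (hysteresis) so that a site does not flap in and out of redirect mode near a single cutoff.

```python
from dataclasses import dataclass


@dataclass
class ElasticOffload:
    """Toggle CoralCDN redirection with hysteresis on the origin's request rate.

    Both thresholds are hypothetical; a real site would tune them to its hosting limits.
    """
    start_rps: float = 50.0  # begin redirecting above this many requests/second
    stop_rps: float = 20.0   # stop once the request rate falls below this
    redirecting: bool = False

    def __post_init__(self) -> None:
        if not self.stop_rps < self.start_rps:
            raise ValueError("stop_rps must be below start_rps")

    def update(self, requests_per_sec: float) -> bool:
        """Feed the latest measured rate; return whether to redirect to CoralCDN now."""
        if not self.redirecting and requests_per_sec > self.start_rps:
            self.redirecting = True   # expand out: a flash crowd is arriving
        elif self.redirecting and requests_per_sec < self.stop_rps:
            self.redirecting = False  # contract back: the flash crowd has abated
        return self.redirecting
```

The sketch keys off request rate rather than outbound bandwidth on purpose: the origin still receives every request while redirecting, whereas its own bandwidth drops as soon as CoralCDN takes over, so a bandwidth-keyed controller would switch redirection off prematurely and oscillate.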

Admittedly, CoralCDN can provide free service (and avoid registration) because it operates on a deployment platform, PlanetLab, composed of volunteer research sites. On the flip side, CoralCDN’s popularity led it to quickly overwhelm the bandwidth resources allocated to PlanetLab by affiliated sites, which prompted the fair-sharing mechanisms we described earlier. My next (and final) post about our experiences with CoralCDN asks whether we should just move off a trusted platform like PlanetLab and accept untrusted operators.