The Design of CoralCDN

In this post, I describe the architecture and mechanisms of CoralCDN at a high level. This is meant to provide the background necessary for the experiences and lessons with operating the system that I'll describe in later posts.

System Overview

CoralCDN is composed of three main parts: (1) a network of cooperative HTTP proxies that handle users’ requests, (2) a network of DNS nameservers for .nyud.net that map clients to nearby CoralCDN HTTP proxies, and (3) the underlying Coral indexing infrastructure and clustering machinery on which the first two applications are built.  You’ll find that I refer to the entire system as “CoralCDN”, but the network of indexing nodes as “Coral” (although this clarification is not always shared by others, who refer to the entire system as either “Coral” or “Coral Cache”).

At a high level, the following steps occur when a client issues a request to CoralCDN (a rough sketch of the proxy-side portion of this flow follows the list).

  1. A client resolves a “Coralized” domain name (e.g., of the form sns.cs.princeton.edu.nyud.net) using CoralCDN nameservers, possibly starting at one of the 10–12 primary nameservers registered for .nyud.net under the .net domain.
  2. Upon receiving a query, a CoralCDN nameserver probes the client to determine its round-trip time.  The nameserver uses this information to determine which nameservers (NS records) and CoralCDN proxies (authoritative A records) to return.
  3. The client sends an HTTP request for a Coralized URL to one of the returned proxies.  If the proxy is caching the file locally, it returns the file and stops.  Otherwise, this process continues.
  4. The proxy looks up the web object’s URL in Coral.
  5. If Coral returns the address of a node caching the object, the proxy fetches the object from this node.  Otherwise, the proxy downloads the object from the origin server (e.g., sns.cs.princeton.edu).
  6. The proxy stores the web object and returns it to the client browser.
  7. The proxy stores a reference to itself in Coral, recording the fact that it is now caching the URL.
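To make the proxy-side portion of this flow concrete, here is a minimal sketch in Python.  The real proxy is event-driven rather than blocking, and the names here (coral_get, coral_put, fetch_from, origin_of, local_cache) are illustrative stand-ins for its actual interfaces; the sketch only shows the order of operations in steps 3 through 7.

```python
# Hypothetical sketch of a CoralCDN proxy handling one request (steps 3-7).
# coral_get/coral_put, fetch_from, and local_cache are illustrative stand-ins,
# not the real API.

def handle_request(url, local_cache, coral_get, coral_put, fetch_from, my_addr):
    # Step 3: serve directly from the local cache if possible.
    if url in local_cache:
        return local_cache[url]

    # Step 4: look up the URL in the Coral indexing layer.
    referrals = coral_get(url)          # list of proxy addresses, possibly empty

    # Step 5: try nearby proxies first; fall back to the origin server.
    obj = None
    for proxy_addr in referrals:
        obj = fetch_from(proxy_addr, url)
        if obj is not None:
            break
    if obj is None:
        obj = fetch_from(origin_of(url), url)

    # Step 6: cache the object and return it to the client.
    local_cache[url] = obj

    # Step 7: advertise ourselves in Coral as a cacher of this URL.
    coral_put(url, my_addr, ttl=2 * 60 * 60)   # 2-hour TTL for successful fetches
    return obj

def origin_of(coralized_url):
    # "sns.cs.princeton.edu.nyud.net/foo" -> "sns.cs.princeton.edu/foo"
    # (illustrative string manipulation only)
    host, _, path = coralized_url.partition("/")
    return host.replace(".nyud.net", "") + "/" + path
```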

A review of DHTs

The Coral indexing layer is closely related to the structure and organization of distributed hash tables (DHTs) like Chord and Kademlia.  Let's briefly review the design of DHTs; skip to the next section if you are already familiar with them.

DHTs were first proposed concurrently around 2001 by several different research groups (Chord from MIT, Pastry from Rice/Microsoft, CAN from ICSI, and Tapestry from Berkeley), all with a similar flavor.  They provide a distributed data structure that exposes two functions: put(key, value) and get(key).  These proposals built on the load-balancing properties first considered in the consistent hashing work of Karger et al. '97, but differed in scale: consistent hashing gave each node a global view of the system (all participants know all other nodes), whereas DHTs focused on more scalable designs in which each node knows only a fraction of the total system participants.  For most of these algorithms, this "fraction" is O(log n), although you now see DHT proposals called "one-hop DHTs" that effectively go back to the O(n) state of consistent hashing.  DHTs have had some real impact: BitTorrent uses a DHT to locate peers in its "trackerless" mode of operation, while memory-caching systems like memcached (used by Facebook, LiveJournal, Wikipedia, etc.) use similar techniques to map objects onto cache nodes.

In these DHTs, all nodes are assigned one or more random identifiers, called nodeids, typically drawn from some keyspace (e.g., all 160-bit numbers).  "Keys" are also mapped into this same identifier space by computing some function F(k) that maps any input key k ({0,1}*) into the keyspace ({0,1}^160).  One often uses a cryptographic hash function like SHA-1 for this F, as it has the property of collision resistance; namely, it is very difficult to find two inputs k and k' for which F(k) == F(k') yet k != k'.  (As a slight digression: if SHA-1 indeed behaves like a random function, then finding any such pair k, k' that "collide" will require searching roughly 2^80 numbers for 160-bit values, i.e., the "birthday bound".  There have been some very clever attacks against SHA-1 in the past few years that bring this down to around 2^63, but that's perhaps not that relevant to our discussion, and one could always switch to a longer function, such as the aptly-named 256-bit SHA-256; NIST also currently has a competition going on to determine the next-generation cryptographic hash function.)
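As a quick illustration of this key-to-identifier mapping, here is how one might compute 160-bit identifiers for keys and nodes with SHA-1 in Python.  The helper names and the example IP address are mine, but the mapping itself (SHA-1 of a URL, and SHA-1 of a node's IP address) is the one Coral uses.

```python
import hashlib

def key_id(url: str) -> int:
    """Map an arbitrary key (here, a URL) into the 160-bit identifier space."""
    return int.from_bytes(hashlib.sha1(url.encode()).digest(), "big")

def node_id(ip_addr: str) -> int:
    """Coral derives nodeids the same way, from the node's IP address."""
    return int.from_bytes(hashlib.sha1(ip_addr.encode()).digest(), "big")

# Both land somewhere in [0, 2^160 - 1]:
print(hex(key_id("http://sns.cs.princeton.edu/")))
print(hex(node_id("10.0.0.1")))   # example address, not a real Coral node
```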

Anyway, I'll describe the basic approach of DHTs using Chord as an example, because I think it's the easiest to explain.  We start out with an identifier space comprised of all 160-bit numbers, i.e., [0, 2^160-1].  Imagine placing these numbers on a ring, so that both 0 and 2^160-1 sit at the 12-o'clock position.  Then, we map both keys and nodeids onto this space.  In Coral, keys are the SHA-1 hash of URLs, and nodeids are the SHA-1 hash of nodes' IP addresses.  We say that a node is the successor (or root) of a key if it is the first node clockwise from the key as you go around the ring.  You can also view a node as "being responsible for" the entire keyspace between its own identifier and that of its nearest predecessor (in the counter-clockwise direction).

So that's how we assign keys to nodes, which is (roughly) how Akamai supposedly assigns responsibility for caching files to its servers, as well as how memcached works (albeit memcached doesn't support the dynamic addition or removal of cache nodes, unfortunately).  This approach of consistent hashing is superior to simpler "mod-based" hashing techniques, which number all nodes from 1..n and then assign a key to node i by simply computing i = key mod n.  Mod-based hashing gives great load balancing, but bad smoothness.  The smoothness property captures how many items must move between nodes when nodes join or leave the system.  In the mod-based approach, if n changes by only one, then virtually all items need to be reassigned (or some complex renumbering trick played); in consistent hashing, when a node is added or removed, only its successor is typically affected.
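The smoothness difference is easy to see numerically.  Below is a small simulation (my own sketch, not from the paper) that hashes 10,000 keys onto 10 nodes, adds an eleventh node, and counts how many keys change owners under consistent hashing versus mod-based assignment.

```python
import hashlib

def h(s: str) -> int:
    return int.from_bytes(hashlib.sha1(s.encode()).digest(), "big")

def successor(key, node_ids):
    """First nodeid clockwise from the key on the ring (consistent hashing)."""
    ring = sorted(node_ids)
    for nid in ring:
        if nid >= key:
            return nid
    return ring[0]                      # wrap around past 2^160 - 1

keys = [h(f"http://example.com/obj{i}") for i in range(10_000)]
nodes = [h(f"10.0.0.{i}") for i in range(1, 11)]           # 10 nodes

before = {k: successor(k, nodes) for k in keys}

# Add one node and count how many keys change owners.
nodes_plus = nodes + [h("10.0.0.11")]
moved_consistent = sum(1 for k in keys if successor(k, nodes_plus) != before[k])

# Mod-based hashing: nearly every key moves when n goes from 10 to 11.
moved_mod = sum(1 for k in keys if k % 10 != k % 11)

print(f"consistent hashing moved {moved_consistent} of {len(keys)} keys")
print(f"mod-based hashing moved {moved_mod} of {len(keys)} keys")
```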

But both of these hashing approaches assume that every node knows approximately all other nodes in the system.  DHTs ask how we can design a system in which each node knows only some small fraction (typically O(log n)) of the system's nodes.  In Chord, as in most other DHTs, these known nodes are drawn from a special distribution, one that allows any node to find a key's successor in a small number of routing hops (typically O(log n)) when we use the system's nodes as a routing overlay.  Thus nodes in a multi-hop DHT play two roles: routing lookups and storing key/value pairs.

Chord maintains two different types of "links" in its routing table:  (1) successor links, which point to its k immediate successors in a clockwise direction around the ID ring (i.e., the nodes with the next-highest ids); k successors are kept, rather than just a single one, for fault tolerance.  (2) Fingers, which provide long-distance shortcuts for efficient routing.  These finger links are spaced exponentially, so a node with id me knows the successors of the ids me+2^0, me+2^1, me+2^2, …, me+2^159 (for a 160-bit identifier space).  In other words, it has a link to some node roughly 1/2 of the way around the ring from it, 1/4 around the ring, 1/8, 1/16, and so on.  (In fact, some performance optimizations don't require that fingers be chosen in quite so strict a manner (see "proximity neighbor selection"), but this basic distribution is still maintained for efficient routing.)

Then, in order to route to the node responsible for any key, a node uses these links in a greedy fashion: at each hop, it forwards to the known node whose nodeid is closest to (without passing) the key being looked up.  Given that each node has a routing table of neighbors with the above distribution, each hop is guaranteed to cut the remaining distance to the destination at least in half, hence the O(log n) bound on routing.  As a concrete example, suppose we have just 5-bit identifiers, and assume for simplicity that all 32 possible nodes are in the system.  Then the node with id 0 will have links to the nodes with ids 1, 2, 4, 8, and 16.  Further, if node 0 wants to look up key 29, its lookup traverses the path through nodes 16, 24, 28, and 29, respectively.
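Here is a toy version of that 5-bit example, an illustrative sketch assuming (as above) that all 32 ids are occupied by nodes; running it reproduces the 16, 24, 28, 29 path.

```python
# Toy Chord routing with 5-bit ids and all 32 nodes present.
RING = 32

def fingers(me):
    # me + 2^0, me + 2^1, ..., me + 2^4 (every id is an actual node here)
    return [(me + 2**i) % RING for i in range(5)]

def route(start, key):
    """Greedy lookup: at each hop, take the known node closest to (but not past) the key."""
    path, cur = [], start
    while cur != key:
        candidates = [f for f in fingers(cur)
                      if (key - f) % RING < (key - cur) % RING]
        cur = min(candidates, key=lambda f: (key - f) % RING)
        path.append(cur)
    return path

print(route(0, 29))   # [16, 24, 28, 29]
```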

The Coral indexing layer

Compared to "traditional" DHTs, Coral introduced a few novelties motivated by its application requirements: its key-value indexing layer offers weaker consistency and uses a locality-organized hierarchical design.  After all, a client need not discover all proxies caching a particular file; it only needs to find several such proxies, preferably nearby ones.  So, keys in Coral are computed as SHA-1(URL), values are the IP addresses of proxies caching the corresponding web object, and each key/value pair is stored as soft state in the system and thus includes a TTL for expiry.

We start by introducing the notion of sloppiness, which basically refers to the fact that the DHT layer is explicitly designed not to be strongly (or even eventually) consistent.  The basic idea behind sloppiness is to prevent a key's nominal root from getting overloaded if there's significant interest in a single key (i.e., the URL that's currently getting Slashdotted).  So while unpopular keys will be stored at their root, inserts to popular keys will halt during their overlay routing before reaching the actual root, and the key/value pair being inserted will instead be stored on some node farther from the root.  A subsequent lookup for this key can similarly stop upon finding any set of values stored under the key, i.e., never actually progressing to the root.  As keys get more popular, their storage gets pushed out to more nodes in the overlay, particularly because the number of nodes that are one hop away from the root is log n, the number that are two hops away is log^2 n, and so on.

So the real questions are the following:  (1) how do we actually perform this "push-back" to store items farther away from the root, and (2) how do we ensure that lookups still succeed, no matter which routing path requests arrive on, especially given that keys are getting pushed back and entries in Coral are just soft state?

To insert a key/value pair, Coral performs a two-phase operation.  In the "forward" phase, Coral routes to nodes successively closer to the key k, as in typical DHT routing, but stops upon reaching either (1) the key's root or (2) a node that is both full and loaded.  We say a node is full if it has reached the maximum number of values for the key (with sufficiently long-lived TTLs), and loaded when there is heavy write traffic for that particular key.  During the "reverse" phase, the proxy attempts to insert the value at the closest node seen.  These insert-rate-limiting mechanisms prevent tree saturation—i.e., the root becoming overloaded with a high rate of insertions—and make get operations satisfiable by more distant nodes.
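The following sketch captures both answers: the rate-limited, two-phase put and the early-terminating get.  The node interface shown (is_root_for, is_full, is_loaded, next_hop, store, values_for) is an illustrative stand-in for Coral's actual RPCs.

```python
def sloppy_put(start_node, key, value, ttl):
    # Forward phase: route toward the key's root, remembering nodes passed,
    # but stop early at any node that is both full and loaded for this key.
    path, node = [], start_node
    while True:
        path.append(node)
        if node.is_root_for(key) or (node.is_full(key, ttl) and node.is_loaded(key)):
            break
        node = node.next_hop(key)

    # Reverse phase: walk back and store the value at the closest node seen
    # that will still accept it.
    for node in reversed(path):
        if node.store(key, value, ttl):
            return node
    return None

def sloppy_get(start_node, key):
    # Lookups may stop at the first node found holding any values for the key,
    # without ever reaching the root.
    node = start_node
    while True:
        values = node.values_for(key)
        if values or node.is_root_for(key):
            return values
        node = node.next_hop(key)
```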

That gives us sloppiness, but what about locality?  To improve locality, these routing operations are not initially performed across the entire global overlay.  Instead, each Coral node belongs to several distinct routing structures called clusters. Each cluster is characterized by a maximum desired network round-trip-time (RTT).  The system is parameterized by a fixed hierarchy of clusters with different expected RTT thresholds. Coral’s deployment uses a three-level hierarchy, with level-0 denoting the global cluster and level-2 the most local one.  Coral employs distributed algorithms to form localized, stable clusters.

[Figure: Coral's three-level cluster hierarchy. A lookup first proceeds within the node's local cluster (lookup steps 1 and 2, shown in yellow) before continuing in larger clusters; vertical dotted lines indicate that progress made at one level carries over to the next.]

Every node belongs to one cluster at each level, as shown in the picture.  A Coral node queries nodes in its local cluster (shown by the nodes in yellow for lookups numbered 1 and 2) before moving on to lookups across larger clusters and, eventually, the global cluster.  This both reduces lookup latency and increases the chances of returning values stored at nearby nodes, which are in fact pointers to nearby proxies.  Thus, if a local lookup succeeds, the returned IP addresses will also correspond to proxies that have a low RTT with respect to the requesting node.  This doesn't guarantee that downloading a web object from that proxy will be fast, but it's a good approximation, especially as we are normally dealing with smaller web objects, so TCP-based transfers are often still in slow-start (making low RTT all the more important).
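In pseudocode terms, a locality-aware get just tries the clusters in order, most local first; each cluster's lookup() below can be thought of as the sloppy get from the previous sketch run over that cluster's own overlay (again, an illustrative sketch rather than Coral's real interface).

```python
def hierarchical_get(clusters, key):
    """clusters is ordered level-2 (most local), level-1, level-0 (global)."""
    for cluster in clusters:
        values = cluster.lookup(key)      # sloppy get within this cluster's overlay
        if values:
            # Hits in a more local cluster point at proxies with lower RTT
            # to us, so return them immediately rather than searching wider.
            return values
    return []
```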

Coral's hierarchical structure has another interesting property.  Because Coral nodes maintain the same nodeids in all clusters, once a lookup has progressed as far as it can at one level, it is guaranteed to have made similar forward progress within its larger cluster.  This is shown by the vertical dotted lines in the above picture.  For a more in-depth description of how Coral organizes itself into these clusters, see our original NSDI paper.  I will also talk a bit more about the clustering design when discussing "latency-sensitive" system architectures in a later post.

This clustering mechanism arose around the same time that locality preferences started getting introduced into DHTs (see the mention of "proximity neighbor selection" (PNS) above).  I'm still unclear as to the relative benefit of each.  The clusters certainly introduce complexity to the system, which is a point against them.  On the flip side, they provide a more static structure for routing, which perhaps allows us to make stronger claims about their load-balancing properties and also makes them better suited for "constrained routing" in security-conscious DHT designs.  Perhaps most important, however, is that the hierarchy seems less susceptible to variations arising from measuring RTTs between nodes (especially important in virtualized environments like PlanetLab, which can exhibit significant jitter, as I will discuss in a later post).  Thus, I suspect that PNS-based approaches will "miss" locally stored content more often than the hierarchy-based approach (the routing path in PNS has to line up perfectly, after all), although I've not simulated or otherwise tested this theory to say so with any real confidence.

The CoralCDN HTTP Proxy

CoralCDN seeks to aggressively minimize load on origin servers.  We now summarize how its HTTP proxies were designed to use the Coral indexing layer for inter-proxy cooperation and rapid adaptation to flash crowds.

Locality-optimized inter-proxy transfers.  Each CoralCDN proxy keeps a local cache from which it can immediately fulfill client requests.  When a client requests a non-resident URL, CoralCDN proxies attempt to fetch the web content from each other, using the Coral indexing layer for discovery.  A proxy will contact a URL's origin server only if the Coral indexing layer provides no referrals or none of its referrals return the data.

If a CoralCDN proxy discovers that one or more neighbors have the object, it attempts to establish TCP connections in parallel to multiple proxies (2 in our live deployment).  It issues an HTTP request on the first established connection.  If this request fails—e.g., if the remote proxy has evicted cached content, returns an unexpected error, or is unresponsive—these pre-established connections provide fast failover.  Otherwise, they are closed once the remote proxy begins returning valid content.

CoralCDN's inter-proxy transfers are locality-optimized, both through their use of parallel connection requests and, more importantly, through the order in which neighboring proxies are contacted.  The properties of Coral's hierarchical indexing layer imply that the list of proxy IP addresses returned by get(SHA-1(URL)) will be sorted by their cluster distance to the request initiator.  Thus, proxies will attempt to contact level-2 neighbors before level-1 and level-0 proxies, respectively.
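A rough sketch of this fetch logic is below: referrals arrive already sorted most-local-first, the proxy dials a couple of them in parallel (two in the deployment), issues the request on whichever connection is established first, and falls back to the others or, ultimately, the origin if that fails.  The fetch_over() helper and the (host, port) referral format are assumptions of the sketch, not the real implementation.

```python
import socket
from concurrent.futures import ThreadPoolExecutor, as_completed

def connect(addr, timeout=2.0):
    """addr is a (host, port) tuple; returns the address and an open socket."""
    return addr, socket.create_connection(addr, timeout=timeout)

def fetch_from_neighbors(referrals, url, fetch_over, fanout=2):
    """referrals: proxy addresses sorted level-2 first; fetch_over(sock, url)
    issues the HTTP request over an established connection (stand-in)."""
    candidates = referrals[:fanout]
    with ThreadPoolExecutor(max_workers=max(1, len(candidates))) as pool:
        futures = [pool.submit(connect, addr) for addr in candidates]
        for fut in as_completed(futures):          # first established connection wins
            try:
                addr, sock = fut.result()
            except OSError:
                continue                           # connection failed; try the next
            try:
                obj = fetch_over(sock, url)
                if obj is not None:
                    return obj                     # success from a nearby proxy
            finally:
                sock.close()                       # done with this connection
    # (In this sketch any other pre-established sockets are simply dropped; the
    #  real proxy closes its spares once valid content starts arriving.)
    return None                                    # all referrals failed: the caller
                                                   # falls back to the origin server
```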

Rapid adaptation to flash crowds. Unlike many web proxies, CoralCDN is explicitly designed for flash-crowd scenarios.  If a flash crowd suddenly arrives for a web object, proxies self-organize into a form of multicast tree for retrieving the object.  Data streams from the proxies that initially fetch the object from the origin server to those arriving later. CoralCDN provides such behavior by combining optimistic references and cut-through routing.

First, as soon as a CoralCDN proxy begins receiving the first bytes of a web object—either from the origin or from another proxy—it inserts a reference to itself in the Coral indexing layer with a short time-to-live (30 seconds).  It continually renews this short-lived reference until either it completes the download or the download fails.

Second, CoralCDN's cut-through routing at each proxy helps reduce transmission time for larger files.  That is, a proxy will upload portions of an object as soon as they are downloaded, not waiting until it receives the entire object.  (The proxy implements such behavior through its event-based structure communicating through the file system.  Multiple client connections can register read callbacks on a single file, and whenever this file is written to as content is received from a remote node, these connection callbacks are dispatched and the newly-written content is sent to the other remote nodes downstream in the tree.)  Once any CoralCDN proxy successfully finishes downloading the content, it inserts a much longer-lived reference to itself into Coral.  Because the Coral insertion algorithm accounts for expiry times—the deployed system uses 2-hour TTLs for "successful" transmissions (e.g., status codes of 200 (OK), 301 or 302 (Moved), etc.), and 15-minute TTLs for non-transient "unsuccessful" responses (e.g., 403 (Forbidden), 404 (File Not Found), etc.) from the origin—these longer-lived references will overwrite shorter-lived ones, and they can be stored on well-selected nodes even under high insertion load.  Proxies renew these pointers in Coral after their TTL expires, but only if the content has been subsequently requested by other parties.
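The reference lifecycle above can be summarized in a few lines.  The TTL constants are the ones quoted in this section; coral_put, my_addr, and the use of a threading.Event are assumptions of this sketch rather than the real (event-driven) implementation.

```python
IN_PROGRESS_TTL = 30               # short-lived reference while downloading
SUCCESS_TTL     = 2 * 60 * 60      # 200 (OK), 301/302 (Moved), ...
FAILURE_TTL     = 15 * 60          # non-transient errors: 403, 404, ...

def advertise_while_downloading(coral_put, url, my_addr, done_event):
    """Insert and keep renewing a 30-second reference until the download ends
    (done_event is expected to be a threading.Event set on completion/failure)."""
    while not done_event.is_set():
        coral_put(url, my_addr, ttl=IN_PROGRESS_TTL)
        done_event.wait(IN_PROGRESS_TTL / 2)       # renew well before expiry

def advertise_result(coral_put, url, my_addr, status_code):
    """On completion, insert the longer-lived reference appropriate to the result."""
    ttl = SUCCESS_TTL if status_code in (200, 301, 302) else FAILURE_TTL
    coral_put(url, my_addr, ttl=ttl)
```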

A CoralCDN proxy manages its disk using a least-recently-used eviction policy (each proxy in our PlanetLab deployment has 3GB of disk cache).  The LRU eviction policy can cause a proxy to evict an item, even while a reference in Coral still exists.  (Offering only best-effort guarantees, Coral does not provide any mechanism to delete references.) Recovery from such failures is handled by HTTP failover between proxies, as described above.  That said, in a later post I’ll show that this condition is relatively rare given the system’s working set size.
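For completeness, here is a minimal byte-bounded LRU cache of the kind described.  The 3 GB figure matches the deployment, but the class itself is just an illustrative sketch, not CoralCDN's actual disk-cache code.

```python
from collections import OrderedDict

class LRUCache:
    """Minimal byte-bounded LRU cache (illustrative; not CoralCDN's actual code)."""

    def __init__(self, capacity_bytes=3 * 1024**3):   # e.g., 3 GB per proxy
        self.capacity = capacity_bytes
        self.used = 0
        self.items = OrderedDict()                     # url -> content, oldest first

    def get(self, url):
        if url not in self.items:
            return None
        self.items.move_to_end(url)                    # mark as recently used
        return self.items[url]

    def put(self, url, content):
        if url in self.items:
            self.used -= len(self.items.pop(url))
        self.items[url] = content
        self.used += len(content)
        while self.used > self.capacity:               # evict least-recently-used
            _, evicted = self.items.popitem(last=False)
            self.used -= len(evicted)
```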

So that's the basic design of CoralCDN, with a little pedagogical discussion of DHTs thrown in for fun.  Since deployment, however, I've needed to introduce additional functionality to keep it running.  In the next post, I'll give a brief overview of some of our security mechanisms; later, I'll discuss some of the per-origin bandwidth fair-sharing mechanisms that we introduced to address the fact that, well, PlanetLab gives us approximately 2 TB of upstream bandwidth per day system-wide, while we've long seen demand on the order of >10 TB per day.  So if CoralCDN gives you the dreaded "over quota" message, I plead impotent!

  • Priya

    Mike,
    Very informative article. I have a doubt. Can you please let me know how dynamic pages are handled by Coral CDN. ie: if a web page has to be created based on the logged in user, may be by retrieving data from the database, how does CoralCDN handle such dynamically created pages?? Any info on this would be helpful.
    Regards,
    Priya.

  • http://www.cs.princeton.edu/~mfreed/ Mike Freedman

    Hi Priya,

    So CoralCDN mostly obeys HTTP Cache-Control directives, which specify how long a Web object may be cached for. There is one big caveat here, however: Designed primarily for reducing server load, CoralCDN currently uses a lower-bound on cache revalidation of 5 minutes. This means that, even if the no-cache HTTP directive is specified, it will only revalidate the content to the server (in order to reduce server load) once per 5 minutes. If you do not set any expiry period, Coral defaults to a 12 hour expiry period.

    That said, as my post about some of CoralCDN’s security mechanisms mentions:
    http://sns.cs.princeton.edu/2009/05/security-mechanisms-in-coralcdn/

    CoralCDN proxies delete cookies in both directions. So “dynamic” content that’s personalized based on a user’s unique, authenticated information doesn’t arise.

    Now, one might rightly ask why we are caching “no-cache” information at all, however short a time period. Basically, we found that such server behavior was primarily used to ensure websites can keep accurate accounting of their users, as opposed to privacy reasons (when one disallows login).

    On the other hand, this might lead to strange behavior sometimes when content still uses some implicit input from the client that’s not part of the actual URL. For example, Google (and other sites) will send a redirect based on country-level IP geolocation: If you access google.com in France, it will redirect you to google.fr. Now, given that CoralCDN has a world-wide discovery layer, that redirect will be cached in its DHT, and a subsequent Coralized request from the US might get a redirect to google.fr. Not ideal, but this is a problem when non-explicit interfaces are used. (And personally, I hate this behavior when I’m traveling.)

    It would be relatively easy to start obeying “private” and “no-cache” directives. But, we find them more used by sites (often without understanding their implications) for analytics, as opposed to actual privacy reasons, so we have kept CoralCDN’s original behavior intact.

  • Priya

    Mike,
    Thanks for the reply. Though your reply is very informative, I am afraid it has not answered by question. May be I did not put forth the question correctly.. Simply put, I would like to know if the Coral proxies stores/caches only static web pages or does it store dynamic pages(like pages retrieved based on user request) also.
    Appreciate if you could throw some light on this.
    Thanks.
    Regards,
    Priya.

  • http://www.cs.princeton.edu/~jterrace/ Jeff Terrace

    Hey Priya,

    I think the short answer is yes, but just with a small expiration delay. For example:
    http://www.cnn.com.nyud.net/

    Jeff

  • http://www.cs.princeton.edu/~mfreed/ Mike Freedman

    Priya:

    From an HTTP protocol perspective, and especially from the perspective of a HTTP cache, there’s not really any such thing as “static” and “dynamic” content. There is, however, an HTTP header which describes how long a particular object can be cached before a user and/or proxy should revalidate its freshness with an origin server.

    Now, you could think of “static” content as having no expiry time (or just something significantly long, such as measured in days) and “dynamic” content as having a “no-cache” or “private” directive (the latter specifying that a user’s browser could cache content, but an on-path proxy should not). But from the perspective of a web proxy, a web object is a web object, the only difference being whether and for how long a particular object is to be cached (i.e., specified in the Cache-Control header in HTTP/1.1).

    So my direct answer to your question is: CoralCDN stores both what are sometimes thought of as “static” and “dynamic” content, with some of the caveats described above.

  • Priya

    Jeff/Mike,
    Thanks a lot for your answers. It proved helpful. Basically, for dynamic content, for which caching period is short or nill, the proxy everytime contacts the original server to fetch the requested page.
    So in case, if a dynamic page(with no-cache directive) has static content, it does not get cached separately in the proxy. The web page as a whole will be cached. Hence the next time the user requests for the same dynamic page, the whole page along with the static content will be retrieved from the original server. Please let me know if I am correct. Since we are in the process of understanding P2P CDNs, my question might be very basic. Appreciate if could answer.
    Thanks.
    Regards,
    Priya.

  • http://www.cs.princeton.edu/~mfreed/ Mike Freedman

    Partially.

    Web proxies and servers don’t reason about “web pages” (i.e., the original URL and all the embedded content (URLs) that also get fetched and loaded by your browser when you load the original URL). They only reason about individual web objects (URLs).

    So when you go to a site like Facebook or Amazon, you fetch the original .html URL, as well as a whole bunch of embedded links (javascript, CSS, images, etc.), all in separate HTTP requests.

    These resulting objects can all have different cache expiration times — indeed, what you normally think of as “dynamic object” (the HTML) is mostly comprised of “static content” (the images, CSS, etc.). So while your browser might need to contact the origin site each time to fetch/revalidate the “dynamic” object (given that it has a short expiry time), it might be able to just use its local cache to load the embedded “static” content. The same with a proxy, in that it just treats each object request separately.

    If you use FireFox, try out the plugin “Firebug”. It’ll give you a sense of the number of URLs that are fetched to load a single “page”.

  • Priya

    Mike,
    Thanks a lot for explaining so clearly. I could understand the concept now.
    Now coming to CoralCDN, it is mentioned that CoralCDN is designed to manage flash crowd. Assuming that large number of users are trying to login to a webpage simultaneously that is hosted by coralCDN, coral everytime contacts the original server to get the page(dynamic content) specific to each user. In this case, the original server becomes overloaded with huge number of requests and finally become unavailable. Is it not so?
    Regards,
    Priya.

  • http://www.cs.princeton.edu/~mfreed/ Mike Freedman

    With a normal web proxy, you are correct. With CoralCDN, however, the situation is a bit different, for two reasons.

    1. You can’t “logon” with CoralCDN: It doesn’t support https, POST, or cookies (see other posts).

    2. The CoralCDN proxies use a minimum expiry time for content (5 minutes), even when content is marked with a shorter expiry (or even as non-cacheable).

    Both of these obviously reduce its functionality (e.g., you can’t use a CoralCDN proxy as a normal, transparent web proxy) in order to focus on its goal of content distribution and flash-crowd protection for the masses.

  • Priya

    Mike,
    Thanks a lot for the clarification.
    Regards,
    Priya.

  • http://ke7.org/ alex

    Thanks a lot for explaining, i can understand how CoralCDN works much better now.
