CoralCDN Lesson: Fair-sharing bandwidth via admission control

For commercial CDNs and other computing services, the typical answer to resource limits is simply to acquire more capacity.  As CoralCDN’s deployment on PlanetLab does not have that luxury, we instead apply admission control to manage its bandwidth resources.  This post describes some of those mechanisms; in the next post, we’ll step back and describe some of the challenges of doing resource accounting and management on a virtualized, shared platform such as PlanetLab.

Following the Asian tsunami of December 2004, and with YouTube yet to be created, CoralCDN distributed large quantities of amateur videos of the natural disaster.  With no bandwidth restrictions on PlanetLab at the time, CoralCDN’s network traffic to the public Internet quickly spiked.  PlanetLab sites threatened to pull their servers off the network if such use could not be curtailed.  It was agreed that the service should restrict its usage to approximately 10GB per node per day.

One could limit a proxy’s bandwidth use by rate-throttling all traffic (as in BitTorrent and Tor), or by simply shutting a proxy down once it exceeds its configured daily capacity (an option Tor also supports).  But if CoralCDN primarily provides a service for websites, as opposed to clients, then perhaps it should provide some notion of fairness across origin domains.  While this problem may seem difficult given that CoralCDN can interact with tens of thousands of domains over the course of a day, our usage patterns greatly simplify the accounting.

The figure on the left shows the total number of requests per domain that CoralCDN received over one recent day in January 2009 (the solid top line).  The distribution clearly has some very popular domains: the most popular one (Tamil YouTube) received 2.6M requests, the second most popular (a Brazilian forum on Japanese manga) received 1.2M requests, and the third (screensaver videos of user-contributed “fractal flames”) received 880K requests, while the rest of the distribution fell off in a Zipf-like manner.  Given that CoralCDN’s traffic is dominated by a limited number of domains, its mechanisms can serve mainly to reject requests for (i.e., perform admission control on) the bandwidth hogs.  Still, CoralCDN should differentiate between peak limits and steady-state behavior to allow for flash crowds or changing traffic patterns.

To achieve these aims, each CoralCDN proxy implements an algorithm that attempts to simultaneously (1) provide a hard upper limit on peak traffic per hour (configured to 1000 MB per hour), (2) bound the expected total traffic per epoch in steady state (400 MB per hour), and (3) bound the steady-state limit per domain.  Setting this last limit statically (say, a 1/k-th share of the total if there are k popular domains) would yield good fairness but poor utilization, given the non-uniform distribution across domains, so we adjust this per-domain limit dynamically to balance that trade-off.
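
To make the interplay of these three limits concrete, here is a minimal sketch of the per-request check in Python pseudocode.  This is not CoralCDN’s actual code: the function and field names are illustrative assumptions, and only the 1000 MB and 400 MB hourly figures come from the configuration above.

    # Illustrative admission check against the three limits described above.
    PEAK_LIMIT_MB = 1000      # (1) hard upper limit on traffic per hour
    STEADY_LIMIT_MB = 400     # (2) expected total traffic per hour in steady state

    def check_admission(domain, usage):
        """Decide whether to serve a request, given this epoch's usage counters.

        `usage` is an assumed bookkeeping object exposing, in MB:
          total_this_hour        -- bytes sent this epoch across all domains
          avg_total_per_hour     -- moving average of total hourly traffic
          domain_this_hour(d)    -- bytes sent for domain d this epoch
          domain_limit(d)        -- dynamically computed per-domain cap (case 3)
        """
        if usage.total_this_hour >= PEAK_LIMIT_MB:
            return "over_peak"              # case 1
        if usage.avg_total_per_hour >= STEADY_LIMIT_MB:
            return "over_steady_state"      # case 2
        if usage.domain_this_hour(domain) >= usage.domain_limit(domain):
            return "domain_over_budget"     # case 3
        return "admit"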

During each hour-long epoch, a proxy records the total number of bytes transmitted for each domain.  It also calculates domains’ average bandwidth as an exponentially-weighted moving average (attenuated over one week), as well as the total average consumption across all domains.   Across epochs, bandwidth usage is only tracked, and durably stored, for the top-100 domains (although more traditional HTTP logs do store information about all requests).  If a domain is not currently one of the top-100 bandwidth users, its historical average bandwidth is set to zero (providing leeway to sites experiencing flash crowds).
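
As a rough sketch of that bookkeeping (again illustrative Python, not our implementation; the text above only says the average is attenuated over one week, so the exact decay constant below is an assumption):

    import heapq

    EPOCHS_PER_WEEK = 7 * 24          # hourly epochs
    ALPHA = 1.0 / EPOCHS_PER_WEEK     # assumed EWMA weight for a one-week attenuation

    class BandwidthAccountant:
        def __init__(self):
            self.bytes_this_epoch = {}    # domain -> bytes sent during the current epoch
            self.avg_bytes = {}           # domain -> EWMA of per-epoch bytes (top 100 only)
            self.avg_total = 0.0          # EWMA of total per-epoch bytes across all domains

        def record(self, domain, nbytes):
            self.bytes_this_epoch[domain] = self.bytes_this_epoch.get(domain, 0) + nbytes

        def end_epoch(self):
            total = sum(self.bytes_this_epoch.values())
            self.avg_total = (1 - ALPHA) * self.avg_total + ALPHA * total

            # Decay every tracked domain, then fold in this epoch's usage.
            new_avg = {d: (1 - ALPHA) * v for d, v in self.avg_bytes.items()}
            for domain, used in self.bytes_this_epoch.items():
                new_avg[domain] = new_avg.get(domain, 0.0) + ALPHA * used

            # Durably track only the top-100 consumers; a domain that drops out
            # restarts from an average of zero, giving flash crowds some leeway.
            self.avg_bytes = dict(heapq.nlargest(100, new_avg.items(),
                                                 key=lambda kv: kv[1]))
            self.bytes_this_epoch.clear()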

The per-domain dynamic bandwidth limit is calculated at the beginning of each epoch for each of the top-100 domains as follows.  The proxy computes the sum of all top domains’ average bandwidths.  If this sum exceeds the total steady-state limit, it reduces the per-domain limit to the largest value at which, had that limit been enforced in the prior period, the bandwidth sum would have stayed at or below the permitted total.  (The limit thus first affects the top-ranked domain, then the top two domains, and so on.)
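
We don’t spell out the exact search above, but the description amounts to a small water-filling computation along the following lines (an illustrative sketch, not CoralCDN’s code):

    def compute_domain_limit(avg_bytes, steady_limit):
        """Largest per-domain cap such that, had it been enforced last epoch,
        the capped sum of the top domains' averages would not exceed the
        steady-state total.  Returns None if no cap is needed."""
        avgs = sorted(avg_bytes.values(), reverse=True)
        if sum(avgs) <= steady_limit:
            return None

        # Lower the cap from the top: with the i+1 largest domains pinned at
        # the cap, the remaining domains contribute their (smaller) averages.
        for i in range(len(avgs)):
            rest = sum(avgs[i + 1:])
            cap = (steady_limit - rest) / (i + 1)   # solve (i+1)*cap + rest = limit
            next_avg = avgs[i + 1] if i + 1 < len(avgs) else 0.0
            if cap >= next_avg:
                return cap          # only the i+1 biggest domains are capped
        return steady_limit / len(avgs)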

When a particular requested domain is over its hourly budget (case 3 above), CoralCDN proxies respond with 403 Forbidden messages.  If instead the proxy is over its peak or steady-state limit calculated over all domains (cases 1 or 2 above), the proxy redirects the client back to the origin site and temporarily makes itself unavailable for new client requests, which would be rejected anyway.  (When redirecting clients back to the origin, a proxy appends the query string coral-no-serve to the Location URL returned to the client.  Origins that use redirection scripts with CoralCDN check for this string to prevent loops.)  Although not the default, operators of some sites preferred this redirection home even when their own domain was to blame.  Domains can specify this type of control in an X-Coral-Control response header (which we’ll discuss a bit later).
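
Mapping the admission decision onto these responses looks roughly like the sketch below; the 302 status code and the (status, headers) return convention are assumptions for illustration, while the 403 response and the coral-no-serve query string follow the behavior just described.

    def respond(decision, origin_url):
        """Translate the admission decision into (status_code, headers)."""
        if decision == "domain_over_budget":
            # Case 3: this particular domain has exhausted its hourly budget.
            return 403, {}
        if decision in ("over_peak", "over_steady_state"):
            # Cases 1 and 2: send the client back to the origin, appending
            # coral-no-serve so origins that redirect into CoralCDN avoid loops.
            sep = "&" if "?" in origin_url else "?"
            return 302, {"Location": origin_url + sep + "coral-no-serve"}
        return 200, {}   # admitted: serve from cache or fetch from the origin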

By applying these mechanisms, CoralCDN reduces its bandwidth consumption to manageable levels.  While its demand sometimes exceeds 10 TB per day, its actual HTTP traffic remains steady at about 2 TB per day after rejecting a significant number of requests.  The scatter plot in the above figure shows the number of requests resulting in 403 responses per domain, most due to these admission control mechanisms.  We see how differences in domains’ object sizes yield different rejection rates.  The second-most popular domain mostly serves GIFs smaller than 10 KB and experiences a rejection rate of 3.3%, yet the fractal videos of the third-most popular domain are typically 5 MB in size, and it sees an 89% rejection rate.  Ranking these domains instead by bandwidth served, the figure on the left plots the average hourly traffic that four proxies permitted to each of the top-100 bandwidth consumers.

Note that this accounting mechanism is applied independently on each node.  This brings up two interesting questions worth further consideration:

  1. Would the system behave differently if its accounting shared information between nodes?  One potential consequence of not sharing information is penalizing content with local interest (that is, files that may be of interest only to more regional Internet populations).  If a system’s server-selection mechanisms redirect clients to more localized proxies, then most users from a particular region might only see a small subset of the total number of servers, so that content’s usage concentrates on a few nodes and can trip their per-node limits even if its network-wide usage is modest.  On the other hand, if our goal is more to prevent the total bandwidth hogs, rather than to provide any strict notion of fairness, does this problem arise significantly in practice?
  2. From a more research perspective, how can one efficiently perform (approximate) accounting across the entire network, yet do so without centralization and in a bandwidth-efficient manner?  Work in SIGCOMM ’07 called Cloud Control with Distributed Rate Limiting considered a similar problem, but effectively shared O(kN) information across the system via gossiping (where k is the number of domains we’re monitoring and N the number of nodes).  Can we provide accurate/precise rate limiting with o(kN) communication complexity?  If anybody has any pointers to effective algorithms here, please leave a comment.  It feels like the sensor network community has probably addressed this problem at some point when aggregating and collecting information.

In the next post, we’ll ask whether a hosting platform need provide any special interfaces or information to support such accounting.

  • Thanks for this article. It helped in my project.

  • Pingback: Princeton S* Network Systems » Blog Archive » CoralCDN Lesson: The design was mostly wrong

  • Anonymous

    Very nice article, thanks for sharing. The chart also really helps in understanding this.

  • Anonymous

    Thank you for the article. I was unaware of CoralCDN before reading this.

  • Rade Stanojevic

    The UCSD paper on Distributed Rate Limiting deals with the problem of controlling the aggregate usage of a service (service = domain, in your context) across a set of distributed nodes so that performance (reject rate) at each node is equalized. Their solution indeed uses gossiping to collect the aggregate (accounting) usage that is used as a feedback on when to discard packets/requests.

    In this paper, we show that in fact one can solve the same problem without collecting the aggregate info (and without any centralized controller). Namely, if every node updates its local limit based on its performance (QoS) and the performance of a few of its neighbors, eventually the system should converge to a state in which performance is equalized across all nodes.

    However, the problem you are facing in CoralCDN, with multiple domains sharing finite resources, seems way more challenging. Actually, it is not obvious to me what the objective would be and how to formalize the problem of bandwidth sharing in such a context. Do you have any thoughts on this? I suspect that for an appropriate performance objective, one might not need per node/domain accounting, but could rely on some adaptive decentralized controller to drive the system to the desired state with minimal communication overhead.

  • Rade,

    Thanks for the comment. The website for hamilton.ie seems offline, so I can't currently access your paper (Google HTML cache is a bit hard to read).

    That said, I haven't thought much about formally formulating the problem, much less actually solving it, but I could imagine specifying the objective something like this:

    For a given node:
    maximize sum_i u(c_i)
    subject to
    c_i >= 0         // allocated capacity to site i is non-negative
    c_i <= d_i       // site i's allocated capacity <= its demand
    sum_i c_i <= C   // total allocated capacity doesn't exceed the node's capacity C

    and model the utility function u as some concave function (say log x).

    Now, this doesn't say anything about performance, but it does say that, for a given node, it's better to allocate some marginal bandwidth to a new domain than to allocate more resources to an existing one (subject to available capacity), given the concave utility. If you then extend this over all nodes, with a site's utility function taking as input the sum of the capacities c_{i,j} it receives across all nodes j, subject to each node's available capacity, that might be a reasonable formulation.
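
    Written out, one reading of that multi-node version would be something like this (just a sketch of what I mean, with c_{i,j} the capacity node j allocates to site i, d_{i,j} site i's demand at node j, and C_j node j's total capacity):

    maximize sum_i u( sum_j c_{i,j} )
    subject to
    c_{i,j} >= 0                      // allocations are non-negative
    c_{i,j} <= d_{i,j}                // site i's allocation at node j <= its demand there
    sum_i c_{i,j} <= C_j for all j    // each node stays within its own capacity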

    Your thoughts are welcome.
