CoralCDN Lesson: Fair-sharing bandwidth via admission control
For commercial CDNs and other computing services, the typical answer to resource limits is simply to acquire more capacity. As CoralCDN’s deployment on PlanetLab does not have that luxury, we instead apply admission control to manage its bandwidth resources. This post describes some of these mechanisms, while we’ll take a step back in the next post to describe some of the challenges in doing resource accounting and management on a virtualized and shared platform such as PlanetLab.
Following the Asian tsunami of December 2004, and with YouTube yet to be created, CoralCDN distributed large quantities of amateur videos of the natural disaster. With no bandwidth restrictions on PlanetLab at the time, CoralCDN’s network traffic to the public Internet quickly spiked. PlanetLab sites threatened to pull their servers off the network if such use could not be curtailed. It was agreed that the service should restrict its usage to approximately 10GB per node per day.
One could limit a proxy’s bandwidth use by rate-throttling all traffic (as in BitTorrent and tor), or by simply shutting a proxy down after exceeding its configured daily capacity (also supported by tor). But if CoralCDN primarily provides a service for websites, as opposed to for clients, then perhaps it should provide some notion of fairness across origin domains. While this problem may seem difficult given that CoralCDN can interact with 10,000s of domains over the course of the day, our usage patterns greatly simplify the accounting.
The figure on the left shows the total number of requests per domain that CoralCDN received over one recent day in January, 2009 (the solid top line). The distribution clearly has some very popular domains: the most popular one (Tamil YouTube) received 2.6M requests, the second most popular (a Brazilian forum on Japanese manga) received 1.2M requests, the third (screensaver videos of user-contributed “fractal flames”) received 880K requests, while the remaining distribution fell off in a Zipf-like manner. Given that CoralCDN’s traffic is dominated by a limited number of domains, its mechanisms can serve mainly to reject requests for (i.e., perform admission control on) the bandwidth hogs. Still, CoralCDN should differentiate between peak limits and steady-state behavior to allow for flash crowds or changing traffic patterns. To achieve these aims, each CoralCDN proxy implements an algorithm that attempts to simultaneously (1) provide a hard-upper limit on peak traffic per hour (configured to 1000 MB per hour), (2) bound the expected total traffic per epoch in steady state (400 MB per hour), and (3) bound the steady-state limit per domain. As setting this last limit statically—such as 1/k-th of the total traffic if there are k popular domains—would lead to good fairness but poor utilization (given the non-uniform distribution across domains), we dynamically adjust this last traffic limit to balance this trade-off.
During each hour-long epoch, a proxy records the total number of bytes transmitted for each domain. It also calculates domains’ average bandwidth as an exponentially-weighted moving average (attenuated over one week), as well as the total average consumption across all domains. Across epochs, bandwidth usage is only tracked, and durably stored, for the top-100 domains (although more traditional HTTP logs do store information about all requests). If a domain is not currently one of the top-100 bandwidth users, its historical average bandwidth is set to zero (providing leeway to sites experiencing flash crowds).
The per-domain dynamic bandwidth limit is calculated at the beginning of each epoch for each of the top-100 domains as follows. The proxy computes the sum of all top domains’ average bandwidths. If this sum exceeds the total steady-state limit, it reduces the per-domain limit until the point when, if that limit had been enforced in the prior time period, the bandwidth sum would have been <= the permitted total limit. (This limit thus first affects the top-ranked domain, then affects the top-two domains, etc.)
When a particular requested domain is over its hourly budget (case 3 above), CoralCDN proxies respond with 403 Forbidden messages. If instead the proxy is over its peak or steady-state limit calculated over all domains (cases 1 or 2 above), then the proxy redirects the client back to the origin site, and the proxy temporarily making itself unavailable for new client requests, which would be rejected anyway. (If clients are to be redirected back to the origin, a proxy appends the query-string
coral-no-serve on the location URL returned to the client. Origins that use redirection scripts with CoralCDN check for such a string prevent loops.) Although not the default, operators of some sites preferred this redirection home even if their particular domain was to blame. Domains can specify this type of control in a
X-Coral-Control response header (which we’ll discuss a bit later).
By applying these mechanisms, CoralCDN reduces its bandwidth consumption to manageable levels. While its demand sometimes exceeds 10TBs per day, its actual HTTP traffic remains steady at about 2TB per day after rejecting a significant number of requests. The scatter plot in the above figure shows the number of requests resulting in 403 responses per domain, most due to these admission control mechanisms. We see how variances in domains’ object sizes yield different rejection rates. The second-most popular domain mostly serves gifs smaller than 10KB and experiences a rejection rate of 3.3%. Yet the fractal videos of the third-most popular domain are typically 5MB in size, and it sees an 89% rejection rate. Ranking these domains instead by bandwidth served, the figure on the left plots the average hourly traffic that four proxies permitted each of the top-100 bandwidth consumers.
Note that this accounting mechanism is applied independently on each node. This brings up two interesting questions worth further consideration:
- Would the system behave differently if its accounting shared information between nodes? One potential result from not sharing information is penalizing content with local interest (that is, files that may be of interest only to more regional Internet populations). For if a system’s server-selection mechanisms actually redirect clients to more localized proxies, then most users from a particular region might only see a smaller subset of the total number of servers. On the other hand, if our goal is more to prevent the total bandwidth hogs, rather than provide any strict notion of fairness, does this problem arise significantly in practice?
- From a more research perspective, how can one efficiently perform (approximate) accounting across the entire network, yet do so without centralization and in a bandwidth-efficient manner. Work in SIGCOMM ’07 called Cloud Control with Distributed Rate Limiting considered a similar problem, but effectively shared O(kN) information across the system via gossiping (where k is the number of domains we’re monitoring and N the number of nodes). Can we provide accurate/precise rate limiting with o(kN) communication complexity? If anybody has any pointers to effective algorithms here, please leave a comment. It feels like the sensor network community probably has addressed this problem at some point when aggregating and collecting information.
In the next post, we’ll ask whether a hosting platform need bother provide any special interfaces or information to support such accounting.