Before finally getting to some experiences, I wanted to touch on some of the security mechanisms that CoralCDN proxies incorporate to curtail misuse, especially important given their deployment at PlanetLab-affiliated universities.
CoralCDN proxies support only GET and HEAD requests, so many of the attacks for which “open” proxies are infamous are simply not feasible. For example, clients cannot use CoralCDN to POST passwords for brute-force cracking. It does not support SSL, and thus is less likely to carry confidential data. CoralCDN proxies do not support CONNECT requests, so they cannot be used to send spam as SMTP relays or to forge From: addresses in web mail. They also strip cookies from HTTP headers (in both directions), again reducing their usefulness for carrying personalized or authenticated traffic. Furthermore, because CoralCDN only handles Coralized URLs (i.e., example.com.nyud.net, rather than example.com), it cannot be used by simply configuring a vanilla browser’s proxy settings.
Maximum file size
Given both its storage and bandwidth limitations, CoralCDN enforces a maximum file size of 50 MB, which has generally prevented clients from using CoralCDN for video distribution. Some sites have attempted to circumvent this limit by omitting Content-Length headers (on connections marked as persistent and without chunked encoding). To ensure compliance, proxies therefore monitor ongoing transfers and halt any that exceed the limit. The 50 MB limit was instituted after we saw early use of CoralCDN to serve 700 MB-or-so videos, which both opened us up to increased DMCA take-down notices and, perhaps more important from a fairness perspective, used up a significant fraction of our local disk cache (only 4 GB per server) and our daily bandwidth limits (15 GB per server per day).
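The transfer-monitoring idea above can be sketched in a few lines. This is a minimal illustration with hypothetical names, not CoralCDN's actual implementation: even when the origin omits a Content-Length header, the proxy counts bytes as they stream through and aborts once the cap is exceeded.

```python
MAX_FILE_SIZE = 50 * 1024 * 1024  # 50 MB cap, as described above

class TransferTooLarge(Exception):
    """Raised when a streaming transfer exceeds the size cap."""

def relay_body(chunks, limit=MAX_FILE_SIZE):
    """Yield body chunks to the client, aborting mid-transfer if the
    running total exceeds `limit` -- works with or without Content-Length."""
    total = 0
    for chunk in chunks:
        total += len(chunk)
        if total > limit:
            raise TransferTooLarge(f"transfer exceeded {limit} bytes")
        yield chunk
```

Because the check runs per chunk rather than per response header, a server that lies about (or omits) its content length still gets cut off at the cap.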
Excessive resource use
While I’ll later discuss CoralCDN’s techniques for limiting bandwidth overuse by servers—as we only have about 15 GB of bandwidth per server per day, we need to split it somewhat fairly between origin sites if oversubscribed—there’s also resource abuse by clients. So, in addition to monitoring server-side consumption, proxies keep a sliding window of client-side usage patterns. We seek to prevent not only excessive aggregate bandwidth usage by clients, but also an excessive number of (even small) requests. The latter are usually caused either by server misconfigurations (which I’ll discuss in a future post) that result in HTTP redirection loops, or by bot misuse. While CoralCDN does not support POSTs or https, even some top-10 web sites accept cleartext passwords over GET requests. Thus, bots occasionally attempt to use CoralCDN to hide their identity while launching brute-force login attacks on websites through it.
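A sliding-window tracker of the kind described above might look like the following. This is an illustrative sketch (the class name, window size, and limits are my own assumptions, not CoralCDN's actual parameters): per-client events are kept in a time-ordered queue, old entries are expired, and a request is refused if either the request count or the byte total in the window would be exceeded.

```python
import time
from collections import deque

class ClientUsageWindow:
    """Sliding window of per-client requests and bytes (illustrative only)."""

    def __init__(self, window_secs=3600, max_requests=1000,
                 max_bytes=100 * 1024 * 1024):
        self.window_secs = window_secs
        self.max_requests = max_requests
        self.max_bytes = max_bytes
        self.events = {}  # client IP -> deque of (timestamp, bytes)

    def allow(self, client_ip, nbytes, now=None):
        """Return True and record the request if it fits in the window."""
        now = time.time() if now is None else now
        q = self.events.setdefault(client_ip, deque())
        # Expire events that have slid out of the window.
        while q and q[0][0] <= now - self.window_secs:
            q.popleft()
        # Refuse on either too many requests or too many bytes.
        if (len(q) >= self.max_requests or
                sum(b for _, b in q) + nbytes > self.max_bytes):
            return False
        q.append((now, nbytes))
        return True
```

Counting requests separately from bytes matters here: a redirection loop or a bot hammering a login page generates many tiny requests that a bandwidth cap alone would never notice.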
Finally, I maintain a global domain-name blacklist that each proxy regularly fetches and reloads at run-time. This blacklist satisfies origin servers that do not wish their content to be accessed via CoralCDN. Such restrictions were especially necessary to keep the proxies running on university campuses: many universities have their IP address space “whitelisted” by some websites as a form of IP-based authentication, a practice especially common among online academic journals. Before we implemented this, we—or rather, the NetOps folks at various deployment sites—received abuse complaints from journal operators about the existence of “open proxies”, leading the ops people to threaten that the CoralCDN proxies (or even PlanetLab servers) would be pulled from their networks. Part of sites’ contractual obligations with journals includes preventing such unauthorized access.
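A fetch-and-reload blacklist of this kind can be sketched as below. The URL, file format (one domain per line), and refresh interval are my assumptions for illustration, not CoralCDN's actual mechanism; the key points are that the list is swapped in atomically and that subdomains of a blacklisted domain are also blocked.

```python
import threading
import urllib.request

class DomainBlacklist:
    """Periodically refetched domain blacklist (hypothetical URL and
    format: one domain per line)."""

    def __init__(self, url, refresh_secs=300):
        self.url = url
        self.refresh_secs = refresh_secs
        self._domains = frozenset()
        self._lock = threading.Lock()

    def refresh(self):
        """Fetch the central list and atomically swap it in."""
        with urllib.request.urlopen(self.url) as resp:
            lines = resp.read().decode().splitlines()
        domains = frozenset(d.strip().lower() for d in lines if d.strip())
        with self._lock:
            self._domains = domains

    def blocked(self, host):
        """Block exact matches and any subdomain of a listed domain."""
        host = host.lower()
        with self._lock:
            domains = self._domains
        return any(host == d or host.endswith("." + d) for d in domains)
```

Note that this check is purely name-based, which is exactly what makes the dynamic-DNS circumvention described below possible: a request for an unlisted name that resolves to a blacklisted site's IP address sails straight through.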
Interestingly, these site operators actually do have a way to prevent such accesses themselves. After all, CoralCDN does not attempt to hide its presence or its clients’ identities. Proxies use a unique User-Agent string (“CoralWebPrx”) when fetching content from webservers, and client IP addresses are reported in X-Forwarded-For headers (a semi-standardized header introduced by Squid proxies in the ’90s). In practice, however, many site operators are either unwilling or unable to perform such security checks themselves. As abuse complaints to PlanetLab sites can risk their continued operation, the blacklist was necessary for CoralCDN to continue running.
We did encounter some interesting circumventions of our domain-based blacklist. On at least one occasion, a user created a dynamic DNS record for a random prefix that pointed to the IP address of a target domain: i.e., x12345.foo.com pointed to the IP address for www.journal.org. (I forget the specific online academic journal that was targeted, although I vaguely remember it being a Russian-language chemistry one.) Then, because we weren’t blacklisting foo.com, the domain passed our check. CoralCDN proxies resolved it to the journal’s IP address and successfully downloaded some articles from the origin site. Now, it’s rather surprising that this attack even worked, because the HTTP request from our proxy would have had a Host field of x12345.foo.com, rather than that of the journal. But the site operators were not checking the Host field, and, when informed of the problem, were not initially sure how to do so (I vaguely remember them running some non-Apache webserver on Solaris). This site, and some others, have requested that we perform IP-based blacklisting instead of domain-based, although I’ve avoided such restrictions given the difficulty of keeping them up-to-date.
Still, there’s an issue here related to our initial goal of a fully decentralized CDN. We’ve been using a central, administratively-supplied blacklist, as otherwise deployment sites get uncomfortable with running a (semi) “open” proxy. If different proxies were to employ different blacklists, however, there’s the configuration challenge of directing users to proxies that can satisfy their requests and away from proxies that would reject them. The Tor anonymizing network allows different exit proxies to specify different exit policies, but these are fairly coarse-grained (i.e., which ports to block or allow), as opposed to the finer-grained configuration we want (e.g., blacklisting specific HTTP domains). I remember a similar problem with academic journals being discussed by someone looking to operate a Tor exit node at Berkeley a few years back (although I don’t remember how the problem was actually resolved…anybody?). In general, this seems like a hard configuration and organization problem, perhaps a good research problem related to decentralized management and access.
Security through Obscurity (and Unintentional Tarpitting)
Overall, however, CoralCDN sees surprisingly little abuse, certainly less than I expected. (Let’s hope this post isn’t thumbing my nose at fate!) I see two reasons for this.
First, as described above, CoralCDN doesn’t work with vanilla proxy settings because it only handles Coralized URLs. Thus, it also turns up nothing for crawlers that “search” for open proxies; in fact, to the best of my knowledge, CoralCDN does not appear in any published open-proxy lists. Now, it would be trivial for a user to change whatever script they are using to auto-Coralize domain names (indeed, there are a host of browser plugins and Greasemonkey scripts that do exactly that). But this one-line change is probably beyond the ability of some script kiddies. And, perhaps even more important, there are just so many other crunchy targets to be had that a special rule for CoralCDN proxies is just not worth the trouble.
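The “one-line change” is essentially a hostname rewrite. A minimal sketch of Coralizing a URL (illustrative only; the real browser plugins and scripts also handle non-default origin ports and other edge cases, which this version deliberately leaves untouched):

```python
from urllib.parse import urlsplit, urlunsplit

def coralize(url):
    """Append .nyud.net to a URL's hostname, e.g.
    http://example.com/page -> http://example.com.nyud.net/page.
    Sketch only: URLs that are already Coralized, or that carry an
    explicit port, are returned unchanged."""
    parts = urlsplit(url)
    if (not parts.hostname or parts.hostname.endswith(".nyud.net")
            or parts.port):
        return url
    netloc = parts.hostname + ".nyud.net"
    return urlunsplit((parts.scheme, netloc, parts.path,
                       parts.query, parts.fragment))
```

The point is not that this is hard to write, but that most opportunistic abusers never bother: their off-the-shelf tooling expects an ordinary forward proxy, and CoralCDN simply doesn't look like one.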
Second, CoralCDN can be quite slow at times, especially when it doesn’t have content cached locally and must perform a global DHT lookup before touching an origin site. Delays for a URL found neither locally nor anywhere in the DHT—as happens when one is performing a brute-force login attack against a website, where the URL query string changes with each attempt—can be on the order of seconds. This sounds very much like an (albeit unintentional) tarpit, a security trick of delaying responses to client requests, used especially in SMTP. Good users don’t notice the additional delay (although this isn’t quite so true in interactive protocols like HTTP), while attackers see much lower throughput, given any limits on the number of outstanding connections they can make. Put more simply, it’s going to be a long day when each attempt to guess a password takes two seconds.
Another Princeton-based CDN, CoDeeN, has introduced a number of other interesting security mechanisms for automated robot detection. Perhaps if our usage patterns change we’ll have to look into using some of them.
So that’s my background description of CoralCDN and its security mechanisms. In the next post of this series, I’ll discuss how all published DHT algorithms are susceptible to both race conditions and routing problems—the latter given non-transitive network connectivity—and what can be done to mitigate these problems.