Emil Sit, a friend from MIT with whom I published one of my first papers, has been blogging a series of “interviews” from colleagues about their experience building distributed systems. He started this series after describing some implementation issues with building Chord, which was one of the first and remains the canonical distributed hash table (DHT). CoralCDN uses a DHT for its indexing, and its first implementation actually used MIT Chord as a software layer. (I later implemented my own DHT layer, although instead based on Kademlia — which was proposed by an officemate at NYU, Petar Maymounkov — the choice more due to proximity than fundamental differences in the algorithms.) Chord and DHTs have been incredibly influential both for wide-area peer-to-peer systems and increasingly even for data-center services (albeit using less in the way of multi-hop routing protocols). In fact, the “trackerless” form of BitTorrent (in both the mainline code and in Vuze) uses a DHT, although Kademlia specifically. And they indeed have seemed to scale: the Vuze DHT has been measured and analyzed at more than a million active users.
Anyway, the real reason for my post: Emil asked me to add some of my own thoughts to his interview series, which he’s recently published.