Consistency, Availability, and Geo-Replicated Storage

For the past few years, we’ve been working on problems related to geo-replicated storage. We want to store data “in the cloud,” but that data should reside within multiple datacenters, not just in a single one.  When data is geographically replicated in such a fashion:

  • Users can experience lower latency by accessing a datacenter near to them, rather than one halfway around the world.
  • Network or system failures at a single datacenter don’t make the service unavailable (even for data stored at that site).

This is common practice today.  Google runs multiple datacenters around the world, and Amazon Web Services offers multiple “Availability Zones” that are supposed to fail independently.

When data is replicated between locations, an important question arises about the consistency model such a system exposes.  Wyatt Lloyd has been tackling this question in his recent COPS and Eiger systems.  This work explores the space between two extremes: on one side, settling for “eventual” consistency and giving up on any consistency guarantees one can reason about; on the other, giving up on availability guarantees to gain strong consistency and real transactions.  That middle ground is going to be increasingly important.

Normally, folks think that the CAP Theorem tells us these two choices are fundamental.  But the key point is that CAP doesn’t tell us that eventual consistency is required, just that (as Partitions can happen) one can’t have both Availability and Strong Consistency (or more formally, linearizability).  It doesn’t tell us anything about consistency models that are weaker than linearizability yet stronger than “eventual.”  And that’s where COPS and Eiger come in.

One of our collaborators at CMU, Dave Andersen, recently wrote up a more accessible discussion of these systems and the causally-consistent data model they expose.  With the explosion of new data storage systems, particularly of the NoSQL variety, it’s important for folks to realize that there’s a (powerful and practical) choice between these two extremes.
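
To make that middle ground a bit more concrete, here is a minimal Python sketch of one idea behind causal consistency: a replica holds back a replicated write until every write it depends on is already visible locally. The names used here (Write, Replica, deps, receive) are invented for illustration, and the sketch is a toy under those assumptions; it is not the COPS or Eiger implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Write:
    """A replicated write. All field names here are hypothetical."""
    key: str
    value: str
    version: int
    # (key, version) pairs that must be visible before this write is applied.
    deps: frozenset = frozenset()


class Replica:
    """Toy replica that applies remote writes in causal order."""

    def __init__(self):
        self.store = {}       # key -> (value, version)
        self.applied = set()  # (key, version) pairs already visible
        self.pending = []     # writes still waiting on their dependencies

    def receive(self, write):
        # Writes may arrive in any order; buffer first, then apply what we can.
        self.pending.append(write)
        self._drain()

    def _drain(self):
        progress = True
        while progress:
            progress = False
            for w in list(self.pending):
                if w.deps <= self.applied:  # all dependencies are visible
                    current = self.store.get(w.key)
                    if current is None or current[1] < w.version:
                        self.store[w.key] = (w.value, w.version)
                    self.applied.add((w.key, w.version))
                    self.pending.remove(w)
                    progress = True

    def read(self, key):
        entry = self.store.get(key)
        return entry[0] if entry else None


# A comment causally depends on the photo it refers to.
photo = Write("photo", "beach.jpg", 1)
comment = Write("comment", "nice photo!", 1, frozenset({("photo", 1)}))

remote = Replica()
remote.receive(comment)  # arrives first, so it is buffered rather than shown
remote.receive(photo)    # the dependency arrives; both writes become visible
```

In this sketch the remote replica never exposes the comment without the photo it refers to, yet it never has to coordinate with another datacenter before answering a read. That is the kind of guarantee that sits between “eventual” consistency and linearizability.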

Caring about Causality – now in Cassandra

Over the past few years, we’ve spent a bunch of time thinking about and designing scalable systems that provide causally-consistent wide-area replication.  (Here, “we” means the team of Wyatt Lloyd, Michael Freedman, Michael Kaminsky, and myself;  but if you know academia, you wouldn’t be surprised that about 90% of the project was accomplished by Wyatt, who’s a graduating Ph.D. student at the time of this writing.)  I’m posting this because we’ve finally entered the realm of the practical, with the release of both the paper (to appear at NSDI’13) and code for our new implementation of causally-consistent replication (we call it Eiger) within the popular Cassandra key-value store.

Read Dave’s full post here.