Coordination in Distributed Systems (ZooKeeper)

coordination-in-distributed-systems-zookeeper

Architecting distributed systems can be very difficult. Arguably the hardest part of programming a distributed application is getting node coordination correct. I’ll define a node in this context as a service running on a single server which communicates with other nodes and together make up your distributed application.

What I mean by coordination here is some act that multiple nodes must perform together. Some examples of coordination:

  • Group membership
  • Locking
  • Publisher/Subscriber
  • Ownership
  • Synchronization

One or more of these primitives show up in all distributed systems, so implementing them correctly is extremely important. While developing CRAQ, I originally implemented a very simple group membership service, but it didn’t provide reliability, replication, or scalability and was therefore a single point of failure. I went looking for an out-of-the-box coordination service and came across ZooKeeper.

ZooKeeper is an open-source, reliable, scalable, high-performance coordination service for distributed applications – just what I was looking for. It provides a metadata warehouse that is indexed like a file system and its primitive operations allows programmers to implement any of the common coordination examples I mentioned above. It has client bindings for Java and C, which made it relatively easy for me to integrate into CRAQ’s source code which is written in Tame.

The only thing I wish ZooKeeper had was support for wide-area deployments. I’m hopeful that this might eventually get added (maybe even by me!) some day.

I highly recommend ZooKeeper for anyone who is considering building a distributed system. It simplifies one of the hardest parts of implementing a distributed application.

  • rp

    Interesting that you mentioned WAN, did you ran into any issues with WAN ?

  • rp

    Interesting that you mentioned WAN, did you ran into any issues with WAN ?

  • http://www.cs.princeton.edu/~jterrace/ Jeff Terrace

    Only that when deploying over a WAN, there is a lot of redundant cross-datacenter traffic when there are multiple ZooKeeper nodes in each datacenter. I also didn’t evaluate its performance over a WAN, but I’d imagine that latency becomes an issue.

  • http://www.cs.princeton.edu/~jterrace/ Jeff Terrace

    Only that when deploying over a WAN, there is a lot of redundant cross-datacenter traffic when there are multiple ZooKeeper nodes in each datacenter. I also didn’t evaluate its performance over a WAN, but I’d imagine that latency becomes an issue.

  • benjamin reed

    hey jeff! i’m glad you like zookeeper. i agree that zookeeper needs more WAN support; it’s high on our wish list. zookeeper currently assumes a data center network where all nodes have the same latency and bandwidth between them. (in our large data centers that isn’t really true, but it’s close enough.) this assumption causes us problems in two dimensions: 1) as you mention, consensus traffic has a star topology with the coordinator at the center. since WAN links have higher latency and lower bandwidth than the data center network, we should really build a tree from the coordinator and minimize the traffic over the WAN links. 2) clients treat all servers equally, so in WAN applications, clients usually are given a list of just the local zookeeper servers to connect to, but if those servers go down, it would be nice to connect to a server in another data center to keep things going.

    we are hoping to fix this weakness of zookeeper. (i would be super happy to commit a patch from you :)

  • benjamin reed

    hey jeff! i’m glad you like zookeeper. i agree that zookeeper needs more WAN support; it’s high on our wish list. zookeeper currently assumes a data center network where all nodes have the same latency and bandwidth between them. (in our large data centers that isn’t really true, but it’s close enough.) this assumption causes us problems in two dimensions: 1) as you mention, consensus traffic has a star topology with the coordinator at the center. since WAN links have higher latency and lower bandwidth than the data center network, we should really build a tree from the coordinator and minimize the traffic over the WAN links. 2) clients treat all servers equally, so in WAN applications, clients usually are given a list of just the local zookeeper servers to connect to, but if those servers go down, it would be nice to connect to a server in another data center to keep things going.

    we are hoping to fix this weakness of zookeeper. (i would be super happy to commit a patch from you :)

  • ajaydeep jain

    vinyl banner
    Thanks for sharing such a valuable information