ZooKeeper: Wait-free coordination for Internet-scale systems

Colin Jones
January 19, 2015

A PapersWeLove Chicago presentation

Transcript

  1. ZooKeeper: Wait-free coordination for Internet-scale systems

    Colin Jones • 8th Light • @trptcolin
  2. ZooKeeper: Wait-free coordination for Internet-scale systems

    Patrick Hunt and Mahadev Konar, Yahoo! Grid, {phunt,mahadev}@yahoo-inc.com
    Flavio P. Junqueira and Benjamin Reed, Yahoo! Research, {fpj,breed}@yahoo-inc.com

    Abstract

    In this paper, we describe ZooKeeper, a service for coordinating processes of distributed applications. Since ZooKeeper is part of critical infrastructure, ZooKeeper aims to provide a simple and high performance kernel for building more complex coordination primitives at the client. It incorporates elements from group messaging, shared registers, and distributed lock services in a replicated, centralized service. The interface exposed by ZooKeeper has the wait-free aspects of shared registers with an event-driven mechanism similar to cache invalidations of distributed file systems to provide a simple, yet powerful coordination service.

    The ZooKeeper interface enables a high-performance service implementation. In addition to the wait-free property, ZooKeeper provides a per client guarantee of FIFO execution of requests and linearizability for all requests that change the ZooKeeper state. These design decisions enable the implementation of a high performance processing pipeline with read requests being satisfied by local servers. We show for the target workloads, 2:1 to 100:1 read to write ratio, that ZooKeeper can handle tens to hundreds of thousands of transactions per second. This performance allows ZooKeeper to be used extensively by client applications.

    1 Introduction

    Large-scale distributed applications require different forms of coordination. Configuration is one of the most basic forms of coordination. In its simplest form, configuration is just a list of operational parameters for the system processes, whereas more sophisticated systems have dynamic configuration parameters. Group membership and leader election are also common in distributed systems: often processes need to know which other processes are alive and what those processes are in charge of. Locks constitute a powerful coordination primitive that implement mutually exclusive access to critical resources.

    One approach to coordination is to develop services for each of the different coordination needs. For example, Amazon Simple Queue Service [3] focuses specifically on queuing. Other services have been developed specifically for leader election [25] and configuration [27]. Services that implement more powerful primitives can be used to implement less powerful ones. For example, Chubby [6] is a locking service with strong synchronization guarantees. Locks can then be used to implement leader election, group membership, etc.

    When designing our coordination service, we moved away from implementing specific primitives on the server side, and instead we opted for exposing an API that enables application developers to implement their own primitives. Such a choice led to the implementation of a coordination kernel that enables new primitives without requiring changes to the service core. This approach enables multiple forms of coordination adapted to the requirements of applications, instead of constraining developers to a fixed set of primitives.

    When designing the API of ZooKeeper, we moved away from blocking primitives, such as locks. Blocking primitives for a coordination service can cause, among other problems, slow or faulty clients to impact negatively the performance of faster clients. The implementation of the service itself becomes more complicated if processing requests depends on responses and failure detection of other clients. Our system, Zookeeper, hence implements an API that manipulates simple wait-free data objects organized hierarchically as in file systems. In fact, the ZooKeeper API resembles the one of any other file system, and looking at just the API signatures, ZooKeeper seems to be Chubby without the lock methods, open, and close. Implementing wait-free data objects, however, differentiates ZooKeeper significantly from systems based on blocking primitives such as locks.

    Although the wait-free property is important for per-
  3. What does it mean to love a paper?

  4. What properties do we look for in a paper we’re considering loving?

  5. Challenge / Reward

  6. Index into other things I don’t know about yet

  7. Practicality

  8. I like-like this paper

  9. ZooKeeper is perfect for everybody and everybody should use it

  10. ZooKeeper: Wait-free coordination for Internet-scale systems (first page of the paper, as on slide 2)

  11. Paper outline

    1. Abstract & Introduction
    2. The ZooKeeper Service
    3. ZooKeeper Applications
    4. ZooKeeper Implementation
    5. Evaluation
    6. Related Work
    7. Conclusions
  12. What is ZooKeeper?

  13. What are its goals?

    coordination kernel (simple API)
    avoid blocking
    provide linearizable writes
    be better than Chubby
  14. What does that involve?

    group messaging, shared registers, wait-free, linearizability, leader election, consensus, universal object, atomic broadcast
  15. What does that involve? (same list as slide 14)

  16. Related

  17. Paper outline (same seven sections as slide 11)

  18. Terminology

    client, server, ensemble, znode, data tree, session

  19. Service Overview

    hierarchical name space
    regular & ephemeral znodes
    sequential flag
    watches
    sessions
  20. Client API

    create(path, data, flags)
    delete(path, version)
    exists(path, watch)
    getData(path, watch)
    setData(path, data, version)
    getChildren(path, watch)
    sync(path)
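
To make the role of the `version` argument on this slide concrete, here is a minimal sketch of a conditional update. `ToyZooKeeper` and `BadVersion` are hypothetical names for an in-memory stand-in, not the real client library; flags, watches, `getChildren`, and `sync` are left out. The one behaviour taken from the paper is that a write fails when the expected version does not match, and that version -1 skips the check.

```python
class BadVersion(Exception):
    """Raised when the caller's expected version does not match the znode's."""


class ToyZooKeeper:
    """In-memory stand-in for a data tree (hypothetical; no network, no replication)."""

    def __init__(self):
        # path -> (data, version); the root always exists.
        self.znodes = {"/": (b"", 0)}

    def create(self, path, data):
        parent = path.rsplit("/", 1)[0] or "/"
        if parent not in self.znodes:
            raise KeyError("parent does not exist: " + parent)
        if path in self.znodes:
            raise KeyError("znode already exists: " + path)
        self.znodes[path] = (data, 0)
        return path

    def exists(self, path):
        return path in self.znodes

    def get_data(self, path):
        return self.znodes[path]  # (data, version)

    def _check(self, path, version):
        current = self.znodes[path][1]
        # -1 means "do not check the version".
        if version != -1 and version != current:
            raise BadVersion(path)
        return current

    def set_data(self, path, data, version):
        current = self._check(path, version)
        self.znodes[path] = (data, current + 1)

    def delete(self, path, version):
        self._check(path, version)
        del self.znodes[path]
```

A client that read version 0 can write once with that version; a second write with the same stale version raises `BadVersion`, which is how two clients avoid silently overwriting each other.
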
  21. ZooKeeper Guarantees

    [A-]Linearizable writes
    FIFO client order

  22. Distributed Configuration Problem

    processes shouldn’t use the old configuration once the new one starts being written
    processes shouldn’t see partial configurations
  23. Distributed Configuration Problem

    1. new leader deletes a /ready znode
    2. new leader updates configuration
    3. new leader re-creates the /ready znode
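
The three steps above can be sketched with a plain dictionary standing in for the znode tree. This is only an illustration: `publish_config`, `read_config`, and the `/config/` prefix are assumptions made for the example, and the toy is single-threaded, so it shows the protocol's shape rather than the ordering guarantees that make it safe in a real ensemble.

```python
def publish_config(tree, new_config):
    """Leader side: the three steps from the slide, yielding after each write."""
    tree.pop("/ready", None)                 # 1. delete /ready
    yield "deleted /ready"
    for name, value in new_config.items():   # 2. update configuration znodes
        tree["/config/" + name] = value
        yield "wrote " + name
    tree["/ready"] = b""                     # 3. re-create /ready
    yield "created /ready"


def read_config(tree):
    """Worker side: only trust the configuration while /ready exists."""
    if "/ready" not in tree:
        return None  # an update is in progress; the caller should retry later
    prefix = "/config/"
    return {p[len(prefix):]: v for p, v in tree.items() if p.startswith(prefix)}
```

Stepping through the generator and calling `read_config` after every step returns `None` until `/ready` is back, so a worker never acts on a half-written configuration.
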
  24. Examples of primitives

    configuration management
    rendezvous
    group membership
    locks
    locks without herd effect
    read/write locks
    double barrier
  25. Configuration Management

    a single znode with a watch
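
A rough sketch of "a single znode with a watch" follows. `WatchedZnode` and `Worker` are hypothetical in-memory classes, not the real client. The sketch assumes the two properties the paper gives for watches: they fire once, and the notification only says that something changed, so the client reads again and sets a new watch.

```python
class WatchedZnode:
    """One znode holding the configuration (in-memory toy)."""

    def __init__(self, data):
        self.data = data
        self.watchers = []  # callbacks waiting for the next change

    def get_data(self, watch=None):
        if watch is not None:
            self.watchers.append(watch)
        return self.data

    def set_data(self, data):
        self.data = data
        # Watches are one-shot: clear the list before firing the callbacks,
        # so a callback that re-registers lands in the fresh list.
        pending, self.watchers = self.watchers, []
        for callback in pending:
            callback()


class Worker:
    """Reads the configuration and re-reads it whenever the watch fires."""

    def __init__(self, znode):
        self.znode = znode
        self.reloads = 0
        self.config = None
        self.load()

    def load(self):
        # Read with the watch flag set, so the next change is signalled.
        self.config = self.znode.get_data(watch=self.on_change)

    def on_change(self):
        # The notification carries no data; read again and re-arm the watch.
        self.reloads += 1
        self.load()
```

Because `load` always passes the callback again, the worker keeps following changes instead of seeing only the first one.
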

  26. Rendezvous

    a single znode with a watch

  27. Group Membership

    ephemeral/sequential znodes as children of the group znode
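
As an illustration of the slide above, this toy models a group znode whose children are tied to client sessions. `ToyGroup`, the `member-` prefix, and the ten-digit counter format are assumptions for the example. The idea it borrows from the paper is that an ephemeral child disappears when its session ends, and that the sequential flag supplies a unique name when a process has none of its own.

```python
class ToyGroup:
    """Children of one group znode, each owned by a session (in-memory toy)."""

    def __init__(self):
        self.members = {}  # child name -> owning session id
        self.counter = 0   # source of sequential suffixes

    def join(self, session_id, name=None):
        if name is None:
            # Sequential flag: let the service pick a unique, ordered name.
            name = "member-%010d" % self.counter
        self.counter += 1
        self.members[name] = session_id
        return name

    def close_session(self, session_id):
        # Ephemeral znodes are removed when the session that made them ends.
        self.members = {n: s for n, s in self.members.items() if s != session_id}

    def get_children(self):
        return sorted(self.members)
```

Listing the children of the group znode is then the membership query: a crashed process drops out as soon as its session is closed, with no explicit "leave" call.
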

  28. Simple Locks

    ephemeral znode with watch

  29. Simple Locks without Herd Effect
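
The slide gives only the title, so here is a small sketch of the idea as the paper describes it: each client creates an ephemeral sequential znode under the lock znode, the lowest one holds the lock, and every other client watches only the znode just before its own. The helper names and the `lock-` prefix are assumptions for illustration, and the functions work on plain lists of child names rather than a live ensemble.

```python
def znode_to_watch(children, mine):
    """Return None if `mine` holds the lock, else the single znode to watch."""
    ordered = sorted(children)  # zero-padded suffixes sort in creation order
    position = ordered.index(mine)
    if position == 0:
        return None
    return ordered[position - 1]


def watchers_of(children, deleted):
    """List the waiters whose watch fires when `deleted` goes away."""
    return [c for c in children if znode_to_watch(children, c) == deleted]
```

With three waiters, releasing the lock wakes exactly one client (the next in line) instead of all of them, which is the herd effect being avoided.
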

  30. Read/Write Locks

  31. Double Barrier

    barrier znode with child znodes for processes
    watches

  32. Practical Sidebar on Primitives

    Curator: http://curator.apache.org/curator-recipes/index.html

  33. Paper outline (same seven sections as slide 11)

  34. Paper outline (same seven sections as slide 11)

  35. ZooKeeper Components

  36. Request Processor

    client requests -> idempotent transactions
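
One way to picture "client requests -> idempotent transactions": the conditional check happens once, when the request is turned into a transaction, and the transaction records the resulting state rather than the condition. The sketch below assumes a dictionary of `path -> (data, version)`. The tuple layout and function names are made up for the example; the `setDataTXN` and `errorTXN` labels follow the paper's wording, and timestamps are omitted.

```python
def to_txn(state, path, data, expected_version):
    """Turn a conditional setData request into a transaction."""
    if path not in state:
        return ("errorTXN", path)
    current = state[path][1]
    if expected_version != -1 and expected_version != current:
        return ("errorTXN", path)
    # Record the outcome (new data and new version), not the condition.
    return ("setDataTXN", path, data, current + 1)


def apply_txn(state, txn):
    """Apply a transaction; applying the same one again changes nothing."""
    if txn[0] == "setDataTXN":
        _, path, data, new_version = txn
        state[path] = (data, new_version)
    return state
```

Since the transaction carries the final version number, replaying it during recovery leaves the state unchanged, whereas replaying the original request would fail its version check or bump the version a second time.
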

  37. Atomic Broadcast

    ordering of state changes
    constraints for new leadership

  38. Replicated Database

    fuzzy snapshots

  39. Client-Server Interactions

    watch notifications
    read requests
    session failures

  40. Paper outline (same seven sections as slide 11)

  41. Paper outline (same seven sections as slide 11)

  42. Chubby

    lock service
    stronger consistency model
    slower and lamer
    http://static.googleusercontent.com/media/research.google.com/en/us/archive/chubby-osdi06.pdf
  43. Paxos

    replicated state machine
    http://research.microsoft.com/en-us/um/people/lamport/pubs/paxos-simple.pdf
    https://cwiki.apache.org/confluence/display/ZOOKEEPER/Zab+vs.+Paxos

  44. Paper outline (same seven sections as slide 11)

  45. Learn more

    http://zookeeper.apache.org/
    http://curator.apache.org/
    Colin Jones • 8th Light • @trptcolin