Slide 1

Slide 1 text

ZooKeeper: Wait-free coordination for Internet-scale systems
Colin Jones • 8th Light • @trptcolin

Slide 2

Slide 2 text

ZooKeeper: Wait-free coordination for Internet-scale systems
Patrick Hunt and Mahadev Konar, Yahoo! Grid, {phunt,mahadev}@yahoo-inc.com
Flavio P. Junqueira and Benjamin Reed, Yahoo! Research, {fpj,breed}@yahoo-inc.com
Abstract
In this paper, we describe ZooKeeper, a service for coordinating processes of distributed applications. Since ZooKeeper is part of critical infrastructure, ZooKeeper aims to provide a simple and high performance kernel for building more complex coordination primitives at the client. It incorporates elements from group messaging, shared registers, and distributed lock services in a replicated, centralized service. The interface exposed by ZooKeeper has the wait-free aspects of shared registers with an event-driven mechanism similar to cache invalidations of distributed file systems to provide a simple, yet powerful coordination service.
The ZooKeeper interface enables a high-performance service implementation. In addition to the wait-free property, ZooKeeper provides a per client guarantee of FIFO execution of requests and linearizability for all requests that change the ZooKeeper state. These design decisions enable the implementation of a high performance processing pipeline with read requests being satisfied by local servers. We show for the target workloads, 2:1 to 100:1 read to write ratio, that ZooKeeper can handle tens to hundreds of thousands of transactions per second. This performance allows ZooKeeper to be used extensively by client applications.
1 Introduction
Large-scale distributed applications require different forms of coordination. Configuration is one of the most basic forms of coordination. In its simplest form, configuration is just a list of operational parameters for the system processes, whereas more sophisticated systems have dynamic configuration parameters. Group membership and leader election are also common in distributed systems: often processes need to know which other processes are alive and what those processes are in charge of. Locks constitute a powerful coordination primitive that implement mutually exclusive access to critical resources.
One approach to coordination is to develop services for each of the different coordination needs. For example, Amazon Simple Queue Service [3] focuses specifically on queuing. Other services have been developed specifically for leader election [25] and configuration [27]. Services that implement more powerful primitives can be used to implement less powerful ones. For example, Chubby [6] is a locking service with strong synchronization guarantees. Locks can then be used to implement leader election, group membership, etc.
When designing our coordination service, we moved away from implementing specific primitives on the server side, and instead we opted for exposing an API that enables application developers to implement their own primitives. Such a choice led to the implementation of a coordination kernel that enables new primitives without requiring changes to the service core. This approach enables multiple forms of coordination adapted to the requirements of applications, instead of constraining developers to a fixed set of primitives.
When designing the API of ZooKeeper, we moved away from blocking primitives, such as locks. Blocking primitives for a coordination service can cause, among other problems, slow or faulty clients to impact negatively the performance of faster clients. The implementation of the service itself becomes more complicated if processing requests depends on responses and failure detection of other clients. Our system, Zookeeper, hence implements an API that manipulates simple wait-free data objects organized hierarchically as in file systems. In fact, the ZooKeeper API resembles the one of any other file system, and looking at just the API signatures, ZooKeeper seems to be Chubby without the lock methods, open, and close. Implementing wait-free data objects, however, differentiates ZooKeeper significantly from systems based on blocking primitives such as locks.
Although the wait-free property is important for per-

Slide 3

Slide 3 text

What does it mean to love a paper?

Slide 4

Slide 4 text

What properties do we look for in a paper we’re considering loving?

Slide 5

Slide 5 text

Challenge / Reward

Slide 6

Slide 6 text

Index into other things I don’t know about yet

Slide 7

Slide 7 text

Practicality

Slide 8

Slide 8 text

I like-like this paper

Slide 9

Slide 9 text

ZooKeeper is perfect for everybody and everybody should use it

Slide 10

Slide 10 text

ZooKeeper: Wait-free coordination for Internet-scale systems (first page of the paper, repeated from Slide 2)

Slide 11

Slide 11 text

1. Abstract & Introduction
2. The ZooKeeper Service
3. ZooKeeper Applications
4. ZooKeeper Implementation
5. Evaluation
6. Related Work
7. Conclusions

Slide 12

Slide 12 text

What is ZooKeeper?

Slide 13

Slide 13 text

What are its goals?
coordination kernel (simple API)
avoid blocking
provide linearizable writes
be better than Chubby

Slide 14

Slide 14 text

What does that involve?
group messaging
shared registers
wait-free
linearizability
leader election
consensus
universal object
atomic broadcast

Slide 15

Slide 15 text

What does that involve?
group messaging
shared registers
wait-free
linearizability
leader election
consensus
universal object
atomic broadcast

Slide 16

Slide 16 text

Related

Slide 17

Slide 17 text

1. Abstract & Introduction
2. The ZooKeeper Service
3. ZooKeeper Applications
4. ZooKeeper Implementation
5. Evaluation
6. Related Work
7. Conclusions

Slide 18

Slide 18 text

Terminology
client
server
ensemble
znode
data tree
session

Slide 19

Slide 19 text

Service Overview
hierarchical name space
regular & ephemeral znodes
sequential flag
watches
sessions

Slide 20

Slide 20 text

Client API
create(path, data, flags)
delete(path, version)
exists(path, watch)
getData(path, watch)
setData(path, data, version)
getChildren(path, watch)
sync(path)
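To make the shape of these calls concrete, here is a minimal in-memory sketch in Python. It is a toy model, not the ZooKeeper client library: the class name `ToyZooKeeper` and the snake_case method names are made up for illustration, and flags, watches, and `sync` are left out. It does mirror the version check on `setData` and `delete`, where a version of -1 skips the check.

```python
class ToyZooKeeper:
    """Hypothetical in-memory stand-in for a znode tree (illustration only)."""

    def __init__(self):
        # path -> [data, version]; the root znode always exists
        self.nodes = {"/": [b"", 0]}

    def create(self, path, data):
        parent = path.rsplit("/", 1)[0] or "/"
        if parent not in self.nodes:
            raise KeyError("no parent znode: " + parent)
        if path in self.nodes:
            raise KeyError("znode already exists: " + path)
        self.nodes[path] = [data, 0]
        return path

    def exists(self, path):
        return path in self.nodes

    def get_data(self, path):
        data, version = self.nodes[path]
        return data, version

    def set_data(self, path, data, version):
        current = self.nodes[path][1]
        # A version of -1 means "do not check the version"
        if version not in (-1, current):
            raise ValueError("version mismatch on " + path)
        self.nodes[path] = [data, current + 1]
        return current + 1

    def get_children(self, path):
        prefix = path.rstrip("/") + "/"
        names = (p[len(prefix):] for p in self.nodes
                 if p != "/" and p.startswith(prefix))
        # Keep direct children only, not deeper descendants
        return sorted(n for n in names if "/" not in n)

    def delete(self, path, version):
        current = self.nodes[path][1]
        if version not in (-1, current):
            raise ValueError("version mismatch on " + path)
        if self.get_children(path):
            raise ValueError("znode has children: " + path)
        del self.nodes[path]
```

The conditional write is the interesting part: a client that read version 0 and calls `set_data(path, data, 0)` succeeds only if nobody else wrote in between, which is what lets clients build their own primitives without server-side locks.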

Slide 21

Slide 21 text

ZooKeeper Guarantees
[A-]Linearizable writes
FIFO client order

Slide 22

Slide 22 text

Distributed Configuration Problem
processes shouldn’t use the old configuration once the new one starts being written
processes shouldn’t see partial configurations

Slide 23

Slide 23 text

Distributed Configuration Problem
1. new leader deletes a /ready znode
2. new leader updates configuration
3. new leader re-creates the /ready znode
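The three steps can be sketched with a plain dictionary standing in for the znode tree. This is a single-threaded toy under stated assumptions: the function names and the `/config/` path prefix are invented for illustration, and the real protocol depends on ZooKeeper's ordering guarantees (plus a watch on `/ready`), which a dictionary does not provide. The leader is written as a generator so that a reader can be run between steps.

```python
def leader_update(tree, new_config):
    """Yield after each step so a reader can be interleaved (toy model)."""
    tree.pop("/ready", None)                 # 1. delete /ready
    yield "deleted /ready"
    for key, value in new_config.items():    # 2. update configuration znodes
        tree["/config/" + key] = value
        yield "wrote " + key
    tree["/ready"] = b""                     # 3. re-create /ready
    yield "created /ready"


def read_config(tree):
    """Return the configuration, or None while an update is in progress."""
    if "/ready" not in tree:
        return None
    prefix = "/config/"
    return {p[len(prefix):]: v for p, v in tree.items() if p.startswith(prefix)}
```

Stepping the generator one `next()` at a time and calling `read_config` in between shows that a reader either gets a complete configuration or nothing, never a mix of old and new values.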

Slide 24

Slide 24 text

Examples of primitives
configuration management
rendezvous
group membership
locks
locks without herd effect
read/write locks
double barrier

Slide 25

Slide 25 text

Configuration Management
a single znode with a watch
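A rough sketch of the "single znode with a watch" idea, in Python. The class `ToyWatchedZnode` and the helper `make_reader` are hypothetical names, and the callback style is an assumption made for illustration. The one property it tries to capture is that a watch fires once, so a client that wants to keep following the configuration re-registers its watch each time it re-reads.

```python
class ToyWatchedZnode:
    """Hypothetical single znode with one-shot watches (illustration only)."""

    def __init__(self, data):
        self.data = data
        self.watchers = []

    def get_data(self, watch=None):
        # Reading with a watch registers a one-time callback
        if watch is not None:
            self.watchers.append(watch)
        return self.data

    def set_data(self, data):
        self.data = data
        # Watches are one-shot: clear the list before notifying
        fired, self.watchers = self.watchers, []
        for callback in fired:
            callback()


def make_reader(znode, seen):
    """Build a client that re-reads and re-registers on every change."""
    def refresh():
        seen.append(znode.get_data(watch=refresh))
    return refresh
```

Calling the function returned by `make_reader` once performs the initial read; every later `set_data` then triggers exactly one re-read per registered client.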

Slide 26

Slide 26 text

Rendezvous
a single znode with a watch

Slide 27

Slide 27 text

Group Membership
ephemeral/sequential znodes as children of the group znode
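As a hedged sketch of why ephemeral znodes fit group membership: each member creates a child znode tied to its session, and the child disappears when the session ends, so listing the children gives the live members. The class below is a toy with invented names (`ToyGroup`, `join`, `session_ended`); it models session expiry as an explicit call rather than as timeout detection.

```python
class ToyGroup:
    """Hypothetical model of a group znode with ephemeral children."""

    def __init__(self, group_path="/group"):
        self.group_path = group_path
        self.members = {}  # child name -> (session id, data)

    def join(self, session, name, data=b""):
        # Stands in for creating an ephemeral child of the group znode
        self.members[name] = (session, data)
        return self.group_path + "/" + name

    def session_ended(self, session):
        # Ephemeral znodes are removed when their owning session ends
        gone = [n for n, (s, _) in self.members.items() if s == session]
        for name in gone:
            del self.members[name]

    def get_children(self):
        return sorted(self.members)
```

No member has to announce its departure: a crashed process simply stops keeping its session alive, and its znode goes away with it.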

Slide 28

Slide 28 text

Simple Locks
ephemeral znode with watch
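A minimal sketch of the simple lock, assuming a callback-style watch. The names `ToyLockService` and `make_client` are hypothetical. A client takes the lock by creating the lock znode; if it already exists, the client leaves a watch and retries when the znode is deleted. The sketch also shows the cost: every waiting client is woken on release, although only one of them can win.

```python
class ToyLockService:
    """Hypothetical single lock znode with one-shot delete watches."""

    def __init__(self):
        self.holder = None   # session that owns the lock znode, if any
        self.watchers = []   # callbacks to fire when the znode is deleted

    def try_acquire(self, session, on_release=None):
        if self.holder is None:
            self.holder = session      # "create" succeeded
            return True
        if on_release is not None:
            self.watchers.append(on_release)
        return False

    def release(self, session):
        if self.holder != session:
            raise ValueError("lock is not held by " + str(session))
        self.holder = None
        # Wake every waiter: this is the herd effect
        fired, self.watchers = self.watchers, []
        for callback in fired:
            callback()


def make_client(service, name, acquired_log):
    """Build a client that retries each time it is notified."""
    def attempt():
        if service.try_acquire(name, on_release=attempt):
            acquired_log.append(name)
    return attempt
```

In a real deployment the lock znode would be ephemeral, so a crashed holder releases the lock when its session ends; the toy leaves that out.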

Slide 29

Slide 29 text

Simple Locks without Herd Effect
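The slide gives only the title, so the following is a sketch of the recipe as the paper describes it: each client creates a sequential ephemeral znode under the lock, the lowest sequence number holds the lock, and every other client watches only the znode immediately ahead of it. The class and method names are invented for illustration, and plain integers stand in for the sequence suffixes.

```python
class ToySequentialLock:
    """Hypothetical queue of sequential znodes under one lock znode."""

    def __init__(self):
        self.next_seq = 0
        self.queue = {}  # sequence number -> client name

    def enqueue(self, client):
        # Stands in for creating a sequential, ephemeral child znode
        seq = self.next_seq
        self.next_seq += 1
        self.queue[seq] = client
        return seq

    def holder(self):
        # The client with the lowest sequence number holds the lock
        return self.queue[min(self.queue)] if self.queue else None

    def watch_target(self, seq):
        """Sequence number to watch, or None if seq holds the lock."""
        lower = [s for s in self.queue if s < seq]
        return max(lower) if lower else None

    def remove(self, seq):
        # Release, or session expiry of a holder or a waiter
        del self.queue[seq]
```

Because each client watches a different znode, removing one znode wakes at most one client, which is what avoids the herd effect of the simple lock.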

Slide 30

Slide 30 text

Read/Write Locks

Slide 31

Slide 31 text

Double Barrier
barrier znode with child znodes for processes
watches
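A very small sketch of the double barrier bookkeeping, assuming processes register by creating a child under the barrier znode and deregister by deleting it. The class name, the method names, and the choice to start once the child count reaches the threshold are assumptions for illustration; the watch-based notification that tells waiting processes when a condition becomes true is not modelled.

```python
class ToyDoubleBarrier:
    """Hypothetical barrier znode tracking one child per process."""

    def __init__(self, threshold):
        self.threshold = threshold
        self.children = set()

    def enter(self, process):
        # Stands in for creating a child znode on entry
        self.children.add(process)
        return self.can_start()

    def can_start(self):
        # Computation may begin once enough processes have registered
        return len(self.children) >= self.threshold

    def leave(self, process):
        # Stands in for deleting the child znode when finished
        self.children.discard(process)
        return self.can_exit()

    def can_exit(self):
        # Processes may leave once every child znode is gone
        return not self.children
```

The two conditions are what make the barrier "double": one gate synchronizes the start of a computation and the other synchronizes its end.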

Slide 32

Slide 32 text

Practical Sidebar on Primitives
Curator
http://curator.apache.org/curator-recipes/index.html

Slide 33

Slide 33 text

1. Abstract & Introduction
2. The ZooKeeper Service
3. ZooKeeper Applications
4. ZooKeeper Implementation
5. Evaluation
6. Related Work
7. Conclusions

Slide 34

Slide 34 text

1. Abstract & Introduction
2. The ZooKeeper Service
3. ZooKeeper Applications
4. ZooKeeper Implementation
5. Evaluation
6. Related Work
7. Conclusions

Slide 35

Slide 35 text

ZooKeeper Components

Slide 36

Slide 36 text

Request Processor
client requests -> idempotent transactions
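One way to picture the conversion, as a hedged Python sketch: a conditional write from a client depends on the current version, so it is not safe to apply twice, but the transaction derived from it records the resulting data and version outright and can be applied any number of times. The tuple layout and function names are made up for illustration; the transaction labels `setDataTXN` and `errorTXN` follow the paper's terminology.

```python
def to_transaction(state, path, data, expected_version):
    """Turn a conditional setData request into an idempotent transaction.

    state maps path -> (data, version). A version of -1 skips the check.
    """
    _, current = state[path]
    if expected_version not in (-1, current):
        return ("errorTXN", path)
    # Record the outcome (new data and new version), not the condition
    return ("setDataTXN", path, data, current + 1)


def apply_transaction(state, txn):
    """Apply a transaction; applying the same one again changes nothing."""
    if txn[0] == "setDataTXN":
        _, path, data, version = txn
        state[path] = (data, version)
    return state
```

Idempotence is what lets a recovering server replay part of the log without worrying about transactions it has already applied.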

Slide 37

Slide 37 text

Atomic Broadcast
ordering of state changes
constraints for new leadership

Slide 38

Slide 38 text

Replicated Database
fuzzy snapshots
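The fuzzy-snapshot idea can be illustrated with the example from the paper: a snapshot taken while writes are in flight may capture a mix of old and new znodes that never existed together, and replaying the idempotent transactions logged since the snapshot began brings it to the correct state. The Python below is a sketch; the function name and tuple layout are assumptions, while the paths, values, and versions follow the paper's example.

```python
def replay(snapshot, log):
    """Apply idempotent transactions on top of a (possibly fuzzy) snapshot."""
    state = dict(snapshot)
    for _, path, data, version in log:
        state[path] = (data, version)
    return state


# Transactions delivered while the snapshot was being taken
LOG = [
    ("setDataTXN", "/foo", "f2", 2),
    ("setDataTXN", "/goo", "g2", 2),
    ("setDataTXN", "/foo", "f3", 3),
]

# State before the snapshot started
START = {"/foo": ("f1", 1), "/goo": ("g1", 1)}

# A fuzzy snapshot: /foo was captured late, /goo was captured early
FUZZY = {"/foo": ("f3", 3), "/goo": ("g1", 1)}
```

Replaying `LOG` over either `START` or `FUZZY` gives the same final state, which is why the server does not need to lock the tree while snapshotting.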

Slide 39

Slide 39 text

Client-Server Interactions
watch notifications
read requests
session failures

Slide 40

Slide 40 text

1. Abstract & Introduction
2. The ZooKeeper Service
3. ZooKeeper Applications
4. ZooKeeper Implementation
5. Evaluation
6. Related Work
7. Conclusions

Slide 41

Slide 41 text

1. Abstract & Introduction
2. The ZooKeeper Service
3. ZooKeeper Applications
4. ZooKeeper Implementation
5. Evaluation
6. Related Work
7. Conclusions

Slide 42

Slide 42 text

Chubby
lock service
stronger consistency model
slower and lamer
http://static.googleusercontent.com/media/research.google.com/en/us/archive/chubby-osdi06.pdf

Slide 43

Slide 43 text

Paxos
replicated state machine
http://research.microsoft.com/en-us/um/people/lamport/pubs/paxos-simple.pdf
https://cwiki.apache.org/confluence/display/ZOOKEEPER/Zab+vs.+Paxos

Slide 44

Slide 44 text

1. Abstract & Introduction
2. The ZooKeeper Service
3. ZooKeeper Applications
4. ZooKeeper Implementation
5. Evaluation
6. Related Work
7. Conclusions

Slide 45

Slide 45 text

Learn more
http://zookeeper.apache.org/
http://curator.apache.org/
Colin Jones • 8th Light • @trptcolin