
Zookeeper : For managing distributed Zoo

Urvil
September 17, 2018

Transcript

  1. ZooKeeper: “Because Coordinating Distributed Systems Is Like a Zoo”

    Urvil Patel
    Twitter: @UrvilPatel12
    GitHub: urvil38
    Image source: https://www.google.com/url?sa=i&rct=j&q=&esrc=s&source=images&cd=&cad=rja&uact=8&ved=2ahUKEwiF2_-5kancAhWKfysKHVvLBkwQjRx6BAgBEAU&url=https%3A%2F%2Fwww.hiddenbrains.com%2Fzookeeper.html&psig=AOvVaw20uvgTf9fo8S9avMlpBAWn
  2. Roadmap

    • Set a context for distributed systems
    • Design goals of ZooKeeper
    • Data model of ZooKeeper
    • ZooKeeper API (CLI and Java)
    • Architecture of ZooKeeper
    • Applications of ZooKeeper
  10. What is Distributed Computing?

    • A field of computer science that studies distributed systems.
    • What is a distributed system?
      • A system whose components are located on different networked computers.
      • The components communicate by passing messages to one another.
      • They use HTTP, RPC, or message queues to pass those messages.
  11. Characteristics of distributed systems

    • Concurrency of components
    • Lack of a global clock
    • Independent failure of components

    Concurrency ≠ Parallelism: https://www.youtube.com/watch?v=cN_DpYBzKso
  12. Challenges in distributed systems

    • Failure handling
    • Managing concurrency (synchronization, race conditions, deadlocks)
    • Security
    • Scalability
    • Many more…
  13. Benefits of distributed systems

    • High availability
    • High performance
    • High throughput
    • High scalability
    • High reliability
  14. What Is ZooKeeper?

    A distributed coordination service for distributed applications.

    • Exposes a simple set of primitives
    • Very simple API, and therefore easy to program against
    • Uses a data model shaped like a directory tree
    • Used for
      • Synchronization
      • Configuration management
    • A coordination service designed to avoid the errors that coordination code is prone to:
      • Race conditions
      • Deadlocks
  15. Design Goals: 1. Simple

    • A shared hierarchical namespace that looks like a Unix file system
    • The namespace consists of data nodes called znodes (similar to files and directories)
    • Data is kept in memory
    • High throughput and low latency
    • High performance
    • Used in large distributed systems
    • Highly available
    • No single point of failure
    • Strictly ordered access
    • Provides synchronization primitives

    Example namespace:
    /
    /app1
    /app2
    /app1/v1
    /app1/v2
    /app1/v3
  16. Design Goals: 2. Replicated (HA)

    The servers
    • Know about each other
    • Keep state in memory
    • Write transaction logs and periodic snapshots to disk for persistence

    The clients
    • Keep a TCP connection to a server
    • Receive watch events
    • Send heartbeats
    • If the connection breaks, connect to a different server

    Image source: https://zookeeper.apache.org/doc/current/images/zkservice.jpg
  17. Design Goals: 3. Ordered

    • ZooKeeper stamps every update with a number.
    • The number
      • reflects the order of the transactions;
      • is used to implement higher-level abstractions, such as synchronization primitives.
  18. Design Goals: 4. Fast

    • Performs best where reads are more common than writes, at ratios of around 10:1.
    • At Yahoo, where ZooKeeper was created, the team benchmarked read and write performance on a ZooKeeper cluster.

    Image source: https://zookeeper.apache.org/doc/r3.4.13/images/zkperfRW-3.2.jpg
  19. Design Goals: 5. Reliability

    Events shown in the reliability graph:
    1. Failure and recovery of a follower
    2. Failure and recovery of a different follower
    3. Failure of a leader
    4. Failure and recovery of two followers
    5. Failure of another leader

    In the Yahoo team's observations, ZooKeeper takes less than 200 ms to elect a new leader.

    Image source: https://zookeeper.apache.org/doc/r3.4.13/images/zkperfreliability.jpg
  20. Basic commands for debugging a ZooKeeper server

    Command   Description
    ruok      Checks whether the server is running in a non-error state
    conf      Prints details about the current configuration (from zoo.cfg)
    envi      Prints details about the current environment
    srvr      Server statistics, znode count, and mode (standalone, leader, or follower)
    stat      Server statistics and connected clients
    srst      Resets server statistics
    isro      Reports whether the server is in read-only mode
    …

    More information about other commands: https://zookeeper.apache.org/doc/r3.3.3/zookeeperAdmin.html

    • You can issue these commands with telnet or nc on the client port (2181).
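
As a rough sketch of what telnet or nc does here, the function below opens a TCP connection, sends the four-letter command, and reads until the server closes the connection. The function name `four_letter_word` and its parameters are made up for this illustration, and it assumes a server that answers these commands on its client port.

```python
import socket


def four_letter_word(host, port, command, timeout=2.0):
    """Send a four-letter command (for example "ruok") and return the reply text."""
    if len(command) != 4:
        raise ValueError("expected a four-letter command such as 'ruok'")
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(command.encode("ascii"))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # the server closes the connection once it has replied
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")
```

Against a local server this could be called as `four_letter_word("127.0.0.1", 2181, "ruok")`; a healthy server answers `imok`.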
  21. Data Model

    • Very similar to a file system, but it is not one.
    • Znode
      - Can have child znodes
      - Always has metadata (plus data)
      - Has ACLs (Access Control Lists)
    • No append operation: there are no partial updates, only a set that replaces the data.
    • Data access (read/write) is atomic: it either succeeds or fails with an error.

    Example namespace:
    /
    /app1
    /app2
    /app1/v1
    /app1/v2
    /app1/v3
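
The rules of this data model can be sketched with a toy in-memory tree. This is not the ZooKeeper client or server; the class `ZNodeTree` and its method names are invented for illustration, and it only models paths, whole-value writes, and the parent/child constraints.

```python
class ZNodeTree:
    """Toy in-memory namespace that mimics the znode rules described above."""

    def __init__(self):
        self.data = {"/": b""}  # full path -> bytes stored at that znode

    @staticmethod
    def parent(path):
        head = path.rsplit("/", 1)[0]
        return head or "/"

    def children(self, path):
        if path not in self.data:
            raise KeyError(path)
        # Return child names (not full paths), sorted for stable output.
        return sorted(
            p.rsplit("/", 1)[1]
            for p in self.data
            if p != "/" and self.parent(p) == path
        )

    def create(self, path, data=b""):
        if path in self.data:
            raise FileExistsError(path)
        if self.parent(path) not in self.data:
            raise KeyError("parent znode must exist: " + self.parent(path))
        self.data[path] = data

    def set(self, path, data):
        if path not in self.data:
            raise KeyError(path)
        self.data[path] = data  # the whole value is replaced; there is no append

    def get(self, path):
        return self.data[path]

    def delete(self, path):
        if self.children(path):
            raise ValueError("znode has children: " + path)
        del self.data[path]
```

Creating `/app1`, then `/app1/v1` and `/app1/v2`, reproduces part of the example namespace; creating `/app3/v1` without `/app3` fails, as does deleting `/app1` while it still has children.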
  22. Data Model - Znode

    • Ephemeral znode
      • Exists as long as the session that created it is active.
      • Deleted by ZooKeeper when the session ends or times out.
      • Cannot have any children, not even ephemeral ones.
      • Tied to the client session, but visible to everyone.
    • Persistent znode
      • Remains there until it is explicitly deleted.
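
A minimal sketch of the difference between the two znode types, using a toy store rather than ZooKeeper itself. The class `EphemeralStore` and the integer session ids are assumptions made for the example; parent-existence checks are left out to keep it short.

```python
class EphemeralStore:
    """Toy model: each path remembers the session that owns it (None = persistent)."""

    def __init__(self):
        self.owner = {}  # path -> owning session id, or None for persistent znodes

    def create(self, path, session=None):
        parent = path.rsplit("/", 1)[0]
        if self.owner.get(parent) is not None:
            raise ValueError("ephemeral znodes cannot have children")
        self.owner[path] = session

    def close_session(self, session):
        # Ending a session removes every ephemeral znode that session created.
        for path in [p for p, s in self.owner.items() if s == session]:
            del self.owner[path]

    def paths(self):
        return sorted(self.owner)
```

With `/servers` persistent and `/servers/node1`, `/servers/node2` owned by sessions 1 and 2, closing session 2 leaves only `/servers` and `/servers/node1`.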
  23. Sequential znode

    • Creates a znode with a sequence number in its name.
    • The number is appended automatically.
    • The counter increases monotonically.
    • Each parent znode keeps its own counter.

    create -s /hello "world"
    Created /hello0000000001
    create -s /hello "zookeeper"
    Created /hello0000000002
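
The naming scheme can be sketched as a per-parent counter whose value is appended as a zero-padded, ten-digit suffix. `SequentialNamer` is a hypothetical helper, not part of any ZooKeeper client, and it starts every counter at 0; on a real server the first number depends on what has already happened under the parent.

```python
from collections import defaultdict


class SequentialNamer:
    """Toy per-parent counter that builds sequential znode names."""

    def __init__(self):
        self.counters = defaultdict(int)  # parent path -> next sequence number

    def create_sequential(self, path):
        parent = path.rsplit("/", 1)[0] or "/"
        seq = self.counters[parent]
        self.counters[parent] += 1
        # Ten digits, zero padded, appended directly to the requested name.
        return f"{path}{seq:010d}"
```

Two calls with `/hello` produce names that differ only in the suffix, while `/app/lock-` draws from the separate counter kept for `/app`.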
  24. Watches

    • A client is notified when a znode changes in some way.
    • A watch is triggered only once.
    • To receive further notifications, you need to re-register the watch.

    Event type   How to set the watch
    Created      exists()
    Deleted      exists(), getData(), or getChildren()
    Changed      exists() or getData()
    Child        getChildren()

    Image source: https://www.google.com/url?sa=i&rct=j&q=&esrc=s&source=images&cd=&cad=rja&uact=8&ved=2ahUKEwiF2_-5kancAhWKfysKHVvLBkwQjRx6BAgBEAU&url=https%3A%2F%2Fwww.hiddenbrains.com%2Fzookeeper.html&psig=AOvVaw20uvgTf9fo8S9avMlpBAWn
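
The one-shot behaviour is easy to miss, so here is a small sketch of it on a toy znode. `WatchedNode` and its methods are invented for the example; only the event name `NodeDataChanged` is borrowed from ZooKeeper.

```python
class WatchedNode:
    """Toy znode whose data watches fire once and are then forgotten."""

    def __init__(self, data=b""):
        self.data = data
        self.watchers = []

    def get_data(self, watcher=None):
        # Reading with a watcher registers a one-shot data watch.
        if watcher is not None:
            self.watchers.append(watcher)
        return self.data

    def set_data(self, data):
        self.data = data
        # Clear the list before firing, so each watch triggers at most once.
        fired, self.watchers = self.watchers, []
        for watcher in fired:
            watcher("NodeDataChanged")
```

After one `get_data(watcher)` call, two successive `set_data` calls deliver a single event; the second change goes unnoticed unless the watcher reads again and re-registers.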
  25. ZooKeeper APIs

    Java API         CLI command      Description
    create           create           Creates a znode (the parent znode must exist)
    delete           delete           Deletes a znode (it must not have children; rmr deletes recursively)
    exists           stat             Tests whether a znode exists and gets its metadata
    getAcl, setAcl   getAcl, setAcl   Gets/sets the ACL for a znode
    getChildren      ls               Gets a list of the children of a znode
    getData, setData get, set         Gets/sets the data associated with a znode
    sync             sync             Synchronizes a client's view of a znode with ZooKeeper
  26. ZooKeeper APIs

    • Two core APIs: Java and C
    • Other languages (created by the community): Perl, Python, REST, C++, Go
    • Each API comes in two variants
      • Sync
      • Async

    Sync:  byte[] getData(String path, boolean watch, Stat stat)
    Async: void getData(String path, boolean watch, DataCallback cb, Object context)
  27. ACLs - Access Control Lists

    • Determine who can perform which operations on a znode.
    • Work like sets of permissions.
    • Managed with the getAcl() and setAcl() APIs.

    List of permissions:

    Permission   Allows
    CREATE       Creating a child znode
    READ         Getting data from a znode and listing its children
    WRITE        Setting data for a znode
    DELETE       Deleting a child znode
    ADMIN        Setting permissions
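
Permissions combine like bit flags, which the sketch below models with an `IntFlag`. The numeric values follow ZooKeeper's permission bits and the letters follow the CLI's `cdrwa` shorthand; the names `Perm`, `parse_perms`, and `allowed` are made up for this example.

```python
from enum import IntFlag


class Perm(IntFlag):
    READ = 1
    WRITE = 2
    CREATE = 4
    DELETE = 8
    ADMIN = 16


# One letter per permission, as in the CLI shorthand "cdrwa".
LETTERS = {
    "r": Perm.READ,
    "w": Perm.WRITE,
    "c": Perm.CREATE,
    "d": Perm.DELETE,
    "a": Perm.ADMIN,
}


def parse_perms(letters):
    """Turn a string such as "rw" into a combined permission set."""
    perms = Perm(0)
    for ch in letters:
        perms |= LETTERS[ch]
    return perms


def allowed(granted, needed):
    """True when every bit in `needed` is present in `granted`."""
    return (granted & needed) == needed
```

A client granted `parse_perms("r")` can read a znode and list its children but fails the `allowed(..., Perm.WRITE)` check.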
  28. Architecture

    Runs in two modes:
    • Standalone
      • Single server
      • Used for testing
      • No HA
    • Replicated
      • Runs on a collection of machines
      • The recommended number of servers is 2n + 1 (an odd number), which tolerates n failures
      • HA
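
The arithmetic behind the "2n + 1" recommendation can be checked in a few lines. This is a sketch of majority-quorum math only; the function names are invented here.

```python
def quorum_size(servers):
    """Smallest number of servers that forms a strict majority."""
    return servers // 2 + 1


def tolerated_failures(servers):
    """How many servers can fail while a majority is still reachable."""
    return servers - quorum_size(servers)
```

Three servers tolerate one failure and five tolerate two, but four servers still tolerate only one, so an even-sized ensemble adds a machine without adding fault tolerance.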
  29. Architecture: Phase I, leader election

    • The machines elect a distinguished member, the leader.
    • The others are called followers.
    • This phase finishes once a majority of followers have synchronized their state with the leader.
    • If the leader fails, the remaining machines hold a new election, which takes about 200 ms.
  30. Architecture: Phase II, atomic broadcast

    • All write requests are forwarded to the leader.
    • The leader broadcasts the update to the followers.
    • When a majority have persisted the change:
      • The leader commits the update.
      • The client gets a success response.
    • The protocol for achieving consensus is atomic: an update either succeeds or fails.
    • Machines write updates to disk before applying them in memory.
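
The commit rule above can be sketched with a toy ensemble. This is a deliberately simplified model, not the real broadcast protocol: the `Server` class, the `write` function, and the `up` flag are assumptions for illustration, a list stands in for the on-disk log, and a dict stands in for the in-memory state.

```python
class Server:
    def __init__(self, name, up=True):
        self.name = name
        self.up = up
        self.log = []    # stands in for the on-disk transaction log
        self.state = {}  # stands in for the in-memory data

    def persist(self, zxid, key, value):
        """Append the proposal to the log; an unreachable server cannot ack."""
        if not self.up:
            return False
        self.log.append((zxid, key, value))
        return True

    def apply(self, key, value):
        self.state[key] = value


def write(leader, followers, zxid, key, value):
    """Commit only if a majority of the ensemble persisted the proposal."""
    ensemble = [leader] + followers
    acked = [s for s in ensemble if s.persist(zxid, key, value)]
    if len(acked) < len(ensemble) // 2 + 1:
        return False  # no majority: nothing is applied in memory
    for server in acked:
        server.apply(key, value)  # disk first (above), memory second
    return True
```

With three servers and one follower down, the write still commits because two of three persisted it; with both followers down, the leader alone is not a majority and the write is rejected.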
  31. Sessions

    • A client has a list of the servers in the cluster.
    • The server creates a new session for the client.
    • A session has a timeout period, chosen by the caller.
    • If the server has not received a request within the timeout period, it may expire the session.
    • When a session expires, its ephemeral znodes are lost.
    • To keep a session alive, the client sends pings (heartbeats).
    • The client library takes care of sending heartbeats.
    • Failover is handled automatically by the client.
    • Sessions remain valid when the client switches to another server.
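
The timeout rule can be sketched with a small tracker driven by explicit timestamps. `SessionTracker`, `touch`, and `expire` are hypothetical names; real servers track sessions differently, and this only shows how requests and heartbeats push the deadline forward.

```python
class SessionTracker:
    """Toy session table: a session expires if it stays silent past the timeout."""

    def __init__(self, timeout):
        self.timeout = timeout
        self.last_seen = {}  # session id -> time of the last request or ping

    def touch(self, session_id, now):
        """Record a request or heartbeat from the session."""
        self.last_seen[session_id] = now

    def expire(self, now):
        """Drop and return the sessions that have been silent for too long."""
        dead = [s for s, t in self.last_seen.items() if now - t > self.timeout]
        for session_id in dead:
            del self.last_seen[session_id]
        return sorted(dead)
```

With a timeout of 10, a session that pinged at time 8 survives a check at time 12, while one last heard from at time 0 is expired.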
  32. Uses of ZooKeeper in real-world applications

    Features of ZooKeeper used by other distributed systems:
    • Configuration management
    • Leader election
    • Load balancing
    • Node management
    • Locking
    • Synchronization

    Used in:
    • Hadoop
    • HBase
    • Solr
  33. Diagram: node1 (192.168.0.100), node2 (192.168.0.101), and node3 (192.168.0.102) next to ZooKeeper, whose tree holds / and /servers. A watch event is registered on /servers.
  34. Diagram: the ephemeral znode /servers/node1 is added.
  35. Diagram: the ephemeral znode /servers/node2 is added.
  36. Diagram: the ephemeral znode /servers/node3 is added.
  37. Diagram: /servers now holds the ephemeral znodes /servers/node1, /servers/node2, and /servers/node3.
  38. Diagram: node2 (192.168.0.101) is shown apart from node1 and node3.
  39. Diagram: node2 is gone, and /servers/node2 is removed from /servers.
  40. Diagram: the watch triggers and the client is told: “Hey, changes in /servers”.
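
The membership pattern in these diagrams can be sketched as a client that re-reads the children of /servers, and re-registers its watch, every time the watch fires. `Servers` and `Client` are toy classes invented for the example; `leave` stands in for ZooKeeper removing a node's ephemeral znode.

```python
class Servers:
    """Toy stand-in for the /servers znode and its ephemeral children."""

    def __init__(self):
        self.members = {}   # child name -> address
        self.watchers = []  # one-shot child watches

    def get_children(self, watcher=None):
        if watcher is not None:
            self.watchers.append(watcher)
        return sorted(self.members)

    def _fire(self):
        fired, self.watchers = self.watchers, []
        for watcher in fired:
            watcher()

    def join(self, name, address):
        self.members[name] = address
        self._fire()

    def leave(self, name):
        del self.members[name]
        self._fire()


class Client:
    """Keeps a local view of the members by re-reading on every notification."""

    def __init__(self, servers):
        self.servers = servers
        self.known = set()
        self.log = []
        self.refresh()

    def refresh(self):
        # Passing self.refresh re-registers the watch for the next change.
        current = set(self.servers.get_children(watcher=self.refresh))
        for name in sorted(current - self.known):
            self.log.append(("joined", name))
        for name in sorted(self.known - current):
            self.log.append(("left", name))
        self.known = current
```

Because the notification only says that something changed, the client compares the new child list with its previous view to work out which server joined or left.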
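  41. /queue: pop()

    Diagram: the children of /queue carry the sequence numbers 2, 1, 4, 3, 5, 6. pop() calls orderedChildren(), takes the child with the minimum sequence number (here 1), and removes it.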
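
A minimal sketch of that pop step, with a plain dict standing in for the children of /queue. The function `pop` and the `item-` name prefix are assumptions for the example; it relies on each name ending in a ten-digit sequence number, as sequential znodes do.

```python
def pop(children):
    """Remove and return the (name, payload) pair with the lowest sequence number.

    `children` maps child names ending in a ten-digit sequence number to payloads.
    Returns None when the queue is empty.
    """
    if not children:
        return None
    head = min(children, key=lambda name: int(name[-10:]))
    return head, children.pop(head)
```

A real client would also have to handle another consumer deleting the same child first; that race is outside this sketch.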
  42. Alternative solutions to ZooKeeper

    • HashiCorp Consul: https://github.com/hashicorp/consul
    • CoreOS etcd: https://github.com/coreos/etcd
    • Netflix Eureka: https://github.com/Netflix/eureka

    Image source: https://www.google.com/url?sa=i&rct=j&q=&esrc=s&source=images&cd=&cad=rja&uact=8&ved=2ahUKEwiF2_-5kancAhWKfysKHVvLBkwQjRx6BAgBEAU&url=https%3A%2F%2Fwww.hiddenbrains.com%2Fzookeeper.html&psig=AOvVaw20uvgTf9fo8S9avMlpBAWn