
Zookeeper : For managing distributed Zoo

Urvil
September 17, 2018


Transcript

  1. ZooKeeper: “Because Coordinating Distributed Systems Is Like a Zoo”

    Urvil Patel
    Twitter: @UrvilPatel12
    GitHub: urvil38
    image source: https://www.google.com/url?sa=i&rct=j&q=&esrc=s&source=images&cd=&cad=rja&uact=8&ved=2ahUKEwiF2_-5kancAhWKfysKHVvLBkwQjRx6BAgBEAU&url=https%3A%2F%2Fwww.hiddenbrains.com%2Fzookeeper.html&psig=AOvVaw20uvgTf9fo8S9avMlpBAWn
  2. ROADMAP

    • Set a context for distributed systems
    • Design goals of ZooKeeper
    • Data model of ZooKeeper
    • ZooKeeper API (CLI and Java)
    • Architecture of ZooKeeper
    • Applications of ZooKeeper
  10. What is Distributed Computing?

    • The field of computer science that studies distributed systems.
    • What is a distributed system?
      • A system whose components are located on different networked computers.
      • The components communicate by passing messages to one another.
      • They use HTTP, RPC, or message queues to pass those messages.
  11. Characteristics of distributed systems

    • Concurrency of components
    • Lack of a global clock
    • Independent failure of components
    Concurrency ≠ Parallelism: https://www.youtube.com/watch?v=cN_DpYBzKso
  12. Challenges in distributed systems

    • Failure handling
    • Managing concurrency (synchronization, race conditions, deadlocks)
    • Security
    • Scalability
    • Many more…
  13. Benefits of distributed systems

    • High availability
    • High performance
    • High throughput
    • High scalability
    • High reliability
  14. What Is ZooKeeper?

    A Distributed Coordination Service for Distributed Applications
    • Exposes a simple set of primitives
    • Very simple API, therefore easy to program to
    • Uses a data model like a directory tree
    • Used for
      • Synchronization
      • Configuration management
    • A coordination service that does not suffer from
      • Race conditions
      • Deadlocks
  15. Design Goals: 1. Simple

    • A shared hierarchical namespace that looks like a Unix file system
    • The namespace consists of data nodes called znodes (similar to files and directories)
    • Data is kept in memory
    • High throughput and low latency
    • High performance
    • Used in large distributed systems
    • Highly available
    • No single point of failure
    • Strictly ordered access
    • Provides synchronization primitives
    Example namespace:
      /
      /app1
      /app2
      /app1/v1
      /app1/v2
      /app1/v3
  16. Design Goals: 2. Replicated (HA)

    The clients
    • Keep a TCP connection to a server
    • Get watch events
    • Send heartbeats (❤ beats)
    • If the connection breaks, connect to a different server
    The servers
    • Know about each other
    • Keep state in memory
    • Write transaction logs and periodic snapshots to disk for persistence
    image source: https://zookeeper.apache.org/doc/current/images/zkservice.jpg
  17. Design Goals: 3. Ordered

    • ZooKeeper stamps every update with a number.
    The number
    • Reflects the order of transactions.
    • Is used to implement higher-level abstractions, such as synchronization primitives.
  18. Design Goals: 4. Fast

    • Performs best where reads are more common than writes, at ratios of around 10:1.
    • At Yahoo, where this software was created, the team benchmarked read and write performance on a ZooKeeper cluster.
    image source: https://zookeeper.apache.org/doc/r3.4.13/images/zkperfRW-3.2.jpg
  19. Design Goals: 5. Reliability

    Events marked in the figure:
    1. Failure and recovery of a follower
    2. Failure and recovery of a different follower
    3. Failure of a leader
    4. Failure and recovery of two followers
    5. Failure of another leader
    In the Yahoo team's observations, ZooKeeper takes less than 200 ms to elect a new leader.
    image source: https://zookeeper.apache.org/doc/r3.4.13/images/zkperfreliability.jpg
  20. Basic commands for debugging a ZooKeeper server

    ruok    Checks that the server is running in a non-error state
    conf    Prints details about the current configuration (from zoo.cfg)
    envi    Prints details about the current environment
    srvr    Server statistics, znode count, and mode (standalone, leader, or follower)
    stat    Server statistics and connected clients
    srst    Resets server statistics
    isro    Is the server in read-only mode?
    …
    • You can issue these commands using telnet or nc on the client port (2181).
    More information about other commands: https://zookeeper.apache.org/doc/r3.3.3/zookeeperAdmin.html
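The commands above are plain text sent over a TCP connection, so the telnet/nc exchange can also be sketched in a few lines of Python. This is a minimal sketch, not an official client: `four_letter_word` is a hypothetical helper name, and the default host and timeout are assumptions; only the port (2181) comes from the slide. Newer ZooKeeper releases may also require these commands to be enabled in the server configuration before they respond.

```python
import socket


def four_letter_word(cmd: str, host: str = "127.0.0.1", port: int = 2181,
                     timeout: float = 2.0) -> str:
    """Send a four-letter command (for example 'ruok') and return the reply.

    Hypothetical helper: it opens a TCP connection, writes the command,
    and reads until the server closes the connection.
    """
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(cmd.encode("ascii"))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # the server closes the connection after replying
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


# Usage against a running server (assumed to listen on localhost:2181):
#   four_letter_word("ruok")   # a healthy server is expected to answer "imok"
#   four_letter_word("srvr")   # statistics, node count, and mode
```

The helper does the same thing as `echo ruok | nc localhost 2181`; it is handy when a health check has to live inside a script rather than a shell.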
  21. Data Model

    • Very similar to a file system, but it is not one.
    • Znodes
      - Can have child nodes
      - All have metadata (+ data)
      - Have ACLs (Access Control Lists)
    • No append operation: data is never partially updated, only set as a whole.
    • Data access (read/write) is atomic: it either succeeds or fails with an error.
    Example namespace:
      /
      /app1
      /app2
      /app1/v1
      /app1/v2
      /app1/v3
  22. Data Model - Znode

    • Ephemeral znode
      • Exists as long as the session that created it is active.
      • Deleted by ZooKeeper when the session ends or times out.
      • Cannot have any children, not even ephemeral ones.
      • Tied to a client session, but visible to everyone.
    • Persistent znode
      • Remains there until it is explicitly deleted.
  23. • Sequential znode

      • Created with a sequence number in its name.
      • The number is appended automatically.
      • The counter increases monotonically.
      • Each parent znode keeps its own counter.

    create -s /hello "world"
    Created /hello0000000001
    create -s /hello "zookeeper"
    Created /hello0000000002
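The naming rule for sequential znodes can be sketched with a small in-memory model. This is an illustration under stated assumptions, not ZooKeeper code: `SequentialParent` and `create_sequential` are hypothetical names, and the counter is started at 1 only to reproduce the output shown on the slide. The 10-digit zero padding matches the names in the CLI example.

```python
class SequentialParent:
    """In-memory stand-in for a parent znode that hands out sequence numbers."""

    def __init__(self, first: int = 0):
        self._next = first          # counter kept by the parent
        self.children = []          # names created so far, in creation order

    def create_sequential(self, prefix: str) -> str:
        # Append the counter as a zero-padded 10-digit suffix, then advance it.
        name = f"{prefix}{self._next:010d}"
        self._next += 1
        self.children.append(name)
        return name


root = SequentialParent(first=1)    # start at 1 to mirror the slide's output
print(root.create_sequential("/hello"))
print(root.create_sequential("/hello"))
```

Because the suffix has a fixed width, sorting the names as strings gives the same order as sorting by sequence number, which is what queue and lock recipes rely on.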
  24. • Watches

      • Clients are notified when a znode changes in some way.
      • A watch is triggered only once.
      • For further notifications you need to re-register the watch.

    Event type    How to set the watch
    Created       exists()
    Deleted       exists(), getData(), or getChildren()
    Changed       exists() or getData()
    Child         getChildren()

    image source: https://www.google.com/url?sa=i&rct=j&q=&esrc=s&source=images&cd=&cad=rja&uact=8&ved=2ahUKEwiF2_-5kancAhWKfysKHVvLBkwQjRx6BAgBEAU&url=https%3A%2F%2Fwww.hiddenbrains.com%2Fzookeeper.html&psig=AOvVaw20uvgTf9fo8S9avMlpBAWn
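The one-shot behaviour of watches is easy to get wrong, so here is a minimal in-memory sketch of it. `WatchRegistry`, `register`, and `fire` are hypothetical names used for illustration; the real client sets watches through calls such as exists(), getData(), and getChildren(), as listed above.

```python
class WatchRegistry:
    """Toy model of one-shot watches keyed by znode path."""

    def __init__(self):
        self._watches = {}  # path -> list of pending callbacks

    def register(self, path, callback):
        self._watches.setdefault(path, []).append(callback)

    def fire(self, path, event):
        # Remove the watches first, then call them: each one fires at most once.
        callbacks = self._watches.pop(path, [])
        for cb in callbacks:
            cb(path, event)
        return len(callbacks)


seen = []
registry = WatchRegistry()
registry.register("/app1", lambda path, event: seen.append((path, event)))

registry.fire("/app1", "Changed")   # delivered
registry.fire("/app1", "Changed")   # not delivered: the watch was already used
print(seen)
```

A client that wants continuous notifications re-registers inside its callback. Changes that happen between the trigger and the re-registration are not reported as events, so the usual pattern is to re-read the znode when setting the new watch.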
  25. ZooKeeper APIs: operations

    Java API          CLI command       Description
    create            create            Creates a znode (the parent znode must exist)
    delete            delete            Deletes a znode (it must not have children; the CLI's rmr deletes recursively)
    exists            stat              Tests whether a znode exists and gets its metadata
    getAcl, setAcl    getAcl, setAcl    Gets/sets the ACL for a znode
    getChildren       ls                Gets a list of the children of a znode
    getData, setData  get, set          Gets/sets the data associated with a znode
    sync              sync              Synchronizes a client's view of a znode with ZooKeeper
  26. ZooKeeper APIs

    • Two core APIs: Java and C
    • Other languages (created by the community): Perl, Python, REST, C++, Go
    • Each API has two variants: synchronous and asynchronous
    Sync:  byte[] getData(String path, boolean watch, Stat stat)
    Async: void getData(String path, boolean watch, DataCallback cb, Object context)
  27. ACLs - Access Control Lists

    • Determine who can perform certain operations on znodes.
    • An ACL is like a set of permissions.
    • Managed with the getAcl() and setAcl() API calls.

    List of permissions
    CREATE    Can create a child znode
    READ      Can get data from a znode and list its children
    WRITE     Can set data for a znode
    DELETE    Can delete a child znode
    ADMIN     Can set permissions
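The five permissions combine as bit flags, which the following sketch models with Python's `IntFlag`. The numeric values mirror the constants in ZooKeeper's Java client (`ZooDefs.Perms`); the `Perm` class and the `allowed` helper are hypothetical names for illustration, not part of any ZooKeeper binding.

```python
from enum import IntFlag


class Perm(IntFlag):
    """Permission bits, using the same values as the Java client constants."""
    READ = 1
    WRITE = 2
    CREATE = 4
    DELETE = 8
    ADMIN = 16


def allowed(granted: Perm, needed: Perm) -> bool:
    """True if every bit in `needed` is present in `granted`."""
    return (granted & needed) == needed


reader_writer = Perm.READ | Perm.WRITE
print(allowed(reader_writer, Perm.WRITE))    # can set data
print(allowed(reader_writer, Perm.DELETE))   # cannot delete children
```

Note that CREATE and DELETE apply to children of the znode that carries the ACL, while READ and WRITE apply to the znode itself.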
  28. Architecture

    Runs in two modes
    • Standalone
      • Single server
      • Used for testing
      • No HA
    • Replicated
      • Runs on a collection of machines
      • The recommended number of nodes is "2n + 1" (an odd number), which tolerates n failures
      • HA
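The "2n + 1" recommendation follows from majority voting, and the arithmetic can be checked directly. This is a small sketch; `quorum_size` and `tolerated_failures` are hypothetical helper names.

```python
def quorum_size(servers: int) -> int:
    """Smallest number of servers that forms a strict majority."""
    return servers // 2 + 1


def tolerated_failures(servers: int) -> int:
    """How many servers can fail while a majority is still reachable."""
    return servers - quorum_size(servers)


for n in (3, 4, 5, 6):
    print(n, quorum_size(n), tolerated_failures(n))
```

Going from 3 to 4 servers raises the quorum from 2 to 3 but still tolerates only one failure, which is why ensembles are sized at an odd 2n + 1.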
  29. Architecture: Phase I - Leader election

    • The machines elect a distinguished member, the leader.
    • The others are called followers.
    • This phase is finished when a majority of followers have synced their state with the leader.
    • If the leader fails, the remaining machines hold a new election, which takes about 200 ms.
  30. Architecture: Phase II - Atomic broadcast

    • All write requests are forwarded to the leader.
    • The leader broadcasts the update to the followers.
    • When a majority have persisted the change:
      • The leader commits the update.
      • The client gets a success response.
    • The protocol for achieving consensus is atomic: a change either succeeds or fails.
    • Machines write updates to disk before applying them in memory.
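The majority rule in this phase can be sketched as a toy simulation. It is a heavily simplified illustration, not the actual broadcast protocol: there is no networking, recovery, or ordering of concurrent proposals, and `Server`, `persist`, `apply`, and `broadcast` are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class Server:
    name: str
    up: bool = True
    log: list = field(default_factory=list)    # stands in for the on-disk log
    state: dict = field(default_factory=dict)  # in-memory data tree

    def persist(self, zxid, path, data) -> bool:
        """Write the proposal to the log; returns True as an acknowledgement."""
        if not self.up:
            return False
        self.log.append((zxid, path, data))
        return True

    def apply(self, path, data):
        if self.up:
            self.state[path] = data


def broadcast(leader, followers, zxid, path, data) -> bool:
    """Commit only if a strict majority of the ensemble persisted the change."""
    ensemble = [leader] + followers
    acks = sum(s.persist(zxid, path, data) for s in ensemble)  # disk first
    if acks <= len(ensemble) // 2:
        return False                                           # no majority
    for s in ensemble:
        s.apply(path, data)                                    # then memory
    return True


leader = Server("s1")
followers = [Server("s2"), Server("s3", up=False)]
print(broadcast(leader, followers, zxid=1, path="/config", data="v1"))
```

With three servers and one follower down, two acknowledgements still form a majority, so the write commits. With two servers down it would be rejected.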
  31. Sessions

    • A client has a list of the servers in the cluster.
    • The server creates a new session for the client.
    • A session has a timeout period, decided by the caller.
    • If the server has not received a request within the timeout period, it may expire the session.
    • When a session expires, its ephemeral nodes are lost.
    • To keep a session alive, the client sends pings (heartbeats).
    • The client library takes care of sending heartbeats.
    • Failover is handled automatically by the client.
    • Sessions remain valid when the client switches to another server.
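The link between heartbeats, session expiry, and ephemeral nodes can be sketched with an in-memory tracker. This is an illustration only: `SessionTracker` and its methods are hypothetical names, and time is passed in explicitly so the example stays deterministic.

```python
class SessionTracker:
    """Toy model: sessions expire without heartbeats and take their
    ephemeral znodes with them."""

    def __init__(self):
        self._sessions = {}   # session id -> [timeout, time last heard from]
        self.ephemerals = {}  # znode path -> owning session id

    def open(self, sid, timeout, now):
        self._sessions[sid] = [timeout, now]

    def heartbeat(self, sid, now):
        self._sessions[sid][1] = now

    def create_ephemeral(self, sid, path):
        self.ephemerals[path] = sid

    def expire(self, now):
        """Drop sessions that have been silent too long; return deleted paths."""
        dead = {sid for sid, (timeout, last) in self._sessions.items()
                if now - last > timeout}
        for sid in dead:
            del self._sessions[sid]
        removed = sorted(p for p, owner in self.ephemerals.items()
                         if owner in dead)
        for path in removed:
            del self.ephemerals[path]
        return removed


tracker = SessionTracker()
tracker.open("a", timeout=10, now=0)
tracker.open("b", timeout=10, now=0)
tracker.create_ephemeral("a", "/servers/node1")
tracker.create_ephemeral("b", "/servers/node2")
tracker.heartbeat("a", now=8)          # only session "a" keeps pinging
print(tracker.expire(now=12))          # session "b" has been silent for 12
```

Session "a" survives because its last heartbeat was 4 time units ago, while session "b" exceeds its timeout and loses its ephemeral znode.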
  32. Uses of ZooKeeper in real-world applications

    Features of ZooKeeper used by other distributed systems
    • Configuration management
    • Leader election
    • Load balancing
    • Node management
    • Locking
    • Synchronization
    Used in
    • Hadoop
    • HBase
    • Solr
  33. [Diagram] ZooKeeper holds / and /servers, and a watch event is registered on /servers. Cluster members: node1 (192.168.0.100), node2 (192.168.0.101), node3 (192.168.0.102).
  34. [Diagram] node1 creates the ephemeral znode /servers/node1.
  35. [Diagram] node2 creates the ephemeral znode /servers/node2.
  36. [Diagram] node3 creates the ephemeral znode /servers/node3.
  37. [Diagram] /servers now has three ephemeral children: /servers/node1, /servers/node2, /servers/node3.
  38. [Diagram] node2 (192.168.0.101) drops out of the cluster.
  39. [Diagram] /servers/node2 is removed; /servers/node1 and /servers/node3 remain.
  40. [Diagram] Watch trigger sent to the client: "Hey, changes in /servers".
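The membership pattern in this diagram sequence (ephemeral children under /servers plus a watch on the parent) can be sketched as an in-memory model. `Membership`, `watch_children`, `join`, and `leave` are hypothetical names; in a real deployment the `leave` step happens implicitly when a node's session ends and ZooKeeper deletes its ephemeral znode.

```python
class Membership:
    """Toy model of /servers with one-shot child watches."""

    def __init__(self):
        self.members = {}     # child name -> address
        self._watchers = []   # pending one-shot watches on /servers

    def watch_children(self, callback):
        """Register a watch and return the current children, like getChildren()."""
        self._watchers.append(callback)
        return sorted(self.members)

    def _notify(self):
        pending, self._watchers = self._watchers, []
        for cb in pending:
            cb("/servers")

    def join(self, name, address):
        self.members[name] = address
        self._notify()

    def leave(self, name):
        if self.members.pop(name, None) is not None:
            self._notify()


events = []
servers = Membership()
servers.join("node1", "192.168.0.100")
servers.join("node2", "192.168.0.101")
servers.join("node3", "192.168.0.102")

servers.watch_children(events.append)   # client watches /servers
servers.leave("node2")                  # node2's ephemeral znode disappears
print(events, sorted(servers.members))
```

After the trigger the client would call `watch_children` again, both to learn the new member list and to re-arm the watch.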
  41. [Diagram] A queue built under /queue, with sequential children numbered 1 to 6. pop() = orderedChildren() -> take min(seq) and remove it (here, 1).
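The pop() rule from the diagram (order the children, take the smallest sequence number, remove it) can be sketched in memory as follows. `ZnodeQueue` and the `item-` prefix are hypothetical; the counter starts at 1 only to match the numbering in the diagram. A real implementation also has to handle two consumers racing for the same child, which this sketch ignores.

```python
class ZnodeQueue:
    """Toy FIFO queue backed by sequentially named children of /queue."""

    def __init__(self):
        self._counter = 1     # starts at 1 to match the diagram
        self._children = {}   # child name -> payload

    def push(self, payload):
        name = f"item-{self._counter:010d}"
        self._counter += 1
        self._children[name] = payload
        return name

    def ordered_children(self):
        # Sort by the numeric sequence suffix.
        return sorted(self._children, key=lambda n: int(n.rsplit("-", 1)[1]))

    def pop(self):
        children = self.ordered_children()
        if not children:
            raise IndexError("queue is empty")
        head = children[0]                 # min(seq)
        return self._children.pop(head)    # remove it and return its payload


queue = ZnodeQueue()
for job in ("a", "b", "c"):
    queue.push(job)
print(queue.pop(), queue.ordered_children())
```

Items come out in the order they were created because the sequence number, not the arrival order at any single client, decides the head of the queue.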
  42. Alternative solutions to ZooKeeper

    • HashiCorp Consul: https://github.com/hashicorp/consul
    • CoreOS etcd: https://github.com/coreos/etcd
    • Netflix Eureka: https://github.com/Netflix/eureka
    image source: https://www.google.com/url?sa=i&rct=j&q=&esrc=s&source=images&cd=&cad=rja&uact=8&ved=2ahUKEwiF2_-5kancAhWKfysKHVvLBkwQjRx6BAgBEAU&url=https%3A%2F%2Fwww.hiddenbrains.com%2Fzookeeper.html&psig=AOvVaw20uvgTf9fo8S9avMlpBAWn