that study distributed systems
• What is a distributed system?
• It is a system whose components are located on different networked computers.
• The components communicate by passing messages to each other.
• They use HTTP, RPC, or message queues to pass those messages.
• Very simple API, and therefore easy to program against
• Uses a data model like a directory tree
• Used for
  - Synchronization
  - Configuration management
• Coordination service that does not suffer from
  - Race conditions
  - Deadlocks
A Distributed Coordination Service for Distributed Applications
file system
• The namespace has data nodes called znodes (similar to files and directories)
• Data is kept in memory
• High throughput and low latency
• High performance
• Used in large distributed systems
• Highly available
• No single point of failure
• Strictly ordered access
• Provides synchronization primitives
1. Simple
Example namespace: /, /app1, /app2, /app1/v1, /app1/v2, /app1/v3
watch events
• Sends heartbeats
• If the connection breaks, connects to a different server
2. Replicated - HA
The servers
• Know about each other
• Keep state in memory
• Periodically write transaction logs and snapshots to disk for persistence
Image source: https://zookeeper.apache.org/doc/current/images/zkservice.jpg
where reads are more common than writes, at ratios of around 10:1
• At Yahoo, where this software was created, the team benchmarked read and write performance on a ZooKeeper cluster.
recovery of a follower
2. Failure and recovery of a different follower
3. Failure of a leader
4. Failure and recovery of two followers
5. Failure of another leader
In the Yahoo team's observations, ZooKeeper took less than 200 ms to elect a new leader.
running in non-error state
conf    Print details about the current configuration (from zoo.cfg)
envi    Print details about the current environment
srvr    Server statistics, znode count, and mode (standalone, leader, or follower)
stat    Server statistics and connected clients
srst    Reset server statistics
isro    Is the server in read-only mode?
…       …
More information about other commands: https://zookeeper.apache.org/doc/r3.3.3/zookeeperAdmin.html
• You can issue these commands using telnet or nc on the client port (2181 by default)
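The same commands can also be sent from a short script instead of telnet or nc. The sketch below is a minimal example, assuming a server that answers four-letter commands on its client port and closes the connection after replying. The helper name `four_letter_word` is hypothetical, and newer ZooKeeper releases may require the command to be enabled in the server configuration first.

```python
import socket


def four_letter_word(host: str, port: int, command: str, timeout: float = 2.0) -> str:
    """Send one four-letter command and return the server's reply as text.

    Hypothetical helper for illustration; it is not part of any ZooKeeper client.
    """
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(command.encode("ascii"))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # the server closes the connection once it has replied
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


# Example (assumes a server listening on localhost:2181):
#   print(four_letter_word("localhost", 2181, "srvr"))
```

Reading until the peer closes the socket avoids guessing the reply length, which differs from command to command.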
fs.
• Znode
  - Can have child nodes
  - All have metadata (+ data)
  - Have ACLs (Access Control Lists)
• No append operation: data cannot be partially updated, only set (replaced) as a whole.
• Data access (read/write) is atomic: it either succeeds or fails with an error.
Example namespace: /, /app1, /app2, /app1/v1, /app1/v2, /app1/v3
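The data model above can be sketched as a toy in-memory tree. This is only an illustration, not how ZooKeeper stores znodes. The class `ZNodeTree` and its method names are hypothetical, loosely mirroring create, setData, getData, and getChildren.

```python
class ZNodeTree:
    """Toy in-memory namespace; an illustration, not ZooKeeper's real storage."""

    def __init__(self):
        self._data = {"/": b""}  # path -> bytes; the root always exists

    @staticmethod
    def _parent(path):
        head = path.rsplit("/", 1)[0]
        return head or "/"

    def create(self, path, data=b""):
        if path in self._data:
            raise ValueError(f"node exists: {path}")
        if self._parent(path) not in self._data:
            raise KeyError(f"parent missing: {self._parent(path)}")
        self._data[path] = bytes(data)

    def set_data(self, path, data):
        if path not in self._data:
            raise KeyError(path)
        self._data[path] = bytes(data)  # whole value replaced, never appended

    def get_data(self, path):
        return self._data[path]

    def get_children(self, path):
        prefix = "/" if path == "/" else path + "/"
        return sorted(
            p[len(prefix):]
            for p in self._data
            if p != path and p.startswith(prefix) and "/" not in p[len(prefix):]
        )
```

Building the example namespace and calling `get_children("/app1")` returns `["v1", "v2", "v3"]`. Creating a node under a missing parent raises an error.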
long as the session that created this znode is active
• Deleted by ZooKeeper when the session ends or times out.
• Cannot have any children, not even ephemeral ones.
• It is tied to the client session, but visible to everyone.
• Persistent znode
  - Remains there until it is explicitly deleted.
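The difference between the two lifetimes can be sketched with a small in-memory model. It assumes integer session ids and skips parent-existence checks. `EphemeralStore` and its methods are hypothetical names, not part of any ZooKeeper client.

```python
class EphemeralStore:
    """Toy model of persistent vs. ephemeral znodes (illustration only)."""

    def __init__(self):
        # path -> owning session id for ephemeral znodes, None for persistent ones
        self._owner = {}

    def create(self, path, session_id, ephemeral=False):
        parent = path.rsplit("/", 1)[0]
        if self._owner.get(parent) is not None:
            raise ValueError("ephemeral znodes cannot have children")
        self._owner[path] = session_id if ephemeral else None

    def exists(self, path):
        # Visibility does not depend on which session asks.
        return path in self._owner

    def close_session(self, session_id):
        # Ephemeral znodes disappear together with the session that created them.
        for path in [p for p, owner in self._owner.items() if owner == session_id]:
            del self._owner[path]
```

After `close_session(1)`, every ephemeral znode created by session 1 is gone, while persistent znodes and other sessions' ephemeral znodes remain.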
number in the name.
• The number is appended automatically.
• The counter increases monotonically.
• Each parent znode keeps a counter for its children
create -s /hello "world"
Created /hello0000000001
create -s /hello "zookeeper"
Created /hello0000000002
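The naming scheme can be sketched as a counter that is formatted as ten zero-padded digits and appended to the requested path. The class name `SequentialParent` and the `start` parameter are assumptions for illustration. The start value of 1 below is chosen only to reproduce the output shown above.

```python
class SequentialParent:
    """Toy model of the per-parent counter behind sequential znodes."""

    def __init__(self, start=0):
        self._next = start  # hypothetical starting value, configurable for the demo

    def create_sequential(self, path_prefix):
        # The counter is appended as ten zero-padded digits, then incremented.
        name = f"{path_prefix}{self._next:010d}"
        self._next += 1
        return name


root = SequentialParent(start=1)
first = root.create_sequential("/hello")   # "/hello0000000001"
second = root.create_sequential("/hello")  # "/hello0000000002"
```

Because the counter belongs to the parent, a different prefix under the same parent continues the same sequence in this sketch.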
in some way.
• Watches are triggered only once.
• To receive multiple notifications, you need to re-register the watch.
Event type    How to set the watch
Created       exists()
Deleted       exists(), getData(), or getChildren()
Changed       exists() or getData()
Child         getChildren()
Image source: https://www.hiddenbrains.com/zookeeper.html
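The one-shot behaviour can be sketched with a tiny in-memory node. `WatchedNode` is a hypothetical class for illustration only. A watcher registered through `get_data` fires on the next change and is then forgotten, so a client that wants continuous notifications re-registers inside its callback.

```python
class WatchedNode:
    """Toy znode whose data watches fire exactly once (illustration only)."""

    def __init__(self, data=b""):
        self._data = data
        self._watchers = []

    def get_data(self, watcher=None):
        if watcher is not None:
            self._watchers.append(watcher)  # registering a watch is part of the read
        return self._data

    def set_data(self, data):
        self._data = data
        # Clear the list before firing, so a watch re-registered inside a
        # callback only applies to the next change, not to this one.
        fired, self._watchers = self._watchers, []
        for watcher in fired:
            watcher("changed")


events = []
node = WatchedNode(b"a")
node.get_data(watcher=events.append)
node.set_data(b"b")  # fires the watch
node.set_data(b"c")  # no watch registered any more, so nothing fires
```

After the two writes, `events` holds a single `"changed"` entry.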
znode (parent znode must exist)
delete              delete            Deletes a znode (must not have children; rmr deletes recursively)
exists              stat              Tests whether a znode exists and gets its metadata
getAcl, setAcl      getAcl, setAcl    Gets/sets the ACL for a znode
getChildren         ls                Gets a list of the children of a znode
getData, setData    get, set          Gets/sets the data associated with a znode
sync                sync              Synchronizes a client's view of a znode with ZooKeeper
ZooKeeper APIs
other languages (created by the community): Perl, Python, REST, C++, Go
• Each API call is available in two variants
  - Sync
  - Async
Sync: byte[] getData(String path, boolean watch, Stat stat)
Async: void getData(String path, boolean watch, DataCallback cb, Object context)
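The difference between the two variants can be sketched without a real client. In the example below, `_STORE`, the result codes, and both function names are hypothetical stand-ins. The callback arguments are loosely modeled on the async signature above: result code, path, context, data.

```python
import threading

OK, NO_NODE = 0, -1        # hypothetical result codes, not ZooKeeper's real values
_STORE = {"/app1": b"v1"}  # stand-in for data held by the service


def get_data(path):
    """Sync variant: the caller blocks until the data is returned."""
    return _STORE[path]


def get_data_async(path, callback, context=None):
    """Async variant: returns immediately; the result arrives via the callback."""

    def worker():
        if path in _STORE:
            callback(OK, path, context, _STORE[path])
        else:
            callback(NO_NODE, path, context, None)

    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    return thread  # returned only so that a demo can wait for completion
```

The context object is passed through untouched, which lets one callback tell several outstanding requests apart.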
certain operations on znodes.
• An ACL is like a set of permissions.
• ACLs are read and set using the getAcl() and setAcl() APIs.
List of permissions
CREATE    Can create a child znode
READ      Can get data from a znode and list its children
WRITE     Can set data for a znode
DELETE    Can delete a child znode
ADMIN     Can set permissions
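A permission set like this is naturally modeled as bit flags. The sketch below uses the bit values of the Java client's permission constants as best I know them (READ=1, WRITE=2, CREATE=4, DELETE=8, ADMIN=16); treat them as an assumption. The `REQUIRED` mapping and `is_allowed` helper are hypothetical names for illustration.

```python
from enum import IntFlag


class Perm(IntFlag):
    # Bit values assumed to match the Java client's permission constants.
    READ = 1
    WRITE = 2
    CREATE = 4
    DELETE = 8
    ADMIN = 16


# Hypothetical mapping from operation name to the permission it needs.
# CREATE and DELETE are checked against the parent znode's ACL.
REQUIRED = {
    "getData": Perm.READ,
    "getChildren": Perm.READ,
    "setData": Perm.WRITE,
    "create": Perm.CREATE,
    "delete": Perm.DELETE,
    "setAcl": Perm.ADMIN,
}


def is_allowed(granted, operation):
    """Return True if the granted permission bits cover the operation."""
    needed = REQUIRED[operation]
    return (granted & needed) == needed


read_write = Perm.READ | Perm.WRITE
```

With `read_write`, `is_allowed(read_write, "setData")` is true and `is_allowed(read_write, "setAcl")` is false.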
a distinguished member, the leader
• The others are called followers.
• This phase is finished when a majority of followers have synced their state with the leader.
• If the leader fails, the remaining machines hold an election, which takes around 200 ms.
are forwarded to the leader.
• The leader broadcasts the update to the followers.
• When a majority have persisted the change:
  - The leader commits the update
  - The client gets a success response.
• The protocol for achieving consensus is atomic
• Machines write to disk before updating in-memory state
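The write path above can be sketched as "persist first, commit on majority". This is a heavily simplified model, not the actual broadcast protocol: it has no epochs, ordering, or recovery, and it counts the leader as one of the servers. `Server` and `leader_write` are hypothetical names.

```python
class Server:
    """Toy ensemble member with an on-disk log and in-memory state."""

    def __init__(self, up=True):
        self.up = up
        self.log = []    # stands in for the transaction log on disk
        self.state = {}  # stands in for the in-memory data tree

    def persist(self, update):
        if not self.up:
            return False  # an unreachable server cannot acknowledge
        self.log.append(update)
        return True

    def apply(self, update):
        if self.up:
            key, value = update
            self.state[key] = value


def leader_write(servers, update):
    """Commit the update only if a majority of servers persisted it."""
    acks = sum(1 for server in servers if server.persist(update))
    if acks < len(servers) // 2 + 1:
        return False  # no quorum, so the client does not get a success response
    for server in servers:
        server.apply(update)  # in-memory state changes only after the disk write
    return True
```

With five servers, a write still commits when two are down, but not when three are down.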
cluster.
• The server creates a new session for the client
• A session has a timeout period, decided by the caller
• If the server has not received a request within the timeout period, it may expire the session.
• On session expiry, ephemeral nodes are lost
• To keep a session alive, the client sends pings (heartbeats)
• The client library takes care of sending heartbeats
• Failover is handled automatically by the client
• Sessions remain valid after switching to another server
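Session expiry can be sketched with an explicit clock value instead of real timers. The model assumes time is passed in as a plain number and that any request or ping counts as "heard from the client". `Session` and `expire_sessions` are hypothetical names.

```python
class Session:
    """Toy session that expires when nothing is heard within the timeout."""

    def __init__(self, timeout, now):
        self.timeout = timeout
        self.last_heard = now
        self.ephemerals = set()  # paths of ephemeral znodes owned by this session

    def touch(self, now):
        # Called for every request or heartbeat received from the client.
        self.last_heard = now

    def expired(self, now):
        return now - self.last_heard > self.timeout


def expire_sessions(sessions, now):
    """Drop expired sessions and return the ephemeral paths lost with them."""
    removed = set()
    for sid in [sid for sid, session in sessions.items() if session.expired(now)]:
        removed |= sessions.pop(sid).ephemerals
    return removed
```

A client that keeps pinging more often than its timeout stays alive. A silent client loses its session and its ephemeral znodes on the next expiry check.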
used by other distributed systems
• Configuration management
• Leader election
• Load balancing
• Node management
• Locking
• Synchronization
Used in
• Hadoop
• HBase
• Solr