Upgrade to Pro — share decks privately, control downloads, hide ads and more …


Sushant Mane
May 04, 2023


Sushant Mane

May 04, 2023


  1. Verify, And Then Trust: Data Inconsistency Detection in ZooKeeper Sushant

    Mane San José State University California, USA Fangmin Lyu Meta Platforms, Inc California, USA Benjamin Reed San José State University California, USA
  2. Background • ZooKeeper - source of truth for mission-critical applications

    • Assumes crash-stop fault model • In production not all faults are crash-stop failures • Bugs in code, OS, etc. can corrupt the system state • Unlike crash-stop faults state corruptions are not detectable by clients • Hard to determine if system is functioning reliably until users notice it
  3. Background • BFT protocols are used to tolerate arbitrary behavior

    but they are expensive • Use fault detection instead as it is cheaper • This paper presents ◦ ZooKeeper's data inconsistency detection mechanisms ◦ Their impact on the overall performance
  4. Fault Model • Covers replica divergence due to software bugs,

    configuration errors, and corrupted data • Does not cover malicious-byzantine faults • Two types of replica ◦ Correct replica - consistent ◦ Faulty replica - inconsistent
  5. Offline Comparisons Method • Uses external consistency checker • Copy

    snapshot and transaction log files of all replicas • Deserialize DataTree of every replica • Compute the digest of each DataTree • Compare the digests to find diverged replica
  6. Online Comparisons via Auditor Method • Digest computations is embedded

    in ZooKeeper service • Every replica maintains a replica digest and a digest log (historical digests) • Replica digest ◦ Represents the state of replica up to a particular transaction id ◦ Updated upon a change to its DataTree ◦ Incremental Hash (AdHASH) is used to compute digests efficiently • Replica digest is added to the digest log after every fixed number of transactions • External auditor is employed to compare historical digests
  7. DataTree Digest Calculation using AdHASH* ReplicaDigest New = ReplicaDigest Old

    − Digest(OldNodeData) + Digest(NewNodeData) • When creating a new node, the Digest(OldNodeData) will be 0 • When deleting a node, the Digest(NewNodeData) will be 0 *Mihir Bellare and Daniele Micciancio. 1997. A new paradigm for collision-free hashing: Incrementality at reduced cost. In Advances in Cryptology—EUROCRYPT’97: International Conference on the Theory and Application of Cryptographic Techniques Konstanz, Germany, May 11–15, 1997 Proceedings 16. Springer, 163–192.
  8. Realtime Detection Method • Detect replica divergence as soon as

    it happens, i.e., as the replica state changes • Both digest computation and comparisons are embedded in ZooKeeper • Two types of digest ◦ Replica Digest ◦ Predictive Digest
  9. Performance Evaluation • Request size: Payload(1024) + 17B (Path) +

    60B (Stats) • Async repeated read (getData) and write(setData) • At most 100 outstanding request per client • Clients remain connected to the same server • 900 clients - 300 clients per server
  10. Baseline vs Online Comparison vs Realtime Detection • Throughput: Baseline

    > Online Comparison > Realtime Detection • Reliability: Baseline < Online Comparison < Realtime Detection
  11. Impact of hash function used in AdHASH • Weak hash

    function - low penalty and low confidence in detection • Strong hash function - high penalty and high confidence in detection Performance with different hash functions in Online Comparison Performance with different hash functions in Realtime Detection
  12. Conclusion • Both online comparison and real-time detection methods incur

    an acceptable performance penalty ▪ Online Comparison - 2% with 100% write operations and CRC-32 ▪ Realtime Detection - 20% with 100% write operations and CRC-32 • We are able to detect data inconsistencies with no increase in deployment cost and only minor development and performance cost
  13. Online Comparisons via Auditor Method • Advantages ◦ No need

    to download/copy replica data ◦ Auditor helps to catch inconsistency sooner than the external consistency checker ◦ Provides a context around the time when fault occurred, which helps to identify the root cause of faults • Disadvantages ◦ Impacts the overall performance of ZooKeeper ▪ replicas compute digest on every commit which adds an extra CPU load ◦ Doesn't provide complete protection against the propagation of corruption
  14. Offline Comparisons Method • Advantages ◦ Easy to develop ◦

    Has no impact no ZooKeeper's performance • Disadvantages ◦ Inefficient ▪ Need to copy replica data every time consistency checker runs ▪ Can't detect faults unless external consistency checker runs ▪ Doesn't protect against the propagation of corruption
  15. Realtime Detection • Advantages ◦ Detects faults as soon as

    they occur (as data is changing) ◦ This helps in preventing the propagation of corruption ◦ Provides very specific context around the time when fault occurred ▪ this helps in RCA of arbitrary behavior • Disadvantages ◦ Impacts overall performance of the system ◦ Replicas compute digest on every commit which adds an extra CPU load on all replicas ◦ Computation of predictive digests adds an additional load on the CPU of the leader server ▪ This slows down all the state modifying operations