Slide 1

Slide 1 text

Verify, And Then Trust: Data Inconsistency Detection in ZooKeeper
Sushant Mane, San José State University, California, USA
Fangmin Lyu, Meta Platforms, Inc., California, USA
Benjamin Reed, San José State University, California, USA

Slide 2

Slide 2 text

Background
● ZooKeeper - source of truth for mission-critical applications
● Assumes a crash-stop fault model
● In production, not all faults are crash-stop failures
● Bugs in code, the OS, etc. can corrupt the system state
● Unlike crash-stop faults, state corruptions are not detectable by clients
● Hard to tell whether the system is functioning reliably until users notice a problem

Slide 3

Slide 3 text

Background

Slide 4

Slide 4 text

Background
● BFT protocols tolerate arbitrary behavior, but they are expensive
● Fault detection is a cheaper alternative
● This paper presents
  ○ ZooKeeper's data inconsistency detection mechanisms
  ○ Their impact on overall performance

Slide 5

Slide 5 text

Fault Model
● Covers replica divergence due to software bugs, configuration errors, and corrupted data
● Does not cover malicious (Byzantine) faults
● Two types of replica
  ○ Correct replica - consistent
  ○ Faulty replica - inconsistent

Slide 6

Slide 6 text

Data Inconsistency Detection Methods
1. Offline Comparisons
2. Online Comparisons via Auditor
3. Realtime Detection

Slide 7

Slide 7 text

Offline Comparisons Method
● Uses an external consistency checker
● Copy the snapshot and transaction log files of all replicas
● Deserialize the DataTree of every replica
● Compute the digest of each DataTree
● Compare the digests to find the diverged replica
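The last two steps (compute a digest per DataTree, then compare) can be sketched as follows. This is a minimal sketch under stated assumptions, not ZooKeeper's consistency checker: plain dicts stand in for the deserialized DataTree of each replica, the names `node_digest`, `tree_digest`, and `find_diverged` are hypothetical, and treating the digest held by the majority of replicas as the correct one is also an assumption.

```python
import hashlib
from collections import Counter

MOD = 1 << 64  # assumed digest width for this sketch


def node_digest(path: str, data: bytes) -> int:
    """Hash one node (path plus data) to a 64-bit integer."""
    h = hashlib.sha256(path.encode("utf-8") + b"\x00" + data).digest()
    return int.from_bytes(h[:8], "big")


def tree_digest(tree: dict) -> int:
    """Digest of a whole tree: sum of the node digests, modulo MOD."""
    return sum(node_digest(p, d) for p, d in tree.items()) % MOD


def find_diverged(replicas: dict) -> list:
    """Return the replicas whose digest differs from the majority digest."""
    digests = {name: tree_digest(tree) for name, tree in replicas.items()}
    majority, _count = Counter(digests.values()).most_common(1)[0]
    return sorted(name for name, d in digests.items() if d != majority)


# Dicts standing in for DataTrees deserialized from copied snapshot files.
replicas = {
    "server1": {"/app": b"v1", "/app/config": b"on"},
    "server2": {"/app": b"v1", "/app/config": b"on"},
    "server3": {"/app": b"v1", "/app/config": b"off"},  # diverged node
}
diverged = find_diverged(replicas)
```

Here `diverged` names only `server3`, the replica whose `/app/config` data differs from the other two.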

Slide 8

Slide 8 text

Online Comparisons via Auditor Method
● Digest computation is embedded in the ZooKeeper service
● Every replica maintains a replica digest and a digest log (historical digests)
● Replica digest
  ○ Represents the state of the replica up to a particular transaction id
  ○ Updated upon every change to its DataTree
  ○ Incremental hashing (AdHASH) is used to compute digests efficiently
● The replica digest is added to the digest log after every fixed number of transactions
● An external auditor is employed to compare historical digests
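The digest log and the auditor's comparison can be sketched as follows. This is an illustration under stated assumptions rather than ZooKeeper's implementation: the log is modeled as a list of (transaction id, digest) pairs, the names `maybe_record` and `first_divergence` are hypothetical, the interval of 100 transactions is arbitrary, and the digest values are stand-ins for real replica digests.

```python
from typing import Dict, List, Optional, Tuple

DigestLog = List[Tuple[int, int]]  # (transaction id, replica digest) entries


def maybe_record(log: DigestLog, zxid: int, digest: int, interval: int) -> None:
    """Append the current replica digest once every `interval` transactions."""
    if zxid % interval == 0:
        log.append((zxid, digest))


def first_divergence(logs: Dict[str, DigestLog]) -> Optional[int]:
    """Return the earliest logged transaction id at which replicas disagree."""
    by_zxid: Dict[int, set] = {}
    for log in logs.values():
        for zxid, digest in log:
            by_zxid.setdefault(zxid, set()).add(digest)
    for zxid in sorted(by_zxid):
        if len(by_zxid[zxid]) > 1:
            return zxid
    return None


logs: Dict[str, DigestLog] = {"server1": [], "server2": [], "server3": []}
for zxid in range(1, 301):
    for name, log in logs.items():
        digest = zxid * 7  # stand-in for a real replica digest
        if name == "server3" and zxid >= 150:
            digest += 1  # simulated corruption on one replica from txn 150 on
        maybe_record(log, zxid, digest, interval=100)

diverged_at = first_divergence(logs)
```

In this run the corruption starts at transaction 150 and the auditor reports the logged entry at 200, which narrows the fault to the window between the entries at 100 and 200. A shorter logging interval would tighten that window at the cost of a longer log.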

Slide 9

Slide 9 text

DataTree Digest Calculation using AdHASH*

ReplicaDigest_new = ReplicaDigest_old − Digest(OldNodeData) + Digest(NewNodeData)

● When creating a new node, Digest(OldNodeData) is 0
● When deleting a node, Digest(NewNodeData) is 0

* Mihir Bellare and Daniele Micciancio. 1997. A new paradigm for collision-free hashing: Incrementality at reduced cost. In Advances in Cryptology - EUROCRYPT '97: International Conference on the Theory and Application of Cryptographic Techniques, Konstanz, Germany, May 11–15, 1997, Proceedings 16. Springer, 163–192.
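The update rule above can be illustrated with a short sketch. The assumptions here are not taken from the slides: CRC-32 is used as the per-node hash, the node path is hashed together with its data, arithmetic is done modulo 2^64, and the `DigestedTree` class is a hypothetical stand-in for a DataTree.

```python
import zlib

MOD = 1 << 64  # assumed digest width; additions and subtractions wrap around


def node_digest(path: str, data: bytes) -> int:
    """Per-node hash; CRC-32 over path and data is an assumption of this sketch."""
    return zlib.crc32(path.encode("utf-8") + b"\x00" + data)


class DigestedTree:
    """Toy tree that keeps its digest up to date incrementally."""

    def __init__(self) -> None:
        self.nodes = {}
        self.digest = 0

    def _update(self, old_digest: int, new_digest: int) -> None:
        # ReplicaDigest_new = ReplicaDigest_old - Digest(old) + Digest(new)
        self.digest = (self.digest - old_digest + new_digest) % MOD

    def create(self, path: str, data: bytes) -> None:
        # Assumes the path does not exist yet, so the old node digest is 0.
        self._update(0, node_digest(path, data))
        self.nodes[path] = data

    def set_data(self, path: str, data: bytes) -> None:
        self._update(node_digest(path, self.nodes[path]), node_digest(path, data))
        self.nodes[path] = data

    def delete(self, path: str) -> None:
        # The new node digest is 0 when a node is removed.
        self._update(node_digest(path, self.nodes.pop(path)), 0)

    def full_digest(self) -> int:
        """Recompute the digest from scratch, for comparison only."""
        return sum(node_digest(p, d) for p, d in self.nodes.items()) % MOD


tree = DigestedTree()
tree.create("/a", b"1")
tree.create("/b", b"2")
tree.set_data("/a", b"3")
incremental_matches_full = tree.digest == tree.full_digest()
tree.delete("/a")
tree.delete("/b")
```

Each operation touches only the changed node, so the cost of an update does not depend on the size of the tree. Once every node is deleted the digest returns to 0.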

Slide 10

Slide 10 text

Online consistency verification using external auditor
[Figure: replicas with digest log]

Slide 11

Slide 11 text

Realtime Detection Method
● Detects replica divergence as soon as it happens, i.e., as the replica state changes
● Both digest computation and comparison are embedded in ZooKeeper
● Two types of digest
  ○ Replica Digest
  ○ Predictive Digest
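The slide names the two digests without detailing how they interact. The sketch below assumes one plausible arrangement: the leader computes a predictive digest (the digest the tree should have once a write is applied) and ships it with the transaction, and each replica compares its own replica digest against it after applying the write. All names (`next_digest`, `Replica`, `BitFlipReplica`) are hypothetical, and the per-node hash is CRC-32 by assumption.

```python
import zlib

MOD = 1 << 64  # assumed digest width


def node_digest(path, data):
    """Per-node hash; a missing node (data is None) contributes 0."""
    if data is None:
        return 0
    return zlib.crc32(path.encode("utf-8") + b"\x00" + data)


def next_digest(current, path, old_data, new_data):
    """Digest expected after replacing old_data with new_data at path."""
    return (current - node_digest(path, old_data) + node_digest(path, new_data)) % MOD


class Replica:
    def __init__(self):
        self.tree = {}
        self.digest = 0

    def store(self, data):
        # A correct replica stores the data unchanged.
        return data

    def apply(self, path, data, predictive_digest):
        """Apply a write, then report whether the replica digest matches."""
        old = self.tree.get(path)
        written = self.store(data)
        self.tree[path] = written
        self.digest = next_digest(self.digest, path, old, written)
        return self.digest == predictive_digest


class BitFlipReplica(Replica):
    def store(self, data):
        # Simulated bug: flips the lowest bit of the last byte before storing.
        return data[:-1] + bytes([data[-1] ^ 1])


leader, follower, faulty = Replica(), Replica(), BitFlipReplica()
path, data = "/a", b"hello"
# The leader predicts the post-write digest before the write is applied.
predictive = next_digest(leader.digest, path, leader.tree.get(path), data)
results = [r.apply(path, data, predictive) for r in (leader, follower, faulty)]
```

The faulty replica fails the comparison on the very transaction that corrupted its state, which is what allows divergence to be flagged before later writes build on it.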

Slide 13

Slide 13 text

Performance Evaluation
● Request size: 1024 B (payload) + 17 B (path) + 60 B (stats)
● Asynchronous repeated reads (getData) and writes (setData)
● At most 100 outstanding requests per client
● Clients remain connected to the same server
● 900 clients - 300 clients per server

Slide 14

Slide 14 text

Baseline vs Online Comparison vs Realtime Detection
● Throughput: Baseline > Online Comparison > Realtime Detection
● Reliability: Baseline < Online Comparison < Realtime Detection

Slide 15

Slide 15 text

Impact of hash function used in AdHASH
● Weak hash function - low penalty and low confidence in detection
● Strong hash function - high penalty and high confidence in detection
[Figure: performance with different hash functions in Online Comparison]
[Figure: performance with different hash functions in Realtime Detection]

Slide 16

Slide 16 text

Conclusion

Slide 17

Slide 17 text

Conclusion
● Both online comparison and realtime detection methods incur an acceptable performance penalty
  ■ Online Comparison - 2% with 100% write operations and CRC-32
  ■ Realtime Detection - 20% with 100% write operations and CRC-32
● We are able to detect data inconsistencies with no increase in deployment cost and only minor development and performance cost

Slide 18

Slide 18 text

Online Comparisons via Auditor Method
● Advantages
  ○ No need to download/copy replica data
  ○ Auditor helps to catch inconsistency sooner than the external consistency checker
  ○ Provides context around the time when the fault occurred, which helps to identify the root cause of faults
● Disadvantages
  ○ Impacts the overall performance of ZooKeeper
    ■ Replicas compute a digest on every commit, which adds extra CPU load
  ○ Doesn't provide complete protection against the propagation of corruption

Slide 19

Slide 19 text

Offline Comparisons Method
● Advantages
  ○ Easy to develop
  ○ Has no impact on ZooKeeper's performance
● Disadvantages
  ○ Inefficient
    ■ Need to copy replica data every time the consistency checker runs
    ■ Can't detect faults unless the external consistency checker runs
    ■ Doesn't protect against the propagation of corruption

Slide 20

Slide 20 text

Realtime Detection
● Advantages
  ○ Detects faults as soon as they occur (as data is changing)
  ○ This helps in preventing the propagation of corruption
  ○ Provides very specific context around the time when the fault occurred
    ■ This helps in root cause analysis (RCA) of arbitrary behavior
● Disadvantages
  ○ Impacts overall performance of the system
  ○ Replicas compute a digest on every commit, which adds extra CPU load on all replicas
  ○ Computation of predictive digests adds additional load on the CPU of the leader server
    ■ This slows down all state-modifying operations

Slide 21

Slide 21 text

Impact of different request sizes
[Figure panels: Online Comparison, Realtime Detection]

Slide 22

Slide 22 text

Impact of different request sizes
