
CAPtain Obvious


Kinda old presentation I did on CAP, Vector Clocks and CRDTs


Yann Schwartz

November 28, 2013

Transcript

  1. Même pas CAP (a French schoolyard dare, roughly "Bet you can't", punning on CAP) • Yann Schwartz • @abolibibelot • Paris Data Geeks, Nov 2013
  2. In the beginning

  3. ACID • Atomic • Consistent • Isolated • Durable

  4. Wait… • Things don’t fit on disk • Things don’t fit in memory • Things are CPU/IO bound • My server goes down
  5. Problem solved! • I’ll use: • master / slave • sharding • masterless nodes • an infinite number of nodes!
  6. Wait… • Distributed systems fallacies: • the network is reliable • latency is zero • bandwidth is infinite • the network is secure • topology doesn't change • there is one administrator • transport cost is zero • the network is homogeneous
  7. What we want • Consistent • Available • Partition Tolerant

  8. Killjoy • Brewer 2000 - CAP conjecture • Gilbert, Lynch 2002 - CAP Theorem?
  9. (image only, no text)
  10. OK, I choose C • Nice, predictable • Need synchronisation and consensus • Linearizability - total order of operations
  11. Killjoy - the sequel • "Anything which needs agreement will eventually fail at scale" - Werner Vogels
  12. (image only, no text)
  13. OK, A then? • I can always read (staleness) • I can always write (conflicts)
  14. (image only, no text)
  15. Eventually consistent • “Dynamo: Amazon’s Highly Available Key-value Store”, DeCandia et al. (including Werner Vogels), 2007
  16. Dynamo-like • No master • Consistent hashing • Replication • Eventually consistent • Quorum (R, W, DW, etc.)
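Two of the building blocks above can be made concrete. In consistent hashing, a key's replicas are the first N distinct nodes found by walking clockwise around a hash ring from the key's position. With strict quorums, every read set of R replicas overlaps every write set of W replicas exactly when R + W > N. The sketch below is minimal and assumes MD5 for ring positions and 8 virtual nodes per server. Both are arbitrary choices here, and `HashRing`, `preference_list` and `quorums_overlap` are hypothetical names, not any product's API. Dynamo itself uses sloppy quorums, so the overlap guarantee only holds for the strict case.

```python
import bisect
import hashlib


def ring_position(key: str) -> int:
    """Map a string to a point on the ring (MD5 digest read as a 128-bit int; an assumption)."""
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)


class HashRing:
    """Consistent hashing ring with virtual nodes (hypothetical, simplified sketch)."""

    def __init__(self, nodes, vnodes_per_node=8):
        points = []
        for node in nodes:
            for i in range(vnodes_per_node):
                # Each physical node owns several points, which smooths the key distribution.
                points.append((ring_position(f"{node}#{i}"), node))
        points.sort()
        self._points = points
        self._positions = [pos for pos, _ in points]

    def preference_list(self, key, n):
        """Walk clockwise from the key's position and collect the first n distinct nodes."""
        start = bisect.bisect(self._positions, ring_position(key))
        result = []
        for offset in range(len(self._points)):
            _, node = self._points[(start + offset) % len(self._points)]
            if node not in result:
                result.append(node)
                if len(result) == n:
                    break
        return result


def quorums_overlap(n, r, w):
    """With N replicas, any R-set and any W-set must intersect iff R + W > N (pigeonhole)."""
    return r + w > n


ring = HashRing(["node-a", "node-b", "node-c", "node-d"])
replicas = ring.preference_list("user:42", n=3)
print(replicas)
print(quorums_overlap(3, 2, 2))  # True
print(quorums_overlap(3, 1, 1))  # False
```

Adding or removing a node only moves the keys adjacent to its points on the ring, which is why this layout suits a masterless cluster.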
  17. The problem of time • It’s possible to reason about event timestamps if: • there’s a total order of timestamps • timestamps are unique • if t2 > t1 then t2 happened AFTER t1
  18. The problem of time • “Time, Clocks, and the Ordering of Events in a Distributed System”, Leslie Lamport, 1978
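Lamport's paper replaces physical time with logical clocks. Each process keeps a counter and bumps it on every local event or send. On receive, it jumps to max(local, received) + 1. This gives the guarantee "if a happened before b then C(a) < C(b)", but not the converse, so a Lamport timestamp alone cannot tell you whether two events were concurrent. The sketch below is minimal, and the class and method names are illustrative.

```python
class LamportClock:
    """Lamport logical clock: one integer counter per process (illustrative sketch)."""

    def __init__(self):
        self.time = 0

    def tick(self):
        """Local event or message send: advance the counter and return the new timestamp."""
        self.time += 1
        return self.time

    def receive(self, msg_time):
        """Message receipt: move past both the local clock and the sender's timestamp."""
        self.time = max(self.time, msg_time) + 1
        return self.time


p1, p2 = LamportClock(), LamportClock()
send_ts = p1.tick()            # p1 sends a message stamped 1
p2.tick()                      # p2 has two unrelated local events...
p2.tick()                      # ...so its clock is at 2
recv_ts = p2.receive(send_ts)  # max(2, 1) + 1 = 3
print(send_ts, recv_ts)        # 1 3
```

To get a total order, as in the paper, ties between equal counters are broken by process id.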
  19. Timestamps are useless • Happened at t? • Happened before • Causality is better
  20. Vector clocks
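A vector clock keeps one counter per node. That is enough to recover the happened-before relation and to flag concurrent, and therefore conflicting, updates. The sketch below is minimal and uses plain dicts. The node ids "A" and "B" and the function names are illustrative, not a specific store's API.

```python
from typing import Dict

VectorClock = Dict[str, int]  # node id -> number of events seen from that node


def increment(clock: VectorClock, node: str) -> VectorClock:
    """Return a copy of the clock with this node's own entry advanced (local event)."""
    updated = dict(clock)
    updated[node] = updated.get(node, 0) + 1
    return updated


def merge(a: VectorClock, b: VectorClock) -> VectorClock:
    """Element-wise maximum, used when a node learns about another node's history."""
    return {n: max(a.get(n, 0), b.get(n, 0)) for n in set(a) | set(b)}


def compare(a: VectorClock, b: VectorClock) -> str:
    """Classify two clocks as 'equal', 'before', 'after' or 'concurrent'."""
    nodes = set(a) | set(b)
    a_le_b = all(a.get(n, 0) <= b.get(n, 0) for n in nodes)
    b_le_a = all(b.get(n, 0) <= a.get(n, 0) for n in nodes)
    if a_le_b and b_le_a:
        return "equal"
    if a_le_b:
        return "before"
    if b_le_a:
        return "after"
    return "concurrent"


v0 = increment({}, "A")  # A writes: {'A': 1}
v1 = increment(v0, "A")  # A writes again: {'A': 2}
v2 = increment(v0, "B")  # B writes after seeing v0: {'A': 1, 'B': 1}
print(compare(v0, v1))   # before
print(compare(v1, v2))   # concurrent: a conflict someone has to resolve
print(sorted(merge(v1, v2).items()))  # [('A', 2), ('B', 1)]
```

Unlike a single timestamp, the "concurrent" result tells the store that neither write supersedes the other. In that case both values have to be kept or reconciled.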

  21. Conflict resolution is tricky • Is there a better way?

  22. CRDTs! • Conflict-free • Replicated • Data Types

  23. CRDT

  24. Definition • A partially ordered set $S$ is a join semi-lattice (JSL) iff $\mathrm{LUB}(x, y)$ exists for all $x, y \in S$ (LUB = least upper bound) • An object that takes its values from a JSL and whose merge is $\mathrm{merge}(x, y) = \mathrm{LUB}(x, y)$ converges towards the LUB of all replica states
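The grow-only counter (G-Counter) is a classic instance of this definition. Its states are maps from replica id to count, ordered entry by entry. The LUB of two states is their element-wise maximum. The sketch below is minimal and hypothetical: the class and method names are illustrative, and each replica is assumed to update only its own entry.

```python
from typing import Dict


class GCounter:
    """Grow-only counter; states are ordered entry by entry (illustrative sketch)."""

    def __init__(self, replica_id: str):
        self.replica_id = replica_id
        self.counts: Dict[str, int] = {}

    def increment(self, amount: int = 1) -> None:
        """Only this replica's own entry grows, so the state only moves up the lattice."""
        if amount < 0:
            raise ValueError("a G-Counter can only grow")
        self.counts[self.replica_id] = self.counts.get(self.replica_id, 0) + amount

    def value(self) -> int:
        return sum(self.counts.values())

    def merge(self, other: "GCounter") -> None:
        """merge = LUB = element-wise max: commutative, associative and idempotent."""
        for rid, count in other.counts.items():
            self.counts[rid] = max(self.counts.get(rid, 0), count)


a, b = GCounter("a"), GCounter("b")
a.increment(2)
b.increment(3)
a.merge(b)
b.merge(a)
a.merge(b)                   # merging again changes nothing (idempotence)
print(a.value(), b.value())  # 5 5
```

Because max is commutative, associative and idempotent, replicas can exchange states in any order and any number of times and still agree.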
  25. (image only, no text)
  26. (image only, no text)
  27. Complex data types: • registers • counters • sets • maps • graphs
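Richer types are typically built by composing simpler lattices. For example, a counter that also supports decrements (often called a PN-Counter) can be sketched as a pair of grow-only maps, one for increments and one for decrements. Each map is merged by element-wise maximum, and the value is their difference. The sketch below is minimal, and the names in it (`PNCounter`, `_max_merge`) are hypothetical.

```python
from typing import Dict


def _max_merge(a: Dict[str, int], b: Dict[str, int]) -> Dict[str, int]:
    """Element-wise max: the LUB of two grow-only count maps."""
    return {k: max(a.get(k, 0), b.get(k, 0)) for k in set(a) | set(b)}


class PNCounter:
    """Increment/decrement counter built from two grow-only maps (illustrative sketch)."""

    def __init__(self, replica_id: str):
        self.replica_id = replica_id
        self.incs: Dict[str, int] = {}  # increments per replica, only ever grows
        self.decs: Dict[str, int] = {}  # decrements per replica, only ever grows

    def increment(self, amount: int = 1) -> None:
        if amount < 0:
            raise ValueError("use decrement() for negative amounts")
        self.incs[self.replica_id] = self.incs.get(self.replica_id, 0) + amount

    def decrement(self, amount: int = 1) -> None:
        if amount < 0:
            raise ValueError("use increment() for negative amounts")
        self.decs[self.replica_id] = self.decs.get(self.replica_id, 0) + amount

    def value(self) -> int:
        return sum(self.incs.values()) - sum(self.decs.values())

    def merge(self, other: "PNCounter") -> None:
        # The pair of maps forms a product lattice, so merging each half by max is still a LUB.
        self.incs = _max_merge(self.incs, other.incs)
        self.decs = _max_merge(self.decs, other.decs)


x, y = PNCounter("x"), PNCounter("y")
x.increment(5)
y.decrement(2)
x.merge(y)
y.merge(x)
print(x.value(), y.value())  # 3 3
```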
  28. The data converges to a value without synchronisation

  29. Problems • Need to keep a lot of data
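Removals are one place where that extra data comes from. In a state-based observed-remove set (OR-Set), every add gets a unique tag, and a remove records the tags it has observed as tombstones. As a result the state only ever grows. Reclaiming those tombstones safely is one of the things garbage collection for CRDTs aims at. The sketch below is minimal and assumes random UUIDs as tags. `ORSet` and its methods are hypothetical names.

```python
import uuid
from typing import Hashable, Set, Tuple

Tag = Tuple[Hashable, str]  # (element, unique tag)


class ORSet:
    """Observed-remove set, state-based; removed tags are kept as tombstones (sketch)."""

    def __init__(self):
        self.adds: Set[Tag] = set()
        self.removes: Set[Tag] = set()  # tombstones: never shrink without extra GC

    def add(self, element: Hashable) -> None:
        # A fresh tag per add, so an element can be re-added after a remove.
        self.adds.add((element, uuid.uuid4().hex))

    def remove(self, element: Hashable) -> None:
        # Only locally observed tags are removed; concurrent adds elsewhere survive.
        self.removes |= {t for t in self.adds if t[0] == element}

    def contains(self, element: Hashable) -> bool:
        return any(t[0] == element for t in self.adds - self.removes)

    def merge(self, other: "ORSet") -> None:
        # Both components are grow-only sets, so union is the LUB.
        self.adds |= other.adds
        self.removes |= other.removes


r1, r2 = ORSet(), ORSet()
r1.add("milk")
r2.merge(r1)
r2.remove("milk")  # r2 removes the "milk" it has seen...
r1.add("milk")     # ...while r1 concurrently adds it again
r1.merge(r2)
r2.merge(r1)
print(r1.contains("milk"), r2.contains("milk"))  # True True (add wins)
print(len(r1.adds), len(r1.removes))             # 2 1
```

The set holds a single element, yet the replicas already carry two tags and one tombstone for it.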

  30. What’s coming up • Garbage collection for CRDTs • Server-side CRDTs • Cassandra • Riak 2.0
  31. Credits • Photo: Greg Pembroke (http://www.reasonsmysoniscrying.com/)