
PWLSF#5=> Henry Robinson on FLP: Imp of Distributed Consensus w One Faulty Process

Henry Robinson from Cloudera presents the paper "Impossibility of Distributed Consensus with One Faulty Process" by Fischer, Lynch and Paterson. This paper won the Dijkstra award, given to the most influential papers in distributed computing, so make sure you don't miss this!

Note that Henry will be focusing on the JACM version of the paper, not the PODS version. The JACM version is linked in the paper title above and you can also find it here.

If anyone really wants extra reading, you might consider the following:

• FLP proof walkthrough from Henry's blog (http://the-paper-trail.org/blog/a-brief-tour-of-flp-impossibility/)

• Lynch's '100 impossibility results for distributed computing' (http://groups.csail.mit.edu/tds/papers/Lynch/MIT-LCS-TM-394.pdf)

• Chandra and Toueg's 'The weakest failure detector for solving consensus' (http://www.cs.utexas.edu/~lorenzo/corsi/cs380d/papers/p685-chandra.pdf)

Henry's Bio

Henry is an engineer at Cloudera, where he has worked for five years on a wide variety of distributed systems. He currently works full-time on Impala, a SQL query engine for data stored in HDFS. Before Cloudera, he worked on ad-hoc networking at Cambridge University. He writes infrequently about databases and distributed systems at http://the-paper-trail.org/

Papers_We_Love

July 25, 2014

Transcript

1. • Software engineer at Cloudera since 2009 • My interests are in databases and distributed systems • I write about them - in particular, about papers in those areas - at http://the-paper-trail.org
2. Papers We Love, San Francisco Edition, July 24th, 2014. Henry Robinson, [email protected] / @henryr. Papers of which we are quite fond
3. • Impossibility of Distributed Consensus with One Faulty Process, by Fischer, Lynch and Paterson (1985) • Dijkstra award winner 2001
4. • Walk through the proof (leaving rigour for the paper itself) • Show how this gives rise to a framework for thinking about distributed systems
5. • Consensus is the problem of having a set of processes agree on a value proposed by one of those processes
6. • Validity: the value agreed upon must have been proposed by some process • Termination: at least one non-faulty process eventually decides • Agreement: all deciding processes agree on the same value
7. • Validity: the value agreed upon must have been proposed by some process - safety • Termination: at least one non-faulty process eventually decides - liveness • Agreement: all deciding processes agree on the same value - safety
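The three properties on the slide above can be phrased as checks over a recorded run. A minimal sketch in Python - the log format, process names and the run itself are invented for illustration, not taken from any particular protocol:

```python
# Check the three consensus properties over a recorded run (hypothetical format).
# proposals: value proposed by each process; decisions: value decided (None = undecided);
# faulty: set of processes that crashed during the run.

def check_validity(proposals, decisions):
    # Safety: every decided value must have been proposed by some process.
    return all(v in proposals.values() for v in decisions.values() if v is not None)

def check_agreement(decisions):
    # Safety: all deciding processes decide the same value.
    decided = [v for v in decisions.values() if v is not None]
    return len(set(decided)) <= 1

def check_termination(decisions, faulty):
    # Liveness: at least one non-faulty process eventually decides.
    return any(v is not None for p, v in decisions.items() if p not in faulty)

proposals = {"p0": 0, "p1": 1, "p2": 1}
decisions = {"p0": 1, "p1": 1, "p2": None}   # p2 crashed before deciding
assert check_validity(proposals, decisions)
assert check_agreement(decisions)
assert check_termination(decisions, faulty={"p2"})
```

Note that validity and agreement constrain every run, while termination is the one FLP shows cannot always be guaranteed.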
8. Replicated State Machines [diagram: a client and Nodes 1-3, each holding a log N-3, N-2, N-1, N = S] 1: Client proposes state N should be S. 2: Magic consensus protocol. 3: New state written to log.
9. Strong Leader Election [diagram: a cast of millions] 1: Who's the leader? 2: Magic consensus protocol. 3: There can only be one.
10. Distributed consensus is impossible when at least one process might fail. No algorithm solves consensus in every case.
11. Hierarchy of Failure Modes • Crash failures: fail by stopping • Omission failures: fail by dropping messages
12. Hierarchy of Failure Modes • Crash failures: fail by stopping • Omission failures: fail by dropping messages • Byzantine failures: fail by doing whatever the hell I like
13. • The system model is the abstraction we layer over messy computers and networks in order to actually reason about them.
14. • Message deliveries are the only way that nodes may communicate • Messages are delivered in any order • But are never lost (c.f. crash model vs. omission model), and are always delivered exactly once
15. • Nodes do not have access to a shared clock • So they cannot mutually estimate the passage of time • Messages are the only way that nodes may co-ordinate with each other
16. Some definitions • Configuration: the state of every node in the system, plus the set of undelivered (but sent) messages • Initial configuration: what each node in the system would propose as the decision at time 0 • Univalent: a state from which only one decision is possible, no matter what messages are received (0-valent and 1-valent states can only decide 0 or 1 respectively) • Bivalent: a state from which either decision value is still possible.
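The valency definitions above can be made concrete by exhaustively exploring a toy system. The sketch below invents a trivial 2-process protocol (each process sends its input bit to the other and decides the AND of both bits) and classifies each initial configuration by enumerating every message-delivery order; no failures are modelled:

```python
# Classify initial configurations of a toy 2-process protocol by brute force.
# Toy protocol (invented for illustration): each process sends its input bit
# to the other; on receipt it decides the AND of its own bit and the received bit.
from itertools import permutations

def run(inputs, order):
    # order: the sequence in which the two messages (src, dst) are delivered.
    decisions = {}
    for (src, dst) in order:
        decisions[dst] = inputs[dst] & inputs[src]
    return decisions

def valency(inputs):
    # A configuration's valency = the set of decision values reachable from it.
    msgs = [(0, 1), (1, 0)]
    reachable = set()
    for order in permutations(msgs):
        reachable |= set(run(inputs, order).values())
    if reachable == {0}:
        return "0-valent"
    if reachable == {1}:
        return "1-valent"
    return "bivalent"

for inputs in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(inputs, valency(inputs))
```

With no failures, every initial configuration of this toy is univalent; FLP's Lemma 2 is the statement that once a single process may crash, any correct protocol must have some bivalent initial configuration.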
17. Proof sketch [diagram: an initial 'undecided' configuration; messages delivered, leading to an undecided state; more messages delivered, leading to another undecided state] Lemma 2: this (an undecided initial configuration) always exists! Lemma 3: you can always get here (another undecided state)!
18. 2-node system: C:00 → V:1, C:01 → V:0, C:11 → V:0, C:10 → V:1 (C:XY means process 0 has initial value X, process 1 has initial value Y)
19-20. Same diagram: these two configurations differ only at one node, but their valencies are different
21. Same diagram: "I decided 1!" - all executions of the protocol, i.e. the set of messages delivered
22. Same diagram: "I decided 0!" - all executions of the protocol, i.e. the set of messages delivered
23. Same diagram: "I decided 1!" / "I decided 0!" What if process 1 fails? Are the configurations any different?
24. Same diagram: "I decided 1!" / "I decided 0!", same execution. For the remaining processes: no difference in initial state, but a different outcome?!
25. [diagram] Some message e is sent in bivalent configuration C; consider the configurations reachable while e is not delivered, and, from each of those, the configuration where e arrives last.
26. [diagram] Call the set of configurations where e arrived last D. One of these must be bivalent.
27. • Consider the possibilities: • If one of the configurations in D is bivalent, we're done • Otherwise show that the lack of a bivalent state leads to contradiction • Do this by first showing that there must be both 0-valent and 1-valent configurations in D • and that this leads to a contradiction
28. [diagram: C (bivalent); 0-valent, e not received; 0-valent, e received] Either the protocol goes through D before it reaches the 0-valent configuration: 1. C moves to a 0-valent configuration before receiving e, 2. then e is received.
29. [diagram] Or the protocol gets to the 0-valent configuration after receiving e, in which case this state must also be 0-valent and in D: 1. e is received, 2. a 0-valent state is arrived at.
30. • There must be two configurations C0 and C1, separated by a single message m, where receiving e in Ci moves the configuration to Di • We will write that as Ci + e = Di • So C0 + m = C1 • and C0 + m + e = C1 + e = D1 • and C0 + e = D0
31. • Now consider the destinations of m and e. If they go to different processes, their receipt is commutative • C0 + m + e = D1 • C0 + e + m = D0 + m = D1 • Contradiction: D0 is 0-valent!
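The commutativity step above can be checked mechanically: delivering two messages addressed to different processes yields the same configuration in either order. A small sketch, where the configuration representation and state-update rule are invented for illustration:

```python
# Deliveries to different processes commute; deliveries to the same process need not.
# A configuration maps each process to its local state (here, a tuple of payloads).

def deliver(config, msg):
    # Delivering a message updates only the destination's state (append the payload).
    dst, payload = msg
    new = dict(config)
    new[dst] = new[dst] + (payload,)
    return new

C0 = {"p": (), "q": ()}
e = ("p", "x")   # e is addressed to p
m = ("q", "y")   # m is addressed to q

# Different destinations: receipt is commutative, as the proof requires.
assert deliver(deliver(C0, m), e) == deliver(deliver(C0, e), m)

# Same destination: order can matter - exactly the case the proof must
# handle separately (slide 32 onwards).
a = deliver(deliver(C0, ("p", "x")), ("p", "y"))
b = deliver(deliver(C0, ("p", "y")), ("p", "x"))
assert a != b
```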
32. • Instead, e and m might go to the same process p • Consider a deciding computation R from the original bivalent state C, in which p does nothing (i.e. looks like it failed) • Since, to get to D0 and D1, only e and m have been received, only p took any steps to get there • So R can be applied to both D0 and D1
33. • Since D0 and D1 are both univalent, the configurations D0 + R and D1 + R are both univalent
34. • Now remember: • A = C + R • D1 = C + m + e • D0 = C + e • But what about: • C + R + m + e = A + m + e = D1 + R => 1-valent • C + R + e = A + e = D0 + R => 0-valent • So both a 0-valent and a 1-valent configuration are reachable from A, even though R was a deciding run: contradiction
35. • Let e be some event that might be sent in configuration C. Then let D be the set of all configurations where e is received last, and let C be the set of configurations where e has not been received. • D either contains a bivalent configuration, or both 0- and 1-valent configurations. If it contains a bivalent configuration, we're done, so assume it does not. • Now there must be some C0 and C1 in C where C0 + e is 0-valent, but C1 + e is 1-valent, and C1 = C0 + e' • Consider the destinations of e' and e. If they are not the same, then C0 + e + e' = C0 + e' + e = C1 + e = D1 -> 1-valent. But C0 + e -> 0-valent. • If they are the same, then let A be the configuration reached by a deciding run from C0 in which p does nothing (looks like it failed). We can also apply that run from D0 and D1 to get to E0 and E1. But we can get from A to either E0 or E1 by applying e or e' + e. This is a contradiction.
36. "These results do not show that such problems cannot be 'solved' in practice; rather, they point up the need for more refined models of distributed computing that better reflect realistic assumptions about processor and communication timings, and for less stringent requirements on the solution to such problems. (For example, termination might be required only with probability 1.)"
37. Paxos • Paxos cleverly defers to its leader election scheme • If leader election is perfect, so is Paxos! • But perfect leader election is solvable iff consensus is • Impossibilities all the way down...
38. Randomized Consensus • A nice way to circumvent technical impossibilities: make their probability vanishingly small • Ben-Or gave an algorithm that terminates with probability 1 • (But the expected time to converge might be high)
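Ben-Or's idea can be illustrated with a toy simulation. Everything below is a deliberate simplification: rounds are synchronous, no process fails (f = 0), and every message arrives, so only the randomized-termination mechanics are shown - the real algorithm is asynchronous, waits for n-f messages per phase, and tolerates f < n/2 crash failures:

```python
import random

def ben_or_round(values, n, rng):
    # Phase 1: everyone reports its current value; with no failures and
    # reliable delivery, all processes see the same multiset of reports.
    counts = {v: values.count(v) for v in (0, 1)}
    # Propose v if a strict majority reported v, otherwise propose nothing.
    proposal = next((v for v in (0, 1) if counts[v] > n / 2), None)
    # Phase 2: with f = 0, a single matching proposal suffices to decide,
    # and everyone saw the same proposal, so everyone decides together.
    if proposal is not None:
        return [proposal] * n, proposal
    # No majority: each process flips a coin and tries again next round.
    return [rng.randint(0, 1) for _ in range(n)], None

def ben_or(initial, rng, max_rounds=10_000):
    n = len(initial)
    values, decision, rounds = list(initial), None, 0
    while decision is None and rounds < max_rounds:
        values, decision = ben_or_round(values, n, rng)
        rounds += 1
    return decision, rounds

decision, rounds = ben_or([0, 0, 1, 1], random.Random(42))
print(f"decided {decision} after {rounds} round(s)")
```

A 2-2 split forces coin flips, but each round has constant probability of producing a majority, so the run terminates with probability 1 - while the number of rounds is unbounded, which is how this sidesteps rather than contradicts FLP.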
39. Failure Detectors • Deep connection between the ability to tell whether a machine has failed and consensus • Lots of research into 'weak' failure detectors, and how weak they can be while still solving consensus
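A common practical approximation of a failure detector is timeout-based: suspect a process whose heartbeats have gone quiet, and stop suspecting it when one arrives. A minimal sketch using logical time (the class, names and timeout policy are invented for illustration):

```python
# Timeout-based failure detector sketch, driven by logical timestamps.
class HeartbeatDetector:
    def __init__(self, timeout):
        self.timeout = timeout
        self.last_seen = {}

    def heartbeat(self, process, now):
        # Record that `process` was alive at time `now`.
        self.last_seen[process] = now

    def suspected(self, process, now):
        # Suspect a process whose last heartbeat is older than the timeout.
        # In an asynchronous system this can be wrong (the process may just
        # be slow) - which is exactly the gap FLP exploits.
        last = self.last_seen.get(process)
        return last is None or now - last > self.timeout

d = HeartbeatDetector(timeout=3)
d.heartbeat("p1", now=0)
print(d.suspected("p1", now=2))   # False: heard from recently
print(d.suspected("p1", now=10))  # True: silent too long (perhaps wrongly suspected)
```

Detectors like this can suspect wrongly and later revise; the failure-detector literature asks how unreliable such an oracle can be while consensus remains solvable.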
40. • FLP and CAP are not the same thing (see http://the-paper-trail.org/blog/flp-and-cap-arent-the-same-thing/) • FLP is the stronger result, because its system model has fewer restrictions (crash-stop vs. omission)
41. • 100 Impossibility Proofs for Distributed Computing (Lynch, 1989) • The Weakest Failure Detector for Solving Consensus (Chandra and Toueg, 1996) • Sharing Memory Robustly in Message-Passing Systems (Attiya et al., 1995) • Wait-Free Synchronization (Herlihy, 1991) • Another Advantage of Free Choice: Completely Asynchronous Agreement Protocols (Ben-Or, 1983)