What We Talk About When We Talk About Distributed Systems

Transcription of this talk with list of resources:

http://videlalvaro.github.io/2015/12/learning-about-distributed-systems.html

Distributed systems are a complex topic. Research on them is abundant, but it can be hard for a beginner to know where to start. I would like to outline the main concepts of distributed systems so that interested people have a clear path for starting their own research.

In this talk I will review the different models: asynchronous vs. synchronous distributed systems; message passing vs. shared memory communication; failure detectors and leader election problems; consensus and different kinds of replication.

I will also review a series of books on distributed systems in order to recommend the best one according to the topics we would like to learn about, or the problems we would like to solve.

The goal of the talk is to set a good foundation for people interested in learning more about distributed systems.

Alvaro Videla

June 17, 2015

Transcript

  1. WHAT WE TALK ABOUT
    WHEN WE TALK ABOUT
    DISTRIBUTED
    SYSTEMS
    ALVARO VIDELA - RABBITMQ

  2. DISTRIBUTED
    SYSTEMS FOR
    THE IKEA FAMILY
    ALVARO VIDELA - RABBITMQ

  3. DISTRIBUTED
    SYSTEMS

  4. “A DISTRIBUTED SYSTEM IS
    ONE IN WHICH THE FAILURE
    OF A COMPUTER YOU DID NOT
    EVEN KNOW EXISTED CAN
    RENDER YOUR OWN
    COMPUTER UNUSABLE”
    Leslie Lamport

  7. Google: define jargon

  8. DISTRIBUTED
    SYSTEMS

  9. DISTRIBUTED
    SYSTEMS
    • Many entities trying to solve a problem
    (nodes, processes)

  10. DISTRIBUTED
    SYSTEMS
    • Many entities trying to solve a problem
    (nodes, processes)
    • Partial Knowledge

  11. DISTRIBUTED
    SYSTEMS
    • Many entities trying to solve a problem
    (nodes, processes)
    • Partial Knowledge
    • Uncertainty

  12. DEEP RABBIT
    HOLE

  13. WHAT TO
    READ?

  14. WHICH
    PAPERS?

  20. WHICH BOOKS?

  22. WHY?

  23. http://tobielangel.com

  24. THE PROBLEM

  25. DIFFERENT MODELS

  26. DIFFERENT MODELS
    • Timing Model

  27. DIFFERENT MODELS
    • Timing Model
    • Inter Process Communication Used (IPC
    method)

  28. DIFFERENT MODELS
    • Timing Model
    • Inter Process Communication Used (IPC
    method)
    • Failure Modes

  29. TIMING MODEL

  30. TIMING MODEL
    • Synchronous Model

  31. TIMING MODEL
    • Synchronous Model
    • Asynchronous Model

  32. TIMING MODEL
    • Synchronous Model
    • Asynchronous Model
    • Semi-synchronous Model

  33. INTERPROCESS
    COMMUNICATION

  34. INTERPROCESS
    COMMUNICATION
    • Message Passing

  35. INTERPROCESS
    COMMUNICATION
    • Message Passing
    • Shared Memory

  36. FAILURE MODES

  37. FAILURE MODES
    • Crash-stop

  38. FAILURE MODES
    • Crash-stop
    • Crash-recovery

  39. FAILURE MODES
    • Crash-stop
    • Crash-recovery
    • Omission Faults

  40. FAILURE MODES
    • Crash-stop
    • Crash-recovery
    • Omission Faults
    • Arbitrary Failures Mode (Byzantine)

  41. LIVENESS AND
    SAFETY

  42. LIVENESS AND SAFETY
    PROPERTIES OF ALGORITHMS

  43. SAFETY
    Some “bad” thing does not
    happen during execution

  44. SAFETY
    “Communication links should not
    invent messages out of thin air”

  45. LIVENESS
    A “good” thing happens during
    execution

  46. LIVENESS
    “A destination process eventually
    delivers the message”

  47. LET’S TAKE A LOOK
    AT FLP [1]
    [1] Fischer, Lynch, Paterson

  49. IMPOSSIBILITY OF DISTRIBUTED
    CONSENSUS WITH ONE FAULTY
    PROCESS

  54. WHAT’S CONSENSUS
    ANYWAY?

  55. “THE CONSENSUS
    PROBLEM IS A PARADIGM
    OF AGREEMENT
    PROBLEMS”
    https://dl.acm.org/citation.cfm?id=1052796.1052806

  56. PROPERTIES OF
    CONSENSUS

  57. PROPERTIES OF
    CONSENSUS
    • C-Termination: Every correct process eventually decides on some value

  58. PROPERTIES OF
    CONSENSUS
    • C-Termination: Every correct process eventually decides on some value
    • C-Validity: If a process decides v, then v was proposed by some process

  59. PROPERTIES OF
    CONSENSUS
    • C-Termination: Every correct process eventually decides on some value
    • C-Validity: If a process decides v, then v was proposed by some process
    • C-Agreement: No two correct processes decide differently

  60. PROPERTIES OF UNIFORM
    CONSENSUS
    • C-Termination: Every correct process eventually decides on some value
    • C-Validity: If a process decides v, then v was proposed by some process
    • C-Agreement: No two correct processes decide differently
    • C-Uniform Agreement: No two processes (correct or not) decide
    differently.
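
The properties above can be written as predicates over the outcome of a finished run. This is a minimal sketch, not part of the talk: the `Run` record and its field names are hypothetical, `None` is assumed never to be a proposed value, and "eventually" can only be checked here because the run is assumed to have ended.

```python
from dataclasses import dataclass
from typing import Dict, Hashable, Optional, Set


@dataclass
class Run:
    """Outcome of one finished consensus run (hypothetical record format)."""
    proposed: Dict[str, Hashable]            # process id -> value it proposed
    decided: Dict[str, Optional[Hashable]]   # process id -> decided value, or None
    correct: Set[str]                        # processes that never crashed


def termination(run: Run) -> bool:
    # C-Termination: every correct process decided on some value.
    return all(run.decided.get(p) is not None for p in run.correct)


def validity(run: Run) -> bool:
    # C-Validity: every decided value was proposed by some process.
    proposals = set(run.proposed.values())
    return all(v in proposals for v in run.decided.values() if v is not None)


def agreement(run: Run) -> bool:
    # C-Agreement: no two correct processes decide differently.
    values = {run.decided[p] for p in run.correct
              if run.decided.get(p) is not None}
    return len(values) <= 1


def uniform_agreement(run: Run) -> bool:
    # C-Uniform Agreement: no two processes, correct or not, decide differently.
    values = {v for v in run.decided.values() if v is not None}
    return len(values) <= 1
```

A run in which a process decides "b" and then crashes, while the correct processes decide "a", satisfies agreement but not uniform agreement, which is the difference the last bullet captures.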

  61. WE NEED CONSENSUS
    WHEN:
    A SET OF PROCESSES
    HAVE TO AGREE TO TAKE
    A COMMON ACTION

  62. WE NEED CONSENSUS
    WHEN:
    A SET OF PROCESSES
    HAVE TO AGREE TO TAKE
    A COMMON ACTION
    Atomic
    Broadcast

  63. WE NEED CONSENSUS
    WHEN:
    A SET OF PROCESSES
    HAVE TO AGREE TO TAKE
    A COMMON ACTION
    Atomic
    Broadcast
    Group
    Membership

  64. ATOMIC BROADCAST
    “CORRECT PROCESSES
    DELIVER THE SAME SET OF
    MESSAGES IN THE SAME
    ORDER”
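
The quoted property can be checked against the delivery logs of a finished run. This is only a sketch of the check, not an atomic broadcast implementation: the function name and the log format (process id mapped to the list of message ids it delivered, in order) are assumptions made for the example.

```python
from typing import Dict, List, Set


def same_total_order(delivered: Dict[str, List[str]], correct: Set[str]) -> bool:
    """True if all correct processes delivered the same messages in the same order."""
    logs = [delivered[p] for p in sorted(correct)]
    # Comparing whole lists checks both the set of messages and their order.
    return all(log == logs[0] for log in logs[1:])
```

Only the logs of correct processes are compared, so a crashed process that delivered a prefix or a different sequence does not violate the property as quoted.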

  65. FLP TELLS US THAT IF
    CONSENSUS CANNOT BE
    ACHIEVED, THEN ATOMIC
    BROADCAST OR GROUP
    MEMBERSHIP CANNOT BE
    ACHIEVED EITHER

  66. SO, WE PACK OUR BAGS
    AND GO?
    NOTHING TO SEE HERE?

  67. STUMBLING OVER
    CONSENSUS RESEARCH:
    MISUNDERSTANDING AND
    ISSUES
    Marcos K. Aguilera

  68. FAILURE
    DETECTORS

  70. FAILURE DETECTORS

  71. FAILURE DETECTORS
    • External process

  72. FAILURE DETECTORS
    • External process
    • Provides information about suspected processes

  73. FAILURE DETECTORS
    • External process
    • Provides information about suspected processes
    • Completeness property (crashed processes are
    detected)

  74. FAILURE DETECTORS
    • External process
    • Provides information about suspected processes
    • Completeness property (crashed processes are
    detected)
    • Accuracy (correct processes are never suspected)
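
The bullets above can be sketched as a timeout-based detector. This is a minimal illustration, not the detector of any particular system: the class name, the `heartbeat` and `suspected` methods, and the explicit `now` argument (used instead of a real clock so the example is deterministic) are all assumptions.

```python
from typing import Dict, Iterable, Set


class HeartbeatFailureDetector:
    """Suspects any peer whose last heartbeat is older than a fixed timeout."""

    def __init__(self, peers: Iterable[str], timeout: float) -> None:
        self.timeout = timeout
        # Treat every peer as last heard from at time 0.
        self.last_seen: Dict[str, float] = {p: 0.0 for p in peers}

    def heartbeat(self, peer: str, now: float) -> None:
        # Record that a heartbeat from `peer` arrived at time `now`.
        self.last_seen[peer] = now

    def suspected(self, now: float) -> Set[str]:
        # Information about suspected processes: silent for longer than timeout.
        return {p for p, t in self.last_seen.items() if now - t > self.timeout}
```

A crashed process stops sending heartbeats and ends up suspected, which gives completeness. Accuracy is the harder property: with a fixed timeout, a correct but slow process or a delayed message produces a false suspicion.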

  75. “RUB SOME PERFECT
    FAILURE DETECTOR
    ON IT”

  76. PERFECT FAILURE
    DETECTOR
    http://www.amazon.com/Introduction-Reliable-Secure-Distributed-Programming/dp/3642152597

  77. EVENTUALLY ACCURATE
    FAILURE DETECTOR

  78. EVENTUALLY ACCURATE
    FAILURE DETECTOR
    • Strong Completeness: Eventually, every
    process that crashes is permanently
    suspected by every correct process.

  79. EVENTUALLY ACCURATE
    FAILURE DETECTOR
    • Strong Completeness: Eventually, every
    process that crashes is permanently
    suspected by every correct process.
    • Eventual Weak Accuracy: There is a time
    after which some correct process is never
    suspected by the correct processes.

  80. EVENTUALLY ACCURATE
    FAILURE DETECTOR
    • Strong Completeness: Eventually, every
    process that crashes is permanently
    suspected by every correct process.
    • Eventual Weak Accuracy: There is a time
    after which some correct process is never
    suspected by the correct processes.
    http://dl.acm.org/citation.cfm?id=1052806
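
One common way to approach these properties is to let the detector make mistakes and correct them by increasing its timeout. The sketch below illustrates that increasing-timeout idea under the assumption that message delays are eventually bounded; it is not taken from the slides, and the class name, method names, and the explicit `now` argument are hypothetical.

```python
from typing import Dict, Iterable, Set


class AdaptiveFailureDetector:
    """Suspects silent peers, and grows a peer's timeout after a false suspicion."""

    def __init__(self, peers: Iterable[str], initial_timeout: float, step: float) -> None:
        peers = list(peers)
        self.step = step
        self.timeout: Dict[str, float] = {p: initial_timeout for p in peers}
        self.last_seen: Dict[str, float] = {p: 0.0 for p in peers}
        self.suspects: Set[str] = set()

    def heartbeat(self, peer: str, now: float) -> None:
        self.last_seen[peer] = now
        if peer in self.suspects:
            # The suspicion was wrong: trust the peer again and be more patient.
            self.suspects.discard(peer)
            self.timeout[peer] += self.step

    def check(self, now: float) -> Set[str]:
        # Suspect every peer that has been silent for longer than its timeout.
        for peer, seen in self.last_seen.items():
            if now - seen > self.timeout[peer]:
                self.suspects.add(peer)
        return set(self.suspects)
```

A crashed peer never sends another heartbeat, so it stays suspected permanently. A correct peer can be suspected only until its timeout has grown past the eventual bound on delays, after which it is no longer suspected.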

  82. QUORUMS

  83. TL;DR:
    INTERSECTING
    SETS

  84. “A QUORUM IN A SYSTEM WITH N
    CRASH-FAULT PROCESS ABSTRACTIONS
    […] IS ANY MAJORITY OF
    PROCESSES, I.E., ANY SET OF MORE
    THAN N/2 PROCESSES”
    QUORUMS

  85. “IF F < N/2 PROCESSES FAIL BY
    CRASHING, THERE IS ALWAYS AT
    LEAST ONE QUORUM OF
    NONCRASHED PROCESSES IN SUCH
    SYSTEMS”
    QUORUMS
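
Both quorum statements can be checked by brute force for small N. This is a minimal sketch with hypothetical function names. It enumerates only the smallest majorities, which is enough because any larger quorum contains one of them.

```python
from itertools import combinations


def quorum_size(n: int) -> int:
    # Smallest number of processes that is more than n/2.
    return n // 2 + 1


def all_quorums_intersect(n: int) -> bool:
    # Any two majorities of the same n processes share at least one process.
    smallest = [set(c) for c in combinations(range(n), quorum_size(n))]
    return all(a & b for a, b in combinations(smallest, 2))


def quorum_survives(n: int, f: int) -> bool:
    # After f crashes, do the n - f remaining processes still form a quorum?
    return n - f >= quorum_size(n)
```

For N = 5 a quorum has 3 processes, so two crashes still leave a quorum. For N = 4 a quorum also needs 3 processes, so two crashes (which is not fewer than N/2) leave none.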

  86. CONSISTENCY

  88. CONCURRENT
    FIFO QUEUE

  89. CONSISTENCY
    CONDITIONS

  90. CONSISTENCY
    CONDITIONS
    • Atomic Consistency (Linearizability)

  91. CONSISTENCY
    CONDITIONS
    • Atomic Consistency (Linearizability)
    • Sequential Consistency

  92. CONSISTENCY
    CONDITIONS
    • Atomic Consistency (Linearizability)
    • Sequential Consistency
    • Causal Consistency

  93. CONSISTENCY
    CONDITIONS
    • Atomic Consistency (Linearizability)
    • Sequential Consistency
    • Causal Consistency
    https://aphyr.com/posts/313-strong-consistency-models

  94. LINEARIZABILITY
    http://www.amazon.com/Distributed-Algorithms-Message-Passing-Systems-Michel/dp/3642381227/
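
For a tiny history of a concurrent FIFO queue, linearizability can be checked by brute force: look for an ordering of the operations that respects real time (an operation that finished before another started comes first) and is a legal sequential queue execution. This is a sketch for illustration only. The `Op` record is hypothetical, every dequeue is assumed to return a value, and trying all permutations is only feasible for a handful of operations.

```python
from collections import deque
from dataclasses import dataclass
from itertools import permutations
from typing import Sequence


@dataclass(frozen=True)
class Op:
    kind: str      # "enq" or "deq"
    value: str     # value enqueued, or value the dequeue returned
    start: float   # invocation time
    end: float     # response time


def respects_real_time(order: Sequence[Op]) -> bool:
    # No operation may be placed after one that started only after it finished.
    return not any(later.end < earlier.start
                   for i, earlier in enumerate(order)
                   for later in order[i + 1:])


def legal_fifo(order: Sequence[Op]) -> bool:
    # Replay the operations one at a time against an ordinary queue.
    queue: deque = deque()
    for op in order:
        if op.kind == "enq":
            queue.append(op.value)
        elif not queue or queue.popleft() != op.value:
            return False
    return True


def linearizable(history: Sequence[Op]) -> bool:
    return any(respects_real_time(order) and legal_fifo(order)
               for order in permutations(history))
```

If two enqueues overlap in time, either order is acceptable, so a later dequeue may return either value. If they do not overlap, the dequeue has to return the value that was enqueued first.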

  96. SOME BOOKS

  97. http://www.amazon.com/Distributed-Algorithms-Message-Passing-Systems-Michel/dp/3642381227/

  98. http://www.amazon.com/Fault-tolerant-Agreement-Synchronous-Message-passing-Distributed/dp/1608455254/

  99. http://www.amazon.com/Communication-Abstractions-Fault-tolerant-Asynchronous-Distributed/dp/160845293X/

  100. http://www.amazon.com/Distributed-Algorithms-Kaufmann-Management-Systems/dp/1558603484/

  101. http://www.amazon.com/Introduction-Reliable-Secure-Distributed-Programming/dp/3642152597

  102. http://www.amazon.com/Guide-Reliable-Distributed-Systems-High-Assurance/dp/1447124154/

  103. http://www.amazon.com/Replication-Practice-Lecture-Computer-Theoretical/dp/3642112935/

  104. FINDING NON
    PAYWALLED PAPERS

  105. CONCLUSION

  106. CONCLUSION
    • Deep Rabbit Hole

  107. CONCLUSION
    • Deep Rabbit Hole
    • Computing Science where Science is Still a Thing™

  108. CONCLUSION
    • Deep Rabbit Hole
    • Computing Science where Science is Still a Thing™
    • History of the Field Matters

  109. CONCLUSION
    • Deep Rabbit Hole
    • Computing Science where Science is Still a Thing™
    • History of the Field Matters
    • Read, read, read

  110. THANKS!
    @old_sound
