What We Talk About When We Talk About Distributed Systems

Transcription of this talk with list of resources:

http://videlalvaro.github.io/2015/12/learning-about-distributed-systems.html

Distributed systems are a complex topic. There is abundant research on them, but it can be hard for a beginner to know where to start. I would like to outline the main concepts of distributed systems, so that interested readers have a clear path for starting their own research.

In this talk I will review the different models: asynchronous vs. synchronous distributed systems; message passing vs. shared memory communication; failure detectors and leader election problems; consensus and different kinds of replication.

I will also review a series of books on distributed systems in order to recommend the best one according to the topics we would like to learn about, or the problems we would like to solve.

The goal of the talk is to set a good foundation for people interested in learning more about distributed systems.

Alvaro Videla

June 17, 2015

Transcript

  1. WHAT WE TALK ABOUT WHEN WE TALK ABOUT DISTRIBUTED SYSTEMS

    ALVARO VIDELA - RABBITMQ
  2. DISTRIBUTED SYSTEMS FOR THE IKEA FAMILY ALVARO VIDELA - RABBITMQ

  3. DISTRIBUTED SYSTEMS

  4. “A DISTRIBUTED SYSTEM IS ONE IN WHICH THE FAILURE OF

    A COMPUTER YOU DID NOT EVEN KNOW EXISTED CAN RENDER YOUR OWN COMPUTER UNUSABLE” Leslie Lamport
  5. None
  6. None
  7. Google: define jargon

  8. DISTRIBUTED SYSTEMS

  9. DISTRIBUTED SYSTEMS • Many entities trying to solve a problem

    (nodes, processes)
  10. DISTRIBUTED SYSTEMS • Many entities trying to solve a problem

    (nodes, processes) • Partial Knowledge
  11. DISTRIBUTED SYSTEMS • Many entities trying to solve a problem

    (nodes, processes) • Partial Knowledge • Uncertainty
  12. DEEP RABBIT HOLE

  13. WHAT TO READ?

  14. WHICH PAPERS?

  15. None
  16. None
  17. None
  18. None
  19. None
  20. WHICH BOOKS?

  21. None
  22. WHY?

  23. http://tobielangel.com

  24. THE PROBLEM

  25. DIFFERENT MODELS

  26. DIFFERENT MODELS • Timing Model

  27. DIFFERENT MODELS • Timing Model • Inter Process Communication Used

    (IPC method)
  28. DIFFERENT MODELS • Timing Model • Inter Process Communication Used

    (IPC method) • Failure Modes
  29. TIMING MODEL

  30. TIMING MODEL • Synchronous Model

  31. TIMING MODEL • Synchronous Model • Asynchronous Model

  32. TIMING MODEL • Synchronous Model • Asynchronous Model • Semi-synchronous

    Model
  33. INTERPROCESS COMMUNICATION

  34. INTERPROCESS COMMUNICATION • Message Passing

  35. INTERPROCESS COMMUNICATION • Message Passing • Shared Memory

  36. FAILURE MODES

  37. FAILURE MODES • Crash-stop

  38. FAILURE MODES • Crash-stop • Crash-recovery

  39. FAILURE MODES • Crash-stop • Crash-recovery • Omission Faults

  40. FAILURE MODES • Crash-stop • Crash-recovery • Omission Faults •

    Arbitrary Failures Mode (Byzantine)
  41. LIVENESS AND SAFETY

  42. LIVENESS AND SAFETY PROPERTIES OF ALGORITHMS

  43. SAFETY Some “bad” thing does not happen during execution

  44. SAFETY “Communication links should not invent messages out of thin

    air”
  45. LIVENESS A “good” thing happens during execution

  46. LIVENESS “A destination process eventually delivers the message”

  47. LET’S TAKE A LOOK AT FLP (Fischer, Lynch, Paterson)
  48. None
  49. IMPOSSIBILITY OF DISTRIBUTED CONSENSUS WITH ONE FAULTY PROCESS

  50. IMPOSSIBILITY OF DISTRIBUTED CONSENSUS WITH ONE FAULTY PROCESS

  51. IMPOSSIBILITY OF DISTRIBUTED CONSENSUS WITH ONE FAULTY PROCESS

  52. IMPOSSIBILITY OF DISTRIBUTED CONSENSUS WITH ONE FAULTY PROCESS

  53. IMPOSSIBILITY OF DISTRIBUTED CONSENSUS WITH ONE FAULTY PROCESS

  54. WHAT’S CONSENSUS ANYWAY?

  55. “THE CONSENSUS PROBLEM IS A PARADIGM OF AGREEMENT PROBLEMS” https://dl.acm.org/citation.cfm?id=1052796.1052806

  56. PROPERTIES OF CONSENSUS

  57. PROPERTIES OF CONSENSUS • C-Termination: Every correct process eventually decides

    on some value
  58. PROPERTIES OF CONSENSUS • C-Termination: Every correct process eventually decides

    on some value • C-Validity: If a process decides v, then v was proposed by some process
  59. PROPERTIES OF CONSENSUS • C-Termination: Every correct process eventually decides

    on some value • C-Validity: If a process decides v, then v was proposed by some process • C-Agreement: No two correct processes decide differently
  60. PROPERTIES OF UNIFORM CONSENSUS • C-Termination: Every correct process eventually

    decides on some value • C-Validity: If a process decides v, then v was proposed by some process • C-Agreement: No two correct processes decide differently • C-Uniform Agreement: No two processes (correct or not) decide differently.
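
The properties above can be checked mechanically against the record of a single finished run. The following is a minimal sketch, not something taken from the talk: the function name `check_consensus` and the dictionary-based run format are assumptions made for illustration. Termination is a liveness property, so on a finite run the check can only report whether every correct process had decided by the end of the record.

```python
from typing import Dict, Hashable, Optional, Set


def check_consensus(
    proposals: Dict[str, Hashable],
    decisions: Dict[str, Optional[Hashable]],
    correct: Set[str],
) -> Dict[str, bool]:
    """Check one recorded run against the consensus properties.

    proposals: value proposed by each process
    decisions: value decided by each process (None if it never decided)
    correct:   processes that did not crash during the run
    """
    # Keep only the processes that actually decided something.
    decided = {p: v for p, v in decisions.items() if v is not None}
    proposed_values = set(proposals.values())
    return {
        # Every correct process has decided (by the end of the record).
        "termination": all(p in decided for p in correct),
        # Every decided value was proposed by some process.
        "validity": all(v in proposed_values for v in decided.values()),
        # Correct processes decided at most one distinct value.
        "agreement": len({v for p, v in decided.items() if p in correct}) <= 1,
        # All processes, correct or not, decided at most one distinct value.
        "uniform_agreement": len(set(decided.values())) <= 1,
    }


# Hypothetical run: p3 decides "b" and then crashes, while p1 and p2 decide "a".
result = check_consensus(
    proposals={"p1": "a", "p2": "b", "p3": "a"},
    decisions={"p1": "a", "p2": "a", "p3": "b"},
    correct={"p1", "p2"},
)
```

In this hypothetical run plain agreement holds but uniform agreement does not, which is exactly the difference between slides 59 and 60.
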
  61. WE NEED CONSENSUS WHEN: A SET OF PROCESSES HAVE TO

    AGREE TO TAKE A COMMON ACTION
  62. WE NEED CONSENSUS WHEN: A SET OF PROCESSES HAVE TO

    AGREE TO TAKE A COMMON ACTION Atomic Broadcast
  63. WE NEED CONSENSUS WHEN: A SET OF PROCESSES HAVE TO

    AGREE TO TAKE A COMMON ACTION Atomic Broadcast Group Membership
  64. ATOMIC BROADCAST “CORRECT PROCESSES DELIVER THE SAME SET OF MESSAGES

    IN THE SAME ORDER”
  65. FLP TELLS US THAT IF CONSENSUS CANNOT BE ACHIEVED, THEN

    ATOMIC BROADCAST OR GROUP MEMBERSHIP CANNOT BE ACHIEVED EITHER
  66. SO, WE PACK OUR BAGS AND GO? NOTHING TO SEE

    HERE?
  67. STUMBLING OVER CONSENSUS RESEARCH: MISUNDERSTANDING AND ISSUES Marcos K. Aguilera

  68. FAILURE DETECTORS

  69. None
  70. FAILURE DETECTORS

  71. FAILURE DETECTORS • External process

  72. FAILURE DETECTORS • External process • Provides information about suspected

    processes
  73. FAILURE DETECTORS • External process • Provides information about suspected

    processes • Completeness property (crashed processes are detected)
  74. FAILURE DETECTORS • External process • Provides information about suspected

    processes • Completeness property (crashed processes are detected) • Accuracy (correct processes are never suspected)
  75. “RUB SOME PERFECT FAILURE DETECTOR ON IT”

  76. http://www.amazon.com/Introduction-Reliable-Secure-Distributed-Programming/dp/3642152597 PERFECT FAILURE DETECTOR

  77. EVENTUALLY ACCURATE FAILURE DETECTOR

  78. EVENTUALLY ACCURATE FAILURE DETECTOR • Strong Completeness: Eventually, every process

    that crashes is permanently suspected by every correct process.
  79. EVENTUALLY ACCURATE FAILURE DETECTOR • Strong Completeness: Eventually, every process

    that crashes is permanently suspected by every correct process. • Eventual Weak Accuracy: There is a time after which some correct process is never suspected by the correct processes.
  80. EVENTUALLY ACCURATE FAILURE DETECTOR • Strong Completeness: Eventually, every process

    that crashes is permanently suspected by every correct process. • Eventual Weak Accuracy: There is a time after which some correct process is never suspected by the correct processes. http://dl.acm.org/citation.cfm?id=1052806
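
One common way to approximate such a detector is with heartbeats and adaptive timeouts. The sketch below is an illustration under assumptions, not the construction from the slides or the cited paper: the class name `HeartbeatDetector`, the doubling policy, and the explicit `now` argument are all hypothetical choices. It only approaches the stated properties if message delays eventually become bounded.

```python
class HeartbeatDetector:
    """Suspect a process when its heartbeat is overdue; back off on mistakes."""

    def __init__(self, processes, initial_timeout=1.0):
        self.timeout = {p: initial_timeout for p in processes}
        self.last_seen = {p: 0.0 for p in processes}
        self.suspected = set()

    def heartbeat(self, process, now):
        """Record a heartbeat received from `process` at time `now`."""
        self.last_seen[process] = now
        if process in self.suspected:
            # The suspicion was wrong: the process is alive.
            # Stop suspecting it and wait longer next time.
            self.suspected.discard(process)
            self.timeout[process] *= 2

    def check(self, now):
        """Return the set of processes suspected at time `now`."""
        for process, seen in self.last_seen.items():
            if now - seen > self.timeout[process]:
                self.suspected.add(process)
        return set(self.suspected)


detector = HeartbeatDetector(["p1", "p2"], initial_timeout=1.0)
detector.heartbeat("p1", now=0.5)
detector.heartbeat("p2", now=0.5)
first = detector.check(now=2.0)     # both heartbeats are overdue
detector.heartbeat("p1", now=2.1)   # p1 was only slow; its timeout doubles
second = detector.check(now=4.0)    # only p2 remains suspected
```

A crashed process stops sending heartbeats and so stays suspected (completeness), while a slow but correct process has its timeout increased on every false suspicion, so it eventually stops being suspected once delays are bounded (eventual accuracy).
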
  81. None
  82. QUORUMS

  83. TL;DR: INTERSECTING SETS

  84. “A QUORUM IN A SYSTEM WITH N CRASH-FAULT PROCESS ABSTRACTIONS

    […] IS ANY MAJORITY OF PROCESSES, I.E., ANY SET OF MORE THAN N/2 PROCESSES” QUORUMS
  85. “IF F < N/2 PROCESSES FAIL BY CRASHING, THERE IS

    ALWAYS AT LEAST ONE QUORUM OF NONCRASHED PROCESSES IN SUCH SYSTEMS” QUORUMS
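
Both quorum claims can be checked by brute force for a small system. This is a minimal sketch; the helper names `majority_quorums`, `all_intersect`, and `surviving_quorum_exists` are hypothetical and chosen only for illustration.

```python
from itertools import combinations


def majority_quorums(processes):
    """Enumerate every set of more than N/2 processes."""
    n = len(processes)
    smallest = n // 2 + 1
    return [
        set(q)
        for size in range(smallest, n + 1)
        for q in combinations(processes, size)
    ]


def all_intersect(quorums):
    """True if every pair of quorums shares at least one process."""
    return all(a & b for a, b in combinations(quorums, 2))


def surviving_quorum_exists(processes, crashed):
    """True if the non-crashed processes still form a majority."""
    alive = set(processes) - set(crashed)
    return len(alive) > len(processes) / 2


processes = ["a", "b", "c", "d", "e"]
quorums = majority_quorums(processes)
intersecting = all_intersect(quorums)                           # True
two_crashes = surviving_quorum_exists(processes, ["a", "b"])    # True: f = 2 < 5/2
three_crashes = surviving_quorum_exists(processes, ["a", "b", "c"])  # False: f = 3 > 5/2
```

The intersection property is what lets two operations that each contact a quorum always have at least one process in common.
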
  86. CONSISTENCY

  87. None
  88. CONCURRENT FIFO QUEUE

  89. CONSISTENCY CONDITIONS

  90. CONSISTENCY CONDITIONS • Atomic Consistency (Linearizability)

  91. CONSISTENCY CONDITIONS • Atomic Consistency (Linearizability) • Sequential Consistency

  92. CONSISTENCY CONDITIONS • Atomic Consistency (Linearizability) • Sequential Consistency •

    Causal Consistency
  93. CONSISTENCY CONDITIONS • Atomic Consistency (Linearizability) • Sequential Consistency •

    Causal Consistency https://aphyr.com/posts/313-strong-consistency-models
  94. LINEARIZABILITY http://www.amazon.com/Distributed-Algorithms-Message-Passing-Systems-Michel/dp/3642381227/

  95. LINEARIZABILITY http://www.amazon.com/Distributed-Algorithms-Message-Passing-Systems-Michel/dp/3642381227/

  96. SOME BOOKS

  97. http://www.amazon.com/Distributed-Algorithms-Message-Passing-Systems-Michel/dp/3642381227/

  98. http://www.amazon.com/Fault-tolerant-Agreement-Synchronous-Message-passing-Distributed/dp/1608455254/

  99. http://www.amazon.com/Communication-Abstractions-Fault-tolerant-Asynchronous-Distributed/dp/160845293X/

  100. http://www.amazon.com/Distributed-Algorithms-Kaufmann-Management-Systems/dp/1558603484/

  101. http://www.amazon.com/Introduction-Reliable-Secure-Distributed-Programming/dp/3642152597

  102. http://www.amazon.com/Guide-Reliable-Distributed-Systems-High-Assurance/dp/1447124154/

  103. http://www.amazon.com/Replication-Practice-Lecture-Computer-Theoretical/dp/3642112935/

  104. FINDING NON PAYWALLED PAPERS

  105. CONCLUSION

  106. CONCLUSION • Deep Rabbit Hole

  107. CONCLUSION • Deep Rabbit Hole • Computing Science where Science

    is Still a Thing™
  108. CONCLUSION • Deep Rabbit Hole • Computing Science where Science

    is Still a Thing™ • History of the Field Matters
  109. CONCLUSION • Deep Rabbit Hole • Computing Science where Science

    is Still a Thing™ • History of the Field Matters • Read, read, read
  110. THANKS! @old_sound