A sample Application
[Diagram labels: Config System (x3), Service (x6)]
Slide 9
Slide 9 text
We all build
Distributed Systems
Slide 10
Slide 10 text
Distributed Systems Are Hard
ASYNCHRONY
No bound on message delay, clock drift, or the time necessary to execute a step. No timing assumptions.
PARTIAL FAILURE
Components and communication attempts may fail during execution.
Slide 11
Slide 11 text
academic
Papers
Slide 12
Slide 12 text
“Consensus is unsolvable
[in asynchronous networks]
because accurate failure
detection is impossible”
- Impossibility of Distributed Consensus with One Faulty Process
Slide 13
Slide 13 text
https://www.flickr.com/photos/benny_lin/191393604
Slide 14
Slide 14 text
1996
Slide 15
Slide 15 text
“We believe that unreliable failure
detectors can be used to bridge
the gap between known
impossibility results and the need
for practical solutions for fault-
tolerant asynchronous systems.”
Slide 16
Slide 16 text
Failure Detectors
COMPLETENESS
Every crashed process is eventually suspected by a correct process.
Not enough on its own (a paranoid failure detector is complete).
ACCURACY
Some correct process is never suspected by a correct process.
Restricts the mistakes that can be made by the failure detector.
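To make the two properties concrete, here is a minimal sketch in Python. It assumes a heartbeat-and-timeout design driven by a logical clock; the class names, the `timeout` parameter, and the process names are hypothetical and are not taken from the talk. The timeout detector eventually suspects the crashed process (completeness) but also wrongly suspects a slow one for a while (an accuracy mistake), and the paranoid detector is complete while never being accurate.

```python
from dataclasses import dataclass, field


@dataclass
class HeartbeatDetector:
    """Suspects a process when no heartbeat arrived within `timeout` ticks."""
    timeout: int
    last_seen: dict = field(default_factory=dict)

    def heartbeat(self, process: str, now: int) -> None:
        # Record the logical time of the latest heartbeat from `process`.
        self.last_seen[process] = now

    def suspects(self, now: int) -> set:
        # A process is suspected once its last heartbeat is too old.
        return {p for p, t in self.last_seen.items() if now - t > self.timeout}


class ParanoidDetector:
    """Suspects every process at all times: complete, but never accurate."""

    def __init__(self, processes):
        self.processes = set(processes)

    def suspects(self, now: int) -> set:
        return set(self.processes)


def run_demo():
    detector = HeartbeatDetector(timeout=3)
    # "a" is healthy, "b" crashes right after t=0, "c" is alive but slow.
    for p in ("a", "b", "c"):
        detector.heartbeat(p, now=0)
    detector.heartbeat("a", now=4)
    at_5 = detector.suspects(now=5)  # "b" crashed, "c" is only delayed
    detector.heartbeat("c", now=6)   # the delayed heartbeat finally arrives
    detector.heartbeat("a", now=8)
    at_9 = detector.suspects(now=9)  # only the crashed process remains
    return at_5, at_9


if __name__ == "__main__":
    print(run_demo())
```

At tick 5 the detector cannot tell the slow process from the crashed one, which is the mistake that the accuracy property limits.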
Slide 17
Slide 17 text
Failure Detectors
                         ACCURACY
COMPLETENESS   STRONG    WEAK     EVENTUALLY STRONG    EVENTUALLY WEAK
STRONG         PERFECT   STRONG   EVENTUALLY PERFECT   EVENTUALLY STRONG
WEAK           -         WEAK     -                    EVENTUALLY WEAK
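The "eventually" classes allow a detector to make mistakes for a while, as long as it stops making them at some point. One common way to approximate this, sketched here under the assumption that message delays are eventually bounded, is to lengthen a process's timeout every time a suspicion turns out to be false. The class, method, and process names are hypothetical, and the doubling policy is only one possible choice.

```python
class AdaptiveDetector:
    """Timeout detector that backs off after each false suspicion."""

    def __init__(self, initial_timeout: int = 2):
        self.initial_timeout = initial_timeout
        self.timeout = {}      # per-process timeout, in logical ticks
        self.last_seen = {}    # logical time of the latest heartbeat
        self.suspected = set()

    def heartbeat(self, process: str, now: int) -> None:
        if process in self.suspected:
            # The process was alive after all: retract and wait longer next time.
            self.suspected.discard(process)
            self.timeout[process] *= 2
        self.timeout.setdefault(process, self.initial_timeout)
        self.last_seen[process] = now

    def check(self, now: int) -> set:
        # Suspect every process whose last heartbeat is older than its timeout.
        for process, seen in self.last_seen.items():
            if now - seen > self.timeout[process]:
                self.suspected.add(process)
        return set(self.suspected)


def run_demo():
    detector = AdaptiveDetector(initial_timeout=2)
    detector.heartbeat("slow", 0)
    detector.heartbeat("crashed", 0)   # never heard from again
    history = {}
    for now in range(1, 13):
        history[now] = detector.check(now)
        if now % 3 == 0:               # "slow" only reports every 3 ticks
            detector.heartbeat("slow", now)
    return detector, history


if __name__ == "__main__":
    _, history = run_demo()
    for now, suspected in history.items():
        print(now, sorted(suspected))
```

In this run the slow process is wrongly suspected once, its timeout grows, and it is not suspected again, while the crashed process stays suspected.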
Slide 19
Slide 19 text
Failure Detectors & Problems
SYNCHRONOUS SYSTEM: CLOCK SYNCHRONIZATION, BYZANTINE GENERALS
ASYNCHRONOUS SYSTEM & PERFECT FAILURE DETECTOR: NON-BLOCKING ATOMIC COMMIT
ASYNCHRONOUS SYSTEM & EVENTUALLY WEAK FAILURE DETECTOR: CONSENSUS, ATOMIC BROADCAST
ASYNCHRONOUS SYSTEM: RELIABLE BROADCAST
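The classification on this slide can be read as a lookup from each problem to the weakest listed model in which it is solvable. The sketch below encodes that reading in Python. The pairing is an assumption based on the nesting suggested by the slide, and the names `MODELS`, `WEAKEST_MODEL`, and `solvable` are hypothetical.

```python
# Models ordered from weakest to strongest assumptions (assumed nesting).
MODELS = [
    "asynchronous",
    "asynchronous + eventually weak failure detector",
    "asynchronous + perfect failure detector",
    "synchronous",
]

# Index into MODELS of the weakest listed model that solves each problem.
WEAKEST_MODEL = {
    "reliable broadcast": 0,
    "consensus": 1,
    "atomic broadcast": 1,
    "non-blocking atomic commit": 2,
    "clock synchronization": 3,
    "byzantine generals": 3,
}


def solvable(problem: str, model: str) -> bool:
    """True if `problem` is solvable in `model` under the assumed nesting."""
    return MODELS.index(model) >= WEAKEST_MODEL[problem]


if __name__ == "__main__":
    for problem in WEAKEST_MODEL:
        print(problem, "->", MODELS[WEAKEST_MODEL[problem]])
```

A stronger model is treated as solving everything a weaker one does, which is why the check is a simple comparison of positions in the list.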
Slide 20
Slide 20 text
Problem Solvability
CLOCK SYNCHRONIZATION
BYZANTINE GENERALS PROBLEM
NON-BLOCKING ATOMIC COMMIT
CONSENSUS & ATOMIC BROADCAST
RELIABLE BROADCAST
Slide 21
Slide 21 text
We now have a mechanism
for solving consensus in
asynchronous systems with
crash failures & a way to
classify problem solvability
Slide 22
Slide 22 text
TL;DR
Embrace the nature of the asynchronous model. Trying to find perfect solutions (ones that don’t assume failure) will give you a system that is fragile or impossible to build.
Embrace academic research: it helps you understand what is and isn’t possible.
Don’t waste time trying to build impossible systems that are bound to fail!
Slide 23
Slide 23 text
Where to start?
Slide 24
Slide 24 text
References & love @ github.com/Randommood/Velocity2016
@Caitie & @Randommood
Thank you!