Slide 1

From Mainframe to Microservice
An Introduction to Distributed Systems
@tyler_treat
Workiva

Slide 2

An Introduction to Distributed Systems
❖ Building a foundation of understanding
  ❖ Why distributed systems?
  ❖ Universal fallacies
  ❖ Characteristics and the CAP theorem
  ❖ Common pitfalls
❖ Digging deeper
  ❖ Byzantine Generals Problem and consensus
  ❖ Split-brain
  ❖ Hybrid consistency models
  ❖ Scaling shared data and CRDTs

Slide 3

–Leslie Lamport “A distributed system is one in which the failure of a computer you didn't even know existed can render your own computer unusable.”

Slide 4

Scale Up vs. Scale Out
Vertical Scaling
❖ Add resources to a node
❖ Increases node capacity, load is unaffected
❖ System complexity unaffected
Horizontal Scaling
❖ Add nodes to a cluster
❖ Decreases load, capacity is unaffected
❖ Availability and throughput w/ increased complexity

Slide 5

A distributed system
 is a collection of independent computers that behave as a single coherent system.

Slide 6

Why Distributed Systems?
❖ Availability - serve every request
❖ Fault Tolerance - resilient to failures
❖ Throughput - parallel computation
❖ Architecture - decoupled, focused services
❖ Economics - scale-out becoming manageable/cost-effective

Slide 7

oh shit…

Slide 8

–Ken Arnold “You have to design distributed systems with the expectation of failure.”

Slide 9

Distributed systems engineers are
 the world’s biggest pessimists.

Slide 10

Universal Fallacy #1 The network is reliable. ❖ Message delivery is never guaranteed ❖ Best effort ❖ Is it worth it? ❖ Resiliency/redundancy/failover
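The usual answer to an unreliable network is resiliency in the caller: retry with backoff and make operations idempotent so retries are safe. A minimal Python sketch of that idea (not from the deck; send_with_retries and flaky_send are hypothetical names):

```python
import random
import time

def send_with_retries(send, payload, attempts=5, base_delay=0.1):
    """Retry a best-effort send with exponential backoff and jitter.

    `send` should be idempotent so that retrying a possibly-delivered
    message is safe.
    """
    for attempt in range(attempts):
        try:
            return send(payload)
        except ConnectionError:
            if attempt == attempts - 1:
                raise  # out of retries; surface the failure to the caller
            # Exponential backoff with jitter avoids synchronized retry storms.
            time.sleep(base_delay * (2 ** attempt) * random.uniform(0.5, 1.5))

def flaky_send(payload):
    """Stand-in for a network call that fails roughly half the time."""
    if random.random() < 0.5:
        raise ConnectionError("dropped")
    return "ack"

print(send_with_retries(flaky_send, {"event": "user.created"}))
```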

Slide 11

Universal Fallacy #2 Latency is zero. ❖ We cannot defy the laws of physics ❖ LAN to WAN deteriorates quickly ❖ Minimize network calls (batch) ❖ Design asynchronous systems

Slide 12

Universal Fallacy #3 Bandwidth is infinite. ❖ Out of our control ❖ Limit message sizes ❖ Use message queueing

Slide 13

Universal Fallacy #4 The network is secure. ❖ Everyone is out to get you ❖ Build in security from day 1 ❖ Multi-layered ❖ Encrypt, pentest, train developers

Slide 14

Universal Fallacy #5 Topology doesn’t change. ❖ Network topology is dynamic ❖ Don’t statically address hosts ❖ Collection of services, not nodes ❖ Service discovery

Slide 15

Universal Fallacy #6 There is one administrator. ❖ May integrate with third-party systems ❖ “Is it our problem or theirs?” ❖ Conflicting policies/priorities ❖ Third parties constrain; weigh the risk

Slide 16

Universal Fallacy #7 Transport cost is zero. ❖ Monetary and practical costs ❖ Building/maintaining a network is not trivial ❖ The “perfect” system might be too costly

Slide 17

Universal Fallacy #8 The network is homogeneous. ❖ Networks are almost never homogeneous ❖ Third-party integration? ❖ Consider interoperability ❖ Avoid proprietary protocols

Slide 18

These problems apply to LAN and WAN systems
 (single-data-center and cross-data-center)
 No one is safe.

Slide 19

–Murphy’s Law “Anything that can go wrong will go wrong.”

Slide 20

No content

Slide 21

Characteristics of a Reliable Distributed System
❖ Fault-tolerant - nodes can fail
❖ Available - serve all the requests, all the time
❖ Scalable - behave correctly with changing topologies
❖ Consistent - state is coordinated across nodes
❖ Secure - access is authenticated
❖ Performant - it’s fast!

Slide 22

No content

Slide 23

Distributed systems are all about trade-offs.

Slide 24

CAP Theorem ❖ Presented in 1998 by Eric Brewer ❖ Impossible to guarantee all three: ❖ Consistency ❖ Availability ❖ Partition tolerance

Slide 25

Consistency

Slide 26

Consistency ❖ Linearizable - there exists a total order of all state updates and each update appears atomic ❖ E.g. mutexes make operations appear atomic ❖ When operations are linearizable, we can assign a unique “timestamp” to each one (total order) ❖ A system is consistent if every node shares the same total order ❖ Consistency which is both global and instantaneous is impossible
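As a rough illustration of the mutex point above, here is a small Python sketch (my example, not from the deck): a lock makes each read-modify-write appear atomic on a single node, so every update gets a unique position in a total order.

```python
import threading

class LinearizableCounter:
    """Single-node register whose updates appear atomic.

    The lock serializes every read-modify-write, so updates form a total
    order and each one can be stamped with a unique, increasing "timestamp".
    """
    def __init__(self):
        self._lock = threading.Lock()
        self._value = 0
        self._version = 0  # position of the update in the total order

    def increment(self):
        with self._lock:          # no other update can interleave here
            self._value += 1
            self._version += 1
            return self._version  # the update's place in the total order

counter = LinearizableCounter()
print(counter.increment())  # 1
print(counter.increment())  # 2
```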

Slide 27

Consistency
❖ Eventual consistency - replicas allowed to diverge, eventually converge
❖ Strong consistency - replicas can’t diverge; requires linearizability

Slide 28

Availability ❖ Every request received by a non-failing node must be served ❖ If a piece of data required for a request is unavailable, the system is unavailable ❖ 100% availability is a myth

Slide 29

Partition Tolerance ❖ A partition is a split in the network—many causes ❖ Partition tolerance means partitions can happen ❖ CA is easy when your network is perfectly reliable ❖ Your network is not perfectly reliable

Slide 30

Partition Tolerance

Slide 31

Common Pitfalls ❖ Halting failure - machine stops ❖ Network failure - network connection breaks ❖ Omission failure - messages are lost ❖ Timing failure - clock skew ❖ Byzantine failure - arbitrary failure

Slide 32

Exploring some higher-level concepts Digging Deeper

Slide 33

Byzantine Generals Problem ❖ Consider a city under siege by two allied armies ❖ Each army has a general ❖ One general is the leader ❖ Armies must agree when to attack ❖ Must use messengers to communicate ❖ Messengers can be captured by defenders

Slide 34

Byzantine Generals Problem

Slide 35

Byzantine Generals Problem ❖ Send 100 messages, attack no matter what ❖ A might attack without B ❖ Send 100 messages, wait for acks, attack if confident ❖ B might attack without A ❖ Messages have overhead ❖ Can’t reliably make a decision (provably impossible)

Slide 36

Distributed Consensus ❖ Replace 2 generals with N generals ❖ Nodes must agree on data value ❖ Solutions: ❖ Multi-phase commit ❖ State replication

Slide 37

Two-Phase Commit ❖ Blocking protocol ❖ Coordinator waits for cohorts ❖ Cohorts wait for commit/rollback ❖ Can deadlock
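A minimal Python sketch of the two-phase commit flow described above (illustrative only; the Cohort class is a hypothetical in-memory stand-in for a remote participant). The coordinator commits only on a unanimous yes vote, and blocks in phase 1 if any cohort never answers:

```python
class Cohort:
    """In-memory stand-in for a participant; a real cohort would be remote."""
    def __init__(self, name, will_vote_yes=True):
        self.name = name
        self.will_vote_yes = will_vote_yes
        self.state = "idle"

    def prepare(self, txn):
        # Phase 1: persist the pending change and vote; locks are held from here on.
        self.state = "prepared" if self.will_vote_yes else "aborted"
        return self.will_vote_yes

    def commit(self, txn):
        self.state = "committed"

    def rollback(self, txn):
        self.state = "rolled back"

def two_phase_commit(cohorts, txn):
    """Coordinator: commit only if every cohort votes yes in phase 1."""
    # Phase 1 (voting): if a cohort never answers, the coordinator blocks here
    # while prepared cohorts keep holding their locks, which is why 2PC can stall.
    votes = [c.prepare(txn) for c in cohorts]

    # Phase 2 (completion): unanimous yes -> commit everywhere, else roll back.
    if all(votes):
        for c in cohorts:
            c.commit(txn)
        return "committed"
    for c in cohorts:
        c.rollback(txn)
    return "aborted"

print(two_phase_commit([Cohort("db1"), Cohort("db2")], txn="debit+credit"))  # committed
print(two_phase_commit([Cohort("db1"), Cohort("db2", False)], txn="debit"))  # aborted
```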

Slide 38

Three-Phase Commit ❖ Non-blocking protocol ❖ Abort on timeouts ❖ Susceptible to network partitions

Slide 39

State Replication ❖ E.g. Paxos, Raft protocols ❖ Elect a leader (coordinator) ❖ All changes go through leader ❖ Each change appends log entry ❖ Each node has log replica

Slide 40

State Replication ❖ Must have quorum (majority) to proceed ❖ Commit once quorum acks ❖ Quorums mitigate partitions ❖ Logs allow state to be rebuilt
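A simplified sketch of the quorum rule, assuming a hypothetical in-memory Follower stand-in rather than real Paxos or Raft nodes: the leader appends an entry, replicates it, and treats it as committed only once a strict majority holds it.

```python
class Follower:
    """In-memory stand-in for a replica; a real follower would be a remote node."""
    def __init__(self, reachable=True):
        self.log = []
        self.reachable = reachable

    def append(self, entry):
        # Returns True once the entry is stored; an unreachable node never acks.
        if self.reachable:
            self.log.append(entry)
            return True
        return False

def replicate(leader_log, followers, entry):
    """Leader appends, replicates, and commits once a majority holds the entry."""
    leader_log.append(entry)
    acks = 1 + sum(1 for f in followers if f.append(entry))  # leader counts itself
    majority = (len(followers) + 1) // 2 + 1
    # Committing only with a majority means a minority partition can never
    # commit a conflicting entry, and the logs can rebuild state after failures.
    return acks >= majority

followers = [Follower(), Follower(), Follower(reachable=False), Follower(reachable=False)]
print(replicate([], followers, {"op": "set", "key": "x", "value": 1}))  # True: 3 of 5 nodes ack
```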

Slide 41

Split-Brain

Slide 42

Split-Brain

Slide 43

Split-Brain

Slide 44

Split-Brain ❖ Optimistic (AP) - let partitions work as usual ❖ Pessimistic (CP) - quorum partition works, fence others

Slide 45

Hybrid Consistency Models ❖ Weak == available, low latency, stale reads ❖ Strong == fresh reads, less available, high latency ❖ How do you choose a consistency model? ❖ Hybrid models ❖ Weaker models when possible (likes, followers, votes) ❖ Stronger models when necessary ❖ Tunable consistency models (Cassandra, Riak, etc.)
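Tunable consistency is often expressed with quorum sizes: with N replicas, reads are guaranteed to see the latest acknowledged write whenever the read and write quorums overlap, i.e. R + W > N. A tiny illustrative check (the numbers below are examples, not tied to any particular database):

```python
def quorums_overlap(n, w, r):
    """True when every read quorum must intersect every write quorum (R + W > N)."""
    return r + w > n

# N = 3 replicas:
print(quorums_overlap(3, w=2, r=2))  # True  -> reads observe the latest write
print(quorums_overlap(3, w=1, r=1))  # False -> fast and available, but stale reads possible
```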

Slide 46

Scaling Shared Data ❖ Sharing mutable data at large scale is difficult ❖ Solutions: ❖ Immutable data ❖ Last write wins ❖ Application-level conflict resolution ❖ Causal ordering (e.g. vector clocks) ❖ Distributed data types (CRDTs)
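One of the options above, causal ordering with vector clocks, can be sketched in a few lines (an illustrative example, not from the deck): each node increments its own slot, merging takes the element-wise max, and two events are concurrent when neither clock dominates the other.

```python
def increment(clock, node):
    """Advance `node`'s slot in its vector clock (a dict of node -> counter)."""
    clock = dict(clock)
    clock[node] = clock.get(node, 0) + 1
    return clock

def merge(a, b):
    """Combine two clocks by taking the element-wise maximum."""
    return {n: max(a.get(n, 0), b.get(n, 0)) for n in set(a) | set(b)}

def happened_before(a, b):
    """True if every entry of `a` is <= the matching entry of `b`, and a != b."""
    return all(a.get(n, 0) <= b.get(n, 0) for n in a) and a != b

a = increment({}, "node1")              # {"node1": 1}
b = increment({}, "node2")              # {"node2": 1}
print(happened_before(a, b))            # False
print(happened_before(b, a))            # False -> a and b are concurrent
print(happened_before(a, merge(a, b)))  # True  -> merged state causally follows a
```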

Slide 47

Scaling Shared Data Imagine a shared, global counter…
 “Get, add 1, and put” transaction will not scale

Slide 48

CRDT ❖ Conflict-free Replicated Data Type ❖ Convergent: state-based ❖ Commutative: operations-based ❖ E.g. distributed sets, lists, maps, counters ❖ Update concurrently w/o writer coordination

Slide 49

CRDT ❖ CRDTs always converge (provably) ❖ Operations commute (order doesn’t matter) ❖ Highly available, eventually consistent ❖ Always reach consistent state ❖ Drawbacks: ❖ Requires knowledge of all clients ❖ Must be associative, commutative, and idempotent

Slide 50

G-Counter
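The G-Counter (grow-only counter) pictured on this slide is the classic state-based CRDT. A minimal Python sketch of the usual construction: each replica increments only its own slot, merge takes the element-wise max, and the value is the sum of all slots, so merges are associative, commutative, and idempotent and replicas converge without coordinating writes.

```python
class GCounter:
    """Grow-only counter CRDT: one monotonically increasing slot per replica."""

    def __init__(self, replica_id):
        self.replica_id = replica_id
        self.counts = {}                 # replica id -> count

    def increment(self, amount=1):
        # A replica only ever advances its own slot, so state only grows.
        self.counts[self.replica_id] = self.counts.get(self.replica_id, 0) + amount

    def value(self):
        return sum(self.counts.values())

    def merge(self, other):
        # Element-wise max: associative, commutative, idempotent.
        for rid, count in other.counts.items():
            self.counts[rid] = max(self.counts.get(rid, 0), count)

# Two replicas increment concurrently, then exchange state and converge.
a, b = GCounter("a"), GCounter("b")
a.increment(); a.increment()   # replica a sees 2
b.increment()                  # replica b sees 1
a.merge(b); b.merge(a)
print(a.value(), b.value())    # 3 3
```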

Slide 51

CRDT ❖ Add to set is associative, commutative, idempotent ❖ add(“a”), add(“b”), add(“a”) => {“a”, “b”} ❖ Adding and removing items is not ❖ add(“a”), remove(“a”) => {} ❖ remove(“a”), add(“a”) => {“a”} ❖ CRDTs require interpretation of common data structures w/ limitations

Slide 52

Two-Phase Set
❖ Use two sets, one for adding, one for removing (sketched below)
❖ Elements can be added once and removed once
❖ {“a”: [“a”, “b”, “c”], “r”: [“a”]} => {“b”, “c”}
❖ add(“a”), remove(“a”) => {“a”: [“a”], “r”: [“a”]}
❖ remove(“a”), add(“a”) => {“a”: [“a”], “r”: [“a”]}
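A minimal Python sketch of the two-phase set described above (illustrative, not from the deck): adds and removes go to separate grow-only sets, lookup means added and not removed, and merge is a union of both sets, so the two orderings shown on the slide converge to the same state and a removed element can never be re-added.

```python
class TwoPhaseSet:
    """2P-Set CRDT: a grow-only add set plus a grow-only remove set."""

    def __init__(self):
        self.added = set()
        self.removed = set()   # tombstones; removal wins and is permanent

    def add(self, item):
        self.added.add(item)

    def remove(self, item):
        self.removed.add(item)

    def lookup(self, item):
        return item in self.added and item not in self.removed

    def merge(self, other):
        # Set union is associative, commutative, and idempotent.
        self.added |= other.added
        self.removed |= other.removed

    def value(self):
        return self.added - self.removed

s = TwoPhaseSet()
s.add("a"); s.add("b"); s.add("c"); s.remove("a")
print(sorted(s.value()))   # ['b', 'c'], matching {"a": ["a","b","c"], "r": ["a"]}
```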

Slide 53

No content

Slide 54

Let’s Recap...

Slide 55

Distributed architectures allow us to build highly available, fault-tolerant systems.

Slide 56

We can't live in this fantasy land
 where everything works perfectly
 all of the time.

Slide 57

No content

Slide 58

No content

Slide 59

Shit happens — network partitions, hardware failure, GC pauses, latency, dropped packets…

Slide 60

Build resilient systems.

Slide 61

Design for failure.

Slide 62

kill -9

Slide 63

Consider the trade-off between consistency and availability.

Slide 64

Partition tolerance is not an option,
 it’s required.
 
 (if you’re building a distributed system)

Slide 65

Use weak consistency when possible, strong when necessary.

Slide 66

Sharing data at scale is hard,
 let’s go shopping.
 
 (or consider your options)

Slide 67

State is hell.

Slide 68

Further Readings
❖ Jepsen series - Kyle Kingsbury (aphyr)
❖ A Comprehensive Study of Convergent and Commutative Replicated Data Types - Shapiro et al.
❖ In Search of an Understandable Consensus Algorithm - Ongaro and Ousterhout
❖ CAP Twelve Years Later - Eric Brewer
❖ Many, many more…

Slide 69

Thanks!
@tyler_treat
github.com/tylertreat
bravenewgeek.com