Slide 1

Slide 1 text

Topics in Distributed Systems Arnon Rotem-Gal-Oz

Slide 2

Slide 2 text

What’s a “distributed system”?
“You know you have a distributed system when the crash of a computer you’ve never heard of stops you from getting any work done.”
LESLIE LAMPORT

Slide 3

Slide 3 text

Your mission, should you choose to accept it:
• Read data from one “place”
• Write it to another “place”

Slide 4

Slide 4 text

mov eax, [ebx]
mov [ecx], eax

(try
  (let [[partitioner msg] (channel/pull chan)]
    (kp/send-message @producer
                     (kp/message topic
                                 (.getBytes ^String partitioner)
                                 (.getBytes ^String msg)))
    (counter-fn))
  (catch Exception ex …

Slide 5

Slide 5 text

System Event                                      Actual Latency   Scaled Latency
One CPU cycle                                     0.4 ns           1 s
Level 1 cache access                              0.9 ns           2 s
Level 2 cache access                              2.8 ns           7 s
Level 3 cache access                              28 ns            1 min
Main memory access (DDR DIMM)                     ~100 ns          4 min
Intel® Optane™ DC persistent memory access        ~350 ns          15 min
Intel® Optane™ DC SSD I/O                         <10 μs           7 hrs
NVMe SSD I/O                                      ~25 μs           17 hrs
SSD I/O                                           50–150 μs        1.5–4 days
Rotational disk I/O                               1–10 ms          1–9 months
Internet call: San Francisco to New York City     65 ms            5 years
Internet call: San Francisco to Hong Kong         141 ms           11 years

Source: Systems Performance: Enterprise and the Cloud, Brendan Gregg
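The "Scaled Latency" column can be approximately reproduced by stretching one 0.4 ns CPU cycle to 1 second and applying the same factor to every row. The sketch below assumes exactly that scaling rule; the function names `scaled_seconds` and `humanize` are made up for illustration, and the results are rounded differently from the table.

```python
# Assumption: the table scales every latency so that one CPU cycle (0.4 ns) lasts 1 s.
CYCLE_SECONDS = 0.4e-9
SCALE = 1.0 / CYCLE_SECONDS  # roughly 2.5 billion


def scaled_seconds(actual_seconds):
    """Return the human-scale duration, in seconds, for a real latency."""
    return actual_seconds * SCALE


def humanize(seconds):
    """Render a duration using the largest unit that fits."""
    units = [
        ("years", 365 * 24 * 3600),
        ("days", 24 * 3600),
        ("hrs", 3600),
        ("min", 60),
    ]
    for name, size in units:
        if seconds >= size:
            return f"{seconds / size:.1f} {name}"
    return f"{seconds:.1f} s"


if __name__ == "__main__":
    rows = [
        ("Main memory access", 100e-9),
        ("NVMe SSD I/O", 25e-6),
        ("San Francisco to New York City", 65e-3),
    ]
    for label, actual in rows:
        print(f"{label}: {humanize(scaled_seconds(actual))}")
```

At this scale a single cross-country round trip costs about as much as five years of CPU cycles, which is the point the table is making.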

Slide 6

Slide 6 text

(Latency table repeated from Slide 5.)

Slide 7

Slide 7 text

(Latency table repeated from Slide 5, overlaid with fragments of the assembly and Clojure code from Slide 4.)

Slide 8

Slide 8 text

Request network

Slide 9

Slide 9 text

The network is reliable
skb rides the rocket…

Slide 10

Slide 10 text

Latency is zero

Slide 11

Slide 11 text

(Latency table repeated from Slide 5.)

Slide 12

Slide 12 text

Bandwidth is infinite

Slide 13

Slide 13 text

The network is secure

Slide 14

Slide 14 text

Topology doesn’t change

Slide 15

Slide 15 text

There is one administrator

Slide 16

Slide 16 text

Transport cost is zero

Slide 17

Slide 17 text

Network is homogeneous

Slide 18

Slide 18 text

Instances are free

Slide 19

Slide 19 text

Instances have identities

Slide 20

Slide 20 text

Latency is zero

Slide 21

Slide 21 text

Latency is constant

Slide 22

Slide 22 text

No content

Slide 23

Slide 23 text

What’s “Happened Before”?

Slide 24

Slide 24 text

Concurrent
Processes: P, Q
Events: A, B, X, Y
Firewall blocks all traffic: P can’t communicate with Q

Slide 25

Slide 25 text

Causal relation
Processes: P, Q
Events: A, B, X
P sends M; Q receives M

Slide 26

Slide 26 text

Processes: P, Q
LogicalClockP: 0 1 2 3
LogicalClockQ: 0 1 2
Events: A, B, X
P sends M; Q receives M

Slide 27

Slide 27 text

Q computes: LogicalClockQ = max(0, 3) + 1
Processes: P, Q
LogicalClockP: 0 1 2 3
LogicalClockQ: 0 1 4 5
LogicalClockM = 3
Events: A, B, X, Y
P sends M; Q receives M
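The receive rule on this slide, max(local, message) + 1, is the core of a Lamport logical clock. Here is a minimal sketch of it; the class and method names are illustrative and not taken from the talk, and it assumes Q's clock is still at 0 when M arrives, as in the slide's formula.

```python
class LamportClock:
    """Minimal Lamport logical clock (illustrative sketch)."""

    def __init__(self):
        self.time = 0

    def tick(self):
        """Local event: advance the clock by one."""
        self.time += 1
        return self.time

    def send(self):
        """Sending counts as an event; the returned value travels with the message."""
        return self.tick()

    def receive(self, message_time):
        """On receive, jump past both the local clock and the message timestamp."""
        self.time = max(self.time, message_time) + 1
        return self.time


if __name__ == "__main__":
    p = LamportClock()
    q = LamportClock()
    p.tick()                    # P's clock: 1
    p.tick()                    # P's clock: 2
    m = p.send()                # P's clock: 3, so LogicalClockM = 3
    print("LogicalClockM =", m)
    print("Q after receive =", q.receive(m))  # max(0, 3) + 1
    print("Q next event =", q.tick())
```

The clock guarantees that if one event happened before another, its timestamp is smaller. The converse does not hold: comparing two timestamps alone cannot tell concurrent events apart, which is where the vector clocks in the reading list come in.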

Slide 28

Slide 28 text

Counter

Slide 29

Slide 29 text

Counter take 2
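The slide text does not record which counter design was shown. Assuming it follows the CRDT material in the reading list, a common construction is the state-based grow-only counter (G-Counter): each node increments only its own slot, and replicas merge by taking the per-node maximum. The class below is a sketch under that assumption, with made-up names.

```python
class GCounter:
    """State-based grow-only counter CRDT (illustrative sketch)."""

    def __init__(self, node_id):
        self.node_id = node_id
        self.counts = {}  # node id -> number of increments seen from that node

    def increment(self, amount=1):
        """Add to this node's own slot only; negative amounts are rejected."""
        if amount < 0:
            raise ValueError("a G-Counter can only grow")
        self.counts[self.node_id] = self.counts.get(self.node_id, 0) + amount

    def value(self):
        """The counter's value is the sum over all nodes' slots."""
        return sum(self.counts.values())

    def merge(self, other):
        """Take the per-node maximum; safe to repeat and to apply in any order."""
        for node, count in other.counts.items():
            self.counts[node] = max(self.counts.get(node, 0), count)


if __name__ == "__main__":
    a = GCounter("a")
    b = GCounter("b")
    a.increment(2)
    b.increment(3)
    a.merge(b)
    b.merge(a)
    print(a.value(), b.value())
```

Because merge is commutative, associative and idempotent, replicas converge even when updates are duplicated or arrive out of order, which a single shared integer cannot guarantee.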

Slide 30

Slide 30 text

Decrements?
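A counter that merges by per-node maximum cannot go down, because a decrement would be overwritten by the larger, older value on merge. One standard answer, assumed here since the slide text only poses the question, is the PN-Counter: keep one grow-only map for increments and another for decrements, and report the difference. All names in this sketch are illustrative.

```python
class PNCounter:
    """State-based increment/decrement counter CRDT (illustrative sketch)."""

    def __init__(self, node_id):
        self.node_id = node_id
        self.increments = {}  # node id -> total incremented by that node
        self.decrements = {}  # node id -> total decremented by that node

    def increment(self, amount=1):
        self.increments[self.node_id] = self.increments.get(self.node_id, 0) + amount

    def decrement(self, amount=1):
        self.decrements[self.node_id] = self.decrements.get(self.node_id, 0) + amount

    def value(self):
        """Value is everything added minus everything subtracted."""
        return sum(self.increments.values()) - sum(self.decrements.values())

    @staticmethod
    def _merge_max(mine, theirs):
        """Merge one grow-only map into another by per-node maximum."""
        for node, count in theirs.items():
            mine[node] = max(mine.get(node, 0), count)

    def merge(self, other):
        self._merge_max(self.increments, other.increments)
        self._merge_max(self.decrements, other.decrements)


if __name__ == "__main__":
    a = PNCounter("a")
    b = PNCounter("b")
    a.increment(5)
    b.decrement(2)
    a.merge(b)
    b.merge(a)
    print(a.value(), b.value())
```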

Slide 31

Slide 31 text

Sets?
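Sets raise the same question as decrements: what should happen when one replica removes an element while another adds it? The slide text does not say which design was presented. One well-known option is the observed-remove set (OR-Set), where every add gets a unique tag and a remove only tombstones the tags it has seen. The sketch below assumes that design, with illustrative names and random UUIDs as tags.

```python
import uuid


class ORSet:
    """State-based observed-remove set CRDT (illustrative sketch)."""

    def __init__(self):
        self.adds = {}        # element -> set of unique tags, one per add
        self.removed = set()  # tombstones: tags whose add has been removed

    def add(self, element):
        """Each add gets a fresh tag, so a later add survives an earlier remove."""
        self.adds.setdefault(element, set()).add(uuid.uuid4().hex)

    def remove(self, element):
        """Tombstone only the tags this replica has observed for the element."""
        self.removed |= self.adds.get(element, set())

    def contains(self, element):
        return bool(self.adds.get(element, set()) - self.removed)

    def elements(self):
        return {element for element in self.adds if self.contains(element)}

    def merge(self, other):
        """Union the add tags and the tombstones from the other replica."""
        for element, tags in other.adds.items():
            self.adds.setdefault(element, set()).update(tags)
        self.removed |= other.removed


if __name__ == "__main__":
    a = ORSet()
    b = ORSet()
    a.add("x")
    b.merge(a)
    b.remove("x")   # b removes the add it has seen
    a.add("x")      # a concurrently adds "x" again with a new tag
    a.merge(b)
    b.merge(a)
    print(a.elements(), b.elements())
```

In this design a concurrent add wins over a remove. The tombstone set only grows, so a production implementation would need some form of garbage collection, which this sketch leaves out.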

Slide 32

Slide 32 text

• Don’t take distributed actions lightly
• Be careful when using abstractions that hide distributed calls
• Big data means low-probability problems are daily occurrences

Slide 33

Slide 33 text

Read more
• Fallacies of distributed computing
• Vector clocks
• CRDTs - https://www.serverless.com/blog/crdt-explained-supercharge-serverless-at-edge
• https://bartoszsypytkowski.com/the-state-of-a-state-based-crdts/
• Google Spanner https://static.googleusercontent.com/media/research.google.com/en//archive/spanner-osdi2012.pdf
• https://research.google/pubs/pub45855/