Slide 1

Slide 1 text

HASHICORP Vivaldi A Decentralized Network Coordinate System

Slide 2

Slide 2 text

HASHICORP Armon Dadgar @armon

Slide 3

Slide 3 text

HASHICORP

Slide 4

Slide 4 text

HASHICORP

Slide 5

Slide 5 text

HASHICORP Network Coordinates

Slide 6

Slide 6 text

HASHICORP Euclidean Coordinates
p1 = {x: 1, y: 2, z: 3}
p2 = {x: 4, y: 5, z: 6}
dist(p1, p2) = sqrt((p2.x-p1.x)^2 + (p2.y-p1.y)^2 + (p2.z-p1.z)^2)
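The distance formula on this slide can be sketched in Python as follows. The dict-based points and the name `dist` mirror the slide; the generalization to any set of matching axes is an assumption made for illustration.

```python
import math


def dist(p1, p2):
    """Euclidean distance between two points given as {axis: value} dicts."""
    return math.sqrt(sum((p2[axis] - p1[axis]) ** 2 for axis in p1))


p1 = {"x": 1, "y": 2, "z": 3}
p2 = {"x": 4, "y": 5, "z": 6}
print(dist(p1, p2))  # sqrt(27), roughly 5.196
```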

Slide 7

Slide 7 text

HASHICORP Euclidean Space Euclidean Distance defined in Euclidean Space Cartesian Coordinates {x, y, z} are Euclidean

Slide 8

Slide 8 text

HASHICORP Network Space p1 = ipv4(1.2.3.4) p2 = ipv4(5.6.7.8) dist(p1, p2) = ?

Slide 9

Slide 9 text

HASHICORP Network Space p1 = ipv4(1.2.3.4) p2 = ipv4(5.6.7.8) dist(p1, p2) = rtt(p1, p2)

Slide 10

Slide 10 text

HASHICORP Network Distance? Peer Peer Seed Seed Peer Seed P2P Application

Slide 11

Slide 11 text

HASHICORP Network Distance? Nearest Neighbor Routing Web Server API Server API Server API Server

Slide 12

Slide 12 text

HASHICORP Network Distance? Datacenter Failover

Slide 13

Slide 13 text

HASHICORP Network Space p1 = ipv4(1.2.3.4) p2 = ipv4(5.6.7.8) dist(p1, p2) = rtt(p1, p2) ping?

Slide 14

Slide 14 text

HASHICORP Ping Problem Suppose you have 20K+ peers (BitTorrent) Pair-wise distance from {PeerN, PeerM} requires N^2 Probes Samples = 3 Probes = 1.2B Storage = 9.6GB (8-byte doubles)
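The figures on this slide follow from simple arithmetic, reproduced here as a check. The variable names are illustrative, and the sketch assumes that every ordered pair of peers is probed and that every sample is stored as an 8-byte double.

```python
peers = 20_000
samples_per_pair = 3
bytes_per_double = 8

pairs = peers ** 2                      # ordered pairs: 400 million
probes = pairs * samples_per_pair       # 1.2 billion probes
storage_bytes = probes * bytes_per_double

print(probes)                # 1200000000
print(storage_bytes / 1e9)   # 9.6 (GB)
```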

Slide 15

Slide 15 text

HASHICORP Ping Representation Ping creates a matrix of pairwise latency dist(p1, p2) = rtt(p1, p2) rtt(p1, p2) = pairwise[p1][p2]

Slide 16

Slide 16 text

HASHICORP Cartesian Representation Cartesian Coordinates allow us to exploit Pythagorean Theorem a^2 + b^2 = c^2

Slide 17

Slide 17 text

HASHICORP Vivaldi Decentralized Network Coordinates Frank Dabek, Russ Cox, Frans Kaashoek, Robert Morris

Slide 18

Slide 18 text

HASHICORP Vivaldi Pairwise connect peers with a spring Spring’s natural length is the RTT Compress down all peers to the origin and then relax

Slide 19

Slide 19 text

HASHICORP Vivaldi Peer Peer Peer Peer Peer

Slide 20

Slide 20 text

HASHICORP Vivaldi Peer Peer Peer Peer Peer

Slide 21

Slide 21 text

HASHICORP Vivaldi Peer Peer Peer Peer Peer

Slide 22

Slide 22 text

HASHICORP Vivaldi Peer Peer Peer Peer Peer

Slide 23

Slide 23 text

HASHICORP Vivaldi Peer Peer Peer Peer Peer

Slide 24

Slide 24 text

HASHICORP Vivaldi Peer Peer Peer Peer Peer

Slide 25

Slide 25 text

HASHICORP Vivaldi Coordinates provide predictive model Communication between nodes updates the model Coordinates converge over time
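A toy simulation can illustrate the convergence claim. This is a minimal sketch, not the authors' code: it assumes a handful of nodes whose true latencies are exactly embeddable in 2D, a fixed sensitivity of 0.25, and a random peer choice per round. The function name `simulate` and all parameters are hypothetical. It requires Python 3.8+ for `math.dist`.

```python
import math
import random


def simulate(true_pos, rounds=2000, sensitivity=0.25, seed=0):
    """Run spring-style updates and return (initial, final) mean prediction error."""
    rng = random.Random(seed)
    n, dims = len(true_pos), len(true_pos[0])
    # "Measured" RTTs are the true distances between the hidden positions.
    rtt = [[math.dist(a, b) for b in true_pos] for a in true_pos]
    coords = [[0.0] * dims for _ in range(n)]  # everyone starts at the origin

    def mean_error():
        pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
        total = sum(abs(math.dist(coords[i], coords[j]) - rtt[i][j])
                    for i, j in pairs)
        return total / len(pairs)

    before = mean_error()
    for _ in range(rounds):
        i, j = rng.sample(range(n), 2)  # node i observes an RTT to node j
        delta = [a - b for a, b in zip(coords[i], coords[j])]
        length = math.sqrt(sum(d * d for d in delta))
        if length == 0:
            # Coincident nodes: pick a random direction to separate them.
            delta = [rng.gauss(0, 1) for _ in range(dims)]
            length = math.sqrt(sum(d * d for d in delta))
        err = rtt[i][j] - math.dist(coords[i], coords[j])
        coords[i] = [c + (d / length) * err * sensitivity
                     for c, d in zip(coords[i], delta)]
    return before, mean_error()


before, after = simulate([(0, 0), (30, 0), (0, 40), (30, 40)])
print(before, after)  # the mean prediction error should shrink
```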

Slide 26

Slide 26 text

HASHICORP Vivaldi Peer Peer Peer Peer Peer
const sensitivity = 0.25
var local = {x: 0, y: 0, z: 0}
var remote = {x: 0, y: 0, z: 0}
def update(rtt=500msec, remote):
    estimate = euclidean_dist(local, remote)
    err = rtt - estimate
    direction_of_err = unitVector(local - remote)
    scaled_direction = direction_of_err * err
    local = local + scaled_direction * sensitivity

Slide 27

Slide 27 text

HASHICORP Vivaldi Peer Peer Peer Peer Peer
const sensitivity = 0.25
var local = {x: 0, y: 0, z: 0}
var remote = {x: 0, y: 0, z: 0}
def update(rtt=500msec, remote):
    estimate = 0msec
    err = rtt - estimate
    direction_of_err = unitVector(local - remote)
    scaled_direction = direction_of_err * err
    local = local + scaled_direction * sensitivity

Slide 28

Slide 28 text

HASHICORP Vivaldi Peer Peer Peer Peer Peer
const sensitivity = 0.25
var local = {x: 0, y: 0, z: 0}
var remote = {x: 0, y: 0, z: 0}
def update(rtt=500msec, remote):
    estimate = 0msec
    err = 500msec
    direction_of_err = unitVector(local - remote)
    scaled_direction = direction_of_err * err
    local = local + scaled_direction * sensitivity

Slide 29

Slide 29 text

HASHICORP Vivaldi Peer Peer Peer Peer Peer
const sensitivity = 0.25
var local = {x: 0, y: 0, z: 0}
var remote = {x: 0, y: 0, z: 0}
def update(rtt=500msec, remote):
    estimate = 0msec
    err = 500msec
    direction_of_err = {x: -0.1, y: 0.6, z: 0.8}
    scaled_direction = direction_of_err * err
    local = local + scaled_direction * sensitivity

Slide 30

Slide 30 text

HASHICORP Vivaldi Peer Peer Peer Peer Peer
const sensitivity = 0.25
var local = {x: 0, y: 0, z: 0}
var remote = {x: 0, y: 0, z: 0}
def update(rtt=500msec, remote):
    estimate = 0msec
    err = 500msec
    direction_of_err = {x: -0.1, y: 0.6, z: 0.8}
    scaled_direction = {x: -50, y: 300, z: 400}
    local = local + scaled_direction * sensitivity

Slide 31

Slide 31 text

HASHICORP Vivaldi Peer Peer Peer Peer Peer
const sensitivity = 0.25
var local = {x: -12.5, y: 75, z: 100}
var remote = {x: 0, y: 0, z: 0}
def update(rtt=500msec, remote):
    estimate = 0msec
    err = 500msec
    direction_of_err = {x: -0.1, y: 0.6, z: 0.8}
    scaled_direction = {x: -50, y: 300, z: 400}
    local = {x: -12.5, y: 75, z: 100}
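The walkthrough on the preceding slides can be turned into a small runnable sketch. The names `update` and `unit_vector` follow the slide pseudocode, but the list-based coordinates, the optional `direction` argument, and the random fallback for coincident nodes are assumptions made here for illustration.

```python
import math
import random

SENSITIVITY = 0.25  # value used on the slides


def unit_vector(v, rng=random):
    """Return v scaled to length 1; pick a random direction if v is zero."""
    length = math.sqrt(sum(x * x for x in v))
    while length == 0:
        # Coincident coordinates give no direction, so choose one at random
        # (an assumption; the slide simply shows an arbitrary direction).
        v = [rng.gauss(0, 1) for _ in v]
        length = math.sqrt(sum(x * x for x in v))
    return [x / length for x in v]


def update(local, remote, rtt, direction=None, sensitivity=SENSITIVITY):
    """Move `local` along the error direction by a fraction of the error."""
    delta = [a - b for a, b in zip(local, remote)]
    estimate = math.sqrt(sum(x * x for x in delta))
    err = rtt - estimate
    if direction is None:
        direction = unit_vector(delta)
    return [a + d * err * sensitivity for a, d in zip(local, direction)]


# Reproduce the slides: both nodes at the origin, a measured RTT of 500 ms,
# and the slides' example direction passed in explicitly.
print(update([0, 0, 0], [0, 0, 0], 500, direction=[-0.1, 0.6, 0.8]))
# close to [-12.5, 75.0, 100.0], matching the final slide of the walkthrough
```

Without the explicit `direction`, the first update from coincident coordinates moves the node 125 units (0.25 of the 500 ms error) in a random direction.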

Slide 32

Slide 32 text

HASHICORP Vivaldi const sensitivity changes how rapidly we adjust Large value = fast to update, but unstable Small value = slow to converge, but stable Dynamic value?

Slide 33

Slide 33 text

HASHICORP Vivaldi
const error_sensitivity_adj = 0.25
const position_sensitivity_adj = 0.25
var local_err = 1000msec
def update(rtt, remote, remote_err):
    …
    balance_err = local_err / (local_err + remote_err)
    rel_err = abs(estimate - rtt) / rtt
    local_err = rel_err * error_sensitivity_adj * balance_err + local_err * (1 - error_sensitivity_adj * balance_err)
    sensitivity = position_sensitivity_adj * balance_err
    local = local + scaled_direction * sensitivity

Slide 34

Slide 34 text

HASHICORP Vivaldi
const error_sensitivity_adj = 0.25
const position_sensitivity_adj = 0.25
var local_err = 1000msec
def update(rtt, remote, remote_err):
    …
    balance_err = local_err / (local_err + remote_err)
    rel_err = abs(estimate - rtt) / rtt
    local_err = rel_err * error_sensitivity_adj * balance_err + local_err * (1 - error_sensitivity_adj * balance_err)
    sensitivity = position_sensitivity_adj * balance_err
    local = local + scaled_direction * sensitivity
High Remote Error => Low Sensitivity

Slide 35

Slide 35 text

HASHICORP Vivaldi
const error_sensitivity_adj = 0.25
const position_sensitivity_adj = 0.25
var local_err = 1000msec
def update(rtt, remote, remote_err):
    …
    balance_err = local_err / (local_err + remote_err)
    rel_err = abs(estimate - rtt) / rtt
    local_err = rel_err * error_sensitivity_adj * balance_err + local_err * (1 - error_sensitivity_adj * balance_err)
    sensitivity = position_sensitivity_adj * balance_err
    local = local + scaled_direction * sensitivity
High Local Error => High Sensitivity
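The effect of the error weighting is easier to see with numbers. The sketch below isolates the error and sensitivity computation from the slide pseudocode. The function name `adapt` is hypothetical, errors are treated as unitless relative values, and the relative error is taken as an absolute value so the estimate stays non-negative; all three are assumptions for illustration.

```python
ERROR_SENSITIVITY_ADJ = 0.25
POSITION_SENSITIVITY_ADJ = 0.25


def adapt(local_err, remote_err, estimate, rtt):
    """Return (new_local_err, sensitivity) for one RTT observation."""
    # Share of the total uncertainty that belongs to the local node.
    balance_err = local_err / (local_err + remote_err)
    # Relative error of this prediction (absolute value: an assumption).
    rel_err = abs(estimate - rtt) / rtt
    weight = ERROR_SENSITIVITY_ADJ * balance_err
    # Weighted moving average of the error estimate.
    new_local_err = rel_err * weight + local_err * (1 - weight)
    sensitivity = POSITION_SENSITIVITY_ADJ * balance_err
    return new_local_err, sensitivity


# Equal confidence on both sides.
print(adapt(1.0, 1.0, estimate=400, rtt=500))
# High remote error: the local node barely moves.
print(adapt(1.0, 9.0, estimate=400, rtt=500)[1])
# High local error: the local node moves almost at the full rate.
print(adapt(9.0, 1.0, estimate=400, rtt=500)[1])
```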

Slide 36

Slide 36 text

HASHICORP Vivaldi Each node tracks position and error estimate Coordinate converges over time Local error goes down as estimates become accurate Several tuning parameters, including dimensionality

Slide 37

Slide 37 text

HASHICORP Dimensionality Coordinates can be in any Euclidean Space 2D, 3D, or N Dimensions? Principal Component Analysis (PCA) to reduce dimensions

Slide 38

Slide 38 text

HASHICORP Dimensionality Reduction
Time of Day    Brightness     Angle of Sun
12PM           Very Bright    90 degrees
3PM            Very Bright    80 degrees
9PM            Very Dark      0 degrees
12AM           Very Dark      0 degrees

Slide 39

Slide 39 text

HASHICORP Dimensionality Reduction
Time of Day    Brightness     Angle of Sun
12PM           Very Bright    90 degrees
3PM            Very Bright    80 degrees
9PM            Very Dark      0 degrees
12AM           Very Dark      0 degrees

Slide 40

Slide 40 text

HASHICORP Dimensionality Performance dramatically reduced below 2D Marginal improvement past 5D Depends on the complexity of the underlying topology

Slide 41

Slide 41 text

HASHICORP Height / Fixed Costs Application Userspace Runtime Operating System Hypervisor Network Card Fixed Cost 0.5 msec

Slide 42

Slide 42 text

HASHICORP Coordinate + Height Allows coordinates to model non-fixed latency Improves the predictive power of the coordinates Reduces the dimensionality required RTT = dist(p1, p2) + p1.Height + p2.Height
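The height formula on this slide can be sketched as below. The dict layout with `vec` and `height` keys is a hypothetical representation chosen for the example, and the sample values are made up. It requires Python 3.8+ for `math.dist`.

```python
import math


def predicted_rtt(p1, p2):
    """RTT = Euclidean distance between the vectors plus both heights."""
    return math.dist(p1["vec"], p2["vec"]) + p1["height"] + p2["height"]


# Two nodes 5 ms apart in the coordinate space, each with 0.5 ms of fixed cost.
a = {"vec": (0.0, 0.0), "height": 0.5}
b = {"vec": (3.0, 4.0), "height": 0.5}
print(predicted_rtt(a, b))  # 5.0 + 0.5 + 0.5 = 6.0
```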

Slide 43

Slide 43 text

HASHICORP Extensions to Vivaldi

Slide 44

Slide 44 text

HASHICORP Network Coordinates in the Wild Azureus BitTorrent Client (10K+ clients) Dimensionality Analysis in the Wild Latency and Update Filters Churn, Drift, Intrinsic Error, Latency Variation Ledlie, Gardner, and Seltzer

Slide 45

Slide 45 text

HASHICORP Drift Peer Peer Peer Peer Peer

Slide 46

Slide 46 text

HASHICORP Drift Peer Peer Peer Peer Peer

Slide 47

Slide 47 text

HASHICORP Gravity Applying small “gravity” toward origin Prevents run away coordinates Cluster can still “rotate” about the origin
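One possible shape for such a gravity term is sketched below. The slide does not give a formula, so the quadratic growth with distance, the tuning constant `rho`, and the function name `apply_gravity` are all assumptions for illustration.

```python
import math


def apply_gravity(coord, rho=150.0):
    """Pull a coordinate toward the origin; the pull grows with distance."""
    distance = math.sqrt(sum(x * x for x in coord))
    if distance == 0:
        return list(coord)  # already at the origin, nothing to do
    # Assumed form: negligible near the origin, stronger far away.
    pull = (distance / rho) ** 2
    pull = min(pull, distance)  # never overshoot the origin
    return [x - (x / distance) * pull for x in coord]


print(apply_gravity([300.0, 0.0]))  # moves slightly toward the origin
print(apply_gravity([1.5, 0.0]))    # almost unchanged
```

Because the pull acts only along the line to the origin, it limits how far the cluster can drift but does nothing to stop it rotating, which matches the last bullet on the slide.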

Slide 48

Slide 48 text

HASHICORP On Suitability of Euclidean Embedding for Host-based Network Coordinate Systems Lee, Zhang, Sahu, Saha Analysis of Triangle Inequality Violations (TIV) - Intrinsic Error Understanding source of TIV Adjustment factor to compensate 7D < 2D + Adjustment

Slide 49

Slide 49 text

HASHICORP Triangle Inequality Violation Server 1 Server 2 Server 3 Core Router Top of Rack Switch Top of Rack Switch c < a + b Server 1 -> Server 2 : 0.1 msec Server 2 -> Server 3 : 0.3 msec Server 1 -> Server 3 : 0.3 msec Packet Processing Time > Transit Time

Slide 50

Slide 50 text

HASHICORP TIV Adjustment Track the estimation error from measurement Adjustment is the average over a sample window Adjustment (local and remote) is added to estimates
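The three bullets can be sketched as a small sliding-window tracker. The class and method names are hypothetical. Halving the average, so that the local and remote adjustments together add back roughly one average error, is an assumption that the slide does not state.

```python
from collections import deque


class Adjustment:
    """Track recent estimation errors and derive an adjustment term."""

    def __init__(self, window=20):
        self.errors = deque(maxlen=window)  # oldest samples fall off

    def observe(self, rtt, estimate):
        """Record how far the raw coordinate estimate was from the measurement."""
        self.errors.append(rtt - estimate)

    def value(self):
        if not self.errors:
            return 0.0
        # Half of the average error (assumption: each endpoint carries half).
        return sum(self.errors) / (2 * len(self.errors))


def adjusted_estimate(raw_estimate, local_adj, remote_adj):
    """Add both endpoints' adjustments to the raw coordinate distance."""
    return raw_estimate + local_adj + remote_adj


local, remote = Adjustment(), Adjustment()
for tracker in (local, remote):
    tracker.observe(rtt=10.0, estimate=8.0)
    tracker.observe(rtt=10.0, estimate=8.0)
print(adjusted_estimate(8.0, local.value(), remote.value()))  # 10.0
```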

Slide 51

Slide 51 text

HASHICORP Serf Implementation

Slide 52

Slide 52 text

HASHICORP Serf Serf is a decentralized solution for cluster membership, failure detection, and orchestration. Built on gossip protocol (SWIM) Runs at 10K+ node scale https://serf.io

Slide 53

Slide 53 text

HASHICORP Serf Assign a coordinate to each node? Applications can leverage for intelligent routing, peer selection, etc Gossip is doing background communication

Slide 54

Slide 54 text

HASHICORP Failure Detection Peer Peer Ping

Slide 55

Slide 55 text

HASHICORP Failure Detection Peer Peer Ack

Slide 56

Slide 56 text

HASHICORP Serf Attach Coordinate to Ack messages RTT computed from the send time of Ping Coordinates of peers cached Random peers avoid selection bias
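The ping/ack flow described here can be sketched as follows. This is not Serf's code: the `Prober` class, its method names, and the use of sequence numbers to match acks to pings are assumptions for illustration.

```python
import time


class Prober:
    """Measure RTT from ping send time and cache coordinates carried on acks."""

    def __init__(self, clock=time.monotonic):
        self.clock = clock
        self.sent_at = {}      # sequence number -> send time
        self.peer_coords = {}  # peer name -> last coordinate seen

    def on_ping_sent(self, seq):
        self.sent_at[seq] = self.clock()

    def on_ack(self, seq, peer, coord):
        """Return the RTT in seconds, or None for an unknown or duplicate ack."""
        started = self.sent_at.pop(seq, None)
        if started is None:
            return None
        self.peer_coords[peer] = coord  # coordinate piggybacked on the ack
        return self.clock() - started


# Drive the prober with a fake clock so the example is deterministic.
ticks = iter([10.0, 10.25])
prober = Prober(clock=lambda: next(ticks))
prober.on_ping_sent(seq=1)
print(prober.on_ack(seq=1, peer="n2", coord=[0.0, 0.0]))  # 0.25
```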

Slide 57

Slide 57 text

HASHICORP Serf Implementation uses 8D + Height 20 Sample Adjustment Term 3 Sample Latency Filter Small Gravity Coordinate Snapshotting
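A three-sample latency filter could look like the sketch below. The slide gives only the window size, so the choice of a median over the window, the per-peer bookkeeping, and the class name `LatencyFilter` are assumptions.

```python
from collections import defaultdict, deque
from statistics import median


class LatencyFilter:
    """Smooth RTT samples per peer with a small sliding-window median."""

    def __init__(self, size=3):
        # Each peer gets its own window that keeps only the latest `size` samples.
        self.windows = defaultdict(lambda: deque(maxlen=size))

    def filtered(self, peer, rtt):
        window = self.windows[peer]
        window.append(rtt)
        return median(window)


f = LatencyFilter()
f.filtered("n2", 0.6)
f.filtered("n2", 50.0)         # a one-off spike
print(f.filtered("n2", 0.7))   # 0.7: the spike is ignored once the window is full
```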

Slide 58

Slide 58 text

HASHICORP
$ serf rtt n1 n2
Estimated n1 <-> n2 rtt: 0.610 ms
$ serf rtt n2 # Running from n1
Estimated n1 <-> n2 rtt: 0.610 ms

Slide 59

Slide 59 text

HASHICORP Consul Usage

Slide 60

Slide 60 text

HASHICORP Consul Consul is a solution for service discovery, monitoring, configuration and orchestration. Built on Serf + Raft (a consensus protocol, like Paxos) Runs at 50K+ node scale https://consul.io

Slide 61

Slide 61 text

HASHICORP Consul Serf is already computing coordinates Coordinates are periodically pushed to central servers Servers expose the coordinates over APIs Nearest neighbor routing, datacenter failover, etc.

Slide 62

Slide 62 text

Terminal HASHICORP
$ consul rtt node-10-0-1-8
Estimated node-10-0-1-8 <-> node-10-0-1-6 rtt: 0.781 ms (using LAN coordinates)
$ sleep 30
$ consul rtt node-10-0-1-8
Estimated node-10-0-1-8 <-> node-10-0-1-6 rtt: 0.719 ms (using LAN coordinates)

Slide 63

Slide 63 text

Terminal HASHICORP
$ curl localhost:8500/v1/catalog/nodes?near=node-78r16zb3q | jq '.[].Node'
"node-78r16zb3q"
"node-10-0-4-190"
"node-10-0-1-7"
"node-10-0-4-240"
$ curl localhost:8500/v1/catalog/service/vault?near=node-78r16zb3q | jq '.[].Node'
"node-10-0-1-71"
"node-10-0-3-119"
"node-10-0-3-249"
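The same queries can be issued from code. The sketch below uses only the Python standard library. The endpoint paths and the `near` parameter are taken from the terminal session on this slide, while the helper names `catalog_url` and `node_names` and the default base address are assumptions.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def catalog_url(path, near, base="http://localhost:8500"):
    """Build a catalog URL whose results are sorted by distance from `near`."""
    return f"{base}{path}?{urlencode({'near': near})}"


def node_names(url):
    """Fetch a catalog listing and return each entry's Node field,
    the equivalent of piping the response through jq '.[].Node'."""
    with urlopen(url) as response:
        return [entry["Node"] for entry in json.load(response)]


print(catalog_url("/v1/catalog/nodes", "node-78r16zb3q"))
print(catalog_url("/v1/catalog/service/vault", "node-78r16zb3q"))
# node_names(...) needs a reachable Consul agent, so it is not called here.
```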

Slide 64

Slide 64 text

HASHICORP Conclusion Vivaldi provides a decentralized algorithm for coordinates Networks are not Euclidean, which leads to TIV Interesting uses in distributed systems Serf and Consul expose coordinates via APIs

Slide 65

Slide 65 text

HASHICORP Thanks! Q/A