August 18, 2016

# Vivaldi: Decentralized Network Coordinates

Large-scale distributed systems can use round-trip time estimates between peers to make intelligent decisions about request routing, data replication, and failure handling. Vivaldi is a distributed algorithm for efficiently computing network coordinates for a large set of peers. In this talk, we motivate the need for network coordinates and introduce the Vivaldi algorithm. We briefly survey interesting extensions and related work, both to understand how to use Vivaldi in the wild and to understand the sources of error in its modeling. Lastly, we talk about how Vivaldi is used in the Serf and Consul tools to solve user problems.


## Transcript

6. ### HASHICORP Euclidean Coordinates

p1 = {x: 1, y: 2, z: 3}
p2 = {x: 4, y: 5, z: 6}
dist(p1, p2) = sqrt((p2.x-p1.x)^2 + (p2.y-p1.y)^2 + (p2.z-p1.z)^2)
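
The distance formula on this slide can be checked with a few lines of Python. This is a minimal sketch; the dict-based point layout simply mirrors the slide's notation and is not taken from any library.

```python
import math

def dist(p1, p2):
    """Euclidean distance between two points given as {axis: value} dicts."""
    return math.sqrt(sum((p2[k] - p1[k]) ** 2 for k in p1))

p1 = {"x": 1, "y": 2, "z": 3}
p2 = {"x": 4, "y": 5, "z": 6}

# Each axis differs by 3, so the result is sqrt(27), roughly 5.196.
print(dist(p1, p2))
```
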
7. ### HASHICORP Euclidean Space

Euclidean Distance defined in Euclidean Space
Cartesian Coordinates {x, y, z} are Euclidean

p2) = ?
9. ### HASHICORP Network Space

p1 = ipv4(1.2.3.4)
p2 = ipv4(5.6.7.8)
dist(p1, p2) = rtt(p1, p2)

Application
11. ### HASHICORP Network Distance?

Nearest Neighbor Routing
(Diagram labels: Web Server; API Server x4)

13. ### HASHICORP Network Space

p1 = ipv4(1.2.3.4)
p2 = ipv4(5.6.7.8)
dist(p1, p2) = rtt(p1, p2)
ping?
14. ### HASHICORP Ping Problem

Suppose you have 20K+ peers (BitTorrent)
Pair-wise distance from {PeerN, PeerM} requires N^2 probes
Samples = 3
Probes = 1.2B
Storage = 9.6GB (double)
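
The figures on this slide follow from simple arithmetic. The sketch below assumes exactly 20,000 peers, counts ordered pairs (N^2, as on the slide), and stores each sample as an 8-byte double.

```python
peers = 20_000
samples = 3
bytes_per_sample = 8  # one double-precision float per stored sample

pairs = peers ** 2                 # ordered pairs, matching the slide's N^2
probes = pairs * samples           # every pair is probed `samples` times
storage_gb = probes * bytes_per_sample / 1e9

print(f"{probes:,} probes, {storage_gb} GB")
```
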
15. ### HASHICORP Ping Representation

Ping creates a matrix of pairwise latency
dist(p1, p2) = rtt(p1, p2)
rtt(p1, p2) = pairwise[p1][p2]
16. ### HASHICORP Cartesian Representation

Cartesian Coordinates allow us to exploit the Pythagorean Theorem
a^2 + b^2 = c^2
17. ### HASHICORP Vivaldi

Decentralized Network Coordinates
Frank Dabek, Russ Cox, Frans Kaashoek, Robert Morris
18. ### HASHICORP Vivaldi

Pairwise connect peers with a spring
Spring’s natural length is the RTT
Compress down all peers to the origin and then relax

25. ### HASHICORP Vivaldi

Coordinates provide predictive model
Communication between nodes updates the model
Coordinates converge over time
26. ### HASHICORP Vivaldi

(Diagram: five peers)
const sensitivity = 0.25
var local = {x: 0, y: 0, z: 0}
var remote = {x: 0, y: 0, z: 0}
def update(rtt=500msec, remote):
    estimate = euclidean_dist(local, remote)
    err = rtt - estimate
    direction_of_err = unitVector(local - remote)
    scaled_direction = direction_of_err * err
    local = local + scaled_direction * sensitivity
27. ### HASHICORP Vivaldi

(Diagram: five peers)
const sensitivity = 0.25
var local = {x: 0, y: 0, z: 0}
var remote = {x: 0, y: 0, z: 0}
def update(rtt=500msec, remote):
    estimate = 0msec
    err = rtt - estimate
    direction_of_err = unitVector(local - remote)
    scaled_direction = direction_of_err * err
    local = local + scaled_direction * sensitivity
28. ### HASHICORP Vivaldi

(Diagram: five peers)
const sensitivity = 0.25
var local = {x: 0, y: 0, z: 0}
var remote = {x: 0, y: 0, z: 0}
def update(rtt=500msec, remote):
    estimate = 0msec
    err = 500msec
    direction_of_err = unitVector(local - remote)
    scaled_direction = direction_of_err * err
    local = local + scaled_direction * sensitivity
29. ### HASHICORP Vivaldi

(Diagram: five peers)
const sensitivity = 0.25
var local = {x: 0, y: 0, z: 0}
var remote = {x: 0, y: 0, z: 0}
def update(rtt=500msec, remote):
    estimate = 0msec
    err = 500msec
    direction_of_err = {x: -0.1, y: 0.6, z: 0.8}
    scaled_direction = direction_of_err * err
    local = local + scaled_direction * sensitivity
30. ### HASHICORP Vivaldi

(Diagram: five peers)
const sensitivity = 0.25
var local = {x: 0, y: 0, z: 0}
var remote = {x: 0, y: 0, z: 0}
def update(rtt=500msec, remote):
    estimate = 0msec
    err = 500msec
    direction_of_err = {x: -0.1, y: 0.6, z: 0.8}
    scaled_direction = {x: -50, y: 300, z: 400}
    local = local + scaled_direction * sensitivity
31. ### HASHICORP Vivaldi

(Diagram: five peers)
const sensitivity = 0.25
var local = {x: -12.5, y: 75, z: 100}
var remote = {x: 0, y: 0, z: 0}
def update(rtt=500msec, remote):
    estimate = 0msec
    err = 500msec
    direction_of_err = {x: -0.1, y: 0.6, z: 0.8}
    scaled_direction = {x: -50, y: 300, z: 400}
    local = {x: -12.5, y: 75, z: 100}
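
The walkthrough on these slides can be reproduced in runnable form. The following is a minimal Python sketch of the fixed-sensitivity update, not the implementation from the paper or from Serf. Two things are assumptions: coordinates are plain lists of floats, and when the two points coincide (as they do at the start) a random direction is chosen, which is one way to arrive at an arbitrary vector like the one on the slide. The optional `direction` argument is hypothetical and exists only so the slide's numbers can be replayed.

```python
import math
import random

SENSITIVITY = 0.25  # the slide's `sensitivity` constant

def unit_vector(a, b):
    """Unit vector pointing from b toward a."""
    diff = [ai - bi for ai, bi in zip(a, b)]
    norm = math.sqrt(sum(d * d for d in diff))
    if norm == 0.0:
        # Assumption: coincident points get a random direction so they can separate.
        diff = [random.gauss(0.0, 1.0) for _ in a]
        norm = math.sqrt(sum(d * d for d in diff))
    return [d / norm for d in diff]

def update(local, remote, rtt, direction=None):
    """Return `local` moved along the error direction by SENSITIVITY * error."""
    estimate = math.dist(local, remote)
    err = rtt - estimate
    if direction is None:
        direction = unit_vector(local, remote)
    return [l + d * err * SENSITIVITY for l, d in zip(local, direction)]

# Replay the slide: both nodes at the origin, a measured RTT of 500 ms,
# and the direction vector shown on the slide.
new_local = update([0.0, 0.0, 0.0], [0.0, 0.0, 0.0], rtt=500.0,
                   direction=[-0.1, 0.6, 0.8])
print(new_local)  # approximately [-12.5, 75.0, 100.0]
```

Without an explicit direction, the same call moves the node 125 units (500 * 0.25) away from the origin in a random direction.
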
32. ### HASHICORP Vivaldi

const sensitivity changes how rapidly we adjust
Large value = fast to update, but unstable
Small value = slow to converge, but stable
Dynamic value?
33. ### HASHICORP Vivaldi

const error_sensitivity_adj = 0.25
const position_sensitivity_adj = 0.25
var local_err = 1000msec
def update(rtt, remote, remote_err):
    …
    balance_err = local_err / (local_err + remote_err)
    rel_err = (estimate - rtt) / rtt
    local_err = rel_err * error_sensitivity_adj * balance_err + local_err * (1 - error_sensitivity_adj * balance_err)
    sensitivity = position_sensitivity_adj * balance_err
    local = local + scaled_direction * sensitivity
34. ### HASHICORP Vivaldi

const error_sensitivity_adj = 0.25
const position_sensitivity_adj = 0.25
var local_err = 1000msec
def update(rtt, remote, remote_err):
    …
    balance_err = local_err / (local_err + remote_err)
    rel_err = (estimate - rtt) / rtt
    local_err = rel_err * error_sensitivity_adj * balance_err + local_err * (1 - error_sensitivity_adj * balance_err)
    sensitivity = position_sensitivity_adj * balance_err
    local = local + scaled_direction * sensitivity
High Remote Error => Low Sensitivity
35. ### HASHICORP Vivaldi

const error_sensitivity_adj = 0.25
const position_sensitivity_adj = 0.25
var local_err = 1000msec
def update(rtt, remote, remote_err):
    …
    balance_err = local_err / (local_err + remote_err)
    rel_err = (estimate - rtt) / rtt
    local_err = rel_err * error_sensitivity_adj * balance_err + local_err * (1 - error_sensitivity_adj * balance_err)
    sensitivity = position_sensitivity_adj * balance_err
    local = local + scaled_direction * sensitivity
High Local Error => High Sensitivity
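
The adaptive update on these slides can be sketched in Python as follows. This is an illustration under stated assumptions rather than a reference implementation: error estimates are treated as unitless relative errors, the relative error uses an absolute value (as in the Vivaldi paper, whereas the slide omits it), and the two points are assumed to be distinct so the direction is well defined. The function name `adaptive_update` is hypothetical.

```python
import math

ERROR_SENSITIVITY_ADJ = 0.25
POSITION_SENSITIVITY_ADJ = 0.25

def adaptive_update(local, local_err, remote, remote_err, rtt):
    """One adaptive step; returns (new_local, new_local_err).

    Assumes local != remote, rtt > 0, and local_err + remote_err > 0.
    """
    estimate = math.dist(local, remote)
    # Weight the sample by how uncertain we are relative to the remote node.
    balance_err = local_err / (local_err + remote_err)
    # Assumption: absolute relative error, so the error estimate stays non-negative.
    rel_err = abs(estimate - rtt) / rtt
    new_err = (rel_err * ERROR_SENSITIVITY_ADJ * balance_err
               + local_err * (1 - ERROR_SENSITIVITY_ADJ * balance_err))
    sensitivity = POSITION_SENSITIVITY_ADJ * balance_err
    direction = [(l - r) / estimate for l, r in zip(local, remote)]
    step = (rtt - estimate) * sensitivity
    new_local = [l + d * step for l, d in zip(local, direction)]
    return new_local, new_err

local, remote, rtt = [10.0, 0.0], [0.0, 0.0], 20.0
# High local error, low remote error: take a large step.
trusted, _ = adaptive_update(local, 1.0, remote, 0.1, rtt)
# Low local error, high remote error: take a small step.
doubtful, _ = adaptive_update(local, 0.1, remote, 1.0, rtt)
print(trusted[0] - 10.0, doubtful[0] - 10.0)
```

The first call moves roughly ten times farther than the second, which matches the two annotations on the slides: high remote error lowers sensitivity, and high local error raises it.
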
36. ### HASHICORP Vivaldi

Each node tracks position and error estimate
Coordinate converges over time
Local error goes down as estimates become accurate
Several tuning parameters, including dimensionality
37. ### HASHICORP Dimensionality

Coordinates can be in any Euclidean Space
2D, 3D, or N Dimensions?
Principal Component Analysis (PCA) to reduce dimensions
38. ### HASHICORP Dimensionality Reduction

| Time of Day | Brightness | Angle of Sun |
| --- | --- | --- |
| 12PM | Very Bright | 90 degrees |
| 3PM | Very Bright | 80 degrees |
| 9PM | Very Dark | 0 degrees |
| 12AM | Very Dark | 0 degrees |
40. ### HASHICORP Dimensionality

Performance dramatically reduced below 2D
Marginal improvement past 5D
Depends on the complexity of the underlying topology
41. ### HASHICORP Height / Fixed Costs

(Diagram, stack from top to bottom: Application, Userspace Runtime, Operating System, Hypervisor, Network Card)
Fixed Cost 0.5 msec
42. ### HASHICORP Coordinate + Height

Allows coordinates to model non-fixed latency
Improves the predictive power of the coordinates
Reduces the dimensionality required
RTT = dist(p1, p2) + p1.Height + p2.Height
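
The height formula on this slide can be sketched as below. The `vec` and `height` field names are hypothetical, and the 0.5 ms heights reuse the fixed-cost figure from the previous slide purely as an example.

```python
import math

def estimate_rtt(p1, p2):
    """RTT = dist(p1, p2) + p1.Height + p2.Height, per the slide."""
    return math.dist(p1["vec"], p2["vec"]) + p1["height"] + p2["height"]

a = {"vec": [0.0, 0.0], "height": 0.5}
b = {"vec": [3.0, 4.0], "height": 0.5}
c = {"vec": [0.0, 0.0], "height": 0.5}

print(estimate_rtt(a, b))  # 5.0 from distance plus 0.5 per endpoint
print(estimate_rtt(a, c))  # same position, yet the estimate is still 1.0
```

The second call shows why height helps: two nodes at the same position still get a non-zero estimate, which a pure Euclidean distance cannot express.
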

44. ### HASHICORP Network Coordinates in the Wild

Azureus BitTorrent Client (10K+ clients)
Dimensionality Analysis in the Wild
Latency and Update Filters
Churn, Drift, Intrinsic Error, Latency Variation
Ledlie, Gardner, and Seltzer

47. ### HASHICORP Gravity

Applying small “gravity” toward origin
Prevents runaway coordinates
Cluster can still “rotate” about the origin
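
One possible form of such a gravity term is sketched below. The slide does not give a formula, so the quadratic pull and the `GRAVITY_RHO` constant are assumptions chosen for illustration: the pull is negligible near the origin and grows quickly for coordinates that have drifted far away.

```python
import math

GRAVITY_RHO = 150.0  # hypothetical tuning constant, not taken from the slides

def apply_gravity(coord):
    """Pull coord toward the origin by (distance / GRAVITY_RHO)^2 units."""
    distance = math.sqrt(sum(c * c for c in coord))
    if distance == 0.0:
        return list(coord)
    # Cap the pull so a coordinate never overshoots the origin.
    pull = min((distance / GRAVITY_RHO) ** 2, distance)
    scale = (distance - pull) / distance
    return [c * scale for c in coord]

print(apply_gravity([1.0, 0.0]))     # barely moves
print(apply_gravity([1500.0, 0.0]))  # pulled about 100 units inward
```

Because the pull depends only on distance from the origin, it does nothing to stop the whole cluster from rotating, which is consistent with the last point on the slide.
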
48. ### HASHICORP On Suitability of Euclidean Embedding for Host-based Network Coordinate Systems

Lee, Zhang, Sahu, Saha
Analysis of Triangle Inequality Violations (TIV) - Intrinsic Error
Understanding source of TIV
Adjustment factor to compensate
7D < 2D + Adjustment
49. ### HASHICORP Triangle Inequality Violation

(Diagram labels: Server 1, Server 2, Server 3, Core Router, Top of Rack Switch, Top of Rack Switch)
c < a + b
Server 1 -> Server 2 : 0.1 msec
Server 2 -> Server 3 : 0.3 msec
Server 1 -> Server 3 : 0.3 msec
Packet Processing Time > Transit Time
50. ### HASHICORP TIV Adjustment

Track the estimation error from measurement
Adjustment is the average over a sample window
Adjustment (local and remote) is added to estimates
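
The adjustment term described on this slide could look roughly like the sketch below. The class and function names are hypothetical. One detail is an assumption not stated on the slide: each node stores half of its average error, so that adding both the local and the remote adjustment does not count the same error twice.

```python
from collections import deque

class Adjustment:
    """Running average of (measured RTT - raw coordinate estimate)."""

    def __init__(self, window=20):
        # Only the most recent `window` errors are kept.
        self.errors = deque(maxlen=window)

    def observe(self, rtt, raw_estimate):
        self.errors.append(rtt - raw_estimate)

    @property
    def value(self):
        if not self.errors:
            return 0.0
        # Assumption: halved because both endpoints contribute a term.
        return sum(self.errors) / (2.0 * len(self.errors))

def adjusted_estimate(raw_estimate, local_adj, remote_adj):
    """Raw distance estimate plus the local and remote adjustment terms."""
    return raw_estimate + local_adj.value + remote_adj.value

local_adj, remote_adj = Adjustment(), Adjustment()
for _ in range(20):
    # The raw coordinates keep predicting 10 ms while 12 ms is measured.
    local_adj.observe(12.0, 10.0)
    remote_adj.observe(12.0, 10.0)

print(adjusted_estimate(10.0, local_adj, remote_adj))  # 12.0
```
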

52. ### HASHICORP Serf

Serf is a decentralized solution for cluster membership, failure detection, and orchestration.
Built on gossip protocol (SWIM)
Runs at 10K+ node scale
https://serf.io
53. ### HASHICORP Serf

Assign a coordinate to each node?
Applications can leverage for intelligent routing, peer selection, etc
Gossip is doing background communication

56. ### HASHICORP Serf

Attach Coordinate to Ack messages
RTT computed from the send time of Ping
Coordinates of peers cached
Random peers avoid selection bias
57. ### HASHICORP Serf

Implementation uses 8D + Height
20 Sample Adjustment Term
3 Sample Latency Filter
Small Gravity
Coordinate Snapshotting
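
A latency filter over three samples could be sketched as below. The slide gives only the window size, so the choice of the median as the filter statistic is an assumption, and the `LatencyFilter` class is hypothetical. The idea is that a single outlier measurement should not be fed straight into the coordinate update.

```python
from collections import defaultdict, deque
from statistics import median

class LatencyFilter:
    """Keeps the last `size` RTT samples per peer and reports their median."""

    def __init__(self, size=3):
        self.samples = defaultdict(lambda: deque(maxlen=size))

    def add(self, peer, rtt):
        window = self.samples[peer]
        window.append(rtt)
        return median(window)

f = LatencyFilter()
print(f.add("n2", 1.0))   # 1.0
print(f.add("n2", 50.0))  # 25.5, only two samples so far
print(f.add("n2", 1.2))   # 1.2, the 50 ms spike is filtered out
```
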
58. ### HASHICORP

\$ serf rtt n1 n2
Estimated n1 <-> n2 rtt: 0.610 ms
\$ serf rtt n2 # Running from n1
Estimated n1 <-> n2 rtt: 0.610 ms

60. ### HASHICORP Consul

Consul is a solution for service discovery, monitoring, configuration and orchestration.
Built on Serf + Raft (Paxos)
Runs at 50K+ node scale
https://consul.io
61. ### HASHICORP Consul

Serf is already computing coordinates
Coordinates are periodically pushed to central servers
Servers expose the coordinates over APIs
Nearest neighbor routing, datacenter failover, etc.
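
Once coordinates are available centrally, nearest neighbor routing reduces to sorting candidates by estimated distance. The sketch below illustrates the idea only; the node names and 2D coordinates are made up, and it does not show how Consul implements this.

```python
import math

# Hypothetical node names and 2D coordinates (in ms), for illustration only.
coords = {
    "web-1": [0.0, 0.0],
    "api-1": [30.0, 40.0],
    "api-2": [0.6, 0.8],
    "api-3": [3.0, 4.0],
}

def nearest(source, candidates, coords):
    """Sort candidates by estimated RTT from source, closest first."""
    return sorted(candidates, key=lambda n: math.dist(coords[source], coords[n]))

print(nearest("web-1", ["api-1", "api-2", "api-3"], coords))
# ['api-2', 'api-3', 'api-1']
```

No probes are sent at query time: the ordering comes entirely from coordinates that were already computed in the background.
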
62. ### Terminal HASHICORP

\$ consul rtt node-10-0-1-8
Estimated node-10-0-1-8 <-> node-10-0-1-6 rtt: 0.781 ms (using LAN coordinates)
\$ sleep 30
\$ consul rtt node-10-0-1-8
Estimated node-10-0-1-8 <-> node-10-0-1-6 rtt: 0.719 ms (using LAN coordinates)
63. ### Terminal HASHICORP

\$ curl localhost:8500/v1/catalog/nodes?near=node-78r16zb3q | jq '.[].Node'
"node-78r16zb3q"
"node-10-0-4-190"
"node-10-0-1-7"
"node-10-0-4-240"
\$ curl localhost:8500/v1/catalog/service/vault?near=node-78r16zb3q | jq '.[].Node'
"node-10-0-1-71"
"node-10-0-3-119"
"node-10-0-3-249"
64. ### HASHICORP Conclusion

Vivaldi provides a decentralized algorithm for coordinates
Networks not Euclidean, leads to TIV
Interesting uses in distributed systems
Serf and Consul expose via APIs