Vivaldi: Decentralized Network Coordinates

Large-scale distributed systems can use round-trip time estimates between peers to make intelligent decisions about request routing, data replication, and failure handling. Vivaldi is a distributed algorithm for efficiently computing network coordinates for a large set of peers. In this talk, we motivate the need for network coordinates and introduce the Vivaldi algorithm. We briefly survey interesting extensions and related work, both to understand how to use Vivaldi in the wild and to understand the sources of error in its modeling. Lastly, we talk about how Vivaldi is used in the Serf and Consul tools to solve user problems.

Armon Dadgar

August 18, 2016

Transcript

  1. HASHICORP Euclidean Coordinates

    p1 = {x: 1, y: 2, z: 3}
    p2 = {x: 4, y: 5, z: 6}
    dist(p1, p2) = sqrt((p2.x-p1.x)^2 + (p2.y-p1.y)^2 + (p2.z-p1.z)^2)
  2. Ping Problem

    Suppose you have 20K+ peers (BitTorrent)
    Pair-wise distance from {PeerN, PeerM} requires N^2 probes
    Samples = 3
    Probes = 1.2B
    Storage = 9.6GB (double)
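
The figures on this slide follow from simple arithmetic. A quick check, assuming exactly 20,000 peers, one 8-byte double per stored sample, and decimal gigabytes (the constant names are illustrative, not from the talk):

```python
PEERS = 20_000          # "20K+ peers" from the slide, taken as exactly 20,000
SAMPLES_PER_PAIR = 3    # samples kept per ordered pair of peers
BYTES_PER_SAMPLE = 8    # one double-precision float

pairs = PEERS ** 2                       # N^2 ordered pairs
probes = pairs * SAMPLES_PER_PAIR        # total probes needed
storage_gb = probes * BYTES_PER_SAMPLE / 1e9

print(f"probes:  {probes:,}")
print(f"storage: {storage_gb:.1f} GB")
```

Both the probe count and the storage grow quadratically with the number of peers, which is the motivation for replacing the full matrix with per-node coordinates.
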
  3. Ping Representation

    Ping creates a matrix of pairwise latency
    dist(p1, p2) = rtt(p1, p2)
    rtt(p1, p2) = pairwise[p1][p2]
  4. Vivaldi

    Pairwise connect peers with a spring
    Spring’s natural length is the RTT
    Compress down all peers to the origin and then relax
  5. Vivaldi

    [Diagram: five peers]

    const sensitivity = 0.25
    var local = {x: 0, y: 0, z: 0}
    var remote = {x: 0, y: 0, z: 0}

    def update(rtt=500msec, remote):
        estimate = euclidean_dist(local, remote)
        err = rtt - estimate
        direction_of_err = unitVector(local - remote)
        scaled_direction = direction_of_err * err
        local = local + scaled_direction * sensitivity
  6. Vivaldi

    [Diagram: five peers]

    const sensitivity = 0.25
    var local = {x: 0, y: 0, z: 0}
    var remote = {x: 0, y: 0, z: 0}

    def update(rtt=500msec, remote):
        estimate = 0msec
        err = rtt - estimate
        direction_of_err = unitVector(local - remote)
        scaled_direction = direction_of_err * err
        local = local + scaled_direction * sensitivity
  7. Vivaldi

    [Diagram: five peers]

    const sensitivity = 0.25
    var local = {x: 0, y: 0, z: 0}
    var remote = {x: 0, y: 0, z: 0}

    def update(rtt=500msec, remote):
        estimate = 0msec
        err = 500msec
        direction_of_err = unitVector(local - remote)
        scaled_direction = direction_of_err * err
        local = local + scaled_direction * sensitivity
  8. Vivaldi

    [Diagram: five peers]

    const sensitivity = 0.25
    var local = {x: 0, y: 0, z: 0}
    var remote = {x: 0, y: 0, z: 0}

    def update(rtt=500msec, remote):
        estimate = 0msec
        err = 500msec
        direction_of_err = {x: -0.1, y: 0.6, z: 0.8}
        scaled_direction = direction_of_err * err
        local = local + scaled_direction * sensitivity
  9. Vivaldi

    [Diagram: five peers]

    const sensitivity = 0.25
    var local = {x: 0, y: 0, z: 0}
    var remote = {x: 0, y: 0, z: 0}

    def update(rtt=500msec, remote):
        estimate = 0msec
        err = 500msec
        direction_of_err = {x: -0.1, y: 0.6, z: 0.8}
        scaled_direction = {x: -50, y: 300, z: 400}
        local = local + scaled_direction * sensitivity
  10. Vivaldi

    [Diagram: five peers]

    const sensitivity = 0.25
    var local = {x: -12.5, y: 75, z: 100}
    var remote = {x: 0, y: 0, z: 0}

    def update(rtt=500msec, remote):
        estimate = 0msec
        err = 500msec
        direction_of_err = {x: -0.1, y: 0.6, z: 0.8}
        scaled_direction = {x: -50, y: 300, z: 400}
        local = {x: -12.5, y: 75, z: 100}
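
The walk-through in slides 5 to 10 can be sketched as runnable Python. This is a minimal sketch under a few assumptions: coordinates are plain lists of floats, RTTs are in milliseconds, and when two nodes sit at the same point (as in the slides, where both start at the origin) a random unit direction is chosen. The function names mirror the slide pseudocode, but the coincident-point handling is illustrative.

```python
import math
import random

SENSITIVITY = 0.25  # constant step size, as on the slides


def euclidean_dist(a, b):
    """Straight-line distance between two coordinates."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def unit_vector(a, b):
    """Unit vector pointing from b toward a.

    Assumption: if the points coincide, pick a random direction so the
    nodes can separate.
    """
    diff = [x - y for x, y in zip(a, b)]
    norm = math.sqrt(sum(d * d for d in diff))
    while norm == 0.0:
        diff = [random.gauss(0.0, 1.0) for _ in a]
        norm = math.sqrt(sum(d * d for d in diff))
    return [d / norm for d in diff]


def update(local, remote, rtt, sensitivity=SENSITIVITY):
    """Return the new local coordinate after observing one RTT sample."""
    estimate = euclidean_dist(local, remote)
    err = rtt - estimate                      # positive: we are too close
    direction = unit_vector(local, remote)
    return [l + d * err * sensitivity for l, d in zip(local, direction)]


# Both nodes start at the origin and observe a 500 ms RTT.
local = update([0.0, 0.0, 0.0], [0.0, 0.0, 0.0], rtt=500.0)
```

Starting from the origin, the node moves 500 * 0.25 = 125 ms away from the remote in whatever direction was picked, which matches the magnitude of the step shown on slide 10.
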
  11. Vivaldi

    const sensitivity changes how rapidly we adjust
    Large value = fast to update, but unstable
    Small value = slow to converge, but stable
    Dynamic value?
  12. Vivaldi

    const error_sensitivity_adj = 0.25
    const position_sensitivity_adj = 0.25
    var local_err = 1000msec

    def update(rtt, remote, remote_err):
        …
        balance_err = local_err / (local_err + remote_err)
        rel_err = abs(estimate - rtt) / rtt
        local_err = rel_err * error_sensitivity_adj * balance_err
                    + local_err * (1 - error_sensitivity_adj * balance_err)
        sensitivity = position_sensitivity_adj * balance_err
        local = local + scaled_direction * sensitivity
  13. Vivaldi

    const error_sensitivity_adj = 0.25
    const position_sensitivity_adj = 0.25
    var local_err = 1000msec

    def update(rtt, remote, remote_err):
        …
        balance_err = local_err / (local_err + remote_err)
        rel_err = abs(estimate - rtt) / rtt
        local_err = rel_err * error_sensitivity_adj * balance_err
                    + local_err * (1 - error_sensitivity_adj * balance_err)
        sensitivity = position_sensitivity_adj * balance_err
        local = local + scaled_direction * sensitivity

    High Remote Error => Low Sensitivity
  14. Vivaldi

    const error_sensitivity_adj = 0.25
    const position_sensitivity_adj = 0.25
    var local_err = 1000msec

    def update(rtt, remote, remote_err):
        …
        balance_err = local_err / (local_err + remote_err)
        rel_err = abs(estimate - rtt) / rtt
        local_err = rel_err * error_sensitivity_adj * balance_err
                    + local_err * (1 - error_sensitivity_adj * balance_err)
        sensitivity = position_sensitivity_adj * balance_err
        local = local + scaled_direction * sensitivity

    High Local Error => High Sensitivity
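
The adaptive step size in slides 12 to 14 can be sketched as follows. Several things here are assumptions rather than content from the talk: the `Node` dataclass, treating the error estimate as a unitless relative error starting at 1.0 (the slides show 1000msec), taking the absolute value of the relative error, and using a fixed axis as the direction when two nodes coincide.

```python
import math
from dataclasses import dataclass

ERROR_SENSITIVITY_ADJ = 0.25
POSITION_SENSITIVITY_ADJ = 0.25


@dataclass
class Node:
    coord: list          # position in the coordinate space
    err: float = 1.0     # assumption: start out fully uncertain


def update(node, remote_coord, remote_err, rtt):
    """Apply one RTT sample to `node`; returns the sensitivity that was used."""
    if rtt <= 0:
        raise ValueError("rtt must be positive")
    diff = [a - b for a, b in zip(node.coord, remote_coord)]
    estimate = math.sqrt(sum(d * d for d in diff))

    # How much of the total uncertainty is ours, in [0, 1].
    total_err = node.err + remote_err
    balance = node.err / total_err if total_err > 0 else 0.5

    # Weighted moving average of the relative error of our estimates.
    rel_err = abs(estimate - rtt) / rtt
    weight = ERROR_SENSITIVITY_ADJ * balance
    node.err = rel_err * weight + node.err * (1 - weight)

    # Move further when we are the uncertain side, less when the remote is.
    sensitivity = POSITION_SENSITIVITY_ADJ * balance
    if estimate > 0:
        direction = [d / estimate for d in diff]
    else:
        direction = [1.0] + [0.0] * (len(diff) - 1)  # arbitrary axis
    step = (rtt - estimate) * sensitivity
    node.coord = [c + d * step for c, d in zip(node.coord, direction)]
    return sensitivity


confident = Node([3.0, 0.0, 0.0], err=1.0)
low = update(confident, [0.0, 0.0, 0.0], remote_err=9.0, rtt=5.0)

uncertain = Node([3.0, 0.0, 0.0], err=9.0)
high = update(uncertain, [0.0, 0.0, 0.0], remote_err=1.0, rtt=5.0)
```

With these inputs the first call uses a sensitivity of 0.025 and the second 0.225, illustrating the two captions: a high remote error shrinks the step, a high local error enlarges it.
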
  15. Vivaldi

    Each node tracks position and error estimate
    Coordinate converges over time
    Local error goes down as estimates become accurate
    Several tuning parameters, including dimensionality
  16. Dimensionality

    Coordinates can be in any Euclidean Space
    2D, 3D, or N Dimensions?
    Principal Component Analysis (PCA) to reduce dimensions
  17. Dimensionality Reduction

    Time of Day   Brightness    Angle of Sun
    12PM          Very Bright   90 degrees
    3PM           Very Bright   80 degrees
    9PM           Very Dark     0 degrees
    12AM          Very Dark     0 degrees
  18. Dimensionality Reduction

    Time of Day   Brightness    Angle of Sun
    12PM          Very Bright   90 degrees
    3PM           Very Bright   80 degrees
    9PM           Very Dark     0 degrees
    12AM          Very Dark     0 degrees
  19. Coordinate + Height

    Allows coordinates to model non-fixed latency
    Improves the predictive power of the coordinates
    Reduces the dimensionality required
    RTT = dist(p1, p2) + p1.Height + p2.Height
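
The height formula on this slide translates directly into code. A minimal sketch, assuming a hypothetical `Coordinate` dataclass with a Euclidean part `vec` and a non-negative `height`, both in milliseconds:

```python
import math
from dataclasses import dataclass


@dataclass
class Coordinate:
    vec: tuple       # Euclidean component
    height: float    # per-node latency that every path to this node pays


def estimated_rtt(p1, p2):
    """RTT = dist(p1, p2) + p1.Height + p2.Height"""
    return math.dist(p1.vec, p2.vec) + p1.height + p2.height


a = Coordinate(vec=(0.0, 0.0, 0.0), height=1.0)
b = Coordinate(vec=(3.0, 4.0, 0.0), height=2.0)
print(estimated_rtt(a, b))
```

Here the Euclidean part contributes 5 and the two heights contribute 3, so the estimate is 8. The height term lets a node account for latency that no position in the Euclidean space can express, such as a slow access link.
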
  20. Network Coordinates in the Wild

    Azureus BitTorrent Client (10K+ clients)
    Dimensionality Analysis in the Wild
    Latency and Update Filters
    Churn, Drift, Intrinsic Error, Latency Variation
    Ledlie, Gardner, and Seltzer
  21. Gravity

    Applying small “gravity” toward origin
    Prevents runaway coordinates
    Cluster can still “rotate” about the origin
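
One way to picture the gravity term is as a small pull toward the origin applied after each update. The sketch below is illustrative only: the quadratic pull and the `rho` scale parameter are assumptions chosen so that the force is negligible near the origin and grows for far-away coordinates; the slides do not specify the exact form.

```python
import math


def apply_gravity(coord, rho=150.0):
    """Pull `coord` toward the origin by (distance / rho)^2.

    `rho` is a hypothetical tuning parameter: larger values mean weaker gravity.
    """
    dist = math.sqrt(sum(c * c for c in coord))
    if dist == 0.0:
        return list(coord)
    pull = min((dist / rho) ** 2, dist)   # never overshoot the origin
    scale = (dist - pull) / dist          # shrink along the same direction
    return [c * scale for c in coord]


print(apply_gravity([300.0, 0.0, 0.0]))
```

Because the pull depends only on the distance from the origin and acts along the line to it, it limits drift without constraining rotation of the whole cluster about the origin.
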
  22. On Suitability of Euclidean Embedding for Host-based Network Coordinate Systems

    Lee, Zhang, Sahu, Saha
    Analysis of Triangle Inequality Violations (TIV) - Intrinsic Error
    Understanding source of TIV
    Adjustment factor to compensate
    7D < 2D + Adjustment
  23. Triangle Inequality Violation

    [Diagram: Server 1, Server 2, Server 3; Top of Rack Switch (x2); Core Router]

    c < a + b
    Server 1 -> Server 2 : 0.1 msec
    Server 2 -> Server 3 : 0.3 msec
    Server 1 -> Server 3 : 0.3 msec
    Packet Processing Time > Transit Time
  24. TIV Adjustment

    Track the estimation error from measurement
    Adjustment is the average over a sample window
    Adjustment (local and remote) is added to estimates
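
The adjustment term described on this slide can be sketched as a windowed average of estimation errors. The `AdjustmentTracker` class, the default window size, and the choice to halve the average (so that the local and remote terms together make up one full correction) are assumptions for illustration, not details taken from the slides.

```python
from collections import deque


class AdjustmentTracker:
    """Keeps the last `window` estimation errors for one node."""

    def __init__(self, window=20):
        self.samples = deque(maxlen=window)

    def record(self, rtt, estimate):
        """Store how far the coordinate-based estimate was from the measured RTT."""
        self.samples.append(rtt - estimate)

    @property
    def adjustment(self):
        # Assumption: each endpoint contributes half of the average error.
        if not self.samples:
            return 0.0
        return sum(self.samples) / (2.0 * len(self.samples))


def adjusted_estimate(dist, local_adj, remote_adj):
    """Coordinate distance plus both nodes' adjustment terms."""
    return dist + local_adj + remote_adj


tracker = AdjustmentTracker(window=2)
tracker.record(rtt=10.0, estimate=8.0)
tracker.record(rtt=10.0, estimate=8.0)
tracker.record(rtt=12.0, estimate=8.0)   # oldest sample falls out of the window
```

After these three samples the window holds errors of 2 and 4, so the adjustment is 6 / 4 = 1.5. A node whose measured RTTs are consistently higher than its coordinates predict ends up with a positive adjustment.
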
  25. Serf

    Serf is a decentralized solution for cluster membership, failure detection, and orchestration.
    Built on gossip protocol (SWIM)
    Runs at 10K+ node scale
    https://serf.io
  26. Serf

    Assign a coordinate to each node?
    Applications can leverage for intelligent routing, peer selection, etc
    Gossip is doing background communication
  27. Serf

    Attach Coordinate to Ack messages
    RTT computed from the send time of Ping
    Coordinates of peers cached
    Random peers avoid selection bias
  28. Serf

    Implementation uses 8D + Height
    20 Sample Adjustment Term
    3 Sample Latency Filter
    Small Gravity
    Coordinate Snapshotting
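
A latency filter smooths raw RTT samples before they reach the coordinate update, so that one delayed packet does not yank the coordinate. The sketch below assumes a moving median over the last three samples per peer; the slide only says "3 Sample Latency Filter", so the use of a median and the `LatencyFilter` class are illustrative.

```python
from collections import defaultdict, deque
from statistics import median


class LatencyFilter:
    """Per-peer moving median over the most recent `size` RTT samples."""

    def __init__(self, size=3):
        self._samples = defaultdict(lambda: deque(maxlen=size))

    def observe(self, peer, rtt):
        """Record a raw RTT for `peer` and return the filtered value."""
        window = self._samples[peer]
        window.append(rtt)
        return median(window)


f = LatencyFilter()
first = f.observe("n2", 10.0)
second = f.observe("n2", 200.0)   # outlier, e.g. a delayed packet
third = f.observe("n2", 12.0)
```

Once the window is full, the 200 ms outlier is ignored and the filtered value is 12 ms. Until then the median of one or two samples is used, so the outlier still has some influence on the second observation.
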
  29. Terminal

    $ serf rtt n1 n2
    Estimated n1 <-> n2 rtt: 0.610 ms
    $ serf rtt n2 # Running from n1
    Estimated n1 <-> n2 rtt: 0.610 ms
  30. Consul

    Consul is a solution for service discovery, monitoring, configuration and orchestration.
    Built on Serf + Raft (Paxos)
    Runs at 50K+ node scale
    https://consul.io
  31. Consul

    Serf is already computing coordinates
    Coordinates are periodically pushed to central servers
    Servers expose the coordinates over APIs
    Nearest neighbor routing, datacenter failover, etc.
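
Once the servers hold every node's coordinate, nearest-neighbor routing reduces to sorting by estimated distance. A minimal sketch, assuming coordinates are stored in a plain dict keyed by node name; the node names, the dict layout, and the omission of height and adjustment terms are all simplifications for illustration:

```python
import math


def sort_by_distance(coords, source):
    """Return node names ordered by estimated RTT from `source`, nearest first."""
    origin = coords[source]
    return sorted(coords, key=lambda name: math.dist(origin, coords[name]))


# Hypothetical coordinates in milliseconds.
coords = {
    "node-a": (0.0, 0.0),
    "node-b": (3.0, 4.0),
    "node-c": (1.0, 0.0),
}
print(sort_by_distance(coords, "node-a"))
```

The source node sorts first because its distance to itself is zero, followed by the other nodes in order of increasing estimated round-trip time. No new probes are needed at query time.
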
  32. Terminal

    $ consul rtt node-10-0-1-8
    Estimated node-10-0-1-8 <-> node-10-0-1-6 rtt: 0.781 ms (using LAN coordinates)
    $ sleep 30
    $ consul rtt node-10-0-1-8
    Estimated node-10-0-1-8 <-> node-10-0-1-6 rtt: 0.719 ms (using LAN coordinates)
  33. Terminal

    $ curl localhost:8500/v1/catalog/nodes?near=node-78r16zb3q | jq '.[].Node'
    "node-78r16zb3q"
    "node-10-0-4-190"
    "node-10-0-1-7"
    "node-10-0-4-240"
    $ curl localhost:8500/v1/catalog/service/vault?near=node-78r16zb3q | jq '.[].Node'
    "node-10-0-1-71"
    "node-10-0-3-119"
    "node-10-0-3-249"
  34. Conclusion

    Vivaldi provides a decentralized algorithm for coordinates
    Networks not Euclidean, leads to TIV
    Interesting uses in distributed systems
    Serf and Consul expose via APIs