Vivaldi: Decentralized Network Coordinates

Large-scale distributed systems can use round trip time estimates between peers to make intelligent decisions about request routing, data replication, and failure handling. Vivaldi is a distributed algorithm for efficiently computing network coordinates for a large set of peers. In this talk, we motivate the need for network coordinates and introduce the Vivaldi algorithm. We briefly survey interesting extensions and related work, both to understand how to use Vivaldi in the wild and to understand the sources of error in its modeling. Lastly, we discuss how Vivaldi is used in the Serf and Consul tools to solve user problems.

Armon Dadgar

August 18, 2016

Transcript

  1. HASHICORP
    Vivaldi
    A Decentralized Network Coordinate System

  2. HASHICORP
    Armon Dadgar
    @armon

  3. HASHICORP
    Network Coordinates

  4. HASHICORP
    Euclidean Coordinates
    p1 = {x: 1, y: 2, z: 3}
    p2 = {x: 4, y: 5, z: 6}
    dist(p1, p2) = sqrt((p2.x-p1.x)^2
    + (p2.y-p1.y)^2
    + (p2.z-p1.z)^2)

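As a quick check of the formula above, here is a minimal Python sketch of the Euclidean distance between the two example points. The function name `euclidean_dist` is illustrative, not taken from any particular library.

```python
import math

def euclidean_dist(p1, p2):
    """Straight-line distance between two points given as dicts of coordinates."""
    return math.sqrt(sum((p2[axis] - p1[axis]) ** 2 for axis in p1))

p1 = {"x": 1, "y": 2, "z": 3}
p2 = {"x": 4, "y": 5, "z": 6}
print(euclidean_dist(p1, p2))
```

Each axis differs by 3, so the result is sqrt(3^2 + 3^2 + 3^2) = sqrt(27), about 5.196.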

  5. HASHICORP
    Euclidean Space
    Euclidean Distance defined in Euclidean Space
    Cartesian Coordinates {x, y, z} are Euclidean

  6. HASHICORP
    Network Space
    p1 = ipv4(1.2.3.4)
    p2 = ipv4(5.6.7.8)
    dist(p1, p2) = ?

  7. HASHICORP
    Network Space
    p1 = ipv4(1.2.3.4)
    p2 = ipv4(5.6.7.8)
    dist(p1, p2) = rtt(p1, p2)

  8. HASHICORP
    Network Distance?
    [Diagram: P2P Application with peers and seeds]

  9. HASHICORP
    Network Distance?
    Nearest Neighbor Routing
    [Diagram: a Web Server choosing among three API Servers]

  10. HASHICORP
    Network Distance?
    Datacenter Failover

  11. HASHICORP
    Network Space
    p1 = ipv4(1.2.3.4)
    p2 = ipv4(5.6.7.8)
    dist(p1, p2) = rtt(p1, p2)
    ping?

  12. HASHICORP
    Ping Problem
    Suppose you have 20K+ peers (BitTorrent)
    Pairwise distance for every {PeerN, PeerM} requires N^2 probes
    Samples = 3, Probes = 1.2B, Storage = 9.6GB (doubles)

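The slide's numbers follow from simple arithmetic. A short sketch, assuming every ordered pair is probed (N^2, as on the slide), 3 samples per pair, and one 8-byte double per stored sample; the last calculation contrasts this with keeping a single 3D coordinate per peer.

```python
PEERS = 20_000
SAMPLES_PER_PAIR = 3
BYTES_PER_DOUBLE = 8

pairs = PEERS ** 2                         # N^2 pairwise measurements
probes = pairs * SAMPLES_PER_PAIR          # total pings sent
storage_bytes = probes * BYTES_PER_DOUBLE  # every sample kept as a double

coord_bytes = PEERS * 3 * BYTES_PER_DOUBLE  # one {x, y, z} per peer instead

print(f"{probes:,} probes, {storage_bytes / 1e9:.1f} GB of samples")
print(f"{coord_bytes / 1e3:.0f} KB for 3D coordinates")
```

This prints 1.2 billion probes and 9.6 GB of samples, versus 480 KB for one 3D coordinate per peer.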

  13. HASHICORP
    Ping Representation
    Ping creates a matrix of pairwise latency
    dist(p1, p2) = rtt(p1, p2)
    rtt(p1, p2) = pairwise[p1][p2]

  14. HASHICORP
    Cartesian Representation
    Cartesian Coordinates allow us to exploit Pythagorean Theorem
    a^2 + b^2 = c^2

  15. HASHICORP
    Vivaldi
    Decentralized Network Coordinates
    Frank Dabek, Russ Cox, Frans Kaashoek, Robert Morris

  16. HASHICORP
    Vivaldi
    Pairwise connect peers with a spring
    Spring’s natural length is the RTT
    Compress down all peers to the origin and then relax

  17. HASHICORP
    Vivaldi
    [Diagram: five peers]

  18. HASHICORP
    Vivaldi
    [Diagram: five peers]

  19. HASHICORP
    Vivaldi
    [Diagram: five peers]

  20. HASHICORP
    Vivaldi
    [Diagram: five peers]

  21. HASHICORP
    Vivaldi
    [Diagram: five peers]

  22. HASHICORP
    Vivaldi
    [Diagram: five peers]

  23. HASHICORP
    Vivaldi
    Coordinates provide predictive model
    Communication between nodes updates the model
    Coordinates converge over time

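To see the convergence claim in action, here is a minimal Python simulation of the constant-sensitivity spring update (the same rule the following slides walk through step by step). Assumptions, all hypothetical: four hosts whose RTTs are exactly their distances on a 30 x 40 rectangle, 3D coordinates, and a tiny random jitter around the origin instead of an exact origin start, which keeps the sketch from starting in a degenerate, collinear state. Real RTTs are noisy and not perfectly Euclidean, so real errors will not shrink this cleanly.

```python
import math
import random

SENSITIVITY = 0.25  # constant step size, as on the slides

def norm(v):
    return math.sqrt(sum(x * x for x in v))

def step(local, remote, rtt, rng):
    """One constant-sensitivity update: move `local` along the spring to `remote`."""
    diff = [a - b for a, b in zip(local, remote)]
    estimate = norm(diff)
    if estimate == 0:
        # Co-located nodes have no direction; pick a random one.
        diff = [rng.uniform(-1, 1) for _ in local]
    length = norm(diff)
    unit = [d / length for d in diff]
    err = rtt - estimate  # positive: too close, push apart; negative: pull together
    return [a + u * err * SENSITIVITY for a, u in zip(local, unit)]

def mean_abs_error(coords, rtt):
    errs = [abs(norm([a - b for a, b in zip(coords[i], coords[j])]) - rtt[i][j])
            for i in coords for j in coords if i < j]
    return sum(errs) / len(errs)

# Hypothetical ground truth: RTT (ms) equals distance on a 30 x 40 rectangle.
TRUTH = {0: (0, 0), 1: (30, 0), 2: (0, 40), 3: (30, 40)}
RTT = {i: {j: math.dist(TRUTH[i], TRUTH[j]) for j in TRUTH} for i in TRUTH}

def simulate(rounds=300, seed=1):
    rng = random.Random(seed)
    # Start every node (almost) at the origin, then let the springs relax.
    coords = {i: [rng.uniform(-1e-3, 1e-3) for _ in range(3)] for i in TRUTH}
    before = mean_abs_error(coords, RTT)
    for _ in range(rounds):
        for i in coords:
            j = rng.choice([k for k in coords if k != i])
            coords[i] = step(coords[i], coords[j], RTT[i][j], rng)
    return before, mean_abs_error(coords, RTT)

before, after = simulate()
print(f"mean |error| before: {before:.2f} ms, after: {after:.2f} ms")
```

Each node only ever talks to one random peer per round, yet the whole set of coordinates drifts toward a layout whose distances match the RTTs.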

  24. HASHICORP
    Vivaldi
    [Diagram: five peers]
    const sensitivity = 0.25
    var local = {x: 0, y: 0, z: 0}
    var remote = {x: 0, y: 0, z: 0}
    def update(rtt=500msec, remote):
      estimate = euclidean_dist(local, remote)
      err = rtt - estimate
      direction_of_err = unitVector(local - remote)
      scaled_direction = direction_of_err * err
      local = local + scaled_direction * sensitivity

  25. HASHICORP
    Vivaldi
    [Diagram: five peers]
    const sensitivity = 0.25
    var local = {x: 0, y: 0, z: 0}
    var remote = {x: 0, y: 0, z: 0}
    def update(rtt=500msec, remote):
      estimate = 0msec
      err = rtt - estimate
      direction_of_err = unitVector(local - remote)
      scaled_direction = direction_of_err * err
      local = local + scaled_direction * sensitivity

  26. HASHICORP
    Vivaldi
    [Diagram: five peers]
    const sensitivity = 0.25
    var local = {x: 0, y: 0, z: 0}
    var remote = {x: 0, y: 0, z: 0}
    def update(rtt=500msec, remote):
      estimate = 0msec
      err = 500msec
      direction_of_err = unitVector(local - remote)
      scaled_direction = direction_of_err * err
      local = local + scaled_direction * sensitivity

  27. HASHICORP
    Vivaldi
    [Diagram: five peers]
    const sensitivity = 0.25
    var local = {x: 0, y: 0, z: 0}
    var remote = {x: 0, y: 0, z: 0}
    def update(rtt=500msec, remote):
      estimate = 0msec
      err = 500msec
      direction_of_err = {x: -0.1, y: 0.6, z: 0.8}
      scaled_direction = direction_of_err * err
      local = local + scaled_direction * sensitivity

  28. HASHICORP
    Vivaldi
    [Diagram: five peers]
    const sensitivity = 0.25
    var local = {x: 0, y: 0, z: 0}
    var remote = {x: 0, y: 0, z: 0}
    def update(rtt=500msec, remote):
      estimate = 0msec
      err = 500msec
      direction_of_err = {x: -0.1, y: 0.6, z: 0.8}
      scaled_direction = {x: -50, y: 300, z: 400}
      local = local + scaled_direction * sensitivity

  29. HASHICORP
    Vivaldi
    [Diagram: five peers]
    const sensitivity = 0.25
    var local = {x: -12.5, y: 75, z: 100}
    var remote = {x: 0, y: 0, z: 0}
    def update(rtt=500msec, remote):
      estimate = 0msec
      err = 500msec
      direction_of_err = {x: -0.1, y: 0.6, z: 0.8}
      scaled_direction = {x: -50, y: 300, z: 400}
      local = {x: -12.5, y: 75, z: 100}

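The walkthrough in slides 24 to 29 can be replayed with a small Python sketch of the basic update. Because `local` and `remote` both start at the origin, the direction `local - remote` is undefined; this sketch lets the caller pass in the direction the slides use, {x: -0.1, y: 0.6, z: 0.8}, in place of a random unit vector. The `fallback_direction` parameter is hypothetical, added only to make the example reproducible.

```python
import math

SENSITIVITY = 0.25

def update(local, remote, rtt, fallback_direction):
    """Basic Vivaldi step from the slides; vectors are (x, y, z) tuples, rtt in ms."""
    diff = [l - r for l, r in zip(local, remote)]
    estimate = math.sqrt(sum(d * d for d in diff))  # euclidean_dist(local, remote)
    err = rtt - estimate
    if estimate == 0:
        direction = fallback_direction  # normally a random unit vector
    else:
        direction = [d / estimate for d in diff]  # unitVector(local - remote)
    scaled_direction = [d * err for d in direction]
    return tuple(l + s * SENSITIVITY for l, s in zip(local, scaled_direction))

local = update((0, 0, 0), (0, 0, 0), rtt=500.0, fallback_direction=(-0.1, 0.6, 0.8))
print(local)
```

The result matches slide 29: {x: -12.5, y: 75, z: 100}. The slide's direction vector has length about 1.005, close enough to a unit vector for the illustration.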

  30. HASHICORP
    Vivaldi
    const sensitivity changes how rapidly we adjust
    Large value = fast to update, but unstable
    Small value = slow to converge, but stable
    Dynamic value?

  31. HASHICORP
    Vivaldi
    const error_sensitivity_adj = 0.25
    const position_sensitivity_adj = 0.25
    var local_err = 1000msec
    def update(rtt, remote, remote_err):
      # estimate and scaled_direction are computed as in the basic update
      balance_err = local_err / (local_err + remote_err)
      rel_err = abs(estimate - rtt) / rtt
      local_err = rel_err * error_sensitivity_adj * balance_err
                  + local_err * (1 - error_sensitivity_adj * balance_err)
      sensitivity = position_sensitivity_adj * balance_err
      local = local + scaled_direction * sensitivity

  32. HASHICORP
    Vivaldi
    const error_sensitivity_adj = 0.25
    const position_sensitivity_adj = 0.25
    var local_err = 1000msec
    def update(rtt, remote, remote_err):
      # estimate and scaled_direction are computed as in the basic update
      balance_err = local_err / (local_err + remote_err)
      rel_err = abs(estimate - rtt) / rtt
      local_err = rel_err * error_sensitivity_adj * balance_err
                  + local_err * (1 - error_sensitivity_adj * balance_err)
      sensitivity = position_sensitivity_adj * balance_err
      local = local + scaled_direction * sensitivity
    High Remote Error => Low Sensitivity

  33. HASHICORP
    Vivaldi
    const error_sensitivity_adj = 0.25
    const position_sensitivity_adj = 0.25
    var local_err = 1000msec
    def update(rtt, remote, remote_err):
      # estimate and scaled_direction are computed as in the basic update
      balance_err = local_err / (local_err + remote_err)
      rel_err = abs(estimate - rtt) / rtt
      local_err = rel_err * error_sensitivity_adj * balance_err
                  + local_err * (1 - error_sensitivity_adj * balance_err)
      sensitivity = position_sensitivity_adj * balance_err
      local = local + scaled_direction * sensitivity
    High Local Error => High Sensitivity

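A runnable Python sketch of the error-weighted update on these slides, written as a pure function that returns the new coordinate and error. Assumptions: the relative error uses an absolute value, `abs(estimate - rtt) / rtt`, so a node's error estimate cannot go negative; error is treated as a unitless relative error, with 1.5 as an assumed starting value in the usage lines; and `local` and `remote` are assumed not to be co-located.

```python
import math

ERROR_SENSITIVITY_ADJ = 0.25     # error_sensitivity_adj on the slides
POSITION_SENSITIVITY_ADJ = 0.25  # position_sensitivity_adj on the slides

def adaptive_update(local, local_err, remote, remote_err, rtt):
    """Return (new_local, new_local_err) after one RTT sample to `remote`."""
    diff = [l - r for l, r in zip(local, remote)]
    estimate = math.sqrt(sum(d * d for d in diff))
    # Near 1 when we are unsure and the remote is confident; near 0 otherwise.
    balance_err = local_err / (local_err + remote_err)
    rel_err = abs(estimate - rtt) / rtt
    new_err = (rel_err * ERROR_SENSITIVITY_ADJ * balance_err
               + local_err * (1 - ERROR_SENSITIVITY_ADJ * balance_err))
    sensitivity = POSITION_SENSITIVITY_ADJ * balance_err
    # Assumes local != remote; see the basic update for the co-located case.
    direction = [d / estimate for d in diff]
    scaled_direction = [d * (rtt - estimate) for d in direction]
    new_local = [l + s * sensitivity for l, s in zip(local, scaled_direction)]
    return new_local, new_err

# Same sample (coordinates 10 ms apart, measured RTT 20 ms), different error balance.
moved_a, _ = adaptive_update([10.0, 0.0, 0.0], 1.5, [0.0, 0.0, 0.0], 0.1, rtt=20.0)
moved_b, _ = adaptive_update([10.0, 0.0, 0.0], 0.1, [0.0, 0.0, 0.0], 1.5, rtt=20.0)
print(moved_a[0], moved_b[0])  # unsure local node moves much further

# A sample that matches the current estimate lowers the local error.
_, err_after = adaptive_update([10.0, 0.0, 0.0], 1.5, [0.0, 0.0, 0.0], 0.1, rtt=10.0)
print(err_after)
```

An unsure node listening to a confident peer takes most of the 0.25 step, while a confident node barely reacts to an unsure peer, which is exactly the behavior the slide annotations describe.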

  34. HASHICORP
    Vivaldi
    Each node tracks position and error estimate
    Coordinate converges over time
    Local error goes down as estimates become more accurate
    Several tuning parameters, including dimensionality

  35. HASHICORP
    Dimensionality
    Coordinates can be in any Euclidean Space
    2D, 3D, or N Dimensions?
    Principal Component Analysis (PCA) to reduce dimensions

  36. HASHICORP
    Dimensionality Reduction
    Time of Day | Brightness  | Angle of Sun
    12PM        | Very Bright | 90 degrees
    3PM         | Very Bright | 80 degrees
    9PM         | Very Dark   | 0 degrees
    12AM        | Very Dark   | 0 degrees

  37. HASHICORP
    Dimensionality Reduction
    Time of Day | Brightness  | Angle of Sun
    12PM        | Very Bright | 90 degrees
    3PM         | Very Bright | 80 degrees
    9PM         | Very Dark   | 0 degrees
    12AM        | Very Dark   | 0 degrees

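The table shows why PCA helps: brightness and sun angle move together, so one component captures almost all of the variation. A minimal standard-library sketch, assuming Very Bright = 1 and Very Dark = 0 as the numeric encoding. For two standardized columns with correlation r, the principal components have variances 1 + |r| and 1 - |r|, so the first component's share of the total variance is (1 + |r|) / 2.

```python
from statistics import mean, pstdev

brightness = [1, 1, 0, 0]   # 12PM, 3PM, 9PM, 12AM (Very Bright = 1, Very Dark = 0)
sun_angle = [90, 80, 0, 0]  # degrees

def standardize(xs):
    m, s = mean(xs), pstdev(xs)
    return [(x - m) / s for x in xs]

def first_component_share(xs, ys):
    """Fraction of total variance captured by the first principal component."""
    zx, zy = standardize(xs), standardize(ys)
    r = mean([a * b for a, b in zip(zx, zy)])  # Pearson correlation
    # Correlation matrix [[1, r], [r, 1]] has eigenvalues 1 + |r| and 1 - |r|.
    return (1 + abs(r)) / 2

share = first_component_share(brightness, sun_angle)
print(f"first component explains {share:.1%} of the variance")
```

Here r is about 0.997, so a single "daylight" dimension keeps over 99% of the information in the two columns.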

  38. HASHICORP
    Dimensionality
    Performance dramatically reduced below 2D
    Marginal improvement past 5D
    Depends on the complexity of the underlying topology

  39. HASHICORP
    Height / Fixed Costs
    Application
    Userspace Runtime
    Operating System
    Hypervisor
    Network Card
    Fixed Cost: 0.5 msec

  40. HASHICORP
    Coordinate + Height
    Height absorbs fixed per-node costs, so coordinates model only the non-fixed latency
    Improves the predictive power of the coordinates
    Reduces the dimensionality required
    RTT = dist(p1, p2) + p1.Height + p2.Height

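A minimal Python sketch of the height model's RTT estimate, assuming a coordinate is a Euclidean vector plus a non-negative height, both in milliseconds. The `Coordinate` class and its field names are illustrative.

```python
import math
from dataclasses import dataclass

@dataclass
class Coordinate:
    vec: tuple      # position in Euclidean space (ms)
    height: float   # fixed per-node cost, e.g. OS and NIC overhead (ms)

def estimated_rtt(p1, p2):
    """RTT = dist(p1, p2) + p1.height + p2.height"""
    return math.dist(p1.vec, p2.vec) + p1.height + p2.height

a = Coordinate(vec=(0.0, 0.0), height=0.5)
b = Coordinate(vec=(3.0, 4.0), height=0.5)
print(estimated_rtt(a, b))
```

The Euclidean part contributes 5 ms and each node's height adds 0.5 ms, for an estimate of 6 ms. The heights represent cost that no placement in space can explain, so the vector part does not need extra dimensions to fake it.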

  41. HASHICORP
    Extensions to Vivaldi

  42. HASHICORP
    Network Coordinates in the Wild
    Azureus BitTorrent Client (10K+ clients)
    Dimensionality Analysis in the Wild
    Latency and Update Filters
    Churn, Drift, Intrinsic Error, Latency Variation
    Ledlie, Gardner, and Seltzer

  43. HASHICORP
    Drift
    [Diagram: five peers]

  44. HASHICORP
    Drift
    [Diagram: five peers]

  45. HASHICORP
    Gravity
    Applying small “gravity” toward origin
    Prevents runaway coordinates
    Cluster can still “rotate” about the origin

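A minimal sketch of gravity, assuming a pull toward the origin that grows with the square of the distance from it, scaled by a hypothetical tuning constant `rho`. The quadratic shape, the constant, and the no-overshoot clamp are all assumptions of this sketch, not details taken from the slides.

```python
import math

def apply_gravity(coord, rho):
    """Nudge `coord` toward the origin by (|coord| / rho)^2."""
    dist = math.sqrt(sum(x * x for x in coord))
    if dist == 0:
        return list(coord)  # already at the origin
    pull = (dist / rho) ** 2
    pull = min(pull, dist)  # never overshoot past the origin
    return [x - (x / dist) * pull for x in coord]

near = apply_gravity([10.0, 0.0, 0.0], rho=150.0)
far = apply_gravity([300.0, 0.0, 0.0], rho=150.0)
print(near, far)
```

A coordinate 10 units out barely moves, while one 300 units out is pulled back by 4. Because the pull depends only on distance from the origin, the cluster as a whole can still rotate about it, as the slide notes.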

  46. HASHICORP
    On Suitability of Euclidean Embedding for
    Host-based Network Coordinate Systems
    Lee, Zhang, Sahu, Saha
    Analysis of Triangle Inequality Violations (TIV) - Intrinsic Error
    Understanding source of TIV
    Adjustment factor to compensate
    Accuracy: 7D < 2D + Adjustment

  47. HASHICORP
    Triangle Inequality Violation
    [Diagram: Servers 1, 2, and 3 connected through two Top of Rack Switches and a Core Router]
    c < a + b
    Server 1 -> Server 2 : 0.1 msec
    Server 2 -> Server 3 : 0.3 msec
    Server 1 -> Server 3 : 0.3 msec
    Packet Processing Time > Transit Time

  48. HASHICORP
    TIV Adjustment
    Track the estimation error from measurement
    Adjustment is the average over a sample window
    Adjustment (local and remote) is added to estimates

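A minimal sketch of the TIV adjustment described above. The 20-sample window mirrors the "20 Sample Adjustment Term" on the Serf implementation slide; halving the average (since both the local and remote adjustments are added to an estimate) and falling back to the raw distance when the adjusted estimate would not be positive are assumptions of this sketch. The class and function names are hypothetical.

```python
from collections import deque

class AdjustmentTracker:
    """Hypothetical per-node TIV adjustment over a sliding window of samples."""

    def __init__(self, window=20):
        self.samples = deque(maxlen=window)  # recent (rtt - coordinate distance) values

    def observe(self, rtt, coord_dist):
        self.samples.append(rtt - coord_dist)

    @property
    def adjustment(self):
        if not self.samples:
            return 0.0
        # Halved because the local and remote adjustments are both added.
        return sum(self.samples) / (2 * len(self.samples))

def adjusted_estimate(coord_dist, local_adj, remote_adj):
    estimate = coord_dist + local_adj + remote_adj
    # Fall back to the raw distance if negative adjustments make it non-positive.
    return estimate if estimate > 0 else coord_dist

local, remote = AdjustmentTracker(), AdjustmentTracker()
for _ in range(3):
    local.observe(rtt=1.5, coord_dist=1.0)   # coordinates underestimate by 0.5 ms
    remote.observe(rtt=1.5, coord_dist=1.0)
print(adjusted_estimate(1.0, local.adjustment, remote.adjustment))
```

Each side contributes half of the observed 0.5 ms shortfall, so the adjusted estimate recovers the measured 1.5 ms even though the coordinates alone cannot.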

  49. HASHICORP
    Serf Implementation

  50. HASHICORP
    Serf
    Serf is a decentralized solution for cluster
    membership, failure detection, and
    orchestration.
    Built on gossip protocol (SWIM)
    Runs at 10K+ node scale
    https://serf.io

  51. HASHICORP
    Serf
    Assign a coordinate to each node?
    Applications can leverage for intelligent routing,
    peer selection, etc.
    Gossip already provides background communication

  52. HASHICORP
    Failure Detection
    [Diagram: a Peer sends a Ping to another Peer]

  53. HASHICORP
    Failure Detection
    [Diagram: the Peer replies with an Ack]

  54. HASHICORP
    Serf
    Attach Coordinate to Ack messages
    RTT computed from the send time of Ping
    Coordinates of peers cached
    Random peers avoid selection bias

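A sketch of piggybacking coordinates on the failure detector's Ping/Ack exchange. Everything here is hypothetical (class name, sequence numbers, the injectable clock); it shows only the bookkeeping the slide lists: record when each Ping was sent, compute the RTT when the matching Ack arrives, and cache the coordinate the Ack carried.

```python
import time

class ProbeTracker:
    def __init__(self, clock=time.monotonic):
        self.clock = clock
        self.sent_at = {}      # ping sequence number -> send time (seconds)
        self.peer_coords = {}  # peer name -> last coordinate received in an Ack

    def send_ping(self, seq):
        self.sent_at[seq] = self.clock()

    def on_ack(self, seq, peer, coord):
        """Return the measured RTT in seconds, or None for an unknown ping."""
        start = self.sent_at.pop(seq, None)
        if start is None:
            return None
        self.peer_coords[peer] = coord
        return self.clock() - start

# Usage with a fake clock so the result is deterministic.
ticks = iter([100.0, 100.5])
tracker = ProbeTracker(clock=lambda: next(ticks))
tracker.send_ping(seq=1)
rtt = tracker.on_ack(seq=1, peer="n2", coord=(1.0, 2.0, 3.0))
print(rtt)
```

Using a monotonic clock keeps RTT measurements immune to wall-clock adjustments; the measured RTT and cached coordinate are then what a Vivaldi update would consume.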

  55. HASHICORP
    Serf
    Implementation uses 8D + Height
    20 Sample Adjustment Term
    3 Sample Latency Filter
    Small Gravity
    Coordinate Snapshotting

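The "3 Sample Latency Filter" can be sketched as a per-peer median over the last three RTT samples, which discards a single outlier spike. Using a median is an assumption about how such a filter might work; the class name is hypothetical.

```python
from collections import defaultdict, deque
from statistics import median

class LatencyFilter:
    def __init__(self, size=3):
        # One bounded buffer of recent RTT samples per peer.
        self.samples = defaultdict(lambda: deque(maxlen=size))

    def observe(self, peer, rtt):
        """Record a raw RTT sample and return the filtered value for that peer."""
        buf = self.samples[peer]
        buf.append(rtt)
        return median(buf)

f = LatencyFilter()
results = [f.observe("n2", rtt) for rtt in [0.61, 0.62, 5.0, 0.60]]
print(results)
```

The 5.0 ms spike never reaches the coordinate update: once three samples are buffered, the filtered value stays at 0.62 ms.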

  56. HASHICORP
    $ serf rtt n1 n2
    Estimated n1 <-> n2 rtt: 0.610 ms
    $ serf rtt n2 # Running from n1
    Estimated n1 <-> n2 rtt: 0.610 ms

  57. HASHICORP
    Consul Usage

  58. HASHICORP
    Consul
    Consul is a solution for service discovery,
    monitoring, configuration and
    orchestration.
    Built on Serf + Raft (Paxos)
    Runs at 50K+ node scale
    https://consul.io

  59. HASHICORP
    Consul
    Serf is already computing coordinates
    Coordinates are periodically pushed to central servers
    Servers expose the coordinates over APIs
    Nearest neighbor routing, datacenter failover, etc.

  60. Terminal
    HASHICORP
    $ consul rtt node-10-0-1-8
    Estimated node-10-0-1-8 <-> node-10-0-1-6 rtt: 0.781 ms (using LAN coordinates)
    $ sleep 30
    $ consul rtt node-10-0-1-8
    Estimated node-10-0-1-8 <-> node-10-0-1-6 rtt: 0.719 ms (using LAN coordinates)

  61. Terminal
    HASHICORP
    $ curl localhost:8500/v1/catalog/nodes?near=node-78r16zb3q | jq '.[].Node'
    "node-78r16zb3q"
    "node-10-0-4-190"
    "node-10-0-1-7"
    "node-10-0-4-240"
    $ curl localhost:8500/v1/catalog/service/vault?near=node-78r16zb3q | jq '.[].Node'
    "node-10-0-1-71"
    "node-10-0-3-119"
    "node-10-0-3-249"

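Conceptually, a `near=` query orders results by estimated RTT from the named node. A minimal Python sketch of that ordering, assuming each node has a vector-plus-height coordinate and that the reference node itself sorts first at distance 0. The node names and coordinates are made up, and Consul's actual implementation may differ in its details.

```python
import math

def estimated_rtt(a, b):
    (vec_a, height_a), (vec_b, height_b) = a, b
    return math.dist(vec_a, vec_b) + height_a + height_b

def sort_near(coords, ref):
    """Return node names ordered by estimated RTT from `ref` (ref itself first)."""
    origin = coords[ref]
    def key(name):
        return 0.0 if name == ref else estimated_rtt(origin, coords[name])
    return sorted(coords, key=key)

coords = {
    "node-a": ((0.0, 0.0), 0.1),
    "node-b": ((3.0, 4.0), 0.1),
    "node-c": ((0.3, 0.4), 0.1),
}
print(sort_near(coords, "node-b"))
```

From node-b, node-c is estimated at 4.7 and node-a at 5.2, so the order is node-b, node-c, node-a. The sort needs only cached coordinates, so no pings are sent at query time.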

  62. HASHICORP
    Conclusion
    Vivaldi provides a decentralized algorithm for coordinates
    Networks are not Euclidean, which leads to TIV
    Interesting uses in distributed systems
    Serf and Consul expose via APIs

  63. HASHICORP
    Thanks!
    Q/A
