
EverybodyTalks.pdf


Sarah Christoff

August 27, 2019


Transcript

  1. Who am I?
     - Hashidork at HashiCorp
     - My mom says I’m cool
     - Member of the Consulate
  2. What was the problem? “I’m sick of hardcoding IP addresses everywhere” - me, like all the time (and Armon, probably, once, maybe)
  3. DECENTRALIZED: Everyone is the client. Everyone is the server. We are all one. We are all no one.
  4. “We focus on a weaker variant of group membership, where membership lists at different members need not be consistent across the group at the same (causal) point in time. Stronger guarantees could be provided by augmenting the membership sub-system, e.g. a virtually-synchronous style membership can be provided through a sequencer process that checkpoints the membership list periodically. However, unlike the weakly consistent problem, strongly consistent specifications might have fundamental scalability limitations.” - SWIM: Scalable Weakly-consistent Infection-style Process Group Membership Protocol, Abhinandan Das, Indranil Gupta, Ashish Motivala
  5. Properties of SWIM
     - Constant load per member regardless of group size
     - Failure detection latency is independent of cluster size
     - Infection-style (gossip) membership updates: Alive, Suspect, Dead
  6. What does SWIM give us?
     - Propagating membership updates: joining, failing, leaving
     - Failure detection
     - Not heartbeat driven
  7. Properties of SWIM
     - Incarnation numbers are used to order messages
     - They start at 0; a node increments its own number when it receives a suspicion about itself while it is still alive, refuting the suspicion (see the sketch below)
     - Incarnation numbers are local: only the node itself can increment its own incarnation number
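A minimal sketch of that refutation rule in Go, using hypothetical names (node, onSuspected) rather than memberlist's actual types:

```go
package main

import "fmt"

// node holds the one counter only this node may ever increment.
type node struct {
	incarnation uint32 // starts at 0
}

// onSuspected handles a Suspect message that names us. If the
// suspicion is at least as fresh as our current incarnation, we bump
// past it so our Alive refutation outranks the stale Suspect.
func (n *node) onSuspected(msgIncarnation uint32) (alive uint32, refute bool) {
	if msgIncarnation < n.incarnation {
		return 0, false // we already refuted a fresher suspicion
	}
	n.incarnation = msgIncarnation + 1
	return n.incarnation, true // broadcast Alive with the new number
}

func main() {
	n := &node{}
	if inc, ok := n.onSuspected(0); ok {
		fmt.Printf("refute with Alive{incarnation: %d}\n", inc) // prints 1
	}
}
```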
  8. “We built that as the Memberlist library, and tried to have that be as pure of an implementation as possible”
  9. Memberlist Basics
     - Member states: Alive, Suspect, Dead
     - Direct and indirect probes for liveness
     - Membership updates piggyback on probes (see the example below)
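Those basics surface through a small API. A sketch using hashicorp/memberlist's real entry points (DefaultLANConfig, Create, Join, Members); the node name and peer address are placeholders:

```go
package main

import (
	"fmt"
	"log"

	"github.com/hashicorp/memberlist"
)

func main() {
	// LAN defaults: probe intervals and timeouts tuned for a local network.
	conf := memberlist.DefaultLANConfig()
	conf.Name = "node-a" // placeholder node name

	list, err := memberlist.Create(conf)
	if err != nil {
		log.Fatal(err)
	}

	// Join by contacting any live member; membership updates then
	// spread by piggybacking on the regular probe traffic.
	if _, err := list.Join([]string{"10.0.0.1"}); err != nil {
		log.Printf("join failed: %v", err)
	}

	for _, m := range list.Members() {
		fmt.Printf("%s %s\n", m.Name, m.Addr)
	}
}
```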
  10. Memberlist Additions
     - Uses both TCP and UDP for direct probes
     - Anti-entropy mechanism: periodic full state syncs over TCP with random members
     - A separate messaging layer for membership updates: nodes also send out membership updates on their own periodically
  11. Lifeguard: SWIM-ing with situational awareness
     - Dynamic fault-detector timeouts (Self-Awareness)
     - Dynamic suspicion timeouts (Dogpile)
     - More timely refutation (Buddy System)
  12. Lifeguard: Self-Awareness
     - Keeps unhealthy nodes from sending inaccurate suspect messages until their local conditions improve, reducing network traffic
     - Useful during high local resource utilization or network partitions
     - A node’s awareness score increases when it suspects it is resource constrained
  13. Lifeguard: Dogpile
     - A new suspicion starts with the longest timeout to respond; each subsequent independent suspicion shortens it (see the sketch below)
     - Shortening the suspicion timeout based on confirmations reduces the time a failed node spends in the Suspect state
     “We wait for the truth before we start spreading falsehoods” - Solomon Christoff, Golden Retriever
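A sketch of the logarithmic decay behind Dogpile: each independent confirmation of a suspicion shrinks the remaining timeout. The shape follows the Lifeguard paper; the constants and exact code in hashicorp/memberlist may differ:

```go
package main

import (
	"fmt"
	"math"
	"time"
)

// suspicionTimeout shrinks from max toward min as c of an expected k
// independent confirmations of the suspicion arrive.
func suspicionTimeout(min, max time.Duration, c, k int) time.Duration {
	frac := math.Log(float64(c)+1) / math.Log(float64(k)+1)
	timeout := max - time.Duration(frac*float64(max-min))
	if timeout < min {
		timeout = min
	}
	return timeout
}

func main() {
	min, max := 3*time.Second, 30*time.Second
	for c := 0; c <= 3; c++ {
		// 0 confirmations -> the full 30s; 3 of 3 -> the 3s floor.
		fmt.Printf("confirmations=%d timeout=%s\n", c, suspicionTimeout(min, max, c, 3))
	}
}
```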
  14. Lifeguard: Buddy System
     - Prioritizes notifying a member that it is suspected
     - Any time we probe a node, directly or indirectly, we let it know we think it is having problems
     - Expedites the refutation of false failures
  15. Encryption
     - A keyring is used to encrypt communication between nodes (see the example below)
     - Many keys can be stored and each is tried when decrypting a message; likewise, any stored key can be used to encrypt a message
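A hedged example using memberlist's real NewKeyring: the primary key encrypts outgoing messages, and every installed key is tried on decrypt, which is what makes rolling key rotation possible. The key bytes are placeholders (AES keys must be 16, 24, or 32 bytes):

```go
package main

import (
	"log"

	"github.com/hashicorp/memberlist"
)

func main() {
	// Placeholder 16-byte AES-128 keys; use real random keys in practice.
	primary := []byte("0123456789abcdef")
	old := []byte("fedcba9876543210")

	// The keyring holds every key we can decrypt with; the primary key
	// is the one used to encrypt outgoing messages.
	ring, err := memberlist.NewKeyring([][]byte{old}, primary)
	if err != nil {
		log.Fatal(err)
	}

	conf := memberlist.DefaultLANConfig()
	conf.Keyring = ring

	if _, err := memberlist.Create(conf); err != nil {
		log.Fatal(err)
	}
}
```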
  16. Serf Salesman: *slaps roof of Serf* “This bad boy has so many new functions”
     - Graceful Leave: gives nodes the option to leave the cluster on their own behalf
     - Snapshotting: saves the state of a node
     - Network Coordinates: get network coordinates locally
     - KeyManager: install and uninstall keys
     - And much more!
  17. Custom Event Propagation
     - Run a shell script, exec a command, choose your own adventure
     - Can trigger off member events or be localized to a specific member (see the sketch below)
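In Go, the same propagation is available through Serf's real UserEvent API (the serf agent's shell-script handlers sit on top of this). The event name and payload here are placeholders:

```go
package main

import (
	"log"

	"github.com/hashicorp/serf/serf"
)

func main() {
	// Single-node setup for illustration.
	conf := serf.DefaultConfig()
	events := make(chan serf.Event, 16)
	conf.EventCh = events

	s, err := serf.Create(conf)
	if err != nil {
		log.Fatal(err)
	}
	defer s.Shutdown()

	// Broadcast a custom event through the gossip layer; handlers on
	// every member (or a filtered subset) can react to it.
	if err := s.UserEvent("deploy", []byte("v1.2.3"), false); err != nil {
		log.Fatal(err)
	}

	// Member joins, user events, and queries all arrive on the channel.
	e := <-events
	log.Printf("got event: %s", e)
}
```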
  18. Vivaldi Network Tomography System
     - Network tomography: the “study of a network’s internal characteristics using information from end-point data”
     - Uses round-trip time to calculate the distance between peers in a cluster (toy sketch below)
     ^ Vivaldi, irl.
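A toy sketch of how Vivaldi-style coordinates turn into an RTT estimate: Euclidean distance in the virtual space plus each node's "height" (its access-link latency). Types and fields here are illustrative, not serf/coordinate's actual API:

```go
package main

import (
	"fmt"
	"math"
	"time"
)

// Coord is an illustrative Vivaldi coordinate.
type Coord struct {
	Vec    []float64 // position in the virtual space (seconds)
	Height float64   // non-Euclidean height term (seconds)
}

// DistanceTo estimates the RTT to another peer: straight-line distance
// in the virtual space plus both endpoints' heights.
func (c Coord) DistanceTo(o Coord) time.Duration {
	var sum float64
	for i := range c.Vec {
		d := c.Vec[i] - o.Vec[i]
		sum += d * d
	}
	secs := math.Sqrt(sum) + c.Height + o.Height
	return time.Duration(secs * float64(time.Second))
}

func main() {
	a := Coord{Vec: []float64{0.010, 0.020}, Height: 0.001}
	b := Coord{Vec: []float64{0.013, 0.024}, Height: 0.002}
	fmt.Println(a.DistanceTo(b)) // estimated RTT between the two peers: 8ms
}
```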
  19. Lamport Clocks
     - Leslie Lamport is back at it, in 1978
     - A logical clock that is event based
     - Replaces incarnation numbers to keep messages ordered (sketch below)
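A minimal Lamport clock in Go, mirroring the shape of Serf's (Time, Increment, Witness): local events increment the counter, and witnessing a remote timestamp jumps just past it so later local events sort after it:

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// LamportClock is a monotonically increasing logical clock.
type LamportClock struct{ counter uint64 }

// Time returns the current logical time.
func (l *LamportClock) Time() uint64 { return atomic.LoadUint64(&l.counter) }

// Increment advances the clock for a local event.
func (l *LamportClock) Increment() uint64 { return atomic.AddUint64(&l.counter, 1) }

// Witness updates the clock after observing a remote timestamp.
func (l *LamportClock) Witness(v uint64) {
	for {
		cur := atomic.LoadUint64(&l.counter)
		if v < cur {
			return
		}
		// Jump just past the witnessed value so our next event orders after it.
		if atomic.CompareAndSwapUint64(&l.counter, cur, v+1) {
			return
		}
	}
}

func main() {
	var c LamportClock
	c.Increment()         // local event -> 1
	c.Witness(41)         // saw remote time 41 -> jump to 42
	fmt.Println(c.Time()) // 42
}
```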
  20. Breakdown: In the beginning, Raft has to elect a leader. Each node is given a randomized timeout (e.g. 150ms, 157ms, 190ms, 201ms, 300ms).
  21. Breakdown: The first node to reach the end of its timeout will request to be leader (“Vote for me please!”). A node will typically reach the end of its timeout when it stops receiving messages from the leader.
  22. Breakdown: The elected leader will send out heartbeats (“New phone, who dis??”), which restart the other nodes’ timeouts. (See the sketch of this loop below.)
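A sketch of that follower loop, assuming the classic 150-300ms randomized window from the Raft paper; hashicorp/raft's real timer handling is more involved:

```go
package main

import (
	"fmt"
	"math/rand"
	"time"
)

// electionTimeout picks a fresh random timeout in [150ms, 300ms).
func electionTimeout() time.Duration {
	return 150*time.Millisecond + time.Duration(rand.Int63n(150))*time.Millisecond
}

// follower waits for heartbeats; each one restarts the countdown, and
// only on expiry does the node stand for election.
func follower(heartbeats <-chan struct{}) {
	timer := time.NewTimer(electionTimeout())
	for {
		select {
		case <-heartbeats:
			// Leader is alive; restart the countdown.
			if !timer.Stop() {
				<-timer.C
			}
			timer.Reset(electionTimeout())
		case <-timer.C:
			fmt.Println("timeout: become candidate, request votes")
			return
		}
	}
}

func main() {
	hb := make(chan struct{})
	go func() {
		// Three heartbeats, then the leader goes silent.
		for i := 0; i < 3; i++ {
			time.Sleep(100 * time.Millisecond)
			hb <- struct{}{}
		}
	}()
	follower(hb)
}
```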
  23. A server is in exactly one of three states at any given time:
     - Follower: listening for heartbeats
     - Candidate: polling for votes
     - Leader: listening for incoming commands, sending out heartbeats to keep the term alive
  24. Breakdown: Terms
     - Raft time is divided into terms; each term has at most one leader
     - Some terms can have no leader at all
     - “Terms identify obsolete information” - John Ousterhout
     - The leader’s log is seen as the truth and is the most up-to-date log
  25. Breakdown: Leader Election
     - A timeout occurs after not receiving a heartbeat from the leader
     - Request that others vote for you
     - Become leader and send out heartbeats, or
     - Somebody else becomes leader: become a follower, or
     - The vote splits and nobody wins: start a new term
  26. Breakdown: Leader Election (“Vote for me please!”)
     - A server will deny its vote to a would-be leader whose log is behind its own: a lower last term, or the same term with a lower last index (see the sketch below)
     - (Diagram: five logs of index/value entries such as 1: x=3, 2: y=8, 3: n=9, with a different color marking each new term)
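That voting rule from §5.4.1 of the Raft paper fits in a few lines. A sketch, not hashicorp/raft's actual code:

```go
package main

import "fmt"

// candidateUpToDate grants a vote only if the candidate's log is at
// least as up to date as ours: compare last terms first, and use the
// last index as the tie-breaker.
func candidateUpToDate(candLastTerm, candLastIndex, myLastTerm, myLastIndex uint64) bool {
	if candLastTerm != myLastTerm {
		return candLastTerm > myLastTerm
	}
	return candLastIndex >= myLastIndex
}

func main() {
	// Our log ends at term 3, index 5; the candidate's ends at term 2, index 9.
	fmt.Println(candidateUpToDate(2, 9, 3, 5)) // false: we deny the vote
}
```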
  27. Breakdown: Log Replication
     - “Keeping the replicated log consistent is the job of the consensus algorithm.”
     - Raft is designed around the log: servers with inconsistent logs will never get elected as leader
     - Normal operation of Raft will repair inconsistencies
  28. Breakdown: Committed Entries
     - Logs must persist through crashes
     - A committed entry is replicated on the majority of servers (see the sketch below)
     - Any committed entry is safe to execute in state machines
     - (Diagram: five logs sharing entries 1: x=3 through 7: z=6; only the prefix replicated on a majority is committed)
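The leader can locate that committed prefix by sorting the replication progress (matchIndex) of all servers: the median-of-majority value is replicated on a majority by construction. A sketch of the rule, not hashicorp/raft's code:

```go
package main

import (
	"fmt"
	"sort"
)

// commitIndex returns the highest log index replicated on a majority
// of servers. Raft additionally requires that the entry at this index
// carry the leader's current term before committing; omitted here.
func commitIndex(matchIndex []uint64) uint64 {
	sorted := append([]uint64(nil), matchIndex...)
	sort.Slice(sorted, func(i, j int) bool { return sorted[i] < sorted[j] })
	// The value at position (n-1)/2 is <= the matchIndex of at least
	// a majority of the servers.
	return sorted[(len(sorted)-1)/2]
}

func main() {
	// Five servers whose logs reach these indexes: index 7 is on three
	// of five servers, so entries up through 7 are committed.
	fmt.Println(commitIndex([]uint64{7, 5, 7, 4, 7})) // 7
}
```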
  29. Eventually consistent: cluster membership, failure detection. Strongly consistent: service discovery, service monitoring, K/V store.
  30. Consul Basics
     - Strongly consistent (via Raft)
     - Multiple gossip pools to span datacenters
     - Key/Value store
     - Service discovery & service-level health checks
     - Centralized API and UI
  31. Consul Vocabulary
     - Agent: a process that runs on every machine within a Consul cluster; it can be either a server or a client.
     - Server: typically a standalone instance that is involved in the Raft quorum and maintains state. Can communicate across datacenters.
     - Client: an agent that monitors an application, is not a part of the Raft quorum, and does not hold state. Cannot communicate across datacenters.
  32. Service Configuration: Key/Value Store
     - The K/V store is strongly consistent
     - Implemented by a “simple in-memory database” based on radix trees - hashicorp/go-memdb
     - Stored on Consul servers, but accessible through any agent, client or server (see the example below)
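Reading and writing the store goes through the real hashicorp/consul/api client; the key and value here are placeholders. Writes are forwarded to the Raft leader, which is what keeps the store strongly consistent:

```go
package main

import (
	"fmt"
	"log"

	"github.com/hashicorp/consul/api"
)

func main() {
	// Connects to the local agent (default http://127.0.0.1:8500).
	client, err := api.NewClient(api.DefaultConfig())
	if err != nil {
		log.Fatal(err)
	}
	kv := client.KV()

	// Writes go through the Raft leader before being acknowledged.
	pair := &api.KVPair{Key: "service/web/max_conns", Value: []byte("512")}
	if _, err := kv.Put(pair, nil); err != nil {
		log.Fatal(err)
	}

	got, _, err := kv.Get("service/web/max_conns", nil)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Printf("%s = %s\n", got.Key, got.Value)
}
```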
  33. Network Coordinates: Implemented
     - Prepared queries are rules or guidelines for Consul to follow
     - Using the network coordinates, we can provide failover for services based on geo-location
     - https://learn.hashicorp.com/consul/developer-discovery/geo-failover
  34. Service Discovery & Service Monitoring
     - Service information is stored in the same in-memory database as the K/Vs, but in different tables!
     - Health checks are configurable per service definition: HTTP, TCP, TTL, Docker, Script (build your own!) - see the example below
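Registering a service with an HTTP check through the real hashicorp/consul/api client; the service name, port, and endpoint are placeholders. The local agent runs the check on the given interval and reports status changes:

```go
package main

import (
	"log"

	"github.com/hashicorp/consul/api"
)

func main() {
	client, err := api.NewClient(api.DefaultConfig())
	if err != nil {
		log.Fatal(err)
	}

	// Register a service plus a per-service HTTP health check.
	reg := &api.AgentServiceRegistration{
		Name: "web",
		Port: 8080,
		Check: &api.AgentServiceCheck{
			HTTP:     "http://127.0.0.1:8080/health",
			Interval: "10s",
			Timeout:  "1s",
		},
	}
	if err := client.Agent().ServiceRegister(reg); err != nil {
		log.Fatal(err)
	}
}
```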
  35. (Closing diagram) Strongly consistent, centralized: service discovery, service monitoring, service configuration. Decentralized: group membership, cluster membership, failure detection, network coordinates.