Pain-free Leader election with Consul.

Electing a leader among a group of peer processes is a problem that distributed systems need to solve time and again. This talk shows our solution for Node.js services: Consul plus a simple, event-based client library.

Mattias Norlander

September 12, 2016

Transcript

  1. Our leader election needs
     • Batch-type application deployed to several nodes.
     • Only one of the nodes should be active and do work.
     • Should that node go down, one of the other nodes should take over.
  2. What we had: RabbitMQ
     • A hack based on exclusive subscriptions that someone copied from a blog post.
     • It worked quite well until RabbitMQ experienced a network partition.
  3. What we wanted: a reliable, strong coordinator
     • Reliable (no downtime).
     • Highly available (distributed).
     • Async notifications.
  4. What we got: Consul
     1. It has a solid key/value store with locking, and comes with a recommended algorithm for leader election.
     2. Nodes agree on a common key in the KV store (e.g. “/locks/myservice”).
     3. Each node:
        a. Creates a session using the session API.
        b. Attempts to acquire the lock using the session ID.
           i. OK => LEADERSHIP!
           ii. Fail => block and wait for changes on the key; re-attempt (a.) if the “Session” field is blank.
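
The steps on slide 4 map onto three Consul HTTP API calls: `PUT /v1/session/create`, `PUT /v1/kv/<key>?acquire=<session>`, and reading the key to check its `Session` field. The sketch below only builds the requests and inspects the returned entries, so it runs without a Consul agent. The helper names, the session name and the 15s TTL are assumptions for illustration; they are not taken from the talk or from exp-leader-election.

```typescript
// Sketch: the Consul calls behind slide 4, built as plain request objects.
// Assumed/illustrative: all helper names, the session name and the TTL value.
// Keys are assumed to be URL-safe; only a leading "/" is stripped.

interface ConsulRequest {
  method: "GET" | "PUT";
  path: string;
  body?: string;
}

function kvPath(key: string): string {
  return `/v1/kv/${key.replace(/^\/+/, "")}`;
}

// Step 3a: create a session. With Behavior "release", invalidating the
// session releases any lock it holds instead of deleting the key.
function createSessionRequest(name: string, ttl: string): ConsulRequest {
  return {
    method: "PUT",
    path: "/v1/session/create",
    body: JSON.stringify({ Name: name, TTL: ttl, Behavior: "release" }),
  };
}

// Step 3b: try to take the lock on the shared key. Consul replies with
// `true` (we are the leader) or `false` (someone else holds it).
function acquireRequest(key: string, sessionId: string, value: string): ConsulRequest {
  return {
    method: "PUT",
    path: `${kvPath(key)}?acquire=${encodeURIComponent(sessionId)}`,
    body: value,
  };
}

// Subset of the fields returned by GET /v1/kv/<key>.
interface KvEntry {
  Key: string;
  Session?: string;
  ModifyIndex: number;
}

// Step 3b.ii: after a change on the key, only re-attempt when nobody holds
// the lock, i.e. the key is missing or its "Session" field is blank.
function lockIsFree(entries: KvEntry[] | null): boolean {
  if (entries === null || entries.length === 0) return true;
  return !entries[0].Session;
}
```

Sending the requests (and parsing the `ID` from the session-create response) is left to whatever HTTP client the service already uses.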
  5. Leader election using Consul
     [Diagram: Consul and service nodes 1, 2 and 3]
     • The leader node just keeps its session alive.
     • The others wait for changes using long polling.
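
The two roles on slide 5 correspond to periodic session renewal (`PUT /v1/session/renew/<id>`) and a Consul blocking query, which is Consul's form of long polling: a `GET` with `index` and `wait` parameters that returns when the key changes, with the new index in the `X-Consul-Index` response header. A sketch follows; renewing at half the TTL, the `"5m"` wait and the helper names are illustrative assumptions, not values from the talk.

```typescript
// Sketch of the two roles on slide 5 as Consul HTTP calls.
// Assumed/illustrative: helper names, renewing at TTL/2, the "5m" wait.

interface ConsulRequest {
  method: "GET" | "PUT";
  path: string;
}

// Leader: renew the session before its TTL runs out.
function renewRequest(sessionId: string): ConsulRequest {
  return { method: "PUT", path: `/v1/session/renew/${encodeURIComponent(sessionId)}` };
}

// Renew well inside the TTL so one slow request does not cost the lock.
function renewIntervalMs(ttlSeconds: number): number {
  return (ttlSeconds * 1000) / 2;
}

// Followers: a blocking query returns as soon as the key's index moves past
// `index`, or after `wait` with no change.
function watchRequest(key: string, index: number, wait = "5m"): ConsulRequest {
  const k = key.replace(/^\/+/, "");
  return { method: "GET", path: `/v1/kv/${k}?index=${index}&wait=${wait}` };
}

// Index for the next poll, taken from the X-Consul-Index response header.
// If the header is missing/invalid or the index goes backwards, start over at 0.
function nextIndex(previous: number, header: string | null): number {
  const current = header === null ? NaN : Number(header);
  if (!Number.isFinite(current) || current < previous) return 0;
  return current;
}
```

A follower would loop: issue the watch request, update its index from the response header, and re-attempt the lock whenever the key's `Session` field comes back blank.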
  6. exp-leader-election
     • A simple, event-based module for leader election with Node.js and Consul.
     • Let-it-crash philosophy: best used with a process manager such as pm2.
     • Uses Consul's HTTP API.
     https://www.npmjs.com/package/exp-leader-election
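
The deck does not show exp-leader-election's actual API, so the following is only a hypothetical illustration of the event-based, let-it-crash style. The class name, the `elected` event and the `acquireResult`/`fail` methods are invented here and are not the package's interface; only the `error` event is mentioned in the talk.

```typescript
// Hypothetical event-based wrapper in the spirit of slide 6.
// NOT the exp-leader-election API: names and methods are illustrative.

type ElectionEvent = "elected" | "error";
type Handler = (detail?: unknown) => void;

class LeaderElection {
  private handlers = new Map<ElectionEvent, Handler[]>();
  private leader = false;

  on(event: ElectionEvent, handler: Handler): this {
    const list = this.handlers.get(event) ?? [];
    list.push(handler);
    this.handlers.set(event, list);
    return this;
  }

  private emit(event: ElectionEvent, detail?: unknown): void {
    for (const h of this.handlers.get(event) ?? []) h(detail);
  }

  get isLeader(): boolean {
    return this.leader;
  }

  // Feed in the result of PUT /v1/kv/<key>?acquire=<session>.
  // "elected" fires only on the transition from follower to leader.
  acquireResult(acquired: boolean): void {
    if (acquired && !this.leader) {
      this.leader = true;
      this.emit("elected");
    }
  }

  // Any failure talking to Consul surfaces as "error". As with Node's
  // EventEmitter, an "error" nobody listens for is thrown (let it crash).
  fail(err: Error): void {
    this.leader = false;
    if ((this.handlers.get("error") ?? []).length === 0) throw err;
    this.emit("error", err);
  }
}
```

Under a let-it-crash policy the application's `error` handler would simply exit (for example `process.exit(1)`) and leave restarting to a process manager such as pm2.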
  7. Failure modes
     • Leader node crash
       ◦ The session will time out, and the other nodes will be notified and can claim leadership.
     • Network problems
       ◦ An “error” event will be raised, causing the leader to crash or stop doing work.
       ◦ The session will time out, so the other nodes will be notified and can claim leadership once the network is stable.
     • Consul crash
       ◦ An “error” event will be raised, causing the client to crash or stop doing work.
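
To see why a crashed leader is eventually replaced, here is a toy in-memory model of TTL sessions guarding a single lock. It is not Consul: it ignores lock-delay, blocking queries and Consul's lazy session reaping, `renew()` also creates a session, and the class and its explicit clock are assumptions made purely for illustration.

```typescript
// Toy model of the "leader node crash" failure mode on slide 7.
// NOT Consul: no lock-delay, no blocking queries; time is passed in explicitly.

class FakeLockStore {
  private holder: string | null = null;
  private lastRenew = new Map<string, number>();
  private ttlMs: number;

  constructor(ttlMs: number) {
    this.ttlMs = ttlMs;
  }

  // A live node calls this periodically; here it also creates the session.
  renew(session: string, now: number): void {
    this.lastRenew.set(session, now);
  }

  // Drop sessions not renewed within the TTL; an expired holder loses the lock.
  private expire(now: number): void {
    this.lastRenew.forEach((at, session) => {
      if (now - at > this.ttlMs) {
        this.lastRenew.delete(session);
        if (this.holder === session) this.holder = null;
      }
    });
  }

  // Like ?acquire=<session>: succeeds only for a valid session when the lock
  // is free or already held by that session.
  acquire(session: string, now: number): boolean {
    this.expire(now);
    if (!this.lastRenew.has(session)) return false;
    if (this.holder === null) this.holder = session;
    return this.holder === session;
  }

  currentHolder(now: number): string | null {
    this.expire(now);
    return this.holder;
  }
}
```

With a 15 s TTL, a leader that stops renewing at t = 0 still holds the lock at t = 10 s but has lost it by t = 20 s, at which point a follower that kept renewing can acquire it. A network partition that stops the leader's renewals ends the same way.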
  8. Takeaways
     • Consul is a good fit for our leader election needs.
     • HTTP-based: stable but slow, so not for near-real-time use cases.
     • Guarantees at most one leader at a time.
     • There will be gaps with zero leaders during recovery.
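
A rough upper bound on the zero-leader gap after a leader crash can be estimated from two documented Consul behaviors, used here as assumptions: sessions that are forcibly expired may not be reaped for up to twice their TTL, and after a session is invalidated its locks cannot be re-acquired for the session's lock-delay (15 s by default). The symbol \(\delta\) stands for the follower's notification and acquire round-trip; the TTL = 15 s value is illustrative.

```latex
t_{\text{gap}} \lesssim 2\,\mathrm{TTL} + \mathrm{LockDelay} + \delta
\qquad\Rightarrow\qquad
t_{\text{gap}} \lesssim 2 \cdot 15\,\mathrm{s} + 15\,\mathrm{s} + \delta = 45\,\mathrm{s} + \delta
```

Shorter TTLs and lock-delays shrink the gap, at the cost of more renewal traffic and less protection against a slow former leader.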
  9. Use it!
     • Open source.
     • Field-tested: used for all our services for the past 7 months.
     https://www.npmjs.com/package/exp-leader-election