etcd: Next Steps with the Cornerstone of Distributed Systems

Brandon Philips
August 23, 2016

Transcript

  1. Next Steps with the Cornerstone of Distributed Systems. Brandon Philips, @brandonphilips | [email protected]. Demo code: http://goo.gl/R6Og3Y. Free stickers @ podium!
  2. Motivation: the CoreOS cluster reboot lock
     - Decrement a semaphore key atomically
     - Reboot and wait...
     - After reboot, increment the semaphore key
  3. Requirements
     - Strong consistency: mutual exclusion at any point in time, for locking
     - High availability: resilient to single points of failure and network partitions
     - Watchable: push configuration updates out to applications
  4. A Common Problem
     - Amazon: a replicated log for EC2
     - Microsoft: Boxwood for storage infrastructure
     - Hadoop: ZooKeeper is the heart of the ecosystem
  5. History of etcd
     ◦ 2013.8: alpha release (v0.x)
     ◦ 2015.2: stable release (v2.0+)
       ◦ stable replication engine (new Raft implementation)
       ◦ stable v2 API
     ◦ 2016.6: v3.0+
       ◦ efficient, powerful API
       ◦ highly scalable backend
  6. How does etcd work?
     • Raft consensus algorithm
       ◦ Uses a replicated log to model a state machine
       ◦ "In Search of an Understandable Consensus Algorithm" (Ongaro, 2014)
     • Three key concepts
       ◦ Leaders
       ◦ Elections
       ◦ Terms
  7. How does etcd work?
     • The cluster elects a leader for every given term
     • All log appends (that is, state machine changes) are decided by that leader and propagated to its followers
     • Much, much more at http://raft.github.io/
  8. How does etcd work?
     • Written in Go, statically linked
     • /bin/etcd
       ◦ the daemon
       ◦ port 2379: client requests (HTTP + JSON API)
       ◦ port 2380: peer-to-peer traffic (HTTP + protobuf)
     • /bin/etcdctl
       ◦ command-line client
     • Built on net/http, encoding/json, golang/protobuf, ...
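To make the client port concrete, here is a minimal Go program that writes and reads a key over port 2379 through the v3 API. It is a sketch: the github.com/coreos/etcd/clientv3 import path is the client bundled with etcd v3.0, and the localhost endpoint assumes a single local member.

```go
package main

import (
	"context"
	"fmt"
	"log"
	"time"

	"github.com/coreos/etcd/clientv3"
)

func main() {
	// Connect to a local etcd member on the client port (2379).
	cli, err := clientv3.New(clientv3.Config{
		Endpoints:   []string{"localhost:2379"},
		DialTimeout: 5 * time.Second,
	})
	if err != nil {
		log.Fatal(err)
	}
	defer cli.Close()

	ctx, cancel := context.WithTimeout(context.Background(), time.Second)
	defer cancel()

	// Write a key, then read it back.
	if _, err := cli.Put(ctx, "/hello", "world"); err != nil {
		log.Fatal(err)
	}
	resp, err := cli.Get(ctx, "/hello")
	if err != nil {
		log.Fatal(err)
	}
	for _, kv := range resp.Kvs {
		fmt.Printf("%s = %s\n", kv.Key, kv.Value)
	}
}
```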
  9. locksmith
     • A cluster-wide reboot lock: a "semaphore for reboots"
     • CoreOS updates happen automatically
       ◦ prevents all the machines from restarting at once...
  10. Cluster-Wide Reboot Lock
     • Need to reboot? Decrement the semaphore key (atomically) with etcd
     • manager.Reboot() and wait...
     • After reboot, increment the semaphore key in etcd (atomically)
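The atomic decrement maps directly onto an etcd v3 compare-and-swap transaction: read the semaphore, then write the decremented value only if the key has not changed in the meantime. Below is a minimal sketch of the idea; the key name and endpoint are invented for the example, and locksmith's real implementation differs.

```go
package main

import (
	"context"
	"fmt"
	"log"
	"strconv"
	"time"

	"github.com/coreos/etcd/clientv3"
)

// tryAcquire attempts to take one reboot slot by atomically
// decrementing the semaphore key. The transaction only commits if the
// key still holds the value we read, so two machines racing for the
// last slot cannot both win.
func tryAcquire(ctx context.Context, cli *clientv3.Client, key string) (bool, error) {
	resp, err := cli.Get(ctx, key)
	if err != nil || len(resp.Kvs) == 0 {
		return false, err
	}
	cur := string(resp.Kvs[0].Value)
	n, err := strconv.Atoi(cur)
	if err != nil || n <= 0 {
		return false, err // malformed counter, or no slots free
	}
	tresp, err := cli.Txn(ctx).
		If(clientv3.Compare(clientv3.Value(key), "=", cur)).
		Then(clientv3.OpPut(key, strconv.Itoa(n-1))).
		Commit()
	if err != nil {
		return false, err
	}
	return tresp.Succeeded, nil
}

func main() {
	cli, err := clientv3.New(clientv3.Config{
		Endpoints:   []string{"localhost:2379"},
		DialTimeout: 5 * time.Second,
	})
	if err != nil {
		log.Fatal(err)
	}
	defer cli.Close()

	// "reboot-semaphore" is a placeholder key for this sketch.
	ok, err := tryAcquire(context.Background(), cli, "reboot-semaphore")
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println("acquired reboot slot:", ok)
}
```

If Succeeded comes back false, another machine raced us to the key, and the caller would typically watch the semaphore and retry. Incrementing after the reboot is the same transaction with n+1.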
  11. Canal Today
     • Virtual (overlay) network for constrained environments
     • BGP for physical environments
     • Connection policies
     • Built for Kubernetes; useful in other systems through CNI
  12. confd
     • Simple configuration templating
     • For "dumb" applications
     • Watch etcd for changes, render templates with the new values, reload the applications
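The watch, render, reload loop fits in a few lines against the v3 API. This is an illustrative sketch rather than confd's code: the watched key and template are made up, and a real reloader would write the rendered output to a config file and signal the application instead of printing it.

```go
package main

import (
	"context"
	"log"
	"os"
	"text/template"
	"time"

	"github.com/coreos/etcd/clientv3"
)

// A toy config template; confd loads these from disk.
const tmpl = "upstream backend {\n    server {{.}};\n}\n"

func main() {
	cli, err := clientv3.New(clientv3.Config{
		Endpoints:   []string{"localhost:2379"},
		DialTimeout: 5 * time.Second,
	})
	if err != nil {
		log.Fatal(err)
	}
	defer cli.Close()

	t := template.Must(template.New("conf").Parse(tmpl))

	// Block on the watch stream and re-render on every change event.
	for wresp := range cli.Watch(context.Background(), "/services/backend/addr") {
		for _, ev := range wresp.Events {
			if err := t.Execute(os.Stdout, string(ev.Kv.Value)); err != nil {
				log.Print(err)
			}
		}
	}
}
```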
  13. Reliability
     • 99% at small scale is easy
       ◦ Failure is infrequent and manageable by humans
     • 99% at large scale is not enough
       ◦ Not manageable by humans
     • 99.99% at large scale
       ◦ Requires reliable systems at the bottom layer
  14. Write-Ahead Log
     • Append only
       ◦ Simple is good
     • Protected by a rolling CRC
       ◦ Storage and OSes can be unreliable
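A rolling CRC chains each record's checksum to the checksum of the record before it, so a flipped bit, a truncated tail, or a reordered record shows up as a mismatch on replay. The sketch below shows the chaining idea in Go; etcd's actual wal package uses its own on-disk record format.

```go
package main

import (
	"fmt"
	"hash/crc32"
)

type record struct {
	data []byte
	crc  uint32
}

// appendRecord checksums the new data seeded with the previous
// record's CRC, chaining every record to the entire log before it.
func appendRecord(wal []record, data []byte) []record {
	var prev uint32
	if len(wal) > 0 {
		prev = wal[len(wal)-1].crc
	}
	h := crc32.NewIEEE()
	fmt.Fprintf(h, "%08x", prev) // seed with the previous CRC
	h.Write(data)
	return append(wal, record{data: data, crc: h.Sum32()})
}

// verify replays the log, recomputing the CRC chain; the first
// mismatch marks the end of the trustworthy prefix.
func verify(wal []record) bool {
	var replay []record
	for _, r := range wal {
		replay = appendRecord(replay, r.data)
		if replay[len(replay)-1].crc != r.crc {
			return false
		}
	}
	return true
}

func main() {
	var wal []record
	wal = appendRecord(wal, []byte("entry-1"))
	wal = appendRecord(wal, []byte("entry-2"))
	fmt.Println("intact:", verify(wal)) // true
	wal[0].data = []byte("flipped")
	fmt.Println("corrupt:", verify(wal)) // false
}
```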
  15. Snapshots
     • "Torturing Databases for Fun and Profit" (OSDI 2014)
       ◦ The simpler database is safer
       ◦ LMDB was the winner
     • BoltDB: an append-only B+tree
       ◦ A simpler LMDB, written in Go
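Part of BoltDB's appeal is how small its surface is: one file, read/write transactions, and buckets of key/value pairs. A quick taste of the API (the file, bucket, and key names here are arbitrary):

```go
package main

import (
	"fmt"
	"log"

	"github.com/boltdb/bolt"
)

func main() {
	// Open (or create) the single-file database; like LMDB, all
	// pages live in one memory-mapped file.
	db, err := bolt.Open("demo.db", 0600, nil)
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	// One writer at a time; every Update is a serialized
	// read-write transaction.
	err = db.Update(func(tx *bolt.Tx) error {
		b, err := tx.CreateBucketIfNotExists([]byte("keys"))
		if err != nil {
			return err
		}
		return b.Put([]byte("foo"), []byte("bar"))
	})
	if err != nil {
		log.Fatal(err)
	}

	// Readers get a consistent snapshot and never block the writer.
	err = db.View(func(tx *bolt.Tx) error {
		v := tx.Bucket([]byte("keys")).Get([]byte("foo"))
		fmt.Printf("foo = %s\n", v)
		return nil
	})
	if err != nil {
		log.Fatal(err)
	}
}
```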
  16. Testing Cluster Failure
     • Inject failures into running clusters
     • White-box runtime checking
       ◦ Hash the state of the system
       ◦ Track the progress of the system
  17. etcd/raft Reliability
     • Designed for testability and flexibility
     • Used by large-scale database systems and others
       ◦ CockroachDB, TiKV, Dgraph
  18. Training
     San Francisco: September 13 & 14
     New York City: September 27 & 28
     San Francisco: October 11 & 12
     New York City: October 25 & 26
     Seattle: November 10 & 11
     https://coreos.com/training
  19. Thank you! Brandon Philips, @brandonphilips | [email protected] | coreos.com
      We're hiring in all departments! Email: [email protected] | Positions: coreos.com/careers