Release Engineering from the Ground Up

Tom Santero
November 10, 2014

Slides from my talk at the USENIX Release Engineering Summit West '14: https://www.usenix.org/conference/ures14west/summit-program/presentation/santero

Transcript

  1. Release Engineering from the Ground Up
     Tom Santero, @tsantero
     The New York Times Company
  2. None
  3. None
  4. None
  5. None
  6. None
  7. Search http://nytimes.com

  8. Listener Rules Engine Idx Mgr Asset Data Directory

  9. None
  10. None
  11. None
  12. None
  13. None
  14. None
  15. An Experience Report releng

  16. None
  17. Continuous Automated Deploys

  18. Single Target Environment

  19. Repository Migrations

  20. SVN -> GitHub

  21. Master: release by commit; Feature: commit

  22. Listens for commits
        - builds on every push to any branch
      Run unit tests, reports build/test statistics
      If branch == master:
        - cut release as RPM
        - increment version number
        - push RPM to yum repo
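The build flow on this slide can be sketched in a few lines of Python. This is only an illustration of the branching logic, not the actual Jenkins job: `BuildPlan`, `next_version`, `plan_build`, and the dotted version scheme are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class BuildPlan:
    """Steps a CI job would run for one push (hypothetical structure)."""
    branch: str
    steps: List[str] = field(default_factory=list)
    release_version: Optional[str] = None


def next_version(version: str) -> str:
    """Increment the last dotted component, e.g. 1.4.2 -> 1.4.3 (assumed scheme)."""
    parts = version.split(".")
    parts[-1] = str(int(parts[-1]) + 1)
    return ".".join(parts)


def plan_build(branch: str, current_version: str) -> BuildPlan:
    # Every push to any branch is built and tested.
    plan = BuildPlan(branch=branch, steps=[
        "build",
        "run unit tests",
        "report build/test statistics",
    ])
    # Only master produces a release artifact.
    if branch == "master":
        plan.release_version = next_version(current_version)
        plan.steps += [
            f"cut release {plan.release_version} as RPM",
            "increment version number",
            "push RPM to yum repo",
        ]
    return plan
```

Under these assumptions, `plan_build("feature/search", "1.4.2")` stops after reporting statistics, while `plan_build("master", "1.4.2")` also plans the three release steps.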
  23. None
  24. provisioning / termination; release version upgrades; host system configuration - registration and discovery
  25. single repo: roles, tasks, files; abstract out common tasks, e.g. ElasticSearch, Riak, Jenkins; parameterized per env + svc
  26. Jenkins: update release tag in Ansible repo; Source of Truth? - correlate builds, releases and environments *
  27. None
  28. Load Balancer

  29. nyt_lb*
        - service registration + discovery
        - allow for load balancing internal + external traffic
        - lightweight, robust, redundant
        - scalable, highly-available
      * naming is hard (also, too bad there’s no logo)
  30. RESTful API; svc plugins: nginx, haproxy…; in-memory db; persistence & failure recovery; distributed systems magic: gossip + CRDTs
  31. nyt_lb, nyt_lb, nyt_lb
      all cluster state is held in CRDTs
        - node membership
        - registered services
        - service attributes
  32. nyt_lb, nyt_lb, nyt_lb
        - quorum operations + gossip
        - all state is monotonic & confluent
        - new state converges
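To make "monotonic & confluent" concrete, here is a minimal sketch of one state-based CRDT, a two-phase set, used to track registered services. The slides do not say which CRDT types nyt_lb uses, so the choice of a two-phase set, the class name `TwoPhaseSet`, and the service identifiers are assumptions for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TwoPhaseSet:
    """State-based two-phase set: additions and removals only ever grow."""
    added: frozenset = frozenset()
    removed: frozenset = frozenset()

    def add(self, item):
        return TwoPhaseSet(self.added | {item}, self.removed)

    def remove(self, item):
        # Only items this replica has seen can be removed; the tombstone is permanent.
        if item not in self.added:
            return self
        return TwoPhaseSet(self.added, self.removed | {item})

    def merge(self, other):
        # Set union is commutative, associative and idempotent, so replicas
        # converge regardless of the order in which gossip delivers state.
        return TwoPhaseSet(self.added | other.added, self.removed | other.removed)

    def value(self):
        return self.added - self.removed


# Two nodes accept different updates, then exchange state via gossip.
node_a = TwoPhaseSet().add("search-api@10.0.0.1").add("suggest-api@10.0.0.2")
node_b = node_a.remove("search-api@10.0.0.1")
node_a = node_a.add("search-api@10.0.0.3")

converged = node_a.merge(node_b)
```

Merging in either order yields the same state, and merging a state with itself changes nothing. A known limitation of this particular structure is that a removed identifier can never be re-added, which is why real systems often reach for richer types such as observed-remove sets.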
  33. nyt_lb, nyt_lb, nyt_lb
        - upon provision and configuration, services register themselves
        - take themselves out of LBs during upgrades, maintenance, destroy
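The self-registration pattern on this slide might look roughly like the following. The slides do not show the nyt_lb API, so `Registry` is a purely in-memory stand-in and `out_of_rotation` is a hypothetical helper; the sketch only shows the ordering of deregister, work, and re-register.

```python
from contextlib import contextmanager


class Registry:
    """In-memory stand-in for a service registry (hypothetical, not the nyt_lb API)."""

    def __init__(self):
        self._in_rotation = set()

    def register(self, service_id):
        self._in_rotation.add(service_id)

    def deregister(self, service_id):
        self._in_rotation.discard(service_id)

    def in_rotation(self, service_id):
        return service_id in self._in_rotation


@contextmanager
def out_of_rotation(registry, service_id):
    """Take a service out of the load balancer while an upgrade or maintenance runs."""
    registry.deregister(service_id)
    yield
    # Only reached if the body did not raise: a failed upgrade leaves the
    # service out of rotation instead of sending traffic to a broken host.
    registry.register(service_id)


registry = Registry()
registry.register("suggest-api@10.0.0.2")

with out_of_rotation(registry, "suggest-api@10.0.0.2"):
    during_upgrade = registry.in_rotation("suggest-api@10.0.0.2")

after_upgrade = registry.in_rotation("suggest-api@10.0.0.2")
```

Leaving a failed host deregistered is a design choice made for this sketch; the talk does not say how nyt_lb handles a failed upgrade.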
  34. what’s up?

  35. unique identifiers; env-level tagging

  36. event = {
        'host': 'ip-10-45-136-116',
        'service': 'load-average',
        'metric': 0.7,
        'state': 'ok',
        'time': 1413551091.341055,
        'tags': ['dev', 'suggest-api', 'load'],
        'ttl': 10,
        'description': description
      }
      event = {
        'host': 'ip-10-45-136-116',
        'service': 'load-average',
        'metric': 3.2,
        'state': 'warning',
        'time': 1413551176.852009,
        'tags': ['dev', 'suggest-api', 'load'],
        'ttl': 10,
        'description': description
      }
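Events shaped like the two above could be produced by a small helper such as the one below. The field names come from the slide; the function names `state_for` and `make_event` and the threshold values (warning at 2.0, critical at 5.0) are assumptions, chosen only so that the slide's two metrics map to the states shown.

```python
import time


def state_for(metric, warning=2.0, critical=5.0):
    """Map a metric to a state; the thresholds are illustrative assumptions."""
    if metric >= critical:
        return "critical"
    if metric >= warning:
        return "warning"
    return "ok"


def make_event(host, service, metric, tags, description="", ttl=10, now=None):
    """Build an event dict with the same fields as the events on the slide."""
    return {
        "host": host,
        "service": service,
        "metric": metric,
        "state": state_for(metric),
        "time": time.time() if now is None else now,
        "tags": list(tags),
        "ttl": ttl,
        "description": description,
    }


tags = ["dev", "suggest-api", "load"]
calm = make_event("ip-10-45-136-116", "load-average", 0.7, tags, now=1413551091.341055)
busy = make_event("ip-10-45-136-116", "load-average", 3.2, tags, now=1413551176.852009)
```

With these assumed thresholds, `calm["state"]` is `"ok"` and `busy["state"]` is `"warning"`, matching the slide.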
  37. operational challenges and failures are a given
        - isolate and identify root causes
        - check logic belongs close to the thing monitored
        - push events; compute per grp/env + expectation
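One possible reading of "compute per grp/env + expectation" is: group pushed events by environment tag and service, then compare each group's aggregate against an expected bound. The sketch below follows that reading, which is an interpretation rather than something the slides spell out. `summarize`, `ENVIRONMENTS`, and the `expected_max` mapping are hypothetical names.

```python
from collections import defaultdict
from statistics import mean

# Environment names as they appear elsewhere in the deck.
ENVIRONMENTS = {"dev", "stg", "prd"}


def summarize(events, expected_max):
    """Group events by (environment, service) and check each mean against a bound."""
    by_group = defaultdict(list)
    for event in events:
        env = next((tag for tag in event["tags"] if tag in ENVIRONMENTS), "unknown")
        by_group[(env, event["service"])].append(event["metric"])

    summary = {}
    for (env, service), metrics in by_group.items():
        average = mean(metrics)
        # A service with no stated expectation is never flagged.
        limit = expected_max.get(service, float("inf"))
        summary[(env, service)] = {
            "mean": average,
            "within_expectation": average <= limit,
        }
    return summary


events = [
    {"service": "load-average", "metric": 0.7, "tags": ["dev", "suggest-api", "load"]},
    {"service": "load-average", "metric": 3.2, "tags": ["dev", "suggest-api", "load"]},
    {"service": "load-average", "metric": 4.5, "tags": ["prd", "suggest-api", "load"]},
]
report = summarize(events, expected_max={"load-average": 2.0})
```

Here the dev group averages below the assumed bound of 2.0 while the prd group exceeds it, so only prd would be flagged.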
  38. Graphite

  39. build dev stg prd

  40. Test Metrics; System Metrics; Event Metrics

  41. what does a green test really mean, anyway?

  42. maybe the build is red because we fixed all the bugs?
  43. test coverage as actionable; becomes a problem of categorization

  44. which machines are working harder? do failures have a pattern?
  45. how often does X happen? logging, alerts: indicators

  46. Lessons Learned and Future(?) Work
        - Lot of work; difficult tradeoff for low barrier to entry + robust system
        - Containers are nice, but ecosystem is still too immature
        - Correlating application, system, build metrics still manual
            - maybe emit events from Jenkins -> Riemann -> Datomic
            - Push-button re-deploys of point-in-time environments
        - Historical performance metrics as automated regression testing
        - Automated security auditing, static code analysis, etc.
  47. Questions? Tom Santero @tsantero The New York Times Company 8D