Release Engineering from the Ground Up

The Search Team at the New York Times manages multiple internal and external facing services, powering everything from Site Search to a public Semantic API. Each of these services is unique, comprising various programming languages, API servers, webservers, distributed databases, and so on.

Recently we undertook a complete revamp of our entire toolchain: migrating from SVN to GitHub, running and configuring a new build system, and taking ownership of metrics and monitoring throughout the entire stack. Building out this toolchain from scratch afforded us the opportunity to carefully evaluate our needs and weigh the tradeoffs. One of our primary focuses was achieving continuous automated deployments.

Software engineer Tom Santero presented an overview of the process and tooling we selected, illustrating the path code travels from development to production, including serious deliberation over balancing time to production vs test coverage, and a discussion of the custom tooling we developed for collecting and displaying release metrics.

(Presented at USENIX Release Engineering Summit West '14.)

The New York Times Developers

November 10, 2014

Transcript

  1. Listens for commits: builds on every push to any branch
     - Runs unit tests, reports build/test statistics
     - If branch == master:
       - cut release as RPM
       - increment version number
       - push RPM to yum repo
  2. provisioning / termination
     - release version upgrades
     - host system configuration
       - registration and discovery
  3. single repo: roles, tasks, files
     - abstract out common tasks, e.g. ElasticSearch, Riak, Jenkins
     - parameterized per env + svc
  4. Jenkins: update release tag in Ansible repo
     - Source of Truth?
       - correlate builds, releases and environments
  5. nyt_lb*
     * naming is hard (also, too bad there’s no logo)
     - service registration + discovery
     - allow for load balancing internal + external traffic
     - lightweight, robust, redundant
     - scalable, highly-available
  6. RESTful API
     - svc plugins: nginx, haproxy…
     - in-memory db
     - persistence & failure recovery
     - distributed systems magic
       - gossip + CRDTs
  7. [Diagram: three nyt_lb nodes] All cluster state is held in CRDTs:
     - node membership
     - registered services
     - service attributes
  8. [Diagram: three nyt_lb nodes] Quorum operations + gossip
     - all state is monotonic & confluent
     - new state converges
  9. [Diagram: three nyt_lb nodes] Upon provision and configuration, services register themselves
     - take themselves out of LBs during upgrades, maintenance, destroy
  10. event = { 'host' : 'ip-10-45-136-116', 'service' : 'load-average', 'metric' : 0.7, 'state' : 'ok', 'time' : 1413551091.341055, 'tags' : ['dev', 'suggest-api', 'load'], 'ttl' : 10, 'description' : description }

      event = { 'host' : 'ip-10-45-136-116', 'service' : 'load-average', 'metric' : 3.2, 'state' : 'warning', 'time' : 1413551176.852009, 'tags' : ['dev', 'suggest-api', 'load'], 'ttl' : 10, 'description' : description }
  11. operational challenges and failures are a given
      - isolate and identify root causes
      - check logic belongs close to the thing monitored
      - push events; compute per grp/env + expectation
  12. Lessons Learned and Future(?) Work
      - Lot of work; difficult tradeoff for low barrier to entry + robust system
      - Containers are nice, but ecosystem is still too immature
      - Correlating application, system, build metrics still manual
        - maybe emit events from Jenkins -> Riemann -> Datomic
        - push-button re-deploys of point-in-time environments
      - Historical performance metrics as automated regression testing
      - Automated security auditing, static code analysis, etc.
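
Slide 1 describes the build server's branch logic: every push is built and tested, and only pushes to master produce a release. The sketch below restates that rule in Python as a plain planning function. It is an illustration, not the team's Jenkins configuration; the function names `bump_version` and `plan_build` and the dot-separated version scheme are assumptions, since the slide only says "increment version number".

```python
from typing import List


def bump_version(version: str) -> str:
    """Increment the last dot-separated component, e.g. 1.4.2 -> 1.4.3.

    The numbering scheme is an assumption made for this sketch.
    """
    parts = version.split(".")
    parts[-1] = str(int(parts[-1]) + 1)
    return ".".join(parts)


def plan_build(branch: str, current_version: str) -> List[str]:
    """Return the ordered steps that a push to `branch` would trigger."""
    # Every push to any branch is built and tested.
    steps = ["build", "run unit tests", "report build/test statistics"]
    # Only master produces a release artifact; step order follows the slide.
    if branch == "master":
        steps += [
            f"cut release {current_version} as RPM",
            f"increment version number to {bump_version(current_version)}",
            "push RPM to yum repo",
        ]
    return steps
```

Calling `plan_build("feature/x", "1.4.2")` yields only the three build-and-test steps, while `plan_build("master", "1.4.2")` appends the three release steps.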
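
Slides 7 and 8 state that all nyt_lb cluster state is held in CRDTs, so that state is monotonic and replicas converge after gossip. The slides do not say which CRDT types are used. As a stand-in, the sketch below uses a two-phase set (one grow-only set of additions, one grow-only set of removals) to model registered services; the class name `TwoPhaseSet` and the service names are hypothetical.

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class TwoPhaseSet:
    """A two-phase set CRDT: both internal sets only ever grow."""

    added: FrozenSet[str] = frozenset()
    removed: FrozenSet[str] = frozenset()

    def add(self, item: str) -> "TwoPhaseSet":
        return TwoPhaseSet(self.added | {item}, self.removed)

    def remove(self, item: str) -> "TwoPhaseSet":
        # A removal is recorded as a tombstone; it is never undone.
        return TwoPhaseSet(self.added, self.removed | {item})

    def merge(self, other: "TwoPhaseSet") -> "TwoPhaseSet":
        # Set union is commutative, associative and idempotent, so replicas
        # converge regardless of the order in which gossip is delivered.
        return TwoPhaseSet(self.added | other.added,
                           self.removed | other.removed)

    def members(self) -> FrozenSet[str]:
        return self.added - self.removed


# Two replicas receive different updates, then exchange state.
node_a = TwoPhaseSet().add("search-1").add("search-2")
node_b = TwoPhaseSet().add("search-3").remove("search-2")
converged = node_a.merge(node_b)
```

After the merge, both `node_a.merge(node_b)` and `node_b.merge(node_a)` hold the same state. A two-phase set cannot re-add a removed item, so a real registry that lets services come back after maintenance would need a richer type, such as an observed-remove set.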
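
Slide 10 shows two events for the same host and service, one with state ok at a load of 0.7 and one with state warning at 3.2, and slide 11 argues that check logic belongs close to the thing monitored. The sketch below shows one way a host-side check could derive the state before pushing the event. The field names come from the slide; the thresholds (3.0 and 8.0), the "critical" state, and the function names are assumptions, and no monitoring client library is used.

```python
import time
from typing import Any, Dict, List, Optional


def load_state(metric: float, warning: float = 3.0, critical: float = 8.0) -> str:
    """Map a load average to a state string using assumed thresholds."""
    if metric >= critical:
        return "critical"
    if metric >= warning:
        return "warning"
    return "ok"


def load_event(host: str, metric: float, tags: List[str],
               now: Optional[float] = None, ttl: int = 10) -> Dict[str, Any]:
    """Build an event dict shaped like the ones on slide 10."""
    return {
        "host": host,
        "service": "load-average",
        "metric": metric,
        "state": load_state(metric),
        "time": time.time() if now is None else now,
        "tags": tags,
        "ttl": ttl,
        "description": f"load average is {metric}",
    }


tags = ["dev", "suggest-api", "load"]
first = load_event("ip-10-45-136-116", 0.7, tags, now=1413551091.341055)
second = load_event("ip-10-45-136-116", 3.2, tags, now=1413551176.852009)
```

With these assumed thresholds, `first["state"]` is `"ok"` and `second["state"]` is `"warning"`, matching the two events on the slide.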