Release Engineering from the Ground Up

Tom Santero (@tsantero), The New York Times Company

Search http://nytimes.com

Components: Listener, Rules Engine, Idx Mgr (index manager), Asset Data Directory

An Experience Report (releng)

Continuous Automated Deploys

Single Target Environment

Repository Migrations

SVN to GitHub

Diagram: master and feature branches (master: release by commit; feature: commit)

- Listens for commits
  - builds on every push to any branch
- Runs unit tests, reports build/test statistics
- If branch == master:
  - cut release as RPM
  - increment version number
  - push RPM to yum repo
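The job logic above can be sketched as a pure function that decides which steps run for a given push. This is a hedged illustration, not the team's actual Jenkins configuration: the step names, the `MAJOR.MINOR.PATCH` version format, and the choice to bump the patch component are all assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class BuildPlan:
    """Steps a CI job would run for one push (hypothetical model)."""
    steps: list = field(default_factory=list)
    version: str = ""


def bump_patch(version: str) -> str:
    """Increment the last component of an assumed MAJOR.MINOR.PATCH version."""
    major, minor, patch = (int(part) for part in version.split("."))
    return f"{major}.{minor}.{patch + 1}"


def plan_build(branch: str, current_version: str) -> BuildPlan:
    """Every push builds and tests; only master also cuts and publishes an RPM."""
    plan = BuildPlan(steps=["build", "unit-tests", "report-stats"],
                     version=current_version)
    if branch == "master":
        plan.version = bump_patch(current_version)
        plan.steps += ["cut-rpm", "increment-version", "push-to-yum-repo"]
    return plan


if __name__ == "__main__":
    print(plan_build("feature/search-tweaks", "1.4.2"))
    print(plan_build("master", "1.4.2"))
```

Keeping the decision separate from the side effects (building the RPM, talking to the yum repo) makes the branch rule easy to test on its own.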

- provisioning / termination
- release version upgrades
- host system configuration
  - registration and discovery

- single repo: roles, tasks, files
- abstract out common tasks
  - e.g. Elasticsearch, Riak, Jenkins
- parameterized per env + svc
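One way to read "parameterized per env + svc" is layered variables: shared role defaults, overridden by environment values, overridden by service values. Ansible has its own variable-precedence rules; this Python sketch only illustrates the layering idea, and every variable name and value in it is made up.

```python
from collections import ChainMap

# Hypothetical defaults shared by a common "elasticsearch" role.
ROLE_DEFAULTS = {"heap_size": "1g", "cluster_name": "search", "replicas": 1}

# Hypothetical per-environment overrides.
ENV_VARS = {
    "dev": {"replicas": 0},
    "prd": {"heap_size": "8g", "replicas": 2},
}

# Hypothetical per-service overrides.
SERVICE_VARS = {
    "suggest-api": {"cluster_name": "suggest"},
}


def resolve_vars(env: str, service: str) -> dict:
    """Service values win over env values, which win over role defaults."""
    layered = ChainMap(
        SERVICE_VARS.get(service, {}),
        ENV_VARS.get(env, {}),
        ROLE_DEFAULTS,
    )
    return dict(layered)
```

For example, `resolve_vars("prd", "suggest-api")` keeps the production heap size and replica count but takes the service-specific cluster name.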

Jenkins: update release tag in Ansible repo
- Source of Truth?
  - correlate builds, releases and environments*
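If the Ansible repo is the source of truth, the release tag Jenkins writes there is enough to answer "which build is running where?". The sketch assumes a hypothetical layout of one release tag per (environment, service) pair and a hypothetical mapping from release tags to Jenkins build numbers; it models the data only, not the git commit Jenkins would make.

```python
from typing import Dict, List, Tuple

# Hypothetical source-of-truth structure: (env, service) -> release tag.
ReleaseTags = Dict[Tuple[str, str], str]


def update_release_tag(tags: ReleaseTags, env: str, service: str, tag: str) -> ReleaseTags:
    """Return a new mapping with the release tag for one env/service replaced."""
    updated = dict(tags)
    updated[(env, service)] = tag
    return updated


def environments_running(tags: ReleaseTags, service: str, tag: str) -> List[str]:
    """Correlate a release back to every environment that currently runs it."""
    return sorted(env for (env, svc), t in tags.items() if svc == service and t == tag)


def build_for(env: str, service: str, tags: ReleaseTags, builds: Dict[str, int]) -> int:
    """Correlate an environment back to its (hypothetical) Jenkins build number."""
    return builds[tags[(env, service)]]
```

Returning a new mapping instead of mutating the old one mirrors how each tag update would land as a separate commit, so earlier states stay inspectable.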

Load Balancer

nyt_lb*
(* naming is hard; also, too bad there's no logo)
- service registration + discovery
- allow for load balancing internal + external traffic
- lightweight, robust, redundant
- scalable, highly available

- RESTful API
- svc plugins: nginx, haproxy…
- in-memory db
- persistence & failure recovery
- distributed systems magic
  - gossip + CRDTs
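The service plugins presumably translate the registry into load-balancer configuration. As an assumption about how such a plugin could work, this sketch renders an HAProxy `backend` section from a list of registered instances. The registry record and function name are hypothetical; `backend`, `balance roundrobin`, and `server <name> <addr>:<port> check` are standard HAProxy configuration keywords.

```python
from typing import List, NamedTuple


class Instance(NamedTuple):
    """One registered service instance (hypothetical registry record)."""
    name: str
    address: str
    port: int


def render_haproxy_backend(service: str, instances: List[Instance]) -> str:
    """Render an HAProxy backend section for one service."""
    lines = [f"backend {service}", "    balance roundrobin"]
    # Sort so that identical registry state always renders identical config.
    for inst in sorted(instances):
        lines.append(f"    server {inst.name} {inst.address}:{inst.port} check")
    return "\n".join(lines) + "\n"
```

Rendering deterministically means a plugin can compare the output with the current config file and skip an HAProxy reload when nothing changed.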

nyt_lb cluster: all cluster state is stored as CRDTs
- node membership
- registered services
- service attributes
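To make "all cluster state is CRDTs" concrete: node membership or the set of registered services could be an observed-remove set (OR-Set), where each add carries a unique tag and a remove only cancels the tags it has already seen. The slides do not say which CRDTs nyt_lb uses; this is a textbook state-based OR-Set, shown purely as an illustration.

```python
import uuid
from dataclasses import dataclass, field
from typing import Set, Tuple


@dataclass
class ORSet:
    """State-based observed-remove set."""
    adds: Set[Tuple[str, str]] = field(default_factory=set)     # (element, unique tag)
    removes: Set[Tuple[str, str]] = field(default_factory=set)  # tombstoned (element, tag)

    def add(self, element: str) -> None:
        self.adds.add((element, uuid.uuid4().hex))

    def remove(self, element: str) -> None:
        # Only locally observed tags are removed, so a concurrent add elsewhere survives.
        self.removes |= {pair for pair in self.adds if pair[0] == element}

    def value(self) -> Set[str]:
        return {elem for (elem, _tag) in self.adds - self.removes}

    def merge(self, other: "ORSet") -> "ORSet":
        # Union of two grow-only sets: commutative, associative, idempotent.
        return ORSet(self.adds | other.adds, self.removes | other.removes)
```

The "add wins" behavior matters for membership: if one node removes a peer while another concurrently re-adds it, the re-add is not silently lost after merging.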

nyt_lb cluster:
- quorum operations + gossip
- all state is monotonic & confluent
- new state converges
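Why "monotonic & confluent" implies convergence: if every replica's state only grows under a merge that is commutative, associative, and idempotent, then any gossip schedule that eventually delivers every update leaves all replicas equal. The sketch uses a last-writer-wins (LWW) map for service attributes and simulates random pairwise gossip. The LWW choice, the timestamps, and the round count are assumptions for illustration, not details of nyt_lb.

```python
import random
from typing import Dict, Tuple

# Hypothetical service-attribute state: key -> (timestamp, node_id, value).
# Ties on timestamp are broken by node_id so the merge stays deterministic.
LWWMap = Dict[str, Tuple[float, str, str]]


def merge(a: LWWMap, b: LWWMap) -> LWWMap:
    """Join two LWW maps: per key, keep the entry with the greatest (timestamp, node_id)."""
    out = dict(a)
    for key, entry in b.items():
        if key not in out or entry[:2] > out[key][:2]:
            out[key] = entry
    return out


def gossip(replicas: Dict[str, LWWMap], rounds: int, seed: int = 0) -> Dict[str, LWWMap]:
    """Each round, one random node pushes its state to one random peer."""
    rng = random.Random(seed)
    names = sorted(replicas)
    for _ in range(rounds):
        src, dst = rng.sample(names, 2)
        replicas[dst] = merge(replicas[dst], replicas[src])
    return replicas
```

Because `merge` never discards a newer entry, the order in which peers happen to exchange state does not change the final result, only how quickly it is reached.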

nyt_lb cluster: upon provisioning and configuration, services register themselves
- services take themselves out of LBs during upgrades, maintenance, and destruction
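The self-registration lifecycle can be sketched as a context manager that pulls an instance out of the load balancer for an upgrade or maintenance window and puts it back afterwards. The `Registry` class is an in-memory stand-in; nyt_lb's actual REST endpoints are not shown in the slides, so no HTTP calls are modeled.

```python
from contextlib import contextmanager
from typing import Dict, Iterator, Set


class Registry:
    """In-memory stand-in for the load balancer's service registry (hypothetical)."""

    def __init__(self) -> None:
        self.services: Dict[str, Set[str]] = {}

    def register(self, service: str, instance: str) -> None:
        self.services.setdefault(service, set()).add(instance)

    def deregister(self, service: str, instance: str) -> None:
        self.services.get(service, set()).discard(instance)

    def instances(self, service: str) -> Set[str]:
        return set(self.services.get(service, set()))


@contextmanager
def out_of_rotation(registry: Registry, service: str, instance: str) -> Iterator[None]:
    """Take an instance out of the LB while work happens, then restore it on success."""
    registry.deregister(service, instance)
    yield
    # Only reached if the block finished without raising:
    # a failed upgrade leaves the instance out of rotation.
    registry.register(service, instance)
```

Usage: provisioning calls `register`, an upgrade runs inside `with out_of_rotation(registry, "suggest-api", "10.0.0.1:8080"):`, and destruction calls `deregister` without re-registering. Leaving a failed instance out of rotation is a design choice here, not something the slides specify.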

what’s up?

- unique identifiers
- env-level tagging

event = {
    'host': 'ip-10-45-136-116',
    'service': 'load-average',
    'metric': 0.7,
    'state': 'ok',
    'time': 1413551091.341055,
    'tags': ['dev', 'suggest-api', 'load'],
    'ttl': 10,
    'description': description
}

event = {
    'host': 'ip-10-45-136-116',
    'service': 'load-average',
    'metric': 3.2,
    'state': 'warning',
    'time': 1413551176.852009,
    'tags': ['dev', 'suggest-api', 'load'],
    'ttl': 10,
    'description': description
}
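An event like the ones above could be produced by a check running on the monitored host itself. The sketch turns a load-average reading into an event with a `state` field. The thresholds (warning at 2.0, critical at 4.0) are hypothetical, chosen only so that 0.7 maps to `ok` and 3.2 to `warning` as in the examples. It only builds the dictionary; shipping it to an event processor (Riemann is mentioned later in the talk) is not shown.

```python
import socket
import time
from typing import Dict, Optional

# Hypothetical thresholds for a load-average check.
WARNING_AT = 2.0
CRITICAL_AT = 4.0


def load_state(metric: float) -> str:
    """Map a load-average reading to a state string."""
    if metric >= CRITICAL_AT:
        return "critical"
    if metric >= WARNING_AT:
        return "warning"
    return "ok"


def make_event(metric: float, env: str, service: str,
               host: Optional[str] = None, ttl: int = 10) -> Dict:
    """Build an event with the same fields as the examples above."""
    return {
        "host": host or socket.gethostname(),
        "service": "load-average",
        "metric": metric,
        "state": load_state(metric),
        "time": time.time(),
        "tags": [env, service, "load"],
        "ttl": ttl,
        "description": f"load average for {service}",
    }


def is_expired(event: Dict, now: float) -> bool:
    """An event whose ttl has elapsed should no longer be trusted as current."""
    return now > event["time"] + event["ttl"]
```

The `ttl` field lets a consumer distinguish "this host reported ok recently" from "this host stopped reporting", which an absent event alone cannot express.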

operational challenges and failures are a given
- isolate and identify root causes
- check logic belongs close to the thing monitored
- push events; compute per group/env + expectation
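"Compute per grp/env + expectation" suggests that individual events are judged in aggregate. A minimal sketch, assuming the environment is whichever tag is one of `dev`, `stg`, or `prd`, and assuming a hypothetical expectation expressed as the maximum fraction of non-`ok` events tolerated per (environment, service) group:

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

ENVIRONMENTS = {"dev", "stg", "prd"}


def group_key(event: Dict) -> Tuple[str, str]:
    """(environment, service) for an event, taking the environment from its tags."""
    env = next((t for t in event["tags"] if t in ENVIRONMENTS), "unknown")
    return env, event["service"]


def groups_violating(events: Iterable[Dict], max_bad_fraction: float) -> List[Tuple[str, str]]:
    """Return groups whose share of non-ok events exceeds the expectation."""
    totals: Dict[Tuple[str, str], int] = defaultdict(int)
    bad: Dict[Tuple[str, str], int] = defaultdict(int)
    for event in events:
        key = group_key(event)
        totals[key] += 1
        if event["state"] != "ok":
            bad[key] += 1
    return sorted(k for k in totals if bad[k] / totals[k] > max_bad_fraction)
```

A single `warning` from one dev host then only matters if it pushes its whole group past the expectation, which keeps per-host noise from paging anyone.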

Graphite

build dev stg prd

- Test Metrics
- System Metrics
- Event Metrics

what does a green test really mean, anyway?

maybe the build is red because we fixed all the bugs?

test coverage as actionable
- becomes a problem of categorization

- which machines are working harder?
- do failures have a pattern?
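Both questions reduce to grouping the same event stream in different ways. A hedged sketch: average `metric` per host to spot machines running hotter than the fleet, and count non-`ok` events per hour of day to see whether failures cluster in time. The 1.5x-of-fleet-mean cutoff and the UTC hour-of-day bucketing are assumptions.

```python
from collections import Counter, defaultdict
from datetime import datetime, timezone
from statistics import mean
from typing import Dict, Iterable, List


def hot_hosts(events: Iterable[Dict], factor: float = 1.5) -> List[str]:
    """Hosts whose mean metric exceeds `factor` times the mean across hosts."""
    per_host: Dict[str, List[float]] = defaultdict(list)
    for event in events:
        per_host[event["host"]].append(event["metric"])
    host_means = {host: mean(values) for host, values in per_host.items()}
    fleet_mean = mean(host_means.values())  # raises StatisticsError if there are no events
    return sorted(h for h, m in host_means.items() if m > factor * fleet_mean)


def failures_by_hour(events: Iterable[Dict]) -> Counter:
    """Count non-ok events per UTC hour of day."""
    return Counter(
        datetime.fromtimestamp(e["time"], tz=timezone.utc).hour
        for e in events
        if e["state"] != "ok"
    )
```

Averaging per host before averaging across the fleet keeps a chatty host that reports often from dominating the baseline.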

how often does X happen?
- logging, alerts: indicators
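Answering "how often does X happen?" needs a rate rather than a single log line. A sketch of a sliding-window counter, assuming occurrences arrive with non-decreasing timestamps in seconds; the window length is a caller's choice, not something the slides specify.

```python
from collections import deque
from typing import Deque


class WindowCounter:
    """Count occurrences of an event within the last `window` seconds."""

    def __init__(self, window: float) -> None:
        self.window = window
        self.times: Deque[float] = deque()

    def record(self, timestamp: float) -> int:
        """Record one occurrence and return how many fall inside the window."""
        self.times.append(timestamp)
        # Drop occurrences that have slid out of the window.
        while self.times and self.times[0] <= timestamp - self.window:
            self.times.popleft()
        return len(self.times)
```

An alert could fire when the count returned by `record` crosses a threshold, which turns a stream of log lines into an indicator.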

Lessons Learned and Future(?) Work
- Lot of work; difficult tradeoff between a low barrier to entry and a robust system
- Containers are nice, but the ecosystem is still too immature
- Correlating application, system, and build metrics is still manual
  - maybe emit events from Jenkins -> Riemann -> Datomic
  - push-button re-deploys of point-in-time environments
- Historical performance metrics as automated regression testing
- Automated security auditing, static code analysis, etc.
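"Historical performance metrics as automated regression testing" is listed as future work, so this is only a sketch of one common approach: flag a build when its measurement sits more than `k` standard deviations above the mean of previous builds. The metric (for example a latency in milliseconds), `k = 3`, and the minimum history size are assumptions.

```python
from statistics import mean, stdev
from typing import Sequence


def is_regression(history: Sequence[float], current: float,
                  k: float = 3.0, min_history: int = 5) -> bool:
    """True if `current` exceeds the historical mean by more than k standard deviations.

    Assumes higher is worse (e.g. latency). With too little history, never flags.
    """
    if len(history) < min_history:
        return False
    return current > mean(history) + k * stdev(history)
```

With a history of `[100, 102, 98, 101, 99]`, a reading of 103 passes while 120 is flagged; the `min_history` guard avoids failing builds on a baseline of one or two noisy samples.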

Questions?
