Development, Deployment and Collaboration at Etsy
Daniel Schauenberg
dschauenberg@etsy.com
@mrtazz
Slide 5
Item by TheBackPackShoppe
Slide 6
http://www.flickr.com/photos/brianglanz/1095706242
Slide 7
avg 50 deploys/day
Slide 8
avg n > m deploys/day
Slide 9
How comfortable are you deploying a change right now?
Slide 10
http://www.flickr.com/photos/renaissancechambara/2349811492
small change
Slide 11
Config Flags
Item by RocajoStudio
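A config flag lets a small change ship dark and be ramped up gradually. The sketch below is a minimal illustration only: the flag names, the `FLAGS` table and the `is_enabled` helper are hypothetical and do not reflect Etsy's actual configuration format.

```python
import zlib

# Hypothetical flag table (illustrative names, not Etsy's real config).
# "enabled" is the master switch; "percent" is the share of users who see it.
FLAGS = {
    "new_checkout": {"enabled": True, "percent": 10},
    "search_v2": {"enabled": False, "percent": 100},
}


def is_enabled(flag, user_id, flags=FLAGS):
    """Return True if `flag` is switched on for this user."""
    cfg = flags.get(flag)
    if cfg is None or not cfg["enabled"]:
        return False
    # A stable hash keeps each user in the same bucket across requests,
    # so ramping "percent" up only ever adds users.
    bucket = zlib.crc32(f"{flag}:{user_id}".encode()) % 100
    return bucket < cfg["percent"]
```

Code guarded by `if is_enabled("new_checkout", user_id):` can be deployed at any time and turned on later by editing the flag table, which keeps each individual deploy small.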
Slide 13
“If this is your first day at Etsy, you deploy the site”
Slide 14
Developer VMs
Slide 15
Developer VMs
• KVM
• Every engineer has one
• Fully Chef’d with the Etsy Stack
• Different sizes and Chef roles
Slide 17
Continuous Integration
Slide 19
Continuous Integration
• Run set of tests before each deploy
• Full QA suite
• Princess/Production smoker tests
• Try (yup, there is one)
Slide 20
http://www.flickr.com/photos/egfocus/6962179321
Slide 21
The Bobs
• LXC virtualized hosts
• 14 per physical host
• Spread over 3 SSDs
• Most of them attached to try
Slide 23
Item by decomodwalls
Slide 24
Deployinator
Slide 25
Deployinator
• 2 Buttons, no ambiguity
• Overview of current state of deploy
• Links to Logwatcher and Dashboards
• Easy to add stacks for new tools to deploy
Slide 26
http://www.flickr.com/photos/jbgeronimi/6363087361
Slide 28
Monitoring
Slide 29
shouldigraphit.com
Slide 30
Monitoring
• Devs do their own feature monitoring
• Everybody can access all the graphs
• Dashboard All The Things!
• Stream All The Logs!
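Feature monitoring can be as cheap as one line per event. The sketch below assumes a StatsD-style daemon listening on UDP port 8125 (the slides do not name the metrics pipeline) and uses the StatsD counter line format `name:value|c`. The metric name, host and helper functions are hypothetical.

```python
import socket


def format_counter(name, value=1):
    """Build a StatsD counter datagram: b'<name>:<value>|c'."""
    return f"{name}:{value}|c".encode("ascii")


def send_counter(name, value=1, host="127.0.0.1", port=8125):
    """Fire-and-forget a counter increment over UDP.

    host and port are assumptions; point them at your own collector.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(format_counter(name, value), (host, port))
```

Calling `send_counter("checkout.completed")` at the point where the event happens is enough to get a graph of that event; because UDP needs no reply, a slow or missing collector does not block the application code.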
Slide 34
On Call
Slide 35
If you are writing code, you are on-call
Slide 36
On-Call Schedules
• ops on-call
• dev on-call
• payments on-call
• support on-call
Slide 37
Dev On-Call
• Scheduled for 6 months
• On-call roughly every 4 weeks for 1 week
• L1 and L2 escalations
• L1 if it’s your first time
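The "every 4 weeks for 1 week" cadence falls out of a simple round-robin over a four-person pool. This is a sketch under that assumption; the pool size, names, start date and the `rotation` helper are all hypothetical and not taken from the talk.

```python
from datetime import date, timedelta


def rotation(engineers, start, weeks):
    """Assign one engineer per week, cycling through the pool in order.

    Returns a list of (week_start_date, engineer) tuples.
    """
    return [
        (start + timedelta(weeks=i), engineers[i % len(engineers)])
        for i in range(weeks)
    ]


# Hypothetical four-person pool: each person is on call one week in four.
schedule = rotation(["ana", "ben", "cai", "dee"], date(2013, 1, 7), 26)
```

With four people in the pool, 26 weeks covers roughly the six-month schedule, and each engineer's turns are four weeks apart.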
Slide 38
Incident Response
Slide 39
Incident Response
• “This graph looks funny”
• “Hey I just got paged for elevated error rate after deploys”
• “Supergrep is going crazy!!”
Slide 41
#warroom
Slide 42
#warroom
• only outage-related conversations
• coordinate investigation, communication, countermeasures and monitoring
• good place to lurk for new engineers
Slide 43
Post Mortems
Slide 44
blameless
Slide 45
Everybody’s invited
Slide 46
Learning Opportunity
Slide 47
Summary
Slide 48
Summary
• These are things that work for *us*
• Culture is an on-going effort
• Share everything
• Encourage learning/teaching
Slide 49
Summary
• Lunch ’n learns
• DC visits
• On-call for a day
• Bootcamps/Senior rotations