Development, Deployment & Collaboration at Etsy

At Etsy, about 150 engineers deploy a single monolithic application more than 60 times a day. Continuously deploying small changesets lets us build up and release robust features, and find and fix bugs, extremely quickly, all while serving over a billion page views per month. Developing and deploying at such a high velocity only works, however, because product developers and designers, infrastructure and operations engineers, and the security team work closely together. We have an extremely open culture of sharing (inside and outside the company), and we keep surprises to a minimum by getting everybody on the same page about changes.
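
Slide 11 lists config flags, which are what make continuous small changesets safe: code for an unfinished feature can ship to production switched off and be ramped up later. The deck does not show Etsy's flag API, so the following is only a minimal sketch under assumptions: the `FLAGS` table, its flag names and percentages, and the `bucket`/`is_enabled` helpers are all hypothetical. Each user is hashed into a stable bucket from 0 to 99, so raising a flag from 10% to 20% keeps the original 10% of users enabled.

```python
import hashlib

# Hypothetical flag table: flag name -> percentage of users who see the feature.
FLAGS = {
    "new_checkout": 0,      # deployed but dark
    "search_redesign": 10,  # ramped to 10% of users
    "faster_cart": 100,     # fully launched
}


def bucket(flag: str, user_id: int) -> int:
    """Map (flag, user) to a stable bucket in [0, 100)."""
    key = f"{flag}:{user_id}".encode("utf-8")
    digest = hashlib.sha256(key).digest()
    return int.from_bytes(digest[:8], "big") % 100


def is_enabled(flag: str, user_id: int) -> bool:
    """Return True if the flag is on for this user; unknown flags are off."""
    percent = FLAGS.get(flag, 0)
    return bucket(flag, user_id) < percent
```

Hashing the flag name together with the user ID means different flags select different user populations, so one unlucky group of users is not the test audience for every ramp-up.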

To explain how we make this work at Etsy, I will describe how the general development process is laid out. A huge part of this is the setup of our development environment. Each engineer has their own VM, which runs a slimmed-down version of the Etsy stack. We use Chef to keep our infrastructure in sync, and the developer VMs are no exception: they run the same cookbooks as the production infrastructure. This is paramount to making sure features are developed in an environment as close to production as possible.

Our whole development process is wrapped in a tight feedback loop, with our CI cluster and our monitoring stack at its center. The CI system has two central tasks. The first is to run the full test suite before each deploy, plus smoker tests against staging and production. The second, which is much more resource-intensive, is to let engineers test their work-in-progress changes against the whole test suite with a single command-line script. I will go into detail about how our setup, which currently consists of about 250 Jenkins build slaves, enables quick feedback, and how we continuously work on keeping it fast.
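
The abstract mentions about 250 Jenkins build slaves but does not say how a single test run is split across them. One common approach, shown here purely as an illustrative assumption and not as Etsy's try implementation, is to shard test files across workers with the greedy longest-processing-time heuristic, using recorded durations. `shard_tests` and the duration map are hypothetical; nothing here is a Jenkins API.

```python
import heapq


def shard_tests(durations: dict[str, float], workers: int) -> list[list[str]]:
    """Greedy longest-processing-time assignment of test files to workers."""
    if workers < 1:
        raise ValueError("workers must be >= 1")
    # Min-heap of (seconds assigned so far, worker index).
    heap = [(0.0, i) for i in range(workers)]
    heapq.heapify(heap)
    shards: list[list[str]] = [[] for _ in range(workers)]
    # Place the slowest tests first so the long ones get spread out;
    # ties are broken by name to keep the output deterministic.
    for name, secs in sorted(durations.items(), key=lambda kv: (-kv[1], kv[0])):
        load, idx = heapq.heappop(heap)  # least-loaded worker
        shards[idx].append(name)
        heapq.heappush(heap, (load + secs, idx))
    return shards
```

With durations `{"a": 8, "b": 7, "c": 6, "d": 5, "e": 4}` and two workers, this yields `[["a", "d", "e"], ["b", "c"]]`, i.e. 17 and 13 seconds. Sorting slowest-first avoids a long test landing last on a worker that is already busy, which would stretch the wall-clock time of the whole run.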

Once changes are in production, we have a large set of dashboards, log-parsing and alerting tools to make sure we detect regressions and bugs as fast as possible and fix them with the next deploy. Beyond providing a quick way to detect problems, our myriad dashboards also make it easy to share the current state of etsy.com, and they enable efficient, productive discussions within and across teams: sharing a simple URL in IRC is enough. I will talk about how we use those tools every day, how everybody sits down and investigates what's going on when a deploy goes wrong, and how we all learn from those incidents by sharing successes and failures openly.
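
Catching a regression in time to fix it "with the next deploy" often comes down to comparing a metric before and after a deploy. A minimal sketch, assuming per-minute error counts are available for a window on each side of the deploy; `deploy_regressed`, the 1.5x ratio and the floor of one error per minute are hypothetical tuning choices, not Etsy's actual alert rules.

```python
from statistics import mean


def deploy_regressed(before: list[float], after: list[float],
                     ratio: float = 1.5, floor: float = 1.0) -> bool:
    """Flag a deploy if the post-deploy error rate jumps relative to the baseline.

    before/after: per-minute error counts from the windows around the deploy.
    ratio: how many times the baseline the new rate must reach to alert.
    floor: minimum baseline, so a quiet baseline of 0 doesn't alert on 1 error.
    """
    if not before or not after:
        raise ValueError("need samples on both sides of the deploy")
    baseline = max(mean(before), floor)
    return mean(after) >= ratio * baseline
```

For example, a baseline of `[10, 12, 11]` errors per minute followed by `[30, 35, 40]` trips the check, while `[11, 10, 12]` does not. The floor keeps nearly silent error graphs from paging someone over a single stray error.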

At Etsy, it is every engineer's responsibility to deploy their own changes using Deployinator, a one-button deployment system we wrote and open sourced. The system is integrated into the company-wide IRC network, serves as the canonical way to deploy changes, and provides a set of features for gaining confidence in the changeset that is about to go live. I will give insight into how the system works and how it has changed over time to accommodate the use cases we saw: communicating change better, and giving people an efficient way to discuss and a clear view of the current state when something doesn't go according to plan.
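
Slide 25 describes "2 Buttons, no ambiguity", and slide 19 lists Princess/Production smoker tests; given the abstract's mention of smoker tests against staging and production, Princess appears to be the staging environment. The sketch below models that flow under stated assumptions: `Deployer`, the `push`, `smoke` and `announce` callables, and the rule that production only accepts revisions already verified on princess are all hypothetical, and `announce` merely stands in for posting to IRC. It is not Deployinator's code.

```python
from typing import Callable

ENVIRONMENTS = ("princess", "production")  # staging first, then production


class Deployer:
    """Hypothetical two-button deploy flow: princess first, then production."""

    def __init__(self, push: Callable[[str, str], None],
                 smoke: Callable[[str], bool],
                 announce: Callable[[str], None]) -> None:
        self.push = push          # ships a revision to an environment
        self.smoke = smoke        # runs smoker tests, True if they pass
        self.announce = announce  # stands in for posting to the IRC channel
        self.verified: set[str] = set()  # revisions that passed on princess

    def deploy(self, env: str, user: str, rev: str) -> bool:
        """Handle one button press; return True if rev is live and passing."""
        if env not in ENVIRONMENTS:
            raise ValueError(f"unknown environment: {env}")
        if env == "production" and rev not in self.verified:
            self.announce(f"{user}: {rev} has not passed princess yet")
            return False
        self.announce(f"{user} is deploying {rev} to {env}")
        self.push(env, rev)
        if not self.smoke(env):
            self.announce(f"smoker tests failed on {env} for {rev}")
            return False
        if env == "princess":
            self.verified.add(rev)
        self.announce(f"{rev} is live on {env}")
        return True
```

Announcing every step, including refusals and failures, is what keeps a shared channel an accurate picture of the current deploy state.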

Continuous Deployment and ongoing collaboration across engineering and operations teams are the foundation for moving fast and iterating on products and features. We have a strong culture of taking responsibility and sharing knowledge, successes and failures, which builds a successful and resilient engineering team. This talk gives deep insight into how we develop software at Etsy and which tools and processes help us achieve our goals.

This is a revised version of my talk from QCon London, March 2014.

Daniel Schauenberg

June 19, 2014

Transcript

  1. Development, Deployment and Collaboration at Etsy
    Daniel Schauenberg
    @mrtazz

  3. Etsy Stats

  4. Etsy Stats

  5. Item by TheBackPackShoppe

  6. http://www.flickr.com/photos/brianglanz/1095706242

  7. avg 50 deploys/day

  8. avg n > m deploys/day

  9. How comfortable are you deploying a change right now?

  10. http://www.flickr.com/photos/renaissancechambara/2349811492
    small change

  11. Config Flags
    Item by RocajoStudio

  13. “If this is your first day at Etsy, you deploy the site”

  14. Developer VMs

  15. Developer VMs
    • KVM
    • Every engineer has one
    • Fully Chef’d with the Etsy Stack
    • Different sizes and Chef roles

  17. Continuous Integration

  19. Continuous Integration
    • Run set of tests before each deploy
    • Full QA suite
    • Princess/Production smoker tests
    • Try (yup, there is one)

  20. http://www.flickr.com/photos/egfocus/6962179321

  21. The Bobs
    • LXC virtualized hosts
    • 14 per physical host
    • Spread over 3 SSDs
    • Most of them attached to try

  23. Item by decomodwalls

  24. Deployinator

  25. Deployinator
    • 2 Buttons, no ambiguity
    • Overview of current state of deploy
    • Links to Logwatcher and Dashboards
    • Easy to add stacks for new tools to deploy

  26. http://www.flickr.com/photos/jbgeronimi/6363087361

  28. Monitoring

  29. shouldigraphit.com

  30. Monitoring
    • Devs do their feature monitoring
    • Everybody can access all the graphs
    • Dashboard All The Things!
    • Stream All The Logs!

  34. On Call

  35. If you are writing code, you are on-call

  36. On-Call Schedules
    • ops on-call
    • dev on-call
    • payments on-call
    • support on-call

  38. Dev On-Call
    • On-call for 3 days
    • All developers who are not in another rotation
    • L1 and L2 escalations
    • L1 if it’s your first time

  39. Incident Response

  40. Incident Response
    • “This graph looks funny”
    • “Hey I just got paged for elevated error rate after deploys”
    • “Supergrep is going crazy!!”

  41. Is the site down?

  43. #warroom

  44. #warroom
    • only outage-related conversations
    • coordinate investigation, communication, countermeasures and monitoring
    • good place to lurk for new engineers

  45. Post Mortems

  46. blameless

  47. Everybody’s invited

  48. Learning Opportunity

  49. Summary

  50. Summary
    • These are things that work for *us*
    • Culture is an on-going effort
    • Share everything
    • Encourage learning/teaching

  51. Summary
    • Lunch ’n learns
    • DC visits
    • On-call for a day
    • Bootcamps/Senior rotations

  52. codeascraft.com
    etsy.com/codeascraft/talks
    etsy.github.com
    etsy.com/careers

  53. Questions?

  54. Development, Deployment and Collaboration at Etsy
    Daniel Schauenberg
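
The monitoring slides (29 and 30) say developers do their own feature monitoring and that everybody can access all the graphs. Etsy open sourced StatsD for this kind of metric collection, but the deck does not name its metrics pipeline, so treat the following as a sketch assuming a StatsD-compatible collector listening on UDP port 8125 (StatsD's default). `statsd_line` and `send_metric` are hypothetical helpers; the `name:value|type` line with an optional `|@rate` sample-rate suffix is StatsD's plain-text protocol.

```python
import socket

_KINDS = {"c", "ms", "g"}  # counter, timer (milliseconds), gauge


def _num(value: float) -> str:
    # Render 3.0 as "3" and keep real fractions such as 12.5 intact.
    return str(int(value)) if float(value).is_integer() else repr(float(value))


def statsd_line(name: str, value: float, kind: str = "c",
                sample_rate: float = 1.0) -> str:
    """Format one metric in the StatsD plain-text line protocol."""
    if kind not in _KINDS:
        raise ValueError(f"unsupported metric type: {kind}")
    if not 0.0 < sample_rate <= 1.0:
        raise ValueError("sample_rate must be in (0, 1]")
    line = f"{name}:{_num(value)}|{kind}"
    if sample_rate < 1.0:
        line += f"|@{sample_rate}"
    return line


def send_metric(line: str, host: str = "127.0.0.1", port: int = 8125) -> None:
    """Send one metric over UDP without waiting for any reply."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(line.encode("ascii"), (host, port))
```

A feature would then call something like `send_metric(statsd_line("checkout.clicks", 1))` on each event. UDP is the usual choice here because a slow or missing collector never holds up the request that is being measured.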