Continuous Delivery at Shopify

John Arthorne
September 07, 2017

Talk for DevOps Ottawa Meetup, September 2017

Transcript

  1. Shopify Architecture
    [Diagram. Components shown: The Internet, CDN, and two data centers, each with edge routers, load balancers, web server hosts, job server hosts, and DB writer, DB reader, and DB standby hosts.]
  2. More environments
    [Diagram. Environments: Dev, Test, Stage, Prod. Parity: App Code, +OS, +Container, +Hardware, +Database, +Middleware, +Traffic Volume, +Credentials.]
  3. Enter continuous delivery
    • You can’t be sure your code works until it is in production
    • Minimize time to production for all changes
    • Small batch sizes keep the risk low
    • Dark launches, beta flags, ...
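A beta flag lets a change reach production while staying off for most merchants. The following is a minimal sketch of that idea, not Shopify's implementation; `BetaFlags`, `checkout`, the flag name, and the shop IDs are all hypothetical.

```python
class BetaFlags:
    """In-memory beta flag store (hypothetical; a real one would be persistent)."""

    def __init__(self):
        self._enabled = {}  # flag name -> set of shop ids with the flag on

    def enable(self, flag, shop_id):
        self._enabled.setdefault(flag, set()).add(shop_id)

    def is_enabled(self, flag, shop_id):
        return shop_id in self._enabled.get(flag, set())


def checkout(shop_id, flags):
    # The new code path is deployed for everyone but only runs for opted-in shops.
    if flags.is_enabled("new_checkout", shop_id):
        return "new checkout"
    return "old checkout"


flags = BetaFlags()
flags.enable("new_checkout", shop_id=42)
print(checkout(42, flags))
print(checkout(7, flags))
```

Turning the flag off again needs no deploy, which is part of what keeps the risk of each small change low.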
  4. Shopify style continuous delivery
    • Code handoffs slow us down and hurt problem determination
    • Everyone in Shopify R&D can deploy
    • Everyone in Shopify R&D must deploy
    • Dedicated team to build the tools to enable everyone to ship their changes with confidence
  5. Continuous delivery culture
    • There is a higher level of chaos with CD
    • Every dev takes ownership of ensuring their change lands safely
    • Every dev needs access and permission to act
    • ATC role is very helpful for herding the chaos
  6. Mechanics of Shipping
    • Develop in a localhost environment
    • Push changes in a branch, make the test suite pass
    • Code review
    • Add to merge queue (or manual git merge)
    • Deploy to production (usually automatic)
    • Monitor/verify your changes
  7. Local Development
    • Big investment in tools to automate local dev setup
    • Ensure it is easy to set up an env locally that is as close as possible to production
  8. Value of code review
    • Extra eyes catch mistakes missed during development
    • Pushes code towards cultural/style norms
    • Shared understanding of code - reduced bus factor
  9. Getting things deployed: a pipeline built for speed
    Pull Request → Git Merge (5s) → Image Build (5m) → Automated Tests (5m) → Deploy (5m)
    Goal: Merged to deployed in 15 minutes
  10. Deploy speed: webscale
    It required some considerable feats of engineering to make this pipeline fast. Why is this important?
    • Less wasted time for developers
    • Faster time to a fix for merchants
    • Fewer changes per deploy, so it’s safer
  11. Batch Size vs Pipeline Speed
    • 200 commits merged to shopify master on a busy day
    • Commit every 2.4 minutes, assuming an 8-hour work day
    • 3-minute deploy required for smallest batch size
    • Builds have to keep getting faster to keep batch size down
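The arithmetic on this slide can be checked with a short sketch. It assumes commits arrive evenly across the work day and that pipeline runs happen one after another, which the slide does not state; the function names are illustrative.

```python
def minutes_between_commits(commits_per_day, work_hours=8.0):
    """Average gap between merges, assuming they are spread evenly over the day."""
    return work_hours * 60 / commits_per_day


def average_batch_size(pipeline_minutes, commit_interval_minutes):
    """Average number of commits that land during one pipeline run (at least 1)."""
    return max(1.0, pipeline_minutes / commit_interval_minutes)


interval = minutes_between_commits(200)
print(f"one commit every {interval:.1f} minutes")

for pipeline in (15, 5, 3):
    batch = average_batch_size(pipeline, interval)
    print(f"{pipeline}-minute pipeline: about {batch:.2f} commits per deploy")
```

Under these assumptions a 15-minute pipeline carries roughly six commits per deploy, and the batch only approaches a single commit once the pipeline time approaches the 2.4-minute commit interval.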
  12. As soon as you merge, Pipa will start building 2 Docker images, one for production, and one for the automated tests.
    Pipeline: Git Merge → Container Build → Automated Tests → Deploy
  13. Buildkite will run the 70,000+ automated tests. If the tests succeeded on your branch, they will likely succeed on master after merging as well. If not, the failure has to be investigated, and potentially your merge has to be reverted.
    Pipeline: Git Merge → Container Build → Automated Tests → Deploy
  14. Buildkite
    • Hosted build and test orchestration service
    • Test agents run in parallel on our own GKE boxes
    • Agents pull tests from Redis queue
    • Ruby tests + Browser tests run with Selenium/Chrome
    • 330 N1-standard-16 VMs
    • 7000 Peak agents
    • 73k Tests/Build
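The pull model on this slide, where agents take tests from a shared queue instead of receiving a fixed share up front, can be sketched as follows. This is an illustration only: an in-process `queue.Queue` and threads stand in for the Redis queue and the test agents, and `run_agents` is a hypothetical name.

```python
import queue
import threading


def run_agents(test_names, agent_count):
    """Let agent_count workers pull tests from one shared queue until it is empty."""
    pending = queue.Queue()
    for name in test_names:
        pending.put(name)

    results = {}  # test name -> id of the agent that picked it up
    lock = threading.Lock()

    def agent(agent_id):
        while True:
            try:
                name = pending.get_nowait()
            except queue.Empty:
                return  # nothing left to run, so this agent exits
            # A real agent would execute the test here; the sketch only records it.
            with lock:
                results[name] = agent_id

    threads = [threading.Thread(target=agent, args=(i,)) for i in range(agent_count)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


tests = [f"test_{i}" for i in range(20)]
assignment = run_agents(tests, agent_count=4)
print(f"{len(assignment)} tests run by {len(set(assignment.values()))} agent(s)")
```

Because each agent pulls the next test only when it is free, slow tests do not hold up a pre-assigned batch, which is one common reason to prefer a queue over static splitting.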
  15. Shipit automatically deploys code to production. Changes deployed in parallel across 4 data centres, ~800 servers, and 500,000+ merchants.
    Pipeline: Git Merge → Container Build → Automated Tests → Deploy
  16. What if shit hits the fan?
    • Lock automatic deploys
    • Roll back to previously deployed version using Shipit
    • Revert change in Git
    • Always be communicating
    • ATC and incident response team standing by to help
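The first two steps, locking automatic deploys and rolling back to the previous version, can be modelled with a small sketch. It is not Shipit's API; `DeployTarget`, its methods, and the revision names are hypothetical.

```python
class DeployTarget:
    """Toy model of a deploy target with a lock and a rollback (hypothetical)."""

    def __init__(self):
        self.history = []    # deployed revisions, oldest first
        self.locked = False

    def lock(self):
        self.locked = True

    def unlock(self):
        self.locked = False

    def auto_deploy(self, revision):
        """Automatic deploys are skipped while the target is locked."""
        if self.locked:
            return False
        self.history.append(revision)
        return True

    def rollback(self):
        """Return to the previously deployed revision."""
        if len(self.history) < 2:
            raise RuntimeError("no previous deploy to roll back to")
        self.history.pop()
        return self.history[-1]

    @property
    def current(self):
        return self.history[-1] if self.history else None


target = DeployTarget()
target.auto_deploy("rev-a")
target.auto_deploy("rev-b")           # suppose rev-b turns out to be bad
target.lock()                          # stop further automatic deploys first
restored = target.rollback()           # production is back on rev-a
blocked = target.auto_deploy("rev-c")  # skipped while the lock is held
print(restored, blocked, target.current)
```

Locking before rolling back matters in this model: without the lock, the next merge would deploy on top of the rollback while the bad change is still in the branch.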
  17. Summary
    • It is impossible to simulate a production environment
    • Strive to keep environment differences to a minimum
    • Push smallest possible units of change to production continuously in order to validate code
    • Invest in tools to keep it flowing smoothly