A history of rapid delivery:
2013: Capistrano deploys run by a dedicated ops team
2014: Third-party CI, struggling with scalability and production fidelity
2015: Building out a new-generation deploy pipeline
Why deploy continuously? Fast feedback for developers. Faster time to a fix for customers. Continuous uptime requires many small changes, which magnifies wait times. But the #1 reason: keeping the batch size small.
A commit lands on master every ~3 minutes on a busy day, assuming an 8-hour work day. That means a 3-minute deploy is required for the smallest possible batch size, and builds have to keep getting faster to keep batch size down.
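A back-of-the-envelope sketch of that relationship (the 3-minute figures come from above; the other deploy times are illustrative):

```python
# How deploy time drives batch size: commits keep landing while a deploy
# is in flight, so batch size = deploy time / commit interval.
WORKDAY_MIN = 8 * 60          # 8-hour work day
COMMIT_INTERVAL_MIN = 3       # a commit to master every ~3 minutes

for deploy_min in (3, 6, 15, 30):
    batch = deploy_min / COMMIT_INTERVAL_MIN   # commits that pile up per deploy
    deploys = WORKDAY_MIN / deploy_min         # upper bound on deploys/day
    print(f"{deploy_min:>2} min deploy -> ~{batch:.0f} commits/batch, "
          f"at most {deploys:.0f} deploys/day")
```

Only a 3-minute deploy gets the batch down to a single commit, which is why build speed can never stop improving.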
Why small batches? Fewer changes in a given batch means faster time to find the root cause when a deploy causes problems. It forces optimization of the release process and gives a higher chance of a clean rollback. The #1 reason: making developers feel invested in the deploy process.
Locutus receives a GitHub webhook on each push. It pulls the new source, builds a container image, and pushes it to our Docker registry. A few levels of caching make builds faster and deploys smaller.
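A rough sketch of what a Locutus-style build step might look like, assuming a Flask webhook endpoint and the docker CLI; the registry address and cache tag are assumptions, not Locutus internals:

```python
# Minimal webhook-to-registry build loop: clone the pushed commit, build an
# image reusing cached layers, push it to an internal registry.
import subprocess
from flask import Flask, request

app = Flask(__name__)
REGISTRY = "registry.example.com"  # assumed internal registry address

@app.route("/webhook", methods=["POST"])
def on_push():
    payload = request.get_json()
    repo = payload["repository"]["name"]
    sha = payload["after"]  # the commit that was just pushed

    workdir = f"/tmp/{repo}-{sha}"
    subprocess.run(["git", "clone", "--depth", "1",
                    payload["repository"]["clone_url"], workdir], check=True)
    subprocess.run(["git", "-C", workdir, "checkout", sha], check=True)

    image = f"{REGISTRY}/{repo}:{sha[:12]}"
    # Reuse the previous image's layers so only changed layers are
    # rebuilt and pushed -- this is what keeps builds fast and deploys small.
    subprocess.run(["docker", "build",
                    "--cache-from", f"{REGISTRY}/{repo}:latest",
                    "-t", image, workdir], check=True)
    subprocess.run(["docker", "push", image], check=True)
    return "", 204
```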
Tests run in parallel on our own EC2 boxes. Agents pull tests from a Redis queue; Ruby tests plus browser tests run with Selenium/Chrome.
102 c4.8xlarge VMs | 1472 peak agents | 45k tests/build
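A minimal sketch of an agent pulling from that kind of Redis queue, assuming redis-py; the queue names and host are assumptions:

```python
# Each CI agent loops: pop a test file from a shared Redis list, run it,
# report the result. Parallelism comes from running many agents at once.
import subprocess
import redis

r = redis.Redis(host="ci-redis.internal")  # assumed queue host

def run_agent():
    while True:
        # BLPOP blocks until a test file is available; time out to exit
        # cleanly once the queue is drained.
        item = r.blpop("tests:pending", timeout=30)
        if item is None:
            break
        _, test_file = item
        result = subprocess.run(["bundle", "exec", "rspec", test_file.decode()])
        outcome = "tests:passed" if result.returncode == 0 else "tests:failed"
        r.rpush(outcome, test_file)

if __name__ == "__main__":
    run_agent()
```

Because agents pull work instead of having it pushed to them, adding hardware scales throughput almost linearly, which is why tests are the easy part of keeping builds fast.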
A deploy tells each node which container to run. Each node fetches its container from the Docker registry through a local in-DC caching proxy, and containers are restarted using sv-rollout / runit, starting on each node ready to run.
40 peak deploys/day | 289 machines deployed | 1500+ production containers
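A sketch of the per-node step under those assumptions; the proxy address, state file path, and service name are hypothetical:

```python
# Pull the new image through the local caching proxy, then let runit
# restart the service so it comes back up on the new container.
import subprocess

PROXY_REGISTRY = "docker-cache.dc1.internal"  # assumed per-DC caching proxy

def deploy_on_node(repo: str, sha: str, service: str = "app") -> None:
    image = f"{PROXY_REGISTRY}/{repo}:{sha[:12]}"
    # The pull hits the local proxy first, so only layers the DC has never
    # seen cross the WAN.
    subprocess.run(["docker", "pull", image], check=True)
    # Record the new image where the runit run script reads it (assumed path).
    with open(f"/etc/{service}/current_image", "w") as f:
        f.write(image + "\n")
    # runit's sv restarts the service; its run script execs `docker run`
    # against whatever image the state file names.
    subprocess.run(["sv", "restart", service], check=True)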
Docker has to be used carefully: it can be flaky, so deploys need to be fault tolerant, retry, and use a canary container. We use sv-rollout for container rollout, with canaries and parallel deploys. Docker image caching close to the production machines is critical.
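A sketch of an sv-rollout-style orchestration loop (not the real tool): one canary node first, then the rest in parallel, with retries to absorb Docker flakiness. remote_deploy() and healthy() are placeholders for the real per-node deploy and health check:

```python
import time
from concurrent.futures import ThreadPoolExecutor

def remote_deploy(host: str, image: str) -> None:
    # Placeholder: in practice, reach the node and run its pull/restart step.
    raise NotImplementedError

def healthy(host: str) -> bool:
    # Placeholder: in practice, hit the node's health-check endpoint.
    return True

def deploy_node(host: str, image: str, retries: int = 3) -> bool:
    for attempt in range(retries):
        try:
            remote_deploy(host, image)
            return True
        except Exception:
            time.sleep(2 ** attempt)  # back off; docker pulls can be flaky
    return False

def rollout(nodes: list[str], image: str, parallelism: int = 25) -> None:
    # Canary first: one node takes the new container and must pass a
    # health check before anyone else gets it.
    canary, rest = nodes[0], nodes[1:]
    if not (deploy_node(canary, image) and healthy(canary)):
        raise RuntimeError("canary failed, aborting rollout")
    # Then fan out in parallel to keep total deploy time low.
    with ThreadPoolExecutor(max_workers=parallelism) as pool:
        results = list(pool.map(lambda h: deploy_node(h, image), rest))
    failed = [h for h, ok in zip(rest, results) if not ok]
    if failed:
        raise RuntimeError(f"rollout failed on: {failed}")
```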
Fast deploys matter more and more as a team scales up. The Shopify deploy pipeline is heavily optimized for speed to keep deploy batches small. Tests tend to scale well with a lot of parallelism and hardware thrown at them. Fast container build and deployment is a major challenge and requires careful optimization.