Fearless Deployment

Presented on 2016/06/28 at OpenCommerceConf.org by Sean Schofield and Richard Lister (Spree Commerce).

Ric Lister

June 28, 2016

Transcript

  1. The “Real World”
     • Differences between staging and production
     • Volume of data
     • Nature of data
     • Missing configuration
  2. Instability
     • Deployments cause most of the problems that impact customers
     • Code being deployed as well as the deployment itself
     • Risk increases over time
     • External sources of instability
  3. Going slow
     • Speed of development
       ◦ We don’t want stability at the expense of speed
       ◦ Whatever solution we come up with it will just slow us down
     • Intervals between deployments
       ◦ The longer we go between deploys, the more worried we are about the next one
       ◦ Migrations are more likely to fail
       ◦ We’re only making the problem worse by delaying our deployments
  4. Embracing the “Real World”
     • Two things keep us separated from the “Real World”
       ◦ Application behavior
       ◦ User behavior
     • Let’s figure out a way to eliminate those differences
     • No more surprises when we deploy!
  5. Use the stacks to go live
     • Each release is done as a self-contained “stack”
     • No more staging environment
     • No more RAILS_ENV
     • Think release candidate for your infrastructure
     • No more surprises based on real world data
  6. Stop separating the test data
     • DynamoDB is designed for massive amounts of data
     • Test data and live customer data can peacefully co-exist
     • Use a test attribute to identify our test records
     • Everything lives together in a single database!
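The test attribute idea on this slide can be sketched in plain Ruby. This is an illustration under assumptions, not the talk's code: the attribute name `test`, the sample records, and the Array standing in for a DynamoDB table are all hypothetical.

```ruby
# Hypothetical records; an Array stands in for a DynamoDB table.
# The "test" attribute name is an assumption made for this sketch.
RECORDS = [
  { "store_id" => "s1", "object_id" => "order-1", "object_value" => "{}", "test" => true },
  { "store_id" => "s1", "object_id" => "order-2", "object_value" => "{}" },
  { "store_id" => "s2", "object_id" => "order-3", "object_value" => "{}", "test" => false }
].freeze

# True when the record is flagged as test data; a missing flag means live data.
def test_record?(record)
  record["test"] == true
end

# Split records so that anything customer-facing can ignore test data.
def partition_by_test_flag(records)
  records.partition { |record| test_record?(record) }
end

test_records, live_records = partition_by_test_flag(RECORDS)
puts "test: #{test_records.size}, live: #{live_records.size}"
```

Treating a missing flag as live data is a deliberate choice in this sketch: existing customer records need no backfill, and only test writers have to set the attribute.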
  7. Stop using ActiveRecord
     • Learned things the hard way with Spree
     • Really slow when doing a lot of writes
     • Use Plain Old Ruby Objects (PORO) instead
     • All of our tables have the same structure
       ◦ store_id
       ◦ object_id
       ◦ object_value
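A minimal sketch of a PORO that maps onto the three-attribute table layout might look like the following. The `Product` class, its fields, and the choice to store `object_value` as JSON are assumptions for illustration; the slides do not say how the value is encoded.

```ruby
require "json"

# A plain Ruby object with no ActiveRecord dependency.
# Class and attribute names are hypothetical.
class Product
  attr_reader :store_id, :id, :name, :price_cents

  def initialize(store_id:, id:, name:, price_cents:)
    @store_id = store_id
    @id = id
    @name = name
    @price_cents = price_cents
  end

  # Flatten into the store_id / object_id / object_value layout.
  # The Ruby attribute is called `id` because Object#object_id already
  # exists in Ruby and should not be overridden.
  def to_item
    {
      "store_id" => store_id,
      "object_id" => id,
      # Assumption: the value is serialized as JSON.
      "object_value" => JSON.generate("name" => name, "price_cents" => price_cents)
    }
  end

  # Rebuild the object from a stored item.
  def self.from_item(item)
    value = JSON.parse(item.fetch("object_value"))
    new(store_id: item.fetch("store_id"), id: item.fetch("object_id"),
        name: value.fetch("name"), price_cents: value.fetch("price_cents"))
  end
end

product = Product.new(store_id: "s1", id: "product-42", name: "Mug", price_cents: 1200)
item = product.to_item
copy = Product.from_item(item)
puts copy.name
```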
  8. Protect the real world data
     • No database write access for developers
     • Only the store owner can change their own data
     • No super admin
     • Impossible for developers to change data while testing
     • Ensure no real world side effects whenever we write data
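The ownership rule can be illustrated with a small guard. The slides do not say where this is enforced, so treat the following as a sketch only: `Actor`, `WriteNotAllowed`, and `authorize_write!` are hypothetical names, and the real check may live in access policy rather than application code.

```ruby
# Raised when anyone other than the owning store tries to write.
class WriteNotAllowed < StandardError; end

# Hypothetical actor; there is deliberately no "super admin" role.
Actor = Struct.new(:store_id)

# Allow the write only when the actor owns the item's store.
def authorize_write!(actor, item)
  return item if actor.store_id == item.fetch("store_id")

  raise WriteNotAllowed,
        "store #{actor.store_id} cannot write to store #{item.fetch('store_id')}"
end

owner = Actor.new("s1")
other = Actor.new("s2")
item = { "store_id" => "s1", "object_id" => "order-1", "object_value" => "{}" }

authorize_write!(owner, item) # passes silently
begin
  authorize_write!(other, item)
rescue WriteNotAllowed => e
  puts e.message
end
```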
  9. Complete copy of the database
     • Every stack has a complete database copy
     • Migrations are performed at the same time as copy
     • Shoryuken workers for multi-threaded processing
     • We can copy 500,000 records in under ten minutes
  10. Sync changes after the copy
     • Track changes since our bulk copy
     • DynamoDB streams to monitor these changes
     • New data is continuously migrated
     • Same migration logic as with bulk copy
     • No more migrations on release day!
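The point that bulk copy and continuous sync share one migration can be sketched with in-memory hashes. Everything here is assumed for illustration: the `MIGRATE` lambda, the field rename it performs, and the event shape, which is only loosely modelled on the INSERT, MODIFY, and REMOVE event names in DynamoDB Streams.

```ruby
# Hypothetical migration: rename a field inside each item.
MIGRATE = lambda do |item|
  migrated = item.dup
  migrated["title"] = migrated.delete("name") if migrated.key?("name")
  migrated
end

# Bulk copy: migrate every item while copying it into the new stack's table.
def bulk_copy(source, target, migrate)
  source.each { |key, item| target[key] = migrate.call(item) }
end

# Sync: replay changes recorded after the copy through the same migration.
def apply_changes(events, target, migrate)
  events.each do |event|
    case event[:type]
    when :insert, :modify then target[event[:key]] = migrate.call(event[:item])
    when :remove then target.delete(event[:key])
    end
  end
end

production = { "a" => { "name" => "Mug" }, "b" => { "name" => "Cap" } }
new_stack = {}
bulk_copy(production, new_stack, MIGRATE)

changes = [
  { type: :modify, key: "a", item: { "name" => "Big Mug" } },
  { type: :remove, key: "b" },
  { type: :insert, key: "c", item: { "name" => "Pen" } }
]
apply_changes(changes, new_stack, MIGRATE)
p new_stack
```

Because both paths call the same `migrate` object, a record that arrives through the stream ends up in the same shape as one that arrived in the bulk copy.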
  11. Ops Code as First Class Citizen
     • Infrastructure must be change-controlled and repeatable
     • Operations source code is in the same git repo as application code
     • Every release is tracked as a single SHA in GitHub
     • Check out a SHA to get a fully self-contained ops+app setup
     • We use AWS CloudFormation templates to describe all resources
  12. The stack contains everything we need
     • Networking
     • Load-balancers
     • Auto-scaling groups
     • Instance config
     • Permissions
     • Database
  13. Docker Containers
     • Provide a runnable application artifact
     • Dependency management
       ◦ System libraries
       ◦ Ruby + Gems
       ◦ Application code
  14. Docker Decouples Application from OS
     • Protect against changes in the underlying OS, which just provides:
       ◦ Kernel
       ◦ Docker daemon
       ◦ Systemd, to start containers
     • We are safer making OS updates
       ◦ Updates to system libraries do not affect application
  15. Amazon Machine Image
     • AMI provides a runnable server artifact
       ◦ We get the same artifact every time
     • What if the Docker repository goes down?
       ◦ Create AMI with Packer and bake in all Docker images
       ◦ We’re happy to trade AMI build time for stability
     • What if GitHub or RubyGems are down?
       ◦ Instance needs no external information to start app
  16. Auto Scaling
     • Stop caring about individual instances
     • Autoscaling replaces failed instances
     • We trust replacement because we do it all the time
     • Cope easily with changing load
  17. Release Procedure
     • Tag branch in git
     • Build docker container
     • Build AMI
     • Create stack
     • Copy data from production
     • Sync new data from production
     • Test, test, test
     • Update DNS
     • Delete old stack
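The ordering of the release procedure matters: DNS should only move once every earlier step has succeeded. A rough sketch of that idea follows. The step names come from the slide, while `run_release` and the `runner` callable are hypothetical stand-ins for real calls to git, Docker, Packer, and AWS.

```ruby
# Step names follow the slide; a real pipeline would attach actual
# commands to each of them.
RELEASE_STEPS = [
  "Tag branch in git",
  "Build docker container",
  "Build AMI",
  "Create stack",
  "Copy data from production",
  "Sync new data from production",
  "Test, test, test",
  "Update DNS",
  "Delete old stack"
].freeze

# Run steps in order and stop at the first failure, so DNS is never
# switched to a stack that failed an earlier step.
def run_release(steps, runner)
  completed = []
  steps.each do |step|
    break unless runner.call(step)

    completed << step
  end
  completed
end

# Simulate a run in which testing fails on the new stack.
failing_tests = ->(step) { step != "Test, test, test" }
done = run_release(RELEASE_STEPS, failing_tests)
puts done
```

In the simulated run the old stack keeps serving traffic, because neither "Update DNS" nor "Delete old stack" is reached.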
  18. Immutable once we go live
     • New releases require a new stack
     • Emergency hotfixes require a new AMI
     • Instances are replaced, not modified
     • Once deployed nothing can be changed
     • There is no SSH
  19. Continuous Deployment for Developers
     • We deploy many times a day - just not to production
       ◦ Devs get a stack for each feature branch, with a full copy of production data
       ◦ Go crazy, break things, it will be entirely deleted when done
     • Docker lets us build image fast
       ◦ We don’t want to wait for a brand new AMI with each commit
       ◦ Write Dockerfile to use caching in a smart way
     • Dev stacks can be deployed by just replacing docker image
  20. Argus for Fast Docker Builds
     • Enqueue docker builds using SQS
     • Distributed workers for fast builds
     • Workers pre-pull existing image layers
     • This means all workers can use docker cache
     • Pushes image to AWS EC2 Container Registry
     • github.com/rlister/argus
  21. Developer Deploys Are Fast
     • If the bundle is cached, docker build takes about 15 seconds
     • AWS SSM Run Command runs a canned script
     • Simply pulls latest docker image and restarts container
     • Access is controlled with IAM
     • Logs are in logstash
  22. Summary
     • All infrastructure and code is in the stack
     • The stack is immutable
     • We use stacks instead of having a special staging environment
     • We use a complete copy of real world data in our stacks
     • We’re constantly deploying - just not to production
     • Production deploys are just updating the DNS to the new stack
  23. Resources
     • github.com/solnic/virtus - Ruby library for PORO
     • github.com/phstc/shoryuken - asynchronous Ruby workers with SQS
     • github.com/rlister/argus - fast Docker build and push to ECR
     • github.com/rlister/awful - Ruby library for common stack operations
     • github.com/seanedwards/cfer - Ruby DSL for CloudFormation templates
     • 12factor.net - guidelines for stateless software as a service