• It's a really complicated graph of vendors, menus, menu items, schedules, recurring events, and orders for multiple products. • It's logistics. • It's curated, it's distributed. • We send a lot of email. • It's a stock market model and needs to be scalable and reliable. • If you don't get lunch, you're going to be pissed off.
adept at lots of administration tasks and is largely topology-dependent. • Hey look, OpsWorks, I know Chef, it sounds like a good idea. • Fast forward a few months...
I have lots of systems automation experience; it's my primary field. • I've done Chef consulting for world-class organizations that specialize in Chef consulting.
auto-scaling ability. 25 minutes. • QA gets nothing done other than wearing out mice and trackpads. No real automation without writing Chef recipes for each case. • Six hours a day of my own time should not have been spent on deployment support. • CI had no bearing on anything other than having to wear the fez of shame. • We needed a Godzilla plan.
it. All these people talking... • "Everything sucks, but it looks like this sucks less" - Alex B. • Large organizations with complex problems • Almost all cases were logical progressions where people had built the majority of the components themselves and then realized that what they'd built was, in effect, Docker.
revision control. • Caches build steps for speed. • Use the same container with the same code for automated tests, QA tests, staging, production, and demos.
Docker build starts from a base image; you add your app and end up with an image. • Tag the image and host it in a registry, public or private. • Pull and run it as required.
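A minimal sketch of that build/tag/ship/run cycle; the image name (myapp) and registry host (registry.example.com) are placeholders, not values from the talk:

```bash
# Build an image from the Dockerfile in the current repo
docker build -t myapp:1.0.0 .

# Tag it for a registry (public or private) and push it
docker tag myapp:1.0.0 registry.example.com/myapp:1.0.0
docker push registry.example.com/myapp:1.0.0

# On any host that needs it: pull and run
docker pull registry.example.com/myapp:1.0.0
docker run -d -p 8080:80 --name myapp registry.example.com/myapp:1.0.0
```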
GitHub hook hits Docker.io account • Docker autobuilds your container. • Upon successful container build, a callback hits Ansible Tower. • Tower deploys your image and runs it with tests as an argument. • If the test exit status is 0, Ansible continues with its playbook.
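One way the "run it with tests as an argument" gate could look, assuming a Rails test suite; the image name and test command are assumptions:

```bash
# docker run exits with the status of the command it ran,
# so the playbook can branch on it directly.
docker pull registry.example.com/myapp:staging

if docker run --rm registry.example.com/myapp:staging bundle exec rake test; then
  echo "tests passed, playbook continues"
else
  echo "tests failed, playbook stops"
  exit 1
fi
```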
machines and pulls your new staging image. • Half your instances are removed from your load balancer. • Stop old staging build containers. • Start new ones. • If the new ones start, Ansible continues and runs migrations
the load balancer and remove the old ones. • Continue to upgrade the old instances; if successful, add them back to the load balancer. • If any single instance fails, Ansible will not add it back to the load balancer.
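Per instance, the rolling swap above amounts to roughly the following (in the talk this is orchestrated by an Ansible playbook, not a shell script; the load balancer name, instance id, image, and health-check path are placeholders):

```bash
# 1. Pull the new staging image on the target host
docker pull registry.example.com/myapp:staging

# 2. Take this instance out of the (classic ELB) load balancer
aws elb deregister-instances-from-load-balancer \
  --load-balancer-name staging-lb --instances i-0123456789abcdef0

# 3. Stop the old container and start the new one
docker stop myapp && docker rm myapp
docker run -d -p 80:80 --name myapp registry.example.com/myapp:staging

# 4. Run migrations once the new container is up
docker run --rm registry.example.com/myapp:staging bundle exec rake db:migrate

# 5. Health-check; only re-register the instance if everything succeeded
curl -fsS http://localhost/healthz && \
  aws elb register-instances-with-load-balancer \
    --load-balancer-name staging-lb --instances i-0123456789abcdef0
```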
your container and run tests for all branches • If tests are green, Jenkins executes the Ansible playbook • If that succeeds, the following steps run... • Said playbook tags the container with the feature branch name and pushes it to a Docker registry • Starts an EC2 instance from an AMI with Docker preinstalled. • Launches your Docker containers on that host with the appropriate tags • Updates Route53 DNS: feature-branch.dev.example.com • Ansible queues an email to QA with the new URL: hey, check this out!
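Roughly what those branch-deploy steps could look like from the playbook's point of view; the branch name, image, hosted zone id, and change batch file are illustrative, not the talk's actual values:

```bash
BRANCH=feature-branch

# Tag the tested image with the branch name and push it
docker tag myapp:latest registry.example.com/myapp:${BRANCH}
docker push registry.example.com/myapp:${BRANCH}

# On the freshly launched Docker-enabled EC2 host: run the branch build
docker run -d -p 80:80 --name "myapp-${BRANCH}" registry.example.com/myapp:${BRANCH}

# Point feature-branch.dev.example.com at that host
# (the record change itself lives in change.json)
aws route53 change-resource-record-sets \
  --hosted-zone-id Z123EXAMPLE --change-batch file://change.json
```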
exposure, ambassadors handle ports. • Git push with a release tag • Triggers a container build on Jenkins; if tests are green, the image is pushed to an S3-backed private registry • On build success, knife environment from file runs against a file in the repo to update the release tag • Trigger 1000's of chef-client updates, or wait for the cron run. • 1000's of nodes pull the new image, with the load deferred to S3
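The Chef half of that release might look like this; the file name, search query, and image tag are assumptions:

```bash
# Update the environment that carries the release tag from the file in the repo
knife environment from file environments/production.json

# Either wait for the chef-client cron run, or push the change out now
knife ssh 'chef_environment:production' 'sudo chef-client'

# Each node's recipe then pulls the tagged image; the registry is S3-backed,
# so the pull traffic lands on S3 rather than a single registry host
docker pull registry.example.com/myapp:v1.2.3
```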
the ambassador and nginx/rails/old release are still running; start the new release container • Restart the ambassador to point to the new nginx/rails release. • If you're happy, kill the old release. If you're not, restart the ambassadors to point back to the old release and kill the new one. • Try with HAProxy for actual zero downtime.
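A sketch of that cut-over using linked containers; the container names, release tags, and ambassador image are made up, and re-creating the ambassador stands in for "restart it pointing at the new release":

```bash
# Old release (myapp-v1) keeps serving while the new one starts alongside it
docker run -d --name myapp-v2 registry.example.com/myapp:v2

# Re-create the ambassador so its link points at the new release
docker stop ambassador && docker rm ambassador
docker run -d --name ambassador -p 80:80 --link myapp-v2:app ambassador-image

# Happy? Kill the old release.
docker stop myapp-v1 && docker rm myapp-v1

# Not happy? Re-create the ambassador with --link myapp-v1:app and kill myapp-v2.
```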
a dark art. Read the source, read the reference. • Read all the docs, then read them again. • Come to know, initially hate, then love Phusion's docker image. • Use an OS that has AppArmor/SELinux.
(includes AMI boot) • Only asset changes: 11 minutes • Asset + Gemfile changes: 18 minutes • As many parallel builds as we want • All results include push to the Docker registry
(20+ minutes on OpsWorks) including instance creation. • 30-second staging deploys • Cross-cloud deploys just as fast. • 3 deploys a day used to cost 3 x 1.5 hours x multiple team members. Now it's 3 x 15 minutes.
and dev environments that are clean and potentially work offline • Über-fast CI: split your tests across X instances (see the sketch below). • Process isolation and optimization = far cheaper EC2 instances per Rails app (Large -> medium)
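One possible way to split the suite across N containers, assuming RSpec and a simple round-robin chunking; none of the names or paths here are from the talk:

```bash
N=4
for i in $(seq 0 $((N - 1))); do
  # Give each container every Nth spec file
  FILES=$(find spec -name '*_spec.rb' | sort | awk -v n="$N" -v i="$i" 'NR % n == i')
  docker run -d --name "ci-chunk-$i" registry.example.com/myapp:ci \
    bundle exec rspec $FILES
done

# docker wait blocks until each chunk finishes and prints its exit code
for i in $(seq 0 $((N - 1))); do
  docker wait "ci-chunk-$i"
done
```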