This Talk • Saltside: A Curious Case • Our production infrastructure • Why we are the way we are (for better & worse) • Saltside’s future • How you can make a better future (for yourself and your team)
Quick Stats • 2 Product Development Offices • Hundreds of global employees, ~50 in PD • 1 Product, 4 Markets • 2 AWS Regions • SOA; 15+ production services • 5 production languages • ~300 production containers • Mix of cross-functional teams; 4 people on the SRE team (pssst. We need SREs!)
Q3 2014 • Rewrite of the entire product and infrastructure from scratch begins (never do this; no, seriously, never do this!) • Docker 1.0 released • Introduce market-specific infrastructure • Shift to SOA • I am the quasi-architect for the new system
Why Docker? • Give engineers the freedom to choose the best stack for the problem • Infrastructure standardization • Dev & production parity: the development environment is make, docker, docker-compose, and a few other tools
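To make the parity point concrete, here's a minimal sketch of that developer loop (the `app` service name and the `script/test` entrypoint are hypothetical, not Saltside's actual setup):

```sh
# Hypothetical developer workflow: make wraps docker-compose, so the only
# host-level dependencies are make, docker, and docker-compose.
docker-compose up -d                       # boot the service and its dependencies locally
docker-compose run --rm app script/test   # run tests inside the same image that ships
docker-compose logs -f app                 # tail the app container, just like production
```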
Rolling Our Own: Apollo • Configuration file driven (similar to docker-compose) • Sets horizontal & vertical scales per market/container • It’s dynamic CloudFormation all the way down • One EC2 instance runs one Docker container • Every EC2 instance is behind an ELB • Zero-downtime deploys via HAProxy • Instances poll S3 for which image/tag to use; swap containers where appropriate
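A rough sketch of that polling mechanism (the bucket layout, registry, service name, and port are all hypothetical; only the "poll S3, swap container on change" idea comes from the slide):

```sh
#!/usr/bin/env bash
# Hypothetical per-instance poll loop, illustrating the Apollo mechanism.
set -euo pipefail

SERVICE=foo-service                                 # hypothetical service name
DESIRED_URI="s3://apollo-deploys/${SERVICE}/tag"    # hypothetical bucket/key

while true; do
  desired=$(aws s3 cp "$DESIRED_URI" -)             # tag currently published for us
  running=$(docker inspect --format '{{.Config.Image}}' "$SERVICE" 2>/dev/null || true)

  if [ "registry.example.com/${SERVICE}:${desired}" != "$running" ]; then
    docker pull "registry.example.com/${SERVICE}:${desired}"
    docker rm -f "$SERVICE" 2>/dev/null || true
    docker run -d --name "$SERVICE" -p 8080:8080 \
      "registry.example.com/${SERVICE}:${desired}"
    # HAProxy in front handles the zero-downtime cutover; omitted here.
  fi
  sleep 30
done
```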
2 Years Later • 1 container per 1 EC2 instance is damn expensive • Things are stable and work • No blue/green, canary, or alternate deployment strategies • No indication of deploy status (started, failed, rollout percentage, etc.) • Time to bootstrap new services = Days • Time to teach new employees Apollo = Months • Sunset
Kubernetes PoC • One cluster per region; multiple markets per cluster • Decrease costs • Increase reliability • Move to a maintained, active open source project instead of end-of-life’d private internal tools • Increase velocity
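The talk doesn't spell out how markets share a cluster; one plausible sketch (one namespace per market is my assumption, and the market and service names are made up):

```sh
# Assumed layout: one cluster per AWS region, one namespace per market.
kubectl create namespace market-a
kubectl create namespace market-b

# Deploy the same manifest into each market, varying only tag and scale:
kubectl --namespace market-a apply -f foo-service.yaml
kubectl --namespace market-a scale deployment foo-service --replicas=4
```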
Issues • Can’t do this with Apollo; it only supports fixed environment names, and it’s too slow and expensive • How to give client developers a functioning API platform to develop against? • How to give QA access to N different service configurations? • How to give engineers a place to experiment outside production? • How to save everyone from configuring 15+ services?
sandbox CLI • sandbox create # create a new environment • sandbox sync # pull image tags from production • sandbox reset # wipe data and boot everything • sandbox logs # get logs from containers • sandbox tag foo-service bar # change tags • sandbox dev foo-service # build images locally
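Put together, a typical session might run like this (the ordering and comments are illustrative; the commands themselves are the ones listed above):

```sh
sandbox create                # spin up a fresh environment
sandbox sync                  # pull the image tags currently in production
sandbox tag foo-service bar   # swap foo-service over to the "bar" tag
sandbox reset                 # wipe data and boot everything clean
sandbox logs                  # tail container logs while testing
sandbox dev foo-service       # iterate on foo-service with locally built images
```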
Takeaways • Prefer Kubernetes/Mesos/DCOS instead of rolling your own • Prefer docker-compose over manual docker commands • Google Container Engine is the easiest way to get production-ready container infrastructure • Package up your distributed application instead of making engineers manage it themselves • Include log and metrics systems in your budgets from t-zero • Create deployment APIs instead of CLIs • Distribute internal tools as Docker images • Prefer containerized workflows over host-level dependencies • Prefer one Dockerfile per project • Prefer the official Docker images over internally maintained base images
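As one example of the "internal tools as Docker images" takeaway, a small wrapper like this keeps the host dependency-free beyond Docker itself (the image name and mount flags are hypothetical):

```sh
# Hypothetical wrapper for an internal CLI shipped as an image:
# the host needs Docker and nothing else.
docker run --rm -it \
  --volume "$PWD":/work --workdir /work \
  registry.example.com/tools/sandbox:latest "$@"
```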