Origin Story
• Fifteen years of operations experience
• Observability, infrastructure architecture
• Operations/Systems/Software engineer
Opsee
• Effortless monitoring for your AWS environment
• Reacts to changes in your infrastructure
• Monitoring that stays out of the way
Infrastructure: What do we want?
• As little code as possible
• Reproducibility
• Easy engineer onboarding
• Automate all the things
Development Process
• Open PR
• Run build against PR
• Merge to master
• Build master and publish artifacts
• Deploy
Reproducibility
• If it builds in CI, it should build locally.
• CI should not be a magical, mystical place that makes tests pass.
• In case of emergency, break glass and deploy from my laptop to production with confidence.
I kind of like Docker.
Docker?
• "Docker is great for local dev."
• "I don't do anything in production with Docker yet."
• "I'm not sure Docker is ready for production."
Really?
One year of Docker and ECS later…
• A year of Docker in production
• Started with CoreOS's Fleet
• Migrated to ECS a year ago
Why Fleet?
• It was there (we use CoreOS)
• Trivial to set up and use (already running)
• It did a thing (deployed services to VMs)
What could go wrong?
Highly problematic.
• Etcd outage -> cannot deploy
• Etcd and Docker share the same disk by default
• Systemd
• (Very) Bad default configurations
How do I even Docker?
• Container fills up /var/lib/docker
• Node crashes
• Fleet's naïve scheduling reschedules those containers onto another node
• Next node crashes…
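A node-level disk check is one way to break that cascade before /var/lib/docker fills. This is a minimal Python sketch, not something from the talk: the 85% threshold is a hypothetical value, and "used" here means "not free" as reported by shutil.disk_usage.

```python
import shutil

# Hypothetical threshold: report pressure once the Docker data disk
# is more than 85% full. Tune this for your own hosts.
DEFAULT_MAX_USED_FRACTION = 0.85


def used_fraction(total_bytes: int, free_bytes: int) -> float:
    """Return the fraction of the filesystem that is not free (0.0 to 1.0)."""
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    return (total_bytes - free_bytes) / total_bytes


def has_disk_pressure(path: str = "/var/lib/docker",
                      max_used_fraction: float = DEFAULT_MAX_USED_FRACTION) -> bool:
    """True if the filesystem holding `path` is fuller than the threshold."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free
    return used_fraction(usage.total, usage.free) > max_used_fraction
```

Wired into a health check, a True result can mark the node unhealthy so nothing new lands there, instead of letting the scheduler push the same runaway container onto the next node.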
Ffffffffffffffleet
We could fix Fleet, but…
• Seed-stage startup
• Must focus solely on our own product
• Just Make it Work Mode
What are we trying to do here?
Make Building and Deploying Easy
• Commit
• Push
• Merge
• Deploy
Make building and deploying easy.
• Commit
• Push
• Merge
• Done.
What did we use?
• Container scheduler - ECS
• Continuous integration - CircleCI
• Automated deploys - Ansible
Allons-y! (Let's go!)
How do we ECS?
• Deploy ecs-agent with CoreOS cloud-config
• Service and task definitions in YAML w/ Ansible
• Ansible playbook to deploy task definitions / trigger service updates
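Keeping definitions in YAML means something has to translate them into the shape the ECS API expects. This sketch assumes the YAML has already been loaded into a dict (for example with PyYAML's yaml.safe_load); the short schema keys (containers, ports, env) and the 256 MiB default are hypothetical, while the output keys are the real RegisterTaskDefinition field names.

```python
from typing import Any, Dict, List


def to_container_definition(c: Dict[str, Any]) -> Dict[str, Any]:
    """Map one container from a hypothetical YAML schema to ECS's shape."""
    return {
        "name": c["name"],
        "image": c["image"],
        "memory": c.get("memory", 256),  # hard memory limit in MiB (hypothetical default)
        "essential": c.get("essential", True),
        "portMappings": [{"containerPort": p, "protocol": "tcp"}
                         for p in c.get("ports", [])],
        # ECS wants environment variables as a list of name/value pairs.
        "environment": [{"name": k, "value": str(v)}
                        for k, v in sorted(c.get("env", {}).items())],
    }


def task_definition_kwargs(spec: Dict[str, Any]) -> Dict[str, Any]:
    """Build keyword arguments for the ECS RegisterTaskDefinition call."""
    containers: List[Dict[str, Any]] = [
        to_container_definition(c) for c in spec["containers"]
    ]
    return {"family": spec["family"], "containerDefinitions": containers}
```

The result can be passed straight to a boto3 ECS client as register_task_definition(**kwargs).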
ECS on CoreOS w/ cloud-config
• Cloud-config w/ EC2 instance userdata
• Add a systemd unit and configuration data
• We do this with Ansible
• https://coreos.com/os/docs/latest/booting-on-ecs.html
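The cloud-config step can be sketched as a small renderer for the EC2 user data. The systemd unit below is abbreviated and only loosely modeled on the CoreOS guide, so check the guide for the full flag list. The cluster name and agent version are parameters you supply; requiring an explicit agent version rather than defaulting to latest is our suggestion, not the guide's.

```python
import textwrap

# Abbreviated unit for the ECS agent container (see the CoreOS guide for
# the complete set of volumes and environment variables).
UNIT_TEMPLATE = """\
[Unit]
Description=AWS ECS Agent
Requires=docker.socket
After=docker.socket

[Service]
Restart=on-failure
RestartSec=30
ExecStartPre=-/bin/mkdir -p /var/log/ecs /var/ecs-data
ExecStartPre=-/usr/bin/docker rm -f ecs-agent
ExecStart=/usr/bin/docker run --name ecs-agent \\
  --volume=/var/run/docker.sock:/var/run/docker.sock \\
  --volume=/var/log/ecs:/log \\
  --volume=/var/ecs-data:/data \\
  --env=ECS_LOGFILE=/log/ecs-agent.log \\
  --env=ECS_DATADIR=/data \\
  --env=ECS_CLUSTER={cluster} \\
  amazon/amazon-ecs-agent:{agent_version}
"""


def render_cloud_config(cluster: str, agent_version: str) -> str:
    """Return CoreOS cloud-config user data that starts the ECS agent."""
    unit = UNIT_TEMPLATE.format(cluster=cluster, agent_version=agent_version)
    header = (
        "#cloud-config\n\n"
        "coreos:\n"
        "  units:\n"
        "    - name: amazon-ecs-agent.service\n"
        "      command: start\n"
        "      content: |\n"
    )
    # Indent the unit so it sits inside the YAML literal block scalar.
    return header + textwrap.indent(unit, " " * 8)
```

The rendered string goes into the instance's user data (for example the UserData of a launch configuration), which is the part the talk automates with Ansible.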
ECS and Ansible
• Task and service definitions in YAML
• Small Ansible libraries using boto3 for task/service definitions
• Deploy with a simple playbook that creates a task definition and updates the service
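The core of such a library is two ECS calls: register a new task definition revision, then point the service at it. This is a hedged sketch rather than the talk's actual module; the client is passed in so the logic can run without AWS credentials, and in practice it would be boto3.client("ecs").

```python
from typing import Any, Dict


def deploy(ecs: Any, cluster: str, service: str, task_def: Dict[str, Any]) -> str:
    """Register a new task definition revision and update the service to use it.

    `ecs` is expected to behave like boto3.client("ecs").
    """
    # RegisterTaskDefinition creates a new revision within the family.
    resp = ecs.register_task_definition(**task_def)
    arn = resp["taskDefinition"]["taskDefinitionArn"]
    # UpdateService starts a deployment that replaces the running tasks
    # according to the service's deployment configuration.
    ecs.update_service(cluster=cluster, service=service, taskDefinition=arn)
    return arn
```

Wrapping this in an Ansible module mostly adds argument parsing and changed/unchanged reporting around the same two calls.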
What did we do wrong?
• Didn’t use CloudFormation
• Deployed the 'latest' tag
• Used default configurations
• Poor management of service configuration material
If I could turn back time…
• Tag Docker image w/ Git SHA
• Use CloudFormation: stack all the things
• Remember that defaults are usually bad
• Configuration…
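Tagging images with the Git SHA is easy to automate in CI. A minimal sketch: image_ref rejects anything that is not 7 to 40 lowercase hex characters (so 'latest' cannot slip through), and current_sha asks git for the SHA of HEAD. The repository name in the usage is a placeholder.

```python
import re
import subprocess


def image_ref(repository: str, git_sha: str) -> str:
    """Return an immutable image reference like 'repo:1a2b3c4d5e6f'."""
    if not re.fullmatch(r"[0-9a-f]{7,40}", git_sha):
        raise ValueError(f"not a git SHA: {git_sha!r}")
    return f"{repository}:{git_sha}"


def current_sha(short: int = 12) -> str:
    """Return the abbreviated SHA of HEAD in the current working directory."""
    out = subprocess.run(
        ["git", "rev-parse", f"--short={short}", "HEAD"],
        check=True, capture_output=True, text=True,
    )
    return out.stdout.strip()
```

A build step would then tag and push image_ref("example/api", current_sha()), and the task definition references that exact tag, so rolling back means redeploying an older SHA.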
What about configuration?
• Configuration data stored in S3 with KMS encryption
• Startup script pulls config data from S3
• Sources the config data (all in env vars)
• Starts the service
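The startup flow can be sketched in Python, with one swap: instead of shell-sourcing the file, it parses simple KEY=VALUE lines (no full shell semantics). It assumes server-side encryption (SSE-KMS), where S3 decrypts on GetObject as long as the caller may use the KMS key; the s3 argument is expected to behave like boto3.client("s3").

```python
import os
from typing import Any, Dict, List


def parse_env_file(text: str) -> Dict[str, str]:
    """Parse KEY=VALUE lines (optionally prefixed with 'export') into a dict."""
    env: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks and comments
        if line.startswith("export "):
            line = line[len("export "):]
        key, sep, value = line.partition("=")
        if not sep:
            raise ValueError(f"malformed config line: {raw!r}")
        value = value.strip()
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "'\"":
            value = value[1:-1]  # drop one layer of matching quotes
        env[key.strip()] = value
    return env


def fetch_config(s3: Any, bucket: str, key: str) -> Dict[str, str]:
    """Download the config object from S3 and parse it."""
    obj = s3.get_object(Bucket=bucket, Key=key)
    return parse_env_file(obj["Body"].read().decode("utf-8"))


def exec_service(argv: List[str], config: Dict[str, str]) -> None:
    """Replace this process with the service, config merged into its env."""
    os.execvpe(argv[0], argv, dict(os.environ, **config))
```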
What to do instead?
• Separate container for config.
• Export config file as a volume.
• Mount config volume in service’s container.
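In ECS terms this pattern maps onto a non-essential config container plus volumesFrom on the service container. A hedged sketch of the two container definitions: it assumes the hypothetical config image bakes its files under a path its Dockerfile declares as a VOLUME, and the memory sizes are placeholders.

```python
from typing import Any, Dict, List


def config_sidecar_containers(service_name: str, service_image: str,
                              config_image: str) -> List[Dict[str, Any]]:
    """Container definitions pairing a service with a config-only container."""
    config_name = f"{service_name}-config"
    return [
        {
            # Hypothetical config image whose Dockerfile declares a VOLUME
            # holding the rendered config files.
            "name": config_name,
            "image": config_image,
            "memory": 16,
            # Not essential: it can exit once created without stopping the task.
            "essential": False,
        },
        {
            "name": service_name,
            "image": service_image,
            "memory": 256,
            "essential": True,
            # Mount the config container's volumes, read-only.
            "volumesFrom": [{"sourceContainer": config_name, "readOnly": True}],
        },
    ]
```

Building the config image per deploy and tagging it with the same Git SHA keeps the service and its configuration versioned together.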
Mostly, I am happy… Mostly.
What does ECS do well?
• Easy to deploy
• “Transactional” changes to ECS cluster
• ELB Integration
• Keeps pace with Docker
Let ECS tell you its secrets.
• Task-specific information
  • Exposed ports
• Service information
  • ELB name, service name, desired count
• Metrics
  • CPU utilization, memory utilization
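Most of this comes back from DescribeServices and DescribeTasks. A minimal sketch that pulls the fields above out of one response entry each; the field names are the ECS API's, and the sample data in the usage is made up. Per-service CPU and memory utilization live in CloudWatch under the AWS/ECS namespace, keyed by the ClusterName and ServiceName dimensions.

```python
from typing import Any, Dict, List


def summarize_service(svc: Dict[str, Any]) -> Dict[str, Any]:
    """Extract name, counts and classic ELB names from a DescribeServices entry."""
    return {
        "service": svc["serviceName"],
        "desired": svc["desiredCount"],
        "running": svc.get("runningCount", 0),
        "elbs": [lb["loadBalancerName"] for lb in svc.get("loadBalancers", [])
                 if "loadBalancerName" in lb],
    }


def exposed_ports(task: Dict[str, Any]) -> List[Dict[str, Any]]:
    """List host/container port bindings from a DescribeTasks entry."""
    ports = []
    for container in task.get("containers", []):
        for b in container.get("networkBindings", []):
            ports.append({"container": container["name"],
                          "hostPort": b["hostPort"],
                          "containerPort": b["containerPort"]})
    return ports
```

With a real client, the inputs come from ecs.describe_services(cluster=..., services=[...])["services"] and ecs.describe_tasks(cluster=..., tasks=[...])["tasks"].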
Go a step further.
• CloudWatch -> Lambda -> Autoscale ECS Services
• Compute nodes in ASG
• CloudWatch on cluster utilization to scale up compute ASG
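The CloudWatch -> Lambda -> ECS leg can be sketched as a Lambda handler for an alarm delivered through SNS. Assumptions, all hypothetical: one alarm per direction named "<cluster>/<service>/<up|down>", a fixed step of one task, and the bounds below. It only acts when an alarm fires, so a high-CPU alarm clearing does not immediately scale the service back down.

```python
import json
from typing import Any, Dict, Optional

# Hypothetical policy knobs; tune them per service.
MIN_TASKS = 2
MAX_TASKS = 20
STEP = 1


def next_desired_count(current: int, direction: str) -> int:
    """Move the desired count one step up or down, clamped to the bounds."""
    if direction not in ("up", "down"):
        raise ValueError(f"unknown direction: {direction!r}")
    step = STEP if direction == "up" else -STEP
    return max(MIN_TASKS, min(MAX_TASKS, current + step))


def handler(event: Dict[str, Any], context: Any, ecs: Any = None) -> Optional[int]:
    """Lambda entry point for a CloudWatch alarm delivered via SNS."""
    alarm = json.loads(event["Records"][0]["Sns"]["Message"])
    if alarm["NewStateValue"] != "ALARM":
        return None  # act when an alarm fires, not when it returns to OK
    # Hypothetical naming convention: "<cluster>/<service>/<up|down>".
    cluster, service, direction = alarm["AlarmName"].split("/")
    if ecs is None:
        import boto3  # bundled with the AWS Lambda Python runtime
        ecs = boto3.client("ecs")
    svc = ecs.describe_services(cluster=cluster, services=[service])["services"][0]
    desired = next_desired_count(svc["desiredCount"], direction)
    if desired != svc["desiredCount"]:
        ecs.update_service(cluster=cluster, service=service, desiredCount=desired)
    return desired
```

The compute ASG can follow the same pattern, with the Auto Scaling client's set_desired_capacity in place of update_service.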
What is ECS missing?
• IP-per-Container w/ ELB integration
• Persistent storage
• Per-Task/Service IAM roles
• Per-Task/Service security groups
IP per Container
• Don’t have to worry about port collisions.
• Directly addressable containers are useful.
• Plenty of private IP space to go around.
Least-Privileged Access
• Every service has the same IAM privileges as the service with the most IAM privileges.
• This goes directly against AWS best practices.
I dreamed a dream…
What do I want, though?
• I want to think less about infrastructure.
• No more “instances.”
• I want a framework for building and deploying applications.
Questions?
Thank you!
• Thanks AdvAWS and sponsors.
• Thank you for attending.
• Greg Poirier - @grepory
• Opsee - https://opsee.com - @GetOpsee