A Year with ECS

A Year With ECS • Greg Poirier, CTO - Opsee

Origin Story • Fifteen years of operations experience • Observability,
infrastructure architecture • Operations/Systems/Software engineer

Opsee • Effortless monitoring for your AWS environment • Reacts
to changes in your infrastructure • Monitoring that stays out of the way

Infrastructure: What do we want? • As little code as
possible • Reproducibility • Easy engineer onboarding • Automate all the things

Development Process • Open PR • Run build against PR
• Merge to master • Build master and publish artifacts • Deploy

Reproducibility • If it builds in CI, it should build
locally. • CI should not be a magical, mystical place that makes tests pass. • In case of emergency, break glass and deploy from my laptop to production with confidence.

I kind of like Docker.

Docker? • "Docker is great for local dev." • "I
don't do anything in production with Docker yet." • "I'm not sure Docker is ready for production."

Really?

One year of Docker and ECS later… • A year
of Docker in production • Started with CoreOS's Fleet • Migrated to ECS a year ago

Why Fleet? • It was there (we use CoreOS) •
Trivial to set up/use (already running) • It did a thing (deployed services to VMs)

What could go wrong?

Highly problematic. • Etcd outage -> cannot deploy • Etcd
and Docker share same disk space by default • Systemd • (Very) Bad default configurations

How do I even Docker? • Container fills up /var/lib/docker
• Node crashes • Naïve scheduling in Fleet causes containers to be scheduled to new nodes • Next node crashes…

Ffffffffffffffleet

We could fix Fleet, but… • Seed-stage startup • Must
focus solely on our own product • Just Make it Work Mode

What are we trying to do here?

Make Building and Deploying Easy • Commit • Push •
Merge • Deploy

Make building and deploying easy. • Commit • Push •
Merge • Done.

How do we do it?

Components we need. • Container scheduler • Continuous integration •
Automated deploys

Don't build what you don't sell.

What did we use? • Container scheduler - ECS •
Continuous integration - CircleCI • Automated deploys - Ansible

Allons-y!

How do we ECS? • Deploy ecs-agent with CoreOS cloud-config
• Service and Task definitions in YAML w/ Ansible • Ansible playbook to deploy Task definitions / trigger service updates

ECS on CoreOS w/ cloud-config • Cloud-config w/ EC2 instance
userdata • Add a systemd unit and configuration data • We do this with Ansible • https://coreos.com/os/docs/latest/booting-on-ecs.html

ECS and Ansible • Task and service definitions in YAML
• Small Ansible libraries using boto3 for task/service definitions • Deploy with a simple playbook that creates a task definition and updates the service
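
The two steps the playbook performs (register a task definition, then update the service) can be sketched with boto3 as follows. This is a minimal illustration, not the Opsee Ansible library itself: the function name `deploy` and its arguments are hypothetical, and `ecs` is assumed to be a client created with `boto3.client("ecs")`.

```python
def deploy(ecs, cluster, service, task_definition):
    """Register a new task definition revision and point the service at it.

    `ecs` is assumed to be a boto3 ECS client; `task_definition` is a dict
    of keyword arguments for register_task_definition (for example the
    contents of a YAML file that has already been loaded).
    """
    # Step 1: every registration creates a new revision of the family.
    response = ecs.register_task_definition(**task_definition)
    arn = response["taskDefinition"]["taskDefinitionArn"]

    # Step 2: updating the service makes ECS replace the running tasks
    # with tasks from the new revision.
    ecs.update_service(cluster=cluster, service=service, taskDefinition=arn)
    return arn
```

Passing the client in as an argument keeps the function easy to exercise without AWS credentials.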

But not everything is totally automated…

It’s true. • Commit • Push • Merge • Push-button
deploy

What did we do wrong? • Didn’t use CloudFormation •
Deployed 'latest' tag • Used default configurations • Poor management of service configuration material

If I could turn back time… • Tag Docker image
w/ Git SHA • Use CloudFormation: stack all the things • Remember that defaults are usually bad • Configuration…
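
Tagging with the Git SHA instead of 'latest' could look roughly like this. It is a sketch under assumptions: the helper names `git_sha` and `image_ref` and the repository name in the usage note are made up for illustration, and the build is assumed to run inside a Git checkout.

```python
import subprocess


def git_sha(cwd=None):
    """Return the abbreviated SHA of HEAD for the checkout in `cwd`."""
    output = subprocess.check_output(
        ["git", "rev-parse", "--short", "HEAD"], cwd=cwd, text=True
    )
    return output.strip()


def image_ref(repository, sha):
    """Build an image reference pinned to one commit, never to 'latest'."""
    if not sha or sha == "latest":
        raise ValueError("refusing to build a mutable image tag")
    return f"{repository}:{sha}"
```

With these, `image_ref("example/service", git_sha())` yields a reference that names exactly one build, so a task definition revision always points at the same image.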

What about configuration? • Configuration data stored in S3 with
KMS encryption • Startup script pulls config data from S3 • Sources the config data (all in env vars) • Starts the service
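
The startup flow described above might be sketched in Python like this. Several things are assumptions: the talk does not show the real startup script, the config file is assumed to be simple `KEY=value` lines, the object is assumed to use server-side KMS encryption (so `get_object` returns plaintext to a caller allowed to decrypt), and all function names are hypothetical.

```python
import os


def parse_env(text):
    """Parse KEY=value lines (optionally prefixed with 'export ') into a dict."""
    env = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if line.startswith("export "):
            line = line[len("export "):]
        key, sep, value = line.partition("=")
        if not sep:
            raise ValueError(f"not a KEY=value line: {raw!r}")
        env[key.strip()] = value.strip().strip('"')
    return env


def fetch_config(s3, bucket, key):
    """Download the config object; `s3` is assumed to be a boto3 S3 client."""
    body = s3.get_object(Bucket=bucket, Key=key)["Body"].read()
    return body.decode("utf-8")


def run_service(s3, bucket, key, command):
    """Merge the config into the environment, then replace this process."""
    env = dict(os.environ)
    env.update(parse_env(fetch_config(s3, bucket, key)))
    os.execvpe(command[0], command, env)
```

Because `run_service` ends in `os.execvpe`, the service takes over the process, which is the same effect as a shell script that sources a file and then execs the binary.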

What to do instead? • Separate container for config. •
Export config file as a volume. • Mount config volume in service’s container.
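
One way to express the config-container idea in an ECS task definition is `volumesFrom`. The sketch below builds such a definition as a Python dict; the container names, image names, and memory values are placeholders, and it assumes the config image declares its config directory as a Docker volume.

```python
def task_definition_with_config(family, service_image, config_image):
    """Build a task definition where the service mounts the config container's volumes."""
    return {
        "family": family,
        "containerDefinitions": [
            {
                # Exists only to export its volume, so it must not be
                # essential: it may exit without stopping the task.
                "name": "config",
                "image": config_image,
                "memory": 32,
                "essential": False,
            },
            {
                "name": "service",
                "image": service_image,
                "memory": 256,
                "essential": True,
                # Mount every volume exported by the config container, read-only.
                "volumesFrom": [{"sourceContainer": "config", "readOnly": True}],
            },
        ],
    }
```

The returned dict has the shape that boto3's `register_task_definition` accepts as keyword arguments.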

Mostly, I am happy… Mostly.

What does ECS do well? • Easy to deploy •
“Transactional” changes to ECS cluster • ELB Integration • Keeps pace with Docker

Let ECS tell you its secrets. • Task-specific information •
Exposed ports • Service information • ELB name, Service name, desired count • Metrics • CPU Utilization, Memory utilization
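
As an example of the task-specific information ECS exposes, the port bindings of running tasks can be read from `describe_tasks`. This is a small sketch, assuming `ecs` is a boto3 ECS client; the function name `exposed_ports` is hypothetical.

```python
def exposed_ports(ecs, cluster, task_arns):
    """Map each task ARN to a list of (container name, container port, host port)."""
    response = ecs.describe_tasks(cluster=cluster, tasks=task_arns)
    ports = {}
    for task in response["tasks"]:
        bindings = []
        for container in task["containers"]:
            # networkBindings is absent or empty for containers without port mappings.
            for binding in container.get("networkBindings", []):
                bindings.append(
                    (container["name"], binding["containerPort"], binding["hostPort"])
                )
        ports[task["taskArn"]] = bindings
    return ports
```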

Go a step further. • CloudWatch -> Lambda -> Autoscale
ECS Services • Compute nodes in ASG • CloudWatch on cluster utilization to scale up compute ASG
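
The CloudWatch -> Lambda -> service scaling path could be sketched as below. The talk does not show the actual Lambda, so this makes several assumptions: the alarm reaches Lambda through an SNS topic, scaling moves one task at a time, and the limits and function names are hypothetical.

```python
import json


def alarm_state_from_sns_event(event):
    """Extract the new alarm state from a CloudWatch alarm delivered via SNS."""
    message = json.loads(event["Records"][0]["Sns"]["Message"])
    return message["NewStateValue"]


def next_desired_count(current, alarm_state, minimum=1, maximum=10):
    """Step up on ALARM, step down on OK, and stay within the limits."""
    if alarm_state == "ALARM":
        target = current + 1
    elif alarm_state == "OK":
        target = current - 1
    else:
        target = current  # e.g. INSUFFICIENT_DATA: leave the service alone
    return max(minimum, min(maximum, target))


def scale_service(ecs, cluster, service, alarm_state):
    """Adjust the service's desired count; `ecs` is assumed to be a boto3 ECS client."""
    described = ecs.describe_services(cluster=cluster, services=[service])
    current = described["services"][0]["desiredCount"]
    target = next_desired_count(current, alarm_state)
    if target != current:
        ecs.update_service(cluster=cluster, service=service, desiredCount=target)
    return target
```

A Lambda handler would call `scale_service(boto3.client("ecs"), cluster, service, alarm_state_from_sns_event(event))` with cluster and service names of its own choosing.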

What is ECS missing? • IP-per-Container w/ ELB integration •
Persistent storage • Per-Task/Service IAM roles • Per-Task/Service security groups

IP per Container • Don’t have to worry about port
collisions. • Directly addressable containers are useful. • Plenty of private IP space to go around.

Least-Privileged Access • Every service has the same IAM privileges
as the service with the most IAM privileges. • This goes directly against AWS best practices.

I dreamed a dream…

What do I want, though? • I want to think
less about infrastructure. • No more “instances.” • I want a framework for building and deploying applications.

Questions?

Thank you! • Thanks AdvAWS and sponsors. • Thank you
for attending. • Greg Poirier - @grepory • Opsee - https://opsee.com - @GetOpsee
