
A Year with ECS

When we first began evaluating solutions for managing containers, we knew very little about our options. We evaluated ECS, Fleet, and Kubernetes, eventually deciding to use Fleet. ECS had only just been released to general availability, and we were unfamiliar with it. Kubernetes involved significantly more complexity than we wanted to own. Ultimately, Fleet was the easiest option. Once we reached double digits in services and had multiple developers launching and maintaining them, we found that Fleet was lacking in many ways, so we decided to reevaluate our options. Migrating to ECS let us hand responsibility for managing containers to AWS, freeing us to focus on shipping features instead of building infrastructure tooling.

In this talk, we will explore the migration process, lessons learned, and what we plan to do to improve our internal developer experience for ECS.

Greg Poirier

May 10, 2016

Transcript

  1. Origin Story • Fifteen years of operations experience • Observability, infrastructure architecture • Operations/Systems/Software engineer
  2. Opsee • Effortless monitoring for your AWS environment • Reacts to changes in your infrastructure • Monitoring that stays out of the way
  3. Infrastructure: What do we want? • As little code as possible • Reproducibility • Easy engineer onboarding • Automate all the things
  4. Development Process • Open PR • Run build against PR • Merge to master • Build master and publish artifacts • Deploy
  5. Reproducibility • If it builds in CI, it should build locally. • CI should not be a magical, mystical place that makes tests pass. • In case of emergency, break glass and deploy from my laptop to production with confidence.
  6. Docker? • "Docker is great for local dev." • "I

    don't do anything in production with Docker yet." • "I'm not sure Docker is ready for production."
  7. One year of Docker and ECS later… • A year

    of Docker in production • Started with CoreOS's Fleet • Migrated to ECS a year ago
  8. Why Fleet? • It was there (we use CoreOS) •

    Trivial to setup/use (already running) • It did a thing (deployed services to VMs)
  9. Highly problematic. • Etcd outage -> cannot deploy • Etcd

    and Docker share same disk space by default • Systemd • (Very) Bad default configurations
  10. How do I even Docker? • Container fills up /var/lib/docker • Node crashes • Naïve scheduling in Fleet causes containers to be scheduled to new nodes • Next node crashes…
  11. We could fix Fleet, but… • Seed-stage startup • Must focus solely on our own product • Just Make it Work Mode
  12. What did we use? • Container scheduler - ECS • Continuous integration - CircleCI • Automated deploys - Ansible
  13. How do we ECS? • Deploy ecs-agent with CoreOS cloud-config • Service and Task definitions in YAML w/ Ansible • Ansible playbook to deploy Task definitions / trigger service updates
  14. ECS on CoreOS w/ cloud-config • Cloud-config w/ EC2 instance userdata • Add a systemd unit and configuration data • We do this with Ansible • https://coreos.com/os/docs/latest/booting-on-ecs.html
  15. ECS and Ansible • Task and service definitions in YAML • Small Ansible libraries using boto3 for task/service definition • Deploy with a simple playbook that creates a task definition and updates the service
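
As a concrete illustration of slide 15, here is a minimal boto3 sketch of the two calls such an Ansible library would wrap; the cluster, family, and image names are hypothetical placeholders, not Opsee's actual configuration:

```python
# Hypothetical sketch: register a new task definition revision, then
# point the service at it. ECS replaces the running tasks on its own.
import boto3

ecs = boto3.client("ecs", region_name="us-west-2")

task_def = ecs.register_task_definition(
    family="example-service",  # placeholder family name
    containerDefinitions=[
        {
            "name": "example-service",
            "image": "example/example-service:abc1234",  # image tagged with a Git SHA
            "memory": 256,
            "cpu": 128,
            "portMappings": [{"containerPort": 8080, "hostPort": 8080}],
        }
    ],
)
revision_arn = task_def["taskDefinition"]["taskDefinitionArn"]

ecs.update_service(
    cluster="example-cluster",   # placeholder cluster name
    service="example-service",   # placeholder service name
    taskDefinition=revision_arn,
)
```
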
  16. What did we do Wrong? • Didn’t use CloudFormation • Deployed 'latest' tag • Used default configurations • Poor management of service configuration material
  17. If I could turn back time… • Tag Docker image w/ Git SHA • Use CloudFormation: stack all the things • Remember that defaults are usually bad • Configuration…
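
To make "tag Docker image w/ Git SHA" concrete: a hedged sketch of the CI build step in Python (a shell one-liner works equally well); the repository name is a made-up placeholder:

```python
# Hypothetical CI step: tag the image with the current commit so every
# deploy is traceable to a Git SHA instead of a mutable 'latest' tag.
import subprocess

sha = subprocess.check_output(
    ["git", "rev-parse", "--short", "HEAD"], text=True
).strip()
image = "example/example-service:" + sha  # placeholder repository name

subprocess.check_call(["docker", "build", "-t", image, "."])
subprocess.check_call(["docker", "push", image])
```
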
  18. What about configuration? • Configuration data stored in S3 with KMS encryption • Startup script pulls config data from S3 • Sources the config data (all in env vars) • Starts the service
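
A sketch of the startup wrapper slide 18 describes, assuming a KEY=VALUE env-style config file; the bucket, key, and binary path are hypothetical placeholders. Note that GetObject transparently decrypts SSE-KMS objects when the caller has kms:Decrypt:

```python
# Hypothetical startup wrapper: pull config from S3, export env vars,
# then exec the real service so it inherits the environment.
import os
import boto3

s3 = boto3.client("s3")
obj = s3.get_object(Bucket="example-config", Key="example-service.env")
body = obj["Body"].read().decode("utf-8")

# Each line is KEY=VALUE, as a shell 'source' would consume it.
for line in body.splitlines():
    if line and not line.startswith("#"):
        key, _, value = line.partition("=")
        os.environ[key.strip()] = value.strip()

# Replace this process with the service itself.
os.execvp("/opt/example-service/bin/example-service", ["example-service"])
```
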
  19. What to do instead? • Separate container for config. • Export config file as a volume. • Mount config volume in service’s container.
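
In ECS terms, slide 19's suggestion is a task definition in which a config container populates a volume and the service container mounts it via volumesFrom. A hedged sketch with placeholder names, assuming the config image declares a VOLUME for its config directory:

```python
# Hypothetical two-container task: "config" only populates a volume,
# and the service mounts that volume read-only via volumesFrom.
import boto3

ecs = boto3.client("ecs")

ecs.register_task_definition(
    family="example-service",
    containerDefinitions=[
        {
            "name": "config",
            "image": "example/example-service-config:abc1234",  # placeholder
            "memory": 16,
            "essential": False,  # only needs to write the config file
        },
        {
            "name": "example-service",
            "image": "example/example-service:abc1234",  # placeholder
            "memory": 256,
            "essential": True,
            "volumesFrom": [{"sourceContainer": "config", "readOnly": True}],
        },
    ],
)
```
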
  20. What does ECS do well? • Easy to deploy • “Transactional” changes to ECS cluster • ELB Integration • Keeps pace with Docker
  21. Let ECS tell you its secrets. • Task-specific information • Exposed ports • Service information • ELB name, service name, desired count • Metrics • CPU utilization, memory utilization
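
Pulling those "secrets" amounts to a couple of API calls: DescribeServices for service information, and CloudWatch for utilization metrics. A minimal sketch, with placeholder cluster and service names:

```python
# Hypothetical sketch: read service details and cluster CPU utilization.
from datetime import datetime, timedelta
import boto3

ecs = boto3.client("ecs")
cloudwatch = boto3.client("cloudwatch")

svc = ecs.describe_services(
    cluster="example-cluster", services=["example-service"]
)["services"][0]
print(svc["desiredCount"], svc["runningCount"],
      [lb.get("loadBalancerName") for lb in svc["loadBalancers"]])

# ECS publishes CPUUtilization/MemoryUtilization under the AWS/ECS namespace.
stats = cloudwatch.get_metric_statistics(
    Namespace="AWS/ECS",
    MetricName="CPUUtilization",
    Dimensions=[{"Name": "ClusterName", "Value": "example-cluster"}],
    StartTime=datetime.utcnow() - timedelta(minutes=15),
    EndTime=datetime.utcnow(),
    Period=300,
    Statistics=["Average"],
)
print(stats["Datapoints"])
```
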
  22. Go a step further. • CloudWatch -> Lambda -> Autoscale ECS services • Compute nodes in an ASG • CloudWatch alarm on cluster utilization to scale up the compute ASG
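
One way to wire slide 22 together: the CloudWatch alarm publishes to SNS, which invokes a Lambda that bumps the service's desired count. A hedged sketch of such a handler; the names and the one-task scaling step are placeholders, and a real version would also scale in and enforce a ceiling:

```python
# Hypothetical Lambda handler for CloudWatch -> SNS -> Lambda autoscaling.
import json
import boto3

ecs = boto3.client("ecs")

def handler(event, context):
    # CloudWatch alarms arrive via SNS; the alarm body is JSON.
    alarm = json.loads(event["Records"][0]["Sns"]["Message"])
    if alarm["NewStateValue"] != "ALARM":
        return

    svc = ecs.describe_services(
        cluster="example-cluster", services=["example-service"]
    )["services"][0]

    # Scale out by one task.
    ecs.update_service(
        cluster="example-cluster",
        service="example-service",
        desiredCount=svc["desiredCount"] + 1,
    )
```
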
  23. What is ECS missing? • IP-per-Container w/ ELB integration • Persistent storage • Per-Task/Service IAM roles • Per-Task/Service security groups
  24. IP per Container • Don’t have to worry about port collisions. • Directly addressable containers are useful. • Plenty of private IP space to go around.
  25. Least-Privileged Access • Every service has the same IAM privileges as the service with the most IAM privileges. • This goes directly against AWS best practices.
  26. What do I want, though? • I want to think less about infrastructure. • No more “instances.” • I want a framework for building and deploying applications.
  27. Thank you! • Thanks AdvAWS and sponsors. • Thank you for attending. • Greg Poirier - @grepory • Opsee - https://opsee.com - @GetOpsee