
The Missing Pieces of Amazon ECS (for me)

Takumi Sakamoto

September 21, 2016



  1. The Missing Pieces of Amazon ECS (for me) Takumi Sakamoto

  2. Who am I? @takus, Site Reliability Engineer at SmartNews

  3. http://press.forkwell.com/post/150381780957/interview-takus

  4. What is SmartNews?

     • News discovery app for mobile
     • Algorithm-driven article selection
     • 18M+ downloads worldwide
     https://www.smartnews.com/en/
  5. How does SmartNews use Amazon ECS? (Digest version)

  6. Amazon ECS in SmartNews

     • Running dozens of production services
        • origin server for CDN
        • internal API
        • web crawler
     • Provides an internal deployment tool for developers
        • Heroku-style CLI tool
        • service discovery with Consul
        • monitoring with Datadog
        • logging with the Fluentd logging driver
  7. Deploy w/ Heroku-style CLI

     Amazon ECR
        $ spaas images:init --repository myapp
        $ docker build -t NAMESPACE/myapp:0.0.1 .
        $ docker push NAMESPACE/myapp:0.0.1

     Amazon ECS
        $ spaas create --service myapp --image NAMESPACE/myapp --tag 0.0.1
        $ spaas deploy --service myapp 0.0.2
        $ spaas rollback --service myapp
        $ spaas ps:scale --service myapp 2
        $ spaas ps:scale:cpu --service myapp 1024
        $ spaas ps:scale:memory --service myapp 2048
        $ spaas ps:role --service myapp MYAPP_IAM_ROLE
        $ spaas config:set SERVICE_TAGS=web
  8. Monitoring w/ Datadog

     • By container instance
     • By ECS task family
     • By ECS task revision
  9. The Missing Pieces

  10. 1. Run a task on every instance

     • Similar to DaemonSet in Kubernetes
        • run an ECS task on every container instance
        • define a RestartPolicy for each task
     • Use cases
        • run a log collection daemon (fluentd)
        • run a monitoring daemon (dd-agent)
        • run a cluster storage daemon (glusterd)
  11. ECS Create Daemon API (proposed)

     // Run dd-agent on every container instance in mycluster
     // Restart dd-agent on unexpected failure
     $ aws ecs create-daemon \
         --cluster mycluster \
         --daemon-name dd-agent \
         --task-definition dd-agent:10 \
         --restart-policy always
  12. Workaround

     • Start an ECS task with a user data script
        • the task will not be restarted when it dies
        • hard to deploy a new version
        • other tasks may be deployed before the required daemon is running
     • Write an ECS scheduler
        • yes, we can, but I want a managed one
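The user data workaround on this slide could look roughly like the sketch below. It is not the author's script: it assumes the ECS agent's local introspection endpoint (`http://localhost:51678/v1/metadata`) is reachable, that `curl` and the AWS CLI are installed, and that the instance role allows `ecs:StartTask`. The function names and the retry counts are made up for illustration; the cluster and task definition names are taken from the slides.

```shell
#!/bin/sh
# Sketch only: start one copy of a daemon task on the local container instance.
# Assumptions (not from the slides): agent introspection endpoint, curl, AWS CLI.

CLUSTER="mycluster"            # cluster name used in the slides
TASK_DEFINITION="dd-agent:10"  # task definition used in the slides
AGENT_METADATA_URL="http://localhost:51678/v1/metadata"

# Read agent metadata JSON on stdin and print the ContainerInstanceArn value.
# sed keeps the script free of a jq dependency; it is not a general JSON parser.
extract_instance_arn() {
  sed -n 's/.*"ContainerInstanceArn"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p'
}

# Place the daemon task on exactly one container instance.
start_daemon_task() {
  aws ecs start-task \
    --cluster "$CLUSTER" \
    --task-definition "$TASK_DEFINITION" \
    --container-instances "$1"
}

# Wait for the agent to register (hypothetical limit: 30 tries, 10 s apart),
# then start the daemon task once.
main() {
  attempts=0
  while [ "$attempts" -lt 30 ]; do
    arn=$(curl -s "$AGENT_METADATA_URL" | extract_instance_arn)
    if [ -n "$arn" ]; then
      start_daemon_task "$arn"
      return $?
    fi
    attempts=$((attempts + 1))
    sleep 10
  done
  echo "ECS agent did not report a container instance ARN" >&2
  return 1
}
```

Calling `main` at the end of the user data script would run it once at boot. The sketch shares the drawbacks listed on the slide: nothing restarts the task if it dies, and other tasks can be placed on the instance before the daemon is up.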
  13. 2. Mark an instance unschedulable

     • Similar to kubectl cordon in Kubernetes
        • the instance is marked unschedulable
        • new tasks are not scheduled on it while in this mode
     • Use cases
        • replace container instances for security updates
        • shrink the cluster
  14. ECS Maintenance API (proposed)

     // Mark the instance as unschedulable
     $ aws ecs disable-container-instance \
         --cluster mycluster \
         --container-instance 04ad4550-f218-4d91-86f9-e8012b239261

     // Wait until long-running tasks have finished
     // Do maintenance tasks or terminate the instance

     // Mark the instance as schedulable
     $ aws ecs enable-container-instance \
         --cluster mycluster \
         --container-instance 04ad4550-f218-4d91-86f9-e8012b239261
  15. Workaround

     • Termination hook in Auto Scaling
        • new tasks will still be assigned while waiting for the currently running tasks to finish
        • Spot Fleet doesn't support termination hooks
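The "wait for running tasks to finish" step of this workaround might be sketched as below, for example inside whatever handles the termination hook. This is an assumption-laden illustration, not the author's tooling: it polls `aws ecs list-tasks` for one container instance, the function names and the `POLL_INTERVAL` variable are hypothetical, and it assumes the instance runs fewer tasks than one page of results.

```shell
#!/bin/sh
# Sketch only: block until no RUNNING tasks remain on one container instance.
# It does not stop ECS from placing new tasks there in the meantime.

POLL_INTERVAL="${POLL_INTERVAL:-30}"  # seconds between polls (hypothetical default)

# Print the number of RUNNING tasks on instance $2 in cluster $1.
running_task_count() {
  aws ecs list-tasks \
    --cluster "$1" \
    --container-instance "$2" \
    --desired-status RUNNING \
    --query 'length(taskArns)' \
    --output text
}

# Poll until the count reaches zero; fail instead of guessing if the
# AWS CLI call fails or returns something that is not a number.
wait_until_drained() {
  while :; do
    count=$(running_task_count "$1" "$2") || return 1
    case "$count" in
      ''|*[!0-9]*) echo "unexpected task count: $count" >&2; return 1 ;;
    esac
    [ "$count" -eq 0 ] && return 0
    sleep "$POLL_INTERVAL"
  done
}
```

A call such as `wait_until_drained mycluster 04ad4550-f218-4d91-86f9-e8012b239261` would return once the instance is empty. Because nothing marks the instance unschedulable, the loop may never finish on a busy cluster, which is the gap this slide describes.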
  16. Summary

     • The missing pieces for me
        • run a task on every container instance
        • mark an instance unschedulable
     • I hope these features will be released soon