Slide 1

Slide 1 text

The Missing Pieces of Amazon ECS (for me)
Takumi Sakamoto
2016.09.21

Slide 2

Slide 2 text

Who am I?
@takus
Site Reliability Engineer at SmartNews

Slide 3

Slide 3 text

http://press.forkwell.com/post/150381780957/interview-takus

Slide 4

Slide 4 text

What is SmartNews?
• News discovery app for mobile
• Algorithm-driven article selection
• 18M+ downloads worldwide
https://www.smartnews.com/en/

Slide 5

Slide 5 text

How does SmartNews use Amazon ECS? (Digest version)

Slide 6

Slide 6 text

Amazon ECS in SmartNews
• Running dozens of production services
  • origin server for CDN
  • internal API
  • web crawler
• Provides an internal deployment tool for developers
  • Heroku-style CLI tool
  • service discovery with Consul
  • monitoring with Datadog
  • logging with the Fluentd logging driver

Slide 7

Slide 7 text

Deploy w/ Heroku-style CLI

Amazon ECR
$ spaas images:init --repository myapp
$ docker build -t NAMESPACE/myapp:0.0.1 .
$ docker push NAMESPACE/myapp:0.0.1

Amazon ECS
$ spaas create --service myapp --image NAMESPACE/myapp --tag 0.0.1
$ spaas deploy --service myapp 0.0.2
$ spaas rollback --service myapp
$ spaas ps:scale --service myapp 2
$ spaas ps:scale:cpu --service myapp 1024
$ spaas ps:scale:memory --service myapp 2048
$ spaas ps:role --service myapp MYAPP_IAM_ROLE
$ spaas config:set SERVICE_TAGS=web

Slide 8

Slide 8 text

Monitoring w/ Datadog
• By Container Instance
• By ECS Task Family
• By ECS Task Revision

Slide 9

Slide 9 text

The Missing Pieces

Slide 10

Slide 10 text

1. Run a task on every instance
• Similar to DaemonSet in Kubernetes
  • run an ECS task on every container instance
  • define a RestartPolicy for each task
• Use case
  • run a log collection daemon (fluentd)
  • run a monitoring daemon (dd-agent)
  • run a cluster storage daemon (glusterd)
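
A minimal sketch of what "run a task on every container instance" means in practice: compare the instances in the cluster with the instances already running the daemon, and start the task on the difference. This is only an illustration of the idea, not an ECS feature. The helper name `missing_instances` is hypothetical, and the cluster and task names in the comments are taken from the examples in this deck.

```shell
#!/bin/sh
# Print the container instances that are NOT yet running the daemon task.
# $1: newline-separated ARNs of all container instances in the cluster
# $2: newline-separated ARNs of instances already running the daemon
missing_instances() {
  all_file=$(mktemp)
  have_file=$(mktemp)
  printf '%s\n' "$1" | sort -u > "$all_file"
  printf '%s\n' "$2" | sort -u > "$have_file"
  # comm -23 keeps lines that appear only in the first (sorted) file.
  comm -23 "$all_file" "$have_file"
  rm -f "$all_file" "$have_file"
}

# How the two inputs could be gathered with the AWS CLI (not run here):
# all=$(aws ecs list-container-instances --cluster mycluster \
#         --query 'containerInstanceArns[]' --output text | tr '\t' '\n')
# tasks=$(aws ecs list-tasks --cluster mycluster --family dd-agent \
#         --query 'taskArns[]' --output text)
# have=$(aws ecs describe-tasks --cluster mycluster --tasks $tasks \
#         --query 'tasks[].containerInstanceArn' --output text | tr '\t' '\n')
# for arn in $(missing_instances "$all" "$have"); do
#   aws ecs start-task --cluster mycluster --task-definition dd-agent:10 \
#     --container-instances "$arn"
# done
```

Running this periodically would also approximate a restart policy, since an instance whose daemon has died shows up in the difference again.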

Slide 11

Slide 11 text

ECS Create Daemon API (proposed)

# Running dd-agent on every container instance in mycluster
# Restart dd-agent on unexpected failure
$ aws ecs create-daemon \
    --cluster mycluster \
    --daemon-name dd-agent \
    --task-definition dd-agent:10 \
    --restart-policy always

Slide 12

Slide 12 text

Workaround
• Start an ECS task with a user data script
  • the task will not be restarted when it dies
  • hard to deploy a new version
  • other tasks will be deployed without the required daemon
• Writing an ECS scheduler
  • yes, we can, but I want a managed one
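
The user data workaround could look roughly like the sketch below. It assumes the ECS agent's local introspection endpoint (`http://localhost:51678/v1/metadata`) is reachable from the instance and that the task definition is `dd-agent:10` as in the earlier example. The helpers `json_field` and `start_daemon_task` and the `DRY_RUN` switch are hypothetical names introduced for illustration. As noted above, nothing here restarts the task if it dies.

```shell
#!/bin/sh
# Extract a string field from a flat JSON object read on stdin.
# $1: field name (avoids a dependency on jq)
json_field() {
  sed -n "s/.*\"$1\"[[:space:]]*:[[:space:]]*\"\([^\"]*\)\".*/\1/p"
}

# Start one task on the instance described by the agent metadata.
# $1: JSON from the ECS agent introspection endpoint
# $2: task definition (family:revision)
start_daemon_task() {
  metadata=$1
  task_def=$2
  cluster=$(printf '%s' "$metadata" | json_field Cluster)
  instance_arn=$(printf '%s' "$metadata" | json_field ContainerInstanceArn)
  # With DRY_RUN set, the command is printed instead of executed.
  ${DRY_RUN:+echo} aws ecs start-task \
    --cluster "$cluster" \
    --task-definition "$task_def" \
    --container-instances "$instance_arn"
}

# At the end of the user data script, once the ECS agent is up (not run here):
# metadata=$(curl -s http://localhost:51678/v1/metadata)
# start_daemon_task "$metadata" dd-agent:10
```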

Slide 13

Slide 13 text

2. Mark an instance unschedulable
• Similar to kubectl cordon in Kubernetes
  • instance will be marked unschedulable
  • new tasks will not be scheduled in this mode
• Use case
  • container instance replacement for security updates
  • shrink cluster size

Slide 14

Slide 14 text

ECS Maintenance API (proposed)

# Mark instance as unschedulable
$ aws ecs disable-container-instance \
    --cluster mycluster \
    --container-instance 04ad4550-f218-4d91-86f9-e8012b239261

# Wait until long-running tasks have finished
# Do maintenance tasks or terminate the instance

# Mark instance as schedulable
$ aws ecs enable-container-instance \
    --cluster mycluster \
    --container-instance 04ad4550-f218-4d91-86f9-e8012b239261

Slide 15

Slide 15 text

Workaround
• Terminate hook in AutoScaling
  • new tasks will still be assigned while waiting for currently running tasks to finish
  • Spot Fleet doesn't support terminate hooks
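
A rough sketch of the terminate hook workaround: poll the instance's running task count, and only let Auto Scaling proceed once it reaches zero. The function names, the `POLL_INTERVAL` variable, and the hook and group names in the comments are hypothetical. The limitation above still applies: the scheduler may place new tasks on the instance while this loop is waiting, so the count is not guaranteed to reach zero.

```shell
#!/bin/sh
# Poll a command until it reports 0 running tasks, or give up.
# $1: maximum number of polls; remaining args: command that prints the count
wait_until_idle() {
  max_polls=$1
  shift
  polls=0
  while [ "$polls" -lt "$max_polls" ]; do
    count=$("$@")
    if [ "$count" = "0" ]; then
      return 0
    fi
    polls=$((polls + 1))
    sleep "${POLL_INTERVAL:-30}"
  done
  return 1
}

# Number of tasks currently running on one container instance.
# $1: cluster name, $2: container instance ID or ARN
running_tasks() {
  aws ecs describe-container-instances \
    --cluster "$1" --container-instances "$2" \
    --query 'containerInstances[0].runningTasksCount' --output text
}

# Inside a terminate hook handler (not run here):
# wait_until_idle 60 running_tasks mycluster "$CONTAINER_INSTANCE_ID"
# aws autoscaling complete-lifecycle-action \
#   --lifecycle-hook-name my-terminate-hook \
#   --auto-scaling-group-name my-asg \
#   --instance-id "$EC2_INSTANCE_ID" \
#   --lifecycle-action-result CONTINUE
```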

Slide 16

Slide 16 text

Summary
• The missing pieces for me
  • run a task on every container instance
  • mark an instance unschedulable
• I hope these features will be released soon