Slide 1

Slide 1 text

Getting started casually with a container scheduler using ECS Events & Lambda
JAWS-UG Container Chapter #10, 2017.12.12 (Tue), LT

Slide 2

Slide 2 text

whoami
Taro Hirose
▸ OPENREC.tv / CyberZ, Inc.
▸ Backend Engineer
▸ id: @uorat
▸ http://uorat.hatenablog.com

Slide 3

Slide 3 text

ECS Events Introduction

Slide 4

Slide 4 text

What is an ECS Event?
CloudWatch Events notifications sent in response to state changes of resources in an ECS Cluster
▸ State-change events can be received for the following:
  ▸ Container Instance
  ▸ Task
▸ Launched 2016.11.25
▸ "Monitor Cluster State with Amazon ECS Event Stream" | Amazon Web Services Blog
▸ https://aws.amazon.com/jp/blogs/news/monitor-cluster-state-with-amazon-ecs-event-stream/
[Diagram labels: Amazon ECS, Event Stream, CloudWatch Events, Events, Lambda, SNS, Kinesis]
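As a rough sketch of how such a subscription might be declared, the following builds a CloudWatch Events event pattern that matches both kinds of ECS state-change events. The helper name `build_ecs_event_pattern` is hypothetical and not from the talk, and registering the pattern as a rule (for example through the AWS console or an SDK) is left out.

```python
# detail-type values emitted by ECS for the two resource kinds listed above
ECS_DETAIL_TYPES = [
    "ECS Task State Change",
    "ECS Container Instance State Change",
]


def build_ecs_event_pattern(cluster_arn=None):
    """Return an event pattern (as a dict) matching ECS state-change events.

    cluster_arn is optional; when given, only events from that cluster match.
    """
    pattern = {
        "source": ["aws.ecs"],
        "detail-type": list(ECS_DETAIL_TYPES),
    }
    if cluster_arn is not None:
        # Narrow the match to a single cluster via the "detail" section.
        pattern["detail"] = {"clusterArn": [cluster_arn]}
    return pattern
```

Serializing the returned dict with `json.dumps` gives the JSON document that a CloudWatch Events rule expects as its event pattern.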

Slide 5

Slide 5 text

What is an ECS Event?
e.g. Task start

    {
      "version": "0",
      "id": "451dda85-ca1a-9045-5121-7a12dfb9317f",
      "detail-type": "ECS Task State Change",
      "source": "aws.ecs",
      "account": "123456789012",
      "time": "2017-09-07T08:28:04Z",
      "region": "ap-northeast-1",
      "resources": [
        "arn:aws:ecs:ap-northeast-1:123456789012:task/b280d725-7382-43b8-a50d-ef909a36cb80"
      ],
      "detail": {
        "clusterArn": "arn:aws:ecs:ap-northeast-1:123456789012:cluster/uorat-ecs-event-test",
        "containerInstanceArn": "arn:aws:ecs:ap-northeast-1:123456789012:container-instance/ff83c4a8-67fc-4a13-8134-897c6dd2195a",
        ...
        "desiredStatus": "RUNNING",
        ...
        "lastStatus": "PENDING",
        ...
        "taskDefinitionArn": "arn:aws:ecs:ap-northeast-1:123456789012:task-definition/uorat-ecs-event-test:35",
        ...
      }
    }

ECS Task Event
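A minimal sketch of pulling the interesting fields out of an event shaped like the one above. The function name `summarize_task_event` and the keys of the returned dict are illustrative assumptions, not part of the talk.

```python
def summarize_task_event(event):
    """Extract the fields a handler typically branches on from an ECS task event."""
    detail = event.get("detail", {})
    resources = event.get("resources", [])
    desired = detail.get("desiredStatus")
    last = detail.get("lastStatus")
    return {
        # The task ARN is carried in the top-level "resources" list.
        "task_arn": resources[0] if resources else None,
        # Cluster name is the part of the cluster ARN after the last "/".
        "cluster": detail.get("clusterArn", "").rsplit("/", 1)[-1],
        "desired_status": desired,
        "last_status": last,
        # A task that should run but is still pending is starting up.
        "is_starting": desired == "RUNNING" and last == "PENDING",
    }
```

Fed the sample event, this reports the task as starting, since `desiredStatus` is `RUNNING` while `lastStatus` is still `PENDING`.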

Slide 6

Slide 6 text

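What is an ECS Event?
What can you do with it?
▸ Quoted from "Monitor Cluster State with Amazon ECS Event Stream"
▸ https://aws.amazon.com/jp/blogs/news/monitor-cluster-state-with-amazon-ecs-event-stream/
▸ "With this information you can also automate container placement and scaling, bringing the cluster to the 'right size' at a very fine level of precision. By delivering cluster state information in near real time in an event-driven rather than pull-based way, the ECS event stream feature offers a very wide range of possibilities for monitoring and scaling container infrastructure."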

Slide 7

Slide 7 text

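What is an ECS Event?
Event-driven task deployment and system integration
▸ For example:
▸ Store task run history in Elasticsearch or DynamoDB and use it for analysis
▸ Run some processing when a task or container instance starts or stops
▸ Integration with monitoring systems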
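One way the "store task run history" idea could look, as a sketch only: a plain dict stands in for the DynamoDB or Elasticsearch store, and `record_if_newer` is a hypothetical helper. It relies on the `version` counter that AWS documents inside `detail` for ECS events (not shown in the slides) to ignore duplicate or out-of-order deliveries.

```python
def record_if_newer(store, event):
    """Save the latest known state of a task; return True if the store changed.

    store: dict mapping task ARN -> state record (stand-in for a real table).
    """
    detail = event["detail"]
    task_arn = event["resources"][0]
    # Assumption: detail["version"] increases with each state change of the task.
    version = detail.get("version", 0)

    current = store.get(task_arn)
    if current is not None and current["version"] >= version:
        # Duplicate or stale event: keep what we already have.
        return False

    store[task_arn] = {
        "version": version,
        "desired_status": detail.get("desiredStatus"),
        "last_status": detail.get("lastStatus"),
        "time": event.get("time"),
    }
    return True
```

With a real table the same check would normally be pushed into a conditional write so that concurrent Lambda invocations cannot overwrite a newer record.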

Slide 8

Slide 8 text

What is an ECS Event?
e.g.
▸ Container Scheduler for Amazon ECS
▸ OSS written in Go, released at re:Invent 2016
▸ Lets you implement a custom scheduler for an ECS Cluster
▸ Detects ECS Cluster events
▸ Tracks ECS Cluster state
▸ Runs custom schedulers
▸ Provides a REST API

Slide 9

Slide 9 text

OPENREC.tv Case

Slide 10

Slide 10 text

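Case: OPENREC.tv
A video streaming platform specialized in games
▸ Low latency, high picture quality
▸ 90% of the content is UGC
▸ Mostly user-driven live streams
▸ The number of concurrent streams and their duration both depend on the streamers
▸ Audience size also depends on the streamers
▸ ∴ Load is hard to predict
▸ No limit on concurrent viewers
▸ There are no so-called "slots"
▸ Every user must be able to watch live equally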

Slide 11

Slide 11 text

No content

Slide 12

Slide 12 text

Architecture of live transcoding system
CloudWatch Events (scheduled/1min, ECS Event Stream) + Lambda + API
[Diagram: ECS Cluster with three Broadcaster / Container Instance / Task (Container) groups and one additional Container Instance; LIVE API (ELB, EC2); Aurora (RDS); CloudWatch Events & Lambda issuing ec2:RunInstances, ecs:RunTask, ecs:StopTask, ec2:StopInstances, …; ECS Events]

Slide 13

Slide 13 text

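Architecture of live transcoding system
▸ EC2/ECS Auto Scaling is a poor fit
▸ RTMP = persistent connection
▸ Even under low load, an instance cannot be scaled in while it is streaming
▸ A scheduler that scales on a custom metric, "streaming status", is required
▸ The pain of Rolling Deploy
▸ When a stream ends is also up to the user
▸ A task basically cannot be taken down until its stream ends
▸ Some streams even run for 24 hours…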
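The constraint above (only instances with no active stream may be removed) can be sketched as a small planning function. Everything here is a hypothetical illustration: the `Instance` type, the `spare_target` parameter, and the idea of keeping a fixed number of idle instances as headroom are assumptions, not the scheduler described in the talk.

```python
from dataclasses import dataclass


@dataclass
class Instance:
    instance_id: str
    broadcasting: bool  # True while a live stream is connected


def plan_scaling(instances, spare_target):
    """Decide how many instances to launch and which idle ones to stop.

    Instances that are broadcasting are never selected for stopping,
    regardless of how low their CPU or network load is.
    """
    idle = [i for i in instances if not i.broadcasting]
    if len(idle) < spare_target:
        # Not enough headroom for new broadcasters: scale out.
        return {"launch": spare_target - len(idle), "stop": []}
    # Keep spare_target idle instances and release the rest.
    surplus = idle[spare_target:]
    return {"launch": 0, "stop": [i.instance_id for i in surplus]}
```

Run on a one-minute schedule, a function like this would feed the ec2:RunInstances and ec2:StopInstances calls instead of a CPU-based Auto Scaling policy.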

Slide 14

Slide 14 text

Architecture of live transcoding system
Stateful application, but disposability
[Diagram: same architecture as before, with three Broadcaster / Container Instance / Task (Container) groups and one additional Container Instance in the ECS Cluster]

Slide 15

Slide 15 text

Architecture of live transcoding system
Expire >> Drain >> Stop Container
[Diagram: same architecture; the third Container Instance still has a Broadcaster connected to its Task (Container)]

Slide 16

Slide 16 text

Architecture of live transcoding system
Expire >> Drain >> Stop Container
[Diagram: same architecture; the third Container Instance runs a Task (Container) with no Broadcaster connected]

Slide 17

Slide 17 text

Architecture of live transcoding system
Expire >> Drain >> Stop Container
[Diagram: same architecture; the third Container Instance no longer runs a Task]
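The "Expire >> Drain >> Stop Container" sequence can be read as a small state machine. The sketch below is one interpretation under stated assumptions: "expire" is taken to mean the container has been marked for retirement, and "drain" to mean it accepts no new broadcaster while waiting for the current stream to end. The `Phase` names and `next_phase` function are hypothetical.

```python
from enum import Enum


class Phase(Enum):
    ACTIVE = "active"      # accepts new broadcasters
    EXPIRED = "expired"    # marked for retirement
    DRAINING = "draining"  # no new broadcasters; waiting for the stream to end
    STOPPED = "stopped"    # container stopped


def next_phase(phase, expired, broadcasting):
    """Advance a container one step through Expire >> Drain >> Stop."""
    if phase is Phase.ACTIVE:
        return Phase.EXPIRED if expired else Phase.ACTIVE
    if phase is Phase.EXPIRED:
        # Once expired, stop routing new streams to this container.
        return Phase.DRAINING
    if phase is Phase.DRAINING:
        # Only stop after the broadcaster has disconnected.
        return Phase.DRAINING if broadcasting else Phase.STOPPED
    return Phase.STOPPED
```

The important property is that a container with a connected broadcaster never reaches `STOPPED`, which is what lets a stateful streaming task be treated as disposable.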

Slide 18

Slide 18 text

Architecture of live transcoding system
Containers and Instances are auto-scaled according to the number of streams and the streaming load
▸ Releases, too: just run `docker image push` and the new image rolls out on its own
[Diagram: same architecture, with three Broadcaster / Container Instance / Task (Container) groups and one additional Container Instance in the ECS Cluster]

Slide 19

Slide 19 text

w/ ECS Events Case

Slide 20

Slide 20 text

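Case1: Monitoring Tasks
Automatic setup of Container/Application monitoring
▸ Turn monitoring ON/OFF according to Task state changes
1. Task started → start monitoring the JVM and Wowza layers
2. Task stopped normally → stop monitoring
3. Task stopped abnormally → keep monitoring enabled and raise an alert
▸ The existing Zabbix is reused as the monitoring system
▸ The Zabbix API is called for 1 and 2 above
▸ Cheap, since it was already in place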

Slide 21

Slide 21 text

Case1: Monitoring Tasks
e.g. running ecs:StopTask

    {
      "version": "0",
      "id": "41f02974-8365-f955-8465-264ef8b189ca",
      "detail-type": "ECS Task State Change",
      "source": "aws.ecs",
      "account": "123456789012",
      "time": "2017-09-07T08:10:36Z",
      "region": "ap-northeast-1",
      "resources": [
        "arn:aws:ecs:ap-northeast-1:123456789012:task/55a0cfde-a377-4b97-…"
      ],
      "detail": {
        "clusterArn": "arn:aws:ecs:ap-northeast-1:123456789012:cluster/uorat-ecs-event-test",
        "containerInstanceArn": "arn:aws:ecs:ap-northeast-1:123456789012:container-instance/a04...",
        ...
        "desiredStatus": "STOPPED",
        ...
        "lastStatus": "RUNNING",
        ...
        "stoppedReason": "Task stopped by user",
        ...
      }
    }

ECS Task Event
[Diagram: ECS Cluster with two Container Instances, each running a Task (Container); ecs:StopTask is issued]

Slide 22

Slide 22 text

Case1: Monitoring Tasks
e.g. running ecs:StopTask
Lambda function: main.py

    def handle(event, context):
        ...
        if desire_status == "RUNNING" and last_status == "PENDING":
            # Task is starting: register the host in Zabbix.
            logger.info("Enable monitoring by Zabbix: host=%s" % (tag_name))
            zabbix_register(tag_name, private_ip)
        elif desire_status == "STOPPED":
            logger.info("Found the stopped task: task_arn=%s, last_status=%s" % (
                task_arn, last_status
            ))
            stopped_reason = event["detail"]["stoppedReason"]
            if stopped_reason == "Task stopped by user" and last_status == "RUNNING":
                # Normal stop: disable monitoring.
                logger.info("Disable monitoring by Zabbix: host=%s" % (tag_name))
                zabbix_disable(tag_name)
            elif stopped_reason != "Task stopped by user":
                # Abnormal stop: keep monitoring enabled and respawn the task.
                logger.warning("Found the failed task: host=%s, task_arn=%s, stopped_reason=%s" % (
                    tag_name, task_arn, stopped_reason
                ))
                respawn(task_arn, ec2_instance_id)
                logger.warning("Respawned and locked the task: host=%s, task_id=%s" % (
                    tag_name, task_arn
                ))

[Diagram: ECS Cluster with two Container Instances, each running a Task (Container); ecs:StopTask is issued]

Slide 23

Slide 23 text

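Case2: Respawn failed tasks
Kick abnormally terminated Tasks back to life
▸ Connections are persistent, so a Task that is streaming must be rescued
▸ If "StoppedReason" is not an expected value, re-run the Task
▸ Needed because these are RunTask tasks, not Service tasks
▸ Run the processing needed for follow-up
▸ Abnormal termination must raise an alert; monitoring is not disabled automatically
▸ Notify internal staff about the affected streams
▸ Lock the Task / Instance for investigation and a permanent fix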
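A hedged sketch of how the re-run step might be built on the ECS StartTask API, placing the task back on the same container instance. This is not the author's `respawn` implementation: the function names are hypothetical, and `ecs_client` is assumed to be an object exposing `start_task(cluster=..., taskDefinition=..., containerInstances=[...])`, as the boto3 ECS client does. The sketch also assumes the event's `detail` carries `clusterArn`, `taskDefinitionArn`, and `containerInstanceArn`, and only acts once `lastStatus` has reached `STOPPED`, so that the earlier "stopping" event does not trigger a second respawn.

```python
USER_STOP_REASON = "Task stopped by user"


def needs_respawn(event):
    """True when a task has fully stopped for a reason other than a user stop."""
    detail = event.get("detail", {})
    return (
        detail.get("desiredStatus") == "STOPPED"
        and detail.get("lastStatus") == "STOPPED"
        and detail.get("stoppedReason") != USER_STOP_REASON
    )


def build_start_task_kwargs(event):
    """Build StartTask arguments that restart the task on the same instance."""
    detail = event["detail"]
    return {
        "cluster": detail["clusterArn"],
        "taskDefinition": detail["taskDefinitionArn"],
        "containerInstances": [detail["containerInstanceArn"]],
    }


def respawn_if_failed(ecs_client, event):
    """Call StartTask for a failed task; return the response, or None if skipped."""
    if not needs_respawn(event):
        return None
    return ecs_client.start_task(**build_start_task_kwargs(event))
```

Passing the client in as an argument keeps the decision logic testable without AWS credentials; in a Lambda function it would typically be created once at module load.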

Slide 24

Slide 24 text

Case2: Respawn failed tasks
e.g. Task terminated abnormally
[Diagram: ECS Cluster with two Container Instances, one of them running a Task (Container)]

    {
      "version": "0",
      "id": "faeb52d8-e2ef-a726-655a-80f2373046b9",
      "detail-type": "ECS Task State Change",
      "source": "aws.ecs",
      "account": "123456789012",
      "time": "2017-09-07T08:40:33Z",
      "region": "ap-northeast-1",
      "resources": [
        "arn:aws:ecs:ap-northeast-1:123456789012:task/d56d76f1-eb2a-42e5-..."
      ],
      "detail": {
        "clusterArn": "arn:aws:ecs:ap-northeast-1:123456789012:cluster/uorat-ecs-event-test",
        "containerInstanceArn": "arn:aws:ecs:ap-northeast-1:123456789012:container-instance/a04...",
        ...
        "desiredStatus": "STOPPED",
        ...
        "lastStatus": "STOPPED",
        ...
        "stoppedReason": "Essential container in task exited",
        ...
      }
    }

ECS Task Event

Slide 25

Slide 25 text

Case2: Respawn failed tasks
e.g. Task terminated abnormally
Lambda function: main.py

    def handle(event, context):
        ...
        if desire_status == "RUNNING" and last_status == "PENDING":
            # Task is starting: register the host in Zabbix.
            logger.info("Enable monitoring by Zabbix: host=%s" % (tag_name))
            zabbix_register(tag_name, private_ip)
        elif desire_status == "STOPPED":
            logger.info("Found the stopped task: task_arn=%s, last_status=%s" % (
                task_arn, last_status
            ))
            stopped_reason = event["detail"]["stoppedReason"]
            if stopped_reason == "Task stopped by user" and last_status == "RUNNING":
                # Normal stop: disable monitoring.
                logger.info("Disable monitoring by Zabbix: host=%s" % (tag_name))
                zabbix_disable(tag_name)
            elif stopped_reason != "Task stopped by user":
                # Abnormal stop: keep monitoring enabled and respawn the task.
                logger.warning("Found the failed task: host=%s, task_arn=%s, stopped_reason=%s" % (
                    tag_name, task_arn, stopped_reason
                ))
                respawn(task_arn, ec2_instance_id)
                logger.warning("Respawned and locked the task: host=%s, task_id=%s" % (
                    tag_name, task_arn
                ))

[Diagram: ECS Cluster with two Container Instances, one of them running a Task (Container); ecs:StartTask is issued]

Slide 26

Slide 26 text

Types of StoppedReason
Documented
▸ "Checking stopped tasks for errors"
▸ docs.aws.amazon.com/ja_jp/AmazonECS/latest/developerguide/stopped-task-errors.html

Slide 27

Slide 27 text

Summary

Slide 28

Slide 28 text

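Summary
ECS Events & Lambda is recommended if you:
▸ find the standard ECS Task Placement slightly lacking
▸ think k8s or Blox may be overkill
▸ want to rescue abnormally terminated Tasks
▸ want fine-grained, event-driven control over resources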

Slide 29

Slide 29 text

Summary
How about Fargate?
▸ Task Events flowed through without any problems
[Logo: AWS Fargate]

Slide 30

Slide 30 text

No content

Slide 31

Slide 31 text

RAGE Shadowverse World Grand Prix