Getting Started Casually with a Container Scheduler Using ECS Events & Lambda / 20171212_jawsug-container-lt
Taro Hirose
December 12, 2017
Technology
JAWS-UG Container Branch #10 - connpass
https://jawsug-container.connpass.com/event/71130/
Transcript
Getting Started Casually with a Container Scheduler Using ECS Events & Lambda
JAWS-UG Container Branch #10
2017.12.12 (Tue) LT
whoami
Taro Hirose
▸ OPENREC.tv / CyberZ, Inc.
▸ Backend Engineer
▸ id: @uorat
▸ http://uorat.hatenablog.com
ECS Events Introduction
What is an ECS Event?
CloudWatch Events that are sent in response to state changes of ECS Cluster resources
▸ State-change events can be received for:
  ▸ Container Instance
  ▸ Task
▸ Released 2016.11.25
  ▸ "Monitor Cluster State with Amazon ECS Event Stream" | Amazon Web Services Blog
  ▸ https://aws.amazon.com/jp/blogs/news/monitor-cluster-state-with-amazon-ecs-event-stream/
[Diagram labels: Amazon ECS, CloudWatch Events, Lambda, Event Stream, Events, SNS, Kinesis]
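To receive these events, a CloudWatch Events rule needs an event pattern that selects them. The following is a minimal sketch of such a pattern, not something shown in the talk. The `source` and `detail-type` values are the ones ECS uses for task and container instance events; the helper name `build_event_pattern` and the optional cluster filter are assumptions made for illustration.

```python
import json

# Detail types ECS publishes for the two resource kinds listed on the slide.
ECS_DETAIL_TYPES = [
    "ECS Task State Change",
    "ECS Container Instance State Change",
]


def build_event_pattern(cluster_arn=None):
    """Return a CloudWatch Events pattern that matches ECS state changes.

    cluster_arn is optional (hypothetical convenience): when given, only
    events whose detail.clusterArn equals it will match.
    """
    pattern = {"source": ["aws.ecs"], "detail-type": list(ECS_DETAIL_TYPES)}
    if cluster_arn:
        pattern["detail"] = {"clusterArn": [cluster_arn]}
    return pattern


def pattern_as_json(cluster_arn=None):
    """Serialize the pattern to the JSON string form a rule expects."""
    return json.dumps(build_event_pattern(cluster_arn))
```

The JSON string could then be passed as the `EventPattern` argument when creating the rule, for example through boto3's `events.put_rule`, with the Lambda function registered as the rule's target.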
What is an ECS Event?
e.g. Task launch
{
  "version": "0",
  "id": "451dda85-ca1a-9045-5121-7a12dfb9317f",
  "detail-type": "ECS Task State Change",
  "source": "aws.ecs",
  "account": "123456789012",
  "time": "2017-09-07T08:28:04Z",
  "region": "ap-northeast-1",
  "resources": [
    "arn:aws:ecs:ap-northeast-1:123456789012:task/b280d725-7382-43b8-a50d-ef909a36cb80"
  ],
  "detail": {
    "clusterArn": "arn:aws:ecs:ap-northeast-1:123456789012:cluster/uorat-ecs-event-test",
    "containerInstanceArn": "arn:aws:ecs:ap-northeast-1:123456789012:container-instance/ff83c4a8-67fc-4a13-8134-897c6dd2195a",
    ...
    "desiredStatus": "RUNNING",
    ...
    "lastStatus": "PENDING",
    ...
    "taskDefinitionArn": "arn:aws:ecs:ap-northeast-1:123456789012:task-definition/uorat-ecs-event-test:35",
    ...
  }
}
ECS Task Event
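What is an ECS Event?
What can you do with it?
▸ Quoted from "Monitor Cluster State with Amazon ECS Event Stream"
▸ https://aws.amazon.com/jp/blogs/news/monitor-cluster-state-with-amazon-ecs-event-stream/
▸ "With this information you can also automate container placement and scaling, and 'right-size' the cluster at a very fine-grained level. By sending cluster state information in near real time in an event-driven rather than pull-based way, the ECS event stream feature offers a very wide range of possibilities for monitoring and scaling container infrastructure."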
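What is an ECS Event?
Event-driven task deployment and system integration
▸ For example:
  ▸ Store task run history in Elasticsearch or DynamoDB and use it for analysis
  ▸ Run some processing when a task or container instance starts/stops
  ▸ Integrate with a monitoring system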
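As a rough illustration of the first example above, each task event could be flattened into one history record before it is written to a store such as DynamoDB. This is only a sketch: the function name `to_history_item`, the attribute names and the choice of keys are assumptions, not the speaker's schema.

```python
def to_history_item(event):
    """Flatten an ECS task state-change event into one history record.

    The attribute names below are hypothetical; pick whatever fits the
    table or index the records are written to.
    """
    detail = event["detail"]
    return {
        "task_arn": event["resources"][0],  # assumed partition key
        "event_time": event["time"],        # assumed sort key (ISO 8601 string)
        "cluster_arn": detail["clusterArn"],
        "container_instance_arn": detail["containerInstanceArn"],
        "desired_status": detail["desiredStatus"],
        "last_status": detail["lastStatus"],
        # Only present on events for tasks that are being stopped.
        "stopped_reason": detail.get("stoppedReason"),
    }
```

With boto3, a record of this shape could be written with `Table.put_item(Item=...)`; attributes whose value is `None` would likely be better omitted before writing.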
What is an ECS Event?
e.g.
▸ Container Scheduler for Amazon ECS
  ▸ Golang OSS released at re:Invent 2016
  ▸ Lets you implement a custom scheduler for an ECS Cluster
    ▸ Detecting ECS Cluster events
    ▸ ECS Cluster state
    ▸ Running a custom scheduler
    ▸ Providing a REST API
OPENREC.tv Case
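Case: OPENREC.tv
A video streaming media service specialized in games
▸ Low latency, high picture quality
▸ 90% of content is UGC
▸ Centered on user-driven live broadcasts
  ▸ The number of concurrent broadcasts and broadcast duration depend on the broadcasters
  ▸ Drawing power also depends on the broadcaster
  ▸ ∴ Load is hard to predict
▸ Unlimited concurrent viewers
  ▸ No so-called “…”
  ▸ Every user must be able to watch live on equal terms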
Architecture of live transcoding system
CloudWatch Events (scheduled/1min, ECS Event Stream) + Lambda + API
[Diagram labels: ECS Cluster; Broadcaster, Container Instance, Task (Container) (×3); Container Instance; Aurora; LIVE API; ELB; EC2; RDS; CloudWatch Events & Lambda; ec2:RunInstances, ecs:RunTask, ecs:StopTask, ec2:StopInstances, …; ECS Events]
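Architecture of live transcoding system
▸ A poor fit for EC2/ECS Auto Scaling
  ▸ RTMP = always-on connection
  ▸ Even when load is low, capacity cannot be scaled in while a broadcast is in progress
  ▸ A scheduler that scales on our own metric, "broadcast status", is needed
▸ The pain of Rolling Deploy
  ▸ When a broadcast ends is up to the user
  ▸ Basically a container cannot be taken down until its broadcast ends
  ▸ Some broadcasts run for 24 hours…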
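To make the idea of scaling on "broadcast status" concrete, here is a small sketch of the kind of decision such a scheduler might make on each run. It is not the speaker's implementation: the function names, the one-spare-instance headroom and the `broadcasts_per_instance` parameter are all assumptions.

```python
import math


def desired_instance_count(active_broadcasts, broadcasts_per_instance=1,
                           spare_instances=1):
    """Instances needed for the current broadcasts plus idle headroom."""
    if broadcasts_per_instance < 1:
        raise ValueError("broadcasts_per_instance must be >= 1")
    needed = math.ceil(active_broadcasts / broadcasts_per_instance)
    return needed + spare_instances


def scaling_action(active_broadcasts, running_instances, idle_instances,
                   broadcasts_per_instance=1, spare_instances=1):
    """Return ("launch" | "stop" | "noop", count) for one scheduler run."""
    target = desired_instance_count(
        active_broadcasts, broadcasts_per_instance, spare_instances
    )
    if target > running_instances:
        return ("launch", target - running_instances)
    # Only instances without a live broadcast may be stopped, however
    # low the overall load is.
    removable = min(running_instances - target, idle_instances)
    if removable > 0:
        return ("stop", removable)
    return ("noop", 0)
```

The point of the sketch is the `min(...)` in the scale-in branch: CPU-based auto scaling would not know which instances are still carrying a broadcast, whereas a custom scheduler can restrict scale-in to idle ones.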
Architecture of live transcoding system
Stateful application, but disposability
[Diagram labels: ECS Cluster; Broadcaster, Container Instance, Task (Container) (×3); Container Instance; Aurora; LIVE API; ELB; EC2; RDS; CloudWatch Events & Lambda; ec2:RunInstances, ecs:RunTask, ecs:StopTask, ec2:StopInstances, …; ECS Events]
Architecture of live transcoding system
Expire >> Drain >> Stop Container
[Diagram in three animation steps, with the labels ECS Cluster, Aurora, LIVE API, ELB, EC2, RDS, CloudWatch Events & Lambda (ec2:RunInstances, ecs:RunTask, ecs:StopTask, ec2:StopInstances, …) and ECS Events throughout: (1) three Broadcasters, each with a Container Instance and Task (Container); (2) two Broadcasters, plus one Container Instance and Task (Container) with no Broadcaster; (3) two Broadcasters, plus one Container Instance with no Task]
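One way to read "Expire >> Drain >> Stop" is as a small state machine that each container walks through. The sketch below is an interpretation under stated assumptions, not the talk's code: the state names, the `max_age` threshold and the function name `next_lifecycle_state` are all hypothetical.

```python
from datetime import datetime, timedelta


def next_lifecycle_state(state, launched_at, now, has_live_broadcast,
                         max_age=timedelta(hours=24)):
    """Advance one container through ACTIVE -> EXPIRED -> DRAINING -> STOPPED."""
    if state == "ACTIVE":
        # Expire: the container is old enough that it should be replaced.
        return "EXPIRED" if now - launched_at >= max_age else "ACTIVE"
    if state == "EXPIRED":
        # Drain: stop assigning new broadcasts to it.
        return "DRAINING"
    if state == "DRAINING":
        # Stop: only once the broadcast it was carrying has ended.
        return "DRAINING" if has_live_broadcast else "STOPPED"
    return state
```

Run once per scheduler tick, this keeps a stateful container alive for as long as its broadcast lasts while still guaranteeing that it is eventually replaced.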
Architecture of live transcoding system
AutoScale Container Instances according to the number of broadcasts / broadcast load
▸ To release, just `docker image push`; the new image then spreads on its own
[Diagram labels: ECS Cluster; Broadcaster, Container Instance, Task (Container) (×3); Container Instance; Aurora; LIVE API; ELB; EC2; RDS; CloudWatch Events & Lambda; ec2:RunInstances, ecs:RunTask, ecs:StopTask, ec2:StopInstances, …; ECS Events]
w/ ECS Events Case
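Case1: Monitoring Tasks
Automatic setup of Container/Application monitoring
▸ Turn monitoring ON/OFF according to Task state changes
  1. Task starts → start monitoring JVM and Wowza
  2. Task stops normally → stop monitoring
  3. Task stops abnormally → raise an alert without stopping monitoring
▸ The monitoring system reuses the existing Zabbix
  ▸ Call the Zabbix API in cases 1 and 2 above
  ▸ It is already in place, so it is cheap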
Case1: Monitoring Tasks
e.g. ecs:StopTask executed
{
  "version": "0",
  "id": "41f02974-8365-f955-8465-264ef8b189ca",
  "detail-type": "ECS Task State Change",
  "source": "aws.ecs",
  "account": "123456789012",
  "time": "2017-09-07T08:10:36Z",
  "region": "ap-northeast-1",
  "resources": [
    "arn:aws:ecs:ap-northeast-1:123456789012:task/55a0cfde-a377-4b97-…"
  ],
  "detail": {
    "clusterArn": "arn:aws:ecs:ap-northeast-1:123456789012:cluster/uorat-ecs-event-test",
    "containerInstanceArn": "arn:aws:ecs:ap-northeast-1:123456789012:container-instance/a04...",
    ...
    "desiredStatus": "STOPPED",
    ...
    "lastStatus": "RUNNING",
    ...
    "stoppedReason": "Task stopped by user",
    ...
  }
}
ECS Task Event
[Diagram labels: ECS Cluster; Container Instance, Task (Container) (×2); ecs:StopTask]
Case1: Monitoring Tasks
e.g. ecs:StopTask executed
Lambda function: main.py

    def handle(event, context):
        ...
        if desire_status == "RUNNING" and last_status == "PENDING":
            # Task is starting: enable Zabbix monitoring for its host.
            logger.info("Enable monitoring by Zabbix: host=%s" % (tag_name))
            zabbix_register(tag_name, private_ip)
        elif desire_status == "STOPPED":
            logger.info("Found the stopped task: task_arn=%s, last_status=%s" % (
                task_arn, last_status
            ))
            stopped_reason = event["detail"]["stoppedReason"]
            if stopped_reason == "Task stopped by user" and last_status == "RUNNING":
                # Normal stop through ecs:StopTask: disable monitoring.
                logger.info("Disable monitoring by Zabbix: host=%s" % (tag_name))
                zabbix_disable(tag_name)
            elif stopped_reason != "Task stopped by user":
                # Abnormal stop: leave monitoring on and respawn the task.
                logger.warning("Found the failed task: host=%s, task_arn=%s, stopped_reason=%s" % (
                    tag_name, task_arn, stopped_reason
                ))
                respawn(task_arn, ec2_instance_id)
                logger.warning("Respawned and locked the task: host=%s, task_id=%s" % (
                    tag_name, task_arn
                ))

[Diagram labels: ECS Cluster; Container Instance, Task (Container) (×2); ecs:StopTask]
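Case2: Respawn failed tasks
Kick abnormally terminated Tasks back to life
▸ Because connections are persistent, a Task that is mid-broadcast must be rescued
  ▸ Re-run the Task when "StoppedReason" is unexpected
  ▸ Needed because these are RunTask tasks, not Service tasks
▸ Also run the processing needed for the follow-up
  ▸ The abnormal-termination alert must still fire; monitoring is not disabled automatically
  ▸ Notify internal teams about the affected broadcasts
  ▸ Lock the Task / Instance for investigation and a permanent fix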
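The re-run step could be sketched roughly as follows. This is not the `respawn` function from the talk, whose body is not shown; it is a minimal illustration that assumes the replacement task is started on the same container instance with boto3's `ecs.start_task`, taking the ARNs from the event's `detail`. The helper names and the `started_by` label are hypothetical.

```python
def build_start_task_args(detail, started_by="respawn-lambda"):
    """Build keyword arguments for ecs.start_task from a stopped task's detail."""
    return {
        "cluster": detail["clusterArn"],
        "taskDefinition": detail["taskDefinitionArn"],
        # Assumption: pin the new task to the instance that hosted the failed one.
        "containerInstances": [detail["containerInstanceArn"]],
        "startedBy": started_by,
    }


def respawn_task(ecs_client, detail):
    """Start a replacement task; ecs_client is a boto3 ECS client (or a stub)."""
    return ecs_client.start_task(**build_start_task_args(detail))
```

Passing the client in as an argument keeps the function testable without AWS access. Since more than one state-change event can arrive for a single stop (the sample events show both `lastStatus: RUNNING` and `lastStatus: STOPPED` with `desiredStatus: STOPPED`), a real handler would probably also need a guard against re-running the same task twice.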
Case2: Respawn failed tasks
e.g. Task terminated abnormally
[Diagram labels: ECS Cluster; Container Instance; Container Instance, Task (Container)]
{
  "version": "0",
  "id": "faeb52d8-e2ef-a726-655a-80f2373046b9",
  "detail-type": "ECS Task State Change",
  "source": "aws.ecs",
  "account": "123456789012",
  "time": "2017-09-07T08:40:33Z",
  "region": "ap-northeast-1",
  "resources": [
    "arn:aws:ecs:ap-northeast-1:123456789012:task/d56d76f1-eb2a-42e5-..."
  ],
  "detail": {
    "clusterArn": "arn:aws:ecs:ap-northeast-1:123456789012:cluster/uorat-ecs-event-test",
    "containerInstanceArn": "arn:aws:ecs:ap-northeast-1:123456789012:container-instance/a04...",
    ...
    "desiredStatus": "STOPPED",
    ...
    "lastStatus": "STOPPED",
    ...
    "stoppedReason": "Essential container in task exited",
    ...
  }
}
ECS Task Event
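The sample events in this deck differ mainly in `desiredStatus`, `lastStatus` and `stoppedReason`, so the decision the Lambda handler makes can be isolated in a pure function and checked without any AWS or Zabbix calls. The sketch below mirrors the branching of the handler on the slides; the function name `classify_task_event` and the action labels are assumptions introduced for illustration.

```python
USER_STOP = "Task stopped by user"


def classify_task_event(detail):
    """Map the detail of an ECS task state-change event to an action name."""
    desired = detail.get("desiredStatus")
    last = detail.get("lastStatus")
    reason = detail.get("stoppedReason")
    if desired == "RUNNING" and last == "PENDING":
        return "enable_monitoring"
    if desired == "STOPPED":
        if reason == USER_STOP:
            # Act only while the task is still RUNNING, i.e. on the first
            # event of a user-initiated stop.
            return "disable_monitoring" if last == "RUNNING" else "ignore"
        # Any other reason is treated as an abnormal stop.
        return "alert_and_respawn"
    return "ignore"
```

For the event above (`desiredStatus` and `lastStatus` both `STOPPED`, reason `Essential container in task exited`) the function returns `"alert_and_respawn"`, which corresponds to the branch that calls `respawn` in the handler.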
Case2: Respawn failed tasks
e.g. Task terminated abnormally
Lambda function: main.py (the same handle(event, context) shown under Case1). For this event desire_status is "STOPPED" and stopped_reason is not "Task stopped by user", so the handler logs the failed task and calls respawn(task_arn, ec2_instance_id).
[Diagram labels: ECS Cluster; Container Instance; Container Instance, Task (Container); ecs:StartTask]
The kinds of StoppedReason are documented
▸ Checking stopped tasks for errors
▸ docs.aws.amazon.com/ja_jp/AmazonECS/latest/developerguide/stopped-task-errors.html
Summary
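Summary
ECS Events & Lambda is recommended if you:
▸ Find the standard ECS Task Placement a little lacking
  ▸ k8s or Blox may be overkill
▸ Want to rescue abnormally terminated Tasks
▸ Want fine-grained, event-driven control of resources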
Summary
How about Fargate?
▸ Task Events flowed through without any problems
AWS Fargate
RAGE Shadowverse World Grand Prix