Getting Started Casually with a Container Scheduler Using ECS Events & Lambda / 20171212_jawsug-container-lt

Taro Hirose
December 12, 2017

JAWS-UG Container Chapter #10 - connpass
https://jawsug-container.connpass.com/event/71130/

Transcript

  1. Getting Started Casually with a Container Scheduler Using ECS Events & Lambda

    JAWS-UG Container Chapter #10
    2017.12.12 (Tue) LT
  2. whoami

    廣瀬 太郎 / Taro Hirose
    ▸ OPENREC.tv / CyberZ, Inc.
    ▸ Backend Engineer
    ▸ id: @uorat
    ▸ http://uorat.hatenablog.com
  3. ECS Events Introduction

  4. What is an ECS Event?

    CloudWatch Events that are delivered in response to state changes of resources in an ECS Cluster
    ▸ State-change events can be received for:
    ▸ Container Instance
    ▸ Task
    ▸ Launched 2016.11.25
    ▸ "Monitor Cluster State with Amazon ECS Event Stream" | Amazon Web Services Blog
    ▸ https://aws.amazon.com/jp/blogs/news/monitor-cluster-state-with-amazon-ecs-event-stream/
    [Diagram labels: Amazon ECS; Event Stream; CloudWatch Events; Events; Lambda; SNS; Kinesis]
  5. What is an ECS Event?

    e.g. Task start (ECS Task Event)

    {
      "version": "0",
      "id": "451dda85-ca1a-9045-5121-7a12dfb9317f",
      "detail-type": "ECS Task State Change",
      "source": "aws.ecs",
      "account": "123456789012",
      "time": "2017-09-07T08:28:04Z",
      "region": "ap-northeast-1",
      "resources": [
        "arn:aws:ecs:ap-northeast-1:123456789012:task/b280d725-7382-43b8-a50d-ef909a36cb80"
      ],
      "detail": {
        "clusterArn": "arn:aws:ecs:ap-northeast-1:123456789012:cluster/uorat-ecs-event-test",
        "containerInstanceArn": "arn:aws:ecs:ap-northeast-1:123456789012:container-instance/ff83c4a8-67fc-4a13-8134-897c6dd2195a",
        ...
        "desiredStatus": "RUNNING",
        ...
        "lastStatus": "PENDING",
        ...
        "taskDefinitionArn": "arn:aws:ecs:ap-northeast-1:123456789012:task-definition/uorat-ecs-event-test:35",
        ...
      }
    }
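As a minimal sketch of how such an event might be picked up and read, the Python below pairs a CloudWatch Events rule pattern for `ECS Task State Change` with a helper that extracts the fields a handler usually branches on. The `matches` function is a deliberately simplified stand-in for the rule engine (top-level exact matches only), and `summarize` and `SAMPLE_EVENT` are illustrative names that do not come from the slides; the sample is the slide's event with its elided fields left out.

```python
import json

# Rule pattern in CloudWatch Events syntax: each key lists the accepted values.
TASK_STATE_CHANGE_PATTERN = {
    "source": ["aws.ecs"],
    "detail-type": ["ECS Task State Change"],
}


def matches(pattern, event):
    """Simplified matcher: top-level keys with exact-value lists only."""
    return all(event.get(key) in allowed for key, allowed in pattern.items())


def summarize(event):
    """Pull out the fields a scheduler or monitor typically branches on."""
    detail = event["detail"]
    return {
        "task_arn": event["resources"][0],
        "cluster_arn": detail["clusterArn"],
        "desired_status": detail["desiredStatus"],
        "last_status": detail["lastStatus"],
        # stoppedReason may be absent, e.g. while a task is still starting.
        "stopped_reason": detail.get("stoppedReason"),
    }


# The task-start event from the slide, with the elided fields left out.
SAMPLE_EVENT = json.loads("""
{
  "version": "0",
  "id": "451dda85-ca1a-9045-5121-7a12dfb9317f",
  "detail-type": "ECS Task State Change",
  "source": "aws.ecs",
  "account": "123456789012",
  "time": "2017-09-07T08:28:04Z",
  "region": "ap-northeast-1",
  "resources": [
    "arn:aws:ecs:ap-northeast-1:123456789012:task/b280d725-7382-43b8-a50d-ef909a36cb80"
  ],
  "detail": {
    "clusterArn": "arn:aws:ecs:ap-northeast-1:123456789012:cluster/uorat-ecs-event-test",
    "desiredStatus": "RUNNING",
    "lastStatus": "PENDING"
  }
}
""")

if __name__ == "__main__":
    if matches(TASK_STATE_CHANGE_PATTERN, SAMPLE_EVENT):
        print(summarize(SAMPLE_EVENT))
```

In a real setup the pattern would be attached to a CloudWatch Events rule with the Lambda function as its target, so the matching happens on the AWS side and the handler only needs the extraction step.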
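  6. What is an ECS Event?

    What can you do with it?
    ▸ Quoted from "Monitor Cluster State with Amazon ECS Event Stream"
    ▸ https://aws.amazon.com/jp/blogs/news/monitor-cluster-state-with-amazon-ecs-event-stream/
    ▸ "Using this information, you can also automate container placement and scaling, and 'right-size' your cluster at a very fine-grained level. By delivering cluster state information in near real time in an event-driven rather than pull-based way, the ECS event stream feature offers a very wide range of possibilities for monitoring and scaling container infrastructure."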
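  7. What is an ECS Event?

    Event-driven task deployment and system integration
    ▸ For example:
    ▸ Store task run history in Elasticsearch or DynamoDB and use it for analysis
    ▸ Run some processing whenever a task or container instance starts or stops
    ▸ Integrate with a monitoring system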
  8. What is an ECS Event?

    e.g.
    ▸ Container Scheduler for Amazon ECS
    ▸ OSS written in golang, released at re:Invent 2016
    ▸ Lets you implement a custom scheduler for an ECS Cluster
    ▸ Detects ECS Cluster events
    ▸ Tracks ECS Cluster state
    ▸ Runs custom schedulers
    ▸ Provides a REST API
  9. OPENREC.tv Case

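  10. Case: OPENREC.tv

    A video streaming service specialized in games
    ▸ Low latency, high picture quality
    ▸ 90% of the content is UGC
    ▸ Mostly user-driven live streams
    ▸ Both the number of concurrent streams and their duration depend on the streamers
    ▸ Audience draw also depends on the streamers
    ▸ ∴ Load is hard to predict
    ▸ Unlimited concurrent viewers
    ▸ No so-called "slots" (viewer caps)
    ▸ Every user must be able to watch live streams equally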
  11. None
  12. Architecture of live transcoding system

    CloudWatch Events (scheduled/1min, ECS Event Stream) + Lambda + API
    [Diagram labels: ECS Cluster; Broadcaster, Container Instance, Task (Container) (×3); Container Instance; Aurora; LIVE API; ELB; EC2; RDS; CloudWatch Events & Lambda; ec2:RunInstances, ecs:RunTask, ecs:StopTask, ec2:StopInstances, …; ECS Events]
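  13. Architecture of live transcoding system

    ▸ EC2/ECS Auto Scaling is a poor fit
    ▸ RTMP = persistent connection
    ▸ Even when load is low, we cannot scale in while a stream is live
    ▸ We need a scheduler that scales on our own metric, "streaming status"
    ▸ The pain of Rolling Deploy
    ▸ When a stream ends is also up to the user
    ▸ Basically, a task cannot be taken down until its stream ends
    ▸ Some streams even run for 24 hours…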
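The idea of scaling on "streaming status" rather than CPU load can be sketched as follows. This is only an illustration under assumed rules, not the scheduler described in the talk: `plan_scaling`, the `min_idle` spare-capacity parameter, and the per-task broadcast counts are all hypothetical. The one constraint taken from the slide is that a task with a live stream is never a scale-in candidate.

```python
def plan_scaling(broadcasts_per_task, min_idle=1):
    """Return (number_of_tasks_to_start, task_ids_safe_to_stop).

    broadcasts_per_task maps a task id to its count of live broadcasts.
    Tasks with a live broadcast are never stopped; only idle tasks beyond
    the min_idle spare capacity are offered for scale-in.
    """
    idle = sorted(task for task, count in broadcasts_per_task.items() if count == 0)
    if len(idle) < min_idle:
        # Not enough spare capacity for the next broadcaster: scale out.
        return min_idle - len(idle), []
    # Keep min_idle spare tasks and release the rest.
    return 0, idle[min_idle:]


if __name__ == "__main__":
    print(plan_scaling({"task-a": 1, "task-b": 0, "task-c": 0}))
    print(plan_scaling({"task-a": 1}))
```

A function like this could run from the scheduled (1 min) CloudWatch Events rule, with its result turned into `ecs:RunTask` and `ecs:StopTask` calls.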
  14. Architecture of live transcoding system

    Stateful application, but disposability
    [Diagram labels: ECS Cluster; Broadcaster, Container Instance, Task (Container) (×3); Container Instance; Aurora; LIVE API; ELB; EC2; RDS; CloudWatch Events & Lambda; ec2:RunInstances, ecs:RunTask, ecs:StopTask, ec2:StopInstances, …; ECS Events]
  15. Architecture of live transcoding system

    Expire >> Drain >> Stop Container
    [Diagram labels: ECS Cluster; Broadcaster, Container Instance, Task (Container) (×2); Container Instance; Aurora; LIVE API; ELB; EC2; RDS; CloudWatch Events & Lambda; ec2:RunInstances, ecs:RunTask, ecs:StopTask, ec2:StopInstances, …; ECS Events; and, listed last, Broadcaster, Container Instance, Task (Container)]
  16. Architecture of live transcoding system

    Expire >> Drain >> Stop Container
    [Diagram labels: ECS Cluster; Broadcaster, Container Instance, Task (Container) (×2); Container Instance; Aurora; LIVE API; ELB; EC2; RDS; CloudWatch Events & Lambda; ec2:RunInstances, ecs:RunTask, ecs:StopTask, ec2:StopInstances, …; ECS Events; and, listed last, Container Instance, Task (Container) with no Broadcaster]
  17. Architecture of live transcoding system

    Expire >> Drain >> Stop Container
    [Diagram labels: ECS Cluster; Broadcaster, Container Instance, Task (Container) (×2); Container Instance; Aurora; LIVE API; ELB; EC2; RDS; CloudWatch Events & Lambda; ec2:RunInstances, ecs:RunTask, ecs:StopTask, ec2:StopInstances, …; ECS Events; and, listed last, Container Instance with no Task]
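One way to read the Expire >> Drain >> Stop sequence across the last three slides is as a small per-task decision rule. The sketch below encodes that reading; it is an interpretation, not the talk's implementation. `TaskState`, its fields, and the action names are hypothetical, and the meaning given to "expired" (for example, a task running an outdated image) is an assumption.

```python
from dataclasses import dataclass


@dataclass
class TaskState:
    task_id: str
    expired: bool           # assumed meaning: marked outdated, e.g. old image
    active_broadcasts: int  # live RTMP connections currently on this task


def next_action(task):
    """Decide what to do with one task under Expire >> Drain >> Stop."""
    if not task.expired:
        return "keep"   # still eligible to receive new broadcasts
    if task.active_broadcasts > 0:
        return "drain"  # assign nothing new, wait for the stream to end
    return "stop"       # nothing is connected, safe to call ecs:StopTask


if __name__ == "__main__":
    for state in (
        TaskState("task-a", expired=False, active_broadcasts=1),
        TaskState("task-b", expired=True, active_broadcasts=1),
        TaskState("task-c", expired=True, active_broadcasts=0),
    ):
        print(state.task_id, next_action(state))
```

Because the stop decision depends only on whether a broadcaster is still connected, a long-running stream simply keeps its task in the drain state until it ends.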
  18. Architecture of live transcoding system

    Containers and Instances are auto-scaled according to the number of streams and the streaming load
    ▸ Releases too: just `docker image push` and the new image rolls out on its own
    [Diagram labels: ECS Cluster; Broadcaster, Container Instance, Task (Container) (×3); Container Instance; Aurora; LIVE API; ELB; EC2; RDS; CloudWatch Events & Lambda; ec2:RunInstances, ecs:RunTask, ecs:StopTask, ec2:StopInstances, …; ECS Events]
  19. w/ ECS Events Case

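  20. Case1: Monitoring Tasks

    Automatic setup of Container/Application monitoring
    ▸ Monitoring is switched ON/OFF according to Task state changes
    1. Task started → start monitoring the JVM and Wowza layers
    2. Task stopped normally → stop monitoring
    3. Task stopped abnormally → keep monitoring on and raise an alert
    ▸ The monitoring system reuses the existing Zabbix
    ▸ Steps 1 and 2 above call the Zabbix API
    ▸ It is what we already had, so it is cheap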
  21. Case1: Monitoring Tasks

    e.g. ecs:StopTask executed (ECS Task Event)

    {
      "version": "0",
      "id": "41f02974-8365-f955-8465-264ef8b189ca",
      "detail-type": "ECS Task State Change",
      "source": "aws.ecs",
      "account": "123456789012",
      "time": "2017-09-07T08:10:36Z",
      "region": "ap-northeast-1",
      "resources": [
        "arn:aws:ecs:ap-northeast-1:123456789012:task/55a0cfde-a377-4b97-…"
      ],
      "detail": {
        "clusterArn": "arn:aws:ecs:ap-northeast-1:123456789012:cluster/uorat-ecs-event-test",
        "containerInstanceArn": "arn:aws:ecs:ap-northeast-1:123456789012:container-instance/a04...",
        ...
        "desiredStatus": "STOPPED",
        ...
        "lastStatus": "RUNNING",
        ...
        "stoppedReason": "Task stopped by user",
        ...
      }
    }

    [Diagram labels: ECS Cluster; Container Instance, Task (Container) (×2); ecs:StopTask]
  22. Case1: Monitoring Tasks

    e.g. ecs:StopTask executed
    Lambda function: main.py

    def handle(event, context):
        ...
        if desire_status == "RUNNING" and last_status == "PENDING":
            logger.info("Enable monitoring by Zabbix: host=%s" % (tag_name))
            zabbix_register(tag_name, private_ip)
        elif desire_status == "STOPPED":
            logger.info("Found the stopped task: task_arn=%s, last_status=%s" % (
                task_arn, last_status
            ))
            stopped_reason = event["detail"]["stoppedReason"]
            if stopped_reason == "Task stopped by user" and last_status == "RUNNING":
                logger.info("Disable monitoring by Zabbix: host=%s" % (tag_name))
                zabbix_disable(tag_name)
            elif stopped_reason != "Task stopped by user":
                logger.warn("Found the failed task: host=%s, task_arn=%s, stopped_reason=%s" % (
                    tag_name, task_arn, stopped_reason
                ))
                respawn(task_arn, ec2_instance_id)
                logger.warn("Respawned and locked the task: host=%s, task_id=%s" % (
                    tag_name, task_arn
                ))

    [Diagram labels: ECS Cluster; Container Instance, Task (Container) (×2); ecs:StopTask]
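  23. Case2: Respawn failed tasks

    Wake up Tasks that terminated abnormally
    ▸ Connections are persistent, so a Task that is mid-stream must be rescued
    ▸ If "StoppedReason" is not an expected one, re-run the Task
    ▸ Needed because we use RunTask rather than a Service task
    ▸ Run the processing required for the follow-up
    ▸ Abnormal termination must raise an alert, so monitoring is not disabled automatically
    ▸ Notify internal staff about the affected streams
    ▸ Lock the Task / Instance for investigation and a permanent fix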
  24. Case2: Respawn failed tasks

    e.g. Task terminated abnormally (ECS Task Event)
    [Diagram labels: ECS Cluster; Container Instance; Container Instance, Task (Container)]

    {
      "version": "0",
      "id": "faeb52d8-e2ef-a726-655a-80f2373046b9",
      "detail-type": "ECS Task State Change",
      "source": "aws.ecs",
      "account": "123456789012",
      "time": "2017-09-07T08:40:33Z",
      "region": "ap-northeast-1",
      "resources": [
        "arn:aws:ecs:ap-northeast-1:123456789012:task/d56d76f1-eb2a-42e5-..."
      ],
      "detail": {
        "clusterArn": "arn:aws:ecs:ap-northeast-1:123456789012:cluster/uorat-ecs-event-test",
        "containerInstanceArn": "arn:aws:ecs:ap-northeast-1:123456789012:container-instance/a04...",
        ...
        "desiredStatus": "STOPPED",
        ...
        "lastStatus": "STOPPED",
        ...
        "stoppedReason": "Essential container in task exited",
        ...
      }
    }
  25. Case2: Respawn failed tasks

    e.g. Task terminated abnormally
    Lambda function: main.py

    def handle(event, context):
        ...
        if desire_status == "RUNNING" and last_status == "PENDING":
            logger.info("Enable monitoring by Zabbix: host=%s" % (tag_name))
            zabbix_register(tag_name, private_ip)
        elif desire_status == "STOPPED":
            logger.info("Found the stopped task: task_arn=%s, last_status=%s" % (
                task_arn, last_status
            ))
            stopped_reason = event["detail"]["stoppedReason"]
            if stopped_reason == "Task stopped by user" and last_status == "RUNNING":
                logger.info("Disable monitoring by Zabbix: host=%s" % (tag_name))
                zabbix_disable(tag_name)
            elif stopped_reason != "Task stopped by user":
                logger.warn("Found the failed task: host=%s, task_arn=%s, stopped_reason=%s" % (
                    tag_name, task_arn, stopped_reason
                ))
                respawn(task_arn, ec2_instance_id)
                logger.warn("Respawned and locked the task: host=%s, task_id=%s" % (
                    tag_name, task_arn
                ))

    [Diagram labels: ECS Cluster; Container Instance; Container Instance, Task (Container); ecs:StartTask]
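The slides do not show the body of `respawn`, so the following is only a guess at its core step, written with a different name and signature to keep it distinct from the author's helper. It builds the arguments for boto3's ECS `start_task` call from the stopped task's event, on the assumption that the replacement runs on the same container instance with the same task definition. `build_start_task_request`, `respawn_from_event`, the `started_by` value, and `FakeEcsClient` are hypothetical; the fake client stands in for `boto3.client("ecs")` so that the sketch runs without AWS access.

```python
def build_start_task_request(event, started_by="respawn-lambda"):
    """Build ecs.start_task arguments from an ECS Task State Change event.

    Assumption: the replacement task is pinned to the same container
    instance and uses the same task definition as the stopped task.
    """
    detail = event["detail"]
    return {
        "cluster": detail["clusterArn"],
        "taskDefinition": detail["taskDefinitionArn"],
        "containerInstances": [detail["containerInstanceArn"]],
        "startedBy": started_by,
    }


def respawn_from_event(ecs_client, event):
    """Start a replacement task; ecs_client is a boto3 ECS client or a stand-in."""
    return ecs_client.start_task(**build_start_task_request(event))


class FakeEcsClient:
    """Test double that records calls instead of talking to AWS."""

    def __init__(self):
        self.calls = []

    def start_task(self, **kwargs):
        self.calls.append(kwargs)
        return {"tasks": [], "failures": []}


if __name__ == "__main__":
    stopped_event = {
        "detail": {
            "clusterArn": "arn:aws:ecs:ap-northeast-1:123456789012:cluster/uorat-ecs-event-test",
            "containerInstanceArn": "arn:aws:ecs:ap-northeast-1:123456789012:container-instance/example",
            "taskDefinitionArn": "arn:aws:ecs:ap-northeast-1:123456789012:task-definition/uorat-ecs-event-test:35",
            "stoppedReason": "Essential container in task exited",
        }
    }
    client = FakeEcsClient()
    respawn_from_event(client, stopped_event)
    print(client.calls[0])
```

Passing the client in as a parameter keeps the request-building logic testable; alerting, notification, and locking of the failed Task / Instance are left out of this sketch.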
  26. Types of StoppedReason

    Documented
    ▸ "Checking Stopped Tasks for Errors"
    ▸ docs.aws.amazon.com/ja_jp/AmazonECS/latest/developerguide/stopped-task-errors.html

  27. Summary

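  28. Summary

    ECS Events & Lambda is recommended for you if:
    ▸ ECS's standard Task Placement falls a little short
    ▸ k8s or Blox might be overkill
    ▸ You want to rescue Tasks that terminated abnormally
    ▸ You want fine-grained, event-driven control of resources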
  29. Summary

    How about Fargate?
    ▸ Task Events came through without any problems
    [Diagram labels: AWS Fargate]
  30. None
  31. RAGE Shadowverse World Grand Prix