How we migrated the batch platform behind ad delivery to serverless (ECS Fargate, Step Functions) @ Serverless Meetup Tokyo #16

Jumpei Chikamori

February 27, 2020

Transcript

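  1. How we migrated the batch platform behind ad delivery to serverless: ECS Fargate, Step Functions. Serverless Meetup Tokyo #16

    2020/02/27
  2. Jumpei Chikamori (pei0804), engineer at Zucks (VOYAGE GROUP). I joined as a new graduate in 2018 and work on everything from operations to development. Recently I have been working on cross-team security improvements and DSP development. My favorite kind of service is a fully managed one.

  3. Agenda
    • Problems with the legacy batch server
    • Serverless migration
    • Implementation
    • Monitoring
    • How it has gone in operation

  4. Problems with the legacy batch server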

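  5. The legacy batch server
    (crontab excerpt; the schedule fields, numeric arguments and some punctuation were lost in extraction)

    cron-user timeout … flock -w … hoge.lock make run hoge
    cron-user timeout … flock -w … fuga.lock make run fuga
    cron-user timeout … flock -w … fuga.lock make run foo

    Implementation details will differ from team to team.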
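  6. Operations tend to get difficult

  7. Why operations are difficult
    • There is a lot to think about when a problem occurs.
    • Does anyone know how to restore the server?

  8. There is a lot to think about when a problem occurs
    • When a batch job fails, you have to work out whether the host is at fault or the code is suspect.
    • Many different cron jobs may be running, so simply relaunching the instance is not always enough. A batch job may be left half-running.

  9. Does anyone know how to restore the server?
    • Knowledge of how things run is often held by a few individuals.
    • Relaunching is not a routine operation, so nobody can do it quickly.
    • Some parts are not managed as code.

  10. What we want from batch processing
    We just want to write jobs that run periodically. Ideally, we do not want to care about the host.

  11. We want a platform that is easy to work with

  12. What work does the batch server actually do?

  13. On investigation, the batch jobs fell into three categories.

  14. Kinds of batch jobs
    • Continuous execution
    • Scheduled execution
    • Jobs with hidden dependencies
    Each batch job expects that it never runs concurrently with itself.

  15. Kinds of batch jobs
    • Continuous execution
    • Scheduled execution
    • Jobs with hidden dependencies
    Each batch job expects that it never runs concurrently with itself.
    With the services AWS provides today, could we not build a platform that is easier to work with?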
  16. Services that look usable for the new platform
    • ECS (+Fargate)
    • Step Functions

  17. ECS
    • Amazon Elastic Container Service (Amazon ECS) is a fully managed container orchestration service. https://aws.amazon.com/jp/ecs/
    • Combined with Fargate, it can be used without managing hosts (serverless).

  18. How ECS works (diagram)

  19. Task definition example

    {
      "executionRoleArn": "arn:aws:iam::000000:role/execRole",
      "containerDefinitions": [
        {
          "logConfiguration": {
            "logDriver": "awslogs",
            "options": {...}
          },
          "command": [ "make", "run" ],
          "image": "00000.dkr.ecr.ap-northeast-1.amazonaws.com/batch:production",
          "name": "hoge"
        }
      ],
      "memory": "2048",
      "taskRoleArn": "arn:aws:iam::895849419934:role/taskRole",
      "family": "hoge",
      "requiresCompatibilities": [ "FARGATE" ],
      "networkMode": "awsvpc",
      "cpu": "512",
      "volumes": []
    }

    aws ecs register-task-definition --cli-input-json file://task.json
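  20. About Fargate

  21. What is good about Fargate
    We have been moving batch jobs to Fargate step by step since around December 2018, and so far Fargate itself has not caused us any trouble. Because it is serverless, we are no longer bothered by unhealthy hosts, and troubleshooting has become simpler than before.

  22. Cases where Fargate does not fit
    Startup is slower than ECS running on EC2, so we do not recommend it for batch jobs where startup time matters. The total slowdown also depends on image size, so please measure it in your own environment.

  23. Step Functions
    • Lets you design and run workflows that connect services such as AWS Lambda, AWS Fargate and Amazon SageMaker into feature-rich applications. https://aws.amazon.com/jp/step-functions/

  24. How Step Functions works (diagram)

  25. This might just work?

  26. Serverless migration

  27. Kinds of batch jobs (recap)
    • Continuous execution
    • Scheduled execution
    • Jobs with hidden dependencies
    Each batch job expects that it never runs concurrently with itself.

  28. Continuous execution

  29. What is continuous execution?
    Run the batch job again as soon as possible.
    cron implementation example:

    * * * * * * cron-user flock -w 1 hoge.lock make run

  30. (Image-only slide; no text.)
  31. ECS Service
    • Lets you run and maintain a specified number of instances simultaneously. https://docs.aws.amazon.com/ja_jp/AmazonECS/latest/developerguide/ecs_services.html

  32. ECS Service flow: example with a desired count of 1
    1. The task count is 0, so task A is launched and the task count becomes 1.
    2. Task A finishes its work. The task count becomes 0.
    3. The task count is 0, so task A is launched again, and so on.

    This cycle repeats.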
  33. Service definition example

    {
      "cluster": "batch",
      "taskDefinition": "arn:aws:ecs:ap-northeast-1:000000:task-definition/hoge",
      "networkConfiguration": {
        "awsvpcConfiguration": {
          "assignPublicIp": "ENABLED",
          "securityGroups": [ "sg-hoge" ],
          "subnets": [ "subnet-hoge" ]
        }
      },
      "desiredCount": 1
    }
    ("desiredCount" is the desired number of tasks.)

    aws ecs create-service --service-name hoge \
      --launch-type FARGATE --cli-input-json file://service.json
  34. Benefit of this pattern
    • "Run it as soon as possible" can be expressed simply.
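The loop described on slide 32 can be pictured with a toy simulation. This is only a sketch of the "keep the desired count running" idea, not how the ECS scheduler is implemented; `simulate_service` and its parameters are hypothetical names introduced for illustration.

```python
from __future__ import annotations


def simulate_service(desired_count: int, task_durations: list[int], ticks: int) -> list[int]:
    """Toy model of a service that keeps `desired_count` tasks running.

    On every tick, running tasks do one unit of work, finished tasks are
    dropped, and new tasks are launched until the desired count is met.
    Returns how many tasks were launched on each tick.
    """
    running: list[int] = []  # remaining work units of each running task
    launched_per_tick: list[int] = []
    next_task = 0
    for _ in range(ticks):
        # Advance running tasks and drop the ones that have finished.
        running = [left - 1 for left in running]
        running = [left for left in running if left > 0]
        # Launch replacements until the desired count is reached again.
        launched = 0
        while len(running) < desired_count:
            running.append(task_durations[next_task % len(task_durations)])
            next_task += 1
            launched += 1
        launched_per_tick.append(launched)
    return launched_per_tick


if __name__ == "__main__":
    # One task that takes two ticks: it is relaunched every time it finishes.
    print(simulate_service(desired_count=1, task_durations=[2], ticks=5))
```

With a desired count of 1 and a task that takes two ticks, a new task is launched on ticks 0, 2 and 4, which is the back-to-back behaviour the slides rely on.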

  35. Scheduled execution

  36. What is scheduled execution?
    Run the batch job at a specified time.
    cron implementation example:

    45 * * * * * cron-user flock -w 1 hoge.lock make run

  37. (Image-only slide; no text.)
  38. CloudWatch Events ECS Task Schedule
    • Tasks can be run on a cron-like schedule. https://docs.aws.amazon.com/ja_jp/AmazonECS/latest/developerguide/scheduling_tasks.html

  39. Scheduler configuration
    Rule: when to run
    Targets: what to run
    Attach the Targets to the Rule.
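When moving a cron entry into a Rule, the schedule has to be rewritten as a CloudWatch Events cron expression, which has six fields (minutes, hours, day-of-month, month, day-of-week, year) and requires `?` in either the day-of-month or the day-of-week field. The helper below is a minimal sketch of that conversion for simple five-field entries; `to_events_cron` is a hypothetical name, and it does not translate numeric day-of-week values, whose numbering differs between the two formats. Schedule expressions are evaluated in UTC, so hour fields may also need shifting by hand.

```python
def to_events_cron(unix_cron: str) -> str:
    """Convert a five-field Unix cron schedule to a CloudWatch Events expression.

    Only the field layout is converted. Numeric day-of-week values are passed
    through unchanged, so prefer names such as MON-FRI.
    """
    fields = unix_cron.split()
    if len(fields) != 5:
        raise ValueError("expected five fields: minute hour day-of-month month day-of-week")
    minute, hour, day_of_month, month, day_of_week = fields

    # One of the two day fields must be "?" in a CloudWatch Events expression.
    if day_of_week == "*":
        day_of_week = "?"
    elif day_of_month == "*":
        day_of_month = "?"
    else:
        raise ValueError("day-of-month and day-of-week cannot both be restricted")

    # The trailing "*" is the year field.
    return f"cron({minute} {hour} {day_of_month} {month} {day_of_week} *)"


if __name__ == "__main__":
    print(to_events_cron("35 * * * *"))
```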

  40. Rule

    {
      "Name": "hoge",
      "ScheduleExpression": "cron(35 * * * ? *)",
      "State": "ENABLED",
      "Description": "comment"
    }

    aws events put-rule --name hoge --cli-input-json file://rule.json
  41. Targets

    {
      "Targets": [
        {
          "Id": "hoge",
          "Arn": "arn:aws:ecs:ap-northeast-1:00000:cluster/batch",
          "RoleArn": "arn:aws:iam::895849419934:role/ecsEventsRole",
          "Input": "{}",
          "EcsParameters": {
            "TaskDefinitionArn": "arn:aws:ecs:ap-northeast-1:000000:task-definition/hoge",
            "TaskCount": 1,
            "LaunchType": "FARGATE",
            "NetworkConfiguration": {
              "awsvpcConfiguration": {
                "Subnets": [ "subnet-hoge" ],
                "SecurityGroups": [ "sg-hoge" ],
                "AssignPublicIp": "ENABLED"
              }
            },
            "PlatformVersion": "LATEST"
          }
        }
      ]
    }

    aws events put-targets --rule hoge --cli-input-json file://targets.json
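  42. A rule can be triggered more than once.
    https://docs.aws.amazon.com/ja_jp/AmazonCloudWatch/latest/events/CWE_Troubleshooting.html#RuleTriggeredMoreThanOnce

  43. (Image-only slide; no text.)
  44. dlock
    A flock-like tool for distributed systems, written in Go. It is an internal tool developed so that jobs from the legacy batch server can be brought over to ECS and similar services as they are. It uses a DynamoDB table to manage locks. The table uses on-demand capacity so that we do not have to think about capacity.
    Note: the best option is to implement jobs so that they do not need mutual exclusion at all.
    It is not open source, but we may consider publishing it if there is demand.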

  45. dlock
    Run the following as the task command (some punctuation was lost in extraction):

    dlock --region ap-northeast-1 run dlock hoge.lock make run hoge
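dlock itself is not public, so the following is not its implementation. It is a minimal sketch of one common way to build a flock-like lock on a DynamoDB table: a conditional `PutItem` that only succeeds when no unexpired lock item exists. The table layout (`lock_name`, `owner`, `expires_at`) and both function names are assumptions made for illustration; `client` is expected to behave like a boto3 DynamoDB client.

```python
import time


def build_lock_request(table, lock_name, owner, ttl_seconds, now=None):
    """Build PutItem arguments that take the lock only if it is free or expired."""
    now = int(time.time()) if now is None else now
    return {
        "TableName": table,
        "Item": {
            "lock_name": {"S": lock_name},                     # hypothetical partition key
            "owner": {"S": owner},                             # who holds the lock
            "expires_at": {"N": str(now + ttl_seconds)},       # stale-lock cutoff
        },
        # The write is rejected while another unexpired lock item exists.
        "ConditionExpression": "attribute_not_exists(lock_name) OR expires_at < :now",
        "ExpressionAttributeValues": {":now": {"N": str(now)}},
    }


def try_acquire(client, request):
    """Return True if the lock was taken, False if someone else holds it."""
    try:
        client.put_item(**request)
        return True
    except Exception as err:
        # botocore's ClientError carries the error code in err.response.
        code = getattr(err, "response", {}).get("Error", {}).get("Code")
        if code == "ConditionalCheckFailedException":
            return False
        raise
```

A wrapper in this style would call `try_acquire` before running the job and delete the item afterwards; the expiry attribute keeps a crashed task from holding the lock forever.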

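  46. Benefits of this pattern
    • Batch jobs that used cron can be brought over as they are.
    • Combined with dlock, mutual exclusion is also possible.

  47. Jobs with hidden dependencies

  48. Jobs with hidden dependencies
    The cron as written:
    Batch A starts at minute 20 of every hour.
    Batch B starts at minute 40 of every hour.
    Batch A → Batch B: batch B works from the results of batch A.
    cron does not show it, but behind the scenes B depends on A.

  49. Jobs with hidden dependencies
    If you express the dependency naively with cron, then when an upstream batch job fails, the downstream job fails too, and so does the one after it.
    In addition, unless you know the dependencies between the batch jobs well, you cannot tell how to rerun them.
    Even if you want this handled automatically, you have to build something like a retry mechanism yourself.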

  50. (Image-only slide; no text.)
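The A-then-B dependency from slide 48 could be written as a Step Functions state machine along the following lines. This is a sketch under assumptions, not the definition used in the talk: the state names, the `batch-a` and `batch-b` task definitions and the choice to retry on `States.TaskFailed` are all hypothetical, while the cluster, subnet and security-group placeholders are borrowed from the earlier slides. `retry_delays` only illustrates how `IntervalSeconds`, `MaxAttempts` and `BackoffRate` combine.

```python
from __future__ import annotations

import json

# Step Functions service integration that runs an ECS task and waits for it.
RUN_TASK_SYNC = "arn:aws:states:::ecs:runTask.sync"
RETRY = {"ErrorEquals": ["States.TaskFailed"], "IntervalSeconds": 3,
         "MaxAttempts": 2, "BackoffRate": 1.5}


def ecs_task_state(task_definition: str, next_state: str | None = None) -> dict:
    """One Task state that runs a Fargate task to completion, with retries."""
    state = {
        "Type": "Task",
        "Resource": RUN_TASK_SYNC,
        "Parameters": {
            "LaunchType": "FARGATE",
            "Cluster": "arn:aws:ecs:ap-northeast-1:000000:cluster/batch",
            "TaskDefinition": task_definition,
            "NetworkConfiguration": {
                "AwsvpcConfiguration": {
                    "Subnets": ["subnet-hoge"],
                    "SecurityGroups": ["sg-hoge"],
                    "AssignPublicIp": "ENABLED",
                }
            },
        },
        "Retry": [dict(RETRY)],
    }
    if next_state is None:
        state["End"] = True
    else:
        state["Next"] = next_state
    return state


def build_state_machine() -> dict:
    """Batch B starts only after batch A has finished successfully."""
    prefix = "arn:aws:ecs:ap-northeast-1:000000:task-definition/"
    return {
        "Comment": "Run batch A, then batch B (hypothetical example)",
        "StartAt": "BatchA",
        "States": {
            "BatchA": ecs_task_state(prefix + "batch-a", next_state="BatchB"),
            "BatchB": ecs_task_state(prefix + "batch-b"),
        },
    }


def retry_delays(interval_seconds: float, max_attempts: int, backoff_rate: float) -> list[float]:
    """Seconds waited before each retry attempt."""
    return [interval_seconds * backoff_rate ** n for n in range(max_attempts)]


if __name__ == "__main__":
    print(json.dumps(build_state_machine(), indent=2))
    print(retry_delays(3, 2, 1.5))
```

With these retry values a failed task is retried after 3 seconds and then after 4.5 seconds before the execution is marked as failed.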
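  51. Benefits of this pattern
    • Job dependencies can be expressed directly.
    • Retries and error handling are easy to implement with the standard features of Step Functions. https://docs.aws.amazon.com/ja_jp/step-functions/latest/dg/concepts-error-handling.html

    "Retry": [ {
      "ErrorEquals": [ "States.Timeout" ],
      "IntervalSeconds": 3,
      "MaxAttempts": 2,
      "BackoffRate": 1.5
    } ]

    IntervalSeconds: seconds to wait before the first retry
    MaxAttempts: maximum number of retries
    BackoffRate: multiplier by which the retry interval grows with each attempt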
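  52. Kinds of batch jobs (recap)
    • Continuous execution
    • Scheduled execution
    • Jobs with hidden dependencies
    Each batch job expects that it never runs concurrently with itself.

  53. Serverless can handle all of it

  54. Monitoring

  55. What we want to know
    • Are ECS execution errors occurring?
    • Is a Step Functions state machine workflow failing partway through?
    • Is CloudWatch Events able to invoke the ECS task?

  56. Are ECS execution errors occurring?

  57. Monitoring ECS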
  58. ECS STOPPED event (SAM)

    Resources:
      Func:
        Type: AWS::Serverless::Function
        Properties:
          Runtime: python3.8
          MemorySize: 128
          Timeout: 60
          Handler: main.handle
          Role: !GetAtt FuncRole.Arn
          CodeUri: ./
          Environment:
            Variables:
              SLACK_SUCCESS_CHANNEL: !Ref SlackSuccessChannel
              SLACK_FAILURE_CHANNEL: !Ref SlackFailureChannel
              SLACK_HOOK_URL: !Ref SlackHookUrl
          Events:
            ECSTask:
              Type: CloudWatchEvent
              Properties:
                Pattern:
                  source:
                    - "aws.ecs"
                  detail-type:
                    - "ECS Task State Change"
                  detail:
                    lastStatus:
                      - "STOPPED"
  59. Information delivered in the event

    {
      ...
      "detail": {
        "clusterArn": "arn:aws:ecs:ap-northeast-1:951472794671:cluster/name",
        "containerInstanceArn": "arn:aws:ecs:ap-northeast-1:951472794671:container-instance/aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee",
        "containers": [
          {
            "containerArn": "arn:aws:ecs:ap-northeast-1:951472794671:container/aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee",
            "exitCode": 0,
            "lastStatus": "STOPPED",
            "name": "container_one",
            "taskArn": "arn:aws:ecs:ap-northeast-1:951472794671:task/aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee"
          },
          {
            "containerArn": "arn:aws:ecs:ap-northeast-1:951472794671:container/aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee",
            "lastStatus": "STOPPED",
            "name": "container_two",
            "reason": "CannotStartContainerError: API error (500): cannot start a stopped process: unknown\n",
            "taskArn": "arn:aws:ecs:ap-northeast-1:951472794671:task/aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee"
          }
        ]
      }
      ...
    }

    exitCode: shows whether the container succeeded.
    reason: the reason for the failure.
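A handler for this event mainly has to decide whether the stopped task succeeded and where to report it. The functions below are a minimal sketch of that decision, assuming that a container counts as failed when its `exitCode` is missing or non-zero; the function names are hypothetical, and posting to Slack is left out. The environment variable names come from the SAM template on slide 58.

```python
import os


def summarize_stopped_task(event):
    """Classify an "ECS Task State Change" event as success or failure.

    A container is treated as failed when it has no exit code (it never
    started) or a non-zero exit code. This rule is an assumption.
    """
    containers = event.get("detail", {}).get("containers", [])
    failures = [
        {
            "name": container.get("name"),
            "exitCode": container.get("exitCode"),
            "reason": container.get("reason"),
        }
        for container in containers
        if container.get("exitCode") != 0
    ]
    # An event without any containers is not reported as a success.
    return {"succeeded": bool(containers) and not failures, "failures": failures}


def pick_channel(summary, env=None):
    """Choose the Slack channel from the variables set in the SAM template."""
    env = os.environ if env is None else env
    key = "SLACK_SUCCESS_CHANNEL" if summary["succeeded"] else "SLACK_FAILURE_CHANNEL"
    return env.get(key, "")
```

Applied to the sample event above, `container_two` has no exit code and a `CannotStartContainerError` reason, so the task would be reported to the failure channel.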
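  60. Our setup (example)

  61. Is a Step Functions state machine workflow failing partway through?

  62. Is a Step Functions state machine workflow failing partway through?
    The ExecutionsFailed metric shows that an execution failed partway through. We do not watch the detailed state transitions of the state machine, but they can also be obtained through CloudWatch Events.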
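The talk does not show how ExecutionsFailed is watched. One option is a CloudWatch alarm on that metric, and the sketch below builds the arguments for CloudWatch's `PutMetricAlarm` API under that assumption. The alarm name, period, threshold and SNS topic are hypothetical choices, and `executions_failed_alarm` is a name introduced for illustration.

```python
def executions_failed_alarm(state_machine_arn, sns_topic_arn):
    """Arguments for a CloudWatch alarm that fires on any failed execution."""
    # Derive a readable alarm name from the state machine name in the ARN.
    name = state_machine_arn.rsplit(":", 1)[-1]
    return {
        "AlarmName": f"{name}-executions-failed",
        "Namespace": "AWS/States",
        "MetricName": "ExecutionsFailed",
        "Dimensions": [{"Name": "StateMachineArn", "Value": state_machine_arn}],
        "Statistic": "Sum",
        "Period": 300,                       # evaluate in five-minute windows
        "EvaluationPeriods": 1,
        "Threshold": 1,                      # a single failure is enough
        "ComparisonOperator": "GreaterThanOrEqualToThreshold",
        "TreatMissingData": "notBreaching",  # no executions means no alarm
        "AlarmActions": [sns_topic_arn],
    }
```

The resulting dictionary could be passed to a boto3 CloudWatch client as `client.put_metric_alarm(**params)`.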

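  63. Is CloudWatch Events able to invoke the ECS task?

  64. Is CloudWatch Events able to invoke the ECS task?
    The FailedInvocations metric lets you notice failed invocations. An invocation typically fails when the role given as the RoleArn of the Targets lacks permissions, or when that role no longer exists. Assuming everything is stable because no alerts arrive, only to find that invocations have been failing all along, is no laughing matter (I have done exactly that).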
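  65. How has it gone in operation?

  66. Benefits we gained

  67. Benefits we gained
    • We now have an execution environment that is always disposable. During the EC2 outage on August 23, 2019, jobs failed temporarily but recovered automatically. https://www.itmedia.co.jp/news/articles/1908/28/news127_2.html
    • Anyone who knows ECS and Step Functions can understand how the system behaves.

  68. Small but welcome wins
    • One role per batch job, so each job can be granted the minimum permissions.
    • Compute resources can be tuned for each batch job.
    • Because the services are fully managed, they keep getting more convenient without any work on our side.
    • There is no need to worry about what other batch jobs are doing.

  69. Batch processing can now be done serverless!

  70. Summary

  71. Serverless is great

  72. We are hiring engineers!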

  73. https://techlog.voyagegroup.com/entry/2019/02/04/171325

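  74. What pei0804 has been working on recently
    • DSP feature development: I handle everything end to end, from deciding how to build a feature through coding to operations.
    • Security improvements: creating a security account, introducing GuardDuty, aligning environments with the CIS benchmark, and running internal security study sessions.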

  75. https://note.com/ryosuke_kawamura/n/nb5fc4d34a7c8