
Building a Serverless Data Analytics Platform on AWS / jawsug-niigata-11


Slides presented at JAWS-UG Niigata #11.


kasacchiful

January 15, 2022


Transcript

  1. Building a Serverless Data Analytics Platform on AWS. JAWS-UG Niigata #11, 2022-01-15, @kasacchiful

  2. Hiroshi Kasahara (@kasacchiful). Classmethod, Inc. Solutions Architect / Software Developer
    Community:
    • JAWS-UG Niigata
    • Python ML in Niigata
    • JaSST Niigata
    • ASTER
    • SWANII
    • etc.

  3. The serverless analytics platform

  4. AWS services for data analytics

  5. AWS Lambda can also handle data processing and analysis

  6. For complex or large-scale workloads, use AWS Step Functions

  7. Serverless patterns https://aws.amazon.com/jp/serverless/patterns/serverless-pattern/

  8. There are patterns for each use case https://aws.amazon.com/jp/serverless/patterns/serverless-pattern/

  9. For details on the patterns, see the AWS Black Belt material "Serverless Usecase Patterns" (PDF hosted on awsstatic.com). A walkthrough video is also available on YouTube.

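  10. Once the data is in S3, the rest can be worked out

  11. Pitfalls I hit when integrating data with serverless services

  12. Control the Lambda workflow with a Step Functions state machine and process the data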

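  13. Control the Lambda workflow with a Step Functions state machine and process the data
    • Fetch the data from its source and store it in S3
    • Apply the minimum necessary processing to each file and store the result in S3
    • Consolidate the data from multiple files and shape it appropriately for QuickSight (BI)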
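
The three stages on this slide could be expressed as a state machine along the following lines. This is only a sketch: the state names, the `$.files` input shape, and the Lambda ARNs are hypothetical and do not come from the talk. It uses the `Iterator` field that the Map state had at the time of the talk (newer definitions use `ItemProcessor`).

```python
import json


def lambda_task(function_arn, next_state=None):
    """Build a Task state that invokes a Lambda function and passes its payload on."""
    state = {
        "Type": "Task",
        "Resource": "arn:aws:states:::lambda:invoke",
        "Parameters": {"FunctionName": function_arn, "Payload.$": "$"},
        "OutputPath": "$.Payload",
    }
    if next_state:
        state["Next"] = next_state
    else:
        state["End"] = True
    return state


def build_pipeline_definition(fetch_arn, transform_arn, merge_arn):
    """Return an Amazon States Language definition for fetch -> per-file transform -> merge."""
    return {
        "StartAt": "FetchToS3",
        "States": {
            # Assumption: the fetch function returns {"files": [...]} listing the S3 keys it wrote.
            "FetchToS3": lambda_task(fetch_arn, "TransformEachFile"),
            "TransformEachFile": {
                "Type": "Map",
                "ItemsPath": "$.files",
                "Iterator": {
                    "StartAt": "TransformFile",
                    "States": {"TransformFile": lambda_task(transform_arn)},
                },
                "ResultPath": "$.transformed",
                "Next": "MergeForBI",
            },
            "MergeForBI": lambda_task(merge_arn),
        },
    }


if __name__ == "__main__":
    # Placeholder ARNs for illustration only.
    definition = build_pipeline_definition("arn:fetch", "arn:transform", "arn:merge")
    print(json.dumps(definition, indent=2))
```

Building the definition as a Python dict and dumping it to JSON keeps the repeated Lambda task boilerplate in one helper.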
  14. Four of the pitfalls I ran into

  15. 1. Too many files of one particular kind

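  16. Problem: one particular kind of data file is abnormally numerous
    • Each file holds five minutes of data
    • The contents are records at millisecond granularity
    • Only the processing for this data takes a long time

  17. Fix: move only that particular data file to hourly processing
    • The high-volume data turned out to be items that are not output to BI
    • Split it off from the daily batch and process it hourly instead
    • This removed the bottleneck from the daily batch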

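  18. 2. Athena quotas

  19. Problem: hitting the Athena quota on concurrent queries
    • During data migration, the last Lambda function of the daily batch failed
    • The limit was hit where the datasets processed in earlier stages are fetched through Athena SQL queries
    • Each Lambda function calls the start-query-execution API five times
    • The limit can burst temporarily up to 80, but during the migration it plateaued at 20
    • The limit can be raised by requesting a quota increase

  20. https://docs.aws.amazon.com/ja_jp/step-functions/latest/dg/limits-overview.html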

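  21. Fix: set the maximum concurrency of the Step Functions Map state
    • A Map state takes an array and processes its elements concurrently
    • Set its maximum concurrency so that Athena start-query-execution API calls stay at 20 or fewer at any one time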
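
As a rough illustration of this fix, the sketch below derives a `MaxConcurrency` value from the two numbers mentioned in the talk (a ceiling of 20 concurrent queries, and five start-query-execution calls per Lambda function). It assumes the worst case where all five queries of one function are in flight at once; the function ARN and the `$.targets` path are hypothetical.

```python
import json

ATHENA_CONCURRENT_QUERY_LIMIT = 20  # ceiling observed during the migration, per the talk
QUERIES_PER_LAMBDA = 5              # start-query-execution calls per function, per the talk


def capped_map_state(function_arn, items_path="$.targets",
                     query_limit=ATHENA_CONCURRENT_QUERY_LIMIT,
                     queries_per_item=QUERIES_PER_LAMBDA):
    """Build a Map state whose concurrency keeps Athena queries under the limit."""
    # Worst case: every running iteration has all of its queries in flight.
    max_concurrency = max(1, query_limit // queries_per_item)
    return {
        "Type": "Map",
        "ItemsPath": items_path,
        "MaxConcurrency": max_concurrency,
        "Iterator": {
            "StartAt": "QueryWithAthena",
            "States": {
                "QueryWithAthena": {
                    "Type": "Task",
                    "Resource": "arn:aws:states:::lambda:invoke",
                    "Parameters": {"FunctionName": function_arn, "Payload.$": "$"},
                    "End": True,
                }
            },
        },
        "End": True,
    }


if __name__ == "__main__":
    # Placeholder ARN for illustration only.
    print(json.dumps(capped_map_state("arn:query-function"), indent=2))
```

With the numbers above, the Map state runs at most four iterations at a time, which keeps the total at 20 concurrent queries.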
  22. For the Map state, see the following references
    • DevelopersIO article on the Step Functions Map state update (dev.classmethod.jp)
    • https://docs.aws.amazon.com/ja_jp/step-functions/latest/dg/amazon-states-language-map-state.html

  23. 3. Step Functions quotas

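  24. Problem: hitting the Step Functions quota on execution history events
    • On one particular day, the number of files in the hourly batch was abnormally large
    • The execution failed after one hour, having reached the Step Functions limit on history events (25,000 events)
    • This limit cannot be raised
    { "error": "States.Runtime", "cause": "The execution reached the maximum number of history events (25000)." }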
  25. https://docs.aws.amazon.com/ja_jp/step-functions/latest/dg/limits-overview.html

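  26. Fix: nest the Step Functions state machines
    • Nesting state machines keeps each execution under the history event limit
    • Lambda concurrency rises considerably as a result, so the following measures were added
      ✓ Requested an increase of the Lambda concurrency limit
      ✓ Set the maximum concurrency of the Step Functions Map state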
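
One possible shape for the nested setup is sketched below: the parent splits the file list into chunks and starts one child execution per chunk, so each child keeps its own history. The chunk sizing is an assumption, not the author's method; in particular `events_per_file` (how many history events one file produces in the child) and the safety margin are illustrative values that would need measuring. The child state machine ARN is a placeholder.

```python
import json

HISTORY_EVENT_LIMIT = 25_000  # per-execution limit quoted in the talk


def chunk_files(files, events_per_file=10, safety_margin=0.5):
    """Split files so that one child execution stays well under the event limit."""
    # events_per_file and safety_margin are assumptions for illustration.
    budget = int(HISTORY_EVENT_LIMIT * safety_margin)
    chunk_size = max(1, budget // events_per_file)
    return [files[i:i + chunk_size] for i in range(0, len(files), chunk_size)]


def parent_map_state(child_state_machine_arn, max_concurrency=2):
    """Build a Map state that runs one child state machine execution per chunk."""
    return {
        "Type": "Map",
        "ItemsPath": "$.chunks",
        "MaxConcurrency": max_concurrency,
        "Iterator": {
            "StartAt": "RunChild",
            "States": {
                "RunChild": {
                    "Type": "Task",
                    # Service integration that starts a nested execution and waits for it.
                    "Resource": "arn:aws:states:::states:startExecution.sync:2",
                    "Parameters": {
                        "StateMachineArn": child_state_machine_arn,
                        "Input": {"files.$": "$"},
                    },
                    "End": True,
                }
            },
        },
        "End": True,
    }


if __name__ == "__main__":
    chunks = chunk_files([f"file-{n}.csv" for n in range(3000)])
    print([len(c) for c in chunks])
    print(json.dumps(parent_map_state("arn:child-state-machine"), indent=2))
```

The parent only records a handful of events per child execution, so its own history grows with the number of chunks rather than the number of files.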
  27. Before / After

  28. Before / After

  29. 4. Lambda scaling cannot keep up

  30. Problem: hit a Lambda rate limit error, just once
    • It looks like a concurrency error, but...
    • The concurrency limit had already been raised, and the monitoring for each function shows that concurrency never reached it
    { "error": "Lambda.TooManyRequestsException", "cause": "Rate Exceeded. (Service: Lambda, Status Code: 429, Request ID: xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx, Extended Request ID: null)" }
  31. https://docs.aws.amazon.com/ja_jp/lambda/latest/dg/invocation-scaling.html

  32. Fix: review the Retry settings for the Lambda functions
    • Reviewed the maximum concurrency of the Step Functions Map state
    • Reviewed the Retry settings for Lambda defined in Step Functions
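
A Retry block for the error shown on slide 30 might look like the sketch below. The parameter values are illustrative, not the ones used in the talk. The helper lists the wait before each retry under the documented meaning of the fields, where each wait is `IntervalSeconds` multiplied by `BackoffRate` once per previous retry.

```python
import json


def lambda_retry_policy(interval_seconds=2, max_attempts=5, backoff_rate=2.0):
    """Build a Retry block for a Task state that invokes Lambda."""
    return [
        {
            "ErrorEquals": ["Lambda.TooManyRequestsException"],
            "IntervalSeconds": interval_seconds,
            "MaxAttempts": max_attempts,
            "BackoffRate": backoff_rate,
        }
    ]


def retry_waits(interval_seconds, max_attempts, backoff_rate):
    """Return the wait in seconds before each retry attempt."""
    return [interval_seconds * backoff_rate ** n for n in range(max_attempts)]


if __name__ == "__main__":
    print(json.dumps(lambda_retry_policy(), indent=2))
    print(retry_waits(2, 5, 2.0))
```

Listing the waits makes it easier to check that the total retry window is long enough for Lambda to scale up before the attempts run out.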
  33. For details on Retry intervals, see the following article
    $ node -e '((i,m,b)=>{for(let w=i,c=0;c<m;c++){console.log(w+=(c==0?0:b**c))}})(2,7,1.85)'
    2
    3.85
    7.272500000000001
    13.604125000000002
    25.317631250000005
    46.987617812500005
    87.07709295312502
    https://dev.classmethod.jp/articles/wait_time_and_params_in_step_function_retry
  34. Lambda Provisioned Concurrency was not configured this time. Reference: DevelopersIO article on Lambda Provisioned Concurrency and cold starts (dev.classmethod.jp)

  35. Isn't there a service called AWS Glue for data processing?

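  36. Glue exists for data processing. Is there any need to go out of the way to build this with Step Functions + Lambda instead?
    • With Step Functions + Lambda, widely used development frameworks are available, which makes development by a team easier
      ✓ This project used Serverless Framework
    • Even with many data files, Lambda can handle the processing with some ingenuity, as long as each individual item is not too large
    • When the data volume or execution time exceeds what Lambda can handle, Glue is the better choice
      ✓ Maximum memory allocation: 10240MB, maximum execution time: 15 minutes, /tmp directory size: 512MB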
  37. Bonus

  38. Bonus: AWS Data Wrangler is handy https://github.com/awslabs/aws-data-wrangler

  39. Bonus: AWS Data Wrangler is handy
    An open-source Python library that extends Pandas to AWS
    • Connects Pandas DataFrames smoothly with AWS data services
      ✓ Redshift / Glue / Athena / EMR, etc.
    • Provides the functions needed for typical ETL tasks
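
As a small illustration of the kind of glue code the library removes, the sketch below runs an Athena query into a DataFrame and writes it back to S3 as Parquet. The bucket, table, and database names are hypothetical, and running `export_query_to_s3` requires the `awswrangler` package and AWS credentials, which is why the import is kept inside the function.

```python
from datetime import date


def partition_prefix(bucket, table, day):
    """Build a date-partitioned S3 prefix for the output (naming is an assumption)."""
    return f"s3://{bucket}/{table}/dt={day.isoformat()}/"


def export_query_to_s3(sql, database, output_path):
    """Run an Athena query and store the result as a Parquet dataset in S3."""
    import awswrangler as wr  # imported lazily so the helper above works without it

    df = wr.athena.read_sql_query(sql=sql, database=database)
    wr.s3.to_parquet(df=df, path=output_path, dataset=True, mode="overwrite")
    return len(df)


if __name__ == "__main__":
    # Only builds the path; no AWS call is made here.
    print(partition_prefix("example-bucket", "daily_summary", date(2022, 1, 15)))
```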
  40. Caveat: the package is too large to fit into Lambda as-is
    • A Lambda deployment package must be 250MB or less when uncompressed
      ✓ A plain pip install of AWS Data Wrangler exceeds 250MB
    • Use the Lambda Layer zip file available on the GitHub Releases page
  41. Summary
    • A data analytics platform can be built by making full use of serverless services
    • Make good use of the common serverless architecture patterns

  42. The end