Building a Serverless Data Analytics Platform on AWS / jawsug-niigata-11

Slides presented at JAWS-UG Niigata #11.

kasacchiful

January 15, 2022

Transcript

  1. Building a Serverless Data Analytics Platform on AWS
    JAWS-UG Niigata #11
    2022-01-15 @kasacchiful

  2. Hiroshi Kasahara (@kasacchiful)
    Classmethod, Inc.
    Solutions Architect / Software Developer
    Favorite:
    Community:
    • JAWS-UG Niigata
    • Python ML in Niigata
    • JaSST Niigata
    • ASTER
    • SWANII
    • etc.

  3. Serverless analytics platform

  4. AWS services for data analytics

  5. AWS Lambda can also handle data processing and analysis

  6. For complex or large-scale workloads, use AWS Step Functions

  7. Serverless patterns
    https://aws.amazon.com/jp/serverless/patterns/serverless-pattern/

  8. There is a pattern for each use case
    https://aws.amazon.com/jp/serverless/patterns/serverless-pattern/

  9. For details of the patterns, see the AWS Black Belt material
    (link to the Black Belt material lost in extraction)
    Explanatory video on YouTube: https://youtu.be/HI…

  10. Once the data is in S3, the rest can be worked out

  11. Pitfalls encountered when integrating data with serverless

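  12. A Step Functions state machine controls the Lambda workflow and processes the data

  13. A Step Functions state machine controls the Lambda workflow and processes the data
    • Fetch data from the data source and save it to S3
    • Apply the minimum necessary processing to each file and save it to S3
    • For QuickSight (BI), combine the data from multiple files and shape it appropriately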

  14. Various pitfalls: four of them introduced here

  15. 1. Too many files of one particular data type

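  16. Problem: files of one particular data type are abnormally numerous
    • Each file holds 5 minutes of data
    • The contents are records at millisecond granularity
    • Only the processing for this data takes a long time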

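  17. Solution: switch only that particular data type to hourly processing
    • The high-volume data consisted of fields that are not output to BI
    • Separated it from the daily job and changed it to hourly processing
    • This removed the bottleneck from the daily job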

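  18. 2. Athena quotas

  19. Problem: hitting the Athena quota on concurrent query executions
    • During data migration, the last Lambda function in the daily job failed with an error
    • The limit was hit at the step that uses Athena SQL queries to fetch the multiple datasets processed in earlier stages
    • Each Lambda function calls the start-query-execution API 5 times
    • The limit can temporarily burst up to 80, but during data migration it was capped at 20
    • The limit can be raised by requesting a quota increase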

  20. https://docs.aws.amazon.com/…

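  21. Solution: set the maximum concurrency of the Step Functions Map state
    • Set the maximum concurrency of the Map state (think of it as: pass in an array, and the array elements are processed concurrently) so that Athena start-query-execution API calls stay at 20 or fewer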
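
The concurrency cap described above can be sketched as an Amazon States Language definition built in Python. This is a minimal sketch under stated assumptions, not the definition used in the talk: the state name, the `$.files` input field, and the function ARN are hypothetical, and dividing the limit of 20 by the 5 queries per Lambda function is one way to derive a safe `MaxConcurrency`. The `Iterator` field is the name the Map state used at the time of the talk.

```python
import json

ATHENA_CONCURRENT_QUERY_LIMIT = 20  # cap observed during data migration, per the slides
QUERIES_PER_LAMBDA = 5              # start-query-execution calls per Lambda function, per the slides


def build_map_state(function_arn: str) -> dict:
    """Return a Map state whose concurrency keeps Athena calls within the limit."""
    # Assumption: every iteration runs one Lambda function issuing 5 queries at once.
    max_concurrency = ATHENA_CONCURRENT_QUERY_LIMIT // QUERIES_PER_LAMBDA
    return {
        "Type": "Map",
        "ItemsPath": "$.files",            # hypothetical input field holding the array
        "MaxConcurrency": max_concurrency,
        "Iterator": {
            "StartAt": "QueryWithAthena",  # hypothetical state name
            "States": {
                "QueryWithAthena": {
                    "Type": "Task",
                    "Resource": function_arn,
                    "End": True,
                }
            },
        },
        "End": True,
    }


# Hypothetical ARN, for illustration only.
map_state = build_map_state(
    "arn:aws:lambda:ap-northeast-1:123456789012:function:query-athena"
)
print(json.dumps(map_state, indent=2))
```

With these numbers at most 4 iterations run in parallel, so at most 20 Athena queries are in flight at once.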

  22. For the Map state, see the following articles
    • DevelopersIO (dev.classmethod.jp) article on the Step Functions Map state update
    • AWS Step Functions Developer Guide (docs.aws.amazon.com), Amazon States Language: Map state

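  23. 3. Step Functions quotas

  24. Problem: hitting the Step Functions quota on execution event history
    • On one particular day only, the number of files in the hourly job was abnormally large
    • The execution terminated abnormally after one hour, having reached the Step Functions event history limit (25,000 events)
    • This quota cannot be increased
    {
      "error": "States.Runtime",
      "cause": "The execution reached the maximum number of history events (25000)."
    }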

  25. https://docs.aws.amazon.com/…

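  26. Solution: nest the Step Functions state machines
    • Nesting the state machines keeps each execution below the event history limit
    • Because Lambda concurrency increases considerably, the following measures were added
      ✓ Requested an increase of the Lambda concurrent executions limit
      ✓ Set the maximum concurrency of the Step Functions Map state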
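
One way to picture the nesting is a parent state machine that splits the files into batches and starts one child execution per batch, so that each child keeps its own event history. The sketch below assumes that layout. The batch size, the `$.batches` field, the state name, and the ARN are hypothetical, and the talk does not show its actual definition. The `states:startExecution.sync:2` resource is the Step Functions service integration for running a nested execution and waiting for its result.

```python
def chunk(items: list, size: int) -> list:
    """Split items into consecutive batches of at most `size` elements."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def build_parent_map_state(child_state_machine_arn: str, max_concurrency: int) -> dict:
    """Return a Map state that runs one child state machine execution per batch."""
    return {
        "Type": "Map",
        "ItemsPath": "$.batches",             # hypothetical input field
        "MaxConcurrency": max_concurrency,    # also limits downstream Lambda concurrency
        "Iterator": {
            "StartAt": "RunChild",            # hypothetical state name
            "States": {
                "RunChild": {
                    "Type": "Task",
                    # Start a nested execution and wait for it to finish.
                    "Resource": "arn:aws:states:::states:startExecution.sync:2",
                    "Parameters": {
                        "StateMachineArn": child_state_machine_arn,
                        "Input": {"files.$": "$"},  # pass the batch to the child
                    },
                    "End": True,
                }
            },
        },
        "End": True,
    }


# Illustrative numbers only: 2,500 files in batches of 100.
files = [f"data/{n:04d}.csv" for n in range(2500)]
batches = chunk(files, 100)
parent = build_parent_map_state(
    "arn:aws:states:ap-northeast-1:123456789012:stateMachine:child", 5
)
print(len(batches), len(batches[0]))
```

The parent then records only a handful of events per batch, while the per-file events accumulate in the child executions.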

  27. Before / After

  28. Before / After

  29. 4. Lambda scaling cannot keep up

  30. Problem: a Lambda rate-limit error occurred, just once
    • It looks like a concurrency error, but...
    • The concurrent executions limit had already been raised, and the monitoring for each function shows that the limit was never reached
    {
      "error": "Lambda.TooManyRequestsException",
      "cause": "Rate Exceeded. (Service: Lambda, Status Code: 429, Request ID: xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx, Extended Request ID: null)"
    }

  31. https://docs.aws.amazon.com/…

  32. Solution: review the Retry settings for the Lambda functions
    • Reviewed the maximum concurrency of the Step Functions Map state
    • Reviewed the Retry settings for Lambda defined in Step Functions
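
A Retry setting of this kind can be sketched as follows, together with the wait times it implies. The values (2 seconds, 6 attempts, backoff rate 2.0) are illustrative assumptions, not the ones chosen in the talk. The field names `ErrorEquals`, `IntervalSeconds`, `MaxAttempts`, and `BackoffRate` are the standard Amazon States Language retrier fields: the first retry waits `IntervalSeconds`, and each later wait is multiplied by `BackoffRate`.

```python
def build_retry(interval_seconds: int, max_attempts: int, backoff_rate: float) -> list:
    """Return a Retry block for a Task state that invokes Lambda."""
    return [
        {
            "ErrorEquals": ["Lambda.TooManyRequestsException"],
            "IntervalSeconds": interval_seconds,
            "MaxAttempts": max_attempts,
            "BackoffRate": backoff_rate,
        }
    ]


def retry_delays(interval_seconds: int, max_attempts: int, backoff_rate: float) -> list:
    """Seconds waited before each retry attempt, in order."""
    return [interval_seconds * backoff_rate ** n for n in range(max_attempts)]


# Hypothetical values, for illustration only.
retry = build_retry(2, 6, 2.0)
delays = retry_delays(2, 6, 2.0)
print(delays)       # wait before each retry
print(sum(delays))  # total time spent waiting if every attempt is used
```

Widening the interval or raising the backoff rate gives Lambda more time to scale before the next attempt, at the cost of a longer worst-case run.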

  33. For details on retry intervals, see the article below
    $ node -e '((i,m,b)=>{for(let w=i,c=0;c…
    2
    3.85
    7.272500000000001
    13.604125000000002
    25.317631250000005
    46.987617812500005
    87.07709295312502
    (link to the article lost in extraction)

  34. Lambda Provisioned Concurrency was not configured this time
    Reference: DevelopersIO (dev.classmethod.jp) article on Lambda Provisioned Concurrency and cold starts

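  35. For data processing, isn't there a service called AWS Glue?

  36. For data processing, there is Glue
    Is it really necessary to build this with Step Functions + Lambda instead of using Glue?
    • With Step Functions + Lambda, widely used development frameworks are available, which makes development by a team easier.
      ✓ This project used the Serverless Framework.
    • Even when there are many data files, Lambda can process them with some ingenuity, as long as each individual file is not too large.
    • When the data volume or execution time exceeds what Lambda can handle, Glue is the better choice.
      ✓ Lambda limits at the time of the talk: maximum memory allocation 10,240 MB, maximum execution time 15 minutes, /tmp directory size 512 MB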
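
The Lambda-or-Glue decision above can be expressed as a small check against the limits quoted on the slide. This is only a sketch of the reasoning: the `JobEstimate` fields and the function names are hypothetical, and the limits are the figures stated in the talk, which may differ from current AWS quotas.

```python
from dataclasses import dataclass

# Limits as quoted on the slide (January 2022).
LAMBDA_MAX_MEMORY_MB = 10240
LAMBDA_MAX_RUNTIME_SECONDS = 15 * 60
LAMBDA_TMP_MB = 512


@dataclass
class JobEstimate:
    """Rough resource estimate for processing one unit of data (hypothetical type)."""
    peak_memory_mb: int
    runtime_seconds: int
    scratch_disk_mb: int


def fits_in_lambda(job: JobEstimate) -> bool:
    """True when the job stays within every Lambda limit listed above."""
    return (
        job.peak_memory_mb <= LAMBDA_MAX_MEMORY_MB
        and job.runtime_seconds <= LAMBDA_MAX_RUNTIME_SECONDS
        and job.scratch_disk_mb <= LAMBDA_TMP_MB
    )


def choose_engine(job: JobEstimate) -> str:
    """Pick Lambda when the job fits, otherwise fall back to Glue."""
    return "lambda" if fits_in_lambda(job) else "glue"


small_file_job = JobEstimate(peak_memory_mb=1024, runtime_seconds=120, scratch_disk_mb=100)
long_running_job = JobEstimate(peak_memory_mb=4096, runtime_seconds=3600, scratch_disk_mb=100)
print(choose_engine(small_file_job), choose_engine(long_running_job))
```

Many small files each pass this check individually, which is why fanning them out over Lambda works; a single job that exceeds any one limit is the case where Glue fits better.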

  37. Bonus

  38. Bonus: AWS Data Wrangler is handy
    https://github.com/awslabs/aws-data-wrangler

  39. Bonus: AWS Data Wrangler is handy
    An open-source Python library that extends pandas to AWS
    • Connects pandas DataFrames with AWS data-related services
      ✓ Redshift / Glue / Athena / EMR, etc.
    • Provides the functions needed for typical ETL tasks
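
As a rough illustration of what this looks like in practice, the sketch below reads an Athena query result into a pandas DataFrame and writes it back to S3 as Parquet. It assumes the `awswrangler` package is available (for example from a Lambda Layer) and that AWS credentials are configured. The query, database name, and bucket path in the usage comment are hypothetical.

```python
def athena_to_parquet(sql: str, database: str, output_path: str) -> int:
    """Run an Athena query, load the result as a DataFrame, and save it to S3 as Parquet.

    Returns the number of rows written.
    """
    # Imported lazily so this sketch can be loaded without the library or
    # AWS credentials; inside Lambda the module would come from the layer.
    import awswrangler as wr

    df = wr.athena.read_sql_query(sql=sql, database=database)
    wr.s3.to_parquet(df=df, path=output_path, dataset=True)
    return len(df)


# Example call (hypothetical names, requires AWS access):
# athena_to_parquet(
#     "SELECT * FROM sensor_daily",
#     "analytics",
#     "s3://example-bucket/curated/sensor_daily/",
# )
```

Each call replaces what would otherwise be several boto3 requests plus manual result parsing.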

  40. Caveat: the package is too large to fit in Lambda as-is
    • A Lambda deployment package must be 250 MB or less when unzipped
      ✓ A plain pip install of AWS Data Wrangler exceeds 250 MB
    • Use the zip file for Lambda Layers available on the GitHub Releases page

  41. Summary
    • A data analytics platform can be built by making full use of serverless services
    • Make good use of the common serverless architecture patterns

  42. The end