$30 off During Our Annual Pro Sale. View Details »

Step Functions と Lambda Function で組む Athena によるファイル処理 / serverless lightweight etl

Step Functions と Lambda Function で組む Athena によるファイル処理 / serverless lightweight etl

本セッションではイベント管理サービスを例にとり、セッション情報のCSVファイルとチケット情報のCSVファイルからひとつのファイルに変換するETLを考えていきます。AWS Glue と Amazon Athena のことを知り、サーバーレスアプリケーションに組み込むイメージを持って帰ってもらうことが目的です。

アプローチ:
・AWS Glue と Amazon Athena のことを知る
・ファイル to ファイル のETLユースケースを考える
・AWS CDK でリソースを作成してみる

Yusuke Wada

July 29, 2020
Tweet

More Decks by Yusuke Wada

Other Decks in Technology

Transcript

  1. AWS Serverless ͱϑΝΠϧ Step Functions ͱ Lambda Function Ͱ૊Ή Athena

    ʹΑΔϑΝΠϧॲཧ #serverlessjp
  2. ɹαʔόʔϨεͰϑΝΠϧ to ϑΝΠϧͷETL ࠓ೔ͷ࿩

  3. None
  4. αʔόʔϨεͰETLͱ͍͑͹ AWS Glue ?

  5. None
  6. Amazon Athena Ͱσʔλநग़ʁ

  7. Θͨ͠ͷ"84(MVF΍"NB[PO"UIFOBͷୈҰҹ৅  "84ίϯιʔϧ͔Β࢖ͬͨ͜ͱ͋Δ໊͠લ΋஌ͬͯΔ ͚ͲɺΞϓϦέʔγϣϯʹ૊ΈࠐΉΠϝʔδ͕͋·ΓΘ ͔ͳ͍

  8. Θͨ͠ͷ"84(MVF΍"NB[PO"UIFOBͷୈҰҹ৅  αʔϏεͷ࣮ମ͕Α͘Θ͔ΒΜ 4ʢΩʔόϦϡʔͷʣετϨʔδ %ZOBNP%#ɿ/P42-  Ћ ͷϑϧϚωʔδυσʔλϕʔε "84"QQ4ZODɿ(SBQI2-"1*ͷήʔτ΢ΣΠ 4UFQ'VODUJPOTɿλεΫΛఆٛ͠ঢ়ଶભҠͱͯ͠δϣϒΛ૊Έཱͯ

    "84(MVFɿͳΜ͔ม׵͢Δਓʁ "NB[PO"UIFOBɿ4ͷσʔλʹରͯ͠ΫΤϦൃߦͯ͘͠ΕΔਓʁ
  9. ຊηογϣϯͷ໨త  "84(MVFͱ"NB[PO"UIFOBͷ͜ͱΛ஌ΓɺαʔόʔϨ εΞϓϦέʔγϣϯʹ૊ΈࠐΉΠϝʔδΛ࣋ͬͯؼͬͯ΋Β͏ Ξϓϩʔνɿ "84(MVFͱ"NB[PO"UIFOBͷ͜ͱΛ஌Δ ϑΝΠϧUPϑΝΠϧͷ&5-ϢʔεέʔεΛߟ͑Δ "84$%,ͰϦιʔεΛ࡞੒ͯ͠ΈΔ

  10. Ϋϥϝιͷ࿨ాͰ͢αʔόʔϨε޷͖Ͱ͢  ࿨ా༞հ $9ࣄۀຊ෦ αʔόʔϨεΤϯδχΞ XBEEZ@V ༗ޮظݶ੾Ε

  11. $POUFOUT  "84(MVFͱ"NB[PO"UIFOBͷؔ܎ αϯϓϧϢʔεέʔεɿΠϕϯτ؅ཧαʔϏε "84$%,ʹΑΔ&5-δϣϒ૊Έཱͯ

  12. "84(MVF  ߏ੒ཁૉ͸ɺσʔλΧλϩάͱαʔόʔϨε&5-ɺΦʔ έετϨʔγϣϯͷ̏ͭ ߏ੒ཁૉ Ϧιʔε໊௨শ ʢ$MPVE'PSNBUJPOΑΓʣ σʔλΧλϩά "84(MVF5BCMF αʔόʔϨε&5-

    "84(MVF+PC ΦʔέετϨʔγϣϯ "84(MVF8PSLqPX ࢀߟɿAWS CloudFormation ςϯϓϨʔτΛ࢖༻ͨ͠σʔλΧλϩάͷࣄલઃఆ - https://docs.aws.amazon.com/ja_jp/glue/latest/dg/populate-with-cloudformation-templates.html
  13. "84(MVF  ߏ੒ཁૉ͸ɺσʔλΧλϩάͱαʔόʔϨε&5-ɺΦʔ έετϨʔγϣϯͷ̏ͭ ߏ੒ཁૉ Ϧιʔε໊௨শ ʢ$MPVE'PSNBUJPOΑΓʣ σʔλΧλϩά "84(MVF5BCMF αʔόʔϨε&5-

    "84(MVF+PC ΦʔέετϨʔγϣϯ "84(MVF8PSLqPX ࢀߟɿAWS CloudFormation ςϯϓϨʔτΛ࢖༻ͨ͠σʔλΧλϩάͷࣄલઃఆ - https://docs.aws.amazon.com/ja_jp/glue/latest/dg/populate-with-cloudformation-templates.html
  14.  Glue Table

  15. "84(MVF5BCMF  "QBDIF)JWFޓ׵ͷϝλετΞ ΋͏গ͠Θ͔Γ΍͘͢ɿεΩʔϚఆ͚ٛͩΛ(MVFʹ΋ͨͤΒΕΔ σʔλͷ࣮ମ͸ผʹ࣋ͨͤɺ(MVFͱͯ͠͸ఆ͚͓ٛͩ࣋ͬͯ͘ (MVF5BCMF͸͔͋ͨ΋εΩʔϚఆٛΛ΋ͬͨςʔϒϧͷΑ͏ʹѻ͑ Δ ྫɿ"NB[PO"UIFOBͰΫΤϦͨ͠Γɺ(MVF+PCͰΧϥϜ໊͔Β σʔλΛͱ͖ͬͯͨΓͰ͖Δ

  16. "84(MVF5BCMF  σʔλͷ࣮ମ 4ͳͲ  εΩʔϚఆٛͳͲͷϝλσʔ λ 20190806 AWS Black

    Belt Online Seminar AWS Glue https://www.slideshare.net/AmazonWebServicesJapan/20190806-aws-black-belt-online-seminar-aws-glue?ref=https://aws.amazon.com/jp/blogs/news/webinar-bb-aws-glue-2019/
  17.  Glue Job

  18. "84(MVF+PC  &5-ʢσʔλม׵ॲཧʣͷຊମ σʔλΧλϩάʢ(MVF5BCMF ͔ΒσʔλΛநग़͠ɺϩδο ΫΛద༻ͯ͠ɺผͷσʔλετΞʢλʔήοτʣʹग़ྗ͢Δ

  19. "84(MVF+PC  εΫϦϓτ͸1ZUIPO4IFMMͱ4QBSLͰબ୒ "84(MVF+PC 1ZUIPO4IFMM "84(MVF+PC 4QBSL  ʢ$MPVE'PSNBUJPOΑΓʣ ϝϞϦ

    (#PS(# (# ࣮ߦ࣌ؒ σϑΥϧτ࣌ؒ σϑΥϧτ࣌ؒ ಉ࣮࣌ߦ਺ ΞΧ΢ϯτ಺Ͱ ΞΧ΢ϯτ಺Ͱ ՝ֹۚ ੑೳͱ࣮ߦ࣌ؒϕʔεͷैྔ՝ۚ ͨͩ͠গͳ͘ͱ΋෼ʹ੾Γ্͛ ੑೳͱ࣮ߦ࣌ؒϕʔεͷैྔ՝ۚ ͨͩ͠গͳ͘ͱ΋෼ʹ੾Γ্͛ ىಈ࣌ؒ ਺ेඵ਺෼ ਺ेඵ਺෼ ݴޠ 1ZUIPO 1ZUIPOPS4DBMB AWS GlueΛ࢖ͬͨ Serverless ETL ͷ࣮૷ύλʔϯ - Speaker Deck https://speakerdeck.com/seiichi1101/aws-gluewoshi-tuta-serverless-etl-falseshi-zhuang-patan-ac5553a5-f8e9-45fb-b093-4bcf0b113a48?slide=41
  20.  Glue Workflow

  21. "84(MVF8PSLqPX  ΫϩʔϥͱδϣϒΛ࢖ͬͯҰ࿈ͷॲཧΛఆٛͰ͖Δ wਐߦঢ়گΛ֬ೝͰ͖Δ wෳ਺ͷδϣϒΛ૊Έ߹Θ ͤΒΕΔ w4UFQ'VODUJPOTͱࣅͯ ͍Δ͕ɺͪ͜Β͸(MVF ͷΫϩʔϥ (MVF+PCͷ

    ΈࢦఆՄೳ
  22.  Amazon Athena ͱͷؔ܎

  23. None
  24. None
  25. ެࣜτοϓʹ΋͏ॻ͍ͯΔ

  26. "84(MVF5BCMFͱ"NB[PO"UIFOB  4σʔλΧλϩάʹରͯ͠1SFTUPΫΤϦΛൃߦ

  27. "84(MVF5BCMFͱ"NB[PO"UIFOB  4σʔλΧλϩάʹରͯ͠1SFTUPΫΤϦΛൃߦ

  28. ͳΜͱͳ͘Πϝʔδ͕༙͍ͨͰ͠ΐ͏͔  αʔϏεͷ࣮ମʹ࠶νϟϨϯδ "84(MVFɿ(MVF5BCMF (MVF+PC (MVF8PSLqPXʹ෼ղ Ͱ͖ɺͦΕͧΕࣗ༝ʹ૊Έ߹ΘͤΒΕΔ&5-αʔϏε "NB[PO"UIFOBɿ4Λσʔλιʔεͱ͢Δ(MVF5BCMFʹ ରͯ͠1SFTUPΫΤϦΛൃߦ͠ɺ݁ՌΛ4ʹग़ྗ͢ΔαʔϏ ε

  29. $POUFOUT  "84(MVFͱ"NB[PO"UIFOBͷؔ܎ αϯϓϧϢʔεέʔεɿΠϕϯτ؅ཧαʔϏε "84$%,ʹΑΔ&5-δϣϒ૊Έཱͯ

  30. ϑΝΠϧΛѻ͍ͦ͏ͳγνϡΤʔγϣϯ  ՍۭͷΠϕϯτ؅ཧαʔϏε (PPHMF'PSNͳͲͰηογϣϯΛެื͢Δʢ*%ϝΞυʣ ಉ࣌ʹɺผͷ࢓૊ΈͰࢀՃνέοτΛൃߦ͢Δʢ*%ϝΞυʣ ͜ΕΒ͸$47ग़ྗՄೳ νέοτ*%Λࢦఆ͢Δͱɺηογϣϯ৘ใΛऔಘͰ͖ΔΑ͏ͳ "1*Λ࡞Γ͍ͨ

  31. ϑΝΠϧΛѻ͍ͦ͏ͳγνϡΤʔγϣϯ  ՍۭͷΠϕϯτ؅ཧαʔϏε (PPHMF'PSNͳͲͰηογϣϯΛެื͢Δʢ*%ϝΞυʣ ಉ࣌ʹɺผͷ࢓૊ΈͰࢀՃνέοτΛൃߦ͢Δʢ*%ϝΞυʣ ͜ΕΒ͸$47ग़ྗՄೳ νέοτ*%Λࢦఆ͢Δͱɺηογϣϯ৘ใΛऔಘͰ͖ΔΑ͏ͳ "1*Λ࡞Γ͍ͨ

  32. ͳͥʁ  ෳ਺ͷαʔϏεͷ৘ใ͔ΒͻͱͭͷαʔϏεͷόοΫΤϯυ ʹݟ͔͚ͤΔ࢓૊Έ͸ࠓޙ૿͑ͦ͏ ศརͳαʔϏε͕ͨ͘͞Μ ͲΕ͔ҰͭͷαʔϏεΛ࢖͍ͬͯΕ͹ਓੜ͕׬݁͢Δͱ͍͏͜ͱ͸ গͳ͘ͳ͖ͬͯͦ͏ େ఍ͷαʔϏεͰ͸ɺσʔλΛ$47ͱͯ͠ग़ྗՄೳ 8FCIPPL &WFOU#SJEHFʜ͍Ζ͍Ζͳ࿈ܞํ๏͕͋Δ͕ɺϑΝΠϧ

    Λೖྗͱͯ͠&5-͢Δधཁ΋ɺࠓޙ૿͑ͦ͏
  33. ηογϣϯ৘ใͷެืͱ$47ग़ྗ  .csv

  34. ࢀՃऀνέοτͷ$47ग़ྗ  λΠϜελϯϓ ໊͓લ ϝʔϧΞυϨε ॴଐ૊৫ ࢀՃ೔ ޕޙ(.5  ࿨ా༞հ

    XBEBZVTVLF !DMBTTNFUIPEKQ Ϋϥεϝιουגࣜձࣾ ೔ ޕޙ(.5  ৽Ҫ੣Ұ XBEBZVTVLF !DMBTTNFUIPEKQ Ϋϥεϝιουגࣜձࣾ ೔ ࢀՃ೔Λ͓બͼ͍ͩ͘͞ɻ νέοτ*% Ϣʔβʔ*% ೔໨೔໨೔໨ BFBCFCBEFCEFG GEEBBDBFCBC ೔໨೔໨೔໨ FGFBBCFBBCF ECED
  35. νέοτ*%͔Βηογϣϯ৘ใΛҾ͘  (&5IUUQTBQJNFFUVQQFSDPNTFTTJPO FWFOU*E UJDLFU*EBFBCFCBEFCEFG { "name": "࿨ా ༞հ", "title":

    "αʔόʔϨε x ϑΝΠϧɿAmazon Athena ͱ Step Functions Λ૊Έ߹ΘͤͯϑΝΠϧΛ݁߹͢Δ", "description": "αʔόʔϨεͰϑΝΠϧΛѻ͏γʔϯ͕૿͖͑ͯ·ͨ͠ɻAWSͰอଘ͓ͯ͘͠৔ॴͱͯ͠S3͸༗ྗͰ͕͢ɺΞ ϓϦέʔγϣϯͰར༻Ͱ͖Δܗࣜʹม׵͢Δํ๏ͱͯ͠ɺଟ͘ͷબ୒ࢶ͕͋Γ·͢ɻ\n͜ͷηογϣϯͰ͸ɺʮผʑͷαʔϏε Ͱൃߦ͞ΕͨCSVϑΝΠϧΛJOIN͢Δʯͱ͍͏Ϣʔεέʔεʹର͠ɺAmazonAthenaΛ࢖ͬͨΞϓϩʔνΛ঺հ͠·͢ɻ͞Β ʹɺҰ࿈ͷॲཧΛStepFunctionsʹ·ͱΊΔ͜ͱͰɺδϣϒͱͯ͠ఆٛ͢Δํ๏Λࣔ͠·͢ɻ" }
  36. ཁ݅Λ֬ೝ  ࠷ऴతʹ͸νέοτ*%͔Βɺηογϣϯσʔλ͕Ҿͬ ுͬͯ͜ΒΕΕ͹ྑ͍ ืूظؒऴྃޙ͸$47ϑΝΠϧͷมߋ͸΄΅ͳ͍ͷͰɺϦΞϧ λΠϜʹมߋΛ൓ө͢Δඞཁ͸ͳ͍ ͨͩ͠؅ཧը໘ͳͲ͔Β$47ϑΝΠϧΛΞοϓϩʔυ͢Δ͜ͱ ͰมߋΛ൓өͰ͖ΔΑ͏ʹͯ͠΄͍͠ɻ͍͍ͩͨ෼͘Β͍Ͱ Πϕϯτ͝ͱʹืू͢ΔͷͰɺΠϕϯτऴྃޙ͸ΞʔΧΠϒՄೳ

  37. Ͳ͏࣮ݱ͢Δ͔  3%#Ͱ͋Ε͹ͦΕͧΕͷςʔϒϧʹσʔλΛೖΕͯϦ ΫΤετ࣌ʹ+0*/͢Ε͹ྑ͍͕ʜ "84αʔόʔϨεͩͱҎԼύλʔϯ͍ͣΕ͔

  38. Ͳ͏࣮ݱ͢Δ͔  3%#Ͱ͋Ε͹ͦΕͧΕͷςʔϒϧʹσʔλΛೖΕͯϦ ΫΤετ࣌ʹ+0*/͢Ε͹ྑ͍͕ʜ "84αʔόʔϨεͩͱҎԼύλʔϯ͍ͣΕ͔ ͜ΕΛ࢖͏ ɾϦΞϧλΠϜੑ͸ཁٻ͞Εͳ͍ ɾݹ͍ΠϕϯτͷΦϒδΣΫτ͸(MBDJFSߦ͖ ɾಡΈग़࣌͠ͷϩδοΫ͸࠷খݶʹ͍ͨ͠

  39.  ΰʔϧɿJOINࡁΈͷS3 ΦϒδΣΫτΛ࡞Δ S3 to S3 ͷϑΝΠϧ ETL

  40. "84(MVFʹΑΔ&5-Λࢥ͍ग़͢ 

  41. "84(MVFʹΑΔ&5-Λࢥ͍ग़͢ 

  42. "84(MVFʹΑΔ&5-Λࢥ͍ग़͢  ɾσʔλΧλϩάΛ"NB[PO"UIFOBͰΫΤϦ͢Δ ɾJOTFSUJOUPͰ+0*/͢Δ ɾҰ࿈ͷྲྀΕΛ4UFQ'VODUJPOTͰ૊Ή ʜͱ͍͏ϧʔτ΋͋Δ

  43. ͳͥผϧʔτΛݕ౼͢Δͷ͔ʁ  ͜ͷن໛ͩͱ(MVF+PCͷىಈ࣌ؒͱ՝ֹ͕ۚωοΫ "84(MVF+PC 1ZUIPO4IFMM "84(MVF+PC 4QBSL  ʢ$MPVE'PSNBUJPOΑΓʣ ϝϞϦ

    (#PS(# (# ࣮ߦ࣌ؒ σϑΥϧτ࣌ؒ σϑΥϧτ࣌ؒ ಉ࣮࣌ߦ਺ ΞΧ΢ϯτ಺Ͱ ΞΧ΢ϯτ಺Ͱ ՝ֹۚ ੑೳͱ࣮ߦ࣌ؒϕʔεͷैྔ՝ۚ ͨͩ͠গͳ͘ͱ΋෼ʹ੾Γ্͛ ੑೳͱ࣮ߦ࣌ؒϕʔεͷैྔ՝ۚ ͨͩ͠গͳ͘ͱ΋෼ʹ੾Γ্͛ ىಈ࣌ؒ ਺ेඵ਺෼ ਺ेඵ਺෼ ݴޠ 1ZUIPO 1ZUIPOPS4DBMB AWS GlueΛ࢖ͬͨ Serverless ETL ͷ࣮૷ύλʔϯ - Speaker Deck https://speakerdeck.com/seiichi1101/aws-gluewoshi-tuta-serverless-etl-falseshi-zhuang-patan-ac5553a5-f8e9-45fb-b093-4bcf0b113a48?slide=41
  44. ͳͥผϧʔτΛݕ౼͢Δͷ͔ʁ  ͜ͷن໛ͩͱ(MVF+PCͷىಈ࣌ؒͱ՝ֹ͕ۚωοΫ "84(MVF+PC 1ZUIPO4IFMM "84(MVF+PC 4QBSL  ʢ$MPVE'PSNBUJPOΑΓʣ ϝϞϦ

    (#PS(# (# ࣮ߦ࣌ؒ σϑΥϧτ࣌ؒ σϑΥϧτ࣌ؒ ಉ࣮࣌ߦ਺ ΞΧ΢ϯτ಺Ͱ ΞΧ΢ϯτ಺Ͱ ՝ֹۚ ੑೳͱ࣮ߦ࣌ؒϕʔεͷैྔ՝ۚ ͨͩ͠গͳ͘ͱ΋෼ʹ੾Γ্͛ ੑೳͱ࣮ߦ࣌ؒϕʔεͷैྔ՝ۚ ͨͩ͠গͳ͘ͱ΋෼ʹ੾Γ্͛ ىಈ࣌ؒ ਺ेඵ਺෼ ਺ेඵ਺෼ ݴޠ 1ZUIPO 1ZUIPOPS4DBMB AWS GlueΛ࢖ͬͨ Serverless ETL ͷ࣮૷ύλʔϯ - Speaker Deck https://speakerdeck.com/seiichi1101/aws-gluewoshi-tuta-serverless-etl-falseshi-zhuang-patan-ac5553a5-f8e9-45fb-b093-4bcf0b113a48?slide=41
  45. ͳͥผϧʔτΛݕ౼͢Δͷ͔ʁ  (MVF5BCMFͷΫΤϦ͸"UIFOBʹ೚࣮ͤͯߦ؀ڥΛ -BNCEB'VODUJPOʹ͢Δͱ͍͏ख "84(MVF+PC 1ZUIPO4IFMM "84(MVF+PC 4QBSL  ʢ$MPVE'PSNBUJPOΑΓʣ

    "84-BNCEB ϝϞϦ (#PS(# (# .# ࣮ߦ࣌ؒ σϑΥϧτ࣌ؒ σϑΥϧτ࣌ؒ ࠷େ෼ ಉ࣮࣌ߦ਺ ΞΧ΢ϯτ಺Ͱ ΞΧ΢ϯτ಺Ͱ Ϧʔδϣϯຖʹ ՝ֹۚ ෼ʹ੾Γ্͛ ෼ʹ੾Γ্͛ ϛϦඵʹ੾Γ্͛ ىಈ࣌ؒ ਺ेඵ਺෼ ਺ेඵ਺෼ ਺ϛϦඵ਺ඵ ݴޠ 1ZUIPO 1ZUIPOPS4DBMB αϙʔτ͍ͯ͠ΔϥϯλΠϜ ΧελϜϥϯλΠϜ AWS GlueΛ࢖ͬͨ Serverless ETL ͷ࣮૷ύλʔϯ - Speaker Deck https://speakerdeck.com/seiichi1101/aws-gluewoshi-tuta-serverless-etl-falseshi-zhuang-patan-ac5553a5-f8e9-45fb-b093-4bcf0b113a48?slide=41
  46. ΞʔΩςΫνϟ֓؍  ※࣮ࡍʹ͸ S3 ΦϒδΣΫτͷURLͳͲΛอଘ͓ͯ͘͠ DynamoDB ͕ඞཁ

  47. $POUFOUT  "84(MVFͱ"NB[PO"UIFOBͷؔ܎ αϯϓϧϢʔεέʔεɿΠϕϯτ؅ཧαʔϏε "84$%,ʹΑΔ&5-δϣϒ૊Έཱͯ

  48. "84$%,Λ࢖ͬͯ૊ΈཱͯΔྲྀΕͷҰ෦Λ঺հ  $%,ʹ͍ͭͯ͸ΫϥϝιͷΠϕϯτͰ΋ൃදͯ͠·͢ ʲొஃࢿྉʳAWS CDK Λ࢖ͬͨ αʔόʔϨεΞϓϦέʔγϣϯͷσϓϩΠํ๏ͱ࣮૷ྫΛ঺հ͠·ͨ͠ – Developes.IO 2020

    CONNECT #devio2020 | Developers.IO https://dev.classmethod.jp/articles/production-deploy-serverless-aws-cdk/
  49. "84$%,*OGSBTUSVDUVSFBT$PEF  5ZQF4DSJQUͳͲͷίʔυ͔Β$MPVE'PSNBUJPOς ϯϓϨʔτΛग़ྗɺσϓϩΠՄೳ What is the AWS CDK? -

    AWS Cloud Development Kit (AWS CDK) https://docs.aws.amazon.com/cdk/latest/guide/home.html
  50. $%,αϯϓϧɿ4 -BNCEB'VODUJPO 

  51. "UIFOBͬͯ$47Ͱ΋ΫΤϦͰ͖ΔΜ͡Όʁ  ࠓճͷηογϣϯ֓ཁͷΑ͏ͳ$47σʔλ಺ʹվߦΛؚΉ σʔλΛ͏·͘ύʔεͰ͖ͳ͔ͬͨ ଞʹ΋೔ຊޠͷΧϥϜ໊ͩͱύʔςΟγϣϯ͕੾Εͳ͍ࣄ৅Λ֬ೝ $47ͷจࣈίʔυ͕"UIFOBͰѻ͑ͳ͍4+*4ͩͬͨΓ͢Δ $47ੜσʔλΛͦͷ··+0*/͢ΔͷͰ͸ͳ͘ɺҰ౓+40/-JOFT ʹม׵͢ΔͳͲͨ͠΄͏͕Αͦ͞͏ ͜͏͍ͬͨࡉ͔͍ॲཧ͕Ͱ͖Δͷ΋-BNCEB'VODUJPOͷڧΈͷͻ ͱͭ

  52. $%,αϯϓϧɿ4 -BNCEB'VODUJPO  export async function sessionConverter( stack: cdk.Construct, global:

    GlobalProps, ): Promise<EuqueteBucketResource> { const rawCsvBucket = new s3.Bucket(stack, 'Raw', { bucketName: global.getBucketName('raw', cdk.Stack.of(stack).account), cors: [ { allowedHeaders: ['*'], allowedMethods: [HttpMethods.PUT], allowedOrigins: ['*'], }, ], }); const jsonLinesBucket = new s3.Bucket(stack, 'JsonLines', { bucketName: global.getBucketName( 'json-lines', cdk.Stack.of(stack).account, ), }); const convertSessionCsvToJsonLinesFn = new lambda.Function( stack, 'ConvertSessionCsvToJsonLines', { functionName: global.getFunctionName('ConvertSessionCsvToJsonLines'), code: lambda.Code.fromAsset(NODE_LAMBDA_SRC_DIR), handler: 'lambda/handlers/sf/csv-converter/convert-session-csv-to-json-handler.handler', runtime: lambda.Runtime.NODEJS_12_X, layers: [nodeModulesLayer], timeout: Duration.minutes(15), memorySize: 256, environment: { REGION: global.pm.region, CSV_BUCKET_NAME: rawCsvBucket.bucketName, JSON_LINES_BUCKET_NAME: jsonLinesBucket.bucketName, }, }, ); rawCsvBucket.grantRead(convertSessionCsvToJsonLinesFn); jsonLinesBucket.grantReadWrite(convertSessionCsvToJsonLinesFn);
  53. $%,αϯϓϧɿ4 -BNCEB'VODUJPO  export async function sessionConverter( stack: cdk.Construct, global:

    GlobalProps, ): Promise<EuqueteBucketResource> { const rawCsvBucket = new s3.Bucket(stack, 'Raw', { bucketName: global.getBucketName('raw', cdk.Stack.of(stack).account), cors: [ { allowedHeaders: ['*'], allowedMethods: [HttpMethods.PUT], allowedOrigins: ['*'], }, ], }); const jsonLinesBucket = new s3.Bucket(stack, 'JsonLines', { bucketName: global.getBucketName( 'json-lines', cdk.Stack.of(stack).account, ), }); const convertSessionCsvToJsonLinesFn = new lambda.Function( stack, 'ConvertSessionCsvToJsonLines', { functionName: global.getFunctionName('ConvertSessionCsvToJsonLines'), code: lambda.Code.fromAsset(NODE_LAMBDA_SRC_DIR), handler: 'lambda/handlers/sf/csv-converter/convert-session-csv-to-json-handler.handler', runtime: lambda.Runtime.NODEJS_12_X, layers: [nodeModulesLayer], timeout: Duration.minutes(15), memorySize: 256, environment: { REGION: global.pm.region, CSV_BUCKET_NAME: rawCsvBucket.bucketName, JSON_LINES_BUCKET_NAME: jsonLinesBucket.bucketName, }, }, ); rawCsvBucket.grantRead(convertSessionCsvToJsonLinesFn); jsonLinesBucket.grantReadWrite(convertSessionCsvToJsonLinesFn); w $47ੜσʔλΛ౤ೖ͢Δ4 όέοτ w QSFTJHOFEVSM͔ΒΞοϓ ϩʔυͯ͠΋Β͏ͷͰ$034 ઃఆ
  54. $%,αϯϓϧɿ4 -BNCEB'VODUJPO  export async function sessionConverter( stack: cdk.Construct, global:

    GlobalProps, ): Promise<EuqueteBucketResource> { const rawCsvBucket = new s3.Bucket(stack, 'Raw', { bucketName: global.getBucketName('raw', cdk.Stack.of(stack).account), cors: [ { allowedHeaders: ['*'], allowedMethods: [HttpMethods.PUT], allowedOrigins: ['*'], }, ], }); const jsonLinesBucket = new s3.Bucket(stack, 'JsonLines', { bucketName: global.getBucketName( 'json-lines', cdk.Stack.of(stack).account, ), }); const convertSessionCsvToJsonLinesFn = new lambda.Function( stack, 'ConvertSessionCsvToJsonLines', { functionName: global.getFunctionName('ConvertSessionCsvToJsonLines'), code: lambda.Code.fromAsset(NODE_LAMBDA_SRC_DIR), handler: 'lambda/handlers/sf/csv-converter/convert-session-csv-to-json-handler.handler', runtime: lambda.Runtime.NODEJS_12_X, layers: [nodeModulesLayer], timeout: Duration.minutes(15), memorySize: 256, environment: { REGION: global.pm.region, CSV_BUCKET_NAME: rawCsvBucket.bucketName, JSON_LINES_BUCKET_NAME: jsonLinesBucket.bucketName, }, }, ); rawCsvBucket.grantRead(convertSessionCsvToJsonLinesFn); jsonLinesBucket.grantReadWrite(convertSessionCsvToJsonLinesFn); w +40/-JOFTΛอଘ͓ͯ͘͠ όέοτ
  55. $%,αϯϓϧɿ4 -BNCEB'VODUJPO  export async function sessionConverter( stack: cdk.Construct, global:

    GlobalProps, ): Promise<EuqueteBucketResource> { const rawCsvBucket = new s3.Bucket(stack, 'Raw', { bucketName: global.getBucketName('raw', cdk.Stack.of(stack).account), cors: [ { allowedHeaders: ['*'], allowedMethods: [HttpMethods.PUT], allowedOrigins: ['*'], }, ], }); const jsonLinesBucket = new s3.Bucket(stack, 'JsonLines', { bucketName: global.getBucketName( 'json-lines', cdk.Stack.of(stack).account, ), }); const convertSessionCsvToJsonLinesFn = new lambda.Function( stack, 'ConvertSessionCsvToJsonLines', { functionName: global.getFunctionName('ConvertSessionCsvToJsonLines'), code: lambda.Code.fromAsset(NODE_LAMBDA_SRC_DIR), handler: 'lambda/handlers/sf/csv-converter/convert-session-csv-to-json-handler.handler', runtime: lambda.Runtime.NODEJS_12_X, layers: [nodeModulesLayer], timeout: Duration.minutes(15), memorySize: 256, environment: { REGION: global.pm.region, CSV_BUCKET_NAME: rawCsvBucket.bucketName, JSON_LINES_BUCKET_NAME: jsonLinesBucket.bucketName, }, }, ); rawCsvBucket.grantRead(convertSessionCsvToJsonLinesFn); jsonLinesBucket.grantReadWrite(convertSessionCsvToJsonLinesFn); w ੜσʔλόέοτ͔Βϩʔυ ͠ɺม׵͠ɺ+40/-JOFTό έοτʹ౤ೖ͢Δ-BNCEB 'VODUJPO
  56. $%,αϯϓϧɿ(MVF5BCMF "NB[PO"UIFOB 

  57. $%,αϯϓϧɿ(MVF5BCMF "NB[PO"UIFOB  ɾ4σʔλιʔε (MVF5BCMF ɾ-BNCEB'VODUJPO "NB[PO"UIFOB

  58. $%,αϯϓϧɿ4 (MVF5BCMF  export async function eventGlueTableResource( stack: cdk.Construct, global:

    GlobalProps, s3Resource: SlsGlueTableResourceInput, ): Promise<EventGlueTableResource> { const slsVirtualDataBase = new glue.Database(stack, 'SlsVirtualDatabase', { databaseName: global.getGlueDatabaseName('slsVirtual_application'), }); const sessionGlueTable = new glue.Table(stack, 'SessionTable', { database: slsVirtualDataBase, tableName: 'session', columns: [ { name: 'sessionId', type: glue.Schema.STRING, }, { name: 'email', type: glue.Schema.STRING, }, { name: 'title', type: glue.Schema.STRING, }, { name: 'description', type: glue.Schema.STRING, }, { name: 'organizationName', type: glue.Schema.STRING, }, { name: 'name', type: glue.Schema.STRING, }, ], dataFormat: glue.DataFormat.JSON, bucket: s3Resource.jsonLinesBucket, s3Prefix: 'session/', partitionKeys: [ { name: 'eventId', type: glue.Schema.STRING, }, { name: 'versionId', type: glue.Schema.STRING, }, ], });
  59. $%,αϯϓϧɿ4 (MVF5BCMF  export async function eventGlueTableResource( stack: cdk.Construct, global:

    GlobalProps, s3Resource: SlsGlueTableResourceInput, ): Promise<EventGlueTableResource> { const slsVirtualDataBase = new glue.Database(stack, 'SlsVirtualDatabase', { databaseName: global.getGlueDatabaseName('slsVirtual_application'), }); const sessionGlueTable = new glue.Table(stack, 'SessionTable', { database: slsVirtualDataBase, tableName: 'session', columns: [ { name: 'sessionId', type: glue.Schema.STRING, }, { name: 'email', type: glue.Schema.STRING, }, { name: 'title', type: glue.Schema.STRING, }, { name: 'description', type: glue.Schema.STRING, }, { name: 'organizationName', type: glue.Schema.STRING, }, { name: 'name', type: glue.Schema.STRING, }, ], dataFormat: glue.DataFormat.JSON, bucket: s3Resource.jsonLinesBucket, s3Prefix: 'session/', partitionKeys: [ { name: 'eventId', type: glue.Schema.STRING, }, { name: 'versionId', type: glue.Schema.STRING, }, ], }); w (MVF%BUBCBTFͱ(MVF 5BCMFΛ࡞੒
  60. $%,αϯϓϧɿ4 (MVF5BCMF  export async function eventGlueTableResource( stack: cdk.Construct, global:

    GlobalProps, s3Resource: SlsGlueTableResourceInput, ): Promise<EventGlueTableResource> { const slsVirtualDataBase = new glue.Database(stack, 'SlsVirtualDatabase', { databaseName: global.getGlueDatabaseName('slsVirtual_application'), }); const sessionGlueTable = new glue.Table(stack, 'SessionTable', { database: slsVirtualDataBase, tableName: 'session', columns: [ { name: 'sessionId', type: glue.Schema.STRING, }, { name: 'email', type: glue.Schema.STRING, }, { name: 'title', type: glue.Schema.STRING, }, { name: 'description', type: glue.Schema.STRING, }, { name: 'organizationName', type: glue.Schema.STRING, }, { name: 'name', type: glue.Schema.STRING, }, ], dataFormat: glue.DataFormat.JSON, bucket: s3Resource.jsonLinesBucket, s3Prefix: 'session/', partitionKeys: [ { name: 'eventId', type: glue.Schema.STRING, }, { name: 'versionId', type: glue.Schema.STRING, }, ], }); w σʔλιʔεͱͳΔ4ό έοτΛࢦఆ
  61. $%,αϯϓϧɿ4 (MVF5BCMF  const ticketGlueTable = new glue.Table(stack, 'TicketTable', {

    database: slsVirtualDataBase, tableName: 'ticket', columns: [ { name: 'email', type: glue.Schema.STRING, }, { name: 'ticketId', type: glue.Schema.STRING, }, ], dataFormat: glue.DataFormat.JSON, bucket: s3Resource.jsonLinesBucket, s3Prefix: 'ticket/', partitionKeys: [ { name: 'eventId', type: glue.Schema.STRING, }, { name: 'versionId', type: glue.Schema.STRING, }, ], }); w νέοτ༻ͷ(MVF5BCMF΋ ಉ͡Α͏ʹ࡞੒
  62. $%,αϯϓϧɿ4 (MVF5BCMF  const entryTable = new glue.Table(stack, 'EntryTable', {

    database: slsVirtualDataBase, tableName: 'entry', columns: [ { name: 'sessionId', type: glue.Schema.STRING, }, { name: 'name', type: glue.Schema.STRING, }, { name: 'title', type: glue.Schema.STRING, }, { name: 'description', type: glue.Schema.STRING, }, { name: 'ticketId', type: glue.Schema.STRING, }, ], dataFormat: new glue.DataFormat({ inputFormat: glue.InputFormat.TEXT, outputFormat: glue.OutputFormat.HIVE_IGNORE_KEY_TEXT, serializationLibrary: glue.SerializationLibrary.HIVE_JSON, }), bucket: s3Resource.entryBucket, s3Prefix: 'entry/', partitionKeys: [ { name: 'eventId', type: glue.Schema.STRING, }, ], w ࠷ޙʹɺ"UIFOBͰ+0*/͠ ͨσʔλΛ౤ೖ͢Δ༻ͷ (MVF5BCMF΋༻ҙ͓ͯ͘͠
  63. +0*/͢Δ-BNCEB'VODUJPOͷίʔυ͸͜Μͳײ͡  export async function handler( event: JoinRequest, context?: LambdaContext,

    ): Promise<JoinResponse> { const param: DocumentClient.GetItemInput = { TableName: EventDynamodbTableName, Key: { id: event.eventId, }, }; const res = await dynamo.get(param).promise(); const sessionVersionId = res.Item!.sessionJsonLinesInfo.versionId; const ticketVersionId = res.Item!.ticketJsonLinesInfo.versionId; try { const param = { ResultConfiguration: { OutputLocation: `s3://${AthenaOutputBucketName}/`, }, QueryExecutionContext: { Database: SlsVirtualGlueDataBaseName, }, QueryString: [ `insert into ${EntryGlueTableName}`, 'select session.sessionId,', 'ticket.ticketId,', 'session.name,', 'session.organizationName,', 'session.title,', 'session.description,', 'ticket.email', // lower case for partition key `from ${SessionGlueTableName} session inner join ${TicketGlueTableName} ticket on session.email = ticket.email`, `where session.versionId = '${sessionVersionId}' and ticket.versionId = '${ticketVersionId}'`, ].join(' '), }; console.log('insert into param', param); const execution = await athena.startQueryExecution(param).promise(); console.log(execution); return { executionId: execution.QueryExecutionId!, }; } catch (e) { console.log(e); throw e; } }
  64. export async function handler( event: JoinRequest, context?: LambdaContext, ): Promise<JoinResponse>

    { const param: DocumentClient.GetItemInput = { TableName: EventDynamodbTableName, Key: { id: event.eventId, }, }; const res = await dynamo.get(param).promise(); const sessionVersionId = res.Item!.sessionJsonLinesInfo.versionId; const ticketVersionId = res.Item!.ticketJsonLinesInfo.versionId; try { const param = { ResultConfiguration: { OutputLocation: `s3://${AthenaOutputBucketName}/`, }, QueryExecutionContext: { Database: SlsVirtualGlueDataBaseName, }, QueryString: [ `insert into ${EntryGlueTableName}`, 'select session.sessionId,', 'ticket.ticketId,', 'session.name,', 'session.organizationName,', 'session.title,', 'session.description,', 'ticket.email', // lower case for partition key `from ${SessionGlueTableName} session inner join ${TicketGlueTableName} ticket on session.email = ticket.email`, `where session.versionId = '${sessionVersionId}' and ticket.versionId = '${ticketVersionId}'`, ].join(' '), }; console.log('insert into param', param); const execution = await athena.startQueryExecution(param).promise(); console.log(execution); return { executionId: execution.QueryExecutionId!, }; } catch (e) { console.log(e); throw e; } } -BNCEB'VODUJPOͷίʔυ͸͜Μͳײ͡  w ϝΞυͰ+0*/ w +0*/ޙͷσʔλΛ&OUSZ ςʔϒϧʹ౤ೖ
  65. export async function handler( event: JoinRequest, context?: LambdaContext, ): Promise<JoinResponse>

    { const param: DocumentClient.GetItemInput = { TableName: EventDynamodbTableName, Key: { id: event.eventId, }, }; const res = await dynamo.get(param).promise(); const sessionVersionId = res.Item!.sessionJsonLinesInfo.versionId; const ticketVersionId = res.Item!.ticketJsonLinesInfo.versionId; try { const param = { ResultConfiguration: { OutputLocation: `s3://${AthenaOutputBucketName}/`, }, QueryExecutionContext: { Database: SlsVirtualGlueDataBaseName, }, QueryString: [ `insert into ${EntryGlueTableName}`, 'select session.sessionId,', 'ticket.ticketId,', 'session.name,', 'session.organizationName,', 'session.title,', 'session.description,', 'ticket.email', // lower case for partition key `from ${SessionGlueTableName} session inner join ${TicketGlueTableName} ticket on session.email = ticket.email`, `where session.versionId = '${sessionVersionId}' and ticket.versionId = '${ticketVersionId}'`, ].join(' '), }; console.log('insert into param', param); const execution = await athena.startQueryExecution(param).promise(); console.log(execution); return { executionId: execution.QueryExecutionId!, }; } catch (e) { console.log(e); throw e; } } -BNCEB'VODUJPOͷίʔυ͸͜Μͳײ͡  w "UIFOBΫΤϦΛ࣮ߦ
  66. ίʔυαϯϓϧɿ44FMFDU 

  67. async function createS3SelectStream<T>( request: EntryRequest, expression: string, ): Promise<Subject<T>> {

    const source$ = new Subject<T>(); const content = await s3 .selectObjectContent({ Bucket: request.bucket, Key: request.key, InputSerialization: { JSON: { Type: 'LINES' }, CompressionType: 'GZIP', }, OutputSerialization: { JSON: { RecordDelimiter: '\n' } }, Expression: expression, ExpressionType: 'SQL', }) .promise(); const contentStream: any = content.Payload!; const filterRecordsOnly = new stream.Transform({ objectMode: true, transform( chunk: any, encoding: string, done: stream.TransformCallback, ): void { if (chunk.Records) { this.push(chunk.Records.Payload); } else if (chunk.Stats) { console.log( `Processed ${chunk.Stats.Details.BytesProcessed} bytes`, ); } else if (chunk.End) { console.log('SelectObjectContent completed'); } done(); }, }); const readLine = readline.createInterface( contentStream.pipe(filterRecordsOnly), ); readLine .on('line', (line: any) => { // console.log('line', line); source$.next(JSON.parse(line) as T); }) .on('close', () => { console.log('end'); source$.complete(); 44FMFDUͷ-BNCEB'VODUJPOίʔυ 
  68. async function createS3SelectStream<T>( request: EntryRequest, expression: string, ): Promise<Subject<T>> {

    const source$ = new Subject<T>(); const content = await s3 .selectObjectContent({ Bucket: request.bucket, Key: request.key, InputSerialization: { JSON: { Type: 'LINES' }, CompressionType: 'GZIP', }, OutputSerialization: { JSON: { RecordDelimiter: '\n' } }, Expression: expression, ExpressionType: 'SQL', }) .promise(); const contentStream: any = content.Payload!; const filterRecordsOnly = new stream.Transform({ objectMode: true, transform( chunk: any, encoding: string, done: stream.TransformCallback, ): void { if (chunk.Records) { this.push(chunk.Records.Payload); } else if (chunk.Stats) { console.log( `Processed ${chunk.Stats.Details.BytesProcessed} bytes`, ); } else if (chunk.End) { console.log('SelectObjectContent completed'); } done(); }, }); const readLine = readline.createInterface( contentStream.pipe(filterRecordsOnly), ); readLine .on('line', (line: any) => { // console.log('line', line); source$.next(JSON.parse(line) as T); }) .on('close', () => { console.log('end'); source$.complete(); 44FMFDUͷ-BNCEB'VODUJPOίʔυ  w TFMFDU GSPNTPCKFDUPCKXIFSFPCKUJDLFUJE b`
  69. async function createS3SelectStream<T>( request: EntryRequest, expression: string, ): Promise<Subject<T>> {

    const source$ = new Subject<T>(); const content = await s3 .selectObjectContent({ Bucket: request.bucket, Key: request.key, InputSerialization: { JSON: { Type: 'LINES' }, CompressionType: 'GZIP', }, OutputSerialization: { JSON: { RecordDelimiter: '\n' } }, Expression: expression, ExpressionType: 'SQL', }) .promise(); const contentStream: any = content.Payload!; const filterRecordsOnly = new stream.Transform({ objectMode: true, transform( chunk: any, encoding: string, done: stream.TransformCallback, ): void { if (chunk.Records) { this.push(chunk.Records.Payload); } else if (chunk.Stats) { console.log( `Processed ${chunk.Stats.Details.BytesProcessed} bytes`, ); } else if (chunk.End) { console.log('SelectObjectContent completed'); } done(); }, }); const readLine = readline.createInterface( contentStream.pipe(filterRecordsOnly), ); readLine .on('line', (line: any) => { // console.log('line', line); source$.next(JSON.parse(line) as T); }) .on('close', () => { console.log('end'); source$.complete(); 44FMFDUͷ-BNCEB'VODUJPOίʔυ  w 44FMFDU͸݁ՌΛಠࣗͷ/PEFKTετϦʔϜͰฦ͢ͷͰ ͕Μ͹ͬͯऔಘ͍ͯ͠Δ༷ࢠ
  70. ίʔυαϯϓϧͷ·ͱΊ  -BNCEB'VODUJPOΛ࢖ͬͨ&5-Ͱ"84$%,Λ࢖ ͍૊ΈཱͯΔαϯϓϧΛࣔͨ͠ "84ߏ੒ਤ͔Β$%,ίʔυʹམͱͨ͢Ίʹ͸ɺ֤αʔϏεͷ ࣮ମΛ஌Δ͜ͱ͕ۙಓ ཁ݅ʹΑͬͯϚον͢ΔαʔϏε͸ҟͳΔɻ߹க͢ΔαʔϏε ͕ͳʹ͔ɺࣾ಺ͰɺαʔόʔϨείϛϡχςΟͰٞ࿦͠·͠ΐ ͏

  71. ·ͱΊ  ϑΝΠϧΛ+0*/͢ΔϢʔεέʔεʹରͯ͠ɺαʔόʔϨ εͰϥΠτͳ&5-Λߦͬͨ ෳ਺ͷαʔϏεͷσʔλΛ૊Έ߹Θͤͯɺͻͱͭͷ݁ՌΛಘ Δͱ͍͏γνϡΤʔγϣϯʹૺ۰ͨ͠ͱ͖ʹɺࠓճͷΑ͏ʹ ͍ͬͨΜ4ϑΝΠϧ΁ग़ྗɺͦͷޙ"UIFOBͰ+0*/ͯ͠4 4FMFDUͰΫΤϦ͢Δͱ͍͏ྲྀΕ͕࢖͑Δ͜ͱΛࢥ͍ग़ͯ͠΋Β ͑Ε͹޾͍Ͱ͢

  72. ࢀߟॻ੶ɾར༻ͨ͠πʔϧ  w DODGXHTFSWFSMFTT$/$'4FSWFSMFTT8(IUUQTHJUIVCDPNDODG XHTFSWFSMFTT w "84$%,IUUQTHJUIVCDPNBXTBXTDEL w DNXBEBZVTVLFUTBTDELIUUQTHJUIVCDPNDNXBEBZVTVLF UTBTDEL