Step Functions と Lambda Function で組む Athena によるファイル処理 / serverless lightweight etl

Step Functions と Lambda Function で組む Athena によるファイル処理 / serverless lightweight etl

本セッションではイベント管理サービスを例にとり、セッション情報のCSVファイルとチケット情報のCSVファイルからひとつのファイルに変換するETLを考えていきます。AWS Glue と Amazon Athena のことを知り、サーバーレスアプリケーションに組み込むイメージを持って帰ってもらうことが目的です。

アプローチ:
・AWS Glue と Amazon Athena のことを知る
・ファイル to ファイル のETLユースケースを考える
・AWS CDK でリソースを作成してみる

664b6e8ebe272fcfa5dbd6070eaf3cd4?s=128

Yusuke Wada

July 29, 2020
Tweet

Transcript

  1. AWS Serverless ͱϑΝΠϧ Step Functions ͱ Lambda Function Ͱ૊Ή Athena

    ʹΑΔϑΝΠϧॲཧ #serverlessjp
  2. ɹαʔόʔϨεͰϑΝΠϧ to ϑΝΠϧͷETL ࠓ೔ͷ࿩

  3. None
  4. αʔόʔϨεͰETLͱ͍͑͹ AWS Glue ?

  5. None
  6. Amazon Athena Ͱσʔλநग़ʁ

  7. Θͨ͠ͷ"84(MVF΍"NB[PO"UIFOBͷୈҰҹ৅  "84ίϯιʔϧ͔Β࢖ͬͨ͜ͱ͋Δ໊͠લ΋஌ͬͯΔ ͚ͲɺΞϓϦέʔγϣϯʹ૊ΈࠐΉΠϝʔδ͕͋·ΓΘ ͔ͳ͍

  8. Θͨ͠ͷ"84(MVF΍"NB[PO"UIFOBͷୈҰҹ৅  αʔϏεͷ࣮ମ͕Α͘Θ͔ΒΜ 4ʢΩʔόϦϡʔͷʣετϨʔδ %ZOBNP%#ɿ/P42-  Ћ ͷϑϧϚωʔδυσʔλϕʔε "84"QQ4ZODɿ(SBQI2-"1*ͷήʔτ΢ΣΠ 4UFQ'VODUJPOTɿλεΫΛఆٛ͠ঢ়ଶભҠͱͯ͠δϣϒΛ૊Έཱͯ

    "84(MVFɿͳΜ͔ม׵͢Δਓʁ "NB[PO"UIFOBɿ4ͷσʔλʹରͯ͠ΫΤϦൃߦͯ͘͠ΕΔਓʁ
  9. ຊηογϣϯͷ໨త  "84(MVFͱ"NB[PO"UIFOBͷ͜ͱΛ஌ΓɺαʔόʔϨ εΞϓϦέʔγϣϯʹ૊ΈࠐΉΠϝʔδΛ࣋ͬͯؼͬͯ΋Β͏ Ξϓϩʔνɿ "84(MVFͱ"NB[PO"UIFOBͷ͜ͱΛ஌Δ ϑΝΠϧUPϑΝΠϧͷ&5-ϢʔεέʔεΛߟ͑Δ "84$%,ͰϦιʔεΛ࡞੒ͯ͠ΈΔ

  10. Ϋϥϝιͷ࿨ాͰ͢αʔόʔϨε޷͖Ͱ͢  ࿨ా༞հ $9ࣄۀຊ෦ αʔόʔϨεΤϯδχΞ XBEEZ@V ༗ޮظݶ੾Ε

  11. $POUFOUT  "84(MVFͱ"NB[PO"UIFOBͷؔ܎ αϯϓϧϢʔεέʔεɿΠϕϯτ؅ཧαʔϏε "84$%,ʹΑΔ&5-δϣϒ૊Έཱͯ

  12. "84(MVF  ߏ੒ཁૉ͸ɺσʔλΧλϩάͱαʔόʔϨε&5-ɺΦʔ έετϨʔγϣϯͷ̏ͭ ߏ੒ཁૉ Ϧιʔε໊௨শ ʢ$MPVE'PSNBUJPOΑΓʣ σʔλΧλϩά "84(MVF5BCMF αʔόʔϨε&5-

    "84(MVF+PC ΦʔέετϨʔγϣϯ "84(MVF8PSLqPX ࢀߟɿAWS CloudFormation ςϯϓϨʔτΛ࢖༻ͨ͠σʔλΧλϩάͷࣄલઃఆ - https://docs.aws.amazon.com/ja_jp/glue/latest/dg/populate-with-cloudformation-templates.html
  13. "84(MVF  ߏ੒ཁૉ͸ɺσʔλΧλϩάͱαʔόʔϨε&5-ɺΦʔ έετϨʔγϣϯͷ̏ͭ ߏ੒ཁૉ Ϧιʔε໊௨শ ʢ$MPVE'PSNBUJPOΑΓʣ σʔλΧλϩά "84(MVF5BCMF αʔόʔϨε&5-

    "84(MVF+PC ΦʔέετϨʔγϣϯ "84(MVF8PSLqPX ࢀߟɿAWS CloudFormation ςϯϓϨʔτΛ࢖༻ͨ͠σʔλΧλϩάͷࣄલઃఆ - https://docs.aws.amazon.com/ja_jp/glue/latest/dg/populate-with-cloudformation-templates.html
  14.  Glue Table

  15. "84(MVF5BCMF  "QBDIF)JWFޓ׵ͷϝλετΞ ΋͏গ͠Θ͔Γ΍͘͢ɿεΩʔϚఆ͚ٛͩΛ(MVFʹ΋ͨͤΒΕΔ σʔλͷ࣮ମ͸ผʹ࣋ͨͤɺ(MVFͱͯ͠͸ఆ͚͓ٛͩ࣋ͬͯ͘ (MVF5BCMF͸͔͋ͨ΋εΩʔϚఆٛΛ΋ͬͨςʔϒϧͷΑ͏ʹѻ͑ Δ ྫɿ"NB[PO"UIFOBͰΫΤϦͨ͠Γɺ(MVF+PCͰΧϥϜ໊͔Β σʔλΛͱ͖ͬͯͨΓͰ͖Δ

  16. "84(MVF5BCMF  σʔλͷ࣮ମ 4ͳͲ  εΩʔϚఆٛͳͲͷϝλσʔ λ 20190806 AWS Black

    Belt Online Seminar AWS Glue https://www.slideshare.net/AmazonWebServicesJapan/20190806-aws-black-belt-online-seminar-aws-glue?ref=https://aws.amazon.com/jp/blogs/news/webinar-bb-aws-glue-2019/
  17.  Glue Job

  18. "84(MVF+PC  &5-ʢσʔλม׵ॲཧʣͷຊମ σʔλΧλϩάʢ(MVF5BCMF ͔ΒσʔλΛநग़͠ɺϩδο ΫΛద༻ͯ͠ɺผͷσʔλετΞʢλʔήοτʣʹग़ྗ͢Δ

  19. "84(MVF+PC  εΫϦϓτ͸1ZUIPO4IFMMͱ4QBSLͰબ୒ "84(MVF+PC 1ZUIPO4IFMM "84(MVF+PC 4QBSL  ʢ$MPVE'PSNBUJPOΑΓʣ ϝϞϦ

    (#PS(# (# ࣮ߦ࣌ؒ σϑΥϧτ࣌ؒ σϑΥϧτ࣌ؒ ಉ࣮࣌ߦ਺ ΞΧ΢ϯτ಺Ͱ ΞΧ΢ϯτ಺Ͱ ՝ֹۚ ੑೳͱ࣮ߦ࣌ؒϕʔεͷैྔ՝ۚ ͨͩ͠গͳ͘ͱ΋෼ʹ੾Γ্͛ ੑೳͱ࣮ߦ࣌ؒϕʔεͷैྔ՝ۚ ͨͩ͠গͳ͘ͱ΋෼ʹ੾Γ্͛ ىಈ࣌ؒ ਺ेඵ਺෼ ਺ेඵ਺෼ ݴޠ 1ZUIPO 1ZUIPOPS4DBMB AWS GlueΛ࢖ͬͨ Serverless ETL ͷ࣮૷ύλʔϯ - Speaker Deck https://speakerdeck.com/seiichi1101/aws-gluewoshi-tuta-serverless-etl-falseshi-zhuang-patan-ac5553a5-f8e9-45fb-b093-4bcf0b113a48?slide=41
  20.  Glue Workflow

  21. "84(MVF8PSLqPX  ΫϩʔϥͱδϣϒΛ࢖ͬͯҰ࿈ͷॲཧΛఆٛͰ͖Δ wਐߦঢ়گΛ֬ೝͰ͖Δ wෳ਺ͷδϣϒΛ૊Έ߹Θ ͤΒΕΔ w4UFQ'VODUJPOTͱࣅͯ ͍Δ͕ɺͪ͜Β͸(MVF ͷΫϩʔϥ (MVF+PCͷ

    ΈࢦఆՄೳ
  22.  Amazon Athena ͱͷؔ܎

  23. None
  24. None
  25. ެࣜτοϓʹ΋͏ॻ͍ͯΔ

  26. "84(MVF5BCMFͱ"NB[PO"UIFOB  4σʔλΧλϩάʹରͯ͠1SFTUPΫΤϦΛൃߦ

  27. "84(MVF5BCMFͱ"NB[PO"UIFOB  4σʔλΧλϩάʹରͯ͠1SFTUPΫΤϦΛൃߦ

  28. ͳΜͱͳ͘Πϝʔδ͕༙͍ͨͰ͠ΐ͏͔  αʔϏεͷ࣮ମʹ࠶νϟϨϯδ "84(MVFɿ(MVF5BCMF (MVF+PC (MVF8PSLqPXʹ෼ղ Ͱ͖ɺͦΕͧΕࣗ༝ʹ૊Έ߹ΘͤΒΕΔ&5-αʔϏε "NB[PO"UIFOBɿ4Λσʔλιʔεͱ͢Δ(MVF5BCMFʹ ରͯ͠1SFTUPΫΤϦΛൃߦ͠ɺ݁ՌΛ4ʹग़ྗ͢ΔαʔϏ ε

  29. $POUFOUT  "84(MVFͱ"NB[PO"UIFOBͷؔ܎ αϯϓϧϢʔεέʔεɿΠϕϯτ؅ཧαʔϏε "84$%,ʹΑΔ&5-δϣϒ૊Έཱͯ

  30. ϑΝΠϧΛѻ͍ͦ͏ͳγνϡΤʔγϣϯ  ՍۭͷΠϕϯτ؅ཧαʔϏε (PPHMF'PSNͳͲͰηογϣϯΛެื͢Δʢ*%ϝΞυʣ ಉ࣌ʹɺผͷ࢓૊ΈͰࢀՃνέοτΛൃߦ͢Δʢ*%ϝΞυʣ ͜ΕΒ͸$47ग़ྗՄೳ νέοτ*%Λࢦఆ͢Δͱɺηογϣϯ৘ใΛऔಘͰ͖ΔΑ͏ͳ "1*Λ࡞Γ͍ͨ

  31. ϑΝΠϧΛѻ͍ͦ͏ͳγνϡΤʔγϣϯ  ՍۭͷΠϕϯτ؅ཧαʔϏε (PPHMF'PSNͳͲͰηογϣϯΛެื͢Δʢ*%ϝΞυʣ ಉ࣌ʹɺผͷ࢓૊ΈͰࢀՃνέοτΛൃߦ͢Δʢ*%ϝΞυʣ ͜ΕΒ͸$47ग़ྗՄೳ νέοτ*%Λࢦఆ͢Δͱɺηογϣϯ৘ใΛऔಘͰ͖ΔΑ͏ͳ "1*Λ࡞Γ͍ͨ

  32. ͳͥʁ  ෳ਺ͷαʔϏεͷ৘ใ͔ΒͻͱͭͷαʔϏεͷόοΫΤϯυ ʹݟ͔͚ͤΔ࢓૊Έ͸ࠓޙ૿͑ͦ͏ ศརͳαʔϏε͕ͨ͘͞Μ ͲΕ͔ҰͭͷαʔϏεΛ࢖͍ͬͯΕ͹ਓੜ͕׬݁͢Δͱ͍͏͜ͱ͸ গͳ͘ͳ͖ͬͯͦ͏ େ఍ͷαʔϏεͰ͸ɺσʔλΛ$47ͱͯ͠ग़ྗՄೳ 8FCIPPL &WFOU#SJEHFʜ͍Ζ͍Ζͳ࿈ܞํ๏͕͋Δ͕ɺϑΝΠϧ

    Λೖྗͱͯ͠&5-͢Δधཁ΋ɺࠓޙ૿͑ͦ͏
  33. ηογϣϯ৘ใͷެืͱ$47ग़ྗ  .csv

  34. ࢀՃऀνέοτͷ$47ग़ྗ  λΠϜελϯϓ ໊͓લ ϝʔϧΞυϨε ॴଐ૊৫ ࢀՃ೔ ޕޙ(.5  ࿨ా༞հ

    XBEBZVTVLF !DMBTTNFUIPEKQ Ϋϥεϝιουגࣜձࣾ ೔ ޕޙ(.5  ৽Ҫ੣Ұ XBEBZVTVLF !DMBTTNFUIPEKQ Ϋϥεϝιουגࣜձࣾ ೔ ࢀՃ೔Λ͓બͼ͍ͩ͘͞ɻ νέοτ*% Ϣʔβʔ*% ೔໨೔໨೔໨ BFBCFCBEFCEFG GEEBBDBFCBC ೔໨೔໨೔໨ FGFBBCFBBCF ECED
  35. νέοτ*%͔Βηογϣϯ৘ใΛҾ͘  (&5IUUQTBQJNFFUVQQFSDPNTFTTJPO FWFOU*E UJDLFU*EBFBCFCBEFCEFG { "name": "࿨ా ༞հ", "title":

    "αʔόʔϨε x ϑΝΠϧɿAmazon Athena ͱ Step Functions Λ૊Έ߹ΘͤͯϑΝΠϧΛ݁߹͢Δ", "description": "αʔόʔϨεͰϑΝΠϧΛѻ͏γʔϯ͕૿͖͑ͯ·ͨ͠ɻAWSͰอଘ͓ͯ͘͠৔ॴͱͯ͠S3͸༗ྗͰ͕͢ɺΞ ϓϦέʔγϣϯͰར༻Ͱ͖Δܗࣜʹม׵͢Δํ๏ͱͯ͠ɺଟ͘ͷબ୒ࢶ͕͋Γ·͢ɻ\n͜ͷηογϣϯͰ͸ɺʮผʑͷαʔϏε Ͱൃߦ͞ΕͨCSVϑΝΠϧΛJOIN͢Δʯͱ͍͏Ϣʔεέʔεʹର͠ɺAmazonAthenaΛ࢖ͬͨΞϓϩʔνΛ঺հ͠·͢ɻ͞Β ʹɺҰ࿈ͷॲཧΛStepFunctionsʹ·ͱΊΔ͜ͱͰɺδϣϒͱͯ͠ఆٛ͢Δํ๏Λࣔ͠·͢ɻ" }
  36. ཁ݅Λ֬ೝ  ࠷ऴతʹ͸νέοτ*%͔Βɺηογϣϯσʔλ͕Ҿͬ ுͬͯ͜ΒΕΕ͹ྑ͍ ืूظؒऴྃޙ͸$47ϑΝΠϧͷมߋ͸΄΅ͳ͍ͷͰɺϦΞϧ λΠϜʹมߋΛ൓ө͢Δඞཁ͸ͳ͍ ͨͩ͠؅ཧը໘ͳͲ͔Β$47ϑΝΠϧΛΞοϓϩʔυ͢Δ͜ͱ ͰมߋΛ൓өͰ͖ΔΑ͏ʹͯ͠΄͍͠ɻ͍͍ͩͨ෼͘Β͍Ͱ Πϕϯτ͝ͱʹืू͢ΔͷͰɺΠϕϯτऴྃޙ͸ΞʔΧΠϒՄೳ

  37. Ͳ͏࣮ݱ͢Δ͔  3%#Ͱ͋Ε͹ͦΕͧΕͷςʔϒϧʹσʔλΛೖΕͯϦ ΫΤετ࣌ʹ+0*/͢Ε͹ྑ͍͕ʜ "84αʔόʔϨεͩͱҎԼύλʔϯ͍ͣΕ͔

  38. Ͳ͏࣮ݱ͢Δ͔  3%#Ͱ͋Ε͹ͦΕͧΕͷςʔϒϧʹσʔλΛೖΕͯϦ ΫΤετ࣌ʹ+0*/͢Ε͹ྑ͍͕ʜ "84αʔόʔϨεͩͱҎԼύλʔϯ͍ͣΕ͔ ͜ΕΛ࢖͏ ɾϦΞϧλΠϜੑ͸ཁٻ͞Εͳ͍ ɾݹ͍ΠϕϯτͷΦϒδΣΫτ͸(MBDJFSߦ͖ ɾಡΈग़࣌͠ͷϩδοΫ͸࠷খݶʹ͍ͨ͠

  39.  ΰʔϧɿJOINࡁΈͷS3 ΦϒδΣΫτΛ࡞Δ S3 to S3 ͷϑΝΠϧ ETL

  40. "84(MVFʹΑΔ&5-Λࢥ͍ग़͢ 

  41. "84(MVFʹΑΔ&5-Λࢥ͍ग़͢ 

  42. "84(MVFʹΑΔ&5-Λࢥ͍ग़͢  ɾσʔλΧλϩάΛ"NB[PO"UIFOBͰΫΤϦ͢Δ ɾJOTFSUJOUPͰ+0*/͢Δ ɾҰ࿈ͷྲྀΕΛ4UFQ'VODUJPOTͰ૊Ή ʜͱ͍͏ϧʔτ΋͋Δ

  43. ͳͥผϧʔτΛݕ౼͢Δͷ͔ʁ  ͜ͷن໛ͩͱ(MVF+PCͷىಈ࣌ؒͱ՝ֹ͕ۚωοΫ "84(MVF+PC 1ZUIPO4IFMM "84(MVF+PC 4QBSL  ʢ$MPVE'PSNBUJPOΑΓʣ ϝϞϦ

    (#PS(# (# ࣮ߦ࣌ؒ σϑΥϧτ࣌ؒ σϑΥϧτ࣌ؒ ಉ࣮࣌ߦ਺ ΞΧ΢ϯτ಺Ͱ ΞΧ΢ϯτ಺Ͱ ՝ֹۚ ੑೳͱ࣮ߦ࣌ؒϕʔεͷैྔ՝ۚ ͨͩ͠গͳ͘ͱ΋෼ʹ੾Γ্͛ ੑೳͱ࣮ߦ࣌ؒϕʔεͷैྔ՝ۚ ͨͩ͠গͳ͘ͱ΋෼ʹ੾Γ্͛ ىಈ࣌ؒ ਺ेඵ਺෼ ਺ेඵ਺෼ ݴޠ 1ZUIPO 1ZUIPOPS4DBMB AWS GlueΛ࢖ͬͨ Serverless ETL ͷ࣮૷ύλʔϯ - Speaker Deck https://speakerdeck.com/seiichi1101/aws-gluewoshi-tuta-serverless-etl-falseshi-zhuang-patan-ac5553a5-f8e9-45fb-b093-4bcf0b113a48?slide=41
  44. ͳͥผϧʔτΛݕ౼͢Δͷ͔ʁ  ͜ͷن໛ͩͱ(MVF+PCͷىಈ࣌ؒͱ՝ֹ͕ۚωοΫ "84(MVF+PC 1ZUIPO4IFMM "84(MVF+PC 4QBSL  ʢ$MPVE'PSNBUJPOΑΓʣ ϝϞϦ

    (#PS(# (# ࣮ߦ࣌ؒ σϑΥϧτ࣌ؒ σϑΥϧτ࣌ؒ ಉ࣮࣌ߦ਺ ΞΧ΢ϯτ಺Ͱ ΞΧ΢ϯτ಺Ͱ ՝ֹۚ ੑೳͱ࣮ߦ࣌ؒϕʔεͷैྔ՝ۚ ͨͩ͠গͳ͘ͱ΋෼ʹ੾Γ্͛ ੑೳͱ࣮ߦ࣌ؒϕʔεͷैྔ՝ۚ ͨͩ͠গͳ͘ͱ΋෼ʹ੾Γ্͛ ىಈ࣌ؒ ਺ेඵ਺෼ ਺ेඵ਺෼ ݴޠ 1ZUIPO 1ZUIPOPS4DBMB AWS GlueΛ࢖ͬͨ Serverless ETL ͷ࣮૷ύλʔϯ - Speaker Deck https://speakerdeck.com/seiichi1101/aws-gluewoshi-tuta-serverless-etl-falseshi-zhuang-patan-ac5553a5-f8e9-45fb-b093-4bcf0b113a48?slide=41
  45. ͳͥผϧʔτΛݕ౼͢Δͷ͔ʁ  (MVF5BCMFͷΫΤϦ͸"UIFOBʹ೚࣮ͤͯߦ؀ڥΛ -BNCEB'VODUJPOʹ͢Δͱ͍͏ख "84(MVF+PC 1ZUIPO4IFMM "84(MVF+PC 4QBSL  ʢ$MPVE'PSNBUJPOΑΓʣ

    "84-BNCEB ϝϞϦ (#PS(# (# .# ࣮ߦ࣌ؒ σϑΥϧτ࣌ؒ σϑΥϧτ࣌ؒ ࠷େ෼ ಉ࣮࣌ߦ਺ ΞΧ΢ϯτ಺Ͱ ΞΧ΢ϯτ಺Ͱ Ϧʔδϣϯຖʹ ՝ֹۚ ෼ʹ੾Γ্͛ ෼ʹ੾Γ্͛ ϛϦඵʹ੾Γ্͛ ىಈ࣌ؒ ਺ेඵ਺෼ ਺ेඵ਺෼ ਺ϛϦඵ਺ඵ ݴޠ 1ZUIPO 1ZUIPOPS4DBMB αϙʔτ͍ͯ͠ΔϥϯλΠϜ ΧελϜϥϯλΠϜ AWS GlueΛ࢖ͬͨ Serverless ETL ͷ࣮૷ύλʔϯ - Speaker Deck https://speakerdeck.com/seiichi1101/aws-gluewoshi-tuta-serverless-etl-falseshi-zhuang-patan-ac5553a5-f8e9-45fb-b093-4bcf0b113a48?slide=41
  46. ΞʔΩςΫνϟ֓؍  ※࣮ࡍʹ͸ S3 ΦϒδΣΫτͷURLͳͲΛอଘ͓ͯ͘͠ DynamoDB ͕ඞཁ

  47. $POUFOUT  "84(MVFͱ"NB[PO"UIFOBͷؔ܎ αϯϓϧϢʔεέʔεɿΠϕϯτ؅ཧαʔϏε "84$%,ʹΑΔ&5-δϣϒ૊Έཱͯ

  48. "84$%,Λ࢖ͬͯ૊ΈཱͯΔྲྀΕͷҰ෦Λ঺հ  $%,ʹ͍ͭͯ͸ΫϥϝιͷΠϕϯτͰ΋ൃදͯ͠·͢ ʲొஃࢿྉʳAWS CDK Λ࢖ͬͨ αʔόʔϨεΞϓϦέʔγϣϯͷσϓϩΠํ๏ͱ࣮૷ྫΛ঺հ͠·ͨ͠ – Developes.IO 2020

    CONNECT #devio2020 | Developers.IO https://dev.classmethod.jp/articles/production-deploy-serverless-aws-cdk/
  49. "84$%,*OGSBTUSVDUVSFBT$PEF  5ZQF4DSJQUͳͲͷίʔυ͔Β$MPVE'PSNBUJPOς ϯϓϨʔτΛग़ྗɺσϓϩΠՄೳ What is the AWS CDK? -

    AWS Cloud Development Kit (AWS CDK) https://docs.aws.amazon.com/cdk/latest/guide/home.html
  50. $%,αϯϓϧɿ4 -BNCEB'VODUJPO 

  51. "UIFOBͬͯ$47Ͱ΋ΫΤϦͰ͖ΔΜ͡Όʁ  ࠓճͷηογϣϯ֓ཁͷΑ͏ͳ$47σʔλ಺ʹվߦΛؚΉ σʔλΛ͏·͘ύʔεͰ͖ͳ͔ͬͨ ଞʹ΋೔ຊޠͷΧϥϜ໊ͩͱύʔςΟγϣϯ͕੾Εͳ͍ࣄ৅Λ֬ೝ $47ͷจࣈίʔυ͕"UIFOBͰѻ͑ͳ͍4+*4ͩͬͨΓ͢Δ $47ੜσʔλΛͦͷ··+0*/͢ΔͷͰ͸ͳ͘ɺҰ౓+40/-JOFT ʹม׵͢ΔͳͲͨ͠΄͏͕Αͦ͞͏ ͜͏͍ͬͨࡉ͔͍ॲཧ͕Ͱ͖Δͷ΋-BNCEB'VODUJPOͷڧΈͷͻ ͱͭ

  52. $%,αϯϓϧɿ4 -BNCEB'VODUJPO  export async function sessionConverter( stack: cdk.Construct, global:

    GlobalProps, ): Promise<EuqueteBucketResource> { const rawCsvBucket = new s3.Bucket(stack, 'Raw', { bucketName: global.getBucketName('raw', cdk.Stack.of(stack).account), cors: [ { allowedHeaders: ['*'], allowedMethods: [HttpMethods.PUT], allowedOrigins: ['*'], }, ], }); const jsonLinesBucket = new s3.Bucket(stack, 'JsonLines', { bucketName: global.getBucketName( 'json-lines', cdk.Stack.of(stack).account, ), }); const convertSessionCsvToJsonLinesFn = new lambda.Function( stack, 'ConvertSessionCsvToJsonLines', { functionName: global.getFunctionName('ConvertSessionCsvToJsonLines'), code: lambda.Code.fromAsset(NODE_LAMBDA_SRC_DIR), handler: 'lambda/handlers/sf/csv-converter/convert-session-csv-to-json-handler.handler', runtime: lambda.Runtime.NODEJS_12_X, layers: [nodeModulesLayer], timeout: Duration.minutes(15), memorySize: 256, environment: { REGION: global.pm.region, CSV_BUCKET_NAME: rawCsvBucket.bucketName, JSON_LINES_BUCKET_NAME: jsonLinesBucket.bucketName, }, }, ); rawCsvBucket.grantRead(convertSessionCsvToJsonLinesFn); jsonLinesBucket.grantReadWrite(convertSessionCsvToJsonLinesFn);
  53. $%,αϯϓϧɿ4 -BNCEB'VODUJPO  export async function sessionConverter( stack: cdk.Construct, global:

    GlobalProps, ): Promise<EuqueteBucketResource> { const rawCsvBucket = new s3.Bucket(stack, 'Raw', { bucketName: global.getBucketName('raw', cdk.Stack.of(stack).account), cors: [ { allowedHeaders: ['*'], allowedMethods: [HttpMethods.PUT], allowedOrigins: ['*'], }, ], }); const jsonLinesBucket = new s3.Bucket(stack, 'JsonLines', { bucketName: global.getBucketName( 'json-lines', cdk.Stack.of(stack).account, ), }); const convertSessionCsvToJsonLinesFn = new lambda.Function( stack, 'ConvertSessionCsvToJsonLines', { functionName: global.getFunctionName('ConvertSessionCsvToJsonLines'), code: lambda.Code.fromAsset(NODE_LAMBDA_SRC_DIR), handler: 'lambda/handlers/sf/csv-converter/convert-session-csv-to-json-handler.handler', runtime: lambda.Runtime.NODEJS_12_X, layers: [nodeModulesLayer], timeout: Duration.minutes(15), memorySize: 256, environment: { REGION: global.pm.region, CSV_BUCKET_NAME: rawCsvBucket.bucketName, JSON_LINES_BUCKET_NAME: jsonLinesBucket.bucketName, }, }, ); rawCsvBucket.grantRead(convertSessionCsvToJsonLinesFn); jsonLinesBucket.grantReadWrite(convertSessionCsvToJsonLinesFn); w $47ੜσʔλΛ౤ೖ͢Δ4 όέοτ w QSFTJHOFEVSM͔ΒΞοϓ ϩʔυͯ͠΋Β͏ͷͰ$034 ઃఆ
  54. $%,αϯϓϧɿ4 -BNCEB'VODUJPO  export async function sessionConverter( stack: cdk.Construct, global:

    GlobalProps, ): Promise<EuqueteBucketResource> { const rawCsvBucket = new s3.Bucket(stack, 'Raw', { bucketName: global.getBucketName('raw', cdk.Stack.of(stack).account), cors: [ { allowedHeaders: ['*'], allowedMethods: [HttpMethods.PUT], allowedOrigins: ['*'], }, ], }); const jsonLinesBucket = new s3.Bucket(stack, 'JsonLines', { bucketName: global.getBucketName( 'json-lines', cdk.Stack.of(stack).account, ), }); const convertSessionCsvToJsonLinesFn = new lambda.Function( stack, 'ConvertSessionCsvToJsonLines', { functionName: global.getFunctionName('ConvertSessionCsvToJsonLines'), code: lambda.Code.fromAsset(NODE_LAMBDA_SRC_DIR), handler: 'lambda/handlers/sf/csv-converter/convert-session-csv-to-json-handler.handler', runtime: lambda.Runtime.NODEJS_12_X, layers: [nodeModulesLayer], timeout: Duration.minutes(15), memorySize: 256, environment: { REGION: global.pm.region, CSV_BUCKET_NAME: rawCsvBucket.bucketName, JSON_LINES_BUCKET_NAME: jsonLinesBucket.bucketName, }, }, ); rawCsvBucket.grantRead(convertSessionCsvToJsonLinesFn); jsonLinesBucket.grantReadWrite(convertSessionCsvToJsonLinesFn); w +40/-JOFTΛอଘ͓ͯ͘͠ όέοτ
  55. $%,αϯϓϧɿ4 -BNCEB'VODUJPO  export async function sessionConverter( stack: cdk.Construct, global:

    GlobalProps, ): Promise<EuqueteBucketResource> { const rawCsvBucket = new s3.Bucket(stack, 'Raw', { bucketName: global.getBucketName('raw', cdk.Stack.of(stack).account), cors: [ { allowedHeaders: ['*'], allowedMethods: [HttpMethods.PUT], allowedOrigins: ['*'], }, ], }); const jsonLinesBucket = new s3.Bucket(stack, 'JsonLines', { bucketName: global.getBucketName( 'json-lines', cdk.Stack.of(stack).account, ), }); const convertSessionCsvToJsonLinesFn = new lambda.Function( stack, 'ConvertSessionCsvToJsonLines', { functionName: global.getFunctionName('ConvertSessionCsvToJsonLines'), code: lambda.Code.fromAsset(NODE_LAMBDA_SRC_DIR), handler: 'lambda/handlers/sf/csv-converter/convert-session-csv-to-json-handler.handler', runtime: lambda.Runtime.NODEJS_12_X, layers: [nodeModulesLayer], timeout: Duration.minutes(15), memorySize: 256, environment: { REGION: global.pm.region, CSV_BUCKET_NAME: rawCsvBucket.bucketName, JSON_LINES_BUCKET_NAME: jsonLinesBucket.bucketName, }, }, ); rawCsvBucket.grantRead(convertSessionCsvToJsonLinesFn); jsonLinesBucket.grantReadWrite(convertSessionCsvToJsonLinesFn); w ੜσʔλόέοτ͔Βϩʔυ ͠ɺม׵͠ɺ+40/-JOFTό έοτʹ౤ೖ͢Δ-BNCEB 'VODUJPO
  56. $%,αϯϓϧɿ(MVF5BCMF "NB[PO"UIFOB 

  57. $%,αϯϓϧɿ(MVF5BCMF "NB[PO"UIFOB  ɾ4σʔλιʔε (MVF5BCMF ɾ-BNCEB'VODUJPO "NB[PO"UIFOB

  58. $%,αϯϓϧɿ4 (MVF5BCMF  export async function eventGlueTableResource( stack: cdk.Construct, global:

    GlobalProps, s3Resource: SlsGlueTableResourceInput, ): Promise<EventGlueTableResource> { const slsVirtualDataBase = new glue.Database(stack, 'SlsVirtualDatabase', { databaseName: global.getGlueDatabaseName('slsVirtual_application'), }); const sessionGlueTable = new glue.Table(stack, 'SessionTable', { database: slsVirtualDataBase, tableName: 'session', columns: [ { name: 'sessionId', type: glue.Schema.STRING, }, { name: 'email', type: glue.Schema.STRING, }, { name: 'title', type: glue.Schema.STRING, }, { name: 'description', type: glue.Schema.STRING, }, { name: 'organizationName', type: glue.Schema.STRING, }, { name: 'name', type: glue.Schema.STRING, }, ], dataFormat: glue.DataFormat.JSON, bucket: s3Resource.jsonLinesBucket, s3Prefix: 'session/', partitionKeys: [ { name: 'eventId', type: glue.Schema.STRING, }, { name: 'versionId', type: glue.Schema.STRING, }, ], });
  59. $%,αϯϓϧɿ4 (MVF5BCMF  export async function eventGlueTableResource( stack: cdk.Construct, global:

    GlobalProps, s3Resource: SlsGlueTableResourceInput, ): Promise<EventGlueTableResource> { const slsVirtualDataBase = new glue.Database(stack, 'SlsVirtualDatabase', { databaseName: global.getGlueDatabaseName('slsVirtual_application'), }); const sessionGlueTable = new glue.Table(stack, 'SessionTable', { database: slsVirtualDataBase, tableName: 'session', columns: [ { name: 'sessionId', type: glue.Schema.STRING, }, { name: 'email', type: glue.Schema.STRING, }, { name: 'title', type: glue.Schema.STRING, }, { name: 'description', type: glue.Schema.STRING, }, { name: 'organizationName', type: glue.Schema.STRING, }, { name: 'name', type: glue.Schema.STRING, }, ], dataFormat: glue.DataFormat.JSON, bucket: s3Resource.jsonLinesBucket, s3Prefix: 'session/', partitionKeys: [ { name: 'eventId', type: glue.Schema.STRING, }, { name: 'versionId', type: glue.Schema.STRING, }, ], }); w (MVF%BUBCBTFͱ(MVF 5BCMFΛ࡞੒
  60. $%,αϯϓϧɿ4 (MVF5BCMF  export async function eventGlueTableResource( stack: cdk.Construct, global:

    GlobalProps, s3Resource: SlsGlueTableResourceInput, ): Promise<EventGlueTableResource> { const slsVirtualDataBase = new glue.Database(stack, 'SlsVirtualDatabase', { databaseName: global.getGlueDatabaseName('slsVirtual_application'), }); const sessionGlueTable = new glue.Table(stack, 'SessionTable', { database: slsVirtualDataBase, tableName: 'session', columns: [ { name: 'sessionId', type: glue.Schema.STRING, }, { name: 'email', type: glue.Schema.STRING, }, { name: 'title', type: glue.Schema.STRING, }, { name: 'description', type: glue.Schema.STRING, }, { name: 'organizationName', type: glue.Schema.STRING, }, { name: 'name', type: glue.Schema.STRING, }, ], dataFormat: glue.DataFormat.JSON, bucket: s3Resource.jsonLinesBucket, s3Prefix: 'session/', partitionKeys: [ { name: 'eventId', type: glue.Schema.STRING, }, { name: 'versionId', type: glue.Schema.STRING, }, ], }); w σʔλιʔεͱͳΔ4ό έοτΛࢦఆ
  61. $%,αϯϓϧɿ4 (MVF5BCMF  const ticketGlueTable = new glue.Table(stack, 'TicketTable', {

    database: slsVirtualDataBase, tableName: 'ticket', columns: [ { name: 'email', type: glue.Schema.STRING, }, { name: 'ticketId', type: glue.Schema.STRING, }, ], dataFormat: glue.DataFormat.JSON, bucket: s3Resource.jsonLinesBucket, s3Prefix: 'ticket/', partitionKeys: [ { name: 'eventId', type: glue.Schema.STRING, }, { name: 'versionId', type: glue.Schema.STRING, }, ], }); w νέοτ༻ͷ(MVF5BCMF΋ ಉ͡Α͏ʹ࡞੒
  62. $%,αϯϓϧɿ4 (MVF5BCMF  const entryTable = new glue.Table(stack, 'EntryTable', {

    database: slsVirtualDataBase, tableName: 'entry', columns: [ { name: 'sessionId', type: glue.Schema.STRING, }, { name: 'name', type: glue.Schema.STRING, }, { name: 'title', type: glue.Schema.STRING, }, { name: 'description', type: glue.Schema.STRING, }, { name: 'ticketId', type: glue.Schema.STRING, }, ], dataFormat: new glue.DataFormat({ inputFormat: glue.InputFormat.TEXT, outputFormat: glue.OutputFormat.HIVE_IGNORE_KEY_TEXT, serializationLibrary: glue.SerializationLibrary.HIVE_JSON, }), bucket: s3Resource.entryBucket, s3Prefix: 'entry/', partitionKeys: [ { name: 'eventId', type: glue.Schema.STRING, }, ], w ࠷ޙʹɺ"UIFOBͰ+0*/͠ ͨσʔλΛ౤ೖ͢Δ༻ͷ (MVF5BCMF΋༻ҙ͓ͯ͘͠
  63. +0*/͢Δ-BNCEB'VODUJPOͷίʔυ͸͜Μͳײ͡  export async function handler( event: JoinRequest, context?: LambdaContext,

    ): Promise<JoinResponse> { const param: DocumentClient.GetItemInput = { TableName: EventDynamodbTableName, Key: { id: event.eventId, }, }; const res = await dynamo.get(param).promise(); const sessionVersionId = res.Item!.sessionJsonLinesInfo.versionId; const ticketVersionId = res.Item!.ticketJsonLinesInfo.versionId; try { const param = { ResultConfiguration: { OutputLocation: `s3://${AthenaOutputBucketName}/`, }, QueryExecutionContext: { Database: SlsVirtualGlueDataBaseName, }, QueryString: [ `insert into ${EntryGlueTableName}`, 'select session.sessionId,', 'ticket.ticketId,', 'session.name,', 'session.organizationName,', 'session.title,', 'session.description,', 'ticket.email', // lower case for partition key `from ${SessionGlueTableName} session inner join ${TicketGlueTableName} ticket on session.email = ticket.email`, `where session.versionId = '${sessionVersionId}' and ticket.versionId = '${ticketVersionId}'`, ].join(' '), }; console.log('insert into param', param); const execution = await athena.startQueryExecution(param).promise(); console.log(execution); return { executionId: execution.QueryExecutionId!, }; } catch (e) { console.log(e); throw e; } }
  64. export async function handler( event: JoinRequest, context?: LambdaContext, ): Promise<JoinResponse>

    { const param: DocumentClient.GetItemInput = { TableName: EventDynamodbTableName, Key: { id: event.eventId, }, }; const res = await dynamo.get(param).promise(); const sessionVersionId = res.Item!.sessionJsonLinesInfo.versionId; const ticketVersionId = res.Item!.ticketJsonLinesInfo.versionId; try { const param = { ResultConfiguration: { OutputLocation: `s3://${AthenaOutputBucketName}/`, }, QueryExecutionContext: { Database: SlsVirtualGlueDataBaseName, }, QueryString: [ `insert into ${EntryGlueTableName}`, 'select session.sessionId,', 'ticket.ticketId,', 'session.name,', 'session.organizationName,', 'session.title,', 'session.description,', 'ticket.email', // lower case for partition key `from ${SessionGlueTableName} session inner join ${TicketGlueTableName} ticket on session.email = ticket.email`, `where session.versionId = '${sessionVersionId}' and ticket.versionId = '${ticketVersionId}'`, ].join(' '), }; console.log('insert into param', param); const execution = await athena.startQueryExecution(param).promise(); console.log(execution); return { executionId: execution.QueryExecutionId!, }; } catch (e) { console.log(e); throw e; } } -BNCEB'VODUJPOͷίʔυ͸͜Μͳײ͡  w ϝΞυͰ+0*/ w +0*/ޙͷσʔλΛ&OUSZ ςʔϒϧʹ౤ೖ
  65. export async function handler( event: JoinRequest, context?: LambdaContext, ): Promise<JoinResponse>

    { const param: DocumentClient.GetItemInput = { TableName: EventDynamodbTableName, Key: { id: event.eventId, }, }; const res = await dynamo.get(param).promise(); const sessionVersionId = res.Item!.sessionJsonLinesInfo.versionId; const ticketVersionId = res.Item!.ticketJsonLinesInfo.versionId; try { const param = { ResultConfiguration: { OutputLocation: `s3://${AthenaOutputBucketName}/`, }, QueryExecutionContext: { Database: SlsVirtualGlueDataBaseName, }, QueryString: [ `insert into ${EntryGlueTableName}`, 'select session.sessionId,', 'ticket.ticketId,', 'session.name,', 'session.organizationName,', 'session.title,', 'session.description,', 'ticket.email', // lower case for partition key `from ${SessionGlueTableName} session inner join ${TicketGlueTableName} ticket on session.email = ticket.email`, `where session.versionId = '${sessionVersionId}' and ticket.versionId = '${ticketVersionId}'`, ].join(' '), }; console.log('insert into param', param); const execution = await athena.startQueryExecution(param).promise(); console.log(execution); return { executionId: execution.QueryExecutionId!, }; } catch (e) { console.log(e); throw e; } } -BNCEB'VODUJPOͷίʔυ͸͜Μͳײ͡  w "UIFOBΫΤϦΛ࣮ߦ
  66. ίʔυαϯϓϧɿ44FMFDU 

  67. async function createS3SelectStream<T>( request: EntryRequest, expression: string, ): Promise<Subject<T>> {

    const source$ = new Subject<T>(); const content = await s3 .selectObjectContent({ Bucket: request.bucket, Key: request.key, InputSerialization: { JSON: { Type: 'LINES' }, CompressionType: 'GZIP', }, OutputSerialization: { JSON: { RecordDelimiter: '\n' } }, Expression: expression, ExpressionType: 'SQL', }) .promise(); const contentStream: any = content.Payload!; const filterRecordsOnly = new stream.Transform({ objectMode: true, transform( chunk: any, encoding: string, done: stream.TransformCallback, ): void { if (chunk.Records) { this.push(chunk.Records.Payload); } else if (chunk.Stats) { console.log( `Processed ${chunk.Stats.Details.BytesProcessed} bytes`, ); } else if (chunk.End) { console.log('SelectObjectContent completed'); } done(); }, }); const readLine = readline.createInterface( contentStream.pipe(filterRecordsOnly), ); readLine .on('line', (line: any) => { // console.log('line', line); source$.next(JSON.parse(line) as T); }) .on('close', () => { console.log('end'); source$.complete(); 44FMFDUͷ-BNCEB'VODUJPOίʔυ 
  68. async function createS3SelectStream<T>( request: EntryRequest, expression: string, ): Promise<Subject<T>> {

    const source$ = new Subject<T>(); const content = await s3 .selectObjectContent({ Bucket: request.bucket, Key: request.key, InputSerialization: { JSON: { Type: 'LINES' }, CompressionType: 'GZIP', }, OutputSerialization: { JSON: { RecordDelimiter: '\n' } }, Expression: expression, ExpressionType: 'SQL', }) .promise(); const contentStream: any = content.Payload!; const filterRecordsOnly = new stream.Transform({ objectMode: true, transform( chunk: any, encoding: string, done: stream.TransformCallback, ): void { if (chunk.Records) { this.push(chunk.Records.Payload); } else if (chunk.Stats) { console.log( `Processed ${chunk.Stats.Details.BytesProcessed} bytes`, ); } else if (chunk.End) { console.log('SelectObjectContent completed'); } done(); }, }); const readLine = readline.createInterface( contentStream.pipe(filterRecordsOnly), ); readLine .on('line', (line: any) => { // console.log('line', line); source$.next(JSON.parse(line) as T); }) .on('close', () => { console.log('end'); source$.complete(); 44FMFDUͷ-BNCEB'VODUJPOίʔυ  w TFMFDU GSPNTPCKFDUPCKXIFSFPCKUJDLFUJE b`
  69. async function createS3SelectStream<T>( request: EntryRequest, expression: string, ): Promise<Subject<T>> {

    const source$ = new Subject<T>(); const content = await s3 .selectObjectContent({ Bucket: request.bucket, Key: request.key, InputSerialization: { JSON: { Type: 'LINES' }, CompressionType: 'GZIP', }, OutputSerialization: { JSON: { RecordDelimiter: '\n' } }, Expression: expression, ExpressionType: 'SQL', }) .promise(); const contentStream: any = content.Payload!; const filterRecordsOnly = new stream.Transform({ objectMode: true, transform( chunk: any, encoding: string, done: stream.TransformCallback, ): void { if (chunk.Records) { this.push(chunk.Records.Payload); } else if (chunk.Stats) { console.log( `Processed ${chunk.Stats.Details.BytesProcessed} bytes`, ); } else if (chunk.End) { console.log('SelectObjectContent completed'); } done(); }, }); const readLine = readline.createInterface( contentStream.pipe(filterRecordsOnly), ); readLine .on('line', (line: any) => { // console.log('line', line); source$.next(JSON.parse(line) as T); }) .on('close', () => { console.log('end'); source$.complete(); 44FMFDUͷ-BNCEB'VODUJPOίʔυ  w 44FMFDU͸݁ՌΛಠࣗͷ/PEFKTετϦʔϜͰฦ͢ͷͰ ͕Μ͹ͬͯऔಘ͍ͯ͠Δ༷ࢠ
  70. ίʔυαϯϓϧͷ·ͱΊ  -BNCEB'VODUJPOΛ࢖ͬͨ&5-Ͱ"84$%,Λ࢖ ͍૊ΈཱͯΔαϯϓϧΛࣔͨ͠ "84ߏ੒ਤ͔Β$%,ίʔυʹམͱͨ͢Ίʹ͸ɺ֤αʔϏεͷ ࣮ମΛ஌Δ͜ͱ͕ۙಓ ཁ݅ʹΑͬͯϚον͢ΔαʔϏε͸ҟͳΔɻ߹க͢ΔαʔϏε ͕ͳʹ͔ɺࣾ಺ͰɺαʔόʔϨείϛϡχςΟͰٞ࿦͠·͠ΐ ͏

  71. ·ͱΊ  ϑΝΠϧΛ+0*/͢ΔϢʔεέʔεʹରͯ͠ɺαʔόʔϨ εͰϥΠτͳ&5-Λߦͬͨ ෳ਺ͷαʔϏεͷσʔλΛ૊Έ߹Θͤͯɺͻͱͭͷ݁ՌΛಘ Δͱ͍͏γνϡΤʔγϣϯʹૺ۰ͨ͠ͱ͖ʹɺࠓճͷΑ͏ʹ ͍ͬͨΜ4ϑΝΠϧ΁ग़ྗɺͦͷޙ"UIFOBͰ+0*/ͯ͠4 4FMFDUͰΫΤϦ͢Δͱ͍͏ྲྀΕ͕࢖͑Δ͜ͱΛࢥ͍ग़ͯ͠΋Β ͑Ε͹޾͍Ͱ͢

  72. ࢀߟॻ੶ɾར༻ͨ͠πʔϧ  w DODGXHTFSWFSMFTT$/$'4FSWFSMFTT8(IUUQTHJUIVCDPNDODG XHTFSWFSMFTT w "84$%,IUUQTHJUIVCDPNBXTBXTDEL w DNXBEBZVTVLFUTBTDELIUUQTHJUIVCDPNDNXBEBZVTVLF UTBTDEL