
A nearly serverless data platform on Google Cloud that even one person can start small / Serverless Dataplatform for Google Cloud

LT slides for the Jagu'e'r Data Utilization Subcommittee

Shinichi Nakagawa

December 02, 2022

Transcript

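 1. A nearly serverless data platform on Google Cloud
  that even one person can start small
  ⚾ A "personal DX" strategy for making it easier to watch baseball, and its implementation
  Shinichi Nakagawa 2022/12/02 Jagu'e'r Data Utilization Subcommittee #8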

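 2. Who am I ?
  • Shinichi Nakagawa (中川伸一)
  • Accenture, Manager in the Technology Consulting division
  • Previous work: engineer at startups and mega-ventures
  • At Accenture: delivery of Google Cloud-related work
  • Personally, I develop products (≈ a hobby) for the following purposes
  • Baseball data analysis
  • My own healthcare
  • Technical validation themed on the above
  • Favorite Google Cloud services: BigQuery, Cloud Run
  • Favorite Baseball Humans: Tsuyoshi Shinjo, Chusei Mannami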

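 3. Today's talk
  • An introduction to Major League Baseball's big data
  • A nicely working serverless data platform built with Python and Google Cloud
  How to "start a data platform small and serverless on Google Cloud",
  mainly from the angle of technology selection and design viewpoints, with baseball as the example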

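 4. Major League Baseball's big data
  • MLB records a wide variety of data with a system called "Statcast".
  ※ In principle it is recorded by measuring instruments such as cameras and radar (some values are recorded by hand or estimated)
  • For example, the facts quoted in play-by-play and commentary all come from this "Statcast" big data.
  • "Ohtani-san! Home run No. ○! Exit velocity 180 km/h, distance 130 m"
  • "Ohtani-san! Strikeout looking on a 162 km/h fastball!!!"
  • Every move in the game, every pitch and every batted ball, is recorded.
  • A regular season (30 teams, 162 games) comes to roughly 700,000 to 800,000 pitches. Postseason and spring training data also exist.
  • The data has 91 columns (!?), and one regular season is roughly 400 MB to 600 MB.
  • Anyone can browse and download it (CSV format) at baseballsavant.mlb.com.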
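
The figures quoted on this slide can be turned into a quick back-of-the-envelope sizing. This is only a sketch: it assumes 1 MB = 1,000,000 bytes, and pairs the low end of the pitch range with the low end of the size range (and high with high), which the slide does not state.

```python
# Figures quoted on the slide.
TEAMS = 30
GAMES_PER_TEAM = 162
PITCHES_RANGE = (700_000, 800_000)   # pitches per regular season
SIZE_RANGE_MB = (400, 600)           # CSV size per regular season

# Each game involves two teams, so the number of games is half the
# number of team-game appearances.
games = TEAMS * GAMES_PER_TEAM // 2

# Average pitches per game at the low and high end of the range.
pitches_per_game = tuple(round(p / games) for p in PITCHES_RANGE)

# Rough bytes per CSV row (assumption: 1 MB = 1,000,000 bytes,
# low end paired with low end, high end with high end).
bytes_per_row = tuple(
    round(mb * 1_000_000 / p) for mb, p in zip(SIZE_RANGE_MB, PITCHES_RANGE)
)

print(games, pitches_per_game, bytes_per_row)
```

At this size a whole season fits comfortably in memory on a single small worker.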

 5. Looking back at "Ohtani-san's 2022" through MLB data
  Here is what this data lets you do
  Source: Irasutoya https://www.irasutoya.com/2013/12/blog-post_5056.html Source: Irasutoya https://www.irasutoya.com/2019/06/blog-post_512.html

 6. In 2022, Ohtani-san becomes a slider, two-seamer, and cutter guy
  • This year Ohtani-san is throwing a ton of sliders
  • Did you notice? In the second half, two-seamers (Sinker in the data) are increasing!?
  • With the slider, two-seamer, and cutter, he has reinvented himself as a pitcher who throws breaking balls with heavy movement
  Source: https://speakerdeck.com/shinyorke/pythonshi-inotamenosupotudetajie-xi-nokihon-pysparktomeziyarigudetawotian-ete-number-pyconjp-2022

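 7. One of Ohtani-san's starts (2022/9/29: 8 innings, 10 strikeouts, no runs allowed)
  The picture: nearly half sliders, then pushing through with two-seamers and cutters
  Pitch locations (catcher's view) / Release points (catcher's view) / Pitch type mix
  Source: https://speakerdeck.com/shinyorke/pythonshi-inotamenosupotudetajie-xi-nokihon-pysparktomeziyarigudetawotian-ete-number-pyconjp-2022 (all of the above)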
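
A pitch type mix like the one on this slide can be computed directly from pitch-level rows. The sketch below uses only the standard library. The sample rows are made up for illustration and are not real Statcast values. The column names `game_date`, `player_name`, and `pitch_type` are assumed to match the CSV export, and `pitch_mix` is a hypothetical helper, not the author's code.

```python
import csv
import io
from collections import Counter

# Made-up sample rows in the shape of a pitch-level CSV export
# (a real file has many more columns; only the ones used are shown).
SAMPLE_CSV = """game_date,player_name,pitch_type
2022-09-29,"Pitcher, Sample",SL
2022-09-29,"Pitcher, Sample",SL
2022-09-29,"Pitcher, Sample",SL
2022-09-29,"Pitcher, Sample",SL
2022-09-29,"Pitcher, Sample",SI
2022-09-29,"Pitcher, Sample",SI
2022-09-29,"Pitcher, Sample",FC
2022-09-29,"Pitcher, Sample",FF
2022-09-23,"Pitcher, Sample",FF
"""


def pitch_mix(csv_text, game_date):
    """Return {pitch_type: share of pitches} for one date, largest first."""
    rows = csv.DictReader(io.StringIO(csv_text))
    counts = Counter(
        row["pitch_type"]
        for row in rows
        if row["game_date"] == game_date and row["pitch_type"]
    )
    total = sum(counts.values())
    return {pitch: n / total for pitch, n in counts.most_common()}


mix = pitch_mix(SAMPLE_CSV, "2022-09-29")
for pitch, share in mix.items():
    print(f"{pitch}: {share:.1%}")
```

With the made-up rows above, sliders (`SL`) account for half of the pitches on the selected date.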

 8. Incidentally, his performance as a batter
  You can see that he turns very hard-hit balls into home runs
  Source: https://speakerdeck.com/shinyorke/pythonshi-inotamenosupotudetajie-xi-nokihon-pysparktomeziyarigudetawotian-ete-number-pyconjp-2022

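 9. ??: "I want a setup for checking every game, every day"
  For ordinary data analysis, "poke at it a bit in Google Colab and done" would be enough,
  but I wanted something I could look at daily, so I built it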

 10. So I went ahead and built a little something.
  Source: https://speakerdeck.com/shinyorke/pythonshi-inotamenosupotudetajie-xi-nokihon-pysparktomeziyarigudetawotian-ete-number-pyconjp-2022

 11. A nicely working serverless data platform
  built with Python and Google Cloud (baseball edition)

 12. Overall architecture

 13. This presentation makes reference to marks owned by third parties. Unless otherwise noted, all such third-party marks are the property of their respective owners. No sponsorship, endorsement or approval of this content by the owners of such marks is intended, expressed or implied.


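 14. Architecture notes (≈ the points I cared about)
  • To check and update the data every day without fuss, everything is built and operated on fully managed, serverless cloud services.
  • Basic policy for choosing services
  • Start from "the DWH is BigQuery" and design the ETL and the application around it (because I wanted to use BQ)
  • Keep each component independent as a microservice, since that makes it easy to build and easy to test.
  • Each piece is a small app, so build and run it on Cloud Functions or Cloud Run
  • They can be wired into a CI/CD pipeline such as GitHub Actions for deployment and scaling
  Billing is basically pay-per-use, so it is also easy on the wallet (around $5 per month) 👛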

 15. Use cases

 16. Dashboard app
  • The app is hosted on Cloud Run and implemented with Dash, a Python framework
  • It reaches the backend (Cloud Functions) through API Gateway. The backend is a RESTful API built with Functions Framework
  • The database is Firestore, populated from BigQuery by the ETL introduced later
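
A backend endpoint of the kind described on this slide might look roughly like the following. This is a minimal sketch, not the author's code. `get_pitch_mix`, the `pitcher` query parameter, and the response shape are hypothetical, and an in-memory dict stands in for Firestore so the example runs locally. The `functions_framework.http` decorator is used when the package is installed and replaced by a no-op otherwise.

```python
import json

try:
    import functions_framework
    http = functions_framework.http
except ImportError:
    # Fallback so the sketch also runs where the package is not installed.
    def http(func):
        return func

JSON_HEADERS = {"Content-Type": "application/json"}

# Assumption: an in-memory stand-in for documents that would live in Firestore.
PITCH_MIX = {
    "sample-pitcher": {"SL": 0.5, "SI": 0.25, "FC": 0.125, "FF": 0.125},
}


def lookup_pitch_mix(pitcher_id):
    """Return the stored pitch mix for a pitcher, or None if unknown."""
    return PITCH_MIX.get(pitcher_id)


@http
def get_pitch_mix(request):
    """HTTP handler: GET ?pitcher=<id> returns that pitcher's pitch mix as JSON."""
    pitcher_id = request.args.get("pitcher")
    if not pitcher_id:
        return json.dumps({"error": "pitcher is required"}), 400, JSON_HEADERS
    mix = lookup_pitch_mix(pitcher_id)
    if mix is None:
        return json.dumps({"error": "not found"}), 404, JSON_HEADERS
    body = {"pitcher": pitcher_id, "pitch_mix": mix}
    return json.dumps(body), 200, JSON_HEADERS
```

Keeping the lookup in its own function means the handler can be tested with a fake request object, which fits the deck's stated preference for small components that are easy to test.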

 17. Data collection & saving to BigQuery
  • A crawler (Cloud Functions) runs periodically to collect data from the source site (Baseball Savant)
  • The results are saved as CSV in Google Cloud Storage (GCS). This is the source data (data lake)
  • A PySpark script running on Dataproc Serverless summarizes the CSVs on GCS and saves the result to BigQuery
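
The summarization step can be illustrated without Spark. The sketch below is plain Python rather than the PySpark script the slide describes. It groups pitch rows by date, pitcher, and pitch type, then emits newline-delimited JSON, a format BigQuery load jobs accept. The column names are assumed to match the CSV export, the sample rows are made up, and `summarize` and `to_ndjson` are hypothetical helpers.

```python
import csv
import io
import json
from collections import defaultdict

# Made-up sample rows; the last row has no tracking data.
SAMPLE_CSV = """game_date,player_name,pitch_type,release_speed
2022-09-29,"Pitcher, Sample",SL,85.0
2022-09-29,"Pitcher, Sample",SL,87.0
2022-09-29,"Pitcher, Sample",SI,97.0
2022-09-29,"Pitcher, Sample",,
"""


def summarize(csv_text):
    """Aggregate pitch rows into one record per (date, pitcher, pitch type)."""
    groups = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        if not row["pitch_type"] or not row["release_speed"]:
            continue  # skip rows without tracking data
        key = (row["game_date"], row["player_name"], row["pitch_type"])
        groups[key].append(float(row["release_speed"]))
    return [
        {
            "game_date": date,
            "player_name": name,
            "pitch_type": pitch,
            "pitches": len(speeds),
            "avg_release_speed": round(sum(speeds) / len(speeds), 1),
        }
        for (date, name, pitch), speeds in sorted(groups.items())
    ]


def to_ndjson(records):
    """Serialize records as newline-delimited JSON, one object per line."""
    return "\n".join(json.dumps(record) for record in records)


records = summarize(SAMPLE_CSV)
print(to_ndjson(records))
```

Because a season is only a few hundred megabytes, an aggregation like this also fits in a single function invocation.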

 18. Loading into Firestore (moving data to the database)
  • A PySpark script running on Dataproc Serverless converts the BigQuery data into the dashboard's data format (JSON)
  • A Python script loads the results (saved on GCS in JSON format) into Firestore
  • The Dataproc and Firestore steps are run by hand from local scripts (constraints prevented full automation)
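
A loader for the second step might read the exported JSON and write it in batches. The sketch below only illustrates the batching: the writer is passed in as a callback, so nothing touches Firestore here. The JSON Lines input format, the default batch size of 500, and all function names are assumptions for illustration.

```python
import json
from itertools import islice


def read_json_lines(text):
    """Parse JSON Lines text into a list of dicts, ignoring blank lines."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


def chunked(items, size):
    """Yield successive lists of at most `size` items."""
    iterator = iter(items)
    while chunk := list(islice(iterator, size)):
        yield chunk


def load_documents(docs, write_batch, batch_size=500):
    """Send docs to `write_batch` in chunks and return how many were sent.

    With google-cloud-firestore, a real `write_batch` could look roughly like
    this (not run here; the collection name is hypothetical):
        batch = client.batch()
        for doc in chunk:
            batch.set(client.collection("pitch_mix").document(doc["id"]), doc)
        batch.commit()
    """
    written = 0
    for chunk in chunked(docs, batch_size):
        write_batch(chunk)
        written += len(chunk)
    return written


docs = read_json_lines('{"id": "a"}\n{"id": "b"}\n\n{"id": "c"}')
batches = []
count = load_documents(docs, batches.append, batch_size=2)
print(count, [len(batch) for batch in batches])
```

Passing the writer as a parameter keeps the script testable locally, which helps while this step is still run by hand.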

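 19. Looking back on operations
  • The app side runs nicely on Cloud Run & Cloud Functions 👏
  To save money the resources are kept in a cold-standby state,
  but for personal use that is no problem (and the instance count can be raised or lowered by running CI)
  • The data side gets mixed marks
  • Chained, Pythagora Switch-style data processing with Cloud Functions and Scheduler: ◎ (excellent)
  • BigQuery is also nice; aggregations and the like run without stress: ○ (good)
  • Using Dataproc Serverless at this scale may have been overkill?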

 20. Where this is probably heading
  (Ver 1.1 -> Ver 2.0)

 21. (Third-party marks notice, as on slide 13.)

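 22. What I plan to do next
  • Unify the ETL on the Cloud Functions & Cloud Scheduler chain (Pythagora Switch style)
  • Having actually used Dataproc, it is too much for my use case
  • The CSVs are simple and the data volume is small, so Cloud Functions can process them
  • Fully automate the data collection and processing flow, and manage infrastructure with IaC
  • Enrich the statistics built with BigQuery
  • Right now I mostly just query near-raw data (creating views as needed)
  • Spark is available too, so build out smarter data marts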

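 23. Summary
  • MLB has open data, which lets you check and evaluate the performance of players such as Shohei Ohtani.
  • To use analysis and visualization of that open data day to day, I built a data analysis platform on Google Cloud.
  • A data platform can be built and operated with serverless architecture alone, and I strongly recommend it.
  Source: https://speakerdeck.com/shinyorke/pythonshi-inotamenosupotudetajie-xi-nokihon-pysparktomeziyarigudetawotian-ete-number-pyconjp-2022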

 24. What I could not cover this time
  • The details of Dataproc (Spark), especially using Serverless.
  • Batch processing and the backend API built on Cloud Functions 2nd gen
  I hope to cover these in another LT some time 👍

 25. Thank you for listening