
A mostly serverless data platform on Google Cloud that even one person can start small / Serverless Dataplatform for Google Cloud


Lightning talk (LT) slides for the Jagu'e'r Data Utilization Subcommittee

Shinichi Nakagawa

December 02, 2022


Transcript

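  1. A mostly serverless data platform on Google Cloud that even one person can start small
    ⚾ A "personal DX" strategy for making game-watching more convenient, and its implementation
    Shinichi Nakagawa 2022/12/02 Jagu'e'r Data Utilization Subcommittee #8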
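  2. Who am I ?
    • Shinichi Nakagawa (中川伸一)
    • Manager, Technology Consulting, Accenture Japan Ltd
    • Previous jobs: engineer at startups and mega-ventures
    • At Accenture: delivery work related to Google Cloud
    • Personally, I build products (≈ a hobby) for the following purposes
        • Baseball data analysis
        • My own healthcare
        • Technical validation themed on the above
    • Favorite Google Cloud services: BigQuery, Cloud Run
    • Favorite Baseball Humans: Tsuyoshi Shinjo, Chusei Mannami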
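  3. Today's talk
    • An introduction to Major League Baseball's big data
    • A nice, serverless data platform built with Python and Google Cloud
    I will talk about "starting a data platform small and serverless on Google Cloud",
    mainly from the angle of technology selection and the viewpoints behind it, using baseball as the example ⚾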
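  4. Major League Baseball's big data
    • MLB records all kinds of data with a system called "Statcast".
      Note: as a rule the data is captured by measuring equipment such as cameras and radar (some of it is recorded manually or is an estimate)
    • For example, the material that play-by-play announcers and commentators draw on all comes from this "Statcast" big data.
        • Ohtani-san! Home run No. ○! Exit velocity 180 km/h, distance 130 m
        • Ohtani-san! A called third strike on a 162 km/h fastball!!!
    • Every single move in a game, and every pitch and batted ball, is recorded.
    • A regular season (30 teams, 162 games) comes to roughly 700,000 to 800,000 pitches. Postseason and spring training data exist as well.
    • Each record has 91 fields (!?), and one regular season is roughly 400 MB to 600 MB of data.
    • Anyone can browse and download the data (in CSV format) at baseballsavant.mlb.com.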
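The volume figures on slide 4 can be sanity-checked with a little arithmetic. The sketch below is not from the talk: it assumes that every game involves two teams (so the league plays half of 30 × 162 games) and that 1 MB means 10^6 bytes, and it only uses the ranges quoted on the slide.

```python
# Back-of-envelope check of the Statcast volume figures quoted on slide 4.
# Assumptions (not from the slides): two teams per game, 1 MB = 1,000,000 bytes.
TEAMS = 30
GAMES_PER_TEAM = 162
PITCHES = (700_000, 800_000)  # low and high estimate per regular season
SIZE_MB = (400, 600)          # low and high estimate of the CSV size


def season_games(teams: int, games_per_team: int) -> int:
    """Each game is shared by two teams, so halve the per-team total."""
    return teams * games_per_team // 2


def pitches_per_game(pitches: tuple, games: int) -> tuple:
    """Low and high estimate of pitches thrown in one game."""
    return tuple(count / games for count in pitches)


def bytes_per_pitch(size_mb: tuple, pitches: tuple) -> tuple:
    """Widest plausible range: smallest file over most pitches, and vice versa."""
    low = size_mb[0] * 1_000_000 / pitches[1]
    high = size_mb[1] * 1_000_000 / pitches[0]
    return low, high


games = season_games(TEAMS, GAMES_PER_TEAM)
per_game = pitches_per_game(PITCHES, games)
per_pitch = bytes_per_pitch(SIZE_MB, PITCHES)

print(f"games per season: {games}")
print(f"pitches per game: {per_game[0]:.0f} to {per_game[1]:.0f}")
print(f"bytes per pitch record: {per_pitch[0]:.0f} to {per_pitch[1]:.0f}")
```

Under these assumptions the season has 2,430 games, about 288 to 329 pitches per game, and roughly 500 to 857 bytes per pitch record, which is plausible for a 91-field CSV row.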
  5. Looking back on "Ohtani-san's 2022" through MLB data
    Here is what this data lets you do
    Source: Irasutoya https://www.irasutoya.com/2013/12/blog-post_5056.html
    Source: Irasutoya https://www.irasutoya.com/2019/06/blog-post_512.html

  6. Ohtani-san in 2022 becomes the slider, two-seamer, and cutter guy
    • This year Ohtani-san threw a ton of sliders
    • Did you notice? In the second half of the season, his two-seamers (recorded as Sinker in the data) increased!?
    • With the slider, two-seamer, and cutter, he changed character into a guy who throws quirky breaking balls
    Source: https://speakerdeck.com/shinyorke/pythonshi-inotamenosupotudetajie-xi-nokihon-pysparktomeziyarigudetawotian-ete-number-pyconjp-2022
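  7. One of Ohtani-san's starts (2022/9/29: 8 innings, 10 strikeouts, no runs allowed)
    The picture: nearly half of his pitches were sliders, and he pressed on with two-seamers and cutters
    Charts: pitch locations (catcher's view), release points (catcher's view), pitch-type mix
    Source (all of the above): https://speakerdeck.com/shinyorke/pythonshi-inotamenosupotudetajie-xi-nokihon-pysparktomeziyarigudetawotian-ete-number-pyconjp-2022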
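A pitch-type mix like the one charted on slide 7 boils down to counting rows per pitch type. The following is a minimal sketch using only the standard library, not the author's code: the `pitch_type` and `release_speed` column names and the codes `SL`, `SI`, `FC`, `FF` are assumed to match the Statcast CSV, and the sample rows are made up for illustration.

```python
import csv
import io
from collections import Counter

# Made-up sample rows; a real Statcast export has many more columns.
SAMPLE_CSV = """pitch_type,release_speed
SL,85.1
SL,84.7
SI,97.2
FC,90.3
SL,86.0
FF,99.8
"""


def pitch_mix(csv_text: str) -> dict:
    """Return each pitch type's share of all pitches, most frequent first."""
    reader = csv.DictReader(io.StringIO(csv_text))
    # Skip rows where the pitch type was not recorded.
    counts = Counter(row["pitch_type"] for row in reader if row["pitch_type"])
    total = sum(counts.values())
    return {pitch: n / total for pitch, n in counts.most_common()}


if __name__ == "__main__":
    for pitch, share in pitch_mix(SAMPLE_CSV).items():
        print(f"{pitch}: {share:.0%}")
```

With the sample rows above, sliders (`SL`) account for half of the pitches, which mirrors the shape of the start described on the slide without reproducing its actual numbers.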

  8. By the way, his performance as a batter
    You can see that he hits the ball extremely hard and turns those batted balls into home runs
    Source: https://speakerdeck.com/shinyorke/pythonshi-inotamenosupotudetajie-xi-nokihon-pysparktomeziyarigudetawotian-ete-number-pyconjp-2022

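  9. ??: "I want a way to look at this for every game, every day"
    Ordinary data analysis would end with "poke at it a bit in Google Colab and be done",
    but I wanted something I could look at every day, so I built it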

  10. So, I went and put something together.
    Source: https://speakerdeck.com/shinyorke/pythonshi-inotamenosupotudetajie-xi-nokihon-pysparktomeziyarigudetawotian-ete-number-pyconjp-2022

  11. A nice, serverless data platform built with Python and Google Cloud (baseball edition)

  12. The overall architecture

  13. This presentation makes reference to marks owned by third parties. Unless otherwise noted, all such third-party marks are the property of their respective owners. No sponsorship, endorsement or approval of this content by the owners of such marks is intended, expressed or implied.
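  14. Architecture walkthrough (≈ the points I cared about)
    • To check and refresh the data every day without hassle, the platform is built and operated entirely on fully managed, serverless cloud services.
    • Basic policy for choosing services
        • Start from "the DWH is BigQuery" and design the ETL and the application itself around it (because I want to use BQ)
        • Keep each component independent as a microservice, since that makes it easy to build and easy to test.
        • Each piece is a small app, so build and run it on Cloud Functions or Cloud Run
        • They can be wired into a CI/CD pipeline such as GitHub Actions for deploying and scaling
    Billing is basically pay-per-use, so it is easy on the wallet too (around $5 per month) 👛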
  15. Use cases

  16. Dashboard app
    • The app itself is hosted on Cloud Run and implemented with Dash, a Python framework
    • It reaches the Backend (Cloud Functions) through API Gateway. The Backend is a RESTful API built with Functions Framework
    • The database is Firestore, populated from BigQuery by the ETL introduced later
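The backend on slide 16 is described only as a RESTful API built with Functions Framework on top of Firestore. As a rough, framework-free sketch of what the core of one read endpoint could look like, the code below uses an in-memory dictionary as a stand-in for Firestore. The function name, the `player-001` document, and its fields are all hypothetical. The `(body, status, headers)` return shape follows the Flask convention, which Python HTTP functions built with Functions Framework also accept.

```python
import json

# Hypothetical in-memory stand-in for the Firestore collection behind the API.
FAKE_STORE = {
    "player-001": {"name": "Example Pitcher", "pitch_mix": {"SL": 0.5, "SI": 0.2}},
}


def get_player(player_id: str, store: dict) -> tuple:
    """Build a Flask-style (body, status, headers) response for one player."""
    headers = {"Content-Type": "application/json"}
    doc = store.get(player_id)
    if doc is None:
        # Unknown IDs get a JSON error body rather than an empty response.
        return json.dumps({"error": "player not found"}), 404, headers
    return json.dumps(doc), 200, headers


if __name__ == "__main__":
    print(get_player("player-001", FAKE_STORE))
    print(get_player("unknown", FAKE_STORE))
```

Keeping the lookup logic in a plain function like this, separate from the HTTP entry point, is one way to get the "easy to test" property the deck attributes to small independent components.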
  17. Data collection & saving to BigQuery
    • A crawler (Cloud Functions) runs periodically to collect data from the source site (Baseball Savant)
    • Its output is saved to Google Cloud Storage (GCS) as CSV. This is the source-of-truth data (Datalake)
    • A PySpark script running on Dataproc Serverless summarizes and tidies the CSV files on GCS and saves the result to BigQuery
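A periodic crawler like the one on slide 17 needs to decide which days still have to be fetched. The sketch below shows one possible approach and is not the author's implementation: it assumes one CSV object per game date, and the `statcast/YYYY/MM/DD.csv` naming scheme is hypothetical. The set of existing object names would come from listing the bucket, which is left out here.

```python
from datetime import date, timedelta


def object_name(day: date) -> str:
    """Hypothetical GCS object name: one CSV per game date."""
    return f"statcast/{day:%Y/%m/%d}.csv"


def dates_to_fetch(start: date, end: date, existing: set) -> list:
    """List every date in [start, end] whose CSV is not yet in the data lake."""
    days = (end - start).days
    candidates = (start + timedelta(days=offset) for offset in range(days + 1))
    return [day for day in candidates if object_name(day) not in existing]


if __name__ == "__main__":
    already_stored = {"statcast/2022/09/28.csv"}
    for day in dates_to_fetch(date(2022, 9, 27), date(2022, 9, 29), already_stored):
        print(object_name(day))
```

Because the function only fetches what is missing, rerunning the crawler after a failed day is harmless, which suits a scheduled, unattended job.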
  18. Loading into Firestore (moving data into the Database)
    • A PySpark script running on Dataproc Serverless converts the BigQuery data into the format the dashboard uses (JSON)
    • A Python script then loads that output (saved on GCS as JSON) into Firestore
    • The Dataproc and Firestore steps are run manually from local scripts (constraints got in the way of full automation)
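The conversion step on slide 18 turns tabular summary rows into JSON documents shaped for the dashboard. The slides do not show the document schema, so the sketch below is only one plausible shape: the `player_id`, `pitch_type`, `pitches`, and `avg_speed` fields and the sample rows are made up, and plain Python stands in for the PySpark job.

```python
import json
from collections import defaultdict

# Made-up summary rows, shaped like a query result with one row per player and pitch type.
SUMMARY_ROWS = [
    {"player_id": "player-001", "pitch_type": "SL", "pitches": 50, "avg_speed": 85.3},
    {"player_id": "player-001", "pitch_type": "SI", "pitches": 20, "avg_speed": 97.1},
    {"player_id": "player-002", "pitch_type": "FF", "pitches": 60, "avg_speed": 95.0},
]


def to_documents(rows: list) -> dict:
    """Group flat rows into one nested document per player."""
    docs = defaultdict(lambda: {"pitches": {}})
    for row in rows:
        docs[row["player_id"]]["pitches"][row["pitch_type"]] = {
            "count": row["pitches"],
            "avg_speed": row["avg_speed"],
        }
    return dict(docs)


def to_json_lines(docs: dict) -> str:
    """Serialize documents as newline-delimited JSON, one document per line."""
    return "\n".join(
        json.dumps({"id": doc_id, **doc}, sort_keys=True)
        for doc_id, doc in sorted(docs.items())
    )


if __name__ == "__main__":
    print(to_json_lines(to_documents(SUMMARY_ROWS)))
```

Writing one self-contained document per line keeps the loader simple: it can read the file line by line and write each document under its `id` without holding the whole export in memory.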
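  19. Looking back on operations
    • The app side runs nicely on Cloud Run & Cloud Functions 👏
      To save money the resources are kept in a cold-standby state,
      but since this is for personal use that causes no trouble (and the setup lets me scale instances up or down by running CI)
    • The data side gets mixed reviews
        • Pythagora Switch-style (Rube Goldberg-like) chained data processing with Cloud Functions and Scheduler: excellent (◎)
        • BigQuery is nice too, and aggregation jobs run without stress: good (○)
        • Using Dataproc Serverless at this scale may have been overkill?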
  20. Where this is probably headed (Ver 1.1 -> Ver 2.0)

  21. This presentation makes reference to marks owned by third parties. Unless otherwise noted, all such third-party marks are the property of their respective owners. No sponsorship, endorsement or approval of this content by the owners of such marks is intended, expressed or implied.
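  22. What I plan to do next
    • Unify the ETL on the Pythagora Switch-style chain of Cloud Functions & Cloud Scheduler
        • Having actually used Dataproc, it is too much for my use case
        • The input is simple CSV and the data volume is small, so Cloud Functions can handle it
    • Fully automate the data collection and processing flow, and manage the infrastructure with IaC
    • Enrich the statistics built on BigQuery
        • Right now I mostly just query the raw data (creating views as needed, and so on)
        • Spark is available too, so build out somewhat smarter Datamarts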
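The claim on slide 22 that simple, small CSV files do not need Spark can be illustrated with a streaming aggregation in plain Python. This is a sketch under assumptions rather than the planned implementation: the `batter`, `events`, and `launch_speed` column names and the `home_run` event value are assumed to match the Statcast CSV, the sample rows are made up, and `launch_speed` is treated as exit velocity in whatever unit the file uses.

```python
import csv
import io
from collections import defaultdict
from statistics import mean

# Made-up sample rows; the last row has no batted-ball speed.
SAMPLE = """batter,events,launch_speed
1001,home_run,110.5
1001,single,95.0
1001,home_run,105.5
1002,home_run,101.0
1002,strikeout,
"""


def home_run_exit_velocity(lines) -> dict:
    """Average exit velocity of home runs per batter, reading one row at a time."""
    speeds = defaultdict(list)
    for row in csv.DictReader(lines):
        # Only home runs with a recorded speed contribute to the average.
        if row["events"] == "home_run" and row["launch_speed"]:
            speeds[row["batter"]].append(float(row["launch_speed"]))
    return {batter: mean(values) for batter, values in speeds.items()}


if __name__ == "__main__":
    print(home_run_exit_velocity(io.StringIO(SAMPLE)))
```

`csv.DictReader` accepts any iterable of lines, so the same function works on an open file handle, and a few hundred megabytes of CSV can be processed row by row without loading the whole file.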
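  23. Summary of the talk
    • Major League Baseball publishes open data, which lets you check and evaluate the performance of players such as Shohei Ohtani.
    • To make analysis and visualization of that open data part of my daily routine, I built a data analytics platform on Google Cloud.
    • A data platform can be built and operated with serverless architecture alone, and I would like to strongly recommend it.
    Source: https://speakerdeck.com/shinyorke/pythonshi-inotamenosupotudetajie-xi-nokihon-pysparktomeziyarigudetawotian-ete-number-pyconjp-2022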
  24. What I could not cover this time
    • The details of Dataproc (Spark), especially using Serverless.
    • Batch processing and a backend API built on Cloud Functions 2nd gen
    I hope to give another LT on these at some point 👍

  25. Thank you for listening