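A Nearly Serverless Data Platform on Google Cloud That Even One Person Can Start Small
⚾ A "personal DX" strategy for making baseball watching more convenient, and its implementation
Shinichi Nakagawa
2022/12/02 Jagu’e’r Data Utilization Subcommittee #8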

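Who am I?
• Shinichi Nakagawa (中川伸一)
• Accenture (アクセンチュア株式会社), Manager in the Technology Consulting division
• Previous jobs: engineer at startups and mega-ventures
• At Accenture: delivery of Google Cloud related projects
• Personally, I develop products (≈ hobby) for the following purposes:
  • Baseball data analysis
  • My own healthcare
  • Technical validation themed on the above
• Favorite Google Cloud services: BigQuery, Cloud Run
• Favorite Baseball Humans: Tsuyoshi Shinjo, Chusei Mannami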

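Today's talk
• An introduction to Major League Baseball's big data
• A nice serverless data platform built with Python and Google Cloud
This is a talk about "starting a data platform small and serverless on Google Cloud", told mainly from the angle of technology selection and the viewpoints behind it, with baseball as the example ⚾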

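Major League Baseball's big data
• MLB records all kinds of data with a system called "Statcast".
  ※ In principle the data is captured by measuring equipment such as cameras and radar (some of it includes manual records and estimated values).
• For example, the numbers quoted in play-by-play and commentary all come from this Statcast big data.
  • "Ohtani-san! Home run No. ○! Exit velocity 180 km/h, distance 130 m"
  • "Ohtani-san! A called third strike on a 162 km/h fastball!!!"
• Every single move on the field is covered: all pitch and batted-ball data is recorded.
• A regular season (30 teams, 162 games) has roughly 700,000 to 800,000 pitches. Postseason and spring training data also exist.
• Each record consists of 91 fields (!?), and one regular season amounts to roughly 400 MB to 600 MB of data.
• The data is published on a website where anyone can browse it and download it (in CSV format).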
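To make the per-pitch CSV concrete, here is a minimal sketch that tallies the share of each pitch type. The column names (`pitch_type`, `release_speed`) and the pitch codes are assumptions about the CSV layout, and the sample rows are made up for illustration; they are not real Statcast records.

```python
import csv
import io
from collections import Counter

# Made-up sample rows for illustration only; NOT real Statcast records.
# The column names are assumptions about the CSV layout.
SAMPLE_CSV = """pitch_type,release_speed
SL,85.1
SL,84.7
SI,97.2
FC,90.3
"""


def pitch_mix(csv_text: str) -> dict[str, float]:
    """Return the share of each pitch type as a fraction of all pitches."""
    reader = csv.DictReader(io.StringIO(csv_text))
    # Rows with an empty pitch type are ignored.
    counts = Counter(row["pitch_type"] for row in reader if row["pitch_type"])
    total = sum(counts.values())
    if total == 0:
        return {}
    return {pitch: n / total for pitch, n in counts.items()}


if __name__ == "__main__":
    for pitch, share in sorted(pitch_mix(SAMPLE_CSV).items()):
        print(f"{pitch}: {share:.0%}")
```

At a few hundred megabytes per season, this kind of tally also fits comfortably in a notebook before any platform is involved.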

Looking back at "Ohtani-san's 2022" with Major League data
Here is what you can do with this data.
Source: Irasutoya

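2022 Ohtani-san becomes a slider, two-seamer and cutter specialist
• This year's Ohtani-san throws a LOT of sliders
• Did you notice? They increase in the second half of the season!?
• With the slider, two-seamer and cutter, he has changed character into a guy who throws breaking balls with heavy movement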

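One of Ohtani-san's starts (2022/9/29: 8 innings, 10 strikeouts, no runs allowed)
Nearly half of his pitches were sliders, with the two-seamer and cutter used to overpower hitters.
[Figures: pitch locations (catcher's view), release points (catcher's view), share of pitch types]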

By the way, his performance as a hitter
You can see that he hits the ball extremely hard and turns it into home runs.

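??: "I want a way to look at this every day, every game"
For ordinary data analysis, "play with it a bit in Google Colab and you're done" would be enough, but I wanted something I could look at every day, so I built it.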

So, I built a little something.

A nice serverless data platform built with Python and Google Cloud (baseball edition)

This presentation makes reference to marks owned by third parties. Unless otherwise noted, all such third-party marks are the property of their respective owners. No sponsorship, endorsement or approval of this content by the owners of such marks is intended, expressed or implied.

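Architecture walkthrough (≈ the points I cared about)
• To check and update the data every day without hassle, the system is built and operated entirely on fully managed, serverless cloud services.
• Basic policy for service selection
  • Start from "the DWH is BigQuery" and design the ETL and the application itself around it (because I want to use BQ)
  • Make each component an independent microservice, since that is easier to build and easier to test.
  • Each element is a small app, so build and run it on Cloud Functions or Cloud Run
  • They can be plugged into a CI/CD pipeline such as GitHub Actions for deployment and scaling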

Dashboard app
• The app itself is hosted on Cloud Run and implemented with Dash, a Python framework
• It accesses the backend (Cloud Functions) through API Gateway. The backend is a RESTful API built with Functions Framework
• The database is Firestore, populated from BigQuery by the ETL introduced later
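As a rough illustration of what one backend endpoint could look like, here is a dependency-free sketch of an HTTP handler. The endpoint shape, the `player_id` parameter, and the in-memory `FAKE_DB` (standing in for Firestore) are all hypothetical; under Functions Framework the handler would typically be wrapped with the `@functions_framework.http` decorator, which is left out here so the sketch runs on the standard library alone.

```python
import json

# Stand-in for Firestore: an in-memory dict keyed by a hypothetical player ID.
# In a real backend this lookup would be a Firestore read.
FAKE_DB = {
    "player-001": {
        "name": "Example Pitcher",
        "pitch_mix": {"SL": 0.5, "SI": 0.25, "FC": 0.25},
    },
}


def get_player(request):
    """HTTP handler: GET ?player_id=... returns a JSON document or an error.

    `request` only needs to behave like a Flask request (`.args.get`).
    The return value is a (body, status, headers) tuple.
    """
    headers = {"Content-Type": "application/json"}
    player_id = request.args.get("player_id")
    if not player_id:
        return json.dumps({"error": "player_id is required"}), 400, headers
    doc = FAKE_DB.get(player_id)
    if doc is None:
        return json.dumps({"error": "not found"}), 404, headers
    return json.dumps(doc), 200, headers


class FakeRequest:
    """Minimal request double for trying the handler locally."""

    def __init__(self, **args):
        self.args = args


if __name__ == "__main__":
    print(get_player(FakeRequest(player_id="player-001")))
```

Keeping the handler a plain function of a request object is what makes each small service easy to test in isolation.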

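Data collection & saving to BigQuery
• A crawler (Cloud Functions) runs periodically to collect data from the source site (Baseball Savant)
• The results are saved as CSV in Google Cloud Storage (GCS). This is the source data (data lake)
• A PySpark script running on Dataproc Serverless summarizes and tidies the CSV on GCS and saves it to BigQuery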
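One detail a periodic crawler has to settle is which date to fetch and where to put the file. The sketch below assumes a daily run that collects the previous day's games and writes to a date-partitioned object name; both the "previous day" rule and the `statcast/year=.../date=.../pitches.csv` layout are hypothetical choices for illustration, not something the slides specify.

```python
from datetime import date, timedelta


def crawl_target(today: date) -> date:
    """Return the game date a daily run should fetch (assumed: the previous day)."""
    return today - timedelta(days=1)


def datalake_object_name(game_date: date, prefix: str = "statcast") -> str:
    """Build a date-partitioned object name for the raw CSV.

    The layout <prefix>/year=YYYY/date=YYYY-MM-DD/pitches.csv is hypothetical.
    """
    return (
        f"{prefix}/year={game_date.year}"
        f"/date={game_date.isoformat()}/pitches.csv"
    )


if __name__ == "__main__":
    target = crawl_target(date.today())
    print(datalake_object_name(target))
```

Deriving the object name from the game date means a rerun for the same day overwrites the same object instead of creating a duplicate.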

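Loading into Firestore (moving data to the database)
• A PySpark script running on Dataproc Serverless converts the BigQuery data into the format used by the dashboard (JSON)
• A Python script then loads the results (saved on GCS as JSON) into Firestore
• The Dataproc and Firestore steps are run manually from local scripts (there were constraints that prevented full automation)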
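The reshaping step can be pictured as turning flat summary rows into one document per entity and then writing them in batches. The following is only a sketch under assumptions: the field names (`pitcher_id`, `pitch_type`, `share`) and the document shape are hypothetical, and the actual Firestore write is omitted so the example stays runnable with the standard library.

```python
from typing import Iterable, Iterator


def to_documents(rows: Iterable[dict]) -> dict[str, dict]:
    """Group flat summary rows into one document per pitcher.

    The document ID is the (hypothetical) `pitcher_id` field; each document
    holds a `pitch_mix` mapping of pitch type to share.
    """
    docs: dict[str, dict] = {}
    for row in rows:
        doc = docs.setdefault(row["pitcher_id"], {"pitch_mix": {}})
        doc["pitch_mix"][row["pitch_type"]] = row["share"]
    return docs


def chunked(items: list, size: int) -> Iterator[list]:
    """Yield consecutive slices of at most `size` items, one per write batch."""
    if size <= 0:
        raise ValueError("size must be positive")
    for start in range(0, len(items), size):
        yield items[start:start + size]


if __name__ == "__main__":
    rows = [
        {"pitcher_id": "p1", "pitch_type": "SL", "share": 0.5},
        {"pitcher_id": "p1", "pitch_type": "SI", "share": 0.5},
    ]
    # The batch size is an arbitrary illustrative choice.
    for batch in chunked(list(to_documents(rows).items()), 100):
        print(batch)
```

Each batch would then be handed to whatever client writes to the database; that call is deliberately not shown here.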

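Looking back on operations
• The app side runs nicely on Cloud Run & Cloud Functions 👏
  No problems for personal use (and the instance count can be scaled up or down by running CI)
• The data side gets a mixed evaluation
  • Chained, "Pythagora Switch" (Rube Goldberg) style data processing with Cloud Functions and Scheduler: ◎
  • BigQuery is nice too; aggregation and similar processing run without stress: ○
  • Using Dataproc Serverless at this scale may have been overkill?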

What it will probably look like next (Ver 1.1 -> Ver 2.0)

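What I plan to do next
• Unify the ETL on the Cloud Functions & Cloud Scheduler chain
  • After actually using it, Dataproc turned out to be too much for my use case
  • The CSV is simple and the data volume is small, so Cloud Functions can handle it
• Fully automate the data collection and processing flow, and manage infrastructure with IaC
• Enrich the statistics built with BigQuery
  • Right now I mostly just query the raw data (creating views as needed, etc.)
  • Spark is available too, so build out a smarter data mart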
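To give a feel for why a small CSV does not need Spark, here is a sketch of a summary step written with the standard library only, the kind of logic that could plausibly run inside a single function invocation. The column names (`player_name`, `pitch_type`, `release_speed`) are assumptions about the CSV layout, and this is not the PySpark job from the talk.

```python
import csv
import io
from collections import defaultdict


def average_speed_by_pitch(csv_text: str) -> dict[tuple[str, str], float]:
    """Average release speed per (pitcher, pitch type).

    Rows without a measured speed are skipped.
    """
    sums = defaultdict(lambda: [0.0, 0])  # key -> [running total, count]
    for row in csv.DictReader(io.StringIO(csv_text)):
        if not row["release_speed"]:
            continue
        key = (row["player_name"], row["pitch_type"])
        sums[key][0] += float(row["release_speed"])
        sums[key][1] += 1
    return {key: total / n for key, (total, n) in sums.items()}


if __name__ == "__main__":
    # Made-up rows for illustration only.
    sample = (
        "player_name,pitch_type,release_speed\n"
        "A,SL,84.0\n"
        "A,SL,86.0\n"
    )
    print(average_speed_by_pitch(sample))
```

The resulting rows could be loaded into a BigQuery table or exposed as a view, whichever suits the data mart.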

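Summary of the talk
• Major League Baseball has open data, which can be checked and evaluated.
• To make analysis and visualization of that open data part of daily use, I built a data analysis platform on Google Cloud.
• It uses serverless architecture only, which I strongly recommend.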

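What I couldn't cover this time
• Details on Dataproc (Spark), especially the Serverless part.
• Batch processing and the backend API using Cloud Functions 2nd gen
I hope to give another LT or similar on these at some point 👍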

