Slide 1

Slide 1 text

Using DataFlow in Web Media. GCPUG Tokyo Dataflow Day, May 2017
All About, Takumi Yoshida (ytakky)

Slide 2

Slide 2 text

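Self-introduction
• Takumi Yoshida
• @y_takky2014 https://twitter.com/y_takky2014
  https://github.com/ytakky2014
• Joined as a new graduate, now in my 4th year
• Technology Platform Group
  - Promoting DevOps
  - Introducing containers and GKE
  - Promoting automation
  - Leaning toward the developer side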

Slide 3

Slide 3 text

What is All About? https://allabout.co.jp

Slide 4

Slide 4 text

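Using data in media
• For example:
  • Analysis of article content: what makes an article more likely to go viral
  • Recommendation: we want to offer users related articles that are useful to them
  • Ad delivery: we want to show ads that are useful to users in the best placement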

Slide 5

Slide 5 text

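System requirements
• Can be made faster and scaled out
• Keep the operational load on the Ops team low
• Data volume (raw data) is around 1 TB/month; the system should cope as the volume grows
• User optimization: uses machine learning; the system should cope with changes to the machine learning algorithm, the introduction of TensorFlow, and so on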

Slide 6

Slide 6 text

Google Cloud Platform

Slide 7

Slide 7 text

GCP
• A cloud with strengths in big data processing
• A wide range of managed services
+ In-house circumstances (the front application is deployed on GKE)
  * For part of https://allabout.co.jp

Slide 8

Slide 8 text

GCP's managed services

Slide 9

Slide 9 text

The architecture we built:

Slide 10

Slide 10 text

The architecture we built:

Slide 11

Slide 11 text

fluentd
• Use fluent-plugin-bigquery
  https://github.com/kaizenplatform/fluent-plugin-bigquery
• Just add a tag to the existing processing with record_reformer or similar, then send the records with fluent-plugin-bigquery
 


Slide 12

Slide 12 text

:

Slide 13

Slide 13 text

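What we do in Dataflow
• Fetch the data to be processed from BQ
• Fetch the previous training results from DataStore
• Run the training in parallel
• Store the results in DataStore
• Implemented with the Python SDK
  • It became GA just before our release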
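The four steps above can be sketched without any cloud dependency. The following is a minimal, standard-library-only illustration of the pipeline's shape, not the Dataflow code from the talk: the in-memory `ROWS` list stands in for the BigQuery result, the `previous_results` dict stands in for DataStore, and the running-mean `train` function is a hypothetical placeholder for whatever model was actually trained.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for rows fetched from BigQuery.
ROWS = [
    {"article": "a1", "score": 1.0},
    {"article": "a1", "score": 3.0},
    {"article": "a2", "score": 4.0},
]


def group_rows(rows):
    """Preprocess: group scores by article so each group can be trained alone."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row["article"]].append(row["score"])
    return dict(grouped)


def train(key, scores, previous):
    """Placeholder training: fold new scores into the previous running mean."""
    count = previous["count"] + len(scores)
    mean = (previous["mean"] * previous["count"] + sum(scores)) / count
    return key, {"mean": mean, "count": count}


def run_pipeline(rows, store):
    """Preprocess, train each group in parallel, then write results to the store."""
    grouped = group_rows(rows)
    empty = {"mean": 0.0, "count": 0}
    with ThreadPoolExecutor() as pool:
        futures = [
            pool.submit(train, key, scores, store.get(key, empty))
            for key, scores in grouped.items()
        ]
        for future in futures:
            key, result = future.result()
            store[key] = result
    return store


# Hypothetical stand-in for the previous results held in DataStore.
previous_results = {"a1": {"mean": 2.0, "count": 2}}
updated = run_pipeline(ROWS, dict(previous_results))
print(updated)
```

On Dataflow, each stage would instead be a pipeline transform, and the service would distribute the per-group work across workers rather than across local threads.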

Slide 14

Slide 14 text

The parallel processing in action

Slide 15

Slide 15 text

Conceptual picture of the parallel processing
[Diagram labels: preprocessing ×3, training ×6, storing ×2]

Slide 16

Slide 16 text

TIPS
• Log
  • Anything written to standard output can be viewed in StackDriver Logging
[Screenshot labels: my_job]
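As a small illustration of the tip above, the sketch below routes Python log records to standard output. The logger name `my_job` and the message format are assumptions chosen for the example. Nothing here is specific to Dataflow; it only shows the "write to stdout" half of the tip.

```python
import logging
import sys


def make_stdout_logger(name):
    """Return a logger whose records are written to standard output."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.propagate = False  # avoid duplicate lines via the root logger
    if not logger.handlers:
        handler = logging.StreamHandler(sys.stdout)
        handler.setFormatter(
            logging.Formatter("%(levelname)s %(name)s: %(message)s")
        )
        logger.addHandler(handler)
    return logger


logger = make_stdout_logger("my_job")
logger.info("processed %d rows", 3)
```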

Slide 17

Slide 17 text

TIPS2
• A handy command
  • gcloud dataflow jobs list
    • Check the results of job runs
    • Narrow down by time with --created-after and --created-before
  • gcloud dataflow jobs list \
      --created-after="2017-05-22 15:00:00" \
      --created-before="2017-05-22 16:00:00"
    Shows the results of jobs run between 15:00 and 16:00
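When the same check is run repeatedly, the time window can be computed rather than typed. The sketch below only builds the argument list for the command shown above. The helper name `jobs_list_command` is hypothetical, and the timestamp format is copied from the slide.

```python
import shlex
from datetime import datetime, timedelta

# Timestamp format as written on the slide.
TIME_FORMAT = "%Y-%m-%d %H:%M:%S"


def jobs_list_command(start, duration):
    """Build the gcloud arguments for jobs created between start and start + duration."""
    end = start + duration
    return [
        "gcloud", "dataflow", "jobs", "list",
        "--created-after=" + start.strftime(TIME_FORMAT),
        "--created-before=" + end.strftime(TIME_FORMAT),
    ]


command = jobs_list_command(datetime(2017, 5, 22, 15, 0, 0), timedelta(hours=1))
# shlex.join quotes the arguments that contain spaces, so the line can be pasted into a shell.
print(shlex.join(command))
```

On a machine where gcloud is installed and authenticated, the list could be passed directly to `subprocess.run(command)`.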

Slide 18

Slide 18 text

The architecture we built:

Slide 19

Slide 19 text

The architecture we built:

Slide 20

Slide 20 text

What the Front Application does
• Fetch the trained results from DataStore
  • Cloud DataStore Client Libraries
  • https://cloud.google.com/datastore/docs/reference/libraries
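The lookup on the front side can be sketched as a single key-based read. The talk does not say which client library or data model the front application uses, so everything specific here is an assumption: the kind name `LearnedResult`, the entity fields, and the `FakeClient` class are hypothetical. The helper relies only on `key(...)` and `get(...)`, which the Python client's `google.cloud.datastore.Client` also provides.

```python
def fetch_learned_result(client, kind, name, default=None):
    """Look up one entity by kind and name; return it as a dict, or `default` if absent."""
    key = client.key(kind, name)
    entity = client.get(key)
    return dict(entity) if entity is not None else default


class FakeClient:
    """In-memory stand-in exposing only the two methods used above."""

    def __init__(self, entities):
        self._entities = entities

    def key(self, kind, name):
        return (kind, name)

    def get(self, key):
        return self._entities.get(key)


# Hypothetical kind name and fields, for illustration only.
client = FakeClient({("LearnedResult", "a1"): {"mean": 2.0, "count": 4}})
result = fetch_learned_result(client, "LearnedResult", "a1")
print(result)
```

Passing the client in as a parameter keeps the lookup testable without credentials. In a real deployment the fake would be replaced by a client object from the chosen library.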

Slide 21

Slide 21 text

Cost
• DataStore: about ¥3,000
• Dataflow: about ¥1,100
  • vCPU Time Batch US: 111.657 hours: ¥700
  • RAM Time US: 418.72 GB-hours: ¥200
  • Local Disk Time PD Standard: 27914.631 GB-hours: ¥200
• For reference, instance pricing
  • n1-standard-1 US, per month: $24.27 ≒ about ¥2,700
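The figures above can be cross-checked with a few lines of arithmetic. All inputs are the rounded values from the slide, so the derived numbers are rough. The exchange rate is not stated in the talk and is only inferred from the dollar and yen figures for the instance.

```python
# Dataflow line items from the slide, in yen (rounded on the slide).
DATAFLOW_ITEMS_YEN = {
    "vCPU time": 700,
    "RAM time": 200,
    "Local disk time": 200,
}
VCPU_HOURS = 111.657
INSTANCE_USD_PER_MONTH = 24.27
INSTANCE_YEN_PER_MONTH = 2700

# The three line items should add up to the quoted Dataflow total.
dataflow_total_yen = sum(DATAFLOW_ITEMS_YEN.values())

# Rough effective price of one batch vCPU hour.
vcpu_yen_per_hour = DATAFLOW_ITEMS_YEN["vCPU time"] / VCPU_HOURS

# Exchange rate implied by the instance price (an inference, not a quoted rate).
implied_yen_per_usd = INSTANCE_YEN_PER_MONTH / INSTANCE_USD_PER_MONTH

print(dataflow_total_yen)
print(round(vcpu_yen_per_hour, 2))
print(round(implied_yen_per_usd, 1))
```

The line items do sum to the quoted ¥1,100, which is less than half the reference price of a single always-on n1-standard-1 instance.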

Slide 22

Slide 22 text

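Summary
• With the help of GCP, a distributed processing platform can be built with NoOps
  • Make good use of managed services
• Using the Python SDK makes the machine learning side easier
  • Cases of using Dataflow as preprocessing for TensorFlow are also appearing
• Cost reduction
  • No need to think about starting and stopping instances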

Slide 23

Slide 23 text

Search for "All About Tech Blog"
A post on the implementation using the Python SDK is also planned!
Blog post covering this talk: http://allabout-tech.hatenablog.com/entry/2017/05/24/094600