Slide 1

Slide 1 text

Rebuilding Batch ETL with Cloud Composer & Dataflow
2019-07-19
#data_ml_engineering
presented by @yuzutas0
https://www.pexels.com/photo/architecture-blur-building-colourful-392031/

Slide 2

Slide 2 text

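Notes and disclaimer

These slides are already published on the web. #data_ml_engineering
There is no need to take photos or notes. Please relax and listen.

There are 70+ slides.
I will focus on agenda item 4 and cover the rest at lightning-talk speed.
The content assumes follow-up Q&A during the networking time and on SNS.

I think of technology as a weapon.
Choose it according to your goals and constraints. This talk does not recommend any specific technology.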

Slide 3

Slide 3 text

Agenda: 1. Introduction 2. Context 3. Measurement, evaluation, and consensus building 4. Rebuild & release 5. Closing

Slide 4

Slide 4 text

@yuzutas0


Slide 5

Slide 5 text

Past talks
I share know-how and insights on data platforms.
PyCon JP: Best Talk Award (Excellence Award)
Developers Summit Summer: No. 1 in survey satisfaction

Slide 6

Slide 6 text

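Purpose of today's talk
To present one case study of a "rebuild".
It is only a single case, so compare it with your own technology stack and organizational situation, and take away your own lessons.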

Slide 7

Slide 7 text

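Intended audience
Software engineers who do the hands-on work of building and operating systems for log collection and ETL, and their clients and managers (including people who expect to take on those roles).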

Slide 8

Slide 8 text

Agenda: 1. Introduction 2. Context 3. Measurement, evaluation, and consensus building 4. Rebuild & release 5. Closing

Slide 9

Slide 9 text

Mercari (C2C flea market)

Slide 10

Slide 10 text

Business growth (= growth in data volume)
Source: FY2019.6 3Q financial results briefing material https://pdf.irpocket.com/C4385/eHSm/vwwn/oECA.pdf

Slide 11

Slide 11 text

Global and new businesses

Slide 12

Slide 12 text

Active use of data
https://speakerdeck.com/hik0107/mercari-bi-team-data-analytics-summit-2018

Slide 13

Slide 13 text

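Summary of characteristics
- The product is growing.
- Data volume is increasing rapidly.
- The organization is being built to grow the global and new businesses.
- Data is actively used for analytics, ML, and more.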

Slide 14

Slide 14 text

The problem on the ground
"The data in BQ is not being updated!"
(Some tables had been stalled for as long as half a month.)

Slide 15

Slide 15 text

Growing pains
[Diagram] Product ↑ (Good), Data ↑ (Good), Load ↑ (Bad), Users ↑ (Bad)
❌ "The system is failing!" "It has not been updated!"

Slide 16

Slide 16 text

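Historical background
[Diagram] ETL System: ETL for US, ETL for JP, ETL for UK
US Team: "We built it! We maintain it!" "Let's improve the US app!" Supports JP out of goodwill alongside its main work (honestly, there are limits).
JP SRE: "The JP app production environment is the top priority!"
JP BI: "JP wants it too! Let us piggyback!" (request to the US Team) "Let's focus on analytics work!"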

Slide 17

Slide 17 text

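Scope of this project (1)
[Diagram] Product / Users → DB and logs → BigQuery → Initiatives and operations
Stages: collect → deliver → use → value
"Value" is the objective variable to maximize in DataOps.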

Slide 18

Slide 18 text

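Scope of this project (2)
[Diagram] Ishikari DC: Monolith (APP / BE) with DB → Read Only Replica → masking of confidential information → BigQuery
GCP: Micro services (three shown), each with its own DB; to be migrated step by step
Other Product / Other: outside the scope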

Slide 19

Slide 19 text

Today's Issue
- Data that has not been updated for half a month
- What would you do?

Slide 20

Slide 20 text

Agenda: 1. Introduction 2. Context 3. Measurement, evaluation, and consensus building 4. Rebuild & release 5. Closing

Slide 21

Slide 21 text

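Stakeholder interviews
Analysts. Problem: "It has stopped!" Solution: "We need it now! Give us a stopgap fix!"
Developers. Problem: "Is it really that bad?" Solution: "We should rebuild it!"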

Slide 22

Slide 22 text

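Measure
Analysts. Problem: "It has stopped!" Solution: "We need it now! Give us a stopgap fix!"
Developers. Problem: "Is it really that bad?" Solution: "We should rebuild it!"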

Slide 23

Slide 23 text

Built a bot that reports BQ update delays
All stakeholders: "It is worse than we expected."

Slide 24

Slide 24 text

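BQ update checker / implementation
- Runs every hour.
- pandas.read_csv() loads the settings: check time, target tables, and notification channel.
- pandas.read_gbq() runs a SELECT on dataset.__TABLES__ to get each table name and its last-modified timestamp.
- The checker decides whether each table has been updated.
- slackweb.Slack().notify() posts to the specified channel.
- The metadata is saved as snapshots and accumulated so that it can be analyzed as panel data.
Icon credit: flaticon.com (CSV icon)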
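The update check described on this slide can be sketched roughly as follows. This is a minimal illustration, not the speaker's code: the dataset name `my_dataset`, the function names, and the one-day threshold are assumptions, and in-memory rows stand in for the result that `pandas.read_gbq()` would return. Posting the message to Slack is left out.

```python
from datetime import datetime, timedelta, timezone

# Metadata query of the kind the slide describes. `my_dataset` is a placeholder.
# In __TABLES__, last_modified_time is stored as milliseconds since the epoch.
METADATA_QUERY = """
SELECT table_id, TIMESTAMP_MILLIS(last_modified_time) AS last_modified
FROM `my_dataset.__TABLES__`
"""


def find_stale_tables(metadata_rows, targets, now, max_age):
    """Return (table_id, age) pairs for target tables older than max_age."""
    stale = []
    for row in metadata_rows:
        if row["table_id"] not in targets:
            continue  # only check the tables listed in the settings
        age = now - row["last_modified"]
        if age > max_age:
            stale.append((row["table_id"], age))
    return stale


def format_alert(stale):
    """Build the text that would be posted to the notification channel."""
    lines = ["BQ tables not updated:"]
    for table_id, age in stale:
        hours = age.seconds // 3600
        lines.append(f"- {table_id}: last updated {age.days}d {hours}h ago")
    return "\n".join(lines)


# In-memory rows standing in for the query result (illustrative values).
now = datetime(2019, 7, 19, 9, 0, tzinfo=timezone.utc)
rows = [
    {"table_id": "items", "last_modified": now - timedelta(hours=2)},
    {"table_id": "purchases", "last_modified": now - timedelta(days=15, hours=3)},
]
stale = find_stale_tables(rows, {"items", "purchases"}, now, timedelta(days=1))
print(format_alert(stale))
```

Keeping the comparison in a pure function makes the threshold logic testable without BigQuery or Slack credentials.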

Slide 25

Slide 25 text

BQ update checker / design
http://yuzutas0.hatenablog.com/entry/2017/05/23/073000
BigQuery

Slide 26

Slide 26 text

BQ update checker / docs for user (1)

Slide 27

Slide 27 text

BQ update checker / docs for user (2)

Slide 28

Slide 28 text

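Visualization → consensus building
Analysts. Problem: "It has stopped!" Solution: "We need it now! Give us a stopgap fix!"
Developers. Problem: "Is it really that bad?" Solution: "We should rebuild it!"
Outcome: raise the priority and act!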

Slide 29

Slide 29 text

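Extend the life of the existing system
Analysts. Problem: "It has stopped!" Solution: "We need it now! Give us a stopgap fix!"
Developers. Problem: "Is it really that bad?" Solution: "We should rebuild it!"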

Slide 30

Slide 30 text


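Stopgap fix
Together with the analysts, we tried "just retry it for now".

The syncs for tables that had not been delayed were dragged down as well, and everything failed (a secondary disaster).
This made it visible that "the situation is not as simple as the users assume".
Photo credit: pexels.com (brown and white tabby kitten)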

Slide 31

Slide 31 text

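Groping in the dark
We had temporary US permissions issued and started investigating.
The admin UI was too heavy to open, so we started by being taught the tricks……
http://{ip_or_domain}/admin/airflow/tree?dag_id={id}&num_runs=1
Photo credit: pexels.com (grey concrete road)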

Slide 32

Slide 32 text

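Investigation
- Timeouts occurred frequently as the data grew.
- All jobs ran serially, so a failure dragged down the downstream jobs.
  (The design was intended to limit the access load from JDBC to the DB.)
- The US team used the same mechanism but had been careful about how it split the jobs.
- JP had not gotten that far.
  (Piggybacking plus goodwill support done on the side has its limits.)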

Slide 33

Slide 33 text

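Tuning?
The orthodox approach would be the same kind of tuning the US team did. (Do not run away to an easy rebuild!)

However:
- We would have to start by catching up on how the system works.
- We would have to work while considering the impact on the existing system, which was already failing under load.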

Slide 34

Slide 34 text

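Rebuild?
A proposal came from the Merpay DataplatformTeam:
"We built something like this. Would you like to roll it out on your side too?"
Reference: "Large-scale batch processing at Merpay" - Mercari Engineering Blog
https://tech.mercari.com/entry/2019/06/05/120000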

Slide 35

Slide 35 text

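Comparison

US ETL System: Airflow on GKE, Spark (early)
  System: ◯ Should meet the functional requirements if tuned.
  Support: ◯ Geographic distance and time difference; asynchronous consultation is possible.

Merpay Batch Pipeline: Cloud Composer, Dataflow (lately)
  System: ◎ Should meet the functional requirements; fully managed and relatively easy to use.
  Support: ◎ The office is physically close; easy to consult.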

Slide 36

Slide 36 text

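Key points of the decision
Clearly, the problem is not "ETL system design".
The issues that truly need solving are "the long-term absence of a dedicated JP maintainer" and "the organizational dynamics that led to it".

"Data delivery has stopped" is only the tip of the iceberg.
So that it takes up as little mindshare as possible, "how to deal with it with the least technical effort" becomes the axis of the decision.
Illustration credit: irasutoya.com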

Slide 37

Slide 37 text


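Rebuild!
We judged that rebuilding and switching users over would finish sooner. (We ran away to the easy rebuild!)
By the way, the punchline:
(1) Merpay's pipeline assumes a full-GCP setup, so it could not be rolled out as-is.
(2) The US team, for its part, responded to the delay visualization and kindly fixed the JP jobs.
https://www.pexels.com/photo/architecture-blur-building-colourful-392031/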

Slide 38

Slide 38 text

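Visualization → consensus building
Analysts. Problem: "It has stopped!" Solution: "We need it now! Give us a stopgap fix!"
Developers. Problem: "Is it really that bad?" Solution: "We should rebuild it!"
Outcome: focus. Do not spend time and effort on stopgap fixes.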

Slide 39

Slide 39 text

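Only possible thanks to the analysts' ingenuity
- Making do with approximate values from substitute tables
- Referencing data that is not in BQ through scripts
- Actively sharing knowledge and tools with one another
They have the formidable ability to get their work done without over-relying on an unstable system!
(There is more than one means or route to reach the destination.)
https://www.pexels.com/photo/group-hand-fist-bump-1068523/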

Slide 40

Slide 40 text

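Consensus building: summary
Analysts. Problem: "It has stopped!" Solution: "We need it now! Give us a stopgap fix!"
Developers. Problem: "Is it really that bad?" Solution: "We should rebuild it!"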

Slide 41

Slide 41 text

Agenda: 1. Introduction 2. Context 3. Measurement, evaluation, and consensus building 4. Rebuild & release 5. Closing

Slide 42

Slide 42 text

System architecture
Replica DB

Slide 43

Slide 43 text

System architecture
Replica DB
This part was handled nicely by @siroken3.

Slide 44

Slide 44 text

System architecture
Replica DB
This is the part I will talk about.

Slide 45

Slide 45 text

Cloud Composer: DAG Runs
(1) Validation (2) Run Dataflow (3) Fetch the GCS files (4) BQ Load (incremental or full)

Slide 46

Slide 46 text

Composer → Dataflow
Specify a Template (more precisely, one deployed on GCS) and send a run command to Cloud Dataflow.

Slide 47

Slide 47 text

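Cloud Dataflow: ETL
(1) Read the dump files from GCS.
(2) Modify the data with heavily hacked transform logic.
(3) Write BQ-loadable files to GCS.
The transform logic was built up by squashing errors found during verification runs.
* Because of later enhancements, this differs from the current state.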

Slide 48

Slide 48 text

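Why Dataflow?
- The TSV format produced by mysqldump cannot be loaded into BigQuery as-is, so it needs reformatting:
  - double-quotation-marks escaped by double-quotation-marks in double-quotation-marks
  - new-line escaped by double backslashes

- Because the data volume is large, the processing was moved to Dataflow for its high scalability, considering both DB load and performance.

- Dataflow's responsibility is deliberately limited to being a conversion device, so instead of loading directly from Dataflow into BigQuery, the converted files are placed on GCS.

- The runtime is Python 3.5 (supported at Apache Beam 2.11.0 / Mar 5, 2019).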
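The reformatting step can be illustrated with a small pure-Python function. This is a sketch under stated assumptions, not the transform used in the talk: it assumes a simplified input in which a tab, newline, or backslash inside a value is written as the two-character sequences backslash-t, backslash-n, and backslash-backslash, and NULL is written as backslash-N. The function names are hypothetical. The output is CSV in which quotes inside a quoted field are doubled.

```python
import csv
import io

# Assumed escape sequences in the input (see the note above).
_ESCAPES = {"t": "\t", "n": "\n", "\\": "\\"}


def unescape_field(raw):
    """Decode one TSV field; the NULL marker becomes None."""
    if raw == "\\N":
        return None
    out = []
    chars = iter(raw)
    for ch in chars:
        if ch == "\\":
            nxt = next(chars, "")          # character after the backslash
            out.append(_ESCAPES.get(nxt, nxt))
        else:
            out.append(ch)
    return "".join(out)


def tsv_line_to_csv(line):
    """Convert one escaped TSV line into one CSV record."""
    fields = [unescape_field(f) for f in line.rstrip("\n").split("\t")]
    buf = io.StringIO()
    writer = csv.writer(buf, quoting=csv.QUOTE_MINIMAL, lineterminator="\n")
    # None is written as an empty field here; that mapping is an assumption.
    writer.writerow(["" if f is None else f for f in fields])
    return buf.getvalue()


record = tsv_line_to_csv('1\tsay "hi"\tline1\\nline2\t\\N\n')
print(record)
```

In a Beam pipeline a function like this would typically be applied per element, which keeps it easy to unit-test outside the pipeline.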

Slide 49

Slide 49 text

Dataflow Onboard by @rilmayer_jp

Slide 50

Slide 50 text

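Test Code for Transform
- Use the data patterns that caused errors during debugging as test cases.
- Use data from the tables that caused errors during debugging as test data.
- Replace the beam module with MagicMock and test only the logic part in code.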
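One way to test transform logic with the beam module replaced by MagicMock is sketched below. The transform `StripNulBytes` and the sample value are hypothetical, chosen only to show the pattern; the slide does not show the actual test code.

```python
import sys
from unittest.mock import MagicMock

# Stand-in for the real package so the import below works without Apache Beam.
fake_beam = MagicMock()
fake_beam.DoFn = object  # let DoFn subclasses be ordinary Python classes
sys.modules["apache_beam"] = fake_beam

import apache_beam as beam  # resolves to fake_beam


class StripNulBytes(beam.DoFn):
    """Hypothetical transform: drop NUL bytes from a text element."""

    def process(self, element):
        yield element.replace("\x00", "")


def test_strip_nul_bytes():
    # Data pattern of the kind captured from a failed debug run (illustrative).
    broken = "abc\x00def"
    assert list(StripNulBytes().process(broken)) == ["abcdef"]


test_strip_nul_bytes()
```

In a larger test suite, `unittest.mock.patch.dict(sys.modules, ...)` can scope the replacement to a single test instead of changing `sys.modules` globally.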

Slide 51

Slide 51 text

Composer → BQ: full refresh
GCS → BQ Load

Slide 52

Slide 52 text

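Composer → BQ: incremental update
(1) Load the incremental data into a tmp table.
(2) Source table + tmp table → UNION ALL → remove duplicates → overwrite.
(3) Delete the tmp table.
For details, see the following article: "Syncing hundreds of GB of data from MySQL to BigQuery"
https://tech.mercari.com/entry/2018/06/28/100000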
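The union-then-deduplicate step can be illustrated in memory. This is only a sketch of the idea: the column names `id` and `updated_at`, and the rule of keeping the newest row per key, are assumptions. In the pipeline itself this step runs in BigQuery against the source and tmp tables.

```python
def merge_incremental(base_rows, diff_rows, key="id", version="updated_at"):
    """Union both row sets, then keep the newest row for each key."""
    unioned = list(base_rows) + list(diff_rows)        # UNION ALL
    latest = {}
    for row in unioned:                                # remove duplicates
        current = latest.get(row[key])
        if current is None or row[version] >= current[version]:
            latest[row[key]] = row
    # The returned rows are what would overwrite the source table.
    return sorted(latest.values(), key=lambda r: r[key])


base = [
    {"id": 1, "updated_at": 1, "status": "on_sale"},
    {"id": 2, "updated_at": 2, "status": "on_sale"},
]
diff = [
    {"id": 2, "updated_at": 5, "status": "sold"},      # changed row
    {"id": 3, "updated_at": 6, "status": "on_sale"},   # new row
]
merged = merge_incremental(base, diff)
print(merged)
```

Rows that were not touched since the last sync pass through unchanged, so only the incremental data has to be exported from MySQL.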

Slide 53

Slide 53 text

Rebuilt BQ / docs for user (1)

Slide 54

Slide 54 text

Rebuilt BQ / docs for user (2)

Slide 55

Slide 55 text

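Rebuilt BQ / docs for user (3)
Agree with the data users on "better than the status quo" (tables left unattended for half a month).
- Do not over-engineer the quality.
- State the measurement (delay monitoring) and the support explicitly.
- Where something is ambiguous, state explicitly that it is ambiguous.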

Slide 56

Slide 56 text

Canary Release
Provide it to a few teams first → environment-dependent failures appear → establish the detection, firefighting, and response flows.

Slide 57

Slide 57 text

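Staged releases every two weeks
Sprint + Increment: create a rhythm of continuous improvement.
[Roadmap across three versions; the version numbers were lost in extraction]
Ops: provide to a few teams first → provide to the next teams → ……
     Each version comes with a user guide, Q&A, and feedback.
Data: tables for which a full refresh is enough → tables that are painful without incremental updates → ……
      tables whose mysqldump CSV files are split into files of … GB or less → tables whose BQ Load fails unless Dataflow splits the CSV → ……
An improvement step follows each release.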

Slide 58

Slide 58 text

Result
The purchase-data sync that used to time out at 7h now completes successfully in 2.5h!
[Timeline 01:00 to 09:00, after the preceding job: Before ❌, After ✅]

Slide 59

Slide 59 text

Agenda: 1. Introduction 2. Context 3. Measurement, evaluation, and consensus building 4. Rebuild & release 5. Closing

Slide 60

Slide 60 text

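What I kept in mind
The organization has many distinctive, talented players, so rather than pushing through my own style or design philosophy, I:
- changed shape flexibly, like a cloud (Cloud);
- surveyed the whole, like a conductor (Composer);
- moved forward while organizing the flow of information (Dataflow).
Truly "Rebuilding Batch ETL with Cloud Composer & Dataflow".
https://www.pexels.com/photo/hd-457881/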

Slide 61

Slide 61 text

Special Thanks (account names in the team Slack)
[BI / PM] @mattsun, @shoei, @hase-ryo, @hikaru, @nakatomo, @natsume, @igachan-san, @tsudar, @anboo, @hiza
[JP Dev] @siroken3, @shoe116, @ichirin2501, @bokko, @catatsuy, @shinpei
[Merpay Dev] @laughingman7743, @syucream, @cocoiti, @kazegusuri, @sfujjiwara
[US Dev/ML] @hatone, @yu
[JP ML / Search] @furusawa, @tairosan

Slide 62

Slide 62 text

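Future challenges of Batch ETL in Mercari JP
Short term: polish the platform so that it actually "gets used". Product management and system development, with BI / SRE / DataPlatform.
Mid term: shift from "destruction and creation" to "measurement and improvement". Service management (ITIL) and data management (DMBOK), with hase-ryo-san.
Long term: break away from "local optimization". Company-wide data strategy (DataOps), with tairosan.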

Slide 63

Slide 63 text

Summary
Sound analysis is built on sound data.
Sound data is built on sound processes and systems.
Let's start with the small step right in front of us and get our data in order!

Slide 64

Slide 64 text

Promotion
Alongside a wealth of data-use case studies, an introduction to how to hack projects, processes, systems, teams, and culture into a healthy state.

Slide 65

Slide 65 text

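For the networking time
To everyone fighting alone in the field, and to every manager who is worried about the current situation:
please come and talk to @yuzutas0.
I will help you sort out how to climb from As-Is to To-Be.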

Slide 66

Slide 66 text

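Note: build the right thing, the right way
For example, Cloud Dataflow scales easily, but it also costs money. Depending on your business scale and how you use it, it may not pay off in terms of ROI.
- Are there not plenty of things to do before building a scalable system?
- Has adopting surface-level technology become a goal in itself?
- Can that data delivery really solve a business problem?
Before jumping into easy system development, please stop and think it over once.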

Slide 67

Slide 67 text

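Purpose of today's talk (reprise)
To present one case study of a "rebuild".
It is only a single case, so compare it with your own technology stack and organizational situation, and take away your own lessons.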

Slide 68

Slide 68 text

This is what I did. What would you do?

Slide 69

Slide 69 text

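Compared with the business, processes, systems, team, and culture you are responsible for, what was the same? What was different?
Why do those similarities and differences arise?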

Slide 70

Slide 70 text

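Is the current state of your own workplace the best it can be?
Or does there seem to be room for improvement? Is there anything, however small, that you can change?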

Slide 71

Slide 71 text

If you were to take one action right now, what could you do?

Slide 72

Slide 72 text


Thank you for your attention.
https://www.pexels.com/photo/architecture-blur-building-colourful-392031/