Rebuilding Batch ETL with Cloud Composer & Dataflow #data_ml_engineering / 20190719

Slides presented at the 2nd "Thinking About Engineering Around Data and ML" meetup (データとML周辺エンジニアリングを考える会#2).
https://data-engineering.connpass.com/event/136756/

yuzutas0
July 19, 2019

Transcript

  1. Rebuilding Batch ETL with Cloud Composer & Dataflow
    2019-07-19
    #data_ml_engineering
    presented by @yuzutas0
    https://www.pexels.com/photo/architecture-blur-building-colourful-392031/

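  2. Notes and disclaimer #data_ml_engineering

    This deck is already published on the web.
    There is no need to take photos or notes; please relax and listen.

    70+ slides.
    I will focus on agenda item 4 and cover the rest at lightning-talk pace.
    The content assumes follow-up Q&A during the networking time and on SNS.

    I think of technology as a weapon.
    Choose it according to your goals and constraints. This talk does not recommend any particular technology.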

  3. Agenda
    1. Introduction
    2. Context
    3. Measurement, evaluation, and consensus building
    4. Rebuild & release
    5. Closing

  4. @yuzutas0

  5. Past talks
    I share know-how and insights on data platforms.
    PyCon JP: Best Talk Award, Excellence Prize
    Developers Summit Summer (デブサミ夏): No. 1 in survey satisfaction

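  6. Purpose of today's talk
    To offer one case study of a "rebuild".
    It is only one case, so compare it with your own technology stack and organizational situation,
    and take away your own lessons.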

  7. Intended audience
    Software engineers who build and operate systems for log collection and ETL,
    and their clients and managers (or those about to become one).

  8. Agenda
    1. Introduction
    2. Context
    3. Measurement, evaluation, and consensus building
    4. Rebuild & release
    5. Closing



  9. Mercari (C2C flea market app)

  10. Business growth (= data growth)
    FY2019.6 3Q financial results presentation
    https://pdf.irpocket.com/C4385/eHSm/vwwn/oECA.pdf



  11. Global and new businesses

  12. Active use of data
    https://speakerdeck.com/hik0107/mercari-bi-team-data-analytics-summit-2018

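  13. Summary of characteristics
    ・The product is growing
    ・Data volume is increasing rapidly
    ・The organization is being set up to grow global and new businesses
    ・Data is actively used for analysis, ML, and more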

  14. The problem on the ground
    "The data in BQ isn't being updated!"
    (Some tables had been stalled for half a month)



  15. Growing pains
    Product ↑    Data ↑    Load ↑    Users ↑
    Good         Good      Bad       Bad
    "The system is failing!"    "It isn't being updated!"



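  16. Historical background
    ETL System: ETL for US, ETL for JP, ETL for UK
    US Team: "We built it!" "We maintain it!" "Let's make the US app better!"
      Supports JP out of goodwill alongside their main job (honestly, there are limits)
    JP BI: "We want it for JP too! Let us piggyback!" (request) "Let's focus on analysis work!"
    JP SRE: "The JP app's production environment is the top priority!"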



  17. Scope of this project ①
    Diagram elements: Product, Users, DB / logs, BigQuery, Initiatives / operations
    Stages: Collect, Deliver, Use, Value
    The objective variable to maximize in DataOps



  18. Scope of this project ②
    Diagram labels:
    ・Product, Other
    ・Monolith (APP BE) with its DB
    ・Read Only Replica, with masking of sensitive information
    ・Micro services, each with its own DB (to be migrated in stages)
    ・BigQuery
    ・Ishikari DC, GCP

  19. Today's Issue
    ・Data that has not been updated for half a month
    ・What would you do?

  20. Agenda
    1. Introduction
    2. Context
    3. Measurement, evaluation, and consensus building
    4. Rebuild & release
    5. Closing



  21. Stakeholder interviews
                 Issue                      Resolution
    Analyst      "It's stopped!"            "We need it now! Give us a stopgap!"
    Developer    "Is it really that bad?"   "It would be better to rebuild!"



  22. Measure
                 Issue                      Resolution
    Analyst      "It's stopped!"            "We need it now! Give us a stopgap!"
    Developer    "Is it really that bad?"   "It would be better to rebuild!"

  23. Built a BQ update-delay bot
    Everyone involved: "This is worse than we expected"

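  24. BQ update checker / implementation
    Runs every hour:
    1. Fetch the settings with pandas.read_csv(): check time, target tables, notification channel
    2. Fetch table names and last-modified timestamps with pandas.read_gbq()
    3. Decide whether each table has been updated
    4. Notify the specified channel with slackweb.Slack().notify()
    Separately, SELECT from dataset.__TABLES__ and save the metadata as snapshots,
    accumulating it so that it can be analyzed as panel data.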
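
The staleness check in step 3 of the BQ update checker can be sketched roughly as follows. This is a minimal illustration, not the bot's actual code: it assumes the rows are `(table_id, last_modified_time)` pairs as exposed by `dataset.__TABLES__` (milliseconds since the epoch), and the function names `find_stale_tables` and `format_alert` and the threshold are hypothetical. Fetching with `pandas.read_gbq()` and posting with `slackweb` are left out so the sketch runs on the standard library alone.

```python
from datetime import datetime, timedelta, timezone


def find_stale_tables(rows, now, max_age):
    """Return (table_id, last_modified) pairs whose last update is older than max_age.

    rows: iterable of (table_id, last_modified_ms), e.g. selected from dataset.__TABLES__.
    """
    stale = []
    for table_id, last_modified_ms in rows:
        modified = datetime.fromtimestamp(last_modified_ms / 1000, tz=timezone.utc)
        if now - modified > max_age:
            stale.append((table_id, modified))
    # Oldest first, so the worst delays appear at the top of the alert.
    return sorted(stale, key=lambda item: item[1])


def format_alert(stale):
    """Build the message text that would be passed on to the Slack notifier."""
    return "\n".join(
        "{} last updated {:%Y-%m-%d %H:%M} UTC".format(table_id, modified)
        for table_id, modified in stale
    )
```

Keeping the decision logic free of BigQuery and Slack calls makes it easy to test, and the same rows can be stored as hourly snapshots for later analysis.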



  25. BQ update checker / design
    http://yuzutas0.hatenablog.com/entry/2017/05/23/073000
    BigQuery



  26. BQ update checker / docs for user (1)



  27. BQ update checker / docs for user (2)



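  28. Visualization → consensus building
                 Issue                      Resolution
    Analyst      "It's stopped!"            "We need it now! Give us a stopgap!"
    Developer    "Is it really that bad?"   "It would be better to rebuild!"
    Raise the priority and act!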



  29. Keep it alive
                 Issue                      Resolution
    Analyst      "It's stopped!"            "We need it now! Give us a stopgap!"
    Developer    "Is it really that bad?"   "It would be better to rebuild!"


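  30. Stopgap
    Together with the analysts: "let's just retry for now".
    Even the syncs for tables that had not been delayed were dragged down, and everything failed (a secondary disaster).
    It became visible that the situation was not as simple as users had assumed.
    Photo: Pexels, "brown and white tabby kitten"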

  31. Groping in the dark
    Had temporary US permissions issued and started investigating.
    The admin screen was too heavy to open.
    Had to start by being taught the tricks...
    http://{ip_or_domain}/admin/airflow/tree?dag_id={id}&num_runs=1
    Photo: Pexels, "grey concrete road"

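  32. Investigation
    ・Timeouts occur frequently as the data grows
    ・All jobs run serially, so a failure drags down the subsequent processing
      (designed this way to limit the JDBC → DB access load)
    ・The US team uses the same mechanism but has refined how the jobs are split
    ・JP had not gotten that far
      (piggybacking plus goodwill support on the side has its limits)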

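  33. Tune it?
    The orthodox approach would be the same kind of tuning the US team did.
    (Don't run away to an easy rebuild!)
    However:
    ・We would have to start by catching up on how the system works
    ・We would have to work while minding the impact on the existing system, which is already erroring under load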

  34. Rebuild it?
    A proposal from the Merpay DataplatformTeam:
    "We built something like this. Want to roll it out on your side too?"
    Large-scale batch processing at Merpay (メルペイにおける大規模バッチ処理) - Mercari Engineering Blog
    https://tech.mercari.com/entry/2019/06/05/120000

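  35. Comparison
    US ETL System (Airflow on GKE, Spark; early)
      System:  ◯ Should be able to meet the functional requirements if tuned
      Support: ◯ Geographic distance and time difference; asynchronous consultation is possible
    Merpay Batch Pipeline (Cloud Composer, Dataflow; lately)
      System:  ◎ Meets the functional requirements; fully managed, so it should be relatively easy to use
      Support: ◎ The office is physically close; easy to consult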

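  36. The decision point
    The real issue to solve was clearly not "ETL system design" but
    "the long-term absence of a dedicated JP maintainer" and "the organizational dynamics that led to it".
    "Data delivery has stopped" is only the tip of the iceberg.
    To spend as little mindshare as possible,
    "how to handle this with the least technical effort" became the axis of the decision.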


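  37. Rebuild!
    https://www.pexels.com/photo/architecture-blur-building-colourful-392031/
    Judged that rebuilding and switching users over would finish sooner.
    (We did run away to an easy rebuild!)
    The punchline, by the way:
    ① Merpay's pipeline assumes an all-GCP setup, so it could not be rolled out as-is.
    ② The US team, for their part, responded to the delay visualization and fixed the JP jobs.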



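  38. Visualization → consensus building
                 Issue                      Resolution
    Analyst      "It's stopped!"            "We need it now! Give us a stopgap!"
    Developer    "Is it really that bad?"   "It would be better to rebuild!"
    Focus: spend no time or effort on stopgaps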

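  39. Only possible thanks to the analysts' ingenuity
    ・Make do with approximate values from alternative tables
    ・Use scripts to reference data that is not in BQ
    ・Actively share insights and tools with each other
    There is something formidable about getting the work done without over-relying on an unstable system!
    (There is more than one means and route to reach the destination)
    https://www.pexels.com/photo/group-hand-fist-bump-1068523/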



  40. Consensus building: summary
                 Issue                      Resolution
    Analyst      "It's stopped!"            "We need it now! Give us a stopgap!"
    Developer    "Is it really that bad?"   "It would be better to rebuild!"

  41. Agenda
    1. Introduction
    2. Context
    3. Measurement, evaluation, and consensus building
    4. Rebuild & release
    5. Closing



  42. System architecture
    Replica DB



  43. System architecture
    Replica DB
    This part was handled nicely by @siroken3



  44. System architecture
    Replica DB
    This is the part I will talk about



  45. Cloud Composer: DAG Runs
    ① Validation
    ② Run Dataflow
    ③ Fetch the files from GCS
    ④ BQ Load (differential or full)



  46. Composer → Dataflow
    Specify a Template (strictly speaking, one deployed on GCS)
    and send an execution request to Cloud Dataflow
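
As a rough illustration of what "specify a Template on GCS and send an execution request" amounts to, the sketch below builds the URL and JSON body for the Dataflow REST method `projects.templates.launch`. It is not the Composer operator used in the talk; the function name, project ID, bucket path, and parameter names are hypothetical placeholders, and no request is actually sent.

```python
import json
from urllib.parse import quote


def build_template_launch_request(project_id, template_gcs_path, job_name, parameters):
    """Return (url, body) for launching a Dataflow job from a template stored on GCS."""
    url = (
        "https://dataflow.googleapis.com/v1b3/projects/{project}/templates:launch"
        "?gcsPath={gcs_path}"
    ).format(
        project=quote(project_id, safe=""),
        gcs_path=quote(template_gcs_path, safe=""),
    )
    # jobName and parameters are fields of the LaunchTemplateParameters request body.
    body = json.dumps({"jobName": job_name, "parameters": dict(parameters)}, sort_keys=True)
    return url, body


# Hypothetical values, for illustration only.
url, body = build_template_launch_request(
    "example-project",
    "gs://example-bucket/templates/etl_transform",
    "etl-transform-users",
    {"input": "gs://example-bucket/dump/users/*", "output": "gs://example-bucket/loadable/users"},
)
```

In a Composer DAG the same information (template path, job name, parameters) is what a Dataflow template task needs to be given.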



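  47. Cloud Dataflow: ETL
    ① Read the dump files from GCS
    ② Modify the data with heavily customized transform logic
    ③ Write BQ-loadable files to GCS
    The transform logic was built up by squashing errors one by one during verification runs.
    ※ Because of later enhancements, this differs from the current state.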



  48. Why Dataflow?
    ・mysqldump's TSV format cannot be loaded into BigQuery as-is → needs reshaping
      ・double-quotation-marks escaped by double-quotation-marks in double-quotation-marks
      ・new-line escaped by double backslashes
    ・Because the data volume is large, processing was moved to highly scalable Dataflow
      for DB load and performance reasons
    ・Dataflow's responsibility is deliberately limited to being a converter, so instead of
      loading directly from Dataflow into BigQuery, the converted files are placed on GCS
    ・The runtime is Python 3.5 (supported at Apache Beam 2.11.0 / Mar 5, 2019)
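
The reshaping step can be sketched as below. This is a guess at the kind of conversion involved, not the talk's transform: it assumes the default `mysqldump --tab` escaping (tab-separated fields, backslash as the escape character, `\N` for NULL) and produces CSV in which quotes are doubled inside quoted fields, a form BigQuery can load. All function names are hypothetical.

```python
import csv
import io

# Letter escapes accepted after a backslash; any other escaped character stands for itself.
_ESCAPES = {"n": "\n", "t": "\t", "r": "\r", "0": "\0"}


def split_unescaped_tabs(record):
    """Split one record on tabs that are not preceded by a backslash."""
    raw_fields, buf, i = [], [], 0
    while i < len(record):
        ch = record[i]
        if ch == "\\" and i + 1 < len(record):
            buf.append(record[i:i + 2])  # keep the escape pair intact for now
            i += 2
        elif ch == "\t":
            raw_fields.append("".join(buf))
            buf = []
            i += 1
        else:
            buf.append(ch)
            i += 1
    raw_fields.append("".join(buf))
    return raw_fields


def unescape_field(raw):
    """Resolve backslash escapes; the NULL marker becomes None."""
    if raw == "\\N":
        return None
    out, i = [], 0
    while i < len(raw):
        if raw[i] == "\\" and i + 1 < len(raw):
            out.append(_ESCAPES.get(raw[i + 1], raw[i + 1]))
            i += 2
        else:
            out.append(raw[i])
            i += 1
    return "".join(out)


def to_loadable_csv(record):
    """Convert one mysqldump-style record (without its line terminator) to a CSV line."""
    fields = [unescape_field(raw) for raw in split_unescaped_tabs(record)]
    out = io.StringIO()
    writer = csv.writer(out, quoting=csv.QUOTE_MINIMAL, lineterminator="\n")
    writer.writerow(["" if field is None else field for field in fields])
    return out.getvalue()
```

Inside a Beam pipeline, a function like `to_loadable_csv` would be applied per record; keeping it free of Beam imports also makes it straightforward to unit test.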



  49. Dataflow Onboard by @rilmayer_jp



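  50. Test Code for Transform
    ・Data patterns that raised errors during debugging are used as test cases
    ・Data from tables that raised errors during debugging is used as test data
    ・The beam module is replaced with MagicMock so that only the logic part is tested in code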
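
A minimal sketch of that testing approach follows, under stated assumptions: the transform `clean_value` and its regression cases are invented for illustration, and the Beam module is swapped for a `MagicMock` through `sys.modules`, which is one common way to do it and not necessarily how the talk's tests were set up.

```python
import sys
from unittest.mock import MagicMock

# Replace the Beam module with a mock so the pipeline code imports without a Beam install.
sys.modules["apache_beam"] = MagicMock()
import apache_beam as beam  # noqa: E402  (resolves to the MagicMock registered above)


def clean_value(value):
    """Hypothetical transform logic: map the NULL marker to '' and drop NUL bytes."""
    if value == "\\N":
        return ""
    return value.replace("\0", "")


def build_pipeline(pcollection):
    """Pipeline wiring; with the mocked module this only records the calls."""
    return pcollection | beam.Map(clean_value)


# Regression cases as (input, expected), modeled on data that broke during debugging.
REGRESSION_CASES = [
    ("\\N", ""),
    ("a\0b", "ab"),
    ("plain", "plain"),
]


def failing_cases():
    """Return the cases whose actual output differs from the expected output."""
    return [(raw, expected) for raw, expected in REGRESSION_CASES if clean_value(raw) != expected]
```

Each time a new data pattern breaks the job, adding it to `REGRESSION_CASES` keeps the fix from regressing.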



  51. Composer → BQ: full refresh
    GCS → BQ Load



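  52. Composer → BQ: differential update
    1. Load the differential data into a tmp table
    2. Original table + tmp table → UNION ALL → deduplicate → overwrite
    3. Delete the tmp table
    For details, see the following article:
    Syncing hundreds of GB of data from MySQL to BigQuery (数百GBのデータをMySQLからBigQueryへ同期する)
    https://tech.mercari.com/entry/2018/06/28/100000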
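
One way to express the "UNION ALL → deduplicate → overwrite" step is a query that keeps only the newest row per key. The sketch below only builds such a query string; the key column `id`, the timestamp column `updated_at`, and the table names are assumptions for illustration, and the linked article remains the reference for the method actually used.

```python
def build_merge_query(target_table, tmp_table, key_column="id", updated_column="updated_at"):
    """Build a BigQuery standard SQL query that merges a tmp table of changed rows into the target.

    The result is meant to be written back over the target table (for example with a
    truncating write disposition), after which the tmp table can be deleted.
    """
    return (
        "SELECT * EXCEPT(row_num)\n"
        "FROM (\n"
        "  SELECT *,\n"
        "    ROW_NUMBER() OVER (PARTITION BY {key} ORDER BY {updated} DESC) AS row_num\n"
        "  FROM (\n"
        "    SELECT * FROM `{target}`\n"
        "    UNION ALL\n"
        "    SELECT * FROM `{tmp}`\n"
        "  )\n"
        ")\n"
        "WHERE row_num = 1"
    ).format(key=key_column, updated=updated_column, target=target_table, tmp=tmp_table)


# Hypothetical table names, for illustration only.
query = build_merge_query("example_dataset.users", "example_dataset.users_tmp")
```

Ordering by the update timestamp means that when the same key appears in both tables, the row from the differential load wins.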



  53. Rebuilt BQ / docs for user (1)



  54. Rebuilt BQ / docs for user (2)



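  55. Rebuilt BQ / docs for user (3)
    Agreed with data users on "better than the current state" (which had been left unattended for half a month)
    ・Do not over-engineer quality
    ・State measurement (delay monitoring) and support explicitly
    ・State explicitly that ambiguous things are ambiguous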

  56. Canary Release
    Provide to some teams → environment-dependent incidents → set up flows for detection, firefighting, and response

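  57. Staged releases every other week
    Sprint + Increment: creating a rhythm of continuous improvement
    Ops:  provide to some teams first → provide to the next teams → ...
          each step comes with a usage guide, QA, and feedback
    Data: tables that can get by with a full refresh → tables that are painful without differential updates → ...
          tables whose mysqldump CSV files are split to within the GB limit →
          tables whose BQ Load fails unless Dataflow splits the CSV → ...
    Improvements are made at every step.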

  58. Result
    The purchase-data sync that used to time out at 7h now finishes successfully in 2.5h!
    Timeline chart, 01:00 to 09:00: Before vs. After (including the preceding processing)

  59. Agenda
    1. Introduction
    2. Context
    3. Measurement, evaluation, and consensus building
    4. Rebuild & release
    5. Closing

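  60. What I kept in mind
    This is an organization with many distinctive, talented players, so
    rather than pushing through my own style and design philosophy,
    I changed shape flexibly like a cloud (Cloud),
    looked over the whole like a conductor (Composer),
    and moved forward while organizing the flow of information (Dataflow).
    Truly "Rebuilding Batch ETL with Cloud Composer & Dataflow".
    https://www.pexels.com/photo/hd-457881/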

  61. Special Thanks
    (account-name in team Slack)
    [BI / PM] @mattsun, @shoei, @hase-ryo, @hikaru, @nakatomo,
              @natsume, @igachan-san, @tsudar, @anboo, @hiza
    [JP Dev] @siroken3, @shoe116, @ichirin2501, @bokko, @catatsuy, @shinpei
    [Merpay Dev] @laughingman7743, @syucream, @cocoiti, @kazegusuri, @sfujjiwara
    [US Dev/ML] @hatone, @yu
    [JP ML / Search] @furusawa, @tairosan



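  62. Future challenges of Batch ETL in Mercari JP
    Short term: polishing the platform so that it actually "gets used"
      Product management, system development
      with BI, SRE, DataPlatform
    Mid term: shifting from "destruction and creation" to "measurement and improvement"
      Service management (ITIL), data management (DMBOK)
      with hase-ryo-san
    Long term: breaking away from "local optimization"
      Company-wide data strategy (DataOps)
      with tairosan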

  63. Summary
    Sound analysis stands on sound data.
    Sound data stands on sound processes and systems.
    Start with the small step right in front of you,
    and keep putting your data in order!

  64. Promotion
    Along with a wealth of data-utilization case studies,
    an introduction to how to hack projects, processes, systems, teams, and culture
    into a healthy state

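  65. For the networking time
    To those fighting alone on the front line,
    and to managers who are uneasy about the current state:
    please come and talk to @yuzutas0.
    I will help you sort out how to climb from AsIs → ToBe.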

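  66. Caution: build the right thing, the right way
    For example, Cloud Dataflow scales easily, but it also costs money.
    Depending on your business scale and usage, it may not pay off in ROI terms.
    ・Isn't there a mountain of things to do before building a scalable system?
    ・Has adopting trendy technology become the goal in itself?
    ・Will that data delivery really solve a management problem?
    Before jumping into easy system development, please stop and think once.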

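  67. Purpose of today's talk (recap)
    To offer one case study of a "rebuild".
    It is only one case, so compare it with your own technology stack and organizational situation,
    and take away your own lessons.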

  68. This is what I did.
    What would you do?

  69. Compared with the business, processes, systems, team, and culture
    that you are responsible for,
    what was the same? What was different?
    Why do those similarities and differences arise?

  70. Is the current state of your own workplace the best it can be?
    Or does there seem to be room for improvement?
    Is there anything, however small, that you can change?

  71. If you were to take one action right now,
    what could you do?


  72. https://www.pexels.com/photo/architecture-blur-building-colourful-392031/
    Thank you for listening