
ETL Processing with Digdag (DigdagでETL処理をする)

データとML周辺エンジニアリングを考える会 #2 (Meetup on engineering around data and ML, #2)

https://data-engineering.connpass.com/event/136756/

#data_ml_engineering

tosametal

July 19, 2019

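  1. Log infrastructure architecture
    Components: Imp Server, Click Server, RTB Server, Kafka, Hadoop (data warehouse), Digdag, Hadoop (analytics platform)
    Diagram notes: at-least-once delivery; deduplication by unique ID; managed per session; idempotent processing; Kafka specified as secondary; structured data in JSON format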
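The slide only lists the guarantees, so here is a minimal Python sketch (not MicroAd's implementation) of how at-least-once delivery combined with deduplication by a unique ID and a per-session partition overwrite can make a load idempotent. The names `Event`, `event_id`, `session_time`, `dedupe`, and `load_partition` are hypothetical, and an in-memory dict stands in for the Hadoop warehouse.

```python
from dataclasses import dataclass


@dataclass
class Event:
    event_id: str      # hypothetical unique ID attached to each log record
    session_time: str  # hypothetical partition key, e.g. the Digdag session date
    payload: str


def dedupe(events):
    """Drop redelivered records: keep the first record seen for each event_id."""
    seen = set()
    unique = []
    for e in events:
        if e.event_id not in seen:
            seen.add(e.event_id)
            unique.append(e)
    return unique


def load_partition(warehouse, session_time, events):
    """Overwrite the whole partition for session_time so that reruns are idempotent."""
    rows = [e for e in dedupe(events) if e.session_time == session_time]
    warehouse[session_time] = rows  # replace the partition, never append to it
    return warehouse


events = [
    Event("a1", "2019-07-19", "imp"),
    Event("a2", "2019-07-19", "click"),
    Event("a1", "2019-07-19", "imp"),  # duplicate caused by at-least-once delivery
]
wh = {}
load_partition(wh, "2019-07-19", events)
load_partition(wh, "2019-07-19", events)  # rerunning the same session changes nothing
print(len(wh["2019-07-19"]))  # 2
```

Overwriting the partition keyed by the session, rather than appending, is what lets a failed or repeated session be retried safely; deduplication alone would not protect against a rerun appending the same rows twice.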
  2. Splitting projects by function
    What is a project? "In Digdag, workflows are packaged together with other files used in the workflows. The files can be anything such as SQL scripts, Python/Ruby/Shell scripts, configuration files, etc. This set of the workflow definitions is called project." (Quoted from the official documentation, http://docs.digdag.io/)
    At MicroAd, about 60 projects are currently running.
  3. Dependencies within a project
    schedule:
      daily>: 12:00:00

    +task1:
      _parallel: true
      +subtask1:
        call>: subtask1.dig
      +subtask2:
        call>: subtask2.dig

    +task2:
      echo>: task finished successfully

    • The call> operator lets you split a workflow across multiple dig files.
    • The require> operator makes it possible to express somewhat more complex DAGs.
    (Diagram: subtask1, subtask2, task2)
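As a rough illustration in plain Python (not Digdag itself), the dependency structure of the workflow above can be modeled with the standard library's `graphlib.TopologicalSorter` (Python 3.9+). The `deps` dict and the `execution_waves` helper are hypothetical; they only show which tasks are allowed to run together, not how Digdag actually schedules them.

```python
from graphlib import TopologicalSorter

# Each task maps to the set of tasks it must wait for: the two subtasks of
# +task1 have no dependencies (_parallel: true), and +task2 waits for both.
deps = {
    "subtask1": set(),
    "subtask2": set(),
    "task2": {"subtask1", "subtask2"},
}


def execution_waves(graph):
    """Group tasks into waves; all tasks within one wave could run in parallel."""
    ts = TopologicalSorter(graph)
    ts.prepare()
    waves = []
    while ts.is_active():
        ready = sorted(ts.get_ready())  # tasks whose predecessors have all finished
        waves.append(ready)
        ts.done(*ready)                 # mark the whole wave as completed
    return waves


print(execution_waves(deps))
# [['subtask1', 'subtask2'], ['task2']]
```

Tasks in the same wave have no dependency on each other, which is what `_parallel: true` permits for the subtasks of `+task1`. A dependency on another workflow, such as one declared with `require>`, could loosely be modeled as an extra predecessor entry in the dict.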