Introducing the Workflow Engine Digdag

amesho
March 07, 2017

Transcript

  1. Introducing the Workflow Engine Digdag

  2. About me
    • @amesho
    • I work on the development of Quant
    • Areas of responsibility
      • Backend
      • Tuning of data processing, etc.

  3. Agenda
    • Workflow Engine
    • Digdag
    • Defining workflows
    • Scheduling
    • Operators
    • Problems it solves

  4. Workflow Engine
    • http://regional.rubykaigi.org/tokyo11/interview/frsyuki/
    • To borrow Furuhashi-san's words:
      a workflow engine is a tool that runs multiple tasks that depend on one another
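
As a rough illustration of "running multiple tasks that depend on one another", the Ruby sketch below orders a small task graph so that every task comes after its prerequisites. The task names and the `execution_order` helper are hypothetical and do not come from the talk. The sketch relies on Ruby's standard `tsort` library and makes no claim about how Digdag itself schedules tasks.

```ruby
require 'tsort'

# Returns task names ordered so that each task appears after all of the
# tasks it depends on. `dependencies` maps a task name to its prerequisites.
def execution_order(dependencies)
  each_node = ->(&block) { dependencies.each_key(&block) }
  each_child = ->(task, &block) { dependencies.fetch(task).each(&block) }
  TSort.tsort(each_node, each_child)
end

# Hypothetical example graph (names are illustrative only).
def sample_dependencies
  {
    'extract' => [],
    'transform' => ['extract'],
    'load' => ['transform'],
    'report' => ['load']
  }
end

puts execution_order(sample_dependencies).join(' -> ')
```

If the graph contains a cycle, `TSort.tsort` raises `TSort::Cyclic`, so impossible dependency chains are detected before anything runs.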

  5. Workflow Engine
    Who is Furuhashi-san?
    • Founder of Treasure Data, Inc.
    • MessagePack
    • Fluentd
    • Embulk

  6. Workflow Engine
    • Counting both OSS and commercial products, quite a few are available
    • Well-known ones
      • Azkaban
      • Luigi
      • Airflow

  7. Workflow Engine
    Azkaban
    • https://azkaban.github.io/
    • https://github.com/azkaban/azkaban

  8. Workflow Engine
    Luigi
    • https://github.com/spotify/luigi

  9. Workflow Engine
    Airflow
    • https://github.com/apache/incubator-airflow

  10. Digdag
    Overview
    • A workflow engine made by TD
    • Flows are written in YAML
    • The YAML itself is extended, so scripts can be written inside it
    • Strong integration with TD and Amazon services
    • Tasks can be grouped

  11. Digdag
    What is TD?
    • TD refers to Treasure Data, Inc.
    • Furuhashi-san, the author of Digdag, also works there
    • For details, see https://www.treasuredata.com/jp/

  12. Digdag
    Key point
    • The icon is cute

  13. How to define a workflow
    • Create a .dig file
    • Define task names with "+"
    • Run actions with ">" operators
    • Embed variables
    • Parallel execution
    • Error notification

  14. Defining workflows
    Creating a .dig file
    • Definitions are written in a file with the .dig extension
    vim hello_digdag.dig

  15. Defining workflows
    Defining task names with "+"
    timezone: UTC

    +task_name:

  16. Defining workflows
    Running actions with ">" operators
    • Running shell scripts
    • Calling Python or Ruby methods
    • Sending email
    • And so on

  17. Defining workflows
    Running actions with ">" operators
    timezone: UTC

    +task_name:
      sh>: /bin/touch /tmp/hello_digdag

  18. Defining workflows
    Embedding variables
    • Variables can be handled with JavaScript inside the ${...} syntax
    • Moment.js is bundled for time calculations, so moment() can be used for date arithmetic

  19. Defining workflows
    Embedding variables
    timezone: UTC

    +task_name:
      sh>: /bin/touch /tmp/hello_digdag

    +task_name2:
      echo>: ${moment(session_time).utc().format("YYYY-MM-DD HH:mm:ss")}
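
For readers more familiar with Ruby than with Moment.js, the sketch below reproduces the same output format, `YYYY-MM-DD HH:mm:ss` in UTC, using `Time#strftime`. The `format_session_time` helper and the sample time are assumptions made for illustration. Digdag evaluates the `${...}` expression itself, so nothing like this is needed in a real workflow.

```ruby
# Formats a time the same way as the Moment.js pattern on the slide,
# "YYYY-MM-DD HH:mm:ss", after converting it to UTC.
def format_session_time(session_time)
  session_time.getutc.strftime('%Y-%m-%d %H:%M:%S')
end

# Hypothetical session time: 2017-03-07 09:00:00 in Japan (UTC+9).
example = Time.new(2017, 3, 7, 9, 0, 0, '+09:00')
puts format_session_time(example)
```

`getutc` is used instead of `utc` so that the caller's `Time` object is not modified.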

  20. Defining workflows
    Embedding variables
    • Variables can also be embedded through the language APIs
    • Python and Ruby are available

  21. Defining workflows
    Embedding variables
    • Taking Ruby as an example
    • Suppose the following two files exist
      • workflow.dig
      • tasks/my_workflow.rb

  22. Defining workflows
    Embedding variables
    workflow.dig
    _export:
      rb:
        require: 'tasks/my_workflow'

    +step1:
      rb>: MyWorkflow.step1

    +step2:
      rb>: MyWorkflow.step2

  23. Defining workflows
    Embedding variables
    tasks/my_workflow.rb
    class MyWorkflow
      def step1
        puts "step1"
      end

      def step2
        puts "step2"
      end
    end

  24. Defining workflows
    Embedding variables
    • Defining variables
      • Digdag.env.store
      • Digdag.env.params

  25. Defining workflows
    Embedding variables
    class MyWorkflow
      def step1
        # Store a value so that later tasks can read it
        Digdag.env.store(my_value: 1)
      end

      def step2
        # Read the value stored by step1
        puts "step2: %s" % Digdag.env.params['my_value']
      end
    end
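
The store-then-read pattern can be tried outside Digdag with a stand-in object. In the sketch below, `FakeEnv` and `SketchWorkflow` are hypothetical names invented for this illustration. `FakeEnv` only mimics the two calls shown on the slide (`store` with symbol keys, `params` read with string keys) and is not Digdag's implementation.

```ruby
# Hypothetical stand-in for the parameter store; NOT Digdag's implementation.
class FakeEnv
  attr_reader :params

  def initialize
    @params = {}
  end

  # Mimics the store call from the slide: keys are kept as strings.
  def store(values)
    values.each { |key, value| @params[key.to_s] = value }
  end
end

# Same two steps as on the slide, but talking to the stand-in environment.
class SketchWorkflow
  def initialize(env)
    @env = env
  end

  def step1
    @env.store(my_value: 1)
  end

  def step2
    format('step2: %s', @env.params['my_value'])
  end
end

# Runs step1 then step2 against one shared environment.
def run_steps
  workflow = SketchWorkflow.new(FakeEnv.new)
  workflow.step1
  workflow.step2
end

puts run_steps
```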

  26. Defining workflows
    Parallel execution
    • Setting _parallel: true runs the tasks in the group in parallel

  27. Defining workflows
    Parallel execution
    timezone: UTC

    +run:
      _parallel: true

      +task_name:
        sh>: /usr/bin/touch /tmp/hello_digdag

      +task_name2:
        sh>: /usr/bin/touch /tmp/hello_digdag2

      +task_name3:
        sh>: /usr/bin/touch /tmp/hello_digdag3

  28. Defining workflows
    Parallel execution
    The tasks run in parallel (the order is not guaranteed)
    2017-03-07 09:23:34 +0900 [INFO] ([email protected]+hoge+run+task_name2): sh>: /usr/bin/touch /tmp/hello_digdag2
    2017-03-07 09:23:34 +0900 [INFO] ([email protected]+hoge+run+task_name): sh>: /usr/bin/touch /tmp/hello_digdag
    2017-03-07 09:23:34 +0900 [INFO] ([email protected]+hoge+run+task_name3): sh>: /usr/bin/touch /tmp/hello_digdag3

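
The "order is not guaranteed" behaviour is easy to reproduce with plain Ruby threads. The sketch below is only an analogy for `_parallel: true`, not a description of Digdag's internals. The `touch_in_parallel` helper is hypothetical, and the files are created in a temporary directory rather than in /tmp directly.

```ruby
require 'fileutils'
require 'tmpdir'

# Touches each file in its own thread and returns the names in the order the
# threads finished, which may differ from the order they were started in.
def touch_in_parallel(dir, names)
  finished = Queue.new
  threads = names.map do |name|
    Thread.new do
      FileUtils.touch(File.join(dir, name))
      finished << name
    end
  end
  threads.each(&:join)
  Array.new(names.size) { finished.pop }
end

Dir.mktmpdir do |dir|
  order = touch_in_parallel(dir, %w[hello_digdag hello_digdag2 hello_digdag3])
  puts order.inspect
end
```

All three files always exist once the call returns. Only the completion order can vary from run to run.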

  29. Defining workflows
    Error notification
    • If the _error: parameter is set,
    • a notification is sent when an error occurs

  30. Defining workflows
    Error notification
    timezone: UTC

    +run:
      _error:
        sh>: /usr/bin/touch /tmp/error

      +task_name:
        # /usr/touch does not exist, so this task fails and triggers _error
        sh>: /usr/touch /tmp/hello_digdag

  31. Defining workflows
    Error notification
    2017-03-07 09:30:34 +0900 [INFO] ([email protected]+error+run+task_name): sh>: /usr/touch /tmp/hello_digdag
    /bin/sh: line 1: /usr/touch: No such file or directory
    2017-03-07 09:30:34 +0900 [ERROR] ([email protected]+error+run+task_name): Task failed with unexpected error: Command failed with code 127
    java.lang.RuntimeException: Command failed with code 127
    at io.digdag.standards.operator.ShOperatorFactory$ShOperator.runTask(ShOperatorFactory.java:143)
    at io.digdag.util.BaseOperator.run(BaseOperator.java:35)
    at io.digdag.core.agent.OperatorManager.callExecutor(OperatorManager.java:314)
    at io.digdag.cli.Run$OperatorManagerWithSkip.callExecutor(Run.java:674)
    at io.digdag.core.agent.OperatorManager.runWithWorkspace(OperatorManager.java:255)
    at io.digdag.core.agent.OperatorManager.lambda$runWithHeartbeat$2(OperatorManager.java:138)
    at io.digdag.core.agent.LocalWorkspaceManager.withExtractedArchive(LocalWorkspaceManager.java:25)
    at io.digdag.core.agent.OperatorManager.runWithHeartbeat(OperatorManager.java:136)
    at io.digdag.core.agent.OperatorManager.run(OperatorManager.java:120)
    at io.digdag.cli.Run$OperatorManagerWithSkip.run(Run.java:656)
    at io.digdag.core.agent.MultiThreadAgent.lambda$run$0(MultiThreadAgent.java:95)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
    2017-03-07 09:30:35 +0900 [INFO] ([email protected]+error+run^error): sh>: /usr/bin/touch /tmp/error
    2017-03-07 09:30:35 +0900 [INFO] ([email protected]+error^failure-alert): type: notify
    error:
    * +error+run+task_name:
    Command failed with code 127 (runtime)

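
The flow in the log above (the task fails, then the `_error` task runs) can be sketched in plain Ruby as "run a command, and call a handler if it fails". The `run_with_error_handler` helper is hypothetical. A child Ruby process that exits with status 3 stands in for the failing task, whereas the log above shows exit code 127 for a missing command.

```ruby
require 'rbconfig'

# Runs `command` (an array of program and arguments). If it cannot be started
# or exits with a non-zero status, calls `on_error` with the exit status and
# returns :failed. Otherwise returns :ok.
def run_with_error_handler(command, on_error)
  succeeded = system(*command, out: File::NULL, err: File::NULL)
  return :ok if succeeded

  on_error.call($?.exitstatus)
  :failed
end

notifications = []
# A child Ruby process that exits with status 3 stands in for the failing task.
failing_command = [RbConfig.ruby, '-e', 'exit 3']
result = run_with_error_handler(failing_command, ->(status) { notifications << status })
puts "#{result}: #{notifications.inspect}"
```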

  32. Scheduling workflows
    • Configuring a schedule
    • Starting the scheduler
    • Checking the scheduler status
    • Alerts on execution time

  33. Scheduling workflows
    Configuring a schedule
    timezone: UTC

    schedule:
      daily>: 07:00:00

    +step1:
      sh>: /usr/bin/touch /tmp/schedule

  34. Scheduling workflows
    Configuring a schedule
    • Cron-style schedules can also be used

  35. Scheduling workflows
    Configuring a schedule
    timezone: UTC

    schedule:
      cron>: 42 4 1 * *

    +step1:
      sh>: /usr/bin/touch /tmp/schedule
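
Assuming `cron>` follows the standard five-field cron layout, `42 4 1 * *` means 04:42 on the 1st day of every month. The hypothetical `cron_fields` helper below labels each field. It does no validation beyond counting the fields and is unrelated to Digdag's own parser.

```ruby
# Names of the five fields in a standard cron expression, in order.
CRON_FIELDS = %w[minute hour day_of_month month day_of_week].freeze

# Splits a five-field cron expression into a hash keyed by field name.
def cron_fields(expression)
  values = expression.split
  raise ArgumentError, 'expected 5 fields' unless values.size == CRON_FIELDS.size

  CRON_FIELDS.zip(values).to_h
end

puts cron_fields('42 4 1 * *').inspect
```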

  36. Scheduling workflows
    Starting the scheduler
    digdag sched --memory

  37. Scheduling workflows
    Checking the scheduler status
    digdag check

  38. Scheduling workflows
    Checking the scheduler status
    Schedules (1 entries):
      mydag:
        daily>: "19:40:00"
        first session time: 2017-03-07 00:00:00 +0900
        first scheduled to run at: 2017-03-07 19:40:00 +0900 (in 9h 4m 52s)

  39. Scheduling workflows
    Alerts on execution time
    • You can set a time limit and send an alert when it is exceeded

  40. Scheduling workflows
    Alerts on execution time
    timezone: UTC

    schedule:
      daily>: 07:00:00

    sla:
      # triggers this task at 02:00
      time: 02:00
      +notice:
        sh>: notice.sh

    +long_running_job:
      sh>: long_running_job.sh

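  41. Problems it solves
    • Cron jobs with complex dependencies can be seen at a glance
      • The familiar problem: when a cron job fails, it is not obvious at a glance what that job depends on,
      • and it is also unclear which tasks depend on that job
    • There is an admin UI, so you can check the logs there to see how far execution has progressed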

  42. Problems it solves
    • Parallel processing becomes easy to set up, which leads to speed improvements

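  43. Problems it solves
    Cron jobs with "lovely" dependencies
    server A
    # Starts at 4:00 and probably finishes within 2 hours
    0 4 * * * /path/to/your_script.rb
    server B
    # Server A's job should have finished by 6:00, so this job builds on its result
    # Be careful: re-running this causes serious trouble
    0 8 * * * /path/to/your_script.rb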

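  44. Problems it solves
    Cron jobs with "lovely" dependencies
    Lovely questions
    • What happens if server A's job starts taking longer to run?
    • What happens if server A's job fails to finish because of a bug?
    • And so on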

  45. Problems it solves
    Rewriting the cron jobs with "lovely" dependencies in digdag
    timezone: "Asia/Tokyo"

    schedule:
      daily>: 04:00:00

    +step1:
      # runs server A's script over ssh; "host" is a placeholder
      sh>: ssh host /path/to/your_script.rb

    +step2:
      sh>: /path/to/your_script2.rb
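
What the rewrite buys over the two cron entries is that step2 starts only after step1 has actually finished successfully, instead of at a fixed clock time. The Ruby sketch below shows that ordering rule in miniature. The `run_in_order` helper and the lambdas standing in for the scripts are hypothetical.

```ruby
# Runs each [name, action] pair in order and stops at the first action that
# returns a falsy value. Returns the names of the steps that completed.
def run_in_order(steps)
  completed = []
  steps.each do |name, action|
    break unless action.call

    completed << name
  end
  completed
end

# Hypothetical stand-ins: step1 succeeds, so step2 is allowed to run.
steps = [
  ['step1', -> { true }],
  ['step2', -> { true }]
]
puts run_in_order(steps).inspect
```

If step1 returned `false`, step2 would never start. With the cron setup on server B, the second job would have started at 8:00 regardless.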

  46. Operators
    • Workflow operators
    • Treasure Data operators
    • Database operators
    • Network operators
    • AWS operators
    • Google Cloud Platform operators

  47. Operators
    • Quite a few operators integrate with external services
      • Treasure Data operators
      • AWS operators
      • Google Cloud Platform operators

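  48. Operators
    Workflow operators
    • call>: calls another workflow
    • require>: runs a workflow that this one depends on
      • Unlike call>, it does not start the other workflow if it is already running or has already completed for the same session
    • loop>: repeats tasks
      • Specify a count, and the tasks under _do are repeated that many times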

  49. Operators
    Network operators
    • mail>: sends an email
    • http>: makes an HTTP request

  50. Operators
    Script operators
    • sh>: shell
    • py>: Python scripts
    • rb>: Ruby scripts
    • embulk>: data transfer with Embulk
      • Currently discontinued

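  51. Summary
    • Writing workflows in YAML lets you see complex tasks at a glance
    • You no longer have to guess at implicit dependencies
    • Parallelization and similar changes are easy to apply, so there will be places where a speed-up can be expected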

  52. Summary
    • The execution status of tasks can be tracked from the admin UI
    • Many operators integrate with external services, so working with cloud services becomes easy

  53. Thank you for listening