Introducing the Workflow Engine Digdag

amesho
March 07, 2017

Transcript

  1. Introducing the Workflow Engine Digdag

  2. Self-introduction
    • @amesho
    • I work on the development of Quant
    • Areas of responsibility
      • Backend
      • Tuning of data processing, etc.
  3. Agenda
    • Workflow Engine
    • Digdag
    • Workflow definition
    • Scheduling
    • Operators
    • Problems solved
  4. Workflow Engine
    • http://regional.rubykaigi.org/tokyo11/interview/frsyuki/
    • To borrow Furuhashi-san's words: a workflow engine is a tool that runs multiple tasks that have dependencies between them

  5. Workflow Engine: who is Furuhashi-san?
    • Founder of Treasure Data, Inc.
    • MessagePack
    • Fluentd
    • Embulk
  6. Workflow Engine
    • Counting both OSS and commercial products, quite a few are available
    • Well-known ones
      • Azkaban
      • Luigi
      • Airflow
  7. Workflow Engine: Azkaban
    • https://azkaban.github.io/
    • https://github.com/azkaban/azkaban

  8. Workflow Engine: Luigi
    • https://github.com/spotify/luigi

  9. Workflow Engine: Airflow
    • https://github.com/apache/incubator-airflow

  10. Digdag: overview
    • A workflow engine made by TD
    • Flows are described in YAML
    • The YAML itself is extended, so scripts can be written inside it
    • Strong integration with TD and Amazon services
    • Tasks can be grouped
  11. Digdag: what is TD?
    • TD refers to Treasure Data, Inc.
    • Furuhashi-san, the author of Digdag, also works there
    • For details, see https://www.treasuredata.com/jp/
  12. Digdag: an important point
    • The icon is cute

  13. How to define a workflow
    • Create a .dig file
    • Define task names with "+"
    • Run actions with ">" operators
    • Embed variables
    • Parallel execution
    • Error notification
  14. Workflow definition: creating a .dig file
    • Definitions are written in a file with the .dig extension

        vim hello_digdag.dig

  15. Workflow definition: defining task names with "+"

        timezone: UTC

        +task_name:

  16. Workflow definition: running actions with ">" operators
    • Running shell scripts
    • Calling Python or Ruby methods
    • Sending email
    • And so on

  17. Workflow definition: running actions with ">" operators

        timezone: UTC

        +task_name:
          sh>: /bin/touch /tmp/hello_digdag

  18. Workflow definition: embedding variables
    • The ${...} syntax lets you work with variables using JavaScript
    • Moment.js is bundled for time calculations, so you can compute dates with moment()

  19. Workflow definition: embedding variables

        timezone: UTC

        +task_name:
          sh>: /bin/touch /tmp/hello_digdag

        +task_name2:
          echo>: ${moment(session_time).utc().format("YYYY-MM-DD HH:mm:ss")}
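
    As a further illustration of the ${...} syntax, the sketch below combines a user-defined variable with Digdag's built-in session variables. It assumes that `_export` is used to define workflow-level variables and that `session_date` is available as a built-in; the variable name `target_dir` and its value are hypothetical.

    ```yaml
    timezone: UTC

    # Workflow-level variables; the name target_dir and its value are hypothetical.
    _export:
      target_dir: /tmp/reports

    +show_session:
      # session_date is assumed to be a built-in variable (YYYY-MM-DD).
      echo>: "session ${session_date} writes to ${target_dir}"

    +previous_day:
      # Moment.js arithmetic: the day before the session time.
      echo>: ${moment(session_time).subtract(1, 'days').format("YYYY-MM-DD")}
    ```

    Because the expressions are evaluated when each task runs, the same file can be re-run for a past session and still print that session's dates.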
  20. Workflow definition: embedding variables
    • Variables can also be set through the language APIs
    • Python and Ruby APIs are available

  21. Workflow definition: embedding variables
    • Take Ruby as an example
    • Suppose we have the following two files
      • workflow.dig
      • tasks/my_workflow.rb

  22. Workflow definition: embedding variables (workflow.dig)

        _export:
          rb:
            require: 'tasks/my_workflow'

        +step1:
          rb>: MyWorkflow.step1

        +step2:
          rb>: MyWorkflow.step2
  23. Workflow definition: embedding variables (tasks/my_workflow.rb)

        class MyWorkflow
          def step1
            puts "step1"
          end

          def step2
            puts "step2"
          end
        end
  24. Workflow definition: embedding variables
    • Defining variables
      • Digdag.env.store
      • Digdag.env.params

  25. Workflow definition: embedding variables

        class MyWorkflow
          def step1
            # Store a value for the tasks that follow
            Digdag.env.store(my_value: 1)
          end

          def step2
            # Read the value stored by step1
            puts "step2: %s" % Digdag.env.params['my_value']
          end
        end
  26. Workflow definition: parallel execution
    • Setting _parallel: true runs the tasks in a group in parallel

  27. Workflow definition: parallel execution

        timezone: UTC

        +run:
          _parallel: true

          +task_name:
            sh>: /usr/bin/touch /tmp/hello_digdag

          +task_name2:
            sh>: /usr/bin/touch /tmp/hello_digdag2

          +task_name3:
            sh>: /usr/bin/touch /tmp/hello_digdag3
  28. Workflow definition: parallel execution
    • The tasks run in parallel (the order is not guaranteed)

        2017-03-07 09:23:34 +0900 [INFO] (0018@+hoge+run+task_name2): sh>: /usr/bin/touch /tmp/hello_digdag2
        2017-03-07 09:23:34 +0900 [INFO] (0017@+hoge+run+task_name): sh>: /usr/bin/touch /tmp/hello_digdag
        2017-03-07 09:23:34 +0900 [INFO] (0019@+hoge+run+task_name3): sh>: /usr/bin/touch /tmp/hello_digdag3
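
    Grouping and parallel execution can be combined. The following is a minimal sketch, assuming that sibling tasks at the top level run in the order they are written and that a group completes only when all of its children have finished; the task names and file paths are hypothetical.

    ```yaml
    timezone: UTC

    +fetch:
      # Children of this group run in parallel.
      _parallel: true

      +part_a:
        sh>: /usr/bin/touch /tmp/part_a

      +part_b:
        sh>: /usr/bin/touch /tmp/part_b

    # Runs only after every task in +fetch has finished.
    +merge:
      sh>: /usr/bin/touch /tmp/merged
    ```

    Only the order inside +fetch is left open; +merge still sees a deterministic starting point.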
  29. Workflow definition: error notification
    • If the _error: parameter is set, a notification is sent when an error occurs

  30. Workflow definition: error notification

        timezone: UTC

        +run:
          _error:
            sh>: /usr/bin/touch /tmp/error

          +task_name:
            # /usr/touch does not exist, so this task fails
            sh>: /usr/touch /tmp/hello_digdag
  31. Workflow definition: error notification

        2017-03-07 09:30:34 +0900 [INFO] (0017@+error+run+task_name): sh>: /usr/touch /tmp/hello_digdag
        /bin/sh: line 1: /usr/touch: No such file or directory
        2017-03-07 09:30:34 +0900 [ERROR] (0017@+error+run+task_name): Task failed with unexpected error: Command failed with code 127
        java.lang.RuntimeException: Command failed with code 127
            at io.digdag.standards.operator.ShOperatorFactory$ShOperator.runTask(ShOperatorFactory.java:143)
            at io.digdag.util.BaseOperator.run(BaseOperator.java:35)
            at io.digdag.core.agent.OperatorManager.callExecutor(OperatorManager.java:314)
            at io.digdag.cli.Run$OperatorManagerWithSkip.callExecutor(Run.java:674)
            at io.digdag.core.agent.OperatorManager.runWithWorkspace(OperatorManager.java:255)
            at io.digdag.core.agent.OperatorManager.lambda$runWithHeartbeat$2(OperatorManager.java:138)
            at io.digdag.core.agent.LocalWorkspaceManager.withExtractedArchive(LocalWorkspaceManager.java:25)
            at io.digdag.core.agent.OperatorManager.runWithHeartbeat(OperatorManager.java:136)
            at io.digdag.core.agent.OperatorManager.run(OperatorManager.java:120)
            at io.digdag.cli.Run$OperatorManagerWithSkip.run(Run.java:656)
            at io.digdag.core.agent.MultiThreadAgent.lambda$run$0(MultiThreadAgent.java:95)
            at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
            at java.util.concurrent.FutureTask.run(FutureTask.java:266)
            at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
            at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
            at java.lang.Thread.run(Thread.java:745)
        2017-03-07 09:30:35 +0900 [INFO] (0017@+error+run^error): sh>: /usr/bin/touch /tmp/error
        2017-03-07 09:30:35 +0900 [INFO] (0017@+error^failure-alert): type: notify
        error:
          * +error+run+task_name:
            Command failed with code 127 (runtime)
  32. Scheduling workflows
    • Setting a schedule
    • Starting the scheduler
    • Checking the scheduler status
    • Alerts on execution time

  33. Scheduling workflows: setting a schedule

        timezone: UTC

        schedule:
          daily>: 07:00:00

        +step1:
          sh>: /usr/bin/touch /tmp/schedule
  34. Scheduling workflows: setting a schedule
    • Cron syntax can also be used

  35. Scheduling workflows: setting a schedule

        timezone: UTC

        schedule:
          cron>: 42 4 1 * *

        +step1:
          sh>: /usr/bin/touch /tmp/schedule
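
    Besides daily>: and cron>:, Digdag's documentation lists several other schedule operators. The sketch below shows them side by side, assuming the operator names and argument formats given there; a workflow takes a single schedule, so the alternatives are commented out.

    ```yaml
    timezone: UTC

    # Keep exactly one schedule operator active.
    schedule:
      hourly>: 30:00                 # every hour at minute 30
      # daily>: 07:00:00             # every day at 07:00
      # weekly>: Sun,09:00:00        # every Sunday at 09:00
      # monthly>: 1,09:00:00         # first day of each month at 09:00
      # minutes_interval>: 30        # every 30 minutes
      # cron>: 42 4 1 * *            # 04:42 on the first day of each month

    +step1:
      sh>: /usr/bin/touch /tmp/schedule
    ```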
  36. Scheduling workflows: starting the scheduler

        digdag sched --memory

  37. Scheduling workflows: checking the scheduler status

        digdag check

  38. Scheduling workflows: checking the scheduler status

        Schedules (1 entries):
          mydag:
            daily>: "19:40:00"
            first session time: 2017-03-07 00:00:00 +0900
            first scheduled to run at: 2017-03-07 19:40:00 +0900 (in 9h 4m 52s)
  39. Scheduling workflows: alerts on execution time
    • You can set a time limit and send an alert when a run exceeds it

  40. Scheduling workflows: alerts on execution time

        timezone: UTC

        schedule:
          daily>: 07:00:00

        sla:
          # triggers this task at 02:00
          time: 02:00
          +notice:
            sh>: notice.sh

        +long_running_job:
          sh>: long_running_job.sh
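
    The limit can also be expressed as an elapsed time rather than a clock time. This is a sketch that assumes the `duration:` option described in Digdag's scheduling documentation; the 30-minute value is arbitrary, and notice.sh is the same placeholder script as on the slide.

    ```yaml
    timezone: UTC

    schedule:
      daily>: 07:00:00

    sla:
      # Assumed behaviour: fires once the session has run longer than 30 minutes.
      duration: 00:30:00
      +notice:
        sh>: notice.sh

    +long_running_job:
      sh>: long_running_job.sh
    ```

    A duration tends to be easier to maintain than a fixed time, because it keeps working when the schedule itself is moved.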
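  41. Problems solved
    • Cron jobs with complex dependencies can be viewed at a glance
    • The common problem: when a cron job fails, you cannot tell at a glance what it depends on,
    • or which tasks in turn depend on it
    • There is an admin console, where logs show how far a run has progressed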
  42. Problems solved
    • Parallel processing becomes easy to apply, which leads to speed improvements

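  43. Problems solved: cron jobs with "lovely" dependencies

        server A
        # Starts at 4:00 and probably finishes in about 2 hours
        0 4 * * * /path/to/your_script.rb

        server B
        # server A's job should have finished by 6:00, so process its results
        # Be careful: re-running this causes serious trouble
        0 8 * * * /path/to/your_script.rb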
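  44. Problems solved: cron jobs with "lovely" dependencies, and the lovely questions they raise
    • What happens if server A's job takes longer than expected?
    • What happens if server A's job never finishes because of a bug?
    • And so on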
  45. Problems solved: rewriting the cron jobs with "lovely" dependencies in Digdag

        timezone: "Asia/Tokyo"

        schedule:
          daily>: 04:00:00

        +step1:
          sh>: ssh host /path/to/your_script.rb

        +step2:
          sh>: /path/to/your_script2.rb
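
    The questions raised on slide 44 can be addressed in the same file. The sketch below assumes that tasks run sequentially in the order written and that a top-level `_error:` task runs when any task fails; notify_failure.sh is a hypothetical script.

    ```yaml
    timezone: "Asia/Tokyo"

    schedule:
      daily>: 04:00:00

    # Runs if any task below fails; notify_failure.sh is a hypothetical script.
    _error:
      sh>: /path/to/notify_failure.sh

    # step2 starts only after step1 has finished successfully,
    # no matter how long step1 takes.
    +step1:
      sh>: ssh host /path/to/your_script.rb

    +step2:
      sh>: /path/to/your_script2.rb
    ```

    The guess that server A is done by 6:00 is replaced by an explicit ordering, so a slow or failed step1 can no longer feed stale data into step2.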
  46. Operators
    • Workflow operators
    • Treasure Data operators
    • Database operators
    • Network operators
    • AWS operators
    • Google Cloud Platform operators
  47. Operators
    • Quite a few operators integrate with external services
      • Treasure Data operators
      • AWS operators
      • Google Cloud Platform operators
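  48. Operators: workflow operators
    • call>: calls another workflow
    • require>: runs a workflow that this one depends on
      • Unlike call>, it does not run the other workflow (when it is already running or done for the same session)
    • loop>: repeats tasks
      • Repeats the tasks under _do the specified number of times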
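
    The three workflow operators could be used together as in the sketch below. It assumes that `call>:` takes the path of a .dig file, that `require>:` takes a workflow name, and that `loop>:` exposes the counter as `${i}`; prepare.dig and upstream are hypothetical names.

    ```yaml
    timezone: UTC

    +prepare:
      # Runs another workflow definition file as a subtask.
      call>: prepare.dig

    +wait_for_upstream:
      # Requires that the workflow named "upstream" has completed for this session.
      require>: upstream

    +repeat:
      # Runs the tasks under _do three times; ${i} counts from 0.
      loop>: 3
      _do:
        +say:
          echo>: "iteration ${i}"
    ```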
  49. Operators: network operators
    • mail>: sends an email
    • http>: makes an HTTP request
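
    A minimal sketch of both network operators follows. It assumes that SMTP settings (host, port, credentials) are configured elsewhere, for example in the server configuration, and that `http>:` issues a GET request by default; the addresses, the URL, and body.txt are placeholders.

    ```yaml
    timezone: UTC

    _export:
      mail:
        from: "workflow@example.com"   # hypothetical sender

    +notify:
      # body.txt is a hypothetical template file in the project directory.
      mail>: body.txt
      subject: "workflow started"
      to: [me@example.com]

    +ping:
      # Assumed to send a GET request; the URL is a placeholder.
      http>: https://api.example.com/health
    ```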

  50. Operators: scripting operators
    • sh>: shell
    • py>: Python script
    • rb>: Ruby script
    • embulk>: data transfer with Embulk
      • Currently discontinued
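  51. Summary
    • Writing workflows in YAML makes complex tasks visible at a glance
    • You no longer have to guess at implicit dependencies
    • Parallelization and the like are easy to apply, so places where a speedup can be expected start to appear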

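  52. Summary
    • The execution status of tasks can be checked from the admin console
    • Many operators integrate with external services, so working with cloud services becomes easy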

  53. Thank you for your attention