digdag-Introduction

Slides from an internal study session, presented after we put Digdag into production.

Masatoshi Shimada

August 19, 2016

Transcript

  1. Introduction to Digdag.

  2. Who am I.
    • Twitter/GitHub account
      • @smdmts
    • Main fields
      • Scala & Java 8 & React.js & Python
      • DDD / Clean Architecture with akka-http
      • Workflow
      • Hive/Presto

  3. Agenda
    • Digdag = Workflow automation system.

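  4. Main requirements for a workflow engine
    • Running tasks on a schedule
    • Running tasks in sequence
    • Automatically passing data between systems
    • Automating data aggregation with batch jobs
    • Sending notifications (email, Slack, etc.) when a batch job completes
    • Idempotency when a task is retried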
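The last requirement, idempotency on retry, means that rerunning a failed batch must produce the same result as one successful run. Here is a minimal Python sketch built around a hypothetical `aggregate_daily` task that writes one file per session date. The output path depends only on the date. The result goes to a temporary file first and is then moved into place with `os.replace`, so a retry overwrites the previous output instead of adding a duplicate.

```python
import os
import tempfile
from typing import List


def aggregate_daily(session_date: str, rows: List[int], out_dir: str) -> str:
    """Hypothetical batch task: write the daily total for `session_date`.

    The output path depends only on session_date, so running the task again
    (for example after a retry) targets the same file instead of a new one.
    """
    out_path = os.path.join(out_dir, f"daily_total_{session_date}.txt")
    # Write to a temporary file in the same directory first, so a crash
    # mid-write never leaves a half-written result at out_path.
    fd, tmp_path = tempfile.mkstemp(dir=out_dir)
    try:
        with os.fdopen(fd, "w") as f:
            f.write(str(sum(rows)))
        # Atomically replace any previous result (same filesystem).
        os.replace(tmp_path, out_path)
    except BaseException:
        os.unlink(tmp_path)
        raise
    return out_path
```

Running the task twice for the same date leaves one output file with the same contents.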

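  5. What is Digdag?
    • A workflow engine that implements DAGs (directed acyclic graphs)
    • DAGs are written in YAML, so workflow definitions can be managed in Git
    (Workflow as Code)
    • Develop in local mode, then run in production in client/server mode
    • Subtasks can be executed with Python, Ruby, Bash, Docker, and more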

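  6. What is Digdag? (Client/Server mode)
    • The queue is implemented on PostgreSQL
    • Each subtask is queued individually, so the execution environment can
    scale out across multiple servers
    • The workflows themselves are persisted in PostgreSQL
    • The client pushes workflows to the server with a command
    • Workflows are versioned (revision history is kept)
    • Jobs can be registered and re-run without restarting the server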
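The slide does not show Digdag's internal queue schema. To illustrate the general idea of a database-backed queue that several servers can pull from, here is a sketch that uses Python's built-in `sqlite3` as a stand-in for PostgreSQL. The `task_queue` table, its columns, and the `enqueue`/`claim` helpers are hypothetical. A worker claims a task with a conditional `UPDATE ... WHERE state = 'queued'`, so two servers cannot take the same task.

```python
import sqlite3
from typing import Optional


def make_queue() -> sqlite3.Connection:
    """Create a hypothetical in-memory task queue table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE task_queue ("
        " id INTEGER PRIMARY KEY,"
        " name TEXT NOT NULL,"
        " state TEXT NOT NULL DEFAULT 'queued',"
        " worker TEXT)"
    )
    return conn


def enqueue(conn: sqlite3.Connection, name: str) -> None:
    conn.execute("INSERT INTO task_queue (name) VALUES (?)", (name,))
    conn.commit()


def claim(conn: sqlite3.Connection, worker: str) -> Optional[str]:
    """Claim the oldest queued task for `worker`; return its name or None."""
    while True:
        row = conn.execute(
            "SELECT id, name FROM task_queue"
            " WHERE state = 'queued' ORDER BY id LIMIT 1"
        ).fetchone()
        if row is None:
            return None  # nothing left to run
        task_id, name = row
        # Conditional update: only succeeds if no other worker claimed it first.
        cur = conn.execute(
            "UPDATE task_queue SET state = 'running', worker = ?"
            " WHERE id = ? AND state = 'queued'",
            (worker, task_id),
        )
        conn.commit()
        if cur.rowcount == 1:
            return name
        # Another worker won the race for this row; try the next queued task.
```

On PostgreSQL, `SELECT ... FOR UPDATE SKIP LOCKED` (available since 9.5) is a common alternative for the claim step.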

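  7. What is a DAG (Directed acyclic graph)?
    • DAG (directed acyclic graph), from Wikipedia:
    In graph theory, a DAG is a directed graph with no cycles.
    A directed graph consists of vertices and directed edges (edges with arrows
    that indicate direction), and each edge connects two vertices. In a directed
    acyclic graph, if you start from any vertex v and follow the edges, you can
    never get back to v.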

  8. What is a DAG (Directed acyclic graph)? (cont.)
    • DAG (directed acyclic graph)
    • Put simply: a graph that has an end point and never loops back to its
    starting point
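The "never loops back" property is what makes a valid execution order possible. Below is a Python sketch of Kahn's algorithm. It returns an order in which every task runs after its predecessors, and it raises an error if the graph contains a cycle. The `edges` format (each task mapped to the tasks that must run after it) is an assumption for illustration.

```python
from collections import deque
from typing import Dict, List


def topological_order(edges: Dict[str, List[str]]) -> List[str]:
    """Return an execution order for a DAG, or raise ValueError on a cycle.

    `edges` maps each task to the tasks that must run after it.
    """
    nodes = set(edges)
    for targets in edges.values():
        nodes.update(targets)
    # Count how many predecessors each task is still waiting for.
    indegree = {n: 0 for n in nodes}
    for targets in edges.values():
        for t in targets:
            indegree[t] += 1
    # Start with tasks that have no predecessors (sorted for determinism).
    ready = deque(sorted(n for n in nodes if indegree[n] == 0))
    order: List[str] = []
    while ready:
        n = ready.popleft()
        order.append(n)
        for t in edges.get(n, []):
            indegree[t] -= 1
            if indegree[t] == 0:
                ready.append(t)
    # Any task never reached is part of (or behind) a cycle.
    if len(order) != len(nodes):
        raise ValueError("graph has a cycle, so it is not a DAG")
    return order
```

For example, a diamond `a -> b, a -> c, b -> d, c -> d` yields `["a", "b", "c", "d"]`, while `a -> b -> a` raises `ValueError`.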

  9. How workflows are expressed in Digdag
    • Operators are defined in YAML
    timezone: UTC
    _export:
      mail:
        ..... # Definition of mail
    +step1_input:
      py>: tasks.load
      _error:
        mail>: body.txt
        subject: input error!
        to: [[email protected]]
    +step2_process:
      sh>: echo process.
    +step2_report:
      sh>: echo report.
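In this definition, the `+` tasks run from top to bottom, and the `_error` block under `+step1_input` sends the mail only when that step fails. Below is a toy Python model of that control flow. It assumes each step is a (name, action, error handler) tuple. It is not how Digdag is implemented; it only makes the behavior easy to see.

```python
from typing import Callable, List, Optional, Tuple

# (task name, action to run, optional handler that plays the role of _error)
Step = Tuple[str, Callable[[], None], Optional[Callable[[], None]]]


def run_sequential(steps: List[Step]) -> List[str]:
    """Toy model: run steps in order; on failure, run that step's handler and stop."""
    log: List[str] = []
    for name, action, on_error in steps:
        try:
            action()
            log.append(f"{name}: ok")
        except Exception as exc:
            log.append(f"{name}: failed ({exc})")
            if on_error is not None:
                on_error()
                log.append(f"{name}: _error handler ran")
            break  # later steps are skipped once a step fails
    return log
```

If `+step1_input` raises, its handler runs (here standing in for the mail) and `+step2_process` never starts.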

  10. Job flow architecture overview (diagram)

  11. Workflow definition for the job flow overview
    timezone: UTC
    +prepare_load_aws_env:
      py>: tasks.load_aws_env
    +step1_produce_tasks:
      # Generate SQL queries for Redshift.
      !include : 'child_tasks/produce_tasks/bootstrap.dig'
    +step2_create_redshift_buffer:
      # Load from internal S3 or TreasureData into a temporary Redshift buffer.
      !include : 'child_tasks/create_redshift_buffer/bootstrap.dig'
    +step3_create_publisher_s3:
      # Export the Redshift buffer to the publisher S3 bucket.
      !include : 'child_tasks/create_publisher_s3/bootstrap.dig'

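  12. Impressions from development and operation
    • Workflows are expressed as code and are easy to debug, so we could
    casually build jobs with proper ordering and error handling, just as in
    ordinary development
    • Digdag seems designed to make adoption, development, and operation easy:
    installation is simple, and jobs can be re-registered and re-run without
    a restart
    • Controlling execution order, parallelism, and error handling is very
    easy, so we decided to migrate all of our batch jobs to Digdag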

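  13. Problems/issues encountered during development
    • We got stuck a few times with the py operator
      • Writing test code requires a mock of the digdag module (import digdag)
      • Placing .dig files in separate directories with !include means you
      have to implement dependency resolution between the Python scripts yourself
      • When passing variables between tasks with digdag.env.store, key names
      must be unique across all tasks (a key with the same name may be
      overwritten)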
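The first and third points can be handled with one approach, which is an assumption and not necessarily what the team did. First, register a fake `digdag` module in `sys.modules` before the task code imports it. Second, prefix stored keys with the task name so two tasks never write the same key. Only `env.params` and `env.store` are modeled, and `FakeEnv`, `load`, and `process` are hypothetical names.

```python
import sys
import types


class FakeEnv:
    """Hypothetical stand-in for digdag.env, modeling only params and store()."""

    def __init__(self) -> None:
        self.params = {}   # parameters visible to later tasks
        self.stored = {}   # everything passed to store(), for assertions

    def store(self, params: dict) -> None:
        # Like the real store(), a later call with the same key overwrites it.
        self.stored.update(params)
        self.params.update(params)


# Register the fake before any task module runs `import digdag`.
fake_digdag = types.ModuleType("digdag")
fake_digdag.env = FakeEnv()
sys.modules["digdag"] = fake_digdag


def load() -> None:
    """Hypothetical task from tasks.py; the key is prefixed with the task name."""
    import digdag
    digdag.env.store({"load_row_count": 10})


def process() -> None:
    """Hypothetical task; a bare "row_count" key here would clobber load's value."""
    import digdag
    digdag.env.store({"process_row_count": 7})
```

Because `import digdag` resolves through `sys.modules`, the task functions can be unit-tested without a Digdag server.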

  14. Problems/issues encountered during development (supplement)
    • We do not use the rb operator

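  15. Problems/issues encountered in operation
    • Some features we want for operations are missing (perhaps still in development?)
      • A status screen (worked around with commands for now)
        • Progress, execution results, errors, etc.
      • Persisting logs to S3 (worked around with S3FS for now)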

  16. Summary
    • Automation is always the right answer!
