Easy workflow deployment to Cloud Composer with trinity

2019.10.25 Fukuoka.go#14+Umeda.go
https://fukuokago.connpass.com/event/146447/

Hiroka Zaitsu

October 25, 2019

Transcript

  1. Easy workflow deployment to Cloud Composer with trinity
    Hiroka Zaitsu / Pepabo R&D Institute, GMO Pepabo, Inc.
    2019.10.25 Fukuoka.go#14+Umeda.go
  2. Hiroka Zaitsu / @zaimy
    Data scientist
    Researcher, Pepabo R&D Institute
  3. Agenda
    1. What is Cloud Composer?
    2. Pain points when deploying to Cloud Composer
    3. An attempted solution with trinity
    4. What's next
  4. 1. What is Cloud Composer?
  5. Overview of Cloud Composer
    • GCP's "fully managed workflow orchestration service"
    • It builds Apache Airflow on GCP
    • Pepabo is migrating its log platform (DWH) from Treasure Data to GCP
    • The workflow service is also being migrated, from Treasure Workflow (managed Digdag) to Cloud Composer
  6. Workflow codebase
    repository
    └ dags
      ├ workflowA
      │ ├ main.py
      │ └ hoge.sql
      └ workflowB
        ├ main.py
        └ piyo.sql
    • Create one subdirectory per workflow under the dags directory
      • Python code for the workflow itself (the DAG)
      • Queries used by the workflow
      • Configuration files, etc.
    Note: this applies when the directory structure is kept aligned with Cloud Storage.
  7. Deploying workflows (adding and updating)
    $ gcloud composer environments storage dags import \
        --environment ENVIRONMENT_NAME \
        --location LOCATION \
        --source LOCAL_FILE_TO_UPLOAD
    [Diagram: codebase, Cloud Storage, Airflow; gcloud composer import]
  8. Deleting workflows, part 1: delete from Cloud Storage
    $ gcloud composer environments storage dags delete \
        --environment ENVIRONMENT_NAME \
        --location LOCATION \
        DAG_NAME.py
    [Diagram: codebase, Cloud Storage, Airflow; gcloud composer delete]
  9. Deleting workflows, part 2: delete from Airflow
    $ gcloud composer environments run \
        --location LOCATION \
        ENVIRONMENT_NAME delete_dag -- DAG_NAME
    [Diagram: codebase, Cloud Storage, Airflow; gcloud composer delete_dag]
  10. 2. Pain points when deploying to Cloud Composer
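  11. Using the gcloud commands as-is for operations is painful
    • Adding and updating workflows
      • import runs per workflow
        • It has to be run individually for each workflow that has changes
      • import overwrites files in Cloud Storage
        • Files deleted from the codebase stay in Cloud Storage unless they are deleted individually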
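  12. Using the gcloud commands as-is for operations is painful
    • Deleting workflows
      • Two commands have to be run: delete and Airflow's delete_dag
      • delete runs per file, delete_dag runs per workflow
        • Each has to be run individually for every changed file or workflow
    • Ongoing development produces changes in dozens of workflows every day
      • We want to detect the changes mechanically and sync them to Cloud Composer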
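  13. How about gsutil rsync?
    • A Cloud Storage command that synchronizes files between buckets and directories
    • A file is judged a sync target if its modification time differs
      • It gets processed even when its content has not changed
    • It depends on Cloud Storage
      • Airflow can also be built outside GCP, so we want to support other storage as well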
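  14. How about Airflow sync?
    • An Airflow feature that syncs with a specific git repository
    • Only a single branch can be specified
      • Looks good for syncing the master code to production
      • In test environments and CI we want to deploy code from a feature branch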
  15. 3. An attempted solution with trinity
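  16. trinity's approach
    • Synchronize three things: the codebase, Cloud Storage, and Airflow
    • For each workflow, compute a hash value from its directory structure and file contents
      • The hash value represents the workflow definition at a given point in time
    • Workflows whose hash computed from the codebase differs from the hash stored in Cloud Storage become targets of the sync operation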
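
The per-workflow hash described on this slide could be sketched as follows. This is a minimal illustration, not trinity's actual code: the names hashFiles and loadWorkflow are hypothetical, and SHA-256 is an assumption, since the slides do not name the hash algorithm.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"io/fs"
	"os"
	"path/filepath"
	"sort"
)

// hashFiles returns a hex-encoded SHA-256 digest over a workflow's files.
// Keys are slash-separated paths relative to the workflow directory, so
// moving or renaming a file changes the digest just like editing it does.
// Hypothetical helper; SHA-256 is an assumption.
func hashFiles(files map[string][]byte) string {
	paths := make([]string, 0, len(files))
	for p := range files {
		paths = append(paths, p)
	}
	sort.Strings(paths) // a fixed order makes the digest deterministic

	h := sha256.New()
	for _, p := range paths {
		// Length prefixes keep path/content boundaries unambiguous.
		fmt.Fprintf(h, "%d:%s%d:", len(p), p, len(files[p]))
		h.Write(files[p])
	}
	return hex.EncodeToString(h.Sum(nil))
}

// loadWorkflow reads every regular file under dir into memory.
// Hypothetical helper for illustration only.
func loadWorkflow(dir string) (map[string][]byte, error) {
	files := make(map[string][]byte)
	err := filepath.WalkDir(dir, func(path string, d fs.DirEntry, err error) error {
		if err != nil {
			return err
		}
		if !d.Type().IsRegular() {
			return nil // skip directories, symlinks, and other special files
		}
		rel, err := filepath.Rel(dir, path)
		if err != nil {
			return err
		}
		data, err := os.ReadFile(path)
		if err != nil {
			return err
		}
		files[filepath.ToSlash(rel)] = data
		return nil
	})
	return files, err
}

func main() {
	if len(os.Args) != 2 {
		fmt.Fprintln(os.Stderr, "usage: hashwf WORKFLOW_DIR")
		return
	}
	files, err := loadWorkflow(os.Args[1])
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println(hashFiles(files))
}
```

Because the digest depends only on relative paths and file contents, a file whose modification time changed but whose content did not yields the same hash, which is the property the gsutil rsync slide found missing.
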
  17. trinity's implementation
    • https://github.com/zaimy/trinity
      • A tool to synchronize workflows between Codebase, Cloud Storage and Airflow metadata.
    • Why Go?
      • Cross-compilation makes it possible to support Mac, Linux, and Windows
      • Each workflow can be processed independently, so I want to parallelize the work
    $ trinity --bucket=BUCKET_NAME \
        --composer-env=COMPOSER_ENV_NAME
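
The wish to process workflows in parallel maps naturally onto goroutines. The sketch below shows one possible shape, with one goroutine per workflow and a sync.WaitGroup to wait for all of them. processAll and errDemo are hypothetical names, and nothing here is taken from trinity's source.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// errDemo is a placeholder error used only by the demo in main.
var errDemo = errors.New("demo failure")

// processAll runs fn once per workflow, each in its own goroutine, and
// returns the errors keyed by workflow name. Hypothetical helper.
func processAll(workflows []string, fn func(name string) error) map[string]error {
	var (
		mu   sync.Mutex // guards errs, which goroutines write concurrently
		wg   sync.WaitGroup
		errs = make(map[string]error)
	)
	for _, name := range workflows {
		wg.Add(1)
		go func(name string) {
			defer wg.Done()
			if err := fn(name); err != nil {
				mu.Lock()
				errs[name] = err
				mu.Unlock()
			}
		}(name)
	}
	wg.Wait() // block until every workflow has been processed
	return errs
}

func main() {
	errs := processAll([]string{"workflowA", "workflowB"}, func(name string) error {
		if name == "workflowB" {
			return errDemo // pretend this workflow failed to sync
		}
		return nil
	})
	for name, err := range errs {
		fmt.Printf("%s: %v\n", name, err)
	}
}
```

With dozens of workflows an unbounded fan-out like this is probably fine, but a real tool might cap concurrency to stay within API rate limits.
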
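  18. Processing flow
    1. Compute a hash value in the codebase and save it for each workflow
    2. List the workflows in the codebase and in Cloud Storage and compare them
      i. If a workflow exists only in the codebase, upload it to Cloud Storage (add)
      ii. If a workflow exists only in Cloud Storage, delete it from Cloud Storage and Airflow
      iii. If a workflow exists in both, compare the codebase and Cloud Storage hash values
        a. If they differ, replace the workflow in Cloud Storage (update)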
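
The comparison in step 2 can be sketched as a pure function over two maps from workflow name to hash value. The Plan type and plan function are hypothetical names introduced for illustration. The actual uploads and deletions against Cloud Storage and Airflow are left out.

```go
package main

import (
	"fmt"
	"sort"
)

// Plan lists the workflows to add, update, and delete. Hypothetical type.
type Plan struct {
	Add, Update, Delete []string
}

// plan compares workflow hashes from the codebase (local) with those
// stored in Cloud Storage (remote) and decides what to do with each one.
func plan(local, remote map[string]string) Plan {
	var p Plan
	for name, localHash := range local {
		remoteHash, ok := remote[name]
		switch {
		case !ok:
			p.Add = append(p.Add, name) // only in the codebase
		case remoteHash != localHash:
			p.Update = append(p.Update, name) // in both, hashes differ
		}
	}
	for name := range remote {
		if _, ok := local[name]; !ok {
			p.Delete = append(p.Delete, name) // only in Cloud Storage
		}
	}
	// Map iteration order is random, so sort for stable output.
	sort.Strings(p.Add)
	sort.Strings(p.Update)
	sort.Strings(p.Delete)
	return p
}

func main() {
	local := map[string]string{"workflowA": "aaa", "workflowB": "bbb", "workflowC": "ccc"}
	remote := map[string]string{"workflowB": "bbb", "workflowC": "old", "workflowD": "ddd"}
	fmt.Printf("%+v\n", plan(local, remote))
}
```

Keeping the decision separate from the side effects would also make a dry-run mode straightforward: print the plan and stop.
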
  19. Sync-style deployment is now easy!
  20. What's next
    • Adding tests and refactoring
      • I want to follow Go's conventions and way of thinking
    • New features
      • Airflow has plugins in addition to dags, so support them too
      • dry-run