Easy workflow deployment to Cloud Composer with trinity

2019.10.25 Fukuoka.go#14+Umeda.go
https://fukuokago.connpass.com/event/146447/

Hiroka Zaitsu

October 25, 2019

Transcript

  1. Hiroka Zaitsu / Pepabo R&D Institute, GMO Pepabo, Inc. 2019.10.25 Fukuoka.go#14+Umeda.go

    Easy workflow deployment to Cloud Composer with trinity
  2. Cloud Composer overview

    • GCP's "fully managed workflow orchestration service"
    • Builds Apache Airflow on GCP
    • Pepabo's log infrastructure (DWH) is being migrated from Treasure Data to GCP
    • The workflow service is also being migrated, from Treasure Workflow (managed Digdag) to Cloud Composer
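  3. Workflow codebase

    repository
    └ dags
      ├ workflowA
      │ ├ main.py
      │ └ hoge.sql
      └ workflowB
        ├ main.py
        └ piyo.sql

    • Create one subdirectory per workflow under the dags directory, holding:
      • the Python code for the workflow itself (the DAG)
      • the queries the workflow uses
      • configuration files, etc.
    Note: this layout applies when the directory structure is kept the same as in Cloud Storage.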
  4. Deploying a workflow (add and update)

    $ gcloud composer environments storage dags import \
        --environment ENVIRONMENT_NAME \
        --location LOCATION \
        --source LOCAL_FILE_TO_UPLOAD

    Diagram labels: Codebase, Cloud Storage, Airflow; gcloud composer import
  5. Deleting a workflow, part 1: delete from Cloud Storage

    $ gcloud composer environments storage dags delete \
        --environment ENVIRONMENT_NAME \
        --location LOCATION \
        DAG_NAME.py

    Diagram labels: Codebase, Cloud Storage, Airflow; gcloud composer delete
  6. Deleting a workflow, part 2: delete from Airflow

    $ gcloud composer environments run \
        --location LOCATION \
        ENVIRONMENT_NAME delete_dag -- DAG_NAME

    Diagram labels: Codebase, Cloud Storage, Airflow; gcloud composer delete_dag
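  7. Using the gcloud commands as-is for day-to-day operation is painful

    • Adding and updating workflows
      • import runs per workflow
        • It has to be run individually for every workflow that has changed
      • import overwrites the files in Cloud Storage
        • Files deleted from the codebase remain in Cloud Storage unless they are deleted individually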
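  8. Using the gcloud commands as-is for day-to-day operation is painful

    • Deleting workflows
      • Two commands have to be run: delete, and Airflow's delete_dag
      • delete runs per file, delete_dag runs per workflow
        • Each has to be run individually for every file or workflow that has changed
    • Development produces changes to dozens of workflows every day
    • We want to detect the changes mechanically and sync them to Cloud Composer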
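  9. What about Airflow sync?

    • An Airflow feature that syncs with a specific git repository
    • Only a single branch can be specified
      • Looks good for syncing the master code to the production environment
      • In test environments and CI, we want to deploy code from feature branches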
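  10. trinity's approach

    • Synchronize three things: the codebase, Cloud Storage, and Airflow
    • For each workflow, compute a hash value from its directory structure and file contents
      • The hash value represents the workflow definition at a given point in time
    • Workflows whose hash computed from the codebase differs from the hash stored in Cloud Storage become the targets of the sync operation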
  11. trinity's implementation

    • https://github.com/zaimy/trinity
      • A tool to synchronize workflows between Codebase, Cloud Storage and Airflow metadata.
    • Why Go?
      • Cross-compilation makes it possible to support Mac, Linux, and Windows
      • Processing can be done per workflow, so we want to parallelize it

    $ trinity --bucket=BUCKET_NAME \
        --composer-env=COMPOSER_ENV_NAME
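  12. Processing flow

    1. Compute the hash values in the codebase and save one per workflow
    2. List the workflows in the codebase and in Cloud Storage, then compare them
      i. If a workflow exists only in the codebase, upload it to Cloud Storage (add)
      ii. If a workflow exists only in Cloud Storage, delete it from Cloud Storage and Airflow
      iii. If a workflow exists in both, compare the hash values from the codebase and Cloud Storage
        a. If they differ, replace the workflow in Cloud Storage (update)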
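The comparison in step 2 can be sketched in Go as a pure function over two maps from workflow name to hash value. This is an illustration of the flow above, not trinity's code: `Plan`, `Step`, and the `Action` constants are hypothetical names.

```go
package main

import (
	"fmt"
	"sort"
)

// Action and Step are hypothetical types for this sketch (not taken from trinity).
type Action string

const (
	Add    Action = "add"
	Update Action = "update"
	Delete Action = "delete"
)

type Step struct {
	Workflow string
	Action   Action
}

// Plan compares the hashes computed from the codebase (local) with the
// hashes stored in Cloud Storage (remote) and returns the sync steps,
// sorted by workflow name. Unchanged workflows produce no step.
func Plan(local, remote map[string]string) []Step {
	var steps []Step
	for name, localHash := range local {
		remoteHash, ok := remote[name]
		switch {
		case !ok:
			// Only in the codebase: upload to Cloud Storage.
			steps = append(steps, Step{name, Add})
		case remoteHash != localHash:
			// In both, but the hashes differ: replace in Cloud Storage.
			steps = append(steps, Step{name, Update})
		}
	}
	for name := range remote {
		if _, ok := local[name]; !ok {
			// Only in Cloud Storage: delete from Cloud Storage and Airflow.
			steps = append(steps, Step{name, Delete})
		}
	}
	sort.Slice(steps, func(i, j int) bool { return steps[i].Workflow < steps[j].Workflow })
	return steps
}

func main() {
	local := map[string]string{"workflowA": "h1", "workflowB": "h2-new", "workflowC": "h3"}
	remote := map[string]string{"workflowA": "h1", "workflowB": "h2-old", "workflowD": "h4"}
	for _, s := range Plan(local, remote) {
		fmt.Printf("%s: %s\n", s.Workflow, s.Action)
	}
	// Prints:
	// workflowB: update
	// workflowC: add
	// workflowD: delete
}
```

Keeping the planning step separate from the calls that touch Cloud Storage and Airflow makes the comparison easy to test without a Cloud Composer environment.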