Slide 1

Slide 1 text

Easily deploying workflows to Cloud Composer with trinity
Hiroka Zaitsu / Pepabo R&D Institute, GMO Pepabo, Inc.
2019.10.25 Fukuoka.go#14+Umeda.go

Slide 2

Slide 2 text

Hiroka Zaitsu / @zaimy
Data scientist
Researcher, Pepabo R&D Institute

Slide 3

Slide 3 text

Contents
1. What is Cloud Composer?
2. Pain points when deploying to Cloud Composer
3. An attempt at a solution with trinity
4. Future work

Slide 4

Slide 4 text

1. What is Cloud Composer?

Slide 5

Slide 5 text

Overview of Cloud Composer
• GCP's "fully managed workflow orchestration service"
• Builds Apache Airflow on GCP
• Pepabo's log platform (DWH) is being migrated from Treasure Data to GCP
• The workflow service is also being migrated, from Treasure Workflow (managed Digdag) to Cloud Composer

Slide 6

Slide 6 text

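The workflow codebase

repository
└ dags
  ├ workflowA
  │ ├ main.py
  │ └ hoge.sql
  └ workflowB
    ├ main.py
    └ piyo.sql

• Create one subdirectory per workflow under the dags directory, containing:
  • The Python code for the workflow itself (the DAG)
  • The queries used by the workflow
  • Configuration files, etc.
Note: this applies when the directory structure is kept the same as in Cloud Storage.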

Slide 7

Slide 7 text

Deploying a workflow (add and update)

$ gcloud composer environments storage dags import \
    --environment ENVIRONMENT_NAME \
    --location LOCATION \
    --source LOCAL_FILE_TO_UPLOAD

[Diagram: Codebase, Cloud Storage, Airflow; operation shown: gcloud composer import]

Slide 8

Slide 8 text

Deleting a workflow, part 1: delete from Cloud Storage

$ gcloud composer environments storage dags delete \
    --environment ENVIRONMENT_NAME \
    --location LOCATION \
    DAG_NAME.py

[Diagram: Codebase, Cloud Storage, Airflow; operation shown: gcloud composer delete]

Slide 9

Slide 9 text

Deleting a workflow, part 2: delete from Airflow

$ gcloud composer environments run --location LOCATION \
    ENVIRONMENT_NAME delete_dag -- DAG_NAME

[Diagram: Codebase, Cloud Storage, Airflow; operation shown: gcloud composer delete_dag]

Slide 10

Slide 10 text

2. Pain points when deploying to Cloud Composer

Slide 11

Slide 11 text

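Using the gcloud commands as-is for operations is painful
• Adding and updating workflows
  • import runs per workflow
    • It has to be run individually for each workflow that has changes
  • import overwrites files in Cloud Storage
    • Files deleted from the codebase remain in Cloud Storage unless they are deleted individually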

Slide 12

Slide 12 text

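Using the gcloud commands as-is for operations is painful
• Deleting workflows
  • Two commands have to be run: delete, and Airflow's delete_dag
  • delete runs per file; delete_dag runs per workflow
    • Each has to be run individually for every file/workflow that has changes
• Development produces changes in dozens of workflows every day
• Goal: detect the changes mechanically and sync them to Cloud Composer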
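The two-step deletion can be sketched in Go as a function that only builds the two command lines from slides 8 and 9. This is a minimal illustration, not code from trinity; the function name and the values passed in `main` are hypothetical placeholders, and nothing is executed.

```go
package main

import (
	"fmt"
	"strings"
)

// deleteCommands returns the two gcloud invocations needed to remove one
// workflow: the first removes the file from Cloud Storage, the second removes
// the DAG from Airflow. The commands are only built here, never executed.
func deleteCommands(env, location, dagFile, dagID string) [][]string {
	return [][]string{
		{"gcloud", "composer", "environments", "storage", "dags", "delete",
			"--environment", env, "--location", location, dagFile},
		{"gcloud", "composer", "environments", "run", "--location", location,
			env, "delete_dag", "--", dagID},
	}
}

func main() {
	// Environment name, location, file name and DAG name are placeholders.
	for _, argv := range deleteCommands("my-env", "my-location", "workflow_a.py", "workflow_a") {
		fmt.Println(strings.Join(argv, " "))
	}
}
```

Repeating this by hand for every changed file and workflow is the chore the slide describes.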

Slide 13

Slide 13 text

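What about gsutil rsync?
• A Cloud Storage command that syncs files between buckets/directories
• A file is judged to be a sync target if its modification time differs
  • Files get processed even when their contents have not changed
• It depends on Cloud Storage
  • Airflow can also be built outside GCP, so other storage should be supported too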
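The point about modification times can be illustrated with a small Go sketch: a file whose timestamp is moved forward, with its bytes untouched, looks changed by mtime but unchanged by content digest. This is a self-contained demonstration on a temporary file; the function names are hypothetical and it does not call gsutil.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"os"
	"path/filepath"
	"time"
)

// fileDigest returns the hex SHA-256 of a file's contents.
func fileDigest(path string) (string, error) {
	data, err := os.ReadFile(path)
	if err != nil {
		return "", err
	}
	sum := sha256.Sum256(data)
	return hex.EncodeToString(sum[:]), nil
}

// touchDemo writes a file, moves its modification time forward without
// touching its contents, and reports (mtime changed, digest changed).
func touchDemo() (bool, bool, error) {
	dir, err := os.MkdirTemp("", "mtime-demo")
	if err != nil {
		return false, false, err
	}
	defer os.RemoveAll(dir)

	path := filepath.Join(dir, "hoge.sql")
	if err := os.WriteFile(path, []byte("SELECT 1;\n"), 0o644); err != nil {
		return false, false, err
	}
	before, err := os.Stat(path)
	if err != nil {
		return false, false, err
	}
	digestBefore, err := fileDigest(path)
	if err != nil {
		return false, false, err
	}

	// Same bytes, new timestamp (what a fresh checkout or a `touch` does).
	later := before.ModTime().Add(time.Hour)
	if err := os.Chtimes(path, later, later); err != nil {
		return false, false, err
	}
	after, err := os.Stat(path)
	if err != nil {
		return false, false, err
	}
	digestAfter, err := fileDigest(path)
	if err != nil {
		return false, false, err
	}
	return !after.ModTime().Equal(before.ModTime()), digestAfter != digestBefore, nil
}

func main() {
	mtimeChanged, digestChanged, err := touchDemo()
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println("mtime changed: ", mtimeChanged)
	fmt.Println("digest changed:", digestChanged)
}
```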

Slide 14

Slide 14 text

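What about Airflow sync?
• An Airflow feature that syncs with a specific git repository
• Only a single branch can be specified
  • Looks good for syncing master's code to the production environment
  • In test environments and CI, code from a feature branch should be deployable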

Slide 15

Slide 15 text

3. An attempt at a solution with trinity

Slide 16

Slide 16 text

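trinity's approach
• Synchronize three things: the codebase, Cloud Storage, and Airflow
• For each workflow, compute a hash value from its directory structure and file contents
  • The hash value represents the workflow definition at a given point in time
• Workflows whose hash computed from the codebase differs from the hash stored in Cloud Storage become the targets of the sync operation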
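One way to compute a per-workflow hash from directory structure and file contents is sketched below. It is an assumption-laden illustration, not trinity's actual algorithm: the choice of SHA-256, the way paths and contents are fed to the hasher, and the names `workflowHash` and `demo` are all hypothetical.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"io/fs"
	"os"
	"path/filepath"
)

// workflowHash returns a hex SHA-256 digest over the relative paths and file
// contents below dir. filepath.WalkDir visits entries in lexical order, so the
// digest is deterministic for a given tree.
func workflowHash(dir string) (string, error) {
	h := sha256.New()
	err := filepath.WalkDir(dir, func(path string, d fs.DirEntry, walkErr error) error {
		if walkErr != nil {
			return walkErr
		}
		rel, err := filepath.Rel(dir, path)
		if err != nil {
			return err
		}
		// Hash the relative path so that renames and moves change the digest.
		fmt.Fprintf(h, "path:%s\x00", filepath.ToSlash(rel))
		if !d.Type().IsRegular() {
			return nil // directories contribute their path only
		}
		data, err := os.ReadFile(path)
		if err != nil {
			return err
		}
		// A length prefix keeps path bytes and content bytes from running together.
		fmt.Fprintf(h, "size:%d\x00", len(data))
		h.Write(data)
		return nil
	})
	if err != nil {
		return "", err
	}
	return hex.EncodeToString(h.Sum(nil)), nil
}

// demo builds a throwaway workflow directory and reports whether the digest is
// stable across two runs and whether it changes after a file is edited.
func demo() (bool, bool, error) {
	root, err := os.MkdirTemp("", "workflow-hash")
	if err != nil {
		return false, false, err
	}
	defer os.RemoveAll(root)

	dir := filepath.Join(root, "workflowA")
	if err := os.Mkdir(dir, 0o755); err != nil {
		return false, false, err
	}
	query := filepath.Join(dir, "hoge.sql")
	if err := os.WriteFile(query, []byte("SELECT 1;\n"), 0o644); err != nil {
		return false, false, err
	}
	first, err := workflowHash(dir)
	if err != nil {
		return false, false, err
	}
	second, err := workflowHash(dir)
	if err != nil {
		return false, false, err
	}
	if err := os.WriteFile(query, []byte("SELECT 2;\n"), 0o644); err != nil {
		return false, false, err
	}
	third, err := workflowHash(dir)
	if err != nil {
		return false, false, err
	}
	return first == second, first != third, nil
}

func main() {
	stable, changed, err := demo()
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println("stable across runs:", stable)
	fmt.Println("changed after edit:", changed)
}
```

Because only paths and bytes go into the digest, a file whose timestamp changed but whose contents did not yields the same hash.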

Slide 17

Slide 17 text

trinity's implementation
• https://github.com/zaimy/trinity
  • A tool to synchronize workflows between Codebase, Cloud Storage and Airflow metadata.
• Why Go?
  • Cross-compilation makes it possible to support Mac, Linux, and Windows
  • Processing can be done per workflow, so parallelizing it is a goal

$ trinity --bucket=BUCKET_NAME \
    --composer-env=COMPOSER_ENV_NAME
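Since each workflow can be processed independently, per-workflow parallelism in Go could look roughly like the sketch below. The slide states parallelization as a goal, so this is not a description of trinity's code; `forEachWorkflow`, its `limit` parameter, and the callback are hypothetical.

```go
package main

import (
	"fmt"
	"sync"
)

// forEachWorkflow calls fn once per workflow name, running at most limit calls
// at a time, and returns the error (possibly nil) reported for each name.
func forEachWorkflow(names []string, limit int, fn func(name string) error) map[string]error {
	if limit < 1 {
		limit = 1 // a zero-capacity semaphore would block every worker
	}
	var (
		mu      sync.Mutex
		wg      sync.WaitGroup
		results = make(map[string]error, len(names))
		slots   = make(chan struct{}, limit) // counting semaphore
	)
	for _, name := range names {
		wg.Add(1)
		go func(name string) {
			defer wg.Done()
			slots <- struct{}{}        // acquire a slot
			defer func() { <-slots }() // release it when done
			err := fn(name)
			mu.Lock()
			results[name] = err
			mu.Unlock()
		}(name)
	}
	wg.Wait()
	return results
}

func main() {
	names := []string{"workflowA", "workflowB"}
	results := forEachWorkflow(names, 2, func(name string) error {
		// A real callback would hash, upload, or delete the workflow here.
		return nil
	})
	for _, name := range names {
		fmt.Println(name, "error:", results[name])
	}
}
```

The semaphore bounds how many workflows are handled at once, which matters if each callback shells out to gcloud or calls a remote API.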

Slide 18

Slide 18 text

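Processing flow
1. Compute hash values in the codebase and save them per workflow
2. List the workflows in the codebase and in Cloud Storage and compare them
   i. If a workflow exists only in the codebase, upload it to Cloud Storage (add)
   ii. If a workflow exists only in Cloud Storage, delete it from Cloud Storage and Airflow
   iii. If a workflow exists in both, compare the codebase hash with the Cloud Storage hash
      a. If they differ, replace the workflow in Cloud Storage (update)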
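The comparison in step 2 can be sketched as a pure function over two maps from workflow name to hash. This is an illustrative reading of the flow above, not trinity's source; the `Op` and `Action` types and the `planSync` name are hypothetical.

```go
package main

import (
	"fmt"
	"sort"
)

// Op is the kind of synchronization a workflow needs.
type Op string

const (
	OpAdd    Op = "add"    // only in the codebase: upload to Cloud Storage
	OpUpdate Op = "update" // in both, hashes differ: replace in Cloud Storage
	OpDelete Op = "delete" // only in Cloud Storage: delete there and in Airflow
)

// Action pairs a workflow name with the operation it needs.
type Action struct {
	Workflow string
	Op       Op
}

// planSync compares the hashes computed from the codebase (local) with the
// hashes stored in Cloud Storage (remote). Both maps go from workflow name to
// hash. Workflows whose hashes match produce no action.
func planSync(local, remote map[string]string) []Action {
	var plan []Action
	for name, localHash := range local {
		remoteHash, ok := remote[name]
		switch {
		case !ok:
			plan = append(plan, Action{Workflow: name, Op: OpAdd})
		case remoteHash != localHash:
			plan = append(plan, Action{Workflow: name, Op: OpUpdate})
		}
	}
	for name := range remote {
		if _, ok := local[name]; !ok {
			plan = append(plan, Action{Workflow: name, Op: OpDelete})
		}
	}
	// Map iteration order is random, so sort for a reproducible plan.
	sort.Slice(plan, func(i, j int) bool { return plan[i].Workflow < plan[j].Workflow })
	return plan
}

func main() {
	local := map[string]string{"workflowA": "h1", "workflowB": "h2", "workflowD": "h4"}
	remote := map[string]string{"workflowB": "h2", "workflowC": "h3", "workflowD": "old"}
	for _, a := range planSync(local, remote) {
		fmt.Println(a.Op, a.Workflow)
	}
}
```

Keeping the plan separate from its execution would also make a dry-run mode straightforward: print the plan and stop.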

Slide 19

Slide 19 text

Simple, synchronized deployment is now possible!

Slide 20

Slide 20 text

Future work
• Adding tests and refactoring
  • Follow Go's idioms and way of thinking
• Adding features
  • Airflow has plugins in addition to dags, so support those as well
  • dry-run

Slide 21

Slide 21 text

No content