2019.10.25 Fukuoka.go#14+Umeda.go https://fukuokago.connpass.com/event/146447/
Hiroka Zaitsu / Pepabo R&D Institute, GMO Pepabo, Inc.
Deploying Workflows to Cloud Composer Easily with trinity
Hiroka Zaitsu / @zaimy
Data scientist; researcher at Pepabo R&D Institute
Contents
1. What is Cloud Composer?
2. Pain points when deploying to Cloud Composer
3. An attempt to solve them with trinity
4. Future work
1. What is Cloud Composer?
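Overview of Cloud Composer
• GCP's "fully managed workflow orchestration service"
• Provisions Apache Airflow on GCP
• Pepabo is migrating its logging platform (DWH) from Treasure Data to GCP
• Its workflow service is likewise being migrated from Treasure Workflow (managed Digdag) to Cloud Composer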
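The workflow codebase
repository
└ dags
  ├ workflowA
  │ ├ main.py
  │ └ hoge.sql
  └ workflowB
    ├ main.py
    └ piyo.sql
• Under the dags directory, create one subdirectory per workflow, containing:
  • the Python code of the workflow itself (the DAG)
  • the queries the workflow uses
  • configuration files, etc.
(This assumes the directory layout mirrors the one in Cloud Storage.)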
Deploying a workflow (add and update)
$ gcloud composer environments storage dags import \
    --environment ENVIRONMENT_NAME \
    --location LOCATION \
    --source LOCAL_FILE_TO_UPLOAD
[Figure: Codebase → Cloud Storage → Airflow, via gcloud composer ... import]
Deleting a workflow, part 1: delete from Cloud Storage
$ gcloud composer environments storage dags delete \
    --environment ENVIRONMENT_NAME \
    --location LOCATION \
    DAG_NAME.py
[Figure: Codebase, Cloud Storage, Airflow; gcloud composer ... delete removes the file from Cloud Storage]
Deleting a workflow, part 2: delete from Airflow
$ gcloud composer environments run --location LOCATION \
    ENVIRONMENT_NAME delete_dag -- DAG_NAME
[Figure: Codebase, Cloud Storage, Airflow; gcloud composer ... delete_dag removes the DAG from Airflow]
2. Pain points when deploying to Cloud Composer
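Using the gcloud commands as-is in day-to-day operation is hard
• Adding and updating workflows
  • import runs per workflow
    • It must be run separately for each workflow that has changed
  • import overwrites the files in Cloud Storage
    • Files deleted from the codebase remain in Cloud Storage unless they are deleted individually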
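Using the gcloud commands as-is in day-to-day operation is hard
• Deleting workflows
  • Two commands are needed: delete, and Airflow's delete_dag
  • delete runs per file, delete_dag runs per workflow
    • Each must be run separately for every file/workflow that has changed
• Ongoing development produces changes across dozens of workflows every day
  • We want to detect those changes mechanically and sync them to Cloud Composer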
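What about gsutil rsync?
• A Cloud Storage command that syncs files between buckets/directories
  • A file is treated as needing a sync if its modification time differs
    • Files are processed even when their contents have not changed
• It depends on Cloud Storage
  • Airflow can also be run outside GCP, so we would like to support other storage as well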
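What about Airflow's git sync?
• An Airflow feature that syncs with a specific git repository
  • Only a single branch can be specified
    • Looks fine for syncing master to the production environment
    • In the test environment, we want CI to deploy feature-branch code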
3. An attempt to solve them with trinity
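trinity's approach
• Keep three things in sync: the codebase, Cloud Storage, and Airflow
• For each workflow, compute a hash from its directory structure and file contents
  • The hash represents the workflow definition at a given point in time
• A workflow becomes a sync target when the hash computed from the codebase differs from the hash stored in Cloud Storage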
Implementing trinity
• https://github.com/zaimy/trinity
• A tool to synchronize workflows between Codebase, Cloud Storage and Airflow metadata.
• Why Go?
  • Cross-compilation lets it support Mac, Linux, and Windows
  • Work can be done per workflow, so we want to parallelize it
$ trinity --bucket=BUCKET_NAME \
    --composer-env=COMPOSER_ENV_NAME
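Processing flow
1. Compute a hash in the codebase and store it for each workflow
2. List the workflows in the codebase and in Cloud Storage, and compare them
   i. If a workflow exists only in the codebase, upload it to Cloud Storage (add)
   ii. If it exists only in Cloud Storage, delete it from Cloud Storage and Airflow
   iii. If it exists in both, compare the codebase hash with the Cloud Storage hash
      a. If they differ, replace the workflow in Cloud Storage (update)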
Deploying in sync is now easy!
Future work
• Add tests and refactor
  • Follow Go's conventions and way of thinking
• New features
  • Airflow has plugins in addition to dags, so support those too
  • dry-run