Slide 1

Slide 1 text

Building ML Pipelines with Cloud Composer
DsDs #0 / 2020.08.21  Masao Tsukiyama

Slide 2

Slide 2 text

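Masao Tsukiyama: MLOps Engineer, Mobility Technologies Co., Ltd.
Joined DeNA as a new graduate in …, now seconded to Mobility Technologies Co., Ltd., in ML Engineering Group 1.
As a student, worked on computer vision research while doing web development at startups and elsewhere.
Since before the secondment, has been developing and operating ML systems in the automotive domain.
Particularly interested in MLOps, cloud-native architecture, and automation.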

Slide 3

Slide 3 text



Slide 4

Slide 4 text

New taxi app "GO", scheduled for release in …

Slide 5

Slide 5 text

Theme of DsDs: MLOps
The field is very broad.

Slide 6

Slide 6 text

DS workflow efficiency, model accuracy assurance, infrastructure, automation, CI/CD, data platform, etc…

Slide 7

Slide 7 text

DS workflow efficiency, model accuracy assurance, infrastructure, automation, CI/CD, data platform, etc…
Areas covered today

Slide 8

Slide 8 text

Topics covered today
• Cloud Composer tutorial
• Guidelines for developing ML pipelines with Composer
• CI and automated deployment in real-world operation and development

Slide 9

Slide 9 text

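Topics covered today
• Cloud Composer tutorial
• Guidelines for developing ML pipelines with Composer
• CI and automated deployment in real-world operation and development
The slides will be uploaded, so please take a look at them later!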

Slide 10

Slide 10 text

Driver-facing feature: "Customer Search Navi" (お客様探索ナビ)

Slide 11

Slide 11 text

Predicts demand and suggests optimal cruising routes

Slide 12

Slide 12 text

The machine learning pipeline behind "Customer Search Navi"

Slide 13

Slide 13 text

Automating everything from preprocessing to deployment with Cloud Composer

Slide 14

Slide 14 text

Deployment pipeline overview
*Arrows show task dependencies, not data flow

Slide 15

Slide 15 text

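Cloud Composer
• Built on Airflow, the de facto standard workflow engine
• Pipelines are written in Python
• Easy integration with other GCP services
• Cluster operation and log management are fully managed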

Slide 16

Slide 16 text

 Composer Tutorial

Slide 17

Slide 17 text

```python
import datetime
import logging

from airflow.models import DAG
from airflow.operators.bash_operator import BashOperator
from airflow.operators.python_operator import PythonOperator


def greeting():
    logging.info("Hello World!")


DEFAULT_ARGS = {
    "start_date": datetime.datetime(2018, 1, 1),
    "retries": 5,
}

# Pipeline (DAG) definition
dag = DAG(
    dag_id="test_dag",
    schedule_interval=datetime.timedelta(days=1),
    default_args=DEFAULT_ARGS,
    catchup=False,  # do not backfill one run per day since start_date (2018)
)

# Create tasks from operators
hello_python = PythonOperator(task_id="hello", python_callable=greeting, dag=dag)
goodbye_bash = BashOperator(task_id="bye", bash_command="echo Goodbye.", dag=dag)

# Define the task execution order
hello_python >> goodbye_bash
```

Slide 18

Slide 18 text

For a very simple ML pipeline:

```python
profiler_task >> [preprocess_a_task, preprocess_b_task] >> trainer_task
```
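To make the fan-out/fan-in chain concrete, here is a minimal sketch using a hypothetical `ToyTask` class (not Airflow's implementation) that records dependencies the way the `>>` chain above expresses them, then derives a valid execution order with the standard library's `graphlib` (Python 3.9+). The key detail is that `[a, b] >> task` works because a list has no `__rshift__`, so Python falls back to the task's `__rrshift__`.

```python
from graphlib import TopologicalSorter


class ToyTask:
    """Hypothetical stand-in for an Airflow task; it only records upstream task ids."""

    def __init__(self, task_id):
        self.task_id = task_id
        self.upstream = set()

    def _link(self, downstream):
        # Accept either a single task or a list of tasks on the right-hand side.
        targets = downstream if isinstance(downstream, list) else [downstream]
        for t in targets:
            t.upstream.add(self.task_id)
        return downstream  # returning the right side lets chains continue

    def __rshift__(self, other):
        # task >> other
        return self._link(other)

    def __rrshift__(self, other):
        # [a, b] >> task: list has no __rshift__, so Python calls this instead.
        for t in other:
            t._link(self)
        return self


profiler_task = ToyTask("profiler")
preprocess_a_task = ToyTask("preprocess_a")
preprocess_b_task = ToyTask("preprocess_b")
trainer_task = ToyTask("trainer")

profiler_task >> [preprocess_a_task, preprocess_b_task] >> trainer_task

tasks = [profiler_task, preprocess_a_task, preprocess_b_task, trainer_task]
# TopologicalSorter expects a mapping from each node to its predecessors.
graph = {t.task_id: t.upstream for t in tasks}
order = list(TopologicalSorter(graph).static_order())
print(order)
```

The two preprocessing tasks have no edge between them, so any order between them is valid; this is what lets a scheduler run them in parallel.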

Slide 19

Slide 19 text

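Commonly used operators (each operator is the unit of a task)
• BashOperator: runs a shell command. gcloud and kubectl are available.
• PythonOperator: runs a Python function inside the Composer environment.
• PythonVirtualenvOperator: runs a function in a Python virtual environment.
• GKEPodOperator: runs an arbitrary container on a specified GKE cluster.
• BigQueryOperator: submits a BigQuery job.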

Slide 20

Slide 20 text

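How to pass information to tasks
Variables
• Something like environment variables shared across the whole Composer environment.
• Store static values. Take care not to use the same key in different DAGs.
• Examples: GCS input/output paths, the GKE cluster name, the GCR image digest
XCom (cross communication)
• A dictionary shared only within a DagRun and handed from one task to another.
• Passes values that change from run to run. Accessible from the TaskInstance object.
• Examples: the aggregation period of the training data, a hash value appended to file names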
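The practical difference between the two is scope. The sketch below is a hypothetical toy model (the class and method names are made up, and it is not how Airflow stores anything): Variables live in one environment-wide dict, while XComs are keyed by DAG, execution date, task, and key, so a value pushed in one run is not visible to the next run.

```python
from datetime import datetime


class ToyComposerEnv:
    """Hypothetical model of Variable vs XCom scope; not Airflow's implementation."""

    def __init__(self):
        # Variables: one dict shared by every DAG and every run.
        self.variables = {}
        # XComs: keyed by DAG id, execution date (i.e. the run), task id and key.
        self._xcoms = {}

    def xcom_push(self, dag_id, execution_date, task_id, key, value):
        self._xcoms[(dag_id, execution_date, task_id, key)] = value

    def xcom_pull(self, dag_id, execution_date, task_id, key):
        # Returns None when the value was pushed in a different run.
        return self._xcoms.get((dag_id, execution_date, task_id, key))


env = ToyComposerEnv()
env.variables["cluster_name"] = "ml-cluster"  # static, environment-wide

run_1 = datetime(2020, 8, 20)
run_2 = datetime(2020, 8, 21)
env.xcom_push("test_dag", run_1, "create_args", "preprocess_start_datetime", "2020-07-23T00:00:00")

same_run = env.xcom_pull("test_dag", run_1, "create_args", "preprocess_start_datetime")
next_run = env.xcom_pull("test_dag", run_2, "create_args", "preprocess_start_datetime")
print(same_run, next_run)
```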

Slide 21

Slide 21 text

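Adding a Variable:

```python
def set_env_variables(c, key, value):
    # c: an Invoke (Fabric) context; c.run() executes a shell command.
    # PROJECT, COMPOSER_NAME and LOCATION are module-level constants.
    c.run(
        f"gcloud --project {PROJECT} composer environments run {COMPOSER_NAME} "
        f"--location {LOCATION} variables -- -s {key} {value}"
    )
```

Reading a Variable:

```python
from airflow.models import Variable

VALUE = Variable.get(key)
```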
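As an alternative sketch that avoids shell-string quoting issues, the same command can be built as an argument list and handed to `subprocess.run`. The function names and the `dry_run` switch are hypothetical; the `variables -- -s KEY VALUE` form is the Airflow 1.10 CLI syntax used on the slide.

```python
import shlex
import subprocess


def build_set_variable_cmd(project, composer_name, location, key, value):
    """Build the gcloud argv that sets an Airflow 1.10 Variable in a Composer environment."""
    return [
        "gcloud", "--project", project,
        "composer", "environments", "run", composer_name,
        "--location", location,
        "variables", "--", "-s", key, value,
    ]


def set_env_variable(project, composer_name, location, key, value, dry_run=True):
    """Hypothetical helper: print the command in dry-run mode, otherwise execute it."""
    cmd = build_set_variable_cmd(project, composer_name, location, key, value)
    if dry_run:
        # shlex.join quotes each argument so the printed line is safe to paste into a shell.
        print(shlex.join(cmd))
        return None
    return subprocess.run(cmd, check=True)


set_env_variable("my-proj", "my-composer", "asia-northeast1", "bucket_name", "my-bucket")
# prints: gcloud --project my-proj composer environments run my-composer --location asia-northeast1 variables -- -s bucket_name my-bucket
```

Because each value is a separate list element, a value containing spaces reaches the Airflow CLI as one argument without any manual escaping.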

Slide 22

Slide 22 text

Variables can also be viewed and edited from the Airflow web UI.

Slide 23

Slide 23 text

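Example: pushing an XCom value (push it in the first task)

```python
from datetime import timedelta

from airflow.operators.python_operator import PythonOperator

# PREPROCESS_DIFF (in days) and dag are defined elsewhere in the DAG file.


def create_args(**kwargs):
    execution_date = kwargs["execution_date"]
    preprocess_start_datetime = execution_date - timedelta(days=PREPROCESS_DIFF)
    # task_instance.xcom_push
    kwargs["ti"].xcom_push(
        key="preprocess_start_datetime",
        value=preprocess_start_datetime.strftime("%Y-%m-%dT%H:%M:%S"),
    )


create_args_task = PythonOperator(
    task_id="create_args",
    python_callable=create_args,
    provide_context=True,  # required in Airflow 1.10 for kwargs to receive the context
    dag=dag,
)
```

When PythonOperator runs a function, the execution date and the task instance can be read from its keyword arguments.
*`ti` is short for task instance.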
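The value pushed here is just the execution date shifted back by `PREPROCESS_DIFF` days and formatted as text. A minimal, Airflow-free sketch of that arithmetic, assuming a hypothetical window of 28 days (the slide does not give the real value):

```python
from datetime import datetime, timedelta

# Hypothetical value: the actual window length is not stated on the slide.
PREPROCESS_DIFF = 28


def preprocess_start_datetime(execution_date, diff_days=PREPROCESS_DIFF):
    """Shift the execution date back by diff_days and format it as pushed to XCom."""
    start = execution_date - timedelta(days=diff_days)
    return start.strftime("%Y-%m-%dT%H:%M:%S")


print(preprocess_start_datetime(datetime(2020, 8, 21)))  # → 2020-07-24T00:00:00
```

Pushing a string rather than a `datetime` object keeps the value easy to hand to a container later as a plain command-line argument, and `datetime.fromisoformat` can turn it back into a `datetime` on the receiving side.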

Slide 24

Slide 24 text

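Example: reading an XCom value

```python
def preprocess(**kwargs):
    # task_instance.xcom_pull; the PythonOperator running this also needs provide_context=True
    preprocess_start_datetime = kwargs["ti"].xcom_pull(key="preprocess_start_datetime")
    ……
```

When PythonOperator runs a function, the execution date and the task instance can be read from its keyword arguments.
*`ti` is short for task instance.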

Slide 25

Slide 25 text

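Factoring out the common parts into a per-task operator wrapper keeps the DAG file tidy.

main_dag.py:

```python
# main_dag.py
# dag and create_args are defined as on the previous slides.
from airflow.operators.python_operator import PythonOperator

import preprocess_operator
import profiler_operator
import train_operator

create_args_task = PythonOperator(
    task_id="create_args",
    python_callable=create_args,
    provide_context=True,
    dag=dag,
)
profiler_task = profiler_operator.create_operator(dag)
# create_operator(dag, task_id, create_args_task_id): see preprocess_operator.py
preprocess_a_task = preprocess_operator.create_operator(dag, "preprocess_a", "create_args")
preprocess_b_task = preprocess_operator.create_operator(dag, "preprocess_b", "create_args")
train_task = train_operator.create_operator(dag)

create_args_task >> profiler_task >> [
    preprocess_a_task,
    preprocess_b_task,
] >> train_task
```

preprocess_operator.py (imported by main_dag.py):

```python
# preprocess_operator.py
from airflow.contrib.operators.gcp_container_operator import GKEPodOperator

# BUCKET_NAME, BQ_DATASET_NAME, GCS_PATH, PROJECT, CLUSTER_LOCATION,
# CLUSTER_NAME and IMAGE are module-level constants read from Variables.


def create_operator(dag, task_id, create_args_task_id):
    container_arguments = [
        "--bucket_name", BUCKET_NAME,
        "preprocess",
        "--start_datetime",
        "{{ ti.xcom_pull(task_ids='" + create_args_task_id + "', key='preprocess_start_datetime') }}",
        "--bq_dataset_name", BQ_DATASET_NAME,
        "--gcs_path", GCS_PATH,
    ]
    operator = GKEPodOperator(
        task_id=task_id,
        name=task_id.replace("_", "-"),  # pod name; Kubernetes names cannot contain "_"
        project_id=PROJECT,
        location=CLUSTER_LOCATION,
        cluster_name=CLUSTER_NAME,
        namespace="default",
        image=IMAGE,
        arguments=container_arguments,
        dag=dag,
    )
    return operator
```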

Slide 26

Slide 26 text

Let's build a simple ML pipeline

Slide 27

Slide 27 text

Training with PythonOperator? Preprocessing with BigQueryOperator?

Slide 28

Slide 28 text

Basic policy: do everything with GKEPodOperator

Slide 29

Slide 29 text

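Why do everything with GKEPodOperator
• Keep the ML implementation and the pipeline implementation as independent as possible
• Data scientists should not have to think about the pipeline-side implementation
• The Composer environment is ill-suited to complex processing, e.g. because of PythonOperator's constraints (described later)
Concretely
• Create a separate repository and manage the implementation of ML algorithms and the like there
• Package preprocessing, training, and all other miscellaneous processing into Docker images
• Even when using BigQuery, package the SQL and the job-submission code into that same image
• This approach is also the best choice once analysis, experimentation, and local development are taken into account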

Slide 30

Slide 30 text

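Constraints of PythonOperator
PythonOperator
• Probably the best option for simple processing that needs no PyPI packages
• Passing values via Variable.get / Variable.set is very easy, and XCom is easy to use too
• On the other hand, processing that needs PyPI packages pollutes the Composer environment
• Reproducibility suffers, e.g. packages conflict with Airflow's own dependencies
PythonVirtualenvOperator
• Looked like the best of both worlds, but turned out to be the worst of both
• Neither Variables nor XCom can be used, so it is less convenient than GKEPodOperator
• Everything has to be crammed into a single callable, which makes it very awkward to use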

Slide 31

Slide 31 text

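Packages have to be installed through the web UI or gcloud, which pollutes the whole environment.
*Moreover, they are very likely to conflict with Airflow's dependencies.
*Incidentally, when that happens the Airflow workers die and DAGs stop running.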

Slide 32

Slide 32 text

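Caveats when doing everything with GKEPodOperator
Drawbacks
• Variables cannot be read from code inside the container at task run time
• xcom_push / xcom_pull cannot be used either
Solutions
• Pass everything to the container as GKEPodOperator container arguments
• Install the gcloud SDK and kubectl in the Dockerfile and you can do almost anything
• When permissions for another environment are needed, pass an encrypted service account key file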

Slide 33

Slide 33 text

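Passing arguments to GKEPodOperator
Make everything, including input/output paths and aggregation periods, controllable through command-line arguments.
*Outside the scope of this talk, but Python Fire or Invoke (Fabric) makes this easy.

```python
container_arguments = [
    "preprocess",
    "--bucket_name", BUCKET_NAME,
    "--start_datetime", PREPROCESS_START_DATETIME,
    "--bq_dataset_name", BQ_DATASET_NAME,
    "--gcs_export_path", GCS_EXPORT_PATH,
]
```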
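On the container side, the image only needs a CLI that accepts exactly these arguments. The slide recommends Python Fire or Invoke; the sketch below swaps in the standard library's `argparse` so it runs without dependencies. The program name `runner` and parsing `--start_datetime` into a `datetime` are assumptions for illustration.

```python
import argparse
from datetime import datetime


def build_parser():
    # Hypothetical CLI for the container image; mirrors the container_arguments list.
    parser = argparse.ArgumentParser(prog="runner")
    subcommands = parser.add_subparsers(dest="command", required=True)

    preprocess = subcommands.add_parser("preprocess")
    preprocess.add_argument("--bucket_name", required=True)
    # Parse the XCom-formatted string (e.g. 2020-07-24T00:00:00) into a datetime.
    preprocess.add_argument("--start_datetime", required=True, type=datetime.fromisoformat)
    preprocess.add_argument("--bq_dataset_name", required=True)
    preprocess.add_argument("--gcs_export_path", required=True)
    return parser


def main(argv=None):
    args = build_parser().parse_args(argv)
    if args.command == "preprocess":
        # The real preprocessing (BigQuery job, export to GCS) would run here.
        print(f"preprocess from {args.start_datetime.isoformat()} into {args.gcs_export_path}")
    return args


args = main([
    "preprocess",
    "--bucket_name", "dummy-bucket",
    "--start_datetime", "2020-07-24T00:00:00",
    "--bq_dataset_name", "dummy_dataset",
    "--gcs_export_path", "gs://dummy-bucket/export",
])
```

Because every input arrives on the command line, the same image can be run locally with hand-written arguments, which is what makes the "everything in GKEPodOperator" policy friendly to local development.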

Slide 34

Slide 34 text

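Passing arguments to GKEPodOperator
Fetch the required Variables in advance (at module level, when the DAG file is parsed):

```python
from airflow.models import Variable

# PROJECT (the GCP project ID) is defined elsewhere in the module.
BUCKET_NAME = Variable.get("bucket_name")
CLUSTER_NAME = Variable.get("cluster_name")
CLUSTER_LOCATION = Variable.get("cluster_location")
IMAGE = f"gcr.io/{PROJECT}/test-image@{Variable.get('test_image_digest')}"
BQ_DATASET_NAME = Variable.get("bq_dataset_name")
```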
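Pinning the image by digest rather than by tag means each run uses exactly the image whose digest is stored in the Variable. A small hypothetical helper that builds the same reference and rejects a malformed Variable value early; it assumes the stored value has the form `sha256:<64 hex characters>`, which is the form `gcloud container images describe --format='value(image_summary.digest)'` prints.

```python
import re

# Assumed format of the Variable value: "sha256:" followed by 64 lowercase hex characters.
_DIGEST_RE = re.compile(r"sha256:[0-9a-f]{64}")


def image_ref(project, image_name, digest):
    """Return gcr.io/<project>/<image_name>@<digest>, rejecting malformed digests."""
    if not _DIGEST_RE.fullmatch(digest):
        raise ValueError(f"not a sha256 image digest: {digest!r}")
    return f"gcr.io/{project}/{image_name}@{digest}"


IMAGE = image_ref("dummy-gcp", "test-image", "sha256:" + "0" * 64)
print(IMAGE)
```

Failing at DAG parse time with a clear message is easier to debug than a pod that cannot pull its image.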

Slide 35

Slide 35 text

```python
def create_operator(dag, task_id, create_args_task_id):
    container_arguments = [
        "preprocess",
        "--bucket_name", BUCKET_NAME,
        "--start_datetime",
        # Jinja template: reads the XCom value pushed by the create_args task
        "{{ ti.xcom_pull(task_ids='" + create_args_task_id + "', key='preprocess_start_datetime') }}",
        "--bq_dataset_name", BQ_DATASET_NAME,
        "--gcs_path", GCS_PATH,
    ]
    operator = GKEPodOperator(
        task_id=task_id,
        name=task_id.replace("_", "-"),  # pod name; Kubernetes names cannot contain "_"
        project_id=PROJECT,
        location=CLUSTER_LOCATION,
        cluster_name=CLUSTER_NAME,
        namespace="default",
        image=IMAGE,
        arguments=container_arguments,  # a templated field
        dag=dag,
    )
    return operator
```

A Jinja template lets you read the XCom value.
The arguments parameter is subject to template rendering.
*Strings wrapped in double curly braces are rendered right before the task runs.
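The concatenation in `create_operator` only builds a string; Airflow renders it just before the task runs. If you prefer an f-string, every literal Jinja brace has to be doubled, which is easy to get wrong. A hypothetical helper, checked against the concatenated form:

```python
def xcom_template(task_id, key):
    """Build the Jinja expression that Airflow renders into an XCom value at run time."""
    # Inside an f-string, {{{{ produces the literal {{ that Jinja expects (and }}}} gives }}).
    return f"{{{{ ti.xcom_pull(task_ids='{task_id}', key='{key}') }}}}"


concatenated = "{{ ti.xcom_pull(task_ids='" + "create_args" + "', key='preprocess_start_datetime') }}"
print(xcom_template("create_args", "preprocess_start_datetime") == concatenated)  # → True
```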

Slide 36

Slide 36 text

 https://cloud.google.com/composer/docs/how-to/using/writing-dags

Slide 37

Slide 37 text

Real-world operation examples

Slide 38

Slide 38 text

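Automatically reflecting ML algorithm updates
Goal
• Ideally, the pipeline side needs no changes even when the ML-side repository changes
• That is, all it takes is updating the image that GKEPodOperator runs
Concretely
• When changes are merged into the master branch of the ML-side repository, CircleCI builds the image automatically
• The digest of the built GCR image is set as a Variable through gcloud composer's variables command
• The next pipeline run picks up the update automatically
• Only when command-line arguments change or features are added does the pipeline itself have to be modified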

Slide 39

Slide 39 text

config.yml for CircleCI, placed in the ML-side repository (the last command registers the GCR image digest in Variables):

```yaml
build_dev:
  docker:
    - image: google/cloud-sdk
  environment:
    GCP_PROJECT: dummy-gcp
    COMPOSER_NAME: dummy-composer
    IMAGE_TAG: dummy-tag
  steps:
    - checkout
    - setup_remote_docker:
        docker_layer_caching: true
    - attach_workspace:
        at: .
    - run:
        name: build
        command: &build |
          TAG=gcr.io/${GCP_PROJECT}/test-image:${IMAGE_TAG}
          docker build -t ${TAG} -f images/runner/Dockerfile .
          docker push ${TAG}
          IMAGE_DIGEST=$(gcloud container images describe gcr.io/${GCP_PROJECT}/test-image:${IMAGE_TAG} --format='value(image_summary.digest)')
          gcloud composer environments run ${COMPOSER_NAME} --location asia-northeast1 variables -- -s pipeline_image_digest ${IMAGE_DIGEST}
```

Slide 40

Slide 40 text

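The inference pipeline of "Customer Search Navi"
Dead simple: it just runs an inference batch every … minutes.
It uses an inference image that is simply the deployment pipeline's image with the model pickle added.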

Slide 41

Slide 41 text

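Model evaluation and automated deployment
Automated model deployment
• It is enough to overwrite the Variable that holds the inference image digest
• In other words, almost the same as updating the deployment-pipeline image from CircleCI
• The final task compares the new and old models and overwrites the Variable
Model evaluation
• Details omitted, but the final model evaluation is done with a taxi-driving simulator
• Batch inference and simulation run in parallel for both the new and the existing model
• Model update criterion: outperform the existing model for two consecutive weeks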
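The update criterion can be written as a small decision function for the final task. This is a sketch under stated assumptions, not the team's implementation: it assumes one simulator score per week per model, oldest first, that higher is better, and that "outperform" means strictly greater in each of the last two weeks. The function name is hypothetical.

```python
def should_deploy(new_scores, old_scores, weeks=2):
    """Return True only if the new model beat the current one in each of the last `weeks` weeks.

    new_scores / old_scores: weekly simulator scores, oldest first (higher is better; assumption).
    """
    if len(new_scores) < weeks or len(old_scores) < weeks:
        return False  # not enough history yet to judge
    recent = zip(new_scores[-weeks:], old_scores[-weeks:])
    return all(new > old for new, old in recent)


print(should_deploy([1.0, 1.2, 1.3], [1.1, 1.1, 1.1]))  # → True
print(should_deploy([1.2, 1.0], [1.1, 1.1]))  # → False
```

Only when this returns True would the final task overwrite the Variable holding the inference image digest, so the next inference batch picks up the new model.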

Slide 42

Slide 42 text

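(Diagram) Build the inference image → run batch inference with the new and existing models in parallel → decide whether to deploy and overwrite the Variable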

Slide 43

Slide 43 text

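Cost considerations
• Composer requires a fairly large minimum amount of cluster resources
• Setting up a separate Composer environment per project gets quite expensive
• Unless the products are completely different, let their DAGs share the same environment
• To cut GKE fixed costs, make use of AI Platform jobs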

Slide 44

Slide 44 text

Summary

Slide 45

Slide 45 text

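Cloud Composer is a workflow engine that is easy to set up, operate, and implement.
The pipeline is only a shell: delegate all actual processing to GKE or AI Platform.
Ideally, updates to the ML-side repository are picked up automatically and the pipeline side needs no changes at all.
Making good use of Composer enables automation and stable operation all the way from preprocessing to production model deployment.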