
[PyCon JP 2019] An Introduction to Airflow and Case Studies from a Novice Pythonista

Naoki Matsuda
September 17, 2019


Slides from a talk given at PyCon JP 2019.



Transcript

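  1. Agenda
    0. Self-introduction
    1. Overview of Airflow
    2. In-house Airflow case study
      - Product overview and challenges, Airflow environment
    3. Where I stumbled with Airflow
      - Passing data between tasks
      - Checking DAG behavior: building a local development environment with docker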
  2. Apache Airflow overview
    A platform for scheduling and monitoring workflows made up of batch jobs.
    - Built by Airbnb
    - Open source (an incubation project of the Apache Software Foundation) [1, 2]
    - Implemented in Python [3]
    - Active developer community [3]
    [1] https://incubator.apache.org/projects/airflow.html
    [2] https://airflow.apache.org/license.html
    [3] https://github.com/apache/airflow
  3. Creating the tasks that make up a workflow
    - Run a Bash command: BashOperator
    - Run a Python function: PythonOperator
    - Run SQL: MySqlOperator, PostgresOperator, …
    - Send an HTTP request: SimpleHttpOperator
    - Others, such as cloud services: BigQueryOperator, AWSAthenaOperator, …
    - Wait for a specific condition: Sensor
    https://airflow.apache.org/concepts.html#operators
    https://airflow.apache.org/_api/airflow/operators/index.html
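  4. About the development organization and the product
    Development organization
    - Team size: data engineers (… people), data scientists and others (… people)
    Product
    - An in-house digital advertising planning tool
    - Visualizes and forecasts ad delivery performance data for the media and clients handled in-house
    - Media covered: Google, Yahoo, Twitter, Facebook, LINE
    Diagram labels: ad delivery data, media, product, planner, sales. Example query shown: July to September of last year, GoogleDisplay, beverage industry; example result: imp 100000, clicks 5000, cv 700, cost …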
  5. Airflow architecture
    - Airflow is deployed on AWS Fargate
    - Version: apache-airflow 1.10.2
    - Components shown in the diagram: web, worker, scheduler, flower, DAGs, Amazon S3, Amazon RDS, Amazon ElastiCache (Redis), Elastic Load Balancing
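  6. Passing data between tasks
    Use XCom to pass data between tasks.
    - How to use XCom
      - Push data to XCom
        - return a value from the function
        - call kwargs['task_instance'].xcom_push(value=hoge, key='huga') inside the function
        - use an Operator that returns a value, e.g. BigQueryGetDataOperator
      - Pull data from XCom
        - kwargs['task_instance'].xcom_pull()
    - XCom values are stored in the metadata database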
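
The push and pull calls listed for XCom can be sketched as a pair of Python callables. This is a minimal sketch, not the speaker's code: `FakeTaskInstance` is a hypothetical stand-in for Airflow's TaskInstance so that the example runs without an Airflow installation, and the function names, the task id `extract`, and the key `row_count` are made up for illustration. In a real Airflow 1.10 DAG, the two callables would be passed to PythonOperator with provide_context=True.

```python
class FakeTaskInstance:
    """Hypothetical stand-in for Airflow's TaskInstance; keeps XComs in a dict."""

    def __init__(self, store, task_id):
        self.store = store      # plays the role of the metadata database
        self.task_id = task_id

    def xcom_push(self, key, value):
        self.store[(self.task_id, key)] = value

    def xcom_pull(self, task_ids, key="return_value"):
        return self.store.get((task_ids, key))


def extract(**kwargs):
    # Explicit push under a custom key.
    kwargs["task_instance"].xcom_push(key="row_count", value=3)
    # In Airflow, the callable's return value is pushed under "return_value".
    return ["a", "b", "c"]


def report(**kwargs):
    ti = kwargs["task_instance"]
    rows = ti.xcom_pull(task_ids="extract")                    # the returned value
    count = ti.xcom_pull(task_ids="extract", key="row_count")  # the explicit push
    return "{} rows: {}".format(count, ",".join(rows))


def run_locally():
    # Mimic what the operator does: call the function, then push its return value.
    store = {}
    ti_extract = FakeTaskInstance(store, "extract")
    ti_extract.xcom_push(key="return_value", value=extract(task_instance=ti_extract))
    return report(task_instance=FakeTaskInstance(store, "report"))
```

Calling `run_locally()` exercises both push styles and both pull styles in one pass. Because XCom values go through the metadata database, they are best kept small.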
  7. Passing data between tasks
    Unless provide_context is set to True, kwargs['task_instance'] raises a KeyError.
    - provide_context=False (default)
      kwargs: {}
    - provide_context=True
      kwargs: { 'dag': <DAG: sample>, 'ds': '2019-09-10', 'next_ds': '2019-09-10', … 'task_instance': <TaskInstance: sample.task1_2 …> … }
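
The KeyError comes from ordinary Python behavior. With provide_context=False, the Airflow 1.10 PythonOperator passes only op_kwargs to the callable, so `**kwargs` is empty unless you supply something yourself. The sketch below reproduces both cases without Airflow. The function names and the placeholder context values are hypothetical.

```python
def print_task(**kwargs):
    # Raises KeyError when the context was not passed in.
    return kwargs["task_instance"]


def call_without_context():
    # Equivalent to provide_context=False: kwargs == {}
    try:
        print_task()
    except KeyError as err:
        return "KeyError: {}".format(err)
    return "no error"


def call_with_context():
    # Equivalent to provide_context=True: the context arrives as keyword arguments.
    context = {"ds": "2019-09-10", "task_instance": "<TaskInstance placeholder>"}
    return print_task(**context)
```

If a callable has to work in both modes, `kwargs.get('task_instance')` returns None instead of raising.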
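  8. Parallelism settings
    Configuring parallelism in airflow.cfg
    - parallelism: maximum number of task instances that can run at once across the whole Airflow installation
    - dag_concurrency: maximum number of task instances that can run concurrently within a single DAG
    - max_active_runs_per_dag: maximum number of DAG runs that can be active at once for a single DAG
    - worker_concurrency: maximum number of task instances that a single Celery worker runs concurrently
    https://analytics.livesense.co.jp/entry/…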
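
To show where these four settings live, the sketch below parses an airflow.cfg fragment with Python's standard configparser. It assumes the Airflow 1.10 layout, where worker_concurrency sits under [celery] and the other three under [core]. The numeric values are example values only, not recommendations from the talk, and the helper name is hypothetical.

```python
import configparser

# Example fragment; values are illustrative only.
AIRFLOW_CFG_FRAGMENT = """
[core]
parallelism = 32
dag_concurrency = 16
max_active_runs_per_dag = 16

[celery]
worker_concurrency = 16
"""


def read_parallelism_settings(text):
    """Return the four concurrency-related settings as integers."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    return {
        "parallelism": parser.getint("core", "parallelism"),
        "dag_concurrency": parser.getint("core", "dag_concurrency"),
        "max_active_runs_per_dag": parser.getint("core", "max_active_runs_per_dag"),
        "worker_concurrency": parser.getint("celery", "worker_concurrency"),
    }
```

The limits stack: a task instance runs only when the installation-wide, per-DAG, and per-worker limits all have room.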