Slide 1

Slide 1 text

I Tried the Airflow Tutorials
2023-06-30 ENECHANGE Tech Talk (internal study session)
CTO Office, 岩本隆史

Slide 2

Slide 2 text

I ended up getting involved in an Airflow project

Slide 3

Slide 3 text

A good opportunity to try MWAA
https://aws.amazon.com/jp/managed-workflows-for-apache-airflow/

Slide 4

Slide 4 text

Let's work through the tutorial
https://docs.aws.amazon.com/mwaa/latest/userguide/quick-start.html

Slide 5

Slide 5 text

It took a really long time…
https://docs.aws.amazon.com/mwaa/latest/userguide/quick-start.html#quick-start-createstack

Slide 6

Slide 6 text

With Docker, the environment is up in a few minutes

    curl -LfO 'https://airflow.apache.org/docs/apache-airflow/2.6.2/docker-compose.yaml'
    mkdir -p ./dags ./logs ./plugins ./config
    echo -e "AIRFLOW_UID=$(id -u)" > .env
    docker compose up airflow-init
    docker compose up

https://airflow.apache.org/docs/apache-airflow/stable/howto/docker-compose/

Slide 7

Slide 7 text

Plenty of sample DAGs are included

Slide 8

Slide 8 text

Running the tutorial DAG

Slide 9

Slide 9 text

Success

Slide 10

Slide 10 text

Three tasks

Slide 11

Slide 11 text

Task 1 = print the date

    t1 = BashOperator(
        task_id="print_date",
        bash_command="date",
    )

    [2023-06-22, 06:52:22 UTC] {subprocess.py:75} INFO - Running command: ['/bin/bash', '-c', 'date']
    [2023-06-22, 06:52:22 UTC] {subprocess.py:86} INFO - Output:
    [2023-06-22, 06:52:22 UTC] {subprocess.py:93} INFO - Thu Jun 22 06:52:22 UTC 2023
    [2023-06-22, 06:52:22 UTC] {subprocess.py:97} INFO - Command exited with return code 0

Slide 12

Slide 12 text

Task 2 = sleep

    t2 = BashOperator(
        task_id="sleep",
        depends_on_past=False,
        bash_command="sleep 5",
        retries=3,
    )

    [2023-06-22, 06:52:25 UTC] {subprocess.py:75} INFO - Running command: ['/bin/bash', '-c', 'sleep 5']
    [2023-06-22, 06:52:25 UTC] {subprocess.py:86} INFO - Output:
    [2023-06-22, 06:52:30 UTC] {subprocess.py:97} INFO - Command exited with return code 0

Slide 13

Slide 13 text

Task 3 = using a template

    templated_command = dedent(
        """
    {% for i in range(5) %}
        echo "{{ ds }}"
        echo "{{ macros.ds_add(ds, 7)}}"
    {% endfor %}
    """
    )

    t3 = BashOperator(
        task_id="templated",
        depends_on_past=False,
        bash_command=templated_command,
    )

Slide 14

Slide 14 text

The template renders into 10 echo commands

    echo "2023-06-22"
    echo "2023-06-29"
    echo "2023-06-22"
    echo "2023-06-29"
    echo "2023-06-22"
    echo "2023-06-29"
    echo "2023-06-22"
    echo "2023-06-29"
    echo "2023-06-22"
    echo "2023-06-29"
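To make the expansion concrete, here is a minimal plain-Python sketch of what the templated loop produces. It does not use Airflow or Jinja: `ds_add` below is a hypothetical stand-in for `macros.ds_add`, and `render_commands` is an illustrative helper name, so treat this as an approximation of the rendering rather than how Airflow does it.

```python
from datetime import date, timedelta


def ds_add(ds: str, days: int) -> str:
    """Hypothetical stand-in for macros.ds_add: shift a YYYY-MM-DD date by `days`."""
    return (date.fromisoformat(ds) + timedelta(days=days)).isoformat()


def render_commands(ds: str, repeats: int = 5) -> list[str]:
    """Build the echo lines that the 5-iteration template loop expands to."""
    lines = []
    for _ in range(repeats):
        lines.append(f'echo "{ds}"')             # {{ ds }}
        lines.append(f'echo "{ds_add(ds, 7)}"')  # {{ macros.ds_add(ds, 7) }}
    return lines


commands = render_commands("2023-06-22")
print("\n".join(commands))
```

Five iterations with two echo lines each give the 10 commands shown on the slide.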

Slide 15

Slide 15 text

Ten dates are printed

    [2023-06-22, 06:52:25 UTC] {subprocess.py:86} INFO - Output:
    [2023-06-22, 06:52:25 UTC] {subprocess.py:93} INFO - 2023-06-22
    [2023-06-22, 06:52:25 UTC] {subprocess.py:93} INFO - 2023-06-29
    [2023-06-22, 06:52:25 UTC] {subprocess.py:93} INFO - 2023-06-22
    [2023-06-22, 06:52:25 UTC] {subprocess.py:93} INFO - 2023-06-29
    [2023-06-22, 06:52:25 UTC] {subprocess.py:93} INFO - 2023-06-22
    [2023-06-22, 06:52:25 UTC] {subprocess.py:93} INFO - 2023-06-29
    [2023-06-22, 06:52:25 UTC] {subprocess.py:93} INFO - 2023-06-22
    [2023-06-22, 06:52:25 UTC] {subprocess.py:93} INFO - 2023-06-29
    [2023-06-22, 06:52:25 UTC] {subprocess.py:93} INFO - 2023-06-22
    [2023-06-22, 06:52:25 UTC] {subprocess.py:93} INFO - 2023-06-29
    [2023-06-22, 06:52:25 UTC] {subprocess.py:97} INFO - Command exited with return code 0

Slide 16

Slide 16 text

Task dependencies are declared with an operator

    t1 >> [t2, t3]
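The `>>` syntax works because Python lets a class define its own right-shift behavior. The sketch below shows the general idea with a toy class; `ToyTask` and its `downstream` attribute are hypothetical names for illustration and are not Airflow's actual implementation.

```python
class ToyTask:
    """Hypothetical stand-in for an Airflow task, used only to illustrate `>>`."""

    def __init__(self, task_id: str):
        self.task_id = task_id
        self.downstream: list["ToyTask"] = []

    def __rshift__(self, other):
        """Handle `self >> other`, where `other` is one task or a list of tasks."""
        targets = other if isinstance(other, list) else [other]
        for target in targets:
            self.downstream.append(target)
        return other  # returning the right-hand side allows chaining single tasks


t1 = ToyTask("print_date")
t2 = ToyTask("sleep")
t3 = ToyTask("templated")

# Same shape as the slide: t2 and t3 both run after t1.
t1 >> [t2, t3]
print([t.task_id for t in t1.downstream])
```

In this toy model, `t1 >> [t2, t3]` records both `t2` and `t3` as downstream of `t1`, which mirrors the fan-out shown in the tutorial DAG.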

Slide 17

Slide 17 text

Running another tutorial as well

Slide 18

Slide 18 text

Extract

    @task()
    def extract():
        data_string = '{"1001": 301.27, "1002": 433.21, "1003": 502.22}'
        order_data_dict = json.loads(data_string)
        return order_data_dict

    Key            Value
    return_value   {'1001': 301.27, '1002': 433.21, '1003': 502.22}

Slide 19

Slide 19 text

Transform

    @task(multiple_outputs=True)
    def transform(order_data_dict: dict):
        total_order_value = 0
        for value in order_data_dict.values():
            total_order_value += value
        return {"total_order_value": total_order_value}

    Key                 Value
    total_order_value   1236.7
    return_value        {'total_order_value': 1236.7}

Slide 20

Slide 20 text

Load

    @task()
    def load(total_order_value: float):
        print(f"Total order value is: {total_order_value:.2f}")

    [2023-06-22, 07:55:00 UTC] {logging_mixin.py:149} INFO - Total order value is: 1236.70

Slide 21

Slide 21 text

Task dependencies are resolved automatically

    order_data = extract()
    order_summary = transform(order_data)
    load(order_summary["total_order_value"])
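The data flow is easier to see with the decorators stripped away. The sketch below runs the same three functions as ordinary Python, so each call executes immediately and passes real values; in Airflow the decorated calls instead return references that the scheduler uses to infer the order extract, transform, load. Returning the message from `load` is an addition made here so the result can be inspected.

```python
import json


def extract() -> dict:
    """Parse the hard-coded order data, as in the tutorial's extract task."""
    data_string = '{"1001": 301.27, "1002": 433.21, "1003": 502.22}'
    return json.loads(data_string)


def transform(order_data_dict: dict) -> dict:
    """Sum all order values into a single total."""
    total_order_value = 0
    for value in order_data_dict.values():
        total_order_value += value
    return {"total_order_value": total_order_value}


def load(total_order_value: float) -> str:
    """Print the total; also return the message (an addition for this sketch)."""
    message = f"Total order value is: {total_order_value:.2f}"
    print(message)
    return message


# Same wiring as the slide: each output feeds the next call.
order_data = extract()
order_summary = transform(order_data)
message = load(order_summary["total_order_value"])
```

Because each function consumes the previous function's return value, the execution order is fully determined by the arguments, which is the information Airflow relies on to build the dependency graph.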

Slide 22

Slide 22 text

This is actually a new feature in Airflow 2.0

    @task
    def hello_name(name: str):
        print(f'Hello {name}!')

    hello_name('Airflow users')

Slide 23

Slide 23 text

Give it a try with Docker; it's easy