Trying Out the Airflow Tutorials

2023-06-30
ENECHANGE Tech Talk (internal study session)

iwamot

Transcript

  1. Trying Out the Airflow Tutorials
    2023-06-30
    ENECHANGE Tech Talk (internal study session)
    CTO Office, Takashi Iwamoto (岩本隆史)

  2. I am joining an Airflow project

  3. A good opportunity to try MWAA
    https://aws.amazon.com/jp/managed-workflows-for-apache-airflow/

  4. Let's work through the tutorial
    https://docs.aws.amazon.com/mwaa/latest/userguide/quick-start.html

  5. It took a really long time…
    https://docs.aws.amazon.com/mwaa/latest/userguide/quick-start.html#quick-start-createstack

  6. With Docker, setup is done in a few minutes
    curl -LfO 'https://airflow.apache.org/docs/apache-airflow/2.6.2/docker-compose.yaml'
    mkdir -p ./dags ./logs ./plugins ./config
    echo -e "AIRFLOW_UID=$(id -u)" > .env
    docker compose up airflow-init
    docker compose up
    https://airflow.apache.org/docs/apache-airflow/stable/howto/docker-compose/

  7. Plenty of sample DAGs are included

  8. Running the tutorial DAG

  9. Success

  10. Three tasks

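  11. Task 1: print the date
    # Import needed by the snippet (Airflow 2.x module path)
    from airflow.operators.bash import BashOperator

    t1 = BashOperator(
        task_id="print_date",
        bash_command="date",
    )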
    [2023-06-22, 06:52:22 UTC] {subprocess.py:75} INFO - Running command: ['/bin/bash', '-c', 'date']
    [2023-06-22, 06:52:22 UTC] {subprocess.py:86} INFO - Output:
    [2023-06-22, 06:52:22 UTC] {subprocess.py:93} INFO - Thu Jun 22 06:52:22 UTC 2023
    [2023-06-22, 06:52:22 UTC] {subprocess.py:97} INFO - Command exited with return code 0

  12. Task 2: sleep
    t2 = BashOperator(
        task_id="sleep",
        depends_on_past=False,
        bash_command="sleep 5",
        retries=3,
    )
    [2023-06-22, 06:52:25 UTC] {subprocess.py:75} INFO - Running command: ['/bin/bash', '-c', 'sleep 5']
    [2023-06-22, 06:52:25 UTC] {subprocess.py:86} INFO - Output:
    [2023-06-22, 06:52:30 UTC] {subprocess.py:97} INFO - Command exited with return code 0
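
Slide 12 sets `retries=3`. As a rough illustration of what such a retry budget means (one initial attempt plus up to three more), here is a plain-Python sketch. The names `run_with_retries` and `flaky` are hypothetical and exist only for this example; Airflow's scheduler handles retries itself and also waits for a retry delay between attempts, which this sketch leaves out.

```python
from typing import Callable, TypeVar

T = TypeVar("T")


def run_with_retries(func: Callable[[], T], retries: int) -> T:
    """Call func once, then retry up to `retries` more times on failure."""
    attempts = retries + 1  # first try plus the retry budget
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except Exception:
            if attempt == attempts:
                raise  # budget exhausted: surface the last error
    raise AssertionError("unreachable for retries >= 0")


calls = {"count": 0}


def flaky() -> str:
    """Hypothetical task that fails twice before succeeding."""
    calls["count"] += 1
    if calls["count"] < 3:
        raise RuntimeError("transient failure")
    return "ok"


result = run_with_retries(flaky, retries=3)
print(result, calls["count"])
```

Here the third attempt succeeds, so one retry is left unused.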

  13. Task 3: using a template
    # dedent comes from the standard library
    from textwrap import dedent

    templated_command = dedent(
        """
        {% for i in range(5) %}
            echo "{{ ds }}"
            echo "{{ macros.ds_add(ds, 7)}}"
        {% endfor %}
        """
    )
    t3 = BashOperator(
        task_id="templated",
        depends_on_past=False,
        bash_command=templated_command,
    )

  14. Rendered into 10 echo commands
    echo "2023-06-22"
    echo "2023-06-29"
    echo "2023-06-22"
    echo "2023-06-29"
    echo "2023-06-22"
    echo "2023-06-29"
    echo "2023-06-22"
    echo "2023-06-29"
    echo "2023-06-22"
    echo "2023-06-29"

  15. Ten dates are printed
    [2023-06-22, 06:52:25 UTC] {subprocess.py:86} INFO - Output:
    [2023-06-22, 06:52:25 UTC] {subprocess.py:93} INFO - 2023-06-22
    [2023-06-22, 06:52:25 UTC] {subprocess.py:93} INFO - 2023-06-29
    [2023-06-22, 06:52:25 UTC] {subprocess.py:93} INFO - 2023-06-22
    [2023-06-22, 06:52:25 UTC] {subprocess.py:93} INFO - 2023-06-29
    [2023-06-22, 06:52:25 UTC] {subprocess.py:93} INFO - 2023-06-22
    [2023-06-22, 06:52:25 UTC] {subprocess.py:93} INFO - 2023-06-29
    [2023-06-22, 06:52:25 UTC] {subprocess.py:93} INFO - 2023-06-22
    [2023-06-22, 06:52:25 UTC] {subprocess.py:93} INFO - 2023-06-29
    [2023-06-22, 06:52:25 UTC] {subprocess.py:93} INFO - 2023-06-22
    [2023-06-22, 06:52:25 UTC] {subprocess.py:93} INFO - 2023-06-29
    [2023-06-22, 06:52:25 UTC] {subprocess.py:97} INFO - Command exited with return code 0
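
The rendering in slides 13 to 15 can be approximated without Airflow or Jinja. The sketch below is a standard-library emulation under stated assumptions: `ds_add` here is a stand-in written for this example (not Airflow's `macros.ds_add` itself), `render_commands` is a hypothetical helper, and the run date `2023-06-22` is taken from the slides.

```python
from datetime import date, timedelta


def ds_add(ds: str, days: int) -> str:
    """Add `days` to a YYYY-MM-DD string (stand-in for macros.ds_add)."""
    return (date.fromisoformat(ds) + timedelta(days=days)).isoformat()


def render_commands(ds: str, repeat: int = 5) -> list[str]:
    """Emulate the template's for-loop: two echo lines per pass."""
    lines = []
    for _ in range(repeat):
        lines.append(f'echo "{ds}"')
        lines.append(f'echo "{ds_add(ds, 7)}"')
    return lines


rendered = render_commands("2023-06-22")
print("\n".join(rendered))
```

Five passes with two `echo` lines each give the 10 commands shown on slide 14.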

  16. Task dependencies are declared with operators
    t1 >> [t2, t3]
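
The `>>` above is Python's right-shift operator, which a class can overload through `__rshift__`. A minimal sketch of the idea follows, assuming a toy `Task` class invented for this example; it is not Airflow's operator class and only records which tasks come next.

```python
from __future__ import annotations


class Task:
    """Toy task that records its downstream tasks (not Airflow's class)."""

    def __init__(self, task_id: str) -> None:
        self.task_id = task_id
        self.downstream: list[Task] = []

    def __rshift__(self, other: Task | list[Task]) -> Task | list[Task]:
        """Implement `self >> other`: schedule other after self."""
        targets = other if isinstance(other, list) else [other]
        for target in targets:
            self.downstream.append(target)
        return other  # returning other lets single tasks chain: a >> b >> c


t1, t2, t3 = Task("print_date"), Task("sleep"), Task("templated")
t1 >> [t2, t3]
print([t.task_id for t in t1.downstream])
```

In this sketch `t2` and `t3` both depend on `t1` but not on each other, which matches the fan-out shape on the slide.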

  17. Running another tutorial as well

  18. Extract
    # Imports needed by the snippet
    import json
    from airflow.decorators import task

    @task()
    def extract():
        data_string = '{"1001": 301.27, "1002": 433.21, "1003": 502.22}'
        order_data_dict = json.loads(data_string)
        return order_data_dict

    Key           Value
    return_value  {'1001': 301.27, '1002': 433.21, '1003': 502.22}

  19. Transform
    @task(multiple_outputs=True)
    def transform(order_data_dict: dict):
        total_order_value = 0
        for value in order_data_dict.values():
            total_order_value += value
        return {"total_order_value": total_order_value}

    Key                Value
    total_order_value  1236.7
    return_value       {'total_order_value': 1236.7}

  20. Load
    @task()
    def load(total_order_value: float):
        print(f"Total order value is: {total_order_value:.2f}")
    [2023-06-22, 07:55:00 UTC] {logging_mixin.py:149} INFO - Total order value is: 1236.70
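
Stripped of the `@task` decorators, slides 18 to 20 reduce to ordinary Python, which makes the arithmetic easy to check by hand: 301.27 + 433.21 + 502.22 = 1236.70. The sketch below runs the three steps as plain functions. Having `load` return its message is an addition made here for inspection; the slide version only prints.

```python
import json


def extract() -> dict:
    """Parse the hard-coded order data from the slides."""
    data_string = '{"1001": 301.27, "1002": 433.21, "1003": 502.22}'
    return json.loads(data_string)


def transform(order_data_dict: dict) -> dict:
    """Sum all order values into a single total."""
    total_order_value = 0
    for value in order_data_dict.values():
        total_order_value += value
    return {"total_order_value": total_order_value}


def load(total_order_value: float) -> str:
    """Print the total with two decimals and return the message."""
    message = f"Total order value is: {total_order_value:.2f}"
    print(message)
    return message


order_summary = transform(extract())
message = load(order_summary["total_order_value"])
```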

  21. Task dependencies are resolved automatically
    order_data = extract()
    order_summary = transform(order_data)
    load(order_summary["total_order_value"])
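
One way to picture how dependencies can fall out of ordinary function calls: each decorated call returns a placeholder for its future output, and any placeholder passed as an argument marks an upstream task. The sketch below is a toy model under that assumption. `Ref`, `edges`, and this `task` decorator are invented for illustration, and Airflow's real mechanism is more involved.

```python
from __future__ import annotations

from typing import Callable

edges: list[tuple[str, str]] = []  # (upstream, downstream) pairs


class Ref:
    """Placeholder for a task's future output (hypothetical helper)."""

    def __init__(self, task_id: str) -> None:
        self.task_id = task_id

    def __getitem__(self, key: str) -> Ref:
        # Indexing still points at the same producing task.
        return Ref(self.task_id)


def task(func: Callable) -> Callable[..., Ref]:
    """Toy decorator: calling the function records edges, not results."""

    def wrapper(*args: object) -> Ref:
        for arg in args:
            if isinstance(arg, Ref):
                edges.append((arg.task_id, func.__name__))
        return Ref(func.__name__)

    return wrapper


@task
def extract(): ...


@task
def transform(order_data_dict): ...


@task
def load(total_order_value): ...


order_data = extract()
order_summary = transform(order_data)
load(order_summary["total_order_value"])
print(edges)
```

The three calls mirror the slide line for line, and the recorded edges form the chain extract, transform, load without any explicit `>>`.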

  22. This is actually a new feature in Airflow 2.0
    @task
    def hello_name(name: str):
        print(f'Hello {name}!')

    hello_name('Airflow users')

  23. Give it a quick try with Docker
