Upgrade to Pro — share decks privately, control downloads, hide ads and more …

dbt coreとFargateでデータ変換 / 20240928-jawsug-toyama...

kasacchiful
September 28, 2024

dbt coreとFargateでデータ変換 / 20240928-jawsug-toyama-hokuriku-shinkansen

2024/09/28 (土) JAWS-UG富山#1 & JAWS-UG北陸新幹線#2 で発表した資料

イベントページ
https://jawsug-toyama.connpass.com/event/328774/

kasacchiful

September 28, 2024
Tweet

More Decks by kasacchiful

Other Decks in Programming

Transcript

  1. Ϟσϧͷྫ ECU  {{ config( materialized='view' )}} with final as

    ( select c.customer_id, c.first_name, c.last_name, co.first_order_date, co.most_recent_order_date, coalesce(co.number_of_orders, 0) as number_of_orders from {{ ref('customers') }} as c left join {{ ref('customer_orders') }} as co on c.customer_id = co.customer_id ) select * from final
  2. +JOKBςϯϓϨʔτΛ࢖ͬͨΫΤϦ wෳࡶͳΫΤϦॲཧͷهड़͕Մೳ wศརͳϚΫϩ͕༻ҙ͞Ε͍ͯΔ ECU  select c.customer_id, c.first_name, c.last_name, co.first_order_date,

    co.most_recent_order_date, coalesce(co.number_of_orders, 0) as number_of_orders from {{ ref('customers') }} as c left join {{ ref('customer_orders') }} as co on c.customer_id = co.customer_id
  3. ςετ wTDIFNBZNMఆٛʹͯɺϞσϧ͕ظ଴௨Γಈ࡞͢Δ͔ςε τՄೳ ECU  version: 2 models: - name:

    customers columns: - name: customer_id tests: - unique - not_null - name: stg_customers columns: - name: customer_id tests: - name: stg_orders columns: - name: order_id tests: - unique - not_null - name: status tests: - accepted_values: values: ['placed', 'shipped', 'completed', 'return_pending', 'returned'] - name: customer_id tests: - not_null - relationships: to: ref('stg_customers') field: customer_id
  4. wDELϓϩδΣΫτ഑ԼʹECUϓϩδΣΫτΛ࡞੒ wDELUZQFTDSJQUECUQZUIPO "84ͷߏ੒͸$%,Ͱఆٛ  ## CDKϓϩδΣΫτ࡞੒ npx cdk init --langage

    typescript mkdir dbt cd dbt pip install dbt-core dbt-athena-community ## dbtϓϩδΣΫτ࡞੒ dbt init dbt_src --profiles-dir .
  5. σΟϨΫτϦߏ੒ྫ  / ├─ bin │ └─ dbt_core_env.ts ## CDKؔ࿈

    ├─ dbt ## Dockerίϯςφؔ࿈ │ ├─ dbt_src ## dbtؔ࿈ │ │ ├─ <dbtؔ࿈ϑΝΠϧɾσΟϨΫτϦ> │ │ ├─ dbt_project.yml │ │ ├─ profiles.yml │ │ └─ run_dbt.sh │ ├─ Dockerfile │ └─ requirements.txt ├─ lib │ └─ dbt_core_env-stack.ts ## CDKελοΫؔ࿈ ├─ .gitignore ├─ cdk.json └─ <লུ>
  6. wSVO@ECUTIΛ࣮ߦ͢ΔΑ͏ʹઃఆ %PDLFS fi MF  FROM python:3.12 ADD dbt_src /dbt_src

    COPY requirements.txt / RUN pip install -U pip RUN pip install --no-cache-dir -r /requirements.txt RUN pip install --no-cache-dir awscli WORKDIR /dbt_src RUN chmod -R 755 . ENTRYPOINT [ "/bin/sh", "-c" ] CMD [ "./run_dbt.sh" ]
  7. wECUSVO΍ECUEPDTHFOFSBUFΛ࣮ߦ͠ɺυΩϡϝϯτΛ 4όέοτʹίϐʔ SVO@ECUTI  #!/bin/bash set -e echo "Running dbt

    commands" dbt run --profiles-dir . --project-dir . --target dev echo "" echo "Generate documentation files" dbt docs generate --profiles-dir . --project-dir . --target dev echo "" echo "Copying dbt documentation files for hosting" aws s3 cp target/ s3://dbtcoreenvkas-dbt-doc-bucket/dbt_core_env/ --recursive -- exclude="*" --include="*.json" --include="*.html" echo ""
  8. w&$3ʹͯϓογϡίϚϯυ͕දࣔ͞ΕΔͷͰɺͦΕʹैͬͯ ΠϝʔδΛϓογϡ &$3ʹίϯςφΠϝʔδΛϓογϡ  ## docker login aws ecr get-login-password

    --region ap-northeast-1 | docker login --username AWS -- password-stdin <AWSΞΧ΢ϯτID>.dkr.ecr.ap-northeast-1.amazonaws.com ## docker build docker build -t dbtcoreenvkas-dbt-ecr-repo . ## tagging docker tag dbtcoreenvkas-dbt-ecr-repo:latest <AWSΞΧ΢ϯτID>.dkr.ecr.ap- northeast-1.amazonaws.com/dbtcoreenvkas-dbt-ecr-repo:latest ## ECRʹϓογϡ docker push <AWSΞΧ΢ϯτID>.dkr.ecr.ap-northeast-1.amazonaws.com/dbtcoreenvkas-dbt-ecr- repo:latest