nicor88
October 14, 2019

dbt serverless - How to run dbt in your AWS account

A blueprint for running dbt on AWS with ECS containers and Step Functions, without managing servers.

Transcript

  1. whoami
     • Currently: Senior Data Engineer at hey.car
     • Previously: 8fit, Babbel, Engineering SpA
     • Specialities: data plumbing, data modelling, cloud data architectures
  2. Running dbt in Production (from dbt docs)
     • Using dbt Cloud
     • Using Airflow
       ◦ Using dbt-cloud-plugin
       ◦ Using the Bash operator
     • Using an automation server (CodeDeploy, GitLab CI/CD, Bamboo or Jenkins)
     • Using cron
  3. Benefits of Containers
     • Platform independence: build it once, run it anywhere
     • Effective isolation and resource sharing
     • Improved developer productivity and development pipeline
     • Easy integration with Continuous Delivery pipelines
  4. This is what a dbt Docker image looks like

         FROM python:3.7.4-slim-stretch
         MAINTAINER nicor88
         RUN pip install dbt==0.14.3
         COPY config/profiles.dist.yml /root/.dbt/profiles.yml
         WORKDIR /dbt
         COPY dbt_project.yml /dbt/dbt_project.yml
         COPY macros /dbt/macros
         COPY models /dbt/models
         COPY tests /dbt/tests
         # install dbt deps
         RUN dbt deps
  5. ECS Fargate
     • Launched in 2017
     • Runs containers without maintaining the underlying infrastructure
       ◦ No EC2 machine needed
     • Scales your application based on your needs
     • Supports longer executions than AWS Lambda (limited to 15 minutes)
     • Limited volume size: only 4 GB
     • In 2019 AWS dropped the pricing for AWS Fargate by up to 50%
  6. AWS Fargate Pricing
     • Per vCPU per hour (e.g. eu-west-1): $0.04048
     • Per vCPU per second (e.g. eu-west-1): $0.00001124444444
     Example 1: a container with 0.25 vCPU running for 1 hour every day costs $0.3036 per month.
     Example 2: a container with 0.25 vCPU running for 20 minutes every hour costs $2.4288 per month.
     Note: when running dbt you can use the minimum container size, because the computation happens in the database.
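The two pricing examples above can be reproduced with a few lines of arithmetic. This is a minimal sketch, assuming a 30-day month and counting only the vCPU charge as the slide does (Fargate also bills memory per GB, which is left out here). The function name `monthly_vcpu_cost` is hypothetical.

```python
# Price quoted on the slide for eu-west-1, in USD per vCPU per hour.
VCPU_PRICE_PER_HOUR = 0.04048


def monthly_vcpu_cost(minutes_per_run, runs_per_day, vcpu=0.25, days=30):
    """Return the monthly vCPU charge in USD for a scheduled container.

    Assumes a 30-day month by default and ignores the memory charge,
    matching the simplification used in the slide's examples.
    """
    hours_per_month = minutes_per_run / 60 * runs_per_day * days
    return hours_per_month * vcpu * VCPU_PRICE_PER_HOUR


# Example 1: one 60-minute run per day.
example_1 = monthly_vcpu_cost(minutes_per_run=60, runs_per_day=1)
# Example 2: one 20-minute run every hour (24 runs per day).
example_2 = monthly_vcpu_cost(minutes_per_run=20, runs_per_day=24)

print(f"Example 1: ${example_1:.4f} per month")
print(f"Example 2: ${example_2:.4f} per month")
```

Dividing the hourly price by 3600 gives the per-second figure quoted on the slide.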
  7. Ingredients for a serverless setup in AWS
     • Basic networking: VPC, Internet Gateway, Subnet, Security Group
     • ECR registry (or another Docker repository)
     • ECS (Elastic Container Service) cluster: only a logical grouping of tasks
     • IAM role for the ECS task + IAM policy
     • CloudWatch Log Group
     • ECS task definition with launch type FARGATE
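To make the last ingredient more concrete, here is a rough sketch of a Fargate task definition expressed as the keyword arguments of boto3's `register_task_definition`. It is not taken from the deck: the family and container name `dbt`, the 0.25 vCPU / 512 MB sizing, and every argument value passed in are assumptions for illustration.

```python
def build_dbt_task_definition(image, execution_role_arn, log_group, region):
    """Build keyword arguments for ECS register_task_definition.

    All names and sizes here are illustrative assumptions.
    """
    return {
        "family": "dbt",  # hypothetical task family name
        "requiresCompatibilities": ["FARGATE"],
        "networkMode": "awsvpc",  # required network mode for Fargate tasks
        "cpu": "256",     # 0.25 vCPU, the smallest Fargate size
        "memory": "512",  # MiB
        "executionRoleArn": execution_role_arn,
        "containerDefinitions": [
            {
                "name": "dbt",  # hypothetical container name
                "image": image,
                "command": ["dbt", "run"],
                "essential": True,
                # Ship container output to the CloudWatch Log Group.
                "logConfiguration": {
                    "logDriver": "awslogs",
                    "options": {
                        "awslogs-group": log_group,
                        "awslogs-region": region,
                        "awslogs-stream-prefix": "dbt",
                    },
                },
            }
        ],
    }


def register(task_definition):
    """Register the task definition; needs boto3 and AWS credentials."""
    import boto3  # imported lazily so the builder works without boto3

    return boto3.client("ecs").register_task_definition(**task_definition)
```

In practice the same resource is often declared with an infrastructure-as-code tool instead; the dictionary above only shows which fields tie the image, the IAM role and the log group together.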
  8. (no text on this slide)

  9. Orchestration and scheduling
     • ECS containers can be orchestrated using AWS Step Functions.
     • A state machine can be triggered by CloudWatch Events using simple cron syntax
     • Step Functions enables the execution of complex workflows
       ◦ We can ingest data from an API using a Lambda function, then trigger a dbt run and a dbt test
       ◦ We can be informed when a dbt run/test fails
     • An example can be found here
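The workflow described above (a dbt run, then a dbt test, with a notification on failure) could be sketched as an Amazon States Language definition built in Python. This is an illustration under assumptions, not the example the slide links to: the state names, the container name `dbt`, the use of an SNS topic for the failure notice, and the public IP setting are all hypothetical choices.

```python
import json


def build_state_machine(cluster_arn, task_definition_arn, subnets,
                        security_groups, topic_arn, container_name="dbt"):
    """Return a state machine definition: dbt run, then dbt test."""

    def dbt_state(command, next_state=None):
        state = {
            "Type": "Task",
            # Step Functions' ECS integration; ".sync" waits for the task to end.
            "Resource": "arn:aws:states:::ecs:runTask.sync",
            "Parameters": {
                "LaunchType": "FARGATE",
                "Cluster": cluster_arn,
                "TaskDefinition": task_definition_arn,
                "NetworkConfiguration": {
                    "AwsvpcConfiguration": {
                        "Subnets": subnets,
                        "SecurityGroups": security_groups,
                        # Assumption: tasks run in a public subnet.
                        "AssignPublicIp": "ENABLED",
                    }
                },
                "Overrides": {
                    "ContainerOverrides": [
                        {"Name": container_name, "Command": command}
                    ]
                },
            },
            # Any error in this step is routed to the notification state.
            "Catch": [{"ErrorEquals": ["States.ALL"], "Next": "NotifyFailure"}],
        }
        if next_state:
            state["Next"] = next_state
        else:
            state["End"] = True
        return state

    return {
        "StartAt": "DbtRun",
        "States": {
            "DbtRun": dbt_state(["dbt", "run"], next_state="DbtTest"),
            "DbtTest": dbt_state(["dbt", "test"]),
            "NotifyFailure": {
                "Type": "Task",
                "Resource": "arn:aws:states:::sns:publish",
                "Parameters": {
                    "TopicArn": topic_arn,
                    "Message": "dbt run or test failed",
                },
                "Next": "Failed",
            },
            "Failed": {"Type": "Fail"},
        },
    }


definition = build_state_machine(
    cluster_arn="arn:aws:ecs:eu-west-1:123456789012:cluster/example",
    task_definition_arn="arn:aws:ecs:eu-west-1:123456789012:task-definition/dbt",
    subnets=["subnet-00000000"],
    security_groups=["sg-00000000"],
    topic_arn="arn:aws:sns:eu-west-1:123456789012:example-topic",
)
print(json.dumps(definition, indent=2))
```

The ARNs and IDs in the usage lines are placeholders. A Lambda ingestion step, as mentioned in the bullets, would be one more `Task` state placed before `DbtRun`.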
  10. How to release new dbt models?
     Package a new Docker image with the latest models on each merge to your master branch:
     • CircleCI (e.g. you can use this orb to deploy to ECR)
     • GitLab CI/CD
     • AWS CodePipeline with CodeBuild
     • GitHub Actions
     If the task definition references a mutable tag such as latest, each new ECS task run picks up the newest image.
  11. Trigger ECS tasks from Airflow
     • Develop your own operator to trigger dbt jobs in ECS from Airflow using boto3.
     • A cheap alternative compared to dbt Cloud
     Here is an example of what an ECS plugin for Airflow looks like
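The core of such an operator is a single boto3 `run_task` call. The sketch below is not the plugin referenced on the slide; it is a hypothetical helper, `run_dbt_task`, that a custom operator's `execute` method could call. The container name `dbt` and the public IP setting are assumptions.

```python
def run_dbt_task(ecs_client, cluster, task_definition, subnets,
                 security_groups, command, container_name="dbt"):
    """Start a Fargate task that runs a dbt command; return the task ARN.

    ecs_client is expected to be boto3.client("ecs") or a compatible object.
    """
    response = ecs_client.run_task(
        cluster=cluster,
        taskDefinition=task_definition,
        launchType="FARGATE",
        networkConfiguration={
            "awsvpcConfiguration": {
                "subnets": subnets,
                "securityGroups": security_groups,
                # Assumption: the task runs in a public subnet.
                "assignPublicIp": "ENABLED",
            }
        },
        # Override the image's default command, e.g. ["dbt", "test"].
        overrides={
            "containerOverrides": [
                {"name": container_name, "command": command}
            ]
        },
    )
    # run_task reports placement problems in "failures" instead of raising.
    if response.get("failures"):
        raise RuntimeError(f"ECS could not start the task: {response['failures']}")
    return response["tasks"][0]["taskArn"]
```

Passing the client in as an argument keeps the helper easy to test with a stub. Note that this call only starts the task: an operator that should fail when dbt fails would also need to wait for the task to stop and inspect the container's exit code.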
  12. Resources
     • Running dbt in Production
     • AWS ECS Fargate Deep Dive
     • ECS Workshop
     • AWS Fargate Pricing
     • GitHub repo dbt-serverless