

nicor88
October 14, 2019

dbt serverless - How to run dbt in your AWS account

A blueprint on how to run dbt in AWS using ECS containers and StepFunctions without using servers.



Transcript

  1. whoami
    • Currently: Senior Data Engineer at hey.car
    • Previously: 8fit, Babbel, Engineering SpA
    • Specialities: data plumbing, data modelling, cloud data architectures
  2. Running dbt in Production (from dbt docs)
    • Using dbt Cloud
    • Using Airflow
      ◦ Using dbt-cloud-plugin
      ◦ Using the bash operator
    • Using an automation server (CodeDeploy, GitLab CI/CD, Bamboo or Jenkins)
    • Using cron
  3. Benefits of Containers
    • Platform independence: build it once, run it anywhere
    • Effective isolation and resource sharing
    • Improved developer productivity and development pipeline
    • Easy integration with Continuous Delivery pipelines
  4. What a dbt Docker image looks like

        FROM python:3.7.4-slim-stretch
        MAINTAINER nicor88

        RUN pip install dbt==0.14.3

        COPY config/profiles.dist.yml /root/.dbt/profiles.yml

        WORKDIR /dbt
        COPY dbt_project.yml /dbt/dbt_project.yml
        COPY macros /dbt/macros
        COPY models /dbt/models
        COPY tests /dbt/tests

        # install dbt deps
        RUN dbt deps
  5. ECS Fargate
    • Launched in 2017
    • Run containers without maintaining the underlying infrastructure
      ◦ No EC2 machine needed
    • Scale your application based on your needs
    • Supports longer executions than AWS Lambda (limited to 15 minutes)
    • Limited volume size: only 4 GB
    • In 2019 AWS dropped the pricing for AWS Fargate by up to 50%
  6. AWS Fargate Pricing
    • Per vCPU per hour (e.g. eu-west-1): 0.04048 $
    • Per vCPU per second (e.g. eu-west-1): 0.00001124444444 $
    Example 1: a container running for 1 hour every day costs 0.3036 $ per month with 0.25 vCPU.
    Example 2: a container running for 20 minutes every hour costs 2.4288 $ per month with 0.25 vCPU.
    Note: when running dbt you can use the minimum container size, because the computation happens in the database.
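The two examples on the pricing slide can be reproduced with a few lines of arithmetic. This is a minimal sketch, assuming a 30-day month and counting only the vCPU charge quoted on the slide (Fargate also bills for memory, which the slide's figures do not include). The function name `monthly_vcpu_cost` is hypothetical.

```python
# vCPU price per hour as quoted on the slide for eu-west-1 (2019).
# Memory is billed separately and is deliberately left out here.
VCPU_PRICE_PER_HOUR = 0.04048


def monthly_vcpu_cost(minutes_per_run, runs_per_day, vcpu=0.25, days=30):
    """Return the vCPU-only cost in USD for one month of scheduled runs.

    Assumption: a month has `days` days (30 matches the slide's numbers).
    """
    hours_per_month = minutes_per_run / 60 * runs_per_day * days
    return hours_per_month * vcpu * VCPU_PRICE_PER_HOUR


# Example 1: one run of 1 hour per day.
example_1 = monthly_vcpu_cost(minutes_per_run=60, runs_per_day=1)
# Example 2: one run of 20 minutes every hour, i.e. 24 runs per day.
example_2 = monthly_vcpu_cost(minutes_per_run=20, runs_per_day=24)

print(f"Example 1: {example_1:.4f} $ per month")
print(f"Example 2: {example_2:.4f} $ per month")
```

Example 1 amounts to 30 container-hours per month and Example 2 to 240, which is where the factor of eight between the two figures comes from.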
  7. Ingredients for a serverless setup in AWS
    • Basic networking: VPC, Internet Gateway, Subnet, Security Group
    • ECR registry (or Docker repository)
    • ECS (Elastic Container Service) cluster: only a logical grouping of tasks
    • IAM role for the ECS task + IAM policy
    • CloudWatch Log Group
    • ECS task definition with launch type FARGATE
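To make the last ingredient more concrete, here is a rough sketch of a Fargate task definition expressed as the keyword arguments that boto3's `register_task_definition` accepts. It is not taken from the deck: the helper name `build_dbt_task_definition`, the family and container name `dbt`, the default command, and the use of separate execution and task roles are all assumptions for illustration. The 256 CPU units (0.25 vCPU) with 512 MB of memory correspond to the smallest Fargate task size.

```python
def build_dbt_task_definition(image, execution_role_arn, task_role_arn,
                              log_group, region="eu-west-1"):
    """Build kwargs for ecs_client.register_task_definition(**kwargs).

    All names and values are illustrative assumptions, not the deck's setup.
    """
    return {
        "family": "dbt",                        # hypothetical family name
        "networkMode": "awsvpc",                # required for Fargate
        "requiresCompatibilities": ["FARGATE"],
        "cpu": "256",                           # 0.25 vCPU
        "memory": "512",                        # MiB
        "executionRoleArn": execution_role_arn,  # pull image, write logs
        "taskRoleArn": task_role_arn,            # permissions for dbt itself
        "containerDefinitions": [
            {
                "name": "dbt",
                "image": image,
                "essential": True,
                "command": ["dbt", "run"],      # can be overridden per run
                "logConfiguration": {
                    "logDriver": "awslogs",
                    "options": {
                        "awslogs-group": log_group,
                        "awslogs-region": region,
                        "awslogs-stream-prefix": "dbt",
                    },
                },
            }
        ],
    }


# Placeholder values only; replace with real ARNs and image URI.
task_definition = build_dbt_task_definition(
    image="<account-id>.dkr.ecr.eu-west-1.amazonaws.com/dbt:latest",
    execution_role_arn="<execution-role-arn>",
    task_role_arn="<task-role-arn>",
    log_group="/ecs/dbt",
)
```

With boto3 installed and credentials configured, the dictionary could be passed as `boto3.client("ecs").register_task_definition(**task_definition)`.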
  8. [Slide with no transcribed text]

  9. Orchestration and scheduling
    • ECS containers can be orchestrated using AWS Step Functions.
    • A state machine can be triggered by CloudWatch Events using simple cron syntax
    • Step Functions enables the execution of complex workflows
      ◦ We can ingest data from an API using a Lambda function, then trigger a dbt run and a dbt test
      ◦ We can be informed when a dbt run/test fails
    • An example can be found here
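The workflow described on this slide could look roughly like the following state machine definition, written as a Python dictionary and serialized to Amazon States Language JSON. This is a sketch under assumptions, not the example the slide links to: the state names, the container name `dbt`, the placeholder ARNs, and the SNS notification step are all hypothetical, and the Lambda ingestion step is omitted for brevity. It relies on the Step Functions service integrations `ecs:runTask.sync` and `sns:publish`.

```python
import json


def dbt_task_state(command, cluster, task_definition, subnets,
                   security_groups, next_state=None):
    """Return a Task state that runs one dbt command on Fargate and waits."""
    state = {
        "Type": "Task",
        # .sync makes Step Functions wait until the ECS task has stopped.
        "Resource": "arn:aws:states:::ecs:runTask.sync",
        "Parameters": {
            "LaunchType": "FARGATE",
            "Cluster": cluster,
            "TaskDefinition": task_definition,
            "NetworkConfiguration": {
                "AwsvpcConfiguration": {
                    "Subnets": subnets,
                    "SecurityGroups": security_groups,
                    "AssignPublicIp": "ENABLED",
                }
            },
            "Overrides": {
                "ContainerOverrides": [{"Name": "dbt", "Command": command}]
            },
        },
        # Route any error to the notification state.
        "Catch": [{"ErrorEquals": ["States.ALL"], "Next": "NotifyFailure"}],
    }
    if next_state:
        state["Next"] = next_state
    else:
        state["End"] = True
    return state


# Placeholder infrastructure identifiers (assumptions).
common = dict(cluster="<cluster-arn>", task_definition="dbt",
              subnets=["<subnet-id>"], security_groups=["<sg-id>"])

definition = {
    "Comment": "Run dbt, then test it; notify on failure (sketch)",
    "StartAt": "DbtRun",
    "States": {
        "DbtRun": dbt_task_state(["dbt", "run"], next_state="DbtTest", **common),
        "DbtTest": dbt_task_state(["dbt", "test"], **common),
        "NotifyFailure": {
            "Type": "Task",
            "Resource": "arn:aws:states:::sns:publish",
            "Parameters": {"TopicArn": "<topic-arn>",
                           "Message": "dbt run/test failed"},
            "Next": "Failed",
        },
        "Failed": {"Type": "Fail"},
    },
}

print(json.dumps(definition, indent=2))
```

The printed JSON is what would be supplied as the state machine definition; a scheduled rule with a cron expression can then start an execution of that state machine.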
  10. How to release new dbt models?
    Package a new Docker image with the latest models on each merge to your master branch:
    • CircleCI (e.g. you can use this orb to deploy to ECR)
    • GitLab CI/CD
    • AWS CodePipeline with CodeBuild
    • GitHub Actions
    The ECS task will always use the latest image.
  11. Trigger ECS tasks from Airflow
    • Develop your own operator to trigger dbt jobs in ECS from Airflow using boto3.
    • Cheap alternative compared to dbt Cloud
    Here is an example of what an ECS plugin for Airflow looks like
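The plugin shown on the slide did not survive in the transcript. As a stand-in, the core of such an operator is a call to boto3's ECS `run_task`; the sketch below shows that call in isolation. The function names `build_run_task_kwargs` and `trigger_dbt`, the container name `dbt`, and the placeholder identifiers are assumptions, and this is not the author's plugin.

```python
def build_run_task_kwargs(cluster, task_definition, subnets, security_groups,
                          command, container_name="dbt"):
    """Build the keyword arguments for the ECS run_task API call."""
    return {
        "cluster": cluster,
        "taskDefinition": task_definition,
        "launchType": "FARGATE",
        "networkConfiguration": {
            "awsvpcConfiguration": {
                "subnets": subnets,
                "securityGroups": security_groups,
                "assignPublicIp": "ENABLED",
            }
        },
        # Override the container command so one image serves run and test.
        "overrides": {
            "containerOverrides": [
                {"name": container_name, "command": command}
            ]
        },
    }


def trigger_dbt(ecs_client, **kwargs):
    """Start the task and return the ARNs of the tasks that were launched.

    `ecs_client` is expected to be boto3.client("ecs") or a compatible object.
    """
    response = ecs_client.run_task(**build_run_task_kwargs(**kwargs))
    return [task["taskArn"] for task in response["tasks"]]


# Placeholder values only (assumptions).
run_kwargs = build_run_task_kwargs(
    cluster="<cluster-name>",
    task_definition="dbt",
    subnets=["<subnet-id>"],
    security_groups=["<sg-id>"],
    command=["dbt", "run"],
)
```

Inside a custom Airflow operator, `trigger_dbt(boto3.client("ecs"), ...)` would typically be called from the operator's `execute` method, followed by polling until the task stops so that Airflow can mark the run as succeeded or failed.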
  12. Resources
    • Running dbt in Production
    • AWS ECS Fargate Deep Dive
    • ECS Workshop
    • AWS Fargate Pricing
    • Github repo dbt-serverless