Upgrade to Pro — share decks privately, control downloads, hide ads and more …

Where should I run my Database?

OnGres
March 23, 2023

Where should I run my Database?

Until recently, there were two main possible approaches to running your database: "manually" (either on-prem or on bare cloud instances) or using a cloud-provided managed service (DBaaS). Since some years ago, there's a growing trend: run them on Kubernetes.

This talk explores and compare the three alternatives, so you can choose the best one for your use case.

OnGres

March 23, 2023
Tweet

More Decks by OnGres

Other Decks in Technology

Transcript

  1. Where should I run my DB
    Where should I run my
    database?
    (on-prem, DBaaS or K8s)
    Alvaro Hernandez
    @ahachete

    View Slide

  2. Where should I run my DB
    ` whoami `
    Alvaro Hernandez

    aht.es
    ● Founder & CEO, OnGres
    ● 20+ years Postgres user and DBA
    ● Mostly doing R&D to create new, innovative
    software on Postgres
    ● More than 120 tech talks, most about Postgres
    ● Founder and President of the NPO Fundación
    PostgreSQL
    ● AWS Data Hero

    View Slide

  3. Where should I run my DB
    What are the possible
    options?

    View Slide

  4. Where should I run my DB
    What are the possible options?
    ● On-prem (or cloud instances)
    ● Kubernetes
    ● DBaaS (managed service)

    View Slide

  5. Where should I run my DB
    Deploying Postgres
    “on-prem”

    View Slide

  6. Where should I run my DB
    apt-get install postgresql
    # yes but well...
    # will you deploy this to prod?
    How to deploy Postgres

    View Slide

  7. Where should I run my DB
    OK, we need to tune the database
    2-8h
    Postgres DBA

    View Slide

  8. Where should I run my DB
    We need to add connection pooling
    pg_bench, scale 2000, m4.large
    (2 vCPU, 8GB RAM, 1k IOPS)
    4-16h
    DevOps / pgDBA

    View Slide

  9. Where should I run my DB
    And High Availability!
    8-24h
    DevOps / pgDBA
    ● HA software (e.g. Patroni)
    ● Distributed configuration
    ● Entrypoint:
    ○ DNS?
    ○ Virtual IP?
    ○ External discovery service (e.g. Consul)?

    View Slide

  10. Where should I run my DB
    Do you backup your data?
    4-16h
    DevOps
    ● Backup software (e.g. WAL-G, pgBackRest)
    ● Backup Storage
    ● Backups lifecycle management
    ● Backup testing / restoration

    View Slide

  11. Where should I run my DB
    You wouldn’t deploy Postgres without monitoring, would you?
    8-24h
    DevOps / pgDBA

    View Slide

  12. Where should I run my DB
    Do you leave Postgres logs on each server?
    4-48h
    DevOps
    ● Configure CSV logging
    ● Add a logging agent (e.g. FluentBit) to export
    logs
    ● Add a logging collector (e.g. Fluentd) to collect
    logs, write code to store it and manage lifecycle.
    ● Or use a paid logs-as-a-Service

    View Slide

  13. Where should I run my DB
    For advanced users: proxy Postgres traffic with Envoy
    8-16h
    DevOps
    ● Exports additional network metrics to
    Prometheus
    ● Offloads Postgres SSL
    https://www.cncf.io/blog/2020/08/13/envoy-1-15-in
    troduces-a-new-postgres-extension-with-monitoring-
    support/

    View Slide

  14. Where should I run my DB
    Install cluster management software
    ?h
    DevOps
    ??????????????

    View Slide

  15. Where should I run my DB
    IaC: Infrastructure as Code
    48-96h
    DevOps

    View Slide

  16. Where should I run my DB
    This is the whole stack you need to deploy Postgres

    View Slide

  17. Where should I run my DB
    Deploying Postgres
    on Kubernetes

    View Slide

  18. Where should I run my DB
    Fighting with K8s?
    ● Kelsey Hightower argues that you need to “fight” K8s to run stateful
    workloads.
    ● Certainly, a bit. But is doable.
    ● Operators have done this already.
    Don’t run databases on Kuberntes “by hand”, use operators.

    View Slide

  19. Where should I run my DB
    Deploy a simple cluster with Kubernetes (w/ StackGres)
    1h
    CKA
    apiVersion: stackgres.io/v1
    kind: SGCluster
    metadata:
    name: simple
    spec:
    instances: 2
    postgres:
    version: 'latest'
    pods:
    persistentVolume:
    size: '100Gi'

    View Slide

  20. Where should I run my DB
    Deploy an advanced cluster with Kubernetes (w/ StackGres)
    4-16h
    CKA
    ● Create YAMLs for several CRDs
    ● Create Ingress if needed
    ● Expose Web Console (Ingress/LB)
    ● Integrate with GitOps

    View Slide

  21. Where should I run my DB
    Total time to deploy whole Postgres Stack
    Without Kubernetes Kubernetes
    Postgres Configuration 2-8h (DBA)
    Connection pooling 4-16h (DevOps + DBA)
    High Availability 8-24h (DevOps + DBA)
    Backups 4-16h (DevOps)
    Monitoring 8-24h (DevOps + DBA)
    Distributed Logs 4-48h (DevOps)
    Envoy proxy 8-16h (DevOps)
    Cluster Management ????
    IaC 48-96h (DevOps)
    Total: 86-248h (DevOps + DBA) 4-16h CKA

    View Slide

  22. Where should I run my DB
    ● Kubernetes also allows to automate Day 2 operations
    ● CKA is enough, mostly no Postgres expertise needed
    ● E.g. Day 2 operations implemented in StackGres:
    ○ Repack
    ○ Vacuum
    ○ Repack
    ○ Minor version upgrade
    ○ Major version upgrade
    ○ Controlled restart
    ○ Benchmark
    Automating Day 2 operations

    View Slide

  23. Where should I run my DB
    Managed Services
    (DBaaS)

    View Slide

  24. Where should I run my DB
    DBaaS costs vs instances
    ● Instances cost: 85%-150% more expensive:
    ○ E.g. RDS vs EC2 is 1.85x
    ○ Plus you need an extra instance (N+1) for high availability
    ○ Estimate price overhead as 1.8*(N+1)/N → N the number of instances
    ● Storage costs:
    ○ AWS: higher cost on RDS (gp2, gp3 overpriced vs EC2)
    ○ Pay separately for I/O ops (e.g. Aurora)
    ● For comparison: on-prem or Kubernetes: same as with cloud instances

    View Slide

  25. Where should I run my DB
    Do DBaaS allow you to “forget about the DB”?
    ● What do they provide:
    ○ High availability with automated failover
    ○ Automated backups
    ○ Monitoring
    ○ Typically a bit of database parameter tuning
    ● But what they don’t:
    ○ No database support (not infra support, I mean db support!)
    ○ Deep parameter tuning. Query tuning. DDL tuning.
    ○ Day 2 operations like bloat removal, reindex, etc.

    View Slide

  26. Where should I run my DB
    DBaaS limitations
    ● Limited number of Postgres extensions:
    ○ RDS: 80
    ○ E.g. StackGres: 130+, adding new every week
    ○ No/few clouds support Timescale (Apache + TSL) or Citus
    ● Connection pooling:
    ○ RDS: not by default, additional cost (RDS Proxy).
    ○ Other DBaaS not even an option.
    ● Limited automation for “Day 2 operations”
    ● Vendor lock-in (how easy is to take your data out?)

    View Slide

  27. Where should I run my DB
    Q & A
    Alvaro Hernandez
    @ahachete

    View Slide