Where should I run my Database? Databases on Kubernetes?

OnGres
March 23, 2023

As of today, there are two main ways to run your database: in the cloud, consumed as a service; and self-hosted.

Self-hosting was the only option before the cloud, was then replaced by DBaaS as the default, and is now making a comeback, with good reason. But would you self-host your database the way it was done before? Surely not.
Enter DBaaS-like services on Kubernetes. We will explore:
* What Kelsey Hightower thinks about the topic.
* Why you should use operators for databases on Kubernetes.
* What capabilities databases on Kubernetes provide vs what cloud does.
* How to decide where you should run your database.
* What’s the current landscape of solutions to run your database on Kubernetes.

This talk also featured a short live demo to showcase how to run databases on EKS with StackGres (https://stackgres.io), an open source Postgres operator.

Transcript

  1. Databases on Kubernetes: Yay or Nay?
    Where should I run my database?
    Databases on Kubernetes?
    Alvaro Hernandez
    @ahachete

  2. Databases on Kubernetes: Yay or Nay?
    ` whoami `
    Alvaro Hernandez

    aht.es
    ● Founder & CEO, OnGres
    ● 20+ years Postgres user and DBA
    ● Mostly doing R&D to create new, innovative software on Postgres
    ● More than 120 tech talks, most about Postgres
    ● Founder and President of the NPO Fundación PostgreSQL
    ● AWS Data Hero

  3. Databases on Kubernetes: Yay or Nay?
    Where may I run my DB?

  4. Databases on Kubernetes: Yay or Nay?
    Possible options to run your DB
    ● On-prem (or cloud instances)
    ● DBaaS (managed service)
    ● Kubernetes (cloud or on-prem)

  5. Databases on Kubernetes: Yay or Nay?
    Deploying Postgres “on-prem”

  6. Databases on Kubernetes: Yay or Nay?
    How to deploy Postgres
    apt-get install postgresql
    # yes but well...
    # will you deploy this to prod?

  7. Databases on Kubernetes: Yay or Nay?
    OK, we need to tune the database
    2-8h
    Postgres DBA

  8. Databases on Kubernetes: Yay or Nay?
    We need to add connection pooling
    pgbench, scale 2000, m4.large (2 vCPU, 8GB RAM, 1k IOPS)
    4-16h
    DevOps / pgDBA

  9. Databases on Kubernetes: Yay or Nay?
    And High Availability!
    8-24h
    DevOps / pgDBA
    ● HA software (e.g. Patroni)
    ● Distributed configuration
    ● Entrypoint:
    ○ DNS?
    ○ Virtual IP?
    ○ External discovery service (e.g. Consul)?

  10. Databases on Kubernetes: Yay or Nay?
    Do you back up your data?
    4-16h
    DevOps
    ● Backup software (e.g. WAL-G, pgBackRest)
    ● Backup Storage
    ● Backups lifecycle management
    ● Backup testing / restoration

  11. Databases on Kubernetes: Yay or Nay?
    You wouldn’t deploy Postgres without monitoring, would you?
    8-24h
    DevOps / pgDBA

  12. Databases on Kubernetes: Yay or Nay?
    Do you leave Postgres logs on each server?
    4-48h
    DevOps
    ● Configure CSV logging
    ● Add a logging agent (e.g. Fluent Bit) to export logs
    ● Add a log collector (e.g. Fluentd) to collect logs, write code to store them, and manage their lifecycle
    ● Or use a paid logs-as-a-Service

  13. Databases on Kubernetes: Yay or Nay?
    Install cluster management software
    ?h
    DevOps
    ??????????????

  14. Databases on Kubernetes: Yay or Nay?
    IaC: Infrastructure as Code
    48-96h
    DevOps
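
To get a feel for the total, the per-step estimates from the on-prem slides above can simply be added up. The following is a minimal Python sketch; the dictionary keys and the `total_effort` helper are illustrative names, not part of the talk, and the cluster management step is left out because the slide gives it as "?h".

```python
# Effort ranges in hours, copied from the on-prem slides.
# The "cluster management software" step is listed as "?h" and is excluded.
EFFORT_HOURS = {
    "tuning": (2, 8),
    "connection pooling": (4, 16),
    "high availability": (8, 24),
    "backups": (4, 16),
    "monitoring": (8, 24),
    "logging": (4, 48),
    "infrastructure as code": (48, 96),
}


def total_effort(ranges):
    """Return (low, high) totals for a mapping of step -> (low, high) hours."""
    low = sum(lo for lo, _ in ranges.values())
    high = sum(hi for _, hi in ranges.values())
    return low, high


low, high = total_effort(EFFORT_HOURS)
print(f"Known on-prem effort: {low}-{high} hours")
```

With these figures the known steps add up to 78-232 hours, before counting the step with no estimate.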

  15. Databases on Kubernetes: Yay or Nay?
    Managed Services
    (DBaaS)

  16. Databases on Kubernetes: Yay or Nay?
    DBaaS (e.g. RDS)
    ● They provide great value:
    ○ High availability with automated failover
    ○ Automated backups
    ○ Monitoring
    ○ Typically a bit of database parameter tuning
    ● But be aware of what they don’t provide:
    ○ Database support (not infra support, I mean db support!)
    ○ Deep parameter tuning. Query tuning. DDL tuning.
    ○ Day 2 operations like bloat removal, reindex, etc.
    ○ ChatGPT is not managing your DB yet!

  17. Databases on Kubernetes: Yay or Nay?
    Be aware of DBaaS costs vs instances
    ● Good service costs money
    ● Instance cost: 85%-150% more expensive:
    ○ E.g. RDS vs EC2 is 1.85x
    ○ Plus you need an extra instance (N+1) for high availability
    ○ Estimate the price overhead as 1.8*(N+1)/N, where N is the number of instances
    ● Storage costs:
    ○ AWS: higher cost on RDS (gp2, gp3 overpriced vs EC2)
    ○ Pay separately for I/O ops (e.g. Aurora)
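
The overhead estimate 1.8*(N+1)/N is easy to evaluate for a few values of N. The sketch below is only a worked example of that formula; the function name and the `instance_markup` parameter are hypothetical, and the 1.8 default is the factor quoted on the slide, not a measured price.

```python
def dbaas_price_overhead(n_instances, instance_markup=1.8):
    """Evaluate the slide's estimate: markup * (N + 1) / N.

    n_instances is N, the number of instances; instance_markup is the
    per-instance price factor (1.8 on the slide).
    """
    if n_instances < 1:
        raise ValueError("n_instances must be at least 1")
    return instance_markup * (n_instances + 1) / n_instances


for n in (1, 2, 3, 10):
    print(f"N={n}: {dbaas_price_overhead(n):.2f}x")
```

The factor shrinks as N grows, because the one extra instance is spread over more instances, but it never drops below the per-instance markup itself.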

  18. Databases on Kubernetes: Yay or Nay?
    Managed service == you can’t do anything you want
    ● Not all Postgres extensions are available:
    ○ RDS: 80
    ○ E.g. StackGres: 160+, adding new every week
    ○ No/few clouds support Timescale (Apache + TSL) or Citus
    ● Connection pooling:
    ○ RDS: not by default, additional cost (RDS Proxy).
    ○ Other DBaaS: not even an option.
    ● Limited automation for “Day 2 operations”

  19. Databases on Kubernetes: Yay or Nay?
    Deploying Postgres
    on Kubernetes

  20. Databases on Kubernetes: Yay or Nay?
    What Kelsey Hightower thinks
    https://twitter.com/kelseyhightower/status/1624081136073994240

  21. Databases on Kubernetes: Yay or Nay?
    Meeting Kubernetes halfway
    ● Kelsey Hightower argues that you need to “fight” K8s to run stateful workloads.
    ● Certainly, a bit. But it is doable.
    ● Operators have done this already.
    Don’t run databases on Kubernetes “by hand”, use operators.

  22. Databases on Kubernetes: Yay or Nay?
    Deploy a simple cluster with Kubernetes (w/ StackGres)
    1h
    CKA
    apiVersion: stackgres.io/v1
    kind: SGCluster
    metadata:
      name: simple
    spec:
      instances: 2
      postgres:
        version: 'latest'
      pods:
        persistentVolume:
          size: '100Gi'
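
The nesting of this manifest is easier to see when it is built as a data structure. The following Python sketch assembles the same SGCluster fields shown on the slide and prints them as JSON, which `kubectl apply -f` accepts as well as YAML. The `sgcluster_manifest` helper and its parameter names are illustrative assumptions, not part of StackGres.

```python
import json


def sgcluster_manifest(name, instances, volume_size, postgres_version="latest"):
    """Build the SGCluster manifest from the slide as a plain dict.

    Hypothetical helper: only the field names and values come from the slide.
    """
    return {
        "apiVersion": "stackgres.io/v1",
        "kind": "SGCluster",
        "metadata": {"name": name},
        "spec": {
            "instances": instances,
            "postgres": {"version": postgres_version},
            "pods": {"persistentVolume": {"size": volume_size}},
        },
    }


manifest = sgcluster_manifest("simple", instances=2, volume_size="100Gi")
print(json.dumps(manifest, indent=2))
```

Saving the printed output to a file and applying it should be equivalent to applying the YAML version, assuming the StackGres operator and its CRDs are already installed in the cluster.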

  23. Databases on Kubernetes: Yay or Nay?
    Deploy an advanced cluster with Kubernetes (w/ StackGres)
    4-16h
    CKA
    ● Create YAMLs for several CRDs
    ● Create Ingress if needed
    ● Expose Web Console (Ingress/LB)
    ● Integrate with GitOps

  24. Databases on Kubernetes: Yay or Nay?
    Automating Day 2 operations
    ● Kubernetes also allows you to automate Day 2 operations
    ● CKA is enough, mostly no Postgres expertise needed
    ● E.g. Day 2 operations implemented in StackGres:
    ○ Repack
    ○ Vacuum
    ○ Minor version upgrade
    ○ Major version upgrade
    ○ Controlled restart
    ○ Benchmark
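
As a rough illustration of what "automated" means here, a Day 2 operation in StackGres is requested declaratively, as another custom resource. The sketch below assumes the SGDbOps resource with `sgCluster` and `op` fields and the operation names listed in the code; these are taken from memory of the StackGres documentation, not from the slides, so check them against the operator version you run. The `sgdbops_manifest` helper is hypothetical.

```python
import json

# Assumed SGDbOps operation names matching the operations on the slide.
# Verify against the StackGres CRD reference before relying on them.
ASSUMED_OPS = {
    "repack",
    "vacuum",
    "minorVersionUpgrade",
    "majorVersionUpgrade",
    "restart",
    "benchmark",
}


def sgdbops_manifest(name, cluster, op):
    """Build a minimal SGDbOps manifest requesting one operation on a cluster."""
    if op not in ASSUMED_OPS:
        raise ValueError(f"unknown operation: {op}")
    return {
        "apiVersion": "stackgres.io/v1",
        "kind": "SGDbOps",
        "metadata": {"name": name},
        "spec": {"sgCluster": cluster, "op": op},
    }


ops = sgdbops_manifest("vacuum-simple", cluster="simple", op="vacuum")
print(json.dumps(ops, indent=2))
```

Some operations, such as a major version upgrade, need extra operation-specific settings that this sketch does not attempt to model.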

  25. Databases on Kubernetes: Yay or Nay?
    Postgres operators for Kubernetes
    Fully Open Source
    ● CloudNativePG
    ● KubeDB
    ● Kubegres (unmaintained?)
    ● Percona
    ● StackGres
    ● Zalando
    ● New upcoming operators…
    ● …
    Proprietary/paid-for (production)
    ● Crunchy Data
    ● EnterpriseDB
    ● Fujitsu
    ● VMware Tanzu
    ● …

  26. Databases on Kubernetes: Yay or Nay?
    Operator Feature Matrix
    https://github.com/dokc/operator-feature-matrix

  27. Databases on Kubernetes: Yay or Nay?
    Demo

  28. Databases on Kubernetes: Yay or Nay?
    Q & A
    Alvaro Hernandez
    @ahachete
