Slide 1

Slide 1 text

Databases on Kubernetes: Yay or Nay? Where should I run my database? Databases on Kubernetes? Alvaro Hernandez @ahachete

Slide 2

Slide 2 text

`whoami`
Alvaro Hernandez
aht.es
● Founder & CEO, OnGres
● 20+ years Postgres user and DBA
● Mostly doing R&D to create new, innovative software on Postgres
● More than 120 tech talks, most about Postgres
● Founder and President of the NPO Fundación PostgreSQL
● AWS Data Hero

Slide 3

Slide 3 text

Where may I run my DB?

Slide 4

Slide 4 text

Possible options to run your DB
● On-prem (or cloud instances)
● DBaaS (managed service)
● Kubernetes (cloud or on-prem)

Slide 5

Slide 5 text

Deploying Postgres “on-prem”

Slide 6

Slide 6 text

How to deploy Postgres

    apt-get install postgresql
    # yes but well...
    # will you deploy this to prod?

Slide 7

Slide 7 text

OK, we need to tune the database
Effort: 2-8h (Postgres DBA)

Slide 8

Slide 8 text

We need to add connection pooling
Benchmark setup: pgbench, scale 2000, m4.large (2 vCPU, 8GB RAM, 1k IOPS)
Effort: 4-16h (DevOps / pgDBA)

Slide 9

Slide 9 text

And High Availability!
Effort: 8-24h (DevOps / pgDBA)
● HA software (e.g. Patroni)
● Distributed configuration
● Entrypoint:
  ○ DNS?
  ○ Virtual IP?
  ○ External discovery service (e.g. Consul)?

Slide 10

Slide 10 text

Do you back up your data?
Effort: 4-16h (DevOps)
● Backup software (e.g. WAL-G, pgBackRest)
● Backup storage
● Backup lifecycle management
● Backup testing / restoration

Slide 11

Slide 11 text

You wouldn’t deploy Postgres without monitoring, would you?
Effort: 8-24h (DevOps / pgDBA)

Slide 12

Slide 12 text

Do you leave Postgres logs on each server?
Effort: 4-48h (DevOps)
● Configure CSV logging
● Add a logging agent (e.g. FluentBit) to export logs
● Add a logging collector (e.g. Fluentd) to collect logs, and write code to store them and manage their lifecycle
● Or use a paid logs-as-a-service offering

Slide 13

Slide 13 text

Install cluster management software
Effort: ?h (DevOps)
??????????????

Slide 14

Slide 14 text

IaC: Infrastructure as Code
Effort: 48-96h (DevOps)
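
To get a feel for the total, the figures quoted on the on-prem slides can be added up. The sketch below assumes those figures are hours of work and that the tasks are simply additive; the task names are shortened labels, and "Install cluster management software" is left out because its slide only gives "?h".

```python
# Hour ranges (low, high) as quoted on the preceding on-prem slides.
# Assumption: the ranges are effort in hours and can be summed directly.
ESTIMATES = {
    "tuning": (2, 8),
    "connection pooling": (4, 16),
    "high availability": (8, 24),
    "backups": (4, 16),
    "monitoring": (8, 24),
    "log management": (4, 48),
    "infrastructure as code": (48, 96),
}


def total_effort(estimates):
    """Sum the low and high ends of every (low, high) hour range."""
    low = sum(lo for lo, _ in estimates.values())
    high = sum(hi for _, hi in estimates.values())
    return low, high


low, high = total_effort(ESTIMATES)
print(f"On-prem setup, excluding cluster management: {low}-{high} hours")
```

Under these assumptions the on-prem path adds up to 78-232 hours before the unknown cluster management item is counted.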

Slide 15

Slide 15 text

Managed Services (DBaaS)

Slide 16

Slide 16 text

DBaaS (e.g. RDS)
● They provide great value:
  ○ High availability with automated failover
  ○ Automated backups
  ○ Monitoring
  ○ Typically a bit of database parameter tuning
● But be aware of what they don’t provide:
  ○ Database support (not infra support, I mean db support!)
  ○ Deep parameter tuning. Query tuning. DDL tuning.
  ○ Day 2 operations like bloat removal, reindex, etc.
  ○ ChatGPT is not managing your DB yet!

Slide 17

Slide 17 text

Be aware of DBaaS costs vs instances
● Good service costs money
● Instance costs: 85%-150% more expensive:
  ○ E.g. RDS vs EC2 is 1.85x
  ○ Plus you need an extra instance (N+1) for high availability
  ○ Estimate the price overhead as 1.8*(N+1)/N, where N is the number of instances
● Storage costs:
  ○ AWS: higher cost on RDS (gp2, gp3 overpriced vs EC2)
  ○ Pay separately for I/O ops (e.g. Aurora)
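
The overhead estimate 1.8*(N+1)/N is easy to evaluate for a few cluster sizes. This is only a sketch of the slide's rule of thumb: the function name and the default markup parameter are illustrative, and storage and I/O costs are not included.

```python
def dbaas_price_overhead(n_instances, instance_markup=1.8):
    """Rule-of-thumb DBaaS overhead from the slide: markup * (N + 1) / N.

    n_instances is N, the number of instances; the extra instance is the
    one added for high availability.
    """
    if n_instances < 1:
        raise ValueError("n_instances must be at least 1")
    return instance_markup * (n_instances + 1) / n_instances


for n in (1, 2, 3, 5):
    print(f"N={n}: about {dbaas_price_overhead(n):.2f}x")
```

The extra HA instance weighs most on small deployments: the factor is about 3.6x for N=1 and falls towards the 1.8x markup as N grows.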

Slide 18

Slide 18 text

Managed service == you can’t do everything you want
● Not all Postgres extensions are available:
  ○ RDS: 80
  ○ E.g. StackGres: 160+, adding new ones every week
  ○ No/few clouds support Timescale (Apache + TSL) or Citus
● Connection pooling:
  ○ RDS: not by default, additional cost (RDS Proxy).
  ○ Other DBaaS: not even an option.
● Limited automation for “Day 2 operations”

Slide 19

Slide 19 text

Deploying Postgres on Kubernetes

Slide 20

Slide 20 text

What Kelsey Hightower thinks
https://twitter.com/kelseyhightower/status/1624081136073994240

Slide 21

Slide 21 text

Meeting Kubernetes half way
● Kelsey Hightower argues that you need to “fight” K8s to run stateful workloads.
● Certainly, a bit. But it is doable.
● Operators have done this already. Don’t run databases on Kubernetes “by hand”; use operators.

Slide 22

Slide 22 text

Deploy a simple cluster with Kubernetes (w/ StackGres)
Effort: 1h (CKA)

    apiVersion: stackgres.io/v1
    kind: SGCluster
    metadata:
      name: simple
    spec:
      instances: 2
      postgres:
        version: 'latest'
      pods:
        persistentVolume:
          size: '100Gi'

Slide 23

Slide 23 text

Deploy an advanced cluster with Kubernetes (w/ StackGres)
Effort: 4-16h (CKA)
● Create YAMLs for several CRDs
● Create Ingress if needed
● Expose Web Console (Ingress/LB)
● Integrate with GitOps

Slide 24

Slide 24 text

Automating Day 2 operations
● Kubernetes also lets you automate Day 2 operations
● CKA is enough; mostly no Postgres expertise is needed
● E.g. Day 2 operations implemented in StackGres:
  ○ Repack
  ○ Vacuum
  ○ Minor version upgrade
  ○ Major version upgrade
  ○ Controlled restart
  ○ Benchmark
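
As an illustration of what requesting a Day 2 operation might look like, the sketch below builds a manifest for a vacuum on the "simple" cluster from the earlier slide. It assumes StackGres exposes these operations through an SGDbOps resource with `sgCluster` and `op` fields; that shape is not shown in the slides and should be checked against the StackGres CRD reference. The function name and the resource name "vacuum-simple" are made up for the example.

```python
import json


def build_dbops(name, cluster, op):
    """Return a manifest (as a dict) asking for one Day 2 operation.

    Assumption: kind SGDbOps with spec.sgCluster and spec.op, as the
    author of this sketch understands the StackGres CRD. Some operations
    are likely to need extra, operation-specific settings not shown here.
    """
    return {
        "apiVersion": "stackgres.io/v1",
        "kind": "SGDbOps",
        "metadata": {"name": name},
        "spec": {"sgCluster": cluster, "op": op},
    }


manifest = build_dbops("vacuum-simple", "simple", "vacuum")
print(json.dumps(manifest, indent=2))
```

The printed JSON can be saved to a file and passed to `kubectl apply -f`, which accepts JSON as well as YAML.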

Slide 25

Slide 25 text

Postgres operators for Kubernetes

Fully Open Source
● CloudNativePG
● KubeDB
● Kubegres (unmaintained?)
● Percona
● StackGres
● Zalando
● New upcoming operators…
● …

Proprietary/paid-for (production)
● Crunchy Data
● EnterpriseDB
● Fujitsu
● VMware Tanzu
● …

Slide 26

Slide 26 text

Operator Feature Matrix
https://github.com/dokc/operator-feature-matrix

Slide 27

Slide 27 text

Demo

Slide 28

Slide 28 text

Q & A
Alvaro Hernandez @ahachete