Running Your Own Postgres-as-a-service in Kubernetes | Linux Open Source Summit 2020

Slide 1

Slide 1 text

@LukasFittl Running Postgres-as-a-Service In Kubernetes

Slide 2

Slide 2 text

@LukasFittl

Slide 3

Slide 3 text

Why (not) run Postgres in Kubernetes?

Slide 4

Slide 4 text

Consistency - Kubernetes manages all workloads, including databases

Slide 5

Slide 5 text

Better Portability - Consistent deployment experience across clouds, instead of relying on cloud-specific APIs

Slide 6

Slide 6 text

Low Latency - Co-locating compute and database allows certain workloads to perform better

Slide 7

Slide 7 text

High Effort - Running anything in Kubernetes is complex, and databases are harder still

Slide 8

Slide 8 text

How to deploy Postgres in Kubernetes

Slide 9

Slide 9 text

Postgres

Slide 10

Slide 10 text

Postgres: High Availability, Scale Out Capabilities, Backups, Connection Pooling, Monitoring, K8S Integration

Slide 11

Slide 11 text

Zalando Postgres Operator

Slide 12

Slide 12 text

Crunchy Data PostgreSQL Operator

Slide 13

Slide 13 text

PostgreSQL Hyperscale - Azure Arc

Slide 14

Slide 14 text

Postgres: High Availability, Scaling Capabilities, Backups, Connection Pooling, Monitoring, K8S Integration

Slide 15

Slide 15 text

K8S Integration
- CLIs, Operators & API Servers
- Namespace handling
- Storage

Slide 16

Slide 16 text

Zalando Postgres Operator
Components: Operator, Postgres
CRDs: postgresql

Slide 17

Slide 17 text

Zalando Postgres Operator
Components: Operator, Postgres
CRDs: postgresql

Crunchy Data PostgreSQL Operator
Components: “pgo” CLI, Operator, API Server, Postgres
CRDs: Pgcluster, Pgpolicy, Pgreplica, Pgtask

Slide 18

Slide 18 text

Zalando Postgres Operator
Components: Operator, Postgres
CRDs: postgresql

Crunchy Data PostgreSQL Operator
Components: “pgo” CLI, Operator, API Server, Postgres
CRDs: Pgcluster, Pgpolicy, Pgreplica, Pgtask

PostgreSQL Hyperscale - Azure Arc
Components: “azdata” CLI, Operator, API Server, Postgres (Coordinator), Postgres (Data Node) x2
CRDs: DatabaseService, DatabaseServiceTask

Slide 19

Slide 19 text

Zalando Postgres Operator

    kind: postgresql
    metadata:
      ...
    spec:
      databases:
        foo: zalando
      numberOfInstances: 2
      postgresql:
        version: "12"
      preparedDatabases:
        bar: {}
      teamId: acid
      users:
        foo_user: []
        zalando:
        - superuser
        - createdb
      volume:
        size: 1Gi

Crunchy Data PostgreSQL Operator

    kind: Pgcluster
    metadata:
      ...
    spec:
      ArchiveStorage:
        ...
      BackrestStorage:
        ...
      PrimaryStorage:
        ...
      ReplicaStorage:
        ...
      WALStorage:
        ...
      ...
      clustername: hippo
      database: hippo
      exporterport: "9187"
      limits: {}
      name: hippo
      namespace: pgo
      podAntiAffinity:
        default: preferred
      port: "5432"
      primaryhost: hippo
      replicas: "0"
      resources:
        memory: 128Mi
      rootsecretname: hippo-postgres-secret
      syncReplication: null
      tablespaceMounts: {}
      tlsOnly: false

PostgreSQL Hyperscale - Azure Arc

    kind: DatabaseService
    metadata:
      ...
    spec:
      docker:
        ...
      engine:
        type: Postgres
        version: 12
      monitoring:
        ...
      scale:
        shards: 2
      scheduling:
        default:
          resources:
            requests:
              memory: 256Mi
      service:
        port: 5432
        type: NodePort
      storage:
        volumeSize: 1Gi
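The Zalando `postgresql` manifest on this slide can also be generated programmatically. A minimal Python sketch, assuming the `acid.zalan.do/v1` API group used by the Zalando operator's CRD; the function name and the cluster name `acid-example-cluster` are hypothetical (the slide elides `metadata`), and every field under `spec` is copied from the slide:

```python
import json


def zalando_postgresql_manifest(name: str, instances: int = 2,
                                volume_size: str = "1Gi") -> dict:
    """Build a `postgresql` custom resource as a plain dict.

    `name` is a hypothetical cluster name; the slide elides metadata.
    """
    return {
        "apiVersion": "acid.zalan.do/v1",
        "kind": "postgresql",
        "metadata": {"name": name},
        "spec": {
            "teamId": "acid",
            "numberOfInstances": instances,
            "postgresql": {"version": "12"},
            "volume": {"size": volume_size},
            "users": {"zalando": ["superuser", "createdb"], "foo_user": []},
            "databases": {"foo": "zalando"},
            "preparedDatabases": {"bar": {}},
        },
    }


if __name__ == "__main__":
    # JSON is valid input for `kubectl apply -f -`, so no YAML library is needed.
    print(json.dumps(zalando_postgresql_manifest("acid-example-cluster"), indent=2))
```

Generating the manifest from code keeps the instance count and volume size as parameters, which is convenient when the same definition is reused across environments.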

Slide 20

Slide 20 text

Namespace handling
Crunchy Data PostgreSQL Operator namespace modes:
- dynamic: Operator can create, delete, and update any namespaces and manage RBAC. Operator requires ClusterRole privilege.
- readonly: Namespaces need to be pre-created and RBAC pre-configured. Operator requires ClusterRole privilege.
- disabled: Deploy to a single namespace; no ClusterRole privilege required.
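The trade-off between the three modes can be captured in a small lookup table. A minimal Python sketch that only encodes what the slide states; the enum, dictionary keys, and helper function are illustrative names and are not part of the operator's API:

```python
from enum import Enum


class NamespaceMode(Enum):
    DYNAMIC = "dynamic"
    READONLY = "readonly"
    DISABLED = "disabled"


# Requirements per mode, as described on the slide.
REQUIREMENTS = {
    NamespaceMode.DYNAMIC: {"cluster_role": True, "operator_manages_namespaces": True},
    NamespaceMode.READONLY: {"cluster_role": True, "operator_manages_namespaces": False},
    NamespaceMode.DISABLED: {"cluster_role": False, "operator_manages_namespaces": False},
}


def modes_without_cluster_role() -> list:
    """Return the modes usable when ClusterRole privileges cannot be granted."""
    return [mode.value for mode, req in REQUIREMENTS.items() if not req["cluster_role"]]


if __name__ == "__main__":
    print(modes_without_cluster_role())
```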

Slide 21

Slide 21 text

Storage
Generally, expect Persistent Volume Claims (PVCs) to be used for database storage.
The Crunchy Data PostgreSQL Operator also supports tablespaces, which let you use different storage types within the same database server (be careful when using them).
Diagram: Postgres, Persistent Volume, Persistent Volume Claim, Network Storage

Slide 22

Slide 22 text

Postgres: High Availability, Scaling Capabilities, Backups, Connection Pooling, Monitoring, K8S Integration

Slide 23

Slide 23 text

Scenario 1: Automated Failover within a K8S cluster
K8S Cluster 1: PG1 (Primary), PG2 (Secondary), Sync Rep between them
Operator

Slide 24

Slide 24 text

Scenario 1: Automated Failover within a K8S cluster
K8S Cluster 1: PG1 (Primary), PG2 (Secondary)
Node Failure
Operator: Detect

Slide 25

Slide 25 text

Scenario 1: Automated Failover within a K8S cluster
K8S Cluster 1: PG1 (Primary), PG2 (Primary)
Node Failure
Operator: Promote

Slide 26

Slide 26 text

Scenario 1: Automated Failover within a K8S cluster
K8S Cluster 1: PG1 (Secondary), PG2 (Primary), Sync Rep between them
Operator: Recover
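The detect, promote, and recover steps of Scenario 1 can be illustrated with a toy simulation. This is a sketch of the sequence shown on the slides, not how any of the operators actually implement failover; the `Instance` class and both functions are hypothetical, and the old primary is relabeled as a secondary at promotion time for simplicity:

```python
from dataclasses import dataclass


@dataclass
class Instance:
    name: str
    role: str            # "primary" or "secondary"
    healthy: bool = True


def failover(instances: list) -> list:
    """Detect a failed primary and promote the first healthy secondary."""
    steps = []
    primary = next(i for i in instances if i.role == "primary")
    if primary.healthy:
        return steps  # nothing to do
    steps.append(f"detect: {primary.name} is down")
    candidate = next(
        (i for i in instances if i.role == "secondary" and i.healthy), None
    )
    if candidate is None:
        steps.append("no healthy secondary to promote")
        return steps
    candidate.role = "primary"
    primary.role = "secondary"  # it will rejoin as a secondary after recovery
    steps.append(f"promote: {candidate.name} is now primary")
    return steps


def recover(instance: Instance) -> str:
    """Bring a failed instance back in its current (demoted) role."""
    instance.healthy = True
    return f"recover: {instance.name} rejoins as {instance.role}"


if __name__ == "__main__":
    pg1, pg2 = Instance("PG1", "primary"), Instance("PG2", "secondary")
    pg1.healthy = False  # simulate the node failure
    for step in failover([pg1, pg2]):
        print(step)
    print(recover(pg1))
```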

Slide 27

Slide 27 text

Scenario 2: Disaster Recovery to another K8S cluster
K8S Cluster 1: PG1 (Primary), PG2 (Secondary), Sync Rep between them; Operator
K8S Cluster 2: PG3 (Secondary); Operator
Async Replication between the clusters

Slide 28

Slide 28 text

Scenario 2: Disaster Recovery to another K8S cluster
K8S Cluster 1: PG1 (Primary), PG2 (Secondary), Sync Rep between them; Operator; Large-Scale Data Center Failure
K8S Cluster 2: PG3 (Primary); Operator; Promote

Slide 29

Slide 29 text

High Availability

Operator                           HA within Same K8S Cluster   HA across K8S Clusters
Zalando Postgres Operator          Built-In                     Manual
Crunchy Data PostgreSQL Operator   Built-In                     Manual
PostgreSQL Hyperscale - Azure Arc  Built-In                     Manual

Slide 30

Slide 30 text

Pod Anti-Affinity
label: failure-domain.beta.kubernetes.io/region=westus2
failure-domain…/zone=0
failure-domain…/zone=1
failure-domain…/zone=2
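Spreading Postgres pods across the zones labeled above is done with a pod anti-affinity rule keyed on the zone label. A minimal Python sketch that builds the standard Kubernetes `podAntiAffinity` stanza; the function name and the `cluster-name: hippo` selector label are assumptions for illustration. It uses the hard `requiredDuringSchedulingIgnoredDuringExecution` form, whereas the Pgcluster example earlier in the deck shows `preferred` as the default:

```python
import json

# Zone label key as shown on the slide. Newer Kubernetes versions use
# topology.kubernetes.io/zone instead of this beta label.
ZONE_KEY = "failure-domain.beta.kubernetes.io/zone"


def pod_anti_affinity(match_labels: dict, topology_key: str = ZONE_KEY) -> dict:
    """Build an affinity stanza that keeps matching pods in different zones."""
    return {
        "affinity": {
            "podAntiAffinity": {
                "requiredDuringSchedulingIgnoredDuringExecution": [
                    {
                        "labelSelector": {"matchLabels": match_labels},
                        "topologyKey": topology_key,
                    }
                ]
            }
        }
    }


if __name__ == "__main__":
    # "cluster-name: hippo" is a hypothetical pod label.
    print(json.dumps(pod_anti_affinity({"cluster-name": "hippo"}), indent=2))
```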

Slide 31

Slide 31 text

Postgres: High Availability, Scaling Capabilities, Backups, Connection Pooling, Monitoring, K8S Integration

Slide 32

Slide 32 text

Backups

Operator                           Local Volume Backups    Point-in-time-Restore   Offsite Backups
Zalando Postgres Operator          n/a                     Built-in (wal-e)        Built-in (wal-e)
Crunchy Data PostgreSQL Operator   Built-in (pgBackRest)   Built-in                Built-in (Amazon S3)
PostgreSQL Hyperscale - Azure Arc  Built-in                Built-in                Built-in (K8S Volume Mount)

Slide 33

Slide 33 text

Postgres: High Availability, Scaling Capabilities, Backups, Connection Pooling, Monitoring, K8S Integration

Slide 34

Slide 34 text

Monitoring

Operator                           Metrics                   Logs
Zalando Postgres Operator          not built in              not built in
Crunchy Data PostgreSQL Operator   Grafana                   Built-in pgbadger
PostgreSQL Hyperscale - Azure Arc  Grafana + Azure Monitor   Kibana + Azure Log Analytics

Slide 35

Slide 35 text

PostgreSQL Hyperscale - Azure Arc

Slide 36

Slide 36 text

Postgres: High Availability, Scaling Capabilities, Backups, Connection Pooling, Monitoring, K8S Integration

Slide 37

Slide 37 text

Connection Pooling
Diagram: Application, pgbouncer, Postgres
pgbouncer is important for idle connection scaling in Postgres.
Idle connection in Postgres: 5-10MB
Idle connection in pgbouncer: <1MB
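The per-connection figures above translate into a rough memory estimate. A minimal Python sketch, assuming the slide's 5-10MB per idle Postgres connection and treating pgbouncer's "<1MB" as an upper bound of 1MB; the function name and the 1,000-connection example are illustrative only:

```python
# Rough memory estimate for idle connections, using the figures from the slide.
POSTGRES_IDLE_MB = (5, 10)  # low/high estimate per idle Postgres connection
PGBOUNCER_IDLE_MB = 1       # assumption: "<1MB" treated as an upper bound


def idle_memory_mb(connections: int) -> dict:
    """Return the estimated memory (in MB) held by idle connections."""
    low, high = POSTGRES_IDLE_MB
    return {
        "postgres_low": connections * low,
        "postgres_high": connections * high,
        "pgbouncer_max": connections * PGBOUNCER_IDLE_MB,
    }


if __name__ == "__main__":
    print(idle_memory_mb(1000))
```

Under these assumptions, 1,000 idle connections held directly by Postgres cost roughly 5-10GB, while the same connections parked in pgbouncer stay under 1GB.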

Slide 38

Slide 38 text

Connection Pooling

Operator                           Pgbouncer
Zalando Postgres Operator          Built-in
Crunchy Data PostgreSQL Operator   Built-in
PostgreSQL Hyperscale - Azure Arc  Planned

Slide 39

Slide 39 text

Postgres: High Availability, Scaling Capabilities, Backups, Connection Pooling, Monitoring, K8S Integration

Slide 40

Slide 40 text

Scaling Capabilities
Read Replicas: Help you scale the read performance. Max data size = max storage size per node.
Diagram: Application, Postgres, Postgres (Read Only) x2

Slide 41

Slide 41 text

Scaling Capabilities
Hyperscale (Citus): Scales both read and write performance. Max data size = # of data nodes * storage size per node.
Diagram: Application, Postgres (Coordinator), Postgres (Data Node) x2
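The two capacity formulas from the scaling slides can be compared directly. A minimal Python sketch; the function names and the 512GiB example are illustrative, and the formulas ignore overhead such as indexes, WAL, and free-space headroom:

```python
def max_data_size_read_replicas(storage_per_node_gib: float, replicas: int) -> float:
    """Each read replica holds a full copy, so capacity is one node's storage.

    `replicas` is accepted only to show that it does not affect the result.
    """
    return storage_per_node_gib


def max_data_size_hyperscale(storage_per_node_gib: float, data_nodes: int) -> float:
    """Hyperscale (Citus): capacity = number of data nodes * storage per node."""
    return data_nodes * storage_per_node_gib


if __name__ == "__main__":
    print(max_data_size_read_replicas(512, replicas=2))
    print(max_data_size_hyperscale(512, data_nodes=2))
```

Adding read replicas raises read throughput but not capacity, while adding data nodes raises both.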

Slide 42

Slide 42 text

Scaling Up vs. Scaling Out
Grow to 100s of database nodes, without re-architecting your application.
Figure: Scaling Up (block growth on 1 monolithic database) vs. Scaling Out (18 Total Nodes)