Slide 1

Slide 1 text

PostgreSQL High Availability on Kubernetes with Patroni
Atmosphere Conference 2018
Oleksii Kliukin
18-06-2018

Slide 2

Slide 2 text

About me
Oleksii Kliukin
Database Engineer @ Zalando
Berlin PostgreSQL meetup organizer
[email protected]
twitter: @hintbits

Slide 3

Slide 3 text

TABLE OF CONTENTS
• A brief history of PostgreSQL at Zalando
• Live DEMO (what can possibly go wrong?)
• How to stop worrying (and embrace Patroni)
• Kubernetes: the real thing
• What is in the name: Postgres Operator
• Kubernetes-native Patroni

Slide 4

Slide 4 text

WE BRING FASHION TO PEOPLE IN 17 COUNTRIES
● In Poland since 2013
● 2018: Czech Republic, Ireland
● HQ in Berlin
● Tech hubs: Dortmund, Helsinki, Dublin and Lisbon

Slide 5

Slide 5 text

PostgreSQL at Zalando as of May 2017
• > 300 databases in the data centers
• > 170 databases on AWS managed by the DB team
• > 250 databases in other Kubernetes clusters
• > 165 databases run in the ACID's Kubernetes cluster

Slide 6

Slide 6 text

Let’s start with names
[Diagram: a PostgreSQL (HA) cluster consists of one primary and two standbys connected by streaming replication. Each of them is a PostgreSQL instance containing the databases template0, template1 and postgres.]

Slide 7

Slide 7 text

Why PostgreSQL

Slide 8

Slide 8 text

Strong consistency

Slide 9

Slide 9 text

Sophisticated transactional system (true serializability, transactional DDL)

Slide 10

Slide 10 text

Extensibility (custom data types, indexes, even server processes)

Slide 11

Slide 11 text

Excellent community!

Slide 12

Slide 12 text

Brief history of PostgreSQL at Zalando

Slide 13

Slide 13 text

Vintage (DC) and modern (AWS) PostgreSQL environments
[Diagram labels: DC1; DC2 (one hour delay); NFS WAL archive; AWS VPC.]

Slide 14

Slide 14 text

Should you run your PostgreSQL inside a container?

Slide 15

Slide 15 text

Spilo Docker image at Zalando
• PGDATA on an external volume (EBS or i3/c5 NVMe)
• Configuration based on environment variables
• One container per EC2 instance
• PostgreSQL versions from 9.4 up to 10
• Plenty of extensions (all contrib, PostGIS, TimescaleDB, PL/V8, pg_cron, etc.)
• Additional tools (pgbouncer, pgq)
• Extremely lightweight (69MB)

Slide 16

Slide 16 text

github.com/zalando/spilo

Slide 17

Slide 17 text

[Diagram: Spilo deployment on AWS. Labels: Cloud Formation Stack; Auto-Scaling across Availability Zones A, B and C; one Master DB and Replica DBs, each with a root volume and a data volume; Cluster Security Group; Master Elastic IP; Replica Elastic Load Balancer in the Replica ELB Security Group; ports 5432 and 5432, 8008 with GET /replica; DNS names db.zalando and db-repl.zalando; S3 bucket: Backup + WAL; Etcd.]
User Data:
- Docker image
- Backup schedule
- Superuser password
- Replication password
- Postgres parameters
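The slide lists port 8008 and GET /replica next to the replica load balancer, which suggests that the load balancer picks its backends from an HTTP health check answered by Patroni. Below is a minimal sketch of that idea, not Patroni's actual code. The `/master` path, the status codes and the helper names are assumptions made for illustration.

```python
# Toy model of role-based health checks (illustrative, not Patroni's code).
# Assumption: 200 means "route traffic here", 503 means "do not".

def health_status(path, role, state="running"):
    """Return the HTTP status a node would answer for a health-check path."""
    if state != "running":
        return 503
    if path == "/master":
        return 200 if role == "master" else 503
    if path == "/replica":
        return 200 if role == "replica" else 503
    return 404


def healthy_backends(path, members):
    """Names of the members a load balancer would keep for this check."""
    return [name for name, (role, state) in members.items()
            if health_status(path, role, state) == 200]


if __name__ == "__main__":
    # Hypothetical three-node cluster.
    members = {
        "db-a": ("master", "running"),
        "db-b": ("replica", "running"),
        "db-c": ("replica", "stopped"),
    }
    print(healthy_backends("/replica", members))
    print(healthy_backends("/master", members))
```

After a failover only the role reported by each node changes, so the load balancer follows the new topology without being reconfigured.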

Slide 18

Slide 18 text

Patroni is a secret ingredient to make it all work

Slide 19

Slide 19 text

What is Patroni
• Automatic failover solution for PostgreSQL streaming replication
• A daemon that manages one PostgreSQL instance
• Keeps the state of the cluster in a DCS (Etcd, Zookeeper, Consul, Kubernetes), also referred to as a consistency layer
• For new instances, decides whether to initialize a new cluster or join an existing one
• For running instances, executes promotion/demotion when necessary
• A number of additional related functions (global configuration, scheduled actions, pause mode, pg_rewind support, etc.)

Slide 20

Slide 20 text

What Patroni is not
• Not an arbiter for the whole HA cluster
• Not a Swiss Army knife of Postgres maintenance
• Not a substitute for proper monitoring
• Not a tool to use if you don’t understand how Etcd (or another DCS that you use) works
• Not a silver bullet (but tries to balance ease of use vs extensibility)
• Not just an internal project of Zalando (IBM Compose, Red Hat and many other companies use it)

Slide 21

Slide 21 text

Why distributed consistency?
[Diagram: three primary candidates each send "Take leader" to the Etcd cluster.]
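The race on this slide can be sketched as several candidates trying to create the same leader key, where the store guarantees that exactly one attempt succeeds. The snippet below is a toy in-memory stand-in, assuming an atomic create-if-absent operation; `InMemoryDCS` and `run_election` are hypothetical names and this is not the Etcd client API.

```python
import threading


class InMemoryDCS:
    """Toy stand-in for a consistent store holding a single leader key."""

    def __init__(self):
        self._lock = threading.Lock()
        self._leader = None

    def create_leader_key(self, candidate):
        """Atomically create the key; succeed only if it does not exist yet."""
        with self._lock:
            if self._leader is None:
                self._leader = candidate
                return True
            return False

    @property
    def leader(self):
        return self._leader


def run_election(dcs, candidates):
    """Let every candidate race for the key; return who won and who lost."""
    results = {}

    def attempt(name):
        results[name] = dcs.create_leader_key(name)

    threads = [threading.Thread(target=attempt, args=(c,)) for c in candidates]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


if __name__ == "__main__":
    dcs = InMemoryDCS()
    outcome = run_election(dcs, ["node-a", "node-b", "node-c"])
    print("leader:", dcs.leader, "outcome:", outcome)
```

Which candidate wins is not deterministic; the point is only that the store, not the candidates, decides, so two nodes can never both believe they took the key.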

Slide 22

Slide 22 text

github.com/zalando/patroni

Slide 23

Slide 23 text

What is Kubernetes?
• A set of open-source components running on one or more servers
• A container orchestration system
• An abstraction layer over your real or virtualized hardware
• An “infrastructure as code” system
• Automatic resource allocation
• Next step after Ansible/Chef/Puppet

Slide 24

Slide 24 text

What Kubernetes is not
• An operating system
• A magical way to make your infrastructure scalable
• An excuse to fire your devops (someone has to configure it)
• A good solution for running 2-3 servers

Slide 25

Slide 25 text

Terminology: traditional DC compared to Kubernetes

    Traditional infrastructure                   Kubernetes
    Physical server                              Node
    Virtual machine                              Pod
    Individual application                       Container
    NAS/SAN                                      Persistent Volumes
    Load balancer                                Service/Endpoint
    Application registry/hardware information    Labels
    Password files                               Secrets

Slide 26

Slide 26 text

Declarative resource description (manifest)

    apiVersion: v1
    kind: Service
    metadata:
      name: nginx
      labels:
        app: nginx
    spec:
      ports:
      - port: 80
        name: web
      clusterIP: None
      selector:
        app: nginx

Slide 27

Slide 27 text

Building a PostgreSQL cluster on Kubernetes
• A StatefulSet to bind pods with persistent volumes and provide auto-recovery
• A service to route client connections
• Spilo as a Docker container (Patroni + PostgreSQL) for HA
• Secrets to store database user passwords

Slide 28

Slide 28 text

Manual deployment of HA PostgreSQL cluster on Kubernetes
• At least four long YAML manifests to write
• Different parts of PostgreSQL configuration spread over multiple manifests
• No easy way to work with a cluster as a whole (update, delete)
• Manual generation of DB objects, e.g. users, and their passwords

Slide 29

Slide 29 text

Initial approach to automation: Helm
• A template for your manifests
• Only one place to fill in deployment-related values
• Requires running a special pod (tiller) in your Kubernetes cluster
github.com/kubernetes/charts/blob/master/incubator/patroni

Slide 30

Slide 30 text

Kubernetes operator pattern
• Implement a controller application to act on custom resources
• CRD (custom resource definition) to describe a domain-specific object (e.g. a Postgres cluster)
• Encapsulates the knowledge of a human operating the service
https://coreos.com/blog/introducing-operators.html

Slide 31

Slide 31 text

Zalando Postgres operator
• Defines a custom Postgres resource
• Watches instances of Postgres, creates/updates/deletes corresponding Kubernetes objects
• Allows updating running-cluster resources (memory, CPU, volumes) and Postgres configuration
• Creates databases and users, and automatically generates passwords
• Auto-repairs, smart rolling updates (switchover to replicas before updating the master)
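The "watches, then creates/updates/deletes" behaviour is commonly structured as a reconcile step: derive the desired objects from the custom resource, compare them with what exists, and emit the difference. The sketch below shows that shape only. The object layout, the `user.cluster` secret naming and the function names are assumptions for illustration, not the Zalando operator's implementation.

```python
def desired_objects(manifest):
    """Derive the Kubernetes objects a cluster manifest implies (toy model)."""
    name = manifest["metadata"]["name"]
    spec = manifest["spec"]
    objects = {
        ("StatefulSet", name): {
            "replicas": spec["numberOfInstances"],
            "volumeSize": spec["volume"]["size"],
        },
        ("Service", name): {"port": 5432},
    }
    # One secret per declared user; the naming scheme here is hypothetical.
    for user in spec.get("users", {}):
        objects[("Secret", f"{user}.{name}")] = {"username": user}
    return objects


def plan(desired, actual):
    """Return sorted (verb, kind, name) actions that turn actual into desired."""
    actions = []
    for key, wanted in desired.items():
        if key not in actual:
            actions.append(("create",) + key)
        elif actual[key] != wanted:
            actions.append(("update",) + key)
    for key in actual:
        if key not in desired:
            actions.append(("delete",) + key)
    return sorted(actions)


if __name__ == "__main__":
    manifest = {
        "metadata": {"name": "acid-minimal-cluster"},
        "spec": {
            "numberOfInstances": 2,
            "volume": {"size": "1Gi"},
            "users": {"zalando": ["superuser", "createdb"], "foo_user": None},
        },
    }
    # Pretend the cluster currently runs with a single instance.
    actual = desired_objects(manifest)
    actual[("StatefulSet", "acid-minimal-cluster")] = {
        "replicas": 1, "volumeSize": "1Gi"}
    for action in plan(desired_objects(manifest), actual):
        print(action)
```

Running the same step repeatedly is harmless: once the actual state matches the manifest, the plan is empty.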

Slide 32

Slide 32 text

Zalando Postgres operator
github.com/zalando-incubator/postgres-operator
postgres-operator.readthedocs.io

Slide 33

Slide 33 text

Simple Postgres manifest

    apiVersion: "acid.zalan.do/v1"
    kind: postgresql
    metadata:
      name: acid-minimal-cluster
    spec:
      teamId: "ACID"
      volume:
        size: 1Gi
      numberOfInstances: 2
      users:
        # database owner
        zalando:
        - superuser
        - createdb
        # role for application foo
        foo_user:
      # databases: name->owner
      databases:
        foo: zalando
      postgresql:
        version: "10"

Slide 34

Slide 34 text

Just a piece of cake
• Operator starts pods with Spilo Docker image
• Operator provides environment variables to Spilo
• Operator makes sure all Kubernetes objects are in sync
• Spilo generates Patroni configuration
• Patroni creates roles and configures PostgreSQL
• Patroni makes sure there is only one master
• Patroni uses Kubernetes for cluster state and leader lock
• Patroni creates roles and applies configuration
• Patroni changes service endpoints on failover

Slide 35

Slide 35 text

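[Diagram: inside the Kubernetes cluster, a DB deployer deploys the cluster manifest; the operator pod, configured by the operator config map, watches it and creates the StatefulSet (Spilo pods running Patroni), the Endpoint and Service used by the client application, and the cluster secrets. Also labeled: Infrastructure roles.]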

Slide 36

Slide 36 text

We had to get rid of the dependency on Etcd in Patroni

Slide 37

Slide 37 text

Patroni with Etcd in Kubernetes
• External dependency: Etcd
• Etcd should always be available
• Rock-solid Etcd vs Kubernetes cluster with frequent upgrades
• There is already an Etcd deployed for Kubernetes

    $ etcdctl ls --recursive --sort -p /service/batman
    /service/batman/config
    /service/batman/history
    /service/batman/initialize
    /service/batman/leader
    /service/batman/members/
    /service/batman/members/postgresql0
    /service/batman/members/postgresql1
    /service/batman/optime/
    /service/batman/optime/leader

    $ etcdctl get /service/batman/leader
    postgresql1

    $ etcdctl get /service/batman/members/postgresql1
    {"conn_url":"postgres://127.0.0.1:5433/postgres","api_url":"http://127.0.0.1:8009/patroni","state":"running","role":"master","xlog_location":50476648,"timeline":2}

    $ etcdctl get /service/batman/history
    [[1,50331744,"no recovery target specified","2018-01-18T16:04:46+01:00"]]

Slide 38

Slide 38 text

Kubernetes-native Patroni
• Use Kubernetes as a DCS
• Patroni keys as Kubernetes metadata (pods, configmaps or endpoints)
• ResourceVersions for compare-and-set
• “Soft” TTL
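The last two bullets can be illustrated together. Kubernetes objects carry a resource version, so a writer can say "replace this object only if it is still at the version I read", and since Kubernetes does not expire annotations by itself, the TTL has to be evaluated by the readers against a renewal timestamp. The sketch below models both with an in-memory object. `FakeEndpoint`, `try_acquire_or_renew` and the use of plain seconds for `renewTime` are assumptions for illustration, not the Kubernetes client API or Patroni's code.

```python
class Conflict(Exception):
    """Raised when the object changed since it was read."""


class FakeEndpoint:
    """In-memory stand-in for a Kubernetes object with a resource version."""

    def __init__(self):
        self.resource_version = 1
        self.annotations = {}

    def read(self):
        return self.resource_version, dict(self.annotations)

    def replace(self, expected_version, annotations):
        # Compare-and-set: reject the write if someone else got there first.
        if expected_version != self.resource_version:
            raise Conflict("object was modified")
        self.annotations = dict(annotations)
        self.resource_version += 1


def try_acquire_or_renew(endpoint, me, now, ttl=30):
    """Take or renew the leader lock; return True if `me` holds it afterwards."""
    version, ann = endpoint.read()
    leader = ann.get("leader")
    renewed_at = float(ann.get("renewTime", 0))
    # "Soft" TTL: nothing deletes the key, readers just treat it as stale.
    expired = now - renewed_at > ttl
    if leader not in (None, me) and not expired:
        return False
    new = dict(ann, leader=me, renewTime=str(now), ttl=str(ttl))
    if leader != me:
        new["acquireTime"] = str(now)
        new["transitions"] = str(int(ann.get("transitions", 0)) + 1)
    try:
        endpoint.replace(version, new)
    except Conflict:
        return False  # lost the race; retry on the next loop iteration
    return True


if __name__ == "__main__":
    ep = FakeEndpoint()
    print(try_acquire_or_renew(ep, "pod-0", now=100.0))  # first taker wins
    print(try_acquire_or_renew(ep, "pod-1", now=110.0))  # lock still fresh
    print(try_acquire_or_renew(ep, "pod-1", now=140.0))  # lock went stale
```

The annotation names mirror the ones visible in the Endpoints output shown later in the deck (leader, acquireTime, renewTime, transitions, ttl).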

Slide 39

Slide 39 text

Patroni metadata in Kubernetes objects

    $ kubectl get pods acid-minimal-cluster-0 -o jsonpath='{.metadata.annotations}'
    map[status:{"conn_url":"postgres://10.1.1.149:5432/postgres","api_url":"http://10.1.1.149:8008/patroni","state":"running","role":"replica","xlog_location":26809088,"timeline":1}]

    $ kubectl get endpoints acid-minimal-cluster -o jsonpath='{.metadata.annotations}'
    map[acquireTime:2018-06-14T10:52:14.617442+00:00 leader:acid-minimal-cluster-1 optime:26809520 renewTime:2018-06-14T10:52:45.735291+00:00 transitions:2 ttl:30]

    $ kubectl get endpoints acid-minimal-cluster-config -o jsonpath='{.metadata.annotations.config}'
    {"loop_wait":10,"maximum_lag_on_failover":33554432,"postgresql":{"parameters":{"archive_mode":"on","archive_timeout":"1800s","hot_standby":"on","max_replication_slots":5,"max_wal_senders":5,"wal_level":"hot_standby","wal_log_hints":"on"},"use_pg_rewind":true,"use_slots":true},"retry_timeout":10,"ttl":30}

    $ kubectl get endpoints acid-minimal-cluster-config -o jsonpath='{.metadata.annotations.initialize}'
    6566889706167685175
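As a worked example, the values on this slide are enough to estimate how far the replica is behind the leader and to compare that with maximum_lag_on_failover. The snippet is a simplified sketch: the function name and the exact rule are assumptions, and Patroni's real eligibility check considers more than this single number.

```python
import json

# Values copied from the kubectl output on this slide.
member_status = json.loads(
    '{"conn_url":"postgres://10.1.1.149:5432/postgres",'
    '"api_url":"http://10.1.1.149:8008/patroni",'
    '"state":"running","role":"replica",'
    '"xlog_location":26809088,"timeline":1}'
)
leader_optime = 26809520            # "optime" annotation on the endpoint
maximum_lag_on_failover = 33554432  # from the config annotation (32 MiB)


def replication_lag(member, optime):
    """Bytes of WAL the member is behind the last known leader position."""
    return optime - member["xlog_location"]


def may_be_promoted(member, optime, max_lag):
    """Simplified rule: running and not lagging more than max_lag bytes."""
    return (member["state"] == "running"
            and replication_lag(member, optime) <= max_lag)


if __name__ == "__main__":
    print(replication_lag(member_status, leader_optime))  # 432
    print(may_be_promoted(member_status, leader_optime,
                          maximum_lag_on_failover))       # True
```

A lag of 432 bytes is far below the 32 MiB limit, so under this simplified rule the replica would qualify as a failover candidate.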

Slide 40

Slide 40 text

No content

Slide 41

Slide 41 text

Should you run your PostgreSQL clusters on Kubernetes?
Strong interest in the community
• Zalando Postgres Operator
• CrunchyData Postgres Operator
• Red Hat Project Atomic
• KubeDB
• Project Habitat

Slide 42

Slide 42 text

Why not AWS RDS or Aurora PostgreSQL?
Not an easy answer :)
Full control
• Independent of cloud provider
• Real superuser available
• Custom extensions, PAM
• Streaming/WAL replication in and out
• Local storage (NVMe SSDs) not supported on RDS
Costs? Cost of development? ...

Slide 43

Slide 43 text

Bonus projects
• PostgreSQL monitoring with bg_mon
• PAM OAuth2 for PostgreSQL

Slide 44

Slide 44 text

PostgreSQL monitoring with bg_mon
● PostgreSQL allows a custom child process called a background worker
● Background workers in PostgreSQL can attach to the database
● Background workers in PostgreSQL can do everything else
● Let’s make (an open-source) one that emits top-line DB statistics via a REST API
https://github.com/CyberDem0n/bg_mon

Slide 45

Slide 45 text

OAUTH2 PAM authentication
● PAM module written in C
● Open-source: https://github.com/CyberDem0n/pam-oauth2
● Equivalent of arbitrarily long, automatically generated, auto-expiring passwords
● Can supply arbitrary key=value pairs to check in the OAuth response (e.g. realm=/employees)

Slide 46

Slide 46 text

OAUTH2 PAM authentication
Operator configuration:

    pam_configuration: https://info.example.com/oauth2/tokeninfo?access_token= uid realm=/employees
    pam_role_name: users

Operator sets the PAM_OAUTH2 Spilo environment variable and adds a line to pg_hba.conf:

    hostssl all +users all pam

Spilo writes /etc/pam.d/postgresql using the PAM_OAUTH2 value.
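The pam_configuration value on this slide reads as three parts: a tokeninfo URL, the name of the response field holding the login (uid), and key=value pairs that must match (realm=/employees). The sketch below shows how such a check could work on an already fetched and parsed tokeninfo response. It is an illustration in Python under that reading, not the C module's code; the function names and the sample response are hypothetical.

```python
def parse_pam_configuration(value):
    """Split 'URL LOGIN_FIELD key=value ...' into its three parts."""
    url, login_field, *pairs = value.split()
    required = dict(pair.split("=", 1) for pair in pairs)
    return url, login_field, required


def is_authorized(tokeninfo, db_user, login_field, required):
    """Accept only if the token belongs to db_user and all pairs match."""
    if tokeninfo.get(login_field) != db_user:
        return False
    return all(tokeninfo.get(key) == value
               for key, value in required.items())


if __name__ == "__main__":
    config = ("https://info.example.com/oauth2/tokeninfo?access_token= "
              "uid realm=/employees")
    url, login_field, required = parse_pam_configuration(config)

    # Hypothetical tokeninfo response for a valid employee token.
    response = {"uid": "jdoe", "realm": "/employees", "expires_in": 3600}
    print(is_authorized(response, "jdoe", login_field, required))
    print(is_authorized(response, "someone_else", login_field, required))
```

The token itself plays the role of the password, which is why it behaves like a long, generated, auto-expiring credential.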

Slide 47

Slide 47 text

Made possible by great people inside and outside of Zalando
Patroni and Spilo: github.com/zalando/patroni, github.com/zalando/spilo
  Alexander Kukushkin, Ants Aasma, Feike Steenbergen, Josh Berkus
Postgres Operator: github.com/zalando-incubator/postgres-operator
  Murat Kabilov, Sergey Dudoladov, Manuel Gómez
PAM OAuth2: https://github.com/CyberDem0n/pam-oauth2
  Alexander Kukushkin
Put it all together in a sane way: Jan Mußler

Slide 48

Slide 48 text

“HIRE THE BEST PEOPLE YOU CAN, AND GET OUT OF THEIR WAY.”

Slide 49

Slide 49 text

We are hiring Database Engineers https://jobs.zalando.com/jobs/570376-database-engineer-postgresql

Slide 50

Slide 50 text

No content

Slide 51

Slide 51 text

Thank you!