3 A brief history of PostgreSQL at Zalando Live DEMO (what can possibly go wrong?) How to stop worrying (and embrace Patroni) Kubernetes: the real thing What is in the name: Postgres Operator SELECT topics FROM agenda; Kubernetes-native Patroni
4 WE BRING FASHION TO PEOPLE IN 17 COUNTRIES ● In Italy since 2013 ● 2018: Czech Republic, Ireland ● HQ in Berlin ● Tech HUBS: Dortmund, Helsinki, Dublin and Lisbon
5 PostgreSQL at Zalando as at May 2017 > 300 In the data centers > 170 Databases on AWS Managed by DB team > 250 Databases in other Kubernetes clusters > 165 Run in the ACID’s Kubernetes cluster
15 Spilo Docker image at Zalando • PGDATA on an external volume (EBS or i3/c5 NVME) • Environment-variables based configuration • One container per one EC2 instance • PostgreSQL versions from 9.4 up to 10 • Plenty of extensions (all contrib, PostGIS, timescaleDB, PL/V8, pg_cron, etc) • Additional tools (pgbouncer, pgq) • Extremely lightweight (69MB)
17 Cluster Security Group Auto-Scaling Availability Zone A Data Volume Root volume Master Elastic IP Cloud Formation Stack Replica DB Availability Zone B Data Volume Root volume Master DB Availability Zone C Data Volume Root volume Replica DB Replica ELB Security Group Replica Elastic Load Balancer 5432 5432, 8008 GET /replica db.zalando db-repl.zalando S3 bucket: Backup + WAL User Data: - Docker image - Backup schedule - Superuser password - Replication password - Postgres parameters Etcd
19 What is Patroni • Automatic failover solution for PostgreSQL streaming-replication • A daemon that manages one PostgreSQL instance • Keeps the state of the cluster in a DCS (Etcd, Zookeeper, Consul, Kubernetes), also referred to as a consistency layer • For new instances decides whether to initialize a new cluster or join an existing one • For running instances executes promotion/demotion when necessary • A number of additional related functions (global configuration, scheduled actions, pause mode, pg_rewind support, etc)
20 What Patroni is not • Not an arbiter for the whole HA cluster • Not a swiss-army knife of Postgres maintenance • Not a substitute for a proper monitoring • Not a tool to use if you don’t understand how Etcd (or another DCS that you use) works. • Not a silver bullet (but tries to balance easy-to-use vs extensibility) • Not just an internal project of Zalando (IBM Compose, Red Hat and many other companies use it)
23 • A set of open-source components running on one or more servers • A container orchestration system • An abstraction layer over your real or virtualized hardware • An “infrastructure as code” system • Automatic resource allocation • Next step after Ansible/Chef/Puppet What is Kubernetes?
24 • An operating system • A magical way to make your infrastructure scalable • An excuse to fire your devops (someone has to configure it) • A good solution for running 2-3 servers What Kubernetes is not?
27 Building a PostgreSQL cluster on Kubernetes • A statefulset to bind pods with persistent volumes and provide auto-recovery • A service to route client connections • Spilo as a docker container (Patroni + PostgreSQL) for HA • Secrets to store database user passwords
28 • At least four long YAML manifests to write • Different parts of PostgreSQL configuration spread over multiple manifests • No easy way to work with a cluster as a whole (update, delete) • Manual generation of DB objects, i.e. users, and their passwords. Manual deployment of HA PostgreSQL cluster on Kubernetes
29 • A template for your manifests • Only one place to fill-in deployment-related values • Requires running a special pod (tiller) in your Kubernetes cluster github.com/kubernetes/charts/blob/master/incubator/patroni Initial approach to automation: HELM
30 • Implement a controller application to act on custom resources • CRD (custom resource definitions) to describe a domain-specific object (i.e. a Postgres cluster) • Encapsulates knowledge of a human operating the service https://coreos.com/blog/introducing-operators.html Kubernetes operator pattern
34 Just a piece of cake • Operator starts pods with Spilo docker image • Operator provides environment variables to Spilo • Operator makes sure all Kubernetes objects are in sync • Spilo generates Patroni configuration • Patroni creates roles and configures PostgreSQL • Patroni makes sure there is only one master • Patroni uses Kubernetes for cluster state and leader lock • Patroni creates roles and applies configuration • Patroni changes service endpoints on failover
37 • External dependency: Etcd • Etcd should always be available • Rock-solid Etcd vs Kubernetes cluster with frequent upgrades • There is already an Etcd deployed for Kubernetes Patroni with Etcd in Kubernetes $ etcdctl ls --recursive --sort -p /service/batman /service/batman/config /service/batman/history /service/batman/initialize /service/batman/leader /service/batman/members/ /service/batman/members/postgresql0 /service/batman/members/postgresql1 /service/batman/optime/ /service/batman/optime/leader $ etcdctl get /service/batman/leader postgresql1 $ etcdctl get /service/batman/members/postgresql1 {"conn_url":"postgres://127.0.0.1:5433/postgres","api_ur l":"http://127.0.0.1:8009/patroni","state":"running","ro le":"master","xlog_location":50476648,"timeline":2} $ etcdctl get /service/batman/history [[1,50331744,"no recovery target specified","2018-01-18T16:04:46+01:00"]]
38 Kubernetes-native Patroni • Use Kubernetes as a DCS • Patroni keys as Kubernetes metadata (pods, configmaps or endpoints) • ResourceVersions for compare-and-set • “Soft” TTL
41 Should you run your PostgreSQL clusters on Kubernetes Strong interest in the community • Zalando Postgres Operator • CrunchyData Postgres Operator • Red Hat Project Atomic • KubeDB • Project Habitat
43 ● Detect the to-be-decommissioned node by lack of the ready label and SchedulingDisalbed status ● Move all master pods: ○ move replicas to the already updated node ○ Failover to those replicas ● Pod Disruption Budget to prevent killing nodes with at least one primary. ● Anti-affinity to prevent scheduling pods on “not-ready” nodes Dealing with Kubernetes updates
44 Why not AWS RDS or Aurora PostgreSQL Not an easy answer :) Full control • Independent of cloud provider • Real super user available • Custom extensions, PAM • Streaming/WAL replication in and out • Local storage not supported on RDS (NVMe SSDs) Costs? Cost of development? ...
46 PostgreSQL monitoring with bg_mon ● PostgreSQL allows a custom child process called background worker ● Background workers in PostgreSQL can attach to the database. ● Background workers in PostgreSQL can do everything else. ● Let’s make (an open-source) one that emits top-line DB statistics via the REST API https://github.com/CyberDem0n/bg_mon
47 ● PAM module written in C ● Open-source: https://github.com/CyberDem0n/pam-oauth2 ● Equivalent of arbitrary-long automatically generated, auto-expiring passwords. ● Can supply arbitrary key=value pairs to check in the OAuth response (i.e. realm=/employees) OAUTH2 PAM authentication
49 Made possible by great people inside and outside of Zalando Patroni and Spilo: github.com/zalando/patroni, github.com/zalando/spilo Alexander Kukushkin, Ants Aasma, Feike Steenbergen, Josh Berkus Postgres Operator: github.com/zalando-incubator/postgres-operator Murat Kabilov, Sergey Dudoladov, Manuel Gómez, PAM Oauth2: https://github.com/CyberDem0n/pam-oauth2 Alexander Kukushkin Put it all together in a sane way: Jan Mußler