Upgrade to Pro — share decks privately, control downloads, hide ads and more …

PGDay.IT 2018

PGDay.IT 2018

PostgreSQL database as a service on Kubernetes

Oleksii Kliukin

June 29, 2018
Tweet

More Decks by Oleksii Kliukin

Other Decks in Programming

Transcript

  1. PostgreSQL
    Database
    as a Service
    with Kubernetes
    PGDay.IT 2018
    Lazise, Italy
    Oleksii Kliukin
    29-06-2018

    View Slide

  2. 2
    Oleksii Kliukin
    Database Engineer @ Zalando
    PostgreSQL Contributor
    Berlin PostgreSQL meetup organizer
    [email protected]
    twitter: @hintbits
    About me

    View Slide

  3. 3
    A brief history of PostgreSQL at Zalando
    Live DEMO (what can possibly go wrong?)
    How to stop worrying (and embrace Patroni)
    Kubernetes: the real thing
    What is in the name: Postgres Operator
    SELECT topics FROM agenda;
    Kubernetes-native Patroni

    View Slide

  4. 4
    WE BRING FASHION TO PEOPLE IN 17 COUNTRIES
    ● In Italy since 2013
    ● 2018: Czech Republic, Ireland
    ● HQ in Berlin
    ● Tech HUBS:
    Dortmund, Helsinki, Dublin and
    Lisbon

    View Slide

  5. 5
    PostgreSQL at Zalando
    as at May 2017
    > 300
    In the data centers
    > 170
    Databases on AWS
    Managed by DB team
    > 250
    Databases in other
    Kubernetes clusters
    > 165
    Run in the ACID’s
    Kubernetes cluster

    View Slide

  6. 6
    Why PostgreSQL

    View Slide

  7. Strong consistency

    View Slide

  8. Sophisticated transactional system
    (true serializability, transactional DDL)

    View Slide

  9. Extensibility (custom data types, indexes,
    even server processes)

    View Slide

  10. Excellent community!

    View Slide

  11. 11
    Brief history of
    PostgreSQL at Zalando

    View Slide

  12. 12
    Let’s start with names
    ` `
    primary
    standby standby
    PostgreSQL (HA) cluster
    stream
    ing
    replication
    template0
    postgres
    template1
    PostgreSQL instance

    View Slide

  13. 13
    Vintage (DC) and modern (AWS) PostgreSQL environments
    DC1
    DC2
    (one hour delay)
    NFS
    WAL
    archive
    AWS VPC

    View Slide

  14. Should you run your PostgreSQL inside a
    container?

    View Slide

  15. 15
    Spilo Docker image at Zalando
    • PGDATA on an external volume (EBS or i3/c5 NVME)
    • Environment-variables based configuration
    • One container per one EC2 instance
    • PostgreSQL versions from 9.4 up to 10
    • Plenty of extensions (all contrib, PostGIS, timescaleDB, PL/V8,
    pg_cron, etc)
    • Additional tools (pgbouncer, pgq)
    • Extremely lightweight (69MB)

    View Slide

  16. github.com/zalando/spilo

    View Slide

  17. 17
    Cluster Security Group
    Auto-Scaling
    Availability Zone A
    Data Volume
    Root volume
    Master Elastic IP
    Cloud Formation Stack
    Replica
    DB
    Availability Zone B
    Data Volume
    Root volume
    Master DB
    Availability Zone C
    Data Volume
    Root volume
    Replica
    DB
    Replica ELB
    Security Group
    Replica Elastic
    Load Balancer
    5432
    5432, 8008
    GET /replica
    db.zalando
    db-repl.zalando
    S3 bucket:
    Backup + WAL
    User Data:
    - Docker image
    - Backup schedule
    - Superuser password
    - Replication password
    - Postgres parameters
    Etcd

    View Slide

  18. Patroni is a secret ingredient to make it all work

    View Slide

  19. 19
    What is Patroni
    • Automatic failover solution for PostgreSQL
    streaming-replication
    • A daemon that manages one PostgreSQL instance
    • Keeps the state of the cluster in a DCS (Etcd, Zookeeper,
    Consul, Kubernetes), also referred to as a consistency layer
    • For new instances decides whether to initialize a new cluster or
    join an existing one
    • For running instances executes promotion/demotion when
    necessary
    • A number of additional related functions (global configuration,
    scheduled actions, pause mode, pg_rewind support, etc)

    View Slide

  20. 20
    What Patroni is not
    • Not an arbiter for the whole HA cluster
    • Not a swiss-army knife of Postgres maintenance
    • Not a substitute for a proper monitoring
    • Not a tool to use if you don’t understand how Etcd (or another
    DCS that you use) works.
    • Not a silver bullet (but tries to balance easy-to-use vs
    extensibility)
    • Not just an internal project of Zalando (IBM Compose, Red Hat
    and many other companies use it)

    View Slide

  21. 21
    Why distributed consistency?
    Etcd cluster
    Primary
    candidate
    Primary
    candidate
    Take leader
    Take leader
    Primary
    candidate
    Take
    leader

    View Slide

  22. github.com/zalando/patroni

    View Slide

  23. 23
    • A set of open-source components running on one or
    more servers
    • A container orchestration system
    • An abstraction layer over your real or virtualized
    hardware
    • An “infrastructure as code” system
    • Automatic resource allocation
    • Next step after Ansible/Chef/Puppet
    What is Kubernetes?

    View Slide

  24. 24
    • An operating system
    • A magical way to make your infrastructure scalable
    • An excuse to fire your devops (someone has to
    configure it)
    • A good solution for running 2-3 servers
    What Kubernetes is not?

    View Slide

  25. 25
    Kubernetes
    • Node
    • Pod
    • Container
    • Persistent Volumes
    • Service/Endpoint
    • Labels
    • Secrets
    Terminology: traditional DC compared to Kubernetes
    Traditional infrastructure
    • Physical server
    • Virtual machine
    • Individual application
    • NAS/SAN
    • Load balancer
    • Application registry/hardware information
    • Password files

    View Slide

  26. 26
    Declarative resource description (manifest)
    apiVersion: v1
    kind: Service
    metadata:
    name: nginx
    labels:
    app: nginx
    spec:
    ports:
    - port: 80
    name: web
    clusterIP: None
    selector:
    app: nginx

    View Slide

  27. 27
    Building a PostgreSQL cluster on Kubernetes
    • A statefulset to bind pods with
    persistent volumes and provide
    auto-recovery
    • A service to route client connections
    • Spilo as a docker container (Patroni
    + PostgreSQL) for HA
    • Secrets to store database user
    passwords

    View Slide

  28. 28
    • At least four long YAML manifests to write
    • Different parts of PostgreSQL configuration
    spread over multiple manifests
    • No easy way to work with a cluster as a whole
    (update, delete)
    • Manual generation of DB objects, i.e. users, and
    their passwords.
    Manual deployment of HA PostgreSQL cluster on Kubernetes

    View Slide

  29. 29
    • A template for your manifests
    • Only one place to fill-in deployment-related values
    • Requires running a special pod (tiller) in your Kubernetes
    cluster
    github.com/kubernetes/charts/blob/master/incubator/patroni
    Initial approach to automation: HELM

    View Slide

  30. 30
    • Implement a controller application to act on custom
    resources
    • CRD (custom resource definitions) to describe a
    domain-specific object (i.e. a Postgres cluster)
    • Encapsulates knowledge of a human operating the service
    https://coreos.com/blog/introducing-operators.html
    Kubernetes operator pattern

    View Slide

  31. 31
    • Defines a custom Postgres resource
    • Watches instances of Postgres, creates/updates/deletes
    corresponding Kubernetes objects
    • Allows updating running-cluster resources (memory, cpu,
    volumes), postgres configuration
    • Creates databases, users and automatically generates
    passwords
    • Auto-repairs, smart rolling updates (switchover to replicas
    before updating the master)
    Zalando Postgres operator

    View Slide

  32. 32
    github.com/zalando-incubator/postgres-operator
    postgres-operator.readthedocs.io
    Zalando Postgres operator

    View Slide

  33. 33
    Simple Postgres manifest
    apiVersion: "acid.zalan.do/v1"
    kind: postgresql
    metadata:
    name: acid-minimal-cluster
    spec:
    teamId: "ACID"
    volume:
    size: 1Gi
    numberOfInstances: 2
    users:
    # database owner
    zalando:
    - superuser
    - createdb
    # role for application foo
    foo_user:
    #databases: name->owner
    databases:
    foo: zalando
    postgresql:
    version: "10"

    View Slide

  34. 34
    Just a piece of cake
    • Operator starts pods with Spilo docker image
    • Operator provides environment variables to Spilo
    • Operator makes sure all Kubernetes objects are
    in sync
    • Spilo generates Patroni configuration
    • Patroni creates roles and configures PostgreSQL
    • Patroni makes sure there is only one master
    • Patroni uses Kubernetes for cluster state and
    leader lock
    • Patroni creates roles and applies configuration
    • Patroni changes service endpoints on failover

    View Slide

  35. deploy
    cluster
    manifest
    Stateful set
    Spilo pod
    Kubernetes cluster
    PATRONI
    operator
    pod
    Endpoint
    Service
    Client
    application
    operator
    config
    map
    Cluster
    secrets
    DB
    deployer
    create
    create
    create
    watch
    Infrastructure
    roles

    View Slide

  36. We had to get rid of the dependency on
    Etcd in Patroni

    View Slide

  37. 37
    • External dependency: Etcd
    • Etcd should always be available
    • Rock-solid Etcd vs Kubernetes
    cluster with frequent upgrades
    • There is already an Etcd
    deployed for Kubernetes
    Patroni with Etcd in Kubernetes
    $ etcdctl ls --recursive --sort -p /service/batman
    /service/batman/config
    /service/batman/history
    /service/batman/initialize
    /service/batman/leader
    /service/batman/members/
    /service/batman/members/postgresql0
    /service/batman/members/postgresql1
    /service/batman/optime/
    /service/batman/optime/leader
    $ etcdctl get /service/batman/leader
    postgresql1
    $ etcdctl get /service/batman/members/postgresql1
    {"conn_url":"postgres://127.0.0.1:5433/postgres","api_ur
    l":"http://127.0.0.1:8009/patroni","state":"running","ro
    le":"master","xlog_location":50476648,"timeline":2}
    $ etcdctl get /service/batman/history
    [[1,50331744,"no recovery target
    specified","2018-01-18T16:04:46+01:00"]]

    View Slide

  38. 38
    Kubernetes-native Patroni
    • Use Kubernetes as a DCS
    • Patroni keys as Kubernetes metadata
    (pods, configmaps or endpoints)
    • ResourceVersions for compare-and-set
    • “Soft” TTL

    View Slide

  39. 39
    $ kubectl get pods acid-minimal-cluster-0 -o jsonpath='{.metadata.annotations}'
    map[status:{"conn_url":"postgres://10.1.1.149:5432/postgres","api_url":"http://10.1.1.149:8008/patroni"
    ,"state":"running","role":"replica","xlog_location":26809088,"timeline":1}]
    $ kubectl get endpoints acid-minimal-cluster -o jsonpath='{.metadata.annotations}'
    map[acquireTime:2018-06-14T10:52:14.617442+00:00 leader:acid-minimal-cluster-1 optime:26809520
    renewTime:2018-06-14T10:52:45.735291+00:00 transitions:2 ttl:30]
    $ kubectl get endpoints acid-minimal-cluster-config -o jsonpath='{.metadata.annotations.config}'
    {"loop_wait":10,"maximum_lag_on_failover":33554432,"postgresql":{"parameters":{"archive_mode":"on","arc
    hive_timeout":"1800s","hot_standby":"on","max_replication_slots":5,"max_wal_senders":5,""wal_level":"
    hot_standby","wal_log_hints":"on"},"use_pg_rewind":true,"use_slots":true},"retry_timeout":10,"ttl":30
    }
    $kubectl get endpoints acid-minimal-cluster-config -o jsonpath='{.metadata.annotations.initialize}'
    6566889706167685175
    Patroni metadata in Kubernetes objects

    View Slide

  40. View Slide

  41. 41
    Should you run your PostgreSQL clusters on Kubernetes
    Strong interest in the
    community
    • Zalando Postgres Operator
    • CrunchyData Postgres Operator
    • Red Hat Project Atomic
    • KubeDB
    • Project Habitat

    View Slide

  42. 42
    ● Kubernetes cluster autoscaler vs multiple AWS AZs and persistent volumes
    ● Frequent Kubernetes cluster upgrades
    ● Switching between multiple independent Kubernetes clusters
    ● Clients outside of the DB Kubernetes cluster
    ● Too many databases in one cluster: AWS request limit exceeded.
    ● Resource management (OOM and noisy neighbour issues)
    ● Docker PID reuse
    Operational challenges

    View Slide

  43. 43
    ● Detect the to-be-decommissioned node by lack of the ready label and
    SchedulingDisalbed status
    ● Move all master pods:
    ○ move replicas to the already updated node
    ○ Failover to those replicas
    ● Pod Disruption Budget to prevent killing nodes with at least one primary.
    ● Anti-affinity to prevent scheduling pods on “not-ready” nodes
    Dealing with Kubernetes updates

    View Slide

  44. 44
    Why not AWS RDS or Aurora PostgreSQL
    Not an easy answer :)
    Full control
    • Independent of cloud provider
    • Real super user available
    • Custom extensions, PAM
    • Streaming/WAL replication in and out
    • Local storage not supported on RDS (NVMe
    SSDs)
    Costs? Cost of development? ...

    View Slide

  45. 45
    • PostgreSQL monitoring with
    bg_mon
    • PAM OAuth2 for
    PostgreSQL
    Bonus projects

    View Slide

  46. 46
    PostgreSQL monitoring with bg_mon
    ● PostgreSQL allows a custom child process called background worker
    ● Background workers in PostgreSQL can attach to the database.
    ● Background workers in PostgreSQL can do everything else.
    ● Let’s make (an open-source) one that emits top-line DB statistics via the
    REST API
    https://github.com/CyberDem0n/bg_mon

    View Slide

  47. 47
    ● PAM module written in C
    ● Open-source: https://github.com/CyberDem0n/pam-oauth2
    ● Equivalent of arbitrary-long automatically generated,
    auto-expiring passwords.
    ● Can supply arbitrary key=value pairs to check in the OAuth
    response (i.e. realm=/employees)
    OAUTH2 PAM authentication

    View Slide

  48. 48
    OAUTH2 PAM authentication
    Operator configuration:
    pam_configuration:
    https://info.example.com/oauth2/tokeninfo?access_token= uid
    realm=/employees
    pam_role_name: users
    Operator sets PAM_OAUTH2 Spilo environment variable, adds a line to pg_hba.conf
    hostssl all +users all pam
    Spilo writes /etc/pam.d/postgresql using PAM_OAUTH2 value.

    View Slide

  49. 49
    Made possible by great people inside and outside of Zalando
    Patroni and Spilo: github.com/zalando/patroni, github.com/zalando/spilo
    Alexander Kukushkin, Ants Aasma, Feike Steenbergen, Josh Berkus
    Postgres Operator: github.com/zalando-incubator/postgres-operator
    Murat Kabilov, Sergey Dudoladov, Manuel Gómez,
    PAM Oauth2: https://github.com/CyberDem0n/pam-oauth2
    Alexander Kukushkin
    Put it all together in a sane way:
    Jan Mußler

    View Slide

  50. 50
    “HIRE THE BEST PEOPLE YOU CAN, AND GET OUT OF THEIR WAY.“

    View Slide

  51. We are hiring Database Engineers
    jobs.zalando.com

    View Slide

  52. Thank you!

    View Slide