
Running Your Own Postgres-as-a-service in Kubernetes | Linux Open Source Summit 2020 | Lukas Fittl


Running a database is hard. And that's especially the case when you need tens or hundreds of databases that should be configured consistently, and have capabilities such as HA, backups, monitoring, and more.

As someone who is part of a team that has built a solution in this space, leveraging Kubernetes for the underlying cluster management, this talk will be an introduction to the tools available today to run open-source databases, and in particular PostgreSQL, at scale.

We'll review what solutions exist today for achieving this, how they interact with Kubernetes concepts such as Custom Resource Definitions (CRDs), and what matters when deciding to pick a solution, or when building your own.

The target audience for this talk is both people who need a database like PostgreSQL to run at scale, with many servers, and those who are just starting out but are interested in topics such as how to provide a database for microservices in a Kubernetes environment.



Transcript

  1. @LukasFittl
    Running Postgres-as-a-Service
    In Kubernetes


  2. @LukasFittl


  3. Why (not) run Postgres in Kubernetes?


  4. Consistency - Kubernetes manages all workloads,
    including databases


  5. Better Portability - Consistent deployment experience
    across clouds, instead of relying on cloud-specific APIs


  6. Low Latency - Co-locating compute and database
    allows certain workloads to perform better


  7. High Effort - Running anything in Kubernetes is
    complex, and databases are worse


  8. How to deploy Postgres in Kubernetes


  9. Postgres


  10. Postgres
    High Availability
    Scale Out Capabilities
    Backups
    Connection Pooling
    Monitoring
    K8S Integration


  11. Zalando Postgres Operator


  12. Crunchy Data PostgreSQL Operator


  13. PostgreSQL Hyperscale - Azure Arc


  14. Postgres
    High Availability
    Scaling Capabilities
    Backups
    Connection Pooling
    Monitoring
    K8S Integration


  15. K8S Integration
    CLIs, Operators & API Servers
    Namespace handling
    Storage


  16. Zalando Postgres Operator
    Operator Postgres
    postgresql CRD


  17. Crunchy Data PostgreSQL Operator
    “pgo” CLI, API Server, Operator, Postgres
    Pgcluster CRD, Pgpolicy CRD, Pgreplica CRD, Pgtask CRD
    Zalando Postgres Operator
    Operator, Postgres
    postgresql CRD


  18. Crunchy Data PostgreSQL Operator
    “pgo” CLI, API Server, Operator, Postgres
    Pgcluster CRD, Pgpolicy CRD, Pgreplica CRD, Pgtask CRD
    Zalando Postgres Operator
    Operator, Postgres
    postgresql CRD
    PostgreSQL Hyperscale - Azure Arc
    “azdata” CLI, API Server, Operator, Postgres (Coordinator), Postgres (Data Node), Postgres (Data Node)
    DatabaseService CRD, DatabaseServiceTask CRD


  19. PostgreSQL Hyperscale - Azure Arc:

    kind: DatabaseService
    metadata:
      ...
    spec:
      docker:
        ...
      engine:
        type: Postgres
        version: 12
      monitoring:
        ...
      scale:
        shards: 2
      scheduling:
        default:
          resources:
            requests:
              memory: 256Mi
      service:
        port: 5432
        type: NodePort
      storage:
        volumeSize: 1Gi

    Zalando Postgres Operator:

    kind: postgresql
    metadata:
      ...
    spec:
      databases:
        foo: zalando
      numberOfInstances: 2
      postgresql:
        version: "12"
      preparedDatabases:
        bar: {}
      teamId: acid
      users:
        foo_user: []
        zalando:
        - superuser
        - createdb
      volume:
        size: 1Gi

    Crunchy Data PostgreSQL Operator:

    kind: Pgcluster
    metadata:
      ...
    spec:
      ArchiveStorage:
        ...
      BackrestStorage:
        ...
      PrimaryStorage:
        ...
      ReplicaStorage:
        ...
      WALStorage:
        ...
      ...
      clustername: hippo
      database: hippo
      exporterport: "9187"
      limits: {}
      name: hippo
      namespace: pgo
      podAntiAffinity:
        default: preferred
      port: "5432"
      primaryhost: hippo
      replicas: "0"
      resources:
        memory: 128Mi
      rootsecretname: hippo-postgres-secret
      syncReplication: null
      tablespaceMounts: {}
      tlsOnly: false


  20. Namespace handling
    Crunchy Data PostgreSQL Operator namespace modes:
    dynamic:
      Operator can create, delete, and update any namespace, and manages RBAC.
      Operator requires ClusterRole privilege.
    readonly:
      Namespaces need to be pre-created and RBAC pre-configured.
      Operator requires ClusterRole privilege.
    disabled:
      Deploy to a single namespace; no ClusterRole privilege required.
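
The three namespace modes above can be read as a small decision table. The following Python sketch encodes only what the slide states; the class name, field names, and the `allowed_modes` helper are illustrative assumptions and are not part of the operator itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NamespaceMode:
    manages_namespaces_and_rbac: bool  # operator creates/updates/deletes namespaces itself
    needs_cluster_role: bool           # operator requires ClusterRole privilege


# Values transcribed from the slide; the field names are hypothetical.
NAMESPACE_MODES = {
    "dynamic": NamespaceMode(manages_namespaces_and_rbac=True, needs_cluster_role=True),
    "readonly": NamespaceMode(manages_namespaces_and_rbac=False, needs_cluster_role=True),
    "disabled": NamespaceMode(manages_namespaces_and_rbac=False, needs_cluster_role=False),
}


def allowed_modes(cluster_role_available: bool) -> list[str]:
    """Return the modes usable given whether a ClusterRole can be granted."""
    return [
        name
        for name, mode in NAMESPACE_MODES.items()
        if cluster_role_available or not mode.needs_cluster_role
    ]


print(allowed_modes(cluster_role_available=False))
print(allowed_modes(cluster_role_available=True))
```

If cluster-wide privileges cannot be granted, only the single-namespace mode remains, which is the trade-off the slide highlights.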


  21. Storage
    Generally, expect Persistent Volume Claims (PVCs) to be used for the database storage.

    Crunchy Data PostgreSQL Operator also supports tablespaces, to use different storage types within the same database server (be careful when using them).
    Diagram labels: Postgres, Persistent Volume, Persistent Volume Claim, Network Storage


  22. Postgres
    High Availability
    Scaling Capabilities
    Backups
    Connection Pooling
    Monitoring
    K8S Integration


  23. Scenario 1: Automated Failover within a K8S cluster
    K8S Cluster 1
    PG1
    Primary
    PG2
    Secondary
    Sync Rep
    Operator


  24. Scenario 1: Automated Failover within a K8S cluster
    K8S Cluster 1
    PG1
    Primary
    PG2
    Secondary
    Operator
    Node Failure
    Detect


  25. Scenario 1: Automated Failover within a K8S cluster
    K8S Cluster 1
    PG1
    Primary
    PG2
    Primary
    Operator
    Node Failure
    Promote


  26. Scenario 1: Automated Failover within a K8S cluster
    K8S Cluster 1
    PG1
    Secondary
    PG2
    Primary
    Operator
    Sync Rep
    Recover
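
The four Scenario 1 slides walk through detect, promote, and recover. As a rough illustration, the role changes can be modelled as a tiny state transition in Python. This is a toy model under stated assumptions: the function names and the "down" state are hypothetical, and it is not how any of the operators discussed here implement failover.

```python
def fail_over(roles: dict[str, str], failed: str) -> dict[str, str]:
    """Mark a failed node as down and, if it was the primary, promote a secondary."""
    new_roles = dict(roles)
    was_primary = roles.get(failed) == "primary"
    new_roles[failed] = "down"  # "down" is an assumed label for the failed node
    if was_primary:
        candidates = [n for n, r in roles.items() if r == "secondary"]
        if not candidates:
            raise RuntimeError("no secondary available to promote")
        new_roles[candidates[0]] = "primary"  # the "Promote" step
    return new_roles


def recover(roles: dict[str, str], node: str) -> dict[str, str]:
    """Rejoin a recovered node as a secondary (the "Recover" step)."""
    new_roles = dict(roles)
    new_roles[node] = "secondary"
    return new_roles


roles = {"PG1": "primary", "PG2": "secondary"}
after_failure = fail_over(roles, "PG1")
recovered = recover(after_failure, "PG1")
print(after_failure)
print(recovered)
```

The end state matches the last slide of the scenario: PG2 stays primary and PG1 rejoins as a secondary rather than reclaiming its old role.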


  27. Scenario 2: Disaster Recovery to another K8S cluster
    K8S Cluster 1
    PG1
    Primary
    PG2
    Secondary
    Sync Rep
    K8S Cluster 2
    PG3
    Secondary
    Async Replication
    Operator Operator


  28. Scenario 2: Disaster Recovery to another K8S cluster
    K8S Cluster 1
    PG1
    Primary
    PG2
    Secondary
    Sync Rep
    K8S Cluster 2
    PG3
    Primary
    Operator Operator
    Large-Scale Data Center Failure
    Promote


  29. High Availability
                                         HA within Same K8S Cluster   HA across K8S Clusters
    Zalando Postgres Operator            Built-In                     Manual
    Crunchy Data PostgreSQL Operator     Built-In                     Manual
    PostgreSQL Hyperscale - Azure Arc    Built-In                     Manual


  30. Pod Anti-Affinity
    label: failure-domain.beta.kubernetes.io/region=westus2
    failure-domain…/zone=0 failure-domain…/zone=1 failure-domain…/zone=2
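
To show how the zone labels on this slide are typically used, here is a hedged Python sketch that builds a Kubernetes `podAntiAffinity` stanza as a plain dictionary. The structure follows the standard Kubernetes pod affinity schema; the `cluster-name: hippo` selector and the helper name are illustrative assumptions. The topology key is the label shown on the slide; newer Kubernetes releases use `topology.kubernetes.io/zone` instead.

```python
import json

# Zone label as shown on the slide (the older, beta form of the label).
SLIDE_ZONE_LABEL = "failure-domain.beta.kubernetes.io/zone"


def zone_anti_affinity(match_labels: dict, topology_key: str = SLIDE_ZONE_LABEL) -> dict:
    """Build a 'preferred' anti-affinity rule that spreads matching pods across zones."""
    return {
        "podAntiAffinity": {
            "preferredDuringSchedulingIgnoredDuringExecution": [
                {
                    "weight": 100,
                    "podAffinityTerm": {
                        "labelSelector": {"matchLabels": match_labels},
                        "topologyKey": topology_key,
                    },
                }
            ]
        }
    }


# Hypothetical label identifying the pods of one Postgres cluster.
affinity = zone_anti_affinity({"cluster-name": "hippo"})
print(json.dumps(affinity, indent=2))
```

A "preferred" rule asks the scheduler to avoid placing two matching pods in the same zone but still schedules them if no other zone is available; a "required" rule would refuse to schedule instead.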


  31. Postgres
    High Availability
    Scaling Capabilities
    Backups
    Connection Pooling
    Monitoring
    K8S Integration


  32. Backups
                                         Local Volume Backups     Point-in-Time Restore   Offsite Backups
    Zalando Postgres Operator            n/a                      Built-in (wal-e)        Built-in (wal-e)
    Crunchy Data PostgreSQL Operator     Built-in (pgBackRest)    Built-in                Built-in (Amazon S3)
    PostgreSQL Hyperscale - Azure Arc    Built-in                 Built-in                Built-in (K8S Volume Mount)


  33. Postgres
    High Availability
    Scaling Capabilities
    Backups
    Connection Pooling
    Monitoring
    K8S Integration


  34. Monitoring
                                         Metrics                    Logs
    Zalando Postgres Operator            not built in               not built in
    Crunchy Data PostgreSQL Operator     Grafana                    Built-in pgbadger
    PostgreSQL Hyperscale - Azure Arc    Grafana + Azure Monitor    Kibana + Azure Log Analytics


  35. PostgreSQL Hyperscale - Azure Arc


  36. Postgres
    High Availability
    Scaling Capabilities
    Backups
    Connection Pooling
    Monitoring
    K8S Integration


  37. Connection Pooling
    Postgres
    pgbouncer
    Application
    pgbouncer is important for idle connection scaling in Postgres:
    Idle connection in Postgres: 5-10 MB
    Idle connection in pgbouncer: <1 MB
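
To make the idle-connection figures above concrete, here is a back-of-the-envelope sketch in Python. The per-connection numbers come from the slide; the count of 500 idle connections and the helper name `idle_memory_mb` are illustrative assumptions, and real memory use depends on workload and configuration.

```python
def idle_memory_mb(connections: int, mb_per_connection: float) -> float:
    """Estimate memory held by idle connections, in MB."""
    return connections * mb_per_connection


connections = 500  # hypothetical number of idle client connections

pg_low = idle_memory_mb(connections, 5)       # slide's lower bound for Postgres
pg_high = idle_memory_mb(connections, 10)     # slide's upper bound for Postgres
bouncer_max = idle_memory_mb(connections, 1)  # slide says "<1MB", so this is an upper bound

print(f"Postgres: {pg_low:.0f}-{pg_high:.0f} MB")
print(f"pgbouncer: under {bouncer_max:.0f} MB")
```

Under these assumptions the same idle clients cost several gigabytes when connected directly to Postgres and well under one gigabyte when parked in pgbouncer.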


  38. Connection Pooling
                                         pgbouncer
    Zalando Postgres Operator            Built-in
    Crunchy Data PostgreSQL Operator     Built-in
    PostgreSQL Hyperscale - Azure Arc    Planned


  39. Postgres
    High Availability
    Scaling Capabilities
    Backups
    Connection Pooling
    Monitoring
    K8S Integration


  40. Scaling Capabilities
    Read Replicas:
    Help you scale the read performance; Max data size = max storage size per node
    Postgres
    Postgres (Read Only)
    Postgres (Read Only)
    Application


  41. Scaling Capabilities
    Hyperscale (Citus):
    Scales both read and write performance; Max data size = # of data nodes * storage size per node
    Postgres (Coordinator)
    Postgres (Data Node)
    Postgres (Data Node)
    Application
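
The two capacity rules stated on the scaling slides (read replicas versus Hyperscale (Citus)) can be compared with a short Python sketch. The formulas are taken from the slides; the function names and the 1024 GB per-node storage figure are illustrative assumptions, while the two data nodes mirror the diagram.

```python
def read_replica_capacity_gb(storage_per_node_gb: float) -> float:
    """Read replicas each hold a full copy, so capacity is one node's storage."""
    return storage_per_node_gb


def sharded_capacity_gb(data_nodes: int, storage_per_node_gb: float) -> float:
    """Sharded setup: capacity = number of data nodes * storage size per node."""
    return data_nodes * storage_per_node_gb


storage_per_node_gb = 1024  # hypothetical storage size per node
data_nodes = 2              # two data nodes, as drawn on the slide

replica_capacity = read_replica_capacity_gb(storage_per_node_gb)
sharded_capacity = sharded_capacity_gb(data_nodes, storage_per_node_gb)

print(f"Read replicas: {replica_capacity} GB")
print(f"Sharded across {data_nodes} data nodes: {sharded_capacity} GB")
```

Adding read replicas leaves the first number unchanged, whereas adding data nodes grows the second one linearly.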


  42. Scaling Up vs. Scaling Out
    Scaling Up: Block growth on 1 (monolithic) database
    Scaling Out: Grow to 100s of database nodes, without re-architecting your application
    Diagram label: 18 Total Nodes


  43. Demo
    Deploying & Scaling out with
    PostgreSQL Hyperscale - Azure Arc


  44. Thank you!
    [email protected]
    @LukasFittl
