
Step-by-step Patroni cooking guide

From day one Patroni was positioned as a template for PostgreSQL HA, because it is far from being a one-size-fits-all or plug-and-play system. Setting it up in a test lab is a very simple task, while going to production requires solving problems that are not always typical for a DBA:
- How to choose/set up/manage an Etcd/Consul/ZooKeeper cluster?
- How to do backups/PITR with Patroni?
- How to build/reinitialize replicas without hitting the primary?
- How to manage PostgreSQL configuration with Patroni?
- How to monitor the whole system?
- How to provide a single endpoint for applications?
- Where would be a good place for PgBouncer in this setup?

All these (and many more) questions regularly appear among issues in the Patroni repository and in the #Patroni Slack channel.

In this talk I will guide you through the whole process of setting up a production-ready HA cluster managed by Patroni, starting with Etcd and finishing with a single connection endpoint for applications.

Alexander Kukushkin

March 12, 2024

Transcript

  1. Agenda • What is Patroni (and how it works) • Choosing DCS (Etcd, Consul, Zookeeper or K8s API) • Backup/PITR • Configuration management (Patroni and Postgres) • Providing a single endpoint (+connection pooling) • Monitoring and troubleshooting
  2. Distributed Configuration (Key-Value) Store • Consul, Etcd (v2/v3), Zookeeper, Kubernetes API • Service Discovery ◦ Every Postgres node maintains a key with its state ◦ Leader key points to the primary • Lease/Session/TTL to expire data (i.e. leader key) • Atomic CAS operations • Watches for important keys (i.e. leader key) — see the sketch below
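      A minimal sketch of these leader-key mechanics using etcdctl against etcd v3; the key path /service/batman/leader, the lease ID and the node name are illustrative assumptions, not Patroni's literal implementation (Patroni talks to the DCS through its client API):
      # grant a lease with a 30s TTL -- the TTL is what makes the leader key expire
      $ etcdctl lease grant 30
      lease 694d77aa9e38260f granted with TTL(30s)
      # the winner of the leader race writes the leader key bound to its lease
      $ etcdctl put --lease=694d77aa9e38260f /service/batman/leader nodeA
      # the primary keeps refreshing the lease on every HA loop; if it dies, the key expires
      $ etcdctl lease keep-alive 694d77aa9e38260f
      # standbys watch the leader key and react when it changes or expires
      $ etcdctl watch /service/batman/leader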
  3. Patroni: Normal operation — diagram: Node A (Primary) holds the leader key (/leader: A, ttl: 30) and periodically refreshes it with UPDATE /leader, A, ttl=30, prev=A (SUCCESS); Node B and Node C (Standby) WATCH /leader.
  4. Patroni: primary dies, leader key holds — diagram: Node A (Primary) crashes (BANG!); /leader: A still exists with the TTL counting down (ttl: 7); Node B and Node C (Standby) keep watching /leader.
  5. Patroni: leader key expires — diagram: the TTL runs out (/leader: A, ttl: 0) and Node B and Node C (Standby) receive NOTIFY /leader, expired=true.
  6. Patroni: leader race — diagram: Node B checks the other members: GET http://A:8008/patroni -> failed/timeout, GET http://C:8008/patroni -> wal_lsn: 100; Node C does the same: GET http://A:8008/patroni -> failed/timeout, GET http://B:8008/patroni -> wal_lsn: 100.
  7. Patroni: leader race — diagram: Node B attempts CREATE /leader, B, ttl=30, prevExists=false and Node C attempts CREATE /leader, C, ttl=30, prevExists=false; only one CREATE can succeed (here Node C wins, Node B gets FAIL).
  8. Patroni: promote and continue replication — diagram: Node C promotes to Primary; Node B (Standby) keeps WATCHing /leader and replicates from the new primary; Node A is still down.
  9. Choosing DCS • Rule of thumb ◦ pick something that is already used in your organization • If nothing is, use Etcd (etcd3) ◦ A single Etcd cluster can serve thousands of Patroni clusters (example below)
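      For illustration, a minimal etcd3 section in patroni.yaml could look like this sketch (the host addresses are assumptions):
      etcd3:
        hosts:
          - 10.0.1.11:2379
          - 10.0.1.12:2379
          - 10.0.1.13:2379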
  10. DCS, general tips • Make sure hardware clocks are in sync between nodes! • Run the DCS on nodes separate from Patroni/Postgres ◦ DCS nodes are sensitive to I/O latency ◦ If you can't, put Postgres and the DCS on separate volumes • Protect your DCS ◦ Server/client certificates + passwords ◦ When possible, RBAC (sketch below)
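      A hedged sketch of the "protect your DCS" advice for etcd3 in patroni.yaml (all paths and credentials are placeholders):
      etcd3:
        hosts: 10.0.1.11:2379,10.0.1.12:2379,10.0.1.13:2379
        protocol: https
        cacert: /etc/patroni/etcd-ca.pem
        cert: /etc/patroni/etcd-client.pem
        key: /etc/patroni/etcd-client-key.pem
        username: patroni
        password: very-secret-password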
  11. Consul • Don't run Patroni on Consul server nodes ◦ Consul immediately invalidates sessions when a node experiences network issues • Don't set consul.checks (keep the default) ◦ The session is invalidated if a check fails • The leader key is removed when the session is invalidated ◦ Failover! (example below)
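      A hedged example of the matching consul section in patroni.yaml (address and token are assumptions):
      consul:
        host: 127.0.0.1:8500
        # token: <acl-token>
        # leave consul.checks at its default (unset) -- a failing check would invalidate the session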
  12. Kubernetes • Prefer Endpoints to ConfigMaps ◦ Leader Endpoint subsets are atomically updated with the leader Pod IP address • Can use external DCS, but you'll have to implement callbacks to update Pod labels (sketch below).
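      A hedged sketch of a kubernetes section in patroni.yaml preferring Endpoints over ConfigMaps (namespace and labels are assumptions):
      kubernetes:
        namespace: default
        use_endpoints: true   # store the leader record in an Endpoints object instead of a ConfigMap
        labels:
          application: patroni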
  13. Enable DCS Failsafe Mode • Patroni keeps the primary running when the DCS is down, as long as all members of the Patroni/PostgreSQL cluster are up.
      $ patronictl edit-config
      ---
      +++
      @@ -4,3 +4,4 @@
       use_pg_rewind: true
       retry_timeout: 10
       ttl: 30
      +failsafe_mode: on
      Apply these changes? [y/N]: y
      Configuration changed
      Documentation: DCS Failsafe Mode
  14. Backup/PITR • HA != Backup • Use real backup tools ◦ pg_basebackup ◦ pgBackRest ◦ wal-g ◦ Barman • Binary backups require continuous archiving! • Set the following GUCs (sketch below): ◦ archive_mode = 'on' ◦ archive_command = '...' ◦ restore_command = '...'
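      As an illustration with pgBackRest on PostgreSQL 12+ (the stanza name batman is an assumption; on older versions restore_command belongs in the recovery configuration instead), the global configuration could contain:
      postgresql:
        parameters:
          archive_mode: 'on'
          archive_command: 'pgbackrest --stanza=batman archive-push %p'
          restore_command: 'pgbackrest --stanza=batman archive-get %f "%p"'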
  15. Build replicas from backup
      postgresql:
        create_replica_methods:
          - pgbackrest   # First try pgbackrest
          - basebackup   # Try pg_basebackup if pgbackrest failed
        pgbackrest:
          command: /usr/bin/pgbackrest --stanza=<scope> --delta restore
          keep_data: on
          no_params: on
        basebackup:
          max-rate: '100M'
      Docs.
  16. Build replicas from backup (or other nodes)
      postgresql:
        create_replica_methods:
          - pgbackrest   # First try pgbackrest
          - basebackup   # Try pg_basebackup if pgbackrest failed
        pgbackrest:
          command: /usr/bin/pgbackrest --stanza=<scope> --delta restore
          keep_data: on
          no_params: on
        basebackup:
          max-rate: '100M'
      tags:
        clonefrom: on   # allow other nodes to fetch a basebackup from this node
      Docs.
  17. Permanent slots
      $ patronictl edit-config
      slots:
        node1:
          type: physical
        node2:
          type: physical
        node3:
          type: physical
      • Helps when you can't implement continuous archiving/PITR • Node names should be static • Protects WAL files while nodes are temporarily down
  18. Custom bootstrap (PITR)
      bootstrap:
        method: <custom_bootstrap_method_name>   # instead of initdb
        <custom_bootstrap_method_name>:
          command: <path_to_custom_bootstrap_script> [param1 [, ...]]
          keep_existing_recovery_conf: false
          no_params: false
          recovery_conf:
            recovery_target_action: promote
            recovery_target_timeline: latest
            recovery_target_time: '2024-03-12 12:00:00 CET'
            restore_command: <method_specific_restore_command>
      Custom bootstrap documentation.
  19. Configuration management • Global dynamic config (stored in DCS) ◦ Patroni/Postgres parameters ◦ patronictl edit-config ◦ applied on all Patroni nodes, unless overridden • Local config (patroni.yaml) ◦ Patroni/Postgres parameters • Environment configuration ◦ Only Patroni parameters
  20. Configuration management (example)
      etcd: /config        -> {"postgresql":{"parameters":{"work_mem":"16MB"}}}
      patroni.yaml         -> postgresql:
                                parameters:
                                  work_mem: 12MB
      postgresql.conf      -> # Do not edit this file manually!
                              # It will be overwritten by Patroni!
                              include 'postgresql.base.conf'
                              work_mem = '12MB'
      ALTER SYSTEM SET     -> ALTER SYSTEM SET work_mem TO '24MB';
      $ psql -c "show work_mem"
       work_mem
      ----------
       24MB
      (1 row)
  21. Configuration management (exceptions)
      • Global only Postgres parameters ◦ max_connections ◦ max_locks_per_transaction ◦ max_worker_processes ◦ max_prepared_transactions ◦ wal_level ◦ track_commit_timestamp ◦ max_wal_senders ◦ max_replication_slots ◦ wal_keep_size
      • Global only Patroni parameters ◦ ttl ◦ loop_wait ◦ retry_timeout ◦ maximum_lag_on_failover ◦ synchronous_mode ◦ failsafe_mode ◦ primary_start_timeout ◦ max_timelines_history ◦ check_timeline ◦ postgresql.use_slots ◦ slots ◦ ignore_slots ◦ standby_cluster
  22. Example: change max_connections
      $ patronictl edit-config \
          -p max_connections=101 --force
      ---
      +++
      @@ -2,7 +2,7 @@
       maximum_lag_on_failover: 1048576
       postgresql:
         parameters:
      -    max_connections: 100
      +    max_connections: 101   # increase!
         use_pg_rewind: true
       primary_start_timeout: 0
       retry_timeout: 10
      Configuration changed
      # when increasing max_connections restart standby nodes first
      $ patronictl restart batman node2   # standby first
      $ patronictl restart batman node1   # primary
      # when decreasing max_connections restart the primary first
      $ patronictl restart batman node1   # primary first
      $ patronictl restart batman node2   # standby
  23. Bootstrap configuration
      # The bootstrap configuration. Works only when the cluster is not yet initialized.
      # If the cluster is already initialized, all changes in the `bootstrap` section are ignored!
      bootstrap:
        # This section will be written into Etcd:/<namespace>/<scope>/config after initializing the new cluster
        # and all other cluster members will use it as the `global configuration`.
        # WARNING! If you want to change any of the parameters that were set up
        # via the `bootstrap.dcs` section, please use `patronictl edit-config`!
        dcs:
          ttl: 30
          loop_wait: 10
          retry_timeout: 10
          postgresql:
            parameters:
              max_connections: 150
  24. Security aspects of configuration • Protect the Patroni REST API ◦ Enable SSL ◦ Enable basic auth, or better, client certs (sketch below) • pg_hba, archive_command, restore_command, archive_cleanup, callbacks, and similar are better kept in the local config ◦ think about an attacker having access to the DCS
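      A hedged sketch of a protected REST API in the local patroni.yaml (file paths and credentials are placeholders):
      restapi:
        listen: '*:8008'
        connect_address: 10.0.0.1:8008
        certfile: /etc/patroni/restapi.pem        # enables SSL
        keyfile: /etc/patroni/restapi-key.pem
        cafile: /etc/patroni/ca.pem
        verify_client: required                   # require client certificates; alternatively rely on basic auth
        authentication:
          username: patroni
          password: very-secret-password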
  25. Connection routing — diagram: Node A (the old Primary) fails (BANG!) and Node C becomes the new Primary, but applications still connect to the old address and get: error: connection to server at "10.0.0.1", port 5432 failed: Connection refused — Is the server running on that host and accepting TCP/IP connections?
  26. Connection routing options • List all nodes in the connection string: ◦ jdbc:postgresql://node1,node2,node3/postgres?targetServerType=primary ◦ postgresql://host1:port1,host2:port2/?target_session_attrs=read-write • Cybertec vip-manager • postgresql.callbacks scripts (sketch below) ◦ on_start, on_stop, on_restart, on_role_change ◦ Update DNS, assign/remove VIP
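      A hedged sketch of the callback approach: Patroni invokes the configured script with three arguments — action, role and cluster name. The script name, VIP and interface are hypothetical, and moving an IP address needs appropriate privileges:
      postgresql:
        callbacks:
          on_start: /usr/local/bin/patroni_callback.sh
          on_stop: /usr/local/bin/patroni_callback.sh
          on_role_change: /usr/local/bin/patroni_callback.sh
      #!/bin/bash
      # /usr/local/bin/patroni_callback.sh -- hypothetical example
      ACTION="$1"; ROLE="$2"; CLUSTER="$3"
      VIP="10.0.0.100/24"; DEV="eth0"
      case "$ACTION/$ROLE" in
        on_role_change/primary|on_role_change/master|on_start/primary|on_start/master)
          ip address replace "$VIP" dev "$DEV" ;;                  # claim the VIP on the primary
        *)
          ip address del "$VIP" dev "$DEV" 2>/dev/null || true ;;  # release it everywhere else
      esac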
  27. Connection routing via middleware (LB) — diagram: applications connect through a load balancer; when Node A (the old Primary) fails (BANG!), the load balancer redirects traffic to Node C, the new Primary.
  28. Load-balancing basics
      restapi:
        listen: '*:8008'
        connect_address: '10.0.0.1:8008'
      postgresql:
        listen: '127.0.0.1,10.0.0.1:5432'
        connect_address: '10.0.0.1:5432'
      $ curl --head http://node1:8008/standby
      HTTP/1.0 503 Service Unavailable
      $ curl --head http://node1:8008/primary
      HTTP/1.0 200 OK
      global
          maxconn 100
      defaults
          log global
          mode tcp
          retries 2
          timeout client 30m
          timeout connect 4s
          timeout server 30m
          timeout check 5s
      listen my_cluster_primary
          bind *:5000
          option httpchk HEAD /primary
          http-check expect status 200
          default-server inter 3s fall 3 rise 2 on-marked-down shutdown-sessions
          server node1_5432 node1:5432 maxconn 100 check port 8008
          server node2_5432 node2:5432 maxconn 100 check port 8008
      listen my_cluster_standby
          bind *:5001
          option httpchk HEAD /standby   # Load-balance between standby nodes
          http-check expect status 200
          default-server inter 3s fall 3 rise 2 on-marked-down shutdown-sessions
          server node1_5432 node1:5432 maxconn 100 check port 8008
          server node2_5432 node2:5432 maxconn 100 check port 8008
  29. Making load-balancing dynamic • HAProxy + confd to manage HAProxy configs from the DCS ◦ Sample configs and templates
      listen my_cluster_primary
          bind *:5000
          option httpchk HEAD /primary
          http-check expect status 200
          default-server inter 3s fall 3 rise 2 on-marked-down shutdown-sessions
          {{range gets "/members/*"}} server {{base .Key}} {{$data := json .Value}}{{base (replace (index (split $data.conn_url "/") 2) "@" "/" -1)}} maxconn 100 check port {{index (split (index (split $data.api_url "/") 2) ":") 1}} {{end}}
  30. What if HAProxy fails? • Run HAProxy on multiple nodes ◦ Next to Patroni/Postgres, or on dedicated nodes • List all IPs in a DNS A record (2nd level balancing) ◦ Beware of DNS caches! • Use keepalived (sketch below)
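      A minimal keepalived sketch, assuming HAProxy runs on two nodes and a VIP should follow a healthy HAProxy (interface, router ID, priorities and VIP are placeholders):
      vrrp_script chk_haproxy {
          script "killall -0 haproxy"    # succeeds while an haproxy process is running
          interval 2
      }
      vrrp_instance VI_1 {
          state MASTER                   # BACKUP on the second node
          interface eth0
          virtual_router_id 51
          priority 101                   # use a lower priority on the second node
          track_script {
              chk_haproxy
          }
          virtual_ipaddress {
              10.0.0.100/24
          }
      }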
  31. Connection pooling, option 1 — diagram: Application → HAProxy → PgBouncer running next to Postgres on every node (Node A Primary, Node B and Node C Standby).
  32. Connection pooling, option 2 — diagram: Application → PgBouncer → HAProxy → Postgres nodes (Node A Primary, Node B and Node C Standby).
  33. Connection pooling • PgBouncer in front of HAProxy: fewer connections between HAProxy and PostgreSQL, switchover without errors (PgBouncer pause/resume) • PgBouncer in front of PostgreSQL: smaller latency between PgBouncer and PostgreSQL (sketch below)
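      A hedged pgbouncer.ini sketch for the "PgBouncer in front of HAProxy" layout, pointing at the HAProxy primary port 5000 from item 28 (the database wildcard, auth settings and pool sizes are assumptions):
      [databases]
      ; send everything to the HAProxy "primary" listener
      * = host=127.0.0.1 port=5000

      [pgbouncer]
      listen_addr = 0.0.0.0
      listen_port = 6432
      auth_type = scram-sha-256
      auth_file = /etc/pgbouncer/userlist.txt
      pool_mode = transaction
      default_pool_size = 20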
  34. patronictl • helps to manage Patroni cluster(s) ◦ manual failover/switchover ◦ check status ◦ edit configuration • can work on a node different from Patroni ◦ just needs a config file • Please read the patronictl documentation! (examples below)
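      A few typical patronictl invocations (the cluster name batman and the config path are assumptions):
      $ patronictl -c /etc/patroni/patroni.yaml list batman
      $ patronictl -c /etc/patroni/patroni.yaml switchover batman --candidate node2
      $ patronictl -c /etc/patroni/patroni.yaml edit-config batman
      $ patronictl -c /etc/patroni/patroni.yaml reinit batman node3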
  35. Monitoring • Patroni doesn't replace your monitoring solution ◦ It doesn't react to an overloaded CPU, increased I/O, slow queries or similar • Patroni/Postgres should be monitored (on every node!) ◦ Monitoring endpoint: GET /patroni (JSON) ◦ Prometheus endpoint: GET /metrics (scrape config sketch below)
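      A hedged Prometheus scrape configuration for the /metrics endpoint (node names are assumptions; /metrics is Prometheus' default metrics_path, so no extra path setting is needed):
      scrape_configs:
        - job_name: patroni
          static_configs:
            - targets: ['node1:8008', 'node2:8008', 'node3:8008']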
  36. Monitoring Patroni cluster
      $ curl -s http://node1:8008/patroni | jq .
      {
        "state": "running",
        "postmaster_start_time": "2024-02-22 11:19:50.056018+01:00",
        "role": "master",
        "server_version": 150006,
        "xlog": {"location": 4845225536},
        "timeline": 18,
        "replication": [
          {"usename": "replicator", "application_name": "postgresql1", "client_addr": "10.0.0.2",
           "state": "streaming", "sync_state": "async", "sync_priority": 0}
        ],
        "dcs_last_seen": 1708678958,
        "database_system_identifier": "7333181567713260675",
        "patroni": {"version": "3.2.2", "scope": "batman", "name": "postgresql0"}
      }
      $ curl -s http://node2:8008/patroni | jq .
      {
        "state": "running",
        "postmaster_start_time": "2024-02-23 10:01:54.222365+01:00",
        "role": "replica",
        "server_version": 150006,
        "xlog": {"received_location": 4845225536, "replayed_location": 4845225536,
                 "replayed_timestamp": null, "paused": false},
        "timeline": 18,
        "replication_state": "streaming",
        "dcs_last_seen": 1708678958,
        "database_system_identifier": "7333181567713260675",
        "patroni": {"version": "3.2.2", "scope": "batman", "name": "postgresql1"}
      }
  37. Troubleshooting • Check logs on ALL nodes, Patroni and Postgres ◦ Not just the latest logs, but from the time when the problem started! • If log.dir is used: ◦ Check syslog for additional logs from postgres/pg_basebackup/etc
  38. Troubleshooting and debugging • Try to reproduce the problem on the latest Patroni version • Disable the Patroni service and start Patroni from a terminal (example below) ◦ With debug logs: log.level: DEBUG ◦ Without log.dir!
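      For example (service name, OS user and config path are assumptions):
      $ sudo systemctl stop patroni     # or disable the service
      $ sudo -u postgres env PATRONI_LOG_LEVEL=DEBUG patroni /etc/patroni/patroni.yaml
        # debug logs go straight to the terminal, since log.dir is not set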
  39. Ansible • github.com/vitabaks/postgresql_cluster ◦ Etcd, Consul ◦ Supports in-place major upgrades! • Pigsty (PostgreSQL In Great STYle) ◦ Etcd ◦ Monitoring, Dashboards, and all the things