
Step-by-step Patroni cooking guide

From day one Patroni was positioned as a template for PostgreSQL HA, because it is far from being a one-size-fits-all or plug-and-play system. Setting it up in a test lab is a very simple task, while going to production requires solving problems that are not always typical for a DBA:
- How to choose/set up/manage an Etcd/Consul/ZooKeeper cluster?
- How to do backups/PITR with Patroni?
- How to build/reinitialize replicas without hitting the primary?
- How to manage PostgreSQL configuration with Patroni?
- How to monitor the whole system?
- How to provide a single endpoint for applications?
- Where would be a good place for PgBouncer in this setup?

All these (and many more) questions regularly appear among issues in the Patroni repository and in the #Patroni Slack channel.

In this talk I will guide you through the whole process of setting up a production-ready HA cluster managed by Patroni, starting with Etcd and finishing with a single connection endpoint for applications.

Alexander Kukushkin

March 12, 2024

Transcript

  1. Agenda • What is Patroni (and how it works) • Choosing DCS (Etcd, Consul, Zookeeper or K8s API) • Backup/PITR • Configuration management (Patroni and Postgres) • Providing a single endpoint (+connection pooling) • Monitoring and troubleshooting
  2. Distributed Configuration (Key-Value) Store • Consul, Etcd (v2/v3), Zookeeper, Kubernetes API • Service Discovery ◦ Every Postgres node maintains a key with its state ◦ Leader key points to the primary • Lease/Session/TTL to expire data (i.e. leader key) • Atomic CAS operations • Watches for important keys (i.e. leader key) — see the sketch below
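      A minimal sketch of these leader-key mechanics using etcdctl against etcd v3; the key path /service/batman/leader, the lease ID and the node name are illustrative assumptions, not Patroni's literal implementation (Patroni talks to the DCS through its client API):
      # grant a lease with a 30s TTL -- the TTL is what makes the leader key expire
      $ etcdctl lease grant 30
      lease 694d77aa9e38260f granted with TTL(30s)
      # the winner of the leader race writes the leader key bound to its lease
      $ etcdctl put --lease=694d77aa9e38260f /service/batman/leader nodeA
      # the primary keeps refreshing the lease on every HA loop; if it dies, the key expires
      $ etcdctl lease keep-alive 694d77aa9e38260f
      # standbys watch the leader key and react when it changes or expires
      $ etcdctl watch /service/batman/leader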
  3. Patroni: Normal operation — diagram: Node A (Primary) holds the leader key (/leader: A, ttl: 30) and periodically refreshes it with UPDATE /leader, A, ttl=30, prev=A (SUCCESS); Node B and Node C (Standby) WATCH /leader.
  4. Patroni: primary dies, leader key holds — diagram: Node A (Primary) crashes (BANG!); /leader: A still exists with the TTL counting down (ttl: 7); Node B and Node C (Standby) keep watching /leader.
  5. Patroni: leader key expires — diagram: the TTL runs out (/leader: A, ttl: 0) and Node B and Node C (Standby) receive NOTIFY /leader, expired=true.
  6. Patroni: leader race — diagram: Node B checks the other members: GET http://A:8008/patroni -> failed/timeout, GET http://C:8008/patroni -> wal_lsn: 100; Node C does the same: GET http://A:8008/patroni -> failed/timeout, GET http://B:8008/patroni -> wal_lsn: 100.
  7. Patroni: leader race — diagram: Node B attempts CREATE /leader, B, ttl=30, prevExists=false and Node C attempts CREATE /leader, C, ttl=30, prevExists=false; only one CREATE can succeed (here Node C wins, Node B gets FAIL).
  8. Patroni: promote and continue replication — diagram: Node C promotes to Primary; Node B (Standby) keeps WATCHing /leader and replicates from the new primary; Node A is still down.
  9. Choosing DCS • Rule of thumb ◦ pick something that is already used in your organization • If nothing is, use Etcd (etcd3) ◦ A single Etcd cluster can serve thousands of Patroni clusters (example below)
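      For illustration, a minimal etcd3 section in patroni.yaml could look like this sketch (the host addresses are assumptions):
      etcd3:
        hosts:
          - 10.0.1.11:2379
          - 10.0.1.12:2379
          - 10.0.1.13:2379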
  10. DCS, general tips • Make sure hardware clocks are in sync between nodes! • Run the DCS on nodes separate from Patroni/Postgres ◦ DCS nodes are sensitive to I/O latency ◦ If you can't, put Postgres and the DCS on separate volumes • Protect your DCS ◦ Server/client certificates + passwords ◦ When possible, RBAC (sketch below)
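      A hedged sketch of the "protect your DCS" advice for etcd3 in patroni.yaml (all paths and credentials are placeholders):
      etcd3:
        hosts: 10.0.1.11:2379,10.0.1.12:2379,10.0.1.13:2379
        protocol: https
        cacert: /etc/patroni/etcd-ca.pem
        cert: /etc/patroni/etcd-client.pem
        key: /etc/patroni/etcd-client-key.pem
        username: patroni
        password: very-secret-password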
  11. Consul • Don't run Patroni on Consul server nodes ◦ Consul immediately invalidates sessions when a node experiences network issues • Don't set consul.checks (keep the default) ◦ The session is invalidated if a check fails • The leader key is removed when the session is invalidated ◦ Failover! (example below)
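      A hedged example of the matching consul section in patroni.yaml (address and token are assumptions):
      consul:
        host: 127.0.0.1:8500
        # token: <acl-token>
        # leave consul.checks at its default (unset) -- a failing check would invalidate the session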
  12. Kubernetes • Prefer Endpoints to ConfigMaps ◦ Leader Endpoint subsets are atomically updated with the leader Pod IP address • Can use external DCS, but you'll have to implement callbacks to update Pod labels (sketch below).
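      A hedged sketch of a kubernetes section in patroni.yaml preferring Endpoints over ConfigMaps (namespace and labels are assumptions):
      kubernetes:
        namespace: default
        use_endpoints: true   # store the leader record in an Endpoints object instead of a ConfigMap
        labels:
          application: patroni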
  13. Enable DCS Failsafe Mode • Patroni keeps the primary running when the DCS is down, as long as all members of the Patroni/PostgreSQL cluster are up.
      $ patronictl edit-config
      ---
      +++
      @@ -4,3 +4,4 @@
       use_pg_rewind: true
       retry_timeout: 10
       ttl: 30
      +failsafe_mode: on
      Apply these changes? [y/N]: y
      Configuration changed
      Documentation: DCS Failsafe Mode
  14. Backup/PITR • HA != Backup • Use real backup tools ◦ pg_basebackup ◦ pgBackRest ◦ wal-g ◦ Barman • Binary backups require continuous archiving! • Set the following GUCs (sketch below): ◦ archive_mode = 'on' ◦ archive_command = '...' ◦ restore_command = '...'
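      As an illustration with pgBackRest on PostgreSQL 12+ (the stanza name batman is an assumption; on older versions restore_command belongs in the recovery configuration instead), the global configuration could contain:
      postgresql:
        parameters:
          archive_mode: 'on'
          archive_command: 'pgbackrest --stanza=batman archive-push %p'
          restore_command: 'pgbackrest --stanza=batman archive-get %f "%p"'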
  15. Build replicas from backup
      postgresql:
        create_replica_methods:
          - pgbackrest   # First try pgbackrest
          - basebackup   # Try pg_basebackup if pgbackrest failed
        pgbackrest:
          command: /usr/bin/pgbackrest --stanza=<scope> --delta restore
          keep_data: on
          no_params: on
        basebackup:
          max-rate: '100M'
      Docs.
  16. Build replicas from backup (or other nodes)
      postgresql:
        create_replica_methods:
          - pgbackrest   # First try pgbackrest
          - basebackup   # Try pg_basebackup if pgbackrest failed
        pgbackrest:
          command: /usr/bin/pgbackrest --stanza=<scope> --delta restore
          keep_data: on
          no_params: on
        basebackup:
          max-rate: '100M'
      tags:
        clonefrom: on   # allow other nodes to fetch a basebackup from this node
      Docs.
  17. Permanent slots
      $ patronictl edit-config
      slots:
        node1:
          type: physical
        node2:
          type: physical
        node3:
          type: physical
      • Helps when you can't implement continuous archiving/PITR • Node names should be static • Protects WAL files while nodes are temporarily down
  18. Custom bootstrap (PITR)
      bootstrap:
        method: <custom_bootstrap_method_name>   # instead of initdb
        <custom_bootstrap_method_name>:
          command: <path_to_custom_bootstrap_script> [param1 [, ...]]
          keep_existing_recovery_conf: false
          no_params: false
          recovery_conf:
            recovery_target_action: promote
            recovery_target_timeline: latest
            recovery_target_time: '2024-03-12 12:00:00 CET'
            restore_command: <method_specific_restore_command>
      Custom bootstrap documentation.
  19. Configuration management • Global dynamic config (stored in DCS) ◦ Patroni/Postgres parameters ◦ patronictl edit-config ◦ applied on all Patroni nodes, unless overridden • Local config (patroni.yaml) ◦ Patroni/Postgres parameters • Environment configuration ◦ Only Patroni parameters
  20. Configuration management (example)
      etcd: /config        -> {"postgresql":{"parameters":{"work_mem":"16MB"}}}
      patroni.yaml         -> postgresql:
                                parameters:
                                  work_mem: 12MB
      postgresql.conf      -> # Do not edit this file manually!
                              # It will be overwritten by Patroni!
                              include 'postgresql.base.conf'
                              work_mem = '12MB'
      ALTER SYSTEM SET     -> ALTER SYSTEM SET work_mem TO '24MB';
      $ psql -c "show work_mem"
       work_mem
      ----------
       24MB
      (1 row)
  21. Configuration management (exceptions)
      • Global only Postgres parameters ◦ max_connections ◦ max_locks_per_transaction ◦ max_worker_processes ◦ max_prepared_transactions ◦ wal_level ◦ track_commit_timestamp ◦ max_wal_senders ◦ max_replication_slots ◦ wal_keep_size
      • Global only Patroni parameters ◦ ttl ◦ loop_wait ◦ retry_timeout ◦ maximum_lag_on_failover ◦ synchronous_mode ◦ failsafe_mode ◦ primary_start_timeout ◦ max_timelines_history ◦ check_timeline ◦ postgresql.use_slots ◦ slots ◦ ignore_slots ◦ standby_cluster
  22. Example: change max_connections
      $ patronictl edit-config \
          -p max_connections=101 --force
      ---
      +++
      @@ -2,7 +2,7 @@
       maximum_lag_on_failover: 1048576
       postgresql:
         parameters:
      -    max_connections: 100
      +    max_connections: 101   # increase!
         use_pg_rewind: true
       primary_start_timeout: 0
       retry_timeout: 10
      Configuration changed
      # when increasing max_connections restart standby nodes first
      $ patronictl restart batman node2   # standby first
      $ patronictl restart batman node1   # primary
      # when decreasing max_connections restart the primary first
      $ patronictl restart batman node1   # primary first
      $ patronictl restart batman node2   # standby
  23. Bootstrap configuration
      # The bootstrap configuration. Works only when the cluster is not yet initialized.
      # If the cluster is already initialized, all changes in the `bootstrap` section are ignored!
      bootstrap:
        # This section will be written into Etcd:/<namespace>/<scope>/config after initializing the new cluster
        # and all other cluster members will use it as the `global configuration`.
        # WARNING! If you want to change any of the parameters that were set up
        # via the `bootstrap.dcs` section, please use `patronictl edit-config`!
        dcs:
          ttl: 30
          loop_wait: 10
          retry_timeout: 10
          postgresql:
            parameters:
              max_connections: 150
  24. Security aspects of configuration • Protect the Patroni REST API ◦ Enable SSL ◦ Enable basic auth, or better, client certs (sketch below) • pg_hba, archive_command, restore_command, archive_cleanup, callbacks, and similar are better kept in the local config ◦ think about an attacker having access to the DCS
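      A hedged sketch of a protected REST API in the local patroni.yaml (file paths and credentials are placeholders):
      restapi:
        listen: '*:8008'
        connect_address: 10.0.0.1:8008
        certfile: /etc/patroni/restapi.pem        # enables SSL
        keyfile: /etc/patroni/restapi-key.pem
        cafile: /etc/patroni/ca.pem
        verify_client: required                   # require client certificates; alternatively rely on basic auth
        authentication:
          username: patroni
          password: very-secret-password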
  25. Connection routing — diagram: Node A (the old Primary) fails (BANG!) and Node C becomes the new Primary, but applications still connect to the old address and get: error: connection to server at "10.0.0.1", port 5432 failed: Connection refused — Is the server running on that host and accepting TCP/IP connections?
  26. Connection routing options • List all nodes in the connection string: ◦ jdbc:postgresql://node1,node2,node3/postgres?targetServerType=primary ◦ postgresql://host1:port1,host2:port2/?target_session_attrs=read-write • Cybertec vip-manager • postgresql.callbacks scripts (sketch below) ◦ on_start, on_stop, on_restart, on_role_change ◦ Update DNS, assign/remove VIP
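      A hedged sketch of the callback approach: Patroni invokes the configured script with three arguments — action, role and cluster name. The script name, VIP and interface are hypothetical, and moving an IP address needs appropriate privileges:
      postgresql:
        callbacks:
          on_start: /usr/local/bin/patroni_callback.sh
          on_stop: /usr/local/bin/patroni_callback.sh
          on_role_change: /usr/local/bin/patroni_callback.sh
      #!/bin/bash
      # /usr/local/bin/patroni_callback.sh -- hypothetical example
      ACTION="$1"; ROLE="$2"; CLUSTER="$3"
      VIP="10.0.0.100/24"; DEV="eth0"
      case "$ACTION/$ROLE" in
        on_role_change/primary|on_role_change/master|on_start/primary|on_start/master)
          ip address replace "$VIP" dev "$DEV" ;;                  # claim the VIP on the primary
        *)
          ip address del "$VIP" dev "$DEV" 2>/dev/null || true ;;  # release it everywhere else
      esac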
  27. Connection routing via middleware (LB) — diagram: applications connect through a load balancer; when Node A (the old Primary) fails (BANG!), the load balancer redirects traffic to Node C, the new Primary.
  28. Load-balancing basics
      restapi:
        listen: '*:8008'
        connect_address: '10.0.0.1:8008'
      postgresql:
        listen: '127.0.0.1,10.0.0.1:5432'
        connect_address: '10.0.0.1:5432'
      $ curl --head http://node1:8008/standby
      HTTP/1.0 503 Service Unavailable
      $ curl --head http://node1:8008/primary
      HTTP/1.0 200 OK
      global
          maxconn 100
      defaults
          log global
          mode tcp
          retries 2
          timeout client 30m
          timeout connect 4s
          timeout server 30m
          timeout check 5s
      listen my_cluster_primary
          bind *:5000
          option httpchk HEAD /primary
          http-check expect status 200
          default-server inter 3s fall 3 rise 2 on-marked-down shutdown-sessions
          server node1_5432 node1:5432 maxconn 100 check port 8008
          server node2_5432 node2:5432 maxconn 100 check port 8008
      listen my_cluster_standby
          bind *:5001
          option httpchk HEAD /standby   # Load-balance between standby nodes
          http-check expect status 200
          default-server inter 3s fall 3 rise 2 on-marked-down shutdown-sessions
          server node1_5432 node1:5432 maxconn 100 check port 8008
          server node2_5432 node2:5432 maxconn 100 check port 8008
  29. Making load-balancing dynamic • HAProxy + confd to manage HAProxy configs from the DCS ◦ Sample configs and templates
      listen my_cluster_primary
          bind *:5000
          option httpchk HEAD /primary
          http-check expect status 200
          default-server inter 3s fall 3 rise 2 on-marked-down shutdown-sessions
          {{range gets "/members/*"}} server {{base .Key}} {{$data := json .Value}}{{base (replace (index (split $data.conn_url "/") 2) "@" "/" -1)}} maxconn 100 check port {{index (split (index (split $data.api_url "/") 2) ":") 1}} {{end}}
  30. What if HAProxy fails? • Run HAProxy on multiple nodes ◦ Next to Patroni/Postgres, or on dedicated nodes • List all IPs in a DNS A record (2nd level balancing) ◦ Beware of DNS caches! • Use keepalived (sketch below)
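      A minimal keepalived sketch, assuming HAProxy runs on two nodes and a VIP should follow a healthy HAProxy (interface, router ID, priorities and VIP are placeholders):
      vrrp_script chk_haproxy {
          script "killall -0 haproxy"    # succeeds while an haproxy process is running
          interval 2
      }
      vrrp_instance VI_1 {
          state MASTER                   # BACKUP on the second node
          interface eth0
          virtual_router_id 51
          priority 101                   # use a lower priority on the second node
          track_script {
              chk_haproxy
          }
          virtual_ipaddress {
              10.0.0.100/24
          }
      }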
  31. Connection pooling, option 1 — diagram: Application → HAProxy → PgBouncer running next to Postgres on every node (Node A Primary, Node B and Node C Standby).
  32. Connection pooling, option 2 — diagram: Application → PgBouncer → HAProxy → Postgres nodes (Node A Primary, Node B and Node C Standby).
  33. Connection pooling • PgBouncer in front of HAProxy: fewer connections between HAProxy and PostgreSQL, switchover without errors (PgBouncer pause/resume) • PgBouncer in front of PostgreSQL: smaller latency between PgBouncer and PostgreSQL (sketch below)
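      A hedged pgbouncer.ini sketch for the "PgBouncer in front of HAProxy" layout, pointing at the HAProxy primary port 5000 from item 28 (the database wildcard, auth settings and pool sizes are assumptions):
      [databases]
      ; send everything to the HAProxy "primary" listener
      * = host=127.0.0.1 port=5000

      [pgbouncer]
      listen_addr = 0.0.0.0
      listen_port = 6432
      auth_type = scram-sha-256
      auth_file = /etc/pgbouncer/userlist.txt
      pool_mode = transaction
      default_pool_size = 20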
  34. patronictl • helps to manage Patroni cluster(s) ◦ manual failover/switchover ◦ check status ◦ edit configuration • can work on a node different from Patroni ◦ just needs a config file • Please read the patronictl documentation! (examples below)
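      A few typical patronictl invocations (the cluster name batman and the config path are assumptions):
      $ patronictl -c /etc/patroni/patroni.yaml list batman
      $ patronictl -c /etc/patroni/patroni.yaml switchover batman --candidate node2
      $ patronictl -c /etc/patroni/patroni.yaml edit-config batman
      $ patronictl -c /etc/patroni/patroni.yaml reinit batman node3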
  35. Monitoring • Patroni doesn't replace your monitoring solution ◦ It doesn't react to an overloaded CPU, increased I/O, slow queries or similar • Patroni/Postgres should be monitored (on every node!) ◦ Monitoring endpoint: GET /patroni (JSON) ◦ Prometheus endpoint: GET /metrics (scrape config sketch below)
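      A hedged Prometheus scrape configuration for the /metrics endpoint (node names are assumptions; /metrics is Prometheus' default metrics_path, so no extra path setting is needed):
      scrape_configs:
        - job_name: patroni
          static_configs:
            - targets: ['node1:8008', 'node2:8008', 'node3:8008']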
  36. Monitoring Patroni cluster
      $ curl -s http://node1:8008/patroni | jq .
      {
        "state": "running",
        "postmaster_start_time": "2024-02-22 11:19:50.056018+01:00",
        "role": "master",
        "server_version": 150006,
        "xlog": {"location": 4845225536},
        "timeline": 18,
        "replication": [
          {"usename": "replicator", "application_name": "postgresql1", "client_addr": "10.0.0.2",
           "state": "streaming", "sync_state": "async", "sync_priority": 0}
        ],
        "dcs_last_seen": 1708678958,
        "database_system_identifier": "7333181567713260675",
        "patroni": {"version": "3.2.2", "scope": "batman", "name": "postgresql0"}
      }
      $ curl -s http://node2:8008/patroni | jq .
      {
        "state": "running",
        "postmaster_start_time": "2024-02-23 10:01:54.222365+01:00",
        "role": "replica",
        "server_version": 150006,
        "xlog": {"received_location": 4845225536, "replayed_location": 4845225536,
                 "replayed_timestamp": null, "paused": false},
        "timeline": 18,
        "replication_state": "streaming",
        "dcs_last_seen": 1708678958,
        "database_system_identifier": "7333181567713260675",
        "patroni": {"version": "3.2.2", "scope": "batman", "name": "postgresql1"}
      }
  37. Troubleshooting • Check logs on ALL nodes, Patroni and Postgres ◦ Not just the latest logs, but from the time when the problem started! • If log.dir is used: ◦ Check syslog for additional logs from postgres/pg_basebackup/etc
  38. Troubleshooting and debugging • Try to reproduce the problem on the latest Patroni version • Disable the Patroni service and start Patroni from a terminal (example below) ◦ With debug logs: log.level: DEBUG ◦ Without log.dir!
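      For example (service name, OS user and config path are assumptions):
      $ sudo systemctl stop patroni     # or disable the service
      $ sudo -u postgres env PATRONI_LOG_LEVEL=DEBUG patroni /etc/patroni/patroni.yaml
        # debug logs go straight to the terminal, since log.dir is not set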
  39. Ansible • github.com/vitabaks/postgresql_cluster ◦ Etcd, Consul ◦ Supports in-place major upgrades! • Pigsty (PostgreSQL In Great STYle) ◦ Etcd ◦ Monitoring, Dashboards, and all the things