Failsafe Patroni 3.0

What?! Patroni is the tool for implementing PostgreSQL high availability and automatic failover; isn't it already failsafe on its own?

If you are an experienced Patroni user, you know that it relies on a DCS (Distributed Configuration Store) to keep PostgreSQL cluster information consistent, ensuring that there is only one leader at a time. And of course, you also know that the primary is demoted if Patroni can't update the leader lock while the DCS (Etcd, Consul, ZooKeeper, or the Kubernetes API) is inaccessible or experiencing temporary problems, which can be very frustrating.

In this talk we will introduce a new Patroni feature, DCS failsafe mode, which aims to keep the primary running in case of a DCS failure. We will reveal the ideas behind it, share important implementation details, do a live demo, and give guidance on whether the feature should be used in a given environment or whether it is better to refrain from it.

Alexander Kukushkin

February 06, 2023

Transcript

  1. Failsafe Patroni 3.0
    Prague PostgreSQL Developer Day
    Presented by Alexander Kukushkin & Polina Bungina
    2023 • 02 • 01

  2. About us
    Alexander Kukushkin
    • Principal Software Engineer @Microsoft
    • The Patroni guy
    • [email protected]
    • Twitter: @cyberdemn
    Polina Bungina
    • Software Engineer @ZalandoTech
    • [email protected]
    • Twitter: @hugh_capet

  3. Agenda
    Introduction to Patroni
    Observer problem
    Demo 1
    DCS failsafe feature
    Demo 2
    Conclusion

  4. Do we need it at all?
    ● Service-Level Agreement (SLA)
    ● Recovery Point Objective (RPO)
    ● Recovery Time Objective (RTO)

  5. Architecture overview
    ● Cluster state is stored in the Distributed Configuration Store (DCS)
    ○ ZooKeeper
    ○ Etcd
    ○ Consul
    ○ Kubernetes control-plane
    ● Session/TTL to expire data (i.e. the leader key)
    ● Atomic CAS operations
    ● Watches for important keys
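
    A minimal sketch of this layout, using the etcd v2 HTTP API with the
    requests library (the endpoint and the scope name "demo" are assumptions
    for illustration, not Patroni's actual code):

    import requests

    BASE = "http://127.0.0.1:2379/v2/keys/service/demo"

    # All cluster state lives under a single prefix in the DCS.
    r = requests.get(BASE, params={"recursive": "true"})
    for node in r.json()["node"].get("nodes", []):
        # e.g. /service/demo/leader (carries a TTL),
        # /service/demo/members/..., /service/demo/config
        print(node["key"], "ttl =", node.get("ttl"))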

  6. Leader race
    Both nodes try to create the leader key with an atomic CAS; the DCS
    guarantees that only one of them can succeed:
    A: CREATE "/leader", "A", ttl=30, prevExists=False → Success → promote
    B: CREATE "/leader", "B", ttl=30, prevExists=False → Fail
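
    To make the race concrete, here is a minimal sketch against the etcd v2
    HTTP API (the endpoint and helper name are assumptions): an atomic
    "create if not exists" guarantees that exactly one contender wins.

    import requests

    LEADER_KEY = "http://127.0.0.1:2379/v2/keys/service/demo/leader"

    def try_acquire_leader(my_name: str, ttl: int = 30) -> bool:
        # prevExist=false makes the PUT fail if the key already exists (CAS).
        r = requests.put(LEADER_KEY,
                         params={"prevExist": "false"},
                         data={"value": my_name, "ttl": ttl})
        return r.status_code == 201  # 201 Created: we won; 412: somebody else did

    if try_acquire_leader("A"):
        print("won the race: promote")
    else:
        print("lost the race: keep following")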

  7. Normal operational mode
    The primary A keeps the leader key alive with an atomic CAS update,
    while the replica B watches the key:
    A: UPDATE "/leader", "A", ttl=30, prevValue="A" → Success
    B: WATCH("/leader")
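
    A matching sketch of the steady state (same assumed etcd v2 endpoint):
    the primary refreshes the key only if it still owns it, and the replica
    blocks on a watch so that it reacts to changes immediately.

    import requests

    LEADER_KEY = "http://127.0.0.1:2379/v2/keys/service/demo/leader"

    def renew_leader(my_name: str, ttl: int = 30) -> bool:
        # prevValue ensures we only refresh the TTL if the key is still ours.
        r = requests.put(LEADER_KEY,
                         params={"prevValue": my_name},
                         data={"value": my_name, "ttl": ttl})
        return r.status_code == 200

    def watch_leader() -> dict:
        # wait=true blocks until the key is updated, deleted, or expires.
        return requests.get(LEADER_KEY, params={"wait": "true"}).json()

    # Primary: renew_leader("A") on every HA-loop cycle; demote if it fails.
    # Replica: watch_leader() returns e.g. {"action": "expire", ...} and
    # triggers a new leader race.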

  8. Normal operational mode
    If the leader key is not renewed before its TTL runs out, it expires
    and the DCS notifies the watchers:
    DCS → B (replica): NOTIFY("/leader", expired=True)

  9. Normal operational mode
    The replica B takes over the expired leader key and promotes:
    B: CREATE "/leader", "B", ttl=30, prevExists=False → Success → promote

  10. DCS can't be accessed
    The primary A fails to renew the leader key:
    A: UPDATE "/leader", "A", ttl=30, prevValue="A" → Fail

  11. DCS can't be accessed
    If only the replica has lost its connection to the DCS, the primary's
    update still succeeds:
    A: UPDATE "/leader", "A", ttl=30, prevValue="A" → Success

  12. Why did the update fail?
    ● DCS is down?
    ● Network issues?

  13. Network partition
    The primary A is cut off from the DCS and cannot renew the leader key:
    A: UPDATE "/leader", "A", ttl=30, prevValue="A" → Fail

  14. Leader key expired
    A's updates keep failing, the leader key expires, and B takes over:
    A: UPDATE "/leader", "A", ttl=30, prevValue="A" → Fail
    B: CREATE "/leader", "B", ttl=30, prevExists=False → Success → promote
    A is still running as primary, so now there are two primaries at once.

  15. So, to be on the safe side…
    When the update fails, the primary demotes itself:
    A: UPDATE "/leader", "A", ttl=30, prevValue="A" → Fail → demote

  16. So, to be on the safe side…
    B acquires the expired leader key and promotes, while A continues as a
    replica:
    B: CREATE "/leader", "B", ttl=30, prevExists=False → Success → promote

  17. Still not perfect
    If the DCS itself is down, nobody can take over the leader key:
    B: CREATE "/leader", "B", ttl=30, prevExists=False → Fail
    Both A and B end up as replicas: the cluster is left without a primary.

  19. DCS down
    ● Etcd, ZooKeeper: very unlikely (if configured correctly)
    ● Consul: the local agent is a SPoF!
    ● Kubernetes control-plane: a typical SLA for managed services is
      99.95%, i.e. 0.05% × 8760 h ≈ 4h22m of allowed downtime per year

  20. What if…
    Instead of demoting right away, the primary A asks the standby B
    directly: "Do you see DCS?" The answer is "NO". If nobody sees the DCS,
    it looks like a DCS outage rather than a network partition, so the
    primary could keep running.

  21. Split-brain!
    A asks B: "Do you see DCS?" The answer is "NO". But a node C that A
    knows nothing about is already running as primary: if A keeps running
    too, we end up with a split-brain.

  22. Idea
    ● Continue to run as primary only if the primary can see ALL Patroni
      nodes
    ● Don't allow "unknown" nodes to become primary!

  23. "Unknown" node?
    ● Patroni clusters are mostly "static", but nodes can join and leave
    ● If the topology changes, the list of Patroni node names is written
      to DCS
    ● Nodes outside of this list are "unknown" and not allowed to become
      primary

  24. DCS failsafe mode
    The /failsafe key in DCS holds the member list: node1, node2, …, nodeN
    1. A: UPDATE "/leader", "A", ttl=30 → Fail
    2. A: POST /failsafe to node1 … nodeN
    3. Every node caches the primary data for the ttl
    4. Every node answers 200 OK
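
    A minimal sketch of the primary's side of this exchange (the helper name
    and member URLs are assumptions; the real logic lives inside Patroni's
    HA loop): after a failed leader-key update, the primary may keep running
    only if EVERY member from the cached failsafe list acknowledges it.

    import requests

    def failsafe_is_reassuring(failsafe_members: dict, payload: dict) -> bool:
        """failsafe_members: {"node1": "http://10.0.0.1:8008/failsafe", ...}
        payload: data about the current primary (name, API URL, slots)."""
        for url in failsafe_members.values():
            try:
                r = requests.post(url, json=payload, timeout=2)
            except requests.RequestException:
                return False      # unreachable member: must demote
            if r.status_code != 200:
                return False      # member refused: must demote
        return True               # ALL members answered 200 OK: stay primary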

  25. Implementation details
    ● Introduce the /failsafe key: the list of members currently present in
      the cluster
    ○ Maintained by the leader
    ○ Its value is cached by Patroni (on all nodes)
    ● Introduce a POST /failsafe REST API endpoint
    ○ The payload contains information about the primary and the permanent
      logical slots
    ○ The primary checks the response code and demotes if it is not 200
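
    A minimal sketch of the replica's side of the POST /failsafe exchange
    (using the stdlib http.server purely for illustration; Patroni's real
    REST API is different): accept the primary's announcement, remember it
    for one ttl, and answer 200.

    import json, time
    from http.server import BaseHTTPRequestHandler, HTTPServer

    TTL = 30
    cached_primary = {"data": None, "expires": 0.0}

    class FailsafeHandler(BaseHTTPRequestHandler):
        def do_POST(self):
            if self.path != "/failsafe":
                self.send_response(404); self.end_headers(); return
            length = int(self.headers["Content-Length"])
            cached_primary["data"] = json.loads(self.rfile.read(length))
            cached_primary["expires"] = time.time() + TTL
            self.send_response(200)  # tells the primary it may keep running
            self.end_headers()

    # While the cached data is fresh, this node keeps following the announced
    # primary and does not enter the leader race itself.
    HTTPServer(("127.0.0.1", 8008), FailsafeHandler).serve_forever()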

  26. Implementation details (continued)
    ● A replica disqualifies itself from the leader race if it is not
      listed in the /failsafe key in DCS
    ● The primary executes the failsafe check only against the nodes from
      the failsafe list
    ○ It continues as primary if ALL nodes are accessible
    ○ Otherwise it demotes
    ● Replicas call pg_replication_slot_advance() if necessary
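
    For reference, advancing a slot boils down to one SQL call; a minimal
    sketch with psycopg2 (the connection string, slot name, and LSN are made
    up; Patroni does this internally with the LSN from the failsafe payload):

    import psycopg2

    conn = psycopg2.connect("dbname=postgres user=postgres")
    conn.autocommit = True
    with conn.cursor() as cur:
        # Move the slot's confirmed position forward so it doesn't fall
        # behind while the DCS (and the normal feedback) is unavailable.
        cur.execute("SELECT pg_replication_slot_advance(%s, %s)",
                    ("my_logical_slot", "0/4050C40"))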

  27. How to enable failsafe mode
    $ patronictl edit-config
    ---
    +++
    @@ -4,3 +4,4 @@
     use_pg_rewind: true
     retry_timeout: 10
     ttl: 30
    +failsafe_mode: on
    Apply these changes? [y/N]: y
    Configuration changed

    $ etcdctl get /service/batman/failsafe
    {
      "postgresql0": "http://127.0.0.1:8008/patroni",
      "postgresql1": "http://127.0.0.1:8009/patroni"
    }

    $ curl http://127.0.0.1:8008/failsafe
    {
      "postgresql0": "http://127.0.0.1:8008/patroni",
      "postgresql1": "http://127.0.0.1:8009/patroni"
    }

  28. Monitoring
    $ curl -s http://127.0.0.1:8008/patroni | jq .
    {
      "state": "running",
      "postmaster_start_time": "2023-01-26 16:11:04.848424+00:00",
      "role": "master",
      "server_version": 150001,
      "xlog": {"location": 67419584},
      "timeline": 2,
      "replication": [
        {"usename": "replicator", "application_name": "postgresql1",
         "client_addr": "127.0.0.1", "state": "streaming",
         "sync_state": "async", "sync_priority": 0}],
      "cluster_unlocked": true,
      "failsafe_mode_is_active": true,
      "dcs_last_seen": 1674749503,
      "database_system_identifier": "7192993973708324892",
      "patroni": {"version": "3.0.0", "scope": "demo"}
    }
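
    The failsafe_mode_is_active and dcs_last_seen fields make failsafe-aware
    alerting straightforward; a minimal sketch of an external check (the
    endpoint and thresholds are assumptions):

    import time, requests

    status = requests.get("http://127.0.0.1:8008/patroni", timeout=2).json()

    if status.get("failsafe_mode_is_active"):
        print("WARNING: failsafe mode is active (DCS problems)")

    # dcs_last_seen is a unix timestamp of the last successful DCS contact.
    if time.time() - status["dcs_last_seen"] > 60:
        print("WARNING: no DCS contact for more than 60 seconds")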

  30. When not to use it
    ● When nodes can change their names after a "restart" (while keeping
      the old storage)
    ○ If ALL nodes are restarted at the same time, the cluster will not
      recover automatically
    Example:
    ● K8s deployments without a StatefulSet
    ○ Crunchy Postgres Operator (PGO)

  31. Thank you!
    Questions?
