Slide 1

Modern PostgreSQL High Availability
Álvaro Hernandez Tortosa
PGConf.Russia 2018

Slide 2

Álvaro Hernandez (@ahachete), CEO
• DBA and Java Software Developer
• OnGres and ToroDB Founder
• Well-known member of the PostgreSQL Community
• World-class Database Expert (30+ talks in the last 2 years)

Slide 3

Introduction

Slide 4

What HA is… NOT
• Easy
• A mechanism to improve performance
• A risk-free or data-loss-free silver bullet
• Totally Available (in the CAP sense): either CP, or eventually consistent with high availability
• Something integrated into core

Slide 5

What HA is… NOT
SR (Streaming Replication) != HA

Slide 6

What I don’t call “Modern” HA: HA built on disk replication or OS virtualization
• Aurora is modern, but opaque
• Hardware-assisted, DRBD, iSCSI, etc.: a lot of limitations
• Typically limited to only 2 nodes
• Hot standby (waste of resources)

Slide 7

What I don’t call “Modern” HA: solutions not based on consensus algorithms
• HA is a distributed “agreement” problem
• Only well-known consensus algorithms (Paxos, Raft, ZAB) are proven to work
• Solutions based on home-grown algorithms would probably fail

Slide 8

Some open-source “Modern” HA solutions
• Patroni (forked from Governor): https://github.com/zalando/patroni (Python)
• Stolon: https://github.com/sorintlab/stolon (Go)

Slide 9

Architecture of typical “modern” HA solutions
[Diagram: a distributed consensus cluster (etcd, Consul, ZooKeeper) with one consensus node per NOC/AZ (three in total); on each node an HA agent controlling a local PostgreSQL instance and its disk; the PostgreSQL instances are linked by SR (+ WAL shipping?); each HA agent provides feedback/scripting to a single entry point (load balancer, DNS, proxy).]

Slide 10

Current limitations / problems
• Involved setup
• Invasive (the agent controls PostgreSQL, even its configuration)
• No entry point included (Stolon does include one, but performance? testing?)
• Consensus is external (yet another moving part)
• Hard to automate deployments (complexity)
• Tested, but more testing is welcome
[Diagram: Stolon architecture]

Slide 11

HA caveats / recommendations / best practices

Slide 12

Replication options (I)
• Asynchronous (default):
  ✓ data-loss risk
  ✓ may require pg_rewind (see the sketch below)
• Synchronous:
  ✓ lower performance (pick wisely!)
  ✓ may lead to cluster unavailability
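A minimal sketch of the pg_rewind step: after an asynchronous failover, the demoted primary may hold WAL the new primary never received and must be rewound before rejoining as a standby. The data directory path and connection string below are illustrative assumptions, not from the slides:

    pg_rewind --target-pgdata=/var/lib/postgresql/10/main \
              --source-server='host=new-primary port=5432 user=postgres dbname=postgres'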

Slide 13

Replication options (II)
• Synchronous + asynchronous:
  ✓ Multiple synchronous standbys, PG 9.6:
      synchronous_standby_names = 'x (node_1, node_2, …, node_N)'
  ✓ Quorum commit, PG 10:
      synchronous_standby_names = 'ANY x (node_1, node_2, …, node_N)'
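As a concrete, hedged illustration of quorum commit (the node names and quorum size are assumptions): with three listed standbys, each commit waits for acknowledgement from any two of them:

    synchronous_standby_names = 'ANY 2 (node_1, node_2, node_3)'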

Slide 14

Replication options (III)
• If you use synchronous replication, you may also want (9.6+):
      synchronous_commit = 'remote_apply'
  (otherwise you still avoid data loss, but you don't get causal reads)
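A hedged note: synchronous_commit is an ordinary GUC, so besides postgresql.conf it can also be set per session or per transaction, for example:

    SET synchronous_commit = 'remote_apply';  -- this session's commits wait until synchronous standbys have applied the WAL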

Slide 15

The entry point (I)
How do PostgreSQL clients connect to the master?
PostgreSQL's protocol has no support for advertising the master and/or propagating the cluster topology. MongoDB's does:
https://github.com/mongodb/specifications/blob/master/source/server-discovery-and-monitoring/server-discovery-and-monitoring.rst

Slide 16

The entry point (II)
• Client libraries in PG 10 add support for multiple hosts/ports:
      postgresql://host1:port1,host2:port2/
      host=host1,host2 port=port1,port2
• As well as connecting only to a read-write server, or to any server:
      postgresql://...?target_session_attrs=read-write|any
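A hedged usage example (host names, port and database are illustrative): libpq-based clients such as psql try the listed hosts in order and keep the first connection that accepts read-write sessions:

    psql 'postgresql://node1:5432,node2:5432/mydb?target_session_attrs=read-write'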

Slide 17

The entry point (III)
• Similarly, JDBC introduced support for multiple hosts and read-write/read-only host selection (compatible with earlier PostgreSQL versions too):
      jdbc:postgresql://node1,node2,node3,...,nodeN/db?targetServerType=master
      jdbc:postgresql://node1,node2,...,nodeN/db?targetServerType=preferSlave&loadBalanceHosts=true|false

Slide 18

The entry point (IV)
• HAProxy, pgbouncer, cloud load balancers, DNS
• A bit DIY: how does the entry point learn of changes to the master state? Typically scripting is required
• Patroni healthcheck URLs return 200|503 (see the HAProxy sketch below):
      GET|OPTIONS http://ip:8008/(master|replica)
• Stolon provides a proxy as the entry point. Performance?
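A minimal sketch of wiring an entry point to those healthchecks, assuming Patroni's REST API listens on port 8008 on every node (addresses, ports and names are illustrative, not from the slides): HAProxy forwards client connections only to the node whose /master check returns 200:

    listen postgres_master
        bind *:5000
        option httpchk GET /master
        http-check expect status 200
        default-server inter 3s fall 3 rise 2
        server pg1 10.0.0.1:5432 check port 8008
        server pg2 10.0.0.2:5432 check port 8008
        server pg3 10.0.0.3:5432 check port 8008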

Slide 19

The entry point (V)
• Consul is a distributed consensus database that also provides authoritative DNS support
• It could be used to consistently propagate master state changes via DNS
• No changes required on clients
• Consul DNS is not used by the open source HA solutions (see the sketch below)
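To illustrate the idea (the service and tag names are assumptions, not part of any of the tools above): if the current master were registered in Consul as the "master"-tagged instance of a "pg" service, any client or entry point could resolve it through Consul's DNS interface:

    dig @127.0.0.1 -p 8600 master.pg.service.consul +short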

Slide 20

Difficulties in automation / deployment
• You may not be able to use existing automation “roles” for PostgreSQL, as the HA solutions “take over” it and even control the initdb process
• Many moving parts, many automations needed
• Is provisioning done by the HA solution or by autoscaling?
• You may need to rely on fixed private IPs or use an internal load balancer

Slide 21

STONITH
• The master gets partitioned away from the cluster. But… clients are still connected and writing!
• You need to fence/STONITH the “old” master.
• Architecture/environment/cloud dependent (scripting again…)
• Possible unreplicated transactions due to asynchronous mode (data loss)
• Best done through a proxy (Stolon)

Slide 22

Application retry logic
• The application needs to be programmed to retry connections, as they may fail and need to be re-established
• Transactions may be aborted when a connection fails; extend the retry logic to cover transactions too
• On PG >= 10 you may check the status of a transaction:
      BEGIN; SELECT txid_current(); …; END;
      SELECT txid_status(txid);
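A hedged usage sketch (the txid value is illustrative): record the value returned by txid_current() before committing; if the connection drops before the COMMIT is acknowledged, reconnect and check whether the transaction actually made it before retrying it:

    SELECT txid_status(1234567);  -- returns 'committed', 'aborted' or 'in progress'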

Slide 23

HA replicas vs read-only scaling replicas
• They are not the same!
• HA replicas should have the same configuration and the same hardware as the master
• Fine-tune, if desired, read-only scaling replicas for specific workloads, but never promote them!
• The application should be aware of read-only replicas’ stale reads (use synchronous_commit=remote_apply or read-your-own-writes from the master)

Slide 24

What about logical replication?
• Logical decoding (the basic infrastructure) only since 9.4; logical replication since 9.6 (external, patch required), and in core only since 10
• Does not replicate DDL
• Replicates at a per-database, not per-cluster level
• Does not prevent writes on replicas
You cannot do HA with logical replication; use streaming replication

Slide 25

Sneak peek into the future

Slide 26

K8s-based PostgreSQL HA solutions
• K8s provides monitoring and master election
• Also shared storage
• And also proxy services
Should be straightforward, right?
• Not sufficiently mature / stable yet
• A generic solution, not PostgreSQL-specific
• It is already implemented in Patroni and Stolon

Slide 27

Multi-datacenter HA: Hydra (OnGres + Groupon)
• Uses the Consul KV and DNS Query APIs to provide consistent, multi-data-center entry points
• Failover within a single data center, with the new master then followed by instances in the other data centers
• Switchover between whole data centers
• Automatic election of the most recent replica
• Async REST API for management

Slide 28

www.ongres.com [email protected] Questions? @ongresinc