
Modern PostgreSQL High Availability

OnGres
February 06, 2018


It’s 3am. Your phone rings. PostgreSQL is down and you need to promote a replica to master. Why the h**l isn’t this automatic?

If you have ever thought about this, you want automatic High Availability (HA). Don’t miss this talk! We will enter the world of Modern PostgreSQL HA.

Good news: there are several new, “modern” solutions for PostgreSQL HA. However, it is not easy to pick one. Most require non-trivial setups, and there are many small caveats around HA, such as how to provide entry points to the application, HA correctness, HA vs. read scaling, external dependencies, interaction with cloud environments, and so forth.

Master PostgreSQL HA and learn how to deploy it today.

Transcript

  1. CEO ALVARO HERNANDEZ @ahachete. DBA and Java Software Developer.
     OnGres and ToroDB Founder. Well-known member of the PostgreSQL Community.
     World-class Database Expert (30+ talks in the last 2 years).
  2. What HA is… NOT
     • Easy
     • A mechanism to improve performance
     • A risk-free or data-loss-free silver bullet
     • Totally Available (in the CAP sense): either CP, or Eventually Consistent with High Availability
     • Something integrated into core
  3. What I don’t call “Modern” HA: HA built on disk replication or OS virtualization
     • Aurora is modern, but opaque
     • Hardware-assisted, DRBD, iSCSI, etc.: a lot of limitations
     • Typically limited to only 2 nodes
     • Hot standby (a waste of resources)
  4. What I don’t call “Modern” HA: solutions not based on consensus algorithms
     • HA is a distributed “agreement” problem
     • Only well-known consensus algorithms (Paxos, Raft, ZAB) are proven to work
     • Solutions based on home-grown algorithms would probably fail
  5. Some open-source "Modern" HA solutions
     • Patroni (forked from Governor)
       https://github.com/zalando/patroni
       Python
     • Stolon
       https://github.com/sorintlab/stolon
       Go
  6. Architecture of typical "modern" HA solutions
     [Diagram: an entry point (load balancer, DNS, proxy) in front of three NOCs/AZs;
     each runs a consensus node of a distributed consensus cluster (etcd, Consul,
     ZooKeeper) and an HA agent driving PostgreSQL via feedback/scripting, with
     streaming replication (+ WAL shipping?) between the PostgreSQL instances.]
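     To make that architecture concrete, here is a minimal sketch of what the per-node
     HA agent configuration could look like for Patroni with an etcd consensus cluster.
     All hostnames, ports, paths and credentials below are illustrative assumptions, not
     values from the talk; consult the Patroni documentation for the authoritative options.

       # patroni.yml (sketch): one HA agent per node, talking to the DCS
       scope: pg-ha-cluster            # cluster name, stored in the DCS
       name: node1                     # this node's name

       restapi:
         listen: 0.0.0.0:8008          # health-check endpoint used by entry points
         connect_address: node1:8008

       etcd:
         hosts: etcd1:2379,etcd2:2379,etcd3:2379

       postgresql:
         listen: 0.0.0.0:5432
         connect_address: node1:5432
         data_dir: /var/lib/postgresql/10/main
         authentication:
           replication:
             username: replicator
             password: secret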
  7. Current limitations / problems
     • Involved setup
     • Invasive (the agent controls PG, even its config)
     • No entry point included (Stolon includes one, but performance? testing?)
     • Consensus is external (yet another moving part)
     • Hard to deploy automatically (complexity)
     • Tested, but more testing is welcome
     [Diagram: Stolon architecture]
  8. Replication options (I)
     • Asynchronous (the default):
       ✓ data-loss risk
       ✓ may require pg_rewind
     • Synchronous:
       ✓ lower performance (pick wisely!)
       ✓ may lead to cluster unavailability
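     The performance cost of synchronous replication does not have to be paid by every
     write: synchronous_commit can be lowered per transaction. A small sketch (the table
     name is made up for illustration), assuming the cluster-wide default stays synchronous:

       -- Relax durability for one non-critical transaction only; every other
       -- transaction keeps the synchronous behaviour set in postgresql.conf.
       CREATE TABLE IF NOT EXISTS audit_log (msg text);

       BEGIN;
       SET LOCAL synchronous_commit = off;   -- this transaction won't wait for standbys
       INSERT INTO audit_log VALUES ('non-critical event');
       COMMIT;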
  9. Replication options (II)
     • Synchronous + asynchronous:
       ✓ Multiple synchronous standbys, PG 9.6:
         synchronous_standby_names = 'x (node_1, node_2, …, node_N)'
       ✓ Quorum commit, PG 10:
         synchronous_standby_names = 'ANY x (node_1, node_2, …, node_N)'
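     As a concrete (assumed) example of the PG 10 quorum syntax, the following
     postgresql.conf fragment makes every commit wait for any two of three named
     standbys; the remaining standby is effectively asynchronous for that commit:

       # standby names are illustrative
       synchronous_standby_names = 'ANY 2 (node_1, node_2, node_3)'
       synchronous_commit = on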
  10. Replication options (III)
     • If you use synchronous replication, you may also want (since 9.6):
         synchronous_commit = 'remote_apply'
       (otherwise you achieve no data loss, but not causal reads)
  11. The entry point (I)
     • How do PostgreSQL clients connect to the master?
     • PostgreSQL does not provide protocol support for advertising the master and/or propagating the cluster topology.
     • MongoDB does:
       https://github.com/mongodb/specifications/blob/master/source/server-discovery-and-monitoring/server-discovery-and-monitoring.rst
  12. The entry point (II)
     • Client libraries in PG 10 add support for multiple hosts/ports:
         postgresql://host1:port1,host2:port2/
         host=host1,host2 port=port1,port2
     • As well as for requesting a read-write-only server, or any server:
         postgresql://...?target_session_attrs=read-write|any
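     For example, a client can be pointed at the whole cluster and let libpq pick the
     writable node. A sketch with made-up hostnames and database name (requires a
     PostgreSQL 10+ libpq/psql):

       # libpq tries node1 and node2 in order and only accepts a server that
       # allows read-write sessions, i.e. the current master
       psql "postgresql://node1:5432,node2:5432/appdb?target_session_attrs=read-write"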
  13. The entry point (III)
     • Similarly, JDBC introduced support for multiple hosts and read-write/read-only host selection (compatible with earlier PG versions too):
         jdbc:postgresql://node1,node2,node3,...,nodeN/db?targetServerType=master
         jdbc:postgresql://node1,node2,...,nodeN/db?targetServerType=preferSlave&loadBalanceHosts=true|false
  14. The entry point (IV)
     • HAProxy, pgbouncer, cloud load balancers, DNS
     • A bit DIY: how does the entry point learn about changes to the master? Typically scripting is required
     • Patroni healthcheck URLs return 200|503:
         GET|OPTIONS http://ip:8008/(master|replica)
     • Stolon provides a proxy as the entry point. Performance?
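     As an illustration of wiring an entry point to those healthchecks, below is a
     minimal HAProxy sketch (hostnames, port numbers and timing values are assumptions)
     that forwards client connections only to the node whose Patroni agent answers
     HTTP 200 on /master:

       # TCP-proxy PostgreSQL clients on :5000 to whichever node Patroni
       # currently reports as master (HTTP 200 from port 8008, URI /master)
       listen postgres_master
           bind *:5000
           mode tcp
           option httpchk OPTIONS /master
           http-check expect status 200
           default-server inter 3s fall 3 rise 2 on-marked-down shutdown-sessions
           server node1 node1:5432 check port 8008
           server node2 node2:5432 check port 8008
           server node3 node3:5432 check port 8008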
  15. The entry point (V)
     • Consul is a distributed consensus database that also provides authoritative DNS support
     • It could be used to consistently propagate master state changes via DNS
     • No changes required on clients
     • Consul DNS is not used by the open source HA solutions
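     A sketch of the idea: if the cluster were registered in Consul as a service
     (the service name "pg-cluster" and the "master" tag are assumptions about how it
     would be registered), any host running a Consul agent could resolve the master
     through plain DNS, and a failover would only change what the name resolves to:

       # Consul agent on localhost, DNS interface on port 8600
       dig @127.0.0.1 -p 8600 master.pg-cluster.service.consul +short

       # Clients would then simply connect to master.pg-cluster.service.consul:5432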
  16. Difficulties in automation / deployment
     • You may not be able to reuse already created automation “roles” for PostgreSQL, as the HA solution “takes over” PostgreSQL and controls even the initdb process
     • Many moving parts, many automations needed
     • Is provisioning done by the HA solution or by auto-scaling?
     • You may need to rely on fixed private IPs or use an internal load balancer
  17. STONITH
     • The master gets partitioned away from the cluster. But… clients are still connected and writing!
     • You need to fence/STONITH the “old” master
     • Architecture/environment/cloud dependent (scripting again…)
     • Possibly unreplicated transactions due to asynchronous mode (data loss)
     • Best done through a proxy (Stolon)
  18. Application retry logic
     • The application needs to be programmed to retry connections, as they may fail and be re-established
     • Transactions may be aborted when a connection fails; extend the retry logic to cover transactions too
     • On PG >= 10 you may check the status of a transaction:
         BEGIN; SELECT txid_current(); …; END;
         SELECT txid_status(txid);
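     An end-to-end sketch of that check (the table and the transaction id value are
     illustrative): the client remembers the txid before committing and, after
     reconnecting, asks the new master whether that transaction actually made it:

       -- On the original connection, before any failover:
       CREATE TABLE IF NOT EXISTS payments (id serial PRIMARY KEY, amount numeric);
       BEGIN;
       SELECT txid_current();            -- e.g. returns 12345; store it client-side
       INSERT INTO payments (amount) VALUES (99.90);
       COMMIT;                           -- the connection may drop right here

       -- After reconnecting (PostgreSQL >= 10): did transaction 12345 commit?
       SELECT txid_status(12345);        -- 'committed', 'aborted' or 'in progress'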
  19. HA replicas vs. scale read-only replicas
     • They are not the same!
     • HA replicas should have the same configuration and the same hardware as the master
     • Fine-tune scale read-only replicas for specific workloads if desired, but never promote them!
     • The application should be aware of read-only replicas’ stale reads (use synchronous_commit=remote_apply, or read-your-own-writes from the master)
  20. What about logical replication?
     • Logical decoding (the basic infrastructure) exists only since 9.4; logical replication since 9.6 (external, patch required), and in core only in 10
     • Does not replicate DDL
     • Replicates on a per-database, not per-cluster, level
     • Does not prevent writes on replicas
     • You cannot do HA with logical replication; use streaming replication
  21. K8S-based PostgreSQL HA solutions
     • K8S provides monitoring and master election
     • Also shared storage
     • And also proxy services
     • Should be straightforward, right?
       ✓ Not sufficiently mature / stable yet
       ✓ A generic solution, not PostgreSQL-specific
       ✓ It is already implemented in Patroni and Stolon
  22. Multi-datacenter HA: Hydra (OnGres + Groupon)
     • Uses Consul KV and DNS Query APIs to provide consistent, multi-data-center entry points
     • Failover in a single data center, followed by instances in other data centers
     • Switchover between full data centers
     • Automatic election of the most recent replica
     • Async REST API for management