PostgreSQL and Pacemaker from the Ground Up

PostgresOpen 2015

Brian Cosgrove

September 18, 2015

Transcript

  1. High Availability

    High availability refers to a system or component that is continuously operational for a desirably long length of time. Availability can be measured relative to "100% operational" or "never failing." A widely held but difficult-to-achieve standard of availability for a system or product is known as "five 9s" (99.999 percent) availability [1].

    [1] http://searchdatacenter.techtarget.com/definition/high-availability
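The "five 9s" target is easier to judge when turned into a downtime budget. The Python sketch below is a minimal illustration, assuming a 365-day year and ignoring leap years and planned maintenance windows; `downtime_budget_minutes` is a hypothetical helper, not part of any tool mentioned in the talk.

```python
# Allowed downtime per period for a given availability target.
# Assumption: a 365-day year; leap years and maintenance windows are ignored.

MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes


def downtime_budget_minutes(availability: float,
                            period_minutes: float = MINUTES_PER_YEAR) -> float:
    """Return the minutes of downtime allowed per period at the given availability (0..1)."""
    if not 0.0 <= availability <= 1.0:
        raise ValueError("availability must be between 0 and 1")
    return (1.0 - availability) * period_minutes


if __name__ == "__main__":
    targets = [("two 9s", 0.99), ("three 9s", 0.999),
               ("four 9s", 0.9999), ("five 9s", 0.99999)]
    for label, target in targets:
        print(f"{label:>9}: {downtime_budget_minutes(target):10.2f} min/year")
```

At 99.999 percent the budget is about 5.26 minutes per year, which is shorter than a typical manual failover, one reason teams consider automating it at all.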
  2. Designing HA: Eliminate Single Points of Failure

    Principle: add redundancy to the system so that the failure of one component does not mean the failure of the entire system. Use PostgreSQL with synchronous replication to provide a hot standby that can be promoted if the primary fails.
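Why redundancy helps, and why a remaining single point of failure caps the benefit, can be shown with basic availability arithmetic. This is a sketch under two optimistic assumptions: failures are independent, and promotion of the hot standby is instantaneous and always succeeds. The per-component availability figures are hypothetical.

```python
# Availability of redundant (parallel) and chained (series) components.
# Assumptions: independent failures and instant, always-successful failover.
# Both are optimistic; real failover takes time and can itself fail.


def parallel_availability(*components: float) -> float:
    """System is up if at least one redundant component is up."""
    all_down = 1.0
    for a in components:
        all_down *= 1.0 - a
    return 1.0 - all_down


def series_availability(*components: float) -> float:
    """System is up only if every component is up (each is a single point of failure)."""
    up = 1.0
    for a in components:
        up *= a
    return up


if __name__ == "__main__":
    node = 0.999    # hypothetical availability of one database server
    switch = 0.999  # hypothetical availability of one shared network switch
    pair = parallel_availability(node, node)
    print(f"single server:        {node:.6f}")
    print(f"primary + standby:    {pair:.6f}")
    # A shared dependency in series wipes out most of the gain from the pair.
    print(f"pair behind 1 switch: {series_availability(pair, switch):.6f}")
```

The pair reaches 0.999999 on paper, but placing it behind a single switch drops the whole system below one server's 0.999, which is the point of eliminating single points of failure rather than only adding a standby.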
  3. “The automated failover of our main production database could be described as the root cause of both of these downtime events… we've made changes to our Pacemaker configuration to ensure failover of the 'active' database role will only occur when initiated by a member of our operations team.” - A post-mortem
  4. Mitigating some of the risks involved in automated failover

    1. Have one candidate for failover per database cluster.
    2. Failover is a one-way operation: no flapping.
    3. Let humans take over if Pacemaker is confused.
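The three rules can be read as a small state machine. The Python sketch below models only the policy; it is not how Pacemaker implements failover, and the class, state, and method names are all hypothetical.

```python
# Toy model of the three failover rules. NOT Pacemaker's implementation;
# it only illustrates the policy. All names here are hypothetical.
from enum import Enum
from typing import Optional


class State(Enum):
    PRIMARY_ACTIVE = "primary_active"      # original primary is serving
    STANDBY_PROMOTED = "standby_promoted"  # the one allowed failover has happened
    MANUAL = "manual"                      # automation has stopped; humans decide


class FailoverPolicy:
    def __init__(self, primary: str, candidate: str) -> None:
        # Rule 1: exactly one failover candidate per database cluster.
        self.primary = primary
        self.candidate = candidate
        self.state = State.PRIMARY_ACTIVE

    def active_node(self) -> Optional[str]:
        """Node that should hold the 'active' role, or None when humans are in charge."""
        if self.state is State.PRIMARY_ACTIVE:
            return self.primary
        if self.state is State.STANDBY_PROMOTED:
            return self.candidate
        return None

    def on_active_failed(self, candidate_healthy: bool) -> State:
        if self.state is State.PRIMARY_ACTIVE and candidate_healthy:
            # Promote the single candidate, exactly once.
            self.state = State.STANDBY_PROMOTED
        else:
            # Rules 2 and 3: the one-way failover is used up or the candidate
            # is unfit, so stop automating and hand over to humans.
            self.state = State.MANUAL
        return self.state

    def on_old_primary_returned(self) -> State:
        # Rule 2: never fail back automatically, so no flapping.
        return self.state

    def on_unexpected_event(self) -> State:
        # Rule 3: anything the policy does not understand ends automation.
        self.state = State.MANUAL
        return self.state


if __name__ == "__main__":
    policy = FailoverPolicy("db1", "db2")
    policy.on_active_failed(candidate_healthy=True)
    print(policy.state.name, policy.active_node())  # STANDBY_PROMOTED db2
    policy.on_old_primary_returned()
    print(policy.state.name, policy.active_node())  # STANDBY_PROMOTED db2
    policy.on_active_failed(candidate_healthy=True)
    print(policy.state.name, policy.active_node())  # MANUAL None
```

The design choice worth noting is that every path other than the single, healthy promotion ends in `MANUAL`: the automation is allowed to act once and then gets out of the way.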
  5. Our topology

    • Each Postgres cluster gets its own Pacemaker cluster.
    • Protect against split-brain by introducing a third server that only provides a vote in leader elections.
    • Achieve even more isolation by running each Pacemaker cluster on its own “heartbeat” VLAN.
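The vote-only third server works because of simple majority arithmetic. The sketch below shows that arithmetic only; it assumes one vote per server and is not Corosync's actual quorum implementation. `has_quorum` is a hypothetical helper.

```python
# Strict-majority quorum arithmetic for the three-vote topology:
# two database nodes plus one vote-only arbiter.
# Assumption: every server carries exactly one vote.


def has_quorum(visible_votes: int, total_votes: int) -> bool:
    """A partition may run resources only with more than half of all votes."""
    return visible_votes > total_votes // 2


if __name__ == "__main__":
    TOTAL = 3  # db1, db2, arbiter
    partitions = {"db1 + arbiter": 2, "db2 alone": 1}
    for name, votes in partitions.items():
        print(f"{name:14}: quorum={has_quorum(votes, TOTAL)}")
    # Without the arbiter, a two-node cluster split down the middle
    # leaves neither side with a majority.
    print("2-node split, each side:", has_quorum(1, 2))
```

With three votes, exactly one side of any two-way partition can hold a majority, so only one side may keep (or take over) the database. With two votes, a clean split leaves both sides without quorum, so neither can safely act on its own.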
  6. Putting it together: Resources

    Resources are controlled by init-like OCF scripts. Resources for our installation fall into the following categories:
    • VIPs (virtual IP addresses): "IPaddr2" resource
    • pgsql resources: these are the Postgres clusters themselves
    • STONITH: we use a custom resource that operates via SNMP on our APC PDUs
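"Init-like" means an OCF agent is called with an action name and reports the outcome through an exit code. Real agents are usually shell scripts and must also implement actions such as `meta-data`; the Python sketch below leaves those out and manages only a marker file, so it is a toy, not a usable agent. The `state_file` parameter and default path are hypothetical; the exit-code values and the `OCF_RESKEY_<name>` environment-variable convention follow the OCF standard.

```python
# Toy OCF-style resource agent: called as `agent.py <action>`, returns an OCF
# exit code. The "resource" is a marker file; `state_file` is hypothetical.
import os
import sys

OCF_SUCCESS = 0
OCF_ERR_GENERIC = 1
OCF_ERR_UNIMPLEMENTED = 3
OCF_NOT_RUNNING = 7


def monitor(state_file: str) -> int:
    return OCF_SUCCESS if os.path.exists(state_file) else OCF_NOT_RUNNING


def start(state_file: str) -> int:
    if monitor(state_file) == OCF_SUCCESS:
        return OCF_SUCCESS  # starting a running resource must succeed
    try:
        with open(state_file, "w") as f:
            f.write("running\n")
    except OSError:
        return OCF_ERR_GENERIC
    return OCF_SUCCESS


def stop(state_file: str) -> int:
    if monitor(state_file) == OCF_NOT_RUNNING:
        return OCF_SUCCESS  # stopping a stopped resource must succeed
    try:
        os.remove(state_file)
    except OSError:
        return OCF_ERR_GENERIC
    return OCF_SUCCESS


ACTIONS = {"start": start, "stop": stop, "monitor": monitor}


def run(action: str, state_file: str) -> int:
    handler = ACTIONS.get(action)
    return handler(state_file) if handler else OCF_ERR_UNIMPLEMENTED


if __name__ == "__main__":
    # OCF passes instance parameters as OCF_RESKEY_<name> environment variables.
    path = os.environ.get("OCF_RESKEY_state_file", "/tmp/toy-resource.state")
    sys.exit(run(sys.argv[1] if len(sys.argv) > 1 else "", path))
```

The idempotent `start` and `stop` matter in practice: the cluster manager may repeat an action after a timeout, and an agent that fails on a second `stop` can push Pacemaker toward fencing the node.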