PostgreSQL and Pacemaker
from the Ground Up
Brian Cosgrove
Braintree
Slide 2
What is HA? Why do we want it?
Slide 3
High Availability
High availability refers to a system or component that is continuously
operational for a desirably long length of time. Availability can be measured
relative to "100% operational" or "never failing." A widely-held but difficult-to-achieve
standard of availability for a system or product is known as "five 9s"
(99.999 percent) availability[1].
[1] http://searchdatacenter.techtarget.com/definition/high-availability
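To make "five 9s" concrete, here is a minimal Python sketch that converts an availability target into the downtime it allows per year. It assumes a 365-day year, and the function name is illustrative only.

```python
# Downtime budget implied by an availability target (assumes a 365-day year).
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600


def allowed_downtime_minutes(nines: int) -> float:
    """Minutes of downtime per year permitted by N nines of availability.

    N nines means an availability of 1 - 10**-N, e.g. 5 nines -> 99.999%.
    """
    unavailability = 10 ** -nines
    return MINUTES_PER_YEAR * unavailability


for n in range(2, 6):
    availability = 100 * (1 - 10 ** -n)
    print(f"{availability:.3f}% -> {allowed_downtime_minutes(n):.2f} min/year")
```

At 99.999% the whole year's budget is 525,600 × 0.00001 ≈ 5.26 minutes, which is why detection and failover have to be automated and fast.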
Slide 4
$50,000,000,000 / year
$95,000 / minute
http://venturebeat.com/2015/09/17/paypals-braintree-is-now-likely-bigger-than-square-and-stripe-combined/
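The per-minute figure follows from the annual one. The sketch below shows that arithmetic, assuming the volume is spread evenly over a 365-day year, which real traffic is not. The helper names are hypothetical.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600, ignoring leap years
ANNUAL_VOLUME_USD = 50_000_000_000  # figure from the slide


def volume_per_minute(annual_usd: float) -> float:
    """Average volume per minute, assuming a uniform rate over the year."""
    return annual_usd / MINUTES_PER_YEAR


def volume_during_outage(annual_usd: float, outage_minutes: float) -> float:
    """Volume that would flow during an outage of this length at the average rate."""
    return volume_per_minute(annual_usd) * outage_minutes


print(f"${volume_per_minute(ANNUAL_VOLUME_USD):,.0f} / minute")
print(f"${volume_during_outage(ANNUAL_VOLUME_USD, 10):,.0f} during a 10-minute outage")
```

The exact average is about $95,129 per minute, which the slide rounds to $95,000.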
Slide 5
Designing HA: Detecting failure
Reliable and quick detection of failure allows us
to minimize user-facing impact.
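One common way to detect failure is a heartbeat with a timeout. The sketch below is a toy model of that idea, not Corosync's membership protocol. The class name, node names, and timeout are hypothetical, and time is passed in explicitly so the behaviour is deterministic.

```python
from dataclasses import dataclass, field


@dataclass
class HeartbeatDetector:
    """Suspects a peer once no heartbeat has arrived within `timeout` seconds."""

    timeout: float
    last_seen: dict[str, float] = field(default_factory=dict)

    def heartbeat(self, node: str, now: float) -> None:
        # Record the most recent time we heard from `node`.
        self.last_seen[node] = now

    def suspected(self, now: float) -> list[str]:
        # A node is suspected if its last heartbeat is older than the timeout.
        return sorted(
            node for node, seen in self.last_seen.items()
            if now - seen > self.timeout
        )


detector = HeartbeatDetector(timeout=3.0)
detector.heartbeat("db1", now=0.0)
detector.heartbeat("db2", now=0.0)
detector.heartbeat("db2", now=2.5)  # db2 keeps talking, db1 goes quiet
print(detector.suspected(now=4.0))  # ['db1']
```

The timeout is the main trade-off: a shorter one detects failure sooner but is more likely to suspect a node that was only briefly slow.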
Slide 6
Designing HA: Eliminate single points of failure
Principle: Add redundancy to the
system so that failure of a component
does not mean failure of the entire
system.
Use PostgreSQL in combination with
synchronous replication to provide a
hot standby that can be promoted if
the primary fails.
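Synchronous replication is what makes promotion safe: the primary only acknowledges a commit once the standby has confirmed it. The sketch below models that guarantee in memory. It does not reproduce PostgreSQL's WAL streaming, and all class and node names are hypothetical.

```python
class Node:
    """A toy database node holding its committed records in order."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.log: list[str] = []


class Standby(Node):
    """A standby that can be taken offline to simulate a broken replication link."""

    def __init__(self, name: str) -> None:
        super().__init__(name)
        self.online = True

    def receive(self, record: str) -> bool:
        # Confirm the write only if the record was actually stored here.
        if not self.online:
            return False
        self.log.append(record)
        return True


class SyncPrimary(Node):
    """Acknowledges a commit to the client only once the standby has confirmed it."""

    def __init__(self, name: str, standby: Standby) -> None:
        super().__init__(name)
        self.standby = standby
        self.acknowledged: list[str] = []

    def commit(self, record: str) -> bool:
        self.log.append(record)
        if self.standby.receive(record):
            self.acknowledged.append(record)
            return True
        # PostgreSQL would keep the client waiting here; this toy just withholds the ack.
        return False


standby = Standby("db2")
primary = SyncPrimary("db1", standby)
primary.commit("txn-1")
primary.commit("txn-2")
standby.online = False
primary.commit("txn-3")  # written locally but never acknowledged

# If db1 now dies and db2 is promoted, no acknowledged transaction is missing.
print(all(txn in standby.log for txn in primary.acknowledged))  # True
```

The cost of this guarantee is that a missing synchronous standby stalls commits on the primary. That is the price of never losing an acknowledged write on failover.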
Slide 7
Designing HA: Reliable failover
Pacemaker can automate the
promotion of standbys.
Slide 8
“The automated failover of our main production database could be
described as the root cause of both of these downtime events… we've
made changes to our Pacemaker configuration to ensure failover of the
'active' database role will only occur when initiated by a member of our
operations team.”
- A post-mortem
Slide 9
Mitigating some of the risks involved in
automated failover
1. Have one failover candidate per database cluster
2. Failover is a one-way operation: no flapping
3. Let humans take over if Pacemaker is confused
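The three rules can be read as a small decision function. The sketch below is a hypothetical encoding, not Pacemaker configuration. In particular, treating "Pacemaker is confused" as "the designated candidate is unavailable, or a failover has already happened" is an assumption made for illustration.

```python
from enum import Enum, auto


class Decision(Enum):
    PROMOTE = auto()
    DO_NOTHING = auto()
    PAGE_HUMAN = auto()


class FailoverPolicy:
    """Hypothetical guard: one candidate, one-way failover, humans for anything else."""

    def __init__(self, candidate: str) -> None:
        self.candidate = candidate  # rule 1: exactly one failover candidate
        self.failed_over = False

    def decide(self, primary_healthy: bool, healthy_standbys: set[str]) -> Decision:
        if primary_healthy:
            return Decision.DO_NOTHING
        if self.failed_over:
            # Rule 2: failover is one-way, so never flap back automatically.
            return Decision.PAGE_HUMAN
        if self.candidate not in healthy_standbys:
            # Rule 3: the only allowed candidate is unavailable, so don't guess.
            return Decision.PAGE_HUMAN
        self.failed_over = True
        return Decision.PROMOTE


policy = FailoverPolicy(candidate="db2")
first = policy.decide(primary_healthy=False, healthy_standbys={"db2"})
second = policy.decide(primary_healthy=False, healthy_standbys={"db1"})
print(first, second)  # Decision.PROMOTE Decision.PAGE_HUMAN
```

Every path other than the single expected one ends with a human. That trades some availability for protection against the cascading failovers described in the post-mortem.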
Slide 10
The nuts and bolts: Pacemaker and Corosync at Braintree
Slide 11
Our topology
● Each Postgres cluster gets its own Pacemaker cluster
● Protect against split-brain by introducing a third server that only provides a vote in leader elections
● Achieve even more isolation by running each Pacemaker cluster on its own “heartbeat” VLAN
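The third, vote-only server works because a partition may act only when it holds a strict majority of votes. The sketch below checks that rule for a three-vote cluster. The host names are hypothetical, and the majority rule is the general quorum idea rather than a reproduction of Corosync's vote handling.

```python
from itertools import combinations

# Hypothetical names: two Postgres hosts plus a third server that only votes.
VOTERS = frozenset({"pg-a", "pg-b", "arbiter"})


def has_quorum(partition: frozenset[str], voters: frozenset[str] = VOTERS) -> bool:
    """A partition may run resources only if it holds a strict majority of votes."""
    return len(partition & voters) > len(voters) // 2


def all_splits(voters: frozenset[str]):
    """Yield every way of cutting the cluster into two sides (side_a, side_b)."""
    members = sorted(voters)
    for size in range(len(members) + 1):
        for side_a in combinations(members, size):
            a = frozenset(side_a)
            yield a, voters - a


# No split can leave both sides with quorum, so at most one side can promote.
assert not any(has_quorum(a) and has_quorum(b) for a, b in all_splits(VOTERS))

# The arbiter breaks the tie: whichever Postgres host can still reach it wins.
print(has_quorum(frozenset({"pg-a"})))             # False
print(has_quorum(frozenset({"pg-b", "arbiter"})))  # True
```

With only `pg-a` and `pg-b`, a network split leaves each side with 1 of 2 votes, so neither side has quorum. The third vote lets exactly one side carry on.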
Slide 13
Putting it together: Resources
Resources are controlled by init-like OCF (Open Cluster Framework) scripts
Resources for our installation fall into the following categories:
● VIPs (virtual IP addresses): the “IPaddr2” resource
● pgsql resources: these are the Postgres clusters themselves
● STONITH (“shoot the other node in the head”) fencing: we use a custom resource that operates via SNMP on our APC PDUs
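"Init-like" means Pacemaker calls the agent with an action name such as `start`, `stop`, or `monitor` and interprets its exit code. The sketch below mimics that dispatch in Python using the standard OCF exit codes. `ToyResource` and `dispatch` are hypothetical. A real agent is usually a shell script that probes the actual system, whereas this one keeps its state in memory.

```python
# Standard OCF exit codes (a subset of those defined by the OCF resource agent API).
OCF_SUCCESS = 0
OCF_ERR_UNIMPLEMENTED = 3
OCF_NOT_RUNNING = 7


class ToyResource:
    """Hypothetical resource whose state lives in memory instead of on the host."""

    def __init__(self) -> None:
        self.running = False

    def start(self) -> int:
        # Starting an already-running resource still succeeds (idempotent).
        self.running = True
        return OCF_SUCCESS

    def stop(self) -> int:
        # Stopping an already-stopped resource still succeeds (idempotent).
        self.running = False
        return OCF_SUCCESS

    def monitor(self) -> int:
        return OCF_SUCCESS if self.running else OCF_NOT_RUNNING


def dispatch(resource: ToyResource, action: str) -> int:
    """Map an init-style action name to a method and return its OCF exit code."""
    handlers = {"start": resource.start, "stop": resource.stop, "monitor": resource.monitor}
    handler = handlers.get(action)
    if handler is None:
        return OCF_ERR_UNIMPLEMENTED
    return handler()


vip = ToyResource()
for action in ["monitor", "start", "monitor", "stop", "monitor", "reload"]:
    print(action, dispatch(vip, action))
```

Pacemaker distinguishes "not running" (7) from a generic failure, which is why `monitor` returns a dedicated code for a cleanly stopped resource instead of an error.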