Slide 1
Slide 1 text
Zero-downtime Postgres upgrades
Restarting databases without the apps noticing
@ChrisSinjo
Slide 2
Slide 2 text
GOCARDLESS
Slide 3
Slide 3 text
POST /cash/monies HTTP/1.1
{ amount: 100 }
Slide 4
Slide 4 text
High per-request
Slide 5
Slide 5 text
Uptime is
Slide 6
Slide 6 text
No content
Slide 7
Slide 7 text
Good durability guarantees
Slide 8
Slide 8 text
- Good durability guarantees
- Feature-cautious
Slide 9
Slide 9 text
- Good durability guarantees
- Feature-cautious
- Transactions are cool
Slide 10
Slide 10 text
“Speak to this one node.” –Postgres
Slide 11
Slide 11 text
[Diagram: Client, Postgres]
Slide 12
Slide 12 text
[Diagram: Client, 2 × Postgres, replication]
Slide 13
Slide 13 text
[Diagram: Client, 2 × Postgres, replication]
Slide 14
Slide 14 text
Wake a human up
Slide 15
Slide 15 text
[Diagram: Client, 2 × Postgres, replication]
Slide 16
Slide 16 text
[Diagram: Client, 2 × Postgres]
Slide 17
Slide 17 text
[Diagram: Client, 2 × Postgres]
Slide 18
Slide 18 text
[Diagram: Client, 2 × Postgres, replication]
Slide 19
Slide 19 text
- Awful time-to-recovery
- Error-prone
Slide 20
Slide 20 text
You gotta perform:
- Many steps
- In the right order
- Perfectly
Slide 21
Slide 21 text
Don’t make a tired SRE think
Slide 22
Slide 22 text
Add automation
Slide 23
Slide 23 text
Pacemaker: a clustering tool
Slide 24
Slide 24 text
[Diagram: Client, 2 × Postgres, replication]
Slide 25
Slide 25 text
How do we know a node has failed?
Slide 26
Slide 26 text
Jepsen: https://aphyr.com/tags/jepsen
Slide 27
Slide 27 text
https://aphyr.com/posts/317-jepsen-elasticsearch
Slide 28
Slide 28 text
[Diagram: Client, 2 × Postgres, replication]
Slide 29
Slide 29 text
[Diagram: Client, 3 × Postgres, 2 × replication]
Slide 30
Slide 30 text
[Diagram: Client, 3 × Postgres, 3 × Pacemaker, 2 × replication]
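The slides grow the cluster from two nodes to three before adding Pacemaker. A likely reason, which is our assumption rather than something the slide text states, is majority quorum: after a network partition, only a side that can see more than half of the nodes should run the primary, and a two-node cluster split down the middle leaves neither side with a majority. A minimal sketch of that arithmetic, assuming one vote per node (`has_quorum` is a hypothetical helper, not part of Pacemaker):

```python
def has_quorum(visible_nodes: int, cluster_size: int) -> bool:
    """True if a partition sees a strict majority of the cluster.

    Assumes one vote per node; real cluster managers such as
    Pacemaker/Corosync have their own, configurable quorum policies.
    """
    if cluster_size <= 0 or not 0 <= visible_nodes <= cluster_size:
        raise ValueError("need cluster_size > 0 and 0 <= visible_nodes <= cluster_size")
    # Strict majority: more than half of the nodes.
    return visible_nodes > cluster_size // 2


if __name__ == "__main__":
    # Three-node cluster split into a pair and a lone node.
    print(has_quorum(2, 3))  # True: the pair may keep or elect a primary
    print(has_quorum(1, 3))  # False: the lone node should not act as primary
    # Two-node cluster split down the middle: neither side has a majority.
    print(has_quorum(1, 2))  # False
```

This only captures the counting rule; which side actually gets promoted, and how the VIP follows it, is up to the cluster manager's configuration.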
Slide 31
Slide 31 text
[Diagram: Client, VIP, 3 × Postgres, 3 × Pacemaker, 2 × replication]
Slide 32
Slide 32 text
[Diagram: Client, VIP, 3 × Postgres, 3 × Pacemaker, 2 × replication]
Slide 33
Slide 33 text
[Diagram: Client, VIP, 3 × Postgres, 3 × Pacemaker]
Slide 34
Slide 34 text
[Diagram: Client, VIP, 3 × Postgres, 3 × Pacemaker, 1 × replication]
Slide 35
Slide 35 text
[Diagram: Client, VIP, 3 × Postgres, 3 × Pacemaker, 1 × replication]
Slide 36
Slide 36 text
[Diagram: Client, VIP, 3 × Postgres, 3 × Pacemaker, 1 × replication]
Slide 37
Slide 37 text
[Diagram: Client, VIP, 3 × Postgres, 3 × Pacemaker, 2 × replication]
Slide 38
Slide 38 text
$
Slide 39
Slide 39 text
Seems hard, right?
Slide 40
Slide 40 text
It kinda is
Slide 41
Slide 41 text
You gotta know:
- Postgres
- Distributed systems
- Pacemaker
Slide 42
Slide 42 text
Get someone else to run it for you
Slide 43
Slide 43 text
[Diagram: Client, VIP, 3 × Postgres, 3 × Pacemaker, 2 × replication]
Slide 44
Slide 44 text
[Diagram: Client, VIP, 3 × Postgres, 3 × Pacemaker]
Slide 45
Slide 45 text
[Diagram: Client, VIP, 3 × Postgres, 3 × Pacemaker]
Slide 46
Slide 46 text
[Diagram: Client, VIP, 3 × Postgres, 3 × Pacemaker]
Slide 47
Slide 47 text
Every move means a connection reset
Slide 48
Slide 48 text
Every move means dropped requests
Slide 49
Slide 49 text
POST /cash/monies HTTP/1.1
{ amount: 100 }
Slide 50
Slide 50 text
POST /cash/monies HTTP/1.1
{ amount: 100 }

500 Internal Server Error
Slide 51
Slide 51 text
What does this mean for upgrades?
Slide 52
Slide 52 text
[Diagram: Client, VIP, 3 × Postgres, 3 × Pacemaker]
Slide 53
Slide 53 text
[Diagram: Client, VIP, 3 × Postgres, 3 × Pacemaker; all nodes at 9.4.9]
Slide 54
Slide 54 text
[Diagram: Client, VIP, 3 × Postgres, 3 × Pacemaker, 2 × replication; all nodes at 9.4.9]
Slide 55
Slide 55 text
[Diagram: Client, VIP, 3 × Postgres, 3 × Pacemaker, 2 × replication; nodes at 9.4.10, 9.4.9, 9.4.10]
Slide 56
Slide 56 text
[Diagram: Client, VIP, 3 × Postgres, 3 × Pacemaker, 2 × replication; nodes at 9.4.10, 9.4.9, 9.4.10]
Slide 57
Slide 57 text
Every upgrade means a connection reset
Slide 58
Slide 58 text
Every upgrade means dropped requests
Slide 59
Slide 59 text
POST /cash/monies HTTP/1.1
{ amount: 100 }

500 Internal Server Error
Slide 60
Slide 60 text
Solution: never upgrade
Slide 61
Slide 61 text
No content
Slide 62
Slide 62 text
Not upgrading is never an option
Slide 63
Slide 63 text
Solution: never upgrade
Slide 64
Slide 64 text
Solution: never upgrade
Slide 65
Slide 65 text
Solution: ???
Slide 66
Slide 66 text
1 thing missing
Slide 67
Slide 67 text
[Diagram: Client, VIP, 3 × Postgres, 3 × Pacemaker]
Slide 68
Slide 68 text
[Diagram: Client, VIP, 3 × PgBouncer, 3 × Postgres, 3 × Pacemaker]
Slide 69
Slide 69 text
[Diagram: Client, VIP, 3 × PgBouncer, 3 × Postgres, 3 × Pacemaker]
Slide 70
Slide 70 text
[Diagram: Client, 2 × VIP, 3 × PgBouncer, 3 × Postgres, 3 × Pacemaker]
Slide 71
Slide 71 text
PgBouncer has This One Weird Trick™
Slide 72
Slide 72 text
PAUSE;
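A minimal sketch of wrapping a switchover in PgBouncer's `PAUSE`/`RESUME`. Per the PgBouncer admin console documentation, `PAUSE [db]` does not return until server connections have been released (according to the pool mode), and while paused, clients wait instead of being disconnected; `RESUME [db]` lets them continue. The `execute` callable, the database name and the `action` are hypothetical: in practice `execute` might send commands over an autocommit connection to PgBouncer's special `pgbouncer` admin database, but here it is injected so the example runs without a server.

```python
from typing import Callable


def with_paused_pool(
    execute: Callable[[str], None],
    database: str,
    action: Callable[[], None],
) -> None:
    """Pause `database` in PgBouncer, run `action`, then always resume.

    `execute` is assumed to send one command to the PgBouncer admin console.
    """
    execute(f"PAUSE {database};")
    try:
        action()  # hypothetical: move the VIP / promote the new primary
    finally:
        # Resume even if the action failed, so a botched switchover
        # doesn't leave every application request waiting forever.
        execute(f"RESUME {database};")


if __name__ == "__main__":
    sent = []
    with_paused_pool(sent.append, "payments", lambda: sent.append("<switchover>"))
    print(sent)  # ['PAUSE payments;', '<switchover>', 'RESUME payments;']
```

The `try`/`finally` is a design choice for the sketch: it treats "stuck paused" as worse than "resumed against the old primary", which matches the talk's option of abandoning a migration, but a real tool would need to decide that per failure mode.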
Slide 73
Slide 73 text
[Diagram: Client, 2 × VIP, 3 × PgBouncer, 3 × Postgres, 3 × Pacemaker]
Slide 74
Slide 74 text
[Diagram: Client, 2 × VIP, 3 × PgBouncer, 3 × Postgres, 3 × Pacemaker; annotated PAUSE;]
Slide 75
Slide 75 text
[Diagram: Client, 2 × VIP, 3 × PgBouncer, 3 × Postgres, 3 × Pacemaker; annotated PAUSE;]
Slide 76
Slide 76 text
[Diagram: Client, 2 × VIP, 3 × PgBouncer, 3 × Postgres, 3 × Pacemaker; annotated PAUSE;]
Slide 77
Slide 77 text
So what does this mean for upgrades?
Slide 78
Slide 78 text
[Diagram: Client, 2 × VIP, 3 × PgBouncer, 3 × Postgres, 3 × Pacemaker]
Slide 79
Slide 79 text
[Diagram: Client, 2 × VIP, 3 × PgBouncer, 3 × Postgres]
Slide 80
Slide 80 text
[Diagram: Client, 2 × VIP, 3 × PgBouncer, 3 × Postgres; nodes at 9.4.10, 9.4.9, 9.4.10]
Slide 81
Slide 81 text
[Diagram: Client, 2 × VIP, 3 × PgBouncer, 3 × Postgres; nodes at 9.4.10, 9.4.9, 9.4.10; annotated PAUSE;]
Slide 82
Slide 82 text
[Diagram: Client, 2 × VIP, 3 × PgBouncer, 3 × Postgres; nodes at 9.4.10, 9.4.9, 9.4.10; annotated PAUSE;]
Slide 83
Slide 83 text
[Diagram: Client, 2 × VIP, 3 × PgBouncer, 3 × Postgres; nodes at 9.4.10, 9.4.9, 9.4.10; annotated RESUME;]
Slide 84
Slide 84 text
[Diagram: Client, 2 × VIP, 3 × PgBouncer, 3 × Postgres; all nodes at 9.4.10; annotated RESUME;]
Slide 85
Slide 85 text
$
Slide 86
Slide 86 text
Caveats
Slide 87
Slide 87 text
Minor versions
Slide 88
Slide 88 text
9.4.9 → 9.4.10
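Why the minor-version caveat? Our reading, not stated on the slide: the PostgreSQL high-availability documentation says physical log shipping between different major versions is generally not possible, while mixing minor releases is tolerated and standbys should be updated first (which matches the diagrams upgrading two standbys to 9.4.10 before the switchover). A small, hypothetical guard for that rule; note that the major version was the first two numbers before PostgreSQL 10 and is the first number from 10 onwards:

```python
def _parts(version: str) -> tuple:
    # Only handles plain numeric release strings like "9.4.10" or "10.5".
    return tuple(int(p) for p in version.split("."))


def major_version(version: str) -> tuple:
    """9.4.9 -> (9, 4); 10.5 -> (10,)."""
    parts = _parts(version)
    return parts[:2] if parts[0] < 10 else parts[:1]


def is_minor_upgrade(old: str, new: str) -> bool:
    """True if `new` is a later release within the same major version."""
    return major_version(old) == major_version(new) and _parts(new) > _parts(old)


if __name__ == "__main__":
    print(is_minor_upgrade("9.4.9", "9.4.10"))  # True
    print(is_minor_upgrade("9.4.10", "9.5.0"))  # False: major version change
    print(is_minor_upgrade("9.6.3", "10.1"))    # False: major version change
```

Anything that fails this check needs a different approach than a streaming-replication switchover, presumably where logical replication with something like pglogical comes in.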
Slide 89
Slide 89 text
pglogical
Slide 90
Slide 90 text
- Minor versions
- Long-running transactions
Slide 91
Slide 91 text
while running_queries():
    if now() > timeout:
        abandon_migration()  # abort; never fall through to promotion
    else:
        sleep(0.1)
promote_new_primary()
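The slide's wait loop as a runnable sketch. `running_queries`, the timeout and the injected `clock`/`sleep` are assumptions for illustration; the slide doesn't say how in-flight queries are counted (PgBouncer's `SHOW POOLS` or Postgres's `pg_stat_activity` are possible sources). Instead of calling `abandon_migration`/`promote_new_primary` directly, the hypothetical `wait_for_drain` returns whether it is safe to promote, so the caller can't accidentally do both.

```python
import time
from typing import Callable


def wait_for_drain(
    running_queries: Callable[[], int],
    timeout_s: float,
    poll_interval_s: float = 0.1,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll until no queries are running or `timeout_s` elapses.

    Returns True if traffic drained (safe to promote the new primary),
    False if the deadline passed first (abandon the migration and RESUME).
    """
    deadline = clock() + timeout_s
    while running_queries() > 0:
        if clock() > deadline:
            return False
        sleep(poll_interval_s)
    return True


if __name__ == "__main__":
    # Fake in-flight query counts that drain after three polls.
    counts = iter([3, 2, 1, 0])
    drained = wait_for_drain(lambda: next(counts), timeout_s=5.0)
    print("promote" if drained else "abandon")  # promote
```

The timeout bounds how long clients sit behind `PAUSE`, which is the trade-off the next caveat (pause length) is about.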
Slide 92
Slide 92 text
- Minor versions
- Long-running transactions
- Pause length
Slide 93
Slide 93 text
7-10s total
Slide 94
Slide 94 text
$
Slide 95
Slide 95 text
One more thing… (#sorrynotsorry)
Slide 96
Slide 96 text
github.com/gocardless/our-postgresql-setup
Slide 97
Slide 97 text
We’re hiring ❤ @ChrisSinjo @GoCardlessEng
Slide 98
Slide 98 text
Thank you
Slide 99
Slide 99 text
Questions?