Zero-downtime Postgres upgrades (DoxLon Edition)

At GoCardless, we use Postgres as the primary store for the data that matters: records of merchants, customers and payments.

As a payments API, it's important to our users that we maintain a high level of uptime. At the same time, we believe that performing upgrades is an unavoidable part of running software in production, databases included. Even the most stable software has critical bugs from time to time, and you have to deploy patches.

When it came to Postgres, we found ourselves caught between our desire to minimise downtime and our need to keep our software stack up-to-date. Postgres doesn't ship with all the machinery you need to do zero-downtime upgrades, so we knew we had work to do.

In this talk, we'll look at the problems you face when trying to upgrade Postgres without downtime, and explore our approach to building automation that upgrades Postgres without the apps noticing.

Chris Sinjakli

January 26, 2017

Transcript

  1. Zero-downtime Postgres upgrades Restarting databases without the apps noticing @ChrisSinjo

  2. GOCARDLESS

  3. POST /cash/monies HTTP/1.1 { amount: 100 }

  4. High per-request

  5. Uptime is

  6. None
  7. Good durability guarantees

  8. Good durability guarantees Feature-cautious

  9. Good durability guarantees Feature-cautious Transactions are cool

  10. –Postgres “Speak to this one node.”

  11. Client Postgres

  12. Client Postgres Postgres Replication

  13. Client Postgres Postgres Replication

  14. Wake a human up

  15. Client Postgres Postgres Replication

  16. Client Postgres Postgres

  17. Client Postgres Postgres

  18. Client Postgres Postgres Replication

  19. Awful time-to-recovery Error-prone

  20. You gotta perform: - Many steps - In the right order - Perfectly

  21. Don’t make a tired SRE think

  22. Add automation

  23. Pacemaker A clustering tool

  24. Client Postgres Postgres Replication

  25. How do we know a node has failed?

  26. Jepsen https://aphyr.com/tags/jepsen

  27. https://aphyr.com/posts/317-jepsen-elasticsearch

  28. Client Postgres Postgres Replication

  29. Client Postgres Postgres Postgres Repl Repl

  30. Client Postgres Postgres Postgres Repl Repl Pacemaker Pacemaker Pacemaker

  31. Client Postgres Postgres Postgres Repl Repl Pacemaker Pacemaker Pacemaker VIP

  32. Client Postgres Postgres Postgres Repl Repl Pacemaker Pacemaker Pacemaker VIP

  33. Client Postgres Postgres Postgres Pacemaker Pacemaker Pacemaker VIP

  34. Postgres Postgres Postgres Repl Pacemaker Pacemaker Pacemaker Client VIP

  35. Postgres Postgres Postgres Repl Pacemaker Pacemaker Pacemaker Client VIP

  36. Postgres Postgres Postgres Repl Pacemaker Pacemaker Pacemaker Client VIP

  37. Client Postgres Postgres Postgres Repl Repl VIP Pacemaker Pacemaker Pacemaker

  38. $

  39. Seems hard, right?

  40. It kinda is

  41. You gotta know: - Postgres - Distributed systems - Pacemaker

  42. Get someone else to run it for you

  43. Client Postgres Postgres Postgres Repl Repl Pacemaker Pacemaker Pacemaker VIP

  44. Client Postgres Postgres Postgres Pacemaker Pacemaker Pacemaker VIP

  45. Client Postgres Postgres Postgres Pacemaker Pacemaker Pacemaker VIP

  46. Client Postgres Postgres Postgres Pacemaker Pacemaker Pacemaker VIP

  47. Every move means a connection reset

  48. Every move means dropped requests

  49. POST /cash/monies HTTP/1.1 { amount: 100 }

  50. POST /cash/monies HTTP/1.1 { amount: 100 } 500 Internal Server Error

  51. What does this mean for upgrades?

  52. Client Postgres Postgres Postgres Pacemaker Pacemaker Pacemaker VIP

  53. Client Postgres Postgres Postgres Pacemaker Pacemaker Pacemaker 9.4.9 9.4.9 9.4.9 VIP

  54. Client Postgres Postgres Postgres Pacemaker Pacemaker Pacemaker 9.4.9 9.4.9 9.4.9 Repl Repl VIP

  55. Client Postgres Postgres Postgres Pacemaker Pacemaker Pacemaker 9.4.10 9.4.9 9.4.10 Repl Repl VIP

  56. Client Postgres Postgres Postgres Repl Repl VIP Pacemaker Pacemaker Pacemaker 9.4.10 9.4.9 9.4.10

  57. Every upgrade means a connection reset

  58. Every upgrade means dropped requests

  59. POST /cash/monies HTTP/1.1 { amount: 100 } 500 Internal Server Error

  60. Solution: never upgrade

  61. None
  62. Not upgrading is never an option

  63. Solution: never upgrade

  64. Solution: never upgrade

  65. Solution: ???

  66. 1 thing missing

  67. Client Postgres Postgres Postgres Pacemaker Pacemaker Pacemaker VIP

  68. Client Postgres Postgres Postgres Pacemaker Pacemaker Pacemaker PgBouncer PgBouncer PgBouncer VIP

  69. Client Postgres Postgres Postgres Pacemaker Pacemaker Pacemaker PgBouncer PgBouncer PgBouncer VIP

  70. Client Postgres Postgres Postgres Pacemaker Pacemaker Pacemaker PgBouncer PgBouncer PgBouncer VIP VIP

  71. PgBouncer has This One Weird Trick™

  72. PAUSE;

  73. Client Postgres Postgres Postgres Pacemaker Pacemaker Pacemaker PgBouncer PgBouncer PgBouncer VIP VIP

  74. Client Postgres Postgres Postgres Pacemaker Pacemaker Pacemaker PgBouncer PgBouncer PgBouncer VIP VIP PAUSE;

  75. Client Postgres Postgres Postgres Pacemaker Pacemaker Pacemaker PgBouncer PgBouncer PgBouncer VIP PAUSE; VIP

  76. Client Postgres Postgres Postgres Pacemaker Pacemaker Pacemaker PgBouncer PgBouncer PgBouncer VIP PAUSE; VIP

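
The effect of PAUSE; on clients can be illustrated with a toy model. This is only a sketch: `ToyPooler` is a hypothetical in-memory stand-in, not PgBouncer's implementation, and the backends are plain callables rather than real Postgres connections. It shows the property the slides rely on: a query that arrives while the pooler is paused is held rather than rejected, and runs against whichever backend is in place after the resume.

```python
from collections import deque


class ToyPooler:
    """In-memory stand-in for a connection pooler (illustrative only)."""

    def __init__(self, backend):
        self.backend = backend      # callable that "executes" a query
        self.paused = False
        self.waiting = deque()      # queries held while paused
        self.results = []           # (query, result) pairs, in completion order

    def pause(self):
        self.paused = True

    def resume(self, backend=None):
        # Optionally point at a new backend, e.g. the promoted primary.
        if backend is not None:
            self.backend = backend
        self.paused = False
        while self.waiting:
            self._run(self.waiting.popleft())

    def submit(self, query):
        if self.paused:
            self.waiting.append(query)   # held, not rejected
        else:
            self._run(query)

    def _run(self, query):
        self.results.append((query, self.backend(query)))


def demo():
    pooler = ToyPooler(backend=lambda q: "old-primary")
    pooler.submit("INSERT 1")
    pooler.pause()
    pooler.submit("INSERT 2")            # arrives mid-failover, waits
    pooler.resume(backend=lambda q: "new-primary")
    return pooler.results
```

From the client's point of view the second query is just slow; it never sees a connection reset.
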
  77. So what does this mean for upgrades?

  78. Client Postgres Postgres Postgres Pacemaker Pacemaker Pacemaker PgBouncer PgBouncer PgBouncer VIP VIP

  79. Client Postgres Postgres Postgres PgBouncer PgBouncer PgBouncer VIP VIP

  80. Client Postgres Postgres Postgres PgBouncer PgBouncer PgBouncer VIP VIP 9.4.10 9.4.9 9.4.10

  81. Client Postgres Postgres Postgres PgBouncer PgBouncer PgBouncer VIP VIP 9.4.10 9.4.9 9.4.10 PAUSE;

  82. Client Postgres Postgres Postgres PgBouncer PgBouncer PgBouncer VIP 9.4.10 9.4.9 9.4.10 VIP PAUSE;

  83. Client Postgres Postgres Postgres PgBouncer PgBouncer PgBouncer VIP 9.4.10 9.4.9 9.4.10 VIP RESUME;

  84. Client Postgres Postgres Postgres PgBouncer PgBouncer PgBouncer VIP 9.4.10 9.4.10 9.4.10 VIP RESUME;

  85. $
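
The upgrade sequence in the preceding slides (replicas first, then PAUSE;, move the VIP, RESUME;, then the old primary) could be orchestrated along these lines. This is a sketch under assumptions, not the tooling from the talk: `upgrade_node`, `pgbouncer_admin` and `move_vip` are hypothetical hooks the caller supplies, and picking the first replica as the new primary is an arbitrary choice made for illustration.

```python
def rolling_minor_upgrade(nodes, primary, upgrade_node, pgbouncer_admin, move_vip):
    """Upgrade every node, switching primary under a PgBouncer pause.

    All three callables are hypothetical hooks supplied by the caller:
      upgrade_node(node)    -> install the new minor version and restart
      pgbouncer_admin(cmd)  -> run a command on PgBouncer's admin console
      move_vip(node)        -> promote `node` and move the Postgres VIP to it
    Returns the new primary.
    """
    replicas = [n for n in nodes if n != primary]
    if not replicas:
        raise ValueError("need at least one replica to fail over to")

    # 1. Upgrade replicas first; clients are not sending writes to them.
    for node in replicas:
        upgrade_node(node)

    # 2. Hold client queries, switch primary, then release them.
    new_primary = replicas[0]  # assumption: any upgraded replica will do
    pgbouncer_admin("PAUSE;")
    try:
        move_vip(new_primary)
    finally:
        pgbouncer_admin("RESUME;")  # always release clients, even on failure

    # 3. The old primary no longer takes traffic, so upgrade it last.
    upgrade_node(primary)
    return new_primary
```

Putting RESUME; in a `finally` block is a design choice for the sketch: a failed switchover should not leave clients paused indefinitely.
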

  86. Caveats

  87. Minor versions

  88. 9.4.9 → 9.4.10

  89. pglogical

  90. Minor versions Long-running transactions

  91. while running_queries:
          if now > timeout:
              abandon_migration
          else:
              sleep(0.1)
      promote_new_primary
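
The wait-with-timeout loop on this slide can be written as runnable Python roughly as follows. It is a sketch: `has_running_queries` and `promote_new_primary` are hypothetical callables supplied by the caller, the 5-second default timeout is an assumed value (only the 0.1-second poll interval comes from the slide), and raising an exception is one possible way to express "abandon the migration".

```python
import time


class MigrationAbandoned(Exception):
    """Raised when in-flight queries did not finish before the deadline."""


def wait_then_promote(has_running_queries, promote_new_primary,
                      timeout_s=5.0, poll_s=0.1,
                      clock=time.monotonic, sleep=time.sleep):
    """Poll until no queries are running, then promote the new primary.

    `clock` and `sleep` are injectable so the loop can be tested
    without waiting in real time.
    """
    deadline = clock() + timeout_s
    while has_running_queries():
        if clock() > deadline:
            # Give up rather than keep clients paused behind a long query.
            raise MigrationAbandoned("queries still running at timeout")
        sleep(poll_s)
    promote_new_primary()
```

A monotonic clock is used for the deadline so that wall-clock adjustments during the wait cannot shorten or extend the timeout.
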

  92. Minor versions Long-running transactions Pause length

  93. 7-10s total

  94. $

  95. One more thing… (#sorrynotsorry)

  96. github.com/gocardless/our-postgresql-setup

  97. We’re hiring ❤ @ChrisSinjo @GoCardlessEng

  98. Thank you ❤ @ChrisSinjo @GoCardlessEng

  99. Questions? ❤ @ChrisSinjo @GoCardlessEng