Upgrading PostgreSQL on AWS RDS with (almost) zero downtime

Stanislav Komanec (Platform Team Leader at Kiwi.com) @ Brno-Moscow Python Meetup 2
"Every database admin’s nightmare & every manager’s dream: Database upgrades without an outage. But is it really so difficult?
In my presentation, I will describe our journey and how we were able to do it. We will explore the challenges of the process including preparation, schema changes, the upgrade, and finally data consistency verification. It’s all about consistency and readiness, isn’t it?"

Video: http://www.moscowpython.ru/meetup/2/upgrade-postresql/

Moscow Python Meetup

December 05, 2018

Transcript

  1. Short introduction
     ➔ Head of Platform @ Kiwi.com
        ◆ Python Programmer
        ◆ Booking Backend Team Leader
     ➔ Platform Team @ Kiwi.com
        ◆ Providing building blocks
        ◆ Auditing
        ◆ Data, Infrastructure, Software
     ➔ Happy father
     ➔ Whiskey lover
  2. Kiwi.com microservices
     ➔ We have more than 300 microservices in production
        ◆ Python: ~80% of all services
        ◆ JavaScript, Go, C++
     ➔ A lot of databases in production
        ◆ 80 PostgreSQL instances
        ◆ A lot of read replicas
        ◆ Huge Scylla cluster (20 nodes, 100+ TB)
     ➔ API, API everywhere
     ➔ SLOs, SLAs in place
  3. (image-only slide)
  4. Why do we need to upgrade
     ➔ It’s cool to have the latest technology
     ➔ A lot of great things in Postgres 10+
        ➔ Improved Query Parallelism
        ➔ Logical Replication
        ➔ Declarative Table Partitioning - no ugly TRIGGER hacks needed anymore
        ➔ Security enhancements
  5. (image-only slide)
  6. How we did it before
     ➔ AWS console click
     ➔ A lot of manual steps
        ◆ Traffic stop
        ◆ Stop all services
        ◆ Backup
        ◆ Upgrade
        ◆ Pray
     ➔ Little to no visibility in the console
     ➔ Preparation, synchronization, announcing…
  7. How RDS was helpful
     ➔ RDS has:
        ➔ Backups
        ➔ Point-in-time recovery
        ➔ Replicas
        ➔ Auto patch upgrades
        ➔ Multi Availability Zones
  8. How RDS was NOT helpful
     ➔ Poor logs
     ➔ Can’t create replicas between major versions
     ➔ You are blind; you just have to wait
     ➔ A lot of $$$ lost, very long outage
  9. How are other companies doing it?
     ➔ Managed solutions
        ➔ Citus Data
        ➔ 2ndQuadrant
     ➔ They did not help at all :(
     ➔ We can do better!
  10. How will we get there?
      ➔ New instance
      ➔ DNS
      ➔ Mirroring / replicating the data
      ➔ Integrity of data
  11. How to use Bucardo?
      ➔ Another instance
      ➔ Run it!
      ➔ Initial run (took 1 day)
      ➔ What do you need to mirror
         ➔ Sequences
         ➔ Tables
  12. Bucardo preparation
      export SOURCE_HOST=source_db.csz2fsdsadhy2ued2z.eu-west-1.rds.amazonaws.com
      export SOURCE_PORT=5432
      export SOURCE_DATABASE=user_database
      export SOURCE_USERNAME=user_database
      export SOURCE_PASSWORD=you-can-remove-me1
      export DEST_HOST=bcdtest2.csz2fsdsadhy2ued2z.eu-west-1.rds.amazonaws.com
      export DEST_PORT=5432
      export DEST_DATABASE=user_database
      export DEST_USERNAME=user_database
      export DEST_PASSWORD=you-can-remove-me2
      echo $DEST_HOST:$DEST_PORT:$DEST_DATABASE:$DEST_USERNAME:$DEST_PASSWORD >> $HOME/.pgpass
      echo $SOURCE_HOST:$SOURCE_PORT:$SOURCE_DATABASE:$SOURCE_USERNAME:$SOURCE_PASSWORD >> $HOME/.pgpass
  13. Bucardo start
      bucardo add all tables db=source_db herd=fherd
      bucardo add all sequences db=source_db herd=fherd
      bucardo add sync the_sync relgroup=fherd dbs=source_db:source,dest_db:target
      bucardo start
      bucardo status
  14. Under the hood
      ➔ Creates triggers
      ➔ Creates tables
         ➔ You change/add a record
         ➔ Bucardo adds it to a temp. table
         ➔ Bucardo periodically checks that table
         ➔ Bucardo forwards the data to the destination table as-is (see the sketch below)
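
      A heavily simplified Python sketch of the trigger-plus-delta-table pattern the slide describes; this is not Bucardo's actual implementation, and the names used here (bucardo_delta, customers, id, email, the DSNs) are made up for illustration:

      import time

      import psycopg2

      SOURCE_DSN = "host=source_db dbname=user_database"  # placeholder DSNs
      DEST_DSN = "host=dest_db dbname=user_database"

      def forward_changes_once(src_cur, dst_cur):
          # The trigger has recorded the primary key of every changed row here.
          src_cur.execute("SELECT DISTINCT row_id FROM bucardo_delta")
          for (row_id,) in src_cur.fetchall():
              src_cur.execute("SELECT id, email FROM customers WHERE id = %s", (row_id,))
              row = src_cur.fetchone()
              if row:
                  # Replay the current state of the row on the destination as-is.
                  dst_cur.execute(
                      "INSERT INTO customers (id, email) VALUES (%s, %s) "
                      "ON CONFLICT (id) DO UPDATE SET email = EXCLUDED.email",
                      row,
                  )
          src_cur.execute("DELETE FROM bucardo_delta")

      with psycopg2.connect(SOURCE_DSN) as src, psycopg2.connect(DEST_DSN) as dst:
          while True:  # "periodically checking the table"
              forward_changes_once(src.cursor(), dst.cursor())
              src.commit()
              dst.commit()
              time.sleep(5)
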
  15. Integrity of data
      ➔ Python script (see the sketch below)
         ➔ Check all sequences
         ➔ Check all tables
         ➔ Check all data
         ➔ Check counts
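
      A minimal sketch of the sequence part of such a script, assuming psycopg2 and placeholder DSNs; this is an illustration, not the production script (on a live source the sequence values keep moving, so in practice source >= destination is what you expect to see):

      import psycopg2

      SOURCE_DSN = "host=source_db dbname=user_database"  # placeholder DSNs
      DEST_DSN = "host=dest_db dbname=user_database"

      def sequence_values(dsn):
          """Return {sequence_name: last_value} for every sequence in the public schema."""
          with psycopg2.connect(dsn) as conn, conn.cursor() as cur:
              cur.execute(
                  "SELECT sequence_name FROM information_schema.sequences "
                  "WHERE sequence_schema = 'public'"
              )
              names = [row[0] for row in cur.fetchall()]
              values = {}
              for name in names:
                  cur.execute('SELECT last_value FROM "{}"'.format(name))
                  values[name] = cur.fetchone()[0]
              return values

      source, dest = sequence_values(SOURCE_DSN), sequence_values(DEST_DSN)
      for name, value in source.items():
          if dest.get(name) != value:
              print("MISMATCH {}: source={} dest={}".format(name, value, dest.get(name)))
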
  16. Integrity check
      ➔ Get all source tables
      ➔ Get all source sequences
         ◆ Check their values on both sides and compare them
      ➔ For each table (see the sketch below):
         ◆ Check the last n records and compare them with the destination table
         ◆ Check the count
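
      A sketch of the per-table part, assuming psycopg2 and that every table has an integer primary key named id; a real script would discover the key column from the catalog instead of assuming it:

      import psycopg2

      def list_tables(cur):
          cur.execute("SELECT tablename FROM pg_tables WHERE schemaname = 'public'")
          return [row[0] for row in cur.fetchall()]

      def check_table(src_cur, dst_cur, table, last_n=100):
          """Compare row counts and the last N rows of one table on both sides."""
          src_cur.execute('SELECT count(*) FROM "{}"'.format(table))
          dst_cur.execute('SELECT count(*) FROM "{}"'.format(table))
          counts_match = src_cur.fetchone()[0] == dst_cur.fetchone()[0]

          query = 'SELECT * FROM "{}" ORDER BY id DESC LIMIT %s'.format(table)
          src_cur.execute(query, (last_n,))
          dst_cur.execute(query, (last_n,))
          rows_match = src_cur.fetchall() == dst_cur.fetchall()

          return counts_match and rows_match

      with psycopg2.connect("host=source_db dbname=user_database") as src, \
           psycopg2.connect("host=dest_db dbname=user_database") as dst:
          src_cur, dst_cur = src.cursor(), dst.cursor()
          for table in list_tables(src_cur):
              status = "OK" if check_table(src_cur, dst_cur, table) else "MISMATCH"
              print("{}: {}".format(table, status))
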
  17. Schema upgrade
      ALTER TABLE customers ADD COLUMN created_at TIMESTAMP WITH TIME ZONE;
      ALTER TABLE customers ADD COLUMN updated_at TIMESTAMP WITH TIME ZONE;
      CREATE TRIGGER customers_set_created_at BEFORE INSERT ON customers
          FOR EACH ROW EXECUTE PROCEDURE set_created_at();
      CREATE TRIGGER customers_set_updated_at BEFORE UPDATE ON customers
          FOR EACH ROW EXECUTE PROCEDURE set_updated_at();
      CREATE INDEX CONCURRENTLY idx_customers_created_at ON customers (created_at);
      CREATE INDEX CONCURRENTLY idx_customers_updated_at ON customers (updated_at);
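
      The triggers above call set_created_at() and set_updated_at(), which are not shown on the slide; a plausible definition (an assumption, not the talk's actual code) is a pair of PL/pgSQL functions that stamp the row with now(), created here via psycopg2:

      import psycopg2

      # Assumed trigger functions; the slide only references their names.
      DDL = """
      CREATE OR REPLACE FUNCTION set_created_at() RETURNS trigger AS $$
      BEGIN
          NEW.created_at := now();
          RETURN NEW;
      END;
      $$ LANGUAGE plpgsql;

      CREATE OR REPLACE FUNCTION set_updated_at() RETURNS trigger AS $$
      BEGIN
          NEW.updated_at := now();
          RETURN NEW;
      END;
      $$ LANGUAGE plpgsql;
      """

      with psycopg2.connect("dbname=user_database") as conn, conn.cursor() as cur:
          cur.execute(DDL)
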
  18. Integrity check
      ➔ Threads, threads everywhere (see the sketch below)
      ➔ Pooling
      ➔ Optimized queries to run as fast as possible
      SELECT * FROM
          (SELECT last_value FROM test_id_seq) test_id_seq_value,
          (SELECT last_value FROM test_id_seq2) test_id_seq_value2;
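
      A sketch of how the checks can be fanned out across threads with a psycopg2 connection pool; check_table() is the helper from the earlier sketch, and the DSNs, pool sizes, and table list are placeholders:

      from concurrent.futures import ThreadPoolExecutor

      from psycopg2.pool import ThreadedConnectionPool

      source_pool = ThreadedConnectionPool(1, 10, "host=source_db dbname=user_database")
      dest_pool = ThreadedConnectionPool(1, 10, "host=dest_db dbname=user_database")
      tables = ["customers"]  # in practice: the list returned by list_tables() above

      def check_one(table):
          # Borrow one connection per side from the pools for this worker thread.
          src_conn, dst_conn = source_pool.getconn(), dest_pool.getconn()
          try:
              return table, check_table(src_conn.cursor(), dst_conn.cursor(), table)
          finally:
              source_pool.putconn(src_conn)
              dest_pool.putconn(dst_conn)

      with ThreadPoolExecutor(max_workers=10) as executor:
          for table, ok in executor.map(check_one, tables):
              print("{}: {}".format(table, "OK" if ok else "MISMATCH"))
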