Upgrade to Pro — share decks privately, control downloads, hide ads and more …

Upgrading PostgreSQL on AWS RDS with (almost) zero downtime

Upgrading PostgreSQL on AWS RDS with (almost) zero downtime

Stanislav Komanec (Platform Team Leader at Kiwi.com) @ Brno-Moscow Python Meetup 2
"Every database admin’s nightmare & every manager’s dream: Database upgrades without an outage. But is it really so difficult?
In my presentation, I will describe our journey and how we were able to do it. We will explore the challenges of the process including preparation, schema changes, the upgrade, and finally data consistency verification. It’s all about consistency and readiness, isn’t it?"

Video: http://www.moscowpython.ru/meetup/2/upgrade-postresql/

Moscow Python Meetup
PRO

December 05, 2018
Tweet

More Decks by Moscow Python Meetup

Other Decks in Programming

Transcript

  1. Stanislav Komanec, [email protected]
    Upgrading PostgreSQL
    (Not only) on AWS RDS

    View Slide

  2. Short introduction
    ➔Head of Platform @ Kiwi.com
    ◆ Python Programmer
    ◆ Booking Backend Team Leader
    ➔Platform Team @ Kiwi.com
    ◆ Providing building blocks
    ◆ Auditing
    ◆ Data, Infrastructure, Software
    ➔Happy father
    ➔Whiskey lover
    2

    View Slide

  3. Sold bookings per day
    10 000+
    Microservices
    300+
    Locations
    10+
    Kiwi.com in numbers
    3

    View Slide

  4. Kiwi.com microservices
    ➔We have more than 300 microservices in production
    ◆ Python ~80% of all services
    ◆ Javascript, Go, C++
    ➔A lot of databases in production
    ◆ 80 PostgreSQL
    ◆ A lot of read replicas
    ◆ Huge Scylla cluster (20 nodes, 100+ TB)
    ➔API, API everywhere
    ➔SLOs, SLAs in place
    4

    View Slide

  5. 5
    Do you upgrade regularly?

    View Slide

  6. 6

    View Slide

  7. Why do we need to upgrade
    ➔It’s cool to have latest technology
    ➔A lot of great things in Postgres 10+
    ➔ Improved Query Parallelism
    ➔ Logical Replication
    ➔ Declarative Table Partitioning - no ugly TRIGGER hacks needed anymore
    ➔ Security enhancements
    7

    View Slide

  8. 8

    View Slide

  9. 9
    So … what now?

    View Slide

  10. How we did it before
    ➔AWS console click
    ➔A lot of manual steps
    ◆ Traffic stop
    ◆ Stop all services
    ◆ Backup
    ◆ Upgrade
    ◆ Pray
    ➔Zero low visibility in console
    ➔Preparation, synchronization, announcing…
    10

    View Slide

  11. How RDS was helpful
    ➔RDS has:
    ➔ Backups
    ➔ Point in time recovery
    ➔ Replicas
    ➔ Auto patch upgrade
    ➔ Multi Availability Zones
    11

    View Slide

  12. How RDS was NOT helpful
    ➔Poor logs
    ➔Can’t do replicas between major versions
    ➔You are blind, you need to just wait
    ➔A lot of $$$ lost, very long outage
    12

    View Slide

  13. 13
    There MUST be a better way

    View Slide

  14. What do we need?
    (Nearly) zero downtime

    View Slide

  15. How other companies doing it?
    ➔Managed solution
    ➔ Citus data
    ➔ 2ndQuadrant
    ➔They did not help at all :(
    ➔We can do better!
    15

    View Slide

  16. The battle plan
    16
    Source Destination
    super_db.kiwi.com
    Mirroring

    View Slide

  17. How we will get there?
    ➔New instance
    ➔DNS
    ➔Mirroring / replicating the data
    ➔Integrity of data
    17

    View Slide

  18. What is missing?
    ➔Mirroring / replicating the data - find a tool
    ➔Integrity of data
    18

    View Slide

  19. Bucardo comes
    ➔Mirroring tool
    ➔Easy to setup
    ➔Opensource
    19

    View Slide

  20. How to use bucardo?
    ➔Another instance
    ➔Run it!
    ➔Initial run (took 1 day)
    ➔What do you need to mirror
    ➔ Sequences
    ➔ Tables
    20

    View Slide

  21. Bucardo preparation
    export SOURCE_HOST=source_db.csz2fsdsadhy2ued2z.eu-west-1.rds.amazonaws.com
    export SOURCE_PORT=5432
    export SOURCE_DATABASE=user_database
    export SOURCE_USERNAME=user_database
    export SOURCE_PASSWORD=you-can-remove-me1
    export DEST_HOST=bcdtest2.csz2fsdsadhy2ued2z.eu-west-1.rds.amazonaws.com
    export DEST_PORT=5432
    export DEST_DATABASE=user_database
    export DEST_USERNAME=user_database
    export DEST_PASSWORD=you-can-remove-me2
    echo
    $DEST_HOST:$DEST_PORT:$DEST_DATABASE:$DEST_USERNAME:$DEST_PASSWORD >>
    $HOME/.pgpass
    echo
    $SOURCE_HOST:$SOURCE_PORT:$SOURCE_DATABASE:$SOURCE_USERNAME:$SOURCE_
    PASSWORD >> $HOME/.pgpass
    21

    View Slide

  22. Bucardo start
    bucardo add all tables db=source_db herd=fherd
    bucardo add all sequences db=source_db herd=fherd
    bucardo add sync the_sync relgroup=fherd
    dbs=source_db:source,dest_db:target
    bucardo start
    bucardo status
    22

    View Slide

  23. 23
    Under the hood

    View Slide

  24. Under the hood
    ➔Creates triggers
    ➔Create tables
    ➔ You change/add record
    ➔ Bucardo adds it to temp. table
    ➔ Bucardo is periodically checking the table
    ➔ Bucardo shall forward the data to destination table as it is
    24

    View Slide

  25. Integrity of data
    ➔Python script
    ➔ Check all sequences
    ➔ Check all tables
    ➔ Check all data
    ➔ Check counts
    25

    View Slide

  26. Integrity check
    ➔ Get all source tables
    ➔ Get all source sequences
    ◆ Check their values in both tables and compare it
    ➔ For each table:
    ◆ Check last n records and compare with destination table
    ◆ Check count
    26

    View Slide

  27. Schema upgrade
    ALTER TABLE customers ADD COLUMN created_at TIMESTAMP WITH TIME ZONE;
    ALTER TABLE customers ADD COLUMN updated_at TIMESTAMP WITH TIME ZONE;
    CREATE TRIGGER customers_set_created_at BEFORE INSERT ON customers FOR EACH
    ROW
    EXECUTE PROCEDURE set_created_at();
    CREATE TRIGGER customers_set_updated_at BEFORE UPDATE ON customers FOR EACH
    ROW
    EXECUTE PROCEDURE set_updated_at();
    CREATE INDEX CONCURRENTLY idx_customers_created_at ON customers (created_at);
    CREATE INDEX CONCURRENTLY idx_customers_updated_at ON customers (updated_at);
    27

    View Slide

  28. Integrity check
    ➔ Threads, threads everywhere
    ➔ Pooling
    ➔ Optimized queries to run as fast as possible
    SELECT * FROM
    (SELECT last_value FROM test_id_seq) test_id_seq_value,
    (SELECT last_value FROM test_id_seq2) test_id_seq_value2;
    28

    View Slide

  29. 29
    Future

    View Slide

  30. Takeovers
    ➔Use logical replication if you can
    ➔Use Bucardo
    ➔Better safe than sorry
    30

    View Slide

  31. ANY QUESTIONS?
    You can find me at Twitter @netman92 & [email protected]
    Thanks!

    View Slide