
Partitioning Billions of Rows Without Downtime

This is a case study of a table partitioning data migration. The goals were to:
- Promote the use of PostgreSQL's native "Declarative Partitioning", where appropriate
- Grow adoption of the technique by new users, like app developers
- Discuss challenges and mistakes, sharing open source code

Video: https://www.youtube.com/watch?v=W8d3roay29w

Andrew Atkinson

April 20, 2023

Resources

"Partitioned Table Conversion: Concept to Reality" with Andrew Atkinson: https://www.youtube.com/watch?v=W8d3roay29w

Transcript

  1. Audience Polling
     - #1: Who has used table partitioning in production, in any capacity?
     - #2: Who has converted from unpartitioned to partitioned?
     - #3: Who has done an online conversion with pgslice or similar?
  2. Assumptions
     The audience has some familiarity with what PostgreSQL table partitioning is. Background resources:
     - Crunchy Playground: Partitioning
     - Large Table Partitioning with PostgreSQL and Django
     - How to use table partitioning to scale PostgreSQL
     - A slice of life: table partitioning in PostgreSQL databases
  3. Learner Outcomes
     - PostgreSQL table partitioning: types, history, and a case study review
     - Online table conversion and swap, with practical details
     - Lessons from our challenges and mistakes
     - How partitioned tables help with cost management and data lifecycle management
  4. App Background and Context
     - HR Tech app: SaaS, B2B, monolithic (~100K LOC, >90% test coverage)
     - Running Ruby on Rails 6.1
     - Uses the public_activity Ruby gem (~3K GitHub stars)
     - High-growth activities table (not partitioned)
  5. The activities table is an event log for applicants in their hiring process. Events are visible to hiring managers and recruiters.
     - A lot of data: denormalized YAML
     - Around ~40 activities per applicant, and there are millions of applicants
     - Errors in this area of the app are highly visible to users
  6. PostgreSQL 13.x
     - ~10 production databases, each with an activities table (sizes from 50 GB to 1.5 TB)
     - Mix of regular RDS PostgreSQL and Aurora
     - Sum across the ~10 DBs: ~3 billion rows and ~2 TB of data
     - PK column is id: bigint, backed by a sequence
  7. Problems
     - The activities table was very large; modifications like adding indexes are painful
     - Range queries can time out
     - Despite 8 years of data going back to 2015, users interact mostly with recent data, so most of the data is never accessed
  8. We'd like to stop paying ever-increasing costs to store data that's not accessed. We'd like to reduce overall storage costs, including consumption, backups, and transfers. We'd like to be able to modify the table easily.
  9. Considering Table Partitioning
     What's the minimum table size to benefit from partitioning?
     - Postgres.fm recommends >= 100 GB
     - PG docs guideline: "table size exceeds server memory size" (here, 1.25 TB > ~384 GB)
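     One way to check a table's total size (heap, indexes, and TOAST) against that guideline, using the deck's activities table as the example:

       SELECT pg_size_pretty(pg_total_relation_size('activities'));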
  10. Isn't partitioning too much work?
     - The old method is a lot of work: inheritance-based, a.k.a. "trigger-based" partitioning
     - Requires functions, triggers, and CHECK constraints to INSERT data into the correct spot
     - Uses table inheritance (the INHERITS keyword)
     - Considered a "legacy method" (Sources: #1)
  11. How to get started? Engineering's job is to build a reliable and scalable SaaS platform.
     1. Form hypotheses about acceptable changes for customers
     2. Test and validate the hypotheses
  12. Hypothesis: 2 years of data is more than enough
     - Added a feature flag to LIMIT queries, to test the hypothesis
     - Ran that as a "scream test" for a few weeks
     - After weeks passed without concerns being raised, we moved forward with partitioning
  13. Partition Strategy
     - Use the RANGE partitioning type
     - Partition on the created_at timestamp column
     - Monthly partitions (see the sketch below)
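     A minimal SQL sketch of this strategy; the table names follow the deck's activities example and pgslice's <table>_intermediate convention, and the month shown is illustrative:

       -- Parent table: copy of the original structure, partitioned by RANGE
       CREATE TABLE activities_intermediate (
         LIKE activities INCLUDING ALL
       ) PARTITION BY RANGE (created_at);

       -- One child table per month: inclusive lower bound, exclusive upper bound
       CREATE TABLE activities_202304
         PARTITION OF activities_intermediate
         FOR VALUES FROM ('2023-04-01') TO ('2023-05-01');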
  14. Declarative Partitioning
     - Launched in version 10 (2017) with the LIST and RANGE types; HASH was added in 11. We used RANGE.
     - Slices data from one big table into a parent table and child tables, with less work required
     - No triggers, functions, or CHECK constraints needed
  15. Declarative Partitioning Improvements
     - 10: Introduction
     - 11: Added DEFAULT partition, HASH type
     - 11: Support for PK and FK constraints, indexes, triggers
     - 11: INSERT routed to the correct partition, UPDATE can move rows between partitions
     - 12: FKs can reference partitioned tables
     (Sources: #1, #2)
  16. Declarative Partitioning Improvements (continued)
     - 12: ATTACH PARTITION with reduced locking, so it no longer blocks queries
     - 13: Row-level BEFORE triggers on partitioned tables
     - 13: Logical replication of partitioned tables
     - 14: DETACH PARTITION supports CONCURRENTLY
     - 14: More operations use partition pruning
     - 15: Vacuum partitions in parallel
     (Source: Percona)
  17. Partitioning Options (GitHub stars shown)
     - pg_partman (1.5k), with pg_cron (1.5k)
     - pg_party (400)
     - pgslice (405)
     - No libraries or extensions, roll your own: John Nunemaker blog, Brandur blog
  18. Selected pgslice
     - CLI program, written in Ruby (the language used by the team), open source
     - Small, minimal library
     - Non-destructive, row-copying approach
     - Iterative rollout, with the ability to step forward and backward
     - Leaves the unmodified original table behind as "retired"
  19. pgslice Quick Tour (see the command sequence below)
     - prep: creates <tbl>_intermediate
     - add_partitions: adds past and future partitions
     - fill: batched copies (we used the --where option to limit to 2 years)
     - analyze: performs SQL ANALYZE on all partitions
     - swap: swaps the table names
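     Roughly what the full sequence looks like on the command line, following the pgslice README; the table name, column, and partition counts here are illustrative:

       pgslice prep activities created_at month
       pgslice add_partitions activities --intermediate --past 24 --future 3
       pgslice fill activities
       pgslice analyze activities
       pgslice swap activities
       pgslice fill activities --swapped   # copy rows that arrived during the swap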
  20. pgslice Implementation
     - Added the gem to the Gemfile
     - Created a binstub with bundle binstubs pgslice, giving bin/pgslice <command-name>
     - Added a helper Ruby class (PgsliceHelper) to aid app integration
     - Invoked the helper with rails runner
  21. Downsides of pgslice and Row Copying
     - Intended for append-only tables, i.e. no UPDATE or DELETE statements during the copy
     - Copying rows requires more storage space
     - The online migration adds load to the server, so spare capacity is needed
     - Requires a lot of up-front development and testing
  22. Managing Partitions in Prod and Dev
     - We wanted developers to use a partitioned table, to discover bugs and performance problems, but without having to manage partitions
     - In prod, we wanted 2 years of partitions
     - Dev doesn't need partitions at all
  23. Selected a DEFAULT Partition (11+)
     - Keeps the table partitioned for dev
     - Avoids partition maintenance in dev
     - A nice middle ground
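     In SQL, a DEFAULT partition is a single catch-all child table; the partition name here is illustrative:

       -- Catches any row whose created_at doesn't match an existing partition
       CREATE TABLE activities_default
         PARTITION OF activities DEFAULT;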
  24. Partition Management
     - In prod, a cron job creates future partitions (monitor this!)
     - In dev, the DEFAULT partition info is dumped to db/structure.sql
     - A cron job could also DETACH partitions, to cycle out old ones
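     For the "monitor this!" part, a sketch of a catalog query monitoring could use to list the table's current partitions, e.g. to alert when no future partition exists yet:

       SELECT inhrelid::regclass AS partition_name
       FROM pg_inherits
       WHERE inhparent = 'activities'::regclass
       ORDER BY 1;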
  25. Implementation Notes
     - Added retire/unretire of the DEFAULT partition
     - Added customization of the schema
     - Configured the where clause to fill only 2 years
     - Added dump (pg_dump) and DROP support for future partition management
     (Source: andyatkinson/rideshare/pgslice)
  26. Rails and CI Integration
     - No Rails integration: no migrations for the <tbl>_intermediate table or the DEFAULT partition
     - No automatic partitions or cron guidance (we used the whenever gem)
     - No test or CI guidance
  27. CI configuration (CircleCI; elisions from the original slide kept as "..."):

       test:
         ...
         - image: cimg/ruby:3.2.0
         - image: cimg/postgres:13.6
           environment:
             PGSLICE_URL: "postgres://root@localhost:5432/rideshare_test"
         steps:
           ...
           - run:
               name: Database setup
               command: bundle exec rails db:schema:load --trace
           - run:
               name: Partition conversion
               command: sh bin/partition_conversion.sh
           - run:
               name: Run tests
               command: bin/rails test
  28. Find Bugs Using the Test Suite
     - We should have performed the partition conversion in the CI database from the beginning
     - Code compatibility was not as "seamless" as expected with the partitioned table
     - Surprises included UPDATEs of records and YAML-serialized data in a column
  29. Code Compatibility Problems
     - PRIMARY KEY changes
     - activerecord-import (405 GitHub stars) did not work
     - Rails needed to know how to find the id
     - Serialized content in a text column required careful escaping
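     On the PRIMARY KEY point: PostgreSQL requires any PRIMARY KEY or UNIQUE constraint on a partitioned table to include the partition key, so the familiar single-column id key becomes composite. A sketch:

       -- The PK must include the partition key column (created_at)
       ALTER TABLE activities_intermediate
         ADD PRIMARY KEY (id, created_at);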
  30. Code Compatibility Fixes
     - Explicit primary key configuration: self.primary_key = :id (dual support)
     - Replaced activerecord-import with .insert_all() for multi-row inserts
     - Set the unique_by: [] option for .insert_all()'s ON CONFLICT clause, adding a UNIQUE constraint/index
     - Added created_at to INSERT statements
  31. Fixing Test Code
     - Tests write activities records as the base data
     - Unfortunately they hard-coded historical dates, so INSERTs fail due to a missing partition
     - Updated dozens of tests to not use hard-coded dates, e.g. used Time.current
  32. Ready To Go: General Steps
     1. The intermediate table is ready to fill
     2. Add env vars like PGSLICE_URL
     3. Run the steps one by one at an off-peak time, coordinating with the team
  33. Run pgslice fill in a tmux session, detach, and monitor. The copy runs in batches like:

       INSERT INTO <table>_intermediate ...
       SELECT * FROM <table>
       WHERE id > 0 AND id < 10000

     Options: --batch-size, --where, --sleep
  34. pgslice analyze, then pgslice swap, which renames the tables in one transaction:

       BEGIN;
       ALTER TABLE <table> RENAME TO <table>_retired;
       ALTER TABLE <table>_intermediate RENAME TO <table>;
       COMMIT;
  35. Customer Accommodation
     - Out of hundreds of customers, one wanted the old data
     - The solution was to add more partitions and fill from further back for that customer
     - Added a customer identifier column to use with pgslice and --where "customer_id = 123"
     - Added a partial index and populated the column
  36. SQL from the customer accommodation:

       ALTER TABLE activities ADD COLUMN account_id INTEGER;
       ALTER TABLE activities_intermediate ADD COLUMN account_id INTEGER;

       CREATE SCHEMA temp;
       CREATE TABLE temp.customer (
         id SERIAL PRIMARY KEY,
         applicant_id BIGINT UNIQUE, -- JOIN on activities.owner_id
         account_id INTEGER);

       -- fill temp.customer
       INSERT INTO temp.customer (applicant_id, account_id)
       SELECT id AS applicant_id, account_id
       FROM <table>
       WHERE account_id = <val>
       AND id > 0 AND id < 10000;
  37. pgslice: Multiple fill Passes
     - pgslice only does INSERT statements
     - It stops when it finds duplicates, so it can't make multiple passes
     - The solution was to TRUNCATE some recent partitions, then fill again from there
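     As SQL, with hypothetical partition names, the reset before re-filling might look like:

       -- Truncate only the partitions whose rows will be copied again
       TRUNCATE TABLE activities_202302, activities_202303;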
  38. Beware of the Schema Cache
     - Following pgslice swap, the running app had a stale Active Record schema cache
     - The solution was to restart the app pods, but also to catch the exception and force a reset of the schema cache while the rolling restart completed
  39. Speeding Up Row Copies
     - DROP all or most indexes from the intermediate table
     - Set the table as unlogged: ALTER TABLE activities_intermediate SET UNLOGGED;
     - When copying is done, add the indexes back manually (CREATE INDEX, matching the source table's columns)
     - Set the table back to LOGGED
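     A sketch of the recreate step; the index name and columns are illustrative (owner_id is borrowed from the deck's earlier example) and should match the source table's real indexes:

       -- After the copy finishes: rebuild indexes, then re-enable WAL logging
       CREATE INDEX index_activities_on_owner_id
         ON activities_intermediate (owner_id);
       ALTER TABLE activities_intermediate SET LOGGED;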
  40. Analyzing Results
     With only 2 years of data, the replacement tables were much smaller:

       Before     After     Reduction
       215 GB     39 GB     82% decrease
       163 GB     45 GB     72% decrease
       1500 GB    500 GB    67% decrease

     (Measured with relation_size_extended.sql)
  41. AWS PostgreSQL Costs and Savings
     - RDS PostgreSQL and Aurora costs are different
     - Storage and IOPS are combined for RDS: General Purpose SSD $0.115 per GB-month; backups $0.095 per GB-month; plus snapshot exports and transfers
     - S3: $0.025 per GB-month
     - 1 TB comparison: RDS $115 vs. S3 $25, i.e. 78% less
  42. AWS RDS Cost Reduction
     - You pay for what's provisioned, not for I/O used
     - With storage auto-scaling (2019), you keep paying: even after a space reduction, you still pay for what's provisioned
     - Not "easy" to pay less: you need a new instance, and less space comes with less IOPS
     (Source: CloudZero)
  43. Aurora Cost Reduction
     - "You can reduce storage charges by dropping tables, indexes, databases, and so on that you no longer need" (billed in 10 GB increments)
     - Set up partitioning, plus regularly archive old partitions
     - $0.10 per GB-month; 1 TB for 1 month = $100
     - Perhaps most useful as a "cost control mechanism"
     (Source: #1)
  44. Cleaning Up
     - Remove transitional code
     - Remove feature flagging
     - Document the steps for team members
     - Add monitoring for partition management
     - What about the <table>_retired table?
  45. Unlocked Capabilities
     - The retired table can now be archived to low-cost storage like S3
     - Old partitions can be automatically detached and archived
     - By regularly archiving old partitions, overall growth slows and may flatten
     - Steps like DETACH CONCURRENTLY, pg_dump, and DROP TABLE can all be automated (see below)
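     The detach step as SQL (PostgreSQL 14+; the partition name is illustrative):

       -- Detach an old monthly partition without blocking concurrent queries
       -- (note: cannot run inside a transaction block)
       ALTER TABLE activities DETACH PARTITION activities_202104 CONCURRENTLY;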
  46. Archiving the retired table with pg_dump:

       # --format=c is the "custom" format, easy to pg_restore
       pg_dump \
         --format=c \
         --table activities_retired \
         --file activities_retired.dump \
         $PGSLICE_URL

       psql -c "DROP TABLE activities_retired" $PGSLICE_URL

     (Docs: pg_dump)
  47. Archive to S3

       SELECT aws_commons.create_s3_uri(
         '<bucket>',
         '<sample-filepath>',
         'us-west-2' -- region
       ) AS s3_uri_1 \gset

     (Sources: AWS Exports)
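     The export step that consumes that URI, following the AWS aws_s3 extension docs pattern; the query and table name are illustrative:

       SELECT * FROM aws_s3.query_export_to_s3(
         'SELECT * FROM activities_retired',
         :'s3_uri_1'
       );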
  48. Summary
     - Partitioned a 2-billion-row table using pgslice, without planned downtime
     - With tables up to 80% smaller, modifications, queries, and maintenance may be faster
     - Regularly exporting unneeded data is now possible
     - Helped improve our data lifecycle management practices
  49. SQL Goodie Bag To Go
     - Find the tables: 10 largest tables
     - Find append-onlys: number of DMLs (append only)
     - Index restore assistance: list CREATE INDEX statements
     - Partitions row counts: list partition row counts
     - Partition size estimates: estimated sizes for partitions following a naming pattern
  50. Special Recognition!
     - PG Day Chicago organizers, for accepting this talk!
     - Team members who supported this project
     - Andrew, the creator of pgslice, who also reviewed!
     - Reviewers: Haki, Ryan, Elizabeth
     bit.ly/PartPG2023