
Partitioning Billions of Rows Without Downtime

This is a case study of a table partitioning data migration. The goals were to:
- Promote the use of PostgreSQL's native "Declarative Partitioning", where appropriate
- Grow adoption of the technique by new users, like app developers
- Discuss challenges and mistakes, sharing open source code

Video: https://www.youtube.com/watch?v=W8d3roay29w

Andrew Atkinson

April 20, 2023

Resources

"Partitioned Table Conversion: Concept to Reality" with Andrew Atkinson: https://www.youtube.com/watch?v=W8d3roay29w

Transcript

  1. Audience Polling
     - #1: Who has used table partitioning in production, in any capacity?
     - #2: Who has converted from unpartitioned to partitioned?
     - #3: Who has done an online conversion with pgslice or similar?
  2. Assumptions
     The audience has some familiarity with what PostgreSQL table partitioning is. Background resources:
     - Crunchy Playground: Partitioning
     - Large Table Partitioning with PostgreSQL and Django
     - How to use table partitioning to scale PostgreSQL
     - A slice of life: table partitioning in PostgreSQL databases
  3. Learner Outcomes
     - PostgreSQL table partitioning: types, history, and a case study review
     - Online table conversion and swap, with practical details
     - Lessons from our challenges and mistakes
     - How partitioned tables help with cost management and data lifecycle management
  4. App Background and Context
     - HR Tech app: SaaS, B2B, monolithic (~100K LOC, >90% test coverage)
     - Running Ruby on Rails 6.1
     - Uses the public_activity Ruby gem (~3K GitHub stars)
     - High-growth activities table (not partitioned)
  5. The activities table is an event log for applicants in their hiring process. Events are visible to hiring managers and recruiters.
     - A lot of data: denormalized YAML
     - Around ~40 activities per applicant, and there are millions of applicants
     - Errors in this area of the app are highly visible to users
  6. PostgreSQL 13.x
     - ~10 production databases, each with an activities table (sizes from 50 GB to 1.5 TB)
     - Mix of regular RDS PostgreSQL and Aurora
     - Sum across the ~10 DBs: ~3 billion rows and ~2 TB of data
     - PK column is id: bigint, backed by a sequence
  7. Problems
     - The activities table was very large; modifications like adding indexes are painful
     - Range queries can time out
     - Despite 8 years of data going back to 2015, users interact mostly with recent data, so most of the data is never accessed
  8. We'd like to stop paying ever-increasing costs to store data that's not accessed. We'd like to reduce overall storage costs, including consumption, backups, and transfers. We'd like to be able to modify the table easily.
  9. Considering Table Partitioning
     What's the minimum table size to benefit from partitioning?
     - Postgres.fm recommends >= 100 GB
     - PG docs guideline: "table size exceeds server memory size" (here, 1.25 TB > ~384 GB)
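     One way to check a table's total size (heap, indexes, and TOAST) against that guideline, using the deck's activities table as the example:

       SELECT pg_size_pretty(pg_total_relation_size('activities'));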
  10. Isn't partitioning too much work?
     - The old method is a lot of work: inheritance-based, a.k.a. "trigger-based" partitioning
     - Requires functions, triggers, and CHECK constraints to INSERT data into the correct spot
     - Uses table inheritance (the INHERITS keyword)
     - Considered a "legacy method" (Sources: #1)
  11. How to get started? Engineering's job is to build a reliable and scalable SaaS platform.
     1. Form hypotheses about acceptable changes for customers
     2. Test and validate the hypotheses
  12. Hypothesis: 2 years of data is more than enough
     - Added a feature flag to LIMIT queries, to test the hypothesis
     - Ran that as a "scream test" for a few weeks
     - After weeks passed without concerns being raised, we moved forward with partitioning
  13. Partition Strategy
     - Use the RANGE partitioning type
     - Partition on the created_at timestamp column
     - Monthly partitions (see the sketch below)
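     A minimal SQL sketch of this strategy; the table names follow the deck's activities example and pgslice's <table>_intermediate convention, and the month shown is illustrative:

       -- Parent table: copy of the original structure, partitioned by RANGE
       CREATE TABLE activities_intermediate (
         LIKE activities INCLUDING ALL
       ) PARTITION BY RANGE (created_at);

       -- One child table per month: inclusive lower bound, exclusive upper bound
       CREATE TABLE activities_202304
         PARTITION OF activities_intermediate
         FOR VALUES FROM ('2023-04-01') TO ('2023-05-01');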
  14. Declarative Partitioning
     - Launched in version 10 (2017) with the LIST and RANGE types; HASH was added in 11. We used RANGE.
     - Slices data from one big table into a parent table and child tables, with less work required
     - No triggers, functions, or CHECK constraints needed
  15. Declarative Partitioning Improvements
     - 10: Introduction
     - 11: Added DEFAULT partition, HASH type
     - 11: Support for PK and FK constraints, indexes, triggers
     - 11: INSERT routed to the correct partition, UPDATE can move rows between partitions
     - 12: FKs can reference partitioned tables
     (Sources: #1, #2)
  16. Declarative Partitioning Improvements (continued)
     - 12: ATTACH PARTITION with reduced locking, so it no longer blocks queries
     - 13: Row-level BEFORE triggers on partitioned tables
     - 13: Logical replication of partitioned tables
     - 14: DETACH PARTITION supports CONCURRENTLY
     - 14: More operations use partition pruning
     - 15: Vacuum partitions in parallel
     (Source: Percona)
  17. Partitioning Options (GitHub stars shown)
     - pg_partman (1.5k), with pg_cron (1.5k)
     - pg_party (400)
     - pgslice (405)
     - No libraries or extensions, roll your own: John Nunemaker blog, Brandur blog
  18. Selected pgslice
     - CLI program, written in Ruby (the language used by the team), open source
     - Small, minimal library
     - Non-destructive, row-copying approach
     - Iterative rollout, with the ability to step forward and backward
     - Leaves the unmodified original table behind as "retired"
  19. pgslice Quick Tour (see the command sequence below)
     - prep: creates <tbl>_intermediate
     - add_partitions: adds past and future partitions
     - fill: batched copies (we used the --where option to limit to 2 years)
     - analyze: performs SQL ANALYZE on all partitions
     - swap: swaps the table names
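     Roughly what the full sequence looks like on the command line, following the pgslice README; the table name, column, and partition counts here are illustrative:

       pgslice prep activities created_at month
       pgslice add_partitions activities --intermediate --past 24 --future 3
       pgslice fill activities
       pgslice analyze activities
       pgslice swap activities
       pgslice fill activities --swapped   # copy rows that arrived during the swap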
  20. pgslice Implementation
     - Added the gem to the Gemfile
     - Created a binstub with bundle binstubs pgslice, giving bin/pgslice <command-name>
     - Added a helper Ruby class (PgsliceHelper) to aid app integration
     - Invoked the helper with rails runner
  21. Downsides of pgslice and Row Copying
     - Intended for append-only tables, i.e. no UPDATE or DELETE statements during the copy
     - Copying rows requires more storage space
     - The online migration adds load to the server, so spare capacity is needed
     - Requires a lot of up-front development and testing
  22. Managing Partitions in Prod and Dev
     - We wanted developers to use a partitioned table, to discover bugs and performance problems, but without having to manage partitions
     - In prod, we wanted 2 years of partitions
     - Dev doesn't need partitions at all
  23. Selected a DEFAULT Partition (11+)
     - Keeps the table partitioned for dev
     - Avoids partition maintenance in dev
     - A nice middle ground
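     In SQL, a DEFAULT partition is a single catch-all child table; the partition name here is illustrative:

       -- Catches any row whose created_at doesn't match an existing partition
       CREATE TABLE activities_default
         PARTITION OF activities DEFAULT;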
  24. Partition Management
     - In prod, a cron job creates future partitions (monitor this!)
     - In dev, the DEFAULT partition info is dumped to db/structure.sql
     - A cron job could also DETACH partitions, to cycle out old ones
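     For the "monitor this!" part, a sketch of a catalog query monitoring could use to list the table's current partitions, e.g. to alert when no future partition exists yet:

       SELECT inhrelid::regclass AS partition_name
       FROM pg_inherits
       WHERE inhparent = 'activities'::regclass
       ORDER BY 1;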
  25. Implementation Notes
     - Added retire/unretire of the DEFAULT partition
     - Added customization of the schema
     - Configured the where clause to fill only 2 years
     - Added dump (pg_dump) and DROP support for future partition management
     (Source: andyatkinson/rideshare/pgslice)
  26. Rails and CI Integration
     - No Rails integration: no migrations for the <tbl>_intermediate table or the DEFAULT partition
     - No automatic partitions or cron guidance (we used the whenever gem)
     - No test or CI guidance
  27. CI configuration (CircleCI; elisions from the original slide kept as "..."):

       test:
         ...
         - image: cimg/ruby:3.2.0
         - image: cimg/postgres:13.6
           environment:
             PGSLICE_URL: "postgres://root@localhost:5432/rideshare_test"
         steps:
           ...
           - run:
               name: Database setup
               command: bundle exec rails db:schema:load --trace
           - run:
               name: Partition conversion
               command: sh bin/partition_conversion.sh
           - run:
               name: Run tests
               command: bin/rails test
  28. Find Bugs Using the Test Suite
     - We should have performed the partition conversion in the CI database from the beginning
     - Code compatibility was not as "seamless" as expected with the partitioned table
     - Surprises included UPDATEs of records and YAML-serialized data in a column
  29. Code Compatibility Problems
     - PRIMARY KEY changes
     - activerecord-import (405 GitHub stars) did not work
     - Rails needed to know how to find the id
     - Serialized content in a text column required careful escaping
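     On the PRIMARY KEY point: PostgreSQL requires any PRIMARY KEY or UNIQUE constraint on a partitioned table to include the partition key, so the familiar single-column id key becomes composite. A sketch:

       -- The PK must include the partition key column (created_at)
       ALTER TABLE activities_intermediate
         ADD PRIMARY KEY (id, created_at);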
  30. Code Compatibility Fixes
     - Explicit primary key configuration: self.primary_key = :id (dual support)
     - Replaced activerecord-import with .insert_all() for multi-row inserts
     - Set the unique_by: [] option for .insert_all()'s ON CONFLICT clause, adding a UNIQUE constraint/index
     - Added created_at to INSERT statements
  31. Fixing Test Code
     - Tests write activities records as the base data
     - Unfortunately they hard-coded historical dates, so INSERTs fail due to a missing partition
     - Updated dozens of tests to not use hard-coded dates, e.g. used Time.current
  32. Ready To Go: General Steps
     1. The intermediate table is ready to fill
     2. Add env vars like PGSLICE_URL
     3. Run the steps one by one at an off-peak time, coordinating with the team
  33. Run pgslice fill in a tmux session, detach, and monitor. The copy runs in batches like:

       INSERT INTO <table>_intermediate ...
       SELECT * FROM <table>
       WHERE id > 0 AND id < 10000

     Options: --batch-size, --where, --sleep
  34. pgslice analyze, then pgslice swap, which renames the tables in one transaction:

       BEGIN;
       ALTER TABLE <table> RENAME TO <table>_retired;
       ALTER TABLE <table>_intermediate RENAME TO <table>;
       COMMIT;
  35. Customer Accommodation
     - Out of hundreds of customers, one wanted the old data
     - The solution was to add more partitions and fill from further back for that customer
     - Added a customer identifier column to use with pgslice and --where "customer_id = 123"
     - Added a partial index and populated the column
  36. SQL from the customer accommodation:

       ALTER TABLE activities ADD COLUMN account_id INTEGER;
       ALTER TABLE activities_intermediate ADD COLUMN account_id INTEGER;

       CREATE SCHEMA temp;
       CREATE TABLE temp.customer (
         id SERIAL PRIMARY KEY,
         applicant_id BIGINT UNIQUE, -- JOIN on activities.owner_id
         account_id INTEGER);

       -- fill temp.customer
       INSERT INTO temp.customer (applicant_id, account_id)
       SELECT id AS applicant_id, account_id
       FROM <table>
       WHERE account_id = <val>
       AND id > 0 AND id < 10000;
  37. pgslice: Multiple fill Passes
     - pgslice only does INSERT statements
     - It stops when it finds duplicates, so it can't make multiple passes
     - The solution was to TRUNCATE some recent partitions, then fill again from there
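     As SQL, with hypothetical partition names, the reset before re-filling might look like:

       -- Truncate only the partitions whose rows will be copied again
       TRUNCATE TABLE activities_202302, activities_202303;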
  38. Beware of the Schema Cache
     - Following pgslice swap, the running app had a stale Active Record schema cache
     - The solution was to restart the app pods, but also to catch the exception and force a reset of the schema cache while the rolling restart completed
  39. Speeding Up Row Copies
     - DROP all or most indexes from the intermediate table
     - Set the table as unlogged: ALTER TABLE activities_intermediate SET UNLOGGED;
     - When copying is done, add the indexes back manually (CREATE INDEX, matching the source table's columns)
     - Set the table back to LOGGED
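     A sketch of the recreate step; the index name and columns are illustrative (owner_id is borrowed from the deck's earlier example) and should match the source table's real indexes:

       -- After the copy finishes: rebuild indexes, then re-enable WAL logging
       CREATE INDEX index_activities_on_owner_id
         ON activities_intermediate (owner_id);
       ALTER TABLE activities_intermediate SET LOGGED;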
  40. Analyzing Results
     With only 2 years of data, the replacement tables were much smaller:

       Before     After     Reduction
       215 GB     39 GB     82% decrease
       163 GB     45 GB     72% decrease
       1500 GB    500 GB    67% decrease

     (Measured with relation_size_extended.sql)
  41. AWS PostgreSQL Costs and Savings
     - RDS PostgreSQL and Aurora costs are different
     - Storage and IOPS are combined for RDS: General Purpose SSD $0.115 per GB-month; backups $0.095 per GB-month; plus snapshot exports and transfers
     - S3: $0.025 per GB-month
     - 1 TB comparison: RDS $115 vs. S3 $25, i.e. 78% less
  42. AWS RDS Cost Reduction
     - You pay for what's provisioned, not for I/O used
     - With storage auto-scaling (2019), you keep paying: even after a space reduction, you still pay for what's provisioned
     - Not "easy" to pay less: you need a new instance, and less space comes with less IOPS
     (Source: CloudZero)
  43. Aurora Cost Reduction
     - "You can reduce storage charges by dropping tables, indexes, databases, and so on that you no longer need" (billed in 10 GB increments)
     - Set up partitioning, plus regularly archive old partitions
     - $0.10 per GB-month; 1 TB for 1 month = $100
     - Perhaps most useful as a "cost control mechanism"
     (Source: #1)
  44. Cleaning Up
     - Remove transitional code
     - Remove feature flagging
     - Document the steps for team members
     - Add monitoring for partition management
     - What about the <table>_retired table?
  45. Unlocked Capabilities
     - The retired table can now be archived to low-cost storage like S3
     - Old partitions can be automatically detached and archived
     - By regularly archiving old partitions, overall growth slows and may flatten
     - Steps like DETACH CONCURRENTLY, pg_dump, and DROP TABLE can all be automated (see below)
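     The detach step as SQL (PostgreSQL 14+; the partition name is illustrative):

       -- Detach an old monthly partition without blocking concurrent queries
       -- (note: cannot run inside a transaction block)
       ALTER TABLE activities DETACH PARTITION activities_202104 CONCURRENTLY;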
  46. Archiving the retired table with pg_dump:

       # --format=c is the "custom" format, easy to pg_restore
       pg_dump \
         --format=c \
         --table activities_retired \
         --file activities_retired.dump \
         $PGSLICE_URL

       psql -c "DROP TABLE activities_retired" $PGSLICE_URL

     (Docs: pg_dump)
  47. Archive to S3

       SELECT aws_commons.create_s3_uri(
         '<bucket>',
         '<sample-filepath>',
         'us-west-2' -- region
       ) AS s3_uri_1 \gset

     (Sources: AWS Exports)
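     The export step that consumes that URI, following the AWS aws_s3 extension docs pattern; the query and table name are illustrative:

       SELECT * FROM aws_s3.query_export_to_s3(
         'SELECT * FROM activities_retired',
         :'s3_uri_1'
       );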
  48. Summary
     - Partitioned a 2-billion-row table using pgslice, without planned downtime
     - With tables up to 80% smaller, modifications, queries, and maintenance may be faster
     - Regularly exporting unneeded data is now possible
     - Helped improve our data lifecycle management practices
  49. SQL Goodie Bag To Go
     - Find the tables: 10 largest tables
     - Find append-onlys: number of DMLs (append only)
     - Index restore assistance: list CREATE INDEX statements
     - Partitions row counts: list partition row counts
     - Partition size estimates: estimated sizes for partitions following a naming pattern
  50. Special Recognition!
     - PG Day Chicago organizers, for accepting this talk!
     - Team members who supported this project
     - Andrew, the creator of pgslice, who also reviewed!
     - Reviewers: Haki, Ryan, Elizabeth
     bit.ly/PartPG2023