
Partitioning Billions of Rows Without Downtime

- Promote PostgreSQL Declarative Table Partitioning
- Grow knowledge and usage, especially for app developers
- Discuss challenges, mistakes, and share open source code samples from a recent conversion

Andrew Atkinson

April 20, 2023


Transcript

  1. Partitioning Billions of Rows Without Downtime
     Andrew Atkinson
     #PartPG2023
     bit.ly/PartPG2023

  2. (image slide)

  3. Audience Polling
     #1: Who has used table partitioning in production in any capacity?
     #2: Who has converted a table from unpartitioned to partitioned?
     #3: Who has done an online conversion with pgslice or similar?

  4. Assumptions
     Audience has some familiarity with what PostgreSQL Table Partitioning is.
     Background reading:
     Crunchy Playground Partitioning
     Large Table Partitioning with PostgreSQL and Django
     How to use table partitioning to scale PostgreSQL
     A slice of life: table partitioning in PostgreSQL databases

  5. Learner Outcomes
     PostgreSQL table partitioning: types, history, and a case study review
     Online table conversion and swap, with practical details
     Lessons from our challenges and mistakes
     How partitioned tables help with cost management and data lifecycle management

  6. App Background and Context
     HR Tech app: SaaS, B2B, monolithic (~100K LOC, >90% test coverage)
     Running Ruby on Rails 6.1
     Using the public_activity Ruby gem (~3K GitHub stars)
     High-growth activities table (not partitioned)

  7. Event log for applicants in their hiring process
     Events are visible to hiring managers and recruiters
     A lot of data: denormalized YAML
     There can be ~40 activities per applicant, and there are millions of applicants
     Errors in this area of the app are highly visible to users

  8. (image slide)

  9. Source: public_activity

  10. PostgreSQL 13.x
      ~10 production databases, each with an activities table (sizes from 50 GB to 1.5 TB)
      Mix of regular RDS PostgreSQL and Aurora
      Sum of rows across the ~10 DBs is ~3 billion rows and ~2 TB of data
      PK column is id, a bigint backed by a sequence

  11. Problems
      The activities table was very large; modifications like adding indexes were painful
      Range queries could time out
      Despite having 8 years of data going back to 2015, users interact mostly with
      recent data, so most data is never accessed

  12. We'd like to stop paying ever-increasing costs to store data that's not accessed
      We'd like to reduce overall storage costs, including consumption, backups, and transfers
      We'd like to modify the table easily

  13. Considering Table Partitioning
      Minimum table size to benefit from partitioning?
      Postgres.fm recommends >= 100 GB
      PG docs guideline: "table size exceeds server memory size" (1.25 TB > ~384 GB)

  14. Isn't Partitioning Too Much Work?
      The old method is a lot of work: inheritance-based, a.k.a. "trigger-based" partitioning
      Requires functions, triggers, and CHECK constraints to INSERT data into the correct spot
      Uses table inheritance, via the INHERITS keyword
      Considered a "legacy method" (see the sketch below)
      Sources: #1
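      For contrast, a minimal sketch of the legacy approach (hypothetical names,
      simplified to a single created_at column): each child table carries its own
      CHECK constraint, and a trigger function routes every INSERT.

        -- one child table per month, constrained by CHECK
        CREATE TABLE activities_y2023m01 (
          CHECK (created_at >= '2023-01-01' AND created_at < '2023-02-01')
        ) INHERITS (activities);

        -- trigger function routing INSERTs to the right child
        CREATE OR REPLACE FUNCTION activities_insert_trigger()
        RETURNS trigger AS $$
        BEGIN
          IF NEW.created_at >= '2023-01-01' AND NEW.created_at < '2023-02-01' THEN
            INSERT INTO activities_y2023m01 VALUES (NEW.*);
          ELSE
            RAISE EXCEPTION 'No partition for created_at: %', NEW.created_at;
          END IF;
          RETURN NULL;
        END;
        $$ LANGUAGE plpgsql;

        CREATE TRIGGER activities_insert
          BEFORE INSERT ON activities
          FOR EACH ROW EXECUTE PROCEDURE activities_insert_trigger();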

  15. How to get started?
      Engineering's job is to build a reliable and scalable SaaS platform.
      1. Form hypotheses about acceptable changes for customers
      2. Test and validate the hypotheses

  16. Hypothesis: 2 years of data is more than enough
      Added a feature flag to LIMIT queries, to test the hypothesis
      Ran that as a scream test for a few weeks
      After the weeks passed without concerns being raised, we moved forward with partitioning

  17. Partition Strategy
      Use the RANGE type
      Partition on the created_at timestamp column
      Monthly partitions

  18. Declarative Partitioning
      Launched in version 10 (2017)
      LIST and HASH (11) types also exist; we used RANGE
      Slices data from one big table into parent and child tables, with less to do
      No triggers, functions, or CHECK constraints needed (see the sketch below)
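      The same monthly scheme, declaratively; a minimal sketch with simplified
      columns (note the partition key has to be part of the primary key):

        CREATE TABLE activities (
          id BIGINT NOT NULL,  -- sequence-backed in the real app
          created_at TIMESTAMP NOT NULL,
          PRIMARY KEY (id, created_at)
        ) PARTITION BY RANGE (created_at);

        CREATE TABLE activities_202301 PARTITION OF activities
          FOR VALUES FROM ('2023-01-01') TO ('2023-02-01');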

  19. Declarative Partitioning Improvements
      10: Introduced
      11: Added the DEFAULT partition and the HASH type
      11: Support for PK and FK constraints, indexes, and triggers
      11: INSERT routes to the correct partition; UPDATE can move rows between partitions
      12: Foreign keys can reference partitioned tables
      Sources: #1, #2

  20. Declarative Partitioning Improvements
      12: ATTACH PARTITION without blocking queries (reduced locking)
      13: Row-level BEFORE triggers on partitioned tables
      13: Logical replication of partitioned tables
      14: DETACH PARTITION supports CONCURRENTLY
      14: More operations use partition pruning
      15: Vacuum partitions in parallel
      Source: Percona

  21. Partitioning Options
      pg_partman (~1.5k stars), with pg_cron (~1.5k stars)
      pg_party (~400 stars)
      No libraries or extensions, roll your own: John Nunemaker blog, Brandur blog
      pgslice (~405 stars)

  22. Selected pgslice
      CLI program, written in Ruby (the language used by the team), open source
      Small and minimal library
      Non-destructive, row-copying approach
      Iterative rollout, with the ability to step forward and backward
      Leaves the unmodified original table behind as "retired"

  23. pgslice Quick Tour
      prep: creates the _intermediate table
      add_partitions: adds past and future partitions
      fill: batched row copies (we used the --where option to limit to 2 years)
      analyze: performs SQL ANALYZE on all partitions
      swap: swaps the table names
      (A sketch of the generated SQL follows.)
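      Roughly the SQL the first steps drive (a sketch; the exact statements come
      from pgslice itself, based on the table, column, and period arguments):

        -- pgslice prep activities created_at month
        CREATE TABLE activities_intermediate (LIKE activities)
          PARTITION BY RANGE (created_at);

        -- pgslice add_partitions activities --intermediate --past 24 --future 3
        CREATE TABLE activities_202304 PARTITION OF activities_intermediate
          FOR VALUES FROM ('2023-04-01') TO ('2023-05-01');

        -- pgslice fill activities (batched; see the fill slide later)
        INSERT INTO activities_intermediate
          SELECT * FROM activities WHERE id > 0 AND id <= 10000;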

  24. pgslice Implementation
      Added the gem to the Gemfile
      Created a binstub, bundle binstubs pgslice, i.e. bin/pgslice
      Added a helper Ruby class (PgsliceHelper) to help with app integration
      Invoked the helper with rails runner

  25. Downsides of pgslice and Row Copying
      Intended for append-only tables, i.e. no UPDATE or DELETE statements during the copy
      Copying rows requires extra storage space
      The online migration adds load to the server; you need spare capacity
      Requires a lot of development and testing in advance

  26. Managing Partitions in Prod and Dev
      We wanted developers to use a partitioned table to discover bugs or performance
      problems, but not have to manage partitions
      In Prod, we wanted 2 years of partitions
      Dev doesn't need partitions at all

  27. Selected the DEFAULT Partition (11+)
      Keeps the table partitioned for Dev
      Avoids partition maintenance in Dev
      A nice middle ground (see below)
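      A DEFAULT partition catches any row that matches no other partition, so a
      Dev database only needs one extra statement (a sketch, hypothetical name):

        CREATE TABLE activities_default PARTITION OF activities DEFAULT;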

  28. Partition Management
      In Prod, a cron job creates future partitions (monitor this!)
      In Dev, the DEFAULT partition's definition is dumped to db/structure.sql
      A cron job could also DETACH partitions to cycle out old ones (sketch below)
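      What those cron jobs effectively run (a sketch, hypothetical partition names):

        -- create next month's partition ahead of time
        CREATE TABLE activities_202306 PARTITION OF activities
          FOR VALUES FROM ('2023-06-01') TO ('2023-07-01');

        -- cycle out an old partition
        ALTER TABLE activities DETACH PARTITION activities_202105;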

  29. Source: GitHub Example

  30. Source: GitHub Example

  31. Implementation Notes
      Added retire/unretire of the DEFAULT partition
      Added customization of the schema
      Configured the where clause so fill copies only 2 years
      Added dump (pg_dump) and DROP support for future partition management
      Source: andyatkinson/rideshare/pgslice

  32. Rails and CI Integration
      No Rails integration
      No migrations for the _intermediate table or DEFAULT partition
      No automatic partition creation or cron guidance (we used whenever)
      No test or CI guidance

  33. test:
        ...
        - image: cimg/ruby:3.2.0
        - image: cimg/postgres:13.6
          environment:
            PGSLICE_URL: "postgres://root@localhost:5432/rideshare_test"
      steps:
        ...
        - run:
            name: Database setup
            command: bundle exec rails db:schema:load --trace
        - run:
            name: Partition conversion
            command: sh bin/partition_conversion.sh
        - run:
            name: run tests
            command: bin/rails test

  34. Find Bugs Using the Test Suite
      We should have performed the partition conversion in the CI database from the beginning
      Code compatibility was not as "seamless" as expected with the partitioned table
      Surprises: UPDATEs of records, and YAML-serialized data in a column

  35. (image slide)

  36. Code Compatibility Problems
      PRIMARY KEY changes
      activerecord-import (~405 stars) did not work
      Rails needed to know how to find the id
      Serialized content in a text column required careful escaping

  37. Code Compatibility Fixes
      Explicit primary key configuration: self.primary_key = :id (dual support)
      Replaced activerecord-import with .insert_all() for multi-row inserts
      Set the unique_by: [] option for .insert_all()'s ON CONFLICT clause, adding a
      UNIQUE constraint/index
      Added created_at to INSERT statements

  38. Fixing Test Code
      Tests write activities records as the base data
      Unfortunately, they hard-coded historical dates, so INSERTs failed due to missing partitions
      Updated dozens of tests to stop using hard-coded dates, e.g. used Time.current

  39. Ready To Go
      General steps:
      1. The intermediate table is ready to fill
      2. Add env vars like PGSLICE_URL
      3. Run the steps one by one at an off-peak time, coordinating with the team

  40. (image slide)

  41. Run pgslice fill in a tmux session, detach, and monitor
      Each batch runs: INSERT INTO ... SELECT * FROM ... WHERE id > 0 AND id < 10000
      Options: --batch-size, --where, --sleep
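      A sketch of how the options compose (hypothetical values; the --where
      condition is ANDed into each batch's copy query):

        -- bin/pgslice fill activities --batch-size 10000 --sleep 0.1
        --   --where "created_at > now() - interval '2 years'"
        INSERT INTO activities_intermediate
          SELECT * FROM activities
          WHERE id > 0 AND id <= 10000
            AND created_at > now() - interval '2 years';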

  42. pgslice analyze
      pgslice swap
      The swap renames both tables inside one transaction:
        BEGIN;
        -- activities              -> activities_retired
        -- activities_intermediate -> activities
        COMMIT;

  43. Customer Accommodation
      Out of hundreds of customers, one wanted the old data
      The solution: add more partitions and fill from further back for that customer
      Added a customer identifier column to use with pgslice and
      --where "customer_id = 123"
      Added a partial index and populated the column

  44. ALTER TABLE activities ADD COLUMN account_id INTEGER;
      ALTER TABLE activities_intermediate ADD COLUMN account_id INTEGER;

      CREATE SCHEMA temp;
      CREATE TABLE temp.customer (
        id SERIAL PRIMARY KEY,
        applicant_id BIGINT UNIQUE, -- JOIN on activities.owner_id
        account_id INTEGER);

      -- fill temp.customer
      INSERT INTO temp.customer (applicant_id, account_id)
      SELECT id AS applicant_id, account_id
      FROM ... WHERE account_id = ...
        AND id > 0 AND id < 10000;

  45. pgslice: multiple fill passes
      pgslice only issues INSERT statements
      It stops when it finds duplicates, so it can't make multiple passes
      Our solution: TRUNCATE some recent partitions, then fill again from there
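      For example (a sketch with hypothetical partition names; truncating a
      partition clears only that slice of the data):

        TRUNCATE TABLE activities_202303, activities_202304;
        -- then re-run: bin/pgslice fill activities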

  46. Beware of the Schema Cache
      Following pgslice swap, the running app had a stale Active Record schema cache
      Solution: restart the app pods, but also catch the exception and force a schema
      cache reset while the rolling restart completed

  47. Speeding Up Row Copies
      DROP all or most indexes from the intermediate table
      Set the table as unlogged: ALTER TABLE activities_intermediate SET UNLOGGED;
      When copying is done, add the indexes back manually (CREATE INDEX, matching the
      source table's columns)
      Set the table back to LOGGED

  48. Analyzing Results
      With only 2 years of data, the replacement tables were much smaller:
         215 GB  ->   39 GB  (82% decrease)
         163 GB  ->   45 GB  (72% decrease)
        1500 GB  ->  500 GB  (67% decrease)
      relation_size_extended.sql
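      A quick way to check sizes like these with standard catalog functions
      (a sketch; assumes the partition names share the activities prefix):

        SELECT relname,
               pg_size_pretty(pg_total_relation_size(oid)) AS total_size
        FROM pg_class
        WHERE relname LIKE 'activities%'
        ORDER BY pg_total_relation_size(oid) DESC;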

  49. AWS PostgreSQL Costs and Savings
      RDS PostgreSQL and Aurora costs differ
      Storage and IOPS are combined for RDS
      General Purpose SSD: $0.115 per GB-month; backups: $0.095 per GB-month;
      also snapshot exports and transfers
      S3: $0.025 per GB-month
      1 TB comparison: RDS $115 vs. S3 $25, about 78% less

  50. AWS RDS Cost Reduction
      You pay for what's provisioned, not for the I/O used
      Storage auto-scaling (2019) grows storage automatically, and you keep paying
      Even after reducing space, you still pay for what's provisioned
      Not "easy" to pay less: you need a new instance, and less space comes with fewer IOPS
      Source: CloudZero

  51. Aurora Cost Reduction
      "You can reduce storage charges by dropping tables, indexes, databases, and
      so on that you no longer need."
      Billed in 10 GB increments
      Set up partitioning, plus regularly archive old partitions
      $0.10 per GB-month; 1 TB for 1 month = $100
      Perhaps most useful as a "cost control mechanism"
      Source: #1

  52. (image slide)

  53. Cleaning Up
      Remove transitional code
      Remove feature flagging
      Document the steps for team members
      Add monitoring for partition management
      What about the _retired table?

  54. Unlocked Capabilities
      The retired table can now be archived to low-cost storage like S3
      Old partitions can be automatically detached and archived
      By regularly archiving old partitions, overall growth slows and may flatten
      Steps like DETACH PARTITION ... CONCURRENTLY, pg_dump, and DROP TABLE can all
      be automated (see below)
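      The detach step, for example (a sketch; hypothetical partition name, and
      CONCURRENTLY requires PostgreSQL 14+):

        ALTER TABLE activities DETACH PARTITION activities_202101 CONCURRENTLY;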

  55. # --format=c is the "custom" format, for easy pg_restore
      pg_dump \
        --format=c \
        --table activities_retired \
        --file activities_retired.dump \
        $PGSLICE_URL

      psql -c "DROP TABLE activities_retired" $PGSLICE_URL

      Docs: pg_dump

  56. Archive to S3
      SELECT aws_commons.create_s3_uri(
        '',          -- bucket name
        '',          -- file path within the bucket
        'us-west-2'  -- region
      ) AS s3_uri_1 \gset

      Sources: AWS Exports
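      The URI then feeds the export function from the aws_s3 extension (a sketch;
      assumes the extension is installed and the instance's IAM role can write to
      the bucket):

        SELECT * FROM aws_s3.query_export_to_s3(
          'SELECT * FROM activities_retired',
          :'s3_uri_1'
        );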

  57. Summary
      Partitioned a 2-billion-row table using pgslice, without planned downtime
      With tables smaller by up to 80%, modifications, queries, and maintenance may be faster
      Regularly exporting unneeded data is now possible
      Helped improve our data lifecycle management practices

  58. SQL Goodie Bag To Go
      Find the 10 largest tables
      Find append-only tables: number of DML operations (append only)
      Index restore assistance: list CREATE INDEX statements
      Partition row counts: list row counts per partition
      Partition size estimates: estimated sizes for partitions following a pattern
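      As one example, approximate per-partition row counts can come from planner
      statistics (a sketch; assumes monthly partitions share a name prefix):

        SELECT relname, reltuples::bigint AS estimated_rows
        FROM pg_class
        WHERE relname LIKE 'activities_2%'
        ORDER BY relname;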

  59. Special Recognition!
      PG Day Chicago organizers for accepting this talk!
      Team members who supported this project
      Andrew, the creator of pgslice, and a reviewer!
      Reviewers: Haki, Ryan, Elizabeth

  60. Thanks!
      andyatkinson.com/contact
      #PartPG2023
      bit.ly/PartPG2023