$30 off During Our Annual Pro Sale. View Details »

How We Made PostgreSQL Fitter, Happier, More Productive

How We Made PostgreSQL Fitter, Happier, More Productive

This talk will walk through a series of improvements made to our PostgreSQL 10 databases to improve performance, increase operational efficiency, and tune parameters. Some of the techniques were to add monitoring, remove and replace indexes, remove bloat, and split tables into a new database without application downtime.

Andrew Atkinson
PRO

November 24, 2021
Tweet

More Decks by Andrew Atkinson

Other Decks in Technology

Transcript

  1. How We Made PG Fitter,
    Happier, More Productive
    Andrew Atkinson
    bit.ly/pg-fitter-
    happier
    PostgreSQL Fitter, Happier 2021 Slide 1 of 72

    View Slide

  2. Thanks Radiohead
    Fitter, Happier - OK Computer (1997)
    PostgreSQL Fitter, Happier 2021 Slide 2 of 72

    View Slide

  3. About Me
    Staff Software Engineer at Fountain
    Contact
    andyatkinson.com
    @andatki
    PostgreSQL Fitter, Happier 2021 Slide 3 of 72

    View Slide

  4. Fountain
    Fountain is a High Volume Hiring
    Platform
    NOTE
    This talk is not about
    Fountain
    Hiring DBAs, Engineers right now!
    http://get.fountain.com/careers
    PostgreSQL Fitter, Happier 2021 Slide 4 of 72

    View Slide

  5. Context and Inspiration
    The platform experienced a 10x usage increase during the Covid-19 pandemic
    We had no dedicated DBA
    There was an opportunity to rapidly learn to better utilize PG
    PostgreSQL Fitter, Happier 2021 Slide 5 of 72

    View Slide

  6. Case Study: Scaling and Optimizing PostgreSQL
    Share our challenges, findings and some successful results
    Connect our challenges with common scaling and optimization issues
    Backend powered by Rails and PostgreSQL 10 on AWS RDS
    Single writer primary with async replication
    PostgreSQL Fitter, Happier 2021 Slide 6 of 72

    View Slide

  7. PostgreSQL Fitter, Happier 2021 Slide 7 of 72

    View Slide

  8. PostgreSQL Fitter, Happier 2021 Slide 8 of 72

    View Slide

  9. Database Connections Problems
    As web app and DB activity load increased, performance degraded
    Web app servers and background process workers couldn't get connections
    ...Is the server running on host example and accepting TCP/IP
    connections?...
    Could not add more servers to handle load due to lack of database connections
    PostgreSQL Fitter, Happier 2021 Slide 9 of 72

    View Slide

  10. Connections Learnings
    Connections are costly!
    PostgreSQL forks a new process for each connection
    Connections are a limited resource!
    Connections may be active
    , idle
    , idle in transaction
    .
    Too many idle connections can be a problem
    Connections in transaction but idle that are old may be a problem
    PostgreSQL Fitter, Happier 2021 Slide 10 of 72

    View Slide

  11. PostgreSQL Fitter, Happier 2021 Slide 11 of 72

    View Slide

  12. Connection Poolers
    pgbouncer
    Can be monitored by pgdash
    Pgpool-II
    RDS Proxy
    PostgreSQL Fitter, Happier 2021 Slide 12 of 72

    View Slide

  13. PostgreSQL Fitter, Happier 2021 Slide 13 of 72

    View Slide

  14. Connection Pooler Benefits
    Connection pooling enables many more client transactions with a fixed number of
    server connections
    Safely add more app servers and background workers that require connections
    Improve performance
    More efficiently utilize connections with transactions
    Eliminate overhead of connection establishment
    PostgreSQL Fitter, Happier 2021 Slide 14 of 72

    View Slide

  15. Added RDS Proxy
    RDS Proxy meant our small team didn't have to manage the infrastructure
    We no longer ran out of connections or saw degraded performance
    RDS Proxy was a big win for system stability!
    PostgreSQL Fitter, Happier 2021 Slide 15 of 72

    View Slide

  16. PostgreSQL Fitter, Happier 2021 Slide 16 of 72

    View Slide

  17. Excessive Disk Space Observations
    Tables and indexes had a very high bloat percentage
    No table or index bloat reductions had been performed
    Non-optimized indexes were taking up more space than necessary
    PostgreSQL Fitter, Happier 2021 Slide 17 of 72

    View Slide

  18. PostgreSQL Fitter, Happier 2021 Slide 18 of 72

    View Slide

  19. Space Reduction for Indexes
    Bloated indexes can be rebuilt without bloat
    Indexes can be reduced in size by limiting the fields and rows
    Unnecessary indexes can be removed entirely
    PostgreSQL Fitter, Happier 2021 Slide 19 of 72

    View Slide

  20. PostgreSQL Fitter, Happier 2021 Slide 20 of 72

    View Slide

  21. Non-Optimized Indexes
    We had over 100 unused indexes among 75 tables
    Some bloated indexes were over 100 GB in size
    Estimated bloat percentage >= 80%
    PostgreSQL Fitter, Happier 2021 Slide 21 of 72

    View Slide

  22. Rebuild Indexes
    Used pg_repack with PG 10 to rebuild indexes one by one
    On PG >= 12 REINDEX CONCURRENTLY
    is available
    Rebuilt over 30 indexes reclaiming 230 GB of space
    PostgreSQL Fitter, Happier 2021 Slide 22 of 72

    View Slide

  23. PostgreSQL Fitter, Happier 2021 Slide 23 of 72

    View Slide

  24. Removed Unused Indexes
    Find indexes with zero scans
    Confirm no usages across the app, cron jobs, background jobs etc.
    In Rails create migration to drop the index ( DROP INDEX index_name
    )
    Removed over 100 indexes and 300 GB of space!
    PostgreSQL Fitter, Happier 2021 Slide 24 of 72

    View Slide

  25. Relocated Database Tables
    We had 6 tables of metrics style data
    This data was typically not presented in the app
    High writes, insert only
    Around 700GB of data
    PostgreSQL Fitter, Happier 2021 Slide 25 of 72

    View Slide

  26. No Downtime Relocation Process
    1. Change queries that use fields from tables being moved
    Typically involved splitting into more queries
    2. Create new replica
    3. Test and release queries against replica
    4. Send writes to old and new. Writes will fail on read until promoted
    5. Promote replica
    6. Stop writing to old location and remove redundant obsolete tables.
    PostgreSQL Fitter, Happier 2021 Slide 26 of 72

    View Slide

  27. Space Reduction
    Technique Size
    Initial DB Size 2.5 TB
    Relocated tables 700 GB
    Removed unused
    indexes 300 GB
    Removed index bloat 200 GB
    New DB Size 1.2 TB
    PostgreSQL Fitter, Happier 2021 Slide 27 of 72

    View Slide

  28. PostgreSQL Fitter, Happier 2021 Slide 28 of 72

    View Slide

  29. Performance Challenges
    10x increase in RPM
    Autovacuum not running at all
    Autovacuum running for a very long time
    Slow queries
    Non-optimized indexes
    PostgreSQL Fitter, Happier 2021 Slide 29 of 72

    View Slide

  30. Scaled Up Database Server
    (Vertically) Scaled Up Database Server
    96 vCPUs
    768 GB memory
    Provisioned IOPS
    PostgreSQL Fitter, Happier 2021 Slide 30 of 72

    View Slide

  31. Why Tune Autovacuum?
    Remove dead rows for future inserts and updates quicker
    Reduce excessive disk space growth by avoiding inefficient disk use
    Ensure data statistics are refreshed as quickly as possible
    Reduce autovacuum run times
    PostgreSQL Fitter, Happier 2021 Slide 31 of 72

    View Slide

  32. PostgreSQL Fitter, Happier 2021 Slide 32 of 72

    View Slide

  33. Autovacuum Parameters
    Run vacuum in proportion to UPDATE and DELETE workload
    Param (PG 10) Original Tuned
    autovacuum_vacuum_scale_factor 20% 1%
    (tuned for big tables)
    autovacuum_vacuum_cost_limit 200 2000
    autovacuum_vacuum_cost_delay 20ms 2ms
    maintenance_work_mem
    Sized by RDS
    autovacuum_max_workers
    PostgreSQL Fitter, Happier 2021 Slide 33 of 72

    View Slide

  34. Query Performance
    Query load can be categorized as write or read
    Slow queries contribute to system load
    Super slow reads on replica can affect primary
    Bloat can affect query performance
    PostgreSQL Fitter, Happier 2021 Slide 34 of 72

    View Slide

  35. PostgreSQL Fitter, Happier 2021 Slide 35 of 72

    View Slide

  36. Relocated Read Queries to Replica
    High proportion of reads (SELECT) to writes (INSERT, UPDATE, DELETE)
    Relocated queries to replica to reduce load on primary
    Additional replicas can be added to distribute read load
    PostgreSQL Fitter, Happier 2021 Slide 36 of 72

    View Slide

  37. Query Relocation Process
    Identify slow queries in APM or PgHero
    Determine if they are read only
    Wrap query in a code block that connects to the replica
    Fall back to the primary if not found
    Fall back to the primary if replication lag is too high
    PostgreSQL Fitter, Happier 2021 Slide 37 of 72

    View Slide

  38. Bloat and Query Performance
    Very high bloat can contribute to poor or inaccurate query plans
    Bloat causes unnecessary scans through pages
    Bloat can slow down index only scans
    PostgreSQL Fitter, Happier 2021 Slide 38 of 72

    View Slide

  39. Fast Database Migrations
    Use strong_migrations and code review to detect slow migrations
    Avoid common problems with large tables, like adding a column with a default
    PostgreSQL Fitter, Happier 2021 Slide 39 of 72

    View Slide

  40. PostgreSQL Fitter, Happier 2021 Slide 40 of 72

    View Slide

  41. Write Rate Performance and Indexes
    Indexes add some maintenance overhead for writes
    Remove unused, duplicate or invalid indexes
    Use PgHero or this SQL query to find unused indexes
    Eliminate redundant indexes e.g. querying a
    when indexes exist for a & b
    and a
    H.O.T. updates require non-indexed columns
    PostgreSQL Fitter, Happier 2021 Slide 41 of 72

    View Slide

  42. Query Statistics
    Enabled pg_stat_statements
    to collect query statistics
    Viewed total_time
    and calls
    statistics with PgHero
    Slow queries > 20ms flagged
    https://wiki.postgresql.org/wiki/Monitoring
    PostgreSQL Fitter, Happier 2021 Slide 42 of 72

    View Slide

  43. Query Optimization Process Overview
    Use EXPLAIN
    to look at query execution plan estimates
    Use EXPLAIN ANALYZE
    and look at where significant time is spent
    Look for techniques like adding an index to improve query performance
    Focus on highest impact queries
    Test with similar hardware and with similar data (very important)
    PostgreSQL Fitter, Happier 2021 Slide 43 of 72

    View Slide

  44. PostgreSQL Fitter, Happier 2021 Slide 44 of 72

    View Slide

  45. --

    -- Query: Drivers sorted by average rating and trip count

    --

    rideshare_development> explain SELECT

    CONCAT(d.first_name, ' ', d.last_name) AS driver_name,

    AVG(t.rating) AS avg_rating,

    COUNT(t.rating) AS trip_count

    FROM trips t
    JOIN users d ON t.driver_id = d.id

    GROUP BY t.driver_id, d.first_name, d.last_name

    ORDER BY COUNT(t.rating) DESC;

    PostgreSQL Fitter, Happier 2021 Slide 45 of 72

    View Slide

  46. +---------------------------------------------------------------------------------+

    | QUERY PLAN |

    |---------------------------------------------------------------------------------|

    | Sort (cost=231.61..236.11 rows=1800 width=90) |

    | Sort Key: (count(t.rating)) DESC |

    | -> HashAggregate (cost=107.28..134.28 rows=1800 width=90) |

    | Group Key: t.driver_id, d.first_name, d.last_name |

    | -> Hash Join (cost=3.35..69.78 rows=3000 width=22) |

    | Hash Cond: (t.driver_id = d.id) |

    | -> Seq Scan on trips t (cost=0.00..58.00 rows=3000 width=8) |

    | -> Hash (cost=2.60..2.60 rows=60 width=22) |

    | -> Seq Scan on users d (cost=0.00..2.60 rows=60 width=22) |

    +---------------------------------------------------------------------------------+

    EXPLAIN

    Time: 0.027s

    PostgreSQL Fitter, Happier 2021 Slide 46 of 72

    View Slide

  47. Query Optimization Tips
    Look for Index Only Scan opportunities
    Query fewer fields (avoid select *
    )
    Eliminate Heap Fetch
    by limiting fields
    Consider a Covering index by using INCLUDE (b)
    to include b
    payload column
    (11+)
    Avoid ORDER BY
    and DISTINCT
    if possible, or take advantage of sorted property of
    indexes
    PostgreSQL Fitter, Happier 2021 Slide 47 of 72

    View Slide

  48. Indexes Not Used
    A sequential scan may be more efficient than an index scan
    Query execution planner will look at cardinality (unique values) and estimate row
    counts and selectivity
    In a test database, sequential scans can be disabled (NOT FOR PROD) to confirm
    indexes are used
    rideshare_development> SET enable_seqscan = OFF;

    SET

    PostgreSQL Fitter, Happier 2021 Slide 48 of 72

    View Slide

  49. +---------------------------------------------------------------------------------------------------------------+

    | QUERY PLAN |

    |---------------------------------------------------------------------------------------------------------------|

    | Sort (cost=300.56..305.06 rows=1800 width=90) |

    | Sort Key: (count(t.rating)) DESC |

    | -> HashAggregate (cost=176.23..203.23 rows=1800 width=90) |

    | Group Key: t.driver_id, d.first_name, d.last_name |

    | -> Hash Join (cost=15.07..138.73 rows=3000 width=22) |

    | Hash Cond: (t.driver_id = d.id) |

    | -> Index Scan using index_trips_on_driver_id on trips t (cost=0.28..115.51 rows=3000 width=8) |

    | -> Hash (cost=14.04..14.04 rows=60 width=22) |

    | -> Index Scan using users_pkey on users d (cost=0.14..14.04 rows=60 width=22) |

    +---------------------------------------------------------------------------------------------------------------+

    EXPLAIN

    Time: 0.022s

    PostgreSQL Fitter, Happier 2021 Slide 49 of 72

    View Slide

  50. Partial Indexes
    Partial indexes avoid indexing common values
    Write rate benefit as new rows likely will not match predicate
    PostgreSQL Fitter, Happier 2021 Slide 50 of 72

    View Slide

  51. Partial Index Example 1 of 2
    Query trips with no rating
    3000 trips, 750 without a rating
    -> Bitmap Index Scan on index_trips_on_rating (cost=0.00..13.90 rows=750 width=0)

    Index Cond: (rating IS NULL)

    PostgreSQL Fitter, Happier 2021 Slide 51 of 72

    View Slide

  52. Partial Index Example 2 of 2
    create index index_trips_on_rating_partial ON trips (rating) where rating IS NULL;

    Bitmap Index Scan on index_trips_on_rating_partial (cost=0.00..11.90 rows=750 width=0)

    Index Cond: (rating IS NULL)
    Index size Partial index size
    40 kB 16 kB
    60% decrease
    PostgreSQL Fitter, Happier 2021 Slide 52 of 72

    View Slide

  53. Partial Indexes Results
    Created replacement partial indexes for several existing indexes
    Two examples of biggest size reductions were 82% and 97% smaller
    On large tables this is significant
    PostgreSQL Fitter, Happier 2021 Slide 53 of 72

    View Slide

  54. Reduced Columns in Multi-Column Indexes
    Remove columns that are not needed
    Replace 3 columns with 2, or 2 with 1
    PostgreSQL can combine multiple single column indexes!
    PostgreSQL Fitter, Happier 2021 Slide 54 of 72

    View Slide

  55. Slow Queries Results
    Fixed more than 10 slow queries identified by PgHero
    PostgreSQL Fitter, Happier 2021 Slide 55 of 72

    View Slide

  56. PostgreSQL Fitter, Happier 2021 Slide 56 of 72

    View Slide

  57. PostgreSQL Fitter, Happier 2021 Slide 57 of 72

    View Slide

  58. Performance Analysis Tools
    https://wiki.postgresql.org/wiki/Performance_Analysis_Tools
    Percona pg_stat_monitor
    - PostgreSQL Statistics Collector (extension for 11, 12,
    13)
    pganalyze Index Advisor - Index recommendations!
    PostgreSQL Fitter, Happier 2021 Slide 58 of 72

    View Slide

  59. PostgreSQL Fitter, Happier 2021 Slide 59 of 72

    View Slide

  60. Query Errors
    Queries sometimes never finish
    Checking out a connection from the connection pool may never finish
    Acquiring a lock may never finish
    A transaction may never commit or rollback
    PostgreSQL Fitter, Happier 2021 Slide 60 of 72

    View Slide

  61. PostgreSQL Fitter, Happier 2021 Slide 61 of 72

    View Slide

  62. Statement Timeout
    Sets a maximum time a query can run before being canceled
    Parameter Example value
    statement_timeout
    4-5 seconds
    PostgreSQL Fitter, Happier 2021 Slide 62 of 72

    View Slide

  63. Checkout Timeout
    Sets a maximum time to wait for a connection to be available
    Parameter Example value
    checkout_timeout
    4-5 seconds
    PostgreSQL Fitter, Happier 2021 Slide 63 of 72

    View Slide

  64. Lock Timeout
    Sets a maximum time to wait to acquire a lock
    SHOW lock_timeout
    SET LOCAL lock_timeout = '5s'
    PostgreSQL Fitter, Happier 2021 Slide 64 of 72

    View Slide

  65. Idle Transactions
    Active Record manages its own idle transactions in connection pool
    Sets a maximum time a connection can be idle before being disconnected
    Idle transactions can prevent vacuum from running
    Parameter Default
    idle_timeout
    300s (Rails default)
    idle_in_transaction_session_timeout
    24 hours?
    PostgreSQL Fitter, Happier 2021 Slide 65 of 72

    View Slide

  66. Replication Errors
    Replication lag that gets too high can stop replication entirely
    Rebuilding indexes is IO intensive which will cause a replication lag spike
    Replication can conflict with vacuum on primary
    PostgreSQL Fitter, Happier 2021 Slide 66 of 72

    View Slide

  67. PostgreSQL Fitter, Happier 2021 Slide 67 of 72

    View Slide

  68. Special Recognition!
    Past team for opportunities to learn
    Special thanks to David
    Practice runs: Jamie, current team, Lukas,
    Andrew, and Craig
    PostgreSQL community: Documentation,
    IRC, Slack
    PostgreSQL Fitter, Happier 2021 Slide 68 of 72

    View Slide

  69. Connections and Space Summary
    Added a connection pooler to increase performance and enable additional scale
    Re-created 30 indexes to eliminate bloat
    Removed more than 100 unused indexes
    Distributed write load by relocating metrics tables to their own database
    Tuned Autovacuum to run more frequently and for longer
    PostgreSQL Fitter, Happier 2021 Slide 69 of 72

    View Slide

  70. Performance and Errors Summary
    Added a dashboard to show slow queries and fixed the top 10
    Created awareness of index maintenance cost especially for high write tables
    Moved more than 10 queries to a read replica
    Added timeouts to increase resiliency from errors related to queries, database
    connections, and locks
    PostgreSQL Fitter, Happier 2021 Slide 70 of 72

    View Slide

  71. We made PostgreSQL
    Fitter , Happier , More Productive
    PostgreSQL Fitter, Happier 2021 Slide 71 of 72

    View Slide

  72. Thanks!
    bit.ly/pg-fitter-happier
    PostgreSQL Fitter, Happier 2021 Slide 72 of 72

    View Slide