How We Made PostgreSQL Fitter, Happier, More Productive

How We Made PG Fitter, Happier, More Productive Andrew Atkinson
bit.ly/pg-fitter- happier PostgreSQL Fitter, Happier 2021 Slide 1 of 72

Thanks Radiohead Fitter, Happier - OK Computer (1997) PostgreSQL Fitter,
Happier 2021 Slide 2 of 72

About Me Staff Software Engineer at Fountain Contact andyatkinson.com @andatki
PostgreSQL Fitter, Happier 2021 Slide 3 of 72

Fountain Fountain is a High Volume Hiring Platform NOTE This
talk is not about Fountain Hiring DBAs, Engineers right now! http://get.fountain.com/careers PostgreSQL Fitter, Happier 2021 Slide 4 of 72

Context and Inspiration The platform experienced a 10x usage increase
during the Covid-19 pandemic We had no dedicated DBA There was an opportunity to rapidly learn to better utilize PG PostgreSQL Fitter, Happier 2021 Slide 5 of 72

Case Study: Scaling and Optimizing PostgreSQL Share our challenges, findings
and some successful results Connect our challenges with common scaling and optimization issues Backend powered by Rails and PostgreSQL 10 on AWS RDS Single writer primary with async replication PostgreSQL Fitter, Happier 2021 Slide 6 of 72

Database Connections Problems As web app and DB activity load
increased, performance degraded Web app servers and background process workers couldn't get connections ...Is the server running on host example and accepting TCP/IP connections?... Could not add more servers to handle load due to lack of database connections PostgreSQL Fitter, Happier 2021 Slide 9 of 72

Connections Learnings Connections are costly! PostgreSQL forks a new process
for each connection Connections are a limited resource! Connections may be active , idle , idle in transaction . Too many idle connections can be a problem Connections in transaction but idle that are old may be a problem PostgreSQL Fitter, Happier 2021 Slide 10 of 72

Connection Poolers pgbouncer Can be monitored by pgdash Pgpool-II RDS
Proxy PostgreSQL Fitter, Happier 2021 Slide 12 of 72

Connection Pooler Benefits Connection pooling enables many more client transactions
with a fixed number of server connections Safely add more app servers and background workers that require connections Improve performance More efficiently utilize connections with transactions Eliminate overhead of connection establishment PostgreSQL Fitter, Happier 2021 Slide 14 of 72

Added RDS Proxy RDS Proxy meant our small team didn't
have to manage the infrastructure We no longer ran out of connections or saw degraded performance RDS Proxy was a big win for system stability! PostgreSQL Fitter, Happier 2021 Slide 15 of 72

Excessive Disk Space Observations Tables and indexes had a very
high bloat percentage No table or index bloat reductions had been performed Non-optimized indexes were taking up more space than necessary PostgreSQL Fitter, Happier 2021 Slide 17 of 72

Space Reduction for Indexes Bloated indexes can be rebuilt without
bloat Indexes can be reduced in size by limiting the fields and rows Unnecessary indexes can be removed entirely PostgreSQL Fitter, Happier 2021 Slide 19 of 72

Non-Optimized Indexes We had over 100 unused indexes among 75
tables Some bloated indexes were over 100 GB in size Estimated bloat percentage >= 80% PostgreSQL Fitter, Happier 2021 Slide 21 of 72

Rebuild Indexes Used pg_repack with PG 10 to rebuild indexes
one by one On PG >= 12 REINDEX CONCURRENTLY is available Rebuilt over 30 indexes reclaiming 230 GB of space PostgreSQL Fitter, Happier 2021 Slide 22 of 72

Removed Unused Indexes Find indexes with zero scans Confirm no
usages across the app, cron jobs, background jobs etc. In Rails create migration to drop the index ( DROP INDEX index_name ) Removed over 100 indexes and 300 GB of space! PostgreSQL Fitter, Happier 2021 Slide 24 of 72

Relocated Database Tables We had 6 tables of metrics style
data This data was typically not presented in the app High writes, insert only Around 700GB of data PostgreSQL Fitter, Happier 2021 Slide 25 of 72

No Downtime Relocation Process 1. Change queries that use fields
from tables being moved Typically involved splitting into more queries 2. Create new replica 3. Test and release queries against replica 4. Send writes to old and new. Writes will fail on read until promoted 5. Promote replica 6. Stop writing to old location and remove redundant obsolete tables. PostgreSQL Fitter, Happier 2021 Slide 26 of 72

Space Reduction Technique Size Initial DB Size 2.5 TB Relocated
tables 700 GB Removed unused indexes 300 GB Removed index bloat 200 GB New DB Size 1.2 TB PostgreSQL Fitter, Happier 2021 Slide 27 of 72

Performance Challenges 10x increase in RPM Autovacuum not running at
all Autovacuum running for a very long time Slow queries Non-optimized indexes PostgreSQL Fitter, Happier 2021 Slide 29 of 72

Scaled Up Database Server (Vertically) Scaled Up Database Server 96
vCPUs 768 GB memory Provisioned IOPS PostgreSQL Fitter, Happier 2021 Slide 30 of 72

Why Tune Autovacuum? Remove dead rows for future inserts and
updates quicker Reduce excessive disk space growth by avoiding inefficient disk use Ensure data statistics are refreshed as quickly as possible Reduce autovacuum run times PostgreSQL Fitter, Happier 2021 Slide 31 of 72

Autovacuum Parameters Run vacuum in proportion to UPDATE and DELETE
workload Param (PG 10) Original Tuned autovacuum_vacuum_scale_factor 20% 1% (tuned for big tables) autovacuum_vacuum_cost_limit 200 2000 autovacuum_vacuum_cost_delay 20ms 2ms maintenance_work_mem Sized by RDS autovacuum_max_workers PostgreSQL Fitter, Happier 2021 Slide 33 of 72

Query Performance Query load can be categorized as write or
read Slow queries contribute to system load Super slow reads on replica can affect primary Bloat can affect query performance PostgreSQL Fitter, Happier 2021 Slide 34 of 72

Relocated Read Queries to Replica High proportion of reads (SELECT)
to writes (INSERT, UPDATE, DELETE) Relocated queries to replica to reduce load on primary Additional replicas can be added to distribute read load PostgreSQL Fitter, Happier 2021 Slide 36 of 72

Query Relocation Process Identify slow queries in APM or PgHero
Determine if they are read only Wrap query in a code block that connects to the replica Fall back to the primary if not found Fall back to the primary if replication lag is too high PostgreSQL Fitter, Happier 2021 Slide 37 of 72

Bloat and Query Performance Very high bloat can contribute to
poor or inaccurate query plans Bloat causes unnecessary scans through pages Bloat can slow down index only scans PostgreSQL Fitter, Happier 2021 Slide 38 of 72

Fast Database Migrations Use strong_migrations and code review to detect
slow migrations Avoid common problems with large tables, like adding a column with a default PostgreSQL Fitter, Happier 2021 Slide 39 of 72

Write Rate Performance and Indexes Indexes add some maintenance overhead
for writes Remove unused, duplicate or invalid indexes Use PgHero or this SQL query to find unused indexes Eliminate redundant indexes e.g. querying a when indexes exist for a & b and a H.O.T. updates require non-indexed columns PostgreSQL Fitter, Happier 2021 Slide 41 of 72

Query Statistics Enabled pg_stat_statements to collect query statistics Viewed total_time
and calls statistics with PgHero Slow queries > 20ms flagged https://wiki.postgresql.org/wiki/Monitoring PostgreSQL Fitter, Happier 2021 Slide 42 of 72

Query Optimization Process Overview Use EXPLAIN to look at query
execution plan estimates Use EXPLAIN ANALYZE and look at where significant time is spent Look for techniques like adding an index to improve query performance Focus on highest impact queries Test with similar hardware and with similar data (very important) PostgreSQL Fitter, Happier 2021 Slide 43 of 72

-- -- Query: Drivers sorted by average rating and trip
count -- rideshare_development> explain SELECT CONCAT(d.first_name, ' ', d.last_name) AS driver_name, AVG(t.rating) AS avg_rating, COUNT(t.rating) AS trip_count FROM trips t JOIN users d ON t.driver_id = d.id GROUP BY t.driver_id, d.first_name, d.last_name ORDER BY COUNT(t.rating) DESC; PostgreSQL Fitter, Happier 2021 Slide 45 of 72

+---------------------------------------------------------------------------------+ | QUERY PLAN | |---------------------------------------------------------------------------------| | Sort (cost=231.61..236.11 rows=1800
width=90) | | Sort Key: (count(t.rating)) DESC | | -> HashAggregate (cost=107.28..134.28 rows=1800 width=90) | | Group Key: t.driver_id, d.first_name, d.last_name | | -> Hash Join (cost=3.35..69.78 rows=3000 width=22) | | Hash Cond: (t.driver_id = d.id) | | -> Seq Scan on trips t (cost=0.00..58.00 rows=3000 width=8) | | -> Hash (cost=2.60..2.60 rows=60 width=22) | | -> Seq Scan on users d (cost=0.00..2.60 rows=60 width=22) | +---------------------------------------------------------------------------------+ EXPLAIN Time: 0.027s PostgreSQL Fitter, Happier 2021 Slide 46 of 72

Query Optimization Tips Look for Index Only Scan opportunities Query
fewer fields (avoid select * ) Eliminate Heap Fetch by limiting fields Consider a Covering index by using INCLUDE (b) to include b payload column (11+) Avoid ORDER BY and DISTINCT if possible, or take advantage of sorted property of indexes PostgreSQL Fitter, Happier 2021 Slide 47 of 72

Indexes Not Used A sequential scan may be more efficient
than an index scan Query execution planner will look at cardinality (unique values) and estimate row counts and selectivity In a test database, sequential scans can be disabled (NOT FOR PROD) to confirm indexes are used rideshare_development> SET enable_seqscan = OFF; SET PostgreSQL Fitter, Happier 2021 Slide 48 of 72

+---------------------------------------------------------------------------------------------------------------+ | QUERY PLAN | |---------------------------------------------------------------------------------------------------------------| | Sort (cost=300.56..305.06 rows=1800
width=90) | | Sort Key: (count(t.rating)) DESC | | -> HashAggregate (cost=176.23..203.23 rows=1800 width=90) | | Group Key: t.driver_id, d.first_name, d.last_name | | -> Hash Join (cost=15.07..138.73 rows=3000 width=22) | | Hash Cond: (t.driver_id = d.id) | | -> Index Scan using index_trips_on_driver_id on trips t (cost=0.28..115.51 rows=3000 width=8) | | -> Hash (cost=14.04..14.04 rows=60 width=22) | | -> Index Scan using users_pkey on users d (cost=0.14..14.04 rows=60 width=22) | +---------------------------------------------------------------------------------------------------------------+ EXPLAIN Time: 0.022s PostgreSQL Fitter, Happier 2021 Slide 49 of 72

Partial Indexes Partial indexes avoid indexing common values Write rate
benefit as new rows likely will not match predicate PostgreSQL Fitter, Happier 2021 Slide 50 of 72

Partial Index Example 1 of 2 Query trips with no
rating 3000 trips, 750 without a rating -> Bitmap Index Scan on index_trips_on_rating (cost=0.00..13.90 rows=750 width=0) Index Cond: (rating IS NULL) PostgreSQL Fitter, Happier 2021 Slide 51 of 72

Partial Index Example 2 of 2 create index index_trips_on_rating_partial ON
trips (rating) where rating IS NULL; Bitmap Index Scan on index_trips_on_rating_partial (cost=0.00..11.90 rows=750 width=0) Index Cond: (rating IS NULL) Index size Partial index size 40 kB 16 kB 60% decrease PostgreSQL Fitter, Happier 2021 Slide 52 of 72

Partial Indexes Results Created replacement partial indexes for several existing
indexes Two examples of biggest size reductions were 82% and 97% smaller On large tables this is significant PostgreSQL Fitter, Happier 2021 Slide 53 of 72

Reduced Columns in Multi-Column Indexes Remove columns that are not
needed Replace 3 columns with 2, or 2 with 1 PostgreSQL can combine multiple single column indexes! PostgreSQL Fitter, Happier 2021 Slide 54 of 72

Slow Queries Results Fixed more than 10 slow queries identified
by PgHero PostgreSQL Fitter, Happier 2021 Slide 55 of 72

Performance Analysis Tools https://wiki.postgresql.org/wiki/Performance_Analysis_Tools Percona pg_stat_monitor - PostgreSQL Statistics Collector
(extension for 11, 12, 13) pganalyze Index Advisor - Index recommendations! PostgreSQL Fitter, Happier 2021 Slide 58 of 72

Query Errors Queries sometimes never finish Checking out a connection
from the connection pool may never finish Acquiring a lock may never finish A transaction may never commit or rollback PostgreSQL Fitter, Happier 2021 Slide 60 of 72

Statement Timeout Sets a maximum time a query can run
before being canceled Parameter Example value statement_timeout 4-5 seconds PostgreSQL Fitter, Happier 2021 Slide 62 of 72

Checkout Timeout Sets a maximum time to wait for a
connection to be available Parameter Example value checkout_timeout 4-5 seconds PostgreSQL Fitter, Happier 2021 Slide 63 of 72

Lock Timeout Sets a maximum time to wait to acquire
a lock SHOW lock_timeout SET LOCAL lock_timeout = '5s' PostgreSQL Fitter, Happier 2021 Slide 64 of 72

Idle Transactions Active Record manages its own idle transactions in
connection pool Sets a maximum time a connection can be idle before being disconnected Idle transactions can prevent vacuum from running Parameter Default idle_timeout 300s (Rails default) idle_in_transaction_session_timeout 24 hours? PostgreSQL Fitter, Happier 2021 Slide 65 of 72

Replication Errors Replication lag that gets too high can stop
replication entirely Rebuilding indexes is IO intensive which will cause a replication lag spike Replication can conflict with vacuum on primary PostgreSQL Fitter, Happier 2021 Slide 66 of 72

Special Recognition! Past team for opportunities to learn Special thanks
to David Practice runs: Jamie, current team, Lukas, Andrew, and Craig PostgreSQL community: Documentation, IRC, Slack PostgreSQL Fitter, Happier 2021 Slide 68 of 72

Connections and Space Summary Added a connection pooler to increase
performance and enable additional scale Re-created 30 indexes to eliminate bloat Removed more than 100 unused indexes Distributed write load by relocating metrics tables to their own database Tuned Autovacuum to run more frequently and for longer PostgreSQL Fitter, Happier 2021 Slide 69 of 72

Performance and Errors Summary Added a dashboard to show slow
queries and fixed the top 10 Created awareness of index maintenance cost especially for high write tables Moved more than 10 queries to a read replica Added timeouts to increase resiliency from errors related to queries, database connections, and locks PostgreSQL Fitter, Happier 2021 Slide 70 of 72

We made PostgreSQL Fitter , Happier , More Productive PostgreSQL
Fitter, Happier 2021 Slide 71 of 72

Thanks! bit.ly/pg-fitter-happier PostgreSQL Fitter, Happier 2021 Slide 72 of 72

How We Made PostgreSQL Fitter, Happier, More Pr...

How We Made PostgreSQL Fitter, Happier, More Productive

Resources

PGConf NYC 2021 - How we made PG Fitter, Happier, More Productive by Andrew Atkinson

More Decks by Andrew Atkinson

Other Decks in Technology

Featured

Transcript