
PostgreSQL Subtransactions: The Hidden Performance Crisis

Subtransactions in PostgreSQL often go unnoticed—until they quietly erode performance under heavy workloads. This talk exposes how subtransactions impact transaction throughput, locking, and crash recovery, especially in large or highly concurrent systems. Attendees will learn how to detect hidden subtransaction bottlenecks, interpret system catalog behavior, and implement practical tuning strategies to prevent cascading latency and transaction bloat.

Shiv Iyer

December 08, 2025

Transcript

  1. PostgreSQL Subtransactions: The Hidden Performance Crisis Shiv Iyer Founder and

    CEO MinervaDB Inc. and ChistaDATA Inc. [email protected] | linkedin.com/in/thewebscaledba
  2. What We'll Uncover Today 01 The Mystery An elusive database

    phantom affecting millions of users 02 Root Cause Analysis Diving deep into PostgreSQL's internal architecture 03 Technical Deep Dive Understanding subtransactions and their hidden costs 04 Solutions Three approaches evaluated and the path chosen 05 Lessons Learned Actionable takeaways for your database infrastructure
  3. The Challenge: When Performance Vanishes Enterprise PostgreSQL deployments faced an

    unprecedented crisis: database stalls cascading across infrastructure, mysterious performance cliffs that defied conventional troubleshooting, and user-facing 500 errors impacting millions. Teams scrambled to identify the root cause, but traditional monitoring showed nothing obvious. Queries would execute normally for weeks, then suddenly grind to a halt. The pattern seemed random, unpredictable, and deeply concerning. This wasn't just a technical inconvenience; it was a business continuity threat affecting service reliability, customer trust, and operational stability. Impact at a Glance Millions of users affected HTTP 500 errors spiking Unpredictable degradation Week-long investigation cycles
  4. Discovery Timeline: June 24th Incident 1 Initial Detection CI/CD runners

    begin reporting timeout errors during database operations 2 Query Backlog Build queue grows exponentially as queries wait indefinitely for lock acquisition 3 User Impact Web application timeouts cascade to frontend, generating 500 errors for end users 4 Investigation Begins Database team mobilises to identify root cause amid mounting pressure The incident followed a disturbingly familiar pattern: normal operations, sudden stalls, mysterious recovery. But this time, the team was determined to find answers.
  5. Monitoring Revealed a Pattern LWLock Contention Spike PostgreSQL's lightweight lock

    monitoring showed dramatic contention on SubtransControlLock, a lock type rarely seen in production systems. Performance Degradation Transaction throughput dropped from 360,000 TPS to under 50,000 TPS within minutes, with no corresponding increase in query volume or complexity. Mysterious Recovery After 15-20 minutes, performance spontaneously recovered to normal levels: no intervention required, no clear trigger for resolution. The ephemeral nature of these incidents earned them the nickname "Nessie": like the Loch Ness Monster, they appeared unpredictably and vanished without trace.
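    A quick way to confirm this kind of contention is to sample wait events from pg_stat_activity. This is a minimal diagnostic sketch, not the monitoring stack used in the original investigation; the wait event is reported as SubtransControlLock on PostgreSQL 12 and earlier and as SubtransSLRU from PostgreSQL 13 onwards.
    -- Sample what running backends are currently waiting on.
    SELECT wait_event_type,
           wait_event,
           count(*) AS backends
    FROM pg_stat_activity
    WHERE wait_event IS NOT NULL
    GROUP BY wait_event_type, wait_event
    ORDER BY backends DESC;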
  6. The Elusive "Nessie" Phenomenon What Made It So Hard to

    Diagnose? Stalled queries all waiting on SubtransControlLock Random occurrence with week-long intervals between incidents No obvious trigger from application changes or traffic patterns Self-resolving nature prevented post-mortem analysis Replica-specific behaviour that didn't affect primary database "Like chasing a ghost: every time we got close to understanding it, the problem vanished, leaving us with more questions than answers." SubtransControlLock became the key clue, but understanding why required diving deep into PostgreSQL internals.
  7. Business Impact: Beyond Technical Metrics 500 HTTP Errors User-facing failures

    during peak incidents 85% Service Degradation Reduction in platform responsiveness 20min Incident Duration Average time users experienced problems 48hrs Investigation Cycles Engineering resources diverted from feature work The cost extended beyond immediate user experience. Service reliability metrics declined, customer trust eroded with each incident, and engineering teams faced mounting pressure to resolve an issue that seemed impossible to reproduce or predict. Infrastructure teams needed more than a fix: they needed to understand the mechanism to prevent future occurrences.
  8. Understanding SAVEPOINT in PostgreSQL A SAVEPOINT establishes a named marker

    within a PostgreSQL transaction, allowing partial rollback without abandoning the entire transaction context. Think of it as a checkpoint in a complex operation. When you execute SAVEPOINT my_savepoint, PostgreSQL creates a subtransaction with its own transaction ID (XID). This enables fine-grained error handling and recovery within larger transactional workflows. Unlike nested transactions in some database systems, PostgreSQL savepoints maintain a parent-child relationship where the subtransaction's fate depends on the parent transaction's ultimate commit or rollback decision.
    BEGIN;
    INSERT INTO users (email) VALUES ('[email protected]');
    SAVEPOINT user_created;
    INSERT INTO profiles (user_id) VALUES (123); -- Error occurs here
    ROLLBACK TO user_created; -- User insert preserved
    COMMIT;
  9. Real-World SAVEPOINT Scenarios 1 Find or Create Pattern E-commerce systems

    checking inventory: attempt INSERT, rollback on duplicate key, then SELECT existing record 2 Duplicate Handling User registration flows where email uniqueness must be enforced gracefully without transaction failure 3 Partial Rollback Multi-step workflows where one failed operation shouldn't invalidate all previous successful steps Consider an order processing system: create order record, reserve inventory, process payment, send confirmation. If payment fails, you want to roll back the payment attempt and inventory reservation, but preserve the order record for retry logic. This is precisely where savepoints shine, and where the performance trap lies hidden.
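    A rough sketch of that find-or-create flow, assuming an illustrative users table with a unique index on email; note that a PL/pgSQL EXCEPTION block achieves the same effect implicitly and creates a subtransaction in exactly the same way.
    BEGIN;
    SAVEPOINT find_or_create;
    INSERT INTO users (email) VALUES ('user@example.com');
    -- On a unique_violation the client would instead issue:
    --   ROLLBACK TO SAVEPOINT find_or_create;
    --   SELECT id FROM users WHERE email = 'user@example.com';
    RELEASE SAVEPOINT find_or_create;
    COMMIT;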
  10. How Rails Applications Generate Savepoints ActiveRecord::Base.transaction( requires_new: true ) do

    User.create!(email: email)
    end
    The Rails Pattern The requires_new: true option tells Rails to create a nested transaction, which PostgreSQL implements using SAVEPOINT commands.
    BEGIN; -- Outer transaction
    SAVEPOINT active_record_1;
    INSERT INTO users...
    RELEASE SAVEPOINT active_record_1;
    COMMIT;
    Generated SQL Each nested transaction block generates a SAVEPOINT/RELEASE pair, creating subtransactions transparently. Developers often use this pattern without realising the PostgreSQL-level implications. A single Rails controller action might generate dozens of subtransactions through nested service calls and background job enqueueing.
  11. Why SAVEPOINT Matters for Error Recovery Graceful Degradation Handle constraint

    violations without failing the entire user request, crucial for production reliability Partial Rollback Undo specific failed operations whilst preserving successful operations in the same transaction Data Integrity Maintain ACID guarantees whilst providing application-level error handling flexibility Without savepoints, database constraint violations would force complete transaction rollback. A duplicate email error during user registration would invalidate all related record creation, forcing developers to restructure application logic significantly. The architectural benefit is clear, but there's a hidden performance cost that only manifests at scale.
  12. The Hidden Cost: Introducing the Problem What Lurks Beneath Every

    SAVEPOINT creates a subtransaction that PostgreSQL must track. This tracking happens through a specialised SLRU cache guarded by SubtransControlLock. On the primary database, this overhead is negligible. But on read replicas, a perfect storm of conditions can trigger catastrophic performance collapse. Cache Management Overhead Subtransaction metadata must be searched linearly for every tuple visibility check during queries Replica Vulnerability WAL replay on replicas handles subtransaction tracking differently than primary write operations Cumulative Effect Long-running transactions combined with high subtransaction counts create exponential slowdown
  13. SubtransControlLock: PostgreSQL's SLRU Cache PostgreSQL manages subtransaction relationships using a

    Simple LRU (SLRU) cache structure, a specialised implementation designed for tracking transaction metadata with minimal memory overhead. What It Tracks For every subtransaction (subXID), PostgreSQL needs to know its parent transaction ID. This mapping is critical for MVCC tuple visibility checks: determining which rows are visible to which transactions. When a query asks "can I see this row?", PostgreSQL must resolve not just the transaction ID but also any parent transaction relationships to apply correct visibility rules. Why SLRU? The SLRU design optimises for sequential access patterns common in transaction processing. Pages are loaded from disk into memory buffers, with least-recently-used pages evicted when the cache fills. This works beautifully for normal transaction workloads. But subtransactions introduce a pathological access pattern that defeats the LRU strategy.
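    On PostgreSQL 13 and later, the pg_stat_slru view exposes counters for this cache. A minimal sketch (the subtransaction cache is reported as 'Subtrans', renamed 'subtransaction' in more recent releases) shows whether parent-XID lookups are being served from memory or falling back to the pg_subtrans files on disk.
    SELECT name, blks_hit, blks_read, blks_zeroed, blks_written
    FROM pg_stat_slru
    WHERE name ILIKE 'subtrans%';
    -- A climbing blks_read relative to blks_hit suggests lookups are
    -- missing the in-memory buffers and hitting pg_subtrans on disk.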
  14. Cache Architecture: The 65K Limit 32 Buffer Pages Fixed-size cache

    allocated at PostgreSQL startup 2,048 XIDs per Page Each page stores transaction ID mappings in a dense array 65,536 Total Entries Maximum subtransactions trackable before overflow This architectural decision made sense when PostgreSQL's subtransaction feature was designed. The 65K limit seemed impossibly high: who would create tens of thousands of subtransactions? But modern ORM frameworks, microservices architectures, and high-throughput CI/CD pipelines can easily exceed this threshold. A busy Rails application might create hundreds of subtransactions per second. The Linear Search Problem When looking up a subtransaction's parent, PostgreSQL performs a linear scan through cache pages. For a cache with 65K entries, this means potentially examining thousands of entries for every tuple visibility check. As queries scan millions of rows, the cost becomes catastrophic.
  15. Suboverflow: The Performance Cliff What Triggers Suboverflow When the subtransaction

    cache exceeds 65,536 entries (32 pages × 2,048 XIDs), PostgreSQL enters suboverflow state. This fundamentally changes how subtransaction lookups work. Instead of memory-resident lookups, PostgreSQL must: 1. read subtransaction data from pg_subtrans files on disk, 2. perform additional I/O operations for every visibility check, and 3. acquire and release locks more frequently. Performance Impact Before overflow: 360,000 TPS After overflow: 50,000 TPS That's an 86% performance reduction from a single internal state change. The transition isn't gradual; it's a performance cliff. Systems run normally at 64,000 subtransactions, then suddenly collapse when crossing the 65,536 threshold. On busy replicas serving read traffic, this overflow condition creates severe lock contention. Hundreds of concurrent queries all competing for SubtransControlLock access, each performing disk I/O, creates a cascading failure scenario.
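    For a scratch environment only, a sketch like the following shows how quickly a single session can burn through subtransaction XIDs: each iteration of a PL/pgSQL block with an EXCEPTION clause opens a new subtransaction, and the INSERT forces an XID to be assigned to it. The temp table is purely illustrative.
    CREATE TEMP TABLE subxact_demo (i int);
    DO $$
    BEGIN
      FOR i IN 1..70000 LOOP
        BEGIN
          INSERT INTO subxact_demo VALUES (i);  -- forces a subxid assignment
        EXCEPTION WHEN OTHERS THEN
          NULL;  -- swallow errors; the subtransaction cost is already paid
        END;
      END LOOP;
    END $$;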
  16. Primary vs Replica: Critical Differences Primary Database Generates WAL records

    for subtransactions as they're created. Processes write operations sequentially with natural coordination. The primary has transaction context: it knows which transactions are active, which are creating subtransactions, and can coordinate access efficiently. Read Replica Replays WAL records from primary, reconstructing transaction state. Must maintain the KnownAssignedXids structure to track all transactions visible to queries. The replica operates blind to transaction intent: it sees subtransactions appearing in WAL replay but must infer relationships and visibility rules without full context. This architectural difference means replicas pay a higher cost for subtransaction tracking. The KnownAssignedXids snapshot must be consulted for every tuple visibility decision, amplifying the SubtransControlLock contention.
  17. Why Replicas Are Hit Harder Read replicas face a perfect

    storm of conditions that make subtransaction overhead catastrophic: MVCC Tuple Visibility Checks Every query scanning tables must determine row visibility based on transaction snapshots. With subtransactions, this requires resolving parent transaction IDs through the SLRU cache. lastOverflowedXID Tracking Replicas maintain a watermark of when suboverflow occurred. Once set, all subsequent queries must check subtransaction state, even for transactions that never used savepoints. Cumulative Effect with Long Transactions Autovacuum workers and long-running analytical queries hold transaction snapshots open. Combined with subtransaction creation, this extends the window where suboverflow affects performance. The "Nessie" phenomenon occurred specifically on replicas because read traffic patterns (high concurrency, large table scans, diverse query shapes) maximised subtransaction lookup frequency whilst WAL replay from the primary continuously created new subtransactions.
  18. Instrumentation Strategy: Finding the Culprit Measurement Approach The team implemented

    comprehensive observability to correlate subtransaction creation with performance degradation: Prometheus metrics grouped by ActiveRecord model to identify which parts of the codebase generated savepoints ActiveSupport notifications for transaction lifecycle events, capturing SAVEPOINT/RELEASE pairs SQL query tracking with correlation IDs linking database operations to HTTP request traces What They Discovered Certain Rails models used find_or_create_by patterns extensively, generating savepoints on every invocation. Multiplied across thousands of requests per second, this created a steady stream of subtransactions. CI/CD job processing alone accounted for 12,000 subtransactions per minute during peak periods. Backtrace logging revealed surprising sources: background job frameworks, audit logging systems, and even monitoring tools themselves were generating subtransactions through nested transaction blocks.
  19. Data Discovery: Patterns Emerge Code Path Analysis Grep searches identified

    847 instances of requires_new: true across the Rails codebase, with 23 high-traffic code paths accounting for 80% of subtransaction creation. Traffic Correlation Subtransaction rates directly correlated with CI/CD pipeline activity: build job processing, artifact uploads, and test result recording all generated savepoints. Request Tracing Correlation IDs linked individual HTTP requests to database queries, revealing that single API calls could generate 30+ subtransactions through nested service invocations. The data painted a clear picture: modern application architectures with their nested transactions, service-oriented designs, and ORM abstractions naturally generate subtransactions at scales PostgreSQL's SLRU cache wasn't designed to handle.
  20. The Long Transaction Connection The final piece of the puzzle

    involved understanding why incidents occurred sporadically rather than continuously. The answer lay in transaction duration. Autovacuum's Role PostgreSQL's autovacuum process occasionally runs long-duration transactions to clean up dead tuples. These maintenance transactions: Hold transaction snapshots open for extended periods Prevent subtransaction metadata from being cleaned up Create a "window of vulnerability" lasting 15-30 minutes CI/CD Processing GitLab.com's CI/CD pipeline processing involved complex workflows that could take several minutes per build. During these periods: High subtransaction creation rate continued Cache filled rapidly towards 65K limit Primary-to-replica WAL replay lag accumulated When autovacuum activity coincided with heavy CI/CD processing, the cache would overflow. The combination of long transaction duration + high subtransaction rate created the perfect conditions for "Nessie" to appear.
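    A simple way to spot these long-lived transactions, autovacuum or otherwise, is to order pg_stat_activity by transaction age. This is a generic sketch rather than the exact query used during the investigation.
    SELECT pid,
           backend_type,
           now() - xact_start AS xact_age,
           state,
           left(query, 60) AS current_query
    FROM pg_stat_activity
    WHERE xact_start IS NOT NULL
    ORDER BY xact_age DESC
    LIMIT 10;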
  21. The Breakthrough: Reproducing the Issue "We finally understood the mechanism; now

    we could reproduce it reliably." Nikolay Samokhvalov from Postgres.ai successfully reproduced the performance collapse in a controlled environment using a deceptively simple test case: 1 Create Single SAVEPOINT Begin transaction, establish one savepoint, keep transaction open 2 Generate Busy Write Traffic Simultaneously run high-volume write workload creating thousands of transactions per second 3 Observe Performance Collapse Within minutes, replica query latency increased by 10-20x as cache overflowed The test proved that even a single subtransaction, when combined with long transaction duration and concurrent write activity, could trigger the pathological behaviour. This insight from Catalin Irimie (PostgreSQL Support) provided the breakthrough needed to understand and eliminate the root cause.
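    A rough re-creation of that test, assuming pgbench for the write traffic (the original workload is not detailed here), looks like this:
    -- Session 1, on the primary: open one savepoint and leave the session idle.
    BEGIN;
    SAVEPOINT nessie_repro;

    -- Session 2, on the primary: generate busy write traffic, for example:
    --   pgbench -c 32 -j 8 -T 600 <dbname>

    -- On a replica: re-run the wait-event sample shown earlier and watch
    -- SubtransSLRU / SubtransControlLock waits climb as query latency degrades.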
  22. Three Solution Approaches Evaluated 1 Eliminate SAVEPOINTs Approach: Refactor application

    code to avoid nested transactions and savepoint usage entirely. Pros: Permanent fix, no PostgreSQL customisation needed Cons: Requires extensive codebase changes and thorough testing 2 Eliminate Long Transactions Approach: Restructure workflows to prevent transactions exceeding a duration threshold. Pros: Reduces vulnerability window Cons: Difficult to guarantee, autovacuum transactions unavoidable 3 PostgreSQL Core Patches Approach: Apply community patches increasing SLRU cache size and implementing associative indexing. Pros: Addresses root cause in database engine Cons: Requires maintaining custom PostgreSQL builds, upgrade complexity
  23. Why Application-Level Elimination Won Despite PostgreSQL patches being available from

    Andrey Borodin (targeting PostgreSQL 15+), the team chose Option 1: Eliminate SAVEPOINTs for compelling operational reasons: Maintenance Burden Custom PostgreSQL builds require dedicated engineering resources for testing, validation, and security patching. Every upstream release must be evaluated for compatibility with custom patches. Upgrade Complexity Major version upgrades become high-risk events requiring extensive regression testing. The patch might not apply cleanly to future PostgreSQL versions, creating technical debt. Customer Compatibility Enterprise customers often run their own PostgreSQL instances. Requiring custom builds would create support nightmares and limit deployment flexibility. Strategic Decision Fixing the application layer ensures the solution works across all PostgreSQL versions and deployment configurations: hosted, self-managed, or cloud provider services. The decision prioritised long-term maintainability over short-term implementation effort.
  24. Implementation Strategies: Eliminating SAVEPOINTs 1 Rewrite Updates Outside Transactions Move

    idempotent operations outside explicit transaction blocks. Use database-level constraints for data integrity instead of application transaction nesting. 2 Use INSERT/UPDATE ON CONFLICT Replace find_or_create_by patterns with PostgreSQL's native INSERT ... ON CONFLICT DO UPDATE syntax, eliminating the need for savepoints during duplicate handling. 3 Accept Non-Atomic Operations For workflows where perfect atomicity isn't critical, split operations into separate transactions. Add compensating logic for handling partial failures instead of relying on rollback. Each strategy required careful analysis of data consistency requirements. Not every operation could be restructured: some workflows genuinely needed atomic multi-step operations. The team prioritised high-volume code paths where savepoint elimination would have maximum impact.
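    As a sketch of strategy 2 (table and column names are illustrative, assuming a unique index on users.email and an updated_at column), the savepoint-free equivalent of find-or-create is:
    INSERT INTO users (email)
    VALUES ('user@example.com')
    ON CONFLICT (email) DO UPDATE
      SET updated_at = now()
    RETURNING id;
    -- ON CONFLICT DO NOTHING also works, but then the existing row is not
    -- returned and a follow-up SELECT is needed.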
  25. Results: The Fix That Worked 100% Incident Elimination Zero "Nessie"

    occurrences post-implementation 42% Replica Performance Gain Improved query latency on read replicas 99.8% Uptime Achievement Consistent service availability metrics After eliminating savepoints from the 23 highest-traffic code paths, monitoring showed dramatic improvements. SubtransControlLock contention disappeared from performance profiles, and replica query latencies stabilised at consistently low levels. The remaining low-volume savepoint usage, primarily in administrative tools and edge-case error handling, proved insufficient to trigger suboverflow conditions even under peak load. Sustained Impact Six months post-implementation, the infrastructure team reported zero subtransaction-related incidents. The fix proved durable across traffic growth, new feature deployments, and database version upgrades.
  26. Operational Safeguards Implemented SAVEPOINT Usage Alerts Prometheus metrics trigger warnings

    when any code path generates more than 100 savepoints per minute, catching regressions early Continuous Monitoring Real-time dashboards tracking subtransaction rates, SubtransControlLock contention, and replica lag to maintain visibility into system health Community Bug Detection Contributing findings back to PostgreSQL mailing lists, helping other organisations avoid similar issues and supporting upstream improvements The monitoring infrastructure became part of the deployment pipeline: pull requests introducing new savepoint usage automatically trigger review workflows, ensuring database performance implications are considered during code review.
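    Alongside application-level metrics, the database itself can surface savepoint regressions. A hedged sketch, assuming the pg_stat_statements extension is installed with pg_stat_statements.track_utility = on so that SAVEPOINT commands are counted:
    SELECT query, calls
    FROM pg_stat_statements
    WHERE query ILIKE 'savepoint%'
       OR query ILIKE 'release savepoint%'
       OR query ILIKE 'rollback to savepoint%'
    ORDER BY calls DESC
    LIMIT 20;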
  27. Future PostgreSQL Improvements Andrey Borodin's Patches Several community-contributed patches addressing

    the SLRU cache limitations are under review for PostgreSQL 15+: 100MB SLRU cache expansion, increasing capacity from 65K to 1.6M subtransaction entries Associative indexing, replacing linear search with hash-based lookups for O(1) performance Improved cache eviction, smarter page replacement algorithms for transaction workloads Why This Matters These changes would make PostgreSQL more resilient to subtransaction-heavy workloads by default, without requiring application-level refactoring. However, patches require extensive testing and community consensus. Estimated timeline: PostgreSQL 16 or 17 for production release. Until then, organisations running high-throughput PostgreSQL deployments should audit their subtransaction usage and implement application-level mitigations rather than waiting for engine-level fixes.
  28. Key Takeaways for Database Teams Subtransactions Are Powerful But Risky

    at Scale ORM frameworks and modern application patterns can easily generate tens of thousands of subtransactions per minute, exceeding PostgreSQL's cache design limits and triggering severe performance degradation. Read Replicas Need Special Consideration MVCC tuple visibility checks on replicas pay a disproportionate cost for subtransaction tracking. Performance issues may manifest only on replicas whilst the primary database appears healthy. Monitoring Is Critical for Early Detection Without instrumentation, subtransaction issues appear as mysterious performance degradation. Track SubtransControlLock contention, savepoint usage rates, and transaction durations proactively.
  29. Actionable Guidance for Your Organisation 01 Audit Your Codebase Search

    for nested transaction patterns, ORM requires_new flags, and explicit SAVEPOINT usage; quantify how many subtransactions your application generates under load 02 Test Replica Performance Run load tests specifically targeting read replicas with realistic workloads; measure query latency degradation and lock contention patterns 03 Plan for PostgreSQL Upgrades Stay informed about PostgreSQL 15+ improvements to subtransaction handling; evaluate whether future releases eliminate the need for application-level workarounds 04 Implement Preventive Monitoring Deploy metrics tracking subtransaction rates and alerting on abnormal patterns; catch issues before they impact users
  30. Thank You Shiv Iyer Founder and CEO MinervaDB Inc. and

    ChistaDATA Inc. Contact: [email protected] linkedin.com/in/thewebscaledba Acknowledgements: Original investigation by Grzegorz Bizon and Stan Hu (GitLab Engineering) Breakthrough testing by Nikolay Samokhvalov (Postgres.ai) Critical insights from Catalin Irimie (PostgreSQL Support) PostgreSQL community for collaborative debugging Expert PostgreSQL Consulting MinervaDB and ChistaDATA specialise in: Performance optimisation High-availability architectures Replication troubleshooting Migration and upgrade planning Let's discuss your database challenges Questions?