approach. I’m going to take a bottom-up approach, though. This isn’t necessarily better. It might help you to develop better intuitions about VACUUM’s performance, though, especially if you’re familiar with the standard explanation already. 3
of data structures. 2. A bottom-up take on bloat VACUUM, HOT, and the relationship between table bloat and index bloat. 3. VACUUM’s priorities What VACUUM prioritizes. 4. Future work Making garbage collection more efﬁcient in future releases. 4
abstract away physical representation. MVCC more or less versions entries in objects (relations). Heap relations (tables) store rows more or less in arbitrary order. Index relations may also have multiple versions, though this can be avoided by HOT optimization. 6
access, and access by index scans. Records are identiﬁed by TID (e.g., '(2,32)', '(43,89)')— must be stable for index scans. TID is a “physiological” identiﬁer. - Physical across pages/blocks — block number. - Logical within pages/blocks — item pointer offset. 7
speciﬁc records efﬁcient. B-Trees mostly consist of all items that are indexed in natural sort order. (99%+ of blocks are leaf blocks.) B-Tree indexes are particular about which key values go on which page, complicating matters for MVCC garbage collection. Index pages will regularly need to “ﬁt a new entry between existing entries”. Physiological-ness helps with that (index TIDs exist, but are not stable over time). 9
maintain summarizing information. Only heap has visibility map. Maintained by VACUUM. No opportunistic processing. Also comprised of 8KiB blocks. Visibility map allows avoidance of redundant work by VACUUM. As of 9.6, visibility map also tracks all-frozen pages. 15
of data structures. 2. A bottom-up take on bloat VACUUM, HOT, and the relationship between table bloat and index bloat. 3. VACUUM’s priorities What VACUUM prioritizes. 4. Future work Making garbage collection more efﬁcient in future releases. 16
pages has a lot in common across disparate access methods (e.g., heap, B-Tree, …). Too much complexity to manage without breaking down into manageable pieces with commonality. - What else could possibly work? Indexes are very different to tables, but their page layout has quite a lot in common. 20
observable on pages. - “Hint bits” note that transaction that updated subsequently committed. - Opportunistic cleanup reclaims space at the page level, often due to observing a set hint bit. VACUUM does work that signiﬁcantly overlaps with opportunistic cleanup work of pages — pruning. 21
opportunistic garbage collection. - Runs reliably against whole table and all indexes, though. - Less important than opportunistic garbage collection for some workloads. - Sometimes skips pages that it cannot do cleanup on immediately, so is itself somewhat opportunistic. 24
Same as opportunistic pruning by HOT. Mark item pointers as dead, which index scans will need until after step 2. Free space for heap tuple structure. 2. Visits indexes, killing index tuples + item pointers together. 3. Second pass over heap, to mark dead item pointers LP_UNUSED/reusable. Frees less space than step 1. 25
sense when you think about page-level constraints. Indexes resolve key values to heap TIDs. Heap TIDs must be stable over time (within a VACUUM cycle). Index scans must always land on the right heap tuple. Heap cleanup is constrained by this. VACUUM remembers a list of heap TIDs to go kill in indexes.
entries. Dependent on no indexed columns being modiﬁed by UPDATEs. Chain together heap tuples in heap page. Mostly valuable because it avoids bloat in the ﬁrst place. Also enables pruning heap pages opportunistically.
pointers for tuples that cannot be in any index due to being heap-only (HOT) tuples. Important for many workloads. - Can even remove dead tuple space when there are zero HOT UPDATEs. “Hot pruning” is a bit of a misnomer. Cannot reclaim dead item pointers, though. 28
certain index tuples aren’t useful to anybody now. Sets LP_DEAD hint bit when the heap visibility info indicates that the tuple is dead to all possible MVCC snapshots. Simple read queries do this. The only case where visibility information is stored in indexes directly. Subsequent scans won’t visit the heap when LP_DEAD is found set.
Index scans can get away with setting LP_DEAD bit without exclusive buffer lock, but actual restructuring needs this. LP_DEAD hint bit index tuples are reclaimed during subsequent inserts into page that might need a page split. Prevents would-be page splits from actually splitting.
always expects to be able to ﬁnd the heap entry it points to, if only to discover that it’s logically dead. (Item pointers stick around as stubs if we can only prune.) Heap TIDs are supposed to be stable, at least within a VACUUM cycle. 31
of data structures. 2. A bottom-up take on bloat VACUUM, HOT, and the relationship between table bloat and index bloat. 3. VACUUM’s priorities What VACUUM prioritizes. 4. Future work Making garbage collection more efﬁcient in future releases. 32
MVCC snapshots. The burden is not placed on old snapshots to do extra work by examining a subsidiary structure, such as UNDO. “Recently dead” tuples may be dead to most snapshots, but still cannot be garbage collected. “Horizon” before which dead tuples can be killed is conservative. 34
Heavyweight lock strength doesn’t block user queries, including INSERTs, UPDATEs, and DELETEs. Indexes are scanned in physical order, not logical order (B-Tree indexes only). Query performance is the major consideration. - Reclaiming space not necessarily given that much weight. 35
of free space will get reclaimed eventually, when it’s convenient. Indexes have inherent restrictions, too. Only whole index pages can be reclaimed by the free space map — undersized pages cannot be merged together. 36
of data structures. 2. A bottom-up take on bloat VACUUM, HOT, and the relationship between table bloat and index bloat. 3. VACUUM’s priorities What VACUUM prioritizes. 4. Future work Making garbage collection more efﬁcient in future releases. 37
removing old garbage tuples that are obsolete to all possible transactions. Opportunistic techniques (HOT Pruning, LP_DEAD cleanup in B-Tree indexes) also exist. Garbage collection usually happens both ways. VACUUM is exclusively responsible for cleanup of transaction management status (e.g., Freezing tuples, truncating away commit status information). 41
restricted by the same ordering requirements that dictate high-level steps VACUUM performs. - Index scans cannot be allowed to land on an unrelated tuple due to heap TID recycling. Some amount of VACUUMing is almost inevitable, even without freezing, because dead item pointers must eventually be marked reusable. (See lazy_vacuum_heap() and lazy_vacuum_page().) 42