DevTO - 2017-08-28 - Postgres, MVCC, and You (or, Why COUNT(*) is Slow)

Postgres, MVCC, and You (or, why COUNT(*) is slow)
David Wolever @wolever
DevTO, 2017-08-28

Transactional Databases
• A series of operations which only make sense when performed together
• "Everything succeeds or everything fails"
Example: transferring $100 between bank accounts:
1. Withdraw $100 from first account
2. Deposit $100 into second account
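
The bank-transfer example can be sketched in runnable form. This is a minimal sketch, not code from the talk: it uses SQLite through Python's built-in sqlite3 module as a stand-in for Postgres, and the accounts table and the make_bank, transfer, and balances helpers are hypothetical names chosen for illustration.

```python
import sqlite3


def make_bank():
    """Create an in-memory database with two hypothetical accounts."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance INTEGER)")
    conn.executemany("INSERT INTO accounts VALUES (?, ?)", [(1, 150), (2, 0)])
    conn.commit()
    return conn


def transfer(conn, src, dst, amount):
    """Move `amount` from `src` to `dst`; both updates apply or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE id = ?",
                (amount, src),
            )
            (balance,) = conn.execute(
                "SELECT balance FROM accounts WHERE id = ?", (src,)
            ).fetchone()
            if balance < 0:
                # The withdrawal above has already run; raising undoes it.
                raise ValueError("insufficient funds")
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE id = ?",
                (amount, dst),
            )
        return True
    except ValueError:
        return False


def balances(conn):
    """Return {account id: balance}."""
    return dict(conn.execute("SELECT id, balance FROM accounts"))
```

A failed transfer leaves both balances untouched even though the withdrawal statement had already executed, which is the "everything succeeds or everything fails" property.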

Transactional Databases
Another example: changing a test’s answer key.
1. Update the answer key
2. Recalculate student grades
3. Recalculate the test’s average mark

Transactional Databases
Make this really easy

Transactional Databases
=# BEGIN;
=# UPDATE assessments SET answer_key = 'ABC';
=# SELECT regrade_assessment_responses();
=# UPDATE assessments SET average_grade = (
-#   SELECT AVG(grade) FROM responses
-# );
=# COMMIT;

Transactions can be faked…
… but that’s out of the scope of this talk
See, e.g.: http://blog.codekills.net/2014/03/13/atomic-bank-balance-transfer-with-couchdb/

Transactional Databases
Supports transactions:
• SQL Databases (PostgreSQL, MySQL, MSSQL, Oracle, etc) via BEGIN/COMMIT
• Redis via MULTI/EXEC
• Neo4j
Doesn’t support transactions:
• Document stores: MongoDB, CouchDB, Solr, etc
• Distributed data stores: Cassandra, Hadoop, Riak, etc
• Most KV stores: memcached, Kyoto Cabinet, LevelDB, etc

The "A" in ACID
• All databases: single-statement atomicity
• Transactional databases: multi-statement atomicity


Why is COUNT(*) slow?

=# \timing on
=# SELECT COUNT(*) FROM large_table;
14,066,905
Time: 11.55s

Why is COUNT(*) slow?

“The reason why this is slow is related to the MVCC implementation in PostgreSQL. The fact that multiple transactions can see different states of the data means that there can be no straightforward way for "COUNT(*)" to summarize data across the whole table; PostgreSQL must walk through all rows, in some sense.”
- https://wiki.postgresql.org/wiki/Slow_Counting

…

The database is just a big tree structure… Surely each node in the tree can just store a count of the number of leaves, right?

No.

On both counts.

First: Postgres doesn’t store rows in a tree (don’t worry, indexes do use trees)

Rows (or "tuples") are stored in "pages"

Second: the number of active rows in a page "depends"

Because MVCC

MVCC: Multi-Version Concurrency Control

=# BEGIN;
=# DELETE FROM users;
=# ROLLBACK;

=# SELECT * FROM users;
 id | name
----+-------
  1 | David
  2 | Alex
=# BEGIN;
=# DELETE FROM users;
=# ROLLBACK;

How’s that work?

XID!

… XID?

Each transaction is assigned an ID called XID

=# SELECT txid_current();
 txid_current
--------------
      1831787
=# SELECT txid_current();
 txid_current
--------------
      1831788

=# BEGIN;
=# SELECT txid_current();
1234
=# INSERT INTO whiskey
-# VALUES ('Bruichladdich The Laddie Ten', 99);
=# SELECT xmin, xmax, name, rating
-# FROM whiskey
-# WHERE name LIKE '%Laddie Ten%' LIMIT 1;
 xmin | xmax | name                         | rating
------+------+------------------------------+--------
 1234 |    0 | Bruichladdich The Laddie Ten |     99
=# COMMIT;

=# BEGIN;
=# SELECT txid_current();
1235
=# UPDATE whiskey SET rating = 100 WHERE rating = 99;
=# SELECT xmin, xmax, name, rating
-# FROM whiskey
-# WHERE name LIKE '%Laddie Ten%' LIMIT 1;
 xmin | xmax | name                         | rating
------+------+------------------------------+--------
 1235 |    0 | Bruichladdich The Laddie Ten |    100
 ^^^^ A new tuple was inserted!
=# ROLLBACK;
=# SELECT xmin, xmax, name, rating
-# FROM whiskey
-# WHERE name LIKE '%Laddie Ten%' LIMIT 1;
 xmin | xmax | name                         | rating
------+------+------------------------------+--------
 1234 | 1235 | Bruichladdich The Laddie Ten |     99
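
The xmin/xmax bookkeeping in this session can be mimicked with a toy model. This is a sketch under loud assumptions: TupleVersion and ToyHeap are hypothetical names, the "table" is a plain Python list, and locking, pages, and in-progress transactions are all ignored. It only illustrates that an UPDATE stamps the old version's xmax and appends a new version, and that ROLLBACK undoes nothing in place.

```python
from dataclasses import dataclass


@dataclass
class TupleVersion:
    xmin: int    # XID of the transaction that created this version
    xmax: int    # XID of the transaction that deleted/replaced it (0 = none)
    name: str
    rating: int


class ToyHeap:
    """A toy append-only table: versions are added, never overwritten."""

    def __init__(self):
        self.versions = []
        self.aborted = set()  # XIDs of rolled-back transactions

    def insert(self, xid, name, rating):
        self.versions.append(TupleVersion(xid, 0, name, rating))

    def update_rating(self, xid, old_rating, new_rating):
        # Simplification: only versions nobody has replaced yet are candidates.
        matches = [v for v in self.versions
                   if v.xmax == 0 and v.rating == old_rating]
        for old in matches:
            old.xmax = xid                          # stamp the old version
            self.insert(xid, old.name, new_rating)  # append the new version

    def rollback(self, xid):
        # Nothing is undone in place; the XID is only recorded as aborted.
        self.aborted.add(xid)
```

Replaying the session (insert as XID 1234, update as XID 1235, then roll back 1235) leaves two versions in the list: the original with xmin 1234 and xmax 1235, matching the final query output, and a dead version with xmin 1235 that no later transaction should see.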

Deciding which rows are visible!

=# SELECT * FROM users;
 id | name
----+-------
  1 | David
  2 | Alex
=# BEGIN;
=# DELETE FROM users;
=# ROLLBACK;

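def is_tuple_visible(cur_xid, row):
    # Simplified: ignores in-progress transactions and snapshots.
    # The inserting transaction must not be aborted or newer than us.
    if txn_is_aborted(row.xmin) or row.xmin > cur_xid:
        return False
    # Still visible if never deleted (xmax = 0), if the deleting
    # transaction aborted, or if the delete is newer than us.
    return (
        row.xmax == 0 or
        txn_is_aborted(row.xmax) or
        row.xmax > cur_xid
    )

def txn_is_aborted(xid):
    # See pg_clog and "hint bits" for details
    # https://wiki.postgresql.org/wiki/Hint_Bits
    raise NotImplementedError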
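
To try the visibility rule on the whiskey tuples, here is a runnable sketch under stated assumptions: an xmax of 0 means "never deleted", a transaction does not see rows it deleted itself, in-progress transactions are ignored, and ABORTED_XIDS is a hypothetical in-memory stand-in for the commit log lookup. Row, old, and new are illustrative names, not Postgres structures.

```python
from collections import namedtuple

Row = namedtuple("Row", "xmin xmax name rating")

# Hypothetical stand-in for the commit log: XID 1235 was rolled back.
ABORTED_XIDS = {1235}


def txn_is_aborted(xid):
    return xid in ABORTED_XIDS


def is_tuple_visible(cur_xid, row):
    # The creating transaction must not be aborted or newer than us.
    if txn_is_aborted(row.xmin) or row.xmin > cur_xid:
        return False
    # Visible if never deleted, if the deleter aborted, or if the delete
    # was made by a transaction newer than us.
    return row.xmax == 0 or txn_is_aborted(row.xmax) or row.xmax > cur_xid


# The two physical versions left behind by the rolled-back UPDATE.
old = Row(1234, 1235, "Bruichladdich The Laddie Ten", 99)
new = Row(1235, 0, "Bruichladdich The Laddie Ten", 100)
```

For a later transaction such as XID 1236, the old version is visible because its deleter aborted, and the new version is not because its creator aborted.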

Back to COUNT(*)

COUNT(*) is slow because there is no one “correct” COUNT(*)
COUNT(*) depends on the current transaction
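
A small sketch of why the answer depends on who is asking. It is a toy model, not Postgres code: rows are bare (xmin, xmax) pairs, count_visible and users are hypothetical names, and in-progress transactions are ignored. The point is that the only way to get the count is to test every row against the asking transaction's XID.

```python
def count_visible(cur_xid, rows, aborted=frozenset()):
    """Count rows visible to `cur_xid` by checking every row, as a full scan must."""
    total = 0
    for xmin, xmax in rows:
        created = xmin <= cur_xid and xmin not in aborted
        deleted = xmax != 0 and xmax <= cur_xid and xmax not in aborted
        if created and not deleted:
            total += 1
    return total


# Hypothetical users table as (xmin, xmax) pairs: two rows inserted by
# transactions 10 and 11, both deleted by transaction 20.
users = [(10, 20), (11, 20)]
```

With this data, a transaction with XID 15 counts two rows while one with XID 25 counts none, and if transaction 20 had been rolled back, XID 25 would count two again. No single stored number can serve all of them.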

… do we still have time?

What if two transactions update or delete the same row?
The second always blocks until the first commits or rolls back!

What happens when XID overflows?
Ask our friends at Sentry
https://blog.sentry.io/2015/07/23/transaction-id-wraparound-in-postgres.html

WARNING: database "whiskey" must be vacuumed within 177009986 transactions
HINT: To avoid a database shutdown, execute a database-wide VACUUM in "whiskey".

ERROR: database is not accepting commands to avoid wraparound data loss in database "whiskey"
HINT: Stop the postmaster and use a standalone backend to VACUUM in "whiskey".
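
The danger behind these messages comes from XIDs being 32-bit counters that are compared circularly, so each XID has roughly two billion XIDs "in its past" and two billion "in its future". The sketch below illustrates that comparison only; xid_precedes is a hypothetical name, and the reserved special XIDs that real Postgres treats differently are ignored.

```python
XID_MODULUS = 2 ** 32   # XIDs are 32-bit counters
HALF_RANGE = 2 ** 31    # about two billion XIDs on either side


def xid_precedes(a, b):
    """True if XID `a` is logically older than XID `b` on the 32-bit circle."""
    diff = (a - b) % XID_MODULUS
    # Reinterpret the unsigned difference as a signed 32-bit value.
    if diff >= HALF_RANGE:
        diff -= XID_MODULUS
    return diff < 0
```

An XID just before the counter wraps still compares as older than a small XID issued just after it, which is what lets the counter wrap safely. Once two XIDs are more than about two billion apart, though, the comparison flips and an old row would appear to come from the future, which is why Postgres insists on a VACUUM before that point.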

David Wolever @wolever
Work with me: https://akindi.com/pages/jobs

References:
- Postgres Internals Presentations: http://momjian.us/main/presentations/internals.html
  - Especially: http://momjian.us/main/writings/pgsql/mvcc.pdf
- Introduction to Postgres' Physical Storage: http://rachbelaid.com/introduction-to-postgres-physical-storage/
- Transaction ID wraparound: https://blog.sentry.io/2015/07/23/transaction-id-wraparound-in-postgres.html
