Upgrade to Pro — share decks privately, control downloads, hide ads and more …

DevTO - 2017-08-28 - Postgres, MVCC, and You (or, Why COUNT(*) is Slow)

DevTO - 2017-08-28 - Postgres, MVCC, and You (or, Why COUNT(*) is Slow)

David Wolever

August 28, 2017
Tweet

More Decks by David Wolever

Other Decks in Technology

Transcript

  1. Transactional Databases • A series of operations which only make

    sense when performed together • "Everything succeeds or everything fails" Example: transferring $100 between bank accounts: 1. Withdraw $100 from first account 2. Deposit $100 into second account @wolever
  2. Transactional Databases Another example: changing a test’s answer key. 1.

    Update the answer key 2. Recalculate student grades 3. Recalculate the test’s average mark @wolever
  3. Transactional Databases @wolever =# BEGIN;
 =# UPDATE assessments SET answer_key

    = 'ABC';
 =# regrade_assessment_responses();
 =# UPDATE assessments SET average_grade = AVG(
 -# SELECT grade FROM responses
 -# );
 =# COMMIT;
  4. Transactions can be faked… @wolever … but that’s out of

    the scope of this talk See, eg: http://blog.codekills.net/2014/03/13/atomic-bank-balance- transfer-with-couchdb/
  5. Transactional Databases Supports transactions: • SQL Databases (PostgreSQL, MySQL, MSSQL,

    Oracle, etc) via BEGIN/COMMIT • Redis via MULTI/EXEC • Neo4j @wolever Doesn’t support transactions: • Document stores: MongoDB, CouchDB, Solr, etc • Distributed data stores: Cassandra, Hadoop, Riak, etc • Most KV stores: memcachd, Kyoto Cabinet, LevelDB, etc
  6. The "A" in ACID @wolever • All databases: single-statement atomicity

    • Transactional databases: multi-statement atomicity
  7. “The reason why this is slow is related to the

    MVCC implementation in PostgreSQL. The fact that multiple transactions can see different states of the data means that there can be no straightforward way for "COUNT(*)" to summarize data across the whole table; PostgreSQL must walk through all rows, in some sense.” - https://wiki.postgresql.org/wiki/Slow_Counting @wolever
  8. The database is just a big tree structure… Surely each

    node in the tree can just store a count of the number of leaves, right? @wolever
  9. =# SELECT ctid, name, rating FROM whiskey -# WHERE name

    LIKE '%Laddie Ten%'; ctid | name | rating ----------+----------------------------------+-------- (136,26) | Bruichladdich The Laddie Ten | 91 (136,27) | Bruichladdich The Laddie Ten | 85 Page number Index in page @wolever
  10. =# SELECT * FROM users; id | name ----+--------- 1

    | David 2 | Alex =# BEGIN; =# DELETE FROM users; =# ROLLBACK; @wolever
  11. Each row ("tuple") has two hidden columns: xmin and xmax

    =# SELECT xmin, xmax, name, rating -# FROM whiskey -# WHERE name LIKE '%Laddie Ten%'; xmin | xmax | name | rating --------+---------+----------------------------------+-------- 122406 | 0 | Bruichladdich The Laddie Ten | 91 122406 | 1831785 | Bruichladdich The Laddie Ten | 85 xmin: XID which created the row xmax: XID which updated or deleted the row @wolever
  12. =# BEGIN; =# SELECT txid_current(); 1234; =# INSERT INTO whiskey

    -# VALUES ('Bruich Laddie The Laddie Ten', 99); =# SELECT xmin, xmax, name, rating -# FROM whiskey -# WHERE name LIKE '%Laddie Ten%' LIMIT 1; xmin | xmax | name | rating -------+-------+----------------------------------+-------- 1234 | 0 | Bruichladdich The Laddie Ten | 99 =# COMMIT; @wolever
  13. =# BEGIN; =# SELECT txid_current(); 1235; =# UPDATE whiskey SET

    rating = 100 WHERE rating = 99; =# SELECT xmin, xmax, name, rating
 -# FROM whiskey
 -# WHERE name LIKE '%Laddie Ten%' LIMIT 1; xmin | xmax | name | rating -------+-------+----------------------------------+-------- 1235 | 0 | Bruichladdich The Laddie Ten | 100 ^^^^ A new tuple was inserted! =# ROLLBACK; =# SELECT xmin, xmax, name, rating
 -# FROM whiskey
 -# WHERE name LIKE '%Laddie Ten%' LIMIT 1; xmin | xmax | name | rating -------+-------+----------------------------------+-------- 1234 | 1235 | Bruichladdich The Laddie Ten | 99 @wolever
  14. xmin | xmax | name | rating -------+-------+----------------------------------+-------- 1235 |

    1235 | Bruichladdich The Laddie Ten | 100 And if we could look into the db page, we'd see this tuple: @wolever
  15. =# SELECT * FROM users; id | name ----+--------- 1

    | David 2 | Alex =# BEGIN; =# DELETE FROM users; =# ROLLBACK; @wolever
  16. def is_tuple_visible(cur_xid, row): if txn_is_aborted(row.xmin) or row.xmin > cur_xid: return

    False return ( txn_is_aborted(row.xmax) or tuple.xmax >= cur_xid ) def txn_is_aborted(xid): # See pg_clog and "hint bits" for details # https://wiki.postgresql.org/wiki/Hint_Bits @wolever
  17. COUNT(*) is slow because there is no one “correct” COUNT(*)

    COUNT(*) depends on the current transaction @wolever
  18. What happens when XID overflows? Ask our friends at Sentry

    @wolever https://blog.sentry.io/2015/07/23/transaction-id-wraparound-in-postgres.html

  19. WARNING: database "whiskey" must be vacuumed within 177009986 transactions HINT:

    To avoid a database shutdown, execute a database-wide VACUUM in "whiskey". @wolever
  20. ERROR: database is not accepting commands to avoid wraparound data

    loss in database "whiskey" HINT: Stop the postmaster and use a standalone backend to VACUUM in "whiskey". @wolever
  21. References: - Postgres Internals Presentations:
 http://momjian.us/main/presentations/internals.html - Especially: http://momjian.us/main/writings/pgsql/mvcc.pdf -

    Introduction to Postgres' Physical Storage:
 http://rachbelaid.com/introduction-to-postgres-physical-storage/ - Transaction ID wraparound:
 https://blog.sentry.io/2015/07/23/transaction-id-wraparound-in-postgres.html @wolever