Slide 1

Slide 1 text

Deep dive into PostgreSQL Statistics. Alexey Lesovsky

Slide 2

Slide 2 text

Agenda PostgreSQL activity statistics. How to use statistics effectively. Solve problems with statistics.

Slide 3

Slide 3 text

Don't talk about Nothing about query planner statistics. Tools and programs overview.

Slide 4

Slide 4 text

Goals Statistics is useful. Using statistics are not scary. Which statistics can be used in particular situation.

Slide 5

Slide 5 text

Black box $ ps hf -u postgres -o cmd /usr/pgsql-9.4/bin/postgres -D /var/lib/pgsql/9.4/data \_ postgres: logger process \_ postgres: checkpointer process \_ postgres: writer process \_ postgres: wal writer process \_ postgres: autovacuum launcher process \_ postgres: stats collector process \_ postgres: postgres pgbench [local] idle in transaction \_ postgres: postgres pgbench [local] idle \_ postgres: postgres pgbench [local] UPDATE \_ postgres: postgres pgbench [local] UPDATE waiting \_ postgres: postgres pgbench [local] UPDATE

Slide 6

Slide 6 text

Where time is wasted Write Ahead Log Shared Buffers Buffers IO Autovacuum Workers Autovacuum Launcher Background Workers Indexes IO Query Execution Query Planning Client Backends Postmaster Relations IO Logger Process Stats Collector Logical Replication WAL Sender Process Archiver Process Background Writer Checkpointer Process Network Storage Recovery Process WAL Receiver Process Tables/Indexes Data Files

Slide 7

Slide 7 text

Problems Information too much (more than 100 counters in 9.4). Statistics provided as an online counters. No history (but reset functions are available). No native handy stat tools in PostgreSQL. Too many 3rd party tools and programs.

Slide 8

Slide 8 text

Problems Information too much (more than 100 counters in 9.4). Statistics provided as an online counters. No history (but reset functions are available). No native handy stat tools in PostgreSQL. Too many 3rd party tools and programs. Important to use stats directly from Postgres. Basic SQL skills are required.

Slide 9

Slide 9 text

Statistics have Events happens in postgres cluster. Objects properties (databases, tables, indexes). Time spent on events.

Slide 10

Slide 10 text

Statistics sources Counters in shared memory. Functions. Builtin views. Official contribs package. Unofficial contribs.

Slide 11

Slide 11 text

Statistics sources pg_stat_all_tables pg_current_xlog_location() pg_replication_slots pg_stat_replication pg_stat_archiver pg_stat_database_conflicts pg_buffercache pg_stat_database pg_stat_user_tables pg_stat_database pg_stat_bgwrier pgstattuple pg_table_size() pg_database_size pg_index_size() pg_stat_activity EXPLAIN pg_locks pg_stat_user_functions pg_stat_all_indexes pg_statio_all_indexes pg_statio_all_tables pg_xlog_location_diff() pg_stat_kcache pg_stat_statements pg_roles pg_database pg_class pg_relation_size() pg_total_relation_size() current_setting() pg_current_xlog_insert_location() pg_last_xact_replay_timestamp() pg_last_xlog_replay_location() pg_tablespace_size() pg_ls_dir()

Slide 12

Slide 12 text

Write Ahead Log Shared Buffers Buffers IO Autovacuum Workers Autovacuum Launcher Background Workers Indexes IO Query Execution Query Planning Client Backends Postmaster Tables IO Logger Process Stats Collector Logical Replication WAL Sender Process Archiver Process Background Writer Checkpointer Process Network Storage Recovery Process WAL Receiver Process Tables/Indexes Data Files pg_stat_all_tables pg_current_xlog_location pg_replication_slots pg_stat_replication pg_stat_archiver pg_stat_database_conflicts pg_buffercache pg_stat_database pg_stat_activity pg_stat_user_tables pg_stat_database pg_stat_bgwrier pgstattuple pg_table_size pg_database_size pg_index_size pg_stat_activity EXPLAIN pg_stat_statements pg_locks pg_stat_user_functions pg_stat_all_indexes pg_stat_database pg_statio_all_indexes pg_statio_all_tables pg_xlog_location_diff nicstat* iostat* pg_stat_kcache pg_stat_statements

Slide 13

Slide 13 text

pg_stat_database $ select * from pg_stat_database where datname = 'shop'; -[ RECORD 1 ]--+------------------------------------------- datid | 16414 datname | shop numbackends | 34 xact_commit | 51167155051 xact_rollback | 44781878826 blks_read | 7978770895 blks_hit | 9683551077519 tup_returned | 12507331807583 tup_fetched | 3885840966616 tup_inserted | 2898024857 tup_updated | 3082071349 tup_deleted | 2751363323 conflicts | 0 temp_files | 377675 temp_bytes | 4783712399875 deadlocks | 151 blk_read_time | 214344118.089 blk_write_time | 1260880.747 stats_reset | 2015-05-31 11:37:52.017967+03

Slide 14

Slide 14 text

Cache hit ratio $ select * from pg_stat_database; ... blks_read | 7978770895 blks_hit | 9683551077519 ... $ select sum(blks_hit)*100/sum(blks_hit+blks_read) as hit_ratio from pg_stat_database; More = better, and not less than 90%

Slide 15

Slide 15 text

Anomalies $ select * from pg_stat_database; ... xact_commit | 51167155051 xact_rollback | 44781878826 conflicts | 0 temp_files | 377675 temp_bytes | 4783712399875 deadlocks | 151 ... $ select datname, (xact_commit*100)/(xact_commit+xact_rollback) as c_ratio, deadlocks, conflicts, temp_file, pg_size_pretty(temp_bytes) as temp_size from pg_stat_database;

Slide 16

Slide 16 text

Anomalies $ select * from pg_stat_database; ... xact_commit | 51167155051 < 95 % xact_rollback | 44781878826 > 5 % conflicts | 0 > 10 temp_files | 377675 > 100 temp_bytes | 4783712399875 > 10 GB deadlocks | 151 > 10 ... $ select datname, (xact_commit*100)/(xact_commit+xact_rollback) as c_ratio, deadlocks, conflicts, temp_file, pg_size_pretty(temp_bytes) as temp_size from pg_stat_database;

Slide 17

Slide 17 text

pg_stat_bgwriter $ select * from pg_stat_bgwriter; -[ RECORD 1 ]---------+------------------------------------ checkpoints_timed | 3267 checkpoints_req | 6 checkpoint_write_time | 10416478591 checkpoint_sync_time | 405039 buffers_checkpoint | 2518065526 buffers_clean | 99602799 maxwritten_clean | 157 buffers_backend | 219356924 buffers_backend_fsync | 0 buffers_alloc | 3477374822 stats_reset | 2015-05-31 11:09:48.413185+03

Slide 18

Slide 18 text

Checkpoint's $ select * from pg_stat_bgwriter; ... checkpoints_timed | 3267 checkpoints_req | 6 ... checkpoints_req > checkpoints_timed = bad postgresql.conf ● checkpoint_segments ● checkpoint_timeout ● checkpoint_completion_target

Slide 19

Slide 19 text

Background Writer $ select * from pg_stat_bgwriter; ... buffers_clean | 99602799 maxwritten_clean | 157 > 0 buffers_backend_fsync | 0 > 0 ... Clean dirty pages for backends. Helps checkpointer. postgresql.conf ● bgwriter_delay ● bgwriter_lru_maxpages ● bgwriter_lru_multiplier

Slide 20

Slide 20 text

pg_stat_replication $ select * from pg_stat_replication; -[ RECORD 1 ]----+------------------------------------ pid | 26921 usesysid | 15588142 usename | replica application_name | walreceiver client_addr | 10.0.0.7 client_hostname | client_port | 32956 backend_start | 2015-10-01 19:14:42.979377+03 state | streaming sent_location | 1691/EEE65900 write_location | 1691/EEE65900 flush_location | 1691/EEE65900 replay_location | 1691/EEE658D0 sync_priority | 0 sync_state | async

Slide 21

Slide 21 text

Replication lag $ select * from pg_stat_replication; ... sent_location | 1691/EEE65900 write_location | 1691/EEE65900 flush_location | 1691/EEE65900 replay_location | 1691/EEE658D0 ... 1692/EEE65900 — location in transaction log (WAL) All values are equals = ideal

Slide 22

Slide 22 text

Replication lag What can cause lag: Networking Storage CPU How much bytes written in WAL $ select pg_xlog_location_diff(pg_current_xlog_location(),'0/00000000'); Replication lag in bytes $ select client_addr, pg_xlog_location_diff(pg_current_xlog_location(), replay_location) from pg_stat_replication; Replication lag in seconds $ select extract(epoch from now() - pg_last_xact_replay_timestamp());

Slide 23

Slide 23 text

pg_stat_all_tables $ select * from pg_stat_all_tables; -[ RECORD 1 ]-------+------------------------------------ relid | 98221 schemaname | public relname | clientsession seq_scan | 192 seq_tup_read | 364544695 idx_scan | 2691939318 idx_tup_fetch | 2669551448 n_tup_ins | 239532851 n_tup_upd | 736119030 n_tup_del | 239898968 n_tup_hot_upd | 497688344 n_live_tup | 3493472 n_dead_tup | 1606 n_mod_since_analyze | 349526 ...

Slide 24

Slide 24 text

Sequential scans $ select * from pg_stat_all_tables; ... seq_scan | 192 seq_tup_read | 364544695 > 1000 (seq_tup_avg) ... $ select relname, pg_size_pretty(pg_relation_size(relname::regclass)) as size, seq_scan, seq_tup_read, seq_scan / seq_tup_read as seq_tup_avg from pg_stat_user_tables where seq_tup_read > 0 order by 3,4 desc limit 5;

Slide 25

Slide 25 text

Tables size $ select relname, pg_size_pretty(pg_total_relation_size(relname::regclass)) as full_size, pg_size_pretty(pg_relation_size(relname::regclass)) as table_size, pg_size_pretty(pg_total_relation_size(relname::regclass) - pg_relation_size(relname::regclass)) as index_size from pg_stat_user_tables order by pg_total_relation_size(relname::regclass) desc limit 10; psql meta-commands: \dt+ and \di+

Slide 26

Slide 26 text

Write activity $ select * from pg_stat_all_tables; ... n_tup_upd | 736119030 n_tup_hot_upd | 497688344 ... UPDATE = DELETE + INSERT UPDATE = TABLE UPDATE + INDEX UPDATE

Slide 27

Slide 27 text

Write activity $ select s.relname, pg_size_pretty(pg_relation_size(relid)), coalesce(n_tup_ins,0) + 2 * coalesce(n_tup_upd,0) - coalesce(n_tup_hot_upd,0) + coalesce(n_tup_del,0) AS total_writes, (coalesce(n_tup_hot_upd,0)::float * 100 / (case when n_tup_upd > 0 then n_tup_upd else 1 end)::float)::numeric(10,2) AS hot_rate, (select v[1] FROM regexp_matches(reloptions::text,E'fillfactor=(\\d+)') as r(v) limit 1) AS fillfactor from pg_stat_all_tables s join pg_class c ON c.oid=relid order by total_writes desc limit 50; What is Heap-Only Tuples? HOT do not cause index update. HOT only for non-indexed columns. Big n_tup_hot_upd = good. How increase n_tup_hot_upd?

Slide 28

Slide 28 text

Write activity | pg_size_pretty | total_writes | hot_rate | fillfactor -+----------------+--------------+----------+------------ | 9418 MB | 391459091 | 5.03 | | 417 MB | 285948760 | 0.00 | | 76 MB | 232031972 | 87.31 | 70 | 7123 MB | 124281107 | 99.36 | 70 Fillfactor determine page packing ratio (between 10 and 100). Tables (and indexes) size overhead. ALTER TABLE table_name SET (fillfactor = 70);

Slide 29

Slide 29 text

Autovacuum queue $ select c.relname, current_setting('autovacuum_vacuum_threshold') as av_base_thresh, current_setting('autovacuum_vacuum_scale_factor') as av_scale_factor, (current_setting('autovacuum_vacuum_threshold')::int + (current_setting('autovacuum_vacuum_scale_factor')::float * c.reltuples)) as av_thresh, s.n_dead_tup from pg_stat_user_tables s join pg_class c ON s.relname = c.relname where s.n_dead_tup > (current_setting('autovacuum_vacuum_threshold')::int + (current_setting('autovacuum_vacuum_scale_factor')::float * c.reltuples)); | av_base_thresh | av_scale_factor | av_thresh | n_dead_tup +----------------+-----------------+-----------+------------ | 50 | 0.01 | 60302 | 85445 | 50 | 0.01 | 50 | 1349 | 50 | 0.01 | 55 | 12479 | 50 | 0.01 | 103 | 3134 | 50 | 0.01 | 52716 | 173100

Slide 30

Slide 30 text

Autovacuum queue $ select c.relname, current_setting('autovacuum_vacuum_threshold') as av_base_thresh, current_setting('autovacuum_vacuum_scale_factor') as av_scale_factor, (current_setting('autovacuum_vacuum_threshold')::int + (current_setting('autovacuum_vacuum_scale_factor')::float * c.reltuples)) as av_thresh, s.n_dead_tup from pg_stat_user_tables s join pg_class c ON s.relname = c.relname where s.n_dead_tup > (current_setting('autovacuum_vacuum_threshold')::int + (current_setting('autovacuum_vacuum_scale_factor')::float * c.reltuples)); | av_base_thresh | av_scale_factor | av_thresh | n_dead_tup +----------------+-----------------+-----------+------------ | 50 | 0.01 | 60302 | 60545 | 50 | 0.01 | 50 | 1349 | 50 | 0.01 | 55 | 12479 | 50 | 0.01 | 103 | 3134 | 50 | 0.01 | 52716 | 173100 Avito Experience https://github.com/eshkinkot/avito/blob/master/munin/vacuum_queue

Slide 31

Slide 31 text

pg_stat_all_indexes $ select * from pg_stat_all_indexes where idx_scan = 0; -[ RECORD 1 ]-+------------------------------------------ relid | 98242 indexrelid | 55732253 schemaname | public relname | products indexrelname | products_special2_idx idx_scan | 0 idx_tup_read | 0 idx_tup_fetch | 0

Slide 32

Slide 32 text

Unused indexes $ select * from pg_stat_all_indexes where idx_scan = 0; ... indexrelname | products_special2_idx idx_scan | 0 0 = bad ... https://github.com/PostgreSQL-Consulting/pg-utils/blob/master/sql/lo w_used_indexes.sql http://www.databasesoup.com/2014/05/new-finding-unused-indexes-q uery.html

Slide 33

Slide 33 text

Unused indexes $ select * from pg_stat_all_indexes where idx_scan = 0; ... indexrelname | products_special2_idx idx_scan | 0 0 = bad ... https://github.com/PostgreSQL-Consulting/pg-utils/blob/master/sql/lo w_used_indexes.sql http://www.databasesoup.com/2014/05/new-finding-unused-indexes-q uery.html Unused indexes are bad. Use storage. Slow down UPDATE, DELETE, INSERT operations. Extra work for VACUUM.

Slide 34

Slide 34 text

pg_stat_activity $ select * from pg_stat_activity; -[ RECORD 1 ]----+-------------------------------------------- datid | 16401 datname | ts pid | 116408 usesysid | 16384 usename | tsagent application_name | unicorn_rails worker client_addr | 172.17.0.37 client_hostname | client_port | 50888 backend_start | 2015-10-14 15:18:03.01039+00 xact_start | query_start | 2015-10-14 15:21:30.336325+00 state_change | 2015-10-14 15:21:30.33635+00 waiting | f state | idle backend_xid | backend_xmin | query | COMMIT

Slide 35

Slide 35 text

Summary activity $ select * from pg_stat_activity; ... datname | ts usename | tsagent client_addr | 172.17.0.37 ... Used connections ratio $ select count(*)*100/(select current_setting('max_connections')::int) from pg_stat_activity;

Slide 36

Slide 36 text

Summary activity $ select * from pg_stat_activity; ... datname | ts usename | tsagent client_addr | 172.17.0.37 ... $ select client_addr, usename, datname, count(*) from pg_stat_activity group by 1,2,3 order by 4 desc; client_addr | usename | datname | count -------------+-----------+---------+------- 127.0.0.1 | cron_role | ts | 508 127.0.0.1 | tsagent | ts | 60 127.0.0.1 | postgres | ts | 1 127.0.0.1 | pupkin | ts | 1

Slide 37

Slide 37 text

Long queries $ select * from pg_stat_activity; ... backend_start | 2015-10-14 15:18:03.01039+00 xact_start | 2015-10-14 15:21:15.336325+00 query_start | 2015-10-14 15:21:30.336325+00 state_change | 2015-10-14 15:21:30.33635+00 ...

Slide 38

Slide 38 text

Long queries $ select * from pg_stat_activity; ... backend_start | 2015-10-14 15:18:03.01039+00 xact_start | 2015-10-14 15:21:15.336325+00 query_start | 2015-10-14 15:21:30.336325+00 state_change | 2015-10-14 15:21:30.33635+00 ... $ select client_addr, usename, datname, clock_timestamp() - xact_start as xact_age, clock_timestamp() - query_start as query_age, query from pg_stat_activity order by xact_start, query_start;

Slide 39

Slide 39 text

Long queries $ select * from pg_stat_activity; ... backend_start | 2015-10-14 15:18:03.01039+00 xact_start | 2015-10-14 15:21:15.336325+00 query_start | 2015-10-14 15:21:30.336325+00 state_change | 2015-10-14 15:21:30.33635+00 ... $ select client_addr, usename, datname, clock_timestamp() - xact_start as xact_age, clock_timestamp() - query_start as query_age, query from pg_stat_activity order by xact_start, query_start; clock_timestamp() for calculating query or transaction age. Long queries: remember, terminate, optimize.

Slide 40

Slide 40 text

Bad xacts $ select * from pg_stat_activity where state in ('idle in transaction', 'idle in transaction (aborted)'; ... xact_start | 2015-10-14 15:21:21.128192+00 query_start | 2015-10-14 15:21:30.336325+00 state_change | 2015-10-14 15:21:30.33635+00 state | idle in transaction ...

Slide 41

Slide 41 text

Bad xacts $ select * from pg_stat_activity where state in ('idle in transaction', 'idle in transaction (aborted)'; ... xact_start | 2015-10-14 15:21:21.128192+00 query_start | 2015-10-14 15:21:30.336325+00 state_change | 2015-10-14 15:21:30.33635+00 state | idle in transaction ... idle in transaction, idle in transaction (aborted) = bad Warning value: > 5 clock_timestamp() for calculate xact age. Bad xacts: remember, terminate, optimize app.

Slide 42

Slide 42 text

Waiting clients $ select * from pg_stat_activity where waiting; ... xact_start | 2015-10-14 15:21:21.128192+00 query_start | 2015-10-14 15:21:30.336325+00 state_change | 2015-10-14 15:21:30.33635+00 waiting | t ...

Slide 43

Slide 43 text

Waiting clients $ select * from pg_stat_activity where waiting; ... xact_start | 2015-10-14 15:21:21.128192+00 query_start | 2015-10-14 15:21:30.336325+00 state_change | 2015-10-14 15:21:30.33635+00 waiting | t ... waiting = true = bad. clock_timestamp() for calculating query or xact age. Use pg_locks for searching blocking query or xact. Waiting queries: remember, terminate, optimize app.

Slide 44

Slide 44 text

Блокировки https://github.com/lesovsky/uber-scripts/blob/master/postgresql/sql/c4_06_sho w_locked_queries.sql http://big-elephants.com/2013-09/exploring-query-locks-in-postgres/ locked_item | products waiting | t waiting_query | update products set avail = false where id = 154; waiting_mode | RowExclusiveLock waiting_xact_age | 00:00:07.103685 waiting_query_age | 00:00:01.813508 waiting_pid | 5895 waiting_user | app_role waiting_state | active waiting | f locking_query | alter table products drop column avail; locking_mode | AccessExclusiveLock locking_xact_age | 00:00:16.416441 locking_query_age | 00:00:14.416441 locking_pid | 5844 locking_user | postgres locking_state | idle in transaction

Slide 45

Slide 45 text

pg_stat_statements $ select * from pg_stat_statements where blk_read_time <> 0 order by blk_read_time desc; -[ RECORD 1 ]-------+--------------------------------------- userid | 25078444 dbid | 16411 query | SELECT "id" FROM run_plan_xact(?) calls | 11165832 total_time | 11743325.6880088 rows | 11165832 shared_blks_hit | 351353214 shared_blks_read | 205557 shared_blks_dirtied | 256053 shared_blks_written | 0 local_blks_hit | 0 local_blks_read | 68894 local_blks_dirtied | 68894 local_blks_written | 0 temp_blks_read | 0 temp_blks_written | 0 blk_read_time | 495425.535999976 blk_write_time | 0

Slide 46

Slide 46 text

pg_stat_statements $ select * from pg_stat_statements; ... query | SELECT "id" FROM run_plan_xact(?) calls | 11165832 total_time | 11743325.6880088 rows | 11165832 blk_read_time | 495425.535999976 blk_write_time | 0 Statements average time in ms $ select (sum(total_time) / sum(calls))::numeric(6,3) from pg_stat_statements; Most writing (to shared_buffers) queries $ select query, shared_blks_dirtied from pg_stat_statements where shared_blks_dirtied > 0 order by 2 desc;

Slide 47

Slide 47 text

Query reports query total time: 15:43:07 (14.9%, CPU: 18.2%, IO: 9.0%) сalls: 476 (0.00%) rows: 476,000 avg_time: 118881.54ms (IO: 21.2%) user: app_user db: ustats query: select filepath, type, deviceuid from imv5event where state = ?::eventstate and servertime between $1 and $2 order by servertime desc LIMIT $3 OFFSET $4

Slide 48

Slide 48 text

Query reports query total time: 15:43:07 (14.9%, CPU: 18.2%, IO: 9.0%) сalls: 476 (0.00%) rows: 476,000 avg_time: 118881.54ms (IO: 21.2%) user: app_user db: ustats query: select filepath, type, deviceuid from imv5event where state = ?::eventstate and servertime between $1 and $2 order by servertime desc LIMIT $3 OFFSET $4 Use sum() for calculating totals. Calculate our query «contribution» in totals. Resource usage (CPU, IO).

Slide 49

Slide 49 text

Behind the scene pg_statio_all_tables, pg_statio_all_indexes. pg_stat_user_functions. Size functions - \df *size* pgstattuple (official contrib) ● Bloat estimation for tables and indexes. ● Estimation time depends on table (or index) size. pg_buffercache (official contrib) ● Shared buffers inspection. ● Heavy performance impact (buffers lock).

Slide 50

Slide 50 text

Behind the scene pgfincore (3rd party contrib) ● Low-level operations with tables using mincore(). ● OS page cache inspection. pg_stat_kcache (3rd party contrib) ● Using getrusage() before and after query. ● CPU usage and real filesystem operations stats. ● Requires pg_stat_statements and postgresql-9.4. ● No performance impact.

Slide 51

Slide 51 text

Resume ● The ability to use statistics is useful. ● Statistics in Postgres is not difficult. ● Statistics will help answer the questions. ● Do experiment.

Slide 52

Slide 52 text

Links Author, write more!! www.blog.postgresql-consulting.com (english) www.thislinux.org (russian only) The Statistics Collector http://www.postgresql.org/docs/current/static/monitoring-stats.html System Administration Functions http://www.postgresql.org/docs/current/static/functions-admin.html Contrib modules http://www.postgresql.org/docs/current/static/pgstatstatements.html http://www.postgresql.org/docs/current/static/pgstattuple.html http://www.postgresql.org/docs/current/static/pgbuffercache.html https://github.com/klando/pgfincore https://github.com/dalibo/pg_stat_kcache SQL utils and sql code examples https://github.com/PostgreSQL-Consulting/pg-utils

Slide 53

Slide 53 text

Write Ahead Log Shared Buffers Buffers IO Autovacuum Workers Autovacuum Launcher Background Workers Indexes IO Query Execution Query Planning Client Backends Postmaster Tables IO Logger Process Stats Collector Logical Replication WAL Sender Process Archiver Process Background Writer Checkpointer Process Network Storage Recovery Process WAL Receiver Process Tables/Indexes Data Files pg_stat_all_tables pg_current_xlog_location pg_replication_slots pg_stat_replication pg_stat_archiver pg_stat_database_conflicts pg_buffercache pg_stat_database pg_stat_activity pg_stat_user_tables pg_stat_database pg_stat_bgwrier pgstattuple pg_table_size pg_database_size pg_index_size pg_stat_activity EXPLAIN pg_stat_statements pg_locks pg_stat_user_functions pg_stat_all_indexes pg_stat_database pg_statio_all_indexes pg_statio_all_tables pg_xlog_location_diff nicstat* iostat* pg_stat_kcache pg_stat_statements Thanks. Questions?