Slide 1

Slide 1 text

@LukasFittl Advanced pg_stat_statements: Filtering, Regression Testing & more

Slide 2

Slide 2 text

Skilled Developer Amateur Hacker @LukasFittl

Slide 3

Slide 3 text

12mil+ unique queries tracked using pg_stat_statements pganalyze.com

Slide 4

Slide 4 text

Intro userid | 10 dbid | 1397527 query | SELECT * FROM x WHERE y = ? calls | 5 total_time | 15.249 rows | 0 shared_blks_hit | 451 shared_blks_read | 41 shared_blks_dirtied | 26 shared_blks_written | 0 local_blks_hit | 0 local_blks_read | 0 local_blks_dirtied | 0 local_blks_written | 0 temp_blks_read | 0 temp_blks_written | 0 blk_read_time | 0 blk_write_time | 0 pg_stat_statements

Slide 5

Slide 5 text

Intro Supported on major cloud platforms Amazon RDS Heroku Postgres

Slide 6

Slide 6 text

Intro PgHero

Slide 7

Slide 7 text

Intro pganalyze

Slide 8

Slide 8 text

Intro PoWA

Slide 9

Slide 9 text

Intro SELECT * FROM pg_stat_statements LIMIT 1; userid | 10 dbid | 17025 queryid | 1720234670 query | SELECT * FROM x WHERE y = ? calls | 14 total_time | 0.151 rows | 28 shared_blks_hit | 14 shared_blks_read | 0 shared_blks_dirtied | 0 shared_blks_written | 0 local_blks_hit | 0 local_blks_read | 0 local_blks_dirtied | 0 local_blks_written | 0 temp_blks_read | 0 temp_blks_written | 0 blk_read_time | 0 blk_write_time | 0

Slide 10

Slide 10 text

Intro queryid | 1720234670 query | SELECT * FROM x WHERE y = ? calls | 5 total_time | 15.249 Query + No. of Calls + Avg Time

Slide 11

Slide 11 text

Intro shared_blks_hit | 2447215 shared_blks_read | 55335 Avg. Shared Buffer Hit Rate: 97% hit_rate = shared_blks_hit / (shared_blks_hit + shared_blks_read)

Slide 12

Slide 12 text

Intro blk_read_time | 14.594 blk_write_time | 465.661 Time spent reading/writing to disk track_io_timing = on (has overhead!)

Slide 13

Slide 13 text

pg_query Improving Data Quality Filtering & Regression Testing

Slide 14

Slide 14 text

pg_query Filtering & Regression Testing Improving Data Quality

Slide 15

Slide 15 text

Truncation (pre 9.4) SELECT "postgres_settings".* FROM "postgres_settings" WHERE "postgres_settings"."database_id" = $1 AND "postgres_settings"."invalidated_at_snapshot_id" IS NULL AND (id not in (70288,70289,70290,70291,70292,70293,70294,70295,70296,70297,70298 ,70299,70300,70301,70302,70303,70304,70305,70306,70307,70308,70309 ,70310,70311,70312,70313,70314,70315,70316,70317,70318,70319,70320 ,70321,70322,70323,70324,70325,70326,70327,99059,99060,70330,70331 ,70332,70333,70334,70335,70336,70337,70338,99061,70340,70341,70342 ,70343,70344,70345,70346,70347,70348,70349,70350,70351,70352,70353 ,70354,70355,70356,70357,70358,70359,70360,99062,70362,70363,70364 ,70365,70366,70367,70368,70369,70370,70371,70372,70373,70374,70375 ,70376,70377,70378,70379,70380,70381,70382,70383,70384,70385,70386 ,99063,99064,99065,99066,99067,70392,70393,70394,70395,70396,70397 ,70398,70399,70400,70401,70402,70403,70404,70405,99068,70407,70408 ,70409,70410,70411,70412,70413,70414,70415,70416,70417,99069,70419 ,70420,70421,99070,70423,70424,70425,70426,70427,70428, Improving Data Quality

Slide 16

Slide 16 text

Improving Data Quality Race Condition during pg_stat_statements_reset() -[ RECORD 1 ]———+-------------------------------- query | SELECT * FROM x WHERE y = ? calls | 5 total_time | 15.249 -[ RECORD 2 ]———+-------------------------------- query | SELECT * FROM z WHERE a = 123 calls | 50 total_time | 104.19

Slide 17

Slide 17 text

Lesson Learned: Avoid frequent pg_stat_statements_reset() Improving Data Quality

Slide 18

Slide 18 text

Fingerprinting SELECT a, b FROM x SELECT b, a FROM x != SELECT a AS b SELECT a AS c == y IN (?, ?, ?) y IN (?, ?) != Problematic: Improving Data Quality DEALLOCATE p141 DEALLOCATE p150 !=

Slide 19

Slide 19 text

PostgreSQL 9.4 Improving Data Quality query | SELECT * FROM x WHERE y = ? query | SELECT * FROM x WHERE y = ? queryid | 2515120125

Slide 20

Slide 20 text

PostgreSQL 9.5 Improving Data Quality min_time | 0.25110 max_time | 51.2103 mean_time | 10.2421 stddev_time | …

Slide 21

Slide 21 text

pg_qualstats Improving Data Quality https://github.com/dalibo/pg_qualstats WHERE column1 = ? Gives you statistics on predicates:

Slide 22

Slide 22 text

pg_stat_plans Improving Data Quality Unmaintained pg_s_s variant that differentiates between query plans. Don’t use it before this bug is fixed: https://github.com/2ndQuadrant/pg_stat_plans/issues/39

Slide 23

Slide 23 text

Improving Data Quality Filtering & Regression Testing pg_query

Slide 24

Slide 24 text

Storing & Cleaning pg_stat_statements data pg_query

Slide 25

Slide 25 text

pg_query {“schema”: {“n_live_tup”: 75, "relpages": 1, "reltuples": 75.0,…}, “queries”: [{..}, {..}]} Snapshot Monitoring Setup Production Database Collector {“schema”: {“n_live_tup”: 75, "relpages": 1, "reltuples": 75.0,…}, “queries”: [{..}, {..}]} Monitoring Database Parse Fingerprint Normalize Extract Tables

Slide 26

Slide 26 text

pg_query queries id | 7053479 database_id | 1 received_query | SELECT * FROM x WHERE y = ? normalized_query | SELECT * FROM x WHERE y = ? created_at | 2014-06-27 16:20:08.334705 updated_at | 2014-06-27 16:20:08.334705 parse_tree | [{"SELECT":{...}] parse_error | parse_warnings | statement_types | {SELECT} truncated | f fingerprint | 00704f1fd8442b7c17821cb8a61856c3d61b330e

Slide 27

Slide 27 text

pg_query query_snapshots id | 170661585 query_id | 7053479 calls | 29 total_time | 94.38 rows | 29 snapshot_id | 3386118 id | 3386118 database_id | 408 collected_at | 2014-09-09 20:10:01 submitter | pganalyze-collector 0.6.1 query_source | pg_stat_statements snapshots

Slide 28

Slide 28 text

pg_query Parse Fingerprint Normalize Extract Tables

Slide 29

Slide 29 text

pg_query Parsing an SQL Query Parse Fingerprint Normalize Extract Tables

Slide 30

Slide 30 text

pg_query EXPLAIN (PARSETREE TRUE) SELECT * FROM x WHERE y = 1 ({SELECT :distinctClause <> :intoClause <> :targetList ( {RESTARGET :name <> :indirection <> :val {COLUMNREF :fields ({A_STAR}) :location 7} :location 7}) :fromClause ( {RANGEVAR :schemaname <> :relname x :inhOpt 2 :relpersistence p :alias <> :location 14}) :whereClause {AEXPR :name (“=") :lexpr {COLUMNREF :fields ("y") :location 22} :rexpr {PARAMREF :number 0 :location 26} :location 24} Unfortunately doesn’t exist.

Slide 31

Slide 31 text

pg_query Parse Statement raw_parse(..) pg_catalog Rewrite Query Query Planner Execute

Slide 32

Slide 32 text

pg_query tree = raw_parser(query_str); str = nodeToString(tree); printf(str); ({SELECT :distinctClause <> :intoClause <> :targetList ( {RESTARGET :name <> :indirection <> :val {COLUMNREF :fields ({A_STAR}) :location 7} :location 7}) :fromClause ( {RANGEVAR :schemaname <> :relname x :inhOpt 2 :relpersistence p :alias <> :location 14}) :whereClause {AEXPR :name (“=") :lexpr {COLUMNREF :fields ("y") :location 22} :rexpr {PARAMREF :number 0 :location 26} :location 24}

Slide 33

Slide 33 text

pg_query Parse Statement raw_parse(..) pg_catalog Rewrite Query Query Planner Execute

Slide 34

Slide 34 text

github.com/pganalyze/pg_query pg_query Extension Compiles parts of PostgreSQL when you do “gem install pg_query”

Slide 35

Slide 35 text

pg_query ({SELECT :distinctClause <> :intoClause <> :targetList ( {RESTARGET :name <> :indirection <> :val {COLUMNREF :fields ({A_STAR}) :location 7} :location 7}) :fromClause ( {RANGEVAR :schemaname <> :relname x :inhOpt 2 :relpersistence p :alias <> :location 14}) :whereClause {AEXPR :name (“=") :lexpr {COLUMNREF :fields ("y") :location 22} :rexpr {PARAMREF :number 0 :location 26} :location 24} :groupClause <> :havingClause <> :windowClause <> :valuesLists <> :sortClause <> :limitOffset <> :limitCount <> :lockingClause <> :withClause <> PgQuery._raw_parse( “SELECT * FROM x WHERE y = 1”)

Slide 36

Slide 36 text

pg_query WARNING: 01000: could not dump unrecognized node type: 754 PgQuery._raw_parse(“CREATE SCHEMA foo”) nodeToString is incomplete :(

Slide 37

Slide 37 text

src/backend/nodes/outfuncs.c pg_query Patch: Generate automatically, JSON output

Slide 38

Slide 38 text

pg_query PgQuery._raw_parse( “SELECT * FROM x WHERE y = 1”) [{"SELECT": { "targetList": [{ "RESTARGET": { "val": { "COLUMNREF": { "fields": [{"A_STAR": {}}], "location": 7 } }, "location": 7 } } ], "fromClause": [ { "RANGEVAR": { "relname": "x", "inhOpt": 2, "relpersistence": "p", "location": 14 } } ], "whereClause": { "AEXPR": { "name": [ "=" ], "lexpr": {

Slide 39

Slide 39 text

pg_query Parsing a normalized SQL query Parse Fingerprint Normalize Extract Tables

Slide 40

Slide 40 text

EXPLAIN SELECT * FROM x WHERE y = 1 QUERY PLAN --------------------------------------------------------------------- Index Scan using idx_for_y on x (cost=0.15..8.17 rows=1 width=140) Index Cond: (id = 1) Parse Analyze Plan pg_query

Slide 41

Slide 41 text

EXPLAIN SELECT * FROM x WHERE y = ? ERROR: syntax error at or near ";" LINE 1: EXPLAIN SELECT * FROM x WHERE y = ?; Parse Analyze Plan pg_query

Slide 42

Slide 42 text

EXPLAIN SELECT * FROM x WHERE y = ? EXPLAIN SELECT * FROM x WHERE y = $1 ERROR: there is no parameter $1 LINE 1: EXPLAIN SELECT * FROM x WHERE y = $1; Parse Analyze Plan pg_query

Slide 43

Slide 43 text

pg_query Parser Patch to support parsing “?”

Slide 44

Slide 44 text

pg_query Downside: Breaks ? operator in some cases Real fix: Don’t use ? as a replacement character.

Slide 45

Slide 45 text

pg_query Fingerprinting Parse Fingerprint Normalize Extract Tables

Slide 46

Slide 46 text

pg_query > require ‘pg_query’ > q1 = PgQuery.parse(‘SELECT a, b FROM x’) > q1.fingerprint [“c72f1bc9feda72c0b4ba030eea90b4fed3ac8e86”] > q2 = PgQuery.parse(‘SELECT b, a FROM x’) > q2.fingerprint [“c72f1bc9feda72c0b4ba030eea90b4fed3ac8e86”]

Slide 47

Slide 47 text

pg_query 40 lines of unit-tested Ruby code

Slide 48

Slide 48 text

pg_query Extracting Table References Parse Fingerprint Normalize Extract Tables

Slide 49

Slide 49 text

pg_query > require ‘pg_query’ > q = PgQuery.parse(‘SELECT * FROM x’) > q.tables [“x”]

Slide 50

Slide 50 text

pg_query ~90 lines of unit-tested Ruby code

Slide 51

Slide 51 text

github.com/pganalyze/pg_query pg_query

Slide 52

Slide 52 text

pg_query Improving Data Quality Filtering & Regression Testing

Slide 53

Slide 53 text

Filtering Filtering & Regression Testing

Slide 54

Slide 54 text

pg_qtop Simple top-like tool that shows pg_stat_statements data Filtering & Regression Testing https://github.com/lfittl/pg_qtop

Slide 55

Slide 55 text

Filtering & Regression Testing AVG | QUERY -------------------------------------------------------------------------------- 10.7ms | SELECT oid, typname, typelem, typdelim, typinput FROM pg_type 3.0ms | SET time zone 'UTC' 0.4ms | SELECT a.attname, format_type(a.atttypid, a.atttypmod), pg_get_expr(d.adbin, d.adrelid), a.attnotnull, a.atttypid, a.atttypmod FROM pg_attribute a LEFT JOIN pg_attrdef d ON a.attrelid = d.adrelid AND a.attnum = d.adnum WHERE a.attrelid = ?::regclass AND a.attnum > ? AND NOT a.attisdropped ORDER BY a.attnum 0.2ms | SELECT pg_stat_statements_reset() 0.1ms | SELECT query, calls, total_time FROM pg_stat_statements 0.1ms | SELECT attr.attname FROM pg_attribute attr INNER JOIN pg_constraint cons ON attr.attrelid = cons.conrelid AND attr.attnum = cons.conkey[?] WHERE cons.contype = ? AND cons.conrelid = ?: :regclass 0.0ms | SELECT COUNT(*) FROM pg_class c LEFT JOIN pg_namespace n ON n.oid = c.relnamespace WHERE c.relkind in (?,?) AND c.relname = ? AND n.nspname = ANY (current_schemas(?)) 0.0ms | SELECT * FROM posts JOIN users ON (posts.author_id = users.id) WHERE users.login = ?; 0.0ms | SET client_min_messages TO 'panic' 0.0ms | set client_encoding to 'UTF8' 0.0ms | SHOW client_min_messages 0.0ms | SELECT * FROM ad_reels WHERE id = ?; 0.0ms | SELECT * FROM posts WHERE guid = ?; 0.0ms | SELECT ? 0.0ms | SET client_min_messages TO 'warning' 0.0ms | SET standard_conforming_strings = on 0.0ms | SELECT "posts".* FROM "posts" ORDER BY "posts"."id" DESC LIMIT ? 0.0ms | SHOW TIME ZONE pg_qtop -d testdb

Slide 56

Slide 56 text

Filtering & Regression Testing AVG | QUERY -------------------------------------------------------------------------------- 0.0ms | SELECT * FROM posts JOIN users ON (posts.author_id = users.id) WHERE users.login = ?; 0.0ms | SELECT * FROM posts WHERE guid = ?; 0.0ms | SELECT "posts".* FROM "posts" ORDER BY "posts"."id" DESC LIMIT ? pg_qtop -d testdb -t posts

Slide 57

Slide 57 text

Filtering & Regression Testing AVG | CALLS | HIT RATE | QUERY -------------------------------------------------------------------------------- 0.1ms | 1 | 100.0 | SELECT * FROM users; 0.1ms | 1 | - | SELECT * FROM databases; 0.0ms | 1 | - | SELECT * FROM invoices; 0.0ms | 1 | - | SELECT * FROM query_snapshots; pg_qtop -d testdb -s select

Slide 58

Slide 58 text

github.com/lfittl/pg_qtop Filtering & Regression Testing gem install pg_qtop

Slide 59

Slide 59 text

Regression Testing Filtering & Regression Testing

Slide 60

Slide 60 text

Which query plans are affected by removal of an index? How would execution plans be affected by an upgrade to 9.X? Filtering & Regression Testing

Slide 61

Slide 61 text

Regression Test based on pg_stat_statements + table statistics. (no actual data) Filtering & Regression Testing

Slide 62

Slide 62 text

"n_live_tup": 75, "relpages": 1, "reltuples": 75.0, “stanumbers1": [..], "stavalues1": “{..}”, … Schema Dump + Table Level Statistics Local Test Database Testing Setup Production Database EXPLAIN SELECT FROM x WHERE y = ? Filtering & Regression Testing

Slide 63

Slide 63 text

EXPLAIN SELECT * FROM x WHERE y = ? EXPLAIN SELECT * FROM x WHERE y = $1 ERROR: there is no parameter $1 LINE 1: EXPLAIN SELECT * FROM x WHERE y = $1; Parse Analyze Plan Filtering & Regression Testing

Slide 64

Slide 64 text

y = $1 ERROR: there is no parameter $0 LINE 1: EXPLAIN SELECT * FROM x WHERE y = $0; Filtering & Regression Testing

Slide 65

Slide 65 text

y = $1 y = NULL QUERY PLAN ---------------------------------------------------------------- Result (cost=0.00..21.60 rows=1 width=40) One-Time Filter: NULL::boolean -> Seq Scan on x (cost=0.00..21.60 rows=1 width=40) ERROR: there is no parameter $0 LINE 1: EXPLAIN SELECT * FROM x WHERE y = $0; Filtering & Regression Testing

Slide 66

Slide 66 text

y = $1 y = NULL QUERY PLAN ---------------------------------------------------------------- Result (cost=0.00..21.60 rows=1 width=40) One-Time Filter: NULL::boolean -> Seq Scan on x (cost=0.00..21.60 rows=1 width=40) y = (SELECT null) ERROR: failed to find conversion function from unknown to integer ERROR: there is no parameter $0 LINE 1: EXPLAIN SELECT * FROM x WHERE y = $0; Filtering & Regression Testing

Slide 67

Slide 67 text

y = $1 y = NULL QUERY PLAN ---------------------------------------------------------------- Result (cost=0.00..21.60 rows=1 width=40) One-Time Filter: NULL::boolean -> Seq Scan on x (cost=0.00..21.60 rows=1 width=40) y = (SELECT null) ERROR: failed to find conversion function from unknown to integer y = (SELECT null::integer) QUERY PLAN ---------------------------------------------------------------------- Index Scan using idx_for_y on x (cost=0.16..8.18 rows=1 width=144) Index Cond: (y = $0) InitPlan 1 (returns $0) -> Result (cost=0.00..0.01 rows=1 width=0) ERROR: there is no parameter $0 LINE 1: EXPLAIN SELECT * FROM x WHERE y = $0; Filtering & Regression Testing

Slide 68

Slide 68 text

Finding out the type y = $1 ERROR: there is no parameter $1 LINE 1: EXPLAIN SELECT * FROM x WHERE y = $1; PREPARE tmp AS SELECT * FROM x WHERE y = $1; SELECT unnest(parameter_types) AS data_type FROM pg_prepared_statements WHERE name = ‘tmp’; DEALLOCATE tmp; data_type ----------- integer pg_prepared_statements Filtering & Regression Testing

Slide 69

Slide 69 text

EXPLAIN SELECT * FROM x WHERE y = ? EXPLAIN SELECT * FROM x WHERE y = $0 EXPLAIN SELECT * FROM x WHERE y = ((SELECT null::integer)::integer) QUERY PLAN --------------------------------------------------------------------- Index Scan using idx_for_y on x (cost=0.16..8.18 rows=1 width=144) Index Cond: (y = $0) InitPlan 1 (returns $0) -> Result (cost=0.00..0.01 rows=1 width=0) Parse Analyze Plan Filtering & Regression Testing

Slide 70

Slide 70 text

Open Issue: Planner reads actual physical size whilst planning Filtering & Regression Testing

Slide 71

Slide 71 text

github.com/pganalyze/pg_simulator Filtering & Regression Testing (highly experimental!)

Slide 72

Slide 72 text

pg_query Improving Data Quality Filtering & Regression Testing

Slide 73

Slide 73 text

Monitoring User: Get pg_stat_statements data using a security definer method (created by superuser) Closing

Slide 74

Slide 74 text

Proposal for outfuncs.c: Generate automatically from struct definitions, cutting 3000 hand-written lines down to 1000. Add JSON output support. Closing

Slide 75

Slide 75 text

Proposal: Consider adding a way to get a parsetree more easily. Via SQL / shared library / helper tool. Closing

Slide 76

Slide 76 text

Proposal for pg_s_s: Instead of ? use $0 as replacement character - making the output parseable again. Closing

Slide 77

Slide 77 text

Future idea: Annotation plugins for web frameworks, as well as parsing of PL/pgSQL methods. “Where is that query called from?” Closing

Slide 78

Slide 78 text

Tools & libraries available at Closing github.com/pganalyze

Slide 79

Slide 79 text

@LukasFittl Thank you! github.com/pganalyze pganalyze.com