Automated DBA (Percona Live 2018)

Automated DBA Nikolay Samokhvalov twitter: @postgresmen email: [email protected]

About me Postgres experience: 12+ years (database systems: 17+) Founder
and CTO of 3 startups (total 30M+ users), all based on Postgres Founder of #RuPostgres (1600+ members on Meetup.com) Twitter: @postgresmen Private consulting practice in the U.S. (SF Bay Area) http://PostgreSQL.support 2

Popularity of Postgres is booming among startups 3 YCombinator HackerNews
Hiring Trends https://www.hntrends.com/

Hacker News “Who is hiring” – April 2018 https://news.ycombinator.com/item?id=16735011 List
of job postings, popular among startups. 1068 messages (as of Apr 17 2018) 4

Where are DBAs? Are they disappearing? Why? 7

Where are DBAs? Are they disappearing? Why? • Clouds •
DevOps • any other reasons? 8

DevOps • any other reasons? What is “new DBA”? ...just “database expert”? ...dbOps? 9

DevOps • any other reasons? What is “new DBA”? ...just “database expert”? ...dbOps? Speaking of DevOps... Isn’t it the same/similar story? 10

Already automated: • Setup/tune hardware, OS, FS • Provision Postgres
instances • Create replicas • High Availability: detect failures and switch to replicas • Create backups • Basic monitoring 11

Already automated: • Postgres parameters tuning • Query analysis and
optimization • Index set optimization • Detailed monitoring • Verify optimization ideas • Setup/tune hardware, OS, FS • Provision Postgres instances • Create replicas • High Availability: detect failures and switch to replicas • Create backups • Basic monitoring Little to zero automatization: 12

optimization • Index set optimization • Detailed monitoring • Verify optimization ideas • Detect performance bottlenecks • Setup/tune hardware, OS, FS • Provision Postgres instances • Create replicas • High Availability: detect failures and switch to replicas • Create backups • Basic monitoring Little to zero automatization: 13

optimization • Index set optimization • Detailed monitoring • Verify optimization ideas • Detect performance bottlenecks • Predict performance bottlenecks • Setup/tune hardware, OS, FS • Provision Postgres instances • Create replicas • High Availability: detect failures and switch to replicas • Create backups • Basic monitoring Little to zero automatization: 14

optimization • Index set optimization • Detailed monitoring • Verify optimization ideas • Detect performance bottlenecks • Predict performance bottlenecks • Prevent performance bottlenecks • Setup/tune hardware, OS, FS • Provision Postgres instances • Create replicas • High Availability: detect failures and switch to replicas • Create backups • Basic monitoring Little to zero automatization: The ultimate goal of automatization 15

Manual steps to look into database psql is great, choice
#1 to work with Postgres for many experts 16

#1 to work with Postgres for many experts • \? to see psql commands • \h to help with SQL (example: “\h select”) 17

#1 to work with Postgres for many experts • \? to see psql commands • \h to help with SQL (example: “\h select”) The 1st question: What are the biggest DBs/tables/indexes? Show me the list! • \l+ to see databases • \d+ to see tables, views, materialized views, sequences or separately: \dt+, \dv+, \dm+, \ds+ • \di+ to see indexes 18

Manual steps to look into database ...but: if your screen
is not wide enough postgres=# \l+ 19

Manual steps to look into database pspg – a must
have if you use psql daily. https://github.com/okbob/pspg postgres=# \l+ 20

Manual steps to look into database Let’s find the biggest
databases postgres=# \l+ It seems this database is the largest one 21

Manual steps to look into database Let’s find the biggest
databases postgres=# \l+ And this one looks like #2... 22

Manual steps to look into database OK. What about tables?
test=# \dt+ 37 tables, oh.. 23

The largest tables – something better than \dt+ is needed
24 How many times did you google things like this?

Meet postgres_dba postgres_dba – The missing set of useful tools
for Postgres https://github.com/NikolayS/postgres_dba 25

Meet postgres_dba postgres_dba – The missing set of useful tools
for Postgres https://github.com/NikolayS/postgres_dba 26

Meet postgres_dba Report #2: sizes of the tables in the
current database 27

Installation of postgres_dba Installation is trivial: Important: psql version 10
is needed. (install postgresql-client-10 package, see README) Server version may be older (Use ssh tunnel to connect to remote servers, see README) 28

Installation of postgres_dba Installation is trivial: Important: psql version 10
is needed. Server version may be older Now open psql and type ":dba" to launch postgres_dba’s menu. That’s it. 29

Meet postgres_dba Let’s continue... 30

Meet postgres_dba Report #0: Postgres instance info + current DB’s
details 31

Table bloat How can it (fast estimation) be done: •
pg_stat_user_tables, n_dead_tup & n_live_tup – might be very wrong / lagging 32

pg_stat_user_tables, n_dead_tup & n_live_tup – might be very wrong / lagging • Jehan-Guillaume (ioguix) de Rorthais, Dalibo https://github.com/ioguix/pgsql-bloat-estimation • PG Experts https://github.com/pgexperts/pgx_scripts/tree/master/bloat • Data Egret https://github.com/dataegret/pg-utils/tree/master/sql – similar reports, also might be wrong (see next slides) 33

pg_stat_user_tables, n_dead_tup & n_live_tup – might be very wrong / lagging • Jehan-Guillaume (ioguix) de Rorthais, Dalibo https://github.com/ioguix/pgsql-bloat-estimation • PG Experts https://github.com/pgexperts/pgx_scripts/tree/master/bloat • Data Egret https://github.com/dataegret/pg-utils/tree/master/sql – similar reports, also might be wrong (see next slides) • pgstattuple extension, exact but affects performance (don’t run on entire prod database!) 34

Table bloat b1: table bloat estimation 35

Table bloat Remember: it’s only an estimation Simple experiment, showing
how wrong can it be: 36

how wrong can it be: 37 1 byte, then 8 bytes, then again 1 byte

how wrong can it be: 38 1 byte, then 8 bytes, then again 1 byte 1 byte, 1 byte, then 8 bytes

Table bloat What will report “b1” show for our experiment?
39

Alignment Padding Good short article about it, “Column Tetris”: https://stackoverflow.com/questions/2966524/calculating-and-saving-space-in-postgresql/7431468
40 Inspected with pg_hexedit https://github.com/petergeoghegan/pg_hexedit

Experimental report to see alignment padding p1 – experimental report
showing alignment padding and suggesting column reorder 41

When estimation is not enough Use pgstattuple module: • b3
and b4 reports • don’t run on production! • it’s good if you can do it on a replica – better on an experimental/temporary one (perhaps, promoted to be a separate master) Simple approach – clone table: create table t1_clone as select * from t1; • And then compare sizes (report 2) • Good way if you don’t have pgstattuple (Heroku, Google Cloud SQL, etc) • Still: better do it on an experimental clone (create a replica and promote) 42

What else? i1 and i2 – find unused and redundant
indexes, get rid of them i1: i2: 43

What else? s1 – slowest queries (grouped, pg_stat_statements) 44

Why do we need to have more automated tools? Example:
full and partial indexes on a column, big temptation to choose partial one… But with partial index, we’ll lose HOT updates!! Read the full story: https://medium.com/@postgresql/how-partial-indexes-affect-update-performance-in-postgres-d05e0052abc 45

How to add your own report Adding new reports to
postgres_dba is trivial: 46

How to add your own report Adding new reports to
postgres_dba is trivial: 47

“Extended” mode “x” in postgres_dba’s menu will make tables wider
48 More columns, good to debug and good for machines/APIs

postgres_dba: future development • More reports (pg_statio_**, memory analysis, etc)
• API- & monitoring-friendly versions for each report • Better formatting • Brief help for each report + useful links (knowledge base) • Detect [potential] issues and recommend changes Maybe: • Snapshots for pg_stat_** and time series analysis • Single-key navigation • Color highlighting • “Active” tools: create/alter/drop a DB user, fix permissions, reset pwd, refresh mat. views, run pg_repack, etc Try it now: https://github.com/NikolayS/postgres_dba 49

Back to full-fledged automation • Detect performance bottlenecks • Predict
performance bottlenecks • Prevent performance bottlenecks But how? 50 The ultimate goal of automatization

How to get a really automated performance control over your
database? The Postgres ecosystem currently gives us: pg_stat_statements – the default tool for most users, available on AWS RDS. But: ◦ naïve grouping (based on syntax) ◦ no historical data ◦ no locks/wait events data ◦ no disk/FS-level data ◦ no sample queries ◦ nothing about actual plans 51

database? The Postgres ecosystem currently gives us: auto_explain – included to postgresql-contrib, available on AWS RDS But: ◦ everything goes to logs, nothing inside Postgres itself ◦ no grouping, only individual queries → needs some additional tool to get real reports • weird: file_fdw (log_fdw on RDS) and process inside Postgres • pgBadger 52

database? The Postgres ecosystem currently gives us: pgBadger – the most powerful tool to analyze logs. Lots of reports. But: • you need to be a good Postgres expert to work with it • not always easy/possible to use (e.g. Google Cloud SQL has only error log) • logs processing approach – even more “reactive” than pg_stat_* 53

database? Modern monitoring systems are good: pgHero – free, easy to install, basic things work well pgAnalyze.com – interesting innovative reports (vacuum activity, log insights) okmeter.io – great reports helping to find bottlenecks, based on pg_stat_statements VividCortex – awesome visualizations of slow queries, synced to system graphs New AWS RDS’s “Database Load” report is awesome (I’ll skip my “BUTs” for each particular system, it could be another talk) 54

database? “Ecosystem” tools and modern monitoring systems are good… BUT: 55

database? “Ecosystem” tools and modern monitoring systems are good… BUT: • They require a lot of manual actions • You still need to be a database expert • All of them are reactive 56 Detect performance bottlenecks

Human DBAs are proactive create table t1 (id int8 primary
key, param1 int); select * from t1 where param1 = :param1; Observation #1: An experienced human DBA doesn’t need to wait until t1 grows to 1M records to tell developer: “Hey, you must create an index on param1 column” Actually, in such cases, DBA won’t wait for any data. 57 Predict performance bottlenecks

Human DBAs know which optimizations are most important • 280+
config params in postgresql.conf • thousands of grouped queries in pg_stat_statements • often spikes in monitoring, swapping, temporary files, • high level of table bloat, disk space growing too fast, • checkpoints occur too frequently, etc, etc. → We could generate dozens/hundreds of recommendations (and some monitoring systems already have “advices” or “hints”). But which of them matter most? Observation #2: An experienced human DBA knows which changes will give the better performance gain right now … and in the future 58 Prevent performance bottlenecks

A human DBA is experimenting a lot For experienced DBA,
it’s usually much better to: • provision a new replica (thanks, clouds!), • promote it, and • run a bunch of experiments, collect results, and destroy this instance ...than: • try to verify ideas on dev/test (or – omg – prod) environment, or • just “guess” and deploy. Observation #3: An experienced human DBA makes a lot of experiments 59

Conclusion Some new solutions are to appear. They will combine:
• DB-oriented monitoring capabilities (“detection”) • Ability to find bottlenecks at very early stage (“prediction”) • Suggest developers how to fix issues (“preventing”) • And BONUS: automatically verify all ideas (or, perhaps, all DDL? “Continuous Database Administration”?) 60

DIY automated pipeline for DB optimization How to automate database
optimization using ecosystem tools and AWS? Analyze: • pg_stat_statements • auto_explan • pgBadger to parse logs, use JSON output • pg_query to group queries better Configuration: • annotated.conf • pgtune, pgconfigurator, postgresqlco.nf (wip) • ottertune Suggested indexes • (useful: pgHero, POWA, HypoPG, dexter, plantuner) Conduct experiments: • pgreplay to replay logs (different log_line_prefix, you need to handle it) • EC2 spot instances Machine learning • MADlib 61

Conducting experiments Postgres temporary EC2 spot instance Postgres logs from
production, short period of high activity auto_explain.log_min_duration = 100 ms auto_explain.log_timing = on auto_explain.log_analyze = on auto_explain.log_analyze = on auto_explain.log_triggers = on auto_explain.log_verbose = on auto_explain.log_nested_statements = on pgreplay Metadata Storage (permanent) Collect results: • pg_stat_statements • pg_stat_*** • pgBadger (JSON output) with auto_explain-ed samples • pgBadger and pgreplay require different log formats (conversion is needed) • SQL queries from log might expect different DB snapshot (having few errors is OK) • Big challenge: how to speed up DB snapshot initialization?

Compare results of experiments Original I-20180416-141109 (details) -2m23s (-5.23%) TOTAL
43m08s 45m31s Query Group #1 select from asset where owner_id = ? and price is not null; 11m55s 15m03s -3m08s (-20.82%) Samples (3) Samples (3) Query Group #2 update asset set price = price + ? where id = ?; 8m38s 7m30s +1m08s (+15.33%) Samples (3) Samples (3)

Meet PostgreSQL.support AI-based cloud-friendly platform to automate database administration 64
Steve AI-based expert in database tuning Joe AI-based expert in query optimization and Postgres indexes Nancy AI-based expert in resource planning. Conducts experiments with benchmarks Sign up for early access: http://PostgreSQL.support

Main Ideas 1. Automate everything 2. Make more experiments, verify
ideas 3. Detect, predict and prevent performance problems 65

Thanks! Nikolay Samokhvalov [email protected] twitter: @postgresmen http://PostgreSQL.support 66

Automated DBA (Percona Live 2018)

Automated DBA (Percona Live 2018)

More Decks by Nikolay

Other Decks in Technology

Featured

Transcript