Upgrade to Pro — share decks privately, control downloads, hide ads and more …

The Sad State of PostgreSQL Monitoring

deric
March 26, 2018

The Sad State of PostgreSQL Monitoring

PostgreSQL monitoring tools talks given at https://www.meetup.com/Prague-PostgreSQL-Meetup/events/tbrslpyxfbjc/ on 26th March 2018

deric

March 26, 2018
Tweet

Other Decks in Technology

Transcript

  1. PostgreSQL monitoring • PostgreSQL is an awesome database! We try

    to focus on premise open-source solutions.
  2. PostgreSQL monitoring • PostgreSQL is an awesome database! • PostgreSQL

    monitoring tools are not so awesome We try to focus on premise open-source solutions.
  3. PostgreSQL monitoring • PostgreSQL is an awesome database! • PostgreSQL

    monitoring tools are not so awesome • Database developers are usually not good with UX (or developing frontend) We try to focus on premise open-source solutions.
  4. PostgreSQL monitoring • PostgreSQL is an awesome database! • PostgreSQL

    monitoring tools are not so awesome • Database developers are usually not good with UX (or developing frontend) • (Too) Many monitoring tools are available We try to focus on premise open-source solutions.
  5. From PostgreSQL wiki: monitoring solutions • EnterpriseDB Postgres Enterprise Manager

    • pganalyze • pgwatch2 • pg_statsinfo & pg_stats_reporter • PGObserver • pgCluu • PoWA • OPM: Open PostgreSQL Monitoring Source: https://wiki.postgresql.org/wiki/Monitoring
  6. 1. Basic OS metrics • CPU • RAM • Disk

    I/O 2. PostgreSQL statistics • Database stats • Table stats • Index stats • Query stats 3. Configurable dashboards • Simple UI 4. Configuration hints • Query optimization • Index/Table bloat 5. Alerting Requirements
  7. Common metrics • most interesting metrics can be collected from

    build-in tables: • pg_stat_database • pg_stat_bgwriter
  8. Supported functions function/version 9.2+ 9.1 8.3-9.0 8.1-8.2 7.4-8.0 datid x

    x x x datname x x x x numbackends x x x x x xact_commit x x x x x xact_rollback x x x x x blks_read x x x x x blks_hit x x x x x tup_returned x x x tup_fetched x x x
  9. Supported functions (continue) function/version 9.2+ 9.1 8.3-9.0 8.1-8.2 7.4-8.0 tup_returned

    x x x tup_fetched x x x tup_updated x x x tup_deleted x x x conflicts x x temp_files x temp_bytes x deadlocks x blk_read_time x
  10. PGObserver • From Zalando • Components: • Gatherer (Java) •

    Frontend (Python + JavaScript) • Backend DB (PostgreSQL) • Optional: • Aggregation crons • Blocking monitor • InfluxDB exporter • Timeline analysis
  11. Interesting features • Unused indexes • Duplicate indexes • Bloated

    tables Disadvantages • Not actively developed • Old-school interface • Complicated installation PGOBserver - summary
  12. OPM (Open PostgreSQL Monitoring) • Components: • opm-core (JavaScript/Perl) •

    check_pgactivity • wh_nagios (PostgreSQL extension) • PostgreSQL (data storage) • nagios / icinga (check execution)
  13. • connections • bgwriter • btree_bloat • commit_ratio • database_size

    • hit_ratio • invalid_indexes • last_analyze • last_vacuum • longest_query • max_freeze_age • streaming_delta • temp_files • wal_files OPM - many checks few examples (not a complete list):
  14. OPM - example • check_streaming_delta • reflects changes in PostgreSQL

    versions q{SELECT application_name, client_addr, pid, sent_lsn, write_lsn, flush_lsn, replay_lsn, CASE pg_is_in_recovery() WHEN true THEN pg_last_wal_receive_lsn() ELSE pg_current_wal_lsn() END FROM pg_stat_replication WHERE state NOT IN ('startup', 'backup')}, 01. 02. 03. 04. 05. 06.
  15. Interesting features • Well maintained nagios check script • No

    need to install agent (if you're using Nagios) Disadvantages • Clumsy UI • Can't compare multiple databases OPM - summary
  16. PostreSQL Workload Analyzer (PoWA) • Components: • powa-archivist (PostgreSQL extension)

    • powa-web (Python/JavaScript) • Optional: • pg_stat_statements - providing data about queries being executed • pg_qualstats - providing data about predicates, or where clauses • pg_stat_kcache - providing data about operating-system level cache • HypoPG - allowing you to create hypothetical indexes
  17. Interesting features • Query hints • Resources used per query

    • Package for Debian/Redhat Disadvantages • Clumsy UI • Can't compare multiple databases • PoWA is storing data in DB we're monitoring PoWA - summary
  18. pgCluu • Statically generated reports • Components: • pgcluu_collectd (Perl)

    • pgcluu.service - generate reports (Perl/JavaScript)
  19. Interesting features • System/database statistics in one place Disadvantages •

    Clumsy UI • Can't compare multiple databases • Can't change date range pgCluu - summary
  20. Interesting features • Query hints • Table stats • Package

    for Debian/Redhat • Easy installation Disadvantages • SaaS • No free trial (with full features) • No open-source backend pganalyze - summary
  21. What we're getting? • Various attempts on frontend UI •

    Different techniques of collecting metrics
  22. What we're getting? • Various attempts on frontend UI •

    Different techniques of collecting metrics • Incompatible agents/storage
  23. What we're getting? • Various attempts on frontend UI •

    Different techniques of collecting metrics • Incompatible agents/storage • Hardly customizable solutions
  24. What we're getting? • Various attempts on frontend UI •

    Different techniques of collecting metrics • Incompatible agents/storage • Hardly customizable solutions • Similar functionality reimplemented many times
  25. What do we need? • Unobtrusive monitoring • Common protocols

    • Modular components / customizable features
  26. What do we need? • Unobtrusive monitoring • Common protocols

    • Modular components / customizable features • Usable UI
  27. pgwatch2 • Components: • Metrics collector (go) • Web UI

    (Grafana) • Timeseries DB (InfluxDB) • Config DB (PostgreSQL)
  28. Interesting features • Great UI (thx to Grafana!) • Timeseries

    storage • Default data retention 90 days ( PW2_IRETENTIONDAYS ) Disadvantages • High coupling (single container) • Persistence? • Scalability? pgwatch2 - summary
  29. Collectd • Metric collector (written in C) • Includes predefined

    queries for PostgreSQL • Customizable queries • Supports OS metrics (CPU, RAM, disk plugins) • Multiple supported outputs: • Graphite • Prometheus • Riemann • OpenTSDB
  30. Collectd example <Query table_states> Statement "SELECT sum(n_live_tup) AS live,\ sum(n_dead_tup)

    AS dead FROM pg_stat_user_tables;" <Result> Type "pg_n_tup_g" InstancePrefix "live" ValuesFrom "live" </Result> MinVersion 80300 </Query> 01. 02. 03. 04. 05. 06. 07. 08. 09. 10.
  31. Telegraf • Metric collector (go) • Reads data from pg_stat_database

    and pg_stat_bgwriter • Part of TICK stack • Supports OS metrics (CPU, RAM, disk plugins) • Customizable queries [[inputs.postgresql]] address = "postgres://telegraf@localhost/someDB" databases = ["app_production"] 01. 02. 03.
  32. Datadog • Basic metrics description • Collecting metrics from pg_stat*

    Open-source • Similar dashboards can be displayed in Grafana • Most problematic is storing high- cardinality data in timeseries DB Examples of SaaS solutions
  33. Conclusion • DB clusters might require different approaches to monitoring

    • Metric collection daemons ( collectd , telegraf ) works well for common metrics • Grafana is currently best (open-source) UI for monitoring dashboards • Expensive monitoring queries should be executed in less frequent intervals
  34. Future? • Monitoring PostgreSQL from container (no DB restart needed)

    • Unix-like components (single purpose) rather than complex monitoring systems • Advanced query analysis and hints
  35. • Sad Elephant by gawrifort • You're being monitored •

    Kitchen Stories (2003) • Business vector created by Dooder - Freepik.com • Icons made by Webalys Freebies from www.flaticon.com • Building software by Manu