$30 off During Our Annual Pro Sale. View Details »

The Sad State of PostgreSQL Monitoring

deric
March 26, 2018

The Sad State of PostgreSQL Monitoring

PostgreSQL monitoring tools talks given at https://www.meetup.com/Prague-PostgreSQL-Meetup/events/tbrslpyxfbjc/ on 26th March 2018

deric

March 26, 2018
Tweet

Other Decks in Technology

Transcript

  1. Tomas Barton (@barton_tomas) The Sad State of PostgreSQL Monitoring ©

    gawrifort
  2. PostgreSQL monitoring • PostgreSQL is an awesome database! We try

    to focus on premise open-source solutions.
  3. PostgreSQL monitoring • PostgreSQL is an awesome database! • PostgreSQL

    monitoring tools are not so awesome We try to focus on premise open-source solutions.
  4. PostgreSQL monitoring • PostgreSQL is an awesome database! • PostgreSQL

    monitoring tools are not so awesome • Database developers are usually not good with UX (or developing frontend) We try to focus on premise open-source solutions.
  5. PostgreSQL monitoring • PostgreSQL is an awesome database! • PostgreSQL

    monitoring tools are not so awesome • Database developers are usually not good with UX (or developing frontend) • (Too) Many monitoring tools are available We try to focus on premise open-source solutions.
  6. From PostgreSQL wiki: monitoring solutions • EnterpriseDB Postgres Enterprise Manager

    • pganalyze • pgwatch2 • pg_statsinfo & pg_stats_reporter • PGObserver • pgCluu • PoWA • OPM: Open PostgreSQL Monitoring Source: https://wiki.postgresql.org/wiki/Monitoring
  7. One monitoring solution to rule them all?

  8. Probably not. Depends on requirements.

  9. 1. Basic OS metrics • CPU • RAM • Disk

    I/O 2. PostgreSQL statistics • Database stats • Table stats • Index stats • Query stats 3. Configurable dashboards • Simple UI 4. Configuration hints • Query optimization • Index/Table bloat 5. Alerting Requirements
  10. None
  11. Common metrics • most interesting metrics can be collected from

    build-in tables: • pg_stat_database • pg_stat_bgwriter
  12. Supported functions function/version 9.2+ 9.1 8.3-9.0 8.1-8.2 7.4-8.0 datid x

    x x x datname x x x x numbackends x x x x x xact_commit x x x x x xact_rollback x x x x x blks_read x x x x x blks_hit x x x x x tup_returned x x x tup_fetched x x x
  13. Supported functions (continue) function/version 9.2+ 9.1 8.3-9.0 8.1-8.2 7.4-8.0 tup_returned

    x x x tup_fetched x x x tup_updated x x x tup_deleted x x x conflicts x x temp_files x temp_bytes x deadlocks x blk_read_time x
  14. PGObserver • From Zalando • Components: • Gatherer (Java) •

    Frontend (Python + JavaScript) • Backend DB (PostgreSQL) • Optional: • Aggregation crons • Blocking monitor • InfluxDB exporter • Timeline analysis
  15. None
  16. None
  17. None
  18. Interesting features • Unused indexes • Duplicate indexes • Bloated

    tables Disadvantages • Not actively developed • Old-school interface • Complicated installation PGOBserver - summary
  19. OPM (Open PostgreSQL Monitoring) • Components: • opm-core (JavaScript/Perl) •

    check_pgactivity • wh_nagios (PostgreSQL extension) • PostgreSQL (data storage) • nagios / icinga (check execution)
  20. None
  21. None
  22. None
  23. • connections • bgwriter • btree_bloat • commit_ratio • database_size

    • hit_ratio • invalid_indexes • last_analyze • last_vacuum • longest_query • max_freeze_age • streaming_delta • temp_files • wal_files OPM - many checks few examples (not a complete list):
  24. OPM - example • check_streaming_delta • reflects changes in PostgreSQL

    versions q{SELECT application_name, client_addr, pid, sent_lsn, write_lsn, flush_lsn, replay_lsn, CASE pg_is_in_recovery() WHEN true THEN pg_last_wal_receive_lsn() ELSE pg_current_wal_lsn() END FROM pg_stat_replication WHERE state NOT IN ('startup', 'backup')}, 01. 02. 03. 04. 05. 06.
  25. Interesting features • Well maintained nagios check script • No

    need to install agent (if you're using Nagios) Disadvantages • Clumsy UI • Can't compare multiple databases OPM - summary
  26. PostreSQL Workload Analyzer (PoWA) • Components: • powa-archivist (PostgreSQL extension)

    • powa-web (Python/JavaScript) • Optional: • pg_stat_statements - providing data about queries being executed • pg_qualstats - providing data about predicates, or where clauses • pg_stat_kcache - providing data about operating-system level cache • HypoPG - allowing you to create hypothetical indexes
  27. None
  28. None
  29. Interesting features • Query hints • Resources used per query

    • Package for Debian/Redhat Disadvantages • Clumsy UI • Can't compare multiple databases • PoWA is storing data in DB we're monitoring PoWA - summary
  30. pgCluu • Statically generated reports • Components: • pgcluu_collectd (Perl)

    • pgcluu.service - generate reports (Perl/JavaScript)
  31. None
  32. None
  33. Interesting features • System/database statistics in one place Disadvantages •

    Clumsy UI • Can't compare multiple databases • Can't change date range pgCluu - summary
  34. pganalyze • Components: • collector (go) • web interface (hosted)

  35. None
  36. Interesting features • Query hints • Table stats • Package

    for Debian/Redhat • Easy installation Disadvantages • SaaS • No free trial (with full features) • No open-source backend pganalyze - summary
  37. openclipart.org

  38. What we're getting?

  39. What we're getting? • Various attempts on frontend UI

  40. What we're getting? • Various attempts on frontend UI •

    Different techniques of collecting metrics
  41. What we're getting? • Various attempts on frontend UI •

    Different techniques of collecting metrics • Incompatible agents/storage
  42. What we're getting? • Various attempts on frontend UI •

    Different techniques of collecting metrics • Incompatible agents/storage • Hardly customizable solutions
  43. What we're getting? • Various attempts on frontend UI •

    Different techniques of collecting metrics • Incompatible agents/storage • Hardly customizable solutions • Similar functionality reimplemented many times
  44. What do we need?

  45. Unobtrusive monitoring Kitchen Stories (2003) - Bent Hamer

  46. What do we need? • Unobtrusive monitoring

  47. What do we need? • Unobtrusive monitoring • Common protocols

  48. What do we need? • Unobtrusive monitoring • Common protocols

    • Modular components / customizable features
  49. What do we need? • Unobtrusive monitoring • Common protocols

    • Modular components / customizable features • Usable UI
  50. pgwatch2 • Components: • Metrics collector (go) • Web UI

    (Grafana) • Timeseries DB (InfluxDB) • Config DB (PostgreSQL)
  51. None
  52. None
  53. None
  54. Interesting features • Great UI (thx to Grafana!) • Timeseries

    storage • Default data retention 90 days ( PW2_IRETENTIONDAYS ) Disadvantages • High coupling (single container) • Persistence? • Scalability? pgwatch2 - summary
  55. Collectd • Metric collector (written in C) • Includes predefined

    queries for PostgreSQL • Customizable queries • Supports OS metrics (CPU, RAM, disk plugins) • Multiple supported outputs: • Graphite • Prometheus • Riemann • OpenTSDB
  56. Collectd example <Query table_states> Statement "SELECT sum(n_live_tup) AS live,\ sum(n_dead_tup)

    AS dead FROM pg_stat_user_tables;" <Result> Type "pg_n_tup_g" InstancePrefix "live" ValuesFrom "live" </Result> MinVersion 80300 </Query> 01. 02. 03. 04. 05. 06. 07. 08. 09. 10.
  57. Telegraf • Metric collector (go) • Reads data from pg_stat_database

    and pg_stat_bgwriter • Part of TICK stack • Supports OS metrics (CPU, RAM, disk plugins) • Customizable queries [[inputs.postgresql]] address = "postgres://telegraf@localhost/someDB" databases = ["app_production"] 01. 02. 03.
  58. Monitoring system we'd like to have

  59. Datadog • Basic metrics description • Collecting metrics from pg_stat*

    Open-source • Similar dashboards can be displayed in Grafana • Most problematic is storing high- cardinality data in timeseries DB Examples of SaaS solutions
  60. Datadog

  61. None
  62. Sysdig sysdig -s 2000 -A -c echo_fds proc.name=postgres \ and

    evt.buffer contains SELECT 01. 02.
  63. Conclusion • DB clusters might require different approaches to monitoring

    • Metric collection daemons ( collectd , telegraf ) works well for common metrics • Grafana is currently best (open-source) UI for monitoring dashboards • Expensive monitoring queries should be executed in less frequent intervals
  64. Future? • Monitoring PostgreSQL from container (no DB restart needed)

    • Unix-like components (single purpose) rather than complex monitoring systems • Advanced query analysis and hints
  65. #grafanaporn

  66. @barton_tomas Thank you for attention!

  67. Created in Shower

  68. • Sad Elephant by gawrifort • You're being monitored •

    Kitchen Stories (2003) • Business vector created by Dooder - Freepik.com • Icons made by Webalys Freebies from www.flaticon.com • Building software by Manu