Prometheus metrics from host-local services

Luca Bruno
December 12, 2019

This talk shows how host-local services can benefit from instrumentation and Prometheus metrics, using the Fedora CoreOS auto-updates logic as a case study. In particular, it covers how to instrument Rust services, how to expose Prometheus metrics without requiring a TCP port or an HTTP stack, and how to bridge metrics from local services to the cluster via a "local_exporter".


Transcript

  1. Prometheus metrics from host-local services

    Case study: monitoring Fedora CoreOS auto-updates
    Luca Bruno | @lucabruno | github.com/lucab
  2. $ whoami

    “OS engineer, Rust & Go developer, enthusiast FLOSS supporter”
    • ex-CoreOS, Software Engineer
    • Red Hat, CoreOS Berlin office
    • Previously: security researcher/engineer
  3. Overview

    • Fedora CoreOS (FCOS)
      ◦ Auto-updates as first-class OS feature
    • Zincati
      ◦ Instrumenting Rust daemons
    • local_exporter
      ◦ Bridging host-local services & cluster monitoring
  4. Overall goals

    • Port Container Linux model to Fedora CoreOS
    • Continuous auto-updates as first-class OS feature
    • Atomic OS updates/rollbacks
    • Phased rollouts with multiple update channels
    • Cluster-orchestrated reboots
    • Observability, single pane of glass
      ◦ Our focus for today
      ◦ Stack for demo: Prometheus + Grafana
  5. Auto-updates

    [Diagram: Cincinnati and the OSTree repo (fedoraproject.org infra); Zincati and rpm-ostree (FCOS host); airlock (or other) and etcd3 (or other) (local cluster); Monitoring, with a region marked “Our scope for monitoring”.]
  6. Zincati

    Update agent:
    • Rust daemon (on-host)
    • Checks for auto-updates, triggers reboots
    • TOML configuration, with systemd-style drop-ins
    • Internal state machine with few possible states (<10)
    • Exposes metrics in Prometheus format
    Written from scratch, but in practice an evolution of update-engine and locksmith
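The drop-in mechanism can be sketched as follows, assuming systemd-like semantics: configuration directories are scanned from lowest to highest priority, a fragment in a higher-priority directory masks a same-named fragment in a lower one, the surviving fragments are applied in lexicographic file-name order, and later keys override earlier ones. TOML parsing is omitted to keep this stdlib-only (fragments are modeled as flattened key/value pairs), and the function names and the `.toml` filter are hypothetical, not Zincati's actual code.

```rust
use std::collections::BTreeMap;

/// Hypothetical helper: compute the effective, ordered list of drop-in
/// fragments. `dirs` is ordered from lowest to highest priority; each entry
/// is (directory, file names found in it).
fn effective_fragments(dirs: &[(&str, Vec<&str>)]) -> Vec<String> {
    // BTreeMap keeps file names sorted lexicographically, and inserting the
    // same name again (from a higher-priority dir) replaces the earlier path.
    let mut by_name: BTreeMap<&str, String> = BTreeMap::new();
    for (dir, files) in dirs {
        for name in files {
            // Only TOML fragments are considered; other files are ignored.
            if name.ends_with(".toml") {
                by_name.insert(*name, format!("{}/{}", dir, name));
            }
        }
    }
    by_name.into_values().collect()
}

/// Hypothetical helper: merge already-parsed fragments (flattened key/value
/// pairs) in order, so a key set by a later fragment overrides an earlier one.
fn merge(fragments: &[Vec<(&str, &str)>]) -> BTreeMap<String, String> {
    let mut merged = BTreeMap::new();
    for frag in fragments {
        for (k, v) in frag {
            merged.insert(k.to_string(), v.to_string());
        }
    }
    merged
}
```

Sorting by file name across all directories (rather than directory by directory) is what lets an admin drop a `90-*.toml` file that wins over any vendor default.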
  7. Zincati - internals

    Metrics:
    • Instrumentation via PingCAP’s Prometheus library: https://github.com/tikv/rust-prometheus
    • Actor-based architecture; each logical actor tracks its own metrics
    • Mostly gauges and counters; some “labeled booleans” too
    Exposition:
    • A dedicated actor gathers and serves metrics over node-local IPC (a path-named Unix-domain socket)
    • No TCP port binding (avoids contention over a scarce resource) and no HTTP stack (simpler logic)
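A minimal, stdlib-only sketch of the exposition side, assuming a simple connect-and-read protocol: the server writes the full text-format payload to each client and then closes the connection, so no HTTP framing is needed. The hand-rolled `Metrics` struct and the `example_*` metric names are hypothetical stand-ins for a rust-prometheus registry, whose `TextEncoder` would normally produce the text.

```rust
use std::io::Write;
use std::os::unix::net::UnixListener;
use std::path::Path;
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::Arc;
use std::thread;

/// Hypothetical shared metrics (a stand-in for a rust-prometheus registry).
#[derive(Default)]
struct Metrics {
    refresh_total: AtomicU64,
    last_refresh_timestamp: AtomicU64,
}

impl Metrics {
    /// Render metrics in the Prometheus text exposition format.
    fn render(&self) -> String {
        format!(
            "# TYPE example_refresh_total counter\n\
             example_refresh_total {}\n\
             # TYPE example_last_refresh_timestamp gauge\n\
             example_last_refresh_timestamp {}\n",
            self.refresh_total.load(Ordering::Relaxed),
            self.last_refresh_timestamp.load(Ordering::Relaxed),
        )
    }
}

/// Serve metrics over a path-named Unix-domain socket: each client gets the
/// full text payload, then the connection is closed (no HTTP stack).
fn serve(path: &Path, metrics: Arc<Metrics>) -> std::io::Result<thread::JoinHandle<()>> {
    // Remove a stale socket left over from a previous run.
    let _ = std::fs::remove_file(path);
    let listener = UnixListener::bind(path)?;
    // A single accept loop on its own thread plays the "exposition actor" role.
    Ok(thread::spawn(move || {
        for conn in listener.incoming() {
            if let Ok(mut stream) = conn {
                let _ = stream.write_all(metrics.render().as_bytes());
                // Dropping `stream` closes the connection, signalling end-of-data.
            }
        }
    }))
}
```

Because the endpoint is a file path rather than a port, access can also be restricted with ordinary filesystem permissions on the socket and its directory.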
  8. Zincati - noteworthy

    State machine:
    • Metrics expose state changes and refreshes (timestamps)
    • We could expose more internal details, but there are public-API/stability concerns at this point
    Error metrics:
    • Rust enums (sum types) allow strongly typed, exhaustive error encapsulation
    • We expose the variant kind as a label in error metrics
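A sketch of both ideas, with hypothetical enum variants and `example_*` metric names (not Zincati's actual ones): the current state is exported as a numeric gauge next to a refresh timestamp, and each error variant maps to a `kind` label through an exhaustive `match`, so the label set cannot silently drift from the error type. Counters are plain integers here instead of rust-prometheus types, to keep the sketch stdlib-only.

```rust
use std::collections::BTreeMap;

/// Hypothetical error type; each variant is one "kind" of failure.
#[derive(Debug)]
enum UpdateError {
    Network(String),
    Parse(String),
    Reboot { code: i32 },
}

impl UpdateError {
    /// Exhaustive match: adding a variant without a label is a compile error.
    fn kind(&self) -> &'static str {
        match self {
            UpdateError::Network(_) => "network",
            UpdateError::Parse(_) => "parse",
            UpdateError::Reboot { .. } => "reboot",
        }
    }
}

/// Hypothetical agent states, exposed as a numeric gauge value.
#[derive(Clone, Copy, Debug, PartialEq)]
enum State {
    Idle = 0,
    CheckingUpdates = 1,
    UpdateStaged = 2,
    Rebooting = 3,
}

/// Error counters keyed by variant kind (the label value).
#[derive(Default)]
struct ErrorCounters {
    by_kind: BTreeMap<&'static str, u64>,
}

impl ErrorCounters {
    fn record(&mut self, err: &UpdateError) {
        *self.by_kind.entry(err.kind()).or_insert(0) += 1;
    }

    /// Render state gauge, refresh timestamp and labeled error counters
    /// in Prometheus text format (only kinds seen so far are emitted).
    fn render(&self, state: State, last_refresh_secs: u64) -> String {
        let mut out = String::new();
        out.push_str(&format!("example_state {}\n", state as i32));
        out.push_str(&format!("example_last_refresh_timestamp {}\n", last_refresh_secs));
        for (kind, count) in &self.by_kind {
            out.push_str(&format!("example_errors_total{{kind=\"{}\"}} {}\n", kind, count));
        }
        out
    }
}
```

With rust-prometheus the same mapping would typically feed a labeled counter vector, e.g. `with_label_values(&[err.kind()]).inc()`.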
  9. local_exporter

    • Go application (containerized)
    • Web service, binds to a TCP port on the container network
    • Fan-out to local targets
    • TOML configuration, single file
    • Allows defining multiple selectors/endpoints
    Quick free-time sketch, may have rough edges (happy if it finds a new owner)
  10. local_exporter - design

    Heavily inspired by node-exporter, however:
    • Configuration via file only (TOML)
    • Keeps metrics from different endpoints separate
    • Can pick up single files from different directories
    • Does not contain content-translation logic
    • Only bridges across “transports”
    • No internal caches
  11. local_exporter - backend selectors

    [Diagram: local_exporter reaches targets A, B and C over local IPC (regular file, TCP (HTTP), Unix socket, DBus endpoint); the selectors config defines one entry for each of A, B and C.]
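The selector idea can be sketched as an enum with one variant per transport. This is written in Rust to match the other examples, although local_exporter itself is a Go application; the `Selector` type, its variants and the read-until-close behavior for sockets are assumptions, and the TCP (HTTP) and DBus backends are left out. Each fetch passes bytes through verbatim, mirroring the "no content translation" and "no internal caches" design points.

```rust
use std::io::{self, Read};
use std::os::unix::net::UnixStream;

/// Hypothetical subset of selector kinds: a regular file and a Unix-domain
/// socket. TCP (HTTP) and DBus backends are left out of this sketch.
enum Selector {
    File { path: String },
    Uds { path: String },
}

impl Selector {
    /// Fetch the raw metrics payload from the target. Bytes are passed
    /// through unchanged and nothing is cached, so every scrape reaches
    /// the backend.
    fn fetch(&self) -> io::Result<Vec<u8>> {
        match self {
            // Regular file: read the whole content as-is.
            Selector::File { path } => std::fs::read(path),
            // Unix socket: connect, then read until the peer closes.
            Selector::Uds { path } => {
                let mut stream = UnixStream::connect(path)?;
                let mut buf = Vec::new();
                stream.read_to_end(&mut buf)?;
                Ok(buf)
            }
        }
    }
}
```

Keeping one selector per target (instead of merging all payloads into one response) is what keeps metrics from different endpoints separate.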
  12. Simple example

    local_exporter.toml:

        [bridge.selectors]
        "zincati" = { kind = "uds", path = "/host/run/zincati/private/metrics.promsock" }

    prometheus.yml:

        - job_name: 'os_updates'
          metrics_path: '/bridge'
          params:
            selector: [ 'zincati' ]

    Slide annotations: “Host bind-mount” (the /host/... socket path) and “Selector” (the 'zincati' name shared by both files).
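With `params` set, Prometheus appends them to the scrape URL as a query string, so this job scrapes `/bridge?selector=zincati`. A sketch (again in Rust, although local_exporter itself is written in Go) of how such a request could be routed: extract the `selector` parameter and look it up in the `[bridge.selectors]` table, modeled here as a hypothetical in-memory map from name to (kind, path). Percent-decoding and repeated parameters are ignored.

```rust
use std::collections::HashMap;

/// Extract the `selector` query parameter from a request target such as
/// "/bridge?selector=zincati". Percent-decoding is omitted for brevity.
fn selector_from_target(target: &str) -> Option<&str> {
    let (path, query) = target.split_once('?')?;
    if path != "/bridge" {
        return None;
    }
    query
        .split('&')
        .filter_map(|pair| pair.split_once('='))
        .find(|(key, _)| *key == "selector")
        .map(|(_, value)| value)
}

/// Resolve a request against a hypothetical in-memory view of the
/// `[bridge.selectors]` table: selector name -> (kind, path).
fn resolve<'a>(
    selectors: &'a HashMap<String, (String, String)>,
    target: &str,
) -> Option<&'a (String, String)> {
    selectors.get(selector_from_target(target)?)
}
```

A real handler would then fetch from the resolved backend and return the bytes unchanged as the HTTP response body; an unknown selector would map to an error status.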
  13. Demo (recorded)

    Single pane of glass for all auto-updates machinery:
    • Prometheus for metrics collection
    • Grafana for visualization
    Recorded demo with subtitles: https://youtu.be/_gU1mHKlmQw (equivalent screenshots in backup slides)
  14. References

    • Fedora CoreOS docs: https://docs.fedoraproject.org/en-US/fedora-coreos/
    • Airlock: https://github.com/coreos/airlock
    • Zincati: https://github.com/coreos/zincati
    • local_exporter: https://github.com/lucab/local_exporter