Hynek Schlawack - Get Instrumented: How Prometheus Can Unify Your Metrics

To get real-time insight into your running applications, you need to instrument them and collect metrics: count events, measure times, expose numbers. Sadly, this important aspect of development was a patchwork of half-integrated solutions for years. Prometheus changed that, and this talk will walk you through instrumenting your apps and servers, building dashboards, and monitoring using metrics.

https://us.pycon.org/2016/schedule/presentation/1601/

PyCon 2016

May 29, 2016

Transcript

  1. Metrics

                    12:00  12:01  12:02  12:03  12:04
    avg latency       0.3    0.5    0.8    1.1    2.6
    server load       0.3    1.0    2.3    3.5    5.2
  4. Averages

    ❖ avg(request time) ≠ avg(UX)
    ❖ avg({1, 1, 1, 1, 10}) = 2.8
    ❖ median({1, 1, 1, 1, 10}) = 1
    ❖ median({1, 1, 100_000}) = 1
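The arithmetic on the Averages slide can be reproduced with the standard library. The sketch below is illustrative only: the `percentile` helper (a simple nearest-rank implementation) is an assumption added here to show one way of surfacing the outlier that the median hides; it is not taken from the talk.

```python
import math
from statistics import mean, median


def percentile(values, q):
    """Nearest-rank percentile (hypothetical helper, not from the slides)."""
    ordered = sorted(values)
    # Smallest rank such that at least q percent of the samples are <= it.
    rank = max(1, math.ceil(q / 100 * len(ordered)))
    return ordered[rank - 1]


request_times = [1, 1, 1, 1, 10]
avg = mean(request_times)        # pulled upwards by the single slow request
med = median(request_times)      # ignores the slow request entirely

with_outlier = [1, 1, 100_000]
outlier_median = median(with_outlier)   # still the "typical" value
outlier_p99 = percentile(with_outlier, 99)  # exposes the worst case

print(avg, med, outlier_median, outlier_p99)
```

Neither number alone describes the user experience: the mean is distorted by one slow request, while the median can hide an arbitrarily bad one.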
  5. Pull: Advantages

    ❖ multiple Prometheis easy
    ❖ outage detection
    ❖ predictable, no self-DoS
    ❖ easy to instrument 3rd parties
  10. Metrics Format

    # HELP req_seconds Time spent processing a request in seconds.
    # TYPE req_seconds histogram
    req_seconds_count 390.0
    req_seconds_sum 177.0319407
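To make the format concrete, here is a minimal sketch that reads the sample from the slide and derives the mean request time from `_sum` and `_count`. The parser is a deliberate simplification written for this illustration (it ignores labels and optional timestamps, which the real format allows); `parse_samples` is a hypothetical helper, not part of any Prometheus library.

```python
EXPOSITION = """\
# HELP req_seconds Time spent processing a request in seconds.
# TYPE req_seconds histogram
req_seconds_count 390.0
req_seconds_sum 177.0319407
"""


def parse_samples(text):
    """Map sample names to float values, skipping comments and blank lines."""
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # HELP/TYPE metadata and empty lines carry no sample
        name, value = line.rsplit(" ", 1)
        samples[name] = float(value)
    return samples


samples = parse_samples(EXPOSITION)
# Total time divided by number of observations gives the mean duration.
average_seconds = samples["req_seconds_sum"] / samples["req_seconds_count"]
print(f"average request time: {average_seconds:.4f} s")
```

With the numbers from the slide this works out to roughly 0.45 seconds per request.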
  12. Apache, nginx, Django, PostgreSQL, MySQL, MongoDB, CouchDB, redis, Varnish, etcd, Kubernetes, Consul, collectd, HAProxy, statsd, graphite, InfluxDB, SNMP
  15. Flask application

    from flask import Flask, g, request
    from cat_or_not import is_cat

    app = Flask(__name__)

    @app.route("/analyze", methods=["POST"])
    def analyze():
        g.auth.check(request)
        return ("meow!" if is_cat(request.files["pic"])
                else "nope!")

    if __name__ == "__main__":
        app.run()
  16. Exposing metrics

    from prometheus_client import start_http_server

    # …

    if __name__ == "__main__":
        start_http_server(8000)
        app.run()
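What a scrape of that port returns can be previewed without opening a socket. The sketch below assumes the `prometheus_client` package is installed and uses its `generate_latest` function on a private `CollectorRegistry`; the metric name `demo_requests_total` is made up for this illustration.

```python
from prometheus_client import CollectorRegistry, Counter, generate_latest

# A private registry keeps this demo separate from the process-wide default.
registry = CollectorRegistry()

REQUESTS = Counter(
    "demo_requests_total",
    "Requests handled (hypothetical demo metric).",
    registry=registry,
)

for _ in range(3):
    REQUESTS.inc()

# Render the registry in the text format that a scrape would receive.
payload = generate_latest(registry).decode("utf-8")
print(payload)
```

In a real application the metrics would normally live in the default registry, which is the one `start_http_server` serves when no registry is passed.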
  18. Defining metrics

    from prometheus_client import Histogram, Gauge

    REQUEST_TIME = Histogram(
        "cat_or_not_request_seconds",
        "Time spent in HTTP requests.")
    ANALYZE_TIME = Histogram(
        "cat_or_not_analyze_seconds",
        "Time spent analyzing pictures.")
    IN_PROGRESS = Gauge(
        "cat_or_not_in_progress_requests",
        "Number of requests in progress.")
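The transcript does not show how these metrics get attached to the view, so the following is only one plausible wiring, assuming `prometheus_client` is installed. It uses `Histogram.time()` and `Gauge.track_inprogress()` as decorators and a context manager. The Flask parts are left out to keep the sketch self-contained, `is_cat` is a hypothetical stub for the real classifier, and the private registry exists only so the example can be run in isolation.

```python
from prometheus_client import CollectorRegistry, Gauge, Histogram

registry = CollectorRegistry()  # isolated registry for the sketch

REQUEST_TIME = Histogram(
    "cat_or_not_request_seconds",
    "Time spent in HTTP requests.",
    registry=registry)
ANALYZE_TIME = Histogram(
    "cat_or_not_analyze_seconds",
    "Time spent analyzing pictures.",
    registry=registry)
IN_PROGRESS = Gauge(
    "cat_or_not_in_progress_requests",
    "Number of requests in progress.",
    registry=registry)


def is_cat(picture):
    """Hypothetical stand-in for cat_or_not.is_cat."""
    return picture.startswith(b"cat")


@IN_PROGRESS.track_inprogress()   # gauge goes up on entry, down on exit
@REQUEST_TIME.time()              # observes the duration of the whole call
def analyze(picture):
    with ANALYZE_TIME.time():     # observes only the analysis step
        result = is_cat(picture)
    return "meow!" if result else "nope!"


print(analyze(b"cat.jpg"), analyze(b"dog.jpg"))
```

Timing the whole request and the analysis step separately makes it possible to tell whether latency comes from the classifier or from everything around it.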
  22. Instrumenting authentication

    from prometheus_client import Counter, Histogram

    AUTH_TIME = Histogram("auth_seconds", "Time spent authenticating.")
    AUTH_ERRS = Counter("auth_errors_total", "Errors while authing.")
    AUTH_WRONG_CREDS = Counter("auth_wrong_creds_total", "Wrong credentials.")

    class Auth:
        # ...

        @AUTH_TIME.time()
        def auth(self, request):
            while True:
                try:
                    return self._auth(request)
                except WrongCredsError:
                    # Wrong credentials are final: count them and re-raise.
                    AUTH_WRONG_CREDS.inc()
                    raise
                except Exception:
                    # Any other error is counted and the call is retried.
                    AUTH_ERRS.inc()
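As written on the slide, the loop retries indefinitely on unexpected errors. The sketch below is a runnable variant under stated assumptions, not the speaker's code: `WrongCredsError`, the `backend` callable, and the `max_attempts` limit are all hypothetical additions, and a private registry keeps the example isolated. It assumes `prometheus_client` is installed.

```python
from prometheus_client import CollectorRegistry, Counter, Histogram

registry = CollectorRegistry()  # isolated registry for the sketch

AUTH_TIME = Histogram(
    "auth_seconds", "Time spent authenticating.", registry=registry)
AUTH_ERRS = Counter(
    "auth_errors_total", "Errors while authing.", registry=registry)
AUTH_WRONG_CREDS = Counter(
    "auth_wrong_creds_total", "Wrong credentials.", registry=registry)


class WrongCredsError(Exception):
    """Hypothetical stand-in for the exception type used on the slide."""


class Auth:
    def __init__(self, backend, max_attempts=3):
        self._backend = backend            # hypothetical callable doing the check
        self._max_attempts = max_attempts  # assumption: bounded retries

    def _auth(self, request):
        return self._backend(request)

    @AUTH_TIME.time()  # times the whole call, including retries
    def auth(self, request):
        for attempt in range(1, self._max_attempts + 1):
            try:
                return self._auth(request)
            except WrongCredsError:
                AUTH_WRONG_CREDS.inc()
                raise                      # never retry bad credentials
            except Exception:
                AUTH_ERRS.inc()
                if attempt == self._max_attempts:
                    raise                  # give up after the last attempt
```

Keeping separate counters for wrong credentials and unexpected errors lets a dashboard distinguish user mistakes from backend trouble.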