Upgrade to Pro — share decks privately, control downloads, hide ads and more …

Beyond grep: Practical Logging and Metrics

Beyond grep: Practical Logging and Metrics

Your Python applications are running but you’re wondering what they are doing? Your only clue about their current state is the server load after ssh-ing into the servers? Let’s change that!

Hynek Schlawack

April 12, 2015
Tweet

More Decks by Hynek Schlawack

Other Decks in Programming

Transcript

  1. Vanilla from raven import Client client = Client("https://yoursentry") try: 1

    / 0 except ZeroDivisionError: client.captureException()
  2. Vanilla from raven import Client client = Client("https://yoursentry") try: 1

    / 0 except ZeroDivisionError: client.captureException()
  3. Vanilla from raven import Client client = Client("https://yoursentry") try: 1

    / 0 except ZeroDivisionError: client.captureException()
  4. System Metrics vs App Metrics • load • network traffic

    • I/O • … • counters • timers
  5. System Metrics vs App Metrics • load • network traffic

    • I/O • … • counters • timers • gauges
  6. System Metrics vs App Metrics • load • network traffic

    • I/O • … • counters • timers • gauges • …
  7. Math • # reqs / s? • worst 0.01% ⟨req

    time⟩? • don’t try this alone!
  8. Approaches 1. external aggregation: StatsD, Riemann + no state, simple

    – no direct introspection 2. aggregate in-app, deliver to DB
  9. Approaches 1. external aggregation: StatsD, Riemann + no state, simple

    – no direct introspection 2. aggregate in-app, deliver to DB + in-app dashboard, useful in dev
  10. Approaches 1. external aggregation: StatsD, Riemann + no state, simple

    – no direct introspection 2. aggregate in-app, deliver to DB + in-app dashboard, useful in dev – state w/i app
  11. Scales from greplin import scales from greplin.scales.meter import MeterStat STATS

    = scales.collection( "/Resource", MeterStat("reqs"), scales.PmfStat("request_time") )
  12. Scales from greplin import scales from greplin.scales.meter import MeterStat STATS

    = scales.collection( "/Resource", MeterStat("reqs"), scales.PmfStat("request_time") )
  13. Dashboard Scales … "request_time": { "count": 567315293, "99percentile": 0.10978688716888428, "75percentile":

    0.013181567192077637, "min": 0.0002448558807373047, "max": 30.134822130203247, "98percentile": 0.08934824466705339, "95percentile": 0.027234303951263434, "median": 0.009176492691040039, "999percentile": 0.14235656142234793, "stddev": 0.01676855570363413, "mean": 0.013247184020535955 }, …
  14. Original Logger BoundLogger Processor 1 Processor n Return Value Return

    Value bind values log.bind(key=value) Context log events log.info(event, another_key=another_value) + structlog
  15. Original Logger BoundLogger Processor 1 Processor n Return Value Return

    Value bind values log.bind(key=value) Context log events log.info(event, another_key=another_value) + structlog
  16. Capture • into files • to syslog / a queue

    • pipe into a logging agent
  17. {"event": "user.login", "user": "guido"} log = log.bind(user="guido") log.info("user.login") structlog logstash-forwarder

    logstash stdout logging /var/log/app/current runit’s svlogd (adds TAI64 timestamp)
  18. {"event": "user.login", "user": "guido"} log = log.bind(user="guido") log.info("user.login") structlog logstash-forwarder

    logstash 1010001101 Elasticsearch stdout logging /var/log/app/current runit’s svlogd (adds TAI64 timestamp)