Upgrade to Pro — share decks privately, control downloads, hide ads and more …

Beyond grep – EuroPython Edition

Beyond grep – EuroPython Edition

Knowing that your application is up and running is great. However in order to make informed decisions about the future, you also need to know in what state your application currently is and how its state is developing over time.

This talk combines two topics that are usually discussed separately. However I do believe that they have a lot of overlap and ultimately a similar goal: giving you vital insights about your system in production.

We'll have a look at their commonalities, differences, popular tools, and how to apply everything in your own systems while avoiding some common pitfalls.

Hynek Schlawack

July 21, 2015
Tweet

More Decks by Hynek Schlawack

Other Decks in Technology

Transcript

  1. Beyond grep

    Pragmatic Logging & Metrics
    Hynek Schlawack

    View full-size slide

  2. Hi!
    @hynek
    https://hynek.me
    https://github.com/hynek

    View full-size slide

  3. www.variomedia.de

    View full-size slide

  4. Requirements

    View full-size slide

  5. Requirements
    •fast

    View full-size slide

  6. Requirements
    •fast
    •once

    View full-size slide

  7. Requirements
    •fast
    •once

    View full-size slide

  8. Requirements
    •fast
    •once
    •context

    View full-size slide

  9. Requirements
    •fast
    •once
    •context

    View full-size slide

  10. Raven-Python

    View full-size slide

  11. Raven-Python
    Transports:

    View full-size slide

  12. Raven-Python
    Transports:
    • gevent

    View full-size slide

  13. Raven-Python
    Transports:
    • gevent
    • aiohttp

    View full-size slide

  14. Raven-Python
    Transports:
    • gevent
    • aiohttp
    • Twisted

    View full-size slide

  15. Raven-Python
    Transports:
    • gevent
    • aiohttp
    • Twisted
    • …8 others

    View full-size slide

  16. Raven-Python
    Integrations:
    Transports:
    • gevent
    • aiohttp
    • Twisted
    • …8 others

    View full-size slide

  17. Raven-Python
    Integrations:
    • logging
    Transports:
    • gevent
    • aiohttp
    • Twisted
    • …8 others

    View full-size slide

  18. Raven-Python
    Integrations:
    • logging
    • Django
    Transports:
    • gevent
    • aiohttp
    • Twisted
    • …8 others

    View full-size slide

  19. Raven-Python
    Integrations:
    • logging
    • Django
    • WSGI
    Transports:
    • gevent
    • aiohttp
    • Twisted
    • …8 others

    View full-size slide

  20. Raven-Python
    Integrations:
    • logging
    • Django
    • WSGI
    • …9 others
    Transports:
    • gevent
    • aiohttp
    • Twisted
    • …8 others

    View full-size slide

  21. Vanilla
    from raven import Client
    client = Client("https://yoursentry")
    try:
    1 / 0
    except ZeroDivisionError:
    client.captureException()

    View full-size slide

  22. Vanilla
    from raven import Client
    client = Client("https://yoursentry")
    try:
    1 / 0
    except ZeroDivisionError:
    client.captureException()

    View full-size slide

  23. Vanilla
    from raven import Client
    client = Client("https://yoursentry")
    try:
    1 / 0
    except ZeroDivisionError:
    client.captureException()

    View full-size slide

  24. Vanilla
    @client.capture_exceptions
    def foo():
    raise Exception()

    View full-size slide

  25. Django
    INSTALLED_APPS = (

    "raven.contrib.django.raven_compat",

    )

    View full-size slide

  26. Progress!

    View full-size slide

  27. Metrics?
    • numbers in a DB

    View full-size slide

  28. Metrics?
    • numbers in a DB
    • guessing vs knowing

    View full-size slide

  29. –D. J. Bernstein
    “Measure. Don’t speculate.”

    View full-size slide

  30. System Metrics vs App Metrics
    • load
    • network traffic
    • I/O
    • …

    View full-size slide

  31. System Metrics vs App Metrics
    • load
    • network traffic
    • I/O
    • …
    • counters

    View full-size slide

  32. System Metrics vs App Metrics
    • load
    • network traffic
    • I/O
    • …
    • counters
    • timers

    View full-size slide

  33. System Metrics vs App Metrics
    • load
    • network traffic
    • I/O
    • …
    • counters
    • timers
    • gauges

    View full-size slide

  34. System Metrics vs App Metrics
    • load
    • network traffic
    • I/O
    • …
    • counters
    • timers
    • gauges
    • …

    View full-size slide

  35. Math
    • # reqs / s?

    View full-size slide

  36. Math
    • # reqs / s?
    • worst 0.01% ⟨req time⟩?

    View full-size slide

  37. Math
    • # reqs / s?
    • worst 0.01% ⟨req time⟩?
    • don’t try this alone!

    View full-size slide

  38. Monitoring
    • latency

    View full-size slide

  39. Monitoring
    • latency
    • error rates

    View full-size slide

  40. Monitoring
    • latency
    • error rates
    • anomalies

    View full-size slide

  41. Librato Metrics

    View full-size slide

  42. Graphite
    800 pound gorilla

    View full-size slide

  43. InfluxDB
    Graphite++ in Go

    View full-size slide

  44. InfluxDB
    Graphite++ in Go

    View full-size slide

  45. Approaches
    1. external aggregation: StatsD, Riemann

    View full-size slide

  46. Approaches
    1. external aggregation: StatsD, Riemann
    + no state, simple

    View full-size slide

  47. Approaches
    1. external aggregation: StatsD, Riemann
    + no state, simple
    – no direct introspection

    View full-size slide

  48. Approaches
    1. external aggregation: StatsD, Riemann
    + no state, simple
    – no direct introspection
    2. aggregate in-app, deliver to DB

    View full-size slide

  49. Approaches
    1. external aggregation: StatsD, Riemann
    + no state, simple
    – no direct introspection
    2. aggregate in-app, deliver to DB
    + in-app dashboard, useful in dev

    View full-size slide

  50. Approaches
    1. external aggregation: StatsD, Riemann
    + no state, simple
    – no direct introspection
    2. aggregate in-app, deliver to DB
    + in-app dashboard, useful in dev
    – state w/i app

    View full-size slide

  51. (g|py)?statsd(py|-client)?
    import statsd
    c = statsd.StatsClient(
    "statsd.local", 8125
    )

    View full-size slide

  52. (g|py)?statsd(py|-client)?

    View full-size slide

  53. (g|py)?statsd(py|-client)?
    c.incr("foo")

    View full-size slide

  54. (g|py)?statsd(py|-client)?
    c.incr("foo")
    c.timing("stats.timed", 320)

    View full-size slide

  55. Scales
    from greplin import scales
    from greplin.scales.meter import MeterStat
    STATS = scales.collection(
    "/Resource",
    MeterStat("reqs"),
    scales.PmfStat("request_time")
    )

    View full-size slide

  56. Scales
    from greplin import scales
    from greplin.scales.meter import MeterStat
    STATS = scales.collection(
    "/Resource",
    MeterStat("reqs"),
    scales.PmfStat("request_time")
    )

    View full-size slide

  57. Scales
    STATS.reqs.mark()

    View full-size slide

  58. Scales
    STATS.reqs.mark()
    with STATS.request_time.time():

    View full-size slide

  59. Scales
    STATS.reqs.mark()
    with STATS.request_time.time():
    do_something_expensive()

    View full-size slide

  60. Dashboard
    Scales

    View full-size slide

  61. Dashboard
    Scales

    View full-size slide

  62. Dashboard
    Scales

    "request_time": {
    "count": 567315293,
    "99percentile": 0.10978688716888428,
    "75percentile": 0.013181567192077637,
    "min": 0.0002448558807373047,
    "max": 30.134822130203247,
    "98percentile": 0.08934824466705339,
    "95percentile": 0.027234303951263434,
    "median": 0.009176492691040039,
    "999percentile": 0.14235656142234793,
    "stddev": 0.01676855570363413,
    "mean": 0.013247184020535955
    },

    View full-size slide

  63. from greplin.scales import graphite
    pusher = graphite.GraphitePeriodicPusher(
    "graphite.host",
    2003,
    period=30,
    prefix="yourapp",
    )
    pusher.allow("**")
    pusher.start()

    View full-size slide

  64. Progress!

    View full-size slide

  65. Progress!
    ✓ ✓

    View full-size slide

  66. ELK
    Elasticsearch + Logstash + Kibana

    View full-size slide

  67. Elasticsearch
    Logstash

    View full-size slide

  68. Elasticsearch
    Logstash

    View full-size slide

  69. Elasticsearch
    Logstash
    Kibana

    View full-size slide

  70. Goal
    @400000005270e0d604afce64 {
    "event": "logged_in",
    "user": "guido",
    "ip": "8.8.8.8",
    "referrer": "http://google.com"
    }

    View full-size slide

  71. Context

    &
    Format

    View full-size slide

  72. BoundLogger
    structlog

    View full-size slide

  73. Original
    Logger
    BoundLogger
    structlog
    e.g.
    logging.Logger

    View full-size slide

  74. Original
    Logger
    BoundLogger
    bind values
    log.bind(key=value)
    Context
    structlog

    View full-size slide

  75. Original
    Logger
    BoundLogger
    bind values
    log.bind(key=value)
    Context
    log events
    log.info(event, another_key=another_value)
    +
    structlog

    View full-size slide

  76. Original
    Logger
    BoundLogger
    Processor 1 Processor n Return
    Value
    Return
    Value
    bind values
    log.bind(key=value)
    Context
    log events
    log.info(event, another_key=another_value)
    +
    structlog

    View full-size slide

  77. >>> import structlog
    >>> log = structlog.get_logger()
    >>> log.error("ohai", why=42, wtf=True)
    why=42 wtf=True event='ohai'

    View full-size slide

  78. >>> import structlog
    >>> log = structlog.get_logger()
    >>> log.error("ohai", why=42, wtf=True)
    why=42 wtf=True event='ohai'

    View full-size slide

  79. >>> import structlog
    >>> log = structlog.get_logger()
    >>> log.error("ohai", why=42, wtf=True)
    why=42 wtf=True event='ohai'

    View full-size slide

  80. >>> import structlog
    >>> log = structlog.get_logger()
    >>> log.error("ohai", why=42, wtf=True)
    why=42 wtf=True event='ohai'

    View full-size slide

  81. >>> from structlog import get_logger
    >>> log = get_logger()
    >>> log = log.bind(user='anonymous', some_key=23)
    >>> log = log.bind(user='hynek', another_key=42)
    >>> log.info('user.logged_in', happy=True)
    some_key=23 user='hynek' another_key=42
    happy=True event='user.logged_in'

    View full-size slide

  82. >>> from structlog import get_logger
    >>> log = get_logger()
    >>> log = log.bind(user='anonymous', some_key=23)
    >>> log = log.bind(user='hynek', another_key=42)
    >>> log.info('user.logged_in', happy=True)
    some_key=23 user='hynek' another_key=42
    happy=True event='user.logged_in'

    View full-size slide

  83. >>> from structlog import get_logger
    >>> log = get_logger()
    >>> log = log.bind(user='anonymous', some_key=23)
    >>> log = log.bind(user='hynek', another_key=42)
    >>> log.info('user.logged_in', happy=True)
    some_key=23 user='hynek' another_key=42
    happy=True event='user.logged_in'

    View full-size slide

  84. >>> from structlog import get_logger
    >>> log = get_logger()
    >>> log = log.bind(user='anonymous', some_key=23)
    >>> log = log.bind(user='hynek', another_key=42)
    >>> log.info('user.logged_in', happy=True)
    some_key=23 user='hynek' another_key=42
    happy=True event='user.logged_in'

    View full-size slide

  85. >>> from structlog import get_logger
    >>> log = get_logger()
    >>> log = log.bind(user='anonymous', some_key=23)
    >>> log = log.bind(user='hynek', another_key=42)
    >>> log.info('user.logged_in', happy=True)
    some_key=23 user='hynek' another_key=42
    happy=True event='user.logged_in'

    View full-size slide

  86. def view(request):
    log = logger.bind(request=request)
    log.info("view_rendered")
    return "hello, world"

    View full-size slide

  87. def view(request):
    log = logger.bind(request=request)
    log.info("view_rendered")
    return "hello, world"

    View full-size slide

  88. def view(request):
    log = logger.bind(request=request)
    log.info("view_rendered")
    return "hello, world"

    View full-size slide

  89. def request_extractor(_, __, event):
    req = event.pop("request", None)
    if req is not None:
    event.update({
    "client_addr":
    req.client_addr,
    "user_id":
    req.authenticated_userid,
    })
    return event

    View full-size slide

  90. def request_extractor(_, __, event):
    req = event.pop("request", None)
    if req is not None:
    event.update({
    "client_addr":
    req.client_addr,
    "user_id":
    req.authenticated_userid,
    })
    return event

    View full-size slide

  91. def request_extractor(_, __, event):
    req = event.pop("request", None)
    if req is not None:
    event.update({
    "client_addr":
    req.client_addr,
    "user_id":
    req.authenticated_userid,
    })
    return event

    View full-size slide

  92. def request_extractor(_, __, event):
    req = event.pop("request", None)
    if req is not None:
    event.update({
    "client_addr":
    req.client_addr,
    "user_id":
    req.authenticated_userid,
    })
    return event

    View full-size slide

  93. {"client_addr": "192.168.42.42",
    "user_id": null,
    "event": "view_rendered"}

    View full-size slide

  94. import logging, sys
    logger = logging.getLogger()
    logger.addHandler(
    logging.StreamHandler(
    sys.stdout
    )
    )

    View full-size slide

  95. import logging, sys
    logger = logging.getLogger()
    logger.addHandler(
    logging.StreamHandler(
    sys.stdout
    )
    )

    View full-size slide

  96. Capture
    • into files

    View full-size slide

  97. Capture
    • into files
    • to syslog / a queue

    View full-size slide

  98. Capture
    • into files
    • to syslog / a queue
    • pipe into a logging agent

    View full-size slide

  99. log = log.bind(user="guido")
    log.info("user.login")

    View full-size slide

  100. {"event": "user.login", "user": "guido"}
    log = log.bind(user="guido")
    log.info("user.login")
    structlog

    View full-size slide

  101. {"event": "user.login", "user": "guido"}
    log = log.bind(user="guido")
    log.info("user.login")
    structlog
    stdout
    logging

    View full-size slide

  102. {"event": "user.login", "user": "guido"}
    log = log.bind(user="guido")
    log.info("user.login")
    structlog
    stdout
    logging
    /var/log/app/current
    runit’s svlogd
    (adds TAI64 timestamp)

    View full-size slide

  103. {"event": "user.login", "user": "guido"}
    log = log.bind(user="guido")
    log.info("user.login")
    structlog
    logstash-forwarder
    logstash
    stdout
    logging
    /var/log/app/current
    runit’s svlogd
    (adds TAI64 timestamp)

    View full-size slide

  104. {"event": "user.login", "user": "guido"}
    log = log.bind(user="guido")
    log.info("user.login")
    structlog
    logstash-forwarder
    logstash
    1010001101
    Elasticsearch
    stdout
    logging
    /var/log/app/current
    runit’s svlogd
    (adds TAI64 timestamp)

    View full-size slide

  105. Progress!
    ✓ ✓

    View full-size slide

  106. Progress!
    ✓ ✓ ✓

    View full-size slide

  107. Wait a Minute…

    View full-size slide

  108. Wait a Minute…

    View full-size slide

  109. Ugh
    try:
    STATS.time.timing():
    something()
    except Exception as e:
    log.error("omg", exc_info=e)
    raven_client.captureError()
    STATS.errors.mark()

    View full-size slide

  110. Ugh
    try:
    STATS.time.timing():
    something()
    except Exception as e:
    log.error("omg", exc_info=e)
    raven_client.captureError()
    STATS.errors.mark()

    View full-size slide

  111. Awww
    try:
    something()
    except Exception as e:
    log.exception("omg")

    View full-size slide

  112. Errors
    • logging integration

    View full-size slide

  113. Errors
    • logging
    • structlog

    View full-size slide

  114. Errors
    • logging
    • structlog
    • web apps: error views

    View full-size slide

  115. Error View
    @view_config(context=Exception)
    def err(exc, request):
    return Response(
    "oops: " + raven_client.captureException()
    )

    View full-size slide

  116. Metrics
    measure from outside

    View full-size slide

  117. WSGI Servers

    View full-size slide

  118. WSGI Servers
    • gunicorn: --statsd-host

    View full-size slide

  119. WSGI Servers
    • gunicorn: --statsd-host
    • uWSGI:

    View full-size slide

  120. WSGI Servers
    • gunicorn: --statsd-host
    • uWSGI:
    • --stats-push statsd:

    View full-size slide

  121. WSGI Servers
    • gunicorn: --statsd-host
    • uWSGI:
    • --stats-push statsd:
    • --carbon

    View full-size slide

  122. Middleware
    def timing_tween_factory(handler, registry):
    def timing_tween(request):
    with STATS.request_time.time():
    return handler(request)
    return timing_tween

    View full-size slide

  123. Middleware
    def timing_tween_factory(handler, registry):
    def timing_tween(request):
    with STATS.request_time.time():
    return handler(request)
    return timing_tween

    View full-size slide

  124. Extract from Logs

    View full-size slide

  125. Extract from Logs
    logstash

    View full-size slide

  126. Extract from Logs
    logstash
    • StatsD

    View full-size slide

  127. Extract from Logs
    logstash
    • StatsD
    • Graphite

    View full-size slide

  128. Extract from Logs
    logstash
    • StatsD
    • Graphite
    • librato

    View full-size slide

  129. Extract from Logs
    logstash
    • StatsD
    • Graphite
    • librato
    • …

    View full-size slide

  130. Extract from Logs
    logstash
    • StatsD
    • Graphite
    • librato
    • …
    or structlog

    View full-size slide

  131. Leverage
    Monitoring

    View full-size slide

  132. Remaining
    1. measure code paths

    View full-size slide

  133. Remaining
    1. measure code paths
    2. expose gauges

    View full-size slide

  134. ox.cx/b
    @hynek
    vrmd.de

    View full-size slide