Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
Beyond grep – PyCon JP
Search
Hynek Schlawack
October 10, 2015
Programming
3.7k
1
Share
Embed
Copy iframe code
Copy JS code
Copy link
Start on current slide
Beyond grep – PyCon JP
Hynek Schlawack
October 10, 2015
More Decks by Hynek Schlawack
See All by Hynek Schlawack
Python’s True Superpower
hynek
0
250
Design Pressure
hynek
0
1.8k
Subclassing, Composition, Python, and You
hynek
3
540
Classy Abstractions @ Python Web Conf
hynek
0
240
On the Meaning of Version Numbers
hynek
0
420
Maintaining a Python Project When It’s Not Your Job
hynek
1
2.5k
How to Write Deployment-friendly Applications
hynek
0
2.6k
Solid Snakes or: How to Take 5 Weeks of Vacation
hynek
2
5.9k
Get Instrumented: How Prometheus Can Unify Your Metrics
hynek
4
11k
Other Decks in Programming
See All in Programming
JavaDoc 再入門
nagise
0
280
今さら聞けないCancellationToken
htkym
0
220
さぁV100、メモリをお食べ・・・
nilpe
0
130
権限チェックの一貫性を型で守る TypeScript による多層防御
mnch
4
1.1k
Language Server 使ってる? 〜VSCode と Zed の場合〜 / Are you using a Language Server? ~For VS Code and Zed~
handlename
0
750
ローカルLLMを使ってB2Bサービスを作っていての学び
yaotti
0
140
Claspは野良GASの夢をみるか
takter00
0
170
密結合なバックエンドから TypeScript のコードを生成する
kemuridama
1
740
AIとRubyの静的型付け
ukin0k0
0
540
決定論的オーケストレーションの設計と実装 / Design and Implementation of Deterministic Orchestration
nrslib
3
1.1k
キャリア迷子上等 ─ "ない道"は自分で作ればいい
16bitidol
3
1k
DynamoDBには集計系のクエリがないけどなんとかしたい
musan
1
130
Featured
See All Featured
Accessibility Awareness
sabderemane
1
130
Fantastic passwords and where to find them - at NoRuKo
philnash
52
3.7k
Agile Leadership in an Agile Organization
kimpetersen
PRO
0
160
No one is an island. Learnings from fostering a developers community.
thoeni
21
3.7k
Getting science done with accelerated Python computing platforms
jacobtomlinson
2
220
Design in an AI World
tapps
1
220
Bash Introduction
62gerente
615
210k
Learning to Love Humans: Emotional Interface Design
aarron
275
41k
Agile Actions for Facilitating Distributed Teams - ADO2019
mkilby
0
200
Marketing to machines
jonoalderson
1
5.4k
Lightning Talk: Beautiful Slides for Beginners
inesmontani
PRO
2
570
Intergalactic Javascript Robots from Outer Space
tanoku
273
27k
Transcript
Beyond grep Pragmatic Logging & Metrics Hynek Schlawack
None
ͩΩͷ΅! @hynek https://hynek.me https://github.com/hynek
www.variomedia.de
ox.cx/b
Agenda
Agenda
Agenda
Agenda
Errors
Errors
Requirements
Requirements •fast
Requirements •fast •once
Requirements •fast •once
Requirements •fast •once •context
Requirements •fast •once •context
None
None
None
None
None
None
None
None
None
None
Raven-Python
Raven-Python Transports:
Raven-Python Transports: • gevent
Raven-Python Transports: • gevent • aiohttp
Raven-Python Transports: • gevent • aiohttp • Twisted
Raven-Python Transports: • gevent • aiohttp • Twisted • …8
others
Raven-Python Integrations: Transports: • gevent • aiohttp • Twisted •
…8 others
Raven-Python Integrations: • logging Transports: • gevent • aiohttp •
Twisted • …8 others
Raven-Python Integrations: • logging • Django Transports: • gevent •
aiohttp • Twisted • …8 others
Raven-Python Integrations: • logging • Django • WSGI Transports: •
gevent • aiohttp • Twisted • …8 others
Raven-Python Integrations: • logging • Django • WSGI • …9
others Transports: • gevent • aiohttp • Twisted • …8 others
Vanilla from raven import Client client = Client("https://yoursentry") try: 1
/ 0 except ZeroDivisionError: client.captureException()
Vanilla from raven import Client client = Client("https://yoursentry") try: 1
/ 0 except ZeroDivisionError: client.captureException()
Vanilla from raven import Client client = Client("https://yoursentry") try: 1
/ 0 except ZeroDivisionError: client.captureException()
Vanilla @client.capture_exceptions def foo(): raise Exception()
Django INSTALLED_APPS = ( … "raven.contrib.django.raven_compat", … )
Progress!
Progress! ✓
Metrics
Metrics?
Metrics? • numbers in a DB
Metrics? • numbers in a DB • guessing vs knowing
–D. J. Bernstein “Measure. Don’t speculate.”
System Metrics vs App Metrics • load • network traffic
• I/O • …
System Metrics vs App Metrics • load • network traffic
• I/O • … • counters
System Metrics vs App Metrics • load • network traffic
• I/O • … • counters • timers
System Metrics vs App Metrics • load • network traffic
• I/O • … • counters • timers • gauges
System Metrics vs App Metrics • load • network traffic
• I/O • … • counters • timers • gauges • …
Aggregation
Correlation
Math
Math • # reqs / s?
Math • # reqs / s? • worst 0.01% ⟨req
time⟩?
Math • # reqs / s? • worst 0.01% ⟨req
time⟩? • don’t try this alone!
Monitoring
Monitoring • latency
Monitoring • latency • error rates
Monitoring • latency • error rates • anomalies
Storage
Librato Metrics
Graphite 800 pound gorilla
Grafana
InfluxDB Graphite++ in Go
InfluxDB Graphite++ in Go
Prometheus Metrics & Monitoring
Prometheus Metrics & Monitoring
Prometheus Metrics & Monitoring
Prometheus Metrics & Monitoring
Prometheus Metrics & Monitoring
Prometheus Metrics & Monitoring
Collecting
Approaches
Approaches 1. external aggregation: StatsD, Riemann
Approaches 1. external aggregation: StatsD, Riemann + no state, simple
Approaches 1. external aggregation: StatsD, Riemann + no state, simple
– no direct introspection
Approaches 1. external aggregation: StatsD, Riemann + no state, simple
– no direct introspection 2. aggregate in-app, deliver to DB
Approaches 1. external aggregation: StatsD, Riemann + no state, simple
– no direct introspection 2. aggregate in-app, deliver to DB + in-app dashboard, useful in dev
Approaches 1. external aggregation: StatsD, Riemann + no state, simple
– no direct introspection 2. aggregate in-app, deliver to DB + in-app dashboard, useful in dev – state w/i app
None
(g|py)?statsd(py|-client)? import statsd c = statsd.StatsClient( "statsd.local", 8125 )
(g|py)?statsd(py|-client)?
(g|py)?statsd(py|-client)? c.incr("foo")
(g|py)?statsd(py|-client)? c.incr("foo") c.timing("stats.timed", 320)
Scales from greplin import scales from greplin.scales.meter import MeterStat STATS
= scales.collection( "/Resource", MeterStat("reqs"), scales.PmfStat("request_time") )
Scales from greplin import scales from greplin.scales.meter import MeterStat STATS
= scales.collection( "/Resource", MeterStat("reqs"), scales.PmfStat("request_time") )
Scales
Scales STATS.reqs.mark()
Scales STATS.reqs.mark() with STATS.request_time.time():
Scales STATS.reqs.mark() with STATS.request_time.time(): do_something_expensive()
Dashboard Scales
Dashboard Scales
Dashboard Scales … "request_time": { "count": 567315293, "99percentile": 0.10978688716888428, "75percentile":
0.013181567192077637, "min": 0.0002448558807373047, "max": 30.134822130203247, "98percentile": 0.08934824466705339, "95percentile": 0.027234303951263434, "median": 0.009176492691040039, "999percentile": 0.14235656142234793, "stddev": 0.01676855570363413, "mean": 0.013247184020535955 }, …
from greplin.scales import graphite pusher = graphite.GraphitePeriodicPusher( "graphite.host", 2003, period=30,
prefix="yourapp", ) pusher.allow("**") pusher.start()
Prometheus
Prometheus import random, time from prometheus_client import \ start_http_server, Summary
Prometheus import random, time from prometheus_client import \ start_http_server, Summary
FUNC_TIME = Summary( "func_seconds", "Time spent in func")
Prometheus import random, time from prometheus_client import \ start_http_server, Summary
FUNC_TIME = Summary( "func_seconds", "Time spent in func") @FUNC_TIME.time() def func(t): time.sleep(t)
Prometheus import random, time from prometheus_client import \ start_http_server, Summary
FUNC_TIME = Summary( "func_seconds", "Time spent in func") @FUNC_TIME.time() def func(t): time.sleep(t) if __name__ == '__main__': start_http_server(8000) while True: func(random.random())
Prometheus # HELP func_seconds Time spent in func # TYPE
func_seconds summary func_seconds_count 78.0 func_seconds_sum 37.8028838634491
Progress! ✓
Progress! ✓ ✓
Logging
Splunk
Splunk
Papertrail
loggly
ELK Elasticsearch + Logstash + Kibana
None
None
Logstash
Elasticsearch Logstash
Elasticsearch Logstash
Elasticsearch Logstash Kibana
Graylog
Graylog
Graylog
Graylog
None
None
Goal @400000005270e0d604afce64 { "event": "logged_in", "user": "guido", "ip": "8.8.8.8", "referrer":
"http://google.com" }
Context & Format
structlog
BoundLogger structlog
Original Logger BoundLogger structlog e.g. logging.Logger
Original Logger BoundLogger bind values log.bind(key=value) Context structlog
Original Logger BoundLogger bind values log.bind(key=value) Context log events log.info(event,
another_key=another_value) + structlog
Original Logger BoundLogger Processor 1 Processor n Return Value Return
Value bind values log.bind(key=value) Context log events log.info(event, another_key=another_value) + structlog
>>> import structlog >>> log = structlog.get_logger() >>> log.error("ohai", why=42,
wtf=True) why=42 wtf=True event='ohai'
>>> import structlog >>> log = structlog.get_logger() >>> log.error("ohai", why=42,
wtf=True) why=42 wtf=True event='ohai'
>>> import structlog >>> log = structlog.get_logger() >>> log.error("ohai", why=42,
wtf=True) why=42 wtf=True event='ohai'
>>> import structlog >>> log = structlog.get_logger() >>> log.error("ohai", why=42,
wtf=True) why=42 wtf=True event='ohai'
>>> from structlog import get_logger >>> log = get_logger() >>>
log = log.bind(user='anonymous', some_key=23) >>> log = log.bind(user='hynek', another_key=42) >>> log.info('user.logged_in', happy=True) some_key=23 user='hynek' another_key=42 happy=True event='user.logged_in'
>>> from structlog import get_logger >>> log = get_logger() >>>
log = log.bind(user='anonymous', some_key=23) >>> log = log.bind(user='hynek', another_key=42) >>> log.info('user.logged_in', happy=True) some_key=23 user='hynek' another_key=42 happy=True event='user.logged_in'
>>> from structlog import get_logger >>> log = get_logger() >>>
log = log.bind(user='anonymous', some_key=23) >>> log = log.bind(user='hynek', another_key=42) >>> log.info('user.logged_in', happy=True) some_key=23 user='hynek' another_key=42 happy=True event='user.logged_in'
>>> from structlog import get_logger >>> log = get_logger() >>>
log = log.bind(user='anonymous', some_key=23) >>> log = log.bind(user='hynek', another_key=42) >>> log.info('user.logged_in', happy=True) some_key=23 user='hynek' another_key=42 happy=True event='user.logged_in'
>>> from structlog import get_logger >>> log = get_logger() >>>
log = log.bind(user='anonymous', some_key=23) >>> log = log.bind(user='hynek', another_key=42) >>> log.info('user.logged_in', happy=True) some_key=23 user='hynek' another_key=42 happy=True event='user.logged_in'
def view(request): log = logger.bind(request=request) log.info("view_rendered") return "hello, world"
def view(request): log = logger.bind(request=request) log.info("view_rendered") return "hello, world"
def view(request): log = logger.bind(request=request) log.info("view_rendered") return "hello, world"
def request_extractor(_, __, event): req = event.pop("request", None) if req
is not None: event.update({ "client_addr": req.client_addr, "user_id": req.authenticated_userid, }) return event
def request_extractor(_, __, event): req = event.pop("request", None) if req
is not None: event.update({ "client_addr": req.client_addr, "user_id": req.authenticated_userid, }) return event
def request_extractor(_, __, event): req = event.pop("request", None) if req
is not None: event.update({ "client_addr": req.client_addr, "user_id": req.authenticated_userid, }) return event
def request_extractor(_, __, event): req = event.pop("request", None) if req
is not None: event.update({ "client_addr": req.client_addr, "user_id": req.authenticated_userid, }) return event
{"client_addr": "192.168.42.42", "user_id": null, "event": "view_rendered"}
import logging, sys logger = logging.getLogger() logger.addHandler( logging.StreamHandler( sys.stdout )
)
import logging, sys logger = logging.getLogger() logger.addHandler( logging.StreamHandler( sys.stdout )
)
Capture
Capture • into files
Capture • into files • to syslog / a queue
Capture • into files • to syslog / a queue
• pipe into a logging agent
None
log = log.bind(user="guido") log.info("user.login")
{"event": "user.login", "user": "guido"} log = log.bind(user="guido") log.info("user.login") structlog
{"event": "user.login", "user": "guido"} log = log.bind(user="guido") log.info("user.login") structlog stdout
logging
{"event": "user.login", "user": "guido"} log = log.bind(user="guido") log.info("user.login") structlog stdout
logging /var/log/app/current runit’s svlogd (adds TAI64 timestamp)
{"event": "user.login", "user": "guido"} log = log.bind(user="guido") log.info("user.login") structlog logstash-forwarder
logstash stdout logging /var/log/app/current runit’s svlogd (adds TAI64 timestamp)
{"event": "user.login", "user": "guido"} log = log.bind(user="guido") log.info("user.login") structlog logstash-forwarder
logstash 1010001101 Elasticsearch stdout logging /var/log/app/current runit’s svlogd (adds TAI64 timestamp)
Progress! ✓ ✓
Progress! ✓ ✓ ✓
Wait a Minute…
Wait a Minute…
Ugh try: STATS.time.timing(): something() except Exception as e: log.error("omg", exc_info=e)
raven_client.captureError() STATS.errors.mark()
Ugh try: STATS.time.timing(): something() except Exception as e: log.error("omg", exc_info=e)
raven_client.captureError() STATS.errors.mark()
Awww try: something() except Exception as e: log.exception("omg")
Errors
Errors • logging integration
Errors • logging integration • structlog
Errors • logging integration • structlog • web apps: error
views
Error View @view_config(context=Exception) def err(exc, request): return Response( "oops: "
+ raven_client.captureException() )
None
Metrics measure from outside
WSGI Servers
WSGI Servers • gunicorn: --statsd-host <host>
WSGI Servers • gunicorn: --statsd-host <host> • uWSGI:
WSGI Servers • gunicorn: --statsd-host <host> • uWSGI: • --stats-push
statsd:<host>
WSGI Servers • gunicorn: --statsd-host <host> • uWSGI: • --stats-push
statsd:<host> • --carbon <host>
Middleware def timing_tween_factory(handler, registry): def timing_tween(request): with STATS.request_time.time(): return handler(request)
return timing_tween
Middleware def timing_tween_factory(handler, registry): def timing_tween(request): with STATS.request_time.time(): return handler(request)
return timing_tween
Extract from Logs
Extract from Logs logstash
Extract from Logs logstash • StatsD
Extract from Logs logstash • StatsD • Graphite
Extract from Logs logstash • StatsD • Graphite • librato
Extract from Logs logstash • StatsD • Graphite • librato
• …
Extract from Logs logstash • StatsD • Graphite • librato
• … or structlog
Leverage Monitoring
Remaining
Remaining 1. measure code paths
Remaining 1. measure code paths 2. expose gauges
Summary
Summary
Summary
Summary
Summary
ox.cx/b @hynek vrmd.de