Slide 1

Slide 1 text

Monitoring and introspecting Django
Simon Willison, @simonw
PyCon 2014

Slide 2

Slide 2 text

No content

Slide 3

Slide 3 text

The interesting bugs only happen in production.

Slide 4

Slide 4 text

No content

Slide 5

Slide 5 text

What went wrong?
What’s going to go wrong?

Slide 6

Slide 6 text

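from sys import exc_info

from django.views.debug import technical_500_response


class UserBasedExceptionMiddleware:
    def process_exception(self, request, exception):
        # Show the full technical traceback page, but only to superusers
        if request.user.is_superuser:
            return technical_500_response(request, *exc_info())

djangosnippets.org/snippets/935/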

Slide 7

Slide 7 text

No content

Slide 8

Slide 8 text

No content

Slide 9

Slide 9 text

StatsD + Graphite

Slide 10

Slide 10 text

StatsD
• Timers, counters, gauges
• Local daemon, speaking UDP
• Aggregates stats and sends to Graphite
Graphite
• Stores time-series data
• Renders graphs on-demand

Slide 11

Slide 11 text

StatsD
• Timers, counters, gauges
• Local daemon, speaking UDP
• Aggregates stats and sends to Graphite
Graphite
• Stores time-series data
• Renders graphs on-demand
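As a rough illustration of the "local daemon, speaking UDP" point: the StatsD wire format is a plain-text line of the form `name:value|type`, sent as one UDP datagram. The sketch below assumes a daemon on 127.0.0.1:8125 (the conventional default); the metric names and helper functions are made up for illustration and are not from the talk.

```python
import socket
import time
from contextlib import contextmanager

STATSD_ADDR = ("127.0.0.1", 8125)  # assumption: default StatsD host and port

_sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)


def format_metric(name, value, kind):
    """Build a StatsD line; kind is 'c' (counter), 'ms' (timer) or 'g' (gauge)."""
    return f"{name}:{value}|{kind}"


def send(name, value, kind):
    # UDP is fire-and-forget: if no daemon is listening the packet is dropped
    try:
        _sock.sendto(format_metric(name, value, kind).encode("ascii"), STATSD_ADDR)
    except OSError:
        pass  # never let metrics break the application


@contextmanager
def timer(name):
    """Time the enclosed block and report it as a StatsD timer in milliseconds."""
    start = time.perf_counter()
    try:
        yield
    finally:
        elapsed_ms = int((time.perf_counter() - start) * 1000)
        send(name, elapsed_ms, "ms")


send("myapp.signups", 1, "c")        # hypothetical metric names
send("myapp.queue_depth", 42, "g")
with timer("myapp.homepage_render"):
    time.sleep(0.01)
```

Because nothing waits for a reply, instrumenting a hot code path this way costs little even when the daemon is down.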

Slide 12

Slide 12 text

No content

Slide 13

Slide 13 text

No content

Slide 14

Slide 14 text

GRAPH ALL THE THINGS

Slide 15

Slide 15 text

Intercept everything (monkey-patch if you have to)

Slide 16

Slide 16 text

• response/exception middleware
• render(request, template, context)
• DatabaseWrapper cursor.execute()
• outgoing HTTP traffic
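A minimal sketch of the monkey-patching idea: swap a method for a wrapper that times every call and reports the result. `instrument`, `record` and `FakeCursor` are hypothetical names used only for this example; in a real project the target would be one of the interception points listed above and `record` would forward to StatsD.

```python
import functools
import time

timings = []  # stand-in for a StatsD client; collects (name, milliseconds)


def record(name, ms):
    timings.append((name, ms))


def instrument(owner, attr, metric_name):
    """Replace owner.attr with a wrapper that times every call."""
    original = getattr(owner, attr)

    @functools.wraps(original)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return original(*args, **kwargs)
        finally:
            # Runs on success and on exceptions, so failures are timed too
            record(metric_name, (time.perf_counter() - start) * 1000)

    setattr(owner, attr, wrapper)
    return original  # keep a handle so the patch can be undone


class FakeCursor:
    """Hypothetical stand-in for a database cursor."""

    def execute(self, sql):
        return f"ran: {sql}"


instrument(FakeCursor, "execute", "db.execute")
result = FakeCursor().execute("SELECT 1")
```

Patching the class rather than an instance means every cursor created afterwards is instrumented, which is usually the point.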

Slide 17

Slide 17 text

Logs should be aggregated and searchable
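One common way to make logs easy to aggregate and search is to emit one JSON object per line, which log shippers can index field by field. This is an assumption on top of the slide, not something the talk prescribes; the field names and logger name below are hypothetical.

```python
import io
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON line."""

    def format(self, record):
        payload = {
            "time": self.formatTime(record),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        return json.dumps(payload)


stream = io.StringIO()  # stand-in for stdout or a log file
handler = logging.StreamHandler(stream)
handler.setFormatter(JsonFormatter())

logger = logging.getLogger("myapp.views")  # hypothetical logger name
logger.setLevel(logging.INFO)
logger.addHandler(handler)
logger.propagate = False  # keep the example output in our stream only

logger.info("rendered %s in %d ms", "/bar/", 87)
entry = json.loads(stream.getvalue())
```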

Slide 18

Slide 18 text

Splunk

Slide 19

Slide 19 text

logstash + kibana

Slide 20

Slide 20 text

Correlation IDs

Slide 21

Slide 21 text

Application Service A Service B request

Slide 22

Slide 22 text

Application Service A Service B 9110dbba-6dd9-4d1c-8828-11be81ac0561 request

Slide 23

Slide 23 text

Application Service A Service B 9110dbba-6dd9-4d1c-8828-11be81ac0561 request

Slide 24

Slide 24 text

Application Service A Service B
9110dbba-6dd9-4d1c-8828-11be81ac0561
SERVICE_A do_action ... 9110dbba-6dd9...
request

Slide 25

Slide 25 text

Application Service A Service B
9110dbba-6dd9-4d1c-8828-11be81ac0561
SERVICE_A do_action ... 9110dbba-6dd9...
request

Slide 26

Slide 26 text

Application Service A Service B
9110dbba-6dd9-4d1c-8828-11be81ac0561
SERVICE_A do_action ... 9110dbba-6dd9...
SERVICE_B do_action ... 9110dbba-6dd9...
request

Slide 27

Slide 27 text

Application Service A Service B
9110dbba-6dd9-4d1c-8828-11be81ac0561
SERVICE_A do_action ... 9110dbba-6dd9...
SERVICE_B do_action ... 9110dbba-6dd9...
request
GET /bar/ ... 9110dbba-6dd9...

Slide 28

Slide 28 text

Application Service A Service B
SERVICE_A do_action ... 9110dbba-6dd9...
SERVICE_B do_action ... 9110dbba-6dd9...
response
GET /bar/ ... 9110dbba-6dd9...
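The flow in these diagrams can be sketched roughly as follows: pick up (or mint) a UUID for each incoming request, keep it somewhere every log call can reach, and stamp it on each log line and outgoing call. The slides do not say how the ID travels between services; the `X-Correlation-ID` header name and the `handle_request` function are assumptions made for this example.

```python
import contextvars
import io
import logging
import uuid

# Holds the ID for whichever request is currently being handled
correlation_id = contextvars.ContextVar("correlation_id", default="-")


class CorrelationIdFilter(logging.Filter):
    """Copy the current correlation ID onto every log record."""

    def filter(self, record):
        record.correlation_id = correlation_id.get()
        return True


stream = io.StringIO()  # stand-in for the real log destination
handler = logging.StreamHandler(stream)
handler.addFilter(CorrelationIdFilter())
handler.setFormatter(logging.Formatter("%(name)s %(message)s %(correlation_id)s"))

log = logging.getLogger("SERVICE_A")
log.setLevel(logging.INFO)
log.addHandler(handler)
log.propagate = False


def handle_request(incoming_headers):
    # Reuse the caller's ID if present, otherwise mint a new one
    cid = incoming_headers.get("X-Correlation-ID") or str(uuid.uuid4())
    token = correlation_id.set(cid)
    try:
        log.info("do_action")
        # Headers to attach to any outgoing call to another service
        return {"X-Correlation-ID": cid}
    finally:
        correlation_id.reset(token)


outgoing = handle_request(
    {"X-Correlation-ID": "9110dbba-6dd9-4d1c-8828-11be81ac0561"}
)
```

Searching the aggregated logs for that one ID then returns every line written on behalf of the request, whichever service wrote it.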

Slide 29

Slide 29 text

No content

Slide 30

Slide 30 text

No content

Slide 31

Slide 31 text

No content

Slide 32

Slide 32 text

Instrument your SQL queries

Slide 33

Slide 33 text

/* /2014/pycon/ */ SELECT "events_userevent"...
/* manage.py send_subscriptions 1000 */ SELECT...
/* 9110dbba-6dd9-4d1c-8828-11be81ac0561 */ SELECT...
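A small sketch of how such comments might be attached: prefix each query with a `/* ... */` comment naming its origin (a URL path, a management command, or a correlation ID). `annotate_sql` is a hypothetical helper, and sqlite3 is used here only to show that the database ignores the comment; how this is wired into Django's cursor is left out.

```python
import sqlite3


def annotate_sql(sql, origin):
    """Prefix a query with a /* ... */ comment naming where it came from."""
    # Drop asterisks so the origin text can never contain "*/"
    # and end the comment early
    safe = origin.replace("*", "")
    return f"/* {safe} */ {sql}"


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events_userevent (id INTEGER PRIMARY KEY)")
conn.execute("INSERT INTO events_userevent DEFAULT VALUES")

query = annotate_sql('SELECT COUNT(*) FROM "events_userevent"', "/2014/pycon/")
count = conn.execute(query).fetchone()[0]
```

The comment travels with the query into slow-query logs and process lists, so a problem query can be traced back to the page or command that issued it.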

Slide 34

Slide 34 text

No-one ever said… “I wish I had less information to help debug this problem”

Slide 35

Slide 35 text

• The most interesting bugs happen in production
• Use statsd/graphite to understand what’s going on in your stack
• Logs should be detailed, aggregated and searchable