Slide 1

Slide 1 text

Measuring and Logging Everything in Real-Time @BastianHofmann As the title says, in the next hour we are going to talk about two very important aspects of any web application that are often overlooked in the beginning:

Slide 2

Slide 2 text

Logging 1st is logging, so that you actually know what's happening in your application if you have to track down an error

Slide 3

Slide 3 text

Measuring And 2nd is measuring and monitoring constantly how your application is behaving

Slide 4

Slide 4 text

Many roads Of course, as with many things, there are many roads to accomplish that and many tools available that help you with it. I'm not going to show you all of them, just what we ended up using at ResearchGate. They work great for us, but depending on your use cases other tools may be better suited to your needs. The important thing to take away from this talk: take care of it

Slide 5

Slide 5 text

A few words about me before that ...

Slide 6

Slide 6 text

I work at ResearchGate, the social network for scientists and researchers

Slide 7

Slide 7 text

ResearchGate gives science back to the people who make it happen. We help researchers build reputation and accelerate scientific progress. On their terms. The goal is to give...

Slide 8

Slide 8 text

over 2.9 million users

Slide 9

Slide 9 text

here are some impressions of the page

Slide 10

Slide 10 text

No content

Slide 11

Slide 11 text

http://gigaom.com/2013/06/05/heres-how-bill-gatess-researchgate-investment-might-change-the-world-for-the-better http://venturevillage.eu/researchgate

Slide 12

Slide 12 text

we have this, and we also work on some cool stuff

Slide 13

Slide 13 text

No content

Slide 14

Slide 14 text

in addition to this I also speak frequently at conferences throughout the world

Slide 15

Slide 15 text

I work and live in Berlin

Slide 16

Slide 16 text

Questions? Ask By the way, if you have any questions throughout this talk, if you don't understand something, just raise your hand and ask. It's probably my fault anyway, since I spoke too quickly or my accent was too bad

Slide 17

Slide 17 text

Logging So 1st: Logging

Slide 18

Slide 18 text

For when something goes wrong

Slide 19

Slide 19 text

server error log access log debug logs slow query log ... Let's start with a simple setup: we have one server with an apache and some database on it. Of course all of these services are writing logs somewhere

Slide 20

Slide 20 text

$ ssh my_simple_server getting at these is very simple, just ssh into the server

Slide 21

Slide 21 text

$ tail -f error.log $ cat error.log | grep Exception and use your favorite unix command line tools to view the logs

Slide 22

Slide 22 text

Error Logs Let's look at the application-related logs a bit more closely, for example the error logs. As it says, Apache and PHP log all errors that happen in your application there

Slide 23

Slide 23 text

ErrorLog /var/logs/apache/error.log The location is specified in your Apache configuration somewhere like this

Slide 24

Slide 24 text

php.ini •error_reporting •display_errors •display_startup_errors •log_errors •error_log And there are some ini settings in PHP for configuring how errors should be handled, just read up on them in the documentation
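A minimal php.ini fragment for a production setup might look like this (the log path is an assumption; the point is to log everything but show nothing to the user):

```ini
; report all errors, but write them to a file instead of the response
error_reporting = E_ALL
display_errors = Off
display_startup_errors = Off
log_errors = On
error_log = /var/logs/php/error.log
```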

Slide 25

Slide 25 text

callable set_exception_handler( callable $exception_handler ); Since you probably don't want to display an exception to the user (or a blank page) but a nice error page, you should set a custom exception handler in your application
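A minimal sketch of such a handler (the generated error code and the page text are made up for illustration):

```php
<?php
set_exception_handler(function ($e) {
    // generate a code the user can report and you can grep for later
    $errorCode = uniqid('err_');
    error_log(sprintf(
        '[%s] Uncaught exception: %s in %s:%d',
        $errorCode, $e->getMessage(), $e->getFile(), $e->getLine()
    ));
    http_response_code(500);
    echo "Something went wrong. Error code: {$errorCode}";
});
```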

Slide 26

Slide 26 text

callable set_error_handler( callable $error_handler [, int $error_types = E_ALL | E_STRICT ] ); and while you are at it, also set a custom error_handler, hey it's PHP
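A common pattern (an assumption here, not something the slides prescribe) is to convert PHP errors into exceptions, so one exception handler deals with everything in one place:

```php
<?php
set_error_handler(function ($severity, $message, $file, $line) {
    if (!(error_reporting() & $severity)) {
        return false; // error was suppressed with @, fall back to PHP
    }
    throw new ErrorException($message, 0, $severity, $file, $line);
});
```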

Slide 27

Slide 27 text

bool error_log ( string $message [, int $message_type = 0 [, string $destination [, string $extra_headers ]]] ) You can then log to the apache error log with the error_log function

Slide 28

Slide 28 text

and the user then gets such a nice error page. Looking at this screenshot, the first thing you may notice is the HTTP response code displayed there

Slide 29

Slide 29 text

HTTP Response Codes http://www.w3.org/Protocols/rfc2616/rfc2616-sec6.html Please read up on them and choose correct ones for your application. It helps you greatly with filtering the important stuff from the unimportant later on: a 503 may be much more serious than a 404

Slide 30

Slide 30 text

on the screenshot here you also see some additional codes and ids displayed that help to identify the error in your system later on

Slide 31

Slide 31 text

Log additional info Of course, you should not only display this information but also write it to your log files. You can easily do this in your custom error handler, and if you need to find the error that resulted in this error page, you can just grep for the error code

Slide 32

Slide 32 text

Not only error codes

Slide 33

Slide 33 text

Request Information

Slide 34

Slide 34 text

But what about fatal errors A bit earlier I said you should write your own error handler to catch errors, show an error page and add custom information to the error logs. But what about fatal errors? The PHP script is aborted then

Slide 35

Slide 35 text

Use error_get_last() in a Shutdown Handler http://php.net/manual/en/function.error-get-last.php
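A sketch of that shutdown handler; fatal errors bypass custom error handlers, so this is the last chance to log them:

```php
<?php
register_shutdown_function(function () {
    $error = error_get_last();
    $fatal = E_ERROR | E_PARSE | E_CORE_ERROR | E_COMPILE_ERROR;
    if ($error !== null && ($error['type'] & $fatal)) {
        error_log(sprintf(
            'Fatal: %s in %s:%d',
            $error['message'], $error['file'], $error['line']
        ));
    }
});
```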

Slide 36

Slide 36 text

Access Logs Another important log is the access log, which logs each incoming request

Slide 37

Slide 37 text

192.168.56.1 - - [09/Jul/2012:19:18:19 +0200] "GET /rg_trunk/webroot/c/af10c/images/template/rg_logo_default.png HTTP/1.1" 200 882 "http://devm/rg_trunk/webroot/directory/publications/" this is what it usually looks like

Slide 38

Slide 38 text

LogFormat "%h %l %u %t \"%r\" %>s %b" custom CustomLog /var/logs/apache/access.log custom http://httpd.apache.org/docs/2.2/mod/mod_log_config.html#logformat and you configure it in Apache like this. There is already lots of information from the request you can put in there, like URL, response code, request time etc.

Slide 39

Slide 39 text

http://de.php.net/apache_note string apache_note ( string $note_name [, string $note_value = "" ] ) but with the apache_note function you can add additional information from your application. By the way, this is of course also possible with nginx: there you just set some response headers in PHP; in nginx you can then access these upstream headers, log them, and then filter them out so the clients don't get them
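For example (the note names and the session/account values are just illustrations):

```php
<?php
// make application data available to Apache's LogFormat via %{...}n
if (function_exists('apache_note')) {
    apache_note('session_id', session_id());
    apache_note('account_id', (string) $accountId); // $accountId is assumed
}
```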

Slide 40

Slide 40 text

LogFormat "...\"%{referer}i\" \"%{user-agent}i\" %{session_id}n %{account_id}n..." custom but back to Apache: these notes can then be referenced in your log format like this

Slide 41

Slide 41 text

Debug Logs next log type that your application can write are plain debug logs that are not related to errors

Slide 42

Slide 42 text

https://github.com/Seldaek/monolog/ a good library recommendation for doing this, instead of writing to a file manually, is Monolog

Slide 43

Slide 43 text

<?php
use Monolog\Logger;
use Monolog\Handler\StreamHandler;

// create a log channel
$log = new Logger('name');
$log->pushHandler(
    new StreamHandler('path/to/your.log', Logger::WARNING)
);

// add records to the log
$log->addWarning('Foo');
$log->addError('Bar');

Slide 44

Slide 44 text

Handlers •Stream •Mail •FirePHP •ChromePHP •Socket •Rotating File •MongoDB •Syslog •Gelf •Null •Test •FingersCrossed The nice thing is: it supports multiple handlers out of the box, to log wherever you need. Especially useful is the FingersCrossed handler: you push your logs into this handler, but it does not write them to a file directly; only when a certain condition is met, like a threshold, are all already-buffered logs and all further logs written to a file
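A sketch of that fingers-crossed setup with Monolog (channel name and path are assumptions):

```php
<?php
use Monolog\Logger;
use Monolog\Handler\StreamHandler;
use Monolog\Handler\FingersCrossedHandler;

$log = new Logger('app');
// buffer everything in memory; write the whole buffer to file
// only once a record at WARNING level or above comes in
$log->pushHandler(new FingersCrossedHandler(
    new StreamHandler('path/to/your.log'),
    Logger::WARNING
));

$log->addDebug('buffered, not yet written');
$log->addWarning('threshold hit: buffer is flushed to file');
```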

Slide 45

Slide 45 text

Log in a structured way one thing that can help you greatly with managing huge amounts of logs with lots of different additional information in them is logging in a structured way (not only for debug but also for error logs)

Slide 46

Slide 46 text

JSON http://www.ietf.org/rfc/rfc4627.txt in my opinion a good format is json, since it is still human readable

Slide 47

Slide 47 text

$handler = new StreamHandler('path/to/your.log', Logger::WARNING);
$handler->setFormatter(new JsonFormatter());
$log->pushHandler($handler);
in Monolog you can just set a formatter for this

Slide 48

Slide 48 text

Logs from other services but of course your application is probably a bit more complex (SOA, anyone?), so you also have other services somewhere logging something

Slide 49

Slide 49 text

web server http service http service http service http service user request log log log log log The setup may look like this: a request comes in to a web server, and your PHP application there calls other services, each of which has its own logs, for example the HTTP service there. Now if an error happens in the HTTP service, we are probably going to display an error in our PHP app as well. But how can we then identify which exception on the HTTP service led to the displayed error on the web server?

Slide 50

Slide 50 text

Correlation / Tracing ID a nice way of doing this in a generalized, non-custom way is by using a common correlation or tracing ID

Slide 51

Slide 51 text

web server http service http service http service http service create unique trace_id for request user request trace_id trace_id trace_id trace_id log log log log log so when the user request first hits your system you generate a unique trace_id for this request and you then pass this to all underlying services through an http header.
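A sketch of that in PHP (the X-Trace-Id header name matches the slides; the service URL and ID format are illustrative assumptions):

```php
<?php
// reuse an incoming trace id, or create one at the edge of the system
$traceId = isset($_SERVER['HTTP_X_TRACE_ID'])
    ? $_SERVER['HTTP_X_TRACE_ID']
    : md5(uniqid(gethostname(), true));

// pass it on with every downstream service call
$ch = curl_init('http://some-internal-service/endpoint'); // hypothetical
curl_setopt($ch, CURLOPT_HTTPHEADER, array('X-Trace-Id: ' . $traceId));
```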

Slide 52

Slide 52 text

X-Trace-Id: bbr8ehb984tbab894

Slide 53

Slide 53 text

web server http service http service http service http service create unique trace_id for request user request trace_id trace_id trace_id trace_id log log log log log everyone then puts this tracing ID in every log they write. So if you have a tracing ID, you can easily find all logs for this request by just grepping for the trace_id

Slide 54

Slide 54 text

but... usually your application does not look like this

Slide 55

Slide 55 text

but more like this: you have multiple servers lying around

Slide 56

Slide 56 text

which means ssh-ing into all of these servers (dsh...) and then grepping for information over multiple logs (access log, error log, debug logs, ...) can become quite tedious and time consuming

Slide 57

Slide 57 text

Aggregate the logs in a central place so to make it easier: work on a central log management and ...

Slide 58

Slide 58 text

Make them easily full-text searchable also ...

Slide 59

Slide 59 text

Make them aggregate-able and make different kind of logs aggregateable

Slide 60

Slide 60 text

Always Log to file but whatever you do and whatever we are going to talk about...

Slide 61

Slide 61 text

Seriously...

Slide 62

Slide 62 text

Always Log to file because this is your backup for when your central log management solution fails (network errors, etc.). And I guarantee you, it will fail sometime, probably at the worst moment

Slide 63

Slide 63 text

Directly to a database the first naive approach to central log management is to log directly from your application to a database (you saw the MongoDBHandler in Monolog)

Slide 64

Slide 64 text

webserver webserver webserver DB setup would look like this: everything in one place, easily searchable, great, finished

Slide 65

Slide 65 text

Disadvantages not quite, it has some disadvantages

Slide 66

Slide 66 text

Database is down? what happens if

Slide 67

Slide 67 text

Database is slow?

Slide 68

Slide 68 text

Database is full?

Slide 69

Slide 69 text

How to integrate access logs?

Slide 70

Slide 70 text

Influences application performance because of all the database queries you are doing. And chances are, if there are problems on your platform and multiple errors are occurring, directly writing to a database will make your problems worse

Slide 71

Slide 71 text

Frontend? also there is still no frontend to easily search and monitor exceptions, of course there are sql clients, phpmyadmin, rockmongo etc, but they are multi purpose tools and not really made for displaying and filtering logs

Slide 72

Slide 72 text

Better solutions? are there ...

Slide 73

Slide 73 text

graylog2 http://graylog2.org/ one tool that I quite like for this is called Graylog2

Slide 74

Slide 74 text

Full text search it comes out of the box with a very nice interface to do...

Slide 75

Slide 75 text

Structured Messages in ...

Slide 76

Slide 76 text

Metrics & Alarms and offers metrics, alarms on your logs

Slide 77

Slide 77 text

that's what the interface looks like, you'll see this in action in a minute

Slide 78

Slide 78 text

https://github.com/bashofmann/ vm_graylog2_amqp_logstash_apache

Slide 79

Slide 79 text

Graylog2 UDP GELF Messages elasticsearch webserver webserver webserver the easiest approach is to just send the messages in a special format (GELF, supported by a Monolog handler) with a UDP request from your app servers to Graylog, which stores them in Elasticsearch. UDP is quite nice since it is fire and forget, so it does not influence your application too much (if you reuse connections and don't create a new one for each log)

Slide 80

Slide 80 text

{ "version": "1.0", "host": "www1", "short_message": "Short message", "full_message": "Backtrace here\n \nmore stuff", "timestamp": 1291899928.412, "level": 1, "facility": "payment-backend", "file": "/var/www/somefile.rb", "line": 356, "_user_id": 42, "_something_else": "foo" } a typical Graylog GELF message looks like this: some default fields, plus user-specific fields prefixed with an underscore
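A bare-bones sketch of sending such a message over UDP (host and port are assumptions; real GELF also chunks and compresses large messages, so in practice you would use Monolog's GelfHandler instead):

```php
<?php
$message = json_encode(array(
    'version'       => '1.0',
    'host'          => gethostname(),
    'short_message' => 'Short message',
    'level'         => 1,
    'facility'      => 'payment-backend',
    '_user_id'      => 42,
));

// fire and forget: errors are ignored, the call does not block the request
$socket = @fsockopen('udp://graylog.example.com', 12201); // assumed host
if ($socket !== false) {
    fwrite($socket, $message);
    fclose($socket);
}
```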

Slide 81

Slide 81 text

Disadvantages but ...

Slide 82

Slide 82 text

Graylog/ elasticsearch is down? still, if Graylog/Elasticsearch is down, you are losing logs

Slide 83

Slide 83 text

Graylog/ elasticsearch is full?

Slide 84

Slide 84 text

Packet loss udp has the disadvantage that because of network errors your logs possibly don't arrive

Slide 85

Slide 85 text

Graylog2 elasticsearch webserver webserver webserver AMQP GELF GELF GELF GELF better approach, put a queue in between, also good for load balancing

Slide 86

Slide 86 text

Don't influence your application by logging but even if you push something into a queue, you are still influencing your production system unnecessarily

Slide 87

Slide 87 text

Always Log to file and remember you should log to file anyways

Slide 88

Slide 88 text

logstash http://logstash.net/ enter the next tool we are using, logstash is a very powerful tool to handle log processing

Slide 89

Slide 89 text

input filter output basic workflow is that you have some input where logstash gets log messages in, on this input you can execute multiple filters that modify the message and then you can output the filtered message somewhere

Slide 90

Slide 90 text

Very rich plugin system to do this it offers a very large and rich plugin system for all kinds of inputs, filters and outputs, and you can also write your own

Slide 91

Slide 91 text

Graylog2 elasticsearch webserver webserver webserver AMQP log log log logstash logstash logstash so our setup would look like this, logstash instances on each app server take the local log files, parse and filter them and send them to the queue

Slide 92

Slide 92 text

input {
  file {
    type => "error"
    path => [ "/var/logs/php/*.log" ]
    add_field => [ "severity", "error" ]
  }
  file {
    type => "access"
    path => [ "/var/logs/apache/*_access.log" ]
    add_field => [ "severity", "info" ]
  }
}
input configuration

Slide 93

Slide 93 text

filter {
  grok {
    match => ["@source", "\/%{USERNAME:facility}\.log$"]
  }
  grok {
    type => "access"
    pattern => "^%{IP:OriginalIp}\s[a-zA-Z0-9_-]+\s[a-zA-Z0-9_-]+\s\[.*?\]\s\"%{DATA:Request}..."
  }
}
filter configuration

Slide 94

Slide 94 text

output {
  amqp {
    host => "amqp.host"
    exchange_type => "fanout"
    name => "logs"
  }
}
output configuration

Slide 95

Slide 95 text

Measuring now that you have all the log messages in a central, searchable, aggregateable place, let's move on to measuring

Slide 96

Slide 96 text

For knowing what happens so that you know what actually happens in your application and in your system

Slide 97

Slide 97 text

Technical Metrics and with that i mean technical metrics like load on your servers, available memory, requests, exceptions, response times (a bit more on that later)

Slide 98

Slide 98 text

Business Metrics but also and probably even more important, business metrics

Slide 99

Slide 99 text

Define your KPIs or so called key performance indicators, example: signups, profile views, messages, uploaded images, average volume per sale, ... it really depends on your application

Slide 100

Slide 100 text

Graphite http://graphite.wikidot.com/ again there are many tools available to collect and display these metrics: one i want to highlight is graphite

Slide 101

Slide 101 text

webserver webserver webserver graphite UDP the easiest way to use it is just sending UDP requests with the metric you want to count to Graphite, but if you're collecting and measuring lots of stuff, this could put quite a high load on Graphite

Slide 102

Slide 102 text

StatsD https://github.com/etsy/statsd/ etsy wrote a great small tool called statsd to help you with that

Slide 103

Slide 103 text

webserver webserver webserver graphite UDP statsd you can just hook it in between your servers and Graphite as a small load-balancing daemon. It receives all messages, aggregates them, and then flushes them to Graphite at a certain interval, like every 10 seconds
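Sending a counter to StatsD is just a tiny UDP datagram; a sketch (host, port and metric name are assumptions):

```php
<?php
function statsdIncrement($metric)
{
    $socket = @fsockopen('udp://statsd.example.com', 8125); // assumed host
    if ($socket !== false) {
        fwrite($socket, $metric . ':1|c'); // "increment this counter by 1"
        fclose($socket);
    }
}

statsdIncrement('signups');
```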

Slide 104

Slide 104 text

webserver webserver webserver statsd statsd statsd graphite aggregated UDP message of course you can also run one StatsD on every web server, so that a single StatsD does not get overloaded

Slide 105

Slide 105 text

webserver webserver webserver statsd statsd statsd graphite aggregated UDP message statsd and you can even chain multiple StatsD instances

Slide 106

Slide 106 text

Metric? what is a metric: three different types

Slide 107

Slide 107 text

Counters counters (how often did something happen)

Slide 108

Slide 108 text

Timers timer (how long did something take)

Slide 109

Slide 109 text

Gauges gauges (collect arbitrary values, like the amount of a sale)
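On the wire, the three types map to StatsD's simple text protocol like this (metric names are made up):

```text
signups:1|c           counter: increment "signups" by 1
response_time:320|ms  timer: one sample of 320 milliseconds
sale_amount:42|g      gauge: set "sale_amount" to an arbitrary value
```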

Slide 110

Slide 110 text

graphite comes with a powerful interface where you can plot and aggregate this data into graphs and perform different mathematical functions on it to get it exactly the way you want to display your data

Slide 111

Slide 111 text

https://metrics.librato.com/ and if you don't want to be bothered with setting graphite up, you can also use hosted solutions like librato (statsd comes with different backends to push data to)

Slide 112

Slide 112 text

http://mashable.com/2010/10/13/etsy-office-pics/ and if you have all of this, you should make your graphs really visible, here a picture from the etsy office

Slide 113

Slide 113 text

Remember: Response Times last but not least:

Slide 114

Slide 114 text

Easiest approach: Get it out of the access log

Slide 115

Slide 115 text

192.168.56.1 - - [09/Jul/2012:19:18:19 +0200] "GET /rg_trunk/webroot/c/af10c/ images/template/rg_logo_default.png HTTP/ 1.1" 200 882 "http://devm/rg_trunk/webroot/ directory/publications/" this is what it usually looks like

Slide 116

Slide 116 text

Is this actually what we want to measure? Is this really the response time the user sees?

Slide 117

Slide 117 text

External Resources CSS JavaScript Images Slow connections Slow Browsers SSL Handshakes DNS resolving so for the user, the actual response time is when all of this has finished loading

Slide 118

Slide 118 text

That is what we want to measure!

Slide 119

Slide 119 text

https://github.com/lognormal/boomerang

Slide 120

Slide 120 text

browser JS: boomerang logstash trackingServer access log requests tracking image with timing information as query parameters graphite statsd Boomerang collects load times of the current page from various sources (e.g. the window.performance API) and with them requests a tracking image from a tracking server you specify (it should be on another domain, so that you don't send unnecessary cookies there). On this tracking server, a Logstash instance can parse the access logs and then send the collected timers via StatsD to Graphite

Slide 121

Slide 121 text

http://twitter.com/BastianHofmann http://profiles.google.com/bashofmann http://lanyrd.com/people/BastianHofmann http://speakerdeck.com/u/bastianhofmann https://github.com/bashofmann https://www.researchgate.net/profile/Bastian_Hofmann/ [email protected] thanks, you can contact me on any of these platforms or via mail. please go to joind.in and rate the talks you have seen