Slide 1

Slide 1 text

Log all the things!
Honza Král (@honzakral)

Slide 2

Slide 2 text

Logs?

Slide 3

Slide 3 text

Log lines
Twitter feed
Invoices
Metrics
Events!

Slide 4

Slide 4 text

Why?

Slide 5

Slide 5 text

What happened last Tuesday?

Slide 6

Slide 6 text

Multiple machines
Multiple logs
Analysis/Discovery
Time period
Grep?

Slide 7

Slide 7 text

Time? Time?! Time!
apache: [23/Jan/2014:17:11:55 +0000]
unix timestamp: 1390994740
ISO 8601: 2009-01-01T12:00:00+01:00
log4j: [2014-01-29 12:28:25,470]
postfix.log: Feb 3 20:37:35
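Every one of these formats has to be normalised to a single representation before events from different sources can be compared or sorted. A minimal sketch using only the Python standard library: each parser handles one of the formats above, lines that carry no UTC offset (log4j, postfix/syslog) are assumed to be in UTC, and `ASSUMED_YEAR` is a hypothetical fill-in for the year that syslog-style lines omit.

```python
from datetime import datetime, timezone

# Hypothetical: syslog/postfix lines carry no year, so one must be assumed
ASSUMED_YEAR = 2014


def parse_apache(s):
    # "[23/Jan/2014:17:11:55 +0000]": day/month-name/year:time and an offset
    return datetime.strptime(s.strip("[]"), "%d/%b/%Y:%H:%M:%S %z")


def parse_unix(s):
    # "1390994740": seconds since the epoch, which is UTC by definition
    return datetime.fromtimestamp(int(s), tz=timezone.utc)


def parse_iso8601(s):
    # "2009-01-01T12:00:00+01:00": fromisoformat keeps the offset (Python 3.7+)
    return datetime.fromisoformat(s)


def parse_log4j(s):
    # "[2014-01-29 12:28:25,470]": comma before milliseconds and no offset;
    # assumes the producing host logs in UTC
    naive = datetime.strptime(s.strip("[]"), "%Y-%m-%d %H:%M:%S,%f")
    return naive.replace(tzinfo=timezone.utc)


def parse_syslog(s, year=ASSUMED_YEAR):
    # "Feb 3 20:37:35": neither year nor offset, so both are assumed
    naive = datetime.strptime(f"{year} {s}", "%Y %b %d %H:%M:%S")
    return naive.replace(tzinfo=timezone.utc)


def to_utc_iso(dt):
    # One canonical string form for storage and comparison
    return dt.astimezone(timezone.utc).isoformat()
```

For example, `to_utc_iso(parse_iso8601("2009-01-01T12:00:00+01:00"))` gives `"2009-01-01T11:00:00+00:00"`. In the Elastic Stack this job belongs to Logstash's `date` filter; the sketch only shows why the step is unavoidable.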

Slide 8

Slide 8 text

Correlate events
Web server logs vs. load balancer: see immediately that caching is off and static files are leaking to gunicorn
Web server vs. database
500s vs. deploys: the new version has a bug
Traffic vs. ad campaigns
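Correlating two event streams comes down to lining them up on a shared time axis. The sketch below is a hypothetical helper, not part of the Elastic Stack: it counts 5xx errors in a window before and after each deploy and flags deploys where the count jumps. The 10-minute window and the factor of 3 are arbitrary assumptions, and both inputs are assumed to be datetimes already normalised to one timezone.

```python
from datetime import datetime, timedelta


def errors_around(deploys, errors, window=timedelta(minutes=10)):
    """For each deploy, count errors in the window before and after it."""
    report = []
    for deploy in sorted(deploys):
        before = sum(1 for e in errors if deploy - window <= e < deploy)
        after = sum(1 for e in errors if deploy <= e < deploy + window)
        report.append((deploy, before, after))
    return report


def suspicious_deploys(report, factor=3):
    # Flag deploys after which errors grew by more than the (assumed) factor
    return [d for d, before, after in report if after > factor * max(before, 1)]


# Hypothetical data: one deploy, one error before it, eight right after it
t0 = datetime(2014, 1, 28, 12, 0)
deploys = [t0]
errors = [t0 - timedelta(minutes=5)] + [t0 + timedelta(minutes=m) for m in range(1, 9)]
report = errors_around(deploys, errors)
```

With the sample data `report` is `[(t0, 1, 8)]` and `suspicious_deploys(report)` returns `[t0]`. In Kibana the same question is usually answered visually by plotting both series on one time axis.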

Slide 9

Slide 9 text

Ideal state
Central storage, even for data from different systems
Enriched data: IP -> location, hostname; URL -> author, product, category
Search: user:honza status:404
Analysis
Visualisations for easy pattern discovery
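Enrichment means deriving fields that the raw line does not contain. A minimal sketch of the IP -> location and URL -> category lookups, assuming two small hypothetical tables; a real deployment would use a GeoIP database (Logstash's `geoip` filter) and the application's own catalogue instead.

```python
import ipaddress
from urllib.parse import urlparse

# Hypothetical lookup tables standing in for a GeoIP database and a catalogue
NETWORK_LOCATIONS = {
    ipaddress.ip_network("10.0.0.0/8"): "datacenter",
    ipaddress.ip_network("192.0.2.0/24"): "office",
}
URL_CATEGORIES = {"/products/": "product", "/blog/": "article"}


def enrich(event):
    """Return a copy of a parsed log event with derived fields added."""
    enriched = dict(event)
    ip = ipaddress.ip_address(event["client_ip"])
    # First network containing the client IP wins
    enriched["location"] = next(
        (loc for net, loc in NETWORK_LOCATIONS.items() if ip in net), "unknown"
    )
    # Categorise by URL path prefix, ignoring host and query string
    path = urlparse(event["url"]).path
    enriched["category"] = next(
        (cat for prefix, cat in URL_CATEGORIES.items() if path.startswith(prefix)),
        "other",
    )
    return enriched
```

Enriching once at ingest time, rather than at query time, is what makes searches such as `category:product` cheap later on.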

Slide 10

Slide 10 text

Centralised Logging

Slide 11

Slide 11 text

Steps
Collect data
Parse data
Enrich data
Store data
Search and aggregate
Visualize data
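The steps above map naturally onto a chain of Python generators, each consuming the previous stage. A toy sketch, assuming a hypothetical `"<status> <user> <path>"` line format and an in-memory list standing in for the datastore; visualisation is left out.

```python
import json
from collections import Counter


def collect(lines):
    # Collect: any iterable of raw lines (a file, a socket, ...)
    for line in lines:
        yield line.rstrip("\n")


def parse(lines):
    # Parse: hypothetical "<status> <user> <path>" format
    for line in lines:
        status, user, path = line.split(" ", 2)
        yield {"status": int(status), "user": user, "path": path}


def enrich(events):
    # Enrich: add a field that was not present in the raw line
    for event in events:
        event["error"] = event["status"] >= 400
        yield event


def store(events, storage):
    # Store: JSON documents appended to a list standing in for a datastore
    for event in events:
        storage.append(json.dumps(event))


def aggregate(storage, field):
    # Search and aggregate: count stored documents per value of one field
    return Counter(json.loads(doc)[field] for doc in storage)


storage = []
store(enrich(parse(collect(["200 honza /", "404 honza /missing", "500 alice /api\n"]))), storage)
```

After this runs, `aggregate(storage, "user")` counts two documents for `honza` and one for `alice`. In the Elastic Stack, Beats and Logstash cover collect, parse and enrich, Elasticsearch covers store and aggregate, and Kibana covers visualisation.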

Slide 12

Slide 12 text

Elastic Stack

Slide 13

Slide 13 text

Steps in Elastic Stack
Collect data
Parse data
Enrich data
Store data
Search and aggregate
Visualize data

Slide 14

Slide 14 text

Steps in Elastic Stack
Collect data
Parse data
Enrich data
Store data
Search and aggregate
Visualize data

Slide 15

Slide 15 text

No content

Slide 16

Slide 16 text

metricbeat:
  modules:
    - module: redis
      metricsets: ["info"]
      hosts: ["host1"]
      period: 1s
      enabled: true
    - module: apache
      metricsets: ["status"]
      hosts: ["host1"]
      period: 30s
      enabled: true

filebeat:
  prospectors:
    - paths:
        - "logs/access.log"
      document_type: access
      multiline:
        pattern: ^#
        negate: true
        match: after

protocols:
  http:
    ports: [80, 8000]
  mysql:
    ports: [3306]
  redis:
    ports: [6379]
  pgsql:
    ports: [5432]
  thrift:
    ports: [9090]

output:
  logstash:
    hosts: ["localhost:5044"]

Slide 17

Slide 17 text

No content

Slide 18

Slide 18 text

Inputs
Monitoring: collectd, graphite, ganglia, snmptrap, zenoss
Datastores: elasticsearch, redis, sqlite, s3
Queues: kafka, rabbitmq, zeromq
Logging: beats, eventlog, gelf, log4j, relp, syslog, varnishlog
Platforms: drupal_dblog, gemfire, heroku, sqs, s3, twitter
Local: exec, generator, file, stdin, pipe, unix
Protocol: imap, irc, stomp, tcp, udp, websocket, wmi, xmpp
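As one example of the protocol inputs, an application can push events straight to Logstash's `tcp` input using the `json_lines` codec, which expects one JSON document per newline-terminated line. A minimal sketch; port 5000 is an assumption and must match the `port` set on the input, e.g. `input { tcp { port => 5000 codec => json_lines } }`.

```python
import json
import socket


def to_json_line(event):
    # json_lines codec: one JSON document per line, newline-terminated
    return (json.dumps(event) + "\n").encode("utf-8")


def send_events(events, host="localhost", port=5000):
    # Port 5000 is an assumed value; it must match the tcp input's "port"
    with socket.create_connection((host, port), timeout=5) as sock:
        for event in events:
            sock.sendall(to_json_line(event))
```

Usage would be `send_events([{"user": "honza", "status": 404}])` against a running Logstash. Writing to a file and letting filebeat ship it, as in the configuration above, is usually more robust because events survive a Logstash restart.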

Slide 19

Slide 19 text

Filters: aggregate, alter, anonymize, collate, csv, cidr, clone, cipher, checksum, date, dns, drop, elasticsearch, extractnumbers, environment, elapsed, fingerprint, geoip, grok, i18n, json, json_encode, kv, mutate, metrics, multiline, metaevent, prune, punct, ruby, range, syslog_pri, sleep, split, throttle, translate, uuid, urldecode, useragent, xml, zeromq, ...
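The `grok` and `date` filters are the workhorses of the parse step. A rough Python stand-in, using a plain regular expression that extracts roughly the fields grok's `COMBINEDAPACHELOG` pattern produces; it is a simplification for illustration, not the actual grok definition.

```python
import re
from datetime import datetime

# Simplified regex approximating the fields of grok's COMBINEDAPACHELOG
COMBINED = re.compile(
    r'(?P<clientip>\S+) (?P<ident>\S+) (?P<auth>\S+) '
    r'\[(?P<timestamp>[^\]]+)\] '
    r'"(?P<verb>\S+) (?P<request>\S+) HTTP/(?P<httpversion>[\d.]+)" '
    r'(?P<response>\d{3}) (?P<bytes>\d+|-) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)


def parse_access_line(line):
    match = COMBINED.match(line)
    if match is None:
        # Logstash's grok would tag such an event with _grokparsefailure
        return None
    event = match.groupdict()
    # Like the date filter: turn the text timestamp into a real datetime
    event["timestamp"] = datetime.strptime(event["timestamp"], "%d/%b/%Y:%H:%M:%S %z")
    event["response"] = int(event["response"])
    event["bytes"] = 0 if event["bytes"] == "-" else int(event["bytes"])
    return event
```

Converting `response` and `bytes` to integers matters once the data is stored: numeric fields can be aggregated (averages, ranges), text fields cannot.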

Slide 20

Slide 20 text

Outputs
Store: elasticsearch, gemfire, mongodb, redis, riak, rabbitmq, solr
Monitoring: ganglia, graphite, graphtastic, nagios, opentsdb, statsd, zabbix
Notification: email, hipchat, irc, pagerduty, sns
Protocol: gelf, http, lumberjack, metriccatcher, stomp, tcp, udp, websocket, xmpp
External service: google big query, google cloud storage, jira, loggly, riemann, s3, sqs, syslog, datadog
External monitoring: boundary, circonus, cloudwatch, librato
Local: csv, dots, exec, file, pipe, stdout, null

Slide 21

Slide 21 text

No content

Slide 22

Slide 22 text

Distributed Search Engine
Open Source
Document-based
Based on Lucene
JSON over HTTP
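"JSON over HTTP" means a search is just a POST with a JSON body. A minimal standard-library sketch that builds the request for the earlier `user:honza status:404` example; the URL and the `logstash-*` index pattern are assumptions. The `query_string` query accepts that Lucene-style syntax, and `default_operator: AND` makes both conditions required (the default is OR).

```python
import json
import urllib.request


def build_search(base_url, index, query, size=10):
    """Build, but do not send, an Elasticsearch _search request."""
    body = {
        "query": {"query_string": {"query": query, "default_operator": "AND"}},
        "size": size,
    }
    return urllib.request.Request(
        f"{base_url}/{index}/_search",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def search(request):
    # Sends the request and decodes the JSON response; needs a live cluster
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)


request = build_search("http://localhost:9200", "logstash-*", "user:honza status:404")
```

Calling `search(request)` against a running node returns the matching documents under `["hits"]["hits"]`. Any HTTP client, including `curl`, works the same way, which is what makes the API easy to script.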

Slide 23

Slide 23 text

Data Management
Cluster: collection of nodes
Index: collection of shards
Shard: unit of scale, distributed across the cluster, primary and replica
[Diagram: primary and replica shards of the "orders" and "products" indices spread across node 1, node 2 and node 3]
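Two ideas from this slide fit in a few lines: a document is routed to a shard by hashing its routing value (historically documented as `hash(_routing) % number_of_primary_shards`, with `_id` as the default routing value), and a replica is never placed on the same node as its primary. A toy sketch: Elasticsearch uses Murmur3, `zlib.crc32` stands in here only as a stable hash, and the round-robin allocator is hypothetical, far simpler than the real one.

```python
import zlib


def shard_for(doc_id, number_of_primary_shards):
    # Stable hash of the routing value modulo the number of primaries;
    # crc32 is a stand-in for the Murmur3 hash Elasticsearch uses
    return zlib.crc32(doc_id.encode("utf-8")) % number_of_primary_shards


def allocate(number_of_primary_shards, number_of_replicas, nodes):
    """Toy allocator: copies of one shard always land on different nodes."""
    if number_of_replicas + 1 > len(nodes):
        # Elasticsearch would instead leave such replicas unassigned
        raise ValueError("not enough nodes to separate every copy")
    layout = {node: [] for node in nodes}
    for shard in range(number_of_primary_shards):
        for copy_index in range(number_of_replicas + 1):
            # Consecutive nodes for consecutive copies, starting at `shard`
            node = nodes[(shard + copy_index) % len(nodes)]
            role = "primary" if copy_index == 0 else "replica"
            layout[node].append((shard, role))
    return layout


layout = allocate(4, 1, ["node 1", "node 2", "node 3"])
```

Because routing depends on the number of primary shards, that number is fixed when an index is created, while the number of replicas can be changed at any time.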

Slide 24

Slide 24 text

Time-based data flow
Current: extra replicas to speed up search, on stronger boxes
Week old: snapshot, keep only 1 replica
Month old: move to weaker boxes
2 months: close the indices
3 months: delete
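With one index per day, each stage of this flow becomes a decision based on index age. A minimal sketch, assuming Logstash's default daily naming (`logstash-YYYY.MM.DD`) and mapping "week", "month", "2 months" and "3 months" to 7, 30, 60 and 90 days; the thresholds and action labels are hypothetical, and actually applying them (replica counts, shard allocation filtering, snapshots, close, delete) is left to the cluster APIs or a tool such as Curator.

```python
from datetime import date, datetime

# Hypothetical age thresholds in days, checked from oldest stage down
STAGES = [
    (90, "delete"),
    (60, "close"),
    (30, "move to weaker boxes"),
    (7, "snapshot, keep only 1 replica"),
    (0, "stronger boxes, extra replicas"),
]


def index_date(index_name, prefix="logstash-"):
    # "logstash-2014.01.29" -> date(2014, 1, 29)
    return datetime.strptime(index_name[len(prefix):], "%Y.%m.%d").date()


def action_for(index_name: str, today: date) -> str:
    age = (today - index_date(index_name)).days
    for min_age, action in STAGES:
        if age >= min_age:
            return action
    return "unknown"  # index dated in the future
```

Deleting a whole daily index is far cheaper than deleting individual documents, which is the main reason time-based data is split into one index per day.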

Slide 25

Slide 25 text

No content

Slide 26

Slide 26 text

No content

Slide 27

Slide 27 text

No content

Slide 28

Slide 28 text

No content

Slide 29

Slide 29 text

Architecture
Collect -> Enrich -> Store -> Visualize

Slide 30

Slide 30 text

Logging and Python

Slide 31

Slide 31 text

Enhance your logs
Track metrics: execution time, query time, # of queries
Include metadata: user_id, content
Log as JSON
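With the standard `logging` module, all three points come down to a custom `Formatter` plus the `extra` argument. A minimal sketch; the field names, the logger name and the simulated query loop are hypothetical, and an in-memory buffer stands in for the log file that a shipper would read.

```python
import io
import json
import logging
import time


class JSONFormatter(logging.Formatter):
    """Render each log record as a single-line JSON document."""

    converter = time.gmtime  # timestamps in UTC
    # Hypothetical extra fields copied into the document when present
    EXTRA_FIELDS = ("user_id", "query_count", "execution_time")

    def format(self, record):
        doc = {
            "@timestamp": self.formatTime(record, "%Y-%m-%dT%H:%M:%SZ"),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        for field in self.EXTRA_FIELDS:
            if hasattr(record, field):
                doc[field] = getattr(record, field)
        return json.dumps(doc)


def make_logger(name, stream):
    logger = logging.getLogger(name)
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JSONFormatter())
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    logger.propagate = False  # avoid duplicate output via the root logger
    return logger


def handle_request(log, user_id):
    start = time.perf_counter()
    query_count = 0
    for _ in range(3):  # hypothetical stand-in for three database queries
        query_count += 1
    log.info(
        "request handled",
        extra={
            "user_id": user_id,
            "query_count": query_count,
            "execution_time": time.perf_counter() - start,
        },
    )


# In production the stream would be a log file read by a shipper
buffer = io.StringIO()
handle_request(make_logger("app.views", buffer), user_id=42)
```

Because every line is already JSON, Logstash or Elasticsearch does not need a grok pattern to parse it, and numeric fields such as `execution_time` can be aggregated directly.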

Slide 32

Slide 32 text

Structlog
Add structured info
Track info through services
Log to file
Add filebeat to read the file
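structlog handles bound context and JSON rendering for you. To keep the sketch free of third-party packages, the standard library stands in for it here: a `contextvars` variable holds a request id, a logging `Filter` stamps it on every record, and the id is forwarded to downstream services so their logs can be joined on it. The `X-Request-ID` header name is a common convention assumed for illustration, and the temporary file stands in for the path filebeat would be configured to read.

```python
import contextvars
import json
import logging
import os
import tempfile
import uuid

# Id of the request being handled in the current context (thread or task)
request_id_var = contextvars.ContextVar("request_id", default=None)
# Assumed header name; the talk does not prescribe one
REQUEST_ID_HEADER = "X-Request-ID"


class RequestIdFilter(logging.Filter):
    """Stamp every record with the current request id."""

    def filter(self, record):
        record.request_id = request_id_var.get()
        return True


class JSONLineFormatter(logging.Formatter):
    def format(self, record):
        return json.dumps(
            {"message": record.getMessage(), "request_id": record.request_id}
        )


def start_request(incoming_headers):
    # Reuse the upstream service's id if it sent one, otherwise create one
    request_id = incoming_headers.get(REQUEST_ID_HEADER) or uuid.uuid4().hex
    request_id_var.set(request_id)
    return request_id


def outgoing_headers():
    # Forward the same id so the next service logs it too
    return {REQUEST_ID_HEADER: request_id_var.get()}


def setup_file_logger(path):
    # Log to a file; filebeat would then be pointed at this path
    handler = logging.FileHandler(path)
    handler.addFilter(RequestIdFilter())
    handler.setFormatter(JSONLineFormatter())
    logger = logging.getLogger("service")
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    logger.propagate = False
    return logger, handler


# Usage: a temporary file stands in for the real log location
log_path = os.path.join(tempfile.mkdtemp(), "service.log")
logger, handler = setup_file_logger(log_path)
start_request({REQUEST_ID_HEADER: "abc123"})
logger.info("charging card")
handler.close()
```

Once every service logs the same `request_id`, a single search for that id in Kibana shows the whole path of one request across machines.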

Slide 33

Slide 33 text

Thanks! Honza Král @honzakral