Slide 1

Slide 1 text

The practice & evolution of HTTP access monitoring
Aaron Mildenstein, Logstash Developer

Slide 2

Slide 2 text

In the beginning…

Slide 3

Slide 3 text

Apache HTTP Server
• Logs!
• LogFormat "%h %l %u %t \"%r\" %>s %b" common
  CustomLog logs/access_log common
• 127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326
• Tools
  • cat
  • tail
  • grep

Slide 4

Slide 4 text

cat
This cat doesn't purr…

Slide 5

Slide 5 text

tail
Let's put a tail on that cat…

Slide 6

Slide 6 text

tail
Let's put a tail on that cat…
• What's the difference between:
  • tail -f
  • tail -F
• What does the -n flag do?
• What does the -v flag do?
• What does the --pid flag do?
(see the shell sketch below)
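
A minimal shell sketch of those flags, assuming GNU coreutils tail and an Apache access_log in the working directory; the httpd process name is also an assumption:

tail -f access_log                           # follow the open file descriptor; loses track of the file after log rotation
tail -F access_log                           # follow by file name and retry, so it survives rotation
tail -n 50 access_log                        # print the last 50 lines instead of the default 10
tail -v access_log                           # print a header naming the file (handy when tailing multiple files)
tail -f --pid=$(pgrep -o httpd) access_log   # stop following once the httpd process exits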

Slide 7

Slide 7 text

grep
g/re/p (globally search a regular expression and print)
• Flags, flags, flags!
• No, seriously.
• I'm not going to attempt to describe all the things you can do with grep.
• No, really, it's time to move on to other examples.
• Okay, fine, just one thing.
• cat access.log | grep 404 | tail
• See what I did there?
• Are you happy now?
(a couple more grep variations below)
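
Building on the one-liner above, a hedged sketch of a few grep variations against the same (assumed) access.log; the patterns match the common log format shown earlier:

grep ' 404 ' access.log | tail           # same idea, no cat needed; surrounding spaces avoid matching 404 inside paths or byte counts
grep -c ' 404 ' access.log               # just count the 404s
grep -E '" [45][0-9]{2} ' access.log     # any 4xx or 5xx response in a common-format log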

Slide 8

Slide 8 text

But will it scale?
Sadly, no...

Slide 9

Slide 9 text

Anyone else ever try to do this? I used to do it all the time :-(

Slide 10

Slide 10 text

What if I want to see more?
Apologies to Billy Idol…

Slide 11

Slide 11 text

Elastic Stack (the early edition)
• Ingest
  • Logstash
• Store, Search, and Analyze
  • Elasticsearch
• Visualize
  • Kibana

Slide 12

Slide 12 text

Logstash
• Ingest

input {
  file {
    path => "/path/to/access_log"
  }
}

message => "127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326"

Slide 13

Slide 13 text

Logstash
• Ingest, then Tokenize

filter {
  grok {
    match => { "message" => "%{COMMONAPACHELOG}" }
  }
}

Slide 14

Slide 14 text

Logstash
• grok

clientip => "127.0.0.1"
ident => "-"
auth => "frank"
timestamp => "10/Oct/2000:13:55:36 -0700"
verb => "GET"
request => "/apache_pb.gif"
httpversion => 1.0
response => 200
bytes => 2326

Slide 15

Slide 15 text

Logstash + grok
• Pro
  • Simple!
  • No changes to HTTP server configuration needed
  • Common to many HTTP servers
• Con
  • CPU cost to parse everything
  • Still have to convert the date (see the date filter sketch below)
  • Adding anything custom requires re-tooling your grok
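
A hedged sketch of the date-conversion step called out in the cons: after grok extracts the timestamp field, Logstash's date filter parses it into @timestamp. This is a minimal fragment, not the complete pipeline from the talk:

filter {
  grok {
    match => { "message" => "%{COMMONAPACHELOG}" }
  }
  # Parse the Apache timestamp (e.g. 10/Oct/2000:13:55:36 -0700) into the event's @timestamp
  date {
    match => [ "timestamp", "dd/MMM/yyyy:HH:mm:ss Z" ]
  }
}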

Slide 16

Slide 16 text

"Computers can do that?!"
Apologies to Matt Groening

Slide 17

Slide 17 text

Apache HTTP Server
• Logs, part 2!

LogFormat "{\"@timestamp\": \"%{%Y-%m-%dT%H:%M:%S%z}t\", \"@version\": \"1\", \"vips\":[\"vip.example.com\"], \
  \"clientip\": \"%a\", \"duration\": %D, \
  \"status\": %>s, \"request\": \"%U%q\", \
  \"urlpath\": \"%U\", \"urlquery\": \"%q\", \
  \"bytes\": %B, \"verb\": \"%m\", \
  \"referer\": \"%{Referer}i\", \
  \"useragent\": \"%{User-agent}i\"}" ls_apache_json

CustomLog logs/json_access_log ls_apache_json

Slide 18

Slide 18 text

Logstash
• Pre-formatted JSON Ingest

input {
  file {
    path => "/path/to/json_access_log"
    codec => "json"
  }
}

Slide 19

Slide 19 text

Logstash
• without grok

clientip => "127.0.0.1"
@timestamp => "2000-10-10T20:55:36.000Z"
verb => "GET"
request => "/apache_pb.gif"
httpversion => 1.0
response => 200
bytes => 2326
duration => 123
referer => "…"
useragent => "…"
…

Slide 20

Slide 20 text

Logstash + pre-formatted JSON
• Pro
  • CPU cost dramatically reduced
  • Can add/remove fields without having to edit Logstash (minimal pipeline sketch below)
  • Can add complex fields that would be harder to grok
• Con
  • Not all HTTP servers can do this
  • Tedious to push changes to lots of servers
  • Custom fields (like vip names) require custom configuration
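
To make the "no Logstash edits" point concrete, a minimal end-to-end pipeline sketch for this approach: the events arrive as JSON with an ISO 8601 @timestamp (as the earlier "without grok" slide shows), so no grok or date filter is needed. The Elasticsearch host and index name are placeholders:

input {
  file {
    path => "/path/to/json_access_log"
    codec => "json"
  }
}
output {
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "apache-%{+YYYY.MM.dd}"
  }
}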

Slide 21

Slide 21 text

The beat goes on…
Apologies to Sonny & Cher…

Slide 22

Slide 22 text

Elastic Stack (the current edition)
• Ingest
  • Beats
  • Logstash
• Store, Search, and Analyze
  • Elasticsearch
• Visualize
  • Kibana

Slide 23

Slide 23 text

Packet capture: type
• Currently Packetbeat has several options for traffic capturing (see the packetbeat.yml sketch below):
  • pcap, which uses the libpcap library and works on most platforms, but it’s not the fastest option.
  • af_packet, which uses memory-mapped sniffing. This option is faster than libpcap and doesn’t require a kernel module, but it’s Linux-specific.
  • pf_ring, which makes use of an ntop.org project. This setting provides the best sniffing speed, but it requires a kernel module, and it’s Linux-specific.
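
As a hedged sketch of where the sniffer type is selected, the packetbeat.yml interface section (the device name eth0 is an assumption; the same layout appears on the later Beats slides):

interfaces:
  device: eth0      # interface to sniff
  type: af_packet   # or "pcap" (works on most platforms) or "pf_ring" (fastest; needs a kernel module)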

Slide 24

Slide 24 text

Packet capture: protocols
• dns
• http
• memcache
• mysql
• pgsql
• redis
• thrift
• mongodb

Slide 25

Slide 25 text

HTTP: ports
• Capture one port:
  • ports: [80]
• Capture multiple ports:
  • ports: [80, 8080, 8000, 5000, 8002]

Slide 26

Slide 26 text

HTTP: send_headers / send_all_headers
• Capture all headers:
  • send_all_headers: true
• Capture only named headers:
  • send_headers: ["host", "user-agent", "content-type", "referer"]

Slide 27

Slide 27 text

HTTP: hide_keywords
• The names of the keyword parameters are case insensitive.
• The values will be replaced with the 'xxxxx' string. This is useful for avoiding storing user passwords or other sensitive information.
• Only query parameters and top level form parameters are replaced.
• hide_keywords: ['pass', 'password', 'passwd']

Slide 28

Slide 28 text

Beats
• Ingest (server-side) with Elasticsearch target

interfaces:
  device: eth0
  type: af_packet

http:
  ports: [80]
  send_all_headers: true

output:
  elasticsearch:
    hosts: ["elasticsearch.example.com:9200"]

Slide 29

Slide 29 text

Beats
• Ingest (server-side) with Logstash target

interfaces:
  device: eth0
  type: af_packet

http:
  ports: [80]
  send_all_headers: true

output:
  logstash:
    hosts: ["logstash.example.com:5044"]
    tls:
      certificate_authorities: ["/path/to/certificate.crt"]

Slide 30

Slide 30 text

Why send to Logstash?
• Enrich your data!
  • geoip
  • useragent
  • dns
  • grok
  • kv
(geoip and useragent are shown on slide 32; a dns/kv sketch follows below)
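
As a hedged sketch of two of the other filters named here: dns can reverse-resolve the client address, and kv can split a query string into individual fields. The field names (client_ip, urlquery, urlparams) follow examples elsewhere in this deck and are assumptions about your event layout:

filter {
  # Reverse-resolve the client IP to a hostname, replacing the field's value
  dns {
    reverse => ["client_ip"]
    action  => "replace"
  }
  # Split a query string such as "?foo=1&bar=2" into fields under [urlparams]
  kv {
    source      => "urlquery"
    field_split => "&?"
    target      => "urlparams"
  }
}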

Slide 31

Slide 31 text

Logstash
• Ingest Beats (Pre-formatted JSON)

input {
  beats {
    port => 5044
    ssl => true
    ssl_certificate => "/path/to/certificate.crt"
    ssl_key => "/path/to/private.key"
    codec => "json"
  }
}

Slide 32

Slide 32 text

Logstash
• Filters

filter {
  # Enrich HTTP Packetbeats
  if [type] == "http" and "packetbeat" in [tags] {
    geoip {
      source => "client_ip"
    }
    useragent {
      source => "[http][request_headers][user-agent]"
      target => "useragent"
    }
  }
}

Slide 33

Slide 33 text

Extended JSON output from Beats + Logstash

"@timestamp": "2016-01-20T21:40:53.300Z",
"beat": {
  "hostname": "ip-172-31-46-141",
  "name": "ip-172-31-46-141"
},
"bytes_in": 189,
"bytes_out": 6910,
"client_ip": "68.180.229.41",
"client_port": 57739,
"client_proc": "",
"client_server": "",
"count": 1,
"direction": "in",
"http": {
  "code": 200,
  "content_length": 6516,
  "phrase": "OK",
  "request_headers": {
    "accept": "*/*",
    "accept-encoding": "gzip",
    "host": "example.com"

Slide 34

Slide 34 text

Extended JSON output from Beats + Logstash

    "user-agent": "Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/help/us/ysearch/slurp)"
  },
  "response_headers": {
    "connection": "keep-alive",
    "content-type": "application/rss+xml; charset=UTF-8",
    "date": "Wed, 20 Jan 2016 21:40:53 GMT",
    "etag": "\"8c0b25ce7ade4b79d5ccf1ebb656fa51\"",
    "last-modified": "Wed, 24 Jul 2013 20:31:04 GMT",
    "link": "; rel=\"https://api.w.org/\"",
    "server": "nginx/1.4.6 (Ubuntu)",
    "transfer-encoding": "chunked",
    "x-powered-by": "PHP/5.5.9-1ubuntu4.14"
  }
},
"ip": "172.31.46.141",
"method": "GET",
"params": "",
"path": "/tag/redacted/feed/",
"port": 80,
"proc": "",

Slide 35

Slide 35 text

Extended JSON output from Beats + Logstash

"query": "GET /tag/redacted/feed/",
"responsetime": 278,
"server": "",
"status": "OK",
"type": "http",
"@version": "1",
"host": "ip-172-31-46-141",
"tags": [
  "packetbeat"
],
"geoip": {
  "ip": "68.180.229.41",
  "country_code2": "US",
  "country_code3": "USA",
  "country_name": "United States",
  "continent_code": "NA",
  "region_name": "CA",
  "city_name": "Sunnyvale",
  "postal_code": "94089",
  "latitude": 37.42490000000001,
  "longitude": -122.00739999999999,

Slide 36

Slide 36 text

Extended JSON output from Beats + Logstash

  "dma_code": 807,
  "area_code": 408,
  "timezone": "America/Los_Angeles",
  "real_region_name": "California",
  "location": [
    -122.00739999999999,
    37.42490000000001
  ]
},
"useragent": {
  "name": "Yahoo! Slurp",
  "os": "Other",
  "os_name": "Other",
  "device": "Spider"
}

Slide 37

Slide 37 text

Logstash + beats (pre-formatted JSON)
• Pro
  • CPU cost dramatically reduced (Logstash side)
  • Simple configuration to capture everything
  • Logstash not necessary!
  • Useful to enrich data: geoip, useragent, headers, etc.
• Con
  • Cannot directly monitor SSL traffic
  • CPU cost (server side) scales with traffic volume, so it can be higher for heavy traffic
  • Uncaptured packet data is unrecoverable

Slide 38

Slide 38 text

Evolution? Is one path better than another?

Slide 39

Slide 39 text

Evolution? Is one path better than another?
• Unstructured log data
• Structured log data
• Captured packet data

Slide 40

Slide 40 text

Conclusions
• There are a lot of ways to monitor your traffic and put the data into Elasticsearch. Not all of them require log files anymore.
• With many options, choose the ingest scenario that works for you.
• There's also filebeat, topbeat, and several community-contributed beats available.
• Don't overlook enriching your data. There's a goldmine in there!

Slide 41

Slide 41 text

Questions? I'll be here all night…