The Practice and Evolution of HTTP Access Monitoring

A history of HTTP access monitoring.

An overview of the tools used in the past, and tools available today, which dramatically improve the visibility and value of your traffic data.

Aaron Mildenstein

July 13, 2016

Transcript

  1. The practice & evolution of HTTP access monitoring
    Aaron Mildenstein, Logstash Developer
  2. In the beginning…
  3. Apache HTTP Server
    • Logs!
        LogFormat "%h %l %u %t \"%r\" %>s %b" common
        CustomLog logs/access_log common
    • Sample entry:
        127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326
    • Tools
      • cat
      • tail
      • grep
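As a rough illustration of what the later slides have to pick apart, here is a minimal Python sketch that splits the sample entry into named fields. The regular expression and the field names (`host`, `ident`, `user`, and so on) are assumptions made for this example; they are not part of Apache or of the original slides.

```python
import re

# One named group per directive in the "common" LogFormat:
#   %h %l %u %t "%r" %>s %b
# Field names are illustrative, not an Apache convention.
COMMON_LOG = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<bytes>\d+|-)'
)


def parse_common(line):
    """Return a dict of fields, or None if the line does not match."""
    match = COMMON_LOG.match(line)
    return match.groupdict() if match else None


sample = ('127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] '
          '"GET /apache_pb.gif HTTP/1.0" 200 2326')
entry = parse_common(sample)
print(entry["host"], entry["status"], entry["bytes"])
```

Every value comes back as a string, so the status and byte count would still need converting before doing arithmetic on them.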
  4. cat: This cat doesn't purr…
  5. tail: Let's put a tail on that cat…
  6. tail: Let's put a tail on that cat…
    • What's the difference between:
      • tail -f
      • tail -F
    • What does the -n flag do?
    • What does the -v flag do?
    • What does the --pid flag do?
  7. g/re/p (globally search a regular expression and print)
    • Flags, flags, flags!
    • No, seriously
    • I'm not going to attempt to describe all the things you can do with grep.
    • No, really, it's time to move on to other examples.
    • Okay, fine, just one thing.
      • cat access.log | grep 404 | tail
    • See what I did there?
    • Are you happy now?
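One caveat with the `grep 404` pipeline: it matches the text 404 anywhere in the line, including the URL or the byte count. The Python sketch below compares that substring match with a check on the status field alone. It assumes the common log format, where the status is the second-to-last whitespace-separated field. The second and third sample lines are made up for illustration.

```python
def status_of(line):
    """Return the status field of a common-format line, or None if too short."""
    parts = line.split()
    return parts[-2] if len(parts) >= 2 else None


def last_with_status(lines, status="404", count=10):
    """Rough equivalent of `grep | tail`, but matching only the status field."""
    hits = [ln for ln in lines if status_of(ln) == status]
    return hits[-count:]


lines = [
    '127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326',
    # The two lines below are hypothetical samples.
    '127.0.0.1 - - [10/Oct/2000:13:55:37 -0700] "GET /missing.html HTTP/1.0" 404 209',
    '127.0.0.1 - - [10/Oct/2000:13:55:38 -0700] "GET /page404.html HTTP/1.0" 200 512',
]

naive = [ln for ln in lines if "404" in ln]  # what a plain substring match finds
matches = last_with_status(lines)            # status field only
print(len(naive), len(matches))
```

The substring match also picks up the successful request for `/page404.html`, while the field check does not.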
  8. But will it scale? Sadly, no...
  9. Anyone else ever try to do this? I used to do it all the time :-(
  10. What if I want to see more? (Apologies to Billy Idol…)
  11. Elastic Stack (the early edition)
    • Ingest
      • Logstash
    • Store, Search, and Analyze
      • Elasticsearch
    • Visualize
      • Kibana
  12. Logstash
    • Ingest
        input {
          file {
            path => "/path/to/access_log"
          }
        }
    • Resulting event field:
        message => "127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] \"GET /apache_pb.gif HTTP/1.0\" 200 2326"
  13. Logstash
    • Ingest, then Tokenize
        filter {
          grok {
            match => { "message" => "%{COMMONAPACHELOG}" }
          }
        }
  14. Logstash
    • grok
        clientip => "127.0.0.1"
        ident => "-"
        auth => "frank"
        timestamp => "10/Oct/2000:13:55:36 -0700"
        verb => "GET"
        request => "/apache_pb.gif"
        httpversion => 1.0
        response => 200
        bytes => 2326
  15. Logstash + grok
    • Pro
      • Simple!
      • No changes to HTTP server configuration needed
      • Common to many HTTP servers
    • Con
      • CPU cost to parse everything
      • Still have to convert the date
      • Adding anything custom requires re-tooling your grok
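To make the "still have to convert the date" point concrete, here is a small Python sketch that turns the Apache-style timestamp from the sample entry into an ISO 8601 UTC string. In Logstash this conversion would normally be handled by a date filter; the helper name `to_iso_utc` is hypothetical, and the sketch assumes the default C/English locale so that `%b` recognizes `Oct`.

```python
from datetime import datetime, timezone

# Layout of Apache's %t value once the surrounding brackets are removed.
APACHE_TIME = "%d/%b/%Y:%H:%M:%S %z"


def to_iso_utc(apache_timestamp):
    """Parse an Apache timestamp and return it as ISO 8601 in UTC."""
    local = datetime.strptime(apache_timestamp, APACHE_TIME)
    utc = local.astimezone(timezone.utc)
    # Render millisecond precision and use the "Z" suffix for UTC.
    return utc.isoformat(timespec="milliseconds").replace("+00:00", "Z")


print(to_iso_utc("10/Oct/2000:13:55:36 -0700"))
# prints 2000-10-10T20:55:36.000Z
```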
  16. "Computers can do that?!" (Apologies to Matt Groening)
  17. Apache HTTP Server
    • Logs, part 2!
        LogFormat "{\"@timestamp\": \"%{%Y-%m-%dT%H:%M:%S%z}t\", \"@version\": \"1\", \"vips\":[\"vip.example.com\"], \
          \"clientip\": \"%a\", \"duration\": %D, \
          \"status\": %>s, \"request\": \"%U%q\", \
          \"urlpath\": \"%U\", \"urlquery\": \"%q\", \
          \"bytes\": %B, \"verb\": \"%m\", \
          \"referer\": \"%{Referer}i\", \
          \"useragent\": \"%{User-agent}i\"}" ls_apache_json
        CustomLog logs/json_access_log ls_apache_json
  18. Logstash
    • Pre-formatted JSON Ingest
        input {
          file {
            path => "/path/to/json_access_log"
            codec => "json"
          }
        }
  19. Logstash
    • without grok
        clientip => "127.0.0.1"
        @timestamp => "2000-10-10T20:55:36.000Z"
        verb => "GET"
        request => "/apache_pb.gif"
        httpversion => 1.0
        response => 200
        bytes => 2326
        duration => 123
        referer => "…"
        useragent => "…"
        …
  20. Logstash + pre-formatted JSON
    • Pro
      • CPU cost dramatically reduced
      • Can add/remove fields without having to edit Logstash
      • Can add complex fields that would be harder to grok
    • Con
      • Not all HTTP servers can do this
      • Tedious to push changes to lots of servers
      • Custom fields (like vip names) require custom configuration
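The reduced parsing cost comes from the fact that a pre-formatted line is already a JSON document, so no pattern matching is needed. The Python sketch below shows the idea with the standard `json` module. The sample line is hypothetical: it follows the `ls_apache_json` LogFormat from the earlier slide, with values filled in by hand for illustration.

```python
import json

# Hypothetical line as the ls_apache_json LogFormat might write it.
line = (
    '{"@timestamp": "2000-10-10T13:55:36-0700", "@version": "1", '
    '"vips":["vip.example.com"], "clientip": "127.0.0.1", "duration": 123, '
    '"status": 200, "request": "/apache_pb.gif", "urlpath": "/apache_pb.gif", '
    '"urlquery": "", "bytes": 2326, "verb": "GET", '
    '"referer": "-", "useragent": "-"}'
)

# A single call yields typed fields: numbers stay numbers, lists stay lists.
event = json.loads(line)
print(event["status"], event["bytes"], event["vips"])
```

Adding or removing a field in the LogFormat changes the keys of the parsed event without touching the parsing code, which is the flexibility the slide describes.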
  21. The beat goes on… (Apologies to Sonny & Cher…)
  22. Elastic Stack (the current edition)
    • Ingest
      • Beats
      • Logstash
    • Store, Search, and Analyze
      • Elasticsearch
    • Visualize
      • Kibana
  23. Packet capture: type
    • Currently Packetbeat has several options for traffic capturing:
      • pcap, which uses the libpcap library and works on most platforms, but it’s not the fastest option.
      • af_packet, which uses memory mapped sniffing. This option is faster than libpcap and doesn’t require a kernel module, but it’s Linux-specific.
      • pf_ring, which makes use of an ntop.org project. This setting provides the best sniffing speed, but it requires a kernel module, and it’s Linux-specific.
  24. Packet capture: protocols
    • dns
    • http
    • memcache
    • mysql
    • pgsql
    • redis
    • thrift
    • mongodb
  25. HTTP: ports
    • Capture one port:
        ports: 80
    • Capture multiple ports:
        ports: [80, 8080, 8000, 5000, 8002]
  26. HTTP: send_headers / send_all_headers
    • Capture all headers:
        send_all_headers: true
    • Capture only named headers:
        send_headers: [ "host", "user-agent", "content-type", "referer" ]
  27. HTTP: hide_keywords
    • The names of the keyword parameters are case insensitive.
    • The values will be replaced with the 'xxxxx' string. This is useful for avoiding storing user passwords or other sensitive information.
    • Only query parameters and top level form parameters are replaced.
        hide_keywords: ['pass', 'password', 'passwd']
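The behaviour described on this slide can be sketched in a few lines of Python. This is only an illustration of the rules listed (case-insensitive names, values replaced with `xxxxx`, query parameters only), not Packetbeat's actual implementation; the function below is a hypothetical helper that merely borrows the option's name.

```python
from urllib.parse import parse_qsl, urlencode


def hide_keywords(query, keywords):
    """Replace the values of sensitive query parameters with 'xxxxx'."""
    hidden = {k.lower() for k in keywords}
    pairs = parse_qsl(query, keep_blank_values=True)
    # Parameter names are compared case-insensitively, as the slide describes.
    redacted = [(k, "xxxxx" if k.lower() in hidden else v) for k, v in pairs]
    return urlencode(redacted)


print(hide_keywords("user=frank&PASSWORD=s3cret", ["pass", "password", "passwd"]))
# prints user=frank&PASSWORD=xxxxx
```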
  28. Beats
    • Ingest (server-side) with Elasticsearch target
        interfaces:
          device: eth0
          type: af_packet
        http:
          ports: [80]
          send_all_headers: true
        output:
          elasticsearch:
            hosts: ["elasticsearch.example.com:9200"]
  29. Beats
    • Ingest (server-side) with Logstash target
        interfaces:
          device: eth0
          type: af_packet
        http:
          ports: [80]
          send_all_headers: true
        output:
          logstash:
            hosts: ["logstash.example.com:5044"]
            tls:
              certificate_authorities: ["/path/to/certificate.crt"]
  30. Why send to Logstash?
    • Enrich your data!
      • geoip
      • useragent
      • dns
      • grok
      • kv
  31. Logstash
    • Ingest Beats (Pre-formatted JSON)
        input {
          beats {
            port => 5044
            ssl => true
            ssl_certificate => "/path/to/certificate.crt"
            ssl_key => "/path/to/private.key"
            codec => "json"
          }
        }
  32. Logstash
    • Filters
        filter {
          # Enrich HTTP Packetbeats
          if [type] == "http" and "packetbeat" in [tags] {
            geoip { source => "client_ip" }
            useragent {
              source => "[http][request_headers][user-agent]"
              target => "useragent"
            }
          }
        }
  33. Extended JSON output from Beats + Logstash
        "@timestamp": "2016-01-20T21:40:53.300Z",
        "beat": {
          "hostname": "ip-172-31-46-141",
          "name": "ip-172-31-46-141"
        },
        "bytes_in": 189,
        "bytes_out": 6910,
        "client_ip": "68.180.229.41",
        "client_port": 57739,
        "client_proc": "",
        "client_server": "",
        "count": 1,
        "direction": "in",
        "http": {
          "code": 200,
          "content_length": 6516,
          "phrase": "OK",
          "request_headers": {
            "accept": "*/*",
            "accept-encoding": "gzip",
            "host": "example.com",
  34. Extended JSON output from Beats + Logstash (continued)
            "user-agent": "Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/help/us/ysearch/slurp)"
          },
          "response_headers": {
            "connection": "keep-alive",
            "content-type": "application/rss+xml; charset=UTF-8",
            "date": "Wed, 20 Jan 2016 21:40:53 GMT",
            "etag": "\"8c0b25ce7ade4b79d5ccf1ebb656fa51\"",
            "last-modified": "Wed, 24 Jul 2013 20:31:04 GMT",
            "link": "<http://example.com/wp-json/>; rel=\"https://api.w.org/\"",
            "server": "nginx/1.4.6 (Ubuntu)",
            "transfer-encoding": "chunked",
            "x-powered-by": "PHP/5.5.9-1ubuntu4.14"
          }
        },
        "ip": "172.31.46.141",
        "method": "GET",
        "params": "",
        "path": "/tag/redacted/feed/",
        "port": 80,
        "proc": "",
  35. Extended JSON output from Beats + Logstash (continued)
        "query": "GET /tag/redacted/feed/",
        "responsetime": 278,
        "server": "",
        "status": "OK",
        "type": "http",
        "@version": "1",
        "host": "ip-172-31-46-141",
        "tags": [ "packetbeat" ],
        "geoip": {
          "ip": "68.180.229.41",
          "country_code2": "US",
          "country_code3": "USA",
          "country_name": "United States",
          "continent_code": "NA",
          "region_name": "CA",
          "city_name": "Sunnyvale",
          "postal_code": "94089",
          "latitude": 37.42490000000001,
          "longitude": -122.00739999999999,
  36. Extended JSON output from Beats + Logstash (continued)
          "dma_code": 807,
          "area_code": 408,
          "timezone": "America/Los_Angeles",
          "real_region_name": "California",
          "location": [ -122.00739999999999, 37.42490000000001 ]
        },
        "useragent": {
          "name": "Yahoo! Slurp",
          "os": "Other",
          "os_name": "Other",
          "device": "Spider"
        }
  37. Logstash + beats (pre-formatted JSON)
    • Pro
      • CPU cost dramatically reduced (Logstash side)
      • Simple configuration to capture everything.
      • Logstash not necessary!
      • Useful to enrich data: geoip, useragent, headers, etc.
    • Con
      • Cannot directly monitor SSL traffic
      • CPU cost (server side) scales with traffic volume. Might be higher for heavy traffic.
      • Uncaptured packet data is unrecoverable.
  38. Evolution? Is one path better than another?
  39. Evolution? Is one path better than another?
    • Unstructured log data
    • Structured log data
    • Captured packet data
  40. Conclusions
    • There are a lot of ways to monitor your traffic and put the data into Elasticsearch. Not all of them require log files any more.
    • With many options, choose the ingest scenario that works for you.
    • There's also filebeat, topbeat, and several community-contributed beats available.
    • Don't overlook enriching your data. There's a goldmine in there!
  41. Questions? I'll be here all night…