
The Practice and Evolution of HTTP Access Monitoring


A history of HTTP access monitoring.

An overview of the tools used in the past and the tools available today that dramatically improve the visibility and value of your traffic data.

Aaron Mildenstein

July 13, 2016

Transcript

  1. Apache HTTP Server
     • Logs!
       LogFormat "%h %l %u %t \"%r\" %>s %b" common
       CustomLog logs/access_log common
     • 127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326
     • Tools: cat, tail, grep
  2. tail ("Let's put a tail on that cat…")
     • What's the difference between tail -f and tail -F?
     • What does the -n flag do?
     • What does the -v flag do?
     • What does the --pid flag do?
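A quick way to answer these questions is to experiment on a throwaway file (the filename below is made up):

```shell
# Create a small sample log to play with
printf '%s\n' "line 1" "line 2" "line 3" "line 4" > access_log

# -n controls how many trailing lines are printed (default is 10)
tail -n 2 access_log        # prints "line 3" then "line 4"

# -v prints a header naming the file before the output

# -f follows the open file descriptor: if the log is rotated away,
#    tail keeps watching the old, renamed file.
# -F follows the file *name*: after rotation it reopens the new file,
#    which is usually what you want for access logs.

# --pid stops following once the given process exits, e.g.:
#   tail -f access_log --pid="$SERVER_PID"
```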
  3. grep: g/re/p (globally search a regular expression and print)
     • Flags, flags, flags!
     • No, seriously, I'm not going to attempt to describe all the things you can do with grep.
     • No, really, it's time to move on to other examples.
     • Okay, fine, just one thing:
       cat access.log | grep 404 | tail
     • See what I did there? Are you happy now?
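For a concrete run, here is that pipeline against a two-line fabricated log (the entries are invented for illustration):

```shell
# Fabricated mini access log in Common Log Format
cat > access.log <<'EOF'
127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326
127.0.0.1 - frank [10/Oct/2000:13:55:37 -0700] "GET /missing.gif HTTP/1.0" 404 209
EOF

# The slide's pipeline: the last matching lines that mention 404
cat access.log | grep 404 | tail

# grep reads files itself, so the cat is redundant:
grep 404 access.log | tail

# Caveat: this matches "404" anywhere on the line (paths and byte
# counts included); anchoring on the status field is safer:
grep -E '" 404 ' access.log
```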
  4. Elastic Stack (the early edition)
     • Ingest: Logstash
     • Store, Search, and Analyze: Elasticsearch
     • Visualize: Kibana
  5. Logstash
     • Ingest:
       input { file { path => "/path/to/access_log" } }
     • message => "127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] \"GET /apache_pb.gif HTTP/1.0\" 200 2326"
  6. Logstash
     • Ingest, then Tokenize:
       filter { grok { match => { "message" => "%{COMMONAPACHELOG}" } } }
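Put together, the ingest and tokenize slides form a minimal complete pipeline. The path is a placeholder, and the stdout output is added here only so you can see the parsed events; it is not from the slides:

```
input {
  file {
    path => "/path/to/access_log"
  }
}

filter {
  grok {
    match => { "message" => "%{COMMONAPACHELOG}" }
  }
}

output {
  # Print each parsed event for inspection
  stdout { codec => rubydebug }
}
```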
  7. Logstash • grok
     clientip => "127.0.0.1"
     ident => "-"
     auth => "frank"
     timestamp => "10/Oct/2000:13:55:36 -0700"
     verb => "GET"
     request => "/apache_pb.gif"
     httpversion => 1.0
     response => 200
     bytes => 2326
  8. Logstash + grok
     • Pro
       • Simple!
       • No changes to HTTP server configuration needed
       • Common to many HTTP servers
     • Con
       • CPU cost to parse everything
       • Still have to convert the date
       • Adding anything custom requires re-tooling your grok
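The date conversion in that second con is typically handled with Logstash's date filter; a sketch against the timestamp field that COMMONAPACHELOG produces:

```
filter {
  date {
    # Parse the Apache timestamp ("10/Oct/2000:13:55:36 -0700")
    # into @timestamp; "timestamp" is the field grok extracts
    match => [ "timestamp", "dd/MMM/yyyy:HH:mm:ss Z" ]
  }
}
```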
  9. Apache HTTP Server
     • Logs, part 2!
       CustomLog logs/json_access_log ls_apache_json
       LogFormat "{\"@timestamp\": \"%{%Y-%m-%dT%H:%M:%S%z}t\", \"@version\": \"1\", \"vips\":[\"vip.example.com\"], \
         \"clientip\": \"%a\", \"duration\": %D, \
         \"status\": %>s, \"request\": \"%U%q\", \
         \"urlpath\": \"%U\", \"urlquery\": \"%q\", \
         \"bytes\": %B, \"verb\": \"%m\", \
         \"referer\": \"%{Referer}i\", \
         \"useragent\": \"%{User-agent}i\"}" ls_apache_json
  10. Logstash
     • Pre-formatted JSON Ingest:
       input { file { path => "/path/to/json_access_log" codec => "json" } }
  11. Logstash • without grok
     clientip => "127.0.0.1"
     @timestamp => "2000-10-10T20:55:36.000Z"
     verb => "GET"
     request => "/apache_pb.gif"
     httpversion => 1.0
     response => 200
     bytes => 2326
     duration => 123
     referer => "…"
     useragent => "…"
     …
  12. Logstash + pre-formatted JSON
     • Pro
       • CPU cost dramatically reduced
       • Can add/remove fields without having to edit Logstash
       • Can add complex fields that would be harder to grok
     • Con
       • Not all HTTP servers can do this
       • Tedious to push changes to lots of servers
       • Custom fields (like vip names) require custom configuration
  13. Elastic Stack (the current edition)
     • Ingest: Beats, Logstash
     • Store, Search, and Analyze: Elasticsearch
     • Visualize: Kibana
  14. Packet capture: type
     Currently Packetbeat has several options for traffic capturing:
     • pcap, which uses the libpcap library and works on most platforms, but it's not the fastest option.
     • af_packet, which uses memory-mapped sniffing. This option is faster than libpcap and doesn't require a kernel module, but it's Linux-specific.
     • pf_ring, which makes use of an ntop.org project. This setting provides the best sniffing speed, but it requires a kernel module, and it's Linux-specific.
  15. Packet capture: protocols
     • dns, http, memcache, mysql, pgsql, redis, thrift, mongodb
  16. HTTP: ports
     • Capture one port: ports: 80
     • Capture multiple ports: ports: [80, 8080, 8000, 5000, 8002]
  17. HTTP: send_headers / send_all_headers
     • Capture all headers: send_all_headers: true
     • Capture only named headers: send_headers: ["host", "user-agent", "content-type", "referer"]
  18. HTTP: hide_keywords
     • The names of the keyword parameters are case-insensitive.
     • The values will be replaced with the string 'xxxxx'. This is useful for avoiding storing user passwords or other sensitive information.
     • Only query parameters and top-level form parameters are replaced.
     • hide_keywords: ['pass', 'password', 'passwd']
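Combined with the port and header options from the previous slides, the http section of a Packetbeat configuration might look like this (a sketch, not a complete packetbeat.yml):

```
http:
  ports: [80]
  send_headers: ["host", "user-agent", "referer"]
  # Replace the values of these query/form parameters with 'xxxxx'
  hide_keywords: ["pass", "password", "passwd"]
```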
  19. Beats • Ingest (server-side) with Elasticsearch target
     interfaces:
       device: eth0
       type: af_packet
     http:
       ports: [80]
       send_all_headers: true
     output:
       elasticsearch:
         hosts: ["elasticsearch.example.com:9200"]
  20. Beats • Ingest (server-side) with Logstash target
     interfaces:
       device: eth0
       type: af_packet
     http:
       ports: [80]
       send_all_headers: true
     output:
       logstash:
         hosts: ["logstash.example.com:5044"]
         tls:
           certificate_authorities: ["/path/to/certificate.crt"]
  21. Why send to Logstash? • Enrich your data!
     • geoip
     • useragent
     • dns
     • grok
     • kv
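A sketch of the last two filters, assuming the urlquery field from the JSON LogFormat shown earlier; field names and option names are illustrative, drawn from the kv and dns plugin documentation as I recall it:

```
filter {
  # Split a query string like "?user=frank&page=2" into fields;
  # note that Apache's %q keeps the leading "?", so the first key
  # may need trimming
  kv {
    source => "urlquery"
    field_split => "&"
  }

  # Reverse-resolve the client address into a hostname in place
  dns {
    reverse => [ "clientip" ]
    action  => "replace"
  }
}
```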
  22. Logstash • Ingest Beats (Pre-formatted JSON)
     input {
       beats {
         port => 5044
         ssl => true
         ssl_certificate => "/path/to/certificate.crt"
         ssl_key => "/path/to/private.key"
         codec => "json"
       }
     }
  23. Logstash • Filters
     filter {
       # Enrich HTTP Packetbeats
       if [type] == "http" and "packetbeat" in [tags] {
         geoip {
           source => "client_ip"
         }
         useragent {
           source => "[http][request_headers][user-agent]"
           target => "useragent"
         }
       }
     }
  24. Extended JSON output from Beats + Logstash
     "@timestamp": "2016-01-20T21:40:53.300Z",
     "beat": {
       "hostname": "ip-172-31-46-141",
       "name": "ip-172-31-46-141"
     },
     "bytes_in": 189,
     "bytes_out": 6910,
     "client_ip": "68.180.229.41",
     "client_port": 57739,
     "client_proc": "",
     "client_server": "",
     "count": 1,
     "direction": "in",
     "http": {
       "code": 200,
       "content_length": 6516,
       "phrase": "OK",
       "request_headers": {
         "accept": "*/*",
         "accept-encoding": "gzip",
         "host": "example.com"
  25. Extended JSON output from Beats + Logstash
         "user-agent": "Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/help/us/ysearch/slurp)"
       },
       "response_headers": {
         "connection": "keep-alive",
         "content-type": "application/rss+xml; charset=UTF-8",
         "date": "Wed, 20 Jan 2016 21:40:53 GMT",
         "etag": "\"8c0b25ce7ade4b79d5ccf1ebb656fa51\"",
         "last-modified": "Wed, 24 Jul 2013 20:31:04 GMT",
         "link": "<http://example.com/wp-json/>; rel=\"https://api.w.org/\"",
         "server": "nginx/1.4.6 (Ubuntu)",
         "transfer-encoding": "chunked",
         "x-powered-by": "PHP/5.5.9-1ubuntu4.14"
       }
     },
     "ip": "172.31.46.141",
     "method": "GET",
     "params": "",
     "path": "/tag/redacted/feed/",
     "port": 80,
     "proc": "",
  26. Extended JSON output from Beats + Logstash
     "query": "GET /tag/redacted/feed/",
     "responsetime": 278,
     "server": "",
     "status": "OK",
     "type": "http",
     "@version": "1",
     "host": "ip-172-31-46-141",
     "tags": [ "packetbeat" ],
     "geoip": {
       "ip": "68.180.229.41",
       "country_code2": "US",
       "country_code3": "USA",
       "country_name": "United States",
       "continent_code": "NA",
       "region_name": "CA",
       "city_name": "Sunnyvale",
       "postal_code": "94089",
       "latitude": 37.42490000000001,
       "longitude": -122.00739999999999,
  27. Extended JSON output from Beats + Logstash
       "dma_code": 807,
       "area_code": 408,
       "timezone": "America/Los_Angeles",
       "real_region_name": "California",
       "location": [ -122.00739999999999, 37.42490000000001 ]
     },
     "useragent": {
       "name": "Yahoo! Slurp",
       "os": "Other",
       "os_name": "Other",
       "device": "Spider"
     }
  28. Logstash + beats (pre-formatted JSON)
     • Pro
       • CPU cost dramatically reduced (Logstash side)
       • Simple configuration to capture everything
       • Logstash not necessary!
       • Useful to enrich data: geoip, useragent, headers, etc.
     • Con
       • Cannot directly monitor SSL traffic
       • CPU cost (server side) scales with traffic volume; might be higher for heavy traffic
       • Uncaptured packet data is unrecoverable
  29. Evolution? Is one path better than another?
     • Unstructured log data
     • Structured log data
     • Captured packet data
  30. Conclusions
     • There are a lot of ways to monitor your traffic and put the data into Elasticsearch. Not all of them require log files anymore.
     • With many options, choose the ingest scenario that works for you.
     • There are also filebeat, topbeat, and several community-contributed beats available.
     • Don't overlook enriching your data. There's a goldmine in there!