You know, for pings!

Elastic Co
August 04, 2015

Elastic engineer Joshua Rich shows us the quality of his ISP and walks us through implementing the monitoring of pings across your network with the ELK stack.

He also touches on his open-source effort to develop a Beats plugin, Pingbeat, to make this even easier.

This talk was presented at the August Brisbane (Australia) Devops Meetup - http://www.meetup.com/Devops-Brisbane/events/224090775/

Transcript

  1. www.elastic.co Copyright Elastic 2015. Copying, publishing and/or distributing without written permission is strictly prohibited.

    How does an ICMP ping actually work?
  2. Ping theory

    1. Source creates an ICMP echo-request and sends this to the target.
       a. Contains an identifier and sequence number to keep track of this specific ping request.
    2. Source records the timestamp of when the echo-request was sent.
    3. Target receives the source's echo-request and creates its own ICMP echo-reply, sending this back to the source.
       a. Contains the identifier and sequence number in addition to a timestamp of when the message was sent back.
    4. Source receives the echo-reply and calculates Round-Trip Time (RTT) based on recorded timestamps.

    What happens if the target doesn’t respond?

    • Requests are retried after a configured timeout period.
    • After the configured number of retries, source gives up and records packet loss.
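The steps above can be sketched in a few lines of Python. This is a minimal illustration rather than what fping does internally: it only builds and parses the 8-byte ICMP echo header (type, code, checksum, identifier, sequence number) and computes an RTT from two source-side timestamps. Actually sending the packet needs a raw socket and usually root privileges, so that part is left out. The function names are hypothetical.

```python
import struct

ICMP_ECHO_REQUEST = 8  # ICMP type for echo-request (RFC 792)
ICMP_ECHO_REPLY = 0    # ICMP type for echo-reply (RFC 792)


def internet_checksum(data: bytes) -> int:
    """16-bit one's-complement checksum used by ICMP."""
    if len(data) % 2:
        data += b"\x00"  # pad to a whole number of 16-bit words
    total = sum(struct.unpack(f"!{len(data) // 2}H", data))
    while total >> 16:  # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def build_echo_request(identifier: int, sequence: int, payload: bytes = b"") -> bytes:
    """Step 1: build an echo-request carrying an identifier and sequence number."""
    header = struct.pack("!BBHHH", ICMP_ECHO_REQUEST, 0, 0, identifier, sequence)
    checksum = internet_checksum(header + payload)
    header = struct.pack("!BBHHH", ICMP_ECHO_REQUEST, 0, checksum, identifier, sequence)
    return header + payload


def parse_echo_reply(packet: bytes):
    """Step 4: return (identifier, sequence) for an echo-reply, else None."""
    icmp_type, _code, _checksum, identifier, sequence = struct.unpack("!BBHHH", packet[:8])
    if icmp_type != ICMP_ECHO_REPLY:
        return None
    return identifier, sequence


def rtt_ms(sent_at: float, received_at: float) -> float:
    """RTT in milliseconds from two timestamps (in seconds) taken on the source."""
    return (received_at - sent_at) * 1000.0
```

Matching the identifier and sequence number from the reply against the ones sent is what lets the source pair each reply with its recorded send timestamp.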
  3. ICMP echo-request in Wireshark
  4. ICMP echo-reply in Wireshark
  5. Faking ping

    • If ICMP pings are blocked we can fake a ping with TCP.
    • Make an HTTP connection (or a connection over whatever port is open) to the target and measure the time to complete the TCP three-way handshake.

    How does the three-way handshake work?

    1. Source creates a TCP SYN packet and sends this to the target.
       a. Sets a sequence number to a random value.
    2. Target receives the SYN, creates a SYN-ACK packet and sends this back to the source.
       a. Sets an acknowledgment number to the sequence number + 1.
    3. Source receives the SYN-ACK, creates an ACK packet and sends this back to the target.
    4. RTT is (roughly) the time between sending the first SYN and sending the final ACK.

    • An ICMP ping and a “TCP ping” aren’t necessarily comparable...

        echoping -v -h / -R google.com.au
        TCP-Estimated RTT: 0.1030 seconds (103 milliseconds)

        fping -q -C1 -B1 -r1 google.com.au
        google.com.au : 14.87 (milliseconds)
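A rough way to try the "TCP ping" idea without echoping is to time a plain `connect()`, which returns once the three-way handshake has completed. The sketch below is an approximation under stated assumptions, not echoping's method: the function name `tcp_ping` is hypothetical, and when given a hostname the measured time also includes DNS resolution, which inflates the figure.

```python
import socket
import time


def tcp_ping(host: str, port: int, timeout: float = 2.0):
    """Return the TCP connect time in milliseconds, or None if the connect failed."""
    start = time.perf_counter()
    try:
        # create_connection() returns once the three-way handshake is done
        with socket.create_connection((host, port), timeout=timeout):
            elapsed = time.perf_counter() - start
    except OSError:
        return None  # refused, unreachable or timed out: treat as loss
    return elapsed * 1000.0
```

For example, `tcp_ping("google.com.au", 80)` would give a figure comparable in spirit to the echoping output above. Passing an IP address instead of a hostname keeps DNS lookups out of the measurement.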
  6. TCP “Ping” in Wireshark

    Cool fact: the Linux kernel calculates an RTT for you!
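One way to read a kernel RTT estimate on Linux is the `TCP_INFO` socket option, which returns a `struct tcp_info` whose `tcpi_rtt` field holds the smoothed RTT in microseconds. Whether this is exactly what the slide had in mind is an assumption. The sketch below also assumes the Linux struct layout (`tcpi_rtt` at byte offset 68) and simply returns `None` where the option is unavailable; the function name is hypothetical.

```python
import socket
import struct

TCPI_RTT_OFFSET = 68  # byte offset of tcpi_rtt in Linux's struct tcp_info (assumed layout)


def kernel_rtt_ms(sock: socket.socket):
    """Return the kernel's smoothed RTT estimate in ms, or None if unavailable."""
    if not hasattr(socket, "TCP_INFO"):
        return None  # platform does not expose TCP_INFO through Python
    try:
        info = sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_INFO, 104)
    except OSError:
        return None
    if len(info) < TCPI_RTT_OFFSET + 4:
        return None
    (rtt_us,) = struct.unpack_from("=I", info, TCPI_RTT_OFFSET)  # microseconds
    return rtt_us / 1000.0
```

Called on a connected TCP socket, this reports the estimate the kernel maintains for that connection, so no extra probe traffic is needed.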
  7. Implementation (or let the horrible hack begin)
  8. Inspiration

    Smokeping - the venerable go-to network monitor in NOCs...
  9. Behind the curtain

    For ICMP:

    • Using fping: http://fping.org/
      ◦ A fast implementation that supports pinging multiple hosts.
      ◦ Simple output layout (read: easy to grok in Logstash)

    For TCP:

    • Using echoping: https://github.com/bortzmeyer/echoping
      ◦ Crusty tool that supports a few more probing methods than fping
      ◦ Simple layout
  10. FPing Input & Filter Example

        input {
          exec {
            command => "/usr/sbin/fping -q -C1 -B1 -r1 < /etc/logstash/fping-google.conf 2>&1"
            interval => 10
            type => "fping"
            tags => [ "fping-google", "fping" ]
          }
        }
        filter {
          if "fping" in [tags] {
            # one event per line of fping output
            split { }
            grok {
              match => { "message" => "%{IPORHOST:target_host}\s+:\s+%{GREEDYDATA:rtt}" }
            }
            # fping prints "-" instead of a time when no reply was received
            if [rtt] == "-" {
              drop { }
            } else {
              mutate {
                convert => { "rtt" => "float" }
              }
            }
          }
        }
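To make the filter logic easier to follow, here is a rough Python equivalent of the split, grok, drop and convert steps. It is a sketch, not part of the original pipeline: the function names are hypothetical, and the regular expression is a simplification of the `IPORHOST` and `GREEDYDATA` grok patterns.

```python
import re

# Rough stand-in for the grok pattern: "<host> : <rtt or ->"
FPING_LINE = re.compile(r"^(?P<target_host>\S+)\s+:\s+(?P<rtt>\S+)")


def parse_fping_line(line: str):
    """Return {'target_host', 'rtt'} for a reply, or None for loss or unparsable lines."""
    match = FPING_LINE.match(line.strip())
    if match is None:
        return None
    if match.group("rtt") == "-":
        return None  # no reply received: drop the event
    try:
        rtt = float(match.group("rtt"))  # same role as mutate/convert
    except ValueError:
        return None
    return {"target_host": match.group("target_host"), "rtt": rtt}


def parse_fping_output(output: str):
    """Split multi-line output (like the split filter) and parse each line."""
    parsed = (parse_fping_line(line) for line in output.splitlines())
    return [event for event in parsed if event is not None]
```

Dropping the `-` lines means lost pings never reach Elasticsearch, so packet loss would have to be tracked separately if it matters.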
  11. FPing document in Elasticsearch

        {
          "_index": "logstash-2015.07.12",
          "_type": "fping",
          "_id": "AU6AEuIM_njdtBL-QDtd",
          "_score": 1,
          "_source": {
            "message": "google.com : 8.39",
            "@version": "1",
            "@timestamp": "2015-07-12T02:23:17.080Z",
            "type": "fping",
            "tags": [ "fping-google", "fping" ],
            "host": "proliant",
            "command": "/usr/sbin/fping -q -C1 -B1 -r1 < /etc/logstash/fping-google.conf 2>&1",
            "target_host": "google.com",
            "rtt": 8.39
          }
        }
  12. Echoping Input & Filter Example

        input {
          exec {
            command => "/usr/bin/echoping -v -h / -R app.asana.com | /usr/bin/grep -E '^TCP-Estimated RTT'"
            interval => 10
            type => "echoping"
            tags => [ "echopinghttp-app.asana.com", "echopinghttp" ]
            add_field => { "target_host" => "app.asana.com" }
          }
        }
        filter {
          if "echopinghttp" in [tags] {
            split { }
            grok {
              break_on_match => false
              match => {
                "message" => "TCP-Estimated RTT: %{NUMBER:rtt}"
                "target_host" => "%{IPORHOST:target_host}(:%{NUMBER:target_port})?"
              }
              overwrite => [ "target_host" ]
            }
            # echoping reports seconds; convert to milliseconds to match fping
            ruby {
              code => "event['rtt'] = Float(event['rtt'])*1000"
            }
            mutate {
              convert => { "target_port" => "integer" }
            }
          }
        }
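The same filter steps can be sketched in Python: pull the RTT out of the `TCP-Estimated RTT` line, convert seconds to milliseconds, and split an optional port off the target. The function name and regular expressions are assumptions made for illustration; the host pattern is deliberately simple and does not handle IPv6 literals.

```python
import re

RTT_LINE = re.compile(r"TCP-Estimated RTT: (?P<rtt>\d+(?:\.\d+)?)")
HOST_PORT = re.compile(r"^(?P<host>[^:\s]+)(?::(?P<port>\d+))?$")


def parse_echoping(message: str, target: str):
    """Return an event dict with rtt in milliseconds, or None if nothing matched."""
    rtt_match = RTT_LINE.search(message)
    target_match = HOST_PORT.match(target)
    if rtt_match is None or target_match is None:
        return None
    event = {
        "target_host": target_match.group("host"),
        "rtt": float(rtt_match.group("rtt")) * 1000.0,  # seconds -> milliseconds
    }
    if target_match.group("port") is not None:
        event["target_port"] = int(target_match.group("port"))
    return event
```

Converting to milliseconds at ingest time keeps the `rtt` field in the same unit for both fping and echoping documents, so they can share one graph or one alert threshold.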
  13. Echoping document in Elasticsearch

        {
          "_index": "logstash-2015.07.15",
          "_type": "echoping",
          "_id": "AU6PhqT9_njdtBL-lj6Y",
          "_score": 6.1473045,
          "_source": {
            "message": "TCP-Estimated RTT: 0.2556 seconds (std. deviation 0.096)",
            "@version": "1",
            "@timestamp": "2015-07-15T02:24:01.876Z",
            "type": "echoping",
            "tags": [ "echopinghttp-app.asana.com", "echopinghttp" ],
            "host": "proliant",
            "command": "/usr/bin/echoping -v -h / -R app.asana.com | /usr/bin/grep -E '^TCP-Estimated RTT'",
            "target_host": "app.asana.com",
            "rtt": 255.6
          }
        }
  14. Side note #1: smoothed lines option

    Smoothed lines can be misleading. Line-fitting is a delicate art…

    See also Kibana issue #4215
  15. Side note #2: WTF is this sh$t Exetel?!
  16. Exetel STAHP PLS KTHXBYE (actually standard AU ISP contention, but still...)
  17. Graphs are nice to look at, but I want to be notified when the latency starts to creep up…

    I want to know whether the RTT for any Google host has gone above 15ms in the last 10 minutes (and I don’t want to be spammed by alerts).
  18. Watcher to the rescue

    • Enter Watcher, a plugin for Elasticsearch that provides alerting and notification based on changes in your data.
    • Define a watch that looks into your data and acts upon it, based on certain conditions.
    • At a high level, a typical watch is built from four simple building blocks:
      ◦ Schedule
        ▪ Define the schedule on which to trigger the query and check the condition.
      ◦ Query
        ▪ Specify the query to run as input to the condition. Watcher supports the full Elasticsearch query language, including aggregations.
      ◦ Condition
        ▪ Define your condition to determine whether to execute the actions. You can use simple conditions (always true), or use scripting for more sophisticated scenarios.
      ◦ Actions
        ▪ Define one or more actions, such as sending email, pushing data to 3rd party systems via webhook, or indexing the results of your query.
    • Additionally, we can throttle and/or set a timeout on a watch.
  19. Watcher Example: Google RTT - Schedule, Throttle and Query

        ...
        "trigger": { "schedule": { "interval": "10m" } }  # run this watch every 10 minutes
        ...
        "throttle_period": "30m",  # if triggered, don’t trigger again for 30 minutes
        ...
        "indices": [ "logstash-*" ],  # search on these indices
        ...
        "bool": {
          "must": [
            { "range": { "@timestamp": { "gte": "now-10m", "lte": "now" } } },
            { "term": { "tags": "google" } }
          ],
          "must_not": [
            { "term": { "target_host.raw": "smtp.gmail.com" } }
          ]
        }
        ...
        # bucket on date, then hosts and calculate the average rtt for each host bucket
        "aggs": {
          "minutes": {
            "date_histogram": { "field": "@timestamp", "interval": "minute" },
            "aggs": {
              "targets": {
                "terms": { "field": "target_host.raw", "size": 10 },
                "aggs": {
                  "avg_rtt": { "avg": { "field": "rtt" } ...
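For readers less familiar with nested aggregations, the bucketing described above (by minute, then by host, then the average `rtt` per host) can be mimicked on in-memory documents. This is only an illustration of the shape of the result, not how Elasticsearch computes it. The function name is hypothetical, and the sketch leaves out the tag and `smtp.gmail.com` filters and the `size: 10` limit on host buckets.

```python
from collections import defaultdict
from datetime import datetime


def average_rtt_per_minute(docs):
    """Bucket docs by minute, then by target_host, and average rtt per bucket.

    Each doc is a dict with '@timestamp' (datetime), 'target_host' and 'rtt'.
    Returns a list of (minute, {host: avg_rtt}) tuples sorted by minute.
    """
    buckets = defaultdict(lambda: defaultdict(list))
    for doc in docs:
        # truncate the timestamp to the minute, like a one-minute date histogram
        minute = doc["@timestamp"].replace(second=0, microsecond=0)
        buckets[minute][doc["target_host"]].append(doc["rtt"])
    return [
        (minute, {host: sum(rtts) / len(rtts) for host, rtts in hosts.items()})
        for minute, hosts in sorted(buckets.items())
    ]
```

The last element of the returned list corresponds to the most recent minute bucket, which is the one an alert condition would typically inspect.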
  20. Watcher Example: Google RTT - Condition and Action

        ...
        # if the latest average RTT for this host is over 15ms...
        "condition": {
          "script": "if (ctx.payload.aggregations.minutes.buckets.size() == 0) return false; def latest = ctx.payload.aggregations.minutes.buckets[-1]; def target = latest.targets.buckets[0]; return target && target.avg_rtt && target.avg_rtt.value >= 15;"
        },
        # send an email listing the hosts where the RTT > 15ms to me
        "actions": {
          "send_email": {
            "transform": {
              "script": "def latest = ctx.payload.aggregations.minutes.buckets[-1]; return latest.targets.buckets.findAll { return it.avg_rtt && it.avg_rtt.value >= 15 };"
            },
            "email": {
              "profile": "standard",
              "to": [ "[email protected]" ],
              "subject": "Watcher - High Google RTT",
              "body": {
                "text": "Google hosts with high RTT (above 15ms):\n\n{{#ctx.payload._value}}\"{{key}}\" - RTT: {{avg_rtt.value}}ms\n{{/ctx.payload._value}}"
              }
            }
          }
        }
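The condition, transform and throttle logic can be sketched outside Watcher as plain Python. Note one deliberate difference: the condition script on the slide inspects only the first host bucket of the latest minute, whereas this sketch checks every host, which is an assumption about the intent ("any Google host"). The names `slow_hosts` and `Throttle` are hypothetical, and the throttle is a simplified model of `throttle_period`.

```python
RTT_THRESHOLD_MS = 15.0


def slow_hosts(minute_buckets, threshold=RTT_THRESHOLD_MS):
    """Hosts in the latest minute bucket whose average RTT is at or above threshold.

    minute_buckets is a list of {host: avg_rtt} dicts, oldest first.
    An empty result means the condition is not met.
    """
    if not minute_buckets:
        return {}
    latest = minute_buckets[-1]
    return {
        host: rtt
        for host, rtt in latest.items()
        if rtt is not None and rtt >= threshold
    }


class Throttle:
    """Suppress repeat alerts for `period` seconds after one has fired."""

    def __init__(self, period):
        self.period = period
        self.last_fired = None

    def allow(self, now):
        """Return True (and record the firing) if an alert may be sent at time `now`."""
        if self.last_fired is not None and now - self.last_fired < self.period:
            return False
        self.last_fired = now
        return True
```

With a 10-minute schedule and a 30-minute throttle, a sustained latency problem would produce at most one email every 30 minutes instead of one per run.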
  21. Watcher Example: Google RTT - Results
  22. So the end result?

    A visual archive of the results as well as proactive monitoring.

    But we can do better.
  23. Pingbeat

    A lightweight network monitoring probe (based on)
  24. • Complement the packetbeat high-level application protocol analysis with more low-level network protocol metrics.
      • Written in Go.
      • YAML based config.
      • ICMP, simple TCP(?), DNS timing probes.
      • Based on libbeats (same library used by packetbeat).
        ◦ Supports any outputs that libbeats supports (currently Elasticsearch, Redis and file outputs).
      • Ideally install in many places in your network to get a world-view of latency across the network.
  25. Watch this space: github.com/joshuar/pingbeat