Correlating Metrics and Logs

1 Correlating Metrics and Logs (Elastic, March 8, 2017, @tbragin)
Tanya Bragin, Dir. Product Management

2 Logs Metrics

3 Logs Metrics

4 Definitions

5 Oxford Dictionary definition: logs are “records of incidents or observations”

6 Oxford Dictionary definition: metrics are “a set of figures or statistics that measure results”

7 Logs vs Metrics
Logs: for each event, print out what happened.
64.242.88.10 - - [07/Mar/2017:16:10:02 -0800] "GET /mailman/listinfo/hsdivision HTTP/1.1" 200 6291
64.242.88.10 - - [07/Mar/2017:16:11:58 -0800] "POST /twiki/bin/view/TWiki/WikiSyntax HTTP/1.1" 404 7352
64.242.88.10 - - [07/Mar/2017:16:20:55 -0800] "GET /twiki/bin/view/Main/DCCAndPostFix HTTP/1.1" 200 5253
Metrics: every x minutes, measure the CPU load and print it out.
07/Mar/2017 16:10:00 all 2.58 0.00 0.70 1.12 0.05 95.55
07/Mar/2017 16:20:00 all 2.56 0.00 0.69 1.05 0.04 95.66
07/Mar/2017 16:30:00 all 2.64 0.00 0.65 1.15 0.05 95.50
Logs are records of discrete events, if and when they happen. Metrics are periodic measurements of some KPIs.
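The difference between the two kinds of line shows up as soon as you parse them. Below is a minimal sketch that turns one access-log line into an event record and one CPU line into a measurement record. It assumes the log lines use the Apache common log format. It also assumes the six numbers in each metric line are sar-style CPU percentages (%user %nice %system %iowait %steal %idle). The slide does not label those columns, so treat the field names as hypothetical.

```python
import re
from datetime import datetime

# Apache common log format: host ident authuser [timestamp] "request" status bytes
ACCESS_LOG = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<ts>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<proto>[^"]+)" '
    r'(?P<status>\d{3}) (?P<bytes>\d+|-)'
)

def parse_log_event(line):
    """Turn one access-log line into a discrete event record."""
    m = ACCESS_LOG.match(line)
    if m is None:
        raise ValueError(f"not an access-log line: {line!r}")
    return {
        "timestamp": datetime.strptime(m["ts"], "%d/%b/%Y:%H:%M:%S %z"),
        "host": m["host"],
        "method": m["method"],
        "path": m["path"],
        "status": int(m["status"]),
        "bytes": 0 if m["bytes"] == "-" else int(m["bytes"]),
    }

def parse_metric_sample(line):
    """Turn one periodic CPU line into a measurement record.

    Assumption: the six numbers are sar-style percentages
    (%user %nice %system %iowait %steal %idle); field names are hypothetical.
    """
    day, clock, cpu, *values = line.split()
    user, nice, system, iowait, steal, idle = map(float, values)
    return {
        "timestamp": datetime.strptime(f"{day} {clock}", "%d/%b/%Y %H:%M:%S"),
        "cpu": cpu,
        "user_pct": user,
        "system_pct": system,
        "idle_pct": idle,
    }
```

An event record exists only because a request happened. A sample record exists because the collection interval elapsed, whether or not anything interesting occurred.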

8 Logs and Metrics are both “time series”
07/Mar/2017 16:10:00 all 2.58 0.00 0.70 1.12 0.05 95.55
64.242.88.10 - - [07/Mar/2017:16:10:02 -0800] "GET /mailman/listinfo/hsdivision HTTP/1.1" 200 6291
64.242.88.10 - - [07/Mar/2017:16:11:58 -0800] "POST /twiki/bin/view/TWiki/WikiSyntax HTTP/1.1" 404 7352
07/Mar/2017 16:20:00 all 2.56 0.00 0.69 1.05 0.04 95.66
64.242.88.10 - - [07/Mar/2017:16:20:55 -0800] "GET /twiki/bin/view/Main/DCCAndPostFix HTTP/1.1" 200 5253
07/Mar/2017 16:30:00 all 2.64 0.00 0.65 1.15 0.05 95.50
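Because both streams are keyed by time, they can be interleaved into one timeline like the one on this slide. Below is a minimal sketch using a k-way merge of streams that are already sorted. It assumes the metric lines, which carry no UTC offset, are in the same -08:00 local time as the access log. The tuple layout is hypothetical.

```python
import heapq
from datetime import datetime, timedelta, timezone

# Assumption: metric samples have no UTC offset, so they are read as the
# same -08:00 local time that the access log records.
LOCAL = timezone(timedelta(hours=-8))

# (timestamp, kind, summary) tuples; each list is already in time order.
metric_samples = [
    (datetime(2017, 3, 7, 16, 10, 0, tzinfo=LOCAL), "metric", "cpu idle 95.55"),
    (datetime(2017, 3, 7, 16, 20, 0, tzinfo=LOCAL), "metric", "cpu idle 95.66"),
    (datetime(2017, 3, 7, 16, 30, 0, tzinfo=LOCAL), "metric", "cpu idle 95.50"),
]
log_events = [
    (datetime(2017, 3, 7, 16, 10, 2, tzinfo=LOCAL), "log", "GET /mailman/listinfo/hsdivision 200"),
    (datetime(2017, 3, 7, 16, 11, 58, tzinfo=LOCAL), "log", "POST /twiki/bin/view/TWiki/WikiSyntax 404"),
    (datetime(2017, 3, 7, 16, 20, 55, tzinfo=LOCAL), "log", "GET /twiki/bin/view/Main/DCCAndPostFix 200"),
]

def interleave(*streams):
    """Lazily merge time-sorted streams into one time-ordered stream."""
    return heapq.merge(*streams, key=lambda record: record[0])

timeline = list(interleave(metric_samples, log_events))
for ts, kind, summary in timeline:
    print(ts.isoformat(), kind, summary)
```

`heapq.merge` never re-sorts the inputs. It only compares the current head of each stream, which is why it also works on streams too long to hold in memory.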

9 Unified analysis for all time series events
You can aggregate log information into “time series”.
Metric: CPU (avg, per interval)
Logs: # events (count, per interval)
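Turning logs into a time series means choosing a fixed interval, counting events per interval, and averaging metric samples per interval. Below is a minimal sketch of that bucketing in plain Python. The 10-minute interval and the use of a single CPU value per sample are illustrative assumptions. In Elasticsearch the same shape of result comes from a date histogram aggregation.

```python
from collections import defaultdict
from datetime import datetime, timedelta

def bucket_start(ts, interval):
    """Floor a timestamp to the start of its fixed-size interval, counted from midnight."""
    midnight = ts.replace(hour=0, minute=0, second=0, microsecond=0)
    return midnight + ((ts - midnight) // interval) * interval

def count_per_interval(event_times, interval):
    """Log side: how many events fell into each interval."""
    counts = defaultdict(int)
    for ts in event_times:
        counts[bucket_start(ts, interval)] += 1
    return dict(sorted(counts.items()))

def avg_per_interval(samples, interval):
    """Metric side: mean of the (timestamp, value) samples in each interval."""
    sums, counts = defaultdict(float), defaultdict(int)
    for ts, value in samples:
        b = bucket_start(ts, interval)
        sums[b] += value
        counts[b] += 1
    return {b: sums[b] / counts[b] for b in sorted(sums)}

# Usage with the three request timestamps from the slides:
ten_minutes = timedelta(minutes=10)
request_times = [
    datetime(2017, 3, 7, 16, 10, 2),
    datetime(2017, 3, 7, 16, 11, 58),
    datetime(2017, 3, 7, 16, 20, 55),
]
print(count_per_interval(request_times, ten_minutes))
```

Once both sides are keyed by the same bucket start times, they can be charted on one time axis.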

10 Unified analysis for all time series events
You can aggregate log information into “time series”.
Metric: CPU (avg, per interval) vs. Logs: # events (count, per interval)
Metric: CPU (avg, per interval) vs. Logs: response time (avg, per interval)
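The slides do not say how a pair of such series is correlated. One simple, swapped-in measure is the Pearson coefficient over the intervals that both series share. The sketch below implements that. The per-interval dictionaries (for example, average CPU and average response time) are hypothetical inputs, and the snippet is not how Kibana or X-Pack computes anything.

```python
import math

def align(a, b):
    """Keep only the intervals present in both bucketed series, in time order."""
    keys = sorted(a.keys() & b.keys())
    return [a[k] for k in keys], [b[k] for k in keys]

def pearson(xs, ys):
    """Pearson correlation of two equally long per-interval series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length >= 2")
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("a constant series has no defined correlation")
    return cov / (sx * sy)
```

A value near +1 means the two series rise and fall together across intervals. That is a pointer for troubleshooting, not proof that one causes the other.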

11 DEMO TIME

12 What you gain: correlate logs and metric data in one UI
• Single pane of glass for Monitor -> Troubleshoot -> Root-Cause Analysis
• Machine learning and correlation on both types of data
• Unified analytics and dashboards

13 What you gain: significant operational efficiencies
• Manage a single ingest pipeline
• Manage a single datastore

14 Storage and Analysis

15 Elasticsearch is a great datastore for metrics (https://www.elastic.co/blog/searching-numb3rs-in-5.0)
• BKD trees
  ‒ 71% faster at index time
  ‒ 66% less disk usage
  ‒ 85% less memory usage
• New data types
  ‒ Half float
  ‒ Scaled float
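The two new numeric types save space by trading away precision. Below is a minimal sketch of what each does to a value. It assumes `scaled_float` multiplies the value by its `scaling_factor` and rounds to the nearest long, as the Elasticsearch mapping documentation describes. It also assumes `half_float` is IEEE 754 binary16. The helper functions are hypothetical and only model the encoding; they are not Elasticsearch code.

```python
import struct

def scaled_float_encode(value, scaling_factor):
    """Model scaled_float storage: multiply by the factor, round to a whole number.

    Note: Python's round() breaks exact .5 ties to even, which may differ
    from the server's rounding; fine for illustration.
    """
    return round(value * scaling_factor)

def scaled_float_decode(stored, scaling_factor):
    """Recover the (possibly rounded) float from its stored long."""
    return stored / scaling_factor

def half_float_roundtrip(value):
    """Round-trip a value through IEEE 754 half precision ("e" struct format)."""
    return struct.unpack("<e", struct.pack("<e", value))[0]

# A CPU percentage like the ones on slide 7:
print(scaled_float_encode(95.55, 100))   # stored as the integer 9555
print(half_float_roundtrip(95.55))       # nearest binary16 value is 95.5625
```

A scaling factor of 100 keeps exactly two decimals, which is enough for percentages like the ones on slide 7. A half float has only about three significant decimal digits, so it suits values where that error is tolerable.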

16 Metricbeat
• One Beat collects from many services
• Periodic polling and a predefined data structure
• Ships with several modules, or build your own
  ‒ System (replaces Topbeat)
  ‒ Apache
  ‒ MySQL
  ‒ PostgreSQL
  ‒ Nginx
  ‒ Redis
  ‒ Zookeeper
  ‒ MongoDB
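"Periodic polling and a predefined data structure" means the collector runs a reader on a fixed schedule and wraps each reading in the same document shape. The sketch below shows that idea. `read_sample` stands in for a module's reader, and the event field names are illustrative. They are not Metricbeat's actual schema or code.

```python
import time
from datetime import datetime, timezone

def poll(read_sample, period_s, count,
         sleep=time.sleep, now=lambda: datetime.now(timezone.utc)):
    """Call read_sample every period_s seconds; wrap each reading in a fixed event shape.

    Hypothetical sketch: field names are illustrative, not Metricbeat's schema.
    A plain sleep drifts slightly over time; a real collector would schedule ticks.
    """
    events = []
    for i in range(count):
        if i:
            sleep(period_s)
        events.append({
            "@timestamp": now().isoformat(),
            "module": "system",   # which service the reading came from
            "metricset": "cpu",   # which group of measurements
            "period_s": period_s,
            "values": read_sample(),
        })
    return events
```

Passing `sleep` and `now` as parameters keeps the schedule testable without actually waiting.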

17 Filebeat modules
• Tails a file
• Parses common formats using Ingest Node
• Ships with several modules, or build your own
  ‒ System
  ‒ Apache
  ‒ MySQL
  ‒ Nginx
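Tailing a file comes down to remembering a byte offset and reading only what was appended since then. Filebeat keeps such positions in its registry. The sketch below shows only the offset idea, not the registry format. It also holds back a partially written last line so that half an event is never shipped. The function name is hypothetical.

```python
def read_new_lines(path, offset):
    """Return the complete lines appended since `offset`, plus the next offset.

    A trailing partial line (no newline yet) is left for the next call.
    """
    with open(path, "rb") as f:
        f.seek(offset)
        data = f.read()
    end = data.rfind(b"\n") + 1  # 0 when no complete line has arrived yet
    lines = data[:end].decode("utf-8").splitlines()
    return lines, offset + end
```

A caller stores the returned offset somewhere durable and passes it back on the next read. That is what lets a shipper resume after a restart without re-sending old lines.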

18 Time Series Visual Builder
• Works on top of pipeline aggregations
• A visual way of combining aggregations into charts
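A pipeline aggregation computes over the output of other aggregations rather than over documents. Below is a minimal sketch. It shows an Elasticsearch request body in which a `derivative` reads a sibling `avg` through `buckets_path`, alongside local equivalents of `derivative` and `cumulative_sum` over a list of bucket values. The field name `system.cpu.user.pct` is an assumption. `interval` is the 5.x spelling; newer releases use `fixed_interval` or `calendar_interval`.

```python
from itertools import accumulate

# Date histogram of average CPU, plus a derivative computed from that average.
REQUEST = {
    "size": 0,
    "aggs": {
        "per_interval": {
            "date_histogram": {"field": "@timestamp", "interval": "10m"},
            "aggs": {
                "cpu": {"avg": {"field": "system.cpu.user.pct"}},  # assumed field
                "cpu_change": {"derivative": {"buckets_path": "cpu"}},
            },
        }
    },
}

def derivative(values):
    """Change from the previous bucket; the first bucket has no derivative."""
    return [None] + [cur - prev for prev, cur in zip(values, values[1:])]

def cumulative_sum(values):
    """Running total across buckets."""
    return list(accumulate(values))
```

The local functions take only the bucket values as input, never raw documents. That is the defining property of a pipeline aggregation.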

19 Timelion
A flexible and extensible query language for ad-hoc time-series analytics

20 Typical Logging+Metrics Deployment
Data Collection: Beats (Filebeat: log files; Metricbeat: metrics; Packetbeat: wire data; your{beat})
ETL: Logstash Nodes (X)
Storage: Elasticsearch Master Nodes (3), Ingest Nodes (X), Data Nodes – Hot (X), Data Nodes – Warm (X)
Visualization: Kibana Instances (X)
X-Pack
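The Hot and Warm data nodes imply a policy for where each index lives. Recent indices sit on hot nodes, which take the writes and most queries. Older ones move to warm nodes. The sketch below shows only that decision. It assumes hypothetical daily index names of the form `<prefix>-YYYY.MM.DD` and a hypothetical 7-day hot window. The actual relocation is done by Elasticsearch shard-allocation settings, not by code like this.

```python
from datetime import date, datetime

def tier_for_index(index_name, today, hot_days=7):
    """Pick 'hot' or 'warm' for a daily index named <prefix>-YYYY.MM.DD.

    Hypothetical policy: indices younger than `hot_days` days stay hot.
    """
    day = datetime.strptime(index_name.rsplit("-", 1)[1], "%Y.%m.%d").date()
    return "hot" if (today - day).days < hot_days else "warm"

# Usage: classify a few daily indices as of the talk date.
as_of = date(2017, 3, 8)
for name in ["filebeat-2017.03.08", "metricbeat-2017.03.01", "filebeat-2017.02.01"]:
    print(name, tier_for_index(name, as_of))
```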

21 Logs+Metrics Use Cases

22 IT Infrastructure Monitoring: collect system health and performance data (example: Walgreens)
• Use Metricbeat to collect CPU and memory metrics
• Use Filebeat to collect operating system logs
• Use Packetbeat to sniff network traffic
• Use Kibana to automatically visualize and correlate this data

23 Application Monitoring: collect custom application telemetry and logs (example: Blizzard Entertainment)
• Collect metrics and logs at billions of events per day
• Persist data in 6 globally distributed data centers
• Thousands of developers use a centralized Kibana+Tribe instance

24 IoT Monitoring: monitor a large set of low-power “edge devices” (example: National Energy Research Scientific Computing (NERSC) Center)
• Monitor power, cooling, temperature, and weather data
• 6-12 months of lookback
• 150B documents online
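The 150B figure and the 6-12 month lookback together imply an ingest rate. The back-of-envelope sketch below assumes, hypothetically, that all 150B documents arrived evenly across the lookback window. Under that assumption, a 12-month window works out to roughly 411 million documents per day, or about 4,756 per second. A 6-month window doubles both numbers.

```python
SECONDS_PER_DAY = 86_400

def implied_rate(total_docs, days):
    """Average docs per day and per second, assuming even arrival over `days`."""
    per_day = total_docs / days
    return per_day, per_day / SECONDS_PER_DAY

for label, days in [("12 months", 365), ("6 months", 182.5)]:
    per_day, per_second = implied_rate(150e9, days)
    print(f"{label}: {per_day:,.0f} docs/day, {per_second:,.0f} docs/s")
```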

25 IoT Monitoring: monitor a large set of low-power “edge devices” (example: Nature Conservancy)
• Collect logs and metrics using edge devices
• Deployed in over 100 locations in 70 countries
• Data used for threat analytics

26 How to get started
• You may already be doing it!
• Don’t boil the ocean
• Add only the metrics you need to your existing logging system (and vice versa)
• Leverage off-the-shelf functionality to get started quickly (e.g., Metricbeat, Filebeat)
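One low-effort way to add a metric to an existing logging system is to write each sample as a structured log line. The log shipper you already run then carries it with no new collector. Below is a minimal sketch using Python's standard `logging` and `json` modules. The logger name, the `log_metric` helper, and the field names are all hypothetical.

```python
import json
import logging
import time

logger = logging.getLogger("app.metrics")  # hypothetical logger name

def log_metric(name, value, **tags):
    """Emit one metric sample as a JSON log line for the existing log pipeline."""
    logger.info(json.dumps({"metric": name, "value": value, "ts": time.time(), **tags}))

if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO, format="%(message)s")
    log_metric("queue_depth", 17, service="checkout")
```

The "vice versa" direction is the bucketing idea from slide 9: count or average log events per interval and treat the result as a metric.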

27 Questions? Visit us at the AMA

28 www.elastic.co
