Elastic
March 1, 2018
Logs, Metrics, and APM:
The Holy Trinity of Operations
Tanya Bragin
Senior Director, Product
Slide 2
You know us for searching logs...
Slide 3
You heard we’re good at metrics...
Slide 4
You just learned we added APM...
Slide 5
This talk explains how a “search”
technology expanded into storing
time series, numbers, etc.
Slide 6
Start with some definitions
Slide 7
logs:
records of events
Slide 8
metrics:
periodic numerical measurements
Slide 9
Logs vs Metrics
64.242.88.10 - - [07/Mar/2017:16:10:02 -0800] "GET /mailman/listinfo/hsdivision HTTP/1.1" 200 6291
64.242.88.10 - - [07/Mar/2017:16:11:58 -0800] "POST /twiki/bin/view/TWiki/WikiSyntax HTTP/1.1" 404 7352
64.242.88.10 - - [07/Mar/2017:16:20:55 -0800] "GET /twiki/bin/view/Main/DCCAndPostFix HTTP/1.1" 200 5253
For each event, print out what happened.
07/Mar/2017 16:10:00 all 2.58 0.00 0.70 1.12 0.05 95.55 server1 containerX regionA
07/Mar/2017 16:20:00 all 2.56 0.00 0.69 1.05 0.04 95.66 server2 containerY regionB
07/Mar/2017 16:30:00 all 2.64 0.00 0.65 1.15 0.05 95.50 server2 containerZ regionC
Every x minutes, measure the CPU load and print it out.
Metrics are periodic measurements of some KPIs
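The metric sample above looks like `sar`-style CPU output. The following is a minimal sketch that turns one periodic line into a structured metric document. It assumes (our reading, not stated on the slide) that the six numbers are %user, %nice, %system, %iowait, %steal and %idle, followed by host, container and region tags; the `cpu.*_pct` field names are hypothetical.

```python
from datetime import datetime

# Assumed column order (sar-style CPU report); not stated on the slide.
CPU_FIELDS = ["user", "nice", "system", "iowait", "steal", "idle"]

def parse_metric_line(line: str) -> dict:
    """Turn one periodic CPU sample into a structured metric document."""
    parts = line.split()
    date, time, cpu = parts[0], parts[1], parts[2]
    values = [float(v) for v in parts[3:9]]
    host, container, region = parts[9:12]
    return {
        "timestamp": datetime.strptime(f"{date} {time}", "%d/%b/%Y %H:%M:%S"),
        "cpu": cpu,
        # Hypothetical field names, one per assumed column.
        **{f"cpu.{name}_pct": val for name, val in zip(CPU_FIELDS, values)},
        "host": host,
        "container": container,
        "region": region,
    }

sample = "07/Mar/2017 16:10:00 all 2.58 0.00 0.70 1.12 0.05 95.55 server1 containerX regionA"
doc = parse_metric_line(sample)
print(doc["timestamp"], doc["cpu.user_pct"], doc["host"])
```

Unlike a log line, each metric line carries no event of its own: it is one sample in a regular series, which is why the percentages sum to roughly 100 on every line.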
Slide 10
Structured logs and metrics analyzed together
Metric: CPU (avg, per interval)
Logs: response time (avg, per interval)
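Once logs and metrics are both structured and timestamped, "average per interval" is the same operation for either source. A minimal sketch, similar in spirit to a date histogram with an average per bucket; the `cpu_pct` and `response_ms` fields and all sample values are hypothetical.

```python
from collections import defaultdict
from datetime import datetime, timedelta

def avg_per_interval(events, field, interval):
    """Average `field` over fixed time buckets.

    `events` is an iterable of dicts with a datetime under "timestamp".
    Returns {bucket_start: average}, ordered by bucket start.
    """
    epoch = datetime(1970, 1, 1)
    sums = defaultdict(float)
    counts = defaultdict(int)
    for event in events:
        # Floor the timestamp to the start of its bucket.
        offset = (event["timestamp"] - epoch) // interval
        bucket = epoch + offset * interval
        sums[bucket] += event[field]
        counts[bucket] += 1
    return {b: sums[b] / counts[b] for b in sorted(sums)}

t0 = datetime(2017, 3, 7, 16, 0)
# Hypothetical sample data, for illustration only.
cpu = [{"timestamp": t0 + timedelta(minutes=m), "cpu_pct": v}
       for m, v in [(0, 2.0), (5, 4.0), (10, 3.0), (15, 5.0)]]
requests = [{"timestamp": t0 + timedelta(minutes=m), "response_ms": v}
            for m, v in [(1, 120), (2, 80), (11, 300), (12, 100)]]

ten_min = timedelta(minutes=10)
cpu_avg = avg_per_interval(cpu, "cpu_pct", ten_min)
rt_avg = avg_per_interval(requests, "response_ms", ten_min)
for bucket in cpu_avg:
    # Metric and log averages line up on the same bucket keys.
    print(bucket.time(), cpu_avg[bucket], rt_avg.get(bucket))
```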
Slide 11
APM:
application performance monitoring
Slide 12
APM data looks a lot like logs and metrics
• APM agents look at:
  • Transaction durations
  • Application errors
• Send across text-heavy, rich metadata:
  • Code path executions (“spans”)
  • Code statements associated with the error
• Could also get metrics, e.g. in-app memory usage
Slide 13
To store (and retrieve) data at scale...
Slide 14
• Elasticsearch primarily used for application search
• Lucene's core data structure: the inverted index
Elasticsearch beginnings
Circa 2010
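An inverted index maps each term to the documents that contain it, so a query looks up a few short lists instead of scanning every document. A toy sketch over the request lines from the log sample; the lowercase, split-on-punctuation tokenizer is an assumption and far simpler than Lucene's analyzers.

```python
import re
from collections import defaultdict

def tokenize(text):
    # Lowercase and split on non-alphanumerics: a stand-in for a real analyzer.
    return [t for t in re.split(r"[^a-z0-9]+", text.lower()) if t]

def build_inverted_index(docs):
    """Map each term to the sorted list of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}

def search_all(index, query):
    """Return ids of documents containing every query term (AND semantics)."""
    postings = [set(index.get(term, [])) for term in tokenize(query)]
    return sorted(set.intersection(*postings)) if postings else []

docs = {
    1: "GET /mailman/listinfo/hsdivision 200",
    2: "POST /twiki/bin/view/TWiki/WikiSyntax 404",
    3: "GET /twiki/bin/view/Main/DCCAndPostFix 200",
}
index = build_inverted_index(docs)
print(search_all(index, "twiki 200"))
```

The query intersects the postings for `twiki` ({2, 3}) and `200` ({1, 3}) and returns only document 3.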
Slide 16
• 2011: Logstash is open sourced, key part is “grok” for structuring logs
• 2012: Kibana is open-sourced; ELK is widely used to search structured logs and create operational dashboards
• 2014: Kibana adds dashboarding; ELK stack gains prominence for log analytics
From Elasticsearch to ELK
~ 2010 to 2014
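Grok structures a log line by matching it against named patterns and keeping each capture as a field. The sketch below imitates that with a hand-written Python regular expression for the access-log lines shown earlier. It is an approximation chosen for illustration, not the actual grok `COMMONAPACHELOG` definition, and the field names only loosely follow grok's conventions.

```python
import re

# Hand-written approximation of a common Apache access-log layout; the real
# grok COMMONAPACHELOG pattern is defined differently and is more permissive.
ACCESS_LOG = re.compile(
    r'(?P<clientip>\S+) (?P<ident>\S+) (?P<auth>\S+) '
    r'\[(?P<timestamp>[^\]]+)\] '
    r'"(?P<verb>[A-Z]+) (?P<request>\S+) HTTP/(?P<httpversion>[\d.]+)" '
    r'(?P<response>\d{3}) (?P<bytes>\d+|-)'
)

def structure(line):
    """Return the named fields of an access-log line, or None if it doesn't match."""
    match = ACCESS_LOG.match(line)
    if match is None:
        return None
    fields = match.groupdict()
    # Convert numeric captures so they can be aggregated later.
    fields["response"] = int(fields["response"])
    fields["bytes"] = 0 if fields["bytes"] == "-" else int(fields["bytes"])
    return fields

line = ('64.242.88.10 - - [07/Mar/2017:16:11:58 -0800] '
        '"POST /twiki/bin/view/TWiki/WikiSyntax HTTP/1.1" 404 7352')
print(structure(line))
```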
Slide 18
• 2010: Elasticsearch adds support for “fielddata” (column-oriented view of the data in memory)
• 2012: Lucene introduces off-heap columnar store for numbers (“doc values”)
• 2014: Elasticsearch 1.0 adds support for “doc values” (column store)
Elasticsearch evolving to support analytics
~ 2010 to 2014
https://www.elastic.co/blog/elasticsearch-as-a-column-store
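The point of fielddata and doc values is that analytics read one field across many documents rather than whole documents. A conceptual sketch using the host, region and idle-CPU values from the metric sample; plain Python lists stand in for Lucene's on-disk columnar encoding.

```python
# Row-oriented: one record per document, as stored for retrieval.
docs = [
    {"host": "server1", "region": "regionA", "cpu_idle": 95.55},
    {"host": "server2", "region": "regionB", "cpu_idle": 95.66},
    {"host": "server2", "region": "regionC", "cpu_idle": 95.50},
]

def to_columns(rows):
    """Pivot row-oriented docs into per-field arrays indexed by doc id."""
    fields = {key for row in rows for key in row}
    return {f: [row.get(f) for row in rows] for f in sorted(fields)}

columns = to_columns(docs)

# An aggregation only needs to scan the one column it uses.
idle = columns["cpu_idle"]
print(min(idle), max(idle), sum(idle) / len(idle))
```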
Slide 19
• 2017: Elasticsearch 6.0 improves Lucene's storage efficiency for sparse values (41.5% reduction in Metricbeat index size)
Elasticsearch storage efficiencies
2014 to Present
https://www.elastic.co/blog/minimize-index-storage-size-elasticsearch-6-0
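A field is sparse when only a few documents in an index carry it. A simplified sketch of why that matters: a dense encoding reserves a slot for every document, while a sparse one stores only (doc id, value) pairs and finds them by binary search. This illustrates the trade-off only; it is not Lucene's actual encoding, and the field and its values are hypothetical.

```python
from bisect import bisect_left

def dense_encode(values_by_doc, num_docs, missing=0):
    """One slot per document; docs without the field still take a slot."""
    return [values_by_doc.get(doc, missing) for doc in range(num_docs)]

def sparse_encode(values_by_doc):
    """Store only documents that have a value: sorted doc ids plus values."""
    doc_ids = sorted(values_by_doc)
    return doc_ids, [values_by_doc[d] for d in doc_ids]

def sparse_lookup(encoded, doc):
    """Return the value for `doc`, or None if the document has no value."""
    doc_ids, values = encoded
    i = bisect_left(doc_ids, doc)
    if i < len(doc_ids) and doc_ids[i] == doc:
        return values[i]
    return None

# Hypothetical field that only 3 of 1,000 documents carry.
field = {7: 512, 480: 1024, 999: 256}
dense = dense_encode(field, num_docs=1000)
sparse = sparse_encode(field)
print(len(dense), len(sparse[0]) + len(sparse[1]))
```

The dense form also needs a placeholder (0 here) that cannot be told apart from a real zero, whereas the sparse form can report "no value" directly.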
Slide 20
• 2015: Elasticsearch 2.0 adds more aggressive compression of stored data (DEFLATE)
Elasticsearch storage efficiencies
2014 to Present
https://www.elastic.co/blog/store-compression-in-lucene-and-elasticsearch
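In Elasticsearch this option is the `index.codec: best_compression` index setting, which uses DEFLATE instead of the default LZ4. Python's `zlib` module implements DEFLATE, so a rough feel for how well repetitive log text compresses can be had locally. The zlib levels below are only an analogy and are not settings Elasticsearch exposes.

```python
import zlib

# Repetitive log text, as is typical for stored log documents.
lines = [
    '64.242.88.10 - - [07/Mar/2017:16:10:02 -0800] "GET /mailman/listinfo/hsdivision HTTP/1.1" 200 6291',
    '64.242.88.10 - - [07/Mar/2017:16:11:58 -0800] "POST /twiki/bin/view/TWiki/WikiSyntax HTTP/1.1" 404 7352',
    '64.242.88.10 - - [07/Mar/2017:16:20:55 -0800] "GET /twiki/bin/view/Main/DCCAndPostFix HTTP/1.1" 200 5253',
] * 100
raw = "\n".join(lines).encode("utf-8")

fast = zlib.compress(raw, 1)   # lowest DEFLATE effort
best = zlib.compress(raw, 9)   # highest DEFLATE effort

print(len(raw), len(fast), len(best))
assert zlib.decompress(best) == raw  # lossless round trip
```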
Slide 21
• 2016: Elasticsearch 5.0 adds new data structures (BKD trees) for efficiently storing and querying numbers
Elasticsearch storage efficiencies
2014 to Present
https://www.elastic.co/blog/lucene-points-6.0
Figure panels: 1-Dimension, 2-Dimensions
Slide 22
• 2016: Elasticsearch 5.0 adds new data structures (BKD trees) for efficiently storing and querying numbers
Elasticsearch query efficiencies
2014 to Present
Figure panels: 1-Dimension, 2-Dimensions
https://www.elastic.co/blog/lucene-points-6.0
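In one dimension, the idea behind BKD trees reduces to sorting points into blocks that each record their min and max. A range query can then skip blocks entirely outside the range, count blocks entirely inside without looking at their values, and inspect values only in blocks that straddle a bound. The sketch below is that one-dimensional simplification, not Lucene's implementation; `leaf_size=4` is an arbitrary choice.

```python
from dataclasses import dataclass

@dataclass
class Leaf:
    lo: int          # smallest value in the block
    hi: int          # largest value in the block
    values: list     # sorted values stored in the block

def build_leaves(values, leaf_size=4):
    """Sort points and cut them into fixed-size leaf blocks with min/max bounds."""
    ordered = sorted(values)
    chunks = (ordered[i:i + leaf_size] for i in range(0, len(ordered), leaf_size))
    return [Leaf(chunk[0], chunk[-1], chunk) for chunk in chunks]

def range_count(leaves, lo, hi):
    """Count values in [lo, hi], returning (count, values_inspected)."""
    count = inspected = 0
    for leaf in leaves:
        if leaf.hi < lo or leaf.lo > hi:
            continue                      # block entirely outside: skip it
        if lo <= leaf.lo and leaf.hi <= hi:
            count += len(leaf.values)     # block entirely inside: no per-value checks
            continue
        inspected += len(leaf.values)     # block straddles a bound: check each value
        count += sum(lo <= v <= hi for v in leaf.values)
    return count, inspected

leaves = build_leaves(range(0, 100, 3))   # 0, 3, 6, ..., 99
print(range_count(leaves, 10, 50))
```

Of the 34 points, only the 4 in the block straddling 50 are compared individually; the other 9 matches come from whole blocks.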
Slide 23
• Speed up common queries and aggregations
  • 2014: Per-shard result cache
  • 2016: Advanced query rewriting
Elasticsearch query efficiencies
2014 to Present
https://www.elastic.co/blog/instant-aggregations-rewriting-queries-for-fun-and-profit
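The two techniques work together: a time-range query whose bounds shift on every request (for example "last 15 minutes") would never hit a cache, but if a shard's data falls entirely inside the range, the query can be rewritten to match everything, and the rewritten form is stable. A simplified sketch of that idea as we understand it; the shard data, the tuple query format and the `rewrite`/`execute` functions are all hypothetical, and `functools.lru_cache` stands in for the shard-level cache.

```python
from functools import lru_cache

# Hypothetical shard: its documents' timestamps (epoch seconds) and bounds.
SHARD_TIMESTAMPS = (1000, 1500, 2000, 2500)
SHARD_MIN, SHARD_MAX = min(SHARD_TIMESTAMPS), max(SHARD_TIMESTAMPS)

def rewrite(query):
    """Rewrite a ("range", lo, hi) query against this shard's value bounds."""
    kind, lo, hi = query
    if kind != "range":
        return query
    if lo <= SHARD_MIN and SHARD_MAX <= hi:
        return ("match_all", None, None)   # every doc in the shard matches
    if hi < SHARD_MIN or lo > SHARD_MAX:
        return ("match_none", None, None)  # no doc can match
    return query

@lru_cache(maxsize=128)
def execute(query):
    """Stand-in for a cached per-shard request: count matching docs."""
    kind, lo, hi = query
    if kind == "match_all":
        return len(SHARD_TIMESTAMPS)
    if kind == "match_none":
        return 0
    return sum(lo <= t <= hi for t in SHARD_TIMESTAMPS)

# Two requests issued at different times have different bounds, but both
# cover the whole shard, so they rewrite to the same cache key.
print(execute(rewrite(("range", 900, 3000))), execute(rewrite(("range", 950, 3050))))
print(execute.cache_info().hits)
```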
Elasticsearch for search and numerical analytics
• Inverted index for full-text search
• Columnar store for structured data
• BKD trees for numerical operations
• Caches: shard-level request/result caches, filter cache, etc.
Slide 26
Elasticsearch for metrics = natural evolution
Slide 27
• Centralized collection: Logstash (e.g. from network devices)
• Distributed collection: Beats (on servers, containers)
• Elasticsearch: transform on ingest nodes, store on data nodes
Elastic evolving ingest story
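On an ingest node, the transform step is defined as a pipeline of processors. A sketch of a request body for `PUT _ingest/pipeline/<pipeline-id>` that structures the access logs shown earlier with the built-in `grok` and `date` processors; the pipeline id `apache-access` and the source field `message` are assumptions, and the script only builds and prints the JSON rather than sending it.

```python
import json

# Request body for PUT _ingest/pipeline/<pipeline-id>. The id "apache-access"
# and the source field "message" are hypothetical choices for this sketch.
pipeline = {
    "description": "Structure Apache access logs",
    "processors": [
        # grok extracts named fields (clientip, timestamp, verb, ...) from the raw line.
        {"grok": {"field": "message", "patterns": ["%{COMMONAPACHELOG}"]}},
        # date parses the extracted timestamp, e.g. 07/Mar/2017:16:10:02 -0800.
        {"date": {"field": "timestamp", "formats": ["dd/MMM/yyyy:HH:mm:ss Z"]}},
    ],
}

body = json.dumps(pipeline, indent=2)
print("PUT _ingest/pipeline/apache-access")
print(body)
```

Moving this step into Elasticsearch lets lightweight Beats ship raw lines while parsing happens centrally; Logstash remains the option when richer transformation or non-Beats inputs are needed.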
Slide 28
Immediate insights with modules
• Turnkey experience for specific data types
• Data to dashboard in just one step
• Automated parsing and enrichment
• Default dashboards, alerts, ML jobs
Modules for: Logging, Metrics, Security
Slide 29
Logging modules
System
• Linux / MacOS
• Windows Events
Containers
• Docker
• Kubernetes
Infrastructure Applications
Databases
• MySQL
• PostgreSQL
Queues
• Kafka
• Redis
Web servers
• Apache
• Nginx
Audit data
• Filesystem
• System calls
WINLOGBEAT
FILEBEAT
AUDITBEAT
Slide 32
Visualizing time series data
• Timelion
• Time Series Visual Builder
• Annotations
Slide 35
Elasticsearch for APM = natural evolution
Slide 36
• First open-source alternative to traditional APM tools
• Focused on areas underserved by traditional vendors
• Active roadmap to expand programming language support
Elastic APM
APM adds end-user experience and application-level monitoring to the stack
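Application-level monitoring means an agent inside the process times each transaction, breaks it into spans for individual code paths, and reports errors alongside the timing data. A minimal sketch of that shape of data; the `Transaction` class, its methods and the output fields are hypothetical and are not the Elastic APM agent API.

```python
import time
from contextlib import contextmanager

class Transaction:
    """Hypothetical, minimal stand-in for what an APM agent records."""

    def __init__(self, name):
        self.name = name
        self.spans = []        # (span name, duration in ms)
        self.error = None
        self.duration_ms = None
        self._start = time.perf_counter()

    @contextmanager
    def span(self, name):
        """Time one code path execution and record it as a span."""
        start = time.perf_counter()
        try:
            yield
        finally:
            # Recorded even if the code inside the span raises.
            self.spans.append((name, (time.perf_counter() - start) * 1000))

    def end(self):
        """Close the transaction and return it as a document."""
        self.duration_ms = (time.perf_counter() - self._start) * 1000
        return {
            "transaction": self.name,
            "duration_ms": self.duration_ms,
            "spans": self.spans,
            "error": self.error,
        }

tx = Transaction("GET /twiki/bin/view")
with tx.span("db.query"):
    time.sleep(0.01)      # pretend database call
try:
    with tx.span("render"):
        raise ValueError("template missing")
except ValueError as exc:
    tx.error = repr(exc)  # errors are reported alongside the timing data
doc = tx.end()
print(doc["transaction"], [name for name, _ in doc["spans"]], doc["error"])
```

The resulting document mixes a numeric duration (metric-like) with text-heavy span and error details (log-like), which is why it fits the same datastore.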
Slide 37
Elastic APM
How it works
• APM Agents → APM Server → Elasticsearch → Kibana
• Beats (logs, metrics, packets, datastore, JMX, ...) and Logstash → Elasticsearch
Slide 38
• Opbeat migrated from a combination of Cassandra and Redis to Elasticsearch
• Much of the data that was previously pre-aggregated is now stored as raw documents in Elasticsearch
• Ad-hoc querying flexibility for the user
• New feature development agility for engineering
Elasticsearch as APM datastore
The Journey
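Storing raw documents instead of pre-aggregates is what makes ad-hoc querying possible: the grouping can be chosen at query time. A small sketch with hypothetical transaction documents; a store that had pre-aggregated only by URL could answer the first question but not the second.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical raw transaction documents, stored one per request.
raw = [
    {"url": "/checkout", "host": "server1", "duration_ms": 120},
    {"url": "/checkout", "host": "server2", "duration_ms": 340},
    {"url": "/search",   "host": "server1", "duration_ms": 80},
    {"url": "/search",   "host": "server2", "duration_ms": 100},
]

def avg_by(docs, key, metric="duration_ms"):
    """Group raw documents by any field chosen at query time and average a metric."""
    groups = defaultdict(list)
    for doc in docs:
        groups[doc[key]].append(doc[metric])
    return {k: mean(v) for k, v in sorted(groups.items())}

print(avg_by(raw, "url"))
print(avg_by(raw, "host"))
```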
Slide 39
Benefits of Logs, Metrics, APM in one stack
Slide 40
Unified Dashboards
Same UI for KPI summaries and root cause analysis
Slide 41
Unified Machine Learning
Correlate multiple data sources for more intelligent anomaly detection
Slide 42
Unified Alerting
Trigger off any operational data to provide unified SLA monitoring
Slide 43
Operational gains
Single technology for operational data saves on administrative costs
Slide 44
DEMO
Slide 45
Roadmap for Operational Analytics
Slide 46
• New Beats and Logstash inputs and modules
• Improved dashboards and ML jobs / alerts for existing modules
• Agentless shippers
• Distributed tracing
New operational data sources
It all starts with the data
Slide 47
• Correlate data from different sources
• Ability to re-use analysis content
• Ability to re-use Elastic-provided content
Elastic Common Schema
Benefits
• Preliminary review
• Working closely with the community
• Will provide more information via usual channels
Status
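The benefit of a common schema is that the same concept gets the same field name in every source, so one query can correlate them. A sketch of that normalization step; the per-source field names and the shared names (`source.ip`, `http.status_code`, `event.action`) are illustrative assumptions, not the actual Elastic Common Schema, which was still under review at the time.

```python
# Hypothetical per-source field names, mapped onto shared (assumed) names.
FIELD_MAP = {
    "apache": {"clientip": "source.ip", "response": "http.status_code"},
    "firewall": {"src_addr": "source.ip", "action": "event.action"},
}

def normalize(source, doc):
    """Rename a source's fields to the common schema; keep unmapped fields as-is."""
    mapping = FIELD_MAP.get(source, {})
    return {mapping.get(k, k): v for k, v in doc.items()}

events = [
    normalize("apache", {"clientip": "64.242.88.10", "response": 404}),
    normalize("firewall", {"src_addr": "64.242.88.10", "action": "deny"}),
    normalize("firewall", {"src_addr": "10.0.0.5", "action": "allow"}),
]

# With one field name, a single filter now correlates both sources.
suspect = [e for e in events if e.get("source.ip") == "64.242.88.10"]
print(len(suspect))
```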
Slide 48
Rollup support
• Caveat: individual events can no longer be queried once data is rolled up
• Recommended for long-retention use cases, such as capacity planning
• Can accomplish this today with Watcher-enabled rollups
• Built-in rollup support in active development
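A rollup trades detail for retention: raw points are replaced by per-interval summaries. A minimal sketch of that trade-off, keeping min, max, sum and count per bucket so averages can still be derived while the individual points are gone. This illustrates the concept only; it is not how Watcher-based or built-in rollups are implemented, and the sample data is hypothetical.

```python
from collections import defaultdict
from datetime import datetime, timedelta

def rollup(points, interval):
    """Downsample (timestamp, value) points into per-interval summaries.

    Each bucket keeps min, max, sum and count; averages can be derived later,
    but the individual points are gone.
    """
    epoch = datetime(1970, 1, 1)
    buckets = defaultdict(
        lambda: {"min": float("inf"), "max": float("-inf"), "sum": 0.0, "count": 0}
    )
    for ts, value in points:
        # Floor the timestamp to the start of its bucket.
        start = epoch + ((ts - epoch) // interval) * interval
        b = buckets[start]
        b["min"] = min(b["min"], value)
        b["max"] = max(b["max"], value)
        b["sum"] += value
        b["count"] += 1
    return dict(sorted(buckets.items()))

# Hypothetical raw samples: one CPU reading per minute for two hours.
t0 = datetime(2017, 3, 7, 16, 0)
raw = [(t0 + timedelta(minutes=m), float(m % 60)) for m in range(120)]
hourly = rollup(raw, timedelta(hours=1))
for start, b in hourly.items():
    print(start, b["count"], b["min"], b["max"], b["sum"] / b["count"])
```

120 raw points become 2 summary records, which is why rollups suit long-retention questions such as capacity planning but cannot answer questions about a single event.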
Storage efficiency
https://www.elastic.co/blog/disk-based-field-data-a-k-a-doc-values
https://www.elastic.co/blog/elasticsearch-storage-the-true-story
https://www.elastic.co/blog/store-compression-in-lucene-and-elasticsearch
https://www.elastic.co/blog/lucene-points-6.0
https://www.elastic.co/blog/apache-lucene-numeric-filters
https://www.elastic.co/blog/searching-numb3rs-in-5.0
https://www.elastic.co/blog/sparse-versus-dense-document-values-with-apache-lucene
https://www.elastic.co/blog/minimize-index-storage-size-elasticsearch-6-0
Query efficiency
https://www.elastic.co/blog/instant-aggregations-rewriting-queries-for-fun-and-profit
https://www.elastic.co/blog/frame-of-reference-and-roaring-bitmaps
Customer stories
https://www.elastic.co/blog/elasticsearch-as-a-column-store
Where to learn more...
References for the curious
Slide 53
• Instrument newer projects built on new frameworks and technologies
• For legacy projects, start with unifying most important KPIs and events
• During re-architecture efforts, consider consolidating datastores / tools
How do I get started?
Practical initial deployment and migration strategies