Slide 1

Slide 1 text

Logs, Metrics, and APM: The Holy Trinity of Operations
Tanya Bragin, Senior Director, Product
Elastic, March 1, 2018

Slide 2

Slide 2 text

You know us for searching logs...

Slide 3

Slide 3 text

You heard we’re good at metrics...

Slide 4

Slide 4 text

You just learned we added APM...

Slide 5

Slide 5 text

This talk explains how a “search” technology expanded into storing time series, numbers, etc.

Slide 6

Slide 6 text

Start with some definitions

Slide 7

Slide 7 text

logs: records of events

Slide 8

Slide 8 text

metrics: periodic numerical measurements

Slide 9

Slide 9 text

Logs vs Metrics
Logs: for each event, print out what happened.
64.242.88.10 - - [07/Mar/2017:16:10:02 -0800] "GET /mailman/listinfo/hsdivision HTTP/1.1" 200 6291
64.242.88.10 - - [07/Mar/2017:16:11:58 -0800] "POST /twiki/bin/view/TWiki/WikiSyntax HTTP/1.1" 404 7352
64.242.88.10 - - [07/Mar/2017:16:20:55 -0800] "GET /twiki/bin/view/Main/DCCAndPostFix HTTP/1.1" 200 5253
Metrics: every x minutes, measure the CPU load and print it out.
07/Mar/2017 16:10:00 all 2.58 0.00 0.70 1.12 0.05 95.55 server1 containerX regionA
07/Mar/2017 16:20:00 all 2.56 0.00 0.69 1.05 0.04 95.66 server2 containerY regionB
07/Mar/2017 16:30:00 all 2.64 0.00 0.65 1.15 0.05 95.50 server2 containerZ regionC
Metrics are periodic measurements of some KPIs.
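The log lines on this slide only become filterable fields once they are parsed into structured events. A minimal Python sketch, assuming the simplified access-log layout shown above; `LOG_PATTERN` and `parse_log_line` are hypothetical names, and the regex is hand-written rather than one of Logstash's grok patterns.

```python
import re
from datetime import datetime

# Hand-written regex for the access-log layout on the slide (hypothetical, not a grok pattern):
# client, two unused fields, [timestamp], "method path protocol", status, bytes.
LOG_PATTERN = re.compile(
    r'(?P<client>\S+) \S+ \S+ '
    r'\[(?P<timestamp>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<bytes>\d+)'
)


def parse_log_line(line):
    """Turn one raw log line into a structured event dict, or return None if it doesn't match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    event = match.groupdict()
    # Convert string captures into typed fields so they can be filtered and aggregated.
    event["timestamp"] = datetime.strptime(event["timestamp"], "%d/%b/%Y:%H:%M:%S %z")
    event["status"] = int(event["status"])
    event["bytes"] = int(event["bytes"])
    return event


lines = [
    '64.242.88.10 - - [07/Mar/2017:16:10:02 -0800] "GET /mailman/listinfo/hsdivision HTTP/1.1" 200 6291',
    '64.242.88.10 - - [07/Mar/2017:16:11:58 -0800] "POST /twiki/bin/view/TWiki/WikiSyntax HTTP/1.1" 404 7352',
    '64.242.88.10 - - [07/Mar/2017:16:20:55 -0800] "GET /twiki/bin/view/Main/DCCAndPostFix HTTP/1.1" 200 5253',
]

events = [parse_log_line(line) for line in lines]
for event in events:
    print(event["timestamp"].isoformat(), event["method"], event["path"], event["status"])
```

Each event is one record per occurrence; a metric sample, by contrast, has the same fixed shape every interval regardless of how many events happened.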

Slide 10

Slide 10 text

Structured logs and metrics analyzed together
• Metric: CPU (avg, per interval)
• Logs: response time (avg, per interval)
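Analyzing a metric and a log-derived value together usually means averaging both into the same time buckets so they line up. A minimal sketch, assuming timestamps in seconds and made-up sample values; `bucket_average` is a hypothetical helper that loosely mirrors a date histogram with an average per bucket.

```python
from collections import defaultdict


def bucket_average(samples, interval_s):
    """Average (timestamp_seconds, value) samples into fixed-width time buckets.

    Returns {bucket_start: average}.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for ts, value in samples:
        bucket = ts - (ts % interval_s)  # floor the timestamp to the start of its interval
        sums[bucket] += value
        counts[bucket] += 1
    return {bucket: sums[bucket] / counts[bucket] for bucket in sums}


# Hypothetical data: periodic CPU samples (metric) and per-request response times (log events).
cpu_samples = [(0, 2.0), (30, 4.0), (60, 6.0), (90, 8.0)]
response_times_ms = [(5, 100.0), (20, 120.0), (45, 140.0), (70, 300.0), (110, 260.0)]

cpu_avg = bucket_average(cpu_samples, 60)
rt_avg = bucket_average(response_times_ms, 60)

# Align both series on their shared bucket keys for side-by-side analysis.
for bucket in sorted(cpu_avg.keys() & rt_avg.keys()):
    print(bucket, cpu_avg[bucket], rt_avg[bucket])
```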

Slide 11

Slide 11 text

APM: application performance monitoring

Slide 12

Slide 12 text

APM data looks a lot like logs and metrics
• APM agents look at:
  • Transaction durations
  • Application errors
• Send across text-heavy, rich metadata:
  • Code path executions (“spans”)
  • Code statements associated with the error
• Could also get metrics, e.g. in-app memory usage

Slide 13

Slide 13 text

To store (and retrieve) data at scale...

Slide 14

Slide 14 text

Elasticsearch beginnings (circa 2010)
• Elasticsearch primarily used for application search
• Lucene data structure: inverted index
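A toy inverted index shows why this structure suits search: each term maps directly to the documents that contain it, so a query never scans the documents themselves. This sketch assumes a crude lowercase-and-split analyzer and hypothetical sample documents; Lucene's real postings lists are compressed and also carry frequencies and positions.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase and split on non-word characters (a very rough analyzer)."""
    return [token for token in re.split(r"\W+", text.lower()) if token]


def build_inverted_index(docs):
    """Map each term to the sorted list of document IDs that contain it."""
    postings = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            postings[term].add(doc_id)
    return {term: sorted(ids) for term, ids in postings.items()}


def search_all(index, query):
    """Return IDs of documents containing every query term (a boolean AND)."""
    result = None
    for term in tokenize(query):
        ids = set(index.get(term, []))
        result = ids if result is None else result & ids
    return sorted(result or [])


# Hypothetical documents.
docs = {
    1: "Connection refused by upstream server",
    2: "Upstream server returned 502",
    3: "User login succeeded",
}
index = build_inverted_index(docs)
print(index["upstream"], search_all(index, "upstream server"))
```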

Slide 15

Slide 15 text

From Elasticsearch to ELK (~2010 to 2014)
• 2011: Logstash is open-sourced; its key part is “grok” for structuring logs
• 2012: Kibana is open-sourced; ELK is widely used to search structured logs and create operational dashboards

Slide 16

Slide 16 text

From Elasticsearch to ELK (~2010 to 2014)
• 2011: Logstash is open-sourced; its key part is “grok” for structuring logs
• 2012: Kibana is open-sourced; ELK is widely used to search structured logs and create operational dashboards
• 2014: Kibana adds dashboarding; the ELK stack gains prominence for log analytics

Slide 17

Slide 17 text

Elasticsearch evolving to support analytics (~2010 to 2014)
• 2010: Elasticsearch adds support for “fielddata” (column-oriented view of the data in memory)
Source: https://www.elastic.co/blog/elasticsearch-as-a-column-store

Slide 18

Slide 18 text

Elasticsearch evolving to support analytics (~2010 to 2014)
• 2010: Elasticsearch adds support for “fielddata” (column-oriented view of the data in memory)
• 2012: Lucene introduces an off-heap columnar store for numbers (“doc values”)
• 2014: Elasticsearch 1.0 adds support for “doc values” (column store)
Source: https://www.elastic.co/blog/elasticsearch-as-a-column-store
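The column-store idea behind fielddata and doc values can be illustrated by pivoting documents into one list per field. A toy sketch, assuming documents modeled on the metric lines from the “Logs vs Metrics” slide; `to_columns` and the field name `cpu_user` are hypothetical, and this is not Lucene's doc values format.

```python
# Hypothetical documents modeled on the metric lines from the "Logs vs Metrics" slide;
# "cpu_user" is an assumed name for the first CPU column.
rows = [
    {"host": "server1", "region": "regionA", "cpu_user": 2.58},
    {"host": "server2", "region": "regionB", "cpu_user": 2.56},
    {"host": "server2", "region": "regionC", "cpu_user": 2.64},
]


def to_columns(documents):
    """Pivot row-oriented documents into one list per field, indexed by document position.

    For each field, the values of all documents sit next to each other, so an
    aggregation on one field reads only that list. Missing fields become None.
    """
    fields = sorted({field for doc in documents for field in doc})
    return {field: [doc.get(field) for doc in documents] for field in fields}


columns = to_columns(rows)

# Average over one field: scan a single column instead of every whole document.
cpu = columns["cpu_user"]
avg_cpu = sum(cpu) / len(cpu)

# Terms-style count per host: again, only the "host" column is read.
host_counts = {}
for host in columns["host"]:
    host_counts[host] = host_counts.get(host, 0) + 1

print(avg_cpu, host_counts)
```

The inverted index answers “which documents contain this term?”; the column layout answers the reverse question, “what is this field's value for document N?”, which is what sorting and aggregations need.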

Slide 19

Slide 19 text

Elasticsearch storage efficiencies (2014 to present)
• 2017: Elasticsearch 6.0 improves Lucene sparse-value storage efficiency (41.5% reduction in Metricbeat index size)
Source: https://www.elastic.co/blog/minimize-index-storage-size-elasticsearch-6-0
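Sparse fields are plausibly common in a metrics index because each module fills in only its own fields. The toy comparison below assumes a field present in 3 of 1,000 documents and contrasts a dense layout (one slot per document) with a sparse one (only documents that have a value). The function names are hypothetical, and this is not Lucene's on-disk encoding.

```python
from bisect import bisect_left


def dense_column(num_docs, values_by_doc):
    """Dense layout: one slot per document, with None where the field is missing."""
    return [values_by_doc.get(doc_id) for doc_id in range(num_docs)]


def sparse_column(values_by_doc):
    """Sparse layout: only documents that have a value, as parallel sorted lists."""
    doc_ids = sorted(values_by_doc)
    return doc_ids, [values_by_doc[doc_id] for doc_id in doc_ids]


def sparse_lookup(doc_ids, values, doc_id):
    """Look up one document's value in the sparse layout by binary search; None if absent."""
    i = bisect_left(doc_ids, doc_id)
    if i < len(doc_ids) and doc_ids[i] == doc_id:
        return values[i]
    return None


# Hypothetical field that only 3 out of 1,000 documents carry.
num_docs = 1000
values_by_doc = {10: 1.5, 500: 2.5, 999: 3.5}

dense = dense_column(num_docs, values_by_doc)
doc_ids, values = sparse_column(values_by_doc)
print(len(dense), "slots (dense) vs", len(doc_ids), "entries (sparse)")
```

The sparse form trades a binary search per lookup for storing only the values that exist.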

Slide 20

Slide 20 text

Elasticsearch storage efficiencies (2014 to present)
• 2015: Elasticsearch 2.0 adds more aggressive text compression (with DEFLATE)
Source: https://www.elastic.co/blog/store-compression-in-lucene-and-elasticsearch
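Python's standard `zlib` module implements DEFLATE, so it can illustrate why repetitive log text compresses so well and what the speed/size trade-off looks like. A minimal sketch on synthetic lines modeled on the access-log example earlier in this talk; as far as I know, Elasticsearch exposes its DEFLATE-based option as the `best_compression` value of the `index.codec` index setting, but the snippet below does not touch Elasticsearch at all.

```python
import zlib

# Synthetic, highly repetitive access-log lines (same client, protocol and path prefix).
log_lines = "\n".join(
    f'64.242.88.10 - - [07/Mar/2017:16:{minute:02d}:02 -0800] '
    f'"GET /twiki/bin/view/Main/Page{minute} HTTP/1.1" 200 {5000 + minute}'
    for minute in range(60)
).encode("utf-8")

# Level 1 favors speed; level 9 spends more CPU searching for matches,
# which typically yields output that is the same size or smaller.
fast = zlib.compress(log_lines, 1)
best = zlib.compress(log_lines, 9)

print(f"raw={len(log_lines)} level1={len(fast)} level9={len(best)}")

# DEFLATE is lossless: decompression restores the exact bytes.
assert zlib.decompress(best) == log_lines
```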

Slide 21

Slide 21 text

Elasticsearch storage efficiencies (2014 to present)
• 2016: Elasticsearch 5.0 adds more data structures for efficiently storing and querying numbers (BKD trees)
[Figure: BKD tree examples in 1 dimension and 2 dimensions]
Source: https://www.elastic.co/blog/lucene-points-6.0

Slide 22

Slide 22 text

Elasticsearch query efficiencies (2014 to present)
• 2016: Elasticsearch 5.0 adds more data structures for efficiently storing and querying numbers (BKD trees)
[Figure: BKD tree examples in 1 dimension and 2 dimensions]
Source: https://www.elastic.co/blog/lucene-points-6.0
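The core trick a BKD tree uses for numeric range queries is to group sorted values into leaf blocks that record their min and max, so whole blocks can be skipped or counted without inspecting their values. The sketch below is a deliberately simplified 1-dimensional version with hypothetical names (`Leaf`, `build_leaves`, `range_count`); a real BKD tree splits multi-dimensional points recursively and uses inner nodes so it does not even visit every leaf header, as this loop does.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Leaf:
    min_value: int
    max_value: int
    values: List[int]  # sorted values stored in this leaf block


def build_leaves(values, leaf_size=4):
    """Sort the values and cut them into fixed-size leaf blocks that record their min/max."""
    ordered = sorted(values)
    leaves = []
    for start in range(0, len(ordered), leaf_size):
        block = ordered[start:start + leaf_size]
        leaves.append(Leaf(block[0], block[-1], block))
    return leaves


def range_count(leaves, low, high):
    """Count values in [low, high]; also report how many leaves needed a value-by-value scan."""
    count = 0
    leaves_scanned = 0
    for leaf in leaves:
        if leaf.max_value < low or leaf.min_value > high:
            continue  # leaf entirely outside the range: skip its values
        if low <= leaf.min_value and leaf.max_value <= high:
            count += len(leaf.values)  # leaf entirely inside: count without visiting values
            continue
        leaves_scanned += 1  # leaf crosses a range boundary: check each value
        count += sum(1 for v in leaf.values if low <= v <= high)
    return count, leaves_scanned
```

For example, with the values 0, 5, ..., 95 in leaves of four, `range_count(build_leaves(range(0, 100, 5)), 12, 62)` only inspects individual values in the two leaves that straddle 12 and 62.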

Slide 23

Slide 23 text

Elasticsearch query efficiencies (2014 to present)
• Speed up common queries and aggregations:
  • 2014: Per-shard result cache
  • 2016: Advanced query rewriting
Source: https://www.elastic.co/blog/instant-aggregations-rewriting-queries-for-fun-and-profit
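Based on my reading of the linked post, the rewriting step checks a range filter against the min/max values a shard actually holds, so that many different time windows collapse into the same `match_all` or `match_none` form and can be served from the cache. In this sketch everything (`SHARDS`, `rewrite_range`, `evaluate`, `count_in_range`) is hypothetical, and `functools.lru_cache` stands in for the shard-level result cache.

```python
from functools import lru_cache

# Hypothetical shards, each holding a few timestamp values (in seconds).
SHARDS = {
    "logs-day1": (100, 150, 199),
    "logs-day2": (200, 250, 299),
}
computations = []  # records every evaluation that actually ran (cache misses)


def rewrite_range(low, high, shard_min, shard_max):
    """Rewrite the filter low <= value <= high against the shard's actual min/max."""
    if low <= shard_min and shard_max <= high:
        return ("match_all",)  # every value on the shard is in range
    if high < shard_min or low > shard_max:
        return ("match_none",)  # no value on the shard can be in range
    return ("range", low, high)  # partial overlap: keep the original filter


@lru_cache(maxsize=128)
def evaluate(shard, query):
    """Stand-in for an expensive per-shard count, cached by (shard, rewritten query)."""
    computations.append((shard, query))
    values = SHARDS[shard]
    if query[0] == "match_all":
        return len(values)
    if query[0] == "match_none":
        return 0
    _, low, high = query
    return sum(1 for v in values if low <= v <= high)


def count_in_range(shard, low, high):
    """Rewrite first, then evaluate, so equivalent filters share one cache entry."""
    values = SHARDS[shard]
    query = rewrite_range(low, high, min(values), max(values))
    return evaluate(shard, query)
```

For example, two dashboard refreshes over [90, 260] and [95, 265] both rewrite to `match_all` on `logs-day1`, so that shard computes its answer once; only `logs-day2`, which straddles the window edge, has to evaluate each range separately.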

Slide 24

Slide 24 text

Elasticsearch query efficiencies (2014 to present)
• Reduce memory usage of complex filters:
  • 201?: Filter cache
  • 2015: Roaring bitmaps
Source: https://www.elastic.co/blog/frame-of-reference-and-roaring-bitmaps
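A roaring bitmap splits 32-bit document IDs into chunks of 65,536 values and picks a compact container per chunk: a sorted array when the chunk is sparse, a bitmap when it is dense, with 4,096 entries as the usual switch-over point. The class below is a simplified Python sketch of that idea, not Lucene's implementation; `ToyRoaring` and `ARRAY_LIMIT` are hypothetical names, and real implementations use specialized array/bitmap intersection routines rather than the per-value membership checks used here.

```python
from bisect import bisect_left

ARRAY_LIMIT = 4096  # above this many values, a container switches from sorted array to bitmap


class ToyRoaring:
    """Simplified roaring bitmap for non-negative 32-bit integers such as doc IDs.

    Each value is split into a high 16-bit key (which container) and a low
    16-bit part (position inside the container). Sparse containers are sorted
    Python lists; dense ones are 65536-bit bitmaps stored in a Python int.
    """

    def __init__(self, values=()):
        self.containers = {}
        for value in values:
            self.add(value)

    def add(self, value):
        high, low = value >> 16, value & 0xFFFF
        container = self.containers.get(high)
        if container is None:
            self.containers[high] = [low]
        elif isinstance(container, list):
            i = bisect_left(container, low)
            if i < len(container) and container[i] == low:
                return  # already present
            container.insert(i, low)
            if len(container) > ARRAY_LIMIT:
                # Too many values for an array container: convert it to a bitmap.
                bits = 0
                for x in container:
                    bits |= 1 << x
                self.containers[high] = bits
        else:
            self.containers[high] = container | (1 << low)

    def __contains__(self, value):
        high, low = value >> 16, value & 0xFFFF
        container = self.containers.get(high)
        if container is None:
            return False
        if isinstance(container, list):
            i = bisect_left(container, low)
            return i < len(container) and container[i] == low
        return bool((container >> low) & 1)

    def _lows(self, high):
        """List the low 16-bit parts stored in one container, in ascending order."""
        container = self.containers[high]
        if isinstance(container, list):
            return container
        return [b for b in range(1 << 16) if (container >> b) & 1]

    def to_list(self):
        return [(high << 16) | low for high in sorted(self.containers) for low in self._lows(high)]

    def intersect(self, other):
        """AND two sets; only containers whose keys appear on both sides are examined."""
        result = ToyRoaring()
        for high in sorted(self.containers.keys() & other.containers.keys()):
            for low in self._lows(high):
                value = (high << 16) | low
                if value in other:
                    result.add(value)
        return result
```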

Slide 25

Slide 25 text

Elasticsearch for search and numerical analytics
• Inverted index for full-text search
• Columnar store for structured data
• BKD trees for numerical operations
• Caches: shard-level request/result caches, filter cache, etc.

Slide 26

Slide 26 text

Elasticsearch for metrics = natural evolution

Slide 27

Slide 27 text

Elastic evolving ingest story
• Centralized collection: Logstash, e.g. from network devices
• Distributed collection: Beats on servers, containers
• Elasticsearch: transform (ingest node) and store (data node)

Slide 28

Slide 28 text

Immediate insights with modules
• Turnkey experience for specific data types
• Data to dashboard in just one step
• Automated parsing and enrichment
• Default dashboards, alerts, ML jobs
Available with: Logging, Metrics, Security

Slide 29

Slide 29 text

Logging modules
Infrastructure
• System: Linux / MacOS, Windows Events
• Containers: Docker, Kubernetes
• Audit data: Filesystem, System calls
Applications
• Databases: MySQL, PostgreSQL
• Queues: Kafka, Redis
• Web servers: Apache, Nginx
Shipped with: Winlogbeat, Filebeat, Auditbeat

Slide 30

Slide 30 text

Metrics modules
Infrastructure
• System: Linux, MacOS, Windows, Perfmon
• Cloud: AWS, GCP, Azure, DigitalOcean
• Containers: Docker, Kubernetes
• Virtualization: vSphere
• Network: Netflow, Packets, TLS Envelope
• Storage: Ceph
Shipped with: Packetbeat, Metricbeat, Logstash

Slide 31

Slide 31 text

Metrics modules
Applications
• Datastores: MySQL, PostgreSQL, MongoDB, Couchbase, Aerospike, Graphite
• Web servers: Apache, Nginx
• Other: HAProxy, Zookeeper
• Queues: Kafka, Redis, RabbitMQ
• Caches: Memcached
• Uptime: Heartbeat
• Custom apps: JMX/Jolokia, PHP-FPM, Golang
Shipped with: Packetbeat, Metricbeat, Logstash, Heartbeat

Slide 32

Slide 32 text

Visualizing time series data
• Timelion
• Time Series Visual Builder
• Annotations

Slide 35

Slide 35 text

Elasticsearch for APM = natural evolution

Slide 36

Slide 36 text

Elastic APM
APM adds end-user experience and application-level monitoring to the stack
• First open-source alternative to traditional APM tools
• Focused on areas underserved by traditional vendors
• Active roadmap to expand programming-language support

Slide 37

Slide 37 text

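Elastic APM: how it works
[Diagram: APM Agents send data to the APM Server, which stores it in Elasticsearch alongside logs, metrics, packets, ..., datastore and JMX data shipped by Beats and Logstash; Kibana visualizes it all.]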

Slide 38

Slide 38 text

Elasticsearch as APM datastore: the journey
• Opbeat migrated from a combination of Cassandra and Redis to Elasticsearch
• Much of the data that was previously pre-aggregated is now stored as raw documents in Elasticsearch
• Ad-hoc querying flexibility for the user
• New feature development agility for engineering

Slide 39

Slide 39 text

Benefits of Logs, Metrics, APM in one stack

Slide 40

Slide 40 text

Unified Dashboards
Same UI for KPI summaries and root cause analysis

Slide 41

Slide 41 text

Unified Machine Learning
Correlate multiple data sources for more intelligent anomaly detection

Slide 42

Slide 42 text

Unified Alerting
Trigger off any operational data to provide unified SLA monitoring

Slide 43

Slide 43 text

Operational gains
Single technology for operational data saves on administrative costs

Slide 44

Slide 44 text

DEMO

Slide 45

Slide 45 text

Roadmap for Operational Analytics

Slide 46

Slide 46 text

New operational data sources: it all starts with the data
• New Beats and Logstash inputs and modules
• Improved dashboards and ML jobs / alerts for existing modules
• Agentless shippers
• Distributed tracing

Slide 47

Slide 47 text

Elastic Common Schema
Benefits
• Correlate data from different sources
• Ability to reuse analysis content
• Ability to reuse Elastic-provided content
Status
• Preliminary review
• Working closely with the community
• Will provide more information via usual channels

Slide 48

Slide 48 text

Rollup support
• Caveat: lose the ability to query individual events on rolled-up data
• Recommended for long-retention use cases, such as capacity planning
• Can accomplish this today with Watcher-enabled rollups
• Built-in rollup support in active development
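The caveat above follows directly from what a rollup does: raw events are replaced by per-interval summaries. A plain-Python sketch of the idea, not the Watcher-based approach or any Elasticsearch rollup API; `rollup` and `merge` are hypothetical helpers and the events are made up.

```python
from collections import defaultdict


def rollup(events, interval_s):
    """Summarize raw (timestamp_seconds, value) events into one record per interval.

    Each summary keeps count/min/max/sum/avg; the individual events are discarded.
    """
    buckets = defaultdict(list)
    for ts, value in events:
        buckets[ts - ts % interval_s].append(value)
    return {
        start: {
            "count": len(vals),
            "min": min(vals),
            "max": max(vals),
            "sum": sum(vals),
            "avg": sum(vals) / len(vals),
        }
        for start, vals in sorted(buckets.items())
    }


def merge(summaries):
    """Combine several rollup records (e.g. hourly into daily) without the raw events."""
    count = sum(s["count"] for s in summaries)
    total = sum(s["sum"] for s in summaries)
    return {
        "count": count,
        "min": min(s["min"] for s in summaries),
        "max": max(s["max"] for s in summaries),
        "sum": total,
        "avg": total / count,  # recomputed from sum/count, not an average of averages
    }


# Hypothetical raw events: (seconds since start, response time in ms).
events = [(0, 10.0), (1800, 20.0), (3600, 30.0), (3700, 50.0), (7300, 5.0)]
hourly = rollup(events, 3600)
daily = merge(list(hourly.values()))
print(hourly, daily)
```

Keeping `sum` and `count` rather than only `avg` is what allows coarser rollups to be computed later from the summaries alone; what can no longer be recovered is any individual event.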

Slide 49

Slide 49 text

Use-case focused UIs
Benefits
• Infra monitoring UI
• Log tailing UI

Slide 51

Slide 51 text

Now what?

Slide 52

Slide 52 text

Where to learn more...
References for the curious
Storage efficiency
• https://www.elastic.co/blog/disk-based-field-data-a-k-a-doc-values
• https://www.elastic.co/blog/elasticsearch-storage-the-true-story
• https://www.elastic.co/blog/store-compression-in-lucene-and-elasticsearch
• https://www.elastic.co/blog/lucene-points-6.0
• https://www.elastic.co/blog/apache-lucene-numeric-filters
• https://www.elastic.co/blog/searching-numb3rs-in-5.0
• https://www.elastic.co/blog/sparse-versus-dense-document-values-with-apache-lucene
• https://www.elastic.co/blog/minimize-index-storage-size-elasticsearch-6-0
Query efficiency
• https://www.elastic.co/blog/instant-aggregations-rewriting-queries-for-fun-and-profit
• https://www.elastic.co/blog/frame-of-reference-and-roaring-bitmaps
Customer stories
• https://www.elastic.co/blog/elasticsearch-as-a-column-store

Slide 53

Slide 53 text

How do I get started?
Practical initial deployment and migration strategies
• Instrument newer projects built on new frameworks and technologies
• For legacy projects, start by unifying the most important KPIs and events
• During re-architecture efforts, consider consolidating datastores / tools

Slide 54

Slide 54 text

More questions? Visit us at the AMA

Slide 55

Slide 55 text

www.elastic.co