Elastic Stack 2.x News

Pablo Musa, March 2016, @pablitomusa

Elastic Stack
[Diagram: Kibana (User Interface); Elasticsearch (Store, Index, and Analyze); Logstash and Beats (Ingest); X-Pack (Security, Monitoring, Alerting, Graph); Elastic Cloud]

Query DSL Update and Optimization
• Elasticsearch 2.x intelligently executes queries
• No need to write queries "the best way"
• Examples:
  ‒ For "conjunction" queries (2 or more "match" queries in a must section)
    ‒ Sub-queries are sorted by term frequency
    ‒ Executed lowest to highest by term frequency
  ‒ For complex queries ("match_phrase" for instance) a 2-phase execution strategy is used
    ‒ Approximation Phase
    ‒ Verification Phase
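The two strategies above can be illustrated with a toy inverted index. This is a minimal sketch of the idea, not Lucene's implementation: the documents, `build_postings`, `conjunction`, and `match_phrase` are all hypothetical names invented for the example.

```python
from typing import Dict, List

# Hypothetical toy corpus: doc id -> text.
DOCS: Dict[int, str] = {
    1: "elastic stack news",
    2: "stack for elastic search",
    3: "kibana news",
}


def build_postings(docs: Dict[int, str]) -> Dict[str, List[int]]:
    """Build term -> sorted list of doc ids containing that term."""
    postings: Dict[str, List[int]] = {}
    for doc_id in sorted(docs):
        for term in set(docs[doc_id].split()):
            postings.setdefault(term, []).append(doc_id)
    return postings


def conjunction(terms: List[str], postings: Dict[str, List[int]]) -> List[int]:
    """Doc ids containing every term, driven by the least frequent term."""
    # Sort sub-queries by document frequency, lowest first.
    ordered = sorted(terms, key=lambda t: len(postings.get(t, [])))
    if not ordered:
        return []
    # Only the rarest term produces candidates; the rest just confirm them.
    candidates = postings.get(ordered[0], [])
    others = [set(postings.get(t, [])) for t in ordered[1:]]
    return [doc for doc in candidates if all(doc in s for s in others)]


def match_phrase(phrase: str, docs: Dict[int, str],
                 postings: Dict[str, List[int]]) -> List[int]:
    """Two-phase phrase match: cheap approximation, then exact verification."""
    terms = phrase.split()
    # Approximation phase: docs that contain all terms, in any order.
    candidates = conjunction(terms, postings)

    # Verification phase: check term order only on the surviving candidates.
    def has_phrase(tokens: List[str]) -> bool:
        n = len(terms)
        return any(tokens[i:i + n] == terms for i in range(len(tokens) - n + 1))

    return [doc for doc in candidates if has_phrase(docs[doc].split())]


POSTINGS = build_postings(DOCS)
print(conjunction(["elastic", "stack"], POSTINGS))
print(match_phrase("elastic stack", DOCS, POSTINGS))
```

Driving from the rarest term keeps the candidate list short, and the expensive position check runs only on documents that already passed the cheap test.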

Query DSL Update and Optimization
• The "Query Cache" will cache filter parts if they appear enough times
  ‒ Complex queries - 2 executions in the last 256 queries
  ‒ Typical queries - 5 executions in the last 256 queries
  ‒ Simple queries - 20 executions in the last 256 queries
• No need to manage this with _cache or _cache_key any longer - deprecated features from 1.x
• Only big segments get caches
  ‒ Segments that contain 3% of index documents or 10,000 documents
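The usage thresholds above can be sketched as a small frequency tracker over a sliding history. This is an illustrative model only: the class name, method names, and the "complex"/"typical"/"simple" labels are assumptions made for the example, and the segment-size rule is left out.

```python
from collections import deque

HISTORY_SIZE = 256  # "the last 256 queries" from the slide

# Minimum number of recent uses before a filter is cached, per the slide.
THRESHOLDS = {"complex": 2, "typical": 5, "simple": 20}


class UsageTrackingPolicy:
    """Hypothetical model of a usage-based caching decision."""

    def __init__(self, history_size: int = HISTORY_SIZE) -> None:
        # A bounded deque drops the oldest entry once it is full.
        self.history = deque(maxlen=history_size)

    def on_use(self, filter_key: str) -> None:
        """Record that a filter was executed."""
        self.history.append(filter_key)

    def should_cache(self, filter_key: str, kind: str) -> bool:
        """True once the filter was seen often enough in recent history."""
        return self.history.count(filter_key) >= THRESHOLDS[kind]


policy = UsageTrackingPolicy()
for _ in range(5):
    policy.on_use("tag:kibana")
print(policy.should_cache("tag:kibana", "typical"))
print(policy.should_cache("tag:kibana", "simple"))
```

Expensive filters earn a cache entry quickly, while cheap ones must prove they are used often, which is why manual `_cache` hints are no longer needed.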

Doc Values and Field Data
• Inverted Index
  ‒ For a "value", which "docs" contain it?
• What if we need the opposite:
  ‒ For a "doc", what is a particular field's "value"?
• Why do we need this?
  ‒ Sorting
  ‒ Aggregations
  ‒ Some Scripting
• Two approaches for storing and accessing this structure...
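The difference between the two lookup directions can be shown with plain Python structures. This is a conceptual sketch with made-up documents and a hypothetical `tag` field; it says nothing about how Lucene actually encodes either structure.

```python
from collections import Counter

# Hypothetical documents, keyed by doc id.
docs = {
    0: {"tag": "kibana"},
    1: {"tag": "beats"},
    2: {"tag": "kibana"},
}

# Inverted index: value -> doc ids. Answers "which docs contain this value?"
inverted = {}
for doc_id, doc in docs.items():
    inverted.setdefault(doc["tag"], []).append(doc_id)

# Column (doc -> value). Answers "what is this doc's value?"
column = [docs[doc_id]["tag"] for doc_id in sorted(docs)]

# Searching uses the inverted index.
hits = inverted["kibana"]

# Sorting needs the value of each doc, so it reads the column.
sorted_ids = sorted(docs, key=lambda doc_id: column[doc_id])

# A terms-style aggregation also reads the column, one lookup per doc.
counts = Counter(column[doc_id] for doc_id in docs)

print(hits)
print(sorted_ids)
print(counts.most_common())
```

Answering the sort or the count from the inverted index alone would mean scanning every term to find each document's value, which is why a second, doc-oriented structure is kept.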

Doc Values
• Build a columnar-style data structure on disk
• We call these "doc values" (Lucene construct)
• Created at indexing time, stored as part of the segment
• Read like other pieces of the Lucene index
  ‒ Don't take up heap space
  ‒ Use the file system cache
• Default for not_analyzed string and numeric fields in 2.0+

Field Data
• Data structure built on the fly at query time
• Held PER SEGMENT in the JVM memory
• "15-20% faster", but comes at the cost of large heap usage (*GC)
• To intentionally enable field data on 2.0+ ("not advised"):

    "properties" : {
      "tag": {
        "type": "string",
        "index" : "not_analyzed",
        "doc_values": false
      }
    }

Aggregations

Significant Terms (find the “uncommonly common”)
• Terms Aggregation is about popularity.
• Significant Terms Aggregation is about significance.
• Create a foreground dataset
• See which terms are “significant” to it vs. the background dataset
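The foreground-versus-background comparison can be sketched with a simple ratio of rates. This is an illustration of the concept only: the ratio used here is a stand-in and not the scoring heuristic Elasticsearch applies, and the data and function names are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Tuple


def term_rates(docs: List[List[str]]) -> Dict[str, float]:
    """Fraction of documents that contain each term."""
    counts = Counter(term for doc in docs for term in set(doc))
    return {term: count / len(docs) for term, count in counts.items()}


def significant_terms(foreground: List[List[str]],
                      background: List[List[str]],
                      top_n: int = 3) -> List[Tuple[str, float]]:
    """Rank foreground terms by how over-represented they are."""
    fg = term_rates(foreground)
    bg = term_rates(background)
    # Ratio of foreground rate to background rate (an assumed, simple score).
    scores = {t: rate / bg[t] for t, rate in fg.items() if t in bg}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:top_n]


# Background: every document. "java" is common everywhere, "gc" is rare.
background = [["java", "gc"], ["java", "gc"]] + [["java"]] * 6 + [["python"]] * 2
# Foreground: the documents matched by some query.
foreground = background[:2]

# Popularity alone cannot separate the two terms in the foreground.
print(Counter(t for doc in foreground for t in doc).most_common())
# Significance ranks "gc" first because it is rare in the background.
print(significant_terms(foreground, background))
```

Both terms appear in every foreground document, so a plain terms count ties them; only the comparison against the background shows that "gc" is unusual for this subset.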

Sampler Aggregation
• Limit the number of documents a sub-aggregation will operate on
• Reduce noise
• Get better and faster results
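A request that wraps a significant_terms aggregation in a sampler might be built as follows. This is a sketch under assumptions: the `title` and `tags` field names, the query text, the aggregation names, and the `shard_size` of 200 are placeholders, and the helper function is hypothetical.

```python
import json


def sampled_significant_terms(query_text, text_field, terms_field, shard_size=200):
    """Build a search body whose sub-aggregation sees only top-scoring docs."""
    return {
        "size": 0,
        "query": {"match": {text_field: query_text}},
        "aggs": {
            "sample": {
                # Keep only the best-matching documents on each shard.
                "sampler": {"shard_size": shard_size},
                "aggs": {
                    # The sub-aggregation runs on the sample, not on all hits.
                    "keywords": {"significant_terms": {"field": terms_field}}
                },
            }
        },
    }


body = sampled_significant_terms("kibana", "title", "tags")
print(json.dumps(body, indent=2))
```

Restricting the sub-aggregation to the highest-scoring documents removes the long tail of weak matches, which is where most of the noise comes from.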

Pipeline Aggregations
• After you've aggregated data, how can you aggregate the results?
• Elasticsearch 2.0 introduced "Pipeline Aggregations"
• Many types of aggregations, such as moving averages, derivatives, bucket selectors and more!

Pipeline Aggregations
• Simple to use
• Specify a type of Pipeline Aggregation
• Specify a "buckets_path"
• Optionally, use a bucket_selector to filter out buckets you don't want to pipeline aggregate

    GET stack/question/_search
    {
      "size": 0,
      "aggs": {
        "daily_comments": {
          "date_histogram": {
            "field": "creation_date",
            "interval": "hour"
          },
          "aggs" : {
            "comment_counts" : {
              "sum" : { "field" : "comment_count" }
            },
            "comments_moving_avg" : {
              "moving_avg": {
                "buckets_path": "comment_counts",
                "model": "simple"
              }
            }
          }
        }
      }
    }
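What a "simple" moving average does to a series of bucket values can be sketched in a few lines. This is an illustration of the arithmetic only: the function name is hypothetical, and the choice to include the current bucket in its own window is an assumption that may differ from how Elasticsearch aligns the window.

```python
from typing import List


def simple_moving_avg(values: List[float], window: int) -> List[float]:
    """Unweighted mean of the trailing `window` values at each position."""
    if window < 1:
        raise ValueError("window must be at least 1")
    averaged = []
    for i in range(len(values)):
        # Early positions use a shorter window until enough values exist.
        start = max(0, i - window + 1)
        chunk = values[start:i + 1]
        averaged.append(sum(chunk) / len(chunk))
    return averaged


# Hypothetical per-bucket sums, e.g. comment counts per interval.
comment_counts = [2, 4, 6, 8]
print(simple_moving_avg(comment_counts, window=2))
```

A larger window smooths more aggressively but reacts more slowly to real changes in the underlying counts.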

Pipeline Aggregations
• Chart with simple model, 30m intervals and a window of 50
  ‒ Blue line: count
  ‒ Red line: moving average

Pipeline Aggregations
• Chart with linear model, 30m intervals and a window of 50
  ‒ Blue line: count
  ‒ Red line: moving average
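A linearly weighted moving average gives recent buckets more influence than older ones. The sketch below assumes weights of 1, 2, ..., n from oldest to newest within the window and includes the current bucket in its own window; the function name and both choices are assumptions made for illustration.

```python
from typing import List


def linear_moving_avg(values: List[float], window: int) -> List[float]:
    """Weighted mean of the trailing window, newest value weighted highest."""
    if window < 1:
        raise ValueError("window must be at least 1")
    averaged = []
    for i in range(len(values)):
        start = max(0, i - window + 1)
        chunk = values[start:i + 1]
        # Oldest value in the window gets weight 1, newest gets len(chunk).
        weights = range(1, len(chunk) + 1)
        weighted_sum = sum(v * w for v, w in zip(chunk, weights))
        averaged.append(weighted_sum / sum(weights))
    return averaged


# A late spike pulls the linear average up faster than an unweighted mean.
series = [0, 0, 6]
print(linear_moving_avg(series, window=3)[-1])
print(sum(series) / len(series))
```

With the sample series the weighted result is 3.0 against an unweighted mean of 2.0, which is the faster reaction to recent data that a linear model trades smoothness for.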

Pipeline Aggregations
• Chart with holt-winters model, 30m intervals, a window of 100, a period of 48 and a prediction of 200
  ‒ Blue line: count
  ‒ Red line: moving average

Query Profiler
• “SQL Explain for ES”
• Attempts to time execution of query components
• Best-effort profiling
• Expensive! Verbose!
• https://www.elastic.co/guide/en/elasticsearch/reference/current/search-profile.html
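Profiling is switched on per request by adding a top-level `profile` flag to the search body. The helper below is a hypothetical convenience wrapper, and the `title` field and query text are placeholders.

```python
import json


def with_profiling(search_body: dict) -> dict:
    """Return a copy of a search body with profiling enabled."""
    profiled = dict(search_body)  # shallow copy so the original stays unchanged
    profiled["profile"] = True
    return profiled


search = {"query": {"match": {"title": "elasticsearch"}}}
profiled_search = with_profiling(search)
print(json.dumps(profiled_search, indent=2))
```

Because profiling is expensive and verbose, keeping it behind an explicit wrapper makes it harder to leave enabled by accident in production requests.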


Kibana
• Dark Theme
• Legend Colors
• Shield Integration
• Plugins
  ‒ marvel (agent/ui)
  ‒ sense
  ‒ timelion
  ‒ ...

Found -> Cloud
• Easy updates
• Backups every 30 minutes
• AWS integration

Beats
• Open source platform for building lightweight data shippers
[Diagram: Beats Platform: Topbeat, Filebeat, Packetbeat, {Community}beat, all on libbeat; Elasticsearch; Kibana; Logstash (optional)]

libbeat
• Foundation for all Beats
• Go library
• Just worry about how to collect (parse) the data
• Do not worry about
  ‒ where to ship the data
  ‒ how to connect
• Create a new beat guide
  ‒ https://www.elastic.co/guide/en/beats/libbeat/current/new-beat.html

Logstash
• Deprecating support for node protocol (HTTP only)
• Optimizations to UserAgent and GeoIP lookups
• Installing plugins offline
• Shutdown improvements
• Twitter input enhancements
• Smarter defaults and better output management
• Next Generation (NG) pipeline

X-Pack
• A new product that extends the Elastic Stack with features:
  ‒ Security (Shield) - Protect your data across the Elastic Stack.
  ‒ Alerting (Watcher) - Get notifications about changes in your data.
  ‒ Monitoring (Marvel) - Keep a pulse on the health of your stack.
  ‒ Graph - Query and visualize meaningful relationships in your data.
  ‒ Reporting - Generate, schedule, and email PDF reports.
