Elastic Stack News (2.x + 5.0)

Pablo Musa, May 2016, @pablitomusa

ELK Stack and Others
• Versions: 1.1, 2.2, 4.4, 2.2
• Watcher, Shield, Marvel
• Found

Elastic Stack 5.0
• Kibana: User Interface
• Elasticsearch: Store, Index, and Analyze
• Logstash, Beats: Ingest
• X-Pack: Security, Monitoring, Alerting, Graph
• Elastic Cloud

Query DSL Update and Optimization (2.0)
• Elasticsearch 2.x intelligently executes queries
• No need to write queries "the best way"
• Examples:
  ‒ For "conjunction" queries (2 or more "match" queries in a must section)
    • Sub-queries are sorted by term frequency
    • Executed lowest to highest by term frequency
  ‒ For complex queries ("match_phrase" for instance) a 2-phase execution strategy is used
    • Approximation Phase
    • Verification Phase

Query DSL Update and Optimization
• The "Query Cache" will cache filter parts if they appear enough times
  ‒ Complex queries: 2 executions in the last 256 queries
  ‒ Typical queries: 5 executions in the last 256 queries
  ‒ Simple queries: 20 executions in the last 256 queries
• No need to manage this with _cache or _cache_key any longer (deprecated features from 1.x)
• Only big segments get cached
  ‒ Segments that contain 3% of index documents or 10,000 documents
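The frequency rule on this slide can be pictured with a small Python toy. It is only a sketch of the idea, not Elasticsearch or Lucene code: the class name, the category labels, and the filter keys are all made up for illustration, and only the numbers (2, 5, 20, 256) come from the slide.

```python
from collections import deque

# Thresholds copied from the slide; the category names are the slide's wording.
MIN_USES = {"complex": 2, "typical": 5, "simple": 20}
HISTORY_SIZE = 256


class UsageTracker:
    """Toy model of frequency-based filter caching (hypothetical, not ES code)."""

    def __init__(self, history_size=HISTORY_SIZE):
        # Only the most recent `history_size` filters are remembered.
        self.history = deque(maxlen=history_size)

    def record(self, filter_key):
        self.history.append(filter_key)

    def should_cache(self, filter_key, kind):
        # Cache once the filter has been seen often enough in recent history.
        return self.history.count(filter_key) >= MIN_USES[kind]


tracker = UsageTracker()
for _ in range(5):
    tracker.record("status:published")
tracker.record("tag:lucene")
```

With this history, a "typical" filter seen five times qualifies for caching, while the same count is not enough for a "simple" one.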

Doc Values and Field Data (2.0)
• Inverted Index
  ‒ For a "value", which "docs" contain it?
• What if we need the opposite:
  ‒ For a "doc", what is a particular field's "value"?
• Why do we need this?
  ‒ Sorting
  ‒ Aggregations
  ‒ Some Scripting
• Two approaches for storing and accessing this structure...
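The two lookup directions can be made concrete with a few lines of Python. This is a minimal sketch using plain dicts and lists; the documents and the "tag" field are hypothetical, and nothing here reflects how Lucene actually lays the data out.

```python
from collections import defaultdict

# Hypothetical documents: doc id -> value of a not-analyzed "tag" field.
docs = {0: "lucene", 1: "kibana", 2: "lucene"}

# Inverted index: for a value, which docs contain it?
inverted = defaultdict(list)
for doc_id, value in docs.items():
    inverted[value].append(doc_id)

# Uninverted (column-style) view: for a doc, what is the field's value?
column = [docs[doc_id] for doc_id in sorted(docs)]

# Sorting and aggregating walk over documents, so they read the column view.
sorted_doc_ids = sorted(docs, key=lambda doc_id: column[doc_id])
counts = defaultdict(int)
for value in column:
    counts[value] += 1
```

Searching for "lucene" reads `inverted`, while sorting by tag or counting tags per value reads `column`.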

Doc Values
• Build columnar style data structure on disk
• We call these "doc values" (Lucene construct)
• Created at indexing time, stored as part of the segment
• Read like other pieces of the Lucene index
  ‒ Don't take up heap space
  ‒ Uses file system cache
• Default for not_analyzed string and numeric fields in 2.0+

Field Data
• Data structure built on the fly at query time
• Held PER SEGMENT in the JVM memory
• "15-20% faster", but comes at the cost of large heap usage (*GC)
• To intentionally enable field data on 2.0+ ("not advised"):

    "properties": {
      "tag": {
        "type": "string",
        "index": "not_analyzed",
        "doc_values": false
      }
    }

Aggregations

Significant Terms (find the “uncommonly common”)
• Terms Aggregation is about popularity.
• Significant Terms Aggregation is about significance.
• Create a foreground dataset
• See which terms are “significant” to it vs. the background dataset
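The difference between popularity and significance can be sketched in Python. This toy assumes a plain ratio of relative frequencies as the score, which is an illustrative choice and not the heuristic Elasticsearch uses. The function name, the `min_count` parameter, and the sample tag lists are all hypothetical, and the background set is assumed to contain every foreground document.

```python
from collections import Counter


def significant_terms(foreground, background, min_count=2):
    """Rank terms by how much more frequent they are in the foreground set.

    Assumption: `background` includes every foreground document, and the
    score is simply foreground rate divided by background rate.
    """
    fg_counts = Counter(term for doc in foreground for term in set(doc))
    bg_counts = Counter(term for doc in background for term in set(doc))
    scores = {}
    for term, fg_count in fg_counts.items():
        if fg_count < min_count:
            continue  # too rare in the foreground to say anything
        fg_rate = fg_count / len(foreground)
        bg_rate = bg_counts[term] / len(background)
        scores[term] = fg_rate / bg_rate
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical tag lists: the foreground is the subset matching some query.
foreground = [["python", "elasticsearch"], ["python", "elasticsearch"],
              ["python", "lucene"]]
others = [["python", "django"], ["python", "flask"], ["python", "django"],
          ["java", "spring"], ["python", "numpy"]]
background = foreground + others

most_popular = Counter(t for doc in foreground for t in doc).most_common(1)[0][0]
most_significant = significant_terms(foreground, background)[0]
```

Here "python" is the most popular foreground term because it is common everywhere, while "elasticsearch" ranks as the most significant because it is concentrated in the foreground.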

Sampler Aggregation (2.0)
• Limit the number of documents a sub-aggregation will operate on
• Reduce noise
• Get better and faster results

Pipeline Aggregations (2.0)
• After you've aggregated data, how can you aggregate the results?
• Elasticsearch 2.0 introduced "Pipeline Aggregations"
• Many types of aggregations, such as moving averages, derivatives, bucket selectors and more!

Pipeline Aggregations
• Simple to use
• Specify a type of Pipeline Aggregation
• Specify a "buckets_path"
• Optionally, use a bucket_selector to filter out buckets you don't want to pipeline aggregate

    GET stack/question/_search
    {
      "size": 0,
      "aggs": {
        "daily_comments": {
          "date_histogram": {
            "field": "creation_date",
            "interval": "hour"
          },
          "aggs": {
            "comment_counts": {
              "sum": { "field": "comment_count" }
            },
            "comments_moving_avg": {
              "moving_avg": {
                "buckets_path": "comment_counts",
                "model": "simple"
              }
            }
          }
        }
      }
    }
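The Python sketch below shows what a "simple" moving average over bucket values amounts to. It rests on stated assumptions: the window is taken to end at the current bucket and to be shorter for the first few buckets, which may differ from how Elasticsearch handles those details. The function name and the sample values are hypothetical.

```python
def simple_moving_average(values, window):
    """Unweighted mean over a trailing window of bucket values.

    Assumption: the window ends at the current bucket and is simply shorter
    for the first few buckets; Elasticsearch may treat both details differently.
    """
    averages = []
    for i in range(len(values)):
        start = max(0, i - window + 1)
        chunk = values[start:i + 1]
        averages.append(sum(chunk) / len(chunk))
    return averages


# Hypothetical per-bucket sums, standing in for the "comment_counts" metric.
comment_counts = [4, 8, 6, 10, 2]
smoothed = simple_moving_average(comment_counts, window=3)
```

The pipeline aggregation plays the same role inside the request: it reads the values that "buckets_path" points at and emits one smoothed value per bucket.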

Pipeline Aggregations
• Chart with simple model, 30m intervals and a window of 50
  ‒ Blue line: count
  ‒ Red line: moving average

Pipeline Aggregations
• Chart with linear model, 30m intervals and a window of 50
  ‒ Blue line: count
  ‒ Red line: moving average

Pipeline Aggregations
• Chart with holt-winters model, 30m intervals, a window of 100, a period of 48 and a prediction of 200
  ‒ Blue line: count
  ‒ Red line: moving average

Query Profiler (2.2)
“SQL Explain for ES”
• Attempts to time execution of query components
• Best-effort profiling
• Expensive! Verbose!

    GET /my-index/_search
    {
      "profile": true,
      "query": { "match_all": {} }
    }

https://www.elastic.co/guide/en/elasticsearch/reference/current/search-profile.html

Query Profiler

Update by Query (2.3)
• Gets a 'snapshot' of the index
• Indexes what it finds
• version++
• Version conflict if there are changes between 'snapshot' and update
• Failures abort the request
• No rollback
• "conflicts": "proceed" (count version conflicts instead of aborting)

    POST /twitter/_update_by_query
    {
      "script": { "inline": "ctx._source.likes++" },
      "query": { "term": { "user": "kimchy" } }
    }

Update by Query (2.3)
• Batches of 100 docs by default
• Change the batch size with ?scroll_size=200
• First failure aborts, but all failures returned by the failing bulk request are reported

    {
      "took": 639,
      "updated": 1235,
      "batches": 13,
      "version_conflicts": 2,
      "failures": [ ]
    }

https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-update-by-query.html
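The snapshot, version++ and "conflicts": "proceed" behaviour described on these slides can be modelled with an in-memory dict. This is a toy sketch only: the function names, the exception class, and the document layout are hypothetical and do not mirror Elasticsearch internals.

```python
class VersionConflict(Exception):
    """Raised when a document changed after the snapshot was taken."""


def take_snapshot(index, matches):
    # Remember the version of every matching document.
    return {doc_id: doc["version"] for doc_id, doc in index.items()
            if matches(doc["source"])}


def apply_updates(index, snapshot, update, conflicts="abort"):
    result = {"updated": 0, "version_conflicts": 0}
    for doc_id, seen_version in snapshot.items():
        doc = index[doc_id]
        if doc["version"] != seen_version:
            if conflicts != "proceed":
                # Earlier updates stay applied: there is no rollback.
                raise VersionConflict(doc_id)
            result["version_conflicts"] += 1
            continue
        update(doc["source"])
        doc["version"] += 1  # version++
        result["updated"] += 1
    return result


def add_like(source):
    source["likes"] += 1


# Hypothetical in-memory "index" with two documents by the same user.
index = {
    "1": {"version": 1, "source": {"user": "kimchy", "likes": 0}},
    "2": {"version": 1, "source": {"user": "kimchy", "likes": 5}},
}
snapshot = take_snapshot(index, lambda src: src["user"] == "kimchy")
index["2"]["version"] += 1  # another writer changes doc 2 after the snapshot
result = apply_updates(index, snapshot, add_like, conflicts="proceed")
```

With "proceed", the changed document is counted as a conflict and skipped; with the default, the same situation raises after document 1 has already been updated.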

Reindex API (2.3)
• Gets a 'snapshot' of the index
• Indexes to a new index
• Failures abort the request
• No rollback
• "conflicts": "proceed"
• Multiple indices and types

    POST _reindex
    {
      "source": {
        "index": "old_index",
        "query": { "match": { "user": "twitter" } }
      },
      "dest": { "index": "new_index" }
    }

Reindex API (2.3)
• Conflicts are not likely, but
  ‒ "version_type": "internal"
  ‒ "version_type": "external"
  ‒ "op_type": "create"
• "size": 100
• "sort": { "date": "desc" }
• Very flexible and powerful (scripts, refresh, wait_for_completion)

    {
      "took": 639,
      "updated": 112,
      "batches": 130,
      "version_conflicts": 0,
      "failures": [ ],
      "created": 12344
    }

https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-reindex.html

Task Management API (2.3)
• Monitoring and cancellation of running tasks

    GET /_tasks
    GET /_tasks?nodes=nodeId1,nodeId2
    GET /_tasks?nodes=nodeId1,nodeId2&actions=cluster:*
    GET /_tasks/taskId1
    GET /_tasks?parent_task_id=parentTaskId1
    GET /_tasks/taskId1?wait_for_completion=true&timeout=10s
    POST /_tasks/taskId1/_cancel

https://www.elastic.co/guide/en/elasticsearch/reference/current/tasks.html

Ingest Node (5.0)
• Adding the power of Logstash filters inside an Elasticsearch node
• Pre-process documents before the actual indexing takes place
• Enabled by default
  ‒ Disable with node.ingest: false

    PUT _ingest/pipeline/pipeline-name
    GET _ingest/pipeline/pipeline-name
    GET _ingest/pipeline/*
    DELETE _ingest/pipeline/pipeline-name

https://www.elastic.co/guide/en/elasticsearch/reference/master/ingest.html

Ingest Node (5.0)
• You define pipelines as a series of processors
• For example:
  ‒ extract mysql fields from the `message` field and then remove it from the document
• Simplifies the ingestion pipeline

    {
      "description": "mysql pipeline",
      "processors": [
        { "grok": { "field": "message", "pattern": "..." } },
        { "remove": { "field": "message" } }
      ]
    }
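The "series of processors" idea can be sketched in Python as functions applied in order to a document. This is a toy under loud assumptions: the helper names are hypothetical, a plain regular expression with named groups stands in for a real grok pattern, and the log line and pattern are invented because the slide elides the actual pattern.

```python
import re


def grok(field, pattern):
    """Processor that copies named regex groups into the document."""
    compiled = re.compile(pattern)

    def process(doc):
        match = compiled.search(doc[field])
        if match:
            doc.update(match.groupdict())
        return doc

    return process


def remove(field):
    """Processor that drops a field from the document."""
    def process(doc):
        doc.pop(field, None)
        return doc

    return process


def run_pipeline(processors, doc):
    # Processors run in order, each seeing the previous one's output.
    for process in processors:
        doc = process(doc)
    return doc


# Hypothetical log line and pattern, standing in for the elided "...".
pipeline = [
    grok("message", r"user=(?P<user>\w+) query_time=(?P<query_time>[\d.]+)"),
    remove("message"),
]
doc = run_pipeline(pipeline, {"message": "user=app query_time=0.25"})
```

Order matters in this model: swapping the two processors would remove `message` before anything could be extracted from it.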

Painless Scripting (5.0)
• Fast
• Secure
• Single function only
• Groovy-like syntax
• Dynamic & static typing

    # {"name":"JC", "goals":[9,27], "assists":[0,0]}
    GET /hockey-stats/_search
    {
      "query": {
        "function_score": {
          "script_score": {
            "script": {
              "lang": "painless",
              "inline": "int total = 0; for (int i = 0; i < input.doc.goals.size(); ++i) { total += input.doc.goals[i]; } return total;"
            }
          }
        }
      }
    }

https://www.elastic.co/guide/en/elasticsearch/reference/master/modules-scripting-painless.html

Java HTTP Client (5.0)
• Decouple server/client
• Minimize dependencies
• Similar to other clients
https://github.com/elastic/elasticsearch/issues/7743

Lucene 6 multi-dimensional points (5.0)
• Improves indexing of numeric fields
• Faster indexing
• Less memory at search time
• Chart: Index Size, Index Time, Search Time, and Search Time Heap Usage for PointField, relative to NumericField at 100% (PointField values shown: 15%, 76%, 49%, 49%)
(Michael McCandless, https://www.elastic.co/blog/lucene-points-6.0)

String Mappings (5.0)
• The string field datatype has been replaced by
  ‒ the text field for full text analyzed content
  ‒ the keyword field for not-analyzed string values

    "city": {
      "type": "text",
      "fields": {
        "raw": { "type": "keyword" }
      }
    }

    "my_number": {
      "type": "long",
      "fields": {
        "raw": { "type": "keyword" }
      }
    }

https://www.elastic.co/guide/en/elasticsearch/reference/master/breaking_50_mapping_changes.html
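The text/keyword split can be illustrated by a small Python helper that rewrites a 2.x-style field mapping. This is a simplified sketch, not a migration tool: the function name is hypothetical, only the "type" and "index" settings are considered, and any other "index" value is rejected instead of being guessed at.

```python
def upgrade_string_field(mapping):
    """Rewrite a 2.x-style "string" field mapping as text or keyword.

    Assumption: only "analyzed" (the default) and "not_analyzed" are handled;
    every other setting is passed through untouched.
    """
    if mapping.get("type") != "string":
        return dict(mapping)
    upgraded = dict(mapping)
    index_setting = upgraded.pop("index", "analyzed")
    if index_setting == "not_analyzed":
        upgraded["type"] = "keyword"
    elif index_setting == "analyzed":
        upgraded["type"] = "text"
    else:
        raise ValueError("unsupported index setting: %r" % index_setting)
    return upgraded


# Hypothetical 2.x mappings for a tag field and a free-text field.
old_tag = {"type": "string", "index": "not_analyzed"}
old_body = {"type": "string"}
new_tag = upgrade_string_field(old_tag)
new_body = upgrade_string_field(old_body)
```

A field that needs both behaviours keeps using multi-fields, as in the "city" mapping above.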

Kibana
• 4.X
  ‒ Status Page
  ‒ Shield integration
  ‒ Flexibility (filters, legend colors, dark theme)
  ‒ UI framework
• 5.0
  ‒ New design
  ‒ First-class applications
  ‒ Packs and a new plugin installer

4.X


Filters
• Edit with the full power of the Elasticsearch DSL
• Pin it, then take it with you
• Alias for commonly used filters

Custom Colors

Persistent axis labeling

return to The Dark Side

UI Framework
• Apps
• Webserver

Creating Kibana Apps

    # install
    npm install -g yo                        # install yeoman
    npm install -g generator-kibana-plugin

    # configure
    mkdir my-new-plugin
    cd my-new-plugin
    yo kibana-plugin                         # Generate an app skeleton
    npm start                                # Start the plugin development environment

    # create
    cd ../kibana
    npm start                                # start the kibana dev environment (needs Elasticsearch)

    # go to http://localhost:5601

5.0

New Design

First-class applications

Packs, and a new plugin installer

    # Want to install a third party pack? Just give it a url:
    bin/kibana-plugin install https://example.com/mypack.zip

    # Or how about one of our own
    bin/kibana-plugin install timelion

    # Want security, monitoring, reporting, and graph?
    bin/kibana-plugin install x-pack

Found -> Cloud
• Easy updates: 2 clicks from all the new features
• Kibana
• Security
• Monitoring
• Flexibility (configs, plugins, ...)
• Back up every 30 minutes
• Easy AWS integration

Cloud Enterprise (5.0)
Cloud deployment manager in your own infrastructure!

Beats
Open source platform for building lightweight data shippers
• Beats Platform: Topbeat, Filebeat, Packetbeat, {Community}beat, all on top of libbeat
• Elasticsearch, Kibana
• Logstash (optional)

libbeat
• Foundation for all Beats
• Go library
• Just worry about how to collect (parse) the data
• Do not worry
  ‒ where to ship the data
  ‒ how to connect
• Create a new beat guide
  ‒ https://www.elastic.co/guide/en/beats/libbeat/current/new-beat.html

Other Inputs and Outputs
• Outputs
  ‒ Kafka (built-in)
  ‒ Redis (built-in)
• Inputs
  ‒ Windows event logs (built-in)
  ‒ Nginx, Apache
  ‒ Redis
  ‒ MySQL
• https://www.elastic.co/guide/en/beats/libbeat/master/community-beats.html

Logstash
• Deprecating support for node protocol (only http) (2.0)
• Installing Plugins Offline (2.2)
• Config Reload (2.3)
• Next Generation (NG) pipeline (5.0)
• Metrics (5.0)
• Configuration Management (5.0)
• Persistency (5.0)

Config Reloading
Why?
• Previously, any config change made to the file required a process restart
• Feedback loop for development/testing was slow
• Processing pipeline must be long-living
How?
• File watched for changes, or SIGHUP triggers reload
• Current pipeline stopped
• Config validated
• New pipeline started, no process restart

The Next Generation Pipeline (5.0)
• Diagram: Old vs. New arrangement of the Input (I), Filter (F), and Output (O) stages

Metrics (5.0)
• Current web api resources (default port 9600):
  ‒ http://localhost:9600/_node/hot_threads
  ‒ http://localhost:9600/_node/stats/
  ‒ http://localhost:9600/_node/stats/events
  ‒ http://localhost:9600/_stats/jvm
  ‒ http://localhost:9600/_plugins/
  ‒ ...

Metrics (5.0)

Create a role, upload config
• role: frontend-logs

Event Persistency, soon!!

X-Pack
• A new product that extends the Elastic Stack with features:
  ‒ Security (Shield): Protect your data across the Elastic Stack.
  ‒ Alerting (Watcher): Get notifications about changes in your data.
  ‒ Monitoring (Marvel): Keep a pulse on the health of your stack.
  ‒ Graph: Query and visualize meaningful relationships in your data.
  ‒ Reporting: Generate, schedule, and email PDF reports.

Pioneer Program
https://www.elastic.co/blog/elastic-pioneer-program
