Elastic Stack News (2.x + 5.0)

by Pablo Musa

Slide 1

Slide 1 text

Elastic Stack News (2.x + 5.0)
Pablo Musa, May 2016, @pablitomusa

Slide 2

Slide 2 text

ELK Stack and Others
• Elasticsearch 2.2, Logstash 2.2, Kibana 4.4, Beats 1.1
• Watcher, Shield, Marvel
• Found

Slide 3

Slide 3 text

Elastic Stack 5.0
• Kibana: User Interface
• Elasticsearch: Store, Index, and Analyze
• Beats and Logstash: Ingest
• X-Pack: Security, Monitoring, Alerting, Graph
• Elastic Cloud

Slide 4

Slide 4 text

Slide 5

Slide 5 text

Query DSL Update and Optimization (2.0)
• Elasticsearch 2.x executes queries intelligently
  ‒ No need to write queries "the best way"
• Examples:
  ‒ For "conjunction" queries (2 or more "match" queries in a must section), sub-queries are sorted by term frequency (how many documents match each one) and executed from lowest to highest
  ‒ For complex queries ("match_phrase", for instance), a 2-phase execution strategy is used: an approximation phase (find documents that contain all the terms), then a verification phase (check that the terms actually appear in order)
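To make the two strategies concrete, here is a small Python sketch over a toy positional index. It is not Lucene's implementation: the index contents and the names `conjunction` and `phrase` are hypothetical. The conjunction intersects postings starting from the term that matches the fewest documents, and the phrase check first approximates (all terms present) and only then verifies positions.

```python
from typing import Dict, List

# Toy positional index (hypothetical data): term -> {doc_id: [positions]}.
INDEX: Dict[str, Dict[int, List[int]]] = {
    "quick": {1: [0], 2: [3], 3: [1], 4: [0]},
    "brown": {1: [1], 3: [0], 4: [5]},
    "fox": {1: [2], 4: [6]},
}


def conjunction(terms: List[str]) -> List[int]:
    """Doc ids containing every term, intersecting from the rarest term up."""
    # Order sub-queries by how many documents they match, lowest first.
    ordered = sorted(terms, key=lambda t: len(INDEX.get(t, {})))
    candidates = set(INDEX.get(ordered[0], {}))
    for term in ordered[1:]:
        if not candidates:  # nothing left: skip the remaining terms
            break
        candidates &= set(INDEX.get(term, {}))
    return sorted(candidates)


def phrase(terms: List[str]) -> List[int]:
    """Two-phase phrase matching: cheap approximation, then verification."""
    matches = []
    # Approximation phase: documents that contain all the terms.
    for doc in conjunction(terms):
        # Verification phase: term i must sit at position start + i.
        starts = INDEX[terms[0]][doc]
        if any(all(start + i in INDEX[term][doc] for i, term in enumerate(terms))
               for start in starts):
            matches.append(doc)
    return matches
```

Starting from the rarest term keeps the candidate set small, so the more expensive position checks run on as few documents as possible.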

Slide 6

Slide 6 text

Query DSL Update and Optimization
• The "Query Cache" caches filter parts if they appear often enough:
  ‒ Complex queries: 2 executions in the last 256 queries
  ‒ Typical queries: 5 executions in the last 256 queries
  ‒ Simple queries: 20 executions in the last 256 queries
• No need to manage this with _cache or _cache_key any longer (deprecated features from 1.x)
• Only large segments are cached
  ‒ Segments that contain 3% of the index documents or 10,000 documents
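The caching heuristic can be sketched as a usage counter over the last 256 filters. This is an illustrative Python model, not Lucene's caching policy code: the cost classes, the string keys used to identify filters, and the names `UsageTrackingPolicy` and `segment_is_cacheable` are assumptions for the example, and the segment check reads the two thresholds as "whichever is larger", i.e. both must be met.

```python
from collections import Counter, deque

HISTORY_SIZE = 256
# Minimum uses within the history window, per cost class (numbers from the slide).
MIN_FREQUENCY = {"complex": 2, "typical": 5, "simple": 20}


class UsageTrackingPolicy:
    """Tracks the last 256 filters and decides which are worth caching."""

    def __init__(self) -> None:
        self.recent: deque = deque(maxlen=HISTORY_SIZE)
        self.counts: Counter = Counter()

    def on_use(self, key: str) -> None:
        if len(self.recent) == HISTORY_SIZE:
            # The oldest entry is about to fall out of the window.
            self.counts[self.recent[0]] -= 1
        self.recent.append(key)  # deque(maxlen=...) drops the oldest itself
        self.counts[key] += 1

    def should_cache(self, key: str, cost_class: str) -> bool:
        return self.counts[key] >= MIN_FREQUENCY[cost_class]


def segment_is_cacheable(segment_docs: int, index_docs: int) -> bool:
    # Assumption: the larger of the two thresholds applies (both must be met).
    return segment_docs >= 10_000 and segment_docs >= 0.03 * index_docs
```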

Slide 7

Slide 7 text

Doc Values and Field Data (2.0)
• Inverted index
  ‒ For a "value", which "docs" contain it?
• What if we need the opposite?
  ‒ For a "doc", what is a particular field's "value"?
• Why do we need this?
  ‒ Sorting
  ‒ Aggregations
  ‒ Some scripting
• Two approaches for storing and accessing this structure...

Slide 8

Slide 8 text

Doc Values
• Build a columnar-style data structure on disk
• We call these "doc values" (a Lucene construct)
• Created at indexing time, stored as part of the segment
• Read like other pieces of the Lucene index
  ‒ Don't take up heap space
  ‒ Use the file system cache
• Default for not_analyzed string and numeric fields in 2.0+
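A toy Python model of the two structures: the inverted index maps values to documents, while doc values store one column per field, indexed by document id, so sorting and aggregating become plain column lookups. The documents, field names and functions are hypothetical, and every document is assumed to have every field so the columns stay aligned.

```python
from collections import defaultdict
from typing import Dict, List

docs = [  # hypothetical documents; doc id = position in this list
    {"tag": "blue", "price": 30},
    {"tag": "red", "price": 10},
    {"tag": "blue", "price": 20},
]

# Inverted index: value -> doc ids ("which docs contain it?").
inverted: Dict[str, List[int]] = defaultdict(list)
# Doc values: one column per field, indexed by doc id, built at index time.
doc_values: Dict[str, List] = defaultdict(list)

for doc_id, doc in enumerate(docs):
    inverted[doc["tag"]].append(doc_id)
    for field, value in doc.items():
        doc_values[field].append(value)


def sort_by(field: str) -> List[int]:
    """Sorting only needs doc id -> value, i.e. a column lookup."""
    column = doc_values[field]
    return sorted(range(len(column)), key=lambda d: column[d])


def terms_agg(field: str, doc_ids: List[int]) -> Dict[str, int]:
    """A terms aggregation over matching docs reads the same column."""
    counts: Dict[str, int] = defaultdict(int)
    for d in doc_ids:
        counts[doc_values[field][d]] += 1
    return dict(counts)
```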

Slide 9

Slide 9 text

Field Data
• Data structure built on the fly at query time
• Held per segment in JVM heap memory
• "15-20% faster", but comes at the cost of large heap usage (and garbage-collection pressure)
• To intentionally use field data instead of doc values on 2.0+ ("not advised"), disable doc values in the mapping:

    "properties": {
      "tag": {
        "type": "string",
        "index": "not_analyzed",
        "doc_values": false
      }
    }
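Field data, by contrast, is produced by uninverting the index when a query first needs it. The Python sketch below (hypothetical segment data and names, single-valued field assumed) walks a segment's postings to build the doc id to value array and keeps it in an in-memory cache keyed by segment, which is where the heap cost comes from.

```python
from typing import Dict, List, Optional

FieldData = List[Optional[str]]

# Hypothetical postings for one segment's "tag" field: term -> doc ids.
segment_postings: Dict[str, List[int]] = {"blue": [0, 2], "red": [1]}

# Field data lives on the heap, one entry per segment.
_heap_cache: Dict[str, FieldData] = {}


def uninvert(postings: Dict[str, List[int]], max_doc: int) -> FieldData:
    """Walk the inverted index and build a doc id -> value array."""
    values: FieldData = [None] * max_doc  # None: the doc has no value
    for term, doc_ids in postings.items():
        for doc_id in doc_ids:
            values[doc_id] = term  # single-valued field assumed for simplicity
    return values


def field_data_for(segment_id: str, postings: Dict[str, List[int]],
                   max_doc: int) -> FieldData:
    """Built lazily on first use at query time, then kept per segment."""
    if segment_id not in _heap_cache:
        _heap_cache[segment_id] = uninvert(postings, max_doc)
    return _heap_cache[segment_id]
```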

Slide 10

Slide 10 text

Aggregations

Slide 11

Slide 11 text

Aggregations

Slide 12

Slide 12 text

Significant Terms (find the “uncommonly common”)
• Terms Aggregation is about popularity
• Significant Terms Aggregation is about significance
  ‒ Create a foreground dataset
  ‒ See which terms are “significant” to it vs. the background dataset
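One way to score "uncommonly common" terms is the JLH heuristic, which multiplies the absolute change in a term's popularity (foreground share minus background share) by the relative change (foreground share divided by background share). The Python function below is a sketch of that formula with hypothetical argument names; it is an illustration, not Elasticsearch's code.

```python
def jlh_score(subset_freq: int, subset_size: int,
              superset_freq: int, superset_size: int) -> float:
    """JLH significance: absolute change in popularity times relative change."""
    if subset_size == 0 or superset_size == 0 or superset_freq == 0:
        return 0.0
    fg = subset_freq / subset_size      # share of foreground docs with the term
    bg = superset_freq / superset_size  # share of background docs with the term
    if fg <= bg:
        return 0.0  # not more common in the foreground: not significant
    return (fg - bg) * (fg / bg)
```

A term found in 50 of 100 foreground documents but only 100 of 10,000 background documents scores (0.5 - 0.01) x 50 = 24.5, while a term that is equally common in both sets scores 0.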

Slide 13

Slide 13 text

Sampler Aggregation (2.0)
• Limit the number of documents a sub-aggregation operates on
• Reduce noise
• Get better and faster results
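The sampling idea in plain Python: rank hits by score, keep only the top `shard_size`, and let the sub-aggregation count terms over that sample only. The hit shape and the function name are hypothetical, and this models a single shard; the real aggregation applies the limit on each shard.

```python
from collections import Counter
from typing import Dict, List


def sampled_tag_counts(hits: List[Dict], shard_size: int = 100) -> Dict[str, int]:
    """Count tags over only the best-scoring hits (one shard's view).

    hits: dicts with a relevance "score" and a list of "tags" (hypothetical shape).
    """
    # Keep only the top `shard_size` hits by score, like a sampler aggregation.
    top_hits = sorted(hits, key=lambda hit: hit["score"], reverse=True)[:shard_size]
    # The sub-aggregation (here a simple terms count) sees only that sample.
    return dict(Counter(tag for hit in top_hits for tag in hit["tags"]))
```

Low-scoring, loosely matching documents (the "spam" hit in a small test) never reach the sub-aggregation, which is how sampling reduces noise.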

Slide 14

Slide 14 text

Pipeline Aggregations (2.0)
• After you've aggregated data, how can you aggregate the results?
• Elasticsearch 2.0 introduced "Pipeline Aggregations"
• Many types of aggregations, such as moving averages, derivatives, bucket selectors and more!

Slide 15

Slide 15 text

Pipeline Aggregations
• Simple to use
  ‒ Specify a type of pipeline aggregation
  ‒ Specify a "buckets_path"
  ‒ Optionally, use a bucket_selector to filter out buckets you don't want to pipeline aggregate

    GET stack/question/_search
    {
      "size": 0,
      "aggs": {
        "daily_comments": {
          "date_histogram": { "field": "creation_date", "interval": "hour" },
          "aggs": {
            "comment_counts": { "sum": { "field": "comment_count" } },
            "comments_moving_avg": {
              "moving_avg": { "buckets_path": "comment_counts", "model": "simple" }
            }
          }
        }
      }
    }
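What the request computes can be approximated in Python: first a per-hour sum of comment_count (the sum sub-aggregation), then a simple, unweighted moving average over those bucket values. This is a sketch, not Elasticsearch's implementation; the function names are hypothetical, and the trailing window that includes the current bucket and the default window of 5 are assumptions of the example.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def hourly_sums(docs: List[Tuple[int, int]]) -> List[float]:
    """Like the sum sub-aggregation: total comment_count per hour bucket.

    docs: (hour, comment_count) pairs; empty hours are simply absent here.
    """
    sums: Dict[int, float] = defaultdict(float)
    for hour, comment_count in docs:
        sums[hour] += comment_count
    return [sums[hour] for hour in sorted(sums)]


def simple_moving_avg(values: List[float], window: int = 5) -> List[float]:
    """Unweighted mean over a trailing window ending at each bucket."""
    averages = []
    for i in range(len(values)):
        chunk = values[max(0, i - window + 1): i + 1]
        averages.append(sum(chunk) / len(chunk))
    return averages
```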

Slide 16

Slide 16 text

Pipeline Aggregations
[Chart: simple model, 30m intervals, window of 50. Blue line: count; red line: moving average]

Slide 17

Slide 17 text

Pipeline Aggregations
[Chart: linear model, 30m intervals, window of 50. Blue line: count; red line: moving average]
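The linear model differs from the simple one only in weighting: older points in the window count linearly less. A Python sketch, assuming a trailing window that ends at (and includes) the current bucket and weights 1, 2, ..., n from oldest to newest; the function name is hypothetical.

```python
from typing import List


def linear_moving_avg(values: List[float], window: int = 5) -> List[float]:
    """Weighted mean over a trailing window; weights grow linearly with recency."""
    averages = []
    for i in range(len(values)):
        chunk = values[max(0, i - window + 1): i + 1]
        weights = range(1, len(chunk) + 1)  # oldest point 1, newest len(chunk)
        weighted = sum(w * v for w, v in zip(weights, chunk))
        averages.append(weighted / sum(weights))
    return averages
```

Because recent buckets dominate, the linear average follows changes in the series faster than the simple average, at the cost of being less smooth.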

Slide 18

Slide 18 text

Pipeline Aggregations
[Chart: Holt-Winters model, 30m intervals, window of 100, period of 48 and prediction of 200 buckets. Blue line: count; red line: moving average]

Slide 19

Slide 19 text

Query Profiler (2.2)
• Attempts to time the execution of query components
• Best-effort profiling
• Expensive! Verbose!
• “SQL Explain for ES”

    GET /my-index/_search
    {
      "profile": true,
      "query": { "match_all": {} }
    }

https://www.elastic.co/guide/en/elasticsearch/reference/current/search-profile.html

Slide 20

Slide 20 text

Query Profiler

Slide 21

Slide 21 text

Slide 22

Slide 22 text

Update by Query (2.3)
• Gets a 'snapshot' of the index
• Indexes what it finds
  ‒ version++
• Version conflict if there are changes between the 'snapshot' and the update
• Failures abort the request
  ‒ No rollback
  ‒ "conflicts": "proceed" counts version conflicts instead of aborting

    POST /twitter/_update_by_query
    {
      "script": { "inline": "ctx._source.likes++" },
      "query": { "term": { "user": "kimchy" } }
    }

Slide 23

Slide 23 text

Update by Query (2.3)
• Batches of 100 docs
  ‒ Change the batch size with, for example, ?scroll_size=200
• The first failure aborts the request, but all failures returned by the failing bulk request are reported

    {
      "took": 639,
      "updated": 1235,
      "batches": 13,
      "version_conflicts": 2,
      "failures": [ ]
    }

https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-update-by-query.html
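The snapshot, version check, batching and abort behavior of update by query can be modeled in memory. The Python sketch below uses hypothetical names (`ToyIndex`, `update_by_query`, and a `concurrent_writes` hook that simulates a competing writer); it is not Elasticsearch's implementation. A version mismatch counts as a conflict and aborts unless `proceed_on_conflicts` is set, and writes already made are kept.

```python
from typing import Callable, Dict, List, Tuple


class VersionConflict(Exception):
    """Raised when a document changed after the snapshot was taken."""


class ToyIndex:
    """In-memory stand-in for an index: doc id -> (version, source)."""

    def __init__(self, docs: Dict[str, dict]) -> None:
        self.docs: Dict[str, Tuple[int, dict]] = {
            doc_id: (1, dict(src)) for doc_id, src in docs.items()}

    def write(self, doc_id: str, expected_version: int, source: dict) -> None:
        version, _ = self.docs[doc_id]
        if version != expected_version:
            raise VersionConflict(doc_id)
        self.docs[doc_id] = (version + 1, source)  # version++


def update_by_query(index: ToyIndex,
                    matches: Callable[[dict], bool],
                    script: Callable[[dict], None],
                    scroll_size: int = 100,
                    proceed_on_conflicts: bool = False,
                    concurrent_writes: Callable[[], None] = lambda: None) -> dict:
    # Snapshot: remember the version and a copy of each matching document.
    snapshot: List[Tuple[str, int, dict]] = [
        (doc_id, version, dict(src))
        for doc_id, (version, src) in index.docs.items() if matches(src)]
    concurrent_writes()  # hook: simulate other clients writing meanwhile
    result = {"updated": 0, "batches": 0, "version_conflicts": 0}
    for start in range(0, len(snapshot), scroll_size):
        result["batches"] += 1
        for doc_id, version, src in snapshot[start:start + scroll_size]:
            script(src)
            try:
                index.write(doc_id, version, src)
                result["updated"] += 1
            except VersionConflict:
                result["version_conflicts"] += 1
                if not proceed_on_conflicts:
                    return result  # abort; earlier writes are not rolled back
    return result
```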

Slide 24

Slide 24 text

Reindex API (2.3)
• Gets a 'snapshot' of the index
• Indexes to a new index
• Failures abort the request
  ‒ No rollback
  ‒ "conflicts": "proceed" counts version conflicts instead of aborting
• Multiple indices and types

    POST _reindex
    {
      "source": {
        "index": "old_index",
        "query": { "match": { "user": "twitter" } }
      },
      "dest": { "index": "new_index" }
    }

Slide 25

Slide 25 text

Reindex API (2.3)
• Conflicts are not likely, but you can control them in "dest":
  ‒ "version_type": "internal" (overwrite documents in the destination)
  ‒ "version_type": "external" (keep source versions; only create missing or update older documents)
  ‒ "op_type": "create" (only create missing documents)
• Limit and order the copy: "size": 100, "sort": { "date": "desc" }
• Very flexible and powerful (scripts, refresh, wait_for_completion)

    {
      "took": 639,
      "updated": 112,
      "batches": 130,
      "version_conflicts": 0,
      "failures": [ ],
      "created": 12344
    }

https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-reindex.html
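The three conflict-handling options can be modeled per document. In this Python sketch (hypothetical function name and return values), "internal" overwrites and bumps the destination version, "external" keeps the source version and only writes missing or older documents, and "op_type": "create" only creates missing documents.

```python
from typing import Dict, Optional, Tuple

Doc = Tuple[int, dict]  # (version, source)


def reindex_doc(dest: Dict[str, Doc], doc_id: str, src_version: int,
                source: dict, version_type: str = "internal",
                op_type: Optional[str] = None) -> str:
    """Copy one source document into dest; return what happened."""
    existing = dest.get(doc_id)
    if op_type == "create" and existing is not None:
        return "conflict"  # create only missing documents
    if version_type == "external":
        # Keep the source version; only write missing or older documents.
        if existing is not None and existing[0] >= src_version:
            return "conflict"
        dest[doc_id] = (src_version, source)
    else:
        # "internal": overwrite blindly; the destination version is bumped.
        dest[doc_id] = ((existing[0] + 1) if existing is not None else 1, source)
    return "updated" if existing is not None else "created"
```

"external" is the option that makes repeated reindex runs safe when the destination may already hold newer copies of some documents.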

Slide 26

Slide 26 text

Task Management API (2.3)
• Monitoring and cancellation of running tasks

    GET /_tasks
    GET /_tasks?nodes=nodeId1,nodeId2
    GET /_tasks?nodes=nodeId1,nodeId2&actions=cluster:*
    GET /_tasks/taskId1
    GET /_tasks?parent_task_id=parentTaskId1
    GET /_tasks/taskId1?wait_for_completion=true&timeout=10s
    POST /_tasks/taskId1/_cancel

https://www.elastic.co/guide/en/elasticsearch/reference/current/tasks.html

Slide 27

Slide 27 text

Ingest Node (5.0)
• Brings the power of Logstash filters into an Elasticsearch node
• Pre-processes documents before the actual indexing takes place
• Enabled by default
  ‒ Disable with node.ingest: false

    PUT _ingest/pipeline/pipeline-name
    GET _ingest/pipeline/pipeline-name
    GET _ingest/pipeline/*
    DELETE _ingest/pipeline/pipeline-name

https://www.elastic.co/guide/en/elasticsearch/reference/master/ingest.html

Slide 28

Slide 28 text

Ingest Node (5.0)
• You define pipelines as a series of processors
• For example:
  ‒ Extract MySQL fields from the `message` field and then remove it from the document
• Simplifies the ingestion pipeline

    {
      "description": "mysql pipeline",
      "processors": [
        { "grok": { "field": "message", "pattern": "..." } },
        { "remove": { "field": "message" } }
      ]
    }
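An ingest pipeline is an ordered list of processors applied to a document before indexing. The Python sketch below models that with two hypothetical processors; `regex_extract` uses an ordinary regular expression with named groups as a stand-in for grok, and the log format and field names are invented for the example.

```python
import re
from typing import Callable, Dict, List

Processor = Callable[[Dict[str, str]], None]


def regex_extract(field: str, pattern: str) -> Processor:
    """Stand-in for grok: a plain regex with named groups (not grok syntax)."""
    compiled = re.compile(pattern)

    def process(doc: Dict[str, str]) -> None:
        match = compiled.search(doc[field])
        if match is None:
            raise ValueError(f"pattern did not match field {field!r}")
        doc.update(match.groupdict())  # each named group becomes a field

    return process


def remove(field: str) -> Processor:
    def process(doc: Dict[str, str]) -> None:
        del doc[field]

    return process


def run_pipeline(processors: List[Processor], doc: Dict[str, str]) -> Dict[str, str]:
    """Apply processors in order, before the document would be indexed."""
    for processor in processors:
        processor(doc)
    return doc


# Hypothetical log format and field names, for illustration only.
mysql_pipeline = [
    regex_extract("message", r"user=(?P<user>\w+) query_time=(?P<query_time>[\d.]+)"),
    remove("message"),
]
```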

Slide 29

Slide 29 text

Painless Scripting (5.0)
• Fast
• Secure
• Single function only
• Groovy-like syntax
• Dynamic & static typing

    # {"name":"JC", "goals":[9,27], "assists":[0,0]}
    GET /hockey-stats/_search
    {
      "query": {
        "function_score": {
          "script_score": {
            "script": {
              "lang": "painless",
              "inline": "int total = 0; for (int i = 0; i < input.doc.goals.size(); ++i) { total += input.doc.goals[i]; } return total;"
            }
          }
        }
      }
    }

https://www.elastic.co/guide/en/elasticsearch/reference/master/modules-scripting-painless.html
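For readers more used to Python, the script's logic and the resulting ranking look roughly like this. The sketch assumes that, with no query given, every document matches with a base score of 1.0, and that function_score combines it with the script score by multiplication (assumed to be its default boost mode); the function names are hypothetical.

```python
from typing import Dict, List, Tuple


def total_goals(doc: Dict) -> int:
    """Same loop as the Painless script: add up the goals array."""
    total = 0
    for i in range(len(doc["goals"])):
        total += doc["goals"][i]
    return total


def rank(players: List[Dict]) -> List[Tuple[float, str]]:
    # Assumed: base score 1.0 per doc, multiplied by the script score,
    # so the final score equals total_goals.
    scored = [(1.0 * total_goals(p), p["name"]) for p in players]
    return sorted(scored, reverse=True)
```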

Slide 30

Slide 30 text

Java HTTP Client (5.0)
• Decouple server and client
• Minimize dependencies
• Similar to other clients
https://github.com/elastic/elasticsearch/issues/7743

Slide 31

Slide 31 text

Lucene 6 multi-dimensional points (5.0)
• Improves indexing of numeric fields
  ‒ Faster indexing
  ‒ Less memory at search time
[Chart: index size, index time, search time and search-time heap usage of PointField relative to NumericField (NumericField = 100%). Source: Michael McCandless, https://www.elastic.co/blog/lucene-points-6.0]
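Lucene 6 indexes points in a block k-d tree. The one-dimensional Python sketch below only illustrates the core idea, not Lucene's data structure: values are sorted into blocks with min/max bounds, so a range query skips blocks outside the range, takes blocks fully inside it wholesale, and compares individual values only in boundary blocks. Block size, names and data are hypothetical.

```python
from typing import List, Tuple

Point = Tuple[int, int]  # (value, doc_id)
Block = Tuple[int, int, List[Point]]  # (min value, max value, sorted points)


def build_blocks(values: List[int], block_size: int = 4) -> List[Block]:
    """Sort (value, doc_id) pairs and cut them into blocks with min/max bounds."""
    points = sorted((value, doc_id) for doc_id, value in enumerate(values))
    blocks: List[Block] = []
    for i in range(0, len(points), block_size):
        chunk = points[i:i + block_size]
        blocks.append((chunk[0][0], chunk[-1][0], chunk))
    return blocks


def range_query(blocks: List[Block], lo: int, hi: int) -> Tuple[List[int], int]:
    """Return matching doc ids and how many single values had to be compared."""
    docs: List[int] = []
    compared = 0
    for block_min, block_max, chunk in blocks:
        if block_max < lo or block_min > hi:
            continue  # block entirely outside the range: skipped
        if lo <= block_min and block_max <= hi:
            docs.extend(doc_id for _, doc_id in chunk)  # entirely inside
            continue
        for value, doc_id in chunk:  # boundary block: check each value
            compared += 1
            if lo <= value <= hi:
                docs.append(doc_id)
    return sorted(docs), compared
```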

Slide 32

Slide 32 text

String Mappings (5.0)
• The string field datatype has been replaced by:
  ‒ the text field, for full-text analyzed content
  ‒ the keyword field, for not-analyzed string values

    "city": {
      "type": "text",
      "fields": {
        "raw": { "type": "keyword" }
      }
    }

    "my_number": {
      "type": "long",
      "fields": {
        "raw": { "type": "keyword" }
      }
    }

https://www.elastic.co/guide/en/elasticsearch/reference/master/breaking_50_mapping_changes.html
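The practical difference between the two types is what ends up in the index. This Python sketch approximates analysis with a lowercase regex tokenizer (Elasticsearch's standard analyzer is more sophisticated) and shows which terms the "city" mapping would index for the main text field and for its raw keyword sub-field; the function names are hypothetical.

```python
import re
from typing import Dict, List


def analyze_text(value: str) -> List[str]:
    """Rough analyzer stand-in: split on non-word characters, lowercase."""
    return [token.lower() for token in re.findall(r"\w+", value)]


def analyze_keyword(value: str) -> List[str]:
    """keyword: the whole value becomes one unchanged term."""
    return [value]


def index_city(value: str) -> Dict[str, List[str]]:
    # Mirrors the "city" mapping: a text field plus a keyword "raw" sub-field.
    return {"city": analyze_text(value), "city.raw": analyze_keyword(value)}
```

A match query for "york" can therefore hit the text field, while sorting, aggregations or exact filtering on "New York" would use city.raw.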

Slide 33

Slide 33 text

Slide 34

Slide 34 text

Kibana
• 4.X
  ‒ Status page
  ‒ Shield integration
  ‒ Flexibility (filters, legend colors, dark theme)
  ‒ UI framework
• 5.0
  ‒ New design
  ‒ First-class applications
  ‒ Packs and a new plugin installer

Slide 35

Slide 35 text

4.X

Slide 36

Slide 36 text


Slide 37

Slide 37 text


Slide 38

Slide 38 text

Filters
• Edit with the full power of the Elasticsearch DSL
• Pin it, then take it with you
• Aliases for commonly used filters

Slide 39

Slide 39 text

Custom Colors

Slide 40

Slide 40 text

Custom Colors

Slide 41

Slide 41 text

Persistent axis labeling

Slide 42

Slide 42 text

Persistent axis labeling

Slide 43

Slide 43 text

Return to The Dark Side

Slide 44

Slide 44 text

UI Framework
• Apps
• Webserver

Slide 45

Slide 45 text

Creating Kibana Apps

    # install
    npm install -g yo                        # install Yeoman
    npm install -g generator-kibana-plugin
    # configure
    mkdir my-new-plugin
    cd my-new-plugin
    yo kibana-plugin    # generate an app skeleton
    npm start           # start the plugin development environment
    # create
    cd ../kibana
    npm start           # start the Kibana dev environment (needs Elasticsearch)
    # go to http://localhost:5601

Slide 46

Slide 46 text

5.0

Slide 47

Slide 47 text

New Design

Slide 48

Slide 48 text

First-class applications

Slide 49

Slide 49 text

Packs and a new plugin installer

    # Want to install a third-party pack? Just give it a URL:
    bin/kibana-plugin install https://example.com/mypack.zip
    # Or how about one of our own?
    bin/kibana-plugin install timelion
    # Want security, monitoring, reporting, and graph?
    bin/kibana-plugin install x-pack

Slide 50

Slide 50 text

Slide 51

Slide 51 text

Slide 52

Slide 52 text

Slide 53

Slide 53 text

Slide 54

Slide 54 text

Slide 55

Slide 55 text

Slide 56

Slide 56 text

Slide 57

Slide 57 text

Slide 58

Slide 58 text

Slide 59

Slide 59 text

Slide 60

Slide 60 text

Slide 61

Slide 61 text

Found -> Cloud
• Easy updates: 2 clicks away from all the new features
  ‒ Kibana
  ‒ Security
  ‒ Monitoring
• Flexibility (configs, plugins, ...)
• Backups every 30 minutes
• Easy AWS integration

Slide 62

Slide 62 text

Cloud Enterprise (5.0)
• Cloud deployment manager in your own infrastructure!

Slide 63

Slide 63 text

Slide 64

Slide 64 text

Beats
• Open source platform for building lightweight data shippers
• Topbeat, Filebeat, Packetbeat and {Community}beat, all built on libbeat (the Beats Platform)
• Ship to Elasticsearch, optionally through Logstash; visualize in Kibana

Slide 65

Slide 65 text

libbeat
• Foundation for all Beats
• Go library
• Just worry about how to collect (parse) the data
• Do not worry about:
  ‒ where to ship the data
  ‒ how to connect
• Guide to creating a new Beat:
  ‒ https://www.elastic.co/guide/en/beats/libbeat/current/new-beat.html

Slide 66

Slide 66 text

Other Inputs and Outputs
• Outputs
  ‒ Kafka (built-in)
  ‒ Redis (built-in)
• Inputs
  ‒ Windows event logs (built-in)
  ‒ Nginx, Apache
  ‒ Redis
  ‒ MySQL
• https://www.elastic.co/guide/en/beats/libbeat/master/community-beats.html

Slide 67

Slide 67 text

Slide 68

Slide 68 text

Logstash
• Deprecated the node protocol; HTTP only (2.0)
• Installing plugins offline (2.2)
• Config reload (2.3)
• Next Generation (NG) pipeline (5.0)
• Metrics (5.0)
• Configuration management (5.0)
• Persistence (5.0)

Slide 69

Slide 69 text

Config Reloading
• Why?
  ‒ Previously, any change made to the config file required a process restart
  ‒ Slow feedback loop for development/testing
  ‒ The processing pipeline must be long-living
• How?
  ‒ File watched for changes, or SIGHUP triggers a reload
  ‒ Current pipeline stopped
  ‒ Config validated
  ‒ New pipeline started, with no process restart
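The reload steps can be sketched as an in-process pipeline swap. In this Python model (hypothetical `Runner`, `Pipeline` and `validate` names), a changed config is validated, the current pipeline is stopped, and a new one is started without restarting the process; keeping the old pipeline running when validation fails is an assumption of the sketch. Hooking `reload` to a file watcher or a SIGHUP handler is left out.

```python
from typing import Callable


class Pipeline:
    """Stand-in for a running Logstash pipeline built from a config string."""

    def __init__(self, config: str) -> None:
        self.config = config
        self.running = True

    def stop(self) -> None:
        self.running = False


class Runner:
    """Owns the current pipeline and swaps it in-process on reload."""

    def __init__(self, config: str, validate: Callable[[str], bool]) -> None:
        self.validate = validate
        self.pipeline = Pipeline(config)

    def reload(self, new_config: str) -> bool:
        """Called when the config file changes (or on SIGHUP)."""
        if new_config == self.pipeline.config:
            return False  # nothing changed
        if not self.validate(new_config):
            return False  # assumption: a bad config leaves the old pipeline running
        self.pipeline.stop()                  # current pipeline stopped
        self.pipeline = Pipeline(new_config)  # new pipeline started, same process
        return True
```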

Slide 70

Slide 70 text

The Next Generation Pipeline (5.0)
[Diagram: old vs. new pipeline, each shown as Input (I), Filter (F) and Output (O) stages]

Slide 71

Slide 71 text

Slide 72

Slide 72 text

Metrics (5.0)
• Current web API resources (default port 9600):
  ‒ http://localhost:9600/_node/hot_threads
  ‒ http://localhost:9600/_node/stats/
  ‒ http://localhost:9600/_node/stats/events
  ‒ http://localhost:9600/_stats/jvm
  ‒ http://localhost:9600/_plugins/
  ‒ …

Slide 73

Slide 73 text

Metrics (5.0)

Slide 74

Slide 74 text

Create a role, upload config
[Screenshot: role frontend-logs]

Slide 75

Slide 75 text

Event Persistence, soon!!

Slide 76

Slide 76 text

Slide 77

Slide 77 text

X-Pack
• A new product that extends the Elastic Stack with:
  ‒ Security (Shield): protect your data across the Elastic Stack
  ‒ Alerting (Watcher): get notifications about changes in your data
  ‒ Monitoring (Marvel): keep a pulse on the health of your stack
  ‒ Graph: query and visualize meaningful relationships in your data
  ‒ Reporting: generate, schedule, and email PDF reports

Slide 78

Slide 78 text

Pioneer Program
https://www.elastic.co/blog/elastic-pioneer-program