Slide 1

Slide 1 text

Ingest Node: (re)indexing and enriching documents within Elasticsearch
@lucacavanna

Slide 2

Slide 2 text

Agenda
1. Why ingest node?
2. How does it work?
3. Where can it be used?
4. Demo!

Slide 3

Slide 3 text

Why ingest node?

Slide 4

Slide 4 text

I just want to tail a file.

Slide 5

Slide 5 text

Logstash: collect, enrich & transport
[Diagram: The file -> input -> Filters (grok, date, mutate) -> output -> Elasticsearch]

Slide 6

Slide 6 text

Logstash common setup
[Diagram: Apache access log lines, repeated across the figure; label: message]
127.0.0.1 - - [19/Apr/2016:12:00:00 +0200] "GET /robots.txt HTTP/1.1" 200 68
127.0.0.1 - - [19/Apr/2016:12:00:01 +0200] "GET /cgi-bin/try/ HTTP/1.1" 200 3395
127.0.0.1 - - [19/Apr/2016:12:00:04 +0200] "GET / HTTP/1.1" 200 24
127.0.0.1 - - [19/Apr/2016:12:00:07 +0200] "GET /not_found/ HTTP/1.1" 404 7218
127.0.0.1 - - [19/Apr/2016:12:00:09 +2000] "GET /favicon.ico HTTP/1.1" 200 3638
127.0.0.1 - - [19/Apr/2016:12:00:15 +0200] "GET / HTTP/1.1" 200 24
127.0.0.1 - - [19/Apr/2016:12:00:18 +2000] "GET /favicon.ico HTTP/1.1" 200 3638

Slide 7

Slide 7 text

Ingest node setup
[Diagram: the Apache access log sample from slide 6, repeated across the figure]

Slide 8

Slide 8 text

Filebeat: collect and ship
127.0.0.1 - - [19/Apr/2016:12:00:04 +0200] "GET / HTTP/1.1" 200 24
127.0.0.1 - - [19/Apr/2016:12:00:07 +0200] "GET /not_found/ HTTP/1.1" 404 7218
127.0.0.1 - - [19/Apr/2016:12:00:09 +2000] "GET /favicon.ico HTTP/1.1" 200 3638
{ "message" : "127.0.0.1 - - [19/Apr/2016:12:00:04 +0200] \"GET / HTTP/1.1\" 200 24" }
{ "message" : "127.0.0.1 - - [19/Apr/2016:12:00:07 +0200] \"GET /not_found/ HTTP/1.1\" 404 7218" }
{ "message" : "127.0.0.1 - - [19/Apr/2016:12:00:09 +2000] \"GET /favicon.ico HTTP/1.1\" 200 3638" }

Slide 9

Slide 9 text

Elasticsearch: enrich and index
{ "message" : "127.0.0.1 - - [19/Apr/2016:12:00:04 +0200] \"GET / HTTP/1.1\" 200 24" }
{
  "request" : "/",
  "auth" : "-",
  "ident" : "-",
  "verb" : "GET",
  "@timestamp" : "2016-04-19T10:00:04.000Z",
  "response" : "200",
  "bytes" : "24",
  "clientip" : "127.0.0.1",
  "httpversion" : "1.1",
  "rawrequest" : null,
  "timestamp" : "19/Apr/2016:12:00:04 +0200"
}

Slide 10

Slide 10 text

How does ingest node work?

Slide 11

Slide 11 text

Ingest pipeline
Pipeline: a set of processors
[Diagram: document -> grok -> date -> remove -> enriched document]
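The idea of a pipeline as an ordered set of processors can be sketched in a few lines of Python. This is only an illustration of the concept, not how Elasticsearch implements it; `run_pipeline`, `add_length` and `remove_message` are hypothetical names invented for the example.

```python
from typing import Callable, Dict, List

Document = Dict[str, object]
# Hypothetical processor type: a function that mutates the document in place.
Processor = Callable[[Document], None]


def run_pipeline(processors: List[Processor], doc: Document) -> Document:
    """Apply each processor, in order, to a copy of the document."""
    enriched = dict(doc)
    for processor in processors:
        processor(enriched)
    return enriched


def add_length(doc: Document) -> None:
    # Toy enrichment step: store the length of the message in a new field.
    doc["length"] = len(str(doc["message"]))


def remove_message(doc: Document) -> None:
    # Toy counterpart of a remove step: drop the raw message.
    doc.pop("message", None)


result = run_pipeline([add_length, remove_message], {"message": "hello"})
print(result)
```

Order matters: swapping the two toy processors would make `add_length` fail, because the field it reads would already be gone.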

Slide 12

Slide 12 text

Define a pipeline
PUT /_ingest/pipeline/apache-log
{
  "processors" : [
    {
      "grok" : {
        "field": "message",
        "pattern": "%{COMMONAPACHELOG}"
      }
    },
    {
      "date" : {
        "match_field" : "timestamp",
        "match_formats" : ["dd/MMM/YYYY:HH:mm:ss Z"]
      }
    },
    {
      "remove" : {
        "field" : "message"
      }
    }
  ]
}

Slide 13

Slide 13 text

Index a document
Provide the id of the pipeline to execute
PUT /logs/apache/1?pipeline=apache-log
{ "message" : "127.0.0.1 - - [19/Apr/2016:12:00:04 +0200] \"GET / HTTP/1.1\" 200 24" }

Slide 14

Slide 14 text

What has actually been indexed
GET /logs/apache/1
{
  "request" : "/",
  "auth" : "-",
  "ident" : "-",
  "verb" : "GET",
  "@timestamp" : "2016-04-19T10:00:04.000Z",
  "response" : "200",
  "bytes" : "24",
  "clientip" : "127.0.0.1",
  "httpversion" : "1.1",
  "rawrequest" : null,
  "timestamp" : "19/Apr/2016:12:00:04 +0200"
}
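To see where the indexed fields come from, here is a rough Python imitation of the three steps (grok, date, remove) applied to the sample document. It is a sketch under assumptions: the regular expression is a simplified stand-in for %{COMMONAPACHELOG}, not the real grok pattern, the function name `apache_log` is made up, and `strptime` expects English month names in the active locale. The `rawrequest` field from the slide is not reproduced.

```python
import re
from datetime import datetime, timezone

# Simplified stand-in for %{COMMONAPACHELOG}; the real grok pattern is more permissive.
COMMON_LOG = re.compile(
    r'(?P<clientip>\S+) (?P<ident>\S+) (?P<auth>\S+) '
    r'\[(?P<timestamp>[^\]]+)\] '
    r'"(?P<verb>\S+) (?P<request>\S+) HTTP/(?P<httpversion>[\d.]+)" '
    r'(?P<response>\d{3}) (?P<bytes>\d+|-)'
)


def apache_log(doc):
    """Imitate grok -> date -> remove on a document with a 'message' field."""
    out = dict(doc)
    # grok-like step: extract named fields (all values stay strings).
    match = COMMON_LOG.match(out["message"])
    if match is None:
        raise ValueError("message does not match the log pattern")
    out.update(match.groupdict())
    # date-like step: parse the timestamp and normalise it to UTC.
    parsed = datetime.strptime(out["timestamp"], "%d/%b/%Y:%H:%M:%S %z")
    utc = parsed.astimezone(timezone.utc)
    out["@timestamp"] = utc.strftime("%Y-%m-%dT%H:%M:%S.000Z")
    # remove-like step: drop the raw message.
    del out["message"]
    return out


indexed = apache_log(
    {"message": '127.0.0.1 - - [19/Apr/2016:12:00:04 +0200] "GET / HTTP/1.1" 200 24'}
)
print(indexed["@timestamp"])
```

The +0200 offset is why 12:00:04 in the log line becomes 10:00:04 in `@timestamp`.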

Slide 15

Slide 15 text

Pipeline management
Create, Read, Update & Delete
PUT /_ingest/pipeline/apache-log { … }
GET /_ingest/pipeline/apache-log
GET /_ingest/pipeline/*
DELETE /_ingest/pipeline/apache-log

Slide 16

Slide 16 text

Processors: grok, remove, attachment, convert, uppercase, foreach, trim, append, gsub, set, split, fail, geoip, join, lowercase, rename, date

Slide 17

Slide 17 text

Grok processor
Extracts structured fields out of a single text field
{
  "grok": {
    "field": "message",
    "pattern": "%{DATE:date}"
  }
}

Slide 18

Slide 18 text

Mutate processors
set, remove, rename, convert, gsub, split, join, lowercase, uppercase, trim, append
{
  "remove": {
    "field": "message"
  }
}

Slide 19

Slide 19 text

Date processor
Parses a date from a string
{
  "date": {
    "field": "timestamp",
    "match_formats": ["YYYY"]
  }
}

Slide 20

Slide 20 text

Geoip processor
Adds information about the geographical location of IP addresses
{
  "geoip": {
    "field": "ip"
  }
}

Slide 21

Slide 21 text

Foreach processor
Do something for every element of an array
{
  "foreach": {
    "field" : "values",
    "processors" : [
      {
        "uppercase" : {
          "field" : "_value"
        }
      }
    ]
  }
}
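What the foreach configuration on this slide does can be pictured with a small Python sketch. It only mirrors the idea (each array element is exposed as `_value` to the nested processors); the helper names `foreach` and `uppercase_value` are hypothetical and this is not Elasticsearch code.

```python
def foreach(doc, field, processors):
    """Run the nested processors once per array element, exposed as '_value'."""
    results = []
    for element in doc[field]:
        scratch = {"_value": element}  # per-element scratch document
        for processor in processors:
            processor(scratch)
        results.append(scratch["_value"])
    doc[field] = results


def uppercase_value(scratch):
    # Toy counterpart of { "uppercase": { "field": "_value" } }.
    scratch["_value"] = scratch["_value"].upper()


doc = {"values": ["foo", "bar"]}
foreach(doc, "values", [uppercase_value])
print(doc)
```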

Slide 22

Slide 22 text

Fail processor
Raises an exception with a configurable message
{
  "fail": {
    "message": "custom error"
  }
}

Slide 23

Slide 23 text

Plugins
Introducing new processors is as easy as writing a plugin
{
  "your_plugin": {
    …
  }
}

Slide 24

Slide 24 text

Error handling

Slide 25

Slide 25 text

[Diagram: grok -> date -> remove]
{ "message" : "127.0.0.1 - - [19/Apr/2016:12:00:00 +040] \"GET / HTTP/1.1\" 200 24" }

Slide 26

Slide 26 text

[Diagram: grok -> date -> remove]
{ "message" : "127.0.0.1 - - [19/Apr/2016:12:00:00 +040] \"GET / HTTP/1.1\" 200 24" }
400 Bad Request
unable to parse date [19/Apr/2016:12:00:00 +040]

Slide 27

Slide 27 text

On failure processors at the pipeline level
[Diagram: grok, date, remove; on failure: set]
{ "message" : "127.0.0.1 - - [19/Apr/2016:12:00:00 +040] \"GET / HTTP/1.1\" 200 24" }

Slide 28

Slide 28 text

On failure processors at the pipeline level
[Diagram: grok, date, remove; on failure: set]
{ "message" : "127.0.0.1 - - [19/Apr/2016:12:00:00 +040] \"GET / HTTP/1.1\" 200 24" }
200 OK

Slide 29

Slide 29 text

On failure processors at the processor level
[Diagram labels: grok, date, remove, set, remove]
{ "message" : "127.0.0.1 - - [19/Apr/2016:12:00:00 +040] \"GET / HTTP/1.1\" 200 24" }

Slide 30

Slide 30 text

On failure processors at the processor level
[Diagram labels: grok, date, remove, set, remove]
{ "message" : "127.0.0.1 - - [19/Apr/2016:12:00:00 +040] \"GET / HTTP/1.1\" 200 24" }
200 OK
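The difference between the two levels of on failure processors can be sketched in Python. This is a toy model under assumptions, not Elasticsearch code: the grok step is skipped (the timestamp is assumed to be already extracted), and the names `run_pipeline`, `with_on_failure`, `set_error` and the `error` field are invented for the example.

```python
from datetime import datetime


def date(doc):
    # Toy date processor: raises ValueError on a malformed timestamp.
    doc["@timestamp"] = datetime.strptime(
        doc["timestamp"], "%d/%b/%Y:%H:%M:%S %z"
    ).isoformat()


def remove_message(doc):
    doc.pop("message", None)


def set_error(doc):
    # Hypothetical handler: the field name "error" is an assumption.
    doc["error"] = "unable to parse date"


def run_pipeline(processors, doc, on_failure=None):
    """Pipeline-level handling: on the first failure, run the handlers and stop."""
    for processor in processors:
        try:
            processor(doc)
        except Exception:
            if on_failure is None:
                raise  # no handler: the request fails (the 400 case)
            for handler in on_failure:
                handler(doc)
            break  # remaining processors are skipped
    return doc


def with_on_failure(processor, handlers):
    """Processor-level handling: run the handlers, then let the pipeline continue."""
    def wrapped(doc):
        try:
            processor(doc)
        except Exception:
            for handler in handlers:
                handler(doc)
    return wrapped


def sample():
    # Timestamp with the malformed "+040" offset from the slides.
    return {"message": "raw line", "timestamp": "19/Apr/2016:12:00:00 +040"}


pipeline_level = run_pipeline([date, remove_message], sample(), on_failure=[set_error])
processor_level = run_pipeline(
    [with_on_failure(date, [set_error]), remove_message], sample()
)
print(pipeline_level)
print(processor_level)
```

In the pipeline-level run the raw message survives because `remove_message` never executes; in the processor-level run the failure is handled locally and the remaining processors still run.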

Slide 31

Slide 31 text

Ingest node internals

Slide 32

Slide 32 text

Default scenario
[Cluster diagram: Client and three nodes]
node1: logs 2P, logs 3R, CS
node2: logs 3P, logs 1R, CS
node3: logs 1P, logs 2R, CS
CS: Cluster State
logs index: 3 primary shards, 1 replica each
All nodes are equal:
- node.data: true
- node.master: true
- node.ingest: true

Slide 33

Slide 33 text

Default scenario
Pre-processing on the coordinating node
[Cluster diagram as on slide 32: Client, node1 to node3; index request for shard 3]
All nodes are equal:
- node.data: true
- node.master: true
- node.ingest: true

Slide 34

Slide 34 text

Default scenario
Indexing on the primary shard
[Cluster diagram as on slide 32: Client, node1 to node3; index request for shard 3]
All nodes are equal:
- node.data: true
- node.master: true
- node.ingest: true

Slide 35

Slide 35 text

Default scenario
Indexing on the replica shard
[Cluster diagram as on slide 32: Client, node1 to node3; index request for shard 3]
All nodes are equal:
- node.data: true
- node.master: true
- node.ingest: true

Slide 36

Slide 36 text

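Ingest dedicated nodes
[Cluster diagram: Client and five nodes]
node1: logs 2P, logs 3R, CS
node2: logs 3P, logs 1R, CS
node3: logs 1P, logs 2R, CS
node4: CS
node5: CS
Nodes without shards (node4, node5):
- node.data: false
- node.master: false
- node.ingest: true
Nodes holding shards (node1 to node3):
- node.data: true
- node.master: true
- node.ingest: false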

Slide 37

Slide 37 text

Ingest dedicated nodes
Forward request to an ingest node
[Cluster diagram as on slide 36: Client, node1 to node5; index request for shard 3]

Slide 38

Slide 38 text

Ingest dedicated nodes
Pre-processing on the ingest node
[Cluster diagram as on slide 36: Client, node1 to node5; index request for shard 3]

Slide 39

Slide 39 text

Ingest dedicated nodes
Indexing on the primary shard
[Cluster diagram as on slide 36: Client, node1 to node5; index request for shard 3]

Slide 40

Slide 40 text

Ingest dedicated nodes
Indexing on the replica shard
[Cluster diagram as on slide 36: Client, node1 to node5; index request for shard 3]
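The two scenarios differ only in where pre-processing happens. A toy Python model of that decision might look as follows; the function name `choose_ingest_node`, the `counter` parameter and the rotation among ingest nodes are assumptions made for illustration, since the slides do not say how the target ingest node is selected.

```python
def choose_ingest_node(coordinating_node, nodes, counter=0):
    """Return the node that should run the pipeline for a request.

    nodes maps a node name to its roles, e.g. {"node1": {"ingest": False}}.
    The rotation via `counter` is an assumed strategy, not a documented one.
    """
    if nodes[coordinating_node]["ingest"]:
        # Default scenario: the coordinating node pre-processes the document itself.
        return coordinating_node
    ingest_nodes = sorted(name for name, roles in nodes.items() if roles["ingest"])
    if not ingest_nodes:
        raise RuntimeError("no ingest node available in the cluster")
    # Dedicated scenario: forward the request to one of the ingest nodes.
    return ingest_nodes[counter % len(ingest_nodes)]


default_cluster = {name: {"ingest": True} for name in ("node1", "node2", "node3")}
dedicated_cluster = {
    "node1": {"ingest": False},
    "node2": {"ingest": False},
    "node3": {"ingest": False},
    "node4": {"ingest": True},
    "node5": {"ingest": True},
}

print(choose_ingest_node("node1", default_cluster))
print(choose_ingest_node("node1", dedicated_cluster))
```

Either way, indexing on the primary shard and then on the replica happens only after the document has been pre-processed.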

Slide 41

Slide 41 text

Where can ingest pipelines be used?

Slide 42

Slide 42 text

Index api
PUT /logs/apache/1?pipeline=apache-log
{ "message" : "…" }

Slide 43

Slide 43 text

Bulk api
PUT /logs/_bulk
{ "index": { "_type": "apache", "_id": "1", "pipeline": "apache-log" } }\n
{ "message" : "…" }\n
{ "index": { "_type": "mysql", "_id": "1", "pipeline": "mysql-log" } }\n
{ "message" : "…" }\n
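The explicit \n markers matter: a bulk body is newline-delimited JSON, with one metadata line followed by one source line per document, and a trailing newline. A minimal Python sketch of assembling such a body is shown below; `bulk_body` is a hypothetical helper and the message values are placeholders, not data from the talk.

```python
import json


def bulk_body(actions):
    """Build a newline-delimited bulk body from (metadata, source) pairs."""
    lines = []
    for metadata, source in actions:
        lines.append(json.dumps({"index": metadata}))  # action and metadata line
        lines.append(json.dumps(source))               # document source line
    return "\n".join(lines) + "\n"  # the body must end with a newline


body = bulk_body([
    ({"_type": "apache", "_id": "1", "pipeline": "apache-log"},
     {"message": "placeholder apache log line"}),
    ({"_type": "mysql", "_id": "1", "pipeline": "mysql-log"},
     {"message": "placeholder mysql log line"}),
])
print(body)
```

Because the pipeline is part of each item's metadata, a single bulk request can route different documents through different pipelines.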

Slide 44

Slide 44 text

Reindex api
Scan/scroll & bulk indexing made easy
POST /_reindex
{
  "source": {
    "index": "logs",
    "type": "apache"
  },
  "dest": {
    "index": "apache-logs",
    "pipeline" : "apache-log"
  }
}

Slide 45

Slide 45 text

Demo time!

Slide 46

Slide 46 text

Go get Elasticsearch 5.0.0-alpha1!
https://www.elastic.co/downloads/elasticsearch

Slide 47

Slide 47 text

Thank you