Slide 1

Slide 1 text

Ingest Node: (re)indexing and enriching documents within Elasticsearch
David Pilato, Developer | Evangelist, @dadoonet

Slide 3

Slide 3 text

sli.do/elastic

Slide 4

Slide 4 text

@dadoonet sli.do/elastic
Elastic Stack
• 100% open source
• No enterprise edition

Slide 5

Slide 5 text

X-Pack
• Single install
• Extensions for the Elastic Stack
• Subscription pricing
• Security, Alerting, Monitoring, Reporting, Graph, Machine Learning

Slide 6

Slide 6 text

Elastic Cloud
• Hosted Elasticsearch & Kibana
• Includes X-Pack features
• Starts at $45/mo
• Available in AWS and Google Cloud Platform

Slide 7

Slide 7 text

Elastic Cloud Enterprise
• Provision and manage multiple Elastic Stack environments
• Expose logging as a service to your entire organization

Slide 8

Slide 8 text

Why ingest node?

Slide 9

Slide 9 text

"I just want to tail a log file..." (Ops Engineer)

Slide 10

Slide 10 text

Logstash: collect, enrich & transport
The file → input → filters (grok, date, mutate) → output → Elasticsearch

Slide 11

Slide 11 text

Logstash common setup
127.0.0.1 - - [19/Apr/2016:12:00:00 +0200] "GET /robots.txt HTTP/1.1" 200 68
127.0.0.1 - - [19/Apr/2016:12:00:01 +0200] "GET /cgi-bin/try/ HTTP/1.1" 200 3395
127.0.0.1 - - [19/Apr/2016:12:00:04 +0200] "GET / HTTP/1.1" 200 24
127.0.0.1 - - [19/Apr/2016:12:00:07 +0200] "GET /not_found/ HTTP/1.1" 404 7218
127.0.0.1 - - [19/Apr/2016:12:00:09 +2000] "GET /favicon.ico HTTP/1.1" 200 3638
127.0.0.1 - - [19/Apr/2016:12:00:15 +0200] "GET / HTTP/1.1" 200 24
127.0.0.1 - - [19/Apr/2016:12:00:18 +2000] "GET /favicon.ico HTTP/1.1" 200 3638
[Diagram label: message]

Slide 12

Slide 12 text

Or …
[Diagram: the same access log lines, labelled "message"]

Slide 13

Slide 13 text

Ingest node setup
[Diagram: the same access log lines]

Slide 14

Slide 14 text

Filebeat: collect and ship
127.0.0.1 - - [19/Apr/2016:12:00:04 +0200] "GET / HTTP/1.1" 200 24
127.0.0.1 - - [19/Apr/2016:12:00:07 +0200] "GET /not_found/ HTTP/1.1" 404 7218
127.0.0.1 - - [19/Apr/2016:12:00:09 +2000] "GET /favicon.ico HTTP/1.1" 200 3638

{ "message" : "127.0.0.1 - - [19/Apr/2016:12:00:04 +0200] \"GET / HTTP/1.1\" 200 24" }
{ "message" : "127.0.0.1 - - [19/Apr/2016:12:00:07 +0200] \"GET /not_found/ HTTP/1.1\" 404 7218" }
{ "message" : "127.0.0.1 - - [19/Apr/2016:12:00:09 +2000] \"GET /favicon.ico HTTP/1.1\" 200 3638" }

Slide 15

Slide 15 text

Elasticsearch: enrich and index
{ "message" : "127.0.0.1 - - [19/Apr/2016:12:00:04 +0200] \"GET / HTTP/1.1\" 200 24" }

{
  "request" : "/",
  "auth" : "-",
  "ident" : "-",
  "verb" : "GET",
  "@timestamp" : "2016-04-19T10:00:04.000Z",
  "response" : "200",
  "bytes" : "24",
  "clientip" : "127.0.0.1",
  "httpversion" : "1.1",
  "rawrequest" : null,
  "timestamp" : "19/Apr/2016:12:00:04 +0200"
}

Slide 16

Slide 16 text

How does ingest node work?

Slide 17

Slide 17 text

Ingest pipeline
Pipeline: a set of processors
document → grok → date → remove → enriched document

Slide 18

Slide 18 text

grok, remove, attachment, convert, uppercase, foreach, trim, append, gsub, set, split, fail, geoip, join, lowercase, rename, date

Slide 19

Slide 19 text

Grok processor
Extracts structured fields out of a single text field
{
  "grok": {
    "field": "message",
    "patterns": ["%{DATE:date}"]
  }
}
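
Grok patterns behave much like named regular expressions, so the idea can be sketched outside Elasticsearch. The snippet below is only an illustration: the regex is a hand-written approximation of a common Apache log pattern (an assumption, not the pattern definition shipped with Elasticsearch), and `grok_common_log` is a hypothetical helper name.

```python
import re

# Hand-written approximation of a common Apache access log pattern.
# Assumption for illustration only; not the definition used by Elasticsearch.
COMMON_LOG = re.compile(
    r'(?P<clientip>\S+) (?P<ident>\S+) (?P<auth>\S+) '
    r'\[(?P<timestamp>[^\]]+)\] '
    r'"(?P<verb>\S+) (?P<request>\S+) HTTP/(?P<httpversion>[\d.]+)" '
    r'(?P<response>\d{3}) (?P<bytes>\d+|-)'
)


def grok_common_log(doc, field="message"):
    """Return a copy of doc with each named group added as a string field."""
    match = COMMON_LOG.match(doc[field])
    if match is None:
        raise ValueError("line does not match the pattern")
    enriched = dict(doc)
    enriched.update(match.groupdict())
    return enriched


doc = {"message": '127.0.0.1 - - [19/Apr/2016:12:00:04 +0200] "GET / HTTP/1.1" 200 24'}
fields = grok_common_log(doc)
print(fields["clientip"], fields["verb"], fields["request"], fields["response"])
```

As in the enriched document shown earlier, every extracted value is still a string at this stage; converting types is left to other processors.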

Slide 20

Slide 20 text

Mutate processors
set, remove, rename, convert, gsub, split, join, lowercase, uppercase, trim, append
{
  "remove": {
    "field": "message"
  }
}
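
To make the "set of processors" idea concrete, here is a minimal in-memory sketch that applies a few mutate-style processors to a Python dict in order. It is a toy model, not Elasticsearch code: `run_pipeline` and the `PROCESSORS` table are hypothetical names, and only a handful of processors are mimicked.

```python
def _set(doc, field, value):
    doc[field] = value


def _remove(doc, field):
    del doc[field]  # raises KeyError if the field is missing


def _rename(doc, field, target_field):
    doc[target_field] = doc.pop(field)


def _lowercase(doc, field):
    doc[field] = doc[field].lower()


def _trim(doc, field):
    doc[field] = doc[field].strip()


# Hypothetical lookup table: processor name -> Python function.
PROCESSORS = {
    "set": _set,
    "remove": _remove,
    "rename": _rename,
    "lowercase": _lowercase,
    "trim": _trim,
}


def run_pipeline(processors, doc):
    """Apply each {"name": {options}} processor in order to a copy of doc."""
    result = dict(doc)
    for processor in processors:
        (name, options), = processor.items()
        PROCESSORS[name](result, **options)
    return result


pipeline = [
    {"trim": {"field": "verb"}},
    {"lowercase": {"field": "verb"}},
    {"rename": {"field": "verb", "target_field": "method"}},
    {"set": {"field": "env", "value": "demo"}},
    {"remove": {"field": "message"}},
]
result = run_pipeline(pipeline, {"message": "raw line", "verb": " GET "})
print(result)
```

The processor definitions deliberately reuse the JSON shape from the slides, so the same list could be pasted into a pipeline body.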

Slide 21

Slide 21 text

Date processor
Parses a date from a string
{
  "date": {
    "field": "timestamp",
    "formats": ["YYYY"]
  }
}
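
The effect of date parsing on the Apache timestamp used throughout this talk can be reproduced with the Python standard library. This is a sketch of the transformation only; the format string is Python's `strptime` syntax, not the Joda-style format the date processor takes, and `parse_apache_timestamp` is a hypothetical helper.

```python
from datetime import datetime, timezone


def parse_apache_timestamp(value):
    """Parse e.g. '19/Apr/2016:12:00:04 +0200' into an ISO 8601 UTC string."""
    parsed = datetime.strptime(value, "%d/%b/%Y:%H:%M:%S %z")
    utc = parsed.astimezone(timezone.utc)
    millis = utc.microsecond // 1000
    return utc.strftime("%Y-%m-%dT%H:%M:%S.") + f"{millis:03d}Z"


iso = parse_apache_timestamp("19/Apr/2016:12:00:04 +0200")
print(iso)
```

The result corresponds to the `@timestamp` value in the enriched document shown earlier: the local time 12:00:04 at +0200 becomes 10:00:04 UTC.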

Slide 22

Slide 22 text

Geoip processor
Adds information about the geographical location of IP addresses
{
  "geoip": {
    "field": "ip"
  }
}

Slide 23

Slide 23 text

Attachment processor
You know, for documents
{
  "attachment": {
    "field" : "file"
  }
}
// Send binary content
{
  "file": "BASE64"
}

Slide 24

Slide 24 text

Plugins
Introducing new processors is as easy as writing a plugin
{
  "your_plugin": {
    ...
  }
}

Slide 25

Slide 25 text

Pipeline management
PUT /_ingest/pipeline/apache-log
{
  "processors" : [
    {
      "grok" : {
        "field": "message",
        "patterns": ["%{COMMONAPACHELOG}"]
      }
    },
    {
      "date" : {
        "field" : "timestamp",
        "formats" : ["dd/MMM/YYYY:HH:mm:ss Z"]
      }
    },
    {
      "remove" : {
        "field" : "message"
      }
    }
  ]
}

Slide 26

Slide 26 text

Where can ingest pipelines be used?

Slide 27

Slide 27 text

Index API
PUT /apache/doc/1
{
  "message" : "127.0.0.1 - - [19/Apr/2016:12:00:04 +0200] \"GET / HTTP/1.1\" 200 24"
}

Slide 28

Slide 28 text

Index API
PUT /apache/doc/1?pipeline=apache-log
{
  "message" : "127.0.0.1 - - [19/Apr/2016:12:00:04 +0200] \"GET / HTTP/1.1\" 200 24"
}

Slide 29

Slide 29 text

Bulk API
PUT /apache/doc/_bulk
{"index":{}}\n
{"message":"127.0.0.1 - - [19/Apr/2016:12:00:04 +0200] \"GET / ..."}\n
{"index":{}}\n
{"message":"127.0.0.1 - - [19/Apr/2016:12:00:07 +0200] \"GET /foo/ ..."}\n
{"index":{}}\n
{"message":"127.0.0.1 - - [19/Apr/2016:12:00:09 +2000] \"GET /f.png ..."}\n

Slide 30

Slide 30 text

Bulk API
PUT /apache/doc/_bulk?pipeline=apache-log
{"index":{}}\n
{"message":"127.0.0.1 - - [19/Apr/2016:12:00:04 +0200] \"GET / ..."}\n
{"index":{}}\n
{"message":"127.0.0.1 - - [19/Apr/2016:12:00:07 +0200] \"GET /foo/ ..."}\n
{"index":{}}\n
{"message":"127.0.0.1 - - [19/Apr/2016:12:00:09 +2000] \"GET /f.png ..."}\n

Slide 31

Slide 31 text

Bulk API
PUT /_bulk
{"index":{"_index":"apache","_type":"doc"}}\n
{"message":"127.0.0.1 - - [19/Apr/2016:12:00:04 +0200] \"GET / ..."}\n
{"index":{"_index":"mysql","_type":"doc"}}\n
{"message":"..."}\n

Slide 32

Slide 32 text

Bulk API
PUT /_bulk
{"index":{"_index":"apache","_type":"doc","pipeline":"apache-log"}}\n
{"message":"127.0.0.1 - - [19/Apr/2016:12:00:04 +0200] \"GET / ..."}\n
{"index":{"_index":"mysql","_type":"doc","pipeline":"mysql-log"}}\n
{"message":"..."}\n
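
The newline-delimited bulk body with a per-action pipeline can be assembled programmatically. The sketch below only builds the request body as a string and sends nothing; `bulk_body` is a hypothetical helper, and the `_type` value simply mirrors the slides.

```python
import json


def bulk_body(items):
    """Build a bulk body from (index, pipeline_or_None, document) tuples."""
    lines = []
    for index, pipeline, document in items:
        action = {"_index": index, "_type": "doc"}
        if pipeline is not None:
            action["pipeline"] = pipeline  # per-action pipeline, as on the slide
        lines.append(json.dumps({"index": action}))
        lines.append(json.dumps(document))
    # Every line, including the last one, must end with a newline.
    return "\n".join(lines) + "\n"


body = bulk_body([
    ("apache", "apache-log", {"message": "127.0.0.1 - - [19/Apr/2016:12:00:04 +0200] ..."}),
    ("mysql", "mysql-log", {"message": "..."}),
    ("raw", None, {"message": "not pre-processed"}),
])
print(body)
```

Passing `None` as the pipeline leaves the action line without a `pipeline` key, which corresponds to the plain bulk request shown before the pipeline variant.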

Slide 33

Slide 33 text

Reindex API
POST /_reindex
{
  "source": {
    "index": "logs",
    "type": "apache"
  },
  "dest": {
    "index": "apache-logs",
    "type": "doc"
  }
}

Slide 34

Slide 34 text

Reindex API
POST /_reindex
{
  "source": {
    "index": "logs",
    "type": "apache"
  },
  "dest": {
    "index": "apache-logs",
    "type": "doc",
    "pipeline" : "apache-log"
  }
}

Slide 35

Slide 35 text

Error handling

Slide 36

Slide 36 text

grok → date → remove
{ "message" : "127.0.0.1 - - [19/Apr/2016:12:00:00 +040] \"GET / HTTP/1.1\" 200 24" }

Slide 37

Slide 37 text

grok → date → remove
400 Bad Request: unable to parse date [19/Apr/2016:12:00:00 +040]
{ "message" : "127.0.0.1 - - [19/Apr/2016:12:00:00 +040] \"GET / HTTP/1.1\" 200 24" }

Slide 38

Slide 38 text

grok → date → remove
on failure processors at the pipeline level: set
{ "message" : "127.0.0.1 - - [19/Apr/2016:12:00:00 +040] \"GET / HTTP/1.1\" 200 24" }

Slide 39

Slide 39 text

grok → date → remove
on failure processors at the pipeline level: set
200 OK
{ "message" : "127.0.0.1 - - [19/Apr/2016:12:00:00 +040] \"GET / HTTP/1.1\" 200 24" }

Slide 40

Slide 40 text

grok → date → remove
on failure processors at the processor level: set, remove
{ "message" : "127.0.0.1 - - [19/Apr/2016:12:00:00 +040] \"GET / HTTP/1.1\" 200 24" }

Slide 41

Slide 41 text

grok → date → remove
on failure processors at the processor level: set, remove
200 OK
{ "message" : "127.0.0.1 - - [19/Apr/2016:12:00:00 +040] \"GET / HTTP/1.1\" 200 24" }
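
The difference between the two failure-handling levels can be modelled in a few lines of Python. This is a simplified sketch of the behaviour described on these slides, not Elasticsearch's implementation: `run_pipeline`, `date`, `remove_message` and `set_error` are hypothetical names, and the malformed `+040` offset is rejected here by Python's `strptime`.

```python
from datetime import datetime


def date(doc):
    """Parse doc['timestamp']; raises ValueError on a malformed value."""
    parsed = datetime.strptime(doc["timestamp"], "%d/%b/%Y:%H:%M:%S %z")
    doc["@timestamp"] = parsed.isoformat()


def remove_message(doc):
    del doc["message"]


def set_error(doc, error):
    doc["error"] = f"date failed: {error}"


def run_pipeline(doc, processors, on_failure=None):
    """processors is a list of (processor, processor_level_handlers_or_None)."""
    for process, processor_on_failure in processors:
        try:
            process(doc)
        except Exception as error:
            if processor_on_failure is not None:
                for handler in processor_on_failure:
                    handler(doc, error)
                continue  # processor level: carry on with the next processor
            if on_failure is not None:
                for handler in on_failure:
                    handler(doc, error)
                return doc  # pipeline level: remaining processors are skipped
            raise  # no handler: the request fails (the 400 Bad Request case)
    return doc


def bad_doc():
    return {
        "message": '127.0.0.1 - - [19/Apr/2016:12:00:00 +040] "GET / HTTP/1.1" 200 24',
        "timestamp": "19/Apr/2016:12:00:00 +040",
    }


pipeline_level = run_pipeline(
    bad_doc(), [(date, None), (remove_message, None)], on_failure=[set_error]
)
processor_level = run_pipeline(
    bad_doc(), [(date, [set_error]), (remove_message, None)]
)
print(sorted(pipeline_level))
print(sorted(processor_level))
```

In this model the pipeline-level variant keeps `message` because `remove` never runs, while the processor-level variant records the error and still removes `message`.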

Slide 42

Slide 42 text

Ingest node internals

Slide 43

Slide 43 text

Default scenario
[Diagram: Client and cluster of node1 (logs 2P, logs 3R, CS), node2 (logs 3P, logs 1R, CS), node3 (logs 1P, logs 2R, CS)]
CS: Cluster State
logs index: 3 primary shards, 1 replica each
All nodes are equal:
- node.data: true
- node.master: true
- node.ingest: true

Slide 44

Slide 44 text

Default scenario
index request for shard 3: pre-processing on the coordinating node
[Diagram: Client and cluster of node1 (logs 2P, logs 3R, CS), node2 (logs 3P, logs 1R, CS), node3 (logs 1P, logs 2R, CS)]
All nodes are equal:
- node.data: true
- node.master: true
- node.ingest: true

Slide 45

Slide 45 text

Default scenario
index request for shard 3: indexing on the primary shard
[Diagram: Client and cluster of node1 (logs 2P, logs 3R, CS), node2 (logs 3P, logs 1R, CS), node3 (logs 1P, logs 2R, CS)]
All nodes are equal:
- node.data: true
- node.master: true
- node.ingest: true

Slide 46

Slide 46 text

Default scenario
index request for shard 3: indexing on the replica shard
[Diagram: Client and cluster of node1 (logs 2P, logs 3R, CS), node2 (logs 3P, logs 1R, CS), node3 (logs 1P, logs 2R, CS)]
All nodes are equal:
- node.data: true
- node.master: true
- node.ingest: true

Slide 47

Slide 47 text

Ingest dedicated nodes
[Diagram: Client and cluster of node1 (logs 2P, logs 3R, CS), node2 (logs 3P, logs 1R, CS), node3 (logs 1P, logs 2R, CS), node4 (CS), node5 (CS)]
- node1, node2, node3: node.data: true, node.master: true, node.ingest: false
- node4, node5: node.data: false, node.master: false, node.ingest: true

Slide 48

Slide 48 text

Ingest dedicated nodes
index request for shard 3: forward request to an ingest node
[Diagram: Client and cluster of node1 (logs 2P, logs 3R, CS), node2 (logs 3P, logs 1R, CS), node3 (logs 1P, logs 2R, CS), node4 (CS), node5 (CS)]
- node1, node2, node3: node.data: true, node.master: true, node.ingest: false
- node4, node5: node.data: false, node.master: false, node.ingest: true

Slide 49

Slide 49 text

Ingest dedicated nodes
index request for shard 3: pre-processing on the ingest node
[Diagram: Client and cluster of node1 (logs 2P, logs 3R, CS), node2 (logs 3P, logs 1R, CS), node3 (logs 1P, logs 2R, CS), node4 (CS), node5 (CS)]
- node1, node2, node3: node.data: true, node.master: true, node.ingest: false
- node4, node5: node.data: false, node.master: false, node.ingest: true

Slide 50

Slide 50 text

Ingest dedicated nodes
index request for shard 3: indexing on the primary shard
[Diagram: Client and cluster of node1 (logs 2P, logs 3R, CS), node2 (logs 3P, logs 1R, CS), node3 (logs 1P, logs 2R, CS), node4 (CS), node5 (CS)]
- node1, node2, node3: node.data: true, node.master: true, node.ingest: false
- node4, node5: node.data: false, node.master: false, node.ingest: true

Slide 51

Slide 51 text

Ingest dedicated nodes
index request for shard 3: indexing on the replica shard
[Diagram: Client and cluster of node1 (logs 2P, logs 3R, CS), node2 (logs 3P, logs 1R, CS), node3 (logs 1P, logs 2R, CS), node4 (CS), node5 (CS)]
- node1, node2, node3: node.data: true, node.master: true, node.ingest: false
- node4, node5: node.data: false, node.master: false, node.ingest: true
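
The two scenarios differ only in where pre-processing happens, which a small model can make explicit. This is a sketch under stated assumptions: `Node` and `preprocessing_node` are hypothetical names, and picking the first ingest-capable node is an arbitrary choice for the example, not a description of how Elasticsearch selects one.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    name: str
    data: bool
    master: bool
    ingest: bool


def preprocessing_node(coordinating, nodes):
    """Return the node that would run the ingest pipeline for a request."""
    if coordinating.ingest:
        return coordinating  # default scenario: pre-process where the request lands
    ingest_nodes = [node for node in nodes if node.ingest]
    if not ingest_nodes:
        raise RuntimeError("no ingest node in the cluster")
    return ingest_nodes[0]  # assumption: any ingest node would do


# Default scenario: all nodes are equal.
default_cluster = [Node(f"node{i}", True, True, True) for i in (1, 2, 3)]

# Dedicated scenario: node1-3 hold data, node4-5 only run ingest pipelines.
dedicated_cluster = [Node(f"node{i}", True, True, False) for i in (1, 2, 3)] + [
    Node(f"node{i}", False, False, True) for i in (4, 5)
]

print(preprocessing_node(default_cluster[0], default_cluster).name)
print(preprocessing_node(dedicated_cluster[0], dedicated_cluster).name)
```

After pre-processing, the enriched document follows the usual path in both scenarios: indexing on the primary shard, then on the replica shard.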

Slide 52

Slide 52 text

Demo time!
52.35.38.35 - - [19/Apr/2016:12:00:04 +0200] "GET / HTTP/1.1" 200 24

Slide 53

Slide 53 text

bano-ingest plugin
• From postal address to geo_point
• From geo_point to postal address

Slide 54

Slide 54 text

What is BANO?
• French Open Data base for postal addresses
• http://openstreetmap.fr/bano
• http://bano.openstreetmap.fr/data/ (per region, or all addresses)

Slide 55

Slide 55 text

BANO Format
976030950H-26,26,RUE DISMA,97660,Bandrélé,CAD,-12.891701,45.202652
976030950H-28,28,RUE DISMA,97660,Bandrélé,CAD,-12.891900,45.202700
976030950H-30,30,RUE DISMA,97660,Bandrélé,CAD,-12.891781,45.202535
976030950H-32,32,RUE DISMA,97660,Bandrélé,CAD,-12.892005,45.202564
976030950H-3,3,RUE DISMA,97660,Bandrélé,CAD,-12.892444,45.202135
976030950H-34,34,RUE DISMA,97660,Bandrélé,CAD,-12.892068,45.202450
976030950H-4,4,RUE DISMA,97660,Bandrélé,CAD,-12.892446,45.202367
976030950H-5,5,RUE DISMA,97660,Bandrélé,CAD,-12.892461,45.202248
976030950H-6,6,RUE DISMA,97660,Bandrélé,CAD,-12.892383,45.202456
976030950H-8,8,RUE DISMA,97660,Bandrélé,CAD,-12.892300,45.202555
976030950H-9,9,RUE DISMA,97660,Bandrélé,CAD,-12.892355,45.202387
976030951J-103,103,RTE NATIONALE 3,97660,Bandrélé,CAD,-12.893639,45.201696
Columns: ID, Street Number, Street Name, Zipcode, City Name, Source, Geo point (latitude, longitude)

Slide 56

Slide 56 text

Import bano dataset

Slide 57

Slide 57 text

Load CSV with Logstash (Extract: input)
input {
  stdin { }
}

976030951J-103,103,RTE NATIONALE 3,97660,Bandrélé,CAD,-12.893639,45.201696
Columns: ID, Street Number, Street Name, Zipcode, City Name, Source, Geo point

{
  "message":"976030951J-103,103,RTE NATIONALE 3,97660,Bandrélé,CAD,-12.893639,45.201696",
  "@timestamp":"2017-12-05T16:00:00.000PST",
  "@version":1,
  "host":"MacBook-Pro-David.local"
}

Slide 58

Slide 58 text

Load CSV with Logstash (Transform: filter)
filter {
  csv {
    separator => ","
    columns => [
      "id","number","street_name","zipcode","city","source","latitude","longitude"
    ]
    remove_field => [ "message", "@version", "@timestamp", "host" ]
  }
}

{
  "message":"976030951J-103,103,RTE NATIONALE 3,97660,Bandrélé,CAD,-12.893639,45.201696",
  "@timestamp":"2017-12-05T16:00:00.000PST",
  "@version":1,
  "host":"MacBook-Pro-David.local"
}

{
  "source":"CAD",
  "id":"976030951J-103",
  "number":"103",
  "street_name":"RTE NATIONALE 3",
  "zipcode":"97660",
  "city":"Bandrélé",
  "latitude":"-12.893639",
  "longitude":"45.201696"
}

Slide 59

Slide 59 text

Load CSV with Logstash (Transform: filter)
filter {
  mutate {
    convert => { "longitude" => "float" }
    convert => { "latitude" => "float" }
    rename => {
      "longitude" => "[location][lon]"
      "latitude" => "[location][lat]"
      "number" => "[address][number]"
      "street_name" => "[address][street_name]"
      "zipcode" => "[address][zipcode]"
      "city" => "[address][city]"
    }
    replace => {
      "region" => "${REGION}"
    }
  }
}

{
  "source":"CAD","id":"976030951J-103",
  "number":"103",
  "street_name":"RTE NATIONALE 3",
  "zipcode":"97660","city":"Bandrélé",
  "latitude":"-12.893639",
  "longitude":"45.201696"
}

{
  "source":"CAD","id":"976030951J-103",
  "region":"976",
  "address":{
    "number":"103",
    "street_name":"RTE NATIONALE 3",
    "zipcode":"97660",
    "city":"Bandrélé"
  },
  "location":{ "lat":-12.893639,"lon":45.201696 }
}
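
For readers who want to check the shape of the resulting document without running Logstash, the csv and mutate steps can be mirrored in a few lines of Python. This is a sketch that reuses the column names from the csv filter; `bano_to_doc` is a hypothetical helper and the region is passed in explicitly rather than read from the `REGION` environment variable.

```python
import csv
import io

COLUMNS = ["id", "number", "street_name", "zipcode", "city",
           "source", "latitude", "longitude"]


def bano_to_doc(line, region):
    """Turn one BANO CSV line into the nested document shown on the slide."""
    row = next(csv.reader(io.StringIO(line)))
    record = dict(zip(COLUMNS, row))
    return {
        "source": record["source"],
        "id": record["id"],
        "region": region,
        "address": {
            "number": record["number"],
            "street_name": record["street_name"],
            "zipcode": record["zipcode"],
            "city": record["city"],
        },
        # Coordinates are converted to floats, like the mutate/convert step.
        "location": {
            "lat": float(record["latitude"]),
            "lon": float(record["longitude"]),
        },
    }


line = "976030951J-103,103,RTE NATIONALE 3,97660,Bandrélé,CAD,-12.893639,45.201696"
doc = bano_to_doc(line, region="976")
print(doc)
```

Street number and zipcode stay strings on purpose, matching the keyword mapping used for them in the index template.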

Slide 60

Slide 60 text

Load CSV with Logstash (Load: output)
output {
  elasticsearch {
    "template_name" => "bano"
    "template_overwrite" => true
    "template" => "${SOURCE_DIR}/src/main/logstash/bano.json"
    "index" => ".bano-${REGION}"
    "document_id" => "%{[id]}"
  }
}

{
  "source":"CAD","id":"976030951J-103",
  "region":"976",
  "address":{
    "number":"103","street_name":"RTE NATIONALE 3",
    "zipcode":"97660","city":"Bandrélé"
  },
  "location":{ "lat":-12.893639,"lon":45.201696 }
}

Slide 61

Slide 61 text

Index template (index settings)
{
  "template": ".bano-*",
  "settings": {
    "index.number_of_shards": 1,
    "index.number_of_replicas": 0,
    "index.analysis": {
      "analyzer": {
        "bano_analyzer": {
          "type": "custom",
          "tokenizer": "standard",
          "filter" : [ "lowercase", "asciifolding" ]
        },
        "bano_street_analyzer": {
          "type": "custom",
          "tokenizer": "standard",
          "filter" : [ "lowercase", "asciifolding", "bano_synonym" ]
        }
      },
      "filter": {
        "bano_synonym": {
          "type": "synonym",
          "synonyms" : [ "bd => boulevard", "av => avenue", "r => rue", "rte => route" ]
        }
      }
    }
  },
  // ...
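
To get a feel for what the `bano_street_analyzer` chain produces, its three filters can be approximated in Python. This is a rough stand-in, not the Lucene analysis chain: word splitting uses a simple regex instead of the standard tokenizer, accent folding uses Unicode decomposition, and `bano_street_analyze` is a hypothetical function name.

```python
import re
import unicodedata

# Same replacements as the bano_synonym filter in the template.
SYNONYMS = {"bd": "boulevard", "av": "avenue", "r": "rue", "rte": "route"}


def ascii_fold(token):
    """Strip accents by decomposing characters and dropping combining marks."""
    decomposed = unicodedata.normalize("NFKD", token)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def bano_street_analyze(text):
    """Approximate lowercase + asciifolding + synonym on word tokens."""
    tokens = re.findall(r"\w+", text)  # rough stand-in for the standard tokenizer
    folded = [ascii_fold(token.lower()) for token in tokens]
    return [SYNONYMS.get(token, token) for token in folded]


print(bano_street_analyze("RTE NATIONALE 3"))
print(bano_street_analyze("Bd Général Leclerc"))
```

Because the filters also run at search time by default, a query for "route nationale" and one for "rte nationale" would be expected to produce the same tokens in this model.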

Slide 62

Slide 62 text

Index template (mapping)
{
  "template": ".bano-*",
  "settings": { ... },
  "mappings": {
    "doc": {
      "properties" : {
        "address": {
          "properties" : {
            "city": {
              "type": "text",
              "analyzer": "bano_analyzer",
              "fields": {
                "keyword": { "type": "keyword" }
              }
            },
            "number": { "type": "keyword" },
            "street_name": {
              "type": "text",
              "analyzer": "bano_street_analyzer"
            },
            "zipcode": { "type": "keyword" }
          }
        },
        "region": { "type": "keyword" },
        "location": { "type": "geo_point" },
        "id": { "type": "keyword" },
        "source": { "type": "keyword" }
      }
    }
  },
  // ...

Slide 63

Slide 63 text

Index template (aliases)
{
  "template": ".bano-*",
  "settings": { ... },
  "mappings": { ... },
  "aliases" : {
    ".bano" : {}
  }
}
[Diagram: the .bano alias points to the indices .bano-17, .bano-95 and .bano-75, each holding documents d1 to d6]

Slide 64

Slide 64 text

Launch Logstash
export SOURCE_DIR=~/Documents/ingest-bano/
DATASOURCE_DIR=~/Documents/ingest/bano-data
LOGSTASH=~/Documents/ingest/stack-6.0.0/logstash-6.0.0

import_region () {
  export REGION=$1
  FILE=$DATASOURCE_DIR/bano-$REGION.csv
  curl -XDELETE localhost:9200/.bano-$REGION?pretty
  cat $FILE | $LOGSTASH/bin/logstash -f $SOURCE_DIR/src/main/logstash/import.conf
}

DEPTS=95
for i in {01..19} $(seq 21 $DEPTS) {971..974} {976..976} ; do
  DEPT=$(printf %02d $i)
  import_region $DEPT
done

Slide 65

Slide 65 text

Writing an ingest plugin

Slide 66

Slide 66 text

Use bano processor

Slide 67

Slide 67 text

Ingest Node: (re)indexing and enriching documents within Elasticsearch
David Pilato, Developer | Evangelist, @dadoonet
Watch this space: https://github.com/dadoonet
And follow me on Twitter!