Confoo Montreal: Ingest node: enriching documents within Elasticsearch

Wanna transform your documents on the fly before indexing them into Elasticsearch? Ingest node is built for you.

The talk will also cover the reindex API, which can be combined with ingest pipelines to modify data while reindexing.

Last but not least, I'll tell you how to write your own Ingest processor in Java as a plugin! Our own processor will convert postal addresses from/to geo points.

Elastic Co

March 07, 2018

Transcript

  1. Ingest Node
    (re)indexing and enriching documents within Elasticsearch
    David Pilato
    Developer | Evangelist, @dadoonet

  3. sli.do/elastic

  4. @dadoonet sli.do/elastic
    Elastic Stack
    100% open source
    No enterprise edition

  5. X-Pack
    Single install
    Extensions for the Elastic Stack
    Subscription pricing
    Security
    Alerting
    Monitoring
    Reporting
    Graph
    Machine Learning

  6. Elastic Cloud
    Hosted Elasticsearch & Kibana
    Includes X-Pack features
    Starts at $45/mo
    Available in AWS and Google Cloud Platform

  7. Elastic Cloud Enterprise
    Provision and manage multiple Elastic Stack environments;
    expose logging as a service to your entire organization

  8. Why ingest node?

  9. "I just want to tail a log file..."
    Ops Engineer

  10. Logstash: collect, enrich & transport
    [Diagram: the file -> input -> filters (grok, date, mutate) -> output -> Elasticsearch]

  11. Logstash common setup
    127.0.0.1 - - [19/Apr/2016:12:00:00 +0200] "GET /robots.txt HTTP/1.1" 200 68
    127.0.0.1 - - [19/Apr/2016:12:00:01 +0200] "GET /cgi-bin/try/ HTTP/1.1" 200 3395
    127.0.0.1 - - [19/Apr/2016:12:00:04 +0200] "GET / HTTP/1.1" 200 24
    127.0.0.1 - - [19/Apr/2016:12:00:07 +0200] "GET /not_found/ HTTP/1.1" 404 7218
    127.0.0.1 - - [19/Apr/2016:12:00:09 +2000] "GET /favicon.ico HTTP/1.1" 200 3638
    127.0.0.1 - - [19/Apr/2016:12:00:15 +0200] "GET / HTTP/1.1" 200 24
    127.0.0.1 - - [19/Apr/2016:12:00:18 +2000] "GET /favicon.ico HTTP/1.1" 200 3638
    message

  12. Or …
    [Same log lines as slide 11]
    message

  13. Ingest node setup
    [Same log lines as slide 11]

  14. Filebeat: collect and ship
    127.0.0.1 - - [19/Apr/2016:12:00:04 +0200] "GET / HTTP/1.1" 200 24
    127.0.0.1 - - [19/Apr/2016:12:00:07 +0200] "GET /not_found/ HTTP/1.1" 404 7218
    127.0.0.1 - - [19/Apr/2016:12:00:09 +2000] "GET /favicon.ico HTTP/1.1" 200 3638
    {
      "message" : "127.0.0.1 - - [19/Apr/2016:12:00:04 +0200] \"GET / HTTP/1.1\" 200 24"
    }
    {
      "message" : "127.0.0.1 - - [19/Apr/2016:12:00:07 +0200] \"GET /not_found/ HTTP/1.1\" 404 7218"
    }
    {
      "message" : "127.0.0.1 - - [19/Apr/2016:12:00:09 +2000] \"GET /favicon.ico HTTP/1.1\" 200 3638"
    }

  15. Elasticsearch: enrich and index
    {
      "message" : "127.0.0.1 - - [19/Apr/2016:12:00:04 +0200] \"GET / HTTP/1.1\" 200 24"
    }
    {
      "request" : "/",
      "auth" : "-",
      "ident" : "-",
      "verb" : "GET",
      "@timestamp" : "2016-04-19T10:00:04.000Z",
      "response" : "200",
      "bytes" : "24",
      "clientip" : "127.0.0.1",
      "httpversion" : "1.1",
      "rawrequest" : null,
      "timestamp" : "19/Apr/2016:12:00:04 +0200"
    }
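
The enrichment shown on the slide above can be imitated locally to see what each step contributes. The sketch below is plain Python, not Elasticsearch code: the regular expression is a simplified, hypothetical stand-in for the COMMONAPACHELOG grok pattern, `enrich` is an illustrative name, and the `rawrequest` field from the slide is not reproduced.

```python
import re
from datetime import datetime, timezone

# Hypothetical, simplified stand-in for the COMMONAPACHELOG grok pattern.
APACHE_LINE = re.compile(
    r'(?P<clientip>\S+) (?P<ident>\S+) (?P<auth>\S+) '
    r'\[(?P<timestamp>[^\]]+)\] '
    r'"(?P<verb>\S+) (?P<request>\S+) HTTP/(?P<httpversion>[\d.]+)" '
    r'(?P<response>\d{3}) (?P<bytes>\d+|-)'
)


def enrich(doc):
    """Imitate grok -> date -> remove on a document with a 'message' field."""
    match = APACHE_LINE.match(doc["message"])
    if match is None:
        raise ValueError("line does not match the pattern")
    out = dict(doc)
    # grok step: copy every named capture into the document.
    out.update(match.groupdict())
    # date step: parse the Apache timestamp and store it as UTC in @timestamp.
    parsed = datetime.strptime(out["timestamp"], "%d/%b/%Y:%H:%M:%S %z")
    utc = parsed.astimezone(timezone.utc)
    out["@timestamp"] = utc.strftime("%Y-%m-%dT%H:%M:%S.000Z")
    # remove step: drop the raw line.
    del out["message"]
    return out


doc = {"message": '127.0.0.1 - - [19/Apr/2016:12:00:04 +0200] "GET / HTTP/1.1" 200 24'}
enriched = enrich(doc)
print(enriched["@timestamp"])  # 2016-04-19T10:00:04.000Z
```

As on the slide, the extracted values stay strings ("200", "24"); converting them would be a further step.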

  16. How does ingest node work?

  17. Ingest pipeline
    Pipeline: a set of processors
    [Diagram: document -> grok -> date -> remove -> enriched document]

  18. grok
    remove
    attachment
    convert
    uppercase
    foreach
    trim
    append
    gsub
    set
    split
    fail
    geoip
    join
    lowercase
    rename
    date

  19. Grok processor
    Extracts structured fields out of a single text field
    {
      "grok": {
        "field": "message",
        "patterns": ["%{DATE:date}"]
      }
    }

  20. Mutate processors
    set, remove, rename, convert, gsub, split, join, lowercase, uppercase, trim, append
    {
      "remove": {
        "field": "message"
      }
    }

  21. Date processor
    Parses a date from a string
    {
      "date": {
        "field": "timestamp",
        "formats": ["YYYY"]
      }
    }

  22. Geoip processor
    Adds information about the geographical location of IP addresses
    {
      "geoip": {
        "field": "ip"
      }
    }

  23. Attachment processor
    You know, for documents
    {
      "attachment": {
        "field" : "file"
      }
    }
    // Send binary content
    {
      "file": "BASE64"
    }

  24. Plugins
    Introducing new processors is as easy as writing a plugin
    {
      "your_plugin": {
        ...
      }
    }

  25. Pipeline management
    PUT /_ingest/pipeline/apache-log
    {
      "processors" : [
        {
          "grok" : {
            "field": "message",
            "patterns": ["%{COMMONAPACHELOG}"]
          }
        },
        {
          "date" : {
            "field" : "timestamp",
            "formats" : ["dd/MMM/YYYY:HH:mm:ss Z"]
          }
        },
        {
          "remove" : {
            "field" : "message"
          }
        }
      ]
    }
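
Before storing a pipeline it can help to try it against sample documents with the simulate endpoint (`POST /_ingest/pipeline/_simulate`). The sketch below only builds such a request in Python and sends nothing; `build_simulate_request` is an illustrative name, and the pipeline is the one from the slide above.

```python
import json

# The pipeline definition from the slide above.
APACHE_LOG_PIPELINE = {
    "processors": [
        {"grok": {"field": "message", "patterns": ["%{COMMONAPACHELOG}"]}},
        {"date": {"field": "timestamp", "formats": ["dd/MMM/YYYY:HH:mm:ss Z"]}},
        {"remove": {"field": "message"}},
    ]
}


def build_simulate_request(pipeline, messages):
    """Return (method, path, body) for a simulate call; nothing is sent."""
    # Each sample document is wrapped in "_source", as the simulate API expects.
    docs = [{"_source": {"message": message}} for message in messages]
    body = json.dumps({"pipeline": pipeline, "docs": docs}, indent=2)
    return "POST", "/_ingest/pipeline/_simulate", body


method, path, body = build_simulate_request(
    APACHE_LOG_PIPELINE,
    ['127.0.0.1 - - [19/Apr/2016:12:00:04 +0200] "GET / HTTP/1.1" 200 24'],
)
print(method, path)
print(body)
```

The printed body can be pasted into any HTTP client pointed at a test cluster.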

  26. Where can ingest pipelines be used?

  27. Index API
    PUT /apache/doc/1
    {
      "message" : "127.0.0.1 - - [19/Apr/2016:12:00:04 +0200] \"GET / HTTP/1.1\" 200 24"
    }

  28. Index API
    PUT /apache/doc/1?pipeline=apache-log
    {
      "message" : "127.0.0.1 - - [19/Apr/2016:12:00:04 +0200] \"GET / HTTP/1.1\" 200 24"
    }

  29. Bulk API
    PUT /apache/doc/_bulk
    {"index":{}}\n
    {"message":"127.0.0.1 - - [19/Apr/2016:12:00:04 +0200] \"GET / ..."}\n
    {"index":{}}\n
    {"message":"127.0.0.1 - - [19/Apr/2016:12:00:07 +0200] \"GET /foo/ ..."}\n
    {"index":{}}\n
    {"message":"127.0.0.1 - - [19/Apr/2016:12:00:09 +2000] \"GET /f.png ..."}\n

  30. Bulk API
    PUT /apache/doc/_bulk?pipeline=apache-log
    {"index":{}}\n
    {"message":"127.0.0.1 - - [19/Apr/2016:12:00:04 +0200] \"GET / ..."}\n
    {"index":{}}\n
    {"message":"127.0.0.1 - - [19/Apr/2016:12:00:07 +0200] \"GET /foo/ ..."}\n
    {"index":{}}\n
    {"message":"127.0.0.1 - - [19/Apr/2016:12:00:09 +2000] \"GET /f.png ..."}\n

  31. Bulk API
    PUT /_bulk
    {"index":{"_index":"apache","_type":"doc"}}\n
    {"message":"127.0.0.1 - - [19/Apr/2016:12:00:04 +0200] \"GET / ..."}\n
    {"index":{"_index":"mysql","_type":"doc"}}\n
    {"message":"..."}\n

  32. Bulk API
    PUT /_bulk
    {"index":{"_index":"apache","_type":"doc","pipeline":"apache-log"}}\n
    {"message":"127.0.0.1 - - [19/Apr/2016:12:00:04 +0200] \"GET / ..."}\n
    {"index":{"_index":"mysql","_type":"doc","pipeline":"mysql-log"}}\n
    {"message":"..."}\n
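
The bulk body is newline-delimited JSON: one action line, one document line, each terminated by `\n`. A minimal Python sketch of assembling the per-action `pipeline` variant from the slide above follows; `build_bulk_body` is an illustrative helper, not part of any client library, and the document contents are placeholders.

```python
import json


def build_bulk_body(actions):
    """Build an NDJSON bulk body from (index, pipeline, document) tuples."""
    lines = []
    for index, pipeline, document in actions:
        meta = {"_index": index, "_type": "doc"}
        if pipeline is not None:
            # Route this single action through the named ingest pipeline.
            meta["pipeline"] = pipeline
        lines.append(json.dumps({"index": meta}))
        lines.append(json.dumps(document))
    # The bulk API requires a trailing newline after the last line.
    return "\n".join(lines) + "\n"


bulk_body = build_bulk_body([
    ("apache", "apache-log", {"message": "an Apache log line"}),
    ("mysql", "mysql-log", {"message": "a MySQL log line"}),
])
print(bulk_body, end="")
```

Setting the pipeline per action lets one bulk request feed several indices, each with its own processing.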

  33. Reindex API
    POST /_reindex
    {
      "source": {
        "index": "logs",
        "type": "apache"
      },
      "dest": {
        "index": "apache-logs",
        "type": "doc"
      }
    }

  34. Reindex API
    POST /_reindex
    {
      "source": {
        "index": "logs",
        "type": "apache"
      },
      "dest": {
        "index": "apache-logs",
        "type": "doc",
        "pipeline" : "apache-log"
      }
    }

  35. Error handling

  36. Pipeline: grok -> date -> remove
    {
      "message" : "127.0.0.1 - - [19/Apr/2016:12:00:00 +040] \"GET / HTTP/1.1\" 200 24"
    }

  37. Pipeline: grok -> date -> remove
    400 Bad Request
    unable to parse date [19/Apr/2016:12:00:00 +040]
    [Same document as slide 36]

  38. Pipeline: grok -> date -> remove
    On-failure processors at the pipeline level: set
    [Same document as slide 36]

  39. Pipeline: grok -> date -> set (on-failure processor at the pipeline level)
    remove is not executed
    200 OK
    [Same document as slide 36]

  40. Pipeline: grok -> date -> remove
    On-failure processors at the processor level: set, then the pipeline continues with remove
    [Same document as slide 36]

  41. Pipeline: grok -> date -> set (on-failure processor at the processor level) -> remove
    200 OK
    [Same document as slide 36]
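
The difference between the two failure levels can be shown with a toy pipeline runner. This is a plain-Python sketch of the behaviour described on the slides above, not Elasticsearch code: `run`, `set_error` and the `error` field are hypothetical names, and `grok` here only extracts the bracketed timestamp.

```python
from datetime import datetime


def grok(doc):
    # Toy stand-in for grok: pull the bracketed timestamp out of the message.
    start = doc["message"].index("[") + 1
    end = doc["message"].index("]")
    doc["timestamp"] = doc["message"][start:end]


def date(doc):
    # Raises ValueError on a malformed offset such as "+040".
    parsed = datetime.strptime(doc["timestamp"], "%d/%b/%Y:%H:%M:%S %z")
    doc["@timestamp"] = parsed.isoformat()


def remove(doc):
    del doc["message"]


def set_error(doc, exc):
    doc["error"] = str(exc)  # "error" is a hypothetical field name


def run(doc, processors, pipeline_on_failure=None):
    """processors is a list of (processor, on_failure_handler_or_None)."""
    for processor, handler in processors:
        try:
            processor(doc)
        except Exception as exc:
            if handler is not None:
                handler(doc, exc)  # processor level: handle, then continue
                continue
            if pipeline_on_failure is None:
                raise  # no handler at all: the request fails
            pipeline_on_failure(doc, exc)  # pipeline level: handle, then stop
            break
    return doc


BAD_DOC = {"message": '127.0.0.1 - - [19/Apr/2016:12:00:00 +040] "GET / HTTP/1.1" 200 24'}

pipeline_level = run(dict(BAD_DOC), [(grok, None), (date, None), (remove, None)],
                     pipeline_on_failure=set_error)
processor_level = run(dict(BAD_DOC), [(grok, None), (date, set_error), (remove, None)])

print("message" in pipeline_level)   # True: remove never ran
print("message" in processor_level)  # False: remove ran after the handler
```

In both cases the document is accepted; only the processor-level handler lets the remaining processors run.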

  42. Ingest node internals

  43. Default scenario
    [Diagram: a client and a three-node cluster; every node holds the cluster state (CS)]
    node1: logs 2P, logs 3R
    node2: logs 3P, logs 1R
    node3: logs 1P, logs 2R
    logs index: 3 primary shards, 1 replica each
    All nodes are equal:
    - node.data: true
    - node.master: true
    - node.ingest: true

  44. Default scenario: index request for shard 3
    Pre-processing on the coordinating node
    [Same cluster as slide 43]

  45. Default scenario: index request for shard 3
    Indexing on the primary shard
    [Same cluster as slide 43]

  46. Default scenario: index request for shard 3
    Indexing on the replica shard
    [Same cluster as slide 43]

  47. Ingest dedicated nodes
    [Diagram: a client and a five-node cluster; every node holds the cluster state (CS)]
    node1: logs 2P, logs 3R
    node2: logs 3P, logs 1R
    node3: logs 1P, logs 2R
    node4, node5: no shards
    node1 to node3:
    - node.data: true
    - node.master: true
    - node.ingest: false
    node4 and node5:
    - node.data: false
    - node.master: false
    - node.ingest: true

  48. Ingest dedicated nodes: index request for shard 3
    Forward request to an ingest node
    [Same cluster as slide 47]

  49. Ingest dedicated nodes: index request for shard 3
    Pre-processing on the ingest node
    [Same cluster as slide 47]

  50. Ingest dedicated nodes: index request for shard 3
    Indexing on the primary shard
    [Same cluster as slide 47]

  51. Ingest dedicated nodes: index request for shard 3
    Indexing on the replica shard
    [Same cluster as slide 47]
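
The two scenarios above differ only in where pre-processing happens. The toy model below lists the steps an index request for shard 3 goes through, in plain Python. Several things are assumptions for illustration: `request_path` is a made-up name, node1 is taken as the coordinating node, and the ingest node is picked by sorted order, since the slides do not show how one is selected.

```python
# Placement of shard 3 of the logs index, as drawn on the slides.
PRIMARY_NODE = "node2"  # holds 3P
REPLICA_NODE = "node1"  # holds 3R


def request_path(coordinating_node, ingest_nodes):
    """Return the ordered steps of an index request for shard 3."""
    steps = []
    if coordinating_node in ingest_nodes:
        # Default scenario: the coordinating node is itself an ingest node.
        steps.append(f"pre-process on {coordinating_node}")
    else:
        # Dedicated scenario. Assumption: take the first ingest node by name;
        # the real selection strategy is not shown on the slides.
        target = sorted(ingest_nodes)[0]
        steps.append(f"forward from {coordinating_node} to {target}")
        steps.append(f"pre-process on {target}")
    steps.append(f"index on primary ({PRIMARY_NODE})")
    steps.append(f"index on replica ({REPLICA_NODE})")
    return steps


default_steps = request_path("node1", {"node1", "node2", "node3"})
dedicated_steps = request_path("node1", {"node4", "node5"})
for step in dedicated_steps:
    print(step)
```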

  52. Demo time!
    52.35.38.35 - - [19/Apr/2016:12:00:04 +0200] "GET / HTTP/1.1" 200 24

  53. bano-ingest plugin
    From postal address to geo_point
    From geo_point to postal address

  54. What is BANO?
    • French open database of postal addresses
    • http://openstreetmap.fr/bano
    • http://bano.openstreetmap.fr/data/
      per region
      all addresses

  55. BANO Format
    976030950H-26,26,RUE DISMA,97660,Bandrélé,CAD,-12.891701,45.202652
    976030950H-28,28,RUE DISMA,97660,Bandrélé,CAD,-12.891900,45.202700
    976030950H-30,30,RUE DISMA,97660,Bandrélé,CAD,-12.891781,45.202535
    976030950H-32,32,RUE DISMA,97660,Bandrélé,CAD,-12.892005,45.202564
    976030950H-3,3,RUE DISMA,97660,Bandrélé,CAD,-12.892444,45.202135
    976030950H-34,34,RUE DISMA,97660,Bandrélé,CAD,-12.892068,45.202450
    976030950H-4,4,RUE DISMA,97660,Bandrélé,CAD,-12.892446,45.202367
    976030950H-5,5,RUE DISMA,97660,Bandrélé,CAD,-12.892461,45.202248
    976030950H-6,6,RUE DISMA,97660,Bandrélé,CAD,-12.892383,45.202456
    976030950H-8,8,RUE DISMA,97660,Bandrélé,CAD,-12.892300,45.202555
    976030950H-9,9,RUE DISMA,97660,Bandrélé,CAD,-12.892355,45.202387
    976030951J-103,103,RTE NATIONALE 3,97660,Bandrélé,CAD,-12.893639,45.201696
    Columns, in order: ID, street number, street name, zipcode, city name, source, geo point (latitude, longitude)

  56. Import bano dataset

  57. Load CSV with Logstash (Extract: input)
    input {
      stdin { }
    }
    Input line:
    976030951J-103,103,RTE NATIONALE 3,97660,Bandrélé,CAD,-12.893639,45.201696
    Resulting event:
    {
      "message":"976030951J-103,103,RTE NATIONALE 3,97660,Bandrélé,CAD,-12.893639,45.201696",
      "@timestamp":"2017-12-05T16:00:00.000PST",
      "@version":1,
      "host":"MacBook-Pro-David.local"
    }

  58. Load CSV with Logstash (Transform: filter)
    filter {
      csv {
        separator => ","
        columns => [
          "id","number","street_name","zipcode","city","source","latitude","longitude"
        ]
        remove_field => [ "message", "@version", "@timestamp", "host" ]
      }
    }
    Event before:
    {
      "message":"976030951J-103,103,RTE NATIONALE 3,97660,Bandrélé,CAD,-12.893639,45.201696",
      "@timestamp":"2017-12-05T16:00:00.000PST",
      "@version":1, "host":"MacBook-Pro-David.local"
    }
    Event after:
    {
      "source":"CAD", "id":"976030951J-103",
      "number":"103", "street_name":"RTE NATIONALE 3",
      "zipcode":"97660", "city":"Bandrélé",
      "latitude":"-12.893639", "longitude":"45.201696"
    }

  59. Load CSV with Logstash (Transform: filter)
    filter {
      mutate {
        convert => { "longitude" => "float" }
        convert => { "latitude" => "float" }
        rename => {
          "longitude" => "[location][lon]"
          "latitude" => "[location][lat]"
          "number" => "[address][number]"
          "street_name" => "[address][street_name]"
          "zipcode" => "[address][zipcode]"
          "city" => "[address][city]"
        }
        replace => {
          "region" => "${REGION}"
        }
      }
    }
    Event before:
    {
      "source":"CAD","id":"976030951J-103",
      "number":"103",
      "street_name":"RTE NATIONALE 3",
      "zipcode":"97660","city":"Bandrélé",
      "latitude":"-12.893639",
      "longitude":"45.201696"
    }
    Event after:
    {
      "source":"CAD","id":"976030951J-103",
      "region":"976",
      "address":{
        "number":"103",
        "street_name":"RTE NATIONALE 3",
        "zipcode":"97660",
        "city":"Bandrélé"
      },
      "location":{
        "lat":-12.893639,"lon":45.201696
      }
    }
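
The csv and mutate steps above amount to splitting a line into named columns, converting the coordinates to floats and nesting the fields. For readers who want to check the target document shape without running Logstash, here is a rough Python equivalent; `to_document` is an illustrative name, and the region is passed as an argument rather than read from the REGION environment variable.

```python
import csv

# Column order of a BANO CSV line, as declared in the csv filter.
COLUMNS = ["id", "number", "street_name", "zipcode", "city",
           "source", "latitude", "longitude"]


def to_document(line, region):
    """Turn one BANO CSV line into the nested document shape."""
    row = next(csv.reader([line]))
    fields = dict(zip(COLUMNS, row))
    return {
        "source": fields["source"],
        "id": fields["id"],
        "region": region,
        "address": {
            "number": fields["number"],
            "street_name": fields["street_name"],
            "zipcode": fields["zipcode"],
            "city": fields["city"],
        },
        # Coordinates become floats so they can be mapped as a geo_point.
        "location": {
            "lat": float(fields["latitude"]),
            "lon": float(fields["longitude"]),
        },
    }


document = to_document(
    "976030951J-103,103,RTE NATIONALE 3,97660,Bandrélé,CAD,-12.893639,45.201696",
    "976",
)
print(document["location"])  # {'lat': -12.893639, 'lon': 45.201696}
```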

  60. Load CSV with Logstash (Load: output)
    output {
      elasticsearch {
        "template_name" => "bano"
        "template_overwrite" => true
        "template" => "${SOURCE_DIR}/src/main/logstash/bano.json"
        "index" => ".bano-${REGION}"
        "document_id" => "%{[id]}"
      }
    }
    Indexed document:
    {
      "source":"CAD","id":"976030951J-103",
      "region":"976",
      "address":{
        "number":"103","street_name":"RTE NATIONALE 3",
        "zipcode":"97660","city":"Bandrélé"
      },
      "location":{
        "lat":-12.893639,"lon":45.201696
      }
    }

  61. Index template (index settings)
    {
      "template": ".bano-*", "settings": {
        "index.number_of_shards": 1, "index.number_of_replicas": 0,
        "index.analysis": {
          "analyzer": {
            "bano_analyzer": {
              "type": "custom", "tokenizer": "standard",
              "filter" : [ "lowercase", "asciifolding" ]
            },
            "bano_street_analyzer": {
              "type": "custom", "tokenizer": "standard",
              "filter" : [ "lowercase", "asciifolding", "bano_synonym" ]
            }
          },
          "filter": {
            "bano_synonym": {
              "type": "synonym",
              "synonyms" : [ "bd => boulevard", "av => avenue", "r => rue", "rte => route" ]
            }
          }
        }
      }, // ...
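
To get a feel for what `bano_street_analyzer` does to a street name, its three filters can be approximated in a few lines of Python. This is only an approximation under stated assumptions: a `\w+` regular expression stands in for the standard tokenizer, Unicode decomposition stands in for asciifolding (which handles more characters than accents), and `bano_street_analyze` is an illustrative name.

```python
import re
import unicodedata

# The synonym rules from the bano_synonym filter.
SYNONYMS = {"bd": "boulevard", "av": "avenue", "r": "rue", "rte": "route"}


def ascii_fold(token):
    # Decompose accented characters, then drop the combining marks.
    decomposed = unicodedata.normalize("NFKD", token)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def bano_street_analyze(text):
    """Approximate tokenizer -> lowercase -> asciifolding -> synonym."""
    tokens = re.findall(r"\w+", text)
    tokens = [ascii_fold(token.lower()) for token in tokens]
    return [SYNONYMS.get(token, token) for token in tokens]


print(bano_street_analyze("RTE NATIONALE 3"))  # ['route', 'nationale', '3']
```

With these rules a search for "route nationale" and an address stored as "RTE NATIONALE" end up with the same tokens.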

  62. Index template (mapping)
    {
      "template": ".bano-*", "settings": { ... },
      "mappings": {
        "doc": {
          "properties" : {
            "address": {
              "properties" : {
                "city": {
                  "type": "text", "analyzer": "bano_analyzer",
                  "fields": { "keyword": { "type": "keyword" } }
                },
                "number": { "type": "keyword" },
                "street_name": { "type": "text", "analyzer": "bano_street_analyzer" },
                "zipcode": { "type": "keyword" }
              }
            },
            "region": { "type": "keyword" },
            "location": { "type": "geo_point" },
            "id": { "type": "keyword" },
            "source": { "type": "keyword" }
      }}}, // ...

  63. Index template (aliases)
    {
      "template": ".bano-*",
      "settings": { ... },
      "mappings": { ... },
      "aliases" : {
        ".bano" : {}
      }
    }
    [Diagram: the .bano alias spans the per-region indices .bano-17, .bano-75 and .bano-95, each holding documents d1 to d6]

  64. Launch Logstash
    export SOURCE_DIR=~/Documents/ingest-bano/
    DATASOURCE_DIR=~/Documents/ingest/bano-data
    LOGSTASH=~/Documents/ingest/stack-6.0.0/logstash-6.0.0
    import_region () {
      export REGION=$1
      FILE=$DATASOURCE_DIR/bano-$REGION.csv
      # Quote the URL so the shell does not treat "?" as a glob character
      curl -XDELETE "localhost:9200/.bano-$REGION?pretty"
      cat "$FILE" | "$LOGSTASH/bin/logstash" -f "$SOURCE_DIR/src/main/logstash/import.conf"
    }
    DEPTS=95
    # Unpadded ranges: printf adds the leading zero (a padded "08" would be rejected as invalid octal)
    for i in {1..19} $(seq 21 $DEPTS) {971..974} 976 ; do
      DEPT=$(printf %02d $i)
      import_region $DEPT
    done
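
The loop above visits departments 01 to 19, 21 to 95, 971 to 974 and 976, zero-padding the short codes. The same list can be produced in Python to check which region files the import expects; `region_codes` is an illustrative name and the ranges are copied from the script.

```python
def region_codes(last_department=95):
    """Return the zero-padded region codes visited by the import loop."""
    numbers = [
        *range(1, 20),                    # 01 to 19
        *range(21, last_department + 1),  # 21 to 95 (20 is not in the loop)
        *range(971, 975),                 # 971 to 974
        976,
    ]
    return [f"{number:02d}" for number in numbers]


codes = region_codes()
print(codes[:3], codes[-2:])  # ['01', '02', '03'] ['974', '976']
```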

  65. Writing an ingest plugin

  66. Use bano processor

  67. Ingest Node
    (re)indexing and enriching documents within Elasticsearch
    David Pilato
    Developer | Evangelist, @dadoonet
    Watch this space: https://github.com/dadoonet
    And follow me on Twitter!