ElasticSearch In Action - Techorama 2016

Slides for my ElasticSearch talk at Techorama 2016 in Mechelen (Belgium). http://www.techorama.be

Thijs Feryn

May 03, 2016

Transcript

  1. Elasticsearch
    in action
    By Thijs Feryn

  2. Explain in 1 slide

  3. •Full-text search engine
    •NoSQL database
    •Analytics engine
    •Written in Java
    •Lucene-based (~Solr)
    •Inverted indices
    •Easy to scale (~Elastic)
    •RESTful interface (HTTP/JSON)
    •Schemaless
    •Real-time
    •ELK stack

  4. Still with me?

  5. Hi, I’m Thijs

  6. I’m
    @ThijsFeryn
    on Twitter

  7. I’m an
    Evangelist
    At

  8. I’m a
    board member
    at

  9. 150
    This is my 150th
    presentation

  10. https://www.elastic.co/downloads/elasticsearch

  11. {
    "name" : "node-1",
    "cluster_name" : "elasticsearch",
    "version" : {
    "number" : "2.2.0",
    "build_hash" : "8ff36d139e16f8720f2947ef62c8167a888992fe",
    "build_timestamp" : "2016-01-27T13:32:39Z",
    "build_snapshot" : false,
    "lucene_version" : "5.4.1"
    },
    "tagline" : "You Know, for Search"
    }
    http://localhost:9200

  12. RDBMS       Elasticsearch
      Database    Index
      Table       Type
      Row         Document

  13. POST /blog
    {"acknowledged":true}
    Confirmation

  14. POST /blog/post/6160
    {
    "language": "en-US",
    "title": "WordPress 4.4 is available! And these are the new features…",
    "date": "Tue, 15 Dec 2015 13:28:23 +0000",
    "author": "Romy",
    "category": [
    "News",
    "PHP",
    "Sector news",
    "Webdesign & development",
    "CMS",
    "content management system",
    "wordpress",
    "WordPress 4.4"
    ],
    "guid": "6160"
    }
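
The indexing call on this slide is an ordinary HTTP request, so it can be prepared from any language. A minimal Python sketch follows; the helper name `build_index_request` and the shortened document are assumptions made for illustration, and the request is only built, not sent.

```python
import json
import urllib.request


def build_index_request(base_url, index, doc_type, doc_id, document):
    """Build (but do not send) the HTTP request that indexes one document."""
    url = f"{base_url}/{index}/{doc_type}/{doc_id}"
    body = json.dumps(document).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={"Content-Type": "application/json"},
    )


# Shortened version of the blog post from the slide.
post = {"language": "en-US", "author": "Romy", "guid": "6160"}
request = build_index_request("http://localhost:9200", "blog", "post", 6160, post)
print(request.get_method(), request.full_url)
```

Passing `request` to `urllib.request.urlopen` would send it, which requires a node listening on `localhost:9200`.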

  15. {
    "_index": "blog",
    "_type": "post",
    "_id": "6160",
    "_version": 1,
    "created": true
    }
    Confirmation

  16. GET /blog/post/6160
    {
    "_index": "blog",
    "_type": "post",
    "_id": "6160",
    "_version": 1,
    "found": true,
    "_source": {
    "language": "en-US",
    "title": "WordPress 4.4 is available! And these are the new features…",
    "date": "Tue, 15 Dec 2015 13:28:23 +0000",
    "author": "Romy",
    "category": [
    "News",
    "PHP",
    "Sector news",
    "Webdesign & development",
    "CMS",
    "content management system",
    "wordpress",
    "WordPress 4.4"
    ],
    "guid": "6160"
    }
    }
    Retrieve document by id
    Document & metadata

  17. GET /blog/_mapping
    {
    "blog": {
    "mappings": {
    "post": {
    "properties": {
    "author": {
    "type": "string"
    },
    "category": {
    "type": "string"
    },
    "date": {
    "type": "string"
    },
    "guid": {
    "type": "string"
    },
    "language": {
    "type": "string"
    },
    "title": {
    "type": "string"
    }
    }
    }
    }
    }
    }
    Schemaless? Not really …
    “Guesses” mapping on insert
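
To make the "guessing" concrete, here is a deliberately simplified Python sketch of type inference from JSON values. It is not Elasticsearch's actual dynamic-mapping logic (which, among other things, also tries to detect dates in strings); the function names are hypothetical. It does show why every field of the blog post ends up as `string`: the guid and the date are both sent as JSON strings.

```python
def guess_field_type(value):
    """Rough stand-in for dynamic mapping: map a JSON value to a field type."""
    if isinstance(value, bool):   # check bool first, since bool is a subclass of int
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "double"
    if isinstance(value, dict):
        return "object"
    if isinstance(value, list):   # an array takes the type of its elements
        return guess_field_type(value[0]) if value else None
    return "string"


def guess_mapping(document):
    """Guess a type for every top-level field of a document."""
    return {field: {"type": guess_field_type(value)} for field, value in document.items()}


post = {
    "date": "Tue, 15 Dec 2015 13:28:23 +0000",
    "category": ["News", "PHP"],
    "guid": "6160",
}
print(guess_mapping(post))
```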

  18. Explicit mapping

  19. POST /blog
    {
    "mappings" : {
    "post" : {
    "properties": {
    "title" : {
    "type" : "string"
    },
    "date" : {
    "type" : "date",
    "format": "E, dd MMM YYYY HH:mm:ss Z"
    },
    "author": {
    "type": "string"
    },
    "category": {
    "type": "string"
    },
    "guid": {
    "type": "integer"
    }
    }
    }
    }
    }
    Explicit
    mapping at
    index creation
    time

  20. POST /blog
    {
    "mappings": {
    "post": {
    "properties": {
    "author": {
    "type": "string",
    "index": "not_analyzed"
    },
    "category": {
    "type": "string",
    "index": "not_analyzed"
    },
    "date": {
    "type": "date",
    "format": "E, dd MMM YYYY HH:mm:ss Z"
    },
    "guid": {
    "type": "integer"
    },
    "language": {
    "type": "string",
    "index": "not_analyzed"
    },
    "title": {
    "type": "string",
    "fields": {
    "en": {
    "type": "string",
    "analyzer": "english"
    },
    "nl": {
    "type": "string",
    "analyzer": "dutch"
    },
    "raw": {
    "type": "string",
    "index": "not_analyzed"
    }
    }
    }
    }
    }
    }
    }
    Alternative
    mapping

  21. "type": "integer"
    },
    "language": {
    "type": "string",
    "index": "not_analyzed"
    },
    "title": {
    "type": "string",
    "fields": {
    "en": {
    "type": "string",
    "analyzer": "english"
    },
    "nl": {
    "type": "string",
    "analyzer": "dutch"
    },
    "raw": {
    "type": "string",
    "index": "not_analyzed"
    }
    }
    }
    }
    }
    }
    }
    What’s with
    the analyzers?

  22. Analyzed
    vs
    non-analyzed

  23. Full-text
    vs
    exact value

  24. By default, strings are analyzed
    … unless the mapping says otherwise

  25. Analyzer
    •Character filters: replace characters for analyzed text
    •Tokenizers: break text down into terms
    •Token filters: add/modify/delete tokens
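
The three stages can be sketched as a small pipeline. This is an illustration in plain Python, not how Lucene implements analysis: the tag-stripping regex, the word-splitting rule and the tiny stop word list are all assumptions chosen to keep the example short.

```python
import re

# Tiny illustrative stop word list, not the one Elasticsearch ships with.
STOP_WORDS = {"a", "an", "and", "the", "of"}


def char_filter(text):
    """Character filter: replace HTML-like tags with a space before tokenizing."""
    return re.sub(r"<[^>]+>", " ", text)


def tokenize(text):
    """Tokenizer: break the text down into terms on non-word characters."""
    return re.findall(r"\w+", text)


def token_filters(tokens):
    """Token filters: lowercase every token, then delete stop words."""
    lowered = (token.lower() for token in tokens)
    return [token for token in lowered if token not in STOP_WORDS]


def analyze(text):
    """Run the three stages in order."""
    return token_filters(tokenize(char_filter(text)))


print(analyze("<p>The QUICK brown fox</p>"))  # ['quick', 'brown', 'fox']
```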

  26. Built-in analyzers
    •Standard
    •Simple
    •Whitespace
    •Stop
    •Keyword
    •Pattern
    •Language
    •Snowball
    •Custom
    Standard tokenizer
    Lowercase token filter
    English stop word token filter

  27. Hey man, how are you doing?
    Standard: hey man how are you doing
    Whitespace: Hey man, how are you doing?
    English: hei man how you do

  28. POST /blog/post/_search
    {
    "fields": ["title"],
    "query": {
    "match": {
    "title": "working"
    }
    }
    }

  29. "total": 1,
    "max_score": 1.7562683,
    "hits": [
    {
    "_index": "blog",
    "_type": "post",
    "_id": "2742",
    "_score": 1.7562683,
    "fields": {
    "title": [
    "Hosted SharePoint 2010: working efficiently as a team"
    ]
    }
    }
    ]
    }
    }

  30. POST /blog/post/_search
    {
    "fields": ["title"],
    "query": {
    "match": {
    "title.en": "working"
    }
    }
    }
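
Why a search on `title.en` can find more posts than the same search on `title` comes down to which terms land in the inverted index. The Python sketch below uses three titles from this deck's search results; `naive_stem` is a crude, hypothetical stand-in for the real English stemmer and only strips two suffixes.

```python
import re
from collections import defaultdict


def standard_terms(text):
    """Lowercase and split on non-word characters (roughly 'standard')."""
    return re.findall(r"\w+", text.lower())


def naive_stem(term):
    """Crude suffix stripping; a stand-in for real English stemming."""
    for suffix in ("ing", "s"):
        if term.endswith(suffix) and len(term) > len(suffix) + 2:
            return term[: -len(suffix)]
    return term


def stemmed_terms(text):
    return [naive_stem(term) for term in standard_terms(text)]


def build_index(titles, analyzer):
    """Inverted index: term -> set of document ids containing that term."""
    index = defaultdict(set)
    for doc_id, title in titles.items():
        for term in analyzer(title):
            index[term].add(doc_id)
    return index


def search(index, analyzer, query):
    """Analyze the query the same way, then look every term up in the index."""
    hits = set()
    for term in analyzer(query):
        hits |= index.get(term, set())
    return sorted(hits)


titles = {
    "2742": "Hosted SharePoint 2010: working efficiently as a team",
    "828": "Still a lot of work in store",
    "3873": "SSL: what is it and how does it work?",
}

plain = build_index(titles, standard_terms)
english = build_index(titles, stemmed_terms)
print(search(plain, standard_terms, "working"))    # ['2742']
print(search(english, stemmed_terms, "working"))   # ['2742', '3873', '828']
```

With stemming, "working" and "work" collapse into the same term, so the query reaches titles that never contain the literal word.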

  31. "failed": 0
    },
    "hits": {
    "total": 6,
    "max_score": 2.4509864,
    "hits": [
    {
    "_index": "blog",
    "_type": "post",
    "_id": "828",
    "_score": 2.4509864,
    "fields": {
    "title": [
    "Still a lot of work in store"
    ]
    }
    },
    {
    "_index": "blog",
    "_type": "post",
    "_id": "3873",
    "_score": 2.144613,
    "fields": {
    "title": [
    "SSL: what is it and how does it work?"
    ]
    }
    },
    {
    "_index": "blog",

  32. Search

  33. GET /blog/post/_search?pretty
    {
    "took": 2,
    "timed_out": false,
    "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
    },
    "hits": {
    "total": 963,
    "max_score": 1,
    "hits": [
    {
    "_index": "blog",
    "_type": "post",
    "_id": "6067",
    "_score": 1,
    "_source": {
    "language": "en-US",
    "title": "My Combell Power Tips: Registrant Templates and new domain name overview",
    "date": "Tue, 24 Nov 2015 15:58:48 +0000",
    "author": "Romy",
    "category": [
    "Combell news",
    "Domain names",
    "News",
    "Tools",
    "control panel",
    "domain name",
    "my combell",
    "register",
    "templates"
    ],
    "guid": "6067"

  34. GET /blog/post/_search?pretty
    POST /blog/post/_search?pretty
    {
    "query": {
    "match_all": {}
    }
    }
    Search
    “lite” vs full
    query DSL

  35. GET /blog/post/_search?pretty&q=title:Thijs
    POST /blog/post/_search?pretty
    {
    "query": {
    "match": {
    "title": "Thijs"
    }
    }
    }
    Search
    “lite” vs full
    query DSL

  36. POST /blog/post/_count
    {
    "query": {
    "match": {
    "title": "PROXY protocol support in Varnish"
    }
    }
    }
    162 posts
    POST /blog/post/_count
    {
    "query": {
    "filtered": {
    "filter": {
    "term": {
    "title.raw": "PROXY protocol support in Varnish"
    }
    }
    }
    }
    }
    1 post
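
The gap between the two counts follows from how each lookup treats the text. A rough Python sketch, assuming a simplified analyzer and a handful of titles taken from this deck: `match` on an analyzed field counts a title if it shares any term with the query, while `term` on a not_analyzed field only counts an exact string.

```python
import re


def analyze(text):
    """Simplified analysis: lowercase terms split on non-word characters."""
    return set(re.findall(r"\w+", text.lower()))


def match_count(titles, query):
    """match on an analyzed field: any shared term is enough (terms are OR-ed)."""
    query_terms = analyze(query)
    return sum(1 for title in titles if analyze(title) & query_terms)


def term_count(titles, value):
    """term on a not_analyzed field: only the exact, unmodified string counts."""
    return sum(1 for title in titles if title == value)


titles = [
    "PROXY protocol support in Varnish",
    "Thijs Feryn gave a demo of Varnish Cache on WordPress during a Future Insights webinar",
    "SSL: what is it and how does it work?",
    "Still a lot of work in store",
]

query = "PROXY protocol support in Varnish"
print(match_count(titles, query))  # 3: two titles match only on "varnish" or "in"
print(term_count(titles, query))   # 1
```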

  37. Filter
    vs
    Query

  38. Filter
    •Does it match? Yes or no
    •When relevance doesn’t matter
    •Faster & cacheable
    •For non-analyzed data
    Query
    •How well does it match?
    •For full-text search
    •On analyzed/tokenized data
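
The filter examples in this deck wrap filters in a `filtered` query. As of Elasticsearch 2.0 that query is deprecated in favour of a `bool` query with a `filter` clause, so a filter-only search body could be built as in the sketch below. The helper name `filter_only_search` is hypothetical.

```python
import json


def filter_only_search(filter_clause):
    """Wrap a filter in a bool query so it runs in filter context, without scoring."""
    return {"query": {"bool": {"filter": filter_clause}}}


body = filter_only_search({"term": {"category": "PHPBenelux"}})
print(json.dumps(body, indent=2))
```

The printed JSON can be sent as the body of a `POST /blog/_search` request.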

  39. Match Query
    Multi Match Query
    Bool Query
    Boosting Query
    Common Terms Query
    Constant Score Query
    Dis Max Query
    Filtered Query
    Fuzzy Like This Query
    Fuzzy Like This Field Query
    Function Score Query
    Fuzzy Query
    GeoShape Query
    Has Child Query
    Has Parent Query
    Ids Query
    Indices Query
    Match All Query
    More Like This Query
    Nested Query
    Prefix Query
    Query String Query
    Simple Query String Query
    Range Query
    Regexp Query
    Span First Query
    Span Multi Term Query
    Span Near Query
    Span Not Query
    Span Or Query
    Span Term Query
    Term Query
    Terms Query
    Top Children Query
    Wildcard Query
    Minimum Should Match
    Multi Term Query Rewrite
    Template Query

  40. And Filter
    Bool Filter
    Exists Filter
    Geo Bounding Box Filter
    Geo Distance Filter
    Geo Distance Range Filter
    Geo Polygon Filter
    GeoShape Filter
    Geohash Cell Filter
    Has Child Filter
    Has Parent Filter
    Ids Filter
    Indices Filter
    Limit Filter
    Match All Filter
    Missing Filter
    Nested Filter
    Not Filter
    Or Filter
    Prefix Filter
    Query Filter
    Range Filter
    Regexp Filter
    Script Filter
    Term Filter
    Terms Filter
    Type Filter

  41. Filter
    examples

  42. POST /blog/post/_search?pretty
    {
    "query": {
    "filtered": {
    "filter": {
    "ids": {
    "values": [231,234,258]
    }
    }
    }
    }
    }

  43. POST /blog/_search
    {
    "query": {
    "filtered": {
    "filter": {
    "bool": {
    "must" : [
    {
    "term" : {
    "language" : "en-US"
    }
    },
    {
    "range" : {
    "date" : {
    "gte" : "2016-01-01",
    "format" : "yyyy-MM-dd"
    }
    }
    }
    ],
    "must_not" : [
    {
    "term" : {
    "category" : "joomla"
    }
    }
    ],
    "should" : [
    {
    "term" : {
    "category" : "Hosting"
    }
    },
    {
    "term" : {
    "category" : "evangelist"
    }
    }
    ]
    }
    }
    }
    }
    }

  44. POST /blog/_search?pretty
    {
    "query": {
    "filtered": {
    "filter": {
    "prefix": {
    "title.raw": "Combell"
    }
    }
    }
    }
    }

  45. POST /cities/city/_search
    {
    "size": 200,
    "sort": [
    {
    "city": {
    "order": "asc"
    }
    }
    ],
    "query": {
    "filtered": {
    "filter": {
    "geo_distance_range": {
    "lt": "5km",
    "location": {
    "lat": 51.033333,
    "lon": 2.866667
    }
    }
    }
    }
    }
    }
    Requires “geo
    point” typed
    field
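
What a distance filter has to decide for each document can be approximated with the haversine formula. This Python sketch assumes a spherical Earth with a mean radius of 6371 km, which is not necessarily the distance calculation Elasticsearch uses, and the city names and coordinates other than the slide's centre point are invented for the example.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two lat/lon points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def within(cities, lat, lon, max_km):
    """Names of the cities closer than max_km to the given point, sorted ascending."""
    return sorted(
        name
        for name, (city_lat, city_lon) in cities.items()
        if haversine_km(lat, lon, city_lat, city_lon) < max_km
    )


# Hypothetical cities; only the centre point comes from the slide.
cities = {
    "centre": (51.033333, 2.866667),
    "nearby": (51.05, 2.88),
    "far": (51.5, 3.5),
}
print(within(cities, 51.033333, 2.866667, 5))
```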

  46. POST /cities/city/_search
    {
    "size": 200,
    "query": {
    "filtered": {
    "query": {
    "match_all": {}
    },
    "filter": {
    "geo_bounding_box": {
    "location": {
    "bottom_left": {
    "lat": 51.1,
    "lon": 2.6
    },
    "top_right": {
    "lat": 51.2,
    "lon": 2.7
    }
    }
    }
    }
    }
    }
    }
    Requires “geo point” typed field
    Draw a “box”

  47. Relevance

  48. POST /blog/_search
    {
    "fields": ["title"],
    "query": {
    "bool": {
    "must": [
    {
    "match": {
    "title": "varnish thijs"
    }
    },
    {
    "filtered": {
    "filter": {
    "term": {
    "language": "en-US"
    }
    }
    }
    }
    ]
    }
    }
    }

  49. {
    "took": 3,
    "timed_out": false,
    "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
    },
    "hits": {
    "total": 8,
    "max_score": 1.984594,
    "hits": [
    {
    "_index": "blog",
    "_type": "post",
    "_id": "4275",
    "_score": 1.984594,
    "fields": {
    "title": [
    "Thijs Feryn gave a demo of Varnish Cache on WordPress during a Future Insights webinar"
    ]
    }
    },
    {
    "_index": "blog",
    "_type": "post",
    "_id": "6238",
    "_score": 0.8335616,
    "fields": {
    "title": [
    "PROXY protocol support in Varnish"
    ]
    }
    },
    {
    "_index": "blog",
    Hits both
    terms. More
    relevant

  50. POST /blog/_search?_source=false
    {
    "query": {
    "filtered": {
    "filter": {
    "term": {
    "category": "PHPBenelux"
    }
    }
    }
    }
    }
    Using a filter instead of a query
    We don’t care about the source

  51. {
    "took": 3,
    "timed_out": false,
    "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
    },
    "hits": {
    "total": 2,
    "max_score": 1,
    "hits": [
    {
    "_index": "blog",
    "_type": "post",
    "_id": "6254",
    "_score": 1
    },
    {
    "_index": "blog",
    "_type": "post",
    "_id": "11749",
    "_score": 1
    }
    ]
    }
    }
    No relevance on filters
    Score is always 1

  52. POST /blog/_search
    {
    "fields": ["title", "category"],
    "query": {
    "bool": {
    "must": [
    {
    "match": {
    "title": "thijs feryn"
    }
    }
    ],
    "should": [
    {
    "match": {
    "category": "Varnish"
    }
    }
    ]
    }
    }
    }
    Only search for “thijs feryn”
    Increase relevance if category contains “Varnish”

  53. POST /blog/_search
    {
    "fields": ["title", "category"],
    "query": {
    "bool": {
    "must_not": [
    {
    "filtered": {
    "filter": {
    "term": {
    "author": "Romy"
    }
    }
    }
    }
    ],
    "should": [
    {
    "match": {
    "category": "Magento"
    }
    }
    ]
    }
    }
    }
    Increase relevance
    Combining filters & queries

  54. POST /blog/_search
    {
    "query": {
    "bool": {
    "should": [
    {
    "match": {
    "title": {
    "query": "Magento",
    "boost" : 3
    }
    }
    },
    {
    "match": {
    "title": {
    "query": "Wordpress",
    "boost" : 2
    }
    }
    }
    ]
    }
    }
    }
    Increase relevance
    Query-time boosting

  55. Multi index
    multi type

  56. /_search
    /products/_search
    /products/product/_search
    /products,clients/_search
    /pro*/_search
    /pro*,cli*/_search
    /products/product,invoice/_search
    /products/pro*/_search
    /_all/product/_search
    /_all/product,invoice/_search
    /_all/pro*/_search
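
All of these endpoints follow one pattern: optional comma-separated indices, optional comma-separated types, then `_search`. A small Python sketch of that pattern, with a hypothetical helper name:

```python
def search_path(indices=None, types=None):
    """Build a search endpoint from optional index and type names or wildcards."""
    parts = []
    if indices:
        parts.append(",".join(indices))
    elif types:
        parts.append("_all")  # a type without an index needs the _all placeholder
    if types:
        parts.append(",".join(types))
    parts.append("_search")
    return "/" + "/".join(parts)


print(search_path())                                 # /_search
print(search_path(["products", "clients"]))          # /products,clients/_search
print(search_path(["products"], ["pro*"]))           # /products/pro*/_search
print(search_path(types=["product", "invoice"]))     # /_all/product,invoice/_search
```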

  57. Multi
    “all the
    things”

  58. Aggregations

  59. Group by on steroids

  60. SELECT author, COUNT(guid)
    FROM blog.post
    GROUP BY author
    Aggregations in SQL
    Metric: COUNT(guid)
    Bucket: GROUP BY author

  61. SELECT author, COUNT(guid)
    FROM blog.post
    GROUP BY author
    POST /blog/post/_search?pretty&search_type=count
    {
    "aggs": {
    "popular_bloggers": {
    "terms": {
    "field": "author"
    }
    }
    }
    }
    Only
    aggs, no
    docs
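
The parallel with `GROUP BY` can be seen in a few lines of Python. This is a conceptual sketch of what a terms aggregation computes, not how Elasticsearch computes it across shards; the function name and the sample documents are made up.

```python
from collections import Counter


def terms_aggregation(docs, field, size=10):
    """Bucket documents by a field's value and count each bucket, largest first."""
    counts = Counter(doc[field] for doc in docs if field in doc)
    return [
        {"key": key, "doc_count": count}
        for key, count in counts.most_common(size)
    ]


# Made-up sample documents.
posts = [{"author": "Romy"}, {"author": "Tom"}, {"author": "Romy"}]
print(terms_aggregation(posts, "author"))
```

Each distinct author is a bucket, and `doc_count` is the metric computed per bucket.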

  62. "aggregations": {
    "popular_bloggers": {
    "doc_count_error_upper_bound": 0,
    "sum_other_doc_count": 0,
    "buckets": [
    {
    "key": "Romy",
    "doc_count": 415
    },
    {
    "key": "Combell",
    "doc_count": 184
    },
    {
    "key": "Tom",
    "doc_count": 184
    },
    {
    "key": "Jimmy Cappaert",
    "doc_count": 157
    },
    {
    "key": "Christophe",
    "doc_count": 23
    }
    ]
    }
    }
    Aggregation
    output

  63. POST /blog/_search
    {
    "query": {
    "match": {
    "title": "varnish"
    }
    },
    "aggs": {
    "popular_bloggers": {
    "terms": {
    "field": "author",
    "size": 10
    },
    "aggs": {
    "used_languages": {
    "terms": {
    "field": "language",
    "size": 10
    }
    }
    }
    }
    }
    }
    Nested
    multi-group by
    alongside
    query

  64. "aggregations": {
    "popular_bloggers": {
    "doc_count_error_upper_bound": 0,
    "sum_other_doc_count": 0,
    "buckets": [
    {
    "key": "Romy",
    "doc_count": 4,
    "used_languages": {
    "doc_count_error_upper_bound": 0,
    "sum_other_doc_count": 0,
    "buckets": [
    {
    "key": "en-US",
    "doc_count": 3
    },
    {
    "key": "nl-NL",
    "doc_count": 1
    }
    ]
    }
    },
    {
    "key": "Combell",
    "doc_count": 3,
    "used_languages": {
    "doc_count_error_upper_bound": 0,
    "sum_other_doc_count": 0,
    "buckets": [
    {
    "key": "nl-NL",
    "doc_count": 3
    }
    ]
    }
    },
    Aggregation
    output

  65. Min Aggregation
    Max Aggregation
    Sum Aggregation
    Avg Aggregation
    Stats Aggregation
    Extended Stats Aggregation
    Value Count Aggregation
    Percentiles Aggregation
    Percentile Ranks Aggregation
    Cardinality Aggregation
    Geo Bounds Aggregation
    Top hits Aggregation
    Scripted Metric Aggregation
    Global Aggregation
    Filter Aggregation
    Filters Aggregation
    Missing Aggregation
    Nested Aggregation
    Reverse nested Aggregation
    Children Aggregation
    Terms Aggregation
    Significant Terms Aggregation
    Range Aggregation
    Date Range Aggregation
    IPv4 Range Aggregation
    Histogram Aggregation
    Date Histogram Aggregation
    Geo Distance Aggregation
    GeoHash grid Aggregation

  66. Managing
    Elasticsearch

  67. Plenty of ways
    … for which we don’t have enough time

  68. Clustering

  69. Single node
    2-node cluster
    3-node cluster

  70. Example config settings
    node.rack: my-location
    node.master: true
    node.data: true
    http.enabled: true
    cluster.name: my-cluster
    node.name: my-node
    index.number_of_shards: 5
    index.number_of_replicas: 1
    discovery.zen.minimum_master_nodes: 2
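
The value `2` for `discovery.zen.minimum_master_nodes` fits a cluster with three master-eligible nodes: the usual guidance is to require a strict majority so that two halves of a partitioned cluster cannot both elect a master. A one-function Python sketch of that rule of thumb (the function name is hypothetical):

```python
def minimum_master_nodes(master_eligible_nodes):
    """Quorum of master-eligible nodes: a strict majority."""
    if master_eligible_nodes < 1:
        raise ValueError("need at least one master-eligible node")
    return master_eligible_nodes // 2 + 1


for nodes in (1, 2, 3, 5):
    print(nodes, "->", minimum_master_nodes(nodes))
```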

  71. GET /_cat

  72. GET /_cat
    =^.^=
    /_cat/allocation
    /_cat/shards
    /_cat/shards/{index}
    /_cat/master
    /_cat/nodes
    /_cat/indices
    /_cat/indices/{index}
    /_cat/segments
    /_cat/segments/{index}
    /_cat/count
    /_cat/count/{index}
    /_cat/recovery
    /_cat/recovery/{index}
    /_cat/health
    /_cat/pending_tasks
    /_cat/aliases
    /_cat/aliases/{alias}
    /_cat/thread_pool
    /_cat/plugins
    /_cat/fielddata
    /_cat/fielddata/{fields}
    Non-JSON
    output

  73. GET /_cat/shards?v
    index    shard prirep state   docs store ip             node
    my-index 2     r      STARTED 6    7.2kb 192.168.10.142 node3
    my-index 2     p      STARTED 6    9.5kb 192.168.10.142 node2
    my-index 0     p      STARTED 4    7.1kb 192.168.10.142 node3
    my-index 0     r      STARTED 4    4.8kb 192.168.10.142 node2
    my-index 3     r      STARTED 5    7.1kb 192.168.10.142 node1
    my-index 3     p      STARTED 5    7.2kb 192.168.10.142 node3
    my-index 1     p      STARTED 1    2.4kb 192.168.10.142 node1
    my-index 1     r      STARTED 1    2.4kb 192.168.10.142 node2
    my-index 4     p      STARTED 5    9.5kb 192.168.10.142 node1
    my-index 4     r      STARTED 5    9.4kb 192.168.10.142 node3
    5 shards & a
    single replica
    by default
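
Because `_cat` output is plain text with a header row (when `?v` is used), it is easy to process in scripts. A minimal Python sketch, using the first rows of the output on this slide; it assumes every column is filled and contains no spaces, which does not hold for every `_cat` endpoint or for unassigned shards.

```python
from collections import Counter

CAT_SHARDS_OUTPUT = """\
index shard prirep state docs store ip node
my-index 2 r STARTED 6 7.2kb 192.168.10.142 node3
my-index 2 p STARTED 6 9.5kb 192.168.10.142 node2
my-index 0 p STARTED 4 7.1kb 192.168.10.142 node3
my-index 0 r STARTED 4 4.8kb 192.168.10.142 node2
"""


def parse_cat(text):
    """Turn whitespace-separated _cat output with a header row into a list of dicts."""
    lines = [line.split() for line in text.splitlines() if line.strip()]
    header, rows = lines[0], lines[1:]
    return [dict(zip(header, row)) for row in rows]


shards = parse_cat(CAT_SHARDS_OUTPUT)
by_role = Counter(row["prirep"] for row in shards)
print(by_role["p"], "primaries,", by_role["r"], "replicas")
```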

  74. GET /_cat/health?v&h=cluster,status,node.total,shards,pri,unassign,init
    cluster   status node.total shards pri unassign init
    mycluster green  3          12     6   0        0
    Cluster health

  75. The ELK stack

  77. Logs → Parse & ship → Store → Visualize

  78. Beats
    •Filebeat
    •Topbeat
    •Packetbeat
    •Winlogbeat

  79. Logs → Ship → Parse → Store → Visualize

  81. Integrating
    Elasticsearch

  82. It’s REST,
    deal with it!

  83. Or just use an API
    PHP, Java, Perl, Python, Ruby, .NET

  84. Try it yourself!
    http://github.com/thijsferyn/elasticsearch_tutorial

  86. https://blog.feryn.eu
    https://talks.feryn.eu
    https://youtube.com/thijsferyn
    https://soundcloud.com/thijsferyn
    https://twitter.com/thijsferyn
    http://itunes.feryn.eu
