ElasticSearch In Action - Codemotion Rome 2016

See https://talks.feryn.eu for more information.

Thijs Feryn

March 18, 2016

Transcript

  1. Elasticsearch in action By Thijs Feryn

  2. Explain in 1 slide

  3. • Full-text search engine
    • NoSQL database
    • Analytics engine
    • Written in Java
    • Lucene based (~Solr)
    • Inverted indices
    • Easy to scale (~Elastic)
    • RESTful interface (HTTP/JSON)
    • Schemaless
    • Real-time
    • ELK stack
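
The "inverted indices" bullet is the idea that makes full-text search fast: rather than scanning every document, the engine keeps a map from each term to the documents that contain it. A minimal sketch in plain Python, assuming naive lowercase whitespace tokenisation (Lucene's real structures are far more elaborate); the function name is hypothetical and the sample titles are borrowed from search results elsewhere in the deck:

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each lowercased term to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index


# Titles borrowed from search results in the deck; ids are their _id values.
docs = {
    828: "Still a lot of work in store",
    3873: "SSL: what is it and how does it work?",
    6238: "PROXY protocol support in Varnish",
}
index = build_inverted_index(docs)

# Looking up a term is a dictionary access, not a scan over all documents.
print(sorted(index["in"]))
print(sorted(index["varnish"]))
```

Note that this naive tokeniser keeps punctuation ("work?" and "work" are different terms), which is exactly the kind of problem analyzers solve.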
  4. Still with me?

  5. Hi, I’m Thijs

  6. I’m @ThijsFeryn on Twitter

  7. I’m an Evangelist At

  8. I’m a board member at

  9. https://www.elastic.co/downloads/elasticsearch

  10. http://localhost:9200

    { "name" : "node-1", "cluster_name" : "elasticsearch", "version" : { "number" : "2.2.0", "build_hash" : "8ff36d139e16f8720f2947ef62c8167a888992fe", "build_timestamp" : "2016-01-27T13:32:39Z", "build_snapshot" : false, "lucene_version" : "5.4.1" }, "tagline" : "You Know, for Search" }
  11. RDBMS vs Elasticsearch

    Database = Index
    Table = Type
    Row = Document

  12. POST /blog

    Confirmation: {"acknowledged":true}

  13. POST /blog/post/6160

    { "language": "en-US", "title": "WordPress 4.4 is available! And these are the new features…", "date": "Tue, 15 Dec 2015 13:28:23 +0000", "author": "Romy", "category": [ "News", "PHP", "Sector news", "Webdesign & development", "CMS", "content management system", "wordpress", "WordPress 4.4" ], "guid": "6160" }
  14. Confirmation

    { "_index": "blog", "_type": "post", "_id": "6160", "_version": 1, "created": true }
  15. GET /blog/post/6160

    Retrieve document by id. The response holds the document & meta data:
    { "_index": "blog", "_type": "post", "_id": "6160", "_version": 1, "found": true, "_source": { "language": "en-US", "title": "WordPress 4.4 is available! And these are the new features…", "date": "Tue, 15 Dec 2015 13:28:23 +0000", "author": "Romy", "category": [ "News", "PHP", "Sector news", "Webdesign & development", "CMS", "content management system", "wordpress", "WordPress 4.4" ], "guid": "6160" } }
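
Because the interface is plain HTTP and JSON, indexing a document and fetching it back needs nothing beyond the Python standard library. The sketch below only builds the two requests; the helper name `es_request` is hypothetical, the base URL assumes the local node shown on the version slide, and sending the requests needs a running node.

```python
import json
import urllib.request

BASE_URL = "http://localhost:9200"  # assumption: local node, default port


def es_request(method, path, body=None):
    """Build (but do not send) an HTTP request for the Elasticsearch REST API."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    return urllib.request.Request(
        BASE_URL + path,
        data=data,
        method=method,
        headers={"Content-Type": "application/json"},
    )


# Index a (shortened) document under an explicit id, then fetch it by id.
index_req = es_request("POST", "/blog/post/6160", {"author": "Romy", "guid": "6160"})
get_req = es_request("GET", "/blog/post/6160")

# To run against a live node:
# with urllib.request.urlopen(get_req) as resp:
#     print(json.load(resp)["_source"])
```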
  16. GET /blog/_mapping

    { "blog": { "mappings": { "post": { "properties": { "author": { "type": "string" }, "category": { "type": "string" }, "date": { "type": "string" }, "guid": { "type": "string" }, "language": { "type": "string" }, "title": { "type": "string" } } } } } }
    Schemaless? Not really … Elasticsearch “guesses” the mapping on insert
  17. Explicit mapping

  18. POST /blog

    { "mappings" : { "post" : { "properties": { "title" : { "type" : "string" }, "date" : { "type" : "date", "format": "E, dd MMM YYYY HH:mm:ss Z" }, "author": { "type": "string" }, "category": { "type": "string" }, "guid": { "type": "integer" } } } } }
    Explicit mapping at index creation time
  19. POST /blog

    { "mappings": { "post": { "properties": { "author": { "type": "string", "index": "not_analyzed" }, "category": { "type": "string", "index": "not_analyzed" }, "date": { "type": "date", "format": "E, dd MMM YYYY HH:mm:ss Z" }, "guid": { "type": "integer" }, "language": { "type": "string", "index": "not_analyzed" }, "title": { "type": "string", "fields": { "en": { "type": "string", "analyzer": "english" }, "nl": { "type": "string", "analyzer": "dutch" }, "raw": { "type": "string", "index": "not_analyzed" } } } } } } }
    Alternative mapping
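
The `title` field in the alternative mapping is a multi-field: one analyzed sub-field per language plus a `not_analyzed` `raw` sub-field for exact matches. When several fields need the same treatment, the fragment can be generated instead of typed. A small sketch in Python; the helper name `multi_language_string` is hypothetical and it targets the ES 2.x `string` type used in this deck:

```python
import json


def multi_language_string(analyzers):
    """Return an ES 2.x string mapping with one analyzed sub-field per
    language and a not_analyzed "raw" sub-field for exact matching."""
    fields = {
        name: {"type": "string", "analyzer": analyzer}
        for name, analyzer in analyzers.items()
    }
    fields["raw"] = {"type": "string", "index": "not_analyzed"}
    return {"type": "string", "fields": fields}


title_mapping = multi_language_string({"en": "english", "nl": "dutch"})
print(json.dumps(title_mapping, indent=2))
```

The sub-fields are then addressed as `title.en`, `title.nl` and `title.raw` in queries and filters.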
  20. What’s with the analyzers?

    "type": "integer" }, "language": { "type": "string", "index": "not_analyzed" }, "title": { "type": "string", "fields": { "en": { "type": "string", "analyzer": "english" }, "nl": { "type": "string", "analyzer": "dutch" }, "raw": { "type": "string", "index": "not_analyzed" } } } } } } }
  21. Analyzed vs non-analyzed

  22. Full-text vs exact value

  23. By default strings are analyzed … unless the mapping says otherwise
  24. Analyzer

    • Character filters: replace characters in the text to be analyzed
    • Tokenizers: break text down into terms
    • Token filters: add, modify or delete tokens
  25. Built-in analyzers

    • Standard
    • Simple
    • Whitespace
    • Stop
    • Keyword
    • Pattern
    • Language
    • Snowball
    • Custom
    Standard tokenizer, lowercase token filter, English stop word token filter
  26. Input: Hey man, how are you doing?

    Standard: hey man how are you doing
    Whitespace: Hey man, how are you doing?
    English: hei man how you do
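
The difference between these outputs comes down to which tokenizer and token filters run. A rough imitation in Python, assuming a regex word tokenizer is close enough to the standard tokenizer for this sentence; the function names and the one-word stop list are illustrative only, and the stemming step of the English analyzer (which yields "hei" and "do") is deliberately left out:

```python
import re

STOP_WORDS = {"are"}  # tiny illustrative subset, not Lucene's English list


def whitespace_analyzer(text):
    """Split on whitespace only; case and punctuation are preserved."""
    return text.split()


def standard_like_analyzer(text):
    """Rough stand-in for the standard analyzer: word tokens, lowercased."""
    return re.findall(r"\w+", text.lower())


def stop_filter(tokens):
    """Drop stop words, one of the steps the English analyzer applies."""
    return [token for token in tokens if token not in STOP_WORDS]


text = "Hey man, how are you doing?"
print(whitespace_analyzer(text))
print(standard_like_analyzer(text))
print(stop_filter(standard_like_analyzer(text)))
```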
  27. POST /blog/post/_search { "fields": ["title"], "query": { "match": { "title":

    "working" } } }
  28. "total": 1, "max_score": 1.7562683, "hits": [ { "_index": "blog", "_type":

    "post", "_id": "2742", "_score": 1.7562683, "fields": { "title": [ "Hosted SharePoint 2010: working efficiently as a team" ] } } ] } }
  29. POST /blog/post/_search { "fields": ["title"], "query": { "match": { "title.en":

    "working" } } }
  30. "failed": 0 }, "hits": { "total": 6, "max_score": 2.4509864, "hits":

    [ { "_index": "blog", "_type": "post", "_id": "828", "_score": 2.4509864, "fields": { "title": [ "Still a lot of work in store" ] } }, { "_index": "blog", "_type": "post", "_id": "3873", "_score": 2.144613, "fields": { "title": [ "SSL: what is it and how does it work?" ] } }, { "_index": "blog",
  31. Search

  32. GET /blog/post/_search?pretty { "took": 2, "timed_out": false, "_shards": { "total":

    5, "successful": 5, "failed": 0 }, "hits": { "total": 963, "max_score": 1, "hits": [ { "_index": "blog", "_type": "post", "_id": "6067", "_score": 1, "_source": { "language": "en-US", "title": "My Combell Power Tips: Registrant Templates and new domain name overview", "date": "Tue, 24 Nov 2015 15:58:48 +0000", "author": "Romy", "category": [ "Combell news", "Domain names", "News", "Tools", "control panel", "domain name", "my combell", "register", "templates" ], "guid": "6067"
  33. Search “lite” vs full query DSL

    Lite: GET /blog/post/_search?pretty
    Full: POST /blog/post/_search?pretty { "query": { "match_all": {} } }
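
The two styles differ only in where the query lives: search "lite" carries it in the URL (the `q` query-string parameter), while the full DSL sends a JSON body. A sketch that builds both forms for a single-field search; the helper names are hypothetical and nothing is sent over the network:

```python
import json
from urllib.parse import urlencode


def lite_search(index, doc_type, field, value):
    """Search "lite": the whole query travels in the URL's q parameter."""
    query_string = urlencode({"q": f"{field}:{value}"})
    return f"/{index}/{doc_type}/_search?{query_string}"


def dsl_search(index, doc_type, field, value):
    """Full query DSL: same endpoint, query expressed as a JSON body."""
    path = f"/{index}/{doc_type}/_search"
    body = {"query": {"match": {field: value}}}
    return path, json.dumps(body)


print(lite_search("blog", "post", "title", "Thijs"))
print(dsl_search("blog", "post", "title", "Thijs"))
```

`urlencode` percent-encodes the colon; the server decodes it back to `title:Thijs`.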
  34. Search “lite” vs full query DSL

    Lite: GET /blog/post/_search?pretty&q=title:Thijs
    Full: POST /blog/post/_search?pretty { "query": { "match": { "title": "Thijs" } } }
  35. POST /blog/post/_count { "query": { "match": { "title": "PROXY protocol support in Varnish" } } }

    Result: 162 posts
    POST /blog/post/_count { "query": { "filtered": { "filter": { "term": { "title.raw": "PROXY protocol support in Varnish" } } } } }
    Result: 1 post
  36. Filter vs Query

  37. Filter

    • Does it match? Yes or no
    • When relevance doesn’t matter
    • Faster & cacheable
    • For non-analyzed data
    Query
    • How well does it match?
    • For full-text search
    • On analyzed/tokenized data
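
A toy model makes the contrast concrete: a term filter compares the exact stored value and answers yes or no, while a match query looks at individual terms and produces a score. The sketch below is an illustration only; the function names are made up, and counting matching terms is a stand-in for the TF/IDF-style scoring a real query uses.

```python
def term_filter(doc, field, value):
    """Filter: exact comparison against the stored value, yes or no."""
    return doc[field] == value


def match_score(doc, field, text):
    """Query: count how many query terms occur in the analyzed field.
    A toy stand-in for relevance, not the real scoring formula."""
    doc_terms = set(doc[field].lower().split())
    return sum(1 for term in text.lower().split() if term in doc_terms)


docs = [
    {"title": "PROXY protocol support in Varnish"},
    {"title": "Thijs Feryn gave a demo of Varnish Cache on WordPress during a Future Insights webinar"},
    {"title": "Still a lot of work in store"},
]
wanted = "PROXY protocol support in Varnish"

print([term_filter(d, "title", wanted) for d in docs])
print([match_score(d, "title", wanted) for d in docs])
```

Only the first document passes the filter, but every document scores above zero on the query because each shares at least one term with it. That is why a match on a full title can hit many posts while a term filter on the raw value hits exactly one.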
  38. Match Query, Multi Match Query, Bool Query, Boosting Query, Common Terms Query, Constant Score Query, Dis Max Query, Filtered Query, Fuzzy Like This Query, Fuzzy Like This Field Query, Function Score Query, Fuzzy Query, GeoShape Query, Has Child Query, Has Parent Query, Ids Query, Indices Query, Match All Query, More Like This Query, Nested Query, Prefix Query, Query String Query, Simple Query String Query, Range Query, Regexp Query, Span First Query, Span Multi Term Query, Span Near Query, Span Not Query, Span Or Query, Span Term Query, Term Query, Terms Query, Top Children Query, Wildcard Query, Minimum Should Match, Multi Term Query Rewrite, Template Query
  39. And Filter, Bool Filter, Exists Filter, Geo Bounding Box Filter, Geo Distance Filter, Geo Distance Range Filter, Geo Polygon Filter, GeoShape Filter, Geohash Cell Filter, Has Child Filter, Has Parent Filter, Ids Filter, Indices Filter, Limit Filter, Match All Filter, Missing Filter, Nested Filter, Not Filter, Or Filter, Prefix Filter, Query Filter, Range Filter, Regexp Filter, Script Filter, Term Filter, Terms Filter, Type Filter
  40. Filter examples

  41. POST /blog/post/_search?pretty { "query": { "filtered": { "filter": { "ids":

    { "values": [231,234,258] } } } } }
  42. POST /blog/_search { "query": { "filtered": { "filter": { "bool":

    { "must" : [ { "term" : { "language" : "en-US" } }, { "range" : { "date" : { "gte" : "2016-01-01", "format" : "yyyy-MM-dd" } } } ], "must_not" : [ { "term" : { "category" : "joomla" } } ], "should" : [ { "term" : { "category" : "Hosting" } }, { "term" : { "category" : "evangelist" } } ] } } } } }
  43. POST /blog/_search?pretty { "query": { "filtered": { "filter": { "prefix":

    { "title.raw": "Combell" } } } } }
  44. POST /cities/city/_search

    { "size": 200, "sort": [ { "city": { "order": "asc" } } ], "query": { "filtered": { "filter": { "geo_distance_range": { "lt": "5km", "location": { "lat": 51.033333, "lon": 2.866667 } } } } } }
    Requires a “geo point” typed field
  45. POST /cities/city/_search

    { "size": 200, "query": { "filtered": { "query": { "match_all": {} }, "filter": { "geo_bounding_box": { "location": { "bottom_left": { "lat": 51.1, "lon": 2.6 }, "top_right": { "lat": 51.2, "lon": 2.7 } } } } } } }
    Draw a “box”. Requires a “geo point” typed field
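
What the two geo filters test can be written down in a few lines. The sketch below uses the haversine formula for the distance check and simple comparisons for the box; both are approximations of the idea rather than what Elasticsearch computes internally, the function names are hypothetical, and the probe points are invented for illustration.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, an approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points in kilometres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def in_bounding_box(lat, lon, bottom_left, top_right):
    """True if the point lies inside the box (no date-line handling)."""
    return (bottom_left["lat"] <= lat <= top_right["lat"]
            and bottom_left["lon"] <= lon <= top_right["lon"])


centre = (51.033333, 2.866667)  # coordinates from the distance filter
box = ({"lat": 51.1, "lon": 2.6}, {"lat": 51.2, "lon": 2.7})  # from the box filter

# Hypothetical probe points, chosen only to exercise both checks.
print(haversine_km(*centre, 51.05, 2.90) < 5)
print(in_bounding_box(51.15, 2.65, *box))
print(in_bounding_box(*centre, *box))
```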
  46. Relevance

  47. POST /blog/_search { "fields": ["title"], "query": { "bool": { "must":

    [ { "match": { "title": "varnish thijs" } }, { "filtered": { "filter": { "term": { "language": "en-US" } } } } ] } } }
  48. The first hit matches both terms, so it is more relevant

    { "took": 3, "timed_out": false, "_shards": { "total": 5, "successful": 5, "failed": 0 }, "hits": { "total": 8, "max_score": 1.984594, "hits": [ { "_index": "blog", "_type": "post", "_id": "4275", "_score": 1.984594, "fields": { "title": [ "Thijs Feryn gave a demo of Varnish Cache on WordPress during a Future Insights webinar" ] } }, { "_index": "blog", "_type": "post", "_id": "6238", "_score": 0.8335616, "fields": { "title": [ "PROXY protocol support in Varnish" ] } }, { "_index": "blog",
  49. POST /blog/_search?_source=false

    { "query": { "filtered": { "filter": { "term": { "category": "PHPBenelux" } } } } }
    Using a filter instead of a query. We don’t care about the source.
  50. No relevance on filters: the score is always 1

    { "took": 3, "timed_out": false, "_shards": { "total": 5, "successful": 5, "failed": 0 }, "hits": { "total": 2, "max_score": 1, "hits": [ { "_index": "blog", "_type": "post", "_id": "6254", "_score": 1 }, { "_index": "blog", "_type": "post", "_id": "11749", "_score": 1 } ] } }
  51. POST /blog/_search

    { "fields": ["title", "category"], "query": { "bool": { "must": [ { "match": { "title": "thijs feryn" } } ], "should": [ { "match": { "category": "Varnish" } } ] } } }
    Only search for “thijs feryn”. Increase relevance if the category contains “Varnish”.
  52. POST /blog/_search

    { "fields": ["title", "category"], "query": { "bool": { "must_not": [ { "filtered": { "filter": { "term": { "author": "Romy" } } } } ], "should": [ { "match": { "category": "Magento" } } ] } } }
    Combining filters & queries. The “should” clause increases relevance.
  53. POST /blog/_search

    { "query": { "bool": { "should": [ { "match": { "title": { "query": "Magento", "boost" : 3 } } }, { "match": { "title": { "query": "Wordpress", "boost" : 2 } } } ] } } }
    Increase relevance with query-time boosting
  54. Multi index multi type

  55. /_search
    /products/_search
    /products/product/_search
    /products,clients/_search
    /pro*/_search
    /pro*,cli*/_search
    /products/product,invoice/_search
    /products/pro*/_search
    /_all/product/_search
    /_all/product,invoice/_search
    /_all/pro*/_search
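
All of these paths follow one pattern: an optional comma-separated list of indices, an optional comma-separated list of types, then `_search`, with `_all` standing in when types are given without an index. A sketch of that rule; the helper name `search_path` is hypothetical and the layout applies to the ES 2.x style of indices and types used in this deck:

```python
def search_path(indices=None, types=None):
    """Build a _search path for any mix of indices and types.

    Names may contain wildcards such as "pro*"; None means "everything".
    """
    parts = []
    if indices:
        parts.append(",".join(indices))
    elif types:
        # Types without an index need the _all placeholder.
        parts.append("_all")
    if types:
        parts.append(",".join(types))
    parts.append("_search")
    return "/" + "/".join(parts)


print(search_path())
print(search_path(["products", "clients"]))
print(search_path(["products"], ["product", "invoice"]))
print(search_path(types=["pro*"]))
```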
  56. Multi “all the things”

  57. Aggregations

  58. Group by on steroids

  59. Aggregations in SQL

    SELECT author, COUNT(guid) FROM blog.post GROUP BY author
    Metric: COUNT(guid)
    Bucket: GROUP BY author
  60. SELECT author, COUNT(guid) FROM blog.post GROUP BY author

    POST /blog/post/_search?pretty&search_type=count { "aggs": { "popular_bloggers": { "terms": { "field": "author" } } } }
    Only aggs, no docs
  61. Aggregation output

    "aggregations": { "popular_bloggers": { "doc_count_error_upper_bound": 0, "sum_other_doc_count": 0, "buckets": [ { "key": "Romy", "doc_count": 415 }, { "key": "Combell", "doc_count": 184 }, { "key": "Tom", "doc_count": 184 }, { "key": "Jimmy Cappaert", "doc_count": 157 }, { "key": "Christophe", "doc_count": 23 } ] } }
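
Conceptually, a terms aggregation is a GROUP BY with a count: each distinct field value becomes a bucket and `doc_count` is the bucket's size. A minimal in-memory imitation in Python; the sample posts are invented, the function name is hypothetical, and the real aggregation runs distributed across shards, which is why its output also carries error bounds.

```python
from collections import Counter


def terms_aggregation(docs, field, size=10):
    """Bucket documents by a field's value and count them, largest first."""
    counts = Counter(doc[field] for doc in docs)
    return [{"key": key, "doc_count": n} for key, n in counts.most_common(size)]


posts = [  # hypothetical sample, not the real blog data
    {"author": "Romy", "language": "en-US"},
    {"author": "Romy", "language": "nl-NL"},
    {"author": "Tom", "language": "nl-NL"},
]
print(terms_aggregation(posts, "author"))
```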
  62. POST /blog/_search { "query": { "match": { "title": "varnish" }

    }, "aggs": { "popular_bloggers": { "terms": { "field": "author", "size": 10 }, "aggs": { "used_languages": { "terms": { "field": "language", "size": 10 } } } } } } Nested multi-group by alongside query
  63. Aggregation output

    "aggregations": { "popular_bloggers": { "doc_count_error_upper_bound": 0, "sum_other_doc_count": 0, "buckets": [ { "key": "Romy", "doc_count": 4, "used_languages": { "doc_count_error_upper_bound": 0, "sum_other_doc_count": 0, "buckets": [ { "key": "en-US", "doc_count": 3 }, { "key": "nl-NL", "doc_count": 1 } ] } }, { "key": "Combell", "doc_count": 3, "used_languages": { "doc_count_error_upper_bound": 0, "sum_other_doc_count": 0, "buckets": [ { "key": "nl-NL", "doc_count": 3 } ] } },
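
Nesting one terms aggregation inside another amounts to grouping by the outer field and then counting the inner field within each group. A small in-memory sketch of that shape; the data is invented and the function name is hypothetical:

```python
from collections import Counter, defaultdict


def nested_terms(docs, outer, inner):
    """Group by `outer`, then count `inner` values within each group."""
    groups = defaultdict(Counter)
    for doc in docs:
        groups[doc[outer]][doc[inner]] += 1
    return {key: dict(counter) for key, counter in groups.items()}


posts = [  # hypothetical sample, not the real blog data
    {"author": "Romy", "language": "en-US"},
    {"author": "Romy", "language": "en-US"},
    {"author": "Romy", "language": "nl-NL"},
    {"author": "Combell", "language": "nl-NL"},
]
print(nested_terms(posts, "author", "language"))
```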
  64. Min Aggregation, Max Aggregation, Sum Aggregation, Avg Aggregation, Stats Aggregation, Extended Stats Aggregation, Value Count Aggregation, Percentiles Aggregation, Percentile Ranks Aggregation, Cardinality Aggregation, Geo Bounds Aggregation, Top hits Aggregation, Scripted Metric Aggregation, Global Aggregation, Filter Aggregation, Filters Aggregation, Missing Aggregation, Nested Aggregation, Reverse nested Aggregation, Children Aggregation, Terms Aggregation, Significant Terms Aggregation, Range Aggregation, Date Range Aggregation, IPv4 Range Aggregation, Histogram Aggregation, Date Histogram Aggregation, Geo Distance Aggregation, GeoHash grid Aggregation
  65. Managing Elasticsearch

  66. Plenty of ways … for which we don’t have enough time
  67. Clustering

  68. Single node 2 node cluster 3 node cluster

  69. Example config settings

    node.rack: my-location
    node.master: true
    node.data: true
    http.enabled: true
    cluster.name: my-cluster
    node.name: my-node
    index.number_of_shards: 5
    index.number_of_replicas: 1
    discovery.zen.minimum_master_nodes: 2
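
The value 2 for `discovery.zen.minimum_master_nodes` fits a cluster with three master-eligible nodes: the usual guidance for this setting is a quorum, (master-eligible nodes / 2) + 1 with integer division, so that two halves of a partitioned cluster cannot both elect a master. As a quick check in Python (the function name is just for illustration):

```python
def minimum_master_nodes(master_eligible_nodes):
    """Quorum of master-eligible nodes: floor(n / 2) + 1."""
    if master_eligible_nodes < 1:
        raise ValueError("need at least one master-eligible node")
    return master_eligible_nodes // 2 + 1


for nodes in (1, 2, 3, 5):
    print(nodes, "->", minimum_master_nodes(nodes))
```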
  70. GET /_cat

  71. GET /_cat

    =^.^=
    /_cat/allocation
    /_cat/shards
    /_cat/shards/{index}
    /_cat/master
    /_cat/nodes
    /_cat/indices
    /_cat/indices/{index}
    /_cat/segments
    /_cat/segments/{index}
    /_cat/count
    /_cat/count/{index}
    /_cat/recovery
    /_cat/recovery/{index}
    /_cat/health
    /_cat/pending_tasks
    /_cat/aliases
    /_cat/aliases/{alias}
    /_cat/thread_pool
    /_cat/plugins
    /_cat/fielddata
    /_cat/fielddata/{fields}
    Non-JSON output
  72. GET /_cat/shards?v

    index    shard prirep state   docs store ip             node
    my-index 2     r      STARTED 6    7.2kb 192.168.10.142 node3
    my-index 2     p      STARTED 6    9.5kb 192.168.10.142 node2
    my-index 0     p      STARTED 4    7.1kb 192.168.10.142 node3
    my-index 0     r      STARTED 4    4.8kb 192.168.10.142 node2
    my-index 3     r      STARTED 5    7.1kb 192.168.10.142 node1
    my-index 3     p      STARTED 5    7.2kb 192.168.10.142 node3
    my-index 1     p      STARTED 1    2.4kb 192.168.10.142 node1
    my-index 1     r      STARTED 1    2.4kb 192.168.10.142 node2
    my-index 4     p      STARTED 5    9.5kb 192.168.10.142 node1
    my-index 4     r      STARTED 5    9.4kb 192.168.10.142 node3
    5 shards & a single replica by default
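
The `_cat` output is meant for humans, but with the `?v` header it is still easy to post-process. A sketch that turns it into dictionaries, assuming every row has a value in every column (rows for unassigned shards can have empty cells and would need sturdier parsing); the function name is hypothetical and the sample is the first two rows of the listing above:

```python
def parse_cat(text):
    """Turn whitespace-separated _cat output (with ?v header) into dicts."""
    lines = [line.split() for line in text.strip().splitlines()]
    header, rows = lines[0], lines[1:]
    return [dict(zip(header, row)) for row in rows]


sample = """\
index    shard prirep state   docs store ip             node
my-index 2     r      STARTED 6    7.2kb 192.168.10.142 node3
my-index 2     p      STARTED 6    9.5kb 192.168.10.142 node2
"""
shards = parse_cat(sample)
primaries = [s for s in shards if s["prirep"] == "p"]
print(primaries[0]["node"])
```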
  73. GET /_cat/health?v&h=cluster,status,node.total,shards,pri,unassign,init

    cluster   status node.total shards pri unassign init
    mycluster green  3          12     6   0        0
    Cluster health
  74. The ELK stack

  76. Logs → Parse & ship → Store → Visualize

  77. Beats •Filebeat •Topbeat •Packetbeat •Winlogbeat

  78. Logs → Ship → Parse → Store → Visualize

  80. Integrating Elasticsearch

  81. It’s REST, deal with it!

  82. Or just use an API: PHP, Java, Perl, Python, Ruby, .NET
  83. Try it yourself! http://github.com/thijsferyn/elasticsearch_tutorial

  85. https://blog.feryn.eu https://talks.feryn.eu https://youtube.com/thijsferyn https://soundcloud.com/thijsferyn https://twitter.com/thijsferyn http://itunes.feryn.eu