
ElasticSearch In Action - Techorama 2016

Slides for my ElasticSearch talk at Techorama 2016 in Mechelen (Belgium). http://www.techorama.be

Thijs Feryn

May 03, 2016

Transcript

  1. •Full-text search engine
    •NoSQL database
    •Analytics engine
    •Written in Java
    •Lucene based (~Solr)
    •Inverted indices
    •Easy to scale (~Elastic)
    •RESTful interface (HTTP/JSON)
    •Schemaless
    •Real-time
    •ELK stack
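
The "inverted indices" bullet can be illustrated with a toy example. The sketch below is not how Lucene stores its index; it only shows the idea of mapping each token to the documents that contain it. The two sample documents and the function name are assumptions made for illustration.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each lowercased token to the sorted list of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in text.lower().split():
            index[token].add(doc_id)
    return {token: sorted(ids) for token, ids in index.items()}


# Hypothetical two-document corpus, not taken from the talk.
docs = {
    1: "varnish cache on wordpress",
    2: "proxy protocol support in varnish",
}
index = build_inverted_index(docs)
print(index["varnish"])    # ids of the documents containing "varnish"
print(index["wordpress"])  # ids of the documents containing "wordpress"
```

A lookup is then a dictionary access per search term instead of a scan over every document.
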
  2. { "name" : "node-1", "cluster_name" : "elasticsearch", "version" : {

    "number" : "2.2.0", "build_hash" : "8ff36d139e16f8720f2947ef62c8167a888992fe", "build_timestamp" : "2016-01-27T13:32:39Z", "build_snapshot" : false, "lucene_version" : "5.4.1" }, "tagline" : "You Know, for Search" } http://localhost:9200
  3. POST /blog/post/6160 { "language": "en-US", "title": "WordPress 4.4 is available! And

    these are the new features…", "date": "Tue, 15 Dec 2015 13:28:23 +0000", "author": "Romy", "category": [ "News", "PHP", "Sector news", "Webdesign & development", "CMS", "content management system", "wordpress", "WordPress 4.4" ], "guid": "6160" }
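
Because the interface is plain HTTP and JSON, any HTTP client can index a document. The sketch below only builds the request with Python's standard library and does not send it. The base URL is the local node address from the earlier slide, the document is shortened, and the helper name is an assumption.

```python
import json
import urllib.request

# Assumed local node address, as shown on the earlier slide.
BASE_URL = "http://localhost:9200"


def build_index_request(index, doc_type, doc_id, document):
    """Build (but do not send) the HTTP request that indexes one document."""
    url = f"{BASE_URL}/{index}/{doc_type}/{doc_id}"
    body = json.dumps(document).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Shortened version of the document on the slide.
post = {"language": "en-US", "author": "Romy", "guid": "6160"}
request = build_index_request("blog", "post", 6160, post)
print(request.get_method(), request.full_url)
# To actually send it against a running node: urllib.request.urlopen(request)
```
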
  4. GET /blog/post/6160 { "_index": "blog", "_type": "post", "_id": "6160", "_version":

    1, "found": true, "_source": { "language": "en-US", "title": "WordPress 4.4 is available! And these are the new features…", "date": "Tue, 15 Dec 2015 13:28:23 +0000", "author": "Romy", "category": [ "News", "PHP", "Sector news", "Webdesign & development", "CMS", "content management system", "wordpress", "WordPress 4.4" ], "guid": "6160" } } Retrieve document by id Document & meta data
  5. GET /blog/_mapping { "blog": { "mappings": { "post": { "properties":

    { "author": { "type": "string" }, "category": { "type": "string" }, "date": { "type": "string" }, "guid": { "type": "string" }, "language": { "type": "string" }, "title": { "type": "string" } } } } } } Schemaless? Not really … “Guesses” mapping on insert
  6. POST /blog { "mappings" : { "post" : { "properties":

    { "title" : { "type" : "string" }, "date" : { "type" : "date", "format": "E, dd MMM YYYY HH:mm:ss Z" }, "author": { "type": "string" }, "category": { "type": "string" }, "guid": { "type": "integer" } } } } } Explicit mapping at index creation time
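
The explicit `format` matters because the blog's dates are RFC 2822 style strings, which the guessed mapping on the previous slide stored as plain strings. As a rough Python analogue (not the parser Elasticsearch uses), the sketch below shows that the same value becomes a real timestamp once the parser knows the format.

```python
from email.utils import parsedate_to_datetime

raw = "Tue, 15 Dec 2015 13:28:23 +0000"  # "date" value from the indexed post

# parsedate_to_datetime understands RFC 2822 style dates such as this one.
parsed = parsedate_to_datetime(raw)
print(parsed.isoformat())
print(parsed.year, parsed.month, parsed.day)
```

Once the field is a real date, range filters such as "everything since 2016-01-01" become possible.
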
  7. POST /blog { "mappings": { "post": { "properties": { "author":

    { "type": "string", "index": "not_analyzed" }, "category": { "type": "string", "index": "not_analyzed" }, "date": { "type": "date", "format": "E, dd MMM YYYY HH:mm:ss Z" }, "guid": { "type": "integer" }, "language": { "type": "string", "index": "not_analyzed" }, "title": { "type": "string", "fields": { "en": { "type": "string", "analyzer": "english" }, "nl": { "type": "string", "analyzer": "dutch" }, "raw": { "type": "string", "index": "not_analyzed" } } } } } } } Alternative mapping
  8. "type": "integer" }, "language": { "type": "string", "index": "not_analyzed" },

    "title": { "type": "string", "fields": { "en": { "type": "string", "analyzer": "english" }, "nl": { "type": "string", "analyzer": "dutch" }, "raw": { "type": "string", "index": "not_analyzed" } } } } } } } What’s with the analyzers?
  9. Built-in analyzers
    •Standard
    •Simple
    •Whitespace
    •Stop
    •Keyword
    •Pattern
    •Language
    •Snowball
    •Custom
    Standard tokenizer, Lowercase token filter, English stop word token filter
  10. Input: Hey man, how are you doing?
    Standard: hey man how are you doing
    Whitespace: Hey man, how are you doing?
    English: hei man how you do
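
The difference between the analyzers can be imitated in a few lines of Python. This is a rough sketch, not the Lucene implementation: the function names and the stop word subset are assumptions, and stemming (which turns "hey" into "hei" and "doing" into "do" in the English analyzer) is left out.

```python
import re

# Small subset of an English stop word list; an assumption for this sketch.
STOP_WORDS = {"a", "an", "and", "are", "is", "the", "to"}


def standard_like(text):
    """Lowercase and split on non-word characters (rough stand-in for Standard)."""
    return re.findall(r"\w+", text.lower())


def whitespace_like(text):
    """Split on whitespace only, keeping case and punctuation."""
    return text.split()


def english_like(text):
    """Standard-like tokens minus stop words; stemming is deliberately left out."""
    return [t for t in standard_like(text) if t not in STOP_WORDS]


sentence = "Hey man, how are you doing?"
print(standard_like(sentence))
print(whitespace_like(sentence))
print(english_like(sentence))
```
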
  11. "total": 1, "max_score": 1.7562683, "hits": [ { "_index": "blog", "_type":

    "post", "_id": "2742", "_score": 1.7562683, "fields": { "title": [ "Hosted SharePoint 2010: working efficiently as a team" ] } } ] } }
  12. "failed": 0 }, "hits": { "total": 6, "max_score": 2.4509864, "hits":

    [ { "_index": "blog", "_type": "post", "_id": "828", "_score": 2.4509864, "fields": { "title": [ "Still a lot of work in store" ] } }, { "_index": "blog", "_type": "post", "_id": "3873", "_score": 2.144613, "fields": { "title": [ "SSL: what is it and how does it work?" ] } }, { "_index": "blog",
  13. GET /blog/post/_search?pretty { "took": 2, "timed_out": false, "_shards": { "total":

    5, "successful": 5, "failed": 0 }, "hits": { "total": 963, "max_score": 1, "hits": [ { "_index": "blog", "_type": "post", "_id": "6067", "_score": 1, "_source": { "language": "en-US", "title": "My Combell Power Tips: Registrant Templates and new domain name overview", "date": "Tue, 24 Nov 2015 15:58:48 +0000", "author": "Romy", "category": [ "Combell news", "Domain names", "News", "Tools", "control panel", "domain name", "my combell", "register", "templates" ], "guid": "6067"
  14. POST /blog/post/_count { "query": { "match": { "title": "PROXY protocol support in Varnish" } } } → 162 posts
    POST /blog/post/_count { "query": { "filtered": { "filter": { "term": { "title.raw": "PROXY protocol support in Varnish" } } } } } → 1 post
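
The gap between 162 posts and 1 post comes from analysis: a `match` query on the analyzed `title` field accepts any post that shares at least one token with the query text (including common words such as "in"), while a `term` filter on the non-analyzed `title.raw` field needs the exact string. The sketch below imitates both behaviours on three titles that appear in the talk; the tokenizer and function names are simplifications made for illustration.

```python
import re


def tokens(text):
    """Rough stand-in for an analyzer: the set of lowercase word tokens."""
    return set(re.findall(r"\w+", text.lower()))


def match_count(titles, query):
    """Count titles sharing at least one token with the query (OR semantics)."""
    wanted = tokens(query)
    return sum(1 for title in titles if tokens(title) & wanted)


def term_count(titles, value):
    """Count titles equal to the value, like a term filter on a raw field."""
    return sum(1 for title in titles if title == value)


titles = [
    "PROXY protocol support in Varnish",
    "Still a lot of work in store",           # shares only the token "in"
    "SSL: what is it and how does it work?",  # shares no token
]
query = "PROXY protocol support in Varnish"
print(match_count(titles, query), term_count(titles, query))
```
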
  15. Filter
    •Does it match? Yes or no
    •When relevance doesn’t matter
    •Faster & cacheable
    •For non-analyzed data
    Query
    •How well does it match?
    •For full-text search
    •On analyzed/tokenized data
  16. Match Query, Multi Match Query, Bool Query, Boosting Query, Common Terms Query, Constant Score Query, Dis Max Query, Filtered Query, Fuzzy Like This Query, Fuzzy Like This Field Query, Function Score Query, Fuzzy Query, GeoShape Query, Has Child Query, Has Parent Query, Ids Query, Indices Query, Match All Query, More Like This Query, Nested Query, Prefix Query, Query String Query, Simple Query String Query, Range Query, Regexp Query, Span First Query, Span Multi Term Query, Span Near Query, Span Not Query, Span Or Query, Span Term Query, Term Query, Terms Query, Top Children Query, Wildcard Query, Minimum Should Match, Multi Term Query Rewrite, Template Query
  17. And Filter, Bool Filter, Exists Filter, Geo Bounding Box Filter, Geo Distance Filter, Geo Distance Range Filter, Geo Polygon Filter, GeoShape Filter, Geohash Cell Filter, Has Child Filter, Has Parent Filter, Ids Filter, Indices Filter, Limit Filter, Match All Filter, Missing Filter, Nested Filter, Not Filter, Or Filter, Prefix Filter, Query Filter, Range Filter, Regexp Filter, Script Filter, Term Filter, Terms Filter, Type Filter
  18. POST /blog/_search { "query": { "filtered": { "filter": { "bool":

    { "must" : [ { "term" : { "language" : "en-US" } }, { "range" : { "date" : { "gte" : "2016-01-01", "format" : "yyyy-MM-dd" } } } ], "must_not" : [ { "term" : { "category" : "joomla" } } ], "should" : [ { "term" : { "category" : "Hosting" } }, { "term" : { "category" : "evangelist" } } ] } } } } }
  19. POST /cities/city/_search { "size": 200, "sort": [ { "city": {

    "order": "asc" } } ], "query": { "filtered": { "filter": { "geo_distance_range": { "lt": "5km", "location": { "lat": 51.033333, "lon": 2.866667 } } } } } } Requires “geo point” typed field
  20. POST /cities/city/_search { "size": 200, "query": { "filtered": { "query":

    { "match_all": {} }, "filter": { "geo_bounding_box": { "location": { "bottom_left": { "lat": 51.1, "lon": 2.6 }, "top_right": { "lat": 51.2, "lon": 2.7 } } } } } } } Requires “geo point” typed field Draw a “box”
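
The two geo filters above (distance range and bounding box) can be pictured with plain geometry. The sketch below uses the haversine formula and a simple corner comparison; it is an approximation for illustration, not the distance computation Elasticsearch performs, and it ignores boxes that cross the antimeridian. The sample city coordinates are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an assumption of this sketch


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def within(center, point, limit_km):
    """True when point lies closer than limit_km to center (both are (lat, lon))."""
    return haversine_km(*center, *point) < limit_km


def in_bounding_box(point, bottom_left, top_right):
    """True when point (lat, lon) lies inside the box spanned by the two corners."""
    lat, lon = point
    return (bottom_left[0] <= lat <= top_right[0]
            and bottom_left[1] <= lon <= top_right[1])


center = (51.033333, 2.866667)   # centre point from the distance slide
nearby = (51.05, 2.88)           # hypothetical city location
far_away = (51.2, 3.2)           # hypothetical city location
print(within(center, nearby, 5), within(center, far_away, 5))

bottom_left, top_right = (51.1, 2.6), (51.2, 2.7)  # corners from the box slide
print(in_bounding_box((51.15, 2.65), bottom_left, top_right))
print(in_bounding_box(center, bottom_left, top_right))
```
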
  21. POST /blog/_search { "fields": ["title"], "query": { "bool": { "must":

    [ { "match": { "title": "varnish thijs" } }, { "filtered": { "filter": { "term": { "language": "en-US" } } } } ] } } }
  22. { "took": 3, "timed_out": false, "_shards": { "total": 5, "successful":

    5, "failed": 0 }, "hits": { "total": 8, "max_score": 1.984594, "hits": [ { "_index": "blog", "_type": "post", "_id": "4275", "_score": 1.984594, "fields": { "title": [ "Thijs Feryn gave a demo of Varnish Cache on WordPress during a Future Insights webinar" ] } }, { "_index": "blog", "_type": "post", "_id": "6238", "_score": 0.8335616, "fields": { "title": [ "PROXY protocol support in Varnish" ] } }, { "_index": "blog", Hits both terms. More relevant
  23. POST /blog/_search?_source=false { "query": { "filtered": { "filter": { "term":

    { "category": "PHPBenelux" } } } } } Using a filter instead of a query We don’t care about the source
  24. { "took": 3, "timed_out": false, "_shards": { "total": 5, "successful":

    5, "failed": 0 }, "hits": { "total": 2, "max_score": 1, "hits": [ { "_index": "blog", "_type": "post", "_id": "6254", "_score": 1 }, { "_index": "blog", "_type": "post", "_id": "11749", "_score": 1 } ] } } No relevance on filters Score is always 1
  25. POST /blog/_search { "fields": ["title", "category"], "query": { "bool": {

    "must": [ { "match": { "title": "thijs feryn" } } ], "should": [ { "match": { "category": "Varnish" } } ] } } } Only search for “thijs feryn” Increase relevance if category contains “Varnish”
  26. POST /blog/_search { "fields": ["title", "category"], "query": { "bool": {

    "must_not": [ { "filtered": { "filter": { "term": { "author": "Romy" } } } } ], "should": [ { "match": { "category": "Magento" } } ] } } } Increase relevance Combining filters & queries
  27. POST /blog/_search { "query": { "bool": { "should": [ {

    "match": { "title": { "query": "Magento", "boost" : 3 } } }, { "match": { "title": { "query": "Wordpress", "boost" : 2 } } } ] } } } Increase relevance Query-time boosting
  28. SELECT author, COUNT(guid) FROM blog.post GROUP BY author
    POST /blog/post/_search?pretty&search_type=count { "aggs": { "popular_bloggers": { "terms": { "field": "author" } } } } Only aggs, no docs
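
As the SQL comparison suggests, a `terms` aggregation behaves like a GROUP BY with a count per group. A minimal Python sketch of that idea follows; the sample posts are hypothetical and the function name is an assumption.

```python
from collections import Counter

# Hypothetical posts; only the author names come from the talk.
posts = [
    {"author": "Romy", "guid": 1},
    {"author": "Tom", "guid": 2},
    {"author": "Romy", "guid": 3},
]


def terms_aggregation(docs, field):
    """Return buckets sorted by descending doc_count, like a terms aggregation."""
    counts = Counter(doc[field] for doc in docs)
    return [{"key": key, "doc_count": n} for key, n in counts.most_common()]


print(terms_aggregation(posts, "author"))
```
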
  29. "aggregations": { "popular_bloggers": { "doc_count_error_upper_bound": 0, "sum_other_doc_count": 0, "buckets": [

    { "key": "Romy", "doc_count": 415 }, { "key": "Combell", "doc_count": 184 }, { "key": "Tom", "doc_count": 184 }, { "key": "Jimmy Cappaert", "doc_count": 157 }, { "key": "Christophe", "doc_count": 23 } ] } } Aggregation output
  30. POST /blog/_search { "query": { "match": { "title": "varnish" }

    }, "aggs": { "popular_bloggers": { "terms": { "field": "author", "size": 10 }, "aggs": { "used_languages": { "terms": { "field": "language", "size": 10 } } } } } } Nested multi-group by alongside query
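
Nesting one `terms` aggregation inside another is comparable to grouping by two columns: first by author, then by language within each author. The sketch below shows that shape in Python on hypothetical hits; it does not reproduce the exact response format.

```python
from collections import Counter, defaultdict

# Hypothetical matching posts, standing in for the hits of the "varnish" query.
hits = [
    {"author": "Romy", "language": "en-US"},
    {"author": "Romy", "language": "nl-NL"},
    {"author": "Romy", "language": "en-US"},
    {"author": "Combell", "language": "nl-NL"},
]


def nested_terms(docs, outer, inner):
    """Group by the outer field, then count inner field values per group."""
    groups = defaultdict(Counter)
    for doc in docs:
        groups[doc[outer]][doc[inner]] += 1
    return {key: dict(counter) for key, counter in groups.items()}


print(nested_terms(hits, "author", "language"))
```
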
  31. "aggregations": { "popular_bloggers": { "doc_count_error_upper_bound": 0, "sum_other_doc_count": 0, "buckets": [

    { "key": "Romy", "doc_count": 4, "used_languages": { "doc_count_error_upper_bound": 0, "sum_other_doc_count": 0, "buckets": [ { "key": "en-US", "doc_count": 3 }, { "key": "nl-NL", "doc_count": 1 } ] } }, { "key": "Combell", "doc_count": 3, "used_languages": { "doc_count_error_upper_bound": 0, "sum_other_doc_count": 0, "buckets": [ { "key": "nl-NL", "doc_count": 3 } ] } }, Aggregation output
  32. Min Aggregation, Max Aggregation, Sum Aggregation, Avg Aggregation, Stats Aggregation, Extended Stats Aggregation, Value Count Aggregation, Percentiles Aggregation, Percentile Ranks Aggregation, Cardinality Aggregation, Geo Bounds Aggregation, Top hits Aggregation, Scripted Metric Aggregation, Global Aggregation, Filter Aggregation, Filters Aggregation, Missing Aggregation, Nested Aggregation, Reverse nested Aggregation, Children Aggregation, Terms Aggregation, Significant Terms Aggregation, Range Aggregation, Date Range Aggregation, IPv4 Range Aggregation, Histogram Aggregation, Date Histogram Aggregation, Geo Distance Aggregation, GeoHash grid Aggregation
  33. Example config settings
    node.rack: my-location
    node.master: true
    node.data: true
    http.enabled: true
    cluster.name: my-cluster
    node.name: my-node
    index.number_of_shards: 5
    index.number_of_replicas: 1
    discovery.zen.minimum_master_nodes: 2
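
The value of `discovery.zen.minimum_master_nodes` is usually derived as a quorum of the master-eligible nodes, (n / 2) + 1 with integer division, to avoid split brain. The small sketch below computes that quorum. The talk does not state how many master-eligible nodes the cluster has; the value 2 would fit a cluster with three of them.

```python
def minimum_master_nodes(master_eligible_nodes):
    """Quorum of master-eligible nodes: floor(n / 2) + 1."""
    if master_eligible_nodes < 1:
        raise ValueError("need at least one master-eligible node")
    return master_eligible_nodes // 2 + 1


for n in (1, 2, 3, 5):
    print(n, minimum_master_nodes(n))
```
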
  34. GET /_cat
    =^.^=
    /_cat/allocation
    /_cat/shards
    /_cat/shards/{index}
    /_cat/master
    /_cat/nodes
    /_cat/indices
    /_cat/indices/{index}
    /_cat/segments
    /_cat/segments/{index}
    /_cat/count
    /_cat/count/{index}
    /_cat/recovery
    /_cat/recovery/{index}
    /_cat/health
    /_cat/pending_tasks
    /_cat/aliases
    /_cat/aliases/{alias}
    /_cat/thread_pool
    /_cat/plugins
    /_cat/fielddata
    /_cat/fielddata/{fields}
    Non-JSON output
  35. GET /_cat/shards?v
    index    shard prirep state   docs store ip             node
    my-index 2     r      STARTED    6 7.2kb 192.168.10.142 node3
    my-index 2     p      STARTED    6 9.5kb 192.168.10.142 node2
    my-index 0     p      STARTED    4 7.1kb 192.168.10.142 node3
    my-index 0     r      STARTED    4 4.8kb 192.168.10.142 node2
    my-index 3     r      STARTED    5 7.1kb 192.168.10.142 node1
    my-index 3     p      STARTED    5 7.2kb 192.168.10.142 node3
    my-index 1     p      STARTED    1 2.4kb 192.168.10.142 node1
    my-index 1     r      STARTED    1 2.4kb 192.168.10.142 node2
    my-index 4     p      STARTED    5 9.5kb 192.168.10.142 node1
    my-index 4     r      STARTED    5 9.4kb 192.168.10.142 node3
    5 shards & a single replica by default
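
Because the `_cat` output is whitespace-separated text, it is easy to post-process. The sketch below parses the rows from this slide, counts the documents held by primary shards, and checks that each shard's primary and replica sit on different nodes. The column names follow the header on the slide; the helper name is an assumption.

```python
CAT_SHARDS = """\
my-index 2 r STARTED 6 7.2kb 192.168.10.142 node3
my-index 2 p STARTED 6 9.5kb 192.168.10.142 node2
my-index 0 p STARTED 4 7.1kb 192.168.10.142 node3
my-index 0 r STARTED 4 4.8kb 192.168.10.142 node2
my-index 3 r STARTED 5 7.1kb 192.168.10.142 node1
my-index 3 p STARTED 5 7.2kb 192.168.10.142 node3
my-index 1 p STARTED 1 2.4kb 192.168.10.142 node1
my-index 1 r STARTED 1 2.4kb 192.168.10.142 node2
my-index 4 p STARTED 5 9.5kb 192.168.10.142 node1
my-index 4 r STARTED 5 9.4kb 192.168.10.142 node3
"""

COLUMNS = ["index", "shard", "prirep", "state", "docs", "store", "ip", "node"]


def parse_cat_shards(text):
    """Turn whitespace-separated _cat/shards rows into dictionaries."""
    return [dict(zip(COLUMNS, line.split()))
            for line in text.splitlines() if line.strip()]


rows = parse_cat_shards(CAT_SHARDS)
primaries = [row for row in rows if row["prirep"] == "p"]
total_docs = sum(int(row["docs"]) for row in primaries)

# For every shard number, collect the nodes holding a copy of it.
nodes_per_shard = {}
for row in rows:
    nodes_per_shard.setdefault(row["shard"], set()).add(row["node"])

print(len(primaries), total_docs)
print(all(len(nodes) == 2 for nodes in nodes_per_shard.values()))
```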