
ElasticSearch In Action - Confoo 2017

Slides for my ElasticSearch presentation at Confoo 2017 in Montreal.

More details: https://talks.feryn.eu/talks/162/elasticsearch-in-action-confoo-montreal

Thijs Feryn

March 10, 2017

Transcript

  1. • Full-text search engine
    • NoSQL database
    • Analytics engine
    • Written in Java
    • Lucene based (~Solr)
    • Inverted indices
    • Easy to scale (~Elastic)
    • RESTful interface (HTTP/JSON)
    • Schemaless
    • Real-time
    • ELK stack
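
The "inverted indices" bullet can be illustrated with a toy sketch. This is not how Lucene stores its index; it only shows the idea of mapping each token to the documents that contain it. The tokenizer (a plain `\w+` regex) and the function name `build_inverted_index` are assumptions made for this example; the titles and ids are taken from later slides.

```python
import re
from collections import defaultdict


def build_inverted_index(docs):
    """Map each lowercased token to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        # Assumption: a crude tokenizer, split on anything that is not a word character.
        for token in re.findall(r"\w+", text.lower()):
            index[token].add(doc_id)
    return index


docs = {
    6238: "PROXY protocol support in Varnish",
    3873: "SSL: what is it and how does it work?",
    2742: "Hosted SharePoint 2010: working efficiently as a team",
}
index = build_inverted_index(docs)

print(sorted(index["varnish"]))  # [6238]
print(sorted(index["work"]))     # [3873]
```

Note that "work" does not find document 2742, which only contains "working". Closing that gap is the job of the language analyzers covered further on.
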
  2. http://localhost:9200

    { "cluster_name": "elasticsearch", "cluster_uuid": "KKD4RjtCTTWoRomBDXyCDA", "name": "y3qufp6", "tagline": "You Know, for Search", "version": { "build_date": "2017-02-09T22:05:32.386Z", "build_hash": "db0d481", "build_snapshot": false, "lucene_version": "6.4.1", "number": "5.2.1" } }
  3. POST /blog/post/6576

    { "language":"en", "title":"Combell revamps its hosting range: fewer packages, more features!", "date":"Thu, 08 Dec 2016 15:35:59 +0000", "author":"Combell", "category":[ "Combell news", "Hosting", "Combell", "hosting" ], "guid":"6576" }
  4. Confirmation

    { "_index": "blog", "_type": "post", "_id": "6576", "_version": 1, "result": "created", "_shards": { "total": 2, "successful": 1, "failed": 0 }, "created": true }
  5. Retrieve document by id

    GET /blog/post/6576

    Document & meta data

    { "_index": "blog", "_type": "post", "_id": "6576", "_version": 1, "found": true, "_source": { "language": "en", "title": "Combell revamps its hosting range: fewer packages, more features!", "date": "Thu, 08 Dec 2016 15:35:59 +0000", "author": "Combell", "category": [ "Combell news", "Hosting", "Combell", "hosting" ], "guid": "6576" } }
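
The index and retrieve calls above are plain HTTP, so any client will do. Here is a minimal sketch using only Python's standard library. It builds the requests without sending them. The base URL is the local node from slide 2; the helper names `index_request` and `get_request` are made up for this example.

```python
import json
import urllib.request

BASE_URL = "http://localhost:9200"  # assumption: a local single node, as on slide 2


def index_request(index, doc_type, doc_id, document):
    """Build (but do not send) the POST that stores a document under an explicit id."""
    url = f"{BASE_URL}/{index}/{doc_type}/{doc_id}"
    body = json.dumps(document).encode("utf-8")
    return urllib.request.Request(
        url, data=body, method="POST",
        headers={"Content-Type": "application/json"},
    )


def get_request(index, doc_type, doc_id):
    """Build (but do not send) the GET that retrieves a document by id."""
    return urllib.request.Request(f"{BASE_URL}/{index}/{doc_type}/{doc_id}", method="GET")


post = {
    "language": "en",
    "title": "Combell revamps its hosting range: fewer packages, more features!",
    "date": "Thu, 08 Dec 2016 15:35:59 +0000",
    "author": "Combell",
    "category": ["Combell news", "Hosting", "Combell", "hosting"],
    "guid": "6576",
}
req = index_request("blog", "post", "6576", post)
print(req.get_method(), req.full_url)  # POST http://localhost:9200/blog/post/6576
```

To actually send a request, pass it to `urllib.request.urlopen(req)`; that step needs a running node and is left out here.
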
  6. Schemaless? Not really … “Guesses” mapping on insert

    GET /blog/_mapping

    { "blogger": { "mappings": { "post": { "properties": { "author": { "type": "text", "fields": { "keyword": { "type": "keyword", "ignore_above": 256 } } }, "category": { "type": "text", "fields": { "keyword": { "type": "keyword", "ignore_above": 256 } } }, "date": { "type": "text", "fields": { "keyword": { "type": "keyword", "ignore_above": 256 } } }, "guid": { "type": "text", "fields": { "keyword": { "type": "keyword", "ignore_above": 256
  7. 0 hits, analyzed data:

    POST /blog/_search { "_source": "date", "query": { "wildcard": { "date": { "value": "*11:50*" } } } }

    2 hits, non-analyzed data:

    POST /blog/_search { "_source": "date", "query": { "wildcard": { "date.keyword": { "value": "*11:50*" } } } }
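
Why the first wildcard finds nothing can be sketched in a few lines. A wildcard pattern is compared against individual indexed terms, and an analyzed `text` field is broken into tokens first. The tokenizer below is a rough stand-in (a `\w+` regex), not the real standard analyzer, and the sample date value is hypothetical.

```python
import re
from fnmatch import fnmatchcase


def analyzed_tokens(value):
    """Rough stand-in for an analyzed text field: lowercase word tokens."""
    return re.findall(r"\w+", value.lower())


def wildcard_on_text(value, pattern):
    """A wildcard is tested against each token, never against the whole string."""
    return any(fnmatchcase(token, pattern) for token in analyzed_tokens(value))


def wildcard_on_keyword(value, pattern):
    """A keyword field is indexed as one untouched term."""
    return fnmatchcase(value, pattern)


date = "Thu, 08 Dec 2016 11:50:59 +0000"  # hypothetical value for illustration

print(analyzed_tokens(date))                  # ['thu', '08', 'dec', '2016', '11', '50', '59', '0000']
print(wildcard_on_text(date, "*11:50*"))      # False
print(wildcard_on_keyword(date, "*11:50*"))   # True
```

No single token contains "11:50", so the pattern can only match on the `date.keyword` sub-field.
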
  8. Explicit mapping at index creation time

    PUT /blog { "mappings" : { "post" : { "properties": { "title" : { "type" : "text" }, "date" : { "type" : "date", "format": "E, dd MMM YYYY HH:mm:ss Z" }, "author": { "type": "keyword" }, "category": { "type": "keyword" }, "guid": { "type": "integer" } } } } }
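
The `format` on the `date` field tells Elasticsearch how to read strings such as "Thu, 08 Dec 2016 15:35:59 +0000". As a rough analogy only, the same string can be parsed in Python with a `strptime` pattern. The correspondence between the two pattern syntaxes is my own mapping, not something the slides state.

```python
from datetime import datetime

# Assumed strptime counterpart of the mapping format "E, dd MMM YYYY HH:mm:ss Z"
PY_FORMAT = "%a, %d %b %Y %H:%M:%S %z"

parsed = datetime.strptime("Thu, 08 Dec 2016 15:35:59 +0000", PY_FORMAT)
print(parsed.isoformat())  # 2016-12-08T15:35:59+00:00
```

Once the field is a real date rather than text, range filters such as `"gte": "2016-01-01"` become possible.
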
  9. PUT /blog {
 "mappings": {
 "post": {
 "properties": {
 "title": {
 "type": "text",
 "fields": {
 "nl": {
 "type": "text",
 "analyzer": "dutch"
 },
 "en": {
 "type": "text",
 "analyzer": "english"
 },
 "keyword": {
 "type": "keyword"
 } 
 }
 },
 "date": {
 "type": "date",
 "format": "E, dd MMM YYYY HH:mm:ss Z"
 },
 "language": {
 "type": "keyword"
 },
 "author": {
 "type": "keyword"
 },
 "category": {
 "type": "keyword"
 },
 "guid": {
 "type": "integer"
 }
 }
 }
 }
 }

    Alternative mapping
  10. "properties": {
 "title": {
 "type": "text",
 "fields": {
 "nl": {
 "type": "text",
 "analyzer": "dutch"
 },
 "en": {
 "type": "text",
 "analyzer": "english"
 },
 "keyword": {
 "type": "keyword"
 } 
 }
 },

    What’s with the analyzers?
  11. Built-in analyzers

    • Standard
    • Simple
    • Whitespace
    • Stop
    • Keyword
    • Pattern
    • Language
    • Snowball
    • Custom

    Standard tokenizer, lowercase token filter, English stop word token filter
  12. Test the analyzer

    POST /_analyze { "analyzer" : "standard", "text" : "Hey man, how are you doing?" }

    POST /_analyze { "analyzer" : "whitespace", "text" : "Hey man, how are you doing?" }

    POST /_analyze { "analyzer" : "english", "text" : "Hey man, how are you doing?" }
  13. Input: Hey man, how are you doing?

    Standard: hey man how are you doing

    Whitespace: Hey man, how are you doing?

    English: hei man how you do
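
The difference between the first two analyzers can be approximated in a few lines. These are rough imitations written for this example, not the real Lucene analyzers: the "standard-like" one splits on non-word characters and lowercases, the "whitespace-like" one only splits on whitespace. The English analyzer's stemming and stop word removal are not reproduced here.

```python
import re


def standard_like(text):
    """Approximation of the standard analyzer: word tokens, lowercased."""
    return [token.lower() for token in re.findall(r"\w+", text)]


def whitespace_like(text):
    """Approximation of the whitespace analyzer: split on whitespace, nothing else."""
    return text.split()


text = "Hey man, how are you doing?"
print(standard_like(text))    # ['hey', 'man', 'how', 'are', 'you', 'doing']
print(whitespace_like(text))  # ['Hey', 'man,', 'how', 'are', 'you', 'doing?']
```

Punctuation and case survive the whitespace version, which is why a search for "man" would miss the token "man," in such a field.
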
  14. "hits": { "total": 1, "max_score": 4.5011063, "hits": [ { "_index": "blog", "_type": "post", "_id": "2742", "_score": 4.5011063, "_source": { "title": "Hosted SharePoint 2010: working efficiently as a team" } } ] } }
  15. }, "hits": { "total": 4, "max_score": 4.9479203, "hits": [ { "_index": "blog", "_type": "post", "_id": "2742", "_score": 4.9479203, "_source": { "title": "Hosted SharePoint 2010: working efficiently as a team" } }, { "_index": "blog", "_type": "post", "_id": "5586", "_score": 4.896652, "_source": { "title": "WebAssembly: several world players work on a faster Internet" } }, { "_index": "blog", "_type": "post", "_id": "3873", "_score": 4.8334804, "_source": { "title": "SSL: what is it and how does it work?" } },
  16. 164 posts:

    POST /blog/post/_count { "query": { "match": { "title": "PROXY protocol support in Varnish" } } }

    1 post:

    POST /blog/post/_count { "query": { "bool": { "filter": { "term": { "title.keyword": "PROXY protocol support in Varnish" } } } } }
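
The gap between 164 and 1 comes from how the two requests compare text. A `match` query analyzes the input and, with its default OR operator, accepts any document sharing at least one token; a `term` filter on a keyword field wants the exact, untokenized value. A toy sketch of that difference, with a crude `\w+` tokenizer standing in for the analyzer and a handful of titles from the slides:

```python
import re


def tokens(text):
    """Crude stand-in for analysis: the set of lowercased word tokens."""
    return set(re.findall(r"\w+", text.lower()))


def match_count(titles, query):
    """match with OR semantics: a title counts if it shares any token with the query."""
    wanted = tokens(query)
    return sum(1 for title in titles if tokens(title) & wanted)


def term_count(titles, value):
    """term on a keyword field: exact equality with the stored string."""
    return sum(1 for title in titles if title == value)


titles = [
    "PROXY protocol support in Varnish",
    "Thijs Feryn gave a demo of Varnish Cache on WordPress during a Future Insights webinar",
    "SSL: what is it and how does it work?",
    "Hosted SharePoint 2010: working efficiently as a team",
]
query = "PROXY protocol support in Varnish"

print(match_count(titles, query))  # 2
print(term_count(titles, query))   # 1
```

On the real blog index, common tokens such as "in" or "support" are what push the `match` count up to 164.
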
  17. Filter

    • Does it match? Yes or no
    • When relevance doesn’t matter
    • Faster & cacheable
    • For non-analyzed data

    Query

    • How well does it match?
    • For full-text search
    • On analyzed/tokenized data
  18. Match Query, Match Phrase Query, Match Phrase Prefix Query, Multi Match Query, Common Terms Query, Query String Query, Simple Query String Query, Term Query, Terms Query, Range Query, Exists Query, Prefix Query, Wildcard Query, Regexp Query, Fuzzy Query, Type Query, Ids Query, Constant Score Query, Bool Query, Dis Max Query, Function Score Query, Boosting Query, Indices Query, Nested Query, Has Child Query, Has Parent Query, Parent Id Query, GeoShape Query, Geo Bounding Box Query, Geo Distance Query, Geo Distance Range Query, Geo Polygon Query, More Like This Query, Template Query, Script Query, Percolate Query, Span Term Query, Span Multi Term Query, Span First Query, Span Near Query, Span Or Query, Span Not Query, Span Containing Query, Span Within Query, Span Field Masking Query
  19. POST /blog/_search { "query": { "bool": { "filter": { "bool": { "must" : [ { "term" : { "language" : "en" } }, { "range" : { "date" : { "gte" : "2016-01-01", "format" : "yyyy-MM-dd" } } } ], "must_not" : [ { "term" : { "category" : "joomla" } } ], "should" : [ { "term" : { "category" : "Hosting" } }, { "term" : { "category" : "evangelist" } } ] } } } } }
  20. Requires “geo point” typed field

    POST /cities/city/_search { "size": 200, "sort": [ { "city": { "order": "asc" } } ], "query": { "bool": { "filter": { "geo_distance": { "distance": "5km", "location": { "lat": 51.033333, "lon": 2.866667 } } } } } }
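
What a distance filter decides for each document can be sketched with the haversine great-circle formula. This is one common way to compute such a distance and not necessarily the exact calculation Elasticsearch performs. The center is the point from the request above; the two test points are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, an approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two lat/lon points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = math.radians(lat2 - lat1)
    d_lambda = math.radians(lon2 - lon1)
    a = math.sin(d_phi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def within_distance(center, point, max_km):
    """Yes/no answer, like a filter: is the point inside the radius?"""
    return haversine_km(center[0], center[1], point[0], point[1]) <= max_km


center = (51.033333, 2.866667)   # from the request above
nearby = (51.05, 2.87)           # hypothetical point, roughly 2 km away
far_away = (51.2, 2.7)           # hypothetical point, well outside 5 km

print(within_distance(center, nearby, 5))    # True
print(within_distance(center, far_away, 5))  # False
```
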
  21. Draw a “box”. Requires “geo point” typed field

    POST /cities/city/_search { "size": 200, "query": { "bool": { "filter": { "geo_bounding_box": { "location": { "bottom_left": { "lat": 51.1, "lon": 2.6 }, "top_right": { "lat": 51.2, "lon": 2.7 } } } } } } }
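
The bounding-box test itself is simple enough to write out. A minimal sketch, assuming a box that does not cross the date line; the function name and the sample points are made up for illustration, while the box corners come from the request above.

```python
def in_bounding_box(lat, lon, bottom_left, top_right):
    """True if the point lies inside the box (assumes the box does not cross the date line)."""
    return (
        bottom_left["lat"] <= lat <= top_right["lat"]
        and bottom_left["lon"] <= lon <= top_right["lon"]
    )


bottom_left = {"lat": 51.1, "lon": 2.6}
top_right = {"lat": 51.2, "lon": 2.7}

print(in_bounding_box(51.15, 2.65, bottom_left, top_right))          # True
print(in_bounding_box(51.033333, 2.866667, bottom_left, top_right))  # False
```
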
  22. POST /blog/_search { "_source": ["title"], "query": { "bool": { "must": [ { "match": { "title": "varnish thijs" } }, { "bool": { "filter": { "term": { "language": "en" } } } } ] } } }
  23. Hits both terms. More relevant

    { "took": 3, "timed_out": false, "_shards": { "total": 5, "successful": 5, "failed": 0 }, "hits": { "total": 8, "max_score": 1.984594, "hits": [ { "_index": "blog", "_type": "post", "_id": "4275", "_score": 1.984594, "fields": { "title": [ "Thijs Feryn gave a demo of Varnish Cache on WordPress during a Future Insights webinar" ] } }, { "_index": "blog", "_type": "post", "_id": "6238", "_score": 0.8335616, "fields": { "title": [ "PROXY protocol support in Varnish" ] } }, { "_index": "blog",
  24. Using a filter instead of a query. We don’t care about the source

    POST /blog/_search?_source=false { "query": { "bool": { "filter": { "term": { "category": "PHPBenelux" } } } } }
  25. No relevance on filters. Score is always 0

    { "took": 2, "timed_out": false, "_shards": { "total": 5, "successful": 5, "failed": 0 }, "hits": { "total": 4, "max_score": 0, "hits": [ { "_index": "blog", "_type": "post", "_id": "6700", "_score": 0 }, { "_index": "blog", "_type": "post", "_id": "13425", "_score": 0 }, { "_index": "blog", "_type": "post", "_id": "6254", "_score": 0 },
  26. Only search for “thijs feryn”. Increase relevance if category contains “Varnish”

    POST /blog/_search { "_source": ["title", "category"], "query": { "bool": { "must": [ { "match": { "title": "thijs feryn" } } ], "should": [ { "match": { "category": "Varnish" } } ] } } }
  27. Combining filters & queries. Increase relevance

    POST /blog/_search { "_source": ["title", "category"], "query": { "bool": { "must_not": [ { "bool": { "filter": { "term": { "author": "Romy" } } } } ], "should": [ { "match": { "category": "Magento" } } ] } } }
  28. Query-time boosting. Increase relevance

    POST /blog/_search { "query": { "bool": { "should": [ { "match": { "title": { "query": "Magento", "boost" : 3 } } }, { "match": { "title": { "query": "Wordpress", "boost" : 2 } } } ] } } }
  29. Only aggs, no docs

    SELECT author, COUNT(guid) FROM blog.post GROUP BY author

    POST /blog/post/_search?pretty { "size": 0, "aggs": { "popular_bloggers": { "terms": { "field": "author" } } } }
  30. Aggregation output

    "aggregations": { "popular_bloggers": { "doc_count_error_upper_bound": 0, "sum_other_doc_count": 1, "buckets": [ { "key": "Romy", "doc_count": 458 }, { "key": "Jimmy Cappaert", "doc_count": 160 }, { "key": "Tom", "doc_count": 144 }, { "key": "Combell", "doc_count": 143 }, { "key": "Christophe", "doc_count": 32 }, { "key": "Dorien Marinus", "doc_count": 19 },
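
The `terms` aggregation is the GROUP BY / COUNT of the SQL on the previous slide. A small in-memory sketch of the same idea, producing buckets in the shape shown above. The function name and the sample documents are hypothetical.

```python
from collections import Counter


def terms_aggregation(docs, field, size=10):
    """Count documents per distinct field value, largest buckets first."""
    counts = Counter(doc[field] for doc in docs if field in doc)
    return [{"key": key, "doc_count": count} for key, count in counts.most_common(size)]


# Hypothetical sample data, far smaller than the real blog index.
posts = [
    {"author": "Romy"}, {"author": "Tom"}, {"author": "Romy"},
    {"author": "Combell"}, {"author": "Romy"}, {"author": "Tom"},
]

print(terms_aggregation(posts, "author"))
# [{'key': 'Romy', 'doc_count': 3}, {'key': 'Tom', 'doc_count': 2}, {'key': 'Combell', 'doc_count': 1}]
```
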
  31. Nested multi-group by alongside query

    POST /blog/_search { "query": { "match": { "title": "varnish" } }, "aggs": { "popular_bloggers": { "terms": { "field": "author", "size": 10 }, "aggs": { "used_languages": { "terms": { "field": "language", "size": 10 } } } } } }
  32. Aggregation output

    }, "aggregations": { "popular_bloggers": { "doc_count_error_upper_bound": 0, "sum_other_doc_count": 0, "buckets": [ { "key": "Romy", "doc_count": 6, "used_languages": { "doc_count_error_upper_bound": 0, "sum_other_doc_count": 0, "buckets": [ { "key": "en", "doc_count": 4 }, { "key": "nl", "doc_count": 2 } ] } }, { "key": "Combell", "doc_count": 5, "used_languages": { "doc_count_error_upper_bound": 0, "sum_other_doc_count": 0, "buckets": [ { "key": "nl", "doc_count": 4 }, { "key": "en", "doc_count": 1 } ] } },
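
A nested aggregation is a group-by inside a group-by. The sketch below rebuilds the two buckets shown above from in-memory documents. The documents are synthesized to match the counts in the output (Romy: 4 en, 2 nl; Combell: 4 nl, 1 en), and the function name is made up.

```python
from collections import Counter, defaultdict


def nested_terms(docs, outer_field, inner_field, inner_name):
    """Group by outer_field, then count inner_field values inside each group."""
    groups = defaultdict(Counter)
    for doc in docs:
        groups[doc[outer_field]][doc[inner_field]] += 1
    # Largest outer buckets first, like the default ordering of a terms aggregation.
    ordered = sorted(groups.items(), key=lambda item: sum(item[1].values()), reverse=True)
    return [
        {
            "key": key,
            "doc_count": sum(inner.values()),
            inner_name: [{"key": k, "doc_count": n} for k, n in inner.most_common()],
        }
        for key, inner in ordered
    ]


docs = (
    [{"author": "Romy", "language": "en"}] * 4
    + [{"author": "Romy", "language": "nl"}] * 2
    + [{"author": "Combell", "language": "nl"}] * 4
    + [{"author": "Combell", "language": "en"}] * 1
)
buckets = nested_terms(docs, "author", "language", "used_languages")
print(buckets[0]["key"], buckets[0]["doc_count"])  # Romy 6
```
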
  33. Avg Aggregation, Cardinality Aggregation, Extended Stats Aggregation, Geo Bounds Aggregation, Geo Centroid Aggregation, Max Aggregation, Min Aggregation, Percentiles Aggregation, Percentile Ranks Aggregation, Scripted Metric Aggregation, Stats Aggregation, Sum Aggregation, Top hits Aggregation, Value Count Aggregation, Children Aggregation, Date Histogram Aggregation, Date Range Aggregation, Diversified Sampler Aggregation, Filter Aggregation, Filters Aggregation, Geo Distance Aggregation, GeoHash grid Aggregation, Global Aggregation, Histogram Aggregation, IP Range Aggregation, Missing Aggregation, Nested Aggregation, Range Aggregation, Reverse nested Aggregation, Sampler Aggregation, Significant Terms Aggregation, Terms Aggregation, Avg Bucket Aggregation, Derivative Aggregation, Max Bucket Aggregation, Min Bucket Aggregation, Sum Bucket Aggregation, Stats Bucket Aggregation, Extended Stats Bucket Aggregation, Percentiles Bucket Aggregation, Moving Average Aggregation, Cumulative Sum Aggregation, Bucket Script Aggregation, Bucket Selector Aggregation, Serial Differencing Aggregation, Matrix Stats
  34. Example config settings

    node.rack: my-location
    node.master: true
    node.data: true
    http.enabled: true
    cluster.name: my-cluster
    node.name: my-node
    index.number_of_shards: 5
    index.number_of_replicas: 1
    discovery.zen.minimum_master_nodes: 2
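
The value of `discovery.zen.minimum_master_nodes` is normally chosen as a quorum of the master-eligible nodes, (N / 2) + 1 with integer division, to avoid split brain. A tiny helper makes the arithmetic explicit; the function name is made up, and the three-node case is my assumption about the cluster behind the setting above.

```python
def minimum_master_nodes(master_eligible_nodes):
    """Quorum of master-eligible nodes: floor(N / 2) + 1."""
    if master_eligible_nodes < 1:
        raise ValueError("a cluster needs at least one master-eligible node")
    return master_eligible_nodes // 2 + 1


print(minimum_master_nodes(3))  # 2
print(minimum_master_nodes(5))  # 3
```

With three master-eligible nodes the result is 2, which matches the setting in the example config.
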
  35. Non-JSON output

    GET /_cat

    =^.^= /_cat/repositories /_cat/fielddata /_cat/fielddata/{fields} /_cat/recovery /_cat/recovery/{index} /_cat/allocation /_cat/pending_tasks /_cat/health /_cat/shards /_cat/shards/{index} /_cat/plugins /_cat/thread_pool /_cat/thread_pool/{thread_pools} /_cat/templates /_cat/master /_cat/tasks /_cat/segments /_cat/segments/{index} /_cat/indices /_cat/indices/{index} /_cat/aliases /_cat/aliases/{alias} /_cat/count /_cat/count/{index} /_cat/nodeattrs /_cat/snapshots/{repository} /_cat/nodes
  36. 5 shards & a single replica by default

    GET /_cat/shards?v

    index    shard prirep state   docs store ip             node
    my-index 2     r      STARTED 6    7.2kb 192.168.10.142 node3
    my-index 2     p      STARTED 6    9.5kb 192.168.10.142 node2
    my-index 0     p      STARTED 4    7.1kb 192.168.10.142 node3
    my-index 0     r      STARTED 4    4.8kb 192.168.10.142 node2
    my-index 3     r      STARTED 5    7.1kb 192.168.10.142 node1
    my-index 3     p      STARTED 5    7.2kb 192.168.10.142 node3
    my-index 1     p      STARTED 1    2.4kb 192.168.10.142 node1
    my-index 1     r      STARTED 1    2.4kb 192.168.10.142 node2
    my-index 4     p      STARTED 5    9.5kb 192.168.10.142 node1
    my-index 4     r      STARTED 5    9.4kb 192.168.10.142 node3
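
Because the `_cat` output is whitespace-separated text with a header row (when `?v` is used), it is easy to post-process. A minimal parsing sketch over the listing above; it assumes every row has all eight columns, which would not hold for unassigned shards.

```python
from collections import Counter

CAT_SHARDS = """\
index    shard prirep state   docs store ip             node
my-index 2     r      STARTED 6    7.2kb 192.168.10.142 node3
my-index 2     p      STARTED 6    9.5kb 192.168.10.142 node2
my-index 0     p      STARTED 4    7.1kb 192.168.10.142 node3
my-index 0     r      STARTED 4    4.8kb 192.168.10.142 node2
my-index 3     r      STARTED 5    7.1kb 192.168.10.142 node1
my-index 3     p      STARTED 5    7.2kb 192.168.10.142 node3
my-index 1     p      STARTED 1    2.4kb 192.168.10.142 node1
my-index 1     r      STARTED 1    2.4kb 192.168.10.142 node2
my-index 4     p      STARTED 5    9.5kb 192.168.10.142 node1
my-index 4     r      STARTED 5    9.4kb 192.168.10.142 node3
"""


def parse_cat(text):
    """Turn header-plus-rows _cat output into a list of dicts keyed by column name."""
    lines = text.strip().splitlines()
    header = lines[0].split()
    return [dict(zip(header, line.split())) for line in lines[1:]]


rows = parse_cat(CAT_SHARDS)
primaries = [row for row in rows if row["prirep"] == "p"]
replicas = [row for row in rows if row["prirep"] == "r"]
shards_per_node = Counter(row["node"] for row in rows)

print(len(primaries), len(replicas))  # 5 5
print(dict(shards_per_node))          # {'node3': 4, 'node2': 3, 'node1': 3}
```

Five primaries with one replica each gives ten shard copies, spread over the three nodes.
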