
ElasticSearch In Action - Intracto 2017

Thijs Feryn
December 20, 2017

See https://feryn.eu/speaking/elasticsearch-in-action-intracto-2017/ for more info about this presentation.


Transcript

  1. Elasticsearch in action By Thijs Feryn

  2. Explain in 1 slide

  3. •Full-text search engine •NoSQL database •Analytics engine •Written in Java •Lucene-based (~Solr) •Inverted indices •Easy to scale (~Elastic) •RESTful interface (HTTP/JSON) •Schemaless •Real-time •ELK stack

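
Slide 3 names inverted indices as the core data structure. As a rough illustration only (a toy in Python, not Lucene's on-disk format), an inverted index maps each term to the ids of the documents that contain it, so finding documents for a term is a dictionary lookup instead of a scan. The tokenizer is a deliberately simple assumption: lowercase, then split on anything that is not a letter or digit.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    # Simplified assumption: lowercase, split on anything that is not a-z or 0-9.
    return [t for t in re.split(r"[^0-9a-z]+", text.lower()) if t]


def build_inverted_index(docs: dict[str, str]) -> dict[str, set[str]]:
    # Map every term to the set of document ids that contain it.
    index: dict[str, set[str]] = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return dict(index)


def search_all_terms(index: dict[str, set[str]], query: str) -> set[str]:
    # AND semantics: intersect the posting sets of all query terms.
    postings = [index.get(term, set()) for term in tokenize(query)]
    return set.intersection(*postings) if postings else set()


docs = {
    "6238": "PROXY protocol support in Varnish",
    "2742": "Hosted SharePoint 2010: working efficiently as a team",
}
index = build_inverted_index(docs)
print(sorted(index["varnish"]))                  # ['6238']
print(search_all_terms(index, "proxy varnish"))  # {'6238'}
```
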
  4. Still with me?

  5. Hi, I’m Thijs

  6. I’m @ThijsFeryn on Twitter

  7. I’m an Evangelist at

  8. I’m an Evangelist at

  9. I’m a board member at

  11. https://www.elastic.co/downloads/elasticsearch

  12. { "cluster_name": "elasticsearch", "cluster_uuid": "KKD4RjtCTTWoRomBDXyCDA", "name": "y3qufp6", "tagline": "You Know, for Search", "version": { "build_date": "2017-02-09T22:05:32.386Z", "build_hash": "db0d481", "build_snapshot": false, "lucene_version": "6.4.1", "number": "5.2.1" } } http://localhost:9200

  13. RDBMS → Elasticsearch: Database → Index, Table → Type, Row → Document

  14. PUT /blog { "acknowledged": true, "shards_acknowledged": true }

  15. POST /blog/post/6576 { "language":"en", "title":"Combell revamps its hosting range: fewer packages, more features!", "date":"Thu, 08 Dec 2016 15:35:59 +0000", "author":"Combell", "category":[ "Combell news", "Hosting", "Combell", "hosting" ], "guid":"6576" }

  16. { "_index": "blog", "_type": "post", "_id": "6576", "_version": 1, "result": "created", "_shards": { "total": 2, "successful": 1, "failed": 0 }, "created": true } Confirmation

  17. GET /blog/post/6576 { "_index": "blog", "_type": "post", "_id": "6576", "_version": 1, "found": true, "_source": { "language": "en", "title": "Combell revamps its hosting range: fewer packages, more features!", "date": "Thu, 08 Dec 2016 15:35:59 +0000", "author": "Combell", "category": [ "Combell news", "Hosting", "Combell", "hosting" ], "guid": "6576" } } Retrieve document by id: document & metadata

  18. GET /blog/_mapping { "blogger": { "mappings": { "post": { "properties": { "author": { "type": "text", "fields": { "keyword": { "type": "keyword", "ignore_above": 256 } } }, "category": { "type": "text", "fields": { "keyword": { "type": "keyword", "ignore_above": 256 } } }, "date": { "type": "text", "fields": { "keyword": { "type": "keyword", "ignore_above": 256 } } }, "guid": { "type": "text", "fields": { "keyword": { "type": "keyword", "ignore_above": 256 } } }, Schemaless? Not really … “Guesses” mapping on insert

  19. String types: ✓ Text: analyzed ✓ Keyword: non-analyzed ✓ String: • analyzed • extra non-analyzed keyword field • deprecated

  20. POST /blog/_search { "_source": "date", "query": { "wildcard": { "date": { "value": "*11:50*" } } } } → 0 hits, analyzed data. POST /blog/_search { "_source": "date", "query": { "wildcard": { "date.keyword": { "value": "*11:50*" } } } } → 2 hits, non-analyzed data

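
Why does the wildcard on the analyzed `date` field find nothing? A plausible explanation, sketched in Python: the analyzed field is stored as tokens, and a tokenizer that splits on punctuation leaves no single token containing `11:50`, whereas the `keyword` sub-field holds the whole string as one term. The date value and the regex tokenizer below are hypothetical stand-ins for the real data and the standard analyzer; `fnmatch.fnmatchcase` plays the role of the wildcard matcher.

```python
import re
from fnmatch import fnmatchcase


def analyzed_tokens(text: str) -> list[str]:
    # Rough stand-in for the standard analyzer: lowercase, split on punctuation/whitespace.
    return [t for t in re.split(r"[^0-9a-z]+", text.lower()) if t]


def wildcard_hits_analyzed(value: str, pattern: str) -> bool:
    # On an analyzed field, the wildcard is matched against individual tokens.
    return any(fnmatchcase(token, pattern) for token in analyzed_tokens(value))


def wildcard_hits_keyword(value: str, pattern: str) -> bool:
    # A keyword field stores the original string as a single term.
    return fnmatchcase(value, pattern)


date = "Mon, 14 Nov 2016 11:50:03 +0000"  # hypothetical post date
print(analyzed_tokens(date))  # ['mon', '14', 'nov', '2016', '11', '50', '03', '0000']
print(wildcard_hits_analyzed(date, "*11:50*"))  # False
print(wildcard_hits_keyword(date, "*11:50*"))   # True
```
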
  21. Explicit mapping

  22. PUT /blog { "mappings" : { "post" : { "properties": { "title" : { "type" : "text" }, "date" : { "type" : "date", "format": "E, dd MMM YYYY HH:mm:ss Z" }, "author": { "type": "keyword" }, "category": { "type": "keyword" }, "guid": { "type": "integer" } } } } } Explicit mapping at index creation time

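
The `format` value above is a Joda-Time pattern: `E` is the short day-of-week name, `MMM` the short month name and `Z` an offset such as `+0000`. As a cross-check only, the same layout written as a Python `strptime` format parses the example post date from slide 15. This is an analogy, not how Elasticsearch itself parses dates, and it assumes the default C locale for English day and month names.

```python
from datetime import datetime

# Rough strptime equivalent of the Joda pattern "E, dd MMM YYYY HH:mm:ss Z".
RFC822_LIKE = "%a, %d %b %Y %H:%M:%S %z"

parsed = datetime.strptime("Thu, 08 Dec 2016 15:35:59 +0000", RFC822_LIKE)
print(parsed.isoformat())  # 2016-12-08T15:35:59+00:00
```
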
  23. PUT /blog
    {
      "mappings": {
        "post": {
          "properties": {
            "title": {
              "type": "text",
              "fields": {
                "nl": { "type": "text", "analyzer": "dutch" },
                "en": { "type": "text", "analyzer": "english" },
                "keyword": { "type": "keyword" }
              }
            },
            "date": { "type": "date", "format": "E, dd MMM YYYY HH:mm:ss Z" },
            "language": { "type": "keyword" },
            "author": { "type": "keyword" },
            "category": { "type": "keyword" },
            "guid": { "type": "integer" }
          }
        }
      }
    }
    Alternative mapping

  24. "properties": {
      "title": {
        "type": "text",
        "fields": {
          "nl": { "type": "text", "analyzer": "dutch" },
          "en": { "type": "text", "analyzer": "english" },
          "keyword": { "type": "keyword" }
        }
      },
    What’s with the analyzers?
  25. Analyzed vs non-analyzed

  26. Full-text vs exact value

  27. By default strings are analyzed … unless you mention it in the mapping

  28. Built-in analyzers •Standard •Simple •Whitespace •Stop •Keyword •Pattern •Language •Snowball •Custom (e.g. standard tokenizer + lowercase token filter + English stop word token filter)

  29. POST /_analyze { "analyzer" : "standard", "text" : "Hey man, how are you doing?" } POST /_analyze { "analyzer" : "whitespace", "text" : "Hey man, how are you doing?" } POST /_analyze { "analyzer" : "english", "text" : "Hey man, how are you doing?" } Test the analyzer

  30. "Hey man, how are you doing?" → Standard: hey | man | how | are | you | doing; Whitespace: Hey | man, | how | are | you | doing?; English: hei | man | how | you | do

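
Slides 28 to 30 describe an analyzer as a tokenizer followed by token filters. A minimal Python sketch of that pipeline, under loud assumptions: the regex tokenizer only approximates Lucene's standard tokenizer, the stop word set is a small hypothetical subset, and the English analyzer's stemming (`hey` → `hei`, `doing` → `do`) is not modeled. The third call mirrors the custom chain from slide 28.

```python
import re
from typing import Callable

Tokenizer = Callable[[str], list[str]]
TokenFilter = Callable[[list[str]], list[str]]


def standard_like_tokenizer(text: str) -> list[str]:
    # Approximation: keep runs of word characters, drop punctuation.
    return re.findall(r"\w+", text)


def whitespace_tokenizer(text: str) -> list[str]:
    # Split on whitespace only; punctuation stays attached to the tokens.
    return text.split()


def lowercase_filter(tokens: list[str]) -> list[str]:
    return [t.lower() for t in tokens]


def stop_filter(stopwords: set[str]) -> TokenFilter:
    # Build a filter that drops every token found in the stop word set.
    return lambda tokens: [t for t in tokens if t not in stopwords]


def analyze(text: str, tokenizer: Tokenizer, filters: list[TokenFilter]) -> list[str]:
    # Run the tokenizer, then apply each token filter in order.
    tokens = tokenizer(text)
    for token_filter in filters:
        tokens = token_filter(tokens)
    return tokens


text = "Hey man, how are you doing?"
english_stopwords = {"a", "an", "and", "are", "is", "the", "to"}  # hypothetical subset

print(analyze(text, standard_like_tokenizer, [lowercase_filter]))
# ['hey', 'man', 'how', 'are', 'you', 'doing']
print(analyze(text, whitespace_tokenizer, []))
# ['Hey', 'man,', 'how', 'are', 'you', 'doing?']
print(analyze(text, standard_like_tokenizer, [lowercase_filter, stop_filter(english_stopwords)]))
# ['hey', 'man', 'how', 'you', 'doing']
```
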
  31. POST /blog/post/_search { "_source": ["title"], "query": { "match": { "title": "working" } } }

  32. "hits": { "total": 1, "max_score": 4.5011063, "hits": [ { "_index": "blog", "_type": "post", "_id": "2742", "_score": 4.5011063, "_source": { "title": "Hosted SharePoint 2010: working efficiently as a team" } } ] } }

  33. POST /blog/post/_search { "_source": ["title"], "query": { "match": { "title.en": "working" } } }

  34. }, "hits": { "total": 4, "max_score": 4.9479203, "hits": [ { "_index": "blog", "_type": "post", "_id": "2742", "_score": 4.9479203, "_source": { "title": "Hosted SharePoint 2010: working efficiently as a team" } }, { "_index": "blog", "_type": "post", "_id": "5586", "_score": 4.896652, "_source": { "title": "WebAssembly: several world players work on a faster Internet" } }, { "_index": "blog", "_type": "post", "_id": "3873", "_score": 4.8334804, "_source": { "title": "SSL: what is it and how does it work?" } },

  35. Search

  36. POST /blog/post/_count { "query": { "match": { "title": "PROXY protocol support in Varnish" } } } → 164 posts. POST /blog/post/_count { "query": { "bool": { "filter": { "term": { "title.keyword": "PROXY protocol support in Varnish" } } } } } → 1 post

  37. Filter context vs Query context

  38. Filter: •Does it match? Yes or no •When relevance doesn’t matter •Faster & cacheable •For non-analyzed data. Query: •How well does it match? •For full-text search •On analyzed/tokenized data

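
To make the filter/query distinction concrete, here is a toy Python model with hypothetical documents and a deliberately naive score (the fraction of query terms a title contains, not Elasticsearch's BM25). The filter answers yes or no and every hit scores 0; the query ranks hits by how well they match.

```python
import re


def terms(text: str) -> set[str]:
    # Naive analysis: lowercase word tokens.
    return set(re.findall(r"\w+", text.lower()))


def filter_context(docs: dict[str, dict], field: str, value: str) -> list[tuple[str, float]]:
    # Exact, non-analyzed comparison; every hit gets score 0, like a filter-only bool query.
    return [(doc_id, 0.0) for doc_id, doc in docs.items() if doc[field] == value]


def query_context(docs: dict[str, dict], field: str, query: str) -> list[tuple[str, float]]:
    # Naive relevance: share of query terms found in the field; best match first.
    wanted = terms(query)
    scored = []
    for doc_id, doc in docs.items():
        overlap = len(wanted & terms(doc[field]))
        if overlap:
            scored.append((doc_id, overlap / len(wanted)))
    return sorted(scored, key=lambda hit: hit[1], reverse=True)


docs = {  # hypothetical documents
    "1": {"title": "Thijs talks about Varnish", "language": "en"},
    "2": {"title": "PROXY protocol support in Varnish", "language": "en"},
    "3": {"title": "Varnish in het Nederlands", "language": "nl"},
}
print(filter_context(docs, "language", "en"))         # [('1', 0.0), ('2', 0.0)]
print(query_context(docs, "title", "varnish thijs"))  # [('1', 1.0), ('2', 0.5), ('3', 0.5)]
```
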
  39. Match Query Match Phrase Query Match Phrase Prefix Query Multi Match Query Common Terms Query Query String Query Simple Query String Query Term Query Terms Query Range Query Exists Query Prefix Query Wildcard Query Regexp Query Fuzzy Query Type Query Ids Query Constant Score Query Bool Query Dis Max Query Function Score Query Boosting Query Indices Query Nested Query Has Child Query Has Parent Query Parent Id Query GeoShape Query Geo Bounding Box Query Geo Distance Query Geo Distance Range Query Geo Polygon Query More Like This Query Template Query Script Query Percolate Query Span Term Query Span Multi Term Query Span First Query Span Near Query Span Or Query Span Not Query Span Containing Query Span Within Query Span Field Masking Query

  40. Filter examples

  41. POST /blog/post/_search?pretty { "query": { "bool": { "filter": { "ids": { "values": [231,234,258] } } } } }

  42. POST /blog/_search { "query": { "bool": { "filter": { "bool": { "must" : [ { "term" : { "language" : "en" } }, { "range" : { "date" : { "gte" : "2016-01-01", "format" : "yyyy-MM-dd" } } } ], "must_not" : [ { "term" : { "category" : "joomla" } } ], "should" : [ { "term" : { "category" : "Hosting" } }, { "term" : { "category" : "evangelist" } } ] } } } } }

  43. POST /blog/_search?pretty { "query": { "bool": { "filter": { "prefix": { "title.keyword": "Combell" } } } } }

  44. POST /cities/city/_search { "size": 200, "sort": [ { "city": { "order": "asc" } } ], "query": { "bool": { "filter": { "geo_distance": { "distance": "5km", "location": { "lat": 51.033333, "lon": 2.866667 } } } } } } Requires “geo point” typed field

  45. POST /cities/city/_search { "size": 200, "query": { "bool": { "filter": { "geo_bounding_box": { "location": { "bottom_left": { "lat": 51.1, "lon": 2.6 }, "top_right": { "lat": 51.2, "lon": 2.7 } } } } } } } Requires “geo point” typed field; draw a “box”

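
Both geo filters come down to simple geometry on `geo_point` values. A Python sketch, assuming a spherical Earth with a mean radius of 6371 km and the haversine formula (Elasticsearch's own distance computation may differ slightly); the bounding-box check assumes the box does not cross the 180° meridian. Only the coordinates are taken from slides 44 and 45; the test points are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an approximation


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    # Great-circle distance between two (lat, lon) points given in degrees.
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def within_distance(point: tuple[float, float], center: tuple[float, float], max_km: float) -> bool:
    # Idea behind geo_distance: keep points no further than max_km from center.
    return haversine_km(*center, *point) <= max_km


def in_bounding_box(point: tuple[float, float], bottom_left: tuple[float, float],
                    top_right: tuple[float, float]) -> bool:
    # Idea behind geo_bounding_box, for boxes that do not cross the antimeridian.
    lat, lon = point
    return bottom_left[0] <= lat <= top_right[0] and bottom_left[1] <= lon <= top_right[1]


center = (51.033333, 2.866667)  # geo_distance origin from slide 44
print(within_distance((51.06, 2.90), center, 5.0))              # True
print(in_bounding_box((51.15, 2.65), (51.1, 2.6), (51.2, 2.7)))  # True
```
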
  46. Relevance

  47. POST /blog/_search { "_source": ["title"], "query": { "bool": { "must": [ { "match": { "title": "varnish thijs" } }, { "bool": { "filter": { "term": { "language": "en" } } } } ] } } }

  48. { "took": 3, "timed_out": false, "_shards": { "total": 5, "successful": 5, "failed": 0 }, "hits": { "total": 8, "max_score": 1.984594, "hits": [ { "_index": "blog", "_type": "post", "_id": "4275", "_score": 1.984594, "fields": { "title": [ "Thijs Feryn gave a demo of Varnish Cache on WordPress during a Future Insights we ] } }, { "_index": "blog", "_type": "post", "_id": "6238", "_score": 0.8335616, "fields": { "title": [ "PROXY protocol support in Varnish" ] } }, { Hits both terms. More relevant

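
Why does the title that hits both terms rank higher? Elasticsearch 5.x scores full-text queries with BM25 by default. A compact Python sketch of the per-term BM25 formula with the usual Lucene defaults `k1 = 1.2` and `b = 0.75`; it leaves out details such as Lucene's lossy field-length encoding, so real `_score` values will differ. The titles are hypothetical.

```python
import math
import re


def tokens(text: str) -> list[str]:
    return re.findall(r"\w+", text.lower())


def bm25_scores(docs: dict[str, str], query: str, k1: float = 1.2, b: float = 0.75) -> dict[str, float]:
    # Sum of per-term BM25 contributions per document; documents without hits are omitted.
    tokenized = {doc_id: tokens(text) for doc_id, text in docs.items()}
    n_docs = len(tokenized)
    avg_len = sum(len(t) for t in tokenized.values()) / n_docs
    scores: dict[str, float] = {}
    for term in set(tokens(query)):
        doc_freq = sum(1 for t in tokenized.values() if term in t)
        if doc_freq == 0:
            continue
        # Rarer terms get a higher inverse document frequency.
        idf = math.log(1 + (n_docs - doc_freq + 0.5) / (doc_freq + 0.5))
        for doc_id, toks in tokenized.items():
            freq = toks.count(term)
            if freq == 0:
                continue
            # Longer-than-average fields are penalized through the length norm.
            norm = k1 * (1 - b + b * len(toks) / avg_len)
            scores[doc_id] = scores.get(doc_id, 0.0) + idf * freq * (k1 + 1) / (freq + norm)
    return scores


docs = {  # hypothetical titles
    "a": "Thijs demos Varnish",
    "b": "PROXY protocol support in Varnish",
    "c": "Hosting news",
}
scores = bm25_scores(docs, "varnish thijs")
print(sorted(scores, key=scores.get, reverse=True))  # ['a', 'b']
```
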
  49. POST /blog/_search?_source=false { "query": { "bool": { "filter": { "term": { "category": "PHPBenelux" } } } } } Using a filter instead of a query; we don’t care about the source

  50. { "took": 2, "timed_out": false, "_shards": { "total": 5, "successful": 5, "failed": 0 }, "hits": { "total": 4, "max_score": 0, "hits": [ { "_index": "blog", "_type": "post", "_id": "6700", "_score": 0 }, { "_index": "blog", "_type": "post", "_id": "13425", "_score": 0 }, { "_index": "blog", "_type": "post", "_id": "6254", "_score": 0 }, No relevance on filters: score is always 0

  51. POST /blog/_search { "_source": ["title", "category"], "query": { "bool": { "must": [ { "match": { "title": "thijs feryn" } } ], "should": [ { "match": { "category": "Varnish" } } ] } } } Only search for “thijs feryn”; increase relevance if category contains “Varnish”

  52. POST /blog/_search { "_source": ["title", "category"], "query": { "bool": { "must_not": [ { "bool": { "filter": { "term": { "author": "Romy" } } } } ], "should": [ { "match": { "category": "Magento" } } ] } } } Increase relevance; combining filters & queries

  53. POST /blog/_search { "query": { "bool": { "should": [ { "match": { "title": { "query": "Magento", "boost" : 3 } } }, { "match": { "title": { "query": "Wordpress", "boost" : 2 } } } ] } } } Increase relevance; query-time boosting

  54. Multi index multi type

  55. /_search /products/_search /products/product/_search /products,clients/_search /pro*/_search /pro*,cli*/_search /products/product,invoice/_search /products/pro*/_search /_all/product/_search /_all/product,invoice/_search /_all/pro*/_search
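
The multi-index URLs combine comma-separated names and `*` wildcards. A hypothetical Python helper shows the idea of expanding such an expression against existing index names with `fnmatch`; Elasticsearch's real resolution also handles `_all`, aliases, `-` exclusions and errors for missing indices, none of which this sketch models. The index names are made up.

```python
from fnmatch import fnmatchcase


def resolve_indices(expression: str, existing: list[str]) -> list[str]:
    # Expand comma-separated names/wildcards; keep first-seen order, no duplicates.
    resolved: list[str] = []
    for part in expression.split(","):
        for name in existing:
            if fnmatchcase(name, part.strip()) and name not in resolved:
                resolved.append(name)
    return resolved


indices = ["products", "promotions", "clients", "invoices"]  # hypothetical index names
print(resolve_indices("pro*,cli*", indices))         # ['products', 'promotions', 'clients']
print(resolve_indices("products,clients", indices))  # ['products', 'clients']
```
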
  56. Multi “all the things”

  57. Aggregations

  58. Group by on steroids

  59. SELECT author, COUNT(guid) FROM blog.post GROUP BY author: aggregations in SQL (metric: COUNT(guid); bucket: GROUP BY author)

  60. SELECT author, COUNT(guid) FROM blog.post GROUP BY author → POST /blog/post/_search?pretty { "size": 0, "aggs": { "popular_bloggers": { "terms": { "field": "author" } } } } Only aggs, no docs

  61. }, "aggregations": { "popular_bloggers": { "doc_count_error_upper_bound": 0, "sum_other_doc_count": 1, "buckets": [ { "key": "Romy", "doc_count": 458 }, { "key": "Jimmy Cappaert", "doc_count": 160 }, { "key": "Tom", "doc_count": 144 }, { "key": "Combell", "doc_count": 143 }, { "key": "Christophe", "doc_count": 32 }, { "key": "Dorien Marinus", "doc_count": 19 }, { Aggregation output

  62. POST /blog/_search { "query": { "match": { "title": "varnish" } }, "aggs": { "popular_bloggers": { "terms": { "field": "author", "size": 10 }, "aggs": { "used_languages": { "terms": { "field": "language", "size": 10 } } } } } } Nested multi-group by alongside query

  63. "hits": [] }, "aggregations": { "popular_bloggers": { "doc_count_error_upper_bound": 0, "sum_other_doc_count": 0, "buckets": [ { "key": "Romy", "doc_count": 6, "used_languages": { "doc_count_error_upper_bound": 0, "sum_other_doc_count": 0, "buckets": [ { "key": "en", "doc_count": 4 }, { "key": "nl", "doc_count": 2 } ] } }, { "key": "Combell", "doc_count": 5, "used_languages": { "doc_count_error_upper_bound": 0, "sum_other_doc_count": 0, "buckets": [ { "key": "nl", "doc_count": 4 }, { "key": "en", "doc_count": 1 } ] } }, { Aggregation output

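
The nested `terms` aggregation behaves like a two-level GROUP BY with counts. An in-memory Python analogy using `collections.Counter`, on hypothetical posts; it ignores the per-shard approximation that `doc_count_error_upper_bound` reports on, and ties are broken by first appearance rather than by key.

```python
from collections import Counter


def terms_agg(docs: list[dict], field: str, size: int = 10) -> list[dict]:
    # Bucket documents by field value, most frequent first, keep the top `size` buckets.
    counts = Counter(doc[field] for doc in docs)
    return [{"key": key, "doc_count": count} for key, count in counts.most_common(size)]


def nested_terms_agg(docs: list[dict], outer: str, inner: str,
                     sub_agg_name: str = "used_languages", size: int = 10) -> list[dict]:
    # For every outer bucket, run the inner terms aggregation on that bucket's documents.
    buckets = terms_agg(docs, outer, size)
    for bucket in buckets:
        members = [doc for doc in docs if doc[outer] == bucket["key"]]
        bucket[sub_agg_name] = terms_agg(members, inner, size)
    return buckets


posts = [  # hypothetical posts matching a "varnish" query
    {"author": "Romy", "language": "en"},
    {"author": "Romy", "language": "nl"},
    {"author": "Romy", "language": "en"},
    {"author": "Combell", "language": "nl"},
]
result = nested_terms_agg(posts, "author", "language")
print(result[0])
# {'key': 'Romy', 'doc_count': 3, 'used_languages': [{'key': 'en', 'doc_count': 2}, {'key': 'nl', 'doc_count': 1}]}
```
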
  64. Avg Aggregation Cardinality Aggregation Extended Stats Aggregation Geo Bounds Aggregation Geo Centroid Aggregation Max Aggregation Min Aggregation Percentiles Aggregation Percentile Ranks Aggregation Scripted Metric Aggregation Stats Aggregation Sum Aggregation Top hits Aggregation Value Count Aggregation Children Aggregation Date Histogram Aggregation Date Range Aggregation Diversified Sampler Aggregation Filter Aggregation Filters Aggregation Geo Distance Aggregation GeoHash grid Aggregation Global Aggregation Histogram Aggregation IP Range Aggregation Missing Aggregation Nested Aggregation Range Aggregation Reverse nested Aggregation Sampler Aggregation Significant Terms Aggregation Terms Aggregation Avg Bucket Aggregation Derivative Aggregation Max Bucket Aggregation Min Bucket Aggregation Sum Bucket Aggregation Stats Bucket Aggregation Extended Stats Bucket Aggregation Percentiles Bucket Aggregation Moving Average Aggregation Cumulative Sum Aggregation Bucket Script Aggregation Bucket Selector Aggregation Serial Differencing Aggregation Matrix Stats

  65. Managing Elasticsearch

  66. Plenty of ways … for which we don’t have enough time

  67. Clustering

  68. Single node • 2-node cluster • 3-node cluster

  69. Example config settings:
    node.rack: my-location
    node.master: true
    node.data: true
    http.enabled: true
    cluster.name: my-cluster
    node.name: my-node
    index.number_of_shards: 5
    index.number_of_replicas: 1
    discovery.zen.minimum_master_nodes: 2

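
The `discovery.zen.minimum_master_nodes: 2` setting follows the quorum rule that the Elasticsearch 5.x documentation recommends for Zen discovery: a strict majority of master-eligible nodes, `(master_eligible_nodes / 2) + 1`, so that two halves of a split cluster cannot both elect a master. For the 3-node cluster from slide 68 that gives 2. A tiny Python helper:

```python
def minimum_master_nodes(master_eligible_nodes: int) -> int:
    # Quorum = strict majority of master-eligible nodes (integer division).
    if master_eligible_nodes < 1:
        raise ValueError("need at least one master-eligible node")
    return master_eligible_nodes // 2 + 1


for nodes in (1, 2, 3, 5):
    print(nodes, minimum_master_nodes(nodes))
# 1 1
# 2 2
# 3 2
# 5 3
```

Note that with 2 master-eligible nodes the quorum is 2, so losing either node blocks master election; that is one reason 3 nodes is the usual minimum.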
  70. GET /_cat

  71. GET /_cat =^.^= /_cat/repositories /_cat/fielddata /_cat/fielddata/{fields} /_cat/recovery /_cat/recovery/{index} /_cat/allocation /_cat/pending_tasks /_cat/health /_cat/shards /_cat/shards/{index} /_cat/plugins /_cat/thread_pool /_cat/thread_pool/{thread_pools} /_cat/templates /_cat/master /_cat/tasks /_cat/segments /_cat/segments/{index} /_cat/indices /_cat/indices/{index} /_cat/aliases /_cat/aliases/{alias} /_cat/count /_cat/count/{index} /_cat/nodeattrs /_cat/snapshots/{repository} /_cat/nodes Non-JSON output

  72. GET /_cat/shards?v
    index    shard prirep state   docs store ip             node
    my-index 2     r      STARTED 6    7.2kb 192.168.10.142 node3
    my-index 2     p      STARTED 6    9.5kb 192.168.10.142 node2
    my-index 0     p      STARTED 4    7.1kb 192.168.10.142 node3
    my-index 0     r      STARTED 4    4.8kb 192.168.10.142 node2
    my-index 3     r      STARTED 5    7.1kb 192.168.10.142 node1
    my-index 3     p      STARTED 5    7.2kb 192.168.10.142 node3
    my-index 1     p      STARTED 1    2.4kb 192.168.10.142 node1
    my-index 1     r      STARTED 1    2.4kb 192.168.10.142 node2
    my-index 4     p      STARTED 5    9.5kb 192.168.10.142 node1
    my-index 4     r      STARTED 5    9.4kb 192.168.10.142 node3
    5 shards & a single replica by default

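
The `?v` flag adds a header row, which makes `_cat` output easy to consume from scripts. A small Python sketch that turns the slide 72 output into dictionaries and counts primaries (`p`); splitting on whitespace assumes no column value contains spaces.

```python
def parse_cat(text: str) -> list[dict[str, str]]:
    # First non-empty line is the header (from ?v); each later line is one row.
    lines = [line for line in text.strip().splitlines() if line.strip()]
    header = lines[0].split()
    return [dict(zip(header, line.split())) for line in lines[1:]]


cat_shards = """
index    shard prirep state   docs store ip             node
my-index 2     r      STARTED 6    7.2kb 192.168.10.142 node3
my-index 2     p      STARTED 6    9.5kb 192.168.10.142 node2
my-index 0     p      STARTED 4    7.1kb 192.168.10.142 node3
my-index 0     r      STARTED 4    4.8kb 192.168.10.142 node2
my-index 3     r      STARTED 5    7.1kb 192.168.10.142 node1
my-index 3     p      STARTED 5    7.2kb 192.168.10.142 node3
my-index 1     p      STARTED 1    2.4kb 192.168.10.142 node1
my-index 1     r      STARTED 1    2.4kb 192.168.10.142 node2
my-index 4     p      STARTED 5    9.5kb 192.168.10.142 node1
my-index 4     r      STARTED 5    9.4kb 192.168.10.142 node3
"""
rows = parse_cat(cat_shards)
primaries = sum(1 for row in rows if row["prirep"] == "p")
print(len(rows), primaries)  # 10 5
```
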
  73. GET /_cat/health?v&h=cluster,status,node.total,shards,pri,unassign,init
    cluster   status node.total shards pri unassign init
    mycluster green  3          12     6   0        0
    Cluster health
  74. The ELK stack

  76. Logs → Parse & ship → Store → Visualize

  77. Beats •Filebeat •Topbeat •Packetbeat •Winlogbeat

  78. Logs → Ship → Parse → Store → Visualize

  80. Integrating Elasticsearch

  81. It’s REST, deal with it!
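
Because the interface is plain HTTP and JSON, the standard library is enough to talk to Elasticsearch. A Python sketch that builds (but does not send) the requests from slides 15 and 17 with `urllib.request`; it assumes a node at `http://localhost:9200` as in slide 12 and uses the 5.x-style `/{index}/{type}/{id}` URLs. The helper names are hypothetical.

```python
import json
import urllib.request

BASE_URL = "http://localhost:9200"  # assumption: local node, as in slide 12


def index_request(index: str, doc_type: str, doc_id: str, doc: dict) -> urllib.request.Request:
    # POST /{index}/{type}/{id} with a JSON body, as in slide 15.
    return urllib.request.Request(
        f"{BASE_URL}/{index}/{doc_type}/{doc_id}",
        data=json.dumps(doc).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def get_request(index: str, doc_type: str, doc_id: str) -> urllib.request.Request:
    # GET /{index}/{type}/{id} returns the document plus metadata, as in slide 17.
    return urllib.request.Request(f"{BASE_URL}/{index}/{doc_type}/{doc_id}", method="GET")


req = index_request("blog", "post", "6576", {"language": "en", "author": "Combell", "guid": "6576"})
print(req.get_method(), req.full_url)  # POST http://localhost:9200/blog/post/6576
# To actually send it against a running node: json.load(urllib.request.urlopen(req))
```
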

  82. Or just use an API client: PHP, Java, Perl, Python, Ruby, .NET
  83. Try it yourself! http://github.com/thijsferyn/elasticsearch_tutorial
