
What's new in Elasticsearch

Presentation given at the Milan Elasticsearch meetup on July 16th 2014.

Luca Cavanna

July 16, 2014

Transcript

  1. What’s new in Elasticsearch? (1.0, 1.1, 1.2 & 1.x) @lucacavanna

    Copyright Elasticsearch 2014. Copying, publishing and/or distributing without written permission is strictly prohibited.
  2. JSON, distributed, real-time, analytics, RESTful, Lucene, open source, schema-free, document oriented, search
  3. Unstructured search
  4. Structured search
  5. Enrichment
  6. Sorting
  7. Pagination
  8. Aggregation
  9. Suggestions
  10.
  11. Setup

      $ wget https://download.elasticsearch.org/elasticsearch/elasticsearch/elasticsearch-1.2.1.zip
      $ unzip elasticsearch-1.2.1.zip
      $ cd elasticsearch-1.2.1
      $ bin/elasticsearch
  12. Is it alive?

      $ curl localhost:9200
      {
        "status" : 200,
        "name" : "Moondark",
        "version" : {
          "number" : "1.2.1",
          "build_hash" : "6c95b759f9e7ef0f8e17f77d850da43ce8a4b364",
          "build_timestamp" : "2014-06-03T15:02:52Z",
          "build_snapshot" : false,
          "lucene_version" : "4.8"
        },
        "tagline" : "You Know, for Search"
      }
  13. Index

      $ curl -XPUT localhost:9200/twitter/status/1 -d '
      {
        "text" : "Whats new in elasticsearch",
        "user" : {
          "name" : "Luca Cavanna",
          "screen_name" : "lucacavanna"
        },
        "place" : {
          "country" : "Netherlands",
          "country_code" : "nl"
        },
        "created_at" : "2014-07-16",
        "retweet_count" : 50
      }'
  14. Get

      $ curl -XGET localhost:9200/twitter/status/1

    Delete

      $ curl -XDELETE localhost:9200/twitter/status/1
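The same index, get and delete calls can also be scripted. A minimal Python sketch using only the standard library, assuming a node on `localhost:9200` as in the setup slide; `build_request` is a hypothetical helper, and the requests are only constructed here, not sent.

```python
import json
import urllib.request

BASE_URL = "http://localhost:9200"  # assumption: the local node from the setup slide


def build_request(method, path, body=None):
    """Build (but do not send) a request equivalent to the curl calls above."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    return urllib.request.Request(BASE_URL + path, data=data, method=method)


tweet = {
    "text": "Whats new in elasticsearch",
    "user": {"name": "Luca Cavanna", "screen_name": "lucacavanna"},
    "place": {"country": "Netherlands", "country_code": "nl"},
    "created_at": "2014-07-16",
    "retweet_count": 50,
}

index_req = build_request("PUT", "/twitter/status/1", tweet)   # index
get_req = build_request("GET", "/twitter/status/1")            # get
delete_req = build_request("DELETE", "/twitter/status/1")      # delete

# To actually send one against a running node:
# with urllib.request.urlopen(get_req) as resp:
#     print(json.load(resp)["_source"]["text"])
```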
  15. Search

      $ curl -XGET localhost:9200/_search?q=elasticsearch

      $ curl -XGET localhost:9200/_search -d '
      {
        "query" : {
          "query_string" : {
            "query" : "elasticsearch AND features"
          }
        }
      }'
  16. Search - query DSL

      $ curl -XGET localhost:9200/_search -d '
      {
        "query" : {
          "filtered" : {
            "query" : {
              "bool" : {
                "must" : [
                  { "match" : { "text" : { "query" : "elasticsearch features", "operator" : "AND" } } }
                ],
                "should" : [
                  { "match" : { "text" : "pizza" } }
                ]
              }
            },
            "filter" : {
              "range" : { "created_at" : { "from" : "2014-07-01" } }
            }
          }
        }
      }'
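To make the semantics of this query concrete, here is a toy in-memory evaluator in Python. It is not how Lucene matches or scores: lowercase/whitespace tokenizing stands in for the real analyzer, the `must` clause requires every term (the `AND` operator), each matching `should` clause adds one to a made-up score, and the `range` filter is a plain comparison of ISO date strings. What it illustrates is that the filter decides which documents are eligible but never changes their score. The helper names and sample documents are hypothetical.

```python
def tokens(text):
    # Stand-in for an analyzer: lowercase and split on whitespace.
    return set(text.lower().split())


def must_match_all(doc, query_text):
    # "match" with "operator": "AND": every query term must be in the field.
    return tokens(query_text) <= tokens(doc["text"])


def should_score(doc, should_texts):
    # With a "must" clause present, "should" clauses are optional;
    # here each matching one just adds 1 to a toy score.
    return sum(1 for t in should_texts if tokens(t) & tokens(doc["text"]))


def in_range(doc, field, from_date):
    # "range" with "from": inclusive lower bound; ISO dates compare as strings.
    return doc[field] >= from_date


def search(docs):
    hits = []
    for doc in docs:
        if not in_range(doc, "created_at", "2014-07-01"):
            continue  # filters include/exclude documents, they never score
        if not must_match_all(doc, "elasticsearch features"):
            continue
        hits.append((should_score(doc, ["pizza"]), doc["text"]))
    return sorted(hits, reverse=True)  # best toy score first


docs = [
    {"text": "Whats new in elasticsearch", "created_at": "2014-07-16"},
    {"text": "elasticsearch features and pizza", "created_at": "2014-07-16"},
    {"text": "elasticsearch features", "created_at": "2014-07-10"},
    {"text": "elasticsearch features", "created_at": "2014-06-30"},
]
print(search(docs))
```

The first document fails the `must` clause (no "features"), and the last one is dropped by the date filter before any matching happens.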
  17. snapshot & restore (photo by John, http://www.flickr.com/people/60026579@N00)
  18. backup in 0.90

    • disable flush
    • find the location of all primary shards (optional)
    • copy the files (rsync)
    • re-enable flush
  19. backup in 1.0 - repository

      $ curl -XPUT localhost:9200/_snapshot/local -d '
      {
        "type" : "fs",
        "settings" : {
          "location" : "/data/es/backup"
        }
      }'
  20. backup in 1.0 - snapshot

      $ curl -XPUT localhost:9200/_snapshot/local/backup_1 -d '
      {
        "indices" : "*,-twitter*"
      }'
  21. restore in 0.90

    • close the index
    • find all existing shards
    • replace their files with the ones from the backup
    • re-open the index
  22. restore in 1.0

    • close the index/indices

      $ curl -XPOST localhost:9200/2014-*/_close

    • restore the existing snapshot

      $ curl -XPOST localhost:9200/_snapshot/local/backup_1/_restore -d '
      {
        "indices" : "2014-*"
      }'
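The 1.0 workflow boils down to four HTTP calls. A minimal Python sketch that only builds them, in order, with the standard library; the node address is an assumption, the repository, snapshot and index names are taken from the slides, and `snapshot_restore_plan` is a hypothetical helper.

```python
import json
import urllib.request

BASE_URL = "http://localhost:9200"  # assumption: local node


def request(method, path, body=None):
    data = json.dumps(body).encode("utf-8") if body is not None else None
    return urllib.request.Request(BASE_URL + path, data=data, method=method)


def snapshot_restore_plan(repo, snapshot, location, restore_pattern):
    """Requests for: register repository, take snapshot, close indices, restore."""
    return [
        # 1. register a shared-filesystem repository
        request("PUT", f"/_snapshot/{repo}",
                {"type": "fs", "settings": {"location": location}}),
        # 2. snapshot every index except the twitter ones
        request("PUT", f"/_snapshot/{repo}/{snapshot}", {"indices": "*,-twitter*"}),
        # 3. an existing open index has to be closed before restoring over it
        request("POST", f"/{restore_pattern}/_close"),
        # 4. restore only the matching indices from the snapshot
        request("POST", f"/_snapshot/{repo}/{snapshot}/_restore",
                {"indices": restore_pattern}),
    ]


plan = snapshot_restore_plan("local", "backup_1", "/data/es/backup", "2014-*")
for req in plan:
    print(req.get_method(), req.full_url)
```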
  23. Facets in 0.90

    • terms / terms stats
    • range
    • histogram / date histogram
    • statistical
    • geo distance
    • filter / query
  24. retweets stats per user

      $ curl -XGET localhost:9200/twitter/_search -d '
      {
        "facets" : {
          "retweets_per_user" : {
            "terms_stats" : {
              "key_field" : "user.screen_name",
              "value_field" : "retweet_count"
            }
          }
        }
      }'
  25. retweets stats per user

      {
        "facets" : {
          "retweets_per_user" : {
            "_type" : "terms_stats",
            "missing" : 0,
            "terms" : [ {
              "term" : "lucacavanna",
              "count" : 1,
              "total_count" : 1,
              "min" : 50.0,
              "max" : 50.0,
              "total" : 50.0,
              "mean" : 50.0
            } ]
          }
        }
      }
  26. give me the retweets stats per month, per user… cool, then…
  27. retweets stats per month per user

      $ curl -XGET localhost:9200/twitter/_search -d '
      {
        "aggs" : {
          "month" : {
            "date_histogram" : { "field" : "created_at", "interval" : "month" },
            "aggs" : {
              "user" : {
                "terms" : { "field" : "user.screen_name" },
                "aggs" : {
                  "retweets" : { "stats" : { "field" : "retweet_count" } }
                }
              }
            }
          }
        }
      }'
  28. retweets stats per month per user

      {
        "aggregations" : {
          "month" : {
            "buckets" : [ {
              "key_as_string" : "Tue Jul 01 00:00:00 +0000 2014",
              "key" : 1404172800000,
              "doc_count" : 1,
              "user" : {
                "buckets" : [ {
                  "key" : "lucacavanna",
                  "doc_count" : 1,
                  "retweets" : {
                    "count" : 1,
                    "min" : 50,
                    "max" : 50,
                    "avg" : 50,
                    "sum" : 50
                  }
                } ]
              }
            } ]
          }
        }
      }
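What this nested aggregation computes can be reproduced over an in-memory list of tweets: bucket by calendar month, sub-bucket by screen name, then compute `count`/`min`/`max`/`avg`/`sum` per bucket. Dropping the month level gives roughly what the 0.90 `terms_stats` facet returned. This Python sketch only illustrates the semantics (Elasticsearch computes buckets per shard and merges them); the helper names are hypothetical, and month keys are assumed to be UTC epoch milliseconds, like the `key` field in the response.

```python
from collections import defaultdict
from datetime import datetime, timezone


def month_key(date_str):
    # Start of the calendar month, in UTC epoch milliseconds (a bucket "key").
    d = datetime.strptime(date_str, "%Y-%m-%d").replace(day=1, tzinfo=timezone.utc)
    return int(d.timestamp() * 1000)


def stats(values):
    return {"count": len(values), "min": min(values), "max": max(values),
            "avg": sum(values) / len(values), "sum": sum(values)}


def retweets_per_month_per_user(tweets):
    # date_histogram -> terms -> stats, as nested dictionaries
    buckets = defaultdict(lambda: defaultdict(list))
    for t in tweets:
        month = month_key(t["created_at"])
        buckets[month][t["user"]["screen_name"]].append(t["retweet_count"])
    return {
        month: {user: stats(vals) for user, vals in users.items()}
        for month, users in sorted(buckets.items())
    }


tweets = [{"created_at": "2014-07-16",
           "user": {"screen_name": "lucacavanna"},
           "retweet_count": 50}]
print(retweets_per_month_per_user(tweets))
```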
  29. buckets

    • global
    • filter
    • missing
    • terms
    • range
    • date_range
    • ipv4_range
    • histogram
    • date_histogram
    • geo_distance
    • geohash_grid
    • nested
  30. metrics

    • value_count
    • stats
    • extended_stats
    • avg
    • min
    • max
    • sum
  31. register query

      $ curl -XPUT localhost:9200/twitter/.percolator/es-features -d '
      {
        "query" : {
          "match" : { "text" : "elasticsearch AND features" }
        },
        "alert_type" : "mention"
      }'
  32. percolate document

      $ curl -XGET localhost:9200/twitter/tweet/_percolate -d '
      {
        "doc" : {
          "text" : "Whats new in elasticsearch",
          "user" : { "name" : "Luca Cavanna", "screen_name" : "lucacavanna" },
          "created_at" : "2014-07-16"
        }
      }'
      {
        …
        "total" : 1,
        "matches" : [ { "_index" : "twitter", "_id" : "es-features" } ]
      }
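Percolation inverts search: the queries are stored, and each incoming document is run against all of them. A toy Python sketch, assuming only `match` queries and a lowercase/whitespace tokenizer in place of the real analyzer; all names are hypothetical. It also shows why the sample tweet matches: `match` combines terms with `or` by default and does not treat `AND` as a boolean operator (that is `query_string` syntax), so sharing the single term `elasticsearch` is enough.

```python
percolator = {}  # query id -> registered query (stand-in for the .percolator type)


def tokens(text):
    # Stand-in analyzer: lowercase, split on whitespace.
    return set(text.lower().split())


def register(query_id, query):
    percolator[query_id] = query


def matches(doc, query):
    field, text = next(iter(query["match"].items()))
    # "match" uses the "or" operator by default: one shared term is enough.
    return bool(tokens(text) & tokens(doc.get(field, "")))


def percolate(doc):
    ids = [qid for qid, q in percolator.items() if matches(doc, q["query"])]
    return {"total": len(ids),
            "matches": [{"_index": "twitter", "_id": qid} for qid in ids]}


register("es-features", {"query": {"match": {"text": "elasticsearch AND features"}},
                         "alert_type": "mention"})
print(percolate({"text": "Whats new in elasticsearch"}))
```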
  33. 0.90 VS 1.x

      0.90                        1.x
      single shard                arbitrary number of shards
      sequential execution        parallel execution
      _percolator index           .percolator type (any index)
      single index percolation    multi index percolation
  34. new percolation features in 1.0

    • percolate existing documents
    • percolate count api
    • filter support (in addition to queries)
    • highlighting
    • scoring
    • multi percolate
    • support for aggregations
  35. Which node is the master? (0.90)

      $ curl localhost:9200/_cluster/state/nodes,master_node?pretty
      {
        "cluster_name" : "elasticsearch",
        "master_node" : "yT4GUfIWTY6aJdQtWVEFpw",
        "nodes" : {
          "R-5_0LiORAWmr_cYLXO69Q" : {
            "name" : "Woodgod",
            "transport_address" : "inet[/192.168.0.12:9302]",
            "attributes" : {}
          },
          "yT4GUfIWTY6aJdQtWVEFpw" : {
            "name" : "Moondark",
            "transport_address" : "inet[/192.168.0.12:9300]",
            "attributes" : {}
          },
          "pR0NmKeGTVGget2O1qSqCQ" : {
            "name" : "Adaptoid",
            "transport_address" : "inet[/192.168.0.12:9301]",
            "attributes" : {}
          }
        }
      }
  36. Which node is the master? (0.90)
  37. Which node is the master? (0.90)
  38. Which node is the master? (0.90)
  39. Which node is the master? (1.0)

      $ curl localhost:9200/_cat/master
      yT4GUfIWTY6aJdQtWVEFpw Lucas-MacBook-Air.local 192.168.0.12 Moondark
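Both ways of finding the master can be compared in a short Python sketch: with the full cluster state you look the `master_node` id up in the `nodes` map, while `_cat/master` already prints a single row of id, host, ip and node name. The JSON is the sample from the slides (with the empty `attributes` dropped); the helper names are hypothetical.

```python
import json

# Cluster state sample from the 0.90 slides.
cluster_state = json.loads("""
{
  "cluster_name": "elasticsearch",
  "master_node": "yT4GUfIWTY6aJdQtWVEFpw",
  "nodes": {
    "R-5_0LiORAWmr_cYLXO69Q": {"name": "Woodgod", "transport_address": "inet[/192.168.0.12:9302]"},
    "yT4GUfIWTY6aJdQtWVEFpw": {"name": "Moondark", "transport_address": "inet[/192.168.0.12:9300]"},
    "pR0NmKeGTVGget2O1qSqCQ": {"name": "Adaptoid", "transport_address": "inet[/192.168.0.12:9301]"}
  }
}
""")


def master_from_cluster_state(state):
    # 0.90 way: resolve the master's node id through the "nodes" map.
    return state["nodes"][state["master_node"]]["name"]


def master_from_cat(line):
    # 1.0 way: _cat/master prints "id host ip name"; a node name may contain
    # spaces, so split at most three times and keep the rest as the name.
    _node_id, _host, _ip, name = line.strip().split(None, 3)
    return name


cat_output = "yT4GUfIWTY6aJdQtWVEFpw Lucas-MacBook-Air.local 192.168.0.12 Moondark"
print(master_from_cluster_state(cluster_state), master_from_cat(cat_output))
```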
  40. _cat api

    • /_cat/aliases
    • /_cat/allocation
    • /_cat/count
    • /_cat/health
    • /_cat/indices
    • /_cat/master
    • /_cat/nodes
    • /_cat/pending_tasks
    • /_cat/thread_pool
    • /_cat/shards
    • /_cat/plugins
  41. …and more

    • disk-based field data (aka doc values)
    • field data circuit breaker
    • tribe node (aka federated search)
  42. …count all the things (cardinality aggregation)
  43. How many distinct users tweeted, per month, per country?
  44. Distinct users per month, per country

      $ curl -XGET localhost:9200/twitter/_search -d '
      {
        "aggs" : {
          "month" : {
            "date_histogram" : { "field" : "created_at", "interval" : "month" },
            "aggs" : {
              "country" : {
                "terms" : { "field" : "place.country.keyword" },
                "aggs" : {
                  "distinct_users" : { "cardinality" : { "field" : "user.screen_name" } }
                }
              }
            }
          }
        }
      }'
  45. Distinct users per month, per country

      {
        "aggregations" : {
          "month" : {
            "buckets" : [ {
              "key_as_string" : "2014-04-01T00:00:00.000Z",
              "key" : 1396310400000,
              "doc_count" : 1097354,
              "country" : {
                "buckets" : [ {
                  "key" : "United States",
                  "doc_count" : 501244,
                  "distinct_users" : { "value" : 471504 }
                }, {
                  "key" : "Indonesia",
                  "doc_count" : 452933,
                  "distinct_users" : { "value" : 312002 }
                } ]
              }
            } ]
          }
        }
      }
  46. cardinality aggregation

    • HyperLogLog++ algorithm
    • Approximate counts based on the hashes of the field values
    • Configurable precision: trade memory for accuracy
    • Allows providing hashes while indexing
    • Allows computing hashes at index time
    • Scripting support
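To give an idea of how counting by hashes works, here is a compact Python sketch of plain HyperLogLog. It is a simplified relative of the HyperLogLog++ used by the cardinality aggregation (no sparse representation, no empirical bias correction), and the hash function, BLAKE2b truncated to 64 bits, is an arbitrary choice for the sketch. The `p` parameter shows the memory/accuracy trade-off: 2^p registers and a relative standard error of roughly 1.04/√(2^p). It is related to, but not the same as, the aggregation's own precision setting.

```python
import hashlib
import math


class HyperLogLog:
    """Plain HyperLogLog with linear counting for small cardinalities."""

    def __init__(self, p=14):
        self.p = p                      # precision: 2**p registers
        self.m = 1 << p
        self.registers = [0] * self.m
        self.alpha = 0.7213 / (1 + 1.079 / self.m)  # bias constant for m >= 128

    def _hash64(self, value):
        digest = hashlib.blake2b(str(value).encode("utf-8"), digest_size=8).digest()
        return int.from_bytes(digest, "big")

    def add(self, value):
        x = self._hash64(value)
        width = 64 - self.p
        idx = x >> width                       # first p bits choose a register
        rest = x & ((1 << width) - 1)          # remaining bits
        rank = width - rest.bit_length() + 1   # position of the leftmost 1-bit
        self.registers[idx] = max(self.registers[idx], rank)

    def count(self):
        raw = self.alpha * self.m * self.m / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * self.m and zeros:
            # small-range correction: linear counting on the empty registers
            raw = self.m * math.log(self.m / zeros)
        return round(raw)


hll = HyperLogLog(p=14)
for i in range(10000):
    hll.add(f"user{i}")
    hll.add(f"user{i}")  # duplicates never change the registers
print(hll.count())       # an estimate near 10000
```

Because each register only keeps a maximum, re-adding a value is a no-op, and two sketches with the same `p` could be merged by taking the per-register maximum, which is what makes this family of algorithms convenient for per-shard counting.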
  47. …know your data (percentiles aggregation)
  48. How is the number of retweets distributed, per month, per country?
  49. Retweets stats per month, per country

      $ curl -XGET localhost:9200/twitter/_search -d '
      {
        "aggs" : {
          "month" : {
            "date_histogram" : { "field" : "created_at", "interval" : "month" },
            "aggs" : {
              "country" : {
                "terms" : { "field" : "place.country.keyword" },
                "aggs" : {
                  "retweets" : { "stats" : { "field" : "retweet_count" } }
                }
              }
            }
          }
        }
      }'
  50. Retweets stats per month, per country

      {
        "aggregations" : {
          "month" : {
            "buckets" : [ {
              "key_as_string" : "2014-04-01T00:00:00.000Z",
              "key" : 1396310400000,
              "doc_count" : 1097354,
              "country" : {
                "buckets" : [ {
                  "key" : "United States",
                  "doc_count" : 169442,
                  "retweets" : {
                    "min" : 0,
                    "max" : 230681,
                    "avg" : 946.0165939898582,
                    "sum" : 47945067
                  }
                } ]
              }
            } ]
          }
        }
      }
  51. Interesting… but how about the outliers?
  52. Retweets per month, per country

      $ curl -XGET localhost:9200/twitter/_search -d '
      {
        "aggs" : {
          "month" : {
            "date_histogram" : { "field" : "created_at", "interval" : "month" },
            "aggs" : {
              "country" : {
                "terms" : { "field" : "place.country.keyword" },
                "aggs" : {
                  "retweets" : { "percentiles" : { "field" : "retweet_count" } }
                }
              }
            }
          }
        }
      }'
  53. Retweets per month, per country

      {
        "aggregations" : {
          "month" : {
            "buckets" : [ {
              "key_as_string" : "2014-04-01T00:00:00.000Z",
              "key" : 1396310400000,
              "doc_count" : 1097354,
              "country" : {
                "buckets" : [ {
                  "key" : "United States",
                  "doc_count" : 169442,
                  "retweets" : {
                    "1.0" : 1,
                    "5.0" : 1,
                    "25.0" : 2,
                    "50.0" : 21.927867004790084,
                    "75.0" : 218.26625104274626,
                    "95.0" : 3199.6148040638604,
                    "99.0" : 15889.028205128077
                  }
                } ]
              }
            } ]
          }
        }
      }
  54. percentiles aggregation

    • t-digest algorithm
    • Approximate percentiles
    • Configurable compression: trade memory for accuracy
    • Request specific percentiles only
    • Scripting support
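t-digest approximates what an exact computation over the fully sorted data would return. For reference, a Python sketch of that exact computation, using linear interpolation between closest ranks (one of several common percentile definitions, so Elasticsearch's approximate numbers will not match it exactly). Keeping every value sorted in memory is precisely the cost t-digest avoids. The skewed sample data is made up to echo the outlier question: the mean is pulled far away from the median.

```python
def percentile(sorted_values, pct):
    """Exact percentile (0-100) using linear interpolation between closest ranks."""
    if not sorted_values:
        raise ValueError("no values")
    rank = pct / 100 * (len(sorted_values) - 1)
    lower = int(rank)
    upper = min(lower + 1, len(sorted_values) - 1)
    fraction = rank - lower
    return sorted_values[lower] + (sorted_values[upper] - sorted_values[lower]) * fraction


def percentiles(values, pcts=(1, 5, 25, 50, 75, 95, 99)):
    # Same default percentiles as in the response above, keyed like "50.0".
    ordered = sorted(values)
    return {str(float(p)): percentile(ordered, p) for p in pcts}


# Made-up, skewed retweet counts: most tweets get none, a few go viral.
retweets = [0] * 90 + [1000] * 10
print(sum(retweets) / len(retweets))   # mean: 100.0
print(percentiles(retweets)["50.0"])   # median: 0.0
```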
  55. …revealing the uncommonly common (significant terms)
  56. What’s the right hashtag for… “tulip”?
  57. Right hashtag for “tulip”

      $ curl -XGET localhost:9200/twitter/_search -d '
      {
        "query" : { "match" : { "text" : "tulip" } },
        "aggs" : {
          "interesting_tags" : {
            "significant_terms" : { "field" : "hashtags.text" }
          }
        }
      }'
  58. Right hashtag for “tulip”

      {
        "aggregations" : {
          "interesting_tags" : {
            "doc_count" : 40,
            "buckets" : [ {
              "key" : "spring",
              "doc_count" : 38,
              "score" : 3397.32,
              "bg_count" : 45
            } ]
          }
        }
      }
  59. significant terms

    • Background set: the whole index
    • Foreground set: documents matching the query
    • Approximate counts
    • Configurable shard_size for better accuracy
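The score ranks terms by how much more common they are in the foreground set than in the background set. A Python sketch of the JLH heuristic, which is believed to be the default scoring for `significant_terms` in the 1.x releases; treat the exact formula as an assumption. The slide does not give the total number of documents in the index, so the background sizes in the example are made up and the resulting scores will not reproduce the 3397.32 shown for `spring`.

```python
def jlh_score(subset_freq, subset_size, superset_freq, superset_size):
    """JLH: absolute change in a term's probability times its relative change."""
    if subset_size == 0 or superset_size == 0 or superset_freq == 0:
        return 0.0
    fg = subset_freq / subset_size        # share of foreground (query) docs with the term
    bg = superset_freq / superset_size    # share of background (index) docs with the term
    if fg <= bg:
        return 0.0                        # not over-represented: not significant
    return (fg - bg) * (fg / bg)


# "spring" is in 38 of the 40 tulip tweets and in 45 docs overall;
# the index size of 100,000 is made up for this example.
print(jlh_score(38, 40, 45, 100_000))
# A hashtag that is common everywhere scores far lower.
print(jlh_score(30, 40, 60_000, 100_000))
```

Multiplying the absolute change by the relative change is what favours the "uncommonly common": a term must be both frequent in the foreground and rare in the background to score highly.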
  60. …and more

    • better cross field queries
    • search templates
    • aliases support in index templates
    • recovery api & _cat/recovery
    • _cat/segments
  61. …filtered suggestions (context suggester)
  62. Suggest songs of same genre

      $ curl -XPOST localhost:9200/songs/_suggest -d '
      {
        "suggest" : {
          "text" : "a",
          "completion" : {
            "field" : "suggest_field",
            "size" : 10,
            "context" : { "genre" : "rock" }
          }
        }
      }'
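Conceptually, the context suggester is a prefix completion restricted to suggestions that were indexed with a matching context value. A toy Python sketch of that behaviour with a linear scan (the real completion suggester uses an in-memory FST, so this says nothing about performance); the `Entry` fields, weights and song titles are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    input: str       # text to complete against
    genre: str       # context value stored with the suggestion at index time
    weight: int = 1  # higher weight ranks first


def suggest(entries, prefix, context, size=10):
    """Prefix completions for `prefix`, restricted to entries matching `context`."""
    prefix = prefix.lower()
    hits = [e for e in entries
            if e.genre == context["genre"] and e.input.lower().startswith(prefix)]
    hits.sort(key=lambda e: (-e.weight, e.input))  # by weight, then alphabetically
    return [e.input for e in hits[:size]]


songs = [
    Entry("Another One Bites the Dust", "rock", 5),
    Entry("All Along the Watchtower", "rock", 3),
    Entry("Alright", "hip hop", 9),
    Entry("Back in Black", "rock", 7),
]
print(suggest(songs, "a", {"genre": "rock"}))
```

"Alright" has the highest weight and starts with "a", but it is excluded because its context does not match, which is exactly the filtering the slide's request asks for.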
  63. …and more

    • Java 1.7 required
    • reverse nested aggregation
    • background filtering for significant terms
    • dynamic scripting disabled by default
  64. Give me the top tweets grouped by hashtag
  65. Top tweets per hashtag

      $ curl -XPOST localhost:9200/_search -d '
      {
        "aggs" : {
          "hashtag" : {
            "terms" : { "field" : "hashtag" },
            "aggs" : {
              "top_tweets" : {
                "top_hits" : {
                  "_source" : { "include" : [ "text" ] }
                }
              }
            }
          }
        }
      }'
  66. Top tweets per hashtag

      "aggregations" : {
        "hashtag" : {
          "buckets" : [ {
            "key" : "elasticsearch",
            "doc_count" : 1,
            "top_tweets" : {
              "hits" : {
                "total" : 1,
                "max_score" : 1.0,
                "hits" : [ {
                  "_index" : "twitter",
                  "_type" : "status",
                  "_id" : "1",
                  "_score" : 1.0,
                  "_source" : { "text" : "Whats new in elasticsearch" }
                } ]
              }
            }
          } ]
        }
      }
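Conceptually, `top_hits` keeps the best-scoring documents inside each bucket of its parent aggregation. A Python sketch over already-scored hits, assuming a terms bucket on a field that may hold one value or a list, terms buckets ordered by document count, and `size` defaulting to 3 as `top_hits` does; the helper name and the sample hits are hypothetical.

```python
from collections import defaultdict


def top_hits_per_term(hits, field, size=3, include=("text",)):
    """Group scored hits into terms buckets and keep the best `size` per bucket."""
    buckets = defaultdict(list)
    for hit in hits:
        values = hit["_source"][field]
        # a document with several hashtags lands in several buckets
        for value in values if isinstance(values, list) else [values]:
            buckets[value].append(hit)
    result = []
    # terms buckets: most documents first, ties broken by key
    for key, docs in sorted(buckets.items(), key=lambda kv: (-len(kv[1]), kv[0])):
        best = sorted(docs, key=lambda d: -d["_score"])[:size]
        result.append({
            "key": key,
            "doc_count": len(docs),
            "top_tweets": [
                {"_id": d["_id"], "_score": d["_score"],
                 # source filtering: keep only the included fields
                 "_source": {k: d["_source"][k] for k in include if k in d["_source"]}}
                for d in best
            ],
        })
    return result


hits = [
    {"_id": "1", "_score": 1.0,
     "_source": {"text": "Whats new in elasticsearch", "hashtag": "elasticsearch"}},
    {"_id": "2", "_score": 2.5,
     "_source": {"text": "Aggregations rock", "hashtag": ["elasticsearch", "aggs"]}},
]
print(top_hits_per_term(hits, "hashtag"))
```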
  67. …and more

    • search templates stored in an index
    • groovy as a scripting language
    • facets are being deprecated
  68. thank you!

    Support: http://elasticsearch.com/support
    Training: http://training.elasticsearch.com
    We are hiring: http://elasticsearch.com/about/jobs/
    @lucacavanna