New features in Elasticsearch 1.0/1.1

Presentation given at the Amsterdam elasticsearch meetup on April 3rd 2014.

Luca Cavanna

April 03, 2014

Transcript

  1. Copyright Elasticsearch 2014. Copying, publishing and/or distributing without written permission is strictly prohibited.
     New features in Elasticsearch 1.1
     @lucacavanna
  2. JSON • distributed • real-time • analytics • RESTful • Lucene • open source • schema-free • document oriented • search
  3. Setup
     $ wget https://download.elasticsearch.org/elasticsearch/elasticsearch/elasticsearch-1.1.0.zip
     $ unzip elasticsearch-1.1.0.zip
     $ cd elasticsearch-1.1.0
     $ bin/elasticsearch
  4. Is it alive?
     $ curl localhost:9200
     {
       "status" : 200,
       "name" : "Moondark",
       "version" : {
         "number" : "1.1.0",
         "build_hash" : "2181e113dea80b4a9e31e58e9686658a2d46e363",
         "build_timestamp" : "2014-03-25T15:59:51Z",
         "build_snapshot" : false,
         "lucene_version" : "4.7"
       },
       "tagline" : "You Know, for Search"
     }
  5. Index
     $ curl -XPUT localhost:9200/twitter/status/1 -d '
     {
       "text" : "New features in elasticsearch 1.1",
       "user" : { "name" : "Luca Cavanna", "screen_name" : "lucacavanna" },
       "place" : { "country" : "Netherlands", "country_code" : "nl" },
       "created_at" : "2014-04-03",
       "retweet_count" : 50
     }'
  6. Get
     $ curl -XGET localhost:9200/twitter/status/1
     Delete
     $ curl -XDELETE localhost:9200/twitter/status/1
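The index, get and delete calls above differ only in HTTP verb and body. As a rough sketch, the same three requests can be assembled with Python's standard library. `BASE` and the `build_request` helper are assumptions made for illustration, and nothing is sent over the network.

```python
import json
from urllib.request import Request

BASE = "http://localhost:9200"  # assumption: a local node on the default port


def build_request(method, path, body=None):
    """Build (but do not send) the HTTP request behind one of the curl calls."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    return Request(BASE + path, data=data, method=method)


# The document from the "Index" slide.
tweet = {
    "text": "New features in elasticsearch 1.1",
    "user": {"name": "Luca Cavanna", "screen_name": "lucacavanna"},
    "place": {"country": "Netherlands", "country_code": "nl"},
    "created_at": "2014-04-03",
    "retweet_count": 50,
}

index_req = build_request("PUT", "/twitter/status/1", tweet)
get_req = build_request("GET", "/twitter/status/1")
delete_req = build_request("DELETE", "/twitter/status/1")
```

With a node running, passing any of these objects to `urllib.request.urlopen` would execute the call.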
  7. Search
     $ curl -XGET 'localhost:9200/_search?q=elasticsearch'
     $ curl -XGET localhost:9200/_search -d '
     {
       "query" : {
         "query_string" : { "query" : "elasticsearch AND features" }
       }
     }'
  8. Search - query DSL
     $ curl -XGET localhost:9200/_search -d '
     {
       "query" : {
         "filtered" : {
           "query" : {
             "bool" : {
               "must" : [
                 { "match" : { "text" : { "query" : "elasticsearch features", "operator" : "AND" } } }
               ],
               "should" : [
                 { "match" : { "text" : "pizza" } }
               ]
             }
           },
           "filter" : {
             "range" : { "created_at" : { "from" : "2014-04-01" } }
           }
         }
       }
     }'
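The query DSL body above combines three parts: `must` clauses that a document has to match, `should` clauses that are optional but raise the score, and a `filter` that restricts results without affecting scoring. A minimal sketch of building the same body programmatically follows. The `filtered_query` function and its parameter names are hypothetical.

```python
import json


def filtered_query(required_text, optional_text, since):
    """Return the 1.x-style filtered query from the slide as a plain dict."""
    return {
        "query": {
            "filtered": {
                "query": {
                    "bool": {
                        # every term of required_text has to match
                        "must": [
                            {"match": {"text": {"query": required_text, "operator": "AND"}}}
                        ],
                        # optional clause: matching documents score higher
                        "should": [{"match": {"text": optional_text}}],
                    }
                },
                # the filter narrows the result set but does not contribute to the score
                "filter": {"range": {"created_at": {"from": since}}},
            }
        }
    }


body = filtered_query("elasticsearch features", "pizza", "2014-04-01")
payload = json.dumps(body)  # string suitable for curl's -d argument
```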
  9. snapshot & restore
     (Photo by John, http://www.flickr.com/people/60026579@N00)
  10. backup in 0.90
     • disable flush
     • find all primary shards location (optional)
     • copy files (rsync)
     • re-enable flush
  11. backup in 1.0 - repository
     $ curl -XPUT localhost:9200/_snapshot/local -d '
     {
       "type" : "fs",
       "settings" : { "location" : "/data/es/backup" }
     }'
  12. backup in 1.0 - snapshot
     $ curl -XPUT localhost:9200/_snapshot/local/backup_1 -d '
     {
       "indices" : "*,-twitter*"
     }'
  13. restore in 0.90
     • close the index
     • find all existing shards
     • replace files with ones from backup
     • re-open the index
  14. restore in 1.0
     • close the index/indices
       $ curl -XPOST 'localhost:9200/2014-*/_close'
     • restore existing snapshot
       $ curl -XPOST localhost:9200/_snapshot/local/backup_1/_restore -d '
       {
         "indices" : "2014-*"
       }'
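The 1.0 backup and restore steps reduce to four REST calls. The sketch below lists them as plain `(method, path, body)` tuples so the order is explicit. The function names are hypothetical, only endpoints shown in the slides are used, and no request is actually sent.

```python
def snapshot_plan(repo, location, snapshot, indices):
    """Calls needed to register a filesystem repository and take a snapshot."""
    return [
        ("PUT", f"/_snapshot/{repo}", {"type": "fs", "settings": {"location": location}}),
        ("PUT", f"/_snapshot/{repo}/{snapshot}", {"indices": indices}),
    ]


def restore_plan(repo, snapshot, indices):
    """Calls needed to restore: close the target indices first, then restore."""
    return [
        ("POST", f"/{indices}/_close", None),
        ("POST", f"/_snapshot/{repo}/{snapshot}/_restore", {"indices": indices}),
    ]


backup_calls = snapshot_plan("local", "/data/es/backup", "backup_1", "*,-twitter*")
restore_calls = restore_plan("local", "backup_1", "2014-*")
```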
  15. Facets in 0.90
     • terms / terms stats
     • range
     • histogram / date histogram
     • statistical
     • geo distance
     • filter / query
  16. retweets stats per user
     $ curl -XGET localhost:9200/twitter/_search -d '
     {
       "facets" : {
         "retweets_per_user" : {
           "terms_stats" : {
             "key_field" : "user.screen_name",
             "value_field" : "retweet_count"
           }
         }
       }
     }'
  17. retweets stats per user
     {
       "facets" : {
         "retweets_per_user" : {
           "_type" : "terms_stats",
           "missing" : 0,
           "terms" : [{
             "term" : "lucacavanna",
             "count" : 1, "total_count" : 1,
             "min" : 50.0, "max" : 50.0, "total" : 50.0, "mean" : 50.0
           }]
         }
       }
     }
  18. cool, then… give me the retweets stats per month, per user
  19. retweets stats per month per user
     $ curl -XGET localhost:9200/twitter/_search -d '
     {
       "aggs" : {
         "month" : {
           "date_histogram" : { "field" : "created_at", "interval" : "month" },
           "aggs" : {
             "user" : {
               "terms" : { "field" : "user.screen_name" },
               "aggs" : {
                 "retweets" : { "stats" : { "field" : "retweet_count" } }
               }
             }
           }
         }
       }
     }'
  20. retweets stats per month per user
     {
       "aggregations" : {
         "month" : {
           "buckets" : [ {
             "key_as_string" : "Tue Apr 01 00:00:00 +0000 2014",
             "key" : 1396310400000,
             "doc_count" : 1,
             "user" : {
               "buckets" : [ {
                 "key" : "lucacavanna",
                 "doc_count" : 1,
                 "retweets" : { "count" : 1, "min" : 50, "max" : 50, "avg" : 50, "sum" : 50 }
               } ]
             }
           } ]
         }
       }
     }
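Nested aggregations come back as nested `buckets` arrays, one level per aggregation. As a small sketch, the response shown in the slide can be flattened into one row per (month, user) pair. The `flatten` function and the row layout are illustrative choices, not part of any client library.

```python
# The response from the slide, as a Python dict.
response = {
    "aggregations": {
        "month": {
            "buckets": [
                {
                    "key_as_string": "Tue Apr 01 00:00:00 +0000 2014",
                    "key": 1396310400000,
                    "doc_count": 1,
                    "user": {
                        "buckets": [
                            {
                                "key": "lucacavanna",
                                "doc_count": 1,
                                "retweets": {"count": 1, "min": 50, "max": 50, "avg": 50, "sum": 50},
                            }
                        ]
                    },
                }
            ]
        }
    }
}


def flatten(resp):
    """Walk month buckets, then user buckets, and emit one tuple per leaf."""
    rows = []
    for month in resp["aggregations"]["month"]["buckets"]:
        for user in month["user"]["buckets"]:
            stats = user["retweets"]
            rows.append((month["key_as_string"], user["key"], stats["count"], stats["avg"]))
    return rows


rows = flatten(response)
```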
  21. buckets
     • global • filter • missing • terms • range • date_range • ipv4_range • histogram • date_histogram • geo_distance • geohash_grid • nested
  22. metrics
     • value_count • stats • extended_stats • avg • min • max • sum
  23. register query
     $ curl -XPUT localhost:9200/twitter/.percolator/es-features -d '
     {
       "query" : {
         "match" : { "text" : "elasticsearch AND features" }
       },
       "alert_type" : "mention"
     }'
  24. percolate document
     $ curl -XGET localhost:9200/twitter/tweet/_percolate -d '
     {
       "doc" : {
         "text" : "New features in elasticsearch 1.0",
         "user" : { "name" : "Luca Cavanna", "screen_name" : "lucacavanna" },
         "created_at" : "2014-03-18"
       }
     }'
     {
       …
       "total" : 1,
       "matches" : [{ "_index" : "twitter", "_id" : "es-features" }]
     }
  25. 0.90 vs 1.x
     0.90:
     • single shard
     • sequential execution
     • _percolator index
     • single index percolation
     1.x:
     • arbitrary number of shards
     • parallel execution
     • .percolator type (any index)
     • multi index percolation
  26. new percolation features in 1.0
     • percolate existing documents
     • percolate count api
     • filter support (in addition to queries)
     • highlighting
     • scoring
     • multi percolate
     • support for aggregations
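Percolation inverts the usual search flow: queries are stored first, and each incoming document is checked against them. The toy class below only illustrates that idea with an in-memory "all terms must be present" match. It is not how Elasticsearch implements percolation, and `ToyPercolator` and its methods are hypothetical names.

```python
class ToyPercolator:
    """Stores queries up front, then reports which ones a new document matches."""

    def __init__(self):
        self.queries = {}  # query id -> set of terms that must all be present

    def register(self, query_id, required_terms):
        self.queries[query_id] = {term.lower() for term in required_terms}

    def percolate(self, doc_text):
        # Naive whitespace tokenisation; a real analyzer does much more.
        tokens = set(doc_text.lower().split())
        return sorted(qid for qid, terms in self.queries.items() if terms <= tokens)


percolator = ToyPercolator()
percolator.register("es-features", ["elasticsearch", "features"])
percolator.register("pizza", ["pizza"])

matches = percolator.percolate("New features in elasticsearch 1.0")
```

Here `matches` contains only `es-features`, mirroring the single match in the percolate response on the slide.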
  27. Which node is the master? (0.90)
     $ curl 'localhost:9200/_cluster/state/nodes,master_node?pretty'
     {
       "cluster_name" : "elasticsearch",
       "master_node" : "yT4GUfIWTY6aJdQtWVEFpw",
       "nodes" : {
         "R-5_0LiORAWmr_cYLXO69Q" : {
           "name" : "Woodgod",
           "transport_address" : "inet[/192.168.0.12:9302]",
           "attributes" : {}
         },
         "yT4GUfIWTY6aJdQtWVEFpw" : {
           "name" : "Moondark",
           "transport_address" : "inet[/192.168.0.12:9300]",
           "attributes" : {}
         },
         "pR0NmKeGTVGget2O1qSqCQ" : {
           "name" : "Adaptoid",
           "transport_address" : "inet[/192.168.0.12:9301]",
           "attributes" : {}
         }
       }
     }
  31. Which node is the master? (1.0)
     $ curl localhost:9200/_cat/master
     yT4GUfIWTY6aJdQtWVEFpw Lucas-MacBook-Air.local 192.168.0.12 Moondark
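The difference between the two approaches shows up when the answer has to be extracted by a script. In the sketch below the cluster state requires a lookup of the `master_node` id inside `nodes`, while the `_cat/master` line only needs to be split. Both helper functions are hypothetical and the inputs are copied from the slides.

```python
import json

# Trimmed copy of the cluster state response from the slides.
cluster_state = json.loads("""
{
  "cluster_name": "elasticsearch",
  "master_node": "yT4GUfIWTY6aJdQtWVEFpw",
  "nodes": {
    "R-5_0LiORAWmr_cYLXO69Q": {"name": "Woodgod"},
    "yT4GUfIWTY6aJdQtWVEFpw": {"name": "Moondark"},
    "pR0NmKeGTVGget2O1qSqCQ": {"name": "Adaptoid"}
  }
}
""")

cat_master_line = "yT4GUfIWTY6aJdQtWVEFpw Lucas-MacBook-Air.local 192.168.0.12 Moondark"


def master_from_state(state):
    """Resolve the master node id to its name via the nodes map."""
    return state["nodes"][state["master_node"]]["name"]


def master_from_cat(line):
    """Columns are id, host, ip, name; maxsplit keeps names containing spaces intact."""
    node_id, host, ip, name = line.split(None, 3)
    return name


name_from_state = master_from_state(cluster_state)
name_from_cat = master_from_cat(cat_master_line)
```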
  32. _cat/* api
     • /_cat/aliases
     • /_cat/allocation
     • /_cat/count
     • /_cat/health
     • /_cat/indices
     • /_cat/master
     • /_cat/nodes
     • /_cat/pending_tasks
     • /_cat/thread_pool
     • /_cat/shards
     • /_cat/plugins
  33. …and more
     • disk based field data (aka doc values)
     • field data circuit breaker
     • tribe node (aka federated search)
  34. cardinality aggregation
     …count all the things
  35. How many distinct users tweeted, per month, per country?
  36. Distinct users per month, per country
     $ curl -XGET localhost:9200/twitter/_search -d '
     {
       "aggs" : {
         "month" : {
           "date_histogram" : { "field" : "created_at", "interval" : "month" },
           "aggs" : {
             "country" : {
               "terms" : { "field" : "place.country.keyword" },
               "aggs" : {
                 "distinct_users" : { "cardinality" : { "field" : "user.screen_name" } }
               }
             }
           }
         }
       }
     }'
  37. Distinct users per month, per country
     {
       "aggregations" : {
         "month" : {
           "buckets" : [ {
             "key_as_string" : "2014-04-01T00:00:00.000Z",
             "key" : 1396310400000,
             "doc_count" : 1097354,
             "country" : {
               "buckets" : [
                 { "key" : "United States", "doc_count" : 501244, "distinct_users" : { "value" : 471504 } },
                 { "key" : "Indonesia", "doc_count" : 452933, "distinct_users" : { "value" : 312002 } }
               ]
             }
           } ]
         }
       }
     }
  38. cardinality aggregation
     • HyperLogLog++ algorithm
     • Approximate counts based on the hashes of the field values
     • Configurable precision: trade memory for accuracy
     • Allows providing precomputed hashes while indexing
     • Allows computing hashes at index time
     • Scripting support
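To give a feel for how hash-based approximate counting works, here is a minimal sketch of the plain HyperLogLog estimator. It is not the HyperLogLog++ variant named on the slide and not Elasticsearch's implementation. The class name, the choice of SHA-1 as hash and the default precision `p=12` are all assumptions made for illustration.

```python
import hashlib
import math


class HyperLogLog:
    """Plain HyperLogLog with 2**p registers (assumes p >= 7 for the alpha constant)."""

    def __init__(self, p=12):
        self.p = p
        self.m = 1 << p                      # number of registers
        self.registers = [0] * self.m
        self.alpha = 0.7213 / (1 + 1.079 / self.m)

    def add(self, value):
        # 64-bit hash of the value
        h = int.from_bytes(hashlib.sha1(value.encode("utf-8")).digest()[:8], "big")
        idx = h >> (64 - self.p)                 # first p bits pick a register
        rest = h & ((1 << (64 - self.p)) - 1)    # remaining 64 - p bits
        # rank = 1-based position of the leftmost 1-bit in the remaining bits
        rank = (64 - self.p) - rest.bit_length() + 1
        self.registers[idx] = max(self.registers[idx], rank)

    def count(self):
        raw = self.alpha * self.m * self.m / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * self.m and zeros:
            # small-range correction: fall back to linear counting
            return self.m * math.log(self.m / zeros)
        return raw


hll = HyperLogLog()
for i in range(50000):
    hll.add(f"user-{i}")
estimate = hll.count()
```

More registers (a larger `p`) use more memory and give a tighter estimate, which is the trade-off behind the configurable precision. Adding the same value twice never changes a register, so duplicates do not inflate the count.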
  39. percentiles aggregation
     …know your data
  40. How is the number of retweets distributed, per month, per country?
  41. Retweets stats per month, per country
     $ curl -XGET localhost:9200/twitter/_search -d '
     {
       "aggs" : {
         "month" : {
           "date_histogram" : { "field" : "created_at", "interval" : "month" },
           "aggs" : {
             "country" : {
               "terms" : { "field" : "place.country.keyword" },
               "aggs" : {
                 "retweets" : { "stats" : { "field" : "retweet_count" } }
               }
             }
           }
         }
       }
     }'
  42. Retweets stats per month, per country
     {
       "aggregations" : {
         "month" : {
           "buckets" : [ {
             "key_as_string" : "2014-04-01T00:00:00.000Z",
             "key" : 1396310400000,
             "doc_count" : 1097354,
             "country" : {
               "buckets" : [ {
                 "key" : "United States",
                 "doc_count" : 169442,
                 "retweets" : { "min" : 0, "max" : 230681, "avg" : 946.0165939898582, "sum" : 47945067 }
               } ]
             }
           } ]
         }
       }
     }
  43. Interesting… but how about the outliers?
  44. Retweets per month, per country
     $ curl -XGET localhost:9200/twitter/_search -d '
     {
       "aggs" : {
         "month" : {
           "date_histogram" : { "field" : "created_at", "interval" : "month" },
           "aggs" : {
             "country" : {
               "terms" : { "field" : "place.country.keyword" },
               "aggs" : {
                 "retweets" : { "percentiles" : { "field" : "retweet_count" } }
               }
             }
           }
         }
       }
     }'
  45. Retweets per month, per country
     {
       "aggregations" : {
         "month" : {
           "buckets" : [ {
             "key_as_string" : "2014-04-01T00:00:00.000Z",
             "key" : 1396310400000,
             "doc_count" : 1097354,
             "country" : {
               "buckets" : [ {
                 "key" : "United States",
                 "doc_count" : 169442,
                 "retweets" : {
                   "1.0" : 1,
                   "5.0" : 1,
                   "25.0" : 2,
                   "50.0" : 21.927867004790084,
                   "75.0" : 218.26625104274626,
                   "95.0" : 3199.6148040638604,
                   "99.0" : 15889.028205128077
                 }
               } ]
             }
           } ]
         }
       }
     }
  46. percentiles aggregation
     • t-digest algorithm
     • Approximate percentiles
     • Configurable compression: trade memory for accuracy
     • Request specific percentiles only
     • Scripting support
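The reason percentiles say more than `stats` in the presence of outliers can be seen on a tiny sample. The sketch below computes exact nearest-rank percentiles, which is deliberately not the approximate t-digest method the aggregation uses. The sample values and the `percentile` helper are made up for illustration.

```python
import math


def percentile(sorted_values, pct):
    """Exact nearest-rank percentile of an already sorted, non-empty list."""
    rank = max(1, math.ceil(pct / 100.0 * len(sorted_values)))
    return sorted_values[rank - 1]


# Hypothetical retweet counts: mostly small, with a single viral tweet.
retweets = sorted([0, 1, 1, 2, 2, 3, 5, 8, 13, 5000])

mean = sum(retweets) / len(retweets)
summary = {p: percentile(retweets, p) for p in (50, 75, 99)}
```

The mean is pulled far above what a typical tweet gets by one value, while the median stays small and only the top percentile exposes the outlier. The same pattern appears in the slide data, where the average is around 946 but the median is around 22.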
  47. significant terms
     …revealing the uncommonly common
  48. What’s the right hashtag for… “tulip”?
  49. Right hashtag for “tulip”
     $ curl -XGET localhost:9200/twitter/_search -d '
     {
       "query" : {
         "match" : { "text" : "tulip" }
       },
       "aggs" : {
         "interesting_tags" : {
           "significant_terms" : { "field" : "hashtags.text" }
         }
       }
     }'
  50. Right hashtag for “tulip”
     {
       "aggregations" : {
         "interesting_tags" : {
           "doc_count" : 40,
           "buckets" : [ {
             "key" : "spring",
             "doc_count" : 38,
             "score" : 3397.32,
             "bg_count" : 45
           } ]
         }
       }
     }
  51. What tweets have the wrong hashtag or don’t have one?
  52. Look for a specific hashtag
     $ curl -XGET localhost:9200/twitter/_search -d '
     {
       "query" : {
         "match" : { "hashtags.text" : "elasticsearch" }
       },
       "aggs" : {
         "interesting_tags" : {
           "significant_terms" : { "field" : "text" }
         }
       }
     }'
  53. Like this but not this query
     $ curl -XGET localhost:9200/twitter/_search -d '
     {
       "query" : {
         "bool" : {
           "must_not" : [
             { "match" : { "hashtags.text" : "elasticsearch" } }
           ],
           "must" : [{
             "terms" : { "text" : ["lucene", "aggregations", …] }
           }]
         }
       }
     }'
  54. significant terms
     • Background set: the whole index
     • Foreground set: documents matching the query
     • Approximate counts
     • Configurable shard_size for better accuracy
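The foreground/background comparison can be sketched as a score that rewards terms whose frequency in the query results is far above their frequency in the whole index. The formula below, absolute change multiplied by relative change, is an assumption about the kind of heuristic involved rather than a statement of what Elasticsearch 1.1 computes. The `significance` function and the background size `BG_TOTAL` are hypothetical.

```python
def significance(fg_count, fg_total, bg_count, bg_total):
    """Score a term by how much more frequent it is in the foreground set.

    Assumes bg_count > 0, which holds when the foreground is a subset of
    the background and the term occurs in the foreground at least once.
    """
    fg_rate = fg_count / fg_total   # share of matching docs containing the term
    bg_rate = bg_count / bg_total   # share of all docs containing the term
    if fg_rate <= bg_rate:
        return 0.0                  # not over-represented in the foreground
    return (fg_rate - bg_rate) * (fg_rate / bg_rate)


BG_TOTAL = 100_000  # hypothetical index size, not taken from the slides

# "spring": in 38 of the 40 matching tweets, but in only 45 tweets overall.
spring = significance(38, 40, 45, BG_TOTAL)
# A term that is common everywhere gains little from being common in the results.
common = significance(30, 40, 60_000, BG_TOTAL)
```

A rare term concentrated in the results scores orders of magnitude higher than a term that is frequent across the whole index, which is the "uncommonly common" effect.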
  55. …and more
     • better cross field queries
     • search templates
     • aliases support in index templates
     • recovery api & _cat/recovery
     • _cat/segments
  56. thank you!
     Support: http://elasticsearch.com/support
     Training: http://training.elasticsearch.com
     We are hiring: http://elasticsearch.com/about/jobs/
     @lucacavanna