Elastic{ON} 2018 - What's Evolving in Elasticsearch

Elastic Co

March 01, 2018

Transcript

  1. What’s Evolving in Elasticsearch
    Clinton Gormley & Simon Willnauer (@clintongormley, @s1m0nw), Team and Tech Leads, Elasticsearch
    28 February 2018
  2. Disk on a Diet
    • Sparse doc values: only pay for what you use
    • Removed _all field: replaced by "default_fields": ["*"]
  3. Apples-to-apples index size improvements (Sample Metricbeat Dataset)
    [Chart: index size on a 0 to 100 scale for 5.x, 6.0 OOTB, and _all disabled]
  4. Data Rollups (coming soon to X-Pack)
    Flexible bucketing and filtering by time, histograms, and terms
    [Diagram: hosts prod-1.myco.com to prod-5.myco.com; Date Histogram, Histogram, Terms]
  5. Data Rollups (coming soon to X-Pack)
    Uses the new composite aggregation:
    • Paginate through all buckets of a multi-level aggregation
    • Accurate term counts
    • Sorted by “natural” order
    [Diagram: hosts prod-1.myco.com to prod-5.myco.com]
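
The pagination idea on this slide can be sketched in Python. This is a minimal illustration, not the rollup implementation: the aggregation name `rollup`, the `host` field, the doc counts, and the `fake_search` stand-in for a real cluster are all assumptions. The request shape (`composite` with `size`, `sources`, and `after`) and the `after_key` in the response follow the composite aggregation API; the loop falls back to the last bucket key if `after_key` is absent.

```python
from typing import Callable, Iterator, List, Optional


def composite_body(sources: List[dict], size: int, after: Optional[dict] = None) -> dict:
    """Build a search body holding one composite aggregation (named "rollup" here)."""
    composite = {"size": size, "sources": sources}
    if after is not None:
        composite["after"] = after  # resume after the last key of the previous page
    return {"size": 0, "aggs": {"rollup": {"composite": composite}}}


def iter_buckets(search: Callable[[dict], dict], sources: List[dict], size: int = 2) -> Iterator[dict]:
    """Yield every bucket, one page at a time, until an empty page comes back."""
    after = None
    while True:
        agg = search(composite_body(sources, size, after))["aggregations"]["rollup"]
        buckets = agg["buckets"]
        if not buckets:
            return
        yield from buckets
        after = agg.get("after_key", buckets[-1]["key"])


# Hypothetical stand-in for a cluster: five host buckets in natural (sorted) key order.
ALL_BUCKETS = [{"key": {"host": f"prod-{i}.myco.com"}, "doc_count": 10 * i} for i in range(1, 6)]


def fake_search(body: dict) -> dict:
    comp = body["aggs"]["rollup"]["composite"]
    after = comp.get("after")
    rest = [b for b in ALL_BUCKETS if after is None or b["key"]["host"] > after["host"]]
    page = rest[: comp["size"]]
    agg = {"buckets": page}
    if page:
        agg["after_key"] = page[-1]["key"]
    return {"aggregations": {"rollup": agg}}


sources = [{"host": {"terms": {"field": "host"}}}]
hosts = [b["key"]["host"] for b in iter_buckets(fake_search, sources, size=2)]
```

With a page size of 2, the five hosts come back in three pages and in key order, which is what makes the bucket stream safe to consume incrementally.
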
  6. Data Rollups (coming soon to X-Pack)
    Flexible bucketing and filtering by time, histograms, and terms
    [Diagram: fields @timestamp, datacenter, url.path]
  9. Data Rollups (coming soon to X-Pack)
    Flexible bucketing and filtering by time, histograms, and terms
    [Diagram: fields @timestamp, datacenter, url.path, annotated “group by”]
  11. Data Rollups (coming soon to X-Pack)
    Flexible bucketing and filtering by time, histograms, and terms
    [Diagram: fields @timestamp, datacenter, url.path, annotated “filter by”]
  12. Data Rollups (coming soon to X-Pack)
    The more data you have, the more space you save, easily 90%+
    [Diagram label: Raw data]
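
To make the space saving concrete, here is a toy rollup in plain Python. It is only a sketch of the idea (bucket by time, group by terms, keep summary metrics), not the X-Pack rollup API; the `bytes` metric, the one-hour interval, and the sample documents are hypothetical.

```python
from collections import defaultdict


def rollup(docs, interval_seconds=3600):
    """Collapse raw docs into one summary per (time bucket, datacenter, url.path)."""
    buckets = defaultdict(lambda: {"doc_count": 0, "bytes_sum": 0})
    for doc in docs:
        ts = doc["@timestamp"]  # epoch seconds in this sketch
        key = (ts - ts % interval_seconds, doc["datacenter"], doc["url.path"])
        buckets[key]["doc_count"] += 1
        buckets[key]["bytes_sum"] += doc["bytes"]
    return dict(buckets)


# Hypothetical raw data: five documents spread over two hourly buckets.
raw = [
    {"@timestamp": 0, "datacenter": "dc1", "url.path": "/a", "bytes": 100},
    {"@timestamp": 1200, "datacenter": "dc1", "url.path": "/a", "bytes": 50},
    {"@timestamp": 3500, "datacenter": "dc1", "url.path": "/a", "bytes": 25},
    {"@timestamp": 3700, "datacenter": "dc1", "url.path": "/a", "bytes": 10},
    {"@timestamp": 3800, "datacenter": "dc2", "url.path": "/b", "bytes": 5},
]
rolled = rollup(raw)
```

Here five raw documents become three summary documents. The ratio improves as more raw documents fall into each bucket, which is the effect the slide describes.
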
  13. Scalable Cross Cluster Search
    Search across two major versions
    [Diagram: Kibana and three Elasticsearch clusters labeled 5.latest, 6.latest, and 7.x]
  14. Improved Search Scalability
    Searches across many shards are more scalable:
    • Fast pre-check phase, exclude any shards that can’t match query.
    • Limits to the number of shards which are searched in parallel, so that a single query cannot dominate the cluster.
    • Batched reduction of results, reduces memory usage on the coordinating node.
    [Diagram: Multi-shard Search Request across Shard 1 ... Shard N; subset of shards containing results]
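
The pre-check and batched reduction can be illustrated with a small simulation. This is a sketch under assumptions, not Elasticsearch's coordinating-node code: shards are modeled as dicts with a hypothetical `min`/`max` value range and a list of `(score, doc id)` hits, and the query is a simple range.

```python
import heapq


def can_match(shard: dict, lo: int, hi: int) -> bool:
    """Pre-check: could this shard hold any value inside [lo, hi]?"""
    return shard["min"] <= hi and shard["max"] >= lo


def batched_top_n(shard_results, n, batch_size):
    """Reduce shard results a batch at a time, keeping at most n hits between batches."""
    top = []
    for start in range(0, len(shard_results), batch_size):
        batch = shard_results[start:start + batch_size]
        merged = top + [hit for hits in batch for hit in hits]
        top = heapq.nlargest(n, merged)
    return top


# Hypothetical shards: value range covered, plus the (score, doc id) hits each would return.
shards = [
    {"name": "shard-1", "min": 0, "max": 99, "hits": [(1.0, "a"), (0.4, "b")]},
    {"name": "shard-2", "min": 100, "max": 199, "hits": [(0.9, "c")]},
    {"name": "shard-3", "min": 200, "max": 299, "hits": [(0.8, "d"), (0.7, "e")]},
    {"name": "shard-4", "min": 300, "max": 399, "hits": [(0.95, "f")]},
]

matching = [s for s in shards if can_match(s, 150, 320)]  # shard-1 is skipped entirely
top_hits = batched_top_n([s["hits"] for s in matching], n=2, batch_size=2)
```

Because only `n` hits survive each batch, the memory held during the reduce is bounded by the batch size rather than by the total number of shards.
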
  15. Index Sorting: much speedier sorted queries
    • Sort at index time vs. query time
    • Optimize on-disk format for some use cases
    • Improve query performance at the cost of index performance
    5.x, query for top 3 player scores: Player 1 Score: 600, Player 2 Score: 0, Player 3 Score: 200, Player 4 Score: 700, Player 5 Score: 300, ..., Player 1907 Score: 800
    6.x, query for top 3 player scores: Player 1907 Score: 800, Player 4 Score: 700, Player 1 Score: 600, Player 5 Score: 300, Player 3 Score: 200, Player 2 Score: 0, ...
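
The trade-off on this slide can be sketched with the same player scores. The two functions are a simplified model, not Lucene's collectors: one visits every document and sorts at query time, the other relies on documents already being stored in sorted order and stops after the first `n`. The `score` field name in the settings dict is an assumption; `index.sort.field` and `index.sort.order` are the index settings that enable index sorting.

```python
# Index settings that request sorting at index time (field name "score" is hypothetical).
INDEX_SETTINGS = {"settings": {"index": {"sort.field": "score", "sort.order": "desc"}}}


def top_n_unsorted(docs, n):
    """5.x style: visit every document, then sort at query time."""
    visited = len(docs)
    top = sorted(docs, key=lambda d: d[1], reverse=True)[:n]
    return top, visited


def top_n_presorted(sorted_docs, n):
    """6.x style with index sorting: documents are already ordered, so stop after n."""
    top = sorted_docs[:n]
    return top, len(top)


players = [
    ("Player 1", 600), ("Player 2", 0), ("Player 3", 200),
    ("Player 4", 700), ("Player 5", 300), ("Player 1907", 800),
]
# The sort cost is paid once, at index time.
index_sorted = sorted(players, key=lambda d: d[1], reverse=True)

top_a, visited_a = top_n_unsorted(players, 3)
top_b, visited_b = top_n_presorted(index_sorted, 3)
```

Both calls return the same top three players, but the presorted variant touches 3 documents instead of 6. On a real index the gap grows with the number of documents.
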
  16. SQL Client (coming soon to X-Pack)
    SELECT course, avg(age), count(*)
    FROM mytable
    WHERE match(uni,"oxford")
    GROUP BY course
    HAVING avg(age) > 18
    ORDER BY course, avg(age)
  17. SQL Client (coming soon to X-Pack)
    • CLI
    • JDBC
    • Kibana Canvas
    • SQL over REST: GET /_sql {}
  18. SQL Client (coming soon to X-Pack)
    • CLI
    • JDBC
    • ODBC
    • Kibana Canvas
    • SQL over REST: GET /_sql {}
  19. Faster Prefix Queries: index prefixes for faster querying
    "title": { "type": "text", "index_prefix": { "min_chars": 2, "max_chars": 6 } }
  20. Faster Phrase and Prefix Queries: index shingles for faster phrase queries
    "match_phrase_prefix": { "title": "phrase and pref*" }
  21. Faster Top-N Queries
    GET /_search { "size": 2, "query": { "match": { "text": "quick brown fox" } } }
  22. Faster Top-N Queries
    GET /_search { "size": 2, "query": { "match": { "text": "quick brown fox" } } }
    Collect all docs because:
    • Aggregations
    • Total hits
    • To find top-N documents
  24. Faster Top-N Queries: when total hits and aggregations are not required
    GET /_search { "size": 2, "query": { "match": { "text": "quick brown fox" } } }
    DOC  quick  brown  fox  total
    1    2      3      1    6
    2    2      3      0    5
    3    0      3      1    4
    4    2      0      1    3
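
The slide does not name the algorithm, so the following is only a simplified sketch of one way to avoid scoring every document: keep the best `n` totals seen so far and abandon a document as soon as its partial score plus the maximum possible contribution of the remaining terms cannot beat the current n-th best. The per-term scores come from the table above; the function name and the pruning rule are assumptions for illustration.

```python
import heapq

# Per-document term scores from the table: doc id -> {term: score}.
SCORES = {
    1: {"quick": 2, "brown": 3, "fox": 1},
    2: {"quick": 2, "brown": 3, "fox": 0},
    3: {"quick": 0, "brown": 3, "fox": 1},
    4: {"quick": 2, "brown": 0, "fox": 1},
}


def top_n_with_pruning(scores, n):
    """Return ([(doc_id, total), ...] best first, number of docs scored in full)."""
    terms = sorted({t for s in scores.values() for t in s})
    max_score = {t: max(s.get(t, 0) for s in scores.values()) for t in terms}
    terms.sort(key=lambda t: max_score[t], reverse=True)  # highest-impact terms first

    heap = []  # min-heap of (total, doc_id); heap[0] is the current n-th best
    fully_scored = 0
    for doc_id, doc_scores in scores.items():
        threshold = heap[0][0] if len(heap) == n else None
        partial, remaining, pruned = 0, sum(max_score.values()), False
        for term in terms:
            # Upper bound on this doc's total: what we have plus the best case for the rest.
            if threshold is not None and partial + remaining <= threshold:
                pruned = True
                break
            partial += doc_scores.get(term, 0)
            remaining -= max_score[term]
        if pruned:
            continue
        fully_scored += 1
        if len(heap) < n:
            heapq.heappush(heap, (partial, doc_id))
        elif partial > heap[0][0]:
            heapq.heapreplace(heap, (partial, doc_id))
    ranked = sorted(heap, reverse=True)
    return [(doc_id, total) for total, doc_id in ranked], fully_scored


top, fully_scored = top_n_with_pruning(SCORES, 2)
```

For `size: 2`, documents 1 and 2 (totals 6 and 5) fill the top-N, and documents 3 and 4 are dropped before all their terms are scored. This shortcut is only valid when neither total hits nor aggregations are needed, since both require visiting every matching document.
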
  25. More Search Features
    • Scripted Search Similarity
    • Significant Text Aggregation
    • Ranking Evaluation API
    • Korean Analyzer
    • Nano-second timestamps
  26. Attribute-Based Access Control (X-Pack feature)
    { "attrs": ["team:finance", "country:usa"] … }
    { "attrs": ["team:finance", "country:usa", "clearance:secret"] … }
    All attributes must be present
  27. Audit Log Events Ignore Policies (X-Pack feature)
    Fine-grained filtering of the security audit log
    xpack.security.audit.logfile.events.ignore_filters:
      ignore_bulk_logging:
        users: ["beats"]
        indices: ["filebeat*", "metricbeat*"]
  28. Distributed watch execution (X-Pack feature, Gold)
    • Watches are no longer executed on only the master node
    • They are executed on nodes which hold shards of the .watches index
    • Configure all or specific nodes dedicated to watch execution
  29. Cluster Protection: Circuit Breakers and Soft Limits
    • Circuit breakers track memory usage, now including Lucene’s requirements.
    • Soft limits prevent users from running dangerous requests and help admins to protect their clusters from unwitting users.
    • Added limits on highlighting, terms query, n-gram and shingle analysers, nested docs
  30. Ops-Based Recovery (6.0)
    [Diagram: Primary and Replica, each shown with operations 1 2 3 4 5 6 7]
  31. Index Lifecycle Management (coming soon to X-Pack)
    Hot Phase - Index to my-logs-write, Search on my-logs-read
    [Diagram: shards 1 2 3 across Hot Nodes, Warm Nodes, Cold Nodes]
  32. Index Lifecycle Management (coming soon to X-Pack)
    Hot Phase - Rollover
    [Diagram: shards 1 2 3 across Hot Nodes, Warm Nodes, Cold Nodes]
  33. Index Lifecycle Management (coming soon to X-Pack)
    Warm Phase - Allocate
    [Diagram: shards 1 2 3 across Hot Nodes, Warm Nodes, Cold Nodes]
  34. Index Lifecycle Management (coming soon to X-Pack)
    Warm Phase - Shrink
    [Diagram: shards 1 2 3 and shard 1 across Hot Nodes, Warm Nodes, Cold Nodes]
  35. Index Lifecycle Management (coming soon to X-Pack)
    Warm Phase - Compress
    [Diagram: shards 1 2 3 and shard 1 across Hot Nodes, Warm Nodes, Cold Nodes]
  36. Index Lifecycle Management (coming soon to X-Pack)
    Cold Phase - Allocate
    [Diagram: shards 1 2 3 and shard 1 across Hot Nodes, Warm Nodes, Cold Nodes]
  37. Index Lifecycle Management (coming soon to X-Pack)
    Delete Phase
    [Diagram: shards 1 2 3 and shard 1 across Hot Nodes, Warm Nodes, Cold Nodes]
  38. Zen Discovery v2: automatic management of master nodes
    [Diagram: four Master Eligible Nodes, minimum_master_nodes = 3]
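
The value shown on this slide matches the usual majority rule for `minimum_master_nodes`, which administrators previously had to compute and maintain by hand. A minimal sketch of that arithmetic follows; the function names are hypothetical and this is not how Zen Discovery v2 itself is implemented.

```python
def quorum(master_eligible_nodes: int) -> int:
    """Smallest majority of the master-eligible nodes: floor(n / 2) + 1."""
    return master_eligible_nodes // 2 + 1


def can_elect_master(visible_nodes: int, master_eligible_nodes: int) -> bool:
    """A partition may elect a master only if it can see a majority."""
    return visible_nodes >= quorum(master_eligible_nodes)


required = quorum(4)  # four master-eligible nodes, as in the diagram
```

With four master-eligible nodes the majority is 3, so a two-node partition cannot elect a master on its own. Keeping this number correct as nodes join and leave is the manual step that automatic management removes.
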
  39. Other Talks You Should See
    • Get the Lay of the Lucene Land (Adrien Grand; Wednesday, 10:30am)
    • Elasticsearch Consensus: The Past, the Present, and the Future (Boaz Leskes, Jason Tedor, David Turner, Yannick Welsch; Wednesday, 3:30pm)
    • The State of Geo in Elasticsearch (Nick Knize, Thomas Neirynck; Thursday, 9:30am)
    • The State of the Elasticsearch Java Client (Nik Everett; Thursday, 2:30pm)
    • Elasticsearch SQL (Costin Leau; Thursday, 3:30pm)
  40. Except where otherwise noted, this work is licensed under http://creativecommons.org/licenses/by-nd/4.0/

    Creative Commons and the double C in a circle are registered trademarks of Creative Commons in the United States and other countries. Third party marks and brands are the property of their respective holders. Please attribute Elastic with a link to elastic.co