Elastic{ON} 2018 - What's Evolving in Elasticsearch

What's Evolving in Elasticsearch
Clinton Gormley (@clintongormley) & Simon Willnauer (@s1m0nw), Team and Tech Leads, Elasticsearch
28 February 2018

6.0, November 2017: 2,236 pull requests, 333 contributors

Agenda: 1. Indexing, 2. Search, 3. Security, 4. Administration

Releases covered: 1. 6.0, 2. 6.x, 3. 7.0

Indexing

Faster Indexing (chart comparing versions 1.x, 2.x, 5.x, 6.x and 7.x)

Disk on a Diet
• Sparse doc values: only pay for what you use
• Removed the _all field, replaced by querying all fields by default ("index.query.default_field": "*")
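Sparse doc values mean a column only costs space for the documents that actually have a value. The following is a minimal sketch of the idea using a toy in-memory model. The dense column reserves one slot per document, while the sparse one stores sorted (doc_id, value) pairs. This is not Lucene's actual on-disk encoding, and the field and its values are hypothetical.

```python
from bisect import bisect_left
from typing import Optional


class DenseColumn:
    """One slot per document, even when the document has no value."""

    def __init__(self, max_doc: int) -> None:
        self.values: list[Optional[int]] = [None] * max_doc

    def set(self, doc_id: int, value: int) -> None:
        self.values[doc_id] = value

    def get(self, doc_id: int) -> Optional[int]:
        return self.values[doc_id]

    def slots_used(self) -> int:
        return len(self.values)


class SparseColumn:
    """Only documents that have a value are stored, as sorted parallel arrays."""

    def __init__(self) -> None:
        self.doc_ids: list[int] = []
        self.values: list[int] = []

    def set(self, doc_id: int, value: int) -> None:
        # Assumes doc ids arrive in increasing order, as they do within a segment.
        self.doc_ids.append(doc_id)
        self.values.append(value)

    def get(self, doc_id: int) -> Optional[int]:
        i = bisect_left(self.doc_ids, doc_id)
        if i < len(self.doc_ids) and self.doc_ids[i] == doc_id:
            return self.values[i]
        return None

    def slots_used(self) -> int:
        return len(self.doc_ids)


# Hypothetical field: 10,000 documents, but only every 100th one has a value.
dense, sparse = DenseColumn(10_000), SparseColumn()
for doc in range(0, 10_000, 100):
    dense.set(doc, doc * 2)
    sparse.set(doc, doc * 2)

print(dense.slots_used(), sparse.slots_used())  # 10000 100
```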

Chart: apples-to-apples index size improvements on a sample Metricbeat dataset (5.x, 6.0 out of the box, and with _all disabled)

Index Shrinking (diagram: shards 1, 2, 3, 4 shrunk into fewer shards)

Index Splitting (diagram: shards 1, 2, 3, 4 split into more shards)
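The shrink and split APIs only accept certain target shard counts, because every target shard has to map cleanly onto source shards. The following sketches the documented constraints and assumes 6.x behavior. A shrink target must be a factor of the source shard count. A split target must be a multiple of the source count and must also divide `index.number_of_routing_shards`. The shard counts in the usage lines are hypothetical.

```python
def valid_shrink_targets(source_shards: int) -> list[int]:
    """Shrink: the target shard count must be a factor of the source count."""
    return [n for n in range(1, source_shards) if source_shards % n == 0]


def valid_split_targets(source_shards: int, routing_shards: int) -> list[int]:
    """Split (6.x): the target must be a multiple of the source count and
    must also divide index.number_of_routing_shards."""
    return [
        n
        for n in range(source_shards * 2, routing_shards + 1, source_shards)
        if routing_shards % n == 0
    ]


print(valid_shrink_targets(4))     # [1, 2]
print(valid_split_targets(4, 32))  # [8, 16, 32]
```

With these rules, shrinking 4 shards can only yield 2 or 1. The possible split targets depend on how many routing shards the index was created with.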

Data Rollups (coming soon to X-Pack)

Supported metrics: min, max, count, avg, cardinality, percentiles

Flexible bucketing and filtering by time, histograms, and terms: Date Histogram, Histogram, Terms (diagram: hosts prod-1.myco.com to prod-5.myco.com)
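The following is a minimal sketch of what a rollup job computes. It is written in plain Python rather than against the (then unreleased) X-Pack API. Raw documents are grouped into hourly date-histogram buckets per host, which acts as a terms grouping, and only min, max, count and sum are kept. The `cpu` field and sample values are hypothetical. In this sketch, avg is derived from sum and count so that averages can be merged correctly across buckets.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Stats:
    minimum: float = float("inf")
    maximum: float = float("-inf")
    count: int = 0
    total: float = 0.0

    def add(self, value: float) -> None:
        self.minimum = min(self.minimum, value)
        self.maximum = max(self.maximum, value)
        self.count += 1
        self.total += value

    @property
    def avg(self) -> float:
        # Derived from total and count, so two buckets can be merged exactly.
        return self.total / self.count


def rollup(docs):
    """Group raw docs by (hour, host) and keep only the summary metrics."""
    buckets = defaultdict(Stats)
    for doc in docs:
        hour = doc["@timestamp"].replace(minute=0, second=0, microsecond=0)
        buckets[(hour, doc["host"])].add(doc["cpu"])  # hypothetical metric field
    return dict(buckets)


raw = [
    {"@timestamp": datetime(2018, 2, 28, 10, 5), "host": "prod-1.myco.com", "cpu": 20.0},
    {"@timestamp": datetime(2018, 2, 28, 10, 40), "host": "prod-1.myco.com", "cpu": 60.0},
    {"@timestamp": datetime(2018, 2, 28, 10, 15), "host": "prod-2.myco.com", "cpu": 30.0},
]
summary = rollup(raw)
b = summary[(datetime(2018, 2, 28, 10), "prod-1.myco.com")]
print(b.minimum, b.maximum, b.count, b.avg)  # 20.0 60.0 2 40.0
```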

Uses the new composite aggregation (diagram: hosts prod-1.myco.com to prod-5.myco.com):
• Paginate through all buckets of a multi-level aggregation
• Accurate term counts
• Sorted by "natural" order
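The composite aggregation returns buckets sorted by their composite key. The client then asks for the next page "after" the last key it saw. The following is a minimal in-memory sketch of that keyset pagination, assuming buckets keyed by (host, hour). It does not call Elasticsearch. The request shape in the comment uses the aggregation's real `sources`, `size` and `after` parameters, but the aggregation and field names are hypothetical.

```python
from typing import Optional

Key = tuple[str, int]  # (host, hour)


def composite_page(counts: dict[Key, int], size: int, after: Optional[Key] = None):
    """Return up to `size` buckets whose key sorts strictly after `after`."""
    keys = sorted(counts)  # "natural" order: by host, then by hour
    if after is not None:
        keys = [k for k in keys if k > after]
    page = [(k, counts[k]) for k in keys[:size]]
    next_after = page[-1][0] if page else None
    return page, next_after


# Hypothetical doc counts per (host, hour) bucket.
counts = {
    ("prod-2.myco.com", 10): 7,
    ("prod-1.myco.com", 11): 3,
    ("prod-1.myco.com", 10): 5,
    ("prod-3.myco.com", 10): 2,
}

# Roughly equivalent request body (first page; later pages add "after"):
# {"size": 0, "aggs": {"by_host_hour": {"composite": {
#     "size": 2,
#     "sources": [{"host": {"terms": {"field": "host"}}},
#                 {"hour": {"date_histogram": {"field": "@timestamp", "interval": "1h"}}}]}}}}
after = None
while True:
    page, after = composite_page(counts, size=2, after=after)
    if not page:
        break
    print(page)
```

Paging on the last key, rather than on an offset, is what lets a client walk every bucket of a multi-level aggregation without the coordinating node holding them all at once.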

Example: the fields @timestamp, datacenter and url.path, which can each be used to group by and to filter by

The more data you have, the more space you save: easily 90%+ relative to the raw data
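The following back-of-the-envelope sketch shows why savings grow with data density. It assumes, hypothetically, one raw document every 10 seconds per host, with each rollup bucket collapsing to a single document. It counts documents, not bytes, so real on-disk savings will differ.

```python
def doc_reduction(seconds_per_doc: int, bucket_seconds: int) -> float:
    """Fraction of documents eliminated when each bucket becomes one doc."""
    raw_docs_per_bucket = bucket_seconds // seconds_per_doc
    return 1 - 1 / raw_docs_per_bucket


print(f"{doc_reduction(10, 3600):.1%}")  # hourly buckets: 99.7%
print(f"{doc_reduction(10, 60):.1%}")    # one-minute buckets: 83.3%
```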

Search

Scalable Cross-Cluster Search (diagram: Kibana querying several Elasticsearch clusters)

Search across two major versions (diagram: Kibana and Elasticsearch clusters on 5.latest, 6.latest and 7.x)

Improved Search Scalability
Searches across many shards are more scalable:
• A fast pre-check phase excludes any shards that can't match the query.
• The number of shards searched in parallel is limited, so that a single query cannot dominate the cluster.
• Batched reduction of results reduces memory usage on the coordinating node.
(diagram: a multi-shard search request over shards 1 to N, of which only a subset contains results)
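The following toy simulation shows the three mechanisms. It assumes that each shard advertises the min and max @timestamp it holds and returns (score, doc_id) hits. The knobs mirror the real search request parameters `max_concurrent_shard_requests` and `batched_reduce_size`. The pre-check corresponds to the can-match phase, which is governed by `pre_filter_shard_size`. This sequential Python is only an illustration, not how Elasticsearch actually schedules shard requests. The shard data is hypothetical.

```python
import heapq
from dataclasses import dataclass


@dataclass
class Shard:
    shard_id: int
    min_ts: int  # smallest @timestamp stored on the shard
    max_ts: int  # largest @timestamp stored on the shard
    hits: list[tuple[float, str]]  # (score, doc_id) pairs the shard would return


def can_match(shard: Shard, gte: int, lte: int) -> bool:
    """Pre-check: a shard whose timestamp range misses the query range is skipped."""
    return shard.max_ts >= gte and shard.min_ts <= lte


def search(shards, gte, lte, size, max_concurrent, batched_reduce_size):
    candidates = [s for s in shards if can_match(s, gte, lte)]
    reduced: list[tuple[float, str]] = []
    pending: list[list[tuple[float, str]]] = []

    def reduce_pending():
        nonlocal reduced
        merged = reduced + [hit for shard_hits in pending for hit in shard_hits]
        reduced = heapq.nlargest(size, merged)  # keep only the global top `size`
        pending.clear()

    # Shards are queried in rounds of at most `max_concurrent` (sequentially here).
    for start in range(0, len(candidates), max_concurrent):
        for shard in candidates[start:start + max_concurrent]:
            pending.append(shard.hits)
            if len(pending) >= batched_reduce_size:
                reduce_pending()  # bounds memory held on the coordinating node
    reduce_pending()
    return [doc_id for _, doc_id in reduced], len(candidates)


shards = [
    Shard(1, 0, 99, [(1.0, "a"), (0.5, "b")]),
    Shard(2, 100, 199, [(2.0, "c")]),
    Shard(3, 200, 299, [(3.0, "d"), (0.1, "e")]),
    Shard(4, 300, 399, [(9.0, "z")]),
]
print(search(shards, gte=150, lte=299, size=2, max_concurrent=2, batched_reduce_size=2))
# (['d', 'c'], 2)
```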

Adaptive Replica Selection (diagram: primaries 1P, 2P and replicas 1R, 2R on different nodes)
Avoids nodes with higher latency
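Adaptive replica selection ranks each copy of a shard by recent node performance instead of rotating round-robin. In Elasticsearch it is enabled with the `cluster.routing.use_adaptive_replica_selection` setting, and its ranking is based on the C3 formula. The following is a simplified sketch of that formula. The statistics and node names are hypothetical, and the way Elasticsearch collects and weights them differs in detail.

```python
from dataclasses import dataclass


@dataclass
class NodeStats:
    name: str
    response_time_ms: float  # EWMA of response time seen by the coordinating node
    service_time_ms: float   # EWMA of time the node spends executing a search
    queue_size: float        # EWMA of the node's search queue size
    outstanding: int = 0     # searches in flight from this coordinating node


def c3_rank(s: NodeStats, num_clients: int = 1, b: int = 3) -> float:
    """Lower is better: R - 1/mu + q_hat**b / mu, where mu = 1 / service time."""
    mu = 1.0 / s.service_time_ms
    q_hat = 1 + s.outstanding * num_clients + s.queue_size
    return s.response_time_ms - 1 / mu + q_hat**b / mu


def select_copy(copies: list[NodeStats]) -> NodeStats:
    return min(copies, key=c3_rank)


copies = [
    NodeStats("node-with-1P", response_time_ms=50, service_time_ms=20, queue_size=2),
    NodeStats("node-with-1R", response_time_ms=10, service_time_ms=5, queue_size=0),
]
print(select_copy(copies).name)  # node-with-1R
```

The cubic queue term makes a busy node's rank grow quickly. A copy on a loaded node is therefore avoided even when its raw response time is only moderately worse.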

Index Sorting: much speedier sorted queries
5.x: documents stored in insertion order (Player 1: 600, Player 2: 0, Player 3: 200, Player 4: 700, Player 5: 300, ..., Player 1907: 800); a query for the top 3 player scores has to examine them all
6.x: documents stored sorted by score (Player 1907: 800, Player 4: 700, Player 1: 600, Player 5: 300, Player 3: 200, Player 2: 0, ...)
• Sort at index time rather than query time
• Optimize the on-disk format for some use cases
• Improve query performance at the cost of indexing performance
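With index sorting, documents in each segment are stored in the configured order. A query sorted on the same field can then stop after the first N documents instead of visiting all of them. The following is a minimal sketch that assumes a single segment held in a Python list. The settings dict uses the 6.x index-sorting settings `index.sort.field` and `index.sort.order`, and the player data comes from the slide.

```python
import heapq

# Index settings that would keep each segment sorted by score, descending.
INDEX_SETTINGS = {"index.sort.field": "score", "index.sort.order": "desc"}

players = [("Player 1", 600), ("Player 2", 0), ("Player 3", 200),
           ("Player 4", 700), ("Player 5", 300), ("Player 1907", 800)]


def top_n_unsorted(docs, n):
    """Unsorted segment: every document must be visited to find the top n."""
    return heapq.nlargest(n, docs, key=lambda d: d[1]), len(docs)


def top_n_index_sorted(sorted_docs, n):
    """Segment already sorted by score desc: stop after the first n documents."""
    return sorted_docs[:n], min(n, len(sorted_docs))


# Sorting happens once, at index time, which is where the indexing cost goes.
segment = sorted(players, key=lambda d: d[1], reverse=True)

print(top_n_unsorted(players, 3))
print(top_n_index_sorted(segment, 3))
```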

SQL Client (coming soon to X-Pack)
SELECT course, avg(age), count(*) FROM mytable WHERE match(uni, 'oxford') GROUP BY course HAVING avg(age) > 18 ORDER BY course, avg(age)

Clients: CLI, JDBC, ODBC, Kibana Canvas, and SQL over REST (GET /_sql {})
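The following minimal sketch builds the SQL-over-REST request with Python's standard library. It assumes a hypothetical cluster at http://localhost:9200 and keeps the slide's GET /_sql path. The endpoint path in a released version may differ, so check the documentation for yours. The JSON body carries the statement in a `query` field. The statement puts HAVING before ORDER BY, as standard SQL requires, and uses a single-quoted string literal.

```python
import json
import urllib.request

# Hypothetical local cluster; the path follows the slide (GET /_sql).
ES_URL = "http://localhost:9200"

SQL = (
    "SELECT course, avg(age), count(*) FROM mytable "
    "WHERE match(uni, 'oxford') GROUP BY course "
    "HAVING avg(age) > 18 ORDER BY course, avg(age)"
)


def build_sql_request(query: str) -> urllib.request.Request:
    """Wrap the SQL statement in a JSON body; nothing is sent here."""
    body = json.dumps({"query": query}).encode("utf-8")
    return urllib.request.Request(
        f"{ES_URL}/_sql",
        data=body,
        headers={"Content-Type": "application/json"},
        method="GET",
    )


req = build_sql_request(SQL)
print(req.get_method(), req.full_url)
# Against a running cluster:
# with urllib.request.urlopen(req) as resp:
#     print(resp.read().decode())
```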

Faster Prefix Queries: index prefixes for faster querying
"title": { "type": "text", "index_prefixes": { "min_chars": 2, "max_chars": 6 } }

"query_string": { "query": "faster pref*", "fields": ["title"] }
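Indexing prefixes turns a prefix query into a single term lookup, instead of expanding the prefix against every term in the dictionary. The following minimal sketch uses in-memory dictionaries and the slide's `min_chars` and `max_chars` of 2 and 6. The class name and sample documents are hypothetical. Prefixes outside that length range fall back to the slow expansion.

```python
from collections import defaultdict


def edge_prefixes(term: str, min_chars: int, max_chars: int) -> list[str]:
    """Prefixes of `term` with lengths min_chars..max_chars (inclusive)."""
    upper = min(max_chars, len(term))
    return [term[:i] for i in range(min_chars, upper + 1)]


class PrefixIndexedField:
    def __init__(self, min_chars: int = 2, max_chars: int = 6) -> None:
        self.min_chars, self.max_chars = min_chars, max_chars
        self.terms: dict[str, set[int]] = defaultdict(set)     # term -> doc ids
        self.prefixes: dict[str, set[int]] = defaultdict(set)  # prefix -> doc ids

    def index(self, doc_id: int, text: str) -> None:
        for term in text.lower().split():
            self.terms[term].add(doc_id)
            for p in edge_prefixes(term, self.min_chars, self.max_chars):
                self.prefixes[p].add(doc_id)

    def prefix_query(self, prefix: str) -> set[int]:
        if self.min_chars <= len(prefix) <= self.max_chars:
            # Fast path: one lookup in the extra prefix index.
            return set(self.prefixes.get(prefix, set()))
        # Slow path: expand the prefix against every indexed term.
        return set().union(*(d for t, d in self.terms.items() if t.startswith(prefix)))


field = PrefixIndexedField()
field.index(1, "Faster prefix queries")
field.index(2, "Preference settings")
field.index(3, "Phrase queries")
print(sorted(field.prefix_query("pref")))  # [1, 2]
```

The trade-off is the same as in the slide: more terms written at index time in exchange for cheaper prefix queries.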

Faster Phrase and Prefix Queries: index shingles for faster phrase queries
"match_phrase_prefix": { "title": "phrase and pref" }
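Shingles are word n-grams indexed as single terms. The following minimal sketch shows why they speed up phrase queries. With two-word shingles in the index, a two-word phrase becomes one term lookup instead of intersecting two postings lists and comparing positions. The documents are hypothetical, and longer phrases would still need position checks.

```python
from collections import defaultdict


def shingles(tokens: list[str], size: int = 2) -> list[str]:
    """Word n-grams ("shingles") of the given size, joined by a space."""
    return [" ".join(tokens[i:i + size]) for i in range(len(tokens) - size + 1)]


shingle_index: dict[str, set[int]] = defaultdict(set)  # shingle -> doc ids
docs = {1: "quick brown fox", 2: "brown quick fox", 3: "the quick brown dog"}
for doc_id, text in docs.items():
    for sh in shingles(text.split()):
        shingle_index[sh].add(doc_id)


def two_word_phrase(phrase: str) -> set[int]:
    """A two-word phrase query is a single lookup on the shingle index."""
    return set(shingle_index.get(phrase, set()))


print(sorted(two_word_phrase("quick brown")))  # [1, 3]
```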

Faster Top-N Queries
GET /_search { "size": 2, "query": { "match": { "text": "quick brown fox" } } }

Collect all docs because of:
• Aggregations
• Total hits
• Finding the top-N documents

When total hits and aggregations are not required:
DOC  quick  brown  fox  total
1    2      3      1    6
2    2      3      0    5
3    0      3      1    4
4    2      0      1    3
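When total hits and aggregations are not needed, the engine only has to find the top N. It can therefore skip any document whose best possible score cannot beat the current Nth best. The following is a simplified sketch of that MaxScore/WAND-style pruning, using the per-term scores from the slide's example and a top 2. Real implementations skip whole blocks of postings rather than checking document by document. The function name is hypothetical.

```python
import heapq

# Per-document term scores from the slide's example (doc -> {term: score}).
SCORES = {
    1: {"quick": 2, "brown": 3, "fox": 1},
    2: {"quick": 2, "brown": 3, "fox": 0},
    3: {"quick": 0, "brown": 3, "fox": 1},
    4: {"quick": 2, "brown": 0, "fox": 1},
}


def top_n_with_pruning(scores, n):
    # Upper bound per term: the best score that term contributes to any doc.
    max_score: dict[str, int] = {}
    for terms in scores.values():
        for term, s in terms.items():
            max_score[term] = max(max_score.get(term, 0), s)

    heap: list[tuple[int, int]] = []  # min-heap of (score, doc)
    fully_scored = []
    for doc, terms in scores.items():
        matched = [t for t, s in terms.items() if s > 0]  # known from postings
        bound = sum(max_score[t] for t in matched)
        if len(heap) == n and bound <= heap[0][0]:
            continue  # cannot enter the top n: skipped without scoring
        fully_scored.append(doc)
        score = sum(terms[t] for t in matched)
        if len(heap) < n:
            heapq.heappush(heap, (score, doc))
        elif score > heap[0][0]:
            heapq.heapreplace(heap, (score, doc))
    top = [doc for _, doc in sorted(heap, reverse=True)]
    return top, fully_scored


print(top_n_with_pruning(SCORES, 2))  # ([1, 2], [1, 2])
```

After docs 1 and 2 fill the top 2, the threshold is 5. Doc 3 can reach at most 3 + 1 = 4 and doc 4 at most 2 + 1 = 3, so neither is scored.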

More Search Features
• Scripted search similarity
• Significant text aggregation
• Ranking evaluation API
• Korean analyzer
• Nanosecond timestamps

Security

Secure All the Things (X-Pack feature)
• No default passwords (no more "changeme")
• Mandatory TLS between nodes

Attribute-Based Access Control (X-Pack feature)
{ "attrs": ["team:finance", "country:usa"] … }
{ "attrs": ["team:finance", "country:usa", "clearance:secret"] … }
All attributes must be present

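The rule "all attributes must be present" is a subset test. The following sketch assumes that each document lists the attributes it requires and that each user holds a list of attributes. Which side carries which list on the slide is an assumption here. In Elasticsearch such a rule would be expressed through document-level security role queries, not Python. The function name is hypothetical.

```python
def can_access(required: list[str], user_attrs: list[str]) -> bool:
    """Access requires every attribute on the document to be held by the user."""
    return set(required).issubset(user_attrs)


# Hypothetical: the document requires three attributes.
doc_requires = ["team:finance", "country:usa", "clearance:secret"]
print(can_access(doc_requires, ["team:finance", "country:usa"]))                      # False
print(can_access(doc_requires, ["team:finance", "country:usa", "clearance:secret"]))  # True
```
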
Audit Log Events Ignore Policies (X-Pack feature): fine-grained filtering of the security audit log
xpack.security.audit.logfile.events.ignore_filters:
  ignore_bulk_logging:
    users: ["beats"]
    indices: ["filebeat*", "metricbeat*"]
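The following sketch shows how such ignore policies could filter events, based on a simplified reading of the documented semantics. An event is dropped if it satisfies every filter in at least one policy. A filter only matches when all of the event's values for that field (for example, all of its indices) match one of the patterns. The event dictionaries are hypothetical, and real audit events carry more fields.

```python
from fnmatch import fnmatchcase

# The policy from the slide.
IGNORE_POLICIES = {
    "ignore_bulk_logging": {
        "users": ["beats"],
        "indices": ["filebeat*", "metricbeat*"],
    }
}


def matches(values: list[str], patterns: list[str]) -> bool:
    """True if there is at least one value and every value matches some pattern."""
    return bool(values) and all(any(fnmatchcase(v, p) for p in patterns) for v in values)


def is_ignored(event: dict) -> bool:
    """Drop the event if it satisfies every filter of any one policy."""
    for policy in IGNORE_POLICIES.values():
        if all(matches(event.get(field, []), pats) for field, pats in policy.items()):
            return True
    return False


print(is_ignored({"users": ["beats"], "indices": ["filebeat-2018.02.28"]}))      # True
print(is_ignored({"users": ["beats"], "indices": ["filebeat-1", "customers"]}))  # False
print(is_ignored({"users": ["alice"], "indices": ["metricbeat-2018.02.28"]}))    # False
```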

Single Sign-On with SAML; OAuth and Kerberos to follow (X-Pack feature)

Administration

Rolling Major Version Upgrades (diagram: a five-node cluster upgraded one node at a time, first from 5.2 to 5.6, then from 5.6 to 6.x)
• Upgrade Assistant: X-Pack feature (Basic)
• Zero downtime
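The upgrade path on these slides follows a simple compatibility rule. First, move within 5.x to 5.6. Then roll node by node to 6.x. The following sketch of that rule assumes version strings of the form major.minor[.patch]. It hard-codes 5.6 as the only version that can be rolled to the next major, as shown on the slides. The function names are hypothetical.

```python
# Assumption from the slides: 5.6 is the only version that can be rolled to 6.x.
LAST_MINOR_BEFORE_NEXT_MAJOR = {5: 6}


def parse(version: str) -> tuple[int, int]:
    """'5.6.8' -> (5, 6); patch numbers are ignored."""
    major, minor = version.split(".")[:2]
    return int(major), int(minor)


def rolling_upgrade_supported(current: str, target: str) -> bool:
    cur, tgt = parse(current), parse(target)
    if tgt < cur:
        return False  # downgrades are not supported
    if tgt[0] == cur[0]:
        return True  # upgrades within a major can be rolled
    # Across a major: one major at a time, and only from the last minor.
    return tgt[0] == cur[0] + 1 and LAST_MINOR_BEFORE_NEXT_MAJOR.get(cur[0]) == cur[1]


for cur, tgt in [("5.2", "5.6"), ("5.6", "6.2"), ("5.2", "6.2")]:
    print(cur, "->", tgt, rolling_upgrade_supported(cur, tgt))
# 5.2 -> 5.6 True
# 5.6 -> 6.2 True
# 5.2 -> 6.2 False
```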

Distributed Watch Execution (X-Pack feature, Gold)
• Watches are no longer executed only on the master node
• They are executed on nodes that hold shards of the .watches index
• Configure all nodes, or specific dedicated nodes, for watch execution

Cluster Protection: Circuit Breakers and Soft Limits
• Circuit breakers track memory usage, now including Lucene's requirements.
• Soft limits prevent users from running dangerous requests, helping admins protect their clusters from unwitting users.
• Added limits on highlighting, terms queries, n-gram and shingle analyzers, and nested docs.
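The following minimal sketch illustrates the circuit-breaker idea. An estimate of the memory a request will need is reserved before the work starts. If the reservation would cross the limit, the request is rejected instead of being allowed to run the node out of heap. The class, limit and labels are hypothetical. Elasticsearch's real breakers are configured through settings and also feed a parent limit, which this sketch omits.

```python
class CircuitBreakingException(Exception):
    """Raised instead of letting a request exhaust the heap."""


class CircuitBreaker:
    def __init__(self, name: str, limit_bytes: int) -> None:
        self.name = name
        self.limit = limit_bytes
        self.used = 0  # bytes currently reserved by in-flight requests

    def add_estimate(self, nbytes: int, label: str) -> None:
        """Reserve memory up front; refuse if the reservation would cross the limit."""
        if self.used + nbytes > self.limit:
            raise CircuitBreakingException(
                f"[{self.name}] {label} needs {nbytes} bytes; "
                f"{self.used} of {self.limit} already in use"
            )
        self.used += nbytes

    def release(self, nbytes: int) -> None:
        """Give the reservation back when the request finishes."""
        self.used = max(0, self.used - nbytes)


breaker = CircuitBreaker("request", limit_bytes=100 * 1024 * 1024)
breaker.add_estimate(60 * 1024 * 1024, "terms aggregation")
try:
    breaker.add_estimate(50 * 1024 * 1024, "highlighting a huge field")
except CircuitBreakingException as e:
    print("rejected:", e)
breaker.release(60 * 1024 * 1024)
print(breaker.used)  # 0
```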

File-Based Recovery (5.x) (diagram: primary and replica segments 1 to 4 diverge over time, so the replica recovers by copying segment files from the primary)

Ops-Based Recovery (6.0) (diagram: the primary holds operations 1 to 7 and the replica 1 to 4; only operations 5, 6 and 7 are replayed, after which both hold 1 to 7)
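In 6.0 every operation gets a sequence number. Each shard copy tracks a local checkpoint, which is the highest sequence number below which it has processed every operation. The following minimal sketch shows the resulting recovery decision. It assumes the primary still retains operations from a known oldest sequence number, for example in its translog. If the replica's gap reaches back further than that, recovery falls back to copying files. The function and parameter names are hypothetical.

```python
def plan_recovery(primary_seq_nos: list[int], oldest_retained: int, replica_checkpoint: int):
    """Decide how to bring a replica back in sync with its primary.

    replica_checkpoint: the replica's local checkpoint.
    oldest_retained: the oldest sequence number the primary still keeps.
    """
    if replica_checkpoint + 1 < oldest_retained:
        # The missing history is gone: fall back to copying segment files.
        return "file-based", []
    missing = [seq for seq in sorted(primary_seq_nos) if seq > replica_checkpoint]
    return "ops-based", missing


print(plan_recovery(list(range(1, 8)), oldest_retained=1, replica_checkpoint=4))
# ('ops-based', [5, 6, 7])
print(plan_recovery(list(range(1, 8)), oldest_retained=6, replica_checkpoint=4))
# ('file-based', [])
```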

Cross-Cluster Replication (coming soon to X-Pack) (diagram: clusters in New York, Tokyo and London holding the indices ny_sales, lnd_sales and tk_sales, with ny_sales replicated to another cluster)

Index Lifecycle Management (coming soon to X-Pack)

(diagram: indices 1, 2 and 3 moving across hot, warm and cold nodes)
• Hot phase: index to my-logs-write, search on my-logs-read, then rollover
• Warm phase: allocate to warm nodes, shrink, compress
• Cold phase: allocate to cold nodes
• Delete phase
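The phases of a lifecycle policy come down to two decisions. The first is when to roll over the write index. The second is which phase an index is in, based on its age. The following minimal sketch uses hypothetical thresholds that are not defaults of any shipped feature. The rollover conditions mirror the existing rollover API's `max_age`, `max_docs` and `max_size`, any one of which triggers a rollover. The index names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class IndexInfo:
    name: str
    age_days: float
    docs: int
    size_gb: float


# Hypothetical thresholds, for illustration only.
ROLLOVER = {"max_age_days": 1, "max_docs": 50_000_000, "max_size_gb": 50}
PHASES = [(0, "hot"), (7, "warm"), (30, "cold"), (90, "delete")]  # (min age in days, phase)


def should_rollover(write_index: IndexInfo) -> bool:
    """Roll over to a new write index as soon as any one condition is met."""
    return (
        write_index.age_days >= ROLLOVER["max_age_days"]
        or write_index.docs >= ROLLOVER["max_docs"]
        or write_index.size_gb >= ROLLOVER["max_size_gb"]
    )


def phase_for(index: IndexInfo) -> str:
    """The latest phase whose minimum age the index has reached."""
    current = PHASES[0][1]
    for min_age, phase in PHASES:
        if index.age_days >= min_age:
            current = phase
    return current


indices = [
    IndexInfo("my-logs-000003", 0.5, 60_000_000, 20.0),  # current write index
    IndexInfo("my-logs-000002", 10, 40_000_000, 45.0),
    IndexInfo("my-logs-000001", 45, 40_000_000, 45.0),
]
print(should_rollover(indices[0]))  # True: 60M docs exceeds max_docs
for idx in indices:
    print(idx.name, phase_for(idx))
```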

Zen Discovery: minimum_master_nodes has to be set by hand as master-eligible nodes are added (diagram: 1 master-eligible node with minimum_master_nodes = 1; 2 master-eligible nodes with minimum_master_nodes = 2)

Zen Discovery v2: automatic management of master nodes (diagram: 4 master-eligible nodes with minimum_master_nodes = 3)
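With Zen Discovery, `discovery.zen.minimum_master_nodes` has to be kept at a strict majority (a quorum) of the master-eligible nodes to avoid split brain. It must also be updated whenever master-eligible nodes are added or removed. The quorum arithmetic is simple:

```python
def minimum_master_nodes(master_eligible: int) -> int:
    """A strict majority of the master-eligible nodes."""
    return master_eligible // 2 + 1


for n in (1, 2, 3, 4, 5):
    print(n, minimum_master_nodes(n))
# 1 1
# 2 2
# 3 2
# 4 3
# 5 3
```

Forgetting to raise this value as a cluster grows is exactly the manual step that automatic management of master nodes is meant to remove.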

Other Talks You Should See
• Elasticsearch Consensus: The Past, the Present, and the Future: Boaz Leskes, Jason Tedor, David Turner, Yannick Welsch (Wednesday, 3:30pm)
• Get the Lay of the Lucene Land: Adrien Grand (Wednesday, 10:30am)
• The State of Geo in Elasticsearch: Nick Knize, Thomas Neirynck (Thursday, 9:30am)
• The State of the Elasticsearch Java Client: Nik Everett (Thursday, 2:30pm)
• Elasticsearch SQL: Costin Leau (Thursday, 3:30pm)
