Elastic{ON} 2018 - What's Evolving in Elasticsearch

Slide 1

Slide 1 text

What’s Evolving in Elasticsearch
Clinton Gormley & Simon Willnauer (@clintongormley, @s1m0nw), Team and Tech Leads, Elasticsearch
28 February 2018

Slide 2

Slide 2 text

6.0, November 2017: 2,236 pull requests, 333 contributors

Slide 3

Slide 3 text

1. Indexing
2. Search
3. Security
4. Administration

Slide 4

Slide 4 text

1. 6.0
2. 6.x
3. 7.0

Slide 5

Slide 5 text

Indexing

Slide 6

Slide 6 text

Faster Indexing
[chart: indexing speed by version, 1.x, 2.x, 5.x, 6.x, 7.x]

Slide 7

Slide 7 text

Disk on a Diet
• Sparse doc values: only pay for what you use
• Removed _all field
  • Replaced by "index.query.default_field": ["*"]
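Sparse doc values mean a field only costs space for the documents that actually have a value. A minimal Python sketch of the idea, assuming a hypothetical `SparseColumn` class that keeps sorted (doc id, value) pairs instead of one slot per document; Lucene's real encoding is more elaborate.

```python
from bisect import bisect_left


class SparseColumn:
    """Hypothetical column that stores values only for documents that have one."""

    def __init__(self, values_by_doc):
        # Sorted doc ids plus a parallel list of values: the cost grows with the
        # number of documents that have a value, not with the size of the index.
        self.doc_ids = sorted(values_by_doc)
        self.values = [values_by_doc[d] for d in self.doc_ids]

    def get(self, doc_id):
        """Binary-search for the doc id; None means the document has no value."""
        i = bisect_left(self.doc_ids, doc_id)
        if i < len(self.doc_ids) and self.doc_ids[i] == doc_id:
            return self.values[i]
        return None


# 1,000 documents, but only three of them have a value for this field.
column = SparseColumn({7: 3.5, 420: 1.0, 999: 8.25})
dense_slots = 1000  # a dense layout would reserve one slot per document
sparse_entries = len(column.doc_ids)
```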

Slide 8

Slide 8 text

[chart: Apples-to-apples index size improvements, sample Metricbeat dataset; series 5.x, 6.0 out of the box (OOTB), 6.0 with _all disabled]

Slide 9

Slide 9 text

Index Shrinking
[diagram: shards 1, 2, 3, 4]

Slide 11

Slide 11 text

Index Shrinking
[diagram: shards 1, 2, 3, 4]

Slide 12

Slide 12 text

Index Splitting
[diagram: shards 1, 2, 3, 4]

Slide 13

Slide 13 text

Index Splitting
[diagram: shards 1, 2, 3, 4]
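Shrinking and splitting both constrain the new primary shard count: a shrunk index needs a count that divides the source count, and a split index needs a multiple of it. A minimal Python sketch of those two checks, using hypothetical helper names; in 6.x splitting additionally depends on the index's `index.number_of_routing_shards` setting, which this sketch ignores.

```python
def shrink_targets(source_shards):
    """Hypothetical helper: shard counts an index can be shrunk to (factors of the source)."""
    return [n for n in range(1, source_shards) if source_shards % n == 0]


def is_valid_split(source_shards, target_shards):
    """Hypothetical helper: a split must produce a larger multiple of the source count."""
    return target_shards > source_shards and target_shards % source_shards == 0


print(shrink_targets(4))      # a 4-shard index can shrink to 1 or 2 shards
print(is_valid_split(2, 4))   # 2 shards can split into 4
```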

Slide 15

Slide 15 text

Data Rollups (coming soon to X-Pack)

Slide 16

Slide 16 text

Data Rollups (coming soon to X-Pack)
Supported metrics: min, max, count, avg

Slide 17

Slide 17 text

Data Rollups (coming soon to X-Pack)
Supported metrics: min, max, count, avg
cardinality, percentiles

Slide 18

Slide 18 text

Data Rollups (coming soon to X-Pack)
Flexible bucketing and filtering by time, histograms, and terms
[diagram: Date Histogram, Histogram, and Terms groupings over hosts prod-1.myco.com to prod-5.myco.com]

Slide 19

Slide 19 text

Data Rollups (coming soon to X-Pack)
Uses the new composite aggregation:
• Paginate through all buckets of a multi-level aggregation
• Accurate term counts
• Sorted by “natural” order
[diagram: hosts prod-1.myco.com to prod-5.myco.com]
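The composite aggregation returns buckets in a stable, sorted order and lets a client resume from the last key it saw (the aggregation's `after` parameter), so every bucket of a multi-level grouping can be paged through with exact counts. A minimal Python sketch of that paging pattern over in-memory documents; `composite_pages` is a hypothetical helper, not the Elasticsearch API.

```python
from collections import Counter


def composite_pages(docs, sources, size):
    """Hypothetical helper: yield pages of (key tuple, doc_count) in sorted key order."""
    # One bucket per distinct combination of source values, with an exact count.
    counts = Counter(tuple(doc[field] for field in sources) for doc in docs)
    after = None  # plays the role of the composite aggregation's `after` key
    while True:
        keys = sorted(k for k in counts if after is None or k > after)[:size]
        if not keys:
            return
        yield [(k, counts[k]) for k in keys]
        after = keys[-1]  # the next page resumes after the last bucket of this one


docs = [{"day": "2018-02-28", "host": f"prod-{i}.myco.com"} for i in (1, 2, 3, 4, 5, 1, 2)]
for page in composite_pages(docs, ["day", "host"], size=2):
    print(page)
```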

Slide 20

Slide 20 text

Data Rollups (coming soon to X-Pack)
Flexible bucketing and filtering by time, histograms, and terms
[diagram: fields @timestamp, datacenter, url.path]

Slide 23

Slide 23 text

Data Rollups (coming soon to X-Pack)
Flexible bucketing and filtering by time, histograms, and terms
[diagram: fields @timestamp, datacenter, url.path; group by]

Slide 25

Slide 25 text

Data Rollups (coming soon to X-Pack)
Flexible bucketing and filtering by time, histograms, and terms
[diagram: fields @timestamp, datacenter, url.path; filter by]

Slide 26

Slide 26 text

Data Rollups (coming soon to X-Pack)
The more data you have, the more space you save: easily 90%+ compared with the raw data
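A rollup replaces many raw documents with one summary per group. A minimal Python sketch, assuming hourly buckets grouped by `@timestamp` and `datacenter` and a hypothetical numeric field `latency_ms`; it keeps min, max, sum, and count per bucket and derives the average from sum and count, because averaging per-bucket averages would be wrong whenever buckets hold different numbers of documents.

```python
from datetime import datetime, timedelta


def rollup_hourly(docs, metric):
    """Hypothetical helper: group raw docs by (hour, datacenter), keep min/max/sum/count."""
    buckets = {}
    for doc in docs:
        hour = doc["@timestamp"].replace(minute=0, second=0, microsecond=0)
        key = (hour, doc["datacenter"])
        value = doc[metric]
        b = buckets.setdefault(key, {"min": value, "max": value, "sum": 0, "count": 0})
        b["min"] = min(b["min"], value)
        b["max"] = max(b["max"], value)
        b["sum"] += value
        b["count"] += 1
    return buckets


def bucket_avg(bucket):
    """Exact average recovered from the stored sum and count."""
    return bucket["sum"] / bucket["count"]


# One raw document per second for an hour collapses into a single rollup bucket.
start = datetime(2018, 2, 28, 10, 0, 0)
raw = [{"@timestamp": start + timedelta(seconds=i), "datacenter": "eu-west",
        "latency_ms": i % 100} for i in range(3600)]
summary = rollup_hourly(raw, "latency_ms")
```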

Slide 27

Slide 27 text

Search

Slide 28

Slide 28 text

Scalable Cross Cluster Search
[diagram: Kibana searching across three Elasticsearch clusters]

Slide 29

Slide 29 text

Scalable Cross Cluster Search
Search across two major versions
[diagram: Kibana searching Elasticsearch clusters running 5.latest, 6.latest, and 7.x]

Slide 30

Slide 30 text

Improved Search Scalability
Searches across many shards are more scalable:
• Fast pre-check phase: exclude any shards that can’t match the query.
• Limits on the number of shards searched in parallel, so that a single query cannot dominate the cluster.
• Batched reduction of results, which reduces memory usage on the coordinating node.
[diagram: a multi-shard search request over shards 1 to N, of which only a subset contain results]
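Batched reduction means the coordinating node merges shard responses a few at a time instead of buffering all of them. A minimal Python sketch, assuming each shard returns its own top-k as (score, doc id) pairs; Elasticsearch exposes the batch size as the `batched_reduce_size` search parameter, but the helper below is hypothetical.

```python
import heapq


def batched_reduce(shard_results, k, batch_size):
    """Hypothetical coordinating-node merge: keep the global top-k (score, doc) hits,
    reducing buffered shard responses every `batch_size` responses."""
    reduced = []   # current top-k after the last partial reduce
    buffered = []  # hits from responses waiting for the next reduce
    pending = 0
    for hits in shard_results:
        buffered.extend(hits)
        pending += 1
        if pending == batch_size:
            reduced = heapq.nlargest(k, reduced + buffered)  # partial reduce frees the buffer
            buffered, pending = [], 0
    return heapq.nlargest(k, reduced + buffered)  # final reduce


shards = [[(9, "a"), (1, "b")], [(8, "c")], [(7, "d"), (6, "e")], [(10, "f")], [(5, "g")]]
print(batched_reduce(shards, k=3, batch_size=2))
```

Whatever the batch size, the final top-k is the same; only the peak number of buffered hits changes.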

Slide 31

Slide 31 text

Adaptive Replica Selection
[diagram: shard copies 1P, 1R, 2P, 2R]

Slide 33

Slide 33 text

Adaptive Replica Selection
Avoids nodes with higher latency
[diagram: shard copies 1P, 1R, 2P, 2R]
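Adaptive replica selection routes each request to the shard copy expected to answer fastest. Elasticsearch's ranking is adapted from the C3 algorithm and also weighs queue sizes and service times; the Python sketch below is deliberately simpler, a hypothetical selector that tracks only an exponentially weighted moving average (EWMA) of each copy's response time.

```python
class ReplicaSelector:
    """Hypothetical, simplified selector: prefer the copy with the lowest EWMA latency."""

    def __init__(self, copies, alpha=0.3):
        self.alpha = alpha  # weight given to the newest latency sample
        self.ewma = {copy: None for copy in copies}

    def record(self, copy, latency_ms):
        prev = self.ewma[copy]
        self.ewma[copy] = latency_ms if prev is None else (
            self.alpha * latency_ms + (1 - self.alpha) * prev)

    def choose(self):
        # Copies with no samples yet sort first so each one gets measured;
        # after that, the copy with the lowest smoothed latency wins.
        return min(self.ewma, key=lambda c: (self.ewma[c] is not None, self.ewma[c] or 0.0))


selector = ReplicaSelector(["1P", "1R"])
selector.record("1P", 250.0)  # the node holding 1P is slow
selector.record("1R", 20.0)
print(selector.choose())
```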

Slide 34

Slide 34 text

Index Sorting
Query for top 3 player scores
5.x: Player 1 Score: 600, Player 2 Score: 0, Player 3 Score: 200, Player 4 Score: 700, Player 5 Score: 300, ..., Player 1907 Score: 800
6.x (sorted at index time): Player 1907 Score: 800, Player 4 Score: 700, Player 1 Score: 600, Player 5 Score: 300, Player 3 Score: 200, Player 2 Score: 0, ...
• Sort at index time vs. query time
• Optimize on-disk format for some use cases
• Improve query performance at the cost of indexing performance
Much speedier sorted queries
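With index sorting, documents are stored in the order the query wants, so a top-N query can stop after N hits instead of visiting every document. A minimal Python sketch using the slide's player scores; the two helper functions are hypothetical, and in Elasticsearch the sort order is configured per index with the `index.sort.field` and `index.sort.order` settings.

```python
import heapq

SCORES = [("Player 1", 600), ("Player 2", 0), ("Player 3", 200),
          ("Player 4", 700), ("Player 5", 300), ("Player 1907", 800)]


def top_n_query_time(docs, n):
    """Hypothetical helper, no index sort: every document is visited to find the best n."""
    return heapq.nlargest(n, docs, key=lambda d: d[1]), len(docs)


def top_n_index_sorted(sorted_docs, n):
    """Hypothetical helper, docs stored by descending score: stop after the first n."""
    hits = []
    for doc in sorted_docs:
        hits.append(doc)
        if len(hits) == n:
            break  # early termination: nothing stored later can score higher
    return hits, len(hits)


index_sorted = sorted(SCORES, key=lambda d: d[1], reverse=True)  # paid once, at index time
```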

Slide 35

Slide 35 text

SQL Client (coming soon to X-Pack)
SELECT course, avg(age), count(*) FROM mytable WHERE match(uni, 'oxford') GROUP BY course HAVING avg(age) > 18 ORDER BY course, avg(age)
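The slide's query groups students by course, keeps only groups whose average age exceeds 18, and sorts the result. As a runnable illustration of what it computes, the sketch below uses Python's built-in `sqlite3` as a stand-in for Elasticsearch SQL, with hypothetical rows; SQLite has no `match()` full-text function, so the sketch substitutes a plain equality test on `uni`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (course TEXT, age INTEGER, uni TEXT)")
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?, ?)",
    [("history", 19, "oxford"), ("history", 21, "oxford"),   # hypothetical sample rows
     ("maths", 17, "oxford"), ("maths", 18, "oxford"),
     ("law", 30, "cambridge")],
)

# Same clause order as the slide; uni = 'oxford' stands in for match(uni, 'oxford').
rows = conn.execute(
    "SELECT course, avg(age), count(*) FROM mytable "
    "WHERE uni = 'oxford' "
    "GROUP BY course HAVING avg(age) > 18 "
    "ORDER BY course, avg(age)"
).fetchall()
print(rows)
```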

Slide 36

Slide 36 text

SQL Client (coming soon to X-Pack)
Clients: CLI, JDBC, Kibana, Canvas
SQL over REST: GET /_sql {}

Slide 37

Slide 37 text

SQL Client (coming soon to X-Pack)
Clients: CLI, JDBC, ODBC, Kibana, Canvas
SQL over REST: GET /_sql {}

Slide 38

Slide 38 text

Faster Prefix Queries
Index prefixes for faster querying
"title": { "type": "text", "index_prefixes": { "min_chars": 2, "max_chars": 6 } }
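Indexing prefixes trades extra index space for cheaper prefix queries: every prefix between `min_chars` and `max_chars` long is stored as its own term, so a query such as `pref*` becomes a single term lookup instead of a walk over every term starting with `pref`. A minimal in-memory Python sketch with hypothetical helper names; it models the idea, not how Elasticsearch lays the data out on disk.

```python
MIN_CHARS, MAX_CHARS = 2, 6  # mirror the slide's min_chars / max_chars


def build_prefix_index(terms):
    """Hypothetical helper: map every prefix of length MIN_CHARS..MAX_CHARS to its terms."""
    prefixes = {}
    for term in terms:
        for length in range(MIN_CHARS, min(MAX_CHARS, len(term)) + 1):
            prefixes.setdefault(term[:length], set()).add(term)
    return prefixes


def prefix_query(terms, prefixes, prefix):
    """Use the prefix index when the prefix length is covered, else scan all terms."""
    if MIN_CHARS <= len(prefix) <= MAX_CHARS:
        return prefixes.get(prefix, set())             # one lookup
    return {t for t in terms if t.startswith(prefix)}  # fallback scan


TERMS = ["faster", "prefix", "prefixes", "preference", "query"]
PREFIXES = build_prefix_index(TERMS)
print(prefix_query(TERMS, PREFIXES, "pref"))
```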

Slide 39

Slide 39 text

Faster Prefix Queries
Index prefixes for faster querying
"query_string": { "query": "faster pref*", "fields": ["title"] }

Slide 40

Slide 40 text

Faster Phrase and Prefix Queries
Index shingles for faster phrase queries
"match_phrase_prefix": { "title": "phrase and pref*" }
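Shingles are runs of adjacent words indexed as single terms, so a two-word phrase can be answered with one term lookup instead of matching the positions of two separate terms. A minimal Python sketch with hypothetical helpers, limited to two-word shingles; longer phrases would still need position checks.

```python
from collections import defaultdict


def shingles(tokens, size=2):
    """Adjacent word groups, e.g. ['a', 'b', 'c'] -> ['a b', 'b c']."""
    return [" ".join(tokens[i:i + size]) for i in range(len(tokens) - size + 1)]


def build_shingle_index(docs):
    """Hypothetical helper: inverted index from each two-word shingle to doc ids."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for shingle in shingles(text.lower().split()):
            index[shingle].add(doc_id)
    return index


def two_word_phrase(index, phrase):
    """A two-word phrase query becomes a single lookup in the shingle index."""
    return sorted(index.get(phrase.lower(), set()))


DOCS = {1: "Faster phrase and prefix queries", 2: "Phrase queries and prefix queries"}
INDEX = build_shingle_index(DOCS)
print(two_word_phrase(INDEX, "phrase and"))
```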

Slide 41

Slide 41 text

Faster Top-N Queries
GET /_search { "size": 2, "query": { "match": { "text": "quick brown fox" } } }

Slide 42

Slide 42 text

Faster Top-N Queries
GET /_search { "size": 2, "query": { "match": { "text": "quick brown fox" } } }
All matching docs must be collected to compute:
• Aggregations
• Total hits
• The top-N documents

Slide 44

Slide 44 text

Faster Top-N Queries
When total hits and aggregations are not required
GET /_search { "size": 2, "query": { "match": { "text": "quick brown fox" } } }
DOC  quick  brown  fox  total
1    2      3      1    6
2    2      3      0    5
3    0      3      1    4
4    2      0      1    3
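The table above shows why skipping is possible: once the top 2 hold scores 6 and 5, a document that contains only `brown` and `fox` can score at most 3 + 1 = 4 and never needs to be scored. The talk does not name the algorithm; the Python sketch below assumes a simplified MaxScore-style bound (one maximum score per term) over the slide's numbers, and the helper is hypothetical. Skipped documents are never counted, which is exactly why total hits cannot be reported.

```python
import heapq

# Per-document term scores from the slide's table; a term a document does not
# contain (score 0 in the table) is simply absent.
TERM_SCORES = {
    1: {"quick": 2, "brown": 3, "fox": 1},
    2: {"quick": 2, "brown": 3},
    3: {"brown": 3, "fox": 1},
    4: {"quick": 2, "fox": 1},
}


def top_n_with_pruning(docs, terms, n):
    """Hypothetical helper: return (top-n (score, doc) hits, ids of fully scored docs)."""
    # Highest score each term contributes to any document: a per-term upper bound.
    max_score = {t: max(d.get(t, 0) for d in docs.values()) for t in terms}
    heap = []  # min-heap holding the current top-n (score, doc) pairs
    scored = []
    for doc_id, term_scores in docs.items():
        # Best score this doc could reach, given which query terms it contains.
        upper_bound = sum(max_score[t] for t in terms if t in term_scores)
        if len(heap) == n and upper_bound <= heap[0][0]:
            continue  # cannot beat the weakest top-n hit: skip scoring it
        score = sum(term_scores.get(t, 0) for t in terms)
        scored.append(doc_id)
        if len(heap) < n:
            heapq.heappush(heap, (score, doc_id))
        elif score > heap[0][0]:
            heapq.heapreplace(heap, (score, doc_id))
    return sorted(heap, reverse=True), scored


top, scored = top_n_with_pruning(TERM_SCORES, ["quick", "brown", "fox"], 2)
```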

Slide 45

Slide 45 text

More Search Features
• Scripted Search Similarity
• Significant Text Aggregation
• Ranking Evaluation API
• Korean Analyzer
• Nano-second timestamps

Slide 46

Slide 46 text

Security

Slide 47

Slide 47 text

Secure All the Things (X-Pack feature)
• No default passwords (no more changeme)
• Mandatory TLS between nodes

Slide 48

Slide 48 text

Attribute-Based Access Control (X-Pack feature)
{ "attrs": ["team:finance", "country:usa"] … }
{ "attrs": ["team:finance", "country:usa", "clearance:secret"] … }
All attributes must be present
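The rule on this slide is a set check: access is granted only when every required attribute is present. A minimal Python sketch, assuming the required attributes come from the document and the held attributes from the user; `can_access` is a hypothetical helper, not an X-Pack API.

```python
def can_access(user_attrs, required_attrs):
    """Hypothetical ABAC check: every required attribute must be held by the user."""
    return set(required_attrs) <= set(user_attrs)


user = ["team:finance", "country:usa"]
finance_doc = ["team:finance", "country:usa"]
secret_doc = ["team:finance", "country:usa", "clearance:secret"]
print(can_access(user, finance_doc), can_access(user, secret_doc))
```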

Slide 49

Slide 49 text

Audit Log Events Ignore Policies (X-Pack feature)
Fine-grained filtering of the security audit log
xpack.security.audit.logfile.events.ignore_filters:
  ignore_bulk_logging:
    users: ["beats"]
    indices: ["filebeat*", "metricbeat*"]
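The policy on this slide drops audit events from the `beats` user that touch only `filebeat*` or `metricbeat*` indices. A minimal Python sketch of that matching, assuming an event is ignored when it satisfies every condition of at least one policy and, for multi-index requests, all of its indices match; `fnmatchcase` approximates Elasticsearch's wildcard syntax, and the event format and helpers are hypothetical.

```python
from fnmatch import fnmatchcase

# Hypothetical in-memory copy of the slide's ignore_bulk_logging policy.
POLICIES = {
    "ignore_bulk_logging": {
        "users": ["beats"],
        "indices": ["filebeat*", "metricbeat*"],
    },
}


def _covered(values, patterns):
    """True if there is at least one value and every value matches some pattern."""
    return bool(values) and all(any(fnmatchcase(v, p) for p in patterns) for v in values)


def is_ignored(event, policies=POLICIES):
    """Hypothetical check: drop the event if it meets every condition of any policy."""
    return any(
        all(_covered(event.get(field, []), patterns) for field, patterns in policy.items())
        for policy in policies.values()
    )


print(is_ignored({"users": ["beats"], "indices": ["filebeat-2018.02.28"]}))
```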

Slide 50

Slide 50 text

Single Sign On with SAML (X-Pack feature)
OAuth and Kerberos to follow

Slide 51

Slide 51 text

Administration

Slide 52

Slide 52 text

Rolling Major Version Upgrades
[diagram: five nodes, 5.2 upgraded to 5.6]

Slide 53

Slide 53 text

Rolling Major Version Upgrades
Upgrade Assistant: X-Pack feature (Basic)
[diagram: five nodes, 5.6 upgraded to 6.x]

Slide 54

Slide 54 text

Rolling Major Version Upgrades
Zero Downtime
[diagram: five nodes on 6.x]
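Rolling upgrades replace nodes one at a time, so they only work between versions that can run in the same cluster. A minimal Python sketch of that check, assuming (as these slides show) that a rolling upgrade to the next major starts from the last minor of the previous one, 5.6 for 6.x; the `LAST_MINOR` table and helpers are hypothetical and would need an entry for every major.

```python
# Hypothetical table: last minor of each major that can roll to the next major.
LAST_MINOR = {5: 6}


def parse(version):
    """Split a 'major.minor' version string into integers."""
    major, minor = version.split(".")[:2]
    return int(major), int(minor)


def can_rolling_upgrade(current, target):
    """True if nodes can be upgraded one at a time from `current` to `target`."""
    cur_major, cur_minor = parse(current)
    tgt_major, tgt_minor = parse(target)
    if tgt_major == cur_major:
        return tgt_minor >= cur_minor                  # minor upgrade within a major
    if tgt_major == cur_major + 1:
        return LAST_MINOR.get(cur_major) == cur_minor  # next major only from its last minor
    return False                                       # skipping a major is not a rolling upgrade
```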

Slide 55

Slide 55 text

Distributed Watch Execution (X-Pack feature, Gold)
• Watches are no longer executed only on the master node
• They are executed on nodes which hold shards of the .watches index
• Watch execution can run on all nodes or be restricted to specific, dedicated nodes

Slide 56

Slide 56 text

Cluster Protection: Circuit Breakers and Soft Limits
• Circuit breakers track memory usage, now including Lucene’s requirements.
• Soft limits prevent users from running dangerous requests, helping admins protect their clusters from unwitting users.
• Added limits on highlighting, terms queries, n-gram and shingle analyzers, and nested docs.
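A circuit breaker estimates the memory a request will need and rejects the request before the allocation happens, rather than letting the node run out of heap. A minimal Python sketch; `MemoryCircuitBreaker` and `BreakerTripped` are hypothetical names, not Elasticsearch classes.

```python
class BreakerTripped(Exception):
    """Raised instead of allocating when a reservation would exceed the limit."""


class MemoryCircuitBreaker:
    """Hypothetical breaker that accounts for estimated bytes before they are allocated."""

    def __init__(self, limit_bytes):
        self.limit = limit_bytes
        self.used = 0

    def reserve(self, nbytes, label):
        # Check the estimate first: a tripped breaker leaves `used` unchanged.
        if self.used + nbytes > self.limit:
            raise BreakerTripped(
                f"[{label}] would use {self.used + nbytes} bytes, limit is {self.limit}")
        self.used += nbytes

    def release(self, nbytes):
        self.used = max(0, self.used - nbytes)


breaker = MemoryCircuitBreaker(limit_bytes=100)
breaker.reserve(60, "aggregation")
```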

Slide 57

Slide 57 text

File-Based Recovery (5.x)
Primary: Segment 1, Segment 2, Segment 3
Replica: Segment 1, Segment 2, Segment 3

Slide 59

Slide 59 text

File-Based Recovery (5.x)
Primary: Segment 1, Segment 4
Replica: Segment 1, Segment 2, Segment 3

Slide 60

Slide 60 text

File-Based Recovery (5.x)
Primary: Segment 1, Segment 4
Replica: Segment 1, Segment 4
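File-based recovery compares segment files, not operations. In the slides' example the primary's segments 2 and 3 have been replaced by segment 4 (for instance by a merge), so the replica must copy segment 4 in full even though it already holds the same documents. A minimal Python sketch of that comparison with a hypothetical helper; real recovery also checks file contents, not just names.

```python
def plan_file_recovery(primary_segments, replica_segments):
    """Hypothetical helper: (files to copy from the primary, files the replica deletes)."""
    primary, replica = set(primary_segments), set(replica_segments)
    return sorted(primary - replica), sorted(replica - primary)


to_copy, to_delete = plan_file_recovery(
    ["Segment 1", "Segment 4"],               # primary after segments 2 and 3 were replaced
    ["Segment 1", "Segment 2", "Segment 3"],  # replica still holds the old segments
)
```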

Slide 62

Slide 62 text

Ops-Based Recovery (6.0)
Primary: operations 1, 2, 3, 4, 5, 6, 7
Replica: operations 1, 2, 3, 4; operations 5, 6, 7 are replayed from the primary

Slide 63

Slide 63 text

Ops-Based Recovery (6.0)
Primary: operations 1, 2, 3, 4, 5, 6, 7
Replica: operations 1, 2, 3, 4, 5, 6, 7
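Ops-based recovery relies on the sequence numbers introduced in 6.0: the replica reports the highest operation it has processed without gaps (its local checkpoint), and the primary replays only the operations above it. A minimal Python sketch with hypothetical operation records and helper.

```python
def ops_to_replay(primary_ops, replica_local_checkpoint):
    """Hypothetical helper: operations the replica has not processed, in seq_no order."""
    return sorted((op for op in primary_ops if op["seq_no"] > replica_local_checkpoint),
                  key=lambda op: op["seq_no"])


primary_ops = [{"seq_no": n, "op": f"index doc {n}"} for n in range(1, 8)]  # operations 1..7
replica_ops = primary_ops[:4]                                             # replica stopped at 4
missing = ops_to_replay(primary_ops, replica_local_checkpoint=4)
replica_ops = replica_ops + missing  # replaying 5, 6, 7 brings the replica up to date
```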

Slide 64

Slide 64 text

Cross-Cluster Replication (coming soon to X-Pack)
[diagram: New York, Tokyo, and London clusters; indices ny_sales (two copies), lnd_sales, tk_sales]

Slide 65

Slide 65 text

Index Lifecycle Management (coming soon to X-Pack)

Slide 66

Slide 66 text

Index Lifecycle Management (coming soon to X-Pack)
Hot Phase: index to my-logs-write, search on my-logs-read

Slide 67

Slide 67 text

Index Lifecycle Management (coming soon to X-Pack)
Hot Phase: index to my-logs-write, search on my-logs-read
[diagram: hot, warm, and cold nodes; shards 1, 2, 3 on the hot nodes]

Slide 68

Slide 68 text

Index Lifecycle Management (coming soon to X-Pack)
Hot Phase: Rollover
[diagram: hot, warm, and cold nodes; two indices with shards 1, 2, 3]

Slide 69

Slide 69 text

Index Lifecycle Management (coming soon to X-Pack)
Warm Phase: Allocate
[diagram: hot, warm, and cold nodes; two indices with shards 1, 2, 3]

Slide 70

Slide 70 text

Index Lifecycle Management (coming soon to X-Pack)
Warm Phase: Shrink
[diagram: hot, warm, and cold nodes; an index with shards 1, 2, 3 and a shrunk index with shard 1]

Slide 71

Slide 71 text

Index Lifecycle Management (coming soon to X-Pack)
Warm Phase: Compress
[diagram: hot, warm, and cold nodes; an index with shards 1, 2, 3 and an index with shard 1]

Slide 72

Slide 72 text

Index Lifecycle Management (coming soon to X-Pack)
Cold Phase: Allocate
[diagram: hot, warm, and cold nodes; an index with shards 1, 2, 3 and an index with shard 1]

Slide 73

Slide 73 text

Index Lifecycle Management (coming soon to X-Pack)
Delete Phase
[diagram: hot, warm, and cold nodes; an index with shards 1, 2, 3 and an index with shard 1]
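Index lifecycle management moves each index through hot, warm, cold, and delete phases as it ages. A minimal Python sketch of phase selection, assuming a hypothetical policy whose age thresholds (7, 30, and 90 days) are illustrative only; the actions listed per phase are the ones named on these slides.

```python
from datetime import timedelta

# Hypothetical policy: (phase, minimum index age, actions named on the slides).
# The age thresholds are illustrative, not defaults.
POLICY = [
    ("hot", timedelta(days=0), ["rollover"]),
    ("warm", timedelta(days=7), ["allocate", "shrink", "compress"]),
    ("cold", timedelta(days=30), ["allocate"]),
    ("delete", timedelta(days=90), ["delete"]),
]


def current_phase(index_age):
    """Return (phase, actions) for the latest phase whose minimum age has been reached."""
    phase, _, actions = POLICY[0]
    for name, min_age, phase_actions in POLICY:
        if index_age >= min_age:
            phase, actions = name, phase_actions
    return phase, actions
```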

Slide 74

Slide 74 text

Zen Discovery
minimum_master_nodes = 1
[diagram: 1 master-eligible node]

Slide 75

Slide 75 text

Zen Discovery
minimum_master_nodes = 2
[diagram: 2 master-eligible nodes]

Slide 76

Slide 76 text

Zen Discovery
minimum_master_nodes = 2
[diagram: 3 master-eligible nodes]

Slide 77

Slide 77 text

Zen Discovery
minimum_master_nodes = 3
[diagram: 4 master-eligible nodes]
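The `minimum_master_nodes` values on slides 74 to 77 follow the quorum rule recommended for Zen Discovery: a strict majority of master-eligible nodes, (N / 2) + 1 with integer division, so two halves of a partitioned cluster can never both elect a master. A minimal Python sketch that reproduces the slides' values; the function name is hypothetical.

```python
def minimum_master_nodes(master_eligible_nodes):
    """Hypothetical helper: strict majority of master-eligible nodes, (N // 2) + 1."""
    return master_eligible_nodes // 2 + 1


# The node counts shown on the slides, 1 to 4 master-eligible nodes.
quorums = {n: minimum_master_nodes(n) for n in range(1, 5)}
print(quorums)
```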

Slide 78

Slide 78 text

Zen Discovery v2
Automatic management of master nodes
minimum_master_nodes = 3
[diagram: 4 master-eligible nodes]

Slide 79

Slide 79 text

Other Talks You Should See
• Elasticsearch Consensus: The Past, the Present, and the Future (Boaz Leskes, Jason Tedor, David Turner, Yannick Welsch; Wednesday, 3:30pm)
• Get the Lay of the Lucene Land (Adrien Grand; Wednesday, 10:30am)
• The State of Geo in Elasticsearch (Nick Knize, Thomas Neirynck; Thursday, 9:30am)
• The State of the Elasticsearch Java Client (Nik Everett; Thursday, 2:30pm)
• Elasticsearch SQL (Costin Leau; Thursday, 3:30pm)

Slide 80

Slide 80 text

Slide 81

Slide 81 text

www.elastic.co

Slide 82

Slide 82 text

Except where otherwise noted, this work is licensed under http://creativecommons.org/licenses/by-nd/4.0/ Creative Commons and the double C in a circle are registered trademarks of Creative Commons in the United States and other countries. Third party marks and brands are the property of their respective holders. Please attribute Elastic with a link to elastic.co