
Opening Keynote

Elastic{ON} Tour London - June 22, 2017

Get an update from Shay on the latest Elastic Stack innovations.

Shay Banon | Founder & CEO | Elastic
Ron Cohen | Tech Lead | Elastic
Rasmus Makwarth | Director of Product Management | Elastic


Transcript

  1. Opening Keynote: Shay Banon, Founder & CEO, Elastic. London | June 22, 2017

  2. _Search? Outside the box: life:universe • user:soulmate • city:restaurant • car:model • fridge:leftovers • work:dreamjob

  3. Logging

  4. Metrics

  5. APM

  6. Welcome Opbeat!

  7. (image-only slide)

  8. (image-only slide)

  9. Opbeat Demo • 12:30 - 1:20 PM and 3:15 - 5:30 PM • Demo Station #4

  10. Thank You

  11. Clinton Gormley, Team Lead, Elasticsearch: Elasticsearch past, present, future

  12. Elasticsearch 5.0, released 26 October 2016

  13. Elasticsearch 5.0 • Better at Numbers • Safe • Simple Things Should Be Simple

  14. Great for Metrics • Faster to index • Faster to search • Smaller on disk • Less heap • IPv6

  15. Keep Calm and Index On • Bootstrap checks • Fully sandboxed scripting (Painless) • Strict settings • Soft limits • All-new circuit breakers

  16. ‘Time-series’, not ‘time consuming’ • Ingest node • Rollover API • Shrink API (both sketched below)

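     A rough sketch of how the Rollover and Shrink APIs serve a time-series workflow. The alias and index names below are made up for illustration; the alias logs_write is assumed to point at the current write index:

        POST logs_write/_rollover
        {
          "conditions": {
            "max_age": "7d",
            "max_docs": 100000000
          }
        }

     Once an index is no longer being written to, it can be shrunk to fewer shards (after being made read-only and relocated to a single node):

        POST logs-000001/_shrink/logs-000001-shrunk
        {
          "settings": {
            "index.number_of_shards": 1
          }
        }
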
  17. Elasticsearch 5.x: feature rich

  18. Elasticsearch 5.x: still feature rich • Keyword normalization • Unified highlighter • Field collapsing • Multi-word synonyms + proximity • Cancellable searches • Parallel scroll & reindex

  19. Elasticsearch 5.x: still feature rich, for logs, for numbers, for geo, and more • Numeric & date range fields • Automatic optimizations for range searches • Massive aggregations with partitioning • Faster geo-distance sorting • Faster geo-IP lookups

  20. Where to next?

  21. What are the pain points?

  22-23. What are the pain points? • Ever increasing scale • Major version upgrades • Slow recovery • Sparse data and disk usage (photo © Marie-Lan Nguyen, Wikimedia Commons / CC-BY 2.5)

  24. Ever increasing scale • More clusters, not bigger clusters • Easier to manage • Easier to upgrade • Reduce potential outages • Need to query across clusters

  25. Tribe Node: yesterday’s solution

  26. How the Tribe Node Works: two separate clusters, Sales and R&D, each with its own master nodes and data nodes

  27. A tribe node is added, configured statically with the clusters it should join:

        tribe:
          t1:
            cluster.name: sales
          t2:
            cluster.name: r_and_d

  28. The tribe node joins the Sales cluster as a client node (t1)

  29. It also joins the R&D cluster as a second client node (t2)

  30-31. Each cluster streams its cluster state to the tribe node

  32. The tribe node merges the two cluster states into a single merged cluster state

  33. Kibana connects to the tribe node and works against the merged cluster state

  34. Problems with how the tribe node works:

  35. Static configuration: the clusters to join are fixed in the tribe node’s configuration

  36. Connections to all nodes: the tribe node must hold connections to every node in every cluster

  37. Frequent cluster state updates: every change in any cluster has to be re-merged

  38. Index names must be unique across all the clusters

  39. No master node: the tribe node has no elected master of its own, so it cannot create indices (a problem for Kibana)

  40. Reduce results from many shards: results from all shards across all clusters are reduced in a single step

  41. Tribe is going away

  42. Welcome to Cross-Cluster Search

  43. Cross-Cluster Search • Minimum viable solution to supersede tribe • Reduces the problem domain to query execution • Cluster information is reduced to a namespace

  44. How Cross-Cluster Search Works: two separate clusters, Sales and R&D, each with its own master nodes and data nodes

  45. Any node can perform cross-cluster search

  46. Optionally, a dedicated cross-cluster search cluster (with its own master/data nodes) can sit in front of the others

  47. Remote clusters are registered through dynamic settings, with no restart needed:

        PUT _cluster/settings
        {
          "transient": {
            "search.remote": {
              "sales.seeds": "10.0.0.1:9300",
              "r_and_d.seeds": "10.1.0.1:9300"
            }
          }
        }

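     One way to confirm the remotes were registered (assuming the _remote/info API, available since 5.4):

        GET _remote/info
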
  48. No cluster state updates: unlike tribe, remote cluster states are not merged

  49. Kibana connects to the cross-cluster search cluster

  50. Because that cluster has its own master nodes, it can create indices (e.g. for Kibana)

  51. (diagram repeated: Kibana searching through the cross-cluster search cluster)

  52. Few lightweight connections: only a handful of lightweight connections are held to each remote cluster

  53. Index namespacing: remote indices are addressed as cluster:index

        GET sales:*,r_and_d:logs*/_search
        {
          "query": { … }
        }

  54. With many shards, results are combined in a batched reduce phase

  55. What are the pain points? • Ever increasing scale • Major version upgrades • Slow recovery • Sparse data and disk usage (photo © Marie-Lan Nguyen, Wikimedia Commons / CC-BY 2.5)

  56. Major version upgrades • Upgrade Lucene • Add new features • Streamline existing features • Breaking changes • Remove backwards compatibility cruft • Keep codebase maintainable (photo © Famartin, Wikimedia Commons / CC-BY 2.5)

  57. Major version upgrade pain • Too many changes at once • Full cluster restart • Upgrade Java client at the same time as the Elasticsearch cluster • Data from major_version - 2 no longer readable

  58. Too many changes at once • Most features backported to 5.x • Deprecation logging • Migration assistance API (X-Pack)

  59. Full Cluster Restart (photo © Paul Cross / CC-BY 2.5)

  60. Rolling upgrades • Upgrade from 5.latest to 6.latest without a full cluster restart • 5.latest is the latest GA release of 5.x when 6.0.0 goes GA • All 6.x releases will allow upgrading from that 5.x release, unless there is a newer 5.x release to upgrade from

  61. Rolling upgrade caveats • If using security, TLS must be enabled • We reserve the right to require a full cluster restart in the future, but only if absolutely necessary • All nodes must be upgraded to 5.latest before upgrading to 6.x • Indices created in 2.x still need to be reindexed before upgrading to 6.x

  62. Java client • All other languages use the REST interface • Transport client tied to the Elasticsearch major version • A second entry point into the cluster • Complicates distinguishing between clients and nodes

  63. Java REST client • Released in 5.0 • JSON strings only • Resilient, but not user friendly

  64. Java high-level REST client • Works across major version upgrades • IDE friendly • Similar API to the Transport Client, for easy migration • Based on the low-level REST client • Supports CRUD & search • Currently targeted for release in 5.6 • Depends on elasticsearch-core

  65. Data compatibility • Any index created in 5.x can be upgraded to 6.x • Any index created in 2.x must be reindexed in 5.x or imported with reindex-from-remote (sketched below) • How do you reindex a petabyte of data?

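     A minimal sketch of reindex-from-remote, pulling an index directly from an older cluster into the new one; the host and index names here are made up:

        POST _reindex
        {
          "source": {
            "remote": {
              "host": "http://old-cluster.example.com:9200"
            },
            "index": "logs-2016"
          },
          "dest": {
            "index": "logs-2016"
          }
        }

     The remote host also needs to be whitelisted via reindex.remote.whitelist in elasticsearch.yml.
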
  66. Cross Major Version Search: a v5.2.0 cluster with Kibana in front

  67. A new v6.0.0 cluster is brought up alongside the v5.2.0 cluster

  68. The old cluster is upgraded to v5.latest, and Kibana moves over to the v6.0.0 cluster

  69. A cross-cluster client in the v6.0.0 cluster searches the v5.latest cluster, so data can be searched across the major version boundary

  70. What are the pain points? • Ever increasing scale • Major version upgrades • Slow recovery • Sparse data and disk usage (photo © Marie-Lan Nguyen, Wikimedia Commons / CC-BY 2.5)

  71. How is data stored? Each shard has three structures: an in-memory buffer, a transaction log, and Lucene segments

  72-74. Documents 1, 2, and 3 are added to the in-memory buffer and appended to the transaction log

  75-76. REFRESH: the in-memory buffer is written to a new Lucene segment (documents 1-3 become searchable) and the buffer is cleared; the transaction log still holds operations 1-3

  77-79. Documents 4-7 arrive in the buffer and the transaction log; another REFRESH turns them into a second segment

  80-81. Documents 8 and 9 follow the same path, and a FLUSH is triggered

  82-84. FLUSH: the segments are committed to disk and the transaction log is cleared

  85-87. MERGE: smaller segments are merged into larger ones in the background

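     Refresh and flush normally happen automatically, but both can be triggered by hand to observe the sequence above (the index name is made up):

        POST logs/_refresh

        POST logs/_flush
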
  88-94. Data replication: the client sends each document to the primary shard, which indexes it and forwards the operation to the replica shard, so both copies end up with the same documents

  95. Primary and replica contain the same documents, but their Lucene segments differ: each copy refreshes and merges independently, so the documents land in different segment layouts

  96-98. Replica recovery: recovery copies segment files from the primary to the replica, even when the replica already held the same documents in its own segments

  99-102. Data at rest: a SYNCED FLUSH marks primary and replica with a shared sync id, so an unchanged shard can recover without copying any segment files (see below)

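     A synced flush runs automatically on idle shards, and can also be requested explicitly before planned maintenance (the index name is made up):

        POST logs/_flush/synced
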
  103-105. Active indexing: new documents (10 and 11) keep arriving, which invalidates the sync id

  106-108. The segments on primary and replica continue to diverge, so recovery has to fall back to copying segment files again

  109-111. Sequence numbers: each operation is assigned a sequence number and recorded in the transaction log on both the primary and the replica

  112-113. If the replica falls behind (the primary’s transaction log holds operations 1-5 while the replica only has 1-3), the primary can replay just the missing operations from its transaction log

  114. After more indexing, the primary holds operations 1-9 while the replica has 1-5, 7, and 8

  115-116. Trimming the transaction log: operations that are safely on all copies can be trimmed away, so each transaction log keeps only a recent tail (here the primary keeps 5-9, the replica 5, 7, and 8)

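     From 6.0 the sequence number machinery is visible in index responses; a single index request might return something roughly like this (names are made up, response abbreviated):

        PUT logs/doc/1
        {
          "message": "hello"
        }

        {
          "_index": "logs",
          "_type": "doc",
          "_id": "1",
          "_version": 1,
          "_seq_no": 0,
          "_primary_term": 1
        }
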
  117. Slow recovery • 6.0 brings fast replica recovery and a configurable transaction log retention period (see the sketch below) • This lays the groundwork for replica syncing after primary failure and for cross-data-centre recovery

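     In 6.0 the transaction log retention is controlled by index settings along these lines (the values shown are illustrative):

        PUT logs/_settings
        {
          "index.translog.retention.size": "512mb",
          "index.translog.retention.age": "12h"
        }
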
  118. What are the pain points? • Ever increasing scale • Major version upgrades • Slow recovery • Sparse data and disk usage (photo © Marie-Lan Nguyen, Wikimedia Commons / CC-BY 2.5)

  119. Sparse data and disk usage • Doc values: a columnar store • Fast access to a field’s value for many documents • Used for aggregations, sorting, scripting, and some queries • Written to disk at index time • Cached in the file-system cache (photo © Tony Weman / CC-BY 2.5)

  120. Doc values, dense data: two segments with the same fields

        Segment 1                     Segment 2
        Docs  Field 1  Field 2        Docs  Field 1  Field 2
        1     One      A              1     Four     D
        2     Two      B
        3     Three    C

  121. After merging, the dense columns line up with no waste:

        Merged Segment 3
        Docs  Field 1  Field 2
        1     One      A
        2     Two      B
        3     Three    C
        4     Four     D

  122. Doc values, sparse data: Segment 1 is unchanged, but Segment 2 has entirely different fields

        Segment 2
        Docs  Field 3  Field 4  Field 5
        1     Foo      Null     Null
        2     Null     Bar      Null
        3     Null     Null     Baz

  123. Today, the merged segment stores a Null for every missing value:

        Merged Segment 3
        Docs  Field 1  Field 2  Field 3  Field 4  Field 5
        1     One      A        Null     Null     Null
        2     Two      B        Null     Null     Null
        3     Three    C        Null     Null     Null
        4     Null     Null     Foo      Null     Null
        5     Null     Null     Null     Bar      Null
        6     Null     Null     Null     Null     Baz

  124. With sparse doc value support, the Nulls are simply not stored:

        Merged Segment 3
        Docs  Field 1  Field 2  Field 3  Field 4  Field 5
        1     One      A
        2     Two      B
        3     Three    C
        4                       Foo
        5                                Bar
        6                                         Baz

  125. Sparse doc value support • Coming in 6.0 • Big disk savings for sparse values: pay for what you use • Big file cache savings: more data can be cached • Dense queries still more efficient than sparse (photo © Tony Weman / CC-BY 2.5)

  126. Elasticsearch 6.0

  127. Elasticsearch 6.0: coming soon to a cluster near you