Elasticsearch

Elasticsearch: Search and Analyze Data

NoSQL: Text Search and Document

Distributed

Distributed Data Nodes

Distributed Data Nodes Shard

Distributed Master Nodes

Distributed Client Nodes

Master Node

Responsibilities
• Shard allocation
• Act on nodes joining and leaving
• Broadcast cluster state to every node

Shard allocation is based on:
• Balancing shards across all data nodes
• Disk availability on data nodes
• Fault tolerance
• Physical configuration (VM, rack, availability zone)
• Filters for node decommissioning
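The balancing and fault-tolerance criteria above can be illustrated with a toy allocator. This is only a sketch under simplifying assumptions, not Elasticsearch's real allocation algorithm: the function name `allocate` and the node names are hypothetical, disk space and rack awareness are ignored, and an error is raised where a real cluster would simply leave the extra replicas unassigned.

```python
def allocate(num_shards, num_replicas, nodes):
    """Greedy toy allocation: spread copies evenly, never two copies of a shard on one node."""
    if num_replicas + 1 > len(nodes):
        # Simplification: a real cluster would leave the extra replicas unassigned.
        raise ValueError("not enough nodes to separate every copy of a shard")
    assignment = {node: [] for node in nodes}
    for shard in range(1, num_shards + 1):
        used = set()  # nodes that already hold a copy of this shard
        for copy in range(num_replicas + 1):
            role = "P" if copy == 0 else "R"
            candidates = [n for n in nodes if n not in used]
            # Balancing: pick the least loaded node among the allowed ones.
            target = min(candidates, key=lambda n: len(assignment[n]))
            assignment[target].append(f"{role}{shard}")
            used.add(target)
    return assignment


layout = allocate(num_shards=3, num_replicas=1, nodes=["node-1", "node-2", "node-3"])
for node, copies in layout.items():
    print(node, copies)
```

With three shards, one replica each and three nodes, every node ends up with two shard copies, and losing any single node still leaves one copy of every shard.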

Discovery
• Master election
• Discovering nodes

Zen Discovery
1. Uses unicast
2. Tries to contact the hosts listed in the configuration
3. Retrieves the cluster state
4. Sends a join request to the master

Cloud Supported Discovery
• Azure
• AWS
• Google Cloud Computing

Master Election
• Any non-client node can be configured to be master-eligible
• A minimum number of master-eligible nodes is required before an election, to prevent two masters
• By default, only master-eligible and data nodes participate
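The "minimum master-eligible nodes" rule is a majority quorum. A minimal sketch, assuming the usual recommendation of (master-eligible nodes / 2) + 1, which Zen Discovery exposes as the `discovery.zen.minimum_master_nodes` setting; the helper names below are hypothetical.

```python
def minimum_master_nodes(master_eligible: int) -> int:
    """Smallest strict majority of the master-eligible nodes."""
    if master_eligible < 1:
        raise ValueError("need at least one master-eligible node")
    return master_eligible // 2 + 1


def can_elect(visible: int, master_eligible: int) -> bool:
    """A group of nodes may elect a master only if it sees a majority."""
    return visible >= minimum_master_nodes(master_eligible)


# A 5-node cluster split by a network partition into groups of 3 and 2:
print(can_elect(3, 5))  # True: the larger side keeps or elects a master
print(can_elect(2, 5))  # False: the smaller side cannot elect a second master
```

Because two disjoint groups can never both hold a majority, at most one side of a partition can elect a master.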

Failure detection
• Master pings all nodes periodically
• All nodes ping master periodically
• Only master can update cluster state

Data Node

Distributed
[Figure: primary shards (P1, P2, P3) and replica shards (R1, R2, R3) of Index 1 and Index 2 spread across the data nodes]

Inside a shard: Segments

Inside a segment: Inverted index
[Table: the terms brown, fox and quick marked against Doc 1 to Doc 4]
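An inverted index maps each term to the set of documents that contain it. The sketch below is a deliberately simplified illustration: the four documents are made up (the slide's exact term-to-document marks are not recoverable), and tokenization is a plain whitespace split rather than a real analyzer.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map every term to the set of document IDs containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return dict(index)


def search_all_terms(index, *terms):
    """Return the IDs of documents that contain every given term."""
    postings = [index.get(term, set()) for term in terms]
    return set.intersection(*postings) if postings else set()


# Hypothetical documents, chosen only for illustration.
docs = {
    1: "quick brown fox",
    2: "brown fox",
    3: "quick fox",
    4: "brown dog",
}
index = build_inverted_index(docs)
print(sorted(index["brown"]))                          # [1, 2, 4]
print(sorted(search_all_terms(index, "quick", "fox")))  # [1, 3]
```

A lookup touches only the posting sets of the queried terms, which is why search does not need to scan the documents themselves.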

Inside a segment: Inverted index
[Table: the same index with Doc 5 being written while the index is being read]

Lock would kill performance!

Immutable Segment
• Each segment holds a partial inverted index
• Immutability makes it cacheable
• Filters on top of a segment become immutable and cacheable

How do I update my index?

Indexing document
[Figure: segments S1, S2, S3 and the commit point on disk; S1, S2, S3 and the in-memory buffer in memory; translog]

Refresh (every second by default)
[Figure: the in-memory buffer becomes a new segment S4 in memory; disk and commit point still hold S1, S2, S3; translog]

Flush (every 5 secs)
[Figure: S4 is fsynced to disk next to S1, S2, S3; commit point; in-memory buffer; translog]

Merge (background)
[Figure: segments S1 to S4 and a new merged segment S5; commit point; S1, S2, S3 on disk]

Merge flushed to disk
[Figure: only S1 and S5 remain, in the commit point and on disk]
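The index, refresh, flush and merge steps can be tied together with a toy model. This is a sketch under heavy assumptions, not how Lucene or Elasticsearch is implemented: the class `ToyShard` and its fields are hypothetical, segments are plain tuples of documents, "disk" is just a counter of committed segments, and deletes are left out.

```python
class ToyShard:
    """Toy model of a shard's write path with immutable segments."""

    def __init__(self):
        self.buffer = []     # in-memory buffer: indexed but not yet searchable
        self.translog = []   # operations since the last commit point
        self.segments = []   # immutable, searchable segments (tuples)
        self.committed = 0   # number of segments listed in the commit point

    def index(self, doc):
        # A new document goes to the buffer and to the translog.
        self.buffer.append(doc)
        self.translog.append(doc)

    def refresh(self):
        # The buffer becomes a new immutable segment, now visible to search.
        if self.buffer:
            self.segments.append(tuple(self.buffer))
            self.buffer = []

    def flush(self):
        # Write a new commit point covering all segments, then clear the translog.
        self.refresh()
        self.committed = len(self.segments)
        self.translog = []

    def merge(self):
        # Simplification: merge every segment into one and commit the result.
        merged = tuple(doc for segment in self.segments for doc in segment)
        self.segments = [merged] if merged else []
        self.committed = len(self.segments)

    def search(self, term):
        # Only segments are searched; the buffer is invisible until a refresh.
        return [doc for segment in self.segments for doc in segment
                if term in doc.split()]


shard = ToyShard()
shard.index("quick brown fox")
print(shard.search("fox"))   # [] because nothing has been refreshed yet
shard.refresh()
print(shard.search("fox"))   # ['quick brown fox']
```

Existing segments are never modified in this model: new documents only ever add a segment, and a merge replaces several segments with one new one.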

Search in segment
[Figure: a search runs against segments S5 and S1 in two steps (1, 2); label "Diff"]

Indexing in the cluster

Indexing
[Figure: an indexing request travels in three steps (1, 2, 3) across the shard copies P1, P2, R1, R2]
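How a document reaches the right shard can be sketched as follows. Elasticsearch picks the primary shard by hashing the routing value (the document ID by default) modulo the number of primary shards. The code is an illustration under assumptions: `zlib.crc32` stands in for the hash function Elasticsearch actually uses, the cluster is a plain list of dictionaries, and `route_to_shard` and `index_document` are hypothetical names.

```python
import zlib


def route_to_shard(routing: str, number_of_primary_shards: int) -> int:
    """Pick a primary shard: hash(routing) modulo the number of primaries."""
    # crc32 is only a stand-in hash for this sketch.
    return zlib.crc32(routing.encode("utf-8")) % number_of_primary_shards


def index_document(cluster, doc_id, doc):
    """Write to the primary first, then copy the document to its replicas."""
    shard_no = route_to_shard(doc_id, len(cluster))
    copies = cluster[shard_no]
    copies["primary"][doc_id] = doc
    for replica in copies["replicas"]:
        replica[doc_id] = doc
    return shard_no


# Two primary shards, one replica each (P1/R1 and P2/R2).
cluster = [
    {"primary": {}, "replicas": [{}]},
    {"primary": {}, "replicas": [{}]},
]
shard_no = index_document(cluster, "doc-1", {"title": "quick brown fox"})
print("doc-1 stored on shard", shard_no)
```

Since every write goes through exactly one primary, adding primary shards spreads the write load over more nodes. The modulo also shows why the number of primary shards cannot be changed freely afterwards: a different divisor would route existing IDs to different shards.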

Writes scale by adding shards

Searching in the cluster

Search: Query Phase
[Figure: a search request travels in three steps (1, 2, 3) across the shard copies P1, P2, R1, R2; in step 3 the shards return IDs and scores]

Search: Fetch Phase
[Figure: after steps 1 to 3, the matching documents are retrieved with a multi get]
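The two phases can be sketched as a scatter and gather. In the query phase each shard returns only IDs and scores for its best hits; the receiving node merges them and, in the fetch phase, retrieves the full documents for the global winners. This is a simplified illustration with assumptions: shards are plain dictionaries, the score is a naive term count, and `query_phase` and `search` are hypothetical names.

```python
import heapq


def query_phase(shard, term, size):
    """Per shard: return the top `size` hits as (score, doc_id), no documents."""
    hits = []
    for doc_id, text in shard.items():
        score = text.split().count(term)  # toy score: term frequency
        if score > 0:
            hits.append((score, doc_id))
    return heapq.nlargest(size, hits)


def search(shards, term, size):
    """Coordinate a query-then-fetch search over all shards."""
    # Query phase: gather (score, doc_id) from one copy of every shard.
    candidates = []
    for shard_no, shard in enumerate(shards):
        for score, doc_id in query_phase(shard, term, size):
            candidates.append((score, doc_id, shard_no))
    top = heapq.nlargest(size, candidates)
    # Fetch phase: retrieve full documents only for the global top hits.
    return [(doc_id, score, shards[shard_no][doc_id])
            for score, doc_id, shard_no in top]


shards = [
    {"a": "fox fox", "b": "dog"},
    {"c": "fox", "d": "fox fox fox"},
]
for doc_id, score, text in search(shards, "fox", size=2):
    print(doc_id, score, text)
```

Any copy of a shard, primary or replica, can answer the query phase, so adding replicas adds search capacity.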

Search scales by adding replicas

Thank you!
