Slide 1

Slide 1 text

Powerful searches across BILLIONS of documents? Your system can provide them in a scalable and resilient way with Elasticsearch. Matheus de Faria Moraes

Slide 2

Slide 2 text

Agenda ▰ What is Elasticsearch; ▰ Use Cases; ▰ Basic Concepts; ▰ Document and Index; ▰ Cluster and Nodes; ▰ Primary Shards and Replica Shards; ▰ Near Real Time (NRT); ▰ Demo.

Slide 3

Slide 3 text

whoami I am Matheus Moraes Developer and Speaker @Sensedia Java, NoSQL and Microservices enthusiast

Slide 4

Slide 4 text

What is Elasticsearch?

Slide 5

Slide 5 text

What is Elasticsearch? ▰ Full-text search and analytics engine; ▰ Highly scalable; ▰ Open-source; ▰ Store, search, and analyze big volumes of data in near real time; ▰ REST APIs; ▰ Good documentation; ▰ Apache Lucene.

Slide 6

Slide 6 text

Use Cases

Slide 7

Slide 7 text

▰ 2 billion documents ▰ 8 million code repositories ▰ 4 million active users ▰ ~300 searches / minute

Slide 8

Slide 8 text

No content

Slide 9

Slide 9 text

Basic Concepts

Slide 10

Slide 10 text

Document and Index

curl -X PUT localhost:9200/cities/_doc/1 \
  -H 'Content-Type: application/json' \
  -d '{ "city": "Tanabi", "state": "SP", "country": "BR", "population": 25000 }'
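
To illustrate how a document is addressed by index and ID, here is a minimal sketch that reads the document back. It assumes the same local cluster at localhost:9200 and the `cities` index used on this slide; the `ES_URL` and `doc_url` variables are illustrative names, not part of the original deck.

```shell
# Assumption: Elasticsearch listens on localhost:9200, as on the slide.
ES_URL="${ES_URL:-localhost:9200}"
doc_url="$ES_URL/cities/_doc/1"

# Fetch document 1 of the "cities" index.
# The JSON that was indexed comes back under the "_source" key.
curl -sS --max-time 5 -X GET "$doc_url?pretty" \
  || echo "request failed: is Elasticsearch running at $ES_URL?" >&2
```

The response also carries metadata such as `_index`, `_id` and `_version` next to `_source`.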

Slide 11

Slide 11 text

CLUSTER NODE 1 ★ NODE 2 NODE 3 Cluster and Nodes
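
A rough way to inspect a cluster like the one in the diagram is the cat nodes API together with the cluster health API. This is a sketch under the assumption of a local cluster at localhost:9200; the variable names are illustrative. The star next to NODE 1 presumably marks the elected master, which `_cat/nodes` reports in its `master` column.

```shell
# Assumption: a node of the cluster is reachable on localhost:9200.
ES_URL="${ES_URL:-localhost:9200}"
nodes_url="$ES_URL/_cat/nodes?v&h=name,node.role,master"
health_url="$ES_URL/_cluster/health?pretty"

# One row per node; the "master" column shows * for the elected master.
curl -sS --max-time 5 "$nodes_url" \
  || echo "request failed: is Elasticsearch running at $ES_URL?" >&2

# Cluster-wide status (green, yellow or red) plus node and shard counts.
curl -sS --max-time 5 "$health_url" \
  || echo "request failed: is Elasticsearch running at $ES_URL?" >&2
```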

Slide 12

Slide 12 text

CLUSTER NODE 1 ★ NODE 2 NODE 3 P0 P1 Primary Shards

Slide 13

Slide 13 text

Primary Shard Benefits ▰ Elasticity ▰ Horizontal Scaling
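
Horizontal scaling works because each document is routed to exactly one primary shard. The sketch below is a simplified illustration of that idea (hash of the routing value, modulo the number of primaries). It is not Elasticsearch's actual implementation: `cksum` stands in for the internal hash function, and `route_to_shard` is a hypothetical helper.

```shell
# Simplified routing idea: shard = hash(routing value) % number of primary shards.
# Assumption: cksum (a CRC) stands in for the hash Elasticsearch uses internally,
# so the shard numbers printed here will not match a real cluster.
route_to_shard() {
  doc_id="$1"
  primaries="$2"
  hash=$(printf '%s' "$doc_id" | cksum | cut -d ' ' -f 1)
  echo $(( hash % primaries ))
}

# Spread a few document IDs over the two primaries (P0 and P1) from the diagram.
for id in 1 2 3 4; do
  echo "doc $id -> P$(route_to_shard "$id" 2)"
done
```

Because the result depends on the number of primaries, that number is fixed when the index is created, while the number of replicas can change later.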

Slide 14

Slide 14 text

CLUSTER NODE 1 ★ NODE 2 NODE 3 P0 R0 R1 R1 R0 P1 Replica Shards 2/2

Slide 15

Slide 15 text

CLUSTER NODE 1 ★ NODE 2 NODE 3 P0 R0 R1 R1 R0 P1 A A A B B B Cluster, Nodes and Shards 2/2

Slide 17

Slide 17 text

Replica Shard Benefits ▰ H.A. ▰ Resilience ▰ Search Throughput
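
Unlike the number of primaries, the replica count can be changed on a live index, which is one way to add availability or search throughput later. A minimal sketch, assuming the `cities` index and a local cluster at localhost:9200; the variable names are illustrative.

```shell
# Assumption: the "cities" index exists on a cluster reachable at localhost:9200.
ES_URL="${ES_URL:-localhost:9200}"
replica_settings='{ "index": { "number_of_replicas": 2 } }'

# Raise the replica count from 1 to 2 without recreating the index.
curl -sS --max-time 5 -X PUT "$ES_URL/cities/_settings" \
  -H 'Content-Type: application/json' \
  -d "$replica_settings" \
  || echo "request failed: is Elasticsearch running at $ES_URL?" >&2
```

On the three-node cluster from the diagrams, two replicas would let every node hold one copy of each shard, since a replica is never allocated on the same node as its primary.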

Slide 18

Slide 18 text

Topologies (primary shards / replicas per primary) ▰ Default since 7.0: 1 / 1 ▰ Old default: 5 / 1 ▰ Search performance: 1 / 10 ▰ Index performance: 20 / 1
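
To make the trade-off concrete, the total number of shard copies the cluster must host is primaries × (1 + replicas per primary). The sketch below computes it for the four topologies, reading each pair as primary shards / replicas per primary; `total_shards` is a hypothetical helper.

```shell
# Total shard copies = primaries * (1 + replicas per primary).
total_shards() {
  echo $(( $1 * (1 + $2) ))
}

echo "Default since 7.0 (1 / 1):   $(total_shards 1 1) shards"
echo "Old default (5 / 1):         $(total_shards 5 1) shards"
echo "Search performance (1 / 10): $(total_shards 1 10) shards"
echo "Index performance (20 / 1):  $(total_shards 20 1) shards"
```

More replicas add copies that can serve searches, while more primaries spread indexing work across more shards.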

Slide 19

Slide 19 text

Index creation with shards

curl -X PUT localhost:9200/cities \
  -H 'Content-Type: application/json' \
  -d '{ "settings": { "number_of_shards": 2, "number_of_replicas": 1 } }'
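
One way to check where the shards of the new index ended up is the cat shards API. This is a sketch assuming the `cities` index was created as on this slide on a local cluster; the variable names are illustrative.

```shell
# Assumption: the "cities" index exists on a cluster reachable at localhost:9200.
ES_URL="${ES_URL:-localhost:9200}"
shards_url="$ES_URL/_cat/shards/cities?v"

# One row per shard copy: the "prirep" column shows p (primary) or r (replica),
# and the "node" column shows where the copy is allocated.
curl -sS --max-time 5 "$shards_url" \
  || echo "request failed: is Elasticsearch running at $ES_URL?" >&2
```

With 2 primaries and 1 replica each, the listing should contain four rows.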

Slide 20

Slide 20 text

Searchable and Persistent Documents

Slide 21

Slide 21 text

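Near Real Time (NRT)

# Index a document and search for it right away.
# The search may not find it until the next refresh makes it searchable.
curl -X PUT localhost:9200/cities/_doc/1 \
  -H 'Content-Type: application/json' \
  -d '{ "city": "Tanabi", "state": "SP", "country": "BR", "population": 25000 }' && \
curl -X GET 'localhost:9200/cities/_search?pretty&q=city:Tanabi'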

Slide 22

Slide 22 text

Search by segment (Lucene) Searchable Commit Point

Slide 23

Slide 23 text

Lucene commits are expensive ▰ fsync ▰ Disk Searchable Commit Point

Slide 24

Slide 24 text

In-memory buffer and Translog

Slide 25

Slide 25 text

1. Documents are indexed In-memory buffer Searchable Commit Point { } { } { } { } { } { } { } { } { } { } { } { } { } { } { } { } Translog

Slide 26

Slide 26 text

2. Refresh In-memory buffer Searchable Commit Point { } { } { } { } { } { } { } { } Translog

Slide 27

Slide 27 text

3. The translog keeps accumulating documents In-memory buffer Searchable Commit Point { } { } { } { } { } { } { } { } { } { } { } { } { } { } { } { } Translog { } { } { } { } { } { } { } { } { }

Slide 28

Slide 28 text

4. Flush (Lucene commit) In-memory buffer Searchable Commit Point Translog
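
Elasticsearch triggers flushes on its own, but the step can also be requested explicitly through the flush API. A minimal sketch, assuming the `cities` index on a local cluster; the variable names are illustrative.

```shell
# Assumption: the "cities" index exists on a cluster reachable at localhost:9200.
ES_URL="${ES_URL:-localhost:9200}"
flush_url="$ES_URL/cities/_flush"

# Ask for a flush: a Lucene commit is performed and a new translog generation starts.
curl -sS --max-time 5 -X POST "$flush_url" \
  || echo "request failed: is Elasticsearch running at $ES_URL?" >&2
```

Manual flushes are rarely needed in practice; the call is shown here only to make step 4 tangible.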

Slide 29

Slide 29 text

Big picture CLUSTER NODE 1 ★ P0 R1 NODE 3 R0 R1 NODE 2 P1 R0 Searchable Commit Point { }{ }{ } In-memory buffer Translog { }{ }{ }{ }{ }{ }

Slide 30

Slide 30 text

Refresh interval

curl -X PUT localhost:9200/cities/_settings \
  -H 'Content-Type: application/json' \
  -d '{ "index" : { "refresh_interval" : "3s" } }'
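
A common use of this setting is to pause periodic refreshes during a large import and restore them afterwards. The sketch below assumes the `cities` index on a local cluster; `refresh_body` and `set_refresh_interval` are hypothetical helpers, and "1s" is used as the value to restore.

```shell
# Assumption: the "cities" index exists on a cluster reachable at localhost:9200.
ES_URL="${ES_URL:-localhost:9200}"

# Build the settings body for a given interval, e.g. "-1" or "1s".
refresh_body() {
  printf '{ "index": { "refresh_interval": "%s" } }' "$1"
}

# Apply the interval to the "cities" index.
set_refresh_interval() {
  curl -sS --max-time 5 -X PUT "$ES_URL/cities/_settings" \
    -H 'Content-Type: application/json' \
    -d "$(refresh_body "$1")" \
    || echo "request failed: is Elasticsearch running at $ES_URL?" >&2
}

set_refresh_interval "-1"   # disable periodic refreshes before a large import
# ... bulk indexing would happen here ...
set_refresh_interval "1s"   # restore a short interval afterwards
```

While the interval is "-1", newly indexed documents stay invisible to searches until a refresh is requested explicitly.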

Slide 31

Slide 31 text

Refresh

?refresh (Index, Update, Delete, and Bulk) ▰ Empty or true ▰ wait_for ▰ false (default)

POST cities/_refresh
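
As a sketch of how the `?refresh` parameter changes the near-real-time behavior, the request below indexes a document with `refresh=wait_for` and only then searches for it. It assumes the `cities` index on a local cluster; the variable names are illustrative.

```shell
# Assumption: Elasticsearch is reachable at localhost:9200.
ES_URL="${ES_URL:-localhost:9200}"
city_doc='{ "city": "Tanabi", "state": "SP", "country": "BR", "population": 25000 }'

# refresh=wait_for: the index request returns only after a refresh
# has made the document visible to search.
curl -sS --max-time 5 -X PUT "$ES_URL/cities/_doc/1?refresh=wait_for" \
  -H 'Content-Type: application/json' \
  -d "$city_doc" \
  && curl -sS --max-time 5 -X GET "$ES_URL/cities/_search?pretty&q=city:Tanabi" \
  || echo "request failed: is Elasticsearch running at $ES_URL?" >&2
```

Compared with `refresh=true`, `wait_for` does not force an extra refresh; it waits for the next scheduled one, so the index request takes longer to return.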

Slide 32

Slide 32 text

Demo matheusfm/elasticsearch-demo

Slide 33

Slide 33 text

Thank you! matheusfm matheusfm mfariam