Slide 1

Slide 1 text

elasticsearch data/

Copyright Elasticsearch 2014. Copying, publishing and/or distributing without written permission is strictly prohibited

Slide 2

Slide 2 text

About Me
• Igor Motov
• Developer at Elasticsearch Inc.
• Github: imotov
• Twitter: @imotov

Slide 3

Slide 3 text

About Elasticsearch Inc.
• Founded in 2012
  – By the people behind Elasticsearch and Apache Lucene
  – http://www.elasticsearch.com
  – Headquarters: Amsterdam and Los Altos, CA
• We provide
  – Training (public & onsite)
  – Development support
  – Production support subscription (SLA)

Slide 4

Slide 4 text

file descriptors

“Make sure to increase the number of open files descriptors on the machine (or for the user running elasticsearch). Setting it to 32k or even 64k is recommended.”

Source: setup and configuration guide
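A quick way to see what limit a process actually gets is to ask the operating system. The following is a minimal Python sketch, assuming a Unix-like system where the standard `resource` module is available. It reports the limit of the Python process itself, not of a running elasticsearch node, and the function names are illustrative only.

```python
import resource

# "32k" taken from the recommendation quoted on this slide.
RECOMMENDED_MIN = 32 * 1024


def open_file_limits():
    """Return the (soft, hard) open-file-descriptor limits of this process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def meets_recommendation(soft_limit, minimum=RECOMMENDED_MIN):
    """True if the soft limit is unlimited or at least the given minimum."""
    return soft_limit == resource.RLIM_INFINITY or soft_limit >= minimum


soft, hard = open_file_limits()
print("soft limit:", soft, "hard limit:", hard)
print("meets the 32k recommendation:", meets_recommendation(soft))
```

To check the limit that applies to elasticsearch, the same call would have to run as the user (and in the same environment) that starts the node.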

Slide 5

Slide 5 text

where do all these file descriptors go?

Slide 6

Slide 6 text

files, data structures and their usage

Slide 7

Slide 7 text

main concepts
• node: a running elasticsearch instance (typically a JVM process)
• cluster: a group of nodes sharing the same set of indices
• index: a set of documents of possibly different types stored in one or more shards
• shard: a lucene index, allocated on one of the nodes

Slide 8

Slide 8 text

Index shards
[Diagram: a document is routed to one of five shards (shard 0 to shard 4) by hash(_id) % 5, which yields 0, 1, 2, 3 or 4.]
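The routing rule on this slide can be sketched in a few lines of Python. This is an illustration only: `zlib.crc32` is a stand-in chosen because it is deterministic, it is not the hash function Elasticsearch uses, and `shard_for` is a hypothetical helper name.

```python
import zlib


def shard_for(doc_id, number_of_shards=5):
    """Pick a shard number for a document id using hash(_id) % number_of_shards."""
    # Stand-in hash (an assumption): any stable hash demonstrates the idea.
    hashed = zlib.crc32(doc_id.encode("utf-8"))
    return hashed % number_of_shards


for doc_id in ("1", "2", "abc", "user-42"):
    print(doc_id, "-> shard", shard_for(doc_id))
```

Because the shard is derived from the id and the shard count, the same id always lands on the same shard as long as the number of shards does not change.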

Slide 9

Slide 9 text

Index shards
[Diagram: the shards of the index (Shard 0 to Shard 4) distributed across Node 1 and Node 2.]

Slide 10

Slide 10 text

master node
• elected when nodes form a cluster
• coordinates work of other nodes through cluster state
• the only node that can update cluster state
• publishes cluster state to other nodes

Slide 11

Slide 11 text

cluster state
• nodes: list of nodes in the cluster, their addresses, attributes and master
• index metadata: settings, mappings and aliases
• shard routing table: where the shards can be found
• index templates
• cluster settings: persistent and transient

Slide 12

Slide 12 text

cluster state - persistent
• nodes: list of nodes in the cluster, their addresses, attributes and master
• index metadata: settings, mappings and aliases
• shard routing table: where the shards can be found
• index templates
• cluster settings: persistent and transient

Slide 13

Slide 13 text

data
• node level: persistent cluster settings, templates
• index level: aliases, index settings, mappings
• shard level: shard metadata, lucene index, transaction log

Slide 14

Slide 14 text

data directory
• “data” directory in elasticsearch home by default
• path.data in config/elasticsearch.yml
• --path.data=… on command line
• handled by deb and rpm packages

Slide 15

Slide 15 text

multiple nodes per data dir
• //nodes/NNN where NNN = 0, 1, 2, ...
• node.max_local_storage_nodes (default 50)

Slide 16

Slide 16 text

let’s take a look

Slide 17

Slide 17 text

summary
/
  nodes/
    /
      _state/ - cluster state
      node.lock - lock
      indices/
        /
          _state/ - index metadata
          0/
            _state/ - shard metadata
            index/ - index data
            translog/ - transaction log data
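To get a feel for where the files in such a tree live, the following Python sketch walks a data directory and counts files by the kind of directory they sit in. It relies only on the directory names listed on this slide (`_state`, `index`, `translog`); the function name and the "other" bucket are assumptions made for illustration.

```python
import os
from collections import Counter

# Directory names taken from the summary slide.
KNOWN_DIRS = ("_state", "index", "translog")


def count_files_by_kind(data_root):
    """Count files under data_root, grouped by the innermost known directory."""
    counts = Counter()
    for dirpath, _dirnames, filenames in os.walk(data_root):
        parts = os.path.relpath(dirpath, data_root).split(os.sep)
        # Use the innermost known directory name; anything else is "other".
        kind = next((p for p in reversed(parts) if p in KNOWN_DIRS), "other")
        counts[kind] += len(filenames)
    return counts
```

Calling `count_files_by_kind` on a data directory returns a `Counter` such as one keyed by `"index"`, `"translog"`, `"_state"` and `"other"`; the `"index"` bucket is typically the largest, since that is where the lucene segment files are kept.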

Slide 18

Slide 18 text

transaction log
[Diagram: a shard containing a lucene index, a transaction log and a lucene buffer.]

Slide 19

Slide 19 text

transaction log
• transaction log
  – stores every operation (create/update/delete)
  – fsync-ed every 5 sec (configurable)
  – replayed on node restart
• lucene segments
  – fsync-ed when transaction log is full (every 30 min, 200mb or 500 operations)
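The "transaction log is full" condition can be read as three independent thresholds, any one of which triggers a flush. The Python sketch below encodes that reading with the numbers from this slide as defaults. The class and function names are hypothetical and do not correspond to Elasticsearch setting names.

```python
from dataclasses import dataclass


@dataclass
class FlushThresholds:
    """Thresholds as stated on the slide (assumed to be combined with OR)."""
    max_age_seconds: float = 30 * 60            # every 30 min
    max_size_bytes: int = 200 * 1024 * 1024     # 200mb
    max_operations: int = 500                   # 500 operations


def should_flush(age_seconds, size_bytes, operations, thresholds=None):
    """Return True when any one of the three thresholds has been reached."""
    t = thresholds or FlushThresholds()
    return (
        age_seconds >= t.max_age_seconds
        or size_bytes >= t.max_size_bytes
        or operations >= t.max_operations
    )


print(should_flush(age_seconds=60, size_bytes=1024, operations=10))
print(should_flush(age_seconds=60, size_bytes=1024, operations=500))
```

Until one of the thresholds is hit, recent operations are protected only by the transaction log, which is why the log is replayed on node restart.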

Slide 20

Slide 20 text

lucene index
• inverted index
• stored fields
• doc values
• …

Slide 21

Slide 21 text

inverted index
• Document 1:
  { “text”: “Elasticsearch is an open source, distributed search engine.”, “date”: “2014-07-01” }
• Document 2:
  { “text”: “Elasticsearch is a search server based on Lucene.”, “date”: “2014-07-02” }

Slide 22

Slide 22 text

analysis
• “Elasticsearch is an open source, distributed search engine.” could be translated into tokens:
  – elasticsearch
  – open
  – source
  – distributed
  – search
  – engine
• “Elasticsearch is a search server based on Lucene.” could be translated into tokens:
  – elasticsearch
  – search
  – server
  – based
  – lucene
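One way to arrive at these token lists is to lowercase the text, split on non-alphanumeric characters and drop a few stop words. The Python sketch below does exactly that. The stop-word list is an assumption, chosen as the smallest set that reproduces the tokens on this slide; it is not the list used by any particular Elasticsearch analyzer.

```python
import re

# Assumed stop words: just enough to reproduce the tokens shown on the slide.
STOP_WORDS = {"a", "an", "is", "on"}


def analyze(text):
    """Lowercase, split into alphanumeric tokens, and remove stop words."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [token for token in tokens if token not in STOP_WORDS]


print(analyze("Elasticsearch is an open source, distributed search engine."))
print(analyze("Elasticsearch is a search server based on Lucene."))
```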

Slide 23

Slide 23 text

inverted index - field text

token          document frequency  postings (document ids)
based          1                   2
distributed    1                   1
elasticsearch  2                   1, 2
engine         1                   1
lucene         1                   2
open           1                   1
search         2                   1, 2
server         1                   2
source         1                   1
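The table for the text field can be rebuilt from the token lists of the two documents. The following Python sketch is a toy in-memory version, assuming the documents are already analyzed; it says nothing about how Lucene encodes postings on disk.

```python
from collections import defaultdict

# Tokens per document id, as produced by the analysis step.
DOCS = {
    1: ["elasticsearch", "open", "source", "distributed", "search", "engine"],
    2: ["elasticsearch", "search", "server", "based", "lucene"],
}


def build_inverted_index(docs):
    """Map each token to the sorted list of document ids that contain it."""
    postings = defaultdict(list)
    for doc_id in sorted(docs):
        for token in set(docs[doc_id]):
            postings[token].append(doc_id)
    return dict(postings)


index = build_inverted_index(DOCS)
for token in sorted(index):
    # Document frequency is simply the length of the postings list.
    print(token, len(index[token]), index[token])
```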

Slide 24

Slide 24 text

inverted index - field date

token       document frequency  postings (document ids)
2014-07-01  1                   1
2014-07-02  1                   2

Slide 25

Slide 25 text

inverted index
• tokens->documents
• easy to build
• difficult to update
• segmented
• segments are merged periodically
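Because an inverted index is easy to build but difficult to update, new documents go into new segments, and segments are later combined. The Python sketch below shows the idea of merging two tiny segments. It is a simplification under stated assumptions: each segment numbers its documents from 0, the second segment's ids are shifted by the size of the first, and deleted documents are ignored. `merge_segments` is a hypothetical helper, not a Lucene API.

```python
def merge_segments(seg_a, seg_b, doc_count_a):
    """Merge two token -> doc id postings maps into one new segment."""
    merged = {token: list(ids) for token, ids in seg_a.items()}
    for token, ids in seg_b.items():
        # Renumber documents of the second segment so ids stay unique.
        shifted = [doc_id + doc_count_a for doc_id in ids]
        merged.setdefault(token, []).extend(shifted)
    return merged


segment_a = {"elasticsearch": [0], "open": [0]}
segment_b = {"elasticsearch": [0], "lucene": [0]}
print(merge_segments(segment_a, segment_b, doc_count_a=1))
```

The inputs are left untouched and a new structure is produced, which mirrors the point on the slide: segments are rebuilt by merging rather than updated in place.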

Slide 26

Slide 26 text

field data
• “uninverted” inverted index
• documents->tokens
• can be built from inverted index on demand
• can be stored with index as doc values
• segmented
• used by sorting, aggregations, scripts, etc

Slide 27

Slide 27 text

field data - text

document  tokens
1         distributed, elasticsearch, engine, open, search, source
2         based, elasticsearch, lucene, search, server
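Building field data from the inverted index amounts to flipping the mapping from tokens->documents to documents->tokens. The Python sketch below does this for the text field, using the postings from the earlier inverted index slide as input. It is an in-memory illustration only and does not reflect the actual field data or doc values formats.

```python
# Postings for the field "text", copied from the inverted index slide.
INVERTED = {
    "based": [2],
    "distributed": [1],
    "elasticsearch": [1, 2],
    "engine": [1],
    "lucene": [2],
    "open": [1],
    "search": [1, 2],
    "server": [2],
    "source": [1],
}


def uninvert(inverted):
    """Turn a token -> doc ids map into a doc id -> sorted tokens map."""
    field_data = {}
    for token in sorted(inverted):
        for doc_id in inverted[token]:
            field_data.setdefault(doc_id, []).append(token)
    return field_data


for doc_id, tokens in sorted(uninvert(INVERTED).items()):
    print(doc_id, ", ".join(tokens))
```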

Slide 28

Slide 28 text

field data - date

document  tokens
1         2014-07-01
2         2014-07-02

Slide 29

Slide 29 text

stored fields
• _source - JSON source of the entire document
• _parent id
• routing
• ttl
• _uid
• any other field marked as “stored”

Slide 30

Slide 30 text

all together now
• searching for terms “distributed” and “service”
• sorting by the field “date”

Slide 31

Slide 31 text

QUERY phase - node level
[Diagram: Node 1 and Node 2 holding Shard 0 to Shard 4, with the Search Action and the Cluster State.]
• using cluster state all relevant shards are identified
• requesting node sends QUERY requests to these shards

Slide 32

Slide 32 text

QUERY phase - shard level
[Diagram: a Shard Engine with Segment 1, Segment 2, Segment 3, Segment 4, …, Segment N.]
• each shard searches all segments in the shard one after another

Slide 33

Slide 33 text

QUERY phase - inverted index

token          document frequency  postings (document ids)
based          1                   2
distributed    1                   1
elasticsearch  2                   1, 2
engine         1                   1
lucene         1                   2
open           1                   1
search         2                   1, 2
server         1                   2
source         1                   1

Slide 34

Slide 34 text

QUERY phase - field data

document  tokens
1         2014-07-01
2         2014-07-02

Slide 35

Slide 35 text

QUERY phase - shard level
[Diagram: a Shard Engine with Segment 1, Segment 2, Segment 3, Segment 4, …, Segment N, producing the entries below.]
  seg1, 2, [2014-07-02]
  seg1, 1, [2014-07-01]
  …….
• all segments are searched and top 10 documents are collected for each shard
• for each document the internal Lucene id and sort key are stored
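Collecting the top documents of a shard across its segments can be sketched as follows in Python. The entries have the same shape as on this slide: segment, internal document id, sort key. Sorting newest date first is an assumption inferred from the order of the two example entries, and `shard_top_k` is an illustrative name.

```python
import heapq


def shard_top_k(segments, k=10):
    """Return up to k (segment, doc_id, sort_key) entries, largest key first.

    segments maps a segment name to a list of (doc_id, sort_key) pairs for
    the documents in that segment that matched the query.
    """
    candidates = (
        (sort_key, segment, doc_id)
        for segment, hits in segments.items()
        for doc_id, sort_key in hits
    )
    top = heapq.nlargest(k, candidates)
    return [(segment, doc_id, sort_key) for sort_key, segment, doc_id in top]


matches = {"seg1": [(1, "2014-07-01"), (2, "2014-07-02")]}
print(shard_top_k(matches))
```

Only ids and sort keys are kept at this stage; the documents themselves are not loaded until the FETCH phase.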

Slide 36

Slide 36 text

QUERY phase - node level
[Diagram: Node 1 and Node 2 holding Shard 0 to Shard 4, with the Search Action.]
• top 10 ids and sort keys for each shard are sent to requesting node
• requesting node re-sorts them and finds the global top 10
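Since every shard returns its hits already sorted, the requesting node only has to merge a handful of short sorted lists. Below is a minimal Python sketch of that step, assuming descending sort keys and using hypothetical names; the sample dates and ids are made up for illustration.

```python
import heapq
import itertools


def global_top_k(per_shard, k=10):
    """Merge per-shard results into the global top k.

    per_shard maps a shard number to a list of (sort_key, doc_id) pairs,
    each list already sorted from largest to smallest sort key.
    """
    tagged = (
        [(sort_key, shard, doc_id) for sort_key, doc_id in hits]
        for shard, hits in per_shard.items()
    )
    merged = heapq.merge(*tagged, reverse=True)
    return [
        (shard, doc_id, sort_key)
        for sort_key, shard, doc_id in itertools.islice(merged, k)
    ]


results = {
    0: [("2014-07-03", 7), ("2014-07-01", 1)],
    1: [("2014-07-02", 2)],
}
print(global_top_k(results, k=2))
```

Each result keeps its shard number, because the FETCH phase needs to know which shard holds each winning document.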

Slide 37

Slide 37 text

FETCH phase - node level
[Diagram: Node 1 and Node 2 holding Shard 0 to Shard 4, with the Search Action.]
• global top 10 documents are requested
• only shards that have these top 10 documents are contacted
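Contacting only the shards that hold a winning document is a matter of grouping the global top results by shard. A small Python sketch, with hypothetical names and made-up sample entries of the form (shard, doc id, sort key):

```python
def fetch_requests(global_top):
    """Group (shard, doc_id, sort_key) entries into one request per shard."""
    requests = {}
    for shard, doc_id, _sort_key in global_top:
        requests.setdefault(shard, []).append(doc_id)
    return requests


top = [(0, 7, "2014-07-03"), (1, 2, "2014-07-02"), (0, 1, "2014-07-01")]
for shard, doc_ids in fetch_requests(top).items():
    print("FETCH from shard", shard, "documents", doc_ids)
```

Shards that contributed nothing to the global top never appear in the result, so no FETCH request is sent to them.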

Slide 38

Slide 38 text

FETCH phase - shard level
[Diagram: a Shard Engine with Segment 1, Segment 2, Segment 3, Segment 4, …, Segment N.]
• _source (stored field) is retrieved from corresponding segments

Slide 39

Slide 39 text

FETCH phase - node level
[Diagram: Node 1 and Node 2 holding Shard 0 to Shard 4, with the Search Action.]
• requesting node combines all documents and sends them to the client

Slide 40

Slide 40 text

… and this is it

Slide 41

Slide 41 text

questions?