Slide 1

Slide 1 text

Staying Ahead of Users and Time: two use cases of scaling data with Elasticsearch. Boaz Leskes, @bleskes

Slide 2

Slide 2 text

Basics: indices, types and other animals

Slide 3

Slide 3 text

A document
{
  "created_at": "Fri Jan 24 11:15:24 +0000 2014",
  "id": 426674590560305150,
  "text": "Prepping up for my #elasticsearch talk this afternoon at the UvA : http://t.co/rqhBI5zys0",
  "user": { "name": "Boaz Leskes", "screen_name": "bleskes" }
}

Slide 4

Slide 4 text

A type = docs with similar data/structure
{
  "created_at": "Fri Jan 24 11:15:24 +0000 2014",
  "id": 426674590560305150,
  "text": "Prepping up for my #elasticsearch talk this afternoon at the UvA : http://t.co/rqhBI5zys0",
  "user": { "name": "Boaz Leskes", "screen_name": "bleskes" }
}
{
  "created_at": "Thu Jan 23 18:27:23 +0000 2014",
  "id": 426420915698544640,
  "text": "Elasticsearch es una maravilla !!!!",
  "user": { "name": "Abel Coronado", "screen_name": "abxda" }
}

Slide 5

Slide 5 text

An index = a collection of types
{
  "created_at": "Thu Jan 23 …",
  "id": 426420915698544640,
  "text": "Elasticsearch …",
  "user": { "name": "Abel Coronado", "screen_name": "abxda" }
}
{
  "id": 19726002,
  "name": "Abel Coronado",
  "screen_name": "abxda",
  "location": "Aguascalientes",
  "followers_count": 871,
  "friends_count": 1794,
  "listed_count": 38
}

Slide 6

Slide 6 text

Sharding index

Slide 7

Slide 7 text

Sharding index shard 3 shard 4 shard 1 shard 2

Slide 8

Slide 8 text

Sharding index shard 3 shard 4 shard 1 shard 2 node node

Slide 9

Slide 9 text

Sharding index node node shard 3 shard 1 shard 4 shard 2

Slide 10

Slide 10 text

Sharding index node node shard 3 shard 1 shard 4 shard 2 node node copy 1 copy 4 copy 3 copy 2

Slide 11

Slide 11 text

Sharding node node copy 1 copy 3 node node shard 1 shard 3 copy 4 copy 2 shard 4 shard 2

Slide 12

Slide 12 text

Sharding node node copy 1 copy 3 node node node node node node shard 1 shard 3 copy 4 copy 2 shard 4 shard 2

Slide 13

Slide 13 text

Sharding node node copy 1 copy 3 node node node node node node shard 1 shard 2 shard 3 copy 4 copy 2 shard 4

Slide 14

Slide 14 text

Sharding - multiple indices node node shard 1 copy 2 node node node node node node shard 1 shard 2 shard 2 copy 1 copy 2 copy 1

Slide 15

Slide 15 text

Search node node shard 1 copy 2 node node shard 1 shard 2 shard 2 copy 1 copy 2 copy 1 # curl localhost:9200/index1,index2/_search?q=something any node

Slide 16

Slide 16 text

Search node node shard 1 copy 2 node node shard 1 shard 2 shard 2 copy 1 copy 2 copy 1 # curl localhost:9200/index1,index2/_search?q=something any node

Slide 17

Slide 17 text

Search node node shard 1 copy 2 node node shard 1 shard 2 shard 2 copy 1 copy 2 copy 1 # curl localhost:9200/index1,index2/_search?q=something any node

Slide 18

Slide 18 text

Search node node shard 1 copy 2 node node shard 1 shard 2 shard 2 copy 1 copy 2 copy 1 # curl localhost:9200/index1,index2/_search?q=something any node

Slide 19

Slide 19 text

Search node node shard 1 copy 2 node node shard 1 shard 2 shard 2 copy 1 copy 2 copy 1 # curl localhost:9200/index1,index2/_search?q=something any node
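A minimal sketch of what the search slides animate: whichever node receives the request acts as coordinator, asks one copy of every shard in the addressed indices for its local top hits, and merges the partial results. The scores, shard contents and the function names `search_shard` and `coordinate` are made up for illustration; this is not Elasticsearch's actual implementation.

```python
import heapq
from typing import Dict, List, Tuple

Hit = Tuple[float, str]  # (score, document id)


def search_shard(shard_docs: Dict[str, float], size: int) -> List[Hit]:
    """Return the top `size` hits of one shard, best score first."""
    hits = ((score, doc_id) for doc_id, score in shard_docs.items())
    return heapq.nlargest(size, hits)


def coordinate(shards: List[Dict[str, float]], size: int) -> List[Hit]:
    """Collect each shard's local top hits and merge them into a global top list."""
    partial = [hit for shard in shards for hit in search_shard(shard, size)]
    return heapq.nlargest(size, partial)


# Hypothetical pre-computed scores: two indices with two shards each.
shards = [
    {"a": 0.9, "b": 0.1},  # index1, shard 1
    {"c": 0.5},            # index1, shard 2
    {"d": 0.7, "e": 0.3},  # index2, shard 1
    {"f": 0.2},            # index2, shard 2
]
top = coordinate(shards, size=3)
```

The cost of a search grows with the number of shards it touches, which is the point the later slides build on.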

Slide 20

Slide 20 text

Important fact for later: indexing & searching are done on shards, not indices
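To make this concrete: a document ends up on one shard of its index, chosen by hashing a routing value (by default the document id) modulo the number of primary shards. The sketch below shows the idea only; `zlib.crc32` is a stand-in hash chosen for this example, not the function Elasticsearch uses, and `shard_for` is a hypothetical helper name.

```python
import zlib


def shard_for(routing_key: str, number_of_shards: int) -> int:
    """Map a routing key to a shard number in [0, number_of_shards)."""
    # crc32 is an assumption for illustration; any stable hash shows the same idea.
    return zlib.crc32(routing_key.encode("utf-8")) % number_of_shards


# The same id always lands on the same shard as long as the shard count is fixed.
shard = shard_for("426674590560305150", number_of_shards=4)
```

Because the shard count is part of the formula, it cannot simply be changed later without moving documents around, which is why the number of shards has to be chosen when the index is created.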

Slide 21

Slide 21 text

To the subject at hand: time based data

Slide 22

Slide 22 text

No content

Slide 23

Slide 23 text

easy to get, easy to index
# curl -XPUT localhost:9200/tweets/tweet/426674590560305150 -d '{
  "created_at": "Fri Jan 24 11:15:24 +0000 2014",
  "id": 426674590560305150,
  "text": "Prepping up for my #elasticsearch talk",
  "user": { "name": "Boaz Leskes", "screen_name": "bleskes" }
}'

Slide 24

Slide 24 text

No content

Slide 25

Slide 25 text

Shard 1 Shard 2 Shard 3

Slide 26

Slide 26 text

Shard 1 Shard 2 Shard 3

Slide 27

Slide 27 text

Shard 1 Shard 2 Shard 3

Slide 28

Slide 28 text

Shard 1 Shard 2 Shard 3

Slide 29

Slide 29 text

Shard 1 Shard 2 Shard 3

Slide 30

Slide 30 text

no problem, just use more shards shard 1 shard 7 shard 2 shard 8 shard 3 shard 9 shard 4 shard 10 shard 5 shard 6 shard 11 shard 12

Slide 31

Slide 31 text

no problem, just use more shards shard 1 shard 7 shard 2 shard 8 shard 3 shard 9 shard 4 shard 10 shard 5 shard 6 shard 11 shard 12 # curl localhost:9200/index/_search?q=something

Slide 32

Slide 32 text

Reminds me of a tile at my aunt's house: "Today is the tomorrow we were all afraid of yesterday…"

Slide 33

Slide 33 text

No content

Slide 34

Slide 34 text

March 1 2 3

Slide 35

Slide 35 text

March 1 2 3

Slide 36

Slide 36 text

March 1 2 3 April 1 2 3

Slide 37

Slide 37 text

March 1 2 3 April 1 2 3 May 1 2 3

Slide 38

Slide 38 text

Cluster scales with time mar. 1 april 1 april 2 mar. 1 april 1 april 2 may 1 may 2 may 2 may 2 june 2

Slide 39

Slide 39 text

Scopes searches mar. 1 april 1 april 2 # curl localhost:9200/may/_search?q=something mar. 1 april 1 april 2 may 1 may 2 may 2 may 2
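A small sketch of how a client could scope a search to a time window by naming only the monthly indices that overlap it. The index prefix `tweets_`, the `YYYYMM` suffix and the helper names `monthly_indices` and `search_url` are assumptions for this example; the slide itself simply uses an index called `may`.

```python
from datetime import date
from typing import List


def monthly_indices(start: date, end: date, prefix: str = "tweets_") -> List[str]:
    """List one index name per calendar month between start and end, inclusive."""
    names = []
    year, month = start.year, start.month
    while (year, month) <= (end.year, end.month):
        names.append(f"{prefix}{year:04d}{month:02d}")
        month += 1
        if month > 12:
            year, month = year + 1, 1
    return names


def search_url(start: date, end: date, host: str = "localhost:9200") -> str:
    """Build a multi-index search URL that touches only the relevant months."""
    return f"http://{host}/{','.join(monthly_indices(start, end))}/_search"


url = search_url(date(2014, 4, 10), date(2014, 5, 2))
```

A query for recent data then hits only the shards of one or two indices, no matter how many months of history the cluster holds.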

Slide 40

Slide 40 text

one little tweak…
# curl -XPUT localhost:9200/tweets_201401/tweet/426674590560305150 -d '{
  "created_at": "Fri Jan 24 11:15:24 +0000 2014",
  "id": 426674590560305150,
  "text": "Prepping up for my #elasticsearch talk this afternoon at the UvA : http://t.co/rqhBI5zys0",
  "user": { "name": "Boaz Leskes", "screen_name": "bleskes" }
}'
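The tweak is that the index name is derived from the document's own timestamp. A minimal sketch, assuming the `created_at` format shown in the tweet above and a `tweets_YYYYMM` naming scheme; `index_for` and `document_path` are hypothetical helper names.

```python
from datetime import datetime

# Format of the "created_at" field in the tweets shown on the slides.
TWITTER_TIME_FORMAT = "%a %b %d %H:%M:%S %z %Y"


def index_for(created_at: str, prefix: str = "tweets_") -> str:
    """Return the monthly index a tweet belongs to, e.g. tweets_201401."""
    created = datetime.strptime(created_at, TWITTER_TIME_FORMAT)
    return f"{prefix}{created:%Y%m}"


def document_path(tweet: dict) -> str:
    """Build the /index/type/id path used by the indexing request."""
    return f"/{index_for(tweet['created_at'])}/tweet/{tweet['id']}"


tweet = {"created_at": "Fri Jan 24 11:15:24 +0000 2014", "id": 426674590560305150}
path = document_path(tweet)
```

The parsing relies on English day and month names, so it assumes the default C locale.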

Slide 41

Slide 41 text

No content

Slide 42

Slide 42 text

Another fact: an index is the basic unit of configuration

Slide 43

Slide 43 text

index templates
curl -XPUT localhost:9200/_template/twitter -d '{
  "template": "twitter_*",
  "settings": { "number_of_shards": 4, "number_of_replicas": 1 }
}'
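A rough model of what the template does: when a new index is created, every template whose pattern matches the index name contributes its settings. The sketch uses Python's `fnmatchcase` as a stand-in for the wildcard matching and ignores template ordering; `settings_for` is a hypothetical helper, not part of any client library.

```python
from fnmatch import fnmatchcase
from typing import Dict

TEMPLATES = {
    "twitter": {
        "template": "twitter_*",
        "settings": {"number_of_shards": 4, "number_of_replicas": 1},
    },
}


def settings_for(index_name: str, templates: Dict[str, dict] = TEMPLATES) -> dict:
    """Merge the settings of every template whose pattern matches the index name."""
    merged: dict = {}
    for template in templates.values():
        if fnmatchcase(index_name, template["template"]):
            merged.update(template["settings"])
    return merged


new_month = settings_for("twitter_201405")
unmatched = settings_for("tweets_201401")
```

Note that the pattern has to agree with the naming scheme actually used when indexing: an index called `tweets_201401` would not pick up a `twitter_*` template.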

Slide 44

Slide 44 text

older data
# elasticsearch.yml
node.disk: spinning_disks
curl -XPUT localhost:9200/twitter_2012*/_settings -d '{
  "index.routing.allocation.include.disk": "spinning_disks",
  "index.routing.allocation.exclude.disk": "ssd"
}'

Slide 45

Slide 45 text

older data
curl -XPOST localhost:9200/twitter_201404/_optimize
curl -XPOST localhost:9200/twitter_201304/_close
curl -XDELETE localhost:9200/twitter_201204/
pro tip: https://github.com/elasticsearch/curator
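One way to read the three commands is as an age-based policy: optimize an index once it stops receiving writes, close it after a while, delete it eventually. The thresholds below (1, 12 and 24 months) are assumptions picked to reproduce the slide's example, not a recommendation from the talk, and `maintenance_action` is a hypothetical helper that only returns the request to send.

```python
from typing import Optional


def months_between(index_month: str, current_month: str) -> int:
    """Age in whole months between two YYYYMM strings."""
    iy, im = int(index_month[:4]), int(index_month[4:])
    cy, cm = int(current_month[:4]), int(current_month[4:])
    return (cy - iy) * 12 + (cm - im)


def maintenance_action(index_name: str, current_month: str,
                       close_after: int = 12, delete_after: int = 24) -> Optional[str]:
    """Pick the request to issue for an index named <prefix>_YYYYMM, if any."""
    age = months_between(index_name.rsplit("_", 1)[1], current_month)
    if age >= delete_after:
        return f"DELETE /{index_name}/"
    if age >= close_after:
        return f"POST /{index_name}/_close"
    if age >= 1:
        return f"POST /{index_name}/_optimize"
    return None  # still the active index, leave it alone


actions = [maintenance_action(name, "201405")
           for name in ("twitter_201405", "twitter_201404", "twitter_201304", "twitter_201204")]
```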

Slide 46

Slide 46 text

aliases
curl -XPOST localhost:9200/_aliases -d '{
  "actions": [
    { "add": { "index": "twitter_201311", "alias": "last_2_months" } },
    { "remove": { "index": "twitter_201309", "alias": "last_2_months" } }
  ]
}'
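A sketch of how the alias update could be generated each month instead of written by hand. It builds the request body for the `_aliases` endpoint, adding the newest monthly index and removing the one that falls out of the window. The helper names `shift_month` and `rotate_alias` and the `twitter_YYYYMM` naming are assumptions for this example.

```python
import json


def shift_month(month: str, delta: int) -> str:
    """Move a YYYYMM string forward or backward by `delta` months."""
    year, mon = int(month[:4]), int(month[4:])
    total = year * 12 + (mon - 1) + delta
    return f"{total // 12:04d}{total % 12 + 1:02d}"


def rotate_alias(new_month: str, window: int = 2,
                 alias: str = "last_2_months", prefix: str = "twitter_") -> dict:
    """Build an _aliases body that slides the alias window forward by one month."""
    expired = shift_month(new_month, -window)
    return {
        "actions": [
            {"add": {"index": prefix + new_month, "alias": alias}},
            {"remove": {"index": prefix + expired, "alias": alias}},
        ]
    }


body = json.dumps(rotate_alias("201311"))
```

Both actions travel in one request, so searches against `last_2_months` never see a half-updated alias.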

Slide 47

Slide 47 text

Implications
• Use indices to manage data as it scales over time
• Use aliases to efficiently point your searches at the relevant shards

Slide 48

Slide 48 text

One More Thing... Time is just a (strictly) monotonic function. Primary keys are just as good.
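One possible reading of this slide: any key that only grows, such as an auto-incrementing primary key, can be cut into ranges the same way months cut up time. The sketch below assigns a document to an index by id range; the bucket size, the `docs_` prefix and the helper name `index_for_id` are all assumptions made for illustration.

```python
def index_for_id(doc_id: int, ids_per_index: int = 1_000_000, prefix: str = "docs_") -> str:
    """Return the index holding the id range that doc_id falls into."""
    bucket = doc_id // ids_per_index
    return f"{prefix}{bucket:06d}"


first = index_for_id(999_999)
later = index_for_id(2_500_000)
```

As with monthly indices, new ids only ever land in the newest index, so older ranges can be optimized, moved or dropped independently.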

Slide 49

Slide 49 text

Users data: scalable multi-tenancy

Slide 50

Slide 50 text

Data

Slide 51

Slide 51 text

Data Data Data Data Data Data Data Data Data Data Data Data

Slide 52

Slide 52 text

Solution 1 - index per user user 1 shard 1 user 1 shard 2 user 2 shard 1 user 2 shard 2 user 3 shard 1 user 3 shard 2 user 4 shard 1 user 4 shard 2 user 5 shard 1 user 5 shard 2 user 6 shard 1 user 6 shard 2 user 7 shard 1 user 7 shard 2 user 8 shard 1 user 8 shard 2 user 9 shard 1 user 9 shard 2 user 10 shard 1 user 10 shard 2 user 11 shard 1 user 11 shard 2 user 12 shard 1 user 12 shard 2

Slide 53

Slide 53 text

Solution 1 - index per user user 1 shard 1 user 1 shard 2 user 2 shard 1 user 2 shard 2 user 3 shard 1 user 3 shard 2 user 4 shard 1 user 4 shard 2 user 5 shard 1 user 5 shard 2 user 6 shard 1 user 6 shard 2 user 7 shard 1 user 7 shard 2 user 8 shard 1 user 8 shard 2 user 9 shard 1 user 9 shard 2 user 10 shard 1 user 10 shard 2 user 11 shard 1 user 11 shard 2 user 12 shard 1 user 12 shard 2 node

Slide 54

Slide 54 text

Solution 1 - index per user user 1 shard 1 user 1 shard 2 user 2 shard 1 user 2 shard 2 user 3 shard 1 user 3 shard 2 user 4 shard 1 user 4 shard 2 user 5 shard 1 user 5 shard 2 user 6 shard 1 user 6 shard 2 user 7 shard 1 user 7 shard 2 user 8 shard 1 user 8 shard 2 user 9 shard 1 user 9 shard 2 user 10 shard 1 user 10 shard 2 user 11 shard 1 user 11 shard 2 user 12 shard 1 user 12 shard 2 node Overloaded

Slide 55

Slide 55 text

Solution 2 - all users in one index shard 1 shard 1 shard 2 shard 3 user 1, user 2, user 3 user 4, user 5, user 6 user 7, user 8, user 9 user 10, user 11, user 12 user 1, user 2, user 3 user 4, user 5, user 6 user 7, user 8, user 9 user 10, user 11, user 12 user 1, user 2, user 3 user 4, user 5, user 6 user 7, user 8, user 9 user 10, user 11, user 12 user 1, user 2, user 3 user 4, user 5, user 6 user 7, user 8, user 9 user 10, user 11, user 12

Slide 56

Slide 56 text

Solution 2 - all users in one index shard 1 shard 1 shard 2 shard 3 user 1, user 2, user 3 user 4, user 5, user 6 user 7, user 8, user 9 user 10, user 11, user 12 user 1, user 2, user 3 user 4, user 5, user 6 user 7, user 8, user 9 user 10, user 11, user 12 user 1, user 2, user 3 user 4, user 5, user 6 user 7, user 8, user 9 user 10, user 11, user 12 user 1, user 2, user 3 user 4, user 5, user 6 user 7, user 8, user 9 user 10, user 11, user 12 limited horizon

Slide 57

Slide 57 text

Solution 12 - both shard 1 shard 2 shard 1 shard 1 shard 2 shard 3 shard 4 user 1, user 2 user 4, user 5 user 1, user 2 user 4, user 5 shard 3 shard 4 user 1, user 2 user 4, user 5 user 1, user 2 user 4, user 5 user 7, user 8 user 9, user 0 user 7, user 8 user 9, user 0 user 7, user 8 user 9, user 0 user 7, user 8 user 9, user 0 user 6 shard 2 user 6 shard 3 user 6 shard 4 user 6 index 1 index 2 index 3

Slide 58

Slide 58 text

Why is one index per user convenient? # curl -XGET localhost:9200/user_1/_search -d '{
 "query": { "match": { "body": "all the things" } } }'

Slide 59

Slide 59 text

What do we want?
• Have the simplicity of one index per user
• Have the scalability of solution 12

Slide 60

Slide 60 text

Aliases to the rescue
curl -XPOST localhost:9200/_aliases -d '{
  "actions": [
    { "add": { "index": "users_group_1", "alias": "user_1", "filter": { "term": { "user": "user_1" } } } }
  ]
}'
# curl -XGET localhost:9200/user_1/_search -d '{ … }'
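A sketch of how such filtered aliases could be generated for many users at once: each user gets an alias named after them that points at a shared group index and filters on the `user` field. How users are assigned to group indices is not covered on the slide; the fixed-size grouping in `group_index` is purely an assumption, as are the helper names.

```python
from typing import Iterable


def group_index(user_number: int, users_per_index: int = 5) -> str:
    """Hypothetical assignment: users 1-5 share users_group_1, 6-10 users_group_2, ..."""
    return f"users_group_{(user_number - 1) // users_per_index + 1}"


def alias_action(user_number: int) -> dict:
    """Build the 'add' action for one user's filtered alias."""
    user = f"user_{user_number}"
    return {
        "add": {
            "index": group_index(user_number),
            "alias": user,
            "filter": {"term": {"user": user}},
        }
    }


def aliases_body(user_numbers: Iterable[int]) -> dict:
    """Build one _aliases request body covering all given users."""
    return {"actions": [alias_action(n) for n in user_numbers]}


body = aliases_body([1, 6])
```

Application code keeps searching `/user_1/_search` as if every user had an index of their own, while a large user can later be moved to a dedicated index by repointing the alias.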

Slide 61

Slide 61 text

thank you! http://elasticsearch.com/support @elasticsearch, @bleskes http://elasticsearch.org/resources