Slide 1

Slide 1 text

Elastic Fantastic
ElasticSearch Usergroup Berlin
[email protected] @s1m0nw

Copyright Elasticsearch 2013. Copying, publishing and/or distributing without written permission is strictly prohibited.

Slide 2

Slide 2 text

What is this guy talking about?
• Shard Allocation
  • What is this and why do I need it?
  • Is it new?
  • What is new?
• Improvements in the pipeline
  • "new stuff in Lucene 4" ...wait, in what?
  • things you care about that will come soon in ES

Slide 3

Slide 3 text

In the beginning was the single node...

curl -XPUT localhost:9200/index_1 -d '{ "settings" : { "number_of_shards" : 3, "number_of_replicas" : 0 } }'

[Diagram: node 1 holding shards 1P, 2P, 3P]

Slide 4

Slide 4 text

And two indices....

curl -XPUT localhost:9200/index_2 -d '{ "settings" : { "number_of_shards" : 2, "number_of_replicas" : 0 } }'

[Diagram: node 1 holding index_1 shards 1P, 2P, 3P and index_2 shards 1P, 2P]

Slide 5

Slide 5 text

2nd Node... now what?

[Diagram: node 1 and node 2 holding the primary shards of index_1 and index_2]

Slide 6

Slide 6 text

But you'd rather have this, no?

[Diagram: node 1 and node 2 holding the primary shards of index_1 and index_2 in a different arrangement]

Slide 7

Slide 7 text

Quick Demo

curl -XPUT localhost:9200/index_1 -d '{ "settings" : { "number_of_shards" : 3, "number_of_replicas" : 0 } }'

curl -XPUT localhost:9200/index_2 -d '{ "settings" : { "number_of_shards" : 2, "number_of_replicas" : 0 } }'

Behave like the previous ShardsAllocator:

curl -XPUT localhost:9200/_cluster/settings -d '{ "transient" : { "cluster.routing.allocation.balance.index" : 0.0, "cluster.routing.allocation.balance.shard" : 1.0, "cluster.routing.allocation.balance.primary" : 0.0 } }'
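To check where the shards ended up after the demo, the cluster state can be fetched and its routing table inspected. This is a minimal sketch, not part of the original slides: `ES_URL` and the helper name `show_cluster_state` are assumptions made for illustration, and the node is assumed to listen on localhost:9200 as in the commands above.

```shell
# Assumed address of a local node (not from the slides); override via ES_URL.
ES_URL="${ES_URL:-localhost:9200}"

# Hypothetical helper: print the cluster state, which includes the
# routing table listing the node each shard is allocated to.
show_cluster_state() {
  curl -s -XGET "${ES_URL}/_cluster/state?pretty"
}
```

Calling `show_cluster_state` before and after changing the balance settings should make any shard relocation visible.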

Slide 8

Slide 8 text

What changed?
• EvenShardCountAllocator
  • balanced across shards and nodes
  • no notion of an index
  • tries to put the same number of shards on each node
• BalancedShardsAllocator
  • based on a weight function
  • weights are calculated per node in an index context
  • users can influence the weight of an attribute
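The slides do not spell out the weight function, so the following is only a sketch of the idea under a stated assumption: the weight of a node in the context of one index is a normalized linear combination of how far the node deviates from the cluster average in total shards, shards of that index, and primaries. The function name `node_weight` and its argument order are hypothetical.

```shell
# node_weight SHARD_F INDEX_F PRIMARY_F \
#             NODE_SHARDS AVG_SHARDS \
#             NODE_INDEX_SHARDS AVG_INDEX_SHARDS \
#             NODE_PRIMARIES AVG_PRIMARIES
# Hypothetical sketch of a per-node, per-index weight (assumed formula).
node_weight() {
  awk -v f_shard="$1" -v f_index="$2" -v f_prim="$3" \
      -v n_shards="$4" -v avg_shards="$5" \
      -v n_idx="$6" -v avg_idx="$7" \
      -v n_prim="$8" -v avg_prim="$9" '
    BEGIN {
      # Normalize the three factors so that they add up to 1.
      sum = f_shard + f_index + f_prim
      # Each term is the deviation from the cluster average, scaled by its factor.
      w  = (f_shard / sum) * (n_shards - avg_shards)
      w += (f_index / sum) * (n_idx - avg_idx)
      w += (f_prim / sum) * (n_prim - avg_prim)
      printf "%.2f\n", w
    }'
}

# Illustrative scenario: 5 primary shards in total (index_1 has 3 of them),
# all on node 1, with node 2 still empty; averages are over 2 nodes.
node_weight 0.4 0.55 0.05 5 2.5 3 1.5 5 2.5   # node 1, context index_1
node_weight 0.4 0.55 0.05 0 2.5 0 1.5 0 2.5   # node 2, context index_1
```

Under these assumptions the overloaded node gets a positive weight and the empty node a negative one; the difference between the two is what a rebalance threshold would be compared against.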

Slide 9

Slide 9 text

What can I adjust?
• how important each of these is to you
  • balance over # of shards
  • balance over indices
  • balance over primaries
• how aggressively rebalancing acts
  • a threshold defining the minimum delta between 2 nodes to issue a rebalance operation. Default is 1.0f
• ...more to come
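The threshold can be changed through the same cluster settings API as the balance factors. The sketch below assumes the setting key is `cluster.routing.allocation.balance.threshold` (the slides do not name it); `ES_URL`, `THRESHOLD_SETTINGS` and the helper `set_balance_threshold` are hypothetical names, and 2.0 is an illustrative value rather than a recommendation.

```shell
# Assumed address of a local node (not from the slides); override via ES_URL.
ES_URL="${ES_URL:-localhost:9200}"

# Settings body: a higher threshold makes rebalancing less aggressive,
# because a larger weight delta between two nodes is needed to trigger it.
THRESHOLD_SETTINGS='{
  "transient" : {
    "cluster.routing.allocation.balance.threshold" : 2.0
  }
}'

# Hypothetical helper that sends the settings to the cluster.
set_balance_threshold() {
  curl -XPUT "${ES_URL}/_cluster/settings" -d "${THRESHOLD_SETTINGS}"
}
```

Running `set_balance_threshold` against a live cluster applies the value as a transient setting, so it is lost on a full cluster restart.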

Slide 10

Slide 10 text

Example settings...

Acts like EvenShardCountAllocator:

curl -XPUT localhost:9200/_cluster/settings -d '{ "transient" : { "cluster.routing.allocation.balance.index" : 0.0, "cluster.routing.allocation.balance.shard" : 1.0, "cluster.routing.allocation.balance.primary" : 0.0 } }'

Defaults:

curl -XPUT localhost:9200/_cluster/settings -d '{ "transient" : { "cluster.routing.allocation.balance.index" : 0.55, "cluster.routing.allocation.balance.shard" : 0.4, "cluster.routing.allocation.balance.primary" : 0.05 } }'

Slide 11

Slide 11 text

Future Work?
• Ways to expand the weight function
  • size of a shard
  • average number of requests on a shard
  • number of docs in the shard
• Eventually we want the weight function to be customizable, so that users can balance their cluster based on their needs.

Slide 12

Slide 12 text

Improvements in the Pipeline
• Lucene 4.0 / 4.1
  • Codec Support (0.21)
  • Concurrent Flushing (0.21)
  • Spellchecking / Suggestions (0.21)
  • Similarity per Field
• FieldData Refactoring
  • API (0.21)
  • Implementations (0.2?)

Slide 13

Slide 13 text

Lucene 4.0 / 4.1
• Many features under the hood
• Massive improvements in terms of memory consumption internally
• Compression built in
• Fast FuzzyQuery
• Faster Batch-Indexing
• Bloom Filters built at index time
• refresh might be much cheaper now
• Default encoding on disk is based on blocks...

Slide 14

Slide 14 text

FieldData?
• Used for Faceting, Sorting, Scoring
• Until 0.20 not very flexible
• Implementation details leaked into the interface
• 0.21 adds a new interface in order to improve memory and runtime performance
• new FieldData will allow specialized implementations / data structures per field
• Defaults will be much more memory efficient (UTF-8 bytes vs. UTF-16 chars)
• Future implementations can even read from memory maps etc.

Slide 15

Slide 15 text

That's it folks.... Ask your questions...!