ElasticSearch UserGroup Berlin meetup

A short overview of the upcoming features in Elasticsearch 0.20, in particular the new index-aware ShardsAllocator.

Simon Willnauer

January 29, 2013

Transcript

  1. Copyright Elasticsearch 2013. Copying, publishing and/or distributing without written permission is strictly prohibited.
    Elastic Fantastic
    ElasticSearch Usergroup Berlin
    simon.willnauer@elasticsearch.com @s1m0nw
  2. What is this guy talking about?
    • Shard Allocation
    • What is this and why do I need it?
    • Is it new?
    • What is new?
    • Improvements in the pipeline
    • "new stuff in Lucene 4" ...wait, in what?
    • things you care about that will come soon in ES
  3. In the beginning was the single node...
    curl -XPUT localhost:9200/index_1 -d '{ "settings" : { "number_of_shards" : 3, "number_of_replicas" : 0 } }'
    [Diagram: node 1; labels 1P, 2R, 2P, C, 3P]
  4. And two indices....
    curl -XPUT localhost:9200/index_2 -d '{ "settings" : { "number_of_shards" : 2, "number_of_replicas" : 0 } }'
    [Diagram: node 1; labels 1P, 2P, C, 1P, 2P, 3P]
  5. 2nd Node... now what?
    [Diagram: node 2, node 1; labels 1P, 2P, 1P, 2P, 3P, 2P, 1P, 1P]
  6. But you'd rather have this, no?
    [Diagram: node 2, node 1; labels 1P, 2P, 1P, 2P, 1P, 1P, 3P, 3P]
  7. Quick Demo
    curl -XPUT localhost:9200/index_1 -d '{ "settings" : { "number_of_shards" : 3, "number_of_replicas" : 0 } }'
    curl -XPUT localhost:9200/index_2 -d '{ "settings" : { "number_of_shards" : 2, "number_of_replicas" : 0 } }'
    Behave like the previous ShardsAllocator:
    curl -XPUT localhost:9200/_cluster/settings -d '{ "transient" : { "cluster.routing.allocation.balance.index" : 0.0, "cluster.routing.allocation.balance.shard" : 1.0, "cluster.routing.allocation.balance.primary" : 0.0 } }'
  8. What changed?
    • EvenShardCountAllocator
      • balanced across shards and nodes
      • no notion of an index
      • tries to put same amount of shards on each node
    • BalancedShardsAllocator
      • based on a weight function
      • weights are calculated per node in an index context
      • users can influence the weight of an attribute
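The idea of a weight calculated per node in the context of an index can be sketched roughly as below. This is only an illustration under assumptions: the names `node_weight` and `weight_delta`, the "deviation from the cluster average" terms, and the normalisation by the sum of the factors are not taken from the slides and are not claimed to match the actual BalancedShardsAllocator code.

```python
def node_weight(allocation, node, index, index_factor, shard_factor, primary_factor):
    """Hypothetical weight of `node` in the context of `index` (higher = more loaded).

    `allocation` maps a node name to a list of (index_name, shard_id, is_primary).
    """
    total = index_factor + shard_factor + primary_factor
    node_count = len(allocation)

    def average(pred):
        # Average number of matching shards per node across the cluster.
        return sum(1 for shards in allocation.values() for s in shards if pred(s)) / node_count

    def on_node(pred):
        # Number of matching shards on the node being weighed.
        return sum(1 for s in allocation[node] if pred(s))

    any_shard = lambda s: True
    in_index = lambda s: s[0] == index
    is_primary = lambda s: s[2]

    # Each term measures how far the node deviates from the cluster average.
    return (
        shard_factor * (on_node(any_shard) - average(any_shard))
        + index_factor * (on_node(in_index) - average(in_index))
        + primary_factor * (on_node(is_primary) - average(is_primary))
    ) / total


def weight_delta(allocation, index, factors):
    """Difference between the heaviest and lightest node for `index`."""
    weights = [node_weight(allocation, node, index, *factors) for node in allocation]
    return max(weights) - min(weights)


# All three shards of index_1 on node1, both shards of index_2 on node2.
allocation = {
    "node1": [("index_1", 1, True), ("index_1", 2, True), ("index_1", 3, True)],
    "node2": [("index_2", 1, True), ("index_2", 2, True)],
}

shard_count_only = weight_delta(allocation, "index_1", (0.0, 1.0, 0.0))
index_aware = weight_delta(allocation, "index_1", (0.55, 0.4, 0.05))
```

With only the shard-count factor enabled, the two nodes look almost balanced (3 shards versus 2). Once the index factor contributes, the same layout shows a larger delta, because every shard of `index_1` sits on one node.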
  9. What can I adjust?
    • how important to you is
      • balance over # of shards
      • balance over indices
      • balance over primaries
    • how aggressively rebalance acts
      • a threshold defining the minimum delta between 2 nodes to issue a rebalance operation. Default is 1.0f
    • ...more to come
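The threshold can be read as a simple gate on the weight difference between two nodes. The sketch below assumes a hypothetical helper `should_rebalance` and a strict "greater than" comparison; the slides only state that the threshold is the minimum delta and that it defaults to 1.0, so the exact comparison is an assumption.

```python
def should_rebalance(node_weights, threshold=1.0):
    """Return True if the heaviest and lightest node differ by more than `threshold`.

    `node_weights` maps a node name to its weight (hypothetical input format).
    """
    heaviest = max(node_weights.values())
    lightest = min(node_weights.values())
    return (heaviest - lightest) > threshold


# A delta of exactly 1.0 does not exceed the default threshold in this sketch.
small_delta = should_rebalance({"node1": 0.5, "node2": -0.5})
# A delta of 2.1 does.
large_delta = should_rebalance({"node1": 1.05, "node2": -1.05})
```

A higher threshold makes rebalancing less aggressive, since larger imbalances are tolerated before any shard is moved.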
  10. Example settings...
    Acts like EvenShardCountsAllocator:
    curl -XPUT localhost:9200/_cluster/settings -d '{ "transient" : { "cluster.routing.allocation.balance.index" : 0.0, "cluster.routing.allocation.balance.shard" : 1.0, "cluster.routing.allocation.balance.primary" : 0.0 } }'
    Defaults:
    curl -XPUT localhost:9200/_cluster/settings -d '{ "transient" : { "cluster.routing.allocation.balance.index" : 0.55, "cluster.routing.allocation.balance.shard" : 0.4, "cluster.routing.allocation.balance.primary" : 0.05 } }'
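Hand-written JSON bodies are easy to get subtly wrong; a trailing comma after the last key, for example, is not valid JSON. A minimal sketch of building the same request body programmatically follows. The setting names are the ones shown on the slides; the helper names `balance_settings_body`, `is_valid_json` and `put_cluster_settings`, and the default host, are assumptions made for illustration.

```python
import json
import urllib.request


def balance_settings_body(index, shard, primary):
    """Serialise the three balance factors as a transient cluster-settings body."""
    return json.dumps({
        "transient": {
            "cluster.routing.allocation.balance.index": index,
            "cluster.routing.allocation.balance.shard": shard,
            "cluster.routing.allocation.balance.primary": primary,
        }
    })


def is_valid_json(text):
    """Return True if `text` parses as JSON."""
    try:
        json.loads(text)
        return True
    except ValueError:
        return False


def put_cluster_settings(body, host="http://localhost:9200"):
    """Send the body with HTTP PUT, like `curl -XPUT .../_cluster/settings -d ...`."""
    request = urllib.request.Request(
        host + "/_cluster/settings",
        data=body.encode("utf-8"),
        method="PUT",
    )
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8")


# Body that makes the allocator balance on shard count only.
shard_count_only_body = balance_settings_body(0.0, 1.0, 0.0)
```

Calling `put_cluster_settings(shard_count_only_body)` would send the request to a node listening on localhost:9200; it is not invoked here so that the sketch runs without a cluster.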
  11. Future Work?
    • Ways to expand the weight function
      • size of a shard
      • average number of requests on a shard
      • number of docs in the shard
      • <your requirement goes here>
    • Eventually we want the weight function to be customizable, so that users can balance their cluster based on their needs.
  12. Improvements in the Pipeline
    • Lucene 4.0 / 4.1
      • Codec Support (0.21)
      • Concurrent Flushing (0.21)
      • Spellchecking / Suggestions (0.21)
      • Similarity per Field
    • FieldData Refactoring
      • API (0.21)
      • Implementations (0.2?)
  13. Lucene 4.0 / 4.1
    • Many features under the hood
    • Massive improvements in terms of memory consumption internally
    • Compression built in
    • Fast FuzzyQuery
    • Faster Batch-Indexing
    • Bloom Filters built at index time
    • refresh might be much cheaper now
    • Default encoding on disk is based on blocks...
  14. FieldData?
    • Used for Faceting, Sorting, Scoring
    • Until 0.20 not very flexible
    • Implementation details leaked into the interface
    • 0.21 adds a new interface in order to improve memory and runtime performance
    • new FieldData will allow specialized implementations / data-structures per field
    • Defaults will be much more memory efficient (UTF-8 bytes vs. UTF-16 chars)
    • Future implementations can even read from MemoryMaps etc.
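The UTF-8 versus UTF-16 point can be made concrete by comparing raw encoded sizes. The sketch below counts only the encoded bytes of a few sample terms; it ignores any per-object overhead in the JVM, and the function name `encoded_sizes` and the sample terms are made up for illustration.

```python
def encoded_sizes(terms):
    """Return (utf8_bytes, utf16_bytes) needed to store all terms."""
    utf8_bytes = sum(len(term.encode("utf-8")) for term in terms)
    # "utf-16-le" avoids the 2-byte byte-order mark that plain "utf-16" prepends.
    utf16_bytes = sum(len(term.encode("utf-16-le")) for term in terms)
    return utf8_bytes, utf16_bytes


utf8_bytes, utf16_bytes = encoded_sizes(["elasticsearch", "lucene", "shard"])
```

For ASCII-only terms such as these, UTF-8 needs one byte per character while UTF-16 needs two, so the UTF-8 total is half the UTF-16 total. The gap narrows for text outside the ASCII range.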
  15. That’s it folks.... Ask your questions...!