
ElasticSearch UserGroup Berlin meetup


A short overview of the upcoming features in Elasticsearch 0.20, in particular the new index-aware ShardsAllocator.


Simon Willnauer

January 29, 2013


Transcript

  1. Copyright Elasticsearch 2013. Copying, publishing and/or distributing without written permission is strictly prohibited.

    Elastic Fantastic
    ElasticSearch Usergroup Berlin
    simon.willnauer@elasticsearch.com, @s1m0nw
  2. What is this guy talking about?

    • Shard Allocation
      • What is this and why do I need it?
      • Is it new?
      • What is new?
    • Improvements in the pipeline
      • “new stuff in Lucene 4” ...wait, in what?
      • things you care about that will come soon in ES
  3. In the beginning was the single node...

    curl -XPUT localhost:9200/index_1 -d '{ "settings" : { "number_of_shards" : 3, "number_of_replicas" : 0 } }'

    [Diagram: node 1 holding the shards of index_1; labels as extracted: 1P 2R 2P C 3P]
  4. And two indices....

    curl -XPUT localhost:9200/index_2 -d '{ "settings" : { "number_of_shards" : 2, "number_of_replicas" : 0 } }'

    [Diagram: node 1 holding the shards of both indices; labels as extracted: 1P 2P C, then 1P 2P 3P]
  5. 2nd Node... now what?

    [Diagram: shards of index_1 and index_2 across node 2 and node 1; labels as extracted: 1P 2P 1P 2P 3P 2P 1P 1P]
  6. But you rather wanna have this, no?

    [Diagram: shards of index_1 and index_2 across node 2 and node 1; labels as extracted: 1P 2P 1P 2P 1P 1P 3P 3P]
  7. Quick Demo

    curl -XPUT localhost:9200/index_1 -d '{ "settings" : { "number_of_shards" : 3, "number_of_replicas" : 0 } }'

    curl -XPUT localhost:9200/index_2 -d '{ "settings" : { "number_of_shards" : 2, "number_of_replicas" : 0 } }'

    Behave like the previous ShardsAllocator:

    curl -XPUT localhost:9200/_cluster/settings -d '{
      "transient" : {
        "cluster.routing.allocation.balance.index" : 0.0,
        "cluster.routing.allocation.balance.shard" : 1.0,
        "cluster.routing.allocation.balance.primary" : 0.0
      }
    }'
  8. What changed?

    • EvenShardCountAllocator
      • balanced across shards and nodes
      • no notion of an index
      • tries to put the same number of shards on each node
    • BalancedShardsAllocator
      • based on a weight function
      • weights are calculated per node in an index context
      • users can influence the weight of an attribute
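The slides do not spell out the weight function, so the following is only a minimal sketch of what "weights calculated per node in an index context" could look like. It assumes each term is the node's deviation from the cluster average; the names `Shard`, `node_weight` and the `w_*` parameters, as well as the example layout, are hypothetical and not taken from Elasticsearch.

```python
from collections import namedtuple

# Hypothetical shard record used only for this sketch.
Shard = namedtuple("Shard", ["index", "number", "primary"])

def node_weight(allocation, node, index, w_index, w_shard, w_primary):
    """Weight of `node` in the context of `index` (assumed formula).

    Each term is the node's count minus the per-node cluster average,
    so a positive weight means "more loaded than average".
    """
    n_nodes = len(allocation)
    all_shards = [s for shards in allocation.values() for s in shards]
    mine = allocation[node]

    # Deviation of the node's total shard count from the average.
    shard_term = len(mine) - len(all_shards) / n_nodes
    # Deviation of the node's shard count for this index from the average.
    index_term = (sum(1 for s in mine if s.index == index)
                  - sum(1 for s in all_shards if s.index == index) / n_nodes)
    # Deviation of the node's primary count from the average.
    primary_term = (sum(1 for s in mine if s.primary)
                    - sum(1 for s in all_shards if s.primary) / n_nodes)

    total = w_index + w_shard + w_primary  # normalise the user-supplied factors
    return (w_index * index_term + w_shard * shard_term
            + w_primary * primary_term) / total

# Hypothetical layout: all of index_1 on node1, all of index_2 on node2.
layout = {
    "node1": [Shard("index_1", n, True) for n in (1, 2, 3)],
    "node2": [Shard("index_2", n, True) for n in (1, 2)],
}

shard_only = node_weight(layout, "node1", "index_1", 0.0, 1.0, 0.0)
index_only = node_weight(layout, "node1", "index_1", 1.0, 0.0, 0.0)
print(shard_only, index_only)  # 0.5 1.5
```

With only the shard factor set, node1 looks nearly balanced (0.5 above average), while the index factor exposes that all of index_1 sits on one node (1.5 above average). That is the kind of difference an index-aware allocator can act on.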
  9. What can I adjust?

    • how important each of these is to you
      • balance over # of shards
      • balance over indices
      • balance over primaries
    • how aggressively rebalancing acts
      • a threshold defining the minimum delta between 2 nodes to issue a rebalance operation. Default is 1.0f
    • ...more to come
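As a rough illustration of the threshold, the sketch below decides whether a rebalance would be issued for a set of per-node weights. The function name `needs_rebalance` and the choice of a strict comparison are assumptions; the slide only says that the threshold is the minimum delta between two nodes and that it defaults to 1.0.

```python
def needs_rebalance(weights, threshold=1.0):
    """Return True when the heaviest and lightest node differ by more than `threshold`.

    `weights` maps a node name to its weight; whether the real comparison
    is strict (>) or inclusive (>=) is an assumption here.
    """
    delta = max(weights.values()) - min(weights.values())
    return delta > threshold

# Hypothetical weights for two nodes.
print(needs_rebalance({"node1": 0.5, "node2": -0.5}))   # False: delta is exactly 1.0
print(needs_rebalance({"node1": 1.5, "node2": -1.5}))   # True: delta is 3.0
```

A higher threshold makes the allocator more conservative, because larger imbalances are tolerated before any shard is moved.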
  10. Example settings...

    Acts like EvenShardCountsAllocator:

    curl -XPUT localhost:9200/_cluster/settings -d '{
      "transient" : {
        "cluster.routing.allocation.balance.index" : 0.0,
        "cluster.routing.allocation.balance.shard" : 1.0,
        "cluster.routing.allocation.balance.primary" : 0.0
      }
    }'

    Defaults:

    curl -XPUT localhost:9200/_cluster/settings -d '{
      "transient" : {
        "cluster.routing.allocation.balance.index" : 0.55,
        "cluster.routing.allocation.balance.shard" : 0.4,
        "cluster.routing.allocation.balance.primary" : 0.05
      }
    }'
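Hand-written JSON bodies are easy to get wrong (a stray trailing comma is enough to make them invalid), so one option is to generate the body instead. This is a small sketch: the helper name `balance_settings` is hypothetical, while the setting keys are the ones shown in the slides.

```python
import json

def balance_settings(index, shard, primary, scope="transient"):
    """Build the JSON body for a PUT to /_cluster/settings.

    `scope` is "transient" as in the slides; the helper itself is
    illustrative and not part of any Elasticsearch client.
    """
    body = {
        scope: {
            "cluster.routing.allocation.balance.index": index,
            "cluster.routing.allocation.balance.shard": shard,
            "cluster.routing.allocation.balance.primary": primary,
        }
    }
    return json.dumps(body, indent=2)

# Values from the "Example settings" slide.
print(balance_settings(0.55, 0.4, 0.05))
```

The printed string can be passed to `curl -d` or sent with any HTTP client; because `json.dumps` produces it, it is always syntactically valid JSON.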
  11. Future Work?

    • Ways to expand the weight function
      • size of a shard
      • average number of requests on a shard
      • number of docs in the shard
      • <your requirement goes here>
    • Eventually we want the weight function to be customizable so that users can balance their cluster based on their needs.
  12. Improvements in the Pipeline

    • Lucene 4.0 / 4.1
      • Codec Support (0.21)
      • Concurrent Flushing (0.21)
      • Spellchecking / Suggestions (0.21)
      • Similarity per Field
    • FieldData Refactoring
      • API (0.21)
      • Implementations (0.2?)
  13. Lucene 4.0 / 4.1

    • Many features under the hood
    • Massive improvements in internal memory consumption
    • Compression built in
    • Fast FuzzyQuery
    • Faster Batch-Indexing
    • Bloom Filters built at index time
    • refresh might be much cheaper now
    • Default encoding on disk is based on blocks...
  14. FieldData?

    • Used for Faceting, Sorting, Scoring
    • Until 0.20 not very flexible
    • Implementation details leaked into the interface
    • 0.21 adds a new interface in order to improve memory and runtime performance
    • new FieldData will allow specialized implementations / data-structures per field
    • Defaults will be much more memory efficient (UTF-8 bytes vs. UTF-16 chars)
    • Future implementations can even read from MemoryMaps etc.
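To give a feel for the "UTF-8 bytes vs. UTF-16 chars" point, the sketch below compares the raw encoded size of a few terms in both encodings. It only counts payload bytes; object headers and other JVM overhead are not modelled, and the helper name `term_sizes` and the sample terms are made up for illustration.

```python
def term_sizes(terms):
    """Return (utf8_bytes, utf16_bytes) needed to store the given terms."""
    utf8 = sum(len(t.encode("utf-8")) for t in terms)
    # "utf-16-le" avoids the 2-byte byte-order mark that "utf-16" would prepend.
    utf16 = sum(len(t.encode("utf-16-le")) for t in terms)
    return utf8, utf16

# Hypothetical ASCII-only terms, as is typical for many analyzed fields.
print(term_sizes(["elasticsearch", "lucene", "berlin"]))  # (25, 50)
```

For ASCII-heavy terms the UTF-8 representation takes half the space; the gap narrows for text dominated by characters that need several bytes in UTF-8.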
  15. That’s it folks.... Ask your questions...!