ElasticSearch UserGroup Berlin meetup

A short overview of the upcoming features in Elasticsearch 0.20, and of the new index-aware ShardsAllocator in particular.

Simon Willnauer

January 29, 2013

Transcript

  1. Copyright Elasticsearch 2013. Copying, publishing and/or distributing without written permission is strictly prohibited
    Elastic Fantastic
    ElasticSearch Usergroup Berlin
    [email protected]
    @s1m0nw

  2. What is this guy talking about?
    • Shard Allocation
      • What is this and why do I need it?
      • Is it new?
      • What is new?
    • Improvements in the pipeline
      • “new stuff in Lucene 4” ...wait, in what?
      • things you care about that are coming soon in ES

  3. In the beginning was the single node...
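    # Create index_1 with 3 primary shards and no replicas
    curl -XPUT localhost:9200/index_1 -d '{
      "settings" : {
        "number_of_shards" : 3,
        "number_of_replicas" : 0
      }
    }'
    [Diagram: node 1 holds all three primary shards of index_1 (1P, 2P, 3P)]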

  4. And two indices...
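    # Create index_2 with 2 primary shards and no replicas
    curl -XPUT localhost:9200/index_2 -d '{
      "settings" : {
        "number_of_shards" : 2,
        "number_of_replicas" : 0
      }
    }'
    [Diagram: node 1 now holds the primaries of both indices: 1P, 2P, 3P of index_1 and 1P, 2P of index_2]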

  5. 2nd Node... now what?
    [Diagram: the primary shards of index_1 and index_2 spread over node 1 and node 2]

  6. But you'd rather have this, no?
    [Diagram: the same primary shards on node 1 and node 2, in a different arrangement]

  7. Quick Demo
    # Create index_1 with 3 primary shards and no replicas
    curl -XPUT localhost:9200/index_1 -d '{
      "settings" : {
        "number_of_shards" : 3,
        "number_of_replicas" : 0
      }
    }'
    # Create index_2 with 2 primary shards and no replicas
    curl -XPUT localhost:9200/index_2 -d '{
      "settings" : {
        "number_of_shards" : 2,
        "number_of_replicas" : 0
      }
    }'
    # Behave like the previous ShardsAllocator: balance on shard count only
    curl -XPUT localhost:9200/_cluster/settings -d '{
      "transient" : {
        "cluster.routing.allocation.balance.index" : 0.0,
        "cluster.routing.allocation.balance.shard" : 1.0,
        "cluster.routing.allocation.balance.primary" : 0.0
      }
    }'

  8. What changed?
    • EvenShardsCountAllocator (the previous allocator)
      • balanced across shards and nodes
      • no notion of an index
      • tries to put the same number of shards on each node
    • BalancedShardsAllocator (new)
      • based on a weight function
      • weights are calculated per node in an index context
      • users can influence the weight of an attribute
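A minimal sketch of what a weight function of this kind could look like. It assumes the weight of a node, seen in the context of one index, is a normalized linear combination of how far the node sits above or below the cluster average for total shards, shards of that index, and primaries. The name `node_weight`, the `(node, index, is_primary)` tuple layout and the formula itself are illustrative assumptions, not code from Elasticsearch.

```python
def node_weight(node, index, placements, nodes,
                index_factor=0.55, shard_factor=0.4, primary_factor=0.05):
    """Weight of `node` in the context of `index` (hypothetical formula).

    placements: list of (node, index, is_primary) tuples, one per shard copy.
    nodes: list of all node names in the cluster.
    """
    total = index_factor + shard_factor + primary_factor
    if total <= 0:
        raise ValueError("at least one balance factor must be positive")

    def deviation(selected):
        # Shards from `selected` on this node, minus the per-node average.
        on_node = sum(1 for placed_on, _, _ in selected if placed_on == node)
        return on_node - len(selected) / len(nodes)

    index_shards = [p for p in placements if p[1] == index]
    primaries = [p for p in placements if p[2]]
    weighted = (shard_factor * deviation(placements)
                + index_factor * deviation(index_shards)
                + primary_factor * deviation(primaries))
    return weighted / total


# Two nodes: all of index_1 on node1, all of index_2 on node2.
nodes = ["node1", "node2"]
placements = ([("node1", "index_1", True)] * 3
              + [("node2", "index_2", True)] * 2)

# Shard count only: node1 is just half a shard above average.
shard_only = node_weight("node1", "index_1", placements, nodes, 0.0, 1.0, 0.0)
# Index only: node1 holds 1.5 shards of index_1 more than average.
index_only = node_weight("node1", "index_1", placements, nodes, 1.0, 0.0, 0.0)
```

With only the shard factor set, this layout looks nearly balanced; with the index factor set, the same layout shows up as clearly lopsided, which is the difference an index-aware allocator is meant to capture.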

  9. What can I adjust?
    • how important each of these is to you
      • balance over # of shards
      • balance over indices
      • balance over primaries
    • how aggressively rebalancing acts
      • a threshold defining the minimum weight delta between 2 nodes required to issue a rebalance operation. Default is 1.0f
    • ...more to come
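The threshold can be read as a simple gate on the weight difference between the heaviest and the lightest node. The sketch below assumes a strict "greater than" comparison and per-node weights that have already been computed for one index; `needs_rebalance` is a hypothetical helper, not an Elasticsearch API. In released versions the threshold is exposed as the `cluster.routing.allocation.balance.threshold` setting.

```python
def needs_rebalance(weights, threshold=1.0):
    """Return True if the weight delta between two nodes exceeds `threshold`.

    weights: dict mapping node name -> weight for one index (assumed input).
    """
    if len(weights) < 2:
        return False  # nothing to move shards between
    delta = max(weights.values()) - min(weights.values())
    return delta > threshold


# Delta of exactly 1.0 does not exceed the default threshold.
calm = needs_rebalance({"node1": 0.5, "node2": -0.5})
# Delta of 3.0 does.
busy = needs_rebalance({"node1": 1.5, "node2": -1.5})
```

A higher threshold makes the cluster tolerate more imbalance before moving shards; a lower one makes rebalancing more aggressive.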

  10. Example settings...
    Acts like EvenShardsCountAllocator:
    curl -XPUT localhost:9200/_cluster/settings -d '{
      "transient" : {
        "cluster.routing.allocation.balance.index" : 0.0,
        "cluster.routing.allocation.balance.shard" : 1.0,
        "cluster.routing.allocation.balance.primary" : 0.0
      }
    }'
    Defaults:
    curl -XPUT localhost:9200/_cluster/settings -d '{
      "transient" : {
        "cluster.routing.allocation.balance.index" : 0.55,
        "cluster.routing.allocation.balance.shard" : 0.4,
        "cluster.routing.allocation.balance.primary" : 0.05
      }
    }'
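Hand-written JSON bodies are easy to get subtly wrong (a trailing comma, for example, is not valid JSON). One way around that is to generate the request body. The helper below is a sketch: `balance_settings` and its parameters are hypothetical names, and only the three setting keys shown on this slide are used.

```python
import json

PREFIX = "cluster.routing.allocation.balance."


def balance_settings(index, shard, primary, persistent=False):
    """Build the JSON body for a cluster settings update (illustrative helper)."""
    factors = {"index": index, "shard": shard, "primary": primary}
    for name, value in factors.items():
        if value < 0:
            raise ValueError("balance factor %r must not be negative" % name)
    # "transient" settings are lost on a full cluster restart;
    # "persistent" ones survive it.
    scope = "persistent" if persistent else "transient"
    body = {scope: {PREFIX + name: value for name, value in factors.items()}}
    return json.dumps(body, indent=2)


# Body equivalent to the "shard count only" example.
even_body = balance_settings(0.0, 1.0, 0.0)
print(even_body)
```

The resulting string can be passed to `curl -XPUT localhost:9200/_cluster/settings -d "$BODY"` or sent with any HTTP client.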

  11. Future Work?
    • Ways to expand the weight function
      • size of a shard
      • average number of requests on a shard
      • number of docs in the shard

    • Eventually we want the weight function to be customizable, so that users can balance their cluster based on their needs.

  12. Improvements in the Pipeline
    • Lucene 4.0 / 4.1
    • Codec Support (0.21)
    • Concurrent Flushing (0.21)
    • Spellchecking / Suggestions (0.21)
    • Similarity per Field
    • FieldData Refactoring
    • API (0.21)
    • Implementations (0.2?)

  13. Lucene 4.0 / 4.1
    • Many features under the hood
    • Massive improvements in internal memory consumption
    • Compression built in
    • Fast FuzzyQuery
    • Faster batch indexing
    • Bloom filters built at index time
    • refresh might be much cheaper now
    • Default encoding on disk is based on blocks...

  14. FieldData?
    • Used for Faceting, Sorting, Scoring
    • Until 0.20 not very flexible
      • Implementation details leaked through the interface
    • 0.21 adds a new interface in order to improve memory and runtime performance
      • new FieldData will allow specialized implementations / data structures per field
      • Defaults will be much more memory efficient (UTF-8 bytes vs. UTF-16 chars)
      • Future implementations can even read from memory maps etc.
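To get a feel for the UTF-8 vs. UTF-16 point: a Java `char` is a 2-byte UTF-16 code unit, while UTF-8 needs only one byte for ASCII characters. The sketch below compares raw encoded sizes for a term; `encoded_sizes` is a hypothetical helper and it ignores JVM object overhead, so it only illustrates the payload difference.

```python
def encoded_sizes(term):
    """Return (utf8_bytes, utf16_bytes) for a term, without any BOM."""
    return len(term.encode("utf-8")), len(term.encode("utf-16-le"))


# Pure ASCII term: UTF-8 needs half the bytes of UTF-16.
ascii_sizes = encoded_sizes("elasticsearch")
# One non-ASCII letter costs 2 bytes in UTF-8, still 2 in UTF-16.
umlaut_sizes = encoded_sizes("Müller")
```

For mostly-ASCII field values the UTF-8 representation is close to half the size; the gap narrows for text dominated by non-Latin scripts.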

  15. That’s it folks....

    Ask your questions...!
