Not all Nodes are Created Equal - Scaling Elasticsearch

Boaz Leskes
November 24, 2014

Elasticsearch is famous for being easy to set up and for having good defaults. A single node can go a long way and a handful of nodes will deliver a surprising punch. However, there comes a point where generic defaults become less than ideal and cluster architecture starts to matter. In this session, we will talk about capacity planning and custom setups suitable for large Elasticsearch deployments.

The talk was given at the Elasticsearch meetup in Tel Aviv on 24 Nov 2014

Transcript

  1. Scaling Elasticsearch
    Not all Nodes are Created Equal
    Boaz Leskes
    @bleskes

  2. Copyright Elasticsearch 2014. Copying, publishing and/or distributing without written permission is strictly prohibited
    Elasticsearch
    • Real time Search and Analytics Engine
    • Schema-free, REST & JSON based document store
    • Distributed and horizontally scalable
    • Open Source: Apache License 2.0
    • Zero configuration
    • Written in Java, extensible

  4. Big Numbers Are No Fun
    they cost money, time and hair™

  5. so we want to know
    • that money needs to be spent
    • but also that we’re safe

  6. So how do we go from
    • I need to index 500GB (or 500MB) of application data per day
    • I need to serve 10,000 (or 3) requests per second

  7. To…
    • I need 20 nodes (or 2)
    • With SSDs (or maybe spinning disks are fine)
    • With 64GB (or 8GB) each

  8. the essentials
    How Elasticsearch Works

  9. Each node:
    • stores data:
      • indexes, stores and searches data
    • can become master:
      • performs cluster administration
    • receives requests:
      • coordination and response merging
    [Diagram: 3 nodes]

  10. role separation
    [Diagram: a larger cluster split by role, labeled as 10 data nodes, 3 master nodes and 4 client nodes]
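
The role split on this slide comes down to two per-node booleans. The following is a minimal sketch, assuming the `node.master` and `node.data` setting names used by the Elasticsearch 1.x releases current at the time of the talk; the helper `role_settings` is hypothetical, and later releases configure roles differently.

```python
# Hypothetical helper: map a node role to elasticsearch.yml settings.
# Assumes the Elasticsearch 1.x setting names node.master and node.data.
ROLE_SETTINGS = {
    "data":   {"node.master": False, "node.data": True},
    "master": {"node.master": True,  "node.data": False},
    "client": {"node.master": False, "node.data": False},
}


def role_settings(role: str) -> str:
    """Render the settings for one role as elasticsearch.yml lines."""
    settings = ROLE_SETTINGS[role]
    return "\n".join(
        f"{key}: {str(value).lower()}" for key, value in settings.items()
    )


if __name__ == "__main__":
    for role in ("data", "master", "client"):
        print(f"# {role} node\n{role_settings(role)}\n")
```

A client node in this sketch holds no data and is never elected master, so it only coordinates requests and merges responses.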

  11. Lots of data == Sharding
    [Diagram: an index split into shards 1-4 spread over two nodes, with copies 1-4 of those shards on two more nodes]

  12. More nodes, less sharing
    [Diagram: the same shards and copies spread over more nodes, so each node holds fewer of them]

  13. indexing single doc
    # curl -XPUT localhost:9200/index1/type/id -d '{ "f": 1 }'
    [Diagram: the request arrives at any node and the doc is forwarded to one of shards 1-4]
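
How a receiving node might pick the target shard can be sketched as a hash of the document id modulo the number of primary shards. This is an illustration under assumptions: `pick_shard` is a hypothetical name, shards are numbered from 0 here rather than from 1 as on the slide, and CRC32 is a stand-in hash, not a claim about the hash function Elasticsearch uses.

```python
import zlib


def pick_shard(doc_id: str, num_primary_shards: int) -> int:
    """Return a shard number in [0, num_primary_shards) for a document id.

    CRC32 is a stand-in hash for illustration only.
    """
    return zlib.crc32(doc_id.encode("utf-8")) % num_primary_shards


if __name__ == "__main__":
    for doc_id in ("id", "user-1", "user-2"):
        print(doc_id, "->", pick_shard(doc_id, 4))
```

Because the result depends on the shard count, the same id would land elsewhere if the number of primary shards changed, which is one reason to settle on the shard count before indexing.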

  14. bulk indexing
    # curl -XPUT localhost:9200/index1/type/_bulk -d ……
    [Diagram: the request arrives at any node and its docs are spread over shards 1-4]
    more shards == scaling without sharing resources

  15. search
    # curl localhost:9200/index/_search?q=something
    [Diagram: the request arrives at any node and is sent to shards 1-4]
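
The coordination and response merging done by the receiving node can be sketched as a scatter-gather: every shard returns its own top hits, and the coordinator merges them into a global top list. This is a simplified sketch; `search_shard` and `coordinate` are hypothetical names, and each shard is modeled as a dict of made-up, precomputed scores.

```python
import heapq
from typing import Dict, List, Tuple

Hit = Tuple[float, str]  # (score, doc id)


def search_shard(docs: Dict[str, float], size: int) -> List[Hit]:
    """Hypothetical per-shard search: return the local top `size` hits."""
    return heapq.nlargest(size, ((score, doc_id) for doc_id, score in docs.items()))


def coordinate(shards: List[Dict[str, float]], size: int) -> List[Hit]:
    """Ask every shard for its top hits, then merge into a global top `size`."""
    per_shard = [search_shard(docs, size) for docs in shards]
    return heapq.nlargest(size, (hit for hits in per_shard for hit in hits))


if __name__ == "__main__":
    # Four shards with made-up scores, for illustration only.
    shards = [
        {"a": 0.9, "b": 0.2},
        {"c": 0.7},
        {"d": 0.8, "e": 0.1},
        {"f": 0.3},
    ]
    print(coordinate(shards, 3))  # [(0.9, 'a'), (0.8, 'd'), (0.7, 'c')]
```

Each shard only needs to return `size` hits, since no shard can contribute more than that to the merged result.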

  16. search - more replicas
    # curl localhost:9200/index/_search?q=something
    [Diagram: two searches arrive at any node; each is served by a different set of copies of shards 1-4]
    more replicas == scaling without sharing resources

  17. search - single shard, single request
    [Diagram: the index inside one shard, with labels (nosql, 128, New York, lat=6.9, lon=50, F) next to sorted lists of doc IDs: 2 6 8 48 112 379; 6 9 10 48; 11 13 14 134 207; 6 9; 2 4 9 36 103 310]
    search time ~ number of doc hits ~ number of docs in shard
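
Why search time tracks the number of matching docs can be illustrated by intersecting two sorted doc-id lists, as a query matching two terms would have to. This is a deliberately simplified sketch: the work grows with the length of the lists, while a real search engine adds optimizations such as skip lists. The input lists are two of those shown on the slide, without assuming which label each belongs to.

```python
from typing import List


def intersect(a: List[int], b: List[int]) -> List[int]:
    """Intersect two sorted doc-id lists with a linear two-pointer walk."""
    i, j, out = 0, 0, []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return out


if __name__ == "__main__":
    # Two of the doc-id lists shown on the slide.
    print(intersect([2, 6, 8, 48, 112, 379], [6, 9, 10, 48]))  # [6, 48]
```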

  18. in short
    • indexing:
      • # shards → higher throughput
    • searching:
      • # shards → more data (fixed latency)
      • # replicas → higher throughput
    • and:
      • shard capacity is the base metric

  19. Sizing a shard
    measurement

  20. shard size
    [Diagram: one node with one shard; a single indexer feeds it a doc mix while a single searcher runs a query mix. Data grows over time, and the point where a search takes 160ms marks the shard size]
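
Turning such an experiment into a number can be sketched as follows: record search latency while the shard grows, then take the largest size that still meets the latency target. The function name `max_shard_size`, the 200ms budget and all sample values are made up for illustration; only the 160ms reading echoes the slide.

```python
from typing import List, Optional, Tuple


def max_shard_size(
    samples: List[Tuple[float, float]], latency_budget_ms: float
) -> Optional[float]:
    """Return the largest measured shard size (GB) that stays within budget.

    `samples` holds (shard_size_gb, search_latency_ms) pairs in the order
    they were measured while the shard grew. Scanning stops at the first
    sample over budget; None means even the smallest sample was too slow.
    """
    best = None
    for size_gb, latency_ms in samples:
        if latency_ms > latency_budget_ms:
            break
        best = size_gb
    return best


if __name__ == "__main__":
    # Made-up measurements for illustration only.
    samples = [(5, 40.0), (10, 75.0), (20, 160.0), (40, 330.0)]
    print(max_shard_size(samples, latency_budget_ms=200.0))  # 20
```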

  21. shard throughput (version 1)
    [Diagram: one node with one shard of known size; a fixed set of multiple searchers runs a query mix while the number of indexers is scaled up, giving max docs/sec]

  22. shard throughput (version 2)
    [Diagram: one node with one shard of known size; a fixed set of multiple indexers feeds a doc mix while the number of searchers is scaled up, giving max q/sec]

  23. what did we learn?
    • Max Shard Capacity
      • dictated by latency
    • (Max) Shard/Node Throughput
      • indexing
      • searching
    • Resources needed to support a shard under required load
      • CPU
      • Memory
      • IO

  24. what does it tell us?
    • How many shards we need
      • to fit the data
      • to support our indexing/searching requirements
    • How many shards we can put on a node
      • if we didn’t max out resources
    • How many nodes we need
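
The three questions above can be chained into a back-of-the-envelope calculation. This is a rough sketch, not a sizing formula from the talk: `plan_cluster` and its parameters are hypothetical, all input numbers are made up, and the model ignores details such as the indexing load that replicas also carry.

```python
import math
from typing import Dict


def plan_cluster(
    total_data_gb: float,
    max_shard_gb: float,
    index_rate: float,
    shard_index_rate: float,
    query_rate: float,
    shard_copy_query_rate: float,
    shards_per_node: int,
) -> Dict[str, int]:
    """Estimate shard and node counts from measured per-shard limits."""
    # Primary shards: enough to fit the data and to absorb the indexing rate.
    primaries = max(
        math.ceil(total_data_gb / max_shard_gb),
        math.ceil(index_rate / shard_index_rate),
    )
    # Each search touches one copy of every shard, so the number of copies
    # per shard (primary + replicas) scales the sustainable query rate.
    copies = max(1, math.ceil(query_rate / shard_copy_query_rate))
    nodes = math.ceil(primaries * copies / shards_per_node)
    return {"primaries": primaries, "replicas": copies - 1, "nodes": nodes}


if __name__ == "__main__":
    # Made-up inputs for illustration only.
    print(plan_cluster(
        total_data_gb=400, max_shard_gb=20,
        index_rate=5000, shard_index_rate=1000,
        query_rate=100, shard_copy_query_rate=40,
        shards_per_node=5,
    ))  # {'primaries': 20, 'replicas': 2, 'nodes': 12}
```

In this example the data volume, not the indexing rate, decides the number of primary shards (20 versus 5), and the query rate decides the number of replicas.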

  25. do more with less?
    • Tweak Queries / data structures
      • measure the effect
    • Invest in your real bottleneck
      • faster storage
      • more memory
      • more cores
    • Use your resources efficiently
      • dedicated nodes for hot indices
      • shared nodes for old data
    • Sometimes, you just need those nodes

  26. keep it simple
    • a few nodes can take you a surprisingly long way
    • defaults == predictable
    • Dedicated master nodes
      • as soon as you start to grow
    • Even simple experiments teach you a lot

  27. thank you!
    http://elasticsearch.com/support
    @elasticsearch , @bleskes
    http://elasticsearch.org/resources
