Not all Nodes are Created Equal - Scaling Elasticsearch

Scaling Elasticsearch Not all Nodes are Created Equal Boaz Leskes
@bleskes

Copyright Elasticsearch 2014. Copying, publishing and/or distributing without written permission is strictly prohibited

Elasticsearch
• Real time Search and Analytics Engine
• Schema-free, REST & JSON based document store
• Distributed and horizontally scalable
• Open Source: Apache License 2.0
• Zero configuration
• Written in Java, extensible

Big Numbers Are No Fun
they cost money, time and hair™

so we want to know…
• that money needs to be spent
• but also that we’re safe

So how do we go from
• I need to index 500GB (or 500MB) per day of application data
• I need to serve 10,000 (or 3) requests per second

To…
• I need 20 nodes (or 2).
• With SSDs (or maybe spinning disks are fine)
• With 64GB (or 8GB) each.

the essentials
How Elasticsearch Works

Each node:
• stores data:
  • indexes, stores and searches data
• can become master:
  • performs cluster administration
• receives requests:
  • coordination and response merging
[Diagram: 3 nodes]

[Diagram: a cluster made up of data nodes only]

role separation
[Diagram: dedicated data nodes, master nodes and client nodes]

Lots of data == Sharding
[Diagram: an index split into shards 1-4; the shards and their copies 1-4 are spread across 4 nodes]

More nodes, less sharing
[Diagram: the same shards and copies spread across more nodes, so each node holds fewer of them]

indexing single doc
# curl -XPUT localhost:9200/index1/type/id -d '{ "f": 1 }'
[Diagram: the request can be sent to any node, which forwards the document to one of shards 1-4]
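How a single document ends up on exactly one shard can be sketched as a hash of its routing key (by default the document id) taken modulo the number of primary shards. The snippet below is only an illustration of that idea: `pick_shard` is a hypothetical helper, CRC32 is a stand-in for Elasticsearch's internal hash function, and shard numbers are 0-based here while the diagram counts from 1.

```python
import zlib


def pick_shard(routing_key: str, num_primary_shards: int) -> int:
    """Map a routing key (assumed to be the document id) to a primary shard."""
    # Stand-in hash: CRC32 is chosen only because it is deterministic and in the
    # standard library. Elasticsearch uses its own hash function internally.
    hashed = zlib.crc32(routing_key.encode("utf-8"))
    return hashed % num_primary_shards


# The same key always lands on the same shard, whichever node receives the request.
for doc_id in ["id", "user-1", "user-2", "user-3"]:
    print(doc_id, "-> shard", pick_shard(doc_id, 4))
```

Because the shard number depends on the number of primary shards, that count has to stay stable once documents have been indexed with this scheme.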

bulk indexing
# curl -XPUT localhost:9200/index1/type/_bulk -d ……
[Diagram: the bulk request can be sent to any node, which splits it across shards 1-4]
more shards == scaling without sharing resources
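The body of a bulk request is newline-delimited JSON: one action line followed by one source line per document, with a trailing newline at the end. A minimal sketch of building such a body follows; `build_bulk_body` and the sample documents are made up for illustration, and the index and type are assumed to come from the URL as in the command above.

```python
import json


def build_bulk_body(docs):
    """Build a newline-delimited bulk body from (doc_id, source) pairs."""
    lines = []
    for doc_id, source in docs:
        # Action line: index the document under the given id.
        lines.append(json.dumps({"index": {"_id": doc_id}}))
        # Source line: the document itself, on a single line.
        lines.append(json.dumps(source))
    # The bulk format expects every line, including the last, to end in a newline.
    return "\n".join(lines) + "\n"


body = build_bulk_body([("1", {"f": 1}), ("2", {"f": 2})])
print(body, end="")
```

Each document in the body is routed to its own shard, which is why one bulk request can keep several shards busy at once.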

search
# curl localhost:9200/index/_search?q=something
[Diagram: the request can be sent to any node, which queries shards 1-4 and merges the responses]

search - more replicas
# curl localhost:9200/index/_search?q=something
[Diagram: two concurrent search requests, each sent to any node and each served by a different copy of shards 1-4]
more replicas == scaling without sharing resources

search - single shard, single request
[Diagram: the index structures inside a single shard; terms and values such as "nosql", "128", "New York", "lat=6.9", "lon=50" and "F" each point to a sorted list of document ids]
search time ~ number of hits ~ number of docs in shard

in short
• indexing:
  • # shards → higher throughput
• searching:
  • # shards → more data (fixed latency)
  • # replicas → higher throughput
• and:
  • shard capacity is the base metric

Sizing a shard
measurement

shard size
[Diagram: one shard on one node, fed by a single indexer (doc mix) and queried by a single searcher (query mix). Plot axes: data, time, shard size. Annotation: search takes 160ms]
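The measurement above boils down to: keep indexing into a single shard, keep timing the same query mix, and note the largest shard size that still meets the latency target. A rough sketch of that last step is shown below. The helper name and the sample measurements are invented for illustration; only the 160ms figure comes from the slide.

```python
def max_shard_size(measurements, latency_budget_ms):
    """Return the largest measured shard size whose search latency fits the budget.

    measurements: iterable of (shard_size_gb, search_latency_ms) pairs.
    Returns None if even the smallest measured shard is too slow.
    """
    best = None
    for size_gb, latency_ms in sorted(measurements):
        if latency_ms > latency_budget_ms:
            # Latency is assumed to grow with shard size, so stop at the first miss.
            break
        best = size_gb
    return best


# Hypothetical measurements: (shard size in GB, search latency in ms).
samples = [(10, 20), (20, 45), (40, 95), (80, 160), (160, 340)]
print(max_shard_size(samples, latency_budget_ms=160))
```

The result is only as good as the doc mix and query mix used while measuring, so both should resemble production traffic.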

shard throughput (version 1)
[Diagram: one shard of known size on one node; scale the number of indexers while a fixed set of multiple searchers runs the query mix; measure max docs/sec]

shard throughput (version 2)
[Diagram: one shard of known size on one node; scale the number of searchers while a fixed set of multiple indexers feeds the doc mix; measure max q/sec]

what did we learn?
• Max Shard Capacity
  • dictated by latency
• (Max) Shard/Node Throughput
  • indexing
  • searching
• Resources needed to support a shard under required load
  • CPU
  • Memory
  • IO

what does it tell us?
• How many shards we need
  • to fit the data
  • to support our indexing/searching requirements
• How many shards we can put on a node
  • if we didn’t max out resources
• How many nodes we need
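The chain of reasoning above (shards to fit the data, shards to sustain indexing, shards per node, nodes) can be written down as simple arithmetic. The sketch below is a deliberately simplified model, not a formula from the talk: the function name, the fixed replica count and all input numbers are assumptions, and search throughput is left out on purpose.

```python
import math


def estimate_cluster(total_data_gb, max_shard_gb,
                     docs_per_sec, max_docs_per_sec_per_shard,
                     replicas, shards_per_node):
    """Rough sizing from shard-level measurements (simplified model)."""
    # Shards needed so that no shard exceeds its measured maximum size.
    shards_for_data = math.ceil(total_data_gb / max_shard_gb)
    # Shards needed so that no shard exceeds its measured indexing throughput.
    shards_for_indexing = math.ceil(docs_per_sec / max_docs_per_sec_per_shard)
    primaries = max(shards_for_data, shards_for_indexing)
    # Every primary has `replicas` additional copies.
    total_copies = primaries * (1 + replicas)
    # Nodes needed if each node can host `shards_per_node` shard copies.
    nodes = math.ceil(total_copies / shards_per_node)
    return primaries, total_copies, nodes


# Hypothetical inputs: 7 days of 500GB/day, 80GB max shard size,
# 20,000 docs/sec at 5,000 docs/sec per shard, 1 replica, 4 shard copies per node.
print(estimate_cluster(3500, 80, 20000, 5000, replicas=1, shards_per_node=4))
```

With these made-up inputs the data volume, not the indexing rate, decides the number of primary shards, and the replica count would be the place to add search throughput.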

do more with less?
• Tweak Queries / data structures
  • measure the effect
• Invest in your real bottleneck
  • faster storage
  • more memory
  • more cores
• Use your resources efficiently
  • dedicated nodes for hot indices
  • shared nodes for old data
• Sometimes, you just need those nodes

keep it simple
• a few nodes can take you a surprisingly long way
• defaults == predictable
• Dedicated master nodes
  • as soon as you start to grow
• Even simple experiments teach you a lot

thank you!
http://elasticsearch.com/support
@elasticsearch, @bleskes
http://elasticsearch.org/resources
