Scaling Elasticsearch

Scaling Elasticsearch janko-m @jankomarohnic at AT&T M2X

Elasticsearch is a distributed, RESTful search engine

Elasticsearch is a distributed, RESTful search engine
GET /users/_search
{"query":{"term":{"first_name":"janko"}}}
=> { ... }

Elasticsearch is a distributed, RESTful search engine
"car"
A car (or automobile) is a wheeled motor vehicle used for transportation. Most definitions of car say they run primarily on roads, seat one to eight people, have four tires, and mainly transport people rather than goods. Cars came into global use during the 20th century, and developed economies depend on them. The year 1886 is regarded as the birth year of the modern car, when German inventor Karl Benz built his Benz Patent-Motorwagen. Cars did not become widely available until the early 20th century. One of the first cars that was accessible to the masses was the 1908 Model T, an American car manufactured by the Ford Motor Company. Cars were rapidly adopted in the US, where they replaced animal-drawn carriages and carts, but took much longer to be accepted in Western Europe and other parts of the world.

Cluster – group of nodes
Node – server that holds part of the data
Index – similar to a "table" in a relational database
Shard – partition of an index

M2X
• Stores time-series data
• Datapoints (temperature, humidity, speed etc.)
• Geolocations
• Triggers
• Widgets & Dashboards

M2X – datapoints
Month             New Datapoints
July 2016             42,216,421
August 2016           38,456,296
September 2016        43,252,336
October 2016          89,572,942
November 2016        222,608,051
December 2016        338,588,651
January 2017       2,326,317,955
February 2017      7,192,489,182
…                              …

#1 – Time-series indices
datapoints

#1 – Time-series indices
…
datapoints-201501, datapoints-201502, datapoints-201503, datapoints-201504, datapoints-201505
…
datapoints-201601, datapoints-201602, datapoints-201603, datapoints-201604, datapoints-201605
…

#1 – Time-series indices
…
datapoints-20170101, datapoints-20170102, datapoints-20170103, datapoints-20170104, datapoints-20170105
…
datapoints-20170201, datapoints-20170202, datapoints-20170203, datapoints-20170204, datapoints-20170205
…
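
With daily indices, each datapoint has to be written to the index that matches its timestamp. A minimal sketch of that mapping, assuming the `datapoints-YYYYMMDD` naming shown above and UTC day boundaries; `daily_index_name` is a hypothetical helper, not something taken from the talk:

```python
from datetime import datetime, timezone

def daily_index_name(timestamp: datetime, prefix: str = "datapoints") -> str:
    """Return the name of the daily index a datapoint belongs to."""
    # Normalize to UTC so the same instant always maps to the same index,
    # whatever time zone the writer runs in (an assumption of this sketch).
    utc = timestamp.astimezone(timezone.utc)
    return f"{prefix}-{utc:%Y%m%d}"

print(daily_index_name(datetime(2017, 1, 3, 14, 30, tzinfo=timezone.utc)))
# prints datapoints-20170103
```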

#2 – Reduce number of indices you need to query
Index                  Date
…                      …
datapoints-20161229    29 December 2016
datapoints-20161230    30 December 2016
datapoints-20161231    31 December 2016
datapoints-20170101    1 January 2017
datapoints-20170102    2 January 2017
datapoints-20170103    3 January 2017
datapoints-20170104    4 January 2017
datapoints-20170105    5 January 2017
datapoints-20170106    6 January 2017
datapoints-20170107    7 January 2017
…                      …
[Marker on the timeline: "Device created"]
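
One way to read this slide: a device cannot have datapoints in indices older than its creation date, so those indices can be left out of the query. A rough sketch under that assumption; `indices_since` and its parameters are hypothetical names:

```python
from datetime import date, timedelta

def indices_since(created: date, today: date, prefix: str = "datapoints") -> list[str]:
    """List the daily indices from the device's creation date through today."""
    days = (today - created).days
    # range() is empty when `created` lies in the future, so nothing is queried.
    return [f"{prefix}-{created + timedelta(days=n):%Y%m%d}" for n in range(days + 1)]

print(indices_since(date(2017, 1, 1), date(2017, 1, 3)))
# prints ['datapoints-20170101', 'datapoints-20170102', 'datapoints-20170103']
```

The resulting names can be joined with commas in the request path, which Elasticsearch accepts for searching several indices at once.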

#3 – Query fewer indices at once
Q: Fetch 100 latest datapoints
GET /all-datapoints/_search
GET /v1-datapoints-20170616/_search
GET /v1-datapoints-20170615/_search
GET /v1-datapoints-20170614/_search
GET /v1-datapoints-20170613/_search
GET /v1-datapoints-20170612/_search
…
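
The day-by-day requests above can be sketched as a loop that walks the daily indices from newest to oldest and stops as soon as enough datapoints have been collected. This is an illustration only: `search` is a hypothetical callable standing in for one `_search` request against a single index, assumed to return hits newest first.

```python
from typing import Callable, Iterable

def fetch_latest(indices_newest_first: Iterable[str],
                 search: Callable[[str, int], list],
                 limit: int = 100) -> list:
    """Collect up to `limit` hits, querying one daily index at a time."""
    hits: list = []
    for index in indices_newest_first:
        remaining = limit - len(hits)
        if remaining <= 0:
            break  # enough datapoints found; older indices are never touched
        # Ask each index only for as many hits as are still missing.
        hits.extend(search(index, remaining))
    return hits
```

When recent indices already hold 100 datapoints, only one or two small requests are issued instead of a single search across every index.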

#4 – # shards ≈ # nodes

#4 – # shards ≈ # nodes
5 shards

#4 – # shards ≈ # nodes
15 shards
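
The number of primary shards is set when an index is created, through the `number_of_shards` index setting. A small sketch of a create-index body that follows the rule of thumb above; the node count of 15 is an assumption chosen to match the slide, and `index_settings` is a hypothetical helper:

```python
def index_settings(node_count: int) -> dict:
    """Build a create-index body with one primary shard per data node."""
    if node_count < 1:
        raise ValueError("node_count must be at least 1")
    return {"settings": {"index": {"number_of_shards": node_count}}}

print(index_settings(15))
# prints {'settings': {'index': {'number_of_shards': 15}}}
```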

#5 – Determine a good sharding strategy
[Diagram: QUERY → 15 shards (⚙) → RESULT]

#5 – Determine a good sharding strategy
[Diagram: QUERY → 1 shard (⚙) → RESULT]

#5 – Determine a good sharding strategy
Device: Stream 1, Stream 2, Stream 3, Stream 4, Stream 5, …
SHARDING
Q: Fetch stream values => 1 shard
Q: Fetch device values => 15 shards

#5 – Determine a good sharding strategy
Device: Stream 1, Stream 2, Stream 3, Stream 4, Stream 5, …
SHARDING
Q: Fetch stream values => 1 shard
Q: Fetch device values => 1 shard
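
The two slides appear to contrast sharding by stream with sharding by device. Elasticsearch picks a shard by hashing a routing value modulo the number of primary shards, so the choice of routing value decides how many shards a query has to visit. The simulation below illustrates the idea only: it uses CRC32 as a stand-in for Elasticsearch's internal Murmur3 hash, and the device and stream IDs are made up.

```python
import zlib

def shard_for(routing_key: str, num_shards: int = 15) -> int:
    """Pick a shard from a routing key (stand-in for Elasticsearch's own hash)."""
    # CRC32 is used only because it ships with Python and is stable across runs;
    # Elasticsearch itself hashes the routing value with Murmur3.
    return zlib.crc32(routing_key.encode("utf-8")) % num_shards

streams = [f"stream-{n}" for n in range(1, 6)]

# Routing by stream: each stream of a device may land on a different shard.
by_stream = {shard_for(f"device-1/{s}") for s in streams}

# Routing by device: every stream of the device shares the same shard.
by_device = {shard_for("device-1") for _ in streams}

print(len(by_stream), len(by_device))
```

With device-based routing, both "fetch stream values" and "fetch device values" can be answered by a single shard, at the cost of larger devices concentrating their data on one shard.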

#6 – Upgrade Elasticsearch
5.1, 5.0
2.4, 2.3, 2.2, 2.1, 2.0
1.7, 1.6, 1.5, 1.4, 1.3
0.90

#6 – Upgrade Elasticsearch
v1.x
Loads fields into memory and creates a data structure for searching at query time
• Slow searches
• No available memory for caching
• OutOfMemory exceptions for large indices

#6 – Upgrade Elasticsearch
v2.x
Creates a columnar data structure on disk at write time
• Fast searches
• Small memory usage
• Works for indices of any size

#7 – Scroll in large pages
GET /datapoints/_search?size=1000&scroll=1m
GET /_search/scroll
GET /_search/scroll
GET /_search/scroll
GET /_search/scroll
…

#7 – Scroll in large pages
GET /datapoints/_search?size=10000&scroll=1m
GET /_search/scroll
GET /_search/scroll
GET /_search/scroll
GET /_search/scroll
…
5x faster datapoint exports
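
Every scroll page costs one HTTP round trip, so a larger page size means fewer requests for the same export. The generator below sketches the loop; `first_page` and `next_page` are hypothetical callables standing in for the initial search request and the follow-up scroll requests, each returning a scroll ID and a list of hits.

```python
from typing import Callable, Iterator, Optional, Tuple

Page = Tuple[Optional[str], list]  # (scroll ID for the next page, hits)

def scroll_all(first_page: Callable[[int], Page],
               next_page: Callable[[str], Page],
               page_size: int = 10000) -> Iterator[dict]:
    """Yield every hit, fetching `page_size` documents per round trip."""
    scroll_id, hits = first_page(page_size)
    while hits:  # an empty page means the scroll is exhausted
        yield from hits
        if scroll_id is None:
            break
        scroll_id, hits = next_page(scroll_id)
```

Exporting 1,000,000 datapoints takes roughly 1,000 round trips at `size=1000` and roughly 100 at `size=10000`.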

#8 – Use cached counts when possible
GET /all-datapoints/datapoint/_count

#8 – Use cached counts when possible
GET /_cat/indices
index       docs.count
users       110843
…
datapoints  43879824976
…
devices     180301
streams     11793537
Datapoint count speedup from 18s to 0.7s
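
The counts can be read from the plain-text `_cat/indices` response. A minimal parsing sketch, assuming the request includes the `v` parameter (header row) and `h=index,docs.count` (column selection), and that every row has a value in both columns; `parse_doc_counts` is a hypothetical helper:

```python
def parse_doc_counts(cat_output: str) -> dict[str, int]:
    """Map index name to docs.count from `_cat/indices` text with a header row."""
    lines = [line.split() for line in cat_output.strip().splitlines()]
    header, rows = lines[0], lines[1:]
    # Locate columns by name so the column order in the response does not matter.
    name_col = header.index("index")
    count_col = header.index("docs.count")
    return {row[name_col]: int(row[count_col]) for row in rows}

sample = """\
index      docs.count
users      110843
datapoints 43879824976
devices    180301
streams    11793537
"""
print(parse_doc_counts(sample)["datapoints"])
# prints 43879824976
```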

#9 – Use timeouts
[Diagram: timeline of an API request and the Elasticsearch request it triggers, with a 30s mark]

#9 – Use timeouts
GET /datapoints/_search
{"query":{…}}
GET /datapoints/_search
{"query":{…},"timeout":"30s"}
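
A small sketch of keeping the search timeout in step with the API's own request limit. The 30-second value comes from the slide; `with_timeout` is a hypothetical helper. Note that the search `timeout` is applied on a best-effort basis: when it is hit, Elasticsearch returns the hits gathered so far and sets `timed_out` to true in the response.

```python
API_TIMEOUT_SECONDS = 30  # the API request limit shown on the slide

def with_timeout(body: dict, seconds: int = API_TIMEOUT_SECONDS) -> dict:
    """Return a copy of a search body that carries a matching `timeout`."""
    # The original body is left untouched; only the copy gains the new key.
    return {**body, "timeout": f"{seconds}s"}

print(with_timeout({"query": {"match_all": {}}}))
# prints {'query': {'match_all': {}}, 'timeout': '30s'}
```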

#10 – Use unscored queries
{
  "query": {
    "bool": {
      "must": { "term": { "first_name": "janko" } }
    }
  }
}

#10 – Use unscored queries
{
  "_index": "accounts",
  "_type": "account",
  "_id": "AVWGIxr7A4FHE06BKOJi",
  "_score": 11.933598,
  "_source": { … }
}

#10 – Use unscored queries
{
  "query": {
    "bool": {
      "filter": { "term": { "first_name": "janko" } }
    }
  }
}

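Moving a clause from `must` to `filter` keeps the same matching documents but skips relevance scoring, and filter clauses are eligible for caching. A tiny sketch of wrapping a clause that way; `unscored` is a hypothetical helper name:

```python
def unscored(clause: dict) -> dict:
    """Wrap a query clause in a bool filter so no relevance score is computed."""
    return {"query": {"bool": {"filter": clause}}}

print(unscored({"term": {"first_name": "janko"}}))
# prints {'query': {'bool': {'filter': {'term': {'first_name': 'janko'}}}}}
```
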
The End
