
elasticsearch @ ferret go

Short presentation on how we use (and *failed* to use properly) elasticsearch at ferret go, a media analysis startup. Given at the ES User Group Berlin, November 2012.

Fabian Neumann

November 27, 2012

Transcript

  1. elasticsearch @ ferret go
     ES UG Berlin Meetup, 2012-11-27
     Fabian Neumann (@hellp), Daniel Trümper (@truemped)
  2. "ferret go" -- THE PROJECT * media analysis * online,

    print, social * -> rss/atom -> storm -> ES -> web app * linguistics (sentiment, entity recognition etc.) * Also:redis, Python, Pyramid, ...
  3. "ferret go" -- THE LOCATION * Bernau b. Berlin, Brandenburg

    * Zickenschulze (German food) * No (good) Asian food * Rollator races * Like Kreuzberg without the fancy
  4. THE BLACK WEEK
     * WHY suddenly?
     * more data (1 October = 4 Julys)
     * moving indexes; (bulk) re-indexing
     * more users
     * long-term queries now even more long-term
     * config-/brain-less ES setup (which is nice!) only worked for so long
  5. ES SETUP
     * 2 indices
     * 6 data nodes (i7, 8 cores, 32G mem, 16G for ES)
     * each index: 12 shards
     * 3 replicas = 72 shards per index (too much, we know ...)
  6. SENSIBLE SHARD-BALANCING?
             INDX 1 (▒)      INDX 2 (▓)
     NODE 1  ▒ ▒ ▒ ▒ ▒       ▓
     NODE 2  ▒ ▒             ▓ ▓ ▓ ▓
     NODE 3  ▒               ▓ ▓ ▓ ▓ ▓
     NODE 4  ▒ ▒ ▒ ▒ ▒       ▓
     ...
     shard sizes: ▒ = 12G, ▓ = 0.5G
  7. SENSIBLE SHARD-BALANCING?
             INDX 1 (▒)      INDX 2 (▓)
     NODE 1  ▒ ▒ ▒ ▒ ▒       ▓
     NODE 2  ▒ ▒             ▓ ▓ ▓ ▓
     NODE 3  ▒               ▓ ▓ ▓ ▓ ▓
     NODE 4  ▒ ▒ ▒ ▒ ▒       ▓
     ...
     ^-- INDX 1 also gets the more complex queries
     shard sizes: ▒ = 12G, ▓ = 0.5G
  8. SENSIBLE LOAD-BALANCING?
     NODE 1  ▒ ▒ ▒ ▒ ▒   <-
     NODE 2  ▒ ▒         <-
     NODE 3  ▒           <-
     NODE 4  ▒ ▒ ▒ ▒ ▒   <-

     > import pyes
     > # All nodes in a list, passed to urllib3 PoolManager,
     > # free load-balancing, yay!
     > conn = pyes.ES([node1, node2, node3, node4])
     > res = conn.search(query_model.to_es_query())
     > return res
  9. SENSIBLE LOAD-BALANCING?
     NODE 1  ▒ ▒ ▒ ▒ ▒   <- <- <- <- <- <- <-
     NODE 2  ▒ ▒         <-
     NODE 3  ▒
     NODE 4  ▒ ▒ ▒ ▒ ▒

     > import pyes
     > # All nodes in a list, passed to urllib3 PoolManager,
     > # free load-balancing, yay! NOT! 3 are just fallback. Oops.
     > conn = pyes.ES([node1, node2, node3, node4])

     "The PoolManager will take care of reusing connections for you
     whenever you request the same host."
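
     The gotcha above: urllib3's PoolManager keys its connection pools by
     host, so one pyes.ES over a host list keeps talking to the first host
     and treats the rest as mere fallback. Spreading the load client-side
     means giving each request a different single-host connection. A minimal
     sketch of per-request rotation; the node names are placeholders, not
     the real topology:

        import itertools
        import pyes

        # One single-host connection per node; cycling through them spreads
        # the queries, because urllib3 reuses its pool per host.
        nodes = ["node1:9200", "node2:9200", "node3:9200", "node4:9200"]
        conns = itertools.cycle([pyes.ES(node) for node in nodes])

        def search(query):
            return next(conns).search(query)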
  10. SENSIBLE NODE CONFIGURATION?
      NODE 1  ▒ ▒ ▒ ▒ ▒   /(x.x)\  <-- JVM
      NODE 2  ▒ ▒
      NODE 3  ▒
      NODE 4  ▒ ▒ ▒ ▒ ▒   /(x.x)\

      $ grep cache /etc/elasticsearch/elasticsearch.yml
      $ (hey, that looked like /dev/null ...)
      $ grep OutOfMemoryErr /var/log/elasticsearch/heck.log | wc -l
      1337
      $ # ... or rather n00b
  11. SENSIBLE NODE CONFIGURATION?
      NODE 1  ▒ ▒ ▒ ▒ ▒   \(^.^)/  <-- JVM
      NODE 2  ▒ ▒
      NODE 3  ▒
      NODE 4  ▒ ▒ ▒ ▒ ▒   \(^.^)/

      $ grep cache /etc/elasticsearch/elasticsearch.yml
      index.cache.field.type: soft
      $ grep OutOfMemoryErr /var/log/elasticsearch/heck.log | wc -l
      0
      $ # much better
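
      The one-liner works because index.cache.field.type: soft keeps the
      field (facet) cache behind Java soft references, which the JVM may
      evict under memory pressure instead of throwing OutOfMemoryError.
      That ES generation also had companion settings to bound the cache
      outright; the values below are illustrative, not a recommendation:

        # /etc/elasticsearch/elasticsearch.yml
        index.cache.field.type: soft       # cache evictable by the GC
        index.cache.field.max_size: 50000  # cap the number of cache entries
        index.cache.field.expire: 10m      # expire idle entries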
  12. MANUALLY BALANCED SHARDS
              INDX 1 (▒)   INDX 2 (▓)
      NODE 1  ▒ ▒          ▓ ▓ ▓
      NODE 2  ▒ ▒          ▓ ▓ ▓
      NODE 3  ▒ ▒          ▓ ▓ ▓
      NODE 4  ▒ ▒          ▓ ▓ ▓
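
      ES balanced by shard count, not shard size, so a layout like this had
      to be arranged by hand. One way to script such moves is the
      _cluster/reroute API; a sketch with invented index, shard and node
      names:

        import json
        import urllib2  # Python 2, matching the pyes-era code above

        # Move shard 0 of index "indx1" between two (hypothetical) nodes.
        body = {"commands": [{"move": {
            "index": "indx1", "shard": 0,
            "from_node": "node1", "to_node": "node3",
        }}]}
        req = urllib2.Request("http://node1:9200/_cluster/reroute",
                              json.dumps(body))  # Request with data => POST
        print(urllib2.urlopen(req).read())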
  13. NO-DATA NODES FOR LOAD-BALANCING
              INDX 1 (▒)   INDX 2 (▓)
      NODE 1  ▒ ▒          ▓ ▓ ▓      <- <-
      NODE 2  ▒ ▒          ▓ ▓ ▓      <- <-
      NODE 3  ▒ ▒          ▓ ▓ ▓      <- <-
      NODE 4  ▒ ▒          ▓ ▓ ▓      <- <-
      NODE 5  (no data)                <- <- <- <- <- <- <- <- <-
      NODE 6  (no data)                <- <- <- <- <- <- <-
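
      A no-data node is a one-line setting: the node still joins the
      cluster and scatters/gathers search requests, but never hosts
      shards. In the style of the config snippets above, on NODE 5 and
      NODE 6:

        # /etc/elasticsearch/elasticsearch.yml
        node.data: false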
  14. no-data nodes :) free LB, as easy as HAProxy
      still too many shards :/
      <- ... and still too many queries/s :(
  15. NEXT STEPS -- TECH LEVEL
      * time slicing (flexibility in shard/index layout) -- see the sketch below
      * request/shard routing (but no good routing criteria yet)
      * further config optimizations (flush/refresh intervals etc.)
      * smoother recovery phases
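
      Time slicing here means one index per time period (say, per month),
      so a query only touches the slices its date range covers, and old
      slices can be dropped or re-sharded wholesale. A minimal sketch of
      the idea; the "articles-YYYY.MM" naming scheme is invented for
      illustration:

        from datetime import date

        def monthly_indices(start, end, prefix="articles"):
            # One index per month, e.g. "articles-2012.11".
            names = []
            y, m = start.year, start.month
            while (y, m) <= (end.year, end.month):
                names.append("%s-%04d.%02d" % (prefix, y, m))
                y, m = (y + 1, 1) if m == 12 else (y, m + 1)
            return names

        # A query covering Oct-Nov 2012 hits two small indices
        # instead of the whole dataset:
        print(monthly_indices(date(2012, 10, 1), date(2012, 11, 27)))
        # -> ['articles-2012.10', 'articles-2012.11']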
  16. NEXT STEPS -- APP LEVEL
      * less query load (e.g. re-implement the clustering process)
      * query optimizing (never cover the whole index, good, right?) -- see the sketch below
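
      "Never cover the whole index" mostly means constraining every query
      by date. In the query DSL of that ES generation this is a filtered
      query wrapping the user's query with a range filter; the field name,
      index name and date window are invented for illustration:

        import json
        import urllib2  # Python 2, matching the pyes-era code above

        query = {"query": {"filtered": {
            "query": {"query_string": {"query": "some topic"}},
            # The range filter keeps the search inside one week of data.
            "filter": {"range": {"published": {
                "from": "2012-11-20", "to": "2012-11-27"}}},
        }}}
        req = urllib2.Request("http://node1:9200/articles/_search",
                              json.dumps(query))
        print(urllib2.urlopen(req).read())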
  17. AFTERMATH -- USER GROUP INSIGHTS
      * some problems are known to the ES core devs
      * some will be fixed
      * ferret is a faceting-heavy app, which uses lots of memory; we need to be more careful about that
      * JVM choice matters
      * to avoid many growing pains, read this:
        http://asquera.de/opensource/2012/11/25/elasticsearch-pre-flight-checklist/