
elasticsearch @ ferret go

Short presentation on how we use (and *failed* to use properly) elasticsearch at ferret go, a media analysis startup. Given at the ES User Group Berlin, November 2012.

Fabian Neumann

November 27, 2012

Transcript

  1. elasticsearch
    @
    ferret go
    ES UG Berlin Meetup, 2012-11-27
    Fabian Neumann (@hellp)
    Daniel Trümper (@truemped)

  2. "ferret go" -- THE PROJECT
    * media analysis
    * online, print, social
    * pipeline: RSS/Atom -> Storm -> ES -> web app
    * linguistics (sentiment, entity
    recognition etc.)
    * Also: Redis, Python, Pyramid, ...

  3. "ferret go" -- THE LOCATION
    * Bernau b. Berlin, Brandenburg
    * Zickenschulze (German food)
    * No (good) Asian food
    * Rollator races
    * Like Kreuzberg without the fancy

  4. "ferret go" -- THE PROJECT
    * shrt dmo

  5. THE BLACK WEEK
    * WHY suddenly?
    * more data (1 October = 4 Julies)
    * moving indexes; (bulk) re-indexing
    * more users
    * long-term queries now more long-term
    * config-/brain-less ES setup (which is
    nice!) only worked for us so long

  6. ES SETUP
    * 2 indices
    * 6 data nodes
    (i7, 8 cores, 32G mem, 16G for ES)
    * each index: 12 shards * 3 replicas
    = 36 shards per index, 72 in total
    (too much, we know ...)

  7. SENSIBLE SHARD-BALANCING?
    INDX 1 INDX 2
    NODE 1 ▒ ▒ ▒ ▒ ▒ ▓
    NODE 2 ▒ ▒ ▓ ▓ ▓ ▓
    NODE 3 ▒ ▓ ▓ ▓ ▓ ▓
    NODE 4 ▒ ▒ ▒ ▒ ▒ ▓
    ...
    shard sizes ^-- 12G 0.5G --^

  8. SENSIBLE SHARD-BALANCING?
    INDX 1 INDX 2
    NODE 1 ▒ ▒ ▒ ▒ ▒ ▓
    NODE 2 ▒ ▒ ▓ ▓ ▓ ▓
    NODE 3 ▒ ▓ ▓ ▓ ▓ ▓
    NODE 4 ▒ ▒ ▒ ▒ ▒ ▓
    ... ^-- also more complex queries
    shard sizes ^-- 12G 0.5G --^

  9. SENSIBLE LOAD-BALANCING?
    NODE 1 ▒ ▒ ▒ ▒ ▒ <-
    NODE 2 ▒ ▒ <-
    NODE 3 ▒ <-
    NODE 4 ▒ ▒ ▒ ▒ ▒ <-
    > import pyes
    > # All nodes in a list, passed to urllib3 PoolManager,
    > # free load-balancing, yay!
    > conn = pyes.ES([node1, node2, node3, node4])
    > res = conn.search(query_model.to_es_query())
    > return res

  10. SENSIBLE LOAD-BALANCING?
    NODE 1 ▒ ▒ ▒ ▒ ▒ <- <- <- <- <- <-
    NODE 2 ▒ ▒ <-
    NODE 3 ▒
    NODE 4 ▒ ▒ ▒ ▒ ▒
    > import pyes
    > # All nodes in a list, passed to urllib3 PoolManager,
    > # free load-balancing, yay! NOT! 3 are just fallback. Oops.
    > conn = pyes.ES([node1, node2, node3, node4])
    “The PoolManager will take care of reusing connections
    for you whenever you request the same host.”
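    What we should have done can be sketched in a few lines: rotate the node list ourselves so each request starts at a different host, instead of relying on the pool to spread load (which it does not; it only falls back). The node addresses below are placeholders, not our real cluster.

    ```python
    import itertools

    # placeholder node addresses
    NODES = ["node1:9200", "node2:9200", "node3:9200", "node4:9200"]
    _ring = itertools.cycle(range(len(NODES)))

    def rotated_nodes():
        """Return the node list rotated so a different node comes first
        on each call; a client that always tries the first entry (and
        keeps the rest as fallbacks) then spreads load round-robin."""
        start = next(_ring)
        return NODES[start:] + NODES[:start]
    ```

    Passing `rotated_nodes()` to the client per request gives round-robin with the old fallback behaviour kept intact.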

  11. SENSIBLE NODE CONFIGURATION?
    NODE 1 ▒ ▒ ▒ ▒ ▒ /(x.x)\ <-- JVM
    NODE 2 ▒ ▒
    NODE 3 ▒
    NODE 4 ▒ ▒ ▒ ▒ ▒ /(x.x)\
    $ grep cache /etc/elasticsearch/elasticsearch.yml
    $ (hey, that looked like /dev/null ...)
    $ grep OutOfMemoryErr /var/log/elasticsearch/heck.log | wc -l
    1337
    $ # ... or rather n00b

  12. SENSIBLE NODE CONFIGURATION?
    NODE 1 ▒ ▒ ▒ ▒ ▒ \(^.^)/ <-- JVM
    NODE 2 ▒ ▒
    NODE 3 ▒
    NODE 4 ▒ ▒ ▒ ▒ ▒ \(^.^)/
    $ grep cache /etc/elasticsearch/elasticsearch.yml
    index.cache.field.type: soft
    $ grep OutOfMemoryErr /var/log/elasticsearch/heck.log | wc -l
    0
    $ # much better

  13. CURRENT (IMPROVED!) SITUATION

  14. MANUAL BALANCED SHARDS
    INDX 1 INDX 2
    NODE 1 ▒ ▒ ▓ ▓ ▓
    NODE 2 ▒ ▒ ▓ ▓ ▓
    NODE 3 ▒ ▒ ▓ ▓ ▓
    NODE 4 ▒ ▒ ▓ ▓ ▓
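    Shards can be moved by hand with the cluster reroute API (`POST /_cluster/reroute`). A minimal sketch of building the request body; index, shard and node names here are placeholders, not our real layout:

    ```python
    import json

    def move_shard(index, shard, from_node, to_node):
        """Build a reroute command body that moves one shard copy
        from one node to another."""
        return {"commands": [
            {"move": {"index": index, "shard": shard,
                      "from_node": from_node, "to_node": to_node}}
        ]}

    # serialize the body for a POST to /_cluster/reroute
    body = json.dumps(move_shard("indx1", 0, "node3", "node1"))
    ```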

  15. NO-DATA NODES FOR LOAD-BALANCING
    INDX 1 INDX 2
    NODE 1 ▒ ▒ ▓ ▓ ▓ <- <-
    NODE 2 ▒ ▒ ▓ ▓ ▓ <- <-
    NODE 3 ▒ ▒ ▓ ▓ ▓ <- <-
    NODE 4 ▒ ▒ ▓ ▓ ▓ <- <-
    NODE 5 <- <- <- <- <- <- <- <- <-
    NODE 6 <- <- <- <- <- <- <-
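    A no-data node is plain configuration; roughly, in its `elasticsearch.yml` (assuming the 0.x-era settings keys):

    ```yaml
    # this node holds no shards and is not master-eligible;
    # it only joins the cluster, routes requests and merges results
    node.data: false
    node.master: false
    ```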

  16. [chart: new docs/s over time]
    * 6 data nodes, plus some no-data nodes added

  17. [chart: queries/s over time]
    * no-data nodes :) free LB, as easy as HAProxy
    * still too many shards :/

  18. NEXT STEPS -- TECH LEVEL
    * time slicing (flexibility in
    shard/index layout)
    * request/shard routing (but no good
    routing criteria yet)
    * further config optimizations
    (flush/refresh intervals etc.)
    * smoother recovery phases
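    The time-slicing point can be sketched as an index-per-month naming scheme, so old slices can be closed or re-laid-out independently. The `docs-YYYY.MM` pattern is just an illustration, not our actual layout:

    ```python
    import datetime

    def index_for(day):
        """Monthly index name for a given date (illustrative scheme)."""
        return "docs-{:%Y.%m}".format(day)

    def indices_for_range(start, end):
        """All monthly indices a query spanning start..end must touch."""
        names = []
        day = datetime.date(start.year, start.month, 1)
        while day <= end:
            names.append(index_for(day))
            # jump to the first day of the next month
            day = (day.replace(day=28) + datetime.timedelta(days=4)).replace(day=1)
        return names
    ```

    Queries then hit only the slices their time range needs, instead of every shard of one big index.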

  19. NEXT STEPS -- APP LEVEL
    * less query load (e.g. re-implement
    clustering process)
    * query optimizing (never cover the
    whole index, good, right?)
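    A sketch of "never cover the whole index": always wrap the user query in a time-range filter so it only touches recent documents. The field name `published` and the (pre-2.0) `filtered` query shape are assumptions for illustration:

    ```python
    def bounded_query(user_query, days=30):
        """Wrap a query dict so it is restricted to the last `days` days."""
        return {
            "query": {
                "filtered": {
                    "query": user_query,
                    # hypothetical date field; adjust to the real mapping
                    "filter": {"range": {"published": {"gte": "now-%dd" % days}}},
                }
            }
        }

    q = bounded_query({"match": {"text": "ferret"}})
    ```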

  20. * thank you
    * dankeschön
    * дякую
    * merci beaucoup
    * obrigado
    :)

  21. AFTERMATH -- USER GROUP INSIGHTS
    * some problems known to ES core devs
    * some will be fixed
    * ferret go is a faceting-heavy app, which
    uses lots of memory; we need to be
    more careful about that
    * JVM choice matters
    * avoid many growing pains, read this:
    http://asquera.de/opensource/2012/11/25/elasticsearch-pre-flight-checklist/
