The Do's and Don’ts of Elasticsearch Scalability and Performance

Presentation held at the Berlin Buzzwords conference in Berlin on June 1, 2015.

Patrick Peschlow

June 01, 2015

Transcript

  1. codecentric AG, Patrick Peschlow: The Do's and Don’ts of Elasticsearch Scalability and Performance
  2. Think hard about your mapping
  3. Think hard about your mapping
    − Which fields to analyze? How to analyze them?
    − Need term frequencies, positions, offsets? Field norms?
    − Which fields to not analyze or not index/enable?
    − _all
    − _source vs stored fields
  4. Think hard about your mapping
    − Dynamic mapping/templates
      − Excessive number of fields?
    − Index-time vs. query-time solutions
    − Multi field, copy to, transform script
    − Relations: parent-child/nested
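
The mapping questions on the two slides above can be made concrete with a small sketch. It assumes the Elasticsearch 1.x mapping syntax that was current when the talk was given; the type name `article` and all field names are hypothetical and do not come from the slides.

```python
import json


def build_mapping():
    """Return an index-creation body with deliberate per-field choices.

    Elasticsearch 1.x syntax is assumed; all names are made up for illustration.
    """
    return {
        "mappings": {
            "article": {
                # Disable the catch-all _all field if nothing queries it.
                "_all": {"enabled": False},
                "properties": {
                    # Analyzed for full-text search, plus a multi-field
                    # holding the exact value for sorting and aggregations.
                    "title": {
                        "type": "string",
                        "analyzer": "english",
                        "fields": {
                            "raw": {"type": "string", "index": "not_analyzed"}
                        },
                    },
                    # Exact-match only: skip analysis entirely.
                    "status": {"type": "string", "index": "not_analyzed"},
                    # Analyzed, but without norms, frequencies or positions.
                    "tags": {
                        "type": "string",
                        "norms": {"enabled": False},
                        "index_options": "docs",
                    },
                    # Kept in _source but not searchable at all.
                    "internal_note": {"type": "string", "index": "no"},
                },
            }
        }
    }


print(json.dumps(build_mapping(), indent=2))
```

Each choice trades index size and indexing cost against query features, so a field such as `tags` above can no longer serve phrase queries or length-normalized scoring.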
  5. Design for scale
  6. Design for scale
    [Diagram: Search over Shard 1, Shard 2, ..., Shard M]
  7. Design for scale
    [Diagram: 2014-11-25, ..., 2015-05-30, 2015-05-31, 2015-06-01; search for “last 3 days”]
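
Assuming the dates on the slide above name one index per day, a search for the last three days only needs to address three indices. The sketch below derives those names and the multi-index search path; the `logs-` prefix is a hypothetical naming convention, not something stated in the talk.

```python
from datetime import date, timedelta


def indices_for_last_days(today, days, prefix="logs-"):
    """Return the daily index names covering the last `days` days, oldest first."""
    return [
        prefix + (today - timedelta(days=offset)).isoformat()
        for offset in range(days - 1, -1, -1)
    ]


def search_path(index_names):
    """Build a multi-index search path: comma-separated index names."""
    return "/" + ",".join(index_names) + "/_search"


names = indices_for_last_days(date(2015, 6, 1), 3)
print(search_path(names))
# prints /logs-2015-05-30,logs-2015-05-31,logs-2015-06-01/_search
```

Older indices such as the one for 2014-11-25 are never touched by this request.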
  8. Design for scale
    [Diagram: Shard 1, Shard 2, ..., Shard M holding the documents of User 1 to User 8; routing for user 1; search by user 1]
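
The routing idea in the diagram above can be sketched as follows: a routing value is hashed and taken modulo the number of primary shards, so every document with the same routing value lands on the same shard, and a search carrying that value needs to visit only that shard. The CRC32 hash is a stand-in chosen for this illustration; Elasticsearch uses its own internal hash function. The index name `documents` is hypothetical.

```python
import zlib


def shard_for(routing_value, number_of_primary_shards):
    """Pick a shard for a routing value.

    CRC32 is a stand-in for illustration only, not Elasticsearch's real hash.
    """
    hashed = zlib.crc32(routing_value.encode("utf-8")) & 0xFFFFFFFF
    return hashed % number_of_primary_shards


def routed_search_path(index, routing_value):
    """Search path that restricts the request to the shard for one routing value."""
    return "/{}/_search?routing={}".format(index, routing_value)


# All of user1's documents map to the same shard, every time.
print(shard_for("user1", 5) == shard_for("user1", 5))
print(routed_search_path("documents", "user1"))
```

The modulo also shows why the number of primary shards cannot simply be changed later: a different divisor would send existing routing values to different shards.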
  9. Design for scale
    − Can documents/access be partitioned in a natural way?
    − Need to find documents by ID (update/delete/get)?
    − Know the relevant features
      − Routing, aliases, multi-index search
    − Indices don’t come for free
    − Measure the impact of distributed search
  10. Don’t create more shards than you need
  11. Don’t create more shards than you need
    − More shards
      − Enable larger indices
      − Scale operations on individual documents
    − But shards don’t come for free
    − Measure how many shards you need
      − When unsure, overallocate a little
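
One possible way to turn "measure, then overallocate a little" into a number is sketched below. It assumes you have already measured how many documents a single shard handles acceptably on production-like hardware; the function, the 25% headroom default and the figures in the usage line are all hypothetical and not taken from the talk.

```python
import math


def shards_needed(expected_docs, measured_docs_per_shard, headroom=0.25):
    """Estimate a primary shard count from a measured single-shard capacity.

    `headroom` is the fraction of extra shards to add when unsure (assumed value).
    """
    base = math.ceil(expected_docs / measured_docs_per_shard)
    return math.ceil(base * (1 + headroom))


# Hypothetical figures: 90M expected documents, 25M per shard measured.
print(shards_needed(90000000, 25000000))
# prints 5
```

The measurement is the essential input here; without it the result is only a guess.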
  12. Don’t treat all nodes as equal
  13. Don’t treat all nodes as equal
    − Cluster nodes
      − Master nodes, data nodes, client/aggregator nodes
    − Client applications
      − HTTP?
      − Transport protocol?
      − Join the cluster as a client node?
      − In Java: HTTP client vs TransportClient vs NodeClient
  14. Don’t run wasteful queries
  15. Don’t run wasteful queries
    − Only request as many hits as you need
    − Avoid deep pagination
    − Use scan+scroll to iterate without sorting
    − Only query indices/shards that may contain hits
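
A short calculation shows why deep pagination is wasteful in a distributed search: each shard has to produce its own top `from + size` entries, and the coordinating node has to collect and sort all of them to return a single page. The function name below is made up for this sketch.

```python
def entries_collected(from_, size, shards):
    """Number of sorted entries the coordinating node receives for one page."""
    per_shard = from_ + size  # each shard returns its own top from+size
    return per_shard * shards


print(entries_collected(0, 10, 5))      # first page: prints 50
print(entries_collected(10000, 10, 5))  # page 1001: prints 50050
```

Both requests return ten hits, but the second makes the cluster do roughly a thousand times the work.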
  16. Engineer queries
  17. Engineer queries
    − Measure performance
      − Set up production-like cluster and data
    − Use filters
    − Check and tune filter caching
    − Reduce work for heavyweight filters
      − Order them, consider accelerators
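
As an illustration of "use filters", the sketch below builds a request body in which only the full-text part is scored, while the exact-match and range conditions are expressed as filters. It assumes the `filtered` query of Elasticsearch 1.x, which later versions replaced with the `filter` clause of the `bool` query. The field names `title`, `user_id` and `created` are hypothetical.

```python
import json


def filtered_search(text, user_id, since):
    """Build a 1.x-style search body: scored match query plus unscored filters."""
    return {
        "size": 10,  # only request as many hits as needed
        "query": {
            "filtered": {
                # Scored part: relevance matters here.
                "query": {"match": {"title": text}},
                # Unscored part: yes/no conditions.
                "filter": {
                    "bool": {
                        "must": [
                            {"term": {"user_id": user_id}},
                            {"range": {"created": {"gte": since}}},
                        ]
                    }
                },
            }
        },
    }


print(json.dumps(filtered_search("scalability", "user1", "2015-05-30"), indent=2))
```

Whether a given filter is worth caching still has to be measured on production-like data, as the slide says.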
  18. Care about field data
  19. Care about field data
    − Used for sorting, aggregation, parent-child, scripts, …
    − High memory consumption or OutOfMemoryError
      − Cache limit, circuit breakers avoid the worst
    − Evaluate field data requirements in advance
    − Use “doc values” to store expensive field data on disk
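
A minimal sketch of the doc values bullet, assuming Elasticsearch 1.x, where doc values are switched on per field in the mapping and are not available for analyzed string fields. The helper name and the field mappings are made up for this example.

```python
def with_doc_values(field_mapping):
    """Return a copy of a field mapping with doc values enabled (1.x syntax)."""
    is_string = field_mapping.get("type") == "string"
    if is_string and field_mapping.get("index") != "not_analyzed":
        # In 1.x, analyzed strings still need in-memory field data.
        raise ValueError("doc values require a not_analyzed string field")
    enabled = dict(field_mapping)
    enabled["doc_values"] = True
    return enabled


# A field used for sorting and aggregations is a typical candidate.
print(with_doc_values({"type": "string", "index": "not_analyzed"}))
print(with_doc_values({"type": "long"}))
```

Because the setting belongs to the mapping, turning it on for an existing field is one of the cases that leads to reindexing.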
  20. Be prepared for reindexing
  21. Be prepared for reindexing
    − Reasons for reindexing
      − Mapping changes
      − Index/shard reaches its capacity
      − Reduce number of indices/shards
  22. Be prepared for reindexing
    − Reindexing procedure depends on many factors
      − Data source?
      − Zero downtime?
      − Update API usage?
      − Possible deletes?
      − Designated component (queue) for indexing?
  23. Be prepared for reindexing
    − Use existing tooling
    − Do it yourself? Use scan+scroll and bulk indexing
    − Follow best practices
      − Use aliases
      − Disable refresh
      − Decrease number of replicas
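
The do-it-yourself route can be sketched as three request bodies: temporary settings for the target index, a bulk payload built from scrolled hits, and an atomic alias switch at the end. The code only builds the bodies and sends nothing. The index names, the alias name and the sample hit are hypothetical; the hit layout (`_type`, `_id`, `_source`) follows what a scroll response returns per document.

```python
import json

# Applied to the target index while loading: no refresh, no replicas.
BULK_LOAD_SETTINGS = {"index": {"refresh_interval": "-1", "number_of_replicas": 0}}


def bulk_body(hits, target_index):
    """Turn scrolled hits into a newline-delimited bulk request body."""
    lines = []
    for hit in hits:
        action = {"index": {"_index": target_index,
                            "_type": hit["_type"],
                            "_id": hit["_id"]}}
        lines.append(json.dumps(action, sort_keys=True))
        lines.append(json.dumps(hit["_source"], sort_keys=True))
    # The bulk format requires a trailing newline after the last line.
    return "\n".join(lines) + "\n" if lines else ""


def alias_swap(alias, old_index, new_index):
    """Body for the aliases endpoint: move the alias in one atomic request."""
    return {"actions": [
        {"remove": {"index": old_index, "alias": alias}},
        {"add": {"index": new_index, "alias": alias}},
    ]}


hits = [{"_type": "article", "_id": "1", "_source": {"title": "a"}}]
print(bulk_body(hits, "articles_v2"), end="")
print(json.dumps(alias_swap("articles", "articles_v1", "articles_v2")))
```

Applications that only ever talk to the alias never notice the switch, which is what makes the zero-downtime variant possible. After loading, the refresh interval and replica count would be set back to their normal values.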
  24. Don’t use the defaults
  25. Don’t use the defaults
    − Cluster settings
      − Cluster name, discovery, minimum_master_nodes
      − Recovery
    − Number of shards and replicas
    − Refresh interval
    − Thread pool and cache configuration
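
For `minimum_master_nodes`, the usual recommendation is a strict majority of the master-eligible nodes, so that two separated halves of a cluster cannot both elect a master. The helper below is a hypothetical name for that one-line calculation.

```python
def minimum_master_nodes(master_eligible_nodes):
    """Quorum of master-eligible nodes: a strict majority."""
    return master_eligible_nodes // 2 + 1


for nodes in (2, 3, 5):
    print(nodes, "master-eligible nodes -> minimum_master_nodes =",
          minimum_master_nodes(nodes))
```

With two master-eligible nodes the quorum is two, so losing either node leaves the cluster without a master; three is the smallest count that tolerates one failure.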
  26. Monitor
  27. Monitor
    − Cluster health, split brains
    − Thread pools and caches
    − Garbage collection (the actual JVM output)
    − Slow log
    − Hot threads
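
A very small health check might look like the sketch below. It reads three fields that the cluster health API returns (`status`, `unassigned_shards`, `number_of_nodes`); the function name, the expected node count and the sample response are invented for this illustration, and a real monitoring setup would cover far more than this.

```python
def health_warnings(health, expected_nodes):
    """Return human-readable warnings for a cluster health response."""
    warnings = []
    if health["status"] != "green":
        warnings.append("status is " + health["status"])
    if health["unassigned_shards"] > 0:
        warnings.append("%d unassigned shards" % health["unassigned_shards"])
    if health["number_of_nodes"] < expected_nodes:
        # Fewer visible nodes than expected can hint at node loss or a split.
        warnings.append("only %d of %d nodes visible"
                        % (health["number_of_nodes"], expected_nodes))
    return warnings


# Made-up response for illustration.
sample = {"status": "yellow", "unassigned_shards": 5, "number_of_nodes": 2}
print(health_warnings(sample, expected_nodes=3))
```

Asking each node separately and comparing the answers is one way to notice that parts of the cluster disagree about its state.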
  28. Follow the production recommendations
  29. Follow the production recommendations
    − A good start would be to read/research them at all
    − Just to mention a few
      − The more memory, the better
      − Isolate as much as possible
      − SSDs and local storage recommended
  30. Don’t test in production
  31. Don’t test in production
    − Use a test environment
    − Test the cluster
      − Single node restarts, rolling upgrades, node loss
      − Full cluster restarts
    − Test behavior under expected load
      − Queries
      − Indexing
  32. Read the guide
  33. Read the guide
    − Elasticsearch: The Definitive Guide
      − https://www.elastic.co/guide/en/elasticsearch/guide/current/index.html
    − Elasticsearch Reference
      − https://www.elastic.co/guide/en/elasticsearch/reference/current/index.html
  34. Questions?
    Dr. rer. nat. Patrick Peschlow
    codecentric AG
    Merscheider Straße 1
    42699 Solingen
    tel +49 (0) 212.23 36 28 54
    fax +49 (0) 212.23 36 28 79
    patrick.peschlow@codecentric.de
    www.codecentric.de