The Do's and Don’ts of Elasticsearch Scalability and Performance

Presentation held at the Berlin Buzzwords conference in Berlin on June 1, 2015.

Patrick Peschlow

June 01, 2015

Transcript

  1. codecentric AG
    Patrick Peschlow
    The Do's and Don’ts of Elasticsearch Scalability and Performance

  2. Think hard about your mapping

  3. Think hard about your mapping
    − Which fields to analyze? How to analyze them?
    − Need term frequencies, positions, offsets? Field norms?
    − Which fields to not analyze or not index/enable?
    − _all
    − _source vs stored fields
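
As a rough illustration of the questions above, the sketch below writes a mapping as a Python dictionary. It assumes the Elasticsearch 1.x mapping syntax that was current when this talk was given (later versions changed it). The type name `article` and all field names are hypothetical.

```python
import json

# Hypothetical mapping for a type called "article", in the Elasticsearch 1.x
# syntax of the talk's era (an assumption; newer versions differ).
article_mapping = {
    "article": {
        # Disable the catch-all _all field if no query relies on it.
        "_all": {"enabled": False},
        "properties": {
            # Full-text field: analyzed, so it can be searched by word.
            "title": {"type": "string", "index": "analyzed"},
            # Exact-match field: not analyzed, no norms, no term frequencies
            # or positions (spelled out explicitly here).
            "status": {
                "type": "string",
                "index": "not_analyzed",
                "norms": {"enabled": False},
                "index_options": "docs",
            },
            # Not indexed at all: still returned as part of _source,
            # but never searchable.
            "internal_note": {"type": "string", "index": "no"},
        },
    }
}

print(json.dumps(article_mapping, indent=2, sort_keys=True))
```

Each option trades index size and indexing cost against query features, so the right choice depends on the queries the application actually runs.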

  4. Think hard about your mapping
    − Dynamic mapping/templates
      − Excessive number of fields?
    − Index-time vs. query-time solutions
    − Multi field, copy to, transform script
    − Relations: parent-child/nested
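
Multi fields and `copy_to` are two index-time solutions named above. A minimal sketch of how they might look in a mapping, again assuming 1.x-era syntax and hypothetical field names:

```python
import json

# Hypothetical "properties" section of a mapping (1.x-era syntax assumed).
person_properties = {
    # Multi field: "name" is analyzed for search, while "name.raw" keeps
    # the exact value, e.g. for sorting or aggregations.
    "name": {
        "type": "string",
        "fields": {"raw": {"type": "string", "index": "not_analyzed"}},
    },
    # copy_to: city and country are additionally indexed into "location_text"
    # at index time, so a query can target one field instead of two.
    "city": {"type": "string", "copy_to": "location_text"},
    "country": {"type": "string", "copy_to": "location_text"},
    "location_text": {"type": "string"},
}

print(json.dumps(person_properties, indent=2, sort_keys=True))
```

Both options move work from query time to index time: the index grows, but queries become simpler.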

  5. Design for scale

  6. Design for scale
    [Diagram: a search and Shard 1, Shard 2, ..., Shard M]

  7. Design for scale
    [Diagram: date-labeled partitions 2014-11-25, ..., 2015-05-30, 2015-05-31, 2015-06-01; search for "last 3 days"]
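
If the date labels in the diagram are read as one index per day, a search for the last 3 days only needs to address three indices. The sketch below computes their names. The `events-` prefix and the naming scheme are assumptions, not taken from the talk.

```python
from datetime import date, timedelta


def indices_for_last_days(today, days, prefix="events-"):
    """Return the daily index names covering the last `days` days, oldest first."""
    return [
        prefix + (today - timedelta(days=offset)).isoformat()
        for offset in range(days - 1, -1, -1)
    ]


names = indices_for_last_days(date(2015, 6, 1), 3)
# A multi-index search addresses several indices as a comma-separated list.
print(",".join(names))
```

Older indices such as the one for 2014-11-25 are never touched by this search, which is the point of partitioning by time.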

  8. Design for scale
    [Diagram: users 1 to 8 distributed across Shard 1, Shard 2, ..., Shard M; a search by user 1 uses the routing for user 1]
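
The idea behind routing can be sketched in a few lines: a routing key is hashed to a shard number, so all documents of one user live on one shard and a search for that user can skip the others. `zlib.crc32` is only a stand-in here, since Elasticsearch uses its own hash function, and the shard count is hypothetical.

```python
import zlib


def shard_for(routing_key, number_of_shards):
    """Map a routing key to a shard number.

    zlib.crc32 is a stand-in hash chosen for this sketch; it is not the
    function Elasticsearch uses internally.
    """
    return zlib.crc32(routing_key.encode("utf-8")) % number_of_shards


NUMBER_OF_SHARDS = 5  # hypothetical

# With routing, a search by user 1 only has to visit one shard.
shards_hit_with_routing = {shard_for("user1", NUMBER_OF_SHARDS)}
# Without routing, the same search is sent to every shard.
shards_hit_without_routing = set(range(NUMBER_OF_SHARDS))

print(len(shards_hit_with_routing), len(shards_hit_without_routing))
```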

  9. Design for scale
    − Can documents/access be partitioned in a natural way?
    − Need to find documents by ID (update/delete/get)?
    − Know the relevant features
      − Routing, aliases, multi-index search
    − Indices don’t come for free
    − Measure the impact of distributed search
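
Routing and aliases can be combined. The sketch below builds one action for the aliases API that gives a user an alias with a routing value and a filter, so that the application can search the alias like a dedicated index. The index name, the alias naming scheme, and the `user_id` field are hypothetical.

```python
import json


def user_alias_action(index, user_id):
    """Build an 'add alias' action that routes and filters by user."""
    return {
        "add": {
            "index": index,
            "alias": "user_" + user_id,
            # Searches via the alias only go to the shard for this routing value.
            "routing": user_id,
            # The filter hides other users' documents on the same shard.
            "filter": {"term": {"user_id": user_id}},
        }
    }


body = {"actions": [user_alias_action("users", "1")]}
print(json.dumps(body, indent=2, sort_keys=True))
```

The filter is needed because several users usually share one shard, and routing alone does not separate their documents.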

  10. Don’t create more shards than you need

  11. Don’t create more shards than you need
    − More shards
      − Enable larger indices
      − Scale operations on individual documents
    − But shards don’t come for free
    − Measure how many shards you need
      − When unsure, overallocate a little
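
One way to turn a measurement into a shard count is sketched below. It assumes that a load test has shown how much data a single shard can hold while still meeting the latency goals. The formula, the 20 percent headroom, and all numbers are assumptions for illustration, not a rule from the talk.

```python
def shards_needed(expected_size_gb, measured_shard_capacity_gb, headroom_percent=20):
    """Estimate a primary shard count from a measured per-shard capacity.

    The headroom implements "overallocate a little"; its value is a guess.
    """
    padded = expected_size_gb * (100 + headroom_percent)
    per_shard = measured_shard_capacity_gb * 100
    return -(-padded // per_shard)  # ceiling division on integers


# Hypothetical numbers: 200 GB expected, 30 GB per shard measured.
print(shards_needed(200, 30))
```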

  12. Don’t treat all nodes as equal

  13. Don’t treat all nodes as equal
    − Cluster nodes
      − Master nodes, data nodes, client/aggregator nodes
    − Client applications
      − HTTP?
      − Transport protocol?
      − Join the cluster as a client node?
      − In Java: HTTP client vs TransportClient vs NodeClient

  14. Don’t run wasteful queries

  15. Don’t run wasteful queries
    − Only request as many hits as you need
    − Avoid deep pagination
    − Use scan+scroll to iterate without sorting
    − Only query indices/shards that may contain hits
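
The cost of deep pagination can be estimated with simple arithmetic. In a distributed search, every shard has to return its own top `from + size` hits, and the coordinating node merges all of them to pick the requested page. The shard count and page positions below are hypothetical.

```python
def hits_merged_by_coordinator(number_of_shards, from_, size):
    """Entries the coordinating node has to merge for one result page.

    Each shard returns its top (from_ + size) hits; the coordinating node
    sorts all of them and keeps only `size` results.
    """
    return number_of_shards * (from_ + size)


first_page = hits_merged_by_coordinator(5, 0, 10)
deep_page = hits_merged_by_coordinator(5, 10000, 10)
print(first_page, deep_page)
```

Both requests return ten hits, but the deep page makes the cluster handle about a thousand times more entries.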

  16. Engineer queries

  17. Engineer queries
    − Measure performance
      − Set up production-like cluster and data
    − Use filters
    − Check and tune filter caching
    − Reduce work for heavyweight filters
      − Order them, consider accelerators
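
A sketch of what "order them, consider accelerators" might look like for a geo search, assuming the `filtered` query and filter syntax of Elasticsearch 1.x. A cheap term filter comes first, a bounding box acts as an accelerator, and the expensive distance filter comes last. Field names and coordinates are hypothetical, and whether the order has an effect depends on the version and filter types, so it should be measured.

```python
import json

CENTER = {"lat": 52.52, "lon": 13.405}  # hypothetical search location

search_body = {
    "query": {
        "filtered": {
            "query": {"match": {"title": "elasticsearch"}},
            "filter": {
                "bool": {
                    "must": [
                        # 1. Cheap and selective: exact term match.
                        {"term": {"status": "published"}},
                        # 2. Accelerator: a rough box around the search area.
                        {
                            "geo_bounding_box": {
                                "location": {
                                    "top_left": {"lat": 52.7, "lon": 13.1},
                                    "bottom_right": {"lat": 52.3, "lon": 13.8},
                                }
                            }
                        },
                        # 3. Heavyweight: exact distance, ideally only
                        #    evaluated for documents that passed 1 and 2.
                        {"geo_distance": {"distance": "10km", "location": CENTER}},
                    ]
                }
            },
        }
    }
}

print(json.dumps(search_body, indent=2))
```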

  18. Care about field data

  19. Care about field data
    − Used for sorting, aggregation, parent-child, scripts, …
    − High memory consumption or OutOfMemoryError
      − Cache limit, circuit breakers avoid the worst
    − Evaluate field data requirements in advance
    − Use "doc values" to store expensive field data on disk
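
Doc values are enabled per field in the mapping. A minimal sketch, assuming the 1.x-era `doc_values` mapping option and hypothetical field names:

```python
import json

# Hypothetical fields that are sorted or aggregated on. With doc values,
# their field data is written to disk at index time instead of being
# built on the JVM heap at query time.
product_properties = {
    "price": {"type": "double", "doc_values": True},
    # For strings, doc values require a not_analyzed field (1.x assumption).
    "category": {"type": "string", "index": "not_analyzed", "doc_values": True},
}

print(json.dumps(product_properties, indent=2, sort_keys=True))
```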

  20. Be prepared for reindexing

  21. Be prepared for reindexing
    − Reasons for reindexing
      − Mapping changes
      − Index/shard reaches its capacity
      − Reduce number of indices/shards

  22. Be prepared for reindexing
    − Reindexing procedure depends on many factors
      − Data source?
      − Zero downtime?
      − Update API usage?
      − Possible deletes?
      − Designated component (queue) for indexing?

  23. Be prepared for reindexing
    − Use existing tooling
    − Do it yourself? Use scan+scroll and bulk indexing
    − Follow best practices
      − Use aliases
      − Disable refresh
      − Decrease number of replicas
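
For a do-it-yourself reindex, the pieces that do not need a running cluster can be sketched as plain functions: turning one page of scroll hits into a bulk request body, settings to apply while loading, and the alias switch at the end. Index and alias names are hypothetical, and the HTTP calls themselves are left out.

```python
import json


def bulk_body(hits, target_index):
    """Convert scroll hits into a bulk body: one action line and one
    source line per document, each terminated by a newline."""
    lines = []
    for hit in hits:
        action = {
            "index": {"_index": target_index, "_type": hit["_type"], "_id": hit["_id"]}
        }
        lines.append(json.dumps(action, sort_keys=True))
        lines.append(json.dumps(hit["_source"], sort_keys=True))
    return "".join(line + "\n" for line in lines)


def alias_swap(alias, old_index, new_index):
    """Build one aliases request that moves the alias to the new index."""
    return {
        "actions": [
            {"remove": {"index": old_index, "alias": alias}},
            {"add": {"index": new_index, "alias": alias}},
        ]
    }


# Settings for the target index while loading: no refresh, no replicas.
# Both should be restored to their normal values afterwards.
SETTINGS_DURING_REINDEX = {"index": {"refresh_interval": "-1", "number_of_replicas": 0}}

# Hypothetical page of scroll results.
sample_hits = [
    {"_type": "item", "_id": "1", "_source": {"name": "first"}},
    {"_type": "item", "_id": "2", "_source": {"name": "second"}},
]
print(bulk_body(sample_hits, "items_v2"), end="")
print(json.dumps(alias_swap("items", "items_v1", "items_v2")))
```

Because the application only ever talks to the alias, it does not notice when the alias moves from the old index to the new one.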

  24. Don’t use the defaults

  25. Don’t use the defaults
    − Cluster settings
      − cluster name, discovery, minimum_master_nodes
      − recovery
    − Number of shards and replicas
    − Refresh interval
    − Thread pool and cache configuration
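
Two of these settings lend themselves to a short sketch. `minimum_master_nodes` is commonly set to a strict majority of the master-eligible nodes to guard against split brain, and the index settings can be stated explicitly when an index is created. The concrete values below are hypothetical and need to be chosen per use case.

```python
def minimum_master_nodes(master_eligible_nodes):
    """Quorum of master-eligible nodes: a strict majority."""
    return master_eligible_nodes // 2 + 1


# Hypothetical explicit index settings instead of relying on the defaults.
index_settings = {
    "settings": {
        "index": {
            "number_of_shards": 3,
            "number_of_replicas": 1,
            "refresh_interval": "30s",
        }
    }
}

print(minimum_master_nodes(3))
```

With only two master-eligible nodes the quorum is two, so losing either node leaves the cluster without a master. That is one reason to run three.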

  26. Monitor

  27. Monitor
    − Cluster health, split brains
    − Thread pools and caches
    − Garbage collection (the actual JVM output)
    − Slow log
    − Hot threads
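
A monitoring check for the first bullet might look like the sketch below. It inspects a cluster health response that has already been fetched and parsed. The keys `status`, `number_of_nodes`, and `unassigned_shards` are part of that response; the sample values and the expected node count are made up.

```python
def health_warnings(health, expected_nodes):
    """Return human-readable warnings for a parsed cluster health response."""
    warnings = []
    if health["status"] != "green":
        warnings.append("cluster status is " + health["status"])
    if health["number_of_nodes"] < expected_nodes:
        # Missing nodes may have crashed, or may have formed a second
        # cluster (split brain).
        warnings.append(
            "only %d of %d expected nodes are in the cluster"
            % (health["number_of_nodes"], expected_nodes)
        )
    if health["unassigned_shards"] > 0:
        warnings.append("%d shards are unassigned" % health["unassigned_shards"])
    return warnings


# Hypothetical response values.
sample = {"status": "yellow", "number_of_nodes": 2, "unassigned_shards": 5}
for warning in health_warnings(sample, expected_nodes=3):
    print(warning)
```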

  28. Follow the production recommendations

  29. Follow the production recommendations
    − A good start is to actually read them
    − Just to mention a few
      − The more memory, the better
      − Isolate as much as possible
      − SSDs and local storage recommended

  30. Don’t test in production

  31. Don’t test in production
    − Use a test environment
    − Test the cluster
      − Single node restarts, rolling upgrades, node loss
      − Full cluster restarts
    − Test behavior under expected load
      − Queries
      − Indexing

  32. Read the guide

  33. Read the guide
    − Elasticsearch: The Definitive Guide
      − https://www.elastic.co/guide/en/elasticsearch/guide/current/index.html
    − Elasticsearch Reference
      − https://www.elastic.co/guide/en/elasticsearch/reference/current/index.html

  34. Questions?
    Dr. rer. nat. Patrick Peschlow
    codecentric AG
    Merscheider Straße 1
    42699 Solingen
    tel +49 (0) 212.23 36 28 54
    fax +49 (0) 212.23 36 28 79
    [email protected]
    www.codecentric.de