Two Years of Elasticsearch in Development and Production

Presentation held at the Elasticsearch User Group Berlin on March 31, 2015.

Patrick Peschlow

March 31, 2015

Transcript

  1. Two Years of Elasticsearch in Development and Production
     Patrick Peschlow, codecentric AG
  2. Mapping
     − Disable the _all field (unless you really need it)
     − Prefer _source over stored fields
       − _source is useful anyway (for updates, reindexing, highlighting)
     − Only analyze/compute what you need
       − Use not_analyzed; disable field norms, term frequencies and positions where they are not needed
     − Be careful with dynamic mapping and dynamic templates
       − Can lead to undesired fields or types in the index
       − Can considerably grow the cluster state
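
A minimal sketch of a mapping that follows this advice, written as the Python dict you would serialize into the JSON body of an ES 1.x index-creation request. The type name article and the fields title, status and keywords are hypothetical; the settings used (_all, dynamic, index, norms, index_options) are ES 1.x mapping syntax.

```python
import json


def article_mapping():
    """Index-creation body for a hypothetical "article" type (ES 1.x mapping syntax)."""
    return {
        "mappings": {
            "article": {
                "_all": {"enabled": False},  # no catch-all field; query named fields instead
                "dynamic": "strict",         # unknown fields are rejected instead of being mapped
                "properties": {
                    # Analyzed full text; retrievable from _source, so no "store": True
                    "title": {"type": "string"},
                    # Exact values only: not analyzed
                    "status": {"type": "string", "index": "not_analyzed"},
                    # Analyzed, but used only for yes/no matching:
                    # no length norms, no term frequencies or positions
                    "keywords": {
                        "type": "string",
                        "norms": {"enabled": False},
                        "index_options": "docs",
                    },
                },
            }
        }
    }


if __name__ == "__main__":
    print(json.dumps(article_mapping(), indent=2))
```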
  3. Queries
     − Pagination
       − Don’t load too many results with a single query
       − Avoid deep pagination
       − Consider using the scan+scroll API when you don’t need sorting
     − Think about index-time vs. query-time solutions
       − Prefix query vs. edge ngrams?
       − Sorting via script vs. indexing another field?
       − Don’t be afraid to index a source field twice
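
A sketch of both sides of the pagination advice, plus an index-time alternative to prefix queries. The depth limit of 1000 hits, the function names and the analyzer name autocomplete are assumptions for illustration; from/size, the scan+scroll parameters and the edge_ngram token filter are ES 1.x features.

```python
MAX_WINDOW = 1000  # hypothetical cap on from + size


def page_body(query, page, page_size=20):
    """Build a from/size search body and refuse pages beyond a fixed depth."""
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be positive")
    start = (page - 1) * page_size
    if start + page_size > MAX_WINDOW:
        # Every shard has to return start + page_size hits for the final sort,
        # so the cost grows with the page depth.
        raise ValueError("page too deep; use scan+scroll for bulk export instead")
    return {"query": query, "from": start, "size": page_size}


def scan_scroll_params(batch_per_shard=100, keep_alive="1m"):
    """Query-string parameters for an unsorted ES 1.x scan+scroll export.

    With search_type=scan, size is applied per shard.
    """
    return {"search_type": "scan", "scroll": keep_alive, "size": batch_per_shard}


def autocomplete_settings():
    """Index-time alternative to prefix queries: an edge n-gram analyzer (ES 1.x)."""
    return {
        "analysis": {
            "filter": {
                "autocomplete_filter": {"type": "edge_ngram", "min_gram": 1, "max_gram": 20}
            },
            "analyzer": {
                "autocomplete": {
                    "type": "custom",
                    "tokenizer": "standard",
                    "filter": ["lowercase", "autocomplete_filter"],
                }
            },
        }
    }
```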
  4. Filters and Caching
     − Use filters for yes/no criteria that don’t need scoring
       − In contrast to query results, filter results can be cached
     − Tricky caching behavior
       − Some filters are cached by default, others not (depends on cost)
       − Caching may also depend on how often filters are used
       − Pay special attention to compound filters
     − It is possible to override the caching behavior and the cache key
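
A minimal ES 1.x filtered query illustrating the split: the match query is scored, while the yes/no criteria go into a cacheable bool filter. Field names, values and the cache key format are hypothetical; check the caching defaults against the documentation of the version you run.

```python
def filtered_search(text, user_id, status):
    """ES 1.x filtered query: scored full-text part plus cacheable yes/no filters."""
    return {
        "query": {
            "filtered": {
                # Scored part
                "query": {"match": {"title": text}},
                # Unscored part; in ES 1.x the bool filter itself is not cached
                # by default, while leaf filters such as term usually are
                "filter": {
                    "bool": {
                        "must": [
                            {"term": {"user_id": user_id}},
                            {
                                "term": {
                                    "status": status,
                                    "_cache": True,                     # explicit caching override
                                    "_cache_key": f"status_{status}",   # hypothetical custom key
                                }
                            },
                        ]
                    }
                },
            }
        }
    }
```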
  5. Filters and Ordering
     − Elements of bool filters are executed sequentially
       − Place more selective filters first
     − Consider using “accelerator” filters
       − Redundant filters that reduce work for heavyweight filters
     − Learn about the possible “strategy” settings for filtered queries
       − They control how the filter and query parts are interleaved
       − Measure, don’t guess
     − Note: With ES 2.0, queries and filters might get unified
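
A sketch of the ordering rule, assuming you have measured (or estimated) what fraction of documents each filter matches and which filters are expensive. The helper is hypothetical. The geo_bounding_box filter acts as the accelerator: a redundant box enclosing the 10 km circle of the heavyweight geo_distance filter, so it only removes documents that could never match.

```python
def ordered_bool_filter(clauses):
    """Order bool filter clauses: cheap filters first (most selective first),
    heavyweight filters last.

    clauses: list of (filter_dict, match_fraction, is_heavy) tuples, where the
    match fractions are your own measurements (hypothetical input).
    """
    ordered = sorted(clauses, key=lambda c: (c[2], c[1]))  # False sorts before True
    return {"bool": {"must": [f for f, _, _ in ordered]}}


user = {"term": {"user_id": 7}}
# Accelerator: cheap box around the circle below (about 52.43-52.61 N, 13.25-13.55 E)
accel = {"geo_bounding_box": {"location": {
    "top_left": {"lat": 52.65, "lon": 13.20},
    "bottom_right": {"lat": 52.40, "lon": 13.60},
}}}
# Heavyweight: exact distance computation per document
heavy = {"geo_distance": {"distance": "10km", "location": {"lat": 52.52, "lon": 13.40}}}

bool_filter = ordered_bool_filter([(heavy, 0.02, True), (accel, 0.05, False), (user, 0.01, False)])
```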
  6. Analysis Tooling
     − Use the search/explain feature (score computation)
     − Use the validate/explain feature (query rewriting, cache usage)
     − Make sure your analyzers work correctly
       − Use the analyze API
       − Check out the “inquisitor” and “extended-analyze” plugins
     − When in doubt, take a look at the terms in your index
       − http://rosssimpson.com/blog/2014/05/06/using-luke-with-elasticsearch/
       − “skywalker” plugin
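
For quick experiments, the endpoints mentioned on the slide can be built as plain URLs and bodies. This sketch assumes a node listening on localhost:9200 and uses the ES 1.x query-string parameters of the analyze API; the helper names are hypothetical.

```python
from urllib.parse import urlencode

BASE = "http://localhost:9200"  # assumption: local node on the default HTTP port


def analyze_url(text, analyzer="standard"):
    """Analyze API: shows the tokens an analyzer produces for the given text."""
    return f"{BASE}/_analyze?{urlencode({'analyzer': analyzer, 'text': text})}"


def validate_explain_url(index):
    """Validate API with explain: shows how a query is rewritten (send the query as body)."""
    return f"{BASE}/{index}/_validate/query?explain"


def explained_search_body(query):
    """Search body that asks for a per-hit explanation of the score computation."""
    return {"explain": True, "query": query}
```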
  7. Replication and Search Preference
     − With replicas, we can get different results for the same search
       − Searches are routed across the shard copies in “round robin” fashion
       − Deleted documents still affect scoring
       − Segment merging (physical deletion) can differ among replicas
     − Solution: Use the search “preference” parameter
       − For consistent results per user, choose the user ID as preference
     (Diagram: two copies of the same shard, one containing doc1 to doc4, the other doc1 and doc2)
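
A toy model of why the preference parameter helps. It is not Elasticsearch's actual copy-selection logic: without a preference, consecutive searches rotate across shard copies, while a fixed preference string always maps to the same copy. Only the preference request parameter is real; the class and the user_ prefix are hypothetical.

```python
import hashlib
import itertools


class ToyShardRouter:
    """Toy model (not Elasticsearch code) of picking one copy of a shard per search."""

    def __init__(self, copies):
        self.copies = list(copies)            # e.g. ["primary", "replica"]
        self._round_robin = itertools.cycle(self.copies)

    def choose(self, preference=None):
        if preference is None:
            # Round robin: consecutive searches may see differently merged copies
            return next(self._round_robin)
        # Deterministic: the same preference string always selects the same copy
        digest = hashlib.md5(preference.encode("utf-8")).digest()
        return self.copies[int.from_bytes(digest[:4], "big") % len(self.copies)]


def search_params(user_id):
    """Query-string parameters for a per-user search preference."""
    return {"preference": f"user_{user_id}"}
```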
  8. Aggregations (Facets)
     − Load aggregations as lazily as possible
       − Do you really need to offer all of them on the UI right away?
       − Can you hide some less relevant ones by default?
     − Only load aggregations once when retrieving paginated results
       − Consider not requesting them again when just switching the page
       − They likely stay the same
     − Many aggregations use approximation algorithms
       − Don’t expect results to be 100% exact
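
A sketch of loading aggregations only with the first page of results. The field names category and author_id and the helper are hypothetical; terms and cardinality are standard aggregation types, and cardinality in particular returns an approximate count.

```python
def results_page_body(query, page, page_size=20, include_aggs=None):
    """Search body that requests aggregations only for the first page by default."""
    if include_aggs is None:
        include_aggs = page == 1  # later pages reuse the facets already shown
    body = {"query": query, "from": (page - 1) * page_size, "size": page_size}
    if include_aggs:
        body["aggs"] = {
            # Per-shard top terms are merged, so counts can be slightly off
            "by_category": {"terms": {"field": "category", "size": 10}},
            # Approximate distinct count
            "distinct_authors": {"cardinality": {"field": "author_id"}},
        }
    return body
```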
  9. Field Data
     − Some operations require document field data
       − Sorting, aggregations, parent-child queries, some scripts
     − Field data is usually loaded for all documents
       − Leads to high memory consumption or OutOfMemoryError
     − Use “doc values”: store field data on the file system
       − Let the OS do the caching
       − Can be enabled on a per-field basis
     − Note: With ES 2.0, “doc values” might become the default
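
A minimal ES 1.x mapping fragment that enables doc values per field. Field names are hypothetical; in ES 1.x, doc values are available for not_analyzed strings and numeric fields, but not for analyzed strings.

```python
def doc_values_mapping():
    """ES 1.x properties fragment with doc values for sort/aggregation fields."""
    return {
        "properties": {
            # Used in aggregations: exact values, column-stored on disk
            "category": {"type": "string", "index": "not_analyzed", "doc_values": True},
            # Used for sorting
            "price": {"type": "double", "doc_values": True},
            # Analyzed text: doc values are not supported for it in ES 1.x
            "title": {"type": "string"},
        }
    }
```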
  10. Unit/Integration Testing
     − Set up a comprehensive test suite
       − Test expectations about matches
       − Prevent regressions when changing analyzers
     − The Elasticsearch Java client is embeddable
       − No mocks or test doubles needed
     − Try it by solving the “mapping challenge”
       − https://github.com/peschlowp/elasticsearch-mapping-challenge
  11. Indexing and Real-Time Requirements
     − Default refresh interval: 1 second
       − Targeted at human users
     − What if API clients want read-your-own-writes (RYOW) semantics for search?
       − Refresh after every request?
     − Recommendation: Leave RYOW to the primary database, if at all
       − Provide a separate API if needed
  12. Bulk Indexing
     − For optimum bulk size, consider document size, not count
     − Be careful with merge throttling
       − Elasticsearch might throttle indexing anyway
       − Look out for “now throttling indexing” log messages
       − Is it worth it?
     − Decrease the refresh rate (or disable refreshing completely)
     − Reduce the number of replicas (or set it to zero)
       − Add missing replicas later; this is much cheaper than “live” replication
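
A sketch of batching by size rather than count, producing newline-delimited bodies in the bulk API format (an action line followed by a source line). The 5 MB default cap, the function name and the restore values are assumptions; measure to find your own optimum. The two settings dicts express the slide's bulk-load tweaks as index settings updates.

```python
import json


def bulk_bodies(docs, index, doc_type, max_bytes=5 * 1024 * 1024):
    """Split (id, source) pairs into bulk request bodies capped by byte size.

    A single document larger than max_bytes is still sent, alone in its body.
    """
    chunk, size = [], 0
    for doc_id, source in docs:
        action = json.dumps({"index": {"_index": index, "_type": doc_type, "_id": doc_id}})
        entry = action + "\n" + json.dumps(source) + "\n"  # action line, then source line
        entry_bytes = len(entry.encode("utf-8"))
        if chunk and size + entry_bytes > max_bytes:
            yield "".join(chunk)
            chunk, size = [], 0
        chunk.append(entry)
        size += entry_bytes
    if chunk:
        yield "".join(chunk)


# Before the bulk load: no periodic refresh, no live replication
BULK_LOAD_SETTINGS = {"index": {"refresh_interval": "-1", "number_of_replicas": 0}}
# Afterwards (hypothetical target values): restore refresh and add replicas back
RESTORE_SETTINGS = {"index": {"refresh_interval": "1s", "number_of_replicas": 1}}
```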
  13. Update API
     − Update = Delete + Add
       − The update API only saves network traffic
     − Even small updates might take a while
       − Consider splitting (nested documents or parent-child relationships)
     − “Partial document” update trickiness
       − Fields are replaced, except for inner objects, which are merged
       − To replace inner objects, consider wrapping them in an array
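
A small simulation of the partial-document merge behavior described on the slide, useful for pinning down expectations in unit tests. It models the slide's rules (inner objects merged, everything else replaced, arrays included) and is not Elasticsearch code; only the doc key of the update request body is real.

```python
import copy


def apply_partial_doc(existing, partial):
    """Simulate partial-document updates: dicts merge recursively,
    all other values (including arrays) replace the existing value."""
    result = copy.deepcopy(existing)
    for key, value in partial.items():
        if isinstance(value, dict) and isinstance(result.get(key), dict):
            result[key] = apply_partial_doc(result[key], value)  # inner object: merge
        else:
            result[key] = copy.deepcopy(value)                   # anything else: replace
    return result


def update_body(partial):
    """Request body for the update API with a partial document."""
    return {"doc": partial}
```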
  14. Cluster Settings
     − Safety
       − Choose a unique cluster name
       − Consider using unicast discovery
     − Recovery
       − gateway.recover_after_nodes
       − gateway.recover_after_time
       − gateway.expected_nodes
     − Stability
       − discovery.zen.minimum_master_nodes
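
A sketch that renders the settings named on the slide into elasticsearch.yml lines, using ES 1.x setting names (multicast discovery still existed there and was enabled by default). The helper and all example values are hypothetical; minimum_master_nodes should be derived from the number of master-eligible nodes.

```python
def elasticsearch_yml(cluster_name, hosts, recover_after_nodes, expected_nodes,
                      min_master_nodes, recover_after_time="5m"):
    """Render ES 1.x elasticsearch.yml lines for safety, recovery and stability."""
    host_list = "[" + ", ".join(f'"{h}"' for h in hosts) + "]"
    settings = [
        ("cluster.name", cluster_name),                     # unique per cluster
        ("discovery.zen.ping.multicast.enabled", "false"),  # use unicast instead
        ("discovery.zen.ping.unicast.hosts", host_list),
        ("gateway.recover_after_nodes", str(recover_after_nodes)),
        ("gateway.recover_after_time", recover_after_time),
        ("gateway.expected_nodes", str(expected_nodes)),
        ("discovery.zen.minimum_master_nodes", str(min_master_nodes)),
    ]
    return "".join(f"{key}: {value}\n" for key, value in settings)
```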
  15. Split Brain
     − Prevent split brains caused by network partitions
       − Set minimum_master_nodes to a quorum of the master-eligible nodes
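
The quorum arithmetic behind this recommendation: with n master-eligible nodes, minimum_master_nodes = floor(n/2) + 1, so two disjoint groups can never both reach it. The check below enumerates all two-way partitions of small clusters; the helper names are hypothetical.

```python
def quorum(master_eligible_nodes):
    """minimum_master_nodes as a quorum: more than half of the master-eligible nodes."""
    return master_eligible_nodes // 2 + 1


def sides_that_can_elect(n, side_a):
    """For a partition into side_a and n - side_a nodes, return the sides reaching quorum."""
    q = quorum(n)
    return [size for size in (side_a, n - side_a) if size >= q]
```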
  16. Split Brain
     − Prevent split brains when single links fail
       − Upgrade to ES 1.4.x
  17. Split Brain
     − Monitor the cluster for split brains
       − Ask each node who is master
       − Use the cat master API
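
A sketch of the monitoring check, assuming you have already collected each node's answer, for example by sending GET /_cat/master to every node individually and reading the id column. The function is hypothetical; it flags a split brain when nodes name different masters and ignores nodes that report none.

```python
def detect_split_brain(reported_masters):
    """reported_masters: {node_name: master_id as seen by that node (or None)}.

    Returns (is_split, set_of_distinct_masters).
    """
    masters = {m for m in reported_masters.values() if m}
    return len(masters) > 1, masters
```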
  18. Dedicated Master Nodes
     (Diagram: Node 1, Node 2 and Node 3 as dedicated master nodes, separate from the other nodes)
  19. Distributed Search
     (Diagram: a search from the client runs in phases: compute global statistics, get local top hits, get the fields of the global top hits)
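
A sketch of the reduce step in the diagram: each shard returns its local top hits, and the coordinating node merges them into the global top hits before fetching their fields. The optional global-statistics phase corresponds to the dfs_query_then_fetch search type. The data layout here is an assumption for illustration.

```python
import heapq


def global_top_hits(local_results, k):
    """Merge per-shard top hits into the global top k.

    local_results: {shard_id: [(score, doc_id), ...]}. Each shard only needs to
    return its own top k for the merged top k to be correct.
    Returns [(score, shard_id, doc_id), ...], best first; the fetch phase would
    then load fields for exactly these documents from their shards.
    """
    candidates = (
        (score, shard, doc_id)
        for shard, hits in local_results.items()
        for score, doc_id in hits
    )
    return heapq.nlargest(k, candidates, key=lambda c: c[0])
```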
  20. Aggregator Nodes
     (Diagram: Node 1 (data), Node 2 (data) and Node 3, which receives the requests of the search client)
  21. Aggregator Nodes
     (Diagram: Node 1 (data), Node 2 (data) and Node 3 with the client; annotated “Indexing” and “preferable”)
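
A sketch of the node roles from these diagrams as ES 1.x elasticsearch.yml flags (node.master, node.data). The role names and the helper are hypothetical; the data role assumes that dedicated masters exist, so data nodes are not master-eligible.

```python
NODE_ROLES = {
    "dedicated_master": {"node.master": True, "node.data": False},
    "data": {"node.master": False, "node.data": True},
    # Aggregator/client node: coordinates searches, holds no data, never master
    "client": {"node.master": False, "node.data": False},
}


def role_yml(role):
    """Render the role flags for one node as elasticsearch.yml lines."""
    return "".join(f"{key}: {str(value).lower()}\n" for key, value in NODE_ROLES[role].items())
```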
  22. Java Clients
     − NodeClient
       − Joins the cluster as a client node
       − Potentially saves a network hop
       − Will participate in distributed searches
     − TransportClient
       − More lightweight than NodeClient
     − Some HTTP client
       − Smaller memory footprint
       − Pay attention to settings: chunking, long-lived HTTP connections
  23. Some Stories from Production
     − The close/open gamble
     − Last resort single node
     − The devastating query
     − About upgrades
  24. Designing for Scalability
     − Think about scaling right from the start
       − Fixed number of shards per index
       − Shard key cannot be changed later
       − Distributed searches are expensive
     − Patterns in the data can be used for optimization
       − Time-based data
       − User-based data
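
A simplified model of why the number of shards and the shard key are fixed: documents are routed with shard = hash(routing) % number_of_primary_shards. Elasticsearch uses its own hash function; crc32 is only a stand-in here. Changing the shard count moves most keys to different shards, which would invalidate the location of already indexed documents.

```python
import zlib


def shard_for(routing_key, number_of_primary_shards):
    """Simplified routing model; crc32 stands in for Elasticsearch's hash."""
    return zlib.crc32(routing_key.encode("utf-8")) % number_of_primary_shards


keys = [f"user_{i}" for i in range(100)]
# How many keys would land elsewhere if the index had 6 instead of 5 shards
moved = sum(shard_for(k, 5) != shard_for(k, 6) for k in keys)
```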
  25. User-based Data: Separate Indexes
     (Diagram: Index 1, Index 2, ..., Index N, one per user: User 1, User 2, ..., User N)
     − Disadvantage: resource consumption, larger cluster state
  26. User-based Data: Shared Index
     (Diagram: Shard 1, Shard 2, ..., Shard M; a search by user 1 goes to all shards, filtering by user 1)
     − Disadvantage: distributed search
  27. User-based Data: Shared Index with Routing
     (Diagram: Shard 1, Shard 2, ..., Shard M, each holding several users out of User 1 to User N; a search by user 1 goes only to the shard holding user 1, filtering by user 1)
     − Disadvantage: at most one shard per user (capacity)
  28. User-based Data: Aliases
     − With aliases, the chosen approach can be hidden from clients
       − Aliases can even carry filter and routing information
       − Present separate “user” indexes (aliases) to the client
     − Advantage
       − Flexibility: adapt the mapping to physical indexes/shards on demand
     − Limitation
       − A huge number of users means lots of aliases (cluster state)
       − Still much better than a huge number of indexes
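
A sketch of an ES 1.x _aliases request that presents one user's slice of a shared, routed index as if it were a separate index. The index name and the alias naming scheme are hypothetical; the filter applies to searches through the alias, while routing applies to both searches and indexing.

```python
def user_alias_actions(index, user_id):
    """Body for POST /_aliases: one filtered, routed alias per user."""
    return {
        "actions": [
            {
                "add": {
                    "index": index,
                    "alias": f"user_{user_id}",                # hypothetical naming scheme
                    "filter": {"term": {"user_id": user_id}},  # restricts searches via the alias
                    "routing": str(user_id),                   # sends requests to the user's shard
                }
            }
        ]
    }
```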
  29. Zero Downtime Migration
     − Possible reasons
       − Backwards-incompatible mapping changes
       − Index/shard reaches its capacity
     − Needs a lot of careful thinking
       − Especially challenging if the update API is used
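
A sketch of the final step of such a migration: once the new index is built (for example by reading the old one with scan+scroll and writing with the bulk API), a single _aliases request moves the alias atomically, so clients never see a gap. Index and alias names are hypothetical; keeping concurrent updates consistent during the copy is the hard part the slide warns about and is not handled here.

```python
def swap_alias(alias, old_index, new_index):
    """Body for POST /_aliases: both actions are applied atomically."""
    return {
        "actions": [
            {"remove": {"index": old_index, "alias": alias}},
            {"add": {"index": new_index, "alias": alias}},
        ]
    }
```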
  30. Questions?
     Dr. rer. nat. Patrick Peschlow
     codecentric AG
     Merscheider Straße 1
     42699 Solingen
     tel +49 (0) 212.23 36 28 54
     fax +49 (0) 212.23 36 28 79
     patrick.peschlow@codecentric.de
     www.codecentric.de