
Elasticsearch Lessons Learned


Presentation held at the code.talks conference in Hamburg on October 9, 2014. Apart from the slides, the presentation featured various examples demonstrated using the Elasticsearch Marvel/Sense plugin. The slides contain a link to a gist showing those examples.


Patrick Peschlow

October 09, 2014

Transcript

  1. codecentric AG Patrick Peschlow Elasticsearch Lessons Learned

  2. Beginner's lesson: Quick introduction

  3. Lessons learned: Map carefully

  4. Map carefully
     - Disable the _all field; you definitely don't want it.
     - Keep the _source field enabled and don't set any fields to "store".
     - Disable dynamic mapping (except where it is really needed).
     - Choose analyzers carefully (for many fields, "not_analyzed" might be enough).
     - Consider mapping fields more than once.
     - Existing field mappings cannot be changed without deleting the type (new fields can still be added).
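The mapping advice above can be sketched as an index-creation body. This is a minimal sketch, assuming Elasticsearch 1.x mapping syntax (the version current at the time of the talk); the type name "product" and the fields "sku" and "title" are hypothetical. The Python code only builds the JSON body; it does not send it.

```python
import json

def product_index_body():
    """Index-creation body for a hypothetical "product" type (ES 1.x syntax)."""
    return {
        "mappings": {
            "product": {
                # Disable the catch-all _all field.
                "_all": {"enabled": False},
                # Reject documents containing unmapped fields instead of mapping them dynamically.
                "dynamic": "strict",
                # _source stays enabled (the default); no field sets "store".
                "properties": {
                    # Exact-match identifier: no analysis needed.
                    "sku": {"type": "string", "index": "not_analyzed"},
                    # Mapped twice: analyzed for full-text search, plus a
                    # not_analyzed "raw" sub-field for sorting and aggregations.
                    "title": {
                        "type": "string",
                        "analyzer": "english",
                        "fields": {
                            "raw": {"type": "string", "index": "not_analyzed"}
                        },
                    },
                },
            }
        }
    }

if __name__ == "__main__":
    # Body for PUT /products (the index name is hypothetical).
    print(json.dumps(product_index_body(), indent=2))
```

The multi-field is what "mapping a field more than once" looks like in 1.x: queries use `title`, while sorting uses `title.raw`.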
  5. Lessons learned: Understand filters

  6. Understand filters
     - Use filters instead of queries whenever you don't need scoring
       - Elasticsearch is able to cache the results of filters
     - Tricky caching behavior
       - Most simple filters are cached by default, but some are not (e.g., geo filters)
       - Compound filters (bool/and/or/not) are not cached
       - You can still explicitly request caching by setting _cache
       - Bool filters query the cache for their (sub-)filters, but and/or/not filters don't
       - But: this topic seems to be a moving target
     - Consider the scope of filters
       - Apply them to the query, to facets/aggregations, or both?
       - Often a "filtered query" is what you need
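A filtered query combining a scoring query with cached, non-scoring filters might look like the following. This is a minimal sketch, assuming Elasticsearch 1.x query DSL (the `filtered` query and the `_cache` flag were removed in later versions); the field names "title", "status" and "location" are hypothetical.

```python
def filtered_search_body(text, status, lat, lon, distance="10km"):
    """ES 1.x "filtered" query: a scoring query plus non-scoring filters.

    Field names (title, status, location) are hypothetical.
    """
    return {
        "query": {
            "filtered": {
                # Only this part contributes to the relevance score.
                "query": {"match": {"title": text}},
                # Filters restrict the hits without scoring them.
                "filter": {
                    "bool": {
                        "must": [
                            # Simple term filters are cached by default.
                            {"term": {"status": status}},
                            # Geo filters are not cached by default, so request it.
                            {
                                "geo_distance": {
                                    "distance": distance,
                                    "location": {"lat": lat, "lon": lon},
                                    "_cache": True,
                                }
                            },
                        ]
                    }
                },
            }
        }
    }
```

Because the filter sits inside the query, it also restricts what facets and aggregations see; a top-level `post_filter` would restrict only the returned hits.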
  7. Lessons learned: Use analysis tooling

  8. Use analysis tooling
     - Use the explain feature for search result scores
     - Use the validate/explain feature to see what the query looks like after rewriting
     - Make sure your analyzers work correctly
       - Use the analyze API or the "inquisitor" plugin
       - And maybe the "extended-analyze" plugin
     - When in doubt, take a look at the terms in your index
       - Use the Lucene tool "Luke": http://rosssimpson.com/blog/2014/05/06/using-luke-with-elasticsearch/
       - The "skywalker" plugin is also useful for understanding how Elasticsearch indexes documents
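The three APIs mentioned above can be called as in the following sketch, which only builds (method, path, body) tuples without sending them. It assumes Elasticsearch 1.x, where the analyze API takes `field` and `text` as query-string parameters; the index "products" and field "title" are hypothetical.

```python
from urllib.parse import urlencode

def analyze_request(index, field, text):
    """Analyze API: show the tokens the analyzer mapped for `field` produces."""
    return ("GET", f"/{index}/_analyze?" + urlencode({"field": field, "text": text}), None)

def validate_request(index, query):
    """Validate API with explain: show the query after rewriting."""
    return ("GET", f"/{index}/_validate/query?explain", {"query": query})

def explain_search_body(query):
    """Search body asking for a score explanation of every hit."""
    return {"explain": True, "query": query}
```

The tuples can be sent with any HTTP client or typed into Sense as-is.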
  9. Lessons learned: Create regression tests

  10. Create regression tests
     - Set up a comprehensive test suite
       - Test expectations about matches
       - Prevent regressions when changing or modifying analyzers
     - The Elasticsearch Java client is embeddable
       - Consider using it for a unit test suite even if you don't otherwise use Java
     - Try it by solving the "mapping challenge": https://github.com/peschlowp/elasticsearch-mapping-challenge
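A table of match expectations is one way to structure such a suite. The sketch below is written with Python's `unittest`; the `search` function and the corpus are hypothetical stand-ins. In a real suite, `search` would query a test index through a client (for example the embedded Java node mentioned above); here a toy lowercase-and-whitespace matcher replaces Elasticsearch so the snippet runs on its own.

```python
import unittest

# Hypothetical test corpus: document id -> indexed text.
DOCS = {
    "1": "Elasticsearch in Action",
    "2": "Lucene in Action",
    "3": "Mastering Elasticsearch",
}

def search(query):
    """Stand-in for a real search against a test index.

    Toy analysis: lowercase + whitespace split; a document matches
    if it contains every query term.
    """
    terms = query.lower().split()
    return {doc_id for doc_id, text in DOCS.items()
            if all(t in text.lower().split() for t in terms)}

# Expectations: query -> exact set of document ids that must match.
EXPECTATIONS = [
    ("elasticsearch", {"1", "3"}),
    ("ACTION", {"1", "2"}),
    ("lucene action", {"2"}),
    ("solr", set()),
]

class MatchExpectations(unittest.TestCase):
    def test_expected_matches(self):
        for query, expected in EXPECTATIONS:
            with self.subTest(query=query):
                self.assertEqual(search(query), expected)
```

Saved as a module, it runs with `python -m unittest`. When an analyzer changes, every row of the table is rechecked.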
  11. Lessons learned: Know some Lucene basics

  12-26. Know some Lucene basics (a diagram built up step by step across these slides)
     - Lucene: indexed documents are written to a new segment by flush(), which is executed heuristically (or explicitly via NRT). Flushed segments are visible to newly opened readers and, if desired, via near-real-time (NRT) reader reopening. commit() is an explicit call (a transaction) that syncs the segments to disk.
     - Elasticsearch: indexed documents are first persisted (in the transaction log). refresh(), executed regularly, performs a Lucene segment flush() plus a reader reopen for NRT search. Elasticsearch's flush(), executed heuristically, performs a segment flush() and a Lucene commit(), syncing the segments to disk.
  27. Know some Lucene basics
     - After a flush, all documents are persisted and searchable, so the transaction log can be cleared.
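The write path from these slides can be modeled in a few lines. This is a toy model of the concepts (buffer, segments, NRT reader, commit, transaction log), not Elasticsearch or Lucene code; the class and method names are hypothetical.

```python
class ToyShard:
    """Toy model of the write path (not real Elasticsearch code).

    index()   -> document goes to an in-memory buffer and the transaction log
    refresh() -> buffer becomes a new segment, reader reopened (NRT): searchable
    flush()   -> refresh + commit (segments synced to disk), translog cleared
    """

    def __init__(self):
        self.buffer = []      # indexed, but not yet in a segment
        self.segments = []    # flushed segments (each a list of docs)
        self.searchable = []  # segments visible to the current reader
        self.committed = []   # segments synced to disk by commit()
        self.translog = []    # operations persisted for recovery

    def index(self, doc):
        self.buffer.append(doc)
        self.translog.append(doc)

    def refresh(self):
        # Lucene segment flush() ...
        if self.buffer:
            self.segments.append(list(self.buffer))
            self.buffer.clear()
        # ... plus reopening the reader for near-real-time search.
        self.searchable = list(self.segments)

    def flush(self):
        # Elasticsearch flush: segment flush + commit, then clear the translog.
        self.refresh()
        self.committed = list(self.segments)
        self.translog.clear()

    def search(self, doc):
        return any(doc in segment for segment in self.searchable)
```

Between `refresh()` and `flush()`, a document is searchable but lives on disk only in the transaction log, which is why the log can be cleared only after the commit.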
  28. Lessons learned: Understand updates

  29. Understand updates
     - With Lucene, we have Update = Delete + Add
     - Elasticsearch offers two types of update: "partial document" and "script"
       - Internally, this is still delete + add
       - This means even small updates might take a while
     - Partial document update trickiness
       - Updated fields are replaced, except for inner objects, which are merged
       - To replace inner objects, consider wrapping them in a single-element array
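The merge rule for partial document updates can be illustrated with a toy model in plain Python. This mirrors the behavior described on the slide; it is not Elasticsearch's implementation, and the function name is hypothetical.

```python
import copy

def apply_partial_update(source, partial):
    """Toy model of combining a partial document with the existing _source.

    Values are replaced, except that when both the old and the new value
    are objects (dicts), they are merged recursively. Arrays are replaced whole.
    """
    result = copy.deepcopy(source)
    for key, new_value in partial.items():
        old_value = result.get(key)
        if isinstance(old_value, dict) and isinstance(new_value, dict):
            result[key] = apply_partial_update(old_value, new_value)
        else:
            result[key] = copy.deepcopy(new_value)
    return result

doc = {"name": "Alice", "address": {"city": "Hamburg", "zip": "20095"}}
# Inner object is merged: "zip" survives.
merged = apply_partial_update(doc, {"address": {"city": "Solingen"}})
# Wrapped in a single-element array: the whole value is replaced.
replaced = apply_partial_update(doc, {"address": [{"city": "Solingen"}]})
```

Here `merged` keeps the old zip code, while `replaced` holds only the new address, which is the effect of the single-element-array trick.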
  30. Lessons learned: Understand relations

  31. Understand relations
     - Lucene documents are flat
       - Arrays of inner objects are flattened when indexing
       - One-to-many relationships cannot be modeled this way
     - Elasticsearch offers two alternatives: nested objects and parent/child mapping
     - Parent/child mapping is very flexible
       - Documents stay separate
       - They may be queried together
     - Nested objects
       - Better performance, but the documents are combined into one
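Flattening loses the association between fields of the same inner object, as the toy model below shows. The `flatten` and `matches_flat` functions are hypothetical illustrations, not Elasticsearch code; `NESTED_MAPPING` is a sketch of the 1.x mapping that keeps each comment as its own hidden document.

```python
def flatten(doc):
    """Toy model of indexing an object with an array of inner objects:
    each field path collects all of its values, forgetting which object they came from."""
    flat = {}

    def add(path, value):
        if isinstance(value, dict):
            for key, inner in value.items():
                add(f"{path}.{key}" if path else key, inner)
        elif isinstance(value, list):
            for item in value:
                add(path, item)
        else:
            flat.setdefault(path, []).append(value)

    add("", doc)
    return flat

def matches_flat(flat, conditions):
    """True if every field contains the requested value somewhere."""
    return all(value in flat.get(field, []) for field, value in conditions.items())

# Mapping sketch (ES 1.x syntax, hypothetical fields) that avoids the problem.
NESTED_MAPPING = {
    "properties": {
        "comments": {
            "type": "nested",
            "properties": {
                "author": {"type": "string", "index": "not_analyzed"},
                "stars": {"type": "integer"},
            },
        }
    }
}

post = {"title": "Post", "comments": [{"author": "alice", "stars": 5},
                                      {"author": "bob", "stars": 1}]}
flat = flatten(post)
# False match: alice gave 5 stars, yet "alice AND 1 star" matches the flat document.
false_match = matches_flat(flat, {"comments.author": "alice", "comments.stars": 1})
```

With the nested mapping, a `nested` query evaluates both conditions against the same comment, so this false match disappears.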
  32. Lessons learned: Embrace aliases

  33. Embrace aliases
     - A logical name for one or more Elasticsearch indexes
       - Decouples the client view from physical storage
     - Use cases
       - (Read-only) views on one or multiple indexes
       - Dynamically split indexes without affecting clients
       - Zero-downtime reindexing
     - Limitations
       - Writes are only permitted for aliases that point to a single index
     - Recommendation: use aliases right from the start
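The aliases API accepts a list of actions that are applied atomically, which is what makes zero-downtime reindexing possible. The sketch below builds request bodies for `POST /_aliases`; the alias and index names are hypothetical.

```python
def swap_alias_actions(alias, old_index, new_index):
    """Body for POST /_aliases: atomically repoint an alias to a new index."""
    return {
        "actions": [
            {"remove": {"index": old_index, "alias": alias}},
            {"add": {"index": new_index, "alias": alias}},
        ]
    }

def read_view_actions(alias, indexes):
    """Body for POST /_aliases: one (read-only) alias spanning several indexes."""
    return {"actions": [{"add": {"index": index, "alias": alias}} for index in indexes]}
```

A typical reindexing flow: create `products_v2`, fill it, then send `swap_alias_actions("products", "products_v1", "products_v2")`. Clients that only know `products` never notice the switch.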
  34. Lessons learned: Don't underestimate reindexing

  35. Lessons learned: Adjust default settings

  36. Adjust default settings
     - Choose a unique cluster name
     - Consider using unicast discovery
     - Configure (at least) these settings according to your requirements:
       - minimum_master_nodes (discovery.zen.minimum_master_nodes)
       - gateway.recover_after_nodes
       - gateway.recover_after_time
       - gateway.expected_nodes
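These settings can be derived from the cluster layout. This is a minimal sketch, assuming Elasticsearch 1.x setting names in `elasticsearch.yml`; the cluster name, host names and recovery values are hypothetical examples. The usual rule for `minimum_master_nodes` is a quorum of the master-eligible nodes, which prevents a split brain.

```python
import json

def minimum_master_nodes(master_eligible_nodes):
    """Quorum of master-eligible nodes (n // 2 + 1), guarding against split brain."""
    if master_eligible_nodes < 1:
        raise ValueError("need at least one master-eligible node")
    return master_eligible_nodes // 2 + 1

def node_settings(cluster_name, unicast_hosts, master_eligible_nodes,
                  expected_nodes, recover_after_nodes, recover_after_time):
    """elasticsearch.yml settings (Elasticsearch 1.x names) as a flat dict."""
    return {
        "cluster.name": cluster_name,
        # Use unicast instead of multicast discovery.
        "discovery.zen.ping.multicast.enabled": False,
        "discovery.zen.ping.unicast.hosts": unicast_hosts,
        "discovery.zen.minimum_master_nodes": minimum_master_nodes(master_eligible_nodes),
        # Don't start recovery before this many nodes have joined ...
        "gateway.recover_after_nodes": recover_after_nodes,
        # ... then start as soon as all expected nodes are present,
        # or after recover_after_time otherwise.
        "gateway.expected_nodes": expected_nodes,
        "gateway.recover_after_time": recover_after_time,
    }

def to_yaml(settings):
    """Render flat settings as YAML lines (JSON scalars and lists are valid YAML)."""
    return "\n".join(f"{key}: {json.dumps(value)}" for key, value in settings.items())
```

For three master-eligible nodes, the quorum is 2: one node may fail, but two isolated halves can never both elect a master.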
  37. Lessons learned: Think about read preference

  38. Think about read preference
     - Search has a "preference" parameter
       - It controls which shard copy (primary or some replica) to contact
     - By default, "round robin"
       - May cause some nasty effects
       - Documents marked as deleted still affect scoring
       - If segment merging differs between shard copies, they may return different search results
     - Set "preference" according to your needs
       - Possible values: _local, _primary, only some shards or nodes, an arbitrary string
       - One approach: always direct a user to the same shard
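Directing a user to the same shard copies only requires passing a stable custom string as `preference`. The sketch below builds the search path; the index name and the `user-<id>` convention are hypothetical, while `_primary` and `_local` are built-in values in Elasticsearch 1.x.

```python
from urllib.parse import urlencode

def search_path(index, preference):
    """Path for a search request with an explicit "preference"."""
    return f"/{index}/_search?" + urlencode({"preference": preference})

def user_preference(user_id):
    """Hypothetical convention: one stable preference string per user."""
    return f"user-{user_id}"
```

Because the same string always selects the same shard copies, a user who pages through results or repeats a search sees consistent scores, even if replicas have merged segments differently.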
  39. Lessons learned: Satisfy scaling requirements

  40. Satisfy scaling requirements
     - The number of shards needs to be fixed at index creation time
     - Choose the number of shards based on estimates and measurements
       - A little overallocation is OK, yet shards don't come for free
     - If it turns out you need to scale higher, index aliases may help
       - Create another, identically configured index and add new documents to it
       - Define an alias so that search considers both indexes
       - But: this requires additional effort for updates and deletes (which index to address?)
       - Alternative: migrate to a new index with more shards
     - Remember:
       - Searching 1 index with 50 shards =~ searching 50 indexes with 1 shard each
       - In both cases, 50 Lucene indexes are searched
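The number of shards is fixed because documents are routed by `shard = hash(routing) % number_of_shards`, with the document id as the default routing value. The toy sketch below uses `zlib.crc32` as a swapped-in stand-in for Elasticsearch's own hash function and shows how many documents would land on a different shard if the shard count changed.

```python
import zlib

def shard_for(doc_id, number_of_shards):
    """Toy routing: hash(id) % number_of_shards (crc32 stands in for ES's hash)."""
    return zlib.crc32(doc_id.encode("utf-8")) % number_of_shards

def fraction_moved(doc_ids, old_shards, new_shards):
    """Share of documents whose target shard changes with a new shard count."""
    moved = sum(1 for d in doc_ids
                if shard_for(d, old_shards) != shard_for(d, new_shards))
    return moved / len(doc_ids)

ids = [f"doc-{i}" for i in range(1000)]
moved = fraction_moved(ids, 5, 6)
```

Going from 5 to 6 shards relocates most documents (roughly five out of six with a uniform hash). Existing documents could no longer be found where routing expects them, so more shards means a new index and reindexing.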
  41. Lessons learned: Give plugins a try

  42. Resources
     - The official blog: http://www.elasticsearch.org/blog/
     - The official book: http://www.elasticsearch.org/guide/en/elasticsearch/guide/current/index.html
     - The official reference: http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/index.html
     - The official Google Group: https://groups.google.com/d/forum/elasticsearch
     - The Sense examples shown in this talk: https://gist.github.com/peschlowp/560411af9bac3be909c0
  43. Questions?
     Dr. rer. nat. Patrick Peschlow
     codecentric AG
     Merscheider Straße 1
     42699 Solingen
     tel +49 (0) 212.23 36 28 54
     fax +49 (0) 212.23 36 28 79
     patrick.peschlow@codecentric.de
     www.codecentric.de