Elasticsearch Lessons Learned

Presentation given at the code.talks conference in Hamburg on October 9, 2014. Apart from the slides, the presentation featured various examples demonstrated with the Elasticsearch Marvel/Sense plugin. The slides contain a link to a gist with those examples.

Patrick Peschlow

October 09, 2014

Transcript

  1. codecentric AG
    Patrick Peschlow
    Elasticsearch Lessons Learned

  2. Beginner’s lesson
    Quick introduction

  3. Lessons learned
    Map carefully

  4. Map carefully
    − Disable the _all field; you definitely don’t want it
    − Keep the _source field enabled and don’t mark any fields as stored (“store”: true)
    − Disable dynamic mapping (except where really needed)
    − Choose analyzers carefully (for many fields, “not_analyzed” might be enough)
    − Consider mapping fields more than once (e.g., analyzed for full-text search plus not_analyzed for sorting and aggregations)
    − Existing field mappings cannot be changed without deleting the type (adding new fields is possible)
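A minimal sketch of a mapping that follows these recommendations, written as the Python dict you would send as the body of an index-creation request (`PUT /<index>`). It assumes Elasticsearch 1.x mapping syntax, current at the time of this talk; the type and field names (`product`, `title`, `status`) and the choice of the `english` analyzer are hypothetical.

```python
import json

# Hypothetical type and field names; Elasticsearch 1.x mapping syntax assumed.
create_index_body = {
    "mappings": {
        "product": {
            # Disable the catch-all _all field.
            "_all": {"enabled": False},
            # Reject documents with unmapped fields instead of guessing a mapping.
            "dynamic": "strict",
            "properties": {
                # Mapped twice: analyzed for full-text search, plus a
                # not_analyzed "raw" sub-field for sorting and aggregations.
                "title": {
                    "type": "string",
                    "analyzer": "english",
                    "fields": {
                        "raw": {"type": "string", "index": "not_analyzed"}
                    },
                },
                # Exact-match values only: no analysis needed.
                "status": {"type": "string", "index": "not_analyzed"},
            },
            # _source is not mentioned, so it stays enabled (the default),
            # and no field sets "store": true.
        }
    }
}

if __name__ == "__main__":
    print(json.dumps(create_index_body, indent=2))
```

`"dynamic": "strict"` rejects documents containing unmapped fields; `"dynamic": false` would instead keep such fields in `_source` without indexing them.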

  5. Lessons learned
    Understand filters

  6. Understand filters
    − Use filters instead of queries whenever you don’t need scoring
      − Elasticsearch is able to cache the results of filters
    − Tricky caching behavior
      − Most simple filters are cached by default, but some are not (e.g., geo filters)
      − Compound filters (bool/and/or/not) are not cached
      − You can still explicitly request caching by setting “_cache”: true
      − Bool filters query the cache for their (sub-)filters, but and/or/not filters don’t
    − But: This topic seems to be a moving target
    − Consider the scope of filters
      − Apply to query, facets/aggregations, or both?
      − Often a “filtered” query is what you need
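A sketch of a 1.x-style “filtered” query as a Python dict (search request body): only the `match` clause contributes to scoring, while the `bool` filter just narrows down the candidates and explicitly requests caching via `_cache`. The field names (`title`, `status`, `published`, `tags`) are hypothetical.

```python
import json

def filtered_search(text, status, since):
    """Search body: the match clause is scored, the bool filter only restricts hits."""
    return {
        "query": {
            "filtered": {
                # Scored part: full-text relevance on a hypothetical "title" field.
                "query": {"match": {"title": text}},
                # Unscored part: yes/no conditions that Elasticsearch can cache.
                "filter": {
                    "bool": {
                        "must": [
                            {"term": {"status": status}},
                            {"range": {"published": {"gte": since}}},
                        ],
                        # Compound filters are not cached by default; ask explicitly.
                        "_cache": True,
                    }
                },
            }
        },
        # Computed over the same filtered document set as the hits.
        "aggs": {"popular_tags": {"terms": {"field": "tags"}}},
    }

if __name__ == "__main__":
    print(json.dumps(filtered_search("elasticsearch", "published", "2014-01-01"), indent=2))
```

Because the filter sits inside the query, the `tags` aggregation sees only the filtered documents as well. To restrict the hits but not the aggregations, 1.x provides a top-level `post_filter` instead.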

  7. Lessons learned
    Use analysis tooling

  8. Use analysis tooling
    − Use the explain feature to understand search result scores
    − Use the validate API (with explain) to see what the query looks like after rewriting
    − Make sure your analyzers work correctly
      − Use the analyze API or the “inquisitor” plugin
      − And maybe the “extended-analyze” plugin
    − When in doubt, take a look at the terms in your index
      − Use the Lucene tool “Luke”
      − http://rosssimpson.com/blog/2014/05/06/using-luke-with-elasticsearch/
      − The “skywalker” plugin is also useful to understand Elasticsearch indexing
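The first three tools map onto plain REST requests. The sketch below builds them as (method, path, body) tuples that could be pasted into Sense or sent with any HTTP client; it assumes the Elasticsearch 1.x request shapes, and the index name `products` and field `title` are hypothetical.

```python
import json
from urllib.parse import urlencode

def analyze_request(index, analyzer, text):
    # Analyze API: returns the tokens the given analyzer produces for the text.
    query_string = urlencode({"analyzer": analyzer, "text": text})
    return "GET", f"/{index}/_analyze?{query_string}", None

def validate_request(index, query):
    # Validate API with ?explain: shows the query after it has been rewritten.
    return "GET", f"/{index}/_validate/query?explain", {"query": query}

def explained_search(index, query):
    # "explain": true attaches a score explanation to every hit.
    return "GET", f"/{index}/_search", {"query": query, "explain": True}

if __name__ == "__main__":
    query = {"match": {"title": "Running shoes"}}
    for method, path, body in (
        analyze_request("products", "english", "Running shoes"),
        validate_request("products", query),
        explained_search("products", query),
    ):
        print(method, path)
        if body is not None:
            print(json.dumps(body, indent=2))
```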

  9. Lessons learned
    Create regression tests

  10. Create regression tests
    − Set up a comprehensive test suite
      − Test expectations about matches
      − Prevent regressions when changing analyzers
    − The Elasticsearch Java client is embeddable
      − Consider using it for a unit test suite even if you don’t use Java otherwise
    − Try it by solving the “mapping challenge”!
      − https://github.com/peschlowp/elasticsearch-mapping-challenge
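The talk suggests an embedded Elasticsearch node for such a suite. The sketch below only shows the table-driven structure of “expectations about matches”, in Python; `toy_analyze` and `toy_matches` are hypothetical stand-ins that a real suite would replace with calls to the analyze API or searches against a test index.

```python
import re

def toy_analyze(text):
    """Hypothetical stand-in for an analyzer: lowercase, split on non-alphanumerics.
    A real suite would call the analyze API or an embedded node instead."""
    return [token for token in re.split(r"[^0-9a-z]+", text.lower()) if token]

def toy_matches(query, document):
    """Hypothetical stand-in for a search: match if any analyzed term is shared."""
    return bool(set(toy_analyze(query)) & set(toy_analyze(document)))

# Expectations about matches: (query, document, should_match).
EXPECTATIONS = [
    ("Elasticsearch", "elasticsearch lessons learned", True),
    ("e-mail", "Send an E-Mail", True),
    ("lucene", "Elasticsearch lessons learned", False),
]

def failed_expectations(matches, expectations):
    """Return every expectation the given match function violates."""
    return [(q, d, want) for q, d, want in expectations if matches(q, d) != want]

if __name__ == "__main__":
    failures = failed_expectations(toy_matches, EXPECTATIONS)
    for query, document, want in failures:
        print(f"regression: {query!r} vs {document!r}, expected match={want}")
    print(f"{len(failures)} failed expectation(s)")
```

Because the expectations are plain data, the same table can be rerun unchanged after every analyzer change, which is exactly where regressions tend to appear.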

  11. Lessons learned
    Know some Lucene basics

  12.–18. Know some Lucene basics
    [Diagram, built up across several slides: new documents are written into a Segment by flush(), which is executed heuristically (or explicitly via NRT). commit() is an explicit call (transaction) after which the segments are Synced to Disk. Committed segments are visible to newly opened readers; if desired, flushed segments become visible earlier via NRT.]

  19.–26. Know some Lucene basics
    [Diagram, built up across several slides, showing the corresponding Elasticsearch operations: indexed documents are first Persisted (in the transaction log). refresh(), executed regularly, performs a Lucene flush() into a new Segment and reopens the reader for NRT search. Elasticsearch’s flush(), executed heuristically, performs a Lucene commit(), after which the segments are Synced to Disk.]

  27. Know some Lucene basics
    All documents persisted and searchable. Transaction log can be cleared.
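The interplay shown in the diagrams can be sketched as a small Python model. It is a deliberate simplification, not Elasticsearch’s implementation: `ToyShard` and its fields are hypothetical, segments are plain lists, and a flush is modeled as refresh plus commit so that the final state matches the last slide.

```python
from dataclasses import dataclass, field

@dataclass
class ToyShard:
    """Toy model of one shard; segments are plain lists of documents."""
    buffer: list = field(default_factory=list)       # indexed, not yet in a segment
    translog: list = field(default_factory=list)     # "Persisted": replayable after a crash
    uncommitted: list = field(default_factory=list)  # flushed segments, searchable via NRT
    committed: list = field(default_factory=list)    # segments synced to disk

    def index(self, doc):
        self.buffer.append(doc)
        self.translog.append(doc)

    def refresh(self):
        # Lucene flush() of the buffer into a new segment + reopen the NRT reader.
        if self.buffer:
            self.uncommitted.append(list(self.buffer))
            self.buffer.clear()

    def flush(self):
        # Modeled as refresh + Lucene commit(); afterwards the translog can be cleared.
        self.refresh()
        self.committed.extend(self.uncommitted)
        self.uncommitted.clear()
        self.translog.clear()

    def searchable(self):
        # Documents visible to search: everything that has reached a segment.
        return [doc for segment in self.committed + self.uncommitted for doc in segment]

if __name__ == "__main__":
    shard = ToyShard()
    shard.index("a")
    shard.index("b")
    print(shard.searchable())  # [] until the next refresh
    shard.refresh()
    print(shard.searchable())  # ['a', 'b'], but not yet synced to disk
    shard.flush()
    print(shard.committed, shard.translog)
```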

  28. Lessons learned
    Understand updates

  29. Understand updates
    − With Lucene, Update = Delete + Add
    − Elasticsearch offers two types of update: “partial document” and “script”
      − Internally still delete + add
      − This means even small updates reindex the whole document and might take a while
    − Partial document update trickiness
      − Updated fields are replaced, except for inner objects, which are merged
      − To replace inner objects, consider wrapping them in a single-element array
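A partial update is sent as `{"doc": {...}}` to a document’s `_update` endpoint (`POST /<index>/<type>/<id>/_update` in 1.x). The sketch below models, in plain Python, how the partial document is merged into the stored source according to the rules above: inner objects are merged recursively, while scalar values and arrays are replaced. `merge_partial` is a hypothetical helper, not Elasticsearch code, and the field names are made up.

```python
import copy
import json

def merge_partial(source, partial):
    """Hypothetical model of the partial-update merge: inner objects (dicts) are
    merged recursively; scalar values and arrays (lists) replace the old value."""
    result = copy.deepcopy(source)
    for key, value in partial.items():
        if isinstance(value, dict) and isinstance(result.get(key), dict):
            result[key] = merge_partial(result[key], value)
        else:
            result[key] = copy.deepcopy(value)
    return result

if __name__ == "__main__":
    stored = {"name": "Alice", "address": {"city": "Hamburg", "street": "Main St"}}
    update = {"doc": {"address": {"city": "Solingen"}}}  # _update request body
    # The inner object is merged, so "street" survives.
    print(json.dumps(merge_partial(stored, update["doc"])))

    wrapped = {"name": "Alice", "address": [{"city": "Hamburg", "street": "Main St"}]}
    # The single-element array is replaced as a whole, so "street" is gone.
    print(json.dumps(merge_partial(wrapped, {"address": [{"city": "Solingen"}]})))
```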

  30. Lessons learned
    Understand relations

  31. Understand relations
    − Lucene documents are flat
      − Arrays of inner objects are flattened when indexing
      − One-to-many relationships cannot be modeled this way
    − Elasticsearch offers two alternatives: nested objects and parent/child mapping
    − Parent/child mapping is very flexible
      − Documents stay separate
      − They can still be queried together
    − Nested objects
      − Better performance, but documents are combined into one
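A sketch in plain Python of why flattening breaks one-to-many relations, using a hypothetical document with an array of `user` objects: once indexed, `user.first` and `user.last` are just two independent sets of values, so a query for first name “Alice” and last name “Smith” matches although no such user exists. `nested_matches` models what mapping `user` as `{"type": "nested"}` buys you: each inner object is matched on its own. All helper names are hypothetical.

```python
from collections import defaultdict

def flatten(doc, prefix=""):
    """Model how an object field is indexed: every path collects the values of
    all array elements, so which values belonged to the same object is lost."""
    fields = defaultdict(set)
    for key, value in doc.items():
        path = prefix + key
        items = value if isinstance(value, list) else [value]
        for item in items:
            if isinstance(item, dict):
                for sub_path, sub_values in flatten(item, path + ".").items():
                    fields[sub_path] |= sub_values
            else:
                fields[path].add(item)
    return fields

def flat_matches(fields, terms):
    """Bool query of term clauses (path -> value) against the flattened fields."""
    return all(value in fields.get(path, set()) for path, value in terms.items())

def nested_matches(doc, path, terms):
    """Nested-style matching: all terms must hold within one inner object."""
    return any(all(obj.get(k) == v for k, v in terms.items()) for obj in doc.get(path, []))

doc = {
    "group": "fans",
    "user": [{"first": "John", "last": "Smith"}, {"first": "Alice", "last": "White"}],
}

if __name__ == "__main__":
    fields = flatten(doc)
    print(flat_matches(fields, {"user.first": "Alice", "user.last": "Smith"}))  # True: a false positive
    print(nested_matches(doc, "user", {"first": "Alice", "last": "Smith"}))  # False
```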

  32. Lessons learned
    Embrace aliases

  33. Embrace aliases
    − A logical name for one or more Elasticsearch indexes
      − Decouples the client view from physical storage
    − Use cases
      − (Read-only) views on one or multiple indexes
      − Dynamically split indexes without affecting clients
      − Zero-downtime reindexing
    − Limitations
      − Writes are only permitted for aliases that point to a single index
    − Recommendation: Use aliases right from the start
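Zero-downtime reindexing with aliases, sketched as the body of a `POST /_aliases` request built in Python: the remove and add actions are applied atomically, so clients that only ever use the alias switch from the old index to the new one in a single step. The alias and index names (`products`, `products_v1`, `products_v2`, `logs`, `logs_2014_09`, `logs_2014_10`) are hypothetical.

```python
import json

def swap_alias(alias, old_index, new_index):
    """Body for POST /_aliases; both actions are applied atomically."""
    return {
        "actions": [
            {"remove": {"index": old_index, "alias": alias}},
            {"add": {"index": new_index, "alias": alias}},
        ]
    }

def view_alias(alias, indexes):
    """Body for POST /_aliases: one alias spanning several indexes, usable for
    searching (writes through it are rejected since it has more than one index)."""
    return {"actions": [{"add": {"index": index, "alias": alias}} for index in indexes]}

if __name__ == "__main__":
    # After reindexing products_v1 into products_v2 (e.g., with a new mapping),
    # switch the alias that all clients use.
    print(json.dumps(swap_alias("products", "products_v1", "products_v2"), indent=2))
    print(json.dumps(view_alias("logs", ["logs_2014_09", "logs_2014_10"]), indent=2))
```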

  34. Lessons learned
    Don’t underestimate reindexing

  35. Lessons learned
    Adjust default settings

  36. Adjust default settings
    − Choose a unique cluster name
    − Consider using unicast discovery (instead of the default multicast)
    − Configure (at least) these settings according to your requirements
      − discovery.zen.minimum_master_nodes
      − gateway.recover_after_nodes
      − gateway.recover_after_time
      − gateway.expected_nodes
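`minimum_master_nodes` is usually set to a quorum, i.e., a strict majority of the master-eligible nodes, so that a split network can never elect two masters. The Python sketch below computes that value and renders a few of the settings above as `elasticsearch.yml` lines. It assumes Elasticsearch 1.x setting names and that every listed host is master-eligible; the cluster name, host names, and the `recover_after_*` choices are hypothetical.

```python
def minimum_master_nodes(master_eligible_nodes):
    """Quorum: a strict majority of the master-eligible nodes."""
    if master_eligible_nodes < 1:
        raise ValueError("need at least one master-eligible node")
    return master_eligible_nodes // 2 + 1

def example_settings(cluster_name, hosts):
    """Hypothetical settings for a cluster whose nodes are all master-eligible."""
    nodes = len(hosts)
    return {
        "cluster.name": cluster_name,
        # Unicast discovery with an explicit host list instead of multicast.
        "discovery.zen.ping.multicast.enabled": False,
        "discovery.zen.ping.unicast.hosts": hosts,
        "discovery.zen.minimum_master_nodes": minimum_master_nodes(nodes),
        # Recover immediately once all nodes have joined; otherwise wait
        # 5 minutes after a quorum of nodes has joined.
        "gateway.expected_nodes": nodes,
        "gateway.recover_after_nodes": minimum_master_nodes(nodes),
        "gateway.recover_after_time": "5m",
    }

def render_yml(settings):
    """Render flat settings as elasticsearch.yml lines."""
    lines = []
    for key, value in settings.items():
        if isinstance(value, bool):
            value = str(value).lower()
        elif isinstance(value, list):
            value = "[" + ", ".join(f'"{v}"' for v in value) + "]"
        lines.append(f"{key}: {value}")
    return "\n".join(lines)

if __name__ == "__main__":
    print(render_yml(example_settings("search-prod", ["es1", "es2", "es3"])))
```

With two master-eligible nodes the quorum is 2, so losing either node prevents master election; three is the smallest count that tolerates one failure.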

  37. Lessons learned
    Think about read preference

  38. Think about read preference
    − Search has a “preference” parameter
      − It controls which shard copy (primary or some replica) is contacted
    − The default is round robin
      − This may cause some nasty effects
        − Documents marked as deleted still affect scoring until their segments are merged
        − Segment merging may differ between shard copies, so the same search may return different results depending on which copy answers
    − Set “preference” according to your needs
      − Possible values: _local, _primary, only specific shards or nodes (e.g., _shards:2,3 or _only_node:xyz), or an arbitrary string
      − One approach: Always direct a user to the same shard copies, e.g., by passing the user’s session ID as the preference string
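A sketch of search request paths carrying the `preference` parameter, built in Python. Passing a session ID as an arbitrary string means the same user keeps hitting the same shard copies (as long as the cluster layout does not change), so scores and ordering stay consistent between that user’s requests. The index name and session ID are hypothetical.

```python
from urllib.parse import urlencode

def search_path(index, preference):
    """Path of a search request whose shard selection is controlled by "preference"."""
    return f"/{index}/_search?" + urlencode({"preference": preference})

def user_search_path(index, session_id):
    """Use the session ID as an arbitrary preference string: the same string keeps
    hitting the same shard copies while the cluster layout stays unchanged."""
    return search_path(index, session_id)

if __name__ == "__main__":
    print(search_path("products", "_primary"))
    print(search_path("products", "_shards:2,3"))
    print(user_search_path("products", "session-4711"))
```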

  39. Lessons learned
    Satisfy scaling requirements

  40. Satisfy scaling requirements
    − The number of primary shards must be fixed at index creation time
    − Choose the number of shards based on estimates and measurements
      − A little overallocation is OK, yet shards don’t come for free
    − If it turns out you need to scale further, index aliases may help
      − Create another, identically configured index and add new documents to it
      − Define an alias so that searches cover both indexes
      − But: This requires additional effort for updates and deletes (which index to address?)
      − Alternative: Migrate to a new index with more shards
    − Remember:
      − Searching 1 index with 50 shards ≈ searching 50 indexes with 1 shard each
      − In both cases, 50 Lucene indexes are searched
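Why the shard count is fixed: Elasticsearch routes each document with `shard = hash(routing) % number_of_primary_shards`, where the routing value defaults to the document ID. The Python sketch below uses `zlib.crc32` as a stand-in for Elasticsearch’s own hash function and measures how many documents would map to a different shard if the count changed from 5 to 6; the document IDs are made up.

```python
import zlib

def shard_for(doc_id, number_of_primary_shards):
    """shard = hash(routing) % number_of_primary_shards, routing = document ID.
    zlib.crc32 is only a stand-in for Elasticsearch's own hash function."""
    return zlib.crc32(doc_id.encode("utf-8")) % number_of_primary_shards

def moved_fraction(doc_ids, old_shards, new_shards):
    """Fraction of documents that would belong to a different shard."""
    moved = sum(shard_for(d, old_shards) != shard_for(d, new_shards) for d in doc_ids)
    return moved / len(doc_ids)

if __name__ == "__main__":
    ids = [f"doc-{i}" for i in range(10_000)]
    print(f"{moved_fraction(ids, 5, 6):.0%} of documents would have to move")
```

For a uniform hash, roughly five out of six documents change shards in this case, which is why the count can only be changed by creating a new index and reindexing.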

  41. Lessons learned
    Give plugins a try

  42. Resources
    − The official blog: http://www.elasticsearch.org/blog/
    − The official book: http://www.elasticsearch.org/guide/en/elasticsearch/guide/current/index.html
    − The official reference: http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/index.html
    − The official Google Group: https://groups.google.com/d/forum/elasticsearch
    − The Sense examples shown in this talk: https://gist.github.com/peschlowp/560411af9bac3be909c0

  43. Questions?
    Dr. rer. nat. Patrick Peschlow
    codecentric AG
    Merscheider Straße 1
    42699 Solingen
    tel +49 (0) 212.23 36 28 54
    fax +49 (0) 212.23 36 28 79
    www.codecentric.de