Slide 1

Slide 1 text

codecentric AG Patrick Peschlow Elasticsearch Lessons Learned

Slide 2

Slide 2 text

Beginner’s lesson: Quick introduction

Slide 3

Slide 3 text

Lessons learned: Map carefully

Slide 4

Slide 4 text

Map carefully
− Disable the _all field, you definitely don’t want it
− Keep the _source field enabled and don’t mark any fields as stored
− Disable dynamic mapping (except where really needed)
− Choose analyzers carefully (for many fields, “not_analyzed” might be enough)
− Consider mapping fields more than once
− Existing mappings cannot be changed without deleting the type
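
The mapping advice above could look roughly like the following sketch, which builds a mapping body as a plain Python dictionary. It assumes the Elasticsearch 1.x mapping syntax that was current when this talk was given; the type name `product` and the field names are hypothetical and not taken from the talk.

```python
import json

# Hypothetical type "product"; Elasticsearch 1.x mapping syntax is assumed.
mapping = {
    "product": {
        "_all": {"enabled": False},    # disable the _all field
        "_source": {"enabled": True},  # keep _source enabled (the default)
        "dynamic": "strict",           # reject fields that are not mapped
        "properties": {
            # Exact-match identifier: no analysis needed.
            "sku": {"type": "string", "index": "not_analyzed"},
            # Mapped twice: analyzed for full-text search,
            # plus a not_analyzed sub-field for sorting and aggregations.
            "title": {
                "type": "string",
                "analyzer": "english",
                "fields": {
                    "raw": {"type": "string", "index": "not_analyzed"}
                },
            },
        },
    }
}

print(json.dumps(mapping, indent=2))
```

The `title.raw` sub-field is one way to read “consider mapping fields more than once”: the same value is indexed under two different analysis settings.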

Slide 5

Slide 5 text

Lessons learned: Understand filters

Slide 6

Slide 6 text

Understand filters
− Use filters instead of queries whenever you don’t need scoring
  − Elasticsearch is able to cache the results of filters
− Tricky caching behavior
  − Most simple filters are cached by default, but some are not (e.g., geo)
  − Compound filters (bool/and/or/not) are not cached
  − You can still explicitly request caching by setting _cache
  − Bool filters query the cache for their (sub-)filters, but and/or/not filters don’t
− But: This topic seems to be a moving target
− Consider the scope of filters
  − Apply to query, facets/aggregations, or both?
  − Often “filtered query” is what you need
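
A minimal sketch of a “filtered query” with a bool filter, again assuming Elasticsearch 1.x syntax. The field names (`title`, `category`, `price`) and the function name are hypothetical; the body is only constructed here, not sent to a cluster.

```python
import json


def filtered_query(text, category, max_price):
    """Build a 1.x-style filtered query: one scoring query plus non-scoring filters."""
    return {
        "query": {
            "filtered": {
                # Scored part: only this influences relevance.
                "query": {"match": {"title": text}},
                # Unscored part: yes/no decisions that can be cached.
                "filter": {
                    "bool": {
                        "must": [
                            {"term": {"category": category}},
                            {"range": {"price": {"lte": max_price}}},
                        ],
                        # Compound filters are not cached by default,
                        # so caching is requested explicitly here.
                        "_cache": True,
                    }
                },
            }
        }
    }


print(json.dumps(filtered_query("lucene", "books", 30), indent=2))
```

Because the filter sits inside the filtered query, it restricts both the hits and any facets or aggregations computed for the request.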

Slide 7

Slide 7 text

Lessons learned: Use analysis tooling

Slide 8

Slide 8 text

Use analysis tooling
− Use the explain feature for search result scores
− Use the validate/explain feature to see what the query looks like after rewriting
− Make sure your analyzers work correctly
  − Use the analyze API or the “inquisitor” plugin
  − And maybe the “extended-analyze” plugin
− When in doubt, take a look at the terms in your index
  − Use the Lucene tool “Luke”
  − http://rosssimpson.com/blog/2014/05/06/using-luke-with-elasticsearch/
  − The “skywalker” plugin is also useful to understand Elasticsearch indexing
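
The built-in tooling mentioned above is reachable over plain HTTP. The sketch below only assembles the requests (method, path, body) and does not contact a cluster. The 1.x-style endpoints are assumed, and the helper names and the index name in the usage lines are hypothetical.

```python
from urllib.parse import urlencode


def analyze_request(index, analyzer, text):
    """Analyze API: shows the terms an analyzer produces for a text."""
    path = "/%s/_analyze?%s" % (index, urlencode({"analyzer": analyzer}))
    return ("GET", path, text)


def validate_request(index, query):
    """Validate API with explain: shows the query after rewriting."""
    return ("GET", "/%s/_validate/query?explain=true" % index, {"query": query})


def explained_search(query):
    """Search body that asks for a score explanation per hit."""
    return {"explain": True, "query": query}


print(analyze_request("products", "english", "Running shoes"))
print(validate_request("products", {"match": {"title": "running"}}))
print(explained_search({"match": {"title": "running"}}))
```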

Slide 9

Slide 9 text

Lessons learned: Create regression tests

Slide 10

Slide 10 text

Create regression tests
− Set up a comprehensive test suite
  − Test expectations about matches
  − Prevent regressions when changing analyzers
− The Elasticsearch Java client is embeddable
  − Consider using it for a unit test suite even if not using Java otherwise
− Try it by solving the “mapping challenge”!
  − https://github.com/peschlowp/elasticsearch-mapping-challenge
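
One possible shape for “expectations about matches” is a table of queries and the document IDs each is expected to return. The sketch below is client-agnostic: `search` is any callable that runs a query against a test index and returns document IDs. The function name and the sample cases are hypothetical.

```python
def failed_expectations(search, cases):
    """Return the cases whose actual hits differ from the expected ones.

    `search` maps a query string to a list of document IDs.
    """
    failures = []
    for query, expected_ids in cases:
        actual_ids = sorted(search(query))
        if actual_ids != sorted(expected_ids):
            failures.append((query, sorted(expected_ids), actual_ids))
    return failures


# Hypothetical expectations: one (query, expected document IDs) pair per entry.
CASES = [
    ("running shoes", ["1", "2"]),
    ("shoe", ["1", "2", "3"]),
]
```

Running such a table after every analyzer or mapping change makes it visible which queries started or stopped matching.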

Slide 11

Slide 11 text

Lessons learned: Know some Lucene basics

Slide 12

Slide 12 text

Know some Lucene basics

Slide 13

Slide 13 text

Know some Lucene basics
Diagram labels: Segment; flush()

Slide 14

Slide 14 text

Know some Lucene basics
Diagram labels: Segment; flush()

Slide 15

Slide 15 text

Know some Lucene basics
Diagram labels: Segment; flush(); Segment; flush()

Slide 16

Slide 16 text

Know some Lucene basics
Diagram labels: Segment; flush(); Segment; flush(); commit(); Synced to Disk

Slide 17

Slide 17 text

Know some Lucene basics
Diagram labels: Visible to newly opened readers; Segment; flush(); Segment; flush(); commit(); Synced to Disk; If desired, visible via NRT

Slide 18

Slide 18 text

Know some Lucene basics
Diagram labels: Segment; flush(); Segment; flush(); commit(); Synced to Disk; Executed heuristically (or explicitly via NRT); Explicit call (transaction)

Slide 19

Slide 19 text

Know some Lucene basics

Slide 20

Slide 20 text

Know some Lucene basics
Diagram labels: Persisted

Slide 21

Slide 21 text

Know some Lucene basics
Diagram labels: Persisted; refresh()

Slide 22

Slide 22 text

Know some Lucene basics
Diagram labels: Persisted; + Reopen reader for NRT; refresh(); Segment; flush()

Slide 23

Slide 23 text

Know some Lucene basics
Diagram labels: Persisted; + Reopen reader for NRT; refresh(); Segment; flush()

Slide 24

Slide 24 text

Know some Lucene basics
Diagram labels: flush(); Persisted; + Reopen reader for NRT; refresh(); Segment; flush()

Slide 25

Slide 25 text

Know some Lucene basics
Diagram labels: Segment; flush(); Synced to Disk; Persisted; refresh(); Segment; flush(); commit(); + Reopen reader

Slide 26

Slide 26 text

Know some Lucene basics
Diagram labels: Segment; flush(); Persisted; refresh(); Segment; flush(); Executed heuristically; Executed regularly; commit(); Synced to Disk; + Reopen reader

Slide 27

Slide 27 text

Know some Lucene basics
All documents persisted and searchable. Transaction log can be cleared.
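
The lifecycle sketched in the preceding diagrams can be approximated with a toy model. This is a deliberate simplification for illustration, not how Elasticsearch or Lucene is implemented; the class and attribute names are made up. It shows three states a document passes through: persisted in the transaction log, searchable after a refresh, and committed after a flush.

```python
class ToyShard:
    """Toy model of indexing, refresh and flush on one shard."""

    def __init__(self):
        self.buffer = []      # indexed, but not yet searchable
        self.searchable = []  # segments visible to the current reader
        self.committed = []   # segments covered by the last commit
        self.translog = []    # operations since the last commit

    def index(self, doc):
        self.buffer.append(doc)
        self.translog.append(doc)  # persisted for crash recovery

    def refresh(self):
        # Write buffered documents into a new segment and reopen the reader.
        if self.buffer:
            self.searchable.append(list(self.buffer))
            self.buffer.clear()

    def flush(self):
        # Commit: segments are synced to disk, the transaction log is cleared.
        self.refresh()
        self.committed = list(self.searchable)
        self.translog.clear()

    def search(self):
        return [doc for segment in self.searchable for doc in segment]


shard = ToyShard()
shard.index("doc-1")
print(shard.search())  # not searchable before the refresh
shard.refresh()
print(shard.search())  # searchable, but still only protected by the translog
shard.flush()
print(shard.translog)  # cleared after the commit
```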

Slide 28

Slide 28 text

Lessons learned: Understand updates

Slide 29

Slide 29 text

Understand updates
− With Lucene we have Update = Delete + Add
− Elasticsearch offers two types of update: “Partial document” and “Script”
  − Internally still delete + add
  − This means even small updates might take a while
− Partial document update trickiness
  − Updated fields are replaced, except for inner objects which are merged
  − To replace inner objects, consider wrapping them in a single-element array
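
The merge behavior of partial document updates can be approximated in a few lines of Python. This is a sketch of the semantics described above, not Elasticsearch code; the function name and the sample documents are hypothetical.

```python
def partial_update(source, partial):
    """Inner objects (dicts) are merged recursively; everything else is replaced."""
    merged = dict(source)
    for key, value in partial.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = partial_update(merged[key], value)
        else:
            merged[key] = value
    return merged


doc = {"title": "old", "address": {"city": "Solingen", "zip": "42699"}}
# The inner object is merged, so "zip" survives the update.
print(partial_update(doc, {"address": {"city": "Berlin"}}))

wrapped = {"title": "old", "address": [{"city": "Solingen", "zip": "42699"}]}
# The array is replaced as a whole, so "zip" is gone afterwards.
print(partial_update(wrapped, {"address": [{"city": "Berlin"}]}))
```

The second call shows why the single-element array trick works: arrays are not merged, so the wrapped object is swapped out completely.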

Slide 30

Slide 30 text

Lessons learned: Understand relations

Slide 31

Slide 31 text

Understand relations
− Lucene documents are flat
  − Arrays of inner objects are flattened when indexing
  − Cannot build one-to-many relationships this way
− Elasticsearch offers two alternatives: nested objects and parent/child mapping
− Parent/child mapping is very flexible
  − Documents stay separated
  − May be queried together
− Nested objects
  − Better performance, but documents are combined into one
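
Why flattened arrays of inner objects cannot express one-to-many relationships is easiest to see on a small example. The sketch below imitates the flattening in plain Python; it is an illustration of the effect, not the actual indexing code, and the document and helper names are hypothetical.

```python
def flatten(doc, prefix=""):
    """Collapse inner objects into dotted field names with value lists."""
    flat = {}
    for key, value in doc.items():
        path = prefix + key
        values = value if isinstance(value, list) else [value]
        for item in values:
            if isinstance(item, dict):
                for sub_path, sub_values in flatten(item, path + ".").items():
                    flat.setdefault(sub_path, []).extend(sub_values)
            else:
                flat.setdefault(path, []).append(item)
    return flat


def matches(flat, conditions):
    """True if every field contains the requested value somewhere."""
    return all(value in flat.get(field, []) for field, value in conditions.items())


post = {
    "title": "Post",
    "comments": [
        {"author": "alice", "stars": 5},
        {"author": "bob", "stars": 1},
    ],
}
flat = flatten(post)
print(flat)
# alice never gave 1 star, yet the flattened document matches.
print(matches(flat, {"comments.author": "alice", "comments.stars": 1}))
```

After flattening, the link between an author and that author's rating is lost, which is the gap that nested objects and parent/child mapping are meant to close.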

Slide 32

Slide 32 text

Lessons learned: Embrace aliases

Slide 33

Slide 33 text

Embrace aliases
− A logical name for one or more Elasticsearch indexes
  − Decouples client view from physical storage
− Use cases
  − (Read-only) views on one or multiple indexes
  − Dynamically split indexes without affecting clients
  − Zero downtime reindexing
− Limitations
  − Writes are only permitted for aliases that point to a single index
− Recommendation: Use aliases right from the start
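
For zero downtime reindexing, the alias is typically moved from the old index to the new one in a single request to the `_aliases` endpoint. The sketch below only builds that request body; the index and alias names are hypothetical.

```python
import json


def swap_alias(alias, old_index, new_index):
    """Body for POST /_aliases: both actions are applied in one atomic step."""
    return {
        "actions": [
            {"remove": {"index": old_index, "alias": alias}},
            {"add": {"index": new_index, "alias": alias}},
        ]
    }


# Clients keep using "products" while the data moves from v1 to v2.
print(json.dumps(swap_alias("products", "products_v1", "products_v2"), indent=2))
```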

Slide 34

Slide 34 text

Lessons learned: Don’t underestimate reindexing

Slide 35

Slide 35 text

Lessons learned: Adjust default settings

Slide 36

Slide 36 text

Adjust default settings
− Choose a unique cluster name
− Consider using unicast discovery
− Configure (at least) these settings according to your requirements
  − minimum_master_nodes
  − gateway.recover_after_nodes
  − gateway.recover_after_time
  − gateway.expected_nodes
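
A sketch of how these settings might be derived for a small cluster, using the full 1.x setting names. It assumes every listed host is a master-eligible data node, and the recovery values are placeholders to be adapted to your own requirements. The function names and the host names in the usage line are hypothetical.

```python
def quorum(master_eligible_nodes):
    """Majority of master-eligible nodes: floor(n / 2) + 1."""
    return master_eligible_nodes // 2 + 1


def node_settings(cluster_name, unicast_hosts):
    # Assumption: every host listed here is a master-eligible data node.
    nodes = len(unicast_hosts)
    return {
        "cluster.name": cluster_name,
        "discovery.zen.ping.multicast.enabled": False,
        "discovery.zen.ping.unicast.hosts": unicast_hosts,
        "discovery.zen.minimum_master_nodes": quorum(nodes),
        # Placeholder policy: recover once a majority is present and either
        # all expected nodes have joined or five minutes have passed.
        "gateway.recover_after_nodes": quorum(nodes),
        "gateway.recover_after_time": "5m",
        "gateway.expected_nodes": nodes,
    }


for key, value in node_settings("search-prod", ["es1", "es2", "es3"]).items():
    print("%s: %s" % (key, value))
```

Setting minimum_master_nodes to a majority is the usual guard against split brain: two halves of a partitioned cluster cannot both elect a master.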

Slide 37

Slide 37 text

Lessons learned: Think about read preference

Slide 38

Slide 38 text

Think about read preference
− Search has a “preference” parameter
  − Which shard (primary or some replica) to contact
− By default “round robin”
  − May cause some nasty effects
  − Documents marked as deleted still affect scoring
  − If segment merging differs between the copies of a shard, the same search may return different results
− Set “preference” according to your needs:
  − Possible values: local, primary, only some shards or nodes, arbitrary string
  − One approach: Always direct a user to the same shard
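
One way to direct a user to the same shard copies on every request is to derive the arbitrary preference string from a stable user or session identifier. The sketch below is an assumption about how an application might do this; the helper names and the `user-` prefix are made up. The value deliberately avoids a leading underscore, since the built-in values such as `_local` and `_primary` use that form.

```python
import hashlib


def preference_for(user_id):
    """Stable, opaque preference string for one user."""
    digest = hashlib.sha1(user_id.encode("utf-8")).hexdigest()[:16]
    return "user-" + digest


def search_params(user_id):
    """URL parameters to attach to this user's search requests."""
    return {"preference": preference_for(user_id)}


print(search_params("alice@example.org"))
```

Hashing the identifier keeps personal data out of URLs and logs while still giving each user a consistent view of scores and result order.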

Slide 39

Slide 39 text

Lessons learned: Satisfy scaling requirements

Slide 40

Slide 40 text

Satisfy scaling requirements
− Number of shards needs to be fixed at index creation time
− Choose number of shards depending on estimation and measurements
  − A little overallocation is OK, yet shards don’t come for free
− If it turns out you need to scale higher, index aliases may help
  − Create another, identically configured index and add new documents to it
  − Define an alias so that search considers both indexes
  − But: Requires additional effort for updates and deletes (which index to address?)
  − Alternative: Migrate to a new index with more shards
− Remember:
  − Search 1 index with 50 shards ≈ Search 50 indexes with 1 shard each
  − In both cases, 50 Lucene indexes are searched
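
The reason the number of shards is fixed can be illustrated with the routing rule: a document's shard is chosen as a hash of its routing value (by default the document ID) modulo the number of primary shards. In the sketch below, `zlib.crc32` is only a stand-in for the hash function Elasticsearch uses internally, and the document IDs are made up.

```python
import zlib


def shard_for(routing, number_of_shards):
    """Pick a shard as hash(routing) modulo the number of primary shards."""
    # zlib.crc32 is a stand-in, not Elasticsearch's actual hash function.
    return zlib.crc32(routing.encode("utf-8")) % number_of_shards


ids = ["doc-%d" % i for i in range(1000)]
moved = sum(1 for doc_id in ids if shard_for(doc_id, 5) != shard_for(doc_id, 6))
print("%d of %d documents would map to a different shard" % (moved, len(ids)))
```

Going from 5 to 6 shards would change the target shard for most documents, so existing documents could no longer be found where the routing rule expects them. That is why growing means a new index rather than a changed setting.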

Slide 41

Slide 41 text

Lessons learned: Give plugins a try

Slide 42

Slide 42 text

Resources
− The official blog: http://www.elasticsearch.org/blog/
− The official book: http://www.elasticsearch.org/guide/en/elasticsearch/guide/current/index.html
− The official reference: http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/index.html
− The official Google Group: https://groups.google.com/d/forum/elasticsearch
− The Sense examples shown in this talk: https://gist.github.com/peschlowp/560411af9bac3be909c0

Slide 43

Slide 43 text

Questions?
Dr. rer. nat. Patrick Peschlow
codecentric AG
Merscheider Straße 1
42699 Solingen

tel +49 (0) 212.23 36 28 54
fax +49 (0) 212.23 36 28 79

www.codecentric.de