Elasticsearch - from novice to expert

Presentation held at the Coding Serbia Meetup in Novi Sad, Serbia, on September 25, 2014. Apart from the slides, the presentation featured various examples demonstrated using the Elasticsearch Marvel/Sense plugin. The slides contain a link to a gist showing those examples.

Patrick Peschlow

September 25, 2014

Transcript

  1. codecentric AG: Crash course (demo with Sense)
    − Introduction
    − Quickstart
    − Analysis
    − Mapping
    − Search features
    − Sharding + Replication
  2. The road to expertise
    − Get the basics right
    − Map carefully
    − Tune analysis incrementally
    − Understand filters
    − Know about Lucene
    − Don’t let your cluster fool you
    − Use index aliases
    − Learn about plugins
  3. Map carefully
    − Disable the _all field, you definitely don’t want it
    − Keep the _source field enabled and don’t set any fields to _stored
    − Disable dynamic mapping (except where really needed)
    − Choose analyzers carefully (maybe even not_analyzed is enough?)
    − Consider mapping fields more than once, depending on the requirements
    − Existing mappings cannot be changed without deleting the type
  4. Tune analysis incrementally
    − Don’t guess
    − Use the explain feature for search result scores and query rewriting
    − Prevent regressions by having a comprehensive unit test suite
        − In Java land: embed Elasticsearch
        − Test expectations about matches
    − Make sure your analyzers work correctly
        − Use the analyze API (and maybe the extended-analyze plugin)
    − Understand why queries match or don’t match
        − Use Luke to see what’s in the index
        − http://rosssimpson.com/blog/2014/05/06/using-luke-with-elasticsearch/
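
The two inspection tools named on this slide can be sketched as plain request builders. This is an illustration under assumptions: it uses the 1.x query-string form of the analyze API and the `explain` flag of a search body, the helper names `analyze_url` and `with_explain` are hypothetical, and `http://localhost:9200` is a placeholder address. Nothing is sent over the network.

```python
from urllib.parse import urlencode


def analyze_url(base_url, analyzer, text):
    """Build the URL of an analyze request (assumed 1.x query-string form)."""
    return base_url + "/_analyze?" + urlencode({"analyzer": analyzer, "text": text})


def with_explain(search_body):
    """Return a copy of a search body that also asks for score explanations."""
    body = dict(search_body)
    body["explain"] = True
    return body


# Placeholder host and hypothetical field name.
print(analyze_url("http://localhost:9200", "standard", "Quick brown foxes"))
print(with_explain({"query": {"match": {"title": "foxes"}}}))
```

Sending the first URL with GET would return the tokens the analyzer produces; the second body would return a score explanation next to each hit.
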
  5. Understand filters
    − Use filters instead of queries whenever you don’t need scoring
        − Many filters can get cached
        − You can even do filters-only (constant_score or match_all)
    − Compound filters (bool/and/or/not) are not cached
        − But you can still explicitly request caching by setting _cache
    − Prefer bool filters over and/or/not when combining cached filters
        − and/or/not don’t use the cache
    − Consider the scope of filters
        − May be applied before or after the query
        − Affects the scope of facets/aggregations
        − Often, "filtered query" is what you need
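
A minimal sketch of the request bodies this slide refers to, assuming the Elasticsearch 1.x query DSL (`filtered`, `constant_score`, and the `_cache` flag on a bool filter). The helper names and the fields `title`, `status`, and `year` are hypothetical.

```python
def bool_filter(must, cache=False):
    """Combine filters with a bool filter; optionally request caching of the result."""
    clause = {"bool": {"must": list(must)}}
    if cache:
        clause["bool"]["_cache"] = True  # compound filters are not cached by default
    return clause


def filtered_query(query, filter_clause):
    """Scored query restricted by a filter that does not contribute to scoring."""
    return {"query": {"filtered": {"query": query, "filter": filter_clause}}}


def filter_only(filter_clause):
    """Filters-only search: every hit gets the same constant score."""
    return {"query": {"constant_score": {"filter": filter_clause}}}


published_2014 = bool_filter(
    [{"term": {"status": "published"}}, {"range": {"year": {"gte": 2014}}}],
    cache=True,
)
print(filtered_query({"match": {"title": "elasticsearch"}}, published_2014))
print(filter_only({"term": {"status": "published"}}))
```
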
  6. Lucene internals
    [Diagram. Elements: Segment, flush(), Segment, flush(), commit(). Labels: "Synced to disk", "Visible to newly opened readers", "If desired, visible via NRT"]
  7. Lucene internals
    [Diagram. Elements: Segment, flush(), Segment, flush(), commit(). Labels: "Synced to disk", "Executed heuristically (or explicitly via NRT)", "Explicit call (transaction)"]
  8. Transaction log
    [Diagram. Elements: Transaction log, Segment, flush(), refresh(), Segment, flush(), commit(). Labels: "Persisted", "Synced to disk", "+ Reopen reader"]
  9. Transaction log
    [Diagram. Elements: Transaction log, Segment, flush(), refresh(), Segment, flush(), commit(). Labels: "Persisted", "Executed heuristically", "Executed regularly", "Synced to disk", "+ Reopen reader"]
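
The diagrams on the last four slides are lost in this transcript. The following toy model is one plausible reading of their labels, not a description of how Lucene or Elasticsearch are implemented: a refresh turns buffered documents into a searchable segment (segment flush plus reopened reader), while an Elasticsearch flush commits segments to disk and clears the transaction log. The class `ToyShard` and all of its members are hypothetical.

```python
class ToyShard:
    """Deliberately simplified model of indexing, refresh, and flush."""

    def __init__(self):
        self.buffer = []    # indexed, but not yet visible to searches
        self.translog = []  # operations since the last commit, assumed persisted
        self.segments = []  # searchable segments, each {"docs": [...], "synced": bool}

    def index(self, doc):
        # Every operation is logged and buffered.
        self.translog.append(doc)
        self.buffer.append(doc)

    def refresh(self):
        # Write the buffer as a new segment and "reopen the reader":
        # the documents become searchable, but are not yet synced to disk.
        if self.buffer:
            self.segments.append({"docs": list(self.buffer), "synced": False})
            self.buffer.clear()

    def flush(self):
        # Commit: all segments are synced to disk, so the log can be cleared.
        self.refresh()
        for segment in self.segments:
            segment["synced"] = True
        self.translog.clear()

    def search(self, predicate):
        # Only documents in segments are visible.
        return [d for s in self.segments for d in s["docs"] if predicate(d)]


shard = ToyShard()
shard.index({"id": 1})
print(shard.search(lambda d: True))  # nothing visible before a refresh
shard.refresh()
print(shard.search(lambda d: True))  # visible, yet still only protected by the log
shard.flush()
print(len(shard.translog), shard.segments[0]["synced"])
```
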
  10. Update API
    − Lucene doesn’t know updates
    − Elasticsearch offers two approaches
        − Partial document
        − Script
    − Attention
        − Update = Delete + Add
        − Updates require _source
        − Partial document merges inner objects instead of replacing them
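
The merge behaviour of partial-document updates can be illustrated locally. The sketch below is a simplified emulation, not Elasticsearch code: `merge_partial` is a hypothetical helper that merges inner objects recursively and replaces every other value. The two request bodies assume the 1.x update API (`doc`, or `script` with `params`); the field names are made up.

```python
import copy


def merge_partial(source, partial):
    """Emulate a partial-document update: inner objects are merged, not replaced."""
    merged = copy.deepcopy(source)
    for key, value in partial.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_partial(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


stored = {"title": "Intro", "views": 1, "author": {"name": "Ann", "city": "Novi Sad"}}
partial = {"author": {"city": "Solingen"}}

# Assumed 1.x request bodies for the two approaches.
partial_update_body = {"doc": partial}
script_update_body = {"script": "ctx._source.views += count", "params": {"count": 1}}

# author.name survives because the inner object is merged.
print(merge_partial(stored, partial))
```

To replace an inner object as a whole, the full document would have to be re-indexed or a script used instead.
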
  11. Relations
    − Lucene documents are flat
    − Elasticsearch offers two alternatives
        − Nested objects
        − Parent/child mapping
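
Why flat documents are a problem for relations can be shown with a small emulation. The helper `flatten` is hypothetical and only mimics how an array of inner objects ends up as multi-valued fields; the two mapping fragments assume the 1.x syntax for nested objects and for parent/child, with made-up type and field names.

```python
def flatten(doc, prefix=""):
    """Flatten inner objects into dotted, multi-valued fields."""
    flat = {}
    for key, value in doc.items():
        path = prefix + key
        items = value if isinstance(value, list) else [value]
        for item in items:
            if isinstance(item, dict):
                for sub_path, sub_values in flatten(item, path + ".").items():
                    flat.setdefault(sub_path, []).extend(sub_values)
            else:
                flat.setdefault(path, []).append(item)
    return flat


post = {
    "title": "Hello",
    "comments": [{"author": "alice", "stars": 5}, {"author": "bob", "stars": 1}],
}
flat = flatten(post)

# The link between "bob" and his rating is gone, so a search for
# "comment by bob with 5 stars" matches although no such comment exists.
false_match = "bob" in flat["comments.author"] and 5 in flat["comments.stars"]
print(flat, false_match)

# Alternative 1 (assumed 1.x syntax): keep each comment as its own hidden document.
nested_mapping = {"post": {"properties": {"comments": {"type": "nested"}}}}

# Alternative 2 (assumed 1.x syntax): comments as a separate type with a parent.
child_mapping = {"comment": {"_parent": {"type": "post"}}}
```
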
  12. Cluster state
    − Shard state
        − red = Primary shard not allocated
        − yellow = Primary shard allocated but not all replicas
        − green = All shards allocated
    − Index state = Worst state of all shards of the index
    − Cluster state = Worst state of all indexes of the cluster
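
The "worst state wins" rule from this slide can be sketched in a few lines. The function names and the input shape are hypothetical; treating an empty collection as green is an assumption of this sketch.

```python
SEVERITY = {"green": 0, "yellow": 1, "red": 2}


def shard_status(primary_allocated, replicas_allocated, replicas_total):
    """Status of one shard group (a primary and its replicas)."""
    if not primary_allocated:
        return "red"
    if replicas_allocated < replicas_total:
        return "yellow"
    return "green"


def worst(statuses):
    """Worst status of a collection; empty input counts as green (assumption)."""
    return max(statuses, key=SEVERITY.__getitem__, default="green")


# Two hypothetical indexes, each a list of shard groups.
logs = [shard_status(True, 1, 1), shard_status(True, 0, 1)]
users = [shard_status(True, 1, 1), shard_status(True, 1, 1)]

index_states = {"logs": worst(logs), "users": worst(users)}
cluster_state = worst(index_states.values())
print(index_states, cluster_state)
```
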
  13. Things to consider
    − Access
        − Choose a unique cluster name
        − Consider unicast vs. multicast discovery
    − Allocation awareness
        − Supports arbitrary rules to place shards or indexes on nodes
    − Nodes can have different roles: master, data, client
    − Pay attention to these settings:
        − minimum_master_nodes
        − gateway.recover_after_nodes
        − gateway.expected_nodes
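
A sketch of how the settings named on this slide might be derived for a small cluster. The majority rule for `discovery.zen.minimum_master_nodes` is the commonly documented rule of thumb, not something stated on the slide. The helper names, the cluster name, and the chosen node counts are hypothetical.

```python
def minimum_master_nodes(master_eligible_nodes):
    """Majority of master-eligible nodes (rule of thumb against split brain)."""
    return master_eligible_nodes // 2 + 1


def cluster_settings(cluster_name, master_eligible_nodes, expected_nodes, recover_after_nodes):
    """Collect settings as they could appear in a 1.x node configuration."""
    return {
        "cluster.name": cluster_name,
        "discovery.zen.minimum_master_nodes": minimum_master_nodes(master_eligible_nodes),
        # Recovery starts once this many nodes have joined ...
        "gateway.recover_after_nodes": recover_after_nodes,
        # ... and does not wait any longer once all expected nodes are present.
        "gateway.expected_nodes": expected_nodes,
    }


# Hypothetical three-node cluster in which every node is master-eligible.
print(cluster_settings("search-prod", 3, expected_nodes=3, recover_after_nodes=2))
```
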
  14. Write and read consistency
    − "consistency"
        − all, quorum (default), one
        − How many shards need to be available to permit an operation
    − "replication=async"
        − Return after the primary shard has safely stored the document
        − By default returns only after full replication is completed
    − "preference"
        − On which shards to execute a search (default: round robin)
        − Possible values: local, primary, only some shards or nodes, arbitrary string
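
How many shard copies the three "consistency" levels require can be worked out with simple arithmetic. The sketch assumes the 1.x behaviour as described in the official guide: a quorum is a majority of all copies (primary plus replicas) and is only enforced when more than one replica is configured. The function name is hypothetical.

```python
def required_active_copies(consistency, number_of_replicas):
    """Copies of a shard that must be active before a write is permitted."""
    total_copies = 1 + number_of_replicas  # the primary plus its replicas
    if consistency == "one":
        return 1
    if consistency == "all":
        return total_copies
    if consistency == "quorum":
        # Assumption (1.x): with at most one replica, the primary alone suffices.
        if number_of_replicas <= 1:
            return 1
        return total_copies // 2 + 1
    raise ValueError("unknown consistency level: " + consistency)


for replicas in (0, 1, 2, 3):
    print(replicas, [required_active_copies(level, replicas) for level in ("one", "quorum", "all")])
```
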
  15. Index alias
    − A logical name for one or more Elasticsearch index(es)
        − Decouples client view from physical storage
    − Use cases:
        − Zero downtime re-indexing
        − (Read-only) views on multiple indices
    − May be associated with a query
        − Interesting for implementing access control
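
Two of the use cases on this slide, sketched as bodies for the aliases endpoint (`POST /_aliases` with an `actions` list, as in Elasticsearch 1.x). The helper names, index names, and the `tenant` field are hypothetical.

```python
def swap_alias(alias, old_index, new_index):
    """Zero downtime re-indexing: move the alias in a single request."""
    return {
        "actions": [
            {"remove": {"index": old_index, "alias": alias}},
            {"add": {"index": new_index, "alias": alias}},
        ]
    }


def filtered_alias(alias, index, filter_clause):
    """Alias associated with a filter, for example one view per tenant."""
    return {"actions": [{"add": {"index": index, "alias": alias, "filter": filter_clause}}]}


print(swap_alias("products", "products_v1", "products_v2"))
print(filtered_alias("products_acme", "products_v2", {"term": {"tenant": "acme"}}))
```

Clients keep talking to `products` throughout; only the alias definition changes once the new index has been populated.
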
  16. Thoughts on scalability
    − Choose number of shards depending on estimation and measurements
        − A little overallocation is OK
        − But not too much, as shards don’t come for free
    − If the amount of data exceeds the available shards, index aliases may help
        − Create another, identically configured index
        − Add new documents to the new index
        − Define an alias so that search considers both indexes
        − Advice: Work with aliases right from the start
    − Remember:
        − Search in an index with 50 shards = Search in 50 indexes with one shard each
        − In both cases, 50 Lucene indexes are searched
  17. Resources
    − The official blog: http://www.elasticsearch.org/blog/
    − The official book: http://www.elasticsearch.org/guide/en/elasticsearch/guide/current/index.html
    − The official reference: http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/index.html
    − Great blog: https://www.found.no/foundation/
    − The Sense examples shown in this talk: https://gist.github.com/peschlowp/3aa550665ce3a417b617
  18. Questions?
    Dr. rer. nat. Patrick Peschlow
    codecentric AG
    Merscheider Straße 1
    42699 Solingen
    tel +49 (0) 212.23 36 28 54
    fax +49 (0) 212.23 36 28 79
    [email protected]
    www.codecentric.de