Elasticsearch - from novice to expert

Presentation held at the Coding Serbia Meetup in Novi Sad, Serbia, on September 25, 2014. Apart from the slides, the presentation featured various examples demonstrated using the Elasticsearch Marvel/Sense plugin. The slides contain a link to a gist showing those examples.

Patrick Peschlow

September 25, 2014

Transcript

  1. codecentric AG
    Patrick Peschlow
    Elasticsearch - from novice to expert

  2. codecentric AG
    Crash course (demo with Sense)
    − Introduction
    − Quickstart
    − Analysis
    − Mapping
    − Search features
    − Sharding + Replication

  3. codecentric AG
    The road to expertise
    − Get the basics right
    − Map carefully
    − Tune analysis incrementally
    − Understand filters
    − Know about Lucene
    − Don’t let your cluster fool you
    − Use index aliases
    − Learn about plugins

  4. codecentric AG
    The road to expertise
    Get the basics right

  5. codecentric AG
    The road to expertise
    Map carefully

  6. codecentric AG
    Map carefully
    − Disable the _all field, you definitely don’t want it
    − Keep the _source field enabled and don’t mark individual fields as stored
    − Disable dynamic mapping (except where really needed)
    − Choose analyzers carefully (maybe even not_analyzed is enough?)
    − Consider mapping fields more than once, depending on the requirements
    − Existing field mappings cannot be changed without deleting the type (and reindexing)
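
The mapping advice above could translate into an index definition like the following. This is a minimal sketch in Python that only builds the request body. The index name `products`, the type name `product`, and the field names are hypothetical, and the syntax assumes the Elasticsearch 1.x mapping format that was current at the time of the talk.

```python
import json

# Hypothetical index, type, and field names; ES 1.x mapping syntax assumed.
create_index_body = {
    "mappings": {
        "product": {
            "_all": {"enabled": False},    # disable the catch-all field
            "_source": {"enabled": True},  # keep the original JSON (the default)
            "dynamic": "strict",           # reject documents with unmapped fields
            "properties": {
                # Exact-match identifier: no analysis needed.
                "sku": {"type": "string", "index": "not_analyzed"},
                # Mapped twice: analyzed for full-text search,
                # plus a not_analyzed sub-field for sorting and aggregations.
                "title": {
                    "type": "string",
                    "analyzer": "english",
                    "fields": {
                        "raw": {"type": "string", "index": "not_analyzed"}
                    },
                },
            },
        }
    }
}

# Body for: PUT /products
print(json.dumps(create_index_body, indent=2))
```

With `"dynamic": "strict"`, a document containing a field that is not listed under `properties` is rejected instead of silently extending the mapping.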

  7. codecentric AG
    The road to expertise
    Tune analysis incrementally

  8. codecentric AG
    Tune analysis incrementally
    − Don’t guess
    − Use the explain feature for search result scores and query rewriting
    − Prevent regressions by having a comprehensive unit test suite
        − In Java land: embed Elasticsearch
        − Test expectations about matches
    − Make sure your analyzers work correctly
        − Use the analyze API (and maybe the extended-analyze plugin)
    − Understand why queries match or don’t match
        − Use Luke to see what’s in the index
        − http://rosssimpson.com/blog/2014/05/06/using-luke-with-elasticsearch/
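
The explain feature and the analyze API mentioned above are plain HTTP requests. The sketch below only constructs them in Python and sends nothing. The helper names, the index name `products`, and the sample text are hypothetical, and the request shapes assume the Elasticsearch 1.x API (analyzer as a query parameter, text as the request body).

```python
import json
from urllib.parse import urlencode


def analyze_request(index, analyzer, text):
    """Return (method, path, body) for an analyze call (ES 1.x style assumed)."""
    path = "/{}/_analyze?{}".format(index, urlencode({"analyzer": analyzer}))
    return "GET", path, text


def explained_search(query):
    """Wrap a query so that every hit carries a score explanation."""
    return {"explain": True, "query": query}


method, path, body = analyze_request("products", "english", "Running shoes")
print(method, path, repr(body))
print(json.dumps(explained_search({"match": {"title": "running"}})))
```

Comparing the tokens returned by the analyze call with the terms in the query is usually the quickest way to see why a document does or does not match.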

  9. codecentric AG
    The road to expertise
    Understand filters

  10. codecentric AG
    Understand filters
    − Use filters instead of queries whenever you don’t need scoring
        − Many filters can get cached
        − You can even do filters-only (constant_score or match_all)
    − Compound filters (bool/and/or/not) are not cached
        − But you can still explicitly request caching by setting _cache
    − Prefer bool filters over and/or/not when combining cached filters
        − and/or/not don’t use the cache
    − Consider the scope of filters
        − May be applied before or after the query
        − Affects the scope of facets/aggregations
        − Often, "filtered query" is what you need
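
As an illustration of the points above, here is a sketch of two search bodies in the Elasticsearch 1.x query DSL, built as Python dictionaries. The field names (`title`, `status`, `year`) are hypothetical. The first body combines a scored query with a bool filter and explicitly requests caching of the compound filter; the second is a filters-only search.

```python
import json

# Hypothetical field names; ES 1.x "filtered" query syntax assumed.
search_body = {
    "query": {
        "filtered": {
            # Scored part: only this contributes to relevance.
            "query": {"match": {"title": "elasticsearch"}},
            # Unscored part: a bool filter combining two cacheable filters.
            "filter": {
                "bool": {
                    "must": [
                        {"term": {"status": "published"}},
                        {"range": {"year": {"gte": 2013}}},
                    ],
                    # Compound filters are not cached unless requested.
                    "_cache": True,
                }
            },
        }
    }
}

# Filters-only search: every hit gets the same constant score.
filters_only_body = {
    "query": {
        "constant_score": {"filter": {"term": {"status": "published"}}}
    }
}

print(json.dumps(search_body, indent=2))
print(json.dumps(filters_only_body, indent=2))
```

Because the filter sits inside the filtered query, it restricts the documents before scoring and also narrows what facets or aggregations see.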

  11. codecentric AG
    The road to expertise
    Know about Lucene

  12-18. codecentric AG
    Lucene internals (diagram, built up over several slides)
    − flush() writes buffered documents to a new segment
        − Executed heuristically (or explicitly via NRT)
        − If desired, visible via NRT
    − commit() syncs the segments to disk
        − Explicit call (transaction)
        − Visible to newly opened readers

  19-26. codecentric AG
    Transaction log (diagram, built up over several slides)
    − Transaction log: persisted
    − refresh(): Lucene flush() creates a segment + reopen reader for NRT
        − Executed regularly
    − flush(): Lucene flush() and commit(), segments synced to disk + reopen reader
        − Executed heuristically

  27. codecentric AG
    Transaction log
    All documents persisted and searchable:
    Transaction log can be cleared
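
The interplay of buffer, transaction log, refresh, and flush can be sketched as a toy model. This is not Elasticsearch or Lucene code: the class `ToyShard` and all of its methods are hypothetical and exist only to illustrate the lifecycle described on these slides.

```python
class ToyShard:
    """Toy model of the indexing lifecycle; not Elasticsearch code."""

    def __init__(self):
        self.buffer = []     # indexed, but not yet searchable
        self.translog = []   # persisted record of operations since the last flush
        self.segments = []   # searchable segments
        self.committed = 0   # number of segments synced to disk

    def index(self, doc):
        self.buffer.append(doc)
        self.translog.append(doc)

    def refresh(self):
        # Lucene flush + reopened reader: the buffer becomes a searchable segment.
        if self.buffer:
            self.segments.append(list(self.buffer))
            self.buffer.clear()

    def flush(self):
        # Elasticsearch flush: Lucene commit, after which the translog is cleared.
        self.refresh()
        self.committed = len(self.segments)
        self.translog.clear()

    def search(self, doc):
        return any(doc in segment for segment in self.segments)

    def recover(self):
        # After a crash: keep committed segments, replay the translog.
        self.segments = self.segments[: self.committed]
        self.buffer = list(self.translog)
        self.refresh()


shard = ToyShard()
shard.index("doc-1")
print(shard.search("doc-1"))  # not searchable before a refresh
shard.refresh()
print(shard.search("doc-1"))  # searchable, but not yet synced to disk
shard.flush()
print(shard.translog)         # cleared once everything is committed
```

In this model, a document that was refreshed but not flushed survives a crash only because `recover()` replays the transaction log.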

  28. codecentric AG
    Update API
    − Lucene has no notion of in-place updates
    − Elasticsearch offers two approaches
        − Partial document
        − Script
    − Attention
        − Update = Delete + Add
        − Updates require _source
        − Partial document merges inner objects instead of replacing them
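
The merge behaviour of partial-document updates can be modelled in a few lines of Python. The function `merge_partial` is a hypothetical stand-in for what happens on the server, not Elasticsearch code, and the sample document is invented. In Elasticsearch 1.x the partial document would be sent as `{"doc": {...}}` to the `_update` endpoint of the document.

```python
import copy


def merge_partial(source, partial):
    """Recursively merge `partial` into a copy of `source`.

    Inner objects are merged key by key; any other value
    (scalars, lists) replaces the existing one.
    """
    merged = copy.deepcopy(source)
    for key, value in partial.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_partial(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


stored = {"title": "Intro", "author": {"name": "Ann", "city": "Novi Sad"}}
partial = {"author": {"city": "Solingen"}}

updated = merge_partial(stored, partial)
print(updated)
```

Note that `author.name` survives the update. To replace the inner object as a whole, the complete object would have to be re-sent, or a script used instead.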

  29. codecentric AG
    Relations
    − Lucene documents are flat
    − Elasticsearch offers two alternatives
        − Nested objects
        − Parent/child mapping
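
Why flat documents are a problem for relations can be shown with a small Python sketch. The `flatten` function is a hypothetical simplification of how an array of inner objects ends up as multi-valued fields, and the blog post data is invented. The nested mapping and query at the end assume Elasticsearch 1.x syntax.

```python
def flatten(doc, prefix=""):
    """Flatten inner objects into field -> list of values, losing object boundaries."""
    flat = {}
    for key, value in doc.items():
        path = prefix + key
        items = value if isinstance(value, list) else [value]
        for item in items:
            if isinstance(item, dict):
                for sub_path, sub_values in flatten(item, path + ".").items():
                    flat.setdefault(sub_path, []).extend(sub_values)
            else:
                flat.setdefault(path, []).append(item)
    return flat


post = {
    "title": "Hello",
    "comments": [
        {"author": "alice", "stars": 5},
        {"author": "bob", "stars": 1},
    ],
}

flat = flatten(post)
# Flat view: "a comment by bob" AND "a comment with 5 stars" both hold,
# although no single comment satisfies both conditions.
false_match = "bob" in flat["comments.author"] and 5 in flat["comments.stars"]
real_match = any(c["author"] == "bob" and c["stars"] == 5 for c in post["comments"])
print(flat, false_match, real_match)

# Nested objects keep each comment as its own hidden document (ES 1.x syntax assumed).
nested_mapping = {
    "post": {
        "properties": {
            "comments": {
                "type": "nested",
                "properties": {
                    "author": {"type": "string", "index": "not_analyzed"},
                    "stars": {"type": "integer"},
                },
            }
        }
    }
}
nested_query = {
    "query": {
        "nested": {
            "path": "comments",
            "query": {
                "bool": {
                    "must": [
                        {"term": {"comments.author": "bob"}},
                        {"term": {"comments.stars": 5}},
                    ]
                }
            },
        }
    }
}
```

Parent/child mapping solves the same problem with separate documents, which makes it the better fit when children change often.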

  30. codecentric AG
    The road to expertise
    Don’t let your cluster fool you

  31. codecentric AG
    Cluster state
    − Shard state
        − red = Primary shard not allocated
        − yellow = Primary shard allocated but not all replicas
        − green = All shards allocated
    − Index state = Worst state of all shards of the index
    − Cluster state = Worst state of all indexes of the cluster
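
The "worst state wins" rule above can be written down directly. The following Python sketch uses hypothetical function names and an invented two-index cluster; it models the rule only and does not query a real cluster.

```python
SEVERITY = {"green": 0, "yellow": 1, "red": 2}


def shard_state(primary_allocated, replicas_allocated, replicas_configured):
    """State of one shard (primary plus its replicas)."""
    if not primary_allocated:
        return "red"
    if replicas_allocated < replicas_configured:
        return "yellow"
    return "green"


def worst(states):
    """Return the most severe state; an empty collection counts as green."""
    return max(states, key=SEVERITY.__getitem__, default="green")


# Hypothetical cluster: "logs" has one shard whose replica is unassigned.
shards_by_index = {
    "logs": [shard_state(True, 1, 1), shard_state(True, 0, 1)],
    "users": [shard_state(True, 1, 1)],
}
index_states = {name: worst(states) for name, states in shards_by_index.items()}
cluster_state = worst(index_states.values())
print(index_states, cluster_state)
```

A single unassigned replica is therefore enough to turn the whole cluster yellow, which is why the cluster-level colour alone says little about where the problem is.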

  32. codecentric AG
    Things to consider
    − Access
        − Choose a unique cluster name
        − Consider unicast vs. multicast discovery
    − Allocation awareness
        − Supports arbitrary rules to place shards or indexes on nodes
    − Nodes can have different roles: master, data, client
    − Pay attention to these settings:
        − minimum_master_nodes
        − gateway.recover_after_nodes
        − gateway.expected_nodes
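
For the settings listed above, a common recommendation (not stated on the slide) is to set minimum_master_nodes to a majority of the master-eligible nodes. The Python sketch below computes that value and assembles an example configuration. The cluster name, host names, and node counts are hypothetical, and the setting keys assume Elasticsearch 1.x.

```python
def minimum_master_nodes(master_eligible_nodes):
    """Majority of master-eligible nodes: floor(n / 2) + 1."""
    if master_eligible_nodes < 1:
        raise ValueError("need at least one master-eligible node")
    return master_eligible_nodes // 2 + 1


# Hypothetical three-node cluster; setting names as used in ES 1.x.
settings = {
    "cluster.name": "search-prod",
    "discovery.zen.ping.multicast.enabled": False,
    "discovery.zen.ping.unicast.hosts": ["es1:9300", "es2:9300", "es3:9300"],
    "discovery.zen.minimum_master_nodes": minimum_master_nodes(3),
    "gateway.recover_after_nodes": 2,
    "gateway.expected_nodes": 3,
}

for key, value in settings.items():
    print("{}: {}".format(key, value))
```

Requiring a majority means that at most one side of a network partition can elect a master.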

  33. codecentric AG
    Write and read consistency
    − "consistency"
        − all, quorum (default), one
        − How many shard copies need to be available to permit a write operation
    − "replication=async"
        − Return after the primary shard has safely stored the document
        − By default (sync), returns only after full replication is completed
    − "preference"
        − On which shards to execute a search (default: round robin)
        − Possible values: local, primary, only some shards or nodes, arbitrary string
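
How many shard copies each consistency level demands can be sketched as follows. The function name is hypothetical, and the quorum formula, including the special case for indexes with at most one replica, reflects my understanding of the behaviour documented for Elasticsearch 1.x; treat it as an assumption rather than a specification.

```python
def required_active_copies(consistency, number_of_replicas):
    """Shard copies (primary + replicas) that must be active for a write.

    Assumed ES 1.x rule: quorum = int((primary + replicas) / 2) + 1,
    enforced only when number_of_replicas > 1.
    """
    total = 1 + number_of_replicas
    if consistency == "one":
        return 1
    if consistency == "all":
        return total
    if consistency == "quorum":
        if number_of_replicas <= 1:
            return 1
        return total // 2 + 1
    raise ValueError("unknown consistency level: {!r}".format(consistency))


for replicas in (0, 1, 2, 3):
    print(replicas, [required_active_copies(level, replicas)
                     for level in ("one", "quorum", "all")])
```

The special case keeps a single-node development cluster writable with the default of one replica.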

  34. codecentric AG
    The road to expertise
    Use index aliases

  35. codecentric AG
    Index alias
    − A logical name for one or more Elasticsearch indexes
        − Decouples client view from physical storage
    − Use cases:
        − Zero downtime re-indexing
        − (Read-only) views on multiple indexes
    − May be associated with a filter (written in the query DSL)
        − Interesting for implementing access control
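
Both use cases map onto the aliases API. The sketch below builds the request bodies for `POST /_aliases` in Python. The helper functions and the index, alias, and field names are hypothetical; the action format assumes Elasticsearch 1.x.

```python
import json


def swap_alias(alias, old_index, new_index):
    """Move an alias from one index to another in a single request."""
    return {
        "actions": [
            {"remove": {"index": old_index, "alias": alias}},
            {"add": {"index": new_index, "alias": alias}},
        ]
    }


def filtered_alias(alias, index, filter_body):
    """Create an alias that only exposes documents matching a filter."""
    return {
        "actions": [
            {"add": {"index": index, "alias": alias, "filter": filter_body}}
        ]
    }


# Zero downtime re-indexing: clients keep using "products" throughout.
print(json.dumps(swap_alias("products", "products_v1", "products_v2")))

# Per-tenant view as a building block for access control.
print(json.dumps(filtered_alias("products_acme", "products_v2",
                                {"term": {"tenant": "acme"}})))
```

Because both actions of the swap travel in one request, clients never observe a moment in which the alias points at no index.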

  36. codecentric AG
    Thoughts on scalability
    − Choose number of shards depending on estimation and measurements
        − A little overallocation is OK
        − But not too much, as shards don’t come for free
    − If the amount of data exceeds the capacity of the available shards, index aliases may help
        − Create another, identically configured index
        − Add new documents to the new index
        − Define an alias so that search considers both indexes
        − Advice: Work with aliases right from the start
    − Remember:
        − Search in an index with 50 shards = Search in 50 indexes with one shard each
        − In both cases, 50 Lucene indexes are searched
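
The shard arithmetic and the "second index behind an alias" approach can be sketched in Python. All index and alias names are hypothetical, and the alias actions assume the Elasticsearch 1.x format for `POST /_aliases`.

```python
def lucene_indexes_searched(primary_shards_per_index):
    """A search visits one copy of every shard, and each shard is a Lucene index."""
    return sum(primary_shards_per_index.values())


one_big_index = {"events": 50}
many_small_indexes = {"events_%02d" % i: 1 for i in range(50)}
print(lucene_indexes_searched(one_big_index),
      lucene_indexes_searched(many_small_indexes))

# Growing beyond the first index: new documents go to events_2,
# while searches use an alias that spans both indexes.
search_alias_actions = {
    "actions": [
        {"add": {"index": name, "alias": "events_search"}}
        for name in ("events_1", "events_2")
    ]
}
print(search_alias_actions)
```

If clients search through the alias from day one, adding the second index requires no client change.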

  37. codecentric AG
    The road to expertise
    Learn about plugins

  38. codecentric AG
    Resources
    − The official blog
        http://www.elasticsearch.org/blog/
    − The official book
        http://www.elasticsearch.org/guide/en/elasticsearch/guide/current/index.html
    − The official reference
        http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/index.html
    − Great blog
        https://www.found.no/foundation/
    − The Sense examples shown in this talk
        https://gist.github.com/peschlowp/3aa550665ce3a417b617

  39. codecentric AG
    Questions?
    Dr. rer. nat. Patrick Peschlow
    codecentric AG
    Merscheider Straße 1
    42699 Solingen
    tel +49 (0) 212.23 36 28 54
    fax +49 (0) 212.23 36 28 79
    [email protected]
    www.codecentric.de