Elasticsearch Query Optimization

A grab bag of tips to help improve your queries in Elasticsearch. Not everything will apply to your data or architecture, so feel free to skim and selectively steal tips :)

Zachary Tong

January 16, 2014

Transcript

  1. Copyright Elasticsearch 2013. Copying, publishing and/or distributing without written permission is strictly prohibited
    Query Optimization
    Go more faster better

  2.
    @ZacharyTong
    polyfractal on IRC
    Development - Support - Training
    ಠ_ಠ
    (amoeba)

  3.
    Filters
    Performing binary decisions since 2010

  4.
    Instead of this…
    {
      "query" : {
        "term" : {
          "my_field" : "value"
        }
      }
    }

  5.
    Do this.
    {
      "query" : {
        "filtered" : {
          "query" : {
            "match_all" : {}
          },
          "filter" : {
            "term" : {
              "my_field" : "value"
            }
          }
        }
      }
    }
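
The rewrite on this slide can be wrapped in a small helper. What follows is a minimal Python sketch that builds the request body as a plain dict, assuming the 1.x-era `filtered` query syntax used in the slides. `filtered_term` is a hypothetical helper name, not part of any Elasticsearch client.

```python
import json


def filtered_term(field, value):
    """Build a request body that matches `field == value` with a term filter.

    Hypothetical helper: the filter is wrapped in a `filtered` query with
    `match_all`, as on the slide, so no score has to be calculated.
    """
    return {
        "query": {
            "filtered": {
                "query": {"match_all": {}},
                "filter": {"term": {field: value}},
            }
        }
    }


if __name__ == "__main__":
    # Print the body so it can be compared with the JSON on the slide.
    print(json.dumps(filtered_term("my_field", "value"), indent=2))
```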

  6.
    Filters are fast
    No score is calculated, only inclusion / exclusion

  7.
    Filters are cached
    fast
    What’s faster than fast? Not calculating it again

  8.
    Filters are composable
    cached
    fast
    Cached filters are independent of their original query

  9.
    Filters will short-circuit
    composable
    cached
    fast
    If a document doesn’t match the filter, the query is never evaluated for it

  10.
    Replace:
    Term Query
    Terms Query
    Range Query
    With:
    Term Filter
    Terms Filter
    Range Filter
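
The swap on this slide can be done mechanically for simple clauses. Here is a rough Python sketch, assuming the clause carries no query-only options (such as a boost) and that the request body is a plain dict. `query_to_filter` and `FILTERABLE` are hypothetical names introduced for illustration.

```python
# Query types the slide suggests replacing with the filter of the same name.
FILTERABLE = ("term", "terms", "range")


def query_to_filter(body):
    """Move a top-level term/terms/range query into a `filtered` filter.

    Returns a new dict; bodies whose query is of any other type (or that
    have no single-clause query) are returned unchanged.
    """
    query = body.get("query", {})
    if len(query) != 1:
        return body
    kind, clause = next(iter(query.items()))
    if kind not in FILTERABLE:
        return body
    rewritten = dict(body)  # shallow copy keeps "size", "from", etc.
    rewritten["query"] = {
        "filtered": {
            "query": {"match_all": {}},
            "filter": {kind: clause},
        }
    }
    return rewritten
```

Scoring queries such as `match` are left alone, since they need the relevance calculation that filters skip.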

  11.
    Need to combine filters?
    Which do you use?
    And Filter
    Or Filter
    Not Filter
    Bool Filter

  12.
    Need to combine filters?
    These?
    And Filter
    Or Filter
    Not Filter

  13.
    Need to combine filters?
    Nope. Use this.
    Bool Filter

  14.
    Why?!
    See this article:
    http://www.elasticsearch.org/blog/all-about-elasticsearch-filter-bitsets/

  15.
    Bool: everything else
    And/Or/Not: Geo, Script
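
Combining filters with a bool filter might look like the sketch below. It is plain Python that only assembles the request fragment. `bool_filter` is a hypothetical helper, and the field names in the usage example are made up.

```python
def bool_filter(must=(), should=(), must_not=()):
    """Combine filter clauses into a single `bool` filter.

    Sections that receive no clauses are left out of the result.
    """
    sections = {"must": must, "should": should, "must_not": must_not}
    return {
        "bool": {
            name: list(clauses)
            for name, clauses in sections.items()
            if clauses
        }
    }


if __name__ == "__main__":
    # Hypothetical fields: active documents that are not flagged as deleted.
    print(bool_filter(
        must=[{"term": {"status": "active"}}],
        must_not=[{"term": {"deleted": True}}],
    ))
```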

  16.
    Consider “cacheability”
    {
      "query" : {
        "filtered" : {
          "filter" : {
            "range" : {
              "my_field" : {
                "gte" : "now-1h"
              }
            }
          }
        }
      }
    }
    This is going to cause poor performance

  17.
    Now never stops moving
    [Diagram labels: Time, Now, Filter Cache. Slides 18-21 repeat this slide.]

  22.
    That was all a lie
    As of version 1.0, “now” expressions are no longer cached by default
    (sorry)

  23.
    Winning the battle, losing the war
    “I wish I could do some caching…”
    [Diagram labels: Time, Now, Filter Cache. Slides 24-27 repeat this diagram; the quote is added on slide 27.]

  28.
    Add a second filter
    ...
    "bool" : {
      "must" : [
        {
          "range" : {
            "my_field" : {
              "gte" : "now/d"
            },
            "_cache" : true
          }
        },
        {
          "range" : {
            "my_field" : {
              "gte" : "now-1h"
            },
            "_cache" : false
          }
        }
      ]
    }
    ...
    First range: cached, daily granularity
    Second range: uncached, hourly granularity
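
The two-filter trick on this slide can be generated by a helper. This is a minimal Python sketch that assumes the 1.x per-filter `_cache` flag and Elasticsearch date-math strings. `recent_range_filter` and its default arguments are hypothetical, chosen to match the slide.

```python
def recent_range_filter(field, coarse="now/d", fine="now-1h"):
    """Pair a cached coarse range with an uncached fine-grained range.

    The coarse bound only changes once per rounding unit, so its cached
    result stays valid; the fine bound moves constantly and is therefore
    kept out of the filter cache.
    """
    return {
        "bool": {
            "must": [
                {"range": {field: {"gte": coarse}, "_cache": True}},
                {"range": {field: {"gte": fine}, "_cache": False}},
            ]
        }
    }
```

One caveat: `now/d` rounds down to the start of the current day, so shortly after midnight the coarse bound becomes the tighter of the two. Pick a coarse unit wide enough for the window you actually need.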

  29.
    No cache churn
    Applies to many filters, not just ranges
    [Diagram labels: Time, Now, Filter Cache. Slides 30-32 repeat this diagram; the second line is added on slide 32.]

  33.
    Top level filter is slow(er)
    {
      "query" : { … },
      "filter" : { … }
    }
    Don’t use this unless you need it
    (only useful with facets)

  34.
    Version 1.0+
    Renamed to “post_filter”
    {
      "query" : { … },
      "post_filter" : { … }
    }
    Same performance semantics
    (applies to aggregations too)
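
To make the two placements concrete, here is a small Python sketch that puts the same filter either inside a `filtered` query or in the top-level `post_filter` slot. `search_body` and its `hits_only` flag are hypothetical names used only for illustration.

```python
def search_body(query, flt, hits_only=False):
    """Attach `flt` to `query` in one of two ways.

    hits_only=False: wrap both in a `filtered` query, so the filter narrows
    the documents seen by the hits and by facets/aggregations alike.
    hits_only=True: use the top-level `post_filter` (the 1.0+ name), which
    is applied to the hits after facets/aggregations have been computed.
    """
    if hits_only:
        return {"query": query, "post_filter": flt}
    return {"query": {"filtered": {"query": query, "filter": flt}}}
```

The default is the `filtered` form, in line with the advice not to use the top-level filter unless you need it.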

  35.
    [Diagram labels: filtered query, top level filter, facet_filter, documents matching the query, “hits”: [...], “facets”: {...}]

  36.
    Queries
    This discussion has become relevant

  37.
    Choose where to pay a computation price
    Query-Time vs Index-Time

  38.
    Functionality    | Query-time                                           | Index-time
    Misspellings     | fuzzy query, term suggester                          | ngrams
    Autocomplete     | prefix query, phrase suggester, Completion suggester | shingles
    Leading wildcard | Wildcard                                             | Reverse filter + prefix query
    Relations        | Parent/Child + Nested                                | Denormalization
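
The "Leading wildcard" row is the least obvious trade, so here is a toy illustration in plain Python (not Elasticsearch API calls). The idea is to store each token reversed at index time, so that a leading-wildcard search such as `*ing` turns into a cheap prefix match on `gni`. All function names are hypothetical.

```python
def reverse_token(token):
    """Index-time step: what a reverse token filter would emit."""
    return token[::-1]


def build_reversed_index(tokens):
    """Store the reversed form of every token."""
    return sorted(reverse_token(t) for t in tokens)


def leading_wildcard(reversed_index, suffix):
    """Query-time step: `*suffix` becomes a prefix match on reversed tokens."""
    prefix = reverse_token(suffix)
    matches = [t for t in reversed_index if t.startswith(prefix)]
    # Reverse again so callers see the original tokens.
    return sorted(reverse_token(t) for t in matches)


if __name__ == "__main__":
    index = build_reversed_index(["searching", "indexing", "shard", "string"])
    print(leading_wildcard(index, "ing"))
```

The price is paid once per document at index time (extra storage for the reversed field) instead of on every query.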

  39.
    Avoid deep pagination
    {
      "query" : { … },
      "from" : 10000000,
      "size" : 10
    }
    Builds a PriorityQueue 10,000,010 large
    (for each shard in your index)
    (just to return 10 results)
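
The arithmetic behind this slide, as a short Python sketch: each shard has to collect `from + size` entries before the top `size` can be returned. The shard count of 5 in the example is an assumption for illustration, and `queue_entries` is a hypothetical name.

```python
def queue_entries(from_, size, shards=1):
    """Return (entries per shard, entries across all shards) for one page."""
    per_shard = from_ + size
    return per_shard, per_shard * shards


if __name__ == "__main__":
    per_shard, total = queue_entries(10_000_000, 10, shards=5)
    print(f"{per_shard:,} entries per shard, {total:,} in total")
```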

  40.
    GoogleBot
    Bots will happily traverse millions of pages
    Destroyer of clusters

  41.
    Use Count
    GET /index/_search
    {
      "query" : { … },
      "size" : 0
    }
    This is faster:
    GET /index/_search?search_type=count
    {
      "query" : { … }
    }

  42.
    Rescore API
    1. Query/filter to quickly find top N results
    2. Rescore with complex logic to find top 10
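
A request following these two steps might be assembled like the Python sketch below. The `rescore`, `window_size`, and `rescore_query` keys follow the Rescore API as described in the Elasticsearch reference. `with_rescore`, the default window of 100, and the example queries are assumptions made for illustration.

```python
def with_rescore(cheap_query, expensive_query, window_size=100, size=10):
    """Run `cheap_query` everywhere, then rescore only the top hits.

    `window_size` is the N from step 1 (top hits per shard that get
    rescored); `size` is the number of results finally returned.
    """
    return {
        "query": cheap_query,
        "size": size,
        "rescore": {
            "window_size": window_size,
            "query": {"rescore_query": expensive_query},
        },
    }


if __name__ == "__main__":
    # Hypothetical example: a cheap match, then a costlier sloppy phrase match.
    body = with_rescore(
        {"match": {"title": "quick brown fox"}},
        {"match_phrase": {"title": {"query": "quick brown fox", "slop": 2}}},
    )
    print(body)
```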

  43.
    Common Terms
    Very cool query, makes stop-words obsolete
    See this presentation:
    https://speakerdeck.com/polyfractal/common-terms-query

  44.
    In General
    1. Think about what you want to search
    2. Structure your document to make that easy

  45.
    Scripts
    There’s a python in my elasticsearch server!

  46.
    FOR ALL THAT IS HOLY
    Do not EVER use these in a search script:
    _source.my_field
    _fields.my_field
    These access the disk and are sloooooow.
    You will destroy your performance

  47.
    FOR ALL THAT IS HOLY
    Use this instead:
    doc['my_field']
    Accesses in-memory field data. Fast!

  48.
    Use common sense
    In general, scripting is slower than queries.
    Don’t go crazy.
    If you end up with a 10-page script, bake some of that logic into your index

  49.
    Arcanum
    Here be dragons

  50.
    Disclaimer
    1. You probably don’t need these tricks
    2. But they are fun to talk about
    If you break your cluster…I warned you :P

  51.
    Codecs
    Controls the data structure for
    • terms
    • positions
    • frequencies
    http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/index-modules-codec.html

  52.
    Direct Codec
    When you have more memory than Google
    • Data stored as uncompressed arrays in memory
    • About as fast as you can go
    (2m Wikipedia dataset == 8gb used heap)

  53.
    Bloom-pulsing Codec
    Fast execution for rare terms
    • Bloom filter fails fast if term is not present
    • Pulsing inlines the postings to avoid extra disk seek
    • Good for rare terms, ID numbers, etc
    (_uid uses this internally for fast get-by-id scenarios)
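
The "fails fast" behaviour can be illustrated with a toy Bloom filter in Python. This is a generic sketch of the data structure, not Lucene's implementation. The sizes, the hashing scheme, and the `lookup` helper are assumptions made for the example.

```python
import hashlib


class BloomFilter:
    """Tiny Bloom filter: no false negatives, occasional false positives."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0  # a Python int used as a bit array

    def _positions(self, term):
        # Derive several bit positions by salting one hash function.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{term}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, term):
        for pos in self._positions(term):
            self.bits |= 1 << pos

    def might_contain(self, term):
        return all((self.bits >> pos) & 1 for pos in self._positions(term))


def lookup(bloom, postings, term):
    """Consult the Bloom filter first; skip the real lookup on a miss."""
    if not bloom.might_contain(term):
        return []  # definitely absent: no need to touch the term dictionary
    return postings.get(term, [])
```

For ID-like fields most lookups are for a term that exists in only one segment, so the cheap "definitely absent" answer saves a dictionary lookup in every other segment.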

  54.
    Memory Codec
    Super Fast execution for rare terms
    • Encodes your term dictionary as an in-memory FST
    • Crazy fast lookups
    • Really great for “primary keys”
    • Compresses certain “sequential” data very well
    ( “00001”, “00002”, “00003”, etc)

  55.
    Boosted Synonyms
    This is actually kinda slow
    We want to boost synonyms, such that:
    vegetable => potato^3, tomato^2, carrot^1
    https://gist.github.com/polyfractal/10276706

  56.
    Boosted Synonyms
    "analyzer" : {
      "boosted_syn" : {
        "tokenizer" : "whitespace",
        "filter" : [
          "lowercase",
          "syns",
          "delimited_payload_filter"
        ]
      }
    },
    "filter" : {
      "syns" : {
        "type" : "synonym",
        "synonyms" : [
          "potato => vegetable|3",
          "tomato => vegetable|2",
          "carrot => vegetable|1"
        ]
      }
    }

  57.
    Boosted Synonyms
    {
      "query" : {
        "function_score" : {
          "query" : {
            "match" : {
              "title" : "vegetable"
            }
          },
          "script_score" : {
            "script" : "boosted_syn",
            "params" : {
              "field" : "title",
              "term" : "vegetable"
            }
          },
          "boost_mode" : "replace"
        }
      }
    }
    Script (boosted_syn):
    // Look up the term in the field, asking for payloads
    termInfo = _index[field].get(term, _PAYLOADS);
    score = 0;
    // Add up the float payload stored at each position of the term
    for (pos : termInfo) {
      score = score + pos.payloadAsFloat(0);
    }
    return score;

  58.
    "hits" : [
      {
        "_score" : 3,
        "_source" : {
          "title" : "potato"
        }
      },
      {
        "_score" : 2,
        "_source" : {
          "title" : "tomato"
        }
      },
      {
        "_score" : 1,
        "_source" : {
          "title" : "carrot"
        }
      },
      {
        "_score" : 0,
        "_source" : {
          "title" : "vegetable"
        }
      }
    ]
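
To see where the scores in these hits come from, here is a pure-Python simulation of the boosted-synonym pipeline: the analyzer chain (whitespace tokenizer, lowercase, synonym, delimited payload) followed by the payload-summing script. It imitates the behaviour described on the slides and calls no Elasticsearch code. `analyze`, `boosted_syn_score`, and the default of 0 for tokens without a payload are assumptions.

```python
# Synonym rules copied from the slide: "token => replacement|payload".
SYNONYMS = {
    "potato": "vegetable|3",
    "tomato": "vegetable|2",
    "carrot": "vegetable|1",
}


def analyze(text):
    """Return (term, payload) pairs; payload is None when there is no '|'."""
    tokens = []
    for raw in text.split():               # whitespace tokenizer
        token = raw.lower()                # lowercase filter
        token = SYNONYMS.get(token, token) # synonym filter
        term, sep, payload = token.partition("|")  # delimited payload filter
        tokens.append((term, float(payload) if sep else None))
    return tokens


def boosted_syn_score(text, term, default=0.0):
    """Sum the payloads of every position where `term` occurs."""
    return sum(
        default if payload is None else payload
        for found, payload in analyze(text)
        if found == term
    )


if __name__ == "__main__":
    for title in ("potato", "tomato", "carrot", "vegetable"):
        print(title, boosted_syn_score(title, "vegetable"))
```

A document that contains the literal word "vegetable" carries no payload, which is why it ends up with a score of 0 under `"boost_mode" : "replace"`.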

  59.
    Questions?
    ಠ_ಠ
    @ZacharyTong
    polyfractal on IRC
