Elasticsearch Query Optimization

Grab bag of tips to help improve your queries in Elasticsearch. Not everything will apply to your data/architecture, so feel free to skim and selectively steal tips :)

Zachary Tong

January 16, 2014

Transcript

  1. Copyright Elasticsearch 2013. Copying, publishing and/or distributing without written permission is strictly prohibited.
    Query Optimization: Go more faster better
  2. @ZacharyTong, polyfractal on IRC. Development - Support - Training. ಠ_ಠ (amoeba)
  3. Filters: Performing binary decisions since 2010
  4. Instead of this…
        {
          "query" : {
            "term" : {
              "my_field" : "value"
            }
          }
        }
  5. Do this.
        {
          "query" : {
            "filtered" : {
              "query" : {
                "match_all" : {}
              },
              "filter" : {
                "term" : {
                  "my_field" : "value"
                }
              }
            }
          }
        }
  6. Filters are fast. No score is calculated, only inclusion / exclusion.
  7. Filters are cached. What’s faster than fast? Not calculating it again.
  8. Filters are composable. Cached filters are independent of their original query.
  9. Filters will short-circuit. If a document doesn’t match the filter, the query never evaluates it.
  10. Replace: Term Query, Terms Query, Range Query.
    With: Term Filter, Terms Filter, Range Filter.
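
The swap described on slides 4 to 10 can be sketched as a small helper. This is only an illustration: it builds the request body as a plain Python dict in the `filtered` shape shown on slide 5, and the function name `as_filtered` is hypothetical, not part of any client library.

```python
import json


def as_filtered(filter_clause, query=None):
    """Wrap a filter in a `filtered` query (hypothetical helper).

    When no scoring query is supplied, fall back to match_all,
    mirroring the body shown on slide 5.
    """
    if query is None:
        query = {"match_all": {}}
    return {"query": {"filtered": {"query": query, "filter": filter_clause}}}


# A term filter takes the place of the equivalent term query.
body = as_filtered({"term": {"my_field": "value"}})
print(json.dumps(body, indent=2))
```

Passing a real query as the second argument keeps scoring for that part while the filter only decides inclusion or exclusion.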
  11. Need to combine filters? Which do you use? And Filter, Or Filter, Not Filter, Bool Filter
  12. Need to combine filters? These? And Filter, Or Filter, Not Filter
  13. Need to combine filters? Nope. Use this: Bool Filter
  14. Why?! See this article: http://www.elasticsearch.org/blog/all-about-elasticsearch-filter-bitsets/
  15. Bool: everything else.
    And/Or/Not: Geo, Script.
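
As a rough sketch of the advice on slides 11 to 15, the helper below assembles a bool filter from `must`, `should` and `must_not` clause lists. The helper name `bool_filter` and the field names `status` and `author` are made up for illustration; only the `bool` / `must` / `should` / `must_not` keys come from the filter DSL.

```python
def bool_filter(must=(), should=(), must_not=()):
    """Combine filters with a bool filter (hypothetical helper).

    Empty clause lists are left out so the body stays minimal.
    """
    clauses = {}
    if must:
        clauses["must"] = list(must)
    if should:
        clauses["should"] = list(should)
    if must_not:
        clauses["must_not"] = list(must_not)
    return {"bool": clauses}


# Hypothetical fields: keep published documents, drop one author.
combined = bool_filter(
    must=[{"term": {"status": "published"}}],
    must_not=[{"term": {"author": "spam_bot"}}],
)
print(combined)
```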
  16. Consider “cacheability”
        {
          "query" : {
            "filtered" : {
              "filter" : {
                "range" : {
                  "my_field" : {
                    "gte" : "now-1h"
                  }
                }
              }
            }
          }
        }
    This is going to cause poor performance.
  17. Now never stops moving. [Diagram, repeated across slides 17-21, with labels: Time, Now, Filter Cache]
  22. That was all a lie. As of version 1.0, “now” expressions are not cached by default anymore (sorry).
  23. Winning the battle, losing the war. [Diagram, repeated across slides 23-27, with labels: Time, Now, Filter Cache]
  27. Winning the battle, losing the war. “I wish I could do some caching…”
  28. Add a second filter
        ...
        "bool" : {
          "must" : [
            {
              "range" : {
                "my_field" : {
                  "gte" : "now/d"
                },
                "_cache" : true
              }
            },
            {
              "range" : {
                "my_field" : {
                  "gte" : "now-1h"
                },
                "_cache" : false
              }
            }
          ]
        }
        ...
    First range: cached, daily granularity. Second range: uncached, hourly granularity.
  29. No cache churn. Applies to many filters, not just ranges. [Diagram with labels: Time, Now, Filter Cache]
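
The two-tier trick from slides 28 and 29 can be sketched as a function that pairs a coarse, cacheable range with a fine, uncached one. This is a sketch under assumptions: the helper name `two_tier_range` is invented, the caching flag is assumed to be spelled `_cache` and to sit beside the field name inside the range filter, and the date math strings are assumed to be written without spaces.

```python
def two_tier_range(field, coarse="now/d", fine="now-1h"):
    """Pair a cached coarse range with an uncached fine range.

    Hypothetical helper. The coarse bound only changes once per day,
    so its cached result stays reusable; the fine bound moves on every
    request and is therefore kept out of the cache.
    """
    return {
        "bool": {
            "must": [
                # Assumption: `_cache` is the per-filter caching flag.
                {"range": {field: {"gte": coarse}, "_cache": True}},
                {"range": {field: {"gte": fine}, "_cache": False}},
            ]
        }
    }


time_filter = two_tier_range("my_field")
print(time_filter)
```

The cached filter narrows the candidate set cheaply, so the uncached filter only has to test the documents that survive it.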
  30. Top level filter is slow(er)
        {
          "query" : { … },
          "filter" : { … }
        }
    Don’t use this unless you need it (only useful with facets).
  31. Version 1.0+: renamed to “post_filter”
        {
          "query" : { … },
          "post_filter" : { … }
        }
    Same performance semantics (applies to aggregations too).
  32. [Diagram with labels: filtered query, top level filter, facet_filter, documents matching the query, "hits": [...], "facets": {...}]
  33. Queries: This discussion has become relevant
  34. Query-Time vs Index-Time: choose where to pay a computation price
  35. Functionality       Query-time                                            Index-time
    Misspellings        fuzzy query, term suggester                           ngrams
    Autocomplete        prefix query, phrase suggester, Completion suggester  shingles
    Leading wildcard    Wildcard                                              Reverse filter + prefix query
    Relations           Parent/Child + Nested                                 Denormalization
  36. Avoid deep pagination
        {
          "query" : { … },
          "from" : 10000000,
          "size" : 10
        }
    Builds a PriorityQueue of 10,000,010 entries, for each shard in your index, just to return 10 results.
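
The arithmetic behind slide 36 is easy to check by hand. The snippet below is only a back-of-the-envelope sketch: the function name is made up, and the shard count of 5 is an assumed example value, not something stated in the deck.

```python
def queue_entries(from_, size, shards):
    """Return (entries per shard, entries summed over all shards).

    Each shard has to rank from_ + size documents, because it cannot
    know which of its hits will land in the requested page.
    """
    per_shard = from_ + size
    return per_shard, per_shard * shards


# Values from slide 36; shards=5 is an assumption for illustration.
per_shard, total = queue_entries(10_000_000, 10, shards=5)
print(per_shard)  # 10000010
print(total)      # 50000050
```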
  37. GoogleBot: Destroyer of clusters. Bots will happily traverse millions of pages.
  38. Use Count
        GET /index/_search
        {
          "query" : { … },
          "size" : 0
        }
    This is faster:
        GET /index/_search?search_type=count
        {
          "query" : { … }
        }
  39. Rescore API
    1. Query/filter to quickly find top N results
    2. Rescore with complex logic to find top 10
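
The two steps on slide 39 map onto a request body roughly like the one built below. Treat it as a sketch: the helper name `with_rescore`, the `title` field, the search text and the window size of 100 are all illustrative assumptions; the `rescore`, `window_size` and `rescore_query` keys are the parts taken from the rescore feature itself.

```python
def with_rescore(cheap_query, expensive_query, window_size=100, size=10):
    """Build a search body that rescoring refines (hypothetical helper).

    cheap_query finds candidates quickly; expensive_query is only run
    against the top `window_size` hits of each shard.
    """
    return {
        "query": cheap_query,
        "size": size,
        "rescore": {
            "window_size": window_size,
            "query": {"rescore_query": expensive_query},
        },
    }


# Step 1: a plain match finds the top N. Step 2: a phrase match reorders them.
body = with_rescore(
    cheap_query={"match": {"title": "quick brown fox"}},
    expensive_query={
        "match_phrase": {"title": {"query": "quick brown fox", "slop": 2}}
    },
)
print(body)
```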
  40. Common Terms: Very cool query, makes stop-words obsolete.
    See this presentation: https://speakerdeck.com/polyfractal/common-terms-query
  41. In General
    1. Think about what you want to search
    2. Structure your document to make that easy
  42. Scripts: There’s a python in my elasticsearch server!
  43. FOR ALL THAT IS HOLY, do not EVER use these in a search script: _source.my_field, _fields.my_field
    These access the disk and are sloooooow. You will destroy your performance.
  44. Use this instead: doc['my_field']
    Accesses in-memory field data. Fast!
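
To make slides 43 and 44 concrete, here is a small sketch that builds a script filter reading the field through `doc[...]`, plus a naive string check that flags the slow accessors. Both helper names, the `threshold` parameter and the comparison are invented for the example; the check is a plain substring test, not a real script parser.

```python
SLOW_ACCESSORS = ("_source.", "_fields.")


def script_filter(field, threshold):
    """Script filter that reads in-memory field data via doc[...].

    Hypothetical helper; `threshold` is passed as a script parameter.
    """
    return {
        "script": {
            "script": "doc['%s'].value > threshold" % field,
            "params": {"threshold": threshold},
        }
    }


def uses_slow_access(script):
    """Naive check: does the script text touch _source or _fields?"""
    return any(accessor in script for accessor in SLOW_ACCESSORS)


fast = script_filter("my_field", 5)
print(fast["script"]["script"])
print(uses_slow_access(fast["script"]["script"]))
print(uses_slow_access("_source.my_field > threshold"))
```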
  45. Use common sense. In general, scripting is slower than queries. Don’t go crazy.
    If you end up with a 10-page script, bake some of that logic into your index.
  46. Disclaimer
    1. You probably don’t need these tricks
    2. But they are fun to talk about
    If you break your cluster…I warned you :P
  47. Codecs: Controls the data structure for
    • terms
    • positions
    • frequencies
    http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/index-modules-codec.html
  48. Memory Codec: When you have more memory than Google
    • Data stored as uncompressed arrays in memory
    • About as fast as you can go
    (2m Wikipedia dataset == 8gb used heap)
  49. Bloom-pulsing Codec: Fast execution for rare terms
    • Bloom filter fails fast if term is not present
    • Pulsing inlines the postings to avoid extra disk seek
    • Good for rare terms, ID numbers, etc
    (_uid uses this internally for fast get-by-id scenarios)
  50. Memory Codec: Super fast execution for rare terms
    • Encodes your term dictionary as an in-memory FST
    • Crazy fast lookups
    • Really great for “primary keys”
    • Compresses certain “sequential” data very well (“00001”, “00002”, “00003”, etc)
  51. Boosted Synonyms: This is actually kinda slow
    We want to boost synonyms, such that: vegetable => potato^3, tomato^2, carrot^1
    https://gist.github.com/polyfractal/10276706
  52. Boosted Synonyms
        "analyzer":{
          "boosted_syn":{
            "tokenizer":"whitespace",
            "filter":[
              "lowercase",
              "syns",
              "delimited_payload_filter"
            ]
          }
        }

        "filter":{
          "syns":{
            "type":"synonym",
            "synonyms":[
              "potato => vegetable|3",
              "tomato => vegetable|2",
              "carrot => vegetable|1"
            ]
          }
        }
  53. Boosted Synonyms
        {
          "query": {
            "function_score" : {
              "query" : {
                "match": {
                  "title": "vegetable"
                }
              },
              "script_score" : {
                "script" : "boosted_syn",
                "params" : {
                  "field" : "title",
                  "term" : "vegetable"
                }
              },
              "boost_mode" : "replace"
            }
          }
        }

        termInfo = _index[field].get(term,_PAYLOADS);
        score = 0;
        for (pos : termInfo) {
          score = score + pos.payloadAsFloat(0);
        }
        return score;
  54. "hits": [
          {
            "_score": 3,
            "_source": {
              "title": "potato"
            }
          },
          {
            "_score": 2,
            "_source": {
              "title": "tomato"
            }
          },
          {
            "_score": 1,
            "_source": {
              "title": "carrot"
            }
          },
          {
            "_score": 0,
            "_source": {
              "title": "vegetable"
            }
          }
        ]
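
The scoring on slides 52 to 54 can be mimicked with a toy model in plain Python. This is not how Lucene stores or reads payloads; it only imitates the analysis chain (whitespace split, lowercase, synonym rewrite, payload split on `|`) and the summing loop of the script. The function names are invented, and treating a token without a payload as 0.0 is an assumption chosen to match the score of 0 shown for "vegetable".

```python
# Synonym rules from slide 52: original term -> "replacement|payload".
SYNONYMS = {
    "potato": "vegetable|3",
    "tomato": "vegetable|2",
    "carrot": "vegetable|1",
}


def analyze(text):
    """Return (term, payload) pairs for a piece of text (toy analyzer)."""
    tokens = []
    for raw in text.split():
        token = SYNONYMS.get(raw.lower(), raw.lower())
        term, sep, payload = token.partition("|")
        # Assumption: a token without a payload contributes 0.0.
        tokens.append((term, float(payload) if sep else 0.0))
    return tokens


def payload_score(text, term):
    """Sum the payloads of every position where `term` occurs."""
    return sum(payload for t, payload in analyze(text) if t == term)


titles = ["potato", "tomato", "carrot", "vegetable"]
ranked = sorted(
    ((payload_score(title, "vegetable"), title) for title in titles),
    reverse=True,
)
for score, title in ranked:
    print(score, title)
```

With `boost_mode` set to `replace`, the summed payload becomes the whole score, which is why the ordering follows the boosts rather than text relevance.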
  55. Questions? ಠ_ಠ @ZacharyTong, polyfractal on IRC