Query Optimization: Go more faster better

Presented by Zachary Tong at the Inaugural Elasticsearch Atlanta Meetup.

Elasticsearch Inc

January 15, 2014

Transcript

  1. Query Optimization: Go more faster better

    Copyright Elasticsearch 2013. Copying, publishing and/or distributing without written permission is strictly prohibited.
  2. @ZacharyTong, polyfractal on IRC. Developing - Support - Training ಠ_ಠ (amoeba)
  3. Filters: Performing binary decisions since 2010
  4. Instead of this…

        {
          "query" : {
            "term" : {
              "my_field" : "value"
            }
          }
        }
  5. Do this.

        {
          "query" : {
            "filtered" : {
              "query" : {
                "match_all" : {}
              },
              "filter" : {
                "term" : {
                  "my_field" : "value"
                }
              }
            }
          }
        }
  6. Filters are fast: No score is calculated, only inclusion / exclusion
  7. Filters are cached: What’s faster than fast? Not calculating it again
  8. Filters are composable: Cached filters are independent of their original query
  9. Filters will short-circuit: If a document doesn’t match the filter, it isn’t evaluated by the query
  10. Replace: Term Query, Terms Query, Range Query

    With: Term Filter, Terms Filter, Range Filter
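The rewrite recommended on slides 4, 5 and 10 is mechanical, so it can be sketched as a small helper. This is only an illustration under stated assumptions: `to_filtered` and `FILTERABLE` are hypothetical names, not part of any Elasticsearch client, and the sketch assumes the query clause and the filter clause share the same body, which holds for the simple term, terms and range forms shown in the deck.

```python
import json

# Query types the deck suggests replacing with their filter equivalents.
FILTERABLE = ("term", "terms", "range")


def to_filtered(search_body):
    """Rewrite {"query": {<kind>: ...}} into a filtered query (slide 5 layout).

    Bodies whose top-level query is not a single term/terms/range clause
    are returned unchanged.
    """
    query = search_body.get("query", {})
    if len(query) != 1:
        return search_body
    kind, clause = next(iter(query.items()))
    if kind not in FILTERABLE:
        return search_body
    rewritten = dict(search_body)
    rewritten["query"] = {
        "filtered": {
            "query": {"match_all": {}},   # nothing to score
            "filter": {kind: clause},     # yes/no decision only
        }
    }
    return rewritten


if __name__ == "__main__":
    before = {"query": {"term": {"my_field": "value"}}}
    print(json.dumps(to_filtered(before), indent=2))
```

Clauses that need scoring, such as a full-text match, are deliberately left alone, since the deck's advice only covers exact yes/no conditions.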
  11. Need to combine filters? Which do you use?

    And Filter, Or Filter, Not Filter, Bool Filter
  12. Need to combine filters? These?

    And Filter, Or Filter, Not Filter
  13. Need to combine filters? Nope. Use this.

    Bool Filter
  14. Why?! See this article: http://www.elasticsearch.org/blog/all-about-elasticsearch-filter-bitsets/
  15. Which combining filter to use:

    And/Or/Not: Geo, Script
    Bool:       Everything else
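As a rough illustration of slides 11 to 15, the sketch below assembles a bool filter from individual filters. `bool_filter` is a hypothetical helper written for this transcript, and the field names in the usage example are made up; only the `bool` / `must` / `should` / `must_not` layout comes from the filter DSL the deck refers to.

```python
import json


def bool_filter(must=(), should=(), must_not=()):
    """Combine filters with a bool filter, omitting empty clause lists."""
    clauses = {}
    if must:
        clauses["must"] = list(must)
    if should:
        clauses["should"] = list(should)
    if must_not:
        clauses["must_not"] = list(must_not)
    return {"bool": clauses}


if __name__ == "__main__":
    combined = bool_filter(
        must=[{"term": {"status": "published"}}],       # hypothetical field
        must_not=[{"term": {"author": "spambot"}}],      # hypothetical field
    )
    print(json.dumps(combined, indent=2))
```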
  16. Consider “cacheability”

        {
          "query" : {
            "filtered" : {
              "filter" : {
                "range" : {
                  "my_field" : {
                    "gte" : "now-1h"
                  }
                }
              }
            }
          }
        }

    This is going to cause poor performance
  17-21. Now never stops moving

    [Animation frames: a Time axis with a moving "Now" marker and the Filter Cache]
  22. Add a second filter

        {
          "query" : {
            "filtered" : {
              "filter" : {
                "bool" : {
                  "must" : [
                    {"range" : {"my_field" : {"gte" : "now/d"}}},
                    {"range" : {
                      "my_field" : {"gte" : "now-1h"},
                      "_cache" : false
                    }}
                  ]
                }
              }
            }
          }
        }

    First range: cached, daily granularity. Second range: uncached, hourly granularity
  23. No cache churn. Applies to many filters, not just ranges

    [Diagram: Time axis, "Now" marker and the Filter Cache]
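The two-filter pattern from slides 16 to 23 can be sketched as a helper that pairs a coarse, cacheable range with a precise, uncached one. Several things here are assumptions rather than content from the slides: `recent_filter` is a hypothetical name, the `_cache` flag and the date-math strings follow the 1.x-era filter DSL as I understand it, and the coarse bound is written as `now-1h/d` (the same offset rounded down to the day) so that it never excludes documents the precise bound would accept, for example shortly after midnight.

```python
import json


def recent_filter(field, window="1h", round_to="d"):
    """Bool filter: a coarse cached range plus a precise uncached range."""
    # Coarse bound changes only once per rounding unit, so its cached
    # result stays reusable between requests.
    coarse = {"range": {field: {"gte": "now-{}/{}".format(window, round_to)}}}
    # Precise bound changes on every request, so caching it would only
    # churn the cache; it is explicitly left uncached.
    precise = {"range": {field: {"gte": "now-{}".format(window)}, "_cache": False}}
    return {"bool": {"must": [coarse, precise]}}


if __name__ == "__main__":
    body = {"query": {"filtered": {"filter": recent_filter("my_field")}}}
    print(json.dumps(body, indent=2))
```

The coarse filter narrows the candidate set cheaply from cache, and the uncached filter only has to do exact work on what is left.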
  24. Top level filter is slow(er)

        {
          "query" : { … },
          "filter" : { … }
        }

    Don’t use this unless you need it (only useful with facets)
  25. [Diagram relating: filtered query, top level filter, facet_filter, documents matching the query, "hits": [...], "facets": {...}]
  26. Queries: This discussion has become relevant
  27. Choose where to pay a computation price: Query-Time vs Index-Time
  28. Functionality     Query-time                                             Index-time
    Misspellings      fuzzy query, term suggester                            ngrams
    Autocomplete      prefix query, phrase suggester, Completion suggester   shingles
    Leading wildcard  Wildcard                                               Reverse filter + prefix query
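The "Reverse filter + prefix query" entry in the last row may be easier to see in miniature. The sketch below is plain Python, not Elasticsearch code: it stores each term reversed in a sorted list (standing in for an index-time reverse token filter) and answers a leading-wildcard pattern such as `*ing` as a prefix lookup on `gni`. The function names and the sample terms are hypothetical.

```python
import bisect


def index_reversed(tokens):
    """Index-time step: store every token reversed, in sorted order."""
    return sorted({token[::-1] for token in tokens})


def suffix_matches(reversed_terms, pattern):
    """Query-time step: answer '*suffix' as a prefix scan on reversed terms."""
    if not pattern.startswith("*") or "*" in pattern[1:] or "?" in pattern:
        raise ValueError("only a single leading '*' is supported in this sketch")
    prefix = pattern[1:][::-1]
    # Jump straight to the first candidate instead of scanning every term.
    start = bisect.bisect_left(reversed_terms, prefix)
    matches = []
    for term in reversed_terms[start:]:
        if not term.startswith(prefix):
            break  # sorted order: no later term can match
        matches.append(term[::-1])
    return matches


if __name__ == "__main__":
    terms = index_reversed(["searching", "indexing", "search", "ring"])
    print(suffix_matches(terms, "*ing"))
```

The cost moves to index time (one extra reversed copy of each term), which is the trade slide 27 describes.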
  29. Avoid deep pagination

        {
          "query" : { … },
          "from" : 10000000,
          "size" : 10
        }

    Builds a PriorityQueue 10,000,010 large (for each shard in your index) (just to return 10 results)
  30. GoogleBot: Destroyer of clusters. Bots will happily traverse millions of pages
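The arithmetic behind slide 29 is simply `from + size` entries per shard. The sketch below works that out and adds an application-side guard of the kind that would stop a crawler from paging arbitrarily deep. The shard count of 5 and the `max_window` default of 10,000 are illustrative assumptions, and `check_page` is a hypothetical helper, not an Elasticsearch setting.

```python
def queue_entries_per_shard(from_, size):
    """Entries each shard must collect to serve one page: from + size."""
    return from_ + size


def total_queue_entries(from_, size, shards):
    """Entries collected across all shards for a single request."""
    return shards * queue_entries_per_shard(from_, size)


def check_page(from_, size, max_window=10_000):
    """Reject requests that page deeper than the application allows."""
    if from_ < 0 or size < 0:
        raise ValueError("from and size must be non-negative")
    if from_ + size > max_window:
        raise ValueError("page too deep: from + size exceeds {}".format(max_window))
    return from_, size


if __name__ == "__main__":
    print(queue_entries_per_shard(10_000_000, 10))        # 10000010
    print(total_queue_entries(10_000_000, 10, shards=5))  # 50000050
```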
  31. Use Count

        GET /index/_search
        {
          "query" : { … },
          "size" : 0
        }

    This is faster:

        GET /index/_search?search_type=count
        {
          "query" : { … }
        }
  32. Rescore API

    1. Query/filter to quickly find top N results
    2. Rescore with complex logic to find top 10
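A request following the two steps on slide 32 might be assembled as below. `with_rescore` is a hypothetical helper, and the example queries and field names are invented for illustration; the `rescore`, `window_size` and `rescore_query` keys are the ones I understand the Rescore API to use, so check them against the reference for your version.

```python
import json


def with_rescore(cheap_query, expensive_query, window_size=50, size=10):
    """Search body: cheap query first, expensive rescoring of the top N only."""
    return {
        "query": cheap_query,                  # step 1: quickly find top N
        "size": size,                          # results actually returned
        "rescore": {
            "window_size": window_size,        # N, evaluated per shard
            "query": {"rescore_query": expensive_query},  # step 2
        },
    }


if __name__ == "__main__":
    body = with_rescore(
        cheap_query={"match": {"title": "query optimization"}},        # hypothetical field
        expensive_query={"match_phrase": {"title": "query optimization"}},
    )
    print(json.dumps(body, indent=2))
```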
  33. Common Terms: Very cool query, makes stop-words obsolete

    See this presentation: https://speakerdeck.com/polyfractal/common-terms-query
  34. In General

    1. Think about what you want to search
    2. Structure your document to make that easy
  35. Scripts: There’s a python in my elasticsearch server!
  36. FOR ALL THAT IS HOLY, do not EVER use these in a search script:

        _source.my_field
        _fields.my_field

    These access the disk and are sloooooow. You will destroy your performance
  37. FOR ALL THAT IS HOLY, use this instead:

        doc['my_field']

    Accesses in-memory field data. Fast!
  38. Use common sense. In general, scripting is slower than queries. Don’t go crazy.

    If you end up with a 10-page script, bake some of that logic into your index
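One low-tech way to act on the warning in slides 36 and 37 is to scan script strings for the slow accessors before they reach the cluster. The check below is a hypothetical, regex-based sketch: it knows nothing about the scripting language itself and only looks for the two names the slides call out.

```python
import re

# The two accessors the slides warn about; doc['field'] is the suggested
# replacement and is intentionally not matched.
SLOW_ACCESS = re.compile(r"\b_(?:source|fields)\b")


def slow_field_accesses(script):
    """Return the distinct slow accessors found in a script string, sorted."""
    return sorted(set(SLOW_ACCESS.findall(script)))


if __name__ == "__main__":
    for script in ("_source.my_field * 2", "doc['my_field'].value * 2"):
        found = slow_field_accesses(script)
        print(script, "->", found if found else "ok")
```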
  39. Questions? ಠ_ಠ @ZacharyTong, polyfractal on IRC