Grab bag of tips to help improve your queries in Elasticsearch. Not everything will apply to your data or architecture, so feel free to skim and selectively steal tips :)
Copyright Elasticsearch 2013. Copying, publishing and/or distributing without written permission is strictly prohibited.
Query Optimization: Go more faster better
@ZacharyTong, polyfractal on IRC. Development - Support - Training ಠ_ಠ (amoeba)
Filters: Performing binary decisions since 2010
Filters are fast: No score is calculated, only inclusion / exclusion.
Filters are cached: What’s faster than fast? Not calculating it again.
Filters are composable: Cached filters are independent of their original query.
Filters will short-circuit: If a document doesn’t match the filter, the query never evaluates it.
Replace: Term Query, Terms Query, Range Query. With: Term Filter, Terms Filter, Range Filter.
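A minimal sketch of that swap, written as Python dicts that would be serialized to JSON request bodies. It assumes the `filtered` query of the 1.x-era DSL this deck targets; the helper names and the `status` field are hypothetical, chosen only for illustration.

```python
def term_query(field, value):
    """Scored version: a relevance score is computed for every match."""
    return {"query": {"term": {field: value}}}


def term_filter(field, value):
    """Unscored version: the same term wrapped as a filter in a `filtered` query."""
    return {
        "query": {
            "filtered": {
                "query": {"match_all": {}},
                "filter": {"term": {field: value}},
            }
        }
    }


def range_filter(field, gte, lt):
    """Range filter clause, usable anywhere a filter is accepted."""
    return {"range": {field: {"gte": gte, "lt": lt}}}


if __name__ == "__main__":
    import json
    print(json.dumps(term_filter("status", "published"), indent=2))
```

Both request bodies select the same documents for an exact-match field; the filter version simply skips scoring.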
Need to combine filters? Which do you use: And Filter, Or Filter, Not Filter, or Bool Filter? And / Or / Not? Nope. Use the Bool Filter.
Why?! See this article: http://www.elasticsearch.org/blog/all-about-elasticsearch-filter-bitsets/
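As a rough illustration of combining filters with a Bool Filter, the sketch below builds the `must` / `should` / `must_not` structure as a Python dict. The helper name and the field names are hypothetical; only the `bool` clause layout comes from the Elasticsearch DSL.

```python
def bool_filter(must=(), should=(), must_not=()):
    """Combine filter clauses into a single bool filter, omitting empty sections."""
    clauses = {}
    if must:
        clauses["must"] = list(must)
    if should:
        clauses["should"] = list(should)
    if must_not:
        clauses["must_not"] = list(must_not)
    return {"bool": clauses}


if __name__ == "__main__":
    combined = bool_filter(
        must=[{"term": {"status": "published"}}],
        must_not=[{"term": {"author": "spammer"}}],
    )
    print(combined)
```

Here `must` plays the role of "and", `should` of "or", and `must_not` of "not".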
Now never stops moving (diagram: Time axis, Now marker, Filter Cache)
That was all a lie: As of version 1.0, “now” expressions are not cached by default anymore (sorry)
Winning the battle, losing the war (diagram: Time axis, Now marker, Filter Cache): “I wish I could do some caching…”
No cache churn (diagram: Time axis, Now marker, Filter Cache). Applies to many filters, not just ranges.
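The slide text does not spell out how the churn is avoided. One common approach, assumed here rather than taken from the deck, is to round the moving boundary to a coarse unit so that every request within the same hour produces an identical filter body. The sketch below does the rounding on the client; the `timestamp` field and the helper names are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def floor_to_hour(moment):
    """Drop minutes, seconds and microseconds so the value changes once per hour."""
    return moment.replace(minute=0, second=0, microsecond=0)


def recent_filter(field, now, hours_back=24):
    """Range filter whose lower bound only moves when the hour rolls over."""
    start = floor_to_hour(now) - timedelta(hours=hours_back)
    return {"range": {field: {"gte": start.isoformat()}}}


if __name__ == "__main__":
    print(recent_filter("timestamp", datetime.now(timezone.utc)))
```

The trade-off is precision: the window is up to one hour wider than requested. If exact bounds matter, a second, uncached range over the small remainder could be combined with this one.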
Top level filter is slow(er)
{
  "query" : { … },
  "filter" : { … }
}
Don’t use this unless you need it (only useful with facets)
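A sketch of the usual alternative, assuming the `filtered` query of the 1.x-era DSL: move the top-level filter inside the query so it restricts documents before scoring. The helper name and the sample fields are hypothetical.

```python
import copy


def to_filtered_query(request):
    """Return a copy of the request with its top-level filter moved into a filtered query."""
    body = copy.deepcopy(request)
    top_filter = body.pop("filter", None)
    if top_filter is None:
        return body  # nothing to move
    body["query"] = {
        "filtered": {
            "query": body.get("query", {"match_all": {}}),
            "filter": top_filter,
        }
    }
    return body


if __name__ == "__main__":
    slow = {
        "query": {"match": {"title": "elasticsearch"}},
        "filter": {"term": {"status": "published"}},
    }
    print(to_filtered_query(slow))
```

Note that this changes what facets see: with the filter inside the query, facets are computed over the filtered documents only, which is exactly why the top-level form still exists.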
Queries: This discussion has become relevant
Index-Time vs Query-Time: Choose where to pay a computation price
Avoid deep pagination
{
  "query" : { … },
  "from" : 10000000,
  "size" : 10
}
Builds a PriorityQueue 10,000,010 large (for each shard in your index), just to return 10 results.
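The arithmetic behind that number, as a small sketch: each shard has to collect `from + size` candidates, and the node coordinating the request has to merge one such list per shard. The shard count of 5 is an assumption for illustration, not a figure from the slides.

```python
def deep_pagination_cost(from_, size, shards):
    """Return (entries queued per shard, entries merged across all shards)."""
    per_shard = from_ + size
    return per_shard, per_shard * shards


if __name__ == "__main__":
    per_shard, total = deep_pagination_cost(from_=10_000_000, size=10, shards=5)
    print(f"{per_shard:,} per shard, {total:,} merged in total")
```

With the request above, every shard queues 10,000,010 entries, so five shards hand over 50,000,050 entries to produce a page of 10.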
GoogleBot: Destroyer of clusters. Bots will happily traverse millions of pages.
Rescore API
1. Query/filter to quickly find top N results
2. Rescore with complex logic to find top 10
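The two steps above can be sketched as a request body with a `rescore` section. The helper name, the sample queries, and the weights are hypothetical; `window_size`, `rescore_query`, `query_weight` and `rescore_query_weight` are the parameter names of the rescore API.

```python
def rescore_request(cheap_query, expensive_query, window_size=100, size=10):
    """Cheap query picks the candidates; the expensive one reorders only the top window."""
    return {
        "query": cheap_query,
        "size": size,
        "rescore": {
            "window_size": window_size,  # top N per shard to rescore
            "query": {
                "rescore_query": expensive_query,
                "query_weight": 1.0,
                "rescore_query_weight": 2.0,
            },
        },
    }


if __name__ == "__main__":
    body = rescore_request(
        cheap_query={"match": {"body": "quick brown fox"}},
        expensive_query={"match_phrase": {"body": {"query": "quick brown fox", "slop": 2}}},
    )
    print(body)
```

The expensive phrase query then runs against at most `window_size` documents per shard instead of every match.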
Common Terms: Very cool query, makes stop-words obsolete. See this presentation: https://speakerdeck.com/polyfractal/common-terms-query
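For orientation, a minimal sketch of a `common` terms query body. The `body` field, the helper name, and the cutoff value are assumptions for illustration; terms more frequent than `cutoff_frequency` are treated as "common" and only used to score documents already matched by the rarer terms.

```python
def common_terms_query(field, text, cutoff_frequency=0.001):
    """Build a common terms query: rare terms select documents, frequent terms only refine scores."""
    return {
        "query": {
            "common": {
                field: {
                    "query": text,
                    "cutoff_frequency": cutoff_frequency,
                }
            }
        }
    }


if __name__ == "__main__":
    print(common_terms_query("body", "the quick brown fox"))
```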
In General
1. Think about what you want to search
2. Structure your document to make that easy
Scripts: There’s a python in my elasticsearch server!
Do not EVER use these in a search script: _source.my_field, _fields.my_field. These access the disk and are sloooooow. You will destroy your performance. FOR ALL THAT IS HOLY.
Use this instead: doc['my_field']. Accesses in-memory field data. Fast!
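A small sketch of that rule in practice: a helper that builds a `function_score` query with a `script_score`, and refuses scripts that reach for `_source` or `_fields`. The helper, the guard, and the `popularity` field are hypothetical; the request shape assumes the 1.x-era `function_score` syntax.

```python
SLOW_ACCESSORS = ("_source.", "_fields.")


def script_score_query(query, script):
    """Wrap a query in function_score, rejecting scripts that read from disk."""
    for accessor in SLOW_ACCESSORS:
        if accessor in script:
            raise ValueError(f"{accessor}* is slow in search scripts; use doc['field'] instead")
    return {
        "query": {
            "function_score": {
                "query": query,
                "script_score": {"script": script},
            }
        }
    }


if __name__ == "__main__":
    print(script_score_query({"match_all": {}}, "doc['popularity'].value"))
```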
Use common sense: In general, scripting is slower than queries. Don’t go crazy. If you end up with a 10-page script, bake some of that logic into your index.
Disclaimer
1. You probably don’t need these tricks
2. But they are fun to talk about
If you break your cluster… I warned you :P
Codecs: Controls the data structure for
• terms
• positions
• frequencies
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/index-modules-codec.html
Direct Codec: When you have more memory than Google
• Data stored as uncompressed arrays in memory
• About as fast as you can go
(2m Wikipedia dataset == 8gb used heap)
Bloom-pulsing Codec: Fast execution for rare terms
• Bloom filter fails fast if term is not present
• Pulsing inlines the postings to avoid extra disk seek
• Good for rare terms, ID numbers, etc
(_uid uses this internally for fast get-by-id scenarios)
Memory Codec: Super fast execution for rare terms
• Encodes your term dictionary as an in-memory FST
• Crazy fast lookups
• Really great for “primary keys”
• Compresses certain “sequential” data very well (“00001”, “00002”, “00003”, etc)
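As a hedged sketch of how a codec is selected, the mapping below sets a per-field `postings_format`, which is how the 1.x-era reference linked above describes it. The type name `doc`, the field `sku`, and the helper are hypothetical, and the set of format names is listed from memory of that reference, so check it against your version before relying on it.

```python
KNOWN_FORMATS = {"default", "direct", "memory", "pulsing", "bloom_default", "bloom_pulsing"}


def mapping_with_postings_format(type_name, field, postings_format):
    """Build a mapping that stores one not_analyzed string field with the given postings format."""
    if postings_format not in KNOWN_FORMATS:
        raise ValueError(f"unknown postings format: {postings_format}")
    return {
        "mappings": {
            type_name: {
                "properties": {
                    field: {
                        "type": "string",
                        "index": "not_analyzed",
                        "postings_format": postings_format,
                    }
                }
            }
        }
    }


if __name__ == "__main__":
    print(mapping_with_postings_format("doc", "sku", "bloom_pulsing"))
```

Codecs are chosen per field, so an ID-like field can use a lookup-oriented format while full-text fields keep the default.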
Boosted Synonyms: This is actually kinda slow. We want to boost synonyms, such that: vegetable => potato^3, tomato^2, carrot^1. https://gist.github.com/polyfractal/10276706
Questions? ಠ_ಠ @ZacharyTong, polyfractal on IRC