Slide 1

Slide 1 text

Query Optimization
Go more faster better

Copyright Elasticsearch 2013. Copying, publishing and/or distributing without written permission is strictly prohibited

Slide 2

Slide 2 text

@ZacharyTong
polyfractal on IRC
Development - Support - Training
ಠ_ಠ (amoeba)

Slide 3

Slide 3 text

Filters
Performing binary decisions since 2010

Slide 4

Slide 4 text

Instead of this…

    {
      "query": {
        "term": {
          "my_field": "value"
        }
      }
    }

Slide 5

Slide 5 text

Do this.

    {
      "query": {
        "filtered": {
          "query": {
            "match_all": {}
          },
          "filter": {
            "term": {
              "my_field": "value"
            }
          }
        }
      }
    }
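The rewrite above is mechanical, so it can be sketched as a small helper. This is only an illustration in Python that builds the request body as a plain dict; the function name `filtered_term` is made up for this example and is not part of any Elasticsearch client.

```python
import json


def filtered_term(field, value):
    """Build a filtered query that matches everything and filters on one term.

    Hypothetical helper: it only assembles the JSON body shown on the slide.
    """
    return {
        "query": {
            "filtered": {
                "query": {"match_all": {}},
                "filter": {"term": {field: value}},
            }
        }
    }


body = filtered_term("my_field", "value")
print(json.dumps(body, indent=2))
```

The printed body can be sent to `_search` with whatever HTTP client you already use.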

Slide 6

Slide 6 text

Filters are fast
No score is calculated, only inclusion / exclusion

Slide 7

Slide 7 text

Filters are cached
(and fast)
What's faster than fast? Not calculating it again

Slide 8

Slide 8 text

Filters are composable
(and cached, and fast)
Cached filters are independent of their original query
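A toy model may help show why cached filters compose. The sketch below is not how Elasticsearch stores its cache internally; it assumes one bit per document, uses a Python integer as the bitset, and keeps a hypothetical `bitset_cache` dict keyed by (field, value).

```python
# Four toy documents; the document id is the list index.
docs = [
    {"color": "red", "size": "L"},
    {"color": "blue", "size": "L"},
    {"color": "red", "size": "S"},
    {"color": "red", "size": "L"},
]

bitset_cache = {}   # (field, value) -> int used as a bitset
computations = 0    # how many times a bitset had to be built from scratch


def term_bitset(field, value):
    """Return the cached bitset for a term filter, building it on first use."""
    global computations
    key = (field, value)
    if key not in bitset_cache:
        computations += 1
        bits = 0
        for doc_id, doc in enumerate(docs):
            if doc.get(field) == value:
                bits |= 1 << doc_id
        bitset_cache[key] = bits
    return bitset_cache[key]


def matching_ids(bits):
    """List the document ids whose bit is set."""
    return [i for i in range(len(docs)) if (bits >> i) & 1]


# Two different "queries" reuse the same two cached bitsets.
red_and_large = term_bitset("color", "red") & term_bitset("size", "L")
red_or_large = term_bitset("color", "red") | term_bitset("size", "L")

print(matching_ids(red_and_large), matching_ids(red_or_large), computations)
```

Each term bitset is built once and then combined with cheap bitwise operations, whatever query it first appeared in.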

Slide 9

Slide 9 text

Filters will short-circuit
(and are composable, cached, fast)
If a document doesn't match the filter, the query is never evaluated against it

Slide 10

Slide 10 text

Replace:        With:
Term Query      Term Filter
Terms Query     Terms Filter
Range Query     Range Filter

Slide 11

Slide 11 text

Need to combine filters?
Which do you use?
• And Filter
• Or Filter
• Not Filter
• Bool Filter

Slide 12

Slide 12 text

Need to combine filters?
These?
• And Filter
• Or Filter
• Not Filter

Slide 13

Slide 13 text

Need to combine filters?
Nope. Use this.
• Bool Filter

Slide 14

Slide 14 text

Why?!
See this article: http://www.elasticsearch.org/blog/all-about-elasticsearch-filter-bitsets/

Slide 15

Slide 15 text

Bool              And/Or/Not
Everything else   Geo
                  Script
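One way to read the split above is as a routing rule when assembling a request. The sketch below is an assumption-laden illustration: the helper `combine` and the set `NON_BITSET_TYPES` are invented here, and the set lists only the geo and script filter types, following the slide.

```python
# Filter types routed to and/or/not per the slide (geo and script filters).
# This set is an assumption for the example, not an exhaustive list.
NON_BITSET_TYPES = {"geo_distance", "geo_bounding_box", "geo_polygon", "script"}


def filter_type(f):
    """A filter is a one-key dict; the key is its type."""
    return next(iter(f))


def combine(filters):
    """Put bitset-friendly filters in a bool filter, wrap the rest in an and filter."""
    if not filters:
        raise ValueError("need at least one filter")
    bitset = [f for f in filters if filter_type(f) not in NON_BITSET_TYPES]
    other = [f for f in filters if filter_type(f) in NON_BITSET_TYPES]
    parts = []
    if bitset:
        parts.append({"bool": {"must": bitset}})
    parts.extend(other)
    return parts[0] if len(parts) == 1 else {"and": parts}


term = {"term": {"my_field": "value"}}
rng = {"range": {"price": {"gte": 10}}}
geo = {"geo_distance": {"distance": "10km", "location": {"lat": 40.7, "lon": -74.0}}}

only_bitset = combine([term, rng])
mixed = combine([term, geo, rng])
print(mixed)
```

The field names `price` and `location` are placeholders. Putting the bool filter first in the `and` array lets the cheap bitset work run before the per-document geo check.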

Slide 16

Slide 16 text

Consider "cacheability"

    {
      "query": {
        "filtered": {
          "filter": {
            "range": {
              "my_field": {
                "gte": "now-1h"
              }
            }
          }
        }
      }
    }

This is going to cause poor performance

Slide 17

Slide 17 text

[Diagram: a timeline with a moving "Now" marker, and the filter cache]
Now never stops moving

Slide 22

Slide 22 text

That was all a lie
As of version 1.0, "now" expressions are no longer cached by default (sorry)

Slide 23

Slide 23 text

[Diagram: a timeline with a moving "Now" marker, and the filter cache]
Winning the battle, losing the war

Slide 27

Slide 27 text

[Diagram: a timeline with a moving "Now" marker, and the filter cache]
Winning the battle, losing the war
"I wish I could do some caching…"

Slide 28

Slide 28 text

Add a second filter

    ...
    "bool": {
      "must": [
        {
          "range": {
            "my_field": {
              "gte": "now/d"
            },
            "_cache": true
          }
        },
        {
          "range": {
            "my_field": {
              "gte": "now-1h"
            },
            "_cache": false
          }
        }
      ]
    }
    ...

First range: cached, daily granularity
Second range: uncached, hourly granularity
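The effect of rounding on cache reuse can be sketched with a small simulation. This is plain Python, not Elasticsearch code: it assumes a cache keyed by the resolved lower bound of the range, and the two `resolve_*` functions are illustrative stand-ins for date math evaluation.

```python
from datetime import datetime, timedelta


def resolve_unrounded(now):
    """Lower bound of an unrounded 'one hour ago' range: changes on every request."""
    return now - timedelta(hours=1)


def resolve_rounded_to_day(now):
    """Lower bound of a range rounded down to the day: stable until midnight."""
    return now.replace(hour=0, minute=0, second=0, microsecond=0)


# Ten minutes of traffic, one request per second (arbitrary example date).
start = datetime(2014, 4, 10, 9, 0, 0)
requests = [start + timedelta(seconds=s) for s in range(600)]

# Each distinct resolved bound would need its own cache entry.
unrounded_keys = {resolve_unrounded(t) for t in requests}
rounded_keys = {resolve_rounded_to_day(t) for t in requests}

print(len(unrounded_keys), len(rounded_keys))
```

The coarse filter maps every request in the window to one reusable entry, while the precise filter would create a new entry per request, which is why only the coarse one is worth caching.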

Slide 29

Slide 29 text

[Diagram: a timeline with a moving "Now" marker, and the filter cache]
No cache churn

Slide 32

Slide 32 text

[Diagram: a timeline with a moving "Now" marker, and the filter cache]
No cache churn
Applies to many filters, not just ranges

Slide 33

Slide 33 text

Top level filter is slow(er)

    {
      "query": { … },
      "filter": { … }
    }

Don't use this unless you need it (only useful with facets)

Slide 34

Slide 34 text

Version 1.0+
Renamed to "post_filter"

    {
      "query": { … },
      "post_filter": { … }
    }

Same performance semantics (applies to aggregations too)

Slide 35

Slide 35 text

[Diagram labels: filtered query; documents matching the query; top level filter; "hits": [...]; facet_filter; "facets": {...}]

Slide 36

Slide 36 text

Queries
This discussion has become relevant

Slide 37

Slide 37 text

Index-Time vs Query-Time
Choose where to pay a computation price

Slide 38

Slide 38 text

Functionality      Query-time                                             Index-time
Misspellings       fuzzy query, term suggester                            ngrams
Autocomplete       prefix query, phrase suggester, Completion suggester   shingles
Leading wildcard   Wildcard                                               Reverse filter + prefix query
Relations          Parent/Child + Nested                                  Denormalization
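The "Leading wildcard" row is the least obvious trade, so here is a rough sketch of the idea outside Elasticsearch. It assumes a sorted list standing in for the term dictionary; the term list and the helper `ends_with` are made up for illustration.

```python
from bisect import bisect_left

terms = ["searching", "indexing", "mapping", "cluster", "shard"]

# Index time: store each term reversed, sorted like a term dictionary.
reversed_terms = sorted(t[::-1] for t in terms)


def ends_with(suffix):
    """Answer the leading wildcard *suffix as a prefix lookup on the reversed terms."""
    prefix = suffix[::-1]
    start = bisect_left(reversed_terms, prefix)
    matches = []
    for rt in reversed_terms[start:]:
        if not rt.startswith(prefix):
            break  # sorted order: once the prefix stops matching, we are done
        matches.append(rt[::-1])
    return sorted(matches)


print(ends_with("ing"))
```

A leading wildcard on the original terms has to scan every term, while the reversed copy turns the same question into a binary search plus a short contiguous scan, paid for with extra index size.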

Slide 39

Slide 39 text

Avoid deep pagination

    {
      "query": { … },
      "from": 10000000,
      "size": 10
    }

Builds a PriorityQueue 10,000,010 large
(for each shard in your index)
(just to return 10 results)
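The cost can be made concrete with a scaled-down model. The sketch below is not Elasticsearch code; it assumes each shard is just a list of scores, and the `search` function is a hypothetical stand-in for the scatter/gather step.

```python
import heapq


def shard_top(scores, from_, size):
    """Each shard cannot know which of its hits land on the page,
    so it has to keep its best from_ + size candidates."""
    return heapq.nlargest(from_ + size, scores)


def search(shards, from_, size):
    """Gather per-shard candidates, merge them, and slice out the requested page."""
    per_shard = [shard_top(s, from_, size) for s in shards]
    merged = heapq.nlargest(from_ + size, (x for top in per_shard for x in top))
    held = sum(len(top) for top in per_shard)
    return merged[from_:from_ + size], held


# Three shards of 1000 documents each, with scores 0..2999 spread round-robin.
shards = [list(range(k, 3000, 3)) for k in range(3)]

page, held = search(shards, from_=500, size=10)
print(page, held)
```

Here 1,530 candidates are held across three shards to return 10 hits; with `from` at 10,000,000 the same arithmetic gives 10,000,010 entries per shard.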

Slide 40

Slide 40 text

GoogleBot
Destroyer of clusters
Bots will happily traverse millions of pages

Slide 41

Slide 41 text

Use Count

    GET /index/_search
    {
      "query": { … },
      "size": 0
    }

This is faster:

    GET /index/_search?search_type=count
    {
      "query": { … }
    }

Slide 42

Slide 42 text

Rescore API
1. Query/filter to quickly find top N results
2. Rescore with complex logic to find top 10
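As a rough sketch, a two-phase request body might be assembled like this in Python. The field name `title`, the window size, and the weights are placeholders chosen for the example, and the helper `rescore_request` is hypothetical; check the rescore documentation for your version before relying on the exact keys.

```python
import json


def rescore_request(text, window_size=50):
    """Cheap match query first, then a costlier phrase query over the top hits per shard."""
    return {
        # Phase 1: a fast query that finds candidate documents.
        "query": {"match": {"title": text}},
        # Phase 2: only the top window_size hits on each shard are rescored.
        "rescore": {
            "window_size": window_size,
            "query": {
                "rescore_query": {
                    "match_phrase": {"title": {"query": text, "slop": 2}}
                },
                "query_weight": 0.7,
                "rescore_query_weight": 1.2,
            },
        },
        "size": 10,
    }


request = rescore_request("query optimization")
print(json.dumps(request, indent=2))
```

The expensive clause runs against at most `window_size` documents per shard instead of every match, which is the whole point of the two steps.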

Slide 43

Slide 43 text

Common Terms
Very cool query, makes stop-words obsolete

See this presentation: https://speakerdeck.com/polyfractal/common-terms-query

Slide 44

Slide 44 text

In General
1. Think about what you want to search
2. Structure your document to make that easy

Slide 45

Slide 45 text

Scripts
There's a python in my elasticsearch server!

Slide 46

Slide 46 text

FOR ALL THAT IS HOLY
Do not EVER use these in a search script:

    _source.my_field
    _fields.my_field

These access the disk and are sloooooow. You will destroy your performance

Slide 47

Slide 47 text

FOR ALL THAT IS HOLY
Use this instead:

    doc['my_field']

Accesses in-memory field data. Fast!

Slide 48

Slide 48 text

Use common sense
In general, scripting is slower than queries. Don't go crazy.

If you end up with a 10-page script, bake some of that logic into your index

Slide 49

Slide 49 text

Arcanum
Here be dragons

Slide 50

Slide 50 text

Disclaimer
1. You probably don't need these tricks
2. But they are fun to talk about
If you break your cluster…I warned you :P

Slide 51

Slide 51 text

Codecs
Controls the data structure for
• terms
• positions
• frequencies
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/index-modules-codec.html

Slide 52

Slide 52 text

Memory Codec
When you have more memory than Google
• Data stored as uncompressed arrays in memory
• About as fast as you can go
(2m Wikipedia dataset == 8gb used heap)

Slide 53

Slide 53 text

Bloom-pulsing Codec
Fast execution for rare terms
• Bloom filter fails fast if term is not present
• Pulsing inlines the postings to avoid extra disk seek
• Good for rare terms, ID numbers, etc
(_uid uses this internally for fast get-by-id scenarios)
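The "fails fast" behaviour comes from how a Bloom filter answers membership questions. The class below is a generic textbook sketch, not Lucene's implementation; the bit count, number of hashes, and the use of SHA-256 are arbitrary choices for the example.

```python
import hashlib


class BloomFilter:
    """Minimal Bloom filter: 'no' is always correct, 'yes' only means 'maybe'."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0  # a Python int used as a bit array

    def _positions(self, term):
        # Derive num_hashes bit positions by salting the term with a seed.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{term}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, term):
        for pos in self._positions(term):
            self.bits |= 1 << pos

    def might_contain(self, term):
        # If any position is unset, the term was definitely never added.
        return all((self.bits >> pos) & 1 for pos in self._positions(term))


segment_ids = BloomFilter()
for doc_id in ["00001", "00002", "00003"]:
    segment_ids.add(doc_id)

print(segment_ids.might_contain("00002"))
```

A segment whose filter says "no" can be skipped without consulting its term dictionary at all, which is what makes lookups by ID cheap when most segments do not hold the ID.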

Slide 54

Slide 54 text

Memory Codec
Super fast execution for rare terms
• Encodes your term dictionary as an in-memory FST
• Crazy fast lookups
• Really great for "primary keys"
• Compresses certain "sequential" data very well
("00001", "00002", "00003", etc)

Slide 55

Slide 55 text

Boosted Synonyms
We want to boost synonyms, such that:

    vegetable => potato^3, tomato^2, carrot^1

This is actually kinda slow
https://gist.github.com/polyfractal/10276706

Slide 56

Slide 56 text

Boosted Synonyms

    "analyzer": {
      "boosted_syn": {
        "tokenizer": "whitespace",
        "filter": [
          "lowercase",
          "syns",
          "delimited_payload_filter"
        ]
      }
    }

    "filter": {
      "syns": {
        "type": "synonym",
        "synonyms": [
          "potato => vegetable|3",
          "tomato => vegetable|2",
          "carrot => vegetable|1"
        ]
      }
    }

Slide 57

Slide 57 text

Boosted Synonyms

    {
      "query": {
        "function_score": {
          "query": {
            "match": {
              "title": "vegetable"
            }
          },
          "script_score": {
            "script": "boosted_syn",
            "params": {
              "field": "title",
              "term": "vegetable"
            }
          },
          "boost_mode": "replace"
        }
      }
    }

The boosted_syn script:

    termInfo = _index[field].get(term, _PAYLOADS);
    score = 0;
    for (pos : termInfo) {
      score = score + pos.payloadAsFloat(0);
    }
    return score;

Slide 58

Slide 58 text

    "hits": [
      {
        "_score": 3,
        "_source": {
          "title": "potato"
        }
      },
      {
        "_score": 2,
        "_source": {
          "title": "tomato"
        }
      },
      {
        "_score": 1,
        "_source": {
          "title": "carrot"
        }
      },
      {
        "_score": 0,
        "_source": {
          "title": "vegetable"
        }
      }
    ]
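To see where these scores come from, the analysis chain and the scoring script can be imitated in a few lines of Python. This is a simulation under simplifying assumptions (whitespace tokens, one synonym rule per word, payload after a "|" delimiter); the names `SYNONYMS`, `analyze`, and `boosted_syn_score` exist only in this sketch.

```python
# The synonym rules from the analyzer: each word becomes "vegetable" with a payload.
SYNONYMS = {
    "potato": "vegetable|3",
    "tomato": "vegetable|2",
    "carrot": "vegetable|1",
}


def analyze(text):
    """Whitespace tokenize, lowercase, apply synonyms, split off the payload."""
    tokens = []
    for raw in text.split():
        token = SYNONYMS.get(raw.lower(), raw.lower())
        term, _, payload = token.partition("|")
        tokens.append((term, float(payload) if payload else None))
    return tokens


def boosted_syn_score(text, term, default=0.0):
    """Sum the payloads of every occurrence of term, like the script does."""
    score = 0.0
    for tok, payload in analyze(text):
        if tok == term:
            score += payload if payload is not None else default
    return score


titles = ["vegetable", "carrot", "potato", "tomato"]
ranked = sorted(
    ((boosted_syn_score(t, "vegetable"), t) for t in titles),
    reverse=True,
)
print(ranked)
```

The literal title "vegetable" carries no payload, so it falls back to the default of 0 and ranks last, matching the hits shown.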

Slide 59

Slide 59 text

Questions?
ಠ_ಠ
@ZacharyTong
polyfractal on IRC