• Current shape support still based on postings
• BKD-based bounding boxes available in 7.1: LatLonBoundingBox
• Indexed as a 4-dimensional point
• Using the BKD tree as an R-tree
• Upcoming general BKD-based shape support (7.x)
BKD-based geo-shapes
16
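To illustrate how a bounding box can be indexed as a 4-dimensional point, here is a minimal sketch in plain Python. It is not Lucene's LatLonBoundingBox implementation: the `Box` class and the helper names are hypothetical, and boxes crossing the dateline are ignored. The idea is that an "intersects" query on boxes becomes a plain range query over the four dimensions, which is what a BKD tree answers efficiently.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Hypothetical bounding box; not a Lucene class."""
    min_lat: float
    min_lon: float
    max_lat: float
    max_lon: float

    def as_point(self):
        # One box becomes one point in 4-D space.
        return (self.min_lat, self.min_lon, self.max_lat, self.max_lon)


def intersects_range(query):
    """Per-dimension [low, high] bounds that a stored 4-D point must satisfy
    for its box to intersect `query`."""
    inf = float("inf")
    # box.min <= query.max on the first two dimensions,
    # box.max >= query.min on the last two.
    low = (-inf, -inf, query.min_lat, query.min_lon)
    high = (query.max_lat, query.max_lon, inf, inf)
    return low, high


def matches(point, low, high):
    # A 4-D range check, the kind of test a point tree performs per cell/point.
    return all(lo <= v <= hi for v, lo, hi in zip(point, low, high))


query = Box(48.0, 2.0, 49.0, 3.0)
low, high = intersects_range(query)
inside = Box(48.8, 2.2, 48.9, 2.4)
far_away = Box(10.0, 10.0, 11.0, 11.0)
print(matches(inside.as_point(), low, high))
print(matches(far_away.as_point(), low, high))
```

"Within" and "contains" relations can be expressed the same way with different per-dimension bounds.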
Distance filter on geo points
Slide 17
Slide 17 text
Interested in geo?
17
The state of geo in Elasticsearch
tomorrow 9:30
Slide 18
Slide 18 text
Fine-grained flushing
Slide 19
Slide 19 text
Fine-grained flushing
19
Say you want to spend 1GB on indexing and have 2 shards: how do you do it?
Slide 20
Slide 20 text
Fine-grained flushing
20
Shard 1: 124MB, shard 2: 900MB
Flush largest shard when total memory usage ≥ limit
Slide 21
Slide 21 text
Fine-grained flushing
21
Shard 1: 124MB, shard 2: 0MB
Flush largest shard when total memory usage ≥ limit
Slide 22
Slide 22 text
Fine-grained flushing
22
Shard 1: 124MB, shard 2: 900MB
Flush largest DWPT when total memory usage ≥ limit
Slide 23
Slide 23 text
Fine-grained flushing
23
Shard 1: 124MB, shard 2: 600MB
Flush largest DWPT when total memory usage ≥ limit
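The two flushing policies can be compared with a small simulation. This is a sketch under stated assumptions, not Elasticsearch or Lucene code: the function names are hypothetical, and the split of shard 2's 900MB into three 300MB DWPT (DocumentsWriterPerThread) buffers is invented to match the 900MB to 600MB transition shown on the slides.

```python
def total_mb(shards):
    """Total indexing-buffer memory across all shards, in MB."""
    return sum(sum(buffers) for buffers in shards.values())


def flush_largest_shard(shards, limit_mb):
    """Old policy: flush every buffer of the largest shard while at the limit."""
    shards = {name: list(buffers) for name, buffers in shards.items()}
    while total_mb(shards) >= limit_mb and total_mb(shards) > 0:
        largest = max(shards, key=lambda name: sum(shards[name]))
        shards[largest] = []
    return shards


def flush_largest_dwpt(shards, limit_mb):
    """Fine-grained policy: flush only the single largest DWPT buffer."""
    shards = {name: list(buffers) for name, buffers in shards.items()}
    while total_mb(shards) >= limit_mb and total_mb(shards) > 0:
        non_empty = (name for name in shards if shards[name])
        name = max(non_empty, key=lambda n: max(shards[n]))
        shards[name].remove(max(shards[name]))
    return shards


# Shard name -> per-DWPT buffer sizes in MB (assumed split for shard 2).
shards = {1: [124], 2: [300, 300, 300]}
limit_mb = 1024  # 1GB

by_shard = flush_largest_shard(shards, limit_mb)
by_dwpt = flush_largest_dwpt(shards, limit_mb)
print({name: sum(b) for name, b in by_shard.items()})
print({name: sum(b) for name, b in by_dwpt.items()})
```

Flushing one DWPT at a time frees just enough memory to get back under the limit, so less buffered data is written out early as small segments.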
WAND (8.0)
26
Can you make queries faster if you don’t need total hit counts?
Sorted by field: index sorting (6.0)
Sorted by score: ???
Slide 27
Slide 27 text
• Documents are identified by doc ids 0..N
• Queries produce iterators over (doc id, score) pairs, sorted by doc id
• Score of a boolean query is the sum of the scores of its clauses
Anatomy of a Lucene index/query
27
Slide 28
Slide 28 text
How do disjunctions work?
28
[Diagram: postings lists for “the”, “quick” and “fox” over doc ids 0 to 7; no scores computed yet]
Slide 29
Slide 29 text
How do disjunctions work?
29
[Diagram: postings lists for “the”, “quick” and “fox” over doc ids 0 to 7; scores computed so far: 2.5]
Slide 30
Slide 30 text
How do disjunctions work?
30
[Diagram: postings lists for “the”, “quick” and “fox” over doc ids 0 to 7; scores computed so far: 2.5, 1.6]
Slide 31
Slide 31 text
How do disjunctions work?
31
[Diagram: postings lists for “the”, “quick” and “fox” over doc ids 0 to 7; scores computed so far: 2.5, 1.6, 2.3]
Slide 32
Slide 32 text
How do disjunctions work?
32
[Diagram: postings lists for “the”, “quick” and “fox” over doc ids 0 to 7; scores computed so far: 2.5, 1.6, 2.3, 2.0]
Slide 33
Slide 33 text
How do disjunctions work?
33
[Diagram: postings lists for “the”, “quick” and “fox” over doc ids 0 to 7; scores computed so far: 2.5, 1.6, 2.3, 2.0, 0.1]
Slide 34
Slide 34 text
How do disjunctions work?
34
[Diagram: postings lists for “the”, “quick” and “fox” over doc ids 0 to 7; scores computed so far: 2.5, 1.6, 2.3, 2.0, 0.1, 1.9]
Slide 35
Slide 35 text
How do disjunctions work?
35
[Diagram: postings lists for “the”, “quick” and “fox” over doc ids 0 to 7; scores computed so far: 2.5, 1.6, 2.3, 2.0, 0.1, 1.9, 4.0]
Slide 36
Slide 36 text
How do disjunctions work?
36
[Diagram: postings lists for “the”, “quick” and “fox” over doc ids 0 to 7; scores computed so far: 2.5, 1.6, 2.3, 2.0, 0.1, 1.9, 4.0, 2.4]
Slide 37
Slide 37 text
How do disjunctions work?
37
[Diagram: postings lists for “the”, “quick” and “fox” over doc ids 0 to 7; all scores computed: 2.5, 1.6, 2.3, 2.0, 0.1, 1.9, 4.0, 2.4]
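The exhaustive evaluation walked through on these slides can be sketched as a merge of per-term iterators, summing the scores of clauses that land on the same doc id. This is an illustrative sketch only: the postings and scores below are made up (they are not the values in the diagram) and the function names are hypothetical.

```python
import heapq


def disjunction(postings):
    """postings: dict term -> list of (doc_id, score), sorted by doc_id.
    Yields (doc_id, summed_score) in doc id order, visiting every match."""
    current, total = None, 0.0
    # heapq.merge keeps the combined stream sorted by doc id.
    for doc_id, score in heapq.merge(*postings.values()):
        if doc_id != current:
            if current is not None:
                yield current, total
            current, total = doc_id, 0.0
        total += score
    if current is not None:
        yield current, total


def top_k(hits, k):
    """Keep the k best-scoring hits."""
    return heapq.nlargest(k, hits, key=lambda hit: hit[1])


# Hypothetical postings; scores chosen to be exact in binary floating point.
postings = {
    "the": [(0, 0.25), (1, 0.25), (3, 0.25), (5, 0.25)],
    "quick": [(0, 1.0), (4, 1.5)],
    "fox": [(1, 2.0), (4, 2.5), (6, 3.0)],
}
hits = list(disjunction(postings))
print(hits)
print(top_k(hits, 2))
```

Every matching document is scored here, even those that have no chance of entering the top hits. That is the cost WAND tries to avoid.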
Slide 38
Slide 38 text
• Search for “the OR fox”
• If the minimum competitive score is 1 and “the” contributes at most 0.2 to the score,
• then documents MUST match “fox” to be competitive
WAND: intuition
38
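The reasoning on this slide can be sketched as a small check: a clause is required whenever the other clauses, even at their maximum scores, cannot reach the minimum competitive score on their own. The function name is hypothetical, and the maximum score of 3.0 for “fox” is an assumption made for the example (the slide only gives the 0.2 bound for “the”).

```python
def required_clauses(max_scores, min_competitive_score):
    """Return the clauses a document must match to be competitive.

    max_scores: dict clause -> upper bound of that clause's score.
    """
    required = set()
    for clause in max_scores:
        # Best possible score for a document that does NOT match `clause`.
        best_without = sum(m for c, m in max_scores.items() if c != clause)
        if best_without < min_competitive_score:
            required.add(clause)
    return required


# "the" is bounded by 0.2 (from the slide); 3.0 for "fox" is assumed.
print(required_clauses({"the": 0.2, "fox": 3.0}, min_competitive_score=1.0))
```

With a minimum competitive score of 1, a document matching only “the” tops out at 0.2, so only documents containing “fox” need to be visited.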
Slide 39
Slide 39 text
WAND: max score?
39
≤
BM25 score
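The formula on this slide did not survive extraction, so the following is only a sketch of one way a per-term maximum score can be derived, assuming a common BM25 formulation. Because the term-frequency component saturates, the score of a term can never exceed `idf * (k1 + 1)`, whatever the frequency and document length. The function names and default parameters (`k1=1.2`, `b=0.75`) are assumptions for the example.

```python
import math


def bm25_idf(doc_count, doc_freq):
    """Inverse document frequency of a term."""
    return math.log(1 + (doc_count - doc_freq + 0.5) / (doc_freq + 0.5))


def bm25_score(idf, freq, doc_len, avg_doc_len, k1=1.2, b=0.75):
    """BM25 score of one term in one document."""
    norm = k1 * (1 - b + b * doc_len / avg_doc_len)
    return idf * freq * (k1 + 1) / (freq + norm)


def bm25_max_score(idf, k1=1.2):
    """Upper bound: freq / (freq + norm) < 1 for any positive norm."""
    return idf * (k1 + 1)


idf = bm25_idf(doc_count=1000, doc_freq=10)
bound = bm25_max_score(idf)
for freq in (1, 5, 100):
    score = bm25_score(idf, freq, doc_len=40, avg_doc_len=50.0)
    print(freq, round(score, 3), score <= bound)
```

A rare term has a high IDF and therefore a high bound, while a very common term such as “the” has a bound close to zero.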
Slide 40
Slide 40 text
• Given C clauses, find next target:
• Sort by non-decreasing current doc id
• Sum up max scores until Σ max_score ≥ min_competitive_score
• Return doc id of the first clause to meet this requirement
WAND: algorithm
40
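The steps above can be sketched as follows. This is a minimal illustration rather than Lucene's implementation: the function name is hypothetical, exhausted clauses are not handled, and the current doc ids in the example are invented (only the max scores 0.2, 2.0 and 3.0 and the minimum competitive score 2.3 come from the slides).

```python
def next_target(clauses, min_competitive_score):
    """Find the next doc id that may still be competitive.

    clauses: list of (current_doc_id, max_score), one pair per clause.
    Returns None when even all clauses together cannot be competitive.
    """
    total = 0.0
    # Visit clauses by non-decreasing current doc id.
    for doc_id, max_score in sorted(clauses):
        total += max_score
        if total >= min_competitive_score:
            # Documents before doc_id can only match the clauses seen so far
            # minus this one, whose max scores do not add up to the bar.
            return doc_id
    return None


# (current doc id, max score) for "the", "quick", "fox"; doc ids are assumed.
clauses = [(1, 0.2), (3, 2.0), (5, 3.0)]
print(next_target(clauses, min_competitive_score=2.3))
```

In this example the running sums are 0.2, 2.2 and 5.2, so the first clause to reach 2.3 is the one positioned on doc 5, and docs 1 to 4 can be skipped.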
Slide 41
Slide 41 text
WAND: example
41
[Diagram: postings lists for “the”, “quick” and “fox” over doc ids 0 to 7, with max scores 0.2, 2.0 and 3.0 respectively; the next target is marked]
Min competitive score = 2.3
Slide 42
Slide 42 text
WAND: compute top 2 matches
42
[Diagram: postings lists for “the”, “quick” and “fox” over doc ids 0 to 7, with max scores 0.2, 2.0 and 3.0 respectively; no scores computed yet]
Min competitive score = 0
Slide 43
Slide 43 text
WAND: compute top 2 matches
43
[Diagram: postings lists for “the”, “quick” and “fox” over doc ids 0 to 7, with max scores 0.2, 2.0 and 3.0 respectively; scores so far: 2.5]
Min competitive score = 0
Slide 44
Slide 44 text
WAND: compute top 2 matches
44
[Diagram: postings lists for “the”, “quick” and “fox” over doc ids 0 to 7, with max scores 0.2, 2.0 and 3.0 respectively; scores so far: 2.5, 1.6]
Min competitive score = 1.6
Slide 45
Slide 45 text
WAND: compute top 2 matches
45
[Diagram: postings lists for “the”, “quick” and “fox” over doc ids 0 to 7, with max scores 0.2, 2.0 and 3.0 respectively; scores so far: 2.5, 1.6, 2.3]
Min competitive score = 2.3
Slide 46
Slide 46 text
WAND: compute top 2 matches
46
[Diagram: postings lists for “the”, “quick” and “fox” over doc ids 0 to 7, with max scores 0.2, 2.0 and 3.0 respectively; scores so far: 2.5, 1.6, 2.3, X]
Min competitive score = 2.3
Slide 47
Slide 47 text
WAND: compute top 2 matches
47
[Diagram: postings lists for “the”, “quick” and “fox” over doc ids 0 to 7, with max scores 0.2, 2.0 and 3.0 respectively; scores so far: 2.5, 1.6, 2.3, X, 4.0]
Min competitive score = 2.5
Slide 48
Slide 48 text
WAND: compute top 2 matches
48
[Diagram: postings lists for “the”, “quick” and “fox” over doc ids 0 to 7, with max scores 0.2, 2.0 and 3.0 respectively; scores so far: 2.5, 1.6, 2.3, X, 4.0, 2.4]
Min competitive score = 2.5
Slide 49
Slide 49 text
WAND: compute top 2 matches
49
[Diagram: postings lists for “the”, “quick” and “fox” over doc ids 0 to 7, with max scores 0.2, 2.0 and 3.0 respectively; final scores: 2.5, 1.6, 2.3, X, 4.0, 2.4]
Min competitive score = 2.5
Slide 50
Slide 50 text
• 0 to 1000x faster
• If all terms have the same IDF: no improvement
• Otherwise: could be 1000x faster!
WAND: speedup?
50
Slide 51
Slide 51 text
Where are we now?
51
Disjunctions ✓
Other queries ???
Slide 52
Slide 52 text
Impacts indexing
Slide 53
Slide 53 text
Indexing of impacts
53
What does the .doc file store?
[Diagram: .doc file layout showing skip data, a block of 128 doc ids and a block of 128 freqs]
Slide 54
Slide 54 text
• First doc id of the block
• Offset of block in .doc (same file)
• Offset in .pos (if positions indexed)
• Offset in .pay (if offsets or payloads indexed)
• List of competitive (freq, norm) pairs (NEW)
• Makes it easy to know the upper bound of scores
• Still allows changing the Similarity on an existing index
Skip data
54
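To illustrate why storing competitive (freq, norm) pairs is enough to bound scores, here is a sketch that reduces a block's pairs to those that no other pair dominates, then computes the block's maximum score with an arbitrary scoring function. The names are hypothetical and this is not Lucene's encoding. It assumes the norm behaves like a document length (a larger norm never increases the score) and that scores never decrease with freq.

```python
def competitive_impacts(pairs):
    """Keep only (freq, norm) pairs that no other pair dominates.

    A pair is dominated when another pair has freq >= and norm <= it.
    """
    frontier = []
    # Highest freq first; among equal freqs, smallest norm first.
    for freq, norm in sorted(set(pairs), key=lambda p: (-p[0], p[1])):
        # Keep the pair only if it beats every higher-freq pair on norm.
        if not frontier or norm < frontier[-1][1]:
            frontier.append((freq, norm))
    return frontier[::-1]


def block_max_score(impacts, score):
    """Upper bound of the scores in a block for a given scoring function."""
    return max(score(freq, norm) for freq, norm in impacts)


# (freq, norm) of the documents in one hypothetical block.
pairs = [(1, 10), (3, 50), (2, 10), (3, 20), (1, 5)]
impacts = competitive_impacts(pairs)
print(impacts)

# Any scoring function that respects the assumptions can be plugged in later.
print(block_max_score(impacts, lambda freq, norm: freq / (freq + norm / 10)))
```

Since the stored pairs are independent of the scoring function, the same skip data remains valid when the Similarity is swapped.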
Slide 55
Slide 55 text
Usage of impacts?
55
Term queries: skip blocks whose max score is not competitive
Conjunctions: skip blocks whose sum of max scores is not competitive
Disjunctions: WAND ⇒ block-max WAND
Other queries: TODO
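The term-query case can be sketched as a top-k collection that skips whole blocks whose maximum score cannot beat the current k-th best hit. This is a simplified illustration with hypothetical names and data: blocks here hold 2 postings instead of 128, and each block's maximum score is given directly rather than derived from impacts.

```python
import heapq


def top_k_term_query(blocks, k):
    """blocks: list of (max_score, postings), postings = [(doc_id, score)].
    Returns (top hits as (doc_id, score), number of blocks decoded)."""
    heap = []  # min-heap of (score, -doc_id): the weakest kept hit is heap[0]
    decoded = 0
    for max_score, postings in blocks:
        if len(heap) == k and max_score <= heap[0][0]:
            continue  # nothing in this block can enter the top k
        decoded += 1
        for doc_id, score in postings:
            if len(heap) < k:
                heapq.heappush(heap, (score, -doc_id))
            elif score > heap[0][0]:
                heapq.heapreplace(heap, (score, -doc_id))
    hits = [(-neg_doc, score) for score, neg_doc in sorted(heap, reverse=True)]
    return hits, decoded


blocks = [
    (2.5, [(0, 2.5), (3, 1.0)]),
    (0.9, [(5, 0.9), (8, 0.4)]),
    (3.0, [(9, 3.0), (12, 0.5)]),
    (2.0, [(14, 2.0), (15, 1.5)]),
]
hits, decoded = top_k_term_query(blocks, k=2)
print(hits, decoded)
```

Skipped blocks are never decoded, which is also why the total hit count is no longer known exactly.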
Slide 56
Slide 56 text
• Term queries: ~8x faster
• Conjunctions and disjunctions
• Many times faster when terms frequently appear together (united AND kingdom, new OR york)
• Depends a lot on data distribution otherwise
Speedup?
56
Slide 57
Slide 57 text
57
More Questions?
Visit us at the AMA
Slide 58
Slide 58 text
www.elastic.co
Slide 59
Slide 59 text
Except where otherwise noted, this work is licensed under
http://creativecommons.org/licenses/by-nd/4.0/
Creative Commons and the double C in a circle are
registered trademarks of Creative Commons in the United States and other countries.
Third party marks and brands are the property of their respective holders.
59
Please attribute Elastic with a link to elastic.co