Slide 1

Slide 1 text

All About Elasticsearch Algorithms and Data Structures
Colin Goodheart-Smithe (@colings86)
Zachary Tong (@zacharytong)

Slide 2

Slide 2 text

Roaring Bitmaps When you can't decide if your data is dense or sparse

Slide 3

Slide 3 text

3 Filter Caching • A filter either matches or does not match a document • Due to immutable segments, we have an opportunity to cache frequent filters (Diagram: six documents; Docs #1, #4, and #6 match)

Slide 4

Slide 4 text

4 Filter Caching • A filter either matches or does not match a document • Due to immutable segments, we have an opportunity to cache frequent filters (Diagram: Docs #1 to #6 cached as the bitmap [ 1, 0, 0, 1, 0, 1 ])
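The cached filter on this slide is just one bit per document. A minimal Python illustration (`filter_bitmap` is a hypothetical helper, not Elasticsearch code):

```python
def filter_bitmap(matching_ids, num_docs):
    """Build one bit per document from the set of matching doc IDs."""
    bits = [0] * num_docs
    for doc_id in matching_ids:
        bits[doc_id - 1] = 1  # doc IDs on the slide are 1-based
    return bits

# Docs #1, #4, #6 match out of 6 documents
print(filter_bitmap([1, 4, 6], 6))  # [1, 0, 0, 1, 0, 1]
```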

Slide 5

Slide 5 text

5 Some points to keep in mind • Each Lucene segment can hold up to 2³¹-1 documents (i.e. 4-byte IDs) • Stored in memory, so compression is important • However, usage must be faster than re-executing the filter

Slide 6

Slide 6 text

6 Approach #1: Sorted List • Store the IDs in a sorted list (Diagram: Docs #1, #4, #6 stored as [1, 4, 6])

Slide 7

Slide 7 text

7 Approach #1: Sorted List • Very compact when filters are sparse (Diagram: [1, 4, 6] is only 12 bytes. Yay!)

Slide 8

Slide 8 text

8 Approach #1: Sorted List • Dense filters become problematic (Diagram: 100 million matching documents stored as [1, 2, …, 99999999, 100000000] take 381 MB. Oh no!)

Slide 9

Slide 9 text

9 Approach #2: Bitmaps • Save a single bit for each matching document instead (Diagram: 100 million documents as the bitmap [1, 1, 1, 1, 1, 0, 1, 1, 1, 1, …, 1, 1])

Slide 10

Slide 10 text

10 Approach #2: Bitmaps • Save a single bit for each matching document instead (Diagram: the same 100 million documents, down to 12 MB!)

Slide 11

Slide 11 text

11 Approach #2: Bitmaps • … except it's identical for the sparse case too (Diagram: sparse bitmap [1, 0, 0, 1, 0, 1, 0, 0, 0, 0, …, 0, 0]. Hmm… still 12 MB)
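The 381 MB and 12 MB figures follow directly from the per-document cost of each encoding. A back-of-the-envelope check, assuming 4-byte IDs and 1 bit per document:

```python
num_docs = 100_000_000

sorted_list_bytes = num_docs * 4   # 4-byte ID per matching doc, all docs match
bitmap_bytes = num_docs / 8        # 1 bit per doc, matching or not

print(round(sorted_list_bytes / 2**20))  # ~381 MB
print(round(bitmap_bytes / 2**20))       # ~12 MB
```

The bitmap's cost is fixed regardless of how many documents match, which is exactly why it is identical in the sparse case.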

Slide 12

Slide 12 text

12 Alternative #3: Various Compressed Bitmaps
• Byte-Aligned Bitmaps (BBC)
• Word-Aligned Hybrid (WAH)
• PLWAH / EWAH variants
• Compressed'n'Composable Integer Set (CONCISE)
• Compressed Adaptive Index (COMPAX)
• SECOMPAX / ICX
• "Traditional" compression (LZ4, DEFLATE, etc)

Slide 13

Slide 13 text

13 Alternative #3: Various RLE Compressed Bitmaps
• Good compression!
• Slower (relatively) than Sorted Lists or Raw Bitmaps
• Slow random access to bits
• May lose the ability to bitwise AND/OR multiple bitmaps together

Slide 14

Slide 14 text

14 Overview so far
• Sorted Lists: great for sparse, expensive for dense
• Raw Bitmaps: great for dense, expensive for sparse
• RLE Compressed: great compression for heterogeneous data, but slow(er) decoding and slow random access

Slide 15

Slide 15 text

15 Roaring Bitmaps
• Sorted Lists: great for sparse, expensive for dense
• Raw Bitmaps: great for dense, expensive for sparse
• RLE Compressed: great compression for heterogeneous data, but slow(er) decoding and slow random access

Slide 16

Slide 16 text

16 Partition into 2¹⁶-document chunks (Diagram: three chunks of Doc IDs, 0–65535, 65536–131071, and 131072–196607, each with its own match bits)

Slide 17

Slide 17 text

17 Store containers in vector (Diagram: the three chunks become containers 0, 1, and 2 in a vector)

Slide 18

Slide 18 text

18 Vector index == 16 most-significant bits (Diagram: each container holds only the low 16 bits, 0–65535, of its Doc IDs)

Slide 19

Slide 19 text

19 Vector index == 16 most-significant bits • Containers store only the 16 least-significant bits: 2 bytes instead of 4, with the other 16 bits of each ID implicit in the vector index
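Splitting a doc ID into its two halves is a one-liner. In the actual Roaring scheme the 16 most-significant bits select the container and the low 16 bits are what gets stored; a sketch, not Lucene's implementation:

```python
def split_doc_id(doc_id):
    """High 16 bits pick the container; low 16 bits are stored in it (2 bytes)."""
    return doc_id >> 16, doc_id & 0xFFFF

print(split_doc_id(65537))   # (1, 1): container 1, stored value 1
print(split_doc_id(131075))  # (2, 3): container 2, stored value 3
```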

Slide 20

Slide 20 text

20 Fewer than 4096 values? (Diagram: deciding how to store each container)

Slide 21

Slide 21 text

21 Fewer than 4096 values? • Save as a Sorted List (Diagram: container stored as the sorted list of Doc IDs [1, 1920, 3303])

Slide 22

Slide 22 text

22 More than 4096 values? (Diagram: a dense container alongside the sparse sorted-list container)

Slide 23

Slide 23 text

23 More than 4096 values? • Save as a dense bitmap (Diagram: container stored as 65536 bits of state)

Slide 24

Slide 24 text

24 More than 61440 values? • Super dense, relatively few zeros: save as an "inverted" Sorted List of the non-matching Doc IDs (e.g. [2382, 9112, 10229]) • This third container type was a Lucene contribution
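The three container types reduce to a simple decision rule. A sketch using the slides' cutoffs; the names are descriptive, not Lucene API:

```python
def container_type(num_matching):
    """Pick a container encoding for a 65536-doc chunk, per the slides' cutoffs."""
    if num_matching < 4096:
        return "sorted list"            # sparse: store the matching IDs
    elif num_matching <= 61440:         # 61440 == 65536 - 4096
        return "bitmap"                 # fixed 8 KB dense bitmap
    else:
        return "inverted sorted list"   # super dense: store the missing IDs
```

The inverted case is symmetric to the sparse one: with more than 61440 matches there are fewer than 4096 non-matches, so a sorted list of the gaps is again the smallest encoding.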

Slide 25

Slide 25 text

25 Why 4096 cutoff?
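The cutoff is where the two encodings cost the same: a dense bitmap container is a fixed 65536 bits (8192 bytes), while a sorted list costs 2 bytes per stored value, so the break-even point is 4096 values. Simple arithmetic:

```python
bitmap_container_bytes = 65536 // 8   # dense bitmap: fixed 8192 bytes

def sorted_list_container_bytes(n):
    return n * 2                      # 2 bytes per stored low-16-bit value

# Below 4096 values the sorted list is smaller; at 4096 they tie
print(sorted_list_container_bytes(4096) == bitmap_container_bytes)  # True
```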

Slide 26

Slide 26 text

26 Memory Footprint

Slide 27

Slide 27 text

27 More reading • https://www.elastic.co/blog/frame-of-reference-and-roaring-bitmaps
 • http://roaringbitmap.org/
 • https://issues.apache.org/jira/browse/LUCENE-5983

Slide 28

Slide 28 text

Simulated Annealing Quickly finding "good enough" parameters

Slide 29

Slide 29 text

29 Moving averages • Pipeline Aggs introduced moving averages

Slide 30

Slide 30 text

30 Variously weighted averages • Simple (no weighting) • Linear • Exponential • Double-Exponential (Holt) • Triple-Exponential (Holt-Winters)

Slide 31

Slide 31 text

31 Variously weighted averages • Simple (no weighting) • Linear • Exponential • Double-Exponential (Holt) • Triple-Exponential (Holt-Winters) Have configurable parameters

Slide 32

Slide 32 text

32 Configurable parameters
• α "Level": Exponential, Holt, Holt-Winters
• β "Trend": Holt, Holt-Winters
• γ "Seasonal": Holt-Winters (that's a gamma)

Slide 33

Slide 33 text

33 Turns out, tuning parameters is hard • Small changes had large impact • Changing one parameter affected the other parameters • Not intuitive to mere mortals (e.g. me) • Frustrating user-experience

Slide 34

Slide 34 text

Black-box optimization 34 Because sometimes you just need a hammer

Slide 35

Slide 35 text

anneal: to heat and then slowly cool (metal, glass, etc.) in order to make it stronger (Merriam-Webster Dictionary)

Slide 36

Slide 36 text

36 Simulated Annealing Process
1. Pick random neighbor*
2. Evaluate "cost"
• If "cost" > "best_cost", keep solution
• Otherwise discard, BUT with random probability p, keep the solution anyway
3. Repeat, lowering probability p over time

Slide 37

Slide 37 text

37 Simulated Annealing Process
1. Pick random neighbor*
2. Evaluate "cost"
• If "cost" > "best_cost", keep solution
• Otherwise discard, BUT with random probability, keep the solution anyway
3. Repeat, lowering the probability
*"Random neighbor": mutate one of the parameters, leave the rest constant
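The loop above can be sketched in a few lines of Python. This is a generic maximizing variant; the Metropolis-style acceptance probability (exp(Δ/temperature)) is an assumption, since the slides only say "keep worse solutions with probability p, lowered over time":

```python
import math
import random

def simulated_annealing(score, neighbor, start,
                        temp=100.0, cooling=0.9,
                        iters_per_round=100, min_temp=1e-4):
    current, current_score = start, score(start)
    best, best_score = current, current_score
    while temp > min_temp:
        for _ in range(iters_per_round):
            candidate = neighbor(current)
            s = score(candidate)
            # keep improvements; keep worse solutions with a probability
            # that shrinks as the temperature drops
            if s > current_score or random.random() < math.exp((s - current_score) / temp):
                current, current_score = candidate, s
                if s > best_score:
                    best, best_score = candidate, s
        temp *= cooling  # lower the temperature each round
    return best, best_score
```

For example, maximizing score(x) = -(x - 3)² with a neighbor that nudges x by a random amount homes in on x ≈ 3.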

Slide 38

Slide 38 text

38 Simulated Annealing (Chart: score across the solution space. Best Score: 0, Temperature: 100)

Slide 39

Slide 39 text

39 (Chart: new score 10 > best 0, so it is kept. Best Score: 10, Temperature: 100)

Slide 40

Slide 40 text

40 (Chart: new score 14 > 10, kept. Best Score: 14, Temperature: 95)

Slide 41

Slide 41 text

41 (Chart: new score 8 is discarded. Best Score: 14, Temperature: 90)

Slide 42

Slide 42 text

42 (Chart: new score 35 > 14, kept. Best Score: 35, Temperature: 85)

Slide 43

Slide 43 text

43 (Chart: new score 24 is discarded. Best Score: 35, Temperature: 80)

Slide 44

Slide 44 text

44 (Chart: new score 12 is kept anyway, by random chance! Best Score: 12, Temperature: 75)

Slide 45

Slide 45 text

45 (Chart: notice how random acceptance unsticks the search from the "pretty good" score of 35, a local minimum. Best Score: 12, Temperature: 75)

Slide 46

Slide 46 text

46 (Chart: new score 85 > 12, which allows finding the better solution. Best Score: 85, Temperature: 70)

Slide 47

Slide 47 text

47 (Chart: new score 18 is discarded; as the temperature drops, the chance of keeping random changes decreases. Best Score: 85, Temperature: 15)

Slide 48

Slide 48 text

48 Simulated Annealing • Randomness samples the entire solution space • “Unsticks” from local minima • Over time, random changes less likely, “homing” in on a solution

Slide 49

Slide 49 text

49 Simulated Annealing in Elasticsearch • 100 Iterations per “round” • Decreases temperature by 10% each round • Ends when temp < 0.0001 • ~ 6600 iterations total

Slide 50

Slide 50 text

50 Simulated Annealing in Elasticsearch • "Trains" on the last window of data (Diagram: a training window, used for both forecasting and backcasting)

Slide 51

Slide 51 text

T-Digest Percentiles

Slide 52

Slide 52 text

52 T-Digest Percentiles
• The t-digest algorithm is used to compute quantiles
• Quite similar to k-means
• Builds sorted centroids
• Constraint on the max size of a cluster: 4 * count * q * (1 - q) / C, where C = compression

Slide 53

Slide 53 text

53 T-Digest Percentiles
• compression trades accuracy for memory usage
• about 5*C centroids
• error almost always < 3/C
• excellent accuracy on extreme quantiles thanks to the q * (1 - q) factor
• implemented on numbers, but could work on anything that is comparable and can be averaged
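The cluster-size constraint is easy to evaluate, and shows why extreme quantiles stay accurate: the bound collapses toward zero near q = 0 or q = 1, forcing tiny clusters at the tails. A direct transcription of the slide's formula:

```python
def max_centroid_size(count, q, compression):
    """The slides' bound on how many values one centroid may absorb."""
    return 4 * count * q * (1 - q) / compression

print(max_centroid_size(1000, 0.5, 100))    # 10.0 at the median
print(max_centroid_size(1000, 0.001, 100))  # ~0.04 near the tail: tiny clusters
```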

Slide 54

Slide 54 text

54 Calculating T-digest Percentiles • 40 values overall (Diagram: sorted centroids whose counts sum to 40)

Slide 55

Slide 55 text

55 Calculating T-digest Percentiles • 40 values overall • -9 is the value for 0 <= q < 1/40

Slide 56

Slide 56 text

56 Calculating T-digest Percentiles • 40 values overall • -5 is the value for 1/40 <= q < 4/40

Slide 57

Slide 57 text

57 Calculating T-digest Percentiles • 40 values overall • 1 is the value for 4/40 <= q < 6/40 • etc.

Slide 58

Slide 58 text

58 Inserting Values into T-digest • Inserting 8 into the histogram

Slide 59

Slide 59 text

59 Inserting Values into T-digest • Inserting 8 into the histogram • Find the centroid nearest the value

Slide 60

Slide 60 text

60 Inserting Values into T-digest • Inserting 8 into the histogram • Increment the count for the centroid • Adjust the centroid value (the nearest centroid's mean moves from 7 to 7.3 and its count from 2 to 3) • Notice that the capacity for all centroids increases slightly

Slide 61

Slide 61 text

61 Inserting Values into T-digest • Inserting 5 into the histogram

Slide 62

Slide 62 text

62 Inserting Values into T-digest • Inserting 5 into the histogram • Find the centroid nearest the value

Slide 63

Slide 63 text

63 Inserting Values into T-digest • Inserting 5 into the histogram • Incrementing the count would exceed the threshold • Create a new centroid with value 5 and count of 1
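The insertion steps on these slides can be sketched as follows. This is a simplified toy (the real t-digest estimates quantiles and merges far more carefully); `tdigest_insert` and its centroid layout are illustrative assumptions:

```python
def tdigest_insert(centroids, value, compression):
    """Toy insert into a list of [mean, count] centroids sorted by mean."""
    total = sum(c for _, c in centroids) + 1  # counting the new value
    # find the centroid nearest the new value
    i = min(range(len(centroids)), key=lambda j: abs(centroids[j][0] - value))
    mean, count = centroids[i]
    # rough quantile of that centroid, used in the size bound
    q = (sum(c for _, c in centroids[:i]) + count / 2) / total
    limit = 4 * total * q * (1 - q) / compression
    if count + 1 <= limit:
        # merge: bump the count and pull the mean toward the value
        count += 1
        mean += (value - mean) / count
        centroids[i] = [mean, count]
    else:
        # threshold exceeded: create a new centroid with count 1
        centroids.append([value, 1])
        centroids.sort()
    return centroids
```

With a loose bound the value merges into the nearest centroid (slide 60's behavior); with a tight bound it becomes its own centroid (slide 63's behavior).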

Slide 64

Slide 64 text

64 T-Digest Practical Notes
• Adding a new value at the bounds always creates a new centroid (because q * (1 - q) is 0)
• When the histogram grows too large (in practice, when the centroid count is > 20 * C): compress by reinserting the centroids in random order

Slide 65

Slide 65 text

HDRHistogram Percentiles

Slide 66

Slide 66 text

66 HDRHistogram Percentiles
• Uses a combination of logarithmic and linear bucketing
• Conceptually buckets values in two levels: logarithmic scaled buckets, with linear scaled sub-buckets inside each
• No bound on the count in each bucket (in practice it is limited to a long value)

Slide 67

Slide 67 text

67 HDRHistogram Percentiles
• The accuracy parameter is expressed as the number of significant figures of a value to store in the histogram
• Can be between 0 and 5
• The number of significant figures trades accuracy for memory usage
• Affects the number of linear sub-buckets used for each logarithmic bucket

Slide 68

Slide 68 text

68 HDRHistogram Bucketing (1 s.f.) (Diagram: logarithmic buckets 10⁰ through 10⁷; within one bucket, linear sub-buckets 10, 20, 30, …, 90)

Slide 69

Slide 69 text

69 HDRHistogram Bucketing (2 s.f.) (Diagram: logarithmic buckets 10⁰ through 10⁷; within one bucket, linear sub-buckets 100, 110, 120, 130, …, 950, 960, 970, 980, 990)
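The two-level lookup can be illustrated with the decimal buckets drawn on these slides (the real HDRHistogram uses base-2 buckets internally; `bucket_for` is a hypothetical decimal illustration):

```python
import math

def bucket_for(value, sig_figs=1):
    """Return (logarithmic bucket, linear sub-bucket lower edge) for a value."""
    exponent = int(math.floor(math.log10(value)))   # decade holding the value
    sub_width = 10 ** (exponent - sig_figs + 1)     # linear sub-bucket width
    sub_bucket = (value // sub_width) * sub_width   # lower edge of the sub-bucket
    return 10 ** exponent, sub_bucket

print(bucket_for(42, 1))       # (10, 40): the 10^1 bucket, sub-bucket 40
print(bucket_for(1400300, 2))  # (1000000, 1400000): the 10^6 bucket, sub-bucket 1.4E6
```

More significant figures means narrower sub-buckets, which is exactly the accuracy/memory trade-off from the previous slide.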

Slide 70

Slide 70 text

70 Calculating HDRHistogram Percentiles • 250 values overall (Diagram: sub-bucket counts across the logarithmic buckets 10⁰ through 10⁷)

Slide 71

Slide 71 text

71 Calculating HDRHistogram Percentiles • 250 values overall • 1 is the value for 0 <= q < 1/250

Slide 72

Slide 72 text

72 Calculating HDRHistogram Percentiles • 250 values overall • 2 is the value for 1/250 <= q < 4/250

Slide 73

Slide 73 text

73 Calculating HDRHistogram Percentiles • 250 values overall • 70 is the value for q = 0.2

Slide 74

Slide 74 text

74 Inserting Values into HDRHistogram • Inserting 42 into the histogram

Slide 75

Slide 75 text

75 Inserting Values into HDRHistogram • Inserting 42 into the histogram • Find the logarithmic bucket for the value

Slide 76

Slide 76 text

76 Inserting Values into HDRHistogram • Inserting 42 into the histogram • Find the sub-bucket for the value

Slide 77

Slide 77 text

77 Inserting Values into HDRHistogram • Inserting 42 into the histogram • Increment the count for the bucket (sub-bucket 40's count goes from 4 to 5)

Slide 78

Slide 78 text

78 Inserting Values into HDRHistogram • Inserting 1,400,300 into the histogram • No logarithmic bucket to hold the value

Slide 79

Slide 79 text

79 Inserting Values into HDRHistogram • Inserting 1,400,300 into the histogram • Create logarithmic buckets (and sub-buckets) to include the new value

Slide 80

Slide 80 text

80 Inserting Values into HDRHistogram • Inserting 1,400,300 into the histogram • Find the logarithmic bucket for the value

Slide 81

Slide 81 text

81 Inserting Values into HDRHistogram • Inserting 1,400,300 into the histogram • Find the sub-bucket for the value

Slide 82

Slide 82 text

82 Inserting Values into HDRHistogram • Inserting 1,400,300 into the histogram • Increment the count for the bucket (sub-bucket 1.4E6's count goes from 0 to 1)

Slide 83

Slide 83 text

83 HDRHistogram Practical Notes
• Implemented as a flat long array with base-2 logarithmic bucket values
• Accuracy can be better than the set significant digits, but cannot be worse
• Size of the histogram in memory depends on the range of values and the number of significant digits
• The implementation requires values as longs, but a wrapper implementation supporting doubles is available

Slide 84

Slide 84 text

84 Which should I use?
• The default in Elasticsearch is currently t-digest
• Use t-digest when you are interested in the extreme values (e.g. the 99.99th percentile)
• T-Digest tries to adapt to the data, so it can be used for a wide variety of data at the expense of some time performance
• HDRHistogram is fast, as it has a fixed histogram which does not need compression or centroid re-calculations
• HDRHistogram requires positive values and is most beneficial when the data is zero-based, so it cannot be applied to all use cases
• HDRHistogram performs very well on latency data

Slide 85

Slide 85 text

Questions?

Slide 86

Slide 86 text

‹#› Please attribute Elastic with a link to elastic.co Except where otherwise noted, this work is licensed under http://creativecommons.org/licenses/by-nd/4.0/ Creative Commons and the double C in a circle are registered trademarks of Creative Commons in the United States and other countries. Third party marks and brands are the property of their respective holders. 86

Slide 87

Slide 87 text

87 Alternative #3: Various RLE Compressed Bitmaps • Generally encode "runs" with codewords (Diagram: a bitmap made of a dirty word, a run of all 0's, more dirty words, and a run of all 1's)

Slide 88

Slide 88 text

88 Alternative #3: Various RLE Compressed Bitmaps • Generally encode "runs" with codewords (Diagram: the same bitmap as codewords: a 31-bit "Dirty" word, 3x 31 bits "All Zero", two more 31-bit "Dirty" words, 2x 31 bits "All One")
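A toy version of such a codeword encoder, splitting the bitmap into 31-bit words and run-length encoding the all-zero and all-one ones (a generic WAH-style sketch, not any specific scheme's exact bit layout):

```python
def rle_encode(words):
    """words: 31-bit ints. Collapse runs of all-0/all-1 words; keep dirty words."""
    ALL_ONES = (1 << 31) - 1
    out = []
    for w in words:
        kind = "zero" if w == 0 else "one" if w == ALL_ONES else None
        if kind is not None and out and out[-1][0] == kind:
            out[-1] = (kind, out[-1][1] + 1)  # extend the current run codeword
        elif kind is not None:
            out.append((kind, 1))             # start a new run codeword
        else:
            out.append(("dirty", w))          # literal 31-bit word
    return out
```

Long homogeneous stretches collapse to a single codeword, which is where the compression comes from; dirty words still cost a full word each.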

Slide 89

Slide 89 text

89 Alternative #3: Various RLE Compressed Bitmaps • Generally encode "runs" with codewords (Diagram: codeword bit layouts; dirty words carry a flag bit plus the literal 31 bits, while run codewords encode the run length, e.g. "3" as ..100, and the fill bit)