Slide 1

Slide 1 text

Amusing algorithms and data-structures Adrien Grand - Zachary Tong

Slide 2

Slide 2 text

{ } CC-BY-ND 4.0
Agenda
• conjunctions
• regexp queries
• numeric doc values compression
• cardinality aggregation

Slide 3

Slide 3 text

{ } CC-BY-ND 4.0 How are conjunctions implemented? 3

Slide 4

Slide 4 text

Inverted index
Terms dictionary    Postings lists
index               2 10 49
lucene              1 5
elastic             2 5 49
shard               2 9 10 50 52

Slide 5

Slide 5 text

Inverted index
Terms dictionary    Postings lists
index               2 10 49
lucene              1 5
elastic             2 5 49
shard               2 9 10 50 52
next (on the terms dictionary), next (on a postings list)

Slide 6

Slide 6 text

Inverted index
Terms dictionary    Postings lists
index               2 10 49
lucene              1 5
elastic             2 5 49
shard               2 9 10 50 52
advance(30) on a postings list: uses skip lists
advance(“search”) on the terms dictionary: uses a tiny in-memory terms index.

Slide 7

Slide 7 text

{ } CC-BY-ND 4.0 Conjunctions 7 1 3 13 20 35 80 2 13 17 20 98 98 1 13 22 35 98 99 1. Sort by cost

Slide 8

Slide 8 text

{ } CC-BY-ND 4.0 Conjunctions 8 1 3 13 20 35 80 2 13 17 20 98 98 1 13 22 35 98 99 1. Sort by cost 2. Leap frog!

Slide 9

Slide 9 text

{ } CC-BY-ND 4.0 Conjunctions 9 1 3 13 20 35 80 2 13 17 20 98 98 1 13 22 35 98 99 next → 2

Slide 10

Slide 10 text

{ } CC-BY-ND 4.0 Conjunctions 10 1 3 13 20 35 80 2 13 17 20 98 98 1 13 22 35 98 99 next → 2 advance(2) → 13 TOO FAR

Slide 11

Slide 11 text

{ } CC-BY-ND 4.0 Conjunctions 11 1 3 13 20 35 80 2 13 17 20 98 98 1 13 22 35 98 99 next → 2 advance(2) → 13 TOO FAR advance(13) → 13

Slide 12

Slide 12 text

{ } CC-BY-ND 4.0 Conjunctions 12 1 3 13 20 35 80 2 13 17 20 98 98 1 13 22 35 98 99 next → 2 advance(2) → 13 TOO FAR advance(13) → 13 already on 13

Slide 13

Slide 13 text

{ } CC-BY-ND 4.0 Conjunctions 13 1 3 13 20 35 80 2 13 17 20 98 98 1 13 22 35 98 99 next → 2 advance(2) → 13 TOO FAR advance(13) → 13 already on 13 advance(13) → 13 MATCH 13

Slide 14

Slide 14 text

{ } CC-BY-ND 4.0 Conjunctions 14 1 3 13 20 35 80 2 13 17 20 98 98 1 13 22 35 98 99 next → 2 advance(2) → 13 TOO FAR advance(13) → 13 already on 13 advance(13) → 13 MATCH next → 17 13

Slide 15

Slide 15 text

{ } CC-BY-ND 4.0 Conjunctions 15 1 3 13 20 35 80 2 13 17 20 98 98 1 13 22 35 98 99 next → 2 advance(2) → 13 TOO FAR advance(13) → 13 already on 13 advance(13) → 13 MATCH next → 17 advance(17) → 22 TOO FAR 13

Slide 16

Slide 16 text

{ } CC-BY-ND 4.0 Conjunctions 16 1 3 13 20 35 80 2 13 17 20 98 98 1 13 22 35 98 99 next → 2 advance(2) → 13 TOO FAR advance(13) → 13 already on 13 advance(13) → 13 MATCH next → 17 advance(17) → 22 TOO FAR 13

Slide 17

Slide 17 text

{ } CC-BY-ND 4.0 Conjunctions 17 1 3 13 20 35 80 2 13 17 20 98 98 1 13 22 35 98 99 next → 2 advance(2) → 13 TOO FAR advance(13) → 13 already on 13 advance(13) → 13 MATCH next → 17 advance(17) → 22 TOO FAR advance(22) → 98 13

Slide 18

Slide 18 text

{ } CC-BY-ND 4.0 Conjunctions 18 1 3 13 20 35 80 2 13 17 20 98 98 1 13 22 35 98 99 next → 2 advance(2) → 13 TOO FAR advance(13) → 13 already on 13 advance(13) → 13 MATCH next → 17 advance(17) → 22 TOO FAR advance(22) → 98 advance(98) → 98 13

Slide 19

Slide 19 text

{ } CC-BY-ND 4.0 Conjunctions 19 1 3 13 20 35 80 2 13 17 20 98 98 1 13 22 35 98 99 next → 2 advance(2) → 13 TOO FAR advance(13) → 13 already on 13 advance(13) → 13 MATCH next → 17 advance(17) → 22 TOO FAR advance(22) → 98 advance(98) → 98 advance(98) → 98 MATCH 13 98

Slide 20

Slide 20 text

{ } CC-BY-ND 4.0 Conjunctions 20 1 3 13 20 35 80 2 13 17 20 98 98 1 13 22 35 98 99 next → 2 advance(2) → 13 TOO FAR advance(13) → 13 already on 13 advance(13) → 13 MATCH next → 17 advance(17) → 22 TOO FAR advance(22) → 98 advance(98) → 98 advance(98) → 98 MATCH next → ∞ END 13 98
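The leap-frog walk traced on the slides above can be sketched in a few lines of Python. This is only an illustration, not Lucene's code: the Postings class and the conjunction function are hypothetical names, bisect stands in for skip lists, and the three postings lists are reconstructed from the trace (an assumption, since the slide layout is flattened in this transcript).

```python
from bisect import bisect_left

END = float("inf")  # sentinel returned by an exhausted iterator


class Postings:
    """Sorted doc IDs exposing next() and advance(), like a postings iterator."""

    def __init__(self, docs):
        self.docs = docs
        self.pos = -1  # not positioned yet

    def cost(self):
        return len(self.docs)

    def _current(self):
        return self.docs[self.pos] if self.pos < len(self.docs) else END

    def next(self):
        self.pos += 1
        return self._current()

    def advance(self, target):
        # First doc ID >= target, never moving backwards.
        # bisect stands in for the skip lists mentioned on the slides.
        self.pos = bisect_left(self.docs, target, max(self.pos, 0))
        return self._current()


def conjunction(lists):
    # 1. Sort by cost: the shortest list leads the iteration.
    iterators = sorted((Postings(docs) for docs in lists), key=Postings.cost)
    lead, others = iterators[0], iterators[1:]
    matches = []
    doc = lead.next()
    while doc != END:
        # 2. Leap frog: every other list must land exactly on the candidate.
        for other in others:
            other_doc = other.advance(doc)
            if other_doc > doc:  # TOO FAR: the lead jumps to the new candidate
                doc = lead.advance(other_doc)
                break
        else:
            matches.append(doc)  # MATCH
            doc = lead.next()
    return matches


# Postings lists reconstructed from the slide trace (assumption).
lists = [
    [1, 3, 13, 20, 35, 80, 98],
    [2, 13, 17, 20, 98],
    [1, 13, 22, 35, 98, 99],
]
print(conjunction(lists))  # [13, 98]
```

With these lists the calls follow the same order as the trace: next → 2, advance(2) → 13, advance(13) → 13, and so on until the lead list is exhausted.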

Slide 21

Slide 21 text

{ } CC-BY-ND 4.0 How do regexp queries work? 21

Slide 22

Slide 22 text

Regexp queries
index      2 10 49
lucene     1 5
elastic    2 5 49
search     5 10 50
shard      2 9 10
Challenge: find matching terms and merge postings lists
Naive way:
- iterate over terms
- evaluate regexp against every term
SLOWWWWWW

Slide 23

Slide 23 text

Regexp queries
Ela[Ss]tic.*
[Figure: automaton for the regexp, with transitions E, l, a, S or s, t, i, c, then any character (*)]
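To make the automaton idea concrete, here is a rough Python sketch. It hand-builds a tiny automaton for Ela[Ss]tic.* and only visits the part of a sorted term list that can possibly match. All names are hypothetical, and seeking by a fixed literal prefix is a simplification chosen for this sketch; it is not how Lucene actually intersects an automaton with its terms dictionary.

```python
from bisect import bisect_left

# Hand-built automaton for Ela[Ss]tic.* : state i means "i characters of the
# literal part have matched". Each entry lists the characters allowed next.
PATTERN = ["E", "l", "a", "Ss", "t", "i", "c"]
ACCEPT = len(PATTERN)  # once here, ".*" accepts any suffix


def step(state, ch):
    """Return the next state, or None if the automaton rejects."""
    if state == ACCEPT:
        return ACCEPT
    return state + 1 if ch in PATTERN[state] else None


def accepts(term):
    state = 0
    for ch in term:
        state = step(state, ch)
        if state is None:
            return False
    return state == ACCEPT


def matching_terms(sorted_terms, prefix="Ela"):
    # Every match starts with the literal prefix, so seek straight to it
    # instead of evaluating the regexp against every term.
    i = bisect_left(sorted_terms, prefix)
    matches = []
    while i < len(sorted_terms) and sorted_terms[i].startswith(prefix):
        if accepts(sorted_terms[i]):
            matches.append(sorted_terms[i])
        i += 1
    return matches


terms = sorted(["Elastic", "ElaStic", "Elasticsearch", "Elapsed",
                "elastic", "Lucene", "Shard"])
print(matching_terms(terms))  # ['ElaStic', 'Elastic', 'Elasticsearch']
```

The postings lists of the matching terms would then be merged, which is the second half of the challenge stated earlier.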

Slide 24

Slide 24 text

{ } CC-BY-ND 4.0 Regexp queries 24 • Not limited to regexps • Fuzzy queries too! – example: es~1

Slide 25

Slide 25 text

{ } CC-BY-ND 4.0 How are numeric doc values compressed? a column-stride, on-disk, un-inverted index 25

Slide 26

Slide 26 text

Aggregation Execution
“color” (inverted index)    Doc IDs
blue                        0
green                       5, 20, 22
red                         12
What is the average price of green docs?

Slide 27

Slide 27 text

Aggregation Execution
“color” (inverted index)    Doc IDs
blue                        0
green                       5, 20, 22
red                         12
Doc ID (field data):  0  5  10 12 20 22
“price”:              10 20 20 60 60 20
What is the average price of green docs? (20 + 60 + 20) / 3 = 33.33
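The two lookups on this slide can be sketched as follows: the inverted index answers "which docs are green?", and the per-document column answers "what is the price of doc N?". The dictionaries and the function name are illustrative only.

```python
# Inverted index: term -> doc IDs (values taken from the slide).
color_index = {"blue": [0], "green": [5, 20, 22], "red": [12]}

# Un-inverted, per-document column: doc ID -> price (values from the slide).
price = {0: 10, 5: 20, 10: 20, 12: 60, 20: 60, 22: 20}


def average_price(color):
    """Average the price column over the docs matching a color."""
    docs = color_index.get(color, [])
    if not docs:
        return None
    return sum(price[doc] for doc in docs) / len(docs)


print(round(average_price("green"), 2))  # 33.33
```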

Slide 28

Slide 28 text

Field Data and Doc Values
Field Data
• In-memory, lives on JVM Heap
• All-or-nothing
• Lazily constructed at query-time
Doc Values
• Disk-based, leverages OS FS cache
• Pages in/out of FS cache
• Precomputed at index-time
• Allows better compression

Slide 29

Slide 29 text

{ } CC-BY-ND 4.0 “Allows better compression” 29 Lots of cool tricks, let’s dive in!

Slide 30

Slide 30 text

{ } CC-BY-ND 4.0 Numerics: one unique value 30 Doc ID “price” 0 5 10 12 20 22 10 10 10 10 10 10 • Easy :) • Write the constant value and set a flag • 4 bytes to represent n values Constant Encoding

Slide 31

Slide 31 text

Numerics: < 256 unique values (Table Encoding)
• Write a table of all possible values, then encode data as bit-packed ordinals
• Great compression when few unique values
• Better when num_docs >> num_values
• Best case is 1 bit/doc, worst case is 1 byte/doc

Slide 32

Slide 32 text

{ } CC-BY-ND 4.0 Numerics: < 256 unique values 32 Doc ID “price” 0 5 10 12 20 22 50 20 20 20 10 50 Only three unique values (10, 20, 50) Table Encoding

Slide 33

Slide 33 text

{ } CC-BY-ND 4.0 Numerics: < 256 unique values 33 Doc ID “price” 0 5 10 12 20 22 50 20 20 20 10 50 De-dupe, sort Table Encoding [10, 20, 50]

Slide 34

Slide 34 text

Numerics: < 256 unique values (Table Encoding)
Doc ID:  0  5  10 12 20 22
“price”: 50 20 20 20 10 50
De-dupe, sort: [10, 20, 50]
Write Longs: 00 00 00 0A 00 00 00 14 00 00 00 32

Slide 35

Slide 35 text

Numerics: < 256 unique values (Table Encoding)
Doc ID:  0  5  10 12 20 22
“price”: 50 20 20 20 10 50
De-dupe, sort: [10, 20, 50]
Write Longs: 00 00 00 0A 00 00 00 14 00 00 00 32
Encode with min bits

Slide 36

Slide 36 text

Numerics: < 256 unique values (Table Encoding)
Doc ID:  0  5  10 12 20 22
“price”: 50 20 20 20 10 50
De-dupe, sort: [10, 20, 50]
Write Longs: \x00 \x00 \x00 \x0A \x00 \x00 \x00 \x14 \x00 \x00 \x00 \x32
Encode with min bits: 11 10 10 10 01 11
min_bits = msb( table.size() - 1 ); most significant bit (2 bits); 3 values = 00000011

Slide 37

Slide 37 text

Numerics: < 256 unique values (Table Encoding)
Doc ID:  0  5  10 12 20 22
“price”: 50 20 20 20 10 50
De-dupe, sort: [10, 20, 50]
Write Longs: 00 00 00 0A 00 00 00 14 00 00 00 32
Encode with min bits: 11 10 10 10 01 11

Slide 38

Slide 38 text

Numerics: < 256 unique values (Table Encoding)
Doc ID:  0  5  10 12 20 22
“price”: 50 20 20 20 10 50
De-dupe, sort: [10, 20, 50]
Write Longs: 00 00 00 0A 00 00 00 14 00 00 00 32
Encode with min bits: 11 10 10 10 01 11
Pack Bytes: 0E 0A 07
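A minimal Python sketch of the table encoding steps (de-dupe and sort, compute the minimum bit width, pack the ordinals). The function names are hypothetical, and the sketch numbers ordinals from zero, so its bit patterns differ from the ones drawn on the slide, which appear to start at 1.

```python
def table_encode(values):
    """De-dupe and sort into a table, then bit-pack each value's ordinal."""
    table = sorted(set(values))
    ordinal = {value: i for i, value in enumerate(table)}
    # Bits needed for the largest ordinal (at least 1).
    bits = max(1, (len(table) - 1).bit_length())
    packed = 0
    for value in values:
        packed = (packed << bits) | ordinal[value]
    return table, bits, packed


def table_decode(table, bits, packed, count):
    """Recover the original values from the table and the packed ordinals."""
    mask = (1 << bits) - 1
    return [
        table[(packed >> (bits * (count - 1 - i))) & mask]
        for i in range(count)
    ]


prices = [50, 20, 20, 20, 10, 50]
table, bits, packed = table_encode(prices)
print(table, bits, format(packed, "012b"))  # [10, 20, 50] 2 100101010010
```

Six values fit in 12 bits plus the three-entry table, which is where the "better when num_docs >> num_values" remark comes from.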

Slide 39

Slide 39 text

Numerics: Common Denominator (GCD Encoding)
• Certain types of data share common denominators
• E.g. timestamps without ms precision
  • 142542454000
  • 142542455000
  • 142542456000
• If a gcd is found, can encode values as multiples of the gcd, using fewer bits
[Figure: with a gcd of 1000, the timestamps are 142542454000, then x1 gcd and x2 gcd]

Slide 40

Slide 40 text

{ } CC-BY-ND 4.0 Numerics: Common Denominator 40 Doc ID “price” 0 5 10 12 20 22 50 20 20 20 10 50 GCD Encoding Share 10 as common divisor

Slide 41

Slide 41 text

{ } CC-BY-ND 4.0 Numerics: Common Denominator 41 Doc ID “price” 0 5 10 12 20 22 50 20 20 20 10 50 GCD Encoding = (value - minValue) / gcd = (50 - 10) / 10 = 4 GCD Encode

Slide 42

Slide 42 text

Numerics: Common Denominator (GCD Encoding)
Doc ID:  0  5  10 12 20 22
“price”: 50 20 20 20 10 50
GCD Encode: [4, 1, 1, 1, 0, 4]

Slide 43

Slide 43 text

Numerics: Common Denominator (GCD Encoding)
Doc ID:  0  5  10 12 20 22
“price”: 50 20 20 20 10 50
GCD Encode: [4, 1, 1, 1, 0, 4]
Encode with min bits: 100 001 001 001 000 100

Slide 44

Slide 44 text

Numerics: Common Denominator (GCD Encoding)
Doc ID:  0  5  10 12 20 22
“price”: 50 20 20 20 10 50
GCD Encode: [4, 1, 1, 1, 0, 4]
Encode with min bits: 100 001 001 001 000 100
Pack Bytes: 02 01 02 04 04
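The GCD formula, (value - minValue) / gcd, can be tried out with the short Python sketch below. The function names are hypothetical, and taking the gcd over the offsets from the minimum is an assumption of this sketch.

```python
from functools import reduce
from math import gcd


def gcd_encode(values):
    """Store each value as a small multiple of a shared divisor."""
    base = min(values)
    # gcd of all offsets from the minimum; fall back to 1 if all are equal.
    divisor = reduce(gcd, (value - base for value in values)) or 1
    encoded = [(value - base) // divisor for value in values]
    return base, divisor, encoded


def gcd_decode(base, divisor, encoded):
    return [base + e * divisor for e in encoded]


prices = [50, 20, 20, 20, 10, 50]
print(gcd_encode(prices))  # (10, 10, [4, 1, 1, 1, 0, 4])

timestamps = [142542454000, 142542455000, 142542456000]
print(gcd_encode(timestamps))  # (142542454000, 1000, [0, 1, 2])
```

The largest encoded price is 4, so 3 bits per document are enough, and the three timestamps shrink to the values 0, 1 and 2.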

Slide 45

Slide 45 text

{ } CC-BY-ND 4.0 Numerics: If all else fails 45 Delta Encoding • If we can’t use any “tricks”, delta encode • Encode everything as an offset from the minValue Offset minimum value 199872 Basically less-good GCD encoding!

Slide 46

Slide 46 text

{ } CC-BY-ND 4.0 Numerics: If all else fails 46 Delta Encoding Doc ID “price” 0 5 10 12 20 22 3 2 2 4 5 6 = (value - minValue) = (3 - 2) = 1 Delta Encode

Slide 47

Slide 47 text

{ } CC-BY-ND 4.0 Numerics: If all else fails 47 Delta Encoding Doc ID “price” 0 5 10 12 20 22 3 2 2 4 5 6 [1, 0, 0, 2, 3, 4] Delta Encode Encode with min bits 001 000 000 010 011 100 Pack Bytes 08 00 09 0C
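The delta encoding steps on this slide (subtract the minimum, encode with the minimum number of bits, pack) can be reproduced with a small Python sketch. The function names are hypothetical.

```python
def delta_encode(values):
    """Encode every value as an offset from the minimum, then bit-pack."""
    base = min(values)
    deltas = [value - base for value in values]
    bits = max(1, max(deltas).bit_length())  # bits needed for the largest offset
    packed = 0
    for delta in deltas:
        packed = (packed << bits) | delta
    return base, bits, packed


def delta_decode(base, bits, packed, count):
    mask = (1 << bits) - 1
    return [
        base + ((packed >> (bits * (count - 1 - i))) & mask)
        for i in range(count)
    ]


prices = [3, 2, 2, 4, 5, 6]
base, bits, packed = delta_encode(prices)
print(base, bits, format(packed, "X"))  # 2 3 809C
```

The packed value 809C corresponds to the "08 00 09 0C" shown on the slide, one hex digit per group.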

Slide 48

Slide 48 text

No time to talk about strings, sorry!
Slides at end of presentation if you’re curious

Slide 49

Slide 49 text

{ } CC-BY-ND 4.0 How does the Cardinality agg work? bit-pattern observable magic 49

Slide 50

Slide 50 text

{ } CC-BY-ND 4.0 50 Distinct Counts: Naive Solution Cardinality agg == SELECT Count(Distinct foo)

Slide 51

Slide 51 text

Distinct Counts: Naive Solution
Maintain a map of all values (blue, green, red, …)
• Cardinality == map.size()
• map.size() == n
• Memory usage == n * size of each term (ignoring map overhead)

Slide 52

Slide 52 text

{ } CC-BY-ND 4.0 52 blue green red … Distinct Counts: Naive Solution Gets worse in distributed environment Node 1 abc xyz … Node 2 abc green xyz … Node 3

Slide 53

Slide 53 text

{ } CC-BY-ND 4.0 53 Distinct Counts: Naive Solution Gets worse in distributed environment Node 1 blue green red … abc xyz … Node 2 Node 3 abc green xyz … Node 4 blue green red … abc green xyz … abc xyz … Merge
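A sketch of the naive approach in Python, assuming each node simply ships its full set of values to the node that merges them. The function name and node contents are illustrative.

```python
def naive_cardinality(per_node_values):
    """Exact distinct count: merge every node's values into one set."""
    merged = set()
    for values in per_node_values:
        merged |= set(values)  # memory grows with the number of distinct values
    return len(merged)


nodes = [
    ["blue", "green", "red"],
    ["abc", "xyz"],
    ["abc", "green", "xyz"],
]
print(naive_cardinality(nodes))  # 5
```

The result is exact, but every distinct term has to travel over the network and sit in memory on the merging node.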

Slide 54

Slide 54 text

Distinct Counts: HyperLogLog++
Cardinality agg uses HyperLogLog++ instead
• Approximates cardinality
• Uses only a few KB of memory for billions of distinct values
• < 5% error (adjustable)
• Fast!
• Lossless unions

Slide 55

Slide 55 text

Bit-Observable Patterns
Let’s flip some coins…
Probability of a “run” of length n: 1/2^n

Slide 56

Slide 56 text

Bit-Observable Patterns
Let’s flip some coins…
Probability of a “run” of length n: 1/2^n
5 heads in a row: 1/32

Slide 57

Slide 57 text

Bit-Observable Patterns
Let’s flip some coins…
Probability of a “run” of length n: 1/2^n
5 heads in a row: 1/32
20 heads in a row: 1/1048576

Slide 58

Slide 58 text

Bit-Observable Patterns
Let’s flip some coins…
Probability of a “run” of length n: 1/2^n
5 heads in a row: 1/32 (could do this in one sitting)
20 heads in a row: 1/1048576 (might take all day)

Slide 59

Slide 59 text

{ } CC-BY-ND 4.0 Key Insight: Length of the run ~= duration of coin flipping 59

Slide 60

Slide 60 text

{ } CC-BY-ND 4.0 60 Bit-Observable Patterns Let’s hash values, instead of flipping coins… v = 12345 h(v) = cbf5a = 11001011111101011010 Run of 1 zero

Slide 61

Slide 61 text

{ } CC-BY-ND 4.0 61 Bit-Observable Patterns Let’s hash values, instead of flipping coins… v = 12345 h(v) = cbf5a = 11001011111101011010 Set “register” to 1 1

Slide 62

Slide 62 text

{ } CC-BY-ND 4.0 62 Bit-Observable Patterns Let’s hash values, instead of flipping coins… v = 3456 h(v) = 8D338 = 10001101001100111000 Run of 3 zeros 1

Slide 63

Slide 63 text

{ } CC-BY-ND 4.0 63 Bit-Observable Patterns Let’s hash values, instead of flipping coins… Set “register” to 3 3 v = 3456 h(v) = 8D338 = 10001101001100111000

Slide 64

Slide 64 text

{ } CC-BY-ND 4.0 64 Bit-Observable Patterns Let’s hash values, instead of flipping coins… v = 948 h(v) = 47D34 = 01000111110100110100 Run of 2 zeros Don’t update register 3
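The single-register idea from the last few slides, as a Python sketch: keep the longest run of trailing zeros seen in any hash. The hash values are the ones shown on the slides, the helper name is hypothetical, and the final estimate of 2^register is deliberately crude.

```python
def trailing_zeros(h):
    """Length of the run of zero bits at the low end of a non-zero hash."""
    return (h & -h).bit_length() - 1


register = 0
for h in (0xCBF5A, 0x8D338, 0x47D34):  # hashes from the slides
    register = max(register, trailing_zeros(h))  # only ever grows

print(register)       # 3
print(2 ** register)  # 8, a very rough cardinality estimate
```

The third hash has a run of 2, which is shorter than the 3 already stored, so the register is not updated.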

Slide 65

Slide 65 text

Key Insight: Length of the run ~= cardinality (instead of duration of coin flipping)
Probability of a “run” of length n: 1/2^n
5 zeros in a row: 1/32, ~32 distinct values
20 zeros in a row: 1/1048576, ~1048576 distinct values

Slide 66

Slide 66 text

{ } CC-BY-ND 4.0 What if you get unlucky on first value? 66 v = 938 h(v) = 0400 = 0000010000000000 Run of 10 zeros oops :(

Slide 67

Slide 67 text

{ } CC-BY-ND 4.0 Solution: keep multiple counters 67 v = 938 h(v) = 0400 = 0000010000000000 Stochastic Averaging

Slide 68

Slide 68 text

{ } CC-BY-ND 4.0 Solution: keep multiple counters 68 v = 938 h(v) = 0400 = 0000010000000000 Stochastic Averaging Run of 10 zeros Use first 3 bits as register index

Slide 69

Slide 69 text

{ } CC-BY-ND 4.0 Solution: keep multiple counters 69 v = 938 h(v) = 0400 = 0000010000000000 Stochastic Averaging Set register[0] to 10 10

Slide 70

Slide 70 text

{ } CC-BY-ND 4.0 Solution: keep multiple counters 70 v = 7482 h(v) = 9D3A = 1001110100111010 Stochastic Averaging Run of 1 zero Use first 3 bits as register index 10

Slide 71

Slide 71 text

{ } CC-BY-ND 4.0 Solution: keep multiple counters 71 v = 7482 h(v) = 9D3A = 1001110100111010 Stochastic Averaging 10 Set register[4] to 1 1

Slide 72

Slide 72 text

Cardinality is the Harmonic Mean of the registers (and some empirical constants)
Stochastic Averaging
Registers: 10 1 1 8 3
Harmonic Mean = 1.9544
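A Python sketch of stochastic averaging as the slides describe it: the first 3 bits of a 16-bit hash pick a register, and the register keeps the longest run of trailing zeros. It also reproduces the harmonic mean shown for the example registers. This follows the slides' simplified picture only; the names are hypothetical, and a real HyperLogLog++ estimator applies further constants and corrections that are not shown here.

```python
from statistics import harmonic_mean

HASH_BITS = 16   # the slides use 16-bit hashes in this part
INDEX_BITS = 3   # "Use first 3 bits as register index"


def trailing_zeros(h):
    return (h & -h).bit_length() - 1 if h else HASH_BITS


def update(registers, h):
    index = h >> (HASH_BITS - INDEX_BITS)  # top 3 bits select the register
    registers[index] = max(registers[index], trailing_zeros(h))


registers = [0] * (1 << INDEX_BITS)
update(registers, 0x0400)  # index 0, run of 10 zeros
update(registers, 0x9D3A)  # index 4, run of 1 zero
print(registers)  # [10, 0, 0, 0, 1, 0, 0, 0]

# Harmonic mean of the example registers drawn on the slide.
print(round(harmonic_mean([10, 1, 1, 8, 3]), 4))  # 1.9544
```

One unlucky hash now only affects a single register, and the harmonic mean damps the influence of that outlier.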

Slide 73

Slide 73 text

{ } CC-BY-ND 4.0 Registers are small! 73 Other neat attributes 5-6 bits 10 1 1 8 3

Slide 74

Slide 74 text

Unions are lossless!
Other neat attributes
[10 1 1 8 3] U [4 5 1 2 7] = [10 5 1 8 7]
Take max of each register
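The union rule fits in one line of Python. The function name is hypothetical, and both sketches are assumed to use the same number of registers.

```python
def union(registers_a, registers_b):
    """Merge two sketches by taking the max of each register."""
    return [max(a, b) for a, b in zip(registers_a, registers_b)]


print(union([10, 1, 1, 8, 3], [4, 5, 1, 2, 7]))  # [10, 5, 1, 8, 7]
```

Because max is associative and commutative, the merge gives the same registers whatever order the nodes report in.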

Slide 75

Slide 75 text

{ } CC-BY-ND 4.0 75 Which is perfect for distributed environments Node 1 Node 2 Node 3 Node 4 Merge Other neat attributes

Slide 76

Slide 76 text

{ } CC-BY-ND 4.0 In closing… 76 Stop worrying and learn to love approximate algorithms

Slide 77

Slide 77 text

{ } Thank you! @jpountz @polyfractal

Slide 78

Slide 78 text

{ } This work is licensed under the Creative Commons Attribution-NoDerivatives 4.0 International License. To view a copy of this license, visit: http://creativecommons.org/licenses/by-nd/4.0/ or send a letter to: Creative Commons PO Box 1866 Mountain View, CA 94042 USA CC-BY-ND 4.0

Slide 79

Slide 79 text

Strings: Just a big ol' blob
• Strings are simpler
• Basic idea is to:
  • Encode a term dictionary in a binary blob
  • Compress ordinals using numeric compression
• Three schemes to encode the blob:
  • Fixed, Variable, Prefix
Term Dictionary
Ord ID:  0   1   2
Term:    aaa bbb abc
Ordinal Map
Doc ID:   0 5 10
“widget”: 0 1 2

Slide 80

Slide 80 text

{ } CC-BY-ND 4.0 Strings: Equal-sized terms 80 Serialize bytes Fixed-width Encoding 61 61 61 62 62 62 61 62 63 … ‘aaa’ ‘bbb’ ‘abc’ Doc ID “widget” 0 5 10 0 1 2 Ord ID Term 0 1 2 aaa bbb abc

Slide 81

Slide 81 text

{ } CC-BY-ND 4.0 Strings: Equal-sized terms 81 Serialize bytes Fixed-width Encoding 61 61 61 62 62 62 61 62 63 … ‘aaa’ ‘bbb’ ‘abc’ Doc ID “widget” 0 5 10 0 1 2 Ord ID Term 0 1 2 aaa bbb abc Compress as Numeric data
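A Python sketch of the fixed-width layout: since every term has the same length, an ordinal can be turned into a byte offset by multiplication. The variable and function names are illustrative, and the ordinal map is kept as a plain dict here rather than being compressed as numeric data.

```python
TERMS = ["aaa", "bbb", "abc"]  # term dictionary from the slide, in ordinal order
WIDTH = 3                      # every term is exactly 3 bytes

# Serialize bytes: the terms are simply concatenated.
blob = b"".join(term.encode("ascii") for term in TERMS)

# Ordinal map from the slide: doc ID -> ordinal.
doc_to_ord = {0: 0, 5: 1, 10: 2}


def term_for_ord(ordinal):
    # Fixed width means the offset is just ordinal * WIDTH.
    return blob[ordinal * WIDTH:(ordinal + 1) * WIDTH].decode("ascii")


def term_for_doc(doc_id):
    return term_for_ord(doc_to_ord[doc_id])


print(blob.hex(" "))     # 61 61 61 62 62 62 61 62 63
print(term_for_doc(10))  # abc
```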

Slide 82

Slide 82 text

{ } CC-BY-ND 4.0 Strings: Variable-sized, < 1024 terms 82 Serialize bytes Variable-width Encoding 61 61 62 61 62 63 … ‘a’ ‘ab’ ‘abc’ Doc ID “widget” 0 5 10 0 1 2 Ord ID Term 0 1 2 a ab abc

Slide 83

Slide 83 text

{ } CC-BY-ND 4.0 Strings: Variable-sized, < 1024 terms 83 Serialize bytes Variable-width Encoding 61 61 62 61 62 63 … ‘a’ ‘ab’ ‘abc’ Doc ID “widget” 0 5 10 0 1 2 Ord ID Term 0 1 2 a ab abc Pack lengths in VarInts 06

Slide 84

Slide 84 text

{ } CC-BY-ND 4.0 Strings: Variable-sized, < 1024 terms 84 Serialize bytes Variable-width Encoding 61 61 62 61 62 63 … ‘a’ ‘ab’ ‘abc’ Doc ID “widget” 0 5 10 0 1 2 Ord ID Term 0 1 2 a ab abc Pack lengths in VarInts 06 Compress as Numeric data

Slide 85

Slide 85 text

{ } CC-BY-ND 4.0 Strings: Everything Else 85 Serialize first Prefix Encoding 61 61 ‘aa’ Ord ID Term 0 1 2 aa aaa abc

Slide 86

Slide 86 text

{ } CC-BY-ND 4.0 Strings: Everything Else 86 Serialize first Prefix Encoding 61 61 ‘aa’ Ord ID Term 0 1 2 aa aaa abc Write prefix length 02

Slide 87

Slide 87 text

{ } CC-BY-ND 4.0 Strings: Everything Else 87 Serialize first Prefix Encoding 61 61 ‘aa’ Ord ID Term 0 1 2 aa aaa abc Write prefix length 02 Write remaining bytes 61 ‘a’

Slide 88

Slide 88 text

{ } CC-BY-ND 4.0 Strings: Everything Else 88 Serialize first Prefix Encoding 61 61 ‘aa’ Ord ID Term 0 1 2 aa aaa abc Write prefix length 02 Write remaining bytes 61 ‘a’ Write prefix length 01 Write remaining bytes 62 63 ‘bc’

Slide 89

Slide 89 text

{ } CC-BY-ND 4.0 Strings: Everything Else 89 Serialize first Prefix Encoding 61 61 ‘aa’ Ord ID Term 0 1 2 aa aaa abc Write prefix length 02 Write remaining bytes 61 ‘a’ Write prefix length 01 Write remaining bytes 62 63 ‘bc’ Re-serialize prefix start point every 16 terms
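The prefix encoding steps can be sketched in Python as follows: each term is stored as the length of the prefix it shares with the previous term plus the remaining bytes, and a full term is written again every 16 terms. The function names and the (prefix length, suffix) tuple layout are assumptions of this sketch, not the on-disk format.

```python
def common_prefix_len(a, b):
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def prefix_encode(terms, interval=16):
    """Encode sorted terms as (shared prefix length, remaining bytes)."""
    entries = []
    previous = b""
    for i, term in enumerate(terms):
        if i % interval == 0:
            entries.append((0, term))  # restart point: full term
        else:
            shared = common_prefix_len(previous, term)
            entries.append((shared, term[shared:]))
        previous = term
    return entries


def prefix_decode(entries):
    terms = []
    previous = b""
    for shared, suffix in entries:
        previous = previous[:shared] + suffix
        terms.append(previous)
    return terms


terms = [b"aa", b"aaa", b"abc"]
print(prefix_encode(terms))  # [(0, b'aa'), (2, b'a'), (1, b'bc')]
```

The periodic restart points mean a reader never has to decode more than 16 terms to reach any given one.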

Slide 90

Slide 90 text

{ } CC-BY-ND 4.0 Strings: Everything Else 90 Prefix Encoding When done, write a ReverseTermIndex every 1024 terms Position Term 0 1024 2048 aa gef xyz Pack as VarInts

Slide 91

Slide 91 text

{ } CC-BY-ND 4.0 Strings: Everything Else 91 Prefix Encoding Finally, write out Ordinal Map Doc ID “widget” 0 5 10 0 1 2 Compress as Numeric data