Slide 1

Slide 1 text

Fast indexes with roaring Daniel Lemire and collaborators blog: https://lemire.me twitter: @lemire Université du Québec (TÉLUQ) Montreal Fast indexes with roaring - Daniel Lemire #gomtl-10 November 19th.

Slide 2

Slide 2 text

The roaring Go library is used by Cloud Torrent, runv, InfluxDB, Pilosa, Bleve, lindb, Elasticell, SourceGraph, M3, and trident. Part of the Awesome Go collection.

Slide 3

Slide 3 text

Sets: a fundamental concept (sets of documents, identifiers, tuples...). For performance, we often work with sets of integers (identifiers).

Slide 4

Slide 4 text

tests: $x \in S$?
intersections: $S_2 \cap S_1$, unions: $S_2 \cup S_1$, differences: $S_2 \setminus S_1$
Similarity (Jaccard/Tanimoto): $|S_1 \cap S_2| / |S_1 \cup S_2|$
Iteration

Slide 5

Slide 5 text

How to implement sets? Hash tables (map[int]bool{}), bitmaps (willf/bitset), compressed bitmaps.
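A minimal sketch of the hash-table option, assuming a hypothetical `intSet` type built on a Go map (the type and function names are illustrative, not taken from any library). It shows a membership test, an intersection that probes from the smaller set, and the Jaccard similarity.

```go
package main

import "fmt"

// intSet is a hypothetical hash-table-backed set of integers.
type intSet map[uint32]struct{}

func newIntSet(vals ...uint32) intSet {
	s := make(intSet, len(vals))
	for _, v := range vals {
		s[v] = struct{}{}
	}
	return s
}

// contains is the membership test.
func (s intSet) contains(x uint32) bool {
	_, ok := s[x]
	return ok
}

// intersect iterates over the smaller set and probes the larger one.
func intersect(a, b intSet) intSet {
	if len(a) > len(b) {
		a, b = b, a
	}
	out := make(intSet)
	for v := range a {
		if b.contains(v) {
			out[v] = struct{}{}
		}
	}
	return out
}

// jaccard returns |a ∩ b| / |a ∪ b|, using |a ∪ b| = |a| + |b| - |a ∩ b|.
func jaccard(a, b intSet) float64 {
	inter := len(intersect(a, b))
	union := len(a) + len(b) - inter
	if union == 0 {
		return 1 // convention chosen here for two empty sets
	}
	return float64(inter) / float64(union)
}

func main() {
	a := newIntSet(0, 1, 3, 4)
	b := newIntSet(1, 3, 7)
	fmt.Println(a.contains(3), len(intersect(a, b)), jaccard(a, b))
}
```

Note that ranging over a Go map yields elements in no particular order, so sorted iteration needs an extra sort.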

Slide 6

Slide 6 text

Hash tables: in-order access is kind of terrible. With the elements stored as [15, 3, 0, 6, 11, 4, 5, 9, 12, 13, 8, 2, 1, 14, 10, 7], visiting the values 1, 2, 3, 4, 5, 6 in sorted order jumps between positions 12, 11, 1, 5, 6, and 3 (0-based).

Slide 7

Slide 7 text

Bitmaps: an efficient way to represent sets of integers. For example, 0, 1, 3, 4 becomes 0b11011 or "27".
{0} → 0b00001
{0, 3} → 0b01001
{0, 3, 4} → 0b11001
{0, 1, 3, 4} → 0b11011

Slide 8

Slide 8 text

Manipulate a bitmap: on a 64-bit processor, given x, the word index is x / 64 and the bit index is x % 64.
add(x) { array[x / 64] |= (1 << (x % 64)) }
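The add operation can be sketched in Go as follows, assuming a hypothetical `bitmap` type that grows on demand (the growth policy and the names are illustrative assumptions, not part of the slide).

```go
package main

import "fmt"

// bitmap is a hypothetical uncompressed bitset stored as 64-bit words.
type bitmap []uint64

// add sets bit x, growing the slice when x falls past the last word.
func (b *bitmap) add(x uint32) {
	word := int(x / 64)
	for len(*b) <= word {
		*b = append(*b, 0)
	}
	(*b)[word] |= 1 << (x % 64)
}

// contains reports whether bit x is set.
func (b bitmap) contains(x uint32) bool {
	word := int(x / 64)
	return word < len(b) && b[word]&(1<<(x%64)) != 0
}

func main() {
	var b bitmap
	for _, x := range []uint32{0, 1, 3, 4} {
		b.add(x)
	}
	// The first word now holds the value 27 (binary 11011).
	fmt.Println(b[0], b.contains(3), b.contains(2))
}
```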

Slide 9

Slide 9 text

How fast is it?
index = x / 64 -> a shift
mask = 1 << (x % 64) -> a shift
array[index] |= mask -> an OR with memory
One bit every ≈ 1.65 cycles because of superscalarity.

Slide 10

Slide 10 text

Bit parallelism: the intersection between {0, 1, 3} and {1, 3} is a single AND operation between 0b1011 and 0b1010. The result is 0b1010, or {1, 3}. No branching!
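A minimal sketch of the same idea over whole bitmaps, assuming both are plain `[]uint64` slices (the function names are hypothetical). Each AND processes 64 candidate values at once, and `bits.OnesCount64` from the standard library gives the cardinality.

```go
package main

import (
	"fmt"
	"math/bits"
)

// intersect ANDs two bitmaps word by word; the result has the length of the shorter input.
func intersect(a, b []uint64) []uint64 {
	n := len(a)
	if len(b) < n {
		n = len(b)
	}
	out := make([]uint64, n)
	for i := 0; i < n; i++ {
		out[i] = a[i] & b[i]
	}
	return out
}

// cardinality counts the set bits across all words.
func cardinality(words []uint64) int {
	total := 0
	for _, w := range words {
		total += bits.OnesCount64(w)
	}
	return total
}

func main() {
	a := []uint64{0b1011} // {0, 1, 3}
	b := []uint64{0b1010} // {1, 3}
	r := intersect(a, b)
	fmt.Println(r[0] == 0b1010, cardinality(r))
}
```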

Slide 11

Slide 11 text

Bitsets can take too much memory: {1, 32000, 64000} needs about 8,000 bytes (roughly 1,000 64-bit words) for three values. We use compression!
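A quick check of the arithmetic, assuming 64-bit words and a hypothetical helper named `bitsetBytes`: an uncompressed bitset must cover every position up to the largest value, so its size depends on the maximum and not on how many values are stored.

```go
package main

import "fmt"

// bitsetBytes returns the bytes an uncompressed bitset needs to hold
// values up to max, assuming 64-bit (8-byte) words.
func bitsetBytes(max uint32) int {
	words := int(max/64) + 1
	return words * 8
}

func main() {
	fmt.Println(bitsetBytes(64000)) // space for a bitset holding {1, 32000, 64000}
	fmt.Println(3 * 4)              // the same three values stored as uint32
}
```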

Slide 12

Slide 12 text

Git (GitHub) uses EWAH. Run-length encoding: code long runs of 0s or 1s efficiently. Example: 000000001111111100 is 00000000 − 11111111 − 00. https://github.com/git/git/blob/master/ewah/bitmap.c
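The run-length idea can be sketched as follows. This is plain run-length encoding over a string of '0' and '1' characters, not the word-aligned EWAH format itself, and the `run` type and `encode` function are hypothetical names.

```go
package main

import "fmt"

// run is a hypothetical (bit value, length) pair.
type run struct {
	bit    byte // '0' or '1'
	length int
}

// encode collapses a string of '0'/'1' characters into runs of identical bits.
func encode(s string) []run {
	var runs []run
	for i := 0; i < len(s); i++ {
		c := s[i]
		if n := len(runs); n > 0 && runs[n-1].bit == c {
			runs[n-1].length++
		} else {
			runs = append(runs, run{bit: c, length: 1})
		}
	}
	return runs
}

func main() {
	for _, r := range encode("000000001111111100") {
		fmt.Printf("%d x %c\n", r.length, r.bit)
	}
}
```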

Slide 13

Slide 13 text

Complexity
Intersection: $O(|S_1| + |S_2|)$ or $O(\min(|S_1|, |S_2|))$
In-place union ($S_2 \leftarrow S_1 \cup S_2$): $O(|S_1| + |S_2|)$ or $O(|S_2|)$
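As an illustration of the linear bound, here is a sketch of a merge-style intersection over two sorted, duplicate-free slices (the function name is hypothetical). Each loop step advances at least one cursor, so the running time is proportional to the sum of the two lengths.

```go
package main

import "fmt"

// intersectSorted merges two sorted, duplicate-free slices and keeps the common values.
func intersectSorted(a, b []uint32) []uint32 {
	var out []uint32
	i, j := 0, 0
	for i < len(a) && j < len(b) {
		switch {
		case a[i] < b[j]:
			i++
		case a[i] > b[j]:
			j++
		default: // equal: the value is in both sets
			out = append(out, a[i])
			i++
			j++
		}
	}
	return out
}

func main() {
	fmt.Println(intersectSorted([]uint32{1, 20, 144, 300}, []uint32{20, 21, 300}))
}
```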

Slide 14

Slide 14 text

Roaring Bitmaps Java, C, Go, Swift, Python, Node/JavaScript, Rust, C# interoperable Roaring bitmaps

Slide 15

Slide 15 text

Roaring bitmaps (http://roaringbitmap.org/) are found in: Apache Lucene and derivative systems such as Solr and Elasticsearch, Apache Druid, Apache Spark, Yandex ClickHouse, Netflix Atlas, LinkedIn Pinot, Whoosh, Microsoft Visual Studio Team Services (VSTS), Intel's Optimized Analytics Package (OAP), eBay's Apache Kylin, and many more!!!

Slide 16

Slide 16 text

Several papers
Roaring Bitmaps: Implementation of an Optimized Software Library, Software: Practice and Experience 48 (4), April 2018.
Better bitmap performance with Roaring bitmaps, Software: Practice and Experience 46 (5), May 2016.
Consistently faster and smaller compressed bitmaps with Roaring, Software: Practice and Experience 46 (11), November 2016.

Slide 17

Slide 17 text

Hybrid model: a set of containers
sorted arrays ({1,20,144})
bitset (0b10000101011)
runs ([0,10],[15,20])
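The three container kinds can be sketched behind one interface. This is an illustration under assumptions: the `container` interface and type names are hypothetical, values inside a container are taken to be 16-bit, and the actual libraries differ in detail.

```go
package main

import (
	"fmt"
	"sort"
)

// container is a hypothetical common interface for the three kinds.
type container interface {
	contains(x uint16) bool
}

// arrayContainer holds sorted values; suited to sparse data.
type arrayContainer []uint16

func (a arrayContainer) contains(x uint16) bool {
	i := sort.Search(len(a), func(i int) bool { return a[i] >= x })
	return i < len(a) && a[i] == x
}

// bitsetContainer has one bit per 16-bit value (1024 words = 8 kB).
type bitsetContainer [1024]uint64

func (b *bitsetContainer) contains(x uint16) bool {
	return b[x/64]&(1<<(x%64)) != 0
}

// interval is an inclusive range [start, end].
type interval struct{ start, end uint16 }

// runContainer holds sorted, non-overlapping intervals; suited to long runs.
type runContainer []interval

func (r runContainer) contains(x uint16) bool {
	// Find the first interval ending at or after x, then check its start.
	i := sort.Search(len(r), func(i int) bool { return r[i].end >= x })
	return i < len(r) && r[i].start <= x
}

func main() {
	bs := &bitsetContainer{}
	bs[0] = 0b10000101011
	cs := []container{
		arrayContainer{1, 20, 144},
		bs,
		runContainer{{0, 10}, {15, 20}},
	}
	for _, c := range cs {
		fmt.Printf("%T contains 20: %v\n", c, c.contains(20))
	}
}
```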

Slide 18

Slide 18 text

Roaring bitmaps

Slide 19

Slide 19 text

Format specification See https://github.com/RoaringBitmap/RoaringFormatSpec

Slide 20

Slide 20 text

Roaring
All containers are small (8 kB) and fit in CPU cache.
We predict the output container type during computations.
E.g., when an array gets too large, we switch to a bitset.
The union of two large arrays is materialized as a bitset...
Dozens of heuristics... sorting networks and so on.
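The array-to-bitset switch can be sketched as below. The `container` type is hypothetical, and the 4096 cutoff is an assumption chosen because 4096 16-bit values occupy 8 kB, the same space as a 65536-bit bitset; the real libraries apply more heuristics than this.

```go
package main

import (
	"fmt"
	"sort"
)

// maxArraySize is the assumed cutoff: beyond it, a bitset is no larger than the array.
const maxArraySize = 4096

// container is a hypothetical container holding either a sorted array or a bitset.
type container struct {
	array  []uint16 // used while the container is sparse
	bitset []uint64 // non-nil once the container has been converted
}

// add inserts x, converting to a bitset once the array grows past the cutoff.
func (c *container) add(x uint16) {
	if c.bitset != nil {
		c.bitset[x/64] |= 1 << (x % 64)
		return
	}
	i := sort.Search(len(c.array), func(i int) bool { return c.array[i] >= x })
	if i < len(c.array) && c.array[i] == x {
		return // already present
	}
	c.array = append(c.array, 0)
	copy(c.array[i+1:], c.array[i:])
	c.array[i] = x
	if len(c.array) > maxArraySize {
		c.toBitset()
	}
}

// toBitset copies every array value into a 1024-word bitset.
func (c *container) toBitset() {
	c.bitset = make([]uint64, 1024)
	for _, v := range c.array {
		c.bitset[v/64] |= 1 << (v % 64)
	}
	c.array = nil
}

func (c *container) isBitset() bool { return c.bitset != nil }

func main() {
	var c container
	for x := 0; x <= maxArraySize; x++ {
		c.add(uint16(2 * x))
	}
	fmt.Println(c.isBitset())
}
```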

Slide 21

Slide 21 text

Use Roaring for bitmap compression whenever possible. Do not use other bitmap compression methods (Wang et al., SIGMOD 2017)

Slide 22

Slide 22 text

Go issues

Slide 23

Slide 23 text

Go is shy about inlining
Won't inline some small functions that contain a branch?
func (b *BitSet) Set(i uint) *BitSet {
	b.extendSetMaybe(i)
	b.set[i>>log2WordSize] |= 1 << (i & (wordSize - 1))
	return b
}
https://lemire.me/blog/2017/09/05/go-does-not-inline-functions-when-it-should/

Slide 24

Slide 24 text

Go guards too much: bits.OnesCount64(x)

Slide 25

Slide 25 text

0x1093534 0fb63d22810c00 MOVZX 0xc8122(IP), DI
0x109353b 4084ff TESTL DI, DI
0x109353e 7407 JE 0x1093547
0x1093540 f3480fb8f6 POPCNT SI, SI
0x1093545 ebd6 JMP 0x109351d
0x1093547 4889442418 MOVQ AX, 0x18(SP)
0x109354c 4889542410 MOVQ DX, 0x10(SP)
0x1093551 48894c2420 MOVQ CX, 0x20(SP)
0x1093556 48893424 MOVQ SI, 0(SP)
0x109355a e801ffffff CALL math/bits.OnesCount64(SB)
0x109355f 488b742408 MOVQ 0x8(SP), SI
0x1093564 488b442418 MOVQ 0x18(SP), AX
0x1093569 488b4c2420 MOVQ 0x20(SP), CX
0x109356e 488b542410 MOVQ 0x10(SP), DX
0x1093573 488b5c2440 MOVQ 0x40(SP), BX
0x1093578 eba3 JMP 0x109351d

Slide 26

Slide 26 text

Thankfully assembly in Go is "easy"
TEXT ·popcntOrSliceAsm(SB),4,$0-56
	XORQ AX, AX
	MOVQ s+0(FP), SI
	MOVQ s_len+8(FP), CX
	TESTQ CX, CX
	JZ popcntOrSliceEnd
	MOVQ m+24(FP), DI
popcntOrSliceLoop:
	MOVQ (DI), DX
	ORQ (SI), DX
	POPCNTQ_DX_DX
	ADDQ DX, AX
	ADDQ $8, SI
	ADDQ $8, DI
	LOOP popcntOrSliceLoop
popcntOrSliceEnd:
	MOVQ AX, ret+48(FP)
	RET
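For comparison, a pure-Go sketch of what the assembly routine computes: the number of bits set in the word-wise OR of two slices. The function name `popcntOrSlice` is an assumption modeled on the assembly symbol, and this version relies on the standard `math/bits` package rather than a hand-written POPCNT.

```go
package main

import (
	"fmt"
	"math/bits"
)

// popcntOrSlice counts the bits set in the word-wise OR of s and m.
// Like the assembly, it loops len(s) times and assumes m is at least as long as s.
func popcntOrSlice(s, m []uint64) uint64 {
	var cnt uint64
	for i := range s {
		cnt += uint64(bits.OnesCount64(s[i] | m[i]))
	}
	return cnt
}

func main() {
	s := []uint64{0b1011, 0}
	m := []uint64{0b0100, 1 << 63}
	fmt.Println(popcntOrSlice(s, m))
}
```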

Slide 27

Slide 27 text

But may not work in the cloud.

Slide 28

Slide 28 text

Fast serialization
buf := &bytes.Buffer{}
_, err := rb.WriteTo(buf)

Slide 29

Slide 29 text

Fast deserialization: no memory allocation, no copy!
r := NewBitmap()
_, err = r.FromBuffer(buf.Bytes())

Slide 30

Slide 30 text

Casting a slice is tricky
func byteSliceAsUint16Slice(slice []byte) (result []uint16) { // here we create a new slice holder
	if len(slice)%2 != 0 {
		panic("Slice size should be divisible by 2")
	}
	// reference: https://go101.org/article/unsafe.html
	// make a new slice header
	bHeader := (*reflect.SliceHeader)(unsafe.Pointer(&slice))
	rHeader := (*reflect.SliceHeader)(unsafe.Pointer(&result))
	// transfer the data from the given slice to a new variable (our result)
	rHeader.Data = bHeader.Data
	rHeader.Len = bHeader.Len / 2
	rHeader.Cap = bHeader.Cap / 2
	// instantiate result and use KeepAlive so data isn't unmapped.
	runtime.KeepAlive(&slice) // still crucial: otherwise the GC can free it
	// return result
	return
}
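On newer Go versions the same cast can be written more briefly. The sketch below assumes Go 1.17 or later, where `unsafe.Slice` is available; it is an alternative to the header manipulation, not the code used in the talk. The result shares memory with the input, its capacity equals its length, and the values read depend on the machine's byte order.

```go
package main

import (
	"fmt"
	"unsafe"
)

// byteSliceAsUint16Slice reinterprets b as []uint16 without copying.
func byteSliceAsUint16Slice(b []byte) []uint16 {
	if len(b)%2 != 0 {
		panic("slice size should be divisible by 2")
	}
	if len(b) == 0 {
		return nil // &b[0] would panic on an empty slice
	}
	return unsafe.Slice((*uint16)(unsafe.Pointer(&b[0])), len(b)/2)
}

func main() {
	b := []byte{1, 0, 2, 0}
	u := byteSliceAsUint16Slice(b)
	u[1] = 0xFFFF // writes through to b[2] and b[3]
	fmt.Println(len(u), b[2], b[3])
}
```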

Slide 31

Slide 31 text

Iterators: don't drink from straws

Slide 32

Slide 32 text

Old School
it := b.Iterator()
for it.HasNext() {
	...
}

Slide 33

Slide 33 text

Batched Iterations
buf := make([]uint32, 4096)
...
for n := it.NextMany(buf); n != 0; n = it.NextMany(buf) {
	for _, v := range buf[:n] {
		...
	}
}
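To make the calling pattern concrete, here is a self-contained sketch of a batched iterator over a plain uncompressed bitset. The `manyIterator` type is hypothetical and only mimics the `NextMany` pattern shown above; it is not the roaring library's implementation.

```go
package main

import (
	"fmt"
	"math/bits"
)

// manyIterator is a hypothetical batched iterator over 64-bit words.
type manyIterator struct {
	words []uint64
	index int    // index of the word being decoded
	word  uint64 // bits of that word not yet returned
}

func newManyIterator(words []uint64) *manyIterator {
	it := &manyIterator{words: words}
	if len(words) > 0 {
		it.word = words[0]
	}
	return it
}

// NextMany fills buf with the next values and returns how many were written;
// it returns 0 once the set is exhausted.
func (it *manyIterator) NextMany(buf []uint32) int {
	n := 0
	for n < len(buf) && it.index < len(it.words) {
		if it.word == 0 {
			// Current word is used up: move to the next one.
			it.index++
			if it.index < len(it.words) {
				it.word = it.words[it.index]
			}
			continue
		}
		buf[n] = uint32(it.index*64 + bits.TrailingZeros64(it.word))
		it.word &= it.word - 1 // clear the lowest set bit
		n++
	}
	return n
}

func main() {
	it := newManyIterator([]uint64{0b11011, 1 << 5})
	buf := make([]uint32, 4)
	for n := it.NextMany(buf); n != 0; n = it.NextMany(buf) {
		fmt.Println(buf[:n])
	}
}
```

The inner loop over `buf[:n]` then runs on a plain slice, so the per-value cost of a method call is paid once per batch instead of once per value.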

Slide 34

Slide 34 text

BENCH_REAL_DATA=1 go test -bench BenchmarkRealData -run -
BenchmarkRealDataNext/census1881-4 8479939 ns/op
BenchmarkRealDataNextMany/census1881-4 1057743 ns/op
Batched iterators can be 8 times faster!

Slide 35

Slide 35 text

To learn more...
Blog (twice a week): https://lemire.me/blog/
GitHub: https://github.com/lemire
Home page: https://lemire.me/en/
CRSNG: Faster Compressed Indexes On Next-Generation Hardware (2017-2022)
Twitter: @lemire