Fast indexes with roaring #gomtl-10

Daniel Lemire
November 14, 2019

Presentation on Roaring bitmaps for the Go Montreal meetup (Go 10th anniversary).

Roaring bitmaps are a standard indexing data structure, widely used
in search and database engines. For example, Lucene, the search
engine powering Wikipedia, relies on Roaring. The Go library roaring
implements Roaring bitmaps in Go. It is used in production by several
popular systems, such as InfluxDB, Pilosa and Bleve, and it is part
of the Awesome Go collection. After presenting the library, we will
cover some advanced Go topics such as the use of assembly language,
unsafe mappings, and so forth.

Transcript

  1. Fast indexes with roaring

    Daniel Lemire and collaborators
    Blog: https://lemire.me
    Twitter: @lemire
    Université du Québec (TÉLUQ), Montreal

    Fast indexes with roaring - Daniel Lemire #gomtl-10 November 19th.
  2. The roaring Go library is used by

    Cloud Torrent, runv, InfluxDB, Pilosa, Bleve, lindb, Elasticell, SourceGraph, M3, trident.
    Part of the Awesome Go collection.
  3. Sets

    A fundamental concept (sets of documents, identifiers, tuples...).
    For performance, we often work with sets of integers (identifiers).
  4. Tests: $x \in S$?

    Intersections: $S_2 \cap S_1$, unions: $S_2 \cup S_1$, differences: $S_2 \setminus S_1$
    Similarity (Jaccard/Tanimoto): $|S_1 \cap S_2| / |S_1 \cup S_2|$
    Iteration
  5. How to implement sets?

    hash tables ( map[int]bool{} )
    bitmap: willf/bitset
    compressed bitmaps
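
To make the hash-table option concrete, here is a minimal sketch of a set built on `map[int]bool`, with a membership test, an intersection and the Jaccard similarity from the previous slide. The names `IntSet`, `NewIntSet`, `Intersect` and `Jaccard` are made up for this illustration and come from neither the talk nor the roaring library.

```go
package main

import "fmt"

// IntSet is a hypothetical hash-table set of integer identifiers.
type IntSet map[int]bool

// NewIntSet builds a set from a list of values.
func NewIntSet(values ...int) IntSet {
	s := make(IntSet, len(values))
	for _, v := range values {
		s[v] = true
	}
	return s
}

// Intersect returns a new set with the values present in both a and b.
// It scans the smaller set and probes the larger one.
func Intersect(a, b IntSet) IntSet {
	if len(b) < len(a) {
		a, b = b, a
	}
	out := make(IntSet)
	for v := range a {
		if b[v] {
			out[v] = true
		}
	}
	return out
}

// Jaccard returns |a ∩ b| / |a ∪ b|, or 0 when both sets are empty.
func Jaccard(a, b IntSet) float64 {
	inter := len(Intersect(a, b))
	union := len(a) + len(b) - inter
	if union == 0 {
		return 0
	}
	return float64(inter) / float64(union)
}

func main() {
	s1 := NewIntSet(0, 1, 3, 4)
	s2 := NewIntSet(1, 3, 7)
	fmt.Println(s1[3], s1[2])           // membership tests
	fmt.Println(len(Intersect(s1, s2))) // size of the intersection
	fmt.Println(Jaccard(s1, s2))        // similarity
}
```

Iterating over such a map yields the values in no particular order, which is the weakness the next slide points at.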
  6. Hash tables

    In-order access is kind of terrible: visiting the keys 1, 2, 3, 4, 5, 6 in increasing order jumps around the table.

        [15, 3, 0, 6, 11, 4, 5, 9, 12, 13, 8, 2, 1, 14, 10, 7]
  7. Bitmaps

    Efficient way to represent sets of integers. For example, 0, 1, 3, 4 becomes 0b11011 or "27".

        {0}          → 0b00001
        {0, 3}       → 0b01001
        {0, 3, 4}    → 0b11001
        {0, 1, 3, 4} → 0b11011
  8. Manipulate a bitmap

    64-bit processor. Given x, the word index is x / 64 and the bit index is x % 64.

        add(x) {
          array[x / 64] |= (1 << (x % 64))
        }
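
The pseudocode on this slide translates almost directly into Go. The sketch below is one possible rendering, not the code of willf/bitset or roaring: the `Bitset` type and its `Add` and `Contains` methods are hypothetical names, and growing the backing array on demand is an added assumption.

```go
package main

import "fmt"

// Bitset is a hypothetical uncompressed bitmap backed by 64-bit words.
type Bitset struct {
	words []uint64
}

// Add sets bit x, growing the backing array when needed.
func (b *Bitset) Add(x uint) {
	word := x / 64 // word index
	for uint(len(b.words)) <= word {
		b.words = append(b.words, 0)
	}
	b.words[word] |= 1 << (x % 64) // bit index within the word
}

// Contains reports whether bit x is set.
func (b *Bitset) Contains(x uint) bool {
	word := x / 64
	if word >= uint(len(b.words)) {
		return false
	}
	return b.words[word]&(1<<(x%64)) != 0
}

func main() {
	var b Bitset
	for _, x := range []uint{0, 1, 3, 4} {
		b.Add(x)
	}
	fmt.Println(b.words[0])                   // 27, that is 0b11011
	fmt.Println(b.Contains(3), b.Contains(2)) // true false
}
```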
  9. How fast is it?

        index = x / 64          -> a shift
        mask = 1 << (x % 64)    -> a shift
        array[index] |= mask    -> an OR with memory

    One bit every ≈ 1.65 cycles because of superscalarity.
  10. Bit parallelism

    The intersection between {0, 1, 3} and {1, 3} is a single AND operation between 0b1011 and 0b1010. The result is 0b1010 or {1, 3}. No branching!
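
The same idea extends to bitmaps that span many words: each AND processes 64 candidate values at once. The following is a small sketch under that assumption. `And` and `Cardinality` are hypothetical helper names, while `bits.OnesCount64` is the standard library population count.

```go
package main

import (
	"fmt"
	"math/bits"
)

// And returns the word-by-word intersection of two uncompressed bitmaps.
// Words beyond the shorter input cannot contribute to the result, so they are dropped.
func And(a, b []uint64) []uint64 {
	n := len(a)
	if len(b) < n {
		n = len(b)
	}
	out := make([]uint64, n)
	for i := 0; i < n; i++ {
		out[i] = a[i] & b[i] // 64 membership tests in one instruction
	}
	return out
}

// Cardinality counts the set bits across all words.
func Cardinality(words []uint64) int {
	total := 0
	for _, w := range words {
		total += bits.OnesCount64(w)
	}
	return total
}

func main() {
	a := []uint64{0b1011} // {0, 1, 3}
	b := []uint64{0b1010} // {1, 3}
	r := And(a, b)
	fmt.Printf("%b %d\n", r[0], Cardinality(r)) // 1010 2
}
```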
  11. Bitsets can take too much memory

    {1, 32000, 64000}: about 8000 bytes (roughly 1000 64-bit words) for three values.
    We use compression!
  12. Git (GitHub) uses EWAH

    Run-length encoding.
    Example: 000000001111111100 is 00000000 − 11111111 − 00
    Code long runs of 0s or 1s efficiently.
    https://github.com/git/git/blob/master/ewah/bitmap.c
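
To illustrate the run-length idea only, here is a toy encoder that splits a string of bits into runs. It is plain run-length encoding and not the EWAH format, which works on machine words and mixes run markers with literal words. The `Run` type and the `Encode` function are names invented for this sketch.

```go
package main

import "fmt"

// Run is a maximal stretch of identical bits.
type Run struct {
	Bit    byte // '0' or '1'
	Length int
}

// Encode splits a string of '0' and '1' characters into runs.
func Encode(bitString string) []Run {
	var runs []Run
	for i := 0; i < len(bitString); i++ {
		c := bitString[i]
		if n := len(runs); n > 0 && runs[n-1].Bit == c {
			runs[n-1].Length++ // extend the current run
		} else {
			runs = append(runs, Run{Bit: c, Length: 1}) // start a new run
		}
	}
	return runs
}

func main() {
	// The example from the slide: eight 0s, eight 1s, two 0s.
	for _, r := range Encode("000000001111111100") {
		fmt.Printf("%d x %c\n", r.Length, r.Bit)
	}
}
```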
  13. Complexity

    Intersection: $O(|S_1| + |S_2|)$ or $O(\min(|S_1|, |S_2|))$
    In-place union ($S_2 \leftarrow S_1 \cup S_2$): $O(|S_1| + |S_2|)$ or $O(|S_2|)$
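
As one illustration of the linear bound, the intersection of two sorted arrays can be computed with a single merge-style pass that advances one cursor per comparison, so it performs at most $|S_1| + |S_2|$ steps. The slide does not say which algorithm each bound refers to, so pairing the linear bound with this merge is an assumption, and `IntersectSorted` is a hypothetical name.

```go
package main

import "fmt"

// IntersectSorted returns the common values of two sorted, duplicate-free slices.
// Each loop iteration advances at least one cursor, hence at most len(a)+len(b) steps.
func IntersectSorted(a, b []uint32) []uint32 {
	var out []uint32
	i, j := 0, 0
	for i < len(a) && j < len(b) {
		switch {
		case a[i] < b[j]:
			i++
		case a[i] > b[j]:
			j++
		default: // equal: the value belongs to both sets
			out = append(out, a[i])
			i++
			j++
		}
	}
	return out
}

func main() {
	fmt.Println(IntersectSorted([]uint32{0, 1, 3, 4}, []uint32{1, 3, 7})) // [1 3]
}
```

Scanning the smaller set and probing a hash table built on the larger one is a common way to approach the $O(\min(|S_1|, |S_2|))$ bound instead.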
  14. Roaring Bitmaps

    Java, C, Go, Swift, Python, Node/JavaScript, Rust, C#: interoperable

    Roaring bitmaps
  15. Roaring bitmaps (http://roaringbitmap.org/) are found in:

    Apache Lucene and derivative systems such as Solr and Elasticsearch,
    Apache Druid,
    Apache Spark,
    Yandex ClickHouse,
    Netflix Atlas,
    LinkedIn Pinot,
    Whoosh,
    Microsoft Visual Studio Team Services (VSTS),
    Intel's Optimized Analytics Package (OAP),
    eBay's Apache Kylin,
    and many more!
  16. Several papers

    Roaring Bitmaps: Implementation of an Optimized Software Library, Software: Practice and Experience 48 (4), April 2018.
    Better bitmap performance with Roaring bitmaps, Software: Practice and Experience 46 (5), May 2016.
    Consistently faster and smaller compressed bitmaps with Roaring, Software: Practice and Experience 46 (11), November 2016.
  17. Hybrid model

    Set of containers:
    sorted arrays ({1,20,144})
    bitset (0b10000101011)
    runs ([0,10],[15,20])
  18. Roaring bitmaps

  19. Format specification

    See https://github.com/RoaringBitmap/RoaringFormatSpec

  20. Roaring

    All containers are small (8 kB) and fit in CPU cache.
    We predict the output container type during computations.
    E.g., when an array gets too large, we switch to a bitset.
    The union of two large arrays is materialized as a bitset...
    Dozens of heuristics... sorting networks and so on.
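
The array-to-bitset switch can be sketched as follows. This is a simplified illustration and not the roaring library's code: the `container` type, `split`, `add` and `contains` are hypothetical, run containers are left out, and the cut-over at 4096 values is an assumption chosen because 4096 16-bit values occupy 8 kB, the same space as a 65536-bit bitset.

```go
package main

import (
	"fmt"
	"sort"
)

// arrayMaxSize is the assumed cut-over point: 4096 16-bit values take 8 kB,
// the same space as a 65536-bit bitset.
const arrayMaxSize = 4096

// container is a hypothetical holder for the low 16 bits of all values
// that share the same high 16 bits.
type container struct {
	array  []uint16 // sorted values, used while the container is small
	bitset []uint64 // 1024 words (8 kB), used once the array grows too large
}

// split separates a 32-bit value into a container key and a 16-bit payload.
func split(x uint32) (key, low uint16) {
	return uint16(x >> 16), uint16(x)
}

// find returns the position where low is, or would be, in the sorted array.
func (c *container) find(low uint16) int {
	return sort.Search(len(c.array), func(k int) bool { return c.array[k] >= low })
}

func (c *container) add(low uint16) {
	if c.bitset != nil {
		c.bitset[low/64] |= 1 << (low % 64)
		return
	}
	i := c.find(low)
	if i < len(c.array) && c.array[i] == low {
		return // already present
	}
	if len(c.array) == arrayMaxSize {
		// The array is full: materialize a bitset and drop the array.
		c.bitset = make([]uint64, 1024)
		for _, v := range c.array {
			c.bitset[v/64] |= 1 << (v % 64)
		}
		c.array = nil
		c.bitset[low/64] |= 1 << (low % 64)
		return
	}
	// Insert at position i to keep the array sorted.
	c.array = append(c.array, 0)
	copy(c.array[i+1:], c.array[i:])
	c.array[i] = low
}

func (c *container) contains(low uint16) bool {
	if c.bitset != nil {
		return c.bitset[low/64]&(1<<(low%64)) != 0
	}
	i := c.find(low)
	return i < len(c.array) && c.array[i] == low
}

func main() {
	c := &container{}
	for v := 0; v <= arrayMaxSize; v++ { // 4097 distinct values
		c.add(uint16(v))
	}
	key, low := split(0x00010002)
	fmt.Println(key, low, c.bitset != nil) // 1 2 true
}
```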
  21. Use Roaring for bitmap compression whenever possible.

    Do not use other bitmap compression methods (Wang et al., SIGMOD 2017).
  22. Go issues

  23. Go is shy about inlining

    Won't inline some small functions that contain a branch?

        func (b *BitSet) Set(i uint) *BitSet {
            b.extendSetMaybe(i)
            b.set[i>>log2WordSize] |= 1 << (i & (wordSize - 1))
            return b
        }

    https://lemire.me/blog/2017/09/05/go-does-not-inline-functions-when-it-should/
  24. Go guards too much

        bits.OnesCount64(x)

  25. 0x1093534  0fb63d22810c00  MOVZX 0xc8122(IP), DI
      0x109353b  4084ff          TESTL DI, DI
      0x109353e  7407            JE 0x1093547
      0x1093540  f3480fb8f6      POPCNT SI, SI
      0x1093545  ebd6            JMP 0x109351d
      0x1093547  4889442418      MOVQ AX, 0x18(SP)
      0x109354c  4889542410      MOVQ DX, 0x10(SP)
      0x1093551  48894c2420      MOVQ CX, 0x20(SP)
      0x1093556  48893424        MOVQ SI, 0(SP)
      0x109355a  e801ffffff      CALL math/bits.OnesCount64(SB)
      0x109355f  488b742408      MOVQ 0x8(SP), SI
      0x1093564  488b442418      MOVQ 0x18(SP), AX
      0x1093569  488b4c2420      MOVQ 0x20(SP), CX
      0x109356e  488b542410      MOVQ 0x10(SP), DX
      0x1093573  488b5c2440      MOVQ 0x40(SP), BX
      0x1093578  eba3            JMP 0x109351d
  26. Thankfully assembly in Go is "easy"

        TEXT ·popcntOrSliceAsm(SB),4,$0-56
        XORQ AX, AX
        MOVQ s+0(FP), SI
        MOVQ s_len+8(FP), CX
        TESTQ CX, CX
        JZ popcntOrSliceEnd
        MOVQ m+24(FP), DI
        popcntOrSliceLoop:
        MOVQ (DI), DX
        ORQ (SI), DX
        POPCNTQ_DX_DX
        ADDQ DX, AX
        ADDQ $8, SI
        ADDQ $8, DI
        LOOP popcntOrSliceLoop
        popcntOrSliceEnd:
        MOVQ AX, ret+48(FP)
        RET
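
For readers less familiar with Go assembly, the routine above appears to compute the number of set bits in the word-by-word OR of two slices, with the length of `s` driving the loop. A portable sketch of the same computation in plain Go could look like this. The function name `popcntOrSlice` is chosen for this illustration, and the sketch makes no claim about matching the assembly's speed.

```go
package main

import (
	"fmt"
	"math/bits"
)

// popcntOrSlice returns the number of set bits in the word-by-word OR of s and m.
// Like the assembly routine, it walks len(s) words, so m must be at least as long as s.
func popcntOrSlice(s, m []uint64) uint64 {
	var total uint64
	for i := range s {
		total += uint64(bits.OnesCount64(s[i] | m[i]))
	}
	return total
}

func main() {
	s := []uint64{0b1011, 0}
	m := []uint64{0b0100, 1 << 63}
	fmt.Println(popcntOrSlice(s, m)) // 5
}
```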
  27. But may not work in the cloud.

  28. Fast serialization

        buf := &bytes.Buffer{}
        _, err := rb.WriteTo(buf)
  29. Fast deserialization

    No memory allocation, no copy!

        r := NewBitmap()
        _, err = r.FromBuffer(buf.Bytes())
  30. Casting a slice is tricky

        func byteSliceAsUint16Slice(slice []byte) (result []uint16) {
            // here we create a new slice holder
            if len(slice)%2 != 0 {
                panic("Slice size should be divisible by 2")
            }
            // reference: https://go101.org/article/unsafe.html

            // make a new slice header
            bHeader := (*reflect.SliceHeader)(unsafe.Pointer(&slice))
            rHeader := (*reflect.SliceHeader)(unsafe.Pointer(&result))

            // transfer the data from the given slice to a new variable (our result)
            rHeader.Data = bHeader.Data
            rHeader.Len = bHeader.Len / 2
            rHeader.Cap = bHeader.Cap / 2

            // instantiate result and use KeepAlive so data isn't unmapped.
            runtime.KeepAlive(&slice) // it is still crucial, GC can free it

            // return result
            return
        }
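
Go releases after this talk added `unsafe.Slice` (Go 1.17), which can express the same reinterpretation without editing slice headers by hand. The sketch below is an alternative under stated assumptions and not the roaring library's code: `bytesAsUint16s` is a hypothetical name, the input length must be even, the data is assumed to be suitably aligned for `uint16`, and the resulting capacity equals the new length rather than half the original capacity.

```go
package main

import (
	"fmt"
	"unsafe"
)

// bytesAsUint16s reinterprets b as a []uint16 without copying.
// Assumptions: Go 1.17 or later, len(b) is even, data aligned for uint16.
func bytesAsUint16s(b []byte) []uint16 {
	if len(b)%2 != 0 {
		panic("slice size should be divisible by 2")
	}
	if len(b) == 0 {
		return nil
	}
	// The result points into b's backing array, which keeps that array alive.
	return unsafe.Slice((*uint16)(unsafe.Pointer(&b[0])), len(b)/2)
}

func main() {
	b := []byte{0x01, 0x01, 0x02, 0x02}
	u := bytesAsUint16s(b)
	fmt.Println(u) // [257 514] regardless of byte order
	u[0] = 0
	fmt.Println(b[:2]) // [0 0]: both slices share the same memory
}
```

The example bytes are chosen so that the printed values do not depend on the machine's byte order; with arbitrary data the result does.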
  31. Iterators: don't drink from straws

  32. Old School

        it := b.Iterator()
        for it.HasNext() {
            ...
        }
  33. Batched Iterations

        buf := make([]uint32, 4096)
        ...
        for n := it.NextMany(buf); n != 0; n = it.NextMany(buf) {
            for _, v := range buf[:n] {
                ...
            }
        }
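
To show what a batched call does behind the loop above, here is a toy iterator over an uncompressed bitmap that fills a caller-supplied buffer. It is a sketch of the technique and not the roaring implementation: the `iterator` type and `newIterator` are hypothetical, and only the `NextMany(buf) int` shape is taken from the slide.

```go
package main

import (
	"fmt"
	"math/bits"
)

// iterator walks the set bits of an uncompressed bitmap in increasing order.
type iterator struct {
	words []uint64
	idx   int    // index of the current word
	cur   uint64 // bits of the current word not yet emitted
}

func newIterator(words []uint64) *iterator {
	it := &iterator{words: words}
	if len(words) > 0 {
		it.cur = words[0]
	}
	return it
}

// NextMany writes up to len(buf) values into buf and returns how many it wrote.
// With a non-empty buf, a return value of 0 means the iterator is exhausted.
func (it *iterator) NextMany(buf []uint32) int {
	n := 0
	for n < len(buf) && it.idx < len(it.words) {
		if it.cur == 0 {
			// Current word is drained: move on to the next one.
			it.idx++
			if it.idx < len(it.words) {
				it.cur = it.words[it.idx]
			}
			continue
		}
		buf[n] = uint32(it.idx*64 + bits.TrailingZeros64(it.cur))
		it.cur &= it.cur - 1 // clear the lowest set bit
		n++
	}
	return n
}

func main() {
	it := newIterator([]uint64{0b1011, 0, 1}) // {0, 1, 3, 128}
	buf := make([]uint32, 2)
	for n := it.NextMany(buf); n != 0; n = it.NextMany(buf) {
		fmt.Println(buf[:n])
	}
}
```

One call per batch replaces one call per value, which is where a batched interface can save time when calls are not inlined.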
  34. BENCH_REAL_DATA=1 go test -bench BenchmarkRealData -run -

        BenchmarkRealDataNext/census1881-4        8479939 ns/op
        BenchmarkRealDataNextMany/census1881-4    1057743 ns/op

    Batched iterators can be 8 times faster!
  35. To learn more...

    Blog (twice a week): https://lemire.me/blog/
    GitHub: https://github.com/lemire
    Home page: https://lemire.me/en/
    CRSNG: Faster Compressed Indexes On Next-Generation Hardware (2017-2022)
    Twitter: @lemire