Fast indexes with roaring #gomtl-10

Daniel Lemire
November 14, 2019

Presentation on Roaring bitmaps for the Go Montreal meetup (Go 10th anniversary).

Roaring bitmaps are a standard indexing data structure, widely used
in search and database engines. For example, Lucene, the search
engine powering Wikipedia, relies on Roaring. The Go library roaring
implements Roaring bitmaps in Go. It is used in production in several
popular systems such as InfluxDB, Pilosa and Bleve, and it is part of
the Awesome Go collection. After presenting the library, we will
cover some advanced Go topics such as the use of assembly language,
unsafe mappings, and so forth.

Transcript

  1. Fast indexes with roaring Daniel Lemire and collaborators blog: https://lemire.me

    twitter: @lemire Université du Québec (TÉLUQ) Montreal Fast indexes with roaring - Daniel Lemire #gomtl-10 November 19th.
  2. The roaring Go library is used by

    Cloud Torrent, runv, InfluxDB, Pilosa, Bleve, lindb, Elasticell, SourceGraph, M3 and trident. Part of the Awesome Go collection.
  3. Sets

    A fundamental concept (sets of documents, identifiers, tuples...). For performance, we often work with sets of integers (identifiers).
  4. Tests: $x \in S$? Intersections: $S_2 \cap S_1$, unions: $S_2 \cup S_1$, differences: $S_2 \setminus S_1$.

    Similarity (Jaccard/Tanimoto): $|S_1 \cap S_2| / |S_1 \cup S_2|$. Iteration.
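The Jaccard/Tanimoto similarity on slide 4 can be sketched with ordinary Go maps before any bitmap is involved. The names `intSet`, `newSet`, `intersectionSize` and `jaccard` are made up for this illustration and are not part of the roaring library; the union size is obtained from the identity |A ∪ B| = |A| + |B| − |A ∩ B|.

```go
package main

import "fmt"

// intSet is a hypothetical set type backed by a Go map.
type intSet map[int]struct{}

// newSet builds a set from the given values.
func newSet(values ...int) intSet {
	s := make(intSet, len(values))
	for _, v := range values {
		s[v] = struct{}{}
	}
	return s
}

// intersectionSize counts the values present in both sets,
// iterating over the smaller of the two.
func intersectionSize(a, b intSet) int {
	if len(a) > len(b) {
		a, b = b, a
	}
	n := 0
	for v := range a {
		if _, ok := b[v]; ok {
			n++
		}
	}
	return n
}

// jaccard returns |a ∩ b| / |a ∪ b|.
func jaccard(a, b intSet) float64 {
	inter := intersectionSize(a, b)
	union := len(a) + len(b) - inter
	if union == 0 {
		return 1 // convention chosen here for two empty sets
	}
	return float64(inter) / float64(union)
}

func main() {
	s1 := newSet(0, 1, 3, 4)
	s2 := newSet(1, 3, 5)
	fmt.Println(intersectionSize(s1, s2), jaccard(s1, s2))
}
```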
  5. How to implement sets?

    hash tables ( map[int]bool{} ); bitmaps: willf/bitset; compressed bitmaps.
  6. Hash tables

    In-order access is kind of terrible: the values sit in a scrambled order such as [15, 3, 0, 6, 11, 4, 5, 9, 12, 13, 8, 2, 1, 14, 10, 7], so visiting 1, 2, 3, 4, 5, 6 in turn jumps between scattered positions.
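In Go specifically, the iteration order of a map is unspecified, so getting the members of a map-based set in increasing order means copying the keys out and sorting them. A minimal sketch, where `sortedKeys` is a helper name invented for this example:

```go
package main

import (
	"fmt"
	"sort"
)

// sortedKeys returns the members of a map-based set in increasing order.
// The extra copy and the sort are the price of in-order access.
func sortedKeys(set map[int]bool) []int {
	keys := make([]int, 0, len(set))
	for k := range set {
		keys = append(keys, k)
	}
	sort.Ints(keys)
	return keys
}

func main() {
	set := map[int]bool{15: true, 3: true, 0: true, 6: true, 11: true, 4: true}
	fmt.Println(sortedKeys(set)) // [0 3 4 6 11 15]
}
```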
  7. Bitmaps

    Efficient way to represent sets of integers. For example, 0, 1, 3, 4 becomes 0b11011 or "27".
    {0} → 0b00001
    {0, 3} → 0b01001
    {0, 3, 4} → 0b11001
    {0, 1, 3, 4} → 0b11011
  8. Manipulate a bitmap

    64-bit processor. Given x, the word index is x / 64 and the bit index is x % 64.
    add(x) { array[x / 64] |= (1 << (x % 64)) }
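The `add` pseudocode on slide 8 can be turned into a small runnable Go type. This is only a sketch: the `bitset` type and its methods are hypothetical names for this example (not the willf/bitset API), and the growth strategy is the simplest one that works.

```go
package main

import "fmt"

// bitset is a hypothetical uncompressed bitmap: bit x lives in
// word x/64 at position x%64.
type bitset []uint64

// add sets bit x, growing the slice when the word does not exist yet.
func (b *bitset) add(x uint) {
	word := x / 64
	for uint(len(*b)) <= word {
		*b = append(*b, 0)
	}
	(*b)[word] |= 1 << (x % 64)
}

// contains reports whether bit x is set.
func (b bitset) contains(x uint) bool {
	word := x / 64
	return word < uint(len(b)) && b[word]&(1<<(x%64)) != 0
}

func main() {
	var b bitset
	for _, x := range []uint{0, 1, 3, 4} {
		b.add(x)
	}
	fmt.Println(b[0])                         // 27, i.e. 0b11011
	fmt.Println(b.contains(3), b.contains(2)) // true false
}
```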
  9. How fast is it?

    index = x / 64 -> a shift
    mask = 1 << (x % 64) -> a shift
    array[index] |= mask -> an OR with memory
    One bit every ≈ 1.65 cycles because of superscalarity.
  10. Bit parallelism

    The intersection between {0, 1, 3} and {1, 3} is a single AND operation between 0b1011 and 0b1010. The result is 0b1010 or {1, 3}. No branching!
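The same idea extends word by word to longer bitmaps: one AND handles 64 candidate values at a time, with no per-value branch. A minimal sketch, with `intersect` as an illustrative helper name and the shorter input deciding the output length (missing words count as all zeros):

```go
package main

import (
	"fmt"
	"math/bits"
)

// intersect ANDs two uncompressed bitmaps word by word.
// Words missing from the shorter input are treated as zero.
func intersect(a, b []uint64) []uint64 {
	n := len(a)
	if len(b) < n {
		n = len(b)
	}
	out := make([]uint64, n)
	for i := range out {
		out[i] = a[i] & b[i]
	}
	return out
}

func main() {
	a := []uint64{0b1011} // {0, 1, 3}
	b := []uint64{0b1010} // {1, 3}
	r := intersect(a, b)
	fmt.Println(r[0], bits.OnesCount64(r[0])) // 10 2
}
```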
  11. Bitsets can take too much memory

    {1, 32000, 64000}: about 8000 bytes (roughly 1000 64-bit words) for three values. We use compression!
  12. Git (GitHub) uses EWAH

    Run-length encoding. Example: 000000001111111100 is 00000000 − 11111111 − 00. It codes long runs of 0s or 1s efficiently. https://github.com/git/git/blob/master/ewah/bitmap.c
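The run decomposition in the slide 12 example can be reproduced with a few lines of Go. Note that this is plain run-length encoding over a string of '0' and '1' characters, meant only to show the principle; it is not the word-aligned EWAH format that Git uses, and `run` and `runLengthEncode` are names invented for the sketch.

```go
package main

import "fmt"

// run is a maximal sequence of identical bits.
type run struct {
	bit    byte // '0' or '1'
	length int
}

// runLengthEncode splits a string of '0'/'1' characters into runs.
func runLengthEncode(bits string) []run {
	var runs []run
	for i := 0; i < len(bits); i++ {
		if n := len(runs); n > 0 && runs[n-1].bit == bits[i] {
			runs[n-1].length++ // extend the current run
		} else {
			runs = append(runs, run{bit: bits[i], length: 1})
		}
	}
	return runs
}

func main() {
	for _, r := range runLengthEncode("000000001111111100") {
		fmt.Printf("%d x %c\n", r.length, r.bit)
	}
}
```

On the slide's input this yields three runs: eight 0s, eight 1s, two 0s.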
  13. Complexity

    Intersection: $O(|S_1| + |S_2|)$ or $O(\min(|S_1|, |S_2|))$.
    In-place union ($S_2 \leftarrow S_1 \cup S_2$): $O(|S_1| + |S_2|)$ or $O(|S_2|)$.
  14. Roaring bitmaps (http://roaringbitmap.org/) are found in: Apache Lucene and derivative

    systems such as Solr and Elasticsearch, Apache Druid, Apache Spark, Yandex ClickHouse, Netflix Atlas, LinkedIn Pinot, Whoosh, Microsoft Visual Studio Team Services (VSTS), Intel's Optimized Analytics Package (OAP), eBay's Apache Kylin, and many more!!! Roaring bitmaps
  15. Several papers

    Roaring Bitmaps: Implementation of an Optimized Software Library, Software: Practice and Experience 48 (4), April 2018.
    Better bitmap performance with Roaring bitmaps, Software: Practice and Experience 46 (5), May 2016.
    Consistently faster and smaller compressed bitmaps with Roaring, Software: Practice and Experience 46 (11), November 2016.
  16. Roaring

    All containers are small (8 kB) and fit in CPU cache.
    We predict the output container type during computations. E.g., when an array gets too large, we switch to a bitset.
    The union of two large arrays is materialized as a bitset...
    Dozens of heuristics... sorting networks and so on.
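The container logic on slide 16 can be illustrated with a simplified sketch. It assumes the layout described in the Roaring papers: 32-bit values are split into a 16-bit key and a 16-bit low part, an array container holds at most 4096 values (4096 × 2 bytes = 8 kB), and anything larger becomes an 8 kB bitset. The function names are hypothetical, and the real library applies many more heuristics (including run containers) than this shows.

```go
package main

import "fmt"

// arrayMaxSize is the array-container limit from the Roaring papers:
// beyond 4096 16-bit values, an 8 kB bitset is no larger.
const arrayMaxSize = 4096

// split returns the container key (high 16 bits) and the value
// stored inside that container (low 16 bits).
func split(x uint32) (key, low uint16) {
	return uint16(x >> 16), uint16(x)
}

// containerKind picks a representation from the cardinality alone.
func containerKind(cardinality int) string {
	if cardinality <= arrayMaxSize {
		return "array"
	}
	return "bitset"
}

// predictUnionKind guesses the output type of a union of two array
// containers from an upper bound on its size, before computing it.
func predictUnionKind(cardA, cardB int) string {
	return containerKind(cardA + cardB)
}

func main() {
	key, low := split(0x00030005)
	fmt.Println(key, low)                     // 3 5
	fmt.Println(containerKind(100))           // array
	fmt.Println(predictUnionKind(3000, 3000)) // bitset
}
```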
  17. Use Roaring for bitmap compression whenever possible.

    Do not use other bitmap compression methods (Wang et al., SIGMOD 2017).
  18. Go is shy about inlining

    Won't inline some small functions that contain a branch?

        func (b *BitSet) Set(i uint) *BitSet {
            b.extendSetMaybe(i)
            b.set[i>>log2WordSize] |= 1 << (i & (wordSize - 1))
            return b
        }

    https://lemire.me/blog/2017/09/05/go-does-not-inline-functions-when-it-should/
  19. 0x1093534 0fb63d22810c00 MOVZX 0xc8122(IP), DI
      0x109353b 4084ff         TESTL DI, DI
      0x109353e 7407           JE 0x1093547
      0x1093540 f3480fb8f6     POPCNT SI, SI
      0x1093545 ebd6           JMP 0x109351d
      0x1093547 4889442418     MOVQ AX, 0x18(SP)
      0x109354c 4889542410     MOVQ DX, 0x10(SP)
      0x1093551 48894c2420     MOVQ CX, 0x20(SP)
      0x1093556 48893424       MOVQ SI, 0(SP)
      0x109355a e801ffffff     CALL math/bits.OnesCount64(SB)
      0x109355f 488b742408     MOVQ 0x8(SP), SI
      0x1093564 488b442418     MOVQ 0x18(SP), AX
      0x1093569 488b4c2420     MOVQ 0x20(SP), CX
      0x109356e 488b542410     MOVQ 0x10(SP), DX
      0x1093573 488b5c2440     MOVQ 0x40(SP), BX
      0x1093578 eba3           JMP 0x109351d
  20. Thankfully assembly in Go is "easy"

        TEXT ·popcntOrSliceAsm(SB),4,$0-56
        XORQ AX, AX
        MOVQ s+0(FP), SI
        MOVQ s_len+8(FP), CX
        TESTQ CX, CX
        JZ popcntOrSliceEnd
        MOVQ m+24(FP), DI
        popcntOrSliceLoop:
        MOVQ (DI), DX
        ORQ (SI), DX
        POPCNTQ_DX_DX
        ADDQ DX, AX
        ADDQ $8, SI
        ADDQ $8, DI
        LOOP popcntOrSliceLoop
        popcntOrSliceEnd:
        MOVQ AX, ret+48(FP)
        RET
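For comparison, here is what the assembly routine on slide 20 computes, written in portable Go with `math/bits`: the number of set bits in the word-by-word OR of two slices. This is a sketch of the equivalent logic, not the library's actual fallback code; like the assembly, it takes the loop count from `s` and assumes `m` is at least as long.

```go
package main

import (
	"fmt"
	"math/bits"
)

// popcntOrSlice returns the number of set bits in s[i] | m[i],
// summed over all words of s. It assumes len(m) >= len(s).
func popcntOrSlice(s, m []uint64) uint64 {
	var count uint64
	for i := range s {
		count += uint64(bits.OnesCount64(s[i] | m[i]))
	}
	return count
}

func main() {
	s := []uint64{0b1011, 0}
	m := []uint64{0b0100, 1 << 63}
	fmt.Println(popcntOrSlice(s, m)) // 5
}
```

This gives the cardinality of a union without materializing the union itself.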
  21. Fast deserialization

    No memory allocation, no copy!

        r := NewBitmap()
        _, err = r.FromBuffer(buf.Bytes())
  22. Casting a slice is tricky

        func byteSliceAsUint16Slice(slice []byte) (result []uint16) {
            // the two views share memory, so the byte length must be even
            if len(slice)%2 != 0 {
                panic("Slice size should be divisible by 2")
            }
            // reference: https://go101.org/article/unsafe.html

            // access the slice headers of the input and of the result
            bHeader := (*reflect.SliceHeader)(unsafe.Pointer(&slice))
            rHeader := (*reflect.SliceHeader)(unsafe.Pointer(&result))

            // point the result at the data of the given slice (no copy)
            rHeader.Data = bHeader.Data
            rHeader.Len = bHeader.Len / 2
            rHeader.Cap = bHeader.Cap / 2

            // KeepAlive is still crucial: without it the GC can free the data
            runtime.KeepAlive(&slice)

            return
        }
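Since Go 1.17, the standard library offers `unsafe.Slice`, which expresses the same zero-copy reinterpretation without touching `reflect.SliceHeader`. The sketch below is an alternative formulation, not the code used by the roaring library; `bytesAsUint16s` is a name chosen for this example, and it assumes the byte slice starts at an address suitably aligned for `uint16`, which holds for a freshly allocated buffer but not necessarily for an arbitrary sub-slice.

```go
package main

import (
	"fmt"
	"unsafe"
)

// bytesAsUint16s reinterprets a byte slice as a []uint16 that shares
// the same memory. The values seen depend on the machine's byte order.
func bytesAsUint16s(b []byte) []uint16 {
	if len(b)%2 != 0 {
		panic("slice size should be divisible by 2")
	}
	if len(b) == 0 {
		return nil
	}
	// Assumption: &b[0] is 2-byte aligned.
	return unsafe.Slice((*uint16)(unsafe.Pointer(&b[0])), len(b)/2)
}

func main() {
	buf := make([]byte, 8)
	view := bytesAsUint16s(buf)
	view[0] = 0xFFFF // writes through to buf[0] and buf[1]
	fmt.Println(len(view), buf[0], buf[1]) // 4 255 255
}
```

Because the result keeps a pointer into the original backing array, the garbage collector keeps that array alive for as long as the returned slice is reachable.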
  23. Batched Iterations

        buf := make([]uint32, 4096)
        ...
        for n := it.NextMany(buf); n != 0; n = it.NextMany(buf) {
            for _, v := range buf[:n] {
                ...
            }
        }
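To make the loop on slide 23 concrete without depending on the library, here is a toy batched iterator over a plain bitset. The `manyIterator` type is hypothetical and only mimics the shape of a `NextMany(buf)` call as used above: it fills the caller's buffer and returns how many values it wrote, so the per-value cost of a method call is amortized over a whole batch.

```go
package main

import (
	"fmt"
	"math/bits"
)

// manyIterator is a hypothetical batched iterator over a plain bitset.
type manyIterator struct {
	words []uint64
	index int    // position of the current word
	word  uint64 // bits of the current word not yet returned
}

func newManyIterator(words []uint64) *manyIterator {
	it := &manyIterator{words: words}
	if len(words) > 0 {
		it.word = words[0]
	}
	return it
}

// NextMany fills buf with the next set values and returns how many
// it wrote; it returns 0 once the iterator is exhausted.
func (it *manyIterator) NextMany(buf []uint32) int {
	n := 0
	for n < len(buf) && it.index < len(it.words) {
		if it.word == 0 {
			// current word is used up, move to the next one
			it.index++
			if it.index < len(it.words) {
				it.word = it.words[it.index]
			}
			continue
		}
		t := bits.TrailingZeros64(it.word)
		buf[n] = uint32(it.index*64 + t)
		n++
		it.word &= it.word - 1 // clear the lowest set bit
	}
	return n
}

func main() {
	it := newManyIterator([]uint64{0b11011, 1}) // {0, 1, 3, 4, 64}
	buf := make([]uint32, 2)
	for n := it.NextMany(buf); n != 0; n = it.NextMany(buf) {
		fmt.Println(buf[:n])
	}
}
```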
  24. BENCH_REAL_DATA=1 go test -bench BenchmarkRealData -run -

        BenchmarkRealDataNext/census1881-4        8479939 ns/op
        BenchmarkRealDataNextMany/census1881-4    1057743 ns/op

    Batched iterators can be 8 times faster!
  25. To learn more...

    Blog (twice a week): https://lemire.me/blog/
    GitHub: https://github.com/lemire
    Home page: https://lemire.me/en/
    CRSNG: Faster Compressed Indexes On Next-Generation Hardware (2017-2022)
    Twitter: @lemire