
Fast indexes with roaring #gomtl-10

Daniel Lemire
November 14, 2019

Presentation on Roaring bitmaps for the Go Montreal meetup (Go 10th anniversary).

Roaring bitmaps are a standard indexing data structure, widely used in
search and database engines. For example, Lucene, the search engine
powering Wikipedia, relies on Roaring. The Go library roaring implements
Roaring bitmaps in Go. It is used in production in several popular
systems, such as InfluxDB, Pilosa and Bleve, and it is part of the
Awesome Go collection. After presenting the library, we will cover some
advanced Go topics such as the use of assembly language and unsafe
mappings.

Transcript

  1. Fast indexes with roaring
    Daniel Lemire and collaborators
    blog: https://lemire.me
    twitter: @lemire
    Université du Québec (TÉLUQ)
    Montreal
    Fast indexes with roaring - Daniel Lemire #gomtl-10 November 19th.

  2. The roaring Go library is used by
    Cloud Torrent
    runv
    InfluxDB
    Pilosa
    Bleve
    lindb
    Elasticell
    SourceGraph
    M3
    trident
    Part of the Awesome Go collection.

  3. Sets
    A fundamental concept (sets of documents, identifiers, tuples...)
    For performance, we often work with sets of integers (identifiers).

  4. tests: x ∈ S?
    intersections: S2 ∩ S1, unions: S2 ∪ S1, differences: S2 ∖ S1
    Similarity (Jaccard/Tanimoto): |S1 ∩ S2| / |S1 ∪ S2|
    Iteration
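
The operations listed above can be sketched with a plain hash-based set. This is only an illustration: the `Set` type and its helper functions are hypothetical names introduced here, not part of the roaring library.

```go
package main

import "fmt"

// Set is a plain hash-based set of identifiers (hypothetical, for illustration).
type Set map[uint32]struct{}

// NewSet builds a set from the given values.
func NewSet(values ...uint32) Set {
	s := make(Set, len(values))
	for _, v := range values {
		s[v] = struct{}{}
	}
	return s
}

// Contains tests membership: x ∈ S.
func (s Set) Contains(x uint32) bool {
	_, ok := s[x]
	return ok
}

// IntersectionSize returns |a ∩ b| by scanning the smaller set.
func IntersectionSize(a, b Set) int {
	if len(a) > len(b) {
		a, b = b, a
	}
	n := 0
	for v := range a {
		if b.Contains(v) {
			n++
		}
	}
	return n
}

// Jaccard returns |a ∩ b| / |a ∪ b|, using |a ∪ b| = |a| + |b| - |a ∩ b|.
// Two empty sets are given similarity 1 by convention in this sketch.
func Jaccard(a, b Set) float64 {
	inter := IntersectionSize(a, b)
	union := len(a) + len(b) - inter
	if union == 0 {
		return 1
	}
	return float64(inter) / float64(union)
}

func main() {
	s1 := NewSet(0, 1, 3, 4)
	s2 := NewSet(1, 3, 5)
	fmt.Println(IntersectionSize(s1, s2), Jaccard(s1, s2))
}
```

Note that the similarity only needs the size of the intersection, so no intermediate set has to be materialized.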

  5. How to implement sets?
    hash tables (map[int]bool{})
    bitmap: willf/bitset
    compressed bitmaps

  6. Hash tables
    in-order access is kind of terrible (the element visited at each step is in parentheses)
    [15, 3, 0, 6, 11, 4, 5, 9, 12, 13, 8, 2, (1), 14, 10, 7]
    [15, 3, 0, 6, 11, 4, 5, 9, 12, 13, 8, (2), 1, 14, 10, 7]
    [15, (3), 0, 6, 11, 4, 5, 9, 12, 13, 8, 2, 1, 14, 10, 7]
    [15, 3, 0, 6, 11, (4), 5, 9, 12, 13, 8, 2, 1, 14, 10, 7]
    [15, 3, 0, 6, 11, 4, (5), 9, 12, 13, 8, 2, 1, 14, 10, 7]
    [15, 3, 0, (6), 11, 4, 5, 9, 12, 13, 8, 2, 1, 14, 10, 7]

  7. Bitmaps
    Efficient way to represent sets of integers.
    For example, 0, 1, 3, 4 becomes 0b11011 or "27".
    {0} → 0b00001
    {0, 3} → 0b01001
    {0, 3, 4} → 0b11001
    {0, 1, 3, 4} → 0b11011

  8. Manipulate a bitmap
    64-bit processor.
    Given x, word index is x/64 and bit index x % 64.
    add(x) {
        array[x / 64] |= (1 << (x % 64))
    }
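
The pseudocode above translates almost directly into Go. The following is a minimal sketch, assuming a hypothetical `Bitset` type that grows on demand; it is not the code of willf/bitset or of roaring.

```go
package main

import "fmt"

// Bitset is an uncompressed bitmap stored as 64-bit words (hypothetical type).
type Bitset struct {
	words []uint64
}

// Add sets bit x, growing the word array when x falls beyond it.
func (b *Bitset) Add(x uint) {
	word := x / 64
	for uint(len(b.words)) <= word {
		b.words = append(b.words, 0)
	}
	b.words[word] |= 1 << (x % 64)
}

// Contains reports whether bit x is set.
func (b *Bitset) Contains(x uint) bool {
	word := x / 64
	if word >= uint(len(b.words)) {
		return false
	}
	return b.words[word]&(1<<(x%64)) != 0
}

func main() {
	var b Bitset
	for _, x := range []uint{0, 1, 3, 4} {
		b.Add(x)
	}
	// The first word now holds the set {0, 1, 3, 4}.
	fmt.Printf("%#b %d\n", b.words[0], b.words[0])
}
```

Because 64 is a power of two, the compiler can turn the division and the modulo into a shift and a mask.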

  9. How fast is it?
    index = x / 64 -> a shift
    mask = 1 << ( x % 64) -> a shift
    array[ index ] |= mask -> an OR with memory
    One bit every ≈ 1.65 cycles because of superscalarity

  10. Bit parallelism
    Intersection between {0, 1, 3} and {1, 3}:
    a single AND operation between 0b1011 and 0b1010.
    Result is 0b1010 or {1, 3}.
    No branching!
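
As a rough sketch of this bit parallelism, the loop below intersects two bitmaps one 64-bit word at a time and counts the result with `math/bits`. The function names `And` and `Cardinality` are chosen for this example only.

```go
package main

import (
	"fmt"
	"math/bits"
)

// And intersects two bitmaps word by word; each AND handles 64 candidates.
// The result is as long as the shorter input.
func And(a, b []uint64) []uint64 {
	n := len(a)
	if len(b) < n {
		n = len(b)
	}
	out := make([]uint64, n)
	for i := 0; i < n; i++ {
		out[i] = a[i] & b[i]
	}
	return out
}

// Cardinality counts the set bits across all words.
func Cardinality(words []uint64) int {
	total := 0
	for _, w := range words {
		total += bits.OnesCount64(w)
	}
	return total
}

func main() {
	a := []uint64{0b1011} // {0, 1, 3}
	b := []uint64{0b1010} // {1, 3}
	r := And(a, b)
	fmt.Printf("%#b, %d values\n", r[0], Cardinality(r))
}
```

The inner loop contains no data-dependent branch: the same instructions run whatever the contents of the sets.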

  11. Bitsets can take too much memory
    {1, 32000, 64000} : about 8000 bytes (1000 64-bit words) for three values
    We use compression!
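
The cost of an uncompressed bitset depends on its largest value, not on how many values it holds. A small back-of-the-envelope sketch, with hypothetical helper names and assuming 64-bit words and 32-bit identifiers:

```go
package main

import "fmt"

// bitsetBytes returns the bytes used by an uncompressed bitset of 64-bit
// words that must be able to hold the value largest.
func bitsetBytes(largest uint) uint {
	words := largest/64 + 1
	return words * 8
}

// sortedArrayBytes returns the bytes used by n values stored as 32-bit integers.
func sortedArrayBytes(n uint) uint {
	return n * 4
}

func main() {
	// {1, 32000, 64000}: three values, largest is 64000.
	fmt.Println(bitsetBytes(64000), sortedArrayBytes(3))
}
```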

  12. Git (GitHub) uses EWAH
    Run-length encoding
    Example: 000000001111111100 is coded as 00000000 − 11111111 − 00
    Code long runs of 0s or 1s efficiently.
    https://github.com/git/git/blob/master/ewah/bitmap.c
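
To illustrate the idea of run-length encoding only, here is a simplified sketch that splits a string of bits into runs. It is not the EWAH format, which works on machine words; the `Run` type and `Encode` function are hypothetical.

```go
package main

import "fmt"

// Run is a maximal sequence of identical bits.
type Run struct {
	Bit    byte // '0' or '1'
	Length int
}

// Encode splits a string of '0' and '1' characters into runs.
func Encode(bitString string) []Run {
	var runs []Run
	for i := 0; i < len(bitString); i++ {
		if n := len(runs); n > 0 && runs[n-1].Bit == bitString[i] {
			runs[n-1].Length++ // extend the current run
		} else {
			runs = append(runs, Run{Bit: bitString[i], Length: 1})
		}
	}
	return runs
}

func main() {
	for _, r := range Encode("000000001111111100") {
		fmt.Printf("%d x %c\n", r.Length, r.Bit)
	}
}
```

The 18-bit example from the slide becomes three runs: eight 0s, eight 1s, then two 0s.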

  13. Complexity
    Intersection: O(|S1| + |S2|) or O(min(|S1|, |S2|))
    In-place union (S2 ← S1 ∪ S2): O(|S1| + |S2|) or O(|S2|)
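
The two intersection bounds correspond to two familiar strategies. The sketch below is an illustration with hypothetical function names: a merge over two sorted slices, and a scan of the smaller set that probes a structure with constant-time membership tests (a Go map here, but a bitmap would do as well).

```go
package main

import "fmt"

// IntersectMerge intersects two sorted slices by advancing two cursors,
// which takes O(|S1| + |S2|) steps.
func IntersectMerge(s1, s2 []uint32) []uint32 {
	var out []uint32
	i, j := 0, 0
	for i < len(s1) && j < len(s2) {
		switch {
		case s1[i] < s2[j]:
			i++
		case s1[i] > s2[j]:
			j++
		default:
			out = append(out, s1[i])
			i++
			j++
		}
	}
	return out
}

// IntersectLookup scans the smaller set and probes the larger one, which
// takes O(min(|S1|, |S2|)) probes when each membership test is O(1).
func IntersectLookup(small []uint32, large map[uint32]bool) []uint32 {
	var out []uint32
	for _, v := range small {
		if large[v] {
			out = append(out, v)
		}
	}
	return out
}

func main() {
	fmt.Println(IntersectMerge([]uint32{1, 3, 5, 7}, []uint32{3, 4, 5, 8}))
	fmt.Println(IntersectLookup([]uint32{3, 9}, map[uint32]bool{1: true, 3: true, 5: true}))
}
```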

  14. Roaring Bitmaps
    Java, C, Go, Swift, Python, Node/JavaScript, Rust, C#
    interoperable
    Roaring bitmaps

  15. Roaring bitmaps (http://roaringbitmap.org/) are found in:
    Apache Lucene and derivative systems such as Solr and Elasticsearch,
    Apache Druid,
    Apache Spark,
    Yandex ClickHouse,
    Netflix Atlas,
    LinkedIn Pinot,
    Whoosh,
    Microsoft Visual Studio Team Services (VSTS),
    Intel's Optimized Analytics Package (OAP),
    eBay's Apache Kylin,
    and many more!!!

  16. Several papers
    Roaring Bitmaps: Implementation of an Optimized Software Library, Software:
    Practice and Experience 48 (4), April 2018.
    Better bitmap performance with Roaring bitmaps, Software: Practice and
    Experience 46 (5), May 2016.
    Consistently faster and smaller compressed bitmaps with Roaring, Software:
    Practice and Experience 46 (11), November 2016.

  17. Hybrid model
    Set of containers
    sorted arrays ({1,20,144})
    bitset (0b10000101011)
    runs ([0,10],[15,20])
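
A minimal sketch of the hybrid model follows. It assumes the usual Roaring layout, where the high 16 bits of a 32-bit value select a container and the low 16 bits are stored inside it. All type names are hypothetical, and a Go map stands in for the sorted key array that real implementations use.

```go
package main

import (
	"fmt"
	"sort"
)

// container holds the low 16 bits of values that share the same high 16 bits.
type container interface {
	contains(low uint16) bool
}

// arrayContainer is a sorted array of 16-bit values.
type arrayContainer []uint16

func (a arrayContainer) contains(low uint16) bool {
	i := sort.Search(len(a), func(i int) bool { return a[i] >= low })
	return i < len(a) && a[i] == low
}

// bitsetContainer is a fixed bitset of 2^16 bits (1024 words, 8 kB).
type bitsetContainer [1024]uint64

func (b *bitsetContainer) contains(low uint16) bool {
	return b[low/64]&(1<<(low%64)) != 0
}

// interval is an inclusive range [start, last].
type interval struct{ start, last uint16 }

// runContainer is a sorted list of disjoint intervals.
type runContainer []interval

func (r runContainer) contains(low uint16) bool {
	// Find the first interval ending at or after low, then check its start.
	i := sort.Search(len(r), func(i int) bool { return r[i].last >= low })
	return i < len(r) && r[i].start <= low
}

// bitmap maps the high 16 bits of a value to a container (simplified).
type bitmap map[uint16]container

func (m bitmap) contains(x uint32) bool {
	c, ok := m[uint16(x>>16)]
	return ok && c.contains(uint16(x))
}

func main() {
	bs := &bitsetContainer{}
	bs[0] = 0b10000101011
	m := bitmap{
		0: arrayContainer{1, 20, 144},
		1: bs,
		2: runContainer{{0, 10}, {15, 20}},
	}
	fmt.Println(m.contains(144), m.contains((1<<16)+3), m.contains((2<<16)+12))
}
```

Each container type answers the same question with a different trade-off: binary search for sparse data, a single bit test for dense data, and a search over intervals for long runs.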

  18. Roaring bitmaps

  19. Format specification
    See https://github.com/RoaringBitmap/RoaringFormatSpec

  20. Roaring
    All containers are small (8 kB), fit in CPU cache
    We predict the output container type during computations
    E.g., when array gets too large, we switch to a bitset
    Union of two large arrays is materialized as a bitset...
    Dozens of heuristics... sorting networks and so on
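
The array-to-bitset switch can be sketched as follows. The threshold of 4096 values is the point where a sorted array of 16-bit values (2 bytes each) would reach the 8 kB of a 65536-bit bitset. The `container` type below is a hypothetical simplification, not the library's implementation.

```go
package main

import (
	"fmt"
	"sort"
)

// arrayMaxSize is the largest cardinality kept as a sorted array:
// 4096 values x 2 bytes = 8 kB, the size of a 2^16-bit bitset.
const arrayMaxSize = 4096

// container is either a sorted array or a bitset, never both (hypothetical).
type container struct {
	array  []uint16 // used while the cardinality is at most arrayMaxSize
	bitset []uint64 // 1024 words once converted
}

// add inserts v, converting the array to a bitset when it would overflow.
func (c *container) add(v uint16) {
	if c.bitset != nil {
		c.bitset[v/64] |= 1 << (v % 64)
		return
	}
	i := sort.Search(len(c.array), func(i int) bool { return c.array[i] >= v })
	if i < len(c.array) && c.array[i] == v {
		return // already present
	}
	if len(c.array) == arrayMaxSize {
		c.toBitset()
		c.bitset[v/64] |= 1 << (v % 64)
		return
	}
	// Sorted insertion: shift the tail right by one slot.
	c.array = append(c.array, 0)
	copy(c.array[i+1:], c.array[i:])
	c.array[i] = v
}

// toBitset copies every array value into a fresh bitset and drops the array.
func (c *container) toBitset() {
	c.bitset = make([]uint64, 1024)
	for _, v := range c.array {
		c.bitset[v/64] |= 1 << (v % 64)
	}
	c.array = nil
}

func main() {
	var c container
	for v := 0; v <= arrayMaxSize; v++ {
		c.add(uint16(v))
	}
	fmt.Println(len(c.array), c.bitset != nil)
}
```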

  21. Use Roaring for bitmap compression whenever possible. Do not use other bitmap
    compression methods (Wang et al., SIGMOD 2017)

  22. Go issues

  23. Go is shy about inlining
    Won't inline some small functions that contain a branch?
    func (b *BitSet) Set(i uint) *BitSet {
        b.extendSetMaybe(i)
        b.set[i>>log2WordSize] |= 1 << (i & (wordSize - 1))
        return b
    }
    https://lemire.me/blog/2017/09/05/go-does-not-inline-functions-when-it-should/

  24. Go guards too much
    bits.OnesCount64(x)

  25. 0x1093534 0fb63d22810c00 MOVZX 0xc8122(IP), DI
    0x109353b 4084ff TESTL DI, DI
    0x109353e 7407 JE 0x1093547
    0x1093540 f3480fb8f6 POPCNT SI, SI
    0x1093545 ebd6 JMP 0x109351d
    0x1093547 4889442418 MOVQ AX, 0x18(SP)
    0x109354c 4889542410 MOVQ DX, 0x10(SP)
    0x1093551 48894c2420 MOVQ CX, 0x20(SP)
    0x1093556 48893424 MOVQ SI, 0(SP)
    0x109355a e801ffffff CALL math/bits.OnesCount64(SB)
    0x109355f 488b742408 MOVQ 0x8(SP), SI
    0x1093564 488b442418 MOVQ 0x18(SP), AX
    0x1093569 488b4c2420 MOVQ 0x20(SP), CX
    0x109356e 488b542410 MOVQ 0x10(SP), DX
    0x1093573 488b5c2440 MOVQ 0x40(SP), BX
    0x1093578 eba3 JMP 0x109351d

  26. Thankfully assembly in Go is "easy"
    TEXT ·popcntOrSliceAsm(SB),4,$0-56
    XORQ AX, AX
    MOVQ s+0(FP), SI
    MOVQ s_len+8(FP), CX
    TESTQ CX, CX
    JZ popcntOrSliceEnd
    MOVQ m+24(FP), DI
    popcntOrSliceLoop:
    MOVQ (DI), DX
    ORQ (SI), DX
    POPCNTQ_DX_DX
    ADDQ DX, AX
    ADDQ $8, SI
    ADDQ $8, DI
    LOOP popcntOrSliceLoop
    popcntOrSliceEnd:
    MOVQ AX, ret+48(FP)
    RET
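
For comparison, a portable pure-Go counterpart of the assembly routine above might look like the sketch below. It computes the same quantity, the number of bits set in the word-by-word OR of two slices, but it is written for this illustration and goes through `bits.OnesCount64`, so it pays for the guard shown on the previous slide.

```go
package main

import (
	"fmt"
	"math/bits"
)

// popcntOrSlice returns the number of bits set in the word-by-word OR of s
// and m. Like the assembly version, it iterates len(s) times, so m must be
// at least as long as s (otherwise the indexing panics).
func popcntOrSlice(s, m []uint64) uint64 {
	var total uint64
	for i := range s {
		total += uint64(bits.OnesCount64(s[i] | m[i]))
	}
	return total
}

func main() {
	s := []uint64{0b1011, 0}
	m := []uint64{0b0100, 1}
	fmt.Println(popcntOrSlice(s, m))
}
```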

  27. But may not work in the cloud.

  28. Fast serialization
    buf := &bytes.Buffer{}
    _, err := rb.WriteTo(buf)

  29. Fast deserialization
    No memory allocation, no copy!
    r := NewBitmap()
    _, err = r.FromBuffer(buf.Bytes())

  30. Casting a slice is tricky
    // requires the reflect, runtime and unsafe packages
    func byteSliceAsUint16Slice(slice []byte) (result []uint16) { // here we create a new slice holder
        if len(slice)%2 != 0 {
            panic("Slice size should be divisible by 2")
        }
        // reference: https://go101.org/article/unsafe.html
        // make a new slice header
        bHeader := (*reflect.SliceHeader)(unsafe.Pointer(&slice))
        rHeader := (*reflect.SliceHeader)(unsafe.Pointer(&result))
        // transfer the data from the given slice to a new variable (our result)
        rHeader.Data = bHeader.Data
        rHeader.Len = bHeader.Len / 2
        rHeader.Cap = bHeader.Cap / 2
        // KeepAlive is still crucial: without it the GC can free the data
        // before result refers to it.
        runtime.KeepAlive(&slice)
        // return result
        return
    }
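
Since Go 1.17, `unsafe.Slice` offers a shorter way to write the same kind of cast. The sketch below is an alternative to the slide's code, not the library's: the function name is hypothetical, and it assumes the byte slice's backing array is suitably aligned for uint16 on the target platform. Unlike the version above, the result's capacity equals its length.

```go
package main

import (
	"fmt"
	"unsafe"
)

// bytesAsUint16s reinterprets b as a []uint16 without copying.
// Assumptions: Go 1.17 or later, len(b) is even, and b's backing array is
// suitably aligned for uint16. Values follow the machine's byte order.
func bytesAsUint16s(b []byte) []uint16 {
	if len(b)%2 != 0 {
		panic("slice length should be divisible by 2")
	}
	if len(b) == 0 {
		return nil
	}
	return unsafe.Slice((*uint16)(unsafe.Pointer(&b[0])), len(b)/2)
}

func main() {
	b := make([]byte, 8)
	u := bytesAsUint16s(b)
	u[1] = 0xFFFF // writes through to b[2] and b[3]
	fmt.Println(len(u), b)
}
```

Both slices share the same memory, so the byte slice must not be reused or returned to a pool while the uint16 view is still in use.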

  31. Iterators: don't drink from straws

  32. Old School
    it := b.Iterator()
    for it.HasNext() {
        ...
    }

  33. Batched Iterations
    buf := make([]uint32, 4096)
    ...
    for n := it.NextMany(buf); n != 0; n = it.NextMany(buf) {
        for _, v := range buf[:n] {
            ...
        }
    }
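
To show why batching helps, here is a toy batched iterator over a plain bitset. It is a sketch under stated assumptions: `manyIterator` and `nextMany` are hypothetical names that mimic the calling pattern above, not the roaring implementation. One call fills a whole buffer, so the per-value cost is a few bit operations rather than a method call.

```go
package main

import (
	"fmt"
	"math/bits"
)

// manyIterator walks the set bits of a bitset in increasing order.
type manyIterator struct {
	words []uint64
	index int    // index of the word being consumed
	word  uint64 // remaining bits of that word
}

func newManyIterator(words []uint64) *manyIterator {
	it := &manyIterator{words: words}
	if len(words) > 0 {
		it.word = words[0]
	}
	return it
}

// nextMany fills buf with up to len(buf) values and returns how many it wrote.
// A return value of 0 means the iteration is over (or buf is empty).
func (it *manyIterator) nextMany(buf []uint32) int {
	n := 0
	for n < len(buf) && it.index < len(it.words) {
		if it.word == 0 {
			// Current word exhausted: move on to the next one.
			it.index++
			if it.index < len(it.words) {
				it.word = it.words[it.index]
			}
			continue
		}
		t := bits.TrailingZeros64(it.word)
		buf[n] = uint32(it.index*64 + t)
		n++
		it.word &= it.word - 1 // clear the lowest set bit
	}
	return n
}

func main() {
	words := []uint64{0b11011, 0, 1 << 63}
	it := newManyIterator(words)
	buf := make([]uint32, 4)
	for n := it.nextMany(buf); n != 0; n = it.nextMany(buf) {
		fmt.Println(buf[:n])
	}
}
```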

  34. BENCH_REAL_DATA=1 go test -bench BenchmarkRealData -run -
    BenchmarkRealDataNext/census1881-4 8479939 ns/op
    BenchmarkRealDataNextMany/census1881-4 1057743 ns/op
    Batched iterators can be 8 times faster!

  35. To learn more...
    Blog (twice a week) : https://lemire.me/blog/
    GitHub: https://github.com/lemire
    Home page : https://lemire.me/en/
    CRSNG : Faster Compressed Indexes On Next-Generation Hardware (2017-2022)
    Twitter @lemire