Roaring Bitmaps at the AI Paper Reading Club

Presented May 4, 2026 at Frontier Tower by Arto Bendiken

I will be presenting the Roaring Bitmaps papers and giving a sneak peek at how ASIMOV uses them to power complex on-device queries on large social graphs!

Compressed bitmaps are a core data structure behind databases, search engines, and analytics systems, helping represent large sets of IDs efficiently while supporting fast operations like union, intersection, and filtering.

https://luma.com/frontier-tower-ai-paper-reading-club-wee-37d6

Transcript

  1. Roaring Bitmaps
    Presented May 4, 2026 at Frontier Tower by Arto Bendiken (ar.to)
    AI Paper Reading Club
  2. Table of Contents
    1. Table of Contents
    2. Your Presenter
    3. What are Roaring Bitmaps?
    4. Who’s Using Roaring Bitmaps?
    5. Today’s Papers
    6. Daniel Lemire
    7. 2014 February
    8. 2016 March
    9. 2017 September
    10. Bonus
    11. Software Libraries
    12. Rust Library
    13. Bitsets
    14. Data Structure
    15. Bitset Operations
    16. Set Intersection (AND)
    17. Set Union (OR)
    Clone the repository to follow along:
  3. Your Presenter
    ar.to, @artob on GitHub, @bendiken on X
    Autodidact, cypherpunk, entrepreneur
    Coding since 1993, full time since 1997
    Created The Unlicense, used at last count by 3%+ of all GitHub repositories
    Built the first graph database-as-a-service, years before Neo4j’s (Dydra, 2010)
    Built OSINT systems for the US Navy
    Built data warehouses for S&P 500 and the European Space Agency
    Built tactical software for drones
    Led the EVM team at NEAR Protocol
    Featured in exhibit 270 in the Silk Road trial
    Raised $15M+ total so far in four startups
  4. What are Roaring Bitmaps? (1/2)
    Crazy Fast - Outperforms traditional RLE-based formats (WAH, EWAH, etc.) by up to three orders of magnitude for set intersections and unions.
    High Compression - Significantly smaller memory footprint than ordinary uncompressed bitsets, without sacrificing processing speed.
    Hybrid Indexing - Partitions 32-bit integers into a two-level hierarchy (16-bit high keys and 16-bit low values) for CPU cache locality. (64-bit integers are supported as an extension.)
    Adaptive Storage - Dynamically switches each chunk’s representation under the hood between an integer array (for sparse chunks), a bitmap (for dense chunks), and a run encoding (for runs of consecutive values), based on local cardinality.
    An optimized, compressed bitset structure designed for performance, scale, and memory efficiency.
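The hybrid indexing and adaptive storage ideas on this slide can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not the code of any Roaring library: the names `split`, `choose_container`, and `group_by_chunk` are hypothetical, the 4,096-element cutoff is the array-to-bitmap threshold described in the Roaring papers, and run containers are left out for brevity.

```python
ARRAY_MAX = 4096  # array-to-bitmap cutoff described in the Roaring papers


def split(value: int) -> tuple[int, int]:
    """Split a 32-bit integer into (high 16-bit key, low 16-bit value)."""
    if not 0 <= value < 2**32:
        raise ValueError("expected a 32-bit unsigned integer")
    return value >> 16, value & 0xFFFF


def choose_container(cardinality: int) -> str:
    """Pick a chunk representation from the number of values in the chunk."""
    # A sorted array of 16-bit values costs 2 bytes per element, while a
    # bitmap covering all 2**16 low values always costs 8192 bytes, so the
    # two representations break even at 4096 elements.
    return "array" if cardinality <= ARRAY_MAX else "bitmap"


def group_by_chunk(values):
    """Group values by high key, keeping the low values sorted per chunk."""
    chunks: dict[int, list[int]] = {}
    for v in sorted(set(values)):
        high, low = split(v)
        chunks.setdefault(high, []).append(low)
    return chunks


if __name__ == "__main__":
    for high, lows in group_by_chunk([1, 2, 65536, 65537, 196608]).items():
        print(high, choose_container(len(lows)), lows)
```

Values that share their upper 16 bits land in the same chunk, so each chunk can pick whichever representation is smaller for its own density.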
  5. What are Roaring Bitmaps? (2/2)
    O(1) Random Access - Enables constant-time membership testing without linearly scanning and decompressing the entire bitset, as earlier RLE-based approaches had to.
    SIMD Accelerated - Uses native hardware vector instructions (AVX on x86, NEON on Apple Silicon and other ARM chips, etc.) to vectorize bitwise operations across containers.
    Proven Origin - Introduced in 2014 by Chambi, Lemire, et al. in the landmark paper "Better Bitmap Performance with Roaring Bitmaps", cited hundreds of times.
    Ubiquitous Ecosystem - Adopted as the standard underlying indexing structure both in open source (Apache Spark, Elasticsearch, Apache Druid, and Lucene) and at Big Tech.
    Find more information at roaringbitmap.org and implementations at github.com/RoaringBitmap
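To make the random-access point concrete, here is a minimal sketch of a membership test over a two-level structure. It is a toy model, not a Roaring library API: `ToyRoaring` is a hypothetical class, it uses a Python dict for the high keys (the papers describe a sorted key array searched by binary search), and run containers are omitted. In this sketch only the bitmap-container check is a single bit test; array containers use a binary search over at most 4,096 entries.

```python
from bisect import bisect_left

ARRAY_MAX = 4096  # array-to-bitmap cutoff described in the Roaring papers


class ToyRoaring:
    """Toy two-level set of 32-bit integers (illustration only)."""

    def __init__(self, values=()):
        # Maps high 16 bits -> sorted list of low values (array container)
        # or a Python int used as a 65536-bit bitmap (bitmap container).
        self.containers = {}
        grouped = {}
        for v in set(values):
            grouped.setdefault(v >> 16, []).append(v & 0xFFFF)
        for high, lows in grouped.items():
            if len(lows) <= ARRAY_MAX:
                self.containers[high] = sorted(lows)
            else:
                bits = 0
                for low in lows:
                    bits |= 1 << low
                self.containers[high] = bits

    def __contains__(self, value):
        container = self.containers.get(value >> 16)
        if container is None:
            return False  # no chunk exists for this high key
        low = value & 0xFFFF
        if isinstance(container, int):
            return (container >> low) & 1 == 1  # single bit test
        i = bisect_left(container, low)  # binary search in the sorted array
        return i < len(container) and container[i] == low
```

No other chunk is touched or decompressed during the lookup, which is the contrast with scanning a run-length-encoded stream from the start.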
  6. Who’s Using Roaring Bitmaps?
    Time-Series & Telemetry: Netflix relies heavily on RBs within Atlas, their in-memory dimensional time-series telemetry system.
    Search Engines & Code Search: Sourcegraph uses RBs extensively for blazing-fast code search indexing. Apache Lucene (Elasticsearch, Solr) relies on them for inverted indexing. Google and YouTube deploy RBs in the massive-scale SQL query engine (Procella) handling hundreds of billions of queries per day.
    OLAP Databases & Data Warehouses: Real-time analytics databases such as ClickHouse, Apache Druid, Apache Pinot, Apache Doris, and StarRocks use RBs to accelerate drill-down queries.
    Big Data Ecosystems: Frameworks like Apache Spark, Apache Hive, and Apache Tez use RBs for distributed data processing and fast computations.
    AdTech & Analytics Platforms: Companies like Quantcast maintain hundreds of billions of distinct bitmaps for audience modeling (Ara platform), while Alibaba uses RBs for real-time user profiling (Hologres).
    Roaring bitmaps have quietly become standard infrastructure for modern databases and data analytics.
  7. Who’s Using Roaring Bitmaps?
    Alibaba, Amazon Web Services, Apache CarbonData, Apache Doris, Apache Druid, Apache Hive, Apache Hivemall, Apache Kylin, Apache Lucene, Apache Pinot, Apache Spark, Apache Tez, Apache Zeppelin, ASIMOV, Bleve, Bluesky, Circonus, ClickHouse, Cloud Torrent, Datadog, Disney, Doist, eBay, Elastic, Elasticell, FrostDB, Gaffer, Google, Grab, Huawei, Husky, InfluxData, Intel, Jive Software, King Digital Entertainment, LinDB, LinkedIn, M3, Metamarkets, Microsoft, Netflix, OpenSearchServer, Pilosa, Quantcast, Redpanda, Roku, SEEK Group, Solr, Sourcegraph, StarRocks, Tablesaw, Tencent, Trident, Weaviate, Whoosh, Yandex, YouTube, Zomato, etc.
    A decade into adoption, the real question is: who isn’t already using Roaring Bitmaps?
  8. Today’s Papers
    Daniel Lemire, Owen Kaser, Nathan Kurz, Luca Deri, Chris O’Hara, François Saint-Jacques, and Gregory Ssi-Yan-Kai. 2018. Roaring Bitmaps: Implementation of an Optimized Software Library. Software: Practice and Experience 48, 4 (April 2018). arXiv:1709.07821
    Daniel Lemire, Gregory Ssi-Yan-Kai, and Owen Kaser. 2016. Consistently Faster and Smaller Compressed Bitmaps with Roaring. Software: Practice and Experience 46, 11 (November 2016), 1547–1569. arXiv:1603.06549
    Samy Chambi, Daniel Lemire, Owen Kaser, and Robert Godin. 2016. Better Bitmap Performance with Roaring Bitmaps. Software: Practice and Experience 46, 5 (May 2016), 709–719. arXiv:1402.6407
    Ulrich Drepper. 2007. What Every Programmer Should Know About Memory. Red Hat, Inc. (November 2007). akkadia.org/drepper/cpumemory.pdf (Bonus paper!)
    Find more related publications at roaringbitmap.org/publications/
  9. Daniel Lemire
    Lemire (@lemire) ranks in the top 2% of scientists worldwide according to the 2025 Stanford University/Elsevier ranking.
    He is also editor of the journal Software: Practice and Experience, a long-established (1971) journal where many crucial results were published (e.g., articles by Knuth and Bentley).
    Lemire’s blog is one of the top 50 most popular blogs on Hacker News.
    He has written several books.
    He serves on the program committees of leading computer science conferences.
    He is among the top 0.0006% of most-followed programmers in the world on GitHub, which has over 100 million developers.
    h-index 35 (Google Scholar)
  10. 2014 February
    Samy Chambi, Daniel Lemire, Owen Kaser, and Robert Godin. 2016. Better Bitmap Performance with Roaring Bitmaps. Software: Practice and Experience 46, 5 (May 2016), 709–719. arXiv:1402.6407
    Cited 300 times (Google Scholar)
  11. 2016 March
    Daniel Lemire, Gregory Ssi-Yan-Kai, and Owen Kaser. 2016. Consistently Faster and Smaller Compressed Bitmaps with Roaring. Software: Practice and Experience 46, 11 (November 2016), 1547–1569. arXiv:1603.06549
    Cited 91 times (Google Scholar)
  12. 2017 September
    Daniel Lemire, Owen Kaser, Nathan Kurz, Luca Deri, Chris O’Hara, François Saint-Jacques, and Gregory Ssi-Yan-Kai. 2018. Roaring Bitmaps: Implementation of an Optimized Software Library. Software: Practice and Experience 48, 4 (April 2018). arXiv:1709.07821
    Cited 96 times (Google Scholar)
  13. Bonus
    Ulrich Drepper. 2007. What Every Programmer Should Know About Memory. Red Hat, Inc. (November 2007). akkadia.org/drepper/cpumemory.pdf
    Cited 554 times (Google Scholar)
  14. Software Libraries
    C/C++   github.com/RoaringBitmap/CRoaring
    Go      github.com/RoaringBitmap/roaring
    Java    github.com/RoaringBitmap/RoaringBitmap
    Node    github.com/SalvatorePreviti/roaring-node
    Python  github.com/Ezibenroc/PyRoaringBitMap
    Rust    github.com/RoaringBitmap/roaring-rs
    Swift   github.com/RoaringBitmap/SwiftRoaring
    Zig     github.com/jwhear/roaring-zig
  15. Bitsets
    (Also known as a bitmap, bit map, bit string, bit vector, or bit array)
    Integer Set - The sequence of prime numbers starts:
    Bitset (32-bit Word) - The same numbers represented in a 32-bit bitset (LSB to the left, MSB to the right):
    Note that the least-significant bit (LSB) on the left here represents 0, not 1. This lets the bitset distinguish the empty set from a set containing the number zero.
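As a concrete illustration of this slide, the following sketch packs the primes below 32 into a single 32-bit word and renders it with the least-significant bit on the left. The helper names `to_bitset`, `from_bitset`, and `lsb_left` are hypothetical, chosen for this example only.

```python
PRIMES = [2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31]  # the primes below 32


def to_bitset(values) -> int:
    """Set bit n for every integer n in values (0 <= n < 32)."""
    word = 0
    for n in values:
        if not 0 <= n < 32:
            raise ValueError("value does not fit in a 32-bit bitset")
        word |= 1 << n
    return word


def from_bitset(word: int) -> list[int]:
    """Recover the sorted list of integers whose bits are set."""
    return [n for n in range(32) if (word >> n) & 1]


def lsb_left(word: int) -> str:
    """Render a 32-bit word with bit 0 (the LSB) as the leftmost character."""
    return "".join(str((word >> n) & 1) for n in range(32))


if __name__ == "__main__":
    word = to_bitset(PRIMES)
    print(lsb_left(word))  # 00110101000101000101000100000101
    print(hex(word))       # 0xa08a28ac
```

Because bit 0 stands for the integer 0, `to_bitset([])` is `0` while `to_bitset([0])` is `1`, so the empty set and the set containing zero stay distinct.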
  16. Bitset Operations (1/2)
    Bitwise Operation | Set Operation | Description
    OR | Set Union | Keeps elements present in A, B, or both.
    AND | Set Intersection | Keeps only elements present in both A and B.
    ANDNOT | Set Difference (aka Relative Complement) | Keeps elements in A, but removes any that are also in B.
    XOR | Symmetric Difference | Keeps elements in A or B, but not both.
    Logical operations including OR, AND, ANDNOT, and XOR.
  17. Bitset Operations (2/2)
    Set Intersection (AND)
    Set Union (OR)
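The four operations named in the two slides above map directly onto bitwise operators on machine words. A minimal sketch, using two small example sets chosen here for illustration (they are not taken from the slides) and hypothetical helpers `bitset` and `members`:

```python
MASK32 = 0xFFFFFFFF  # keeps results within a 32-bit word


def bitset(*values: int) -> int:
    """Build a 32-bit bitset with bit n set for each value n."""
    word = 0
    for n in values:
        if not 0 <= n < 32:
            raise ValueError("value does not fit in a 32-bit bitset")
        word |= 1 << n
    return word


def members(word: int) -> list[int]:
    """List the integers whose bits are set, in ascending order."""
    return [n for n in range(32) if (word >> n) & 1]


A = bitset(1, 2, 3, 5)
B = bitset(2, 3, 4, 8)

union = A | B                   # OR: in A, B, or both
intersection = A & B            # AND: in both A and B
difference = A & ~B & MASK32    # ANDNOT: in A but not in B
symmetric_difference = A ^ B    # XOR: in A or B, but not both

if __name__ == "__main__":
    print(members(union))                 # [1, 2, 3, 4, 5, 8]
    print(members(intersection))          # [2, 3]
    print(members(difference))            # [1, 5]
    print(members(symmetric_difference))  # [1, 4, 5, 8]
```

Each operation processes all 32 candidate members in one instruction, which is why uncompressed bitsets are fast on dense data and why Roaring keeps a bitmap representation for dense chunks.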