Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
Probabilistic Data Structures
Search
cnu
September 03, 2017
Programming
0
630
Probabilistic Data Structures
Learn how to use Probabilistic Data Structures and modules in Redis v4 to analyse logs.
cnu
September 03, 2017
Tweet
Share
More Decks by cnu
See All by cnu
Redisconf 2018: Probabilistic Data Structures
cnu
1
970
The Rocky Road from Monolithic to Microservices Architecture
cnu
0
1k
AWS Lambda - Pycon India 2016
cnu
0
500
ZeroMQ - PyCon India 2013
cnu
2
1.5k
Other Decks in Programming
See All in Programming
チームで開発し事業を加速するための"良い"設計の考え方 @ サポーターズCoLab 2025-07-08
agatan
0
180
Hypervel - A Coroutine Framework for Laravel Artisans
albertcht
1
110
新メンバーも今日から大活躍!SREが支えるスケールし続ける組織のオンボーディング
honmarkhunt
3
5.9k
チームのテスト力を総合的に鍛えて品質、スピード、レジリエンスを共立させる/Testing approach that improves quality, speed, and resilience
goyoki
3
420
PostgreSQLのRow Level SecurityをPHPのORMで扱う Eloquent vs Doctrine #phpcon #track2
77web
2
500
都市をデータで見るってこういうこと PLATEAU属性情報入門
nokonoko1203
1
600
C++20 射影変換
faithandbrave
0
570
スタートアップの急成長を支えるプラットフォームエンジニアリングと組織戦略
sutochin26
0
4.1k
GraphRAGの仕組みまるわかり
tosuri13
8
530
Is Xcode slowly dying out in 2025?
uetyo
1
260
明示と暗黙 ー PHPとGoの インターフェイスの違いを知る
shimabox
2
480
ruby.wasmで多人数リアルタイム通信ゲームを作ろう
lnit
3
440
Featured
See All Featured
How to Create Impact in a Changing Tech Landscape [PerfNow 2023]
tammyeverts
53
2.8k
Product Roadmaps are Hard
iamctodd
PRO
54
11k
Bash Introduction
62gerente
614
210k
RailsConf 2023
tenderlove
30
1.1k
How to train your dragon (web standard)
notwaldorf
94
6.1k
Gamification - CAS2011
davidbonilla
81
5.3k
Designing for Performance
lara
610
69k
The MySQL Ecosystem @ GitHub 2015
samlambert
251
13k
Making the Leap to Tech Lead
cromwellryan
134
9.4k
Embracing the Ebb and Flow
colly
86
4.7k
Why You Should Never Use an ORM
jnunemaker
PRO
58
9.4k
Sharpening the Axe: The Primacy of Toolmaking
bcantrill
44
2.4k
Transcript
Probabilistic Data Structures in Redis
Srinivasan Rangarajan Head of Engineering
Srinivasan Rangarajan @cnu https://cnu.name
Agenda • Log Analysis • Redis V4 • Probabilistic Data
Structures
Log Analysis
Challenges • 100s of Millions of events processed every day
• Peak of ~10 Million events in an hour • Needs Realtime processing • Memory/Storage Requirements
Cost Accuracy Scale
None
Sample Event Data { "ip": "123.123.123.123", "client_id": 232, "user_id": "35827",
"email": "
[email protected]
", "product_id": "ABC-12345", "image_id": 3, "action": "pageview", "datetime": "2017-06-29T12:42:53Z", }
None
Redis Version 4 • Module system • Better Replication •
Cache eviction Improvements • Non-Blocking DEL and FLUSH* commands • Mixed RDB-AOF persistence format • MEMORY DOCTOR
Modules mikicon NounProject
Loading Modules • ./redis-server --loadmodule /path/to/module.so • redis.conf loadmodule /path/to/module.so
• MODULE LOAD /path/to/module.so
Execute a custom command
Probabilistic Data Structures
There are three kinds of people in the world. 1.
Those who can count. 2. Those who can’t count.
There are three kinds of people in the world. data
structures 1. Those who can count. 2. Those who can’t count. 3. Those who count approximately.
None
Advantage: Huge Memory Savings
3 Data Structures
HyperLogLog Count the Cardinality of a Set http://antirez.com/news/75
Count Unique Visitor / hour
Merge Hourly into Daily
TopK Get Top k Elements in a set https://github.com/RedisLabsModules/topk
Top k IP Addresses
None
CountMinSketch Count the frequency of items https://github.com/RedisLabsModules/countminsketch
User Pageview counter
Bloom Filters Test membership in a set https://github.com/RedisLabsModules/rebloom
Bloom Filters False Positives False Negatives
User Session checking
~3 Data Structures • HyperLogLog • TopK • CountMinSketch •
BloomFilter
Thank you
Follow me @cnu