Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
Probabilistic Data Structures
Search
cnu
September 03, 2017
Programming
0
620
Probabilistic Data Structures
Learn how to use Probabilistic Data Structures and modules in Redis v4 to analyse logs.
cnu
September 03, 2017
Tweet
Share
More Decks by cnu
See All by cnu
Redisconf 2018: Probabilistic Data Structures
cnu
1
970
The Rocky Road from Monolithic to Microservices Architecture
cnu
0
1k
AWS Lambda - Pycon India 2016
cnu
0
500
ZeroMQ - PyCon India 2013
cnu
2
1.5k
Other Decks in Programming
See All in Programming
イベントストーミング図からコードへの変換手順 / Procedure for Converting Event Storming Diagrams to Code
nrslib
2
590
童醫院敏捷轉型的實踐經驗
cclai999
0
210
なぜ適用するか、移行して理解するClean Architecture 〜構造を超えて設計を継承する〜 / Why Apply, Migrate and Understand Clean Architecture - Inherit Design Beyond Structure
seike460
PRO
3
730
git worktree × Claude Code × MCP ~生成AI時代の並列開発フロー~
hisuzuya
1
520
NPOでのDevinの活用
codeforeveryone
0
720
AIコーディング道場勉強会#2 君(エンジニア)たちはどう生きるか
misakiotb
1
280
エラーって何種類あるの?
kajitack
5
340
GraphRAGの仕組みまるわかり
tosuri13
8
520
ruby.wasmで多人数リアルタイム通信ゲームを作ろう
lnit
3
360
#kanrk08 / 公開版 PicoRubyとマイコンでの自作トレーニング計測装置を用いたワークアウトの理想と現実
bash0c7
1
670
WindowInsetsだってテストしたい
ryunen344
1
230
CursorはMCPを使った方が良いぞ
taigakono
1
220
Featured
See All Featured
The Language of Interfaces
destraynor
158
25k
Typedesign – Prime Four
hannesfritz
42
2.7k
How STYLIGHT went responsive
nonsquared
100
5.6k
Gamification - CAS2011
davidbonilla
81
5.3k
Understanding Cognitive Biases in Performance Measurement
bluesmoon
29
1.8k
Templates, Plugins, & Blocks: Oh My! Creating the theme that thinks of everything
marktimemedia
31
2.4k
The MySQL Ecosystem @ GitHub 2015
samlambert
251
13k
Sharpening the Axe: The Primacy of Toolmaking
bcantrill
44
2.4k
Save Time (by Creating Custom Rails Generators)
garrettdimon
PRO
31
1.3k
Side Projects
sachag
455
42k
Code Review Best Practice
trishagee
69
18k
The Web Performance Landscape in 2024 [PerfNow 2024]
tammyeverts
8
680
Transcript
Probabilistic Data Structures in Redis
Srinivasan Rangarajan Head of Engineering
Srinivasan Rangarajan @cnu https://cnu.name
Agenda • Log Analysis • Redis V4 • Probabilistic Data
Structures
Log Analysis
Challenges • 100s of Millions of events processed every day
• Peak of ~10 Million events in an hour • Needs Realtime processing • Memory/Storage Requirements
Cost Accuracy Scale
None
Sample Event Data { "ip": "123.123.123.123", "client_id": 232, "user_id": "35827",
"email": "
[email protected]
", "product_id": "ABC-12345", "image_id": 3, "action": "pageview", "datetime": "2017-06-29T12:42:53Z", }
None
Redis Version 4 • Module system • Better Replication •
Cache eviction Improvements • Non-Blocking DEL and FLUSH* commands • Mixed RDB-AOF persistence format • MEMORY DOCTOR
Modules mikicon NounProject
Loading Modules • ./redis-server --loadmodule /path/to/module.so • redis.conf loadmodule /path/to/module.so
• MODULE LOAD /path/to/module.so
Execute a custom command
Probabilistic Data Structures
There are three kinds of people in the world. 1.
Those who can count. 2. Those who can’t count.
There are three kinds of people in the world. data
structures 1. Those who can count. 2. Those who can’t count. 3. Those who count approximately.
None
Advantage: Huge Memory Savings
3 Data Structures
HyperLogLog Count the Cardinality of a Set http://antirez.com/news/75
Count Unique Visitor / hour
Merge Hourly into Daily
TopK Get Top k Elements in a set https://github.com/RedisLabsModules/topk
Top k IP Addresses
None
CountMinSketch Count the frequency of items https://github.com/RedisLabsModules/countminsketch
User Pageview counter
Bloom Filters Test membership in a set https://github.com/RedisLabsModules/rebloom
Bloom Filters False Positives False Negatives
User Session checking
~3 Data Structures • HyperLogLog • TopK • CountMinSketch •
BloomFilter
Thank you
Follow me @cnu