Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
Probabilistic Data Structures
Search
cnu
September 03, 2017
Programming
0
650
Probabilistic Data Structures
Learn how to use Probabilistic Data Structures and modules in Redis v4 to analyse logs.
cnu
September 03, 2017
Tweet
Share
More Decks by cnu
See All by cnu
Redisconf 2018: Probabilistic Data Structures
cnu
1
980
The Rocky Road from Monolithic to Microservices Architecture
cnu
0
1.1k
AWS Lambda - Pycon India 2016
cnu
0
520
ZeroMQ - PyCon India 2013
cnu
2
1.5k
Other Decks in Programming
See All in Programming
Pythonスレッドとは結局何なのか? CPython実装から見るNoGIL時代の変化
curekoshimizu
5
1.4k
そのpreloadは必要?見過ごされたpreloadが技術的負債として爆発した日
mugitti9
2
3.1k
Serena MCPのすすめ
wadakatu
4
900
CSC305 Lecture 02
javiergs
PRO
1
260
iOSエンジニア向けの英語学習アプリを作る!
yukawashouhei
0
180
どの様にAIエージェントと 協業すべきだったのか?
takefumiyoshii
2
610
iOSエンジニア向けの英語学習アプリを作る!
yukawashouhei
0
180
「ちょっと古いから」って避けてた技術書、今だからこそ読もう
mottyzzz
1
240
XP, Testing and ninja testing ZOZ5
m_seki
3
320
なぜGoのジェネリクスはこの形なのか? Featherweight Goが明かす設計の核心
ryotaros
7
1k
詳しくない分野でのVibe Codingで困ったことと学び/vibe-coding-in-unfamiliar-area
shibayu36
3
4.5k
プロダクト開発をAI 1stに変革する〜SaaS is dead時代で生き残るために〜 / AI 1st Product Development
kobakei
0
500
Featured
See All Featured
Code Review Best Practice
trishagee
72
19k
Documentation Writing (for coders)
carmenintech
75
5k
Performance Is Good for Brains [We Love Speed 2024]
tammyeverts
12
1.2k
Visualization
eitanlees
148
16k
Site-Speed That Sticks
csswizardry
11
880
Thoughts on Productivity
jonyablonski
70
4.9k
Designing for Performance
lara
610
69k
Evolution of real-time – Irina Nazarova, EuRuKo, 2024
irinanazarova
9
960
jQuery: Nuts, Bolts and Bling
dougneiner
64
7.9k
No one is an island. Learnings from fostering a developers community.
thoeni
21
3.5k
Stop Working from a Prison Cell
hatefulcrawdad
271
21k
Cheating the UX When There Is Nothing More to Optimize - PixelPioneers
stephaniewalter
285
14k
Transcript
Probabilistic Data Structures in Redis
Srinivasan Rangarajan Head of Engineering
Srinivasan Rangarajan @cnu https://cnu.name
Agenda • Log Analysis • Redis V4 • Probabilistic Data
Structures
Log Analysis
Challenges • 100s of Millions of events processed every day
• Peak of ~10 Million events in an hour • Needs Realtime processing • Memory/Storage Requirements
Cost Accuracy Scale
None
Sample Event Data { "ip": "123.123.123.123", "client_id": 232, "user_id": "35827",
"email": "
[email protected]
", "product_id": "ABC-12345", "image_id": 3, "action": "pageview", "datetime": "2017-06-29T12:42:53Z", }
None
Redis Version 4 • Module system • Better Replication •
Cache eviction Improvements • Non-Blocking DEL and FLUSH* commands • Mixed RDB-AOF persistence format • MEMORY DOCTOR
Modules mikicon NounProject
Loading Modules • ./redis-server --loadmodule /path/to/module.so • redis.conf loadmodule /path/to/module.so
• MODULE LOAD /path/to/module.so
Execute a custom command
Probabilistic Data Structures
There are three kinds of people in the world. 1.
Those who can count. 2. Those who can’t count.
There are three kinds of people in the world. data
structures 1. Those who can count. 2. Those who can’t count. 3. Those who count approximately.
None
Advantage: Huge Memory Savings
3 Data Structures
HyperLogLog Count the Cardinality of a Set http://antirez.com/news/75
Count Unique Visitor / hour
Merge Hourly into Daily
TopK Get Top k Elements in a set https://github.com/RedisLabsModules/topk
Top k IP Addresses
None
CountMinSketch Count the frequency of items https://github.com/RedisLabsModules/countminsketch
User Pageview counter
Bloom Filters Test membership in a set https://github.com/RedisLabsModules/rebloom
Bloom Filters False Positives False Negatives
User Session checking
~3 Data Structures • HyperLogLog • TopK • CountMinSketch •
BloomFilter
Thank you
Follow me @cnu