Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
Probabilistic Data Structures
Search
cnu
September 03, 2017
Programming
0
660
Probabilistic Data Structures
Learn how to use Probabilistic Data Structures and modules in Redis v4 to analyse logs.
cnu
September 03, 2017
Tweet
Share
More Decks by cnu
See All by cnu
Redisconf 2018: Probabilistic Data Structures
cnu
1
1k
The Rocky Road from Monolithic to Microservices Architecture
cnu
0
1.1k
AWS Lambda - Pycon India 2016
cnu
0
520
ZeroMQ - PyCon India 2013
cnu
2
1.6k
Other Decks in Programming
See All in Programming
CloudflareのSandbox SDKを試してみた
syumai
0
170
Developing Specifications - Jakarta EE: a Real World Example
ivargrimstad
0
140
イベントストーミングのはじめかた / Getting Started with Event Storming
nrslib
1
650
PyCon mini 東海 2025「個人ではじめるマルチAIエージェント入門 〜LangChain × LangGraphでアイデアを形にするステップ〜」
komofr
3
1.1k
無秩序からの脱却 / Emergence from chaos
nrslib
1
5.6k
開発生産性が組織文化になるまでの軌跡
tonegawa07
0
180
Nitro v3
kazupon
2
320
自動テストのアーキテクチャとその理由ー大規模ゲーム開発の場合ー
segadevtech
2
1k
Eloquentを使ってどこまでコードの治安を保てるのか?を新人が考察してみた
itokoh0405
0
3.2k
モデル駆動設計をやってみよう Modeling Forum2025ワークショップ/Let’s Try Model-Driven Design
haru860
0
170
AI駆動開発ライフサイクル(AI-DLC)のホワイトペーパーを解説
swxhariu5
0
1.2k
Patterns of Patterns (and why we need them)
denyspoltorak
0
110
Featured
See All Featured
Rebuilding a faster, lazier Slack
samanthasiow
84
9.3k
A better future with KSS
kneath
239
18k
GitHub's CSS Performance
jonrohan
1032
470k
Being A Developer After 40
akosma
91
590k
Visualizing Your Data: Incorporating Mongo into Loggly Infrastructure
mongodb
48
9.8k
How to train your dragon (web standard)
notwaldorf
97
6.4k
Principles of Awesome APIs and How to Build Them.
keavy
127
17k
YesSQL, Process and Tooling at Scale
rocio
174
15k
I Don’t Have Time: Getting Over the Fear to Launch Your Podcast
jcasabona
34
2.5k
Facilitating Awesome Meetings
lara
57
6.6k
Scaling GitHub
holman
463
140k
Dealing with People You Can't Stand - Big Design 2015
cassininazir
367
27k
Transcript
Probabilistic Data Structures in Redis
Srinivasan Rangarajan Head of Engineering
Srinivasan Rangarajan @cnu https://cnu.name
Agenda • Log Analysis • Redis V4 • Probabilistic Data
Structures
Log Analysis
Challenges • 100s of Millions of events processed every day
• Peak of ~10 Million events in an hour • Needs Realtime processing • Memory/Storage Requirements
Cost Accuracy Scale
None
Sample Event Data { "ip": "123.123.123.123", "client_id": 232, "user_id": "35827",
"email": "
[email protected]
", "product_id": "ABC-12345", "image_id": 3, "action": "pageview", "datetime": "2017-06-29T12:42:53Z", }
None
Redis Version 4 • Module system • Better Replication •
Cache eviction Improvements • Non-Blocking DEL and FLUSH* commands • Mixed RDB-AOF persistence format • MEMORY DOCTOR
Modules mikicon NounProject
Loading Modules • ./redis-server --loadmodule /path/to/module.so • redis.conf loadmodule /path/to/module.so
• MODULE LOAD /path/to/module.so
Execute a custom command
Probabilistic Data Structures
There are three kinds of people in the world. 1.
Those who can count. 2. Those who can’t count.
There are three kinds of people in the world. data
structures 1. Those who can count. 2. Those who can’t count. 3. Those who count approximately.
None
Advantage: Huge Memory Savings
3 Data Structures
HyperLogLog Count the Cardinality of a Set http://antirez.com/news/75
Count Unique Visitor / hour
Merge Hourly into Daily
TopK Get Top k Elements in a set https://github.com/RedisLabsModules/topk
Top k IP Addresses
None
CountMinSketch Count the frequency of items https://github.com/RedisLabsModules/countminsketch
User Pageview counter
Bloom Filters Test membership in a set https://github.com/RedisLabsModules/rebloom
Bloom Filters False Positives False Negatives
User Session checking
~3 Data Structures • HyperLogLog • TopK • CountMinSketch •
BloomFilter
Thank you
Follow me @cnu