Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
Redisconf 2018: Probabilistic Data Structures
Search
cnu
April 25, 2018
Programming
1
980
Redisconf 2018: Probabilistic Data Structures
Real Time Log Analysis using Probabilistic Data Structures in Redis. Presented at Redisconf 2018.
cnu
April 25, 2018
Tweet
Share
More Decks by cnu
See All by cnu
The Rocky Road from Monolithic to Microservices Architecture
cnu
0
1.1k
Probabilistic Data Structures
cnu
0
640
AWS Lambda - Pycon India 2016
cnu
0
510
ZeroMQ - PyCon India 2013
cnu
2
1.5k
Other Decks in Programming
See All in Programming
知っているようで知らない"rails new"の世界 / The World of "rails new" You Think You Know but Don't
luccafort
PRO
1
180
FindyにおけるTakumi活用と脆弱性管理のこれから
rvirus0817
0
520
Ruby Parser progress report 2025
yui_knk
1
450
為你自己學 Python - 冷知識篇
eddie
1
350
意外と簡単!?フロントエンドでパスキー認証を実現する WebAuthn
teamlab
PRO
2
770
テストカバレッジ100%を10年続けて得られた学びと品質
mottyzzz
2
610
請來的 AI Agent 同事們在寫程式時,怎麼用 pytest 去除各種幻想與盲點
keitheis
0
120
プロパティベーステストによるUIテスト: LLMによるプロパティ定義生成でエッジケースを捉える
tetta_pdnt
0
3.3k
CJK and Unicode From a PHP Committer
youkidearitai
PRO
0
110
Putting The Genie in the Bottle - A Crash Course on running LLMs on Android
iurysza
0
140
AI Coding Agentのセキュリティリスク:PRの自己承認とメルカリの対策
s3h
0
230
Reading Rails 1.0 Source Code
okuramasafumi
0
250
Featured
See All Featured
A Modern Web Designer's Workflow
chriscoyier
696
190k
The Illustrated Children's Guide to Kubernetes
chrisshort
48
50k
Code Reviewing Like a Champion
maltzj
525
40k
The Power of CSS Pseudo Elements
geoffreycrofte
77
6k
The Pragmatic Product Professional
lauravandoore
36
6.9k
What's in a price? How to price your products and services
michaelherold
246
12k
The Art of Delivering Value - GDevCon NA Keynote
reverentgeek
15
1.7k
Practical Orchestrator
shlominoach
190
11k
Being A Developer After 40
akosma
90
590k
個人開発の失敗を避けるイケてる考え方 / tips for indie hackers
panda_program
113
20k
Done Done
chrislema
185
16k
The Cost Of JavaScript in 2023
addyosmani
53
8.9k
Transcript
Probabilistic Data Structures in Redis Srinivasan Rangarajan @cnu
Srinivasan Rangarajan •
[email protected]
• @cnu • https://cnu.name
Log Analysis
User Events Kinesis Firehose ELK
Sample Event Data { "ip": "123.123.123.123", "client_id": 232, "user_id": "35827",
"email": "
[email protected]
", "product_id": "ABC-12345", "image_id": 3, "action": "pageview", "datetime": "2017-06-29T12:42:53Z", }
Challenges • 100s of Millions of events processed every day
• Peak of ~10 Million events in an hour • Needed Real Time processing • Low memory/storage requirements
None
User Events Kinesis Firehose ELK AWS Lambda Redis
Cost Accuracy Scale
Probabilistic Data Structures
xkcd/1132
Loading Modules • ./redis-server --loadmodule /path/to/module.so • redis.conf loadmodule /path/to/module.so
• MODULE LOAD /path/to/module.so
Execute custom commands >>> import redis >>> r = redis.Redis()
>>> out = r.execute_command('CMD param1 param2')
Data Structures • HyperLogLog • TopK • CountMinSketch • Bloom
Filters
HyperLogLog Count the Cardinality of a Set
Count Unique Visitors/hour >>> r.pfadd('users:2017083120', 123, 456, 789) 1 >>>
r.pfcount('users:2017083120') 3 >>> r.pfadd('users:2017083120', 456) 0
Merge Hourly into Daily >>> r.pfadd('users:2017083121', 121, 454, 787) 1
>>> r.pfmerge('users:20170831', 'users:2017083120', 'users:2017083121') True >>> r.pfcount('users:20170831’) 6
Links • https://redis.io/commands#hyperloglog • http://antirez.com/news/75
TopK Get top K elements in a set
Top K IP Addresses >>> r.execute_command('TOPK.ADD ip:20170831 3 123.45.67.89') >>>
r.execute_command('TOPK.ADD ip:20170831 3 123.45.67.90') >>> r.execute_command('TOPK.ADD ip:20170831 3 123.45.67.91') 1L >>> r.execute_command('TOPK.ADD ip:20170831 3 123.45.67.92') -1L
Top K IP Addresses >>> r.zrange('ip:20170831’, 0, -1, withscores=True) [('TOPK:1.0.1:1.0:\xff\xff\xff\xff\xff\xff\xff\xff\x04\x00\x0
0\x00\x00\x00\x00\x00', 1.0), ('123.45.67.89', 1.0), ('123.45.67.90', 1.0), ('123.45.67.92', 2.0)]
Links • https://github.com/RedisLabsModules/topk
CountMinSketch Count the frequency of items
1 2 3 4 h1 0 0 0 0 h2
0 0 0 0 h3 0 0 0 0
1 2 3 4 h1 1 0 0 0 h2
0 1 0 0 h3 0 0 1 0 h1(s1) = 1; h2(s1) = 2; h3(s1) = 3
1 2 3 4 h1 1 0 0 1 h2
0 1 0 1 h3 0 0 1 1 h1(s2) = 4; h2(s2) = 4; h3(s2) = 4
1 2 3 4 h1 2 1 1 1 h2
0 1 0 1 h3 0 0 1 1 h1(s3) = 1; h2(s3) = 1; h3(s3) = 1
User Pageview counter >>> r.execute_command('CMS.INCRBY u:pv:20170831 123 1 456 3
789 2 234 1 567 1') 'OK' >>> r.execute_command('CMS.QUERY u:pv:20170831 123 456 789 234 567') [1L, 3L, 2L, 1L, 1L]
Merge Counters >>> r.execute_command('CMS.MERGE u:pv:201708 3 u:pv:20170829 u:pv:20170830 u:pv:20170831') 'OK'
Links • https://github.com/RedisLabsModules/countminsketch • https://redislabs.com/blog/count-min-sketch-the-art-and-science- of-estimating-stuff/
Bloom Filters Test Membership in a Set
0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 Empty Bit Array
0 0 1 0 0 1 0 0 1 0
0 0 0 0 0 0 h1(item1) = 2; h2(item1) = 5; h3(item1) = 8 Insert Item 1
0 0 1 0 0 1 0 1 1 0
1 0 0 0 0 0 h1(item2) = 7; h2(item2) = 8; h3(item2) = 10 Insert Item 2
0 0 1 0 0 1 0 1 1 0
1 0 0 0 0 0 h1(item3) = 2; h2(item3) = 11; h3(item3) = 0 Check Item3
0 0 1 0 0 1 0 1 1 0
1 0 0 0 0 0 h1(item4) = 10; h2(item4) = 8; h3(item4) = 7 Check Item4
Bloom Filter returns What it means False Definitely not in
the set True Maybe in the set
Check User Session >>> r.execute_command('BF.MADD u:sess:20170831 123 456 789') [1L,
1L, 1L] >>> r.execute_command('BF.EXISTS u:sess:20170831 456') 1L >>> r.execute_command('BF.EXISTS u:sess:20170831 234') 0L
Links • https://github.com/RedisLabsModules/rebloom • https://redislabs.com/blog/rebloom-bloom-filter-datatype-redis/ • https://github.com/kristoff-it/redis-cuckoofilter - Better than
bloom filters
“An 80% solution today is much better than an 100%
solution tomorrow.”
Thank You https://cnu.name/talks/redisconf-2018/