Redisconf 2018: Probabilistic Data Structures
cnu
April 25, 2018
Real Time Log Analysis using Probabilistic Data Structures in Redis. Presented at Redisconf 2018.
Transcript
Probabilistic Data Structures in Redis Srinivasan Rangarajan @cnu
Srinivasan Rangarajan • [email protected] • @cnu • https://cnu.name
Log Analysis
User Events Kinesis Firehose ELK
Sample Event Data
{
  "ip": "123.123.123.123",
  "client_id": 232,
  "user_id": "35827",
  "email": "[email protected]",
  "product_id": "ABC-12345",
  "image_id": 3,
  "action": "pageview",
  "datetime": "2017-06-29T12:42:53Z"
}
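As a rough illustration of how one such event could be bucketed for the hourly and daily counters shown in the later slides, here is a minimal sketch. The function name `bucket_keys` and the `prefix` parameter are hypothetical; the key layout (`users:YYYYMMDDHH`, `users:YYYYMMDD`) is taken from the HyperLogLog examples further down.

```python
import json
from datetime import datetime

# Event shaped like the sample above (email field left out).
RAW_EVENT = """{"ip": "123.123.123.123", "client_id": 232, "user_id": "35827",
 "product_id": "ABC-12345", "image_id": 3, "action": "pageview",
 "datetime": "2017-06-29T12:42:53Z"}"""


def bucket_keys(event, prefix="users"):
    """Return (hourly_key, daily_key) derived from the event's UTC timestamp."""
    ts = datetime.strptime(event["datetime"], "%Y-%m-%dT%H:%M:%SZ")
    return f"{prefix}:{ts:%Y%m%d%H}", f"{prefix}:{ts:%Y%m%d}"


event = json.loads(RAW_EVENT)
hourly_key, daily_key = bucket_keys(event)
print(hourly_key, daily_key)  # users:2017062912 users:20170629
```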
Challenges
• 100s of Millions of events processed every day
• Peak of ~10 Million events in an hour
• Needed Real Time processing
• Low memory/storage requirements
User Events Kinesis Firehose ELK AWS Lambda Redis
Cost Accuracy Scale
Probabilistic Data Structures
xkcd/1132
Loading Modules
• ./redis-server --loadmodule /path/to/module.so
• redis.conf: loadmodule /path/to/module.so
• MODULE LOAD /path/to/module.so
Execute custom commands
>>> import redis
>>> r = redis.Redis()
>>> out = r.execute_command('CMD param1 param2')
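redis-py's `execute_command` also accepts the command name and its parameters as separate positional arguments, which avoids building one long string by hand. The helper below is a hypothetical sketch that flattens a dict of counts into such an argument list for the `CMS.INCRBY` module command used in a later slide; it only builds the list, so it runs without a Redis server.

```python
def cms_incrby_args(key, counts):
    """Flatten {item: increment} into an argument list for CMS.INCRBY."""
    args = ["CMS.INCRBY", key]
    for item, increment in counts.items():
        args.extend([str(item), int(increment)])
    return args


args = cms_incrby_args("u:pv:20170831", {123: 1, 456: 3})
print(args)  # ['CMS.INCRBY', 'u:pv:20170831', '123', 1, '456', 3]

# With a live connection `r`, the list would be sent as:
#   r.execute_command(*args)
```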
Data Structures
• HyperLogLog
• TopK
• CountMinSketch
• Bloom Filters
HyperLogLog Count the Cardinality of a Set
Count Unique Visitors/hour
>>> r.pfadd('users:2017083120', 123, 456, 789)
1
>>> r.pfcount('users:2017083120')
3
>>> r.pfadd('users:2017083120', 456)
0
Merge Hourly into Daily
>>> r.pfadd('users:2017083121', 121, 454, 787)
1
>>> r.pfmerge('users:20170831', 'users:2017083120', 'users:2017083121')
True
>>> r.pfcount('users:20170831')
6
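To show why hourly sketches can be merged into a daily one without double counting, here is a simplified pure-Python HyperLogLog. It is an illustrative sketch, not Redis's implementation: the class name `TinyHLL`, the SHA-1 hash, and the register count are all assumptions chosen for readability.

```python
import hashlib
import math


class TinyHLL:
    """Simplified HyperLogLog with 2**p registers (not Redis's encoding)."""

    def __init__(self, p=10):
        self.p = p
        self.m = 1 << p
        self.registers = [0] * self.m

    def add(self, item):
        # 64-bit hash: the top p bits pick a register, the rest give the rank.
        h = int.from_bytes(hashlib.sha1(str(item).encode()).digest()[:8], "big")
        idx = h >> (64 - self.p)
        rest = h & ((1 << (64 - self.p)) - 1)
        # Rank = position of the leftmost 1-bit within the remaining bits.
        rank = (64 - self.p) - rest.bit_length() + 1
        self.registers[idx] = max(self.registers[idx], rank)

    def count(self):
        alpha = 0.7213 / (1 + 1.079 / self.m)  # bias constant, valid for m >= 128
        raw = alpha * self.m ** 2 / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * self.m and zeros:
            # Small-range correction: fall back to linear counting.
            return round(self.m * math.log(self.m / zeros))
        return round(raw)

    def merge(self, other):
        # The union of two sketches is the element-wise maximum of registers.
        self.registers = [max(a, b) for a, b in zip(self.registers, other.registers)]


hour20, hour21 = TinyHLL(), TinyHLL()
for uid in range(0, 1000):
    hour20.add(uid)
for uid in range(500, 1500):  # 500 users overlap with the previous hour
    hour21.add(uid)

day = TinyHLL()
day.merge(hour20)
day.merge(hour21)
print(hour20.count(), day.count())  # estimates close to 1000 and 1500
```

Because merging takes the maximum per register, the overlapping 500 users are counted once, which is the property the hourly-to-daily merge relies on.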
Links
• https://redis.io/commands#hyperloglog
• http://antirez.com/news/75
TopK Get top K elements in a set
Top K IP Addresses
>>> r.execute_command('TOPK.ADD ip:20170831 3 123.45.67.89')
>>> r.execute_command('TOPK.ADD ip:20170831 3 123.45.67.90')
>>> r.execute_command('TOPK.ADD ip:20170831 3 123.45.67.91')
1L
>>> r.execute_command('TOPK.ADD ip:20170831 3 123.45.67.92')
-1L
Top K IP Addresses
>>> r.zrange('ip:20170831', 0, -1, withscores=True)
[('TOPK:1.0.1:1.0:\xff\xff\xff\xff\xff\xff\xff\xff\x04\x00\x00\x00\x00\x00\x00\x00', 1.0),
 ('123.45.67.89', 1.0),
 ('123.45.67.90', 1.0),
 ('123.45.67.92', 2.0)]
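The slides do not describe how the module keeps only K entries. One well-known approach that behaves similarly (a full summary evicts its smallest entry, and the newcomer inherits that count plus one) is the Space-Saving algorithm. The sketch below uses it as a stand-in; the class name and tie-breaking rule are assumptions and need not match the module.

```python
class SpaceSaving:
    """Approximate top-K counter using the Space-Saving algorithm."""

    def __init__(self, k):
        self.k = k
        self.counts = {}

    def add(self, item):
        if item in self.counts:
            self.counts[item] += 1
        elif len(self.counts) < self.k:
            self.counts[item] = 1
        else:
            # Evict the current minimum; the newcomer takes over its count + 1,
            # so counts may overestimate but frequent items are not lost.
            victim = min(self.counts, key=self.counts.get)
            floor = self.counts.pop(victim)
            self.counts[item] = floor + 1

    def top(self):
        return sorted(self.counts.items(), key=lambda kv: kv[1], reverse=True)


ips = SpaceSaving(k=3)
for ip in ("123.45.67.89", "123.45.67.90", "123.45.67.91", "123.45.67.92"):
    ips.add(ip)
print(ips.top())  # three entries; the newest one carries a count of 2
```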
Links
• https://github.com/RedisLabsModules/topk
CountMinSketch Count the frequency of items
      1  2  3  4
h1    0  0  0  0
h2    0  0  0  0
h3    0  0  0  0

      1  2  3  4
h1    1  0  0  0
h2    0  1  0  0
h3    0  0  1  0
h1(s1) = 1; h2(s1) = 2; h3(s1) = 3

      1  2  3  4
h1    1  0  0  1
h2    0  1  0  1
h3    0  0  1  1
h1(s2) = 4; h2(s2) = 4; h3(s2) = 4

      1  2  3  4
h1    2  0  0  1
h2    1  1  0  1
h3    1  0  1  1
h1(s3) = 1; h2(s3) = 1; h3(s3) = 1
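The update rule behind these tables can be replayed in a few lines: each item increments one counter per hash row, and a query returns the minimum of those counters. This is a minimal sketch that hard-codes the column positions given on the slides for s1, s2 and s3 instead of computing real hashes.

```python
# Column positions (1-based, as on the slides) for each item under h1, h2, h3.
POSITIONS = {"s1": (1, 2, 3), "s2": (4, 4, 4), "s3": (1, 1, 1)}

table = [[0] * 4 for _ in range(3)]  # 3 hash rows x 4 counters


def increment(item, by=1):
    for row, col in enumerate(POSITIONS[item]):
        table[row][col - 1] += by


def estimate(item):
    # The smallest counter is the one least inflated by collisions.
    return min(table[row][col - 1] for row, col in enumerate(POSITIONS[item]))


for item in ("s1", "s2", "s3"):
    increment(item)

for row in table:
    print(row)
print(estimate("s1"), estimate("s2"), estimate("s3"))  # 1 1 1
```

s1 and s3 collide in row h1, which pushes that counter to 2, but taking the minimum across rows still recovers a count of 1 for both.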
User Pageview counter
>>> r.execute_command('CMS.INCRBY u:pv:20170831 123 1 456 3 789 2 234 1 567 1')
'OK'
>>> r.execute_command('CMS.QUERY u:pv:20170831 123 456 789 234 567')
[1L, 3L, 2L, 1L, 1L]
Merge Counters
>>> r.execute_command('CMS.MERGE u:pv:201708 3 u:pv:20170829 u:pv:20170830 u:pv:20170831')
'OK'
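Merging works because sketches with the same dimensions and hash functions can be added cell by cell. The following pure-Python sketch illustrates that idea; the class, the SHA-1 based hashing, and the default width and depth are assumptions for illustration and are not the module's internals.

```python
import hashlib


class CountMinSketch:
    def __init__(self, width=1000, depth=5):
        self.width, self.depth = width, depth
        self.rows = [[0] * width for _ in range(depth)]

    def _columns(self, item):
        # One column per row, derived by salting the hash with the row number.
        for row in range(self.depth):
            digest = hashlib.sha1(f"{row}:{item}".encode()).digest()
            yield row, int.from_bytes(digest[:8], "big") % self.width

    def incrby(self, item, by=1):
        for row, col in self._columns(item):
            self.rows[row][col] += by

    def query(self, item):
        return min(self.rows[row][col] for row, col in self._columns(item))

    def merge(self, *others):
        # Sketches with identical dimensions merge by adding counters cell-wise.
        for other in others:
            if (other.width, other.depth) != (self.width, self.depth):
                raise ValueError("sketch dimensions differ")
            for mine, theirs in zip(self.rows, other.rows):
                for i, value in enumerate(theirs):
                    mine[i] += value


day29, day30, day31 = CountMinSketch(), CountMinSketch(), CountMinSketch()
day29.incrby(123, 2)
day30.incrby(123, 1)
day30.incrby(456, 4)
day31.incrby(456, 3)

month = CountMinSketch()
month.merge(day29, day30, day31)
print(month.query(123), month.query(456))  # never below the true totals 3 and 7
```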
Links
• https://github.com/RedisLabsModules/countminsketch
• https://redislabs.com/blog/count-min-sketch-the-art-and-science-of-estimating-stuff/
Bloom Filters Test Membership in a Set
Empty Bit Array
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

Insert Item 1
h1(item1) = 2; h2(item1) = 5; h3(item1) = 8
0 0 1 0 0 1 0 0 1 0 0 0 0 0 0 0

Insert Item 2
h1(item2) = 7; h2(item2) = 8; h3(item2) = 10
0 0 1 0 0 1 0 1 1 0 1 0 0 0 0 0

Check Item3
h1(item3) = 2; h2(item3) = 11; h3(item3) = 0
0 0 1 0 0 1 0 1 1 0 1 0 0 0 0 0

Check Item4
h1(item4) = 10; h2(item4) = 8; h3(item4) = 7
0 0 1 0 0 1 0 1 1 0 1 0 0 0 0 0
Bloom Filter returns    What it means
False                   Definitely not in the set
True                    Maybe in the set
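The two outcomes can be reproduced with the bit positions from the preceding slides. This is a minimal sketch that hard-codes those positions (0-based) rather than computing real hashes: item3 hits an unset bit, so it is definitely absent, while item4 happens to land only on bits already set by item1 and item2, so the filter can only answer "maybe".

```python
# Bit positions (0-based) taken from the slides for h1, h2, h3.
POSITIONS = {
    "item1": (2, 5, 8),
    "item2": (7, 8, 10),
    "item3": (2, 11, 0),
    "item4": (10, 8, 7),
}

bits = [0] * 16  # empty bit array


def insert(item):
    for pos in POSITIONS[item]:
        bits[pos] = 1


def might_contain(item):
    # True only if every bit for the item is set.
    return all(bits[pos] for pos in POSITIONS[item])


insert("item1")
insert("item2")
print(bits)
print(might_contain("item3"))  # False: bits 11 and 0 are unset
print(might_contain("item4"))  # True, although item4 was never inserted
```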
Check User Session
>>> r.execute_command('BF.MADD u:sess:20170831 123 456 789')
[1L, 1L, 1L]
>>> r.execute_command('BF.EXISTS u:sess:20170831 456')
1L
>>> r.execute_command('BF.EXISTS u:sess:20170831 234')
0L
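For readers without the module loaded, a rough pure-Python equivalent of the add and exists calls might look like the sketch below. The class, its method names, the SHA-1 based hashing, and the default sizes are assumptions for illustration, not the module's implementation.

```python
import hashlib


class BloomFilter:
    def __init__(self, size=1024, hashes=3):
        self.size, self.hashes = size, hashes
        self.bits = bytearray(size)  # one byte per bit, kept simple for clarity

    def _positions(self, item):
        # Derive one position per hash function by salting with a seed.
        for seed in range(self.hashes):
            digest = hashlib.sha1(f"{seed}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, item):
        """Return 1 if at least one bit was newly set, else 0."""
        positions = list(self._positions(item))
        newly_set = not all(self.bits[p] for p in positions)
        for p in positions:
            self.bits[p] = 1
        return int(newly_set)

    def madd(self, *items):
        return [self.add(item) for item in items]

    def exists(self, item):
        return int(all(self.bits[p] for p in self._positions(item)))


sessions = BloomFilter()
added = sessions.madd(123, 456, 789)
print(added)
print(sessions.exists(456))  # 1: inserted items are never reported as absent
print(sessions.exists(234))  # very likely 0, but a false positive is possible
```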
Links
• https://github.com/RedisLabsModules/rebloom
• https://redislabs.com/blog/rebloom-bloom-filter-datatype-redis/
• https://github.com/kristoff-it/redis-cuckoofilter - Better than bloom filters
“An 80% solution today is much better than a 100% solution tomorrow.”
Thank You https://cnu.name/talks/redisconf-2018/