Redisconf 2018: Probabilistic Data Structures
cnu
April 25, 2018
Real Time Log Analysis using Probabilistic Data Structures in Redis. Presented at Redisconf 2018.
Transcript
Probabilistic Data Structures in Redis Srinivasan Rangarajan @cnu
Srinivasan Rangarajan • [email protected] • @cnu • https://cnu.name
Log Analysis
User Events Kinesis Firehose ELK
Sample Event Data
{
  "ip": "123.123.123.123",
  "client_id": 232,
  "user_id": "35827",
  "email": "[email protected]",
  "product_id": "ABC-12345",
  "image_id": 3,
  "action": "pageview",
  "datetime": "2017-06-29T12:42:53Z"
}
Challenges
• 100s of millions of events processed every day
• Peak of ~10 million events in an hour
• Needed real-time processing
• Low memory/storage requirements
User Events Kinesis Firehose ELK AWS Lambda Redis
Cost Accuracy Scale
Probabilistic Data Structures
xkcd/1132
Loading Modules
• On the command line: ./redis-server --loadmodule /path/to/module.so
• In redis.conf: loadmodule /path/to/module.so
• At runtime: MODULE LOAD /path/to/module.so
Execute custom commands
>>> import redis
>>> r = redis.Redis()
>>> out = r.execute_command('CMD param1 param2')
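The slide sends the whole command as a single string. As a minimal sketch, the tokens can also be passed as separate arguments, which redis-py's `execute_command` accepts and which keeps a parameter containing spaces in one piece. The helper name `module_command` is hypothetical and not part of redis-py or any module.

```python
def module_command(name, *params):
    """Return the tokens of a module command as a flat tuple of strings.

    Hypothetical helper for illustration only.
    """
    return (name,) + tuple(str(p) for p in params)


tokens = module_command("CMD", "param1", "param 2 with spaces")
print(tokens)
# With a connected client (assumed to be `r = redis.Redis()` as on the slide):
#   out = r.execute_command(*tokens)
```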
Data Structures
• HyperLogLog
• TopK
• CountMinSketch
• Bloom Filters
HyperLogLog Count the Cardinality of a Set
Count Unique Visitors/hour
>>> r.pfadd('users:2017083120', 123, 456, 789)
1
>>> r.pfcount('users:2017083120')
3
>>> r.pfadd('users:2017083120', 456)
0
Merge Hourly into Daily
>>> r.pfadd('users:2017083121', 121, 454, 787)
1
>>> r.pfmerge('users:20170831', 'users:2017083120', 'users:2017083121')
True
>>> r.pfcount('users:20170831')
6
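The keys on these slides encode the hour (`users:2017083120`) and the day (`users:20170831`). A minimal sketch of deriving both from an event's `datetime` field is shown here, assuming timestamps always use the UTC `...Z` format of the sample event. The function name `hll_keys` and the `prefix` parameter are hypothetical.

```python
from datetime import datetime, timezone


def hll_keys(event_datetime, prefix="users"):
    """Return (hourly_key, daily_key) for a timestamp like '2017-06-29T12:42:53Z'.

    Hypothetical helper; the key layout mirrors the keys shown on the slides.
    """
    ts = datetime.strptime(event_datetime, "%Y-%m-%dT%H:%M:%SZ")
    ts = ts.replace(tzinfo=timezone.utc)
    return (f"{prefix}:{ts:%Y%m%d%H}", f"{prefix}:{ts:%Y%m%d}")


hourly, daily = hll_keys("2017-08-31T20:15:00Z")
print(hourly, daily)  # users:2017083120 users:20170831
# With a connected client, an event's user_id would then be added with:
#   r.pfadd(hourly, user_id)
```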
Links
• https://redis.io/commands#hyperloglog
• http://antirez.com/news/75
TopK Get top K elements in a set
Top K IP Addresses
>>> r.execute_command('TOPK.ADD ip:20170831 3 123.45.67.89')
>>> r.execute_command('TOPK.ADD ip:20170831 3 123.45.67.90')
>>> r.execute_command('TOPK.ADD ip:20170831 3 123.45.67.91')
1L
>>> r.execute_command('TOPK.ADD ip:20170831 3 123.45.67.92')
-1L
Top K IP Addresses
>>> r.zrange('ip:20170831', 0, -1, withscores=True)
[('TOPK:1.0.1:1.0:\xff\xff\xff\xff\xff\xff\xff\xff\x04\x00\x00\x00\x00\x00\x00\x00', 1.0),
 ('123.45.67.89', 1.0),
 ('123.45.67.90', 1.0),
 ('123.45.67.92', 2.0)]
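The `zrange` output above mixes one bookkeeping member (the one starting with `TOPK:`) with the tracked IP addresses. A minimal sketch of cleaning that result up on the client side follows, assuming bookkeeping members can be recognised by that prefix, which is inferred only from the output on this slide. The helper name `top_items` is hypothetical.

```python
def top_items(zrange_result, meta_prefix="TOPK:"):
    """Drop bookkeeping members and sort the rest by score, highest first.

    Hypothetical helper; the "TOPK:" prefix is an assumption based on the
    sample output, not a documented guarantee of the module.
    """
    items = []
    for member, score in zrange_result:
        if isinstance(member, bytes):
            member = member.decode("utf-8", errors="replace")
        if member.startswith(meta_prefix):
            continue  # skip the module's own metadata entry
        items.append((member, score))
    return sorted(items, key=lambda pair: pair[1], reverse=True)


sample = [
    ("TOPK:1.0.1:1.0:...", 1.0),  # bookkeeping member, abbreviated here
    ("123.45.67.89", 1.0),
    ("123.45.67.90", 1.0),
    ("123.45.67.92", 2.0),
]
print(top_items(sample))
# [('123.45.67.92', 2.0), ('123.45.67.89', 1.0), ('123.45.67.90', 1.0)]
```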
Links • https://github.com/RedisLabsModules/topk
CountMinSketch Count the frequency of items
Empty sketch (one row per hash function, one column per bucket)
      1  2  3  4
h1    0  0  0  0
h2    0  0  0  0
h3    0  0  0  0
h1(s1) = 1; h2(s1) = 2; h3(s1) = 3
      1  2  3  4
h1    1  0  0  0
h2    0  1  0  0
h3    0  0  1  0
h1(s2) = 4; h2(s2) = 4; h3(s2) = 4
      1  2  3  4
h1    1  0  0  1
h2    0  1  0  1
h3    0  0  1  1
h1(s3) = 1; h2(s3) = 1; h3(s3) = 1
      1  2  3  4
h1    2  0  0  1
h2    1  1  0  1
h3    1  0  1  1
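The walk-through above can be sketched in plain Python. This is an illustration of the general Count-Min Sketch technique, not the Redis module's implementation: each row uses a salted SHA-256 digest as its hash function (an assumption made for this sketch), an increment touches one counter per row, and a query returns the minimum of those counters, so the estimate can overshoot but never undershoot the true count. The class and method names are hypothetical.

```python
import hashlib


class CountMinSketch:
    """Illustrative Count-Min Sketch; not the Redis module's implementation."""

    def __init__(self, width=1024, depth=3):
        self.width = width
        self.depth = depth
        self.rows = [[0] * width for _ in range(depth)]

    def _positions(self, item):
        # One column per row, derived from a row-salted SHA-256 digest.
        for row in range(self.depth):
            digest = hashlib.sha256(f"{row}:{item}".encode()).digest()
            yield row, int.from_bytes(digest[:8], "big") % self.width

    def incrby(self, item, count=1):
        for row, col in self._positions(item):
            self.rows[row][col] += count

    def query(self, item):
        # The smallest counter is the one least inflated by collisions.
        return min(self.rows[row][col] for row, col in self._positions(item))

    def merge(self, other):
        # Sketches with equal dimensions merge by adding counters cell by cell.
        if (self.width, self.depth) != (other.width, other.depth):
            raise ValueError("sketches must have the same dimensions")
        merged = CountMinSketch(self.width, self.depth)
        for r in range(self.depth):
            merged.rows[r] = [a + b for a, b in zip(self.rows[r], other.rows[r])]
        return merged


day1 = CountMinSketch()
day1.incrby("123", 1)
day1.incrby("456", 3)
day2 = CountMinSketch()
day2.incrby("456", 2)
month = day1.merge(day2)
print(month.query("456"))  # never less than the true count of 5
```

Widening the table lowers the chance of collisions, and adding rows lowers the chance that every row collides at once, at the cost of more memory.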
User Pageview counter
>>> r.execute_command('CMS.INCRBY u:pv:20170831 123 1 456 3 789 2 234 1 567 1')
'OK'
>>> r.execute_command('CMS.QUERY u:pv:20170831 123 456 789 234 567')
[1L, 3L, 2L, 1L, 1L]
Merge Counters
>>> r.execute_command('CMS.MERGE u:pv:201708 3 u:pv:20170829 u:pv:20170830 u:pv:20170831')
'OK'
Links
• https://github.com/RedisLabsModules/countminsketch
• https://redislabs.com/blog/count-min-sketch-the-art-and-science-of-estimating-stuff/
Bloom Filters Test Membership in a Set
Empty Bit Array
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
Insert Item 1
h1(item1) = 2; h2(item1) = 5; h3(item1) = 8
0 0 1 0 0 1 0 0 1 0 0 0 0 0 0 0
Insert Item 2
h1(item2) = 7; h2(item2) = 8; h3(item2) = 10
0 0 1 0 0 1 0 1 1 0 1 0 0 0 0 0
Check Item 3
h1(item3) = 2; h2(item3) = 11; h3(item3) = 0
0 0 1 0 0 1 0 1 1 0 1 0 0 0 0 0
Check Item 4
h1(item4) = 10; h2(item4) = 8; h3(item4) = 7
0 0 1 0 0 1 0 1 1 0 1 0 0 0 0 0
Bloom Filter returns    What it means
False                   Definitely not in the set
True                    Maybe in the set
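The bit-array walk-through and the table above can be sketched in plain Python. This illustrates the general Bloom filter technique, not the Redis module's implementation: the three hash functions are salted SHA-256 digests (an assumption made for this sketch), adding an item sets one bit per hash, and a lookup reports True only when every one of those bits is set. The class and method names are hypothetical.

```python
import hashlib


class BloomFilter:
    """Illustrative Bloom filter; not the Redis module's implementation."""

    def __init__(self, size=16, num_hashes=3):
        self.size = size
        self.num_hashes = num_hashes
        self.bits = [0] * size

    def _positions(self, item):
        # One bit position per hash function, from a seed-salted digest.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos] = 1

    def exists(self, item):
        # False: at least one bit is unset, so the item was never added.
        # True: all bits are set, possibly by other items (false positive).
        return all(self.bits[pos] for pos in self._positions(item))


bf = BloomFilter(size=1024)
for user_id in ("123", "456", "789"):
    bf.add(user_id)
print(bf.exists("456"))  # True, because added items are never missed
print(bf.exists("234"))  # usually False; True here would be a false positive
```

A larger bit array lowers the false-positive rate for a given number of items, which is the trade-off between memory and accuracy that the table summarises.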
Check User Session
>>> r.execute_command('BF.MADD u:sess:20170831 123 456 789')
[1L, 1L, 1L]
>>> r.execute_command('BF.EXISTS u:sess:20170831 456')
1L
>>> r.execute_command('BF.EXISTS u:sess:20170831 234')
0L
Links
• https://github.com/RedisLabsModules/rebloom
• https://redislabs.com/blog/rebloom-bloom-filter-datatype-redis/
• https://github.com/kristoff-it/redis-cuckoofilter - Better than bloom filters
“An 80% solution today is much better than a 100% solution tomorrow.”
Thank You https://cnu.name/talks/redisconf-2018/