Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
Redisconf 2018: Probabilistic Data Structures
Search
Sponsored
·
Your Podcast. Everywhere. Effortlessly.
Share. Educate. Inspire. Entertain. You do you. We'll handle the rest.
→
cnu
April 25, 2018
Programming
1
1k
Redisconf 2018: Probabilistic Data Structures
Real Time Log Analysis using Probabilistic Data Structures in Redis. Presented at Redisconf 2018.
cnu
April 25, 2018
Tweet
Share
More Decks by cnu
See All by cnu
The Rocky Road from Monolithic to Microservices Architecture
cnu
0
1.1k
Probabilistic Data Structures
cnu
0
690
AWS Lambda - Pycon India 2016
cnu
0
570
ZeroMQ - PyCon India 2013
cnu
2
1.6k
Other Decks in Programming
See All in Programming
ふつうのRubyist、ちいさなデバイス、大きな一年 / Ordinary Rubyists, Tiny Devices, Big Year
chobishiba
1
480
What Spring Developers Should Know About Jakarta EE
ivargrimstad
0
480
Claude Code Skill入門
mayahoney
0
410
コードレビューをしない選択 #でぃーぷらすトウキョウ
kajitack
3
1k
守る「だけ」の優しいEMを抜けて、 事業とチームを両方見る視点を身につけた話
maroon8021
3
1.1k
The Past, Present, and Future of Enterprise Java
ivargrimstad
0
840
「効かない!」依存性注入(DI)を活用したAPI Platformのエラーハンドリング奮闘記 / "It’s Not Working!" A Struggle with Error Handling in API Platform using DI
mkmk884
0
110
OTP を自動で入力する裏技
megabitsenmzq
0
120
PHPのバージョンアップ時にも役立ったAST(2026年版)
matsuo_atsushi
0
190
20260228_JAWS_Beginner_Kansai
takuyay0ne
5
600
S3ストレージクラスの「見える」「ある」「使える」は全部違う ─ 体験から見た、仕様の深淵を覗く
ya_ma23
0
820
go directiveを最新にしすぎないで欲しい話──あるいは、Go 1.26からgo mod initで作られるgo directiveの値が変わる話 / Go 1.26 リリースパーティ
arthur1
2
570
Featured
See All Featured
How To Speak Unicorn (iThemes Webinar)
marktimemedia
1
410
What's in a price? How to price your products and services
michaelherold
247
13k
Lightning talk: Run Django tests with GitHub Actions
sabderemane
0
150
The Pragmatic Product Professional
lauravandoore
37
7.2k
Fashionably flexible responsive web design (full day workshop)
malarkey
408
66k
Navigating Algorithm Shifts & AI Overviews - #SMXNext
aleyda
1
1.2k
Creating an realtime collaboration tool: Agile Flush - .NET Oxford
marcduiker
35
2.4k
Sharpening the Axe: The Primacy of Toolmaking
bcantrill
46
2.7k
Let's Do A Bunch of Simple Stuff to Make Websites Faster
chriscoyier
508
140k
Performance Is Good for Brains [We Love Speed 2024]
tammyeverts
12
1.5k
Mobile First: as difficult as doing things right
swwweet
225
10k
Why Our Code Smells
bkeepers
PRO
340
58k
Transcript
Probabilistic Data Structures in Redis Srinivasan Rangarajan @cnu
Srinivasan Rangarajan •
[email protected]
• @cnu • https://cnu.name
Log Analysis
User Events Kinesis Firehose ELK
Sample Event Data { "ip": "123.123.123.123", "client_id": 232, "user_id": "35827",
"email": "
[email protected]
", "product_id": "ABC-12345", "image_id": 3, "action": "pageview", "datetime": "2017-06-29T12:42:53Z", }
Challenges • 100s of Millions of events processed every day
• Peak of ~10 Million events in an hour • Needed Real Time processing • Low memory/storage requirements
None
User Events Kinesis Firehose ELK AWS Lambda Redis
Cost Accuracy Scale
Probabilistic Data Structures
xkcd/1132
Loading Modules • ./redis-server --loadmodule /path/to/module.so • redis.conf loadmodule /path/to/module.so
• MODULE LOAD /path/to/module.so
Execute custom commands >>> import redis >>> r = redis.Redis()
>>> out = r.execute_command('CMD param1 param2')
Data Structures • HyperLogLog • TopK • CountMinSketch • Bloom
Filters
HyperLogLog Count the Cardinality of a Set
Count Unique Visitors/hour >>> r.pfadd('users:2017083120', 123, 456, 789) 1 >>>
r.pfcount('users:2017083120') 3 >>> r.pfadd('users:2017083120', 456) 0
Merge Hourly into Daily >>> r.pfadd('users:2017083121', 121, 454, 787) 1
>>> r.pfmerge('users:20170831', 'users:2017083120', 'users:2017083121') True >>> r.pfcount('users:20170831’) 6
Links • https://redis.io/commands#hyperloglog • http://antirez.com/news/75
TopK Get top K elements in a set
Top K IP Addresses >>> r.execute_command('TOPK.ADD ip:20170831 3 123.45.67.89') >>>
r.execute_command('TOPK.ADD ip:20170831 3 123.45.67.90') >>> r.execute_command('TOPK.ADD ip:20170831 3 123.45.67.91') 1L >>> r.execute_command('TOPK.ADD ip:20170831 3 123.45.67.92') -1L
Top K IP Addresses >>> r.zrange('ip:20170831’, 0, -1, withscores=True) [('TOPK:1.0.1:1.0:\xff\xff\xff\xff\xff\xff\xff\xff\x04\x00\x0
0\x00\x00\x00\x00\x00', 1.0), ('123.45.67.89', 1.0), ('123.45.67.90', 1.0), ('123.45.67.92', 2.0)]
Links • https://github.com/RedisLabsModules/topk
CountMinSketch Count the frequency of items
1 2 3 4 h1 0 0 0 0 h2
0 0 0 0 h3 0 0 0 0
1 2 3 4 h1 1 0 0 0 h2
0 1 0 0 h3 0 0 1 0 h1(s1) = 1; h2(s1) = 2; h3(s1) = 3
1 2 3 4 h1 1 0 0 1 h2
0 1 0 1 h3 0 0 1 1 h1(s2) = 4; h2(s2) = 4; h3(s2) = 4
1 2 3 4 h1 2 1 1 1 h2
0 1 0 1 h3 0 0 1 1 h1(s3) = 1; h2(s3) = 1; h3(s3) = 1
User Pageview counter >>> r.execute_command('CMS.INCRBY u:pv:20170831 123 1 456 3
789 2 234 1 567 1') 'OK' >>> r.execute_command('CMS.QUERY u:pv:20170831 123 456 789 234 567') [1L, 3L, 2L, 1L, 1L]
Merge Counters >>> r.execute_command('CMS.MERGE u:pv:201708 3 u:pv:20170829 u:pv:20170830 u:pv:20170831') 'OK'
Links • https://github.com/RedisLabsModules/countminsketch • https://redislabs.com/blog/count-min-sketch-the-art-and-science- of-estimating-stuff/
Bloom Filters Test Membership in a Set
0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 Empty Bit Array
0 0 1 0 0 1 0 0 1 0
0 0 0 0 0 0 h1(item1) = 2; h2(item1) = 5; h3(item1) = 8 Insert Item 1
0 0 1 0 0 1 0 1 1 0
1 0 0 0 0 0 h1(item2) = 7; h2(item2) = 8; h3(item2) = 10 Insert Item 2
0 0 1 0 0 1 0 1 1 0
1 0 0 0 0 0 h1(item3) = 2; h2(item3) = 11; h3(item3) = 0 Check Item3
0 0 1 0 0 1 0 1 1 0
1 0 0 0 0 0 h1(item4) = 10; h2(item4) = 8; h3(item4) = 7 Check Item4
Bloom Filter returns What it means False Definitely not in
the set True Maybe in the set
Check User Session >>> r.execute_command('BF.MADD u:sess:20170831 123 456 789') [1L,
1L, 1L] >>> r.execute_command('BF.EXISTS u:sess:20170831 456') 1L >>> r.execute_command('BF.EXISTS u:sess:20170831 234') 0L
Links • https://github.com/RedisLabsModules/rebloom • https://redislabs.com/blog/rebloom-bloom-filter-datatype-redis/ • https://github.com/kristoff-it/redis-cuckoofilter - Better than
bloom filters
“An 80% solution today is much better than an 100%
solution tomorrow.”
Thank You https://cnu.name/talks/redisconf-2018/