Redisconf 2018: Probabilistic Data Structures
cnu
April 25, 2018
Real Time Log Analysis using Probabilistic Data Structures in Redis. Presented at Redisconf 2018.
Transcript
Probabilistic Data Structures in Redis Srinivasan Rangarajan @cnu
Srinivasan Rangarajan • @cnu • https://cnu.name
Log Analysis
User Events Kinesis Firehose ELK
Sample Event Data
{
  "ip": "123.123.123.123",
  "client_id": 232,
  "user_id": "35827",
  "email": "[email protected]",
  "product_id": "ABC-12345",
  "image_id": 3,
  "action": "pageview",
  "datetime": "2017-06-29T12:42:53Z"
}
Challenges
• 100s of millions of events processed every day
• Peak of ~10 million events in an hour
• Needed real-time processing
• Low memory/storage requirements
User Events Kinesis Firehose ELK AWS Lambda Redis
Cost Accuracy Scale
Probabilistic Data Structures
xkcd/1132
Loading Modules
• ./redis-server --loadmodule /path/to/module.so
• redis.conf: loadmodule /path/to/module.so
• MODULE LOAD /path/to/module.so
Execute custom commands
>>> import redis
>>> r = redis.Redis()
>>> out = r.execute_command('CMD param1 param2')
Data Structures
• HyperLogLog
• TopK
• CountMinSketch
• Bloom Filters
HyperLogLog Count the Cardinality of a Set
Count Unique Visitors/hour
>>> r.pfadd('users:2017083120', 123, 456, 789)
1
>>> r.pfcount('users:2017083120')
3
>>> r.pfadd('users:2017083120', 456)
0
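The key suffix on this slide (2017083120) reads as a year, month, day, and hour bucket. The following is a minimal sketch of deriving such a key from the datetime field of the sample event shown earlier. The helper name hourly_key and its prefix parameter are illustrative assumptions and do not come from the talk.

```python
from datetime import datetime


def hourly_key(event, prefix="users"):
    """Build a per-hour key such as 'users:2017062912' from an event dict."""
    # The sample event carries an ISO 8601 UTC timestamp with a trailing 'Z'.
    ts = datetime.strptime(event["datetime"], "%Y-%m-%dT%H:%M:%SZ")
    return f"{prefix}:{ts:%Y%m%d%H}"


event = {"user_id": "35827", "datetime": "2017-06-29T12:42:53Z"}
key = hourly_key(event)
print(key)  # users:2017062912
```

The resulting key and the event's user_id could then be passed to r.pfadd as on the slide.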
Merge Hourly into Daily
>>> r.pfadd('users:2017083121', 121, 454, 787)
1
>>> r.pfmerge('users:20170831', 'users:2017083120', 'users:2017083121')
True
>>> r.pfcount('users:20170831')
6
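To show why hourly sketches can be merged into a daily one without double counting, here is a toy HyperLogLog in pure Python. It is a simplified sketch and not Redis's implementation: the class name, the SHA-1 based hash, and the register count are assumptions chosen for illustration, and it omits the bias corrections a production version would use. Merging takes the register-wise maximum, so ids seen in both hours count once.

```python
import hashlib
import math


class ToyHyperLogLog:
    """Simplified HyperLogLog: 2**p registers, 64-bit hash, no bias tables."""

    def __init__(self, p=10):
        self.p = p
        self.m = 1 << p                      # number of registers
        self.registers = [0] * self.m

    def add(self, item):
        """Record an item; return True if a register changed."""
        digest = hashlib.sha1(str(item).encode("utf-8")).digest()
        x = int.from_bytes(digest[:8], "big")        # 64-bit hash value
        tail_bits = 64 - self.p
        index = x >> tail_bits                       # top p bits pick a register
        tail = x & ((1 << tail_bits) - 1)            # remaining bits
        rank = tail_bits - tail.bit_length() + 1     # position of the first 1-bit
        if rank > self.registers[index]:
            self.registers[index] = rank
            return True
        return False

    def count(self):
        """Return the estimated number of distinct items added."""
        m = self.m
        alpha = 0.7213 / (1 + 1.079 / m)             # constant valid for m >= 128
        raw = alpha * m * m / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * m and zeros:
            return round(m * math.log(m / zeros))    # small-range correction
        return round(raw)

    def merge(self, *others):
        """Union with sketches of the same size: register-wise maximum."""
        for other in others:
            self.registers = [max(a, b) for a, b in zip(self.registers, other.registers)]


hour20, hour21 = ToyHyperLogLog(), ToyHyperLogLog()
for user_id in range(0, 600):
    hour20.add(user_id)
for user_id in range(400, 1000):
    hour21.add(user_id)          # 200 ids overlap with the previous hour

daily = ToyHyperLogLog()
daily.merge(hour20, hour21)
print(daily.count())             # an estimate close to 1000, not an exact count
```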
Links
• https://redis.io/commands#hyperloglog
• http://antirez.com/news/75
TopK Get top K elements in a set
Top K IP Addresses
>>> r.execute_command('TOPK.ADD ip:20170831 3 123.45.67.89')
>>> r.execute_command('TOPK.ADD ip:20170831 3 123.45.67.90')
>>> r.execute_command('TOPK.ADD ip:20170831 3 123.45.67.91')
1L
>>> r.execute_command('TOPK.ADD ip:20170831 3 123.45.67.92')
-1L
Top K IP Addresses
>>> r.zrange('ip:20170831', 0, -1, withscores=True)
[('TOPK:1.0.1:1.0:\xff\xff\xff\xff\xff\xff\xff\xff\x04\x00\x00\x00\x00\x00\x00\x00', 1.0),
 ('123.45.67.89', 1.0),
 ('123.45.67.90', 1.0),
 ('123.45.67.92', 2.0)]
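In the output above, the newcomer 123.45.67.92 ends up with a score of 2.0 after displacing an entry that had a score of 1.0. That pattern is consistent with a Space-Saving style counter, where a new item evicts the current minimum and inherits its count plus one. Whether the module works this way internally is an assumption; the sketch below only illustrates the general technique, and its class name and tie-breaking rule are made up for the example.

```python
class ToySpaceSaving:
    """Track at most k items; a newcomer replaces the current minimum."""

    def __init__(self, k):
        self.k = k
        self.counts = {}

    def add(self, item):
        if item in self.counts:
            self.counts[item] += 1
        elif len(self.counts) < self.k:
            self.counts[item] = 1
        else:
            # Evict one item with the smallest count. The newcomer inherits
            # that count plus one, so its count may be an overestimate.
            victim = min(self.counts, key=self.counts.get)
            self.counts[item] = self.counts.pop(victim) + 1

    def top(self):
        """Return (item, count) pairs, highest count first."""
        return sorted(self.counts.items(), key=lambda kv: kv[1], reverse=True)


topk = ToySpaceSaving(k=3)
for ip in ["123.45.67.89", "123.45.67.90", "123.45.67.91", "123.45.67.92"]:
    topk.add(ip)
print(topk.top())
```

Ties among minimum-count items are broken arbitrarily here (the oldest entry is evicted), so the evicted address differs from the one missing on the slide.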
Links
• https://github.com/RedisLabsModules/topk
CountMinSketch Count the frequency of items
      1  2  3  4
h1    0  0  0  0
h2    0  0  0  0
h3    0  0  0  0

      1  2  3  4
h1    1  0  0  0
h2    0  1  0  0
h3    0  0  1  0
h1(s1) = 1; h2(s1) = 2; h3(s1) = 3

      1  2  3  4
h1    1  0  0  1
h2    0  1  0  1
h3    0  0  1  1
h1(s2) = 4; h2(s2) = 4; h3(s2) = 4

      1  2  3  4
h1    2  0  0  1
h2    1  1  0  1
h3    1  0  1  1
h1(s3) = 1; h2(s3) = 1; h3(s3) = 1
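The grid walkthrough above can be replayed in a few lines of Python. This is a minimal sketch: the function names are illustrative, and hashed_columns is a hypothetical hash family (salted SHA-1) standing in for whatever a real implementation uses. The replay uses the column positions from the slides directly, shifted to 0-based indexing.

```python
import hashlib


def new_sketch(depth=3, width=4):
    """One row of counters per hash function."""
    return [[0] * width for _ in range(depth)]


def increment(sketch, columns, by=1):
    """Add `by` to one cell per row; columns[i] is the cell for row i."""
    for row, col in zip(sketch, columns):
        row[col] += by


def estimate(sketch, columns):
    """Minimum over the item's cells; never below the true count."""
    return min(row[col] for row, col in zip(sketch, columns))


def hashed_columns(item, depth, width):
    """Hypothetical hash family: one salted SHA-1 per row."""
    return [
        int.from_bytes(hashlib.sha1(f"{i}:{item}".encode()).digest()[:8], "big") % width
        for i in range(depth)
    ]


# The slides number columns from 1; Python lists start at 0.
s1, s2, s3 = (0, 1, 2), (3, 3, 3), (0, 0, 0)
grid = new_sketch()
for cols in (s1, s2, s3):
    increment(grid, cols)

print(grid)               # [[2, 0, 0, 1], [1, 1, 0, 1], [1, 0, 1, 1]]
print(estimate(grid, s1))  # 1
```

s1 and s3 collide in row h1, which pushes that cell to 2, but taking the minimum across rows still recovers a count of 1 for s1.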
User Pageview counter
>>> r.execute_command('CMS.INCRBY u:pv:20170831 123 1 456 3 789 2 234 1 567 1')
'OK'
>>> r.execute_command('CMS.QUERY u:pv:20170831 123 456 789 234 567')
[1L, 3L, 2L, 1L, 1L]
Merge Counters
>>> r.execute_command('CMS.MERGE u:pv:201708 3 u:pv:20170829 u:pv:20170830 u:pv:20170831')
'OK'
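The 3 in the command above appears to be the number of source keys that follow. Conceptually, merging Count-Min sketches that share dimensions and hash functions is a cell-wise sum. The sketch below assumes that model; the function name and the three small daily grids are made-up illustrative values, not data from the talk.

```python
def merge_sketches(*sketches):
    """Cell-wise sum of sketches that share depth, width, and hash functions."""
    depth, width = len(sketches[0]), len(sketches[0][0])
    for s in sketches:
        if len(s) != depth or any(len(row) != width for row in s):
            raise ValueError("sketches must have identical dimensions")
    return [
        [sum(s[r][c] for s in sketches) for c in range(width)]
        for r in range(depth)
    ]


# Illustrative per-day grids (3 hash rows, 4 columns each).
day29 = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0]]
day30 = [[0, 0, 0, 2], [0, 0, 0, 2], [0, 0, 0, 2]]
day31 = [[3, 0, 0, 0], [0, 3, 0, 0], [0, 0, 3, 0]]

month = merge_sketches(day29, day30, day31)
print(month)  # [[4, 0, 0, 2], [0, 4, 0, 2], [0, 0, 4, 2]]
```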
Links
• https://github.com/RedisLabsModules/countminsketch
• https://redislabs.com/blog/count-min-sketch-the-art-and-science-of-estimating-stuff/
Bloom Filters Test Membership in a Set
Empty Bit Array
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

Insert Item 1
h1(item1) = 2; h2(item1) = 5; h3(item1) = 8
0 0 1 0 0 1 0 0 1 0 0 0 0 0 0 0

Insert Item 2
h1(item2) = 7; h2(item2) = 8; h3(item2) = 10
0 0 1 0 0 1 0 1 1 0 1 0 0 0 0 0

Check Item 3
h1(item3) = 2; h2(item3) = 11; h3(item3) = 0
0 0 1 0 0 1 0 1 1 0 1 0 0 0 0 0

Check Item 4
h1(item4) = 10; h2(item4) = 8; h3(item4) = 7
0 0 1 0 0 1 0 1 1 0 1 0 0 0 0 0
Bloom Filter returns    What it means
False                   Definitely not in the set
True                    Maybe in the set
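The bit-array slides and the table above can be replayed with a toy Bloom filter. This is a minimal sketch, not the module's implementation: the class name and the salted SHA-1 hash family in positions are assumptions for illustration. The replay sets the bit positions given on the slides (0-based) instead of hashing real items.

```python
import hashlib


class ToyBloomFilter:
    def __init__(self, size=16, num_hashes=3):
        self.size = size
        self.num_hashes = num_hashes
        self.bits = [0] * size

    def positions(self, item):
        """Hypothetical hash family: one salted SHA-1 per hash function."""
        return [
            int.from_bytes(hashlib.sha1(f"{i}:{item}".encode()).digest()[:8], "big") % self.size
            for i in range(self.num_hashes)
        ]

    def set_bits(self, positions):
        for p in positions:
            self.bits[p] = 1

    def all_set(self, positions):
        return all(self.bits[p] for p in positions)

    def add(self, item):
        self.set_bits(self.positions(item))

    def might_contain(self, item):
        """False means definitely absent; True means possibly present."""
        return self.all_set(self.positions(item))


# Replay the slides with their bit positions instead of real hashes.
bf = ToyBloomFilter()
bf.set_bits([2, 5, 8])          # insert item1
bf.set_bits([7, 8, 10])         # insert item2
print(bf.all_set([2, 11, 0]))   # False: item3 is definitely not in the set
print(bf.all_set([10, 8, 7]))   # True: item4 is "maybe" in the set (a false positive)
```

Item 4 was never inserted, yet all three of its bits were already set by item 2, which is why a True answer can only mean "maybe".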
Check User Session
>>> r.execute_command('BF.MADD u:sess:20170831 123 456 789')
[1L, 1L, 1L]
>>> r.execute_command('BF.EXISTS u:sess:20170831 456')
1L
>>> r.execute_command('BF.EXISTS u:sess:20170831 234')
0L
Links
• https://github.com/RedisLabsModules/rebloom
• https://redislabs.com/blog/rebloom-bloom-filter-datatype-redis/
• https://github.com/kristoff-it/redis-cuckoofilter - Better than bloom filters
“An 80% solution today is much better than a 100% solution tomorrow.”
Thank You https://cnu.name/talks/redisconf-2018/