Slide 1

Slide 1 text

Probabilistic Data Structures in Redis

Slide 2

Slide 2 text

Srinivasan Rangarajan, Head of Engineering

Slide 3

Slide 3 text

Srinivasan Rangarajan @cnu https://cnu.name

Slide 4

Slide 4 text

Agenda
• Log Analysis
• Redis V4
• Probabilistic Data Structures

Slide 5

Slide 5 text

Log Analysis

Slide 6

Slide 6 text

Challenges
• 100s of millions of events processed every day
• Peak of ~10 million events in an hour
• Needs real-time processing
• Memory/storage requirements

Slide 7

Slide 7 text

Cost Accuracy Scale

Slide 8

Slide 8 text

No content

Slide 9

Slide 9 text

Sample Event Data
{
  "ip": "123.123.123.123",
  "client_id": 232,
  "user_id": "35827",
  "email": "[email protected]",
  "product_id": "ABC-12345",
  "image_id": 3,
  "action": "pageview",
  "datetime": "2017-06-29T12:42:53Z"
}

Slide 10

Slide 10 text

No content

Slide 11

Slide 11 text

Redis Version 4
• Module system
• Better replication
• Cache eviction improvements
• Non-blocking DEL and FLUSH* commands
• Mixed RDB-AOF persistence format
• MEMORY DOCTOR

Slide 12

Slide 12 text

Modules
(Icon credit: mikicon, Noun Project)

Slide 13

Slide 13 text

Loading Modules
• Command line: ./redis-server --loadmodule /path/to/module.so
• redis.conf: loadmodule /path/to/module.so
• At runtime: MODULE LOAD /path/to/module.so

Slide 14

Slide 14 text

Execute a custom command
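A minimal sketch of how a module command could be sent from Python, assuming the redis-py client, whose `execute_command` method forwards an arbitrary command and its arguments. The helper name `run_custom_command` is hypothetical, and the client is passed in so the function works with any object that exposes `execute_command`.

```python
from typing import Any


def run_custom_command(client: Any, name: str, *args: Any) -> Any:
    """Send a command the client library has no dedicated method for.

    `client` is assumed to be a redis-py connection (or anything with a
    compatible `execute_command` method). Module commands are sent the
    same way as built-in ones: command name first, then its arguments.
    """
    return client.execute_command(name, *args)
```

With a live connection, `run_custom_command(client, "MODULE", "LOAD", "/path/to/module.so")` would issue the runtime load shown on the previous slide.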

Slide 15

Slide 15 text

Probabilistic Data Structures

Slide 16

Slide 16 text

There are three kinds of people in the world.
1. Those who can count.
2. Those who can’t count.

Slide 17

Slide 17 text

There are three kinds of people (data structures) in the world.
1. Those who can count.
2. Those who can’t count.
3. Those who count approximately.

Slide 18

Slide 18 text

No content

Slide 19

Slide 19 text

Advantage: Huge Memory Savings

Slide 20

Slide 20 text

3 Data Structures

Slide 21

Slide 21 text

HyperLogLog
Count the cardinality of a set
http://antirez.com/news/75
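The memory saving can be made concrete with a little arithmetic. The sketch below uses the figures given in the linked post for Redis' HyperLogLog (16384 registers of 6 bits each, about 12 KB per key, standard error of roughly 0.81%). The comparison against raw IDs assumes 8 bytes per ID, which is an illustrative lower bound and not a measurement of an actual Redis set.

```python
import math

HLL_REGISTERS = 16384          # 2**14 registers in the dense representation
HLL_BITS_PER_REGISTER = 6


def hll_dense_bytes(registers: int = HLL_REGISTERS,
                    bits: int = HLL_BITS_PER_REGISTER) -> int:
    """Size of the register array, independent of how many items were counted."""
    return registers * bits // 8


def hll_standard_error(registers: int = HLL_REGISTERS) -> float:
    """Relative standard error of a HyperLogLog estimate: 1.04 / sqrt(m)."""
    return 1.04 / math.sqrt(registers)


def raw_id_bytes(unique_ids: int, bytes_per_id: int = 8) -> int:
    """Payload needed to store every ID exactly (assumed 8 bytes per ID)."""
    return unique_ids * bytes_per_id
```

The register array stays at 12288 bytes whether the key has seen a thousand or a hundred million distinct values, while the exact payload grows linearly with the number of IDs.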

Slide 22

Slide 22 text

Count Unique Visitor / hour
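One way this could look in Python, assuming the redis-py client and its `pfadd` and `pfcount` methods. The `uv:<date>T<hour>` key scheme and the helper names are hypothetical; the event fields follow the sample event shown earlier in the deck.

```python
from datetime import datetime
from typing import Any, Dict


def hourly_key(event: Dict[str, Any]) -> str:
    """Build a per-hour key such as 'uv:2017-06-29T12' from the event timestamp."""
    ts = datetime.strptime(event["datetime"], "%Y-%m-%dT%H:%M:%SZ")
    return "uv:" + ts.strftime("%Y-%m-%dT%H")


def record_visit(client: Any, event: Dict[str, Any]) -> str:
    """Add the event's user to that hour's HyperLogLog (PFADD) and return the key."""
    key = hourly_key(event)
    client.pfadd(key, event["user_id"])
    return key


def unique_visitors(client: Any, key: str) -> int:
    """Approximate number of distinct users recorded under `key` (PFCOUNT)."""
    return client.pfcount(key)
```

Adding the same user twice in one hour does not change the count, which is what makes the structure suitable for unique-visitor tracking.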

Slide 23

Slide 23 text

Merge Hourly into Daily
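A hedged sketch of the merge step, assuming the redis-py client and its `pfmerge(dest, *sources)` method. The key names (`uv:<date>T<hour>` for hourly counters, `uv:<date>` for the daily one) are a hypothetical scheme chosen for illustration.

```python
from typing import Any, List


def hourly_keys(day: str) -> List[str]:
    """The 24 hourly keys for a day given as 'YYYY-MM-DD'."""
    return [f"uv:{day}T{hour:02d}" for hour in range(24)]


def merge_day(client: Any, day: str) -> str:
    """Union the day's hourly HyperLogLogs into one daily key (PFMERGE)."""
    daily = f"uv:{day}"
    client.pfmerge(daily, *hourly_keys(day))
    return daily
```

Because the merge is a union of the underlying sketches, a user seen in several hours is still counted once in the daily figure, which simply summing the hourly counts would not give.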

Slide 24

Slide 24 text

TopK
Get the top k elements in a set
https://github.com/RedisLabsModules/topk

Slide 25

Slide 25 text

Top k IP Addresses
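To illustrate the idea of tracking frequent IPs in bounded memory, here is a small pure-Python sketch of the Space-Saving algorithm. This is a swapped-in technique for illustration only; it is not claimed to be what the topk module implements, and the class and parameter names are hypothetical.

```python
from typing import Dict, List, Tuple


class SpaceSaving:
    """Track approximate heavy hitters with at most `capacity` counters."""

    def __init__(self, capacity: int) -> None:
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        self.counts: Dict[str, int] = {}

    def add(self, item: str) -> None:
        if item in self.counts:
            self.counts[item] += 1
        elif len(self.counts) < self.capacity:
            self.counts[item] = 1
        else:
            # Evict the current minimum and let the newcomer inherit its
            # count plus one, so counts may overestimate but never undercount.
            victim = min(self.counts, key=self.counts.get)
            floor = self.counts.pop(victim)
            self.counts[item] = floor + 1

    def top(self, k: int) -> List[Tuple[str, int]]:
        """The k items with the largest (approximate) counts."""
        ranked = sorted(self.counts.items(), key=lambda kv: kv[1], reverse=True)
        return ranked[:k]
```

Feeding the `ip` field of each event into `add` and calling `top(k)` periodically would give a running list of the busiest addresses without keeping a counter per IP.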

Slide 26

Slide 26 text

No content

Slide 27

Slide 27 text

CountMinSketch
Count the frequency of items
https://github.com/RedisLabsModules/countminsketch

Slide 28

Slide 28 text

User Pageview counter
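A minimal pure-Python Count-Min Sketch, shown only to illustrate how per-user pageview counts can be kept in a fixed-size table. It does not use the Redis module or its commands; the class name, default dimensions, and the SHA-256 based hashing are assumptions made for this sketch.

```python
import hashlib
from typing import List


class CountMinSketch:
    """Fixed-size frequency table: estimates never undercount, may overcount."""

    def __init__(self, width: int = 2048, depth: int = 4) -> None:
        self.width = width
        self.depth = depth
        self.rows: List[List[int]] = [[0] * width for _ in range(depth)]

    def _index(self, item: str, row: int) -> int:
        # Derive one hash function per row by prefixing the row number.
        digest = hashlib.sha256(f"{row}:{item}".encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") % self.width

    def add(self, item: str, count: int = 1) -> None:
        for row in range(self.depth):
            self.rows[row][self._index(item, row)] += count

    def estimate(self, item: str) -> int:
        # Collisions only inflate cells, so the smallest cell is the best guess.
        return min(self.rows[row][self._index(item, row)]
                   for row in range(self.depth))
```

For pageview counting, each `pageview` event would call `add(event["user_id"])`, and `estimate(user_id)` would return a count that is at least the true number of views.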

Slide 29

Slide 29 text

Bloom Filters
Test membership in a set
https://github.com/RedisLabsModules/rebloom

Slide 30

Slide 30 text

Bloom Filters
• False positives: possible
• False negatives: not possible

Slide 31

Slide 31 text

User Session checking
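A small pure-Python Bloom filter, as an illustration of how a session check can be answered from a fixed-size bit array. It does not use the rebloom module or its commands; the class name, default size, and SHA-256 based hashing are assumptions made for this sketch.

```python
import hashlib
from typing import Iterator


class BloomFilter:
    """Set membership with possible false positives and no false negatives."""

    def __init__(self, size_bits: int = 1 << 16, num_hashes: int = 4) -> None:
        self.size_bits = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((size_bits + 7) // 8)

    def _positions(self, item: str) -> Iterator[int]:
        # One bit position per hash function, derived by prefixing a seed.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.size_bits

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item: str) -> bool:
        # False means "definitely never added"; True means "probably added".
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))
```

For session checking, a `False` from `might_contain(session_id)` can be trusted outright, so only the `True` answers would need confirming against the authoritative session store.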

Slide 32

Slide 32 text

~3 Data Structures
• HyperLogLog
• TopK
• CountMinSketch
• BloomFilter

Slide 33

Slide 33 text

Thank you

Slide 34

Slide 34 text

Follow me @cnu