Redisconf 2018: Probabilistic Data Structures

cnu
April 25, 2018

Real Time Log Analysis using Probabilistic Data Structures in Redis. Presented at Redisconf 2018.

Transcript

  1. Sample Event Data

    {
      "ip": "123.123.123.123",
      "client_id": 232,
      "user_id": "35827",
      "email": "[email protected]",
      "product_id": "ABC-12345",
      "image_id": 3,
      "action": "pageview",
      "datetime": "2017-06-29T12:42:53Z"
    }
  2. Challenges

    • 100s of millions of events processed every day
    • Peak of ~10 million events in an hour
    • Needed real-time processing
    • Low memory/storage requirements
  3. Execute custom commands

    >>> import redis
    >>> r = redis.Redis()
    >>> out = r.execute_command('CMD param1 param2')
  4. Count Unique Visitors/hour

    >>> r.pfadd('users:2017083120', 123, 456, 789)
    1
    >>> r.pfcount('users:2017083120')
    3
    >>> r.pfadd('users:2017083120', 456)
    0
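PFADD and PFCOUNT are backed by a HyperLogLog sketch. The following is a toy Python 3 illustration of the idea, not Redis's actual encoding: the class name, the SHA-1 hash, and the register count `2**12` are all assumptions chosen for readability. The `merge` method shows why sketches for separate hours can be combined into a daily sketch without re-reading any events: the union is just the element-wise maximum of the registers.

```python
import hashlib
import math


class HyperLogLog:
    """Toy HyperLogLog with 2**p registers (illustrative, not Redis's format)."""

    def __init__(self, p=12):
        self.p = p
        self.m = 1 << p
        self.registers = [0] * self.m

    def add(self, item):
        digest = hashlib.sha1(str(item).encode("utf-8")).digest()
        x = int.from_bytes(digest[:8], "big")         # 64-bit hash value
        idx = x >> (64 - self.p)                      # top p bits select a register
        rest = x & ((1 << (64 - self.p)) - 1)         # remaining 64 - p bits
        rank = (64 - self.p) - rest.bit_length() + 1  # position of the leftmost 1-bit
        self.registers[idx] = max(self.registers[idx], rank)

    def merge(self, other):
        # Union of two sketches: element-wise maximum of the registers.
        self.registers = [max(a, b) for a, b in zip(self.registers, other.registers)]

    def count(self):
        alpha = 0.7213 / (1 + 1.079 / self.m)         # bias constant, valid for m >= 128
        estimate = alpha * self.m ** 2 / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if estimate <= 2.5 * self.m and zeros:
            # Small-range correction (linear counting).
            estimate = self.m * math.log(self.m / zeros)
        return int(round(estimate))


# Two overlapping "hours" of user ids, merged into one "day".
hour20, hour21, day = HyperLogLog(), HyperLogLog(), HyperLogLog()
for uid in range(0, 6000):
    hour20.add(uid)
for uid in range(4000, 10000):
    hour21.add(uid)
day.merge(hour20)
day.merge(hour21)
print(day.count())  # an estimate near the 10000 distinct ids
```

The memory used is fixed by the number of registers, regardless of how many ids are added, which is what makes the structure attractive for the event volumes described earlier.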
  5. Merge Hourly into Daily

    >>> r.pfadd('users:2017083121', 121, 454, 787)
    1
    >>> r.pfmerge('users:20170831', 'users:2017083120', 'users:2017083121')
    True
    >>> r.pfcount('users:20170831')
    6
  6. Top K IP Addresses

    >>> r.execute_command('TOPK.ADD ip:20170831 3 123.45.67.89')
    >>> r.execute_command('TOPK.ADD ip:20170831 3 123.45.67.90')
    >>> r.execute_command('TOPK.ADD ip:20170831 3 123.45.67.91')
    1L
    >>> r.execute_command('TOPK.ADD ip:20170831 3 123.45.67.92')
    -1L
  7. Top K IP Addresses

    >>> r.zrange('ip:20170831', 0, -1, withscores=True)
    [('TOPK:1.0.1:1.0:\xff\xff\xff\xff\xff\xff\xff\xff\x04\x00\x00\x00\x00\x00\x00\x00', 1.0),
     ('123.45.67.89', 1.0),
     ('123.45.67.90', 1.0),
     ('123.45.67.92', 2.0)]
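The slides do not say which algorithm the TOPK module uses. The output above, where the fourth address displaces an entry and appears with a score of 2, is at least consistent with the Space-Saving algorithm, so here is a minimal Python 3 sketch of that technique. The function name and the tie-breaking rule are assumptions made for illustration.

```python
def space_saving_add(counts, k, item):
    """Record one occurrence of item in a dict holding at most k counters."""
    if item in counts:
        counts[item] += 1
    elif len(counts) < k:
        counts[item] = 1
    else:
        # Evict the smallest counter; among ties this picks the oldest key.
        victim = min(counts, key=counts.get)
        # The newcomer inherits the evicted count plus one (an overestimate).
        counts[item] = counts.pop(victim) + 1
    return counts


counts = {}
for ip in ["123.45.67.89", "123.45.67.90", "123.45.67.91", "123.45.67.92"]:
    space_saving_add(counts, 3, ip)

print(counts["123.45.67.92"])  # 2
print(len(counts))             # 3
```

Which of the tied entries gets evicted is a tie-break detail: this sketch drops the oldest one, whereas the module output above kept `.89` and `.90` and dropped `.91`.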
  8.

          1  2  3  4
      h1  0  0  0  0
      h2  0  0  0  0
      h3  0  0  0  0
  9.

          1  2  3  4
      h1  1  0  0  0
      h2  0  1  0  0
      h3  0  0  1  0

    h1(s1) = 1; h2(s1) = 2; h3(s1) = 3
  10.

          1  2  3  4
      h1  1  0  0  1
      h2  0  1  0  1
      h3  0  0  1  1

    h1(s2) = 4; h2(s2) = 4; h3(s2) = 4
  11.

          1  2  3  4
      h1  2  0  0  1
      h2  1  1  0  1
      h3  1  0  1  1

    h1(s3) = 1; h2(s3) = 1; h3(s3) = 1
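The Count-Min Sketch walkthrough above can be replayed in a few lines of Python 3. This sketch hard-codes the hash positions given on the slides instead of computing real hashes; the names `CMS_POSITIONS`, `increment`, and `estimate` are made up for illustration.

```python
# Hash positions copied from the slides: 1-based column for rows h1, h2, h3.
CMS_POSITIONS = {"s1": (1, 2, 3), "s2": (4, 4, 4), "s3": (1, 1, 1)}

# 3 rows (one per hash function) of 4 counters each, all starting at zero.
table = [[0] * 4 for _ in range(3)]


def increment(item):
    # Add one to the counter each hash function points at.
    for row, col in enumerate(CMS_POSITIONS[item]):
        table[row][col - 1] += 1


def estimate(item):
    # The estimate is the minimum over the item's counters.
    return min(table[row][col - 1] for row, col in enumerate(CMS_POSITIONS[item]))


for item in ("s1", "s2", "s3"):
    increment(item)

for row in table:
    print(row)
# [2, 0, 0, 1]
# [1, 1, 0, 1]
# [1, 0, 1, 1]

print(estimate("s1"), estimate("s2"), estimate("s3"))  # 1 1 1
```

Although s1 and s3 collide in row h1 (counter value 2), taking the minimum across rows still returns 1 for both, because their other counters are not shared.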
  12. User Pageview counter

    >>> r.execute_command('CMS.INCRBY u:pv:20170831 123 1 456 3 789 2 234 1 567 1')
    'OK'
    >>> r.execute_command('CMS.QUERY u:pv:20170831 123 456 789 234 567')
    [1L, 3L, 2L, 1L, 1L]
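For readers without the module installed, the behaviour of CMS.INCRBY and CMS.QUERY can be approximated with a small pure-Python 3 class. This is a sketch under assumptions: the class name, the salted SHA-1 hashing, and the default width and depth are illustrative choices, not what the Redis module does internally.

```python
import hashlib


class CountMinSketch:
    """Toy Count-Min Sketch: depth rows of width counters."""

    def __init__(self, width=1000, depth=5):
        self.width = width
        self.depth = depth
        self.table = [[0] * width for _ in range(depth)]

    def _columns(self, item):
        # One salted hash per row stands in for independent hash functions.
        for row in range(self.depth):
            digest = hashlib.sha1("{}:{}".format(row, item).encode("utf-8")).digest()
            yield row, int.from_bytes(digest[:8], "big") % self.width

    def incrby(self, item, amount=1):
        for row, col in self._columns(item):
            self.table[row][col] += amount

    def query(self, item):
        # Collisions can only inflate counters, so the minimum is the best guess.
        return min(self.table[row][col] for row, col in self._columns(item))


# Same user ids and page view increments as in the slide above.
true_counts = {123: 1, 456: 3, 789: 2, 234: 1, 567: 1}
cms = CountMinSketch()
for user_id, views in true_counts.items():
    cms.incrby(user_id, views)

estimates = [cms.query(user_id) for user_id in true_counts]
print(estimates)
```

An estimate is never below the true count; it can only be too high when every row of an item collides with other items, which becomes less likely as the width and depth grow.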
  13. Empty Bit Array

    0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
  14. Insert Item 1

    0 0 1 0 0 1 0 0 1 0 0 0 0 0 0 0

    h1(item1) = 2; h2(item1) = 5; h3(item1) = 8
  15. Insert Item 2

    0 0 1 0 0 1 0 1 1 0 1 0 0 0 0 0

    h1(item2) = 7; h2(item2) = 8; h3(item2) = 10
  16. Check Item 3

    0 0 1 0 0 1 0 1 1 0 1 0 0 0 0 0

    h1(item3) = 2; h2(item3) = 11; h3(item3) = 0
  17. Check Item 4

    0 0 1 0 0 1 0 1 1 0 1 0 0 0 0 0

    h1(item4) = 10; h2(item4) = 8; h3(item4) = 7
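The Bloom filter walkthrough above can be replayed with the hash positions taken directly from the slides (0-based bit indexes). The following Python 3 sketch uses made-up names (`BLOOM_POSITIONS`, `insert`, `might_contain`) and a lookup table in place of real hash functions.

```python
# Bit positions copied from the slides for h1, h2, h3 (0-based).
BLOOM_POSITIONS = {
    "item1": (2, 5, 8),
    "item2": (7, 8, 10),
    "item3": (2, 11, 0),
    "item4": (10, 8, 7),
}

bits = [0] * 16  # the empty 16-bit array


def insert(item):
    # Set every bit the item hashes to.
    for pos in BLOOM_POSITIONS[item]:
        bits[pos] = 1


def might_contain(item):
    # Present only if all of the item's bits are set.
    return all(bits[pos] for pos in BLOOM_POSITIONS[item])


insert("item1")
insert("item2")

print(bits)                    # [0, 0, 1, 0, 0, 1, 0, 1, 1, 0, 1, 0, 0, 0, 0, 0]
print(might_contain("item3"))  # False: bits 11 and 0 are unset
print(might_contain("item4"))  # True: a false positive, item4 was never inserted
```

Item 3 is definitely absent because at least one of its bits is 0. Item 4 is reported as present only because bits 7, 8, and 10 were set by the other two items, which is the false-positive case a Bloom filter trades for its small footprint.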
  18. Check User Session

    >>> r.execute_command('BF.MADD u:sess:20170831 123 456 789')
    [1L, 1L, 1L]
    >>> r.execute_command('BF.EXISTS u:sess:20170831 456')
    1L
    >>> r.execute_command('BF.EXISTS u:sess:20170831 234')
    0L
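A rough pure-Python 3 equivalent of the BF.MADD and BF.EXISTS calls above, for experimenting without the Redis module. The class name, the salted SHA-1 hashing, and the size of 1024 bits are assumptions for illustration only; the real module sizes and hashes its filters differently.

```python
import hashlib


class BloomFilter:
    """Toy Bloom filter: `size` bits and `hashes` salted hash functions."""

    def __init__(self, size=1024, hashes=3):
        self.size = size
        self.hashes = hashes
        self.bits = bytearray(size)  # one byte per bit, kept simple for clarity

    def _positions(self, item):
        for salt in range(self.hashes):
            digest = hashlib.sha1("{}:{}".format(salt, item).encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos] = 1

    def exists(self, item):
        # False means definitely absent; True means probably present.
        return all(self.bits[pos] for pos in self._positions(item))


sessions = BloomFilter()
for user_id in (123, 456, 789):
    sessions.add(user_id)

print(sessions.exists(456))  # True
```

A lookup for an id that was never added, such as 234, will almost always return False at this fill level, but a True answer is possible and should be treated as "probably present".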