Probabilistic Data Structures

cnu
September 03, 2017

Learn how to use Probabilistic Data Structures and modules in Redis v4 to analyse logs.

Transcript

  1. Probabilistic Data Structures in Redis

  2. Srinivasan Rangarajan Head of Engineering

  3. Srinivasan Rangarajan @cnu https://cnu.name

  4. Agenda • Log Analysis • Redis V4 • Probabilistic Data Structures

  5. Log Analysis

  6. Challenges • 100s of Millions of events processed every day • Peak of ~10 Million events in an hour • Needs Realtime processing • Memory/Storage Requirements

  7. Cost Accuracy Scale

  8. None
  9. Sample Event Data { "ip": "123.123.123.123", "client_id": 232, "user_id": "35827", "email": "foo@example.com", "product_id": "ABC-12345", "image_id": 3, "action": "pageview", "datetime": "2017-06-29T12:42:53Z" }

  10. None
  11. Redis Version 4 • Module system • Better Replication • Cache eviction Improvements • Non-Blocking DEL and FLUSH* commands • Mixed RDB-AOF persistence format • MEMORY DOCTOR

  12. Modules (icon: mikicon, Noun Project)

  13. Loading Modules • ./redis-server --loadmodule /path/to/module.so • redis.conf: loadmodule /path/to/module.so • MODULE LOAD /path/to/module.so

  14. Execute a custom command
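
A minimal sketch of how a Python client might load a module at runtime and then call a command that the module registers. It assumes a client object with a generic `execute_command(*args)` method, as redis-py provides; the helper names `load_module` and `run_custom` are hypothetical and not part of the talk.

```python
from typing import Any, Protocol


class RedisLike(Protocol):
    """Any client exposing a generic execute_command method (assumed: redis-py style)."""

    def execute_command(self, *args: Any) -> Any: ...


def load_module(client: RedisLike, path: str) -> Any:
    # Sends the same command as typing `MODULE LOAD /path/to/module.so` in redis-cli.
    return client.execute_command("MODULE", "LOAD", path)


def run_custom(client: RedisLike, command: str, *args: Any) -> Any:
    # Module commands are usually not wrapped by the client library,
    # so they are sent as a raw command name followed by its arguments.
    return client.execute_command(command, *args)
```

With a real connection this would be called as `load_module(client, "/path/to/module.so")`, followed by `run_custom(client, ...)` with whatever command name the loaded module documents.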

  15. Probabilistic Data Structures

  16. There are three kinds of people in the world. 1. Those who can count. 2. Those who can’t count.

  17. There are three kinds of data structures in the world. 1. Those who can count. 2. Those who can’t count. 3. Those who count approximately.

  18. None
  19. Advantage: Huge Memory Savings

  20. 3 Data Structures

  21. HyperLogLog Count the Cardinality of a Set http://antirez.com/news/75

  22. Count Unique Visitor / hour

  23. Merge Hourly into Daily
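
The hourly-count-then-merge idea can be sketched in pure Python. This is an illustrative HyperLogLog, not Redis's implementation: the register count (`p = 12`), the SHA-1 hash, and the class name are assumptions made for the example. In Redis itself the built-in `PFADD`, `PFCOUNT` and `PFMERGE` commands play the roles of `add`, `count` and `merge`.

```python
import hashlib
import math


class HyperLogLog:
    """Illustrative HyperLogLog with 2**p registers (assumed p, not Redis's layout)."""

    def __init__(self, p: int = 12) -> None:
        self.p = p
        self.m = 1 << p
        self.registers = [0] * self.m

    def add(self, item: str) -> None:
        # Take a 64-bit hash; the top p bits select a register.
        h = int.from_bytes(hashlib.sha1(item.encode()).digest()[:8], "big")
        idx = h >> (64 - self.p)
        rest = h & ((1 << (64 - self.p)) - 1)
        # Rank = position of the first 1-bit in the remaining bits (1-based).
        rank = (64 - self.p) - rest.bit_length() + 1
        self.registers[idx] = max(self.registers[idx], rank)

    def count(self) -> int:
        alpha = 0.7213 / (1 + 1.079 / self.m)
        est = alpha * self.m * self.m / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if est <= 2.5 * self.m and zeros:
            # Small-range correction: fall back to linear counting.
            est = self.m * math.log(self.m / zeros)
        return round(est)

    def merge(self, other: "HyperLogLog") -> None:
        if other.p != self.p:
            raise ValueError("cannot merge sketches of different precision")
        # The union of two sketches is the register-wise maximum.
        self.registers = [max(a, b) for a, b in zip(self.registers, other.registers)]
```

One sketch per hour can be fed the `ip` field of each event; merging the 24 hourly sketches into a fresh one gives the daily unique-visitor estimate without storing any individual IP address.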

  24. TopK Get Top k Elements in a set https://github.com/RedisLabsModules/topk

  25. Top k IP Addresses
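
To make the top-k idea concrete, here is a small sketch of the Space-Saving algorithm applied to IP addresses. Space-Saving is swapped in for illustration; the talk does not say which algorithm the topk module uses, and the class and parameter names are hypothetical.

```python
from typing import Dict, List, Tuple


class SpaceSaving:
    """Tracks at most `capacity` counters; counts may overestimate, never underestimate."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.counts: Dict[str, int] = {}

    def add(self, item: str) -> None:
        if item in self.counts:
            self.counts[item] += 1
        elif len(self.counts) < self.capacity:
            self.counts[item] = 1
        else:
            # Evict the current minimum and let the newcomer inherit its count + 1.
            victim = min(self.counts, key=self.counts.get)
            self.counts[item] = self.counts.pop(victim) + 1

    def top(self, k: int) -> List[Tuple[str, int]]:
        # Highest count first; ties broken by item for a stable order.
        return sorted(self.counts.items(), key=lambda kv: (-kv[1], kv[0]))[:k]
```

Memory stays bounded by `capacity` no matter how many distinct IPs appear, which is the trade-off the talk highlights: a small, fixed footprint in exchange for approximate counts.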

  26. None
  27. CountMinSketch Count the frequency of items https://github.com/RedisLabsModules/countminsketch

  28. User Pageview counter
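
A per-user pageview counter along these lines can be sketched with a pure-Python Count-Min Sketch. The width, depth, hash construction and method names are assumptions for illustration and do not mirror the countminsketch module's commands.

```python
import hashlib
from typing import Iterator, Tuple


class CountMinSketch:
    """depth rows of width counters; a query returns the minimum over the rows."""

    def __init__(self, width: int = 2000, depth: int = 5) -> None:
        self.width = width
        self.depth = depth
        self.table = [[0] * width for _ in range(depth)]

    def _cells(self, item: str) -> Iterator[Tuple[int, int]]:
        # One independent-looking hash per row, derived by salting with the row number.
        for row in range(self.depth):
            digest = hashlib.sha1(f"{row}:{item}".encode()).digest()
            yield row, int.from_bytes(digest[:8], "big") % self.width

    def incr(self, item: str, amount: int = 1) -> None:
        for row, col in self._cells(item):
            self.table[row][col] += amount

    def query(self, item: str) -> int:
        # Collisions can only inflate a cell, so the minimum is the tightest estimate.
        return min(self.table[row][col] for row, col in self._cells(item))
```

Each `pageview` event would call `incr(event["user_id"])`. The estimate returned by `query` is never below the true count, and the table size is fixed regardless of how many users are seen.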

  29. Bloom Filters Test membership in a set https://github.com/RedisLabsModules/rebloom

  30. Bloom Filters False Positives False Negatives

  31. User Session checking
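
Session checking with a Bloom filter can be sketched as follows. A Bloom filter may report a session as seen when it was not (a false positive) but never misses one that was added (no false negatives). The sizing formulas are the standard ones; the class name, the SHA-256 double hashing, and the default error rate are assumptions for the example, not the rebloom module's internals.

```python
import hashlib
import math
from typing import Iterator


class BloomFilter:
    """Bit array sized for `capacity` items at roughly `error_rate` false positives."""

    def __init__(self, capacity: int, error_rate: float = 0.01) -> None:
        # Standard sizing: m = -n ln(p) / (ln 2)^2 bits, k = (m / n) ln 2 hashes.
        self.size = max(1, math.ceil(-capacity * math.log(error_rate) / math.log(2) ** 2))
        self.hashes = max(1, round(self.size / capacity * math.log(2)))
        self.bits = bytearray((self.size + 7) // 8)

    def _positions(self, item: str) -> Iterator[int]:
        # Double hashing: derive k bit positions from two 64-bit halves of one digest.
        digest = hashlib.sha256(item.encode()).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        for i in range(self.hashes):
            yield (h1 + i * h2) % self.size

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item: str) -> bool:
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item))
```

A typical use is `if session_id not in seen: seen.add(session_id)`, treating a "not in" answer as definitive and an "in" answer as "probably seen".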

  32. ~3 Data Structures • HyperLogLog • TopK • CountMinSketch • BloomFilter

  33. Thank you

  34. Follow me @cnu