Slide 1

Processing Time Series Data with Redis and Kafka
Abhishek Gupta, Senior Developer Advocate, Microsoft

Slide 2

About Me
• Focus: Kafka, databases, Kubernetes
• Blogger, author, OSS contributor
• A lot of Java in the past; these days enjoying Go and Rust

Slide 3

Agenda
• Intro
• Demo – yes, we dive right in!
• Some food for thought
• Wrap up
aka.ms/redis-timeseries-kafka

Slide 4

Time Series data: it’s everywhere
● Think of it as a tuple (for simplicity)
● A single data point: a time(stamp) and a numeric value
● A time series: a collection of many such data points
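The tuple view above can be made concrete in a few lines. A minimal Python sketch: the series is just a list of (timestamp, value) tuples, with timestamps as Unix milliseconds (the unit RedisTimeSeries uses); the `latest` helper is illustrative, loosely mirroring what TS.GET returns for a key.

```python
# A time series modeled as the slide describes: each data point is a
# (timestamp, value) tuple, and the series is a list of such points.
# Timestamps are Unix time in milliseconds.
temperature_series = [
    (1622548800000, 21.5),  # 2021-06-01 12:00:00 UTC
    (1622548810000, 21.7),  # +10 s
    (1622548820000, 21.6),  # +20 s
]

# Illustrative helper: fetch the most recent sample, roughly what
# TS.GET does for a single key.
def latest(series):
    return max(series, key=lambda point: point[0])
```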

Slide 5

Redis joined the party!
● Before: Sorted Sets, Redis Streams
● RedisTimeSeries: a native time series data structure
● Thanks to Redis Modules!

Slide 6

RedisTimeSeries commands
● Basic
  ○ TS.CREATE
  ○ TS.ADD, TS.MADD
● Query
  ○ TS.GET, TS.MGET
  ○ TS.RANGE, TS.MRANGE (with filters)
● Aggregations
  ○ avg, min, max, sum, count
  ○ TS.CREATERULE
● Clients – Java, Go, Python, etc.
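To make the command shapes concrete without needing a Redis server, here is a minimal Python sketch that renders a few of the commands above as the raw strings you would type into redis-cli. The helper names (`ts_create`, `ts_add`, `ts_range`) are illustrative, not a real client API; argument order follows the command syntax shown on these slides.

```python
# Illustrative helpers that format RedisTimeSeries commands as raw
# command strings (no server connection involved).

def ts_create(key, retention_ms=None, labels=None):
    # TS.CREATE key [RETENTION ms] [LABELS name value ...]
    parts = ["TS.CREATE", key]
    if retention_ms is not None:
        parts += ["RETENTION", str(retention_ms)]
    if labels:
        parts.append("LABELS")
        for name, value in labels.items():
            parts += [name, str(value)]
    return " ".join(parts)

def ts_add(key, timestamp, value):
    # TS.ADD key timestamp value  ("*" = use the server's current time)
    return " ".join(["TS.ADD", key, str(timestamp), str(value)])

def ts_range(key, start, end, agg=None, bucket_ms=None):
    # TS.RANGE key from to [AGGREGATION type bucket_ms]
    parts = ["TS.RANGE", key, str(start), str(end)]
    if agg:
        parts += ["AGGREGATION", agg, str(bucket_ms)]
    return " ".join(parts)
```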

Slide 7

Databases are just a part of the solution…
● Time Series data is:
  ○ Relatively simple, but
  ○ Fast: think tens of metrics per second from thousands of devices
  ○ Big (data): think data accumulating over months
● How do you collect and send all that data?
  ○ Just send it directly to Redis – it’s lightning fast, right?
● What we need is a data pipeline – a system to:
  ○ Decouple producers and consumers
  ○ Act as a buffer
● Apache Kafka is a good one!
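The decouple-and-buffer idea can be sketched with an in-memory queue standing in for a Kafka topic. This is a toy illustration of the architecture, not Kafka itself: the producer pushes samples without knowing who consumes them, and the consumer drains the buffer at its own pace before writing to the store (here, a dict standing in for Redis).

```python
import queue

# Stand-in for a Kafka topic: a buffer that decouples producer from consumer.
buffer = queue.Queue()

# Producer side: devices publish samples; they never talk to the database.
for i in range(5):
    buffer.put(("temp:3:2", 1000 + i * 10, 20.0 + i))

# Consumer side: drains the buffer at its own pace and writes to the
# store (a dict standing in for Redis).
store = {}
while not buffer.empty():
    key, ts, value = buffer.get()
    store.setdefault(key, []).append((ts, value))
```

In the real pipeline the producer and consumer are separate processes, and Kafka's durable log also absorbs bursts when the consumer falls behind; the queue here only models the decoupling.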

Slide 8

Time series processing in action!

Slide 9

Device monitoring: multiple locations and devices
● Monitor device metrics – temperature and pressure
● Time series setup (simulated data)
  ○ Name (key): <metric>:<location>:<device>
  ○ Labels (metadata): metric, location, device
  ○ Examples:
    ■ TS.ADD temp:3:2 * 20 LABELS metric temp location 3 device 2
    ■ TS.ADD pressure:3:2 * 60 LABELS metric pressure location 3 device 2
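The key-and-labels scheme can be sketched in a few lines of Python, assuming the metric:location:device key pattern implied by the examples (temp:3:2, pressure:3:2). The `sample` helper is hypothetical, showing only how the key and the label metadata line up for one simulated data point.

```python
# Hypothetical helper: build the key and labels for one simulated sample,
# following the key pattern metric:location:device from the slide.
def sample(metric, location, device, value):
    key = f"{metric}:{location}:{device}"
    labels = {"metric": metric,
              "location": str(location),
              "device": str(device)}
    return key, labels, value

# One temperature reading from device 2 at location 3:
key, labels, value = sample("temp", 3, 2, 20)
```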

Slide 10

High level architecture

Slide 11

Some food for thought

Slide 12

RedisTimeSeries specifics
Retention policy
• Maximum age for samples, compared to the last event time
• Time series data does not get trimmed by default
Rules for down-sampling/aggregations
• TS.CREATERULE temp:1:2 temp:avg:30 AGGREGATION avg 30000
Duplicate data policy
• How to handle duplicate samples?
• Default: BLOCK (error out)
• Other options: FIRST, LAST, MIN, MAX, SUM
Source: Redis Labs docs
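The down-sampling rule and the duplicate policies can be simulated in plain Python to see what they compute. These are illustrative stand-ins for the server-side behavior, not the actual implementation: `downsample_avg` mimics what an `AGGREGATION avg 30000` rule produces (30-second buckets, averaged), and `apply_duplicate_policy` shows how each policy resolves a sample arriving with an already-used timestamp.

```python
from collections import defaultdict

def downsample_avg(samples, bucket_ms):
    # Mimics TS.CREATERULE ... AGGREGATION avg <bucket_ms>: group samples
    # into fixed time buckets and average each bucket.
    buckets = defaultdict(list)
    for ts, value in samples:
        buckets[ts - ts % bucket_ms].append(value)
    return {start: sum(vs) / len(vs) for start, vs in buckets.items()}

def apply_duplicate_policy(existing, incoming, policy="BLOCK"):
    # What happens when a sample arrives for an existing timestamp.
    if policy == "BLOCK":
        raise ValueError("duplicate sample for timestamp")
    return {"FIRST": existing,
            "LAST": incoming,
            "MIN": min(existing, incoming),
            "MAX": max(existing, incoming),
            "SUM": existing + incoming}[policy]
```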

Slide 13

Visualizations
● Grafana dashboard powered by the Redis Data Source for Grafana
● Redis Time Series adapter for Prometheus
● Redis Time Series Telegraf plugin

Slide 14

Other considerations
● Scalability – your time series data volumes can only move one way: up!
● Long-term data retention – cost-efficient storage
● Integration – RedisTimeSeries connector

Slide 15

Key takeaways
Think about:
● The end-to-end data pipeline, from source to Redis and beyond
● Data modeling, down-sampling and data retention

Slide 16

Next steps, resources
● GitHub repo: aka.ms/redis-timeseries-kafka
● Azure Cache for Redis Enterprise tiers

Slide 17

Thank you.