Abhishek Gupta
April 20, 2021

Harnessing the power of Redis and Apache Kafka to crunch high-velocity time series data (RedisConf 2021)

This talk is all about how to combine the power of RedisTimeSeries and Apache Kafka to build scalable solutions that handle time series data. To illustrate these concepts, dive into an end-to-end implementation of a data pipeline on Azure and see it in action.

Transcript

  1. About Me
     • Focus: Kafka, Databases, Kubernetes
     • Blogger, (author), OSS contributor
     • Lot of Java in the past. Enjoy Go, Rust
  2. Agenda
     • Intro
     • Demo – yes, we dive right in!
     • Some food for thought
     • Wrap up
     aka.ms/redis-timeseries-kafka
  3. Time Series data: It’s everywhere
     • Think of it as a Tuple (for simplicity)
     • A single data point: Time(stamp) and a numeric value
     • A Time Series: collection of many such data points
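
The tuple view of a time series can be sketched in a few lines of Python. The `Sample` type and the readings below are made up for illustration; they are not from the talk.

```python
from typing import List, NamedTuple


class Sample(NamedTuple):
    """One data point: a timestamp (ms since epoch) and a numeric value."""
    timestamp_ms: int
    value: float


# A time series is an ordered collection of such samples.
# These readings are invented for illustration.
series: List[Sample] = [
    Sample(1618900000000, 20.0),
    Sample(1618900001000, 20.5),
    Sample(1618900002000, 21.0),
]

# The most recent data point is the one with the largest timestamp.
latest = max(series, key=lambda s: s.timestamp_ms)
```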
  4. Redis joined the party!
     • Before: Sorted Sets, Redis Streams
     • RedisTimeSeries: A native data structure
     • Thanks to Redis Modules!
  5. RedisTimeSeries commands
     • Basic
       ◦ TS.CREATE
       ◦ TS.ADD, TS.MADD
     • Query
       ◦ TS.GET, TS.MGET
       ◦ TS.RANGE, TS.MRANGE (filters)
     • Aggregations
       ◦ avg, min, max, sum, count
       ◦ TS.CREATERULE
     • Clients – Java, Go, Python etc.
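
To make the query commands concrete, here is a minimal Python sketch that only builds the argument lists for `TS.RANGE` and `TS.MRANGE`; it does not talk to a Redis server. The helper names `ts_range` and `ts_mrange` are hypothetical, and `-` / `+` stand for the earliest and latest timestamps in the series.

```python
from typing import Dict, List, Optional


def ts_range(key: str, start: str = "-", end: str = "+",
             aggregator: Optional[str] = None,
             bucket_ms: Optional[int] = None) -> List[str]:
    """Build TS.RANGE arguments for one series, optionally aggregated."""
    args = ["TS.RANGE", key, str(start), str(end)]
    if aggregator is not None and bucket_ms is not None:
        args += ["AGGREGATION", aggregator, str(bucket_ms)]
    return args


def ts_mrange(filters: Dict[str, str], start: str = "-", end: str = "+") -> List[str]:
    """Build TS.MRANGE arguments that select series by label filters."""
    args = ["TS.MRANGE", str(start), str(end), "FILTER"]
    args += [f"{label}={value}" for label, value in filters.items()]
    return args


# Average of one series in 30-second buckets, over its whole range.
range_cmd = ts_range("temp:3:2", aggregator="avg", bucket_ms=30000)

# All temperature series at location 3, selected by labels.
mrange_cmd = ts_mrange({"metric": "temp", "location": "3"})
```

With a client such as redis-py, a list like this could be sent with `execute_command(*range_cmd)`; most clients also offer dedicated time series helpers, which are likely the better choice in real code.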
  6. Databases are just a part of the solution…
     • Time Series data is:
       ◦ Relatively simple, but,
       ◦ Fast: think tens of metrics from thousands of devices/sec
       ◦ Big (data): think data accumulation over months
     • How do you collect, send all that data?
       ◦ Just send it directly to Redis – it’s lightning fast, right?
     • What we need is a data pipeline. A system to:
       ◦ Decouple producers, consumers
       ◦ Act as a buffer
     • Apache Kafka is a good one!
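
The "decouple and buffer" idea can be illustrated with an in-process queue. This is only a stand-in for the concept, not Kafka: a real pipeline would use Kafka topics, producers and consumers, with durability and partitioning that a Python `queue.Queue` does not provide. The names `consume`, `buffer` and `stored` are hypothetical.

```python
import queue
import threading
from typing import List, Tuple

Reading = Tuple[str, int, float]  # (series key, timestamp in ms, value)


def consume(buffer: queue.Queue, sink: List[Reading]) -> None:
    """Drain readings from the buffer into the sink until a None sentinel arrives."""
    while True:
        item = buffer.get()
        if item is None:
            break
        sink.append(item)  # in a real pipeline: write to RedisTimeSeries


# The bounded queue sits between producer and consumer: the producer never
# calls the consumer directly, and it blocks if the buffer fills up.
buffer: queue.Queue = queue.Queue(maxsize=100)
stored: List[Reading] = []

consumer = threading.Thread(target=consume, args=(buffer, stored))
consumer.start()

# Producer side: simulated device readings, one per second.
for i in range(5):
    buffer.put(("temp:3:2", 1618900000000 + i * 1000, 20.0 + i))
buffer.put(None)  # signal that no more readings are coming

consumer.join()
```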
  7. Device monitoring: Multiple locations and devices
     • Monitor device metrics - Temperature and Pressure
     • Time Series setup (simulated data)
       ◦ Name (key) - <metric>:<location>:<device>
       ◦ Labels (metadata) - metric, location, device
       ◦ Examples:
         ▪ TS.ADD temp:3:2 * 20 LABELS metric temp location 3 device 2
         ▪ TS.ADD pressure:3:2 * 60 LABELS metric pressure location 3 device 2
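
The key and label convention above can be captured in a small Python sketch that builds the `TS.ADD` arguments without connecting to Redis. The helper names `series_key` and `ts_add_args` are hypothetical; `*` asks the server to assign the current timestamp.

```python
from typing import List


def series_key(metric: str, location: int, device: int) -> str:
    """Key naming convention from the slide: <metric>:<location>:<device>."""
    return f"{metric}:{location}:{device}"


def ts_add_args(metric: str, location: int, device: int,
                value: float, timestamp: str = "*") -> List[str]:
    """Build TS.ADD arguments, repeating the key parts as labels."""
    return [
        "TS.ADD", series_key(metric, location, device), str(timestamp), str(value),
        "LABELS",
        "metric", metric,
        "location", str(location),
        "device", str(device),
    ]


temp_cmd = " ".join(ts_add_args("temp", 3, 2, 20))
pressure_cmd = " ".join(ts_add_args("pressure", 3, 2, 60))
```

Storing the same three parts both in the key and as labels is what lets `TS.MRANGE` and `TS.MGET` select groups of series with filters such as `location=3`.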
  8. RedisTimeSeries specifics
     Retention policy
     • Maximum age for samples compared to last event time
     • Time series data does not get trimmed by default
     Rules for down-sampling/Aggregations
     • TS.CREATERULE temp:1:2 temp:avg:30 AGGREGATION avg 30000
     Duplicate data policy
     • How to handle duplicate samples?
     • Default: BLOCK (error out)
     • Other options: FIRST, LAST, MIN, MAX, SUM
     Source: RedisLabs docs
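
The three behaviors above can be approximated in plain Python to show what they mean. This is a simplified model under stated assumptions, not the module's implementation: buckets are aligned to timestamp 0, a retention of 0 means "never trim", and details such as when a bucket's result is written to the destination series are not modeled. All function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Sample = Tuple[int, float]  # (timestamp in ms, value)


def downsample_avg(samples: List[Sample], bucket_ms: int) -> List[Sample]:
    """Average samples per fixed-width bucket, keyed by bucket start time."""
    buckets: Dict[int, List[float]] = defaultdict(list)
    for ts, value in samples:
        buckets[ts - ts % bucket_ms].append(value)
    return [(start, sum(vals) / len(vals)) for start, vals in sorted(buckets.items())]


def apply_retention(samples: List[Sample], retention_ms: int) -> List[Sample]:
    """Drop samples older than retention_ms relative to the newest sample."""
    if not samples or retention_ms == 0:  # 0 is treated as "no trimming"
        return list(samples)
    last_ts = max(ts for ts, _ in samples)
    return [(ts, v) for ts, v in samples if ts >= last_ts - retention_ms]


def resolve_duplicate(existing: float, incoming: float, policy: str = "BLOCK") -> float:
    """Decide the stored value when a sample arrives for an existing timestamp."""
    if policy == "BLOCK":
        raise ValueError("duplicate sample rejected by BLOCK policy")
    choices = {
        "FIRST": existing,
        "LAST": incoming,
        "MIN": min(existing, incoming),
        "MAX": max(existing, incoming),
        "SUM": existing + incoming,
    }
    return choices[policy]


# Invented readings spanning two 30-second buckets.
samples = [(0, 20.0), (10000, 22.0), (29999, 24.0), (30000, 30.0), (45000, 32.0)]
averages = downsample_avg(samples, 30000)  # mirrors AGGREGATION avg 30000
recent = apply_retention(samples, 20000)   # keep only the last 20 seconds
```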
  9. Visualizations
     • Grafana dashboard powered by Redis Data Source for Grafana
     • Redis Time Series adapter for Prometheus
     • Redis Time Series Telegraf plugin
  10. Other considerations
      • Scalability - Your time series data volumes can only move one way – up!
      • Long term data retention – cost-efficient storage
      • Integration – RedisTimeSeries connector
  11. Key takeaways
      Think about:
      • The end-to-end data pipeline from source to Redis and beyond
      • Data modeling, down-sampling and data retention