Unlocking Cloud Observability

Gang Tao
November 08, 2023

How real-time monitoring is achieved by Apache Kafka/Confluent Cloud + Streaming Database

Transcript

  1. CONFIDENTIAL Unlocking Cloud Observability
     How real-time monitoring is achieved by Apache Kafka/Confluent Cloud + Streaming Database
     Gang Tao, Nov 2023
  2. Observability
     In the field of software engineering and system monitoring, observability refers to the ability to understand and gain insights into the behavior and performance of complex systems.
     • Metrics
     • Logs
     • Traces
  3. Challenges
     1. Complex, dynamic environments require monitoring of microservices and containers
     2. The volume, velocity, and variety of data
     3. Difficult to quantify the business impact of observability
  4. Timeplus Cloud
     Timeplus is a real-time streaming analytics platform that shares many attributes common to modern software applications:
     • A SaaS product running on AWS
     • Built on container technology using Kubernetes
     • Has multiple environments
     • Has multiple layers developed with different software stacks
     • Contains both stateful and stateless components
  5. Our Choice - Streaming Data Architecture
     Data Source → Data Collector → Data Streaming → Stream-Based Analytics
  6. Why Kafka/Confluent?
     • Flexibility - decouples data producers and consumers
     • Real-time - low end-to-end data latency
     • Back pressure - effective caching layer
     • High scalability
     • High availability
     A true story
     • The disk was almost full
     • Log data could not be ingested
     • No real-time data, 3 hours of lag - back pressure
     • Fix - add more disk space
     • Back to normal
     With the data cached in Kafka, no data was lost in this case.
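The true story above can be illustrated with a minimal sketch. It is an in-memory stand-in for one topic partition and one consumer, not the Kafka client API; the names `AppendOnlyLog`, `Consumer`, and `simulate_outage` are hypothetical. It assumes the log retains every record for the duration of the outage, which in a real cluster depends on the configured retention.

```python
from dataclasses import dataclass, field


@dataclass
class AppendOnlyLog:
    """Hypothetical in-memory stand-in for a single topic partition."""
    records: list = field(default_factory=list)

    def append(self, record):
        """Producer side: append a record and return its offset."""
        self.records.append(record)
        return len(self.records) - 1

    def read_from(self, offset, max_records):
        """Consumer side: read up to max_records starting at offset."""
        return self.records[offset:offset + max_records]


@dataclass
class Consumer:
    """Hypothetical consumer that tracks its own committed offset."""
    log: AppendOnlyLog
    offset: int = 0
    processed: list = field(default_factory=list)

    def poll(self, max_records=100):
        """Process the next batch and advance the offset past it."""
        batch = self.log.read_from(self.offset, max_records)
        self.processed.extend(batch)
        self.offset += len(batch)
        return len(batch)

    def lag(self):
        """Number of records written but not yet consumed."""
        return len(self.log.records) - self.offset


def simulate_outage():
    log = AppendOnlyLog()
    consumer = Consumer(log)

    # Normal operation: the consumer keeps up with the producer.
    for i in range(5):
        log.append(f"log-{i}")
    consumer.poll()

    # Downstream outage (for example, a full disk): the producer keeps
    # writing while the consumer is stalled, so lag builds up.
    for i in range(5, 20):
        log.append(f"log-{i}")
    lag_during_outage = consumer.lag()

    # After the fix, the consumer resumes from its offset and catches up.
    while consumer.poll(max_records=4):
        pass

    return lag_during_outage, consumer.lag(), consumer.processed


if __name__ == "__main__":
    lag_during, lag_after, processed = simulate_outage()
    print(lag_during, lag_after, len(processed))
```

Because the producer and consumer only share the log and an offset, the stalled consumer delays the data but does not drop it, which is the decoupling and caching behavior the slide describes.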
  7. Why Streaming Database?
     A streaming database is designed to handle and manage a continuous flow of data that is generated, processed, and analyzed in real time.
     • Real-time insight - monitor system behavior and performance as it happens
     • Low latency - process and analyze data with low latency, in milliseconds
     • CEP (complex event processing) - create sophisticated rules and patterns to detect specific conditions or issues
  8. The Next-Generation Streaming Database (Kafka + Flink + ClickHouse)
     • SQL with streaming extension
     • Data Ingestion
     • Unified Query Processing Pipeline
     [Architecture diagram labels: ingest, append, stream read, historical read, streaming storage, historical storage, query]
  9. Stream tail: SELECT * FROM logs
     Global aggregation: SELECT count(*) FROM logs
     Window aggregation: SELECT window_start, count(*) FROM tumble(logs, 1m) GROUP BY window_start
     Late event: SELECT window_start, count(*) FROM tumble(logs, 5s) GROUP BY window_start EMIT AFTER WATERMARK AND DELAY 2s
     Time travel: SELECT * FROM logs WHERE _tp_time > now() - 1d
     Stream join: SELECT device, cpu_usage, timestamp FROM logs INNER JOIN table(products_info) AS dim ON logs.product_id = dim.id
     Historical query: SELECT * FROM table(logs) LIMIT 10
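To make the window aggregation query concrete, here is a minimal Python sketch of what a tumbling window count computes: each event is assigned to exactly one fixed-size, non-overlapping window, and events are counted per window. The function name `tumble_counts` is hypothetical. This is a batch approximation over a finite list of timestamps; it does not model watermarks, delayed emission, or late events, and it is not how the streaming database implements `tumble`.

```python
from collections import Counter


def tumble_counts(event_times, window_seconds):
    """Count events per fixed, non-overlapping window.

    event_times: iterable of event timestamps in seconds.
    Returns a list of (window_start, count) pairs sorted by window_start.
    Windows that contain no events are not emitted.
    """
    counts = Counter()
    for t in event_times:
        # Floor the timestamp to the start of the window it falls in.
        window_start = (t // window_seconds) * window_seconds
        counts[window_start] += 1
    return sorted(counts.items())


if __name__ == "__main__":
    # Timestamps in seconds, grouped into 5-second windows.
    events = [0, 1, 4, 5, 9, 10, 61]
    for window_start, count in tumble_counts(events, 5):
        print(window_start, count)
```

With the sample timestamps, the windows starting at 0, 5, 10, and 60 receive 3, 2, 1, and 1 events respectively.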
  10. Alerts
      WITH iot AS (
        SELECT to_time(raw:_tp_time) AS _tp_time, *
        FROM es_sample_iot
      ),
      iot_agg_5s AS (
        SELECT window_start, count() AS eps
        FROM tumble(iot, 1s)
        GROUP BY window_start
      )
      SELECT eps, lag(eps) AS prev_eps, (prev_eps - eps)/prev_eps AS eps_drop
      FROM iot_agg_5s
      WHERE eps_drop > 0.2

      Leveraging the continuous data processing capabilities of streaming databases, the alert is triggered as soon as the issue happens.
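The alert rule above compares each window's events-per-second (eps) with the previous window's value and fires when the relative drop exceeds 20%. A minimal Python sketch of that comparison follows; the function name `eps_drop_alerts` and the sample numbers are hypothetical. It assumes per-window counts are already available in order, and it skips windows with no previous value or a previous value of zero, which is a choice made here rather than something the slide specifies.

```python
def eps_drop_alerts(eps_series, threshold=0.2):
    """Return alerts for windows whose eps dropped by more than threshold.

    eps_series: iterable of (window_start, eps) pairs in time order.
    Each alert is a tuple (window_start, eps, prev_eps, eps_drop).
    """
    alerts = []
    prev_eps = None
    for window_start, eps in eps_series:
        # Assumption: no alert without a usable previous value.
        if prev_eps is not None and prev_eps > 0:
            eps_drop = (prev_eps - eps) / prev_eps
            if eps_drop > threshold:
                alerts.append((window_start, eps, prev_eps, eps_drop))
        prev_eps = eps
    return alerts


if __name__ == "__main__":
    # Made-up per-second counts for illustration only.
    series = [(0, 100), (1, 95), (2, 70), (3, 72), (4, 50)]
    for alert in eps_drop_alerts(series):
        print(alert)
```

In the sample series, the windows starting at 2 and 4 trigger alerts, since their drops (25/95 and 22/72) both exceed 0.2, while the small drop at window 1 and the increase at window 3 do not.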
  11. Challenges - Revisited
      1. Complex, dynamic environments require monitoring of microservices and containers
         Real-time streaming analytics provide rapid response and insight
      2. The volume, velocity, and variety of data
         Leverage Kafka/Confluent to effectively store and transport streaming data
      3. Difficult to quantify the business impact of observability
         Provide continuous, cross-layer KPI/metric monitoring with stream processing