
Streaming Aggregation of Cloud Scale Telemetry (Shay Lin, Confluent) | RTA Summit 2023


How does one serve telemetry metrics at cloud scale? At Confluent, raw telemetry flows in at 5 million metrics per second. Not only is storage expensive (often with extended retention to meet compliance requirements), but the computational cost of aggregation can also skyrocket in a pull model, where metrics consumers such as data science and billing query metrics from OLAP data stores on demand, creating inconsistencies over time. This session showcases how we switched to a push model for telemetry analytics and tackled these challenges with Kafka Streams and Apache Druid.

You will walk away with an understanding of:
– Architecture choices for real-time aggregation
– Time semantics, and handling out-of-order events
– The partitioning and autoscaling story of the streaming platform

StarTree

May 23, 2023



Transcript

  1. A tale of four clusters
     • Stores 81233469 bytes in partition 1, node-4 at 2023-04-14 06:55 PST
     • CPU usage 89% at timestamp 1680613879
     • 4 fetch requests from client id dev-app-1 in the last minute, at 2023-01-23 10:09T000Z
     • 1K incoming messages in the last second to broker node 4
  2.–8. Kafka Telemetry Serving Usage Patterns: Data Store
     Stores 81233469 bytes in node-4 at 2023-04-14 06:55 PST
     Facts: number of bytes stored for Cluster 1 in the last hour; CPU was at 89% at timestamp 1680613879
     Trends: CPU usage always peaks on Friday nights (PST) during a week; 268 produce requests in the last minute from client id dev-app
     Attribution: dev-app issues the most produce requests among all clients
     Diagnose: find the point-in-time number of requests on Friday nights and identify a fan-in problem!
  9. A tale of four clusters: when it gets analytical
     • Storage bytes across all partitions for a Kafka topic
     • % CPU at a point in time
     • # of produce requests from a client ID
     • Ingress from a cluster in the last hour
  10. Highly concurrent ingestion and queries
     [Diagram: the Druid query engine fans out across many segments, each holding per-metric data points: # fetch requests, % CPU, # network connections, …]
  11.–12. Scalability Concerns of the Pull Model (Druid)
     Example: hourly storage metric of a cluster = N topics × P partitions × R replication factor × 60 minutes. With N = 100, P = 10, R = 3, that is 100 × 10 × 3 × 60 = 180K data points for a single metric.
     • Highly concurrent ingestion and query load
     • Rising compute and serving cost
     • Inconsistent queries used by data consumers
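As a sanity check on that cardinality math, a tiny back-of-envelope helper (method and parameter names are illustrative, not from the talk):

```java
// Back-of-envelope sketch: data points a pull-model consumer must scan to
// compute ONE hourly, per-partition metric for a cluster.
static long hourlyDataPoints(int topics, int partitionsPerTopic, int replicationFactor, int samplesPerHour) {
    return (long) topics * partitionsPerTopic * replicationFactor * samplesPerHour;
}
// hourlyDataPoints(100, 10, 3, 60) == 180_000, the 180K figure on the slide.
```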
  13.–16. Architecture Options (chart axes: data size vs. narrow-to-broad use cases)
     • Offline: custom rollup tasks in Apache Druid, or Apache Pinot's Star-Tree Index
     • Real-time: aggregate raw telemetry as it comes in via stream processing (Flink, KStreams)
     • Hybrid (chosen): aggregate through stream processing, and feed the results back to the OLAP store
     Key decision factors:
     • High-compression use cases
     • Metric accuracy and consistency
     • Cost efficiency
  17. Kafka Streams: 10,000 ft View
     • It's an Apache Kafka client library
     • A processor topology defines the computational logic to be performed as messages come through Kafka
     • A Java (or Scala) microservice that enjoys the benefits: fault tolerance and parallelism, backed by Kafka topics
     • Kafka Streams provides:
       ◦ Streams DSL: joins, windowed aggregations
       ◦ Processor API: custom data operations, state store management
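To make the 10,000 ft view concrete, here is a minimal Streams DSL sketch of the kind of windowed rollup the deck describes. The topic names, serdes, and one-minute window are assumptions for illustration, not the deck's actual topology:

```java
import java.time.Duration;
import java.util.Properties;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.KeyValue;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.Consumed;
import org.apache.kafka.streams.kstream.Grouped;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.Materialized;
import org.apache.kafka.streams.kstream.Produced;
import org.apache.kafka.streams.kstream.TimeWindows;

// Counts raw telemetry points per metric key in 1-minute tumbling windows
// and emits the rollups to an output topic.
public class MetricRollupApp {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "metric-rollup-demo");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");

        StreamsBuilder builder = new StreamsBuilder();
        KStream<String, Long> raw =
                builder.stream("raw-telemetry", Consumed.with(Serdes.String(), Serdes.Long()));

        raw.groupByKey(Grouped.with(Serdes.String(), Serdes.Long()))
           .windowedBy(TimeWindows.ofSizeWithNoGrace(Duration.ofMinutes(1)))
           .count(Materialized.as("per-minute-counts")) // fault-tolerant state store, backed by a changelog topic
           .toStream()
           .map((windowedKey, count) -> KeyValue.pair(
                   windowedKey.key() + "@" + windowedKey.window().start(), count))
           .to("telemetry-rollups", Produced.with(Serdes.String(), Serdes.Long()));

        new KafkaStreams(builder.build(), props).start();
    }
}
```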
  18. Raw telemetry metric store built on KStreams
     [Diagram: raw telemetry from many customers flows into the KStreams-based metric store, which serves aggregates to consumers]
  19. A tale of four clusters: unified metric interface
     • Storage bytes across all partitions for a Kafka topic
     • % CPU at a point in time
     • # of produce requests from a client ID
     • Ingress from a cluster in the last hour
  20. Topology 1: Global Task Manager
     Distributes aggregation tasks by metric and entity:
     • A custom segment signal producer in the Druid segments
     • The task manager dynamically allocates tasks based on upstream segments and additional trigger conditions
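The deck does not show the task manager's code; purely as a hypothetical sketch of the shape of such a topology, with invented record types and topic names:

```java
import java.util.List;
import java.util.stream.Collectors;
import org.apache.kafka.streams.KeyValue;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.kstream.KStream;

// Illustrative record types; the real signal/task schemas are internal to Confluent.
record SegmentSignal(String entity, List<String> metrics) {}
record AggregationTask(String metric, String entity) {}

public class TaskManagerTopology {
    // Fan out one aggregation task per (metric, entity) whenever a segment
    // signal arrives; downstream workers pick up tasks by key.
    static void define(StreamsBuilder builder) {
        KStream<String, SegmentSignal> signals =
                builder.stream("druid-segment-signals"); // assumes serdes set via config
        signals.flatMap((segmentId, signal) -> signal.metrics().stream()
                        .map(m -> KeyValue.pair(m + "|" + signal.entity(),
                                                new AggregationTask(m, signal.entity())))
                        .collect(Collectors.toList()))
               .to("aggregation-tasks");
    }
}
```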
  21. Topology 2: Metric Processing Workers
     Leverages the Druid query engine for metric rollups:
     • Statelessly process incoming aggregation tasks
     • Flatten results into single-metric outputs
     • Data retention is the same as the Druid segments
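Likewise, a hypothetical sketch of a stateless worker, reusing the invented AggregationTask type from the previous sketch and stubbing out the Druid call:

```java
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.kstream.KStream;

// Illustrative output type; the real schema is internal.
record MetricPoint(String metric, String entity, double value, long timestampMs) {}

public class MetricWorkerTopology {
    // Stateless: no state store, so workers scale out with partition count,
    // and retention is whatever the Druid segments already provide.
    static void define(StreamsBuilder builder) {
        KStream<String, AggregationTask> tasks = builder.stream("aggregation-tasks");
        tasks.mapValues(MetricWorkerTopology::rollUpViaDruid) // one flattened metric per task
             .to("aggregated-metrics");
    }

    // Stub: the real workers delegate the rollup to the Druid query engine.
    static MetricPoint rollUpViaDruid(AggregationTask task) {
        double value = 0.0; // placeholder for the Druid query result
        return new MetricPoint(task.metric(), task.entity(), value, System.currentTimeMillis());
    }
}
```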
  22. Topology 3: Additional Processing
     Assumption: consumers expect OpenTelemetry (OTel) metrics:
     • Processing to support OTel semantics, e.g. emitting deltas for counter metrics
     • Consumers include Druid segments and direct data consumers
     • Out-of-order data handling with the state store, up to the retention period*
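The delta-emission step can be sketched with the Processor API and a key-value state store. This minimal version assumes in-order arrival per series (the topology described above additionally handles out-of-order points up to the retention period, which is elided here), and the store name is illustrative:

```java
import org.apache.kafka.streams.processor.api.Processor;
import org.apache.kafka.streams.processor.api.ProcessorContext;
import org.apache.kafka.streams.processor.api.Record;
import org.apache.kafka.streams.state.KeyValueStore;

// Converts cumulative OTel counters to deltas: remembers the last cumulative
// value per series in a state store and forwards the difference.
public class CounterDeltaProcessor implements Processor<String, Double, String, Double> {
    private ProcessorContext<String, Double> context;
    private KeyValueStore<String, Double> lastValues;

    @Override
    public void init(ProcessorContext<String, Double> context) {
        this.context = context;
        this.lastValues = context.getStateStore("counter-last-values"); // illustrative store name
    }

    @Override
    public void process(Record<String, Double> record) {
        Double previous = lastValues.get(record.key());
        lastValues.put(record.key(), record.value());
        if (previous == null || record.value() < previous) {
            context.forward(record); // first observation, or the counter reset
        } else {
            context.forward(record.withValue(record.value() - previous));
        }
    }
}
```

The store would be registered with StreamsBuilder#addStateStore and the processor attached via KStream#process with the store name.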
  23. Horizontal Scaling Story in KStreams (KIP-878)
     Problem: as the business expands, you may want to increase the parallelism of the stream processing by increasing the number of partitions of the input topics. However, internal topics (changelog, repartition) will not grow automatically, and today a KStreams application will crash upon detecting a partition-count mismatch between internal topics and input topics.
     KIP-878 proposes support for autoscaling of internal topics. This works well if your application can be:
     • Statically partitioned or stateless. Stateless is straightforward. In KStreams, your state store (e.g. RocksDB) is backed by internal topics and thus bound to a partition; upon autoscaling, pre-existing state will not move. Choose a partitioning strategy that works for your use case, such that tasks without pre-existing state can be driven to the newly created partitions while existing keys remain sticky (see the sketch below).
     • Upfront over-provisioned for stateful processing, while the KIP is in progress.
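One way to realize the "statically partitioned" requirement is a custom StreamPartitioner that keeps pre-expansion keys in their original partition range. The creation epoch embedded in the key is a purely hypothetical convention for illustration:

```java
import org.apache.kafka.streams.processor.StreamPartitioner;

// Keys minted before the partition expansion keep hashing into the ORIGINAL
// partition range, so their local state stays put; keys minted afterwards
// may use the full range.
public class StickyExpandPartitioner implements StreamPartitioner<String, byte[]> {
    private final int originalPartitions;
    private final long expansionEpochMs;

    public StickyExpandPartitioner(int originalPartitions, long expansionEpochMs) {
        this.originalPartitions = originalPartitions;
        this.expansionEpochMs = expansionEpochMs;
    }

    @Override
    public Integer partition(String topic, String key, byte[] value, int numPartitions) {
        long createdAtMs = keyCreationTime(key); // hypothetical: epoch embedded in the key
        int range = createdAtMs < expansionEpochMs ? originalPartitions : numPartitions;
        return Math.floorMod(key.hashCode(), range);
    }

    private long keyCreationTime(String key) {
        return Long.parseLong(key.substring(key.lastIndexOf('#') + 1));
    }
}
```

Such a partitioner can be plugged in via Produced.streamPartitioner(...) or Repartitioned.streamPartitioner(...).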
  24. A tale of four clusters
     • Stores 81233469 bytes in partition 1, node-4 at 2023-04-14 06:55 PST
     • CPU usage 89% at timestamp 1680613879
     • 4 fetch requests from client id dev-app-1 in the last minute, at 2023-01-23 10:09T000Z
     • 1K incoming messages in the last second to broker node 4
  25. A tale of four clusters: when it gets analytical
     • Storage bytes across all partitions for a Kafka topic
     • % CPU at a point in time
     • # of produce requests from a client ID
     • Ingress from a cluster in the last hour
  26. A tale of four clusters: happy consumers!
     • Storage bytes across all partitions for a Kafka topic
     • % CPU at a point in time
     • # of produce requests from a client ID
     • Ingress from a cluster in the last hour
  27. As metrics use cases increase: closing thoughts for KStreams metrics aggregation
     • A DSL, KSQL or similar, to define versioned metrics: each metric aggregate is computed by one topology, so when a metric definition changes, we need a strategy to handle and propagate the change.
     • Topic partitioning of the raw telemetry impacts aggregation efficiency: repartitioning may take up most of the processing time, and it increases storage and network costs.
     • Performance tuning for RocksDB, or the state store implementation of your choice, becomes critical: SerDes, data retention, read/write patterns.
     • Query plans and smart rollups could become essential: pre-aggregates should be shared across space and/or time aggregations of metrics for efficiency.
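As a starting point for the RocksDB tuning mentioned above, Kafka Streams exposes a RocksDBConfigSetter hook. The values below are illustrative knobs to experiment with, not recommendations from the talk:

```java
import java.util.Map;
import org.apache.kafka.streams.state.RocksDBConfigSetter;
import org.rocksdb.BlockBasedTableConfig;
import org.rocksdb.LRUCache;
import org.rocksdb.Options;

// Applied to every RocksDB-backed state store in the application.
public class TelemetryRocksDBConfig implements RocksDBConfigSetter {
    @Override
    public void setConfig(String storeName, Options options, Map<String, Object> configs) {
        BlockBasedTableConfig table = (BlockBasedTableConfig) options.tableFormatConfig();
        table.setBlockCache(new LRUCache(64 * 1024 * 1024L)); // larger block cache for read-heavy rollups
        options.setTableFormatConfig(table);
        options.setWriteBufferSize(32 * 1024 * 1024L); // bigger memtable for write bursts
        options.setMaxWriteBufferNumber(3);
    }

    @Override
    public void close(String storeName, Options options) { /* nothing to release here */ }
}
// Enabled with:
// props.put(StreamsConfig.ROCKSDB_CONFIG_SETTER_CLASS_CONFIG, TelemetryRocksDBConfig.class);
```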