[Diagram: topic partitions (topic-partition-2 … topic-partition-N) consumed by a single APP instance]
• Every topic in Kafka is split into one or more partitions
• All the streaming tasks are executed by one or more threads of the same instance
[Diagram: topic partitions (topic-partition-2 … topic-partition-N) shared between two APP instances]
• Consumers from the same consumer group cooperate to consume data from topics.
• Every instance that joins the group triggers a partition rebalance.
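To make the rebalance concrete, here is a minimal sketch of what happens to partition ownership when a second instance joins the group. It assumes a plain round-robin strategy; the function `assign_partitions` and the member names `app-1` and `app-2` are hypothetical and only illustrate the idea, they are not Kafka's actual assignor.

```python
def assign_partitions(partitions, members):
    """Spread partitions over group members in round-robin order.

    Simplified illustration only: real Kafka assignors (range, sticky, ...)
    follow their own rules.
    """
    if not members:
        raise ValueError("a consumer group needs at least one member")
    ordered_members = sorted(members)
    assignment = {member: [] for member in ordered_members}
    for index, partition in enumerate(sorted(partitions)):
        owner = ordered_members[index % len(ordered_members)]
        assignment[owner].append(partition)
    return assignment


partitions = [f"topic-partition-{i}" for i in range(4)]

# One instance owns every partition.
before = assign_partitions(partitions, ["app-1"])

# A second instance joins: the group rebalances and the partitions are split.
after = assign_partitions(partitions, ["app-1", "app-2"])

print(before)
print(after)
```

Every change in group membership recomputes the whole assignment, which is why scaling the number of instances up or down always costs a rebalance.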
- Periodically adjusts the number of replicas
- Based on CPU usage in autoscaling/v1
- Memory and custom metrics are covered by autoscaling/v2beta1
- Uses the metrics.k8s.io API through a metrics server
➔ Source: Kubernetes.io Documentation
and k8s StatefulSets adoption are the next challenges to ease auto-scaling
BUILD THE FUTURE
1. Kafka-Streams exposes relevant metrics related to stream processing
2. Consumer lag is one of the key metrics to monitor in real-time applications
3. The cloud native trend brings a set of powerful tools on which the Kafka community keeps a close eye
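As a reminder of what consumer lag measures, the sketch below computes it per partition as the log end offset minus the committed offset of the group. The offsets are made-up sample values and `consumer_lag` is a hypothetical helper; treating a partition without a committed offset as fully lagging is a simplifying assumption.

```python
def consumer_lag(end_offsets, committed_offsets):
    """Return per-partition lag: log end offset minus committed offset.

    Assumption: a partition with no committed offset counts as lagging
    by its whole log (committed offset treated as 0).
    """
    return {
        partition: max(end - committed_offsets.get(partition, 0), 0)
        for partition, end in end_offsets.items()
    }


# Hypothetical sample offsets for a three-partition topic.
end_offsets = {0: 1500, 1: 1200, 2: 900}
committed = {0: 1400, 1: 1200, 2: 650}

lag = consumer_lag(end_offsets, committed)
total_lag = sum(lag.values())

print(lag)
print(total_lag)
```

A total (or average) lag that keeps growing is the signal that the application cannot keep up with its input, which makes it a natural candidate metric for scaling decisions.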
the current metric value and replica number
desiredReplicas = ceil[currentReplicas * ( currentMetricValue / desiredMetricValue )]
➢ A ratio of two doubles the number of instances, within the limit of maxReplicas
➢ With targetAverageValue, the metric is computed as the average of the given metric across all Pods
The number of replicas may fluctuate frequently due to the dynamic nature of the metrics; this is called thrashing
➢ --horizontal-pod-autoscaler-downscale-delay (default 5m0s)
➢ --horizontal-pod-autoscaler-upscale-delay (default 3m0s)
Note: Both Kafka-Streams topology modification and HPA make rolling updates impossible
HPA & thrashing: “Should I stay or should I Go?”
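The replica formula can be checked with a few lines of Python. This is only a sketch of the arithmetic: the function name `desired_replicas` and the default bounds are hypothetical, and the tolerance and delay handling of the real controller are deliberately left out.

```python
import math


def desired_replicas(current_replicas, current_metric, desired_metric,
                     min_replicas=1, max_replicas=10):
    """Apply ceil(currentReplicas * currentMetricValue / desiredMetricValue),
    then clamp the result to [min_replicas, max_replicas].

    The bounds are illustrative defaults, not values from the slides.
    """
    if desired_metric <= 0:
        raise ValueError("desired_metric must be positive")
    raw = math.ceil(current_replicas * (current_metric / desired_metric))
    return max(min_replicas, min(raw, max_replicas))


# A ratio of two doubles the number of instances...
print(desired_replicas(3, current_metric=200, desired_metric=100))

# ...but never beyond maxReplicas.
print(desired_replicas(3, current_metric=200, desired_metric=100, max_replicas=5))

# A ratio below one scales the deployment down.
print(desired_replicas(4, current_metric=50, desired_metric=100))
```

Because the result follows the metric directly, a metric that oscillates around its target makes the replica count oscillate too, which is the thrashing the delay flags are meant to dampen.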
states baby”
Streaming apps maintain state with RocksDB
States are backed up in changelog topics
Stateful topology nodes have a portion of the state attached
At startup, an instance that holds old versions of states is more likely to be assigned the corresponding streaming tasks
This reduces state migration
Large states imply a high recovery time
-could-you-should-you
https://kubernetes.io/blog/2018/04/13/local-persistent-volumes-beta/
https://stackoverflow.com/questions/49482873/how-to-deploy-kafka-stream-applications-on-kubernetes
https://www.youtube.com/watch?v=9TOoThIKafo&list=PLhMG-8t0efEvJM5Bt2_zNLNVCEWcfRN0i
https://www.confluent.io/blog/streaming-in-the-clouds-where-to-start
Ideas from The event oriented architecture
Concepts from Streams and tables white paper
All the code on github DivLoic/xke-kingof-scaling
Pictures:
• Stormtroopers Photo by Corey Motta on Unsplash
• Screen from “Close Rick-counters of the Rick Kind”
• Screens from https://kubernetes.io/docs/tasks/
• Coin-op machine by xoxo from the Noun Project
• Arcade game by Icons Producer from the Noun Project
• Mainframe by monkik from the Noun Project