Scale in / scale out with Kafka Streams and Kubernetes

Loïc DIVAD
November 20, 2018

Transcript

  1. @Xebiconfr #Xebicon18 @LoicMDivad Build the future

    Scale in / scale out with Kafka-Streams and Kubernetes
    Loïc DIVAD, Data Engineer @ Xebia France
  2. @Xebiconfr #Xebicon18 @LoicMDivad 2

  3. @Xebiconfr #Xebicon18 @LoicMDivad Auto-scaling applied to streaming apps

    Scale out by respecting the workload and resources
  4. @Xebiconfr #Xebicon18 @LoicMDivad Loïc DIVAD, Developer @XebiaFr

    (also #Data Engineer, #Spark Trainer, @DataXDay #Organiser, #Writer@blog.xebia.fr, #DataLover) @LoicMDivad
  5. @Xebiconfr #Xebicon18 @LoicMDivad Streaming Apps Consumer polling

  6. @Xebiconfr #Xebicon18 @LoicMDivad Kafka clients and the consumer polling system

    APP
  11. @Xebiconfr #Xebicon18 @LoicMDivad Streaming Apps Kafka-Streams and the consumer protocol

  12. @Xebiconfr #Xebicon18 @LoicMDivad Kafka-Streams and the consumer protocol

    Diagram: topic-partition-0, topic-partition-1, topic-partition-2, ... topic-partition-N, one APP instance
    • Every topic in Kafka is split into one or more partitions
    • All the streaming tasks are executed by one or more threads of the same instance
  13. @Xebiconfr #Xebicon18 @LoicMDivad Kafka-Streams and the consumer protocol

    Diagram: topic-partition-0, topic-partition-1, topic-partition-2, ... topic-partition-N, two APP instances
    • Consumers from the same consumer group cooperate to consume data from topics
    • Every instance that joins the group triggers a partition rebalance
  14. @Xebiconfr #Xebicon18 @LoicMDivad Kafka-Streams and the consumer protocol

    Diagram: topic-partition-0, topic-partition-1, topic-partition-2, ... topic-partition-N, four APP instances
    • The maximum parallelism is determined by the number of partitions of the input topic(s)
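
The parallelism limit described on slides 12 to 14 can be illustrated with a small sketch. It is not the assignor Kafka-Streams actually uses (the real one works on tasks and tries to keep assignments sticky); it only assumes a plain round-robin spread, and the function names `assign_partitions` and `active_instances` are hypothetical. The point it shows is that instances beyond the partition count receive nothing to do.

```python
def assign_partitions(num_partitions, instances):
    """Spread partition ids over instances round-robin (illustrative only)."""
    if not instances:
        raise ValueError("at least one instance is required")
    assignment = {name: [] for name in instances}
    for partition in range(num_partitions):
        owner = instances[partition % len(instances)]
        assignment[owner].append(partition)
    return assignment


def active_instances(assignment):
    """Return the instances that own at least one partition."""
    return [name for name, partitions in assignment.items() if partitions]


if __name__ == "__main__":
    # Hypothetical input topic with 4 partitions, scaled from 1 to 6 instances.
    for count in (1, 2, 4, 6):
        names = [f"app-{i}" for i in range(count)]
        assignment = assign_partitions(4, names)
        print(count, "instance(s):", assignment,
              "| active:", len(active_instances(assignment)))
```

With 4 partitions, going from 4 to 6 instances adds two idle instances, which is why scaling out past the partition count brings no extra throughput.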
  15. @Xebiconfr #Xebicon18 @LoicMDivad Container Orchestration K8s or the state of

    the art
  16. @Xebiconfr #Xebicon18 @LoicMDivad Container Orchestration: K8s or the state of

    the art ➔ Source: Kubernetes.io Documentation 16
  17. @Xebiconfr #Xebicon18 @LoicMDivad Container Orchestration Support for custom metrics

  18. @Xebiconfr #Xebicon18 @LoicMDivad K8s: Support for custom metrics

    # deployment.yaml
    kind: Deployment
    #...
    template:
      containers:
        - name: streaming-app
          # ...
        - name: prometheus-to-sd
          # ...

    # adapter.yaml
    - name: custom-metrics-sd-adapter

    Diagram components:
    • Your Streaming App: JMX metrics in a Prometheus format
    • Prometheus to Stackdriver: https://gcr.io/google-containers/prometheus-to-sd
    • Metrics Server: https://gcr.io/google-containers/custom-metrics-stackdriver-adapter
    • Stackdriver
  19. @Xebiconfr #Xebicon18 @LoicMDivad K8s: Support for custom metrics

    # jmx-exporter.conf
    ---
    global:
      scrape_interval: 1s
      evaluation_interval: 1s
    rules:
      - pattern: "kafka.consumer<type=..., topic=GAME-FRAME-RS, partition=(.*)><>(.*):(.*)"
        labels: { partition: $2, topic: GAME-FRAME-RS, metric: $3 }
        name: "consumer_lag_game_frame_rs"
        type: GAUGE
      - pattern: "kafka.consumer<type=..., topic=GAME-FRAME-RQ, partition=(.*)><>(.*):(.*)"
        labels: { partition: $2, topic: GAME-FRAME-RQ, metric: $3 }
        name: "consumer_lag_game_frame_rq"
        type: GAUGE
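
As a rough illustration of what the gauges declared on slide 19 look like once exported, the sketch below parses a few lines of Prometheus text format and sums the lag of one topic. Everything in the sample is an assumption: the label value `records-lag`, the numbers, and the helper names `parse_samples` and `total_lag` are hypothetical and do not come from the talk.

```python
import re

# One exposition line: metric_name{label="value",...} number
SAMPLE_LINE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)\{(?P<labels>[^}]*)\}\s+(?P<value>\S+)$'
)
LABEL = re.compile(r'(\w+)="([^"]*)"')


def parse_samples(text):
    """Return (name, labels, value) tuples, skipping comments and blank lines."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        match = SAMPLE_LINE.match(line)
        if match is None:
            continue
        labels = dict(LABEL.findall(match.group("labels")))
        samples.append((match.group("name"), labels, float(match.group("value"))))
    return samples


def total_lag(samples, metric_name, lag_attribute="records-lag"):
    """Sum the lag over partitions for one gauge (attribute name is assumed)."""
    per_partition = {}
    for name, labels, value in samples:
        if name == metric_name and labels.get("metric") == lag_attribute:
            per_partition[labels.get("partition", "")] = value
    return sum(per_partition.values())


# Hypothetical scrape output, made up for this example.
EXAMPLE = """
# TYPE consumer_lag_game_frame_rs gauge
consumer_lag_game_frame_rs{partition="0",topic="GAME-FRAME-RS",metric="records-lag",} 120.0
consumer_lag_game_frame_rs{partition="1",topic="GAME-FRAME-RS",metric="records-lag",} 30.0
consumer_lag_game_frame_rs{partition="1",topic="GAME-FRAME-RS",metric="records-lag-max",} 45.0
"""

if __name__ == "__main__":
    print(total_lag(parse_samples(EXAMPLE), "consumer_lag_game_frame_rs"))
```

A per-Pod total of this kind is the sort of value an autoscaler could compare against a target, under the assumption that lag is the metric chosen for scaling.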
  20. @Xebiconfr #Xebicon18 @LoicMDivad K8s: Support for custom metrics

    lolo ➜ ./gradlew dockerPush
    <=========----> 73% EXECUTING [2s]
    > :docker
    …
    BUILD SUCCESSFUL in 14s
    10 actionable tasks: 5 executed, 5 up-to-date
  21. @Xebiconfr #Xebicon18 @LoicMDivad K8s: Support for custom metrics

    lolo ➜ ./gradlew dockerPush
    <=========----> 73% EXECUTING [2s]
    > :docker
    …
    BUILD SUCCESSFUL in 14s
    10 actionable tasks: 5 executed, 5 up-to-date
    lolo ➜ terraform apply
    …
    + google_container_cluster.primary
  22. @Xebiconfr #Xebicon18 @LoicMDivad K8s: Support for custom metrics

    lolo ➜ ./gradlew dockerPush
    <=========----> 73% EXECUTING [2s]
    > :docker
    …
    BUILD SUCCESSFUL in 14s
    10 actionable tasks: 5 executed, 5 up-to-date
    lolo ➜ terraform apply
    …
    + google_container_cluster.primary
    lolo ➜ kubectl create -f deployment.yaml
  23. @Xebiconfr #Xebicon18 @LoicMDivad K8s: Support for custom metrics

    lolo ➜ ./gradlew dockerPush
    <=========----> 73% EXECUTING [2s]
    > :docker
    …
    BUILD SUCCESSFUL in 14s
    10 actionable tasks: 5 executed, 5 up-to-date
    lolo ➜ terraform apply
    …
    + google_container_cluster.primary
    lolo ➜ kubectl create -f deployment.yaml
    lolo ➜ kubectl get pods prometheus-to-sd kstreams-app
  28. @Xebiconfr #Xebicon18 @LoicMDivad Container Orchestration Horizontal Pod Autoscaler

  29. @Xebiconfr #Xebicon18 @LoicMDivad K8s: Horizontal Pod Autoscaler

    ➔ Source: Kubernetes.io Documentation
  30. @Xebiconfr #Xebicon18 @LoicMDivad K8s: Horizontal Pod Autoscaler

    - Kubernetes resource
    - Periodically adjusts the number of replicas
    - Based on CPU usage in autoscaling/v1
    - Memory and custom metrics are covered by autoscaling/v2beta1
    - Uses the metrics.k8s.io API through a metrics server
    ➔ Source: Kubernetes.io Documentation
  31. @Xebiconfr #Xebicon18 @LoicMDivad Scale In / scale out with kafka-streams

    and k8s 31
  32. @Xebiconfr #Xebicon18 @LoicMDivad 32

  33. @Xebiconfr #Xebicon18 @LoicMDivad Scale In / scale out with kafka-streams

    and k8s 33
  34. @Xebiconfr #Xebicon18 @LoicMDivad CONCLUSION

    1. Kafka-Streams exposes relevant metrics related to stream processing
    2. Consumer lag is one of the key metrics to monitor in a real-time application
    3. The cloud-native trend brings a set of powerful tools on which the Kafka community keeps a close eye
    State migration, changelog compaction, topology upgrades and the adoption of k8s StatefulSets are the next challenges to ease auto-scaling
  35. @Xebiconfr #Xebicon18 @LoicMDivad MERCI 35

  36. @Xebiconfr #Xebicon18 @LoicMDivad 36

  37. @Xebiconfr #Xebicon18 @LoicMDivad 37

  38. @Xebiconfr #Xebicon18 @LoicMDivad ANNEXES 38

  39. @Xebiconfr #Xebicon18 @LoicMDivad HPA & thrashing: “Should I stay or should I go?”

    The Horizontal Pod Autoscaler algorithm depends on the current metric value and the current number of replicas:
    desiredReplicas = ceil[currentReplicas * ( currentMetricValue / desiredMetricValue )]
    ➢ A ratio of two doubles the number of instances, within the limit of maxReplicas
    ➢ With targetAverageValue, the metric is computed as the average of the given metric across all Pods
    The number of replicas may fluctuate frequently because of the dynamic nature of the metrics; this is called thrashing. Two flags limit it:
    ➢ --horizontal-pod-autoscaler-downscale-delay (default 5m0s)
    ➢ --horizontal-pod-autoscaler-upscale-delay (default 3m0s)
    Note: both Kafka-Streams topology modifications and the HPA make rolling updates impossible
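
The replica formula on slide 39 is easy to check by hand with a few lines of Python. This is only a sketch of the arithmetic, not the controller's implementation: the function name `desired_replicas` and the default bounds are assumptions, and the real autoscaler applies further rules (such as the scale delays listed on the slide) that are left out here.

```python
import math


def desired_replicas(current_replicas, current_metric_value, desired_metric_value,
                     min_replicas=1, max_replicas=10):
    """ceil(currentReplicas * currentMetricValue / desiredMetricValue), clamped.

    min_replicas and max_replicas are illustrative defaults, not values
    taken from the talk.
    """
    if desired_metric_value <= 0:
        raise ValueError("desired_metric_value must be positive")
    ratio = current_metric_value / desired_metric_value
    raw = math.ceil(current_replicas * ratio)
    # Stay within the configured bounds, as maxReplicas does on the slide.
    return max(min_replicas, min(max_replicas, raw))


if __name__ == "__main__":
    # A ratio of two doubles the replica count.
    print(desired_replicas(2, 200, 100))
    # The same ratio is capped once maxReplicas is reached.
    print(desired_replicas(3, 1000, 100, max_replicas=4))
    # A metric at half the target halves the replica count.
    print(desired_replicas(4, 50, 100))
```

For a Kafka-Streams application, one reasonable choice under the constraint from slide 14 is to set the upper bound no higher than the number of input partitions, since extra instances would stay idle.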
  40. @Xebiconfr #Xebicon18 @LoicMDivad Kafka-Streams & persistent storage: “Let’s talk about states baby”

    • Streaming apps maintain state with RocksDB
    • States are backed up in changelog topics
    • Stateful topology nodes have a portion of the state attached
    • At startup, an instance that holds old versions of the states is more likely to be assigned the corresponding streaming tasks
    • This reduces state migration
    • Large states imply a long recovery time
  41. @Xebiconfr #Xebicon18 @LoicMDivad K8s: Horizontal Pod Autoscaler 41

  42. @Xebiconfr #Xebicon18 @LoicMDivad Apache Kafka now supports more than 200K partitions per cluster

    https://www.confluent.io/blog/apache-kafka-supports-200k-partitions-per-cluster
  43. @Xebiconfr #Xebicon18 @LoicMDivad “Everything is awesome, when you're living in

    a THE CLOUD” 43
  44. @Xebiconfr #Xebicon18 @LoicMDivad Use Case - King Of Fighters: The combos sessionization

    Streaming App steps shown on the diagram: Correlate, Flatten, Decode, Group, Produce Back
    Key => { "ts":1542609460412, "machine":"903071", "zone":"AU" }
    Value => { "bytes":[ "c3ff8ab19d00d9e5", "e3ff8c72b600d9e5" ]}
    [{ "impact":0, "key":"X", "direction":"DOWN", "type":"Missed", "level":"Pro", "game":"Neowave" }, ...]
  45. @Xebiconfr #Xebicon18 @LoicMDivad Links and references

    • https://www.confluent.io/kafka-summit-sf18/deploying-kafka-streams-applications
    • https://www.youtube.com/watch?v=9cyXXmRlGWQ
    • https://www.confluent.io/blog/apache-kafka-kubernetes-could-you-should-you
    • https://kubernetes.io/blog/2018/04/13/local-persistent-volumes-beta/
    • https://stackoverflow.com/questions/49482873/how-to-deploy-kafka-stream-applications-on-kubernetes
    • https://www.youtube.com/watch?v=9TOoThIKafo&list=PLhMG-8t0efEvJM5Bt2_zNLNVCEWcfRN0i
    • https://www.confluent.io/blog/streaming-in-the-clouds-where-to-start
    Ideas from The event oriented architecture
    Concepts from Streams and tables white paper
    All the code on github: DivLoic/xke-kingof-scaling
    Pictures:
    • Stormtroopers Photo by Corey Motta on Unsplash
    • Screen from “Close Rick-counters of the Rick Kind”
    • Screens from https://kubernetes.io/docs/tasks/
    • Coin-op machine by xoxo from the Noun Project
    • Arcade game by Icons Producer from the Noun Project
    • Mainframe by monkik from the Noun Project
  46. @Xebiconfr #Xebicon18 @LoicMDivad With special thanks to:

    Stéphane M., Florent R., Sylvain L. (Teddy Beer)
  47. @Xebiconfr #Xebicon18 @LoicMDivad Auto-scaling applied to streaming apps

    Scale out by respecting the workload and resources