“Running Apache Samza on Kubernetes” Recap : KubeCon2019@NA

yosshi_
December 03, 2019


Transcript

  1. Self-introduction (@yosshi_)
    • Shota Yoshimura
    • NTT Communications
    • Data science team
    • Infrastructure engineer / data engineering
    • Kubernetes, Prometheus, etc.
    • Hobby: board games
    • Community activity: “Cloud Native Developers JP”
  2. Reference: Spark and Flink materials
    • Spark
      – Documentation: https://github.com/GoogleCloudPlatform/spark-on-k8s-operator
      – KubeCon2019@NA session: Kubernetizing Big Data and ML Workloads at Uber - Mayank Bansal & Min Cai, Uber https://sched.co/Uaad
    • Flink
      – Documentation: https://ci.apache.org/projects/flink/flink-docs-stable/ops/deployment/kubernetes.html
      – KubeCon2019@NA session: Managing Apache Flink on Kubernetes - FlinkK8sOperator - Anand Swaminathan, Lyft https://sched.co/UabA
  3. About Apache Samza (second time)
    • From Chris Riccomini of LinkedIn, one of Samza's developers
      – “If Kafka is HDFS, then Samza is the counterpart of MapReduce”
    • Naming of Samza
      – Named after Gregor Samsa, the protagonist of Franz Kafka's novel “The Metamorphosis”
  4. About Apache Kafka (1/2)
    • Developed at LinkedIn and open-sourced in 2011
    • Key roles
      – Message queue
      – Message hub
    • Figures: a world without Kafka vs. a world with Kafka
    Reference < https://engineering.linkedin.com/distributed-systems/log-what-every-software-engineer-should-know-about-real-time-datas-unifying/ >
  5. Samza Concept Overview
    • Samza processes streams. A stream is composed of immutable messages of a similar type or category. In Kafka a stream is a topic.
  6. Advanced Concept Overview (1/2)
    • Partition: each stream is broken into one or more partitions, each of which is an ordered, replayable sequence of records.
    • Task: the unit of parallelism of the job, just as the partition is to the stream.
    • Adding containers scales the job out (though the maximum parallelism is bounded by the number of partitions).
  7. Advanced Concept Overview (2/2)
    • Job Coordinator
      – manage the assignment of tasks across the individual containers
      – monitor the liveness of individual containers
      – redistribute the tasks among the remaining ones during a failure
  8. Proposed Changes
    • The Samza Operator, similar to the Samza AM in YARN, is the control hub for Samza applications running on Kubernetes. It is responsible for requesting Pods from Kubernetes and coordinating work assignment across Pods.
    • The graph below describes the lifecycle of a Samza application running on Kubernetes.
    Reference <https://cwiki.apache.org/confluence/display/SAMZA/SEP-20%3A+Samza+on+Kubernetes>
  9. The future of Samza
    • Adding support for other languages, like Python
    • Hot-standby containers to support applications with strict downtime requirements
    • Making it easy to auto-scale and auto-tune Samza applications
    • Supporting machine learning related use cases
    • Enabling end-to-end exactly-once processing
    Reference < https://engineering.linkedin.com/blog/2018/11/samza-1-0--stream-processing-at-massive-scale>
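
As a rough illustration of slides 6 and 7, the Python sketch below models how tasks might be spread over containers and then moved to the surviving containers after a failure. It is a toy model, not Samza code: the function names `assign_tasks` and `redistribute`, the container names, the round-robin placement, and the one-task-per-partition mapping are all assumptions made for this example.

```python
from collections import defaultdict


def assign_tasks(num_partitions, containers):
    """Place one hypothetical task per partition onto containers, round-robin."""
    assignment = defaultdict(list)
    for task_id in range(num_partitions):
        container = containers[task_id % len(containers)]
        assignment[container].append(task_id)
    # Containers that received no task do not appear in the result.
    return dict(assignment)


def redistribute(assignment, failed):
    """Move the failed container's tasks onto the remaining containers."""
    survivors = sorted(c for c in assignment if c != failed)
    if not survivors:
        raise RuntimeError("no containers left to run the tasks")
    new_assignment = {c: list(assignment[c]) for c in survivors}
    for i, task_id in enumerate(assignment.get(failed, [])):
        new_assignment[survivors[i % len(survivors)]].append(task_id)
    return new_assignment


# 4 partitions -> 4 tasks shared by 2 containers.
plan = assign_tasks(4, ["container-0", "container-1"])

# 6 containers but only 4 partitions: only 4 containers get work,
# which is the parallelism limit mentioned on slide 6.
wide = assign_tasks(4, [f"container-{i}" for i in range(6)])

# container-1 fails; its tasks are handed to the survivor.
after_failure = redistribute(plan, "container-1")

print(plan)
print(len(wide))
print(after_failure)
```

In this model, adding containers beyond the partition count leaves the extra containers idle, and a failure only changes where tasks run, not how many tasks exist.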