Samza Concept Overview
• Samza processes streams.
• A stream is composed of immutable messages of a similar type or category.
• In Kafka, a stream is a topic.
Advanced Concept Overview (1/2)
• Partition: each stream is broken into one or more partitions, each of which is an ordered, replayable sequence of records.
• Task: the unit of parallelism of the job, just as the partition is the unit of parallelism of the stream.
• Increasing the number of containers scales the job out (though the upper limit on parallelism depends on the number of partitions).
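The relationship between partitions, tasks, and containers described above can be sketched as follows. This is a minimal illustration, not Samza's actual grouping code: it assumes one task per partition and a simple round-robin grouping, and the function name `assign_tasks` is hypothetical.

```python
from typing import Dict, List


def assign_tasks(num_partitions: int, num_containers: int) -> Dict[int, List[int]]:
    """Group tasks into containers, assuming one task per partition.

    Hypothetical round-robin grouping for illustration only.
    """
    if num_partitions < 1 or num_containers < 1:
        raise ValueError("need at least one partition and one container")

    # Containers beyond the partition count would have no task to run,
    # so the effective parallelism is capped by the number of partitions.
    effective = min(num_containers, num_partitions)

    assignment: Dict[int, List[int]] = {c: [] for c in range(effective)}
    for task_id in range(num_partitions):
        assignment[task_id % effective].append(task_id)
    return assignment


if __name__ == "__main__":
    print(assign_tasks(4, 2))       # {0: [0, 2], 1: [1, 3]}
    print(len(assign_tasks(4, 8)))  # 4
```

With 4 partitions, asking for 8 containers still yields only 4 busy containers in this sketch, which is the scaling limit the slide points out.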
Advanced Concept Overview (2/2)
• Job Coordinator
– manages the assignment of tasks across the individual containers
– monitors the liveness of individual containers
– redistributes the tasks among the remaining containers during a failure
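The three Job Coordinator responsibilities listed above can be sketched in a few lines. This is a toy model under stated assumptions, not Samza's implementation: the class name `CoordinatorSketch`, the heartbeat-timeout liveness check, and the "least-loaded survivor" redistribution rule are all hypothetical choices made for illustration.

```python
from typing import Dict, List


class CoordinatorSketch:
    """Toy job coordinator: tracks task assignment and container liveness."""

    def __init__(self, assignment: Dict[int, List[int]], timeout_s: float = 10.0):
        # container id -> list of task ids (copied so the caller's dict is untouched)
        self.assignment = {c: list(tasks) for c, tasks in assignment.items()}
        self.timeout_s = timeout_s
        self.last_heartbeat = {c: 0.0 for c in assignment}

    def heartbeat(self, container_id: int, now: float) -> None:
        """Record that a container reported in at time `now` (seconds)."""
        if container_id in self.assignment:
            self.last_heartbeat[container_id] = now

    def check_liveness(self, now: float) -> List[int]:
        """Declare silent containers dead and redistribute their tasks."""
        dead = [c for c, t in self.last_heartbeat.items() if now - t > self.timeout_s]
        for container_id in dead:
            self._redistribute(container_id)
        return dead

    def _redistribute(self, failed: int) -> None:
        survivors = sorted(c for c in self.assignment if c != failed)
        if not survivors:
            raise RuntimeError("no live containers left to take over tasks")
        orphaned = self.assignment.pop(failed)
        del self.last_heartbeat[failed]
        for task in orphaned:
            # Hand each orphaned task to the currently least-loaded survivor.
            target = min(survivors, key=lambda c: len(self.assignment[c]))
            self.assignment[target].append(task)


if __name__ == "__main__":
    coord = CoordinatorSketch({0: [0, 3], 1: [1, 4], 2: [2, 5]}, timeout_s=10.0)
    coord.heartbeat(0, now=100.0)
    coord.heartbeat(1, now=100.0)
    coord.heartbeat(2, now=85.0)            # container 2 went silent 15 s ago
    print(coord.check_liveness(now=100.0))  # [2]
    print(coord.assignment)                 # {0: [0, 3, 2], 1: [1, 4, 5]}
```

Passing `now` explicitly instead of reading the system clock keeps the sketch deterministic and easy to test.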
Proposed Changes
• The Samza Operator, similar to the Samza AM in YARN, is the control hub for Samza applications running on Kubernetes. It is responsible for requesting Pods from Kubernetes and coordinating work assignment across Pods.
• The graph below describes the lifecycle of a Samza application running on Kubernetes.
Reference
The Future of Samza
• Adding support for other languages, like Python
• Hot-standby containers to support applications with strict downtime requirements
• Making it easy to auto-scale and auto-tune Samza applications
• Supporting machine learning related use cases
• Enabling end-to-end exactly once processing
Reference: <https://engineering.linkedin.com/blog/2018/11/samza-1-0--stream-processing-at-massive-scale>