These are the slides for a talk at the Flink Forward 2024 conference (https://www.flink-forward.org/berlin-2024/agenda#comparing-apache-flink-and-spark-for-modern-stream-data-processing).
Real-time data processing is essential for staying competitive in today’s fast-paced business environment, and choosing the right tool is a key decision. Apache Flink and Spark Structured Streaming are two leading stream processing frameworks, each with unique strengths and trade-offs.
This talk takes a look at our journey at Decodable, where we evaluated both tools and ultimately chose Apache Flink over Spark Structured Streaming for our stream data processing needs. By examining key differences between the two systems, we aim to provide a clear, technical comparison that will help you make informed decisions for your streaming data use cases.
Join us for this talk where we will discuss:
- Design philosophies: Learn about the origins of both systems and some of the fundamental architectural design choices in Flink that make it more attractive for streaming use cases.
- (Stateful) streaming capabilities: We will compare similar features that Spark and Flink offer across their various APIs, and share some features available only in Flink that make it a much richer streaming library. We will also cover some of the data ecosystem tools and connectors that Flink supports natively, such as Debezium.
- Production readiness: We will also talk about some of the recent Flink features that make running it at scale easier, such as the Kubernetes operator and its sophisticated autoscaler.