- Passionate about stream data processing, event-driven architecture, cloud, DevOps, etc.
- Love travelling & exploring places
https://sg.linkedin.com/in/zabeer-farook
is designed with infinite data sets in mind. Nothing more”
▸ “Continuous processing of data that is continuously generated”
▸ “Processing unbounded data”
▸ “Processing of data in motion”
▸ “Processing infinite sequence of data”
Detection
◎ Real-time stock trading
◎ Cybersecurity
◎ Online gaming
◎ Clickstream analytics
◎ Ride-sharing apps
◎ Training ML models
◎ Real-time tracking in logistics
◎ Real-time monitoring
◎ Recommendation engines
◎ Up-to-date retail inventory
◎ Social media feeds
◎ Sensor (IoT) data
delivered at 10:30 AM today. Some items from your order are not available and we will refund you the corresponding amount”
“Your income tax is due on May 15th 2023. Please ignore if you have already paid”
relevant depending on the use case, but it need not be the default anymore
◎ What if you just need a daily report of the number of users who subscribed to your blog site? Use batch ETL.
◎ What if the upstream legacy application only delivers a batch file at EOD? It doesn’t make sense to have a stream job waiting all day.
◎ Does that mean we can’t do stream processing until the legacy application is rewritten to produce streams?
  ◦ Not necessarily. CDC (Change Data Capture) can help.
◎ How would you prefer to detect fraud: EOD batch or real-time streaming?
processing framework offering
• Low latency
• High throughput
• Fault tolerance with exactly-once support
• High scalability
• Support for both stream & batch processing (batch as a bounded stream)
• Support for event-time processing
• Checkpoint and savepoint features
• Written in Java & Scala
• Stream jobs can be written in Java, Scala, Python or even SQL
• Latest version 1.18, released in October 2023
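To make the event-time bullet above more concrete, here is a minimal sketch in plain Python. It is not Flink's API: the function names, the `(event_time, key)` event shape, and the fixed out-of-orderness bound are all assumptions chosen for illustration. It counts events per tumbling window using the timestamp carried by each event rather than its arrival order, and fires a window only once a watermark (largest timestamp seen minus the allowed lateness) has passed the window's end.

```python
from collections import defaultdict


def _fire(counts, window_size, watermark):
    """Emit and remove every window whose end is at or before the watermark."""
    ready = sorted(k for k in counts if k[0] + window_size <= watermark)
    for window_start, key in ready:
        yield window_start, key, counts.pop((window_start, key))


def tumbling_window_counts(events, window_size, max_out_of_orderness):
    """Count events per (window, key) by event time.

    events: iterable of (event_time, key) pairs in arrival order, which may
            differ from event-time order (hypothetical shape).
    Yields (window_start, key, count) when a window fires.
    """
    counts = defaultdict(int)      # (window_start, key) -> running count
    watermark = float("-inf")
    for event_time, key in events:
        window_start = event_time - event_time % window_size
        if window_start + window_size <= watermark:
            continue               # late event: its window has already fired
        counts[(window_start, key)] += 1
        # The watermark trails the largest timestamp seen so far.
        watermark = max(watermark, event_time - max_out_of_orderness)
        yield from _fire(counts, window_size, watermark)
    # End of a bounded input: fire whatever is still open.
    yield from _fire(counts, window_size, float("inf"))


# Arrival order differs from event-time order: (9, "a") arrives after (12, "a")
# but is still counted in window 0; (3, "a") arrives too late and is dropped.
events = [(1, "a"), (4, "a"), (12, "a"), (9, "a"), (15, "b"), (3, "a"), (25, "a")]
results = list(tumbling_window_counts(events, window_size=10, max_out_of_orderness=3))
```

In this sketch late events are simply dropped; a real engine typically offers more options for handling them, and it also persists the window state so it can recover after failures.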
research project “Stratosphere” in collaboration with a few German universities in 2010
• Became an Apache Incubator project in March 2014 and was accepted as an Apache top-level project in December 2014
• Alibaba created an internal fork, “Blink”, from Flink in 2015, which was merged back into Flink in 2019/2020
• Fun fact: “Flink” means fast or agile in German. The red squirrel logo was chosen because squirrels are fast and agile, and squirrels in Berlin apparently have a shade of reddish brown.
acquired by Alibaba
▸ eventador.io acquired by Cloudera
▸ Immerok acquired by Confluent in early 2023 and Flink integrated into the Confluent Cloud platform
▸ Other companies building managed streaming solutions on top of Flink, such as Decodable, Aiven.io and DeltaStream

Strong Community
▸ Community support, with large organizations using Flink
▸ Support and contributions from managed service providers as well
▸ Top Apache project in terms of user activity

USP
▸ Leading choice for large-scale stateful stream processing with high throughput and low latency
▸ Powerful and battle-tested runtime
▸ Support for multiple programming languages and connectors
▸ Streaming-first approach for both stream & batch processing
▸ Useful extensions like Flink CDC, Flink SQL, Flink ML, PyFlink etc.
of distributed execution of Flink applications.
▸ TaskManagers - Also called workers; they execute the tasks of a dataflow
▸ Client Program - Prepares the dataflow graph and sends it to the JobManager
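The division of roles above can be illustrated with a toy, single-process model in Python. This is only a sketch under simplifying assumptions: the class names mirror the Flink component names, but the methods, the round-robin partitioning, and the "dataflow as a chain of map functions" are hypothetical and say nothing about how Flink schedules work internally.

```python
from dataclasses import dataclass, field


@dataclass
class TaskManager:
    """Worker: executes the tasks it is handed."""
    name: str
    executed: list = field(default_factory=list)

    def run(self, task_id, operators, records):
        # Apply each operator of the dataflow, in order, to this task's records.
        for op in operators:
            records = [op(r) for r in records]
        self.executed.append(task_id)
        return records


@dataclass
class JobManager:
    """Coordinator: splits a job into parallel tasks and assigns them to workers."""
    workers: list

    def submit(self, operators, records, parallelism):
        # Partition the input round-robin into `parallelism` tasks.
        partitions = [records[i::parallelism] for i in range(parallelism)]
        results = []
        for task_id, part in enumerate(partitions):
            worker = self.workers[task_id % len(self.workers)]
            results.extend(worker.run(task_id, operators, part))
        return results


def client_program(job_manager):
    """Client: builds the dataflow and sends it to the JobManager."""
    dataflow = [lambda x: x * 2, lambda x: x + 1]   # two chained map steps
    return job_manager.submit(dataflow, list(range(6)), parallelism=4)


workers = [TaskManager("tm-1"), TaskManager("tm-2")]
output = sorted(client_program(JobManager(workers)))
assignment = {w.name: w.executed for w in workers}
```

Here the client never talks to a worker directly: it only hands the dataflow to the coordinator, which decides which worker runs which task.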
Support
• Batch data is treated as a finite / bounded stream and stream data is treated as an infinite / unbounded stream
• Flink SQL supports ANSI standard SQL
[Figure label: Level of Abstraction]
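The "batch is just a bounded stream" idea can be sketched in plain Python with generators (this is an illustration of the concept, not Flink code; `pipeline`, `bounded_source` and `unbounded_source` are hypothetical names). The same pipeline function consumes one event at a time and does not care whether its source ever ends.

```python
import itertools


def pipeline(source):
    """Keep even values and emit a running sum, one result per kept event."""
    total = 0
    for value in source:
        if value % 2 == 0:
            total += value
            yield total


def bounded_source():
    """Batch input: a finite collection."""
    return [1, 2, 3, 4, 5, 6]


def unbounded_source():
    """Stream input: an endless sequence 1, 2, 3, ..."""
    return itertools.count(1)


# Bounded input: the pipeline terminates on its own when the data ends.
batch_result = list(pipeline(bounded_source()))

# Unbounded input: results arrive incrementally; here only the first three are taken.
stream_prefix = list(itertools.islice(pipeline(unbounded_source()), 3))
```

Because the pipeline is lazy, the unbounded case produces results as events arrive instead of waiting for an end of input that never comes.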
& Flink are complementary technologies
▸ Kafka provides the distributed storage layer for streaming data
▸ Flink adds the stream processing engine on top
▸ Kafka Streams & KSQL can also be used for stream processing and overlap with Flink in some functionality
▸ Kafka is Flink’s most popular connector
key points to consider
▹ Batch workload or streaming workload?
▹ Volume & rate of data to process (throughput)
▹ Latency requirements
▹ Stateful or stateless?
▹ Supported languages and expertise in the team
▹ Existing tech stack
▹ Community support & documentation
▹ Ordering & delivery guarantees
▹ Deployment modes
▹ State management