is a complex
   III. Data & real-time are abstract
2. The Approach
   I. It can’t be THAT complicated?
   II. Can it?
3. The Solution (attempt)
   I. Layman's abstraction
   II. Flink in two nutshells
is built on top of complicated technology
• Flink tries to make streaming easy
• With a lot of abstraction
• Meanwhile, in the “learn Flink” intro: (q.e.d.)
=> Let’s take a step back: What exactly is Flink?
“tries” to keep it simple
• With complex & detailed documentation
• and some abstraction
• and multiple layers of abstraction for Flink’s API…
• Wait… WDYM “Flink’s API”???
• …and for Flink’s cluster…
• Wait… WDYM “Flink’s Cluster”???
• Flink “forgets” to explain what it really is:
“Apache Flink is a framework and distributed processing engine for stateful computations over unbounded and bounded data streams.”
=> Let’s take another step back:
=> What does Flink do, simplified?
1. Flink takes some data & deserializes it
2. Does some transformation
3. Serializes it again
4. And puts the data somewhere
5. All in real-time
• BUT:
  • Surely not ANY data (de/serialization)?
  • Surely not ANY transformation?
  • Surely not ANYWHERE?
  • Also, how real is real-time?
=> OK, OK, we get it! It’s complicated. But now what??
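The simplified steps 1 to 4 above can be sketched in plain Java, without any Flink dependency, to show what "deserialize, transform, serialize, put somewhere" means for a single record. This is an analogy, not Flink code: the `Reading` type, the “sensor,value” text format and the Celsius-to-Fahrenheit transformation are hypothetical, and the sketch processes a finite list instead of an unbounded real-time stream. In Flink itself, roughly speaking, sources, sinks and (de)serialization schemas cover the "take", "(de)serialize" and "put" parts, and the engine runs the loop.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Plain-Java sketch (no Flink involved) of: deserialize -> transform -> serialize -> sink.
final class SimplifiedPipeline {

    // Hypothetical record type: a named sensor with one numeric value.
    static final class Reading {
        final String sensor;
        final double value;

        Reading(String sensor, double value) {
            this.sensor = sensor;
            this.value = value;
        }
    }

    // 1. Deserialize: UTF-8 bytes "sensor,value" -> Reading
    static Reading deserialize(byte[] raw) {
        String[] parts = new String(raw, StandardCharsets.UTF_8).split(",");
        return new Reading(parts[0], Double.parseDouble(parts[1]));
    }

    // 2. Transform: hypothetical example, convert Celsius to Fahrenheit
    static Reading transform(Reading r) {
        return new Reading(r.sensor, r.value * 9 / 5 + 32);
    }

    // 3. Serialize: Reading -> UTF-8 bytes "sensor,value"
    static byte[] serialize(Reading r) {
        return (r.sensor + "," + r.value).getBytes(StandardCharsets.UTF_8);
    }

    // Handles one record at a time, like a stream would; the sink (step 4) is an in-memory list.
    static List<String> run(List<String> incoming) {
        List<String> sink = new ArrayList<>();
        for (String line : incoming) {
            byte[] raw = line.getBytes(StandardCharsets.UTF_8);  // data arrives as raw bytes
            byte[] out = serialize(transform(deserialize(raw))); // steps 1 to 3
            sink.add(new String(out, StandardCharsets.UTF_8));   // step 4: put it somewhere
        }
        return sink;
    }

    public static void main(String[] args) {
        System.out.println(run(List.of("kitchen,20", "oven,100", "freezer,-40")));
        // prints [kitchen,68.0, oven,212.0, freezer,-40.0]
    }
}
```

Even this toy version already raises the slide's questions: which byte format, which transformations, and which destination are supported is exactly what the real framework has to answer.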
• Getting an example to run seems fast & easy
• From Flink’s first steps:
  1. Have Java 11
  2. Download Flink
  3. Start local cluster (1 command)
  4. Submit premade job (1 command)
• => SUCCESS – with only a few steps
Can it?
• Despite the fast result, there are already some issues:
A) Java 11? And which Flink version?
• At the time of the POC:
  • Flink 1.20: work in progress
  • Java 17 + Flink 1.19: not 100% stable
  • Flink 1.18.1 + Java 11: stable
=> Solution: Flink 1.18.1 + Java 11, upgrade soon
Can it?
• Despite the fast result, there are already some issues:
B) A local cluster
• Clusters require know-how
• Luckily, Flink provides simple solutions:
  • Easy standalone or native K8s setup
  • And a UI
=> Solution: Use Flink’s abstraction to set up a cluster and start the UI
Can it?
• Despite the fast result, there are already some issues:
C) A premade job
• What is a job?
  • The Source-Transformation-Sink pipeline
  • Some Java(/…) code, written with Flink libraries from the framework
=> Solution: embrace the complexity
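To make “the Source-Transformation-Sink pipeline” a bit more tangible, here is a hypothetical mini framework in plain Java. `MiniJob` and its `fromSource`/`map`/`filter`/`toSink` methods are invented for illustration and are not Flink's API; they only mimic the shape of a job that is assembled from building blocks supplied by a framework. Flink's DataStream API follows a similar pattern (a source, chained transformations, a sink), with far more machinery behind it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Function;
import java.util.function.Predicate;
import java.util.stream.Stream;

// Hypothetical mini "framework" (NOT Flink's API): reusable building blocks
// from which a Source -> Transformation(s) -> Sink job is assembled.
final class MiniJob<T> {
    private final Stream<T> records;

    private MiniJob(Stream<T> records) {
        this.records = records;
    }

    // Source: where the data comes from (an in-memory list in this sketch)
    static <T> MiniJob<T> fromSource(List<T> source) {
        return new MiniJob<>(source.stream());
    }

    // Transformation: change every record
    <R> MiniJob<R> map(Function<T, R> fn) {
        return new MiniJob<>(records.map(fn));
    }

    // Transformation: keep only matching records
    MiniJob<T> filter(Predicate<T> keep) {
        return new MiniJob<>(records.filter(keep));
    }

    // Sink: where the data ends up; only this call actually pulls records through
    void toSink(Consumer<T> sink) {
        records.forEach(sink);
    }
}

// A "job": user code that only wires building blocks together.
final class WordLengthJob {
    static List<Integer> wordLengths(List<String> words) {
        List<Integer> sink = new ArrayList<>();
        MiniJob.fromSource(words)
               .filter(w -> w.length() > 2) // drop short words
               .map(String::length)         // word -> its length
               .toSink(sink::add);          // collect the results
        return sink;
    }

    public static void main(String[] args) {
        System.out.println(wordLengths(List.of("flink", "is", "two", "nutshells")));
        // prints [5, 3, 9]
    }
}
```

The job author never writes a loop: the building blocks only describe the pipeline, and nothing runs until the sink pulls data through it.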
might not be simple enough
• For a smooth jumpstart into Flink
• Issues summary:
  A) Evolving ecosystem with many versions
  B) Some cluster know-how needed
  C) Data processing and its integration into the cluster are still complex
• A) -> easy solution: use the stable versions ✅
• B) & C) -> I suggest a gross oversimplification
So, what is Flink if not “[…] a framework and […]”?
• In a nutshell, Flink is 2 nutshells
• Nutshell 1:
  • A toolbox for coding
  • With building blocks
  • To define WHAT happens to the data
  • The Framework/Libraries/APIs/Job
So, what is Flink if not “[…] a framework and […]”?
• In a nutshell, Flink is 2 nutshells
• Nutshell 2:
  • A bigger toolbox
  • With commands and a UI
  • To define HOW the data is handled
  • Mostly automatically
  • The Engine, or rather the Cluster
So, what is Flink if not “[…] a framework and […]”?
• In a nutshell, Flink is 2 nutshells
• Nutshell 1: The Framework
• Nutshell 2: The Engine
• Simple principle:
  • Build it
  • Package it
  • Send it
  • Let it run
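The split between the two nutshells can be sketched as a plain-Java analogy: the "framework" part only defines WHAT happens to each record, and the "engine" part decides HOW that definition is executed. `JobDefinition`, `ToyEngine` and the chunk-per-thread strategy are hypothetical stand-ins, not how Flink is implemented. In real Flink, the job is built with the framework's APIs, packaged as a JAR and sent to the cluster (for example with the `flink run` command), where the engine (JobManager and TaskManagers) takes care of running it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

// Nutshell 1, "the framework" (hypothetical): a job only describes WHAT happens to each record.
final class JobDefinition {
    final String name;
    final Function<String, String> logic;

    JobDefinition(String name, Function<String, String> logic) {
        this.name = name;
        this.logic = logic;
    }
}

// Nutshell 2, "the engine" (hypothetical): decides HOW a submitted job runs,
// here by splitting the input across a fixed number of parallel worker threads.
final class ToyEngine {
    private final int parallelism;

    ToyEngine(int parallelism) {
        this.parallelism = parallelism;
    }

    List<String> submit(JobDefinition job, List<String> input) {
        ExecutorService workers = Executors.newFixedThreadPool(parallelism);
        try {
            int chunk = (input.size() + parallelism - 1) / parallelism; // records per worker, rounded up
            List<Future<List<String>>> partitions = new ArrayList<>();
            for (int start = 0; start < input.size(); start += chunk) {
                List<String> slice = input.subList(start, Math.min(start + chunk, input.size()));
                partitions.add(workers.submit(() -> {
                    List<String> out = new ArrayList<>();
                    for (String record : slice) {
                        out.add(job.logic.apply(record));
                    }
                    return out;
                }));
            }
            List<String> result = new ArrayList<>();
            for (Future<List<String>> partition : partitions) {
                result.addAll(partition.get()); // merge partitions back in input order
            }
            return result;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while running " + job.name, e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("job " + job.name + " failed", e.getCause());
        } finally {
            workers.shutdown();
        }
    }
}

final class BuildSendRun {
    public static void main(String[] args) {
        // Build it: define WHAT happens to the data (packaging into a JAR is skipped in this sketch)
        JobDefinition shout = new JobDefinition("shout", s -> s.toUpperCase(Locale.ROOT));
        // Send it & let it run: the engine decides HOW, here with 2 workers
        List<String> out = new ToyEngine(2).submit(shout, List.of("build", "package", "send", "run"));
        System.out.println(out); // prints [BUILD, PACKAGE, SEND, RUN]
    }
}
```

The job code never mentions threads; changing the parallelism is purely an engine concern, which is roughly the division of labor the two nutshells describe.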
DON’T FORGET!
• The devil is in the details
• Every nutshell has many nested ones:
  • DataStream API, Table API, Connectors
  • REST API, Watermarks, Checkpoints
  • JobManager, TaskManager, Client
• Often hidden & hard to crack
• Typical squirrel