Slide 2

Slide 2 text

WHO? Who: Fränk Lotzkes • Long time independent • App/Web/Fullstack dev • Now Solution Architect • Cloud • Real time processing • @ Posedio • Passion for abstraction

Slide 3

Slide 3 text

WHAT? What? • Apache Flink • Open-source tool • For Data streaming

Slide 4

Slide 4 text

WHY? Why? • Extensive POC • From scratch • Lots of new XP & hindsight • To share with newcomers

Slide 5

Slide 5 text

HOW? 1. The Problem I. Streaming is complicated II. Flink is complex III. Data & real-time are abstract 2. The Approach I. It can’t be THAT complicated? II. Can it? 3. The Solution (attempt) I. Layman's abstraction II. Flink in two nutshells

Slide 6

Slide 6 text

THE PROBLEM 1. I. Streaming is complicated II. Flink is complex III. Data & real-time are abstract

Slide 7

Slide 7 text

7 1. THE PROBLEM I. Streaming is complicated • Streaming is built on top of complicated technology

Slide 8

Slide 8 text

8 1. THE PROBLEM I. Streaming is complicated • Streaming is built on top of complicated technology • Flink tries to make streaming easy

Slide 9

Slide 9 text

9 1. THE PROBLEM I. Streaming is complicated • Streaming is built on top of complicated technology • Flink tries to make streaming easy • With a lot of abstraction

Slide 10

Slide 10 text

10 1. THE PROBLEM I. Streaming is complicated • Streaming is built on top of complicated technology • Flink tries to make streaming easy • With a lot of abstraction • Meanwhile, in the “learn Flink” intro:

Slide 11

Slide 11 text

11 1. THE PROBLEM I. Streaming is complicated • Streaming is built on top of complicated technology • Flink tries to make streaming easy • With a lot of abstraction • Meanwhile, in the “learn Flink” intro: (q.e.d.)

Slide 12

Slide 12 text

12 1. THE PROBLEM I. Streaming is complicated • Streaming is built on top of complicated technology • Flink tries to make streaming easy • With a lot of abstraction • Meanwhile, in the “learn Flink” intro: (q.e.d.) => Let’s take a step back: What exactly is Flink?

Slide 13

Slide 13 text

13 1. THE PROBLEM II. Flink is complex

Slide 14

Slide 14 text

14 1. THE PROBLEM II. Flink is complex • Flink tries to keep it simple

Slide 15

Slide 15 text

15 1. THE PROBLEM II. Flink is complex • Flink “tries” to keep it simple Apache Flink is a framework and distributed processing engine for stateful computations over unbounded and bounded data streams.
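
To make the quoted definition a bit less abstract, here is a minimal sketch in plain Python (not Flink code) of a "stateful computation" over a bounded and an unbounded stream. The names `running_count` and `endless_clicks` are made up for illustration. The only point is that the same logic keeps state between events, whether the input ends or not.

```python
import itertools
from typing import Dict, Iterable, Iterator, Tuple


def running_count(events: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Emit (key, count so far) for every event, one event at a time."""
    state: Dict[str, int] = {}  # the "state" that survives between events
    for key in events:
        state[key] = state.get(key, 0) + 1
        yield key, state[key]


def endless_clicks() -> Iterator[str]:
    """Hypothetical unbounded source: it never stops producing events."""
    return itertools.cycle(["home", "cart", "home"])


# Bounded stream: a finite list, so the computation ends by itself.
bounded_result = list(running_count(["home", "cart", "home"]))

# Unbounded stream: there is no "end", so we can only look at a prefix.
unbounded_prefix = list(itertools.islice(running_count(endless_clicks()), 5))
```

A real engine additionally has to keep that state safe across machines and failures, which is where much of the complexity comes from.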

Slide 16

Slide 16 text

16 1. THE PROBLEM II. Flink is complex • Flink “tries” to keep it simple • With complex & detailed documentation • and some abstraction Apache Flink is a framework and distributed processing engine for stateful computations over unbounded and bounded data streams.

Slide 17

Slide 17 text

17 1. THE PROBLEM II. Flink is complex • Flink “tries” to keep it simple • With complex & detailed documentation • and some abstraction • and multiple layers of abstraction Apache Flink is a framework and distributed processing engine for stateful computations over unbounded and bounded data streams.

Slide 18

Slide 18 text

18 1. THE PROBLEM II. Flink is complex • Flink “tries” to keep it simple • With complex & detailed documentation • and some abstraction • and multiple layers of abstraction • …for Flink’s API… Apache Flink is a framework and distributed processing engine for stateful computations over unbounded and bounded data streams.

Slide 19

Slide 19 text

19 1. THE PROBLEM II. Flink is complex • Flink “tries” to keep it simple • With complex & detailed documentation • and some abstraction • and multiple layers of abstraction • Wait… WDYM “Flink’s API”??? Apache Flink is a framework and distributed processing engine for stateful computations over unbounded and bounded data streams.

Slide 20

Slide 20 text

20 1. THE PROBLEM II. Flink is complex • Flink “tries” to keep it simple • With complex & detailed documentation • and some abstraction • and multiple layers of abstraction • Wait… WDYM “Flink’s API”??? • …and for Flink’s cluster…. Apache Flink is a framework and distributed processing engine for stateful computations over unbounded and bounded data streams.

Slide 21

Slide 21 text

21 1. THE PROBLEM II. Flink is complex • Flink “tries” to keep it simple • With complex & detailed documentation • and some abstraction • and multiple layers of abstraction • Wait… WDYM “Flink’s API”??? • Wait… WDYM “Flink’s Cluster”??? Apache Flink is a framework and distributed processing engine for stateful computations over unbounded and bounded data streams.

Slide 22

Slide 22 text

22 1. THE PROBLEM II. Flink is complex • Flink “tries” to keep it simple • With complex & detailed documentation • and some abstraction • and multiple layers of abstraction • Wait… WDYM “Flink’s API”??? • Wait… WDYM “Flink’s Cluster”??? • Flink “forgets” to explain what it really is Apache Flink is a framework and distributed processing engine for stateful computations over unbounded and bounded data streams.

Slide 24

Slide 24 text

24 1. THE PROBLEM II. Flink is complex • Flink “tries” to keep it simple • With complex & detailed documentation • and some abstraction • and multiple layers of abstraction • Wait… WDYM “Flink’s API”??? • Wait… WDYM “Flink’s Cluster”??? • Flink “forgets” to explain what it really is Apache Flink is a framework and distributed processing engine for stateful computations over unbounded and bounded data streams. => Let’s take another step back: What does Flink do? Simplified?

Slide 25

Slide 25 text

25 1. THE PROBLEM III. Data & real-time are abstract 1. Flink takes some data & deserializes it

Slide 26

Slide 26 text

26 1. THE PROBLEM III. Data & real-time are abstract 1. Flink takes some data & deserializes it 2. Does some transformation

Slide 27

Slide 27 text

27 1. THE PROBLEM III. Data & real-time are abstract 1. Flink takes some data & deserializes it 2. Does some transformation 3. Serializes it again

Slide 28

Slide 28 text

28 1. THE PROBLEM III. Data & real-time are abstract 1. Flink takes some data & deserializes it 2. Does some transformation 3. Serializes it again 4. And puts the data somewhere

Slide 29

Slide 29 text

29 1. THE PROBLEM III. Data & real-time are abstract 1. Flink takes some data & deserializes it 2. Does some transformation 3. Serializes it again 4. And puts the data somewhere 5. All in real-time • BUT: • Surely not ANY data (de/serialization)? • Surely not ANY transformation? • Surely not ANYWHERE? • Also, how real is real-time?
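
Steps 1 to 4 above can be sketched in a few lines of plain Python. This is an analogy under stated assumptions, not Flink's API: the JSON format, the `celsius`/`fahrenheit` fields and all function names are invented for the example, and a Python list stands in for "somewhere".

```python
import json
from typing import Iterable, Iterator, List


def deserialize(raw_records: Iterable[bytes]) -> Iterator[dict]:
    """Step 1: turn raw bytes into objects (JSON is an assumed format)."""
    for raw in raw_records:
        yield json.loads(raw.decode("utf-8"))


def transform(records: Iterable[dict]) -> Iterator[dict]:
    """Step 2: some transformation, here Celsius to Fahrenheit."""
    for record in records:
        yield {
            "sensor": record["sensor"],
            "fahrenheit": record["celsius"] * 9 / 5 + 32,
        }


def serialize(records: Iterable[dict]) -> Iterator[bytes]:
    """Step 3: turn objects back into bytes."""
    for record in records:
        yield json.dumps(record, sort_keys=True).encode("utf-8")


def run_pipeline(source: Iterable[bytes], sink: List[bytes]) -> None:
    """Step 4: put each result somewhere, one record at a time."""
    for out in serialize(transform(deserialize(source))):
        sink.append(out)


source = [
    b'{"sensor": "a", "celsius": 20}',
    b'{"sensor": "b", "celsius": 100}',
]
sink: List[bytes] = []
run_pipeline(source, sink)
```

Because every stage is a generator, records flow through one by one instead of being collected first. That is the rough idea behind step 5, although it says nothing about how fast "real-time" actually is.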

Slide 32

Slide 32 text

32 1. THE PROBLEM III. Data & real-time are abstract 1. Flink takes some data & deserializes it 2. Does some transformation 3. Serializes it again 4. And puts the data somewhere 5. All in real-time • BUT: • Surely not ANY data (de/serialization)? • Surely not ANY transformation? • Surely not ANYWHERE? • Also, how real is real-time? => OK, OK, we get it! It’s complicated. But now what?

Slide 33

Slide 33 text

WHAT? 1. The Problem I. Streaming is complicated II. Flink is complex III. Data & real-time are abstract 2. The Approach I. It can’t be THAT complicated? II. Can it? 3. The Solution (attempt) I. Layman's abstraction II. Flink in two nutshells

Slide 34

Slide 34 text

THE APPROACH 2. I. It can’t be THAT complicated? II. Can it?

Slide 35

Slide 35 text

35 2. THE APPROACH I. It can’t be THAT complicated? • Getting an example to run seems fast & easy

Slide 36

Slide 36 text

36 2. THE APPROACH I. It can’t be THAT complicated? • Getting an example to run seems fast & easy • From Flink’s first steps: 1. Have Java 11 2. Download Flink 3. Start local cluster (1 command) 4. Submit premade job (1 command)

Slide 38

Slide 38 text

38 2. THE APPROACH I. It can’t be THAT complicated? • Getting an example to run seems fast & easy • From Flink’s first steps: 1. Have Java 11 2. Download Flink 3. Start local cluster (1 command) 4. Submit premade job (1 command) • => SUCCESS – with only a few steps

Slide 39

Slide 39 text

39 2. THE APPROACH II. It can’t be THAT complicated? Can it?

Slide 40

Slide 40 text

40 2. THE APPROACH II. It can’t be THAT complicated? Can it? • Despite the fast result: already some issues

Slide 41

Slide 41 text

41 2. THE APPROACH II. It can’t be THAT complicated? Can it? • Despite the fast result: already some issues A) Java 11? Also, Flink version? -> my experience:

Slide 42

Slide 42 text

42 2. THE APPROACH II. It can’t be THAT complicated? Can it? • Despite the fast result: already some issues A) Java 11? Also, Flink version? • At the time of the POC: • Flink 1.20: work in progress • Java 17 + Flink 1.19: not 100% stable • Flink 1.18.1 + Java 11: stable

Slide 44

Slide 44 text

44 2. THE APPROACH II. It can’t be THAT complicated? Can it? • Despite the fast result: already some issues A) Java 11? Also, Flink version? • At the time of the POC: • Flink 1.20: work in progress • Java 17 + Flink 1.19: not 100% stable • Flink 1.18.1 + Java 11: stable => Solution: Flink 1.18.1 + Java 11 + upgrade soon

Slide 45

Slide 45 text

45 2. THE APPROACH II. It can’t be THAT complicated? Can it? • Despite the fast result: already some issues B) A local cluster

Slide 46

Slide 46 text

46 2. THE APPROACH II. It can’t be THAT complicated? Can it? • Despite the fast result: already some issues B) A local cluster • Clusters require know-how • Luckily Flink provides simple solutions • Easy Standalone or native K8s setup • And a UI

Slide 47

Slide 47 text

47 2. THE APPROACH II. It can’t be THAT complicated? Can it? • Despite the fast result: already some issues B) A local cluster • Clusters require know-how • Luckily Flink provides simple solutions • Easy Standalone or native K8s setup • And a UI => Solution: Use Flink’s abstraction to set up a cluster and start the UI

Slide 48

Slide 48 text

48 2. THE APPROACH II. It can’t be THAT complicated? Can it? • Despite the fast result: already some issues C) A premade job

Slide 49

Slide 49 text

49 2. THE APPROACH II. It can’t be THAT complicated? Can it? • Despite the fast result: already some issues C) A premade job • What is a job? • The Source-Transformation-Sink pipeline • Some Java(/…) code • written with Flink libraries • from the framework
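
The "Source-Transformation-Sink pipeline" idea can be sketched as a tiny builder in plain Python. The `Job` class and its methods are hypothetical stand-ins written for this example and are not Flink's API. What the sketch tries to show is that a job is first only declared, and nothing happens to any data until it is executed.

```python
from typing import Callable, Iterable, List, Optional


class Job:
    """Hypothetical stand-in for a streaming job: a declared pipeline."""

    def __init__(self, name: str) -> None:
        self.name = name
        self._source: Optional[Iterable] = None
        self._steps: List[Callable[[Iterable], Iterable]] = []
        self._sink: Optional[Callable[[object], None]] = None

    def source(self, records: Iterable) -> "Job":
        self._source = records
        return self

    def map(self, fn: Callable) -> "Job":
        # Only remember the step; it runs later, inside execute().
        self._steps.append(lambda stream: (fn(x) for x in stream))
        return self

    def filter(self, predicate: Callable) -> "Job":
        self._steps.append(lambda stream: (x for x in stream if predicate(x)))
        return self

    def sink(self, write: Callable[[object], None]) -> "Job":
        self._sink = write
        return self

    def execute(self) -> None:
        if self._source is None or self._sink is None:
            raise ValueError("a job needs both a source and a sink")
        stream = self._source
        for step in self._steps:
            stream = step(stream)
        for record in stream:
            self._sink(record)


results: List[int] = []
job = (
    Job("word-lengths")
    .source(["flink", "is", "complex"])
    .map(len)
    .filter(lambda n: n > 2)
    .sink(results.append)
)
declared_only = list(results)  # still empty: nothing has run yet
job.execute()
```

In this toy version `execute()` runs everything in the same process. With a real cluster, the declared pipeline is what gets packaged and handed over.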

Slide 51

Slide 51 text

51 2. THE APPROACH II. It can’t be THAT complicated? Can it? • Despite the fast result: already some issues C) A premade job • What is a job? • The Source-Transformation-Sink pipeline • Some Java(/…) code • written with Flink libraries • from the framework => Solution: embrace the complexity

Slide 52

Slide 52 text

WHAT? 1. The Problem I. Streaming is complicated II. Flink is complex III. Data & real-time are abstract 2. The Approach I. It can’t be THAT complicated? II. Can it? 3. The Solution (attempt) I. Layman's abstraction II. Flink in two nutshells

Slide 53

Slide 53 text

THE SOLUTION 3. I. Layman’s Abstraction

Slide 54

Slide 54 text

54 3. THE SOLUTION I. Layman’s abstraction • Simple approach might not be simple enough • For a smooth jumpstart into Flink

Slide 55

Slide 55 text

55 3. THE SOLUTION I. Layman’s abstraction • Simple approach might not be simple enough • For a smooth jumpstart into Flink • Issues summary: A) Evolving ecosystem with many versions B) Some cluster know-how needed C) Data processing and its integration into the cluster are still complex

Slide 56

Slide 56 text

56 3. THE SOLUTION I. Layman’s abstraction • Simple approach might not be simple enough • For a smooth jumpstart into Flink • Issues summary: A) Evolving ecosystem with many versions B) Some cluster know-how needed C) Data processing and its integration into the cluster are still complex • A) -> easy solution: use the stable versions ✅

Slide 57

Slide 57 text

57 3. THE SOLUTION I. Layman’s abstraction • Simple approach might not be simple enough • For a smooth jumpstart into Flink • Issues summary: A) Evolving ecosystem with many versions B) Some cluster know-how needed C) Data processing and its integration into the cluster are still complex • A) -> easy solution: use the stable versions ✅ • B) & C) -> I suggest a gross oversimplification

Slide 58

Slide 58 text

THE SOLUTION 3. II. Flink in two nutshells

Slide 59

Slide 59 text

59 3. THE SOLUTION II. Flink in two nutshells • So, what is Flink if not “[…] a framework and distributed processing engine for stateful computations over unbounded and […]”

Slide 60

Slide 60 text

60 3. THE SOLUTION II. Flink in two nutshells • So, what is Flink if not “[…] a framework and […]” • In a nutshell, Flink is…

Slide 61

Slide 61 text

61 3. THE SOLUTION II. Flink in two nutshells • So, what is Flink if not “[…] a framework and […]” • In a nutshell, Flink is 2 nutshells

Slide 62

Slide 62 text

62 3. THE SOLUTION II. Flink in two nutshells • So, what is Flink if not “[…] a framework and […]” • In a nutshell, Flink is 2 nutshells • Nutshell 1: • A toolbox for coding • With building blocks • To define WHAT will be happening to data • The Framework/Libraries/APIs/Job

Slide 63

Slide 63 text

63 3. THE SOLUTION II. Flink in two nutshells • So, what is Flink if not “[…] a framework and […]” • In a nutshell, Flink is 2 nutshells • Nutshell 2: • A bigger toolbox • With commands and a UI • To define HOW data is handled • Mostly automatically • The Engine resp. The Cluster

Slide 64

Slide 64 text

64 3. THE SOLUTION II. Flink in two nutshells • So, what is Flink if not “[…] a framework and […]” • In a nutshell, Flink is 2 nutshells • Nutshell 1: The Framework • Nutshell 2: The Engine

Slide 65

Slide 65 text

65 3. THE SOLUTION II. Flink in two nutshells • So, what is Flink if not “[…] a framework and […]” • In a nutshell, Flink is 2 nutshells • Nutshell 1: The Framework • Nutshell 2: The Engine • Simple principle: • Build it • Package it • Send it • Let it run
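
The split between the two nutshells can be illustrated with a toy in plain Python. `count_words` plays the role of the job (WHAT happens to the data), and the invented `ToyEngine` class plays the role of the engine (HOW it is run, here with a configurable parallelism). This is a deliberate oversimplification and an assumption-laden sketch: the merge step only works for counting, and a real cluster also deals with state, failures and distribution across machines.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List


# Nutshell 1 (the WHAT): plain logic, unaware of how it will be run.
def count_words(lines: List[str]) -> Dict[str, int]:
    counts: Dict[str, int] = {}
    for line in lines:
        for word in line.split():
            counts[word] = counts.get(word, 0) + 1
    return counts


# Nutshell 2 (the HOW): a toy "engine" that decides how the job is run.
class ToyEngine:
    def __init__(self, parallelism: int) -> None:
        self.parallelism = parallelism

    def submit(
        self, job: Callable[[List[str]], Dict[str, int]], lines: List[str]
    ) -> Dict[str, int]:
        # Hand every worker its own slice of the input.
        slices = [lines[i :: self.parallelism] for i in range(self.parallelism)]
        with ThreadPoolExecutor(max_workers=self.parallelism) as pool:
            partials = list(pool.map(job, slices))
        # Merge the partial counts (specific to this counting example).
        merged: Dict[str, int] = {}
        for partial in partials:
            for word, n in partial.items():
                merged[word] = merged.get(word, 0) + n
        return merged


lines = ["to be or not to be", "to stream or not to stream"]
result_1 = ToyEngine(parallelism=1).submit(count_words, lines)
result_2 = ToyEngine(parallelism=2).submit(count_words, lines)
```

The job code is identical in both runs; only the engine setting changes. That is the "build it, send it, let it run" principle in miniature.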

Slide 66

Slide 66 text

66 3. THE SOLUTION II. Flink in two nutshells BUT DON’T FORGET! • The devil is in the detail

Slide 67

Slide 67 text

67 3. THE SOLUTION II. Flink in two nutshells BUT DON’T FORGET! • The devil is in the detail • Every nutshell has many nested ones • DataStream API, Table API, Connectors • REST API, Watermarks, Checkpoints • Jobmanager, Taskmanager, Client

Slide 68

Slide 68 text

68 3. THE SOLUTION II. Flink in two nutshells BUT DON’T FORGET! • The devil is in the detail • Every nutshell has many nested ones • DataStream API, Table API, Connectors • REST API, Watermarks, Checkpoints • Jobmanager, Taskmanager, Client • Often hidden & hard to crack

Slide 69

Slide 69 text

69 3. THE SOLUTION II. Flink in two nutshells BUT DON’T FORGET! • The devil is in the detail • Every nutshell has many nested ones • DataStream API, Table API, Connectors • REST API, Watermarks, Checkpoints • Jobmanager, Taskmanager, Client • Often hidden & hard to crack • Typical squirrel

Slide 70

Slide 70 text

THANK YOU! POSEDIO GMBH [email protected] Weyringergasse 1-3, 1040 Wien, Millenium Park 4, 6980 Lustenau www.posedio.com • References: • https://nightlies.apache.org/flink/flink-docs-release-1.18/docs/learn-flink/overview/ • https://nightlies.apache.org/flink/flink-docs-release-1.18/docs/concepts/overview/ • Connectors DS-API: https://nightlies.apache.org/flink/flink-docs-release-1.19/docs/connectors/datastream/overview/ • Formats DS-API: https://nightlies.apache.org/flink/flink-docs-release-1.19/docs/connectors/datastream/formats/overview/ • Connectors TableAPI: https://nightlies.apache.org/flink/flink-docs-release-1.19/docs/connectors/table/overview/ • Formats TableAPI: https://nightlies.apache.org/flink/flink-docs-release-1.19/docs/connectors/table/formats/overview/ • First steps: https://nightlies.apache.org/flink/flink-docs-master/docs/try-flink/local_installation/ • https://flink.apache.org/what-is-flink/roadmap/ • https://medium.com/big-data-processing/twitter-streaming-using-flink-d19504b676a5