
Flink in two nutshells

Posedio
October 12, 2024


Transcript

  1. WHO? Who: Fränk Lotzkes • Long time independent • App/Web/Fullstack

    dev • Now Solution Architect • Cloud • Real time processing • @ Posedio • Passion for abstraction
  2. WHY? Why? • Extensive POC • From scratch • Lots

    of new XP & hindsight • To share with newcomers
  3. HOW? 1. The Problem I. Streaming is complicated II. Flink

    is complex III. Data & real-time are abstract 2. The Approach I. It can’t be THAT complicated? II. Can it? 3. The Solution (attempt) I. Layman's abstraction II. Flink in two nutshells
  4. THE PROBLEM 1. I. Streaming is complicated II. Flink is

    complex III. Data & real-time are abstract
  5. 7 1. THE PROBLEM I. Streaming is complicated • Streaming

    is built on top of complicated technology
  6. 8 1. THE PROBLEM I. Streaming is complicated • Streaming

    is built on top of complicated technology • Flink tries to make streaming easy
  7. 9 1. THE PROBLEM I. Streaming is complicated • Streaming

    is built on top of complicated technology • Flink tries to make streaming easy • With a lot of abstraction
  8. 10 1. THE PROBLEM I. Streaming is complicated • Streaming

    is built on top of complicated technology • Flink tries to make streaming easy • With a lot of abstraction • Meanwhile, in the “learn Flink” intro:
  9. 11 1. THE PROBLEM I. Streaming is complicated • Streaming

    is built on top of complicated technology • Flink tries to make streaming easy • With a lot of abstraction • Meanwhile, in the “learn Flink” intro: (q.e.d.)
  10. 12 1. THE PROBLEM I. Streaming is complicated • Streaming

    is built on top of complicated technology • Flink tries to make streaming easy • With a lot of abstraction • Meanwhile, in the “learn Flink” intro: (q.e.d.) => Let’s take a step back: What exactly is Flink?
  11. 15 1. THE PROBLEM II. Flink is complex • Flink

    “tries” to keep it simple Apache Flink is a framework and distributed processing engine for stateful computations over unbounded and bounded data streams.
  12. 16 1. THE PROBLEM II. Flink is complex • Flink

    “tries” to keep it simple • With complex & detailed documentation • and some abstraction Apache Flink is a framework and distributed processing engine for stateful computations over unbounded and bounded data streams.
  13. 17 1. THE PROBLEM II. Flink is complex • Flink

    “tries” to keep it simple • With complex & detailed documentation • and some abstraction • and multiple layers of abstraction Apache Flink is a framework and distributed processing engine for stateful computations over unbounded and bounded data streams.
  14. 18 1. THE PROBLEM II. Flink is complex • Flink

    “tries” to keep it simple • With complex & detailed documentation • and some abstraction • and multiple layers of abstraction • …for Flink’s API… Apache Flink is a framework and distributed processing engine for stateful computations over unbounded and bounded data streams.
  15. 19 1. THE PROBLEM II. Flink is complex • Flink

    “tries” to keep it simple • With complex & detailed documentation • and some abstraction • and multiple layers of abstraction • Wait… WDYM “Flink’s API”??? Apache Flink is a framework and distributed processing engine for stateful computations over unbounded and bounded data streams.
  16. 20 1. THE PROBLEM II. Flink is complex • Flink

    “tries” to keep it simple • With complex & detailed documentation • and some abstraction • and multiple layers of abstraction • Wait… WDYM “Flink’s API”??? • …and for Flink’s cluster…. Apache Flink is a framework and distributed processing engine for stateful computations over unbounded and bounded data streams.
  17. 21 1. THE PROBLEM II. Flink is complex • Flink

    “tries” to keep it simple • With complex & detailed documentation • and some abstraction • and multiple layers of abstraction • Wait… WDYM “Flink’s API”??? • Wait… WDYM “Flink’s Cluster”??? Apache Flink is a framework and distributed processing engine for stateful computations over unbounded and bounded data streams.
  18. 22 1. THE PROBLEM II. Flink is complex • Flink

    “tries” to keep it simple • With complex & detailed documentation • and some abstraction • and multiple layers of abstraction • Wait… WDYM “Flink’s API”??? • Wait… WDYM “Flink’s Cluster”??? • Flink “forgets” to explain what it really is Apache Flink is a framework and distributed processing engine for stateful computations over unbounded and bounded data streams.
  20. 24 1. THE PROBLEM II. Flink is complex • Flink

    “tries” to keep it simple • With complex & detailed documentation • and some abstraction • and multiple layers of abstraction • Wait… WDYM “Flink’s API”??? • Wait… WDYM “Flink’s Cluster”??? • Flink “forgets” to explain what it really is Apache Flink is a framework and distributed processing engine for stateful computations over unbounded and bounded data streams. => Let’s take another step back: => What does Flink do? Simplified?
  21. 25 1. THE PROBLEM III. Data & real-time are abstract

    1. Flink takes some data & deserializes it
  22. 26 1. THE PROBLEM III. Data & real-time are abstract

    1. Flink takes some data & deserializes it 2. Does some transformation
  23. 27 1. THE PROBLEM III. Data & real-time are abstract

    1. Flink takes some data & deserializes it 2. Does some transformation 3. Serializes it again
  24. 28 1. THE PROBLEM III. Data & real-time are abstract

    1. Flink takes some data & deserializes it 2. Does some transformation 3. Serializes it again 4. And puts the data somewhere
  25. 29 1. THE PROBLEM III. Data & real-time are abstract

    1. Flink takes some data & deserializes it 2. Does some transformation 3. Serializes it again 4. And puts the data somewhere 5. All in real-time • BUT: • Surely not ANY data (de/serialization)? • Surely not ANY transformation? • Surely not ANYWHERE? • Also how real is real-time?
  28. 32 1. THE PROBLEM III. Data & real-time are abstract

    1. Flink takes some data & deserializes it 2. Does some transformation 3. Serializes it again 4. And puts the data somewhere 5. All in real-time • BUT: • Surely not ANY data (de/serialization)? • Surely not ANY transformation? • Surely not ANYWHERE? • Also how real is real-time? => OK, OK, we get it! It’s complicated. But now what?
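
The first four steps above can be sketched in plain Java. This is a minimal model under stated assumptions, not Flink API: the class `FourSteps`, its method names, the UTF-8 string format, the upper-casing transformation and the in-memory list standing in for a sink are all hypothetical choices made for illustration. Step 5 (real-time) is deliberately left out, since it depends on the engine rather than on this logic.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Plain-Java model of steps 1 to 4; this is NOT Flink API.
class FourSteps {

    // Step 1: deserialize raw bytes into a typed value (assumed format: UTF-8 text).
    static String deserialize(byte[] raw) {
        return new String(raw, StandardCharsets.UTF_8);
    }

    // Step 2: some transformation (upper-casing, chosen arbitrarily).
    static String transform(String value) {
        return value.toUpperCase(Locale.ROOT);
    }

    // Step 3: serialize the result back into bytes.
    static byte[] serialize(String value) {
        return value.getBytes(StandardCharsets.UTF_8);
    }

    // Step 4: put the data somewhere (an in-memory list stands in for the sink).
    static void process(byte[] raw, List<byte[]> sink) {
        sink.add(serialize(transform(deserialize(raw))));
    }

    public static void main(String[] args) {
        List<byte[]> sink = new ArrayList<>();
        process("hello flink".getBytes(StandardCharsets.UTF_8), sink);
        System.out.println(deserialize(sink.get(0))); // prints "HELLO FLINK"
    }
}
```

Each "BUT" question on the slide maps to one of these methods: which formats `deserialize` and `serialize` accept, what `transform` may do, and what can play the role of `sink`.
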
  29. WHAT? 1. The Problem I. Streaming is complicated II. Flink

    is complex III. Data & real-time are abstract 2. The Approach I. It can’t be THAT complicated? II. Can it? 3. The Solution (attempt) I. Layman's abstraction II. Flink in two nutshells
  30. 35 2. THE APPROACH I. It can’t be THAT complicated?

    • Getting an example to run seems fast & easy
  31. 36 2. THE APPROACH I. It can’t be THAT complicated?

    • Getting an example to run seems fast & easy • From Flink’s first steps: 1. Have Java 11 2. Download Flink 3. Start local cluster (1 command) 4. Submit premade job (1 command)
  33. 38 2. THE APPROACH I. It can’t be THAT complicated?

    • Getting an example to run seems fast & easy • From Flink’s first steps: 1. Have Java 11 2. Download Flink 3. Start local cluster (1 command) 4. Submit premade job (1 command) • => SUCCESS – with only a few steps
  34. 40 2. THE APPROACH II. It can’t be THAT complicated?

    Can it? • Despite the fast result: already some issues
  35. 41 2. THE APPROACH II. It can’t be THAT complicated?

    Can it? • Despite the fast result: already some issues A) Java 11? Also, Flink version? -> my experience:
  36. 42 2. THE APPROACH II. It can’t be THAT complicated?

    Can it? • Despite the fast result: already some issues A) Java 11? Also, which Flink version? • At the time of the POC: • Flink 1.20: work in progress • Java 17 + Flink 1.19 not 100% stable • Flink 1.18.1 + Java 11 = stable
  38. 44 2. THE APPROACH II. It can’t be THAT complicated?

    Can it? • Despite the fast result: already some issues A) Java 11? Also, which Flink version? • At the time of the POC: • Flink 1.20: work in progress • Java 17 + Flink 1.19 not 100% stable • Flink 1.18.1 + Java 11 = stable => Solution: Flink 1.18.1 + Java 11 + upgrade soon
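
One way to make the "Flink 1.18.1 + Java 11" decision explicit is a small fail-fast check at startup. The sketch below is an assumption-laden illustration, not something Flink provides: the class `JavaVersionGuard` and the constant `TESTED_JAVA_FEATURE_VERSION` are hypothetical, and the value 11 simply encodes the pairing reported as stable on the slide.

```java
// Warns when the running JVM differs from the version the setup was tested with.
class JavaVersionGuard {

    // Assumption from the talk: Java 11 was the stable pairing with Flink 1.18.1.
    static final int TESTED_JAVA_FEATURE_VERSION = 11;

    // True only for the Java feature version the setup was tested with.
    static boolean isTested(int featureVersion) {
        return featureVersion == TESTED_JAVA_FEATURE_VERSION;
    }

    public static void main(String[] args) {
        // Runtime.version().feature() returns the major version, e.g. 11 or 17.
        int running = Runtime.version().feature();
        if (isTested(running)) {
            System.out.println("Running on the tested Java version " + running);
        } else {
            System.err.println("Warning: running on Java " + running
                    + ", but this setup was tested with Java " + TESTED_JAVA_FEATURE_VERSION);
        }
    }
}
```

When the "upgrade soon" part happens, the constant is the single place to change.
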
  39. 45 2. THE APPROACH II. It can’t be THAT complicated?

    Can it? • Despite the fast result: already some issues B) A local cluster
  40. 46 2. THE APPROACH II. It can’t be THAT complicated?

    Can it? • Despite the fast result: already some issues B) A local cluster • Clusters require know-how • Luckily Flink provides simple solutions • Easy Standalone or native K8s setup • And a UI
  41. 47 2. THE APPROACH II. It can’t be THAT complicated?

    Can it? • Despite the fast result: already some issues B) A local cluster • Clusters require know-how • Luckily Flink provides simple solutions • Easy Standalone or native K8s setup • And a UI => Solution: Use Flink’s abstraction to set up a cluster and start the UI
  42. 48 2. THE APPROACH II. It can’t be THAT complicated?

    Can it? • Despite the fast result: already some issues C) A premade job
  43. 49 2. THE APPROACH II. It can’t be THAT complicated?

    Can it? • Despite the fast result: already some issues C) A premade job • What is a job? • The Source-Transformation-Sink pipeline • Some Java(/…) code • written with Flink libraries • from the framework
  45. 51 2. THE APPROACH II. It can’t be THAT complicated?

    Can it? • Despite the fast result: already some issues C) A premade job • What is a job? • The Source-Transformation-Sink pipeline • Some Java(/…) code • written with Flink libraries • from the framework => Solution: embrace the complexity
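
The Source-Transformation-Sink idea behind a job can be modelled in a few lines of plain Java. This is only a toy sketch: `ToyJob` and its constructor parameters are hypothetical names, and a real Flink job is written against Flink's own libraries (for example the DataStream API), which look different and add distribution, state and fault tolerance on top.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Function;

// Toy model of a "job": a source, a transformation and a sink wired together.
// All names are hypothetical; this is NOT Flink API.
class ToyJob<I, O> {
    private final Iterable<I> source;
    private final Function<I, O> transformation;
    private final Consumer<O> sink;

    ToyJob(Iterable<I> source, Function<I, O> transformation, Consumer<O> sink) {
        this.source = source;
        this.transformation = transformation;
        this.sink = sink;
    }

    // Pull every record from the source, transform it, and hand it to the sink.
    void run() {
        for (I record : source) {
            sink.accept(transformation.apply(record));
        }
    }

    public static void main(String[] args) {
        List<Integer> out = new ArrayList<>();
        new ToyJob<String, Integer>(
                List.of("flink", "in", "two", "nutshells"), // source
                String::length,                             // transformation
                out::add                                    // sink
        ).run();
        System.out.println(out); // prints [5, 2, 3, 9]
    }
}
```

The point of the model is the shape: the job only describes where records come from, what happens to them and where they go. Running it at scale is left to the cluster.
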
  46. WHAT? 1. The Problem I. Streaming is complicated II. Flink

    is complex III. Data & real-time are abstract 2. The Approach I. It can’t be THAT complicated? II. Can it? 3. The Solution (attempt) I. Layman's abstraction II. Flink in two nutshells
  47. 54 3. THE SOLUTION I. Layman’s abstraction • Simple approach

    might not be simple enough • For a smooth jumpstart into Flink
  48. 55 3. THE SOLUTION I. Layman’s abstraction • Simple approach

    might not be simple enough • For a smooth jumpstart into Flink • Issues-summary: A) Evolving ecosystem with many versions B) Some cluster know-how needed C) Data processing and its integration into the cluster still complex
  49. 56 3. THE SOLUTION I. Layman’s abstraction • Simple approach

    might not be simple enough • For a smooth jumpstart into Flink • Issues-summary: A) Evolving ecosystem with many versions B) Some cluster know-how needed C) Data processing and its integration into the cluster still complex • A) -> easy solution: use the stable versions ✅
  50. 57 3. THE SOLUTION I. Layman’s abstraction • Simple approach

    might not be simple enough • For a smooth jumpstart into Flink • Issues-summary: A) Evolving ecosystem with many versions B) Some cluster know-how needed C) Data processing and its integration into the cluster still complex • A) -> easy solution: use the stable versions ✅ • B) & C) -> I suggest a gross oversimplification
  51. 59 3. THE SOLUTION II. Flink in two nutshells •

    So, what is Flink if not “[…] a framework and distributed processing engine for stateful computations over unbounded and […]”
  52. 60 3. THE SOLUTION II. Flink in two nutshells •

    So, what is Flink if not “[…] a framework and […]” • In a nutshell, Flink is…
  53. 61 3. THE SOLUTION II. Flink in two nutshells •

    So, what is Flink if not “[…] a framework and […]” • In a nutshell, Flink is 2 nutshells
  54. 62 3. THE SOLUTION II. Flink in two nutshells •

    So, what is Flink if not “[…] a framework and […]” • In a nutshell, Flink is 2 nutshells • Nutshell 1: • A toolbox for coding • With building blocks • To define WHAT will be happening to data • The Framework/Libraries/APIs/Job
  55. 63 3. THE SOLUTION II. Flink in two nutshells •

    So, what is Flink if not “[…] a framework and […]” • In a nutshell, Flink is 2 nutshells • Nutshell 2: • A bigger toolbox • With commands and a UI • To define HOW data is handled • Mostly automatically • The Engine resp. The Cluster
  56. 64 3. THE SOLUTION II. Flink in two nutshells •

    So, what is Flink if not “[…] a framework and […]” • In a nutshell, Flink is 2 nutshells • Nutshell 1: The Framework • Nutshell 2: The Engine
  57. 65 3. THE SOLUTION II. Flink in two nutshells •

    So, what is Flink if not “[…] a framework and […]” • In a nutshell, Flink is 2 nutshells • Nutshell 1: The Framework • Nutshell 2: The Engine • Simple principle: • Build it • Package it • Send it • Let it run
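
The split between the two nutshells can be illustrated with a plain-Java sketch: the job logic (the WHAT) is written once, and a stand-in "engine" (the HOW) decides how many workers execute it. Everything here is a hypothetical simplification: `TwoNutshells`, `JOB` and `runOnEngine` are made-up names, and a thread pool is only a loose analogy for a Flink cluster.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

// Toy analogy for "framework vs. engine"; this is NOT Flink API.
class TwoNutshells {

    // Nutshell 1 (the WHAT): the logic applied to each record, written once.
    static final Function<String, String> JOB = word -> word + "!";

    // Nutshell 2 (the HOW): a stand-in engine that picks the number of workers.
    static List<String> runOnEngine(Function<String, String> job, List<String> input, int parallelism) {
        ExecutorService workers = Executors.newFixedThreadPool(parallelism);
        try {
            List<Future<String>> pending = new ArrayList<>();
            for (String record : input) {
                Callable<String> task = () -> job.apply(record);
                pending.add(workers.submit(task));
            }
            List<String> results = new ArrayList<>();
            for (Future<String> future : pending) {
                results.add(future.get()); // collected in submission order
            }
            return results;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException("toy engine failed", e);
        } finally {
            workers.shutdown();
        }
    }

    public static void main(String[] args) {
        List<String> input = List.of("build", "package", "send", "run");
        System.out.println(runOnEngine(JOB, input, 1)); // prints [build!, package!, send!, run!]
        System.out.println(runOnEngine(JOB, input, 4)); // same result, different HOW
    }
}
```

Changing the parallelism does not touch `JOB`, which mirrors the "build it, package it, send it, let it run" principle: the job is defined once and the engine takes care of execution.
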
  58. 66 3. THE SOLUTION II. Flink in two nutshells BUT

    DON’T FORGET! • The devil is in the detail
  59. 67 3. THE SOLUTION II. Flink in two nutshells BUT

    DON’T FORGET! • The devil is in the detail • Every nutshell has many nested ones • DataStream API, Table API, Connectors • REST API, Watermarks, Checkpoints • Jobmanager, Taskmanager, Client
  60. 68 3. THE SOLUTION II. Flink in two nutshells BUT

    DON’T FORGET! • The devil is in the detail • Every nutshell has many nested ones • DataStream API, Table API, Connectors • REST API, Watermarks, Checkpoints • Jobmanager, Taskmanager, Client • Often hidden & hard to crack
  61. 69 3. THE SOLUTION II. Flink in two nutshells BUT

    DON’T FORGET! • The devil is in the detail • Every nutshell has many nested ones • DataStream API, Table API, Connectors • REST API, Watermarks, Checkpoints • Jobmanager, Taskmanager, Client • Often hidden & hard to crack • Typical squirrel
  62. THANK YOU! POSEDIO GMBH [email protected] Weyringergasse 1-3, 1040 Wien, Millenium

    Park 4, 6980 Lustenau www.posedio.com • References: • https://nightlies.apache.org/flink/flink-docs-release-1.18/docs/learn-flink/overview/ • https://nightlies.apache.org/flink/flink-docs-release-1.18/docs/concepts/overview/ • Connectors DS-API: https://nightlies.apache.org/flink/flink-docs-release-1.19/docs/connectors/datastream/overview/ • Formats DS-API: https://nightlies.apache.org/flink/flink-docs-release-1.19/docs/connectors/datastream/formats/overview/ • Connectors Table API: https://nightlies.apache.org/flink/flink-docs-release-1.19/docs/connectors/table/overview/ • Formats Table API: https://nightlies.apache.org/flink/flink-docs-release-1.19/docs/connectors/table/formats/overview/ • First steps: https://nightlies.apache.org/flink/flink-docs-master/docs/try-flink/local_installation/ • https://flink.apache.org/what-is-flink/roadmap/ • https://medium.com/big-data-processing/twitter-streaming-using-flink-d19504b676a5