SF Kafka Summit 2019: Cross the Streams Thanks to Kafka and Flink

Cross the Streams thanks to Kafka and Flink Christophe Philemotte

Hello, I am Christophe, CTO, digazu (toch, @_toch, ibakesoftware.com)

Crossing the Streams

Crossing the Streams: Motivations

10+ years ago: DIY
(timeline from 2000 to 2010 and after: Stream Processing Papers, Stratosphere, Kafka, Lambda architecture, Kappa architecture)

10+ years ago: DIY
• Resource Management?
• Collection?
• Distribution?
• Consistency?
• Planning DAG?
• Distributed Processing?
• Exactly-Once Semantics?
• Stateful Continuous Streaming?

Cross the Tables!

Today
• Ecosystem Integration?
• Deployment?
• Resource Scaling?
• Stateful Operation?

Our goal: Run Any Query
(diagram labels: Kafka Connect, JDBC Source)

Agenda
• Stack deployment
• Flink Job
• Outro

Takeaways
• Sandbox: toch/sf-kafka-summit-2019
• Lessons From Production
• Integration How-to of Kafka and Flink Ecosystems
• SQL Streaming How-to

Crossing the Streams: Stack deployment

Kafka Connect JDBC Source

Kafka & Co.
⚠ Warning
➔ Stateful & Stateless Pods → k8s operator
➔ Service Dependencies → init container
➔ Storage Type → persistent volume storage class
➔ JVM Heap & Container Memory → mem. limits & requests
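The "Service Dependencies → init container" item can be illustrated with a small wait loop that an init container might run before the main container starts. This is a minimal sketch, not the setup from the talk: the function name `wait_for_port`, the timeouts, and the idea of probing a plain TCP port are all assumptions made for illustration.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout_s: float = 30.0, interval_s: float = 0.5) -> bool:
    """Return True once a TCP connection to host:port succeeds, False on timeout.

    Hypothetical helper: an init container could run this against a dependency
    (a broker or a database, for instance) and exit non-zero when it returns False.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            # A successful connect means the dependency accepts TCP connections.
            with socket.create_connection((host, port), timeout=interval_s):
                return True
        except OSError:
            # Refused or timed out: retry until the deadline has passed.
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval_s)
```

A TCP probe only shows that the port is open; whether the service behind it is ready to serve requests is a separate question that a real deployment may need to check as well.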

Flink
⚠ Warning
➔ JVM Heap & RocksDB Memory & Container Memory → explicit allocation
➔ State Backend → e.g. HDFS
➔ HA Setup → e.g. HDFS
➔ Rootless Container with random UID → Build your own Docker Image
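One way to read "explicit allocation" is as simple arithmetic: the JVM heap, the RocksDB memory and any other overhead have to fit inside the container memory limit together. The sketch below works through that budget; the function name `memory_budget_mb` and all the numbers are illustrative assumptions, not values from the talk.

```python
def memory_budget_mb(container_limit_mb: int, rocksdb_mb: int, overhead_mb: int) -> int:
    """Return the JVM heap size (in MB) left after reserving off-heap memory.

    Hypothetical helper: all three inputs are values the operator chooses.
    """
    heap_mb = container_limit_mb - rocksdb_mb - overhead_mb
    if heap_mb <= 0:
        raise ValueError("container limit too small for the requested off-heap reservations")
    return heap_mb


# Illustrative numbers only: a 4 GiB container with 1 GiB reserved for RocksDB
# and 512 MiB for other JVM and native overhead.
heap = memory_budget_mb(container_limit_mb=4096, rocksdb_mb=1024, overhead_mb=512)
print(f"-Xmx{heap}m")  # prints -Xmx2560m
```

The point of the exercise is that the heap flag is derived from the container limit rather than chosen independently of it.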

PostgreSQL

⚠ Warning
➔ Stateful Pod & replication → k8s operator
➔ Storage Type → fast persistent volume storage class without cache

Seed the data into PostgreSQL (seed.sql)

ghosts
id: INT   name: TEXT
2         Slimer

movies
id: INT   name: TEXT         year: INT
1         Ghostbusters       1984
2         Ghostbusters II    1989

ghosts_in_movies
ghost_id: INT   movie_id: INT   id: INT
2               1               2
2               2               3

Feed the Streams

Kafka Connect Setup
⚠ Warning
➔ Rebalancing at each Config Update (fixed in 2.3) → upgrade
➔ Connector JARs Location → Deps JARs close to plugin JAR

Kafka Connect JDBC Source
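For readers who have not configured a JDBC source before, here is a sketch of what a connector definition for the three seeded tables could look like, built as the JSON body that the Kafka Connect REST API accepts when creating a connector. The connector name, connection URL, credentials and topic prefix are hypothetical; the configuration keys are those of the Confluent JDBC source connector, and the actual settings used in the talk are not shown in the slides.

```python
import json

# Hypothetical connector definition: name, URL, credentials and prefix are placeholders.
connector = {
    "name": "ghostbusters-jdbc-source",
    "config": {
        "connector.class": "io.confluent.connect.jdbc.JdbcSourceConnector",
        "connection.url": "jdbc:postgresql://postgres:5432/ghostbusters",
        "connection.user": "postgres",
        "connection.password": "change-me",
        # One topic per table, named <topic.prefix><table>.
        "table.whitelist": "ghosts,movies,ghosts_in_movies",
        "topic.prefix": "pg-",
        # Detect new rows through the strictly increasing "id" column.
        "mode": "incrementing",
        "incrementing.column.name": "id",
        "tasks.max": "1",
    },
}

payload = json.dumps(connector, indent=2)
print(payload)
```

The `incrementing` mode only picks up inserted rows; capturing updates would need a timestamp column or a different mode, which is outside this sketch.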

Main lessons
• Memory Allocation
• Rootless container
• Stateful vs Stateless
• Throughput vs Latency

Crossing the Streams: Flink Job

Query Example

ghosts
id: INT   name: TEXT
2         Slimer

movies
id: INT   name: TEXT         year: INT
1         Ghostbusters       1984
2         Ghostbusters II    1989

ghosts_in_movies
ghost_id: INT   movie_id: INT   id: INT
2               1               2
2               2               3

Query Result Example

ghost_appearances
name: TEXT   movie: TEXT
Slimer       Ghostbusters
Slimer       Ghostbusters II
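The slides show the input tables and the expected result but not the query text itself. The sketch below reconstructs a plausible query and checks it against the seed data, using an in-memory SQLite database purely as a stand-in for the real pipeline (PostgreSQL, Kafka and Flink SQL). The join conditions and the column aliases are assumptions inferred from the table layouts.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in database, not part of the talk's stack
conn.executescript("""
CREATE TABLE ghosts (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE movies (id INTEGER PRIMARY KEY, name TEXT, year INTEGER);
CREATE TABLE ghosts_in_movies (id INTEGER PRIMARY KEY, ghost_id INTEGER, movie_id INTEGER);

INSERT INTO ghosts VALUES (2, 'Slimer');
INSERT INTO movies VALUES (1, 'Ghostbusters', 1984), (2, 'Ghostbusters II', 1989);
INSERT INTO ghosts_in_movies (ghost_id, movie_id, id) VALUES (2, 1, 2), (2, 2, 3);
""")

# Assumed query: resolve each link row to the ghost name and the movie name.
rows = conn.execute("""
SELECT g.name AS name, m.name AS movie
FROM ghosts_in_movies AS gm
JOIN ghosts AS g ON g.id = gm.ghost_id
JOIN movies AS m ON m.id = gm.movie_id
ORDER BY m.year
""").fetchall()

for name, movie in rows:
    print(name, movie)
```

Run against the seed data, this yields the two `ghost_appearances` rows listed above.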

Unless You Have No Choice

Or …

Or you have Flink

Unit test

1. TableEnvironment
2. TableSource
3. sqlQuery
4. TableSink
5. Execute Job

Submit Job

Main lessons
• Transparent State Management
• Transparent Fault-Tolerance
• Undocumented Rich Test Helpers
• Required Topic

Crossing the Streams: Outro

Operations Pitfalls
• JVM & RocksDB Memory in Docker Container
• K8s Stateful Pod and Scaling
• Service Dependencies (init container)
• Deployment as Code
• Throughput vs Latency

Kafka Connect Pitfalls
• Rebalancing
• JARs Location
• Development Environment:
    docker run --net=host lensesio/fast-data-dev

Flink Job Pitfalls
• Required Kafka Topic

Result

What have we achieved?
(diagram labels: Kafka Connect, JDBC Source, Resource Management, SQL Query & Fault Tolerance, State Management)

Thanks!
toch, @_toch, ibakesoftware.com
toch/sf-kafka-summit-2019

Why not KSQL?
• Global Window for INNER JOIN
• ANSI SQL
• Partitioning Freedom
• Temporal Table
• CEP
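The "Global Window for INNER JOIN" item can be read as: the join keeps every record from both inputs in state, so a match is emitted no matter how far apart in time the two sides arrive. The toy sketch below shows that behaviour in plain Python. It is one interpretation of the slide, not Flink's implementation, and the class and method names are made up for illustration.

```python
from collections import defaultdict


class StreamingInnerJoin:
    """Toy symmetric join: both sides are kept in state with no time bound."""

    def __init__(self):
        self.left = defaultdict(list)   # join key -> left values seen so far
        self.right = defaultdict(list)  # join key -> right values seen so far

    def on_left(self, key, value):
        # Remember the record, then pair it with every right record seen so far.
        self.left[key].append(value)
        return [(value, r) for r in self.right.get(key, [])]

    def on_right(self, key, value):
        # Mirror image of on_left.
        self.right[key].append(value)
        return [(l, value) for l in self.left.get(key, [])]


join = StreamingInnerJoin()
out = []
out += join.on_left(1, "Slimer")            # no movie with id 1 yet, nothing emitted
out += join.on_right(1, "Ghostbusters")     # matches the earlier left record
out += join.on_left(2, "Slimer")            # no movie with id 2 yet, nothing emitted
out += join.on_right(2, "Ghostbusters II")  # matches the earlier left record
print(out)
```

The cost of this freedom is that the state grows with the input, which is why state management and the choice of state backend come up among the operations lessons.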
