Slide 1

Slide 1 text

Quix streams / RTASummit 2023
Quix Streams
Real-time stream processing in Python

Slide 2

Slide 2 text

Hello, nice to meet you!
Tomas Neubauer
CTO and co-founder at Quix
Previously McLaren technical lead

Slide 3

Slide 3 text

Racing background
Roots in real-time data processing in the most extreme, time-critical environment
● 50,000 channels per car
● 1.5 kHz per channel
● 1,000s of real-time models and simulations

Slide 4

Slide 4 text

Now, raise your hand if you already knew about …

Slide 5

Slide 5 text

Now, raise your hand if you already knew about …
Streaming

Slide 6

Slide 6 text

Now, raise your hand if you already knew about …
Kafka

Slide 7

Slide 7 text

Goal
[Diagram. Labels: phone-data, Crash detection, Fitness app crashes]

Slide 8

Slide 8 text

Content
● Streaming vs batch
● ML deployment
● Streaming landscape
● How it works
● Demo: Let's build it

Slide 9

Slide 9 text

1. Streaming vs Batch
An overview of data processing approaches

Slide 10

Slide 10 text

[Architecture diagram. Labels: phone-data, API gateway/websocket, Kinesis, EMR, Step Function, Batch, SageMaker, ANALYSIS & TRAINING, trained model, API gateway/websocket, Crash detection alerts]

Slide 11

Slide 11 text

Streaming
[Architecture diagram. Labels: phone-data, Websocket gateway, Streaming, SageMaker, ANALYSIS & TRAINING, trained model, Websocket gateway, Crash detection alerts]

Slide 12

Slide 12 text

Data Processing: Batch
Data is collected over time into a database. At some point, data is loaded from the database into the processing system.
● Operations are easy to compute: all historic data is present
● Results are not available in real time

t    Gx   Gy   Gz   GT   ΔG
t1   0.1  1.0  0.2  1.3
t2   0.2  1.1  0.1  1.4  0.1
t3   0.1  0.9  0.1  1.1  -0.3
tn   0.3  1.0  0    1.3  1.3 - GTn-1
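A minimal sketch of the batch case, assuming GT is the sum of the three axis values (inferred from the numbers on the slide, which does not state the formula) and ΔG is the difference to the previous row. The names `rows` and `batch_process` are illustrative, not from the talk.

```python
# Batch sketch: the whole data set is loaded first, then processed in one pass.
rows = [
    {"t": "t1", "Gx": 0.1, "Gy": 1.0, "Gz": 0.2},
    {"t": "t2", "Gx": 0.2, "Gy": 1.1, "Gz": 0.1},
    {"t": "t3", "Gx": 0.1, "Gy": 0.9, "Gz": 0.1},
]


def batch_process(rows):
    """Return GT (assumed: Gx + Gy + Gz) and dG (change vs. previous row) per row."""
    results = []
    for i, row in enumerate(rows):
        gt = round(row["Gx"] + row["Gy"] + row["Gz"], 3)
        # The previous row is directly available because all history is in memory.
        dg = None if i == 0 else round(gt - results[i - 1]["GT"], 3)
        results.append({"t": row["t"], "GT": gt, "dG": dg})
    return results


batch_results = batch_process(rows)
for result in batch_results:
    print(result)
```

No state handling is needed here: looking back one row is just an index lookup, which is why batch operations are easy to compute.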

Slide 13

Slide 13 text

Data Processing: Streaming
Data is collected over time into a streaming broker (like a Kafka topic). The processing system consumes the data as soon as it's published in the topic.
● Operations are not always easy to compute: state is used to keep needed historic data
● Real-time results

Each row below arrives as a separate message:

t    Gx   Gy   Gz   GT   ΔG           STATE (t, GT)
t1   0.1  1.0  0.2  1.3               t1, 1.3
t2   0.2  1.1  0.1  1.4  0.1          t2, 1.4
t3   0.1  0.9  0.1  1.1  -0.3
tn   0.3  1.0  0.1  1.3  1.3 - GTn-1  tn-1, GTn-1
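The role of state can be sketched as follows. This is an illustration under assumptions, not the Quix Streams API: `DeltaTracker`, the `stream_id` value, and the rule GT = Gx + Gy + Gz (inferred from the slide's numbers) are all hypothetical.

```python
class DeltaTracker:
    """Keeps the last GT per stream so dG can be computed one message at a time."""

    def __init__(self):
        self.state = {}  # stream id -> GT of the previous message

    def process(self, stream_id, message):
        gt = round(message["Gx"] + message["Gy"] + message["Gz"], 3)
        previous_gt = self.state.get(stream_id)
        # Only the stored state is available; earlier messages are already gone.
        dg = None if previous_gt is None else round(gt - previous_gt, 3)
        self.state[stream_id] = gt  # update state for the next message
        return {"t": message["t"], "GT": gt, "dG": dg}


tracker = DeltaTracker()
messages = [
    {"t": "t1", "Gx": 0.1, "Gy": 1.0, "Gz": 0.2},
    {"t": "t2", "Gx": 0.2, "Gy": 1.1, "Gz": 0.1},
    {"t": "t3", "Gx": 0.1, "Gy": 0.9, "Gz": 0.1},
]
stream_results = [tracker.process("phone-1", m) for m in messages]
```

Each result is available as soon as its message arrives, at the price of having to keep (and, in production, persist) the state between messages.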

Slide 14

Slide 14 text

2. ML Deployment
REST API vs Streaming

Slide 15

Slide 15 text

ML Deployment with API
[Diagram. Labels: APP, API REQUEST, REST API, API RESPONSE]
API request:  gX 0.5, gY 0.3, gZ 0.1, gTotal 0.9
API response: gX 0.5, gY 0.3, gZ 0.1, gTotal 0.9, Crash 1

Slide 16

Slide 16 text

2.1 Issues with REST APIs
REST API vs Streaming

Slide 17

Slide 17 text

Problems with REST API
[Diagram. Labels: APP, API REQUEST (gX, gY, gZ, gTotal), REST API]
- CPU overhead
- Added delay
- Requests get lost in case of service downtime or slow performance

Slide 18

Slide 18 text

Problems with REST API
[Diagram. Labels: APP, API REQUEST (gX, gY, gZ, gTotal), REST API]

Slide 19

Slide 19 text

Problems with REST API
[Diagram. Labels: APP, API REQUEST (gX, gY, gZ, gTotal), REST API]

Slide 20

Slide 20 text

Problems with REST API
[Diagram. Labels: API REQUEST (gX, gY, gZ, gTotal), API, APP, API, APP, API REQUEST]

Slide 21

Slide 21 text

3. Stream processing applications
An overview of stream processing approaches

Slide 22

Slide 22 text

Stream processing applications
When you build stream processing applications with Kafka, there are two options:
1. Just build an application that uses the Kafka producer and consumer APIs directly
2. Adopt a full-fledged stream processing framework (Flink, Spark Streaming, Beam, etc.)

Slide 23

Slide 23 text

Kafka producer and consumer APIs
● Work for simple tasks like one-message-at-a-time processing
● No external dependencies like the JVM
● Get very complicated when stateful processing is needed, such as calculating aggregations or joining multiple streams

Slide 24

Slide 24 text

Stream processing frameworks
● Full-fledged stream processing frameworks solve stateful, more complex operations
● But this comes at the cost of increased complexity in many dimensions:
  ○ Java dependency
  ○ Deployment gets difficult because the code does not run on its own but in a server-side cluster (Flink cluster or Spark cluster)
  ○ Debugging is difficult
  ○ Performance optimization is difficult
  ○ It gets even worse when we combine synchronous and asynchronous architecture in one application

Slide 25

Slide 25 text

JAR files…

Slide 26

Slide 26 text

Connecting Flink to Kafka is difficult

Slide 27

Slide 27 text

SQL looks easy to use but…

Slide 28

Slide 28 text

IP UDFs are nasty
● Poor development experience
  ○ Logs only accessible from server, no debugging possible
● Performance hit caused by interface between JVM and Python

Slide 29

Slide 29 text

DEBUGGING!!!

Slide 30

Slide 30 text

Is there a third way?
● Combining the Kafka API approach with a stream processing library
● Abstraction from the key-value messages of the Kafka API to virtual tables
● Standalone library that runs:
  ○ Locally for development and debugging
  ○ In Docker or in Kubernetes for production deployments at scale

Slide 31

Slide 31 text

Stateful processing with Pub&Sub client libraries
1. Messages in topic
2. Split messages into individual streams
3. Messages converted to tables
4. Messages decomposed into rows
5. Memory state updated from incoming rows / series
6. State persistence
7. State and incoming data are combined into output that is sent to the output topic
Commit offsets
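The steps above can be sketched with in-memory stand-ins. This is a simplified illustration, not a Kafka client: the topic is a plain list, state is persisted to a JSON file, and `run_pipeline`, the message layout, and the running-mean aggregation are all hypothetical choices.

```python
import json
import os
import tempfile
from collections import defaultdict

# 1. Messages in a topic: (offset, stream key, payload). Stand-in for a Kafka topic.
topic = [
    (0, "phone-a", {"gTotal": [0.9, 1.1]}),
    (1, "phone-b", {"gTotal": [0.5]}),
    (2, "phone-a", {"gTotal": [1.0]}),
]


def run_pipeline(topic, state_path, committed_offset=-1):
    """Process messages after committed_offset; return (output, new committed offset)."""
    state = defaultdict(lambda: {"count": 0, "sum": 0.0})
    if os.path.exists(state_path):  # restore state persisted by a previous run
        with open(state_path) as f:
            state.update(json.load(f))
    output = []
    for offset, key, payload in topic:
        if offset <= committed_offset:
            continue  # already processed before a restart
        stream_state = state[key]  # 2. one state entry per stream
        for value in payload["gTotal"]:  # 3./4. payload treated as a column of rows
            stream_state["count"] += 1  # 5. update in-memory state
            stream_state["sum"] += value
            mean = round(stream_state["sum"] / stream_state["count"], 3)
            # 7. combine state and incoming data into an output row
            output.append({"stream": key, "gTotal": value, "mean": mean})
        with open(state_path, "w") as f:  # 6. persist state
            json.dump(state, f)
        committed_offset = offset  # commit the offset only after state is saved
    return output, committed_offset


state_file = os.path.join(tempfile.mkdtemp(), "state.json")
output, committed = run_pipeline(topic, state_file)
```

Calling `run_pipeline(topic, state_file, committed)` again processes nothing, because every offset up to `committed` is skipped. All of this bookkeeping is what an application has to write by hand when it uses the client libraries directly.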

Slide 32

Slide 32 text

Quix streams
1. Messages in topic
2. Messages decomposed as rows available via pandas API
3. Messages processed through pipeline defined as pandas operations. Output streamed to output topic.
- Automatic state management
- Automatic checkpointing
- Automatic message serialization/deserialization

Slide 33

Slide 33 text

4. How it works
Kafka, Kubernetes, Python

Slide 34

Slide 34 text

Our approach to stream processing
Containers: Containers running in Kubernetes, scaling hand in hand with Kafka for compute scalability.
Kafka: Handle your data reliably and efficiently in memory with Kafka, using Kafka partitions, the replica system, and persistence to deliver scalability and robustness.
Python: Python gives you flexibility. It lets you transform data, not just query it, from simple filtering to ML use cases like video processing.

Slide 35

Slide 35 text

Processing with streaming
[Diagram. Labels: INPUT TOPIC, SUB, APP, PUB, OUTPUT TOPIC]
Input topic message:  gForce X 0.5, gForce Y 0.3, gForce Z 0.1
Output topic message: gForce X 0.5, gForce Y 0.3, gForce Z 0.1, gForce Total 0.9, Crash 1
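A minimal sketch of this subscribe, process, publish loop, using in-memory queues in place of Kafka topics. The field names, `detect_crash`, and especially `CRASH_THRESHOLD` are assumptions: the slide shows only the input and output values, and in the talk's architecture the crash flag comes from a trained model rather than a fixed threshold.

```python
from collections import deque

CRASH_THRESHOLD = 0.8  # hypothetical stand-in for the trained model's decision


def detect_crash(row):
    """Add the total g-force and a 0/1 crash flag to an incoming row."""
    total = round(row["gForceX"] + row["gForceY"] + row["gForceZ"], 3)
    return {**row, "gForceTotal": total, "crash": int(total > CRASH_THRESHOLD)}


# In-memory stand-ins for the input and output topics.
input_topic = deque([{"gForceX": 0.5, "gForceY": 0.3, "gForceZ": 0.1}])
output_topic = deque()

while input_topic:
    message = input_topic.popleft()  # SUB: consume from the input topic
    output_topic.append(detect_crash(message))  # PUB: publish to the output topic
```

Unlike the REST variant, the application never waits for a caller: messages queue up in the input topic and are processed as capacity allows.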

Slide 36

Slide 36 text

Scale
[Diagram. Labels: INPUT TOPIC, SUB, PUB, OUTPUT TOPIC]
Input topic message:  gForce X 0.5, gForce Y 0.3, gForce Z 0.1
Output topic message: gForce X 0.5, gForce Y 0.3, gForce Z 0.1, gForce Total 0.9, Crash 1

Slide 37

Slide 37 text

Fault tolerant
[Diagram. Labels: INPUT TOPIC, SUB, PUB, OUTPUT TOPIC]
Input topic message:  gForce X 0.5, gForce Y 0.3, gForce Z 0.1
Output topic message: gForce X 0.5, gForce Y 0.3, gForce Z 0.1, gForce Total 0.9, Crash 1
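The slide text does not spell out the mechanism, so the following is a sketch of the general idea behind scaling and fault tolerance with Kafka partitions: keys map to partitions, partitions are shared among replicas, and a failed replica's partitions move to the survivors. The partition count, replica names, round-robin strategy, and the CRC32 hash (Kafka's default partitioner uses a different hash) are assumptions for illustration.

```python
import zlib

NUM_PARTITIONS = 4  # hypothetical partition count


def partition_for(key, num_partitions=NUM_PARTITIONS):
    """Stable key -> partition mapping, so one stream always lands on one partition."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def assign(partitions, replicas):
    """Distribute partitions round-robin across the replicas that are alive."""
    assignment = {replica: [] for replica in replicas}
    for i, partition in enumerate(partitions):
        assignment[replicas[i % len(replicas)]].append(partition)
    return assignment


partitions = list(range(NUM_PARTITIONS))

# Scale: two replicas share the partitions.
before = assign(partitions, ["replica-0", "replica-1"])

# Fault tolerance: replica-0 fails, so its partitions are reassigned.
after = assign(partitions, ["replica-1"])
```

Because the data stays in the broker and offsets are committed per partition, the surviving replica can continue from the last committed position of each partition it inherits.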

Slide 38

Slide 38 text

Demo
Let's build it.
