latency, high throughput stream processing
• Open source at the Apache Software Foundation, one of the biggest projects there
• Wide industry adoption, various hosted services available
https://flink.apache.org/poweredby.html
state, time
• Events in a dataflow:
  [Diagram: Data Source → Data Aggregation → Data Sink, with a "partition by key" step]
• Flink takes care of efficient, parallel execution in a cluster
Easy to build Applications Declarative APIs SQL Python Joins Aggregations Community & Documentation
state, time
[Diagram: the same dataflow (Data Source, Data Aggregation, Data Sink; partition by key) running as parallel instances: Src, Src, Agg, Agg, Sink]
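To make the "partition by key" step in the diagram concrete, here is a minimal sketch in plain Java, without any Flink dependency. The class `KeyPartitionSketch` and its methods are hypothetical names, and the modulo-of-hash routing is a simplifying assumption; it is not how Flink itself assigns keys to subtasks. The only point it illustrates is that every event with a given key reaches the same parallel aggregation instance, so each instance can keep its counts locally.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class KeyPartitionSketch {

    /** Picks the aggregation subtask for a key (simplified assumption, not Flink's real assignment). */
    public static int subtaskFor(String key, int parallelism) {
        return Math.floorMod(key.hashCode(), parallelism);
    }

    /** Routes every event to one subtask by key and counts per key inside that subtask. */
    public static List<Map<String, Long>> countByKey(List<String> events, int parallelism) {
        List<Map<String, Long>> subtasks = new ArrayList<>();
        for (int i = 0; i < parallelism; i++) {
            subtasks.add(new HashMap<>());
        }
        for (String key : events) {
            subtasks.get(subtaskFor(key, parallelism)).merge(key, 1L, Long::sum);
        }
        return subtasks;
    }

    public static void main(String[] args) {
        List<String> events = Arrays.asList("a", "b", "a", "c", "b", "a");
        List<Map<String, Long>> result = countByKey(events, 2);
        for (int i = 0; i < result.size(); i++) {
            System.out.println("Agg subtask " + i + ": " + result.get(i));
        }
    }
}
```

Because routing depends only on the key, no two aggregation instances ever hold a count for the same key, which is what lets them run in parallel without coordination.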
state, time
• State:
  [Diagram: Data Source (current Kafka reader offsets) → Data Aggregation (current aggregates, e.g. count by key) → Data Sink (pending Kafka transaction data); partition by key; checkpointing to cheap, durable storage (S3)]
• Flink keeps state recoverable after failures by checkpointing it to cheap, durable storage
• Flink guarantees exactly-once semantics for state
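The idea behind exactly-once state can be sketched in plain Java, without Flink: snapshot the source offset together with the aggregates, and on failure restore both and replay from that offset. `CheckpointSketch` and its members are hypothetical names for illustration only. The sketch assumes a single, replayable input log and leaves out the sink's pending transactions shown in the diagram.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class CheckpointSketch {

    /** A consistent snapshot: the source offset plus the aggregates computed up to that offset. */
    public static final class Checkpoint {
        final int offset;
        final Map<String, Long> counts;

        Checkpoint(int offset, Map<String, Long> counts) {
            this.offset = offset;
            this.counts = counts;
        }
    }

    private int offset = 0;
    private Map<String, Long> counts = new HashMap<>();

    /** Reads the log from the current offset up to (not including) upTo, counting by key. */
    public void process(List<String> log, int upTo) {
        while (offset < upTo) {
            counts.merge(log.get(offset), 1L, Long::sum);
            offset++;
        }
    }

    /** Copies offset and aggregates together; a real system would write this to durable storage. */
    public Checkpoint snapshot() {
        return new Checkpoint(offset, new HashMap<>(counts));
    }

    /** Rolls both the offset and the aggregates back to the snapshot. */
    public void restore(Checkpoint cp) {
        offset = cp.offset;
        counts = new HashMap<>(cp.counts);
    }

    public Map<String, Long> counts() {
        return counts;
    }

    public static void main(String[] args) {
        List<String> log = Arrays.asList("a", "b", "a", "c", "a");
        CheckpointSketch job = new CheckpointSketch();
        job.process(log, 2);
        Checkpoint cp = job.snapshot();
        job.process(log, 4);           // work done after the checkpoint...
        job.restore(cp);               // ...is discarded by a simulated failure
        job.process(log, log.size());  // replay from the checkpointed offset
        System.out.println(job.counts());
    }
}
```

Since the offset and the counts are always restored as a pair, each log entry is reflected in the final counts exactly once, even though entries 2 and 3 were processed twice.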
This slide is copied from “Change Data Capture with Flink SQL and Debezium”, a presentation at DataEngBytes by Marta Paes
https://noti.st/morsapaes/liQzgs/change-data-capture-with-flink-sql-and-debezium
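import org.apache.flink.api.common.state.ValueState;
import org.apache.flink.api.common.state.ValueStateDescriptor;
import org.apache.flink.api.java.tuple.Tuple;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.functions.KeyedProcessFunction;
import org.apache.flink.util.Collector;

public class MyFunction extends KeyedProcessFunction<Tuple, String, Tuple2<String, Long>> {

    /** The state that is maintained by this process function */
    private ValueState<CountWithTimestamp> state;

    // Not on the original slide: the state handle has to be obtained before use.
    // Assumes the Flink 1.x open(Configuration) signature and a CountWithTimestamp
    // POJO (with a long lastModified field) defined elsewhere.
    @Override
    public void open(Configuration parameters) throws Exception {
        state = getRuntimeContext().getState(
                new ValueStateDescriptor<>("myState", CountWithTimestamp.class));
    }

    @Override
    public void processElement(String value, Context ctx, Collector<Tuple2<String, Long>> out)
            throws Exception {
        // retrieve the current state for this key (elided on the slide)
        CountWithTimestamp current = state.value();
        if (current == null) {
            current = new CountWithTimestamp();
        }
        // set the state's timestamp to the record's assigned event time timestamp
        current.lastModified = ctx.timestamp();
        // write the state back
        state.update(current);
        // schedule the next timer 60 seconds from the current event time
        ctx.timerService().registerEventTimeTimer(current.lastModified + 60000);
    }

    @Override
    public void onTimer(long timestamp, OnTimerContext ctx, Collector<Tuple2<String, Long>> out)
            throws Exception {
        // do stuff with time
    }
}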
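The slide registers an event-time timer but does not show when `onTimer` would run. As a rough model in plain Java (no Flink dependency), timers for one key can be pictured as a sorted set that is drained whenever the watermark advances. `EventTimeTimerSketch` is a hypothetical class; it assumes that a timer fires once the watermark reaches or passes its timestamp and that registering the same timestamp twice yields a single firing.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

public class EventTimeTimerSketch {

    // Sorted by timestamp; a set, so registering the same timestamp twice keeps one timer.
    private final TreeSet<Long> timers = new TreeSet<>();

    /** Mirrors the name used on the slide; stores the timer for later firing. */
    public void registerEventTimeTimer(long timestamp) {
        timers.add(timestamp);
    }

    /** Advances event time and returns the timers that fire, in timestamp order. */
    public List<Long> advanceWatermark(long watermark) {
        List<Long> fired = new ArrayList<>();
        while (!timers.isEmpty() && timers.first() <= watermark) {
            fired.add(timers.pollFirst());
        }
        return fired;
    }

    public static void main(String[] args) {
        EventTimeTimerSketch service = new EventTimeTimerSketch();
        service.registerEventTimeTimer(1000L + 60000L);  // event at t=1000
        service.registerEventTimeTimer(5000L + 60000L);  // event at t=5000
        service.registerEventTimeTimer(1000L + 60000L);  // duplicate timestamp, kept once
        System.out.println(service.advanceWatermark(60000L));
        System.out.println(service.advanceWatermark(61000L));
        System.out.println(service.advanceWatermark(70000L));
    }
}
```

In this model nothing fires based on wall-clock time: a timer set 60 seconds after an event only runs once the stream's event time has progressed that far.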
Flink at 300M messages per second (150TB/s)
• Examples
  ◦ State and checkpointing
    ▪ Scale state beyond memory using the built-in RocksDB state backend
    ▪ Fast, incremental, asynchronous checkpoints
  ◦ Network stack (Netty)
    ▪ Native backpressure support, optimized for both latency and throughput
  ◦ SQL
    ▪ Optimized using Apache Calcite, micro-batched aggregations, skew handling, efficient internal data format
[Diagram labels: Low Cost, Low Latency, High throughput, Efficiency, In real-time]
Source: https://www.slideshare.net/FlinkForward/flink-powered-stream-processing-platform-at-pinterest
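One of the items above, micro-batched aggregation, can be illustrated with a small plain-Java sketch. It is a conceptual model rather than Flink's implementation: `MiniBatchSketch`, its methods, and the write counter are hypothetical. The sketch pre-aggregates each batch in memory and then touches the (potentially expensive) keyed state once per distinct key in the batch instead of once per record.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class MiniBatchSketch {

    private final Map<String, Long> state = new HashMap<>();
    private long stateWrites = 0;

    /** Per-record path: one state update for every incoming event. */
    public void processPerRecord(List<String> events) {
        for (String key : events) {
            updateState(key, 1L);
        }
    }

    /** Micro-batched path: pre-aggregate each batch, then one state update per distinct key. */
    public void processMicroBatched(List<String> events, int batchSize) {
        for (int start = 0; start < events.size(); start += batchSize) {
            int end = Math.min(start + batchSize, events.size());
            Map<String, Long> partial = new HashMap<>();
            for (String key : events.subList(start, end)) {
                partial.merge(key, 1L, Long::sum);
            }
            for (Map.Entry<String, Long> entry : partial.entrySet()) {
                updateState(entry.getKey(), entry.getValue());
            }
        }
    }

    private void updateState(String key, long delta) {
        state.merge(key, delta, Long::sum);
        stateWrites++;
    }

    public Map<String, Long> state() {
        return state;
    }

    public long stateWrites() {
        return stateWrites;
    }

    public static void main(String[] args) {
        List<String> events = Arrays.asList("a", "a", "a", "b", "a", "a", "b", "b");
        MiniBatchSketch perRecord = new MiniBatchSketch();
        perRecord.processPerRecord(events);
        MiniBatchSketch batched = new MiniBatchSketch();
        batched.processMicroBatched(events, 4);
        System.out.println("per-record writes: " + perRecord.stateWrites());
        System.out.println("micro-batched writes: " + batched.stateWrites());
    }
}
```

Both paths end with the same counts; the batched path trades a little latency (waiting for the batch to fill) for fewer state accesses, which matters most when keys repeat often within a batch.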
• Persist in-flight state via Savepoints, then upgrade the Flink version or the Flink application, or investigate/rewrite the state
• Observability: latency tracking, RocksDB metrics, operator/task/JVM-level performance metrics, Flame Graph UI, backpressure monitoring
• Local debugging/profiling: run the cluster code from your IDE or unit tests
• High availability with ZooKeeper or K8s etcd