Slide 1

Slide 1 text

Tiny Flink: Minimizing the Memory Footprint of Apache Flink
Robert Metzger, Staff Engineer @ Decodable
Apache Flink Committer and PMC Chair

Slide 2

Slide 2 text

What is Apache Flink? Stateful Computations over Data Streams
● Highly scalable
● Exactly-once processing semantics
● Event time semantics and watermarks
● Layered APIs: Streaming SQL (easy to use) ↔ DataStream (expressive)

Slide 3

Slide 3 text

Why Minimize Flink?
Motivation 1: Minimum Flink cluster size (with the K8s operator and Flink config defaults): 650mb + 1024mb = 1.67gb minimum memory for a Flink cluster.
Motivation 2: Deploy like other JVM-based microservices or apps, with no need for a cluster manager.
→ Unified deployment, monitoring and operations for all services
→ Not all use-cases process millions of events per second
Motivation 3: Local development & testing: Flink in your IDE, in integration tests and on your CI system. No datacenter required for local development.

Slide 4

Slide 4 text

Deployment Options for Apache Flink
● Kubernetes
  ○ Operator
  ○ Native Kubernetes: Flink manages resources on K8s
  ○ Standalone Kubernetes: resources are fixed/managed externally
● Docker
● Do it yourself (Cluster)
● MiniCluster

Slide 5

Slide 5 text

Enter Flink MiniCluster
● Similar utilities:
  ○ KafkaServer: launch a Kafka broker in your JVM
  ○ MiniDFSCluster: launch the Hadoop Distributed File System in your JVM

public static void main(String[] args) throws Exception {
  MiniClusterConfiguration clusterConfig = new MiniClusterConfiguration.Builder()
      .setNumTaskManagers(1)
      .setNumSlotsPerTaskManager(1)
      .build();
  try (var cluster = new MiniCluster(clusterConfig)) {
    cluster.start();
    cluster.submitJob(/*TODO*/);
  }
}
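One way the /*TODO*/ could be filled in: build a JobGraph from a StreamExecutionEnvironment and hand it to the MiniCluster. A minimal sketch, assuming Flink 1.17-era APIs; the toy topology and class name are illustrative, not from the talk:

import org.apache.flink.runtime.jobgraph.JobGraph;
import org.apache.flink.runtime.minicluster.MiniCluster;
import org.apache.flink.runtime.minicluster.MiniClusterConfiguration;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class MiniClusterSketch {
  public static void main(String[] args) throws Exception {
    MiniClusterConfiguration clusterConfig = new MiniClusterConfiguration.Builder()
        .setNumTaskManagers(1)
        .setNumSlotsPerTaskManager(1)
        .build();

    // Build a trivial topology and compile it down to a JobGraph.
    StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
    env.fromElements(1, 2, 3)
        .filter(i -> i > 1)
        .print();
    JobGraph jobGraph = env.getStreamGraph().getJobGraph();

    try (var cluster = new MiniCluster(clusterConfig)) {
      cluster.start();
      // Blocks until the (bounded) job finishes.
      cluster.executeJobBlocking(jobGraph);
    }
  }
}

executeJobBlocking() is convenient for bounded test jobs; submitJob() returns a future and is the fit for long-running streaming jobs.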

Slide 6

Slide 6 text

Flink MiniCluster: What's the Size of my JVM?
→ The JVM doesn't start with -Xmx60m, but it does with -Xmx65m.

Exception in thread "main" java.lang.OutOfMemoryError: Could not allocate enough memory segments for NetworkBufferPool (required (MB): 64, allocated (MB): 61, missing (MB): 3). Cause: Direct buffer memory. The direct out-of-memory error has occurred. This can mean two things: either job(s) require(s) a larger size of JVM direct memory or there is a direct memory leak. The direct memory can be allocated by user code or some of its dependencies. In this case 'taskmanager.memory.task.off-heap.size' configuration option should be increased. Flink framework and its dependencies also consume the direct memory, mostly for network communication. The most of network memory is managed by Flink and should not result in out-of-memory error. In certain special cases, in particular for jobs with high parallelism, the framework may require more direct memory which is not managed by Flink. In this case 'taskmanager.memory.framework.off-heap.size' configuration option should be increased. If the error persists then there is probably a direct memory leak in user code or some of its dependencies which has to be investigated and fixed. The task executor has to be shutdown...
  at org.apache.flink.runtime.io.network.buffer.NetworkBufferPool.<init>(NetworkBufferPool.java:149)
  at org.apache.flink.runtime.io.network.NettyShuffleServiceFactory.createNettyShuffleEnvironment(NettyShuffleServiceFactory.java:173)
  at org.apache.flink.runtime.io.network.NettyShuffleServiceFactory.createNettyShuffleEnvironment(NettyShuffleServiceFactory.java:128)
  at org.apache.flink.runtime.io.network.NettyShuffleServiceFactory.createNettyShuffleEnvironment(NettyShuffleServiceFactory.java:97)
  at org.apache.flink.runtime.io.network.NettyShuffleServiceFactory.createShuffleEnvironment(NettyShuffleServiceFactory.java:78)
  at org.apache.flink.runtime.io.network.NettyShuffleServiceFactory.createShuffleEnvironment(NettyShuffleServiceFactory.java:57)
  at org.apache.flink.runtime.taskexecutor.TaskManagerServices.createShuffleEnvironment(TaskManagerServices.java:446)
  at org.apache.flink.runtime.taskexecutor.TaskManagerServices.fromConfiguration(TaskManagerServices.java:304)
  at org.apache.flink.runtime.taskexecutor.TaskManagerRunner.startTaskManager(TaskManagerRunner.java:631)
  at org.apache.flink.runtime.minicluster.MiniCluster.startTaskManager(MiniCluster.java:755)
  at org.apache.flink.runtime.minicluster.MiniCluster.startTaskManagers(MiniCluster.java:736)
  at org.apache.flink.runtime.minicluster.MiniCluster.start(MiniCluster.java:457)
  at co.decodable.PassthroughKafka.initMiniClusterWithEnv(PassthroughKafka.java:62)
  at co.decodable.PassthroughKafka.main(PassthroughKafka.java:44)

Slide 7

Slide 7 text

Flink MiniCluster: Going even smaller
→ Optimizing the configuration: reducing the network buffer size from 64mb to 8mb allows setting -Xmx20m.

public static void main(String[] args) throws Exception {
  var flinkConfig = new Configuration();
  flinkConfig.set(TaskManagerOptions.NETWORK_MEMORY_MIN, MemorySize.parse("8m"));
  flinkConfig.set(TaskManagerOptions.NETWORK_MEMORY_MAX, MemorySize.parse("8m"));
  MiniClusterConfiguration clusterConfig = new MiniClusterConfiguration.Builder()
      .setNumTaskManagers(1)
      .setNumSlotsPerTaskManager(1)
      .setConfiguration(flinkConfig)
      .build();
  // ... start the MiniCluster and submit the job as before
}
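Outside the MiniCluster, the same limits can be set in flink-conf.yaml via the keys these options map to: taskmanager.memory.network.min and taskmanager.memory.network.max (both set to 8m).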

Slide 8

Slide 8 text

MiniCluster: What have we learned so far?
→ 20mb of heap space is sufficient to run an empty Flink cluster. The final process size is ~80mb.
Open items:
● Heap size and process size with a job running
● Throughput
● Heap and process size discrepancy

Slide 9

Slide 9 text

Flink MiniCluster: Making it real
→ Run a small Flink job reading from Kafka, filtering 1% of the data, writing to Kafka (see the sketch below).

                       Empty MiniCluster   MiniCluster with Kafka source and sink
Configured Heap        -Xmx20m             -Xmx35m
Approx. Process Size   80mb                190mb

⚠ Caveats
● Checkpointing is not enabled
● We are using a Flink job with minimal state, no RocksDB
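A minimal sketch of the kind of job described above, assuming Flink 1.17's KafkaSource/KafkaSink connector APIs; the topic names, bootstrap servers, and the 1%-selection predicate are illustrative, not taken from the talk:

import org.apache.flink.api.common.eventtime.WatermarkStrategy;
import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.connector.kafka.sink.KafkaRecordSerializationSchema;
import org.apache.flink.connector.kafka.sink.KafkaSink;
import org.apache.flink.connector.kafka.source.KafkaSource;
import org.apache.flink.connector.kafka.source.enumerator.initializer.OffsetsInitializer;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class PassthroughKafkaSketch {
  public static void main(String[] args) throws Exception {
    StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

    KafkaSource<String> source = KafkaSource.<String>builder()
        .setBootstrapServers("localhost:9092")   // illustrative
        .setTopics("input-topic")                // illustrative
        .setGroupId("tiny-flink")
        .setStartingOffsets(OffsetsInitializer.earliest())
        .setValueOnlyDeserializer(new SimpleStringSchema())
        .build();

    KafkaSink<String> sink = KafkaSink.<String>builder()
        .setBootstrapServers("localhost:9092")
        .setRecordSerializer(KafkaRecordSerializationSchema.builder()
            .setTopic("output-topic")            // illustrative
            .setValueSerializationSchema(new SimpleStringSchema())
            .build())
        .build();

    env.fromSource(source, WatermarkStrategy.noWatermarks(), "kafka-source")
        // Keep roughly 1% of records; this hash-based predicate is a guess at
        // what "filtering 1% of the data" means in the talk.
        .filter(value -> (value.hashCode() & Integer.MAX_VALUE) % 100 == 0)
        .sinkTo(sink);

    env.execute("passthrough-kafka");
  }
}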

Slide 10

Slide 10 text

Benchmarking the Throughput: Setup
Hardware: 10-core Apple M1 Max, 32GB memory, 1TB SSD
Versions: Flink 1.17.0, Kafka 3.2.3

Slide 11

Slide 11 text

Benchmarking the Throughput: Results

Heap limit   Throughput                               Real memory
35 mb        Out of Memory under light load (3mb/s)
50 mb        25 mb/s                                  200 mb
100 mb       97 mb/s                                  260 mb

Slide 12

Slide 12 text

Benchmarks in Detail: -Xmx50m Heap Limit
[Chart: throughput measured at load steps of 6mb/s, 12mb/s, 25mb/s and 36mb/s]

Slide 13

Slide 13 text

Benchmarks in Detail: -Xmx100m Heap Limit
[Chart: sustained throughput of ~100 mb/s]

Slide 14

Slide 14 text

Heap vs Native Memory

Heap limit   Throughput                               Real memory
35 mb        Out of Memory under light load (3mb/s)
50 mb        25 mb/s                                  200 mb
100 mb       97 mb/s                                  260 mb

Enabling JVM native memory tracking and getting a report [1]:
● JVM argument: -XX:NativeMemoryTracking=summary
● jcmd <pid> VM.native_memory baseline
● jcmd <pid> VM.native_memory summary.diff

[1] https://docs.oracle.com/en/java/javase/17/vm/native-memory-tracking.html
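jcmd needs the target JVM's pid; a small sketch (the class name is hypothetical) of what a MiniCluster main() could print at startup, using the Java 9+ ProcessHandle API:

public class PidHelper {
  public static void main(String[] args) {
    // ProcessHandle (Java 9+) exposes this JVM's pid, which jcmd needs to attach.
    long pid = ProcessHandle.current().pid();
    System.out.println("jcmd " + pid + " VM.native_memory baseline");
    System.out.println("jcmd " + pid + " VM.native_memory summary.diff");
  }
}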

Slide 15

Slide 15 text

Heap vs Native Memory
● Heap (106mb)
● Thread (217mb)
● Metaspace (58mb)
● GC (56mb)

Slide 16

Slide 16 text

Heap vs Native Memory
● Reduce the thread stack size from the default 1mb to 256kb: -Xss256k
  ○ Higher risk of stack overflow exceptions
  ○ We have ~100 threads
● GC is using 56mb. Using the more lightweight Serial GC (-XX:+UseSerialGC) reduces this to 390 bytes (smaller GC structures and fewer threads)
  ○ This reduces the throughput of Flink
Further reading: https://shipilev.net/jvm/anatomy-quarks/12-native-memory-tracking/
After tuning:
● Heap (106mb)
● Thread (27mb)
● Metaspace (56mb)
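Putting the tuning flags together, an invocation might look like this (illustrative; the jar and main class are placeholders):

java -Xmx50m -Xss256k -XX:+UseSerialGC -XX:NativeMemoryTracking=summary -cp app.jar your.MainClass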

Slide 17

Slide 17 text

Conclusions
● 🔬 We can scale Flink down to a process size of ~250mb.
● 🚀 One MiniCluster is still able to process ~100mb/s.
● The MiniCluster runs the same code as a distributed cluster:
  ○ Migrate from MiniCluster to a distributed cluster by restoring the latest checkpoint or a savepoint (see the sketch below)
  ○ Supports HA (e.g. you can kill the process and it will continue where it left off)
  ○ Supports metrics and logging integrations
  ○ Supports the Flink Web UI (and REST API)
● All code examples: https://github.com/rmetzger/tiny-flink-talk
Switch icons created by Gregor Cresnar - Flaticon
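A minimal sketch of the savepoint-restore migration path, assuming Flink 1.17 APIs; the topology, class name and savepoint path are placeholders:

import org.apache.flink.runtime.jobgraph.JobGraph;
import org.apache.flink.runtime.jobgraph.SavepointRestoreSettings;
import org.apache.flink.runtime.minicluster.MiniCluster;
import org.apache.flink.runtime.minicluster.MiniClusterConfiguration;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class RestoreFromSavepointSketch {
  public static void main(String[] args) throws Exception {
    StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
    env.fromElements("a", "b", "c").print(); // stand-in for the real topology

    // Attach savepoint restore settings to the JobGraph before submission;
    // the same JobGraph could be submitted to a distributed cluster instead.
    JobGraph jobGraph = env.getStreamGraph().getJobGraph();
    jobGraph.setSavepointRestoreSettings(
        SavepointRestoreSettings.forPath("file:///tmp/savepoints/savepoint-xyz")); // placeholder path

    try (var cluster = new MiniCluster(new MiniClusterConfiguration.Builder().build())) {
      cluster.start();
      cluster.executeJobBlocking(jobGraph);
    }
  }
}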

Slide 18

Slide 18 text

Tiny Flink: Minimizing the Memory Footprint of Apache Flink
Robert Metzger, Staff Engineer @ Decodable
Apache Flink Committer and PMC Chair
Q&A
Follow me: @rmetzger_

Slide 19

Slide 19 text

No content

Slide 20

Slide 20 text

100 Threads?! Heavy hitters:
● Flink REST server workers (20)
● Akka (10)
● Common pool (10)
● IO threads (4)
● RestClusterClient (4)