The Top 5 Mistakes Deploying Apache Flink (Webinar)
Robert Metzger (@rmetzger_), Decodable
Eric Sammer (@esammer), Decodable
#1 Mistake: Serialization is expensive
- Mistake: People use Java Maps, Sets, etc. to store state or to transfer data over the network
- Serialization happens when
  - transferring data over the network (between TaskManagers or from/to sources/sinks)
  - accessing state in RocksDB (even in-memory)
  - sending data between non-chained tasks locally
- Serialization costs a lot of CPU cycles
#1 Mistake: Serialization is expensive
Example:

    public record OptimizedLocation(int startLon, int startLat, int endLon, int endLat) {}
    DataStream<OptimizedLocation> s2 = ...

- 16 bytes → 7.5x reduction in data
- Fewer object allocations = fewer CPU cycles
Disclaimer: The actual binary representation used by Kryo might differ; this is for demonstration purposes only.
Further reading: “Flink Serialization Tuning Vol. 1: Choosing your Serializer — if you can” https://flink.apache.org/news/2020/04/15/flink-serialization-tuning-vol-1.html
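
The size difference between a fixed layout and a generic map can be illustrated without Flink at all. The following is a minimal sketch, assuming plain `java.io.DataOutputStream` as a stand-in encoder: it is not Flink's or Kryo's wire format, and the class name, method names, and sample coordinates are hypothetical. It only shows why four bare ints fit in 16 bytes while a map must also carry its keys.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration class; not part of Flink.
final class SerializationSizeSketch {

    // Fixed layout: four ints written back to back, no field names, no type tags.
    static byte[] writeFixed(int startLon, int startLat, int endLon, int endLat) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(startLon);
            out.writeInt(startLat);
            out.writeInt(endLon);
            out.writeInt(endLat);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Generic layout: an entry count, then every key as a string next to its value.
    static byte[] writeMap(Map<String, Integer> location) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(location.size());
            for (Map.Entry<String, Integer> entry : location.entrySet()) {
                out.writeUTF(entry.getKey());
                out.writeInt(entry.getValue());
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) {
        // Sample coordinates are arbitrary placeholder values.
        Map<String, Integer> asMap = new LinkedHashMap<>();
        asMap.put("startLon", 10);
        asMap.put("startLat", 20);
        asMap.put("endLon", 30);
        asMap.put("endLat", 40);

        System.out.println("fixed layout: " + writeFixed(10, 20, 30, 40).length + " bytes");
        System.out.println("map layout:   " + writeMap(asMap).length + " bytes");
    }
}
```

The exact byte counts of a real Flink job depend on the serializer that Flink selects for the type, so treat the numbers printed here as a direction, not a measurement.
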
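#2 Mistake: Flink doesn’t always need to be distributed
- Flink’s MiniCluster allows you to spin up a full-fledged Flink cluster with everything known from distributed clusters (RocksDB, checkpointing, the web UI, SQL, …)

    import org.apache.flink.runtime.minicluster.MiniCluster;
    import org.apache.flink.runtime.minicluster.MiniClusterConfiguration;
    import org.apache.flink.streaming.api.environment.RemoteStreamEnvironment;

    // Configure an in-process cluster with one TaskManager and one slot
    var clusterConfig = new MiniClusterConfiguration.Builder()
        .setNumTaskManagers(1)
        .setNumSlotsPerTaskManager(1)
        .build();
    var cluster = new MiniCluster(clusterConfig);
    cluster.start();
    // The REST address is returned as a future; block until it is available
    var clusterAddress = cluster.getRestAddress().get();
    // Submit jobs to the MiniCluster as if it were a remote cluster
    var env = new RemoteStreamEnvironment(clusterAddress.getHost(), clusterAddress.getPort());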
#2 Mistake: Flink doesn’t always need to be distributed
- Use-cases
  - Local debugging and performance profiling: step through the code as it executes, sample the most frequently used code paths
  - Testing: make sure your Flink jobs work in end-to-end tests (together with Kafka’s MiniCluster, MinIO as an S3 replacement). Check out https://www.testcontainers.org/
  - Processing small streams efficiently
#3 Advice: Deploy one job per cluster, use standalone mode
… unless you have a good reason to do something else.
- Flink’s deployment options might seem confusing. Here’s a simple framework to think about it:
  - Flink has 3 execution modes
    - Session mode
    - Per-job mode
    - Application mode (preferred)
  - Flink has 2 deployment models
    - Integrated (active): Native K8s, YARN, (Mesos)
      - Flink requests resources from the resource manager as needed
    - Standalone (passive): well suited for K8s, bare metal, local deployment, DIY
      - Resources are provided to Flink from the outside world
#3 Execution Modes
- Session Mode: multiple jobs (Job1, Job2, Job3) share a JobManager
- Application Mode: one job per JobManager, planned on the JobManager (recommended as default)
- Per-Job Mode: one job per JobManager, planned outside the JobManager
#4 Mistake: Inappropriate cluster sizing
- Mistake: Under- or over-provisioning of clusters for a given workload
- Understand the amount of data you have incoming and outgoing
  - How much network bandwidth do you have? How much throughput does your Kafka have?
- Understand the amount of state you’ll need in Flink
  - Which state backend do you use?
  - How much memory / disk space do you have available (per instance, in your cluster)?
  - How fast is your connection to your state backup (e.g. S3)? This will give you a baseline for the checkpointing times
Solution: Proper cluster sizing
- Do a back-of-the-napkin calculation of your use case in your environment
  - … assuming normal operation (“baseline”). Include a buffer for spiky loads (failure recovery, …)
Excursion: State & Checkpointing
How much state are we checkpointing?
- per machine: 40 bytes * 5 windows * 100,000,000 keys = 20 GB
- We checkpoint every minute, so: 20 GB / 60 seconds = 333 MB/s
How is the Window operator accessing state on disk?
- For each key-value access, we need to retrieve 40 bytes from disk, update the aggregates, and put 40 bytes back
- per machine: 40 bytes * 5 windows * 200,000 msg/sec = 40 MB/s
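
The napkin math above is easy to keep in a small helper so it can be rerun with your own numbers. This is a minimal sketch: the class and method names are hypothetical, the inputs are the ones from the slide, units are decimal (1 GB = 10^9 bytes), and it assumes the full state is written on every checkpoint, as the slide's calculation does.

```java
// Hypothetical helper for back-of-the-napkin sizing; not part of Flink.
final class CheckpointSizingSketch {

    static final long BYTES_PER_WINDOW_ENTRY = 40L; // from the slide
    static final long WINDOWS_PER_KEY = 5L;         // from the slide

    // Total state held per machine for the given number of keys.
    static long stateBytes(long keys) {
        return BYTES_PER_WINDOW_ENTRY * WINDOWS_PER_KEY * keys;
    }

    // Average upload rate needed to finish a full checkpoint within one interval.
    static double checkpointBytesPerSecond(long stateBytes, long intervalSeconds) {
        return (double) stateBytes / intervalSeconds;
    }

    // Bytes touched per second when every message updates all of its windows.
    static long stateAccessBytesPerSecond(long messagesPerSecond) {
        return BYTES_PER_WINDOW_ENTRY * WINDOWS_PER_KEY * messagesPerSecond;
    }

    public static void main(String[] args) {
        long state = stateBytes(100_000_000L);
        double checkpointRate = checkpointBytesPerSecond(state, 60L);
        long accessRate = stateAccessBytesPerSecond(200_000L);

        System.out.printf("state per machine:  %.0f GB%n", state / 1e9);
        System.out.printf("checkpoint upload:  %.0f MB/s%n", checkpointRate / 1e6);
        System.out.printf("state access rate:  %.0f MB/s%n", accessRate / 1e6);
    }
}
```

Since each access both reads and writes 40 bytes, the access rate applies in each direction; add your own buffer for spiky loads on top of these baseline figures.
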
Cluster sizing: Conclusion
- This was just a “back of the napkin” approximation! Real-world results will differ!
- Ignored network factors
  - Protocol overheads (Ethernet, IP, TCP, …)
  - RPC (Flink’s own RPC, Kafka, checkpoint store)
  - Checkpointing causes network bursts
  - A window emission causes bursts
  - Other systems using the network
- CPU, memory, disk access speed have not been considered
#5 Advice: Ask for Help!
- Most problems have already been solved online
- Official, old-school way: the Flink mailing list
  - Indexed by Google, searchable through https://lists.apache.org/
- Stack Overflow: the apache-flink tag has 6300 questions!
- Apache Flink Slack instance
- Global meetup communities, Flink Forward (w/ training)
Get Started with Decodable
- Visit http://decodable.co
- Start Free http://app.decodable.co
- Read the docs http://docs.decodable.co
- Watch demos on our YouTube Channel
- Join our community Slack channel
- Join us for future Demo Days and Webinars!