Apache Flink was designed for, and is mostly used with, large-scale real-time data processing use cases. Companies report processing terabytes of data per second, or keeping terabytes of state in huge clusters.
But what if you need to process low-throughput streams? Running a full, distributed Flink cluster might be overkill, as distributed coordination adds quite a bit of overhead.
In this talk, we’ll explore options to reduce your resource footprint. We’ll dive deeper into Flink’s MiniCluster, which lets you run Flink in-JVM for integration tests, as a microservice, or as a small data processor in Kubernetes. We will also discuss lessons learned from running MiniCluster in production for a service offering Flink SQL in the cloud.