Please read the blog post (https://medium.com/data-engineer-things/5-minute-practical-streaming-techniques-that-can-save-you-millions-6d6b49400308) that accompanies this deck.
Companies are looking for ways to reduce streaming infrastructure costs in the current macroeconomic environment. However, this is a difficult task for two reasons. First, cutting costs without sacrificing latency or correctness requires deep knowledge of engine implementation details and a keen eye for opportunities. Second, optimization techniques are less accessible when working with high-level language abstractions such as SQL, because these techniques are often coupled with the engine's query planning, which requires even deeper expertise. Many Data Engineers and Data Scientists prefer to avoid dealing with Intermediate Representations (IR) and optimization rules. They also may not care too deeply about the details of applying streaming watermarks to reduce the runtime complexity of Point-In-Time-Correct join queries.
In this talk, I will share some simple optimization techniques you can apply in just a few minutes with streaming SQL that can cut costs by 10x or even 100x. Then, we’ll gradually dive deeper into some novel optimization techniques that can be applied across your distributed storage and compute stacks.
By the end of this talk, if you are a Data Engineer or a Data Scientist who is looking to build real-time streaming workloads but has concerns about cost, I hope you’ll be able to walk away with some tricks so you can check that box on your product ROI OKR :) If you are a platform engineer, I hope you will learn how to apply optimization abstractions across various computing and storage engines in your platform.