Unlike just a few years ago, today the lakehouse architecture is an established data platform embraced by all major cloud data companies such as AWS, Azure, Google, Oracle, Microsoft, Snowflake and Databricks.
This session kicks off with a technical, no-nonsense introduction to the lakehouse concept, dives deep into the lakehouse architecture and recaps how a data lakehouse is built from the ground up with streaming as a first-class citizen.
Then we focus on serverless for streaming use cases. Serverless concepts are well-known from developers triggering hundreds of thousands of AWS Lambda functions at a negligible cost. However, the same concept becomes more interesting when looking at data platforms.
We have all heard about the principle "It runs best on Powerpoint", so I decided to skip slides here and bring a serverless demo instead:
A hands-on, fun, and interactive serverless streaming use case example where we ingest live events from hundreds of mobile devices (don't miss out - bring your phone and be part of it!!). Based on this use case I will critically explore how much of a modern lakehouse is serverless and how we implemented that at Databricks (spoiler alert: serverless is everywhere from data pipelines, workflows, optimized Spark APIs, to ML).
TL;DR benefits for the Data Practitioners:
-Recap the OSS foundation of the Lakehouse architecture and understand its appeal
- Understand the benefits of leveraging a lakehouse for streaming and what's there beyond Spark Structured Streaming.
- Meat of the talk: The Serverless Lakehouse. I give you the tech bits beyond the hype. How does a serverless lakehouse differ from other serverless offers?
- Live, hands-on, interactive demo to explore serverless data engineering data end-to-end. For each step we have a critical look and I explain what it means, e.g for you saving costs and removing operational overhead.