continuous, never-ending data flow with no beginning or end. The data is made available incrementally over time, so you can act on it without having to download it first.
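Acting on data as it arrives can be sketched with a Python generator. This is only an illustration: `sensor_feed` and `running_total` are hypothetical names, and the three hard-coded readings stand in for a source that would never finish in practice.

```python
from typing import Iterable, Iterator


def running_total(readings: Iterable[float]) -> Iterator[float]:
    """Yield an updated total after each reading, without waiting for the rest."""
    total = 0.0
    for value in readings:  # pulls one item at a time from the source
        total += value
        yield total


def sensor_feed() -> Iterator[float]:
    """Hypothetical source (assumption); a real stream would be unbounded."""
    yield from (1.5, 2.0, 0.5)


# Each result is available as soon as its input arrives.
totals = list(running_total(sensor_feed()))
```

The consumer never holds the full data set; it keeps only the running state (`total`).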
of a series of data points ordered in time. Each data point represents an “event” or a change in the state of the business.
[Figure: an event source emitting events T0 to T4 onto an event stream, laid out along a time axis]
with ID 1234 purchased item 567 for $3.99 on 2022/06/12 in Austin, TX

Fact         Value
User ID      1234
Item ID      567
Price Paid   $3.99
Date         2022/06/12
Place        Austin, TX

• Events represent facts.
• Events are immutable.
• Events belong to the past.
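One way to model such an event in code is as an immutable record. The sketch below uses a frozen Python dataclass; the class name `PurchaseEvent` and its field names are assumptions chosen to mirror the table above, not part of any particular platform.

```python
from dataclasses import FrozenInstanceError, dataclass
from datetime import date
from decimal import Decimal


@dataclass(frozen=True)  # frozen=True rejects attribute assignment after creation
class PurchaseEvent:
    user_id: int
    item_id: int
    price_paid: Decimal
    purchased_on: date
    place: str


event = PurchaseEvent(1234, 567, Decimal("3.99"), date(2022, 6, 12), "Austin, TX")

# An event is a fact about the past, so attempts to change it should fail.
try:
    event.price_paid = Decimal("0.99")
    mutated = True
except FrozenInstanceError:
    mutated = False
```

A correction would be expressed as a new event (for example a refund), not as an edit to the old one.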
in the form of Complex Event Processing (CEP) and Event Stream Processing (ESP).
• Most of the work was academic, but a few vendors such as Progress Apama, Esper, TIBCO, and StreamBase tried to bring it to the mass market.
performance engineering.
• Limited to the JVM: non-JVM developers had no option other than to adapt.
• Larger infrastructure footprint: stream processors are heavy on CPU and RAM.
• Maintenance overhead: both the speed and batch layers have to be maintained.
store them durably until they are processed.
• Based on an immutable, distributed log file. Events are appended to the log and partitioned across multiple servers for durability and scalability.
[Figure: event producers writing to multiple topics on a streaming data platform]
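The append-only, partitioned log can be sketched in a few lines of Python. This is a single-process, in-memory toy under stated assumptions: `PartitionedLog` is a hypothetical class, the CRC32-based partition choice is illustrative and not any platform's actual partitioner, and replication across servers is left out.

```python
import zlib
from typing import Any, List, Tuple


class PartitionedLog:
    """Toy append-only log split into partitions (no replication, no disk)."""

    def __init__(self, num_partitions: int) -> None:
        self._partitions: List[List[Tuple[str, Any]]] = [
            [] for _ in range(num_partitions)
        ]

    def append(self, key: str, value: Any) -> Tuple[int, int]:
        """Append an event and return (partition, offset). Nothing is overwritten."""
        # A stable hash keeps all events for one key in the same partition.
        partition = zlib.crc32(key.encode("utf-8")) % len(self._partitions)
        log = self._partitions[partition]
        log.append((key, value))
        return partition, len(log) - 1

    def read(self, partition: int, offset: int) -> List[Tuple[str, Any]]:
        """Return every event in a partition from the given offset onward."""
        return list(self._partitions[partition][offset:])


topic = PartitionedLog(num_partitions=3)
p1, o1 = topic.append("user-1234", "purchased item 567")
p2, o2 = topic.append("user-1234", "purchased item 890")
replayed = topic.read(p1, o1)
```

Because events stay in the log after being read, a consumer can replay from any earlier offset, which is what lets the platform hold events until they are processed.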
provide a better user experience.
• Support a throughput of hundreds of thousands of queries per second to serve an Internet-scale user base.
• Ensure data freshness: serve analytics from data ingested a few seconds ago.
• Run complex OLAP queries that support joins, aggregations, and filtering on large data sets.
Storage

Offline Use Cases
• Backfilling
• Hydrating new applications
• Experimentation (ad-hoc querying)
• Archival/regulatory compliance
• Training ML models
the stateful stream processing to the next level.
• SaaS offerings
• Integrated serving layer
• Write logic with SQL
• Pluggable integrations
• Affordable
• Developer friendly
• Pay-as-you-go
• Fewer components to manage
• Integrated tooling
• Caters to non-JVM developers
• Self-serve
warehouse, data lake, and an event streaming platform together.
• High-throughput streaming ingestion
• Change Data Capture
• Upserts
• Transactions
• Table formats
critical path, and iterate.
• Pick components based on the need, and know their limitations.
• Experiment, fail fast, and fail cheap.
• Go for managed services if the team is small and new to streaming technologies.
• Learn from mistakes, establish processes, and share wisdom!