Unbundling the Modern Streaming Stack - Dunith Dhanushka @ Current 2022

Dunith Dhanushka

October 06, 2022

Transcript

  1. Background
    • This talk is based on a blog post that I published in April 2022.
    • It has been updated with a few new things since then.
    • Enjoy!
  2. Goal of the Talk: What Are We Going To Talk About Today?
    Introduce you to the components required to build real-time applications that extract value from streaming data.
  3. The Plan: The Order of Things
    1. A refresher on streaming data
    2. The classic streaming stack
    3. The modern streaming stack
    4. Current trends and the future outlook
  4. Streaming Data: What Is a Stream?
    A stream is a continuous, never-ending flow of data with no beginning or end. The data is made available incrementally over time, so you can act on it without having to download it in full first.
  5. Events: Streams Are Made of Events
    A data stream consists of a series of data points ordered in time. Each data point represents an “event”, or a change in the state of the business.
    [Diagram labels: Event source, Event stream, Events T0 to T4, Time]
  6. Event First Thinking: Modelling State Changes in Systems
    A user with ID 1234 purchased item 567 for $3.99 on 2022/06/12 in Austin, TX.

    Fact        Value
    User ID     1234
    Item ID     567
    Price Paid  $3.99
    Date        2022/06/12
    Place       Austin, TX

    • Events represent facts.
    • Events are immutable.
    • Events belong to the past.
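
The idea on slide 6, that an event is an immutable fact about the past, can be sketched in Python. This is only an illustration, not code from the talk: the `PurchaseEvent` class and its field names are hypothetical, chosen to mirror the facts listed on the slide.

```python
from dataclasses import dataclass, FrozenInstanceError
from datetime import date
from decimal import Decimal


# Hypothetical event type; frozen=True makes instances read-only after creation.
@dataclass(frozen=True)
class PurchaseEvent:
    user_id: int
    item_id: int
    price_paid: Decimal
    purchased_on: date
    place: str


event = PurchaseEvent(
    user_id=1234,
    item_id=567,
    price_paid=Decimal("3.99"),
    purchased_on=date(2022, 6, 12),
    place="Austin, TX",
)

# Trying to rewrite the past fails: a fact that has happened cannot be changed.
try:
    event.price_paid = Decimal("0.99")
    mutated = True
except FrozenInstanceError:
    mutated = False
```

A correction, such as a refund, would be modelled as a new event rather than as an edit to this one.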
  7. Events Have a Shelf Life: Act Fast Before You Lose Their Value
    Image credit - https://d3i71xaburhd42.cloudfront.net/8cb6c2711afd3e504400ee12d3b582cc06348b08/7-Figure2-1.png
  8. Real-time Analytics: Extracting Value From Events As Soon As They Are Made Available
    [Diagram labels: Streams of Events, Real-time Analytics, Insights, React]
  9. A streaming stack is the set of processes, tools, and technologies you use to derive insights from unbounded data.
  10. The Beginning
    • Real-time analytics dates back decades, in the form of Complex Event Processing (CEP) and Event Stream Processing (ESP).
    • Most of the work was academic, but a few vendors such as Progress Apama, Esper, Tibco, and Streambase tried to bring it to the mass market.
  11. Lambda Architecture: Promotes a Unified Serving Layer
    Image credit - https://www.databricks.com/glossary/lambda-architecture
  12. • Overly complicated technology: it demanded a specialised skill set in distributed systems and performance engineering.
    • Limited to the JVM: non-JVM developers had no option other than to adapt.
    • Higher infrastructure footprint: stream processors are heavy on CPU and RAM.
    • Maintenance overhead: both the speed and batch layers had to be maintained.
  13. Modern Streaming Stack
    Modern
    • Cloud-native tools
    • Managed and serverless platforms
    • Rich tooling and developer experience
    • Expressive programming model
  14. The modern streaming stack (MSS) is the classic streaming stack reimagined with self-service, cloud-native tools that provide a simplified yet powerful developer experience for building real-time analytics applications.
  15. Modern Streaming Stack
    [Diagram labels: Event Producers, Streaming Data Platform, Stream Processing, Serving Layer, Tiered Storage, Data API, Metadata & Governance, Data-driven Applications, Operational Systems, Real-time Analytics]
  16. • Ingest events from sources in a scalable manner, and store them durably until they are processed.
    • Based on an immutable, distributed log. Events are appended to the log and partitioned across multiple servers for durability and scalability.
    [Diagram labels: Event Producers, Streaming Data Platform, six Topics]
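
The append-only, partitioned log described on slide 16 can be illustrated with a small in-memory sketch. It is a toy model under stated assumptions, not how any particular platform is implemented: the `PartitionedLog` class, its method names, and the CRC32-based partitioner are all hypothetical, and real platforms add replication and disk persistence on top.

```python
import zlib
from collections import defaultdict
from typing import DefaultDict, List, Tuple


class PartitionedLog:
    """Toy append-only log split into a fixed number of partitions."""

    def __init__(self, num_partitions: int) -> None:
        self.num_partitions = num_partitions
        self._partitions: DefaultDict[int, List[Tuple[str, str]]] = defaultdict(list)

    def append(self, key: str, value: str) -> Tuple[int, int]:
        # A stable hash of the key picks the partition, so events with the
        # same key always land in the same partition and keep their order.
        partition = zlib.crc32(key.encode("utf-8")) % self.num_partitions
        records = self._partitions[partition]
        records.append((key, value))  # records are only ever appended
        return partition, len(records) - 1  # (partition, offset)

    def read(self, partition: int, offset: int = 0) -> List[Tuple[str, str]]:
        # Consumers read forward from an offset; reading never removes data.
        return list(self._partitions[partition][offset:])


log = PartitionedLog(num_partitions=3)
p1, o1 = log.append("user-1234", "purchased item 567")
p2, o2 = log.append("user-1234", "purchased item 890")
```

Because reads are just a position in the log, several consumers can process the same events independently, each tracking its own offset.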
  17. Stream Processing
    Streaming ETL
    • Stream joins for enrichment
    • Filtering/routing/transforming streams
    • Data integration
    • Repartitioning streams (re-keying)
    Streaming Analytics
    • Stateful aggregations
    • Window operations
    • Materialising streams, stream-table duality
    Event-driven Microservices
    • Actors
    • Reactive logic execution
    • Event-by-event processing, triggering side effects
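
As an illustration of the stateful aggregations and window operations listed on slide 17, here is a minimal tumbling-window count in Python. It is a sketch under simplifying assumptions: the function name `tumbling_window_counts` is hypothetical, timestamps are integer seconds, and the input is a finite list, whereas a real stream processor works on unbounded input and must also handle late and out-of-order events.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def tumbling_window_counts(
    events: Iterable[Tuple[int, str]], window_seconds: int
) -> Dict[Tuple[int, str], int]:
    """Count events per key in fixed, non-overlapping time windows."""
    counts: Counter = Counter()
    for timestamp, key in events:
        # Each event belongs to exactly one window, identified by its start time.
        window_start = timestamp - (timestamp % window_seconds)
        counts[(window_start, key)] += 1
    return dict(counts)


# (timestamp in seconds, item key) pairs; the values are made up for illustration.
events = [
    (0, "item-567"),
    (30, "item-567"),
    (59, "item-890"),
    (60, "item-567"),
    (125, "item-567"),
]
result = tumbling_window_counts(events, window_seconds=60)
```

The `counts` mapping is the state the aggregation carries between events, which is what makes the operation stateful.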
  18. [Diagram labels: Event Streaming Platform, Input Topic, Output Topic, Stream Processing, Events, Streaming ingestion, Serving Layer, Real-time Insights, Consumption, Internal/user-facing Analytics, Data Applications, Recommendation, Ad-hoc Exploration]
  19. Serving Layer Expectations
    • Serve queries with sub-second latency to provide a better user experience.
    • Support a throughput of hundreds of thousands of queries per second to serve an Internet-scale user base.
    • Ensure data freshness: serve analytics from data ingested a few seconds ago.
    • Run complex OLAP queries, supporting joins, aggregations, and filtering on large data sets.
  20. [Diagram labels: Streaming Data Platform, Serving Layer, New Events, Older Events, Tiered Storage]
    Offline Use Cases
    • Backfilling
    • Hydrating new applications
    • Experimentation (ad-hoc querying)
    • Archival/regulatory compliance
    • Training ML models
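
One way to picture the split between new and older events on slide 20 is a log that keeps only recent events in a fast tier and moves older ones to cheaper storage, while readers still see a single continuous log. The sketch below is an assumption-laden toy, not a description of any product: the `TieredLog` class and its `hot_capacity` parameter are hypothetical, and plain Python lists stand in for local disk and object storage.

```python
from typing import List


class TieredLog:
    """Toy log that offloads older events from a hot tier to a cold tier."""

    def __init__(self, hot_capacity: int) -> None:
        self.hot_capacity = hot_capacity
        self.cold: List[str] = []  # stand-in for cheap long-term storage
        self.hot: List[str] = []   # stand-in for fast local storage

    def append(self, event: str) -> int:
        self.hot.append(event)
        # Once the hot tier is full, move its oldest events to the cold tier.
        while len(self.hot) > self.hot_capacity:
            self.cold.append(self.hot.pop(0))
        return len(self.cold) + len(self.hot) - 1  # offset of the new event

    def read_from(self, offset: int) -> List[str]:
        # Offsets span both tiers, so a backfill can start from offset 0
        # even though most of the history now lives in the cold tier.
        return (self.cold + self.hot)[offset:]


tiered = TieredLog(hot_capacity=2)
offsets = [tiered.append(f"event-{i}") for i in range(5)]
history = tiered.read_from(0)
```

Reading from offset 0 corresponds to the backfilling and application-hydration use cases on the slide.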
  21. Analytics must be democratised and accessible across the board…
    Image credits - https://www.datanami.com/2022/01/21/data-meshes-set-to-spread-in-2022/, https://www.confluent.io/blog/how-to-build-a-data-mesh-using-event-streams/
  22. Event Mesh
    [Diagram labels: Event Catalog, Schema Registry, Streaming API, GraphQL API, Serving Layer, Stream Processor, Event Streaming Platform, Real-time Insights, Decision makers, Data applications, Regulatory bodies, Business partners]
  23. Convergence of Stream Processing and Serving Layer
    Streaming databases take stateful stream processing to the next level.
    • SaaS offerings
    • Integrated serving layer
    • Write logic with SQL
    • Pluggable integrations
    • Affordable
    • Developer friendly
    • Pay-as-you-go
    • Fewer components to manage
    • Integrated tooling
    • Caters to non-JVM developers
    • Self-serve
  24. Rise of the Lakehouse Architecture
    A lakehouse combines a data warehouse, a data lake, and an event streaming platform.
    • High-throughput streaming ingestion
    • Change Data Capture
    • Upserts
    • Transactions
    • Table formats
  25. Takeaways: There’s No Silver Bullet
    • Start small, build the critical path, and iterate.
    • Pick components based on the need, and know their limitations.
    • Experiment, fail fast, and fail cheap.
    • Go for managed services if the team is small and new to streaming technologies.
    • Learn from mistakes, establish processes, and share wisdom!