Navigating the Streaming Landscape

Delivered at www.devday.lk virtual conference on 10th November 2022.

Dunith Dhanushka

November 10, 2022

Transcript

  1. Navigating the Streaming and Real-time Analytics Landscape Dunith Dhanushka -

    10/11/2022 Unbundling the Modern Streaming Stack #DevDay22
  2. About Me twitter.com/dunithd medium.com/event-driven-utopia linkedin.com/in/dunithd/ • Senior Developer Advocate at

    Redpanda Data • Big data solution architect -> DevRel • Blogs at eventdrivenutopia.com
  3. Goal of the Talk What Are We Going To Talk

    About Today? Introduce you to the tools and technologies required to build real-time applications that harness value from streaming data
  4. The Plan The Order of Things 1. A refresher on

    streaming data 2. The classic streaming stack 3. The modern streaming stack 4. Current trends and the future outlook
  5. Streaming Data What Is a stream? A stream is a

    continuous, never-ending data flow with no beginning or end. The data is made available incrementally over time, so you can act on it without having to download it all first.
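
To make the "act on it as it arrives" idea concrete, here is a minimal Python sketch. It assumes a hypothetical `sensor_readings` source with made-up values; nothing here comes from the talk itself. A generator stands in for the unbounded stream, and the consumer updates its result after every event instead of waiting for the whole data set.

```python
import itertools
from typing import Iterator


def sensor_readings() -> Iterator[int]:
    """Hypothetical unbounded source: yields one reading at a time, forever."""
    for n in itertools.count():
        yield 20 + (n % 5)  # made-up temperature values for illustration


def running_average(stream: Iterator[int]) -> Iterator[float]:
    """Emit an updated average after every event, without buffering the stream."""
    total = 0
    for count, value in enumerate(stream, start=1):
        total += value
        yield total / count


# The source never ends, so only the first five results are taken here.
first_five = list(itertools.islice(running_average(sensor_readings()), 5))
print(first_five)
```

Because both functions are lazy, no reading is produced until the consumer asks for it, which is roughly how a stream consumer pulls events over time.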
  6. Events Streams are made of events A data stream consists

    of a series of data points ordered in time. Each data point represents an “event” or a change in the state of the business. [Diagram: an event source emits events T0 to T4 onto an event stream, ordered along a time axis]
  7. Event First Thinking Modelling State Changes in Systems A user

    with ID 1234 purchased item 567 for $3.99 on 2022/06/12 at Austin, TX. Fact / Value: User ID = 1234; Item ID = 567; Price Paid = $3.99; Date = 2022/06/12; Place = Austin, TX. • Events represent facts. • Events are immutable. • Events belong to the past.
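
The purchase event above could be modelled as an immutable record. The sketch below is one possible way to do it in Python; the class name `ItemPurchased` and its field names are assumptions chosen for illustration, not something the slides define.

```python
from dataclasses import dataclass, FrozenInstanceError
from datetime import date
from decimal import Decimal


@dataclass(frozen=True)  # frozen=True makes instances read-only after creation
class ItemPurchased:
    """Hypothetical event type: a fact about something that already happened."""
    user_id: int
    item_id: int
    price_paid: Decimal
    purchased_on: date
    place: str


event = ItemPurchased(1234, 567, Decimal("3.99"), date(2022, 6, 12), "Austin, TX")

# Trying to rewrite the past fails: the event is immutable.
try:
    event.price_paid = Decimal("0.99")
    mutated = True
except FrozenInstanceError:
    mutated = False

print(event, mutated)
```

If the price later turns out to be wrong, the event-first approach would be to publish a new correcting event rather than edit this one.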
  8. Events Have A Shelf Life Act Fast Before You Lose

    Their Value Image credit - https://d3i71xaburhd42.cloudfront.net/8cb6c2711afd3e504400ee12d3b582cc06348b08/7-Figure2-1.png
  9. Real-time Analytics Extracting Value From Events As Soon as They

    Are Made Available REAL-TIME ANALYTICS Insights React Streams of Events
  10. A streaming stack is the processes, tools, and technologies you

    use to derive insights from unbounded data.
  11. The Beginning • Real-time analytics dates back decades, existing

    in the form of Complex Event Processing (CEP) and Event Stream Processing (ESP). • Most of the work was academic, but a few vendors such as Progress Apama, Esper, Tibco, and Streambase tried to bring it to the mass market.
  12. Lambda Architecture Promotes A Unified Serving Layer Image

    credit - https://www.databricks.com/glossary/lambda-architecture
  13. • Overly complicated technology: Requires a specialised skillset in distributed systems and

    performance engineering. • Limited to the JVM: Non-JVM developers had no option but to adapt. • Higher infrastructure footprint: Stream processors are heavy on CPU and RAM. • Maintenance overhead: Both the speed and batch layers have to be maintained.
  14. Modern Streaming Stack Modern Cloud-native tools Managed and Serverless platforms

    Rich tooling and developer experience Expressive programming model
  15. MSS is the classic streaming stack reimagined with self-service

    cloud-native tools providing a simplified yet powerful developer experience to build real-time analytics applications.
  16. Modern Streaming Stack STREAMING DATA PLATFORM STREAM PROCESSING EVENT PRODUCERS

    TIERED STORAGE DATA API, METADATA & GOVERNANCE Data-driven Applications Operational Systems Real-time Analytics SERVING LAYER
  17. • Ingest events from sources in a scalable manner, and

    store them durably until they are processed. • Based on an immutable, distributed log file. Events are appended to the log and partitioned across multiple servers for durability and scalability. [Diagram: event producers write to topics on the streaming data platform]
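
The append-only, partitioned log described above can be sketched as a toy in-memory structure. The `Topic` class, its method names, and the CRC32-based partitioning are assumptions made for this illustration; real streaming data platforms use their own partitioners and add replication and disk persistence, which this sketch leaves out.

```python
import zlib
from typing import List, Tuple

Record = Tuple[str, str]  # (key, value)


class Topic:
    """Toy in-memory topic: a fixed number of append-only partitions."""

    def __init__(self, name: str, num_partitions: int = 3) -> None:
        self.name = name
        self.partitions: List[List[Record]] = [[] for _ in range(num_partitions)]

    def append(self, key: str, value: str) -> Tuple[int, int]:
        """Append a record and return its (partition, offset)."""
        # A stable hash sends the same key to the same partition every time.
        partition = zlib.crc32(key.encode("utf-8")) % len(self.partitions)
        log = self.partitions[partition]
        log.append((key, value))  # records are only ever added, never updated
        return partition, len(log) - 1

    def read(self, partition: int, offset: int) -> List[Record]:
        """Return all records in a partition starting at the given offset."""
        return list(self.partitions[partition][offset:])


orders = Topic("orders")
p1, o1 = orders.append("user-1234", "purchased item 567")
p2, o2 = orders.append("user-1234", "purchased item 890")
print(orders.read(p1, o1))
```

Keying by user keeps each user's events in one partition, so their relative order is preserved while different keys can spread across partitions.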
  18. STREAM PROCESSING Streaming ETL • Stream joins for

    enrichment • Filtering/routing/transforming streams • Data integration • Repartitioning streams (re-keying) Streaming Analytics • Stateful aggregations • Window operations • Materialising streams, stream-table duality Event-driven Microservices • Actors • Reactive logic execution • Event-by-event processing, triggering side effects
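
As an illustration of the stateful aggregations and window operations listed above, here is a minimal sketch of a tumbling-window count in plain Python. The event shape `(timestamp, key)`, the function name, and the sample click data are all assumptions for this example; a real stream processor would also deal with late events, state persistence, and emitting results as windows close.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

Event = Tuple[int, str]  # (timestamp in seconds, key)


def tumbling_window_counts(
    events: Iterable[Event], window_size: int
) -> Dict[Tuple[int, str], int]:
    """Count events per key in fixed, non-overlapping time windows."""
    counts: Dict[Tuple[int, str], int] = defaultdict(int)
    for timestamp, key in events:
        # Round the timestamp down to the start of its window.
        window_start = (timestamp // window_size) * window_size
        counts[(window_start, key)] += 1  # state kept across events
    return dict(counts)


# Made-up page-view events for illustration.
clicks = [(0, "home"), (12, "home"), (59, "cart"), (61, "home"), (119, "cart")]
result = tumbling_window_counts(clicks, window_size=60)
print(result)
```

The returned dictionary behaves like a small table materialised from the stream, which is one way to picture the stream-table duality mentioned on the slide.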
  19. INPUT TOPIC OUTPUT TOPIC Event Streaming Platform STREAM PROCESSING Serving

    Layer Events Streaming ingestion Real-time Insights Consumption Internal/user-facing Analytics Data Applications Recommendation Ad-hoc Exploration
  20. Serving Layer Expectations • Serve queries with sub-second latency to

    provide a better user experience. • Support a throughput of hundreds of thousands of queries per second to serve an Internet-scale user base. • Ensure data freshness: serve analytics from data ingested a few seconds ago. • Run complex OLAP queries, supporting joins, aggregations, and filtering on large data sets.
  21. Serving Layer STREAMING DATA PLATFORM New Events Older Events Tiered

    Storage Offline Use Cases • Backfilling • Hydrating new applications • Experimentation (ad-hoc querying) • Archival/regulatory compliance • Training ML models
  22. Analytics must be democratised and accessible across the board… Image

    credits - https://www.datanami.com/2022/01/21/data-meshes-set-to-spread-in-2022/, https://www.confluent.io/blog/how-to-build-a-data-mesh-using-event-streams/
  23. Event Mesh EVENT CATALOG SCHEMA REGISTRY STREAMING API GRAPHQL API

    Serving Layer STREAM PROCESSOR EVENT STREAMING PLATFORM Decision makers Data applications Regulatory bodies Business partners Real-time Insights
  24. Convergence of Stream Processing and Serving Layer Streaming databases take

    stateful stream processing to the next level. SaaS offerings Integrated serving layer Write logic with SQL Pluggable integrations Affordable Developer friendly Pay-as-you-go Fewer components to manage Integrated tooling Caters to non-JVM developers Self-serve
  25. Rise of The Lakehouse Architecture A Lakehouse combines a data

    warehouse, data lake, and an event streaming platform together. High-throughput streaming ingestion Change Data Capture Upserts Transactions Table formats
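
To give a feel for how Change Data Capture and upserts fit together, the sketch below applies a stream of change events to a table keyed by primary key. The change-event shape `(operation, key, row)` and the function name `apply_changes` are assumptions made for illustration; actual lakehouse table formats implement this with transactional file-level operations rather than an in-memory dictionary.

```python
from typing import Dict, Iterable, Optional, Tuple

# Hypothetical change event: (operation, primary key, row or None)
Change = Tuple[str, int, Optional[dict]]


def apply_changes(table: Dict[int, dict], changes: Iterable[Change]) -> Dict[int, dict]:
    """Apply CDC-style changes in order: upsert inserts or overwrites, delete removes."""
    for op, key, row in changes:
        if op == "upsert":
            table[key] = row  # insert if the key is new, update otherwise
        elif op == "delete":
            table.pop(key, None)  # deleting a missing key is a no-op
        else:
            raise ValueError(f"unknown operation: {op}")
    return table


# Made-up change stream for illustration.
changes = [
    ("upsert", 1, {"name": "Ann", "city": "Austin"}),
    ("upsert", 2, {"name": "Bob", "city": "Colombo"}),
    ("upsert", 1, {"name": "Ann", "city": "Dallas"}),  # update to key 1
    ("delete", 2, None),
]
users = apply_changes({}, changes)
print(users)
```

Replaying the same ordered change stream from the start always produces the same table, which is why ordering guarantees on the ingestion side matter.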
  26. Takeaways There’s No Silver Bullet • Start small, build the

    critical path, and iterate. • Pick components based on the need and know their limitations. • Experiment, fail fast, and fail cheap. • Go for managed services if the team is small and new to streaming technologies. • Learn from mistakes, establish processes, and share wisdom!