Navigating the Streaming Landscape

Delivered at the www.devday.lk virtual conference on 10th November 2022.

Dunith Dhanushka

November 10, 2022

Transcript

  1. None
  2. Navigating the Streaming and Real-time Analytics Landscape

    Dunith Dhanushka - 10/11/2022. Unbundling the Modern Streaming Stack. #DevDay22
  3. About Me

    twitter.com/dunithd, medium.com/event-driven-utopia, linkedin.com/in/dunithd/
    • Senior Developer Advocate at Redpanda Data
    • Big data solution architect -> DevRel
    • Blogs at eventdrivenutopia.com
  4. None
  5. Goal of the Talk: What Are We Going To Talk About Today?

    Introduce you to the tools and technologies required to build real-time applications that harness value from streaming data.
  6. The Plan: The Order of Things

    1. A refresher on streaming data
    2. The classic streaming stack
    3. The modern streaming stack
    4. Current trends and the future outlook
  7. What is a Streaming Stack?

  8. Streaming Data: What Is a Stream?

    A stream is a continuous, never-ending data flow with no beginning or end. The data is made available incrementally over time, so you can act on it without needing to download it first.
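
The idea of acting on data as it arrives, rather than after a full download, can be sketched with a Python generator. This is only an illustration: `sensor_readings` and `running_total` are hypothetical names, and the counter stands in for a real event source.

```python
import itertools
from typing import Iterator


def sensor_readings() -> Iterator[int]:
    """Hypothetical unbounded source: yields one reading at a time, forever."""
    for n in itertools.count():
        yield n % 5  # stand-in for a real measurement


def running_total(stream: Iterator[int]) -> Iterator[int]:
    """Emit an updated total after every element, without waiting for an end."""
    total = 0
    for value in stream:
        total += value
        yield total


# Take the first five results; the underlying stream itself never terminates.
first_five = list(itertools.islice(running_total(sensor_readings()), 5))
print(first_five)
```

Each result is available as soon as its input arrives, which is the property the slide describes.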
  9. Events: Streams Are Made of Events

    A data stream consists of a series of data points ordered in time. Each data point represents an “event” or a change in the state of the business.
    [Figure labels: Event source, Event stream, Events T0 to T4, Time]
  10. Event First Thinking: Modelling State Changes in Systems

    A user with ID 1234 purchased item 567 for $3.99 on 2022/06/12 in Austin, TX.

    Fact         Value
    User ID      1234
    Item ID      567
    Price Paid   $3.99
    Date         2022/06/12
    Place        Austin, TX

    • Events represent facts.
    • Events are immutable.
    • Events belong to the past.
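
One way to model the purchase above as an immutable fact is a frozen dataclass. This is a minimal sketch: the class name `PurchaseEvent` and its field names are assumptions made for illustration, not something the talk prescribes.

```python
from dataclasses import dataclass, FrozenInstanceError
from datetime import date
from decimal import Decimal


@dataclass(frozen=True)  # frozen=True rejects attribute assignment after creation
class PurchaseEvent:
    user_id: int
    item_id: int
    price_paid: Decimal
    occurred_on: date  # events describe something that already happened
    place: str


event = PurchaseEvent(1234, 567, Decimal("3.99"), date(2022, 6, 12), "Austin, TX")

try:
    event.price_paid = Decimal("0.99")  # attempt to rewrite history
    mutated = True
except FrozenInstanceError:
    mutated = False

print(mutated)
```

A correction would be expressed as a new event (for example a refund) rather than by editing the original record.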
  11. Making Sense of Streaming Data

  12. Events Have A Shelf Life: Act Fast Before You Lose Their Value

    Image credit - https://d3i71xaburhd42.cloudfront.net/8cb6c2711afd3e504400ee12d3b582cc06348b08/7-Figure2-1.png
  13. Real-time Analytics: Extracting Value From Events As Soon as They Are Made Available

    [Diagram labels: Streams of Events, REAL-TIME ANALYTICS, Insights, React]
  14. What is a Streaming Stack?

  15. A streaming stack is the set of processes, tools, and technologies you use to derive insights from unbounded data.
  16. The Classic Streaming Stack

  17. The Beginning

    • Real-time analytics dates back decades, in the form of Complex Event Processing (CEP) and Event Stream Processing (ESP).
    • Most of the work was academic, but a few vendors such as Progress Apama, Esper, Tibco, and Streambase tried to bring it to the mass market.
  18. Then Came Big Data…

  19. None
  20. Lambda Architecture Promotes A Unified Serving Layer

    Image credit - https://www.databricks.com/glossary/lambda-architecture
  21. Why Didn’t It Pick Up?

  22. None
  23. • Overly complicated technology: it required a specialised skill set in distributed systems and performance engineering.

    • Limited to the JVM: non-JVM developers had no option other than to adapt.
    • Higher infrastructure footprint: stream processors are heavy on CPU and RAM.
    • Maintenance overhead: both the speed and batch layers had to be maintained.
  24. The Modern Streaming Stack

  25. Modern Streaming Stack

    • Modern cloud-native tools
    • Managed and serverless platforms
    • Rich tooling and developer experience
    • Expressive programming model
  26. The Modern Streaming Stack (MSS) is the classic streaming stack reimagined with self-service, cloud-native tools providing a simplified yet powerful developer experience to build real-time analytics applications.
  27. Modern Streaming Stack

    [Diagram labels: EVENT PRODUCERS, STREAMING DATA PLATFORM, STREAM PROCESSING, SERVING LAYER, TIERED STORAGE, DATA API, METADATA & GOVERNANCE, Data-driven Applications, Operational Systems, Real-time Analytics]
  28. The Unbundling

  29. Event Production/Enablement: The Origins of Events

    [Diagram labels: Language-Specific SDK Clients, STREAMING DATA PLATFORM]
  30. Streaming Data Platform

  31. • Ingest events from sources in a scalable manner, and store them durably until they are processed.

    • Based on an immutable, distributed log file. Events are appended to the log and partitioned across multiple servers for durability and scalability.
    [Diagram labels: EVENT PRODUCERS, Streaming Data Platform, six TOPIC boxes]
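
The append-only, partitioned log described above can be illustrated with a toy in-memory model. It is a sketch under stated assumptions: the `Topic` class is hypothetical, choosing a partition by hashing the event key is one common strategy assumed here, and replication across servers is left out entirely.

```python
import zlib
from typing import List, Tuple

Event = Tuple[str, str]  # (key, value)


class Topic:
    """Toy in-memory topic: a fixed set of append-only partitions."""

    def __init__(self, num_partitions: int) -> None:
        self.partitions: List[List[Event]] = [[] for _ in range(num_partitions)]

    def append(self, key: str, value: str) -> Tuple[int, int]:
        """Append an event and return its (partition, offset)."""
        # Assumption: a stable hash of the key picks the partition.
        partition = zlib.crc32(key.encode("utf-8")) % len(self.partitions)
        log = self.partitions[partition]
        log.append((key, value))  # events are only ever added at the end
        return partition, len(log) - 1

    def read(self, partition: int, offset: int) -> List[Event]:
        """Return events from offset onward; reading never removes anything."""
        return list(self.partitions[partition][offset:])


topic = Topic(num_partitions=3)
p1, o1 = topic.append("user-1234", "viewed item 567")
p2, o2 = topic.append("user-1234", "purchased item 567")
print(p1 == p2, o1, o2)
```

Because events with the same key land in the same partition, their relative order is preserved, and any number of readers can replay the log from any offset.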
  32. Technology Choices

  33. Stream Processors

  34. STREAM PROCESSING

    Streaming ETL
    • Stream joins for enrichment
    • Filtering/routing/transforming streams
    • Data integration
    • Repartitioning streams (re-keying)

    Streaming Analytics
    • Stateful aggregations
    • Window operations
    • Materialising streams, stream-table duality

    Event-driven Microservices
    • Actors
    • Reactive logic execution
    • Event-by-event processing, triggering side effects
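
As an illustration of the stateful aggregations and window operations listed above, here is a minimal tumbling-window count in plain Python. The function name, the 60-second window size, and the `(timestamp, key)` event shape are all assumptions for this sketch; a real stream processor would additionally handle late events, state persistence, and scaling.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

WINDOW_SECONDS = 60  # assumed window size for the example


def tumbling_counts(events: Iterable[Tuple[int, str]]) -> Dict[Tuple[int, str], int]:
    """Count events per key within fixed, non-overlapping time windows."""
    counts: Dict[Tuple[int, str], int] = defaultdict(int)  # the aggregation state
    for timestamp, key in events:
        # Align the timestamp to the start of its window.
        window_start = timestamp - (timestamp % WINDOW_SECONDS)
        counts[(window_start, key)] += 1
    return dict(counts)


# (timestamp in seconds, page) pairs standing in for a click stream
clicks = [(5, "home"), (42, "home"), (61, "cart"), (75, "home"), (119, "cart")]
result = tumbling_counts(clicks)
print(result)
```

The dictionary of counts is the state that makes this a stateful operation: each incoming event updates it instead of being processed in isolation.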
  35. Technology Choices

  36. Serving Layer

  37. Serving Layer

    [Diagram labels: Events, Event Streaming Platform, INPUT TOPIC, STREAM PROCESSING, OUTPUT TOPIC, Streaming ingestion, Serving Layer, Real-time Insights, Consumption: Internal/user-facing Analytics, Data Applications, Recommendation, Ad-hoc Exploration]
  38. Serving Layer Expectations

    • Serve queries with sub-second latency to provide a better user experience.
    • Support a throughput of hundreds of thousands of queries per second to serve an Internet-scale user base.
    • Ensure data freshness: serve analytics from data ingested a few seconds ago.
    • Run complex OLAP queries, supporting joins, aggregations, and filtering on large data sets.
  39. Serving Layer Technology Choices

    • Key-value stores, NoSQL databases
    • Real-time OLAP databases
  40. Tiered Storage

  41. [Diagram labels: Serving Layer, STREAMING DATA PLATFORM, New Events, Older Events, Tiered Storage]

    Offline Use Cases
    • Backfilling
    • Hydrating new applications
    • Experimentation (ad-hoc querying)
    • Archival/regulatory compliance
    • Training ML models
  42. Data APIs, Metadata, and Governance

  43. Analytics must be democratised and accessible across the board…

    Image credits - https://www.datanami.com/2022/01/21/data-meshes-set-to-spread-in-2022/, https://www.confluent.io/blog/how-to-build-a-data-mesh-using-event-streams/
  44. Event Mesh

    [Diagram labels: EVENT CATALOG, SCHEMA REGISTRY, STREAMING API, GRAPHQL API, Serving Layer, STREAM PROCESSOR, EVENT STREAMING PLATFORM, Real-time Insights, Decision makers, Data applications, Regulatory bodies, Business partners]
  45. Technology Choices: Standards, Schema Registries

  46. Observations & Future Outlook

  47. Convergence of Stream Processing and Serving Layer

    Streaming databases take stateful stream processing to the next level.
    • SaaS offerings
    • Integrated serving layer
    • Write logic with SQL
    • Pluggable integrations
    • Affordable
    • Developer friendly
    • Pay-as-you-go
    • Fewer components to manage
    • Integrated tooling
    • Caters to non-JVM developers
    • Self-serve
  48. Rise of The Lakehouse Architecture

    A Lakehouse combines a data warehouse, a data lake, and an event streaming platform.
    • High-throughput streaming ingestion
    • Change Data Capture
    • Upserts
    • Transactions
    • Table formats
  49. Takeaways

  50. Takeaways: There’s No Silver Bullet

    • Start small, build the critical path, and iterate.
    • Pick components based on need, and know their limitations.
    • Experiment, fail fast, and fail cheap.
    • Go for managed services if the team is small and new to streaming technologies.
    • Learn from mistakes, establish processes, and share wisdom!
  51. Thank You