
Real-Time Data Flow: Orchestrating Events with Pub/Sub and Kafka (By: Salman Yousaf) - DevFest Lahore 2025

Talk by Salman Yousaf (https://www.linkedin.com/in/salman-y/) at DevFest Lahore 2025 by GDG Lahore.

December 20, 2025
Transcript

  1. About Me: Salman Yousaf

     • New York City, USA
     • Education:
       ◦ Lahore University of Management Sciences (LUMS), BSc CS
       ◦ University of Illinois Urbana-Champaign (UIUC), MS in CS
       ◦ University of Chicago, MS in Applied Data Science
     • Work:
       ◦ Google
         ▪ Real-Time Data Flow: Kafka + Pub/Sub
         ▪ Data Processing Pipelines: Apache Beam
       ◦ Argonne National Laboratory
         ▪ Reinforcement Learning
  2. The Problem: The "Spaghetti" Architecture

     The State of Microservices:
     • Modern applications are rarely monolithic; they are distributed collections of services.
     • The default approach: Service A calls Service B directly (synchronous HTTP/REST or gRPC).

     Pain Points:
     • Tight Coupling: API changes (e.g., in the Inventory Service) break dependent services (e.g., the Order Service).
     • Cascading Failures: slow or failing downstream services crash the upstream chain.
     • Traffic Spikes: sudden user surges overwhelm the database because there is no buffer.
  3. Asynchronous Messaging - Key Benefits

     • Decoupling: Producers and Consumers operate independently. They interact only with the interface (the Topic), removing direct dependencies and "spaghetti code."
     • Load Leveling (Buffering): acts as a shock absorber. If traffic spikes 10x, the broker buffers the data, protecting downstream databases from being overwhelmed.
     • Elastic Scalability: enables parallel processing. You can handle massive throughput by horizontally scaling your consumer services; the broker balances the work automatically.
     • Reliability & Persistence: "fire and forget." The system guarantees delivery and creates a durable record of events, ensuring data isn't lost even if the receiver is offline.
     • Extensibility: the "Fan-Out" pattern lets you add new downstream services (like analytics or logging) to existing data streams without refactoring upstream applications.
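The load-leveling idea above can be sketched with a toy in-memory topic (this is an illustration only, not a real broker API; the `Topic` class and its method names are invented for the example). The producer dumps a burst of events and returns immediately, while the consumer drains the buffer at its own pace:

```python
from collections import deque

class Topic:
    """A minimal in-memory stand-in for a broker topic (illustration only)."""
    def __init__(self):
        self.buffer = deque()          # the "shock absorber"

    def publish(self, message):
        self.buffer.append(message)    # producer returns immediately

    def pull(self, max_messages):
        """Hand the consumer at most max_messages buffered events."""
        batch = []
        while self.buffer and len(batch) < max_messages:
            batch.append(self.buffer.popleft())
        return batch

# A traffic spike: the producer dumps 1,000 events at once...
topic = Topic()
for i in range(1000):
    topic.publish({"order_id": i})

# ...while the consumer drains at its own safe pace (100 per batch),
# so the downstream database never sees the full burst.
processed = 0
while (batch := topic.pull(max_messages=100)):
    processed += len(batch)

print(processed)  # 1000: nothing dropped, nothing overwhelmed
```

The producer never waits on the consumer, which is the decoupling point: each side sees only the topic, never the other service.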
  4. Question

     • What happens if the consumer (receiver) crashes? Is our unreceived data lost?
       ◦ No, data is not lost.
       ◦ The broker persists the message (stores it safely) until the receiver comes back online and acknowledges that it has processed it.
       ◦ Your data "waits" in the queue rather than disappearing.
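The "message waits until acknowledged" behavior can be sketched as a toy subscription (illustrative only; `Subscription`, `deliver`, `ack`, and `redeliver` are invented names, not a real client API). A message stays in the unacked set until the consumer explicitly acks it, so a crash before the ack just means redelivery later:

```python
class Subscription:
    """Toy model: delivered messages are retained until acknowledged."""
    def __init__(self):
        self.unacked = {}   # message_id -> payload
        self.next_id = 0

    def deliver(self, payload):
        mid = self.next_id
        self.unacked[mid] = payload
        self.next_id += 1
        return mid

    def ack(self, message_id):
        self.unacked.pop(message_id, None)

    def redeliver(self):
        # After the ack deadline expires, unacked messages are sent again.
        return list(self.unacked.items())

sub = Subscription()
mid = sub.deliver("charge card #42")

# Consumer crashes before acking: the message is NOT lost...
pending = sub.redeliver()
print(pending)          # [(0, 'charge card #42')]

# ...and once the consumer recovers and acks, it is removed for good.
sub.ack(mid)
print(sub.redeliver())  # []
```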
  5. Question

     • "Can multiple different applications listen to the same message?"
       ◦ Yes! This is the "Fan-Out" pattern.
       ◦ For example, you can send a "User Signed Up" message once and have three completely different services (e.g., an Emailer, an Analytics Engine, and a CRM Sync) all subscribe to it.
       ◦ They will each get their own copy of the message to process independently.
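A minimal sketch of the fan-out pattern (again a toy model, not a real SDK; the class and subscriber names are invented): publishing once copies the message into every subscription's queue, so each service processes it independently:

```python
class FanOutTopic:
    """Toy model: each subscription gets its own copy of every message."""
    def __init__(self):
        self.subscriptions = {}

    def subscribe(self, name):
        self.subscriptions[name] = []

    def publish(self, message):
        for queue in self.subscriptions.values():
            queue.append(dict(message))   # independent copy per subscriber

topic = FanOutTopic()
for service in ("emailer", "analytics", "crm_sync"):
    topic.subscribe(service)

# Publish the "User Signed Up" event exactly once...
topic.publish({"event": "user_signed_up", "user": "ada"})

# ...and all three services receive their own copy.
copies = {name: q[0] for name, q in topic.subscriptions.items()}
print(copies["emailer"] == copies["analytics"] == copies["crm_sync"])  # True
```

Because each subscriber owns its copy, adding a fourth service later is just one more `subscribe` call; no upstream code changes.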
  6. Google Cloud Pub/Sub

     Overview:
     • Google's cloud-native, fully managed messaging service.
     • "Serverless" philosophy: no provisioning of instances or clusters. You just create a topic and go.

     Key Features:
     • Global Availability: a single topic can be accessed from anywhere in the world without complex region-replication configuration.
     • Auto-scaling: automatically handles traffic from zero to millions of messages per second.
     • Delivery Modes:
       ◦ Pull: the consumer requests messages.
       ◦ Push: Pub/Sub initiates requests to your endpoint (e.g., a Cloud Function or Cloud Run container).
     • Message State: tracks individual message acknowledgments (per-message state).
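The difference between the two delivery modes can be sketched in a few lines (a toy model only; this is not the real Pub/Sub client, and the `push_endpoint` callable merely stands in for an HTTPS endpoint such as a Cloud Run container):

```python
from collections import deque

class PubSubLike:
    """Toy model of pull vs. push delivery; not the real client API."""
    def __init__(self):
        self.queue = deque()
        self.push_endpoint = None      # a callable standing in for an HTTPS endpoint

    def publish(self, message):
        if self.push_endpoint:         # push: the broker initiates delivery
            self.push_endpoint(message)
        else:                          # pull: the message waits for a consumer request
            self.queue.append(message)

    def pull(self):
        return self.queue.popleft() if self.queue else None

# Pull mode: the consumer asks for messages when it is ready.
pull_topic = PubSubLike()
pull_topic.publish("event-1")
print(pull_topic.pull())       # event-1

# Push mode: the broker calls your endpoint as messages arrive.
received = []
push_topic = PubSubLike()
push_topic.push_endpoint = received.append
push_topic.publish("event-2")
print(received)                # ['event-2']
```

Pull suits long-running workers that control their own pace; push suits serverless endpoints that scale with incoming requests.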
  7. Apache Kafka

     Overview:
     • The open-source industry standard for event streaming.
     • Originally designed at LinkedIn for massive throughput. Google Cloud offers its own Managed Kafka service.

     Architecture (The Log):
     • Distributed Commit Log: messages are appended to the end of a log file.
     • Partitions: topics are split into partitions to allow parallel processing.
     • Ordering: strict ordering is guaranteed within a partition.

     Key Mechanics:
     • Retention: messages are kept for a set time (e.g., 7 days) regardless of whether they have been read.
     • Offsets: consumers track their own position (offset) in the log. This allows them to "rewind" and "replay" data.
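The log, partition, and offset mechanics can be sketched as a toy commit log (illustrative only; real Kafka routes by a murmur2 hash of the key, here plain `hash` stands in, and the class is invented for the example). Keyed appends land in one partition, reads start from a consumer-chosen offset, and rewinding the offset replays history:

```python
class PartitionedLog:
    """Toy model of Kafka's commit log: append-only partitions, consumer-owned offsets."""
    def __init__(self, num_partitions):
        self.partitions = [[] for _ in range(num_partitions)]

    def append(self, key, value):
        # Same key -> same partition -> strict per-key ordering.
        p = hash(key) % len(self.partitions)
        self.partitions[p].append(value)
        return p

    def read(self, partition, offset):
        # The consumer, not the broker, decides where to start reading.
        return self.partitions[partition][offset:]

log = PartitionedLog(num_partitions=3)
p = log.append("account-7", "deposit 100")
log.append("account-7", "withdraw 30")

# Within a partition, order is strict:
print(log.read(p, offset=0))   # ['deposit 100', 'withdraw 30']

# Offsets belong to the consumer: rewinding to 0 replays the history,
# because retention keeps messages even after they have been read.
replayed = log.read(p, offset=0)
```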
  8. Choosing the Right Tool - Pub/Sub vs. Kafka

     The Core Trade-off:
     • Pub/Sub: maximizes operational simplicity (serverless; "it just works").
     • Managed Kafka: maximizes portability & control (open standard; fine-tuned performance).
  9. Choosing Between Pub/Sub and Kafka

     Choose Pub/Sub if:
     • "Zero-Ops": you don't want to manage clusters, partitions, or sizing.
     • Variable Traffic: your workload is unpredictable or "spiky" (Pub/Sub scales from 0 to millions instantly).
     • Global Ingestion: you need to aggregate data from multiple regions effortlessly.
     • Per-Message Processing: you need to track every single message independently (with built-in Dead Letter Queues).

     Choose Kafka if:
     • Lift & Shift: you have existing Kafka applications and want to move to the cloud without rewriting code.
     • Strict Ordering & Replay: you need strict ordering at high throughput or indefinite data retention (Event Sourcing).
     • Low Latency: you require end-to-end latency in the ~10 ms range (vs. ~100 ms for Pub/Sub).
     • Multi-Cloud: you need to run the exact same messaging stack on AWS, Azure, and on-prem.
  10. Question

     • "Will messages always arrive in the exact order I sent them?"
       ◦ Answer: it depends on the tool.
       ◦ Kafka: yes, it guarantees order (within a partition). This is essential for things like banking ledgers.
       ◦ Pub/Sub: generally no. To achieve infinite scaling, it might deliver Message B before Message A. If strict order matters to you, you must use "ordering keys" (which come with some limitations).
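The ordering-key idea can be sketched as a toy topic (illustrative only; `OrderedTopic` and its methods are invented, not the real Pub/Sub client). Messages sharing an ordering key are delivered FIFO relative to each other, while messages with different keys carry no mutual ordering guarantee:

```python
from collections import defaultdict

class OrderedTopic:
    """Toy model: with an ordering key, delivery is serialized per key."""
    def __init__(self):
        self.per_key = defaultdict(list)

    def publish(self, ordering_key, message):
        self.per_key[ordering_key].append(message)

    def deliver(self, ordering_key):
        # Per-key FIFO: messages sharing a key arrive in publish order;
        # messages under different keys may interleave arbitrarily.
        return list(self.per_key[ordering_key])

topic = OrderedTopic()
topic.publish("ledger-42", "debit $10")
topic.publish("ledger-99", "credit $5")   # different key: no ordering promise vs. ledger-42
topic.publish("ledger-42", "debit $20")

print(topic.deliver("ledger-42"))   # ['debit $10', 'debit $20']
```

This is the trade-off the slide mentions: scoping order to a key keeps most of the horizontal scaling while protecting sequences, like a single account's ledger, that must stay in order.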