Education:
◦ Lahore University of Management Sciences (LUMS), BSc CS
◦ University of Illinois at Urbana-Champaign (UIUC), Master's in CS
◦ University of Chicago, Master's in Applied Data Science
• Work:
◦ Google
▪ Real-Time Data Flow • Kafka + Pub/Sub
▪ Data Processing Pipelines • Apache Beam
◦ Argonne National Laboratory
▪ Reinforcement Learning
Modern applications are rarely monolithic; they are distributed collections of services.
• The default approach: Service A calls Service B directly (synchronous HTTP/REST or gRPC).
Pain Points:
• Tight Coupling: An API change in one service (e.g., the Inventory Service) breaks its dependents (e.g., the Order Service).
• Cascading Failures: A slow or unavailable downstream service takes down the entire upstream chain.
• Traffic Spikes: Sudden user surges overwhelm the database because there is no buffer in between.
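The cascading-failure pain point can be made concrete with a minimal sketch. The service names (`OrderService`, `InventoryService`) are illustrative stand-ins, not a real API: because the call is direct and synchronous, a failure in the downstream service immediately becomes a failure in the upstream one.

```python
# Illustrative sketch of tight coupling: Service A calls Service B directly,
# so B's failure cascades into A. Names here are hypothetical.

class InventoryService:
    def __init__(self, healthy=True):
        self.healthy = healthy

    def reserve(self, sku):
        if not self.healthy:
            raise TimeoutError("inventory service is down")
        return f"reserved {sku}"

class OrderService:
    def __init__(self, inventory):
        self.inventory = inventory  # direct, synchronous dependency

    def place_order(self, sku):
        # If InventoryService is slow or down, this call fails too:
        # the failure cascades up the chain.
        return self.inventory.reserve(sku)

orders = OrderService(InventoryService(healthy=False))
try:
    orders.place_order("sku-42")
except TimeoutError:
    print("order failed because a downstream service failed")
```

A broker between the two services breaks exactly this direct dependency.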
• Decoupling: Producers and consumers operate independently. They interact only with the interface (Topic), removing direct dependencies and "spaghetti code."
• Load Leveling (Buffering): Acts as a shock absorber. If traffic spikes 10x, the broker buffers the data, protecting downstream databases from being overwhelmed.
• Elastic Scalability: Enables parallel processing. You can handle massive throughput by horizontally scaling your consumer services; the broker balances the work automatically.
• Reliability & Persistence: "Fire and forget." The system guarantees delivery and keeps a durable record of events, so data isn't lost even if the receiver is offline.
• Extensibility: The "Fan-Out" pattern lets you add new downstream services (such as analytics or logging) to existing data streams without refactoring upstream applications.
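Load leveling can be sketched with an in-memory queue standing in for the broker. This is a toy model, not a real broker client: a 10x burst of publishes is absorbed by the queue, while the consumer drains it at its own fixed rate.

```python
from collections import deque

# Minimal load-leveling sketch: the broker's queue absorbs a traffic burst,
# and the consumer drains it at its own pace. Purely illustrative.

class Broker:
    def __init__(self):
        self.queue = deque()

    def publish(self, msg):
        self.queue.append(msg)          # the burst is buffered, not dropped

    def pull(self, max_messages):
        batch = []
        while self.queue and len(batch) < max_messages:
            batch.append(self.queue.popleft())
        return batch

broker = Broker()
for i in range(100):                    # sudden traffic spike
    broker.publish(f"event-{i}")

processed = []
while broker.queue:                     # consumer drains 10 at a time
    processed.extend(broker.pull(max_messages=10))

print(len(processed))                   # every event survived the spike
```

The downstream database only ever sees batches of 10, no matter how large the spike was.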
"If the receiver is offline, is our unreceived data lost?"
◦ No, data is not lost.
◦ The broker persists the message (stores it safely) until the receiver comes back online and acknowledges that it has processed it.
◦ Your data "waits" in the queue rather than disappearing.
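The persist-until-acknowledged behavior can be sketched as a tiny in-memory queue (a stand-in, not any real broker's API): delivery does not delete the message; only an explicit acknowledgment does.

```python
import itertools

# Sketch of persist-until-ack delivery: the broker keeps a message until the
# consumer explicitly acknowledges it, so an offline receiver loses nothing.

class DurableQueue:
    def __init__(self):
        self._ids = itertools.count()
        self._pending = {}               # msg_id -> payload, kept until ack

    def publish(self, payload):
        msg_id = next(self._ids)
        self._pending[msg_id] = payload
        return msg_id

    def pull(self):
        # Delivery does NOT delete: messages stay pending until acked,
        # so they can be redelivered after a crash or outage.
        return list(self._pending.items())

    def ack(self, msg_id):
        self._pending.pop(msg_id, None)  # only now is the message discarded

q = DurableQueue()
mid = q.publish("user-signed-up")
# ...receiver is offline for a while; the message simply waits...
assert q.pull() == [(mid, "user-signed-up")]
q.ack(mid)
assert q.pull() == []                    # acknowledged, safe to discard
```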
"Can multiple different services receive the same message?"
◦ Yes! This is the "Fan-Out" pattern.
◦ For example, you can send a "User Signed Up" message once, and have three completely different services (e.g., Emailer, Analytics Engine, and CRM Sync) all subscribe to it.
◦ They will each get their own copy of the message to process independently.
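The fan-out example above can be sketched with an in-memory topic (the subscriber names are the illustrative ones from the text): one publish, and every subscription receives its own independent copy.

```python
from collections import defaultdict

# Fan-out sketch: one published message, one copy per subscription.
# Subscriber names are illustrative.

class Topic:
    def __init__(self):
        self.subscriptions = defaultdict(list)   # name -> that service's inbox

    def subscribe(self, name):
        self.subscriptions[name]                 # create an empty inbox

    def publish(self, msg):
        for inbox in self.subscriptions.values():
            inbox.append(msg)                    # each subscriber gets a copy

topic = Topic()
for service in ("emailer", "analytics", "crm_sync"):
    topic.subscribe(service)

topic.publish("User Signed Up")
# Each of the three services now has its own copy to process independently.
```

Adding a fourth subscriber later requires no change to the publisher, which is the extensibility benefit described earlier.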
Pub/Sub is Google Cloud's fully managed messaging service.
• "Serverless" Philosophy: No provisioning of instances or clusters. You just create a topic and go.
Key Features:
• Global Availability: A single topic can be accessed from anywhere in the world without complex region-replication configuration.
• Auto-scaling: Automatically handles traffic from zero to millions of messages per second.
• Delivery Modes:
◦ Pull: The consumer requests messages.
◦ Push: Pub/Sub initiates requests to your endpoint (e.g., a Cloud Function or Cloud Run container).
• Message State: Tracks individual message acknowledgments (per-message state).
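The two delivery modes can be contrasted with a contrived in-memory stand-in (this is not the real Pub/Sub client library): in push mode the topic initiates the call to the subscriber's handler, while in pull mode the consumer asks for whatever is waiting.

```python
# Toy contrast of pull vs. push delivery. A real push endpoint would be an
# HTTP handler (e.g., on Cloud Run); here it is just a callable.

class MiniTopic:
    def __init__(self):
        self.backlog = []
        self.push_endpoints = []

    def publish(self, msg):
        self.backlog.append(msg)
        for endpoint in self.push_endpoints:
            endpoint(msg)            # push: the broker initiates the request

    def pull(self):
        msgs, self.backlog = self.backlog, []
        return msgs                  # pull: the consumer initiates the request

received = []
topic = MiniTopic()
topic.push_endpoints.append(received.append)   # hypothetical push handler
topic.publish("hello")
# Push mode delivered "hello" to `received` immediately;
# pull mode hands over the backlog only when asked.
```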
Kafka is the industry standard for high-throughput event streaming.
• Designed originally by LinkedIn for massive throughput.
• Google Cloud offers its own managed Kafka service.
Architecture (The Log):
• Distributed Commit Log: Messages are appended to the end of a log file.
• Partitions: Topics are split into partitions to allow parallel processing.
• Ordering: Strict ordering is guaranteed within a partition.
Key Mechanics:
• Retention: Messages are kept for a set time (e.g., 7 days) regardless of whether they have been read.
• Offsets: Consumers track their own position (offset) in the log, which lets them "rewind" and "replay" data.
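The log, partition, and offset mechanics above can be sketched in a few lines of toy code (an illustrative model, not the Kafka client API): messages with the same key land in the same partition and stay ordered there, and because each consumer owns its offset, seeking back to 0 replays the history.

```python
from collections import defaultdict

# Sketch of Kafka's core mechanics: append-only partitioned log, plus a
# consumer that tracks its own offsets and can rewind to replay data.

class PartitionedLog:
    def __init__(self, num_partitions=2):
        self.partitions = [[] for _ in range(num_partitions)]

    def append(self, key, value):
        p = hash(key) % len(self.partitions)   # same key -> same partition
        self.partitions[p].append(value)       # strict order within partition
        return p

class Consumer:
    def __init__(self, log):
        self.log = log
        self.offsets = defaultdict(int)        # the consumer owns its position

    def poll(self, partition):
        records = self.log.partitions[partition][self.offsets[partition]:]
        self.offsets[partition] += len(records)
        return records

    def seek(self, partition, offset):
        self.offsets[partition] = offset       # "rewind" to replay history

log = PartitionedLog()
p = log.append("acct-1", "deposit $10")
log.append("acct-1", "withdraw $5")

consumer = Consumer(log)
consumer.poll(p)        # reads both events, in order
consumer.seek(p, 0)     # rewind: the log still has everything (retention)
consumer.poll(p)        # replays the same two events
```

Note that polling never deletes anything from the log; retention, not consumption, decides when data disappears.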
Choose Pub/Sub if:
• "Zero-Ops": You don't want to manage clusters, partitions, or sizing.
• Variable Traffic: Your workload is unpredictable or "spiky" (Pub/Sub scales from 0 to millions instantly).
• Global Ingestion: You need to aggregate data from multiple regions effortlessly.
• Per-Message Processing: You need to track every single message independently (with built-in Dead Letter Queues).
Choose Kafka if:
• Lift & Shift: You have existing Kafka applications and want to move to the cloud without rewriting code.
• Strict Ordering & Replay: You need strict ordering at high throughput or indefinite data retention (Event Sourcing).
• Low Latency: You require end-to-end latency in the ~10ms range (vs. ~100ms for Pub/Sub).
• Multi-Cloud: You need to run the exact same messaging stack on AWS, Azure, and on-prem.
"Will my messages arrive in the order I sent them?"
◦ Answer: It depends on the tool.
◦ Kafka: Yes, it guarantees order (within a partition). This is essential for things like banking ledgers.
◦ Pub/Sub: Generally no. To achieve near-infinite scaling, it might deliver Message B before Message A. If strict order matters to you, you must use specific "ordering keys" (which come with some limitations).
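The idea behind ordering keys can be sketched with a toy router (illustrative only, not the Pub/Sub client API): messages sharing an ordering key are all sent through the same internal queue, so order holds per key even while different keys scale out across many queues.

```python
# Sketch of per-key ordering: one FIFO shard per ordering key's hash, so
# messages with the same key stay in publish order. Purely illustrative.

class OrderedRouter:
    def __init__(self, num_shards=4):
        self.shards = [[] for _ in range(num_shards)]

    def publish(self, ordering_key, msg):
        shard = hash(ordering_key) % len(self.shards)
        self.shards[shard].append(msg)   # per-key FIFO within one shard

router = OrderedRouter()
for i in range(3):
    router.publish("account-7", f"txn-{i}")

# All of account-7's transactions sit in one shard, in publish order,
# while other keys can be spread across the remaining shards in parallel.
```

The trade-off hinted at in the text follows directly: pinning a key to one shard limits that key's throughput, which is why ordering keys come with limitations.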