Building real-time streaming applications using Apache Kafka

Jayesh
October 28, 2017


Act on the data as it happens! This presentation introduces Apache Kafka, explains the need for real-time stream processing, and shows how to build streaming applications using Kafka Streams.

We end by discussing the current data architecture at Hotstar.


Transcript

  1. 2.

    Agenda • What is Apache Kafka? • Why do we

    need stream processing? • Stream processing using Apache Kafka • Kafka @ Hotstar Feel free to stop me for questions 2
  2. 3.

    $ whoami • Personalisation lead at Hotstar • Led Data

    Infrastructure team at Grofers and TinyOwl • Kafka fanboy • Usually rant on Twitter @jayeshsidhwani 3
  3. 4.

    What is Kafka? 4 • Kafka is a scalable, fault-tolerant,

    distributed queue • Producers and Consumers • Uses ◦ Asynchronous communication in event-driven architectures ◦ Message broadcast for database replication Diagram credits: http://kafka.apache.org
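The producer/consumer decoupling described above can be sketched with a plain in-memory queue standing in for a Kafka topic. This is a pure-Python toy (no broker, no Kafka client library): the point is only that the producer fires messages without waiting for the consumer.

```python
from queue import Queue
from threading import Thread

topic = Queue()  # stand-in for a Kafka topic

def producer():
    # Fire-and-forget: the producer never waits for consumers.
    for event in ({"user_id": 1, "term": "GoT"}, {"user_id": 2, "term": "IPL"}):
        topic.put(event)
    topic.put(None)  # sentinel so the toy consumer knows to stop

def consumer(out):
    # The consumer drains the topic at its own pace.
    while (msg := topic.get()) is not None:
        out.append(msg)

received = []
t = Thread(target=consumer, args=(received,))
t.start()
producer()
t.join()
print(received)
```

In real Kafka the broker durably stores the messages, so the consumer need not even be running when the producer writes.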
  4. 5.

    • Brokers ◦ Heart of Kafka ◦ Stores data ◦

    Data stored into topics • Zookeeper ◦ Manages cluster state information ◦ Leader election Inside Kafka 5 BROKER ZOOKEEPER BROKER BROKER ZOOKEEPER TOPIC TOPIC TOPIC P P P C C C
  5. 6.

    • Topics are partitioned ◦ A partition is an append-only

    commit-log file ◦ Achieves horizontal scalability • Messages written to a partition are ordered • Each message gets an auto-incrementing offset # ◦ {“user_id”: 1, “term”: “GoT”} is a message in the topic searched Inside a topic 6 Diagram credits: http://kafka.apache.org
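The append-only commit log with auto-incrementing offsets can be simulated in a few lines of pure Python (toy class; in real Kafka the broker assigns offsets):

```python
class Partition:
    """Toy append-only commit log: each appended message gets the next offset."""
    def __init__(self):
        self.log = []

    def append(self, message):
        offset = len(self.log)  # auto-incrementing offset #
        self.log.append(message)
        return offset

    def read(self, offset):
        # Reads never mutate the log; any consumer can re-read any offset.
        return self.log[offset]

searched = Partition()
print(searched.append({"user_id": 1, "term": "GoT"}))  # offset 0
print(searched.append({"user_id": 2, "term": "IPL"}))  # offset 1
print(searched.read(0))
```

Ordering holds only within one partition; across partitions of the same topic there is no global order.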
  6. 7.

    How do consumers read? • Consumer subscribes to a topic

    • Consumers read from the head of the queue • Multiple consumers can read from a single topic 7 Diagram credits: http://kafka.apache.org
  7. 8.

    Kafka consumer scales horizontally • Consumers can be grouped •

    Consumer Groups ◦ Horizontally scalable ◦ Fault tolerant ◦ Delivery guaranteed 8 Diagram credits: http://kafka.apache.org
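How a consumer group spreads a topic's partitions over its members can be sketched with a toy round-robin assignment (hypothetical `assign` helper; real Kafka has pluggable assignors and the group coordinator triggers rebalances):

```python
from itertools import cycle

def assign(partitions, consumers):
    """Toy round-robin partition assignment across group members."""
    assignment = {c: [] for c in consumers}
    for partition, consumer in zip(partitions, cycle(consumers)):
        assignment[consumer].append(partition)
    return assignment

print(assign([0, 1, 2, 3], ["c1", "c2"]))   # {'c1': [0, 2], 'c2': [1, 3]}
# Fault tolerance: if c2 dies, rerunning over the survivors "rebalances":
print(assign([0, 1, 2, 3], ["c1"]))         # {'c1': [0, 1, 2, 3]}
```

Because each partition is owned by exactly one member at a time, adding consumers (up to the partition count) scales the group horizontally.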
  8. 10.

    Discrete data processing models 10 APP APP APP • Request

    / Response processing mode ◦ Processing time <1 second ◦ Clients can use this data
  9. 11.

    Discrete data processing models 11 APP APP APP • Request

    / Response processing mode ◦ Processing time <1 second ◦ Clients can use this data DWH HADOOP • Batch processing mode ◦ Processing time few hours to a day ◦ Analysts can use this data
  10. 12.

    Discrete data processing models 12 • As the system grows,

    such a synchronous processing model leads to a spaghetti-like, unmaintainable design APP APP APP APP SEARCH MONIT CACHE
  11. 13.

    Promise of stream processing 13 • Untangle movement of data

    ◦ Single source of truth ◦ No duplicate writes ◦ Anyone can consume anything ◦ Decouples data generation from data computation APP APP APP APP SEARCH MONIT CACHE STREAM PROCESSING FRAMEWORK
  12. 14.

    Promise of stream processing 14 • Untangle movement of data

    ◦ Single source of truth ◦ No duplicate writes ◦ Anyone can consume anything • Process, transform and react on the data as it happens ◦ Sub-second latencies ◦ Anomaly detection on bad stream quality ◦ Timely notification to users who dropped off in a live match Intelligence APP APP APP APP STREAM PROCESSING FRAMEWORK Filter Window Join Anomaly Action
  13. 16.

    Stream processing frameworks • Write your own? ◦ Windowing ◦

    State management ◦ Fault tolerance ◦ Scalability • Use frameworks such as Apache Spark, Samza, Storm ◦ Batteries included ◦ Cluster manager to coordinate resources ◦ High memory / CPU footprint 16
  14. 17.

    Kafka Streams • Kafka Streams is a simple, low-latency, framework-

    independent stream processing library • Simple DSL • Same principles as Kafka consumer (minus operations overhead) • No cluster manager! yay! 17
  15. 18.

    Writing Kafka Streams • Define a processing topology ◦ Source

    nodes ◦ Processor nodes ▪ One or more ▪ Filtering, windowing, joins, etc. ◦ Sink nodes • Compile it and run like any other Java application 18
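The source → processor → sink topology above can be mimicked with chained generators in pure Python (toy function names; the real thing is the Kafka Streams Java DSL, where sources and sinks are topics):

```python
def source(records):
    # Source node: in real Kafka Streams this reads from an input topic.
    yield from records

def filter_processor(stream, predicate):
    # Processor node: keep only records matching the predicate.
    return (r for r in stream if predicate(r))

def map_processor(stream, fn):
    # Processor node: transform each record.
    return (fn(r) for r in stream)

def sink(stream):
    # Sink node: a real sink would write to an output topic.
    return list(stream)

records = [{"term": "GoT"}, {"term": ""}, {"term": "IPL"}]
stream = source(records)
stream = filter_processor(stream, lambda r: r["term"])       # drop empty searches
stream = map_processor(stream, lambda r: r["term"].lower())  # normalise
print(sink(stream))  # ['got', 'ipl']
```

Because generators are lazy, records flow through the whole topology one at a time, which is also how a streams app processes records as they arrive.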
  16. 20.

    Kafka Streams architecture and operations • Kafka manages ◦ Parallelism

    ◦ Fault tolerance ◦ Ordering ◦ State Management 20 Diagram credits: http://confluent.io
  17. 21.

    Streaming joins and state-stores • Beyond filtering and windowing •

    Streaming joins are hard to scale ◦ Kafka scales at 800k writes/sec* ◦ How about your database? • Solution: Cache a static stream in-memory ◦ Join with running stream ◦ Stream<>table duality • Kafka supports in-memory caches out of the box ◦ RocksDB ◦ In-memory hash ◦ Persistent / Transient 21 Diagram credits: http://confluent.io *achieved using the librdkafka C++ library
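The stream<>table duality behind such joins can be sketched in pure Python: the "static" changelog stream is materialised into a dict (a stand-in for the state store, which real Kafka Streams backs with RocksDB), and each record of the running stream is enriched against it. All field names here are hypothetical.

```python
user_table = {}  # toy state store materialised from a changelog stream

# Table = changelog stream folded by key, last write per key wins.
for key, value in [(1, "mumbai"), (2, "delhi"), (1, "pune")]:
    user_table[key] = value

# Running stream joined record-by-record against the local table,
# avoiding a per-record query to an external database.
searches = [{"user_id": 1, "term": "GoT"}, {"user_id": 2, "term": "IPL"}]
joined = [dict(s, city=user_table.get(s["user_id"])) for s in searches]
print(joined)
```

Note user 1 joins against "pune", not "mumbai": the table always reflects the latest value per key.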
  18. 22.

    Demo • Inputs: ◦ Incoming stream of benchmark stream quality

    from CDN provider ◦ Incoming stream quality reported by Hotstar clients • Output: ◦ Calculate the locations reporting bad QoS in real-time 22 Diagram credits: http://confluent.io
  19. 23.

    Demo • Inputs: ◦ Incoming stream of benchmark stream quality

    from CDN provider ◦ Incoming stream quality reported by Hotstar clients • Output: ◦ Calculate the locations reporting bad QoS in real-time 23 Diagram credits: http://confluent.io CDN benchmarks Client reports Alerts
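The demo's core computation can be sketched as a pure-Python pass over the two inputs (function, field names, and the flagging rule are all hypothetical; the actual demo runs this continuously as a Kafka Streams app joining the two topics):

```python
from collections import defaultdict

def bad_qos_locations(benchmarks, client_reports, threshold=0.2):
    """Toy QoS check: flag locations where the worst client-reported bitrate
    falls more than `threshold` below the CDN benchmark for that location."""
    bench = {b["location"]: b["bitrate"] for b in benchmarks}
    worst = defaultdict(lambda: float("inf"))
    for r in client_reports:
        worst[r["location"]] = min(worst[r["location"]], r["bitrate"])
    return sorted(loc for loc, bitrate in worst.items()
                  if loc in bench and bitrate < bench[loc] * (1 - threshold))

benchmarks = [{"location": "mumbai", "bitrate": 1000},
              {"location": "delhi", "bitrate": 1000}]
reports = [{"location": "mumbai", "bitrate": 700},
           {"location": "delhi", "bitrate": 950}]
print(bad_qos_locations(benchmarks, reports))  # ['mumbai']
```

In the streaming version the same comparison would run inside a time window per location, emitting alerts as soon as a window's worst report dips below the benchmark.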
  20. 26.

    26