Apache Pinot Case Study - Kafka Summit 2020

Neha Pawar
August 25, 2020

These are the slides for the talk "Apache Pinot Case Study - Building distributed analytics systems using Apache Kafka", from Kafka Summit 2020.

We built Apache Pinot - a real-time distributed OLAP datastore - for low-latency analytics at scale. It is heavily used at companies such as LinkedIn, Uber, and Slack, where Kafka serves as the backbone for capturing vast amounts of data. Pinot ingests millions of events per second from Kafka, builds indexes in real time, and serves 100K+ queries per second while meeting latency SLAs of milliseconds to sub-seconds.
In the first implementation, we used the Kafka Consumer Groups feature to manage offsets and checkpoints across multiple Kafka consumers. To achieve fault tolerance and scalability while maintaining the SLA under a high query workload, we ran multiple consumer groups for the same topic. But this model posed other challenges: since Kafka maintains offsets per consumer group, achieving data consistency across multiple consumer groups was not possible. A failure of a single node in a consumer group also made the entire consumer group unavailable for query processing, and restarting the failed node required a lot of manual operations to ensure data was consumed exactly once. This resulted in management overhead and inefficient hardware utilization.
Taking inspiration from the Kafka consumer group implementation, we redesigned real-time consumption in Pinot to maintain consistent offsets across multiple consumer groups. This allowed us to guarantee consistent data across all replicas, and enabled us to copy data from another consumer group when adding nodes, recovering from node failures, or increasing replication.
In this talk, we will deep dive into the journey of Pinot's real-time ingestion design. We will walk through the new Partition Level Consumers design and see how it is resilient to failures in both Kafka brokers and Pinot components. We will discuss how multiple consumer groups can synchronize checkpoints periodically and maintain consistency, and describe how we achieve this while maintaining strict freshness SLAs and withstanding high ingestion throughput.


Transcript

  1. @apachepinot | @KishoreBytes Apache Pinot @ Other Companies
     2.7k GitHub Stars, 500+ Slack Users, 20+ Companies. Community has tripled in the last two quarters. Join our growing community on the Apache Pinot Slack Channel: https://communityinviter.com/apps/apache-pinot/apache-pinot
  2. @apachepinot | @KishoreBytes Multiple Use Cases: One Platform
     User Facing Applications, Business Facing Metrics, Anomaly Detection, Time Series, all ingesting from Kafka.
     (Scale figures from the slide: 70+, 10k, 100k, 120k Queries/sec, 1M+ Events/sec)
  3. @apachepinot | @KishoreBytes Challenges of User-facing Real-time Analytics
     • Velocity of ingestion, high dimensionality
     • 1000s of QPS, milliseconds latency, seconds freshness
     • Must be highly available, scalable, and cost effective
  4. @apachepinot | @KishoreBytes Pinot Architecture
     • Servers - consuming, indexing, serving
     • Brokers - scatter-gather of queries across the servers
  5. @apachepinot | @KishoreBytes Pinot Realtime Ingestion Basics
     • Kafka Consumer runs on the Pinot Server
     • Periodically create a "Pinot segment"
     • Persist it to the deep store
     • In-memory data is queryable
     • Continue consumption
     (Diagram: Server 1 with a Kafka Consumer, writing to the Deep Store)
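
A minimal sketch of that loop, assuming hypothetical RealtimeSegment (in-memory, queryable index) and DeepStore interfaces; only the org.apache.kafka.clients.consumer calls are the real Kafka API, everything else is illustrative rather than Pinot's actual classes.

import java.time.Duration;
import java.util.List;
import java.util.function.Supplier;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class RealtimeIngestionSketch {

    // Hypothetical stand-ins for Pinot's real components.
    interface RealtimeSegment {            // in-memory, immediately queryable index
        void index(String row);
        int numRows();
    }
    interface DeepStore {                  // e.g. a segment store such as HDFS or S3
        void upload(RealtimeSegment segment);
    }

    static void consumeLoop(KafkaConsumer<String, String> consumer,
                            Supplier<RealtimeSegment> segmentFactory,
                            DeepStore deepStore,
                            int rowThreshold) {
        consumer.subscribe(List.of("events"));
        RealtimeSegment segment = segmentFactory.get();
        while (true) {
            // Kafka consumer runs on the Pinot server itself
            ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(100));
            for (ConsumerRecord<String, String> record : records) {
                segment.index(record.value());   // in-memory data, queryable right away
            }
            if (segment.numRows() >= rowThreshold) {
                deepStore.upload(segment);       // seal and persist the "Pinot segment"
                segment = segmentFactory.get();  // continue consumption into a fresh segment
            }
        }
    }
}
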
  6. @apachepinot | @KishoreBytes Kafka Consumer Group based design
     • Each consumer consumes from 1 or more partitions
     (Diagram: 3 partitions, one Consumer Group with a Kafka Consumer on Server 1 and on Server 2)
  7. @apachepinot | @KishoreBytes Kafka Consumer Group based design
     • Each consumer consumes from 1 or more partitions
     • Periodic checkpointing
     (Diagram timeline: Server 1 starts consuming partitions 0 and 2, creates seg 1 and seg 2, checkpoints at offsets 350 and 400)
  8. @apachepinot | @KishoreBytes Kafka Consumer Group based design
     • Fault-tolerant consumption
     • Relied on the Kafka Rebalancer for:
       ◦ Initial partition assignment
       ◦ Rebalancing partitions on node/partition changes
     (Diagram: same timeline as above, with the Kafka Rebalancer coordinating the Consumer Group)
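
Roughly, this first design maps onto the plain Kafka consumer-group API: subscribe() lets the Kafka rebalancer hand out partitions, and checkpointing is a periodic offset commit. The topic name, group id, and timings below are illustrative, not Pinot's actual code.

import java.time.Duration;
import java.util.Collection;
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRebalanceListener;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.TopicPartition;

public class ConsumerGroupIngestion {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "pinot-table-REALTIME");   // illustrative group id
        props.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, "false");        // checkpoint explicitly
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG,
                  "org.apache.kafka.common.serialization.StringDeserializer");
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG,
                  "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            // The Kafka rebalancer decides which partitions this server gets,
            // both initially and whenever nodes or partitions change.
            consumer.subscribe(List.of("events"), new ConsumerRebalanceListener() {
                @Override public void onPartitionsAssigned(Collection<TopicPartition> parts) {
                    System.out.println("Assigned: " + parts);
                }
                @Override public void onPartitionsRevoked(Collection<TopicPartition> parts) {
                    System.out.println("Revoked: " + parts);   // e.g. a partition moving to a new server
                }
            });

            long lastCheckpoint = System.currentTimeMillis();
            while (true) {
                for (ConsumerRecord<String, String> record : consumer.poll(Duration.ofMillis(100))) {
                    // index the record into the in-memory segment (omitted)
                }
                // Periodic checkpointing: the offsets live inside Kafka, per consumer group.
                if (System.currentTimeMillis() - lastCheckpoint > 60_000) {
                    consumer.commitSync();
                    lastCheckpoint = System.currentTimeMillis();
                }
            }
        }
    }
}
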
  9. @apachepinot | @KishoreBytes Challenges with Capacity Expansion
     • Server 3 is added to the consumer group
     (Diagram: 3 partitions; Server 1 started consuming partitions 0 and 2 with checkpoints at 350 and 400 (seg1, seg2); the Kafka Rebalancer coordinates Servers 1-3)
  10. @apachepinot | @KishoreBytes Challenges with Capacity Expansion
     • Add Server 3: the rebalancer moves partition 2 to Server 3
     • Server 3 begins consumption from the last checkpoint, offset 400
  11. @apachepinot | @KishoreBytes Challenges with Capacity Expansion
     • Partition 2 moves to Server 3, which begins consumption from checkpoint 400
     • Server 1 had already consumed (and is serving) partition 2 data beyond that checkpoint
     • Result: duplicate data across Server 1 and Server 3 for partition 2!
  12. @apachepinot | @KishoreBytes Multiple Consumer Groups
     • Tried multiple consumer groups to solve the issue, but...
     • No control over which partitions are assigned to a consumer
     • No control over checkpointing
     (Diagram: 3 partitions, 2 replicas: Consumer Group 1 and Consumer Group 2)
  13. @apachepinot | @KishoreBytes Multiple Consumer Groups
     • Segment disparity across the groups
     • Storage inefficient
     (Diagram: Consumer Group 1 and Consumer Group 2 writing to the Deep Store; 3 partitions, 2 replicas)
  14. @apachepinot | @KishoreBytes Operational Complexity
     • Node failure in a consumer group
     • Cannot use the good nodes of Consumer Group 1 and look only for the missing data in Consumer Group 2
     (Diagram: queries served across Consumer Group 1 and Consumer Group 2; 3 partitions, 2 replicas)
  15. @apachepinot | @KishoreBytes Operational Complexity
     • Must disable the consumer group for node failures or capacity changes
  16. @apachepinot | @KishoreBytes Scalability Limitation
     • Scalability limited by the number of partitions (the added Server 4 sits idle)
     • Cost inefficient
     (Diagram: Consumer Group 1 and Consumer Group 2; 3 partitions, 2 replicas)
  17. @apachepinot | @KishoreBytes Single node in a Consumer Group - the only deployment model that worked
     • Eliminates incorrect results
     • Reduced operational complexity
     • But: limited by the capacity of 1 node
     • Storage overhead
     • Scalability limitation
     (Diagram: Consumer Group 1 = Server 1, Consumer Group 2 = Server 2; 3 partitions, 2 replicas)
  18. @apachepinot | @KishoreBytes Issues with the Kafka Consumer Group based solution
     Issue:                       Incorrect Results | Operational Complexity | Storage Overhead | Limited Scalability | Expensive
     Multi-node Consumer Group:           Y         |           Y            |        Y         |          Y          |     Y
     Single-node Consumer Group:          -         |           -            |        Y         |          Y          |     Y
  19. @apachepinot | @KishoreBytes Problem 1: Lack of control with the Kafka Rebalancer. Solution: Take control of partition assignment.
  20. @apachepinot | @KishoreBytes Partition Level Consumption
     • Single coordinator (the Pinot Controller) across all replicas
     • Creates the cluster state - a mapping from partition to servers, segment state, and offsets
     Cluster State (3 partitions, 2 replicas, Pinot Servers S1-S3):
       Partition 0: S1, S2 - CONSUMING, start offset 20
       Partition 1: S3, S1 - CONSUMING, start offset 20
       Partition 2: S2, S3 - CONSUMING, start offset 20
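
The cluster-state table above can be modeled roughly as below; the class and field names are illustrative, not the actual Pinot metadata schema.

import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Rough model of the per-partition state the controller maintains.
public class ClusterStateSketch {

    enum SegmentState { CONSUMING, ONLINE }

    // One row of the cluster-state table: which servers host a partition's
    // current segment, what state it is in, and its offset range.
    record PartitionSegment(int partition,
                            List<String> servers,
                            SegmentState state,
                            long startOffset,
                            Long endOffset) {}       // endOffset is null while CONSUMING

    private final Map<Integer, PartitionSegment> state = new ConcurrentHashMap<>();

    void put(PartitionSegment segment) {
        state.put(segment.partition(), segment);
    }

    public static void main(String[] args) {
        ClusterStateSketch cs = new ClusterStateSketch();
        // The table from the slide: 3 partitions, 2 replicas, all consuming from offset 20.
        cs.put(new PartitionSegment(0, List.of("S1", "S2"), SegmentState.CONSUMING, 20, null));
        cs.put(new PartitionSegment(1, List.of("S3", "S1"), SegmentState.CONSUMING, 20, null));
        cs.put(new PartitionSegment(2, List.of("S2", "S3"), SegmentState.CONSUMING, 20, null));
    }
}
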
  21. @apachepinot | @KishoreBytes Partition Level Consumption
     • All actions are determined by the cluster state
     • The cluster state tells servers which partitions to consume
     (Cluster State as above: partitions 0-2, all CONSUMING from start offset 20)
  22. @apachepinot | @KishoreBytes Partition Level Consumption
     • Periodically, consuming segments try to commit their segment by reporting the end offset to the controller (offsets 80, 110, 110 in the diagram)
     • Thresholds for commit are configurable - time based, rows based, size based
     (Cluster State as above: partitions 0-2 CONSUMING from start offset 20)
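
A sketch of that threshold check, assuming a segment commits when any of a time, row-count, or size limit is crossed; the limit values and names here are made up for illustration, not Pinot's actual configuration keys.

import java.time.Duration;
import java.time.Instant;

// Illustrative thresholds deciding when a consuming segment should
// try to commit (i.e. report its end offset to the controller).
public class CommitThresholds {
    private final Duration maxAge;      // time based
    private final long maxRows;         // rows based
    private final long maxSizeBytes;    // size based

    public CommitThresholds(Duration maxAge, long maxRows, long maxSizeBytes) {
        this.maxAge = maxAge;
        this.maxRows = maxRows;
        this.maxSizeBytes = maxSizeBytes;
    }

    public boolean shouldCommit(Instant segmentStartTime, long rowsIndexed, long estimatedSizeBytes) {
        return Duration.between(segmentStartTime, Instant.now()).compareTo(maxAge) >= 0
                || rowsIndexed >= maxRows
                || estimatedSizeBytes >= maxSizeBytes;
    }

    public static void main(String[] args) {
        CommitThresholds t = new CommitThresholds(Duration.ofHours(6), 5_000_000, 500L * 1024 * 1024);
        System.out.println(t.shouldCommit(Instant.now().minus(Duration.ofHours(7)), 100_000, 10_000_000));
        // -> true: the time threshold has been crossed
    }
}
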
  23. @apachepinot | @KishoreBytes Partition Level Consumption
     • Replicas send a commit request; the controller picks 1 winner
     • Controller updates the cluster state: partition 0 (S1, S2) moves from CONSUMING to ONLINE
     (Partitions 1 and 2 are still CONSUMING from start offset 20; diagram offsets: 80, 110, 110)
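
A sketch of the controller side of that commit step, assuming a simple first-committer-wins rule; the real winner-selection logic in Pinot is more involved, and the names below are illustrative.

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Controller-side handling of segment commit requests for one table.
public class CommitCoordinatorSketch {

    record CommittedSegment(String winnerServer, long endOffset) {}

    // partition -> committed segment metadata, once a winner is chosen
    private final Map<Integer, CommittedSegment> committed = new ConcurrentHashMap<>();

    // Called when a server reports "I have consumed partition p up to endOffset
    // and would like to commit". Returns true only for the single winner.
    public boolean tryCommit(int partition, String server, long endOffset) {
        // putIfAbsent makes the first committer the winner; everyone else is told no.
        return committed.putIfAbsent(partition, new CommittedSegment(server, endOffset)) == null;
    }

    public static void main(String[] args) {
        CommitCoordinatorSketch controller = new CommitCoordinatorSketch();
        System.out.println(controller.tryCommit(0, "S1", 110));   // true  -> S1 wins, builds the segment
        System.out.println(controller.tryCommit(0, "S2", 110));   // false -> S2 waits for the winner
    }
}
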
  24. @apachepinot | @KishoreBytes Partition Level Consumption
     • The winner builds the segment
     • Only 1 server persists the segment to the deep store
     • Only 1 copy is stored
     (Cluster state: partition 0 ONLINE on S1, S2; partitions 1 and 2 still CONSUMING)
  25. @apachepinot | @KishoreBytes Partition Level Consumption
     • All other replicas either:
       ◦ Download the segment from the deep store
       ◦ Or build their own segment, if their data is equivalent
     • Segment equivalence
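
A sketch of that decision on each non-winning replica, under the assumption that "equivalent data" means the replica consumed exactly up to the committed end offset; the helper names are illustrative.

// What a non-winning replica does once the controller marks a segment ONLINE.
public class ReplicaCatchUpSketch {

    interface DeepStore { byte[] download(String segmentName); }

    // If this replica consumed exactly the same offset range as the committed
    // segment, its data is equivalent and it can build its own copy locally;
    // otherwise it discards its in-memory data and downloads the winner's segment.
    static void onSegmentCommitted(String segmentName,
                                   long committedEndOffset,
                                   long myConsumedEndOffset,
                                   DeepStore deepStore) {
        if (myConsumedEndOffset == committedEndOffset) {
            buildLocalSegment(segmentName);                 // segment equivalence: no download needed
        } else {
            byte[] segment = deepStore.download(segmentName);
            loadSegment(segmentName, segment);              // replace in-memory data with the winner's copy
        }
    }

    static void buildLocalSegment(String segmentName) { /* omitted */ }
    static void loadSegment(String segmentName, byte[] data) { /* omitted */ }
}
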
  26. @apachepinot | @KishoreBytes Partition Level Consumption
     • A new segment state is created for partition 0: S1, S2 CONSUMING from offset 110
     • It starts where the previous segment left off (previous segment: ONLINE, offsets 20 to 110)
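
Continuing the controller sketch, the roll-over to the next consuming segment might look like this; the record and field names are again illustrative.

import java.util.List;

// After partition 0's segment is committed as ONLINE with end offset 110,
// the controller creates the next CONSUMING segment for the same replicas,
// starting exactly where the previous one left off.
public class SegmentRollOverSketch {

    record SegmentMeta(int partition, List<String> servers,
                       String state, long startOffset, Long endOffset) {}

    static SegmentMeta nextConsumingSegment(SegmentMeta committed) {
        return new SegmentMeta(committed.partition(),
                               committed.servers(),
                               "CONSUMING",
                               committed.endOffset(),   // new start offset = previous end offset
                               null);                   // end offset unknown until the next commit
    }

    public static void main(String[] args) {
        SegmentMeta committed = new SegmentMeta(0, List.of("S1", "S2"), "ONLINE", 20, 110L);
        System.out.println(nextConsumingSegment(committed));
        // -> SegmentMeta[partition=0, servers=[S1, S2], state=CONSUMING, startOffset=110, endOffset=null]
    }
}
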
  27. @apachepinot | @KishoreBytes Partition Level Consumption
     • The same happens for every partition; each partition is independent of the others
     Cluster State:
       Partition 0: S1, S2 - ONLINE, offsets 20 to 110; new segment CONSUMING from 110
       Partition 1: S3, S1 - ONLINE, offsets 20 to 120; new segment CONSUMING from 120
       Partition 2: S2, S3 - ONLINE, offsets 20 to 100; new segment CONSUMING from 100
  28. @apachepinot | @KishoreBytes Capacity Expansion
     • Cluster state table is updated to include the new server (S4)
     • Consuming segment - restart consumption using the offset in the cluster state
     • Completed Pinot segment - download from the deep store
     • Easy to handle changes in replication/partitions
     • No duplicates!
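
A sketch of what a newly added server (S4) does with each segment the updated cluster state assigns to it; the types here are illustrative rather than Pinot's real interfaces.

import java.util.List;

// What a newly added server does with each segment assigned to it
// by the updated cluster state: no rebalancer involved, no duplicates.
public class NewServerCatchUpSketch {

    enum State { CONSUMING, ONLINE }

    record AssignedSegment(String name, State state, long startOffset) {}

    interface DeepStore { byte[] download(String segmentName); }
    interface StreamConsumer { void startConsuming(String segmentName, long fromOffset); }

    static void bootstrap(List<AssignedSegment> assignments, DeepStore deepStore, StreamConsumer consumer) {
        for (AssignedSegment seg : assignments) {
            switch (seg.state()) {
                // Completed Pinot segment: fetch the single authoritative copy.
                case ONLINE -> deepStore.download(seg.name());
                // Consuming segment: restart consumption from the offset in the cluster state,
                // so the new replica sees exactly the same rows as the existing ones.
                case CONSUMING -> consumer.startConsuming(seg.name(), seg.startOffset());
            }
        }
    }
}
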
  29. @apachepinot | @KishoreBytes Node Failures
     • At least 1 replica is still alive
     • No complex operations needed
     (Diagram: Controller with Servers S1-S4; 3 partitions, 2 replicas)
  30. @apachepinot | @KishoreBytes Scalability
     • Easily add nodes (S5, S6)
     • Segment equivalence = smart segment assignment + smart query routing, e.g. separating Completed Servers from Consuming Servers
  31. @apachepinot | @KishoreBytes Summary
     Issue:                       Incorrect Results | Operational Complexity | Storage Overhead | Limited Scalability | Expensive
     Multi-node Consumer Group:           Y         |           Y            |        Y         |          Y          |     Y
     Single-node Consumer Group:          -         |           -            |        Y         |          Y          |     Y
     Partition Level Consumers:           -         |           -            |        -         |          -          |     -