Leveraging Pinot’s Plugin Architecture to Build a Unified Stream Decoder (Mike Davis, DoorDash) | RTA Summit 2023

by StarTree

Slide 1

Slide 1 text

Unified Stream Decoder: Leveraging Pinot's Plugin Architecture
Mike Davis
RTA Summit, April 25th, 2023

Slide 2

Slide 2 text

Use-Cases for Real-time Analytics

Slide 3

Slide 3 text

CONFIDENTIAL

Monitor ETA Models

ETAs (estimated times of arrival) are a common feature within the DoorDash app. They give the customer a general idea of when their order will be delivered. We're continuously working to improve the accuracy of this calculation and need to monitor the real vs. computed results in real time.

Slide 4

Slide 4 text

Track Experimentation Rollouts

DoorDash has thousands of experiments running every month through an internal self-serve platform named Curie. Users need the ability to monitor the rollout of their experiments and confirm they're reaching the desired audience.

*Meet Dash-AB, May 2022
**Experimentation Platform, Sep 2020

Slide 5

Slide 5 text

Ads Campaign Reporting

Ads allow customers to boost a vendor's visibility within the DoorDash app. Customers want to know in real time how their ads are performing.

Slide 6

Slide 6 text

Real-time Stream Consumer

Slide 7

Slide 7 text

Pinot Table Types

Offline Table
● Batch loaded
  ○ Spark, Flink, Minions
  ○ Hourly, daily
● Offline process, outside of Pinot
  ○ Convert data files into Segments
  ○ Tell Pinot about the new segments
● Write-once, read-many

Real-Time Table
● Stream ingested
  ○ Apache Kafka
  ○ Amazon Kinesis
  ○ Apache Pulsar
● Pinot Servers are direct consumers
  ○ Convert streams into Segments
  ○ Tell Pinot about the new segments
  ○ In-flight events are also queryable
● Continuously writing

Slide 8

Slide 8 text

Kafka Real-Time Configuration

Consumer
● streamType (e.g. Kafka)
● stream.kafka.topic.name
● stream.kafka.broker.list
● stream.kafka.consumer.type (LLC vs HLC)
● stream.kafka.consumer.factory.class.name
● Additional consumer-dependent configs
  ○ SSL
  ○ Authentication

Decoder
● stream.kafka.decoder.class.name
  ○ JSONMessageDecoder
  ○ KafkaAvroMessageDecoder
  ○ SimpleAvroMessageDecoder
  ○ KafkaConfluentSchemaRegistryAvroMessageDecoder
  ○ CSVMessageDecoder
  ○ ProtoBufMessageDecoder
● Decoder-dependent configs:
  ○ stream.kafka.decoder.prop.schema.registry.rest.url

Slide 9

Slide 9 text

Why do we need a custom decoder?

Slide 10

Slide 10 text

Protobuf and gRPC are the new standard

By the end of 2020, DoorDash had mostly transitioned to a microservices architecture.
gRPC was widely adopted, so Protobuf was the encoding of choice for most systems.
Real-time event processing via Flink also adopted Protobuf encoding.
Custom producer and consumer libraries abstracted out the serialization frameworks.

*How DoorDash Transitioned from a Monolith to Microservices, Dec 2020
**Building Scalable Real Time Event Processing with Kafka and Flink, Aug 2022

Slide 11

Slide 11 text

Avro vs Protobuf

Protobuf >> Avro

Slide 12

Slide 12 text

Just use the ProtoBufMessageDecoder?

Sample Configuration

  "streamType": "kafka",
  "stream.kafka.decoder.class.name": "org.apache.pinot.plugin.inputformat.protobuf.ProtoBufMessageDecoder",
  "stream.kafka.decoder.prop.descriptorFile": "file:///tmp/Workspace/protobuf/metrics.desc",
  "stream.kafka.decoder.prop.protoClassName": "Metrics"

● Does NOT support Schema Registry (open TODO)
● DoorDash implementation of Protobuf schema registry is not compatible

*Pinot Input Formats: Protocol Buffers

Slide 13

Slide 13 text

Workarounds

● Require customers to use Avro natively
● Replicate their topic to another topic in Avro
● Use an OFFLINE table instead

INCREASED ONBOARDING FRICTION
● Most customers were unaware of the serialization format
● Error messages were vague and
● Back-and-forth development

INCREASED OVERHEAD
● Maintaining multiple topics
● Operating another stream processing job

SHOULD JUST WORK
● Solution needed to work with existing streams
● Avoid bespoke integrations
● Work with existing Data Platform solutions

Slide 14

Slide 14 text

Walk-through of the Pinot Stream SPI

Slide 15

Slide 15 text

How it works

Slide 16

Slide 16 text

Pinot Plugins via SPI

*Apache Pinot: Plugins

Slide 17

Slide 17 text

What is SPI?

*Service Provider Interface, Wikipedia
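As a rough illustration of the SPI idea, the sketch below defines a service interface, a provider implementation, and a loader that instantiates whichever implementation is named in configuration. This is similar in spirit to selecting a decoder through stream.kafka.decoder.class.name. The names `SketchDecoder`, `UpperCaseSketchDecoder`, and `SketchDecoderLoader` are hypothetical and are not Pinot's actual API.

```java
import java.nio.charset.StandardCharsets;
import java.util.Collections;
import java.util.Locale;
import java.util.Map;

// Hypothetical service interface; Pinot's real decoder interface differs.
interface SketchDecoder {
    Map<String, Object> decode(byte[] payload);
}

// A provider implementation, as a plugin jar might ship one.
class UpperCaseSketchDecoder implements SketchDecoder {
    @Override
    public Map<String, Object> decode(byte[] payload) {
        String text = new String(payload, StandardCharsets.UTF_8);
        return Collections.singletonMap("value", text.toUpperCase(Locale.ROOT));
    }
}

class SketchDecoderLoader {
    // Instantiate the implementation whose class name comes from configuration.
    static SketchDecoder load(String className) {
        try {
            Class<?> clazz = Class.forName(className);
            return (SketchDecoder) clazz.getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalArgumentException("cannot load decoder: " + className, e);
        }
    }
}
```

The caller only depends on the interface, so swapping in a different implementation is a configuration change rather than a code change.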

Slide 18

Slide 18 text

Stream Ingestion Plugin

● StreamConsumerFactory
● PartitionLevelConsumer
● StreamLevelConsumer
● StreamMetadataProvider
● StreamMessageDecoder

*Stream Ingestion Plugin

Slide 19

Slide 19 text

Custom Decoder Implementation

● Started by understanding KafkaConfluentSchemaRegistryAvroMessageDecoder
● Decoder only gets the Kafka payload :(

*KafkaConfluentSchemaRegistryAvroMessageDecoder.java

Slide 20

Slide 20 text

Kafka Record Header

“Record headers give you the ability to add some metadata about the Kafka record, without adding any extra information to the key/value pair of the record itself”

● Kafka headers were introduced in Kafka 0.11.0
● A record header consists of a String key and a byte-array value
● Multiple values per key are supported

Kafka events at DoorDash leverage the record header to specify the encoding.

*Top 5 Things Every Apache Kafka Developer Should Know
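To make the header-based approach concrete, here is a minimal sketch of a decoder that reads the encoding from the record headers and hands the payload to a matching per-format decoder. It is not DoorDash's implementation: the header key `encoding`, the class `UnifiedDecoderSketch`, and the simplification of headers to one value per key are all assumptions made for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

class UnifiedDecoderSketch {
    // Hypothetical header key; the talk does not name the key actually used.
    static final String ENCODING_HEADER = "encoding";

    // Encoding name -> decoder for that format (decoded rows simplified to String).
    private final Map<String, Function<byte[], String>> decoders = new HashMap<>();

    void register(String encoding, Function<byte[], String> decoder) {
        decoders.put(encoding, decoder);
    }

    // headers: record header key -> raw value (simplified to one value per key).
    String decode(Map<String, byte[]> headers, byte[] payload) {
        byte[] raw = headers.get(ENCODING_HEADER);
        if (raw == null) {
            throw new IllegalArgumentException("missing encoding header");
        }
        String encoding = new String(raw, StandardCharsets.UTF_8);
        Function<byte[], String> decoder = decoders.get(encoding);
        if (decoder == null) {
            throw new IllegalArgumentException("unsupported encoding: " + encoding);
        }
        return decoder.apply(payload);
    }
}
```

This only works if the decoder can see the headers, which is why a decoder that receives the payload alone is not enough.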

Slide 21

Slide 21 text

Stream Ingestion Plugin

● StreamConsumerFactory
● PartitionLevelConsumer
● StreamLevelConsumer
● StreamMetadataProvider
● StreamMessageDecoder

*Stream Ingestion Plugin

Slide 22

Slide 22 text

Kafka Partitions in Review

● Topics in Kafka are made up of one to many partitions
● The number of partitions is defined for each topic and is constant
● Each partition is assigned to a Kafka Broker
● Example: a topic with 100 partitions and 10 brokers has 10 partitions per broker

Partitions allow for horizontal scaling of a topic.

*Consuming and Indexing rows in Realtime

Slide 23

Slide 23 text

PartitionLevelConsumer vs StreamLevelConsumer

StreamLevelConsumer, aka HighLevel (HLC): consume data without control over the partitions

*Consuming and Indexing rows in Realtime

Slide 24

Slide 24 text

PartitionLevelConsumer vs StreamLevelConsumer

PartitionLevelConsumer, aka LowLevel (LLC): consume data from each partition with offset management

*Consuming and Indexing rows in Realtime
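The idea of consuming a single partition with explicit offset management can be sketched as follows. The partition is modeled as an in-memory list, and `PartitionFetchSketch` and its `fetch` method are hypothetical names, not Pinot's PartitionLevelConsumer API. The point is only that the caller supplies a start offset and gets back the offset to resume from.

```java
import java.util.ArrayList;
import java.util.List;

class PartitionFetchSketch {
    // Result of one fetch: the messages read and the offset to resume from.
    static final class Batch {
        final List<String> messages;
        final long nextOffset;

        Batch(List<String> messages, long nextOffset) {
            this.messages = messages;
            this.nextOffset = nextOffset;
        }
    }

    // Read up to maxMessages from one partition, starting at startOffset.
    static Batch fetch(List<String> partition, long startOffset, int maxMessages) {
        if (startOffset < 0 || maxMessages < 0) {
            throw new IllegalArgumentException("offset and batch size must be non-negative");
        }
        int from = (int) Math.min(startOffset, (long) partition.size());
        int to = (int) Math.min((long) from + maxMessages, (long) partition.size());
        return new Batch(new ArrayList<>(partition.subList(from, to)), to);
    }
}
```

Because the consumer owns the offset, it can restart from a known position after a failure instead of relying on a consumer group to decide what it reads.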

Slide 25

Slide 25 text

PartitionLevelConsumer Implementation

*Consuming and Indexing rows in Realtime

Slide 26

Slide 26 text

DoorDashStreamMessageDecoder

Slide 27

Slide 27 text

Deployment

Slide 28

Slide 28 text

Bundling with Docker

  FROM gradle:jdk11 as builder
  WORKDIR /home/gradle/src
  RUN gradle --no-daemon build

  FROM apachepinot/pinot:release-0.11.0
  COPY --from=builder /home/gradle/src/pinot-plugins/build/libs/pinot-plugins.jar /opt/pinot/plugins/doordash/plugins.jar

1) Build our custom assets
2) Copy them into the base image
3) Base image builds classpath under plugins dir

*Consuming and Indexing rows in Realtime

Slide 29

Slide 29 text

Deploying via Helm

values.yaml (default):

  image:
    repository: apachepinot/pinot
    tag: latest

prod-values.yaml:

  image:
    repository: /pinot-deploy
    tag: 1.23.0

Slide 30

Slide 30 text

Future Plans

Slide 31

Slide 31 text

Dead Message Queue

● Kafka partitioned consumers process events in order (i.e. FIFO)
● What happens when a bad message enters the stream?
● Fail and block, or discard and continue?

Slide 32

Slide 32 text

Dead Message Queue

● Instead of blocking…
● Skip the message and push it into another Kafka topic
● Kafka topic can be written to datalake for recovery
● Consumed by Pinot
  ○ topic_name: String
  ○ timestamp: Timestamp
  ○ error_message: String
  ○ payload: JSON
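A minimal sketch of the skip-and-continue behavior described above, assuming a decoder that throws on a bad message. The class `DeadMessageQueueSketch` is hypothetical, an in-memory list stands in for the second Kafka topic, and the payload is stored as a plain string rather than JSON to keep the example short. The record fields follow the schema listed on the slide.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

class DeadMessageQueueSketch {
    // Stand-in for the dead-message Kafka topic.
    final List<Map<String, Object>> deadMessages = new ArrayList<>();

    // Decode messages in order; a failing message is diverted, not fatal.
    List<String> consume(String topic, List<byte[]> messages, Function<byte[], String> decoder) {
        List<String> rows = new ArrayList<>();
        for (byte[] payload : messages) {
            try {
                rows.add(decoder.apply(payload));
            } catch (RuntimeException e) {
                Map<String, Object> dead = new LinkedHashMap<>();
                dead.put("topic_name", topic);
                dead.put("timestamp", System.currentTimeMillis());
                dead.put("error_message", String.valueOf(e.getMessage()));
                dead.put("payload", new String(payload, StandardCharsets.UTF_8));
                deadMessages.add(dead);
            }
        }
        return rows;
    }
}
```

Messages after the bad one are still processed in order, and the diverted record keeps enough context to diagnose or replay it later.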

Slide 33

Slide 33 text

Default Transformations

● Protobuf#Timestamp
  ○ Seconds: Long
  ○ Nanoseconds: Long
● Common transformations

  "transformConfigs": [
    {
      "columnName": "current_time_seconds",
      "transformFunction": "Groovy({current_time.seconds}, current_time)"
    },
    {
      "columnName": "current_time_ms",
      "transformFunction": "Groovy({timestamp.seconds * 1000 + timestamp.nanos.intdiv(1000000)}, timestamp)"
    }
  ]
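The arithmetic in the second transform (seconds times 1000, plus nanoseconds integer-divided by 1,000,000) can be checked in isolation. The sketch below is a plain Java restatement of that formula. `TimestampMillisSketch` is a hypothetical helper, not part of Pinot, and it assumes a non-negative nanosecond component.

```java
class TimestampMillisSketch {
    // Convert a (seconds, nanos) timestamp into epoch milliseconds.
    // Sub-millisecond precision is truncated, as with Groovy's intdiv.
    static long toEpochMillis(long seconds, long nanos) {
        return seconds * 1000L + nanos / 1_000_000L;
    }
}
```

For example, 1682380800 seconds and 123456789 nanoseconds become 1682380800123 milliseconds.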