
Gently Down the Stream with Apache Pinot (Navina Ramesh, StarTree) | RTA Summit 2023


Streaming systems like Apache Kafka, Amazon Kinesis, and Google Pub/Sub have become the de facto standard for capturing real-time events and CDC events, which are then ingested into an OLAP system like Apache Pinot to derive real-time insights from the data.

However, most of the time the data in the stream needs to undergo transformations before entering an OLAP system in order to be useful for a user-facing application. Such pre-processing is typically achieved through stream processing pipelines running on systems like Apache Flink, Apache Samza, Kafka Streams, etc. While this approach works, it adds operational overhead that is expensive and tedious to maintain.

In this talk, we will explore some powerful real-time ingestion features in Apache Pinot that almost eliminate the need for stream processing pipelines. From ingestion operations like filtering and column transformations to handling CDC data from Debezium-supported sources, Apache Pinot reduces the effort needed to build a user-facing analytical application.

StarTree
May 23, 2023

Transcript

  1. Gently Down the Stream with Apache Pinot
    Navina Ramesh
    Software Engineer,
    StarTree


  2. Why Realtime Analytics Matters?
    2


  3. Why Realtime Analytics Matters?
    ● Analytics helps make better decisions by providing insights into data.
    3
    Events → Insight → Action


  4. Why Realtime Analytics Matters?
    4
    ● Realtime analytics reduces the time to go from data to insights and actions
    [Chart: Value vs. Time, highlighting Realtime]


  5. Examples
    5


  6. Examples of Realtime Analytics
    Total users: 700 Million
    QPS: 10,000+
    Latency SLA: < 100 ms at p99
    Freshness: Seconds


  7. Examples of Real-Time Analytics
    Missed orders · Inaccurate orders · Downtime · Top selling items · Menu item feedback
    Total users: 500,000+
    QPS: 100s
    Latency SLA: < 100 ms at p99
    Freshness: Seconds - Minutes


  8. Characteristics of a Realtime Analytics Platform
    ● Query latency ~sub-second to seconds
    ● Data freshness ~seconds
    ● QPS ~ order of thousands
    8
    [Diagram: 1000s of QPS · Sub-second Latency · Seconds Freshness · Highly Available · Scalable · Cost Effective · Realtime Ingestion · Velocity · High Dimensionality]


  9. From Stream to Pinot
    9


  10. 10
    From Stream to Pinot
    Data Semantics · Data Partitioning · Multiple data sources · Data Transformation · Derived Data · Complex type / Nested Fields · Data Granularity


  11. Realtime Ingestion in Apache Pinot
    11


  12. Realtime Ingestion in Apache Pinot
    ● Partition-level, Replicated Consumption Model
    ● Periodic Segment Creation based on size/time thresholds (see the sketch below)
    ● Data is ready for query as soon as it is ingested
    12
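    A minimal sketch of the stream settings that drive this consumption model, assuming a Kafka source; the topic name, broker list, and flush thresholds below are illustrative placeholders rather than values from the talk:

    Table Config:
    "streamConfigs": {
      "streamType": "kafka",
      "stream.kafka.consumer.type": "lowlevel",
      "stream.kafka.topic.name": "ads",
      "stream.kafka.broker.list": "localhost:9092",
      "stream.kafka.consumer.factory.class.name": "org.apache.pinot.plugin.stream.kafka20.KafkaConsumerFactory",
      "stream.kafka.decoder.class.name": "org.apache.pinot.plugin.stream.kafka.KafkaJSONMessageDecoder",
      "realtime.segment.flush.threshold.time": "6h",
      "realtime.segment.flush.threshold.rows": "5000000"
    }

    Whichever flush threshold is reached first seals the consuming segment, which is the periodic segment creation described above.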


  13. High Level Architecture
    13
    [Diagram: a Pinot Table is made up of Pinot Data Segments; each segment stores column data (… ColN) plus indexes]


  14. Realtime Ingestion
    14
    Replicated & Partitioned Consumption
    [Diagram: Controller + Zookeeper coordinate Server-1, Server-2, Server-3, each consuming segments seg0_0, seg1_0, seg2_0; Brokers query the consuming segments]
    Seg0_0 -> Server-1, Server-2
    Seg1_0 -> Server-2, Server-3
    Seg2_0 -> Server-3, Server-1


  15. 15
    Realtime Ingestion
    [Diagram: consuming segments seg0_0, seg1_0, seg2_0 are sealed (made immutable) and persisted to the Deep Store]


  16. 16
    Realtime Ingestion
    Consuming, Indexing & Serving
    [Diagram: sealed segments seg0_0, seg1_0, seg2_0 sit in the Deep Store while new consuming segments seg0_1, seg1_1, seg2_1 start on Server-1, Server-2, Server-3]
    Seg0_0 -> Server-1, Server-2
    Seg1_0 -> Server-2, Server-3
    Seg2_0 -> Server-3, Server-1
    Seg0_1 -> Server-1, Server-2
    Seg1_1 -> Server-2, Server-3
    Seg2_1 -> Server-3, Server-1

    View Slide

  17. Peek into Realtime Engine
    17


  18. Apache Pinot Realtime Engine
    ● Pluggable Design
    ● Components
    ○ Stream Consumer
    ○ Decoder, Transform Pipeline, Indexer
    ○ Upsert Metadata Manager
    ○ Segment Completion Handler
    ● Components allow Pinot to:
    ○ Fetch data from Source
    ○ Parse and apply transforms and aggregations on data
    ○ Apply transforms based on data semantics
    ○ Build and store Segments
    18


  19. Realtime Engine Component - Stream Consumer
    Fetches from a data source
    [Pipeline: Stream Consumer → Decode → Transform → Index → Segment Completion Handler; Upsert Manager; Controller; Zookeeper]


  20. Realtime Engine Component - Decoder
    Deserialize data into a Pinot row
    Decoders: CSV, JSON, Avro, Protobuf, …, Debezium Decoder
    [Pipeline: Stream Consumer → Decode → Transform → Index → Segment Completion Handler; Upsert Manager; Controller; Zookeeper]


  21. Realtime Engine Component - Transform Pipeline
    Applies transformation functions: Filtering, Flattening, Aggregation
    [Pipeline: Stream Consumer → Decode → Transform → Index → Segment Completion Handler; Upsert Manager; Controller; Zookeeper]


  22. Realtime Engine Components - Indexer
    Indexes data in-memory; ready to query
    [Pipeline: Stream Consumer → Decode → Transform → Index → Segment Completion Handler; Upsert Manager; Controller; Zookeeper]


  23. Realtime Engine Components - Upsert Manager
    Maintains metadata for upsert semantics
    [Pipeline: Stream Consumer → Decode → Transform → Index → Segment Completion Handler; Upsert Manager; Controller; Zookeeper]


  24. Realtime Engine Components
    Message Processor
    [Pipeline: Stream Consumer → Decode → Transform → Index → Segment Completion Handler; Upsert Manager; Controller; Zookeeper]


  25. 25
    From Stream to Pinot
    Data Semantics · Data Partitioning · Multiple data sources · Data Transformation · Derived Data · Complex type / Nested Fields · Data Granularity


  26. Data Transformation & Derived Data
    26


  27. Transform Functions in Apache Pinot
    27
    ● Large library of built-in transform functions
    ● Define custom transforms (see the sketch below)
    ○ Using UDFs
    ○ Using Groovy
    ○ Using @Scalar Annotation on Java transform methods
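    As an illustration of the Groovy option, a hedged sketch of a custom ingestion transform; the column names (fullName, firstName, lastName) are hypothetical and not taken from the talk:

    Table Config:
    "ingestionConfig": {
      "transformConfigs": [{
        "columnName": "fullName",
        "transformFunction": "Groovy({firstName + ' ' + lastName}, firstName, lastName)"
      }]
    }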


  28. Transform Function
    28
    Number of clicks per hour on a given day
    SELECT SUM(clicks) FROM ads
    WHERE toEpochDays(timestamp) = '2023-03-30 00:00:00'
    GROUP BY toEpochHours(timestamp)
    Transform Functions in Apache Pinot
    ad_id views clicks timestamp
    1 7 4 2023-03-30 00:00:00
    2 10 2 2023-03-30 00:00:00
    3 1 2 2023-03-30 00:00:01
    1 4 1 2023-03-30 00:00:01
    4 2 1 2023-03-30 00:00:01


  29. 29
    Number of clicks per hour on a given day
    SELECT SUM(clicks) FROM ads
    WHERE toEpochDays(timestamp) = '2023-03-30 00:00:00'
    GROUP BY toEpochHours(timestamp)
    Transform functions - toEpochDays, toEpochHours, SUM
    Transform Functions in Apache Pinot
    ad_id views clicks timestamp
    1 7 4 2023-03-30 00:00:00
    2 10 2 2023-03-30 00:00:00
    3 1 2 2023-03-30 00:00:01
    1 4 1 2023-03-30 00:00:01
    4 2 1 2023-03-30 00:00:01


  30. 30
    Average conversion rate since a given day
    SELECT AVG(clicks/views) FROM ads WHERE
    toEpochDays(timestamp) >= '2023-03-01 00:00:00'
    Transform Functions in Apache Pinot
    ad_id views clicks timestamp
    1 7 4 2023-03-30 00:00:00
    2 10 2 2023-03-30 00:00:00
    3 1 2 2023-03-30 00:00:01
    1 4 1 2023-03-30 00:00:01
    4 2 1 2023-03-30 00:00:01


  31. 31
    Average conversion rate since a given day
    SELECT AVG(clicks/views) FROM ads WHERE
    toEpochDays(timestamp) >= '2023-03-01 00:00:00'
    Transform functions - toEpochDays, AVG
    Transform Functions in Apache Pinot
    ad_id views clicks timestamp
    1 7 4 2023-03-30 00:00:00
    2 10 2 2023-03-30 00:00:00
    3 1 2 2023-03-30 00:00:01
    1 4 1 2023-03-30 00:00:01
    4 2 1 2023-03-30 00:00:01


  32. 32
    Transform Functions in Apache Pinot
    ad_id views clicks timestamp
    1 7 4 2023-03-30 00:00:00
    2 10 2 2023-03-30 00:00:00
    3 1 2 2023-03-30 00:00:01
    1 4 1 2023-03-30 00:00:01
    4 2 1 2023-03-30 00:00:01
    Transform functions are convenient, but they come at the cost of query performance:
    ● Extra calculations are needed per record
    ● Indexes cannot be applied on transformation results
    Use Ingestion Transforms in Pinot!


  33. 33
    Ingestion Transforms in Apache Pinot
    ● Pre-materialize the transformation result as a dedicated Pinot column
    ● Indexes can be applied on the pre-materialized column! (see the sketch below)
    Table Config:
    "ingestionConfig": {
      "transformConfigs": [{
        "columnName": "daysTs",
        "transformFunction": "toEpochDays(timestamp)"
      }, {
        "columnName": "hoursTs",
        "transformFunction": "toEpochHours(timestamp)"
      }]
    ...
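    To get the indexing benefit, an index is then declared on the materialized columns in the same table config. A minimal sketch, assuming a range index on daysTs and an inverted index on hoursTs (the appropriate index type depends on the query pattern):

    "tableIndexConfig": {
      "rangeIndexColumns": ["daysTs"],
      "invertedIndexColumns": ["hoursTs"]
    }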


  34. 34
    ad_id views clicks timestamp
    1 7 4 2023-03-30 00:00:00
    2 10 2 2023-03-30 00:00:00
    3 1 2 2023-03-30 00:00:01
    1 4 1 2023-03-30 00:00:01
    4 2 1 2023-03-30 00:00:01
    Number of clicks per hour on a given day
    SELECT SUM(clicks) FROM ads
    WHERE toEpochDays(timestamp) = '2023-03-30 00:00:00'
    GROUP BY toEpochHours(timestamp)
    SELECT SUM(clicks) FROM ads
    WHERE daysTs = '2023-03-30 00:00:00'
    GROUP BY hoursTs
    Ingestion Transforms in Apache Pinot


  35. 35
    Average conversion rate since a given day
    SELECT AVG(clicks/views) FROM ads WHERE
    toEpochDays(timestamp) >= '2023-03-01 00:00:00'
    SELECT AVG(clicks/views) FROM ads
    WHERE daysTs >= '2023-03-01 00:00:00'
    Ingestion Transforms in Apache Pinot
    Table Config:
    "ingestionConfig": {
      "transformConfigs": [{
        "columnName": "daysTs",
        "transformFunction": "toEpochDays(timestamp)"
      }, {
        "columnName": "hoursTs",
        "transformFunction": "toEpochHours(timestamp)"
      }]
    ...


  36. 36
    Average conversion rate since a given day
    SELECT AVG(clicks/views) FROM ads WHERE
    toEpochDays(timestamp) >= '2023-03-01 00:00:00'
    SELECT AVG(clicks/views) FROM ads
    WHERE daysTs >= '2023-03-01 00:00:00'
    Ingestion Transforms in Apache Pinot
    Table Config:
    "ingestionConfig": {
      "transformConfigs": [{
        "columnName": "daysTs",
        "transformFunction": "toEpochDays(timestamp)"
      }, {
        "columnName": "hoursTs",
        "transformFunction": "toEpochHours(timestamp)"
      }]
    ...


  37. 37
    Derived Columns in Apache Pinot
    ● Column whose value is calculated from other columns
    ● Generated on the fly during ingestion
    ● Can be backfilled into existing segments by reloading the segments on the servers (see the example below)
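    The backfill can be triggered through the controller's segment reload API; a hedged sketch, with host, port, and table name as placeholders:

    curl -X POST "http://localhost:9000/segments/ads_REALTIME/reload"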


  38. 38
    Table Config:
    "ingestionConfig": {
      "transformConfigs": [{
        "columnName": "daysTs",
        "transformFunction": "toEpochDays(timestamp)"
      }, {
        "columnName": "hoursTs",
        "transformFunction": "toEpochHours(timestamp)"
      }, {
        "columnName": "conversionRate",
        "transformFunction": "clicks/views"
      }]
    ...
    Derived Columns in Apache Pinot
    Average conversion rate since a given day
    SELECT AVG(clicks/views) FROM ads
    WHERE daysTs >= '2023-03-30 00:00:00'
    SELECT AVG(conversionRate) FROM ads
    WHERE daysTs >= '2023-03-30 00:00:00'


  39. 39
    From Stream to Pinot
    Data Semantics · Data Partitioning · Multiple data sources · Data Transformation · Derived Data · Complex type / Nested Fields · Data Granularity


  40. 40
    From Stream to Pinot
    Data Semantics · Data Partitioning · Multiple data sources · Data Transformation · Derived Data · Complex type / Nested Fields · Data Granularity


  41. Complex Types & Nested Fields
    41


  42. 42
    Complex Types
    {
    "id": "17244207357",
    "type": "PullRequestEvent",
    "repo": { "id": 326604100, "name": "jthegedus/svelte-adapter-firebase" },
    "payload": {
    "number": 120,
    "pull_request": {
    "id": 694073641,
    "user": { "login": "github-actions[bot]", "id": 41898282, "type": "Bot" },
    "body": ":robot: I have created a release ...",
    "created_at": "2021-07-21T05:59:14Z",
    "labels": [
    { "id": 3184575526, "name": "autorelease: pending" },
    { "id": 2412457281, "name": "autorelease: testing" }
    ]
    }
    }
    }
    Complex Types / Nested Fields
    (Callouts: Nested Fields · Array)


  43. 43
    Complex Types
    {
    "id": "17244207357",
    "type": "PullRequestEvent",
    "repo": { "id": 326604100, "name": "jthegedus/svelte-adapter-firebase" },
    "payload": {
    "number": 120,
    "pull_request": {
    "id": 694073641,
    "user": { "login": "github-actions[bot]", "id": 41898282, "type": "Bot" },
    "body": ":robot: I have created a release ...",
    "created_at": "2021-07-21T05:59:14Z",
    "labels": [
    { "id": 3184575526, "name": "autorelease: pending" },
    { "id": 2412457281, "name": "autorelease: testing" }
    ]
    }
    }
    }
    Complex Types / Nested Fields
    (Callouts: Nested Fields · Array)


  44. 44
    ● Store top-level fields as JSON string and extract during query time
    Complex Types / Nested Fields
    SELECT id, type,
      json_extract_scalar("repo", '$.name', 'STRING', 'null') repo_name,
      json_extract_scalar("payload", '$.number', 'INT', 'null') payload_number,
      json_extract_scalar(json_extract_scalar("payload", '$.pull_request', 'STRING', 'null'), '$.labels', 'STRING', 'null') payload_pullrequest_labels,
      json_extract_scalar("payload", '$.created_at', 'STRING', 'null') payload_created_at
    FROM PullRequestEvent


  45. 45
    ● Store top-level fields as JSON string and extract during query time
    Complex Types / Nested Fields
    SELECT id, type,
      json_extract_scalar("repo", '$.name', 'STRING', 'null') repo_name,
      json_extract_scalar("payload", '$.number', 'INT', 'null') payload_number,
      json_extract_scalar(json_extract_scalar("payload", '$.pull_request', 'STRING', 'null'), '$.labels', 'STRING', 'null') payload_pullrequest_labels,
      json_extract_scalar("payload", '$.created_at', 'STRING', 'null') payload_created_at
    FROM PullRequestEvent
    ● Not optimized for queries:
    ○ Transform functions are applied at query time
    ○ For every record, a JSON object is constructed from the stored JSON string
    ● Cannot index on nested JSON types
    ● Storage overhead from storing the entire JSON string


  46. 46
    ● Use Ingestion Transforms!
    ○ Specify complexTypeConfig with delimiter to flatten the maps recursively
    Complex Types / Nested Fields
    Schema:
    "dimensionFieldSpecs": [
      { "name": "id", "dataType": "INT" },
      { "name": "type", "dataType": "STRING" },
      { "name": "repo.name", "dataType": "STRING" },
      { "name": "payload.pull_request.id", "dataType": "INT" },
      { "name": "payload.pull_request.labels", "dataType": "STRING" }
    ],
    "dateTimeFieldSpecs": [
      {
        "name": "payload.pull_request.created_at",
        "dataType": "STRING",
        "format": "1:SECONDS:SIMPLE_DATE_FORMAT:yyyy-MM-dd'T'HH:mm:ss'Z'"
      }
    ]
    Table Config:
    "ingestionConfig": {
      "complexTypeConfig": {
        "delimiter": "."
      }, …
    }


  47. 47
    ● Use Ingestion Transforms!
    ○ Specify complexTypeConfig with delimiter to flatten the maps recursively
    Complex Types / Nested Fields
    Schema:
    "dimensionFieldSpecs": [
      { "name": "id", "dataType": "INT" },
      { "name": "type", "dataType": "STRING" },
      { "name": "repo.name", "dataType": "STRING" },
      { "name": "payload.pull_request.id", "dataType": "INT" },
      { "name": "payload.pull_request.labels", "dataType": "STRING" }
    ],
    "dateTimeFieldSpecs": [
      {
        "name": "payload.pull_request.created_at",
        "dataType": "STRING",
        "format": "1:SECONDS:SIMPLE_DATE_FORMAT:yyyy-MM-dd'T'HH:mm:ss'Z'"
      }
    ]
    Table Config:
    "ingestionConfig": {
      "complexTypeConfig": {
        "delimiter": "."
      }, …
    }
    Resulting row:
    id: 17244207357 | type: PullRequestEvent | repo.name: jthegedus/svelte-adapter-firebase | payload.pull_request.number: 120 | payload.pull_request.labels: [{"id": 3184575526, "name": "autorelease: pending"}, {"id": 2412457281, "name": "autorelease: testing"}] | payload.pull_request.created_at: 2021-07-21T05:59:14Z


  48. 48
    ● Use Ingestion Transforms!
    ○ Specify complexTypeConfig with delimiter to flatten the maps recursively
    Complex Types / Nested Fields
    Schema:
    "dimensionFieldSpecs": [
      { "name": "id", "dataType": "INT" },
      { "name": "type", "dataType": "STRING" },
      { "name": "repo.name", "dataType": "STRING" },
      { "name": "payload.pull_request.id", "dataType": "INT" },
      { "name": "payload.pull_request.labels", "dataType": "STRING" }
    ],
    "dateTimeFieldSpecs": [
      {
        "name": "payload.pull_request.created_at",
        "dataType": "STRING",
        "format": "1:SECONDS:SIMPLE_DATE_FORMAT:yyyy-MM-dd'T'HH:mm:ss'Z'"
      }
    ]
    Table Config:
    "ingestionConfig": {
      "complexTypeConfig": {
        "delimiter": "."
      }, …
    }
    Resulting row:
    id: 17244207357 | type: PullRequestEvent | repo.name: jthegedus/svelte-adapter-firebase | payload.pull_request.number: 120 | payload.pull_request.labels: [{"id": 3184575526, "name": "autorelease: pending"}, {"id": 2412457281, "name": "autorelease: testing"}] | payload.pull_request.created_at: 2021-07-21T05:59:14Z


  49. 49
    ● Use Ingestion Transform!
    ○ Specify fieldsToUnnest within complexTypeConfig to unnest the array
    Complex Types / Nested Fields
    Schema:
    "dimensionFieldSpecs": [
      { "name": "id", "dataType": "INT" },
      { "name": "type", "dataType": "STRING" },
      { "name": "repo.name", "dataType": "STRING" },
      { "name": "payload.pull_request.id", "dataType": "INT" },
      { "name": "payload.pull_request.labels.name", "dataType": "JSON" }
    ],
    "dateTimeFieldSpecs": [
      {
        "name": "payload.pull_request.created_at",
        "dataType": "STRING",
        "format": "1:SECONDS:SIMPLE_DATE_FORMAT:yyyy-MM-dd'T'HH:mm:ss'Z'"
      }
    ]
    Table Config:
    "ingestionConfig": {
      "complexTypeConfig": {
        "delimiter": ".",
        "fieldsToUnnest": ["payload.pull_request.labels"]
      }, …
    }
    Resulting rows:
    id: 17244207357 | type: PullRequestEvent | repo.name: jthegedus/svelte-adapter-firebase | payload.pull_request.number: 120 | payload.pull_request.labels: autorelease: pending | payload.pull_request.created_at: 2021-07-21T05:59:14Z
    id: 17244207357 | type: PullRequestEvent | repo.name: jthegedus/svelte-adapter-firebase | payload.pull_request.number: 120 | payload.pull_request.labels: autorelease: testing | payload.pull_request.created_at: 2021-07-21T05:59:14Z


  50. 50
    From Stream to Pinot
    Data Semantics · Data Partitioning · Multiple data sources · Data Transformation · Derived Data · Complex type / Nested Fields · Data Granularity


  51. 51
    From Stream to Pinot
    Data Semantics · Data Partitioning · Multiple data sources · Data Transformation · Derived Data · Complex type / Nested Fields · Data Granularity


  52. Data Granularity
    52


  53. 53
    Dealing with Data Granularity
    Raw events (millisecond granularity):
    customer_id | product_id | price | timestamp
    205 | car | 1500 | 1571900400000
    206 | truck | 2200 | 1571900400000
    207 | car | 1300 | 1571900400000
    208 | truck | 700 | 1572418800000
    209 | car | 1100 | 1572505200000
    210 | car | 2100 | 1572505200000
    Aggregated view (daily granularity):
    product_id | price | timestamp
    car | 2800 | 10-24-2019 00:00:00
    truck | 2200 | 10-24-2019 00:00:00
    truck | 700 | 10-30-2019 00:00:00
    car | 3200 | 10-31-2019 00:00:00


  54. Dealing with Data Granularity
    54
    customer_id product_id price timestamp
    205 car 1500 1571900400000
    206 truck 2200 1571900400000
    207 car 1300 1571900400000
    208 truck 700 1572418800000
    209 car 1100 1572505200000
    210 car 2100 1572505200000
    211 truck 800 1572678000000
    212 car 800 1572678000000
    213 car 1900 1572678000000
    214 car 1000 1572678000000
    Daily sales value per product
    SELECT COUNT(*), SUM(price)
    FROM sales
    GROUP BY product_id,
    toEpochDays(timestamp)
    Computed at query time
    Aggregated at query time


  55. Ingestion Aggregation
    55
    ● Aggregate Data during Ingestion time
    Table Config:
    { "transformConfigs": [{
    "columnName": "daysSinceEpoch",
    "transformFunction": "toEpochDays(timestamp)"
    }],
    "aggregationConfigs": [{
    "columnName": "total_sales",
    "aggregationFunction": "SUM(price)"
    }, {
    "columnName": "sales_count",
    "aggregationFunction": "COUNT(*)"
    }]
    },
    "tableIndexConfig": {
    "noDictionaryColumns": [
    "sales_count",
    "total_sales"
    ]}}
    product_id sales_count total_sales day
    car 2 2800 10-24-2019 00:00:00
    truck 1 2200 10-24-2019 00:00:00
    truck 1 700 10-30-2019 00:00:00
    car 2 3300 10-31-2019 00:00:00
    truck 1 800 11-02-2019 00:00:00
    car 3 3700 11-02-2019 00:00:00
    Daily sales value per product
    SELECT sales_count, total_sales
    FROM sales
    GROUP BY product_id, day


  56. Ingestion Aggregation
    56
    ● Use ingestion-time aggregation when only aggregated data is required, not the entire raw data
    ● Allowed aggregation functions: SUM, MIN, MAX, COUNT, DISTINCTCOUNTHLL
    Caveats:
    ● All metric columns must have aggregation functions defined.


  57. 57
    From Stream to Pinot
    Data Semantics · Data Partitioning · Multiple data sources · Data Transformation · Derived Data · Complex type / Nested Fields · Data Granularity


  58. 58
    From Stream to Pinot
    Data Semantics · Data Partitioning · Multiple data sources · Data Transformation · Derived Data · Complex type / Nested Fields · Data Granularity


  59. Data semantics
    59


  60. Data Semantics
    ● Data Semantics
    ○ Change Data Capture events - READ, CREATE, INSERT, UPDATE, DELETE
    ○ Multiple events for same record key
    [Diagram: MySQL (mutable records) → Debezium → stream of immutable records]


  61. Data Semantics
    ● Data Semantics
    ○ Change Data Capture events - READ, CREATE, INSERT, UPDATE, DELETE
    ○ Multiple events for same record key
    [Diagram: stream of immutable records carrying multiple changes to the same record]


  62. 62
    Data Semantics
    [ {"name": "test.movies.movies.Key" }, "payload": { "id": "{\"$oid\": \"1002\"}" } },
    {"payload": {
    "before": null,
    "after": "{\"_id\": {\"$oid\": \"1002\"},\"title\": \"Black Panther\",\"rating\": \"PG-13\",\"imdb\":
    7.3}",
    "source": { "connector": "mysql","name": "test","ts_ms": 1670357925000, "snapshot":
    "false","db": "movies", … },
    "op": "c",
    "ts_ms": 1670357926132,
    "transaction": null
    }
    }]
    [ {"name": "test.movies.movies.Key" }, "payload": { "id": "{\"$oid\": \"1002\"}" } },
    {"payload": {
    "before": null,
    "after": "{\"_id\": {\"$oid\": \"1002\"},\"title\": \"Black Panther\",\"rating\": \"PG-13\",\"imdb\":
    7.3}",
    "source": { "connector": "mysql","name": "test","ts_ms": 1670357925000, "snapshot":
    "false","db": "movies", ... },
    "op": "c",
    "ts_ms": 1670357926132,
    "transaction": null
    }
    }]
    movie_id movie_title rating imdb_score timestamp
    1002 Black Panther PG-13 7.3 2022-12-06 20:18:46
    Update record by primary key
    Dealing with Data Semantics


  63. Dealing with Data Semantics
    ID Name Rating IMDB
    1002 Black Panther PG-13 7.3
    [ {"schema": { "type": "struct", "fields": [ ... ],
    "optional": false, "name": "test.movies.movies.Key"
    }, "payload": { "id": "{\"$oid\": \"1002\"}" } },
    { "schema": {
    "type": "struct", "fields": [ ... ], "optional": false,
    "name": "test.movies.movies.Envelope" },
    "payload": {
    "before": null,
    "after": "{\"_id\": {\"$oid\": \"1002\"},\"title\":
    \"Black Panther\",\"rating\": \"PG-13\",\"imdb\": 7.3}",
    "source": {
    "connector": "mysql",
    "name": "test",
    "ts_ms": 1670357925000,
    "snapshot": "false",
    "db": "movies", …
    },
    "op": "c",
    "ts_ms": 1670357926132,
    "transaction": null
    }
    }
    ]
    Create Record
    MySQL
    Debezium


  64. Dealing with Data Semantics
    ID Name Rating IMDB
    1002 Black Panther PG-13 8.5
    [ {"schema": { "type": "struct", "fields": [ ... ],
    "optional": false, "name": "test.movies.movies.Key"
    }, "payload": { "id": "{\"$oid\": \"1002\"}" } },
    { "schema": {
    "type": "struct", "fields": [ ... ], "optional": false,
    "name": "test.movies.movies.Envelope" },
    "payload": {
    "before": "{\"_id\": {\"$oid\": \"1002\"},\"title\":
    \"Black Panther\",\"rating\": \"PG-13\",\"imdb\": 7.3},
    "after": "{\"_id\": {\"$oid\": \"1002\"},\"title\":
    \"Black Panther\",\"rating\": \"PG-13\",\"imdb\": 8.5}",
    "source": {
    "connector": "mysql",
    "name": "test",
    "ts_ms": 16703579961000,
    "snapshot": "false",
    "db": "movies" …
    },
    "op": "u",
    "ts_ms": 1670357986189,
    "transaction": null
    }
    }
    ]
    Update
    MySQL


  65. Dealing with Data Semantics - CREATE
    [Pipeline: Stream Consumer → DebeziumDecoder → Transform → Index; the Upsert Manager maintains Upsert Metadata (Primary Key → Doc Id) alongside the records stored in the segment]


  66. Dealing with Data Semantics - CDC - UPDATE
    Upsert Metadata (Primary Key → Doc Id): 1002 → 0
    Record stored in segment: 1002 | Black Panther | PG-13 | 7.3
    [Pipeline: Stream Consumer → DebeziumDecoder → Transform → Index → Upsert Manager]


  67. Dealing with Data Semantics - CDC
    Upsert Metadata (Primary Key → Doc Id): 1002 → 1
    Records stored in segment:
    1002 | Black Panther | PG-13 | 7.3
    1002 | Black Panther | PG-13 | 8.5
    [Pipeline: Stream Consumer → DebeziumDecoder → Transform → Index → Upsert Manager]
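    To enable this behavior, the table declares a primary key in the schema and an upsert mode in the table config. A minimal sketch, assuming a movies table keyed on movie_id (illustrative names, not the exact config from the talk):

    Schema:
    "primaryKeyColumns": ["movie_id"]

    Table Config:
    "upsertConfig": {
      "mode": "FULL"
    }

    With FULL upsert, queries see only the latest record per primary key; the older rows stay in the segment, as shown above.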


  68. 68
    From Stream to Pinot
    Data Semantics · Data Partitioning · Multiple data sources · Data Transformation · Derived Data · Complex type / Nested Fields · Data Granularity


  69. 69
    From Stream to Pinot
    Data Partitioning · Multiple data sources · Data Transformation · Derived Data · Complex type / Nested Fields · Data Granularity · Data Semantics


  70. Stream Processing Still Needed?
    70


  71. Yes! When is Stream Processing Needed?
    ● Multiple Data sources
    ○ Complex processing of data from multiple sources such as
    ■ Stream-Table join
    ■ Stream-Stream join
    ● Data that requires re-partitioning
    71


  72. Recap
    72


  73. Realtime Ingestion in Pinot
    ● Pluggable architecture
    ● Ingestion-time Data Processing
    ○ Ingestion Transforms
    ○ Derived Columns
    ○ Complex type handling (filtering / flattening)
    ○ Ingestion Aggregation
    ○ Upsert semantics
    ● Use Ingestion features to get data into an optimal form for your use-case
    73


  74. Looking Ahead
    74


  75. Enhancements to Pinot Realtime Engine
    ● Support record deletion in upsert tables (In Progress - GH#10452)
    ● Segment compaction in upsert tables (In Progress - GH#6912)
    ● Support for Data Enrichment
    ● Support ingesting from multiple sources
    75


  76. Thank You!
    dev.startree.ai
    76


  77. Stream Consumer in Apache Pinot
    ● Streaming data system should
    ○ Support or allow a partitioned-view of the data for consumption
    ○ Support or allow deriving a notion of an “offset” or a stream “marker”
    ○ Support some mechanism to “seek” within a stream
    ○ Provide message ordering at a partition-level
    77


  78. Stream Decoders in Apache Pinot
    ● Stream Message Decoders accept the input from a StreamConsumer and convert it into Pinot's data model, represented as a GenericRow
    ● Custom-build your decoder by implementing a simple interface (sketch below):
    interface StreamMessageDecoder<T> {
      GenericRow decode(T payload, GenericRow destination);
    }
    78
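    As an illustration only, a hedged Java sketch of a custom decoder for a hypothetical pipe-delimited payload; the class and column names are made up, and the full StreamMessageDecoder contract in Pinot also includes an init method and a byte-range decode overload that are omitted here:

    // Hypothetical decoder: parses payloads of the form "id|title|rating" into a GenericRow.
    public class PipeDelimitedDecoder implements StreamMessageDecoder<byte[]> {
      @Override
      public GenericRow decode(byte[] payload, GenericRow destination) {
        String[] fields = new String(payload, java.nio.charset.StandardCharsets.UTF_8).split("\\|");
        destination.putValue("id", Integer.parseInt(fields[0]));  // column names are illustrative
        destination.putValue("title", fields[1]);
        destination.putValue("rating", fields[2]);
        return destination;
      }
    }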
