Slide 1

Slide 1 text

No content

Slide 2

Slide 2 text

No content

Slide 3

Slide 3 text

https://runwayml.com/ai-magic-tools/frame-interpolation/

Slide 4

Slide 4 text

Interpolating Time Series Data on-demand

Slide 5

Slide 5 text

Time Series Data ● Track change of value/state mutation over time. ● Discrete (Ex: Orders) vs Continuous (Ex: Status) events. ● Append vs Upsert.

Slide 6

Slide 6 text

Time Series Data: Examples ● Real-time vehicle data: speed, braking, acceleration, etc ● Server performance metrics: CPU, memory, network usage, etc ● Industrial equipments: Telemetry from IOT sensors

Slide 7

Slide 7 text

Time Series Data ● Sparse data. ● Missing data. ● Higher polling interval => Higher the granularity => More resources => Higher the cost.

Slide 8

Slide 8 text

Time Series Data

Slide 9

Slide 9 text

Data Visualization

Slide 10

Slide 10 text

No content

Slide 11

Slide 11 text

Climate Spiral: https://climate.nasa.gov

Slide 12

Slide 12 text

Time Series Charts

Slide 13

Slide 13 text

Past Present Future What happened? Why did it happen? What should I have done? What is happening? Why is it happening? What should I do now? What will happen? Why will it happen? What should I do next? What Why Should Time Series Charts

Slide 14

Slide 14 text

Use Case: Store uptime

Slide 15

Slide 15 text

Use case: Food delivery

Slide 16

Slide 16 text

Use case: Digital Storefronts

Slide 17

Slide 17 text

Store Status Events Dataset

Slide 18

Slide 18 text

Store Uptime

Slide 19

Slide 19 text

Customer facing Uptime Reports

Slide 20

Slide 20 text

Internal KPI: Online Stores count

Slide 21

Slide 21 text

Store Status Change Events Dataset timestamp store_id is_open should_be_open

Slide 22

Slide 22 text

Computing Store Uptime at Scale ● High fidelity pre-aggregation of events ○ Less flexible ○ High cost ● Interpolating raw state transitions on-demand ○ More flexibility ○ Low cost

Slide 23

Slide 23 text

Total Expected Store Uptime store_id1 store_id2 store_id3 store_id4 2022/07/01 11:00 2022/07/01 10:00 2022/07/01 16:00 2022/07/01 21:00 10 hours 6 hours 10 hours 6 hours

Slide 24

Slide 24 text

Actual Store Uptime store_1 store_2 store_3 store_4 2022/07/01 11:00 2022/07/01 10:00 2022/07/01 16:00 2022/07/01 21:00 10 hours 6 hours 9.5 hours 6 hours 2022/07/01 13:00 2022/07/01 13:30

Slide 25

Slide 25 text

Sample Dataset store_id event_ts is_open should_be_open store_1 2023-04-01 11:00 true true store_1 2023-04-01 21:00 false false store_2 2023-04-01 10:00 true true store_2 2023-04-01 16:00 false false store_3 2023-04-01 11:00 true true store_3 2023-04-01 13:00 false true store_3 2023-04-01 13:30 true true store_3 2023-04-01 21:00 false false store_4 2023-04-01 10:00 true true store_4 2023-04-01 16:00 false false

Slide 26

Slide 26 text

Total Store Uptime With Gapfill store_s1 store_s2 store_s3 store_s4 2023/04/01 10:00 2023/04/01 21:00 10 hours 6 hours 10 hours 6 hours

Slide 27

Slide 27 text

Total Expected Store Uptime With Gapfill store_s1 store_s2 store_s3 store_s4 2022/07/01 10:00 2022/07/01 21:00 10 hours 6 hours 10 hours 6 hours

Slide 28

Slide 28 text

Query: Uptime of a specific store SELECT time_col, SUM(*) as total_store_uptime_minutes FROM ( SELECT GapFill(time_col, '1:MILLISECONDS:EPOCH',1656633600,1656720000,'1:MINUTES', '1:DAYS', Fill(status, 'FILL_PREVIOUS_VALUE'), TimeSeriesOn(store_id)), status FROM ( SELECT DATETIMECONVERT(event_ts, '1:SECONDS:EPOCH', '1:MILLISECONDS:EPOCH', '1:MINUTES') as time_col, LastWithTime(is_open, created_at, 'BOOLEAN') as status FROM store_status_events_tbl WHERE (event_ts BETWEEN 1656633600 AND 1656720000) AND (store_id = 'store_id1') GROUP BY time_col ORDER BY time_col ASC LIMIT 100000 ) LIMIT 2000000 ) WHERE status = 'true' GROUP BY time_col LIMIT 2000

Slide 29

Slide 29 text

Pinot Architecture

Slide 30

Slide 30 text

Step 1: Select raw events SELECT time_col, SUM(*) as total_store_uptime_minutes FROM ( SELECT GapFill(time_col, '1:MILLISECONDS:EPOCH',1656633600,1656720000,'1:MINUTES', '1:DAYS', FILL(status, 'FILL_PREVIOUS_VALUE'), TIMESERIESON(store_id), status FROM ( SELECT DATETIMECONVERT(event_ts, '1:SECONDS:EPOCH', '1:MILLISECONDS:EPOCH', '1:MINUTES') as time_col, LastWithTime(is_open, created_at, 'BOOLEAN') as status FROM store_status_events_tbl WHERE (event_ts BETWEEN 1656633600 AND 1656720000) AND (store_id = 'store_id1') GROUP BY time_col ORDER BY time_col ASC LIMIT 100000 ) LIMIT 2000000 ) WHERE status = 'true' GROUP BY time_col LIMIT 2000

Slide 31

Slide 31 text

Step 2: Fill the gaps SELECT time_col, SUM(*) as total_store_uptime_minutes FROM ( SELECT GapFill(time_col, '1:MILLISECONDS:EPOCH',1656633600,1656720000,'1:MINUTES', '1:DAYS', Fill(status, 'FILL_PREVIOUS_VALUE'), TimeSeriesOn(store_id)), status FROM ( SELECT DATETIMECONVERT(event_ts, '1:SECONDS:EPOCH', '1:MILLISECONDS:EPOCH', '1:MINUTES') as time_col, LastWithTime(is_open, created_at, 'BOOLEAN') as status FROM store_status_events_tbl WHERE (event_ts BETWEEN 1656633600 AND 1656720000) AND (store_id = 'store_id1') GROUP BY time_col ORDER BY time_col ASC LIMIT 100000 ) LIMIT 2000000 ) WHERE status = 'true' GROUP BY time_col LIMIT 2000

Slide 32

Slide 32 text

Step 3: Aggregate SELECT time_col, SUM(*) as total_store_uptime_minutes FROM ( SELECT GapFill(time_col, '1:MILLISECONDS:EPOCH',1656633600,1656720000,'1:MINUTES', '1:DAYS', Fill(status, 'FILL_PREVIOUS_VALUE'), TimeSeriesOn(store_id)), status FROM ( SELECT DATETIMECONVERT(event_ts, '1:SECONDS:EPOCH', '1:MILLISECONDS:EPOCH', '1:MINUTES') as time_col, LastWithTime(is_open, created_at, 'BOOLEAN') as status FROM store_status_events_tbl WHERE (event_ts BETWEEN 1656633600 AND 1656720000) AND (store_id IN ('store_id1') GROUP BY time_col ORDER BY time_col ASC LIMIT 100000 ) LIMIT 2000000 ) WHERE status = 'true' GROUP BY time_col LIMIT 2000

Slide 33

Slide 33 text

Query: Total online stores SELECT time_col, COUNT(*) as total_store_uptime_minutes FROM ( SELECT GapFill(time_col, '1:MILLISECONDS:EPOCH',1656633600,1656720000,'1:MINUTES', '1:DAYS', FILL(status, 'FILL_PREVIOUS_VALUE'), TIMESERIESON(store_id)), status, store_id FROM ( SELECT DATETIMECONVERT(event_ts, '1:SECONDS:EPOCH', '1:MILLISECONDS:EPOCH', '1:MINUTES') as time_col, LastWithTime(is_open, event_time, 'BOOLEAN') as status, store_id FROM enriched_store_staus_events WHERE (event_ts BETWEEN 1656633600 AND 1656720000) GROUP BY time_col, store_id ORDER BY time_col ASC LIMIT 100000 ) LIMIT 2000000) WHERE status = 'true' GROUP BY time_col LIMIT 2000

Slide 34

Slide 34 text

Under the hood

Slide 35

Slide 35 text

Request Flow Pinot Broker Server BrokerReducerService QueryRouter Pinot Server n …….. 2.Stripped Sql Query Result Set n Result set 0 Result set n …….. 1. Gapfill Sql Query 5.GapFilledResult Result Set 0 2. Stripped Sql Query GapfillProcessor 4.1 Gapfilling 4.2 Filtering 5.1 Aggregation 5.2 Filtering Reduced Result 3 Pinot Server 0

Slide 36

Slide 36 text

Execution Plan: Server Side Pinot Server 0 Filter Operator Aggregate Operator Combine Operator Pinot Server 1 Filter Operator Aggregate Operator Projector Operator 2023-04-01 11:00, store_id_0, true 2023-04-01 11:00, store_id_1, true 2023-04-01 13:00, store_id_1, false Transform Operator Transform Operator 2023-04-01 13:30, store_id_1, true 2023-04-01 21:00, store_id_0, false 2023-04-01 21:00, store_id_1, false Same as above Same as above Same as above Same as above Same as above Same as above Broker

Slide 37

Slide 37 text

Execution Plan: Broker Side Pinot Broker Reducer Gapfill 2023-04-01 11:00, store_id_0, true 2023-04-01 11:00, store_id_1, true 2023-04-01 13:00, store_id_1, false 2023-04-01 13:30, store_id_1, true 2023-04-01 21:00, store_id_0, false 2023-04-01 21:00, store_id_1, false Filter Aggregate 2023-04-01 11:00, store_id_0, true 2023-04-01 11:01, store_id_0, true 2023-04-01 11:02, store_id_0, true … 2023-04-01 11:00, store_id_1, true 2023-04-01 11:01, store_id_1, true 2023-04-01 11:02, store_id_1, true … 2023-04-01 13:00, store_id_1, false 2023-04-01 13:01, store_id_1, false 2023-04-01 13:01, store_id_1, false … 2023-04-01 11:00, store_id_0, true 2023-04-01 11:01, store_id_0, true 2023-04-01 11:02, store_id_0, true … … 2023-04-01 11:00, store_id_1, true 2023-04-01 11:01, store_id_1, true 2023-04-01 11:02, store_id_1, true … …

Slide 38

Slide 38 text

Challenges: Blind spots

Slide 39

Slide 39 text

Future Work ● Push down the gapfill logic from pinot broker to pinot server. ○ Leverage data partitioning if events are co-located. ○ Running the gapfill logic on the pinot servers where the data live. ○ Reduce the data transmission. ○ Increase the parallelism and performance of gapfill. ● Leverage V2 engine. ● Add support for advanced interpolations: Linear, Polynomial, Spline etc ● Time series extrapolation

Slide 40

Slide 40 text

To learn more… ● Documentation: https://docs.pinot.apache.org/users/user-guide-query/gap-fill-functions ● Blog: https://startree.ai/blog/gapfill-function-for-time-series-datasets-in-pinot ● RFC: https://docs.google.com/document/d/1FQbsTVywWLlmFaeUcYqDkb-XRWC pVhZe10BZyD2VzcU/edit#

Slide 41

Slide 41 text

Thank you!