Slide 1

Slide 1 text

Forecasting Time Series Data at Scale, with the TICK stack
Nathaniel Cook @nathanielvcook

Slide 2

Slide 2 text

Overview
● Time series data forecasting
● Challenges of scale
● Introduce Facebook's Prophet procedure
● Example using the TICK stack

Slide 3

Slide 3 text

What is time series data forecasting?
● Predict future values based on past values
● Compute accuracy using historical windows
● Use simple baseline models
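The three points above can be sketched in a few lines of Python. This is a minimal illustration, not code from the talk: the function names `mean_forecast` and `backtest` and the sample series are hypothetical. It forecasts with a mean baseline and checks it against a held-out historical window.

```python
from statistics import mean


def mean_forecast(history, horizon):
    """Baseline model: predict the mean of the history for every future step."""
    level = mean(history)
    return [level] * horizon


def backtest(series, horizon):
    """Hold out the last `horizon` points and forecast them from the rest."""
    train, actual = series[:-horizon], series[-horizon:]
    predicted = mean_forecast(train, horizon)
    return actual, predicted


# Hypothetical data: eight observations, the last two held out for evaluation.
series = [10, 12, 11, 13, 12, 14, 13, 15]
actual, predicted = backtest(series, horizon=2)
print(actual, predicted)
```

Comparing `actual` with `predicted` over many such windows gives an accuracy figure for the baseline, which any more elaborate model should beat.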

Slide 4

Slide 4 text

What are the challenges of scale?
Not enough time and resources to manage each individual series

Slide 5

Slide 5 text

What is Facebook's Prophet procedure?
● Algorithm
● Workflow

Slide 6

Slide 6 text

What is the Prophet algorithm?
Simple generalized additive model: y(t) = g(t) + s(t) + h(t)
● g(t) - growth
● s(t) - seasonality
● h(t) - holidays
Simple intuitive parameters for each term
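To make the additive structure concrete, here is a toy Python sketch of y(t) = g(t) + s(t) + h(t). It is not Prophet's implementation: the linear trend, the single sine wave, the holiday on day 10, and every parameter value are assumptions chosen only for illustration.

```python
import math


def growth(t, base=100.0, rate=2.0):
    """g(t): a linear trend (hypothetical base and rate)."""
    return base + rate * t


def seasonality(t, period=7, amplitude=5.0):
    """s(t): one sine wave with a weekly period (hypothetical amplitude)."""
    return amplitude * math.sin(2 * math.pi * t / period)


def holidays(t, holiday_days=frozenset({10}), effect=20.0):
    """h(t): a fixed bump on listed holiday days, zero otherwise."""
    return effect if t in holiday_days else 0.0


def y(t):
    """The additive model: each term contributes independently."""
    return growth(t) + seasonality(t) + holidays(t)


for day in (0, 7, 10):
    print(day, round(y(day), 2))
```

Because the terms simply add, each one can be tuned or inspected on its own, which is what makes the parameters intuitive for an analyst.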

Slide 7

Slide 7 text

What is the Prophet workflow?
Analyst in the Loop:
1. Create/Update Model
2. Evaluate Model
3. Surface Problems
4. Visually Inspect Model

Slide 8

Slide 8 text

Example
● Dataset
● Goals

Slide 9

Slide 9 text

Dataset
● GitHub stars for ~400 Python projects
● Projects came from an awesome-python list
● Small and large projects
● Diverse uses of Python, from common libraries to end-user applications
● New and old projects

Slide 10

Slide 10 text

Goals
● Forecast all 400 time series reliably
● Repair any problematic forecasts
● Do it in a scalable manner

Slide 11

Slide 11 text

What is the TICK stack?
● Telegraf - collection agent (not used in today's example)
● InfluxDB - database
● Chronograf - visualization
● Kapacitor - processing engine

Slide 12

Slide 12 text

How does the TICK stack enable the workflow?
● Store time series in InfluxDB.
● Use Kapacitor tasks to evaluate and surface problems with models.
● Use Chronograf to inspect models.

Slide 13

Slide 13 text

All GitHub Stars by Project

Slide 14

Slide 14 text

1. Create/Update Model
● 3 Kapacitor task templates
  ○ Baseline models: Mean, Exponential Smoothing
  ○ Prophet Model (UDF)
  ○ 3 tasks per GitHub project = ~1200 tasks
● Each task (project) can have its own parameters
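For readers unfamiliar with the exponential smoothing baseline, the following Python sketch shows the simplest variant (single exponential smoothing, no trend or seasonal terms). It is an illustration only and not the Kapacitor task used in the talk; the function name and the `alpha` value are assumptions.

```python
def exponential_smoothing_forecast(history, alpha, horizon):
    """Single exponential smoothing: a flat forecast at the last smoothed level.

    alpha close to 1 tracks recent values; alpha close to 0 averages more history.
    """
    if not history:
        raise ValueError("history must contain at least one value")
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")

    level = history[0]
    for value in history[1:]:
        # Blend the newest observation with the previous smoothed level.
        level = alpha * value + (1 - alpha) * level
    return [level] * horizon


# Hypothetical star counts for one project.
print(exponential_smoothing_forecast([10, 20, 30], alpha=0.5, horizon=3))
```

A single parameter per series keeps the baseline cheap to run across hundreds of tasks, which matters when every project gets its own model.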

Slide 15

Slide 15 text

Model Prophet Task Template

var data = batch
    |query('''
        SELECT value
        FROM srcDB.srcRP.srcMeasurement
        WHERE project = '$project'
    ''')
        .period(history)
        .every(forecast)
        .align()
        .groupBy(groupBy)
    @prophet()
        .periods(forecast / interval)
        .field('value')
        .changepointPriorScale(changepointPriorScale)
        .intervalWidth(uncertaintyIntervalWidth)
    |influxDBOut()

Slide 16

Slide 16 text

2. Evaluate Models
● Evaluate each GitHub project for the past ~10 years of data, for each model type (mean, holt, prophet)
● Compute accuracy of each project/model using "mean absolute percentage error" (MAPE)

Slide 17

Slide 17 text

Accuracy Task

var errors = src
    |join(forecasted)
        .on('project')
    |eval(lambda: abs(("src.value" - "forecasted.value") / "src.value"))
        .as('error')

var sum_errors = errors
    |sum('error')
        .as('value')

var count = errors
    |count('error')
        .as('value')

sum_errors
    |join(count)
        .as('sum_errors', 'count')
    |eval(lambda: float("sum_errors.value") / float("count.value"))
        .as('mape')
    |influxDBOut()
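The same calculation can be written in plain Python, which may help when checking the task's output by hand. This is a sketch under assumptions: the function name `mape` and the sample values are hypothetical, and skipping zero-valued actuals is a choice made here to avoid division by zero, not something the slide specifies. Like the task above, it returns a fraction rather than a value multiplied by 100.

```python
def mape(actual, forecast):
    """Mean absolute percentage error, returned as a fraction (0.1 == 10%)."""
    if len(actual) != len(forecast):
        raise ValueError("actual and forecast must have the same length")

    # Per-point relative error; points with a zero actual value are skipped.
    errors = [abs((a - f) / a) for a, f in zip(actual, forecast) if a != 0]
    if not errors:
        raise ValueError("no non-zero actual values to evaluate")

    # Sum of errors divided by their count, as in the sum/count join above.
    return sum(errors) / len(errors)


# Hypothetical observed and forecasted star counts.
print(mape([100, 200], [110, 180]))
```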

Slide 18

Slide 18 text

3. Surface Problematic Models

// Best Performers
SELECT bottom(mape, project, model, 10)
FROM star_counts
WHERE time > now() - 30d AND model = 'prophet'

// Worst Performers
SELECT top(mape, project, model, 10)
FROM star_counts
WHERE time > now() - 30d AND model = 'prophet'
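The ranking that the two queries perform can be mirrored in Python once the MAPE values are in memory. The sketch below is illustrative only: `rank_by_mape`, the project names, and the scores are hypothetical, and no database access is shown.

```python
def rank_by_mape(scores, n=10):
    """Return (best, worst): the n lowest and n highest MAPE entries."""
    ordered = sorted(scores.items(), key=lambda item: item[1])
    best = ordered[:n]          # lowest error first
    worst = ordered[::-1][:n]   # highest error first
    return best, worst


# Hypothetical per-project MAPE values for a single model.
scores = {"project-a": 0.05, "project-b": 0.40, "project-c": 0.12}
best, worst = rank_by_mape(scores, n=2)
print("best:", best)
print("worst:", worst)
```

The worst performers are the ones worth an analyst's attention, so only a handful of the ~400 series need to be inspected by eye.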

Slide 19

Slide 19 text

How did the first pass go?

Slide 20

Slide 20 text

4. Visually Inspect Models

Slide 21

Slide 21 text

1. Update the Model

Slide 22

Slide 22 text

How does it look now?

Slide 23

Slide 23 text

Summary
● Forecasting at scale is about reducing the cost per forecast
● Using simple models and automating the workflow enables forecasting at scale
● The TICK stack provides a platform on which to automate the workflow

Slide 24

Slide 24 text

Resources
● https://docs.influxdata.com/
● https://github.com/vinta/awesome-python
● https://facebookincubator.github.io/prophet/
● https://www.gnu.org/software/parallel/

Questions?

Slide 25

Slide 25 text

Extras Live Explorations http://localhost:3000