
Forecasting Time-Series data at Scale

Forecasting across many different time series at once comes with many challenges. Using the TICK stack, we demonstrate a workflow that helps overcome those challenges. Specifically, we look at Facebook's Prophet procedure for forecasting business time series.

Nathaniel Cook

July 06, 2017

Transcript

  1. Overview
    • Time series data forecasting
    • Challenges of scale
    • Introduce Facebook's Prophet procedure
    • Example using the TICK stack
  2. What is time series data forecasting?
    • Predict future values based on past values
    • Compute accuracy using historical windows
    • Use simple baseline models
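
The two evaluation ideas on slide 2 (compute accuracy using historical windows, compare against simple baseline models) can be illustrated with a small backtest. This is a minimal Python sketch, not the talk's implementation: `mean_forecast`, `backtest`, the window sizes, and the star counts are all hypothetical. It repeatedly cuts the series at a point in the past, forecasts the next few values from everything before the cut, and averages the absolute percentage errors.

```python
from statistics import mean


def mean_forecast(history, horizon):
    """Baseline model: predict the mean of all past values for every future step."""
    level = mean(history)
    return [level] * horizon


def backtest(series, forecaster, min_train, horizon):
    """Evaluate `forecaster` on historical windows.

    At each cutoff the model sees only series[:cutoff] and is scored on the
    next `horizon` points. The result is the mean absolute percentage error
    across all cutoffs (zero-valued actuals are skipped to avoid dividing by 0).
    """
    errors = []
    for cutoff in range(min_train, len(series) - horizon + 1, horizon):
        train = series[:cutoff]
        actual = series[cutoff:cutoff + horizon]
        predicted = forecaster(train, horizon)
        errors.extend(abs((a - p) / a) for a, p in zip(actual, predicted) if a != 0)
    return mean(errors)


if __name__ == "__main__":
    # Hypothetical cumulative star counts for one project, one value per day.
    stars = [10, 12, 15, 15, 18, 21, 25, 26, 30, 34]
    score = backtest(stars, mean_forecast, min_train=4, horizon=2)
    print(f"mean-baseline MAPE: {score:.3f}")
```

Because `backtest` accepts any forecaster with a `(history, horizon)` signature, a more complex model can be scored on exactly the same windows as the baseline, which is what makes the comparison fair.
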
  3. What are the challenges of scale?
    • Not enough time and resources to manage each individual series
  4. What is the Prophet algorithm?
    • Simple generalized additive model: y(t) = g(t) + s(t) + h(t)
      ◦ g(t): growth
      ◦ s(t): seasonality
      ◦ h(t): holidays
    • Simple, intuitive parameters for each term
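
To make the three terms concrete, here is a Python sketch that evaluates y(t) = g(t) + s(t) + h(t) with fixed, hand-picked parameters. It only illustrates the additive structure and does not fit anything. The specific forms (a linear trend without changepoints, a two-term yearly Fourier series, a single holiday bump, `t` measured in days) are simplifying assumptions loosely following the component types described for Prophet, and every parameter value is hypothetical.

```python
import math


def growth(t, k=0.5, m=100.0):
    """g(t): linear trend with rate k and offset m (no changepoints in this sketch)."""
    return k * t + m


def seasonality(t, coeffs=((2.0, 0.0), (0.5, 1.0)), period=365.25):
    """s(t): truncated Fourier series with coeffs[n - 1] = (a_n, b_n)."""
    total = 0.0
    for n, (a, b) in enumerate(coeffs, start=1):
        angle = 2 * math.pi * n * t / period
        total += a * math.cos(angle) + b * math.sin(angle)
    return total


def holidays(t, holiday_days=frozenset({359}), kappa=5.0):
    """h(t): a constant bump kappa on the listed days, zero otherwise."""
    return kappa if t in holiday_days else 0.0


def predict(t):
    """y(t) = g(t) + s(t) + h(t): each component is computed separately and summed."""
    return growth(t) + seasonality(t) + holidays(t)


if __name__ == "__main__":
    for day in (0, 90, 359):
        print(day, round(predict(day), 2))
```

Each parameter here has a direct reading (`k` is the change in y per day, `kappa` is the extra lift on a holiday), which illustrates why per-term parameters are easy for an analyst to adjust.
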
  5. What is the Prophet workflow?
    • Create/Update Model → Evaluate Model → Surface Problems → Visually Inspect Model
    • Analyst in the loop
  6. Dataset
    • GitHub stars for ~400 Python projects
    • Projects came from an awesome-python list
    • Small and large projects
    • Diverse uses of Python, from common libraries to end-user applications
    • New and old projects
  7. Goals
    • Forecast all 400 time series reliably
    • Repair any problematic forecasts
    • Do it in a scalable manner
  8. What is the TICK stack?
    • Telegraf: collection agent (not used in today's example)
    • InfluxDB: database
    • Chronograf: visualization
    • Kapacitor: processing engine
  9. How does the TICK stack enable the workflow?
    • Store time series in InfluxDB
    • Use Kapacitor tasks to evaluate models and surface problems with them
    • Use Chronograf to inspect models
  10. 1. Create/Update Model
    • 3 Kapacitor task templates
      ◦ Baseline models: Mean, Exponential Smoothing
      ◦ Prophet model (Kapacitor user-defined function, UDF)
    • 3 tasks per GitHub project = ~1200 tasks
    • Each task (project) can have its own parameters
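
The exponential-smoothing baseline (listed as "holt" on slide 12) can be sketched as Holt's linear method, which tracks a smoothed level and a smoothed trend. This Python version is an assumption about what such a baseline computes, not the Kapacitor task used in the talk. The smoothing constants `alpha` and `beta` are hypothetical defaults, and they are the kind of per-task parameter the slide says each project can override.

```python
def holt_forecast(history, horizon, alpha=0.5, beta=0.3):
    """Holt's linear exponential smoothing: level + trend, no seasonality.

    alpha smooths the level and beta smooths the trend; both lie in (0, 1].
    """
    if len(history) < 2:
        raise ValueError("Holt's method needs at least two observations")
    level = history[0]
    trend = history[1] - history[0]
    for y in history[1:]:
        prev_level = level
        # New level: blend the observation with the one-step-ahead prediction.
        level = alpha * y + (1 - alpha) * (level + trend)
        # New trend: blend the latest level change with the previous trend.
        trend = beta * (level - prev_level) + (1 - beta) * trend
    # The h-step-ahead forecast extrapolates the final level along the final trend.
    return [level + h * trend for h in range(1, horizon + 1)]


if __name__ == "__main__":
    # Hypothetical star counts; a per-project task would pass its own alpha/beta.
    print(holt_forecast([10, 12, 15, 15, 18, 21, 25, 26], horizon=3))
```

Unlike the mean baseline, this one follows a steady upward trend in star counts, so it is a stronger bar for the Prophet model to beat.
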
  11. Prophet Model Task Template

    var data = batch
        |query('''
            SELECT value
            FROM srcDB.srcRP.srcMeasurement
            WHERE project = '$project'
        ''')
            .period(history)
            .every(forecast)
            .align()
            .groupBy(groupBy)
        @prophet()
            .periods(forecast / interval)
            .field('value')
            .changepointPriorScale(changepointPriorScale)
            .intervalWidth(uncertaintyIntervalWidth)
        |influxDBOut()
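
One reading of the template's batch properties is: every `forecast` duration, query the last `history` of data and ask the Prophet UDF for `forecast / interval` future points, one per `interval`. The Python sketch below mirrors that arithmetic under this assumption. `plan_run` and the example durations are hypothetical, and in the real setup Kapacitor does the scheduling.

```python
from datetime import datetime, timedelta


def plan_run(points, now, history, forecast, interval):
    """Return the training window and the future timestamps for one batch run.

    points: list of (timestamp, value) pairs
    history: how far back to look (like .period(history))
    forecast / interval: number of future points (like .periods(forecast / interval))
    """
    window = [(ts, v) for ts, v in points if now - history <= ts < now]
    if not window:
        raise ValueError("no data inside the history window")
    periods = forecast // interval  # timedelta // timedelta -> int
    last_ts = max(ts for ts, _ in window)
    future = [last_ts + interval * i for i in range(1, periods + 1)]
    return window, future


if __name__ == "__main__":
    start = datetime(2017, 1, 1)
    points = [(start + timedelta(days=i), float(i)) for i in range(10)]
    window, future = plan_run(points, datetime(2017, 1, 11),
                              history=timedelta(days=7),
                              forecast=timedelta(days=3),
                              interval=timedelta(days=1))
    print(len(window), [ts.date().isoformat() for ts in future])
```
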
  12. 2. Evaluate Models
    • Evaluate each GitHub project over the past ~10 years of data, for each model type (mean, holt, prophet)
    • Compute the accuracy of each project/model pair using mean absolute percentage error (MAPE)
  13. Accuracy Task

    var errors = src
        |join(forecasted)
            .as('src', 'forecasted')
            .on('project')
        |eval(lambda: abs(("src.value" - "forecasted.value") / "src.value"))
            .as('error')

    var sum_errors = errors
        |sum('error')
            .as('value')

    var count = errors
        |count('error')
            .as('value')

    sum_errors
        |join(count)
            .as('sum_errors', 'count')
        |eval(lambda: float("sum_errors.value") / float("count.value"))
            .as('mape')
        |influxDBOut()
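
The accuracy task joins actual and forecasted points, computes a per-point absolute percentage error, then divides the sum of the errors by their count. A plain-Python equivalent for a single project, using timestamps as join keys, might look like this. It is a sketch: the dictionary inputs are an assumption standing in for Kapacitor's joined streams, and skipping zero actual values is an added guard, since the percentage error is undefined there.

```python
def mape(actual, forecasted):
    """Mean absolute percentage error for one project.

    actual, forecasted: dicts mapping timestamp -> value. Only timestamps present
    in both are scored (an inner join, like |join(forecasted)).
    """
    errors = [
        abs((actual[ts] - forecasted[ts]) / actual[ts])
        for ts in actual.keys() & forecasted.keys()
        if actual[ts] != 0  # percentage error is undefined for a zero actual
    ]
    if not errors:
        raise ValueError("no overlapping, non-zero points to score")
    sum_errors = sum(errors)   # like |sum('error')
    count = len(errors)        # like |count('error')
    return sum_errors / count  # like float(sum_errors) / float(count)


if __name__ == "__main__":
    # Hypothetical daily star counts and a forecast that overshoots, then undershoots.
    actual = {"2017-07-01": 100, "2017-07-02": 200}
    forecast = {"2017-07-01": 110, "2017-07-02": 180, "2017-07-03": 190}
    print(mape(actual, forecast))
```

Because MAPE is a ratio, it lets a project with 50 stars and one with 50,000 stars be ranked on the same scale, which is what the next step relies on.
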
  14. 3. Surface Problematic Models

    -- Best performers (lowest MAPE)
    SELECT bottom(mape, project, model, 10) FROM star_counts
    WHERE time > now() - 30d AND model = 'prophet'

    -- Worst performers (highest MAPE)
    SELECT top(mape, project, model, 10) FROM star_counts
    WHERE time > now() - 30d AND model = 'prophet'
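
Because a lower MAPE means a better forecast, `bottom()` returns the best performers and `top()` the worst. The same ranking over an in-memory table of scores can be sketched in Python with `heapq`. The project names and scores below are made up for illustration.

```python
import heapq


def best_and_worst(scores, n=10):
    """Split (project, model) -> MAPE scores into the n best and the n worst.

    Lower MAPE is better, so the best are the n smallest values (like bottom())
    and the worst are the n largest (like top()).
    """
    best = heapq.nsmallest(n, scores.items(), key=lambda kv: kv[1])
    worst = heapq.nlargest(n, scores.items(), key=lambda kv: kv[1])
    return best, worst


if __name__ == "__main__":
    # Hypothetical recent MAPE values for the prophet model.
    scores = {
        ("project-a", "prophet"): 0.02,
        ("project-b", "prophet"): 0.35,
        ("project-c", "prophet"): 0.11,
        ("project-d", "prophet"): 0.78,
    }
    best, worst = best_and_worst(scores, n=2)
    print("best:", best)
    print("worst:", worst)
```

The worst list is what an analyst would inspect visually (for example in Chronograf) before adjusting that project's task parameters.
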
  15. Summary
    • Forecasting at scale is about reducing the cost per forecast
    • Using simple models and automating the workflow enables forecasting at scale
    • The TICK stack provides a platform on which to automate the workflow