Forecasting Time-Series data at Scale

Forecasting data across many different time series poses many challenges. Using the TICK stack, we demonstrate a workflow that helps overcome those challenges. Specifically, we take a look at Facebook's Prophet procedure for forecasting business time series.

Nathaniel Cook

July 06, 2017

Transcript

  1. Forecasting Time Series Data at Scale, with the TICK stack
    Nathaniel Cook @nathanielvcook
  2. Overview
    • Time series data forecasting
    • Challenges of scale
    • Introduce Facebook's Prophet procedure
    • Example using the TICK stack
  3. What is time series data forecasting?
    • Predict future values based on past values
    • Compute accuracy using historical windows
    • Use simple baseline models
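The idea of computing accuracy over historical windows can be illustrated with a small, library-free Python sketch. The function names and window sizes are hypothetical, and for brevity it scores the simple mean baseline with mean absolute error; it is not the code used in the talk.

```python
from typing import Iterator, List, Sequence, Tuple


def historical_windows(
    series: Sequence[float], history: int, horizon: int, step: int
) -> Iterator[Tuple[List[float], List[float]]]:
    """Yield (train, test) pairs by sliding a cutoff point through the series.

    Each train slice holds `history` points ending at the cutoff; the test
    slice holds the next `horizon` points, which the model must predict.
    """
    cutoff = history
    while cutoff + horizon <= len(series):
        yield list(series[cutoff - history:cutoff]), list(series[cutoff:cutoff + horizon])
        cutoff += step


def mean_forecast(train: Sequence[float], horizon: int) -> List[float]:
    """Simple baseline: predict the historical mean for every future step."""
    avg = sum(train) / len(train)
    return [avg] * horizon


def backtest_mean(series: Sequence[float], history: int, horizon: int, step: int) -> float:
    """Mean absolute error of the mean baseline across all historical windows."""
    errors: List[float] = []
    for train, test in historical_windows(series, history, horizon, step):
        preds = mean_forecast(train, horizon)
        errors.extend(abs(actual - pred) for actual, pred in zip(test, preds))
    if not errors:
        raise ValueError("series too short for the requested windows")
    return sum(errors) / len(errors)
```

For example, `backtest_mean([1, 2, 3, 4, 5, 6, 7, 8], history=4, horizon=2, step=2)` evaluates two windows and returns 3.0.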
  4. What are the challenges of scale?
    Not enough time and resources to manage each individual series
  5. What is Facebook's Prophet procedure?
    • Algorithm
    • Workflow

  6. What is the Prophet algorithm?
    Simple generalized additive model: y(t) = g(t) + s(t) + h(t)
    • g(t) - growth
    • s(t) - seasonality
    • h(t) - holidays
    Simple, intuitive parameters for each term
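To make the decomposition concrete, here is a toy Python sketch that evaluates y(t) = g(t) + s(t) + h(t) with a linear trend, a single sine/cosine seasonal term, and a constant holiday bump. All parameter values are made up for illustration. Prophet itself is richer (g(t) is piecewise linear or logistic with changepoints, and s(t) sums several Fourier terms), so this only shows how the terms add together.

```python
import math
from typing import Iterable, List, Set


def growth(t: float, k: float, m: float) -> float:
    """g(t): a linear trend with growth rate k and offset m."""
    return k * t + m


def seasonality(t: float, period: float, a: float, b: float) -> float:
    """s(t): one Fourier term (a sine/cosine pair) with the given period."""
    angle = 2 * math.pi * t / period
    return a * math.sin(angle) + b * math.cos(angle)


def holidays(t: int, holiday_days: Set[int], effect: float) -> float:
    """h(t): a fixed additive bump on the listed holiday time steps."""
    return effect if t in holiday_days else 0.0


def additive_model(ts: Iterable[int], holiday_days: Set[int]) -> List[float]:
    """y(t) = g(t) + s(t) + h(t), using illustrative hand-picked parameters."""
    return [
        growth(t, k=2.0, m=10.0)
        + seasonality(t, period=7.0, a=3.0, b=0.0)
        + holidays(t, holiday_days, effect=5.0)
        for t in ts
    ]
```

With these parameters, `additive_model([0, 7], {7})` gives approximately `[10.0, 29.0]`: at t = 7 the seasonal term completes a full cycle (about 0), the trend contributes 24, and the holiday adds 5.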
  7. What is the Prophet workflow?
    A repeating cycle with the analyst in the loop:
    Create/Update Model → Evaluate Model → Surface Problems → Visually Inspect Model → (back to Create/Update Model)
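The cycle can be expressed as a loop that automates the first three steps and hands a short list of problem series to a person. This is a hypothetical sketch: `run_workflow_pass`, the threshold rule, and the `fit_and_score` callback are illustrative assumptions, not part of Prophet or the TICK stack.

```python
from typing import Callable, Dict, List, Sequence

Series = Sequence[float]


def run_workflow_pass(
    series_by_project: Dict[str, Series],
    fit_and_score: Callable[[Series, dict], float],
    params_by_project: Dict[str, dict],
    threshold: float,
) -> List[str]:
    """One automated pass of create/update -> evaluate -> surface problems.

    Returns the projects whose error exceeds `threshold`. An analyst inspects
    those visually and may edit params_by_project before the next pass.
    """
    flagged: List[str] = []
    for project, series in sorted(series_by_project.items()):
        params = params_by_project.get(project, {})
        error = fit_and_score(series, params)  # create/update and evaluate
        if error > threshold:  # surface problems
            flagged.append(project)
    return flagged  # visual inspection happens outside the code
```

The design point is that the per-series cost stays low: the analyst only looks at the flagged projects, and a repair is just a parameter change for the next pass.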
  8. Example
    • Dataset
    • Goals

  9. Dataset
    • GitHub stars for ~400 Python projects
    • Projects came from the awesome-python list
    • Small and large projects
    • Diverse uses of Python, from common libraries to end-user applications
    • New and old projects
  10. Goals
    • Forecast all 400 time series reliably
    • Repair any problematic forecasts
    • Do it in a scalable manner
  11. What is the TICK stack?
    • Telegraf - collection agent (not used in today's example)
    • InfluxDB - database
    • Chronograf - visualization
    • Kapacitor - processing engine
  12. How does the TICK stack enable the workflow?
    • Store time series in InfluxDB
    • Use Kapacitor tasks to evaluate and surface problems with models
    • Use Chronograf to inspect models
  13. All GitHub Stars by Project

  14. 1. Create/Update Model
    • 3 Kapacitor task templates
      ◦ Baseline models: Mean, Exponential Smoothing
      ◦ Prophet Model (UDF)
      ◦ 3 tasks per GitHub project = ~1200 tasks
    • Each task (project) can have its own parameters
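The two baselines on this slide can be sketched in plain Python. The sketch assumes "Exponential Smoothing" means Holt's linear (double) exponential smoothing, which matches the `holt` model name used when the models are evaluated; the smoothing factors and the initialization are arbitrary illustrative choices, not values from the talk.

```python
from typing import List, Sequence


def mean_baseline(history: Sequence[float], horizon: int) -> List[float]:
    """Forecast the historical mean for every future step."""
    avg = sum(history) / len(history)
    return [avg] * horizon


def holt_baseline(
    history: Sequence[float], horizon: int, alpha: float = 0.5, beta: float = 0.5
) -> List[float]:
    """Holt's linear method: smooth a level and a trend, then extrapolate."""
    if len(history) < 2:
        raise ValueError("Holt's method needs at least two observations")
    level = history[0]
    trend = history[1] - history[0]  # simple initial trend estimate
    for y in history[1:]:
        prev_level = level
        # Blend the new observation with the previous one-step forecast.
        level = alpha * y + (1 - alpha) * (level + trend)
        # Blend the latest level change with the previous trend.
        trend = beta * (level - prev_level) + (1 - beta) * trend
    # The h-step-ahead forecast is the final level plus h times the trend.
    return [level + h * trend for h in range(1, horizon + 1)]
```

With three templates and ~400 projects, defining one task per template per project gives the ~1200 tasks on the slide.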
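  15. Model Prophet Task Template

    // Template variables (project, history, forecast, interval, groupBy,
    // changepointPriorScale, uncertaintyIntervalWidth) are set per task.
    var data = batch
        // '$project' and srcDB.srcRP.srcMeasurement are placeholders for per-project values.
        |query('''
            SELECT value
            FROM srcDB.srcRP.srcMeasurement
            WHERE project = '$project'
        ''')
            .period(history)
            .every(forecast)
            .align()
            .groupBy(groupBy)
        // Prophet runs as a user-defined function (UDF) configured in Kapacitor.
        @prophet()
            .periods(forecast / interval)
            .field('value')
            .changepointPriorScale(changepointPriorScale)
            .intervalWidth(uncertaintyIntervalWidth)
        // Output destination properties are omitted on the slide.
        |influxDBOut()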
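Because every project gets its own task, per-project parameters can be generated rather than written by hand. This Python sketch builds a vars document for each project using the variable names in the template above. The `{"type": ..., "value": ...}` shape follows Kapacitor's template-vars file format as I understand it and should be checked against the Kapacitor docs. The defaults 0.05 and 0.8 match Prophet's own defaults for `changepoint_prior_scale` and `interval_width`; the durations (3650d, roughly the ~10 years evaluated, plus 30d and 1d) and the `prophet_<project>` task IDs are assumptions.

```python
import json
from typing import Dict, Iterable, Optional


def prophet_task_vars(
    project: str,
    history: str = "3650d",
    forecast: str = "30d",
    interval: str = "1d",
    changepoint_prior_scale: float = 0.05,
    interval_width: float = 0.8,
) -> Dict[str, dict]:
    """Vars for one task created from the Prophet template (illustrative values)."""
    return {
        "project": {"type": "string", "value": project},
        "history": {"type": "duration", "value": history},
        "forecast": {"type": "duration", "value": forecast},
        "interval": {"type": "duration", "value": interval},
        "groupBy": {"type": "list", "value": [{"type": "string", "value": "project"}]},
        "changepointPriorScale": {"type": "float", "value": changepoint_prior_scale},
        "uncertaintyIntervalWidth": {"type": "float", "value": interval_width},
    }


def all_task_vars(
    projects: Iterable[str], overrides: Optional[Dict[str, dict]] = None
) -> Dict[str, str]:
    """Map a task ID per project to its vars JSON; overrides hold per-project repairs."""
    overrides = overrides or {}
    return {
        f"prophet_{project}": json.dumps(
            prophet_task_vars(project, **overrides.get(project, {})), indent=2
        )
        for project in projects
    }
```

Each JSON document could be saved to a file and passed to `kapacitor define` with its `-template` and `-vars` options; a tool such as GNU parallel, listed in the resources, could run those commands concurrently. Repairing a bad forecast then amounts to adding an override and redefining one task.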
  16. 2. Evaluate Models
    • Evaluate each GitHub project over the past ~10 years of data, for each model type (mean, holt, prophet)
    • Compute the accuracy of each project/model using the mean absolute percentage error (MAPE)
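For reference, MAPE can be written as follows, with A_t the actual star count and F_t the forecast at time t over n points (symbols chosen here for illustration). The Accuracy Task reports it as a fraction rather than multiplying by 100, and it is undefined whenever some A_t = 0.

```latex
\mathrm{MAPE} = \frac{1}{n} \sum_{t=1}^{n} \left| \frac{A_t - F_t}{A_t} \right|
```

For example, with actuals 100 and 200 and forecasts 110 and 180, both absolute percentage errors are 0.1, so MAPE = (0.1 + 0.1) / 2 = 0.1, or 10%.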
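  17. Accuracy Task

    // src (actual values) and forecasted (model output) are defined elsewhere in the task.
    var errors = src
        |join(forecasted)
            // Name both parents so their fields become "src.value" and "forecasted.value".
            .as('src', 'forecasted')
            .on('project')
        // Absolute percentage error of each point, as a fraction.
        |eval(lambda: abs(("src.value" - "forecasted.value") / "src.value"))
            .as('error')

    var sum_errors = errors
        |sum('error')
            .as('value')

    var count = errors
        |count('error')
            .as('value')

    // MAPE = sum of the errors / number of errors
    sum_errors
        |join(count)
            .as('sum_errors', 'count')
        |eval(lambda: float("sum_errors.value") / float("count.value"))
            .as('mape')
        |influxDBOut()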
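To check results offline, the Accuracy Task's arithmetic (join actual and forecast values, compute each absolute percentage error, then divide the sum by the count) can be mirrored in Python. This sketch assumes one project's points keyed by timestamp, and, unlike the task, it skips zero actual values to avoid dividing by zero, a deliberate deviation. The function name is hypothetical.

```python
from typing import Dict


def accuracy_task_mape(src: Dict[int, float], forecasted: Dict[int, float]) -> float:
    """Join actual and forecast points on timestamp, then average |(A - F) / A|."""
    common = sorted(set(src) & set(forecasted))  # inner join on timestamp
    errors = [abs((src[t] - forecasted[t]) / src[t]) for t in common if src[t] != 0]
    if not errors:
        raise ValueError("no overlapping, non-zero actual values to score")
    sum_errors = sum(errors)  # corresponds to |sum('error')
    count = len(errors)  # corresponds to |count('error')
    return sum_errors / count  # corresponds to the final eval producing 'mape'
```

For example, `accuracy_task_mape({1: 100.0, 2: 200.0, 3: 50.0}, {1: 110.0, 2: 180.0})` ignores the unmatched timestamp 3 and returns 0.1.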
  18. 3. Surface Problematic Models

    -- Best performers (lowest MAPE)
    SELECT bottom(mape, project, model, 10) FROM star_counts
    WHERE time > now() - 30d AND model = 'prophet'

    -- Worst performers (highest MAPE)
    SELECT top(mape, project, model, 10) FROM star_counts
    WHERE time > now() - 30d AND model = 'prophet'
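The same best/worst ranking can be reproduced in Python, for example to cross-check query results. This hypothetical sketch assumes one MAPE score per project and model (say, the most recent) and uses `heapq.nsmallest` and `heapq.nlargest` in place of InfluxQL's `bottom()` and `top()`.

```python
import heapq
from typing import Iterable, List, NamedTuple, Tuple


class Score(NamedTuple):
    project: str
    model: str
    mape: float


def best_and_worst(
    scores: Iterable[Score], model: str = "prophet", n: int = 10
) -> Tuple[List[Score], List[Score]]:
    """Return the n lowest-MAPE and n highest-MAPE projects for one model."""
    candidates = [s for s in scores if s.model == model]  # like model = 'prophet'
    best = heapq.nsmallest(n, candidates, key=lambda s: s.mape)  # like bottom()
    worst = heapq.nlargest(n, candidates, key=lambda s: s.mape)  # like top()
    return best, worst
```

The worst list is the analyst's queue for visual inspection; the best list helps confirm which settings already work.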
  19. How did the first pass go?

  20. 4. Visually Inspect Models

  21. 1. Update the Model

  22. How does it look now?

  23. Summary
    • Forecasting at scale is about reducing the cost per forecast
    • Using simple models and automating the workflow enables forecasting at scale
    • The TICK stack provides a platform on which to automate the workflow
  24. Resources
    • https://docs.influxdata.com/
    • https://github.com/vinta/awesome-python
    • https://facebookincubator.github.io/prophet/
    • https://www.gnu.org/software/parallel/
    Questions?

  25. Extras
    Live Explorations: http://localhost:3000