
Kan Nishida
February 26, 2020

Introduction to Time Series Forecasting with Prophet

- Introduction to Prophet
- Seasonality and Additive/Multiplicative Modes
- Variable Importance / Effects
- Test Mode and Evaluation of the Model
- External Variables (Extra Regressor)


Transcript

  1. Speaker: Kan Nishida (@KanAugust), CEO/co-founder, Exploratory

    Summary: At the beginning of 2016, Kan launched Exploratory, Inc. to democratize Data Science. Prior to Exploratory, Kan was a director of product development at Oracle, leading teams that built various Data Science products in areas including Machine Learning, BI, Data Visualization, Mobile Analytics, and Big Data. While at Oracle, Kan also provided training and consulting services to help organizations transform with data.
  2. Data Science is not just for Engineers and Statisticians. Exploratory

    makes it possible for Everyone to do Data Science. The Third Wave
  3. Democratization of Data Science

    [Diagram: First Wave (1976), Second Wave (2000), Third Wave (2016). Theme: Monetization, Commoditization, Democratization. Users: Statisticians, Data Scientists, Business Users.]
  4. Exploratory: Modern & Simple UI

    [Diagram: Data Analysis workflow. Questions, Data Access, Data Wrangling, Visualization, Analytics (Statistics / Machine Learning), Communication (Dashboard, Note, Slides).]
  5. Time Series Forecasting

    • An application of supervised learning.
    • Build a model to predict a value from a date, using past data as training data.
    • Given future dates, the model produces forecasted values for those dates.
  6. Traditional Approach with Time Series Forecasting

    Builds a model that predicts a day’s value from the past N days.

    Day 1, Day 2, Day 3 → Day 4
    Day 2, Day 3, Day 4 → Day 5
    Day 3, Day 4, Day 5 → Day 6
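The windowing idea on this slide can be sketched in a few lines of Python. This is only an illustration: the function name `make_windows` and the choice of N = 3 are assumptions made for the example, not something the deck specifies.

```python
def make_windows(values, n):
    """Pair each run of n consecutive values with the value that follows it."""
    return [(values[i:i + n], values[i + n]) for i in range(len(values) - n)]


days = ["Day 1", "Day 2", "Day 3", "Day 4", "Day 5", "Day 6"]
pairs = make_windows(days, 3)

for inputs, target in pairs:
    print(inputs, "->", target)
```

Each pair is one training example: the past N days are the inputs and the following day is the value to predict.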
  7. Problems with Traditional Time Series Models

    • The time interval between data points has to be the same throughout the data.
    • Days with NA are not allowed.
    • Seasonality with multiple periods (Week and Year) is hard to handle.
    • Parameter tuning is hard and requires expert-level forecasting knowledge.
  8. Prophet

    • A ‘curve fitting’ algorithm for building time series forecasting models.
    • Designed for ease of use without expert knowledge of time series forecasting or statistics.
    • Built by Data Scientists (Sean J. Taylor & co.) at Facebook and open sourced (https://facebook.github.io/prophet).

    Sean J. Taylor @seanjtaylor
  9. Prophet - Additive Model

    Builds a model by finding the best smooth line, which can be represented as the sum of the following components.

    • Overall growth trend
    • Seasonality - Yearly, Weekly, Daily, etc.
    • Holiday effects - Christmas, New Year, July 4th, etc.
    • External Predictors
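Written as a formula, the sum of components might look like the following. The symbol names are assumptions chosen for this sketch: g(t) for the growth trend, s(t) for seasonality, h(t) for holiday effects, x(t) for the external predictors with coefficients β, and ε_t for the remaining error.

```latex
y(t) = g(t) + s(t) + h(t) + \beta^{\top} x(t) + \varepsilon_t
```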
  10. Benefits of the Prophet Approach

    • Can handle uneven time intervals.
    • Can handle dates with NA.
    • Can handle multiple seasonality components (e.g. Yearly, Weekly, etc.).
    • Works well by default.
    • Can be improved by tuning easy-to-interpret parameters with business domain knowledge.
  11. The blue line is the actual data (Sales), and the

    orange line is the forecasted data.
  12. Under the Trend tab, you can see the overall trend

    that is used by the model. The blue line is the actual (Sales) data, and the green line is the trend.
  13. The Weekly tab shows up only when the data is daily

    or at a more granular level (hour, minute, etc.).
  14. Every week, the sales are low on Sunday and Monday,

    and the rest of the week is high.
  15. Some dates have NA values. You can impute them

    as part of Data Preprocessing.
  16. Under the Importance tab, you can see which seasonality has

    more effect on the forecasting outcome.
  17. Backtesting

    • Keep the latest part of the past data as test data.
    • Forecast the test period using only the data before it.
    • Evaluate the forecast.
  18. Training Period / Test Period

    Build a model with the training data and forecast for the test period. Compare the forecasted values against the actual values in the test period.
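A minimal sketch of the training/test split, assuming the data is already sorted by date. The function name `split_for_backtest` and the sales figures are made up for illustration.

```python
def split_for_backtest(series, test_size):
    """Split a chronologically ordered series into (training, test) parts."""
    if not 0 < test_size < len(series):
        raise ValueError("test_size must be between 1 and len(series) - 1")
    return series[:-test_size], series[-test_size:]


monthly_sales = [100, 110, 120, 130, 140, 150, 160, 170]  # hypothetical values
train, test = split_for_backtest(monthly_sales, 3)

print(train)  # used to build the model
print(test)   # held back and compared against the forecast
```

The split is by position rather than random, so the model never sees values from the period it is asked to forecast.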
  19. Assign Order Date and select ‘Floor to Month’.

    Assign ‘Sales’ to Value and select ‘Sum’.
  20. Since the forecasting model is built based on the training

    data, the forecasted values (Orange) and the actual values (Blue) are usually very close in the training period.
  21. Evaluate the model based on how much difference there

    is between the forecasted values and the actual values in the Test period.
  22. Metrics to Evaluate Forecasting Model

    • MAE (Mean Absolute Error): Mean of the absolute differences between the actual and forecasted values.
    • RMSE (Root Mean Square Error): Square root of the mean of the squared differences between the actual and forecasted values.
    • MAPE (Mean Absolute Percentage Error): Mean of the absolute differences expressed as a percentage of the actual values.
    • MASE (Mean Absolute Scaled Error): MAE of the forecasting model divided by the MAE of a naive forecasting model (a simple model that always forecasts the previous value).
  23. RMSE (Root Mean Square Error)

    Square the errors (the differences between the actual and the forecasted values), take the mean, and take the square root of the mean.
  24. RMSE (Root Mean Square Error)

    Errors: 2, 2, 4, 2

    $$\sqrt{\frac{2^2 + 2^2 + 2^2 + 4^2}{4}} = \sqrt{\frac{4 + 4 + 4 + 16}{4}} = \sqrt{7} \approx 2.65$$
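The same arithmetic can be checked with a short Python sketch. The function name `rmse` is just a label chosen here, and it takes the errors directly rather than the actual and forecasted values.

```python
import math


def rmse(errors):
    """Square each error, take the mean, then take the square root."""
    return math.sqrt(sum(e * e for e in errors) / len(errors))


errors = [2, 2, 2, 4]  # the four errors from the worked example
rmse_value = rmse(errors)
print(round(rmse_value, 2))
```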
  25. MAE (Mean Absolute Error)

    Take the absolute values of the errors (the differences between the actual and the forecasted values) and take the mean.
  26. MAE (Mean Absolute Error)

    Errors: 2, 2, 4, 2

    $$\frac{2 + 2 + 2 + 4}{4} = 2.5$$
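A matching Python sketch for MAE. The signs on the errors below are an assumption added to show why the absolute value matters; the slide only gives the magnitudes.

```python
def mae(errors):
    """Take the absolute value of each error, then take the mean."""
    return sum(abs(e) for e in errors) / len(errors)


errors = [2, -2, 2, -4]  # magnitudes from the worked example; signs are hypothetical
mae_value = mae(errors)
print(mae_value)
```

Without the absolute value, positive and negative errors would cancel out and the mean would understate how far off the forecast is.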
  27. MAPE (Mean Absolute Percentage Error)

    Take the absolute percentage differences between the actual and the forecasted values and take the mean.
  28. MAPE (Mean Absolute Percentage Error)

    Take the actual values: 12, 13, 16, 11.
  29. MAPE (Mean Absolute Percentage Error)

    Calculate the ratio of each error to its actual value and multiply by 100. If it’s negative, make it positive (absolute value): 16.6%, 15.4%, 25%, 18.2%.
  30. MAPE (Mean Absolute Percentage Error)

    Take the mean.

    $$\frac{16.6 + 15.4 + 18.2 + 25}{4} = 18.8\%$$
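The MAPE steps can be sketched in Python as follows. The actual values are the ones from the slides; the forecasted values are hypothetical, picked so that the errors have the magnitudes 2, 2, 4, 2 used in the earlier examples.

```python
def mape(actual, forecast):
    """Mean of |actual - forecast| / |actual|, expressed as a percentage."""
    ratios = [abs(a - f) / abs(a) for a, f in zip(actual, forecast)]
    return 100 * sum(ratios) / len(ratios)


actual = [12, 13, 16, 11]
forecast = [14, 15, 12, 13]  # hypothetical forecasts
mape_value = mape(actual, forecast)
print(round(mape_value, 1))
```

Note that the division by the actual value fails when an actual value is 0, which is one reason MAPE becomes unreliable for values near zero.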
  31. MASE (Mean Absolute Scaled Error)

    Scale the MAE so that it can be compared among data sets with different variability.
  32. MASE - How to Scale

    MAE in the test period / MAE of a ‘naive’ prediction model in the training period
  33. MASE (Mean Absolute Scaled Error)

    $$\text{MASE} = \frac{\text{MAE in the test period}}{\text{MAE of a ‘naive’ prediction model in the training period}}$$
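A small Python sketch of this ratio, assuming the naive model forecasts each training value with the value just before it. The training values and forecasts are made up for illustration.

```python
def mean_abs(values):
    """Mean of the absolute values."""
    return sum(abs(v) for v in values) / len(values)


def mase(train, test_actual, test_forecast):
    """MAE in the test period divided by the naive model's MAE in the training period."""
    # Naive model: each training value is forecast by the previous value.
    naive_errors = [train[i] - train[i - 1] for i in range(1, len(train))]
    test_errors = [a - f for a, f in zip(test_actual, test_forecast)]
    return mean_abs(test_errors) / mean_abs(naive_errors)


train = [6, 8, 6, 8, 10]          # hypothetical training values (naive MAE = 2)
test_actual = [12, 13, 16, 11]
test_forecast = [14, 15, 12, 13]  # hypothetical forecasts (test MAE = 2.5)
mase_value = mase(train, test_actual, test_forecast)
print(mase_value)
```

A value above 1 means the forecast errors in the test period are larger than those of the naive model in the training period; a value below 1 means they are smaller.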
  34. The difference between the actual line and the forecasted

    line becomes wider as time progresses.
  35. What’s happening?

    • This model assumes that the seasonal variability stays constant.
    • But as Sales increases, the seasonal variability may increase, too.
    • This model might not be able to keep up with the actual growth of Sales.

    Can we model seasonality that also grows as Sales grows?
  36. Different Types of Growth Models

    • Additive Growth: Number of Employees at a very stable company
    • Multiplicative Growth: Amazon’s Sales Trend
  37. Seasonality Mode

    • Additive: The effect is additive. Grows $20,000 in February. The growth is constant regardless of the previous value.
    • Multiplicative: The effect is multiplicative. Grows 20% in February. The growth rate is constant, which means the growth is bigger when the previous value is bigger.
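The difference between the two modes can be seen with a few numbers. In this sketch the trend values are hypothetical, while the $20,000 and 20% February effects come from the slide.

```python
def additive_lift(trend, amount):
    """Additive mode: the seasonal effect is a fixed amount, whatever the trend is."""
    return amount


def multiplicative_lift(trend, percent):
    """Multiplicative mode: the seasonal effect is a fixed percentage of the trend."""
    return trend * percent / 100


trend_by_year = [100_000, 200_000, 400_000]  # hypothetical trend values

additive = [additive_lift(t, 20_000) for t in trend_by_year]
multiplicative = [multiplicative_lift(t, 20) for t in trend_by_year]

print(additive)        # the same lift every year
print(multiplicative)  # the lift grows with the trend
```

In Prophet's Python API the mode is chosen with the `seasonality_mode` argument, for example `Prophet(seasonality_mode='multiplicative')`; the default is additive.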
  38. Additive Model vs. Multiplicative Model

    • Additive Model: Seasonality Effect is constant throughout the years.
    • Multiplicative Model: Seasonality Effect is growing as the Sales grows.
  39. Additive Model vs. Multiplicative Model

    • Additive Model: Seasonality Effect is constant throughout the years.
    • Multiplicative Model: Seasonality Effect is growing as the Sales grows.
  40. The Additive Seasonality model is not keeping up with

    the actual Sales growth, while the Multiplicative model is performing better. [Panels: Additive Model, Multiplicative Model]
  41. All Evaluation Metrics suggest that

    the multiplicative seasonality model performs better than the additive model. [Panels: Additive Model, Multiplicative Model]
  43. • RMSE and MAE indicate that the errors are smaller for Africa than for Asia Pacific.

    • Can we conclude that the model for Africa is better?
  44. Sales amounts are generally bigger in Asia Pacific than in

    Africa, hence it’s expected that RMSE and MAE are bigger for Asia Pacific.
  45. • MAPE is calculated as the percentage of the errors compared to the actual values.

    • It is more useful when we want to compare the two without considering the scale of the actual values.
    • MAPE is smaller for Asia Pacific, which means we have a better model for Asia Pacific than for Africa.
  46. • MASE is the ratio of the MAE in the Test Period to the MAE of the naive prediction model.

    • Given it’s a ratio, we can use it to compare models regardless of the scale of the actual values.
    • One advantage of MASE over MAPE is that MAPE can be unreliable when the actual values are close to 0 or include both positive and negative values, whereas MASE is not affected by such cases.
    • The MASE suggests that the model is fitting better for Asia Pacific and USCA compared to other regions.
  47. If we know that some variables are correlated with Sales and

    we can obtain the future values of those variables, can’t we forecast better?
  48. Example: Marketing

    Let’s say we can control how much we will spend on marketing in the next 3 months. Can we build such information into the forecasting model?
  49. Example: Weather

    Let’s say our Sales is usually impacted by the weather, and we can forecast the temperature or whether it will rain for the next 10 days. Can we build such information into the forecasting model?
  50. External Predictors

    You can assign variables as ‘External Predictors’ (Extra Regressors) and build forecasting models. Prophet investigates whether the external predictor (Marketing) is useful for building a better model to forecast the target variable (Sales) and calculates the coefficient of the predictor variable.
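The deck does this through the Exploratory UI. Outside of it, a rough equivalent with Prophet's Python API might look like the sketch below. It assumes the third-party `prophet` package is installed (older releases were published as `fbprophet`); the function name `forecast_with_regressor` and the column name `marketing` are hypothetical.

```python
def forecast_with_regressor(history, future, regressor="marketing"):
    """Fit Prophet with one extra regressor and forecast the rows in `future`.

    `history` is a DataFrame with columns ds (date), y (target) and the regressor.
    `future` is a DataFrame with columns ds and the regressor, holding the
    planned or forecasted future values of that regressor.
    """
    # Imported here so the sketch can be read without the package installed.
    from prophet import Prophet

    model = Prophet()
    model.add_regressor(regressor)  # register the external predictor before fitting
    model.fit(history)
    forecast = model.predict(future)
    return forecast[["ds", "yhat", "yhat_lower", "yhat_upper"]]
```

The key requirement is the one the earlier slides point out: the regressor's values must be known, or separately forecasted, for every future date you want to predict.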
  51. A forecasting model to forecast Sales based on the Trend,

    Yearly Seasonality, and Sales Comp.
  52. The forecasting model quality has improved.

    [Panels: Base Model with Sales Comp.; Base Model with Sales Comp., Marketing, and Discount]
  53. Under the Effects tab, you can see how each

    seasonality and predictor variable affects the forecasted values.
  54. Under the Importance tab, you can compare how much each

    seasonality and predictor variable affects the forecasted values.
  55. Holiday Effect

    • Holidays might affect measures like sales, number of orders, etc.
    • Special events like conferences, Black Friday, sports events, etc. might affect the measures.
  56. Forecast without Considering Holiday Effect

    • Set ‘Day’ for Date / Time
    • Set 365 as Forecasting Time Period
  57. Add WWDC Dates by Join

    Select ‘Full join’ as the join type because we want to bring in the dates that don’t exist in the current data frame but do exist in the target data frame.
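To see why a full join is needed, here is a small pure-Python sketch of a full join on dates. The function name, the sales figures, and the event dates are all illustrative and are not taken from the deck.

```python
def full_join_on_date(current, target):
    """Full outer join of two {date: value} mappings; missing values become None."""
    all_dates = sorted(set(current) | set(target))
    return [(d, current.get(d), target.get(d)) for d in all_dates]


sales = {"2019-06-02": 95, "2019-06-03": 130}            # hypothetical current data
events = {"2019-06-03": "WWDC", "2020-06-22": "WWDC"}    # illustrative event dates

joined = full_join_on_date(sales, events)
for row in joined:
    print(row)
```

A left join would drop the last row, because that date exists only in the event data. Keeping it matters here because future event dates are exactly what the forecast needs.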
  60. Capacity

    You can set Upper / Lower Limit values.

    • Sales / Demand Forecasting: We know it won’t exceed a certain limit.
    • Conversion Rate: It never exceeds 100%.

    Prophet can take those limits into account.
  61. Lower Limit

    You can set the lower limit when the upper limit is set.

    • The default value is 0.
    • You can set a negative value as well.
  63. Trend Line

    The trend line becomes a logistic curve instead of a straight line when Upper / Lower Limits are set.
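The shape of such a trend can be illustrated with a generic logistic function. This is a sketch of the curve only, not Prophet's internal implementation; the parameter names `cap`, `floor`, `rate`, and `midpoint` are chosen for this example.

```python
import math


def logistic_trend(t, cap, floor=0.0, rate=1.0, midpoint=0.0):
    """Generic logistic curve that rises from `floor` and saturates at `cap`."""
    return floor + (cap - floor) / (1.0 + math.exp(-rate * (t - midpoint)))


# Hypothetical conversion rate (%) that can never exceed 100 or drop below 0.
curve = [logistic_trend(t, cap=100.0) for t in range(-10, 11)]
print(round(curve[0], 3), curve[10], round(curve[-1], 3))
```

However far the curve is extended, it approaches the upper limit without crossing it, which is why the forecast respects the capacity. In Prophet's Python API the equivalent setup is `Prophet(growth='logistic')` with `cap` (and optionally `floor`) columns in the data.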