Exploratory Seminar: Time Series Forecasting with Prophet

Prophet is an easy-to-use time series forecasting algorithm developed by Sean Taylor and colleagues at Facebook. I'll be demonstrating how to use it in Exploratory.

Kan Nishida

November 26, 2018

Transcript

  1. Kan Nishida, co-founder/CEO, Exploratory (Instructor, @KanAugust)
    Summary: At the beginning of 2016, Kan launched Exploratory, Inc. to make Data Science available to everyone. Prior to Exploratory, Kan was a development director at Oracle, leading development teams building various Data Science products in areas including Machine Learning, BI, Data Visualization, Mobile Analytics, and Big Data. While at Oracle, Kan also provided training and consulting services to help organizations transform themselves with data.
  2. The Third Wave
    Data Science is not just for Engineers and Statisticians. Exploratory makes it possible for Everyone to do Data Science.
  3. The three waves of Data Science (diagram)
    • First Wave (1976): Proprietary; UI & Programming; Monetization; Statisticians
    • Second Wave (2000): Open Source; Programming; Commoditization; Data Scientists
    • Third Wave (2016): Open Source; UI & Automation; Democratization; Business Users (Exploratory)
    Diagram rows: Algorithms, Experience, Tools, Theme, Users. Banner: Smart Waves - Machine Learning / AI.
  4. Data Science Workflow
    Questions, Data Access, Data Wrangling, Data Visualization, Machine Learning / Statistics, Exploration, Communication
  5. What you can do with Exploratory
    Questions, Data Access, Data Wrangling, Visualization, Machine Learning / Statistics, Exploratory Data Analysis, Communication
  6. Time Series Forecasting
    • An application of supervised learning.
    • Build a model that predicts a value from the date, using past data as training data.
    • By feeding future dates to the model, it produces forecasted values for those dates.
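A minimal sketch of this framing, assuming a plain straight-line model as a stand-in for Prophet: the date (converted to a day number) is the only input, the model is fit on past values, and future dates are fed in to get forecasts. The helpers `fit_linear_trend` and `predict` are hypothetical, and the data is made up.

```python
from datetime import date, timedelta

def fit_linear_trend(dates, values):
    """Fit value = intercept + slope * t by ordinary least squares, with t = day ordinal."""
    t = [d.toordinal() for d in dates]
    n = len(t)
    mean_t = sum(t) / n
    mean_v = sum(values) / n
    cov = sum((ti - mean_t) * (vi - mean_v) for ti, vi in zip(t, values))
    var = sum((ti - mean_t) ** 2 for ti in t)
    slope = cov / var
    intercept = mean_v - slope * mean_t
    return intercept, slope

def predict(model, dates):
    """Apply the fitted line to any dates, including future ones."""
    intercept, slope = model
    return [intercept + slope * d.toordinal() for d in dates]

# Past data used as training data (note the missing Jan 3: the date is just an input).
train_dates = [date(2018, 1, 1), date(2018, 1, 2), date(2018, 1, 4), date(2018, 1, 5)]
train_values = [10.0, 12.0, 16.0, 18.0]
model = fit_linear_trend(train_dates, train_values)

# Feed the next three dates to the model to get forecasted values.
future_dates = [train_dates[-1] + timedelta(days=k) for k in range(1, 4)]
print(predict(model, future_dates))
```

Because the date itself is the input, gaps in the training dates do not need special handling in this framing.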
  7. Traditional Approach to Time Series Forecasting
    Builds a model that predicts a day's value from the past N days (in the diagram, each day is predicted from the three days before it):
    • Day 1, Day 2, Day 3 → Day 4
    • Day 2, Day 3, Day 4 → Day 5
    • Day 3, Day 4, Day 5 → Day 6
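The sliding windows in the diagram can be sketched as follows, assuming the series is a plain list of evenly spaced daily values with no gaps; `make_windows` is a hypothetical helper.

```python
def make_windows(series, n):
    """Turn a series into (past n values, next value) training pairs."""
    pairs = []
    for i in range(len(series) - n):
        pairs.append((series[i:i + n], series[i + n]))
    return pairs

days = [1, 2, 3, 4, 5, 6]  # stand-in values for Day 1 .. Day 6
for inputs, target in make_windows(days, 3):
    print(inputs, "->", target)
```

The windows are built purely by position in the list, which is why this approach assumes a constant time interval and no missing days.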
  8. Problems with Traditional Time Series Models
    • The time interval between data points has to be the same throughout the data.
    • Days with NA are not allowed.
    • Seasonality with multiple periods (week and year) is hard to handle.
    • Parameter tuning by an expert is necessary.
  9. Prophet
    • An open source time series forecasting algorithm from Facebook.
    • Designed for ease of use without expert knowledge of time series forecasting or statistics.
  10. Time Series Forecasting by Prophet
    • Builds a model by finding the best smooth line, which can be represented as the sum of the following components:
    • Overall growth trend
    • Yearly seasonality
    • Weekly seasonality
    • Holiday effects - Christmas, New Year, July 4th, etc.
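As a rough formal picture, the Prophet paper (Taylor and Letham, "Forecasting at Scale") writes the model as an additive sum of these components, with each seasonality expressed as a Fourier series of period $P$ days. The notation below follows that formulation; it is a sketch of the idea, not of Exploratory's exact settings. Here $g(t)$ is the growth trend, $h(t)$ the holiday effects, and $\varepsilon_t$ the error term.

```latex
y(t) = g(t) + s_{365.25}(t) + s_{7}(t) + h(t) + \varepsilon_t

s_P(t) = \sum_{n=1}^{N} \left( a_n \cos\frac{2\pi n t}{P} + b_n \sin\frac{2\pi n t}{P} \right)
```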
  11. Benefits of the Prophet Approach
    • Uneven time intervals between data points are not a problem.
    • Days with NA are not a problem.
    • Seasonality with multiple periods (week and year) is handled by default.
    • Works well with the default settings, and the parameters are easy to interpret.
  12. Run Time Series Forecasting on Sales
    • Assign Order Date to the Date/Time Column and Sales to the Value Column.
    • Select DAY (Floor to Day) as the unit of time.
    • Select SUM as the function.
    By default, the forecasting period is 10. In this case, since DAY is the unit, a forecast for 10 days will be made.
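The Floor to Day + SUM step and the 10-period horizon can be sketched in plain Python, assuming orders arrive as (timestamp, sales) pairs; `sum_by_day` and `future_days` are hypothetical helpers, not Exploratory functions, and the orders are made up.

```python
from collections import defaultdict
from datetime import datetime, timedelta

def sum_by_day(orders):
    """Floor each order timestamp to its day and sum the sales per day."""
    totals = defaultdict(float)
    for order_date, sales in orders:
        totals[order_date.date()] += sales
    return dict(sorted(totals.items()))

def future_days(last_day, periods=10):
    """Return the `periods` days after last_day (10 mirrors the default period)."""
    return [last_day + timedelta(days=k) for k in range(1, periods + 1)]

orders = [
    (datetime(2018, 11, 1, 9, 30), 120.0),
    (datetime(2018, 11, 1, 15, 5), 80.0),
    (datetime(2018, 11, 2, 11, 0), 50.0),
]
daily = sum_by_day(orders)
horizon = future_days(max(daily), periods=10)
print(daily)
print(horizon[0], horizon[-1])
```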
  13. Let's zoom into the period near the end
    Click the Run button, and the forecasted data is displayed as an orange line.
  14. Zoomed into the period near the end of the data
    • The period with only the orange line is the 10 forecasted days.
    • The light orange area is the uncertainty interval of the forecast.
  15. Data with Forecasted Values
    • forecasted_value - The forecasted values
    • forecasted_value_high / forecasted_value_low - Uncertainty interval
    • trend - Overall growth trend
    • yearly - Yearly seasonality
    • weekly - Weekly seasonality
  16. Holiday Effect
    • Holidays might affect measures like sales, number of orders, etc.
    • Special events like conferences, Black Friday, sports events, etc. might also affect the measures.
  17. Forecast without Considering Holiday Effect
    • Set 'Day' for Date / Time.
    • Set 365 as the Forecasting Time Period.
  18. Add WWDC Dates by Join
    Select 'Full join' as the join type because we want to bring in the dates that don't exist in the current data frame but do exist in the target data frame.
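A minimal sketch of why a full join is needed, assuming each table is a {date: value} dictionary: the event table contains a date after the last sales date, and only a full (outer) join keeps that row. The dates and values are illustrative, and `full_join` is a hypothetical helper. In Prophet's own Python API, such events are usually supplied instead as a `holidays` data frame with `holiday` and `ds` columns.

```python
from datetime import date

def full_join(left, right):
    """Full outer join of two {date: value} tables on the date key.
    A date missing from one side gets None in that side's column."""
    keys = sorted(set(left) | set(right))
    return [(k, left.get(k), right.get(k)) for k in keys]

sales = {date(2018, 6, 3): 100.0, date(2018, 6, 4): 130.0}
events = {date(2018, 6, 4): "WWDC", date(2019, 6, 3): "WWDC"}  # future event date kept

for row in full_join(sales, events):
    print(row)
```

An inner or left join on `sales` would drop the 2019 row, which is exactly the future date the forecast needs to know about.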
  21. Capacity
    You can set Upper / Lower Limit values.
    • Sales / Demand Forecasting: we know it won't exceed a certain limit.
    • Conversion Rate: it never exceeds 100%.
    Prophet can take those limits into account.
  22. Lower Limit
    You can set the lower limit when the upper limit is set.
    • The default value is 0.
    • You can set a negative value as well.
  24. Trend Line
    The trend line becomes a logistic curve instead of a straight line when Upper/Lower Limits are set.
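The saturating trend can be sketched as a logistic curve bounded by the lower and upper limits, assuming a fixed growth rate `k` and midpoint `m` (Prophet itself also lets the rate change at changepoints). In Prophet's Python API this behavior is enabled with `growth='logistic'` plus a `cap` column and an optional `floor` column; `logistic_trend` below is a hypothetical standalone version.

```python
import math

def logistic_trend(t, cap, floor=0.0, k=1.0, m=0.0):
    """Logistic trend that rises from `floor` toward `cap`.
    k: growth rate, m: time at which the curve is halfway between floor and cap."""
    return floor + (cap - floor) / (1.0 + math.exp(-k * (t - m)))

for t in [-10, 0, 10]:
    print(t, round(logistic_trend(t, cap=100.0), 3))
```

Unlike a straight line, the curve approaches but never crosses the limits, which matches cases like a conversion rate that cannot exceed 100%.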
  25. Backtesting
    • Keep the latest part of the past data as test data.
    • Forecast the test data period using the data before it.
    • Evaluate the forecast against the actual values in the test period.
  26. Assign Order Date and select 'Floor to Month'. Assign 'Sales' to Value and select 'Sum'.
  27. Set '12', which means 12 months because the date is currently set to monthly. This reserves the last 12 months of data as the test data.
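The split itself is simple; a minimal sketch, assuming the monthly totals are a chronologically ordered list (`split_for_backtest` is a hypothetical helper and the values are stand-ins):

```python
def split_for_backtest(series, test_size=12):
    """Reserve the last `test_size` points (e.g. 12 monthly values) as test data."""
    if not 0 < test_size < len(series):
        raise ValueError("test_size must be between 1 and len(series) - 1")
    return series[:-test_size], series[-test_size:]

monthly_sales = list(range(1, 49))  # 48 months of stand-in values
train, test = split_for_backtest(monthly_sales, test_size=12)
print(len(train), len(test), test[0])
```

The model is fit only on `train`, and its forecast for the next 12 months is compared with `test`.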
  28. In the training period, the forecasted values (orange) and the actual values (blue) are very close, because the forecasting model was built from that data.
  29. The difference between the forecasted values and the actual values can be interpreted as how this year differs from a scenario in which it followed the same trend as previous years.
  30. Compared to the previous years, we can see that there was a big drop in February and big increases in the later part of the year.
  31. Metrics to Evaluate Time Series Forecasts
    • MAE (Mean Absolute Error): Mean of the absolute differences between the actual and forecasted values.
    • RMSE (Root Mean Square Error): Square root of the mean of the squared differences between the actual and forecasted values.
    • MAPE (Mean Absolute Percentage Error): Mean of the absolute differences expressed as a percentage of the actual values.
    • MASE (Mean Absolute Scaled Error): MAE of the forecasting model divided by the MAE of a naive forecasting model (a simple model that always forecasts the previous value).
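The four metrics can be sketched directly from these definitions. The slide does not say which data the naive model's MAE is computed on; this sketch assumes a separate `history` series (for example, the training data), which is how MASE is commonly defined. The forecasts and `history` values are hypothetical; the actual values are the ones used in the worked examples later in the deck.

```python
import math

def mae(actual, forecast):
    """Mean of absolute differences between actual and forecasted values."""
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)

def rmse(actual, forecast):
    """Square root of the mean of squared differences."""
    return math.sqrt(sum((a - f) ** 2 for a, f in zip(actual, forecast)) / len(actual))

def mape(actual, forecast):
    """Mean absolute difference as a percentage of the actual value (undefined if an actual is 0)."""
    return 100.0 * sum(abs((a - f) / a) for a, f in zip(actual, forecast)) / len(actual)

def mase(actual, forecast, history):
    """MAE of the model divided by the MAE of a naive forecast on `history`,
    where the naive forecast for each point is simply the previous value."""
    naive_mae = mae(history[1:], history[:-1])
    return mae(actual, forecast) / naive_mae

# Forecasts chosen so the errors are 2, 2, 4, 2.
actual = [12.0, 13.0, 16.0, 11.0]
forecast = [10.0, 11.0, 12.0, 9.0]
history = [10.0, 12.0, 11.0, 13.0, 12.0]  # hypothetical training data for MASE

print(mae(actual, forecast), rmse(actual, forecast), mape(actual, forecast))
print(mase(actual, forecast, history))
```

A MASE above 1 means the model's errors are larger than those of simply repeating the previous value.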
  32. The evaluation metrics are all calculated from the differences between the actual and the forecasted values in this test period.
  33. RMSE (Root Mean Square Error)
    Square the errors (the differences between the actual and the forecasted values), take the mean, and take the square root of the mean.
  34. RMSE (Root Mean Square Error), with errors 2, 2, 4, 2:
    $\sqrt{\frac{2^2 + 2^2 + 2^2 + 4^2}{4}} = \sqrt{\frac{4 + 4 + 4 + 16}{4}} = \sqrt{7} = 2.65$
  35. MAE (Mean Absolute Error)
    Take the absolute values of the errors (the differences between the actual and the forecasted values) and take the mean.
  36. MAE (Mean Absolute Error), with errors 2, 2, 4, 2:
    $\frac{2 + 2 + 2 + 4}{4} = 2.5$
  37. MAPE (Mean Absolute Percentage Error)
    Take the absolute percentage differences between the actual and the forecasted values, and take the mean.
  38. MAPE (Mean Absolute Percentage Error)
    Take the actual values: 12, 13, 16, 11.
  39. MAPE (Mean Absolute Percentage Error)
    Calculate the ratio of each error to its actual value. If it's negative, make it positive (absolute value): 2/12 = 16.7%, 2/13 = 15.4%, 4/16 = 25%, 2/11 = 18.2%.
  40. MAPE (Mean Absolute Percentage Error)
    Take the mean: $\frac{16.7 + 15.4 + 18.2 + 25}{4} = 18.8\%$
  41. Compare the Metrics by Market
    To compare among Markets, select 'Market' for Repeat By and build multiple forecasting models.
  43. • RMSE and MAE indicate that the errors are smaller for Africa than for Asia Pacific.
    • Can we conclude that the model for Africa is better?
  44. Sales amounts are generally bigger in Asia Pacific than in Africa, so RMSE and MAE are expected to be bigger for Asia Pacific.
  45. • MAPE is calculated as the percentage of the errors relative to the actual values.
    • It is more useful when we want to compare the two without regard to the scale of the actual values.
    • MAPE is smaller for Asia Pacific, which means the model for Asia Pacific is better than the model for Africa.
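A small sketch of this scale argument, assuming made-up numbers rather than the deck's market data: multiplying both the actual and the forecasted values by 10 (a stand-in for a larger market) multiplies MAE by 10 but leaves MAPE unchanged.

```python
import math

def mae(actual, forecast):
    """Mean absolute error, in the same units as the data."""
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)

def mape(actual, forecast):
    """Mean absolute percentage error, independent of the data's scale."""
    return 100.0 * sum(abs((a - f) / a) for a, f in zip(actual, forecast)) / len(actual)

actual_small = [12.0, 13.0, 16.0, 11.0]
forecast_small = [10.0, 11.0, 12.0, 9.0]
scale = 10.0  # hypothetical "bigger market": same pattern, 10x the sales
actual_big = [a * scale for a in actual_small]
forecast_big = [f * scale for f in forecast_small]

print(mae(actual_small, forecast_small), mae(actual_big, forecast_big))
print(math.isclose(mape(actual_small, forecast_small), mape(actual_big, forecast_big)))
```

Both series have exactly the same relative accuracy, yet MAE alone would make the smaller one look ten times better.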