Exploratory Seminar: Time Series Forecasting with Prophet


Instructor: Kan Nishida (@KanAugust), co-founder/CEO, Exploratory

Summary: At the beginning of 2016, Kan launched Exploratory, Inc. to make Data Science available for everyone. Prior to Exploratory, Kan was a development director at Oracle, leading development teams building various Data Science products in areas including Machine Learning, BI, Data Visualization, Mobile Analytics, Big Data, etc. While at Oracle, Kan also provided training and consulting services to help organizations transform with data.

Vision: Make Data Science available for everyone

The Third Wave
Data Science is not just for Engineers and Statisticians. Exploratory makes it possible for Everyone to do Data Science.

Smart Waves - Machine Learning / AI
• First Wave (1976) - Algorithms: Proprietary. Experience: UI & Programming. Theme: Monetization. Users: Statisticians.
• Second Wave (2000) - Algorithms: Open Source. Experience: Programming. Theme: Commoditization. Users: Data Scientists.
• Third Wave (2016) - Algorithms: Open Source. Experience: UI & Automation. Tools: Exploratory. Theme: Democratization. Users: Business Users.

Data Science Workflow (diagram labels): Questions, Data Access, Data Wrangling, Data Visualization, Machine Learning / Statistics, Exploration, Communication

What you can do with Exploratory (diagram labels): Questions, Data Access, Data Wrangling, Visualization, Machine Learning / Statistics, Exploratory Data Analysis, Communication


Time Series Forecasting with Prophet

Time Series Forecasting
• An application of supervised learning.
• Build a model that predicts a value from a date, using past data as training data.
• Feeding the model future dates produces forecasted values for those dates.

Traditional Approach with Time Series Forecasting
Builds a model that predicts a day’s value from the past N days. For example, with N = 3:
Day 1, Day 2, Day 3 → Day 4
Day 2, Day 3, Day 4 → Day 5
Day 3, Day 4, Day 5 → Day 6
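The sliding-window setup on this slide can be sketched in a few lines of Python. This is only an illustration of how the training pairs are formed, not how any particular traditional model is implemented; the function name `make_windows` is made up for this example.

```python
def make_windows(values, n):
    """Return (past_n_values, next_value) training pairs from a sequence."""
    pairs = []
    for i in range(len(values) - n):
        # The n values starting at i are the inputs; the value after them is the target.
        pairs.append((values[i:i + n], values[i + n]))
    return pairs


days = ["Day 1", "Day 2", "Day 3", "Day 4", "Day 5", "Day 6"]
for inputs, target in make_windows(days, 3):
    print(inputs, "->", target)
```

Because every pair needs N consecutive values, a missing day or an uneven interval breaks the windows, which is where the problems listed next come from.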

Problems with Traditional Time Series Models
• The time interval between data points has to be the same throughout the data.
• Days with NA values are not allowed.
• Seasonality with multiple periods (week and year) is hard to handle.
• Parameter tuning by an expert is necessary.

Prophet
• Open source time series forecasting algorithm from Facebook.
• Designed for ease of use without expert knowledge of time series forecasting or statistics.

Time Series Forecasting by Prophet
• Builds a model by finding the best smooth line, which can be represented as the sum of the following components:
  • Overall growth trend
  • Yearly seasonality
  • Weekly seasonality
  • Holiday effects - Christmas, New Year, July 4th, etc.
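To make the "sum of components" idea concrete, here is a toy Python sketch. The component shapes and all numbers (slope, amplitude, weekend lift, holiday bump) are invented for illustration; Prophet fits its own trend and seasonality curves from the data, and this sketch does not reproduce that fitting.

```python
import math
from datetime import date


def trend(d, start, slope=0.5, base=100.0):
    """Overall growth trend: a straight line over days since `start`."""
    return base + slope * (d - start).days


def yearly(d, amplitude=10.0):
    """Yearly seasonality: one sine wave per year (toy shape)."""
    day_of_year = d.timetuple().tm_yday
    return amplitude * math.sin(2 * math.pi * day_of_year / 365.25)


def weekly(d, weekend_lift=5.0):
    """Weekly seasonality: a fixed lift on Saturday and Sunday (toy shape)."""
    return weekend_lift if d.weekday() >= 5 else 0.0


def holiday(d, holidays):
    """Holiday effect: an extra amount on listed (month, day) pairs."""
    return holidays.get((d.month, d.day), 0.0)


def forecast(d, start, holidays):
    """The forecast is simply the sum of the four components."""
    return trend(d, start) + yearly(d) + weekly(d) + holiday(d, holidays)


start = date(2017, 1, 1)
holidays = {(12, 25): 20.0}  # hypothetical Christmas bump
print(forecast(date(2017, 12, 25), start, holidays))
```

Because the components are added, each one can be plotted and inspected separately, which is what the following slides show.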

Trend, Yearly

Trend and Yearly Combined

Adding Weekly

Trend, Yearly, and Weekly Combined

Holiday Effect

Trend, Yearly, Weekly, and Holiday Effect Combined

Benefits of the Prophet Approach
• Uneven time intervals between data points are not a problem.
• Days with NA values are not a problem.
• Seasonality with multiple periods (week and year) is handled by default.
• Works well with default settings. Parameters are easily interpretable.

Time Series Forecasting by Analytics View

Sales Data

Visualization by Line Chart

Select Time Series Forecast Analytics View

Run Time Series Forecasting on Sales
• Assign Order Date to the Date/Time Column, and Sales to the Value Column.
• Select DAY (Floor to Day) as the unit of time.
• Select SUM as the function.
By default, the forecasting period is 10. Since DAY is the unit in this case, a forecast for 10 days will be made.
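For readers who prefer code, the same steps can be roughly sketched in Python. This assumes the open source `prophet` and `pandas` packages are installed; the helper names `sum_by_day` and `forecast_daily` are made up for this example, and the deck does not say how Exploratory maps these UI settings onto Prophet internally, so treat the mapping as an assumption.

```python
from collections import defaultdict
from datetime import datetime


def sum_by_day(records):
    """Floor each timestamp to its day and SUM the values per day."""
    totals = defaultdict(float)
    for timestamp, value in records:
        totals[timestamp.date()] += value
    return dict(sorted(totals.items()))


def forecast_daily(daily_totals, periods=10):
    """Fit Prophet on daily totals and forecast `periods` more days.

    Needs the third-party `pandas` and `prophet` packages, imported here
    so that `sum_by_day` can be used without them.
    """
    import pandas as pd
    from prophet import Prophet

    # Prophet expects a data frame with columns `ds` (date) and `y` (value).
    df = pd.DataFrame({
        "ds": pd.to_datetime(list(daily_totals.keys())),
        "y": list(daily_totals.values()),
    })
    model = Prophet()
    model.fit(df)
    future = model.make_future_dataframe(periods=periods, freq="D")
    # yhat is the forecast; yhat_lower / yhat_upper bound the uncertainty interval.
    return model.predict(future)[["ds", "yhat", "yhat_lower", "yhat_upper"]]


# Made-up order records: (order timestamp, sales amount).
orders = [
    (datetime(2017, 1, 1, 9, 30), 10.0),
    (datetime(2017, 1, 1, 15, 0), 5.0),
    (datetime(2017, 1, 2, 8, 0), 7.0),
]
print(sum_by_day(orders))
```

In practice `forecast_daily` would be called on a few years of daily totals rather than the three sample rows above.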

Click the Run button, and the forecasted data is displayed as an orange line. Let’s zoom into the period near the end.

Zoomed into the period near the end of the data
• The period with only the orange line is the 10 days that are forecasted.
• The area in light orange is the uncertainty interval of the forecast.

Under the Trend tab, the actual values and the trend line are displayed. The trend line is in green.

Under the Yearly tab, the yearly seasonal trend is displayed.

Under the Weekly tab, the weekly trend is displayed.

Under the Data tab, a table with the forecasted values is shown.

Data with Forecasted Values
• forecasted_value - The forecasted values
• forecasted_value_high / forecasted_value_low - Uncertainty interval
• trend - Overall growth trend
• yearly - Yearly seasonal trend
• weekly - Weekly trend

Advanced Topics

Holiday Effect

Holiday Effect
• Holidays might affect measures like sales, number of orders, etc.
• Special events like conferences, Black Friday, sports events, etc. might affect the measures too.

Data: Unique page views of the “Apple Worldwide Developers Conference” (WWDC) page on Wikipedia

Data: Apple Worldwide Developers Conference (WWDC) dates

Data: Future dates.

Forecast without Considering Holiday Effect
• Set ‘Day’ for Date / Time.
• Set 365 as the Forecasting Time Period.

These WWDC dates are not forecasted well by the model.

Add WWDC Dates by Join

Select ‘Full join’ as the join type because we want to bring in the dates that don’t exist in the current data frame but do exist in the target data frame.
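The reason for choosing a full join can be seen in a small Python sketch. The function `full_join`, the page-view counts, and the event dates below are all placeholders invented for illustration; they are not the data shown in the deck.

```python
def full_join(left, right):
    """Full outer join of two {date: value} mappings on the date key.

    Dates missing on one side get None for that side's value.
    """
    all_dates = sorted(set(left) | set(right))
    return [(d, left.get(d), right.get(d)) for d in all_dates]


# Made-up page-view counts keyed by ISO date (current data frame).
page_views = {"2017-06-04": 310, "2017-06-05": 5200, "2017-06-06": 2900}
# Event table (target data frame) with a future date absent from page_views.
events = {"2017-06-05": "WWDC", "2018-06-04": "WWDC"}

for row in full_join(page_views, events):
    print(row)
```

A left join would drop the future event date because it has no page views yet; the full join keeps it, so the forecast can take the event into account.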

Select Holiday Column

These WWDC dates are not forecasted well by the model.

Future conference dates are better forecasted by incorporating the holiday effect.

Holidays Tab: The green line is the holiday effect.

Capacity

Capacity - Upper / Lower Limit

Capacity
You can set Upper / Lower Limit values.
• Sales / Demand Forecasting: We know it won’t exceed a certain limit.
• Conversion Rate: It never exceeds 100%.
Prophet can take those limits into account.

Lower Limit
You can set the lower limit when the upper limit is set.
• The default value is 0.
• You can set a negative value as well.

Trend Line
The trend line becomes a logistic curve instead of a straight line when Upper / Lower Limits are set.
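The shape of a trend bounded by an upper and lower limit can be sketched with a plain logistic function. The parameter names `cap`, `floor`, `rate`, and `midpoint` and their values are chosen for this example only; Prophet estimates the growth rate and offset itself, so this is just a picture of the curve, not of the fitting.

```python
import math


def logistic_trend(t, cap, floor=0.0, rate=0.1, midpoint=0.0):
    """Logistic curve that rises from `floor` toward `cap` as t grows."""
    return floor + (cap - floor) / (1.0 + math.exp(-rate * (t - midpoint)))


# A hypothetical conversion rate bounded between 0% and 100%.
for t in (-50, 0, 50):
    print(t, round(logistic_trend(t, cap=100.0), 1))
```

Unlike a straight line, the curve flattens as it approaches either limit, so the forecast never crosses them.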

Evaluating Forecasting Model

Backtesting
• Keep the latest part of the past data as test data.
• Forecast the test data period using only the data before it.
• Evaluate the forecast against the test data.
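The split behind backtesting can be sketched as below. The function name `split_for_backtest` and the placeholder series are made up for this example; the point is only that the most recent periods are held back and never shown to the model.

```python
def split_for_backtest(series, test_periods):
    """Split a time-ordered series into (training, held-out) parts."""
    if not 0 < test_periods < len(series):
        raise ValueError("test_periods must be between 1 and len(series) - 1")
    # The split is by position, not random: the latest periods become test data.
    return series[:-test_periods], series[-test_periods:]


monthly_sales = list(range(1, 37))  # placeholder values for 36 months
training, held_out = split_for_backtest(monthly_sales, 12)
print(len(training), len(held_out))  # 24 12
```

The model is then built from `training` only, and its forecast for the held-out period is compared with the actual values.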

Assign Order Date to Date/Time and select ‘Floor to Month’. Assign ‘Sales’ to Value and select ‘Sum’.

Switch to Test Mode: Select ‘TRUE’ for Test Mode.

Set the test period to ‘12’. This means 12 months because the data is currently aggregated by month. This reserves the last 12 months of data as the test data.

How to understand the result

The data is split into two sections: Training Data and Test Data.

Since the forecasting model is built from the training data, the forecasted values (orange) and the actual values (blue) are very close in the training period.

The test data is the data that the model didn’t know when it was built.

The difference between the forecasted values and the actual values can be interpreted as how this year differed from a scenario in which it followed the same trend as the previous years.

Compared to the previous years, we can see that there was a big drop in February and big increases in the later part of the year.

Model Summary

Metrics to Evaluate Time Series Forecast
• MAE (Mean Absolute Error): Mean of the absolute differences between the actual values and the forecasted values.
• RMSE (Root Mean Square Error): Square root of the mean of the squared differences between the actual values and the forecasted values.
• MAPE (Mean Absolute Percentage Error): Mean of the absolute differences expressed as a percentage of the actual values.
• MASE (Mean Absolute Scaled Error): MAE of the forecasting model divided by the MAE of a naive forecasting model (a simple model that always forecasts the previous value).
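MASE is the only metric in this list without a worked example in the deck, so here is a minimal Python sketch. It assumes the naive model is evaluated over the same test period, using the last training value for the first test point; another common convention computes the naive MAE on the training data instead, and the deck does not say which one Exploratory uses. All numbers are made up.

```python
def mae(actual, forecast):
    """Mean of absolute differences between actual and forecasted values."""
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)


def mase(actual, forecast, history):
    """MAE of the model divided by MAE of a naive previous-value forecast.

    `history` holds the values before the test period; only its last value
    is used, as the naive forecast for the first test point (assumption).
    """
    naive_forecast = [history[-1]] + list(actual[:-1])
    naive_mae = mae(actual, naive_forecast)
    if naive_mae == 0:
        raise ValueError("naive forecast is perfect; MASE is undefined")
    return mae(actual, forecast) / naive_mae


history = [9, 10]                # made-up values before the test period
actual = [12, 13, 16, 11]        # made-up actual values in the test period
model_forecast = [10, 15, 12, 13]
print(round(mase(actual, model_forecast, history), 2))
```

A value below 1 means the model beats the naive forecast; a value above 1 means simply repeating the previous value would have done better.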

The evaluation metrics are all calculated based on the difference between the actual and the forecasted values in this period.

RMSE (Root Mean Square Error)
Square the errors (the differences between the actual and the forecasted values), take the mean, and take the square root of the mean.

Example with errors 2, 2, 4, 2:
$$\mathrm{RMSE} = \sqrt{\frac{2^2 + 2^2 + 2^2 + 4^2}{4}} = \sqrt{\frac{4 + 4 + 4 + 16}{4}} = \sqrt{7} \approx 2.65$$

MAE (Mean Absolute Error)
Take the absolute values of the errors (the differences between the actual and the forecasted values), then take the mean.

Example with errors 2, 2, 4, 2:
$$\mathrm{MAE} = \frac{2 + 2 + 2 + 4}{4} = 2.5$$

MAPE (Mean Absolute Percentage Error)
Take the absolute percentage differences between the actual and the forecasted values, then take the mean.

• Take the actual values: 12, 13, 16, 11.
• Take the errors: 2, 2, 4, 2.
• Calculate the ratio of each error to its actual value. If it’s negative, make it positive (absolute): 16.7%, 15.4%, 25%, 18.2%.
• Take the mean:
$$\mathrm{MAPE} = \frac{16.7\% + 15.4\% + 25\% + 18.2\%}{4} \approx 18.8\%$$
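The three worked examples can be checked with a few lines of Python, using the errors (2, 2, 4, 2) and actual values (12, 13, 16, 11) from the slides. The variable names are arbitrary.

```python
import math

actual = [12, 13, 16, 11]  # actual values from the MAPE example
errors = [2, 2, 4, 2]      # differences between actual and forecasted values

# RMSE: square the errors, take the mean, then the square root.
rmse = math.sqrt(sum(e ** 2 for e in errors) / len(errors))

# MAE: take the absolute errors, then the mean.
mae = sum(abs(e) for e in errors) / len(errors)

# MAPE: absolute error as a percentage of each actual value, then the mean.
mape = 100 * sum(abs(e) / a for e, a in zip(errors, actual)) / len(errors)

print(round(rmse, 2), mae, round(mape, 1))  # 2.65 2.5 18.8
```

RMSE comes out larger than MAE here because squaring gives the single error of 4 more weight than the three errors of 2.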

Compare the Metrics by Market
To compare among Markets, select ‘Market’ for Repeat By and build multiple forecasting models.

The drop in February is coming from Latin America (LATAM).

Unexpected increases in November are coming from Asia Pacific and Latin America.

• RMSE and MAE indicate that the errors are smaller for Africa compared to Asia Pacific.
• Can we conclude that the model for Africa is better?

Sales amounts are bigger in Asia Pacific than in Africa in general, hence it’s expected that RMSE and MAE are bigger for Asia Pacific.

• MAPE is calculated as the percentage of the errors compared to the actual values.
• It is more useful when we want to compare the two without considering the scale of the actual values.
• MAPE is smaller for Asia Pacific, which means we have a better model for Asia Pacific than for Africa.
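The scale argument can be illustrated with two hypothetical markets in Python. The numbers are invented, not the Africa and Asia Pacific figures from the deck; they are chosen so that the larger market has bigger absolute errors but smaller relative errors.

```python
def mae(actual, forecast):
    """Mean absolute error, in the same units as the data."""
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)


def mape(actual, forecast):
    """Mean absolute percentage error, independent of the data's scale."""
    return 100 * sum(abs(a - f) / abs(a) for a, f in zip(actual, forecast)) / len(actual)


# Hypothetical small market: every forecast is off by 10.
small_actual = [100, 120, 110, 130]
small_forecast = [90, 130, 120, 120]

# Hypothetical large market: sales are 10x bigger, every forecast is off by 40.
large_actual = [1000, 1200, 1100, 1300]
large_forecast = [960, 1240, 1140, 1260]

print("small:", mae(small_actual, small_forecast), round(mape(small_actual, small_forecast), 1))
print("large:", mae(large_actual, large_forecast), round(mape(large_actual, large_forecast), 1))
```

Judged by MAE alone the small market looks better, yet relative to its sales the large market is forecasted more accurately, which is the same pattern the slides describe.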
