Introduction to Time Series Forecasting with Prophet

Exploratory Seminar #25 Time Series Forecasting with Prophet

EXPLORATORY

Speaker: Kan Nishida (@KanAugust), CEO/co-founder, Exploratory
Summary: At the beginning of 2016, Kan launched Exploratory, Inc. to democratize Data Science. Prior to Exploratory, Kan was a director of product development at Oracle, leading teams that built various Data Science products in areas including Machine Learning, BI, Data Visualization, Mobile Analytics, and Big Data. While at Oracle, Kan also provided training and consulting services to help organizations transform with data.

Mission: Democratize Data Science

The Third Wave
Data Science is not just for Engineers and Statisticians. Exploratory makes it possible for Everyone to do Data Science.

Democratization of Data Science
• First Wave (1976): Theme - Monetization; Users - Statisticians
• Second Wave (2000): Theme - Commoditization; Users - Data Scientists
• Third Wave (2016): Theme - Democratization; Users - Business Users
• Also compared across the waves: Algorithms, Experience, and Tools (Proprietary, Open Source, Programming, UI & Programming, UI & Automation)

Data Science Workflow
Questions -> Data Access -> Data Analysis (Data Wrangling, Visualization, Analytics (Statistics / Machine Learning)) -> Communication

Exploratory: Modern & Simple UI
Questions -> Data Access -> Data Analysis (Data Wrangling, Visualization, Analytics (Statistics / Machine Learning)) -> Communication (Dashboard, Note, Slides)


Time Series Forecasting
• An application of supervised learning.
• Build a model to predict a value from a date, with past data as training data.
• By feeding the model future dates, it produces forecasted values for those dates.

Traditional Approach with Time Series Forecasting
Builds a model that predicts a day’s value from the past N days.
• Day 1, Day 2, Day 3 -> Day 4
• Day 2, Day 3, Day 4 -> Day 5
• Day 3, Day 4, Day 5 -> Day 6
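
The windowing used by the traditional approach can be sketched in a few lines of Python. This is only an illustration: the function name `make_windows` and the daily values are hypothetical, and N is assumed to be 3 to match the Day 1 to Day 6 layout.

```python
def make_windows(values, n_lags):
    """Return (inputs, target) pairs: each target is predicted from the n_lags values before it."""
    pairs = []
    for i in range(n_lags, len(values)):
        pairs.append((values[i - n_lags:i], values[i]))
    return pairs


# Hypothetical daily values for Day 1 .. Day 6.
daily = [10, 12, 13, 15, 14, 16]

for inputs, target in make_windows(daily, n_lags=3):
    print(inputs, "->", target)
```

Because every window must contain exactly N consecutive values, a missing day or an uneven interval breaks the layout, which is one reason this approach is strict about its input data.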

Problems with Traditional Time Series Models
• The time interval between data points has to be the same throughout the data.
• Days with NA are not allowed.
• Seasonality with multiple periods (Week and Year) is hard to handle.
• Parameter tuning is hard and requires expert-level forecasting knowledge.

Prophet
• A ‘curve fitting’ algorithm to build time series forecasting models.
• Designed for ease of use without expert knowledge of time series forecasting or statistics.
• Built by Data Scientists (Sean J. Taylor (@seanjtaylor) & co.) at Facebook and open sourced. (https://facebook.github.io/prophet)

Prophet - Additive Model
Build a model by finding the best smooth line, which can be represented as the sum of the following components.
• Overall growth trend
• Seasonality - Yearly, Weekly, Daily, etc.
• Holiday effects - Christmas, New Year, July 4th, etc.
• External Predictors
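
To make the "sum of components" idea concrete, here is a minimal Python sketch. It is not Prophet's implementation: the component shapes, the numbers in `WEEKLY_EFFECT` and `HOLIDAY_EFFECT`, and the function names are all made up for illustration. It only shows that a forecast for a date is the trend plus whichever seasonal and holiday effects apply on that date.

```python
from datetime import date, timedelta


def trend(day_index, base=100.0, slope=0.5):
    """Overall growth trend: a straight line (hypothetical numbers)."""
    return base + slope * day_index


# Hypothetical weekly seasonality, keyed by weekday (Monday = 0).
WEEKLY_EFFECT = {0: -8.0, 1: 2.0, 2: 3.0, 3: 3.0, 4: 4.0, 5: 1.0, 6: -5.0}

# Hypothetical holiday effect.
HOLIDAY_EFFECT = {date(2024, 7, 4): 20.0}


def additive_forecast(start, day_index):
    """Forecast = trend + weekly seasonality + holiday effect for one date."""
    d = start + timedelta(days=day_index)
    return trend(day_index) + WEEKLY_EFFECT[d.weekday()] + HOLIDAY_EFFECT.get(d, 0.0)


start = date(2024, 7, 1)  # a Monday
for i in range(5):
    print(start + timedelta(days=i), additive_forecast(start, i))
```

Each component can be inspected on its own, which is why the trend, seasonality, and holiday effects can be shown as separate charts.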

Yearly

Trend + Yearly

Weekly

Trend + Yearly + Weekly

Holiday Effect

Trend + Yearly + Weekly + Holiday

Benefits of the Prophet Approach
• Can handle uneven time intervals.
• Can handle dates with NA.
• Can handle multiple seasonality components (e.g. Yearly, Weekly, etc.).
• Works well by default.
• Can be improved by using easy-to-interpret parameters with business domain knowledge.

Let’s do it!

Sales Data

Weekly Sales Trend

Build a Forecasting Model with Prophet

Assign Order Date to Date/Time and Sales to Value.

The blue line is the actual data (Sales), and the
orange line is the forecasted data.

The last area is the forecasted period where there is
no actual data.

The default forecasting period is 10 units; in this case, that is
10 weeks.

Set the Forecasting Period to 52 weeks.

Under the Trend tab, you can see the overall trend
that is used by the model. The blue line is the actual (Sales) data, and the green line is the trend.

The vertical light green bars are Change Points where the
trend changes.

Under the Yearly tab you can see the Yearly Seasonality.

Every year, sales don’t pick up until June, then
they go down in July.

The Weekly tab shows up only when the data is daily
or more granular (hourly, by the minute, etc.).

Every week, the sales are low on Sunday and Monday,
and the rest of the week is high.

Data Preprocessing: NA Data Handling

Somehow, the weekly seasonality doesn’t repeat exactly the same…

There are NAs for some dates. You can impute them
as part of the Data Preprocessing.
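
As a rough illustration of what imputing NA means, the Python sketch below fills missing values with the mean of the known values. The slides do not say which imputation method is used, so mean imputation here is purely an assumption, and `impute_with_mean` is a hypothetical helper name.

```python
def impute_with_mean(values):
    """Replace missing values (None) with the mean of the known values."""
    known = [v for v in values if v is not None]
    if not known:
        raise ValueError("cannot impute when every value is missing")
    mean = sum(known) / len(known)
    return [mean if v is None else v for v in values]


# Hypothetical daily sales with two missing days.
daily_sales = [10, None, 14, None, 12]
print(impute_with_mean(daily_sales))
```

Other choices, such as carrying the previous value forward or filling with 0, may suit the data better depending on why the values are missing.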

Under the Importance tab, you can see which seasonality has
more effect on the forecasting outcome.

Evaluating Forecasting Model

Backtesting
• Keep the latest part of the past data as test data.
• Forecast the test data period using only the data before the test data period.
• Evaluate the forecast against the actual values in the test period.

Split the data into two sections: a Training Period followed by a Test Period.

Build a model with the training data and forecast for the test period. Compare the forecasted values against the actual values in the test period.
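
The split itself is simple to sketch in Python. The helper name `split_for_backtest` and the monthly numbers are hypothetical. The key point is that the split is by time, not random: the test data is always the latest part.

```python
def split_for_backtest(values, test_periods):
    """Split a time-ordered list into (training, test), holding out the latest points."""
    if not 0 < test_periods < len(values):
        raise ValueError("test_periods must be between 1 and len(values) - 1")
    return values[:-test_periods], values[-test_periods:]


# Hypothetical monthly sales for 36 months, oldest first.
monthly_sales = [100 + 5 * i for i in range(36)]

training, test = split_for_backtest(monthly_sales, test_periods=12)
print(len(training), len(test))
```

A model would then be built on `training` only, and its forecast for the next 12 periods compared with `test`.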

Assign Order Date to Date/Time and select ‘Floor to Month’. Assign
‘Sales’ to Value and select ‘Sum’.
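
What ‘Floor to Month’ combined with ‘Sum’ does to the data can be sketched as follows. This is a plain Python illustration with hypothetical order rows, not the tool's implementation: each date is replaced by the first day of its month, and the sales that land on the same month are added up.

```python
from collections import defaultdict
from datetime import date


def monthly_sum(rows):
    """Floor each order date to the first of its month and sum sales per month."""
    totals = defaultdict(float)
    for order_date, sales in rows:
        totals[order_date.replace(day=1)] += sales
    return dict(sorted(totals.items()))


# Hypothetical (Order Date, Sales) rows.
orders = [
    (date(2024, 1, 5), 100.0),
    (date(2024, 1, 20), 50.0),
    (date(2024, 2, 3), 70.0),
]
print(monthly_sum(orders))
```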

Set Test Mode to TRUE and type 12 (months) for
the Test Period.

Training Period, followed by a Test Period of 12 months.

Since the forecasting model is built from the training
data, the forecasted values (orange) and the actual values (blue) are usually very close in the training period.

Evaluate the model based on how much difference there
is between the forecasted values and the actual values in the Test period.

Evaluation Metrics under Summary tab

Metrics to Evaluate Forecasting Model
• MAE (Mean Absolute Error): Mean of the absolute differences between the actual and forecasted values.
• RMSE (Root Mean Square Error): Square root of the mean of the squared differences between the actual and forecasted values.
• MAPE (Mean Absolute Percentage Error): Mean of the absolute differences expressed as a percentage of the actual values.
• MASE (Mean Absolute Scaled Error): MAE of the forecasting model divided by the MAE of a naive forecasting model (a simple model that always forecasts the previous value).

RMSE (Root Mean Square Error)
Square the errors (differences between the actual and the forecasted values), take the mean, and take the square root of the mean.

Errors: 2, 2, 4, 2
RMSE = \sqrt{\frac{2^2 + 2^2 + 4^2 + 2^2}{4}} = \sqrt{\frac{4 + 4 + 16 + 4}{4}} = \sqrt{7} \approx 2.65
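
The same RMSE arithmetic can be checked with a short Python sketch. The actual and forecasted values below are hypothetical, chosen only so that the errors come out as 2, 2, 4, 2.

```python
import math


def rmse(actual, forecast):
    """Root Mean Square Error: square the errors, take the mean, take the root."""
    errors = [a - f for a, f in zip(actual, forecast)]
    return math.sqrt(sum(e * e for e in errors) / len(errors))


actual = [12, 13, 16, 11]    # hypothetical actual values
forecast = [10, 11, 12, 9]   # hypothetical forecasts, giving errors 2, 2, 4, 2

print(round(rmse(actual, forecast), 2))
```

Because the errors are squared before averaging, the single error of 4 contributes more to the result than the three errors of 2 combined.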

MAE (Mean Absolute Error)
Take the absolute values of the errors (differences between the actual and the forecasted values) and take the mean.

Errors: 2, 2, 4, 2
MAE = \frac{2 + 2 + 4 + 2}{4} = 2.5
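
A matching Python sketch for MAE, again with hypothetical actual and forecasted values. Two of the forecasts are deliberately too high, to show that the sign of an error does not matter once the absolute value is taken.

```python
def mae(actual, forecast):
    """Mean Absolute Error: mean of the absolute differences."""
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)


actual = [12, 13, 16, 11]     # hypothetical actual values
forecast = [14, 11, 12, 13]   # hypothetical forecasts, giving errors -2, 2, 4, -2

print(mae(actual, forecast))
```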

MAPE (Mean Absolute Percentage Error)
Take the absolute percentage differences between the actual and the forecasted values, then take the mean.
• Take the actual values: 12, 13, 16, 11
• Take the errors: 2, 2, 4, 2
• Calculate the ratio of each error to its actual value. If it’s negative, make it positive (absolute): 16.7%, 15.4%, 25%, 18.2%
• Take the mean: \frac{16.7 + 15.4 + 25 + 18.2}{4} \approx 18.8%
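
The MAPE steps above translate directly into Python. The actual values 12, 13, 16, 11 are the ones from the slides; the forecasted values are hypothetical, picked so that the absolute errors are 2, 2, 4, 2.

```python
def mape(actual, forecast):
    """Mean Absolute Percentage Error, in percent. Actual values must be non-zero."""
    ratios = [abs(a - f) / abs(a) for a, f in zip(actual, forecast)]
    return 100 * sum(ratios) / len(ratios)


actual = [12, 13, 16, 11]
forecast = [14, 11, 12, 13]   # hypothetical forecasts, giving errors -2, 2, 4, -2

print(round(mape(actual, forecast), 1))
```

The division by the actual value is what makes MAPE scale-free, and also what makes it break down when an actual value is 0 or close to 0.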

MASE (Mean Absolute Scaled Error)
Scale the MAE so that it can be compared among data sets with different variability.

Naive Forecasting
Use the previous period’s values as the forecasted values.

MAE in Training Period

MASE - How to Scale
MASE = MAE in the test period / MAE of a ‘naive’ prediction model in the training period
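
The MASE definition can be sketched in Python as below. The training and test numbers are hypothetical, and the function names are made up for this illustration. The naive model simply forecasts each training value as the value before it.

```python
def mae(actual, forecast):
    """Mean Absolute Error."""
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)


def naive_mae(train):
    """MAE of the naive model in the training period (forecast = previous value)."""
    return mae(train[1:], train[:-1])


def mase(train, test_actual, test_forecast):
    """MAE in the test period divided by the naive model's MAE in the training period."""
    return mae(test_actual, test_forecast) / naive_mae(train)


train = [10, 12, 11, 15, 14]       # hypothetical training values
test_actual = [12, 13, 16, 11]     # hypothetical test values
test_forecast = [10, 11, 12, 9]    # hypothetical forecasts for the test period

print(mase(train, test_actual, test_forecast))
```

A value below 1 means the model's errors in the test period are smaller than the naive model's errors in the training period; a value above 1 means they are larger.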

Seasonality Mode: Additive vs. Multiplicative

The difference between the actual line and the forecasted
line becomes wider as time progresses.

What’s happening?
• This model assumes that the seasonal variability stays constant.
• But as Sales increases, maybe the seasonal variability increases, too.
• This model might not be able to keep up with the actual growth of Sales.
Can we model seasonality that also grows as Sales grows?

Different Types of Growth Models
• Additive Growth: Number of Employees at a very stable company
• Multiplicative Growth: Amazon’s Sales Trend

Seasonality Mode
• Additive: The effect is additive. Grows $20,000 in February. The growth is constant regardless of the previous value.
• Multiplicative: The effect is multiplicative. Grows 20% in February. The growth rate is constant, which means the growth gets bigger when the previous value is big.
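
The difference between the two modes shows up clearly with two hypothetical January sales figures. In the Python sketch below, the $20,000 and 20% February effects come from the slide, while the January values and function names are assumptions for illustration.

```python
def additive_february(january_sales, effect=20_000):
    """Additive effect: February adds a fixed amount, whatever January was."""
    return january_sales + effect


def multiplicative_february(january_sales, rate=0.20):
    """Multiplicative effect: February grows by a fixed rate of January."""
    return january_sales * (1 + rate)


# Hypothetical January sales for a small and a large business.
for january in (100_000, 500_000):
    print(
        january,
        additive_february(january),
        round(multiplicative_february(january)),
    )
```

At 100,000 the two modes give about the same February value, but at 500,000 the multiplicative effect is worth 100,000 while the additive effect is still 20,000. This is why an additive model can fall behind when sales keep growing.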

Switch Seasonality Mode to ‘Multiplicative’

The difference between the actual and the forecasted values stays similar
even as time progresses.

Additive Model vs. Multiplicative Model
• Additive Model: the seasonality effect is constant throughout the years.
• Multiplicative Model: the seasonality effect grows as Sales grows.

Building Forecasting Models for Monthly Sales: Additive Model vs. Multiplicative Model

The additive seasonality model is not keeping up with the actual Sales growth, while the multiplicative model is performing better.

All evaluation metrics suggest that the multiplicative seasonality model performs better than the additive model.

Compare the Models among Markets with Repeat By

To compare among Markets, select ‘Market’ for Repeat By
and build multiple forecasting models.

• RMSE and MAE indicate that the errors are smaller for Africa compared to Asia Pacific.
• Can we conclude that the model for Africa is better?

Sales amounts are generally bigger in Asia Pacific than in
Africa, hence it’s expected that RMSE and MAE are bigger for Asia Pacific.

• MAPE is calculated as the percentage of the errors compared to the actual values.
• It is more useful when we want to compare the two without considering the scale of the actual values.
• MAPE is smaller for Asia Pacific, which means we have a better model for Asia Pacific than for Africa.

• MASE is the ratio of the MAE in the Test Period to the MAE of the naive prediction model.
• Given it’s a ratio, we can use it to compare among the models regardless of the scale of the actual values.
• One advantage of MASE over MAPE is that MAPE can be unreliable when the actual values are close to 0 or include both positive and negative values, whereas MASE is not affected by such cases.
• The MASE suggests that the model fits better for Asia Pacific and USCA compared to other regions.

External Predictors

If we know that some variables are correlated with Sales and
we can obtain the future values of those variables, can’t we forecast better?

Sales and Sales Comp. look correlated to one another.

Sales and Marketing look correlated to one another, too.

Sales and Discount (Avg) don’t look correlated to one another.

Example: Marketing
Let’s say we can control how much we will be spending on marketing in the next 3 months. Can we build such information into the forecasting model?

Example: Weather
Let’s say our Sales is usually impacted by the weather and we can forecast the temperature or whether it will rain or not for the next 10 days. Can we build such information into the forecasting model?

External Predictors
You can assign variables as ‘External Predictors’ (Extra Regressors) and build forecasting models. Prophet investigates whether the external predictor (Marketing) helps build a better model to forecast the target variable (Sales) and calculates the coefficient of the predictor variable.
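
To give a feel for what "the coefficient of the predictor variable" means, here is a deliberately simplified Python sketch. It fits a single straight line of Sales against Marketing with ordinary least squares. This is a stand-in, not what Prophet does: Prophet fits the regressor together with the trend and seasonality. All numbers and names below are hypothetical.

```python
def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical past data: marketing spend and sales per month.
marketing = [10, 20, 30, 40]
sales = [120, 140, 160, 180]

slope, intercept = fit_line(marketing, sales)

# Future marketing spend must be known (here: planned) to forecast with it.
planned_marketing = [50, 60, 70]
forecast = [intercept + slope * m for m in planned_marketing]
print(slope, forecast)
```

The sketch also shows the practical requirement behind external predictors: the forecast needs the predictor's future values, so only variables that are planned or can themselves be forecast are usable.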

A forecasting model to forecast Sales based on the Trend and
Yearly Seasonality, with Test Mode.

Metrics to evaluate the base model.

A forecasting model based on the Trend, Yearly Seasonality, and Sales Comp.

Model Evaluation Metrics: Base Model vs. Base Model with Sales Comp.

A forecasting model based on the Trend, Yearly Seasonality, and Marketing.

Model Evaluation Metrics: Base Model with Sales Comp. vs. Base Model with Marketing

How about all of them together? Sales Comp., Marketing, and
Discount

With Sales Comp., Marketing, Discount

Model Evaluation Metrics: Base Model with Sales Comp. vs. Base Model with Sales Comp., Marketing, and Discount
The forecasting model quality has improved.

Under the Effects tab, you can see how each of
the seasonality components and the predictor variables affects the forecasted values.

Under the Importance tab, you can compare how much each of
the seasonality components and the predictor variables affects the forecasted values.

Advanced Topics

Holiday Effect
• Holidays might affect measures like sales, number of orders, etc.
• Special events like conferences, Black Friday, sports events, etc. might affect the measures.

Data: Unique Page Views of “Apple Worldwide Developers Conference” (WWDC) at Wikipedia

Data: Apple Worldwide Developers Conference (WWDC) dates

Data: Future dates.

Forecast without Considering Holiday Effect
• Set ‘Day’ for Date / Time
• Set 365 as Forecasting Time Period

These WWDC dates are not forecasted well by the model.

Add WWDC Dates by Join

Select ‘Full join’ as the join type because we want
to bring in the dates that don’t exist in the current data frame but do exist in the target data frame.
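
Why a full join is needed here can be seen in a small Python sketch. The page view counts, event dates, and the helper name `full_join_on_date` are hypothetical. The point is that a future conference date has no page view row yet, and only a full join keeps it.

```python
from datetime import date


def full_join_on_date(page_views, events):
    """Keep every date from either side; a missing value becomes None."""
    all_dates = sorted(set(page_views) | set(events))
    return [(d, page_views.get(d), events.get(d)) for d in all_dates]


# Hypothetical page views (current data frame) and event dates (target data frame).
page_views = {date(2018, 6, 4): 5000, date(2018, 6, 5): 3000}
events = {date(2018, 6, 4): "WWDC", date(2019, 6, 3): "WWDC"}

for row in full_join_on_date(page_views, events):
    print(row)
```

A left join would drop the 2019 row because it has no match in the page view data, so the model would never learn that a holiday falls in the forecasted period.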

Select Holiday Column

These WWDC dates are not forecasted well by the model.

Future conference dates are better forecasted by incorporating the
holiday effect.

Holidays Tab: the green line is the Holiday effect.

Capacity

Capacity - Upper / Lower Limit

Capacity
You can set Upper / Lower Limit values.
• Sales / Demand Forecasting: We know it won’t exceed a certain limit.
• Conversion Rate: It never exceeds 100%.
Prophet can take those limits into account.

Lower Limit
You can set the lower limit when the upper limit is set.
• The default value is 0.
• You can set a negative value as well.

Trend Line
The trend line becomes a logistic curve instead of a straight line when Upper/Lower Limits are set.
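
The shape of such a curve can be sketched in Python with a basic logistic function bounded by a lower and an upper limit. This is a simplified illustration, not Prophet's trend implementation; the parameter names `cap`, `floor`, `rate`, and `midpoint` and their values are assumptions made for this sketch.

```python
import math


def logistic_trend(t, cap, floor=0.0, rate=1.0, midpoint=0.0):
    """Logistic curve that stays between floor (lower limit) and cap (upper limit)."""
    return floor + (cap - floor) / (1.0 + math.exp(-rate * (t - midpoint)))


# Hypothetical limits: lower limit -20, upper limit 100.
for t in (-10, -2, 0, 2, 10):
    print(t, round(logistic_trend(t, cap=100.0, floor=-20.0, rate=0.5), 2))
```

Unlike a straight line, the curve flattens out as it approaches either limit, so the forecast cannot run past the capacity no matter how far ahead it goes.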

Information
• Email: [email protected]
• Website: https://exploratory.io
• Twitter: @KanAugust
• Training: https://exploratory.io/training
