Time Series ARIMA model Sales Prediction

Error minimization study using ML models, especially ARIMA
Econometric Model
Machine Learning
Rossmann Stores daily sales data (2013 - 2015)
Kaggle

Randal S. Goomer, PhD

February 06, 2017

Transcript

  1. Time-Series Modeling: Sales Prediction, MAE measures

    Time-Series Econometric Models. San Francisco, 02/06/2017. Randal S. Goomer, PhD, 1/25/17
  2. DataSet: Rossmann Stores Sales. Source: Kaggle; 9 features, 1,017,209 records

    Columns (pandas Index, dtype='object'): Store, DayOfWeek, Date, Sales, Customers, Open, Promo, StateHoliday, SchoolHoliday.

    Rossmann operates over 3,000 drug stores in 7 European countries. Currently, Rossmann store managers are tasked with predicting their daily sales for up to six weeks in advance. Store sales are influenced by many factors, including promotions, competition, school and state holidays, seasonality, and locality. With thousands of individual managers predicting sales based on their unique circumstances, the accuracy of results can be quite varied. In their first Kaggle competition, Rossmann is challenging us to predict 6 weeks of daily sales for 1,115 stores located across Germany. Reliable sales forecasts enable store managers to create effective staff schedules that increase productivity and motivation. By helping Rossmann create a robust prediction model, you will help store managers stay focused on what’s most important to them: their customers and their teams!
  3. Hypothesis Testing: Time-Series Predictions

    Can we back-cast and predict sales from time-series and sales information? Can we optimize and train an ARIMA model to minimize the Mean Absolute Error of our time-series predictions?
  4. ML models

    • EDA with boxplots and time-series plots
    • Libraries: pandas, Python datetime, numpy, seaborn
    • Visualizations (Seaborn, Matplotlib)
    • Rolling mean (rolling_mean); autocorrelation
    • ARIMA (AutoRegressive Integrated Moving Average): an econometric time-series model with parameters (p, d, q)
    • p = order (number of time lags) of the AR model
    • d = degree of differencing (I)
    • q = order of the MA model
    • Prediction measures tested for various (p, d, q) values:
    • Mean Absolute Error
    • Log likelihood
    • Coefficients
    • Standard error
    • P-value
    • Forecast
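
Two of the measures listed on this slide, the rolling mean and the Mean Absolute Error, can be sketched in plain Python. This is a minimal illustration using only the standard library rather than the pandas calls the deck mentions; the function names and the sample numbers are hypothetical and are not taken from the Rossmann data.

```python
from statistics import mean


def mean_absolute_error(actual, predicted):
    """Average of |actual - predicted| over paired observations."""
    if len(actual) != len(predicted):
        raise ValueError("series must have the same length")
    return mean([abs(a - p) for a, p in zip(actual, predicted)])


def rolling_mean(values, window):
    """Trailing moving average; the first window-1 positions have no value."""
    out = [None] * (window - 1)
    for i in range(window - 1, len(values)):
        out.append(mean(values[i - window + 1 : i + 1]))
    return out


# Made-up daily sales, for illustration only (not Rossmann data).
sales = [5200, 5000, 4800, 5500, 6100, 4700, 5300]
smoothed = rolling_mean(sales, window=3)

# Score the smoothed series as a naive "prediction" of the same days.
mae = mean_absolute_error(sales[2:], smoothed[2:])
```

A lower MAE means the predictions sit closer to the observed sales on average, which is why it works as the quantity to minimize when comparing (p, d, q) settings.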
  5. EDA: Typical sales in one store (store1)

    School holiday = 1
  6. EDA: Effect of day of the week on Sales on days when stores are open
  7. plot_acf(store1_open_data['Sales'], lags=30)

    Autocorrelation of Sales for lags up to 30 days (lags=30), showing the effect of stores closing and reopening once a week.
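
What an autocorrelation plot computes at each lag can be sketched without any plotting library. The function below is a hypothetical helper using the usual sample-autocorrelation estimator, and the series is synthetic with a built-in 7-day cycle; it is not the store1 data and not the plot_acf implementation.

```python
def autocorrelation(values, lag):
    """Sample autocorrelation of a series at a single lag."""
    n = len(values)
    mu = sum(values) / n
    denom = sum((v - mu) ** 2 for v in values)
    num = sum((values[t] - mu) * (values[t - lag] - mu) for t in range(lag, n))
    return num / denom


# Synthetic series with a weekly pattern (hypothetical numbers, 8 weeks).
weekly = [5000, 4800, 4700, 4900, 5600, 6200, 3000] * 8

# Autocorrelation for lags 0..30, mirroring lags=30 on the slide.
acf = [autocorrelation(weekly, k) for k in range(31)]
```

With a weekly cycle in the data, the values at lags 7, 14, 21 and 28 stand out above their neighbours, which is the kind of pattern a once-a-week closing would produce.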
  8. ARIMA model p=1, d=0, q=0

    Residual autocorrelation, lag=30, ARIMA(1,0,0)
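
An ARIMA(1,0,0) model is an AR(1) model: each value is a constant plus a multiple of the previous value plus noise. The sketch below fits that form by ordinary least squares on a noise-free toy series. This is a simplification for illustration: the function names are hypothetical, the data is made up, and a library ARIMA fit would typically use maximum likelihood instead.

```python
def fit_ar1(values):
    """Fit x_t = intercept + phi * x_{t-1} + e_t by ordinary least squares."""
    x_prev, x_curr = values[:-1], values[1:]
    n = len(x_curr)
    mean_prev = sum(x_prev) / n
    mean_curr = sum(x_curr) / n
    cov = sum((p - mean_prev) * (q - mean_curr) for p, q in zip(x_prev, x_curr))
    var = sum((p - mean_prev) ** 2 for p in x_prev)
    phi = cov / var
    intercept = mean_curr - phi * mean_prev
    return intercept, phi


def ar1_residuals(values, intercept, phi):
    """One-step-ahead errors of the fitted AR(1) model."""
    return [values[t] - (intercept + phi * values[t - 1])
            for t in range(1, len(values))]


# Toy series generated exactly by x_t = 10 + 0.5 * x_{t-1} (no noise).
series = [100.0]
for _ in range(9):
    series.append(10.0 + 0.5 * series[-1])

intercept, phi = fit_ar1(series)
residuals = ar1_residuals(series, intercept, phi)
next_step = intercept + phi * series[-1]  # one-step-ahead forecast
```

On real data the residuals would not be zero; checking their autocorrelation, as the slide does, shows whether structure remains that a higher p or q might capture.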
  9. ARIMA model p=2, d=0, q=0

    Residual autocorrelation, lag=30, ARIMA(2,0,0)
  10. ARIMA model p=2, d=0, q=2

    Residual autocorrelation, lag=30, ARIMA(2,0,2)
  11. Forecast next step, ARIMA(0,0,3)

    model.forecast(steps=1, alpha=0.05)
    Out[270]: (array([ 5622.30133802]),  ← forecast
     array([ 714.99799398]),  ← std error
     array([[ 4220.93102079, 7023.67165524]]))  ← C.I.
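
The three numbers in this output are linked by simple arithmetic. Assuming the interval is the usual normal approximation (forecast plus or minus z times the standard error, with z taken at 1 - alpha/2), the reported bounds can be reproduced from the forecast and standard error alone. The variable names below are illustrative.

```python
from statistics import NormalDist

# Values copied from the slide's output.
forecast = 5622.30133802
std_error = 714.99799398
alpha = 0.05  # 95% confidence level

# Two-sided critical value of the standard normal distribution.
z = NormalDist().inv_cdf(1 - alpha / 2)

lower = forecast - z * std_error
upper = forecast + z * std_error
```

The computed bounds agree with the 4220.93 and 7023.67 shown on the slide, so the interval width is driven entirely by the standard error of roughly 715.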