Machine Learning for Time Series Forecasting

PyCon Colombia - Medellin February 2020

Miguel Cabrera

February 08, 2020

  Machine Learning for Time Series Forecasting
    Miguel Cabrera, Senior Data Scientist at NewYorker, @mfcabrera
    Photo by Joel Duncan on Unsplash
  GOALS
    ➔ Understand the basics of time series and time series forecasting
    ➔ Learn the different approaches to solving the problem using machine learning techniques
    ➔ Learn some strategies and techniques commonly used to deal with these kinds of problems
  TIME SERIES AND FORECASTING
    ➔ Forecasting is needed in many situations
    ➔ Most of the time you want to use your previous experience to make assumptions about the future process
    ➔ You have at the very least a time dimension and a variable of interest
    ➔ Different horizons (short, medium, long)
  DEMAND FORECASTING
    Demand forecasting refers to predicting future demand (or sales), assuming that the factors which affected demand in the past and are affecting the present will still have an influence in the future. [1]

    [Figure: timeline split into HISTORICAL (2018, 2019) and PREDICTIONS]

    Date         Sales
    Feb 2018     3500
    Mar 2018     3000
    April 2018   2000
    May 2018     500
    Jun 2018     500
    …            …
    T            1000
    T+1          ??
    T+2          ??
    T+3          ??
    …            ??
    T+n          ??
  TIME SERIES - PATTERNS
    ➔ Trend
    ➔ Seasonality
    ➔ Cycle
    Source: Hyndman, R.J., & Athanasopoulos, G. (2019) Forecasting: principles and practice, 3rd edition, OTexts: Melbourne, Australia. OTexts.com/fpp3. Accessed on 2020-01-15
  TIME SERIES - STATIONARITY
    Image source: “An introduction to time series analysis”: https://medium.com/swlh/an-introduction-to-time-series-analysis-ef1a9200717a
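As a small illustration of what stationarity means in practice, the sketch below applies first-order differencing to a series with a linear trend. The series is synthetic and made up for this example (it is not from the talk). Differencing is one common way to remove a trend, and it is the operation counted by the "d" parameter of ARIMA.

```python
# Hypothetical series with a linear trend: its mean changes over time,
# so it is not stationary.
series = [5.0 + 2.0 * t for t in range(10)]

# First-order differencing: y'_t = y_t - y_(t-1)
diffed = [curr - prev for prev, curr in zip(series, series[1:])]

print(series[:5])  # the level keeps growing
print(diffed[:5])  # constant once the trend is removed
```

Real series are noisier than this, so a statistical test (for example the ADF test mentioned in the pmdarima sample) is normally used instead of eyeballing the result.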
  RECAP - TIME SERIES CONCEPTS
    ➔ Trend, Seasonality and Cycle
    ➔ Stationarity
    ➔ Single-step and multi-step forecasts
    ➔ Different horizons
  TIME SERIES FORECASTING - REQUIREMENTS
    1. Non-stationary time series support
    2. Multiple (many) time series
    3. Multi-step and multi-horizon
    4. Cold start problem
    5. Model interpretability
    6. Model capability
  TIME SERIES FORECASTING - SCORE CARD
    Characteristic / Requirement     Score
    Highly non-stationary
    Multiple time series support
    Multi-horizon forecast
    Model interpretability
    Model Capability
    Computational Efficiency
    Handle the cold-start problem
  MODELING APPROACHES
    • (S)ARIMA
    • (G)ARCH
    • Exponential Smoothing
    • VAR
    • FB Prophet
    • Linear Regression
    • SVM
    • Gaussian Process
    • Tree Based Models
    • Random Forests
    • Gradient Boosted Trees
    • MLP
    • RNN
    • LSTM
    • SEQ2SEQ
  ARIMA
    Auto-Regressive Integrated Moving Average
    AR(p): past values
    MA(q): past errors
    ARIMA(p, d, q)
    SARIMA(p, d, q)x(P, D, Q, m)
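For reference, the AR and MA parts can be written as one equation. The notation below is the usual textbook convention and is an assumption, not taken from the slides: y'_t is the series after differencing d times, c is a constant, phi_i are the AR coefficients, theta_j are the MA coefficients, and epsilon_t is the error at time t.

```latex
y'_t = c
     + \sum_{i=1}^{p} \phi_i \, y'_{t-i}
     + \sum_{j=1}^{q} \theta_j \, \varepsilon_{t-j}
     + \varepsilon_t
```

The first sum is the AR(p) part (past values) and the second is the MA(q) part (past errors).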
  ARIMA
    • Study the ACF/PACF charts to determine the parameters, or use an automated algorithm.
    • Seasonal pattern (strong correlation between values one seasonal period apart)
    • Model found by the algorithm: SARIMAX(1, 1, 1)x(0, 1, 1, 12)
  pmdarima SAMPLE
        import pmdarima as pm

        # Assumes ts is a DataFrame with a "sales" column and
        # n is the number of periods to forecast.
        model = pm.auto_arima(
            ts["sales"],
            start_p=1, start_q=1,
            test="adf",        # ADF test, used to choose 'd' when d is not given
            max_p=3, max_q=3,  # maximum p and q
            m=12,              # frequency of the series
            d=1, D=1,          # differencing orders fixed here
            seasonal=True,
            # ... other parameters
        )

        # Forecast n periods ahead with a 95% confidence interval
        y_hat, conf = model.predict(n_periods=n, return_conf_int=True, alpha=0.05)
  TIME SERIES MODELS
    Characteristic / Requirement     Score
    Highly non-stationary            Limited
    Multiple time series             Limited
    Multi-horizon forecast           Yes
    Model interpretability           High
    Model Capability                 Low
    Computational Efficiency         Medium
    Handle cold-starts               No

    [Figure: Sample plots of fashion product sales]
  MACHINE LEARNING
    ‣ Additional features in the model.
    ‣ One single model can handle many or all time series.
    ‣ Feature engineering is very important.
  MACHINE LEARNING - FEATURES
    SOURCE               FEATURES (EXTRACTION)
    Time Series          moving averages, statistics, lagged features
    Product Attributes   category, brand, color, size, style, identifier
    Time                 day of week, month of year, number of week, season
    Location             holiday, weather, macroeconomic information

    ENCODING: Numerical, One Hot Encoding, Feature Hashing, Embeddings
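The time series and calendar features listed above can be sketched with pandas as follows. The data frame, its values, and the column names (`lag_1`, `lag_7`, `ma_3`, `dow`) are hypothetical and chosen only for illustration.

```python
import pandas as pd

# Hypothetical daily sales for a single product (values are made up).
df = pd.DataFrame(
    {"sales": [10, 12, 9, 14, 15, 11, 13, 16]},
    index=pd.date_range("2020-01-01", periods=8, freq="D"),
)

# Lagged features: sales observed 1 and 7 days earlier.
df["lag_1"] = df["sales"].shift(1)
df["lag_7"] = df["sales"].shift(7)

# Moving average of the previous 3 days. The shift(1) keeps the
# current day's target out of its own feature.
df["ma_3"] = df["sales"].shift(1).rolling(window=3).mean()

# Calendar features derived from the timestamp.
df["day_of_week"] = df.index.dayofweek  # Monday=0
df["month"] = df.index.month

# One hot encoding of the day of week.
df = pd.get_dummies(df, columns=["day_of_week"], prefix="dow")

print(df.head())
```

The first rows contain missing values because no history exists yet for the lags; how to handle them (drop, fill, or leave to the model) is a design choice that depends on the model used.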

    LINEAR REGRESSION
    Estimate the dependent variable as a linear combination of the features.
    ‣ Least Squares
    ‣ Ridge / Lasso
    ‣ Elastic Net
    ‣ ARIMA + X

    TREE BASED MODELS
    Use decision trees to learn the characteristics of the data to make predictions.
    ‣ Regression Tree
    ‣ Random Forest
    ‣ Gradient Boosting
    ‣ Catboost
    ‣ LightGBM
    ‣ XGBoost

    SUPPORT VECTOR REGRESSION
    Minimise the error within the support vector threshold, using a non-linear kernel to model non-linear relationships.
    ‣ NuSVR
    ‣ LibLinear
    ‣ LibSVM
    ‣ SKLearn

    [Figure: regression tree with splits X < 20, X < 60 and X < 5, and numeric predictions at the leaves]
  TREE BASED MODELS - RANDOM FOREST
    • Bootstrap (random resampling with replacement)
    • Independent classifiers
    • Random feature selection at each split
    • “Bagging”
    • Parallel training
    • Generates a wide variety of trees

  TREE BASED MODELS - GRADIENT BOOSTING
    • Sequential classifiers
    • Resample with weights
    • Important parameters:
      ◦ Learning rate
      ◦ Number of trees
      ◦ Depth
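  LightGBM SAMPLE
        import lightgbm as lgb
        from sklearn.model_selection import GridSearchCV

        # Base gradient boosted tree regressor
        estimator = lgb.LGBMRegressor(num_leaves=31)

        # Hyperparameter grid to search over
        param_grid = {
            "learning_rate": [0.01, 0.1, 1],
            "n_estimators": [20, 40],
        }

        # Grid search with 3-fold cross-validation. Note that cv=3 uses
        # plain K-fold, which does not respect time order.
        # Assumes X_train and y_train are already defined.
        gbm = GridSearchCV(estimator, param_grid, cv=3)
        gbm.fit(X_train, y_train)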
  MACHINE LEARNING
    Characteristic / Requirement     Score
    Highly non-stationary            Yes
    Multiple time series             Yes
    Multi-horizon forecast           Yes
    Model interpretability           Medium
    Model Capability                 Medium
    Computational Efficiency         Medium
    Handle cold-starts               Partially

    ➔ Requires expert knowledge
    ➔ Time-consuming feature engineering required
    ➔ Some features are difficult to capture
    ➔ Some methods might not be able to extrapolate

  NEURAL NETS
    ‣ MLP: fully connected multilayer artificial neural network.
    ‣ LONG SHORT TERM MEMORY & RECURRENT NN
    ‣ SEQ2SEQ
  NEURAL NETS - MLP
    • Hidden layers introduce non-linearities
    • Flexible general function estimator
    • The larger and deeper the network, the more complex the functions it can represent
    • Can learn complex relationships
    • Needs data for training
    • Careful tuning needed
    • Feature engineering needed
    Image source: Faloutsos et al. 2019 [9]

    The learned German state embedding mapped to a 2D space with t-SNE.
  NEURAL NETS AND DEEP LEARNING
    Characteristic / Requirement     Score
    Highly non-stationary            Yes
    Multiple time series             Yes
    Multi-horizon forecast           Yes
    Model interpretability           Low
    Model Capability                 High
    Computational Efficiency         Low
    Cold start problem               No

    ➔ Very flexible approach
    ➔ Automated feature learning is more limited due to the lack of unlabelled data.
    ➔ Some feature engineering is still necessary.
    ➔ Poor model interpretability
    ➔ No clear consensus on which model (RNN, LSTM, CNN) works best.
  MODELS - SUMMARY
    TRADITIONAL MODELS
    ‣ Good model interpretability
    ‣ Limited model complexity to handle non-linear data
    ‣ Difficult to model multiple time series
    ‣ Difficult to integrate shared features across different time series

    MACHINE LEARNING
    ‣ Flexible
    ‣ Can incorporate many features across the time series
    ‣ A lot of feature engineering required

    NEURAL NETS AND DEEP LEARNING
    ‣ Very flexible
    ‣ Automated feature learning via embeddings
    ‣ Still some degree of feature engineering necessary
    ‣ Poor model interpretability
    ‣ Hard to train
    ‣ It is not clear which models or approaches are the best.
  EVALUATION AND METRICS
    Metric                                             Notes
    MAE (mean absolute error)                          Intuitive
    MAPE (mean absolute percentage error)              Independent of the scale of measurement
    SMAPE (symmetric mean absolute percentage error)   Avoids the asymmetry of MAPE
    MSE (mean squared error)                           Penalizes extreme errors
    MSLE (mean squared logarithmic error)              Large errors are not penalised significantly more than small ones
    Quantile Loss                                      Measures the distribution
    RMSPE (root mean squared percentage error)         Independent of the scale of measurement
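The formulas did not survive in this transcript, so here is a minimal sketch of several of these metrics in plain Python. These are common textbook definitions and not necessarily the exact formulas shown on the slide; SMAPE in particular has several variants. The function names and sample values are made up for illustration.

```python
import math

def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(y - f) for y, f in zip(y_true, y_pred)) / len(y_true)

def mape(y_true, y_pred):
    """Mean absolute percentage error (undefined when an actual is 0)."""
    return 100 * sum(abs((y - f) / y) for y, f in zip(y_true, y_pred)) / len(y_true)

def smape(y_true, y_pred):
    """Symmetric MAPE, one common variant (undefined when y and f are both 0)."""
    terms = (2 * abs(y - f) / (abs(y) + abs(f)) for y, f in zip(y_true, y_pred))
    return 100 * sum(terms) / len(y_true)

def mse(y_true, y_pred):
    """Mean squared error."""
    return sum((y - f) ** 2 for y, f in zip(y_true, y_pred)) / len(y_true)

def msle(y_true, y_pred):
    """Mean squared logarithmic error, using log(1 + x)."""
    terms = ((math.log1p(y) - math.log1p(f)) ** 2 for y, f in zip(y_true, y_pred))
    return sum(terms) / len(y_true)

def quantile_loss(y_true, y_pred, q):
    """Pinball loss for quantile q in (0, 1)."""
    terms = (max(q * (y - f), (q - 1) * (y - f)) for y, f in zip(y_true, y_pred))
    return sum(terms) / len(y_true)

# Hypothetical actuals and forecasts.
actual = [100, 200, 400]
forecast = [110, 180, 400]

for name, value in [
    ("MAE", mae(actual, forecast)),
    ("MAPE", mape(actual, forecast)),
    ("SMAPE", smape(actual, forecast)),
    ("MSE", mse(actual, forecast)),
    ("MSLE", msle(actual, forecast)),
    ("Quantile loss (q=0.9)", quantile_loss(actual, forecast, 0.9)),
]:
    print(f"{name}: {value:.4f}")
```

With q above 0.5 the quantile loss penalises under-forecasting more than over-forecasting, which is why it is useful when the two kinds of error have different costs.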
  TARGET VARIABLE TRANSFORMATION
        import numpy as np
        from sklearn.compose import TransformedTargetRegressor

        # Train on log(y) and map predictions back with exp(y_hat)
        tt = TransformedTargetRegressor(
            regressor=YourAwesomeRegressor(),  # placeholder for any scikit-learn regressor
            func=np.log,
            inverse_func=np.exp,
        )
        ...
        tt.fit(X, y)
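Sales data often contains zeros, for which a plain logarithm is undefined. One common workaround, shown here as an assumption and not something the slide prescribes, is to use log(1 + y) and its inverse exp(y) - 1. The sample values are made up.

```python
import math

# Hypothetical sales values, including a zero-sales period.
sales = [0, 9, 99, 999]

# log(1 + y) is defined at zero, unlike log(y).
transformed = [math.log1p(v) for v in sales]

# exp(y) - 1 maps predictions back to the original scale.
restored = [math.expm1(v) for v in transformed]

print(transformed)
print(restored)
```

In NumPy the equivalent functions are `np.log1p` and `np.expm1`, which can be passed as `func` and `inverse_func`.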
  SUMMARY
    ➔ Time series forecasting has a lot of practical applications
    ➔ Traditional methods might still be relevant in many use cases
    ➔ Machine learning, in particular gradient boosting, seems to offer a good compromise between model capacity and interpretability.
    ➔ Feature engineering is key, and some is still necessary when using deep learning.
    ➔ Deep learning has not yet “cracked” time series forecasting, but recent models show promise.
    ➔ Avoid feature leakage by using a robust time series cross-validation approach.
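One way to do time series cross-validation is an expanding window, where every test fold lies strictly after its training data. The sketch below is a hand-rolled illustration with a hypothetical function name; scikit-learn's `TimeSeriesSplit` offers a similar scheme.

```python
def expanding_window_splits(n_samples, n_splits, test_size):
    """Yield (train_indices, test_indices) pairs in time order.

    Each training window starts at index 0 and ends right before
    its test window, so no future observation leaks into training.
    """
    first_test_start = n_samples - n_splits * test_size
    if first_test_start <= 0:
        raise ValueError("not enough samples for the requested splits")
    for k in range(n_splits):
        test_start = first_test_start + k * test_size
        train_idx = list(range(0, test_start))
        test_idx = list(range(test_start, test_start + test_size))
        yield train_idx, test_idx

# Example: 8 time-ordered observations, 3 folds of 2 test points each.
splits = list(expanding_window_splits(n_samples=8, n_splits=3, test_size=2))
for train_idx, test_idx in splits:
    print("train:", train_idx, "test:", test_idx)
```

Features such as lags and moving averages must also be computed only from data before the forecast date, otherwise the split alone does not prevent leakage.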
  AWS DeepAR - LSTM + AUTOREGRESSIVE
    Source: Salinas, D., Flunkert, V., & Gasthaus, J. (2017). [11]
  THE M4 COMPETITION
    ➔ January-May 2018
    ➔ 100,000 series of the following frequencies: monthly, quarterly, yearly, daily, weekly, and hourly.
    ➔ 95% of the series fall within the first 3 categories.
    ➔ The forecasting horizons varied, e.g., six for yearly, 18 for monthly, and 48 for hourly series.
    ➔ Point forecasts and prediction intervals were evaluated.
    ➔ No extra features / exogenous variables
  THE M4 COMPETITION
    ➔ Winning solution: an RNN with integrated exponential smoothing formula.
    ➔ Second: ensembles of classical solutions using sophisticated time series feature extraction.
    ➔ The dataset might not be the most representative, but it offers a baseline.
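For readers unfamiliar with exponential smoothing, the sketch below shows simple exponential smoothing, the most basic member of that family. It is not the winning hybrid model, which combined smoothing formulas with an RNN; the function name, series, and alpha value are made up for illustration.

```python
def simple_exponential_smoothing(series, alpha):
    """Return the smoothed level after each observation.

    level_t = alpha * y_t + (1 - alpha) * level_(t-1)
    The level is initialised with the first observation.
    """
    level = series[0]
    levels = [level]
    for y in series[1:]:
        level = alpha * y + (1 - alpha) * level
        levels.append(level)
    return levels

# Hypothetical series; alpha controls how quickly old data is forgotten.
levels = simple_exponential_smoothing([10, 20, 30], alpha=0.5)

# The last level serves as a flat forecast for all future steps.
forecast = levels[-1]
print(levels, forecast)
```

A larger alpha makes the level follow recent observations more closely, while a smaller one averages over a longer history.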
  ATTENTION: LAG IS ALL YOU NEED Wikipedia Winning entry:

  APPLICATIONS OF TIME SERIES FORECASTING
    ➔ Manufacturing
    ➔ Government services (budgeting)
    ➔ Supply chain and retail/commerce
    ➔ Workforce scheduling
    ➔ Cloud computing
    ➔ Website traffic prediction
    ➔ Healthcare