
Machine Learning for Time Series Forecasting

PyCon Colombia - Medellin, February 2020

Miguel Cabrera

February 08, 2020

Transcript

  1. Machine Learning for Time Series Forecasting

    Miguel Cabrera, Senior Data Scientist at NewYorker (@mfcabrera). Photo by Joel Duncan on Unsplash.
  2. GOALS

    ➔ Understand the basics of time series and time series forecasting
    ➔ Learn the different approaches to solving the problem with machine learning techniques
    ➔ Learn some strategies and techniques commonly used to deal with this kind of problem
  3. TIME SERIES AND FORECASTING

    ➔ Forecasting is needed in many situations
    ➔ Most of the time you want to use your previous experience to make assumptions about the future
    ➔ You have at the very least a time dimension and a variable of interest
    ➔ Different horizons (short, medium, long)
  4. DEMAND FORECASTING

    Demand forecasting refers to predicting future demand (or sales), assuming that the factors which affected demand in the past and are affecting it in the present will still have an influence in the future. [1]

    Chart labels: HISTORICAL, PREDICTIONS; 2018, 2019

    Date          Sales
    Feb 2018      3500
    Mar 2018      3000
    April 2018    2000
    May 2018      500
    Jun 2018      500
    …             …
    T             1000
    T+1           ??
    T+2           ??
    T+3           ??
    …             ??
    T+n           ??
  5. TIME SERIES - PATTERNS

    ➔ Trend
    ➔ Seasonality
    ➔ Cycle

    Source: Hyndman, R.J., & Athanasopoulos, G. (2019) Forecasting: principles and practice, 3rd edition, OTexts: Melbourne, Australia. OTexts.com/fpp3. Accessed on 2020-01-15
  6. TIME SERIES - STATIONARITY

    Image source: "An introduction to time series analysis": https://medium.com/swlh/an-introduction-to-time-series-analysis-ef1a9200717a
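A common way to move a series toward stationarity is differencing. The sketch below is only an illustration: the `difference` helper and the synthetic series (a linear trend plus a pattern of period 4) are assumptions made for this example and do not come from the talk.

```python
def difference(series, lag=1):
    """Return series[t] - series[t - lag] for every t >= lag."""
    return [series[t] - series[t - lag] for t in range(lag, len(series))]


# Hypothetical series: linear trend (slope 2) plus a repeating pattern of period 4.
pattern = [5, -2, 0, 3]
series = [10 + 2 * t + pattern[t % 4] for t in range(16)]

# First difference: the trend becomes a constant offset, the period-4 pattern remains.
first_diff = difference(series, lag=1)

# Seasonal difference: the period-4 pattern cancels, leaving slope * lag = 8 everywhere.
seasonal_diff = difference(series, lag=4)
```

In ARIMA terms, the number of first differences is the `d` parameter and the number of seasonal differences is `D`.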
  7. RECAP - TIME SERIES CONCEPTS

    ➔ Trend, seasonality and cycle
    ➔ Stationarity
    ➔ Single-step and multi-step forecasts
    ➔ Different horizons
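One way to turn a single-step model into a multi-step forecast is to feed each prediction back in as if it were an observation (the recursive strategy). The sketch below assumes a deliberately naive one-step model, `last_three_mean`, invented for this example; the sales figures are the monthly values shown on the demand forecasting slide.

```python
def recursive_forecast(history, one_step_model, horizon):
    """Forecast `horizon` steps ahead by re-feeding each prediction to the model."""
    window = list(history)
    forecasts = []
    for _ in range(horizon):
        y_hat = one_step_model(window)
        forecasts.append(y_hat)
        window.append(y_hat)  # the prediction becomes part of the input
    return forecasts


def last_three_mean(window):
    """Hypothetical single-step model: mean of the last three values."""
    return sum(window[-3:]) / 3


sales = [3500, 3000, 2000, 500, 500]  # Feb 2018 to Jun 2018
forecasts = recursive_forecast(sales, last_three_mean, horizon=3)
```

Because later steps depend on earlier predictions, errors can accumulate over the horizon. Training one model per horizon (the direct strategy) avoids that at the cost of more models.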
  8. TIME SERIES FORECASTING - REQUIREMENTS

    1. Non-stationary time series support
    2. Multiple (many) time series
    3. Multi-step and multi-horizon
    4. Cold-start problem
    5. Model interpretability
    6. Model capability
  9. TIME SERIES FORECASTING - SCORE CARD

    Characteristic / Requirement       Score
    Highly non-stationary
    Multiple time series support
    Multi-horizon forecast
    Model interpretability
    Model capability
    Computational efficiency
    Handle the cold-start problem
  10. MODELING APPROACHES

    • (S)ARIMA • (G)ARCH • Exponential Smoothing • VAR • FB Prophet
    • Linear Regression • SVM • Gaussian Process
    • Tree Based Models • Random Forests • Gradient Boosted Trees
    • MLP • RNN • LSTM • SEQ2SEQ
  12. ARIMA

    Auto-Regressive Integrated Moving Average
    AR(p): past values
    MA(q): past errors
    ARIMA(p, d, q)
    SARIMA(p, d, q)x(P, D, Q, m)
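For reference, the two building blocks can be written out in the standard textbook form. The symbols below are assumptions for this illustration and do not appear on the slide: $c$ and $\mu$ are constants, $\phi_i$ and $\theta_j$ are coefficients, $\varepsilon_t$ is white noise, and $B$ is the backshift operator with $B y_t = y_{t-1}$.

```latex
\begin{align}
\text{AR}(p):\quad & y_t = c + \sum_{i=1}^{p} \phi_i \, y_{t-i} + \varepsilon_t \\
\text{MA}(q):\quad & y_t = \mu + \varepsilon_t + \sum_{j=1}^{q} \theta_j \, \varepsilon_{t-j} \\
\text{ARIMA}(p, d, q):\quad & y'_t = c + \sum_{i=1}^{p} \phi_i \, y'_{t-i}
    + \sum_{j=1}^{q} \theta_j \, \varepsilon_{t-j} + \varepsilon_t,
    \qquad y'_t = (1 - B)^d \, y_t
\end{align}
```

The "I" in ARIMA is the differencing step $(1 - B)^d$. The seasonal variant applies the same idea at lag $m$ with its own orders $(P, D, Q)$.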
  13. ARIMA

    • Study the ACF/PACF charts and determine the parameters, or use an automated algorithm.
    • Seasonal pattern (Strong correlation between and )
    • Algorithm found: SARIMAX(1, 1, 1)x(0, 1, 1, 12)
  14. pmdarima SAMPLE

        import pmdarima as pm

        # ts: DataFrame holding the "sales" series
        model = pm.auto_arima(
            ts["sales"],
            start_p=1, start_q=1,
            test="adf",        # unit-root test; only consulted when d is None
            max_p=3, max_q=3,  # maximum p and q
            m=12,              # frequency of series (seasonal period)
            d=1, D=1,          # differencing orders fixed by hand
            seasonal=True,
            # ... other parameters
        )

        # n: number of future periods to forecast
        y_hat, conf = model.predict(
            n, return_conf_int=True, alpha=0.05
        )
  15. TIME SERIES MODELS

    Characteristic / Requirement       Score
    Highly non-stationary              Limited
    Multiple time series               Limited
    Multi-horizon forecast             Yes
    Model interpretability             High
    Model capability                   Low
    Computational efficiency           Medium
    Handle cold-starts                 No

    Figure: sample plots of fashion product sales
  16. MACHINE LEARNING

    ‣ Additional features in the model.
    ‣ One single model can handle many or all time series.
    ‣ Feature engineering is very important.
  17. MACHINE LEARNING - FEATURES

    SOURCE                 EXTRACTION
    Time Series            moving averages, statistics, lagged features
    Product Attributes     category, brand, color, size, style, identifier
    Time                   day of week, month of year, number of week, season
    Location               holiday, weather, macroeconomic information

    ENCODING: Numerical, One Hot Encoding, Feature Hashing, Embeddings
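Lagged values and moving averages turn a raw series into a supervised-learning table. The sketch below is a minimal illustration; the `lag_features` helper, its parameter names and the column names are assumptions made for this example. Only past values are used for each row, so the target never leaks into its own features.

```python
def lag_features(series, lags=(1, 2), window=3):
    """Build one feature row per time step from strictly earlier observations."""
    rows = []
    start = max(max(lags), window)  # first index with enough history
    for t in range(start, len(series)):
        row = {f"lag_{k}": series[t - k] for k in lags}
        past = series[t - window:t]  # excludes series[t] itself
        row[f"rolling_mean_{window}"] = sum(past) / window
        row["target"] = series[t]
        rows.append(row)
    return rows


sales = [3500, 3000, 2000, 500, 500, 1000]
rows = lag_features(sales)
```

Each row can then be joined with product, calendar and location features before being handed to a regressor.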
  18. MACHINE LEARNING - MODELS

    LINEAR REGRESSION
    Estimate the dependent variable as a linear expression of the features.
    ‣ Least Squares ‣ Ridge / Lasso ‣ Elastic Net ‣ ARIMA + X

    TREE BASED
    Use decision trees to learn the characteristics of the data to make predictions.
    ‣ Regression Tree ‣ Random Forest ‣ Gradient Boosting ‣ Catboost ‣ LightGBM ‣ XGBoost

    SUPPORT VECTOR REGRESSION
    Minimise the error within the support vector threshold, using a non-linear kernel to model non-linear relationships.
    ‣ NuSVR ‣ LibLinear ‣ LibSVM ‣ SKLearn
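  20. TREE BASED MODELS - REGRESSION TREES

    Diagram of a regression tree: internal nodes split on thresholds (X < 40, X < 20, X < 60, X < 5) and leaves hold numeric predictions (3.2, 5, 13.6, …, 8.5, 17.6).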
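Each threshold in a regression tree is typically chosen to minimise the squared error of the two groups it creates, and each leaf predicts the mean of its group. The sketch below finds a single such split on one feature. The helper names and toy data are assumptions made for this example and are not taken from the slide's diagram.

```python
def mean(values):
    return sum(values) / len(values)


def sse(values):
    """Sum of squared errors around the group mean."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values)


def best_split(xs, ys):
    """Return (threshold, left_mean, right_mean) minimising total squared error."""
    best = None
    # Skipping the smallest x guarantees that neither side is empty.
    for threshold in sorted(set(xs))[1:]:
        left = [y for x, y in zip(xs, ys) if x < threshold]
        right = [y for x, y in zip(xs, ys) if x >= threshold]
        cost = sse(left) + sse(right)
        if best is None or cost < best[0]:
            best = (cost, threshold, mean(left), mean(right))
    return best[1:]


# Hypothetical data with an obvious jump between x = 30 and x = 50.
xs = [10, 20, 30, 50, 60, 70]
ys = [5.0, 5.0, 5.0, 9.0, 9.0, 9.0]
threshold, left_value, right_value = best_split(xs, ys)
```

A full tree applies the same search recursively to each side until a stopping rule such as maximum depth is reached.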
  21. TREE BASED MODELS - RANDOM FOREST

    • Bootstrap (random resample with replacement)
    • Independent classifiers
    • Random feature selection at split
    • "Bagging"
    • Parallel training
    • Generates a wide variety of trees
  22. TREE BASED MODELS - GRADIENT BOOSTED TREES

    • Boosting
    • Sequential classifiers
    • Resample with weights
    • Important parameters:
      ◦ Learning rate
      ◦ Number of trees
      ◦ Depth
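The roles of the learning rate and the number of trees are easiest to see in a toy implementation. The sketch below is the residual-fitting form of gradient boosting for squared error, using depth-1 trees (stumps) on a single feature. It is an illustration under those assumptions, not how LightGBM or XGBoost are implemented, and every name in it is invented for this example.

```python
def fit_stump(xs, residuals):
    """Find the single split on x that minimises squared error on the residuals."""
    best = None
    for threshold in sorted(set(xs))[1:]:
        left = [r for x, r in zip(xs, residuals) if x < threshold]
        right = [r for x, r in zip(xs, residuals) if x >= threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        cost = (sum((r - left_mean) ** 2 for r in left)
                + sum((r - right_mean) ** 2 for r in right))
        if best is None or cost < best[0]:
            best = (cost, threshold, left_mean, right_mean)
    return best[1:]


def fit_boosting(xs, ys, n_trees=50, learning_rate=0.1):
    """Start from the mean, then add stumps fitted to the current residuals."""
    base = sum(ys) / len(ys)
    predictions = [base] * len(ys)
    stumps = []
    for _ in range(n_trees):
        residuals = [y - p for y, p in zip(ys, predictions)]
        threshold, left, right = fit_stump(xs, residuals)
        stumps.append((threshold, left, right))
        # Each tree contributes only a fraction (the learning rate) of its fit.
        predictions = [
            p + learning_rate * (left if x < threshold else right)
            for x, p in zip(xs, predictions)
        ]
    return {"base": base, "learning_rate": learning_rate, "stumps": stumps}


def predict(model, x):
    total = sum(left if x < t else right for t, left, right in model["stumps"])
    return model["base"] + model["learning_rate"] * total


def mse(ys, preds):
    return sum((y - p) ** 2 for y, p in zip(ys, preds)) / len(ys)


xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [2.0, 2.0, 2.0, 6.0, 6.0, 6.0, 10.0, 10.0]
model = fit_boosting(xs, ys)
baseline_mse = mse(ys, [model["base"]] * len(ys))
boosted_mse = mse(ys, [predict(model, x) for x in xs])
```

Because every stump is piecewise constant, `predict(model, 100)` returns the same value as `predict(model, 8)`: the ensemble cannot extrapolate beyond the range seen in training.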
  23. LightGBM SAMPLE

        import lightgbm as lgb
        from sklearn.model_selection import GridSearchCV

        estimator = lgb.LGBMRegressor(num_leaves=31)

        param_grid = {
            "learning_rate": [0.01, 0.1, 1],
            "n_estimators": [20, 40],
        }

        # X_train, y_train: feature matrix and target prepared beforehand.
        # cv=3 is a plain 3-fold split that ignores time order.
        gbm = GridSearchCV(estimator, param_grid, cv=3)
        gbm.fit(X_train, y_train)
  24. MACHINE LEARNING

    Characteristic / Requirement       Score
    Highly non-stationary              Yes
    Multiple time series               Yes
    Multi-horizon forecast             Yes
    Model interpretability             Medium
    Model capability                   Medium
    Computational efficiency           Medium
    Handle cold-starts                 Partially

    ➔ Requires expert knowledge
    ➔ Time-consuming feature engineering required
    ➔ Some features are difficult to capture
    ➔ Some methods might not be able to extrapolate
  25. NEURAL NETWORKS AND DEEP LEARNING - MODELS

    MULTILAYER PERCEPTRON
    Fully connected multilayer artificial neural network.

    LONG SHORT TERM MEMORY & RECURRENT NN
    A type of recurrent neural network used for sequential learning. Cell states are updated by gates. Used for speech recognition, language models, translation, etc.

    SEQ2SEQ
    Encoder-decoder architecture. It uses two RNNs that work together to predict the next state sequence from the previous sequence.

    Image credits: https://github.com/ledell/sldm4-h2o/ https://smerity.com/articles/2016/google_nmt_arch.html
  26. NEURAL NETS - MLP

    • Hidden layers apply non-linearities
    • Flexible general function estimator
    • The larger and deeper the network, the more complex the functions it can represent
    • Can learn complex relationships
    • Needs data for training
    • Careful tuning needed
    • Feature engineering needed

    Image source: Faloutsos et al. 2019 [9]
  27. DEEP LEARNING - MLP WITH EMBEDDINGS

    Figure: the learned German state embedding mapped to a 2D space with t-SNE. Source: Cheng Guo and Felix Berkhahn. 2016. [7]
  28. NEURAL NETS AND DEEP LEARNING

    Characteristic / Requirement       Score
    Highly non-stationary              Yes
    Multiple time series               Yes
    Multi-horizon forecast             Yes
    Model interpretability             Low
    Model capability                   High
    Computational efficiency           Low
    Cold-start problem                 No

    ➔ Very flexible approach
    ➔ Automated feature learning is more limited due to the lack of unlabelled data.
    ➔ Some feature engineering is still necessary.
    ➔ Poor model interpretability
    ➔ No clear consensus on which model (RNN, LSTM, CNN) works best.
  29. MODELS - SUMMARY

    TRADITIONAL MODELS
    ‣ Good model interpretability
    ‣ Limited model complexity to handle non-linear data
    ‣ Difficult to model multiple time series
    ‣ Difficult to integrate shared features across different time series

    MACHINE LEARNING
    ‣ Flexible
    ‣ Can incorporate many features across the time series
    ‣ A lot of feature engineering required

    NEURAL NETS AND DEEP LEARNING
    ‣ Very flexible
    ‣ Automated feature learning via embeddings
    ‣ Still some degree of feature engineering necessary
    ‣ Poor model interpretability
    ‣ Hard to train
    ‣ It is not clear which models or approaches are the best.
  30. EVALUATION AND METRICS

    Metric                                              Notes
    MAE (mean absolute error)                           Intuitive
    MAPE (mean absolute percentage error)               Independent of the scale of measurement
    SMAPE (symmetric mean absolute percentage error)    Avoids the asymmetry of MAPE
    MSE (mean squared error)                            Penalizes extreme errors
    MSLE (mean squared logarithmic error)               Large errors are not penalised significantly more than small ones
    Quantile loss                                       Measures the distribution
    RMSPE (root mean squared percentage error)          Independent of the scale of measurement
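The metrics above can be sketched in a few lines of plain Python. The definitions used here are the commonly cited ones and are an assumption of this example, since the transcript does not carry the slide's formulas; SMAPE in particular has several variants in the literature. The percentage metrics are undefined when an actual value is zero.

```python
import math


def mae(actual, forecast):
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)


def mape(actual, forecast):
    """Mean absolute percentage error, in percent. Requires non-zero actuals."""
    return 100 * sum(abs(a - f) / abs(a) for a, f in zip(actual, forecast)) / len(actual)


def smape(actual, forecast):
    """Symmetric MAPE, in percent (one common variant, bounded by 200)."""
    terms = (2 * abs(a - f) / (abs(a) + abs(f)) for a, f in zip(actual, forecast))
    return 100 * sum(terms) / len(actual)


def mse(actual, forecast):
    return sum((a - f) ** 2 for a, f in zip(actual, forecast)) / len(actual)


def msle(actual, forecast):
    """Mean squared logarithmic error. Requires values greater than -1."""
    terms = ((math.log1p(a) - math.log1p(f)) ** 2 for a, f in zip(actual, forecast))
    return sum(terms) / len(actual)


def rmspe(actual, forecast):
    """Root mean squared percentage error, in percent. Requires non-zero actuals."""
    terms = (((a - f) / a) ** 2 for a, f in zip(actual, forecast))
    return 100 * math.sqrt(sum(terms) / len(actual))


def quantile_loss(actual, forecast, q):
    """Pinball loss: under-forecasts are weighted by q, over-forecasts by 1 - q."""
    terms = (max(q * (a - f), (q - 1) * (a - f)) for a, f in zip(actual, forecast))
    return sum(terms) / len(actual)


# Hypothetical actuals and forecasts.
actual = [100, 200, 400]
forecast = [110, 180, 400]
```

At `q=0.5` the quantile loss is half the MAE, so it reduces to a median-targeting error. Other quantiles reward forecasts that deliberately over- or under-shoot.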
  31. TARGET VARIABLE TRANSFORMATION

        import numpy as np
        from sklearn.compose import TransformedTargetRegressor

        tt = TransformedTargetRegressor(
            regressor=YourAwesomeRegressor(),
            func=np.log,
            inverse_func=np.exp,
        )
        ...
        tt.fit(X, y)
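What the wrapper does can be shown without any library: transform the target before fitting, then invert the transform on the predictions. The sketch below is a hand-rolled illustration, not scikit-learn's implementation. `LogTargetWrapper` and `MeanRegressor` are hypothetical names, and it swaps in `log1p`/`expm1` for the slide's `log`/`exp` pair as an assumption so that zero sales do not break the transform.

```python
import math


class MeanRegressor:
    """Hypothetical stand-in model: always predicts the mean of its training target."""

    def fit(self, X, y):
        self.mean_ = sum(y) / len(y)
        return self

    def predict(self, X):
        return [self.mean_ for _ in X]


class LogTargetWrapper:
    """Fits the wrapped regressor on log1p(y) and maps predictions back with expm1."""

    def __init__(self, regressor):
        self.regressor = regressor

    def fit(self, X, y):
        self.regressor.fit(X, [math.log1p(v) for v in y])
        return self

    def predict(self, X):
        return [math.expm1(v) for v in self.regressor.predict(X)]


X = [[0], [1], [2]]
y = [0, 9, 99]  # includes a zero, which a plain log could not handle

plain = MeanRegressor().fit(X, y).predict(X)
wrapped = LogTargetWrapper(MeanRegressor()).fit(X, y).predict(X)
```

Here the plain model predicts the arithmetic mean (36.0), while the wrapped model predicts about 9.0, because averaging happens on the log scale and the single large value carries less weight.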
  32. SUMMARY

    ➔ Time series forecasting has a lot of practical applications.
    ➔ Traditional methods might still be relevant in many use cases.
    ➔ Machine learning, in particular gradient boosting, seems to offer a good compromise between model capacity and interpretability.
    ➔ Feature engineering is key, and some is still necessary when using deep learning.
    ➔ Deep learning has not yet "cracked" time series forecasting, but recent models show promise.
    ➔ Avoid feature leaking by using a robust time series cross-validation approach.
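One way to keep future information out of training is an expanding-window split, where every test fold lies strictly after its training fold. The sketch below is a minimal pure-Python version. The function name and parameters are assumptions made for this example; scikit-learn's `TimeSeriesSplit` offers a comparable, more complete tool.

```python
def expanding_window_splits(n_samples, n_splits, test_size):
    """Return (train_indices, test_indices) pairs in chronological order."""
    splits = []
    for i in range(n_splits):
        # The last fold ends at the final sample; earlier folds end before it.
        test_end = n_samples - (n_splits - 1 - i) * test_size
        test_start = test_end - test_size
        if test_start <= 0:
            raise ValueError("not enough samples for the requested splits")
        train = list(range(test_start))            # everything before the test fold
        test = list(range(test_start, test_end))
        splits.append((train, test))
    return splits


splits = expanding_window_splits(n_samples=10, n_splits=3, test_size=2)
```

The training window grows with each fold and the samples must already be sorted by time. Features such as rolling means still have to be computed from past values only, otherwise the split alone does not prevent leakage.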
  33. REFERENCES

    • [1] Choi, T. M., Hui, C. L., & Yu, Y. (2014). (pp. 1–194). Springer Berlin Heidelberg.
    • [2] Hyndman, R.J., & Athanasopoulos, G. (2018) Forecasting: principles and practice, 2nd edition, OTexts: Melbourne, Australia. OTexts.com/fpp2. Accessed on 01.10.2019
    • [3] H&M, a Fashion Giant, Has a Problem: $4.3 Billion in Unsold Clothes.
    • [4] Thomassey, S. (2014). Sales Forecasting in Apparel and Fashion Industry: A Review. In (pp. 9–27). Berlin, Heidelberg: Springer Berlin Heidelberg. https://doi.org/10.1007/978-3-642-39869-8_2
    • [5] Box, G. E. P., & Jenkins, G. M. (1970). Time series analysis: Forecasting and control. San Francisco: Holden-Day.
    • [6] Autoregressive integrated moving average (ARIMA). https://en.wikipedia.org/wiki/Autoregressive_integrated_moving_average. Accessed: 2019-05-02
    • [7] Guo, C., & Berkhahn, F. (2016). Entity embeddings of categorical variables. arXiv preprint arXiv:1604.06737.
    • [8] Shen, Yuan, Wu and Pei. Data Science in Retail-as-a-Service Workshop. KDD 2018. London.
    • [9] Faloutsos, C., Flunkert, V., Gasthaus, J., Januschowski, T., & Wang, Y. (2019). Forecasting Big Time Series: Theory and Practice. KDD '19: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. 3209-3210. 10.1145/3292500.3332289.
    • [10] The M4 Competition: 100,000 time series and 61 forecasting methods [Makridakis et al., 2018]
    • [11] Salinas, D., Flunkert, V., & Gasthaus, J. (2017). DeepAR: Probabilistic Forecasting with Autoregressive Recurrent Networks. Retrieved from http://arxiv.org/abs/1704.04110
  34. IMAGE CREDITS

    • decision tree by H Alberto Gongora from the Noun Project
    • Ship Freight by ProSymbols from the Noun Project
    • warehouse by ProSymbols from the Noun Project
    • Store by AomAm from the Noun Project
    • Neural Network by Knut M. Synstad from the Noun Project
    • Tunic Dress by Vectors Market from the Noun Project
    • sales by Kantor Tegalsari from the Noun Project
    • time series by tom from the Noun Project
    • fashion by Smalllike from the Noun Project
    • Time by Anna Sophie from the Noun Project
    • linear regression by Becris from the Noun Project
    • Random Forest by Becris from the Noun Project
    • SVM by sachin modgekar from the Noun Project
    • production by Orin zuu from the Noun Project
    • Auto by Graphic Tigers from the Noun Project
    • Factory by Graphic Tigers from the Noun Project
    • Express Delivery by Vectors Market from the Noun Project
    • Stand Out by BomSymbols from the Noun Project
    • Photo Credit: https://www.flickr.com/photos/157635012@N07/47981346167/ by Artem Beliaikin on Flickr via Compfight CC 2.0
    • Photo Credit: https://www.flickr.com/photos/157635012@N07/48014587002/ by Artem Beliaikin on Flickr via Compfight CC 2.0
    • regression analysis by Vectors Market from the Noun Project
    • Research Experiment by Vectors Market from the Noun Project
    • weather by Alice Design from the Noun Project
    • Shirt by Ben Davis from the Noun Project
    • fashion by Eat Bread Studio from the Noun Project
    • renew by david from the Noun Project
    • price by Adrien Coquet from the Noun Project
    • requirements by ProSymbols from the Noun Project
    • marketing by Gregor Cresnar from the Noun Project
    • macroeconomic by priyanka from the Noun Project
    • competition by Gregor Cresnar from the Noun Project
  35. AWS DeepAR - LSTM + AUTOREGRESSIVE

    Source: Salinas, D., Flunkert, V., & Gasthaus, J. (2017). [11]
  36. THE M4 COMPETITION

    ➔ January-May 2018
    ➔ 100,000 series of the following frequencies: monthly, quarterly, yearly, daily, weekly, and hourly.
    ➔ 95% of the series fall within the first 3 categories.
    ➔ The forecasting horizons varied, e.g., six for yearly, 18 for monthly, and 48 for hourly series.
    ➔ Point forecasts and prediction intervals were evaluated.
    ➔ No extra features / exogenous variables
  37. THE M4 COMPETITION

    ➔ Winning solution: an RNN with an integrated exponential smoothing formula.
    ➔ Second: ensembles of classical solutions using sophisticated time series feature extraction.
    ➔ The dataset might not be the most representative, but it offers a baseline.
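The exponential smoothing component mentioned above rests on a one-line recursion. The sketch below shows only simple exponential smoothing, the most basic member of that family. It is an illustration, not the winning M4 model, which combines smoothing with an RNN and is considerably more elaborate. The function name, `alpha` value and data are assumptions made for this example.

```python
def simple_exponential_smoothing(series, alpha):
    """Return the smoothed level after the last observation.

    The level is a weighted average of the newest value and the previous
    level; it also serves as the flat forecast for every future step.
    """
    level = series[0]  # initialise with the first observation
    for y in series[1:]:
        level = alpha * y + (1 - alpha) * level
    return level


forecast = simple_exponential_smoothing([10, 20, 30], alpha=0.5)
```

With `alpha` close to 1 the forecast tracks the latest observation. With `alpha` close to 0 it changes slowly and stays near the initial level.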
  38. ATTENTION: LAG IS ALL YOU NEED

    Wikipedia winning entry: https://github.com/Arturus/kaggle-web-traffic/blob/master/how_it_works.md
  39. APPLICATIONS OF TIME SERIES FORECASTING

    ➔ Manufacturing
    ➔ Government services (budgeting)
    ➔ Supply chain and retail/commerce
    ➔ Workforce scheduling
    ➔ Cloud computing
    ➔ Website traffic prediction
    ➔ Healthcare