... and direction of trend
- strength of seasonality
- timing of peak seasonality
- spectral entropy

These are called "features" in the machine learning literature.

John W. Tukey: cognostics, i.e. computer-produced diagnostics (Tukey and Tukey, 1985).
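As a concrete illustration of two of the features listed above, the following minimal sketch computes strength of trend and strength of seasonality from an STL decomposition. It assumes base R only, uses AirPassengers purely as a stand-in series, and uses one common variance-ratio definition of these features rather than anything taken verbatim from the talk.

# Minimal sketch: feature = 1 - Var(remainder)/Var(component + remainder),
# computed from an STL decomposition of a stand-in series.
fit <- stl(log(AirPassengers), s.window = "periodic")
trend     <- fit$time.series[, "trend"]
seasonal  <- fit$time.series[, "seasonal"]
remainder <- fit$time.series[, "remainder"]

c(trend_strength    = max(0, 1 - var(remainder) / var(trend + remainder)),
  seasonal_strength = max(0, 1 - var(remainder) / var(seasonal + remainder)))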
[Figure: distribution of the seasonality feature across the collection of series, with example series of low seasonality (N1361) and high seasonality (N1906).]
[Figure: distribution of the trend feature, with example series of low trend (N1594) and high trend (N2694).]
[Figure: distribution of the first autocorrelation of the STL remainder (ACF1), with example series of low ACF1 (N2576) and high ACF1 (N2489).]
[Figure: distribution of spectral entropy, with example series of low entropy (N2487) and high entropy (N1552).]
[Figure: distribution of the Box-Cox lambda feature, with example series of low lambda (N0850) and high lambda (N3003).]
We can use the feature space to:
- generate new time series with similar features to existing series;
- generate new time series where there are "holes" in the feature space.

Let {PC1, PC2, ..., PCn} be a "population" of time series of specified length and period. A genetic algorithm uses selection, crossover and mutation to evolve the population towards a target point Ti in the feature space.
Optimize: Fitness(PCj) = -|PCj - Ti|^2.
The initial population is random, with some series in the neighbourhood of Ti. A small sketch of this evolution loop follows.
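A minimal sketch of such an evolution loop, assuming base R only. The two toy features (R-squared of a linear trend fit and lag-1 autocorrelation of the detrended series), the target point and the crossover/mutation details are illustrative stand-ins for the talk's feature space, not its actual implementation.

set.seed(1)
n_series <- 50                            # population size
len      <- 60                            # length of each generated series
target   <- c(trend = 0.8, acf1 = 0.6)    # hypothetical target point Ti

features <- function(x) {                 # two toy features standing in for PCj
  fit <- lm(x ~ seq_along(x))
  c(trend = summary(fit)$r.squared,
    acf1  = acf(residuals(fit), plot = FALSE)$acf[2])
}
fitness <- function(x) -sum((features(x) - target)^2)   # -|PCj - Ti|^2

pop <- replicate(n_series, as.numeric(arima.sim(list(ar = 0.3), n = len)),
                 simplify = FALSE)        # random initial population

for (gen in 1:100) {
  fit_vals <- vapply(pop, fitness, numeric(1))
  parents  <- pop[order(fit_vals, decreasing = TRUE)[1:(n_series / 2)]]  # selection
  children <- replicate(n_series / 2, {
    p   <- sample(parents, 2)                                            # crossover
    cut <- sample(2:(len - 1), 1)
    c(p[[1]][1:cut], p[[2]][(cut + 1):len]) + rnorm(len, sd = 0.05)      # mutation
  }, simplify = FALSE)
  pop <- c(parents, children)
}

features(pop[[which.max(vapply(pop, fitness, numeric(1)))]])   # close to the target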
The data are recorded at one-hour intervals over one month and consist of several server metrics (e.g. CPU usage and paging views) from many server farms globally.
Aim: find unusual (anomalous) time series.
Features used for anomaly detection include:
- First-order autocorrelation, Corr(Yt, Yt-1).
- Strength of trend and seasonality based on STL.
- Trend linearity and curvature.
- Size of seasonal peak and trough.
- Spectral entropy.
- Lumpiness: variance of block variances (block size 24).
- Spikiness: variance of leave-one-out variances of the STL remainders.
- Level shift: maximum difference in trimmed means of consecutive moving windows of size 24.
- Variance change: maximum difference in variances of consecutive moving windows of size 24.
- Flat spots: discretize the sample space into 10 equal-sized intervals and find the maximum run length within any interval.
- Number of crossing points of the mean line.
- Kullback-Leibler score: maximum of DKL(P || Q) = ∫ P(x) ln[P(x)/Q(x)] dx, where P and Q are estimated by kernel density estimators applied to consecutive windows of size 48.
- Change index: time of the maximum KL score.
Some of these features are sketched in code below.
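A minimal sketch of four of the listed features in base R. The function names and the simulated test series are made up for illustration, and the rolling-window details are simplified relative to whatever the original implementation does; window size 24 follows the slide.

lumpiness <- function(x, width = 24) {
  blocks <- split(x, ceiling(seq_along(x) / width))
  var(vapply(blocks, var, numeric(1)))          # variance of block variances
}

level_shift <- function(x, width = 24) {
  means <- vapply(seq_len(length(x) - width + 1),
                  function(i) mean(x[i:(i + width - 1)], trim = 0.1), numeric(1))
  max(abs(diff(means)))                         # max jump between rolling trimmed means
}

flat_spots <- function(x) {
  bins <- cut(x, breaks = 10)                   # discretize into 10 equal-width intervals
  max(rle(as.integer(bins))$lengths)            # longest run within one interval
}

crossing_points <- function(x) {
  above <- x > mean(x)
  sum(above[-1] != above[-length(x)])           # times the series crosses its mean line
}

set.seed(1)
x <- as.numeric(arima.sim(list(ar = 0.7), n = 240))   # simulated test series
c(lumpiness(x), level_shift(x), flat_spots(x), crossing_points(x))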
1  ... of time series that need forecasting at least monthly.
2  Forecasts are often required by people who are untrained in time series analysis.

Specifications
For a given time series, an automatic forecasting algorithm must:
- determine an appropriate time series model;
- estimate the parameters;
- compute the forecasts with prediction intervals.
AIC = -2 log(L) + 2k,
where L is the model likelihood and k is the number of estimated parameters in the model.

If the likelihood is Gaussian, then AIC ≈ c + T log(MSE) + 2k, where c is a constant, the MSE is computed from one-step forecasts on the training set, and T is the length of the series.

Minimizing the Gaussian AIC is asymptotically equivalent (as T → ∞) to minimizing the MSE of one-step forecasts on a test set via time series cross-validation. The AICc is a bias-corrected small-sample version of the AIC, and is much faster to compute than cross-validation.
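As a small illustration of this trade-off, the sketch below (assuming the forecast package, with WWWusage as a stand-in series) compares the AICc of an automatically selected ARIMA model with the one-step cross-validated MSE from tsCV(). Because the equivalence is only asymptotic, the two criteria need not agree on a short series.

library(forecast)

fit <- auto.arima(WWWusage)     # model selected by minimizing the AICc
fit$aicc

# One-step time series cross-validation for the same model class (much slower)
f <- function(y, h) forecast(auto.arima(y), h = h)
e <- tsCV(WWWusage, f, h = 1)
mean(e^2, na.rm = TRUE)         # cross-validated one-step MSE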
[Figure: monthly corticosteroid drug sales in Australia (total scripts, millions), with forecasts from an automatically selected ARIMA model.]

library(forecast)        # auto.arima(), forecast() and autoplot() come from the forecast package
fit <- auto.arima(h02)   # h02: the corticosteroid sales series shown in the figure
fcast <- forecast(fit)
autoplot(fcast)
[Figure: the same corticosteroid sales series with forecasts from an automatically selected ETS model.]

library(forecast)
fit <- ets(h02)
fcast <- forecast(fit)
autoplot(fcast)
[Figure: weekly gasoline series (thousands of barrels per day), with forecasts from TBATS(1, {0,0}, 1, {<52.1785714285714, 9>}).]

library(forecast)
fit <- tbats(gasoline)   # gasoline: the weekly series shown in the figure
fcast <- forecast(fit)
plot(fcast)
Other automatic forecasting approaches:
- Combination of STL decomposition with ETS or ARIMA.
- NNETAR: an autoregressive neural network whose inputs are lagged values and exogenous variables.
- Prophet (from Facebook): for daily and sub-daily time series; handles public holiday effects.
- FASSTER (Fast Additive Seasonal Switching with Trend and Exogenous Regressors): a generalization of the structural model for multiple seasonal time series with switching seasonality; handles public holiday effects as well as daily, sub-daily and weekly data, etc.
A short sketch of the first two approaches follows.
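A minimal sketch of the first two approaches using the forecast package; h02 is reused from the earlier examples, and the 24-month horizon is an arbitrary choice for illustration.

library(forecast)

# STL decomposition combined with ETS on the seasonally adjusted series
fc_stl <- stlf(h02, method = "ets", h = 24)

# Autoregressive neural network with lagged inputs (NNETAR)
fit_nn <- nnetar(h02)
fc_nn  <- forecast(fit_nn, h = 24)

autoplot(fc_stl)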
3  Forecasting hierarchical & grouped time series
Australian tourism: quarterly data on visitor nights from 1998:Q1 to 2013:Q4, from the National Visitor Survey, which is based on annual interviews of 120,000 Australians aged 15+ and is collected by Tourism Research Australia. The data are split by 7 states, 27 zones and 82 regions (a geographical hierarchy).
Spectacle sales: monthly UK sales data from 2000 to 2014, provided by a large spectacle manufacturer. The data are split by brand (26), gender (3), price range (6), materials (4), and stores (600), giving about 1 million bottom-level series.
Hierarchical time series: a collection of several time series that are linked together in a hierarchical structure. For example, a Total series may disaggregate into series A, B and C, with A further split into AA, AB and AC, B into BA, BB and BC, and C into CA, CB and CC.
Examples: tourism by state and region.
Grouped time series: a collection of time series that can be grouped together in a number of non-hierarchical ways. For example, a Total series may split by one attribute into A and B and by another into X and Y, giving bottom-level series AX, AY, BX and BY that can be aggregated either way (Total = A + B with A = AX + AY, or Total = X + Y with X = AX + BX).
Examples: labour turnover by occupation and state; spectacle sales by brand, gender, stores, etc. A small sketch of such a grouped structure follows.
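A minimal sketch of a grouped structure like the one just described, using the hts package's gts() constructor. The four simulated bottom-level series and their names are made up for illustration.

library(hts)

set.seed(1)
bts <- ts(matrix(rnorm(4 * 24, mean = 50, sd = 3), ncol = 4), frequency = 12)
colnames(bts) <- c("AX", "AY", "BX", "BY")

# characters = c(1, 1): the first character of each name gives the A/B grouping,
# the second gives the X/Y grouping
y <- gts(bts, characters = c(1, 1))
aggts(y, levels = 1)   # aggregates for the first grouping (A and B)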
Two key questions:
1  Can we forecast all nodes in such a way that the forecasts add up in the same way as the original data?
2  Can we exploit relationships between the series to improve the forecasts?
1  Forecast all series at all levels of aggregation using an automatic forecasting algorithm (e.g., ets, auto.arima, ...).
2  Reconcile the resulting forecasts so they add up correctly, using least squares optimization (i.e., find the reconciled forecasts closest to the original forecasts).
3  This is all available in the hts package in R; a short sketch follows.
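A minimal sketch of that workflow with the hts package. The eight simulated quarterly bottom-level series and the 2-then-4 node structure are toy choices; method = "comb" requests the least-squares reconciliation step described above.

library(hts)

set.seed(1)
bts <- ts(matrix(rnorm(8 * 40, mean = 100, sd = 5), ncol = 8),
          start = c(2005, 1), frequency = 4)
y <- hts(bts, nodes = list(2, c(4, 4)))   # 2 groups at level 1, 4 series under each

# Step 1: base forecasts at every level (ets); step 2: least-squares reconciliation
fc <- forecast(y, h = 8, method = "comb", fmethod = "ets")
aggts(fc, levels = 0)   # reconciled forecasts of the top-level (total) series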
Notation:
  yt   : observed aggregate of all series at time t;
  yX,t : observation on series X at time t;
  bt   : vector of all series at the bottom level at time t.

For a hierarchy with a Total and three bottom-level series A, B and C:

  yt = [ yt, yA,t, yB,t, yC,t ]'

     = [ 1 1 1 ]
       [ 1 0 0 ]   [ yA,t ]
       [ 0 1 0 ]   [ yB,t ]   =   S bt
       [ 0 0 1 ]   [ yC,t ]

so that yt = S bt, where S is the summing matrix.
In general, any collection of time series with aggregation constraints can be written as
  yt = S bt
where yt is a vector of all series at time t, bt is a vector of the most disaggregated series at time t, and S is a "summing matrix" containing the aggregation constraints. A quick numeric check follows.
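A minimal numeric check of this constraint for the small hierarchy above; the bottom-level values are arbitrary toy numbers.

S  <- rbind(c(1, 1, 1),          # Total = A + B + C
            diag(3))             # the bottom-level series A, B, C themselves
bt <- c(A = 10, B = 25, C = 5)   # toy bottom-level observations at time t
S %*% bt                         # yields the full vector (Total, A, B, C)' = (40, 10, 25, 5)'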
Let ŷn(h) be a vector of initial h-step forecasts, made at time n, stacked in the same order as yt. (In general, they will not "add up".)
Reconciled forecasts must be of the form
  ỹn(h) = S P ŷn(h)
for some matrix P: P extracts and combines the base forecasts ŷn(h) to get bottom-level forecasts, and S adds them up. A small sketch with one concrete choice of P follows.
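A minimal sketch of one concrete P: bottom-up reconciliation, where P simply discards the aggregate base forecast and keeps the bottom-level ones. The toy hierarchy and base forecasts are illustrative.

S    <- rbind(c(1, 1, 1), diag(3))   # summing matrix for (Total, A, B, C)
P_bu <- cbind(0, diag(3))            # bottom-up: drop the Total forecast, keep A, B, C

yhat   <- c(105, 30, 40, 38)         # toy base forecasts; note 30 + 40 + 38 != 105
ytilde <- S %*% P_bu %*% yhat        # reconciled forecasts: (108, 30, 40, 38)'
all.equal(sum(ytilde[2:4]), ytilde[1, 1])   # the reconciled forecasts now add up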
ỹn(h) = S P ŷn(h)

Bias: the reconciled forecasts are unbiased iff S P S = S.

Variance: let the error variance of the h-step base forecasts ŷn(h) be
  Wh = Var[ yn+h - ŷn(h) | y1, ..., yn ].
Then the error variance of the corresponding reconciled forecasts is
  Var[ yn+h - ỹn(h) | y1, ..., yn ] = S P Wh P' S'.
Theorem (MinT reconciliation): among all P satisfying S P S = S, the trace of S P Wh P' S' is minimized by
  P = (S' Wh^{-1} S)^{-1} S' Wh^{-1},
giving the reconciled forecasts
  ỹn(h) = S (S' Wh^{-1} S)^{-1} S' Wh^{-1} ŷn(h)
as a function of the base forecasts ŷn(h).
In practice, estimate W1 using shrinkage towards the diagonal, and assume Wh = kh W1. A numeric sketch of the formula follows.
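A minimal numeric sketch of this formula on the same toy hierarchy. The base forecasts and the diagonal Wh are made-up values; in practice Wh would be estimated (with shrinkage) from the base-forecast errors.

S    <- rbind(c(1, 1, 1), diag(3))      # summing matrix for (Total, A, B, C)
yhat <- c(105, 30, 40, 38)              # toy base forecasts (do not add up)
Wh   <- diag(c(4, 1, 1, 1))             # assumed base-forecast error variances

Winv   <- solve(Wh)
P      <- solve(t(S) %*% Winv %*% S) %*% t(S) %*% Winv   # MinT choice of P
ytilde <- S %*% P %*% yhat                               # reconciled forecasts

ytilde
sum(ytilde[2:4]) - ytilde[1, 1]          # ~0: the reconciled forecasts add up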
Forecast accuracy by aggregation level and forecast horizon (Australian tourism hierarchy):

                      h=1      h=2      h=3      h=4      h=5      h=6      Ave
  Australia  Base   1762.04  1770.29  1766.02  1818.82  1705.35  1721.17  1757.28
             Bottom 1736.92  1742.69  1722.79  1752.74  1666.73  1687.43  1718.22
             MinT   1704.64  1715.60  1705.31  1729.04  1626.36  1661.64  1690.43
  States     Base    399.77   404.16   401.92   407.26   395.38   401.17   401.61
             Bottom  404.29   406.95   404.96   409.02   399.80   401.55   404.43
             MinT    398.84   402.16   400.86   405.03   394.59   398.22   399.95
  Regions    Base     93.15    93.38    93.45    93.79    93.50    93.56    93.47
             Bottom   93.15    93.38    93.45    93.79    93.50    93.56    93.47
             MinT     92.98    93.27    93.34    93.66    93.34    93.46    93.34

MinT gives the smallest values at every level and horizon; Base and Bottom coincide at the Regions level because the regions are the bottom of the hierarchy.
Further resources:
- R packages (CRAN/github)
- Free textbook: OTexts.org/fpp2
- Papers and blog: robjhyndman.com