Forecasting large collections of related time series

Rob J Hyndman

Outline
1 Hierarchical and grouped time series
2 BLUF: Best Linear Unbiased Forecasts
3 Application: Australian tourism
4 Fast computation tricks
5 hts package for R
6 Temporal hierarchies
7 References

1 Hierarchical and grouped time series

Labour market participation
Australia and New Zealand Standard Classification of Occupations
- 8 major groups
  - 43 sub-major groups
    - 97 minor groups
      - 359 unit groups
        - 1023 occupations
Example: Statistician
- 2 Professionals
- 22 Business, Human Resource and Marketing Professionals
- 224 Information and Organisation Professionals
- 2241 Actuaries, Mathematicians and Statisticians
- 224113 Statistician


Australian tourism demand
- Quarterly data on visitor nights from 1998:Q1 to 2013:Q4
- From the National Visitor Survey, based on annual interviews of 120,000 Australians aged 15+, collected by Tourism Research Australia.
- Split by 7 states, 27 zones and 76 regions (a geographical hierarchy)
- Also split by purpose of travel: Holiday; Visiting friends and relatives (VFR); Business; Other
- 304 bottom-level series (76 regions × 4 purposes)

Spectacle sales
- Monthly UK sales data from 2000 to 2014
- Provided by a large spectacle manufacturer
- Split by brand (26), gender (3), price range (6), materials (4), and stores (600)
- About 1 million bottom-level series

Hierarchical time series
A hierarchical time series is a collection of several time series that are linked together in a hierarchical structure.
[Diagram: Total splits into A, B and C; A splits into AA, AB, AC; B into BA, BB, BC; C into CA, CB, CC.]
Examples
- Labour turnover by occupation
- Tourism by state and region

Grouped time series
A grouped time series is a collection of time series that can be grouped together in a number of non-hierarchical ways.
[Diagram: two alternative trees over the same bottom-level series. Total splits into A (AX, AY) and B (BX, BY); Total also splits into X (AX, BX) and Y (AY, BY).]
Examples
- Labour turnover by occupation and state
- Tourism by region and purpose of travel
- Spectacle sales by brand, gender, stores, etc.

tl;dr
1 Forecast all series at all levels of aggregation using an automatic forecasting algorithm (e.g., ets, auto.arima, ...).
2 Reconcile the resulting forecasts so they add up correctly, using least squares optimization (i.e., find the closest reconciled forecasts to the original forecasts).
3 This is all available in the hts package in R.

Hierarchical time series
[Diagram: Total splits into A, B and C.]
- $y_t$: observed aggregate of all series at time $t$.
- $y_{X,t}$: observation on series $X$ at time $t$.
- $\bm{b}_t$: vector of all series at the bottom level at time $t$.

$$
\bm{y}_t = \begin{bmatrix} y_t \\ y_{A,t} \\ y_{B,t} \\ y_{C,t} \end{bmatrix}
= \underbrace{\begin{bmatrix} 1 & 1 & 1 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}}_{\bm{S}}
\underbrace{\begin{bmatrix} y_{A,t} \\ y_{B,t} \\ y_{C,t} \end{bmatrix}}_{\bm{b}_t},
\qquad \bm{y}_t = \bm{S}\bm{b}_t
$$

Hierarchical time series
[Diagram: Total splits into A, B and C; A splits into AX, AY, AZ; B into BX, BY, BZ; C into CX, CY, CZ.]

$$
\bm{y}_t = \begin{bmatrix} y_t \\ y_{A,t} \\ y_{B,t} \\ y_{C,t} \\ y_{AX,t} \\ y_{AY,t} \\ y_{AZ,t} \\ y_{BX,t} \\ y_{BY,t} \\ y_{BZ,t} \\ y_{CX,t} \\ y_{CY,t} \\ y_{CZ,t} \end{bmatrix}
= \underbrace{\begin{bmatrix}
1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 \\
1 & 1 & 1 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 1 & 1 & 1 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 1 & 1 & 1 \\
1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1
\end{bmatrix}}_{\bm{S}}
\underbrace{\begin{bmatrix} y_{AX,t} \\ y_{AY,t} \\ y_{AZ,t} \\ y_{BX,t} \\ y_{BY,t} \\ y_{BZ,t} \\ y_{CX,t} \\ y_{CY,t} \\ y_{CZ,t} \end{bmatrix}}_{\bm{b}_t},
\qquad \bm{y}_t = \bm{S}\bm{b}_t
$$

Grouped data
[Diagram: a 2 × 2 grid with rows A, B and columns X, Y; the cells are AX, AY, BX, BY; the margins are A, B (row totals) and X, Y (column totals); the overall sum is Total.]

$$
\bm{y}_t = \begin{bmatrix} y_t \\ y_{A,t} \\ y_{B,t} \\ y_{X,t} \\ y_{Y,t} \\ y_{AX,t} \\ y_{AY,t} \\ y_{BX,t} \\ y_{BY,t} \end{bmatrix}
= \underbrace{\begin{bmatrix}
1 & 1 & 1 & 1 \\
1 & 1 & 0 & 0 \\
0 & 0 & 1 & 1 \\
1 & 0 & 1 & 0 \\
0 & 1 & 0 & 1 \\
1 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 \\
0 & 0 & 1 & 0 \\
0 & 0 & 0 & 1
\end{bmatrix}}_{\bm{S}}
\underbrace{\begin{bmatrix} y_{AX,t} \\ y_{AY,t} \\ y_{BX,t} \\ y_{BY,t} \end{bmatrix}}_{\bm{b}_t},
\qquad \bm{y}_t = \bm{S}\bm{b}_t
$$

Hierarchical and grouped time series
Every collection of time series with aggregation constraints can be written as
$$\bm{y}_t = \bm{S}\bm{b}_t$$
where
- $\bm{y}_t$ is a vector of all series at time $t$,
- $\bm{b}_t$ is a vector of the most disaggregated series at time $t$,
- $\bm{S}$ is a "summing matrix" containing the aggregation constraints.
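The summing matrix can be built mechanically from the grouping labels. A minimal numpy sketch, assuming the two-way grouped example above (A/B crossed with X/Y) and made-up bottom-level values; the helper name `indicator` is hypothetical:

```python
import numpy as np

# Bottom-level series of the grouped example: A/B crossed with X/Y.
bottom = ["AX", "AY", "BX", "BY"]

def indicator(members):
    """One row of S: 1 for each bottom-level series that is summed, else 0."""
    return [1 if name in members else 0 for name in bottom]

rows = [indicator(bottom)]                                            # Total
rows += [indicator([b for b in bottom if b[0] == g]) for g in "AB"]   # A, B
rows += [indicator([b for b in bottom if b[1] == g]) for g in "XY"]   # X, Y
rows += np.eye(len(bottom), dtype=int).tolist()                       # AX, AY, BX, BY
S = np.array(rows)                                                    # 9 x 4 summing matrix

b_t = np.array([10.0, 20.0, 30.0, 40.0])   # hypothetical bottom-level values at time t
y_t = S @ b_t                              # all 9 series: Total, A, B, X, Y, then bottom
print(S)
print(y_t)
```

Other grouping rules only change the list comprehensions; the identity block for the bottom level is always the same.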

2 BLUF: Best Linear Unbiased Forecasts

Forecasting notation
Let $\hat{\bm{y}}_n(h)$ be the vector of initial h-step forecasts, made at time $n$, stacked in the same order as $\bm{y}_t$. (In general, they will not "add up".)
Reconciled forecasts must be of the form
$$\tilde{\bm{y}}_n(h) = \bm{S}\bm{P}\hat{\bm{y}}_n(h)$$
for some matrix $\bm{P}$.
- $\bm{P}$ extracts and combines the base forecasts $\hat{\bm{y}}_n(h)$ to get bottom-level forecasts.
- $\bm{S}$ adds them up.
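The simplest admissible P is the bottom-up choice P = [0 | I_m], which discards the aggregate forecasts and lets S add up the bottom-level ones. A numpy sketch, assuming the Total/A/B/C hierarchy from earlier and hypothetical base forecasts that do not add up:

```python
import numpy as np

# Total = A + B + C, rows ordered [Total, A, B, C].
S = np.array([[1, 1, 1],
              [1, 0, 0],
              [0, 1, 0],
              [0, 0, 1]], dtype=float)
n, m = S.shape

# Hypothetical base forecasts: 30 + 45 + 25 = 100, but the Total forecast is 104.
y_hat = np.array([104.0, 30.0, 45.0, 25.0])

# Bottom-up P = [0 | I_m]: drop the aggregate forecast, keep the bottom-level ones.
P_bu = np.hstack([np.zeros((m, n - m)), np.eye(m)])

y_tilde = S @ P_bu @ y_hat
print(y_tilde)   # Total becomes 100, the sum of the bottom-level forecasts
```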

General properties: bias
$\tilde{\bm{y}}_n(h) = \bm{S}\bm{P}\hat{\bm{y}}_n(h)$
- Assume the base forecasts $\hat{\bm{y}}_n(h)$ are unbiased:
$$\mathrm{E}[\hat{\bm{y}}_n(h) \mid \bm{y}_1, \dots, \bm{y}_n] = \mathrm{E}[\bm{y}_{n+h} \mid \bm{y}_1, \dots, \bm{y}_n].$$
- Let $\hat{\bm{b}}_n(h)$ be the bottom-level base forecasts, with $\bm{\beta}_n(h) = \mathrm{E}[\hat{\bm{b}}_n(h) \mid \bm{y}_1, \dots, \bm{y}_n]$.
- Then $\mathrm{E}[\hat{\bm{y}}_n(h)] = \bm{S}\bm{\beta}_n(h)$.
- We want the reconciled forecasts to be unbiased:
$$\mathrm{E}[\tilde{\bm{y}}_n(h)] = \bm{S}\bm{P}\bm{S}\bm{\beta}_n(h) = \bm{S}\bm{\beta}_n(h).$$
- Reconciled forecasts are unbiased iff $\bm{S}\bm{P}\bm{S} = \bm{S}$.
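The condition SPS = S can be checked numerically. This numpy sketch (same Total/A/B/C hierarchy) tests three choices of P: bottom-up, the OLS matrix (S'S)^{-1}S', and a top-down split of the Total with hypothetical fixed proportions. Only the last fails: since S has full column rank, SPS = S requires PS = I, but for the top-down P the product PS has rank one.

```python
import numpy as np

S = np.array([[1, 1, 1],
              [1, 0, 0],
              [0, 1, 0],
              [0, 0, 1]], dtype=float)
n, m = S.shape

def is_unbiased(P):
    """The unbiasedness condition derived above: S P S = S."""
    return np.allclose(S @ P @ S, S)

P_bu = np.hstack([np.zeros((m, n - m)), np.eye(m)])   # bottom-up
P_ols = np.linalg.solve(S.T @ S, S.T)                  # (S'S)^{-1} S'
p = np.array([0.30, 0.45, 0.25])                       # hypothetical top-down proportions
P_td = np.outer(p, np.eye(n)[0])                       # split the Total forecast by p

print(is_unbiased(P_bu), is_unbiased(P_ols), is_unbiased(P_td))  # True True False
```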

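General properties: variance
$\tilde{\bm{y}}_n(h) = \bm{S}\bm{P}\hat{\bm{y}}_n(h)$
- Let the error variance of the h-step base forecasts $\hat{\bm{y}}_n(h)$ be
$$\bm{W}_h = \mathrm{Var}[\bm{y}_{n+h} - \hat{\bm{y}}_n(h) \mid \bm{y}_1, \dots, \bm{y}_n].$$
- Then, for any $\bm{P}$ with $\bm{S}\bm{P}\bm{S} = \bm{S}$ (so that $\bm{S}\bm{P}\bm{y}_{n+h} = \bm{S}\bm{P}\bm{S}\bm{b}_{n+h} = \bm{y}_{n+h}$, and the reconciled error equals $\bm{S}\bm{P}(\bm{y}_{n+h} - \hat{\bm{y}}_n(h))$), the error variance of the corresponding reconciled forecasts is
$$\mathrm{Var}[\bm{y}_{n+h} - \tilde{\bm{y}}_n(h) \mid \bm{y}_1, \dots, \bm{y}_n] = \bm{S}\bm{P}\bm{W}_h\bm{P}'\bm{S}'.$$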
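A quick Monte Carlo check of the variance formula, under stated assumptions: Gaussian base-forecast errors with a made-up diagonal W_h, the OLS choice of P (which satisfies SPS = S), and the Total/A/B/C hierarchy. The empirical covariance of the reconciled errors should match S P W_h P' S' up to simulation noise.

```python
import numpy as np

rng = np.random.default_rng(0)
S = np.array([[1, 1, 1],
              [1, 0, 0],
              [0, 1, 0],
              [0, 0, 1]], dtype=float)
n, m = S.shape
P = np.linalg.solve(S.T @ S, S.T)            # OLS choice of P; satisfies S P S = S
W = np.diag([4.0, 1.0, 2.0, 1.5])            # hypothetical base-forecast error covariance W_h

N = 200_000
b = rng.normal(100.0, 10.0, size=(N, m))     # hypothetical future bottom-level values
y = b @ S.T                                  # coherent y_{n+h} = S b_{n+h}, one row per draw
e = rng.multivariate_normal(np.zeros(n), W, size=N)
y_hat = y - e                                # base forecasts whose errors have covariance W
y_tilde = y_hat @ (S @ P).T                  # reconciled forecasts S P y_hat, row by row

empirical = np.cov((y - y_tilde).T)          # 4 x 4 covariance of reconciled errors
theoretical = S @ P @ W @ P.T @ S.T
print(np.round(empirical, 2))
print(np.round(theoretical, 2))
```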

BLUF via trace minimization
Theorem: Among all $\bm{P}$ satisfying $\bm{S}\bm{P}\bm{S} = \bm{S}$, the problem
$$\min_{\bm{P}} \operatorname{trace}[\bm{S}\bm{P}\bm{W}_h\bm{P}'\bm{S}']$$
has solution
$$\bm{P} = (\bm{S}'\bm{W}_h^{-1}\bm{S})^{-1}\bm{S}'\bm{W}_h^{-1},$$
and then
$$\mathrm{Var}[\bm{y}_{n+h} - \tilde{\bm{y}}_n(h) \mid \bm{y}_1, \dots, \bm{y}_n] = \bm{S}(\bm{S}'\bm{W}_h^{-1}\bm{S})^{-1}\bm{S}'.$$
Problem: $\bm{W}_h$ is hard to estimate, especially for $h > 1$.
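A numerical illustration of the theorem, assuming a made-up positive-definite W_h with correlated errors: among three P matrices that all satisfy SPS = S, the trace of S P W_h P' S' is smallest for P = (S'W_h^{-1}S)^{-1}S'W_h^{-1}, and the minimised variance matches the closed form S(S'W_h^{-1}S)^{-1}S'.

```python
import numpy as np

S = np.array([[1, 1, 1],
              [1, 0, 0],
              [0, 1, 0],
              [0, 0, 1]], dtype=float)
n, m = S.shape

# Hypothetical positive-definite W_h with correlated base-forecast errors.
A = np.array([[2.0, 0.3, 0.1, 0.0],
              [0.3, 1.0, 0.2, 0.1],
              [0.1, 0.2, 1.5, 0.3],
              [0.0, 0.1, 0.3, 1.2]])
W = A @ A.T
Winv = np.linalg.inv(W)

def recon_var(P):
    """Reconciled error variance S P W_h P' S' (valid when S P S = S)."""
    return S @ P @ W @ P.T @ S.T

P_mint = np.linalg.solve(S.T @ Winv @ S, S.T @ Winv)   # (S'W^{-1}S)^{-1} S'W^{-1}
P_ols = np.linalg.solve(S.T @ S, S.T)
P_bu = np.hstack([np.zeros((m, n - m)), np.eye(m)])

for name, P in [("trace-minimising", P_mint), ("OLS", P_ols), ("bottom-up", P_bu)]:
    print(f"{name:>16}: trace = {np.trace(recon_var(P)):.4f}")

closed_form = S @ np.linalg.inv(S.T @ Winv @ S) @ S.T
print(np.allclose(recon_var(P_mint), closed_form))   # True
```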

Optimal combination forecasts
$$\underbrace{\tilde{\bm{y}}_n(h)}_{\text{reconciled forecasts}} = \bm{S}(\bm{S}'\bm{W}_h^{-1}\bm{S})^{-1}\bm{S}'\bm{W}_h^{-1}\underbrace{\hat{\bm{y}}_n(h)}_{\text{base forecasts}}$$
Solution 1: OLS
- Assume $\bm{W}_h \approx k_h\bm{I}$. Then
$$\tilde{\bm{y}}_n(h) = \bm{S}(\bm{S}'\bm{S})^{-1}\bm{S}'\hat{\bm{y}}_n(h).$$
- Reconciliation does not depend on the data.
- Works surprisingly well.
- Still need to estimate the covariance matrix to produce prediction intervals.
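Under the OLS assumption the reconciliation matrix S(S'S)^{-1}S' depends only on the structure, so it can be computed once and reused. A numpy sketch, assuming the three-level hierarchy (Total; A, B, C; AX to CZ) with the level-by-level ordering used earlier, and random numbers standing in for base forecasts:

```python
import numpy as np

# Summing matrix (13 x 9): Total; A, B, C; AX to CZ (level-by-level ordering).
S = np.vstack([
    np.ones((1, 9)),                        # Total
    np.kron(np.eye(3), np.ones((1, 3))),    # A, B, C
    np.eye(9),                              # bottom level
])

H = S @ np.linalg.solve(S.T @ S, S.T)       # S (S'S)^{-1} S': no data involved

rng = np.random.default_rng(1)
y_hat = rng.uniform(50, 150, size=13)       # hypothetical incoherent base forecasts
y_tilde = H @ y_hat

print(np.isclose(y_tilde[0], y_tilde[4:].sum()))                         # Total adds up
print(np.allclose(y_tilde[1:4], y_tilde[4:].reshape(3, 3).sum(axis=1)))  # A, B, C add up
```

H is symmetric and idempotent (an orthogonal projection onto the coherent forecasts), so applying it to already-reconciled forecasts changes nothing.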

Solution 2: WLS
- Suppose we approximate $\bm{W}_1$ by its diagonal and assume that $\bm{W}_h = k_h\bm{W}_1$.
- Easy to estimate, and places weight where we have the best forecasts.
- Still need to estimate the covariance matrix to produce prediction intervals.
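A sketch of WLS reconciliation, assuming the diagonal of W_1 is estimated by the variances of hypothetical in-sample one-step residuals. The constant k_h cancels from the formula, so it is not needed for point forecasts. Series with large residual variance absorb more of the adjustment.

```python
import numpy as np

S = np.array([[1, 1, 1],
              [1, 0, 0],
              [0, 1, 0],
              [0, 0, 1]], dtype=float)

rng = np.random.default_rng(2)
# Hypothetical one-step in-sample residuals (60 periods x 4 series); the Total is noisiest.
residuals = rng.normal(0.0, [4.0, 1.0, 1.0, 1.0], size=(60, 4))
w = residuals.var(axis=0)                        # estimated diagonal of W_1
Winv = np.diag(1.0 / w)

y_hat = np.array([104.0, 30.0, 45.0, 25.0])      # hypothetical base forecasts (gap of 4)
P_wls = np.linalg.solve(S.T @ Winv @ S, S.T @ Winv)
y_tilde = S @ P_wls @ y_hat
adjustment = y_tilde - y_hat
print(np.round(adjustment, 2))   # most of the gap is taken out of the noisy Total
```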

Solution 3: GLS
- Estimate $\bm{W}_1$ using shrinkage to the diagonal and assume that $\bm{W}_h = k_h\bm{W}_1$.
- Allows for covariances.
- Difficult to compute for large numbers of time series.
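A sketch of GLS reconciliation with a covariance estimate shrunk towards its diagonal. The shrinkage intensity `lam` is fixed by hand here as an assumption; in practice it would be chosen from the data, and that estimator is not reproduced. The residuals are simulated from a made-up covariance matrix.

```python
import numpy as np

S = np.array([[1, 1, 1],
              [1, 0, 0],
              [0, 1, 0],
              [0, 0, 1]], dtype=float)

rng = np.random.default_rng(3)
cov_true = np.array([[9.0, 1.5, 1.5, 1.5],      # hypothetical error covariance
                     [1.5, 2.5, 0.3, 0.2],
                     [1.5, 0.3, 2.5, 0.3],
                     [1.5, 0.2, 0.3, 2.5]])
residuals = rng.multivariate_normal(np.zeros(4), cov_true, size=40)

W_sample = residuals.T @ residuals / len(residuals)  # sample covariance of zero-mean residuals
lam = 0.3                                            # hypothetical shrinkage intensity
W_shrunk = lam * np.diag(np.diag(W_sample)) + (1 - lam) * W_sample

Winv = np.linalg.inv(W_shrunk)
P_gls = np.linalg.solve(S.T @ Winv @ S, S.T @ Winv)
y_hat = np.array([104.0, 30.0, 45.0, 25.0])          # hypothetical base forecasts
y_tilde = S @ P_gls @ y_hat
print(np.round(y_tilde, 2))
```

With lam > 0, W_shrunk stays positive definite even when there are fewer residual periods than series, which is where the raw sample covariance becomes singular.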

3 Application: Australian tourism

Australian tourism
- Hierarchy: States (7), Zones (27), Regions (82)
- Base forecasts: ETS (exponential smoothing) models

Base forecasts
[Figure: Domestic tourism forecasts of visitor nights by year (1998 to 2008), with panels for Total, NSW, VIC, Nth.Coast.NSW, Metro.QLD, Sth.WA, X201.Melbourne, X402.Murraylands and X809.Daly.]

Forecast evaluation
[Figure: a sequence of training sets plotted along a time axis, each followed by a test observation at horizon h = 1.]
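The evaluation scheme can be sketched as a rolling forecast origin. This is a minimal Python version under assumptions not stated on the slide: the training set grows by one observation each time, and a naive last-value forecast stands in for the ETS base forecasts. The function names and data are hypothetical.

```python
import math

def rolling_origin_splits(n_obs, min_train, h=1):
    """Yield (training indices, test index) pairs; the training window grows by one each time."""
    for end in range(min_train, n_obs - h + 1):
        yield list(range(end)), end + h - 1

def naive_rmse(series, min_train, h=1):
    """RMSE over all origins of a naive (last observed value) forecast, standing in for ETS."""
    errors = [series[test] - series[train[-1]]
              for train, test in rolling_origin_splits(len(series), min_train, h)]
    return math.sqrt(sum(e * e for e in errors) / len(errors))

series = [5, 6, 7, 6, 8, 9, 8, 10, 11, 10, 12, 13]   # hypothetical observations
splits = list(rolling_origin_splits(len(series), min_train=8))
print(len(splits))                                   # 4 forecast origins
print(naive_rmse(series, min_train=8))
```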

Hierarchy: states, zones, regions
RMSE by forecast horizon h

Level      Method   h=1      h=2      h=3      h=4      h=5      h=6      Ave
Australia  Base     1762.04  1770.29  1766.02  1818.82  1705.35  1721.17  1757.28
           Bottom   1736.92  1742.69  1722.79  1752.74  1666.73  1687.43  1718.22
           OLS      1747.60  1757.68  1751.77  1800.67  1686.00  1706.45  1741.69
           WLS      1705.21  1715.87  1703.75  1729.56  1627.79  1661.24  1690.57
           GLS      1704.64  1715.60  1705.31  1729.04  1626.36  1661.64  1690.43
States     Base     399.77   404.16   401.92   407.26   395.38   401.17   401.61
           Bottom   404.29   406.95   404.96   409.02   399.80   401.55   404.43
           OLS      404.47   407.62   405.43   413.79   401.10   404.90   406.22
           WLS      398.84   402.12   400.71   405.03   394.76   398.23   399.95
           GLS      398.84   402.16   400.86   405.03   394.59   398.22   399.95
Regions    Base     93.15    93.38    93.45    93.79    93.50    93.56    93.47
           Bottom   93.15    93.38    93.45    93.79    93.50    93.56    93.47
           OLS      93.28    93.53    93.64    94.17    93.78    93.88    93.71
           WLS      93.02    93.32    93.38    93.72    93.39    93.53    93.39
           GLS      92.98    93.27    93.34    93.66    93.34    93.46    93.34
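The Ave column appears to be the plain mean over h = 1 to 6. A short check for the Australia rows (values copied from the table), which also expresses each method as a percentage change in average RMSE relative to Base:

```python
import numpy as np

# RMSE at the Australia (total) level, h = 1..6, copied from the table above.
australia = {
    "Base":   [1762.04, 1770.29, 1766.02, 1818.82, 1705.35, 1721.17],
    "Bottom": [1736.92, 1742.69, 1722.79, 1752.74, 1666.73, 1687.43],
    "OLS":    [1747.60, 1757.68, 1751.77, 1800.67, 1686.00, 1706.45],
    "WLS":    [1705.21, 1715.87, 1703.75, 1729.56, 1627.79, 1661.24],
    "GLS":    [1704.64, 1715.60, 1705.31, 1729.04, 1626.36, 1661.64],
}
ave = {method: float(np.mean(rmse)) for method, rmse in australia.items()}
for method, value in ave.items():
    change = 100 * (value - ave["Base"]) / ave["Base"]
    print(f"{method:7s} Ave = {value:8.2f}  ({change:+.1f}% vs Base)")
```

On these figures, WLS and GLS reduce the average RMSE at the national level by about 3.8% relative to the base forecasts.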

4 Fast computation tricks

Fast computation: hierarchical data
Take the three-level hierarchy from earlier (Total; A, B, C; AX to CZ), with $\bm{y}_t = \bm{S}\bm{b}_t$. Reordering $\bm{y}_t$ depth-first, so that each node is followed immediately by its own sub-series, gives

$$
\bm{y}_t = \begin{bmatrix} y_t \\ y_{A,t} \\ y_{AX,t} \\ y_{AY,t} \\ y_{AZ,t} \\ y_{B,t} \\ y_{BX,t} \\ y_{BY,t} \\ y_{BZ,t} \\ y_{C,t} \\ y_{CX,t} \\ y_{CY,t} \\ y_{CZ,t} \end{bmatrix}
= \underbrace{\begin{bmatrix}
1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 \\
1 & 1 & 1 & 0 & 0 & 0 & 0 & 0 & 0 \\
1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 1 & 1 & 1 & 0 & 0 & 0 \\
0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 1 & 1 & 1 \\
0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1
\end{bmatrix}}_{\bm{S}}
\underbrace{\begin{bmatrix} y_{AX,t} \\ y_{AY,t} \\ y_{AZ,t} \\ y_{BX,t} \\ y_{BY,t} \\ y_{BZ,t} \\ y_{CX,t} \\ y_{CY,t} \\ y_{CZ,t} \end{bmatrix}}_{\bm{b}_t},
\qquad \bm{y}_t = \bm{S}\bm{b}_t
$$

Below the first row, $\bm{S}$ is now block diagonal, and each block is the summing matrix of one subtree.

Fast computation: hierarchies
Think of the hierarchy as a tree of trees: Total, with subtrees $T_1, T_2, \dots, T_K$ below it.
Then the summing matrix contains $K$ smaller summing matrices:
$$
\bm{S} = \begin{bmatrix}
\bm{1}_{n_1}' & \bm{1}_{n_2}' & \cdots & \bm{1}_{n_K}' \\
\bm{S}_1 & \bm{0} & \cdots & \bm{0} \\
\bm{0} & \bm{S}_2 & \cdots & \bm{0} \\
\vdots & \vdots & \ddots & \vdots \\
\bm{0} & \bm{0} & \cdots & \bm{S}_K
\end{bmatrix}
$$
where $\bm{1}_n$ is an n-vector of ones and tree $T_i$ has $n_i$ terminal nodes.
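The tree-of-trees structure suggests a recursive construction of S. A Python sketch, assuming a hierarchy given as nested lists (a hypothetical input format) and depth-first row ordering: each subtree contributes its own S_k on the block diagonal, and the first row is the row of ones.

```python
import numpy as np
from scipy.linalg import block_diag

def summing_matrix(tree):
    """Summing matrix of a hierarchy given as nested lists; non-list items are leaves.

    Rows are ordered depth-first (each node, then its subtrees in turn);
    columns are the leaves from left to right.
    """
    if not isinstance(tree, list):
        return np.ones((1, 1))                                    # a leaf only "sums" itself
    blocks = [summing_matrix(sub) for sub in tree]                # S_1, ..., S_K
    top = np.hstack([np.ones((1, B.shape[1])) for B in blocks])   # 1'_{n_1} ... 1'_{n_K}
    return np.vstack([top, block_diag(*blocks)])

tree = [["AX", "AY", "AZ"], ["BX", "BY", "BZ"], ["CX", "CY", "CZ"]]
S = summing_matrix(tree)
print(S.shape)      # (13, 9)
print(S.astype(int))
```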

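Fast computation: hierarchies
$$
\bm{S}'\bm{\Lambda}\bm{S} = \begin{bmatrix}
\bm{S}_1'\bm{\Lambda}_1\bm{S}_1 & \bm{0} & \cdots & \bm{0} \\
\bm{0} & \bm{S}_2'\bm{\Lambda}_2\bm{S}_2 & \cdots & \bm{0} \\
\vdots & \vdots & \ddots & \vdots \\
\bm{0} & \bm{0} & \cdots & \bm{S}_K'\bm{\Lambda}_K\bm{S}_K
\end{bmatrix} + \lambda_0\bm{J}_n
$$
- $\lambda_0$ is the top-left element of $\bm{\Lambda}$;
- $\bm{\Lambda}_k$ is a block of $\bm{\Lambda}$, corresponding to tree $T_k$;
- $\bm{J}_n$ is an $n \times n$ matrix of ones;
- $n = \sum_k n_k$.
Now apply the Sherman-Morrison formula ...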

Fast computation: hierarchies
$$
(\bm{S}'\bm{\Lambda}\bm{S})^{-1} = \begin{bmatrix}
(\bm{S}_1'\bm{\Lambda}_1\bm{S}_1)^{-1} & \bm{0} & \cdots & \bm{0} \\
\bm{0} & (\bm{S}_2'\bm{\Lambda}_2\bm{S}_2)^{-1} & \cdots & \bm{0} \\
\vdots & \vdots & \ddots & \vdots \\
\bm{0} & \bm{0} & \cdots & (\bm{S}_K'\bm{\Lambda}_K\bm{S}_K)^{-1}
\end{bmatrix} - c\bm{S}_0
$$
- $\bm{S}_0$ can be partitioned into $K^2$ blocks, with the $(k, \ell)$ block (of dimension $n_k \times n_\ell$) being $(\bm{S}_k'\bm{\Lambda}_k\bm{S}_k)^{-1}\bm{J}_{n_k,n_\ell}(\bm{S}_\ell'\bm{\Lambda}_\ell\bm{S}_\ell)^{-1}$.
- $\bm{J}_{n_k,n_\ell}$ is an $n_k \times n_\ell$ matrix of ones.
- $c^{-1} = \lambda_0^{-1} + \sum_k \bm{1}_{n_k}'(\bm{S}_k'\bm{\Lambda}_k\bm{S}_k)^{-1}\bm{1}_{n_k}$.
- Each $\bm{S}_k'\bm{\Lambda}_k\bm{S}_k$ can be inverted similarly.
- $\bm{S}'\bm{\Lambda}\bm{y}$ can also be computed recursively.
The recursive calculations can be done in such a way that we never store any of the large matrices involved.
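A numerical check of the block formula, assuming three hypothetical two-level subtrees and random diagonal weights in Λ. For clarity this sketch forms the full matrices, so it verifies the algebra but not the memory savings of the recursive version.

```python
import numpy as np
from scipy.linalg import block_diag

rng = np.random.default_rng(4)

def two_level(n_leaves):
    """Summing matrix of a small subtree: one parent node over n_leaves leaves."""
    return np.vstack([np.ones((1, n_leaves)), np.eye(n_leaves)])

S_k = [two_level(3), two_level(2), two_level(4)]        # hypothetical subtrees T_1, T_2, T_3
top = np.hstack([np.ones((1, B.shape[1])) for B in S_k])
S = np.vstack([top, block_diag(*S_k)])                  # tree-of-trees summing matrix

lam0 = 0.5                                              # top-left element of Lambda
Lam_k = [np.diag(rng.uniform(0.5, 2.0, B.shape[0])) for B in S_k]   # hypothetical diagonal blocks
Lam = block_diag(np.array([[lam0]]), *Lam_k)

inv_k = [np.linalg.inv(B.T @ L @ B) for B, L in zip(S_k, Lam_k)]   # (S_k' Lambda_k S_k)^{-1}
A_inv = block_diag(*inv_k)
ones = np.ones(A_inv.shape[0])

# Sherman-Morrison for the rank-one term lambda_0 J_n.
c = 1.0 / (1.0 / lam0 + ones @ A_inv @ ones)   # c^{-1} = lambda_0^{-1} + sum_k 1'(S_k' Lambda_k S_k)^{-1} 1
S0 = A_inv @ np.outer(ones, ones) @ A_inv      # (k, l) block: inv_k[k] J inv_k[l]
fast = A_inv - c * S0

direct = np.linalg.inv(S.T @ Lam @ S)
print(np.allclose(fast, direct))               # True
```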

Fast computation
- A similar algorithm has been developed for grouped time series with two groups.
- When the time series are not strictly hierarchical and have more than two grouping variables:
  - Use sparse matrix storage and arithmetic.
  - Use iterative approximation for inverting large sparse matrices.
Paige & Saunders (1982), ACM Trans. Math. Software
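One way to follow this advice in Python is to treat reconciliation as a sparse least-squares problem and solve it with LSQR, the Paige & Saunders (1982) algorithm, which SciPy provides as `scipy.sparse.linalg.lsqr`. The sketch assumes WLS weighting, a hypothetical grouping with three crossed variables (only their one-way margins are aggregated), and simulated base forecasts; it never forms or inverts S'W^{-1}S.

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import lsqr

sizes = [10, 8, 6]              # hypothetical numbers of levels of three grouping variables
m = int(np.prod(sizes))         # one bottom-level series per combination of levels

def margin(i):
    """Sparse rows of S that sum the bottom level over all factors except factor i."""
    parts = [sp.identity(g) if j == i else sp.csr_matrix(np.ones((1, g)))
             for j, g in enumerate(sizes)]
    out = parts[0]
    for part in parts[1:]:
        out = sp.kron(out, part)
    return out

# Summing matrix: Total, one block of margins per grouping variable, then the bottom level.
S = sp.vstack([sp.csr_matrix(np.ones((1, m)))]
              + [margin(i) for i in range(len(sizes))]
              + [sp.identity(m)]).tocsr()
n = S.shape[0]                  # 1 + 10 + 8 + 6 + 480 = 505 series

rng = np.random.default_rng(5)
y_hat = S @ rng.uniform(10, 20, m) + rng.normal(0, 5, n)   # hypothetical incoherent base forecasts
w = rng.uniform(0.5, 2.0, n)                               # hypothetical diagonal of W (variances)

# WLS reconciliation: minimise || W^{-1/2} (y_hat - S b) || over b with LSQR,
# using only sparse matrix-vector products.
D = sp.diags(1.0 / np.sqrt(w))
b_tilde = lsqr(D @ S, D @ y_hat, atol=1e-10, btol=1e-10, iter_lim=5000)[0]
y_tilde = S @ b_tilde           # coherent reconciled forecasts

print(n, np.isclose(y_tilde[0], y_tilde[-m:].sum()))
```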

5 hts package for R

hts package for R
hts: Hierarchical and Grouped Time Series
Methods for analysing and forecasting hierarchical and grouped time series
Version: 5.0
Depends: R (≥ 3.0.2), forecast (≥ 5.0), SparseM, Matrix, matrixcalc
Imports: parallel, utils, methods, graphics, grDevices, stats
LinkingTo: Rcpp (≥ 0.11.0), RcppEigen
Suggests: testthat
Published: 2016-04-06
Author: Rob J Hyndman, Earo Wang, Alan Lee, Shanika Wickramasuriya
Maintainer: Rob J Hyndman <Rob.Hyndman at monash.edu>
BugReports: https://github.com/robjhyndman/hts/issues
License: GPL (≥ 2)

Example using R
    library(hts)
    # bts is a matrix containing the bottom-level time series
    # (hypothetical simulated data here: 5 quarterly series of length 40)
    bts <- ts(matrix(rnorm(5 * 40, mean = 100), ncol = 5), frequency = 4)
    # nodes describes the hierarchical structure:
    # 2 series at level 1, split into 3 and 2 bottom-level series
    y <- hts(bts, nodes = list(2, c(3, 2)))
    # Forecast 10 steps ahead using the WLS combination method
    # (ETS is used for each series by default)
    fc <- forecast(y, h = 10)
[Diagram: Total splits into A and B; A into AX, AY, AZ; B into BX, BY.]

forecast.gts function
Usage
    forecast(object, h, method = c("comb", "bu", "mo", "tdgsa", "tdgsf", "tdfp"),
      weights = c("wls", "ols", "mint", "nseries"), fmethod = c("ets", "arima", "rw"),
      algorithms = c("lu", "cg", "chol", "recursive", "slm"),
      covariance = c("shr", "sam"), positive = FALSE, parallel = FALSE,
      num.cores = 2, ...)
Arguments
- object: Hierarchical time series object of class gts.
- h: Forecast horizon.
- method: Method for distributing forecasts within the hierarchy.
- weights: Weights used for the "optimal combination" method. When weights = "wls", it takes account of the variances of the base forecasts.
- fmethod: Forecasting method to use.
- algorithms: Method for solving regression equations.
- positive: If TRUE, forecasts are forced to be strictly positive.
- parallel: If TRUE, allow parallel processing.
- num.cores: If parallel = TRUE, specify how many cores are going to be used.

6 Temporal hierarchies

Temporal hierarchies
[Diagram: Annual splits into Semi-Annual1 (Q1, Q2) and Semi-Annual2 (Q3, Q4).]
Basic idea:
- Forecast series at each available frequency.
- Optimally reconcile forecasts within the same year.

Monthly series
Annual → Semi-Annual1 (Q1: M1, M2, M3; Q2: M4, M5, M6), Semi-Annual2 (Q3: M7, M8, M9; Q4: M10, M11, M12)
One hierarchy has levels with 2, 4, 12 nodes; another has levels with 3, 6, 12 nodes. Why not a single hierarchy with 2, 3, 4, 6, 12 nodes?

Monthly series
Annual → FourM1 (BiM1: M1, M2; BiM2: M3, M4), FourM2 (BiM3: M5, M6; BiM4: M7, M8), FourM3 (BiM5: M9, M10; BiM6: M11, M12)

Monthly data
$$
\underbrace{\begin{bmatrix}
\text{A}\\ \text{SemiA1}\\ \text{SemiA2}\\ \text{FourM1}\\ \text{FourM2}\\ \text{FourM3}\\ \text{Q1}\\ \vdots\\ \text{Q4}\\ \text{BiM1}\\ \vdots\\ \text{BiM6}\\ \text{M1}\\ \vdots\\ \text{M12}
\end{bmatrix}}_{(28\times 1)}
=
\underbrace{\begin{bmatrix}
1&1&1&1&1&1&1&1&1&1&1&1\\
1&1&1&1&1&1&0&0&0&0&0&0\\
0&0&0&0&0&0&1&1&1&1&1&1\\
1&1&1&1&0&0&0&0&0&0&0&0\\
0&0&0&0&1&1&1&1&0&0&0&0\\
0&0&0&0&0&0&0&0&1&1&1&1\\
1&1&1&0&0&0&0&0&0&0&0&0\\
&&&&&\vdots&&&&&&\\
0&0&0&0&0&0&0&0&0&1&1&1\\
1&1&0&0&0&0&0&0&0&0&0&0\\
&&&&&\vdots&&&&&&\\
0&0&0&0&0&0&0&0&0&0&1&1\\
\multicolumn{12}{c}{I_{12}}
\end{bmatrix}}_{S}
\underbrace{\begin{bmatrix}
\text{M1}\\ \text{M2}\\ \text{M3}\\ \text{M4}\\ \text{M5}\\ \text{M6}\\ \text{M7}\\ \text{M8}\\ \text{M9}\\ \text{M10}\\ \text{M11}\\ \text{M12}
\end{bmatrix}}_{B_t}
$$
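Rather than typing the 28 × 12 matrix S, it can be generated from the aggregation orders. A minimal base-R sketch, assuming the levels are listed from the coarsest (k = 12, annual) down to k = 1 (monthly), in the same order as the vector on the left; `temporal_S()` is a hypothetical helper, not a thief function.

```r
# Hypothetical helper: summing matrix for a temporal hierarchy over one
# seasonal cycle of m periods. For each aggregation order k (a factor of m)
# it adds m/k rows; row j sums bottom-level periods (j-1)k+1, ..., jk.
temporal_S <- function(m, ks) {
  stopifnot(all(m %% ks == 0))
  block <- function(k) {
    period_block <- (seq_len(m) - 1) %/% k + 1       # which k-block each period falls in
    outer(seq_len(m / k), period_block, "==") * 1    # (m/k) x m indicator matrix
  }
  do.call(rbind, lapply(ks, block))
}

# Monthly data: annual, semi-annual, four-monthly, quarterly, bi-monthly, monthly
S <- temporal_S(12, ks = c(12, 6, 4, 3, 2, 1))
dim(S)   # 28 12
```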

In general
For a time series $y_1,\dots,y_T$ observed at frequency $m$, we generate the aggregate series
$$
y_j^{[k]} = \sum_{t=1+(j-1)k}^{jk} y_t, \qquad j = 1,\dots,T/k,
$$
for each $k \in F(m) = \{\text{factors of } m\}$.
- A single unique hierarchy is only possible when no two factors in $F(m)$ greater than 1 are coprime.
- $M_k = m/k$ is the seasonal period of the aggregated series.
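A minimal base-R sketch of the aggregation formula and of the coprime check. `temporal_aggregate()` assumes the series starts at the beginning of a seasonal cycle and drops leftover periods at the end when k does not divide T (an assumption; the formula as written presumes it does). All helper names are hypothetical. For monthly data the check finds the coprime pairs (2, 3) and (3, 4), which is why the monthly levels split into two separate hierarchies.

```r
# y^[k]_j = sum of y_t for t = 1 + (j - 1)k, ..., jk, for j = 1, ..., floor(T/k)
temporal_aggregate <- function(y, k) {
  n_blocks <- length(y) %/% k
  y <- y[seq_len(n_blocks * k)]          # drop trailing periods that do not fill a block
  colSums(matrix(y, nrow = k))           # column j holds y_{(j-1)k+1}, ..., y_{jk}
}

factors_of <- function(m) which(m %% seq_len(m) == 0)   # F(m)

gcd <- function(a, b) if (b == 0) a else gcd(b, a %% b)

# Coprime pairs of factors greater than 1: each one means the two
# aggregation levels cannot be nested inside a single hierarchy.
coprime_pairs <- function(m) {
  f <- setdiff(factors_of(m), 1)
  if (length(f) < 2) return(list())
  Filter(function(p) gcd(p[1], p[2]) == 1, combn(f, 2, simplify = FALSE))
}

y <- 1:24                      # two years of toy monthly data
temporal_aggregate(y, 12)      # annual totals: 78 222
factors_of(12)                 # 1 2 3 4 6 12
coprime_pairs(12)              # the pairs (2, 3) and (3, 4)
coprime_pairs(4)               # empty list: quarterly data give a single hierarchy
```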

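WLS weights
- Hierarchy variance scaling: $\Lambda_H$ is diagonal, with one estimated variance per node.
- Series variance scaling: $\Lambda_V$ is diagonal, with elements equal within each aggregation level.
- Structural scaling: $\Lambda_S = \operatorname{diag}(S\mathbf{1})$, where each element is the number of bottom-level periods aggregated into that node (the row sums of $S$).
  - Depends only on the seasonal period $m$.
  - Independent of the data and the model.
  - Can be used even when no forecast errors are available.
Quarterly example:
$$
\begin{aligned}
\Lambda_H &= \operatorname{diag}\big(\hat\sigma^2_A,\ \hat\sigma^2_{S_1},\ \hat\sigma^2_{S_2},\ \hat\sigma^2_{Q_1},\ \hat\sigma^2_{Q_2},\ \hat\sigma^2_{Q_3},\ \hat\sigma^2_{Q_4}\big)\\
\Lambda_V &= \operatorname{diag}\big(\hat\sigma^2_A,\ \hat\sigma^2_S,\ \hat\sigma^2_S,\ \hat\sigma^2_Q,\ \hat\sigma^2_Q,\ \hat\sigma^2_Q,\ \hat\sigma^2_Q\big)\\
\Lambda_S &= \operatorname{diag}(4,\ 2,\ 2,\ 1,\ 1,\ 1,\ 1)
\end{aligned}
$$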
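Structural scaling needs nothing but S, which is why it works even without forecast errors. A minimal base-R sketch for the quarterly hierarchy: $\Lambda_S$ is computed as the row sums of S and then used as the weight matrix in the WLS reconciliation $\tilde{y} = S(S^\top \Lambda^{-1} S)^{-1} S^\top \Lambda^{-1}\hat{y}$ for one year of base forecasts. Treating $\Lambda$ as $W$ in this formula is an assumption of the sketch, the base forecasts are invented, and `wls_reconcile()` is a hypothetical helper.

```r
# Quarterly temporal hierarchy: annual, two semi-annual, four quarterly nodes
S <- rbind(rep(1, 4),
           c(1, 1, 0, 0), c(0, 0, 1, 1),
           diag(4))

# Structural scaling: Lambda_S = diag(S 1), i.e. the row sums of S
lambda_S <- as.vector(S %*% rep(1, ncol(S)))
lambda_S   # 4 2 2 1 1 1 1

# Hypothetical helper: WLS reconciliation with diagonal scaling Lambda (used as W)
wls_reconcile <- function(S, yhat, lambda) {
  Linv <- diag(1 / lambda)
  bottom <- solve(t(S) %*% Linv %*% S, t(S) %*% Linv %*% yhat)
  as.vector(S %*% bottom)
}

# Hypothetical base forecasts for one year: A, S1, S2, Q1, Q2, Q3, Q4
yhat <- c(410, 200, 215, 98, 104, 107, 103)
ytilde <- wls_reconcile(S, yhat, lambda_S)
round(ytilde, 2)
```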

UK Accidents and Emergency Demand
[Figure: base and reconciled forecasts at each aggregation level: Annual (k=52), Semi-annual (k=26), Quarterly (k=13), Monthly (k=4), Bi-weekly (k=2), Weekly (k=1).]

UK Accidents and Emergency Demand
1. Type 1 Departments: Major A&E
2. Type 2 Departments: Single Specialty
3. Type 3 Departments: Other A&E/Minor Injury
4. Total Attendances
5. Type 1 Departments: Major A&E > 4 hrs
6. Type 2 Departments: Single Specialty > 4 hrs
7. Type 3 Departments: Other A&E/Minor Injury > 4 hrs
8. Total Attendances > 4 hrs
9. Emergency Admissions via Type 1 A&E
10. Total Emergency Admissions via A&E
11. Other Emergency Admissions (i.e., not via A&E)
12. Total Emergency Admissions
13. Number of patients spending > 4 hrs from decision to admission

UK Accidents and Emergency Demand
- Minimum training set: all data except the last year.
- Base forecasts using auto.arima().
- Reconciled using WLSV.
- Mean Absolute Scaled Errors for 1, 4 and 13 weeks ahead using a rolling origin.

| Aggr. level | h | Base | Reconciled | Change |
|---|---|---|---|---|
| Weekly | 1 | 1.6 | 1.3 | −17.2% |
| Weekly | 4 | 1.9 | 1.5 | −18.6% |
| Weekly | 13 | 2.3 | 1.9 | −16.2% |
| Weekly | 1–52 | 2.0 | 1.9 | −5.0% |
| Annual | 1 | 3.4 | 1.9 | −42.9% |
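For reference, a minimal base-R sketch of the mean absolute scaled error used above: the mean absolute forecast error divided by the in-sample mean absolute error of a naive method on the training data. The slide does not say which seasonal lag was used for scaling, so `m` is left as a parameter (m = 1 gives non-seasonal naive scaling); `mase()` is a hypothetical helper, not the forecast package's `accuracy()`.

```r
# Hypothetical helper: MASE = mean absolute forecast error scaled by the
# in-sample MAE of the (seasonal) naive method with lag m on the training data.
mase <- function(actual, forecast, train, m = 1) {
  scale <- mean(abs(diff(train, lag = m)))
  mean(abs(actual - forecast)) / scale
}

train <- c(10, 12, 11, 13, 12, 14)    # toy training data
mase(actual = c(15, 13), forecast = c(14, 14), train = train)   # 0.625
```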

Experimental setup
- M3 forecasting competition (Makridakis and Hibon, 2000, IJF).
- In total 3003 series.
- 1,428 monthly series with a test sample of 12 observations each.
- 756 quarterly series with a test sample of 8 observations each.
- Forecast each series with ETS models.

Results: Monthly
MAE percent difference relative to base

| Aggregation level | max h | BU | WLSH | WLSV | WLSS |
|---|---|---|---|---|---|
| Annual | 1 | −19.6 | −22.0 | −22.0 | −25.1 |
| Semi-annual | 3 | 0.6 | −4.0 | −3.6 | −5.4 |
| Four-monthly | 4 | 2.0 | −2.4 | −2.2 | −3.0 |
| Quarterly | 6 | 2.4 | −1.6 | −1.7 | −2.8 |
| Bi-monthly | 9 | 0.7 | −2.9 | −3.3 | −4.3 |
| Monthly | 18 | 0.0 | −2.2 | −3.2 | −3.9 |

Results: Quarterly
MAE percent difference relative to base

| Aggregation level | max h | BU | WLSH | WLSV | WLSS |
|---|---|---|---|---|---|
| Annual | 1 | −20.9 | −22.7 | −22.8 | −22.7 |
| Semi-annual | 3 | −4.5 | −6.0 | −6.2 | −4.8 |
| Quarterly | 6 | 0.0 | −0.2 | −1.1 | −0.3 |

thief package for R
thief: Temporal HIErarchical Forecasting
Install from CRAN
install.packages("thief")
Install from GitHub
library(devtools)
install_github("robjhyndman/thief")
Usage
thief(y)


References
Rob J Hyndman, Roman A Ahmed, George Athanasopoulos, and Han Lin Shang (2011). “Optimal combination forecasts for hierarchical time series”. Computational Statistics & Data Analysis 55(9), 2579–2589.
Rob J Hyndman, Alan J Lee, and Earo Wang (2016). “Fast computation of reconciled forecasts for hierarchical and grouped time series”. Computational Statistics & Data Analysis 97, 16–32.
Shanika L Wickramasuriya, George Athanasopoulos, and Rob J Hyndman (2015). “Forecasting hierarchical and grouped time series through trace minimization”. Working paper 15/15, Monash University.
George Athanasopoulos, Rob J Hyndman, Nikolaos Kourentzes, and Fotios Petropoulos (2015). “Forecasting with temporal hierarchies”. Working paper, Monash University.
Rob J Hyndman, Alan J Lee, Earo Wang, and Shanika Wickramasuriya (2016). hts: Hierarchical and Grouped Time Series. R package v5.0 on CRAN.
Rob J Hyndman and Nikolaos Kourentzes (2016). thief: Temporal Hierarchical Forecasting. R package v0.2 on CRAN.

More information: robjhyndman.com
