Slide 1

Slide 1 text

Analysing sub-daily time series data
Rob J Hyndman, Earo Wang, Mitchell O’Hara-Wild

Slide 2

Slide 2 text

NUMBATS: Non-Uniform Monash Business Analytics Team

Slide 3

Slide 3 text

NUMBATS: Non-Uniform Monash Business Analytics Team
Di Cook, Earo Wang, Mitchell O’Hara-Wild

Slide 4

Slide 4 text

Pedestrian counts

Slide 5

Slide 5 text

Pedestrian counts
[Figure: Hourly pedestrian traffic at Southern Cross Station. Two panels: Jul 2016 to Apr 2017, and a zoomed view from Apr 01 to Jun 01. x-axis: Date; y-axis: Pedestrians counted (0 to 4000).]

Slide 6

Slide 6 text

Pedestrian counts
[Figure: Seasonality in pedestrian traffic at Southern Cross Station. Panels: Weekday, Weekend. x-axis: Time (of day); y-axis: Total pedestrians counted (0 to 4000).]
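A weekday/weekend profile like the one in this figure can be computed by grouping hourly counts by hour of day and day type. The sketch below uses base R only and simulated data; the data frame `pedestrians`, its columns and the peak shapes are hypothetical stand-ins, not the real sensor feed or the authors' code.

```r
# Simulated hourly counts for four weeks (hypothetical data).
set.seed(1)
time <- seq(as.POSIXct("2017-04-03 00:00", tz = "UTC"),
            by = "hour", length.out = 24 * 28)
lt <- as.POSIXlt(time)
hour <- lt$hour
# wday is 0 for Sunday and 6 for Saturday, independent of locale.
day_type <- ifelse(lt$wday %in% c(0, 6), "Weekend", "Weekday")
# Commuter peaks around 8am and 5pm on weekdays only, plus Poisson noise.
peak <- exp(-(hour - 8)^2 / 2) + exp(-(hour - 17)^2 / 2)
count <- rpois(length(time),
               lambda = 100 + 3000 * peak * (day_type == "Weekday"))
pedestrians <- data.frame(time, hour, day_type, count)

# Average count for each hour of the day, separately by day type.
profile <- aggregate(count ~ hour + day_type, data = pedestrians, FUN = mean)
```

Plotting `count` against `hour` for each `day_type` in `profile` shows the two commuter peaks on weekdays and a flat weekend profile, which is much easier to read than the raw hourly series.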

Slide 7

Slide 7 text

Call volume
[Figure: 5 minute call volume at North American bank. Two panels: weeks 1 to 33, and a zoomed view of weeks 1 to 4. x-axis: Weeks; y-axis: Call volume.]

Slide 8

Slide 8 text

Electricity demand
[Figure: Half-hourly electricity demand for Victoria. Two panels: Jan 2014 to Jan 2015, and a zoomed view from Sep 01 to Nov 01. x-axis: Date; y-axis: Electricity demanded (GW).]

Slide 9

Slide 9 text

Challenges
Visualization: Even when a single time series comprising one year of data is plotted, it is hard to see the interesting features.

Slide 10

Slide 10 text

Challenges
Visualization: Even when a single time series comprising one year of data is plotted, it is hard to see the interesting features.
R classes: The ts, zoo, xts and other time series classes do not work well with sub-daily data. Newer packages (timetk and tibbletime) do not play nicely with modelling functions.

Slide 11

Slide 11 text

Earo Wang

Slide 13

Slide 13 text

Challenges
Visualization: Even when a single time series comprising one year of data is plotted, it is hard to see the interesting features.
R classes: The ts, zoo, xts and other time series classes do not work well with sub-daily data. Newer packages (timetk and tibbletime) do not play nicely with modelling functions.
Forecasting: Most time series modelling frameworks handle sub-daily data poorly. Available models include tbats and prophet, but they have limitations.

Slide 14

Slide 14 text

TBATS model
TBATS:
    Trigonometric terms for seasonality
    Box-Cox transformations for heterogeneity
    ARMA errors for short-term dynamics
    Trend (possibly damped)
    Seasonal (including multiple and non-integer periods)
Handles non-integer seasonality, multiple seasonal periods.
Entirely automated
Prediction intervals often too wide
Very slow on long series
No exogenous predictors
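The "B" in TBATS is a Box-Cox transformation applied to the series before modelling. As a minimal sketch of that one component only, the base R functions below implement the transformation and its inverse. The names `box_cox` and `inv_box_cox` are hypothetical; the forecast package provides its own `BoxCox()` and `InvBoxCox()`, and `tbats()` selects the parameter automatically. The first number in a fitted model label such as TBATS(0.555, ...) is this Box-Cox parameter.

```r
# Box-Cox transformation: log(y) when lambda is 0,
# otherwise (y^lambda - 1) / lambda. Requires y > 0.
box_cox <- function(y, lambda) {
  if (lambda == 0) log(y) else (y^lambda - 1) / lambda
}

# Inverse transformation, used to map values back to the original scale.
inv_box_cox <- function(w, lambda) {
  if (lambda == 0) exp(w) else (lambda * w + 1)^(1 / lambda)
}

y <- c(50, 100, 200, 400)        # illustrative positive values
w <- box_cox(y, lambda = 0.5)    # transformed scale
y_back <- inv_box_cox(w, lambda = 0.5)
```

Values of lambda below 1 compress large observations more than small ones, which helps when the variation in a series grows with its level.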

Slide 15

Slide 15 text

TBATS model
    library(forecast)
    calls %>% tbats %>% forecast %>% autoplot(include=2500)
[Figure: Forecasts from TBATS(0.555, {0,0}, -, {<169,6>, <845,4>}), with 80% and 95% prediction intervals. x-axis: Time (weeks 31 to 36).]

Slide 16

Slide 16 text

prophet
Additive regression model developed at Facebook
    $y_t = g_t + s_t + h_t + \varepsilon_t$
    $y_t$ = time series
    $g_t$ = piecewise linear growth function
    $s_t$ = Fourier seasonal terms: daily, weekly and/or yearly
    $h_t$ = holiday effect
    $\varepsilon_t$ = error (can be ARMA errors)
Estimated as a Bayesian regression using Stan
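To make the Fourier seasonal terms concrete: a seasonal component with period P can be written as a sum of K sine and cosine pairs, sin(2πkt/P) and cos(2πkt/P) for k = 1, ..., K. The sketch below builds such terms in base R for a daily and a weekly period and fits them by ordinary least squares. This is a simplification for illustration, not prophet's Bayesian estimation in Stan; the helper `fourier_terms` and the simulated series are hypothetical.

```r
# Build K sine/cosine pairs for a seasonal period (measured in observations).
fourier_terms <- function(t, period, K) {
  cols <- lapply(seq_len(K), function(k) {
    cbind(sin(2 * pi * k * t / period), cos(2 * pi * k * t / period))
  })
  X <- do.call(cbind, cols)
  colnames(X) <- paste0(rep(c("S", "C"), K),
                        rep(seq_len(K), each = 2), "_", period)
  X
}

# Simulated hourly series with daily (24) and weekly (168) patterns.
set.seed(42)
t <- 1:(24 * 7 * 8)
y <- 10 + 3 * sin(2 * pi * t / 24) + 2 * cos(2 * pi * t / 168) +
  rnorm(length(t), sd = 0.5)

# Least-squares fit of an intercept plus Fourier terms for both periods.
X <- cbind(fourier_terms(t, 24, K = 3), fourier_terms(t, 168, K = 2))
fit <- lm(y ~ X)
```

The fitted coefficients on the first daily sine term and the first weekly cosine term should be close to the values used in the simulation, while the remaining terms should be near zero. Larger K allows a more flexible seasonal shape at the cost of more parameters.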

Slide 17

Slide 17 text

Daily blog traffic
[Figure: Daily pageviews for the Hyndsight blog (2014-2015). x-axis: Month (May to Apr); y-axis: Pageviews.]

Slide 18

Slide 18 text

prophet example
    library(prophet)
    # hyndsight must be a data frame with columns ds (dates) and y (values)
    m <- prophet(hyndsight)
    future <- make_future_dataframe(m, periods = 365)
    forecast <- predict(m, future)
    plot(m, forecast)
[Figure: prophet fit and forecasts for the daily pageviews, extending to 2016. x-axis: ds; y-axis: y.]
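The `prophet()` call expects a data frame with a date column `ds` and a numeric column `y`, so a series stored as a `ts` object needs converting first. The sketch below shows one way to do that in base R. The series `pageviews` is simulated, and the start date is an assumption to be replaced with the true first date of the data.

```r
# Hypothetical stand-in for a daily pageview series stored as a ts object.
set.seed(123)
pageviews <- ts(
  round(1500 + 300 * sin(2 * pi * (1:365) / 7) + rnorm(365, sd = 100)),
  frequency = 7
)

# Assumed first date of the series; replace with the real one.
start_date <- as.Date("2014-04-30")

# One row per day: ds holds the dates, y holds the observed values.
df <- data.frame(
  ds = seq(start_date, by = "day", length.out = length(pageviews)),
  y = as.numeric(pageviews)
)

# With the prophet package installed, the model could then be fitted as:
# m <- prophet::prophet(df)
```

Building the dates explicitly avoids relying on the `ts` time index, which stores positions relative to the seasonal frequency rather than calendar dates.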

Slide 19

Slide 19 text

prophet pros and cons
Pros:
    Completely automatic including changepoints
    Handles multiple seasonality and holiday effects
Cons:
    Seems to overfit annual seasonality
    Number of Fourier terms is hard-coded

Slide 20

Slide 20 text

Mitchell O’Hara-Wild

Slide 21

Slide 21 text

Watch this space
https://github.com/earowang/tsibble
http://pkg.earo.me/sugrrants
https://github.com/mitchelloharawild/fasster
http://pkg.robjhyndman.com/forecast
http://pkg.earo.me/hts
Slides available at robjhyndman.com