Analysing sub-daily time series data
Rob J Hyndman
Earo Wang
Mitchell O’Hara-Wild
NUMBATS
Non-Uniform Monash Business Analytics Team
Di Cook, Earo Wang, Mitchell O’Hara-Wild
Pedestrian counts
[Figure: Hourly pedestrian traffic at Southern Cross Station. Two panels: the full series (axis labelled Jul 2016 to Apr 2017) and a shorter window (Apr 01 to Jun 01). x-axis: Date; y-axis: Pedestrians counted.]
[Figure: Seasonality in pedestrian traffic at Southern Cross Station. Two panels: Weekday and Weekend. x-axis: Time (of day); y-axis: Total pedestrians counted.]
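The seasonality plot splits pedestrian traffic by weekday versus weekend and by time of day. A minimal sketch of that grouping follows, using only the Python standard library. The function name `hourly_profile` and the synthetic two-week series are assumptions made for illustration. They are not the Southern Cross Station data, and this is not the authors' code.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def hourly_profile(timestamps, counts):
    """Mean count for each (day type, hour of day) pair."""
    totals = defaultdict(float)
    n_obs = defaultdict(int)
    for ts, count in zip(timestamps, counts):
        # Monday is 0, so 5 and 6 are Saturday and Sunday.
        day_type = "Weekend" if ts.weekday() >= 5 else "Weekday"
        key = (day_type, ts.hour)
        totals[key] += count
        n_obs[key] += 1
    return {key: totals[key] / n_obs[key] for key in totals}


# Synthetic hourly series covering 14 days (made-up numbers, illustration only):
# commuter peaks at 08:00 and 17:00 on weekdays, a flat level otherwise.
start = datetime(2017, 4, 3)
timestamps = [start + timedelta(hours=h) for h in range(24 * 14)]
counts = [
    1000 if ts.weekday() < 5 and ts.hour in (8, 17) else 100
    for ts in timestamps
]

profile = hourly_profile(timestamps, counts)
```

Plotting `profile` against hour of day, one panel per day type, gives the kind of view described above. Averaging hides day-to-day variation, so it complements the raw series plot and does not replace it.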
Electricity demand
[Figure: Half-hourly electricity demand for Victoria. Two panels: the full year (Jan 2014 to Jan 2015) and a shorter window (Sep 01 to Nov 01). x-axis: Date; y-axis: Electricity demanded (GW).]
Challenges
Visualization
Even plotting a single time series comprising one year
of data, it is hard to see the interesting features.
R classes
The ts, zoo, xts and other time series classes do not
work well with sub-daily data.
Newer packages (timetk and tibbletime) do not
play nicely with modelling functions.
Earo Wang
Forecasting
Most time series modelling frameworks handle sub-daily data poorly.
Available models include tbats and prophet, but they have limitations.
TBATS model
TBATS
Trigonometric terms for seasonality
Box-Cox transformations for heterogeneity
ARMA errors for short-term dynamics
Trend (possibly damped)
Seasonal (including multiple and non-integer periods)
Handles non-integer seasonality and multiple seasonal periods.
Entirely automated
Prediction intervals often too wide
Very slow on long series
No exogenous predictors
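Two of the TBATS ingredients listed above, trigonometric seasonal terms and the Box-Cox transformation, can be sketched in a few lines. This is only an illustration of the building blocks, not the TBATS model, which embeds them in a state space framework. The function names and the example periods are assumptions chosen for this sketch.

```python
import math


def fourier_terms(t, period, n_harmonics):
    """Sine and cosine pairs for harmonics 1..n_harmonics at time index t.

    The period may be non-integer (for example 365.25 for daily data),
    which is what makes trigonometric seasonality flexible.
    """
    terms = []
    for k in range(1, n_harmonics + 1):
        angle = 2 * math.pi * k * t / period
        terms.extend([math.sin(angle), math.cos(angle)])
    return terms


def box_cox(y, lam):
    """Box-Cox transform of a positive value y with parameter lam."""
    if y <= 0:
        raise ValueError("Box-Cox requires y > 0")
    if lam == 0:
        return math.log(y)
    return (y ** lam - 1) / lam


# Half-hourly data: daily period 48 and weekly period 336 (assumed example).
t = 100
regressors = fourier_terms(t, 48, 3) + fourier_terms(t, 336, 2)
```

Each seasonal period contributes two regressors per harmonic, so the call above yields 10 terms. A small number of harmonics gives a smooth seasonal pattern even when the period is long.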
prophet
Additive regression model developed at Facebook
$y_t = g_t + s_t + h_t + \varepsilon_t$
$y_t$ = time series.
$g_t$ = piecewise linear growth function
$s_t$ = Fourier seasonal terms: daily, weekly and/or yearly
$h_t$ = holiday effect.
$\varepsilon_t$ = error (can be ARMA errors).
Estimated as a Bayesian regression using Stan
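The additive structure above can be made concrete by evaluating each component for given parameters. The sketch below only evaluates the mean (growth plus seasonal plus holiday). It does not estimate anything and does not use the prophet package or Stan. All function names and parameter values are hypothetical, chosen for illustration.

```python
import math


def growth(t, base_rate, offset, changepoints, rate_changes):
    """Continuous piecewise linear trend.

    The slope changes by rate_changes[j] at changepoints[j]; the offset is
    adjusted at each changepoint so the line has no jump there.
    """
    rate = base_rate
    intercept = offset
    for s, delta in zip(changepoints, rate_changes):
        if t >= s:
            rate += delta
            intercept -= s * delta
    return rate * t + intercept


def seasonal(t, period, coeffs):
    """Fourier series; coeffs is a list of (cosine, sine) weights per harmonic."""
    return sum(
        a * math.cos(2 * math.pi * k * t / period)
        + b * math.sin(2 * math.pi * k * t / period)
        for k, (a, b) in enumerate(coeffs, start=1)
    )


def holiday(t, effects):
    """Additive holiday effect; effects maps a time index to its effect."""
    return effects.get(t, 0.0)


def additive_mean(t, trend_args, period, coeffs, effects):
    """Sum of the growth, seasonal and holiday components (no error term)."""
    return growth(t, *trend_args) + seasonal(t, period, coeffs) + holiday(t, effects)


# Hypothetical parameters: slope 1.0 that doubles at t = 10, weekly
# seasonality with one harmonic, and a holiday effect of 3.0 at t = 12.
trend_args = (1.0, 0.0, [10], [1.0])
value = additive_mean(12, trend_args, 7, [(0.5, 2.0)], {12: 3.0})
```

In a fitted model the slopes, Fourier weights and holiday effects would be estimated from data. Here they are fixed by hand to show how the pieces add up.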
Daily blog traffic
[Figure: Daily pageviews for the Hyndsight blog (2014-2015). x-axis: Month; y-axis: Pageviews.]
prophet pros and cons
Pros
Completely automatic including changepoints
Handles multiple seasonality and holiday effects
Cons
Seems to overfit annual seasonality
Number of Fourier terms is hard-coded
Mitchell O’Hara-Wild
Watch this space
https://github.com/earowang/tsibble
http://pkg.earo.me/sugrrants
https://github.com/mitchelloharawild/fasster
http://pkg.robjhyndman.com/forecast
http://pkg.earo.me/hts
Slides available at robjhyndman.com