Slide 1

Slide 1 text

Moving Averages and Calendars Davis Vaughan @dvaughan32 Software Engineer, RStudio October 2019 

Slide 2

Slide 2 text

Window Functions Schedules

Slide 3

Slide 3 text

Window What?

Slide 4

Slide 4 text

speakerdeck.com/davisvaughan/slide-almanac

Slide 5

Slide 5 text

speakerdeck.com/davisvaughan/slide-almanac function applied = mean()

Slide 6

Slide 6 text

speakerdeck.com/davisvaughan/slide-almanac function applied = mean() function applied = sd()

Slide 7

Slide 7 text

Function applied is arbitrary

Slide 8

Slide 8 text

Function applied is arbitrary rolling regression = lm()

Slide 9

Slide 9 text

Types of windows: 1) Sliding 2) Expanding

Slide 10

Slide 10 text

speakerdeck.com/davisvaughan/slide-almanac Idea from: https://eng.uber.com/forecasting-introduction/

Slide 11

Slide 11 text

speakerdeck.com/davisvaughan/slide-almanac Moving averages, rolling regressions… Idea from: https://eng.uber.com/forecasting-introduction/

Slide 12

Slide 12 text

speakerdeck.com/davisvaughan/slide-almanac Idea from: https://eng.uber.com/forecasting-introduction/

Slide 13

Slide 13 text

speakerdeck.com/davisvaughan/slide-almanac Cumulative sums, expanding window regression… Idea from: https://eng.uber.com/forecasting-introduction/

Slide 14

Slide 14 text

In R?

Slide 15

Slide 15 text

speakerdeck.com/davisvaughan/slide-almanac So many attempts: - zoo::rollapply() - tibbletime::rollify() - tsibble::slide() / stretch() - data.table::frollapply() (2019-10-03)

Slide 16

Slide 16 text

{slide}

Slide 17

Slide 17 text

speakerdeck.com/davisvaughan/slide-almanac slide(1:4, ~.x, .before = 2) 3 2 1 [[3]] 2 1 [[2]] 4 3 2 [[4]] 1 [[1]]

Slide 18

Slide 18 text

speakerdeck.com/davisvaughan/slide-almanac 3 2 1 [[3]] NULL [[2]] 4 3 2 [[4]] NULL [[1]] slide(1:4, ~.x, .before = 2, .complete = TRUE) Ignore partial results

Slide 19

Slide 19 text

speakerdeck.com/davisvaughan/slide-almanac slide(1:4, ~.x, .before = 1, .after = 1) 4 3 2 [[3]] 4 3 [[4]] 3 2 1 [[2]] 2 1 [[1]] Center alignment

Slide 20

Slide 20 text

speakerdeck.com/davisvaughan/slide-almanac slide(1:4, ~.x, .before = Inf) 3 2 1 [[3]] 2 1 [[2]] 4 3 2 1 [[4]] 1 [[1]] Cumulative sliding

Slide 21

Slide 21 text

speakerdeck.com/davisvaughan/slide-almanac slide() slide_dbl() slide_int() ... Type Stability

Slide 22

Slide 22 text

speakerdeck.com/davisvaughan/slide-almanac sales_vec [1] 2 4 6 2 slide_dbl(sales_vec, mean, .before = 2) [1] 2 3 4 4

Slide 23

Slide 23 text

speakerdeck.com/davisvaughan/slide-almanac sales_vec [1] 2 4 6 2 slide_dbl(sales_vec, mean, .before = 2) [1] 2 3 4 4

Slide 24

Slide 24 text

speakerdeck.com/davisvaughan/slide-almanac sales_vec [1] 2 4 6 2 slide_dbl(sales_vec, mean, .before = 2) [1] 2 3 4 4

Slide 25

Slide 25 text

speakerdeck.com/davisvaughan/slide-almanac sales_vec [1] 2 4 6 2 slide_dbl(sales_vec, mean, .before = 2) [1] 2 3 4 4

Slide 26

Slide 26 text

speakerdeck.com/davisvaughan/slide-almanac sales_vec [1] 2 4 6 2 slide_dbl(sales_vec, mean, .before = 2) [1] 2 3 4 4

Slide 27

Slide 27 text

speakerdeck.com/davisvaughan/slide-almanac index_vec <- as.Date("2019-08-29") + c(0, 1, 5, 6) wday_vec <- wday(index_vec, label = TRUE) company <- tibble( sales = sales_vec, index = index_vec, wday = wday_vec ) 4 2 Wed 2019-09-04 1 3 2 2019-09-03 Tue 6 2019-08-30 Fri 4 2 Thu 2019-08-29 sales wday index

Slide 28

Slide 28 text

speakerdeck.com/davisvaughan/slide-almanac “3 day rolling average?” 4 2 Wed 2019-09-04 1 3 2 2019-09-03 Tue 6 2019-08-30 Fri 4 2 Thu 2019-08-29 sales wday index

Slide 29

Slide 29 text

speakerdeck.com/davisvaughan/slide-almanac “3 day rolling average?” 3 6 4 roll_day 2 4 2 Wed 2019-09-04 1 3 2 2019-09-03 Tue 6 2019-08-30 Fri 4 2 Thu 2019-08-29 sales wday index

Slide 30

Slide 30 text

speakerdeck.com/davisvaughan/slide-almanac “3 day rolling average?” 3 6 4 roll_day 2 4 2 Wed 2019-09-04 1 3 2 2019-09-03 Tue 6 2019-08-30 Fri 4 2 Thu 2019-08-29 sales wday index

Slide 31

Slide 31 text

speakerdeck.com/davisvaughan/slide-almanac “3 day rolling average?” 3 6 4 roll_day 2 4 2 Wed 2019-09-04 1 3 2 2019-09-03 Tue 6 2019-08-30 Fri 4 2 Thu 2019-08-29 sales wday index

Slide 32

Slide 32 text

speakerdeck.com/davisvaughan/slide-almanac “3 day rolling average?” 3 6 4 roll_day 2 4 2 Wed 2019-09-04 1 3 2 2019-09-03 Tue 6 2019-08-30 Fri 4 2 Thu 2019-08-29 sales wday index

Slide 33

Slide 33 text

speakerdeck.com/davisvaughan/slide-almanac “3 day rolling average?” 3 6 4 roll_day 2 4 2 Wed 2019-09-04 1 3 2 2019-09-03 Tue 6 2019-08-30 Fri 4 2 Thu 2019-08-29 sales wday index

Slide 34

Slide 34 text

speakerdeck.com/davisvaughan/slide-almanac company <- company %>% mutate( roll_row = slide_dbl(sales, mean, .before = 2) ) roll_row 3 4 4 2 3 6 4 roll_day 2 4 2 Wed 2019-09-04 1 3 2 2019-09-03 Tue 6 2019-08-30 Fri 4 2 Thu 2019-08-29 sales wday index

Slide 35

Slide 35 text

speakerdeck.com/davisvaughan/slide-almanac # Construct a regular index full_index <- expand( company, index = full_seq(index, 1) ) # Join with original data company_full_raw <- left_join( full_index, company ) # Slide over this, then filter back down company_three_day <- company_full_raw %>% mutate( roll_day = slide_dbl( sales, mean, na.rm = TRUE, .before = 2 ) ) %>% filter( index %in% company$index ) Solution? 7 2019-09-04 2 6 6 2019-09-03 2019-09-02 5 NA 4 NA 2019-09-01 1 3 2 2019-08-31 NA 2019-08-30 4 2 2019-08-29 sales index 3 6 4 roll_day 2 4 2 2019-09-04 1 3 2 2019-09-03 6 2019-08-30 4 2 2019-08-29 sales index company_full_raw company_three_day

Slide 36

Slide 36 text

speakerdeck.com/davisvaughan/slide-almanac # Construct a regular index full_index <- expand( company, index = full_seq(index, 1) ) # Join with original data company_full_raw <- left_join( full_index, company ) # Slide over this, then filter back down company_three_day <- company_full_raw %>% mutate( roll_day = slide_dbl( sales, mean, na.rm = TRUE, .before = 2 ) ) %>% filter( index %in% company$index ) Solution? 7 2019-09-04 2 6 6 2019-09-03 2019-09-02 5 NA 4 NA 2019-09-01 1 3 2 2019-08-31 NA 2019-08-30 4 2 2019-08-29 sales index 3 6 4 roll_day 2 4 2 2019-09-04 1 3 2 2019-09-03 6 2019-08-30 4 2 2019-08-29 sales index company_full_raw company_three_day I JUST WANT A 3 DAY AVERAGE

Slide 37

Slide 37 text

slide(.x, .f, …) slide_index(.x, .i, .f, …)

Slide 38

Slide 38 text

speakerdeck.com/davisvaughan/slide-almanac slide_index( .x = wday_vec, .i = index_vec, .f = ~.x, .before = days(2) ) Tue [[3]] Fri Thu [[2]] Wed Tue [[4]] Thu [[1]] slide( .x = wday_vec, .f = ~.x, .before = 2 ) Tue Fri Thu [[3]] Fri Thu [[2]] Wed Tue Fri [[4]] Thu [[1]]

Slide 39

Slide 39 text

speakerdeck.com/davisvaughan/slide-almanac company <- company %>% mutate( roll_day = slide_index_dbl(sales, index, mean, .before = days(2)) ) 3 6 4 roll_day 2 4 2 Wed 2019-09-04 1 3 2 2019-09-03 Tue 6 2019-08-30 Fri 4 2 Thu 2019-08-29 sales wday index

Slide 40

Slide 40 text

speakerdeck.com/davisvaughan/slide-almanac “3 day rolling average?”

Slide 41

Slide 41 text

speakerdeck.com/davisvaughan/slide-almanac “3 day rolling average?”

Slide 42

Slide 42 text

speakerdeck.com/davisvaughan/slide-almanac “3 business day rolling average?”

Slide 43

Slide 43 text

speakerdeck.com/davisvaughan/slide-almanac roll_day 3 4 6 2 3 5 4 roll_bday 2 4 2 Wed 2019-09-04 1 3 2 2019-09-03 Tue 6 2019-08-30 Fri 4 2 Thu 2019-08-29 sales wday index 3 bday = [Fri, Mon, Tue] 3 day = [Sun, Mon, Tue]

Slide 44

Slide 44 text

speakerdeck.com/davisvaughan/slide-almanac calendar <- (weekends + holidays) company <- company %>% mutate( roll_day = slide_index_dbl(sales, index, mean, .before = days(2)) ) company <- company %>% mutate( roll_bday = slide_index_dbl(sales, index, mean, .before = bdays(2, calendar)) ) Ideally

Slide 45

Slide 45 text

speakerdeck.com/davisvaughan/slide-almanac calendar <- (weekends + holidays) company <- company %>% mutate( roll_day = slide_index_dbl(sales, index, mean, .before = days(2)) ) company <- company %>% mutate( roll_bday = slide_index_dbl(sales, index, mean, .before = bdays(2, calendar)) ) Ideally “knows” about custom holidays and weekends

Slide 46

Slide 46 text

speakerdeck.com/davisvaughan/slide-almanac calendar <- (weekends + holidays) company <- company %>% mutate( roll_day = slide_index_dbl(sales, index, mean, .before = days(2)) ) company <- company %>% mutate( roll_bday = slide_index_dbl(sales, index, mean, .before = bdays(2, calendar)) ) Ideally “knows” about custom holidays and weekends “adjusts” dates relative to the calendar

Slide 47

Slide 47 text

{almanac}

Slide 48

Slide 48 text

Recurrence rule: A set of conditions that define a recurring event, such as a weekend or holiday.

Slide 49

Slide 49 text

speakerdeck.com/davisvaughan/slide-almanac on_labor_day <- yearly() %>% recur_on_ymonth(“September”) %>% recur_on_wday(“Monday”, nth = 1)

Slide 50

Slide 50 text

speakerdeck.com/davisvaughan/slide-almanac on_labor_day <- yearly() %>% recur_on_ymonth(“September”) %>% recur_on_wday(“Monday”, nth = 1) Base frequency of the event

Slide 51

Slide 51 text

speakerdeck.com/davisvaughan/slide-almanac on_labor_day <- yearly() %>% recur_on_ymonth(“September”) %>% recur_on_wday(“Monday”, nth = 1) Base frequency of the event Recurrence conditions

Slide 52

Slide 52 text

speakerdeck.com/davisvaughan/slide-almanac on_labor_day <- yearly() %>% recur_on_ymonth(“September”) %>% recur_on_wday(“Monday”, nth = 1) Base frequency of the event Recurrence conditions sch_in(c("2019-09-02", "2019-09-03"), on_labor_day) #> [1] TRUE FALSE

Slide 53

Slide 53 text

speakerdeck.com/davisvaughan/slide-almanac on_labor_day <- yearly() %>% recur_on_ymonth(“September”) %>% recur_on_wday(“Monday”, nth = 1) Base frequency of the event Recurrence conditions sch_in(c("2019-09-02", "2019-09-03"), on_labor_day) #> [1] TRUE FALSE sch_seq("2017-01-01", "2019-12-31", on_labor_day) #> [1] "2017-09-04" "2018-09-03" "2019-09-02"

Slide 54

Slide 54 text

Schedule: A collection of recurrence rules, required dates, and exclusion dates.

Slide 55

Slide 55 text

speakerdeck.com/davisvaughan/slide-almanac on_labor_day <- yearly() %>% recur_on_ymonth(“September”) %>% recur_on_wday(“Monday”, nth = 1) on_christmas <- yearly() %>% recur_on_ymonth(“December”) %>% recur_on_mday(25) on_weekends <- weekly() %>% recur_on_weekends() on_weekends_or_holidays <- schedule() %>% sch_add_rrule(on_labor_day) %>% sch_add_rrule(on_christmas) %>% sch_add_rrule(on_weekends)

Slide 56

Slide 56 text

speakerdeck.com/davisvaughan/slide-almanac sch_seq("2019-09-01", "2019-12-31", on_weekends_or_holidays) #> [1] "2019-09-01" "2019-09-02" "2019-09-07" "2019-09-08" "2019-09-14" #> [6] "2019-09-15" "2019-09-21" "2019-09-22" "2019-09-28" "2019-09-29" #> ... #> [31] "2019-12-14" "2019-12-15" "2019-12-21" "2019-12-22" "2019-12-25" #> [36] "2019-12-28" "2019-12-29"

Slide 57

Slide 57 text

speakerdeck.com/davisvaughan/slide-almanac * These will probably move to their own hldy_christmas() hldy_easter() hldy_thanksgiving() ... calendar_us_federal() calendar_us_nyse() Prebuilt holidays and calendars

Slide 58

Slide 58 text

speakerdeck.com/davisvaughan/slide-almanac * These will probably move to their own hldy_christmas() hldy_easter() hldy_thanksgiving() ... calendar_us_federal() calendar_us_nyse() Prebuilt holidays and calendars Particularly challenging!

Slide 59

Slide 59 text

speakerdeck.com/davisvaughan/slide-almanac # A Monday labor_day <- “2019-09-02" # Find the next business day? # - Sees labor day, adjust by 1 day # - Lands on 2019-09-03, done! sch_adjust(labor_day, on_weekends_or_holidays) #> [1] "2019-09-03" # - Sees labor day, adjust by -1 day # - Lands on 2019-09-01, a Sunday, adjust by -1 day # - Lands on 2019-08-31, a Saturday, adjust by -1 day # - Lands on 2019-08-30, done! sch_adjust(labor_day, on_weekends_or_holidays, -days(1)) #> [1] “2019-08-30"

Slide 60

Slide 60 text

speakerdeck.com/davisvaughan/slide-almanac # A Monday labor_day <- “2019-09-02" # Find the next business day? # - Sees labor day, adjust by 1 day # - Lands on 2019-09-03, done! sch_adjust(labor_day, on_weekends_or_holidays) #> [1] "2019-09-03" # - Sees labor day, adjust by -1 day # - Lands on 2019-09-01, a Sunday, adjust by -1 day # - Lands on 2019-08-31, a Saturday, adjust by -1 day # - Lands on 2019-08-30, done! sch_adjust(labor_day, on_weekends_or_holidays, -days(1)) #> [1] “2019-08-30"

Slide 61

Slide 61 text

speakerdeck.com/davisvaughan/slide-almanac # A Monday labor_day <- “2019-09-02" # Find the next business day? # - Sees labor day, adjust by 1 day # - Lands on 2019-09-03, done! sch_adjust(labor_day, on_weekends_or_holidays) #> [1] "2019-09-03" # - Sees labor day, adjust by -1 day # - Lands on 2019-09-01, a Sunday, adjust by -1 day # - Lands on 2019-08-31, a Saturday, adjust by -1 day # - Lands on 2019-08-30, done! sch_adjust(labor_day, on_weekends_or_holidays, -days(1)) #> [1] “2019-08-30"

Slide 62

Slide 62 text

speakerdeck.com/davisvaughan/slide-almanac # A Monday labor_day <- “2019-09-02" # Find the next business day? # - Sees labor day, adjust by 1 day # - Lands on 2019-09-03, done! sch_adjust(labor_day, on_weekends_or_holidays) #> [1] "2019-09-03" # - Sees labor day, adjust by -1 day # - Lands on 2019-09-01, a Sunday, adjust by -1 day # - Lands on 2019-08-31, a Saturday, adjust by -1 day # - Lands on 2019-08-30, done! sch_adjust(labor_day, on_weekends_or_holidays, -days(1)) #> [1] “2019-08-30" This can also be a function

Slide 63

Slide 63 text

Modified following: Choose the first business day after x, unless it falls in a different month, in which case the first business day before x is chosen instead.

Slide 64

Slide 64 text

speakerdeck.com/davisvaughan/slide-almanac on_15th_and_last <- monthly() %>% recur_on_mday(c(15, -1)) payments <- tibble( dates = sch_seq("2019-09-01", "2019-12-31", on_15th_and_last), wday = wday(dates, label = TRUE) ) 2019-12-31 2019-12-31 Tue 8 Tue 2019-12-15 7 Sun Mon 2019-12-16 6 2019-11-30 Sat Fri 2019-11-29 2019-11-15 Fri 2019-11-15 Fri 5 adj_wday Mon Thu Tue Mon 2019-09-30 2019-10-15 2019-10-31 adj_dates 2019-09-16 4 Thu 2019-10-31 1 3 2 2019-10-15 Tue 2019-09-30 Mon Sun 2019-09-15 wday dates

Slide 65

Slide 65 text

speakerdeck.com/davisvaughan/slide-almanac 2019-12-31 2019-12-31 Tue 8 Tue 2019-12-15 7 Sun Mon 2019-12-16 6 2019-11-30 Sat Fri 2019-11-29 2019-11-15 Fri 2019-11-15 Fri 5 adj_wday Mon Thu Tue Mon 2019-09-30 2019-10-15 2019-10-31 adj_dates 2019-09-16 4 Thu 2019-10-31 1 3 2 2019-10-15 Tue 2019-09-30 Mon Sun 2019-09-15 wday dates on_weekends <- weekly() %>% recur_on_weekends() payments %>% mutate( adj_dates = sch_adjust(dates, on_weekends, adj_modified_following), adj_wday = wday(adj_dates, label = TRUE) )

Slide 66

Slide 66 text

speakerdeck.com/davisvaughan/slide-almanac friday_before_labor_day <- “2019-08-30" # Move forward two business days? # - Steps forward 1 day to Saturday 2019-08-31 # - Call sch_adjust(), adjusts to Tuesday 2019-09-03 # - Steps forward 1 day to Wednesday 2019-09-04 # - Call sch_adjust(), no adjustment needed sch_step( friday_before_labor_day, n = 2, schedule = on_weekends_or_holidays ) #> [1] “2019-09-04"

Slide 67

Slide 67 text

speakerdeck.com/davisvaughan/slide-almanac friday_before_labor_day <- “2019-08-30" # Move forward two business days? # - Steps forward 1 day to Saturday 2019-08-31 # - Call sch_adjust(), adjusts to Tuesday 2019-09-03 # - Steps forward 1 day to Wednesday 2019-09-04 # - Call sch_adjust(), no adjustment needed sch_step( friday_before_labor_day, n = 2, schedule = on_weekends_or_holidays ) #> [1] “2019-09-04"

Slide 68

Slide 68 text

{slide} + {almanac}

Slide 69

Slide 69 text

speakerdeck.com/davisvaughan/slide-almanac “3 business day rolling average?”

Slide 70

Slide 70 text

speakerdeck.com/davisvaughan/slide-almanac calendar <- (weekends + holidays) company <- company %>% mutate( roll_day = slide_index_dbl(sales, index, mean, .before = days(2)) ) company <- company %>% mutate( roll_bday = slide_index_dbl(sales, index, mean, .before = bdays(2, calendar)) ) Ideally

Slide 71

Slide 71 text

speakerdeck.com/davisvaughan/slide-almanac calendar <- (weekends + holidays) company <- company %>% mutate( roll_day = slide_index_dbl(sales, index, mean, .before = days(2)) ) company <- company %>% mutate( roll_bday = slide_index_dbl(sales, index, mean, .before = bdays(2, calendar)) ) Ideally We can make this with {almanac}

Slide 72

Slide 72 text

speakerdeck.com/davisvaughan/slide-almanac calendar <- (weekends + holidays) company <- company %>% mutate( roll_day = slide_index_dbl(sales, index, mean, .before = days(2)) ) company <- company %>% mutate( roll_bday = slide_index_dbl(sales, index, mean, .before = bdays(2, calendar)) ) Ideally We can make this with {almanac} This doesn’t exist yet, but would use sch_step()

Slide 73

Slide 73 text

slide_index(.x, .i, .f, …) slide_between(.x, .i, .starts, .stops, .f, …)

Slide 74

Slide 74 text

speakerdeck.com/davisvaughan/slide-almanac company <- company %>% mutate( roll_day = slide_index_dbl(sales, index, mean, .before = days(2)) ) company <- company %>% mutate( starts = index - days(2), stops = index, roll_day = slide_between_dbl(sales, index, mean, .starts = starts, .stops = stops) ) 3 6 4 roll_day 2 4 2 Wed 2019-09-04 1 3 2 2019-09-03 Tue 6 2019-08-30 Fri 4 2 Thu 2019-08-29 sales wday index

Slide 75

Slide 75 text

speakerdeck.com/davisvaughan/slide-almanac company <- company %>% mutate( roll_day = slide_index_dbl(sales, index, mean, .before = days(2)) ) company <- company %>% mutate( starts = index - days(2), stops = index, roll_day = slide_between_dbl(sales, index, mean, .starts = starts, .stops = stops) ) 3 6 4 roll_day 2 4 2 Wed 2019-09-04 1 3 2 2019-09-03 Tue 6 2019-08-30 Fri 4 2 Thu 2019-08-29 sales wday index This is where we solve our problem

Slide 76

Slide 76 text

speakerdeck.com/davisvaughan/slide-almanac company <- company %>% mutate( starts = sch_step(index, n = -2, schedule = on_weekends), stops = index, roll_bday = slide_between_dbl(sales, index, mean, .starts = starts, .stops = stops) ) roll_day 3 4 6 2 3 5 4 roll_bday 2 4 2 Wed 2019-09-04 1 3 2 2019-09-03 Tue 6 2019-08-30 Fri 4 2 Thu 2019-08-29 sales wday index

Slide 77

Slide 77 text

speakerdeck.com/davisvaughan/slide-almanac “3 business day rolling average?”

Slide 78

Slide 78 text

speakerdeck.com/davisvaughan/slide-almanac “3 business day rolling average?”

Slide 79

Slide 79 text

In conclusion…

Slide 80

Slide 80 text

{slide} for window functions slide_index() to roll relative to an index

Slide 81

Slide 81 text

{slide} for window functions slide_index() to roll relative to an index {almanac} to build schedules and adjust dates

Slide 82

Slide 82 text

{slide} for window functions slide_index() to roll relative to an index {slide} + {almanac} = Flexible rolling computations! {almanac} to build schedules and adjust dates

Slide 83

Slide 83 text

Special Thanks JavaScript: rrule https://github.com/jakubroztocil/rrule James Laird-Smith: gs https://github.com/jameslairdsmith/gs Jeroen Ooms: V8 https://github.com/jeroen/V8

Slide 84

Slide 84 text

Questions? {almanac} GitHub https://github.com/DavisVaughan/almanac Website https://davisvaughan.github.io/almanac {slide} GitHub https://github.com/DavisVaughan/slide Website https://davisvaughan.github.io/slide