slide-almanac.pdf

Davis Vaughan
October 08, 2019

A tour of two new R packages, {slide} and {almanac}.

{slide} empowers you to perform arbitrary rolling computations, like cumulative functions, rolling averages, and rolling regressions.

{almanac} provides the tools to construct recurrence rules and schedules, and then adjust dates relative to them. This allows you to shift dates by "3 business days".

Together, they let you perform computations that are highly relevant in a business setting, such as "a rolling average over the past 20 business days".

Transcript

  1. Moving Averages and Calendars. Davis Vaughan (@dvaughan32), Software Engineer, RStudio. October 2019
  2. Window Functions Schedules

  3. Window What?

  4. speakerdeck.com/davisvaughan/slide-almanac

  5. function applied = mean()

  6. function applied = mean(); function applied = sd()

  7. Function applied is arbitrary

  8. Function applied is arbitrary rolling regression = lm()

  9. Types of windows: 1) Sliding 2) Expanding

  10. Idea from: https://eng.uber.com/forecasting-introduction/

  11. Moving averages, rolling regressions… Idea from: https://eng.uber.com/forecasting-introduction/

  12. Idea from: https://eng.uber.com/forecasting-introduction/

  13. Cumulative sums, expanding window regression… Idea from: https://eng.uber.com/forecasting-introduction/

  14. In R?

  15. So many attempts:
    - zoo::rollapply()
    - tibbletime::rollify()
    - tsibble::slide() / stretch()
    - data.table::frollapply() (2019-10-03)
  16. {slide}

  17. slide(1:4, ~.x, .before = 2)

        [[1]] 1
        [[2]] 1 2
        [[3]] 1 2 3
        [[4]] 2 3 4

  18. Ignore partial results

        slide(1:4, ~.x, .before = 2, .complete = TRUE)

        [[1]] NULL
        [[2]] NULL
        [[3]] 1 2 3
        [[4]] 2 3 4

  19. Center alignment

        slide(1:4, ~.x, .before = 1, .after = 1)

        [[1]] 1 2
        [[2]] 1 2 3
        [[3]] 2 3 4
        [[4]] 3 4

  20. Cumulative sliding

        slide(1:4, ~.x, .before = Inf)

        [[1]] 1
        [[2]] 1 2
        [[3]] 1 2 3
        [[4]] 1 2 3 4
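
The window shapes on slides 17 to 20 can be imitated in base R. The sketch below is not how {slide} is implemented; `window_at()` is a hypothetical helper, and its `before`, `after`, and `complete` arguments only mimic the `.before`, `.after`, and `.complete` arguments shown on the slides.

```r
# Hypothetical helper, not part of {slide}: collect the window around each
# position i, spanning i - before through i + after, clipped to the vector.
window_at <- function(x, before = 0, after = 0, complete = FALSE) {
  n <- length(x)
  lapply(seq_len(n), function(i) {
    start <- i - before
    stop <- i + after
    # A window is partial when it would reach past either end of x
    if (complete && (start < 1 || stop > n)) {
      return(NULL)
    }
    x[max(start, 1):min(stop, n)]
  })
}

x <- 1:4
sliding   <- window_at(x, before = 2)                   # slide 17
complete  <- window_at(x, before = 2, complete = TRUE)  # slide 18
centered  <- window_at(x, before = 1, after = 1)        # slide 19
expanding <- window_at(x, before = Inf)                 # slide 20

# Applying a function to each window gives the rolling computation
sales_vec <- c(2, 4, 6, 2)
roll_mean <- vapply(window_at(sales_vec, before = 2), mean, numeric(1))
roll_mean
#> [1] 2 3 4 4
```

Returning a list of windows first and applying the function second keeps the "function applied is arbitrary" idea visible: any summary can be swapped in for `mean`.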
  21. Type Stability: slide(), slide_dbl(), slide_int(), ...

  22. sales_vec
        [1] 2 4 6 2

        slide_dbl(sales_vec, mean, .before = 2)
        [1] 2 3 4 4
  27. index_vec <- as.Date("2019-08-29") + c(0, 1, 5, 6)
        wday_vec <- wday(index_vec, label = TRUE)

        company <- tibble(
          sales = sales_vec,
          index = index_vec,
          wday = wday_vec
        )

          sales index      wday
          <dbl> <date>     <ord>
        1     2 2019-08-29 Thu
        2     4 2019-08-30 Fri
        3     6 2019-09-03 Tue
        4     2 2019-09-04 Wed
  28. “3 day rolling average?”

          sales index      wday
          <dbl> <date>     <ord>
        1     2 2019-08-29 Thu
        2     4 2019-08-30 Fri
        3     6 2019-09-03 Tue
        4     2 2019-09-04 Wed
  29. “3 day rolling average?”

          sales index      wday  roll_day
          <dbl> <date>     <ord>    <dbl>
        1     2 2019-08-29 Thu          2
        2     4 2019-08-30 Fri          3
        3     6 2019-09-03 Tue          6
        4     2 2019-09-04 Wed          4
  34. company <- company %>%
          mutate(
            roll_row = slide_dbl(sales, mean, .before = 2)
          )

          sales index      wday  roll_day roll_row
          <dbl> <date>     <ord>    <dbl>    <dbl>
        1     2 2019-08-29 Thu          2        2
        2     4 2019-08-30 Fri          3        3
        3     6 2019-09-03 Tue          6        4
        4     2 2019-09-04 Wed          4        4
  35. Solution?

        # Construct a regular index
        full_index <- expand(
          company,
          index = full_seq(index, 1)
        )

        # Join with original data
        company_full_raw <- left_join(
          full_index,
          company
        )

        # Slide over this, then filter back down
        company_three_day <- company_full_raw %>%
          mutate(
            roll_day = slide_dbl(
              sales, mean, na.rm = TRUE, .before = 2
            )
          ) %>%
          filter(
            index %in% company$index
          )

        company_full_raw

          index      sales
          <date>     <dbl>
        1 2019-08-29     2
        2 2019-08-30     4
        3 2019-08-31    NA
        4 2019-09-01    NA
        5 2019-09-02    NA
        6 2019-09-03     6
        7 2019-09-04     2

        company_three_day

          index      sales roll_day
          <date>     <dbl>    <dbl>
        1 2019-08-29     2        2
        2 2019-08-30     4        3
        3 2019-09-03     6        6
        4 2019-09-04     2        4

  36. (Same code and tables as slide 35.) I JUST WANT A 3 DAY AVERAGE
  37. slide(.x, .f, …) slide_index(.x, .i, .f, …)

  38. slide_index(.x = wday_vec, .i = index_vec, .f = ~.x, .before = days(2))

        [[1]] Thu
        [[2]] Thu Fri
        [[3]] Tue
        [[4]] Tue Wed

        slide(.x = wday_vec, .f = ~.x, .before = 2)

        [[1]] Thu
        [[2]] Thu Fri
        [[3]] Thu Fri Tue
        [[4]] Fri Tue Wed
  39. company <- company %>%
          mutate(
            roll_day = slide_index_dbl(sales, index, mean, .before = days(2))
          )

          sales index      wday  roll_day
          <dbl> <date>     <ord>    <dbl>
        1     2 2019-08-29 Thu          2
        2     4 2019-08-30 Fri          3
        3     6 2019-09-03 Tue          6
        4     2 2019-09-04 Wed          4
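
To make the difference between row-based and index-based windows concrete, here is a base R sketch of the idea behind slide 39. It is only an illustration under simple assumptions: `roll_mean_by_index()` is a hypothetical helper, not a {slide} function, and it takes the look-back as a plain number of days.

```r
sales_vec <- c(2, 4, 6, 2)
index_vec <- as.Date("2019-08-29") + c(0, 1, 5, 6)

# Hypothetical helper: for each element, average the values whose index
# falls within [index - days_before, index].
roll_mean_by_index <- function(x, index, days_before) {
  vapply(seq_along(x), function(i) {
    in_window <- index >= index[i] - days_before & index <= index[i]
    mean(x[in_window])
  }, numeric(1))
}

roll_day <- roll_mean_by_index(sales_vec, index_vec, days_before = 2)
roll_day
#> [1] 2 3 6 4
```

Because membership is decided by date rather than by row position, the gap between Friday and Tuesday needs no padding with `NA` rows.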
  40. “3 day rolling average?”

  42. “3 business day rolling average?”

  43.   sales index      wday  roll_day roll_bday
          <dbl> <date>     <ord>    <dbl>     <dbl>
        1     2 2019-08-29 Thu          2         2
        2     4 2019-08-30 Fri          3         3
        3     6 2019-09-03 Tue          6         5
        4     2 2019-09-04 Wed          4         4

    For the Tuesday row: 3 bday = [Fri, Mon, Tue]; 3 day = [Sun, Mon, Tue]
  44. Ideally

        calendar <- (weekends + holidays)

        company <- company %>%
          mutate(
            roll_day = slide_index_dbl(sales, index, mean, .before = days(2))
          )

        company <- company %>%
          mutate(
            roll_bday = slide_index_dbl(sales, index, mean, .before = bdays(2, calendar))
          )

  45. (Same code as slide 44.) The calendar “knows” about custom holidays and weekends

  46. (Same code as slide 44.) bdays() “adjusts” dates relative to the calendar
  47. {almanac}

  48. Recurrence rule: A set of conditions that define a recurring event, such as a weekend or holiday.
  49. on_labor_day <- yearly() %>%
          recur_on_ymonth("September") %>%
          recur_on_wday("Monday", nth = 1)

  50. (Same code as slide 49.) yearly(): Base frequency of the event

  51. (Same code as slide 49.) recur_on_ymonth() and recur_on_wday(): Recurrence conditions

  52. sch_in(c("2019-09-02", "2019-09-03"), on_labor_day)
        #> [1] TRUE FALSE

  53. sch_seq("2017-01-01", "2019-12-31", on_labor_day)
        #> [1] "2017-09-04" "2018-09-03" "2019-09-02"
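
A recurrence rule is essentially a predicate on dates. As a rough base R illustration (not {almanac}'s implementation), the Labor Day rule from slides 49 to 53 can be written as a function that checks the month, the weekday, and the position within the month. The name `is_labor_day()` is hypothetical.

```r
# Hypothetical helper: TRUE when a date is the first Monday of September.
is_labor_day <- function(x) {
  x <- as.Date(x)
  month <- as.integer(format(x, "%m"))
  mday  <- as.integer(format(x, "%d"))
  wday  <- as.integer(format(x, "%u"))  # ISO weekday: 1 = Monday, 7 = Sunday
  # The first Monday of a month always falls on day 1 through 7
  month == 9 & wday == 1 & mday <= 7
}

# Membership test, analogous to sch_in() on slide 52
in_rule <- is_labor_day(c("2019-09-02", "2019-09-03"))
in_rule
#> [1]  TRUE FALSE

# Enumerating events, analogous to sch_seq() on slide 53:
# filter a daily sequence by the predicate
days <- seq(as.Date("2017-01-01"), as.Date("2019-12-31"), by = "day")
labor_days <- days[is_labor_day(days)]
format(labor_days)
#> [1] "2017-09-04" "2018-09-03" "2019-09-02"
```

Filtering a daily sequence is the simplest way to enumerate events, though it is wasteful over long ranges compared with generating candidates from the base frequency.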
  54. Schedule: A collection of recurrence rules, required dates, and exclusion dates.
  55. on_labor_day <- yearly() %>%
          recur_on_ymonth("September") %>%
          recur_on_wday("Monday", nth = 1)

        on_christmas <- yearly() %>%
          recur_on_ymonth("December") %>%
          recur_on_mday(25)

        on_weekends <- weekly() %>%
          recur_on_weekends()

        on_weekends_or_holidays <- schedule() %>%
          sch_add_rrule(on_labor_day) %>%
          sch_add_rrule(on_christmas) %>%
          sch_add_rrule(on_weekends)
  56. sch_seq("2019-09-01", "2019-12-31", on_weekends_or_holidays)
        #>  [1] "2019-09-01" "2019-09-02" "2019-09-07" "2019-09-08" "2019-09-14"
        #>  [6] "2019-09-15" "2019-09-21" "2019-09-22" "2019-09-28" "2019-09-29"
        #> ...
        #> [31] "2019-12-14" "2019-12-15" "2019-12-21" "2019-12-22" "2019-12-25"
        #> [36] "2019-12-28" "2019-12-29"
  57. Prebuilt holidays and calendars

        hldy_christmas()
        hldy_easter()
        hldy_thanksgiving()
        ...
        calendar_us_federal()
        calendar_us_nyse()

    * These will probably move to their own

  58. (Same list as slide 57.) Particularly challenging!
  59. # A Monday
        labor_day <- "2019-09-02"

        # Find the next business day?
        # - Sees labor day, adjust by 1 day
        # - Lands on 2019-09-03, done!
        sch_adjust(labor_day, on_weekends_or_holidays)
        #> [1] "2019-09-03"

        # - Sees labor day, adjust by -1 day
        # - Lands on 2019-09-01, a Sunday, adjust by -1 day
        # - Lands on 2019-08-31, a Saturday, adjust by -1 day
        # - Lands on 2019-08-30, done!
        sch_adjust(labor_day, on_weekends_or_holidays, -days(1))
        #> [1] "2019-08-30"

  62. (Same code as slide 59.) The adjustment argument, here -days(1), can also be a function
  63. Modified following: Choose the first business day after x, unless it falls in a different month, in which case the first business day before x is chosen instead.
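
The adjustment rules on slides 59 to 63 can be sketched in base R as follows. This is an illustration only, not {almanac}'s code: the schedule is stood in for by a hypothetical predicate `is_weekend()`, and `adjust()` and `adjust_modified_following()` are hypothetical names.

```r
# Hypothetical stand-in for a schedule: TRUE for Saturdays and Sundays.
is_weekend <- function(x) as.integer(format(x, "%u")) >= 6

# Step one day at a time in `direction` until no date is an event.
adjust <- function(x, is_event, direction = 1) {
  x <- as.Date(x)
  while (any(hit <- is_event(x))) {
    x[hit] <- x[hit] + direction
  }
  x
}

# Modified following: move forward, but fall back to moving backward for
# any date whose forward adjustment lands in a different month.
adjust_modified_following <- function(x, is_event) {
  x <- as.Date(x)
  following <- adjust(x, is_event, 1)
  preceding <- adjust(x, is_event, -1)
  crossed <- format(following, "%Y-%m") != format(x, "%Y-%m")
  following[crossed] <- preceding[crossed]
  following
}

dates <- as.Date(c("2019-09-15", "2019-11-30", "2019-12-31"))
adjusted <- adjust_modified_following(dates, is_weekend)
format(adjusted)
#> [1] "2019-09-16" "2019-11-29" "2019-12-31"
```

Sunday 2019-09-15 rolls forward to Monday, while Saturday 2019-11-30 would roll into December and is therefore pulled back to Friday 2019-11-29.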
  64. on_15th_and_last <- monthly() %>%
          recur_on_mday(c(15, -1))

        payments <- tibble(
          dates = sch_seq("2019-09-01", "2019-12-31", on_15th_and_last),
          wday = wday(dates, label = TRUE)
        )

          dates      wday
          <date>     <ord>
        1 2019-09-15 Sun
        2 2019-09-30 Mon
        3 2019-10-15 Tue
        4 2019-10-31 Thu
        5 2019-11-15 Fri
        6 2019-11-30 Sat
        7 2019-12-15 Sun
        8 2019-12-31 Tue

  65. on_weekends <- weekly() %>%
          recur_on_weekends()

        payments %>%
          mutate(
            adj_dates = sch_adjust(dates, on_weekends, adj_modified_following),
            adj_wday = wday(adj_dates, label = TRUE)
          )

          dates      wday  adj_dates  adj_wday
          <date>     <ord> <date>     <ord>
        1 2019-09-15 Sun   2019-09-16 Mon
        2 2019-09-30 Mon   2019-09-30 Mon
        3 2019-10-15 Tue   2019-10-15 Tue
        4 2019-10-31 Thu   2019-10-31 Thu
        5 2019-11-15 Fri   2019-11-15 Fri
        6 2019-11-30 Sat   2019-11-29 Fri
        7 2019-12-15 Sun   2019-12-16 Mon
        8 2019-12-31 Tue   2019-12-31 Tue
  66. friday_before_labor_day <- "2019-08-30"

        # Move forward two business days?
        # - Steps forward 1 day to Saturday 2019-08-31
        # - Call sch_adjust(), adjusts to Tuesday 2019-09-03
        # - Steps forward 1 day to Wednesday 2019-09-04
        # - Call sch_adjust(), no adjustment needed
        sch_step(
          friday_before_labor_day,
          n = 2,
          schedule = on_weekends_or_holidays
        )
        #> [1] "2019-09-04"
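
The step-then-adjust loop described in the comments on slide 66 might look like this in base R. It is a sketch under simplifying assumptions: the schedule is a hypothetical predicate covering weekends plus the single 2019 Labor Day date, `step_business_days()` is a hypothetical name, and it handles one date at a time.

```r
# Hypothetical stand-in for the on_weekends_or_holidays schedule
is_weekend <- function(x) as.integer(format(x, "%u")) >= 6
is_labor_day_2019 <- function(x) x == as.Date("2019-09-02")
is_non_business_day <- function(x) is_weekend(x) | is_labor_day_2019(x)

# Hypothetical helper for a single date: move one day, skip over events,
# and repeat abs(n) times. A negative n steps backward.
step_business_days <- function(x, n, is_event) {
  x <- as.Date(x)
  direction <- sign(n)
  for (k in seq_len(abs(n))) {
    x <- x + direction
    while (is_event(x)) {
      x <- x + direction
    }
  }
  x
}

stepped <- step_business_days("2019-08-30", 2, is_non_business_day)
stepped
#> [1] "2019-09-04"
```

The first step lands on Saturday and is pushed past Sunday and Labor Day to Tuesday; the second step lands on Wednesday, which needs no adjustment.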
  68. {slide} + {almanac}

  69. “3 business day rolling average?”

  70. Ideally

        calendar <- (weekends + holidays)

        company <- company %>%
          mutate(
            roll_day = slide_index_dbl(sales, index, mean, .before = days(2))
          )

        company <- company %>%
          mutate(
            roll_bday = slide_index_dbl(sales, index, mean, .before = bdays(2, calendar))
          )

  71. (Same code as slide 70.) calendar: We can make this with {almanac}

  72. (Same code as slide 70.) bdays(): This doesn’t exist yet, but would use sch_step()
  73. slide_index(.x, .i, .f, …) slide_between(.x, .i, .starts, .stops, .f, …)

  74. company <- company %>%
          mutate(
            roll_day = slide_index_dbl(sales, index, mean, .before = days(2))
          )

        company <- company %>%
          mutate(
            starts = index - days(2),
            stops = index,
            roll_day = slide_between_dbl(sales, index, mean, .starts = starts, .stops = stops)
          )

          sales index      wday  roll_day
          <dbl> <date>     <ord>    <dbl>
        1     2 2019-08-29 Thu          2
        2     4 2019-08-30 Fri          3
        3     6 2019-09-03 Tue          6
        4     2 2019-09-04 Wed          4

  75. (Same code as slide 74.) starts = index - days(2): This is where we solve our problem
  76. company <- company %>%
          mutate(
            starts = sch_step(index, n = -2, schedule = on_weekends),
            stops = index,
            roll_bday = slide_between_dbl(sales, index, mean, .starts = starts, .stops = stops)
          )

          sales index      wday  roll_day roll_bday
          <dbl> <date>     <ord>    <dbl>     <dbl>
        1     2 2019-08-29 Thu          2         2
        2     4 2019-08-30 Fri          3         3
        3     6 2019-09-03 Tue          6         5
        4     2 2019-09-04 Wed          4         4
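
Putting the two ideas together, the roll_bday column on slide 76 can be reproduced in base R. This is a sketch rather than the packages' implementation: `step_back()` is a hypothetical helper that treats only weekends as non-business days, matching the on_weekends schedule used on the slide.

```r
sales_vec <- c(2, 4, 6, 2)
index_vec <- as.Date("2019-08-29") + c(0, 1, 5, 6)

is_weekend <- function(x) as.integer(format(x, "%u")) >= 6

# Hypothetical helper: step each date back n business days, skipping weekends.
step_back <- function(x, n) {
  for (k in seq_len(n)) {
    x <- x - 1
    while (any(hit <- is_weekend(x))) {
      x[hit] <- x[hit] - 1
    }
  }
  x
}

# Window boundaries: two business days back through the current date
starts <- step_back(index_vec, 2)
stops <- index_vec

# Average the sales whose index falls within each [start, stop] range
roll_bday <- vapply(seq_along(sales_vec), function(i) {
  mean(sales_vec[index_vec >= starts[i] & index_vec <= stops[i]])
}, numeric(1))
roll_bday
#> [1] 2 3 5 4
```

Computing the boundaries separately from the rolling step is the design point of the slide: any rule for choosing `starts` and `stops` can be plugged in without changing how the windows are evaluated.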
  77. “3 business day rolling average?”

  79. In conclusion…

  80. {slide} for window functions; slide_index() to roll relative to an index

  81. {almanac} to build schedules and adjust dates

  82. {slide} + {almanac} = Flexible rolling computations!
  83. Special Thanks
    - JavaScript: rrule https://github.com/jakubroztocil/rrule
    - James Laird-Smith: gs https://github.com/jameslairdsmith/gs
    - Jeroen Ooms: V8 https://github.com/jeroen/V8

  84. Questions?
    - {almanac} GitHub: https://github.com/DavisVaughan/almanac
    - {almanac} Website: https://davisvaughan.github.io/almanac
    - {slide} GitHub: https://github.com/DavisVaughan/slide
    - {slide} Website: https://davisvaughan.github.io/slide