Functions: using, writing, and non-standard evaluation!!

Daniel Chen

December 08, 2018

Transcript

  1. Hi!

    Research Scientist @ UVA
    PhD Student @ VT
    Instructor @ DataCamp @ The Carpentries
    Author: Pandas for Everyone
  2. Functions

    They look something like this.

        my_function <- function(param1, param2, ...) {
          # function body
        }

        my_add <- function(x, y = 1) {
          return(x + y)
        }

        my_add(10)
        ## [1] 11
        my_add(10, 20)
        ## [1] 30
  3. Functions are just like other R objects

        my_add(10, 20)
        ## [1] 30

        another_add <- my_add
        another_add(10, 20)
        ## [1] 30
  4. Anonymous functions

    Functions that are not assigned to a name.

        # note the round brackets around the function definition
        (function(x, y = 1) {x + y})(10, 20)
        ## [1] 30
  5. Scoping

    Things that happen in a function stay in the function.

        add_1 <- function(x) {
          y <- 1
          x + y
        }

        x <- 1
        add_1(x)
        ## [1] 2
        add_1(10)
        ## [1] 11

        y <- 10
        add_1(100) # returns 101 not 110
        ## [1] 101
  6. Scoping search order

    Things inside the function first.

        my_func1 <- function() {
          x <- 10
          y <- 20
          return(x + y)
        }

        my_func1()
        ## [1] 30
  7. Scoping search order

    Then the enclosing environment (e.g., global).

        x <- 100
        my_func_2 <- function() {
          y <- 200
          return(x + y)
        }

        my_func_2()
        ## [1] 300

        rm(x)
        my_func_2()
        ## Error in my_func_2(): object 'x' not found
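
The same search order applies when the enclosing environment is another function rather than the global environment. A minimal base R sketch; `make_adder` and `add_5` are made-up names for illustration and do not appear in the talk:

```r
# make_adder() is a hypothetical helper used only for illustration.
make_adder <- function(n) {
  # The inner function has no `n` of its own, so R looks it up in the
  # environment where the inner function was created (this call).
  function(x) {
    x + n
  }
}

n <- 1000               # a global `n`; the closure below does not use it
add_5 <- make_adder(5)
result <- add_5(10)
print(result)
```

The global `n` is never consulted because the lookup stops at the first enclosing environment that defines `n`.
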
  8. When you find yourself repeating yourself...

    ... more than 3 times, write a function!

        library(urltools)
        library(magrittr)

        repos <- c(
          'https://sourceforge.net/projects/fortranxunit/',
          'https://ti.arc.nasa.gov/opensource/projects/pour/',
          'https://github.com/nasa/GSAP')

        suffix_extract(domain(repos[1]))$domain
        ## [1] "sourceforge"
        suffix_extract(domain(repos[2]))$domain
        ## [1] "nasa"
        suffix_extract(domain(repos[3]))$domain
        ## [1] "github"
  9. Write a function

        get_domain <- function(url) {
          suffix_extract(domain(url))$domain
        }

        get_domain('https://sourceforge.net/projects/fortranxunit/')
        ## [1] "sourceforge"
        get_domain('https://ti.arc.nasa.gov/opensource/projects/pour/')
        ## [1] "nasa"
        get_domain('https://github.com/nasa/GSAP')
        ## [1] "github"
  10. But we still have repetitive code!

        for (r in seq_along(repos)) {
          print(get_domain(repos[[r]]))
        }
        ## [1] "sourceforge"
        ## [1] "nasa"
        ## [1] "github"

        v <- c()
        for (i in seq_along(repos)) {
          v[[i]] <- get_domain(repos[[i]])
        }
        v
        ## [1] "sourceforge" "nasa" "github"
  11. seq_along is a bit more robust

        v <- c()
        for (i in seq_along(repos)) {
          v[[i]] <- get_domain(repos[[i]])
        }
        v
        ## [1] "sourceforge" "nasa" "github"

        v <- c()
        repos <- c()
        # 1:length(repos) is 1:0 when repos is empty, so the loop body still runs
        for (i in 1:length(repos)) {
          v[[i]] <- get_domain(repos[[i]])
        }
        ## Error in get_component_(x, 1): Not compatible with STRSXP: [type=NULL].
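
The difference comes down to what each expression returns for an empty vector. A small base R check (the variable names are chosen here for illustration):

```r
empty <- c()                     # c() with no arguments is NULL, length 0

bad_index  <- 1:length(empty)    # 1:0 counts down, giving two indices
good_index <- seq_along(empty)   # an empty integer vector

# Count how many times a loop over seq_along() actually runs.
iterations <- 0
for (i in good_index) {
  iterations <- iterations + 1
}

print(bad_index)
print(length(good_index))
print(iterations)
```

A loop over `bad_index` would run twice on an empty input, while a loop over `good_index` does not run at all.
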
  12. Loops without the for

    Similar to the 'apply' family of functions.

        repos <- c(
          'https://sourceforge.net/projects/fortranxunit/',
          'https://ti.arc.nasa.gov/opensource/projects/pour/',
          'https://github.com/nasa/GSAP')

        sapply(repos, get_domain)
        ##    https://sourceforge.net/projects/fortranxunit/
        ##                                     "sourceforge"
        ## https://ti.arc.nasa.gov/opensource/projects/pour/
        ##                                            "nasa"
        ##                      https://github.com/nasa/GSAP
        ##                                          "github"
  13. purrr

        library(purrr)

        map(.x = repos, .f = get_domain)
        ## [[1]]
        ## [1] "sourceforge"
        ##
        ## [[2]]
        ## [1] "nasa"
        ##
        ## [[3]]
        ## [1] "github"
  14. purrr has functions that let you pick your output type

    Using map_chr we get a character vector instead of a list.

        map_chr(.x = repos, .f = get_domain)
        ## [1] "sourceforge" "nasa" "github"
  15. Anonymous purrr

        library(stringr)

        map_chr(.x = repos, .f = function(x) get_domain(str_to_lower(x)))
        ## [1] "sourceforge" "nasa" "github"

    is exactly the same as

        map_chr(.x = repos, .f = ~ get_domain(str_to_lower(.)))
        ## [1] "sourceforge" "nasa" "github"
  16. map2

        v1 <- c(1, 2, 3)
        v2 <- c(10, 20, 30)

        my_add_2 <- function(x, y) {
          x + y
        }

        mapply(my_add_2, x = v1, y = v2)
        ## [1] 11 22 33

        map2_dbl(v1, v2, my_add_2)
        ## [1] 11 22 33
  17. pmap

    Instead of having map3, map4, etc., we have pmap!

        v1 <- c(1, 2, 3)
        v2 <- c(10, 20, 30)
        v3 <- c(100, 200, 300)
        v4 <- c(1000, 2000, 3000)

        my_add_4 <- function(x, y, z, a) {
          x + y + z + a
        }

        pmap_dbl(list(x = v1, y = v2, z = v3, a = v4), my_add_4)
        ## [1] 1111 2222 3333
  18. Another thing about map

    If you want to map over values that are not the first argument of the function, use an anonymous function.

        v1 <- c(1, 2, 3)

        my_add <- function(x = 1, y = 2) {
          x + y
        }

        map_dbl(v1, ~ my_add(x = 1, y = .))
        ## [1] 2 3 4
  19. Iterate through everything

    purrr::safely
    purrr::possibly
    purrr::quietly

        v <- list(1, 'oops', 5)

        map(v, log)
        ## Error in log(x = x, base = base): non-numeric argument to mathematical fun

        results <- map(v, safely(log))
  20. Look at results

        results
        ## [[1]]
        ## [[1]]$result
        ## [1] 0
        ##
        ## [[1]]$error
        ## NULL
        ##
        ##
        ## [[2]]
        ## [[2]]$result
        ## NULL
        ##
        ## [[2]]$error
        ## <simpleError in log(x = x, base = base): non-numeric argument to mathemati
        ##
        ##
        ## [[3]]
        ## [[3]]$result
        ## [1] 1.609438
        ##
  21. Transpose

        purrr::transpose(results)
        ## $result
        ## $result[[1]]
        ## [1] 0
        ##
        ## $result[[2]]
        ## NULL
        ##
        ## $result[[3]]
        ## [1] 1.609438
        ##
        ##
        ## $error
        ## $error[[1]]
        ## NULL
        ##
        ## $error[[2]]
        ## <simpleError in log(x = x, base = base): non-numeric argument to mathemati
        ##
        ## $error[[3]]
        ## NULL
  22. Transpose and get all results

        purrr::transpose(results)$result
        ## [[1]]
        ## [1] 0
        ##
        ## [[2]]
        ## NULL
        ##
        ## [[3]]
        ## [1] 1.609438

        purrr::transpose(results)$error
        ## [[1]]
        ## NULL
        ##
        ## [[2]]
        ## <simpleError in log(x = x, base = base): non-numeric argument to mathemati
        ##
        ## [[3]]
        ## NULL
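
Item 19 also names `purrr::possibly`, which the transcript does not demonstrate. As a sketch, assuming purrr is installed: `possibly` wraps a function so that an error is replaced by a default value. The default `NA_real_` and the names `possible_log` and `logged` are choices made here for illustration.

```r
library(purrr)

v <- list(1, 'oops', 5)

# possibly() returns a wrapped log() that yields `otherwise` on error,
# so map_dbl() can still build a plain numeric vector.
possible_log <- possibly(log, otherwise = NA_real_)
logged <- map_dbl(v, possible_log)
print(logged)
```

Compared with `safely`, this drops the error objects, which is convenient when only the successful values matter.
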
  23. Non-Standard Evaluation (NSE)

    R gives you access to the code passed to a function, not just its value.

    REPL/interactive mode
    Saves typing
  24. How does it know what the labels are?

        library(ggplot2)

        ggplot(data = cars, aes(x = speed, y = dist)) +
          geom_point()
  25. It knows the expression we put in

        library(ggplot2)

        ggplot(data = cars, aes(x = log(speed), y = log(dist))) +
          geom_point()
  26. Capture expression: substitute

    Instead of the value, it returns the code.

        f <- function(x){
          base::substitute(x)
        }

        f(1:10)
        ## 1:10

        x <- 1
        f(x)
        ## x

        x <- 1
        y <- 3
        f(x + log(y))
        ## x + log(y)
  27. Character vector of expression: deparse

        base::deparse({
          f <- function(x) {
            substitute(x)
          }
        })
        ## [1] "function (x) " "{" "    substitute(x)"
        ## [4] "}"
  28. Deparse returns a string

    We get y because the code is y.

        f <- function(x) substitute(x)
        g <- function(y) deparse(f(y))

        g(1:10)
        ## [1] "y"
        g(x)
        ## [1] "y"
        g(x + y ^ 2 / z + exp(a * sin(b)))
        ## [1] "y"
  29. Subset

        sample_df <- data.frame(a = 1:5, b = 5:1, c = c(5, 3, 1, 4, 1))
        sample_df
        ##   a b c
        ## 1 1 5 5
        ## 2 2 4 3
        ## 3 3 3 1
        ## 4 4 2 4
        ## 5 5 1 1

        subset(sample_df, a >= 4)
        ##   a b c
        ## 4 4 2 4
        ## 5 5 1 1
  30. Write our own subset

        my_subset <- function(x, condition) {
          condition_call <- substitute(condition)
          print(condition_call)
          r <- eval(expr = condition_call, envir = x)
          print(r)
          x[r, ]
        }

        my_subset(sample_df, a >= 4)
        ## a >= 4
        ## [1] FALSE FALSE FALSE TRUE TRUE
        ##   a b c
        ## 4 4 2 4
        ## 5 5 1 1
  31. Why is this a good thing?

    Lets you separate the code from the back end (dplyr, dbplyr, dtplyr).

    Be careful: NSE makes functions not referentially transparent, meaning that if you replace an argument to a function with an equivalent object, you can get a different result.
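
To illustrate the loss of referential transparency, here is a base R sketch. The function `f` and the variable names are throwaway choices made for this example:

```r
# f() is a throwaway function that reports the code it was given.
f <- function(x) {
  deparse(substitute(x))
}

a <- 1

from_name  <- f(a)   # the argument is the symbol `a`
from_value <- f(1)   # the argument is the literal 1, which has the same value

print(from_name)
print(from_value)
print(identical(from_name, from_value))
```

Both calls pass the value 1, yet the results differ because `f` looks at the code of its argument rather than its value.
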
  32. quote and expr

    Give you back exactly what you gave them.

        # quote always returns input as is
        f <- function(x) base::quote(x)
        f(a + b)
        ## x

        library(rlang)
        f <- function(x) rlang::expr(x)
        f(a + b)
        ## x
  33. substitute and enexpr

    What the user passed.

        f <- function(x) base::substitute(x)
        f(a + b)
        ## a + b

        f <- function(x) rlang::enexpr(x)
        f(a + b)
        ## a + b
  34. Unquoting

        x <- rlang::expr(a + b)
        x
        ## a + b

    bang bang: !!

        rlang::expr(!!x + z)
        ## a + b + z
  35. Make it a function

    tl;dr: use !!

        my_scatterplot <- function(df, x, y) {
          xvar <- rlang::enexpr(x)
          yvar <- rlang::enexpr(y)

          ggplot(data = df, aes(x = !!xvar, y = !!yvar)) +
            geom_point()
        }
  36. What about the environment?

    Remember how functions work from the beginning of the talk?

        # I just took Hadley's example
        my_mutate <- function(df, var) {
          n <- 10
          var <- rlang::enexpr(var)
          dplyr::mutate(df, !!var)
        }

        df <- tibble::tibble(x = 1)
        n <- 100
        my_mutate(df, x + n)
        ## # A tibble: 1 x 2
        ##       x `x + n`
        ##   <dbl>   <dbl>
        ## 1     1      11
  37. enquo instead of enexpr

        # I just took Hadley's example
        my_mutate <- function(df, var) {
          n <- 10
          var <- rlang::enquo(var)
          dplyr::mutate(df, !!var)
        }

        df <- tibble::tibble(x = 1)
        n <- 100
        my_mutate(df, x + n)
        ## # A tibble: 1 x 2
        ##       x `x + n`
        ##   <dbl>   <dbl>
        ## 1     1     101
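
The difference between the two results can be reproduced without dplyr. A minimal sketch, assuming rlang is installed; `with_enexpr` and `with_enquo` are hypothetical names used only here. `enexpr` captures just the code, while `enquo` captures the code together with the environment where the user wrote it.

```r
library(rlang)

with_enexpr <- function(var) {
  n <- 10
  captured <- enexpr(var)   # bare expression, no environment attached
  eval(captured)            # evaluated inside this function, so n is 10
}

with_enquo <- function(var) {
  n <- 10
  captured <- enquo(var)    # quosure: expression plus the caller's environment
  eval_tidy(captured)       # evaluated where the user wrote it, so n is 100
}

n <- 100
from_enexpr <- with_enexpr(n + 1)
from_enquo <- with_enquo(n + 1)

print(from_enexpr)
print(from_enquo)
```

This mirrors the 11 versus 101 seen with `my_mutate`: only the quosure remembers that the user's `n` is 100.
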
  38. tl;dr: use !!

        my_select <- function(df, x) {
          x <- rlang::enquo(x)

          df %>%
            dplyr::select(!!x) %>%
            ggplot(aes(x = !!x)) +
            geom_histogram(bins = 20)
        }

        mtcars %>%
          my_select(mpg)