Pro Yearly is on sale from $80 to $50! »

そろそろ手を出すpurrr / nekosky

D12a80cab206033a820ccff8319f957b?s=47 Uryu Shinya
December 16, 2017

そろそろ手を出すpurrr / nekosky

効率的なRプログラミングを可能にするpurrrパッケージの基本的な使い方とdplyr, ggplot2等のtidyverseに含まれるパッケージとの合わせ技を覚えましょう 😼 Tokyo.R#66での発表資料です https://atnd.org/events/92993

D12a80cab206033a820ccff8319f957b?s=128

Uryu Shinya

December 16, 2017
Tweet

Transcript

  1. ͦΖͦΖ खΛग़͢purrr 20171216 Tokyo.R#66@Ropponngi Shinya Uryu (HOXO-M INC.) @u_ribo @uribo

    Purrr that A functional programming toolkit for R
  2. YOUR TURN Q1.ߏ଄͕ಉ͡ෳ਺ͷDTWϑΝΠϧ͕ ಉҰͷϑΥϧμʹอଘ͞Ε͍ͯ·͢ɻ ͜ΕΒΛҰͭͷσʔλϑϨʔϜͱͯ͠ ಡΈࠐΈ͍ͨ࣌ɺ ͲͷΑ͏ʹॲཧ͠·͔͢

  3. YOUR TURN Q2. ࣮਺ྻ͔ΒͳΔྻͷσʔλϑ ϨʔϜ͔Βɺ֤ྻͷฏۉ஋ΛٻΊ͍ͨɻ ͲͷΑ͏ͳखஈΛར༻͠·͔͢

  4. ޷͖ʹ͍ͤ GPS ʹΑΔϧʔϓ EQMZSTVNNBSJTF BQQMZ଒ͷؔ਺ ͨͩ͠ίϐϖ ͯΊʔ͸μϝͩ˞

  5. ίϯϐϡʔλ͕ಘҙͳ͜ͱ ൓෮ॲཧ ŵŜŤũ ΫΝXͤ ;͜͡ ESGUHZ ♂

  6. 3ͰΑ͋͘Δ൓෮ॲཧ ୀ۶ͳ͜ͱ͸3ʹ೚ͤΑ͏Ŏ wBQQMZ଒ GPSؔ਺ wάϧʔϓॲཧ

  7. QVSSS൓෮ॲཧʹର͢ΔࡦͷҰͭ TJODF 'JSTU3FMFBTF ΀ΔΔΔ ΀ΓΌʔ ͝རӹ ؔ਺΍Ҿ਺ͷద༻ʹҰ؏໊ͨ͠শ ίʔυͷهड़Λ؆ུԽ͢Δ  কདྷతͳ࣮૷

    ฒྻԽͱਐߦ۩߹ͷՄࢹԽ
  8. ҃Δਓͷܦݧ ޮೳʹ͸ ݸਓ͕ࠩ ͋Γ·͢

  9. ҆৺͍ͯͩ͘͠͞ QVSSSΉ͍͔ͣΒҰ౓Ͱཧղ͠ͳ͓ͯ͘L εϐʔυతʹGPS΋ѱ͘ͳ͍Αʢࠓ͸ʣ QVSSSΛ࢖͏ͷ͸ίʔυͷอकੑΛߴΊΔͨΊ ͬͯ)BEMFZ͕ݴͬͯͨϤ 3GPS%BUB4DJFODF

  10. map()

  11. NBQϕΫτϧͷ֤ཁૉ΁ͷؔ਺ͷద༻ # จࣈ਺Λ਺͑Δॲཧ—————————————————————— x <- c("kazutan", "hoxom", "uribo") nchar(x[1]) #

    7 nchar(x[2]) # 5 nchar(x[3]) # 5 sapply(x, nchar) # ؔ਺ΛҾ਺ʹͱΔߴ֊ؔ਺ # kazutan hoxom uribo # 7 5 5
  12. library(purrr) map(x, nchar) # [[1]] # [1] 7 # [[2]]

    # [1] 5 # [[3]] # [1] 5 map(.x, .f, ) ؔ਺ ... Ϧετ ϕΫτϧ σʔλϑϨʔϜ දݱࣜ ର৅ ॲཧ σʔλϑϨʔϜͰ͸ྻʢϕΫτϧʣ͕ର৅ NBQϕΫτϧͷ֤ཁૉ΁ͷؔ਺ͷద༻
  13. base::Map͡Όμϝͳͷ? Map(nchar, x) # $kazutan # [1] 7 # $hoxom

    # [1] 5 # $uribo # [1] 5 %>% ʜ·͋ྑ͍͚Ͳ ೴తʹ͸ ୈҰҾ਺ʹ͸ૢ࡞ͷର৅ /( ॊೈͳॲཧʹෆ޲͖
  14. map_*()

  15. NBQ৭ʑ  ฦΓ஋Λ೚ҙͷܕͷϕΫτϧʹ FUNCTION RETURN map_lgl() ࿦ཧܕ map_int() ੔਺ܕ map_dbl()

    ࣮਺ܕ map_chr() จࣈྻܕ ฦΓ஋ͷσʔλܕʹԠͯ͡ద༻͢Δؔ਺Λมߋ
  16. x %>% map_int(nchar) # [1] 7 5 5 x %>%

    map_int(nchar) %>% sum() # [1] 17 x %>% map_chr(nchar) [1] "7" "5" "5" NBQ৭ʑ  ฦΓ஋Λ೚ҙͷܕͷϕΫτϧʹ
  17. CAUTIONDŽ x %>% map_lgl(nchar) # Error: Can't coerce element 1

    from a integer to a logical σʔλܕͷม׵نଇʹ஫ҙ ࿦ཧܕɺ੔਺ܕɺ ഒਫ਼౓খ਺఺ܕɺจࣈྻͷॱʹॊೈੑ͕ߴ͍
  18. NBQ৭ʑ  Ґஔ΍৚݅ʹΑΔద༻ x %>% map_at(.at = 2, nchar) #

    [[1]] #[1] "kazutan" # [[2]] [1] 5 # [[3]] [1] "uribo" x %>% map_if( .p = . == "kazutan", nchar) # [[1]] # [1] 7 # [[2]] # [1] "hoxom" # [[3]] # [1] "uribo" .p͸predicateΛҙຯ͢Δ
  19. ͳΜ͔ग़͖ͯͨ x %>% map_if(.p = == "kazutan", nchar) NBHSJUUSͷͱҰॹμωʢ໌ࣔతʹΦϒδΣΫτΛ༩͑Δʣ x[1]:

    TRUE "kazutan" == "kazutan" x[2]: FALSE "hoxom" == "kazutan" x[3]: FALSE "uribo" == "kazutan" ͸ཁૉͷ୅໊ࢺͱͯ͠ػೳ .
  20. QVSSSʹ͓͚ΔγϣʔτΧοτԋࢉࢠ # 4QFDJFT͝ͱʹMN ͕࣮ߦ͞ΕΔ iris %>% split(.$Species) %>% map(function(df) {

    lm(Petal.Width ~ Sepal.Length, data = df) })
  21. # ؆ུԽͨ͠هड़ iris %>% split(.$Species) %>% map(~ lm(Petal.Width ~ Sepal.Length,

    data = .)) }) ୈҰҾ਺ͷ ΦϒδΣΫτ͕ ౉͞ΕΔ ແ໊ؔ਺Λఆٛ JSJTͷ4QFDJFT͝ͱʹॲཧ QVSSSʹ͓͚ΔγϣʔτΧοτԋࢉࢠ
  22. Ͳ͏͍͏͜ͱͩͬͯ͹Α res = x %>% map(~nchar) res[[1]] %>% class() #

    [1] "function" res = x %>% map(~nchar(.)) res[[1]] %>% class() # [1] "integer" ͨͩͷؔ਺ ʢର৅͕༩͑ΒΕͳ͍ʣ ʹΑΓ ର৅͕༩͑ΒΕɺ ؔ਺͕࣮ߦ͞ΕΔ
  23. # ม਺ͷࢀর iris %>% split(.$Species) %>% map_dfc(~ mean(.$Sepal.Width)) # A

    tibble: 1 x 3 setosa versicolor virginica <dbl> <dbl> <dbl> 1 3.428 2.77 2.974 QVSSSʹ͓͚ΔγϣʔτΧοτԋࢉࢠ
  24. NBQ৭ʑ  Ҿ਺͝ͱʹҟͳΔର৅Λࢦఆ map2_int(.x = 1:3, .y = 4:6, .f

    = `+`) # [1] 5 7 9 .x .y 1 2 3 4 5 6 5 7 9 ཁૉͷ Ґஔ͝ͱʹ ฒྻॲཧ +
  25. ద༻͢Δؔ਺ͷҾ਺΁ͷ஋ͷ౉͠ํ # rnorm(n = 3, mean = 0, sd =

    1) # NFBOͱTEͷ஋Λมߋ͠ɺ # O͸ݻఆͨ͠ਖ਼ن෼෍ʹै͏ཚ਺Λੜ੒ map2(.x = c(0, -1, 1), # meanʹద༻ .y = c(1, 1.5, 2), # sdʹద༻ .f = rnorm, n = 3) .f͸functionΛҙຯ͢Δ 3ͭͷҾ਺͕ఆٛ͞ΕΔ
  26. ద༻͢Δؔ਺ͷҾ਺΁ͷ஋ͷ౉͠ํ # ແ໊ؔ਺Խͯ͠ɺ໌ࣔతʹ໊લΛهड़ͯ͠΋ྑ͍ map2(.x = c(0, -1, 1), # mean

    .y = c(1, 1.5, 2), # sd .f = ~ rnorm(mean = .x, sd = .y, n = 3)) # Ґஔ΋ར༻Մೳ(લϖʔδͷྫಉ༷) map2(.x = c(0, -1, 1), # mean .y = c(1, 1.5, 2), # sd .f = ~ rnorm(n = 3, .x, .y,))
  27. NBQ৭ʑ  Ҿ਺͝ͱʹҟͳΔର৅Λࢦఆ library(jpmesh) # Ң౓ܦ౓͔ΒϝογϡίʔυΛฦ٫ # ϝογϡίʔυ͸ϝογϡαΠζʹԠܻͯ͡਺͕ҟͳΔ coords_to_mesh(longitude =

    141.3468, latitude = 43.06462, mesh_size = "80km") # [1] "6441" coords_to_mesh(141.3468, 43.06462, "1km") # [1] "64414277" 3ͭͷҾ਺͕ఆٛ͞ΕΔ
  28. d <- tibble::data_frame( longitude = c(141.3468, 139.6917, 139.7147), latitude =

    c(43.06462, 35.68949, 35.70078), mesh_size = c("80km", "1km", "500m")) d %>% pmap_chr(coords_to_mesh) # [1] "6441" "53394525" "533945471" NBQ৭ʑ  Ҿ਺͝ͱʹҟͳΔର৅Λࢦఆ
  29. d %>% pmap_chr(coords_to_mesh) longitude latitude mesh_size ໊લ ·ͨ͸ ҐஔͰࢦఆ ؔ਺ͷҾ਺໊ͱ

    σʔλϑϨʔϜͷྻ໊͕Ұக NBQ৭ʑ  Ҿ਺͝ͱʹҟͳΔର৅Λࢦఆ
  30. # ໊લͰҰக͢ΔͷͰ͜ͷσʔλϑϨʔϜͰ΋0, d = tibble::data_frame( mesh_size = c("80km", "1km", "500m"),

    longitude = c(141.3468, 139.6917, 139.7147), latitude = c(43.06462, 35.68949, 35.70078)) # ཁૉͷҐஔͰҰக͢ΔͷͰ͜ͷσʔλϑϨʔϜͰ΋0, d = tibble::data_frame( lon = c(141.3468, 139.6917, 139.7147), lat = c(43.06462, 35.68949, 35.70078), size = c("80km", "1km", "500m")) NBQ৭ʑ  Ҿ਺͝ͱʹҟͳΔର৅Λࢦఆ
  31. purrr + tidyverse 20171216 Tokyo.R#66@Ropponngi ※purrr΋librarary(tidyverse)Ͱϩʔυ͞Ε·͢

  32. ๯಄ͷ໰୊ΛUJEZWFSTFͰ # JSJTσʔληοτΛ4QFDJFT͝ͱ ߦͣͭ ʹอଘͨ͠DTW (target_files = list.files("data/", pattern =

    ".csv$", full.names = TRUE)) target_files %>% map_df(readr::read_csv) # A tibble: 150 x 5 # … ෳ਺ͷDTW͔ΒҰͭͷσʔλϑϨʔϜ
  33. ๯಄ͷ໰୊ΛUJEZWFSTFͰ # શม਺͕࣮਺ܕͰ͋Ε͹ df %>% map_df(mean) # ࣮਺ྻ͚ͩΛର৅ʹ df %>%

    map_if(is.double, mean) ෳ਺ྻͷฏۉ஋ΛٻΊΔ
  34. άϧʔϓผͷਤΛαΫοͱ࡞੒ walk2(paste0("img_", unique(iris$Species), ".png"), iris %>% split(.$Species) %>% map(~ggplot(., aes(Sepal.Length,Petal.Width))

    + geom_point()), ggsave, # HHTBWF ʹ౉͢Ҿ਺ʢݻఆ஋ʣ width = 4, height = 3)
  35. Μ Կ΋දࣔ͞Εͳ͍ map2_int(1:3, 4:6, `+`) walk2(1:3, 4:6, `+`) XBML ͸ग़ྗΛ൐Θͳ͍

    for side-effect
  36. EQMZSͷؔ਺಺ͰNBQ df_mesh = read_csv("data/mesh_1km.csv", col_types = "c") df_mesh %>% sample_n(3L)

    LNϝογϡͷ஋ΛؚΜͩσʔλ # A tibble: 3 x 1 mesh_1km <chr> 1 36234703 2 36235603 …
  37. jpmesh::mesh_to_coords( meshcode = df_mesh$mesh_1km[1]) # A tibble: 1 x 4

    # … # meshcodeҾ਺ʹୈҰྻ͕༩͑ΒΕΔ df_mesh_map = df_mesh %>% mutate(out = pmap(., ~ jpmesh::mesh_to_coords(meshcode = .x))) EQMZSͷؔ਺಺ͰNBQ .xʹ͸mesh_1km͕ೖΔ
  38. ͓ ೖ͚ͬͨͲ΋ df_mesh_map # A tibble: 60 x 2 mesh_1km

    out <chr> <list> 1 36225745 <tibble [1 x 4]> 2 36225746 <tibble [1 x 4]> 3 36225755 <tibble [1 x 4]> ֊૚ߏ଄ͷ͋ΔσʔλϑϨʔϜ
  39. ֊૚ߏ଄ͷ͋ΔσʔλϑϨʔϜ df_mesh_map$out[[1]] # A tibble: 1 x 4 lng_center lat_center

    lng_error lat_error <dbl> <dbl> <dbl> <dbl> 1 122.94375 24.4541666667 0.00624999999999 0.0041666667 த਎͸σʔλϑϨʔϜ
  40. ֊૚ߏ଄ͷ͋ΔσʔλϑϨʔϜ df_mesh_map %>% tidyr::unnest() # A tibble: 60 x 5

    mesh_1km lng_center lat_center lng_error lat_error <chr> <dbl> <dbl> <dbl> <dbl> 1 36225745 122.94375 24.4541666667 0.00624999999999 0.00416666670000 2 36225746 122.95625 24.4541666667 0.00624999999999 0.00416666670000 3 36225755 122.94375 24.4625000000 0.00624999999999 0.00416666666666 4 36225756 122.95625 24.4625000000 0.00624999999999 0.00416666666666 5 36225757 122.96875 24.4625000000 0.00624999999999 0.00416666666666 6 36225759 122.99375 24.4625000000 0.00625000000001 0.00416666666666 7 36225766 122.95625 24.4708333333 0.00624999999999 0.00416666663333 … ల։͢Δʹ͸tidyr::unnest() ݩͷσʔλϑϨʔϜʹྻ͕௥Ճ͞ΕΔ
  41. άϧʔϓʹରͯ͠OFTU⁶VOOFTU iris_nest = iris %>% group_by(Species) %>% nest() # A

    tibble: 3 x 2 Species data <fctr> <list> 1 setosa <tibble [50 x 4]> 2 versicolor <tibble [50 x 4]> 3 virginica <tibble [50 x 4]> all_equal(iris_nest %>% unnest(), iris) # [1] TRUE
  42. άϧʔϓʹରͯ͠OFTU⁶VOOFTU iris_model = iris_nest %>% transmute(out = map(data, function(df) {

    broom::tidy(lm(Sepal.Length~Petal.Width, data=df))})) # A tibble: 3 x 1 out <list> 1 <data.frame [2 x 5]> 2 <data.frame [2 x 5]> 3 <data.frame [2 x 5]>
  43. άϧʔϓʹରͯ͠OFTU⁶VOOFTU iris_model %>% unnest() # A tibble: 6 x 5

    term estimate std.error statistic p.value <chr> <dbl> <dbl> <dbl> <dbl> 1 (Intercept) 4.777177508269 0.123912416859 38.55285555207 8.98396856599e-38 … MN ͷ݁ՌΛσʔλϑϨʔϜʹ ͋ͱ͸͓޷͖ʹʂ
  44. Appendix 20171216 Tokyo.R#66@Ropponngi

  45. keep(), discard()

  46. ཁૉͷอ࣋ͱআ֎ res = x %>% map_at(1, nchar) res %>% keep(~

    is.integer(.) == TRUE) %>% as_vector() [1] 7 # ཁૉͷ݁Ռ͕࢒Δ res %>% discard(~ is.integer(.) == TRUE) %>% as_vector() [1] "hoxom" "uribo" # ཁૉͷ݁ՌΛআ֎
  47. invoke()

  48. ཁૉʹద༻͢Δؔ਺Λมߋ library(stringr) c("str_to_upper", "str_to_title", "str_to_lower") %>% invoke_map_chr(x) # [1] "KAZUTAN"

    "Hoxom" "uribo" ؔ਺͸จࣈྻͱͯ͠༩͑Δ ֤ؔ਺΁ͷҾ਺͸ϦετͰ༩͑Δ
  49. ϦετΛ৞ΈࠐΈ 1:3 %>% reduce(`+`) # [1] 6 # 1 +2

    + 3 x %>% map(nchar) %>% reduce(c) # 7 5 5 # ͜ͷ৔߹ɺmap_int()Ͱྑ͍ ݸਓతʹɺmap_df()ରԠ͍ͯ͠ͳ͍ ৔߹ʹಛʹΦεεϝ (ݱࡏͷsfύοέʔδ (v.0.5-5))ͱ͔
  50. partial()

  51. ؔ਺ͷҰ෦ͷҾ਺஋Λݻఆ ෦෼ద༻ set.seed(71) f = partial(runif, n = rpois(1, 5),

    .lazy = FALSE) f # QBSUJBM Ҿ਺Ͱએݴͨ͠Ҿ਺Λ΋ͭؔ਺͕࡞੒͞ΕΔ function (...) runif(n = 4L, ...) 4͸set.seed(71)Λ༩͑ͨ࣌ͷ rpois(1,5)ͷ݁Ռ ෦෼తʹಉ͡ॲཧΛࢪ͢ࡍʹศར
  52. ؔ਺ͷҰ෦ͷҾ਺஋Λݻఆ ෦෼ద༻ f() # [1] 0.555103868479 0.327369962120 0.211666960036 0.316121358424 #

    SVOJG ͷOҎ֎ͷҾ਺͸มߋՄೳ f(min = 0.2) # [1] 0.957813141309 0.729371172562 0.911536924727 0.470407903753
  53. flatten()

  54. Ϧετͷ֊૚ΛҰஈ্͛Δ # ֊૚ͷϦετ x = list( list(hijiyama = c("kazutan")), list(tokyo

    = c("hoxom", "uribo"))) x %>% flatten() # $hijiyama # [1] "kazutan" # $tokyo # [1] "hoxom" "uribo" 2͔Β1֊૚ͷϦετʹ
  55. Ϧετͷ֊૚ΛҰஈ্͛Δ # VOMJTU ͱ͸ҟͳΓɺҰஈͣͭωετΛղফ͢ΔʢΑΓ҆શʣ x %>% unlist() # hijiyama tokyo1

    tokyo2 # "kazutan" "hoxom" "uribo" x %>% flatten() %>% flatten_chr() # [1] "kazutan" "hoxom" "uribo" ֊૚͕ͳ͘ͳΓɺϕΫτϧʢจࣈྻʣͱͯ͠ฦ٫
  56. ENJOY 20171216 Tokyo.R#66@Ropponngi