Upgrade to Pro — share decks privately, control downloads, hide ads and more …

そろそろ手を出すpurrr / nekosky

Uryu Shinya
December 16, 2017

そろそろ手を出すpurrr / nekosky

効率的なRプログラミングを可能にするpurrrパッケージの基本的な使い方とdplyr, ggplot2等のtidyverseに含まれるパッケージとの合わせ技を覚えましょう 😼 Tokyo.R#66での発表資料です https://atnd.org/events/92993

Uryu Shinya

December 16, 2017
Tweet

More Decks by Uryu Shinya

Other Decks in Programming

Transcript

  1. ͦΖͦΖ
    खΛग़͢purrr
    20171216 Tokyo.R#66@Ropponngi
    Shinya Uryu (HOXO-M INC.)
    @u_ribo
    @uribo

    Purrr that A functional programming toolkit for R

    View Slide

  2. YOUR TURN
    Q1.ߏ଄͕ಉ͡ෳ਺ͷDTWϑΝΠϧ͕
    ಉҰͷϑΥϧμʹอଘ͞Ε͍ͯ·͢ɻ
    ͜ΕΒΛҰͭͷσʔλϑϨʔϜͱͯ͠
    ಡΈࠐΈ͍ͨ࣌ɺ
    ͲͷΑ͏ʹॲཧ͠·͔͢

    View Slide

  3. YOUR TURN
    Q2. ࣮਺ྻ͔ΒͳΔྻͷσʔλϑ
    ϨʔϜ͔Βɺ֤ྻͷฏۉ஋ΛٻΊ͍ͨɻ
    ͲͷΑ͏ͳखஈΛར༻͠·͔͢

    View Slide

  4. ޷͖ʹ͍ͤ
    GPS
    ʹΑΔϧʔϓ
    EQMZSTVNNBSJTF
    BQQMZ଒ͷؔ਺

    ͨͩ͠ίϐϖ
    ͯΊʔ͸μϝͩ˞

    View Slide

  5. ίϯϐϡʔλ͕ಘҙͳ͜ͱ
    ൓෮ॲཧ
    ŵŜŤũ ΫΝXͤ
    ;͜͡
    ESGUHZ


    View Slide

  6. 3ͰΑ͋͘Δ൓෮ॲཧ
    ୀ۶ͳ͜ͱ͸3ʹ೚ͤΑ͏Ŏ
    wBQQMZ଒ GPSؔ਺
    wάϧʔϓॲཧ

    View Slide

  7. QVSSS൓෮ॲཧʹର͢ΔࡦͷҰͭ
    TJODF
    'JSTU3FMFBTF

    ΀ΔΔΔ
    ΀ΓΌʔ
    ͝རӹ
    ؔ਺΍Ҿ਺ͷద༻ʹҰ؏໊ͨ͠শ
    ίʔυͷهड़Λ؆ུԽ͢Δ
    কདྷతͳ࣮૷
    ฒྻԽͱਐߦ۩߹ͷՄࢹԽ

    View Slide

  8. ҃Δਓͷܦݧ
    ޮೳʹ͸
    ݸਓ͕ࠩ
    ͋Γ·͢

    View Slide

  9. ҆৺͍ͯͩ͘͠͞
    QVSSSΉ͍͔ͣΒҰ౓Ͱཧղ͠ͳ͓ͯ͘L
    εϐʔυతʹGPS΋ѱ͘ͳ͍Αʢࠓ͸ʣ
    QVSSSΛ࢖͏ͷ͸ίʔυͷอकੑΛߴΊΔͨΊ
    ͬͯ)BEMFZ͕ݴͬͯͨϤ
    3GPS%BUB4DJFODF

    View Slide

  10. map()

    View Slide

  11. NBQϕΫτϧͷ֤ཁૉ΁ͷؔ਺ͷద༻
    # จࣈ਺Λ਺͑Δॲཧ——————————————————————
    x <- c("kazutan", "hoxom", "uribo")
    nchar(x[1]) # 7
    nchar(x[2]) # 5
    nchar(x[3]) # 5
    sapply(x, nchar) # ؔ਺ΛҾ਺ʹͱΔߴ֊ؔ਺
    # kazutan hoxom uribo
    # 7 5 5

    View Slide

  12. library(purrr)
    map(x, nchar)
    # [[1]]
    # [1] 7
    # [[2]]
    # [1] 5
    # [[3]]
    # [1] 5
    map(.x, .f, )
    ؔ਺
    ...
    Ϧετ
    ϕΫτϧ
    σʔλϑϨʔϜ
    දݱࣜ
    ର৅ ॲཧ
    σʔλϑϨʔϜͰ͸ྻʢϕΫτϧʣ͕ର৅
    NBQϕΫτϧͷ֤ཁૉ΁ͷؔ਺ͷద༻

    View Slide

  13. base::Map͡Όμϝͳͷ?
    Map(nchar, x)
    # $kazutan
    # [1] 7
    # $hoxom
    # [1] 5
    # $uribo
    # [1] 5
    %>%
    ʜ·͋ྑ͍͚Ͳ
    ೴తʹ͸
    ୈҰҾ਺ʹ͸ૢ࡞ͷର৅
    /(
    ॊೈͳॲཧʹෆ޲͖

    View Slide

  14. map_*()

    View Slide

  15. NBQ৭ʑ
    ฦΓ஋Λ೚ҙͷܕͷϕΫτϧʹ
    FUNCTION RETURN
    map_lgl() ࿦ཧܕ
    map_int() ੔਺ܕ
    map_dbl() ࣮਺ܕ
    map_chr() จࣈྻܕ
    ฦΓ஋ͷσʔλܕʹԠͯ͡ద༻͢Δؔ਺Λมߋ

    View Slide

  16. x %>% map_int(nchar)
    # [1] 7 5 5
    x %>% map_int(nchar) %>% sum()
    # [1] 17
    x %>% map_chr(nchar)
    [1] "7" "5" "5"
    NBQ৭ʑ
    ฦΓ஋Λ೚ҙͷܕͷϕΫτϧʹ

    View Slide

  17. CAUTIONDŽ
    x %>% map_lgl(nchar)
    # Error: Can't coerce element 1 from a integer
    to a logical
    σʔλܕͷม׵نଇʹ஫ҙ
    ࿦ཧܕɺ੔਺ܕɺ
    ഒਫ਼౓খ਺఺ܕɺจࣈྻͷॱʹॊೈੑ͕ߴ͍

    View Slide

  18. NBQ৭ʑ
    Ґஔ΍৚݅ʹΑΔద༻
    x %>%
    map_at(.at = 2, nchar)
    # [[1]]
    #[1] "kazutan"
    # [[2]]
    [1] 5
    # [[3]]
    [1] "uribo"
    x %>%
    map_if(
    .p = . == "kazutan",
    nchar)
    # [[1]]
    # [1] 7
    # [[2]]
    # [1] "hoxom"
    # [[3]]
    # [1] "uribo"
    .p͸predicateΛҙຯ͢Δ

    View Slide

  19. ͳΜ͔ग़͖ͯͨ
    x %>% map_if(.p = == "kazutan", nchar)
    NBHSJUUSͷͱҰॹμωʢ໌ࣔతʹΦϒδΣΫτΛ༩͑Δʣ
    x[1]: TRUE "kazutan" == "kazutan"
    x[2]: FALSE "hoxom" == "kazutan"
    x[3]: FALSE "uribo" == "kazutan"
    ͸ཁૉͷ୅໊ࢺͱͯ͠ػೳ
    .

    View Slide

  20. QVSSSʹ͓͚ΔγϣʔτΧοτԋࢉࢠ
    # 4QFDJFT͝ͱʹMN
    ͕࣮ߦ͞ΕΔ
    iris %>%
    split(.$Species) %>%
    map(function(df) {
    lm(Petal.Width ~ Sepal.Length, data = df)
    })

    View Slide

  21. # ؆ུԽͨ͠هड़
    iris %>%
    split(.$Species) %>%
    map(~
    lm(Petal.Width ~ Sepal.Length, data =
    .))
    })
    ୈҰҾ਺ͷ
    ΦϒδΣΫτ͕
    ౉͞ΕΔ
    ແ໊ؔ਺Λఆٛ
    JSJTͷ4QFDJFT͝ͱʹॲཧ
    QVSSSʹ͓͚ΔγϣʔτΧοτԋࢉࢠ

    View Slide

  22. Ͳ͏͍͏͜ͱͩͬͯ͹Α
    res = x %>% map(~nchar)
    res[[1]] %>% class()
    # [1] "function"
    res = x %>% map(~nchar(.))
    res[[1]] %>% class()
    # [1] "integer"
    ͨͩͷؔ਺
    ʢର৅͕༩͑ΒΕͳ͍ʣ
    ʹΑΓ
    ର৅͕༩͑ΒΕɺ
    ؔ਺͕࣮ߦ͞ΕΔ

    View Slide

  23. # ม਺ͷࢀর
    iris %>%
    split(.$Species) %>%
    map_dfc(~ mean(.$Sepal.Width))
    # A tibble: 1 x 3
    setosa versicolor virginica

    1 3.428 2.77 2.974
    QVSSSʹ͓͚ΔγϣʔτΧοτԋࢉࢠ

    View Slide

  24. NBQ৭ʑ
    Ҿ਺͝ͱʹҟͳΔର৅Λࢦఆ
    map2_int(.x = 1:3, .y = 4:6, .f = `+`)
    # [1] 5 7 9
    .x .y
    1
    2
    3
    4
    5
    6
    5
    7
    9
    ཁૉͷ
    Ґஔ͝ͱʹ
    ฒྻॲཧ
    +

    View Slide

  25. ద༻͢Δؔ਺ͷҾ਺΁ͷ஋ͷ౉͠ํ
    # rnorm(n = 3, mean = 0, sd = 1)
    # NFBOͱTEͷ஋Λมߋ͠ɺ
    # O͸ݻఆͨ͠ਖ਼ن෼෍ʹै͏ཚ਺Λੜ੒
    map2(.x = c(0, -1, 1), # meanʹద༻
    .y = c(1, 1.5, 2), # sdʹద༻
    .f = rnorm,
    n = 3)
    .f͸functionΛҙຯ͢Δ
    3ͭͷҾ਺͕ఆٛ͞ΕΔ

    View Slide

  26. ద༻͢Δؔ਺ͷҾ਺΁ͷ஋ͷ౉͠ํ
    # ແ໊ؔ਺Խͯ͠ɺ໌ࣔతʹ໊લΛهड़ͯ͠΋ྑ͍
    map2(.x = c(0, -1, 1), # mean
    .y = c(1, 1.5, 2), # sd
    .f = ~ rnorm(mean = .x, sd = .y, n = 3))
    # Ґஔ΋ར༻Մೳ(લϖʔδͷྫಉ༷)
    map2(.x = c(0, -1, 1), # mean
    .y = c(1, 1.5, 2), # sd
    .f = ~ rnorm(n = 3, .x, .y,))

    View Slide

  27. NBQ৭ʑ
    Ҿ਺͝ͱʹҟͳΔର৅Λࢦఆ
    library(jpmesh)
    # Ң౓ܦ౓͔ΒϝογϡίʔυΛฦ٫
    # ϝογϡίʔυ͸ϝογϡαΠζʹԠܻͯ͡਺͕ҟͳΔ
    coords_to_mesh(longitude = 141.3468,
    latitude = 43.06462,
    mesh_size = "80km")
    # [1] "6441"
    coords_to_mesh(141.3468, 43.06462, "1km")
    # [1] "64414277"
    3ͭͷҾ਺͕ఆٛ͞ΕΔ

    View Slide

  28. d <- tibble::data_frame(
    longitude = c(141.3468, 139.6917, 139.7147),
    latitude = c(43.06462, 35.68949, 35.70078),
    mesh_size = c("80km", "1km", "500m"))
    d %>%
    pmap_chr(coords_to_mesh)
    # [1] "6441" "53394525" "533945471"
    NBQ৭ʑ
    Ҿ਺͝ͱʹҟͳΔର৅Λࢦఆ

    View Slide

  29. d %>%
    pmap_chr(coords_to_mesh)
    longitude latitude mesh_size
    ໊લ
    ·ͨ͸
    ҐஔͰࢦఆ
    ؔ਺ͷҾ਺໊ͱ
    σʔλϑϨʔϜͷྻ໊͕Ұக
    NBQ৭ʑ
    Ҿ਺͝ͱʹҟͳΔର৅Λࢦఆ

    View Slide

  30. # ໊લͰҰக͢ΔͷͰ͜ͷσʔλϑϨʔϜͰ΋0,
    d = tibble::data_frame(
    mesh_size = c("80km", "1km", "500m"),
    longitude = c(141.3468, 139.6917, 139.7147),
    latitude = c(43.06462, 35.68949, 35.70078))
    # ཁૉͷҐஔͰҰக͢ΔͷͰ͜ͷσʔλϑϨʔϜͰ΋0,
    d = tibble::data_frame(
    lon = c(141.3468, 139.6917, 139.7147),
    lat = c(43.06462, 35.68949, 35.70078),
    size = c("80km", "1km", "500m"))
    NBQ৭ʑ
    Ҿ਺͝ͱʹҟͳΔର৅Λࢦఆ

    View Slide

  31. purrr +
    tidyverse
    20171216 Tokyo.R#66@Ropponngi
    ※purrr΋librarary(tidyverse)Ͱϩʔυ͞Ε·͢

    View Slide

  32. ๯಄ͷ໰୊ΛUJEZWFSTFͰ
    # JSJTσʔληοτΛ4QFDJFT͝ͱ ߦͣͭ
    ʹอଘͨ͠DTW
    (target_files = list.files("data/",
    pattern = ".csv$",
    full.names = TRUE))
    target_files %>% map_df(readr::read_csv)
    # A tibble: 150 x 5
    # …
    ෳ਺ͷDTW͔ΒҰͭͷσʔλϑϨʔϜ

    View Slide

  33. ๯಄ͷ໰୊ΛUJEZWFSTFͰ
    # શม਺͕࣮਺ܕͰ͋Ε͹
    df %>% map_df(mean)
    # ࣮਺ྻ͚ͩΛର৅ʹ
    df %>% map_if(is.double, mean)
    ෳ਺ྻͷฏۉ஋ΛٻΊΔ

    View Slide

  34. άϧʔϓผͷਤΛαΫοͱ࡞੒
    walk2(paste0("img_", unique(iris$Species), ".png"),
    iris %>%
    split(.$Species) %>%
    map(~ggplot(., aes(Sepal.Length,Petal.Width))
    + geom_point()),
    ggsave,
    # HHTBWF
    ʹ౉͢Ҿ਺ʢݻఆ஋ʣ
    width = 4, height = 3)

    View Slide

  35. Μ Կ΋දࣔ͞Εͳ͍
    map2_int(1:3, 4:6, `+`)
    walk2(1:3, 4:6, `+`)
    XBML
    ͸ग़ྗΛ൐Θͳ͍
    for side-effect

    View Slide

  36. EQMZSͷؔ਺಺ͰNBQ

    df_mesh = read_csv("data/mesh_1km.csv",
    col_types = "c")
    df_mesh %>% sample_n(3L)
    LNϝογϡͷ஋ΛؚΜͩσʔλ
    # A tibble: 3 x 1
    mesh_1km

    1 36234703
    2 36235603

    View Slide

  37. jpmesh::mesh_to_coords(
    meshcode = df_mesh$mesh_1km[1])
    # A tibble: 1 x 4
    # …
    # meshcodeҾ਺ʹୈҰྻ͕༩͑ΒΕΔ
    df_mesh_map = df_mesh %>%
    mutate(out = pmap(., ~
    jpmesh::mesh_to_coords(meshcode = .x)))
    EQMZSͷؔ਺಺ͰNBQ

    .xʹ͸mesh_1km͕ೖΔ

    View Slide

  38. ͓ ೖ͚ͬͨͲ΋
    df_mesh_map
    # A tibble: 60 x 2
    mesh_1km out

    1 36225745
    2 36225746
    3 36225755
    ֊૚ߏ଄ͷ͋ΔσʔλϑϨʔϜ

    View Slide

  39. ֊૚ߏ଄ͷ͋ΔσʔλϑϨʔϜ
    df_mesh_map$out[[1]]
    # A tibble: 1 x 4
    lng_center lat_center lng_error lat_error

    1 122.94375 24.4541666667 0.00624999999999
    0.0041666667
    த਎͸σʔλϑϨʔϜ

    View Slide

  40. ֊૚ߏ଄ͷ͋ΔσʔλϑϨʔϜ
    df_mesh_map %>% tidyr::unnest()
    # A tibble: 60 x 5
    mesh_1km lng_center lat_center lng_error lat_error

    1 36225745 122.94375 24.4541666667 0.00624999999999 0.00416666670000
    2 36225746 122.95625 24.4541666667 0.00624999999999 0.00416666670000
    3 36225755 122.94375 24.4625000000 0.00624999999999 0.00416666666666
    4 36225756 122.95625 24.4625000000 0.00624999999999 0.00416666666666
    5 36225757 122.96875 24.4625000000 0.00624999999999 0.00416666666666
    6 36225759 122.99375 24.4625000000 0.00625000000001 0.00416666666666
    7 36225766 122.95625 24.4708333333 0.00624999999999 0.00416666663333

    ల։͢Δʹ͸tidyr::unnest()
    ݩͷσʔλϑϨʔϜʹྻ͕௥Ճ͞ΕΔ

    View Slide

  41. άϧʔϓʹରͯ͠OFTU⁶VOOFTU
    iris_nest = iris %>%
    group_by(Species) %>% nest()
    # A tibble: 3 x 2
    Species data

    1 setosa
    2 versicolor
    3 virginica
    all_equal(iris_nest %>% unnest(),
    iris)
    # [1] TRUE

    View Slide

  42. άϧʔϓʹରͯ͠OFTU⁶VOOFTU
    iris_model = iris_nest %>%
    transmute(out = map(data, function(df) {
    broom::tidy(lm(Sepal.Length~Petal.Width, data=df))}))
    # A tibble: 3 x 1
    out

    1
    2
    3

    View Slide

  43. άϧʔϓʹରͯ͠OFTU⁶VOOFTU
    iris_model %>% unnest()
    # A tibble: 6 x 5
    term estimate std.error statistic p.value

    1 (Intercept) 4.777177508269 0.123912416859 38.55285555207
    8.98396856599e-38

    MN
    ͷ݁ՌΛσʔλϑϨʔϜʹ
    ͋ͱ͸͓޷͖ʹʂ

    View Slide

  44. Appendix
    20171216 Tokyo.R#66@Ropponngi

    View Slide

  45. keep(),
    discard()

    View Slide

  46. ཁૉͷอ࣋ͱআ֎
    res = x %>% map_at(1, nchar)
    res %>%
    keep(~ is.integer(.) == TRUE) %>%
    as_vector()
    [1] 7 # ཁૉͷ݁Ռ͕࢒Δ
    res %>%
    discard(~ is.integer(.) == TRUE) %>%
    as_vector()
    [1] "hoxom" "uribo" # ཁૉͷ݁ՌΛআ֎

    View Slide

  47. invoke()

    View Slide

  48. ཁૉʹద༻͢Δؔ਺Λมߋ
    library(stringr)
    c("str_to_upper", "str_to_title", "str_to_lower")
    %>% invoke_map_chr(x)
    # [1] "KAZUTAN" "Hoxom" "uribo"
    ؔ਺͸จࣈྻͱͯ͠༩͑Δ
    ֤ؔ਺΁ͷҾ਺͸ϦετͰ༩͑Δ

    View Slide

  49. ϦετΛ৞ΈࠐΈ
    1:3 %>% reduce(`+`)
    # [1] 6
    # 1 +2 + 3
    x %>% map(nchar) %>%
    reduce(c)
    # 7 5 5
    # ͜ͷ৔߹ɺmap_int()Ͱྑ͍
    ݸਓతʹɺmap_df()ରԠ͍ͯ͠ͳ͍
    ৔߹ʹಛʹΦεεϝ
    (ݱࡏͷsfύοέʔδ (v.0.5-5))ͱ͔

    View Slide

  50. partial()

    View Slide

  51. ؔ਺ͷҰ෦ͷҾ਺஋Λݻఆ ෦෼ద༻

    set.seed(71)
    f = partial(runif, n = rpois(1, 5), .lazy = FALSE)
    f
    # QBSUJBM
    Ҿ਺Ͱએݴͨ͠Ҿ਺Λ΋ͭؔ਺͕࡞੒͞ΕΔ
    function (...)
    runif(n = 4L, ...) 4͸set.seed(71)Λ༩͑ͨ࣌ͷ rpois(1,5)ͷ݁Ռ
    ෦෼తʹಉ͡ॲཧΛࢪ͢ࡍʹศར

    View Slide

  52. ؔ਺ͷҰ෦ͷҾ਺஋Λݻఆ ෦෼ద༻

    f()
    # [1] 0.555103868479 0.327369962120
    0.211666960036 0.316121358424
    # SVOJG
    ͷOҎ֎ͷҾ਺͸มߋՄೳ
    f(min = 0.2)
    # [1] 0.957813141309 0.729371172562
    0.911536924727 0.470407903753

    View Slide

  53. flatten()

    View Slide

  54. Ϧετͷ֊૚ΛҰஈ্͛Δ
    # ֊૚ͷϦετ
    x = list(
    list(hijiyama = c("kazutan")),
    list(tokyo = c("hoxom", "uribo")))
    x %>% flatten()
    # $hijiyama
    # [1] "kazutan"
    # $tokyo
    # [1] "hoxom" "uribo"
    2͔Β1֊૚ͷϦετʹ

    View Slide

  55. Ϧετͷ֊૚ΛҰஈ্͛Δ
    # VOMJTU
    ͱ͸ҟͳΓɺҰஈͣͭωετΛղফ͢ΔʢΑΓ҆શʣ
    x %>% unlist()
    # hijiyama tokyo1 tokyo2
    # "kazutan" "hoxom" "uribo"
    x %>% flatten() %>% flatten_chr()
    # [1] "kazutan" "hoxom" "uribo"
    ֊૚͕ͳ͘ͳΓɺϕΫτϧʢจࣈྻʣͱͯ͠ฦ٫

    View Slide

  56. ENJOY
    20171216 Tokyo.R#66@Ropponngi

    View Slide