Save 37% off PRO during our Black Friday Sale! »

Japan.R 2015: {purrr} による非テーブルデータの処理

22f56e55955b9aa693081ed5dc6400ae?s=47 Sinhrks
December 05, 2015

Japan.R 2015: {purrr} による非テーブルデータの処理

22f56e55955b9aa693081ed5dc6400ae?s=128

Sinhrks

December 05, 2015
Tweet

Transcript

  1. {purrr} ʹΑΔ ඇςʔϒϧσʔλͷॲཧ

  2. ࣗݾ঺հ • @sinhrks • ۀ຿: σʔλ෼ੳ (ϝʔΧʔ) • ར༻ݴޠ: Python,

    Rͱ͔ • झຯ: OSS ׆ಈ • pandas ※ ίϛολ (Python ύοέʔδ) ※ R ͷ data.frame + dplyr + tidyr + readr + haven + lubridate + stringr + ggplot2… (ҎԼུ ͷΑ͏ͳ΋ͷ
  3. ຊ೔͸ Θ͚Ͱ͸͋Γ·ͤΜ Λ dis Γʹདྷͨ

  4. ͳͥ Japan.R ʹ? • Rͷύοέʔδ࡞ͬͨ (JU)VC"XBSET ݱࡏ • ͜ͷR։ൃऀ͕͍͢͝ (2015)

    by @u_ribo બग़ • http://uribo.hatenablog.com/entry/2015/12/02/180004
  5. ͳͥ Λʁ

  6. ͳͥRΛ? • ϓϩάϥϛϯά͕ઐ໳Ͱ͸ͳ͍͕ɺઐ໳෼໺Ͱ ౷ܭ/ػցֶश͕ඞཁͩͬͨͷͰ • ΤϯδχΞ͕ͩɺ౷ܭ/ػցֶश͕ඞཁʹͳͬͨ ͷͰ

  7. ૝ఆ͢ΔϨϕϧͱΰʔϧ • લఏ: dplyr, ggplot2 ͸গ͠࢖ͬͨ͜ͱ͕͋Δ • ΰʔϧ: {purrr} ͷجຊతͳ࢖͍ํΛ஌Δ

    • ͜ΜͳίʔυΛॻ͔ͳͯ͘΋Α͘ͳΔ glm.fit1 <- glm(y ~ x, …) glm.fit2 <- glm(y ~ x + z, …) … predict(glm.fit1, newdata = newdata) predict(glm.fit2, newdata = newdata) …
  8. ຊ೔ͷ಺༰ {purrr} Ͱσʔλॲཧ

  9. σʔλॲཧͷྲྀΕ σʔλͷ४උ ϞσϦϯά ධՁ

  10. RʹΑΔσʔλॲཧ EBUBGSBNF ϦετϕΫτϧ Ϟσϧ HHQMPU UJEZS EQMZS SFBES SWFTU SMJTU

    QVSSS ֤छ౷ܭػցֶश ύοέʔδ CSPPN DBSFU HHGPSUJGZ
  11. data.frameͷॲཧ

  12. data.frameͷॲཧ • σʔλͷॲཧʹ͸ύλʔϯ͕͋Δ • ྫ: data.frame Λ ͋Δྻͷ஋ͰάϧʔϓΘ͚͠ɺ ྻ͝ͱʹฏۉ஋Λܭࢉ͠ɺ݁ՌΛ data.frame

    ʹ ͍ͨ͠ɻ ෼ׂ ద༻ ݁߹
  13. plyr • ૊ΈࠐΈ: library(plyr) plyr::ddply(iris, .(Species), plyr::colwise(mean)) ## Species Sepal.Length

    Sepal.Width Petal.Length Petal.Width ## 1 setosa 5.006 3.428 1.462 0.246 ## 2 versicolor 5.936 2.770 4.260 1.326 ## 3 virginica 6.588 2.974 5.552 2.026 iris_group <- split(iris, iris$Species) res <- sapply(iris_group, function(x) { sapply(x[, 1:4], mean) }) as.data.frame(t(res)) ## Sepal.Length Sepal.Width Petal.Length Petal.Width ## setosa 5.006 3.428 1.462 0.246 ## versicolor 5.936 2.770 4.260 1.326 ## virginica 6.588 2.974 5.552 2.026 • plyr: ෼ׂ - ద༻ - ݁߹ΛҰͭͷؔ਺Ͱ
  14. dplyr • plyr: ෼ׂ - ద༻ - ݁߹ΛҰͭͷؔ਺Ͱ library(plyr) plyr::ddply(iris,

    .(Species), plyr::colwise(mean)) ## Species Sepal.Length Sepal.Width Petal.Length Petal.Width ## 1 setosa 5.006 3.428 1.462 0.246 ## 2 versicolor 5.936 2.770 4.260 1.326 ## 3 virginica 6.588 2.974 5.552 2.026 library(dplyr) dplyr::summarise_each(dplyr::group_by(iris, Species), dplyr::funs(mean)) ## Source: local data frame [3 x 5] ## ## Species Sepal.Length Sepal.Width Petal.Length Petal.Width ## (fctr) (dbl) (dbl) (dbl) (dbl) ## 1 setosa 5.006 3.428 1.462 0.246 ## 2 versicolor 5.936 2.770 4.260 1.326 ## 3 virginica 6.588 2.974 5.552 2.026 • dplyr: ॲཧ͝ͱʹؔ਺Λద༻
  15. ύΠϓԋࢉࢠ %>% • magrittr iris %>% dplyr::group_by(Species) %>% dplyr::summarise_each(dplyr::funs(mean)) ##

    Source: local data frame [3 x 5] ## ## Species Sepal.Length Sepal.Width Petal.Length Petal.Width ## (fctr) (dbl) (dbl) (dbl) (dbl) ## 1 setosa 5.006 3.428 1.462 0.246 ## 2 versicolor 5.936 2.770 4.260 1.326 ## 3 virginica 6.588 2.974 5.552 2.026 x %>% f(…) == f(x, …)
  16. data.frameͷॲཧ • ߟ͑ํ • ॲཧΛγϯϓϧͳؔ਺ͷ૊Έ߹ΘͤͰهड़ • ؔ਺͸ύΠϓԋࢉࢠ %>% Ͱ઀ଓ •

    ಉ͜͡ͱΛ data.frame Ҏ֎Ͱ΋΍Γ͍ͨ
  17. data.frame Ҏ֎ͷॲཧ

  18. ͜Μͳ͜ͱ͋Γ·ͤΜ͔ʁ • ͋ΔॲཧΛɺ • ܁Γฦ͍ͨ͠ • ෳ਺ͷର৅ʹߦ͍͍ͨ • ͦͷ··Ͱ͸ data.frame

    ʹͰ͖ͳ͍σʔλ (JSON / XMLͳͲ) Λѻ͍͍ͨ
  19. ྫ͑͹ σʔλΛ8FC"1*͔Β+40/Ͱऔಘͯ͠੔ܗ ͋Δ৚݅ʹ߹க͠ͳ͍σʔλ͸আ֎ EBUBGSBNFΛ࡞੒ ෳ਺ͷϞσϧΛ࡞੒ൺֱϓϩοτ

  20. ૊ΈࠐΈؔ਺Ͱ΍Δͱ… • Apply ؔ਺܈ • ߴ֊ؔ਺ ※ ܈ BQQMZ lapply(list(1,

    2, 3), function (x) { x + 1 }) Map(function (x) { x + 1 }, list(1, 2, 3)) MBQQMZ TBQQMZ UBQQMZ NBQQMZ 3FEVDF 'JMUFS 'JOE .BQ /FHBUF 1PTJUJPO ແ໊ؔ਺ͷఆ͕ٛΊΜͲ͏ ؔ਺ͷద༻Ҏ֎ͷॲཧ͸Ͱ͖ͳ͍ ؔ਺͕ୈҰҾ਺ͷͨΊύΠϓԋࢉࢠ͕࢖͍ʹ͍͘ ֮͑ΒΕͳ͍ ※ ؔ਺ΛҾ਺΍ฦΓ஋ʹ͢Δؔ਺
  21. $SFBUJWF$PNNPOT$$GSPNUIF1JYBCBZ ͳΜͱ͔ͳΒͳ͍͔ʜ

  22. purrr !!! 1IPUPCZԶ

  23. {purrr} ͱ͸

  24. {purrr} ͱ͸ • RStudio BlogΑΓ • ؔ਺ܕϓϩάϥϛϯάͷͨΊͷπʔϧ • ॲཧΛγϯϓϧͳؔ਺ͷ૊Έ߹ΘͤͰهड़ •

    ؔ਺͸ύΠϓԋࢉࢠͰ઀ଓ Purrr is a new package that fills in the missing pieces in R’s functional programming tools: it’s designed to make your pure functions purrr. Like many of my recent packages, it works with magrittr to allow you to express complex operations by combining simple pieces in a standard way.
  25. ؔ਺ܕϓϩάϥϛϯάͱ͸? • (Ұҙͷఆ͕ٛ͋ΔΘ͚Ͱ͸ͳ͍͕) ෳ਺ͷؔ਺ͷ ૊Έ߹ΘͤʹΑͬͯ࠷ऴతʹ΍Γ͍ͨॲཧΛه ड़͍ͯ͘͠ελΠϧɻ • ؔ਺ܕϓϩάϥϛϯάΛ૝ఆͨ͠ݴޠ͸ؔ਺ ܕݴޠͱݺ͹ΕΔ

  26. R ͸ؔ਺ܕݴޠ͔? • Yes (by Hadley Wickham in “Advanced R”)

    . • ৄ͘͠͸ “Rݴޠపఈղઆ” (๜༁͕12/23ൃച)
  27. {purrr} ͷಛ௃ • ϥϜμࣜ • ύΠϓԋࢉࢠͰ઀ଓͰ͖Δؔ਺܈ • ஫ҙ఺ • όʔδϣϯ͸0.1ɻࠓޙ

    ഁյతͳมߋ͕͋ΔՄೳੑ͕͋Δɻ • ຊ೔͸ओཁͳؔ਺ͷΈ͝঺հ (͜ΕͰ΄ͱΜͲͷ͜ͱ͸Ͱ͖Δ)ɻ
  28. ϥϜμࣜ • ໊લΛ΋ͨͳ͍ؔ਺ (ແ໊ؔ਺) Λγϯϓϧʹهड़ • R ඪ४ • {purrr}

    ͷϥϜμࣜ • {purrr} ͷؔ਺ͷதͰ࢖͏ (ߴ֊ؔ਺) function(x) { x + 1 } ~ . + 1 map(x, ~ . + 1)  υοτ ͕Ҿ਺ʹରԠ
  29. {purrr} ͷؔ਺܈ (Ұ෦) ඪ४ QVSSS BQQMZܥ NBQ 'JMUFS LFFQ 3FEVDF

    SFEVDF TQMJU TQMJU@CZ TPSU TPSU@CZ
  30. map • ؔ਺Λ֤ཁૉʹద༻ map(c(1, 2, 3), ~ . + 1)

    ## x = list 3 (216 bytes) ## . [[1]] = double 1= 2 ## . [[2]] = double 1= 3 ## . [[3]] = double 1= 4 map(list(a = 1, b = 2, c = 3), ~ . + 1) ## x = list 3 (544 bytes) ## . a = double 1= 2 ## . b = double 1= 3 ## . c = double 1= 4 map(c(1, 2, 3), ~ . + 1) == list(1 + 1, 2 + 1, 3 + 1)
  31. map2 • 2ͭͷҾ਺ʹରͯؔ͠਺ద༻ map2(c(1, 2, 3), c(4, 5, 6), ~

    .x * .y) ## x = list 3 (216 bytes) ## . [[1]] = double 1= 4 ## . [[2]] = double 1= 10 ## . [[3]] = double 1= 18 map2(c(1, 2, 3), c(4, 5, 6), ~ .x * .y) == list(1 * 4, 2 * 5, 3 * 6) ͻͱͭΊ͔ΒͷཁૉΛY ;ͨͭΊ͔ΒͷཁૉΛZͰࢀর
  32. map_xxx • ݁ՌΛϕΫτϧͰऔಘɺܕͷࢦఆ͕ඞཁ map_int(list(a = 1L, b = 2L, c

    = 3L), ~ . + 1L) ## a b c ## 2 3 4 NBQ@YYY ϕΫτϧͷܕ NBQ@MHM MPHJDBM NBQ@DIS DIBSBDUFS NBQ@JOU JOUFHFS NBQ@ECM OVNSJD
  33. keep • ؔ਺ͷ৚݅ʹ͋ͯ͸·ΔཁૉΛநग़ keep(c(1, 2, 3), ~ . >= 2)

    ## [1] 2 3 keep(list(a = 1, b = 2, c = 3), ~ . >= 2) ## x = list 2 (416 bytes) ## . b = double 1= 2 ## . c = double 1= 3
  34. reduce • ͨͨΈࠐΈ reduce(c(1, 2, 3), `+`) ## [1] 6

    reduce(list(a = 1, b = 2, c = 3), `+`) ## [1] 6 reduce(c(1, 2, 3), `+`) == ((1 + 2) + 3)
  35. split_by • base::split + ϥϜμࣜ split_by(c(1, 2, 3), ~ .

    %% 2) ## x = list 2 (424 bytes) ## . 0 = double 1= 2 ## . 1 = double 2= 1 3 split_by(list(a = 1, b = 2, c = 3), ~ . %% 2) ## x = list 2 (1040 bytes) ## . 0 = list 1 ## . . b = double 1= 2 ## . 1 = list 2 ## . . a = double 1= 1 ## . . c = double 1= 3
  36. sort_by • ιʔτ sort_by(c(2, -3, 1), ~ abs(.)) ## [1]

    1 2 -3 sort_by(list(a = 2, b = - 3, c = 1), ~ abs(.)) ## x = list 3 (544 bytes) ## . c = double 1= 1 ## . a = double 1= 2 ## . b = double 1= -3
  37. 2. αϯϓϧ

  38. {purrr} • 1. μϛʔσʔλͷ࡞੒ • 2. Ϧετ͔Βͷσʔλબ୒ • 3. ෳ਺Ϟσϧͷ࡞੒ͱൺֱ

    • 4. ϓϩοτ
  39. 1. μϛʔσʔλͷ࡞੒ • ద౰ͳμϛʔσʔλΛ࡞Γ͍ͨ • ҎԼͷଐੑΛ΋ͭϦετͷϦετ • name: ໊લ •

    age: ೥ྸ • likes: ޷͖ͳਓ
  40. 1. μϛʔσʔλͷ࡞੒ ndata <- c('hoxo-m', 'hoxo-eros', 'hoxo-um', 'hoxo-uri') dummies <-

    map(1:5, ~ list(name = sample(ndata, size = 1), age = sample(25:35, size = 1), likes = sample(ndata, size = 2))) dummies ## x = list 5 (3632 bytes) ## . [[1]] = list 3 ## . . name = character 1= hoxo-eros ## . . age = integer 1= 29 ## . . likes = character 2= hoxo-um hoxo-uri ## . [[2]] = list 3 ## . . name = character 1= hoxo-m ## . . age = integer 1= 34 ## . . likes = character 2= hoxo-uri hoxo-eros ## . [[3]] = list 3 ## . . name = character 1= hoxo-um ## . . age = integer 1= 25 ## . . likes = character 2= hoxo-m hoxo-uri ## . [[4]] = list 3 ## . . name = character 1= hoxo-um ## . . age = integer 1= 29 ## . . likes = character 2= hoxo-uri hoxo-eros ## . [[5]] = list 3 ## . . name = character 1= hoxo-um ## . . age = integer 1= 35 ## . . likes = character 2= hoxo-eros hoxo-um ϦετΛฦ͢ϥϜμࣜ
  41. 1. μϛʔσʔλͷ࡞੒ dummies <- map(ndata, ~ list(name = ., age

    = sample(30:35, size = 1), likes = sample(ndata[ndata != .], size = sample(1:(length(ndata) - 1), size = 1)))) dummies ## x = list 4 (2896 bytes) ## . [[1]] = list 3 ## . . name = character 1= hoxo-m ## . . age = integer 1= 31 ## . . likes = character 2= hoxo-um hoxo-uri ## . [[2]] = list 3 ## . . name = character 1= hoxo-eros ## . . age = integer 1= 31 ## . . likes = character 3= hoxo-uri hoxo-um ... ## . [[3]] = list 3 ## . . name = character 1= hoxo-um ## . . age = integer 1= 30 ## . . likes = character 1= hoxo-m ## . [[4]] = list 3 ## . . name = character 1= hoxo-uri ## . . age = integer 1= 34 ## . . likes = character 2= hoxo-um hoxo-m
  42. 1. μϛʔσʔλͷ࡞੒ gen <- function(name) { age = sample(30:35, size

    = 1) size = sample(1:(length(ndata) - 1), size = 1) likes = sample(ndata[ndata != name], size = size) return (list(name = name, age = age, likes = likes)) } dummies <- map(ndata, ~ gen(.)) dummies ## x = list 4 (2896 bytes) ## . [[1]] = list 3 ## . . name = character 1= hoxo-m ## . . age = integer 1= 31 ## . . likes = character 2= hoxo-um hoxo-uri ## . [[2]] = list 3 ## . . name = character 1= hoxo-eros ## . . age = integer 1= 31 ## . . likes = character 3= hoxo-uri hoxo-um ... ## . [[3]] = list 3 ## . . name = character 1= hoxo-um ## . . age = integer 1= 30 ## . . likes = character 1= hoxo-m ## . [[4]] = list 3 ## . . name = character 1= hoxo-uri ## . . age = integer 1= 34 ## . . likes = character 2= hoxo-um hoxo-m ී௨ͷؔ਺Λ౉ͯ͠΋Α͍
  43. 1. μϛʔσʔλͷ࡞੒ ndata %>% map(~ list(name = ., age =

    sample(30:35, size = 1))) %>% dplyr::bind_rows() EBUBGSBNFʹ͚ͨ͠Ε͹ EQMZSCJOE@SPXT ## Source: local data frame [4 x 2] ## ## name age ## (chr) (int) ## 1 hoxo-m 34 ## 2 hoxo-eros 35 ## 3 hoxo-um 32 ## 4 hoxo-uri 34
  44. 2. Ϧετ͔Βͷσʔλબ୒ • ؆୯ͳૢ࡞͸ {purrr} Ͱ΋Ͱ͖Δ • Ϩίʔυͷબ୒ • ଐੑͷબ୒

    • ࢀߟ: {rlist} ͱͷൺֱ • {purrr} ͰϦετσʔλΛૢ࡞͢Δ <1> • {purrr} ͰϦετσʔλΛૢ࡞͢Δ <2>
  45. 2. Ϧετ͔Βͷσʔλબ୒ keep(dummies, ~ .$name == 'hoxo-m') ## x =

    list 1 (752 bytes) ## . [[1]] = list 3 ## . . name = character 1= hoxo-m ## . . age = integer 1= 31 ## . . likes = character 2= hoxo-um hoxo-uri OBNFଐੑ͕bIPYPN`ͷ ϨίʔυΛબ୒
  46. 2. Ϧετ͔Βͷσʔλબ୒ keep(dummies, ~ .$age > 30) ## x =

    list 3 (2256 bytes) ## . [[1]] = list 3 ## . . name = character 1= hoxo-m ## . . age = integer 1= 31 ## . . likes = character 2= hoxo-um hoxo-uri ## . [[2]] = list 3 ## . . name = character 1= hoxo-eros ## . . age = integer 1= 31 ## . . likes = character 3= hoxo-uri hoxo-um ... ## . [[3]] = list 3 ## . . name = character 1= hoxo-uri ## . . age = integer 1= 34 ## . . likes = character 2= hoxo-um hoxo-m BHFଐੑ͕ΑΓେ͖͍ ͓͡͞ΜΛબ୒ ˞࣮ࡍͷ೥ྸͱ͸ҟͳΔՄೳੑ͕͋Γ·͢
  47. 2. Ϧετ͔Βͷσʔλબ୒ keep(dummies, ~ 'hoxo-m' %in% .$likes) ## x =

    list 3 (2192 bytes) ## . [[1]] = list 3 ## . . name = character 1= hoxo-eros ## . . age = integer 1= 31 ## . . likes = character 3= hoxo-uri hoxo-um ... ## . [[2]] = list 3 ## . . name = character 1= hoxo-um ## . . age = integer 1= 30 ## . . likes = character 1= hoxo-m ## . [[3]] = list 3 ## . . name = character 1= hoxo-uri ## . . age = integer 1= 34 ## . . likes = character 2= hoxo-um hoxo-m bIPYPN`͞Μͷ͜ͱ͕ ޷͖ͳਓΛநग़ ˞ࣄ࣮ͱ͸ҟͳΔՄೳੑ͕͋Γ·͢
  48. 2. Ϧετ͔Βͷσʔλબ୒ keep(dummies, ~ 'hoxo-m' %in% .$likes) %>% map(~ .$name)

    ## x = list 3 (376 bytes) ## . [[1]] = character 1= hoxo-eros ## . [[2]] = character 1= hoxo-um ## . [[3]] = character 1= hoxo-uri ϑΟϧλ݁Ռ͔Β OBNFଐੑͷΈΛநग़ keep(dummies, ~ 'hoxo-m' %in% .$likes) %>% map('name') ## x = list 3 (376 bytes) ## . [[1]] = character 1= hoxo-eros ## . [[2]] = character 1= hoxo-um ## . [[3]] = character 1= hoxo-uri NBQʹจࣈྻΛ ౉ͯ͠΋Α͍
  49. • σʔλղੳͷͨΊͷ౷ܭϞσϦϯάೖ໳ • ୈ3ষΑΓ • y: छࢠ਺ • x: ২෺ͷαΠζ

    • f: ࢪං͋Γ(T) / ͳ͠ 3. ෳ਺Ϟσϧͷ࡞੒ͱൺֱ df <- read.csv('data3a.csv') head(df) ## y x f ## 1 6 8.31 C ## 2 6 9.44 C ## 3 6 9.50 C ## 4 12 9.07 C ## 5 10 10.16 C ## 6 4 8.32 C
  50. 3. ෳ਺Ϟσϧͷ࡞੒ͱൺֱ formulas <- list(mod1 = y ~ x, mod2

    = y ~ f, mod3 = y ~ x + f) results <- purrr::map(formulas, ~ glm(formula = ., family = poisson, data = df)) results[[1]] Call: glm(formula = .x, family = .y, data = df) Coefficients: (Intercept) x 1.29172 0.07566 Degrees of Freedom: 99 Total (i.e. Null); 98 Residual Null Deviance: 89.51 Residual Deviance: 84.99 AIC: 474.8 GPSNVMBͷϦετΛ࡞੒ ֤GPSNVMBͰ ϞσϧΛ࡞੒
  51. 3. ෳ਺Ϟσϧͷ࡞੒ͱൺֱ map(results, logLik) ## x = list 3 (2128

    bytes) ## . mod1 = double 1( logLik )= -235.39 ## . A nobs = integer 1= 100 ## . A df = integer 1= 2 ## . mod2 = double 1( logLik )= -237.63 ## . A nobs = integer 1= 100 ## . A df = integer 1= 2 ## . mod3 = double 1( logLik )= -235.29 ## . A nobs = integer 1= 100 ## . A df = integer 1= 3 ݁Ռ͢΂ͯʹ MPH-JLؔ਺Λద༻
  52. 3. ෳ਺Ϟσϧͷ࡞੒ͱൺֱ map(results, AIC) ## x = list 3 (544

    bytes) ## . mod1 = double 1= 474.77 ## . mod2 = double 1= 479.25 ## . mod3 = double 1= 476.59 sort_by(results, AIC) %>% names() ## [1] "mod1" "mod3" "mod2" ݁Ռ͢΂ͯʹ "*$ؔ਺Λద༻ "*$ͷॱʹฒ΂ସ͑ɺ Ϟσϧ໊Λදࣔ
  53. 3. ෳ਺Ϟσϧͷ࡞੒ͱൺֱ formulas <- c(y ~ x, y ~ x,

    y ~ f) families <- c(poisson, gaussian, poisson) results2 <- purrr::map2(formulas, families, ~ glm(formula = .x, family = .y, data = df)) results2[[1]] Call: glm(formula = .x, family = .y, data = df) Coefficients: (Intercept) x 1.29172 0.07566 Degrees of Freedom: 99 Total (i.e. Null); 98 Residual Null Deviance: 89.51 Residual Deviance: 84.99 AIC: 474.8 ෳ਺ύϥϝʔλͷ৔߹͸ NBQ
  54. 4. ϓϩοτ plot(results[[1]], which = 2) map(results, ~ plot(., which

    = 2)) ݁Ռͷ͏ͪͻͱͭΛ 22ϓϩοτ ͢΂ͯΛ22ϓϩοτ ॱ൪ʹදࣔ͞ΕΔ αϒϓϩοτ͚ͨ͠Ε͹QBS
  55. ͓·͚

  56. {purrr} ͱ૊Έ߹Θͤͯ ಛʹ͏Ε͍͠ύοέʔδ

  57. {caret} • ػցֶशશ෦ೖΓ library(mlbench) library(caret) data(PimaIndiansDiabetes) control <- trainControl(method =

    "repeatedcv", number = 10, repeats = 3) methods <- c('gbm', 'rpart', 'svmRadial') trained <- methods %>% purrr::map(~ train(diabetes ~ ., data = PimaIndiansDiabetes, method = ., trControl = control)) resampled <- resamples(trained) summary(resampled) ## Call: ## summary.resamples(object = resampled) ## ## Models: Model1, Model2, Model3 ## Number of resamples: 30 ## ## Accuracy ## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's ## Model1 0.6883 0.7435 0.7662 0.7682 0.7915 0.8442 0 ## Model2 0.6711 0.7273 0.7451 0.7499 0.7785 0.8312 0 ## Model3 0.6883 0.7403 0.7662 0.7678 0.7922 0.8312 0 ## ## Kappa ## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's ## Model1 0.3037 0.4297 0.4763 0.4711 0.5268 0.6578 0 ## Model2 0.2339 0.3657 0.4078 0.4245 0.4733 0.6128 0 ## Model3 0.2518 0.4043 0.4610 0.4607 0.5138 0.6165 0 ൺֱ͍ͨ͠ϞσϧΛϦετʹ͠ɺ ֤ϞσϧΛUSBJO
  58. {broom} • ϞσϧΛ data.frame ʹม׵͢Δ library(broom) glance(results[[1]]) ## null.deviance df.null

    logLik AIC BIC deviance df.residual ## 1 89.50694 99 -235.3863 474.7725 479.9828 84.993 98 ؔ਺ ֓ཁ UJEZ Ϟσϧͷཁ໿ BVHNFOU Ϟσϧͷ৘ใΛݩͷEBUBGSBNFʹ෇༩͢Δ HMBODF Ұߦͷཁ໿ • 3. ෳ਺Ϟσϧͷ࡞੒ͱൺֱͷ݁ՌΛྲྀ༻
  59. {broom} • ϞσϧΛ data.frame ʹม׵͢Δ map(results, broom::glance) %>% dplyr::bind_rows() ##

    Source: local data frame [3 x 7] ## ## null.deviance df.null logLik AIC BIC deviance df.residual ## (dbl) (int) (dbl) (dbl) (dbl) (dbl) (int) ## 1 89.50694 99 -235.3863 474.7725 479.9828 84.99300 98 ## 2 89.50694 99 -237.6273 479.2545 484.4649 89.47501 98 ## 3 89.50694 99 -235.2937 476.5874 484.4029 84.80793 97 ෳ਺ͷϞσϧͷཁ໿Λ EBUBGSBNFʹ
  60. {ggfortify} • ϞσϧΛ ggplot2::autoplot Λར༻ͯ͠ඳը͢Δ • {purrr} ͰϞσϧͷϦετΛ࡞Δ -> autoplot

    library(ggfortify) autoplot(results, which = 2)
  61. ·ͱΊ • {purrr} ͸ؔ਺ܕϓϩάϥϛϯάͷͨΊͷπʔϧ • ॲཧΛγϯϓϧͳؔ਺ͷ૊Έ߹ΘͤͰهड़ • ؔ਺͸ύΠϓԋࢉࢠͰ઀ଓ

  62. Enjoy!