tidyverse tutorial 1

kur0cky
September 20, 2019

A beginner's introduction to the tidyverse
For lectures

Transcript

  1. Data Analysis and Preprocessing. Kuroki, @ed.tus.ac.jp

  2. Contents
    • What is data analysis?
    • R and RStudio
    • R basics
    • Modern data frame manipulation

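  3. Data used today
    • starwars: data on the characters in Star Wars (https://swapi.co)
    • flights: on-time data for all flights that departed LGA, JFK, or EWR over one year
    • weather: weather and wind information for LGA, JFK, and EWR (hourly)
    • airlines: table of airlines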
  4. What is data analysis?

  5. What is data analysis? Diagram: Understand, Import, Output/Share

  6. What is data analysis? Diagram: Import, Output/Share, Visualize, Modelling, Interpret, Transform

  7. Tools: there are many
    • Calculator
    • Microsoft Excel
    • IBM SPSS
    • SAS
    • Python
    • R
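  8. Tools: choose whichever you like
    • Personal preference, popularity, salary, having someone nearby who is good at it, etc.

    Tool        BigData  Machine learning  Statistical analysis  Data handling
    SQL         ◎        ✕                 ✕                     ○
    R           △        ○                 ◎                     ◎
    Python      △        ◎                 ○                     ○
    Excel       ✕        ✕                 ✕                     ✕
    Calculator  ✕        ✕                 ✕                     ✕

    (◎ excellent, ○ good, △ fair, ✕ poor)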
  9. R and RStudio

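  10. R: it is my strong suit, so I will introduce R
    • An interpreted language (as opposed to a compiled language)
    Strengths
    • OSS
    • A wide range of statistical analysis and machine learning libraries
    • The existence of RStudio, Inc.
    Weaknesses
    • Slow (fast when the internals are written in C or similar)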
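  11. Let's use RStudio
    • An integrated development environment (IDE) specialized for R
    • Not only an excellent editor, it also offers many extensions
    • Plenty of keyboard shortcuts
    • Git integration
    • Free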
  12. The RStudio screen
    ① Editor
    ② Console
    ③ Objects, history, etc.
    ④ Plots, help, etc.
  13. Let's start it up
    • Launch RStudio
    • Open the distributed Exercise.RProj
    • Try running R
    • In the console: plot(iris)
    • Try installing a package (note that this takes a while)
    • install.packages("tidyverse")
    • Try creating a script
    • File > New File > R Script
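  14. What is an R Project?
    • A mechanism for managing analyses project by project
    • Handles the set of files needed for one analysis together
    • The project folder becomes the working directory
    • Easy to share with others
    • Create one with File > New Project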
  15. Let's configure the environment
    • Configure via Tools > Global Options
    • General: uncheck everything
    • "Restore previously open source documents at startup" can stay checked
    • Code > Saving: set the encoding to UTF-8
    • Appearance: choose whatever look you like
  16. R basics

  17. Basics
    • for statements, if statements, functions, and so on
    • These differ little from other languages, so look them up when you need them
    • Array indexing starts at 1!
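
A minimal sketch of the basics named above: a function, an if statement, a for loop, and 1-based indexing. The `classify` helper, the 700 threshold, and the sample prices are made up for illustration.

```r
# Hypothetical helper: label a price using an if/else expression
classify <- function(price) {
  if (price >= 700) "expensive" else "cheap"
}

prices <- c(480, 700, 850)           # made-up sample values
labels <- character(length(prices))  # pre-allocate the result vector

# seq_along() yields 1, 2, 3: R indexes from 1, not 0
for (i in seq_along(prices)) {
  labels[i] <- classify(prices[i])
}

prices[1]  # first element: 480
labels     # "cheap" "expensive" "expensive"
```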
  18. Data types in R
    Single values
    • Logical (Boolean)
    • Integer
    • Double
    • Complex
    • Character
    • Factor
    • etc.
    Multiple values
    • Atomic Vector
    • Matrix
    • DataFrame
    • List
    • etc.
  19. Atomic Vector
    • One-dimensional array
    • All elements have the same type (all integer, all character, etc.)
    • All elements can be processed at once (e.g. arithmetic)
    • Created with c()
  20. Atomic Vector
        drink <- c("beer", "sake", "whisky")  # assignment
        drink                                 # print the object
        price <- c(480, 700, 850)             # numeric vector
        favorite <- c(TRUE, TRUE, TRUE)       # logical vector
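
To illustrate the "process all elements at once" and "same type" points, here is a small sketch in base R. The tax rate and the discount values are arbitrary numbers chosen for the example.

```r
price <- c(480, 700, 850)

# Arithmetic applies to every element at once
price_with_tax <- price * 1.1          # 1.1 is an arbitrary illustrative rate
discounted <- price - c(30, 50, 100)   # element-wise with another vector

# Mixing types does not fail: everything is coerced to one common type
mixed <- c(1, "a", TRUE)
class(mixed)  # "character"
```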
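  21. Matrix
    • Two-dimensional array
    • All elements have the same type
    • Supports matrix operations (e.g. inner products)
    • All elements can be processed at once (e.g. arithmetic)
    • Created with matrix()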
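
A short sketch of the two kinds of operation mentioned for matrices, using base R only. The values 1 to 6 are arbitrary.

```r
# matrix() fills column by column: rows are (1, 3, 5) and (2, 4, 6)
m <- matrix(1:6, nrow = 2)

doubled <- m * 2        # element-wise arithmetic
gram <- m %*% t(m)      # matrix product: 2 x 3 times 3 x 2 gives 2 x 2

dim(m)     # 2 3
gram       # rows: (35, 44) and (44, 56)
```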
  22. List
    • One-dimensional array
    • Elements can be anything (an extension of Vector)
    • Created with list()
    • Extremely flexible
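
As a sketch of "elements can be anything", the list below mixes a string, a number, a vector, and a logical value. The field names and values are invented for the example.

```r
# Each element may have its own type and length
item <- list(
  name = "beer",
  price = 480,
  tags = c("cold", "bitter"),
  in_stock = TRUE
)

item$price           # access by name with $: 480
item[["tags"]][2]    # [[ ]] extracts the element itself: "bitter"
length(item)         # 4
```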
  23. DataFrame: the familiar rectangular table
    • Each column is an atomic vector of the same length
    • Created with data.frame()
    • data.frame(drink = drink, price = price, favorite = favorite)
    • Many packages are built around the DataFrame
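
A small sketch showing that each column of a data frame really is a vector, using base R. The object name `menu` and the 500 cutoff are arbitrary choices for the example.

```r
menu <- data.frame(
  drink = c("beer", "sake", "whisky"),
  price = c(480, 700, 850),
  favorite = c(TRUE, TRUE, TRUE),
  stringsAsFactors = FALSE   # keep drink as character, not factor
)

menu$price                          # one column, returned as a numeric vector
pricey <- menu[menu$price > 500, ]  # rows selected by a logical vector
dim(menu)                           # 3 3
```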
  24. Let's work with a DataFrame
        starwars <- read.csv("data/starwars.csv",
                             stringsAsFactors = FALSE,
                             fileEncoding = "UTF-8")
        head(starwars)     # check the first rows
        tail(starwars)     # check the last rows
        summary(starwars)  # descriptive statistics
        str(starwars)      # check the type of each column
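
If data/starwars.csv is not at hand, the same inspection steps can be tried on a throwaway file. This sketch writes a small made-up table to a temporary CSV and reads it back with the same read.csv() arguments; the table contents are invented.

```r
# Write a tiny made-up table to a temporary CSV file
path <- tempfile(fileext = ".csv")
write.csv(
  data.frame(drink = c("beer", "sake", "whisky"), price = c(480, 700, 850)),
  path,
  row.names = FALSE
)

# Read it back the same way the slide reads starwars.csv
menu <- read.csv(path, stringsAsFactors = FALSE, fileEncoding = "UTF-8")

head(menu, 2)        # first two rows
str(menu)            # column types
summary(menu$price)  # descriptive statistics for one column
```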
  25. Modern data frame manipulation

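  26. What it means to "master a tool", in cooking terms
    • Learn to use the knife and the stove
    • Look up recipes as needed
    What it means to "handle data at lightning speed"
    • You can repeat an enormous amount of trial and error
    • Time is finite: get the non-essential work out of the way quickly, then go out for drinks (or rather, do your research)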
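  27. Tidyverse
    Concept
    • Wouldn't it be nice if the various operations in R (Import, Export, Transform, Visualization, etc.) could all be done through a unified interface?
    Packages
    • A collection of packages that makes the above possible
    • Install with install.packages("tidyverse")
    • Developed mainly by Hadley Wickham (RStudio, Inc.)
    • Extremely convenient
    ※ Hadley Wickham: the god of the R world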
  28. Tidyverse

  29. Tidyverse: these packages in particular

  30. library(tidyverse)

  31. Basic DataFrame operations (dplyr)
    • Extract variables (columns): select()
    • Extract observations (rows): filter()
    • Sort observations (rows): arrange()
    • Create new variables (columns): mutate()
    • Aggregate: summarise()
    • Group: group_by()
  32. Usage
    • Give the data frame as the first argument
    • From the second argument on, give column names without quotation marks
    • The return value is a new data frame
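
The three usage rules can be checked on a toy table. This is a sketch that assumes dplyr is installed (it is loaded by library(tidyverse)); the `drinks` table and the 1.1 rate are made up.

```r
library(dplyr)

drinks <- data.frame(
  drink = c("beer", "sake", "whisky"),
  price = c(480, 700, 850),
  stringsAsFactors = FALSE
)

# First argument: the data frame. Later arguments: unquoted column names.
cheap <- filter(drinks, price <= 700)
with_tax <- mutate(drinks, price_with_tax = price * 1.1)  # arbitrary rate

nrow(cheap)      # 2
names(with_tax)  # "drink" "price" "price_with_tax"
ncol(drinks)     # still 2: the original data frame is not modified
```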

  33. Let's try it
        select(starwars, name, gender, species)
        filter(starwars, species == "Human", height <= 170)
        mutate(starwars, BMI = mass / (height/100)^2)
        arrange(starwars, gender, height)
        summarise(starwars,
                  mean_mass = mean(mass, na.rm = TRUE),
                  mean_height = mean(height, na.rm = TRUE))
        grouped <- group_by(starwars, species)
        summarise(grouped,
                  mean_mass = mean(mass, na.rm = TRUE),
                  mean_height = mean(height, na.rm = TRUE),
                  count = n())
  34. %>%

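  35. The pipe operator %>%
        X %>% f          is equivalent to  f(X)
        X %>% f(y)       is equivalent to  f(X, y)
        X %>% f %>% g    is equivalent to  g(f(X))
        X %>% f(y, .)    is equivalent to  f(y, X)
    It passes the output of the previous function to the first argument of the next function.
    Type it with Cmd + Shift + M / Ctrl + Shift + M.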
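
The four equivalences can be verified directly with base functions standing in for f and g. This sketch assumes dplyr is installed, which re-exports %>% from magrittr; the choice of sum, round, and sqrt is arbitrary.

```r
library(dplyr)  # re-exports %>% from magrittr

x <- c(1.234, 5.678)

same_1 <- identical(x %>% sum, sum(x))                 # X %>% f        vs f(X)
same_2 <- identical(x %>% round(1), round(x, 1))       # X %>% f(y)     vs f(X, y)
same_3 <- identical(x %>% sum %>% sqrt, sqrt(sum(x)))  # X %>% f %>% g  vs g(f(X))
same_4 <- identical(2 %>% round(pi, .), round(pi, 2))  # X %>% f(y, .)  vs f(y, X)

c(same_1, same_2, same_3, same_4)  # all TRUE
```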
  36. When performing multiple steps
        df1 <- filter(starwars, species == "Human")
        df2 <- mutate(df1, BMI = mass / (height/100)^2)
        df3 <- group_by(df2, gender)
        df4 <- summarise(df3,
                         mean_BMI = mean(BMI, na.rm=TRUE),
                         min_BMI = min(BMI, na.rm=TRUE),
                         max_BMI = max(BMI, na.rm=TRUE))

        # A tibble: 2 x 4
          gender mean_BMI min_BMI max_BMI
          <chr>     <dbl>   <dbl>   <dbl>
        1 female     22.0    16.5    27.5
        2 male       26.0    21.5    37.9
  37. When performing multiple steps
        starwars %>%
          filter(species == "Human") %>%
          mutate(BMI = mass / (height/100)^2) %>%
          group_by(gender) %>%
          summarise(mean_BMI = mean(BMI, na.rm=TRUE),
                    min_BMI = min(BMI, na.rm=TRUE),
                    max_BMI = max(BMI, na.rm=TRUE))
    Using %>% makes the code easy for anyone to read!
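  38. Exercises
    1. Find the shortest male character
    2. Who is the character with the highest BMI?
    3. Which species has the highest mean height?
    Search for them and check what they look like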

  39. Assignment for next time

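  40. Assignment
    1. For each departure airport (origin), find the number of flights, the mean flight distance, and the mean departure delay
    2. Find the mean departure delay for each date
    3. Find the number of cancelled flights for each date (for a cancelled flight, dep_delay and arr_delay are NA)
    4. For flights that actually flew, find the standard deviation of flight distance for each departure airport and sort the result in ascending order
    5. For each date, extract the first and the last flight that departed from LGA
    6. Among flights that actually flew, find the proportion that departed on time
    Hint: using n() inside dplyr functions gives the number of rows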
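
To show how the n() hint works without touching the assignment data, here is a sketch on a made-up three-row table. It assumes dplyr is installed; the airport codes "A" and "B" and the distances are invented.

```r
library(dplyr)

# Made-up stand-in for a flights-like table
toy <- data.frame(
  origin = c("A", "A", "B"),
  distance = c(100, 300, 200)
)

# n() counts the rows in each group
counts <- toy %>%
  group_by(origin) %>%
  summarise(flights = n(), mean_distance = mean(distance))

counts$flights        # 2 1
counts$mean_distance  # 200 200
```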