tidyverse tutorial 2

kur0cky
September 27, 2019

A super-introduction to the tidyverse, part 2
Lecture materials

Transcript

  1. Data Analysis and Preprocessing II. 黒木裕鷹 (master's student), @ed.tus.ac.jp

  2. Contents: 1. Review Exercise 2. Join 3. Tidy Data

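  3. Data used today
    • starwars: data on the characters in Star Wars (https://swapi.co)
    • flights: on-time data for all flights that departed LGA, JFK, or EWR in 2013
    • weather: weather and wind information for LGA, JFK, and EWR (hourly)
    • airlines: table of airlines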
  4. Review Exercise

  5. Basic DataFrame operations (dplyr)
    • select(): extract variables (columns)
    • filter(): extract observations (rows)
    • arrange(): reorder observations (rows)
    • mutate(): create new variables (columns)
    • summarise(): aggregate
    • group_by(): group
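  6. Usage
    • The first argument is a data frame.
    • From the second argument onward, column names are given without quotation marks.
    • The return value is a new data frame.
    Combine with %>% for blazing-fast data handling!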
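The six verbs and the pipe described above can be sketched together as follows. This is a minimal illustration, not taken from the slides: the `toy` data frame and its values are made up, and only its column names echo the flights data.

```r
library(dplyr)

# Hypothetical toy data (not from the slides): three flights per carrier.
toy <- tibble(
  carrier  = c("AA", "AA", "AA", "UA", "UA", "UA"),
  distance = c(1000, 500, 1500, 300, 700, 200),
  air_time = c(150, 80, 210, 50, 100, 40)
)

result <- toy %>%
  select(carrier, distance, air_time) %>%        # keep these columns
  filter(distance >= 300) %>%                    # keep rows with distance >= 300
  mutate(speed = distance / air_time * 60) %>%   # new column: distance per hour
  group_by(carrier) %>%                          # one group per carrier
  summarise(n = n(), mean_distance = mean(distance)) %>%
  arrange(desc(mean_distance))                   # largest mean first

print(result)
```

Each step takes a data frame as its first argument (supplied by `%>%`) and returns a new one, which is what makes the chain possible.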

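  7. Exercise: answer the following questions about the flights data.
    1. What are the origin and destination of the flight with the longest flight distance?
    2. Which airline has the most pronounced arrival delays?
    3. Which airline has the most pronounced departure and arrival delays?
    4. Which departure hour has the most flights?
    5. When is the airlines' busy season?
    6. Confirm that dep_time - sched_dep_time = dep_delay holds for every row.

    # Load the data from the package
    library(nycflights13)
    data(flights)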
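One point worth knowing before attempting item 6: in nycflights13, dep_time and sched_dep_time are clock readings in HHMM form, while dep_delay is in minutes, so a plain subtraction does not match in general. The sketch below shows one possible way to compare them. The rows in `toy_flights` are made up for illustration, and `hhmm_to_minutes` is a hypothetical helper, not part of any package.

```r
library(dplyr)

# Hypothetical toy rows that mimic three flights columns; not real records.
toy_flights <- tibble(
  sched_dep_time = c(515, 1655, 2359),
  dep_time       = c(517, 1702, 5),
  dep_delay      = c(2, 7, 6)
)

# Convert an HHMM clock reading (e.g. 1702) to minutes since midnight.
hhmm_to_minutes <- function(x) (x %/% 100) * 60 + x %% 100

checked <- toy_flights %>%
  mutate(
    naive_diff  = dep_time - sched_dep_time,
    minute_diff = hhmm_to_minutes(dep_time) - hhmm_to_minutes(sched_dep_time),
    # Compare modulo one day (1440 minutes) so departures after midnight match.
    consistent  = (minute_diff - dep_delay) %% 1440 == 0
  )

print(checked)
```

On the real flights table, cancelled flights have NA departure times, so those rows would need to be filtered out or handled separately.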
  8. Join

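  9. Join: an operation that combines two tables based on a key
    • "Student personal information" table
    • "Course information" table
    • "Enrollment and grades" table
    Keys
    • "Students" and "grades": the key is the student ID number
    • "Courses" and "enrollment": the key is the course ID
    Result: a "people, courses, and grades" table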
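The student, course, and grades example above could look like this in dplyr. All table contents and column names (`student_id`, `course_id`, and so on) are assumptions made up for this sketch.

```r
library(dplyr)

# Hypothetical tables and column names, made up for illustration.
students <- tibble(student_id = c(1, 2, 3), name = c("Aoki", "Baba", "Chiba"))
courses  <- tibble(course_id = c("S1", "M1"), title = c("Statistics", "Linear Algebra"))
grades   <- tibble(
  student_id = c(1, 1, 2),
  course_id  = c("S1", "M1", "S1"),
  grade      = c("A", "B", "A")
)

combined <- grades %>%
  inner_join(students, by = "student_id") %>%  # key: student ID number
  inner_join(courses,  by = "course_id")       # key: course ID

print(combined)
```

The result has one row per enrollment, with the student's name and the course title attached through the two keys.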
  10. Types of join
    • We want to join x and y.
    • The simplest is the inner join.
    • It keeps only the keys that appear in both tables.
    Source: https://r4ds.had.co.nz

  11. • Left join: keeps all keys of x
    • Right join: keeps all keys of y
    • Full join: keeps all keys of both
    Source: https://r4ds.had.co.nz
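The four join types can be compared side by side on two small tables. The tables `x` and `y` below are hypothetical and exist only to show which keys survive each join.

```r
library(dplyr)

# Hypothetical tables: keys 1 and 2 are shared, 3 is only in x, 4 is only in y.
x <- tibble(key = c(1, 2, 3), val_x = c("x1", "x2", "x3"))
y <- tibble(key = c(1, 2, 4), val_y = c("y1", "y2", "y3"))

inner <- inner_join(x, y, by = "key")  # only keys found in both tables
left  <- left_join(x, y, by = "key")   # every key of x
right <- right_join(x, y, by = "key")  # every key of y
full  <- full_join(x, y, by = "key")   # every key of either table

print(full)
```

Rows without a match on the other side are kept with NA in the columns that come from the missing table.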
  12. How to use the **_join() functions

    inner_join(band_members, band_instruments, by = "name")
    left_join(band_members, band_instruments2, by = c("name" = "artist"))

    > band_members
      name  band
    1 Mick  Stones
    2 John  Beatles
    3 Paul  Beatles
    > band_instruments
      name  plays
    1 John  guitar
    2 Paul  bass
    3 Keith guitar
    > band_instruments2
      artist plays
    1 John   guitar
    2 Paul   bass
    3 Keith  guitar
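  13. Practice problems
    1. Predict the output of each of inner_join(), left_join(), right_join(), and full_join(), then run them to check.
    2. Join the flights data and the airlines data on the carrier column.
    3. Join the flights data and the weather data on the origin, year, month, day, and hour columns.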
  14. Tidy Data

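  15. tidy data: neatly organized data
    Definition (source: https://r4ds.had.co.nz)
    • Each column holds one variable (an atomic vector).
    • Each row holds one observation.
    • Each cell holds one value.
    • Every observation has the same shape.
    Build your data frames so that they satisfy the above.
    Note: do not use row names (rownames); create an index or id column instead.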
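As a small illustration of the note about row names, the sketch below moves row names into an ordinary column with `tibble::rownames_to_column()`. The choice of `mtcars` (a dataset that ships with R) and the column name `model` are assumptions for this example, not something the slides use.

```r
library(tibble)

# mtcars ships with R and stores the car model in its row names.
cars_with_id <- rownames_to_column(mtcars, var = "model")

# The model is now a regular column that can be selected, filtered, or joined on.
print(head(cars_with_id[, c("model", "mpg", "cyl")], 3))
```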
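  16. messy data
    • A commonly seen layout
    • Easy for humans to read: the "wide" format
    • One variable per column: △
    • One observation per row: ✖
    • One value per cell: ○

    Location  12:00  15:00  17:00
    Tokyo     ☀      ☁      ☁
    Nagoya    ☀      ☀      ☁
    Osaka     ☁      ☁      ☁

    (The header row is labeled "column names" and the first column "row names".)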
  17. messy data (the same slide as 16, with the table annotated by the three variables it contains: location, time, weather)
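  18. tidy data
    • Easy to handle in analysis
    • Hard to read until you get used to it? The "long" format
    • One variable per column: ○
    • One observation per row: ○
    • One value per cell: ○

    Location  Time   Weather
    Tokyo     12:00  ☀
    Nagoya    12:00  ☀
    Osaka     12:00  ☁
    Tokyo     15:00  ☁
    Nagoya    15:00  ☀
    Osaka     15:00  ☁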
  19. messy → tidy: turn the values that had ended up as column names into a new variable called year.

  20. tidy → messy

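  21. Converting between long and wide formats in R
    • long → wide: spread()
    • wide → long: gather()

    gather(df, key = "new variable name that stores the values that were in the column names", value = "new variable name that collects the values spread across multiple columns", -columns to leave out of the conversion)
    spread(df, key, value, fill = value to fill in where spreading creates missing values)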
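Applied to the weather table from the messy and tidy data slides, the two functions might be used as sketched below. The column names `h12`, `h15`, `h17` and the text labels "sunny" and "cloudy" are stand-ins chosen for this example, since the slide shows times as headers and weather as icons.

```r
library(tidyr)

# Wide ("messy") table; names and labels are illustrative stand-ins.
wide <- data.frame(
  location = c("Tokyo", "Nagoya", "Osaka"),
  h12 = c("sunny", "sunny", "cloudy"),
  h15 = c("cloudy", "sunny", "cloudy"),
  h17 = c("cloudy", "cloudy", "cloudy"),
  stringsAsFactors = FALSE
)

# wide -> long: the column names go into `time`, the cell values into `weather`.
long <- gather(wide, key = "time", value = "weather", -location)

# long -> wide: spread the `time` values back out into columns.
wide_again <- spread(long, key = time, value = weather)

print(long)
```

In tidyr 1.0.0 and later, `pivot_longer()` and `pivot_wider()` are the recommended replacements, although `gather()` and `spread()` still work.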
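  22. Practice problems
    1. Create stocks (simulated return data) with the code below and convert it to long format.

        stocks <- data.frame(
          time = as.Date('2009-01-01') + 0:9,
          X = rnorm(10, 0, 1),
          Y = rnorm(10, 0, 2),
          Z = rnorm(10, 0, 4)
        )

    2. Convert it back to the original format.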
  23. Assignment for next time

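  24. Assignment
    1. Identify the flight that departed at the highest temperature.
    2. Among the recorded data, how many flights were made by Boeing aircraft?
    3. For each type of engine used by the planes, compute the mean flight distance per flight.
    4. Are there airlines that specialize in short-haul or long-haul flights? If so, explain your reasoning.
    5. Are there more eastbound or westbound flights? (Assume that planes fly in a straight line to their destination.)
    6. Is there a correlation between humidity at departure and departure delay?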
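  25. Frequently asked questions
    • What is fun about data science?
      • The process of gaining insight from data is more fun than anything else (personal opinion).
      • It feels great when a hypothesis and its verification come together cleanly.
    • What should I do while I am a [?]-year student?
      • The time available for catching up on the basics (statistics, optimization, linear algebra, etc.) will only shrink from here on.
      • Learning analysis on data you are interested in (horse racing, sports, etc.) may also be a good approach. Enjoying it matters most.
    • R is hard.
      • Looking at an easy, approachable reference book is also a good idea.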