データ分析入門 (Introduction to Data Analysis) / tokupon-ds2022

Slides from the "Introduction to Data Analysis" session of Tokupon AI Juku, held on July 9, 2022.

Text: http://uribo.github.io/tokupon_ds/
Repository: https://github.com/uribo/tokupon_ds

Uryu Shinya

July 09, 2022

Transcript

  1. Tokupon AI Juku: Introduction to Data Analysis. Uryu Shinya, Design-Oriented AI Education and Research Center, Tokushima University

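  2. Before we begin… Course materials and a practice environment are available on the web at https://uribo.github.io/tokupon_ds/ 🚨 (still a work in progress 🙇). You can try R in your web browser, although it may take a while to start up: https://bit.ly/…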

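  3. Today's topics: What is data analysis? / Types of data and how to represent them / Capturing the characteristics of data / Examining relationships between variables / Creating graphs / Summary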

  4. Today's topics (agenda repeated as a section divider; next: What is data analysis?)

  5. Don't be fooled by data and graphs. What did you think when you looked at this graph? Where do you think Tokushima Prefecture ranks nationally?

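  6. Don't be fooled by data and graphs. Even with the same data, a height axis that does not start at 0 cm makes the differences look several times larger than they are: how a graph is built and presented changes the impression it gives. With the horizontal axis starting at 0, there is no large difference in average height between prefectures.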

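  7. Look critically at the data and graphs around you. Watch tomorrow's election-night specials alongside the actual results: What kinds of graphs are used? Is there a particular intent behind how the data are shown? Information from the media is not always the truth.
    Figure: the rabbit-duck illusion (author unknown, public domain, via Wikimedia Commons: https://commons.wikimedia.org/wiki/File:Kaninchen_und_Ente.svg)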
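  8. What is data analysis? A procedure that transforms and processes data into a form people can use, with the aim of understanding or predicting the subject of study.
    Data: "materials, information, and facts that serve as the basis for judgments and arguments" (Super Daijirin dictionary). Data are produced by extracting information from the real world and encoding it (through observation, experiments, and surveys), and are then used for understanding and prediction.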
  9. Goals of data analysis: summarizing data; explaining the meaning of data and the relationships among them; making predictions for newly obtained data.

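  10. Goal of data analysis: summarizing data. The data handled in data analysis are large (hundreds to hundreds of thousands of records), and there is a limit to how many numbers a person can take in (what is the body length of the fish in the photo?), so the contents must be organized and communicated concisely.
    Methods: aggregating data with representative values (mean, minimum/maximum); estimating the distribution by computing measures of spread (standard deviation, variance); data visualization (histograms, box plots).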
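  11. Goal of data analysis: explaining data. Explore the meaning of the data and its background; compare several data sets to reveal how they are related.
    Methods: representing data with graphs and tables (cross-tabulations, scatter plots); quantifying relationships (correlation coefficient, covariance).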
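  12. Goal of data analysis: predicting unknown data. Using a model that explains the relationships in existing data, make predictions when new, unseen data are obtained. Methods: regression models (e.g., estimating the size of one body part from another) and classification models (e.g., estimating the species from body-part measurements).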

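  13. R, a programming language for data analysis. Open-source software that anyone can use freely, for many purposes: statistical analysis, graphics, application development, literate programming, machine learning, and more. The RStudio integrated development environment offers rich features. Alongside Python, it is one of the most widely used programming languages in the world. Because the source code is kept in a file, analyses are easy to reproduce and reuse. Practice environment: https://bit.ly/…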

  14. Steps of data analysis: import, tidy, transform, visualize, model, communicate (based on Garrett and Hadley).
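  15. History of data analysis: John Snow and the cholera outbreak. In 19th-century London, cholera spread as an unknown epidemic. John Snow interviewed local residents and analyzed the locations of the wells used by patients and the water companies that supplied them. By identifying the problem well and comparing the water companies, he finally pinpointed a water pump as the source of the outbreak, and stopping the use of that water succeeded in containing cholera in part of the area. (It was later found that the outbreak had spread efficiently through the sewer system that had been built.)
    Figure: John Snow's map showing deaths around Broad Street, Golden Square (public domain, via Wikimedia Commons).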
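  16. History of data analysis: Nightingale's statistics-based reform of medical hygiene. During the Crimean War (1853 to 1856), Florence Nightingale worked on nursing wounded soldiers and improving sanitation. After the war, while analyzing the causes of soldiers' deaths, she showed that far more soldiers died of disease after their wounds became infected than of the battle wounds themselves.
    Figure: Nightingale's wedge (rose) diagram (public domain, via Wikimedia Commons). Each wedge represents one month and shows three categories of cause of death; most deaths were caused by disease rather than wounds.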
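  17. History of data analysis: Abraham Wald and survivorship bias. During World War II, the damage on aircraft returning from missions was analyzed. Which parts should be reinforced?
    Figure: red dots mark the damaged spots (Martin Grandjean (vector), McGeddon (picture), Cameron Moll (concept), CC BY-SA, via Wikimedia Commons).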
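  18. Answer: Abraham Wald pointed out that the bombers that had been shot down were not included in the analysis. He advised reinforcing the areas where returning aircraft showed no damage: the areas marked with red dots are places where a plane can be hit and still return safely.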

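  19. Think about it: on a test (out of 100 points) given to a class of 40 students, the average score was 48 points. Would a student who scored ___ points be among the top 20 in the class? We will check the answer later.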

  20. Today's topics (agenda repeated as a section divider; next: Types of data and how to represent them)

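  21. Types of data. A variable is a value obtained by a common method that changes from one subject to another: for example, an animal's body weight, an animal's taxonomic group (Carnivora, birds, ...), or the number of visitors to a zoo.
    Variables are quantitative or qualitative. Quantitative variables are either continuous (the digits after the decimal point depend on the precision of the measurement) or discrete (the possible values are separated by fixed steps). Quantitative variables support arithmetic such as addition and division; qualitative variables do not.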
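  22. Data frame: data organized in table form. This example records four variables for animals: taxonomic group, name (species), body length, and body weight. Its rows include the red panda and lion (Carnivora), the chimpanzee and hamadryas baboon (primates), and the Humboldt penguin (birds). In data analysis, data are usually handled in data-frame form.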
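  23. How to read a data frame. Variable names are recorded as the column names (taxonomic group, species name, body length in cm, body weight in kg). Each row contains the values of all variables for one observed subject (for example, Carnivora / red panda). Each column contains the values of one variable for all observations.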
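  24. Data used in this course. Penguin data (penguins): observations of the size of penguins living in Antarctica. Animal data (df_zoo): the body size and body weight of the 22 animals kept at Tokushima Zoo, built by linking each animal's name and taxonomic group to its body size (body length, cm) and body weight (kg) from its Wikipedia page, using information as of a fixed date.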

  25. Today's topics (agenda repeated as a section divider; next: Capturing the characteristics of data)

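  26. How do we summarize data?
    source("data-raw/zoo.R")
    df_zoo$body_length_cm
    #> [1] 63.5 100.0 64.0 110.0 85.0 66.0 80.0 168.0 134.0 250.0 130.0 175.0
    #> [13] 31.0 NA 1.2 250.0 35.0 69.0 NA NA 40.0 NA
    NA marks a missing value: a value absent from the data for some reason.
    Summaries by numbers (descriptive statistics): representative values give a rough idea of where the values in the data are located, and measures of spread show how much the values vary overall. Summaries by figures and tables (data visualization): histograms, box plots, and frequency distribution tables.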
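
A quick base-R check of why missing values matter when summarizing. This sketch copies the df_zoo$body_length_cm values printed above into a plain vector (it does not reload the course data): a single NA makes mean() return NA unless na.rm = TRUE is given.

```r
# Body lengths (cm) copied from the df_zoo$body_length_cm output above
body_length_cm <- c(63.5, 100.0, 64.0, 110.0, 85.0, 66.0, 80.0, 168.0,
                    134.0, 250.0, 130.0, 175.0, 31.0, NA, 1.2, 250.0,
                    35.0, 69.0, NA, NA, 40.0, NA)

# Count the missing values
n_missing <- sum(is.na(body_length_cm))
n_missing
#> [1] 4

# With NA present, mean() returns NA by default
mean(body_length_cm)
#> [1] NA

# na.rm = TRUE drops the NAs and averages the remaining 18 values
mean_length <- mean(body_length_cm, na.rm = TRUE)
mean_length
```
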
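  27. Representative values: the mean. The sum of all values in the data divided by the number of values.
    ⚠ Cautions when using the mean 🚨: the mean does not necessarily indicate the middle of the data, and it is easily affected by outliers.
    x <- c(1, 10, 5, 3, 7)
    (1 + 10 + 5 + 3 + 7) / length(x)
    #> [1] 5.2
    # Compute the mean with the mean() function
    mean(x)
    #> [1] 5.2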
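
The warning that the mean is easily affected by outliers can be seen by appending one hypothetical extreme value (100, chosen arbitrarily for illustration) to the slide's vector. The mean jumps, while the median barely moves.

```r
x <- c(1, 10, 5, 3, 7)
mean(x)
#> [1] 5.2
median(x)
#> [1] 5

# Append a hypothetical outlier
x_out <- c(x, 100)
mean(x_out)    # pulled far upward by a single value
#> [1] 21
median(x_out)  # mean of the two middle values, 5 and 7
#> [1] 6
```
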
  28. Representative values: the median. The value in the middle of the data.
    # The values of x are not in order of size, so sort them first
    sort(x)
    #> [1] 1 3 5 7 10
    sort(x)[3]
    #> [1] 5
    median(x)
    #> [1] 5
    # Median when the number of values is even
    x <- c(1, 2, 4, 6)
    # The median is the mean of the two values on either side of the middle
    median(x)
    #> [1] 3
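
The two cases above (odd and even number of values) can be written out by hand. This is a sketch for teaching purposes; my_median is a hypothetical helper name, and in practice median() does the same job.

```r
# Hand-written median following the steps on the slide
# (my_median is a hypothetical name, not a base R function)
my_median <- function(x) {
  x <- sort(x)                       # arrange in ascending order
  n <- length(x)
  if (n %% 2 == 1) {
    x[(n + 1) / 2]                   # odd count: the single middle value
  } else {
    (x[n / 2] + x[n / 2 + 1]) / 2    # even count: mean of the two middle values
  }
}

my_median(c(1, 10, 5, 3, 7))
#> [1] 5
my_median(c(1, 2, 4, 6))
#> [1] 3
```
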
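  29. Quartiles: an extension of the median. Sort the data in ascending order and divide them into four groups of equal size; the three points (values) that separate the groups are the quartiles: the first quartile, the second quartile (the median), and the third quartile. Each group contains 25% of the data.
    library(palmerpenguins)
    quantile(penguins$flipper_length_mm, na.rm = TRUE)
    #> 0% 25% 50% 75% 100%
    #> 172 190 197 213 231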
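
Several definitions of quartiles are in use. R's quantile() implements nine of them through its type argument, with type = 7 as the default, so a small vector is a handy way to see which points the default picks and to confirm that another definition (type = 6 here, chosen only for comparison) can give different quartiles.

```r
x <- c(1, 10, 5, 3, 7)

# Default definition (type = 7)
quantile(x)
#>   0%  25%  50%  75% 100%
#>    1    3    5    7   10

# Another definition interpolates differently: 25% becomes 2, 75% becomes 8.5
quantile(x, type = 6)
```
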
  30. Representative values: the mode. The value that appears most often in the data.
    x <- c(5, 1, 3, 5, 10, 5, 3, 7)
    # Find the mode
    names(which(table(x) == max(table(x))))
    #> [1] "5"
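
The table() approach above returns the mode as a character string, and it keeps every value tied for the highest count. A small helper can convert the result back to numbers; modes() is a hypothetical name used only for this sketch.

```r
# Return every mode as a number; tied values are all kept
# (modes is a hypothetical helper, not a base R function)
modes <- function(x) {
  counts <- table(x)                                 # frequency of each distinct value
  as.numeric(names(counts)[counts == max(counts)])   # values with the highest frequency
}

modes(c(5, 1, 3, 5, 10, 5, 3, 7))
#> [1] 5
modes(c(1, 1, 2, 2, 3))   # 1 and 2 tie for the highest count
#> [1] 1 2
```
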
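  31. Spread of data. Representative values alone cannot describe the other values in the data: even with the same representative value, the distribution can differ. We examine how the data are distributed by looking at their spread.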

  32. Spread of data: the range, from the minimum to the maximum.
    x <- c(5, 1, 3, 5, 10, 5, 3, 7)
    range(x)
    #> [1] 1 10
    min(x)
    #> [1] 1
    max(x)
    #> [1] 10
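  33. Spread of data: variance. The variance shows how the values are scattered around the mean. Example vectors (the vertical line in each plot marks the mean): c(0, 0, 0, 0, 0), c(1, 2, 3, 2, 1), c(1, 100, 5, 8, 1), c(1, 6, 40, 56, 1).
    Example: for the body length of individual penguins, are the values uniform overall? Are particular individuals much larger or smaller than the mean? Do long and short individuals vary widely? The variance lets us describe the distribution of the data concretely.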
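
The four example vectors can be compared numerically. This sketch uses a variance that divides by the number of values (pop_var is a hypothetical helper name); R's own var() divides by n - 1 instead. Note how a single extreme value (100) gives the third vector the largest variance.

```r
# The four example vectors from the slide
examples <- list(
  a = c(0, 0, 0, 0, 0),
  b = c(1, 2, 3, 2, 1),
  c = c(1, 100, 5, 8, 1),
  d = c(1, 6, 40, 56, 1)
)

# Mean of the squared deviations, i.e. the variance that divides by n
# (pop_var is a hypothetical helper name)
pop_var <- function(x) mean((x - mean(x))^2)

v <- sapply(examples, pop_var)
v   # a: 0, b: 0.56, c: 1489.2, d: 522.16
```
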
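  34. How to compute the variance: (1) compute the mean of the variable; (2) compute the difference between each value and the mean (the deviation); (3) square each deviation; (4) repeat steps (2) and (3) for every value and add up the squared deviations; (5) divide the sum by the number of values.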
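
The five steps can be written as one formula. In the sketch below the symbols are ours, not the slide's: $x_1, \dots, x_n$ are the values, $\bar{x}$ is their mean, $s^2$ is the variance that divides by $n$, and $u^2$ is the unbiased variance that R's var() returns, which divides by $n - 1$.

```latex
\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i, \qquad
s^2 = \frac{1}{n}\sum_{i=1}^{n}\left(x_i - \bar{x}\right)^2, \qquad
u^2 = \frac{1}{n-1}\sum_{i=1}^{n}\left(x_i - \bar{x}\right)^2
```

For the five Adelie body masses used in the next slides, the squared deviations sum to 208000, so $s^2 = 208000 / 5 = 41600$ and $u^2 = 208000 / 4 = 52000$.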

  35. Let's compute a variance. From the penguin data, consider the body mass (body_mass_g) of five Adelie penguins.
    library(palmerpenguins)
    library(dplyr)
    df <- penguins |>
      filter(species == "Adelie") |>
      select(body_mass_g) |>
      filter(!is.na(body_mass_g)) |>
      slice_head(n = 5)
    df
    #> # A tibble: 5 × 1
    #>   body_mass_g
    #>         <int>
    #> 1        3750
    #> 2        3800
    #> 3        3250
    #> 4        3450
    #> 5        3650
  36. Let's compute a variance (compute the mean, then the deviations).
    df <- df |>
      # For each value, compute the deviation (how much larger or smaller it is than the mean)
      mutate(deviation = body_mass_g - mean(body_mass_g, na.rm = TRUE))
    df
    #> # A tibble: 5 × 2
    #>   body_mass_g deviation
    #>         <int>     <dbl>
    #> 1        3750       170
    #> 2        3800       220
    #> 3        3250      -330
    #> 4        3450      -130
    #> 5        3650        70
    Properties of deviations: positive and negative values are mixed, and they sum to 0. Squaring turns even negative values into positive ones.
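
Why the deviations themselves cannot measure spread: they always cancel out. A base-R check on the same five body masses (copied from the slide as a plain vector):

```r
# Body masses (g) of the five Adelie penguins from the slide
body_mass_g <- c(3750, 3800, 3250, 3450, 3650)

deviation <- body_mass_g - mean(body_mass_g)   # the mean is 3580
deviation
#> [1]  170  220 -330 -130   70

# Positive and negative deviations cancel exactly
sum(deviation)
#> [1] 0
```
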
  37. Let's compute a variance (square the deviations, add them up, and divide by the number of values).
    df <- df |>
      mutate(deviation2 = deviation^2)
    df
    #> # A tibble: 5 × 3
    #>   body_mass_g deviation deviation2
    #>         <int>     <dbl>      <dbl>
    #> 1        3750       170      28900
    #> 2        3800       220      48400
    #> 3        3250      -330     108900
    #> 4        3450      -130      16900
    #> 5        3650        70       4900
    sum(df$deviation2) / nrow(df)
    #> [1] 41600
    # Compute the variance with R's standard function
    var(df$body_mass_g)
    #> [1] 52000
    Note: var() returns the unbiased variance, which divides by (number of values − 1) rather than the number of values: 208000 / 4 = 52000.
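
The gap between 41600 and 52000 comes only from the denominator. A base-R sketch that computes the sum of squared deviations once and divides it both ways:

```r
body_mass_g <- c(3750, 3800, 3250, 3450, 3650)
n <- length(body_mass_g)

# Sum of squared deviations
ss <- sum((body_mass_g - mean(body_mass_g))^2)
ss
#> [1] 208000

ss / n         # divide by n: the hand calculation on the slide
#> [1] 41600
ss / (n - 1)   # divide by n - 1: what var() returns
#> [1] 52000
var(body_mass_g)
#> [1] 52000
```
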
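  38. Spread of data: the standard deviation. The standard deviation is the square root of the variance. Why take the square root? Squaring the deviations changes the unit (cm → cm²), and the square root brings it back: sqrt(cm²) → cm.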
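
A base-R sketch of the unit argument, reusing the five body masses: sd() equals the square root of var(), and converting grams to kilograms scales the variance by 1/1000² but the standard deviation only by 1/1000, so the standard deviation stays in the unit of the data.

```r
body_mass_g <- c(3750, 3800, 3250, 3450, 3650)

# sd() is the square root of var() (both divide by n - 1)
sd(body_mass_g)
#> [1] 228.0351
sqrt(var(body_mass_g))
#> [1] 228.0351

# Change the unit from grams to kilograms
body_mass_kg <- body_mass_g / 1000
var(body_mass_kg) / var(body_mass_g)   # 1e-06: variance is in squared units
sd(body_mass_kg) / sd(body_mass_g)     # 0.001: sd is in the original unit
```
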

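  39. Visualizing distributions: the frequency distribution table. The number of times a value appears in the data is its frequency, and a table that summarizes the distribution of frequencies is a frequency distribution table. Let's express the taxonomic groups in the animal data as frequencies.
    df_zoo$taxon
    #> [1] "食肉類" "鳥類" "食肉類" "鳥類" "霊長類" "霊長類"
    #> [7] "霊長類" "食肉類" "齧歯類" "食肉類" "鳥類" "偶蹄類"
    #> [13] "食肉類" "食肉類" "鳥類" "食肉類" "霊長類" "鳥類"
    #> [19] "鯨偶蹄類" "奇蹄類" "齧歯類" "鯨偶蹄類"
    (食肉類 = Carnivora, 鳥類 = birds, 霊長類 = primates, 齧歯類 = rodents, 偶蹄類 = Artiodactyla, 鯨偶蹄類 = Cetartiodactyla, 奇蹄類 = Perissodactyla.) In this table, primates appear 4 times.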
  40. Visualizing distributions: the frequency distribution table (the df_zoo$taxon values above, tallied by taxonomic group).
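
A base-R sketch of the tally. It uses the df_zoo$taxon values from the output above with the group names translated to English; the course data itself uses Japanese labels. table() counts how often each value appears.

```r
# Taxonomic groups from the df_zoo$taxon output, translated to English
# for this sketch
taxon <- c("Carnivora", "Birds", "Carnivora", "Birds", "Primates", "Primates",
           "Primates", "Carnivora", "Rodentia", "Carnivora", "Birds", "Artiodactyla",
           "Carnivora", "Carnivora", "Birds", "Carnivora", "Primates", "Birds",
           "Cetartiodactyla", "Perissodactyla", "Rodentia", "Cetartiodactyla")

# Frequency of each group
freq <- table(taxon)

# Frequency distribution table, most frequent group first
sort(freq, decreasing = TRUE)
```
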
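  41. Visualizing distributions: the frequency distribution table. To build a frequency distribution table for a quantitative variable, divide the range of values the variable can take into intervals called classes. For a discrete variable with a limited set of values (such as the roll of a die), use each value directly as a class. For a continuous variable (such as an animal's body weight), use suitable ranges as classes. The width of each class interval is the class width; choose the class width and the number of classes by looking at the range of the data.
    library(palmerpenguins)
    weight_freq <- table(cut(penguins$body_mass_g,
                             breaks = seq(2000, 7000, by = 1000),
                             dig.lab = 4))
    tibble::tibble(
      class = names(weight_freq),
      # as.vector() turns the table into a plain integer column
      frequency = as.vector(weight_freq))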
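  42. Visualizing distributions: the histogram. A graph built from a frequency distribution table: a bar is drawn for each class, and its height represents the frequency. Unlike a bar chart, there are no gaps between the bars.
    library(ggplot2)
    penguins |>
      ggplot(aes(body_mass_g)) +
      # In a histogram, each class bar is called a bin
      geom_histogram(bins = 5) +
      ylab("Frequency") +
      xlab("Body mass (g)") +
      labs(title = "Histogram of penguin body mass")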
  43. Distributions come in many shapes: changing the number of classes in a histogram can change the apparent shape of the distribution.
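
To see the effect without drawing a plot, base R's hist() with plot = FALSE returns the count in each class. This sketch reuses the 40 test scores from the quiz answered on slide 47 and compares class widths of 10 and 20 points (both widths are arbitrary choices for illustration).

```r
# The 40 test scores from the quiz on slide 47
scores <- c(16, 24, 27, 31, 32, 32, 33, 33, 36, 36, 37, 38, 39, 40, 40, 42, 43, 43,
            43, 44, 44, 45, 46, 46, 48, 50, 50, 52, 52, 53, 54, 65, 62, 66, 70, 75,
            73, 82, 88, 89)

# Class width 10: classes [0, 10], (10, 20], ..., (90, 100]
hist(scores, breaks = seq(0, 100, by = 10), plot = FALSE)$counts
#>  [1]  0  1  2 12 12  4  4  2  3  0

# Class width 20: the two peaks merge into one
hist(scores, breaks = seq(0, 100, by = 20), plot = FALSE)$counts
#> [1]  1 14 16  6  3
```
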

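  44. Distributions come in many shapes: the shape of the distribution depends on how the data vary. In a distribution with a long right tail (a long-tail distribution), the representative values line up from smallest to largest as mode, median, mean.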
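
A tiny, hypothetical right-skewed sample shows the ordering mode < median < mean: most values are small, and a few large values pull the mean upward.

```r
# Hypothetical right-skewed data: many small values, a long tail to the right
x <- c(1, 1, 1, 2, 2, 3, 4, 6, 10)

counts <- table(x)
mode_x <- as.numeric(names(counts)[counts == max(counts)])

c(mode = mode_x, median = median(x), mean = mean(x))
# mode 1 < median 2 < mean 3.33...
```
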

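  45. Visualizing distributions: the box plot. A graph that represents the distribution of the data with a "box" and "whiskers". It can also show the quartiles and outliers.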
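
The numbers behind a box plot can be inspected with base R's boxplot.stats(), shown here on hypothetical data (1 to 9 plus one extreme value). The box edges are Tukey's hinges from fivenum(), which can differ slightly from the quartiles returned by quantile(); points farther than 1.5 times the box length from the box are reported as outliers.

```r
# Hypothetical data: 1 to 9 plus one extreme value
x <- c(1:9, 100)

b <- boxplot.stats(x)

# Lower whisker, lower hinge, median, upper hinge, upper whisker
b$stats
#> [1] 1.0 3.0 5.5 8.0 9.0

# Values drawn individually as outliers
b$out
#> [1] 100
```
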

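  46. Visualizing distributions: the box plot. Box plots are also useful for comparing the spread of several data sets: the box is short when the data are tightly clustered and long when they are widely spread.
    df_zoo |>
      filter(!is.na(body_length_cm)) |>
      group_by(taxon) |>
      mutate(body_length_median = median(body_length_cm)) |>
      ungroup() |>
      # Order the groups by their median body length
      mutate(taxon = forcats::fct_reorder(taxon, body_length_median)) |>
      ggplot(aes(taxon, body_length_cm, color = taxon)) +
      geom_boxplot() +
      coord_flip() +
      # scale_colour_tokupon() is not part of ggplot2; it appears to be a color scale defined for this course
      scale_colour_tokupon() +
      guides(color = "none") +
      labs(title = "Body length by taxonomic group (animal data)")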
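  47. Answer: on a test (out of 100 points) given to a class of 40 students, the average score was 48 points. Would a student who scored ___ points be among the top 20 in the class?
    # Test scores of the 40 students in the class (listed roughly in ascending order)
    x
    #> [1] 16 24 27 31 32 32 33 33 36 36 37 38 39 40 40 42 43 43 43 44 44 45 46 46 48
    #> [26] 50 50 52 52 53 54 65 62 66 70 75 73 82 88 89
    mean(x) # class average
    #> [1] 47.975
    median(x) # median of the class scores
    #> [1] 44
    x[1:20]
    #> [1] 16 24 27 31 32 32 33 33 36 36 37 38 39 40 40 42 43 43 43 44
    x[21:40]
    #> [1] 44 45 46 46 48 50 50 52 52 53 54 65 62 66 70 75 73 82 88 89
    The median (44) is below the mean (47.975), so a student who scored slightly below the average can still be in the top half of the class.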
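
A sketch for checking any particular score against the class: the rank of a score is 1 plus the number of students who scored strictly higher. score_rank is a hypothetical helper name, and the scores 45 and 43 below are arbitrary examples, not the number from the quiz.

```r
# The 40 scores from the slide
x <- c(16, 24, 27, 31, 32, 32, 33, 33, 36, 36, 37, 38, 39, 40, 40, 42, 43, 43,
       43, 44, 44, 45, 46, 46, 48, 50, 50, 52, 52, 53, 54, 65, 62, 66, 70, 75,
       73, 82, 88, 89)

# Rank = 1 + number of strictly higher scores (hypothetical helper)
score_rank <- function(score, scores) 1 + sum(scores > score)

score_rank(45, x)   # below the mean of 47.975, yet inside the top 20
#> [1] 19
score_rank(43, x)   # outside the top 20
#> [1] 22
```
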
  48. Today's topics (agenda repeated as a section divider; next: Examining relationships between variables)

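  49. Two kinds of relationships in data analysis, in which several variables change together. Data analysis deals with two relationships that look alike but are different: correlation and causation.
    Causation: one event or thing is the cause, and another event or thing (the effect) occurs as a result. Example: stopping the use of one water company's supply → fewer cholera patients in the areas that used that water.
    Spurious correlation: an unobserved third factor makes a correlation look like causation. Example: per-capita chocolate consumption rises and the number of Nobel laureates rises, while both follow per-capita GDP.
    Correlation: there is a relationship between one event or thing and another. Example: the flipper length and the bill length of individual penguins.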
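  50. Correlation: a tendency that appears between two variables. In a positive correlation, one variable tends to increase as the other increases; in a negative correlation, one tends to decrease as the other increases; otherwise no relationship is seen. Plotting the data as a scatter plot makes the tendency easier to grasp.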

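  51. Quantifying relationships. Expressing the strength of a relationship as a number makes it possible to compare it with relationships between other variables. (From a scatter plot we can tell that there is a positive correlation, but not how strong it is.)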

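  52. Quantifying relationships: covariance. The covariance of two variables x and y is computed as
    $s_{xy} = \frac{1}{n}\sum_{i=1}^{n}\left(x_i - \bar{x}\right)\left(y_i - \bar{y}\right)$
    Breaking it down: subtract the mean of x (y) from each value of x (y) to get the deviations, then take the product of the two deviations.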
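  53. Quantifying relationships: covariance. For every observation from i = 1 to n (all of the data), compute the deviations of x and y, multiply them, and add up the products; then divide by n (the number of observations).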

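  54. Computing the covariance with a small data set.
    df <- df |>
      # Add the mean of each column as a new column (e.g. flipper_length_mm_mean)
      mutate(across(everything(), .fns = mean, .names = "{.col}_mean")) |>
      # Deviation of each value from its column mean (vectorized, so rowwise() is not needed)
      mutate(flipper_length_deviation = flipper_length_mm - flipper_length_mm_mean,
             bill_length_deviation = bill_length_mm - bill_length_mm_mean) |>
      # Product of the two deviations
      mutate(deviation_cross = flipper_length_deviation * bill_length_deviation)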
  55. Computing the covariance with a small data set.
    # Take two rows from the penguin data and compute their covariance
    df <- penguins |>
      slice_head(n = 2) |>
      select(flipper_length_mm, bill_length_mm)
    df
    #> # A tibble: 2 × 2
    #>   flipper_length_mm bill_length_mm
    #>               <int>          <dbl>
    #> 1               181           39.1
    #> 2               186           39.5
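
A base-R sketch that finishes the calculation for these two rows (values copied from the output above): the mean of the products of deviations gives the covariance that divides by n, while cov() divides by n - 1.

```r
# The two rows shown above
flipper_length_mm <- c(181, 186)
bill_length_mm    <- c(39.1, 39.5)
n <- length(flipper_length_mm)

# Deviations from each mean
dx <- flipper_length_mm - mean(flipper_length_mm)   # -2.5, 2.5
dy <- bill_length_mm - mean(bill_length_mm)         # about -0.2, 0.2

# Sum of the products of deviations, divided by n
sum(dx * dy) / n
#> [1] 0.5

# cov() divides by n - 1 instead
cov(flipper_length_mm, bill_length_mm)
#> [1] 1
```
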
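  56. Properties of covariance. The larger its absolute value, the stronger the relationship between the variables (positive for a positive relationship, negative for a negative one). Drawback: the value depends on the units of the variables.
    df_mm <- penguins |>
      select(flipper_length_mm, bill_length_mm) |>
      purrr::set_names(c("flipper_length", "bill_length"))
    # In millimeters
    cov(df_mm$flipper_length, df_mm$bill_length, use = "complete.obs")
    #> [1] 50.37577
    # In centimeters
    df_cm <- df_mm |>
      transmute(across(everything(), .fns = ~ .x / 10))
    cov(df_cm$flipper_length, df_cm$bill_length, use = "complete.obs")
    #> [1] 0.5037577
    Note: R's standard cov() function returns the unbiased covariance, which divides by (number of observations − 1).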
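  57. Quantifying relationships: the correlation coefficient. It is computed by dividing the covariance by the product of the standard deviations of the two variables, which removes the covariance's dependence on units. It takes values from −1 to 1; the stronger the relationship, the closer its absolute value is to 1.
    cor(penguins$flipper_length_mm, penguins$bill_length_mm, use = "complete.obs")
    #> [1] 0.6561813
    # The value is unchanged after converting to centimeters
    cor(df_cm$flipper_length, df_cm$bill_length, use = "complete.obs")
    #> [1] 0.6561813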
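
The definition above, $r_{xy} = s_{xy} / (s_x s_y)$, can be checked directly. This sketch uses small hypothetical paired vectors: dividing cov() by the product of the sd() values reproduces cor() (the n - 1 factors cancel), and rescaling one variable changes the covariance but not the correlation.

```r
# Hypothetical paired data for illustration
x <- c(1, 2, 3, 4, 5)
y <- c(2, 4, 5, 4, 5)

# Correlation = covariance / (sd of x * sd of y)
r_manual <- cov(x, y) / (sd(x) * sd(y))
r_manual
cor(x, y)

# Changing units (e.g. mm to cm) scales the covariance...
cov(x / 10, y) / cov(x, y)   # 0.1
# ...but leaves the correlation unchanged
cor(x / 10, y)
```
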
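  58. Anscombe's quartet: an example that shows why data visualization matters. Even when descriptive statistics and correlation coefficients are almost identical, the underlying data can be very different. For the pairs x1 and y1, x2 and y2, x3 and y3, and x4 and y4 in Anscombe's data, the statistics and correlation coefficients come out nearly the same for every pair.
    But when you draw the scatter plots…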
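
Anscombe's data ship with base R as the anscombe data frame (columns x1 to x4 and y1 to y4), so the claim can be checked directly. This sketch computes the summary statistics for each pair and then draws the four scatter plots side by side.

```r
# Summary statistics for each (x_i, y_i) pair of the built-in anscombe data
summary_by_pair <- sapply(1:4, function(i) {
  x <- anscombe[[paste0("x", i)]]
  y <- anscombe[[paste0("y", i)]]
  c(mean_x = mean(x), mean_y = mean(y),
    var_x = var(x), var_y = var(y), cor = cor(x, y))
})
colnames(summary_by_pair) <- paste0("pair", 1:4)
round(summary_by_pair, 2)   # nearly identical columns

# The scatter plots, however, look completely different
op <- par(mfrow = c(2, 2))
for (i in 1:4) {
  plot(anscombe[[paste0("x", i)]], anscombe[[paste0("y", i)]],
       xlab = paste0("x", i), ylab = paste0("y", i))
}
par(op)
```
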
  59. Today's topics (agenda repeated as a section divider; next: Creating graphs)

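  60. Bar chart: a graph that represents the size of values by the height of bars. It is suited to comparing values across several categories. Points to watch: the order of the categories, and bars that start from the origin (zero).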

  61. Let's improve a bar chart. What is wrong with this graph? What would you change to fix it?
    df_zoo |>
      count(taxon) |>
      mutate(prop = n / sum(n) * 100) |>
      ggplot(aes(x = "", y = prop, fill = taxon)) +
      geom_bar(stat = "identity", width = 1) +
      scale_fill_tokupon() +
      coord_polar("y")
  62. Let's improve a bar chart. Changes: the order of the categories (swap the axes so the categories run vertically, and sort them from the largest value down).
    df_zoo |>
      filter(!is.na(body_length_cm)) |>
      ggplot(aes(forcats::fct_reorder(name, body_length_cm), body_length_cm, fill = taxon)) +
      geom_bar(stat = "identity") +
      scale_fill_tokupon() +
      coord_flip() +
      xlab(NULL) +
      ylab("Body length (cm)") +
      labs(title = "Typical body length of animals kept at Tokushima Zoo")
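  63. Pie chart: a graph that shows the proportions of the data as slices of a circle. It shows composition within the whole circle and is suited to expressing the breakdown of a whole. Points to watch: start the first category at the 12 o'clock position, and use proportions that add up to 100%. Pie charts are not suited to comparisons between data sets, but relative comparisons within one data set are OK.
    Improvements: show the slices in order of size, and group categories with small shares together as "Other".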
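
A base-R sketch of both improvements, using the taxonomic-group frequencies tallied from the df_zoo$taxon output (English labels, for this sketch only). The threshold for lumping groups into "Other" (fewer than 2 animals) is an arbitrary choice. In pie(), clockwise = TRUE starts the first slice at 12 o'clock and proceeds clockwise.

```r
# Frequencies tallied from the df_zoo$taxon output (English labels)
counts <- c(Carnivora = 7, Birds = 5, Primates = 4, Rodentia = 2,
            Cetartiodactyla = 2, Artiodactyla = 1, Perissodactyla = 1)

# Lump small groups into "Other" (threshold chosen for illustration)
small  <- counts < 2
lumped <- c(sort(counts[!small], decreasing = TRUE),
            Other = sum(counts[small]))

# Percentages of the whole, largest first and "Other" last
pct <- round(lumped / sum(lumped) * 100, 1)
pct

# Start at 12 o'clock and go clockwise
pie(lumped, labels = paste0(names(lumped), " (", pct, "%)"), clockwise = TRUE)
```
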
  64. Today's topics (agenda repeated as a section divider; next: Summary)
