What Teachers Should Know About the Bootstrap: Resampling in the Undergraduate Statistics Curriculum

Kshim
December 15, 2019

This presentation summarizes a paper on the appropriate use of the bootstrap method, written by a senior statistician at Google.

Transcript

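  1. Purpose and overview

    - Journal: a paper from the "Special Issue on Statistics and the Undergraduate Curriculum", published by the American Statistical Association, the most influential statistical society in the world. Recommended by Prof. Hayami (早見). It has only just been published, so its citation count is [...].
    - Author: Tim Hesterberg, a senior statistician at Google.
    - Purpose and overview: to promote correct use of the bootstrap method, including when it works well and when it does not.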
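  2. The Verizon example (deck sections: Introduction, Idea of bootstrap, Graphs, Confidence interval)

    - The US telephone line system
      - ILEC (Incumbent Local Exchange Carrier): the generic name for a long-established company that manages the telephone lines within an area.
      - CLEC (Competitive Local Exchange Carrier): the generic name for a newly entered telephone carrier that competes with the ILEC.
      - CLECs also use the telephone lines of the incumbent ILEC.
      - When a CLEC line breaks, the ILEC is obliged to repair it.
    - Verizon: the ILEC for New York.
    - PUC (Public Utility Commission): the body that monitors the repairs made by the ILEC (Verizon).
      - An ILEC has an incentive to repair a line quickly when it belongs to one of its own customers, but no such incentive when it belongs to a customer of a CLEC that merely rents the ILEC's lines.
      - The PUC monitors whether the repair speeds of the two groups differ significantly.
      - If the difference is significant at the prescribed significance level, the ILEC (Verizon) must pay a large fine.
      - The PUC took samples of repair times for both ILEC customers and CLEC customers.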
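  3. The Verizon example (continued)

    - The samples obtained
      - Repairs for ILEC customers are clearly faster, which suggests that customers are being treated unequally.
      - The repair times are positively skewed (see the figure on the slide).
    - Two ways of testing the difference
      - ① Permutation test (something like the bootstrap)
      - ② t-test
      - The t-test had been used until then; apparently Verizon proposed using a permutation test instead.
    - Result
      - With ①, the p-value is not significant, so no fine is due. With ②, the p-value is significant, so a fine is due.
      - The correct answer is ①. The t-test for a difference is reportedly inaccurate when the sample sizes of the two groups differ.
    Reference: https://www.dynacom.co.jp/product_service/packages/snpalyze/sa_t_permutation.html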
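To make test ① concrete, here is a minimal sketch of a two-sample permutation test on the difference in means. The function name `permutation_test`, the one-sided alternative, the number of permutations and the repair times are all illustrative assumptions; the data are made up and are not the Verizon samples.

```python
import random

def mean(values):
    return sum(values) / len(values)

def permutation_test(x, y, n_perm=9999, seed=0):
    """One-sided p-value for H1: mean(y) > mean(x), by random relabeling."""
    rng = random.Random(seed)
    observed = mean(y) - mean(x)
    pooled = list(x) + list(y)
    n_x = len(x)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # reassign the group labels at random
        diff = mean(pooled[n_x:]) - mean(pooled[:n_x])
        if diff >= observed:
            hits += 1
    # Count the observed labeling as one permutation so p is never 0.
    return (hits + 1) / (n_perm + 1)

# Hypothetical repair times in hours (made up; not the Verizon data).
ilec = [1.2, 0.8, 2.5, 3.1, 0.4, 1.9, 7.6, 2.2, 0.9, 1.4, 5.3, 0.6]
clec = [2.8, 9.4, 1.7, 14.2, 3.3]
p_value = permutation_test(ilec, clec)
print(f"permutation p-value: {p_value:.4f}")
```

Because the test only reshuffles group labels, it makes no normality assumption, which is why it copes better with skewed data and unequal group sizes than the t-test.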
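  4. One-Sample Bootstrap

    - The bootstrap procedure. Let $\hat\theta$ be the parameter estimate obtained from the original sample and $\hat\theta^*$ the statistic obtained from a bootstrap resample.
      1. From the n observations, draw a new sample of size n with replacement.
      2. Compute the statistic of interest $\hat\theta^*$ on that new sample.
      3. Repeat steps 1 and 2 R times to obtain R values of $\hat\theta^*$. These trace out the bootstrap distribution.
    - Three things obtained from the R values of $\hat\theta^*$ (r in the formula denotes the number of resamples R):
      1. Standard error (SE): $s_b = \sqrt{\frac{1}{r-1}\sum_{i=1}^{r}\left(\hat\theta^*_i - \bar{\hat\theta}^*\right)^2}$
      2. A crude interval: the central range of the bootstrap distribution
      3. Bias: $\bar{\hat\theta}^* - \hat\theta$
    - Bootstrap results for the Verizon data
      - ILEC (n = [...]): looks roughly normal, so a t-test seems feasible.
      - CLEC (n = [...]): skewed, so a t-test is not feasible.
      - What is actually the case? See the next slide.
    Reference: https://www.dynacom.co.jp/product_service/packages/snpalyze/sa_t_permutation.html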
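The three steps and the three summaries above can be sketched as follows. This is an illustration under assumptions: the helper names (`bootstrap_replicates`, `quantile`, `summarize`), the 95% level for the crude interval, R = 2000 and the data are all hypothetical choices, not taken from the slides.

```python
import random
from statistics import mean, stdev

def bootstrap_replicates(sample, statistic, r=2000, seed=0):
    """Steps 1-3: resample with replacement r times, compute the statistic each time."""
    rng = random.Random(seed)
    n = len(sample)
    return [statistic(rng.choices(sample, k=n)) for _ in range(r)]

def quantile(values, p):
    """Simple order-statistic quantile (no interpolation)."""
    ordered = sorted(values)
    return ordered[round(p * (len(ordered) - 1))]

def summarize(sample, statistic, replicates, level=0.95):
    theta_hat = statistic(sample)
    tail = (1 - level) / 2
    return {
        "se": stdev(replicates),  # divisor r - 1, as in s_b
        "interval": (quantile(replicates, tail), quantile(replicates, 1 - tail)),
        "bias": mean(replicates) - theta_hat,
    }

# Hypothetical, positively skewed sample (made up for illustration).
data = [0.4, 0.6, 0.8, 0.9, 1.2, 1.4, 1.9, 2.2, 2.5, 3.1, 5.3, 7.6]
reps = bootstrap_replicates(data, mean)
summary = summarize(data, mean, reps)
print(summary)
```

Any statistic that maps a list to a number can be passed in place of `mean`, for example `statistics.median`.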
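  5. Concepts that are easy to confuse

    Name: formula or symbol (note)
    - Population mean: $\mu$
    - Sample mean: $\bar X = \frac{1}{n}\sum_{i=1}^{n} X_i$
    - Population variance: $\sigma^2$
    - Sample variance: $s^2 = \frac{1}{n}\sum_{i=1}^{n}(X_i - \bar X)^2$ (a biased estimator of $\sigma^2$)
    - Unbiased variance: $s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(X_i - \bar X)^2$ (an unbiased estimator of $\sigma^2$)
    - Population standard deviation: $\sigma$
    - Sample standard deviation: $s = \sqrt{\frac{1}{n}\sum_{i=1}^{n}(X_i - \bar X)^2}$ (based on the biased variance)
    - "Unbiased" standard deviation: $s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}(X_i - \bar X)^2}$ (based on the unbiased variance)
    - Population standard error: $\sigma/\sqrt{n}$
    - (Sample) standard error: $s/\sqrt{n}$ (what is usually meant by "standard error", SE)
    - Sampling error: $\mu - \bar X$, for example (the difference between the true value and the estimate)
    - Deviation: $X_i - \bar X$
    Slide annotation: in the bootstrap, keeping these two apart is very important.

    Let X be a random variable following a distribution with mean $\mu$ and variance $\sigma^2$, and let $X_i$ be realizations drawn from that distribution. Then, in general, $\bar X \sim (\mu, \sigma^2/n)$.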
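  6. The bootstrap t distribution

    - We want the bootstrap distribution of the t statistic.
      - So far we have looked at the distribution of a single estimator (the mean), but here we need to think about a joint (multivariate) distribution.
      - The ordinary t statistic is $t = \frac{\bar X - \mu}{s/\sqrt{n}}$.
      - We want the distribution of t, but $\hat\theta$ and SE each have their own distribution, so t depends on their joint distribution.
      - With the bootstrap, computing $t^* = \frac{\bar X^* - \bar X}{s^*/\sqrt{n}}$ on each resample should reveal the distribution of $t^*$.
      - In general, t follows a t distribution when $\bar X$ and $s$ are independent. When the population is positively skewed, as here, $\bar X$ and $s$ are positively correlated (see the figure on the slide).
      - Therefore the t statistic does not follow a t distribution and the t-test is not valid. In other words, the t-test cannot be used even for the large-sample ILEC data.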
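The dependence between the sample mean and the sample standard deviation can be checked by simulation. The sketch below is an assumption-laden illustration: it uses an exponential population as a stand-in for "positively skewed", a normal population for contrast, and hypothetical names (`pearson`, `mean_sd_correlation`) and sizes (n = 30, 2000 samples).

```python
import math
import random
from statistics import mean, stdev

def pearson(a, b):
    """Pearson correlation coefficient of two equal-length lists."""
    ma, mb = mean(a), mean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    var_a = sum((x - ma) ** 2 for x in a)
    var_b = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)

def mean_sd_correlation(draw, n=30, n_samples=2000, seed=0):
    """Correlation between sample mean and sample SD over repeated samples."""
    rng = random.Random(seed)
    means, sds = [], []
    for _ in range(n_samples):
        sample = [draw(rng) for _ in range(n)]
        means.append(mean(sample))
        sds.append(stdev(sample))
    return pearson(means, sds)

# Skewed population (exponential) versus symmetric population (normal).
corr_exponential = mean_sd_correlation(lambda rng: rng.expovariate(1.0))
corr_normal = mean_sd_correlation(lambda rng: rng.gauss(0.0, 1.0))
print(f"exponential: {corr_exponential:.2f}, normal: {corr_normal:.2f}")
```

For the normal population the correlation should sit near zero, while for the skewed population it should be clearly positive, which is the situation the slide describes.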
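  7. The idea of the bootstrap

    - The ideal way to obtain the distribution of a statistic: repeat "draw a sample from the population and take its mean" many times to build the distribution of the mean.
    - The realistic way to obtain the distribution of a statistic (the bootstrap): repeat "draw again, with replacement, from the sample that was drawn from the population, and take the mean" to build a distribution.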
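  8. The plug-in principle: the basic principle of the bootstrap

    - What is the plug-in principle? Estimate an unknown parameter and plug the estimate in.
      - Example: when the population standard deviation $\sigma$ is unknown, estimate $s$ and use it in place of $\sigma$.
    - Plug-in in the bootstrap: in place of the entire population distribution $F$, use the empirical distribution $\hat F$ of the sample.
    - What happens once we plug in? Because $\hat F$ can be treated as a stand-in for the population, we can learn a lot. Regarding $\hat F$ as the population puts us in a situation like the upper figure on the previous slide.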
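  9. Sample mean: large sample

    Several samples of the same size are drawn from the population.
    - Left column: the samples from the population
    - Middle column: the bootstrap distribution of the mean
    - Right column: the blue sample with varying R

    1. The location of the bootstrap distribution changes with the sample (color).
    2. The shape is similar across all the bootstrap distributions.
    3. Increasing R stabilizes the shape of the bootstrap distribution.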
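  10. Sample mean: small sample

    Several samples of the same size are drawn from the population.
    - Left column: the samples from the population
    - Middle column: the bootstrap distribution of the mean
    - Right column: the blue sample with varying R

    1. Because the distributions of the samples (left) differ greatly, the shapes of the bootstrap distributions (middle) also differ greatly.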
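  11. Median

    Several samples are drawn from the population; the sample size is odd.
    - Left column: the samples from the population
    - Middle column: the bootstrap distribution of the median
    - Right column: smoothed bootstrap

    1. The sampling distribution (left) is continuous, but the bootstrap distribution is discrete, because the median of a bootstrap sample is always one of the values in the original sample.
    2. With small samples like this, the bootstrap distribution does not come out well for the median or for quartiles.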
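The discreteness is easy to see in a small experiment. The following sketch uses a made-up odd-sized sample and a hypothetical helper `bootstrap_medians`; it checks that every bootstrap median is one of the original observations.

```python
import random
from statistics import median

def bootstrap_medians(sample, r=1000, seed=0):
    """Median of each of r resamples drawn with replacement."""
    rng = random.Random(seed)
    n = len(sample)
    return [median(rng.choices(sample, k=n)) for _ in range(r)]

# Hypothetical sample of odd size (made up for illustration).
sample = [2.1, 3.4, 3.9, 4.4, 5.0, 5.8, 6.1, 7.3, 9.6]
medians = bootstrap_medians(sample)
distinct = sorted(set(medians))
print(f"{len(medians)} bootstrap medians take only {len(distinct)} distinct values")
```

With an odd sample size the median is the middle order statistic of the resample, so it can never fall between two observed values, however large R is.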
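  12. The relationship between mean and variance: the exponential distribution

    Several samples of the same size are drawn from an exponential population.
    - Left column: the samples from the population
    - Middle column: the bootstrap distribution of the mean
    - Right column: the bootstrap distribution of the t statistic

    For the exponential distribution, $f(x; \lambda) = \lambda e^{-\lambda x}$, $E[x] = \frac{1}{\lambda} = \mu$ and $Var(x) = \frac{1}{\lambda^2} = \mu^2$.

    1. Because the variance of an exponential distribution equals the square of its mean, samples (left) with a larger mean give bootstrap distributions with a larger variance.
    2. The bootstrap t distribution is close to the t distribution of the population!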
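  13. What the graphs tell us

    - When the sample's distribution is narrower than the population distribution, the bootstrap distribution is narrower than the sampling distribution.
    - With a large sample the bootstrap distribution comes out well; with a small sample it is poor.
      - Bootstrapping a small sample cannot overcome the weakness of the sample being small in the first place.
    - What matters for inference
      1. How close the bootstrap distribution is to the sampling distribution.
      2. How well the procedure accounts for the variety of possible samples (a larger R is better).
    - The ideal R
      - Efron, who devised the bootstrap, and the author of this paper each state how large R should be; the author's figure applies when computing confidence intervals.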
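  14. The percentile method

    - Method: take the central portion of the bootstrap distribution, at the chosen confidence level, as the interval. Simple.
    - Characteristics: not accurate. For small samples the results are poor: the interval comes out far too narrow.
    - Why does it come out too narrow? Comparing the usual standard error with the bootstrap standard error shows why.
      - The usual standard error: let the population have mean $\mu$ and variance $\sigma^2$. The unbiased variance of the drawn sample is $s^2 = \frac{1}{n-1}\sum (x_i - \bar x)^2$, so the standard error is $s/\sqrt{n}$.
      - The bootstrap standard error: by the plug-in principle the bootstrap treats the empirical distribution (the drawn sample) as the population, so the population variance becomes $\hat\sigma^2 = \frac{1}{n}\sum (x_i - \bar x)^2$ and the standard error of a bootstrap sample mean is $\hat\sigma/\sqrt{n}$.
      - Comparison: the bootstrap standard error is smaller than the usual one by a factor of $\sqrt{\frac{n-1}{n}}$, since $\frac{\hat\sigma}{\sqrt{n}} = \sqrt{\frac{n-1}{n}}\,\frac{s}{\sqrt{n}}$.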
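The narrowness factor can be verified numerically. The sketch below uses a made-up sample and a hypothetical helper `standard_errors`; `statistics.stdev` divides by n - 1 and `statistics.pstdev` divides by n, matching $s$ and $\hat\sigma$ respectively.

```python
import math
from statistics import pstdev, stdev

def standard_errors(sample):
    n = len(sample)
    usual = stdev(sample) / math.sqrt(n)     # s / sqrt(n), divisor n - 1
    plug_in = pstdev(sample) / math.sqrt(n)  # sigma_hat / sqrt(n), divisor n
    return usual, plug_in

# Hypothetical small sample (made up for illustration).
sample = [2.8, 9.4, 1.7, 14.2, 3.3]
usual, plug_in = standard_errors(sample)
ratio = plug_in / usual
expected = math.sqrt((len(sample) - 1) / len(sample))
print(f"ratio = {ratio:.4f}, sqrt((n-1)/n) = {expected:.4f}")
```

For n = 5 the factor is about 0.89, so the plug-in standard error is roughly 11% too small; the gap shrinks as n grows, which is why the effect matters mainly for small samples.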
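  15. The basic method and the bootstrap t method

    The basic method
    - Method: for each resample, compute the quantity $\delta$ and use its values.
    - Characteristics: not accurate at all.

    The bootstrap t method
    - Method: for each resample, compute the t statistic and use its distribution.
    - Characteristics: the best of the bunch. If the population distribution is skewed, things go wrong even with many observations, as in the CLEC example.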
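To show how these intervals differ in practice, here is a sketch that computes the percentile, basic and bootstrap t intervals for a mean from one set of resamples. It rests on assumptions: the formula for $\delta$ is not preserved in the transcript, so the code takes the common definition $\delta^* = \hat\theta^* - \hat\theta$ (the "reverse percentile" interval); the function names, the 95% level, R = 2000 and the data are also illustrative.

```python
import math
import random
from statistics import mean, stdev

def quantile(values, p):
    """Simple order-statistic quantile (no interpolation)."""
    ordered = sorted(values)
    return ordered[round(p * (len(ordered) - 1))]

def bootstrap_intervals(sample, r=2000, level=0.95, seed=0):
    rng = random.Random(seed)
    n = len(sample)
    xbar = mean(sample)
    se = stdev(sample) / math.sqrt(n)
    boot_means, boot_t = [], []
    for _ in range(r):
        resample = rng.choices(sample, k=n)
        m, sd = mean(resample), stdev(resample)
        boot_means.append(m)
        if sd > 0:  # skip degenerate resamples whose values are all equal
            boot_t.append((m - xbar) / (sd / math.sqrt(n)))
    tail = (1 - level) / 2
    m_lo, m_hi = quantile(boot_means, tail), quantile(boot_means, 1 - tail)
    t_lo, t_hi = quantile(boot_t, tail), quantile(boot_t, 1 - tail)
    return {
        "percentile": (m_lo, m_hi),
        # Assumed delta* = mean* - xbar; the interval reflects the quantiles.
        "basic": (2 * xbar - m_hi, 2 * xbar - m_lo),
        # Upper t* quantile sets the lower limit and vice versa.
        "bootstrap_t": (xbar - t_hi * se, xbar - t_lo * se),
    }

# Hypothetical, positively skewed sample (made up for illustration).
data = [0.4, 0.6, 0.8, 0.9, 1.2, 1.4, 1.9, 2.2, 2.5, 3.1, 5.3, 7.6]
result = bootstrap_intervals(data)
for name, (lo, hi) in result.items():
    print(f"{name:12s} ({lo:.2f}, {hi:.2f})")
```

The percentile and basic intervals always have the same width and are mirror images of each other around the sample mean; only the bootstrap t interval rescales by the resample's own standard error.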
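  16. Comparison: normal distribution

    - The intervals are compared by simulation (normal population).
      - An interval with exactly its nominal coverage would cut off exactly the same nominal fraction on each side.
      - Horizontal axis: sample size. Vertical axis: probability of not covering.
    - Result
      - Overall the intervals under-cover; that is, they are too narrow.
      - The percentile method (p) and the basic method (r) are poor.
      - The ordinary t interval and the bootstrap t method give good results.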
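  17. Comparison: exponential distribution

    - The intervals are compared by simulation (exponential population).
      - Unlike the normal case, the distribution is not symmetric, so the two sides are cut off asymmetrically.
    - Result
      - Basically almost everything performs poorly.
      - Only the bootstrap t method performs well. The figure at the end of Section [...] of the paper is a useful reference.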
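A scaled-down version of such a coverage simulation can be sketched as follows. This is not the paper's simulation: the sample size (n = 10), the number of simulated samples (400), R = 399, the 95% level and the function names are assumptions chosen so that the pure-Python loop runs quickly, and only the percentile interval is evaluated.

```python
import random

def percentile_interval(sample, rng, r=399, level=0.95):
    """Percentile bootstrap interval for the mean."""
    n = len(sample)
    means = sorted(sum(rng.choices(sample, k=n)) / n for _ in range(r))
    tail = (1 - level) / 2
    return means[round(tail * (r - 1))], means[round((1 - tail) * (r - 1))]

def non_coverage(n=10, n_sim=400, true_mean=1.0, seed=0):
    """Fractions of intervals missing the true mean on each side."""
    rng = random.Random(seed)
    below = above = 0
    for _ in range(n_sim):
        sample = [rng.expovariate(1.0 / true_mean) for _ in range(n)]
        lo, hi = percentile_interval(sample, rng)
        if true_mean < lo:
            below += 1  # whole interval lies above the true mean
        elif true_mean > hi:
            above += 1  # whole interval lies below the true mean
    return below / n_sim, above / n_sim

miss_below, miss_above = non_coverage()
print(f"true mean below interval: {miss_below:.3f}")
print(f"true mean above interval: {miss_above:.3f}")
```

For a nominal 95% interval each miss rate would ideally be 0.025. With a skewed population and a small sample, the two rates are expected to be unequal, with the interval falling short of the true mean far more often than overshooting it.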