
Selective Inference for Data-Driven Hypothesis Testing

saltcooky
October 26, 2019


Lightning talk given at TokyoR#82


Transcript

  1. Selective Inference for Data-Driven Hypothesis Testing
 @saltcooky, TokyoR #82

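  2. Who am I
 • @saltcooky
 • R experience: about … years, I think
 • Employer: an IT company in Harajuku
 • Work: in an R&D-type department, using R as
   ・a data analyst
   ・a data aggregator
 • Hobbies: clothes/fashion, visiting art museums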
  3. The talk in one sentence
 An introduction to Selective Inference, which I have been studying recently

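  4. Statistical analysis
 Knowledge-driven statistical analysis:
 hypothesis formation from knowledge and experience ("Drinking alcohol is good for ___")
 → data
 → hypothesis test: can the null hypothesis be rejected?
 → evaluation of results: a significant difference is found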
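  5. Statistical analysis
 Data-driven statistical analysis:
 data
 → hypothesis formation based on the data ("Drinking alcohol is good for ___")
 → hypothesis test: can the null hypothesis be rejected?
 → evaluation of results: a significant difference is found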
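  6. Statistical analysis
 Data-driven statistical analysis:
 data → hypothesis formation based on the data ("Drinking alcohol is good for ___") → hypothesis test (can the null hypothesis be rejected?) → evaluation of results (a significant difference is found)
 But the data we happened to obtain may have shown this tendency purely by chance.
 If we run the test on data that already shows the tendency, won't a significant result be easier to get?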
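  7. Hypothesis selection bias
 Data-driven statistical analysis: data → hypothesis formation based on the data ("Drinking alcohol is good for ___") → hypothesis test (can the null hypothesis be rejected?) → evaluation of results (a significant difference is found)
 Hypothesis selection bias: the bias that arises in a hypothesis test when the hypothesis was formed on the basis of the same data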
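  8. Checking hypothesis selection bias
 Consider the following true regression model:
 [equation of the true regression model]
 Assume that variables with no effect on the response have also been observed, for a total of … variables.
 Repeat the following … times:
 • Select variables by stepwise selection to obtain a regression model.
 • Test whether each coefficient in the resulting model is significant at significance level α.
 • Rejection rate = (number of rejections) / (number of tests performed)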
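The procedure above can be sketched in base R. The slide's true model did not survive in this transcript, so every setting here is an assumption chosen for illustration:

- 100 observations
- 10 predictors that have no effect on y at all
- 200 repetitions
- α = 0.05
- backward stepwise selection by AIC via `step()`, which may differ from the stepwise routine used on the slide

The helper `slope_pvalues` is a hypothetical name.

```r
# Sketch of the rejection-rate simulation (all settings are hypothetical)
set.seed(1)
n <- 100       # observations (assumed)
p <- 10        # predictors with no effect on y (assumed)
B <- 200       # repetitions (assumed)
alpha <- 0.05  # significance level (assumed)

# p-values of the slope coefficients of a fitted lm (intercept excluded)
slope_pvalues <- function(fit) {
  tab <- summary(fit)$coefficients
  tab[rownames(tab) != "(Intercept)", "Pr(>|t|)"]
}

p_no_sel <- numeric(0)  # p-values from the full model (no selection)
p_sel <- numeric(0)     # p-values from the model chosen by stepwise selection
for (b in seq_len(B)) {
  dat <- as.data.frame(matrix(rnorm(n * p), n, p))  # columns V1..V10
  dat$y <- rnorm(n)                                 # y depends on no predictor
  full <- lm(y ~ ., data = dat)
  p_no_sel <- c(p_no_sel, slope_pvalues(full))
  chosen <- step(full, direction = "backward", trace = 0)  # AIC-based selection
  p_sel <- c(p_sel, slope_pvalues(chosen))
}

# Rejection rate = number of rejections / number of tests performed
rate_no_sel <- mean(p_no_sel < alpha)
rate_sel <- mean(p_sel < alpha)
cat("without selection:", rate_no_sel, "\n")
cat("with selection:   ", rate_sel, "\n")
```

With these settings, the rate without selection should sit near α. The rate after stepwise selection should come out several times larger.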
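  9. Checking hypothesis selection bias
 [Table: for each coefficient, the number of rejections, the number of non-rejections, and the rejection rate; without variable selection]
 When there is no bias, the rejection rate equals the significance level.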

  10. Checking hypothesis selection bias
 Under the null hypothesis the p-values follow a uniform distribution, so there is no bias.
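A quick base-R check of this property, under these assumptions:

- a one-sample t-test on 30 standard-normal observations
- 2000 repetitions

Both choices are arbitrary. Every decile of the p-values should hold about 10% of them, and the rejection rate at 0.05 should be about 0.05.

```r
# Under a true null hypothesis, p-values are uniform on [0, 1]
set.seed(2)
# One-sample t-test of H0: mean = 0 on data whose true mean is 0
pvals <- replicate(2000, t.test(rnorm(30))$p.value)

# Share of p-values in each decile; each should be close to 0.1
deciles <- table(cut(pvals, breaks = seq(0, 1, by = 0.1))) / length(pvals)
print(round(deciles, 3))

# Rejection rate at alpha = 0.05 should be close to 0.05
rate <- mean(pvals < 0.05)
print(rate)
```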

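  11. Checking hypothesis selection bias
 [Table: for each coefficient, the number of rejections, the number of non-rejections, and the rejection rate; with variable selection]
 The rejection rate is far larger than the significance level.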

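  12. Checking hypothesis selection bias
 The p-values do not follow a uniform distribution; they are biased.
 → A variable is more likely to be declared significant even when it is not.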

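  13. Selective Inference
 By conditioning on the hypothesis-formation event (its inverse image D), we can carry out estimation that is free of hypothesis selection bias.
 [Figure: the set of hypotheses (features); the hypothesis selection event (feature selection algorithm); the selected hypotheses (features)]
 Image source: https://www.ieice.org/~sita/forum/article/…pdf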
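In symbols, the requirement can be sketched as follows. The slide's own symbols were lost, so this notation is assumed:

- Y is the data.
- S is the selection algorithm.
- M is the set of hypotheses that S selected.
- p_M is the p-value for a hypothesis in M.

Validity is then required conditionally on the selection event, i.e. on the inverse image of M under S, rather than only marginally:

```latex
\Pr_{H_0}\!\left( p_M(Y) \le \alpha \;\middle|\; Y \in \mathcal{S}^{-1}(M) \right) \le \alpha
\quad \text{for all } \alpha \in (0,1),
\qquad
\mathcal{S}^{-1}(M) = \{\, y : \mathcal{S}(y) = M \,\}.
```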
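  14. Selective Inference
 Q: How should the feature-selection event in regression analysis be represented?
 A: It can be represented in a linear form (Lee et al.).
 Image source: https://www.ieice.org/~sita/forum/article/…pdf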
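A minimal illustration of what "linear" means here uses a simplified, assumed selection rule that is not taken from the slide. Suppose we select the single feature j whose inner product xⱼᵀy with the response is largest. Selecting j is then equivalent to p − 1 linear inequalities in y, so the event is a polyhedron {y : Ay ≤ b}. Here k₁, …, k_{p−1} are the indices other than j.

```latex
\{\, y : \hat{j}(y) = j \,\}
= \{\, y : x_k^\top y \le x_j^\top y \ \text{ for all } k \ne j \,\}
= \{\, y : A y \le b \,\},
\qquad
A = \begin{pmatrix} (x_{k_1} - x_j)^\top \\ \vdots \\ (x_{k_{p-1}} - x_j)^\top \end{pmatrix},
\quad b = 0 .
```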

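  15. Selective Inference
 Feature-selection events in regression analysis:
 • Marginal Screening: variable selection based on a predefined rule
 • lasso: variable selection based on optimality conditions
 • Stepwise: variable selection based on an algorithm
 All of these are linear feature-selection events.
 Image source: https://www.ieice.org/~sita/forum/article/…pdf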
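  16. Selective Inference
 By conditioning on a linear feature-selection event, the null distribution follows a truncated normal distribution F. This yields:
 • selective p-values
 • selective confidence intervals
 Image source: https://www.ieice.org/~sita/forum/article/…pdf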
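A minimal base-R sketch of how the truncated normal produces a selective p-value. It assumes:

- Gaussian noise y ~ N(μ, σ²I) with σ known
- a selection event of the form {y : Ay ≤ b}
- the test statistic ηᵀy

Write y = c·ηᵀy + z with c = η/‖η‖². Conditioned on the event and on z, the statistic ηᵀy is N(ηᵀμ, σ²‖η‖²) truncated to [V⁻, V⁺]. Here V⁻ is the max over (Ac)ᵢ < 0, and V⁺ the min over (Ac)ᵢ > 0, of (b − Az)ᵢ/(Ac)ᵢ. This is the polyhedral lemma of Lee et al.

The helper `polyhedral_pvalue` and the argmax selection rule in the simulation are hypothetical. This is not the selectiveInference package's implementation.

```r
# Selective p-value via the polyhedral lemma (hypothetical helper)
# Assumes y ~ N(mu, sigma^2 I) with sigma known, and selection event {y : A y <= b}.
polyhedral_pvalue <- function(y, A, b, eta, sigma) {
  t_obs <- sum(eta * y)                # test statistic eta' y
  sd_eta <- sigma * sqrt(sum(eta^2))   # its standard deviation
  cvec <- eta / sum(eta^2)             # y = cvec * (eta' y) + z
  z <- y - cvec * t_obs                # part of y independent of eta' y
  Ac <- as.vector(A %*% cvec)
  ratio <- as.vector(b - A %*% z) / Ac
  v_lo <- max(c(-Inf, ratio[Ac < 0]))  # truncation lower bound V-
  v_up <- min(c(Inf, ratio[Ac > 0]))   # truncation upper bound V+
  # One-sided p-value for H0: eta' mu = 0, from N(0, sd_eta^2) truncated to [V-, V+]
  upper <- function(v) pnorm(v / sd_eta, lower.tail = FALSE)
  (upper(t_obs) - upper(v_up)) / (upper(v_lo) - upper(v_up))
}

# Global-null simulation with the hypothetical rule "select argmax_k x_k' y"
set.seed(4)
n <- 50; p <- 10; sigma <- 1
X <- matrix(rnorm(n * p), n, p)        # fixed design
B <- 1000
naive <- selective <- numeric(B)
for (i in seq_len(B)) {
  y <- rnorm(n, sd = sigma)            # no feature affects y
  j <- which.max(crossprod(X, y))      # selected feature
  A <- t(X[, -j] - X[, j])             # rows (x_k - x_j)'; event is A y <= 0
  b <- rep(0, p - 1)
  eta <- X[, j]
  naive[i] <- pnorm(sum(eta * y) / (sigma * sqrt(sum(eta^2))), lower.tail = FALSE)
  selective[i] <- polyhedral_pvalue(y, A, b, eta, sigma)
}
cat("naive rejection rate:    ", mean(naive < 0.05), "\n")
cat("selective rejection rate:", mean(selective < 0.05), "\n")
```

Under this global null, the naive one-sided p-value of the selected feature falls below 0.05 far more often than 5% of the time. The selective p-value's rejection rate should stay close to 0.05.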

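  17. Selective Inference in R
 Selective inference for regression analysis in R can be done with the selectiveInference package.
 > # Example: selective inference after stepwise (forward stepwise) selection
 > library(selectiveInference)
 > gfit = fs(x, y)  # x = matrix of explanatory variables, y = response vector
 > out = fsInf(gfit, type = "aic", alpha = 0.05)  # number of steps chosen by AIC
 > out  # check the results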

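  18. Selective Inference in R
 Selective inference for regression analysis in R can be done with the selectiveInference package.
 > # Example: selective inference for the lasso at a fixed lambda
 > library(glmnet)
 > library(selectiveInference)
 > n = nrow(x)
 > gfit = glmnet(x, y, standardize = FALSE)
 > lambda = .3
 > beta_ls = coef(gfit, x = x, y = y, s = lambda/n, exact = TRUE)[-1]  # glmnet scales its loss by 1/n; [-1] drops the intercept
 > sigma = estimateSigma(x, y)$sigmahat  # estimate of the noise SD, if it is unknown
 > out = fixedLassoInf(x, y, beta_ls, lambda, sigma = sigma)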
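  19. Selective Inference in R
 [Table: for each coefficient, the number of rejections, the number of non-rejections, and the rejection rate; SI results for stepwise]
 The rejection rates now come out at roughly … .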
  20. Selective Inference in R
 The p-values also roughly follow a uniform distribution, so the estimation can be considered free of bias.

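  21. SI in clustering
 Selective inference is also required when testing whether clusters obtained by clustering differ from one another, because the clusters are formed from the very data being analyzed.
 Image source: https://www.ieice.org/~sita/forum/article/…pdf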

  22. SI in clustering
 Selective inference for hierarchical clustering can be performed with the pvclust (≥ v…) and scaleboot (≥ v…) packages (Terada & Shimodaira).
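  23. Summary
 • Tests of coefficients in a regression model built with variable selection suffer from hypothesis selection bias.
 • Selective inference makes bias-free estimation possible.
 • Use the selectiveInference package!
 • Selective inference for clustering exists too.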
  24. References
 • Jason D. Lee et al., "Exact post-selection inference, with application to the lasso". https://arxiv.org/abs/…
 • Terada & Shimodaira, "Selective inference after variable selection via multiscale bootstrap". https://arxiv.org/abs/…
 • "A conditional approach to inference after model selection". http://joshualoftus.com/post/…
 • "Computing selective inference p-values of clusters using pvclust and scaleboot". http://stat.sys.i.kyoto-u.ac.jp/prog/scaleboot/…
 • "Selective Inference for data-driven science" (in Japanese). https://www.ieice.org/~sita/forum/article/…
 • "An introduction to Selective Inference with R: considering the effect of variable selection on hypothesis tests" (in Japanese). https://qiita.com/saltcooky/items/…