An Introduction to Statistical Learning, Chapter 4

Rigel
February 15, 2018

Slides prepared for a reading group on An Introduction to Statistical Learning.

Transcript

  1. 4.1 An Overview of Classification

    4.2 Why Not Linear Regression?
    4.3 Logistic Regression
    4.3.1 The Logistic Model
    4.3.2 Estimating the Regression Coefficients
    4.3.3 Making Predictions
    4.3.4 Multiple Logistic Regression
    4.3.5 Logistic Regression for >2 Response Classes
    4.4 Linear Discriminant Analysis
    4.4.1 Using Bayes’ Theorem for Classification
    4.4.2 Linear Discriminant Analysis for p=1
    4.4.3 Linear Discriminant Analysis for p>1
    4.4.4 Quadratic Discriminant Analysis
    4.5 A Comparison of Classification Methods
    4.6 Lab: Logistic Regression, LDA, QDA, and KNN
    4.7 Exercises
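  2. 4.2 Why Not Linear Regression?

    Coding the response as 1, 2, 3 places drug overdose between stroke and epileptic seizure. It also implies that the difference between stroke and drug overdose equals the difference between drug overdose and epileptic seizure. In practice there is no basis for either assumption.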
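  3. 4.2 Why Not Linear Regression?

    If the values of the response had a natural ordering, such as mild, moderate, and severe, and the gap between mild and moderate felt equal to the gap between moderate and severe, a 1, 2, 3 coding might be reasonable. In general, though, there is no natural way to apply linear regression to a qualitative response with more than two levels. For a response with two levels, a better approach exists.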
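  4. 4.3.1 The Logistic Model

    How should the relationship between $X$ and $p(X)$ be modeled? Prediction with ordinary linear regression will not do: for any value of balance, the probability of default should stay within [0, 1]. This problem is not specific to the Default data. Whenever a regression line is fit to a binary response, in principle there will always be cases where $p(X) > 1$ or $p(X) < 0$.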
  5. 4.3.1 The Logistic Model

    To solve this problem, $p(X)$ must be modeled with a function whose value lies in [0, 1] for every $X$. Many such functions exist. Logistic regression uses the logistic function.
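For reference, the logistic function for a single predictor can be written as below, together with the log-odds form that follows from it. The symbols $\beta_0$ and $\beta_1$ are the usual intercept and slope coefficients; the formula itself did not survive in this transcript, so this is the standard textbook form rather than a copy of the slide.

```latex
p(X) = \frac{e^{\beta_0 + \beta_1 X}}{1 + e^{\beta_0 + \beta_1 X}},
\qquad
\log\!\left(\frac{p(X)}{1 - p(X)}\right) = \beta_0 + \beta_1 X
```

The right-hand identity is why the model is linear on the log-odds scale even though $p(X)$ itself is S-shaped.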
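  6. 4.3.1 The Logistic Model

    In the linear regression model of Chapter 3, $\beta_1$ gave the increase in $Y$ for a one-unit increase in $X$. That is not the case in the logistic regression model: the change in $p(X)$ for a one-unit increase in $X$ depends on the current value of $X$. Still, for $\beta_1 > 0$ the relationship holds that $p(X)$ increases when $X$ increases and decreases when $X$ decreases (the direction reverses for $\beta_1 < 0$).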
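A minimal sketch of this point: with a logistic model, equal steps in $X$ change $p(X)$ by different amounts, while the log-odds change by the constant $\beta_1$. The coefficient values are hypothetical, chosen only for illustration, and are not estimates from the Default data.

```python
import math

def logistic(x, beta0, beta1):
    """p(x) = e^(b0 + b1*x) / (1 + e^(b0 + b1*x)), in a numerically stable form."""
    z = beta0 + beta1 * x
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def log_odds(p):
    """Log-odds (logit) of a probability p in (0, 1)."""
    return math.log(p / (1.0 - p))

# Hypothetical coefficients (assumption, not fitted values).
BETA0, BETA1 = -4.0, 2.0

xs = [0.0, 1.0, 2.0, 3.0, 4.0]
probs = [logistic(x, BETA0, BETA1) for x in xs]

# Change per one-unit step in x, on the probability scale and on the log-odds scale.
prob_steps = [b - a for a, b in zip(probs, probs[1:])]
logit_steps = [log_odds(b) - log_odds(a) for a, b in zip(probs, probs[1:])]

for x, p in zip(xs, probs):
    print(f"x = {x:.0f}  p(x) = {p:.4f}")
print("change in p(x) per unit step:     ", [round(s, 4) for s in prob_steps])
print("change in log-odds per unit step: ", [round(s, 4) for s in logit_steps])
```

The probability steps are small in the tails and largest near $p(X) = 0.5$, whereas every log-odds step equals `BETA1`.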
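  7. 4.3.2 Estimating the Regression Coefficients

    The null hypothesis $H_0: \beta_1 = 0$ states that $p(X) = e^{\beta_0} / (1 + e^{\beta_0})$. In other words, the probability of default does not depend on the value of balance. Because the p-value is very small, $H_0$ can be rejected.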
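  8. 4.3.5 Logistic Regression for >2 Response Classes

    The two-class logistic regression model described in the previous sections can be extended to three or more classes, but in practice it is not used much. The reason is that discriminant analysis, described in the next section, is the more common choice for multi-class classification. The book therefore does not cover the extension; just remember that it exists. Incidentally, it is available in R.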
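  9. 4.4.1 Using Bayes’ Theorem for Classification

    Suppose the observations are to be classified into one of $K$ classes ($K \ge 2$). That is, the qualitative response $Y$ takes $K$ values. Let $\pi_k$ be the probability that an observation belongs to class $k$, and let $f_k(x) \equiv \Pr(X = x \mid Y = k)$. This is the density function of $X$ within class $k$: $f_k(x)$ is large when an observation in class $k$ is likely to have $X \approx x$.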
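  10. 4.4.1 Using Bayes’ Theorem for Classification

    Here Bayes’ theorem is used: $\Pr(Y = k \mid X = x) = \pi_k f_k(x) / \sum_{l=1}^{K} \pi_l f_l(x)$. From now on this is written $p_k(X) = \Pr(Y = k \mid X)$. As the formula shows, estimating $\pi_k$ and $f_k(x)$ is enough to obtain $p_k(X)$. Estimating $\pi_k$ is easy: it is simply the proportion of observations in each class. Estimating $f_k(x)$ is hard unless some form is assumed for the density function. The following sections describe how to do that.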
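As a small illustration of how Bayes’ theorem turns priors and class densities into posterior probabilities, the sketch below assumes one-dimensional normal class densities. The three classes, their priors, and their means are made up for the example.

```python
import math

def normal_pdf(x, mu, sigma):
    """Density of a one-dimensional normal distribution N(mu, sigma^2)."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def posterior(x, priors, densities):
    """Bayes' theorem: p_k(x) = pi_k * f_k(x) / sum_l pi_l * f_l(x)."""
    joint = [pi * f(x) for pi, f in zip(priors, densities)]
    total = sum(joint)
    return [j / total for j in joint]

# Hypothetical three-class setup (assumption): priors pi_k and normal densities f_k.
priors = [0.5, 0.3, 0.2]
densities = [
    lambda x: normal_pdf(x, -2.0, 1.0),
    lambda x: normal_pdf(x, 0.0, 1.0),
    lambda x: normal_pdf(x, 2.0, 1.0),
]

post = posterior(0.4, priors, densities)
# The Bayes classifier picks the class with the largest posterior probability.
predicted_class = max(range(len(post)), key=post.__getitem__)
print([round(p, 4) for p in post], "-> class index", predicted_class)
```

Note that the class with the largest prior does not win here: the density term dominates because x = 0.4 is far more plausible under the second class.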
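  11. 4.4.2 Linear Discriminant Analysis for p = 1

    First assume $p = 1$, that is, there is a single predictor. Assume that $f_k(x)$ is a normal density. Assume further that $\sigma_1^2 = \sigma_2^2 = \dots = \sigma_K^2 \equiv \sigma^2$, meaning that the variance is the same in every class.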
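  12. 4.4.2 Linear Discriminant Analysis for p = 1

    Taking the logarithm and keeping only the terms that depend on $k$ gives $\delta_k(x) = x \cdot \mu_k / \sigma^2 - \mu_k^2 / (2\sigma^2) + \log \pi_k$. For a given $x$, $\delta_k(x)$ is large when the observation belongs to class $k$, so $x$ is assigned to the class with the largest $\delta_k(x)$. Incidentally, the method is called linear discriminant analysis because $\delta_k$ is a linear function of $x$.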
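  13. 4.4.2 Linear Discriminant Analysis for p = 1

    For example, take $K = 2$ and $\pi_1 = \pi_2$. Then $x$ is assigned to class 1 when $\delta_1(x) > \delta_2(x)$, that is, when $2x(\mu_1 - \mu_2) > \mu_1^2 - \mu_2^2$. Class 2 is handled in the same way. The decision boundary is found by solving $\delta_1 = \delta_2$ for $x$, which gives $x = (\mu_1^2 - \mu_2^2) / (2(\mu_1 - \mu_2)) = (\mu_1 + \mu_2) / 2$.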
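  14. 4.4.2 Linear Discriminant Analysis for p = 1

    An example is shown in the figure below (figure not included in this transcript), with $\mu_1 = -1.25$, $\mu_2 = 1.25$, and $\sigma_1^2 = \sigma_2^2 = 1$. Taking $\pi_1 = \pi_2 = 0.5$ as well, the dashed line in the figure (the Bayes decision boundary) can be drawn at $x = 0$.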
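The example on this slide can be checked numerically. The sketch below evaluates the one-dimensional LDA discriminant $\delta_k(x) = x\mu_k/\sigma^2 - \mu_k^2/(2\sigma^2) + \log \pi_k$ with the slide’s parameter values; the function and variable names are only illustrative.

```python
import math

def delta(x, mu, sigma2, prior):
    """LDA discriminant for p = 1: x*mu/sigma^2 - mu^2/(2*sigma^2) + log(prior)."""
    return x * mu / sigma2 - mu * mu / (2.0 * sigma2) + math.log(prior)

# Parameter values taken from the slide's example.
MU1, MU2, SIGMA2 = -1.25, 1.25, 1.0
PI1 = PI2 = 0.5

def classify(x):
    """Assign x to class 1 if delta_1(x) > delta_2(x), otherwise to class 2."""
    return 1 if delta(x, MU1, SIGMA2, PI1) > delta(x, MU2, SIGMA2, PI2) else 2

# With equal priors the boundary is the midpoint of the two class means.
boundary = (MU1 + MU2) / 2.0
print("boundary:", boundary)
print([classify(x) for x in (-2.0, -0.1, 0.1, 2.0)])
```

Points to the left of the boundary go to class 1 and points to the right go to class 2, as the dashed line in the slide’s figure indicates.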
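  15. 4.4.2 Linear Discriminant Analysis for p = 1

    In practice, even if $X$ is known to follow a normal distribution within each class, the parameters $\mu_1, \dots, \mu_K$, $\pi_1, \dots, \pi_K$, and $\sigma^2$ still have to be estimated. Linear discriminant analysis (LDA) builds a Bayes classifier by plugging parameter estimates into the preceding formula. The following estimates are commonly used: $\hat{\mu}_k = \frac{1}{n_k} \sum_{i: y_i = k} x_i$, $\hat{\sigma}^2 = \frac{1}{n - K} \sum_{k=1}^{K} \sum_{i: y_i = k} (x_i - \hat{\mu}_k)^2$, and $\hat{\pi}_k = n_k / n$, where $n$ is the number of training observations and $n_k$ the number in class $k$.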
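A minimal sketch of the plug-in idea for $p = 1$: estimate the class proportions, class means, and a pooled variance from labelled data, then classify with the estimated discriminant. The simulated data, the seed, and the function names `fit_lda_1d` and `predict_lda_1d` are assumptions made for this illustration.

```python
import math
import random

def fit_lda_1d(xs, ys):
    """Estimate pi_k, mu_k and the pooled variance sigma^2 from labelled 1-D data."""
    classes = sorted(set(ys))
    n = len(xs)
    params = {}
    pooled_ss = 0.0
    for k in classes:
        xk = [x for x, y in zip(xs, ys) if y == k]
        mu_k = sum(xk) / len(xk)
        pooled_ss += sum((x - mu_k) ** 2 for x in xk)
        params[k] = (len(xk) / n, mu_k)        # (pi_hat_k, mu_hat_k)
    sigma2 = pooled_ss / (n - len(classes))    # pooled estimate with divisor n - K
    return params, sigma2

def predict_lda_1d(x, params, sigma2):
    """Return the class whose estimated discriminant delta_k(x) is largest."""
    def delta(k):
        prior, mu = params[k]
        return x * mu / sigma2 - mu * mu / (2.0 * sigma2) + math.log(prior)
    return max(params, key=delta)

# Simulated training data (assumption): two normal classes with common variance 1.
rng = random.Random(0)
xs = [rng.gauss(-1.25, 1.0) for _ in range(200)] + [rng.gauss(1.25, 1.0) for _ in range(200)]
ys = [1] * 200 + [2] * 200

params, sigma2 = fit_lda_1d(xs, ys)
print("estimates:", params, "pooled variance:", round(sigma2, 3))
print(predict_lda_1d(-3.0, params, sigma2), predict_lda_1d(3.0, params, sigma2))
```

With enough data the estimated boundary lands close to the Bayes boundary, which is why LDA approximates the Bayes classifier when its assumptions hold.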
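  16. 4.4.3 Linear Discriminant Analysis for p > 1

    LDA is now extended to the case of multiple predictors. To do so, the observations are assumed to follow a multivariate normal distribution with a class-specific mean vector and a covariance matrix common to all classes. The multivariate normal distribution assumes that each predictor follows a one-dimensional normal distribution and that the predictors may be correlated with one another.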
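  17. 4.4.3 Linear Discriminant Analysis for p > 1

    The figure shows the density function of a multivariate normal distribution with $p = 2$ (figure not included in this transcript). In both panels, cutting the surface along the $x_1$ axis or the $x_2$ axis gives a cross-section that is a one-dimensional normal distribution. Left: $\mathrm{Var}(X_1) = \mathrm{Var}(X_2)$ and $\mathrm{Cor}(X_1, X_2) = 0$. Right: $X_1$ and $X_2$ have a correlation of 0.7.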
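  18. 4.4.3 Linear Discriminant Analysis for p > 1

    The density function of the multivariate normal distribution is $f(x) = \frac{1}{(2\pi)^{p/2} |\Sigma|^{1/2}} \exp\left(-\frac{1}{2}(x - \mu)^T \Sigma^{-1} (x - \mu)\right)$. This is written $X \sim N(\mu, \Sigma)$, where $E(X) = \mu$ is the $p$-dimensional mean vector and $\mathrm{Cov}(X) = \Sigma$ is the $p \times p$ covariance matrix.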
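To make the density concrete, here is a sketch for the $p = 2$ case that uses the closed-form inverse of a 2×2 covariance matrix. The two covariance matrices mirror the uncorrelated and 0.7-correlation cases discussed in these slides; unit variances are an assumption for the example.

```python
import math

def bivariate_normal_pdf(x, mu, cov):
    """Density of N(mu, cov) for p = 2; cov is [[s11, s12], [s21, s22]]."""
    (s11, s12), (s21, s22) = cov
    det = s11 * s22 - s12 * s21
    if det <= 0:
        raise ValueError("covariance matrix must be positive definite")
    d1, d2 = x[0] - mu[0], x[1] - mu[1]
    # Quadratic form (x - mu)^T cov^{-1} (x - mu) via the closed-form 2x2 inverse.
    quad = (s22 * d1 * d1 - (s12 + s21) * d1 * d2 + s11 * d2 * d2) / det
    # For p = 2 the normalising constant (2*pi)^(p/2) * |cov|^(1/2) is 2*pi*sqrt(det).
    return math.exp(-0.5 * quad) / (2.0 * math.pi * math.sqrt(det))

mu = (0.0, 0.0)
uncorrelated = [[1.0, 0.0], [0.0, 1.0]]   # Cor(X1, X2) = 0
correlated = [[1.0, 0.7], [0.7, 1.0]]     # Cor(X1, X2) = 0.7 (unit variances assumed)

print(bivariate_normal_pdf((0.0, 0.0), mu, uncorrelated))
print(bivariate_normal_pdf((1.0, 1.0), mu, correlated))
print(bivariate_normal_pdf((1.0, -1.0), mu, correlated))
```

With positive correlation the density is higher at (1, 1) than at (1, -1), which is the stretched, tilted bell shape described for the correlated case.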
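  19. 4.4.3 Linear Discriminant Analysis for p > 1

    The figure shows the ROC curve (figure not included in this transcript). The vertical axis is the fraction of actual defaulters who are classified as defaulting, that is, the sensitivity. The horizontal axis is the fraction of non-defaulters who are classified as defaulting, that is, 1 - specificity. The ROC curve is obtained by computing these two rates at every possible threshold and plotting them.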
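  20. 4.4.3 Linear Discriminant Analysis for p > 1

    The larger the area under the curve, the better the classifier. This area is called the AUC. Here the AUC is 0.95, which is very high. The diagonal dashed line is the line on which the two rates are equal at every threshold. In terms of an earlier figure, it corresponds to the case where the class distributions overlap. In that case the classification is no better than chance.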
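A short sketch of how an ROC curve and its AUC can be computed from classifier scores: sweep the threshold over the observed scores, record the false and true positive rates, and integrate with the trapezoidal rule. The scores and labels are made up for illustration and have nothing to do with the Default data.

```python
def roc_points(scores, labels):
    """Return (false positive rate, true positive rate) pairs over all thresholds.

    labels are 1 for the positive class and 0 for the negative class;
    an observation is predicted positive when its score >= threshold.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    points = [(0.0, 0.0)]  # threshold above every score: nothing predicted positive
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        points.append((fp / n_neg, tp / n_pos))
    return points

def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area

# Made-up scores (e.g. predicted probabilities of the positive class) and true labels.
scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.3, 0.2]
labels = [1, 1, 0, 1, 0, 1, 0, 0]

points = roc_points(scores, labels)
print(points)
print("AUC:", auc(points))
```

A classifier whose scores carry no information would trace the diagonal and give an AUC near 0.5.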
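  21. 4.4.3 Linear Discriminant Analysis for p > 1

    In general, the results of a classification can be laid out as in the table (table not included in this transcript). "+" is the state we want to detect and corresponds to the alternative hypothesis. "−" is its opposite and corresponds to the null hypothesis. For the Default data, "+" means default = Yes.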
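  22. 4.4.3 Linear Discriminant Analysis for p > 1

    Important performance measures are listed in the table below (table not included in this transcript). The denominators of the False Pos. rate and the True Pos. rate are the actual counts in each class. The denominators of the Pos. Pred. value and the Neg. Pred. value are the predicted counts in each class.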
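The distinction between the two kinds of denominator can be sketched directly from a 2×2 confusion matrix. The counts below are hypothetical and chosen only to make the arithmetic easy to follow.

```python
def classification_rates(tp, fp, tn, fn):
    """Rates derived from a 2x2 confusion matrix of counts."""
    actual_pos = tp + fn   # observations that are truly "+"
    actual_neg = tn + fp   # observations that are truly "-"
    pred_pos = tp + fp     # observations predicted "+"
    pred_neg = tn + fn     # observations predicted "-"
    return {
        "false_pos_rate": fp / actual_neg,   # denominator: actual negatives
        "true_pos_rate": tp / actual_pos,    # denominator: actual positives
        "pos_pred_value": tp / pred_pos,     # denominator: predicted positives
        "neg_pred_value": tn / pred_neg,     # denominator: predicted negatives
    }

# Hypothetical counts for illustration only.
rates = classification_rates(tp=80, fp=20, tn=880, fn=20)
for name, value in rates.items():
    print(f"{name}: {value:.4f}")
```

The true positive rate is the sensitivity, and one minus the false positive rate is the specificity.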
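  23. 4.4.4 Quadratic Discriminant Analysis

    Under this assumption (each class has its own covariance matrix $\Sigma_k$), the discriminant function becomes $\delta_k(x) = -\frac{1}{2}(x - \mu_k)^T \Sigma_k^{-1} (x - \mu_k) - \frac{1}{2}\log|\Sigma_k| + \log \pi_k$. QDA assigns an observation $X = x$ to the class $k$ for which $\delta_k(x)$ is largest, using estimates of $\mu_k$, $\Sigma_k$, and $\pi_k$. Unlike LDA, the QDA discriminant is a quadratic function of $x$.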
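A one-dimensional sketch shows what the quadratic term buys. For $p = 1$ the QDA discriminant reduces to $-(x - \mu_k)^2 / (2\sigma_k^2) - \frac{1}{2}\log \sigma_k^2 + \log \pi_k$. The two classes below are hypothetical: they share a mean and differ only in variance, a case no linear rule can separate.

```python
import math

def qda_delta(x, mu, sigma2, prior):
    """QDA discriminant for p = 1 with a class-specific variance sigma2."""
    return -((x - mu) ** 2) / (2.0 * sigma2) - 0.5 * math.log(sigma2) + math.log(prior)

# Hypothetical class parameters (assumption): (mu_k, sigma_k^2, pi_k).
CLASSES = {1: (0.0, 1.0, 0.5), 2: (0.0, 9.0, 0.5)}

def qda_predict(x):
    """Assign x to the class with the largest discriminant value."""
    return max(CLASSES, key=lambda k: qda_delta(x, *CLASSES[k]))

predictions = [qda_predict(x) for x in (-4.0, -1.0, 0.0, 1.0, 4.0)]
print(predictions)
```

Points near the shared mean go to the low-variance class and points in both tails go to the high-variance class, so the decision region has two boundary points rather than one.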
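  24. 4.5 A Comparison of Classification Methods

    First, the relationship between logistic regression and LDA. With a single predictor ($p = 1$) and two classes, the preceding formulas show that the LDA log-odds can be written $\log\left(\frac{p_1(x)}{1 - p_1(x)}\right) = c_0 + c_1 x$, where $c_0$ and $c_1$ are functions of $\mu_1$, $\mu_2$, and $\sigma^2$.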
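The constants can be worked out explicitly. Since $p_k(x) \propto \pi_k f_k(x)$ and the terms of $\log f_k(x)$ that do not depend on $k$ cancel in the ratio, the log-odds equal the difference of the two one-dimensional LDA discriminants. The derivation below is a reconstruction under that setup rather than a formula quoted from the slide; note that $c_0$ also involves the priors unless $\pi_1 = \pi_2$.

```latex
\log\frac{p_1(x)}{1 - p_1(x)}
  = \log\frac{p_1(x)}{p_2(x)}
  = \delta_1(x) - \delta_2(x)
  = \underbrace{\log\frac{\pi_1}{\pi_2} - \frac{\mu_1^2 - \mu_2^2}{2\sigma^2}}_{c_0}
    + \underbrace{\frac{\mu_1 - \mu_2}{\sigma^2}}_{c_1}\, x
```

This has the same linear-in-$x$ form as the logistic regression log-odds $\beta_0 + \beta_1 x$; the two methods differ in how the coefficients are estimated.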
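  25. 4.5 A Comparison of Classification Methods

    One might therefore expect the two methods to give the same results, but they can differ. LDA assumes that the observations follow a normal distribution with a covariance matrix common to all classes. When that assumption holds, LDA does better than logistic regression. Conversely, when it does not hold, logistic regression does better.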
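  26. 4.5 A Comparison of Classification Methods

    KNN is entirely different from the methods described in this chapter. To classify an observation $X = x$, it selects the $K$ training observations closest to $x$ and decides the class from them. In other words, KNN is a non-parametric method. It therefore does better than LDA and logistic regression when the decision boundary is highly non-linear. However, KNN does not tell us which predictors are important.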
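A minimal KNN sketch, assuming Euclidean distance and a simple majority vote among the K nearest training points. The tiny training set and the function name `knn_predict` are made up for illustration.

```python
import math
from collections import Counter

def knn_predict(train_x, train_y, query, k):
    """Classify `query` by majority vote among its k nearest training points."""
    # Indices of the training points, ordered by Euclidean distance to the query.
    order = sorted(range(len(train_x)), key=lambda i: math.dist(train_x[i], query))
    votes = Counter(train_y[i] for i in order[:k])
    # On a tied vote, the label seen first (i.e. the nearer neighbour's) is returned.
    return votes.most_common(1)[0][0]

# Hypothetical training data: two small clusters labelled "A" and "B".
train_x = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
train_y = ["A", "A", "A", "B", "B", "B"]

print(knn_predict(train_x, train_y, (0.5, 0.5), k=3))
print(knn_predict(train_x, train_y, (5.5, 5.5), k=3))
```

No parameters are estimated and no boundary shape is assumed; the training data themselves define the decision rule, which is what makes the method non-parametric.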
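  27. 4.5 A Comparison of Classification Methods

    The four methods were applied to data from six scenarios. Three scenarios have a linear decision boundary and the other three a non-linear one. For each scenario, 100 data sets were generated and the error rate was computed. KNN was tried in two variants: with $K = 1$, and with $K$ chosen by CV (cross-validation). CV is covered in Chapter 5. Every scenario has $p = 2$, that is, two predictors.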
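As a rough sketch of what choosing K by cross-validation can look like, the code below uses leave-one-out cross-validation on simulated two-predictor data. The specific CV variant, the candidate values of K, the data, and the seed are all assumptions for illustration; the slide does not specify them.

```python
import math
import random
from collections import Counter

def knn_predict(train, query, k):
    """train is a list of ((x1, x2), label) pairs; majority vote among k nearest."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

def loocv_error(data, k):
    """Leave-one-out cross-validation error rate of KNN for a given k."""
    errors = 0
    for i, (x, y) in enumerate(data):
        rest = data[:i] + data[i + 1:]          # hold out observation i
        errors += knn_predict(rest, x, k) != y  # True counts as 1
    return errors / len(data)

# Simulated data (assumption): two classes with p = 2 normal predictors.
rng = random.Random(1)
data = [((rng.gauss(0, 1), rng.gauss(0, 1)), 0) for _ in range(40)]
data += [((rng.gauss(1.5, 1), rng.gauss(1.5, 1)), 1) for _ in range(40)]

candidates = [1, 3, 5, 7, 9, 15]
cv_errors = {k: loocv_error(data, k) for k in candidates}
best_k = min(candidates, key=cv_errors.get)
print(cv_errors)
print("K chosen by cross-validation:", best_k)
```

The K = 1 variant skips this selection step entirely, which is why it can behave very differently from the cross-validated variant.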
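  28. 4.5 A Comparison of Classification Methods

    Scenario 1: LDA worked best. The KNN results were highly variable. QDA did worse than LDA because it is more flexible than necessary. Logistic regression, which does not assume a normal distribution, was only slightly behind LDA.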
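  29. 4.5 A Comparison of Classification Methods

    Scenario 3: $X_1$ and $X_2$ follow a t-distribution, with 50 observations in each class. The t-distribution has almost the same shape as the normal distribution but tends to produce more values far from the mean. The decision boundary is a straight line, and logistic regression was the better fit, since LDA assumes a normal distribution.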
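  30. 4.5 A Comparison of Classification Methods

    Scenario 4: the correlation between the predictors is 0.5 in class 1 and -0.5 in class 2. This matches the QDA assumption, and the decision boundary is a quadratic curve. The results show that QDA does indeed outperform the other methods.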
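  31. 4.5 A Comparison of Classification Methods

    Scenario 5: within each class, the observations are normal with uncorrelated predictors. The response was sampled from the logistic function using $X_1^2$, $X_2^2$, and $X_1 \times X_2$ as predictors. As a result, the decision boundary is quadratic. QDA did best, followed by KNN-CV. The linear methods performed poorly.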
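  32. 4.5 A Comparison of Classification Methods

    Scenario 6: the details are the same as in Scenario 5, but the response was sampled from a more complicated non-linear function. As a result, even QDA could not model the data adequately. QDA was still better than the linear methods, but the more flexible KNN-CV did best.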
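  33. 4.5 A Comparison of Classification Methods

    Scenario 6: KNN-1 did worst. This shows that even when the data follow a complicated non-linear function, KNN can still perform poorly if the value of K is not chosen properly.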