Reviewing Tree Algorithms Through Equations and Implementation


Waku Michishita

March 16, 2019

Transcript

  1. [Title slide]

    [Dated March 16, 2019; the remaining text is unrecoverable.]
  2. Contents

    1. Decision Tree: i. how a tree is grown, ii. impurity, iii. how a tree is pruned
    2. Random Forest: i. Bootstrap Sampling, ii. AdaBoost and Random Forest
    3. Gradient Boosting
  3. [Title unrecoverable]

    [Bulleted slide; the only legible term is "Kaggle".]
  4. [Title unrecoverable]

    [Legible terms: the Titanic dataset and GitHub.]
  5. [Section title slide]

    1. Decision Tree
  6. [Title unrecoverable]

    [Bulleted slide; no legible text survives.]
  7. [Figure slide]

    [Figure: an example decision tree whose branches are labeled x1 > 5 / x1 <= 5, x2 <= 0.6 / x2 > 0.6, and x3 = 0 / x3 = 1.]
  8. Terminology

    • node
    • root node
    • terminal node (also: leaf node)
    • branch, sub tree
  9. [Slide with no extractable text]
  10. [Title unrecoverable]

    [Bulleted slide; no legible text survives.]
  11. [Slide with no extractable text]
  12. Tree-growing algorithms

    • CART (C&RT, Classification And Regression Tree)
    • ID3 (Iterative Dichotomiser 3)
    • C4.5
    • CHAID (Chi-squared Automatic Interaction Detection)
    [A closing remark about CART is unrecoverable.]
  13. CART

    [Only the term "CART" is legible.]
  14. [Title unrecoverable]

    [Legible terms: "2", C4.5, CHAID.]
  15. Review: probability

    • Joint probability: the probability that X and Y both occur, written $P(X, Y)$.
    • Marginal probability: summing the joint probability over every value of Y leaves the probability of X alone, $P(X) = \sum_{i=1}^{n} P(X, Y_i)$.
    • Conditional probability: the probability of X given that Y has occurred, $P(X \mid Y) = \frac{P(X, Y)}{P(Y)}$.
    • [One further bullet is unrecoverable.]
    (Source cited on the slide: to-kei.net)
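To make the marginal and conditional probability formulas on this slide concrete, here is a small sketch that evaluates them on a joint distribution. The table of probabilities and the names `joint`, `marginal_x`, `marginal_y`, and `conditional_x_given_y` are hypothetical, chosen only for illustration.

```python
from fractions import Fraction

# Hypothetical joint distribution P(X, Y) over X in {x1, x2} and Y in {y1, y2}.
joint = {
    ("x1", "y1"): Fraction(1, 10),
    ("x1", "y2"): Fraction(3, 10),
    ("x2", "y1"): Fraction(2, 10),
    ("x2", "y2"): Fraction(4, 10),
}


def marginal_x(x):
    """P(X = x): sum the joint probability over every value of Y."""
    return sum(p for (xi, _), p in joint.items() if xi == x)


def marginal_y(y):
    """P(Y = y): sum the joint probability over every value of X."""
    return sum(p for (_, yi), p in joint.items() if yi == y)


def conditional_x_given_y(x, y):
    """P(X = x | Y = y) = P(X = x, Y = y) / P(Y = y)."""
    return joint[(x, y)] / marginal_y(y)


print(marginal_x("x1"))
print(conditional_x_given_y("x1", "y1"))
```

Exact fractions are used instead of floats so that the results can be compared with a hand calculation without rounding noise.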
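  16. Notation

    • Total number of training samples: $N$
    • Node: $t$
    • Classes: $C_k$, $k \in \{1, 2, \dots, K\}$
    • Number of training samples at node $t$: $N(t)$
    • Number of training samples of class $C_k$: $N_k$
    • Number of training samples at node $t$ that belong to class $C_k$: $N_k(t)$
    → The probability that a sample at node $t$ belongs to class $C_k$ is $P(C_k \mid t) = \frac{N_k(t)}{N(t)}$, and node $t$ is assigned the class $\arg\max_k P(C_k \mid t)$.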
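The rule on this slide (estimate each class probability at a node as N_k(t) / N(t), then label the node with the most probable class) can be sketched in a few lines. The function names and the sample labels below are hypothetical and serve only as an illustration.

```python
from collections import Counter


def class_posteriors(labels_at_node):
    """Return P(C_k | t) = N_k(t) / N(t) for every class present at the node."""
    n_t = len(labels_at_node)           # N(t): samples at this node
    counts = Counter(labels_at_node)    # N_k(t): samples of each class at this node
    return {label: n_k_t / n_t for label, n_k_t in counts.items()}


def node_class(labels_at_node):
    """Assign the node the class with the largest posterior (argmax over k)."""
    posteriors = class_posteriors(labels_at_node)
    return max(posteriors, key=posteriors.get)


# Hypothetical labels of the training samples that reached one node.
labels = ["survived", "died", "died", "survived", "died"]
print(class_posteriors(labels))
print(node_class(labels))
```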
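  17. Derivation of $P(C_k \mid t) = \frac{N_k(t)}{N(t)}$

    • Prior probability of class $C_k$: $P(C_k) = \frac{N_k}{N}$
    • Probability that a sample of class $C_k$ falls in node $t$: $P(t \mid C_k) = \frac{N_k(t)}{N_k}$
    • Joint probability of class $C_k$ and node $t$: $P(C_k, t) = P(C_k)\,P(t \mid C_k) = \frac{N_k}{N} \cdot \frac{N_k(t)}{N_k} = \frac{N_k(t)}{N}$
    • Marginal probability of node $t$: $P(t) = \sum_{k=1}^{K} P(C_k, t) = \sum_{k=1}^{K} \frac{N_k(t)}{N} = \frac{N(t)}{N}$
  18. Derivation of $P(C_k \mid t) = \frac{N_k(t)}{N(t)}$ (continued)

    • From the previous slide, $P(C_k, t) = \frac{N_k(t)}{N}$ and $P(t) = \frac{N(t)}{N}$.
    • Therefore $P(C_k \mid t) = \frac{P(C_k, t)}{P(t)} = \frac{N_k(t)/N}{N(t)/N} = \frac{N_k(t)}{N(t)}$.
    → The class probability at node $t$ is the fraction of the node's samples that belong to $C_k$.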
  19. Impurity

    sklearn.tree.DecisionTreeClassifier supports the impurity criteria "entropy" and "gini". Writing $I(t)$ for the impurity of node $t$:
    • Entropy: $I(t) = -\sum_{k=1}^{K} P(C_k \mid t) \ln P(C_k \mid t)$
    • Gini index: $I(t) = \sum_{k=1}^{K} P(C_k \mid t)\,\bigl(1 - P(C_k \mid t)\bigr)$
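The two impurity measures named on this slide, entropy and the Gini index, are short enough to write out directly. The sketch below is a plain-Python illustration rather than scikit-learn's internal implementation. The function names and the example distributions are assumptions, and zero-probability classes are skipped in the entropy sum so that ln 0 is never evaluated.

```python
import math


def entropy(probs):
    """Entropy impurity: sum over k of -p_k * ln(p_k), skipping empty classes."""
    return sum(-p * math.log(p) for p in probs if p > 0)


def gini(probs):
    """Gini impurity: sum over k of p_k * (1 - p_k)."""
    return sum(p * (1 - p) for p in probs)


# Hypothetical class distributions at a node.
pure = [1.0, 0.0]    # every sample belongs to one class
mixed = [0.5, 0.5]   # samples split evenly between two classes

print(entropy(pure), gini(pure))
print(entropy(mixed), gini(mixed))
```

Both measures are zero for a pure node and largest for an evenly mixed one, which is why a split that lowers them is preferred.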
  20. [Title unrecoverable]

    [Step-by-step slide whose bullets are joined by "↓" and end with a "→" conclusion; no legible text survives.]
  21. [Title unrecoverable]

    [Bulleted slide; no legible text survives.]
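  22. Hyperparameters

    Hyperparameters of sklearn.tree.DecisionTreeClassifier that limit how far a tree grows:
    • max_depth (default=None) ... maximum depth of the tree
    • min_samples_split (default=2) ... minimum number of samples a node must contain before it can be split
    • min_samples_leaf (default=1) ... minimum number of samples required at a leaf node
    • max_leaf_nodes (default=None) ... maximum number of leaf nodes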
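As a rough illustration of the four hyperparameters listed on this slide, the sketch below fits a DecisionTreeClassifier with each of them set explicitly. The Iris dataset is an assumption made so the example runs without downloads (the talk itself mentions Titanic), and the specific values are arbitrary.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

# Iris stands in for the talk's dataset; it ships with scikit-learn.
X, y = load_iris(return_X_y=True)

clf = DecisionTreeClassifier(
    criterion="gini",       # impurity measure ("gini" or "entropy")
    max_depth=3,            # stop growing below this depth
    min_samples_split=4,    # a node needs at least 4 samples to be split
    min_samples_leaf=2,     # every leaf keeps at least 2 samples
    max_leaf_nodes=8,       # cap on the number of leaves
    random_state=0,         # fixed seed for reproducibility
)
clf.fit(X, y)

print("depth:", clf.get_depth(), "leaves:", clf.get_n_leaves())
print("training accuracy:", clf.score(X, y))
```

Tightening any of these limits yields a smaller tree, trading training accuracy for less overfitting.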
  23. Visualization

    • [First bullet unrecoverable.]
    • Tools named on the slide: Graphviz and dtreeplt.
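For the Graphviz route mentioned on this slide, scikit-learn can emit a tree in Graphviz's DOT format through `sklearn.tree.export_graphviz`. The sketch below only produces the DOT text, so it does not require the Graphviz binaries. The Iris dataset and the shallow tree are assumptions for illustration, and dtreeplt is not shown because its API is not described in the transcript.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_graphviz

iris = load_iris()
clf = DecisionTreeClassifier(max_depth=2, random_state=0)
clf.fit(iris.data, iris.target)

# out_file=None makes export_graphviz return the DOT source as a string.
dot = export_graphviz(
    clf,
    out_file=None,
    feature_names=iris.feature_names,
    class_names=list(iris.target_names),
    filled=True,
)
print(dot[:200])
```

The resulting string can be saved to a `.dot` file and rendered with the Graphviz `dot` command if Graphviz is installed.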
  24. References

    • T. Hastie, R. Tibshirani and J. Friedman. "The Elements of Statistical Learning", Springer, 2009.
    • [Author, title, and publisher unrecoverable], 2012
    • A Complete Tutorial on Tree Based Modeling from Scratch (in R & Python) - Analytics Vidhya
    • [Title unrecoverable] - Python [remainder unrecoverable]
    • [Title unrecoverable] - Code Craft House
    • [Python] Graphviz [words unrecoverable] dtreeplt [words unrecoverable] - Qiita
  25. [Section title slide]

    2. Random Forest
  26. Random Forest [rest of title unrecoverable]

    [Three bullets; no legible text survives.]
  27. Bootstrap Sampling

    • [First bullet unrecoverable.]
      → Bagging and Random Forest are trained on Bootstrap Samples.
    • Bootstrap Sampling draws samples from the original data with replacement (resampling with replacement).
  28. Bootstrap Sampling

    [Figure: the original data set is resampled repeatedly, producing Bootstrap Samples 1, 2, ..., n.]
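Resampling with replacement, as described for Bootstrap Sampling, can be sketched with the standard library alone. The helper name `bootstrap_sample` and the toy data are hypothetical.

```python
import random


def bootstrap_sample(data, rng):
    """Draw len(data) items from data with replacement."""
    return rng.choices(data, k=len(data))


data = list(range(10))   # hypothetical data set of 10 items
rng = random.Random(0)   # fixed seed so the run is reproducible

# Three Bootstrap Samples, each the same size as the original data.
samples = [bootstrap_sample(data, rng) for _ in range(3)]
for sample in samples:
    print(sorted(sample))
```

Because items are drawn with replacement, a sample usually contains some items more than once and omits others entirely.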
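  29. Bagging

    • Bagging stands for Bootstrap AGGregatING.
    • A model is trained on each Bootstrap Sample → the models' predictions are aggregated into a single prediction.
  30. Bagging

    [Figure: the data set is resampled into several Bootstrap Samples, and the results obtained from them are combined.]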
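A minimal sketch of Bagging, assuming decision trees as the base model and a majority vote as the aggregation step: each tree is fitted on its own Bootstrap Sample and the votes are counted per sample. The Iris dataset, the number of trees, and the variable names are assumptions for illustration.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
n_classes = len(np.unique(y))
rng = np.random.default_rng(0)

# Fit one tree per Bootstrap Sample (row indices drawn with replacement).
trees = []
for _ in range(25):
    idx = rng.integers(0, len(X), size=len(X))
    trees.append(DecisionTreeClassifier(random_state=0).fit(X[idx], y[idx]))

# Aggregate: each tree votes, and the most frequent class wins per sample.
votes = np.stack([tree.predict(X) for tree in trees])  # shape (n_trees, n_samples)
y_pred = np.array([np.bincount(col, minlength=n_classes).argmax() for col in votes.T])

accuracy = (y_pred == y).mean()
print("training accuracy of the bagged trees:", accuracy)
```

scikit-learn also provides `sklearn.ensemble.BaggingClassifier`, which packages the same idea. The loop above is spelled out only to show the mechanism.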
  31. Random Forest

    • [First bullet unrecoverable; its "←" note refers to Bootstrap Samples.]
    • Random Forest: [text unrecoverable; its "←" note refers to Bootstrap Sampling.]
  32. Random Forest

    [Figure: the data set is resampled as in Bagging, and feature lists are shown alongside: X1, X2, X3, X4, X5; X1, X3, X5; X1, X2, X3, X4.]
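The figure pairs resampled data with different feature lists, which suggests that each tree sees both a Bootstrap Sample of the rows and a random subset of the features. The sketch below draws such inputs per tree. The helper `draw_tree_inputs`, the sample sizes, and the per-tree reading of the figure are assumptions; scikit-learn's RandomForestClassifier instead draws a fresh random subset of candidate features at every split (controlled by `max_features`).

```python
import random

FEATURES = ["X1", "X2", "X3", "X4", "X5"]   # feature names taken from the figure


def draw_tree_inputs(n_samples, n_features, rng):
    """Pick the rows (with replacement) and features (without) for one tree."""
    rows = [rng.randrange(n_samples) for _ in range(n_samples)]   # Bootstrap Sample
    cols = sorted(rng.sample(FEATURES, n_features))               # feature subset
    return rows, cols


rng = random.Random(0)
plans = [draw_tree_inputs(n_samples=8, n_features=3, rng=rng) for _ in range(3)]
for rows, cols in plans:
    print(cols, rows)
```

Giving the trees different features as well as different rows makes them less alike, which is what separates Random Forest from plain Bagging of decision trees.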
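  33. Out-Of-Bag

    • In Random Forest, the training samples that do not appear in a tree's Bootstrap Sample are that tree's Out-Of-Bag samples; they can serve as held-out data for evaluating the model.
    Reference: http://alfredplpl.hatenablog.com/entry/2013/12/24/225420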
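To see where Out-Of-Bag samples come from, the sketch below draws one Bootstrap Sample and lists the indices that were never picked. The helper name and the sample size of 1000 are hypothetical. For large data sets roughly (1 - 1/n)^n, or about 37%, of the samples are expected to end up Out-Of-Bag.

```python
import random


def split_in_bag_and_oob(n_samples, rng):
    """Return the bootstrap indices and the indices left out of the bootstrap."""
    in_bag = [rng.randrange(n_samples) for _ in range(n_samples)]
    oob = sorted(set(range(n_samples)) - set(in_bag))
    return in_bag, oob


rng = random.Random(0)
in_bag, oob = split_in_bag_and_oob(1000, rng)
oob_fraction = len(oob) / 1000
print("Out-Of-Bag fraction:", oob_fraction)
```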
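  34. sklearn RandomForestClassifier

    Parameters in addition to those of DecisionTreeClassifier:
    • n_estimators ... number of trees (default 10 at the time of the talk; 100 since scikit-learn 0.22)
    • bootstrap ... whether to use Bootstrap Sampling (default True)
    • oob_score ... whether to score the model on Out-Of-Bag samples (default False)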
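The three parameters on this slide can be tried together in a short sketch. The Iris dataset and the chosen values are assumptions for illustration. With `oob_score=True`, the fitted model exposes the Out-Of-Bag accuracy as `oob_score_`.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

X, y = load_iris(return_X_y=True)

clf = RandomForestClassifier(
    n_estimators=100,   # number of trees in the forest
    bootstrap=True,     # fit each tree on a Bootstrap Sample
    oob_score=True,     # evaluate on the samples each tree did not see
    random_state=0,     # fixed seed for reproducibility
)
clf.fit(X, y)

print("number of trees:", len(clf.estimators_))
print("Out-Of-Bag accuracy:", clf.oob_score_)
```

The Out-Of-Bag score gives a validation-style estimate without a separate hold-out split, since each sample is scored only by trees that never trained on it.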
  35. References

    • [Author, title, and publisher unrecoverable], 2012
    • David S. Moore, et al., Bootstrap Methods and Permutation Tests, "The Practice of Business Statistics: Using Data for Decisions", ch. 18, W. H. Freeman
    • L. Breiman and A. Cutler, "Random Forests"
    • Bagging and Random Forest Ensemble Algorithms for Machine Learning - Machine Learning Mastery