Reviewing Tree Algorithms through Equations and Implementation

Waku Michishita

March 16, 2019

Transcript

  1. Reviewing Tree Algorithms through Equations and Implementation

    March 16, 2019
  2. Contents

    1. Decision Tree: (i) How trees are grown, (ii) Impurity, (iii) How trees are pruned
    2. Random Forest: (i) Bootstrap Sampling, (ii) AdaBoost and Random Forest
    3. Gradient Boosting
  3. 1 00 . 21 0 1 1 00 . VXWU[

    G0 • <G GE/* • VXWU[)or"). .*7MI $G • !F@6A1OHI43YTWYZR?>+HB9P • !L+F@6AESCE:H#-=A6P;QDK F<QB66G8E • <G BLP<C^LNE6<C • \LP] %=A6P!G '2 G,&= • \5JOLNE6]KaggleBG%( • \LNE6]!G(
  4. 1 00 . 21 0 1 1 00 . 

    *6>3247 %#+)'-"Titanic6>3247, & 5;>.<0;1:#% = $2 )&% # !& & $ #! />8% ($GitHub#.49!&
  5. 1. Decision Tree
  6. 1 00 . 21 0 1 1 00 . )-

    • )- *2. 1%'* 20 ,$) ,/"+(#*!/"*453, +*,&  6  6
  7. [Figure: example decision tree whose branches are labeled x1 > 5 / x1 <= 5, x2 <= 0.6 / x2 > 0.6, and x3 = 0 / x3 = 1]
  8. Terminology

    • node
    • root node
    • terminal node (also: leaf node)
    • branch, sub tree
  9. 1 00 . 21 0 1 1 00 . 

        
  10. 1 00 . 21 0 1 1 00 . F

    F EG]Y^QW\$D CYW\VR`$D 52N1 • ]Y^QW\$D 2N@F (XaUP:7,B6N"F0 P:"D,*PN1 "E;N&PJNHD5L*P) :A37 • YW\VR`$D [aZBAF (XaUPB6N=8+MFD3K4ES_T8B6N" -P;1<F%/P ;N*PHN19OP'M.;9CBP1 →# GYW\VR`$D BF5!b4FI9?>c
  11. 1 00 . 21 0 1 1 00 . 

          
  12. Decision tree algorithms

    • CART (C&RT, Classification And Regression Tree)
    • ID3 (Iterative Dichotomiser 3)
    • C4.5
    • CHAID (Chi-squared Automatic Interaction Detection)
  13. 1 00 . 21 0 1 1 00 . 

      CART   
  14. 1 00 . 21 0 1 1 00 . 

      2 C4.5CHAID
  15. Probability review

    • Joint probability: $P(X, Y)$
    • Marginal probability: $P(X) = \sum_{i=1}^{n} P(X, Y_i)$
    • Conditional probability: $P(X \mid Y) = \dfrac{P(X, Y)}{P(Y)}$
    (Source: to-kei.net)
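  16. Notation

    • Total number of samples: $N$
    • Node: $t$
    • Classes: $C_k$, $k = 1, 2, \dots, K$
    • Number of samples in node $t$: $N(t)$
    • Number of samples in class $C_k$: $N_k$
    • Number of samples in node $t$ that belong to class $C_k$: $N_k(t)$
    → The probability that a sample in node $t$ belongs to class $C_k$ is $P(C_k \mid t) = \dfrac{N_k(t)}{N(t)}$, and node $t$ is assigned the class $\arg\max_k P(C_k \mid t)$.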
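
The counting rule above can be checked numerically. The following is a minimal sketch, assuming a made-up set of ten labeled samples and a made-up boolean mask marking which samples fall into node t; neither comes from the deck.

```python
import numpy as np

# Hypothetical class labels for N = 10 samples and K = 3 classes.
labels = np.array([0, 0, 0, 1, 1, 1, 1, 2, 2, 2])
# Hypothetical mask: True where the sample falls into node t.
in_node = np.array([True, True, False, True, False,
                    False, False, True, False, False])

K = 3
N_t = int(in_node.sum())                            # N(t): samples in node t
N_k_t = np.bincount(labels[in_node], minlength=K)   # N_k(t): per-class counts in node t

posterior = N_k_t / N_t                             # P(C_k | t) = N_k(t) / N(t)
predicted_class = int(np.argmax(posterior))         # class assigned to node t

print("N(t) =", N_t)
print("N_k(t) =", N_k_t)
print("P(C_k | t) =", posterior)
print("assigned class =", predicted_class)
```

The node receives whichever class has the most samples in it, which is the same as taking the argmax of the class posterior.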
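  17. Derivation of $P(C_k \mid t) = \dfrac{N_k(t)}{N(t)}$

    • Probability of class $C_k$: $P(C_k) = \dfrac{N_k}{N}$
    • Probability of node $t$ given class $C_k$: $P(t \mid C_k) = \dfrac{N_k(t)}{N_k}$
    • Joint probability: $P(C_k, t) = P(C_k)\,P(t \mid C_k) = \dfrac{N_k}{N} \cdot \dfrac{N_k(t)}{N_k} = \dfrac{N_k(t)}{N}$
    • Marginal probability: $P(t) = \sum_{k=1}^{K} P(C_k, t) = \sum_{k=1}^{K} \dfrac{N_k(t)}{N} = \dfrac{N(t)}{N}$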
  18. Derivation of $P(C_k \mid t) = \dfrac{N_k(t)}{N(t)}$ (continued)

    From $P(C_k, t) = \dfrac{N_k(t)}{N}$ and $P(t) = \dfrac{N(t)}{N}$:
    $P(C_k \mid t) = \dfrac{P(C_k, t)}{P(t)} = \dfrac{N_k(t)/N}{N(t)/N} = \dfrac{N_k(t)}{N(t)}$
  19. Impurity

    sklearn.tree.DecisionTreeClassifier offers entropy ("entropy") and the Gini index ("gini") as impurity criteria.
    • Entropy: $I(t) = -\sum_{k=1}^{K} P(C_k \mid t) \ln P(C_k \mid t)$
    • Gini index: $I(t) = \sum_{k=1}^{K} P(C_k \mid t)\,\bigl(1 - P(C_k \mid t)\bigr)$
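
Both impurity measures are short functions of the class-probability vector of a node. The sketch below uses the natural logarithm, as in the entropy formula on the slide; the function names are illustrative and not taken from the deck or from scikit-learn.

```python
import numpy as np

def entropy(p):
    """Entropy impurity: -sum_k p_k * ln(p_k), with 0 * ln(0) treated as 0."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]                      # drop empty classes to avoid log(0)
    return float(-np.sum(p * np.log(p)))

def gini(p):
    """Gini impurity: sum_k p_k * (1 - p_k)."""
    p = np.asarray(p, dtype=float)
    return float(np.sum(p * (1.0 - p)))

# A pure node has zero impurity; an evenly mixed node has the maximum.
for probs in ([1.0, 0.0], [0.9, 0.1], [0.5, 0.5]):
    print(probs, "entropy =", entropy(probs), "gini =", gini(probs))
```

Either measure is zero for a pure node and largest when the classes are evenly mixed, which is why a split that lowers it is preferred.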
  20. 1 00 . 21 0 1 1 00 . 

    0 1!.7 • %6 9 ↓ • $"/+"=@;9 ↓ • $)=@;'7(7- 9 • $) 0 *3 94%$"9 ↓ • ,0 ?$"0 * 3 94%28&030-=@;9 →#79 0=@;:<>/35% $"0 5$  0 5$
  21. 1 00 . 21 0 1 1 00 . *

    • .2( +(0 2 • */)(0 "%!,' )1 →*2&-1'& 7* 8  !($# **35648
  22. Hyperparameters

    Parameters of sklearn.tree.DecisionTreeClassifier that control tree size:
    • max_depth (default=None) ... maximum depth of the tree
    • min_samples_split (default=2) ... minimum number of samples a node must hold to be split
    • min_samples_leaf (default=1) ... minimum number of samples required at a leaf node
    • max_leaf_nodes (default=None) ... maximum number of leaf nodes
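
As a rough illustration of how these four parameters constrain a tree, the sketch below fits a classifier on scikit-learn's bundled iris data. The dataset and the specific parameter values are assumptions chosen only to keep the example self-contained; they are not the deck's own data or settings.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Assumption: iris is used here only because it ships with scikit-learn.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

clf = DecisionTreeClassifier(
    criterion="gini",        # impurity measure used to pick splits
    max_depth=3,             # never grow deeper than 3 levels
    min_samples_split=4,     # a node needs at least 4 samples to be split
    min_samples_leaf=2,      # every leaf keeps at least 2 samples
    max_leaf_nodes=8,        # at most 8 leaves in total
    random_state=0,
)
clf.fit(X_train, y_train)

print("depth:", clf.get_depth())
print("leaves:", clf.get_n_leaves())
print("test accuracy:", clf.score(X_test, y_test))
```

Tightening any of these limits yields a smaller tree, which trades some training accuracy for less overfitting.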
  23. 1 00 . 21 0 1 1 00 . 

    ( •  " -!,&*%, • +, )Graphviz- # Graphviz' %* %, 3.134-7(45/8 dtreeplt-$  "(.2608
  24. 1 00 . 21 0 1 1 00 . 

    • T. Hastie, R. Tibshirani and J. Friedman. “Elements of Statistical Learning”, Springer, 2009. •  , $% #.-54, , 2012 • A Complete Tutorial on Tree Based Modeling from Scratch (in R & Python) – Analytic Vidhya •  )3+2,0 – Python!  •  "&'  – Code Craft House • [Python]Graphviz# 1*/12dtreeplt( - Qiita
  25. 2. Random Forest
  26. 1 00 . 21 0 1 1 00 . LRandom

    Forest10M DKFIKHJB  • DKFIKHJB  ,10&,3"179#7  2 &/ %.=':  • 0!$:DKFIKHJB  0! ,+1 -27 #8;/ #:)5 1 =4<(,&, : E?J@6GKAC>J@. *)#:
  27. Bootstrap Sampling

    → Bagging and Random Forest use Bootstrap Samples.
    • Bootstrap Sampling draws each sample from the original data by resampling with replacement.
  28. Bootstrap Sampling

    [Figure: the original data set is resampled repeatedly to produce bootstrap samples 1, 2, ..., n]
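
Resampling with replacement can be sketched in a few lines of NumPy. The ten-element data set, the seed, and the number of resamples below are hypothetical values chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)   # fixed seed, only for reproducibility
data = np.arange(10)             # hypothetical data set: 10 sample indices
n_resamples = 3

bootstrap_samples = []
not_drawn = []
for _ in range(n_resamples):
    # Draw len(data) items WITH replacement: some repeat, some are left out.
    sample = rng.choice(data, size=len(data), replace=True)
    bootstrap_samples.append(sample)
    not_drawn.append(np.setdiff1d(data, sample))

for i, (sample, rest) in enumerate(zip(bootstrap_samples, not_drawn), start=1):
    print(f"resample {i}: {np.sort(sample)}  not drawn: {rest}")
```

Each resample has the same size as the original data, and the items it happens to miss are the ones referred to as Out-Of-Bag later in the deck.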
  29. Bagging

    • Bootstrap and AGGregatING
    • Models are trained on the data sets produced by Bootstrap Sampling → their predictions are aggregated
  30. Bagging

    [Figure: Bagging, with the data resampled several times and the results combined]
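
Bagging can be written by hand to make the two steps visible: fit one model per bootstrap sample, then combine the predictions. The sketch below assumes decision trees as the base model, majority voting as the aggregation rule, and the iris data as a stand-in data set; these are illustrative choices, not the deck's own setup.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Assumption: iris stands in for the deck's data to keep this self-contained.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)
n_classes = len(np.unique(y))

rng = np.random.default_rng(0)
n_trees = 25
trees = []
for _ in range(n_trees):
    # Bootstrap step: row indices drawn with replacement.
    idx = rng.integers(0, len(X_train), size=len(X_train))
    trees.append(DecisionTreeClassifier(random_state=0).fit(X_train[idx], y_train[idx]))

# Aggregation step: majority vote over the trees for every test sample.
all_preds = np.stack([tree.predict(X_test) for tree in trees])   # (n_trees, n_test)
votes = np.apply_along_axis(
    lambda col: np.bincount(col, minlength=n_classes).argmax(), 0, all_preds
)
accuracy = float(np.mean(votes == y_test))
print("bagged accuracy:", accuracy)
```

scikit-learn also provides `sklearn.ensemble.BaggingClassifier`, which packages the same idea; the manual loop is shown only to expose the mechanics.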
  31. 1 00 . 21 0 1 1 00 . Random

    Forest • &!A9C.I/5K=6D-?$ /E@,$ ←RMUNB Bootstrap SampleBCH<06.5K@, 6=,J(/ 7>+J9E B")/*1@IF8, • Random Forest %A,J(L+H.7EEHK9:2TUPSA'8J3?>") B, @L>0JG-A69 ←Bootstrap SamplingAG;=5K9#QVO4?A,J(/ @J
  32. Random Forest

    [Figure: Random Forest, where each resample uses a subset of the features X1, X2, X3, X4, X5, for example X1, X3, X5 or X1, X2, X3, X4]
  33. 1 00 . 21 0 1 1 00 . Out-Of-Bag"

    • Random Forest '&(!#" %# Out-Of-Bag"   $*+),Out-Of- Bag- "%#  .http://alfredplpl.hatenablog.com/entry/2013/12/24/225420
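  34. RandomForestClassifier in sklearn

    In addition to the DecisionTreeClassifier parameters:
    • n_estimators ... number of trees (default 10 at the time of the talk; 100 since scikit-learn 0.22)
    • bootstrap ... whether to use Bootstrap Sampling (default True)
    • oob_score ... whether to score the model on Out-Of-Bag samples (default False)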
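
The three parameters can be tried together in a short run. This is a sketch under assumptions: the iris data and the value `n_estimators=100` are illustrative choices, not settings taken from the deck.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

# Assumption: iris is used only because it ships with scikit-learn.
X, y = load_iris(return_X_y=True)

forest = RandomForestClassifier(
    n_estimators=100,      # number of trees in the forest
    bootstrap=True,        # train each tree on a bootstrap sample
    oob_score=True,        # evaluate on the samples each tree did not see
    max_features="sqrt",   # random subset of features considered at each split
    random_state=0,
)
forest.fit(X, y)

print("number of trees:", len(forest.estimators_))
print("Out-Of-Bag accuracy:", forest.oob_score_)
```

Because every tree leaves out part of the data, `oob_score_` gives an accuracy estimate without a separate validation split; it requires `bootstrap=True`.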
  35. 1 00 . 21 0 1 1 00 . 

     • ,  ,  , 2012 • David S. Moore, et al, Bootstrap Method and Permutation Tests, “The Practice of Business Statistics: Using Data for Decisions”, ch.18, W. H. Freeman • L. Breiman, and A. Cutler, “Random Forests” • Bagging and Random Forest Ensemble Algorithms for Machine Learning – Machine Learning Mastery