Waku Michishita
March 16, 2019

# Reviewing Tree Algorithms Through Equations and Implementation

## Transcript

1. ### Reviewing Tree Algorithms Through Equations and Implementation

March 16, 2019 [remaining title-slide text not recoverable]
2. ### Outline

• 1. Decision trees: (i) how a tree is grown, (ii) impurity, (iii) how a tree is pruned
• 2. Random Forest: (i) Bootstrap Sampling, (ii) AdaBoost and Random Forest
• 3. Gradient Boosting
3. ### [Slide title not recoverable]

[Bulleted text lost in extraction; the only legible term is "Kaggle".]
4. ### [Slide title not recoverable]

[Text lost in extraction; the legible terms are "Titanic" and "GitHub".]
5. ### 1. Decision Trees
6. ### [Slide title not recoverable]

[Bulleted text lost in extraction.]
7. ### [Slide title not recoverable]

[Figure: example decision tree with the split conditions x1 > 5 / x1 <= 5, x2 <= 0.6 / x2 > 0.6, and x3 = 0 / x3 = 1.]
8. ### Terminology

• node
• root node
• terminal node (also called leaf node)
• branch (sub tree)

[The definition attached to each term is not recoverable.]
9. ### [No recoverable text]
10. ### [Slide title not recoverable]

[Two bulleted paragraphs and a closing "→" remark; the text is not recoverable.]
11. ### [No recoverable text]
12. ### Decision tree algorithms

• CART (C&RT, Classification And Regression Trees)
• ID3 (Iterative Dichotomiser 3)
• C4.5
• CHAID (Chi-squared Automatic Interaction Detection)

[A closing remark that mentions CART twice is not recoverable.]
13. ### [Slide title not recoverable]

[Text lost in extraction; the only legible term is "CART".]
14. ### [Slide title not recoverable]

[Text lost in extraction; the legible terms are "2", "C4.5" and "CHAID".]
15. ### Review of probability

• Joint probability: the probability that $X$ and $Y$ occur together, written $P(X, Y)$.
• Marginal probability: the probability of $X$ alone, obtained by summing the joint probability over every value of $Y$: $P(X) = \sum_{i=1}^{n} P(X, Y_i)$.
• Conditional probability: the probability of $X$ given that $Y$ has occurred: $P(X \mid Y) = \frac{P(X, Y)}{P(Y)}$.
• [Fourth bullet not recoverable.]

(Reference: to-kei.net)
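16. ### Class probability at a node

• Number of data points: $N$
• Node: $t$
• Classes: $C_k$, $k \in \{1, 2, \dots, K\}$
• Number of data points at node $t$: $N(t)$
• Number of data points in class $k$: $N_k$
• Number of data points at node $t$ that belong to class $k$: $N_k(t)$

→ The probability that a data point at node $t$ belongs to class $k$ is $P(C_k \mid t) = \frac{N_k(t)}{N(t)}$.

The class assigned to node $t$ is $\arg\max_k P(C_k \mid t)$.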
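17. ### Derivation of $P(C_k \mid t) = \frac{N_k(t)}{N(t)}$

• Probability of class $k$: $P(C_k) = \frac{N_k}{N}$
• Probability that a data point of class $k$ is at node $t$: $P(t \mid C_k) = \frac{N_k(t)}{N_k}$
• Joint probability: $P(C_k, t) = P(C_k)\,P(t \mid C_k) = \frac{N_k}{N} \cdot \frac{N_k(t)}{N_k} = \frac{N_k(t)}{N}$
• Marginal probability: $P(t) = \sum_{k=1}^{K} P(C_k, t) = \sum_{k=1}^{K} \frac{N_k(t)}{N} = \frac{N(t)}{N}$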
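18. ### Derivation of $P(C_k \mid t) = \frac{N_k(t)}{N(t)}$ (continued)

• Joint probability: $P(C_k, t) = \frac{N_k(t)}{N}$
• Marginal probability: $P(t) = \frac{N(t)}{N}$
• Conditional probability: $P(C_k \mid t) = \frac{P(C_k, t)}{P(t)} = \frac{N_k(t)/N}{N(t)/N} = \frac{N_k(t)}{N(t)}$

→ This is the probability that a data point at node $t$ belongs to class $k$.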
19. ### Impurity

In `sklearn.tree.DecisionTreeClassifier`, the `criterion` parameter selects the impurity measure: entropy (`"entropy"`) or the Gini index (`"gini"`).

• Entropy: $I(t) = -\sum_{k=1}^{K} P(C_k \mid t) \ln P(C_k \mid t)$
• Gini index: $I(t) = \sum_{k=1}^{K} P(C_k \mid t)\,\bigl(1 - P(C_k \mid t)\bigr)$
20. ### [Slide title not recoverable]

[A sequence of bulleted steps joined by "↓" arrows, ending in a "→" remark; the text is not recoverable.]
21. ### [Slide title not recoverable]

[Bulleted text lost in extraction.]
22. ### Parameters of sklearn's DecisionTreeClassifier

Parameters of `sklearn.tree.DecisionTreeClassifier` that limit how far a tree grows:

• `max_depth` (default=None): maximum depth of the tree
• `min_samples_split` (default=2): minimum number of samples required to split a node
• `min_samples_leaf` (default=1): minimum number of samples required at a leaf node
• `max_leaf_nodes` (default=None): maximum number of leaf nodes
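To see what these four parameters do, the sketch below fits one constrained and one unconstrained tree and compares their sizes. The iris data set and the specific parameter values are assumptions made for illustration, not taken from the slides.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Tree whose growth is limited by the four parameters listed above.
pruned = DecisionTreeClassifier(
    max_depth=3,           # at most three levels of splits
    min_samples_split=10,  # a node needs at least 10 samples to be split
    min_samples_leaf=5,    # every leaf keeps at least 5 samples
    max_leaf_nodes=6,      # at most six leaves
    random_state=0,
).fit(X, y)

# Tree grown with the defaults, i.e. without any of these limits.
unpruned = DecisionTreeClassifier(random_state=0).fit(X, y)

print("pruned:  ", pruned.get_depth(), "levels,", pruned.get_n_leaves(), "leaves")
print("unpruned:", unpruned.get_depth(), "levels,", unpruned.get_n_leaves(), "leaves")
```

The constrained tree can never be deeper than `max_depth` or have more leaves than `max_leaf_nodes`, whichever limit is reached first.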
23. ### [Slide title not recoverable]

[Bulleted text largely lost in extraction; the legible terms are "Graphviz" (twice) and "dtreeplt".]
24. ### References

• T. Hastie, R. Tibshirani and J. Friedman. "The Elements of Statistical Learning", Springer, 2009.
• [Author and title not recoverable], 2012
• A Complete Tutorial on Tree Based Modeling from Scratch (in R & Python) – Analytics Vidhya
• [Title not recoverable] – Python [rest not recoverable]
• [Title not recoverable] – Code Craft House
• [Python] Graphviz [...] dtreeplt [...] – Qiita
25. ### 2. Random Forest
26. ### Random Forest [rest of title not recoverable]

[Bulleted text lost in extraction.]
27. ### Bootstrap Sampling

• Bagging and Random Forest train each of their models on a Bootstrap Sample.
• Bootstrap Sampling draws each sample from the data by resampling with replacement.

[The remaining text of both bullets is not recoverable.]
28. ### Bootstrap Sampling

[Figure: the original data set is resampled repeatedly ("resampling" arrows), giving Bootstrap Samples 1, 2, …, n.]
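Resampling with replacement, as in the figure, can be sketched with the standard library alone. The helper name `bootstrap_sample`, the toy data and the fixed seed are hypothetical choices for illustration.

```python
import random


def bootstrap_sample(data, rng):
    """Draw len(data) items from data uniformly at random, with replacement."""
    n = len(data)
    return [data[rng.randrange(n)] for _ in range(n)]


rng = random.Random(0)   # fixed seed so the run is repeatable
data = list(range(10))   # hypothetical original data set

# Three Bootstrap Samples, each the same size as the original data.
samples = [bootstrap_sample(data, rng) for _ in range(3)]
for i, sample in enumerate(samples, start=1):
    print(i, sorted(sample))
```

Because the draws are made with replacement, a sample usually contains some items more than once and omits others entirely.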
29. ### Bagging

• Bagging = Bootstrap AGGregatING
• A model is trained on each data set produced by Bootstrap Sampling, and the models' predictions are aggregated.

[The closing "→" remark is not recoverable.]
30. ### Bagging

[Figure: Bagging. The data set is resampled several times ("resampling" arrows) and the results are combined ("→"); the remaining labels are not recoverable.]
31. ### Random Forest

[Two bulleted paragraphs, each followed by a "←" remark; the text is not recoverable. The legible terms are "Bootstrap Sample", "Random Forest" and "Bootstrap Sampling".]
32. ### Random Forest

[Figure: Random Forest. The data set is resampled several times ("resampling" arrows), with feature labels drawn from X1, …, X5; the exact grouping of the feature labels is not recoverable.]
33. ### Out-Of-Bag

• In Random Forest, the data points left out of a tree's Bootstrap Sample (the Out-Of-Bag data) can be used to evaluate the model.

[The remaining text is not recoverable.]

(Reference: http://alfredplpl.hatenablog.com/entry/2013/12/24/225420)
34. ### RandomForestClassifier in sklearn

Parameters in addition to those of DecisionTreeClassifier:

• `n_estimators`: number of trees (default 10 at the time of the talk; 100 since scikit-learn 0.22)
• `bootstrap`: whether Bootstrap Sampling is used (default True)
• `oob_score`: whether Out-Of-Bag data are used to estimate accuracy (default False)
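The three parameters above can be tried together in a short run. This is a sketch under stated assumptions: the iris data set, `n_estimators=100` and the fixed `random_state` are illustrative choices, not values from the slides.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

X, y = load_iris(return_X_y=True)

forest = RandomForestClassifier(
    n_estimators=100,  # number of trees in the forest
    bootstrap=True,    # each tree is fit on a Bootstrap Sample
    oob_score=True,    # evaluate each data point with the trees that did not see it
    random_state=0,
).fit(X, y)

# Accuracy estimated from Out-Of-Bag data, without a separate validation set.
print("OOB accuracy:", forest.oob_score_)
```

`oob_score=True` only works when `bootstrap=True`, because Out-Of-Bag data exist only if some data points are left out of each tree's sample.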
35. ### References

• [Author and title not recoverable], 2012
• David S. Moore, et al., "Bootstrap Methods and Permutation Tests", in "The Practice of Business Statistics: Using Data for Decisions", ch. 18, W. H. Freeman
• L. Breiman and A. Cutler, "Random Forests"
• Bagging and Random Forest Ensemble Algorithms for Machine Learning – Machine Learning Mastery