Slide 1

Slide 1 text

0 2 . .. 2 2 1  )152"1&. - 604'.,3%+ - 20193 16 $/-3  ($!4#2*4  

Slide 2

Slide 2 text

1. Decision tree
   i. How a tree is grown
   ii. Impurity
   iii. How a tree is pruned
2. Random Forest
   i. Bootstrap Sampling
   ii. AdaBoost and Random Forest
3. Gradient Boosting

Slide 3

Slide 3 text


Slide 4

Slide 4 text

[Slide text garbled in extraction; recoverable terms: Titanic, GitHub]

Slide 5

Slide 5 text

1. Decision tree

Slide 6

Slide 6 text


Slide 7

Slide 7 text

[Figure: example decision tree whose branches are labelled x1 > 5 / x1 <= 5, x2 <= 0.6 / x2 > 0.6, and x3 = 0 / x3 = 1]

Slide 8

Slide 8 text

• node
• root node
• terminal node (also: leaf node)
• branch, sub tree
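The terms on this slide map naturally onto a small data structure. The following is a minimal sketch, not the deck's own code: the `Node` class, the `predict` helper, the leaf labels "A", "B", "C", and the shape of the example tree are all hypothetical and only borrow the thresholds 5 and 0.6 from the example figure.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Node:
    """One node of a binary decision tree (hypothetical illustration)."""
    feature: Optional[int] = None      # index of the feature tested here (None for a leaf)
    threshold: Optional[float] = None  # go left if x[feature] <= threshold
    left: Optional["Node"] = None      # sub tree for x[feature] <= threshold
    right: Optional["Node"] = None     # sub tree for x[feature] > threshold
    label: Optional[str] = None        # prediction stored in a terminal (leaf) node

    def is_leaf(self) -> bool:
        return self.left is None and self.right is None


def predict(node: Node, x: Sequence[float]) -> str:
    """Follow branches from the root node until a leaf node is reached."""
    while not node.is_leaf():
        node = node.left if x[node.feature] <= node.threshold else node.right
    return node.label


# Root node tests x1 against 5; its left child tests x2 against 0.6.
root = Node(
    feature=0, threshold=5.0,
    left=Node(feature=1, threshold=0.6,
              left=Node(label="A"), right=Node(label="B")),
    right=Node(label="C"),
)

print(predict(root, [3.0, 0.4]))  # path: x1 <= 5, then x2 <= 0.6
print(predict(root, [7.0, 0.9]))  # path: x1 > 5
```

Here `root` is the root node, every `Node` with a `label` is a terminal (leaf) node, and `root.left` is a sub tree reached through one branch.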

Slide 9

Slide 9 text


Slide 10

Slide 10 text


Slide 11

Slide 11 text


Slide 12

Slide 12 text

• CART (C&RT, Classification And Regression Tree)
• ID3 (Iterative Dichotomiser 3)
• C4.5
• CHAID (Chi-squared Automatic Interaction Detection)

Slide 13

Slide 13 text

CART

Slide 14

Slide 14 text

2 C4.5 CHAID

Slide 15

Slide 15 text

Probability basics (source: to-kei.net)
• Joint probability: the probability that $X$ and $Y$ both occur, written $P(X, Y)$.
• Marginal probability: summing the joint probability over all values of the other variable gives $P(X) = \sum_{i=1}^{n} P(X, Y_i)$.
• Conditional probability: the probability of $X$ given that $Y$ has occurred, $P(X \mid Y) = \frac{P(X, Y)}{P(Y)}$.
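A small numeric check of the marginal and conditional probability formulas may help. This is a sketch on a made-up joint distribution; the variable values ("sunny", "hot", and so on) and the helper names are hypothetical and not from the slides.

```python
# Hypothetical joint distribution P(X, Y); the four entries sum to 1.
joint = {
    ("sunny", "hot"): 0.4,
    ("sunny", "cold"): 0.2,
    ("rainy", "hot"): 0.1,
    ("rainy", "cold"): 0.3,
}


def marginal_x(x):
    """P(X = x) = sum over all y of P(X = x, Y = y)."""
    return sum(p for (xi, _), p in joint.items() if xi == x)


def marginal_y(y):
    """P(Y = y) = sum over all x of P(X = x, Y = y)."""
    return sum(p for (_, yi), p in joint.items() if yi == y)


def conditional_x_given_y(x, y):
    """P(X = x | Y = y) = P(X = x, Y = y) / P(Y = y)."""
    return joint[(x, y)] / marginal_y(y)


print(marginal_x("sunny"))                    # about 0.6
print(conditional_x_given_y("sunny", "hot"))  # about 0.8
```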

Slide 16

Slide 16 text

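Notation
• $N$: total number of samples
• $t$: a node
• $C_i$, $i = 1, 2, \ldots, K$: the classes
• $N(t)$: number of samples belonging to node $t$
• $N_i$: number of samples belonging to class $C_i$
• $N_i(t)$: number of samples in node $t$ that belong to class $C_i$
→ The probability of class $C_i$ at node $t$ is $P(C_i \mid t) = \frac{N_i(t)}{N(t)}$, and the class assigned to node $t$ is $\arg\max_i P(C_i \mid t)$.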

Slide 17

Slide 17 text

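Derivation of $P(C_i \mid t) = \frac{N_i(t)}{N(t)}$
• Probability of class $C_i$: $P(C_i) = N_i / N$
• Probability of node $t$ given class $C_i$: $P(t \mid C_i) = N_i(t) / N_i$
• Joint probability: $P(C_i, t) = P(C_i)\,P(t \mid C_i) = \frac{N_i}{N} \cdot \frac{N_i(t)}{N_i} = \frac{N_i(t)}{N}$
• Marginal probability of node $t$: $P(t) = \sum_{i=1}^{K} P(C_i, t) = \sum_{i=1}^{K} \frac{N_i(t)}{N} = \frac{N(t)}{N}$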

Slide 18

Slide 18 text

Derivation of $P(C_i \mid t) = \frac{N_i(t)}{N(t)}$ (continued)
• From the previous slide: $P(C_i, t) = N_i(t) / N$ and $P(t) = N(t) / N$.
• Therefore $P(C_i \mid t) = \frac{P(C_i, t)}{P(t)} = \frac{N_i(t) / N}{N(t) / N} = \frac{N_i(t)}{N(t)}$.
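The result of the derivation says that the class probability at a node is just a ratio of counts, N_i(t) / N(t). A minimal sketch of that computation follows; the function names and the example labels are hypothetical.

```python
from collections import Counter


def class_posteriors(labels_at_node):
    """Return P(C_i | t) = N_i(t) / N(t) for every class present at node t."""
    counts = Counter(labels_at_node)  # N_i(t) for each class C_i
    n_t = len(labels_at_node)         # N(t)
    return {cls: n / n_t for cls, n in counts.items()}


def node_class(labels_at_node):
    """Class assigned to the node: argmax_i P(C_i | t)."""
    posteriors = class_posteriors(labels_at_node)
    return max(posteriors, key=posteriors.get)


# Hypothetical labels of the five training samples that fall into one node.
labels = ["survived", "died", "died", "survived", "died"]
print(class_posteriors(labels))
print(node_class(labels))
```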

Slide 19

Slide 19 text

Impurity
sklearn.tree.DecisionTreeClassifier offers two impurity measures, "entropy" and "gini". For a node $t$ with impurity $I(t)$:
• Entropy: $I(t) = -\sum_{i=1}^{K} P(C_i \mid t) \ln P(C_i \mid t)$
• Gini: $I(t) = \sum_{i=1}^{K} P(C_i \mid t)\,(1 - P(C_i \mid t))$
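Both impurity formulas can be evaluated directly from the class labels at a node. The sketch below uses only the standard library; the function names are hypothetical, and the class probabilities are the count ratios N_i(t) / N(t).

```python
import math
from collections import Counter


def _class_probabilities(labels):
    """P(C_i | t) for each class present among the labels at node t."""
    n = len(labels)
    return [count / n for count in Counter(labels).values()]


def entropy(labels):
    """Entropy impurity: -sum_i P(C_i|t) * ln P(C_i|t)."""
    return -sum(p * math.log(p) for p in _class_probabilities(labels))


def gini(labels):
    """Gini impurity: sum_i P(C_i|t) * (1 - P(C_i|t))."""
    return sum(p * (1.0 - p) for p in _class_probabilities(labels))


pure = [1, 1, 1, 1]   # a single class: both impurities are zero
mixed = [0, 0, 1, 1]  # two equally likely classes: impurity is maximal for K = 2
print(entropy(mixed), gini(mixed))
```

Both measures are zero for a pure node and largest when the classes are evenly mixed, which is why a split is chosen to reduce them.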

Slide 20

Slide 20 text


Slide 21

Slide 21 text


Slide 22

Slide 22 text

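Hyperparameters
Main hyperparameters of sklearn.tree.DecisionTreeClassifier that limit how far the tree grows:
• max_depth (default=None) ... maximum depth of the tree
• min_samples_split (default=2) ... minimum number of samples a node must contain to be split
• min_samples_leaf (default=1) ... minimum number of samples required at a leaf node
• max_leaf_nodes (default=None) ... maximum number of leaf nodes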
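As an illustration of how these hyperparameters are passed, here is a minimal sketch. It assumes scikit-learn is installed and uses the bundled iris data purely as a stand-in data set; the specific values (max_depth=3, min_samples_leaf=5) are arbitrary choices for the example, not recommendations from the slides.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

# Stand-in data set (assumption: any small classification data set would do).
X, y = load_iris(return_X_y=True)

clf = DecisionTreeClassifier(
    criterion="gini",      # impurity measure: "gini" or "entropy"
    max_depth=3,           # stop growing below depth 3
    min_samples_split=2,   # a node needs at least 2 samples to be split
    min_samples_leaf=5,    # every leaf must keep at least 5 samples
    max_leaf_nodes=None,   # no explicit cap on the number of leaves
    random_state=0,        # make tie-breaking reproducible
)
clf.fit(X, y)

print("depth:", clf.get_depth())
print("leaves:", clf.get_n_leaves())
print("training accuracy:", clf.score(X, y))
```

Tightening any of these limits yields a smaller tree, which is the usual way to trade training accuracy for less overfitting.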

Slide 23

Slide 23 text

[Slide text garbled in extraction; recoverable terms: Graphviz, dtreeplt]

Slide 24

Slide 24 text

References
• T. Hastie, R. Tibshirani and J. Friedman, "The Elements of Statistical Learning", Springer, 2009.
• [Japanese-language book; author and title garbled], 2012
• "A Complete Tutorial on Tree Based Modeling from Scratch (in R & Python)" - Analytics Vidhya
• [title garbled] - Python [site name garbled]
• [title garbled] - Code Craft House
• "[Python] Graphviz [...] dtreeplt [...]" - Qiita

Slide 25

Slide 25 text

2. Random Forest

Slide 26

Slide 26 text

Random Forest
[Slide text garbled in extraction]

Slide 27

Slide 27 text

Bootstrap Sampling
• Bagging and Random Forest train their models on Bootstrap Samples of the training data.
• Bootstrap Sampling is resampling with replacement: samples are drawn from the original data at random, and the same sample may be drawn more than once.
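Resampling with replacement is short enough to write out. This is a minimal sketch using only the standard library; the function name `bootstrap_sample` and the toy data are hypothetical.

```python
import random


def bootstrap_sample(data, rng):
    """Draw len(data) items from data uniformly at random, with replacement."""
    n = len(data)
    return [data[rng.randrange(n)] for _ in range(n)]


rng = random.Random(0)   # fixed seed so the run is reproducible
data = list(range(10))   # toy data set: 0, 1, ..., 9

# Three Bootstrap Samples, each the same size as the original data.
samples = [bootstrap_sample(data, rng) for _ in range(3)]
for sample in samples:
    print(sorted(sample))
```

Because the draws are with replacement, a Bootstrap Sample typically contains some items several times and omits others entirely.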

Slide 28

Slide 28 text

[Figure: Bootstrap Sampling. The original data set is resampled repeatedly, producing Bootstrap Samples 1, 2, ..., n.]

Slide 29

Slide 29 text

Bagging
• Bootstrap AGGregatING
• Draw several data sets by Bootstrap Sampling and train one model on each → combine the predictions of all models, for example by majority vote.
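To make Bagging concrete, the sketch below trains one very simple model per Bootstrap Sample and combines them by majority vote. It is an illustration under stated assumptions, not the deck's code: the base model is a hypothetical one-dimensional decision stump, and the toy data (class 0 for x in 0..9, class 1 for x in 10..19) is made up.

```python
import random
from collections import Counter


def fit_stump(points):
    """Fit a 1-D stump that predicts 1 if x > threshold, else 0.

    Tries every observed x as threshold and keeps the one with fewest errors.
    """
    best_errors, best_threshold = None, None
    for threshold in sorted({x for x, _ in points}):
        errors = sum((1 if x > threshold else 0) != y for x, y in points)
        if best_errors is None or errors < best_errors:
            best_errors, best_threshold = errors, threshold
    return best_threshold


def fit_bagging(points, n_models, rng):
    """Train one stump per Bootstrap Sample of the training points."""
    n = len(points)
    return [
        fit_stump([points[rng.randrange(n)] for _ in range(n)])
        for _ in range(n_models)
    ]


def predict_bagging(thresholds, x):
    """Aggregate the stumps' predictions by majority vote."""
    votes = Counter(1 if x > threshold else 0 for threshold in thresholds)
    return votes.most_common(1)[0][0]


points = [(x, 0) for x in range(0, 10)] + [(x, 1) for x in range(10, 20)]
models = fit_bagging(points, n_models=25, rng=random.Random(0))

print(predict_bagging(models, 2))
print(predict_bagging(models, 17))
```

Using an odd number of models avoids ties in a two-class majority vote.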

Slide 30

Slide 30 text

[Figure: Bagging. The data set is resampled several times, one model is trained on each resampled set, and the models' outputs are combined into a single prediction.]

Slide 31

Slide 31 text

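Random Forest
• Bagging decision trees alone tends to produce trees that resemble one another ← the Bootstrap Samples are drawn from the same data, so the trees are correlated.
• Random Forest additionally restricts each tree to a randomly chosen subset of the features, which lowers the correlation between trees ← this is done on top of Bootstrap Sampling.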

Slide 32

Slide 32 text

[Figure: Random Forest. The data set is resampled several times, and each tree is trained on one resampled set using a subset of the features X1, ..., X5 (for example X1, X3, X5, or X1, X2, X3, X4).]
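The two sources of randomness in the figure, resampled rows and a subset of features, can be sketched as follows. The function name and sizes are hypothetical. Note one simplification: this sketch draws one feature subset per tree, as the figure suggests, whereas Breiman's Random Forest and scikit-learn draw a fresh feature subset at every split.

```python
import random


def draw_tree_inputs(n_samples, n_features, n_sub_features, rng):
    """Pick the rows and feature columns one tree is trained on."""
    # Rows: a Bootstrap Sample, drawn with replacement.
    rows = [rng.randrange(n_samples) for _ in range(n_samples)]
    # Columns: a random feature subset, drawn without replacement.
    cols = sorted(rng.sample(range(n_features), n_sub_features))
    return rows, cols


rng = random.Random(0)
n_trees = 3
inputs = [draw_tree_inputs(n_samples=8, n_features=5, n_sub_features=3, rng=rng)
          for _ in range(n_trees)]

for rows, cols in inputs:
    feature_names = ", ".join(f"X{c + 1}" for c in cols)
    print(f"rows={rows}  features={feature_names}")
```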

Slide 33

Slide 33 text

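Out-Of-Bag (OOB)
• Each tree of a Random Forest is trained on a Bootstrap Sample, so some training samples are never drawn for that tree. These are the tree's Out-Of-Bag samples, and they can be used to evaluate the model. (source: http://alfredplpl.hatenablog.com/entry/2013/12/24/225420)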
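How many samples end up Out-Of-Bag can be checked with a quick simulation. The sketch below is hypothetical helper code using only the standard library. For n samples, the chance that a given sample is never drawn in n draws with replacement is (1 - 1/n)^n, which approaches exp(-1), roughly 0.368, as n grows.

```python
import random


def oob_indices(n_samples, rng):
    """Indices that a Bootstrap Sample of size n_samples did not draw."""
    drawn = {rng.randrange(n_samples) for _ in range(n_samples)}
    return [i for i in range(n_samples) if i not in drawn]


rng = random.Random(0)
n = 10_000
oob = oob_indices(n, rng)
fraction = len(oob) / n

# Expected to be close to exp(-1), i.e. roughly 37% of the data.
print(f"Out-Of-Bag fraction: {fraction:.3f}")
```

So each tree leaves out roughly a third of the training data, which is what makes Out-Of-Bag evaluation possible without a separate validation set.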

Slide 34

Slide 34 text

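RandomForestClassifier in sklearn
In addition to the parameters of DecisionTreeClassifier:
• n_estimators ... number of trees (default 10 when these slides were written; 100 since scikit-learn 0.22)
• bootstrap ... whether to use Bootstrap Sampling (default True)
• oob_score ... whether to score the model on Out-Of-Bag samples (default False)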
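A short usage sketch of these three parameters follows. It assumes scikit-learn is installed and again uses the bundled iris data as a stand-in; the values n_estimators=100 and max_depth=3 are arbitrary choices for the example.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

# Stand-in data set (assumption: any small classification data set would do).
X, y = load_iris(return_X_y=True)

forest = RandomForestClassifier(
    n_estimators=100,  # number of trees in the forest
    bootstrap=True,    # train each tree on a Bootstrap Sample
    oob_score=True,    # also estimate accuracy on Out-Of-Bag samples
    max_depth=3,       # tree parameter shared with DecisionTreeClassifier
    random_state=0,    # reproducible sampling
)
forest.fit(X, y)

print("number of trees:", len(forest.estimators_))
print("Out-Of-Bag accuracy:", forest.oob_score_)
```

`oob_score=True` requires `bootstrap=True`, since without Bootstrap Sampling every tree sees all samples and nothing is Out-Of-Bag.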

Slide 35

Slide 35 text

References
• [Japanese-language book; author and title garbled], 2012
• David S. Moore et al., "Bootstrap Methods and Permutation Tests", ch. 18 of "The Practice of Business Statistics: Using Data for Decisions", W. H. Freeman
• L. Breiman and A. Cutler, "Random Forests"
• "Bagging and Random Forest Ensemble Algorithms for Machine Learning" - Machine Learning Mastery