Slide 1

Slide 1 text

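Chapter 4 Implementation

Chapter 4 covers the implementation of tree-based algorithms. The detailed implementation is easier to follow in the Jupyter files, so it is not reproduced here. Instead, this chapter confirms the important points from Chapters 2 and 3 in Python. Section 4-1 summarizes the writing policy for Chapter 4, Section 4-2 covers the basic processing needed to build a decision tree, and Section 4-3 covers ensemble learning.

4.1 Writing policy for Chapter 4

Section 4-1 briefly explains the policy followed in writing Chapter 4. Several books, such as 「ゼロから作る DeepLearning」, confirm theory through implementation, but the way the implementation is presented tends to differ considerably from author to author.

The largest difference is probably how heavily programming techniques such as object orientation are used. The author's view is that code in the main text should not be written in an object-oriented style.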

Slide 2

Slide 2 text

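The reason for this view is that placing object-oriented implementations in the main text makes checking them laborious. In a code base you only need to search for the related code, but in a book, relying too much on the reader flipping back and forth places a heavy burden on the reader.

For tree-based algorithms in particular, there are many case distinctions that depend on the problem definition and the method. If the explanation assumes everything is organized in an object-oriented way, system-design concerns come to the fore, and there is a risk that the reader can no longer concentrate on understanding the algorithm.

Related implementations are found faster by searching the actual code. This book therefore presents each important item so that it can be read on its own, and asks the reader to consult the accompanying Jupyter files for the full implementation. In practice, important processing is usually turned into modules. However, readers differ in programming experience, and standalone snippets connect more easily to the theory, so here each process is examined in isolation.

The basic processing is thus extracted as-is in this text. Turning it into modules requires few changes at the algorithm level, so the material should also be useful as background when using actual libraries. Note that the implementation here aims at understanding the theory and gives little consideration to efficiency. It is a plain implementation based on plain ideas. Readers who want a more efficient implementation should look at scikit-learn and other libraries.

4.2 Basic processing for decision trees

Section 4-2 examines the basic processing needed to implement a decision tree. Section 4-2-1 checks the datasets used for the classification and regression problem settings. Section 4-2-2 covers evaluation criteria such as entropy, the Gini index, and the sum of squared errors. Section 4-2-3 covers sorting of features.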

Slide 3

Slide 3 text

Section 4-2-4 covers node splitting, Section 4-2-5 covers how internal state is kept while the tree is built, and Section 4-2-6 covers prediction with the trained tree.

4.2.1 Problem settings for classification and regression

Section 4-2-1 checks the sample datasets used to understand the classification and regression problem settings. Any convenient dataset will do, so both use datasets that ship with scikit-learn. Among the bundled datasets, iris is used for classification and Boston Housing for regression, since both are well known and easy to handle. The following briefly checks how each dataset is loaded.

First, the iris dataset:

    import numpy as np
    from sklearn import datasets

    iris = datasets.load_iris()
    X = iris.data    # feature matrix
    y = iris.target  # class labels

    print(X.shape)
    print(y.shape)

Slide 4

Slide 4 text

Figure 4.1: iris dataset (1)

The code above loads the dataset and checks the size of the matrices. NumPy is imported together with scikit-learn because it is needed for the matrix operations that follow.

    import numpy as np
    from sklearn import datasets

    iris = datasets.load_iris()
    X = iris.data
    y = iris.target

    print(X[:5, :])  # first five rows of the features
    print(y[:5])     # first five labels
    print(X.dtype)
    print(y.dtype)

Slide 5

Slide 5 text

Figure 4.2: iris dataset (2)

The code above prints the first five rows of (X, y) and checks the type of the values held in each array.

Next, Boston Housing:

    # Note: load_boston was removed in scikit-learn 1.2, so this requires an older version.
    boston = datasets.load_boston()
    X = boston.data
    y = boston.target

    print(X.shape)
    print(y.shape)
    print(X[:2,:])
    print(y[:5])
    print(X.dtype)
    print(y.dtype)

Slide 6

Slide 6 text

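Figure 4.3: Boston Housing dataset

Figure 4.3 checks the size of the matrices and the leading rows. When working with sample data, checking the matrix size and the leading rows in this way is recommended, because it speeds up understanding of the whole dataset.

4.2.2 Evaluation criteria (criteria)

Section 4-2-2 covers the implementation of each evaluation criterion: entropy, the Gini index, and the sum of squared errors. Entropy and the Gini index assume a classification problem, so iris is used. The sum of squared errors assumes a regression problem, so Boston Housing is used.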

Slide 7

Slide 7 text

Entropy

To understand entropy, it helps to look at the graph of $f(x) = -x\log(x) - (1-x)\log(1-x)$ on $0 < x < 1$.

    import numpy as np
    import matplotlib.pyplot as plt
    %matplotlib inline
    # (Jupyter magic above; remove it when running as a plain script)

    x = np.arange(0.01, 1., 0.01)
    y1 = -x*np.log(x) - (1-x)*np.log(1-x)          # base e
    y2 = -x*np.log2(x) - (1-x)*np.log2(1-x)        # base 2
    y3 = (-x*np.log(x) - (1-x)*np.log(1-x))*1.2    # base e, scaled by 1.2

    plt.plot(x, y1, color="green")
    plt.plot(x, y2, color="blue")
    plt.plot(x, y3, color="red")
    plt.show()

Slide 8

Slide 8 text

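Figure 4.4: Entropy

The plot shows that the function is concave (it bulges upward). This means that in two-class classification the entropy is largest when samples are assigned to each class with probability 1/2. (With three or more classes, the entropy is likewise largest when the samples are spread evenly over the classes.) Some readers may wonder about the base of the logarithm used in the entropy calculation. By the change-of-base formula, changing the base only multiplies the logarithm by a constant, so whether base 2 or base e is used does not change the result of training a decision tree that computes gains.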

Slide 9

Slide 9 text

In the figure, blue is base 2 and green is base e. The red curve is 1.2 times the green one, and the point is that adjusting this factor makes it coincide with the blue curve (the exact factor is $1/\ln 2 \approx 1.443$).

To make this more concrete, the entropy of iris is computed from $-\sum_{k=1}^{3} p_k \log(p_k)$.

    _, y_counts = np.unique(y, return_counts=True)
    res_criterion = 0
    for i in range(0, y_counts.shape[0]):
        ratio = y_counts[i]/y.shape[0]          # class proportion p_k
        res_criterion -= ratio*np.log(ratio)

    print(y_counts)
    print(res_criterion)

Figure 4.5: Entropy of iris

Figure 4.5 shows the computed entropy of iris. One point needs care here.

Slide 10

Slide 10 text

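Care is needed with the logarithm used in entropy: when the argument ratio is 0, it falls outside the domain of the logarithm. The limit of $x\log(x)$ as $x \to +0$ is 0, so the term can be treated as 0, but in a program this requires error handling. If you would rather not deal with this, the Gini index covered next is more robust. (Libraries already provide this error handling, so when simply using scikit-learn or similar there is little need to worry about it.)

Gini index

To understand the Gini index, it helps to look at the graph of $f(x) = 1 - x^2 - (1-x)^2 = 2x(1-x) = -2\left(x - \frac{1}{2}\right)^2 + \frac{1}{2}$ on $0 < x < 1$.

    import numpy as np
    import matplotlib.pyplot as plt
    %matplotlib inline
    # (Jupyter magic above; remove it when running as a plain script)

    x = np.arange(0.01, 1., 0.01)
    y = 1-x**2-(1-x)**2

    plt.plot(x, y, color="green")
    plt.show()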
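Returning to the zero-probability caveat for entropy: one simple way to handle it is to skip empty classes, which agrees with the limit $x\log(x) \to 0$. The following is a minimal sketch rather than the book's implementation. The function name `safe_entropy` and its `base` parameter are assumptions introduced only for illustration.

```python
import numpy as np

def safe_entropy(counts, base=np.e):
    """Entropy of a class-count vector, treating 0 * log(0) as 0."""
    counts = np.asarray(counts, dtype=float)
    ratios = counts / counts.sum()
    nonzero = ratios[ratios > 0]   # drop empty classes to stay inside log's domain
    h = -np.sum(nonzero * np.log(nonzero)) / np.log(base)  # change of base
    return float(h) + 0.0          # adding 0.0 turns -0.0 into 0.0

print(safe_entropy([50, 50, 50]))        # three balanced classes, natural log
print(safe_entropy([50, 50], base=2))    # two balanced classes, in bits
print(safe_entropy([150, 0, 0]))         # a pure node
```

Dividing by `np.log(base)` is the change-of-base formula mentioned earlier, so the same function covers both base e and base 2.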

Slide 11

Slide 11 text

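Figure 4.6: Gini index

As Figure 4.6 shows, the Gini index is a concave function, like entropy. Compared with Figure 4.4 the shape is basically similar, but in Figure 4.4 the values between the minimum and the maximum, for example at x = 0.3 or x = 0.7, are relatively larger. Some theoretical derivation matters for functions like these, but digging too deep is laborious, so simply drawing the graph to get an intuition is also important.

As with entropy, the Gini index is computed for iris next.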
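The comparison between Figure 4.4 and Figure 4.6 can also be checked numerically. The sketch below scales both criteria by their maximum so that they share the same peak. This scaling is an assumption made here for comparison and is not part of the original text.

```python
import numpy as np

x = np.array([0.1, 0.3, 0.5])
entropy = -x * np.log2(x) - (1 - x) * np.log2(1 - x)  # peaks at x = 0.5
gini = 2 * x * (1 - x)                                # peaks at x = 0.5

entropy_scaled = entropy / entropy.max()
gini_scaled = gini / gini.max()
for xi, e, g in zip(x, entropy_scaled, gini_scaled):
    print(f"x={xi:.1f}  entropy/max={e:.3f}  gini/max={g:.3f}")
```

Away from the peak, the scaled entropy stays above the scaled Gini index, which matches the visual impression from the two figures.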

Slide 12

Slide 12 text

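    _, y_counts = np.unique(y, return_counts=True)
    res_criterion = 1
    for i in range(0, y_counts.shape[0]):
        ratio = y_counts[i]/y.shape[0]   # class proportion p_k
        res_criterion -= ratio**2        # subtract p_k squared

    print(res_criterion)

Figure 4.7: Gini index of iris

The value in Figure 4.7 can be understood as 1 minus 1/9, the square of 1/3, subtracted three times. The result appears as 0.66...65 because of floating-point error. At the stage of understanding the principle this can mostly be ignored, but a careful implementation needs to pay some attention to it.

Sum of squared errors

The sum of squared errors can be understood in the same way as least squares in simple regression analysis. The following computes it for Boston Housing.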

Slide 13

Slide 13 text

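The squared error for Boston Housing is computed as follows.

    # mean of the squared deviations from the mean (the sum divided by the sample count)
    res_criterion = np.mean((y-np.mean(y))**2)

    print(res_criterion)

Figure 4.8: Sum of squared errors

As Figure 4.8 shows, the squared-error criterion can be implemented simply with NumPy's mean function.

4.2.3 Sorting features

Section 4-2-3 covers sorting of features. At first glance an ordinary sort seems sufficient. However, each feature must be sorted with the corresponding target values retrieved alongside it, and the samples must be managed as the tree is split, so the matter becomes fairly complicated.

One way to achieve this is to use a function such as NumPy's argsort to obtain the indices that order a feature, and then use those indices to build subsets of the samples. This method is examined here. Since the explanation alone is hard to follow, it is checked briefly with iris below.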

Slide 14

Slide 14 text

    print(X[:10,0])
    print("-------")
    print(np.argsort(X[:10,0]))

Figure 4.9: argsort (NumPy)

For readability, the code above takes the first 10 samples of the first column of iris and returns their indices in ascending order of value. The output reads as follows: the 9th sample (index 8) is 4.4 and the smallest, the 4th sample (index 3) is 4.6 and the next smallest, and so on, up to the 6th sample (index 5), which is 5.4 and the largest. With this, operations such as taking the five samples with the smallest first feature become possible.

    x1 = X[:10,0]
    asc_idx = np.argsort(x1)        # indices in ascending order of x1

    print(asc_idx[:5])
    print(X[asc_idx[:5], :])
    print(y[asc_idx[:5]])

Slide 15

Slide 15 text

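Figure 4.10: argsort (NumPy)

In this way the values of X and y can be obtained in ascending order of the first feature.

4.2.4 Computing the gain and splitting nodes

Section 4-2-4 examines node splitting. The gain is computed on the sorted feature and the leaf node is split. Unlike ID3, which uses only categorical features, C4.5 and CART take continuous variables into account, so in principle the gain has to be computed exhaustively over the sorted result from Section 4-2-3.

The implementation is examined below using iris classification with the Gini index as an example.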

Slide 16

Slide 16 text

    def calc_criteria(y):
        # Gini index: sum over classes of p_k * (1 - p_k)
        _, y_counts = np.unique(y, return_counts=True)
        res_criterion = 0
        for i in range(0, y_counts.shape[0]):
            ratio = y_counts[i]/y.shape[0]
            res_criterion += ratio*(1-ratio)
        return res_criterion

    print(calc_criteria(y))

Figure 4.11: Computing the Gini index

First, the Gini index is computed on the original iris data. The result in Figure 4.11 is 0.6666..., which can also be derived as follows.

$1 - \left(\frac{50}{150}\right)^2 - \left(\frac{50}{150}\right)^2 - \left(\frac{50}{150}\right)^2 = 1 - \frac{1}{3} = \frac{2}{3}$

As mentioned in Section 4-1, the explanations avoid object orientation and functions wherever possible, but here a calc_criteria function is implemented deliberately. The reason lies in how the gain is computed.

Slide 17

Slide 17 text

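Computing the gain applies the same processing before and after the split and takes the difference. Moreover, this is repeated for as many values as the feature takes, so it is all the more necessary to wrap the processing in a function that is easy to reuse.

With the criteria calculation confirmed, the gain calculation is examined next.

    desc_index = np.argsort(X[:,0])   # note: argsort returns ascending order
    gini_before = calc_criteria(y)
    half = X.shape[0]//2
    gini_after = ((1/2)*calc_criteria(y[desc_index[:half]])
                  + (1/2)*calc_criteria(y[desc_index[half:]]))
    gain = gini_before - gini_after
    print(gain)

Figure 4.12: Computing the gain

Using NumPy's argsort from Section 4-2-3, the values in the first column of the feature matrix X are sorted, the Gini index is computed for the samples split into two halves on that order, and the gain is computed from it.

    desc_index = np.argsort(X[:,0])
    gini_before = calc_criteria(y)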

Slide 18

Slide 18 text

    res_gain = np.zeros([X.shape[0]-1])
    for i in range(0, X.shape[0]-1):
        n_left = i + 1   # number of samples sent to the left leaf
        gini_after = ((n_left/X.shape[0])*calc_criteria(y[desc_index[:n_left]])
                      + ((X.shape[0]-n_left)/X.shape[0])*calc_criteria(y[desc_index[n_left:]]))
        res_gain[i] = gini_before - gini_after

    print(np.max(res_gain))
    print(np.argmax(res_gain))

Figure 4.13: Computing the maximum gain

Applying the same processing to the other features determines how the leaf node is split. In addition, as touched on briefly in Section 4-2-5, the same calculation must be carried out for every leaf node to find the overall maximum, so this part of the implementation becomes fairly complicated.
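The step of applying the same search to every feature can be sketched as below. This is an illustrative sketch under stated assumptions, not the book's code: the names `gini` and `best_split` are hypothetical, and placing the threshold at the midpoint between neighbouring sorted values is a common convention assumed here.

```python
import numpy as np

def gini(y):
    _, counts = np.unique(y, return_counts=True)
    ratios = counts / y.shape[0]
    return float(np.sum(ratios * (1 - ratios)))

def best_split(X, y):
    """Return (feature index, threshold, gain) found by exhaustive search."""
    n = X.shape[0]
    parent = gini(y)
    best = (None, None, 0.0)
    for j in range(X.shape[1]):
        order = np.argsort(X[:, j])          # ascending order of feature j
        xs, ys = X[order, j], y[order]
        for i in range(1, n):                # i samples go left, n - i go right
            if xs[i] == xs[i - 1]:
                continue                     # no threshold separates equal values
            after = (i / n) * gini(ys[:i]) + ((n - i) / n) * gini(ys[i:])
            gain = parent - after
            if gain > best[2]:
                best = (j, (xs[i - 1] + xs[i]) / 2, gain)
    return best

X_demo = np.array([[1.0, 5.0], [2.0, 1.0], [3.0, 4.0], [4.0, 2.0]])
y_demo = np.array([0, 0, 1, 1])
print(best_split(X_demo, y_demo))
```

On this toy data the first feature separates the two classes perfectly, so it is chosen over the second one.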

Slide 19

Slide 19 text

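4.2.5 Keeping state in the iterative computation*

Section 4-2-5 covers how state is kept during the iterative computation. There are many possible approaches. The author wanted the implementation to be as simple as possible, so the dataset is assigned to the leaf nodes as they are created. On that premise, state keeping with a dictionary is examined below.

    leaves = {"T1":(X, y)}
    print(type(leaves))
    print(leaves["T1"][0][:3, :])
    print(leaves["T1"][1][:3])

Figure 4.14: Leaf nodes and the iris dataset

The policy is to use the ID of a leaf as the key and the corresponding (X, y) as the value. With this, the values of X and y can each be retrieved from leaves with a key such as "T1".

Next, the processing for adding leaf nodes is examined.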

Slide 20

Slide 20 text

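    desc_index = np.argsort(X[:,0])   # ascending order of the first feature
    leaves = {"T1":(X, y)}
    del leaves["T1"]                  # remove the leaf that is being split
    leaves["T1"] = (X[desc_index[:75], :], y[desc_index[:75]])
    leaves["T2"] = (X[desc_index[75:], :], y[desc_index[75:]])
    print(leaves["T1"][0][:3, :])
    print(leaves["T2"][0][:3, :])

Figure 4.15: Adding leaf nodes

An added node is expressed as a split of an existing node, so the dictionary element for the node being split is removed with del and two new leaf nodes are added. In Figure 4.15 the samples are sorted by the first feature and distributed 75 each (iris has 150 samples) to T1 and T2. Looking at the leftmost column of the printed output, T1 starts at 4.3 and T2 starts at 5.8, both in ascending order, which makes the processing easy to picture.

This confirms the processing for adding leaf nodes and gives an image of node splitting. One further point that needs attention is the structure of the tree.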

Slide 21

Slide 21 text

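The structure of the tree needs attention because, if each split simply deletes a leaf node and adds two new ones, it is no longer known when each node was added, and as a result the structure of the tree is lost. Prediction (inference) then becomes impossible.

To deal with this, the author keeps the information about the node that was the target each time nodes are added.

Figure 4.16: Adding leaf nodes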

Slide 22

Slide 22 text

Figure 4.17: Order of leaf-node addition and the tree structure

For example, just saving the information as in Figure 4.16 is enough to reproduce a tree structure like the one in Figure 4.17. The orange leaf nodes in Figure 4.17 each correspond to a predicted value. The process is made concrete below.

1. Split T1 to create T1 and T2
2. Split T2 to create T2 and T3
3. Split T2 to create T2 and T4
4. Split T3 to create T3 and T5
5. Split T1 to create T1 and T6

Recording only the order of the splits is thus enough to reproduce the structure of the tree.
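The five steps above can be replayed in code to recover the tree. This is a minimal sketch: the nested-dictionary representation, the names `rebuild_tree` and `leaves_in_order`, and the convention that the left child keeps the parent's ID are all assumptions made here for illustration.

```python
def rebuild_tree(split_history, root="T1"):
    """Replay (split leaf, new leaf) pairs and return a nested-dict tree."""
    tree = {"leaf": root}
    leaf_nodes = {root: tree}              # current leaves, looked up by ID
    for parent, new in split_history:
        node = leaf_nodes.pop(parent)
        left, right = {"leaf": parent}, {"leaf": new}
        node.clear()                       # the old leaf becomes an internal node
        node["left"], node["right"] = left, right
        leaf_nodes[parent], leaf_nodes[new] = left, right
    return tree

def leaves_in_order(node):
    """List leaf IDs from left to right."""
    if "leaf" in node:
        return [node["leaf"]]
    return leaves_in_order(node["left"]) + leaves_in_order(node["right"])

history = [("T1", "T2"), ("T2", "T3"), ("T2", "T4"), ("T3", "T5"), ("T1", "T6")]
tree = rebuild_tree(history)
print(leaves_in_order(tree))
```

Because each record names the leaf that was split, replaying the records in order is enough to recover which leaves are siblings.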

Slide 23

Slide 23 text

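Figure 4.18: Saving the split thresholds

Because the tree structure can be traced, the split conditions only need to keep the feature used and its threshold, as in Figure 4.18.

4.2.6 Prediction*

Section 4-2-6 examines prediction. Based on the state kept in Section 4-2-5, the value of each leaf node is stored at training time and prediction is made from it. The important point is that relying on leaves from Section 4-2-5 would require keeping data the size of the dataset at all times, which is undesirable, especially considering ensemble learning.

Therefore, of the variables seen in Section 4-2-5, leaves is not kept. Prediction uses only three things: split_leaf, thresholds, and the value of each leaf node. split_leaf and thresholds were covered in Section 4-2-5, so Section 4-2-6 starts with the leaf-node values.

    desc_index = np.argsort(X[:,2])   # ascending order of the third feature
    leaves = {"T1":(X, y)}
    del leaves["T1"]
    leaves["T1"] = (X[desc_index[:52], :], y[desc_index[:52]])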

Slide 24

Slide 24 text

    leaves["T2"] = (X[desc_index[52:], :], y[desc_index[52:]])

    print(leaves["T1"][1])
    print(leaves["T2"][1])

Figure 4.19: Values of y in each leaf node

Checking only the representative values straight away would be abrupt, so the data is first split at the 52nd value of the third feature. As Figure 4.19 shows, in the iris dataset splitting the third feature between samples 1 to 50 and samples 51 to 150 classifies class 0 perfectly. Such a perfect split is a rare special case, however, so the threshold is deliberately placed at the 52nd value.

    print(np.mean(leaves["T1"][1]))
    print(np.mean(leaves["T2"][1]))

Slide 25

Slide 25 text

Figure 4.20: Representative value of a leaf node (regression)

Finding the representative value of a leaf node is simpler for regression than for classification, because it only requires the mean.

For classification, on the other hand, the most frequent value in each leaf node must be chosen as the representative value, and implementing this takes one extra step, as shown below.

    cate_T1, y_counts_T1 = np.unique(leaves["T1"][1], return_counts=True)
    cate_T2, y_counts_T2 = np.unique(leaves["T2"][1], return_counts=True)

    print(y_counts_T1)
    print(y_counts_T2)
    print(np.argmax(y_counts_T1))   # position of the most frequent class
    print(np.argmax(y_counts_T2))
    print(cate_T1)
    print(cate_T2)
    print("====")
    print(cate_T1[np.argmax(y_counts_T1)])   # representative value of T1
    print(cate_T2[np.argmax(y_counts_T2)])   # representative value of T2

Slide 26

Slide 26 text

Figure 4.21: Representative value of a leaf node (classification)

In Figure 4.21, NumPy's unique function is run with the return_counts option set to True. The first return value is the set of categories found and the second is the number of occurrences of each, so it can be thought of as a simple aggregation function in NumPy. In Figure 4.21, argmax is used to obtain the index at which y_counts_T1 and y_counts_T2 take their maximum, and that index is applied to cate_T1 and cate_T2. This gives the representative value 0 for T1 and 2 for T2. Class 1 ends up scattered between T1 and T2, so here it does not become the representative value of either.
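The two ways of choosing a representative value can be wrapped in one helper. This is a small sketch: the function name `leaf_value` and its `task` argument are hypothetical and introduced only for illustration.

```python
import numpy as np

def leaf_value(y_leaf, task="classification"):
    """Representative value of a leaf: mean for regression, mode for classification."""
    if task == "regression":
        return float(np.mean(y_leaf))
    categories, counts = np.unique(y_leaf, return_counts=True)
    return categories[np.argmax(counts)]   # on ties, the smallest label wins

print(leaf_value(np.array([0, 0, 1])))
print(leaf_value(np.array([1.0, 3.0]), task="regression"))
```

Since `np.unique` returns sorted categories and `np.argmax` returns the first maximum, ties are resolved toward the smallest label in this sketch.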

Slide 27

Slide 27 text

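Figure 4.22: Recording the representative values of leaf nodes

Suppose the representative values computed in this way are recorded as node_values, as in Figure 4.22.

With this, split_leaf, thresholds, and node_values are all available, and prediction based on them is examined below. To expand the tree at prediction time, the same processing used to assign leaves in Section 4-2-5 could be applied to the X used for prediction at each branch in thresholds. However, when X is split and assigned in this way, some leaf nodes receive no samples, and handling them is somewhat troublesome. To avoid this, it seemed simpler at prediction time to assign True/False indices to each leaf node instead of splitting the samples.

    init_idx = X[:, 0] > np.min(X[:, 0])-1   # all True: every sample starts in the root
    split_idx_left = X[:, 0] < 5.
    split_idx_right = X[:, 0] >= 5.

    print(init_idx[:10])
    print(X[:10, 0])
    print(split_idx_left[:10])
    print(split_idx_right[:10])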

Slide 28

Slide 28 text

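Figure 4.23: Logical operation (1)

For a split, processing like that in Figure 4.23 is applied to the threshold of the feature. In the second and third lines of the code (the split_idx_left and split_idx_right assignments), the column of X (0 in Figure 4.23) and the number it is compared with (5. in Figure 4.23) are the two places where the values from thresholds are applied. This makes it possible to split leaf nodes at prediction time.

These masks are then processed according to split_leaf in the same way as leaves in Section 4-2-5. The initial value for a leaf node is all True, but False values appear once a leaf node is split, so how to handle them must be considered.

    init_idx = X[:, 0] > np.min(X[:, 0])-1
    split_idx_left = X[:, 0] < 5.
    split_idx_right = X[:, 0] >= 5.

    split_right_1 = X[:, 0] < 5.2
    split_right_2 = X[:, 0] >= 5.2
    print(split_right_1[:10])
    print(split_right_2[:10])
    print("======")
    print(split_idx_right[:10])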

Slide 29

Slide 29 text

    print("======")
    print(split_idx_right[:10]*split_right_1[:10])
    print(split_idx_right[:10]*split_right_2[:10])

Figure 4.24: Logical operation (2)

As Figure 4.24 shows, this is handled when a leaf node is expanded by taking the product of the True/False values computed from the new condition and the True/False values from before the expansion. This achieves the same result as an and condition.
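That the product of two boolean arrays behaves like a logical and can be confirmed directly. The values below are made up for illustration and are not taken from the dataset.

```python
import numpy as np

values = np.array([4.9, 5.0, 5.1, 5.4])
parent_mask = values >= 5.0    # samples that reached the parent leaf
child_mask = values < 5.2      # new condition applied at the split

by_product = parent_mask * child_mask   # element-wise product of booleans
by_and = parent_mask & child_mask       # element-wise logical and
print(by_product)
print(np.array_equal(by_product, by_and))
```

The `&` form states the intent more directly, while the product form mirrors the text above. Both yield a boolean array that can be used for indexing.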

Slide 30

Slide 30 text

    init_idx = X[:, 0] > np.min(X[:, 0])-1
    split_idx_left = X[:, 0] < 5.
    split_idx_right = X[:, 0] >= 5.

    split_right_1 = X[:, 0] < 5.2
    split_right_2 = X[:, 0] >= 5.2

    idx = split_idx_right[:10]*split_right_1[:10]
    X_10 = X[:10, :]
    print(idx)
    print(X_10[idx, :])
    print("=====")
    print(X_10)

Slide 31

Slide 31 text

Figure 4.25: Logical operation (3)

Applying the computed index to X or y makes it possible to extract X per leaf node, as in Figure 4.25. Prediction can likewise be carried out by looking up the value of node_value for each index.
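Putting the pieces together, prediction with boolean masks might look like the sketch below. The concrete formats of `split_leaf`, `thresholds`, and `node_values` are assumptions made here for illustration. In particular, a dictionary is used for `node_values` and the left side of a split is assumed to keep the parent's ID. The book's figures may store these differently.

```python
import numpy as np

# Assumed formats (hypothetical, for illustration only)
split_leaf = [("T1", "T2"), ("T2", "T3")]    # (leaf that was split, new leaf)
thresholds = [(0, 5.0), (0, 5.2)]            # (feature column, threshold) per split
node_values = {"T1": 0, "T2": 1, "T3": 2}    # representative value per final leaf

def predict(X, split_leaf, thresholds, node_values):
    masks = {"T1": np.ones(X.shape[0], dtype=bool)}   # every sample starts in the root
    for (parent, new), (col, thr) in zip(split_leaf, thresholds):
        reached = masks[parent]
        masks[parent] = reached & (X[:, col] < thr)   # left side keeps the parent's ID
        masks[new] = reached & (X[:, col] >= thr)     # right side gets the new ID
    y_pred = np.empty(X.shape[0], dtype=float)
    for leaf, mask in masks.items():
        y_pred[mask] = node_values[leaf]              # empty masks simply assign nothing
    return y_pred

X_demo = np.array([[4.8], [5.1], [5.6]])
print(predict(X_demo, split_leaf, thresholds, node_values))
```

A leaf that receives no samples just has an all-False mask, which is why this approach avoids the special handling needed when the samples themselves are split.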

Slide 32

Slide 32 text

4.3 Ensemble learning

Section 4-3 examines the implementation of ensemble learning. Basically, it can be implemented by turning the decision tree from Section 4-2 into a class and creating and reusing multiple instances of it, so it should be comparatively easier to understand than the algorithm implementation in Section 4-2. Section 4-3-1 covers bagging, Section 4-3-2 covers random forest, and Section 4-3-3 covers gradient boosting.

4.3.1 Bagging

Section 4-3-1 covers the implementation of bagging. The iris dataset from Section 4-2-1 is used as X and y.

First, what bagging requires is to build independent learners based on different samples. In such a case, it is basically enough to create a set of sample indices for each learner.

    np.random.seed(10)

    # draw 70 indices out of 0..99 with replacement
    sample_idx = np.random.choice(np.arange(0, 100, 1), size=int(100*0.7))
    unique_idx, idx_counts = np.unique(sample_idx, return_counts=True)
    print(sample_idx)
    print("=====")
    print(unique_idx)
    print(idx_counts)

Slide 33

Slide 33 text

Figure 4.26: Creating indices (with replacement)

Figure 4.26 shows index generation for sampling 70 out of 100 samples with replacement. np.random.seed(10) fixes the random numbers so that the result is reproducible for the purposes of the explanation.

    sample_idx = np.random.choice(np.arange(0, X.shape[0], 1),
                                  size=int(X.shape[0]*0.1))

    print(sample_idx)
    print("=====")
    print(X[sample_idx, :])

Slide 34

Slide 34 text

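Figure 4.27: Sampling from iris

Figure 4.27 shows the result of sampling 10% of iris (15 of 150 samples) with replacement. Index 88 is duplicated, so [5.6 3. 4.1 1.3] is sampled twice.

If the basic processing of the decision tree has been turned into a class, bagging simply draws samples at random, runs the training multiple times, and computes the predicted value by majority vote (classification) or mean (regression), in the same manner as the representative-value calculation in Section 4-2-6.

4.3.2 Random forest

Section 4-3-2 covers the implementation of random forest. It is basically almost the same as the bagging of Section 4-3-1, but differs in that features are also selected at random.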
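The sampling and aggregation steps of bagging described above can be sketched as follows. The helper names `bootstrap_indices` and `majority_vote` are hypothetical, and NumPy's `default_rng` generator is used here in place of the `np.random.seed` style of the text. The per-learner predictions are made-up values standing in for the output of trained trees.

```python
import numpy as np

def bootstrap_indices(n_samples, rate, rng):
    """Draw int(n_samples * rate) row indices with replacement."""
    return rng.choice(np.arange(n_samples), size=int(n_samples * rate))

def majority_vote(predictions):
    """predictions: non-negative integer labels, shape (n_learners, n_samples)."""
    return np.array([np.bincount(col).argmax() for col in predictions.T])

rng = np.random.default_rng(10)
idx = bootstrap_indices(150, 0.1, rng)
print(idx.shape)

# Stand-in predictions from three learners for three samples
preds = np.array([[0, 1, 2],
                  [0, 2, 2],
                  [1, 1, 2]])
print(majority_vote(preds))     # classification: most frequent label per sample
print(preds.mean(axis=0))       # regression: average per sample
```

Each learner would be trained on `X[idx]` and `y[idx]` for its own `idx`, and only the aggregation differs between classification and regression.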

Slide 35

Slide 35 text

    np.random.seed(0)
    np.random.choice(np.arange(0, X.shape[1], 1),
                     size=int(X.shape[1]*0.7), replace=False)

Figure 4.28: Selecting features (without replacement)

Features should be selected without duplicates, so in Figure 4.28 the replace option of np.random.choice is set to False.

The features selected at training time must also be used at prediction time, so they need to be kept as a variable of the learner.

    select_features = []
    n_features = X.shape[1]

    np.random.seed(0)
    for _ in range(3):   # one feature subset per learner
        select_features.append(np.random.choice(np.arange(0, n_features, 1),
                               size=int(n_features*0.7), replace=False))

    print(select_features)

Slide 36

Slide 36 text

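Figure 4.29: Recording features and referring to them at prediction time

Keeping the features used at training time as in Figure 4.29 makes them available at prediction time. The processing after the predicted values of the individual learners have been obtained is the same as in bagging, for both classification and regression, so it does not need much further thought.

4.3.3 Gradient boosting

Section 4-3-3 covers the implementation of gradient boosting. For classification the assignment of values tends to become complicated, so only regression is covered.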
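How the recorded feature subsets from the random forest section are reused can be sketched as below. The demo matrix `X_demo` is made up and `default_rng` is used instead of `np.random.seed`. Both are assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)
X_demo = np.arange(20, dtype=float).reshape(5, 4)   # 5 samples, 4 features
n_features = X_demo.shape[1]

# One feature subset per learner, drawn without replacement
select_features = [
    rng.choice(np.arange(n_features), size=int(n_features * 0.7), replace=False)
    for _ in range(3)
]

for feats in select_features:
    X_sub = X_demo[:, feats]   # the same columns must be used at fit and predict time
    print(feats, X_sub.shape)
```

Storing `select_features` alongside each learner is what allows the prediction step to slice new data with exactly the columns that learner was trained on.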

Slide 37

Slide 37 text

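Figure 4.30: Implementation of gradient boosting

For gradient boosting it seemed easier to follow the class implementation as it is, so Figure 4.30 presents the processing from the accompanying Jupyter file. The interface is basically modeled on scikit-learn, so it can be read with scikit-learn in mind.

First, clf is an instance created from the class version of the decision-tree implementation, and training is carried out with it. This kind of implementation is not unusual in ensemble learning. The difference from bagging lies in the last three lines: y_pred is computed from the X used for training, it is subtracted from y_, and training is carried out again on y_.

This realizes the boosting process, in which each subsequent learner is trained based on the results of the learners trained so far.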

Slide 38

Slide 38 text

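One point to note in the implementation is that unless y_ is made a deepcopy of y, the two are modified together, which leads to undesirable behavior. In addition, y_pred is sometimes multiplied by a hyperparameter α when it is subtracted from y_, so the implementation is written to allow this parameter to be set.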
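The residual-fitting loop described above can be sketched as follows. This is not the code in Figure 4.30: the weak learner `MeanStump`, the functions `fit_boosting` and `predict_boosting`, and the default values of `n_estimators` and `alpha` are all hypothetical stand-ins chosen to keep the example self-contained.

```python
import copy
import numpy as np

class MeanStump:
    """Hypothetical weak learner: splits feature 0 at its median, predicts leaf means."""
    def fit(self, X, y):
        self.threshold = np.median(X[:, 0])
        left = X[:, 0] < self.threshold
        self.left_value = y[left].mean() if left.any() else y.mean()
        self.right_value = y[~left].mean() if (~left).any() else y.mean()
        return self

    def predict(self, X):
        return np.where(X[:, 0] < self.threshold, self.left_value, self.right_value)

def fit_boosting(X, y, n_estimators=10, alpha=0.5):
    y_ = copy.deepcopy(y)           # copy so the in-place update below leaves y intact
    learners = []
    for _ in range(n_estimators):
        clf = MeanStump().fit(X, y_)
        y_pred = clf.predict(X)
        y_ -= alpha * y_pred        # the next learner is trained on the residual
        learners.append(clf)
    return learners

def predict_boosting(learners, X, alpha=0.5):
    return alpha * sum(clf.predict(X) for clf in learners)

X_demo = np.arange(10, dtype=float).reshape(-1, 1)
y_demo = np.where(X_demo[:, 0] < 5, 1.0, 3.0)
learners = fit_boosting(X_demo, y_demo)
print(predict_boosting(learners, X_demo))
```

Without the `deepcopy`, the in-place `y_ -= ...` would also overwrite the caller's `y`, which is the behavior the text warns about.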