
Tree-based Algorithm by Python


This deck publishes Chapter 4 of the book below, which covers implementing tree-based algorithms in Python.
https://lib-arts.booth.pm/items/2858666

It summarizes the points I consider important when writing straightforward implementations that follow from an understanding of the basic theory.


LiberalArts

April 03, 2021


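Chapter 4 Implementation

Chapter 4 covers the implementation of tree-based algorithms. The detailed implementations are easier to follow in the accompanying Jupyter files, so they are not reproduced here. Instead, this chapter uses Python to check the important points from Chapters 2 and 3. Section 4-1 first summarizes the policy behind how this chapter is written. Section 4-2 then covers the basic processing needed to build a decision tree, and Section 4-3 covers ensemble learning.

4.1 Policy for this chapter

Section 4-1 briefly explains the policy adopted in writing Chapter 4.

Several books check theory through implementation, such as "Deep Learning from Scratch" (ゼロから作る Deep Learning). However, the way implementations are written tends to vary a lot from author to author. I think the biggest difference in approach is how much object-oriented programming is used. My view is that a textbook should not be written in an object-oriented style, because implementations built on object orientation make the material harder to follow.

With real code you only need to look up the related code, but a book is read back and forth, which places a large burden on the reader. Tree-based algorithms in particular involve many repeated computations and recursive case distinctions. If the computation is organized around objects, much of the explanation ends up being about data structures, and the reader can no longer concentrate on the algorithm.

Looking up related code is faster in the actual source files. The book therefore presents each important piece of processing in isolation, and asks readers to consult the separate Jupyter files for the full implementation. In practice, important processing is usually packaged into modules. How that is done depends on the programming language, though, and looking at each piece in isolation connects more easily to the theory, so here the processing is shown in extracted form. Modularizing it changes little at the algorithm level, so this should still be useful knowledge when you work with actual libraries.

Note also that the implementations here aim at understanding the theory and pay little attention to efficiency: they implement simple formulas in a simple way. If you want more efficient implementations, please look at scikit-learn and other libraries.

4.2 Basic processing of decision trees

Section 4-2 examines the basic processing involved in implementing a decision tree. The subsections cover the following:

- Section 4-2-1 introduces the datasets used as the classification and regression problems.
- Section 4-2-2 covers splitting criteria such as entropy, the Gini index and the mean squared error.
- Section 4-2-3 covers sorting by feature.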
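
- Section 4-2-4 covers node splitting.
- Section 4-2-5 covers how to retain information while the tree is being built.
- Section 4-2-6 covers prediction with the learned tree.

4.2.1 Problem setting for classification and regression

To get a grip on the classification and regression problems, Section 4-2-1 checks the datasets used as examples. Many datasets could be used for either task. To keep things simple, both come from the datasets bundled with scikit-learn, choosing ones that are easy to handle and well known: iris for classification and Boston Housing for regression. Below, the size and other basic properties of each dataset are checked briefly, starting with the iris dataset.

```python
import numpy as np
from sklearn import datasets

# iris: classification data with 150 samples, 4 features and 3 classes
iris = datasets.load_iris()
X = iris.data
y = iris.target

print(X.shape)  # (150, 4)
print(y.shape)  # (150,)
```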

Figure 4.1: iris dataset ①

Above, we checked the size of the dataset and the shapes of the arrays. NumPy is imported along with scikit-learn because it is needed for the matrix operations that follow.

```python
# First 5 rows of (X, y) and the type of the values each array holds
print(X[:5, :])
print(y[:5])
print(X.dtype)
print(y.dtype)
```
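
Figure 4.2: iris dataset ②

Above, we printed the first 5 rows of (X, y) and checked the type of the values in each array. Next, let us check Boston Housing in the same way.

```python
# Boston Housing: regression data with 506 samples and 13 features.
# load_boston was deprecated in scikit-learn 1.0 and removed in 1.2,
# so this cell needs scikit-learn < 1.2.
# Separate names keep the iris X and y intact for the following sections.
boston = datasets.load_boston()
X_boston = boston.data
y_boston = boston.target

print(X_boston.shape)
print(y_boston.shape)
print(X_boston[:2, :])
print(y_boston[:5])
print(X_boston.dtype)
print(y_boston.dtype)
```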
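
Figure 4.3: Boston Housing dataset

Figure 4.3 shows the array sizes and the first few rows. When working with sample data, it is a good habit to check the array sizes and the first rows like this first, since everything afterwards becomes quicker to grasp.

4.2.2 Splitting criteria

Section 4-2-2 covers the implementation of each splitting criterion: entropy, the Gini index and the mean squared error. Entropy and the Gini index assume a classification problem, so they use iris. The mean squared error assumes a regression problem, so it uses Boston Housing.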
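As a compact reference for the three criteria named above, here is a sketch that writes each one as a small NumPy function. The function names `class_proportions`, `entropy`, `gini` and `mse` are hypothetical and do not come from the book's notebooks. Proportions are taken from `np.unique`, which only returns labels that actually occur, so the entropy never evaluates `log(0)`. This is one way to realize the convention 0 · log 0 = 0 that the entropy discussion relies on.

```python
import numpy as np


def class_proportions(y):
    """Proportion p_k of each label that occurs in y (so every p_k > 0)."""
    _, counts = np.unique(y, return_counts=True)
    return counts / counts.sum()


def entropy(y, base=np.e):
    """-sum_k p_k log(p_k); labels with p_k = 0 never appear, so log(0) is avoided."""
    p = class_proportions(y)
    # Dividing by log(base) applies the change-of-base formula
    return float(-np.sum(p * np.log(p)) / np.log(base))


def gini(y):
    """Gini index 1 - sum_k p_k^2, which equals sum_k p_k (1 - p_k)."""
    p = class_proportions(y)
    return float(1.0 - np.sum(p ** 2))


def mse(y):
    """Mean squared deviation from the mean (the regression criterion)."""
    y = np.asarray(y, dtype=float)
    return float(np.mean((y - y.mean()) ** 2))


labels = np.array([0] * 50 + [1] * 50 + [2] * 50)  # three balanced classes, like iris
print(entropy(labels))          # ln 3, about 1.0986
print(entropy(labels, base=2))  # log2 3, about 1.585
print(gini(labels))             # about 0.667
print(mse([1.0, 2.0, 3.0]))     # about 0.667
```

Passing `base` only rescales the entropy by a constant, which is why the choice of base does not change which split has the largest gain.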

Entropy

To understand entropy, it helps to plot the function $f(x) = -x\log(x) - (1-x)\log(1-x)$ over $0 < x < 1$.

```python
import numpy as np
import matplotlib.pyplot as plt
# Jupyter magic: draw plots inside the notebook (omit when running a plain script)
%matplotlib inline

x = np.arange(0.01, 1., 0.01)
y1 = -x*np.log(x) - (1-x)*np.log(1-x)          # natural logarithm (base e)
y2 = -x*np.log2(x) - (1-x)*np.log2(1-x)        # base 2
y3 = (-x*np.log(x) - (1-x)*np.log(1-x))*1.2    # base-e curve scaled by 1.2

plt.plot(x, y1, color="green")
plt.plot(x, y2, color="blue")
plt.plot(x, y3, color="red")
plt.show()
```
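
Figure 4.4: Entropy

The plot shows that the function is concave, with a single maximum in the middle. In binary classification this means entropy is largest when the samples are split 1/2 to each class. Likewise, with three or more classes, entropy is largest when the samples are spread evenly over the classes.

You may also wonder which base to use for the logarithm. By the change-of-base formula $\log_a x = \log_b x / \log_b a$, changing the base only multiplies the entropy by a constant. So whether you use 2 or $e$, the result of learning a decision tree that compares gains does not change. In the figure, blue uses base 2 and green uses base $e$. The point is that the red curve, green scaled by a constant (1.2 in the code), lines up with blue once that constant is adjusted: with the exact factor $1/\ln 2 \approx 1.4427$ the two curves coincide.

To make this more concrete, let us compute the entropy of iris from the formula $-\sum_{k=1}^{3} p_k \log(p_k)$.

```python
# y holds the iris class labels
_, y_counts = np.unique(y, return_counts=True)
res_criterion = 0
for i in range(0, y_counts.shape[0]):
    ratio = y_counts[i] / y.shape[0]        # class proportion p_k
    res_criterion -= ratio*np.log(ratio)

print(y_counts)        # [50 50 50]
print(res_criterion)   # ln 3, about 1.0986
```

Figure 4.5: Entropy of iris

Figure 4.5 shows the result of computing the entropy of iris. Care is needed with the logarithm used in the entropy: when the argument `ratio` is 0, it falls outside the domain of the logarithm. In the code above `np.unique` only returns classes that actually occur, so this cannot happen, but it can in other implementations. Since the limit of $x\log(x)$ as $x \to +0$ is 0, treating that term as 0 is fine, but a program has to handle the case explicitly. If you would rather not deal with this, the Gini index covered next is more convenient. Libraries already handle this case, so you need not worry much about it when using scikit-learn or similar libraries.

Gini index

To understand the Gini index, it helps to plot the function $f(x) = 1 - x^2 - (1-x)^2 = 2x(1-x) = -2\left(x - \frac{1}{2}\right)^2 + \frac{1}{2}$ over $0 < x < 1$.

```python
import numpy as np
import matplotlib.pyplot as plt
# Jupyter magic: draw plots inside the notebook (omit when running a plain script)
%matplotlib inline

x = np.arange(0.01, 1., 0.01)
y_gini = 1-x**2-(1-x)**2   # separate name so the iris labels in y are not overwritten

plt.plot(x, y_gini, color="green")
plt.show()
```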
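
Figure 4.6: Gini index

As Figure 4.6 shows, the Gini index is, like entropy, a concave function. Compared with Figure 4.4 the shape is basically similar. However, in Figure 4.4, values between the maximum and the minimum, such as at $x = 0.3$ or $x = 0.7$, are relatively larger. A theoretical derivation of these functions is useful to some extent, but going too deep is a lot of work, so simply plotting them to get an intuitive picture is also valuable.

Next, as with entropy, let us compute the Gini index of iris.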
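
```python
# Gini index of the iris labels: 1 - sum_k p_k^2
_, y_counts = np.unique(y, return_counts=True)
res_criterion = 1
for i in range(0, y_counts.shape[0]):
    ratio = y_counts[i] / y.shape[0]
    res_criterion -= ratio**2

print(y_counts)
print(res_criterion)
```

Figure 4.7: Gini index of iris

Figure 4.7 is best read as subtracting $\left(\frac{1}{3}\right)^2 = \frac{1}{9}$ from 1 three times, i.e. $1 - 3 \cdot \frac{1}{9} = \frac{2}{3}$. The result is printed as 0.66...65 rather than 0.66...67 because of floating-point rounding error. While you are learning the mathematics you need not worry much about this. A production-quality implementation, however, has to take such errors into account to some extent.

Mean squared error

The mean squared error can be understood in the same way as the least-squares method used in simple regression analysis. Below, we compute it for Boston Housing.

```python
# Boston Housing targets (load_boston needs scikit-learn < 1.2)
y_boston = datasets.load_boston().target
# Mean squared deviation from the mean
res_criterion = np.mean((y_boston - np.mean(y_boston))**2)

print(res_criterion)
```

Figure 4.8: Mean squared error

As Figure 4.8 shows, the mean squared error can be implemented simply with NumPy's `mean` function.

4.2.3 Sorting by feature

Section 4-2-3 covers sorting by feature. At first glance it looks as if you simply sort. The task becomes fairly involved, though, once you account for two things. First, each feature must be sorted separately while still retrieving the matching target values. Second, the samples have to be tracked as the tree is split. One way to handle this is to use something like NumPy's `argsort` to get the indices in feature order, and then use those indices to build subsets of the samples. This section examines that approach. Since the description alone may be hard to follow, let us try it briefly on iris.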
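
```python
print(X[:10, 0])
print("-------")
print(np.argsort(X[:10, 0]))   # indices that put the 10 values in ascending order
```

Figure 4.9: argsort (NumPy)

For readability, the example above takes the first 10 samples of the first column of iris and lists their indices in the order of their values. Read it as: "the 9th sample (index 8, plus 1) is the smallest at 4.4, the 4th (index 3, plus 1) is the next smallest at 4.6, ..., and the 6th (index 5, plus 1) is the largest at 5.4". With this in hand, operations such as taking the 5 samples with the smallest values of the first feature become possible.

```python
x1 = X[:10, 0]
asc_idx = np.argsort(x1)     # ascending order of the first feature

print(asc_idx[:5])           # indices of the 5 smallest values
print(X[asc_idx[:5], :])     # the corresponding rows of X
print(y[asc_idx[:5]])        # the corresponding labels
```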
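
Figure 4.10: argsort (NumPy)

Doing it this way lets us retrieve the values of X and y in the order of the first feature.

4.2.4 Computing the gain and splitting nodes

Section 4-2-4 covers node splitting: the gain is computed over a sorted feature, and the leaf node is split accordingly. ID3 uses only categorical features. C4.5 and CART also handle continuous features, so they basically need an exhaustive search over the sorted result from Section 4-2-3. Below, we go through the implementation using the iris classification problem and the Gini index as the example.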
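
```python
def calc_criteria(y):
    """Gini index of a label array: sum_k p_k (1 - p_k) = 1 - sum_k p_k^2."""
    _, y_counts = np.unique(y, return_counts=True)
    res_criterion = 0
    for i in range(0, y_counts.shape[0]):
        ratio = y_counts[i] / y.shape[0]
        res_criterion += ratio*(1-ratio)
    return res_criterion      # 0 for an empty array (the loop does not run)

print(calc_criteria(y))
```

Figure 4.11: Computing the Gini index

First, we computed the Gini index over the whole iris dataset. Figure 4.11 gives 0.6666..., which can also be derived as follows:

$$1 - \left(\frac{50}{150}\right)^2 - \left(\frac{50}{150}\right)^2 - \left(\frac{50}{150}\right)^2 = 1 - \frac{1}{3} = \frac{2}{3}$$

As mentioned in Section 4-1, I try to explain things without objects or functions wherever possible, but here I deliberately implemented a `calc_criteria` function. Computing the gain applies the same processing before and after the split and then takes the difference. This is also repeated once for every candidate value of the feature, which makes a reusable function all the more necessary.

Now that computing the criteria is covered, we turn to computing the gain.

```python
# argsort returns indices in ascending order of the first feature
desc_index = np.argsort(X[:, 0])
gini_before = calc_criteria(y)
# Weighted Gini index after splitting the sorted samples into two equal halves
gini_after = ((1/2)*calc_criteria(y[desc_index[:X.shape[0]//2]])
              + (1/2)*calc_criteria(y[desc_index[X.shape[0]//2:]]))
gain = gini_before - gini_after
print(gain)
```

Figure 4.12: Computing the gain

Using NumPy's `argsort` from Section 4-2-3, this code sorts the first column of the feature matrix X. It then splits the samples into two halves in that order, computes the Gini index of each half, and computes the gain from them.

```python
desc_index = np.argsort(X[:, 0])
gini_before = calc_criteria(y)
n = X.shape[0]
res_gain = np.zeros([n-1])
for i in range(0, n-1):
    # Left child: the i+1 smallest samples; right child: the remaining n-i-1.
    # Each child's Gini index is weighted by its share of the samples.
    gini_after = (((i+1)/n)*calc_criteria(y[desc_index[:i+1]])
                  + ((n-i-1)/n)*calc_criteria(y[desc_index[i+1:]]))
    res_gain[i] = gini_before - gini_after

print(np.max(res_gain))      # largest gain over all split positions
print(np.argmax(res_gain))   # position of the best split in sorted order
```

Figure 4.13: Computing the maximum gain

The loop above computes the gain at every split position of the sorted first feature. Applying the same processing to the other features as well gives the split of the leaf node. As Section 4-2-5 briefly touches on, this computation has to be done for every leaf node to find the overall maximum, so this part of the implementation becomes fairly complicated.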
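The search described above, repeating the gain computation for every feature and keeping the maximum, can be sketched for a single node as follows. `best_split` is a hypothetical helper name. The candidate thresholds are an assumption: they sit at midpoints between consecutive *distinct* sorted values, which is common in CART-style trees. The loop above splits by sorted position, which can place equal values on both sides, and skipping ties avoids that.

```python
import numpy as np


def gini(y):
    """Gini index 1 - sum_k p_k^2 of a label array (0 for an empty array)."""
    if y.shape[0] == 0:
        return 0.0
    _, counts = np.unique(y, return_counts=True)
    p = counts / y.shape[0]
    return 1.0 - np.sum(p ** 2)


def best_split(X, y):
    """Return (gain, feature, threshold) with the largest Gini gain over all features.

    Samples with X[:, feature] < threshold go left; the rest go right.
    """
    n = X.shape[0]
    gini_before = gini(y)
    best = (0.0, None, None)
    for j in range(X.shape[1]):
        order = np.argsort(X[:, j])              # ascending order of feature j
        xs, ys = X[order, j], y[order]
        for i in range(n - 1):
            if xs[i] == xs[i + 1]:
                continue                          # no threshold separates equal values
            left, right = ys[:i + 1], ys[i + 1:]
            gini_after = (left.shape[0] * gini(left) + right.shape[0] * gini(right)) / n
            gain = gini_before - gini_after
            if gain > best[0]:
                best = (gain, j, (xs[i] + xs[i + 1]) / 2)
    return best


X_toy = np.array([[1.0, 5.0], [2.0, 4.0], [3.0, 1.0], [4.0, 2.0]])
y_toy = np.array([0, 0, 1, 1])
print(best_split(X_toy, y_toy))
```

Both features separate the toy labels perfectly. Because the comparison is strict (`>`), the first feature examined wins the tie.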
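
4.2.5 Retaining information during repeated computation*

Section 4-2-5 covers how to keep track of information across repeated computations. There are many possible approaches. Because I wanted the implementation to be as simple as possible, I assign subsets of the dataset to leaf nodes as the leaves are created. On that premise, let us look at keeping the information in a dictionary.

```python
# Map each leaf ID to the (X, y) samples assigned to it; at first one leaf holds everything
leaves = {"T1": (X, y)}
print(type(leaves))
print(leaves["T1"][0][:3, :])   # first 3 rows of X in leaf T1
print(leaves["T1"][1][:3])      # first 3 labels in leaf T1
```

Figure 4.14: Leaf nodes and the iris dataset

As shown above, the idea is to use the leaf ID as the key and the corresponding (X, y) as the value. X and y can then each be retrieved from `leaves` with a key such as "T1".

Next, let us look at what happens when leaf nodes are added.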
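
```python
# Split leaf T1 by the first feature into two leaves of 75 samples each
desc_index = np.argsort(X[:, 0])
leaves = {"T1": (X, y)}
del leaves["T1"]                 # remove the leaf being split...
leaves["T1"] = (X[desc_index[:75], :], y[desc_index[:75]])   # ...and add two new leaves
leaves["T2"] = (X[desc_index[75:], :], y[desc_index[75:]])

print(leaves["T1"][0][:3, :])
print(leaves["T2"][0][:3, :])
```

Figure 4.15: Adding leaf nodes

A newly added node arises from splitting an existing one. So we delete the dictionary entry for the node being split with `del` and add two new leaf nodes. In Figure 4.15, the samples are sorted by the first feature and assigned to T1 and T2, 75 samples each (iris has 150 samples). In the leftmost column of the printed output, T1 starts from 4.3 while T2 starts from 5.8, which makes the processing easy to picture.

This shows how leaf nodes are added and gives a picture of node splitting, but one more thing needs care: the structure of the tree. Suppose each split simply deletes the leaf node and adds two new ones. We would then lose track of which node was added when, and hence of the structure of the resulting tree. Without that structure, we cannot make predictions (inference). To address this, I record information about the leaf node that was split each time leaf nodes are added.

Figure 4.16: Adding leaf nodes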
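
Figure 4.17: Order of leaf additions and the tree structure

Simply recording the splits as in Figure 4.16 is enough to reproduce a tree structure like the one in Figure 4.17. Think of each orange leaf node in Figure 4.17 as corresponding to one predicted value. The procedure is made concrete below:

1. Split T1 to produce T1 and T2
2. Split T2 to produce T2 and T3
3. Split T2 to produce T2 and T4
4. Split T3 to produce T3 and T5
5. Split T1 to produce T1 and T6

As this shows, recording the order of the splits is all it takes to reproduce the structure of the tree.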
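The record-keeping in the list above can be sketched as follows. The log format is an assumption for illustration, not necessarily the book's:

- Each entry of `split_leaf` is a pair (leaf that was split, new leaf ID).
- `thresholds` stores the (feature column, value) used.
- The part with `X[:, feature] < value` keeps the old ID.

The helpers `split_node` and `build_tree` and the internal node names `n0`, `n1`, ... are hypothetical.

```python
import numpy as np


def split_node(leaves, split_leaf, thresholds, leaf_id, feature, value):
    """Split one leaf in `leaves` and log the split (hypothetical helper).

    The part with X[:, feature] < value keeps leaf_id; the rest gets a new ID.
    """
    X, y = leaves.pop(leaf_id)              # like `del leaves[leaf_id]`, but keeps the data
    new_id = "T%d" % (len(leaves) + 2)      # IDs T1, T2, ... in order of creation
    mask = X[:, feature] < value
    leaves[leaf_id] = (X[mask], y[mask])
    leaves[new_id] = (X[~mask], y[~mask])
    split_leaf.append((leaf_id, new_id))    # which leaf was split and the leaf it produced
    thresholds.append((feature, value))     # the split condition
    return new_id


def build_tree(split_leaf, thresholds):
    """Replay the split log to recover the tree.

    Returns {node: (feature, value, left_child, right_child)} for internal nodes,
    and {leaf_id: node} giving the tree node each final leaf occupies.
    """
    current = {"T1": "n0"}                  # before any split, T1 is the root n0
    tree = {}
    for k, ((old, new), (feature, value)) in enumerate(zip(split_leaf, thresholds)):
        node = current[old]
        left, right = "n%d" % (2 * k + 1), "n%d" % (2 * k + 2)
        tree[node] = (feature, value, left, right)
        current[old], current[new] = left, right
    return tree, current


# Toy data: 8 samples with a single feature
leaves = {"T1": (np.arange(8.0).reshape(-1, 1), np.array([0, 0, 0, 0, 1, 1, 2, 2]))}
split_leaf, thresholds = [], []
split_node(leaves, split_leaf, thresholds, "T1", 0, 4.0)   # T1 -> T1, T2
split_node(leaves, split_leaf, thresholds, "T2", 0, 6.0)   # T2 -> T2, T3
print(split_leaf)   # [('T1', 'T2'), ('T2', 'T3')]

tree, leaf_nodes = build_tree(split_leaf, thresholds)
print(tree)
print(leaf_nodes)
```

Replaying the five-step list above with `build_tree` gives five internal nodes, with leaf T6 ending up as the right child of the node that T1 occupied after the first split.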
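
Figure 4.18: Retaining the split thresholds

Once the tree structure is available, it also becomes clear that each split condition only requires storing the feature used and its threshold, as in Figure 4.18.

4.2.6 Prediction*

Section 4-2-6 covers prediction. During training, the value of each leaf node is stored, based on the information retained in Section 4-2-5, and predictions are made from those values. The key point is that relying on `leaves` from Section 4-2-5 would mean always holding data as large as the whole dataset. That is undesirable, especially with ensemble learning in mind. So we predict using only three things, without keeping `leaves`: `split_leaf` and `thresholds` from Section 4-2-5, plus the value of each leaf node. `split_leaf` and `thresholds` were covered in Section 4-2-5, so Section 4-2-6 first looks at the leaf node values.

```python
# Split on the third feature (column index 2): first 52 sorted samples vs the rest
desc_index = np.argsort(X[:, 2])
leaves = {"T1": (X, y)}
del leaves["T1"]
leaves["T1"] = (X[desc_index[:52], :], y[desc_index[:52]])
leaves["T2"] = (X[desc_index[52:], :], y[desc_index[52:]])

print(leaves["T1"][1])
print(leaves["T2"][1])
```

Figure 4.19: The y values in each leaf node

Jumping straight to representative values would be hard to follow, so we first split at the 52nd value of the third feature. As Figure 4.19 shows, in the iris dataset, splitting the samples sorted by the third feature into positions 1 to 50 and 51 to 150 would separate class 0 perfectly. Such perfectly separable cases are rare in practice, though, so I deliberately placed the threshold at the 52nd value instead.

```python
# Regression-style representative value: the mean of y in each leaf
print(np.mean(leaves["T1"][1]))
print(np.mean(leaves["T2"][1]))
```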

Figure 4.20: Representative values of the leaf nodes (regression)

For regression, the representative value of a leaf node is just the mean, so the computation is simpler than for classification. For classification, the most frequent value in each leaf node must be chosen as the representative value, which takes an extra step, as follows.

```python
# Distinct labels in each leaf and how often each occurs
cate_T1, y_counts_T1 = np.unique(leaves["T1"][1], return_counts=True)
cate_T2, y_counts_T2 = np.unique(leaves["T2"][1], return_counts=True)

print(y_counts_T1)
print(y_counts_T2)
print(np.argmax(y_counts_T1))
print(np.argmax(y_counts_T2))
print(cate_T1)
print(cate_T2)
print("====")
# Most frequent label (majority) of each leaf
print(cate_T1[np.argmax(y_counts_T1)])
print(cate_T2[np.argmax(y_counts_T2)])
```

Figure 4.21: Representative values of the leaf nodes (classification)

In Figure 4.21, NumPy's `unique` function is called with `return_counts=True`. It returns the distinct categories as its first return value and the number of occurrences of each as its second, so you can think of it as a simple counting function in NumPy. In Figure 4.21, `argmax` gives the index of the maximum of `y_counts_T1` and of `y_counts_T2`. Using those indices into `cate_T1` and `cate_T2` yields 0 as the representative value of T1 and 2 as that of T2. Class 1 ended up divided between T1 and T2 and is the majority in neither, so it is not a representative value here.
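Both kinds of representative value can be computed for every leaf in one pass. This sketch assumes the `leaves` layout from Section 4-2-5 ({leaf ID: (X, y)}). `leaf_values` is a hypothetical helper, and the toy leaves below contain made-up label arrays, not the actual iris split. When two labels tie for the most occurrences, `np.argmax` picks the first one, i.e. the smallest label.

```python
import numpy as np


def leaf_values(leaves, task="classification"):
    """Representative value of each leaf: majority label or mean of y."""
    node_values = {}
    for leaf_id, (_, y_leaf) in leaves.items():
        if task == "classification":
            cate, counts = np.unique(y_leaf, return_counts=True)
            node_values[leaf_id] = cate[np.argmax(counts)]   # most frequent label
        else:
            node_values[leaf_id] = float(np.mean(y_leaf))    # mean target value
    return node_values


# Toy leaves (features unused here): class 0 dominates T1, class 2 slightly outnumbers class 1 in T2
leaves = {
    "T1": (np.zeros((52, 1)), np.array([0] * 50 + [1] * 2)),
    "T2": (np.zeros((98, 1)), np.array([1] * 48 + [2] * 50)),
}
print(leaf_values(leaves))
print(leaf_values(leaves, task="regression"))
```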
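
Figure 4.22: Recording the representative values of the leaf nodes

Suppose the representative values computed this way are recorded as `node_values`, as in Figure 4.22.

We now have all three pieces, `split_leaf`, `thresholds` and `node_values`, so let us look at making predictions from them. To traverse the tree for prediction, we could apply the same processing used to assign `leaves` in Section 4-2-5 to the prediction input X at each threshold. However, if X is split into subsets as it is assigned, some leaf nodes may receive no samples at all, and handling them is somewhat awkward. For prediction, it seemed simpler to assign a True/False index (a boolean mask) to each leaf node instead of splitting the samples.

```python
# Root mask: min - 1 lies below every value, so this is all True
# (np.ones(X.shape[0], dtype=bool) would give the same mask)
init_idx = X[:, 0] > np.min(X[:, 0])-1
split_idx_left = X[:, 0] < 5.     # samples sent to the left child
split_idx_right = X[:, 0] >= 5.   # samples sent to the right child

print(init_idx[:10])
print(X[:10, 0])
print(split_idx_left[:10])
print(split_idx_right[:10])
```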
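
Figure 4.23: Logical operations ①

For splitting, it is enough to apply processing like that in Figure 4.23 to each feature threshold. The values from `thresholds` go into two places in the `split_idx_left` and `split_idx_right` lines: the column of X (0 in Figure 4.23) and the number it is compared with (5. in Figure 4.23). Substituting them there lets the leaf nodes be split at prediction time.

These masks can then be processed following `split_leaf`, just like `leaves` in Section 4-2-5. The masks start out all True, but False values appear as leaf nodes are split, so we need to decide how to handle them.

```python
init_idx = X[:, 0] > np.min(X[:, 0])-1
split_idx_left = X[:, 0] < 5.
split_idx_right = X[:, 0] >= 5.

# Split the right-hand leaf again, this time at 5.2
split_right_1 = X[:, 0] < 5.2
split_right_2 = X[:, 0] >= 5.2
print(split_right_1[:10])
print(split_right_2[:10])
print("======")
print(split_idx_right[:10])
print("======")
# Element-wise product of boolean arrays = logical AND with the parent's mask
print(split_idx_right[:10]*split_right_1[:10])
print(split_idx_right[:10]*split_right_2[:10])
```

Figure 4.24: Logical operations ②

As Figure 4.24 shows, when a leaf node is split, we multiply the True/False array from the new condition by the True/False array from before the split. For boolean arrays this product behaves exactly like an `and` condition (it is equivalent to the `&` operator).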

```python
init_idx = X[:, 0] > np.min(X[:, 0])-1
split_idx_left = X[:, 0] < 5.
split_idx_right = X[:, 0] >= 5.

split_right_1 = X[:, 0] < 5.2
split_right_2 = X[:, 0] >= 5.2

# Mask of the leaf "X[:, 0] >= 5.0 AND X[:, 0] < 5.2" for the first 10 samples
idx = split_idx_right[:10]*split_right_1[:10]
X_10 = X[:10, :]
print(idx)
print(X_10[idx, :])   # boolean indexing keeps only the rows where idx is True
print("=====")
print(X_10)
```

Figure 4.25: Logical operations ③

Using the computed index on X or y extracts the samples of a single leaf node, as in Figure 4.25. In the same way, looking up the value in `node_values` for each index gives the prediction.
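Putting `split_leaf`, `thresholds` and `node_values` together, prediction with boolean masks can be sketched as below. The format of the three variables is an assumption for illustration, not necessarily the book's:

- `split_leaf[k]` is (leaf that was split, new leaf ID).
- `thresholds[k]` is (feature column, value), with `X[:, feature] < value` staying in the old leaf.
- `node_values` maps each final leaf ID to its value.

`predict` is a hypothetical helper. A leaf that receives no samples simply ends up with an all-False mask, which avoids the awkward case mentioned above.

```python
import numpy as np


def predict(X, split_leaf, thresholds, node_values):
    """Predict by replaying the split log on boolean masks instead of data subsets."""
    # Every sample starts in the root leaf T1 (all True), like init_idx in the text
    masks = {"T1": np.ones(X.shape[0], dtype=bool)}
    for (old, new), (feature, value) in zip(split_leaf, thresholds):
        parent = masks[old]
        # AND with the parent's mask (same as multiplying the boolean arrays)
        masks[old] = parent & (X[:, feature] < value)
        masks[new] = parent & (X[:, feature] >= value)
    # Each sample is True in exactly one final leaf mask; copy that leaf's value
    y_pred = np.empty(X.shape[0], dtype=np.asarray(list(node_values.values())).dtype)
    for leaf_id, mask in masks.items():
        y_pred[mask] = node_values[leaf_id]    # an all-False mask assigns nothing
    return y_pred


split_leaf = [("T1", "T2"), ("T2", "T3")]
thresholds = [(0, 4.0), (0, 6.0)]
node_values = {"T1": 0, "T2": 1, "T3": 2}
X_new = np.array([[0.5], [4.5], [6.5], [3.9]])
print(predict(X_new, split_leaf, thresholds, node_values))
```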
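
4.3 Ensemble learning

Section 4-3 examines the implementation of ensemble learning. It can basically be implemented by turning the decision tree from Section 4-2 into a class, then creating and reusing multiple instances of it. It should therefore be comparatively easier to understand than the algorithms implemented in Section 4-2. Below, Section 4-3-1 covers bagging, Section 4-3-2 random forests, and Section 4-3-3 gradient boosting.

4.3.1 Bagging

Section 4-3-1 covers the implementation of bagging. Below, the iris dataset used in Section 4-2-1 is again denoted X and y.

The first thing bagging requires is building multiple learners on different samples. For this, it is basically enough to generate a set of sample indices for each learner.

```python
np.random.seed(10)   # fix the seed so the result is reproducible

# Draw 70 indices from 0..99 with replacement (the default of np.random.choice)
sample_idx = np.random.choice(np.arange(0, 100, 1), size=int(100*0.7))
unique_idx, idx_counts = np.unique(sample_idx, return_counts=True)
print(sample_idx)
print("=====")
print(unique_idx)    # distinct indices that were drawn
print(idx_counts)    # how many times each one was drawn
```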

Figure 4.26: Generating indices (with replacement)

Figure 4.26 can be seen as generating the indices for sampling 70 out of 100 samples with replacement. To keep the explanation simple, `np.random.seed(10)` fixes the random seed so that the results are reproducible.

```python
# Draw 10% of the iris samples (int(150*0.1) = 15 indices) with replacement
sample_idx = np.random.choice(np.arange(0, X.shape[0], 1),
                              size=int(X.shape[0]*0.1))

print(sample_idx)
print("=====")
print(X[sample_idx, :])
```
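
Figure 4.27: Sampling from iris

Figure 4.27 shows the result of sampling 10% of the iris samples with replacement. Index 88 appears twice, so we can confirm that [5.6 3. 4.1 1.3] was sampled twice.

Once the basic decision tree processing is turned into a class, bagging works as follows. Draw random subsets of the samples and train a tree on each, repeating this several times. Then compute the prediction by majority vote (classification) or by averaging (regression), in the same way as the representative values in Section 4-2-6.

4.3.2 Random forest

Section 4-3-2 covers the implementation of random forests. It is basically the same as bagging in Section 4-3-1, except that the features are also selected at random.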
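Since a random forest differs from bagging only in the random feature selection, both can be sketched with one function. Everything here is a hypothetical stand-in: `fit_stump`, `fit_bagging` and `predict_bagging` are illustrative names, and a depth-1 tree (a "stump") replaces the book's decision tree class to keep the example short. Features are drawn once per learner without replacement, following the text. Note that scikit-learn's `RandomForestClassifier` instead draws a feature subset at every split, controlled by its `max_features` parameter.

```python
import numpy as np


def gini(labels):
    _, counts = np.unique(labels, return_counts=True)
    p = counts / labels.shape[0]
    return 1.0 - np.sum(p ** 2)


def majority(labels):
    cate, counts = np.unique(labels, return_counts=True)
    return cate[np.argmax(counts)]


def fit_stump(X, y):
    """Depth-1 classification tree (stand-in for the book's decision tree class)."""
    best = (np.inf, 0, np.inf)                       # (weighted Gini, feature, threshold)
    for j in range(X.shape[1]):
        values = np.unique(X[:, j])
        for t in (values[:-1] + values[1:]) / 2:     # midpoints between distinct values
            left = X[:, j] < t
            score = left.mean() * gini(y[left]) + (~left).mean() * gini(y[~left])
            if score < best[0]:
                best = (score, j, t)
    _, j, t = best
    left = X[:, j] < t
    if left.all():                                   # no split possible: constant output
        return (j, t, majority(y), majority(y))
    return (j, t, majority(y[left]), majority(y[~left]))


def predict_stump(stump, X):
    j, t, left_value, right_value = stump
    return np.where(X[:, j] < t, left_value, right_value)


def fit_bagging(X, y, n_estimators=25, feature_ratio=None, seed=0):
    """Bagging; with feature_ratio set, each learner also gets a random feature
    subset drawn without replacement (the per-learner random forest variant)."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    models = []
    for _ in range(n_estimators):
        rows = rng.choice(n, size=n, replace=True)                 # bootstrap sample
        if feature_ratio is None:
            cols = np.arange(d)
        else:
            cols = rng.choice(d, size=max(1, int(d * feature_ratio)), replace=False)
        models.append((cols, fit_stump(X[rows][:, cols], y[rows])))
    return models


def predict_bagging(models, X):
    """Majority vote; each learner only sees the feature columns it was trained on."""
    votes = np.stack([predict_stump(stump, X[:, cols]) for cols, stump in models])
    # votes has shape (n_estimators, n_samples); labels are assumed to be 0..K-1
    return np.array([np.bincount(v).argmax() for v in votes.T])


X_toy = np.array([[0.0, 5.0], [1.0, 4.0], [2.0, 6.0], [3.0, 5.5],
                  [10.0, 5.0], [11.0, 4.5], [12.0, 6.0], [13.0, 5.2]])
y_toy = np.array([0, 0, 0, 0, 1, 1, 1, 1])

bagging = fit_bagging(X_toy, y_toy)
forest = fit_bagging(X_toy, y_toy, feature_ratio=0.5)   # 1 of the 2 features per learner
print(predict_bagging(bagging, X_toy))
print(predict_bagging(forest, X_toy))
```

Keeping `cols` next to each learner matches the point in the text that the selected features must be stored for prediction.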

```python
np.random.seed(0)
# Pick int(4*0.7) = 2 of the 4 feature columns, without duplicates
np.random.choice(np.arange(0, X.shape[1], 1),
                 size=int(X.shape[1]*0.7), replace=False)
```

Figure 4.28: Selecting features (without replacement)

Features should be selected without duplicates, so Figure 4.28 sets the `replace` option of `np.random.choice` to False. The features selected during training must be used again at prediction time, so they need to be stored with each learner.

```python
select_features = []

np.random.seed(0)
for _ in range(3):   # one feature subset per learner (3 learners here)
    select_features.append(np.random.choice(np.arange(0, X.shape[1], 1),
                                            size=int(X.shape[1]*0.7), replace=False))

print(select_features)
```
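
Figure 4.29: Recording features and using them at prediction time

Recording the features used in training, as in Figure 4.29, makes them available at prediction time. Once each learner's predictions are obtained, the rest is the same as bagging for both classification and regression, so there is no need to overthink it.

4.3.3 Gradient boosting

Section 4-3-3 covers the implementation of gradient boosting. Classification makes the handling of values considerably more complicated, so only regression is covered.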
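
Figure 4.30: Implementing gradient boosting

For gradient boosting, the implementation is easier to follow as a class, so Figure 4.30 shows the processing from the separate Jupyter file. It basically follows the scikit-learn interface, so read it as you would scikit-learn code.

First, `clf` is an instance of the class version of the decision tree implementation, and the ensemble trains such instances. This pattern is nothing unusual for ensemble learning; the difference from bagging lies in the top three lines. The code computes `y_pred` from the training X, subtracts it from `y_`, and then trains the next learner on the updated `y_`. In this way, each new learner is trained on what the previously trained learners left unexplained, which is the essence of boosting.

Note that if `y_` is not a deepcopy of `y`, updating `y_` also updates `y`, which leads to undesirable behavior. For a NumPy array, `y.copy()` also creates an independent copy. A hyperparameter α (the learning rate) is sometimes multiplied into `y_pred` before it is subtracted from `y_`, so the implementation allows this parameter to be set.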
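Because Figure 4.30 is only available as an image, here is a minimal sketch of the same loop for regression with squared error. It makes several assumptions:

- A depth-1 regression tree (`fit_stump`) stands in for the book's decision tree class.
- `GradientBoostingSketch` is a hypothetical class name.
- Starting from the mean of y is an assumption; it is the usual choice for squared error.

`alpha` plays the role of α; scikit-learn's `GradientBoostingRegressor` calls the corresponding parameter `learning_rate`. The copy of `y` made in `fit` is what protects the caller's array from the in-place updates.

```python
import numpy as np


def fit_stump(X, y):
    """Depth-1 regression tree: the split with the smallest squared error of both leaves."""
    best = (np.inf, 0, np.inf, y.mean(), y.mean())   # (SSE, feature, threshold, left, right)
    for j in range(X.shape[1]):
        values = np.unique(X[:, j])
        for t in (values[:-1] + values[1:]) / 2:
            left = X[:, j] < t
            yl, yr = y[left], y[~left]
            sse = ((yl - yl.mean()) ** 2).sum() + ((yr - yr.mean()) ** 2).sum()
            if sse < best[0]:
                best = (sse, j, t, yl.mean(), yr.mean())
    return best[1:]


def predict_stump(stump, X):
    j, t, left_value, right_value = stump
    return np.where(X[:, j] < t, left_value, right_value)


class GradientBoostingSketch:
    """Least-squares gradient boosting for regression (hypothetical class name)."""

    def __init__(self, n_estimators=100, alpha=0.1):
        self.n_estimators = n_estimators
        self.alpha = alpha          # learning rate multiplied into each learner's output
        self.learners = []

    def fit(self, X, y):
        self.init_ = float(np.mean(y))   # assumption: start from the mean of y
        y_ = np.array(y, dtype=float)    # np.array copies; `y_ = y` would alias y
        y_ -= self.init_                 # residuals of the initial prediction
        for _ in range(self.n_estimators):
            stump = fit_stump(X, y_)     # fit the current residuals
            y_pred = predict_stump(stump, X)
            y_ -= self.alpha * y_pred    # in place: this would change y if y_ aliased it
            self.learners.append(stump)
        return self

    def predict(self, X):
        out = np.full(X.shape[0], self.init_)
        for stump in self.learners:
            out += self.alpha * predict_stump(stump, X)
        return out


X_toy = np.arange(10, dtype=float).reshape(-1, 1)
y_toy = np.where(X_toy[:, 0] < 5, 1.0, 3.0)
model = GradientBoostingSketch(n_estimators=100, alpha=0.1).fit(X_toy, y_toy)
print(np.round(model.predict(X_toy), 3))
print(y_toy)   # unchanged, because fit() works on a copy
```

With α = 0.1, each round removes 10% of the remaining residual on this toy step function, so after 100 rounds the predictions are within about 3e-5 of the targets.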