Deep down in classification 0.5 magic number

Sunmi Yoon
November 13, 2019

This talk takes a closer look at the classification threshold, the magic number 0.5 that we usually take for granted. It covers how to take a classification model's output as probabilities and adjust the threshold to find the best output. Prior familiarity with reading a confusion matrix through a displot is assumed.

Transcript

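  1. Put simply, a ROC curve is a two-dimensional graph with Sensitivity and 1-Specificity as its axes. https://www.medcalc.org/manual/roc-curves.php

    When the Actual True and Actual False distributions are exactly the same (the feature has no ability to discriminate between the classes), the ROC curve is a straight line at a 45-degree angle.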
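  2. Put simply, a ROC curve is a two-dimensional graph with Sensitivity and 1-Specificity as its axes. https://www.medcalc.org/manual/roc-curves.php

    When the Actual True and Actual False distributions are perfectly separated with no overlapping region, the feature's ability to discriminate between the classes is perfect. The closer the ROC curve is to the top-left corner, the better the feature's ability to discriminate between the classes.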
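The two extreme cases on slides 1 and 2 can be checked with a small sketch. The toy feature values below are made up for illustration, and `roc_points` is a hypothetical helper rather than anything from the deck. It sweeps a threshold over the feature and records (1-Specificity, Sensitivity) at each step.

```python
def roc_points(scores, labels):
    """Return (1-Specificity, Sensitivity) pairs, from the strictest threshold to the loosest."""
    pos = sum(labels)
    neg = len(labels) - pos
    points = [(0.0, 0.0)]  # a threshold above every score predicts nothing as positive
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        points.append((fp / neg, tp / pos))
    return points

labels = [1, 1, 1, 1, 0, 0, 0, 0]

# Assumed toy data: both classes have exactly the same feature values.
same_scores = [1, 2, 3, 4, 1, 2, 3, 4]
# Assumed toy data: the classes do not overlap at all.
separated_scores = [5, 6, 7, 8, 1, 2, 3, 4]

diagonal = roc_points(same_scores, labels)
perfect = roc_points(separated_scores, labels)

print(diagonal)  # every point has equal coordinates, i.e. the 45-degree line
print(perfect)   # the curve passes through the top-left corner (0.0, 1.0)
```

With identical distributions every threshold admits true and false positives at the same rate, which is why the points fall on the diagonal.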
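  3. https://scikit-learn.org/stable/auto_examples/model_selection/plot_roc.html#sphx-glr-auto-examples-model-selection-plot-roc-py

    We can draw a ROC curve for each model and use it to evaluate performance. That is as far as we got. For example, suppose we are predicting some disease, and the sky-blue line is the model that uses blood pressure as the X axis of the histogram.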
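  4. https://scikit-learn.org/stable/auto_examples/model_selection/plot_roc.html#sphx-glr-auto-examples-model-selection-plot-roc-py

    We can draw a ROC curve for each model and use it to evaluate performance. That is as far as we got. For example, suppose we are predicting some disease, the sky-blue line is the model that uses blood pressure as the X axis of the histogram, and the orange line is the model that uses height as the X axis of the histogram. Which model performs better? In other words, which feature explains the disease better?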
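One way to put a number on the question from slide 4 is the area under the ROC curve (AUC). The sketch below uses the rank interpretation of AUC: the chance that a randomly chosen positive case scores higher than a randomly chosen negative one, with ties counted as one half. The blood pressure and height values are invented for illustration and `auc` is a hypothetical helper, so the result only shows the mechanics, not anything the deck measured.

```python
def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison of positives and negatives."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Assumed toy data: 1 = has the disease, 0 = does not.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
blood_pressure = [150, 145, 138, 128, 132, 125, 120, 118]
height = [170, 165, 180, 160, 172, 168, 158, 175]

auc_blood_pressure = auc(blood_pressure, labels)
auc_height = auc(height, labels)

print(auc_blood_pressure)  # 0.9375: the two distributions barely overlap
print(auc_height)          # 0.5: no better than the 45-degree line
```

In this made-up data, blood pressure separates the two groups well while height does not, which matches the picture of one curve hugging the top-left corner and the other following the diagonal.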
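  5. https://www.medcalc.org/manual/roc-curves.php

    So far, the X axis has been a feature that separates the disease, such as blood pressure or height. Depending on what we chose for the X axis, the two distributions sometimes overlapped and sometimes lay apart.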
  6. Decision tree figure:

    Sex <= 0.5, gini = 0.473, samples = 891, value = [549, 342], class = Survived
      True: Fare <= 26.269, gini = 0.306, samples = 577, value = [468, 109], class = Survived
        gini = 0.226, samples = 415, value = [361, 54], class = Survived
        gini = 0.448, samples = 162, value = [107, 55], class = Survived
      False: Fare <= 48.2, gini = 0.383, samples = 314, value = [81, 233], class = Dead
        gini = 0.447, samples = 225, value = [76, 149], class = Dead
        gini = 0.106, samples = 89, value = [5, 84], class = Dead

    What we are going to change is the way this class is decided, something we have not paid attention to until now.
  7. (Same decision tree figure as slide 6.)

    From a probability point of view, what is the probability that a data point reaching leaf 1 is 'Survived', that is, the data of a person who lived?
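The question on slide 7 comes down to simple arithmetic on the `value` counts of a leaf. As a rough sketch, each count divided by the number of samples in the leaf gives that class's share, which is also how scikit-learn's `DecisionTreeClassifier.predict_proba` derives its output. The leaf numbering 1 to 4 (in the order the leaves appear in the figure) is an assumption, and which position in `value` corresponds to 'Survived' depends on the class order used when the tree was drawn, so the code reports both positions without naming them.

```python
# Class counts copied from the `value` field of the four leaves in the figure.
# Assumption: leaves are numbered 1-4 in the order they are listed.
leaf_counts = {
    1: [361, 54],
    2: [107, 55],
    3: [76, 149],
    4: [5, 84],
}

def leaf_probabilities(counts):
    """Turn the class counts of one leaf into class proportions."""
    total = sum(counts)
    return [c / total for c in counts]

leaf_probs = {leaf: leaf_probabilities(c) for leaf, c in leaf_counts.items()}

for leaf, probs in leaf_probs.items():
    print(leaf, [round(p, 3) for p in probs])
```

For leaf 1 this gives 361/415 for the first class and 54/415 for the second, so the leaf holds a probability rather than a hard label.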
  8. The default setting of a Decision Tree puts the Threshold at 0.5. If data with a particular label makes up the majority of a leaf node, the class is simply set to that label.
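The majority rule on slide 8 is the same as comparing a leaf's class probability against 0.5. The sketch below makes the threshold an explicit parameter. `classify` is a hypothetical helper, the example leaf is the one with value = [107, 55], and treating the second count as the positive class is an assumption.

```python
def classify(prob_positive, threshold=0.5):
    """Return 1 when the positive-class probability reaches the threshold, else 0."""
    return 1 if prob_positive >= threshold else 0

# Share of the second class in the leaf with value = [107, 55] (about 0.34).
prob_positive = 55 / 162

default_label = classify(prob_positive)        # majority rule, threshold 0.5
lowered_label = classify(prob_positive, 0.3)   # same leaf, lower threshold

print(default_label, lowered_label)
```

The leaf itself does not change. Only the cut-off moves, and with it the predicted class for every sample that lands in that leaf.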
  9. If so, we can apply everything we have learned so far in exactly the same way. We can draw a ROC curve by moving the Threshold, right?
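To make slide 9 concrete, the leaf counts from the tree figure are enough to compute a few ROC coordinates by hand. This is only a sketch under stated assumptions: the second count in each leaf is treated as the positive class, the rates are computed on the same 891 samples that appear in the figure, and `roc_point` is a hypothetical helper.

```python
# [negatives, positives] per leaf, copied from the `value` fields in the figure.
leaf_counts = [[361, 54], [107, 55], [76, 149], [5, 84]]

def roc_point(leaf_counts, threshold):
    """Return (1-Specificity, Sensitivity) when leaves at or above the threshold predict positive."""
    total_neg = sum(neg for neg, _ in leaf_counts)
    total_pos = sum(pos for _, pos in leaf_counts)
    tp = fp = 0
    for neg, pos in leaf_counts:
        if pos / (neg + pos) >= threshold:  # this leaf is predicted positive
            fp += neg
            tp += pos
    return fp / total_neg, tp / total_pos

points = {t: roc_point(leaf_counts, t) for t in (0.9, 0.5, 0.3, 0.1)}

for threshold, (fpr, tpr) in points.items():
    print(threshold, round(fpr, 3), round(tpr, 3))
```

Because this tree has only four leaves, it can produce only four distinct probabilities, so its ROC curve is made of a handful of points joined by straight segments.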
  10. By looking at histograms built from the Probability Predictions of different models, we can even guess the shape of their ROC curves.

    Decision Tree Features: ['Pclass', 'Sex', 'Family', 'C', 'Q', 'S']
    Decision Tree Features: ['Pclass', 'Family']
  11. Today's mission

    1. Build a classification model that performs better than the models above, and compare how its seaborn.displot looks
    2. Move the Threshold around and find a few coordinates on the ROC curve
  12. Next time

    1. Complete the ROC curve by moving the Threshold systematically
    2. Compare the performance of several models using sklearn.metrics.roc_curve
    3. Find the optimal Threshold
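The deck does not say which criterion it will use for the "optimal" Threshold. One common choice, assumed here purely for illustration, is Youden's J statistic (Sensitivity + Specificity - 1), which picks the threshold whose ROC point lies farthest above the diagonal. The probabilities below are made up and `best_threshold_by_youden` is a hypothetical helper. In practice the false positive rates, true positive rates, and thresholds returned by `sklearn.metrics.roc_curve` could feed the same calculation.

```python
def best_threshold_by_youden(scores, labels):
    """Return (threshold, J) maximising J = Sensitivity - (1-Specificity)."""
    pos = sum(labels)
    neg = len(labels) - pos
    best_t, best_j = None, -1.0
    for t in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        j = tp / pos - fp / neg
        if j > best_j:  # keep the first threshold that reaches the maximum
            best_t, best_j = t, j
    return best_t, best_j

# Assumed toy data: predicted probabilities and true labels.
probs = [0.9, 0.8, 0.6, 0.4, 0.7, 0.3, 0.2, 0.1]
labels = [1, 1, 1, 1, 0, 0, 0, 0]

best_threshold, best_j = best_threshold_by_youden(probs, labels)
print(best_threshold, best_j)
```

In this toy data the best cut-off under Youden's J is 0.4 rather than 0.5, which is exactly the kind of shift the talk is about. Other criteria, such as weighting false negatives more heavily, would generally select a different threshold.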
  13. The End