Exploratory: Introduction to Random Forest

An introduction to random forest, a machine learning algorithm widely used by data scientists, and to how it is used in Exploratory.

Kan Nishida

July 03, 2019

Transcript

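  1. Speaker: Kan Nishida, CEO, Exploratory. @KanAugust

    Bio: Founded Exploratory, Inc. in 2016 to democratize data science. While serving as CEO of Exploratory, Inc., he works to spread and teach the cutting-edge data science practiced in Silicon Valley through data science boot camps and training. At Oracle's US headquarters he led data science development teams for 16 years and shipped many products in machine learning, big data, business intelligence, and databases.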
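  2. Democratization of data science: three waves.

    First wave (1976): proprietary algorithms (expensive/old); user experience: UI & programming; theme: monetization; users: statisticians.
    Second wave (2000): open-source algorithms (free/cutting-edge); user experience: programming; theme: commoditization; users: data scientists.
    Third wave (2016): open-source algorithms (free/cutting-edge); user experience: UI & automation; theme: democratization; users: business users; tool: Exploratory.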
  3. What we want to know (Salary) and attribute data (the remaining columns).

    Salary  Age  Job Role           Years of Service  Gender
    10,000  60   Manager            24                Male
    3,000   40   Sales Rep          3                 Female
    11,000  50   Research Director  35                Female
    4,000   20   HR Rep             4                 Male
    5,000   30   HR Rep             5                 Female
    10,000  45   Manager            20                Female
  4. Predicting with a model (random forest).

    Data without answers:
    Monthly Income  Age  Job Role           Department  Gender
    ?               60   Manager            Sales       Male
    ?               40   Sales Rep          R&D         Female
    ?               30   Research Director  HR          Female

    Monthly Income  Age  Job Role           Department  Gender
    10,000          60   Manager            HR          Male
    11,000          40   Research Director  R&D         Female
    4,000           30   HR Rep             HR          Female
  5. Original data.

    Mother Age  Father Age  Weight  Plurality  State  Is Premature
    40          42          5.5     1          CA     TRUE
    33          33          6.7     1          NY     FALSE
    32          36          7.0     1          WA     FALSE
    28          28          4.5     2          NC     TRUE
    24          26          6.0     1          MI     FALSE
    28          26          6.7     1          AZ     FALSE
    43          40          7.6     1          TX     FALSE
    38          33          4.2     2          FL     TRUE
    34          32          5.7     1          CA     FALSE
    29          33          5.2     1          NY     TRUE
  6. Target column to predict: is_premature (whether the birth was premature).

    The data is the same ten rows as on the previous slide, with the columns Mother Age, Father Age, Weight, Plurality, State, and is_premature.
  7. Rows and columns are sampled at random from the data. The is_premature column is the prediction target, so it is always included in every sample.

    Sample 1:
    Mother Age  Weight  is_premature
    40          5.5     TRUE
    33          6.7     FALSE
    32          7.0     FALSE
    28          4.5     TRUE

    Sample 2:
    Mother Age  Plurality  State  is_premature
    28          1          AZ     FALSE
    43          1          TX     FALSE
    38          2          FL     TRUE

    Sample 3:
    Father Age  State  is_premature
    33          FL     TRUE
    32          CA     FALSE
    33          NY     TRUE
  8. Models are built from the sampled data.

    The slide repeats the three samples from the previous slide: (Mother Age, Weight, is_premature), (Mother Age, Plurality, State, is_premature), and (Father Age, State, is_premature).
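The sampling step described on these slides can be sketched in a few lines of Python. This is only an illustration of the slide's simplified picture, not Exploratory's implementation: the helper name `sample_rows_and_columns` is hypothetical, drawing rows with replacement is an assumption (the slides do not say), and many random forest implementations sample candidate columns at each split rather than once per tree.

```python
import random

# Original data from the slides; the last column is the prediction target.
COLUMNS = ["Mother Age", "Father Age", "Weight", "Plurality", "State", "is_premature"]
ROWS = [
    (40, 42, 5.5, 1, "CA", True),
    (33, 33, 6.7, 1, "NY", False),
    (32, 36, 7.0, 1, "WA", False),
    (28, 28, 4.5, 2, "NC", True),
    (24, 26, 6.0, 1, "MI", False),
    (28, 26, 6.7, 1, "AZ", False),
    (43, 40, 7.6, 1, "TX", False),
    (38, 33, 4.2, 2, "FL", True),
    (34, 32, 5.7, 1, "CA", False),
    (29, 33, 5.2, 1, "NY", True),
]
TARGET = "is_premature"


def sample_rows_and_columns(rows, columns, target, n_rows, n_features, rng):
    """Hypothetical helper: draw one random sample of rows and columns."""
    features = [c for c in columns if c != target]
    chosen = rng.sample(features, n_features)  # random subset of predictor columns
    # Keep the original column order and always append the target column.
    kept = [c for c in columns if c in chosen] + [target]
    positions = [columns.index(c) for c in kept]
    # Assumption: rows are drawn with replacement (a bootstrap sample).
    picked = [rng.choice(rows) for _ in range(n_rows)]
    return kept, [tuple(row[i] for i in positions) for row in picked]


rng = random.Random(0)  # fixed seed so the sketch is repeatable
samples = [sample_rows_and_columns(ROWS, COLUMNS, TARGET, 4, 2, rng) for _ in range(3)]
for cols, rows in samples:
    print(cols)
    for row in rows:
        print("  ", row)
```

Each of the three samples contains a different subset of predictor columns, but `is_premature` is present in all of them, which is what lets a separate model be trained on each sample.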
  12. Accuracy.

                     Predicted TRUE  Predicted FALSE
    Actual TRUE      5               15
    Actual FALSE     15              195

    Accuracy = (5 + 195) / (5 + 15 + 15 + 195) = 200 / 230 ≈ 0.870
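As a quick check of the accuracy calculation, the sketch below computes accuracy from the four cells of the confusion matrix on this slide (5, 15, 15, 195). The function name and the `tp`/`fn`/`fp`/`tn` variable names are illustrative choices, not taken from the deck.

```python
def accuracy(tp, fn, fp, tn):
    """Share of all predictions that were correct."""
    return (tp + tn) / (tp + fn + fp + tn)


# Cells of the confusion matrix shown on the slide.
tp, fn, fp, tn = 5, 15, 15, 195

model_accuracy = accuracy(tp, fn, fp, tn)

# For comparison: a rule that always predicts FALSE gets every actual FALSE
# right and every actual TRUE wrong.
always_false_accuracy = accuracy(0, tp + fn, 0, fp + tn)

print(f"model:        {model_accuracy:.3f}")
print(f"always FALSE: {always_false_accuracy:.3f}")
```

With these counts the four cells sum to 230, so accuracy is 200 / 230. The always-FALSE comparison scores 210 / 230, which suggests why accuracy alone can look reassuring when TRUE cases are rare.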
  13. Types of prediction results.

                     Predicted TRUE                  Predicted FALSE
    Actual TRUE      True Positive                   Type 2 error (false negative)
    Actual FALSE     Type 1 error (false positive)   True Negative
  14. Precision.

                     Predicted TRUE  Predicted FALSE
    Actual TRUE      5               15
    Actual FALSE     15              200

    Precision = 5 / (5 + 15) = 25%
  15. Same confusion matrix. Of the cases predicted TRUE, only 25% were actually TRUE (5 / (5 + 15)).

    Precision = 5 / (5 + 15) = 25%
  16. Same confusion matrix. Of the cases predicted TRUE, only 25% were actually TRUE (5 / (5 + 15)), so 75% of the TRUE predictions are Type 1 errors.

    Precision = 5 / (5 + 15) = 25%
  17. Recall.

                     Predicted TRUE  Predicted FALSE
    Actual TRUE      5               15
    Actual FALSE     15              200

    Recall = 5 / (5 + 15) = 25%
  18. Same confusion matrix. Of the cases that were actually TRUE, only 25% were predicted as TRUE.

    Recall = 5 / (5 + 15) = 25%
  19. Same confusion matrix. Of the cases that were actually TRUE, only 25% were predicted as TRUE, so 75% of the actual TRUE cases are Type 2 errors.

    Recall = 5 / (5 + 15) = 25%
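The precision and recall figures on these slides can be reproduced with a short sketch. The function and variable names are illustrative; the counts are the ones shown in the confusion matrix (5, 15, 15, 200), read with rows as actual values and columns as predictions.

```python
def precision(tp, fp):
    """Of everything predicted TRUE, the share that was actually TRUE."""
    return tp / (tp + fp)


def recall(tp, fn):
    """Of everything actually TRUE, the share that was predicted TRUE."""
    return tp / (tp + fn)


# Confusion matrix cells from the slides.
tp = 5    # actual TRUE,  predicted TRUE
fn = 15   # actual TRUE,  predicted FALSE (Type 2 error)
fp = 15   # actual FALSE, predicted TRUE  (Type 1 error)
tn = 200  # actual FALSE, predicted FALSE

p = precision(tp, fp)
r = recall(tp, fn)
print(f"precision = {p:.0%}, share of TRUE predictions that are Type 1 errors = {1 - p:.0%}")
print(f"recall    = {r:.0%}, share of actual TRUE cases that are Type 2 errors = {1 - r:.0%}")
```

The two metrics happen to be equal here only because the Type 1 and Type 2 error counts are both 15; precision divides by the predicted-TRUE column total, while recall divides by the actual-TRUE row total.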
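  20. Synthesizing minority-class data with SMOTE

    • To increase the amount of minority-class data by synthesizing new rows (oversampling), an algorithm called SMOTE (Synthetic Minority Oversampling Technique) can be used.
    • In Exploratory, it can be configured from the properties in the Analytics view.
    • It can also be run as a data wrangling step.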
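To give a feel for what "synthesizing" minority data means, here is a deliberately simplified sketch of the core SMOTE idea: pick a minority-class point, pick one of its nearest minority-class neighbours, and create a new point somewhere on the line between them. It is not Exploratory's implementation or a complete SMOTE; the function name `smote_sketch` and its parameters are hypothetical, the features are left unscaled for brevity, and the example points are the (Mother Age, Weight) values of the premature rows in the deck's sample data.

```python
import math
import random


def smote_sketch(minority, n_new, k, rng):
    """Hypothetical helper: interpolate new points between minority neighbours."""
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # The k nearest minority-class neighbours of the chosen point.
        others = [p for p in minority if p is not base]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, neighbour)))
    return synthetic


# (Mother Age, Weight) for the rows where is_premature is TRUE.
minority = [(40, 5.5), (28, 4.5), (38, 4.2), (29, 5.2)]

rng = random.Random(42)  # fixed seed so the sketch is repeatable
new_points = smote_sketch(minority, n_new=6, k=2, rng=rng)
for age, weight in new_points:
    print(f"age={age:.1f}, weight={weight:.2f}")
```

Because every synthetic point lies between two real minority points, the new rows stay inside the range of the observed minority data instead of simply duplicating existing rows.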
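  21. Decision trees. Diagram: data → random samples (sample, sample, sample, …) → votes (vote, vote, vote) → conclusion.

    Because random samples are used, the results contain some randomness.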
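The voting step in this diagram can be sketched as a simple majority vote over the predictions of the individual trees. The function name `majority_vote` and the example votes are illustrative assumptions; the deck does not show how ties or probabilities are handled.

```python
from collections import Counter


def majority_vote(votes):
    """Return the prediction that the largest number of trees voted for."""
    return Counter(votes).most_common(1)[0][0]


# Hypothetical predictions from three trees for one row.
tree_votes = [True, False, True]
conclusion = majority_vote(tree_votes)
print(conclusion)  # True
```

Since each tree is trained on a different random sample, the individual votes (and occasionally the conclusion) can change from run to run unless the random seed is fixed.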
  23. Combinations of data types

    • Category vs. numeric
    • Numeric vs. numeric
    • Category vs. category