Exploratory: An Introduction to Decision Trees and How to Use Them
Kan Nishida
June 27, 2019
An introduction to decision trees, one of the best-known machine learning algorithms, and a walkthrough of how to use them in Exploratory.
Transcript
EXPLORATORY
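Speaker: Kan Nishida, CEO, Exploratory (@KanAugust)
Bio: Founded Exploratory, Inc. in 2016 to democratize data science. Alongside serving as CEO of Exploratory, Inc., he works to spread and teach the cutting-edge data science practiced in Silicon Valley through data science boot camp training and other programs. Before that, at Oracle's US headquarters, he led data science development teams for 16 years and shipped many products in machine learning, big data, business intelligence, and databases.
Vision: Using data to make better decisions becomes the norm.
Mission: Democratize data science.
The third wave: Data science, AI, and machine learning are not only for statisticians and developers. Anyone with an interest in data should be able to analyze business data easily with the world's most advanced algorithms. Exploratory makes that world possible.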
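Democratizing data science: the three waves
                 First wave (1976)             Second wave (2000)                Third wave (2016)
Theme            Monetization                  Commoditization                   Democratization
Users            Statisticians                 Data scientists                   Business users
Algorithms       Proprietary (expensive, old)  Open source (free, cutting-edge)  Open source (free, cutting-edge)
User experience  UI & programming              Programming                       UI & automation
Tools                                                                            Exploratory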
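Tasks that Exploratory makes easy through a UI: questions, data access, data wrangling, visualization, statistics, machine learning and AI, communicating results
EXPLORATORY Online Seminar
Analytics: Decision Trees
What is data analysis? Finding correlations and patterns.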
Salary  Age  Job role           Years of service  Gender
10,000  60   Manager            24                Male
3,000   40   Sales Rep          3                 Female
11,000  50   Research Director  35                Female
4,000   20   HR Rep             4                 Male
5,000   30   HR Rep             5                 Female
10,000  45   Manager            20                Female
Salary is what we want to know; the other columns are attribute data.
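Visualize!
Salary vs. job role
Salary vs. years of service
Salary vs. job level
From data to correlations and patterns: visualize the data and check each correlation and pattern one by one, by eye.
Tedious!
Analytics!
From data to correlations and patterns through analytics (machine learning, statistics): use analytics to find correlations and patterns effectively.
Building a predictive model: data plus an algorithm produce a decision tree model.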
Predicting with the decision tree model
Data the model is built from:
Monthly Income  Age  Job Role           Department  Gender
10,000          60   Manager            HR          Male
11,000          40   Research Director  R&D         Female
4,000           30   HR Rep             HR          Female
Data without answers, to be predicted:
Monthly Income  Age  Job Role           Department  Gender
?               60   Manager            Sales       Male
?               40   Sales Rep          R&D         Female
?               30   Research Director  HR          Female
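A model is built from the patterns found in the data.
The decision tree model knows which variables are more strongly correlated with the outcome and what kind of relationship they have with it.
Visualizing the insights obtained through analytics (machine learning, statistics) makes the correlations and patterns in the data intuitive to understand.
Decision Tree: a method that predicts an outcome through a series of questions and the branches determined by their answers.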
Baby  Baby weight  Number of babies  Premature?
A     5.2          1                 TRUE
B     4.7          2                 TRUE
C     6.8          1                 FALSE
D     7.2          1                 FALSE
E     5.1          2                 TRUE
Z     5.8          1                 ?
Will this baby (Z) be born prematurely?
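Let's build a decision tree that predicts premature birth.
Visualize the relationship between baby weight and number of babies.
[Figure: scatter plot of baby weight against number of babies]
Color the points by whether the birth was premature. One color marks premature births; blue marks births that were not premature.
Draw lines to separate the points into groups of the same color as cleanly as possible, using as few lines as possible.
First, the points can be grouped by whether baby weight is 5.5 or more (baby weight >= 5.5).
Next, the points can be broadly grouped by whether the number of babies is more than 1.5 (number of babies > 1.5).
The resulting groups:
Premature: Yes. Premature rate: 100%. Share of all data: 40%
Premature: No. Premature rate: 0%. Share of all data: 40%
Premature: No. Premature rate: 40%. Share of all data: 20%
As a tree: the first question is baby weight >= 5.5 (TRUE / FALSE), the second is number of babies > 1.5 (TRUE / FALSE), and each leaf gives a probability of premature birth: 0%, 40%, or 100%.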
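The two-question tree above can be read as a pair of nested conditions. The sketch below is only an illustration in Python: the function name is hypothetical, and the assignment of the 0%, 40%, and 100% leaves to particular branches is an assumption, because the slide layout did not survive extraction.

```python
def premature_probability(baby_weight: float, number_of_babies: int) -> float:
    """Walk the two-question tree and return the leaf's premature-birth rate.

    Assumption: the leaf-to-branch mapping below is a guess, not taken
    from the slides.
    """
    if baby_weight >= 5.5:        # first question
        return 0.0                # assumed leaf: 0% premature
    if number_of_babies > 1.5:    # second question, asked only for lighter babies
        return 1.0                # assumed leaf: 100% premature
    return 0.4                    # assumed leaf: 40% premature


# Baby Z from the table: weight 5.8, one baby.
baby_z = premature_probability(5.8, 1)
print(baby_z)
```

Under these assumptions, baby Z is answered by the first question alone, which is the point of putting the most informative question at the top.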
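How is the tree built?
Which question (condition) should come first?
Gini impurity
• Takes a value between 0 and 1.
• Measures how mixed the data in each node is.
If n is the number of distinct values (classes) in the node and $p_i$ is the proportion of samples that have the i-th value, Gini impurity is computed as:
$$1 - p_1^2 - p_2^2 - p_3^2 - \dots - p_n^2 = 1 - \sum_{i=1}^{n} p_i^2$$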
Impurity = 0: six babies, all not premature. 1 - (0/6)^2 - (6/6)^2 = 0
Impurity = 0: six babies, all premature. 1 - (6/6)^2 - (0/6)^2 = 0
Impurity = 0.44: two premature, four not premature. 1 - (2/6)^2 - (4/6)^2 = 0.44
Impurity = 0.44: four premature, two not premature. 1 - (4/6)^2 - (2/6)^2 = 0.44
Impurity = 0.5: three premature, three not premature. 1 - (3/6)^2 - (3/6)^2 = 0.5
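The node examples above can be checked with a few lines of Python. This is a minimal sketch: the function name `gini_impurity` and the use of `True`/`False` for premature and not premature are choices made for illustration, not something taken from Exploratory.

```python
from collections import Counter


def gini_impurity(labels):
    """Gini impurity of a node: 1 minus the sum of squared class proportions."""
    total = len(labels)
    if total == 0:
        return 0.0
    return 1.0 - sum((count / total) ** 2 for count in Counter(labels).values())


# True = premature, False = not premature
all_not_premature = [False] * 6
two_premature = [True] * 2 + [False] * 4
half_and_half = [True] * 3 + [False] * 3

print(round(gini_impurity(all_not_premature), 2))  # 0.0
print(round(gini_impurity(two_premature), 2))      # 0.44
print(round(gini_impurity(half_and_half), 2))      # 0.5
```

A pure node scores 0, and an evenly mixed two-class node scores the two-class maximum of 0.5.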
Start: ten babies, five premature and five not premature. Impurity: 0.5
Case 1: grouping by baby weight first
Split: baby weight >= 5.5 (TRUE / FALSE)
Impurity of the TRUE group: 0
Impurity of the FALSE group: 1 - (2/7)^2 - (5/7)^2 = 0.41
Impurity after the split, weighted by group size: 3/10 * 0 + 7/10 * 0.41 = 0.29
Impurity before the split: 0.5
Reduction in impurity: 0.5 - 0.29 = 0.21
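The weighted-impurity arithmetic above can be sketched as follows. The helper names are hypothetical. The class counts are inferred from the fractions on the slide (3 babies in the TRUE group, 7 in the FALSE group split 5 to 2); which class has 5 and which has 2 is an assumption, and it does not affect the result.

```python
def gini_from_counts(counts):
    """Gini impurity from per-class sample counts."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return 1.0 - sum((c / total) ** 2 for c in counts)


def impurity_reduction(parent_counts, children_counts):
    """Parent impurity minus the size-weighted impurity of the child groups."""
    total = sum(parent_counts)
    weighted = sum(
        sum(child) / total * gini_from_counts(child) for child in children_counts
    )
    return gini_from_counts(parent_counts) - weighted


# Counts are (premature, not premature).
start = (5, 5)
weight_split = [(0, 3), (5, 2)]  # TRUE group, FALSE group (assumed assignment)

print(round(impurity_reduction(start, weight_split), 2))  # 0.21
```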
Case 2: grouping by number of babies first
Split: number of babies > 1.5 (TRUE / FALSE)
Impurity of one group: 1 - (2/5)^2 - (3/5)^2 = 0.48
Impurity of the other group: 1 - (3/5)^2 - (2/5)^2 = 0.48
Impurity after the split, weighted by group size: 5/10 * 0.48 + 5/10 * 0.48 = 0.48
Impurity before the split: 0.5
Reduction in impurity: 0.5 - 0.48 = 0.02
Comparing the reductions in impurity: number of babies 0.02 < baby weight 0.21
Baby weight reduces impurity more than number of babies does.
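Choosing the first question then comes down to picking the candidate split with the largest impurity reduction. The sketch below assumes the class counts implied by the fractions on the slides; the dictionary keys and helper names are illustrative only.

```python
def gini(counts):
    """Gini impurity from per-class sample counts."""
    total = sum(counts)
    return 1.0 - sum((c / total) ** 2 for c in counts) if total else 0.0


def weighted_gini(children):
    """Size-weighted average impurity of the groups produced by a split."""
    total = sum(sum(child) for child in children)
    return sum(sum(child) / total * gini(child) for child in children)


# Counts are (premature, not premature); the group assignments are assumptions.
parent = (5, 5)
candidate_splits = {
    "baby weight >= 5.5": [(0, 3), (5, 2)],
    "number of babies > 1.5": [(3, 2), (2, 3)],
}

reductions = {
    question: gini(parent) - weighted_gini(children)
    for question, children in candidate_splits.items()
}
best_question = max(reductions, key=reductions.get)

for question, reduction in reductions.items():
    print(f"{question}: {reduction:.2f}")
print("ask first:", best_question)
```

A real tree builder would repeat this comparison inside every new group until the groups are pure or a stopping rule applies.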
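Build a tree that groups by baby weight first and then by number of babies.
[Figure: tree with baby weight >= 5.5 as the first question and number of babies > 1.5 as the second; leaves 100%, 50%, 0%]
This needs fewer questions (branches) than the alternative order.
Build a tree that groups by number of babies first and then by baby weight.
[Figure: tree with number of babies > 1.5 as the first question and baby weight >= 5.5 asked on both branches; leaves 50%, 100%, 100%, 100%]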
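Classification vs. regression
Classification: will the baby be born prematurely?
Regression: when will the baby be born?
[Figure: classification example, scatter plot of baby weight against number of babies]
[Figure: regression example, plot with axes Mother Age and Father Age]
Predicting the gestation period
[Figure: regression tree splitting on Is_Plural (TRUE / FALSE) and Over_35 (TRUE / FALSE), for gestation periods between 20 and 40 weeks; leaves 20 weeks, 30 weeks, 37 weeks]
Each leaf predicts the mean of its group, and branches are split so that the spread around that mean is as small as possible.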
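For regression, the same idea can be sketched by measuring spread as the sum of squared deviations from each group's mean and choosing the question that leaves the least spread. The rows below are hypothetical values invented for illustration, and the lowercase column names `is_plural` and `over_35` simply mirror the labels on the slide.

```python
def squared_spread(values):
    """Sum of squared deviations from the mean (0 for an empty group)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def spread_after_split(rows, column):
    """Total spread of gestation weeks after splitting on a TRUE/FALSE column."""
    yes = [r["gestation_weeks"] for r in rows if r[column]]
    no = [r["gestation_weeks"] for r in rows if not r[column]]
    return squared_spread(yes) + squared_spread(no)


# Hypothetical data, not from the deck.
rows = [
    {"is_plural": True, "over_35": False, "gestation_weeks": 33},
    {"is_plural": True, "over_35": True, "gestation_weeks": 31},
    {"is_plural": True, "over_35": False, "gestation_weeks": 34},
    {"is_plural": False, "over_35": True, "gestation_weeks": 38},
    {"is_plural": False, "over_35": False, "gestation_weeks": 40},
    {"is_plural": False, "over_35": False, "gestation_weeks": 39},
    {"is_plural": False, "over_35": True, "gestation_weeks": 39},
]

spreads = {col: spread_after_split(rows, col) for col in ("is_plural", "over_35")}
best_column = min(spreads, key=spreads.get)
print(spreads, best_column)
```

With this made-up data, splitting on `is_plural` leaves far less spread than splitting on `over_35`, so it would be asked first.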
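Analytics: trying out a decision tree
Create the is_premature column
If the data has no is_premature column, create a new logical column from the gestation_weeks column that records whether the birth was premature (less than 37 weeks): gestation_weeks < 37
From the column header menu of the gestation_weeks column, select Create Calculation (Mutate).
• Enter is_premature as the column name.
• Enter gestation_weeks<37 as the calculation.
The new column contains TRUE when gestation is under 37 weeks and FALSE when it is 37 weeks or more.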
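In Exploratory this step is done through the Mutate dialog described above. Purely as an illustration of the same logic outside the tool, here is a Python sketch; the sample rows are hypothetical.

```python
# Hypothetical rows; only the gestation_weeks column matters here.
births = [
    {"gestation_weeks": 35},
    {"gestation_weeks": 37},
    {"gestation_weeks": 40},
]

# Same rule as the calculation gestation_weeks < 37.
for birth in births:
    birth["is_premature"] = birth["gestation_weeks"] < 37

print([birth["is_premature"] for birth in births])  # [True, False, False]
```

Note that exactly 37 weeks counts as not premature, because the comparison is strict.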
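Decision tree analytics
Select the column to predict.
Select the predictor columns: choose every column except gestation_weeks.
The decision tree has been created.
Start (root node): majority class FALSE (not premature). Share of TRUE (premature): 12%. Share of all data at this point: 100%
Condition: is weight (weight_pounds) 5.3 pounds or more?
Node with majority class FALSE: share of TRUE (premature): 8%. Share of all data at this point: 94%
Node with majority class TRUE: share of TRUE (premature): 72%. Share of all data at this point: 6%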
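The node summaries above are consistent with each other: weighting each child node's premature rate by its share of the data gives back the rate at the start node. A quick check in Python, using only the percentages shown on the tree (the dictionary keys are illustrative names):

```python
child_nodes = [
    {"majority": False, "premature_rate": 0.08, "share_of_data": 0.94},
    {"majority": True, "premature_rate": 0.72, "share_of_data": 0.06},
]

# Overall rate = sum over child nodes of (share of data * rate within the node).
root_rate = sum(n["premature_rate"] * n["share_of_data"] for n in child_nodes)
print(round(root_rate, 2))  # 0.12
```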
Check by visualizing the data
Is_Premature vs. Weight
An example using employee data
Build a decision tree model
Check by visualizing the data
Attrition vs. Overtime
Attrition vs. Monthly Income
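Features
• No programming: the course uses Exploratory, a UI for the R language, as the analysis tool, so participants can focus 100% on learning the data science methods needed to solve business problems.
• No vendor lock-in to the analysis tool: everything done in Exploratory can be reproduced in a standalone open-source R environment.
• Thinking skills as well as technical skills: participants acquire not only data science skills but also the way of thinking that data analysis requires.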
Q & A
Contact
Email: [email protected]
Website: https://ja.exploratory.io
Boot camp training: https://ja.exploratory.io/training-jp
Twitter: @KanAugust