About Missing Values
bk
August 09, 2019
Science
Transcript
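About Missing Value Handling: In Search of Lost Values
Contents 1. What are missing values? 2. Removing missing values 3. Types of missingness 4. Single imputation 5. Multiple imputation 6. Topics not covered this time 7. References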
What are missing values? Missing value (in Japanese: 欠損値, 欠測値)
"In statistics, missing data, or missing values, occur when no data value is stored for the variable in an observation. Missing data are a common occurrence and can have a significant effect on the conclusions that can be drawn from the data." (https://en.wikipedia.org/wiki/Missing_data)
Why are missing values a problem? Calculations stop working.
Q: What is the mean of 10, 20, and a missing value?
> mean(c(10, 20, NA))
[1] NA
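The result above follows from how R propagates NA through arithmetic. As a small sketch using only base R, the snippet below counts the missing entries and computes the mean of the observed values; the variable names are made up for illustration.

```r
x <- c(10, 20, NA)

# Any arithmetic involving NA yields NA, so the default mean is NA.
mean_default <- mean(x)

# is.na() flags the missing entries; summing the flags counts them.
n_missing <- sum(is.na(x))

# na.rm = TRUE drops the NA first, so only the observed values are averaged.
mean_observed <- mean(x, na.rm = TRUE)

print(mean_default)
print(n_missing)
print(mean_observed)
```

Dropping the NA inside `mean()` is already a form of deletion, so the question of when deletion is safe applies here as well.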
What can be done about them? 1. Delete them 2. Fill them in
Option 1, deleting, comes first.
Removing missing values: delete every record that contains a missing value. This is listwise deletion.
Before:
weight  sex
50      F
70      M
55      NA
NA      M
After:
weight  sex
50      F
70      M
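Listwise deletion of the four records above can be sketched in base R as follows. The column names `weight` and `sex` and the codes "F" and "M" are English stand-ins for the labels on the slide.

```r
# The four records from the slide: weight and sex.
d <- data.frame(
  weight = c(50, 70, 55, NA),
  sex    = c("F", "M", NA, "M"),
  stringsAsFactors = FALSE
)

# complete.cases() is TRUE only for rows with no NA in any column.
keep <- complete.cases(d)

# Listwise deletion: keep only the complete rows.
d_complete <- d[keep, ]

# na.omit() performs the same row-wise removal in one call.
d_omit <- na.omit(d)

print(d_complete)
```

Half of the records are lost even though each dropped row still carried one observed value, which is the loss of efficiency that deletion brings.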
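Is it OK to delete? 1. Cases where deleting is fine 2. Cases where deleting is not fine
Missing completely at random: the missingness pattern is unrelated to sex or weight. Deleting does not introduce bias, although efficiency drops.
Missing depending on an observed variable: for example, women's weights are missing more often. Deleting removes mostly women's data, so the result is biased.
Missing depending on the missing variable itself: for example, heavier weights are missing more often. Deleting removes the heavier records, so the result is biased.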
Types of missingness
‣ Missing completely at random → MCAR (Missing Completely At Random): OK to delete
‣ Missing depending on observed variables → MAR (Missing At Random)
‣ Missing depending on the missing variable itself → NMAR (Not Missing At Random): hard to deal with
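The effect of each mechanism on deletion can be checked with a small simulation. This is only a sketch: the population (group means of 55 and 68, standard deviation 8) and the missingness probabilities are invented for illustration and do not come from the slides.

```r
set.seed(1)
n <- 10000

# Hypothetical population: sex and weight, with men heavier on average.
sex    <- sample(c("F", "M"), n, replace = TRUE)
weight <- rnorm(n, mean = ifelse(sex == "M", 68, 55), sd = 8)

# MCAR: every weight has the same 30% chance of being missing.
miss_mcar <- runif(n) < 0.3

# MAR: missingness depends only on the observed variable sex
# (women 50%, men 10%).
miss_mar <- runif(n) < ifelse(sex == "F", 0.5, 0.1)

# NMAR: missingness depends on the weight itself
# (the heavier the record, the more likely it is missing).
miss_nmar <- runif(n) < plogis((weight - mean(weight)) / 5)

# Mean weight after deleting the missing records under each mechanism.
mean_full <- mean(weight)
mean_mcar <- mean(weight[!miss_mcar])
mean_mar  <- mean(weight[!miss_mar])
mean_nmar <- mean(weight[!miss_nmar])

print(round(c(full = mean_full, mcar = mean_mcar,
              mar = mean_mar, nmar = mean_nmar), 2))
```

Under these assumptions the MCAR mean stays close to the full-data mean, the MAR mean drifts upward because women are under-represented, and the NMAR mean drifts downward because heavy records are removed.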
Single imputation: option 2, filling in. What values should be used to fill in?
1. The mean (mean imputation)
2. A predicted value (deterministic regression imputation)
3. A predicted value + error (stochastic regression imputation)
Mean imputation: compute the mean and substitute it for the missing value. In the example, the mean ≒ the mean weight of men.
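Mean imputation on the weight column of the earlier table can be sketched in base R like this. It uses the overall mean of the observed weights, which is one possible reading; a group mean (for example by sex) would be computed in the same way on a subset.

```r
weight <- c(50, 70, 55, NA)

# Mean of the observed weights only.
m <- mean(weight, na.rm = TRUE)

# Replace each NA with that mean; observed values are left untouched.
weight_imputed <- ifelse(is.na(weight), m, weight)

print(weight_imputed)

# Side effect: the imputed value adds no spread, so the variance shrinks.
var_observed <- var(weight, na.rm = TRUE)
var_imputed  <- var(weight_imputed)
print(c(observed = var_observed, imputed = var_imputed))
```

The drop in variance is the usual objection to mean imputation and the reason for moving on to model-based methods.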
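(Deterministic) regression imputation: build a model that predicts the missing values.
[Figure: the variable with missing values plotted against the auxiliary variable*]
*Auxiliary variable: a variable used to predict the missing values. It is not necessarily used to predict the target variable.
All imputed values fall on the regression line, so the error of the predicted values is underestimated.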
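A minimal sketch of deterministic regression imputation with base R's `lm()` and `predict()` follows. The data are simulated: `x` stands for the auxiliary variable and `y` for the variable with missing values, and the coefficients are arbitrary.

```r
set.seed(2)
n <- 2000

# Hypothetical data: x is the auxiliary variable, y has missing values.
x <- rnorm(n, mean = 170, sd = 8)
y <- -80 + 0.85 * x + rnorm(n, sd = 6)
y[sample(n, 600)] <- NA   # 600 values made missing completely at random

d <- data.frame(x = x, y = y)
miss <- is.na(d$y)

# Fit the imputation model; lm() drops rows with NA by default.
fit <- lm(y ~ x, data = d)

# Deterministic regression imputation: fill each NA with its prediction.
pred <- predict(fit, newdata = d[miss, , drop = FALSE])
y_det <- d$y
y_det[miss] <- pred

# Every imputed point lies exactly on the fitted line (residual zero),
# so the imputed values are less spread out than the observed ones.
resid_imputed <- y_det[miss] - pred
sd_observed <- sd(d$y[!miss])
sd_imputed  <- sd(y_det[miss])
print(max(abs(resid_imputed)))
print(c(sd_observed = sd_observed, sd_imputed = sd_imputed))
```

The smaller standard deviation of the imputed values is the underestimated error mentioned on the slide.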
Stochastic regression imputation: add an error term to the imputation model so that the imputed values reflect the scatter in the data.
[Figure: the variable with missing values plotted against the auxiliary variable, with the error term added]
The error term of the imputation model is now reflected. But what about the uncertainty of the imputation model itself?
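Adding the error term can be sketched by drawing noise with the model's residual standard deviation. Normally distributed errors are an assumption of this sketch, and the simulated data are hypothetical.

```r
set.seed(3)
n <- 2000

# Hypothetical data: x is the auxiliary variable, y has missing values.
x <- rnorm(n, mean = 170, sd = 8)
y <- -80 + 0.85 * x + rnorm(n, sd = 6)
y[sample(n, 600)] <- NA

d <- data.frame(x = x, y = y)
miss <- is.na(d$y)

fit  <- lm(y ~ x, data = d)
pred <- predict(fit, newdata = d[miss, , drop = FALSE])

# Residual standard deviation of the imputation model.
s <- summary(fit)$sigma

# Stochastic regression imputation: prediction plus a random error term.
y_sto <- d$y
y_sto[miss] <- pred + rnorm(sum(miss), mean = 0, sd = s)

sd_observed      <- sd(d$y[!miss])
sd_deterministic <- sd(pred)
sd_stochastic    <- sd(y_sto[miss])
print(c(observed = sd_observed,
        deterministic = sd_deterministic,
        stochastic = sd_stochastic))
```

The spread of the imputed values now resembles that of the observed values, but the regression coefficients are still treated as if they were known exactly.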
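Multiple imputation: impute one value (single imputation) versus impute multiple values (multiple imputation).
1. Imputation: estimate multiple imputed values and generate multiple imputed data sets.
2. Analysis: run the analysis and estimation on each imputed data set.
3. Pooling: combine the estimates into the final result.
Missing data → imputed data 1, 2, 3 → analysis results 1, 2, 3 → final result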
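The three steps can be sketched in base R. This is a simplified illustration rather than any of the named algorithms: model uncertainty is approximated by refitting the regression on a bootstrap resample, and the estimates are combined with Rubin's rules, which the slides do not name explicitly. The data and the choice of M = 20 are hypothetical.

```r
set.seed(4)
n <- 500
x <- rnorm(n, mean = 170, sd = 8)
y <- -80 + 0.85 * x + rnorm(n, sd = 6)
y[sample(n, 150)] <- NA

d <- data.frame(x = x, y = y)
miss <- is.na(d$y)
obs_rows <- which(!miss)

M <- 20             # number of imputed data sets (arbitrary for this sketch)
est  <- numeric(M)  # estimate of mean(y) from each imputed data set
uvar <- numeric(M)  # its squared standard error (within-imputation variance)

for (m in seq_len(M)) {
  # 1. Imputation: refit the model on a bootstrap resample of the complete
  #    rows, so the model itself varies between imputed data sets.
  boot <- d[sample(obs_rows, replace = TRUE), ]
  fit  <- lm(y ~ x, data = boot)
  y_m  <- d$y
  y_m[miss] <- predict(fit, newdata = d[miss, , drop = FALSE]) +
    rnorm(sum(miss), mean = 0, sd = summary(fit)$sigma)

  # 2. Analysis: the same analysis on every imputed data set (here the mean).
  est[m]  <- mean(y_m)
  uvar[m] <- var(y_m) / n
}

# 3. Pooling with Rubin's rules.
pooled_est <- mean(est)   # pooled point estimate
within     <- mean(uvar)  # average within-imputation variance
between    <- var(est)    # between-imputation variance
total_var  <- within + (1 + 1 / M) * between
pooled_se  <- sqrt(total_var)

print(c(estimate = pooled_est, se = pooled_se))
```

The between-imputation term is what single imputation leaves out: it carries the uncertainty about the imputation model into the final standard error.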
Algorithms for multiple imputation
‣ DA: Data Augmentation
‣ FCS: Fully Conditional Specification
‣ EMB: Expectation-Maximization with Bootstrapping
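Topics not covered this time
‣ Removing missing values: pairwise deletion, dropping variables
‣ Single imputation: ratio imputation, hot deck, cold deck, LOCF, NOCB
‣ Multiple imputation: imputation diagnostics, sensitivity analysis
‣ Full information maximum likelihood
‣ Automatic handling of missing values in tools such as LightGBM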
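References
‣ Takahashi Masayoshi (高橋将宜), Watanabe Michiko (渡辺美智子), Missing Data Processing: Single Imputation and Multiple Imputation with R (Statistics One Point series), Kyoritsu Shuppan, 2017/12/9, 192 pages
‣ Analyzing data with missing values, Sunny side up!, http://norimune.net/1811
‣ Hands-on with R! Introduction to missing data analysis [1], NHN TECHORUS Tech Blog, https://techblog.nhn-techorus.com/archives/6573
‣ R: handling missing values (missing value treatment), 統計学と疫学と時々、助教生活, http://jojoshin.hatenablog.com/entry/2017/02/03/220118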
ENJOY!