Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
기계학습을 활용한 게임 어뷰징 검출
Search
JeongJu Kim
August 16, 2016
Technology
1
1k
기계학습을 활용한 게임 어뷰징 검출
PyConAPAC 2016에서 발표한 문서입니다.
JeongJu Kim
August 16, 2016
Tweet
Share
More Decks by JeongJu Kim
See All by JeongJu Kim
IPython과 Pandas를 활용한 게임데이터 분석 - PyConKR 2014
haje01
7
1.8k
Other Decks in Technology
See All in Technology
Beyond Kaniko: Navigating Unprivileged Container Image Creation
f30
0
120
20250625 Snowflake Summit 2025活用事例 レポート / Nowcast Snowflake Summit 2025 Case Study Report
kkuv
1
390
KubeCon + CloudNativeCon Japan 2025 Recap by CA
ponkio_o
PRO
0
270
マーケットプレイス版Oracle WebCenter Content For OCI
oracle4engineer
PRO
3
940
低レイヤを知りたいPHPerのためのCコンパイラ作成入門 完全版 / Building a C Compiler for PHPers Who Want to Dive into Low-Level Programming - Expanded
tomzoh
4
3.4k
Lazy application authentication with Tailscale
bluehatbrit
0
140
無意味な開発生産性の議論から抜け出すための予兆検知とお金とAI
i35_267
2
9k
「良さそう」と「とても良い」の間には 「良さそうだがホンマか」がたくさんある / 2025.07.01 LLM品質Night
smiyawaki0820
1
480
2025-07-06 QGIS初級ハンズオン「はじめてのQGIS」
kou_kita
0
120
Should Our Project Join the CNCF? (Japanese Recap)
whywaita
PRO
0
310
LangChain Interrupt & LangChain Ambassadors meetingレポート
os1ma
2
260
ドメイン特化なCLIPモデルとデータセットの紹介
tattaka
2
550
Featured
See All Featured
The Myth of the Modular Monolith - Day 2 Keynote - Rails World 2024
eileencodes
26
2.9k
Mobile First: as difficult as doing things right
swwweet
223
9.7k
The World Runs on Bad Software
bkeepers
PRO
69
11k
Art, The Web, and Tiny UX
lynnandtonic
299
21k
Testing 201, or: Great Expectations
jmmastey
42
7.6k
The Art of Programming - Codeland 2020
erikaheidi
54
13k
What’s in a name? Adding method to the madness
productmarketing
PRO
23
3.5k
[Rails World 2023 - Day 1 Closing Keynote] - The Magic of Rails
eileencodes
35
2.4k
Code Reviewing Like a Champion
maltzj
524
40k
How to train your dragon (web standard)
notwaldorf
94
6.1k
Keith and Marios Guide to Fast Websites
keithpitt
411
22k
Building Applications with DynamoDB
mza
95
6.5k
Transcript
ӝ҅णਸ ഝਊೠ ѱ য࠭ Ѩ ӣ PyCon APAC 2016 PyCon
APAC 2016 1
ߊ ࣗѐ ӣ (
[email protected]
) : ѱ ѐߊ - NHN /
NPLUTO - 3D ূ / ѱ ۄ ѐߊ അ: ѱ ؘఠ ࣻ / ࠙ࢳ - Webzen NPlay - ۽Ӓ ನਕ؊, Pandas, Scikit-Learn, PySpark PyCon APAC 2016 2
ߊח 4 ӝ҅णী ೠ ӝࠄ ध ח ٜ࠙ਸ ࢚
4 ॆਸ ഝਊೠ ؘఠ ࠙ࢳҗ ӝ҅ण ࢎ۹ܳ ҕਬ 4 ѐߊҗ ࢲ࠺झী ӝ҅णਸ بੑೞח ҅ӝо غਵݶ פ PyCon APAC 2016 3
द زӝ 4 ѱ য࠭ ઁܳ 4 ਬ नҊ /
GM ݽפఠ݂ / ಁఢ ӝ۽ח ೠ҅ 4 ࢎۈ ѐੑ ୭ࣗചػ য࠭ ఐ दझమਸ ٜ݅ PyCon APAC 2016 4
ѱ য࠭ۆ? 4 “ӝദਵ۽ بೞ ঋ ߑधਵ۽ ѱ ࠁܳ
ദٙೞѢա ب ਸ ח ೯ਤ” ! 4 ࢎ۹ 4 ࢲ࠺झ ҳഅ࢚ ਸ ਊೠ ۨ 4 ೧ఊ ోਸ ࢎਊೠ ࠺࢚ ۨ 4 ହী بߓ۽ ҟҊ PyCon APAC 2016 5
ా҅৬ ఐ࢝ ؘఠ ࠙ࢳ PyCon APAC 2016 6
ࢶ, ా҅ 4 ా҅ח ೂࠗೞ ޅೠ ؘఠ৬ ஹೊ ਕ ജ҃ীࢲ
ߊ 4 ా҅ ٜ ؘఠ/҅ਸ ח ߑߨਸ োҳ 4 ৌঈೠ ജ҃ীࢲ ٜ݅যӝী, ؘఠীࢲب оܳ ߊѼೡ ࣻ 4 ӝࠄੋ ా҅ ध ѐߊ, ӝദ, ࢲ࠺झ ١ী ب ؽ PyCon APAC 2016 7
ఐ࢝ ؘఠ ࠙ࢳ 4 ؘఠী ऀযח ࠁܳ, 4 নೠ пب۽
ਃড, दпച ೧ࠁݴ ח җ ! 4 ೞח ؘఠח җࠗఠ 4 दझమ(WzDat) ѐߊ೧ ഝਊ " 4 Jupyter + Utility + Dashboard 4 https://github.com/haje01/wzdat 4 http://www.pycon.kr/2014/program/14 PyCon APAC 2016 8
ࢎ۹1 рױೠ ా҅ ই٣য۽ झಁݠ Ѩ PyCon APAC 2016 9
࢚ട 4 नӏ য়ೠ ѱ ହ ѱ ইమ ҟҊӖ۽ оٙ
! 4 ೧ ҅ਸ ઁ೧ب ߄۽ ࢜ ҅ਵ۽ ҟҊ ҅ࣘ 4 ࡅܲ ઁо ਃೞৈ, ӝ҅णਸ ೯ೞӝীח दр ࠗ PyCon APAC 2016 10
ਸ ਊೠ झಅ (Spam) 4 ѱ ղীࢲ ਵ۽ ݠפ/ইమ
౸ݒ ҟҊ 4 য࠭ח ۽Ӓ۔ ౸߹ਸ ݄ӝਤ೧ ݫ दܳ դةച PyCon APAC 2016 11
झಁݠ Ѩ 4 নೠ ߑߨ оמೞѷਵա, 4 োয ܻա ӝ҅णэ
Ҋә Ӕࠁ, 4 рױೠ ా҅ ই٣য۽ दب PyCon APAC 2016 12
ৡۄੋ ݫद ӡ ࠙ನ 4 ੌ߈ਵ۽ ۽Ӓ ӏ࠙ನܳ
ٮܲҊ ঌ ۰ઉ. 4 ইېח NPS Chat Corpus ݫद ӡ ࠙ನ PyCon APAC 2016 13
ѱ ղ ݫद ӡ ࠙ನ 4 ৡۄੋ җ ࠺तೞա
ખ ؊ فԁ ҃ ೱ 4 ౠ ӡ ݫदо (?) → झಅਵ۽ ഛੋ PyCon APAC 2016 14
ই٣য 4 ੌ߈ ਬ: ݫद ӡо নೞҊ, ࠼بо ݆ ঋ
4 झಁݠ: ݫद ӡо নೞ ঋҊ, ࠼بח ֫ 4 , যڃ ਬ ࠼بо ֫Ҋ ӡо নೞ ঋਵݶ झಁݠ PyCon APAC 2016 15
рױೠ Ѩ ҕध 4 ਬ ߹ പࣻ / ݫद
ӡ ઙܨ ࣻ 4 ࠺तೠ ӡ ݫदܳ ࠁյ ࣻ۾ ч ழ PyCon APAC 2016 16
࠙ܨ 4 spam_ratioо ӝળ ч ࢚ੋ Ѫਸ झಁݠ۽ р 4
ӝળ ч Ѿ ോܻझ౮ೞѱ... 4 ࢸ റ, ࠙ܨػ நܼఠ ݫद ഛੋਵ۽ ч ઑ PyCon APAC 2016 17
࠙ܨ റ ݫद ӡ ࠙ನ 4 ࠼بо ֫ ౠ ӡ
ݫद(= झಅ)о ܻ࠙غ PyCon APAC 2016 18
Ѿҗ ਊ 4 ҳഅ рױ೮݅, য়ఐ оמࢿ 4 ӝળ
чਸ ֫ѱ ই न܉بܳ ֫ 4 Ѿҗܳ оҊ ઁ PyCon APAC 2016 19
ѐࢶ ߑೱ 4 ӝળ ч Ѿਸ ખ ؊ җੋ ߑߨਵ۽
4 োয ܻ ӝࣿ(NLP) بੑ 4 ױয߹ ࠼ب(Ziff’s Law)৬ ਃب(TF-IDF) Ҋ۰ 4 ӝ҅ण ঌҊ્ܻ ਊ PyCon APAC 2016 20
ӝ҅ण ࣗѐ PyCon APAC 2016 21
ӝ҅णਸ ॳח ਬ 4 ֢۱ਵ۽ ҡଳ Ѿҗޛ 4 নೠ
ޙઁী ೠ ੌ߈ੋ ࣛܖ࣌ 4 ࣻ ౠࢿ(ೖ)ਸ زदী Ҋ۰ೡ ࣻ 4 ؘఠ ߸زী ъೣ(ъѤࢿ) PyCon APAC 2016 22
࠙ܨ৬ ഥӈ 4 ӝ҅ण ѱ ࠙ܨ (Classification)৬ ഥӈ (Regression)۽ ա
4 ࠙ܨ - ઙܨܳ ஏ ೞח Ѫ 4 ഥӈ - োࣘػ чਸ ஏ ೞח Ѫ 4 য࠭ Ѩ ࠙ܨী ࣘೣ PyCon APAC 2016 23
ب णҗ ਯ ण 4 ب ण(Supervised Learning) 4 ӝઓ
҃ী ೧ ࠙ܨػ ࢠ ؘఠо ਸ ٸ 4 ਯ ण(Unsupervised Learning) 4 ࠙ܨػ ࢠ ؘఠо হਸ ٸ 4 ࠗ࠙ ؘఠח ࠙ܨغয ঋ → ಽযঠೡ ޙઁ PyCon APAC 2016 24
ӝ҅ण ঌҊ્ܻٜ 4 ӝࠄ 4 ܻפয/۽झ౮ ܻӒۨ࣌(Linear/Logistic Regression) 4 Ѿ
ܻ(Decision Tree) 4 Ҋә 4 ےؒ ನۨझ(Random Forest) 4 SVM(Support Vector Machine) 4 ੋҕ न҃ݎ(Neural Network) PyCon APAC 2016 25
ঌҊ્ܻ ࢶఖ? 4 ੌ߈ਵ۽ Ҋә ঌҊ્ܻ ؊ ࠂೠ ݽ؛ ण
оמ 4 Ӓ۞ա, Ҋә ঌҊ્ܻ ޖઑѤ જ Ѫ ইש 4 ण Ѿҗܳ ࢎۈ ೧ೞӝীח ӝࠄ ঌҊ્ܻ જ PyCon APAC 2016 26
ஏী ೠ ಣо 4 ഛࢿী ೠ о ਃ ! 4
Q: ਬ 100ݺ 2ݺ ח য࠭ܳ Ѩೞ۰ ೠ. पࣻ۽ ݽف ࢚ ਬ۽ ౸ױ೮ਸ ٸ ഛبח? 4 A: 100ݺ 2ݺ ౣ۷ਵפ… 98% !?#@ PyCon APAC 2016 27
ஏ ױਤ 4 ب(Precision) അਯ(Recall)җ ١ নೠ ױਤ 4 ب:
Ѫ ݃ա য࠭ੋо? 4 അਯ: য࠭ ݃ա ওחо? 4 ؘఠо ࠛӐഋ(Imbalance)ੌٸח ౠ ب৬ അਯਸ ೣԋ Ҋ۰೧ঠ 4 খ ҃ח അਯ 0 PyCon APAC 2016 28
P/R Curve ৬ AUC જ ࠙ܨӝח? PyCon APAC 2016 29
ࢎ۹2 ӝ҅णਵ۽ ߁ Ѩ PyCon APAC 2016 30
࢚ട 4 ۄ࠳ ѱীࢲ пઙ ೧ఊ ోਸ ࢎਊೠ ߁ ۨо
ഝѐ ! 4 ߁: ѱ ղ ചܳ ࠺ ࢚ੋ ߑߨਵ۽ णٙ 4 ࠈ ౠࢿਸ ೞա ل۽ ౠೞӝ য۰ → ӝ҅ण ਃ PyCon APAC 2016 31
ण ߑध ࢶఖ 4 Ҷ ۡ֔/٩۞ਵ۽ ೡ ਃח হח ٠…
4 җѢ ۽Ӓо غҊ Ҋ, 4 ஏীࢲ ӝઓ য࠭ நܼఠ ܻझܳ оҊ ! → ӝ҅ण, ౠ ب ण оמ! 4 Decision Tree ߑध ب णਵ۽ Ѿ PyCon APAC 2016 32
ળ࠺ җ 1. ۽Ӓ ࣻ ࢚క ഛੋ 2. ۽Ӓ ҳઑ/
ঈ 3. णਸ ਤೠ ೖ(Feature) ୶ PyCon APAC 2016 33
ӝ҅णب ۽Ӓ ࣻࠗఠ 4 ۽Ӓܳ ҅ਵ۽ ݽਵח Ѫب औ ঋ
4 ࠙ࢳ/णী Ѧܻח दр 10~20% ب 4 ؘఠܳ ݽਵҊ оҕೞחؘ ࠗ࠙ दр Ѧܽ. 4 ۽Ӓ ഋध оә Ӓ۽ ࢎਊ (झౚ٣য়ܳ ਤ೧… !) 4 ۽Ӓܳ ࠙ܨ೧ (ࢲߡ/۽Ӓ ઙܨ, द ߹۽) 4 ۄ٘ झషܻ(S3) ୶ୌ ☁ PyCon APAC 2016 34
ਦب ࢲߡীࢲ ۽Ӓ ࣻೞӝ 4 ѱ ࢲߡח ࠗ࠙ ਦب ӝ߈
4 য় ࣗझ જ ోٜ(fluentd, logstash ١)ਸ ॳҊ रਵա 4 ਦب ࢲߡী ࢸо औ ঋҊ, ੌࠗ ӝמ ࠗ 4 ѐߊ ! 4 https://github.com/haje01/wdfwd 4 ࢲߡী թ ۽Ӓ ੌਸ RSync۽ زӝೞѢա 4 ѱ DBী ࣘೞৈ Dump റ ࣠ PyCon APAC 2016 35
۽Ӓо ࣻ غਵݶ ೖܳ ٜ݅ 4 ೖ(Feature, ౠࢿ): ण ࢚
ౠਸ ࢸݺ೧ח ч 4 ) чਸ ஏೞח ҃ ! → ӝ, ߑೱ, ജ҃, Үా, ಞदࢸ ١ ೖ PyCon APAC 2016 36
ೖ ѐߊ(Feature Engineering) 4 (࠺)ഋ ؘఠীࢲ ೖܳ Ҋ ࢤࢿೞח স
4 ܲ ೖٜী ղػ ೖܳ ইղӝب ೣ 4 ٸ۽ח ࠂೠ ٘о ਃ(SQL۽ח ൨ٝ) 4 3ѐਘ ࠙ ۽Ӓীࢲ ೞنਸ ా೧ ೖ ࢤࢿ PyCon APAC 2016 37
ೞنਸ ॄঠ݅ ೞա? 4 ؘఠо Bigೞ ঋਵݶ ਃ হ 4
न… 4 ߓ Jobਸ য়ۖزউ جܻѢա 4 ӝਵ۽ ETLਸ ా೧ DBী ֍যفח җ ਃೡ ࣻ 4 ࠺ഋ/ਊ ؘఠীࢲ ࠼ߣೠ ೖ ѐߊਸ ೠݶ જ PyCon APAC 2016 38
যڌѱ ॄঠೞա? 4 ೞن ۞झఠܳ ҳ୷ೞৈ ࢎਊೡ ࣻب ਵա,
ࣇҗ ਊ য۰ 4 ۄ٘ ࢲ࠺झীࢲ ઁҕೞח ೞن ࢲ࠺झܳ ਊ ! - AWS EMR(Elastic Map Reduce) PyCon APAC 2016 39
AWSח ࠺ऱ ঋա? 4 ୭ച ೞݶ ࠺ऱ ঋ ! 4
ਃೡ ٸ݅ ॳח ױࣘ ۞झఠ(Transient Cluster)۽ ਊ 4 Task ֢٘ח ҃ݒ ߑध Spot Instance۽ 4 m4.xlarge(4 vCPU, 16 GiB RAM ): दр 0.036$ (ࢲ ܻ, 2016-08-09 ӝળ) PyCon APAC 2016 40
AWS EMR ۞झఠ द ചݶ PyCon APAC 2016 41
ೞنਸ ਤೠ ۽Ӓ оҕ 4 ೞن ੌ(< 100MB)ٜ ݆
Ѫী ஂড 4 ੌٜ ߽, ࣗ, ୷ೡ ਃ 4 ݃ٶೠ ోਸ ޅ೧ ѐߊ ! 4 https://github.com/haje01/mersoz 4 ߄Ո ੌ݅ স, ઓ ҙ҅ܳ Ҋ۰ೠ ߽۳ ܻ PyCon APAC 2016 42
ݠ, ࣗ & ୷ റ S3ী ػ ۽Ӓ PyCon APAC
2016 43
ೞن MapReduce ٬ - mrjob 4 Yelpীࢲ ݅ٚ Python ಁః
4 ೞن झܿਸ ਊ೧ ॆਵ۽ MR ٬ 4 ۽ஸীࢲ ࢠ ؘఠ۽ ѐߊೠ റ, EMRী ৢܿ ! 4 प೯ ࣘبח Javaߡ ࠁ ખ וܻ݅ ѐߊ ࣘبо ࡅܴ PyCon APAC 2016 44
from mrjob.job import MRJob import re WORD_RE = re.compile(r"[\w']+") class
MRWordFreqCount(MRJob): def mapper(self, _, line): # ۽Ӓ ੌ п ۄੋ for word in WORD_RE.findall(line): # ݽٚ ױযী ೧ yield word.lower(), 1 # 'ױয', 1 ߈ജ def combiner(self, word, counts): # ֢٘ Ѿҗܳ ஂ yield word, sum(counts) def reducer(self, word, counts): # ۞झఠ Ѿҗܳ ஂ yield word, sum(counts) if __name__ == '__main__': MRWordFreqCount.run() PyCon APAC 2016 45
दझమ ҳࢿب PyCon APAC 2016 46
അട ঈ 4 ӝ҅णਸ ਤ೧ 4 GM ઁೞח ӔѢ(=ೖ)৬ 4
ઁػ நܼఠ ܻझܳ ਃ PyCon APAC 2016 47
ೖ ࢤࢿ 4 ۽Ӓীࢲ நܼఠ ӝળਵ۽ ҳೣ 4 Үೠ
ೖࠁח নೠ ೖܳ 4 যରೖ ࠂਵ۽ ౸ױ 4 ୡӝীח ૣ दрী ೧, উചغݶ ӡѱ PyCon APAC 2016 48
ୡӝী ࡳইࠄ ೖٜ 4 ۽Ӓੋ ࣻ 4 ۨ दр 4
۽Ӓ ইਓ ࠛ࠙ݺೠ ҃о ݆ 4 ࣁ࣌ ইਓ بੑ: 5࠙ ⏱ 4 ইమ/ݠפ णٙ ࣻ 4 ௮झ ઙܐ ࣻ 4 NPC/PC р ై ࣻ PyCon APAC 2016 49
ೖ ఋੑ? 4 ѱ पࣻ ഋ, పҊܻ ഋ, ࠛܽ(Boolean) ഋਵ۽
աׇ 4 оә पࣻ ഋਵ۽ ాੌೞח Ѫ ߄ۈ 4 Bool 0, 1۽ 4 పҊܻ ఋੑ OneHotEncoderܳ ࢎਊ೧ पࣻഋਵ۽ PyCon APAC 2016 50
ٜ݅য ೖ 4 ױࣽ ఫझ (.txt) ੌ 4 நܼఠݺ
+ ೖ ߓৌ ഋध PyCon APAC 2016 51
ӝ҅ण ೯ PyCon APAC 2016 52
ӝ҅ण о߶ 4 ୭ઙ ೖ ੌ ӝо Ҋ, ӝ҅ण
ࣻ೯ب о߶ ಞ 4 ۽ஸ PCীࢲ ࣻ೯ 4 ୶ୌ दझమۢ ݽٚ ؘఠܳ ࠊঠೞח ण ޖѢ Ѫ 4 ݽ؛ਸ ࢶఖೞҊ ୭ ೞಌ ಁ۞ఠܳ Ѿೞח Ѫ җઁ 4 নೠ ࣇਵ۽ ৈ۞ߣ प೧ࠊঠ 4 ࠙ दझమਸ ഝਊೞח ҃ب... PyCon APAC 2016 53
যڃ ঌҊ્ܻ ݽ؛ਸ ࢶఖೡ Ѫੋо? 4 द рױೠ Ѫਵ۽ 4
࠺तೠ ࢎ۹ ࢶ೯ োҳо ਵݶ ଵҊೞ 4 AUCա ROCܳ ాೠ ݽ؛ ಣо ߂ ࢶఖ PyCon APAC 2016 54
Decision Tree۽ द 4 ࠂೞ ঋҊ ౸ױ җ ೧о ਊ
4 ॆ Scikit-Learn ಁః Ѫਸ ࢎਊ 4 নೠ ӝ҅ण ঌҊ્ܻਸ प ઁҕ 4 ੋఠಕझо ాੌغয য ݽ؛ Үо ਊ 4 ೖ(X)৬ য࠭ ৈࠗ(y)ܳ ֍Ҋ ण 4 DTח ೖ ӏച ਃ হয ಞܻ PyCon APAC 2016 55
DT ࢎਊ (ࠠԢ ࠙ܨ) from sklearn.datasets import load_iris from
sklearn import tree iris = load_iris() clf = tree.DecisionTreeClassifier() clf = clf.fit(iris.data, iris.target) >>> clf.predict(iris.data[:1, :]) array([0]) PyCon APAC 2016 56
PyCon APAC 2016 57
Decision Tree ण җ 1. ೖ ੌীࢲ ӝઓ য࠭ ೖܳ
Ҋ 2. زࣻ ࢚ ਬ ೖ ҳೣ 4 Under Sampling 3. ؘఠܳ Train/Test ࣇਵ۽ ա־Ҋ 4. ӝࠄ ಁ۞ఠ۽ ण द PyCon APAC 2016 58
ୡӝ Ѿҗ 4 ಣӐ ഛب 80% ب 4 Binary Class
࠙ܨ ҃ ࣻо ੜ աয়ח ಞ 4 աࢁ ঋѪ э݅, 4 ஏ Ѿҗо ઁ ӔѢ۽ ॳੋח ীࢲ ݆ ࠗ PyCon APAC 2016 59
ഛبܳ ৢܻ 4 Үର Ѩૐ(Cross Validation)ਸ ਤ೧ ؘఠ ࣇਸ ܻ࠙
ೞҊ 4 GridSearchCVܳ ా೧ ୭ ೞಌ ಁ۞ఠܳ 4 ಣӐ ഛب 91%۽ ೱ࢚ 4 যڃ ӝળਵ۽ ౸ױೞח ೠ ߣ ࠁҊ र tree.export_graphviz۽ Ӓ۰ࠆ PyCon APAC 2016 60
PyCon APAC 2016 61
Ѿ ܻܳ ࠁפ... 4 णػ ݽ؛ যڃ ӝળਵ۽ ౸ױೞח ঌ
ࣻ → নೠ ҵ ࢎۈٜী ҕਬ оמ ! 4 ೞࠗ۽ ղ۰т ࣻ۾ ࠂ೧ח ޙઁ 4 DTח җ(Overfitting)غӝ औӝী, Depthо ցޖ Ө ঋѱ PyCon APAC 2016 62
ৈӝࢲ ؊ ࢚ ࣻо ৢۄо ঋ 4 GMשҗ ࢚ റ
࢜۽ ೖٜ ୶о 4 زदী ইమ/ݠפ ࣻ 4 ݗ ߈ࠂ പࣻ 4 ౠ ېझ݅ ࢶఖ 4 ঋҊ ইమਸ ࣻ 4 դ೧೧ ࠁח Ѫٜب ೖ۽ ٜ݅ ࣻ ח Ѫ ֢ೞ 4 ) 'ࠈ ےؒೞѱ ࢤࢿػ ܴਸ оҊ যਃ'' PyCon APAC 2016 63
) நܼఠ ܴ ےؒࢿ ౸ױ (/ݽ അ ಁఢ) ## நܼఠ
ܴ ߊ оמೠ ౸ױೞח गب ٘ # ܴਸ ݽ बࠅ۽ ߄Է(1о , 2о ݽ) # ) anything -> ‘21211211’ symbols = get_cv_symbols(char_name) # җ э ಁఢ ਵݶ ߊ оמ (प۽ח ؊ ন) if ‘2121’ or ‘2112’ or ‘1121’ or ‘22122’, … in symbols: can_pron = False else: can_pron = True PyCon APAC 2016 64
ഛೠ ߑߨ ইפ݅... ࠂਵ۽ ౸ױೞӝী ب ؽ PyCon APAC 2016
65
୶о ೖ۽ झযо ೱ࢚, Ӓ۞ա… 4 ಣӐ ഛب 96%۽ ೱ࢚.
ࣻח ֫ ಞ݅, 4 प ਊ೧ࠄ Ѿҗ 4 GMש ഛੋ җীࢲ য়ఐ Ԩ ա১ ! 4 DecisionTree Ҋੋ җ ޙઁ۽ ౸ױ PyCon APAC 2016 66
Random Forest۽ Ү 4 ݆ Decision Tree ܳ ઑೠ ঔ࢚࠶
పץ 4 ࣻ DTܳ ࠙ ण(=ӏച ബҗ) दఃҊ ైೞח ߑध 4 ࣻо ծইب উੋ Ѿҗ 4 DecisionTree - ࠛউೠ 96% RandomForest - উੋ 95% PyCon APAC 2016 67
Random Forest ण 4 ӝࠄਵ۽ Decision Tree৬ ࠺त 4 max_depth,
min_samples_leaf ݽ؛ ࠂبܳ ઑ. ѱ द೧ࢲ ઑӘঀ ఃਕࠄ 4 n_estimator 4 աޖ(DT)ܳ ݻ Ӓܖ बਸ Ѫੋ Ѿ ! 4 ցޖ ݶ णदр ӡҊ, ցޖ ਵݶ Ӓր DTо غযߡܿ PyCon APAC 2016 68
RF ਊ റ Ѿҗ 4 ഛبח 95% 4 ࠗೞѱ ҅
߉ח ࢎ۹о হب۾ 4 predict_probaܳ ࢎਊ೧ ஏ ഛܫب Ҋ 4 ഛܫ ֫(>70%) ஏ Ѿҗ݅ ನೣ 4 ৈӝࢲ 10~20%ب അਯ(Recall) ೞۅ ୶ 4 Ӓ۞ա, ب(Precision)ח… PyCon APAC 2016 69
100% ׳ࢿ GMש ࣻসਵ۽ Ѩష೧ न Ѿҗ… ! PyCon APAC
2016 70
ওਵפ ઁܳ... 4 2ѐਘৈী Ѧ ઁ 4 ోਸ ࢎਊೠ ߁
ࠗ࠙ ࢎۄ! ! 4 ӝ/ࣘਵ۽ ઁܳ ೧ঠ ബҗо PyCon APAC 2016 71
ଵҊ: ୭ઙ ೖ ਃب PyCon APAC 2016 72
ѐࢶ ߑೱ 4 Ѩػ Ѿҗܳ ਊ೧ ण ݽ؛ ѐࢶ 4
ࠈ ҅ী ೠ PIIܳ ࣻ೧فݶ नӏ ࠈ णী ਊೡ Ѫ 4 ઁ റ ߸ઙ ࠈ ݽפఠ݂ ਃ PyCon APAC 2016 73
റӝ PyCon APAC 2016 74
ו՛ 4 ؘఠ ࣻࠗఠ оҕ, ࠙ࢳө ݽٚ җਸ ॆਵ۽
! 4 Jupyter ֢࠘ਸ ాೠ ఐ࢝ ؘఠ ࠙ࢳ " 4 ؊ নೠ ࠙ঠী ӝ҅ णਸ ഝਊ оמೡ ٠ PyCon APAC 2016 75
ӝ҅ण बച 4 Ө ח ഝਊਸ ਤ೧ ӝࠄ ۿਸ ؊
ҕࠗೞ ! 4 જ Hypothesisܳ ٜ݅ ࣻ ѱ ػ 4 ୭ചܳ ೡ ࣻ ѱ ػ 4 ೞա ࢚ ঌҊ્ܻਸ ࢎਊ೧ ࠁ 4 SVM, Neural Net ١ নೠ ࠙ܨӝ 4 Super Learner ߑधਵ۽ ঔ࢚࠶ PyCon APAC 2016 76
ࣁਘ ൗ۞... ࢜۽ ۽Ӓ ࣻ/࠙ࢳ ജ҃ 4 RSync ߑध ->
Fluentd/Kinesis पदр ۽Ӓ ࣻ 4 gzipػ CSV -> Parquet ನݘਵ۽ S3 4 Columnar ߄ցܻ ನݘ, 30x ࣘب ೱ࢚ 4 MRJob -> PySpark 4 ъ۱ೠ ࠙ ܻ / Cache ӝמ(߈ࠂ णী ъ) 4 ױࣘ Spark ۞झఠ(20 VMs = 80য, 320GB ۔)۽ ਊ (दр 3000ਗ ب) PyCon APAC 2016 77
ઑ 4 ӝ҅ण ղо ೞ۰ח ੌী ೠ ౸ױ ! 4
য࠭ ౠࢿ ױࣽೞݶ ాੋ ߑߨਵ۽ оמ 4 ఐ࢝ ؘఠ ࠙ࢳਸ ా೧ ౠࢿਸ ݢ ঈೞ 4 নೠ ݽ؛/ೖܳ పझ೧ࠁ 4 ण ݽ؛ী ٮۄ ೖ ӏച/Үചо ਃೡ ࣻ ਵפ 4 ېझр Imbalance ޙઁী PyCon APAC 2016 78
٩۞? ӝ҅ण? 4 ٩۞ 4 Үೠ ೖ ূפয݂ ਃ হ
4 ݆ ಁ۞ఠ = ݆ ؘఠо ਃ 4 ӝ҅ण 4 ೖ স ਃೞ݅ 4 ಁ۞ఠ = ؘఠ۽ب ബ җ PyCon APAC 2016 79
࢚ 4 ؘఠ ূפয݂ য۰ 4 ؘఠ ഛࠁо о ਃ
4 झನۄܳ ߉ח ࠙ঠח য়۰ ݎ যف 4 োҳо ইפۄݶ ҷڣস/࢜ ؘఠঠ݈۽ ࠶ܖয়࣌ 4 ݽٚ ഥࢎী ؘఠ ࠙ࢳоо ਃೠ द 4 ஹೊఠо ݽٚ ݽ؛/߸ࣻ ઑਸ పझ ೡ ࣻ ݶ? ! PyCon APAC 2016 80
ਵ۽... ࢎ োҙ(Spurious Correlations) 4 पઁ۽ח োҙ হ݅, ח Ѫۢ
ࠁח ҃ 4 ؘఠী݅ ೞ ݈Ҋ, بݫੋਸ ೧ೞ! PyCon APAC 2016 81
хࢎפ. PyCon APAC 2016 82
ଵҊ ݂ 4 http://www.aladin.co.kr/shop/wproduct.aspx?ItemId=28946323 4 http://www.tylervigen.com/spurious-correlations 4 http://scikit-learn.org/stable/modules/tree.html 4 http://www.cimerr.net/conference/board/data/conference/1331626266/P15.pdf
4 http://stackoverflow.com/questions/20463281/- how-do-i-solve-overfitting-in-random-forest-- of-python-sklearn 4 http://stats.stackexchange.com/questions/131255/class-imbalance-in-supervised-machine-learning 4 https://www.quora.com/Is-Scala-a-better-choi- ce-than-Python-for-Apache-Spark 4 http://statkclee.github.io/data-science/data- -handling-pipeline.html 4 https://databricks.com/blog/2016/01/25/deep-- learning-with-spark-and-tensorflow.html- PyCon APAC 2016 83