Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
기계학습을 활용한 게임 어뷰징 검출
Search
JeongJu Kim
August 16, 2016
Technology
1
1k
기계학습을 활용한 게임 어뷰징 검출
PyConAPAC 2016에서 발표한 문서입니다.
JeongJu Kim
August 16, 2016
Tweet
Share
More Decks by JeongJu Kim
See All by JeongJu Kim
IPython과 Pandas를 활용한 게임데이터 분석 - PyConKR 2014
haje01
7
1.8k
Other Decks in Technology
See All in Technology
あなたの声を届けよう! 女性エンジニア登壇の意義とアウトプット実践ガイド #wttjp / Call for Your Voice
kondoyuko
4
570
無意味な開発生産性の議論から抜け出すための予兆検知とお金とAI
i35_267
3
11k
Lufthansa ®️ USA Contact Numbers: Complete 2025 Support Guide
lufthanahelpsupport
0
100
ビギナーであり続ける/beginning
ikuodanaka
3
710
敢えて生成AIを使わないマネジメント業務
kzkmaeda
1
330
asken AI勉強会(Android)
tadashi_sato
0
180
CI/CD/IaC 久々に0から環境を作ったらこうなりました
kaz29
1
230
AIとともに進化するエンジニアリング / Engineering-Evolving-with-AI_final.pdf
lycorptech_jp
PRO
0
150
PO初心者が考えた ”POらしさ”
nb_rady
0
190
マネジメントって難しい、けどおもしろい / Management is tough, but fun! #em_findy
ar_tama
5
730
ビズリーチが挑む メトリクスを活用した技術的負債の解消 / dev-productivity-con2025
visional_engineering_and_design
3
6.2k
AI導入の理想と現実~コストと浸透〜
oprstchn
0
190
Featured
See All Featured
Speed Design
sergeychernyshev
32
1k
Building an army of robots
kneath
306
45k
How GitHub (no longer) Works
holman
314
140k
The Psychology of Web Performance [Beyond Tellerrand 2023]
tammyeverts
48
2.9k
Rails Girls Zürich Keynote
gr2m
94
14k
Building Applications with DynamoDB
mza
95
6.5k
A Modern Web Designer's Workflow
chriscoyier
694
190k
ピンチをチャンスに:未来をつくるプロダクトロードマップ #pmconf2020
aki_iinuma
126
52k
Bash Introduction
62gerente
614
210k
Designing Experiences People Love
moore
142
24k
GitHub's CSS Performance
jonrohan
1031
460k
Optimising Largest Contentful Paint
csswizardry
37
3.3k
Transcript
ӝ҅णਸ ഝਊೠ ѱ য࠭ Ѩ ӣ PyCon APAC 2016 PyCon
APAC 2016 1
ߊ ࣗѐ ӣ (
[email protected]
) : ѱ ѐߊ - NHN /
NPLUTO - 3D ূ / ѱ ۄ ѐߊ അ: ѱ ؘఠ ࣻ / ࠙ࢳ - Webzen NPlay - ۽Ӓ ನਕ؊, Pandas, Scikit-Learn, PySpark PyCon APAC 2016 2
ߊח 4 ӝ҅णী ೠ ӝࠄ ध ח ٜ࠙ਸ ࢚
4 ॆਸ ഝਊೠ ؘఠ ࠙ࢳҗ ӝ҅ण ࢎ۹ܳ ҕਬ 4 ѐߊҗ ࢲ࠺झী ӝ҅णਸ بੑೞח ҅ӝо غਵݶ פ PyCon APAC 2016 3
द زӝ 4 ѱ য࠭ ઁܳ 4 ਬ नҊ /
GM ݽפఠ݂ / ಁఢ ӝ۽ח ೠ҅ 4 ࢎۈ ѐੑ ୭ࣗചػ য࠭ ఐ दझమਸ ٜ݅ PyCon APAC 2016 4
ѱ য࠭ۆ? 4 “ӝദਵ۽ بೞ ঋ ߑधਵ۽ ѱ ࠁܳ
ദٙೞѢա ب ਸ ח ೯ਤ” ! 4 ࢎ۹ 4 ࢲ࠺झ ҳഅ࢚ ਸ ਊೠ ۨ 4 ೧ఊ ోਸ ࢎਊೠ ࠺࢚ ۨ 4 ହী بߓ۽ ҟҊ PyCon APAC 2016 5
ా҅৬ ఐ࢝ ؘఠ ࠙ࢳ PyCon APAC 2016 6
ࢶ, ా҅ 4 ా҅ח ೂࠗೞ ޅೠ ؘఠ৬ ஹೊ ਕ ജ҃ীࢲ
ߊ 4 ా҅ ٜ ؘఠ/҅ਸ ח ߑߨਸ োҳ 4 ৌঈೠ ജ҃ীࢲ ٜ݅যӝী, ؘఠীࢲب оܳ ߊѼೡ ࣻ 4 ӝࠄੋ ా҅ ध ѐߊ, ӝദ, ࢲ࠺झ ١ী ب ؽ PyCon APAC 2016 7
ఐ࢝ ؘఠ ࠙ࢳ 4 ؘఠী ऀযח ࠁܳ, 4 নೠ пب۽
ਃড, दпച ೧ࠁݴ ח җ ! 4 ೞח ؘఠח җࠗఠ 4 दझమ(WzDat) ѐߊ೧ ഝਊ " 4 Jupyter + Utility + Dashboard 4 https://github.com/haje01/wzdat 4 http://www.pycon.kr/2014/program/14 PyCon APAC 2016 8
ࢎ۹1 рױೠ ా҅ ই٣য۽ झಁݠ Ѩ PyCon APAC 2016 9
࢚ട 4 नӏ য়ೠ ѱ ହ ѱ ইమ ҟҊӖ۽ оٙ
! 4 ೧ ҅ਸ ઁ೧ب ߄۽ ࢜ ҅ਵ۽ ҟҊ ҅ࣘ 4 ࡅܲ ઁо ਃೞৈ, ӝ҅णਸ ೯ೞӝীח दр ࠗ PyCon APAC 2016 10
ਸ ਊೠ झಅ (Spam) 4 ѱ ղীࢲ ਵ۽ ݠפ/ইమ
౸ݒ ҟҊ 4 য࠭ח ۽Ӓ۔ ౸߹ਸ ݄ӝਤ೧ ݫ दܳ դةച PyCon APAC 2016 11
झಁݠ Ѩ 4 নೠ ߑߨ оמೞѷਵա, 4 োয ܻա ӝ҅णэ
Ҋә Ӕࠁ, 4 рױೠ ా҅ ই٣য۽ दب PyCon APAC 2016 12
ৡۄੋ ݫद ӡ ࠙ನ 4 ੌ߈ਵ۽ ۽Ӓ ӏ࠙ನܳ
ٮܲҊ ঌ ۰ઉ. 4 ইېח NPS Chat Corpus ݫद ӡ ࠙ನ PyCon APAC 2016 13
ѱ ղ ݫद ӡ ࠙ನ 4 ৡۄੋ җ ࠺तೞա
ખ ؊ فԁ ҃ ೱ 4 ౠ ӡ ݫदо (?) → झಅਵ۽ ഛੋ PyCon APAC 2016 14
ই٣য 4 ੌ߈ ਬ: ݫद ӡо নೞҊ, ࠼بо ݆ ঋ
4 झಁݠ: ݫद ӡо নೞ ঋҊ, ࠼بח ֫ 4 , যڃ ਬ ࠼بо ֫Ҋ ӡо নೞ ঋਵݶ झಁݠ PyCon APAC 2016 15
рױೠ Ѩ ҕध 4 ਬ ߹ പࣻ / ݫद
ӡ ઙܨ ࣻ 4 ࠺तೠ ӡ ݫदܳ ࠁյ ࣻ۾ ч ழ PyCon APAC 2016 16
࠙ܨ 4 spam_ratioо ӝળ ч ࢚ੋ Ѫਸ झಁݠ۽ р 4
ӝળ ч Ѿ ോܻझ౮ೞѱ... 4 ࢸ റ, ࠙ܨػ நܼఠ ݫद ഛੋਵ۽ ч ઑ PyCon APAC 2016 17
࠙ܨ റ ݫद ӡ ࠙ನ 4 ࠼بо ֫ ౠ ӡ
ݫद(= झಅ)о ܻ࠙غ PyCon APAC 2016 18
Ѿҗ ਊ 4 ҳഅ рױ೮݅, য়ఐ оמࢿ 4 ӝળ
чਸ ֫ѱ ই न܉بܳ ֫ 4 Ѿҗܳ оҊ ઁ PyCon APAC 2016 19
ѐࢶ ߑೱ 4 ӝળ ч Ѿਸ ખ ؊ җੋ ߑߨਵ۽
4 োয ܻ ӝࣿ(NLP) بੑ 4 ױয߹ ࠼ب(Ziff’s Law)৬ ਃب(TF-IDF) Ҋ۰ 4 ӝ҅ण ঌҊ્ܻ ਊ PyCon APAC 2016 20
ӝ҅ण ࣗѐ PyCon APAC 2016 21
ӝ҅णਸ ॳח ਬ 4 ֢۱ਵ۽ ҡଳ Ѿҗޛ 4 নೠ
ޙઁী ೠ ੌ߈ੋ ࣛܖ࣌ 4 ࣻ ౠࢿ(ೖ)ਸ زदী Ҋ۰ೡ ࣻ 4 ؘఠ ߸زী ъೣ(ъѤࢿ) PyCon APAC 2016 22
࠙ܨ৬ ഥӈ 4 ӝ҅ण ѱ ࠙ܨ (Classification)৬ ഥӈ (Regression)۽ ա
4 ࠙ܨ - ઙܨܳ ஏ ೞח Ѫ 4 ഥӈ - োࣘػ чਸ ஏ ೞח Ѫ 4 য࠭ Ѩ ࠙ܨী ࣘೣ PyCon APAC 2016 23
ب णҗ ਯ ण 4 ب ण(Supervised Learning) 4 ӝઓ
҃ী ೧ ࠙ܨػ ࢠ ؘఠо ਸ ٸ 4 ਯ ण(Unsupervised Learning) 4 ࠙ܨػ ࢠ ؘఠо হਸ ٸ 4 ࠗ࠙ ؘఠח ࠙ܨغয ঋ → ಽযঠೡ ޙઁ PyCon APAC 2016 24
ӝ҅ण ঌҊ્ܻٜ 4 ӝࠄ 4 ܻפয/۽झ౮ ܻӒۨ࣌(Linear/Logistic Regression) 4 Ѿ
ܻ(Decision Tree) 4 Ҋә 4 ےؒ ನۨझ(Random Forest) 4 SVM(Support Vector Machine) 4 ੋҕ न҃ݎ(Neural Network) PyCon APAC 2016 25
ঌҊ્ܻ ࢶఖ? 4 ੌ߈ਵ۽ Ҋә ঌҊ્ܻ ؊ ࠂೠ ݽ؛ ण
оמ 4 Ӓ۞ա, Ҋә ঌҊ્ܻ ޖઑѤ જ Ѫ ইש 4 ण Ѿҗܳ ࢎۈ ೧ೞӝীח ӝࠄ ঌҊ્ܻ જ PyCon APAC 2016 26
ஏী ೠ ಣо 4 ഛࢿী ೠ о ਃ ! 4
Q: ਬ 100ݺ 2ݺ ח য࠭ܳ Ѩೞ۰ ೠ. पࣻ۽ ݽف ࢚ ਬ۽ ౸ױ೮ਸ ٸ ഛبח? 4 A: 100ݺ 2ݺ ౣ۷ਵפ… 98% !?#@ PyCon APAC 2016 27
ஏ ױਤ 4 ب(Precision) അਯ(Recall)җ ١ নೠ ױਤ 4 ب:
Ѫ ݃ա য࠭ੋо? 4 അਯ: য࠭ ݃ա ওחо? 4 ؘఠо ࠛӐഋ(Imbalance)ੌٸח ౠ ب৬ അਯਸ ೣԋ Ҋ۰೧ঠ 4 খ ҃ח അਯ 0 PyCon APAC 2016 28
P/R Curve ৬ AUC જ ࠙ܨӝח? PyCon APAC 2016 29
ࢎ۹2 ӝ҅णਵ۽ ߁ Ѩ PyCon APAC 2016 30
࢚ട 4 ۄ࠳ ѱীࢲ пઙ ೧ఊ ోਸ ࢎਊೠ ߁ ۨо
ഝѐ ! 4 ߁: ѱ ղ ചܳ ࠺ ࢚ੋ ߑߨਵ۽ णٙ 4 ࠈ ౠࢿਸ ೞա ل۽ ౠೞӝ য۰ → ӝ҅ण ਃ PyCon APAC 2016 31
ण ߑध ࢶఖ 4 Ҷ ۡ֔/٩۞ਵ۽ ೡ ਃח হח ٠…
4 җѢ ۽Ӓо غҊ Ҋ, 4 ஏীࢲ ӝઓ য࠭ நܼఠ ܻझܳ оҊ ! → ӝ҅ण, ౠ ب ण оמ! 4 Decision Tree ߑध ب णਵ۽ Ѿ PyCon APAC 2016 32
ળ࠺ җ 1. ۽Ӓ ࣻ ࢚క ഛੋ 2. ۽Ӓ ҳઑ/
ঈ 3. णਸ ਤೠ ೖ(Feature) ୶ PyCon APAC 2016 33
ӝ҅णب ۽Ӓ ࣻࠗఠ 4 ۽Ӓܳ ҅ਵ۽ ݽਵח Ѫب औ ঋ
4 ࠙ࢳ/णী Ѧܻח दр 10~20% ب 4 ؘఠܳ ݽਵҊ оҕೞחؘ ࠗ࠙ दр Ѧܽ. 4 ۽Ӓ ഋध оә Ӓ۽ ࢎਊ (झౚ٣য়ܳ ਤ೧… !) 4 ۽Ӓܳ ࠙ܨ೧ (ࢲߡ/۽Ӓ ઙܨ, द ߹۽) 4 ۄ٘ झషܻ(S3) ୶ୌ ☁ PyCon APAC 2016 34
ਦب ࢲߡীࢲ ۽Ӓ ࣻೞӝ 4 ѱ ࢲߡח ࠗ࠙ ਦب ӝ߈
4 য় ࣗझ જ ోٜ(fluentd, logstash ١)ਸ ॳҊ रਵա 4 ਦب ࢲߡী ࢸо औ ঋҊ, ੌࠗ ӝמ ࠗ 4 ѐߊ ! 4 https://github.com/haje01/wdfwd 4 ࢲߡী թ ۽Ӓ ੌਸ RSync۽ زӝೞѢա 4 ѱ DBী ࣘೞৈ Dump റ ࣠ PyCon APAC 2016 35
۽Ӓо ࣻ غਵݶ ೖܳ ٜ݅ 4 ೖ(Feature, ౠࢿ): ण ࢚
ౠਸ ࢸݺ೧ח ч 4 ) чਸ ஏೞח ҃ ! → ӝ, ߑೱ, ജ҃, Үా, ಞदࢸ ١ ೖ PyCon APAC 2016 36
ೖ ѐߊ(Feature Engineering) 4 (࠺)ഋ ؘఠীࢲ ೖܳ Ҋ ࢤࢿೞח স
4 ܲ ೖٜী ղػ ೖܳ ইղӝب ೣ 4 ٸ۽ח ࠂೠ ٘о ਃ(SQL۽ח ൨ٝ) 4 3ѐਘ ࠙ ۽Ӓীࢲ ೞنਸ ా೧ ೖ ࢤࢿ PyCon APAC 2016 37
ೞنਸ ॄঠ݅ ೞա? 4 ؘఠо Bigೞ ঋਵݶ ਃ হ 4
न… 4 ߓ Jobਸ য়ۖزউ جܻѢա 4 ӝਵ۽ ETLਸ ా೧ DBী ֍যفח җ ਃೡ ࣻ 4 ࠺ഋ/ਊ ؘఠীࢲ ࠼ߣೠ ೖ ѐߊਸ ೠݶ જ PyCon APAC 2016 38
যڌѱ ॄঠೞա? 4 ೞن ۞झఠܳ ҳ୷ೞৈ ࢎਊೡ ࣻب ਵա,
ࣇҗ ਊ য۰ 4 ۄ٘ ࢲ࠺झীࢲ ઁҕೞח ೞن ࢲ࠺झܳ ਊ ! - AWS EMR(Elastic Map Reduce) PyCon APAC 2016 39
AWSח ࠺ऱ ঋա? 4 ୭ച ೞݶ ࠺ऱ ঋ ! 4
ਃೡ ٸ݅ ॳח ױࣘ ۞झఠ(Transient Cluster)۽ ਊ 4 Task ֢٘ח ҃ݒ ߑध Spot Instance۽ 4 m4.xlarge(4 vCPU, 16 GiB RAM ): दр 0.036$ (ࢲ ܻ, 2016-08-09 ӝળ) PyCon APAC 2016 40
AWS EMR ۞झఠ द ചݶ PyCon APAC 2016 41
ೞنਸ ਤೠ ۽Ӓ оҕ 4 ೞن ੌ(< 100MB)ٜ ݆
Ѫী ஂড 4 ੌٜ ߽, ࣗ, ୷ೡ ਃ 4 ݃ٶೠ ోਸ ޅ೧ ѐߊ ! 4 https://github.com/haje01/mersoz 4 ߄Ո ੌ݅ স, ઓ ҙ҅ܳ Ҋ۰ೠ ߽۳ ܻ PyCon APAC 2016 42
ݠ, ࣗ & ୷ റ S3ী ػ ۽Ӓ PyCon APAC
2016 43
ೞن MapReduce ٬ - mrjob 4 Yelpীࢲ ݅ٚ Python ಁః
4 ೞن झܿਸ ਊ೧ ॆਵ۽ MR ٬ 4 ۽ஸীࢲ ࢠ ؘఠ۽ ѐߊೠ റ, EMRী ৢܿ ! 4 प೯ ࣘبח Javaߡ ࠁ ખ וܻ݅ ѐߊ ࣘبо ࡅܴ PyCon APAC 2016 44
from mrjob.job import MRJob import re WORD_RE = re.compile(r"[\w']+") class
MRWordFreqCount(MRJob): def mapper(self, _, line): # ۽Ӓ ੌ п ۄੋ for word in WORD_RE.findall(line): # ݽٚ ױযী ೧ yield word.lower(), 1 # 'ױয', 1 ߈ജ def combiner(self, word, counts): # ֢٘ Ѿҗܳ ஂ yield word, sum(counts) def reducer(self, word, counts): # ۞झఠ Ѿҗܳ ஂ yield word, sum(counts) if __name__ == '__main__': MRWordFreqCount.run() PyCon APAC 2016 45
दझమ ҳࢿب PyCon APAC 2016 46
അട ঈ 4 ӝ҅णਸ ਤ೧ 4 GM ઁೞח ӔѢ(=ೖ)৬ 4
ઁػ நܼఠ ܻझܳ ਃ PyCon APAC 2016 47
ೖ ࢤࢿ 4 ۽Ӓীࢲ நܼఠ ӝળਵ۽ ҳೣ 4 Үೠ
ೖࠁח নೠ ೖܳ 4 যରೖ ࠂਵ۽ ౸ױ 4 ୡӝীח ૣ दрী ೧, উചغݶ ӡѱ PyCon APAC 2016 48
ୡӝী ࡳইࠄ ೖٜ 4 ۽Ӓੋ ࣻ 4 ۨ दр 4
۽Ӓ ইਓ ࠛ࠙ݺೠ ҃о ݆ 4 ࣁ࣌ ইਓ بੑ: 5࠙ ⏱ 4 ইమ/ݠפ णٙ ࣻ 4 ௮झ ઙܐ ࣻ 4 NPC/PC р ై ࣻ PyCon APAC 2016 49
ೖ ఋੑ? 4 ѱ पࣻ ഋ, పҊܻ ഋ, ࠛܽ(Boolean) ഋਵ۽
աׇ 4 оә पࣻ ഋਵ۽ ాੌೞח Ѫ ߄ۈ 4 Bool 0, 1۽ 4 పҊܻ ఋੑ OneHotEncoderܳ ࢎਊ೧ पࣻഋਵ۽ PyCon APAC 2016 50
ٜ݅য ೖ 4 ױࣽ ఫझ (.txt) ੌ 4 நܼఠݺ
+ ೖ ߓৌ ഋध PyCon APAC 2016 51
ӝ҅ण ೯ PyCon APAC 2016 52
ӝ҅ण о߶ 4 ୭ઙ ೖ ੌ ӝо Ҋ, ӝ҅ण
ࣻ೯ب о߶ ಞ 4 ۽ஸ PCীࢲ ࣻ೯ 4 ୶ୌ दझమۢ ݽٚ ؘఠܳ ࠊঠೞח ण ޖѢ Ѫ 4 ݽ؛ਸ ࢶఖೞҊ ୭ ೞಌ ಁ۞ఠܳ Ѿೞח Ѫ җઁ 4 নೠ ࣇਵ۽ ৈ۞ߣ प೧ࠊঠ 4 ࠙ दझమਸ ഝਊೞח ҃ب... PyCon APAC 2016 53
যڃ ঌҊ્ܻ ݽ؛ਸ ࢶఖೡ Ѫੋо? 4 द рױೠ Ѫਵ۽ 4
࠺तೠ ࢎ۹ ࢶ೯ োҳо ਵݶ ଵҊೞ 4 AUCա ROCܳ ాೠ ݽ؛ ಣо ߂ ࢶఖ PyCon APAC 2016 54
Decision Tree۽ द 4 ࠂೞ ঋҊ ౸ױ җ ೧о ਊ
4 ॆ Scikit-Learn ಁః Ѫਸ ࢎਊ 4 নೠ ӝ҅ण ঌҊ્ܻਸ प ઁҕ 4 ੋఠಕझо ాੌغয য ݽ؛ Үо ਊ 4 ೖ(X)৬ য࠭ ৈࠗ(y)ܳ ֍Ҋ ण 4 DTח ೖ ӏച ਃ হয ಞܻ PyCon APAC 2016 55
DT ࢎਊ (ࠠԢ ࠙ܨ) from sklearn.datasets import load_iris from
sklearn import tree iris = load_iris() clf = tree.DecisionTreeClassifier() clf = clf.fit(iris.data, iris.target) >>> clf.predict(iris.data[:1, :]) array([0]) PyCon APAC 2016 56
PyCon APAC 2016 57
Decision Tree ण җ 1. ೖ ੌীࢲ ӝઓ য࠭ ೖܳ
Ҋ 2. زࣻ ࢚ ਬ ೖ ҳೣ 4 Under Sampling 3. ؘఠܳ Train/Test ࣇਵ۽ ա־Ҋ 4. ӝࠄ ಁ۞ఠ۽ ण द PyCon APAC 2016 58
ୡӝ Ѿҗ 4 ಣӐ ഛب 80% ب 4 Binary Class
࠙ܨ ҃ ࣻо ੜ աয়ח ಞ 4 աࢁ ঋѪ э݅, 4 ஏ Ѿҗо ઁ ӔѢ۽ ॳੋח ীࢲ ݆ ࠗ PyCon APAC 2016 59
ഛبܳ ৢܻ 4 Үର Ѩૐ(Cross Validation)ਸ ਤ೧ ؘఠ ࣇਸ ܻ࠙
ೞҊ 4 GridSearchCVܳ ా೧ ୭ ೞಌ ಁ۞ఠܳ 4 ಣӐ ഛب 91%۽ ೱ࢚ 4 যڃ ӝળਵ۽ ౸ױೞח ೠ ߣ ࠁҊ र tree.export_graphviz۽ Ӓ۰ࠆ PyCon APAC 2016 60
PyCon APAC 2016 61
Ѿ ܻܳ ࠁפ... 4 णػ ݽ؛ যڃ ӝળਵ۽ ౸ױೞח ঌ
ࣻ → নೠ ҵ ࢎۈٜী ҕਬ оמ ! 4 ೞࠗ۽ ղ۰т ࣻ۾ ࠂ೧ח ޙઁ 4 DTח җ(Overfitting)غӝ औӝী, Depthо ցޖ Ө ঋѱ PyCon APAC 2016 62
ৈӝࢲ ؊ ࢚ ࣻо ৢۄо ঋ 4 GMשҗ ࢚ റ
࢜۽ ೖٜ ୶о 4 زदী ইమ/ݠפ ࣻ 4 ݗ ߈ࠂ പࣻ 4 ౠ ېझ݅ ࢶఖ 4 ঋҊ ইమਸ ࣻ 4 դ೧೧ ࠁח Ѫٜب ೖ۽ ٜ݅ ࣻ ח Ѫ ֢ೞ 4 ) 'ࠈ ےؒೞѱ ࢤࢿػ ܴਸ оҊ যਃ'' PyCon APAC 2016 63
) நܼఠ ܴ ےؒࢿ ౸ױ (/ݽ അ ಁఢ) ## நܼఠ
ܴ ߊ оמೠ ౸ױೞח गب ٘ # ܴਸ ݽ बࠅ۽ ߄Է(1о , 2о ݽ) # ) anything -> ‘21211211’ symbols = get_cv_symbols(char_name) # җ э ಁఢ ਵݶ ߊ оמ (प۽ח ؊ ন) if ‘2121’ or ‘2112’ or ‘1121’ or ‘22122’, … in symbols: can_pron = False else: can_pron = True PyCon APAC 2016 64
ഛೠ ߑߨ ইפ݅... ࠂਵ۽ ౸ױೞӝী ب ؽ PyCon APAC 2016
65
୶о ೖ۽ झযо ೱ࢚, Ӓ۞ա… 4 ಣӐ ഛب 96%۽ ೱ࢚.
ࣻח ֫ ಞ݅, 4 प ਊ೧ࠄ Ѿҗ 4 GMש ഛੋ җীࢲ য়ఐ Ԩ ա১ ! 4 DecisionTree Ҋੋ җ ޙઁ۽ ౸ױ PyCon APAC 2016 66
Random Forest۽ Ү 4 ݆ Decision Tree ܳ ઑೠ ঔ࢚࠶
పץ 4 ࣻ DTܳ ࠙ ण(=ӏച ബҗ) दఃҊ ైೞח ߑध 4 ࣻо ծইب উੋ Ѿҗ 4 DecisionTree - ࠛউೠ 96% RandomForest - উੋ 95% PyCon APAC 2016 67
Random Forest ण 4 ӝࠄਵ۽ Decision Tree৬ ࠺त 4 max_depth,
min_samples_leaf ݽ؛ ࠂبܳ ઑ. ѱ द೧ࢲ ઑӘঀ ఃਕࠄ 4 n_estimator 4 աޖ(DT)ܳ ݻ Ӓܖ बਸ Ѫੋ Ѿ ! 4 ցޖ ݶ णदр ӡҊ, ցޖ ਵݶ Ӓր DTо غযߡܿ PyCon APAC 2016 68
RF ਊ റ Ѿҗ 4 ഛبח 95% 4 ࠗೞѱ ҅
߉ח ࢎ۹о হب۾ 4 predict_probaܳ ࢎਊ೧ ஏ ഛܫب Ҋ 4 ഛܫ ֫(>70%) ஏ Ѿҗ݅ ನೣ 4 ৈӝࢲ 10~20%ب അਯ(Recall) ೞۅ ୶ 4 Ӓ۞ա, ب(Precision)ח… PyCon APAC 2016 69
100% ׳ࢿ GMש ࣻসਵ۽ Ѩష೧ न Ѿҗ… ! PyCon APAC
2016 70
ওਵפ ઁܳ... 4 2ѐਘৈী Ѧ ઁ 4 ోਸ ࢎਊೠ ߁
ࠗ࠙ ࢎۄ! ! 4 ӝ/ࣘਵ۽ ઁܳ ೧ঠ ബҗо PyCon APAC 2016 71
ଵҊ: ୭ઙ ೖ ਃب PyCon APAC 2016 72
ѐࢶ ߑೱ 4 Ѩػ Ѿҗܳ ਊ೧ ण ݽ؛ ѐࢶ 4
ࠈ ҅ী ೠ PIIܳ ࣻ೧فݶ नӏ ࠈ णী ਊೡ Ѫ 4 ઁ റ ߸ઙ ࠈ ݽפఠ݂ ਃ PyCon APAC 2016 73
റӝ PyCon APAC 2016 74
ו՛ 4 ؘఠ ࣻࠗఠ оҕ, ࠙ࢳө ݽٚ җਸ ॆਵ۽
! 4 Jupyter ֢࠘ਸ ాೠ ఐ࢝ ؘఠ ࠙ࢳ " 4 ؊ নೠ ࠙ঠী ӝ҅ णਸ ഝਊ оמೡ ٠ PyCon APAC 2016 75
ӝ҅ण बച 4 Ө ח ഝਊਸ ਤ೧ ӝࠄ ۿਸ ؊
ҕࠗೞ ! 4 જ Hypothesisܳ ٜ݅ ࣻ ѱ ػ 4 ୭ചܳ ೡ ࣻ ѱ ػ 4 ೞա ࢚ ঌҊ્ܻਸ ࢎਊ೧ ࠁ 4 SVM, Neural Net ١ নೠ ࠙ܨӝ 4 Super Learner ߑधਵ۽ ঔ࢚࠶ PyCon APAC 2016 76
ࣁਘ ൗ۞... ࢜۽ ۽Ӓ ࣻ/࠙ࢳ ജ҃ 4 RSync ߑध ->
Fluentd/Kinesis पदр ۽Ӓ ࣻ 4 gzipػ CSV -> Parquet ನݘਵ۽ S3 4 Columnar ߄ցܻ ನݘ, 30x ࣘب ೱ࢚ 4 MRJob -> PySpark 4 ъ۱ೠ ࠙ ܻ / Cache ӝמ(߈ࠂ णী ъ) 4 ױࣘ Spark ۞झఠ(20 VMs = 80য, 320GB ۔)۽ ਊ (दр 3000ਗ ب) PyCon APAC 2016 77
ઑ 4 ӝ҅ण ղо ೞ۰ח ੌী ೠ ౸ױ ! 4
য࠭ ౠࢿ ױࣽೞݶ ాੋ ߑߨਵ۽ оמ 4 ఐ࢝ ؘఠ ࠙ࢳਸ ా೧ ౠࢿਸ ݢ ঈೞ 4 নೠ ݽ؛/ೖܳ పझ೧ࠁ 4 ण ݽ؛ী ٮۄ ೖ ӏച/Үചо ਃೡ ࣻ ਵפ 4 ېझр Imbalance ޙઁী PyCon APAC 2016 78
٩۞? ӝ҅ण? 4 ٩۞ 4 Үೠ ೖ ূפয݂ ਃ হ
4 ݆ ಁ۞ఠ = ݆ ؘఠо ਃ 4 ӝ҅ण 4 ೖ স ਃೞ݅ 4 ಁ۞ఠ = ؘఠ۽ب ബ җ PyCon APAC 2016 79
࢚ 4 ؘఠ ূפয݂ য۰ 4 ؘఠ ഛࠁо о ਃ
4 झನۄܳ ߉ח ࠙ঠח য়۰ ݎ যف 4 োҳо ইפۄݶ ҷڣস/࢜ ؘఠঠ݈۽ ࠶ܖয়࣌ 4 ݽٚ ഥࢎী ؘఠ ࠙ࢳоо ਃೠ द 4 ஹೊఠо ݽٚ ݽ؛/߸ࣻ ઑਸ పझ ೡ ࣻ ݶ? ! PyCon APAC 2016 80
ਵ۽... ࢎ োҙ(Spurious Correlations) 4 पઁ۽ח োҙ হ݅, ח Ѫۢ
ࠁח ҃ 4 ؘఠী݅ ೞ ݈Ҋ, بݫੋਸ ೧ೞ! PyCon APAC 2016 81
хࢎפ. PyCon APAC 2016 82
ଵҊ ݂ 4 http://www.aladin.co.kr/shop/wproduct.aspx?ItemId=28946323 4 http://www.tylervigen.com/spurious-correlations 4 http://scikit-learn.org/stable/modules/tree.html 4 http://www.cimerr.net/conference/board/data/conference/1331626266/P15.pdf
4 http://stackoverflow.com/questions/20463281/- how-do-i-solve-overfitting-in-random-forest-- of-python-sklearn 4 http://stats.stackexchange.com/questions/131255/class-imbalance-in-supervised-machine-learning 4 https://www.quora.com/Is-Scala-a-better-choi- ce-than-Python-for-Apache-Spark 4 http://statkclee.github.io/data-science/data- -handling-pipeline.html 4 https://databricks.com/blog/2016/01/25/deep-- learning-with-spark-and-tensorflow.html- PyCon APAC 2016 83