Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
기계학습을 활용한 게임 어뷰징 검출
Search
JeongJu Kim
August 16, 2016
Technology
1
1.1k
기계학습을 활용한 게임 어뷰징 검출
PyConAPAC 2016에서 발표한 문서입니다.
JeongJu Kim
August 16, 2016
Tweet
Share
More Decks by JeongJu Kim
See All by JeongJu Kim
IPython과 Pandas를 활용한 게임데이터 분석 - PyConKR 2014
haje01
7
1.8k
Other Decks in Technology
See All in Technology
マルチプロダクト×マルチテナントを支えるモジュラモノリスを中心としたアソビューのアーキテクチャ
disc99
0
290
ロールが細分化された組織でSREと協働するインフラエンジニアは何をするか? / SRE Lounge #18
kossykinto
0
170
大規模イベントに向けた ABEMA アーキテクチャの遍歴 ~ Platform Strategy 詳細解説 ~
nagapad
0
190
Kiroでインフラ要件定義~テスト を実施してみた
nagisa53
3
300
Vision Language Modelと自動運転AIの最前線_20250730
yuyamaguchi
3
1.1k
Mambaで物体検出 完全に理解した
shirarei24
2
210
モバイルゲームの開発を支える基盤の歩み ~再現性のある開発ラインを量産する秘訣~
qualiarts
0
1.1k
オブザーバビリティプラットフォーム開発におけるオブザーバビリティとの向き合い / Hatena Engineer Seminar #34 オブザーバビリティの実現と運用編
arthur1
0
340
Findy Freelance 利用シーン別AI活用例
ness
0
300
Kiroから考える AIコーディングツールの潮流
s4yuba
4
660
AI によるドキュメント処理を加速するためのOCR 結果の永続化と再利用戦略
tomoaki25
0
390
AIのグローバルトレンド 2025 / ai global trend 2025
kyonmm
PRO
1
120
Featured
See All Featured
Connecting the Dots Between Site Speed, User Experience & Your Business [WebExpo 2025]
tammyeverts
8
420
Responsive Adventures: Dirty Tricks From The Dark Corners of Front-End
smashingmag
251
21k
Fashionably flexible responsive web design (full day workshop)
malarkey
407
66k
Building Better People: How to give real-time feedback that sticks.
wjessup
367
19k
Faster Mobile Websites
deanohume
308
31k
Fireside Chat
paigeccino
38
3.6k
Large-scale JavaScript Application Architecture
addyosmani
512
110k
Improving Core Web Vitals using Speculation Rules API
sergeychernyshev
18
1k
Building a Scalable Design System with Sketch
lauravandoore
462
33k
Refactoring Trust on Your Teams (GOTO; Chicago 2020)
rmw
34
3.1k
Design and Strategy: How to Deal with People Who Don’t "Get" Design
morganepeng
130
19k
A Tale of Four Properties
chriscoyier
160
23k
Transcript
ӝ҅णਸ ഝਊೠ ѱ য࠭ Ѩ ӣ PyCon APAC 2016 PyCon
APAC 2016 1
ߊ ࣗѐ ӣ (
[email protected]
) : ѱ ѐߊ - NHN /
NPLUTO - 3D ূ / ѱ ۄ ѐߊ അ: ѱ ؘఠ ࣻ / ࠙ࢳ - Webzen NPlay - ۽Ӓ ನਕ؊, Pandas, Scikit-Learn, PySpark PyCon APAC 2016 2
ߊח 4 ӝ҅णী ೠ ӝࠄ ध ח ٜ࠙ਸ ࢚
4 ॆਸ ഝਊೠ ؘఠ ࠙ࢳҗ ӝ҅ण ࢎ۹ܳ ҕਬ 4 ѐߊҗ ࢲ࠺झী ӝ҅णਸ بੑೞח ҅ӝо غਵݶ פ PyCon APAC 2016 3
द زӝ 4 ѱ য࠭ ઁܳ 4 ਬ नҊ /
GM ݽפఠ݂ / ಁఢ ӝ۽ח ೠ҅ 4 ࢎۈ ѐੑ ୭ࣗചػ য࠭ ఐ दझమਸ ٜ݅ PyCon APAC 2016 4
ѱ য࠭ۆ? 4 “ӝദਵ۽ بೞ ঋ ߑधਵ۽ ѱ ࠁܳ
ദٙೞѢա ب ਸ ח ೯ਤ” ! 4 ࢎ۹ 4 ࢲ࠺झ ҳഅ࢚ ਸ ਊೠ ۨ 4 ೧ఊ ోਸ ࢎਊೠ ࠺࢚ ۨ 4 ହী بߓ۽ ҟҊ PyCon APAC 2016 5
ా҅৬ ఐ࢝ ؘఠ ࠙ࢳ PyCon APAC 2016 6
ࢶ, ా҅ 4 ా҅ח ೂࠗೞ ޅೠ ؘఠ৬ ஹೊ ਕ ജ҃ীࢲ
ߊ 4 ా҅ ٜ ؘఠ/҅ਸ ח ߑߨਸ োҳ 4 ৌঈೠ ജ҃ীࢲ ٜ݅যӝী, ؘఠীࢲب оܳ ߊѼೡ ࣻ 4 ӝࠄੋ ా҅ ध ѐߊ, ӝദ, ࢲ࠺झ ١ী ب ؽ PyCon APAC 2016 7
ఐ࢝ ؘఠ ࠙ࢳ 4 ؘఠী ऀযח ࠁܳ, 4 নೠ пب۽
ਃড, दпച ೧ࠁݴ ח җ ! 4 ೞח ؘఠח җࠗఠ 4 दझమ(WzDat) ѐߊ೧ ഝਊ " 4 Jupyter + Utility + Dashboard 4 https://github.com/haje01/wzdat 4 http://www.pycon.kr/2014/program/14 PyCon APAC 2016 8
ࢎ۹1 рױೠ ా҅ ই٣য۽ झಁݠ Ѩ PyCon APAC 2016 9
࢚ട 4 नӏ য়ೠ ѱ ହ ѱ ইమ ҟҊӖ۽ оٙ
! 4 ೧ ҅ਸ ઁ೧ب ߄۽ ࢜ ҅ਵ۽ ҟҊ ҅ࣘ 4 ࡅܲ ઁо ਃೞৈ, ӝ҅णਸ ೯ೞӝীח दр ࠗ PyCon APAC 2016 10
ਸ ਊೠ झಅ (Spam) 4 ѱ ղীࢲ ਵ۽ ݠפ/ইమ
౸ݒ ҟҊ 4 য࠭ח ۽Ӓ۔ ౸߹ਸ ݄ӝਤ೧ ݫ दܳ դةച PyCon APAC 2016 11
झಁݠ Ѩ 4 নೠ ߑߨ оמೞѷਵա, 4 োয ܻա ӝ҅णэ
Ҋә Ӕࠁ, 4 рױೠ ా҅ ই٣য۽ दب PyCon APAC 2016 12
ৡۄੋ ݫद ӡ ࠙ನ 4 ੌ߈ਵ۽ ۽Ӓ ӏ࠙ನܳ
ٮܲҊ ঌ ۰ઉ. 4 ইېח NPS Chat Corpus ݫद ӡ ࠙ನ PyCon APAC 2016 13
ѱ ղ ݫद ӡ ࠙ನ 4 ৡۄੋ җ ࠺तೞա
ખ ؊ فԁ ҃ ೱ 4 ౠ ӡ ݫदо (?) → झಅਵ۽ ഛੋ PyCon APAC 2016 14
ই٣য 4 ੌ߈ ਬ: ݫद ӡо নೞҊ, ࠼بо ݆ ঋ
4 झಁݠ: ݫद ӡо নೞ ঋҊ, ࠼بח ֫ 4 , যڃ ਬ ࠼بо ֫Ҋ ӡо নೞ ঋਵݶ झಁݠ PyCon APAC 2016 15
рױೠ Ѩ ҕध 4 ਬ ߹ പࣻ / ݫद
ӡ ઙܨ ࣻ 4 ࠺तೠ ӡ ݫदܳ ࠁյ ࣻ۾ ч ழ PyCon APAC 2016 16
࠙ܨ 4 spam_ratioо ӝળ ч ࢚ੋ Ѫਸ झಁݠ۽ р 4
ӝળ ч Ѿ ോܻझ౮ೞѱ... 4 ࢸ റ, ࠙ܨػ நܼఠ ݫद ഛੋਵ۽ ч ઑ PyCon APAC 2016 17
࠙ܨ റ ݫद ӡ ࠙ನ 4 ࠼بо ֫ ౠ ӡ
ݫद(= झಅ)о ܻ࠙غ PyCon APAC 2016 18
Ѿҗ ਊ 4 ҳഅ рױ೮݅, য়ఐ оמࢿ 4 ӝળ
чਸ ֫ѱ ই न܉بܳ ֫ 4 Ѿҗܳ оҊ ઁ PyCon APAC 2016 19
ѐࢶ ߑೱ 4 ӝળ ч Ѿਸ ખ ؊ җੋ ߑߨਵ۽
4 োয ܻ ӝࣿ(NLP) بੑ 4 ױয߹ ࠼ب(Ziff’s Law)৬ ਃب(TF-IDF) Ҋ۰ 4 ӝ҅ण ঌҊ્ܻ ਊ PyCon APAC 2016 20
ӝ҅ण ࣗѐ PyCon APAC 2016 21
ӝ҅णਸ ॳח ਬ 4 ֢۱ਵ۽ ҡଳ Ѿҗޛ 4 নೠ
ޙઁী ೠ ੌ߈ੋ ࣛܖ࣌ 4 ࣻ ౠࢿ(ೖ)ਸ زदী Ҋ۰ೡ ࣻ 4 ؘఠ ߸زী ъೣ(ъѤࢿ) PyCon APAC 2016 22
࠙ܨ৬ ഥӈ 4 ӝ҅ण ѱ ࠙ܨ (Classification)৬ ഥӈ (Regression)۽ ա
4 ࠙ܨ - ઙܨܳ ஏ ೞח Ѫ 4 ഥӈ - োࣘػ чਸ ஏ ೞח Ѫ 4 য࠭ Ѩ ࠙ܨী ࣘೣ PyCon APAC 2016 23
ب णҗ ਯ ण 4 ب ण(Supervised Learning) 4 ӝઓ
҃ী ೧ ࠙ܨػ ࢠ ؘఠо ਸ ٸ 4 ਯ ण(Unsupervised Learning) 4 ࠙ܨػ ࢠ ؘఠо হਸ ٸ 4 ࠗ࠙ ؘఠח ࠙ܨغয ঋ → ಽযঠೡ ޙઁ PyCon APAC 2016 24
ӝ҅ण ঌҊ્ܻٜ 4 ӝࠄ 4 ܻפয/۽झ౮ ܻӒۨ࣌(Linear/Logistic Regression) 4 Ѿ
ܻ(Decision Tree) 4 Ҋә 4 ےؒ ನۨझ(Random Forest) 4 SVM(Support Vector Machine) 4 ੋҕ न҃ݎ(Neural Network) PyCon APAC 2016 25
ঌҊ્ܻ ࢶఖ? 4 ੌ߈ਵ۽ Ҋә ঌҊ્ܻ ؊ ࠂೠ ݽ؛ ण
оמ 4 Ӓ۞ա, Ҋә ঌҊ્ܻ ޖઑѤ જ Ѫ ইש 4 ण Ѿҗܳ ࢎۈ ೧ೞӝীח ӝࠄ ঌҊ્ܻ જ PyCon APAC 2016 26
ஏী ೠ ಣо 4 ഛࢿী ೠ о ਃ ! 4
Q: ਬ 100ݺ 2ݺ ח য࠭ܳ Ѩೞ۰ ೠ. पࣻ۽ ݽف ࢚ ਬ۽ ౸ױ೮ਸ ٸ ഛبח? 4 A: 100ݺ 2ݺ ౣ۷ਵפ… 98% !?#@ PyCon APAC 2016 27
ஏ ױਤ 4 ب(Precision) അਯ(Recall)җ ١ নೠ ױਤ 4 ب:
Ѫ ݃ա য࠭ੋо? 4 അਯ: য࠭ ݃ա ওחо? 4 ؘఠо ࠛӐഋ(Imbalance)ੌٸח ౠ ب৬ അਯਸ ೣԋ Ҋ۰೧ঠ 4 খ ҃ח അਯ 0 PyCon APAC 2016 28
P/R Curve ৬ AUC જ ࠙ܨӝח? PyCon APAC 2016 29
ࢎ۹2 ӝ҅णਵ۽ ߁ Ѩ PyCon APAC 2016 30
࢚ട 4 ۄ࠳ ѱীࢲ пઙ ೧ఊ ోਸ ࢎਊೠ ߁ ۨо
ഝѐ ! 4 ߁: ѱ ղ ചܳ ࠺ ࢚ੋ ߑߨਵ۽ णٙ 4 ࠈ ౠࢿਸ ೞա ل۽ ౠೞӝ য۰ → ӝ҅ण ਃ PyCon APAC 2016 31
ण ߑध ࢶఖ 4 Ҷ ۡ֔/٩۞ਵ۽ ೡ ਃח হח ٠…
4 җѢ ۽Ӓо غҊ Ҋ, 4 ஏীࢲ ӝઓ য࠭ நܼఠ ܻझܳ оҊ ! → ӝ҅ण, ౠ ب ण оמ! 4 Decision Tree ߑध ب णਵ۽ Ѿ PyCon APAC 2016 32
ળ࠺ җ 1. ۽Ӓ ࣻ ࢚క ഛੋ 2. ۽Ӓ ҳઑ/
ঈ 3. णਸ ਤೠ ೖ(Feature) ୶ PyCon APAC 2016 33
ӝ҅णب ۽Ӓ ࣻࠗఠ 4 ۽Ӓܳ ҅ਵ۽ ݽਵח Ѫب औ ঋ
4 ࠙ࢳ/णী Ѧܻח दр 10~20% ب 4 ؘఠܳ ݽਵҊ оҕೞחؘ ࠗ࠙ दр Ѧܽ. 4 ۽Ӓ ഋध оә Ӓ۽ ࢎਊ (झౚ٣য়ܳ ਤ೧… !) 4 ۽Ӓܳ ࠙ܨ೧ (ࢲߡ/۽Ӓ ઙܨ, द ߹۽) 4 ۄ٘ झషܻ(S3) ୶ୌ ☁ PyCon APAC 2016 34
ਦب ࢲߡীࢲ ۽Ӓ ࣻೞӝ 4 ѱ ࢲߡח ࠗ࠙ ਦب ӝ߈
4 য় ࣗझ જ ోٜ(fluentd, logstash ١)ਸ ॳҊ रਵա 4 ਦب ࢲߡী ࢸо औ ঋҊ, ੌࠗ ӝמ ࠗ 4 ѐߊ ! 4 https://github.com/haje01/wdfwd 4 ࢲߡী թ ۽Ӓ ੌਸ RSync۽ زӝೞѢա 4 ѱ DBী ࣘೞৈ Dump റ ࣠ PyCon APAC 2016 35
۽Ӓо ࣻ غਵݶ ೖܳ ٜ݅ 4 ೖ(Feature, ౠࢿ): ण ࢚
ౠਸ ࢸݺ೧ח ч 4 ) чਸ ஏೞח ҃ ! → ӝ, ߑೱ, ജ҃, Үా, ಞदࢸ ١ ೖ PyCon APAC 2016 36
ೖ ѐߊ(Feature Engineering) 4 (࠺)ഋ ؘఠীࢲ ೖܳ Ҋ ࢤࢿೞח স
4 ܲ ೖٜী ղػ ೖܳ ইղӝب ೣ 4 ٸ۽ח ࠂೠ ٘о ਃ(SQL۽ח ൨ٝ) 4 3ѐਘ ࠙ ۽Ӓীࢲ ೞنਸ ా೧ ೖ ࢤࢿ PyCon APAC 2016 37
ೞنਸ ॄঠ݅ ೞա? 4 ؘఠо Bigೞ ঋਵݶ ਃ হ 4
न… 4 ߓ Jobਸ য়ۖزউ جܻѢա 4 ӝਵ۽ ETLਸ ా೧ DBী ֍যفח җ ਃೡ ࣻ 4 ࠺ഋ/ਊ ؘఠীࢲ ࠼ߣೠ ೖ ѐߊਸ ೠݶ જ PyCon APAC 2016 38
যڌѱ ॄঠೞա? 4 ೞن ۞झఠܳ ҳ୷ೞৈ ࢎਊೡ ࣻب ਵա,
ࣇҗ ਊ য۰ 4 ۄ٘ ࢲ࠺झীࢲ ઁҕೞח ೞن ࢲ࠺झܳ ਊ ! - AWS EMR(Elastic Map Reduce) PyCon APAC 2016 39
AWSח ࠺ऱ ঋա? 4 ୭ച ೞݶ ࠺ऱ ঋ ! 4
ਃೡ ٸ݅ ॳח ױࣘ ۞झఠ(Transient Cluster)۽ ਊ 4 Task ֢٘ח ҃ݒ ߑध Spot Instance۽ 4 m4.xlarge(4 vCPU, 16 GiB RAM ): दр 0.036$ (ࢲ ܻ, 2016-08-09 ӝળ) PyCon APAC 2016 40
AWS EMR ۞झఠ द ചݶ PyCon APAC 2016 41
ೞنਸ ਤೠ ۽Ӓ оҕ 4 ೞن ੌ(< 100MB)ٜ ݆
Ѫী ஂড 4 ੌٜ ߽, ࣗ, ୷ೡ ਃ 4 ݃ٶೠ ోਸ ޅ೧ ѐߊ ! 4 https://github.com/haje01/mersoz 4 ߄Ո ੌ݅ স, ઓ ҙ҅ܳ Ҋ۰ೠ ߽۳ ܻ PyCon APAC 2016 42
ݠ, ࣗ & ୷ റ S3ী ػ ۽Ӓ PyCon APAC
2016 43
ೞن MapReduce ٬ - mrjob 4 Yelpীࢲ ݅ٚ Python ಁః
4 ೞن झܿਸ ਊ೧ ॆਵ۽ MR ٬ 4 ۽ஸীࢲ ࢠ ؘఠ۽ ѐߊೠ റ, EMRী ৢܿ ! 4 प೯ ࣘبח Javaߡ ࠁ ખ וܻ݅ ѐߊ ࣘبо ࡅܴ PyCon APAC 2016 44
from mrjob.job import MRJob import re WORD_RE = re.compile(r"[\w']+") class
MRWordFreqCount(MRJob): def mapper(self, _, line): # ۽Ӓ ੌ п ۄੋ for word in WORD_RE.findall(line): # ݽٚ ױযী ೧ yield word.lower(), 1 # 'ױয', 1 ߈ജ def combiner(self, word, counts): # ֢٘ Ѿҗܳ ஂ yield word, sum(counts) def reducer(self, word, counts): # ۞झఠ Ѿҗܳ ஂ yield word, sum(counts) if __name__ == '__main__': MRWordFreqCount.run() PyCon APAC 2016 45
दझమ ҳࢿب PyCon APAC 2016 46
അട ঈ 4 ӝ҅णਸ ਤ೧ 4 GM ઁೞח ӔѢ(=ೖ)৬ 4
ઁػ நܼఠ ܻझܳ ਃ PyCon APAC 2016 47
ೖ ࢤࢿ 4 ۽Ӓীࢲ நܼఠ ӝળਵ۽ ҳೣ 4 Үೠ
ೖࠁח নೠ ೖܳ 4 যରೖ ࠂਵ۽ ౸ױ 4 ୡӝীח ૣ दрী ೧, উചغݶ ӡѱ PyCon APAC 2016 48
ୡӝী ࡳইࠄ ೖٜ 4 ۽Ӓੋ ࣻ 4 ۨ दр 4
۽Ӓ ইਓ ࠛ࠙ݺೠ ҃о ݆ 4 ࣁ࣌ ইਓ بੑ: 5࠙ ⏱ 4 ইమ/ݠפ णٙ ࣻ 4 ௮झ ઙܐ ࣻ 4 NPC/PC р ై ࣻ PyCon APAC 2016 49
ೖ ఋੑ? 4 ѱ पࣻ ഋ, పҊܻ ഋ, ࠛܽ(Boolean) ഋਵ۽
աׇ 4 оә पࣻ ഋਵ۽ ాੌೞח Ѫ ߄ۈ 4 Bool 0, 1۽ 4 పҊܻ ఋੑ OneHotEncoderܳ ࢎਊ೧ पࣻഋਵ۽ PyCon APAC 2016 50
ٜ݅য ೖ 4 ױࣽ ఫझ (.txt) ੌ 4 நܼఠݺ
+ ೖ ߓৌ ഋध PyCon APAC 2016 51
ӝ҅ण ೯ PyCon APAC 2016 52
ӝ҅ण о߶ 4 ୭ઙ ೖ ੌ ӝо Ҋ, ӝ҅ण
ࣻ೯ب о߶ ಞ 4 ۽ஸ PCীࢲ ࣻ೯ 4 ୶ୌ दझమۢ ݽٚ ؘఠܳ ࠊঠೞח ण ޖѢ Ѫ 4 ݽ؛ਸ ࢶఖೞҊ ୭ ೞಌ ಁ۞ఠܳ Ѿೞח Ѫ җઁ 4 নೠ ࣇਵ۽ ৈ۞ߣ प೧ࠊঠ 4 ࠙ दझమਸ ഝਊೞח ҃ب... PyCon APAC 2016 53
যڃ ঌҊ્ܻ ݽ؛ਸ ࢶఖೡ Ѫੋо? 4 द рױೠ Ѫਵ۽ 4
࠺तೠ ࢎ۹ ࢶ೯ োҳо ਵݶ ଵҊೞ 4 AUCա ROCܳ ాೠ ݽ؛ ಣо ߂ ࢶఖ PyCon APAC 2016 54
Decision Tree۽ द 4 ࠂೞ ঋҊ ౸ױ җ ೧о ਊ
4 ॆ Scikit-Learn ಁః Ѫਸ ࢎਊ 4 নೠ ӝ҅ण ঌҊ્ܻਸ प ઁҕ 4 ੋఠಕझо ాੌغয য ݽ؛ Үо ਊ 4 ೖ(X)৬ য࠭ ৈࠗ(y)ܳ ֍Ҋ ण 4 DTח ೖ ӏച ਃ হয ಞܻ PyCon APAC 2016 55
DT ࢎਊ (ࠠԢ ࠙ܨ) from sklearn.datasets import load_iris from
sklearn import tree iris = load_iris() clf = tree.DecisionTreeClassifier() clf = clf.fit(iris.data, iris.target) >>> clf.predict(iris.data[:1, :]) array([0]) PyCon APAC 2016 56
PyCon APAC 2016 57
Decision Tree ण җ 1. ೖ ੌীࢲ ӝઓ য࠭ ೖܳ
Ҋ 2. زࣻ ࢚ ਬ ೖ ҳೣ 4 Under Sampling 3. ؘఠܳ Train/Test ࣇਵ۽ ա־Ҋ 4. ӝࠄ ಁ۞ఠ۽ ण द PyCon APAC 2016 58
ୡӝ Ѿҗ 4 ಣӐ ഛب 80% ب 4 Binary Class
࠙ܨ ҃ ࣻо ੜ աয়ח ಞ 4 աࢁ ঋѪ э݅, 4 ஏ Ѿҗо ઁ ӔѢ۽ ॳੋח ীࢲ ݆ ࠗ PyCon APAC 2016 59
ഛبܳ ৢܻ 4 Үର Ѩૐ(Cross Validation)ਸ ਤ೧ ؘఠ ࣇਸ ܻ࠙
ೞҊ 4 GridSearchCVܳ ా೧ ୭ ೞಌ ಁ۞ఠܳ 4 ಣӐ ഛب 91%۽ ೱ࢚ 4 যڃ ӝળਵ۽ ౸ױೞח ೠ ߣ ࠁҊ र tree.export_graphviz۽ Ӓ۰ࠆ PyCon APAC 2016 60
PyCon APAC 2016 61
Ѿ ܻܳ ࠁפ... 4 णػ ݽ؛ যڃ ӝળਵ۽ ౸ױೞח ঌ
ࣻ → নೠ ҵ ࢎۈٜী ҕਬ оמ ! 4 ೞࠗ۽ ղ۰т ࣻ۾ ࠂ೧ח ޙઁ 4 DTח җ(Overfitting)غӝ औӝী, Depthо ցޖ Ө ঋѱ PyCon APAC 2016 62
ৈӝࢲ ؊ ࢚ ࣻо ৢۄо ঋ 4 GMשҗ ࢚ റ
࢜۽ ೖٜ ୶о 4 زदী ইమ/ݠפ ࣻ 4 ݗ ߈ࠂ പࣻ 4 ౠ ېझ݅ ࢶఖ 4 ঋҊ ইమਸ ࣻ 4 դ೧೧ ࠁח Ѫٜب ೖ۽ ٜ݅ ࣻ ח Ѫ ֢ೞ 4 ) 'ࠈ ےؒೞѱ ࢤࢿػ ܴਸ оҊ যਃ'' PyCon APAC 2016 63
) நܼఠ ܴ ےؒࢿ ౸ױ (/ݽ അ ಁఢ) ## நܼఠ
ܴ ߊ оמೠ ౸ױೞח गب ٘ # ܴਸ ݽ बࠅ۽ ߄Է(1о , 2о ݽ) # ) anything -> ‘21211211’ symbols = get_cv_symbols(char_name) # җ э ಁఢ ਵݶ ߊ оמ (प۽ח ؊ ন) if ‘2121’ or ‘2112’ or ‘1121’ or ‘22122’, … in symbols: can_pron = False else: can_pron = True PyCon APAC 2016 64
ഛೠ ߑߨ ইפ݅... ࠂਵ۽ ౸ױೞӝী ب ؽ PyCon APAC 2016
65
୶о ೖ۽ झযо ೱ࢚, Ӓ۞ա… 4 ಣӐ ഛب 96%۽ ೱ࢚.
ࣻח ֫ ಞ݅, 4 प ਊ೧ࠄ Ѿҗ 4 GMש ഛੋ җীࢲ য়ఐ Ԩ ա১ ! 4 DecisionTree Ҋੋ җ ޙઁ۽ ౸ױ PyCon APAC 2016 66
Random Forest۽ Ү 4 ݆ Decision Tree ܳ ઑೠ ঔ࢚࠶
పץ 4 ࣻ DTܳ ࠙ ण(=ӏച ബҗ) दఃҊ ైೞח ߑध 4 ࣻо ծইب উੋ Ѿҗ 4 DecisionTree - ࠛউೠ 96% RandomForest - উੋ 95% PyCon APAC 2016 67
Random Forest ण 4 ӝࠄਵ۽ Decision Tree৬ ࠺त 4 max_depth,
min_samples_leaf ݽ؛ ࠂبܳ ઑ. ѱ द೧ࢲ ઑӘঀ ఃਕࠄ 4 n_estimator 4 աޖ(DT)ܳ ݻ Ӓܖ बਸ Ѫੋ Ѿ ! 4 ցޖ ݶ णदр ӡҊ, ցޖ ਵݶ Ӓր DTо غযߡܿ PyCon APAC 2016 68
RF ਊ റ Ѿҗ 4 ഛبח 95% 4 ࠗೞѱ ҅
߉ח ࢎ۹о হب۾ 4 predict_probaܳ ࢎਊ೧ ஏ ഛܫب Ҋ 4 ഛܫ ֫(>70%) ஏ Ѿҗ݅ ನೣ 4 ৈӝࢲ 10~20%ب അਯ(Recall) ೞۅ ୶ 4 Ӓ۞ա, ب(Precision)ח… PyCon APAC 2016 69
100% ׳ࢿ GMש ࣻসਵ۽ Ѩష೧ न Ѿҗ… ! PyCon APAC
2016 70
ওਵפ ઁܳ... 4 2ѐਘৈী Ѧ ઁ 4 ోਸ ࢎਊೠ ߁
ࠗ࠙ ࢎۄ! ! 4 ӝ/ࣘਵ۽ ઁܳ ೧ঠ ബҗо PyCon APAC 2016 71
ଵҊ: ୭ઙ ೖ ਃب PyCon APAC 2016 72
ѐࢶ ߑೱ 4 Ѩػ Ѿҗܳ ਊ೧ ण ݽ؛ ѐࢶ 4
ࠈ ҅ী ೠ PIIܳ ࣻ೧فݶ नӏ ࠈ णী ਊೡ Ѫ 4 ઁ റ ߸ઙ ࠈ ݽפఠ݂ ਃ PyCon APAC 2016 73
റӝ PyCon APAC 2016 74
ו՛ 4 ؘఠ ࣻࠗఠ оҕ, ࠙ࢳө ݽٚ җਸ ॆਵ۽
! 4 Jupyter ֢࠘ਸ ాೠ ఐ࢝ ؘఠ ࠙ࢳ " 4 ؊ নೠ ࠙ঠী ӝ҅ णਸ ഝਊ оמೡ ٠ PyCon APAC 2016 75
ӝ҅ण बച 4 Ө ח ഝਊਸ ਤ೧ ӝࠄ ۿਸ ؊
ҕࠗೞ ! 4 જ Hypothesisܳ ٜ݅ ࣻ ѱ ػ 4 ୭ചܳ ೡ ࣻ ѱ ػ 4 ೞա ࢚ ঌҊ્ܻਸ ࢎਊ೧ ࠁ 4 SVM, Neural Net ١ নೠ ࠙ܨӝ 4 Super Learner ߑधਵ۽ ঔ࢚࠶ PyCon APAC 2016 76
ࣁਘ ൗ۞... ࢜۽ ۽Ӓ ࣻ/࠙ࢳ ജ҃ 4 RSync ߑध ->
Fluentd/Kinesis पदр ۽Ӓ ࣻ 4 gzipػ CSV -> Parquet ನݘਵ۽ S3 4 Columnar ߄ցܻ ನݘ, 30x ࣘب ೱ࢚ 4 MRJob -> PySpark 4 ъ۱ೠ ࠙ ܻ / Cache ӝמ(߈ࠂ णী ъ) 4 ױࣘ Spark ۞झఠ(20 VMs = 80য, 320GB ۔)۽ ਊ (दр 3000ਗ ب) PyCon APAC 2016 77
ઑ 4 ӝ҅ण ղо ೞ۰ח ੌী ೠ ౸ױ ! 4
য࠭ ౠࢿ ױࣽೞݶ ాੋ ߑߨਵ۽ оמ 4 ఐ࢝ ؘఠ ࠙ࢳਸ ా೧ ౠࢿਸ ݢ ঈೞ 4 নೠ ݽ؛/ೖܳ పझ೧ࠁ 4 ण ݽ؛ী ٮۄ ೖ ӏച/Үചо ਃೡ ࣻ ਵפ 4 ېझр Imbalance ޙઁী PyCon APAC 2016 78
٩۞? ӝ҅ण? 4 ٩۞ 4 Үೠ ೖ ূפয݂ ਃ হ
4 ݆ ಁ۞ఠ = ݆ ؘఠо ਃ 4 ӝ҅ण 4 ೖ স ਃೞ݅ 4 ಁ۞ఠ = ؘఠ۽ب ബ җ PyCon APAC 2016 79
࢚ 4 ؘఠ ূפয݂ য۰ 4 ؘఠ ഛࠁо о ਃ
4 झನۄܳ ߉ח ࠙ঠח য়۰ ݎ যف 4 োҳо ইפۄݶ ҷڣস/࢜ ؘఠঠ݈۽ ࠶ܖয়࣌ 4 ݽٚ ഥࢎী ؘఠ ࠙ࢳоо ਃೠ द 4 ஹೊఠо ݽٚ ݽ؛/߸ࣻ ઑਸ పझ ೡ ࣻ ݶ? ! PyCon APAC 2016 80
ਵ۽... ࢎ োҙ(Spurious Correlations) 4 पઁ۽ח োҙ হ݅, ח Ѫۢ
ࠁח ҃ 4 ؘఠী݅ ೞ ݈Ҋ, بݫੋਸ ೧ೞ! PyCon APAC 2016 81
хࢎפ. PyCon APAC 2016 82
ଵҊ ݂ 4 http://www.aladin.co.kr/shop/wproduct.aspx?ItemId=28946323 4 http://www.tylervigen.com/spurious-correlations 4 http://scikit-learn.org/stable/modules/tree.html 4 http://www.cimerr.net/conference/board/data/conference/1331626266/P15.pdf
4 http://stackoverflow.com/questions/20463281/- how-do-i-solve-overfitting-in-random-forest-- of-python-sklearn 4 http://stats.stackexchange.com/questions/131255/class-imbalance-in-supervised-machine-learning 4 https://www.quora.com/Is-Scala-a-better-choi- ce-than-Python-for-Apache-Spark 4 http://statkclee.github.io/data-science/data- -handling-pipeline.html 4 https://databricks.com/blog/2016/01/25/deep-- learning-with-spark-and-tensorflow.html- PyCon APAC 2016 83