Detecting Game Abuse with Machine Learning

This deck was presented at PyCon APAC 2016.

JeongJu Kim

August 16, 2016

Transcript

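  1. Speaker introduction

    - JeongJu Kim ([email protected])
    - Previously: game development at NHN / NPLUTO (3D engine and game client development)
    - Currently: game data collection and analysis at Webzen NPlay (log forwarder, Pandas, Scikit-Learn, PySpark)
    PyCon APAC 2016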
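  2. About this talk

    - Aimed at people who already have basic knowledge of machine learning
    - Shares a case study of data analysis and machine learning with Python
    - I hope it prompts you to bring machine learning into your own development and services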
  3. Motivation

    - Sanctioning game abuse through user reports, GM monitoring, and manual pattern hunting has its limits
    - Let's build an abuse detection system that keeps human intervention to a minimum
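  4. What is game abuse?

    - "Acquiring game information in bulk, or helping others do so, in ways the game design did not intend"
    - Examples
      - Play that exploits loopholes in the service implementation
      - Abnormal play using hacking tools
      - Flooding the global chat window with advertisements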
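  5. First, statistics

    - Statistics developed in an environment where data and computing power were scarce
    - Statisticians studied ways to reduce the data and computation required
    - Because it was built under such harsh conditions, it can find value even in small data sets
    - Basic statistical knowledge is a big help in development, game design, and service operation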
  6. Exploratory data analysis

    - The process of finding information hidden in data by summarizing and visualizing it from various angles
    - Start here whenever you meet a data set for the first time
    - We built and use our own system (WzDat)
      - Jupyter + Utility + Dashboard
      - https://github.com/haje01/wzdat
      - http://www.pycon.kr/2014/program/14
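  7. Situation

    - The chat window of a newly launched game was full of game item advertisements
    - Even when the accounts were sanctioned, the ads continued right away from new accounts
    - Sanctions had to be fast, so there was not enough time to set up machine learning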
  8. Spam through chat

    - Ads selling in-game money and items are sent over the global chat channel
    - Abusers obfuscate their messages to block programmatic detection
  9. Detecting spammers

    - Many approaches are possible,
    - but rather than an advanced approach such as natural language processing or machine learning,
    - we tried a simple statistical idea
  10. Distribution of online chat message lengths

    - It is generally known to follow a log-normal distribution.
    - The slide plots the message length distribution of the NPS Chat Corpus.
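  11. Distribution of in-game chat message lengths

    - Similar to general online chat, but the distribution tends to be somewhat thicker
    - Messages of certain specific lengths spike (?) → confirmed to be spam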
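  12. The idea

    - Normal user: message lengths vary, and messages are not very frequent
    - Spammer: message lengths do not vary, and messages are frequent
    - In other words, a user who chats frequently with little variety in message length is a spammer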
  13. A simple detection formula

    - Per user: number of chat messages / number of distinct message lengths
    - The more often a user sends chat messages of similar length, the larger the value
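  14. Classification

    - Users whose spam_ratio is at or above a threshold are treated as spammers
    - The threshold is chosen heuristically...
    - Set it roughly, then tune it by reading the messages of the characters that were flagged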
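The formula from slide 13 and the threshold rule from slide 14 can be sketched as follows. This is a minimal illustration, not the speaker's code: the `(user, message)` log layout, the function names, the toy messages, and the threshold of 3.0 are all assumptions.

```python
from collections import defaultdict


def spam_ratios(chat_log):
    """Return {user: chat count / number of distinct message lengths}."""
    counts = defaultdict(int)
    lengths = defaultdict(set)
    for user, message in chat_log:
        counts[user] += 1
        lengths[user].add(len(message))
    return {user: counts[user] / len(lengths[user]) for user in counts}


def find_spammers(chat_log, threshold):
    """Flag users whose spam_ratio is at or above the threshold."""
    ratios = spam_ratios(chat_log)
    return {user for user, ratio in ratios.items() if ratio >= threshold}


# Hypothetical toy log of (user, message) pairs.
chat_log = [
    ("alice", "hi"),
    ("alice", "anyone up for a raid?"),
    ("alice", "brb"),
    ("seller", "G0LD 4 SALE w w w . x"),
    ("seller", "G0LD 4 SALE w w w . y"),
    ("seller", "G0LD 4 SALE w w w . z"),
    ("seller", "G0LD 4 SALE w w w . q"),
]

ratios = spam_ratios(chat_log)
spammers = find_spammers(chat_log, threshold=3.0)  # threshold is an assumed value
print(ratios)
print(spammers)
```

Obfuscated ads tend to keep the same length even when individual characters change, which is why counting distinct lengths (rather than distinct texts) still separates the two users here.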
  15. Message length distribution after classification

    - The high-frequency messages of specific lengths (= spam) have been separated out
  16. Applying the result

    - The implementation was simple, but false positives are possible
    - We set the threshold high to raise confidence
    - Sanctions were applied based on this result
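  17. Directions for improvement

    - Choose the threshold in a more principled way
    - Introduce natural language processing (NLP)
      - Consider per-word frequency (Zipf's law) and importance (TF-IDF)
    - Apply machine learning algorithms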
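  18. Why use machine learning

    - Decent results for little effort
    - A general solution for a wide range of problems
    - It can consider many characteristics (features) at the same time
    - It is robust to variation in the data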
  19. Classification and regression

    - Machine learning divides broadly into classification and regression
    - Classification: predicting a category
    - Regression: predicting a continuous value
    - Abuse detection is a classification problem
  20. Supervised and unsupervised learning

    - Supervised learning
      - When sample data already labeled from prior experience is available
    - Unsupervised learning
      - When no labeled sample data is available
    - Most data is not properly labeled → a problem that has to be solved
  21. Machine learning algorithms

    - Basic
      - Linear / logistic regression
      - Decision tree
    - Advanced
      - Random forest
      - SVM (support vector machine)
      - Artificial neural network
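  22. Which algorithm to choose?

    - Advanced algorithms can generally learn more complex models
    - However, an advanced algorithm is not automatically the better choice
    - Basic algorithms are better when people need to understand what was learned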
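  23. Evaluating predictions

    - We need a definition of "accuracy"
    - Q: We want to detect the 2 abusers among 100 users. If the model mistakenly judges everyone to be a normal user, what is the accuracy?
    - A: It got 2 out of 100 wrong, so... 98%!?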
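  24. Evaluation measures

    - There are various measures, such as precision and recall
    - Precision: of the users we flagged, how many are real abusers?
    - Recall: of all abusers, how many did we find?
    - When the data is imbalanced, precision and recall must be considered together
    - In the previous example, recall is 0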
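The 100-user example can be worked through directly. The sketch below uses plain Python rather than a library; the helper names are made up for illustration, and returning 0.0 for precision when nothing is flagged is a convention chosen here, since the ratio is undefined in that case.

```python
def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives, with 1 meaning 'abuser'."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn


def accuracy(y_true, y_pred):
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    return (tp + tn) / (tp + fp + fn + tn)


def precision(y_true, y_pred):
    tp, fp, _, _ = confusion_counts(y_true, y_pred)
    # Undefined when nothing is flagged; 0.0 is an assumed convention.
    return tp / (tp + fp) if (tp + fp) else 0.0


def recall(y_true, y_pred):
    tp, _, fn, _ = confusion_counts(y_true, y_pred)
    return tp / (tp + fn) if (tp + fn) else 0.0


# 100 users, 2 of them abusers; the model calls everyone normal.
y_true = [1] * 2 + [0] * 98
y_pred = [0] * 100

print(accuracy(y_true, y_pred))   # 0.98
print(recall(y_true, y_pred))     # 0.0
```

The accuracy looks excellent while the model finds no abusers at all, which is why accuracy alone is misleading on imbalanced data.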
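  25. Situation

    - In a live game, farming with all kinds of hacking tools was rampant
    - Farming: acquiring in-game goods by abnormal means
    - The characteristics of a bot are hard to pin down with one or two rules → machine learning is needed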
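  26. Choosing the learning approach

    - There seemed to be no need to go as far as neural nets / deep learning...
    - Past logs were already being stored, and
    - the operations team had a list of known abuser characters → machine learning, and supervised learning in particular, was possible
    - We settled on supervised learning with a Decision Tree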
  27. Preparation

    1. Check the state of log collection
    2. Understand the structure and meaning of the logs
    3. Extract features for training
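  28. Machine learning also starts with log collection

    - Collecting logs systematically is not easy either
    - Analysis and training take only about 10~20% of the time
    - Most of the time goes into collecting and processing the data
    - Use the log format as it is wherever possible (for the studio's sake...)
    - Store logs properly categorized (by server / log type and by time)
    - Cloud storage (S3) is recommended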
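  29. Collecting logs from Windows servers

    - Most game servers run on Windows
    - We wanted to use good open-source tools (fluentd, logstash, etc.), but
    - they were not easy to install on Windows servers and lacked some features
    - So we developed our own
      - https://github.com/haje01/wdfwd
      - Syncs the log files left on the server with RSync, or
      - connects to the game DB, dumps it, and transfers the dump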
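  30. Once the logs are collected, build features

    - Feature: a value that describes a characteristic of the thing being learned
    - Example: when predicting house prices → the size, orientation, surroundings, transport, and amenities of the house are features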
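  31. Feature engineering

    - The work of finding and generating features from structured or unstructured data
    - Sometimes it means uncovering features latent in other features
    - Sometimes it needs complex code (hard to do in SQL)
    - We generated features with Hadoop from three months of logs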
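  32. Do you have to use Hadoop?

    - Not if your data is not "big"
    - Without it, though...
      - you may have to run batch jobs for a long time, or
      - periodically load the data into a DB through ETL
    - It is a good fit if you frequently develop features from unstructured or high-volume data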
  33. How should you use it?

    - You can build and run your own Hadoop cluster, but setup and operation are difficult
    - Use a Hadoop service offered by a cloud provider: AWS EMR (Elastic MapReduce)
  34. Isn't AWS expensive?

    - Not if you optimize
    - Use a transient cluster that exists only while you need it
    - Run Task nodes as auction-priced Spot Instances
    - m4.xlarge (4 vCPU, 16 GiB RAM): $0.036 per hour (Seoul region, as of 2016-08-09)
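  35. Preparing logs for Hadoop

    - Hadoop copes poorly with large numbers of small files (< 100 MB)
    - Small files need to be merged, sorted, and compressed
    - We could not find a suitable tool, so we wrote one
      - https://github.com/haje01/mersoz
      - Processes only changed files, in parallel, taking dependencies into account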
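  36. Coding Hadoop MapReduce with mrjob

    - A Python package made by Yelp
    - Write MapReduce jobs in Python on top of Hadoop Streaming
    - Develop locally against sample data, then run on EMR
    - Execution is somewhat slower than the Java version, but development is fast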
  37. mrjob example (word frequency count)

        from mrjob.job import MRJob
        import re

        WORD_RE = re.compile(r"[\w']+")


        class MRWordFreqCount(MRJob):

            def mapper(self, _, line):
                # For every word in each line of the log file,
                # emit ('word', 1).
                for word in WORD_RE.findall(line):
                    yield word.lower(), 1

            def combiner(self, word, counts):
                # Aggregate the results within one node.
                yield word, sum(counts)

            def reducer(self, word, counts):
                # Aggregate the results across the cluster.
                yield word, sum(counts)


        if __name__ == '__main__':
            MRWordFreqCount.run()
  38. Understanding the current situation

    - For machine learning, we asked the GMs for
      - the grounds on which they sanction (= features), and
      - the list of sanctioned characters
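  39. Tips for generating features

    - Compute them from the logs per character
    - Prefer a wide variety of features over a few elaborate ones
      - The model judges on the combination anyway
    - Start with a short time window, then lengthen it once things stabilize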
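  40. Features extracted at the start

    - Number of logins
    - Play time
      - Logouts are often not clearly recorded
      - Introduced a session timeout: 5 minutes
    - Number of items / amount of money acquired
    - Number of quests completed
    - Number of NPC/PC battles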
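One way to turn the 5-minute session timeout into a play-time feature is sketched below. The slide only states that a timeout was introduced; the event-timestamp input and the rule of skipping gaps longer than the timeout are assumptions made for illustration.

```python
from datetime import datetime, timedelta

SESSION_TIMEOUT = timedelta(minutes=5)  # the 5-minute value from the slide


def play_time(timestamps, timeout=SESSION_TIMEOUT):
    """Sum the gaps between consecutive events, skipping gaps over the timeout.

    A gap longer than the timeout is treated as the end of one session and
    the start of the next, so it does not count as play time.
    """
    ordered = sorted(timestamps)
    total = timedelta()
    for previous, current in zip(ordered, ordered[1:]):
        gap = current - previous
        if gap <= timeout:
            total += gap
    return total


# Hypothetical log event times for one character.
events = [
    datetime(2016, 8, 1, 10, 0),
    datetime(2016, 8, 1, 10, 2),
    datetime(2016, 8, 1, 10, 4),
    datetime(2016, 8, 1, 11, 0),  # 56-minute gap: a new session starts here
    datetime(2016, 8, 1, 11, 3),
]

total = play_time(events)
print(total)  # 0:07:00
```

This avoids depending on explicit logout records, which the slide notes are often missing.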
  41. What types do features have?

    - Broadly: real-valued, categorical, and Boolean
    - Unifying them as real values is preferable where possible
      - Booleans become 0 and 1
      - Categorical features become real-valued via OneHotEncoder
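A short sketch of both conversions with scikit-learn's `OneHotEncoder` follows. The character classes and the Boolean flag are hypothetical features invented for the example, not ones named in the talk.

```python
from sklearn.preprocessing import OneHotEncoder

# Hypothetical raw features: one categorical column and one Boolean column.
classes = [["warrior"], ["mage"], ["archer"], ["mage"]]
is_premium = [True, False, False, True]

encoder = OneHotEncoder()
# fit_transform returns a sparse matrix by default; toarray() makes it dense.
class_onehot = encoder.fit_transform(classes).toarray()

# Booleans become 0.0 / 1.0 and are appended as one more real-valued column.
rows = [list(onehot) + [float(flag)] for onehot, flag in zip(class_onehot, is_premium)]

print(list(encoder.categories_[0]))
print(rows[0])
```

Each category gets its own 0/1 column, so the model never treats the category codes as ordered numbers.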
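  42. The machine learning itself is lightweight

    - The final feature file is small, and training on it is fairly light
      - We run it on a local PC
      - Training that must look at all the data, as in a recommender system, would be heavy
    - The real task is choosing a model and finding the best hyperparameters
      - You have to experiment many times with different settings
      - Sometimes a distributed system is used for this...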
  43. Which algorithm or model should you choose?

    - Start with something simple
    - If there is prior research on a similar case, refer to it
    - Evaluate and select models using AUC or the ROC curve
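For readers who have not met AUC, it can be computed straight from its definition: the probability that a randomly chosen positive example is scored higher than a randomly chosen negative one. The sketch below is a plain-Python illustration with made-up scores for two hypothetical models; in practice a library routine would normally be used.

```python
def auc(y_true, scores):
    """Area under the ROC curve via pairwise comparison (ties count half)."""
    positives = [s for t, s in zip(y_true, scores) if t == 1]
    negatives = [s for t, s in zip(y_true, scores) if t == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))


# Hypothetical labels and scores from two candidate models.
y_true = [0, 0, 1, 1]
model_a_scores = [0.1, 0.4, 0.35, 0.8]
model_b_scores = [0.1, 0.2, 0.7, 0.9]

print(auc(y_true, model_a_scores))  # 0.75
print(auc(y_true, model_b_scores))  # 1.0
```

Because AUC depends only on how examples are ranked, it allows models to be compared before any decision threshold is fixed.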
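  44. Starting with a Decision Tree

    - It is not complicated, and its decision process is easy to understand
    - We used the one in the Python Scikit-Learn package
      - It provides a solid range of machine learning algorithms
      - Its unified interface makes it easy to swap models
    - Train by passing in the features (X) and the abuser label (y)
    - A DT does not need feature normalization, which is convenient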
  45. DT usage example (iris classification)

        from sklearn.datasets import load_iris
        from sklearn import tree

        # Load the iris data set and train a decision tree on it.
        iris = load_iris()
        clf = tree.DecisionTreeClassifier()
        clf = clf.fit(iris.data, iris.target)

        # Predict the class of the first sample.
        >>> clf.predict(iris.data[:1, :])
        array([0])
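  46. Decision Tree training process

    1. Look up the features of known abusers in the feature file
    2. Take the features of an equal number of normal users (under-sampling)
    3. Split the data into train and test sets
    4. Start training with the default parameters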
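The four steps can be sketched with scikit-learn as below. The feature columns, the value ranges, the sample counts, and the 75/25 split are invented for the example; the real feature file is not shown in the talk.

```python
import random

from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

rng = random.Random(0)

# Step 1 (assumed data): feature rows of [logins, play_hours, items_gained].
abusers = [
    [rng.randint(20, 30), rng.uniform(18, 24), rng.randint(400, 900)]
    for _ in range(40)
]
normals = [
    [rng.randint(1, 5), rng.uniform(0.5, 6), rng.randint(0, 80)]
    for _ in range(1000)
]

# Step 2: under-sample normal users down to the number of known abusers.
normals_sampled = rng.sample(normals, len(abusers))

X = abusers + normals_sampled
y = [1] * len(abusers) + [0] * len(normals_sampled)

# Step 3: split into train and test sets, keeping the class ratio in both.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Step 4: start training with the default parameters.
clf = DecisionTreeClassifier(random_state=0)
clf.fit(X_train, y_train)
score = clf.score(X_test, y_test)
print(score)
```

Under-sampling gives the tree a balanced 1:1 training set, at the cost of discarding most of the normal users.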
  47. Initial results

    - Average accuracy of about 80%
    - Binary classification tends to score well
    - That does not look bad, but
    - it falls far short given that the predictions are used as grounds for sanctions
  48. Raising the accuracy

    - Split the data set for cross validation
    - Find the best hyperparameters with GridSearchCV
    - Average accuracy improved to 91%
    - To see what criteria the model uses, we drew the tree with tree.export_graphviz
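A minimal sketch of this step with scikit-learn follows. The synthetic data from `make_classification` stands in for the real feature file, and the parameter grid is hypothetical, since the talk does not list the values that were searched.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier, export_graphviz

# Synthetic stand-in for the feature file (assumption: 200 rows, 6 features).
X, y = make_classification(
    n_samples=200, n_features=6, n_informative=3, random_state=0
)

# Hypothetical search grid.
param_grid = {"max_depth": [2, 3, 4, 5], "min_samples_leaf": [1, 5, 10]}

# 5-fold cross validation over every combination in the grid.
search = GridSearchCV(
    DecisionTreeClassifier(random_state=0), param_grid, cv=5, scoring="accuracy"
)
search.fit(X, y)
print(search.best_params_, search.best_score_)

# With out_file=None, export_graphviz returns the tree as DOT source text,
# which Graphviz can render to an image.
dot_source = export_graphviz(search.best_estimator_, out_file=None, filled=True)
print(dot_source[:40])
```

Keeping `max_depth` in the grid also keeps the rendered tree small enough to read.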
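  49. Looking at the decision tree...

    - You can see what criteria the trained model uses → it can be shared with people in various job roles
    - The tree gets more complicated the further down you go
    - A DT overfits easily, so take care that the depth does not get too large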
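  50. The score stopped improving at this point

    - After consulting the GMs, we added new features
      - Number of items / amount of money gained at the same moment
      - Number of times the same map was repeated
      - Choosing only a particular character class
      - Number of items gained without moving
    - The know-how lies in turning even vague-sounding observations into features
      - Example: "Bots have randomly generated names"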
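  51. Example: judging the randomness of a character name (consonant/vowel patterns)

        # Pseudocode: decide whether a character name is pronounceable.
        # Convert the name to consonant/vowel symbols (1 = consonant, 2 = vowel),
        # e.g. anything -> '21211211'. get_cv_symbols is not shown on the slide.
        symbols = get_cv_symbols(char_name)
        # The name is pronounceable if any of these patterns occurs
        # (the real pattern list is longer).
        patterns = ['2121', '2112', '1121', '22122']
        can_pron = any(p in symbols for p in patterns)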
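A runnable sketch of the name check on this slide is given below. The body of `get_cv_symbols`, the decision to count 'y' as a vowel (needed to reproduce the slide's 'anything' → '21211211' example), and the use of only the four listed patterns are assumptions; the talk says the real pattern list is longer.

```python
VOWELS = set("aeiouy")  # 'y' counts as a vowel, matching the 'anything' example

# Only the four patterns shown on the slide.
PRONOUNCEABLE_PATTERNS = ("2121", "2112", "1121", "22122")


def get_cv_symbols(name):
    """Map letters to '2' (vowel) or '1' (consonant); skip non-letters."""
    return "".join(
        "2" if ch in VOWELS else "1" for ch in name.lower() if ch.isalpha()
    )


def can_pronounce(name):
    """Return True if any pronounceable pattern occurs in the name's symbols."""
    symbols = get_cv_symbols(name)
    return any(pattern in symbols for pattern in PRONOUNCEABLE_PATTERNS)


print(get_cv_symbols("anything"))  # 21211211
print(can_pronounce("anything"))   # True
print(can_pronounce("xkcdqwrt"))   # False
```

The Boolean result can then be stored as a 0/1 feature alongside the other per-character features.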
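  52. The extra features improved the score, but...

    - Average accuracy improved to 96%, which is a high score, but
    - when we applied it in practice,
    - quite a few false positives turned up during the GMs' verification
    - We attributed this to the Decision Tree's chronic overfitting problem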
  53. Switching to Random Forest

    - An ensemble technique that combines many Decision Trees
    - Many DTs are trained separately (= a regularization effect) and then vote
    - Stable results even though the score is lower
    - DecisionTree: a shaky 96%; RandomForest: a stable 95%
  54. Training a Random Forest

    - Basically similar to a Decision Tree
    - max_depth, min_samples_leaf
      - Control the complexity of the model. Start small and increase little by little
    - n_estimators
      - Decides how many trees (DTs) to plant
      - Too many and training takes long; too few and it is just a DT
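  55. Results after applying RF

    - Accuracy is 95%
    - To make sure nobody is punished unfairly,
      - we also get the predicted probability with predict_proba, and
      - include only predictions with a high probability (> 70%)
    - We estimate this costs about 10~20% of recall
    - Precision, however...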
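The probability cut-off can be sketched with scikit-learn as follows. The synthetic data and the hyperparameter values are placeholders rather than the settings used in the talk; only the 70% threshold comes from the slide.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the feature file; class 1 plays the role of "abuser".
X, y = make_classification(
    n_samples=400, n_features=8, n_informative=4, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, random_state=0
)

# Placeholder hyperparameters (assumed values).
forest = RandomForestClassifier(
    n_estimators=100, max_depth=5, min_samples_leaf=3, random_state=0
)
forest.fit(X_train, y_train)

# Column 1 of predict_proba is the estimated probability of class 1.
abuser_proba = forest.predict_proba(X_test)[:, 1]

PROBA_THRESHOLD = 0.7  # the "> 70%" cut-off from the slide
default_flags = forest.predict(X_test) == 1
strict_flags = abuser_proba > PROBA_THRESHOLD

print(default_flags.sum(), strict_flags.sum())
```

Every character flagged under the stricter rule is also flagged by the default prediction, so the cut-off can only shrink the sanction list: fewer abusers are caught (lower recall) in exchange for fewer wrongly sanctioned users.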
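  56. We found them, so sanction them...

    - Sanctions were applied over a period of about two months
    - Most tool-based farming disappeared!
    - Sanctions only work if they are applied regularly and continuously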
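  57. Directions for improvement

    - Use the detection results to improve the trained model
    - Collecting PII on bot accounts should make it easier to learn new bots
    - Variant bots need to be monitored after sanctions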
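  58. Takeaways

    - The whole process, from data collection through processing to analysis, was done in Python
    - Exploratory data analysis through Jupyter notebooks
    - Machine learning could probably be put to use in many more areas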
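  59. Going deeper into machine learning

    - Study more of the basic theory in order to use it in depth
      - It lets you form a good hypothesis
      - It lets you optimize
    - Try using more than one algorithm
      - Various classifiers such as SVM and neural nets
      - Ensemble them in the Super Learner style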
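  60. Time passes... a new log collection and analysis environment

    - RSync → real-time log collection with Fluentd/Kinesis
    - gzipped CSV → stored in S3 in Parquet format
      - A columnar binary format, 30x speed-up
    - MRJob → PySpark
      - Powerful distributed processing and caching (a strength for iterative training)
      - Currently running as a transient Spark cluster (20 VMs = 80 cores, 320 GB RAM) at roughly 3,000 KRW per hour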
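  61. Advice

    - Judge whether machine learning really suits what you are trying to do
      - If the abuse has simple characteristics, traditional methods will do
    - Use exploratory data analysis to understand the characteristics first
    - Test a variety of models and features
    - Depending on the model, features may need normalization or orthogonalization, so check
    - Watch out for class imbalance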
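  62. Deep learning? Machine learning?

    - Deep learning
      - No need for elaborate feature engineering
      - Many parameters = a lot of data is needed
    - Machine learning
      - Feature work is important, but
      - few parameters = effective even with little data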
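  63. Miscellaneous thoughts

    - Data engineering is hard
    - Securing the data matters most
    - Fields in the spotlight actually have darker prospects
      - Unless you are a top researcher, traditional industries and niche data are the real blue ocean
    - An era in which every company needs a data analyst
    - What if computers could test every combination of models and variables?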
  64. In closing... spurious correlations

    - Cases where there is no real relationship but there appears to be one
    - Do not fixate on the data alone; understand the domain!
  65. Reference links

    - http://www.aladin.co.kr/shop/wproduct.aspx?ItemId=28946323
    - http://www.tylervigen.com/spurious-correlations
    - http://scikit-learn.org/stable/modules/tree.html
    - http://www.cimerr.net/conference/board/data/conference/1331626266/P15.pdf
    - http://stackoverflow.com/questions/20463281/how-do-i-solve-overfitting-in-random-forest-of-python-sklearn
    - http://stats.stackexchange.com/questions/131255/class-imbalance-in-supervised-machine-learning
    - https://www.quora.com/Is-Scala-a-better-choice-than-Python-for-Apache-Spark
    - http://statkclee.github.io/data-science/data-handling-pipeline.html
    - https://databricks.com/blog/2016/01/25/deep-learning-with-spark-and-tensorflow.html