#JJUG - Getting Started Casually with Machine Learning in Java

Slides presented at the JJUG Night Seminar "Machine Learning and Natural Language Processing Special!" on Wednesday, December 17.

Seminar overview:
http://jjug.doorkeeper.jp/events/18378

KOMIYA Atsushi

December 17, 2014

Transcript

  1. Getting Started Casually with Machine Learning in Java
    2014-12-17 JJUG night seminar
    @komiya_atsushi

  2. Who are you?

  3. KOMIYA Atsushi
    @komiya_atsushi

  6. Adtech
    Java engineer

  7. Today's topics

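  8. Machine learning! Machine learning!
    • This talk is mainly an introduction to machine learning and a look at doing machine learning casually in Java
    • Hardcore practitioners are welcome to sleep through it
    • Machine learning on large-scale data is left to Saruta-san!
    • Natural language processing is left to johtani-san!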

  9. Things to know before starting machine learning

  10. So what is machine learning, anyway?
    Quoted from "Machine Learning Tutorial @ Jubatus Casual Talks":
    http://www.slideshare.net/unnonouno/jubatus-casual-talks

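  11. What can machine learning do?
    Classification / identification
    Prediction / regression
    Pattern mining
    Association rules
    Clustering
    Spam email detection
    Categorizing news articles
    Product recommendation
    Demand and sales forecasting
    ...and so on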

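  12. What can machine learning do?
    Pattern mining
    Association rules
    Clustering
    Spam email detection
    Categorizing news articles
    Product recommendation
    Demand and sales forecasting
    Supervised learning
    - There is a correct answer
    - A "model" is built
    Classification / identification
    Prediction / regression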

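  13. What can machine learning do?
    Identification / classification
    Regression / prediction
    Pattern mining
    Association rules
    Clustering
    Spam email detection
    Categorizing news articles
    Product recommendation
    Demand and sales forecasting
    Unsupervised learning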

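  14. What to use as input data
    • Not every data format is acceptable
      • In principle, only sequences of numbers (vectors) can be handled
      • Unstructured data (images, audio, text, access logs, etc.) cannot be used as-is
    • Extract various "features" from the unstructured data and build a "feature vector"
      • Represent categorical variables as dummy variables
      • Scaling
      • Feature engineering
    • For supervised-learning training data, also attach the correct-answer information, such as a "label"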
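
The dummy-variable and scaling steps on this slide can be sketched in plain Java. This is a minimal illustration, not code from the talk: the class name `FeatureVectorSketch`, the `size` and `color` fields, and the value ranges are all assumptions made for the example. It uses one column per category (one-hot style) and min-max scaling, which are only one of several possible choices.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper: turns one raw record into a numeric feature vector. */
final class FeatureVectorSketch {

    /** Encodes a categorical value as dummy variables, one column per category. */
    static double[] oneHot(String value, List<String> categories) {
        double[] encoded = new double[categories.size()];
        int index = categories.indexOf(value);
        if (index < 0) {
            throw new IllegalArgumentException("Unknown category: " + value);
        }
        encoded[index] = 1.0;
        return encoded;
    }

    /** Min-max scaling: maps a value from [min, max] onto [0, 1]. */
    static double minMaxScale(double value, double min, double max) {
        if (max <= min) {
            throw new IllegalArgumentException("max must be greater than min");
        }
        return (value - min) / (max - min);
    }

    /** Concatenates the scaled numeric feature and the dummy variables. */
    static double[] toFeatureVector(double size, double minSize, double maxSize,
                                    String color, List<String> colors) {
        double[] dummies = oneHot(color, colors);
        double[] vector = new double[1 + dummies.length];
        vector[0] = minMaxScale(size, minSize, maxSize);
        System.arraycopy(dummies, 0, vector, 1, dummies.length);
        return vector;
    }

    public static void main(String[] args) {
        // Assumed example record: size 7.5 (range 5.0 to 10.0), color "brown".
        List<String> colors = Arrays.asList("red", "brown", "white");
        double[] vector = toFeatureVector(7.5, 5.0, 10.0, "brown", colors);
        System.out.println(Arrays.toString(vector)); // [0.5, 0.0, 1.0, 0.0]
    }
}
```

For supervised learning, the label would be kept alongside this vector rather than inside it.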

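  15. Are the results correct?
    • Verifying correctness
      • k-fold cross validation
    • Measuring correctness
      • Classification / identification
        • Precision, Recall, AUC, F-measure
      • Prediction / regression
        • Correlation coefficient, coefficient of determination, MAE, RMSE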
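
Several of the metrics named on this slide are simple enough to compute by hand. The sketch below is an illustration rather than code from the talk; the class name `MetricsSketch` and the sample arrays are assumptions. It covers precision, recall, and F-measure for a binary classifier, plus MAE and RMSE for regression.

```java
/** Hypothetical helper computing a few of the evaluation metrics by hand. */
final class MetricsSketch {

    /** Precision = TP / (TP + FP) for the positive class. */
    static double precision(boolean[] actual, boolean[] predicted) {
        int tp = 0, fp = 0;
        for (int i = 0; i < actual.length; i++) {
            if (predicted[i] && actual[i]) tp++;
            if (predicted[i] && !actual[i]) fp++;
        }
        return tp + fp == 0 ? 0.0 : (double) tp / (tp + fp);
    }

    /** Recall = TP / (TP + FN) for the positive class. */
    static double recall(boolean[] actual, boolean[] predicted) {
        int tp = 0, fn = 0;
        for (int i = 0; i < actual.length; i++) {
            if (actual[i] && predicted[i]) tp++;
            if (actual[i] && !predicted[i]) fn++;
        }
        return tp + fn == 0 ? 0.0 : (double) tp / (tp + fn);
    }

    /** F-measure (F1): harmonic mean of precision and recall. */
    static double fMeasure(boolean[] actual, boolean[] predicted) {
        double p = precision(actual, predicted);
        double r = recall(actual, predicted);
        return p + r == 0.0 ? 0.0 : 2 * p * r / (p + r);
    }

    /** Mean absolute error. */
    static double mae(double[] actual, double[] predicted) {
        double sum = 0.0;
        for (int i = 0; i < actual.length; i++) {
            sum += Math.abs(actual[i] - predicted[i]);
        }
        return sum / actual.length;
    }

    /** Root mean squared error. */
    static double rmse(double[] actual, double[] predicted) {
        double sum = 0.0;
        for (int i = 0; i < actual.length; i++) {
            double error = actual[i] - predicted[i];
            sum += error * error;
        }
        return Math.sqrt(sum / actual.length);
    }

    public static void main(String[] args) {
        // Assumed sample: TP = 2, FP = 1, FN = 1, so precision and recall are both 2/3.
        boolean[] actual = {true, true, true, false, false};
        boolean[] predicted = {true, true, false, true, false};
        System.out.println("precision = " + precision(actual, predicted));
        System.out.println("recall    = " + recall(actual, predicted));
        System.out.println("F-measure = " + fMeasure(actual, predicted));

        // Assumed sample: absolute errors are 1, 0, and 2.
        double[] y = {3.0, 5.0, 2.0};
        double[] yHat = {2.0, 5.0, 4.0};
        System.out.println("MAE  = " + mae(y, yHat));
        System.out.println("RMSE = " + rmse(y, yHat));
    }
}
```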

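  16. Linear separability and nonlinearity
    • Can the points (feature vectors) be separated by a single line (hyperplane)?
    (Figure panels: linearly separable; not linearly separable; nonlinear)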
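
To make the question on this slide concrete, the sketch below checks whether one given hyperplane w·x + b = 0 puts every labeled point on its correct side. It is an illustration, not code from the talk: the class name `LinearSeparationSketch`, the AND and XOR toy labels, and the candidate hyperplane are assumptions. It tests a single candidate only and does not search for a separator.

```java
/** Hypothetical check: does one given hyperplane separate the labeled points? */
final class LinearSeparationSketch {

    /** Returns true if sign(w·x + b) matches the label (+1 or -1) for every point. */
    static boolean separates(double[] w, double b, double[][] points, int[] labels) {
        for (int i = 0; i < points.length; i++) {
            double score = b;
            for (int j = 0; j < w.length; j++) {
                score += w[j] * points[i][j];
            }
            if (score * labels[i] <= 0) {
                return false; // point lies on the wrong side, or exactly on the hyperplane
            }
        }
        return true;
    }

    public static void main(String[] args) {
        double[][] points = {{0, 0}, {0, 1}, {1, 0}, {1, 1}};
        int[] andLabels = {-1, -1, -1, +1}; // logical AND: linearly separable
        int[] xorLabels = {-1, +1, +1, -1}; // logical XOR: the classic non-separable case

        // Assumed candidate hyperplane: x1 + x2 - 1.5 = 0
        double[] w = {1.0, 1.0};
        double b = -1.5;

        System.out.println(separates(w, b, points, andLabels)); // true
        System.out.println(separates(w, b, points, xorLabels)); // false
    }
}
```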

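  17. Online learning and offline learning
    • Online learning
      • The model is updated continually as data arrives one item at a time
      • Think of it as stream processing
      • Data can be discarded once used, without being stored
      • (Use it and toss it, use it and toss it...)
    • Offline learning
      • The model is updated in one go from accumulated data
      • Equivalent to batch processing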
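
As one possible illustration of the online style, the sketch below uses a perceptron, an algorithm the slides do not name and which is chosen here only because its update rule is short. Each call to `update` sees one example and keeps nothing but the weights, so the example can be discarded afterwards. The class name `OnlinePerceptronSketch` and the toy stream are assumptions.

```java
/** Hypothetical online learner: a perceptron updated one example at a time. */
final class OnlinePerceptronSketch {
    private final double[] weights;
    private double bias;

    OnlinePerceptronSketch(int dimensions) {
        this.weights = new double[dimensions];
    }

    /** Predicts +1 or -1 from the current model. */
    int predict(double[] x) {
        double score = bias;
        for (int j = 0; j < weights.length; j++) {
            score += weights[j] * x[j];
        }
        return score >= 0 ? 1 : -1;
    }

    /** Updates the model from a single example; the example need not be stored. */
    void update(double[] x, int label) {
        if (predict(x) != label) {
            for (int j = 0; j < weights.length; j++) {
                weights[j] += label * x[j];
            }
            bias += label;
        }
    }

    public static void main(String[] args) {
        // Assumed toy stream: the four AND examples arriving repeatedly.
        double[][] stream = {{0, 0}, {0, 1}, {1, 0}, {1, 1}};
        int[] labels = {-1, -1, -1, 1};

        OnlinePerceptronSketch model = new OnlinePerceptronSketch(2);
        for (int round = 0; round < 20; round++) {
            for (int i = 0; i < stream.length; i++) {
                model.update(stream[i], labels[i]);
            }
        }
        for (double[] x : stream) {
            System.out.println(x[0] + ", " + x[1] + " -> " + model.predict(x));
        }
    }
}
```

An offline learner would instead take the whole accumulated dataset at once and fit the model in a single batch step.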

  18. Doing machine learning in Java

  19. One thing I want to say up front

  20. Let's stop reinventing the wheel

  21. Let's make use of existing libraries

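  22. Implementing machine learning is nothing but pain
    • Testing machine learning algorithms is just plain painful
    • "Not writing tests? Could you say the same thing in front of ○○?"
    • Time- and space-efficient implementations are tedious and difficult
    • Implement a state-of-the-art algorithm only when a large accuracy gain can be expected
    • Write your own implementation only when existing libraries simply cannot solve the problem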

  23. Machine learning in Java
    Where it fits and where it doesn't

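  24. For example, a workflow like this
    1. Understand the target problem
    • What kind of task fits?
    2. Deepen your understanding of the data you have
    • What features can be extracted?
    3. Build a model
    • Which algorithm should be used?
    • Which features should be used?
    4. Evaluate the model
    • How accurate is it?
    5. Integrate it into a system / turn it into a system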

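  25. For example, a workflow like this
    1. Understand the target problem
    • What kind of task fits?
    2. Deepen your understanding of the data you have
    • What features can be extracted?
    3. Build a model
    • Which algorithm should be used?
    • Which features should be used?
    4. Evaluate the model
    • How accurate is it?
    5. Integrate it into a system / turn it into a system
    (Callout: machine learning is put to use around here)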

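  26. For example, a workflow like this
    1. Understand the target problem
    • What kind of task fits?
    2. Deepen your understanding of the data you have
    • What features can be extracted?
    3. Build a model
    • Which algorithm should be used?
    • Which features should be used?
    4. Evaluate the model
    • How accurate is it?
    5. Integrate it into a system / turn it into a system
    (Callout: this part requires ad hoc analysis)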

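  27. For example, a workflow like this
    1. Understand the target problem
    • What kind of task fits?
    2. Deepen your understanding of the data you have
    • What features can be extracted?
    3. Build a model
    • Which algorithm should be used?
    • Which features should be used?
    4. Evaluate the model
    • How accurate is it?
    5. Integrate it into a system / turn it into a system
    (Callout: this is the part Java is suited to)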

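  28. Use the right tool for the job
    • Ad hoc analysis with machine learning is not impossible in Java, but using something like R is more common (?)
      • An environment specialized for analysis is a better fit
      • In the Java world, that means Weka, or Spark's interactive console
      • Writing plain Java code, building it, and running it from scratch each time takes far too much effort
    • On the other hand, R is not suited to building systems
      • For example, when you want a system that receives HTTP requests and does machine learning inside the server in real time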

  29. Machine learning libraries and tools usable from Java

  30. liblinear-java
    • gradle 'de.bwaldvogel:liblinear:1.95'
    • https://github.com/bwaldvogel/liblinear-java
    • ★ 121
    • A Java port of the variant of LibSVM specialized for linear classification and regression
    • Library
    • Makes a fair effort to keep up with the latest version of the original (C++) implementation

  31. Weka
    • gradle 'nz.ac.waikato.cms.weka:weka-stable:3.6.11'
    • An analysis platform that provides a wide variety of machine learning algorithms
    • Can also be used as a library
    • If you just want to try machine learning, start here

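  32. MLlib (Spark)
    • gradle 'org.apache.spark:spark-mllib_2.10:1.1.1'
    • https://github.com/apache/spark
    • ★ 2,336
    • A library designed to be used on the distributed processing framework Spark
    • Features are still being added and improved at a brisk pace
    • Should also be usable as an environment for ad hoc analysis
    • For details, look forward to Saruta-san's talk right after this one!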

  33. Mahout
    • gradle 'org.apache.mahout:mahout-core:0.9'
    • https://github.com/apache/mahout
    • ★ 229
    • A machine learning library on the distributed processing framework Hadoop
    • Since Spark / MLlib appeared, it has started to feel rather like a has-been...

  34. SAMOA
    • https://github.com/yahoo/samoa
    • ★ 363
    • A machine learning library usable on distributed streaming frameworks such as Storm
    • Made by Yahoo! Labs
    • Development does not seem very active over the last 2 to 3 months?

  35. Jubatus
    • https://github.com/jubatus/jubatus
    • ★ 389
    • A distributed processing framework and online machine learning library
    • The core is implemented in C++, but a Java client library is provided
    • Suited to cases that call for real-time machine learning?

  36. h2o
    • https://github.com/h2oai/h2o
    • ★ 1,333
    • A machine learning library usable on the distributed processing framework Hadoop
    • If you want to do the much-talked-about deep learning in Java, this is the only choice!?

  37. Let's get started with machine learning

  38. Datasets

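  39. UCI Machine Learning Repository
    • https://archive.ics.uci.edu/ml/datasets.html
    • Mostly provided as CSV files
    • Representative datasets
      • Mushroom: mushrooms
        • Color, shape, and size, with an edible / poisonous label / binary classification
      • Iris: iris flower data
        • Width and length of sepals and petals, with a species label / multi-class classification
      • Abalone: abalone
        • Size, weight, and so on, with age / predicting age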

  40. Trying classification and regression with Weka

  41. http://bit.ly/jjug-ml

  42. Weka's input format
    @RELATION iris
    @ATTRIBUTE sepallength NUMERIC
    @ATTRIBUTE sepalwidth NUMERIC
    @ATTRIBUTE petallength NUMERIC
    @ATTRIBUTE petalwidth NUMERIC
    @ATTRIBUTE class {Iris-setosa,Iris-versicolor,Iris-virginica}

    @DATA
    5.1,3.5,1.4,0.2,Iris-setosa
    4.9,3.0,1.4,0.2,Iris-setosa
    4.7,3.2,1.3,0.2,Iris-setosa
    4.6,3.1,1.5,0.2,Iris-setosa
    5.4,3.9,1.7,0.4,Iris-setosa
    ……
    ARFF file

  43. Weka's input format
    @RELATION iris
    @ATTRIBUTE sepallength NUMERIC
    @ATTRIBUTE sepalwidth NUMERIC
    @ATTRIBUTE petallength NUMERIC
    @ATTRIBUTE petalwidth NUMERIC
    @ATTRIBUTE class {Iris-setosa,Iris-versicolor,Iris-virginica}

    @DATA
    5.1,3.5,1.4,0.2,Iris-setosa
    4.9,3.0,1.4,0.2,Iris-setosa
    4.7,3.2,1.3,0.2,Iris-setosa
    4.6,3.1,1.5,0.2,Iris-setosa
    5.4,3.9,1.7,0.4,Iris-setosa
    ……
    ARFF file
    Going out of your way to convert CSV files is a hassle...
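
To show what converting CSV by hand would involve, the sketch below builds ARFF text in the layout shown on this slide. It is a hypothetical converter, not code from the talk: the class name `CsvToArffSketch` is an assumption, and it assumes every column except the last is NUMERIC, the last column is a nominal class, and no cell contains commas or quotes.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical converter from simple CSV lines to ARFF text. */
final class CsvToArffSketch {

    /**
     * Builds ARFF text from a CSV header line and its data lines.
     * Assumes all columns but the last are NUMERIC and the last is a nominal class.
     */
    static String toArff(String relation, String headerLine, List<String> dataLines) {
        String[] names = headerLine.split(",");

        // Collect the distinct class values in order of first appearance.
        Set<String> classValues = new LinkedHashSet<>();
        for (String line : dataLines) {
            String[] cells = line.split(",");
            classValues.add(cells[cells.length - 1]);
        }

        StringBuilder arff = new StringBuilder();
        arff.append("@RELATION ").append(relation).append('\n');
        for (int i = 0; i < names.length - 1; i++) {
            arff.append("@ATTRIBUTE ").append(names[i]).append(" NUMERIC\n");
        }
        arff.append("@ATTRIBUTE ").append(names[names.length - 1])
            .append(" {").append(String.join(",", classValues)).append("}\n");
        arff.append('\n').append("@DATA\n");
        for (String line : dataLines) {
            arff.append(line).append('\n');
        }
        return arff.toString();
    }

    public static void main(String[] args) {
        // Two sample rows from the Iris dataset.
        List<String> rows = Arrays.asList(
            "5.1,3.5,1.4,0.2,Iris-setosa",
            "7.0,3.2,4.7,1.4,Iris-versicolor");
        String header = "sepallength,sepalwidth,petallength,petalwidth,class";
        System.out.print(toArff("iris", header, rows));
    }
}
```

Even this small version needs assumptions about column types, which is part of why a ready-made loader is attractive.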

  44. This time we use
    weka.core.converters.CSVLoader
    (a slightly modified version of it)

  45. The classes used
    • weka.classifiers.Evaluation
      • Runs k-fold cross validation
    • weka.classifiers.functions.Logistic
      • Classification by logistic regression
    • weka.classifiers.trees.RandomForest
      • Classification by RandomForest (a kind of decision tree method)
    • weka.classifiers.functions.LinearRegression
      • Linear regression

  46. Now for an actual demo

  47. Thank you very much!