Slide 1

Slide 1 text

Getting Started Casually with Machine Learning in Java
2014-12-17 JJUG night seminar
@komiya_atsushi

Slide 2

Slide 2 text

Who am I?

Slide 3

Slide 3 text

KOMIYA Atsushi @komiya_atsushi

Slide 4

Slide 4 text

No content

Slide 5

Slide 5 text

No content

Slide 6

Slide 6 text

Adtech Java engineer

Slide 7

Slide 7 text

Today's topics

Slide 8

Slide 8 text

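Machine learning! Machine learning!
• This talk is mainly an introduction to machine learning and a look at doing machine learning casually in Java
• Hardcore practitioners are welcome to sleep through it
• Machine learning on large-scale data is left to Saruta-san!
• Natural language processing is left to Johtani-san!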

Slide 9

Slide 9 text

Things to know before starting machine learning

Slide 10

Slide 10 text

What is machine learning, anyway?
Quoted from "Machine Learning Tutorial @ Jubatus Casual Talks": http://www.slideshare.net/unnonouno/jubatus-casual-talks

Slide 11

Slide 11 text

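What can machine learning do?
• Classification / identification
• Prediction / regression
• Pattern mining
• Association rules
• Clustering
• Spam mail detection
• News article categorization
• Product recommendation
• Demand and sales forecasting
• ...and so on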

Slide 12

Slide 12 text

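What can machine learning do?
• Pattern mining
• Association rules
• Clustering
• Spam mail detection
• News article categorization
• Product recommendation
• Demand and sales forecasting
Supervised learning
• Correct answers are available
• Builds a "model"
• Classification / identification
• Prediction / regression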

Slide 13

Slide 13 text

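What can machine learning do?
• Identification / classification
• Regression / prediction
• Pattern mining
• Association rules
• Clustering
• Spam mail detection
• News article categorization
• Product recommendation
• Demand and sales forecasting
Unsupervised learning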

Slide 14

Slide 14 text

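What to use as input data
• Not just any data format will do
• In principle, only sequences of numbers (vectors) can be handled
• Unstructured data (images, audio, text, access logs, etc.) cannot be used as-is
• Extract various "features" from unstructured data to build a "feature vector"
  • Represent categorical variables as dummy variables
  • Scaling
  • Feature engineering
• For supervised-learning training data, also attach ground-truth information such as "labels"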
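The points above (numeric vectors only, dummy variables for categorical values, scaling) can be sketched in plain Java. This is a minimal illustration, not taken from the talk: the class name `FeatureEncoding`, the color categories, and the value range are all made up for the example.

```java
import java.util.Arrays;
import java.util.List;

/** Sketch: turning raw values into one numeric feature vector (names are hypothetical). */
final class FeatureEncoding {

    /** Dummy-variable (one-hot) encoding: 1.0 at the category's position, 0.0 elsewhere. */
    static double[] oneHot(String value, List<String> categories) {
        int index = categories.indexOf(value);
        if (index < 0) {
            throw new IllegalArgumentException("Unknown category: " + value);
        }
        double[] encoded = new double[categories.size()];
        encoded[index] = 1.0;
        return encoded;
    }

    /** Min-max scaling: maps a value from [min, max] onto [0, 1]. */
    static double minMaxScale(double value, double min, double max) {
        if (max == min) {
            return 0.0; // a constant feature carries no information
        }
        return (value - min) / (max - min);
    }

    /** Concatenates partial encodings into a single feature vector. */
    static double[] concat(double[]... parts) {
        int length = 0;
        for (double[] part : parts) {
            length += part.length;
        }
        double[] result = new double[length];
        int offset = 0;
        for (double[] part : parts) {
            System.arraycopy(part, 0, result, offset, part.length);
            offset += part.length;
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> colors = Arrays.asList("red", "green", "blue"); // assumed categories
        double[] features = concat(
                oneHot("green", colors),
                new double[] {minMaxScale(15.0, 10.0, 20.0)});
        System.out.println(Arrays.toString(features)); // [0.0, 1.0, 0.0, 0.5]
    }
}
```

For supervised learning, the label would be stored alongside this vector rather than inside it.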

Slide 15

Slide 15 text

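Are the results correct?
• Verifying correctness
  • k-fold cross validation
• Measuring correctness
  • Classification / identification
    • Precision, Recall, AUC, F-measure
  • Prediction / regression
    • Correlation coefficient, coefficient of determination, MAE, RMSE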
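Some of the measures listed above are simple enough to write out. The following is a sketch using their standard textbook definitions; the class name `Metrics` and the sample counts are assumptions for illustration, and AUC, the correlation coefficient, and the coefficient of determination are left out to keep it short.

```java
/** Sketch: textbook definitions of a few evaluation measures (class name is hypothetical). */
final class Metrics {

    /** Precision = TP / (TP + FP): how many predicted positives were truly positive. */
    static double precision(int truePositives, int falsePositives) {
        int predictedPositives = truePositives + falsePositives;
        return predictedPositives == 0 ? 0.0 : (double) truePositives / predictedPositives;
    }

    /** Recall = TP / (TP + FN): how many actual positives were found. */
    static double recall(int truePositives, int falseNegatives) {
        int actualPositives = truePositives + falseNegatives;
        return actualPositives == 0 ? 0.0 : (double) truePositives / actualPositives;
    }

    /** F-measure (F1): harmonic mean of precision and recall. */
    static double fMeasure(double precision, double recall) {
        double sum = precision + recall;
        return sum == 0.0 ? 0.0 : 2.0 * precision * recall / sum;
    }

    /** MAE: mean of the absolute errors. */
    static double mae(double[] actual, double[] predicted) {
        double total = 0.0;
        for (int i = 0; i < actual.length; i++) {
            total += Math.abs(actual[i] - predicted[i]);
        }
        return total / actual.length;
    }

    /** RMSE: square root of the mean squared error; large errors weigh more than in MAE. */
    static double rmse(double[] actual, double[] predicted) {
        double total = 0.0;
        for (int i = 0; i < actual.length; i++) {
            double error = actual[i] - predicted[i];
            total += error * error;
        }
        return Math.sqrt(total / actual.length);
    }

    public static void main(String[] args) {
        double p = precision(8, 2); // 8 hits among 10 predicted positives
        double r = recall(8, 8);    // 8 hits among 16 actual positives
        System.out.println("precision=" + p + " recall=" + r + " F=" + fMeasure(p, r));

        double[] actual = {1, 2, 3, 4};
        double[] predicted = {1, 2, 3, 8}; // a single large miss
        System.out.println("MAE=" + mae(actual, predicted) + " RMSE=" + rmse(actual, predicted));
    }
}
```

In the second example the single error of 4 gives an MAE of 1.0 but an RMSE of 2.0, which shows how RMSE penalizes large misses more heavily.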

Slide 16

Slide 16 text

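Linearly separable vs. nonlinear
• Can a single line (hyperplane) separate the points (feature vectors)?
Figure labels: linearly separable / not linearly separable / nonlinear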
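To make the question concrete, the sketch below checks whether one given hyperplane `w·x + b = 0` puts every point on the side its label asks for. The class name, the AND/XOR toy data, and the chosen weights are assumptions for illustration. Note that it tests one candidate hyperplane only and does not search for one.

```java
/** Sketch: does a given hyperplane separate labeled points? (names are hypothetical) */
final class LinearSeparation {

    /** Returns +1 or -1 depending on which side of w·x + b = 0 the point lies. */
    static int side(double[] w, double b, double[] x) {
        double score = b;
        for (int i = 0; i < w.length; i++) {
            score += w[i] * x[i];
        }
        return score >= 0 ? 1 : -1;
    }

    /** True if every point falls on the side given by its label (+1 or -1). */
    static boolean separates(double[] w, double b, double[][] points, int[] labels) {
        for (int i = 0; i < points.length; i++) {
            if (side(w, b, points[i]) != labels[i]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        double[][] points = {{0, 0}, {0, 1}, {1, 0}, {1, 1}};
        int[] andLabels = {-1, -1, -1, 1}; // logical AND: linearly separable
        int[] xorLabels = {-1, 1, 1, -1};  // logical XOR: no single line works

        double[] w = {1, 1};
        double b = -1.5; // the line x1 + x2 = 1.5

        System.out.println(separates(w, b, points, andLabels)); // true
        System.out.println(separates(w, b, points, xorLabels)); // false
    }
}
```

XOR is the classic case where no choice of `w` and `b` succeeds, which is why such data needs a nonlinear method or transformed features.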

Slide 17

Slide 17 text

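Online learning vs. offline learning
• Online learning
  • Updates the model as needed, using data that arrives sequentially
  • Think of it as stream processing
  • Data can be discarded after use instead of being accumulated
  • (Process one and toss it, process one and toss it...)
• Offline learning
  • Updates the model all at once, using accumulated data
  • Corresponds to batch processing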
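The online style described above can be illustrated with a perceptron, chosen here only because its update rule is short; the talk does not name a specific algorithm. The class name and the toy AND data are assumptions. Each call to `update` sees one example, adjusts the model if needed, and keeps no reference to the example afterwards.

```java
/** Sketch: an online learner that updates on one example at a time (perceptron rule). */
final class OnlinePerceptron {
    private final double[] weights;
    private double bias;

    OnlinePerceptron(int dimensions) {
        this.weights = new double[dimensions];
    }

    /** Predicts +1 or -1 from the current model. */
    int predict(double[] x) {
        double score = bias;
        for (int i = 0; i < weights.length; i++) {
            score += weights[i] * x[i];
        }
        return score >= 0 ? 1 : -1;
    }

    /** Online step: correct the model only when this example is misclassified. */
    void update(double[] x, int label) {
        if (predict(x) != label) {
            for (int i = 0; i < weights.length; i++) {
                weights[i] += label * x[i];
            }
            bias += label;
        }
        // The example is not stored; it can be discarded after this call.
    }

    public static void main(String[] args) {
        double[][] stream = {{0, 0}, {0, 1}, {1, 0}, {1, 1}};
        int[] labels = {-1, -1, -1, 1}; // logical AND

        OnlinePerceptron model = new OnlinePerceptron(2);
        // The tiny stream is replayed a few times only to keep the demo small.
        for (int pass = 0; pass < 20; pass++) {
            for (int i = 0; i < stream.length; i++) {
                model.update(stream[i], labels[i]);
            }
        }
        for (double[] x : stream) {
            System.out.println(x[0] + " AND " + x[1] + " -> " + model.predict(x));
        }
    }
}
```

An offline (batch) learner would instead take the whole accumulated data set as input and fit the model in one go.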

Slide 18

Slide 18 text

Doing machine learning in Java

Slide 19

Slide 19 text

One thing I want to say up front

Slide 20

Slide 20 text

Let's stop reinventing the wheel

Slide 21

Slide 21 text

Let's make use of existing libraries

Slide 22

Slide 22 text

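Implementing machine learning yourself is nothing but pain
• Testing machine learning algorithms is painful, plain and simple
  • "You don't write tests? Could you say that in front of ○○?"
• Time- and space-efficient implementations are tedious and difficult
• Implement a state-of-the-art algorithm yourself only when a large accuracy gain can be expected
• Write your own implementation only when existing libraries simply cannot solve the problem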

Slide 23

Slide 23 text

Machine learning in Java: where it fits and where it doesn't

Slide 24

Slide 24 text

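An example workflow
1. Identify the target problem
   • What kind of task fits?
2. Deepen your understanding of the data you have
   • What features can be extracted?
3. Build a model
   • Which algorithm should be used?
   • Which features should be used?
4. Evaluate the model
   • How accurate is it?
5. Integrate it into a system / turn it into a system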

Slide 25

Slide 25 text

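An example workflow
1. Identify the target problem
   • What kind of task fits?
2. Deepen your understanding of the data you have
   • What features can be extracted?
3. Build a model
   • Which algorithm should be used?
   • Which features should be used?
4. Evaluate the model
   • How accurate is it?
5. Integrate it into a system / turn it into a system
Callout: this is roughly where machine learning is used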

Slide 26

Slide 26 text

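An example workflow
1. Identify the target problem
   • What kind of task fits?
2. Deepen your understanding of the data you have
   • What features can be extracted?
3. Build a model
   • Which algorithm should be used?
   • Which features should be used?
4. Evaluate the model
   • How accurate is it?
5. Integrate it into a system / turn it into a system
Callout: this part requires ad hoc analysis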

Slide 27

Slide 27 text

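An example workflow
1. Identify the target problem
   • What kind of task fits?
2. Deepen your understanding of the data you have
   • What features can be extracted?
3. Build a model
   • Which algorithm should be used?
   • Which features should be used?
4. Evaluate the model
   • How accurate is it?
5. Integrate it into a system / turn it into a system
Callout: this is roughly where Java fits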

Slide 28

Slide 28 text

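Use the right tool for the job
• Ad hoc analysis with machine learning is not impossible in Java, but using something like R is more common (?)
  • An environment specialized for analysis is a better fit
  • In Java, that means Weka or Spark's interactive console
  • Plainly writing Java code, building it, and running it from the start every time takes too much effort
• On the other hand, R is not suited to building systems
  • For example, when you want a system that receives HTTP requests and does machine learning in real time on the server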

Slide 29

Slide 29 text

Machine learning libraries and tools usable from Java

Slide 30

Slide 30 text

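liblinear-java
• gradle 'de.bwaldvogel:liblinear:1.95'
• https://github.com/bwaldvogel/liblinear-java
• ★ 121
• A Java port of LIBLINEAR, the variant of LibSVM specialized for linear classification and regression
• Library
• Makes a real effort to keep up with the latest version of the original (C++ version)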

Slide 31

Slide 31 text

Weka
• gradle 'nz.ac.waikato.cms.weka:weka-stable:3.6.11'
• An analysis platform that provides a wide variety of machine learning algorithms
• Can also be used as a library
• If you just want to give machine learning a try, start here

Slide 32

Slide 32 text

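MLlib (Spark)
• gradle 'org.apache.spark:spark-mllib_2.10:1.1.1'
• https://github.com/apache/spark
• ★ 2,336
• A library designed for use on the Spark distributed processing framework
• Features are still being actively added and improved
• Should also be usable as an ad hoc analysis environment (probably)
• For details, look forward to Saruta-san's talk after this one!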

Slide 33

Slide 33 text

Mahout
• gradle 'org.apache.mahout:mahout-core:0.9'
• https://github.com/apache/mahout
• ★ 229
• A machine learning library on the Hadoop distributed processing framework
• Since Spark / MLlib appeared, it has started to feel rather past its prime...

Slide 34

Slide 34 text

SAMOA
• https://github.com/yahoo/samoa
• ★ 363
• A machine learning library usable on distributed streaming frameworks such as Storm
• Made by Yahoo! Labs
• Development does not seem very active over the last 2 to 3 months?

Slide 35

Slide 35 text

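Jubatus
• https://github.com/jubatus/jubatus
• ★ 389
• A distributed processing framework and online machine learning library
• The core is implemented in C++, but a Java client library is provided
• Suited to cases that require machine learning in real time?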

Slide 36

Slide 36 text

h2o
• https://github.com/h2oai/h2o
• ★ 1,333
• A machine learning library usable on the Hadoop distributed processing framework
• If you want to do the much-talked-about deep learning in Java, is this the only choice!?

Slide 37

Slide 37 text

Let's get started with machine learning

Slide 38

Slide 38 text

Datasets

Slide 39

Slide 39 text

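UCI Machine learning repository
• https://archive.ics.uci.edu/ml/datasets.html
• Mostly provided as CSV files
• Representative datasets
  • Mushroom
    • Color, shape, and size, with an edible / poisonous label / binary classification
  • Iris
    • Width and length of sepals and petals, with a species label / multiclass classification
  • Abalone
    • Size, weight, and so on, with age / age prediction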

Slide 40

Slide 40 text

Trying classification and regression with Weka

Slide 41

Slide 41 text

http://bit.ly/jjug-ml

Slide 42

Slide 42 text

Weka's input format: ARFF file

@RELATION iris

@ATTRIBUTE sepallength NUMERIC
@ATTRIBUTE sepalwidth NUMERIC
@ATTRIBUTE petallength NUMERIC
@ATTRIBUTE petalwidth NUMERIC
@ATTRIBUTE class {Iris-setosa,Iris-versicolor,Iris-virginica}

@DATA
5.1,3.5,1.4,0.2,Iris-setosa
4.9,3.0,1.4,0.2,Iris-setosa
4.7,3.2,1.3,0.2,Iris-setosa
4.6,3.1,1.5,0.2,Iris-setosa
5.4,3.9,1.7,0.4,Iris-setosa
……

Slide 43

Slide 43 text

Weka's input format: ARFF file

@RELATION iris

@ATTRIBUTE sepallength NUMERIC
@ATTRIBUTE sepalwidth NUMERIC
@ATTRIBUTE petallength NUMERIC
@ATTRIBUTE petalwidth NUMERIC
@ATTRIBUTE class {Iris-setosa,Iris-versicolor,Iris-virginica}

@DATA
5.1,3.5,1.4,0.2,Iris-setosa
4.9,3.0,1.4,0.2,Iris-setosa
4.7,3.2,1.3,0.2,Iris-setosa
4.6,3.1,1.5,0.2,Iris-setosa
5.4,3.9,1.7,0.4,Iris-setosa
……

Going out of your way to convert CSV files is a hassle...
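To show what such a conversion involves, here is a hand-rolled sketch that renders ARFF text from CSV rows. It is an illustration only and not Weka code: the class `CsvToArff` is hypothetical, and it assumes every column except the last is numeric, the last column is a nominal class, and no field is quoted or missing.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Sketch: render simple CSV rows as ARFF text (hypothetical helper, not part of Weka). */
final class CsvToArff {

    /**
     * Assumes all columns but the last are NUMERIC and the last one is a nominal class
     * whose allowed values are collected from the rows in order of first appearance.
     */
    static String toArff(String relation, List<String> header, List<String> csvRows) {
        int classIndex = header.size() - 1;

        Set<String> classValues = new LinkedHashSet<>();
        for (String row : csvRows) {
            String[] cells = row.split(",");
            classValues.add(cells[classIndex].trim());
        }

        StringBuilder arff = new StringBuilder();
        arff.append("@RELATION ").append(relation).append("\n\n");
        for (int i = 0; i < classIndex; i++) {
            arff.append("@ATTRIBUTE ").append(header.get(i)).append(" NUMERIC\n");
        }
        arff.append("@ATTRIBUTE ").append(header.get(classIndex))
            .append(" {").append(String.join(",", classValues)).append("}\n\n");

        arff.append("@DATA\n");
        for (String row : csvRows) {
            arff.append(row).append('\n');
        }
        return arff.toString();
    }

    public static void main(String[] args) {
        List<String> header = Arrays.asList(
                "sepallength", "sepalwidth", "petallength", "petalwidth", "class");
        List<String> rows = Arrays.asList(
                "5.1,3.5,1.4,0.2,Iris-setosa",
                "7.0,3.2,4.7,1.4,Iris-versicolor");
        System.out.print(toArff("iris", header, rows));
    }
}
```

Even this small version has to guess attribute types and scan the data for class values, which is the kind of chore a ready-made loader such as `weka.core.converters.CSVLoader` avoids.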

Slide 44

Slide 44 text

This time we use weka.core.converters.CSVLoader (a slightly modified version of it)

Slide 45

Slide 45 text

Classes used
• weka.classifiers.Evaluation
  • Performs k-fold cross validation
• weka.classifiers.functions.Logistic
  • Classification by logistic regression
• weka.classifiers.trees.RandomForest
  • Classification by RandomForest (a decision-tree-based method)
• weka.classifiers.functions.LinearRegression
  • Linear regression
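As background for the k-fold cross validation mentioned above, the sketch below shows only the splitting step in plain Java: shuffle the instance indices, deal them into k folds, and use each fold once as the test set while the rest is the training set. The class `KFold` is hypothetical, and Weka's own implementation may differ, for example in how it shuffles or balances classes across folds.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

/** Sketch: splitting instance indices into k folds (hypothetical helper, not Weka's code). */
final class KFold {

    /** Returns k lists of shuffled indices; fold i serves as the test set in round i. */
    static List<List<Integer>> split(int instanceCount, int k, long seed) {
        if (k < 2 || k > instanceCount) {
            throw new IllegalArgumentException("k must be between 2 and the instance count");
        }
        List<Integer> indices = new ArrayList<>();
        for (int i = 0; i < instanceCount; i++) {
            indices.add(i);
        }
        Collections.shuffle(indices, new Random(seed)); // fixed seed for repeatable splits

        List<List<Integer>> folds = new ArrayList<>();
        for (int fold = 0; fold < k; fold++) {
            folds.add(new ArrayList<>());
        }
        for (int i = 0; i < indices.size(); i++) {
            folds.get(i % k).add(indices.get(i)); // deal indices round-robin
        }
        return folds;
    }

    public static void main(String[] args) {
        int instanceCount = 150; // e.g. the size of the Iris data set
        List<List<Integer>> folds = split(instanceCount, 10, 1L);
        for (int round = 0; round < folds.size(); round++) {
            int testSize = folds.get(round).size();
            int trainSize = instanceCount - testSize;
            // In each round: train on the other 9 folds, evaluate on this one.
            System.out.println("round " + round + ": test=" + testSize + ", train=" + trainSize);
        }
    }
}
```

With 150 instances and 10 folds, every round tests on 15 instances and trains on the remaining 135; the reported score is then averaged over the rounds.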

Slide 46

Slide 46 text

Now for a live demo

Slide 47

Slide 47 text

Thank you very much!