Slide 1

Slide 1 text

Getting Started with Casual ("Yuru-Fuwa") Statistics and Machine Learning Using commons-math3
DBFlute fes 2014
2014.11.22 at BizReach
@komiya_atsushi

Slide 2

Slide 2 text

Let me say this up front

Slide 3

Slide 3 text

My talk has nothing whatsoever to do with databases, so if you came to hear about DBs, I strongly recommend heading over to Room B.

Slide 4

Slide 4 text

So, who are you? (About me)

Slide 5

Slide 5 text

KOMIYA Atsushi (@komiya_atsushi)

Slide 6

Slide 6 text

No content

Slide 7

Slide 7 text

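Every day in Sakuragaoka-cho, Shibuya, I splash around in Java ("jaba-jaba") to "deliver high-quality information from around the world to the people who need it."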

Slide 8

Slide 8 text

A word before the main topic (the mission that came with speaking today)

Slide 9

Slide 9 text

DBFlute's database support status

Slide 10

Slide 10 text

No content

Slide 11

Slide 11 text

No content

Slide 12

Slide 12 text

No content

Slide 13

Slide 13 text

It would make me happy if Redshift, Hive, Presto and the like were supported too (*´ω`*)

Slide 14

Slide 14 text

Today's talk

Slide 15

Slide 15 text

Statistics in Java, machine learning in Java

Slide 16

Slide 16 text

"For statistics, can't you just use Excel?" "With a little effort, you can do machine learning in R, you know?"

Slide 17

Slide 17 text

Fair point.

Slide 18

Slide 18 text

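Admittedly, there may be no particular need to do this in Java. But Java is the main language at SIers (system integrators), and Java makes it easy to run things as batch jobs… *Note: this is purely my personal opinion.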

Slide 19

Slide 19 text

So let's keep it casual!

Slide 20

Slide 20 text

Let's try commons-math3 from everyone's favorite, Apache Commons!

Slide 21

Slide 21 text

Package overview

Slide 22

Slide 22 text

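• org.apache.commons.math3
  • distribution … probability distributions
  • linear … linear algebra (matrix operations)
  • ml … machine learning
    • clustering … clustering
    • neuralnet … neural networks
  • optim … optimization (linear programming, etc.)
  • stat
    • correlation … correlation coefficients
    • descriptive … descriptive statistics
    • inference … hypothesis testing
    • regression … regression
* Only some of the packages are listed here, not all of them.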

Slide 23

Slide 23 text

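distribution
• Probability distributions
• Very extensive: all the major distributions seem to be covered
  • Normal distribution
  • t distribution
  • Poisson distribution
  • Chi-squared distribution
  • Beta distribution
  • Weibull distribution
  • …and many more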
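To give a feel for the distribution API, here is a minimal sketch assuming commons-math3 3.x is on the classpath. The class `DistributionSketch` and its helper methods are hypothetical names for illustration; each distribution is an object built from its parameters, which you then query for probabilities or quantiles.

```java
import org.apache.commons.math3.distribution.ChiSquaredDistribution;
import org.apache.commons.math3.distribution.NormalDistribution;
import org.apache.commons.math3.distribution.PoissonDistribution;

public class DistributionSketch {

    // P(X <= x) for a normal distribution with the given mean and standard deviation
    static double normalCdf(double mean, double sd, double x) {
        return new NormalDistribution(mean, sd).cumulativeProbability(x);
    }

    // P(X = k) for a Poisson distribution whose mean is lambda
    static double poissonPmf(double lambda, int k) {
        return new PoissonDistribution(lambda).probability(k);
    }

    // Critical value c with P(X > c) = alpha for a chi-squared distribution
    static double chiSquaredCritical(double degreesOfFreedom, double alpha) {
        return new ChiSquaredDistribution(degreesOfFreedom).inverseCumulativeProbability(1.0 - alpha);
    }

    public static void main(String[] args) {
        System.out.println("N(0, 1): P(X <= 1.96) = " + normalCdf(0.0, 1.0, 1.96));
        System.out.println("Poisson(2): P(X = 0) = " + poissonPmf(2.0, 0));
        System.out.println("chi2(5): 5% critical value = " + chiSquaredCritical(5.0, 0.05));
    }
}
```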

Slide 24

Slide 24 text

ml.clustering
• Clustering, a form of unsupervised machine learning
• Only non-hierarchical clustering algorithms are provided
• Not very extensive
  • Fuzzy K-means
  • K-means++
  • DBSCAN
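Of the three algorithms, DBSCAN is the one that does not need the number of clusters in advance. A minimal sketch, assuming commons-math3 3.x; `DbscanSketch`, its helpers and the toy points are made up for illustration. `DoublePoint` is the library's ready-made `Clusterable` wrapper around a plain `double[]`, so no custom data class is needed here.

```java
import java.util.ArrayList;
import java.util.List;

import org.apache.commons.math3.ml.clustering.Cluster;
import org.apache.commons.math3.ml.clustering.DBSCANClusterer;
import org.apache.commons.math3.ml.clustering.DoublePoint;

public class DbscanSketch {

    // Hypothetical toy data: two tight groups plus one far-away point
    static List<DoublePoint> toyPoints() {
        double[][] raw = {
            {0.0, 0.0}, {0.0, 0.5}, {0.5, 0.0},
            {10.0, 10.0}, {10.0, 10.5}, {10.5, 10.0},
            {50.0, 50.0}
        };
        List<DoublePoint> points = new ArrayList<DoublePoint>();
        for (double[] p : raw) {
            points.add(new DoublePoint(p));
        }
        return points;
    }

    // eps: neighbourhood radius; minPts: minimum neighbourhood size for a dense point.
    // Points that end up in no cluster are treated as noise and are not returned.
    static List<Cluster<DoublePoint>> cluster(List<DoublePoint> points, double eps, int minPts) {
        return new DBSCANClusterer<DoublePoint>(eps, minPts).cluster(points);
    }

    public static void main(String[] args) {
        List<Cluster<DoublePoint>> clusters = cluster(toyPoints(), 1.0, 2);
        System.out.println("clusters found: " + clusters.size());
        for (Cluster<DoublePoint> c : clusters) {
            System.out.println(c.getPoints());
        }
    }
}
```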

Slide 25

Slide 25 text

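stat.correlation
• Implementations of the common correlation coefficients are provided
  • (Pearson's) correlation coefficient
  • Spearman's rank correlation coefficient
  • Kendall's rank correlation coefficient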
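The three coefficients can disagree. As a sketch (assuming commons-math3 3.x; `CorrelationComparison` is a hypothetical name), for data that rise together monotonically but not along a straight line, the rank-based Spearman and Kendall coefficients reach 1 while Pearson's does not.

```java
import org.apache.commons.math3.stat.correlation.KendallsCorrelation;
import org.apache.commons.math3.stat.correlation.PearsonsCorrelation;
import org.apache.commons.math3.stat.correlation.SpearmansCorrelation;

public class CorrelationComparison {

    // Linear association between the raw values
    static double pearson(double[] x, double[] y) {
        return new PearsonsCorrelation().correlation(x, y);
    }

    // Pearson correlation computed on the ranks of the values
    static double spearman(double[] x, double[] y) {
        return new SpearmansCorrelation().correlation(x, y);
    }

    // Based on counting concordant vs. discordant pairs
    static double kendall(double[] x, double[] y) {
        return new KendallsCorrelation().correlation(x, y);
    }

    public static void main(String[] args) {
        double[] x = {1, 2, 3, 4, 5};
        double[] y = {1, 8, 27, 64, 125}; // y = x^3: strictly increasing, but not a straight line
        System.out.println("Pearson  = " + pearson(x, y));  // about 0.94: the relation is not linear
        System.out.println("Spearman = " + spearman(x, y)); // about 1: the ranks agree perfectly
        System.out.println("Kendall  = " + kendall(x, y));  // about 1: every pair is concordant
    }
}
```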

Slide 26

Slide 26 text

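stat.descriptive
• What is generally called "descriptive statistics"
• Computes various summary statistics
  • Moments
    • Mean, variance, standard deviation, skewness, kurtosis
  • Order statistics
    • Maximum, minimum, median, percentiles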
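A minimal sketch of the moment and order statistics listed above, assuming commons-math3 3.x; the class name `MomentAndOrderStats` is hypothetical and the input is simply the numbers 1 to 5. In this sketch the median is taken as the 50th percentile.

```java
import org.apache.commons.math3.stat.descriptive.DescriptiveStatistics;

public class MomentAndOrderStats {

    // Collects the given values into a DescriptiveStatistics instance
    static DescriptiveStatistics of(double... values) {
        DescriptiveStatistics stats = new DescriptiveStatistics();
        for (double v : values) {
            stats.addValue(v);
        }
        return stats;
    }

    public static void main(String[] args) {
        DescriptiveStatistics s = of(1, 2, 3, 4, 5);
        // Moment-based statistics (variance and standard deviation use the n - 1 denominator)
        System.out.println("mean     = " + s.getMean());
        System.out.println("variance = " + s.getVariance());
        System.out.println("sd       = " + s.getStandardDeviation());
        System.out.println("skewness = " + s.getSkewness());
        System.out.println("kurtosis = " + s.getKurtosis()); // excess kurtosis: about 0 for normal data
        // Order statistics
        System.out.println("min      = " + s.getMin());
        System.out.println("max      = " + s.getMax());
        System.out.println("median   = " + s.getPercentile(50));
    }
}
```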

Slide 27

Slide 27 text

stat.inference
• What is called "inferential statistics"
• Mainly provides implementations of hypothesis tests
  • Binomial test
  • Chi-squared test
  • t-test
  • …and more
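As a sketch of two of the tests listed above, assuming a commons-math3 3.x release recent enough to include `BinomialTest` and `AlternativeHypothesis` (to my knowledge they were added partway through the 3.x series). The class name `InferenceSketch` and the coin-toss and measurement data are hypothetical. Both helpers return p-values.

```java
import org.apache.commons.math3.stat.inference.AlternativeHypothesis;
import org.apache.commons.math3.stat.inference.BinomialTest;
import org.apache.commons.math3.stat.inference.TTest;

public class InferenceSketch {

    // One-sided binomial test of H0: success probability = p, against H1: it is greater than p
    static double binomialPValue(int trials, int successes, double p) {
        return new BinomialTest().binomialTest(trials, successes, p, AlternativeHypothesis.GREATER_THAN);
    }

    // Two-sided one-sample t-test of H0: the population mean equals mu
    static double tTestPValue(double mu, double[] sample) {
        return new TTest().tTest(mu, sample);
    }

    public static void main(String[] args) {
        // Hypothetical: 10 coin tosses, all heads. Is the coin biased towards heads?
        System.out.println("binomial p-value = " + binomialPValue(10, 10, 0.5));
        // Hypothetical measurements tested against a nominal mean of 5.0
        double[] sample = {5.3, 5.1, 5.4, 5.2, 5.5};
        System.out.println("t-test p-value   = " + tTestPValue(5.0, sample));
    }
}
```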

Slide 28

Slide 28 text

stat.regression
• "Regression": predicting an outcome from explanatory variables
• Both simple and multiple regression are provided
  • Ordinary least squares regression
  • Generalized least squares regression
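A minimal sketch of multiple regression with `OLSMultipleLinearRegression`, assuming commons-math3 3.x. The class name and the noise-free data (generated from y = 1 + 2*x1 + 3*x2) are made up so that the fitted coefficients are easy to check. `GLSMultipleLinearRegression` is used the same way but additionally needs the error covariance matrix when the sample data are loaded.

```java
import org.apache.commons.math3.stat.regression.OLSMultipleLinearRegression;

public class OlsSketch {

    // Returns {intercept, b1, b2, ...} for y = intercept + b1*x1 + b2*x2 + ...
    static double[] fit(double[] y, double[][] x) {
        OLSMultipleLinearRegression ols = new OLSMultipleLinearRegression();
        ols.newSampleData(y, x); // each row of x is one observation; an intercept is included by default
        return ols.estimateRegressionParameters();
    }

    public static void main(String[] args) {
        // Hypothetical noise-free data generated from y = 1 + 2*x1 + 3*x2
        double[][] x = {{0, 0}, {1, 0}, {0, 1}, {1, 1}, {2, 1}};
        double[] y = {1, 3, 4, 6, 8};
        double[] beta = fit(y, x);
        System.out.println("intercept = " + beta[0] + ", b1 = " + beta[1] + ", b2 = " + beta[2]);
    }
}
```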

Slide 29

Slide 29 text

Let's actually try it

Slide 30

Slide 30 text

bit.ly/dbflute-fes-2014-komiya

Slide 31

Slide 31 text

Descriptive statistics

Slide 32

Slide 32 text

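"I want a rough picture of what Japanese baseball players earn per year, not just the average."
• Use the DescriptiveStatistics class
• Add the values (annual salaries) one by one with DescriptiveStatistics#addValue()
• Compute statistics with the following methods
  • Mean: #getMean()
  • Standard deviation: #getStandardDeviation()
  • Percentile: #getPercentile(p)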
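The steps above can be sketched as follows, assuming commons-math3 3.x. `SalarySummary` is a hypothetical class, and the salaries are made-up numbers, not real player data; in this made-up sample a couple of very large values pull the mean far above the median, which is exactly why percentiles are worth looking at.

```java
import org.apache.commons.math3.stat.descriptive.DescriptiveStatistics;

public class SalarySummary {

    // Feeds every salary into a DescriptiveStatistics instance, one value at a time
    static DescriptiveStatistics summarize(double[] salaries) {
        DescriptiveStatistics stats = new DescriptiveStatistics();
        for (double salary : salaries) {
            stats.addValue(salary);
        }
        return stats;
    }

    public static void main(String[] args) {
        // Made-up salaries (unit: 10,000 yen) purely for illustration, not real player data
        double[] salaries = {1500, 800, 2400, 12000, 600, 3000, 950, 45000, 1800, 700};
        DescriptiveStatistics stats = summarize(salaries);

        System.out.printf("mean = %.1f, sd = %.1f%n", stats.getMean(), stats.getStandardDeviation());
        // Percentiles show the shape of the distribution that a single average hides
        for (double p : new double[] {25, 50, 75, 90}) {
            System.out.printf("%.0fth percentile = %.1f%n", p, stats.getPercentile(p));
        }
    }
}
```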

Slide 33

Slide 33 text

Hypothesis testing

Slide 34

Slide 34 text

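"A friend gave me a die. Is it actually a fair one?"
• To check whether each face of the (six-sided) die comes up with equal probability, use a chi-squared test
• Use the ChiSquareTest class
• Call ChiSquareTest#chiSquareTest() with the expected probability of each outcome (1/6), the observed frequencies (how many times each face came up when you actually rolled the die) and a significance level; the result tells you whether the null hypothesis is rejected
• Null hypothesis: every face of the die is equally likely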
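A minimal sketch of the die check, assuming commons-math3 3.x. `DiceFairness` and the roll counts are hypothetical. Instead of passing the probabilities 1/6 directly, this sketch converts them into expected counts (total rolls / 6) so that the expected and observed arrays are on the same scale.

```java
import java.util.Arrays;

import org.apache.commons.math3.stat.inference.ChiSquareTest;

public class DiceFairness {

    // Expected count per face under H0 (every face equally likely): total rolls / number of faces
    static double[] uniformExpected(long[] observed) {
        long total = 0;
        for (long count : observed) {
            total += count;
        }
        double[] expected = new double[observed.length];
        Arrays.fill(expected, (double) total / observed.length);
        return expected;
    }

    // p-value of the chi-squared goodness-of-fit test
    static double pValue(long[] observed) {
        return new ChiSquareTest().chiSquareTest(uniformExpected(observed), observed);
    }

    // true if H0 ("the die is fair") is rejected at significance level alpha
    static boolean rejectsFairness(long[] observed, double alpha) {
        return new ChiSquareTest().chiSquareTest(uniformExpected(observed), observed, alpha);
    }

    public static void main(String[] args) {
        // Hypothetical counts of faces 1..6 after 60 rolls
        long[] observed = {8, 9, 12, 11, 6, 14};
        System.out.println("p-value = " + pValue(observed));
        System.out.println("reject fairness at 5%? " + rejectsFairness(observed, 0.05));
    }
}
```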

Slide 35

Slide 35 text

Correlation coefficients

Slide 36

Slide 36 text

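"Hey, So-and-so, could you look into whether monthly average temperature and beer sales are correlated?"
• To check for a correlation, you need to compute a correlation coefficient
• Use the PearsonsCorrelation class
• First, put the two numeric series you want to correlate into double arrays
• PearsonsCorrelation#correlation() computes the correlation coefficient of the two series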
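A minimal sketch, assuming commons-math3 3.x; `TemperatureBeerCorrelation` is a hypothetical class and the monthly figures are invented for illustration, not real sales data. Both arrays must have the same length, one entry per month.

```java
import org.apache.commons.math3.stat.correlation.PearsonsCorrelation;

public class TemperatureBeerCorrelation {

    // Pearson's r between two equally long series (one value per month)
    static double correlation(double[] temperatures, double[] sales) {
        return new PearsonsCorrelation().correlation(temperatures, sales);
    }

    public static void main(String[] args) {
        // Made-up monthly figures for illustration only (January..December)
        double[] avgTemperature = {5.0, 6.0, 9.5, 14.5, 19.0, 22.0, 26.0, 27.5, 24.0, 18.5, 13.0, 8.0};
        double[] beerSales = {60, 62, 75, 88, 105, 118, 140, 150, 125, 98, 80, 90};
        System.out.println("r = " + correlation(avgTemperature, beerSales));
    }
}
```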

Slide 37

Slide 37 text

Regression

Slide 38

Slide 38 text

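"Hey, So-and-so, now that we know they're more or less correlated, could you try predicting beer sales from the monthly average temperature?"
• We predict the target variable, "beer sales", from a single explanatory variable, "monthly average temperature", so this is simple regression
• Use the SimpleRegression class
• Add the explanatory variable x and the target variable y one pair at a time with SimpleRegression#addData(x, y)
• Get a predicted value with SimpleRegression#predict(x)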
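Continuing the same hypothetical example, here is a minimal sketch with `SimpleRegression`, assuming commons-math3 3.x. The class name and the monthly figures are made up; `getSlope()`, `getIntercept()` and `getRSquare()` are printed alongside the prediction so you can see the fitted line, not just one number.

```java
import org.apache.commons.math3.stat.regression.SimpleRegression;

public class BeerSalesForecast {

    // Fits sales = intercept + slope * temperature by least squares
    static SimpleRegression fit(double[] temperatures, double[] sales) {
        SimpleRegression regression = new SimpleRegression();
        for (int i = 0; i < temperatures.length; i++) {
            regression.addData(temperatures[i], sales[i]); // x = explanatory variable, y = target
        }
        return regression;
    }

    public static void main(String[] args) {
        // Made-up monthly figures for illustration only
        double[] avgTemperature = {5.0, 6.0, 9.5, 14.5, 19.0, 22.0, 26.0, 27.5, 24.0, 18.5, 13.0, 8.0};
        double[] beerSales = {60, 62, 75, 88, 105, 118, 140, 150, 125, 98, 80, 90};
        SimpleRegression model = fit(avgTemperature, beerSales);
        System.out.println("slope = " + model.getSlope() + ", intercept = " + model.getIntercept());
        System.out.println("R^2 = " + model.getRSquare());
        System.out.println("predicted sales at 30.0 degrees = " + model.predict(30.0));
    }
}
```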

Slide 39

Slide 39 text

Clustering

Slide 40

Slide 40 text

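"My hobby is measuring the width and length of iris sepals and petals, and I have the measurement data on hand. I'd like to roughly sort it into three groups based on those features…"
• To cluster, you need a class to hold the measurement data
  • It must implement the Clusterable interface
• Turn each measurement into an object of that data class
• Use the KMeansPlusPlusClusterer class
• KMeansPlusPlusClusterer#cluster() returns the clustering result
  • It returns a list of CentroidCluster objects, each corresponding to one cluster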
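A minimal sketch of these steps, assuming commons-math3 3.x. `IrisClustering` and its `IrisSample` data class are hypothetical, and the nine measurements are made-up values rather than the real iris data set. The clusterer is given a fixed-seed random generator (`Well19937c`) because k-means++ picks its initial centers at random; without a seed, repeated runs may return the clusters in a different order.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

import org.apache.commons.math3.ml.clustering.CentroidCluster;
import org.apache.commons.math3.ml.clustering.Clusterable;
import org.apache.commons.math3.ml.clustering.KMeansPlusPlusClusterer;
import org.apache.commons.math3.ml.distance.EuclideanDistance;
import org.apache.commons.math3.random.Well19937c;

public class IrisClustering {

    // Hypothetical data class: one flower's measurements, exposed to the clusterer via Clusterable
    static class IrisSample implements Clusterable {
        private final double[] features;

        IrisSample(double sepalLength, double sepalWidth, double petalLength, double petalWidth) {
            this.features = new double[] {sepalLength, sepalWidth, petalLength, petalWidth};
        }

        @Override
        public double[] getPoint() {
            return features; // the coordinates used for distance calculations
        }
    }

    // k-means++ with Euclidean distance, at most 100 iterations, and a fixed seed for repeatable runs
    static List<CentroidCluster<IrisSample>> cluster(List<IrisSample> samples, int k) {
        KMeansPlusPlusClusterer<IrisSample> clusterer = new KMeansPlusPlusClusterer<IrisSample>(
                k, 100, new EuclideanDistance(), new Well19937c(42));
        return clusterer.cluster(samples);
    }

    // Made-up measurements (cm) in three groups, for illustration only
    static List<IrisSample> toySamples() {
        List<IrisSample> samples = new ArrayList<IrisSample>();
        samples.add(new IrisSample(5.0, 3.4, 1.5, 0.2));
        samples.add(new IrisSample(4.9, 3.3, 1.4, 0.2));
        samples.add(new IrisSample(5.1, 3.5, 1.5, 0.3));
        samples.add(new IrisSample(5.9, 2.8, 4.3, 1.3));
        samples.add(new IrisSample(6.0, 2.9, 4.4, 1.3));
        samples.add(new IrisSample(5.8, 2.7, 4.2, 1.2));
        samples.add(new IrisSample(6.6, 3.0, 5.6, 2.1));
        samples.add(new IrisSample(6.7, 3.1, 5.7, 2.2));
        samples.add(new IrisSample(6.5, 3.0, 5.5, 2.0));
        return samples;
    }

    public static void main(String[] args) {
        for (CentroidCluster<IrisSample> c : cluster(toySamples(), 3)) {
            System.out.println("center = " + Arrays.toString(c.getCenter().getPoint())
                    + ", size = " + c.getPoints().size());
        }
    }
}
```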

Slide 41

Slide 41 text

Summary

Slide 42

Slide 42 text

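With commons-math3…
• Statistics? You're all set!
• You can even handle light machine-learning tasks
  • …though clustering is about all there is
• It's not that hard to use!
  • Just keep in mind what input data each class expects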

Slide 43

Slide 43 text

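But I want to do a bit more machine learning…
• If you want to handle large-scale data
  • If you're starting today, I recommend Apache Spark
  • If you have a Hive environment, hivemall might be a good choice too
  • As for Mahout……
• For small-scale data
  • liblinear
  • Weka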

Slide 44

Slide 44 text

Thank you very much!