Slide 1

Slide 1 text

Solving the NLP 100 Exercise (言語処理100本ノック) in Ruby (sciruby-jp issue #2)

Slide 2

Slide 2 text

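Self-introduction and what I did
• B4 (fourth-year undergraduate) at the University of Tsukuba (natural language processing? machine learning?)
• Research: information extraction (probabilistic models)
• Assigned task: try solving the NLP 100 Exercise in Ruby
• Package user
https://github.com/himkt/nlp-100knock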

Slide 3

Slide 3 text

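NLP 100 Exercise
• A set of natural language processing drills published by the Inui-Okazaki Laboratory at Tohoku University
• The expected language is Python
• Chapters 8 to 10 are scientific-computing-style (machine learning) problems
(Image: http://www.cl.ecei.tohoku.ac.jp/nlp100/)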

Slide 4

Slide 4 text

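NLP 100 Exercise in Ruby
• Searching GitHub and similar sites...
  • There are people trying to do it in Ruby
  • but their updates stop at around Chapter 4
• Expected language: Python
• Can it be done in Ruby too? (Probably)
-> Actually tried solving it
   Found that there are several things that cannot be done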

Slide 5

Slide 5 text

Main topics
• 72: Feature extraction
• 73: Logistic regression
• 78: Cross-validation
• 85: Principal component analysis
• 90: word2vec
• 97: k-means
• 98: Ward's method: could not do it...
• 99: t-SNE

Slide 6

Slide 6 text

Main topics
• 72: Feature extraction
• 73: Logistic regression
• 78: Cross-validation
• 85: Principal component analysis
• 90: word2vec
• 97: k-means
• 98: Ward's method
• 99: t-SNE

Slide 7

Slide 7 text

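Feature extraction
• What serves as a feature in natural language processing: words (in most cases)
• The number of distinct words that occur is very large (tens of thousands to hundreds of thousands)
• Training does not go well if every word is used as a feature
• Efficient feature extraction is needed
• Python: scikit-learn::feature_extraction
• Ruby: no definitive library exists
• This time: hand-made (https://github.com/himkt/rblearn)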
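The idea of keeping only a useful subset of words as features can be sketched in plain Ruby. This is an illustrative sketch only, not the rblearn API: `build_vocabulary`, `vectorize`, and the `min_df` cutoff are hypothetical names chosen here, and the cutoff rule (drop words that appear in fewer than `min_df` documents) is one assumed way of pruning features.

```ruby
# Hypothetical helper: build a vocabulary from tokenized documents, keeping
# only words that appear in at least min_df documents (assumed pruning rule).
def build_vocabulary(docs, min_df: 2)
  doc_freq = Hash.new(0)
  docs.each { |tokens| tokens.uniq.each { |word| doc_freq[word] += 1 } }
  kept = doc_freq.select { |_, df| df >= min_df }.keys.sort
  kept.each_with_index.to_h # word => column index
end

# Hypothetical helper: turn one tokenized document into a count vector.
# Words outside the vocabulary are ignored.
def vectorize(tokens, vocab)
  vec = Array.new(vocab.size, 0)
  tokens.each { |word| vec[vocab[word]] += 1 if vocab.key?(word) }
  vec
end

docs = [
  %w[a good movie],
  %w[a bad movie],
  %w[good good plot]
]
vocab = build_vocabulary(docs, min_df: 2)
features = docs.map { |tokens| vectorize(tokens, vocab) }
puts vocab.inspect
puts features.inspect
```

With real data the vectors would be mostly zeros, which is why sparse storage comes up later in the deck.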

Slide 8

Slide 8 text

Main topics
• 72: Feature extraction
• 73: Logistic regression
• 78: Cross-validation
• 85: Principal component analysis
• 90: word2vec
• 97: k-means
• 98: Ward's method
• 99: t-SNE

Slide 9

Slide 9 text

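Logistic regression
• Libraries
  • Statsample-glm: intended to be used together with Daru?
  • Liblinear-Ruby: does not support NMatrix or NArray
• Data frames: not suited to data with many columns?*
  • This may be a preconception on my part (the data this time is about 10000 * 10000)
• Implemented it with NArray
  • What is needed: the cost function and its gradient
  • Both can be expressed as matrix products (implementable with NArray features alone)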
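The two ingredients named on this slide, the cost function and its gradient, can be written out as follows. This is a minimal sketch using plain Ruby arrays instead of NArray so that it runs without any gem; the function names, learning rate, iteration count, and toy data are all assumptions made for illustration, not the author's implementation. The cost is the cross-entropy J(w) = -(1/m) Σ [y log h + (1 - y) log(1 - h)] with h = sigmoid(Xw), and the gradient is (1/m) Xᵀ(h - y).

```ruby
def sigmoid(z)
  1.0 / (1.0 + Math.exp(-z))
end

# h = sigmoid(X w): one predicted probability per row of x.
def predict_proba(x, w)
  x.map { |row| sigmoid(row.zip(w).sum { |xi, wi| xi * wi }) }
end

# Cross-entropy cost J(w).
def cost(x, y, w)
  h = predict_proba(x, w)
  total = h.zip(y).sum { |hi, yi| yi * Math.log(hi) + (1 - yi) * Math.log(1 - hi) }
  -total / y.size
end

# Gradient of J(w): (1/m) * X^T (h - y).
def gradient(x, y, w)
  errors = predict_proba(x, w).zip(y).map { |hi, yi| hi - yi }
  (0...w.size).map { |j| x.each_index.sum { |i| x[i][j] * errors[i] } / y.size }
end

# Plain batch gradient descent (lr and iterations are arbitrary choices).
def fit(x, y, lr: 0.5, iterations: 1000)
  w = Array.new(x.first.size, 0.0)
  iterations.times do
    g = gradient(x, y, w)
    w = w.zip(g).map { |wi, gi| wi - lr * gi }
  end
  w
end

# Toy data: the first column is a constant bias term.
x = [[1.0, 0.0], [1.0, 1.0], [1.0, 3.0], [1.0, 4.0]]
y = [0, 0, 1, 1]
w = fit(x, y)
puts "weights: #{w.inspect}"
puts "cost:    #{cost(x, y, w)}"
```

With an array library such as NArray, the inner loops in `predict_proba` and `gradient` collapse into matrix products, which is the point the slide makes.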

Slide 10

Slide 10 text

Main topics
• 72: Feature extraction
• 73: Logistic regression
• 78: Cross-validation
• 85: Principal component analysis
• 90: word2vec
• 97: k-means
• 98: Ward's method
• 99: t-SNE

Slide 11

Slide 11 text

Cross-validation
• Evaluate the generalization performance of a predictive model by splitting the dataset and training multiple times
• Python: sklearn::cross_validation
  • It just returns array indices
• Integer array indexing (masking?)
  • NArray has it; NMatrix does not
Image: https://pydata.tokyo/ipynb/tutorial-1/ml.html
Reference: http://watanabe-www.math.dis.titech.ac.jp/users/swatanab/cross-val.html
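To illustrate the observation that a cross-validation splitter only has to return array indices, here is a small plain-Ruby sketch. `k_fold_indices` and its `seed` parameter are hypothetical names for this example; `Array#values_at` stands in for the integer array indexing that the slide attributes to NArray.

```ruby
# Hypothetical helper: return k pairs of [train_indices, test_indices].
# Samples are shuffled with a fixed seed, then dealt round-robin into folds.
def k_fold_indices(n_samples, k, seed: 0)
  indices = (0...n_samples).to_a.shuffle(random: Random.new(seed))
  folds = Array.new(k) { [] }
  indices.each_with_index { |idx, pos| folds[pos % k] << idx }
  folds.map do |test|
    train = indices - test
    [train.sort, test.sort]
  end
end

data = %w[a b c d e f]
k_fold_indices(data.size, 3).each do |train_idx, test_idx|
  # Select rows by a list of integer positions.
  train = data.values_at(*train_idx)
  test  = data.values_at(*test_idx)
  puts "train=#{train.inspect} test=#{test.inspect}"
end
```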

Slide 12

Slide 12 text

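Cross-validation
• Ruby: at present it is sometimes implemented inside the individual libraries
  • e.g. Liblinear.cross_validation (liblinear-ruby)
• Python: scikit-learn::cross_validation
  • The model (Logistic Regression) only receives training data and learns from it
  • Cross-validation logic and training logic are separated
Wrote a library that performs cross-validation (https://github.com/himkt/rblearn)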
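The separation of cross-validation from training described here can be sketched as a driver that knows nothing about the model except `fit` and `predict`. Everything below is hypothetical and is not the rblearn or liblinear-ruby API: `MajorityClassifier` is a deliberately trivial stand-in model, and `cross_validate` uses a simple unshuffled round-robin split.

```ruby
# Hypothetical stand-in model: always predicts the most frequent training label.
class MajorityClassifier
  def fit(_x, y)
    @label = y.group_by(&:itself).max_by { |_, group| group.size }.first
    self
  end

  def predict(x)
    x.map { @label }
  end
end

# Generic k-fold driver: relies only on fit(x, y) and predict(x).
# Returns the accuracy obtained on each fold.
def cross_validate(model_class, x, y, k: 3)
  indices = (0...x.size).to_a
  (0...k).map do |fold|
    test_idx  = indices.select { |i| i % k == fold }
    train_idx = indices - test_idx
    model = model_class.new.fit(x.values_at(*train_idx), y.values_at(*train_idx))
    predicted = model.predict(x.values_at(*test_idx))
    actual = y.values_at(*test_idx)
    predicted.zip(actual).count { |pred, act| pred == act }.to_f / actual.size
  end
end

x = [[0], [1], [2], [3], [4], [5]]
y = [1, 1, 1, 1, 0, 1]
scores = cross_validate(MajorityClassifier, x, y, k: 3)
puts scores.inspect
```

Any class that responds to `fit` and `predict` could be passed in place of `MajorityClassifier`, which is the benefit of keeping the two pieces of logic apart.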

Slide 13

Slide 13 text

Main topics
• 72: Feature extraction
• 73: Logistic regression
• 78: Cross-validation
• 85: Principal component analysis
• 90: word2vec
• 97: k-means
• 98: Ward's method
• 99: t-SNE

Slide 14

Slide 14 text

Principal component analysis

Slide 15

Slide 15 text

Principal component analysis

Slide 16

Slide 16 text

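Principal component analysis
• Libraries
  • Ruby: statsample
    • The data is large, so it has to be handled as a sparse matrix
    • Need to build a DataFrame?
• Solve it as an eigenvalue/eigenvector computation
  • NArray, NMatrix (s.t. sparse matrices)
  • NArray: sparse matrices not supported yet
  • NMatrix: eigenvalue/eigenvector computation for sparse matrices is not implemented
-> Put on hold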
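For readers unfamiliar with treating PCA as an eigenvalue problem, the sketch below finds the first principal component of a tiny dense dataset by power iteration on the covariance matrix. It is an illustration under stated assumptions, not a solution to the sparse, large-scale case that the slide puts on hold: the function names, the starting vector, and the iteration count are all chosen here for the example.

```ruby
# Covariance matrix of the data (rows are samples), after mean-centering.
def covariance(x)
  n = x.size
  d = x.first.size
  means = (0...d).map { |j| x.sum { |row| row[j] } / n.to_f }
  centered = x.map { |row| row.each_with_index.map { |v, j| v - means[j] } }
  (0...d).map do |a|
    (0...d).map { |b| centered.sum { |row| row[a] * row[b] } / (n - 1).to_f }
  end
end

def mat_vec(m, v)
  m.map { |row| row.zip(v).sum { |a, b| a * b } }
end

# Power iteration: approximates the dominant eigenvalue and eigenvector of a
# symmetric matrix. The start vector is an arbitrary assumption.
def power_iteration(m, iterations: 200)
  v = Array.new(m.size) { |i| i.zero? ? 1.0 : 0.5 }
  iterations.times do
    mv = mat_vec(m, v)
    norm = Math.sqrt(mv.sum { |a| a * a })
    v = mv.map { |a| a / norm }
  end
  eigenvalue = mat_vec(m, v).zip(v).sum { |a, b| a * b } # Rayleigh quotient
  [eigenvalue, v]
end

x = [[2.0, 0.0], [0.0, 2.0], [-2.0, 0.0], [0.0, -2.0], [3.0, 3.0], [-3.0, -3.0]]
cov = covariance(x)
eigenvalue, component = power_iteration(cov)
puts "variance along first component: #{eigenvalue}"
puts "first principal component:      #{component.inspect}"
```

Power iteration touches the matrix only through matrix-vector products, so in principle it suits sparse storage, but it yields only the leading component and is shown here purely to make the eigenvalue formulation concrete.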

Slide 17

Slide 17 text

Main topics
• 72: Feature extraction
• 73: Logistic regression
• 78: Cross-validation
• 85: Principal component analysis
• 90: word2vec
• 97: k-means
• 98: Ward's method
• 99: t-SNE

Slide 18

Slide 18 text

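word2vec
• Libraries
  • Python: gensim
  • Ruby: none (probably)
• Implemented with NArray
  • For word2vec, it is enough to have the word vectors once the model has been trained
  • All that is actually needed is computing the cosine similarity between vectors (NArray or NMatrix features are sufficient)
  • NArray was faster, so I used NArray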
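The cosine-similarity step that this slide identifies as the only real requirement looks roughly like this. The sketch uses plain Ruby arrays rather than NArray, and the three-dimensional "word vectors" are made-up numbers for illustration, not output from a trained word2vec model; `most_similar` is a hypothetical helper name.

```ruby
# cos(a, b) = (a . b) / (|a| * |b|)
def cosine_similarity(a, b)
  dot = a.zip(b).sum { |x, y| x * y }
  norm_a = Math.sqrt(a.sum { |x| x * x })
  norm_b = Math.sqrt(b.sum { |x| x * x })
  dot / (norm_a * norm_b)
end

# Hypothetical helper: return [word, similarity] for the nearest other word.
def most_similar(word, vectors)
  query = vectors.fetch(word)
  vectors.reject { |w, _| w == word }
         .map { |w, v| [w, cosine_similarity(query, v)] }
         .max_by { |_, sim| sim }
end

# Made-up vectors, for illustration only.
vectors = {
  "tokyo" => [0.9, 0.1, 0.3],
  "osaka" => [0.8, 0.2, 0.35],
  "apple" => [0.1, 0.9, 0.2]
}
nearest, similarity = most_similar("tokyo", vectors)
puts "#{nearest} (#{similarity.round(3)})"
```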

Slide 19

Slide 19 text

Main topics
• 72: Feature extraction
• 73: Logistic regression
• 78: Cross-validation
• 85: Principal component analysis
• 90: word2vec
• 97: k-means
• 98: Ward's method
• 99: t-SNE

Slide 20

Slide 20 text

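k-means, t-SNE
• Libraries
  • Python: sklearn.cluster
  • Ruby: AI4R (http://ai4r.org/)
    • Does not support NArray or NMatrix
    • Updates seem to have stopped?
• Implemented with NArray alone (NArray is faster)
• Could be implemented without getting stuck on anything in particular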
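As a rough picture of why k-means is straightforward to implement by hand, here is a minimal plain-Ruby sketch of Lloyd's algorithm. It is not the author's NArray code: the initialization (the first `k` points), the fixed iteration count, and the function names are assumptions made to keep the example short and deterministic. t-SNE is not covered by this sketch.

```ruby
def squared_distance(a, b)
  a.zip(b).sum { |x, y| (x - y)**2 }
end

# Lloyd's algorithm with a naive initialization (assumption: first k points).
# Returns [centroids, assignments].
def k_means(points, k, iterations: 20)
  centroids = points.first(k)
  assignments = []
  iterations.times do
    # Assignment step: nearest centroid for every point.
    assignments = points.map do |point|
      (0...k).min_by { |c| squared_distance(point, centroids[c]) }
    end
    # Update step: each centroid becomes the mean of its members.
    centroids = (0...k).map do |c|
      members = points.each_index.select { |i| assignments[i] == c }.map { |i| points[i] }
      next centroids[c] if members.empty? # keep an empty cluster's centroid
      members.transpose.map { |dim| dim.sum / members.size.to_f }
    end
  end
  [centroids, assignments]
end

points = [[0.0, 0.0], [10.0, 10.0], [0.0, 1.0], [1.0, 0.0], [10.0, 11.0], [11.0, 10.0]]
centroids, assignments = k_means(points, 2)
puts assignments.inspect
puts centroids.inspect
```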

Slide 21

Slide 21 text

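Summary
• Tried solving the NLP 100 Exercise
  • Most of it can be solved with NArray and NMatrix
  • Things like principal component analysis on large-scale data cannot be done
• Is a library like scikit-learn needed?
  • Looked at the past logs (yesterday)
  • It would be nice to have (I think Ruby is well suited to natural language processing)
• Libraries: whether it is NArray, NMatrix, or Daru's Vector(?), I would like some agreed-upon data structure to be usable uniformly
  • for cross-validation, feature extraction, and so on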

Slide 22

Slide 22 text

Wish list
• NArray: sparse matrix support
• NMatrix: sparse matrix support in linalg
• NArray, NMatrix: object serialization
• NMatrix: integer array indexing
• Feature Extractor, Feature Vectorizer