Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
言語処理100本ノックをRubyでやったメモ
Search
himkt
August 06, 2016
11
2.4k
言語処理100本ノックをRubyでやったメモ
himkt
August 06, 2016
Tweet
Share
More Decks by himkt
See All by himkt
Linformer: paper reading
himkt
0
240
RoBERTa: paper reading
himkt
1
250
NLP SoTA 勉強会 / ner_2019
himkt
2
1.2k
自然言語処理 @ クックパッド / nlp at cookpad
himkt
1
450
Interpretable Machine Learning 6.3 - Prototypes and Criticisms
himkt
2
120
ニューラル固有表現抽出 / Neural Named Entity Recognition
himkt
3
560
ニューラル固有表現抽出器を実装してみる / PyNER
himkt
6
1.9k
Spacyでお手軽NLP / NLP with spacy
himkt
0
900
Deep Learning Book 10その2 / deep learning book 10 vol2
himkt
2
140
Featured
See All Featured
GraphQLとの向き合い方2022年版
quramy
28
12k
Happy Clients
brianwarren
91
6.3k
Let's Do A Bunch of Simple Stuff to Make Websites Faster
chriscoyier
501
140k
Git: the NoSQL Database
bkeepers
PRO
421
63k
Clear Off the Table
cherdarchuk
82
310k
A designer walks into a library…
pauljervisheath
199
23k
The World Runs on Bad Software
bkeepers
PRO
60
6.6k
ReactJS: Keep Simple. Everything can be a component!
pedronauck
657
120k
Writing Fast Ruby
sferik
619
59k
How to train your dragon (web standard)
notwaldorf
71
5.1k
Into the Great Unknown - MozCon
thekraken
10
830
Code Review Best Practice
trishagee
54
15k
Transcript
ݴޠॲཧ100ຊϊοΫΛRubyͰΔ ʢsciruby-jp issue #2ʣ
ࣗݾհͱͬͨ͜ͱ • B4 at ஜେֶ ʢࣗવݴޠॲཧ? ػցֶश? ʣ • ݚڀɿใநग़ʢ֬Ϟσϧʣ
• ୲ɿݴޠॲཧ100ຊϊοΫΛRubyͰղ͍ͯΈΔ • ύοέʔδϢʔβ https://github.com/himkt/nlp-100knock
ݴޠॲཧ100ຊϊοΫ • ౦େֶ סɾԬ࡚ݚ͕ެ։͍ͯ͠ΔࣗવݴޠॲཧυϦϧ • ఆ͞ΕΔݴޠPython • ୈ8ষʙୈ10ষ͕Պֶܭࢉతʁʢػցֶशʣͳ ʢը૾: http://www.cl.ecei.tohoku.ac.jp/nlp100/ʣ
RubyͰݴޠॲཧ100ຊϊοΫ • GitHubͳͲͰݕࡧ͢Δͱ… • RubyͰΖ͏ͱ͍ͯ͠Δਓ͍Δ • ͕ɼ4ষ͘Β͍·ͰͰߋ৽్͕ઈ͍͑ͯΔ ɹ • ఆݴޠɿPython
• RubyͰͰ͖ΔʁʢͰ͖ΔͩΖ͏ʣ -> ࣮ࡍʹղ͍ͯΈΔ ɹͰ͖ͳ͍͜ͱ͕ز͔ͭ͋Δ͜ͱ͕Θ͔ͬͨ
ओͳτϐοΫ • 72ɿૉੑநग़ • 73ɿϩδεςΟοΫճؼ • 78ɿΫϩεόϦσʔγϣϯ • 85ɿओੳ •
90ɿword2vec • 97ɿk-means • 98ɿWard๏ɿͰ͖ͳ͔ͬͨ… • 99ɿt-SNE
ओͳτϐοΫ • 72ɿૉੑநग़ • 73ɿϩδεςΟοΫճؼ • 78ɿΫϩεόϦσʔγϣϯ • 85ɿओੳ •
90ɿword2vec • 97ɿk-means • 98ɿWard๏ • 99ɿt-SNE 6
ૉੑநग़ • ࣗવݴޠॲཧʹ͓͍ͯૉੑʹͳΔͷɿ୯ޠʢଟ͘ͷ߹ʣ • ग़ݱ͢Δ୯ޠͷͱͯଟ͍ʢສ - ेສʣ • ͯ͢ͷ୯ޠΛૉੑͱͯ͠͏ͱֶश͕͏·͍͔͘ͳ͍ •
ޮతͳૉੑநग़͕ඞཁ • Python:scikit-learn::feature_extraction • Ruby:ܾఆ൛తͳϥΠϒϥϦଘࡏ͠ͳ͍ • ࠓճ͓खʢhttps://github.com/himkt/rblearnʣ
ओͳτϐοΫ • 72ɿૉੑநग़ • 73ɿϩδεςΟοΫճؼ • 78ɿΫϩεόϦσʔγϣϯ • 85ɿओੳ •
90ɿword2vec • 97ɿk-means • 98ɿWard๏ • 99ɿt-SNE 8
ϩδεςΟοΫճؼ • ϥΠϒϥϦ • Statsample-glmɿDaruͱҰॹʹ͏͜ͱ͕ఆ͞Ε͍ͯΔʁ • Liblinear-RubyɿNMatrix, NArrayʹରԠ͍ͯ͠ͳ͍ • σʔλϑϨʔϜɿΧϥϜ͕ଟ͍σʔλΛѻ͏ͷʹ͔ͳ͍ʁ*
• ࢥ͍ࠐΈ͔Εͳ͍ʢࠓճͷσʔλ10000 * 10000͘Β͍ʣ • NArrayͰ࣮ͨ͠ • ඞཁͳͷɿίετؔͱޯ • ߦྻͷੵͰදݱՄೳʢNArrayͷػೳ͚ͩͰ࣮Մʣ
ओͳτϐοΫ • 72ɿૉੑநग़ • 73ɿϩδεςΟοΫճؼ • 78ɿΫϩεόϦσʔγϣϯ • 85ɿओੳ •
90ɿword2vec • 97ɿk-means • 98ɿWard๏ • 99ɿt-SNE
ΫϩεόϦσʔγϣϯ • σʔληοτΛׂͯ͠ෳճֶशΛߦ͏ ͜ͱͰ༧ଌϞσϧͷ൚ԽੑೳΛௐΔ • Python: sklearn::cross_validation • ྻͷΠϯσοΫεΛฦ͍ͯ͠Δ͚ͩ •
Integer array indexing (masking ?) • NArrayʹ͋Δ NMatrixʹͳ͍ ը૾ɿhttps://pydata.tokyo/ipynb/tutorial-1/ml.html ࢀߟɿhttp://watanabe-www.math.dis.titech.ac.jp/users/swatanab/cross-val.html
ΫϩεόϦσʔγϣϯ • Ruby: ݱঢ়ͰϥΠϒϥϦଆͰ࣮͞Ε͍ͯͨΓ͢Δ • e.g. Liblinear.cross_validation (liblinear-ruby) • Python:
scikit-learn::cross_validation • ϞσϧʢLogistic Regressionʣ܇࿅σʔλΛड͚औΓֶश͢Δ͚ͩ ΫϩεόϦσʔγϣϯ͢ΔϥΠϒϥϦΛ࡞ͬͨʢhttps://github.com/himkt/rblearnʣ ΫϩεόϦσʔγϣϯͱ ֶशͷϩδοΫ͕
ओͳτϐοΫ • 72ɿૉੑநग़ • 73ɿϩδεςΟοΫճؼ • 78ɿΫϩεόϦσʔγϣϯ • 85ɿओੳ •
90ɿword2vec • 97ɿk-means • 98ɿWard๏ • 99ɿt-SNE
ओੳ
ओੳ
ओੳ • ϥΠϒϥϦ • Ruby: statsample • σʔλ͕Ͱ͔͍ͷͰɼૄߦྻͷ··ѻ͏ඞཁ͕͋Δ • DataFrameΛͭ͘Δඞཁ͕͋Δʁ
• ݻ༗ɾݻ༗ϕΫτϧܭࢉͱͯ͠ղ͘ • NArray, NMatrixʢs.t. ૄߦྻʣ • NArray: ૄߦྻ·ͩରԠ͍ͯ͠ͳ͍ • NMatrix: ૄߦྻͷݻ༗ɾݻ༗ϕΫτϧܭࢉະ࣮ -> อཹ
ओͳτϐοΫ • 72ɿૉੑநग़ • 73ɿϩδεςΟοΫճؼ • 78ɿΫϩεόϦσʔγϣϯ • 85ɿओੳ •
90ɿword2vec • 97ɿk-means • 98ɿWard๏ • 99ɿt-SNE
word2vec • ϥΠϒϥϦ • Python: gensim • Ruby: ແ͍ʢଟʣ •
NArrayͰ࣮ • word2vecϞσϧΛ܇࿅ͨ͠ޙʹ୯ޠϕΫτϧ͕ಘΒΕΕྑ͍ • ࣮ࡍʹඞཁͳͷϕΫτϧಉ࢜ͷίαΠϯྨࣅͷܭࢉ͚ͩ ʢNArray NMatrixͷػೳͰॆʣ • NArrayͷ΄͏͕͔ͬͨͷͰNArrayΛͬͨ
ओͳτϐοΫ • 72ɿૉੑநग़ • 73ɿϩδεςΟοΫճؼ • 78ɿΫϩεόϦσʔγϣϯ • 85ɿओੳ •
90ɿword2vec • 97ɿk-means • 98ɿWard๏ • 99ɿt-SNE
k-means t-SNE • ϥΠϒϥϦ • Python: sklearn.clustering • Ruby: AI4Rʢhttp://ai4r.org/ʣ
• NArray NMatrixະରԠ • ߋ৽ࢭ·ͬͯΔʁ • NArray͚ͩͰ࣮ͨ͠ʢNArrayͷ΄͏͕͍ʣ • ಛʹ٧·Δ͜ͱͳ࣮͘Ͱ͖Δ
·ͱΊ • ݴޠॲཧ100ຊϊοΫΛղ͍ͯΈͨ • ͍͍ͩͨNArray, NMatrix͕͋Εղ͚Δ • େنͳσʔλͷओੳͱ͔Ͱ͖ͳ͍ • scikit-learnΈ͍ͨͳϥΠϒϥϦ͕ඞཁ͔ʁ
• աڈϩάΛݟͨʢࡢʣ • ༗Εخ͍͠ʢRubyࣗવݴޠॲཧʹ͍͍ͯΔͱࢥ͏ʣ • ϥΠϒϥϦ: NArrayͳΓNMatrixͳΓDaruͷVector?ͳΓ ͳΜΒ͔ͷܾΊΒΕͨσʔλߏ͕౷Ұతʹ͑ͯ΄͍͠ • ΫϩεόϦσʔγϣϯͱ͔ૉੑநग़ͱ͔
΄͍͠ • NArray: ૄߦྻରԠ • NMatrix: linalgͷૄߦྻରԠ • NArray, NMatrix:
ΦϒδΣΫτͷγϦΞϥΠζ • NMatrix: Integer Array indexing • Feature Extractor, Feature Vectorizer