Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
言語処理100本ノックをRubyでやったメモ
Search
himkt
August 06, 2016
11
2.6k
言語処理100本ノックをRubyでやったメモ
himkt
August 06, 2016
Tweet
Share
More Decks by himkt
See All by himkt
Linformer: paper reading
himkt
0
520
RoBERTa: paper reading
himkt
1
350
NLP SoTA 勉強会 / ner_2019
himkt
2
1.4k
自然言語処理 @ クックパッド / nlp at cookpad
himkt
1
520
Interpretable Machine Learning 6.3 - Prototypes and Criticisms
himkt
2
160
ニューラル固有表現抽出 / Neural Named Entity Recognition
himkt
3
740
ニューラル固有表現抽出器を実装してみる / PyNER
himkt
6
2.1k
Spacyでお手軽NLP / NLP with spacy
himkt
0
1k
Deep Learning Book 10その2 / deep learning book 10 vol2
himkt
2
190
Featured
See All Featured
Unsuck your backbone
ammeep
671
58k
Into the Great Unknown - MozCon
thekraken
40
2k
Building Adaptive Systems
keathley
43
2.7k
RailsConf 2023
tenderlove
30
1.2k
I Don’t Have Time: Getting Over the Fear to Launch Your Podcast
jcasabona
33
2.4k
A better future with KSS
kneath
238
17k
The Myth of the Modular Monolith - Day 2 Keynote - Rails World 2024
eileencodes
26
3k
Being A Developer After 40
akosma
90
590k
Templates, Plugins, & Blocks: Oh My! Creating the theme that thinks of everything
marktimemedia
31
2.5k
Large-scale JavaScript Application Architecture
addyosmani
512
110k
The Language of Interfaces
destraynor
158
25k
The Cult of Friendly URLs
andyhume
79
6.5k
Transcript
ݴޠॲཧ100ຊϊοΫΛRubyͰΔ ʢsciruby-jp issue #2ʣ
ࣗݾհͱͬͨ͜ͱ • B4 at ஜେֶ ʢࣗવݴޠॲཧ? ػցֶश? ʣ • ݚڀɿใநग़ʢ֬Ϟσϧʣ
• ୲ɿݴޠॲཧ100ຊϊοΫΛRubyͰղ͍ͯΈΔ • ύοέʔδϢʔβ https://github.com/himkt/nlp-100knock
ݴޠॲཧ100ຊϊοΫ • ౦େֶ סɾԬ࡚ݚ͕ެ։͍ͯ͠ΔࣗવݴޠॲཧυϦϧ • ఆ͞ΕΔݴޠPython • ୈ8ষʙୈ10ষ͕Պֶܭࢉతʁʢػցֶशʣͳ ʢը૾: http://www.cl.ecei.tohoku.ac.jp/nlp100/ʣ
RubyͰݴޠॲཧ100ຊϊοΫ • GitHubͳͲͰݕࡧ͢Δͱ… • RubyͰΖ͏ͱ͍ͯ͠Δਓ͍Δ • ͕ɼ4ষ͘Β͍·ͰͰߋ৽్͕ઈ͍͑ͯΔ ɹ • ఆݴޠɿPython
• RubyͰͰ͖ΔʁʢͰ͖ΔͩΖ͏ʣ -> ࣮ࡍʹղ͍ͯΈΔ ɹͰ͖ͳ͍͜ͱ͕ز͔ͭ͋Δ͜ͱ͕Θ͔ͬͨ
ओͳτϐοΫ • 72ɿૉੑநग़ • 73ɿϩδεςΟοΫճؼ • 78ɿΫϩεόϦσʔγϣϯ • 85ɿओੳ •
90ɿword2vec • 97ɿk-means • 98ɿWard๏ɿͰ͖ͳ͔ͬͨ… • 99ɿt-SNE
ओͳτϐοΫ • 72ɿૉੑநग़ • 73ɿϩδεςΟοΫճؼ • 78ɿΫϩεόϦσʔγϣϯ • 85ɿओੳ •
90ɿword2vec • 97ɿk-means • 98ɿWard๏ • 99ɿt-SNE 6
ૉੑநग़ • ࣗવݴޠॲཧʹ͓͍ͯૉੑʹͳΔͷɿ୯ޠʢଟ͘ͷ߹ʣ • ग़ݱ͢Δ୯ޠͷͱͯଟ͍ʢສ - ेສʣ • ͯ͢ͷ୯ޠΛૉੑͱͯ͠͏ͱֶश͕͏·͍͔͘ͳ͍ •
ޮతͳૉੑநग़͕ඞཁ • Python:scikit-learn::feature_extraction • Ruby:ܾఆ൛తͳϥΠϒϥϦଘࡏ͠ͳ͍ • ࠓճ͓खʢhttps://github.com/himkt/rblearnʣ
ओͳτϐοΫ • 72ɿૉੑநग़ • 73ɿϩδεςΟοΫճؼ • 78ɿΫϩεόϦσʔγϣϯ • 85ɿओੳ •
90ɿword2vec • 97ɿk-means • 98ɿWard๏ • 99ɿt-SNE 8
ϩδεςΟοΫճؼ • ϥΠϒϥϦ • Statsample-glmɿDaruͱҰॹʹ͏͜ͱ͕ఆ͞Ε͍ͯΔʁ • Liblinear-RubyɿNMatrix, NArrayʹରԠ͍ͯ͠ͳ͍ • σʔλϑϨʔϜɿΧϥϜ͕ଟ͍σʔλΛѻ͏ͷʹ͔ͳ͍ʁ*
• ࢥ͍ࠐΈ͔Εͳ͍ʢࠓճͷσʔλ10000 * 10000͘Β͍ʣ • NArrayͰ࣮ͨ͠ • ඞཁͳͷɿίετؔͱޯ • ߦྻͷੵͰදݱՄೳʢNArrayͷػೳ͚ͩͰ࣮Մʣ
ओͳτϐοΫ • 72ɿૉੑநग़ • 73ɿϩδεςΟοΫճؼ • 78ɿΫϩεόϦσʔγϣϯ • 85ɿओੳ •
90ɿword2vec • 97ɿk-means • 98ɿWard๏ • 99ɿt-SNE
ΫϩεόϦσʔγϣϯ • σʔληοτΛׂͯ͠ෳճֶशΛߦ͏ ͜ͱͰ༧ଌϞσϧͷ൚ԽੑೳΛௐΔ • Python: sklearn::cross_validation • ྻͷΠϯσοΫεΛฦ͍ͯ͠Δ͚ͩ •
Integer array indexing (masking ?) • NArrayʹ͋Δ NMatrixʹͳ͍ ը૾ɿhttps://pydata.tokyo/ipynb/tutorial-1/ml.html ࢀߟɿhttp://watanabe-www.math.dis.titech.ac.jp/users/swatanab/cross-val.html
ΫϩεόϦσʔγϣϯ • Ruby: ݱঢ়ͰϥΠϒϥϦଆͰ࣮͞Ε͍ͯͨΓ͢Δ • e.g. Liblinear.cross_validation (liblinear-ruby) • Python:
scikit-learn::cross_validation • ϞσϧʢLogistic Regressionʣ܇࿅σʔλΛड͚औΓֶश͢Δ͚ͩ ΫϩεόϦσʔγϣϯ͢ΔϥΠϒϥϦΛ࡞ͬͨʢhttps://github.com/himkt/rblearnʣ ΫϩεόϦσʔγϣϯͱ ֶशͷϩδοΫ͕
ओͳτϐοΫ • 72ɿૉੑநग़ • 73ɿϩδεςΟοΫճؼ • 78ɿΫϩεόϦσʔγϣϯ • 85ɿओੳ •
90ɿword2vec • 97ɿk-means • 98ɿWard๏ • 99ɿt-SNE
ओੳ
ओੳ
ओੳ • ϥΠϒϥϦ • Ruby: statsample • σʔλ͕Ͱ͔͍ͷͰɼૄߦྻͷ··ѻ͏ඞཁ͕͋Δ • DataFrameΛͭ͘Δඞཁ͕͋Δʁ
• ݻ༗ɾݻ༗ϕΫτϧܭࢉͱͯ͠ղ͘ • NArray, NMatrixʢs.t. ૄߦྻʣ • NArray: ૄߦྻ·ͩରԠ͍ͯ͠ͳ͍ • NMatrix: ૄߦྻͷݻ༗ɾݻ༗ϕΫτϧܭࢉະ࣮ -> อཹ
ओͳτϐοΫ • 72ɿૉੑநग़ • 73ɿϩδεςΟοΫճؼ • 78ɿΫϩεόϦσʔγϣϯ • 85ɿओੳ •
90ɿword2vec • 97ɿk-means • 98ɿWard๏ • 99ɿt-SNE
word2vec • ϥΠϒϥϦ • Python: gensim • Ruby: ແ͍ʢଟʣ •
NArrayͰ࣮ • word2vecϞσϧΛ܇࿅ͨ͠ޙʹ୯ޠϕΫτϧ͕ಘΒΕΕྑ͍ • ࣮ࡍʹඞཁͳͷϕΫτϧಉ࢜ͷίαΠϯྨࣅͷܭࢉ͚ͩ ʢNArray NMatrixͷػೳͰॆʣ • NArrayͷ΄͏͕͔ͬͨͷͰNArrayΛͬͨ
ओͳτϐοΫ • 72ɿૉੑநग़ • 73ɿϩδεςΟοΫճؼ • 78ɿΫϩεόϦσʔγϣϯ • 85ɿओੳ •
90ɿword2vec • 97ɿk-means • 98ɿWard๏ • 99ɿt-SNE
k-means t-SNE • ϥΠϒϥϦ • Python: sklearn.clustering • Ruby: AI4Rʢhttp://ai4r.org/ʣ
• NArray NMatrixະରԠ • ߋ৽ࢭ·ͬͯΔʁ • NArray͚ͩͰ࣮ͨ͠ʢNArrayͷ΄͏͕͍ʣ • ಛʹ٧·Δ͜ͱͳ࣮͘Ͱ͖Δ
·ͱΊ • ݴޠॲཧ100ຊϊοΫΛղ͍ͯΈͨ • ͍͍ͩͨNArray, NMatrix͕͋Εղ͚Δ • େنͳσʔλͷओੳͱ͔Ͱ͖ͳ͍ • scikit-learnΈ͍ͨͳϥΠϒϥϦ͕ඞཁ͔ʁ
• աڈϩάΛݟͨʢࡢʣ • ༗Εخ͍͠ʢRubyࣗવݴޠॲཧʹ͍͍ͯΔͱࢥ͏ʣ • ϥΠϒϥϦ: NArrayͳΓNMatrixͳΓDaruͷVector?ͳΓ ͳΜΒ͔ͷܾΊΒΕͨσʔλߏ͕౷Ұతʹ͑ͯ΄͍͠ • ΫϩεόϦσʔγϣϯͱ͔ૉੑநग़ͱ͔
΄͍͠ • NArray: ૄߦྻରԠ • NMatrix: linalgͷૄߦྻରԠ • NArray, NMatrix:
ΦϒδΣΫτͷγϦΞϥΠζ • NMatrix: Integer Array indexing • Feature Extractor, Feature Vectorizer