Notes from Doing the NLP 100 Knocks (言語処理100本ノック) in Ruby
himkt
August 06, 2016
Doing the NLP 100 Knocks in Ruby (sciruby-jp issue #2)
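Self-introduction and what I did
• B4 (fourth-year undergraduate) at the University of Tsukuba (natural language processing? machine learning?)
• Research: information extraction (probabilistic models)
• My part: trying to solve the NLP 100 Knocks in Ruby
• Package user
• https://github.com/himkt/nlp-100knock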
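The NLP 100 Knocks
• A natural language processing drill set published by the Inui-Okazaki Lab at Tohoku University
• The intended language is Python
• Chapters 8 through 10 have a scientific-computing (machine learning) flavor
(Image: http://www.cl.ecei.tohoku.ac.jp/nlp100/)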
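The NLP 100 Knocks in Ruby
• Searching GitHub and elsewhere shows that…
• Some people have tried to do it in Ruby
• But their updates stop at around Chapter 4
• Intended language: Python
• Can it be done in Ruby? (Probably.) -> Actually solve the problems
  It turned out that a few things cannot be done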
Main topics
• 72: Feature extraction
• 73: Logistic regression
• 78: Cross-validation
• 85: Principal component analysis
• 90: word2vec
• 97: k-means
• 98: Ward's method: couldn't do it…
• 99: t-SNE
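Feature extraction
• In NLP, the features are (in most cases) words
• The number of distinct words is very large (tens of thousands to hundreds of thousands)
• Using every word as a feature makes learning go poorly
• Efficient feature extraction is needed
• Python: scikit-learn::feature_extraction
• Ruby: no de facto standard library exists
• This time I wrote my own (https://github.com/himkt/rblearn)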
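Below is a minimal plain-Ruby sketch of pruning the vocabulary before vectorizing. It assumes whitespace-tokenized text and uses a minimum document-frequency cutoff (`min_df`) as the pruning rule. `build_vocabulary` and `vectorize` are hypothetical names and are not the rblearn API. Each document becomes a sparse Hash from column index to count, so a dense row over the whole vocabulary is never built.

```ruby
# Hypothetical pruning rule: keep only words that occur in at least
# `min_df` documents, then give each kept word a column index.
def build_vocabulary(docs, min_df: 2)
  doc_freq = Hash.new(0)
  docs.each do |doc|
    doc.split.uniq.each { |word| doc_freq[word] += 1 }
  end
  doc_freq.select { |_word, df| df >= min_df }.keys.sort.each_with_index.to_h
end

# Map one document to a sparse {column_index => count} Hash.
# Words outside the vocabulary are simply dropped.
def vectorize(doc, vocabulary)
  features = Hash.new(0)
  doc.split.each do |word|
    index = vocabulary[word]
    features[index] += 1 if index
  end
  features
end

docs = ["this movie is great", "this movie is boring", "a great cast"]
vocab = build_vocabulary(docs, min_df: 2)
row = vectorize(docs[0], vocab)
p vocab
p row
```

Here "boring", "a", and "cast" appear in only one document each, so they never get a column.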
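Logistic regression
• Libraries
  • Statsample-glm: apparently meant to be used together with Daru?
  • Liblinear-Ruby: does not support NMatrix or NArray
  • Data frames: not suited to data with many columns?*
  * This may be a misconception on my part (the data here was roughly 10000 × 10000)
• Implemented it with NArray
• What is needed: the cost function and its gradient
• Both can be expressed as matrix products (implementable with NArray's features alone)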
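The slide's point is that logistic regression needs only a cost function and a gradient, and both reduce to matrix-vector products. The sketch below shows that in plain Ruby. Arrays stand in for NArray here, which is an assumption made for portability, and plain batch gradient descent with a fixed learning rate is the training loop. The helper names are illustrative.

```ruby
def sigmoid(z)
  1.0 / (1.0 + Math.exp(-z))
end

# X (m x n) times w (n) -> vector of length m
def mat_vec(x, w)
  x.map { |row| row.zip(w).sum { |a, b| a * b } }
end

# X^T (n x m) times v (m) -> vector of length n
def mat_t_vec(x, v)
  x.transpose.map { |col| col.zip(v).sum { |a, b| a * b } }
end

# Cross-entropy cost: J(w) = -(1/m) * sum(y*log(h) + (1-y)*log(1-h)), h = sigmoid(Xw)
def cost(x, y, w)
  h = mat_vec(x, w).map { |z| sigmoid(z) }
  total = h.zip(y).sum { |hi, yi| yi * Math.log(hi) + (1 - yi) * Math.log(1 - hi) }
  -total / y.size
end

# Gradient: (1/m) * X^T (h - y)
def gradient(x, y, w)
  h = mat_vec(x, w).map { |z| sigmoid(z) }
  residual = h.zip(y).map { |hi, yi| hi - yi }
  mat_t_vec(x, residual).map { |g| g / y.size }
end

# Plain batch gradient descent starting from w = 0.
def train(x, y, learning_rate: 0.5, epochs: 1000)
  w = Array.new(x.first.size, 0.0)
  epochs.times do
    g = gradient(x, y, w)
    w = w.zip(g).map { |wi, gi| wi - learning_rate * gi }
  end
  w
end

# Toy data; the first column is a constant bias feature.
x = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
y = [0, 0, 1, 1]
w = train(x, y)
predictions = mat_vec(x, w).map { |z| sigmoid(z) >= 0.5 ? 1 : 0 }
```

With a dense library, each of `mat_vec` and `mat_t_vec` would become a single matrix product.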
Cross-validation
• Estimates a predictive model's generalization performance by splitting the dataset and training multiple times
• Python: sklearn::cross_validation
• It just returns arrays of indices
• Integer array indexing (masking?)
• NArray has it; NMatrix does not
Image: https://pydata.tokyo/ipynb/tutorial-1/ml.html
Reference: http://watanabe-www.math.dis.titech.ac.jp/users/swatanab/cross-val.html
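Because the splitter only returns index arrays, selecting the training and test rows becomes an integer-array-indexing operation. This minimal sketch uses plain Ruby Arrays, where `Array#values_at` does the selection. The helper `k_fold_indices` is a hypothetical name, and its folds are contiguous and unshuffled.

```ruby
# Split indices 0...n into k contiguous folds and return
# [train_indices, test_indices] pairs. Fold sizes differ by at most 1.
def k_fold_indices(n, k)
  fold_sizes = Array.new(k) { |i| n / k + (i < n % k ? 1 : 0) }
  start = 0
  fold_sizes.map do |size|
    test = (start...start + size).to_a
    train = (0...n).to_a - test
    start += size
    [train, test]
  end
end

samples = %w[a b c d e]
labels  = [0, 1, 0, 1, 1]
folds = k_fold_indices(samples.size, 2)

# Integer array indexing: pick elements by an array of positions.
folds.each do |train, test|
  puts "train=#{samples.values_at(*train).inspect} test=#{samples.values_at(*test).inspect}"
end
```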
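Cross-validation
• Ruby: at present it is sometimes implemented inside the learning library itself
• e.g. Liblinear.cross_validation (liblinear-ruby)
• Python: scikit-learn::cross_validation
• The model (Logistic Regression) just receives training data and learns
• The cross-validation logic and the training logic are kept separate
I wrote a library that does cross-validation (https://github.com/himkt/rblearn)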
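In the scikit-learn style, the model only consumes training data and the cross-validation loop lives outside it. Ruby duck typing can express that split, as in the illustrative design below. It is not rblearn's actual interface. `cross_validate`, `MajorityClassifier`, and the `fit`/`predict` method names are assumptions borrowed from scikit-learn naming, and the majority-class model stands in for logistic regression.

```ruby
# Hypothetical stand-in model: predicts the most frequent training label.
class MajorityClassifier
  def fit(_x, y)
    @label = y.tally.max_by { |_label, count| count }.first
    self
  end

  def predict(x)
    Array.new(x.size, @label)
  end
end

# Knows nothing about the model beyond fit(x, y) and predict(x).
# Returns per-fold accuracy over contiguous, unshuffled folds.
def cross_validate(model_factory, x, y, k)
  n = y.size
  fold_sizes = Array.new(k) { |i| n / k + (i < n % k ? 1 : 0) }
  start = 0
  fold_sizes.map do |size|
    test = (start...start + size).to_a
    train = (0...n).to_a - test
    start += size
    model = model_factory.call.fit(x.values_at(*train), y.values_at(*train))
    predicted = model.predict(x.values_at(*test))
    predicted.zip(y.values_at(*test)).count { |pred, truth| pred == truth }.fdiv(size)
  end
end

x = [[0], [1], [2], [3], [4], [5]]
y = [1, 1, 1, 0, 1, 0]
scores = cross_validate(-> { MajorityClassifier.new }, x, y, 3)
```

Swapping in another model only requires a different factory lambda. The loop itself does not change.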
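Principal component analysis
• Libraries
  • Ruby: statsample
  • The data is large, so it has to be handled as a sparse matrix
  • Do I need to build a DataFrame?
• Solve it as an eigenvalue/eigenvector computation
• NArray, NMatrix (which must handle sparse matrices)
  • NArray: no sparse matrix support yet
  • NMatrix: eigenvalue/eigenvector computation for sparse matrices is not implemented -> put on hold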
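Framing PCA as an eigenvector problem can be illustrated on small dense data. This sketch finds the first principal component by power iteration on the covariance matrix, a technique chosen here for simplicity. It assumes the data fits in dense Arrays, so it sidesteps rather than solves the sparse case that blocked the exercise. Power iteration also assumes the starting vector is not orthogonal to the top eigenvector, and the sign of the result is arbitrary.

```ruby
# Subtract each column's mean.
def center(rows)
  means = rows.transpose.map { |col| col.sum / rows.size.to_f }
  rows.map { |row| row.zip(means).map { |v, m| v - m } }
end

# Sample covariance matrix (divides by n - 1).
def covariance(rows)
  cols = center(rows).transpose
  denom = (rows.size - 1).to_f
  cols.map { |a| cols.map { |b| a.zip(b).sum { |u, v| u * v } / denom } }
end

# Power iteration: repeatedly multiply by the covariance matrix and
# renormalize; converges to the eigenvector of the largest eigenvalue.
def first_component(rows, iterations: 100)
  cov = covariance(rows)
  vec = Array.new(cov.size, 1.0)
  iterations.times do
    vec = cov.map { |row| row.zip(vec).sum { |a, b| a * b } }
    norm = Math.sqrt(vec.sum { |v| v * v })
    vec = vec.map { |v| v / norm }
  end
  vec
end

# Points lying exactly on the line y = 2x.
data = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0], [4.0, 8.0]]
pc = first_component(data)
```

For this data, the component should point along (1, 2) / sqrt(5), which is the direction of the line.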
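word2vec
• Libraries
  • Python: gensim
  • Ruby: none (probably)
• Implemented with NArray
• Once a word2vec model has been trained, all that is needed are the word vectors
• What is actually needed is just cosine similarity between vectors (NArray or NMatrix features are sufficient)
• NArray worked better, so I used NArray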
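Once the vectors exist, a "most similar words" query needs nothing beyond cosine similarity, as the minimal sketch below shows. The three-dimensional vectors are made-up toy values, not trained word2vec output, and `most_similar` is a hypothetical helper name.

```ruby
# cos(a, b) = (a . b) / (|a| * |b|)
def cosine(a, b)
  dot = a.zip(b).sum { |x, y| x * y }
  dot / (Math.sqrt(a.sum { |x| x * x }) * Math.sqrt(b.sum { |x| x * x }))
end

# Rank every other word by cosine similarity to `word`, highest first.
def most_similar(word, vectors, top_n: 2)
  target = vectors.fetch(word)
  vectors
    .reject { |other, _| other == word }
    .map { |other, vec| [other, cosine(target, vec)] }
    .max_by(top_n) { |_, score| score }
end

# Toy vectors for illustration only.
vectors = {
  "king"  => [0.9, 0.8, 0.1],
  "queen" => [0.85, 0.82, 0.15],
  "apple" => [0.1, 0.2, 0.95],
  "man"   => [0.8, 0.3, 0.1]
}
neighbors = most_similar("king", vectors)
```

For a large vocabulary, normalizing all vectors once and taking a single matrix-vector product gives every score at once.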
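k-means, t-SNE
• Libraries
  • Python: sklearn.cluster (k-means), sklearn.manifold (t-SNE)
  • Ruby: AI4R (http://ai4r.org/)
    • No NArray/NMatrix support
    • Development seems to have stopped?
• Implemented with NArray alone (NArray worked better)
• Could be implemented without getting stuck anywhere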
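k-means (Lloyd's algorithm) really does need only basic array operations, as this minimal sketch in plain Ruby Arrays shows. It makes two assumptions. The initial centroids are simply the first k points, chosen for determinism rather than quality (k-means++ is the usual improvement). A fixed iteration count replaces a convergence check.

```ruby
def squared_distance(a, b)
  a.zip(b).sum { |x, y| (x - y)**2 }
end

def k_means(points, k, iterations: 10)
  centroids = points.first(k).map(&:dup)
  assignments = []
  iterations.times do
    # Assignment step: each point goes to its nearest centroid.
    assignments = points.map do |pt|
      (0...k).min_by { |c| squared_distance(pt, centroids[c]) }
    end
    # Update step: each centroid moves to the mean of its members.
    centroids = (0...k).map do |c|
      members = points.each_index.select { |i| assignments[i] == c }.map { |i| points[i] }
      next centroids[c] if members.empty? # keep an empty cluster's centroid in place
      members.transpose.map { |dim| dim.sum / members.size.to_f }
    end
  end
  [assignments, centroids]
end

points = [[0.0, 0.0], [10.0, 10.0], [0.5, 0.2], [9.5, 10.2], [0.1, 0.4], [10.3, 9.8]]
labels, centers = k_means(points, 2)
```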
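Summary
• I worked through the NLP 100 Knocks
• With NArray and NMatrix, most of the problems can be solved
• Things like PCA on large-scale data cannot be done
• Is a scikit-learn-like library needed?
• I read through the past logs (yesterday)
• It would be great to have one (I think Ruby is well suited to NLP)
• Libraries: I'd like one agreed-upon data structure, be it NArray, NMatrix, or Daru's Vector(?), to be usable consistently
• Things like cross-validation and feature extraction would be welcome
• Also wanted:
  • NArray: sparse matrix support
  • NMatrix: sparse matrix support in linalg
  • NArray, NMatrix: object serialization
  • NMatrix: integer array indexing
  • Feature Extractor, Feature Vectorizer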