Frontiers of Natural Language Processing
Mamoru Komachi
April 23, 2015
Slides used for my talk at Recruit Technologies Open Lab #01 (theme: natural language processing).
https://atnd.org/events/64383
Transcript
Frontiers of Natural Language Processing
April 21, 2015
Tokyo Metropolitan University, Faculty of System Design
Mamoru Komachi
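Self-introduction: Mamoru Komachi
• 2005.03: Graduated from the University of Tokyo, College of Arts and Sciences, Department of Basic Science (History and Philosophy of Science)
• 2010.03: Completed the doctoral program at the Nara Institute of Science and Technology (NAIST); Ph.D. (Engineering); specialty: natural language processing
• 2010.04 to 2013.03: Assistant Professor, NAIST (Yuji Matsumoto Laboratory)
• 2013.04 onward: Associate Professor, Tokyo Metropolitan University (Natural Language Processing Laboratory)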
Today's outline
• The impact of deep learning on natural language processing
• New developments in natural language processing
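Deep learning
• A framework for learning complex models with multi-layer neural networks
• Has achieved large performance gains on a variety of pattern recognition tasks; companies such as Google, Facebook, Microsoft, and Baidu are all racing to research it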
Lee et al., ICML 2009.
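Strengths of deep learning
• No feature engineering is required. Effective combinations of features can be learned automatically from unlabeled data.
  → Hyperparameters still exist
• Global representations (distributed representations) can be learned from data
  → Clustering, by contrast, learns local representations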
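The neural network breakthrough
• Hinton et al., A Fast Learning Algorithm for Deep Belief Nets, Neural Computation, 2006.
• Neural networks have existed since the 1950s, but their representational capacity was too high relative to the amount of data, so they tended to overfit.
  → Training one layer at a time and stacking multiple layers solved the overfitting problem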
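Image recognition and syntactic parsing with recursive neural networks
• Parsing Natural Scenes and Natural Language with Recursive Neural Networks, Socher et al., ICML 2011.
• Recognizes structure recursively from adjacent image regions or adjacent words
  → Integrated into the Stanford Parser (ACL 2013)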
Phrase-level sentiment polarity classification with recursive neural networks
• Recursive Deep Models for Semantic Compositionality Over a Sentiment Treebank, Socher et al., EMNLP 2013.
Socher et al. (NIPS 2011): computing the meaning of a sentence recursively from word vectors
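The recursive composition mentioned above can be sketched as follows. This is a minimal illustration under stated assumptions, not the model from the cited papers: the composition function `tanh(W [left; right] + b)`, the random untrained weights, the toy vocabulary, and the hand-written binary tree are all assumptions made for the example.

```python
import math
import random


def compose(left, right, weights, bias):
    """Merge two child vectors into one parent vector: tanh(W [left; right] + b)."""
    children = left + right  # concatenate the two child vectors
    return [
        math.tanh(sum(w * x for w, x in zip(row, children)) + b)
        for row, b in zip(weights, bias)
    ]


def encode(tree, embeddings, weights, bias):
    """Compute a vector for a binary tree given as nested 2-tuples of words."""
    if isinstance(tree, str):  # leaf: look up the word vector
        return embeddings[tree]
    left, right = tree
    return compose(
        encode(left, embeddings, weights, bias),
        encode(right, embeddings, weights, bias),
        weights,
        bias,
    )


DIM = 4
rng = random.Random(0)
# Hypothetical toy vocabulary with random (untrained) word vectors.
embeddings = {
    word: [rng.uniform(-1, 1) for _ in range(DIM)]
    for word in ["the", "movie", "was", "great"]
}
weights = [[rng.uniform(-0.5, 0.5) for _ in range(2 * DIM)] for _ in range(DIM)]
bias = [0.0] * DIM

# Assumed parse: ((the movie) (was great))
tree = (("the", "movie"), ("was", "great"))
sentence_vector = encode(tree, embeddings, weights, bias)
```

Every node, from single words up to the whole sentence, ends up with a vector of the same dimension, which is what lets a classifier be attached at any phrase level.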
Recurrent neural networks can take context of unbounded length into account
• Recurrent Neural Network based Language Model, Mikolov et al., InterSpeech 2010.
  → A model that predicts the current word from the history of preceding words
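As a rough sketch of how a recurrent language model carries the history forward, the code below folds each word into a hidden state and turns the final state into a distribution over the next word. The class name `TinyRNNLM`, the tanh hidden layer, the toy vocabulary, and the random untrained weights are assumptions for illustration; this is not the implementation from the cited paper, and no training is shown.

```python
import math
import random


def softmax(scores):
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


class TinyRNNLM:
    """Hypothetical, untrained Elman-style recurrent language model."""

    def __init__(self, vocab, hidden_size=8, seed=0):
        rng = random.Random(seed)
        self.vocab = list(vocab)
        self.index = {word: i for i, word in enumerate(self.vocab)}
        self.hidden_size = hidden_size

        def init(rows, cols):
            return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

        self.w_in = init(hidden_size, len(self.vocab))   # word -> hidden
        self.w_rec = init(hidden_size, hidden_size)      # hidden -> hidden
        self.w_out = init(len(self.vocab), hidden_size)  # hidden -> word scores

    def step(self, word, hidden):
        """Fold one word into the hidden state (one-hot input selects a column)."""
        column = self.index[word]
        return [
            math.tanh(
                self.w_in[r][column]
                + sum(w * h for w, h in zip(self.w_rec[r], hidden))
            )
            for r in range(self.hidden_size)
        ]

    def next_word_distribution(self, history):
        """Return P(next word | entire history) as a dict over the vocabulary."""
        hidden = [0.0] * self.hidden_size
        for word in history:
            hidden = self.step(word, hidden)
        scores = [sum(w * h for w, h in zip(row, hidden)) for row in self.w_out]
        return dict(zip(self.vocab, softmax(scores)))


model = TinyRNNLM(["<s>", "the", "cat", "sat", "</s>"])
distribution = model.next_word_distribution(["<s>", "the", "cat"])
```

Because the hidden state is updated once per word, the history can be arbitrarily long without changing the model size, in contrast to an n-gram model with a fixed window.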
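Machine translation can also be handled by deep learning, as a model that generates a sequence from a sequence
• Sequence to Sequence Learning with Neural Networks, Sutskever et al., NIPS 2014.
  → Uses two LSTMs (Long Short-Term Memory): one converts the input sequence into a fixed-length vector, and the other generates the output sequence from that vector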
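The encoder-decoder idea above can be sketched with two small LSTM cells. This is a simplified illustration under assumptions, not the architecture from the cited paper: the cells are single-layer, biases are omitted, weights are random and untrained, token ids and vocabulary sizes are made up, and decoding is greedy with the end-of-sequence id reused as the start symbol.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class LSTMCell:
    """Single LSTM cell (biases omitted for brevity; weights are untrained)."""

    def __init__(self, input_size, hidden_size, rng):
        self.hidden_size = hidden_size
        cols = input_size + hidden_size
        self.weights = {
            gate: [[rng.uniform(-0.1, 0.1) for _ in range(cols)]
                   for _ in range(hidden_size)]
            for gate in ("input", "forget", "output", "candidate")
        }

    def _apply(self, gate, xh, activation):
        return [activation(sum(w * v for w, v in zip(row, xh)))
                for row in self.weights[gate]]

    def step(self, x, state):
        h, c = state
        xh = x + h  # concatenate the input with the previous hidden state
        i = self._apply("input", xh, sigmoid)
        f = self._apply("forget", xh, sigmoid)
        o = self._apply("output", xh, sigmoid)
        g = self._apply("candidate", xh, math.tanh)
        new_c = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c, i, g)]
        new_h = [oj * math.tanh(cj) for oj, cj in zip(o, new_c)]
        return new_h, new_c


def one_hot(index, size):
    vector = [0.0] * size
    vector[index] = 1.0
    return vector


def encode(source_ids, encoder, vocab_size):
    """Read the whole input; the final (h, c) is the fixed-length summary."""
    state = ([0.0] * encoder.hidden_size, [0.0] * encoder.hidden_size)
    for token in source_ids:
        state = encoder.step(one_hot(token, vocab_size), state)
    return state


def decode(state, decoder, out_weights, vocab_size, eos_id, max_len=10):
    """Greedily emit tokens from the summary until EOS or max_len."""
    output, prev = [], eos_id  # assumption: EOS doubles as the start symbol
    for _ in range(max_len):
        state = decoder.step(one_hot(prev, vocab_size), state)
        scores = [sum(w * h for w, h in zip(row, state[0])) for row in out_weights]
        prev = max(range(vocab_size), key=lambda k: scores[k])
        if prev == eos_id:
            break
        output.append(prev)
    return output


rng = random.Random(0)
SRC_VOCAB, TGT_VOCAB, HIDDEN, EOS = 6, 5, 8, 0  # hypothetical sizes and ids
encoder = LSTMCell(SRC_VOCAB, HIDDEN, rng)
decoder = LSTMCell(TGT_VOCAB, HIDDEN, rng)
out_weights = [[rng.uniform(-0.1, 0.1) for _ in range(HIDDEN)]
               for _ in range(TGT_VOCAB)]

short_state = encode([1, 2], encoder, SRC_VOCAB)
long_state = encode([1, 2, 3, 4, 5], encoder, SRC_VOCAB)
translation = decode(long_state, decoder, out_weights, TGT_VOCAB, EOS)
```

Note that `short_state` and `long_state` have the same size even though the inputs differ in length; that fixed-length bottleneck is the point the slide makes.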
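Deep learning can handle text classification, and even programs, from characters alone
• Text Understanding from Scratch, Zhang and LeCun, arXiv 2015.
  → Learns Chinese and English text classifiers from characters alone
• Learning to Execute, Zaremba and Sutskever, arXiv 2015.
  → "Learns" to execute Python programs using only RNNs and LSTMs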
Deep learning naturally integrates multimodal inputs and outputs
• Generating captions from images alone
  http://deeplearning.cs.toronto.edu/i2t
  http://googleresearch.blogspot.jp/2014/11/a-picture-is-worth-thousand-coherent.html
Today's outline
• The impact of deep learning on natural language processing
• New developments in natural language processing
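Successes of natural language processing
• Discriminative models
  - Prepare a tagged corpus and apply supervised learning
  - Morphological analysis, named entity recognition, syntactic parsing, etc.
• Optimization
  - Formulate the task as ranking or combinatorial optimization
  - Web search, machine translation, document summarization, etc.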
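Worldwide research and development of component technologies for multilingual processing
• Shared tasks of CoNLL: Conference on Natural Language Learning (held every year)
  - 2012: multilingual discourse analysis
  - 2009: multilingual syntactic and semantic analysis
  - 2006, 2007: multilingual syntactic parsing
• Apply the same algorithm to multiple languages and look for analysis methods that do not depend on the language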
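Multilingual processing tools in Java (commercial use of the models requires license negotiation)
• Stanford CoreNLP (Java)
  - Morphological analysis, named entity recognition, syntactic parsing, and discourse analysis tools for English, Spanish, and Chinese
• Apache OpenNLP (Java)
  - Supports Danish, German, English, Spanish, Dutch, Portuguese, and Swedish
• LingPipe (Java)
  - Models for English (part-of-speech tagging, named entity extraction) and Chinese (word segmentation)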
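Tag specifications and corpora for multilingual morphological analysis
• A Universal Part-of-Speech Tagset, Petrov et al., LREC 2012.
  - 22 languages: English, Chinese, Japanese, Korean, etc.
  - For research and development of multilingual and cross-lingual syntactic parsing, parts of speech should first be assigned consistently
  - Japanese uses word segmentation that follows the word units of the Balanced Corpus of Contemporary Written Japanese (BCCWJ)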
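To illustrate what assigning parts of speech consistently across tagsets involves, the sketch below converts a few Penn Treebank tags to coarse universal tags. The table is a small, partial excerpt chosen for illustration, the function name `to_universal` is hypothetical, and falling back to `X` for any unlisted tag is a simplification made for this example.

```python
# Partial Penn Treebank -> universal tag table (illustrative excerpt only).
PTB_TO_UNIVERSAL = {
    "NN": "NOUN", "NNS": "NOUN", "NNP": "NOUN",
    "VB": "VERB", "VBD": "VERB", "VBZ": "VERB", "MD": "VERB",
    "JJ": "ADJ", "RB": "ADV", "PRP": "PRON", "DT": "DET",
    "IN": "ADP", "CD": "NUM", "CC": "CONJ", "RP": "PRT", ".": ".",
}


def to_universal(tagged_sentence, mapping=PTB_TO_UNIVERSAL):
    """Replace each fine-grained tag with its coarse tag (X if unlisted here)."""
    return [(word, mapping.get(tag, "X")) for word, tag in tagged_sentence]


tagged = [
    ("The", "DT"), ("cat", "NN"), ("sat", "VBD"),
    ("on", "IN"), ("the", "DT"), ("mat", "NN"), (".", "."),
]
universal = to_universal(tagged)
```

With one such table per language-specific tagset, corpora in different languages share a label inventory, so the same parser or tagger code can be trained and compared across them.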
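Tag specifications and corpora for multilingual dependency parsing
• Universal Dependency Annotation for Multilingual Parsing, McDonald et al., ACL 2013.
  - German, English, Swedish, Spanish, French, Korean, etc.
  - A draft proposal for Japanese Universal Dependencies, Kanayama et al., Annual Meeting of the Association for Natural Language Processing, 2015.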
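The component technologies of natural language processing have reached maturity
Component technology (in pipeline order) | Accuracy
Morphological analysis (word segmentation) | 99%
Syntactic parsing (dependency) | 90%
Semantic analysis (predicate-argument structure) | 60%
Discourse analysis (relations beyond the sentence) | 30%
• Measured as whole-sentence accuracy: about 50%
• Accuracy gains for individual component technologies have plateaued
  (1) Performance needs to be evaluated in the context of applications
  (2) Appeal on aspects other than accuracy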
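The gap between per-unit accuracy and whole-sentence accuracy noted above follows from how the two are counted. The sketch below computes both for dependency heads; the gold and predicted head indices are made-up toy data, and the function names are hypothetical.

```python
def token_accuracy(gold, predicted):
    """Fraction of individual head attachments that are correct."""
    pairs = [
        (g, p)
        for gold_sent, pred_sent in zip(gold, predicted)
        for g, p in zip(gold_sent, pred_sent)
    ]
    return sum(g == p for g, p in pairs) / len(pairs)


def sentence_accuracy(gold, predicted):
    """Fraction of sentences in which every attachment is correct."""
    return sum(g == p for g, p in zip(gold, predicted)) / len(gold)


# Made-up data: each inner list holds the head index of every token.
gold_heads = [[2, 0, 2], [2, 3, 0, 3]]
pred_heads = [[2, 0, 2], [2, 3, 0, 2]]  # one wrong head in the second sentence

per_token = token_accuracy(gold_heads, pred_heads)
per_sentence = sentence_accuracy(gold_heads, pred_heads)
```

Here a single wrong attachment leaves per-token accuracy at 6 out of 7 but halves the per-sentence score, which is why a high component accuracy can still mean many sentences contain an error.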
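English language analysis is also moving from newspaper articles to web text
• Workshop on Syntactic Analysis of Non-Canonical Language (SANCL 2012)
• Google English Web Treebank (2012)
  - Web text (blogs, newsgroups, email, reviews, Q&A) tagged with morphological and syntactic (dependency) information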
Web text analysis, too, is moving toward harder user-generated text
• Tweet NLP (English only) http://www.ark.cs.cmu.edu/TweetNLP/
  - Twokenizer: morphological analysis
  - Tweeboparser: dependency parsing
  - Tweebank: Twitter corpus
  - Twitter Word Clusters: word clusters
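From grammatically correct text written by native speakers to text written by language learners
• Since around 2011, shared tasks on grammatical error correction of English learners' essays have been held almost every year
  - Helping Our Own (HOO) 2011, 2012
  - CoNLL 2013, 2014
• Many English learner corpora have also been released
  - NUS Corpus of Learner English
  - Lang-8 Learner Corpora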
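From named entity recognition and word sense disambiguation to entity linking
• Named entity recognition
  - Identifies where named entities occur
• Entity linking
  - Disambiguates what a named entity refers to
  - Wikify (Wikification)
Example: "Prime Minister Abe acknowledged the factual error and expressed regret."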
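A toy sketch of the disambiguation step in entity linking: each mention has several candidate entries, and the one whose keywords overlap most with the surrounding words is chosen. The two-entry knowledge base, its keyword sets, and the function names are assumptions made for this example; real systems rank candidates drawn from a resource such as Wikipedia using much richer features.

```python
import re

# Hypothetical mini knowledge base: surface form -> candidate entries,
# each with a handful of hand-picked context keywords.
KNOWLEDGE_BASE = {
    "abe": {
        "Shinzo Abe": {"prime", "minister", "government", "politician"},
        "Hiroshi Abe": {"actor", "film", "drama", "model"},
    }
}


def tokenize(text):
    return re.findall(r"[a-z]+", text.lower())


def link_entities(text, kb=KNOWLEDGE_BASE):
    """Map each known mention to the candidate with the largest keyword overlap."""
    tokens = tokenize(text)
    context = set(tokens)
    links = {}
    for token in tokens:
        candidates = kb.get(token)
        if candidates:
            links[token] = max(
                candidates, key=lambda name: len(candidates[name] & context)
            )
    return links


politics = link_entities(
    "Prime Minister Abe acknowledged the factual error and expressed regret."
)
cinema = link_entities("Abe starred in a new film drama.")
```

Named entity recognition alone would only mark "Abe" as a person in both sentences; the linking step is what distinguishes the two referents.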
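Summary of today's talk
• The impact of deep learning on natural language processing
  - End-to-end, from syntactic parsing to semantic analysis
  - Multimodal (image, speech, language) processing
  - Text generation looks set to spread explosively
• New developments in natural language processing
  - Investigation and analysis of language-independent methods
  - The search for robust analysis methods
  - Old-yet-new problem settings brought about by the arrival of the web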