Frontiers of Natural Language Processing
Mamoru Komachi
April 23, 2015
Technology
Slides used for a talk at Recruit Technologies Open Lab #01 (theme: natural language processing).
https://atnd.org/events/64383
Transcript
New Developments in Natural Language Processing. April 21, 2015. Mamoru Komachi, Faculty of System Design, Tokyo Metropolitan University.
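Self-introduction: Mamoru Komachi
• 2005.03: Graduated from the University of Tokyo, College of Arts and Sciences, Department of Basic Science (History and Philosophy of Science)
• 2010.03: Completed the doctoral program at Nara Institute of Science and Technology (NAIST); Ph.D. (Engineering); specialty: natural language processing
• 2010.04 to 2013.03: Assistant Professor, NAIST (Yuji Matsumoto laboratory)
• 2013.04 onward: Associate Professor, Tokyo Metropolitan University (Natural Language Processing laboratory)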
Today's agenda
• The impact of deep learning on natural language processing
• New developments in natural language processing
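Deep learning
• A framework for learning complex models with multi-layer neural networks
• It has achieved large performance gains on a variety of pattern recognition tasks, and companies such as Google, Facebook, Microsoft, and Baidu are competing to do research on it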
Lee et al., ICML 2009.
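Strengths of deep learning
• Feature engineering is unnecessary: effective feature combinations can be learned automatically from unlabeled data.
  → Hyperparameters still exist
• Global representation learning (distributed representation) from data is possible
  → Clustering is local representation learning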
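The neural network breakthrough
• Hinton et al., A Fast Learning Algorithm for Deep Belief Nets, Neural Computation, 2006.
• Neural networks have existed since the 1950s, but their representational capacity was too high (relative to the amount of data), so they tended to overfit.
  → Training layer by layer and stacking multiple layers solved the overfitting problem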
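Image recognition and syntactic parsing with recursive neural networks
• Parsing Natural Scenes and Natural Language with Recursive Neural Networks, Socher et al., ICML 2011.
• Structure is recognized recursively from adjacent image regions and adjacent words
  → Integrated into the Stanford Parser (ACL 2013)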
Recursive neural networks also enable phrase-level sentiment polarity classification
• Recursive Deep Models for Semantic Compositionality Over a Sentiment Treebank, Socher et al., EMNLP 2013.
Socher et al. (NIPS 2011): the meaning of a sentence is computed recursively from word vectors
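As a rough illustration of recursive composition, the sketch below merges two child vectors into a parent vector with p = tanh(W[c1; c2] + b) and applies that step bottom-up over a binary tree. The 2-dimensional word vectors, the weights, the example tree, and the function names are all made up for illustration; this is not the model or the code of Socher et al.

```python
import math

def compose(left, right, W, b):
    """Combine two child vectors into one parent vector: tanh(W [left; right] + b)."""
    children = left + right  # list concatenation of the two child vectors
    return [math.tanh(sum(w * c for w, c in zip(row, children)) + bias)
            for row, bias in zip(W, b)]

def encode(tree, vectors, W, b):
    """Recursively encode a binary tree whose leaves are words."""
    if isinstance(tree, str):            # leaf: look up the word vector
        return vectors[tree]
    left, right = tree                   # internal node: encode both children first
    return compose(encode(left, vectors, W, b),
                   encode(right, vectors, W, b), W, b)

# Toy 2-dimensional word vectors and weights (hypothetical values).
vectors = {"very": [0.1, 0.3], "good": [0.8, -0.2], "movie": [0.0, 0.5]}
W = [[0.5, -0.1, 0.3, 0.2],
     [0.0, 0.4, -0.3, 0.1]]
b = [0.0, 0.1]

# ((very good) movie): the phrase vector for "very good" is built first,
# then combined with "movie" using the same weights.
sentence_vector = encode((("very", "good"), "movie"), vectors, W, b)
print(sentence_vector)
```

Because the same `W` and `b` are reused at every node, the sentence vector has the same dimensionality as a word vector regardless of sentence length.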
Recurrent neural networks can take context of unbounded length into account
• Recurrent Neural Network based Language Model, Mikolov et al., InterSpeech 2010.
  → A model that predicts the current word from the past history
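To make the idea of a recurrent neural network language model concrete, here is a minimal sketch of one forward step of a simple recurrent network: the hidden state is updated from the current word and the previous hidden state, and a softmax over the vocabulary gives a distribution for the next word. The tiny vocabulary, the weight values, and the function names are assumptions for illustration only; nothing is trained here.

```python
import math

def softmax(scores):
    """Turn arbitrary scores into a probability distribution."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def matvec(M, v):
    """Matrix-vector product for lists of lists."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def rnn_step(word_id, hidden, U, W, V):
    """Update the hidden state and return a distribution over the next word."""
    # For a one-hot input word, U times the input is just column word_id of U.
    from_input = [row[word_id] for row in U]
    from_history = matvec(W, hidden)
    new_hidden = [1.0 / (1.0 + math.exp(-(a + c)))   # sigmoid activation
                  for a, c in zip(from_input, from_history)]
    return new_hidden, softmax(matvec(V, new_hidden))

vocab = ["<s>", "the", "cat", "sat"]
U = [[0.1, 0.4, -0.2, 0.3], [0.0, -0.3, 0.5, 0.2]]       # hidden x vocab
W = [[0.2, -0.1], [0.3, 0.1]]                            # hidden x hidden
V = [[0.1, 0.2], [0.4, -0.3], [-0.2, 0.5], [0.3, 0.3]]   # vocab x hidden

hidden = [0.0, 0.0]
for word in ["<s>", "the", "cat"]:
    # The hidden state carries the whole history forward, one word at a time.
    hidden, next_word_probs = rnn_step(vocab.index(word), hidden, U, W, V)

print(dict(zip(vocab, next_word_probs)))
```

The history is never truncated to a fixed window: every earlier word can influence the prediction through `hidden`.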
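Machine translation can also be handled by deep learning, as a model that generates a sequence from a sequence
• Sequence to Sequence Learning with Neural Networks, Sutskever et al., NIPS 2014.
  → Uses two LSTMs (Long Short-Term Memory): the input sequence is converted into a fixed-length vector, and the output sequence is generated from that vector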
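From characters alone, deep learning can handle text classification and even programs
• Text Understanding from Scratch, Zhang and LeCun, arXiv 2015.
  → Learns Chinese and English text classifiers from characters alone
• Learning to Execute, Zaremba and Sutskever, arXiv 2015.
  → "Learns" and executes Python programs using only an RNN and LSTM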
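Learning from characters alone starts by turning raw text into numbers without any word segmentation. The sketch below shows one common way to do this: each character becomes a one-hot row over a fixed alphabet, and the text is truncated or padded to a fixed length. The alphabet, the length of 16, and the name `quantize` are assumptions chosen for illustration, not the exact settings of the paper cited above.

```python
import string

# Assumed toy alphabet: lowercase letters, digits, and a few symbols.
ALPHABET = string.ascii_lowercase + string.digits + " .,!?"
CHAR_TO_INDEX = {ch: i for i, ch in enumerate(ALPHABET)}

def quantize(text, length=16):
    """Return `length` one-hot rows, one per character.

    Characters outside the alphabet and padding positions become all-zero rows.
    """
    rows = []
    for ch in text.lower()[:length]:
        row = [0] * len(ALPHABET)
        index = CHAR_TO_INDEX.get(ch)
        if index is not None:
            row[index] = 1
        rows.append(row)
    while len(rows) < length:            # pad short texts with zero rows
        rows.append([0] * len(ALPHABET))
    return rows

matrix = quantize("Good movie!")
print(len(matrix), "rows x", len(matrix[0]), "columns")
```

A classifier would then take `matrix` as its input, so no tokenizer or dictionary is needed for any language whose characters are in the alphabet.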
Deep learning naturally integrates multimodal inputs and outputs
• Captions are generated from images alone
  http://deeplearning.cs.toronto.edu/i2t
  http://googleresearch.blogspot.jp/2014/11/a-picture-is-worth-thousand-coherent.html
Today's agenda
• The impact of deep learning on natural language processing
• New developments in natural language processing
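Successes of natural language processing
• Discriminative models
  - Prepare a tagged corpus and apply supervised learning
  - Morphological analysis, named entity recognition, syntactic parsing, etc.
• Optimization
  - Formulate the task as ranking or combinatorial optimization
  - Web search, machine translation, document summarization, etc.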
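Worldwide research and development of component technologies for multilingual processing
• Shared tasks of CoNLL, the Conference on Computational Natural Language Learning (held every year)
  - 2012: multilingual discourse analysis
  - 2009: multilingual syntactic and semantic analysis
  - 2006, 2007: multilingual syntactic parsing
• The same algorithm is applied to multiple languages in search of language-independent analysis methods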
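Multilingual processing tools in Java (licenses for commercial use of the models must be negotiated)
• Stanford CoreNLP (Java)
  - Morphological analysis, named entity recognition, syntactic parsing, and discourse analysis tools for English, Spanish, and Chinese
• Apache OpenNLP (Java)
  - Supports Danish, German, English, Spanish, Dutch, Portuguese, and Swedish
• LingPipe (Java)
  - Models for English (part-of-speech tagging, named entity extraction) and Chinese (word segmentation)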
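Tag specifications and corpora for multilingual morphological analysis
• A Universal Part-of-Speech Tagset, Petrov et al., LREC 2012.
  - 22 languages: English, Chinese, Japanese, Korean, etc.
  - Motivation: to support research on multilingual and cross-lingual parsing, parts of speech should first be annotated consistently
  - For Japanese, word segmentation follows the word units of the Balanced Corpus of Contemporary Written Japanese (BCCWJ)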
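The universal part-of-speech tagset works by mapping each treebank's fine-grained tags onto 12 coarse categories (NOUN, VERB, ADJ, ADV, PRON, DET, ADP, NUM, CONJ, PRT, ".", and X). The sketch below shows the idea for a handful of Penn Treebank tags. The table is deliberately partial, and falling back to X for unlisted tags is a simplification made here, not part of the published mapping.

```python
# Partial, illustrative mapping from Penn Treebank tags to universal tags.
PTB_TO_UNIVERSAL = {
    "NN": "NOUN", "NNS": "NOUN", "NNP": "NOUN", "NNPS": "NOUN",
    "VB": "VERB", "VBD": "VERB", "VBG": "VERB",
    "VBN": "VERB", "VBP": "VERB", "VBZ": "VERB",
    "JJ": "ADJ", "JJR": "ADJ", "JJS": "ADJ",
    "RB": "ADV", "RBR": "ADV", "RBS": "ADV",
    "PRP": "PRON", "DT": "DET", "IN": "ADP",
    "CD": "NUM", "CC": "CONJ", "RP": "PRT",
    ".": ".", ",": ".",
}

def to_universal(tagged_tokens):
    """Convert (word, fine-grained tag) pairs to (word, universal tag) pairs.

    Tags missing from the partial table fall back to "X" (assumption).
    """
    return [(word, PTB_TO_UNIVERSAL.get(tag, "X")) for word, tag in tagged_tokens]

tagged = [("The", "DT"), ("cat", "NN"), ("sat", "VBD"),
          ("on", "IN"), ("the", "DT"), ("mat", "NN"), (".", ".")]
universal = to_universal(tagged)
print(universal)
```

With one such table per treebank, taggers and parsers for different languages can be trained and compared on the same label set.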
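Tag specifications and corpora for multilingual dependency parsing
• Universal Dependency Annotation for Multilingual Parsing, McDonald et al., ACL 2013.
  - German, English, Swedish, Spanish, French, Korean, etc.
  - A draft proposal for Japanese Universal Dependencies, Kanayama et al., Annual Meeting of the Association for Natural Language Processing (Japan), 2015.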
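The component technologies of natural language processing have reached maturity
Component technology (in order of the analysis flow)    Accuracy
Morphological analysis (word segmentation)              99%
Syntactic parsing (dependency)                          90%
Semantic analysis (predicate-argument structure)        60%
Discourse analysis (relations beyond the sentence)      30%
• Measured as whole-sentence accuracy, the figure is about 50%
• Accuracy gains for individual component technologies have plateaued
  1. Performance needs to be evaluated in the context of applications
  2. Tools need to appeal on aspects other than accuracy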
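One way to see how a per-word accuracy of around 90% can correspond to a whole-sentence accuracy near 50% is to assume that errors on individual words are independent, so that a sentence of n words is fully correct with probability p^n. The independence assumption, the sentence lengths, and the link between the 90% and 50% figures are simplifications made here for illustration; the slide does not state how the sentence-level number was obtained.

```python
def sentence_accuracy(per_word_accuracy, sentence_length):
    """Probability that every word in a sentence is analyzed correctly,
    assuming independent errors (a simplifying assumption)."""
    return per_word_accuracy ** sentence_length

for length in (5, 7, 10, 20):
    print(length, round(sentence_accuracy(0.90, length), 3))
```

Under this assumption a 7-word sentence is entirely correct a little under half the time (0.9^7 ≈ 0.478), and the figure falls quickly as sentences get longer.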
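English language analysis is moving from newspaper articles to web text
• Workshop on Syntactic Analysis of Non-Canonical Language (SANCL 2012)
• Google English Web Treebank (2012)
  - Web text (blogs, newsgroups, email, reviews, QA) annotated with morphological and syntactic (dependency) information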
Web text analysis is moving on to harder, user-generated text
• Tweet NLP (English only) http://www.ark.cs.cmu.edu/TweetNLP/
  - Twokenizer: morphological analysis
  - Tweeboparser: dependency parsing
  - Tweebank: Twitter corpus
  - Twitter Word Clusters: word clusters
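From grammatically correct text written by native speakers to text written by language learners
• Since around 2011, shared tasks on grammatical error correction for English learners' essays have been held almost every year
  - Helping Our Own (HOO) 2011, 2012
  - CoNLL 2013, 2014
• Many English learner corpora have also been released
  - NUS Corpus of Learner English
  - Lang-8 Learner Corpora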
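From named entity recognition and word sense disambiguation to entity linking
• Named entity recognition
  - Identify where the named entities are in the text
• Entity linking
  - Disambiguate what each named entity refers to
  - Wikify (Wikification)
Example: "Prime Minister Abe acknowledged the factual error and expressed regret."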
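The difference between finding a mention and entity linking can be sketched with a toy linker: given a mention that has already been recognized, it picks the knowledge-base entry whose keywords overlap most with the surrounding words. The tiny knowledge base, its keyword sets, the entity identifiers, and the overlap heuristic are all hypothetical stand-ins; real systems link against a resource such as Wikipedia with far richer features.

```python
# Hypothetical knowledge base: candidate entities per mention string.
KNOWLEDGE_BASE = {
    "Abe": [
        {"entity": "Shinzo_Abe",
         "keywords": {"prime", "minister", "politician", "cabinet"}},
        {"entity": "Hiroshi_Abe",
         "keywords": {"actor", "film", "drama"}},
    ],
}

def link_entity(mention, context_words, knowledge_base=KNOWLEDGE_BASE):
    """Return the candidate whose keywords overlap most with the context.

    Returns None when the mention has no candidates. Ties go to the
    first candidate listed.
    """
    candidates = knowledge_base.get(mention, [])
    if not candidates:
        return None
    context = {w.lower().strip(".,") for w in context_words}
    best = max(candidates, key=lambda c: len(c["keywords"] & context))
    return best["entity"]

sentence = "Prime Minister Abe acknowledged the factual error and expressed regret."
linked = link_entity("Abe", sentence.split())
print(linked)
```

Named entity recognition alone would stop at marking "Abe" as a person; the linking step is what decides which person is meant.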
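Today's summary
• The impact of deep learning on language processing
  - End-to-end processing, from syntactic parsing through semantic analysis
  - Multimodal (image, speech, language) processing
  - Text generation looks set to spread explosively
• New developments in natural language processing
  - Study of language-independent methods and analysis of […]
  - The search for robust analysis methods
  - Old-yet-new settings brought about by the emergence of the web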