
Frontiers of Natural Language Processing


Slides from my talk at Recruit Technologies Open Lab #01 (theme: natural language processing).

https://atnd.org/events/64383

Mamoru Komachi

April 23, 2015


Transcript

  1. Self-introduction: Mamoru Komachi
     - 2005.03 B.A., College of Arts and Sciences (History and Philosophy of Science), University of Tokyo
     - 2010.03 Ph.D. (Engineering), NAIST; specialty: natural language processing
     - 2010.04-2013.03 Assistant Professor, NAIST (Yuji Matsumoto Lab)
     - 2013.04- Associate Professor, Tokyo Metropolitan University (NLP Lab)
  2. The neural network breakthrough
     - Hinton et al., A Fast Learning Algorithm for Deep Belief Nets, Neural Computation, 2006.
     - Neural networks had existed since the 1950s, but their representational power was so high (relative to the amount of data) that they overfit easily.
     → Training layer by layer and then stacking multiple layers solved the overfitting problem!
  3. Image recognition and syntactic parsing with recursive neural networks
     - Parsing Natural Scenes and Natural Language with Recursive Neural Networks, Socher et al., ICML 2011.
     - Recognizes structure recursively from adjacent image regions / adjacent words
     → Integrated into the Stanford Parser (ACL 2013)
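
A minimal sketch of the recursive idea from the slide: adjacent units (image regions or words) are merged bottom-up by one shared network, producing a tree over the input. Greedy merging and all weights below are toy assumptions, not the paper's trained model.

```python
import numpy as np

rng = np.random.default_rng(2)
D = 4
W = rng.normal(0, 0.3, (D, 2 * D))   # shared composition weights
score_w = rng.normal(0, 0.3, D)      # scores how "good" a merge is

def compose(a, b):
    """Merge two adjacent node vectors into one parent vector."""
    return np.tanh(W @ np.concatenate([a, b]))

nodes = [rng.normal(0, 1, D) for _ in range(4)]  # leaf vectors (regions/words)
while len(nodes) > 1:
    # greedily merge the adjacent pair with the highest score
    best = max(range(len(nodes) - 1),
               key=lambda i: score_w @ compose(nodes[i], nodes[i + 1]))
    nodes[best:best + 2] = [compose(nodes[best], nodes[best + 1])]

assert nodes[0].shape == (D,)        # one vector for the whole input
```

Because the same composition function is applied at every merge, the same machinery works for image regions and for words.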
  4. Recurrent neural networks can take unbounded context into account
     - Recurrent Neural Network based Language Model, Mikolov et al., Interspeech 2010.
     → A model that predicts the current word while taking the past history into account
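
The point of the slide can be sketched in a few lines: the hidden state carries the entire past history, so predicting the current word can, in principle, use unbounded context. Vocabulary size, dimensions, and weights here are toy assumptions, not the Mikolov et al. setup.

```python
import numpy as np

rng = np.random.default_rng(0)
V, H = 5, 8                       # toy vocabulary size, hidden size
Wxh = rng.normal(0, 0.1, (H, V))  # input -> hidden
Whh = rng.normal(0, 0.1, (H, H))  # hidden -> hidden (the recurrence)
Why = rng.normal(0, 0.1, (V, H))  # hidden -> output

def step(h, word_id):
    """Consume one word, update the hidden state, predict the next word."""
    x = np.zeros(V); x[word_id] = 1.0   # one-hot input word
    h = np.tanh(Wxh @ x + Whh @ h)      # all past history is folded into h
    logits = Why @ h
    p = np.exp(logits - logits.max())
    return h, p / p.sum()               # distribution over the next word

h = np.zeros(H)
for w in [0, 3, 1]:                     # feed a toy word sequence
    h, p = step(h, w)
assert np.isclose(p.sum(), 1.0)         # a valid probability distribution
```

Unlike an n-gram model, nothing here truncates the history: the only bottleneck is what the fixed-size hidden state can remember.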
  5. Machine translation, too, can be handled by deep learning as a sequence-to-sequence generation model
     - Sequence to Sequence Learning with Neural Networks, Sutskever et al., NIPS 2014.
     → Uses two LSTMs (Long Short-Term Memory networks): the input sequence is encoded into a fixed-length vector, and the output sequence is generated from that vector
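
The two-network structure on the slide can be sketched as follows: one recurrent net compresses the input sequence into a single fixed-length vector, and a second one generates the output sequence starting from that vector. Plain tanh cells stand in for the two LSTMs, and all weights and symbols are toy assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
V, H = 6, 4
We, Ue = rng.normal(0, .3, (H, V)), rng.normal(0, .3, (H, H))  # encoder
Wd, Ud = rng.normal(0, .3, (H, V)), rng.normal(0, .3, (H, H))  # decoder
Wo = rng.normal(0, .3, (V, H))                                 # output layer

def onehot(i):
    x = np.zeros(V); x[i] = 1.0
    return x

# Encoder: fold the whole input sequence into one vector.
h = np.zeros(H)
for w in [2, 5, 1]:
    h = np.tanh(We @ onehot(w) + Ue @ h)
context = h                        # the fixed-length representation

# Decoder: start from the context vector and generate greedily.
out, w = [], 0                     # 0 = assumed start symbol
for _ in range(3):
    h = np.tanh(Wd @ onehot(w) + Ud @ h)
    w = int(np.argmax(Wo @ h))     # pick the most likely next word
    out.append(w)
assert len(out) == 3 and context.shape == (H,)
```

The design choice worth noticing is that the decoder's initial state is the encoder's final state: the entire source sentence has to pass through that one fixed-length vector.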
  6. From characters alone, deep learning can do text classification and even programs
     - Text Understanding from Scratch, Zhang and LeCun, arXiv 2015.
     → Learns Chinese and English text classifiers from characters alone
     - Learning to Execute, Zaremba and Sutskever, arXiv 2015.
     → "Learns" to execute Python programs using nothing but an RNN with LSTM units
  7. Worldwide R&D on component technologies for multilingual processing
     - Shared tasks of CoNLL (Conference on Natural Language Learning), held every year
       - 2012: multilingual discourse (coreference) analysis
       - 2009: multilingual syntactic and semantic analysis
       - 2006, 2007: multilingual syntactic (dependency) parsing
     - Apply the same algorithm to multiple languages, in search of language-independent analysis methods
  8. Multilingual processing tools in Java (commercial model licensing requires negotiation)
     - Stanford CoreNLP (Java)
       - Morphological analysis, named entity recognition, syntactic parsing, and discourse (coreference) tools for English, Spanish, and Chinese
     - Apache OpenNLP (Java)
       - Supports Danish, German, English, Spanish, Dutch, Portuguese, and Swedish
     - LingPipe (Java)
       - Models for English (POS tagging, named entity extraction) and Chinese (word segmentation)
  9. Tag specifications and corpora for multilingual morphological analysis
     - A Universal Part-of-Speech Tagset, Petrov et al., LREC 2012.
       - 22 languages: English, Chinese, Japanese, Korean, etc.
       - To enable multilingual and cross-lingual parsing R&D, POS tags were first made consistent across languages
       - For Japanese, word segmentation follows the short unit words of the Balanced Corpus of Contemporary Written Japanese (BCCWJ)
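
The core of the universal tagset idea is a mapping from each treebank's fine-grained tags onto a small set of coarse, cross-lingual categories. Below is a tiny illustrative excerpt for a few Penn Treebank tags; the coarse categories are real universal tags, but the table is hand-picked for illustration, not the official mapping file.

```python
# Hand-picked excerpt of a fine-to-coarse POS mapping (illustration only).
PTB_TO_UNIVERSAL = {
    "NN": "NOUN", "NNS": "NOUN", "NNP": "NOUN",
    "VB": "VERB", "VBD": "VERB", "VBZ": "VERB",
    "JJ": "ADJ", "RB": "ADV", "DT": "DET", "IN": "ADP",
}

def to_universal(tagged):
    """Map (word, fine_tag) pairs onto the coarse universal tagset."""
    return [(w, PTB_TO_UNIVERSAL.get(t, "X")) for w, t in tagged]

print(to_universal([("dogs", "NNS"), ("ran", "VBD"), ("fast", "RB")]))
# [('dogs', 'NOUN'), ('ran', 'VERB'), ('fast', 'ADV')]
```

Once every language's tags are projected into the same coarse space like this, a tagger or parser trained in one language can be evaluated and transferred across languages.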
  10. Tag specifications and corpora for multilingual dependency parsing
     - Universal Dependency Annotation for Multilingual Parsing, McDonald et al., ACL 2013.
       - German, English, Swedish, Spanish, French, Korean, etc.
       - A proposal for Japanese Universal Dependencies: Kanayama et al., Annual Meeting of the Association for Natural Language Processing, 2015.
  11. The component technologies of NLP have reached maturity
     - Component technologies and their accuracy, in pipeline order:
       - Morphological analysis (word segmentation): 99%
       - Syntactic parsing (dependencies): 90%
       - Semantic analysis (predicate-argument structure): 60%
       - Discourse analysis (relations beyond the sentence): 30%
     - Measured as whole-sentence accuracy, this comes to only about 50%
     - Accuracy gains from individual components have plateaued:
       (1) performance needs to be evaluated in terms of the target application;
       (2) components must appeal on dimensions other than accuracy
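
The gap between per-token and whole-sentence accuracy is worth making concrete: errors compound multiplicatively, since one mistake anywhere breaks the sentence. This is a back-of-the-envelope illustration only; the sentence length is an assumption, not a figure from the slide.

```python
# Per-token accuracy compounds over the length of a sentence.
per_token = 0.99        # e.g. word segmentation accuracy from the slide
length = 70             # assumed tokens/characters in a longish sentence
sentence_acc = per_token ** length
print(f"{sentence_acc:.2f}")  # prints 0.49, roughly the 50% on the slide
```

The same arithmetic explains why a 90% dependency parser gets far fewer than 90% of sentences entirely right.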
  12. English language analysis, too, is moving from newspaper articles to web text
     - Workshop on Syntactic Analysis on Non-Canonical Language (SANCL 2012)
     - Google English Web Treebank (2012)
       - Web text (blogs, newsgroups, email, reviews, QA) annotated with morphological and syntactic (dependency) information
  13. From web text to even harder user-generated text
     - Tweet NLP (English only) http://www.ark.cs.cmu.edu/TweetNLP/
       - Twokenizer: tokenization / morphological analysis
       - TweeboParser: dependency parsing
       - Tweebank: Twitter corpus
       - Twitter Word Clusters: word clusters
  14. From grammatically correct text written by native speakers to language learners' text
     - Since around 2011, shared tasks on correcting grammatical errors in English learners' essays have been held almost every year
       - Helping Our Own (HOO) 2011, 2012
       - CoNLL 2013, 2014
     - Many English learner corpora have also been released
       - NUS Corpus of Learner English
       - Lang-8 Learner Corpora
  15. From named entity recognition and word sense disambiguation to entity linking
     - Named entity recognition: identifies the spans of named entities
     - Entity linking: disambiguates which entity a named entity refers to
       - Wikify (Wikification)
     - Example: "Prime Minister Abe acknowledged the factual error and expressed regret."
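
The two-stage pipeline on the slide can be sketched as: first find the mention span (NER), then disambiguate which real-world entity it refers to. The knowledge base, the NER heuristic, and the linking rule below are all toy assumptions; real systems link against Wikipedia (hence "Wikification").

```python
# Toy knowledge base: surface form -> candidate entities (made-up labels).
KB = {
    "Abe": ["Shinzo_Abe_(politician)", "Abe_(given_name)"],
}

def recognize(tokens):
    """Toy NER: treat capitalized tokens as entity mentions."""
    return [t for t in tokens if t[:1].isupper()]

def link(mention, context):
    """Toy linker: prefer the candidate whose label shares a context word."""
    for cand in KB.get(mention, []):
        if any(w.lower() in cand.lower() for w in context):
            return cand
    return KB.get(mention, ["NIL"])[0]  # fall back to the top candidate

tokens = ["Prime", "Minister", "Abe", "admitted", "the", "error"]
mentions = recognize(tokens)
print(link("Abe", ["politician"]))  # Shinzo_Abe_(politician)
```

NER alone only tells us that "Abe" is an entity; the linking step is what resolves the slide's example sentence to the specific person.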
  16. Summary of today's talk
     - The impact of deep learning on language processing
       - End-to-end, from syntactic parsing through semantic analysis
       - Multimodal (image, speech, language) processing
       - Text generation looks set to spread explosively
     - New directions for natural language processing
       - Investigating language-independent methods and analyzing their problems
       - Searching for robust analysis methods
       - Old-but-new problem settings brought about by the emergence of the web