

Detecting Learner Errors in the Choice of Content Words Using Compositional Distributional Semantics

Slides introducing the paper "Detecting Learner Errors in the Choice of Content Words Using Compositional Distributional Semantics" by Ekaterina Kochmar and Ted Briscoe, presented at the COLING 2014 reading group at Tokyo Metropolitan University, Japan.

Mamoru Komachi

November 06, 2014



Transcript

  1. Detecting Learner Errors in the Choice of Content Words Using Compositional Distributional Semantics

    Ekaterina Kochmar and Ted Briscoe, COLING 2014
    ※All figures and tables in these slides are taken from the paper.
    Mamoru Komachi <[email protected]>, COLING 2014 reading group @ Tokyo Metropolitan University, 2014/11/06
  2. Detecting Learner Errors in the Choice of Adjective-Noun Combinations Using Compositional Distributional Semantics
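  3. English learners often get adjective-noun combinations wrong

    (* marks the learner's choice; the word after the slash is the correction.)
    | They confuse adjectives whose meanings are similar
      { *big/large quantity
      { *big/great importance
    | They misuse common adjectives
      { *big/long history
      { *greatest/highest revenue
      { *bigger/wider variety
      { *large/broad knowledge
    | They use a non-standard adjective
      { *classic/classical dance
      { *economical/economic crisis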
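  4. Detecting errors in content words is more challenging than detecting errors in function words

    | Function words (prepositions, articles) form a closed set, so the confusion set and the error distribution can be learned from learner text (Rozovskaya and Roth, ACL 2011)
    | Content words form an open set, so it is hard to build a confusion set (and therefore the task cannot be cast as multi-class classification)
    → Learner writing contains low-frequency words (even where it is grammatically and semantically correct), so methods based on co-occurrence alone do not work well. cf. appropriate concern vs. proper concern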
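  5. Content-word errors are the third most frequent error type, but because the task is hard they have received little attention

    | Content words form an open set, so it is hard to build a confusion set
      1. Tasks where the error location is already identified and a correction is selected from candidates: candidates come from paraphrases based on synonyms, homophones, and the learner's native language (L1) (Dahlmeier and Ng, EMNLP 2011)
      2. Tasks where the error location is also unknown: learner writing contains low-frequency words (even where it is grammatically and semantically correct), so methods based on co-occurrence alone do not work well. cf. appropriate concern vs. proper concern
    | → The latter task requires overcoming data sparseness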
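  6. Annotating English learners' adjective-noun errors

    | Out-of-context (OOC) and in-context (IC) annotation are kept separate. "classic dance" is acceptable in some contexts, but in most cases it can be treated as an error.
      { They performed a classic Ceilidh dance.
      { I have tried a rock’n’roll dance and a *classic/classical dance already.
    | Whether to ignore context can be decided by each system or application, so context information is worth annotating.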
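  7. Annotation of the CLC-FCE dataset

    | 61 frequently confused adjectives were extracted
    | 798 adjective-noun combinations were annotated for errors (by experts):
      { correct/incorrect
      { which word is wrong (adjective, noun, or both)
      { error type (synonym, similar form, other)
      { correction (the combination as it would read if corrected)
    ※LB = lower bound; UB = upper bound
    Inter-annotator agreement: κ = 0.65 (OOC), quite good; κ = 0.49 (IC), reasonably good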
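The κ values above are chance-corrected agreement scores. Assuming they are Cohen's kappa between two annotators (the slide does not name the exact variant), a minimal pure-Python sketch of the computation follows; the label lists are made-up toy data, not the CLC-FCE annotations.

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    # Observed agreement: fraction of items on which both annotators agree.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement, from each annotator's own label distribution.
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    p_e = sum(freq_a[k] * freq_b[k] for k in freq_a) / (n * n)
    if p_e == 1.0:
        return 1.0  # degenerate case: both annotators use a single, identical label
    return (p_o - p_e) / (1 - p_e)

# Hypothetical OOC judgements for six adjective-noun pairs.
ann1 = ["correct", "incorrect", "incorrect", "correct", "correct", "incorrect"]
ann2 = ["correct", "incorrect", "correct", "correct", "correct", "incorrect"]
print(round(cohens_kappa(ann1, ann2), 3))  # prints 0.667
```

Here the annotators agree on 5 of 6 items (p_o ≈ 0.833) but would agree on half of them by chance (p_e = 0.5), so κ ≈ 0.667.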
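  8. Semantic models for error detection

    Semantic models (Mitchell and Lapata, ACL 2008; Baroni and Zamparelli, EMNLP 2010). The Mitchell and Lapata (2008) models are symmetric, so they are unsuitable for directional semantic relations such as adjective-noun; hence the adjective-specific linear maps of Baroni and Zamparelli (2010). Below, $u$ is the adjective vector and $v$ the noun vector.
    | Additive (add) model: $p_i = u_i + v_i$
    | Multiplicative (mult) model: $p_i = u_i \cdot v_i$
    | Adjective-specific linear maps (alm): $p = Bv$, where $B$ is the adjective's matrix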
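The three composition functions can be sketched in a few lines of pure Python. The vectors and matrix below are hypothetical 3-dimensional toy values, not the paper's 300-dimensional space; the final assertions illustrate why add and mult are symmetric (swapping the two words gives the same result) while alm treats the adjective as an operator on the noun.

```python
def add(u, v):
    """Additive model: p_i = u_i + v_i."""
    return [ui + vi for ui, vi in zip(u, v)]

def mult(u, v):
    """Multiplicative model: p_i = u_i * v_i."""
    return [ui * vi for ui, vi in zip(u, v)]

def alm(B, v):
    """Adjective-specific linear map: p = Bv, with B given as a list of rows."""
    return [sum(b_ij * v_j for b_ij, v_j in zip(row, v)) for row in B]

# Hypothetical toy vectors.
u_adj = [1.0, 0.0, 2.0]   # adjective vector (add/mult)
v_noun = [0.5, 1.0, 1.0]  # noun vector
B_adj = [[1.0, 0.0, 0.0],
         [0.0, 2.0, 0.0],
         [1.0, 0.0, 1.0]]  # adjective matrix (alm)

print(add(u_adj, v_noun))   # [1.5, 1.0, 3.0]
print(mult(u_adj, v_noun))  # [0.5, 0.0, 2.0]
print(alm(B_adj, v_noun))   # [0.5, 2.0, 1.5]

# add and mult ignore word order, so they cannot encode which word modifies which.
assert add(u_adj, v_noun) == add(v_noun, u_adj)
assert mult(u_adj, v_noun) == mult(v_noun, u_adj)
```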
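  9. Building the co-occurrence matrices for the adjective-specific linear maps $p = Bv$

    | Nouns are vectors based on the distributional hypothesis; adjectives are weight matrices that transform noun vectors; adjective-noun composition is defined as matrix-vector multiplication
    | Context elements: the 10,000 most frequent nouns, adjectives, and verbs in the corpus (the BNC, parsed with RASP)
    | Targets: 8,000 nouns (N), 4,000 adjectives (A), and 64,000 adjective-noun pairs (AN)
    | Matrix entries are local mutual information (LMI) values
    | Dimensionality is reduced to 300 with SVD
    | The matrix weights are learned separately for each adjective with multivariate PLS regression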
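The LMI weighting on this slide can be sketched in pure Python, assuming the standard definition LMI(t, c) = f(t, c) · log(P(t, c) / (P(t) P(c))), with all probabilities estimated from the same count table. The function name and counts are hypothetical toy values. The SVD to 300 dimensions and the per-adjective PLS regression are not shown; they would normally be done with a numerical library (for example scikit-learn's `TruncatedSVD` and `PLSRegression`, an assumption, since the slides do not say which tools were used).

```python
import math

def local_mutual_information(counts):
    """Weight raw co-occurrence counts with local mutual information.

    counts maps (target, context) -> co-occurrence frequency.
    LMI(t, c) = f(t, c) * log(P(t, c) / (P(t) * P(c))).
    """
    total = sum(counts.values())
    target_freq, context_freq = {}, {}
    for (t, c), f in counts.items():
        target_freq[t] = target_freq.get(t, 0) + f
        context_freq[c] = context_freq.get(c, 0) + f
    lmi = {}
    for (t, c), f in counts.items():
        if f == 0:
            lmi[(t, c)] = 0.0  # log(0) is undefined; an unseen pair gets weight 0
            continue
        pmi = math.log(f * total / (target_freq[t] * context_freq[c]))
        lmi[(t, c)] = f * pmi
    return lmi

# Hypothetical toy counts.
counts = {("dance", "perform"): 4, ("dance", "economy"): 1,
          ("crisis", "perform"): 1, ("crisis", "economy"): 4}
lmi = local_mutual_information(counts)
print(round(lmi[("dance", "perform")], 3))  # 1.88
```

Multiplying PMI by the raw frequency damps the high PMI scores that rare pairs tend to get. Variants that clip negative scores to zero are also common; this sketch keeps the signed value.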
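  10. Semantic features (1): reimplementation of previous work

    1. Vector length
    2. Cosine similarity to the input noun
    3. Cosine similarity to the input adjective
    4. Density of the 10 nearest neighbours of the output
    5. Density of the 10 nearest neighbours of the input
    6. Ranked density of the neighbours
    7. Number of neighbours
    8. Overlap with the 10 nearest neighbours of the input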
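Several of these features can be sketched in pure Python. The slide only names the features, so the following are assumptions: "density" is the mean cosine between a vector and its k nearest neighbours, the "input" is taken to be the noun vector, overlap is the shared fraction of the two k-neighbour sets, and `space` is a hypothetical dict mapping words to vectors. Ranked density and number of neighbours are omitted because their definitions are not given here.

```python
import math

def norm(x):
    """Euclidean vector length."""
    return math.sqrt(sum(xi * xi for xi in x))

def cosine(x, y):
    nx, ny = norm(x), norm(y)
    if nx == 0 or ny == 0:
        return 0.0
    return sum(a * b for a, b in zip(x, y)) / (nx * ny)

def nearest(x, space, k=10):
    """The k (word, cosine) pairs in `space` most similar to x."""
    scored = [(w, cosine(x, vec)) for w, vec in space.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]

def density(x, space, k=10):
    """Assumed definition: mean cosine between x and its k nearest neighbours."""
    neigh = nearest(x, space, k)
    return sum(s for _, s in neigh) / len(neigh) if neigh else 0.0

def semantic_features(adj_vec, noun_vec, an_vec, space, k=10):
    """A subset of the slide's features for one composed adjective-noun vector."""
    out_neigh = {w for w, _ in nearest(an_vec, space, k)}
    in_neigh = {w for w, _ in nearest(noun_vec, space, k)}  # "input" = noun (assumption)
    return {
        "vector_length": norm(an_vec),
        "cos_to_noun": cosine(an_vec, noun_vec),
        "cos_to_adj": cosine(an_vec, adj_vec),
        "density_output": density(an_vec, space, k),
        "density_input": density(noun_vec, space, k),
        "overlap_with_input": len(out_neigh & in_neigh) / k,
    }

# Hypothetical 2-d space and vectors.
space = {"w1": [1.0, 0.0], "w2": [0.0, 1.0], "w3": [1.0, 1.0], "w4": [2.0, 1.0]}
feats = semantic_features([1.0, 0.0], [0.0, 1.0], [1.0, 1.0], space, k=2)
print(feats["overlap_with_input"])  # 0.5
```

In this sketch the composed vector would come from one of the composition models on slide 8; the features describe where it lands in the semantic space, which is what lets a classifier separate plausible from anomalous combinations.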
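  11. Co-occurrence methods are weak on low-frequency words, whereas semantic models plus machine learning are robust

    | Baseline
      { Within a confusion set built from the target word's WordNet synonyms and hypernyms, flag an error if some word has a higher co-occurrence score (normalized PMI) in the BNC than the original word.
    | Proposed method
      { An NLTK decision tree; the features are the semantic features above plus the words themselves.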
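The co-occurrence baseline can be sketched as follows, using the standard normalized PMI, NPMI(x, y) = log(p(x, y) / (p(x) p(y))) / -log p(x, y). The confusion set is passed in as a plain list (a hypothetical stand-in for the WordNet synonyms and hypernyms), and the counts are made-up toy numbers rather than BNC statistics.

```python
import math

def npmi(pair_count, x_count, y_count, total):
    """Normalized PMI: log(p(x,y) / (p(x) p(y))) / -log p(x,y), in [-1, 1]."""
    if pair_count == 0:
        return -1.0  # the pair never co-occurs: the conventional lower bound
    p_xy = pair_count / total
    p_x = x_count / total
    p_y = y_count / total
    if p_xy == 1.0:
        return 1.0  # -log p(x,y) would be zero
    return math.log(p_xy / (p_x * p_y)) / -math.log(p_xy)

def baseline_flags_error(adj, noun, confusion_set, pair_counts, word_counts, total):
    """Flag (adj, noun) if some alternative adjective has higher NPMI with the noun."""
    def score(a):
        return npmi(pair_counts.get((a, noun), 0),
                    word_counts.get(a, 0), word_counts[noun], total)
    original = score(adj)
    return any(score(alt) > original for alt in confusion_set if alt != adj)

# Hypothetical toy counts.
total = 1000
word_counts = {"classic": 20, "classical": 30, "dance": 40}
pair_counts = {("classic", "dance"): 2, ("classical", "dance"): 10}

print(baseline_flags_error("classic", "dance", ["classical"], pair_counts, word_counts, total))  # True
print(baseline_flags_error("classical", "dance", ["classic"], pair_counts, word_counts, total))  # False
```

Because the decision rests only on co-occurrence counts, a correct but rare combination with few or no corpus hits scores low and tends to be flagged, which is the weakness on low-frequency words that the slide title points to.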
  12. References (semantic models)

    | Mitchell and Lapata. Vector-based models of semantic composition. ACL 2008.
    | Baroni and Zamparelli. Nouns are vectors, adjectives are matrices: Representing adjective-noun constructions in semantic space. EMNLP 2010.
    | Lazaridou et al. Fish transporters and miracle homes: How compositional distributional semantics can help NP parsing. EMNLP 2013.
    | Kochmar and Briscoe. Capturing Anomalies in the Choice of Content Words in Compositional Distributional Semantic Space. RANLP 2013.
  13. References (ESL error correction)

    | Rozovskaya and Roth. Algorithm Selection and Model Adaptation for ESL Correction Tasks. ACL 2011.
    | Yannakoudakis et al. A New Dataset and Method for Automatically Grading ESOL Texts. ACL 2011.
    | Dahlmeier and Ng. Correcting Semantic Collocation Errors with L1-induced Paraphrases. EMNLP 2011.