Go Climb a Dependency Tree and Correct the Grammatical Errors

Slides presented at the EMNLP 2014 reading group at Tokyo Metropolitan University on the following paper:

Longkai Zhang and Houfeng Wang, Go Climb a Dependency Tree and Correct the Grammatical Errors, EMNLP 2014

Mamoru Komachi

December 03, 2014

Transcript

  1. Go Climb a Dependency Tree and Correct the Grammatical Errors

    Longkai Zhang and Houfeng Wang, EMNLP 2014 (all figures and tables in these slides are taken from the paper). Mamoru Komachi, [email protected], EMNLP 2014 Reading Group @ Tokyo Metropolitan University, 2014/12/04
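  2. In grammatical error correction for English learners, each model works independently

    - CoNLL-2013 shared task
      - Targets five error types: prepositions, determiners, verb form, subject-verb agreement, and noun number
      - NUCLE corpus (an English learner corpus built by the National University of Singapore)
      - m2scorer (a scorer developed at the National University of Singapore that reports precision, recall, and F-score)
    - Errors are corrected by a separate statistical model for each word, and global interactions are not considered
      → In practice, errors are sometimes correlated with one another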
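To make precision, recall, and F-score concrete, here is a simplified sketch that compares system edits with gold edits as exact-match sets. It is a stand-in under stated assumptions, not m2scorer: the real scorer uses the MaxMatch algorithm to align system edits with the gold annotation, which this sketch does not attempt. The `Edit` tuple and the example edits are hypothetical.

```python
from typing import NamedTuple


class Edit(NamedTuple):
    """Hypothetical edit format: replace tokens [start, end) with `correction`."""
    start: int
    end: int
    correction: str


def prf(system: set[Edit], gold: set[Edit], beta: float = 1.0) -> tuple[float, float, float]:
    """Precision, recall and F_beta over exact-match edit sets (simplified, not MaxMatch)."""
    tp = len(system & gold)
    # Convention chosen for this sketch: an empty set counts as fully precise / fully recalled.
    p = tp / len(system) if system else 1.0
    r = tp / len(gold) if gold else 1.0
    if p + r == 0:
        return p, r, 0.0
    b2 = beta * beta
    return p, r, (1 + b2) * p * r / (b2 * p + r)


# Tokens of "The books of that boy is on the desk ." are indexed from 0, so "is" is [5, 6).
gold = {Edit(5, 6, "are"), Edit(2, 3, "a")}
system = {Edit(5, 6, "are"), Edit(0, 1, "")}
print(prf(system, gold))  # (0.5, 0.5, 0.5)
```

The `beta` parameter trades recall against precision; `beta=1` gives the usual balanced F-score, and values below 1 weight precision more heavily.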
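  3. Models that account for global interactions are computationally expensive

    - Correction using noisy contexts that still contain errors
      - Higher-order sequence labeling (Gamon, 2011)
      - Noisy channel model (Park and Levy, 2011)
      - Beam search (Dahlmeier and Ng, 2012)
    - Global optimization based on integer linear programming (ILP)
      - Wu and Ng (2013): assign a confidence to each error type and optimize globally
      - Rozovskaya and Roth (2013): correct structures rather than individual words
    → In the worst case, the computation takes exponential time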
  4. "that boy is on the ..." looks correct, but the actual subject is "books"

    Figure 1: dependency trees for (a) "The books of that boy is on the desk ." and (b) "The books of that boy are on the desk ."
    → A classifier that looks only at local information cannot correct this error
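The failure case in Figure 1 can be reproduced with a hand-written dependency parse. In the sketch below, the head indices and `nsubj` labels are hypothetical annotations typed in by hand (no parser is run), and `PLURAL_NOUNS` is a toy lexicon. The word just left of "is" is "boy", while the verb's subject in the tree is "books".

```python
from typing import Optional

# Hypothetical hand-written parse of Figure 1(a); heads are 1-based, 0 marks the root.
TOKENS = ["The", "books", "of", "that", "boy", "is", "on", "the", "desk", "."]
HEADS = [2, 6, 2, 5, 3, 0, 6, 9, 7, 6]
LABELS = ["det", "nsubj", "prep", "det", "pobj", "root", "prep", "det", "pobj", "punct"]
PLURAL_NOUNS = {"books"}  # toy number lexicon (assumption)


def tree_subject(verb: int) -> Optional[int]:
    """0-based index of the nsubj dependent of the verb at 0-based index `verb`."""
    for i, (head, label) in enumerate(zip(HEADS, LABELS)):
        if head == verb + 1 and label == "nsubj":
            return i
    return None


def be_form(noun: str) -> str:
    """Form of 'be' that agrees with `noun` (toy rule based on PLURAL_NOUNS)."""
    return "are" if noun in PLURAL_NOUNS else "is"


verb = TOKENS.index("is")
left_neighbor = TOKENS[verb - 1]     # what a left-context window sees
subject = TOKENS[tree_subject(verb)]  # what the dependency tree sees
print(left_neighbor, be_form(left_neighbor))  # boy is
print(subject, be_form(subject))              # books are
```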
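  5. The TreeNode language model (TNLM) efficiently corrects errors at the sentence level

    Proposed method: a combination of two kinds of models (with the emphasis on the first)
    - general model
      - Verb form, noun number, and subject-verb agreement errors
      - Errors are corrected with a language model based on dependency structure
        → long-distance dependencies are also modeled directly
      - Considers all errors globally over the whole sentence, yet corrects them efficiently (in polynomial time)
    - special model
      - Determiner and preposition errors
      - Errors are corrected by a maximum entropy classifier trained with supervised learning on a large raw corpus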
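  6. The general model treats a sentence as a dependency tree and scores it

    - Scoring function of a CFG-based language model: a sentence is scored by the product of the scores of its L production rules r_i (here, a dependency grammar is assumed)
      * To allow values other than probabilities, a general score function is used instead of P(r_i)

      $\mathrm{score}(s) = \prod_{i=0}^{L} \mathrm{score}(r_i)$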
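Since score(s) is a product over rules, a practical implementation would usually accumulate logarithms instead, so that long sentences do not underflow to zero. A minimal sketch, assuming every rule score is positive (the logarithm requires it; the slide only says the scores need not be probabilities). The rule scores are made up.

```python
import math


def log_sentence_score(rule_scores: list[float]) -> float:
    """log score(s) = sum_i log score(r_i); every rule score must be positive."""
    if any(s <= 0.0 for s in rule_scores):
        raise ValueError("every rule score must be positive to work in log space")
    return sum(math.log(s) for s in rule_scores)


def sentence_score(rule_scores: list[float]) -> float:
    """score(s) = prod_i score(r_i), recovered from the log-space sum."""
    return math.exp(log_sentence_score(rule_scores))


# Made-up rule scores; values above 1 would also be allowed, since score is not a probability.
print(sentence_score([0.5, 0.2, 0.1]))   # approximately 0.01
print(log_sentence_score([1e-200] * 3))  # finite, whereas the plain product underflows to 0.0
```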
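  7. Scoring a sentence with the TreeNode LM using partial word sequences

    - Partial word sequences for "The car of my parents is damaged by the storm." (Figure 2, Table 3)
    - When a node n of the dependency tree has K modifiers as child nodes C_1, ..., C_K, the word sequence Seq(n) = [C_1, ..., n, ..., C_K] is scored with a language model (a trigram model).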
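A minimal sketch of Seq(n) and its trigram score, under several assumptions that do not come from the paper: the parse is hand-written with 1-based head indices, each Seq(n) is padded with `<s>`/`</s>` markers, and `ToyTrigramLM` is a hypothetical add-one-smoothed model trained on three made-up Seq-style sequences in place of a real trigram LM.

```python
import math
from collections import Counter

# Hypothetical hand-written parse; heads are 1-based, 0 marks the root.
TOKENS = ["The", "car", "of", "my", "parents", "is", "damaged", "by", "the", "storm", "."]
HEADS = [2, 7, 2, 5, 3, 7, 0, 7, 10, 8, 7]


def seq(n: int) -> list[str]:
    """Seq(n): node n (0-based index) together with its children, in sentence order."""
    members = [i for i, head in enumerate(HEADS) if head == n + 1] + [n]
    return [TOKENS[i] for i in sorted(members)]


class ToyTrigramLM:
    """Add-one smoothed trigram LM; a toy stand-in for a real trigram model."""

    def __init__(self, sequences: list[list[str]]):
        self.tri: Counter = Counter()
        self.ctx: Counter = Counter()
        self.vocab: set[str] = set()
        for s in sequences:
            words = self._pad(s)
            self.vocab.update(words)
            for a, b, c in zip(words, words[1:], words[2:]):
                self.tri[a, b, c] += 1
                self.ctx[a, b] += 1

    @staticmethod
    def _pad(words: list[str]) -> list[str]:
        return ["<s>", "<s>"] + [w.lower() for w in words] + ["</s>"]

    def logprob(self, words: list[str]) -> float:
        """Sum of log P(c | a, b) over the padded sequence."""
        padded = self._pad(words)
        v = len(self.vocab)
        return sum(
            math.log((self.tri[a, b, c] + 1) / (self.ctx[a, b] + v))
            for a, b, c in zip(padded, padded[1:], padded[2:])
        )


# Made-up Seq-style training sequences, not real parsed text.
lm = ToyTrigramLM([["car", "is", "damaged"], ["cars", "are", "damaged"], ["car", "is", "broken"]])

for n in range(len(TOKENS)):
    if any(head == n + 1 for head in HEADS):  # only nodes that have children
        print(TOKENS[n], seq(n), round(lm.logprob(seq(n)), 2))
```

With this toy model, Seq("damaged") = [car, is, damaged, by, .] scores higher than the same sequence with "are", because the head-centered sequence places "car" directly next to its verb even though "parents" intervenes in the surface sentence.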
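  8. TNLM automatically generates correction candidates and selects among them with the language model

    - TNLM decoding, where c_{i,j} is the j-th correction candidate of child node C_i and n_i is the i-th candidate of node n:

      $\mathrm{score}(seq) = \mathrm{TNLM}(seq) \prod_{i=1}^{K} C_i.\mathrm{scores}[j_i]$, where $seq = [c_{1,j_1}, \ldots, n_i, \ldots, c_{K,j_K}]$
      $n.\mathrm{scores}[i] = \max_{j_1, \ldots, j_K} \mathrm{score}(seq)$

    - The search can be done efficiently with the Viterbi algorithm
    - A parameter that increases the weight of the original (pre-correction) word is introduced (tuned on the dev set)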
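The decoding step can be sketched as an exact Viterbi search over Seq(n): each position holds (candidate word, log score) pairs, and with a trigram LM the DP state is the last two chosen words. All names here are hypothetical, and the `TABLE` log-probabilities, the hand-written Seq(is) = [books, is, on, .], and the candidate lists are made up. The `keep_bonus` term is one assumed way to realize the slide's parameter that favors the original word (an additive log-space bonus); the paper tunes its actual parameter on the dev set.

```python
from typing import Callable

Candidate = tuple[str, float]          # (word, log-space candidate score)
LM = Callable[[str, str, str], float]  # lm(u, v, w) = log P(w | u, v)


def viterbi(positions: list[list[Candidate]], lm: LM) -> tuple[float, list[str]]:
    """Exact max over one candidate per position of log LM(seq) + sum of candidate scores.

    The DP state is the last two chosen words, so the cost is O(len(positions) * m**3)
    for at most m candidates per position.
    """
    beams: dict[tuple[str, str], tuple[float, list[str]]] = {("<s>", "<s>"): (0.0, [])}
    for cands in positions:
        nxt: dict[tuple[str, str], tuple[float, list[str]]] = {}
        for (u, v), (score, words) in beams.items():
            for w, cand_score in cands:
                s = score + lm(u, v, w) + cand_score
                if (v, w) not in nxt or s > nxt[v, w][0]:
                    nxt[v, w] = (s, words + [w])
        beams = nxt
    # Close every surviving hypothesis with the end-of-sequence token.
    return max(((sc + lm(u, v, "</s>"), ws) for (u, v), (sc, ws) in beams.items()),
               key=lambda t: t[0])


def candidates(original: str, alternatives: list[str], keep_bonus: float = 0.2) -> list[Candidate]:
    """Hypothetical candidate generator: the original word gets an additive bonus."""
    return [(original, keep_bonus)] + [(w, 0.0) for w in alternatives]


def node_scores(before, head_cands, after, lm):
    """n.scores[i]: best (score, sequence) with the head fixed to its i-th candidate n_i."""
    return {head: viterbi(before + [[(head, head_score)]] + after, lm)
            for head, head_score in head_cands}


# Toy trigram log-probabilities (made up); unseen trigrams get a flat penalty.
TABLE = {
    ("<s>", "<s>", "books"): -1.0, ("<s>", "<s>", "book"): -1.0,
    ("<s>", "books", "are"): -0.4, ("<s>", "book", "is"): -0.5,
    ("books", "are", "on"): -0.5, ("book", "is", "on"): -0.5,
    ("are", "on", "."): -0.5, ("is", "on", "."): -0.5, ("on", ".", "</s>"): -0.1,
}


def toy_lm(u: str, v: str, w: str) -> float:
    return TABLE.get((u, v, w), -5.0)


scores = node_scores(
    before=[candidates("books", ["book"])],
    head_cands=candidates("is", ["are"]),
    after=[candidates("on", []), candidates(".", [])],
    lm=toy_lm,
)
best_head = max(scores, key=lambda h: scores[h][0])
print(best_head, scores[best_head][1])  # are ['books', 'are', 'on', '.']
```

Fixing the head to each of its candidates and keeping the best child assignment for each mirrors n.scores[i]; a parent node would then use these values as the candidate scores of its child n.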
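  9. TNLM is more efficient than Yoshimoto et al. (2013)

    - Yoshimoto et al. (2013)
      - Language model: Treelet language model (Pauls and Klein, 2012)
        → Its results on the CoNLL-2013 shared task were underwhelming.
      - The treelet language model is based on complex contexts, so it suffers badly from data sparseness
    - TreeNode Language Model (proposed method)
      - Considers only useful contexts, so it avoids the data sparseness problem
      - Training takes only about as long as training an ordinary language model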
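  10. The special model is applied after the general model in a pipeline

    - Handles determiner errors (the only candidates are a/the/NONE) and preposition errors (the only candidates are in/for/to/of/on)
      → substitution errors + deletion errors
    - Trained on the parsed Gigaword corpus rather than on the NUCLE corpus, which is limited in size
    - Supervised learning (Naive Bayes, averaged perceptron, SVM, and maximum entropy were compared, and ME, which gave the best accuracy, was chosen)
      - See Tables 5 and 6 of the paper for the feature templates used
    - It can work on input that the general model has already cleaned up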
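A maximum entropy classifier is equivalent to multinomial logistic regression over feature functions. A minimal sketch for the a/the/NONE decision, trained with plain batch gradient ascent on a handful of made-up examples: the feature names (`next=...`, `next_pos=...`, `prev=...`) are hypothetical stand-ins for the templates in Tables 5 and 6 of the paper, and nothing here is trained on Gigaword.

```python
import math
from collections import defaultdict

LABELS = ["a", "the", "NONE"]  # determiner candidates from the slide; NONE = no article


def probs(w: dict, features: set[str]) -> list[float]:
    """p(y | x) proportional to exp(sum of w[f, y] over the active features f)."""
    scores = [sum(w[f, y] for f in features) for y in LABELS]
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]  # shift by the max for numerical stability
    z = sum(exps)
    return [e / z for e in exps]


def train(data, epochs: int = 300, lr: float = 0.3, l2: float = 0.01) -> defaultdict:
    """Batch gradient ascent on the L2-regularized conditional log-likelihood."""
    w: defaultdict = defaultdict(float)
    for _ in range(epochs):
        grad: defaultdict = defaultdict(float)
        for features, gold in data:
            p = probs(w, features)
            for f in features:
                for k, y in enumerate(LABELS):
                    grad[f, y] += float(y == gold) - p[k]  # observed minus expected count
        for key in grad:
            w[key] += lr * (grad[key] - l2 * w[key])
    return w


def predict(w, features: set[str]) -> str:
    p = probs(w, features)
    return LABELS[p.index(max(p))]


# Made-up training examples (hypothetical features and labels).
DATA = [
    ({"next=book", "next_pos=NN", "prev=read"}, "a"),
    ({"next=sun", "next_pos=NN", "prev=see"}, "the"),
    ({"next=desk", "next_pos=NN", "prev=on"}, "the"),
    ({"next=water", "next_pos=NN", "prev=drink"}, "NONE"),
    ({"next=books", "next_pos=NNS", "prev=read"}, "NONE"),
]

w = train(DATA)
print(predict(w, {"next=desk", "next_pos=NN", "prev=on"}))  # the
```

Predicting NONE where an article is present corresponds to a deletion correction, and predicting a different article to a substitution. The same setup would apply to prepositions with `LABELS` set to in/for/to/of/on.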
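  11. Results: TNLM is more effective for error correction than an ordinary language model

    - Table 7: TNLM corresponds to the general model; +Det and +Prep are the respective special models; +Det+Prep is the final system.
    - Table 8: TNLM compared with an ordinary trigram model, using the same training and test data for each.
      → Although an "ordinary" language model these days would not be a trigram model...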
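  12. Results: TNLM achieves the current best accuracy on the CoNLL-2013 data

    - Table 11: comparison with other work using the same CoNLL-2013 data; the proposed method is state-of-the-art.
      → Yoshimoto et al. (2013) report results for the Treelet LM alone, so in terms of the lower table on this slide, would it fall between the ordinary language model and TNLM?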
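  13. Summary: a simple TreeNode language model is effective for error correction

    - We proposed the TreeNode Language Model, a language model based on dependency trees.
      - It is robust, because it ignores unnecessary context.
      - It can account for global interactions among multiple errors.
      - Training costs the same as for existing language models, and decoding runs in polynomial time with the Viterbi algorithm.
      - It is compatible with existing language models and easy to implement.
    - Using the TreeNode Language Model, we achieved state-of-the-art performance on the CoNLL-2013 shared task on grammatical error correction for English learners.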