Slide 1

Slide 1 text

Large-Scale Syntactic Language Modeling with Treelets
Adam Pauls and Dan Klein (ACL 2012)
Presented by Mamoru Komachi at the 4th Saisentan NLP Benkyokai (Cutting-Edge NLP Study Group), 2012/08/31

Slide 2

Slide 2 text

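N-gram LMs cannot handle long-distance dependencies
- Advantages of n-gram language models
  - Easy to implement
  - Robust
  - Work fine at large scale
- Disadvantage of n-gram language models
  - Cannot capture long-distance dependencies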

Slide 3

Slide 3 text

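We propose a syntactic language model that scales and is easy to implement
- A generative language model
  - Conditioned on treelets over parse trees
  - Scales to large data
  - About as easy to implement as an n-gram language model
- No need for heavily parallel computation (simple is best)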

Slide 4

Slide 4 text

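We evaluate the treelet language model on a range of tasks and settings
- Comparison with prior work
  - Outperforms n-gram language models and other generative syntactic language models based on tree structure
  - Although built from positive examples only, performs on par with discriminative models specialized for each task
- Comparison across tasks
  - Perplexity
  - Classifying pseudo-negative vs. positive examples
  - Classifying machine translation output vs. reference translations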

Slide 5

Slide 5 text

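2. The treelet language model generates a tree from left to right
- Assigning a probability to a tree:
  T = constituency tree
  r = P → C1 … Cd (a rule in T)
  P = parent symbol of rule r
  C = children
  h = conditioning context (already generated)
  For a PCFG, h = P: the model conditions only on the parent node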
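Written out with the symbols defined on this slide, the probability of a tree factorizes over its rules. The following is a sketch assembled from those definitions (the product over rules r in T is assumed), with the PCFG special case shown for comparison:

```latex
p(T) = \prod_{r \in T} p(C_1^d \mid h),
\qquad
\text{PCFG: } p(T) = \prod_{r \in T} p(C_1^d \mid P)
```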

Slide 6

Slide 6 text

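The nice properties of the n-gram language models everyone knows
- An n-gram language model
  - Assigns probabilities based on observed n-gram frequencies
  - Backs off to a smaller context for n-grams that were never observed
- This kind of smoothing is robust and easy to implement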
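As an illustration of backing off to a smaller context, here is a minimal sketch. It uses Stupid Backoff-style scoring with a fixed factor of 0.4, which is an assumption chosen for brevity and not the smoothing method used in the paper; the scores it returns are not normalized probabilities. The function names are hypothetical.

```python
from collections import Counter


def count_ngrams(tokens, max_order):
    """Count all n-grams of order 1..max_order in a token list."""
    counts = Counter()
    for n in range(1, max_order + 1):
        for i in range(len(tokens) - n + 1):
            counts[tuple(tokens[i:i + n])] += 1
    return counts


def backoff_score(word, context, counts, total, alpha=0.4):
    """Relative frequency if the n-gram was seen, else retry with a shorter context.

    alpha is an assumed fixed back-off penalty (Stupid Backoff style).
    """
    context = tuple(context)
    penalty = 1.0
    while True:
        if not context:
            # Unigram level: relative frequency over all tokens.
            return penalty * counts[(word,)] / total
        ngram = context + (word,)
        if counts[ngram] > 0:
            return penalty * counts[ngram] / counts[context]
        # Unseen n-gram: drop the oldest context word and apply the penalty.
        context = context[1:]
        penalty *= alpha


tokens = "the cat sat on the mat".split()
counts = count_ngrams(tokens, max_order=3)
print(backoff_score("cat", ["the"], counts, len(tokens)))
print(backoff_score("mat", ["sat", "the"], counts, len(tokens)))
```

The second call never observed "sat the mat", so it falls back to the bigram "the mat" and pays the penalty once.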

Slide 7

Slide 7 text

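The treelet language model is smoothed in the same way
- How should it be smoothed? Use the parent (P) as context, in addition to r', the rule that generated the parent.
- We want the context to be long so that dependencies are captured, but also short so that probabilities can be estimated from data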

Slide 8

Slide 8 text

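Three benefits of conditioning on the rule that generated the parent
- Taking both P and its parent P' into account is more predictive than P alone (Johnson, 1998).
- Differences due to position can be taken into account. E.g. subject and object noun phrases have different distributions (Klein and Manning, 2003) → the position of a noun relative to the verb is a good indicator for telling them apart
- When generating a word, the model can condition on the siblings of the preterminal → this makes it possible to take the verb's case frame into account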

Slide 9

Slide 9 text

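2.1 The treelet language model also considers local context
- Ideally we would replace the n-gram language model with a top-down, PCFG-style model... → in practice, n-gram information is important for prediction
- To take left-to-right context into account, the previous two words are added to the context → this captures collocations and lexical correlations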

Slide 10

Slide 10 text

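2.2 Back-off is handled separately for terminals and nonterminals
- Nonterminals →
- Terminals →

$p(C_1^d \mid P, P', r') \to p(C_1^d \mid P, P') \to p(C_1^d \mid P)$

$\lambda \prod_{i=1}^{d} p(C_i \mid C_{i-3}^{i-1}, P) + (1 - \lambda) \prod_{i=1}^{d} p(C_i \mid C_{i-3}^{i-1})$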
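The interpolated expression on this slide can be sketched as below: one product over the children conditions each child on its previous three siblings and the parent, the other drops the parent, and the two are mixed with weight λ. The two probability functions are hypothetical callables supplied by the caller, the default λ of 0.5 is an assumption, and start/stop padding of the child sequence is ignored.

```python
def rule_backoff_prob(children, parent, p_with_parent, p_without_parent,
                      lam=0.5, order=4):
    """lam * prod_i p(C_i | C_{i-3}^{i-1}, P) + (1 - lam) * prod_i p(C_i | C_{i-3}^{i-1}).

    p_with_parent(child, history, parent) and p_without_parent(child, history)
    are assumed to return conditional probabilities.
    """
    with_parent = 1.0
    without_parent = 1.0
    for i, child in enumerate(children):
        # History is the previous (order - 1) siblings, fewer near the start.
        history = tuple(children[max(0, i - (order - 1)):i])
        with_parent *= p_with_parent(child, history, parent)
        without_parent *= p_without_parent(child, history)
    return lam * with_parent + (1.0 - lam) * without_parent


# Toy usage with constant stand-in distributions.
prob = rule_backoff_prob(
    ["NP", "VP"], "S",
    p_with_parent=lambda c, h, p: 0.5,
    p_without_parent=lambda c, h: 0.25,
)
print(prob)
```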

Slide 11

Slide 11 text

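2.3 The treelet LM needs only four probability distributions!
- As when building an n-gram language model, count treelet frequencies and compute the probability distributions listed below.
- Where do the counts come from?
  - The hand-annotated Penn Treebank corpus → high quality but small
  - Parse trees generated automatically with a parser → they contain errors, but this is not a problem because the interest lies in the generated sentences, not in the parse trees themselves

$p(C_1^d \mid P, P', r')$
$p(w \mid P, R, r', w_{-1}, w_{-2})$
$p(C_i \mid C_{i-n+1}^{i-1}, P)$
$p(C_i \mid C_{i-n+1}^{i-1})$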
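Collecting counts for the first distribution, p(C_1^d | P, P', r'), can be sketched as a single traversal of a parse tree. The nested-tuple tree encoding and the function names are assumptions made for this example; lexical (preterminal) events are skipped to keep it short.

```python
from collections import Counter


def rule_of(node):
    """Return the rule P -> C1 ... Cd at a node as (P, (C1, ..., Cd))."""
    label, children = node[0], node[1:]
    return (label, tuple(c if isinstance(c, str) else c[0] for c in children))


def count_rule_events(tree):
    """Count (children, P, P', r') events for every nonterminal rule in the tree."""
    counts = Counter()

    def visit(node, parent_label, parent_rule):
        children = node[1:]
        if all(isinstance(c, str) for c in children):
            return  # preterminal -> word: lexical event, not counted here
        rule = rule_of(node)
        counts[(rule[1], node[0], parent_label, parent_rule)] += 1
        for child in children:
            if not isinstance(child, str):
                # For the child, this node is P' and this rule is r'.
                visit(child, node[0], rule)

    visit(tree, None, None)
    return counts


tree = ("S",
        ("NP", ("DT", "the"), ("NN", "dog")),
        ("VP", ("VBD", "barked")))
for event, count in count_rule_events(tree).items():
    print(count, event)
```

Relative frequencies of these events, smoothed by backing off to shorter contexts, would then give the estimates.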

Slide 12

Slide 12 text

3 Nine transformation rules for capturing dependencies
- The 9 rules are applied in order to transform the constituency tree

Slide 13

Slide 13 text

Temporal NPs
- Following Klein and Manning (2003), mark temporal expressions, e.g. today → NNT, months → NNTS

Slide 14

Slide 14 text

Head Annotations
- When the head is a closed-class word, mark the nonterminal and preterminal symbols, e.g. VP-VB^S

Slide 15

Slide 15 text

NP Flattening
- Delete NPs that are dominated by another NP, unless the child NPs are coordinated or in apposition, e.g. an NP modified by a prepositional phrase

Slide 16

Slide 16 text

Number Annotations
- Numbers are split into five classes: CD-YR, CD-NM, CD-DC, CD-MX, CD-AL. E.g. CD-DC: numbers containing a decimal point

Slide 17

Slide 17 text

SBAR Flattening
- Delete S nodes that are dominated by an SBAR → an S directly under an SBAR has an unusual distribution when it lacks a subject or an object

Slide 18

Slide 18 text

VP Flattening
- Delete VPs that directly dominate another VP, e.g. will be going → only the VP of going is kept
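A minimal sketch of this transformation follows, assuming that "deleting" a VP means splicing its children into its own parent, that trees are nested tuples, and that VP labels are the plain string "VP" (all assumptions for this example, not details taken from the paper).

```python
def is_vp_over_vp(node):
    """True if node is a VP with at least one VP child."""
    if isinstance(node, str) or node[0] != "VP":
        return False
    return any(not isinstance(c, str) and c[0] == "VP" for c in node[1:])


def flatten_vp(node):
    """Remove every non-root VP that directly dominates a VP, bottom-up."""
    if isinstance(node, str):
        return node
    new_children = []
    for child in node[1:]:
        child = flatten_vp(child)
        if is_vp_over_vp(child):
            # Splice the deleted VP's children into the current node.
            new_children.extend(child[1:])
        else:
            new_children.append(child)
    return (node[0],) + tuple(new_children)


tree = ("S",
        ("NP", ("PRP", "she")),
        ("VP", ("MD", "will"),
               ("VP", ("VB", "be"),
                      ("VP", ("VBG", "going")))))
print(flatten_vp(tree))
```

On the slide's example the auxiliaries "will" and "be" end up as direct children of S, and only the VP headed by "going" remains.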

Slide 19

Slide 19 text

Gapped Sentence Annotation
- Following Collins (1999) and Klein and Manning (2003), mark nodes that have an empty subject. → Such nodes appear because the trees come from automatic parsing

Slide 20

Slide 20 text

Parent Annotation
- VPs are annotated with their parent symbol. → This makes it possible to condition on the grandparent. A VP directly under an SBAR often has no object
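Parent annotation of VPs can be sketched in a few lines. The nested-tuple tree encoding, the `^` separator, and the function name are assumptions for this example.

```python
def annotate_vp_parent(node, parent_label=None):
    """Relabel each VP as VP^<parent label>; other labels are left unchanged."""
    if isinstance(node, str):
        return node
    label = node[0]
    if label == "VP" and parent_label is not None:
        new_label = f"{label}^{parent_label}"
    else:
        new_label = label
    # Children see the original, unannotated label of this node as their parent.
    return (new_label,) + tuple(annotate_vp_parent(c, label) for c in node[1:])


tree = ("SBAR",
        ("IN", "that"),
        ("S", ("NP", ("PRP", "he")), ("VP", ("VBD", "left"))))
print(annotate_vp_parent(tree))
```

Because the parent's symbol is now part of the VP label, a rule expanding the VP is implicitly conditioned on what would otherwise be the grandparent of its children.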

Slide 21

Slide 21 text

Unary Deletion
- Delete rules that have only a single child, except root and preterminal productions → most unary rules just get in the way

Slide 22

Slide 22 text

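The computational cost is high, but this is not a problem in practice
- Computing the probability of a sentence of L words requires summing over all possible parse trees
  - O(L^3) when formulated as a PCFG
  - In practice, the model is integrated into a decoder and pruned.
  - → Outside the scope of the paper, but not difficult
- The experiments use an existing parser and compute probabilities over its 1000-best parse trees
  - With the 1-best tree alone, task accuracy is unchanged, but perplexity is overestimated.
  - The bottleneck is the parser's processing time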
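The k-best approximation described above can be sketched as follows: the sentence probability is approximated by summing the tree probabilities in the k-best list, done in log space for numerical stability. Because any k-best list covers only part of the full sum, the sentence probability is underestimated and perplexity comes out too high, more so with 1-best than with 1000-best. The function names and the toy numbers are assumptions for this example.

```python
import math


def log_sum_exp(log_values):
    """Stable log(sum(exp(v))) over a non-empty list of log-probabilities."""
    m = max(log_values)
    return m + math.log(sum(math.exp(v - m) for v in log_values))


def sentence_log_prob(kbest_tree_log_probs):
    """Approximate log p(sentence) by summing over the k-best trees only."""
    return log_sum_exp(kbest_tree_log_probs)


def perplexity(sentence_log_probs, num_tokens):
    """exp of the negative average per-token log-probability."""
    return math.exp(-sum(sentence_log_probs) / num_tokens)


# Toy k-best list for one 5-token sentence (made-up tree probabilities).
kbest = [math.log(0.004), math.log(0.003), math.log(0.001)]
one_best = perplexity([sentence_log_prob(kbest[:1])], num_tokens=5)
three_best = perplexity([sentence_log_prob(kbest)], num_tokens=5)
print(one_best, three_best)
```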

Slide 23

Slide 23 text

Compare the generated sentences of 15-20 words
Because this is a generative model, it can generate sentences like these

Slide 24

Slide 24 text

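Perplexity is also lower than that of existing generative models
- Evaluated on WSJ section 0
Treelet = the proposed method
Treelet-Trans = a PCFG over trees after applying the proposed transformation rules
Treelet-Rule = Treelet with the lexical context removed
5-gram = a 5-gram language model with Kneser-Ney smoothing
PCFG-LA = the Berkeley Parser run in language-model mode
HeadLex = a head-lexicalized model similar to Model 1 of Collins (1999)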

Slide 25

Slide 25 text

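The treelet LM performs well even compared with discriminative models
- Task: classify pseudo-negative examples generated from a trigram model (Okanohara and Tsujii, 2007) vs. correct sentences
BLLIP = the corpus of Post (2011)
1B = PTB + BLLIP + Gigaword
LSVM = Latent SVM (Cherry and Quirk, 2008)
TSG = Tree Substitution Grammar
Rerank = reranking features from Charniak and Johnson (2005)
Is Treelet-Rule better because the pseudo-negative examples are generated from a 3-gram model?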

Slide 26

Slide 26 text

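The treelet language model is also effective for machine translation
- Task: classify English output from Moses (French to English, German to English) and Joshua (Chinese to English) vs. reference sentences
The language models are trained on the 1B corpus and the English side of the parallel corpus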

Slide 27

Slide 27 text

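Summary
- Proposed a simple syntactic language model
- It can be estimated from large-scale data with existing smoothing methods for n-grams
- On a variety of tasks it outperforms other generative language models and matches discriminative methods
- It is also easy to implement!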

Slide 28

Slide 28 text

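Q&A (1)
- Shindo-san's symbol refinement is theoretically cleaner and does not require hand-crafting anything resembling black magic; how does this work differ? (Mochihashi) → It is easy to implement and scales to large data (Komachi)
- Then why is it fast? (Mochihashi) → Perhaps because parsing time is not counted as part of building the language model? (Matsubara)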

Slide 29

Slide 29 text

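Q&A (2)
- Some unary rules, such as those involving auxiliary verbs, seem meaningful; is it really fine to remove them? (Aizawa) → The paper does not say that all of them are meaningless, but the rule was probably adopted because it improved results when evaluated with perplexity and similar measures (Komachi)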

Slide 30

Slide 30 text

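Q&A (3)
- When computing with treelets, there should be cases where the derivation is not unique; how is that handled? (Mochihashi) → Getting the total probability would presumably use something like inside-outside, but I do not recall this being stated explicitly. In the experiments, frequencies were computed using the 1,000-best parse trees. (Komachi)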