
Cutting-edge NLP paper introduction (最先端NLP論文紹介): Revisiting the Uniform Information Density Hypothesis (EMNLP2021). Linguistic Dependencies and Statistical Dependence (EMNLP2021).


Revisiting the Uniform Information Density Hypothesis (EMNLP2021)
Linguistic Dependencies and Statistical Dependence (EMNLP2021)

tatsuki kuribayashi

October 07, 2022


Transcript

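  1. Revisiting the Uniform Information Density Hypothesis. Clara Meister, Tiago Pimentel, Patrick Haller, Lena Jäger, Ryan Cotterell, Roger Levy (EMNLP 2021)

    Linguistic Dependencies and Statistical Dependence. Jacob Louis Hoover, Alessandro Sordoni, Wenyu Du, Timothy J. O'Donnell (EMNLP 2021)
    Presenter: Tatsuki Kuribayashi (Tohoku University). 2022/9/26, Cutting-Edge NLP Study Group 2022 (最先端NLP2022), paper introduction.
    Intended audience:
    - "I have heard of surprisal and reading times, but I don't really understand what lies behind them."
    - "I am vaguely curious about how language models relate to human language processing."
    - "I want to measure how readable a text is."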
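  2. We are taking a detour!

    Diagram labels:
    - Paper 1: probability and processing load, the uniform information density hypothesis (cognitive modeling).
    - Paper 2: mutual information and dependency structure (linguistics).
    - Natural language processing: supplies the calculator for information-theoretic quantities (the language model).
    - The detour: computational psycholinguistics, built on information theory. It differs from cognitive linguistics and from (so-called) psycholinguistics.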
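  3. Human incremental sentence processing

    - We want to understand human language processing.
      - It cannot be observed directly.
    - We look for an account of the load that arises during sequential (incremental, syntactic) processing.
      - The load is observed through reading-time measurements, acceptability judgments, and similar data.
    - We proceed deductively from general principles.
      - c.f. arguing bottom-up from mechanisms specific to language.
    Example of processing load while reading one word at a time: "I ate a pineapple ." 🙂 versus "I 'm studying pineapple ." 😳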
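  4. Principles we accept for now

    1. Processing load (for the listener) is related to the probability $p(\text{word} \mid \text{context})$:
       $\mathrm{cost}(w_t \mid \boldsymbol{w}_{<t}) = f\big(p(w_t \mid \boldsymbol{w}_{<t})\big)$, where $f$ maps a probability to a processing load.
       Support: experimental evidence, and the view of the brain as a prediction machine (predictive coding, the Bayesian brain, the free energy principle, and related directions).
    2. The processing load (for the listener) of an element is the sum of the loads of its constituents:
       $\mathrm{cost}(\boldsymbol{w}_{1:T}) = \sum_{t} \mathrm{cost}(w_t \mid \boldsymbol{w}_{<t})$.
       A case that does not fit: the wrap-up effect (the hypothesis that the whole clause is reinterpreted at the clause boundary). For now, we part ways with holism.
    3. People communicate so that the processing load is as small as possible (the speaker is cooperative toward the listener).
    From these three principles we derive a range of insights about language.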
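Principles 1 and 2 can be written down in a few lines. The following is only a minimal sketch: the conditional probabilities are made-up numbers (not output from any language model), and `surprisal` and `sequence_cost` are hypothetical helper names introduced for illustration.

```python
import math
from typing import Callable, Sequence


def surprisal(p: float) -> float:
    """One candidate for f: the surprisal -log2 p, in bits."""
    return -math.log2(p)


def sequence_cost(cond_probs: Sequence[float],
                  f: Callable[[float], float] = surprisal) -> float:
    """Principle 2: the cost of a sequence is the sum of per-word costs f(p)."""
    return sum(f(p) for p in cond_probs)


# Hypothetical values of p(w_t | w_<t) for the two example sentences.
probs_ate = [0.2, 0.1, 0.3, 0.25, 0.9]        # "I ate a pineapple ."
probs_studying = [0.2, 0.1, 0.05, 0.001, 0.9]  # "I 'm studying pineapple ."

cost_ate = sequence_cost(probs_ate)
cost_studying = sequence_cost(probs_studying)
print(f"{cost_ate:.2f} bits vs {cost_studying:.2f} bits")
```

Any other candidate for $f$ can be passed as the second argument, which is what the question on the next slides is about.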
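  5. Question: what is f, concretely?

    Principle 1: the processing load of a part is related to the probability $p(\text{word} \mid \text{context})$, i.e. $\mathrm{cost}(w_t \mid \boldsymbol{w}_{<t}) = f\big(p(w_t \mid \boldsymbol{w}_{<t})\big)$.
    [Figure: $\mathrm{cost}(w_t \mid \boldsymbol{w}_{<t})$ plotted against $-\log p(w_t \mid \boldsymbol{w}_{<t})$, both axes running from small to large] [Smith&Levy, 2013]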
  6. Question: what is f, concretely? (Digression: the surprisal hypothesis)

    Hypothesis 1, the surprisal hypothesis: $f = -\log(\cdot)$.
    It has a lot of experimental support [Smith&Levy, 2013].
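  7. Question: what is f, concretely? (Digression: the surprisal hypothesis)

    Side note ① [Smith&Levy, 2013]
    💡 If human processing is super-incremental, $f$ is approximated by the logarithm.
    Suppose a word is processed as $k$ smaller pieces:
    $p(w_t \mid \boldsymbol{w}_{<t}) = p(\text{subword}_1) \times p(\text{subword}_2 \mid \text{subword}_1) \cdots p(\text{subword}_k \mid \text{subword}_{<k})$
    By principle 2 the load is a sum over the pieces $c_i$:
    $\mathrm{cost}(w_t \mid \boldsymbol{w}_{<t}) = \sum_{i} f\big(p(c_i \mid c_{<i})\big)$
    If the following conditions hold and $k$ is large, any $f$ is approximated by the logarithm:
    - $f(p)$ is linear near $p = 1$ (put bluntly, differentiable there).
    - $\lim_{k \to \infty} \min_i p(\text{subword}_i \mid \text{subword}_{<i}) = 1$. ※ As $k$ grows, even the most surprising piece becomes less and less surprising.
    ※ "Subword" is written for readability. The pieces actually meant are characters, phonemes, or the sound within a tiny unit of time.
    [Figure: cost against $-\log p(w_t \mid \boldsymbol{w}_{<t})$ at $k = 20$ for several choices of $f$, e.g. $f = -x$ and $f = 1/x$]
    The debate about the shape of $f$ is thus replaced by a debate about how continuous processing is!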
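The limit argument in side note ① can be checked numerically. The sketch below makes several assumptions that are not in the slides: the probability mass of a word is split evenly across `k` pieces (so each piece has probability `p ** (1/k)`), and the candidate functions are normalised so that `f(1) = 0` and `f'(1) = -1`. `chunked_cost` is a hypothetical helper name.

```python
import math


def chunked_cost(p: float, k: int, f) -> float:
    """Cost of a word of probability p processed as k equally likely pieces.

    Each piece has probability q = p ** (1 / k), so q ** k == p, and the
    total cost is the sum of f(q) over the k pieces (principle 2).
    """
    q = p ** (1.0 / k)
    return k * f(q)


# Candidate link functions, all with f(1) = 0 and slope -1 at p = 1.
candidates = {
    "1 - x": lambda x: 1.0 - x,
    "1/x - 1": lambda x: 1.0 / x - 1.0,
    "(1 - x^2)/2": lambda x: (1.0 - x * x) / 2.0,
}

p = 0.01
target = -math.log(p)  # surprisal in nats
for name, f in candidates.items():
    coarse = chunked_cost(p, 20, f)
    fine = chunked_cost(p, 10_000, f)
    print(f"{name:12s} k=20: {coarse:.3f}  k=10000: {fine:.3f}  -log p: {target:.3f}")
```

With few pieces the three functions disagree, while with many pieces all of them approach $-\log p$, because only the behaviour of $f$ near $p = 1$ still matters.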
  8. Question: what is f, concretely? (Digression: the surprisal hypothesis)

    Side note ② [Levy, 2008]
    $-\log p(w_t \mid \boldsymbol{w}_{<t}) = \mathrm{KL}\big(P_t(T) \,\|\, P_{t-1}(T)\big)$
    💡 The surprisal of a word, $-\log p(w_t \mid \boldsymbol{w}_{<t})$, is equivalent to "how far the set of possible structures was narrowed down by reading the word".
    - $P_t(T)$: the derivation probability of each structure $T$ after reading $w_t$. Derivations inconsistent with $w_t$ drop to probability 0.
    - The nature of the structure $T$ does not matter [Hale, 2016].
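The identity in side note ② is easy to verify on a toy example. This is a sketch under an explicit assumption: every candidate structure either fully predicts the next word or rules it out, so reading the word simply removes the incompatible structures and renormalises. The structure names and prior probabilities are invented.

```python
import math


def update_on_word(prior: dict, compatible: set):
    """Condition a distribution over structures on the word just read.

    Returns the posterior over structures and p(w_t | w_<t), which is the
    prior mass of the structures compatible with the word.
    """
    p_word = sum(prior[t] for t in compatible)
    posterior = {t: (prior[t] / p_word if t in compatible else 0.0) for t in prior}
    return posterior, p_word


def kl_divergence(p: dict, q: dict) -> float:
    """KL(p || q) in nats; terms with p = 0 contribute nothing."""
    return sum(pi * math.log(pi / q[t]) for t, pi in p.items() if pi > 0.0)


# Hypothetical beliefs P_{t-1}(T) over three candidate structures.
prior = {"T1": 0.5, "T2": 0.3, "T3": 0.2}
# Suppose the word that was read is only consistent with T1 and T3.
posterior, p_word = update_on_word(prior, {"T1", "T3"})

surprisal = -math.log(p_word)
divergence = kl_divergence(posterior, prior)
print(surprisal, divergence)  # the two quantities coincide
```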
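  9. Question: what is f, concretely? (f and the uniform information density assumption)

    Hypothesis 2: $f$ is more curved than the logarithm (toward the convex side). The load rises sharply as the probability gets small.
    Example: $\mathrm{cost}(w_t \mid \boldsymbol{w}_{<t}) = \big[-\log p(w_t \mid \boldsymbol{w}_{<t})\big]^k,\quad k > 1 \qquad (1)$
    [Figure: cost against $-\log p(w_t \mid \boldsymbol{w}_{<t})$] [Smith&Levy, 2013]
    This relates to today's first paper!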
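  10. Question: what is f, concretely? (f and the uniform information density assumption)

    Important, and related to today's first paper!
    Hypothesis 2 and equation (1) as on the previous slide: $\mathrm{cost}(w_t \mid \boldsymbol{w}_{<t}) = \big[-\log p(w_t \mid \boldsymbol{w}_{<t})\big]^k,\ k > 1$.
    The information $-\log p(s)$ (the message) is distributed over a sequence $s = w_1, w_2, \ldots, w_l$ of length $l$ and transmitted:
    $-\log p(s) = -\sum_{t} \log p(w_t \mid \boldsymbol{w}_{<t})$
    💡 Under hypothesis 2, the strategy that minimizes the total processing load is to have every word $w_t$ carry an equal amount of information. [Derivation] By Jensen's inequality,
    $\mathrm{cost}(s) = \sum_{t=1}^{l} \big[-\log p(w_t \mid \boldsymbol{w}_{<t})\big]^k \;\ge\; l \left[\frac{1}{l} \sum_{t=1}^{l} -\log p(w_t \mid \boldsymbol{w}_{<t})\right]^k$
    with equality (the lower bound) when the information is distributed uniformly. A power function was assumed outside the log, but any function that is convex on $[0, \infty)$ will do.
    💡 Under principles 1-3, hypothesis 2 supports the uniform information density (UID) hypothesis [Levy&Jaeger, 2007]: speakers prefer communication in which the amount of information transmitted per unit of time or per symbol is uniform.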
  11. Question: what is f, concretely? (f and the uniform information density assumption)

    Notes added to the previous slide:
    - (For readers who know the area) Some work discusses UID in terms of entropy [Genzel&Charniak, 2002], but here we think in terms of surprisal.
    - The claim here is not that information content or the entropy rate is uniform in natural language. It is a relative argument about processing load: the closer the distribution is to uniform, the lower the listener's load should be.
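The Jensen's-inequality step can be illustrated with two made-up surprisal profiles that carry the same total information. This is only a sketch: the numbers are arbitrary, `k = 1.5` is picked for illustration, and `total_cost` and `jensen_lower_bound` are hypothetical names.

```python
import math


def total_cost(surprisals, k: float) -> float:
    """Sum of per-word costs s_t ** k, as in equation (1)."""
    return sum(s ** k for s in surprisals)


def jensen_lower_bound(surprisals, k: float) -> float:
    """l * (mean surprisal) ** k, the value reached by a perfectly uniform profile."""
    n = len(surprisals)
    return n * (sum(surprisals) / n) ** k


# Two ways of spreading the same 10 units of information over 5 words.
uniform = [2.0, 2.0, 2.0, 2.0, 2.0]
uneven = [0.5, 0.5, 1.0, 3.0, 5.0]

k = 1.5
print(total_cost(uniform, k), total_cost(uneven, k), jensen_lower_bound(uneven, k))
```

With `k = 1` the two profiles cost the same, so the preference for uniformity comes entirely from the convexity of the cost function.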
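  12. Meister+'21: Revisiting the Uniform Information Density Hypothesis

    - Bending $f$ ($k > 1$) explains reading times and acceptability judgments better than the plain surprisal hypothesis. Under principles 1-3 this weakly supports UID (the authors do not make a strong claim either).
    $\mathrm{acceptability}(s) \sim \sum_{t} \mathrm{cost}(w_t \mid \boldsymbol{w}_{<t}) = \sum_{t} \big[-\log p(w_t \mid \boldsymbol{w}_{<t})\big]^k$
    - Data: total sentence reading times and acceptability judgments.
    - Compared with $k = 1$, the difference is significant for acceptability judgments ($p < 0.001$ with Bonferroni correction) but not for reading times.
    [Smith&Levy, 2013] [Meister+, 2021]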
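  13. Meister+'21: Revisiting the Uniform Information Density Hypothesis

    - Comparing formalizations of the uniform information density hypothesis (UID)
    The canonical explanation of UID (from information theory): communicating at a constant transmission rate close to the channel capacity makes communication efficient. If the information per symbol varies, communication deviates from this ideal and the processing load goes up.
    [Figure: information sent through a noisy channel]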
  14. Meister+'21: Revisiting the Uniform Information Density Hypothesis

    - Comparing formalizations of the uniform information density hypothesis (UID)
    The variability can be formalized in several ways [Meister+, 2021]:
    ① Variance versus local variance:
    $\mathrm{cost}(s) = \sum_{t=1}^{l} \big(-\log p(w_t \mid \boldsymbol{w}_{<t}) - \mu\big)^2$
    $\mathrm{cost}(s) = \sum_{t=2}^{l} \big(\log p(w_{t-1} \mid \boldsymbol{w}_{<t-1}) - \log p(w_t \mid \boldsymbol{w}_{<t})\big)^2$
    ② For the variance, how should the mean $\mu$ (the ideal transmission rate) be set? The mean of the language? Of the document? Of the sentence? A local mean?
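The two formalizations differ in whether word order matters. The sketch below uses the unnormalised sums and invented surprisal values. `uid_variance` and `uid_local_variance` are hypothetical names, and using the sequence's own mean when `mu` is omitted is an assumption made for illustration.

```python
from statistics import fmean
from typing import Optional, Sequence


def uid_variance(surprisals: Sequence[float], mu: Optional[float] = None) -> float:
    """Sum of squared deviations of each word's surprisal from a target mean mu.

    If mu is not given, the mean of this sequence is used; a corpus-level
    mean can be passed instead.
    """
    if mu is None:
        mu = fmean(surprisals)
    return sum((s - mu) ** 2 for s in surprisals)


def uid_local_variance(surprisals: Sequence[float]) -> float:
    """Sum of squared surprisal differences between adjacent words."""
    return sum((surprisals[t] - surprisals[t - 1]) ** 2
               for t in range(1, len(surprisals)))


alternating = [1.0, 3.0, 1.0, 3.0]
blocked = [1.0, 1.0, 3.0, 3.0]
print(uid_variance(alternating), uid_variance(blocked))              # identical
print(uid_local_variance(alternating), uid_local_variance(blocked))  # differ
```

The global variance ignores order, so both sequences score the same, whereas the local variance penalises the alternating sequence more.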
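  15. Results, restated

    - Many variants were compared, but nothing beats a power of surprisal ($k$ = 1-1.5).
    - When measuring by variance, $\mathrm{cost}(s) = \sum_{t=1}^{l} \big(-\log p(w_t \mid \boldsymbol{w}_{<t}) - \mu\big)^2$, it works best to measure the deviation from the mean processing load over the whole corpus, rather than over a smaller unit such as the sentence.
    [Figure: load for $\mu$ taken from the corpus versus from smaller units such as the sentence] [Meister+, 2021]
    🤔 In our heads, do we want language to be a stationary process?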
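  16. Break: do information theory and linguistic structure connect?

    - If humans compute surprisal, what quantity would they be computing about the association between words? Context-conditional pmi.
    $-\log p(w_t \mid \boldsymbol{w}_{<t}) = -\log p(w_t) - \mathrm{pmi}(w_t; \boldsymbol{w}_{<t}) = -\log p(w_t) - \sum_{i=1}^{t-1} \mathrm{pmi}(w_t; w_i \mid \boldsymbol{w}_{<i})$
    - $-\log p(w_t)$: the surprise of the word on its own.
    - $\mathrm{pmi}(w_t; \boldsymbol{w}_{<t})$: the reduction in surprise due to context.
    - $\mathrm{pmi}(w_t; w_i \mid \boldsymbol{w}_{<i})$: how much information $w_i$ provides about $w_t$, given the preceding context.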
  17. Break: do information theory and linguistic structure connect?

    Added to the decomposition on the previous slide: word pairs with high conditional PMI on one side, syntactic structure on the other [Hoover+, 2021].
    If the two were close, prediction-based language processing and syntactic theory would connect neatly...
  18. Break: do information theory and linguistic structure connect?

    The formal argument, continued:
    $-\log p(w_t \mid \boldsymbol{w}_{<t}) = -\log p(w_t) - \sum_{i=1}^{t-1} \mathrm{pmi}(w_t; w_i \mid \boldsymbol{w}_{<i}) = -\log p(w_t) - \sum_{i=1}^{t-1} \mathrm{pmi}(w_t; w_i) - \sum_{i=1}^{t-1} \mathrm{pmi}(w_t; w_i; \boldsymbol{w}_{<i})$
    - $\sum_i \mathrm{pmi}(w_t; w_i)$: pmi between words.
    - $\sum_i \mathrm{pmi}(w_t; w_i; \boldsymbol{w}_{<i})$: higher-order terms.
    Head-dependent mutual information hypothesis [Futrell+, 2019]: compares pmi with dependency relations. Words in a syntactic relation have higher pmi than chance. Context (the higher-order terms) is ignored (a count-based study).
  19. Break: do information theory and linguistic structure connect?

    Paper 2 investigates the relation between context-conditional pmi and syntactic structure.
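The chain-rule decomposition of surprisal into a unigram term and conditional pmi terms holds for any joint distribution, which a small numeric check makes concrete. The sketch below draws a random joint distribution over three binary "words". Nothing in it comes from the papers, and `marginal` is a hypothetical helper.

```python
import itertools
import math
import random

random.seed(0)

# A random joint distribution over three binary "words" (w1, w2, w3).
outcomes = list(itertools.product([0, 1], repeat=3))
weights = [random.random() + 0.1 for _ in outcomes]
total = sum(weights)
joint = {o: w / total for o, w in zip(outcomes, weights)}


def marginal(fixed: dict) -> float:
    """Probability that the words at the given positions take the given values."""
    return sum(p for o, p in joint.items()
               if all(o[i] == v for i, v in fixed.items()))


w1, w2, w3 = 1, 0, 1
p1 = marginal({0: w1})
p3 = marginal({2: w3})
p12 = marginal({0: w1, 1: w2})
p13 = marginal({0: w1, 2: w3})
p123 = marginal({0: w1, 1: w2, 2: w3})

# Left-hand side: surprisal of w3 given both preceding words.
surprisal = -math.log(p123 / p12)

# Right-hand side: unigram surprisal minus the conditional pmi terms.
unigram = -math.log(p3)
pmi_w3_w1 = math.log(p13 / (p1 * p3))                   # pmi(w3; w1), empty context
cpmi_w3_w2_given_w1 = math.log(p123 * p1 / (p13 * p12))  # pmi(w3; w2 | w1)
decomposed = unigram - pmi_w3_w1 - cpmi_w3_w2_given_w1

print(surprisal, decomposed)
```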
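  20. Hoover+'21: Linguistic Dependencies and Statistical Dependence

    - Does context-conditional pmi between words (cpmi) agree with syntactic structure?
      - "Are pairs of words that tend to occur together also likely to stand in a linguistic dependency? This empirical question is motivated by a long history of literature in cognitive science, psycholinguistics, and NLP." [Hoover+, 2021]
    - (For some reason) bidirectional language models are used as the cpmi estimator.
    - Because bidirectional context is used, the connection to surprisal is not discussed. (Introducing the topic from surprisal is my own framing. The paper does not spell it out.)
    Method [Hoover+, 2021]:
    - Build a score matrix over word pairs $(w_i, w_t)$: given the context, how much does the presence of one word contribute to predicting the other?
    - Use the Eisner algorithm to find the projective tree that maximizes CPMI.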
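The step from a score matrix to a tree can be sketched as follows. Note the substitution: instead of the Eisner algorithm, this uses a plain maximum spanning tree (Prim's algorithm), which is shorter but does not enforce projectivity. The score matrix is invented and already symmetric, and no language model is involved.

```python
from typing import List, Tuple


def max_spanning_tree(scores: List[List[float]]) -> List[Tuple[int, int]]:
    """Prim's algorithm on a symmetric score matrix.

    Returns undirected edges (i, j) with i < j. Unlike the Eisner
    algorithm, this does not guarantee a projective tree.
    """
    n = len(scores)
    in_tree = {0}
    edges = []
    while len(in_tree) < n:
        # Pick the highest-scoring edge that leaves the current tree.
        i, j = max(
            ((a, b) for a in in_tree for b in range(n) if b not in in_tree),
            key=lambda e: scores[e[0]][e[1]],
        )
        edges.append((min(i, j), max(i, j)))
        in_tree.add(j)
    return sorted(edges)


words = ["the", "dog", "chased", "a", "cat"]
# Hypothetical symmetric CPMI-like scores between word positions.
scores = [
    [0.0, 2.0, 0.1, 0.3, 0.2],
    [2.0, 0.0, 1.5, 0.1, 0.8],
    [0.1, 1.5, 0.0, 0.2, 1.8],
    [0.3, 0.1, 0.2, 0.0, 2.2],
    [0.2, 0.8, 1.8, 2.2, 0.0],
]

tree = max_spanning_tree(scores)
print([(words[i], words[j]) for i, j in tree])
```

The matrix was chosen so that the recovered tree links the-dog, dog-chased, chased-cat and a-cat. Real scores need not be this clean.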
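  21. Results

    Metric: unlabeled undirected accuracy [Hoover+, 2021]
    - Baseline: word2vec inner product as pmi.
      - Including this is commendable.
    - With cpmi, precision on long-distance dependencies improved over pmi.
      - The predictions themselves tended to contain more short dependencies.
    - It did not beat the baseline that links each word to its nearest neighbor (a linear structure).
      - The discussion here is a bit sloppy.
    - The overall impression is that one cannot strongly claim that cpmi corresponds to dependencies.
      - How to connect information-theoretic approaches (broadly speaking, natural language processing) with linguistics remains an open problem.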
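For concreteness, unlabeled undirected accuracy and the nearest-neighbour (linear) baseline can be computed as below. This is a sketch with made-up gold and predicted edges, and the function names are hypothetical.

```python
from typing import Iterable, List, Set, Tuple

Edge = Tuple[int, int]


def undirected(edges: Iterable[Edge]) -> Set[frozenset]:
    """Drop edge direction so that (head, dep) and (dep, head) match."""
    return {frozenset(e) for e in edges}


def unlabeled_undirected_accuracy(pred: Iterable[Edge], gold: Iterable[Edge]) -> float:
    """Fraction of gold dependency edges recovered, ignoring labels and direction."""
    gold_set = undirected(gold)
    return len(undirected(pred) & gold_set) / len(gold_set)


def linear_baseline(n_words: int) -> List[Edge]:
    """Link every word to its right-hand neighbour."""
    return [(i, i + 1) for i in range(n_words - 1)]


# Hypothetical 5-word sentence: gold edges and one predicted tree.
gold = [(0, 1), (1, 2), (2, 4), (3, 4)]
pred = [(1, 0), (1, 2), (2, 3), (3, 4)]

acc_pred = unlabeled_undirected_accuracy(pred, gold)
acc_linear = unlabeled_undirected_accuracy(linear_baseline(5), gold)
print(acc_pred, acc_linear)
```

Because most dependencies are short, the linear baseline already recovers many gold edges, which is why it is a hard baseline to beat.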
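  22. Other topics that were candidates for this talk

    - The problem that LMs can still solve tasks after the word order is shuffled.
    - Humans are also reported to be robust to word-order shuffling: Composition is the Core Driver of the Language-selective Network [Mollica+'20]
    - Reading as an activity: the work of assembling words appropriately.
      - As long as the words that should be combined are arranged (close together) so that this is recognizable, the text can be read without trouble even if the word order is broken.
      - Treating the word pairs to be combined as high-PMI pairs, the level of local PMI after reordering is related to reading load.
      - I would have liked a baseline that lowers local PMI as much as possible using the same number of swap operations as Scr1-7.
    The NLPer's claim that "being able to solve the task after shuffling the word order is wrong!" may not be self-evident.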
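  23. FAQ

    - Is this NLP?
      - Even within NLP conferences, discussion is lively in sub-areas such as "Linguistic Theories, Cognitive Modeling and Psycholinguistics". Themes such as information theory and human sentence processing are especially hot lately (because large-scale neural language models are now available as a tool).
      - At the risk of self-praise, I have had papers accepted in this field at the most competitive conferences in consecutive years (ACL long, ACL long, EMNLP long), thanks to my co-authors 🙏. I find it a very constructive community.
      - You get to experience the interdisciplinary appeal of NLP. It broadens your repertoire and your perspective.
    - I am interested in this kind of research, but I don't know the keywords.
      - For NLPers, "computational psycholinguistics" is probably the most familiar.
      - You build a computational model (e.g. a language model) from a hypothesis, compare it against humans, and close in on humans (ideally even the parts that cannot be observed directly) through a reverse-engineering, constructive approach. (c.f. psycholinguistics, which I understand as the discipline that accumulates findings through experiments on humans themselves.)
      - "cognitive science" (a keyword about as huge as "artificial intelligence") and "information-theoretic hogehoge" are also relatively close keywords. "computational linguistics" is nearly synonymous with NLP and therefore too broad. "cognitive modeling" is close. "cognitive linguistics" is far.
      - Searching for language research in the CMCL workshop, CogSci, and the journal Cognition turns up related work.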
  24. References

    - Smith, Nathaniel J., and Roger Levy. 2013. "The effect of word predictability on reading time is logarithmic." Cognition 128 (3): 302–19.
    - Levy, Roger. 2008. "Expectation-based syntactic comprehension." Cognition 106 (3): 1126–77.
    - Genzel, Dmitriy, and Eugene Charniak. 2002. "Entropy rate constancy in text." In Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics, 199–206.
    - Levy, Roger, and T. Florian Jaeger. 2007. "Speakers optimize information density through syntactic reduction." In Advances in Neural Information Processing Systems, edited by B. Schölkopf, J. Platt, and T. Hoffman, 19:849–56. MIT Press.
    - Hale, John. 2016. "Information-theoretical Complexity Metrics." Language and Linguistics Compass 10 (9): 397–412.
    - Futrell, Richard, Peng Qian, Edward Gibson, Evelina Fedorenko, and Idan Blank. 2019. "Syntactic Dependencies Correspond to Word Pairs with High Mutual Information." In Proceedings of the Fifth International Conference on Dependency Linguistics (Depling, SyntaxFest 2019), 3–13. Paris, France: Association for Computational Linguistics.
    - Mollica, Francis, Matthew Siegelman, Evgeniia Diachek, Steven T. Piantadosi, Zachary Mineroff, Richard Futrell, Hope Kean, Peng Qian, and Evelina Fedorenko. 2020. "Composition Is the Core Driver of the Language-Selective Network." Neurobiology of Language 1 (1): 104–34.