Slide 1

Revisiting the Uniform Information Density Hypothesis. Clara Meister, Tiago Pimentel, Patrick Haller, Lena Jäger, Ryan Cotterell, Roger Levy (EMNLP 2021)
Linguistic Dependencies and Statistical Dependence. Jacob Louis Hoover, Alessandro Sordoni, Wenyu Du, Timothy J. O'Donnell (EMNLP 2021)
Paper introduction at the 最先端NLP勉強会 2022 (SNLP 2022), 2022/9/26. Presenter: 栗林樹生 (Tatsuki Kuribayashi), Tohoku University.
Intended audience:
- "I have heard of surprisal and reading times, but I don't really know what lies behind them."
- "I am vaguely curious about the relationship between language models and human language processing."
- "I want to measure how easy a text is to read."

Slide 2

A detour before the two papers! (Diagram of how the relevant fields fit together.)
Labels: probability and processing load; cognitive modeling; the Uniform Information Density hypothesis; linguistics; mutual information and dependency relations; paper 1; paper 2; NLP provides the calculators of the information-theoretic quantities (language models); detour; information theory.
The detour runs through computational psycholinguistics, which is distinct from cognitive linguistics and from (so-called) psycholinguistics.

Slide 3

Human incremental sentence processing
- We want to understand human language processing, but it cannot be observed directly.
- We consider explanations of the load that arises in sequential (incremental, syntactic) processing; the load is observed through reading-time measurements, acceptability judgments, and the like.
- We proceed deductively from general principles (c.f. building the argument bottom-up from language-specific mechanisms).
Illustration: the processing load while reading "I ate a pineapple ." one word at a time 🙂, versus "I 'm studying pineapple ." 😳

Slide 4

Principles we accept for now
1. Processing load is related to the probability p(word|context): cost(w_t | w_<t) = f(p(w_t | w_<t)), where f maps a probability to a processing load. (From the listener's point of view.)
2. The processing load of a unit is the sum of the loads of its parts: cost(w_1:l) = Σ_t cost(w_t | w_<t). (From the listener's point of view.) A counterexample: the wrap-up effect (the hypothesis that the whole clause is reinterpreted at the clause boundary).
3. We communicate so that as little processing load as possible arises: the speaker is cooperative toward the listener.
Background: empirical evidence, and views of the brain as a prediction machine (predictive coding, the Bayesian brain, the free-energy principle, and so on).
From these three principles we derive various insights about language. (For now, we part ways with holistic accounts...) A minimal sketch of principles 1 and 2 follows below.
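
The snippet below is a minimal sketch of principles 1 and 2, assuming a placeholder cost function and made-up conditional probabilities (no real language model is involved); it only illustrates that sentence-level load is the sum of per-word loads.

```python
import math

# A minimal sketch of principles 1 and 2: per-word cost is f(p(word | context)),
# and sentence cost is the sum of per-word costs. The probabilities below are
# made-up illustrative numbers, not outputs of any real language model.
def cost_of_word(p, f=lambda p: -math.log(p)):
    """Principle 1: map the conditional probability of a word to a load via f.
    The default f is the surprisal hypothesis (f = -log), introduced on a later slide."""
    return f(p)

def cost_of_sentence(word_probs, f=lambda p: -math.log(p)):
    """Principle 2: the load of a sentence is the sum of its words' loads."""
    return sum(cost_of_word(p, f) for p in word_probs)

# p(w_t | w_<t) for "I 'm studying pineapple ." -- hypothetical values.
probs = [0.05, 0.4, 0.1, 0.001, 0.6]
print(cost_of_sentence(probs))  # total processing load under f = -log
```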

Slide 5

Question: what is f, concretely?
1. The processing load of a part is related to the probability p(word|context): cost(w_t | w_<t) = f(p(w_t | w_<t)).
(Figure: cost(w_t | w_<t), from small to large, plotted against −log p(w_t | w_<t).) [Smith & Levy, 2013]

Slide 6

Question: what is f, concretely? (Digression: the surprisal hypothesis)
1. The processing load of a part is related to the probability p(word|context): cost(w_t | w_<t) = f(p(w_t | w_<t)).
Hypothesis 1, the surprisal hypothesis: f = −log(·). It has a lot of empirical support [Smith & Levy, 2013].
(Figure: cost(w_t | w_<t) plotted against −log p(w_t | w_<t).)

Slide 7

Question: what is f, concretely? (Digression: the surprisal hypothesis) Aside (1)
💡 If human processing is "super-incremental", f is approximated by a logarithmic function [Smith & Levy, 2013].
By assumption 2 the load is a sum over sub-units: cost(w_t | w_<t) = Σ_i f(p(c_i | c_<i, w_<t)), with the word probability factorized as p(w_t) = p(subword_1) × p(subword_2 | subword_1) × ... × p(subword_k | subword_<k).
(Note: "subword" is used for readability; what is actually assumed is characters, phonemes, or sounds over infinitesimally small time units.)
If lim_{k→∞} min_i p(subword_i | subword_<i) = 1 and f(p) is linear near p = 1, then for large k any such f is approximated by a logarithmic function. [Details]
(Figure: cost against −log p(w_t | w_<t) for several example shapes of f, such as a linear f, a polynomial f, and f = 1/x, each with k = 20.)
The argument about the shape of f thus turns into an argument about the continuity (granularity) of processing! A small numerical sketch follows.
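
Below is a small numerical sketch of this approximation, under the assumption that a word's probability is split into k equal sub-units; the cost functions and the word probability are toy choices, not taken from the paper.

```python
import math

# Sketch of the "super-incremental" argument: if a word's probability is spread
# over k fine-grained sub-units (each with conditional probability close to 1)
# and per-unit cost f is smooth at p = 1 with f(1) = 0, then the summed cost
# approaches a constant times -log p(word).  Toy numbers, not from a real model.
fs = {
    "linear     f(p) = 1 - p           ": lambda p: 1 - p,
    "reciprocal f(p) = 1/p - 1         ": lambda p: 1 / p - 1,
    "polynomial f(p) = (1-p) + (1-p)**2": lambda p: (1 - p) + (1 - p) ** 2,
}

p_word = 0.01                      # probability of the whole word
print("-log p(word) =", -math.log(p_word))
for name, f in fs.items():
    for k in (1, 5, 20, 200):
        p_sub = p_word ** (1 / k)  # k equal sub-units whose product is p_word
        total = k * f(p_sub)       # summed per-sub-unit cost (assumption 2)
        print(f"{name}  k={k:4d}  total cost = {total:.3f}")
    # as k grows, every f above converges to |f'(1)| * (-log p_word)
```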

Slide 8

Question: what is f, concretely? (Digression: the surprisal hypothesis) Aside (2)
💡 A word's surprisal −log p(w_t | w_<t) is equivalent to "how much the set of possible structures was narrowed down by reading the word" [Levy, 2008]:
−log p(w_t | w_<t) = KL(P_t(T) || P_{t−1}(T)),
where P_t(T) is the derivation probability of each structure T after reading w_t; derivations inconsistent with w_t drop to probability 0. The nature of the structures T is left open [Hale, 2016].
A toy numerical check of this identity follows.
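
A toy check of the identity, assuming that each candidate structure deterministically fixes the upcoming word; the structures and their probabilities are invented for illustration.

```python
import math

# Toy check of Levy's (2008) identity: surprisal of a word equals the KL
# divergence between the distribution over structures after vs. before the word,
# assuming each structure (derivation) deterministically fixes the next word.
structures = {              # P_{t-1}(T): beliefs before reading the next word
    "S -> ... pineapple":   0.2,
    "S -> ... apples":      0.5,
    "S -> ... linguistics": 0.3,
}
next_word = {               # the word each derivation predicts at position t
    "S -> ... pineapple":   "pineapple",
    "S -> ... apples":      "apples",
    "S -> ... linguistics": "linguistics",
}

observed = "pineapple"
p_word = sum(p for T, p in structures.items() if next_word[T] == observed)
surprisal = -math.log(p_word)

# posterior P_t(T): renormalize over structures consistent with the observed word
posterior = {T: (p / p_word if next_word[T] == observed else 0.0)
             for T, p in structures.items()}
kl = sum(q * math.log(q / structures[T]) for T, q in posterior.items() if q > 0)

print(surprisal, kl)  # the two quantities coincide
```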

Slide 9

Question: what is f, concretely? (f and the Uniform Information Density hypothesis)
Hypothesis 2: f is curved more strongly than the logarithm (more convex), so the load rises sharply as the probability gets small.
Example: cost(w_t | w_<t) = [−log p(w_t | w_<t)]^k with k > 1.
This is related to the first of today's papers!
(Figure: cost(w_t | w_<t) plotted against −log p(w_t | w_<t).) [Smith & Levy, 2013]

Slide 10

Question: what is f, concretely? (f and the Uniform Information Density hypothesis) Important, and related to the first of today's papers!
Hypothesis 2: f is curved more strongly than the logarithm (more convex); the load rises sharply as the probability gets small, e.g. cost(w_t | w_<t) = [−log p(w_t | w_<t)]^k with k > 1.
Setting: the information −log p(s) of a message is distributed over a sequence s = w_1, w_2, ..., w_l of length l and transmitted: −log p(s) = −Σ_t log p(w_t | w_<t). (Diagram: a message sent as a stream of symbols.)
💡 Under Hypothesis 2, the strategy that minimizes the total processing load is to let every word w_t carry the same amount of information. [Derivation] By Jensen's inequality,
cost = Σ_{t=1}^{l} [−log p(w_t | w_<t)]^k ≥ l · [ (Σ_{t=1}^{l} −log p(w_t | w_<t)) / l ]^k,
with equality (the lower bound) exactly when the information is uniformly distributed. A power function outside the log was assumed here, but any f convex on [0, ∞) will do.
💡 Under assumptions 1-3, Hypothesis 2 therefore supports the Uniform Information Density hypothesis (UID) [Levy & Jaeger, 2007]: a preference for communication in which the information transmitted per unit of time or per symbol is uniform.

Slide 11

(Same as Slide 10, with two additional remarks.)
- There is also work that discusses UID in terms of entropy [Genzel & Charniak, 2002], but here the reasoning is in terms of surprisal (a note for those familiar with the literature).
- The claim is not that information content or the entropy rate is actually uniform in natural language; it is a relative claim about processing load: the closer to uniform, the lower the listener's processing load should be.
A small numerical sketch of the Jensen's-inequality step follows.
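
The snippet below is a small numerical sketch of the Jensen's-inequality step: with a convex per-word cost [−log p]^k (k > 1), spreading a fixed amount of information uniformly over the words minimizes the summed cost. The surprisal values are made-up numbers.

```python
import math

# Sketch of the Jensen-inequality step: with a convex per-word cost
# [-log p]^k (k > 1), spreading a fixed total amount of information
# uniformly over the words minimizes the summed cost.  Surprisal values
# below are made-up illustrative numbers (in nats).
k = 1.5
total_info = 12.0          # -log p(s): total information of the message
l = 6                      # number of words

def total_cost(surprisals, k):
    return sum(s ** k for s in surprisals)

uniform = [total_info / l] * l                 # every word carries the same information
bursty  = [6.0, 0.5, 0.5, 4.0, 0.5, 0.5]       # same total, unevenly distributed
assert math.isclose(sum(uniform), sum(bursty))

print("uniform:", total_cost(uniform, k))      # the Jensen lower bound l*(total/l)**k
print("bursty: ", total_cost(bursty, k))       # strictly larger for k > 1
```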

Slide 12

Clara+'21: Revisiting the Uniform Information Density Hypothesis
- A curved f (k > 1) explains reading times and acceptability judgments better; under assumptions 1-3 this weakly supports UID (the authors themselves do not make a strong claim).
- Surprisal-based model: acceptability(s) ~ Σ_t cost(w_t | w_<t) = Σ_t [−log p(w_t | w_<t)]^k.
- Uses total per-sentence reading times and acceptability-judgment data.
- Compared with k = 1, the difference is significant for acceptability judgments (p < 0.001 with Bonferroni correction) but not for reading times. [Smith & Levy, 2013] [Clara+, 2021]
A rough sketch of how such an exponent comparison can be set up follows.
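
Below is a rough sketch (not the paper's actual analysis, which uses mixed-effects regression and careful model comparison) of how one can compare exponents k: build the predictor Σ_t [−log p(w_t | w_<t)]^k per sentence and check how well it tracks behavioral data. All numbers are synthetic.

```python
import numpy as np

# A rough sketch of comparing exponents k: build the predictor
# sum_t [-log p(w_t | w_<t)]^k for each sentence and see which k best
# tracks behavioral data.  All numbers below are synthetic.
rng = np.random.default_rng(0)
n_sentences = 200
surprisals = [rng.gamma(shape=2.0, scale=1.5, size=rng.integers(5, 15))
              for _ in range(n_sentences)]          # per-word surprisals (synthetic)
# synthetic "acceptability": generated so that k = 1.25 is the true exponent
acceptability = np.array([-np.sum(s ** 1.25) for s in surprisals])
acceptability += rng.normal(scale=5.0, size=n_sentences)   # noise

for k in (0.75, 1.0, 1.25, 1.5, 2.0):
    predictor = np.array([np.sum(s ** k) for s in surprisals])
    r = np.corrcoef(-predictor, acceptability)[0, 1]
    print(f"k = {k:4.2f}   correlation with acceptability = {r:.3f}")
# the real paper uses mixed-effects regressions and model comparison, not raw correlations
```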

Slide 13

Clara+'21: Revisiting the Uniform Information Density Hypothesis
- Comparing formalizations of the Uniform Information Density hypothesis (UID).
The canonical explanation of UID (from information theory): communicating at a roughly constant rate close to the channel capacity makes communication efficient; when the per-symbol information fluctuates, communication departs from this ideal and processing load goes up. (Diagram: a message sent over a noisy channel as a stream of symbols.)

Slide 14

(Builds on Slide 13.) The "fluctuation" can be formalized in many ways [Clara+, 2021]:
(1) Global variance versus local differences, e.g.
cost(s) = Σ_{t=1}^{l} (−log p(w_t | w_<t) − μ)²
cost(s) = Σ_{t=1}^{l} (−log p(w_t | w_<t) − (−log p(w_{t−1} | w_<{t−1})))²
(2) In the variance case, how should the mean μ (the "ideal" transmission rate) be set? The mean of the language? Of the document? Of the sentence? A local mean?
A small sketch of these cost variants follows.
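
The snippet below sketches the two non-uniformity costs above, computed from a sequence of per-word surprisals; the numbers and the choice of reference mean are illustrative only.

```python
import numpy as np

# Sketch of two UID "non-uniformity" costs from the slide, computed from a
# sequence of per-word surprisals (synthetic numbers, in nats).
surprisal = np.array([2.1, 0.4, 3.8, 1.0, 2.5, 0.9])

def uid_variance(s, mu=None):
    """Deviation from a reference rate mu (here the sentence mean if not given;
    the paper also considers corpus-level and more local reference points)."""
    mu = s.mean() if mu is None else mu
    return float(np.sum((s - mu) ** 2))

def uid_local(s):
    """Squared differences between consecutive words' surprisals."""
    return float(np.sum(np.diff(s) ** 2))

print("variance-style cost:", uid_variance(surprisal))
print("variance vs. a corpus-level mean of 2.0:", uid_variance(surprisal, mu=2.0))
print("local-difference cost:", uid_local(surprisal))
```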

Slide 15

Results, recapped [Clara+, 2021]
- Despite comparing many variants, nothing beats surprisal raised to a power (k = 1 to 1.5).
- When measuring with variance, cost(s) = Σ_{t=1}^{l} (−log p(w_t | w_<t) − μ)², setting μ to the mean processing load over the whole corpus works better than using smaller units such as the sentence.
🤔 Does the mind want language to behave like a stationary process?

Slide 16

Break: do information theory and linguistic structure connect?
- If humans compute surprisal, what quantity would they implicitly have been computing about the association between words? Context-conditional PMI:
−log p(w_t | w_<t) = −log p(w_t) − pmi(w_t ; w_<t)
                   = −log p(w_t) − Σ_{j=1}^{t−1} pmi(w_t ; w_j | w_<j)
i.e. the surprise of the word in isolation minus the reduction of surprise contributed by the context, where pmi(w_t ; w_j | w_<j) measures how much information w_j provides about w_t under the preceding context.
A toy numerical check of the first decomposition follows.
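
A toy check of the first line of the decomposition, using an invented joint distribution over (context, word) pairs.

```python
import math

# Toy check of the decomposition -log p(w | ctx) = -log p(w) - pmi(w ; ctx),
# where pmi(w ; ctx) = log [ p(w, ctx) / (p(w) p(ctx)) ].  The joint
# distribution below is invented for illustration.
joint = {  # p(context, word)
    ("i ate a", "pineapple"):  0.10,
    ("i ate a", "sandwich"):   0.30,
    ("i read a", "pineapple"): 0.02,
    ("i read a", "book"):      0.58,
}
def p_ctx(c):  return sum(p for (c2, _), p in joint.items() if c2 == c)
def p_word(w): return sum(p for (_, w2), p in joint.items() if w2 == w)

ctx, word = "i ate a", "pineapple"
p_joint = joint[(ctx, word)]
surprisal = -math.log(p_joint / p_ctx(ctx))               # -log p(w | ctx)
pmi = math.log(p_joint / (p_ctx(ctx) * p_word(word)))     # pmi(w ; ctx)
print(surprisal, -math.log(p_word(word)) - pmi)           # identical values
```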

Slide 17

(Builds on Slide 16.)
- If word pairs with high conditional PMI and syntactic structure lined up, prediction-based language processing and syntactic theory would connect up beautifully... [Hoover+, 2021]

Slide 18

(Builds on Slide 17.) Expanding the decomposition one step further (a formal argument):
−log p(w_t | w_<t) = −log p(w_t) − Σ_{j=1}^{t−1} pmi(w_t ; w_j) − Σ_{j=1}^{t−1} pmi(w_t ; w_j ; w_<j)
i.e. the isolated surprise, the pairwise PMI between the words, and a higher-order (interaction) term.
Head-dependent mutual information hypothesis [Futrell+, 2019]: contrasts PMI with dependency relations; words that stand in a syntactic relation have higher PMI than chance. The context (the higher-order term) is ignored (a count-based study).

Slide 19

(Builds on Slide 18.) The second paper investigates the relation between context-conditional PMI and syntactic structure.

Slide 20

Hoover+'21: Linguistic Dependencies and Statistical Dependence
- Asks whether context-conditional word-pair PMI (CPMI) is consistent with syntactic (dependency) structure.
- (For some reason) uses a bidirectional language model as the CPMI calculator: a score matrix is built whose entry for (w_j, w_t) measures how much the presence of one word contributes to predicting the other under the context, and the Eisner algorithm is used to find the projective tree that maximizes the total CPMI.
- Because bidirectional context is used, the connection to surprisal is not discussed; the framing from surprisal above is my own addition, and the paper does not spell it out. The paper's own motivation: "Are pairs of words that tend to occur together also likely to stand in a linguistic dependency? This empirical question is motivated by a long history of literature in cognitive science, psycholinguistics, and NLP." [Hoover+, 2021]
A rough sketch of the tree-extraction step follows.
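
Below is a rough sketch of the tree-extraction step under stated simplifications: the paper extracts a projective directed tree with the Eisner algorithm, but since the evaluation is unlabeled and undirected, this sketch substitutes a plain maximum spanning tree (Prim's algorithm) over a symmetric score matrix. The CPMI scores are made-up numbers, not the output of a real bidirectional LM.

```python
# Sketch of the tree-extraction step: given a symmetric CPMI score matrix over
# the words of a sentence, pick the tree with the highest total score.  The
# paper uses the Eisner algorithm for a projective directed tree; as a simpler
# stand-in (the evaluation is unlabeled and undirected), this builds a maximum
# spanning tree with Prim's algorithm.  Scores are hypothetical numbers.
words = ["I", "ate", "a", "pineapple"]
cpmi = [  # hypothetical symmetric CPMI scores (diagonal unused)
    [0.0, 2.5, 0.1, 0.3],
    [2.5, 0.0, 0.4, 1.9],
    [0.1, 0.4, 0.0, 2.2],
    [0.3, 1.9, 2.2, 0.0],
]

def max_spanning_tree(scores):
    n = len(scores)
    in_tree, edges = {0}, []
    while len(in_tree) < n:
        # greedily add the highest-scoring edge that connects a new node
        best = max((scores[i][j], i, j) for i in in_tree
                   for j in range(n) if j not in in_tree)
        _, i, j = best
        edges.append((i, j))
        in_tree.add(j)
    return edges

for i, j in max_spanning_tree(cpmi):
    print(f"{words[i]} -- {words[j]}")
```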

Slide 21

Results [Hoover+, 2021]
- Baseline: word2vec inner products used as PMI; a commendable baseline.
- With CPMI, precision on long-distance dependencies improved over plain PMI, although the predicted trees themselves shifted toward more short dependencies.
- Neither beats the baseline that simply links each word to its nearest neighbor (a linear, chain-shaped structure); the discussion here is a bit rough.
- The overall impression is that one cannot strongly claim that CPMI corresponds to dependency relations; how to connect information-theoretic approaches (broadly speaking, NLP) with linguistics remains an open question.
Evaluation metric: unlabeled undirected accuracy.

Slide 22

Other topics that were candidates for this talk
- The problem that LMs can still solve tasks after word-order shuffling.
- There is a claim that humans, too, are robust to word-order shuffling: Composition is the Core Driver of the Language-selective Network [Mollica+'20].
  - Reading is the activity of composing words appropriately.
  - If the words that must be composed are placed so that they can be found (i.e. near each other), reading proceeds without trouble even when word order is disrupted.
  - Treating the word pairs to be composed as high-PMI pairs, the "local PMI" of a reordered sentence relates to reading load.
  - I would have liked a baseline that lowers local PMI as much as possible using the same number of swap operations as the Scr1-7 conditions.
- So the NLPer reaction "it is strange that shuffled input can still be solved!" may not be as self-evident a claim as it seems.
A small sketch of a local-PMI score for a scrambled sentence follows.
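
A small sketch of one way to score "local PMI" for a scrambled sentence, under the assumption that it sums the PMI of composable word pairs that remain within a short window; the PMI table, window size, and exact definition are illustrative guesses rather than Mollica+'s (2020) actual measure.

```python
import itertools

# Sketch of a "local PMI" score for a (possibly scrambled) sentence: sum the
# PMI of word pairs that end up within a small window of each other.  The PMI
# table and window size are invented for illustration.
pmi = {  # hypothetical PMI values for the pairs that "should" be composed
    frozenset({"ate", "pineapple"}): 3.0,
    frozenset({"a", "pineapple"}):   1.5,
    frozenset({"I", "ate"}):         1.0,
}

def local_pmi(words, window=2):
    score = 0.0
    for i, j in itertools.combinations(range(len(words)), 2):
        if abs(i - j) <= window:
            score += pmi.get(frozenset({words[i], words[j]}), 0.0)
    return score

original  = ["I", "ate", "a", "pineapple"]
scrambled = ["pineapple", "I", "a", "ate"]   # same words, shuffled order
print(local_pmi(original), local_pmi(scrambled))
# the claim: reading load tracks how much local PMI survives the scrambling
```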

Slide 23

FAQ
- Is this NLP?
  - Even within NLP conferences, discussion is active in subareas such as "Linguistic Theories, Cognitive Modeling and Psycholinguistics." In particular, information theory and human sentence processing is a hot topic right now, because large neural language models have become available as a tool.
  - Self-promotion: I have had papers in this field accepted at top venues three years in a row (ACL long, ACL long, EMNLP long), thanks to my co-authors 🙏. It feels like a very constructive community.
  - You get to experience the interdisciplinary appeal of NLP; it broadens your repertoire and perspective.
- I am interested in this kind of research, but I don't know the keywords.
  - For NLP people, "computational psycholinguistics" is probably the closest term.
  - The approach: build computational models (e.g. language models) based on hypotheses, compare them against humans, and probe humans (ideally even the parts that cannot be observed directly) in a reverse-engineering, constructive way. (C.f. psycholinguistics, understood here as the discipline that accumulates findings by running experiments on humans themselves.)
  - "cognitive science" (a keyword about as broad as "artificial intelligence") and "information-theoretic X" are also relatively close. "computational linguistics" is nearly synonymous with NLP and therefore too broad. "cognitive modeling" is close; "cognitive linguistics" is far.
  - Related work can be found at the CMCL workshop, at CogSci, and in the Journal of Cognition.

Slide 24

References
- Smith, Nathaniel J., and Roger Levy. 2013. "The effect of word predictability on reading time is logarithmic." Cognition 128 (3): 302–19.
- Levy, Roger. 2008. "Expectation-based syntactic comprehension." Cognition 106 (3): 1126–77.
- Genzel, Dmitriy, and Eugene Charniak. 2002. "Entropy rate constancy in text." In Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics, 199–206.
- Levy, Roger, and T. Florian Jaeger. 2007. "Speakers optimize information density through syntactic reduction." In Advances in Neural Information Processing Systems, edited by B. Schölkopf, J. Platt, and T. Hoffman, 19:849–56. MIT Press.
- Hale, John. 2016. "Information-theoretical Complexity Metrics." Language and Linguistics Compass 10 (9): 397–412.
- Futrell, Richard, Peng Qian, Edward Gibson, Evelina Fedorenko, and Idan Blank. 2019. "Syntactic Dependencies Correspond to Word Pairs with High Mutual Information." In Proceedings of the Fifth International Conference on Dependency Linguistics (Depling, SyntaxFest 2019), 3–13. Paris, France: Association for Computational Linguistics.
- Mollica, Francis, Matthew Siegelman, Evgeniia Diachek, Steven T. Piantadosi, Zachary Mineroff, Richard Futrell, Hope Kean, Peng Qian, and Evelina Fedorenko. 2020. "Composition Is the Core Driver of the Language-Selective Network." Neurobiology of Language 1 (1): 104–34.