Cognitive Plausibility of Neural Language Models

Slide 1

Slide 1 text

Cognitive Plausibility of Neural Language Models
Tatsuki Kuribayashi (postdoc, Tohoku Univ.)
2022/12/6 MBZUAI

Slide 2

Slide 2 text

Research topics including collaborative works 1/2
- Computational psycholinguistics and NLP
  - Cognitive modeling using neural language models [Kuribayashi+,ACL2021] [Kuribayashi+,EMNLP2022]
  - Word preferences in language models [Kuribayashi+,ACL2020]
  - Language model training efficiency with respect to multi-modality or multi-linguality (to appear?)
  - Organizing committee of the CMCL (Cognitive Modeling and Computational Linguistics) workshop 2023 (if the proposal is accepted)
- Writing assistance
  - Writing assistance system (Langsmith) [Ito+(equal cont.),EMNLP2021(demo)]
  - Tool for developing NLP-powered editors (language server protocol) [Hagiwara+,EMNLP2019(demo)]
  - Translating rough drafts into academic-style texts [Ito+(equal cont.),INLG2019]
- Discourse processing (next slides…)
- Interpretability (next slides…)
Slide labels: Today's topic; Service page

Slide 3

Slide 3 text

Research topics including collaborative works 2/2
- Modeling discourse phenomena
  - Topicalization preferences in humans and Japanese LMs [Fujihara+,COLING2022] (2nd author)
  - Ellipsis preferences in humans and Japanese LMs (to appear?)
  - Modeling event salience in a narrative [Otake+,COLING2020]
  - Argumentation structure parsing [Kuribayashi+,ACL2019]
- Interpretability
  - Analyzing Transformers with vector norms [Kobayashi+,EMNLP2020,2021] (2nd author)
  - Chain-of-thought abilities of vanilla seq2seq [Aoki+,AACL-SRW2022 (best paper!, but non-archival)]

Slide 4

Slide 4 text

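Brief summary: Which language models (LMs) simulate human reading better?
- Setup: a measure computed by an LM on a text ("I have a pen that…"), e.g., −log 𝑝(word|context), is compared with a measure of human reading of the same text, e.g., human gaze duration. A good correlation suggests that the LM's bias may be similar to the actual humans' bias. Different LMs have different biases.
- [Figure: relationship between better PPL and human-likeness. Models: Trans-sm, LSTM, Trans-lg, + N-gram; number of updates: 100, 1000, 10000, 100000; data size: SM, MD, LG.] 😲
- [Figure: relationship between context access limitation and human-likeness. Models: LSTM-xs-Wiki, GPT2-xs-Wiki, GPT2-md-Wiki, GPT2-sm, GPT2-md, GPT2-lg, GPT2-xl.] 😲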

Slide 5

Slide 5 text

What’s the goal of natural language processing (NLP)? (Why computational psycholinguistics?)

Slide 6

Slide 6 text

What’s the goal of NLP?
- NLP: a branch of artificial intelligence focusing on language

Slide 7

Slide 7 text

Slide 8

Slide 8 text

Slide 9

Slide 9 text

What’s the goal of NLP?
- NLP: a branch of artificial intelligence focusing on language
- Go back to the definition of artificial intelligence [Shapiro, 2008]:

  "… definition may be examined more closely by considering the field from three points of view:
  1. Machine intelligence: push outwards the frontier of what we know how to program on computers, especially in the direction of tasks that, although we don’t know how to program them, people can perform.
  2. Computational philosophy: form a computational understanding of human-level intelligent behavior, without being restricted to the algorithms and data structures that the human mind actually does (or conceivably might) use."
  (if human-level intelligence is implementable on a computer by any means)

Speaker's notes: progressed (I believe), e.g., machine translation, information retrieval…; progressed (I believe), e.g., gigantic language models.

Slide 10

Slide 10 text

Slide 11

Slide 11 text

Case 1: Enhancing feasibility of psycholinguistic research
- NLP: a branch of artificial intelligence focusing on language
- Go back to the definition of artificial intelligence [Shapiro, 2008]:

  "… definition may be examined more closely by considering the field from three points of view:
  1. Machine intelligence: push outwards the frontier of what we know how to program on computers, especially in the direction of tasks that, although we don’t know how to program them, people can perform.
  2. Computational philosophy: form a computational understanding of human-level intelligent behavior, without being restricted to the algorithms and data structures that the human mind actually does (or conceivably might) use.
  3. Computational psychology: understand human intelligent behavior by creating computer programs that behave in the same way that people do. For this goal it is important that the algorithm expressed by the program be the same algorithm that people actually use, and the data structure…"
  (if human-level intelligence is implementable on a computer by any means)

Speaker's notes: progressed (I believe), e.g., machine translation, information retrieval…; progressed (I believe), e.g., gigantic language models; computational psychology is an often unstated, but pivotal goal.

Q. What is the key to humans’ efficient language acquisition? Are some innate biases involved?
- Raising children without any language experience from birth, then… [Coulton, 1972]: ethical issues

Slide 12

Slide 12 text

Slide 13

Slide 13 text

Slide 14

Slide 14 text

Slide 15

Slide 15 text

Case 2: Exact simulation of humans in application
- Need feedback about my writing
- Need an expected (human) reader

Suppose…
- Writer: "In what part of this text might readers find it difficult to follow?"
- Super robust model magically(?) inferring the writer's intention: "Okay, first, I compute self-attention over the full text…" "Ah…, the flow of the first paragraph is difficult to follow."

Slide 16

Slide 16 text

Case 2: Exact simulation of humans in application
- Need feedback about my writing
- Need an expected (human) reader

Suppose…
- Writer: "In what part of this text might readers find it difficult to follow?"
- Super robust model magically(?) inferring the writer's intention: "Okay, first, I compute self-attention over the full text…" "Ah…, the flow of the first paragraph is difficult to follow."
- Model showing the same processing difficulty humans do (labeled "human-like model" and "compatible" on the slide): "the definition of t is written in …Section 4, and u is in Section 2, ... ah….!"

Slide 17

Slide 17 text

Case 3: Measuring the texts
- Modeling humans using modern NLP
  - Theory: people try to transmit a constant amount of information across time
- Technical issues: "Current techniques are not very good at estimating H(word|context), because we do not have a very good model of context,…" [Genzel and Charniak, 2002]
- −log 𝑝(word|context)?

Slide 18

Slide 18 text

Case 3: Measuring the texts
- Modeling humans using modern NLP
  - Theory: people try to transmit a constant amount of information across time
  - Modern approach: observing the surprisal −log 𝑝(word|context) computed by a neural LM, e.g., for "I have a pen that…"
- Technical issues: "Current techniques are not very good at estimating H(word|context), because we do not have a very good model of context,…" [Genzel and Charniak, 2002]
- −log 𝑝(word|context)?
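As a rough illustration of what "observing the surprisal" means, here is a minimal sketch that computes per-word surprisal in bits. It swaps the neural LM for a toy add-one-smoothed bigram model so that it runs without any dependencies; `TOY_CORPUS`, `train_bigram`, and `sentence_surprisals` are hypothetical names introduced only for this example.

```python
import math
from collections import Counter

def train_bigram(corpus):
    """Count bigrams and their left contexts over whitespace-tokenized sentences."""
    context_counts, bigram_counts = Counter(), Counter()
    for sentence in corpus:
        tokens = ["<s>"] + sentence.split()
        for prev, word in zip(tokens, tokens[1:]):
            context_counts[prev] += 1
            bigram_counts[(prev, word)] += 1
    vocab = {w for s in corpus for w in s.split()}
    return context_counts, bigram_counts, vocab

def surprisal(word, prev, context_counts, bigram_counts, vocab):
    """Return -log2 p(word | prev) with add-one smoothing (assumption: toy model)."""
    p = (bigram_counts[(prev, word)] + 1) / (context_counts[prev] + len(vocab))
    return -math.log2(p)

def sentence_surprisals(sentence, model):
    """Pair every word of a sentence with its surprisal given the previous word."""
    tokens = ["<s>"] + sentence.split()
    return [(w, surprisal(w, prev, *model)) for prev, w in zip(tokens, tokens[1:])]

# Hypothetical toy corpus; a real study would use a trained neural LM instead.
TOY_CORPUS = ["I have a pen", "I have a pen that writes", "I have an apple"]
MODEL = train_bigram(TOY_CORPUS)

for word, bits in sentence_surprisals("I have a pen that writes", MODEL):
    print(f"{word}\t{bits:.2f} bits")
```

With a neural LM, only the probability estimate changes; the quantity compared with human reading data is still the negative log-probability of each word given its context.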

Slide 19

Slide 19 text

What model computes human-like surprisal?
- Surprisal −log 𝑝𝜽(word|context) simulates human reading behavior well [Levy,2008][Smith&Levy,2013]
- [Figure: for the same text ("I have a pen that…"), modern NLP yields, e.g., surprisal −log 𝑝(word|context), and humans yield, e.g., human gaze duration.]

Slide 20

Slide 20 text

What model computes human-like surprisal?
- Surprisal values −log 𝑝𝜽(word|context) computed by different models 𝜽 are compared [Wilcox+,2020]…
- Too many LM variants 🤗
- [Figure: for the same text ("I have a pen that…"), modern NLP yields, e.g., surprisal −log 𝑝(word|context), and humans yield, e.g., human gaze duration.]
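The comparison across models can be sketched as follows. This is a simplification: it ranks models by the plain Pearson correlation between their per-word surprisal and gaze duration, whereas published studies typically use regression-based measures. All numbers and the names `GAZE_MS`, `SURPRISAL_BY_MODEL`, and `rank_models` are hypothetical and exist only to make the example run.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical per-word gaze durations (ms) for one short text.
GAZE_MS = [180.0, 210.0, 250.0, 200.0, 320.0]

# Hypothetical per-word surprisal (bits) from two made-up models.
SURPRISAL_BY_MODEL = {
    "model_A": [2.0, 3.1, 4.5, 2.8, 7.0],
    "model_B": [3.0, 5.0, 2.0, 6.0, 4.0],
}

def rank_models(surprisal_by_model, gaze):
    """Sort models from most to least correlated with human gaze duration."""
    scores = {name: pearson(vals, gaze) for name, vals in surprisal_by_model.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

for name, r in rank_models(SURPRISAL_BY_MODEL, GAZE_MS):
    print(f"{name}: r = {r:.3f}")
```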

Slide 21

Slide 21 text

What model computes human-like surprisal?
- E.g., hierarchical bias vs. sequential bias [Hale+,ACL2018 (best paper)][Yoshida+,EMNLP2021]
- [Figure: models A, B, and C: a vanilla (sequential) language model, a model that says "I know grammar α", and a model that says "I know grammar β". The last shows better correlation, so grammar β is likely related to human sentence processing.]

Slide 22

Slide 22 text

Does scaling solve cognitive modeling?
- Does the scaling law hold even in modeling human reading behavior? [Kuribayashi+, ACL21]
- [Figure: scaling ∝ cognitively plausible model?]

Slide 23

Slide 23 text

Does scaling solve cognitive modeling?
- Does the scaling law hold even in modeling human reading behavior? [Kuribayashi+, ACL21]
- Background: scaling law for language model performance (perplexity; PPL) [Kaplan+, 2020]
- [Figure: scaling ∝ cognitively plausible model?]
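Perplexity and surprisal are two views of the same quantity: PPL is the exponentiated mean surprisal over a text. The sketch below assumes surprisal is measured in bits (base-2 logarithm); the function name `perplexity` is introduced here for illustration.

```python
def perplexity(surprisals_bits):
    """Perplexity as 2 raised to the mean per-word surprisal (in bits)."""
    if not surprisals_bits:
        raise ValueError("need at least one surprisal value")
    return 2 ** (sum(surprisals_bits) / len(surprisals_bits))

# A model that spreads probability uniformly over 8 words assigns
# 3 bits of surprisal to every word, so its perplexity equals 8.
print(perplexity([3.0, 3.0, 3.0]))
```

A lower PPL therefore means lower surprisal on average, which is why "better PPL" and "human-like surprisal" can be compared directly in the following slides.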

Slide 24

Slide 24 text

Does scaling solve cognitive modeling?
- Language-dependent results [Kuribayashi+, ACL21]
- [Figure: better PPL vs. better gaze duration modeling (better/worse), English. Models: Trans-sm, LSTM, Trans-lg, + N-gram; number of updates: 100, 1000, 10000, 100000; data size: SM, MD, LG.]
- English: Spearman’s r = -0.87

Slide 25

Slide 25 text

Does scaling solve cognitive modeling?
- Language-dependent results [Kuribayashi+, ACL21]
- [Figure: better PPL vs. better gaze duration modeling (better/worse), English and Japanese. Models: Trans-sm, LSTM, Trans-lg, + N-gram; number of updates: 100, 1000, 10000, 100000; data size: SM, MD, LG.]
- English: Spearman’s r = -0.87
- Japanese: r = 0.53 (scaling law breaks)
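The statistic reported above, Spearman's rank correlation, is the Pearson correlation of the ranks of two variables. A minimal dependency-free sketch, assuming tied values receive their average rank; the helper names are introduced for this example and no values from the study are reproduced:

```python
import math

def average_ranks(values):
    """Return 1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def spearman(xs, ys):
    """Spearman's rank correlation: Pearson correlation computed on ranks."""
    return pearson(average_ranks(xs), average_ranks(ys))

# Hypothetical usage: one (PPL, gaze-modeling score) pair per trained LM.
hypothetical_ppl = [300.0, 150.0, 90.0, 60.0]
hypothetical_score = [1.0, 2.5, 3.0, 4.2]
print(spearman(hypothetical_ppl, hypothetical_score))
```

Because only ranks matter, the statistic captures whether lower PPL goes along with better gaze duration modeling without assuming a linear relationship.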

Slide 26

Slide 26 text

Human-like slowdowns (speed-ups) diminished (Japanese) [Kuribayashi+, ACL21]
- SOV language arguably incurs non-uniform processing cost across tokens
- [Figure: change of gaze duration (ms) by position in sentence, for the Dundee Corpus (English) and BCCWJ-EyeTrack (Japanese); smooth terms s(tokenN_in_sent,3.7) and s(tokenN,2.62). Slide labels: stats. in toy corpus [Maurits+, 2010]; reading time stats. [Kuribayashi+, 2021].]

Slide 27

Slide 27 text

Human-like slowdowns (speed-ups) diminished [Kuribayashi+, ACL21]
- Human-like variations of surprisal diminished
- [Figure: effect of syntactic category. Models: Trans-sm, LSTM, Trans-lg, + N-gram; number of updates: 100, 1000, 10000, 100000; data size: SM, MD, LG.]
- With better PPL, the by-word category variation of surprisal diminished

Slide 28

Slide 28 text

Slide 29

Slide 29 text

Similar reports: human-like slowdowns diminished
- Processing difficulty in syntactic violation differs between LMs and humans [Wilcox+, ACL21]
  - LMs under-predict the difficulty
- Example pair:
  - I know that my mother sent the present to Taylor last weekend.
  - I know who my mother sent the present to Taylor last weekend.
- [Figure: "Predicted vs. Observed Slowdown Between Conditions"; panels: gpt2, rnng, jrnn, grnn, human; x-axis: slowdown in milliseconds; y-axis: test suite (Cleft, FGD-obj, FGD-pp, FGD-sbj, MVRR, NPL-any-orc, NPL-any-src, NPL-ever-orc, NPL-ever-src, RNA-f-orc, RNA-f-src, RNA-m-orc, RNA-m-src, SVNA-orc, SVNA-pp, SVNA-src).]
- LMs processed the text too smoothly 😌; humans are more surprised 😵💫

Slide 30

Slide 30 text

Cognitive plausibility of noisy language models
- Tested LMs with limited context access [Kuribayashi+,EMNLP22]
- Example: … people wearing a red hat come …
- [Figure: better gaze duration modeling vs. more severe noise, English and Japanese. Models: LSTM-xs-Wiki, GPT2-xs-Wiki, GPT2-md-Wiki, GPT2-sm, GPT2-md, GPT2-lg, GPT2-xl.]

Slide 31

Slide 31 text

Cognitive plausibility of noisy language models
- Tested LMs with limited context access [Kuribayashi+, EMNLP22]
- Theories: human context access during syntactic processing is limited
  - this creates pressure to avoid long dependencies and deep nesting in natural language
- Example: … people wearing a red hat come …
- [Figure: better gaze duration modeling vs. more severe noise, English and Japanese. Models: LSTM-xs-Wiki, GPT2-xs-Wiki, GPT2-md-Wiki, GPT2-sm, GPT2-md, GPT2-lg, GPT2-xl.]
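One simple way to picture limited context access is to let the model see only the last few tokens before the target word. The slides do not specify the noise that was actually applied, so the truncation below is an assumption, and the count-based model with add-one smoothing is a stand-in for the neural LMs; `CORPUS`, `train_counts`, and `keep_last` are hypothetical names.

```python
import math
from collections import Counter, defaultdict

def train_counts(corpus, max_ctx):
    """Map every context suffix of up to max_ctx tokens to next-word counts."""
    counts = defaultdict(Counter)
    for sentence in corpus:
        tokens = sentence.split()
        for i, word in enumerate(tokens):
            for n in range(min(i, max_ctx) + 1):
                counts[tuple(tokens[i - n:i])][word] += 1
    return counts

def surprisal(word, context, counts, vocab_size, keep_last):
    """-log2 p(word | last keep_last tokens), backing off to shorter contexts."""
    ctx = tuple(context[-keep_last:]) if keep_last > 0 else ()
    while ctx and ctx not in counts:
        ctx = ctx[1:]  # drop the oldest token until the context was observed
    next_words = counts.get(ctx, Counter())
    p = (next_words[word] + 1) / (sum(next_words.values()) + vocab_size)
    return -math.log2(p)

# Hypothetical toy corpus built around the example on the slide.
CORPUS = ["people wearing a red hat come", "a man wearing a red hat comes"]
VOCAB_SIZE = len({w for s in CORPUS for w in s.split()})
COUNTS = train_counts(CORPUS, max_ctx=5)

context = "people wearing a red hat".split()
for keep_last in (5, 1):
    bits = surprisal("come", context, COUNTS, VOCAB_SIZE, keep_last)
    print(f"keep_last={keep_last}: {bits:.2f} bits")
```

In this toy setting, the model that can still see "people" finds "come" less surprising than the model that sees only "hat", which is the kind of difference a context limitation introduces into surprisal estimates.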

Slide 32

Slide 32 text

Summary
- There are at least three mindsets in NLP: machine intelligence, computational philosophy, and computational psychology
  - they are not mutually exclusive
  - this talk does not argue which mindset is correct
- A model that is good from an engineering standpoint is not always human-like (at least in our settings)
  - this does not mean that the current NLP directions are wrong
  - when achieving something, simply replicating nature is not always a good idea (e.g., an airplane does not flap its wings as birds do)
- Understanding humans will continue to be a challenging goal
  - at least, scaling does not solve it
https://www.lesswrong.com/posts/eqxqgFxymP8hXDTt5/announcing-the-inverse-scaling-prize-usd250k-prize-pool