Cognitive Plausibility of Neural Language Models

Tatsuki Kuribayashi, Tohoku Univ. (postdoc)
2022/12/6 MBZUAI

Research topics including collaborative works 1/2
- Computational psycholinguistics and NLP (today's topic)
  - Cognitive modeling using neural language models [Kuribayashi+,ACL2021] [Kuribayashi+,EMNLP2022]
  - Word preferences in language models [Kuribayashi+,ACL2020]
  - Language model training efficiency with respect to multi-modality or multi-linguality (to appear?)
  - Organizing committee of CMCL (Cognitive Modeling and Computational Linguistics) workshop 2023 (if the proposal is accepted)
- Writing assistance
  - Writing assistance system (Langsmith) [Ito+(equal cont.),EMNLP2021(demo)] (service page)
  - Tool for developing NLP-powered editor (language server protocol) [Hagiwara+,EMNLP2019(demo)]
  - Translating rough drafts into academic-style texts [Ito+(equal cont.),INLG2019]
- Discourse processing (next slides…)
- Interpretability (next slides…)

Research topics including collaborative works 2/2
- Modeling discourse phenomena
  - Topicalization preferences in humans and Japanese LMs [Fujihara+,COLING2022] (2nd author)
  - Ellipsis preferences in humans and Japanese LMs (to appear?)
  - Modeling event salience in a narrative [Otake+,COLING2020]
  - Argumentation structure parsing [Kuribayashi+,ACL2019]
- Interpretability
  - Analyzing Transformers with vector norms [Kobayashi+,EMNLP2020,2021] (2nd author)
  - Chain-of-thought abilities of vanilla seq2seq [Aoki+,AACL-SRW2022 (best paper!, but non-archival)]

Brief summary: Which language models (LMs) simulate human reading better?
- Setup: for the same text (e.g., "I have a pen that…"), compare an LM measure (e.g., −log p(word|context)) with a human measure (e.g., human gaze duration)
- Good correlation: the LM's bias may be similar to the actual humans' bias; otherwise the two have different biases
- [Figure: better PPL vs. human-likeness. Models: Trans-sm, LSTM, Trans-lg, N-gram; data size: SM, MD, LG; number of updates: 100, 1000, 10000, 100000.]
- [Figure: context access limitation vs. human-likeness. Models: LSTM-xs-Wiki, GPT2-xs-Wiki, GPT2-md-Wiki, GPT2-sm, GPT2-md, GPT2-lg, GPT2-xl.]

What's the goal of natural language processing (NLP)?
(Why computational psycholinguistics?)

What's the goal of NLP?
- NLP: a branch of artificial intelligence focusing on language
- Go back to the definition of artificial intelligence [Shapiro, 2008]: "… definition may be examined more closely by considering the field from three points of view:"
  1. Machine intelligence: push outwards the frontier of what we know how to program on computers, especially in the direction of tasks that, although we don't know how to program them, people can perform.
     - progressed (I believe). E.g., machine translation, information retrieval…
  2. Computational philosophy: form a computational understanding of human-level intelligent behavior, without being restricted to the algorithms and data structures that the human mind actually does (or conceivably might) use. (if human-level intelligence is implementable on a computer by any means)
     - progressed (I believe). E.g., gigantic language models
  3. Computational psychology: understand human intelligent behavior by creating computer programs that behave in the same way that people do. For this goal it is important that the algorithm expressed by the program be the same algorithm that people actually use, and the data structure…
     - often unstated, but pivotal goal

Case 1: Enhancing feasibility of psycholinguistic research
- Q. What's the key to humans' efficient language acquisition? Are some innate biases involved?
- Direct experiment: raising children without any language experience from birth, then… [Coulton, 1972]
  - Not feasible: ethical issues
- Computational simulation of human language acquisition: train language models in a setting as close as possible to the human language acquisition environment (e.g., multi-modal input), then identify which factors were important [Warstadt&Bowman, 2022]
  - Feasible

Case 2: Exact simulation of humans in application
- Need feedback about my writing
- Need an expected (human) reader
- Suppose the writer asks: "In which part of this text might readers find it difficult to follow?"
  - Human reader: "Ah…, the flow of the first paragraph is difficult to follow"
  - Super robust model magically(?) inferring the writer's intention: "Okay, first, I compute self-attention over the full text…"
  - Human-like model showing the same processing difficulty humans do: "the definition of t is written in …Section 4, and u is in Section 2, ... ah….!"
- The human-like model is the one compatible with the human reader

Case 3: Measuring the texts
- Modeling humans using modern NLP
  - Theory: people try to transmit a constant amount of information across time
  - Technical issues: "Current techniques are not very good at estimating H(word|context), because we do not have a very good model of context,…" [Genzel and Charniak, 2002]
  - Modern approach: observing the surprisal −log p(word|context) computed by a neural LM (e.g., for each word of "I have a pen that…")
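As a rough illustration of what "observing the surprisal −log p(word|context)" means, the sketch below computes per-word surprisal in bits. It is only a sketch under stated assumptions: the talk uses neural LMs, whereas this example swaps in a toy add-one-smoothed bigram counter so that it runs without any dependencies. The function names `train_bigram` and `surprisals` and the two-sentence corpus are hypothetical.

```python
import math
from collections import Counter


def train_bigram(sentences):
    """Count contexts and bigrams over whitespace-tokenized sentences.

    Assumption: a toy bigram model stands in for the neural LM of the talk.
    """
    context_counts, bigram_counts = Counter(), Counter()
    vocab = set()
    for sent in sentences:
        tokens = ["<s>"] + sent.split()
        vocab.update(tokens)
        for prev, word in zip(tokens, tokens[1:]):
            context_counts[prev] += 1
            bigram_counts[(prev, word)] += 1
    return context_counts, bigram_counts, vocab


def surprisals(sentence, context_counts, bigram_counts, vocab):
    """Return (word, -log2 p(word | previous word)) pairs, add-one smoothed."""
    tokens = ["<s>"] + sentence.split()
    out = []
    for prev, word in zip(tokens, tokens[1:]):
        p = (bigram_counts[(prev, word)] + 1) / (context_counts[prev] + len(vocab))
        out.append((word, -math.log2(p)))
    return out


# Hypothetical toy corpus, for illustration only.
model = train_bigram(["I have a pen", "I have a dog"])
per_word = surprisals("I have a pen", *model)
for word, bits in per_word:
    print(f"{word}\t{bits:.2f} bits")
```

With a neural LM the only change in principle is where p(word|context) comes from; the surprisal itself is still the negative log of that probability.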

What model computes human-like surprisal?
- Surprisal −log p_θ(word|context) simulates human reading behavior well [Levy,2008][Smith&Levy,2013]
- Setup: for the same text (e.g., "I have a pen that…"), compare the output of modern NLP (e.g., surprisal −log p(word|context)) with human behavior (e.g., human gaze duration)

What model computes human-like surprisal?
- Surprisals −log p_θ(word|context) computed by different models θ are compared against human behavior (e.g., human gaze duration) on the same text [Wilcox+,2020]…
- Too many LM variants 🤗
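One simple way to compare models θ is to ask how much each model's surprisal improves a regression that predicts gaze duration. The sketch below is an assumption-laden simplification: it fits a one-predictor least-squares line and reports the per-token gain in Gaussian log-likelihood over an intercept-only baseline. The cited papers may use richer regression models with additional control variables; the function names and all numbers here are invented for illustration and are not results from any study.

```python
import math


def gaussian_loglik(residuals):
    """Log-likelihood of residuals under a Gaussian with ML-estimated variance."""
    n = len(residuals)
    var = sum(r * r for r in residuals) / n
    return -0.5 * n * (math.log(2 * math.pi * var) + 1)


def delta_loglik(surprisal, gaze_ms):
    """Per-token log-likelihood gain of gaze ~ surprisal over gaze ~ 1."""
    n = len(gaze_ms)
    mean_x = sum(surprisal) / n
    mean_y = sum(gaze_ms) / n
    sxx = sum((x - mean_x) ** 2 for x in surprisal)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(surprisal, gaze_ms))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    base_res = [y - mean_y for y in gaze_ms]
    full_res = [y - (intercept + slope * x) for x, y in zip(surprisal, gaze_ms)]
    return (gaussian_loglik(full_res) - gaussian_loglik(base_res)) / n


# Made-up gaze durations (ms) and surprisals from two hypothetical models.
gaze = [200, 230, 250, 310, 280]
model_a = [2, 3, 4, 6, 5]   # tracks gaze closely
model_b = [5, 2, 6, 3, 4]   # barely related to gaze
score_a = delta_loglik(model_a, gaze)
score_b = delta_loglik(model_b, gaze)
print(f"model A: {score_a:.3f}  model B: {score_b:.3f}")
```

A larger gain means the model's surprisal explains more of the variation in gaze duration, which is the sense in which one model is "more human-like" than another in this sketch.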

What model computes human-like surprisal?
- E.g., hierarchical bias vs. sequential bias [Hale+,ACL2018 (best paper)][Yoshida+,EMNLP2021]
- Compare models A, B, C: a vanilla (sequential) language model, a model that knows grammar α, and a model that knows grammar β
- If the model with grammar β yields the better correlation, grammar β is likely related to human sentence processing

Does scaling solve the cognitive modeling?
- Does the scaling law work even in modeling human reading behavior? [Kuribayashi+, ACL21]
  - scaling ∝ cognitively plausible model?
- Background: scaling law for language model performance (perplexity; PPL) [Kaplan+, 2020]
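For reference, perplexity and surprisal are two views of the same quantity: PPL is the exponentiated average surprisal over a text. The notation below (tokens w_1 … w_N, model parameters θ, natural logarithm) is an assumption chosen to match the −log p_θ(word|context) used elsewhere in the talk.

```latex
\mathrm{PPL}(w_1, \dots, w_N)
  = \exp\!\left( \frac{1}{N} \sum_{i=1}^{N} -\log p_{\theta}(w_i \mid w_{<i}) \right)
```

So a model with better (lower) PPL assigns lower surprisal on average, which is why it is a fair question whether it also predicts human reading behavior better.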

Does scaling solve the cognitive modeling?
- Language-dependent results [Kuribayashi+, ACL21]
  - English: Spearman's r = -0.87
  - Japanese: r = 0.53 (scaling law breaks)
- [Figure: gaze duration modeling (better/worse) against PPL, one panel each for English and Japanese. Models: Trans-sm, LSTM, Trans-lg, N-gram; data size: SM, MD, LG; number of updates: 100, 1000, 10000, 100000.]
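The Spearman's r values on this slide are rank correlations computed across models. As a minimal sketch of that statistic, the code below ranks two lists and applies the textbook no-ties formula. The function names are hypothetical and the PPL and gaze-modeling scores are invented toy numbers, not values from the paper.

```python
def ranks(values):
    """Rank values from 1 (smallest) to n (largest); assumes no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result


def spearman(xs, ys):
    """Spearman's rank correlation via 1 - 6*sum(d^2) / (n*(n^2 - 1)); no ties."""
    n = len(xs)
    d2 = sum((a - b) ** 2 for a, b in zip(ranks(xs), ranks(ys)))
    return 1 - 6 * d2 / (n * (n * n - 1))


# Invented toy values for five hypothetical models.
ppl = [300, 150, 90, 60, 40]               # lower is better
gaze_modeling = [0.8, 1.5, 1.1, 2.0, 2.4]  # higher is better
rho = spearman(ppl, gaze_modeling)
print(f"Spearman's r = {rho:.2f}")
```

In this toy data the correlation is strongly negative, meaning lower PPL goes with better gaze duration modeling; a positive value would mean the opposite trend.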

Human-like slow downs (speed up) diminished [Kuribayashi+, ACL21]
- SOV language (Japanese) arguably incurs non-uniform processing cost across tokens
  - stats. in toy corpus [Maurits+, 2010]
  - reading time stats. [Kuribayashi+, 2021]
- [Figure: change of gaze duration (ms) by position in sentence, one panel each for Dundee Corpus (English) and BCCWJ-EyeTrack (Japanese).]

Human-like slow downs (speed up) diminished [Kuribayashi+, ACL21]
- Human-like variations of surprisal diminished
  - with better PPL, the by-word category variation of surprisal diminished
- [Figure: effect of syntactic category against PPL. Models: Trans-sm, LSTM, Trans-lg, N-gram; data size: SM, MD, LG; number of updates: 100, 1000, 10000, 100000.]

Similar reports: human-like slow downs diminished
- Processing difficulty in syntactic violation differs between LMs and humans [Wilcox+, ACL21]
  - LMs under-predict the difficulty
  - 😌 I know that my mother sent the present to Taylor last weekend.
  - 😵‍💫 I know who my mother sent the present to Taylor last weekend.
- LMs processed the text too smoothly; humans are more surprised
- [Figure: "Predicted vs. Observed Slowdown Between Conditions". Slowdown in milliseconds (0 to 400) per test suite (Cleft, FGD-obj, FGD-pp, FGD-sbj, MVRR, NPL-any-orc, NPL-any-src, NPL-ever-orc, NPL-ever-src, RNA-f-orc, RNA-f-src, RNA-m-orc, RNA-m-src, SVNA-orc, SVNA-pp, SVNA-src), one panel each for gpt2, rnng, jrnn, grnn, and human.]

Cognitive plausibility of noisy language model
- Tested LMs with limited context access [Kuribayashi+, EMNLP22]
  - e.g., "… people wearing a red hat come …"
- Theories: human context access during syntactic processing is limited
  - this creates pressure to avoid long dependencies/deep nesting in natural language
- [Figure: gaze duration modeling (better is up) against noise severity (more severe noise to the right), one panel each for English and Japanese. Models: LSTM-xs-Wiki, GPT2-xs-Wiki, GPT2-md-Wiki, GPT2-sm, GPT2-md, GPT2-lg, GPT2-xl.]
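To make "limited context access" concrete, the sketch below shows one simple way to restrict what a model sees before it predicts each word: hard truncation to the last few tokens. This is an assumption for illustration only; the slide does not spell out how the cited work implements the limitation or its noise, and the function name `truncated_contexts` is hypothetical.

```python
def truncated_contexts(tokens, max_context):
    """Yield (visible_context, target) pairs.

    Only the last `max_context` tokens before each target stay visible;
    anything earlier is dropped (a hard-truncation assumption).
    """
    for i, target in enumerate(tokens):
        start = max(0, i - max_context)
        yield tokens[start:i], target


tokens = "people wearing a red hat come".split()
pairs = list(truncated_contexts(tokens, max_context=2))
for context, target in pairs:
    print(f"{' '.join(context) or '(empty)'} -> {target}")
```

With a window of two tokens, the verb "come" is predicted from "red hat" alone, so its subject "people" is out of reach. Surprisal would then be computed as −log p(target | visible_context) instead of using the full context.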

Summary
- There are at least three mindsets in NLP: machine intelligence, computational philosophy, and computational psychology
  - they are not mutually exclusive
  - this talk is not about which mindset is correct
- A model that is good from an engineering standpoint is not always human-like (at least in our settings)
  - this does not mean that the current NLP directions are wrong
  - when achieving something, simply replicating nature is not always a good idea (e.g., an airplane does not flap its wings as birds do)
- Understanding humans will continue to be a challenging goal
  - at least scaling does not solve it
  - https://www.lesswrong.com/posts/eqxqgFxymP8hXDTt5/announcing-the-inverse-scaling-prize-usd250k-prize-pool
