• NLP
  - Cognitive modeling using neural language models [Kuribayashi+, ACL2021] [Kuribayashi+, EMNLP2022] (today's topic)
  - Word preferences in language models [Kuribayashi+, ACL2020]
  - Language model training efficiency with respect to multi-modality or multi-linguality (to appear?)
  - Organizing committee of the CMCL (Cognitive Modeling and Computational Linguistics) workshop 2023 (if the proposal is accepted)
• Writing assistance
  - Writing assistance system (Langsmith) [Ito+ (equal cont.), EMNLP2021 (demo)] (service page)
  - Tool for developing NLP-powered editors (language server protocol) [Hagiwara+, EMNLP2019 (demo)]
  - Translating rough drafts into academic-style texts [Ito+ (equal cont.), INLG2019]
• Discourse processing (next slides…)
• Interpretability (next slides…)
2022/12/6 MBZUAI
Research topics including collaborative works 2/2
  - … LMs [Fujihara+, COLING2022] (2nd author)
  - Ellipsis preferences in humans and Japanese LMs (to appear?)
  - Modeling event salience in a narrative [Otake+, COLING2020]
  - Argumentation structure parsing [Kuribayashi+, ACL2019]
• Interpretability
  - Analyzing Transformers with vector norms [Kobayashi+, EMNLP2020, 2021] (2nd author)
  - Chain-of-thought abilities of vanilla seq2seq [Aoki+, AACL-SRW2022 (best paper!, but non-archival)]
[Figure: schematic and plots. A text ("I have a pen that…") is scored by a language model (e.g., −log 𝑝(word|context)) and read by humans (e.g., human gaze duration). Annotations: with a good correlation, the model's bias may be similar to the actual humans' bias; otherwise, different biases 😲. Plot labels: "better PPL" and "human-like" (legend: Model = LSTM, Trans-sm, Trans-lg, + N-gram; Number of updates = 100, 1000, 10000, 100000; Data size = SM, MD, LG); "context access limitation" and "human-like" (models: LSTM-xs-Wiki, GPT2-xs-Wiki, GPT2-md-Wiki, GPT2-sm, GPT2-md, GPT2-lg, GPT2-xl).]
artificial intelligence focusing on language
• Go back to the definition of artificial intelligence [Shapiro, 2008]:
  "… definition may be examined more closely by considering the field from three points of view:
  1. Machine intelligence---push outwards the frontier of what we know how to program on computers, especially in the direction of tasks that, although we don't know how to program them, people can perform.
     → progressed (I believe). E.g., machine translation, information retrieval…
  2. Computational philosophy---form a computational understanding of human-level intelligent behavior, without being restricted to the algorithms and data structures that the human mind actually does (or conceivably might) use.
     → progressed (I believe), if human-level intelligence is implementable on a computer by any means. E.g., gigantic language models
  3. Computational psychology---understand human intelligent behavior by creating computer programs that behave in the same way that people do. For this goal it is important that the algorithm expressed by the program be the same algorithm that people actually use, and the data structure…"
     → often unstated, but pivotal goal
• Q. What's the key to humans' efficient language acquisition? Are some innate biases involved?
  - Raising children without any language experience from birth, then… [Coulton, 1972] → ethical issues
  - As a computational simulation of human language acquisition, train language models in a setting as close as possible to the human language acquisition environment (e.g., multi-modal input), then identify which factors are important [Warstadt&Bowman, 2022] → feasible
about my writing
• Need an expected (human) reader
  - "In which part of this text might readers find it difficult to follow?"
  - "Ah…, the flow of the first paragraph is difficult to follow."
• Suppose…
  - Super robust model magically(?) inferring the writer's intention: "Okay, first, I compute self-attention over the full text…"
  - Model showing the same processing difficulty humans do: "The definition of t is written in … Section 4, and u is in Section 2, ... ah….!" → human-like model (compatible)
Case3: Measuring the texts
… to transmit a constant amount of information across time
  - "Current techniques are not very good at estimating H(word|context), because we do not have a very good model of context,…" [Genzel and Charniak, 2002] → technical issues: −log 𝑝(word|context)?
  - Modern approach: observing the surprisal −log 𝑝(word|context) computed by a neural LM, e.g., for each word of "I have a pen that…"
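The surprisal −log 𝑝(word|context) can be illustrated with a minimal sketch. It assumes a toy add-one-smoothed bigram model in place of the neural LM on the slide, so the context is only the previous word. The corpus, the function names (`train_bigram`, `cond_prob`, `surprisals`), and the `<s>`/`</s>` boundary tokens are all hypothetical and chosen for illustration.

```python
import math
from collections import Counter

def train_bigram(sentences):
    """Collect context counts, bigram counts, and the vocabulary."""
    contexts, bigrams, vocab = Counter(), Counter(), set()
    for sentence in sentences:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        for prev, word in zip(tokens, tokens[1:]):
            contexts[prev] += 1
            bigrams[(prev, word)] += 1
            vocab.add(word)
    return contexts, bigrams, vocab

def cond_prob(word, prev, model):
    """Add-one smoothed p(word | prev); words outside the toy vocabulary are not handled."""
    contexts, bigrams, vocab = model
    return (bigrams[(prev, word)] + 1) / (contexts[prev] + len(vocab))

def surprisals(sentence, model):
    """Return (word, surprisal in bits) for each word of the sentence."""
    tokens = ["<s>"] + sentence.split()
    return [(word, -math.log2(cond_prob(word, prev, model)))
            for prev, word in zip(tokens, tokens[1:])]

# Toy corpus, made up for illustration only.
toy_corpus = ["I have a pen", "I have a dog", "I have an idea"]
model = train_bigram(toy_corpus)
for word, bits in surprisals("I have a pen", model):
    print(f"{word}\t{bits:.2f}")
```

With a neural LM, the same per-word quantity would be read off the model's next-token distribution given the full preceding context rather than only the previous word.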
simulated human reading behavior
  - Text: "I have a pen that…"
  - Modern NLP: e.g., surprisal −log 𝑝(word|context)
  - Humans: e.g., human gaze duration
[Levy, 2008] [Smith&Levy, 2013]
by different models 𝜽 are compared
  - Text: "I have a pen that…"
  - Modern NLP: e.g., surprisal −log 𝑝(word|context)
  - Humans: e.g., human gaze duration
  - Too many LM variants 🤗 [Wilcox+, 2020]…
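A minimal sketch of comparing models by how well their surprisal accounts for gaze duration. It assumes a plain least-squares fit of gaze duration on surprisal, scored by R², which is a simplification swapped in here and not the regression setup of the cited papers. All numbers and the names `r_squared`, `model_A`, and `model_B` are made up for illustration.

```python
def r_squared(xs, ys):
    """R^2 of the least-squares line ys ~ a + b * xs (inputs must not be constant)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # For a one-predictor linear fit, R^2 equals the squared Pearson correlation.
    return (sxy ** 2) / (sxx * syy)

# Hypothetical per-word gaze durations (ms) and per-word surprisals from two models.
# These are illustrative values, not results from any paper.
gaze_ms = [210.0, 250.0, 230.0, 320.0, 400.0]
surprisal_by_model = {
    "model_A": [2.1, 3.0, 2.5, 5.2, 7.9],
    "model_B": [4.0, 1.5, 6.3, 2.2, 3.1],
}

fit = {name: r_squared(s, gaze_ms) for name, s in surprisal_by_model.items()}
best = max(fit, key=fit.get)
for name, score in fit.items():
    print(f"{name}: R^2 = {score:.3f}")
print("closest to the human data:", best)
```

In practice, such comparisons usually also control for baseline factors such as word length and frequency, which this sketch omits.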
sequential bias
  - Models compared (model A, model B, model C): a vanilla (sequential) language model, a model that "knows grammar α", and a model that "knows grammar β"
  - If the model with grammar β shows better correlation: grammar β is likely related to human sentence processing
[Hale+, ACL2018 (best paper)] [Yoshida+, EMNLP2021]
even in modeling human reading behavior?
• Background: scaling law for language model performance (perplexity; PPL) [Kaplan+, 2020]
• scaling ∝ cognitively plausible model? [Kuribayashi+, ACL21]
cognitive modeling?
[Figure: PPL against gaze duration modeling for LMs varying in Model (LSTM, Trans-sm, Trans-lg, + N-gram), Number of updates (100, 1000, 10000, 100000), and Data size (SM, MD, LG); each axis runs from worse to better. English: Spearman's r = -0.87 (better PPL, better gaze duration modeling). Japanese: r = 0.53 (scaling law breaks).] [Kuribayashi+, ACL21]
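A minimal sketch of a Spearman rank correlation of the kind reported on this slide, computed across language models. It assumes one perplexity value and one gaze-duration-modeling score per model. The values below and the variable names are hypothetical and do not reproduce the reported r = -0.87 or r = 0.53.

```python
def average_ranks(values):
    """Ranks starting at 1; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # average of positions i..j, 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks

def pearson(xs, ys):
    """Pearson correlation coefficient (inputs must not be constant)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

def spearman(xs, ys):
    """Spearman correlation: Pearson correlation of the rank-transformed data."""
    return pearson(average_ranks(xs), average_ranks(ys))

# Hypothetical values for five LMs (illustration only, not results from the paper).
perplexity = [320.0, 150.0, 90.0, 60.0, 45.0]
gaze_fit = [0.8, 1.9, 2.4, 3.1, 3.0]
print(f"Spearman correlation: {spearman(perplexity, gaze_fit):.2f}")
```

A negative value means that models with lower perplexity tend to fit gaze duration better, the pattern the slide reports for English.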
… up) diminished
[Figure: Effect of syntactic category, for LMs varying in Model (LSTM, Trans-sm, Trans-lg, + N-gram), Number of updates (100, 1000, 10000, 100000), and Data size (SM, MD, LG). With better PPL, the by-word-category variation of surprisal diminished.] [Kuribayashi+, ACL21]
syntactic violation differs between LMs and humans
  - LMs under-predict the difficulty [Wilcox+, ACL21]
  - 😌 I know that my mother sent the present to Taylor last weekend.
  - 😵‍💫 I know who my mother sent the present to Taylor last weekend.
[Figure: Predicted vs. Observed Slowdown Between Conditions. Panels: gpt2, rnng, jrnn, grnn, human. x-axis: Slowdown in Milliseconds (0 to 400). y-axis: Test Suite (Cleft, FGD-obj, FGD-pp, FGD-sbj, MVRR, NPL-any-orc, NPL-any-src, NPL-ever-orc, NPL-ever-src, RNA-f-orc, RNA-f-src, RNA-m-orc, RNA-m-src, SVNA-orc, SVNA-pp, SVNA-src).]
  - LMs processed the text too smoothly; humans are more surprised.
limited context access
• Theories: human context access during syntactic processing is limited
  - this creates a pressure to avoid long dependencies and deep nesting in natural language
  - e.g., "… people wearing a red hat come …"
[Figure: gaze duration modeling (higher is better) as the context noise becomes more severe, for English and Japanese. Models: LSTM-xs-Wiki, GPT2-xs-Wiki, GPT2-md-Wiki, GPT2-sm, GPT2-md, GPT2-lg, GPT2-xl.] [Kuribayashi+, EMNLP22]
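A minimal sketch of limiting a model's context access. It assumes the simplest form of limitation, truncating the context to the last few words, since the slide does not specify the kind of noise. `toy_log_prob` is a hypothetical stand-in for a language model, written only to make the slide's example sentence behave as described.

```python
import math

def limited_context_surprisals(tokens, log_prob, max_context):
    """Surprisal (in nats) of each token given only the last `max_context` tokens."""
    result = []
    for i, word in enumerate(tokens):
        visible = tokens[max(0, i - max_context):i]
        result.append(-log_prob(word, visible))
    return result

def toy_log_prob(word, context):
    """Hypothetical scorer: the verb "come" is likely only while its
    subject "people" is still inside the visible context."""
    if word == "come":
        return math.log(0.5) if "people" in context else math.log(0.05)
    return math.log(0.1)

tokens = "people wearing a red hat come".split()
full = limited_context_surprisals(tokens, toy_log_prob, max_context=len(tokens))
short = limited_context_surprisals(tokens, toy_log_prob, max_context=2)
print(f"surprisal of 'come' with full context:  {full[-1]:.2f}")
print(f"surprisal of 'come' with last 2 words:  {short[-1]:.2f}")
```

Once the subject falls outside the visible window, the verb becomes harder to predict; this is the kind of distance-sensitive difficulty that a context-limited model can express and a full-context model cannot.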
machine intelligence, computational philosophy, and computational psychology
  - they are not mutually exclusive
  - this talk is not about which mindset is correct
• A model that is good in the engineering sense is not always human-like (at least in our settings)
  - this does not mean that the current NLP directions are wrong
  - when trying to achieve something, simply replicating nature is not always a good idea (e.g., an airplane does not flap its wings as birds do)
• Understanding humans will continue to be a challenging goal
  - at least, scaling alone does not solve it
https://www.lesswrong.com/posts/eqxqgFxymP8hXDTt5/announcing-the-inverse-scaling-prize-usd250k-prize-pool