ACL 2023 Report: Focusing on LLM Trends
Masaru Isonuma (The University of Edinburgh / The University of Tokyo)

ACL is the top international conference in computational linguistics and natural language processing.
Source: Toshihiro Kamishima. ML, DM, and AI Conference Map.

Submissions increased by 44% compared with the previous year.
• # of Submissions: 4,864
• # of Main: 1,074
• # of Findings: 901
[Figure: # of Submissions (0 to 6,000; Main, Findings) and Acceptance Rate (Main) (0% to 35%) by year, 2005 to 2023]
Source: ACL Wiki.

By category, LLM has the fifth-largest number of submissions.
• The LLM category was introduced at EMNLP 2022
• Other categories also contain LLM-related presentations, so the real number appears to be even larger
[Figure: # of Submissions (main, findings; 0 to 400) and Acceptance Rate (Main) (0% to 35%) by category, in descending order of submissions: NLP Applications, Machine Learning for NLP, Information Extraction, Dialogue and Interactive…, Large Language Models, Resources and Evaluation, Question Answering, and others]
Source: Anna Rogers et al., Program Chairs' Report on Peer Review at ACL 2023.

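Ranked by the number of authors of submitted papers, Japan is 9th to 10th.
• In Japan, the number of reviewers is almost the same as the number of authors of submitted papers
  => Are there relatively few submissions from junior researchers, or do many researchers take part in reviewing?
Source: Anna Rogers et al., Program Chairs' Report on Peer Review at ACL 2023.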

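LLM challenges and approaches
For each approach, this report introduces papers presented at ACL 2023 (including some ICLR/ICML 2023 papers).

Main challenges of current LLMs
• Hallucination
• Reasoning ability (tutorial)
• Cost of computation / training data creation
• Updating learned knowledge
• Search for better models / training tasks

Approaches to the challenges
• Use of external knowledge/tools (tutorial)
• Prompt engineering for reasoning
• Training data creation / distillation with LLMs
• Editing LLMs
• Understanding the LLM training process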

Background on LLMs (training methods)
• Pretraining: predict the next word in a large amount of text (≈ reading)
  input: I can't think of any scenario where the Chiefs don't win that game if Charles doesn't go down. What's that? Need to chew clock with the run game? How convenient that we have an All Pro running back! While I agree that Charles going down definitely affected the outcome of the game, it's not like their back-up crapped the bed either. Knile Davis did end up with 2 TDs, so while he's not going to be mistaken for Charles, he played a great
  target: game
• Alignment (instruction tuning / RLHF): have the model solve a variety of tasks so that it can respond to human prompts (≈ exercises)
  input: Answer the category of the following news. On Friday, Apple will introduce a new iPhone ...
  target: Technology
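As a rough illustration of the pretraining objective described above, the sketch below turns a text into (input, target) pairs for next-word prediction. The function name `next_word_pairs` is hypothetical, and real LLMs operate on subword tokens rather than whitespace-separated words; this only shows the shape of the task.

```python
def next_word_pairs(text: str) -> list[tuple[str, str]]:
    """Split a text into (input, target) pairs for next-word prediction."""
    words = text.split()  # assumption: whitespace words stand in for tokens
    return [(" ".join(words[:i]), words[i]) for i in range(1, len(words))]


pairs = next_word_pairs("he played a great game")
for context, target in pairs:
    print(f"input: {context!r} -> target: {target!r}")
```

The last pair corresponds to the slide's example, where the text ending in "he played a great" has the target "game".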

Background on LLMs (LLM capabilities)
• in-context learning (ICL): the model can solve a task by following the demonstrations it is given
  input:
    Q: Roger has 5 tennis balls. He buys 2 more cans of tennis balls. Each can has 3 tennis balls. How many tennis balls does he have now?
    A: The answer is 11.
    Q: The cafeteria had 23 apples. If they used 20 to make lunch and bought 6 more, how many apples do they have?
  output:
    A: The answer is 27.
• chain-of-thought (CoT): by showing the reasoning process in the demonstration, the model solves reasoning tasks more accurately
  input:
    Q: Roger has 5 tennis balls. He buys 2 more cans of tennis balls. Each can has 3 tennis balls. How many tennis balls does he have now?
    A: Roger started with 5 balls. 2 cans of 3 tennis balls each is 6 tennis balls. 5 + 6 = 11. The answer is 11.
    Q: The cafeteria had 23 apples. If they used 20 to make lunch and bought 6 more, how many apples do they have?
  output:
    A: The cafeteria had 23 apples originally. They used 20 to make lunch. So they had 23 - 20 = 3. They bought 6 more apples, so they have 3 + 6 = 9. The answer is 9.
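The difference between the two prompt styles above can be sketched in a few lines. This is a minimal sketch that only assembles the prompt strings from the slide's example; `build_prompt` is a hypothetical helper, and no model is called.

```python
# Demonstration and query taken from the slide's example.
DEMO_Q = ("Roger has 5 tennis balls. He buys 2 more cans of tennis balls. "
          "Each can has 3 tennis balls. How many tennis balls does he have now?")
DEMO_RATIONALE = ("Roger started with 5 balls. 2 cans of 3 tennis balls each "
                  "is 6 tennis balls. 5 + 6 = 11.")
DEMO_ANSWER = "The answer is 11."
QUERY = ("The cafeteria had 23 apples. If they used 20 to make lunch and "
         "bought 6 more, how many apples do they have?")


def build_prompt(use_cot: bool) -> str:
    """Return a one-shot prompt; with use_cot the demonstration shows its reasoning."""
    demo_a = f"{DEMO_RATIONALE} {DEMO_ANSWER}" if use_cot else DEMO_ANSWER
    return f"Q: {DEMO_Q}\nA: {demo_a}\n\nQ: {QUERY}\nA:"


icl_prompt = build_prompt(use_cot=False)
cot_prompt = build_prompt(use_cot=True)
print(cot_prompt)
```

The only difference between the two prompts is whether the demonstration answer contains the intermediate steps.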

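Use of external tools/knowledge
• It is hard for an LLM to memorize all knowledge, and updating its knowledge is also difficult => supplementing knowledge with a retriever is effective
  – Tutorial: Retrieval-based Language Models and Applications
• Similarly, integrating tools such as a calculator or a chemical reaction predictor supplements the LLM's reasoning ability and domain knowledge
[Figure: the question "Who is the prime minister of the UK?" is sent to a retriever over an external database; the retriever output "Rishi Sunak becomes the prime minister in 2022." is concatenated to the prompt, and the LLM answers "Rishi Sunak"]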
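The retrieve-then-concatenate flow in the figure can be sketched as follows. This is a toy sketch under stated assumptions: the database contents, the word-overlap retriever, and the prompt template are all hypothetical, and a real system would use a learned or BM25-style retriever and pass the prompt to an actual LLM.

```python
import re

# Hypothetical external database: a plain list of passages.
DATABASE = [
    "Rishi Sunak becomes the prime minister in 2022.",
    "The cafeteria had 23 apples.",
    "ACL is a conference on computational linguistics.",
]


def tokenize(text: str) -> set[str]:
    """Lowercase the text and return its set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(question: str, passages: list[str]) -> str:
    """Toy retriever: return the passage sharing the most words with the question."""
    q = tokenize(question)
    return max(passages, key=lambda p: len(q & tokenize(p)))


def build_prompt(question: str, passages: list[str]) -> str:
    """Concatenate the retriever output with the question before calling the LLM."""
    context = retrieve(question, passages)
    return f"Context: {context}\nQuestion: {question}\nAnswer:"


prompt = build_prompt("Who is the prime minister of the UK?", DATABASE)
print(prompt)
```

Because the answer now depends on the retrieved passage, updating the database updates the answer without retraining the model; it also means a wrong passage can mislead the model.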

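When Not to Trust Language Models: Investigating Effectiveness of Parametric and Non-Parametric Memories
Alex Mallen, Akari Asai, Victor Zhong, Rajarshi Das, Daniel Khashabi, Hannaneh Hajishirzi
• LLMs have difficulty memorizing less popular entities (person names, place names, etc.), and increasing the number of parameters has little effect
• Supplementing external knowledge with a retriever improves performance on less popular entities. However, when the retriever supplies incorrect external knowledge, performance can instead drop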

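MultiTool-CoT: GPT-3 Can Use Multiple External Tools with Chain of Thought Prompting
Tatsuro Inaba, Hirokazu Kiyomaru, Fei Cheng, Sadao Kurohashi
• Introducing external tools substantially improves performance on a NumGLUE task (a task that requires numerical calculation and chemistry knowledge)
• Adding tool triggers to the few-shot examples teaches the model which tool to call in which situation
• When a tool trigger is generated, generation is paused and the output of the called external tool is appended to the text; generation then resumes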
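The pause, call, append, and resume loop described above can be sketched as follows. Everything here is an assumption made for illustration: the `<<Tool: argument>>` trigger syntax, the toy `calculator`, and the scripted `fake_llm` that stands in for GPT-3. The paper's actual trigger format and tools may differ.

```python
import re

# Hypothetical trigger format; matches a trigger only at the end of the text.
TRIGGER = re.compile(r"<<(\w+): ([^>]*)>>$")


def calculator(expression: str) -> str:
    """Toy tool: evaluates 'a op b' for integers a and b."""
    a, op, b = expression.split()
    ops = {"+": lambda x, y: x + y, "-": lambda x, y: x - y, "*": lambda x, y: x * y}
    return str(ops[op](int(a), int(b)))


TOOLS = {"Calculator": calculator}


def fake_llm(prefix: str) -> str:
    """Scripted stand-in for an LLM: continues until a trigger or the end."""
    calls = prefix.count("<<Calculator")
    if calls == 0:
        return "2 cans of 3 balls is <<Calculator: 2 * 3>>"
    if calls == 1:
        return " balls. In total he has <<Calculator: 5 + 6>>"
    return " balls. The answer is 11."


def generate_with_tools(prompt: str, max_calls: int = 5) -> str:
    text = prompt
    for _ in range(max_calls):
        text += fake_llm(text)               # generate until the model stops
        match = TRIGGER.search(text)
        if match is None:                    # no trailing trigger: finished
            break
        tool, argument = match.groups()
        text += " " + TOOLS[tool](argument)  # append tool output, then resume
    return text


output = generate_with_tools("Q: Roger has 5 balls and buys 2 cans of 3.\nA: ")
print(output)
```

The arithmetic is delegated to the tool, so the model only has to decide when to call it and with which arguments.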

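Devising reasoning prompts
• Reasoning ability is essential for generalizing to prompts that were not seen in training
• However, LLMs still have problems with reasoning, such as failing at simple addition or copying (Qian et al., 2023)
• Part of the tutorial "Complex Reasoning in Natural Language" introduces prompts that assist reasoning
[Figure: training: "Do birds lay eggs? - Yes", "Is quetzal a bird? - Yes"; evaluation (unseen): "Does quetzal lay eggs?"]
[Figure: vertical axis: accuracy; horizontal axis: number of digits; seen vs. unseen. Models tend to fail on numbers with many digits and on numbers where the same digit repeats; the larger α is, the more the same digit repeats]
Jing Qian, Hong Wang, Zekun Li, Shiyang Li, Xifeng Yan. Limitations of Language Models in Arithmetic and Symbolic Induction. ACL 2023.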

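Least-to-Most Prompting Enables Complex Reasoning in Large Language Models
Denny Zhou et al., ICLR 2023
• Decomposing a complex problem into simpler problems improves performance, especially on tasks whose data are more complex than the data seen in the demonstrations
  – On SCAN, a benchmark for compositional generalization, it achieves 99% accuracy compared with 16% for CoT
• Procedure: decompose the problem with the LLM, have the LLM solve the first subproblem, then have the LLM solve the next subproblem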
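The decompose-then-solve procedure can be sketched as a loop in which each answered subquestion is appended to the context for the next one. This is a minimal sketch, not the paper's implementation: `fake_decompose` and `fake_solve` are scripted stand-ins for the two LLM calls, and the apple problem is borrowed from the CoT example earlier in the deck.

```python
PROBLEM = ("The cafeteria had 23 apples. If they used 20 to make lunch and "
           "bought 6 more, how many apples do they have?")


def fake_decompose(problem: str) -> list[str]:
    """Stand-in for the LLM call that splits a problem into ordered subquestions."""
    return [
        "How many apples were left after making lunch?",
        "How many apples do they have after buying 6 more?",
    ]


def fake_solve(prompt: str) -> str:
    """Stand-in for the LLM call that answers the last question in the prompt."""
    if prompt.endswith("after making lunch?\nA:"):
        return "23 - 20 = 3."
    if "23 - 20 = 3." in prompt:      # needs the earlier answer in its context
        return "3 + 6 = 9."
    return "unknown"


def least_to_most(problem: str) -> list[tuple[str, str]]:
    context = problem
    solved = []
    for sub in fake_decompose(problem):
        prompt = f"{context}\nQ: {sub}\nA:"
        answer = fake_solve(prompt)
        solved.append((sub, answer))
        context = f"{prompt} {answer}"  # earlier answers feed later subquestions
    return solved


steps = least_to_most(PROBLEM)
for question, answer in steps:
    print(question, "->", answer)
```

The second stub answer is only produced when the first answer is present in the prompt, which mirrors why the subproblems are solved in order.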

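SELF-REFINE: Iterative Refinement with Self-Feedback
Aman Madaan et al., ICML 2023
• Repeat the following: the LLM generates an answer ($y_t$) → the same LLM gives feedback ($fb$) → an answer is generated again based on the feedback ($y_{t+1}$)
• The more the feedback is repeated, the better the task performance
[Figure: task performance against # of iterations]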
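The generate, feedback, refine cycle can be sketched as a short loop. This is a sketch under stated assumptions: `generate`, `get_feedback`, and `refine` are scripted stand-ins for three prompts to the same LLM, and stopping when the feedback is "OK" or after `max_iters` rounds is an illustrative stopping rule, not necessarily the one used in the paper.

```python
def generate(task: str) -> str:
    """Stand-in for the initial LLM answer."""
    return "LLMs reason."


def get_feedback(task: str, answer: str) -> str:
    """Stand-in for the same LLM critiquing its own answer."""
    return "OK" if answer.endswith("with prompts.") else "Add how."


def refine(task: str, answer: str, fb: str) -> str:
    """Stand-in for the same LLM rewriting the answer given the feedback."""
    return answer[:-1] + " with prompts."


def self_refine(task: str, max_iters: int = 4) -> list[str]:
    answers = [generate(task)]
    for _ in range(max_iters):
        fb = get_feedback(task, answers[-1])
        if fb == "OK":                 # assumed stopping rule
            break
        answers.append(refine(task, answers[-1], fb))
    return answers


history = self_refine("Describe how LLMs reason.")
print(history)
```

No additional training or second model is involved; the loop only re-prompts the same model with its previous output and feedback.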

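Training data creation / distillation with LLMs
• In instruction tuning, the more training tasks there are, the higher the generalization performance (Wang et al., 2022)
• However, there is a limit to the number of training tasks that humans can create
  => Training data creation with LLMs
• A certain number of parameters is needed for abilities such as CoT to emerge (emergent ability; Wei et al., 2022)
• Can a small LM be given abilities comparable to an LLM?
  => Use the LLM's outputs to train a small LM (distillation)
Wang et al., SUPER-NATURALINSTRUCTIONS: Generalization via Declarative Instructions on 1600+ NLP Tasks. EMNLP 2022.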

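Self-Instruct: Aligning Language Models with Self-Generated Instructions
Yizhong Wang, Yeganeh Kordi, Swaroop Mishra, Alisa Liu, Noah A. Smith, Daniel Khashabi, Hannaneh Hajishirzi
• GPT-3 trained on tasks created by GPT-3 learns to follow instructions and greatly outperforms the original GPT-3 in zero-shot performance on 119 tasks (Super-NaturalInstructions benchmark)
• Pipeline:
  – Prepare a small number of seed tasks
  – Generate instructions by in-context learning from the seed tasks
  – Generate instances in output → input order for classification tasks, and in input → output order otherwise
  – Filter out low-quality and similar tasks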
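The final filtering step of the pipeline can be sketched as below. This is an illustrative sketch only: it uses `difflib.SequenceMatcher` over word sequences as a stand-in similarity measure, and the 0.7 threshold and the three-word minimum are assumptions made for the example rather than the paper's exact settings.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Word-level similarity in [0, 1]; a stand-in for the paper's metric."""
    return SequenceMatcher(None, a.lower().split(), b.lower().split()).ratio()


def filter_new(pool: list[str], candidates: list[str],
               threshold: float = 0.7) -> list[str]:
    """Keep candidates that are long enough and not too similar to known tasks."""
    kept: list[str] = []
    for cand in candidates:
        if len(cand.split()) < 3:       # hypothetical low-quality heuristic
            continue
        if all(similarity(cand, old) < threshold for old in pool + kept):
            kept.append(cand)
    return kept


pool = ["Classify the sentiment of the following review."]
candidates = [
    "Classify the sentiment of the following tweet.",   # near-duplicate
    "Translate the sentence into French.",              # novel
    "Summarize.",                                       # too short
]
new_tasks = filter_new(pool, candidates)
print(new_tasks)
```

Kept tasks would then be added back to the task pool and used as in-context examples in the next generation round.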

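Large Language Models Are Reasoning Teachers (about four similar studies were presented)
Namgyu Ho, Laura Schmid, Se-Young Yun
• The LLM is prompted with CoT to output its reasoning process, and that reasoning process is used as supervision to train a small LM
• Reasoning ability improves on most tasks. On relatively easy tasks the small LM is comparable to the teacher LLM, whereas on complex tasks it falls short of the teacher
[Figure: horizontal axis: type of GPT-3 (175B) used as the teacher]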

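Model editing
• We want to update outdated knowledge that an LLM has memorized and delete privacy-related knowledge
• However, re-running pretraining is expensive. Naively retraining a trained model can leave paraphrased knowledge un-updated or overwrite unrelated knowledge (Cao et al., 2021)
  => Model editing, which updates only specific knowledge, is attracting attention
[Figure: "Who is the prime minister of the UK?" → LLM → "Liz Truss"; "Where does Rishi Sunak live?" → LLM → "10 Downing St, London SW1A 2AA"]
Nicola De Cao, Wilker Aziz, Ivan Titov. Editing Factual Knowledge in Language Models. EMNLP 2021.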

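Can LMs Learn New Entities from Descriptions? Challenges in Propagating Injected Knowledge
Yasumasa Onoe, Michael Zhang, Shankar Padmanabhan, Greg Durrett, Eunsol Choi
• Focuses on whether knowledge that can be inferred from an updated fact is also updated, and proposes an evaluation benchmark
  – If inferred knowledge is updated as well, new knowledge can be embedded in the LLM without contradicting existing knowledge
• Existing model editing methods can update the knowledge itself, but updating the knowledge inferred from it is difficult
  – Update accuracy is considerably lower than when $d_e$ is simply prepended to $x_e$ as a prompt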

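Knowledge Unlearning for Mitigating Privacy Risks in Language Models
Joel Jang, Dongkeun Yoon, Sohee Yang, Sungmin Cha, Moontae Lee, Lajanugen Logeswaran, Minjoon Seo
• To lower the probability that the LLM generates a text $x$ that should be forgotten, the negative log-likelihood (NLL) objective for $x$ is increased
• The more parameters a model has, the better it can forget without harming performance on other tasks
• It is unclear whether texts similar to the removal target, or texts that entail it, are also forgotten
[Figure: general task performance vs. unlearning performance for gradient ascent (proposed method), differential privacy decoding, baseline, and training data deduplication]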
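The objective itself did not survive in this transcript. Assuming the standard autoregressive negative log-likelihood for a token sequence $x = (x_1, \dots, x_T)$ and model parameters $\theta$, gradient ascent on it can be written as follows; the symbols $T$, $\theta$, and the step size $\eta$ are notation introduced here for illustration.

```latex
\mathcal{L}(\theta; x) = -\sum_{t=1}^{T} \log p_\theta\left(x_t \mid x_{<t}\right),
\qquad
\theta \leftarrow \theta + \eta \, \nabla_\theta \mathcal{L}(\theta; x)
```

Ordinary training subtracts the gradient to make $x$ more likely; flipping the sign of the update raises the NLL and therefore lowers the probability of $x$.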

Understanding the LLM training process
• In-context learning (ICL) and chain-of-thought (CoT) emerge even though LLMs are not explicitly trained for them
  – Do pretraining corpora contain few texts that explicitly include ICL or CoT? (needs verification)
• Which training data and architectures give rise to ICL and CoT?
  => The answer would provide hints for developing LLMs with more advanced capabilities
[Figure: the ICL and CoT prompt examples and the pretraining text example from the background slides]

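Towards Understanding Chain-of-Thought Prompting: An Empirical Study of What Matters
Boshi Wang, Sewon Min, Xiang Deng, Jiaming Shen, You Wu, Luke Zettlemoyer, Huan Sun
• The reasoning process shown in the chain-of-thought demonstrations is replaced with a logically incorrect one (Invalid Reasoning)
• Even when the demonstrated reasoning is logically incorrect, the LLM outputs its reasoning process with almost the same accuracy as standard CoT
  => The LLM's reasoning ability may already be acquired in pretraining, with CoT acting as a query that draws it out
[Figure: GSM8K accuracy (F1) by difficulty, including intermediate steps; difficulty = number of reasoning steps needed to solve the problem (# is the number of examples)]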

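Understanding In-Context Learning via Supportive Pretraining Data
Xiaochuang Han, Daniel Simig, Todor Mihaylov, Yulia Tsvetkov, Asli Celikyilmaz, Tianlu Wang
• Goal: identify which pretraining data make in-context learning (ICL) possible
  => Identified with ORCA (Han & Tsvetkov, 2022) by comparing the gradients of ICL data and pretraining data
• Pretraining data that are effective for ICL:
  – show no domain similarity to the ICL data => ICL ability is acquired across domains
  – have a relatively flat word distribution => the model can handle ICL, whose word distribution differs from that of ordinary text
  – require understanding of longer context => acquiring the ability to understand long context contributes to the emergence of ICL
[Figure: gradient of the ICL data, gradient of pretraining data 1, gradient of pretraining data 2; pretraining data 1 is effective for in-context learning because its gradient is more similar]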
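The idea in the figure, ranking pretraining examples by how similar their gradient is to the gradient of the ICL data, can be sketched with cosine similarity. This is only a toy illustration: the three-dimensional vectors are made-up numbers standing in for model gradients, and the actual ORCA procedure is more involved than a single similarity ranking.

```python
import math


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity between two gradient vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Made-up gradients; in practice these come from backpropagation through the LM.
icl_grad = [1.0, 2.0, 0.0]
pretrain_grads = {
    "pretraining data 1": [2.0, 4.0, 0.5],
    "pretraining data 2": [-1.0, 0.5, 3.0],
}

scores = {name: cosine(icl_grad, grad) for name, grad in pretrain_grads.items()}
best = max(scores, key=scores.get)
print(scores, best)
```

Here "pretraining data 1" points in nearly the same direction as the ICL gradient, so it is ranked as the more supportive example.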

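Impressions (personal remarks)

On LLMs
• The reasoning ability of LLMs will remain a major challenge
  – Because LLMs are trained on a very large amount of data, they appear at first glance to generalize
  – In fact, there are scattered cases where they fail to generalize to data they have not been trained on (e.g., addition of numbers with many digits)
  – As LLMs are applied to more advanced activities (such as research), weak reasoning ability will become a bottleneck
• The question of what to train on in order to improve reasoning ability will become even more important
  – At present, prompt engineering (hints supplied by humans) is the mainstream way to improve reasoning ability
  – How can LLMs acquire meta-level strategies such as "decompose a complex problem into smaller problems"?
  – We need to understand what must be learned for each of the various abilities of LLMs to arise

On Japan
• Japan's share of submissions has fallen, while South Korea's presence stands out
  – Although the figures are not directly comparable, Japan fell from 5th in number of submissions at ACL 2019 to 9th to 10th in number of submitting authors at ACL 2023
  – South Korea stands out in its place (8th in number of submissions at ACL 2019 → 3rd in number of submitting authors at ACL 2023)
  – Looking at the affiliations of the Korean authors, collaborations involving KAIST, PIs who returned from overseas, LG, and others stand out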