ACL 2023 Report: Focusing on LLM Trends

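This deck introduces papers presented at ACL 2023, two for each of the following LLM-related categories.

・Using external knowledge and tools (tutorial): P9
・Prompt engineering for reasoning (tutorial): P12
・Training-data creation and distillation with LLMs: P15
・Editing LLMs: P18
・Understanding the LLM training process: P21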

Masaru Isonuma

July 29, 2023

Transcript

  1. [Figure: ACL submissions and main-conference acceptance rate by year, 2005 to 2023. Axes: # of Submissions, Acceptance Rate (Main). Legend: Main, Findings, Acceptance Rate (Main).] (p. 3)
     • Submissions increased 44% over the previous year
     • # of Submissions: 4,864; # of Main: 1,074; # of Findings: 901
     Source: ACL Wiki. https://aclweb.org/aclwiki/Conference_acceptance_rates
  2. • An LLM category has existed since EMNLP 2022 (p. 4)
     • Other categories also contain LLM-related presentations, so the real number is probably even higher
     • By category, LLM has the fifth-most submissions
     [Figure: submissions (main, findings) and acceptance rate (main) per category. Axes: # of Submissions, Acceptance Rate (Main). Categories as labeled: NLP Applications; Machine Learning for NLP; Information Extraction; Dialogue and Interactive…; Large Language Models; Resources and Evaluation; Question Answering; Interpretability and Analysis of…; Machine Translation; Generation; Language Grounding to…; Summarization; Computational Social Science…; Sentiment Analysis, Stylistic…; Theme: Reality Check; Information Retrieval and Text…; Multilingualism and Cross-…; Semantics: Sentence-level…; Speech and Multimodality; Syntax: Tagging, Chunking,…; Ethics and NLP; Semantics: Lexical; Discourse and Pragmatics; Linguistic Theories, Cog.…; Phonology, Morphology, and…; Linguistic Diversity]
     Source: Anna Rogers et al., Program Chairs' Report on Peer Review at ACL 2023. https://aclanthology.org/2023.acl-long.report.pdf
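  3. Challenges of LLMs and approaches to them (p. 6)
     For each approach, this deck introduces papers presented at ACL 2023 (including some ICLR/ICML 2023 papers).
     Main challenges of current LLMs:
     • Hallucination
     • Reasoning ability (tutorial)
     • Cost of computation and training-data creation
     • Updating learned knowledge
     • Searching for better models and training tasks
     Approaches to these challenges:
     • Using external knowledge and tools (tutorial)
     • Prompt engineering for reasoning
     • Training-data creation and distillation with LLMs
     • Editing LLMs
     • Understanding the LLM training process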
  4. Background on LLMs (training methods) (p. 7)
     • Pretraining: predict the next word in a large amount of text (≈ reading)
       input: "I can't think of any scenario where the Chiefs don't win that game if Charles doesn't go down. What's that? Need to chew clock with the run game? How convenient that we have an All Pro running back! While I agree that Charles going down definitely affected the outcome of the game, it's not like their back-up crapped the bed either. Knile Davis did end up with 2 TDs, so while he's not going to be mistaken for Charles, he played a great"
       target: "game"
     • Alignment (instruction tuning/RLHF): have the model solve a variety of tasks so that it can respond to human prompts (≈ exercises)
       input: "Answer the category of the following news. On Friday, Apple will introduce a new iPhone ..."
       target: "Technology"
  5. Background on LLMs (LLM abilities) (p. 8)
     • in-context learning (ICL): the model can solve a task by following the demonstrations it is given
       input:
         Q: Roger has 5 tennis balls. He buys 2 more cans of tennis balls. Each can has 3 tennis balls. How many tennis balls does he have now?
         A: The answer is 11.
         Q: The cafeteria had 23 apples. If they used 20 to make lunch and bought 6 more, how many apples do they have?
       output:
         A: The answer is 27.
     • chain-of-thought (CoT): demonstrating the reasoning process lets the model solve reasoning tasks more accurately
       input:
         Q: Roger has 5 tennis balls. He buys 2 more cans of tennis balls. Each can has 3 tennis balls. How many tennis balls does he have now?
         A: Roger started with 5 balls. 2 cans of 3 tennis balls each is 6 tennis balls. 5 + 6 = 11. The answer is 11.
         Q: The cafeteria had 23 apples. If they used 20 to make lunch and bought 6 more, how many apples do they have?
       output:
         A: The cafeteria had 23 apples originally. They used 20 to make lunch. So they had 23 - 20 = 3. They bought 6 more apples, so they have 3 + 6 = 9. The answer is 9.
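The difference between the ICL and CoT prompts on this slide is only whether the demonstration includes its reasoning steps. A minimal sketch of how such prompts could be assembled follows; the `build_prompt` helper and its `Demo` tuple are hypothetical names introduced for illustration, not part of any library or of the deck.

```python
from typing import List, Optional, Tuple

# One demonstration: (question, optional reasoning steps, final answer).
Demo = Tuple[str, Optional[str], str]


def build_prompt(demos: List[Demo], query: str, use_cot: bool) -> str:
    """Format demonstrations plus a new question as a few-shot prompt."""
    blocks = []
    for question, rationale, answer in demos:
        if use_cot and rationale:
            # CoT: show the reasoning steps before the final answer.
            blocks.append(f"Q: {question}\nA: {rationale} The answer is {answer}.")
        else:
            # Plain ICL: show only the final answer.
            blocks.append(f"Q: {question}\nA: The answer is {answer}.")
    # Leave the last answer open for the model to complete.
    blocks.append(f"Q: {query}\nA:")
    return "\n\n".join(blocks)


DEMOS = [(
    "Roger has 5 tennis balls. He buys 2 more cans of tennis balls. "
    "Each can has 3 tennis balls. How many tennis balls does he have now?",
    "Roger started with 5 balls. 2 cans of 3 tennis balls each is 6 tennis balls. 5 + 6 = 11.",
    "11",
)]
QUERY = ("The cafeteria had 23 apples. If they used 20 to make lunch and "
         "bought 6 more, how many apples do they have?")

icl_prompt = build_prompt(DEMOS, QUERY, use_cot=False)
cot_prompt = build_prompt(DEMOS, QUERY, use_cot=True)
print(cot_prompt)
```

Either string would then be sent to a language model; the model call itself is left out because the deck does not specify one.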
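  6. Using external tools and knowledge (p. 9)
     • It is hard for an LLM to memorize all knowledge, and updating its knowledge is also difficult => supplementing knowledge with a retriever is effective
       – Tutorial: Retrieval-based Language Models and Applications
       – https://acl2023-retrieval-lm.github.io/
     • Similarly, building in tools such as a calculator or a chemical reaction predictor supplements the LLM's reasoning ability and domain knowledge
     [Figure: the question "Who is the prime minister of the UK?" is sent to a retriever over an external database; the retriever output "Rishi Sunak becomes the prime minister in 2022." is concatenated to the prompt, and the LLM answers "Rishi Sunak".]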
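The figure on this slide (retriever output concatenated to the prompt) can be sketched as below. This is a toy illustration under stated assumptions: the word-overlap `retrieve` function stands in for a real retriever, the in-memory `PASSAGES` list stands in for the external database, and the prompt template is made up for the example.

```python
import re
from typing import List


def tokenize(text: str) -> set:
    """Lowercase and split into a set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(query: str, passages: List[str], k: int = 1) -> List[str]:
    """Toy retriever: rank passages by word overlap with the query."""
    query_words = tokenize(query)
    ranked = sorted(passages, key=lambda p: len(query_words & tokenize(p)), reverse=True)
    return ranked[:k]


def augment_prompt(query: str, passages: List[str], k: int = 1) -> str:
    """Concatenate the retrieved passages in front of the question."""
    context = "\n".join(retrieve(query, passages, k))
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"


# Stand-in for the external database (assumption: a plain list of strings).
PASSAGES = [
    "The cafeteria had 23 apples.",
    "Rishi Sunak becomes the prime minister in 2022.",
]

prompt = augment_prompt("Who is the prime minister of the UK?", PASSAGES)
print(prompt)
```

As the next slide notes, this design passes along whatever the retriever returns, so a wrong passage can hurt the answer rather than help it.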
  7. When Not to Trust Language Models: Investigating Effectiveness of Parametric and Non-Parametric Memories (p. 10)
     Alex Mallen, Akari Asai, Victor Zhong, Rajarshi Das, Daniel Khashabi, Hannaneh Hajishirzi
     https://aclanthology.org/2023.acl-long.546/
     • LLMs have trouble memorizing less popular entities (person names, place names, etc.), and adding parameters helps little
     • Supplementing external knowledge with a retriever improves performance on less popular entities. However, when the retriever supplies incorrect external knowledge, performance can actually drop
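  8. MultiTool-CoT: GPT-3 Can Use Multiple External Tools with Chain of Thought Prompting (p. 11)
     Tatsuro Inaba, Hirokazu Kiyomaru, Fei Cheng, Sadao Kurohashi
     https://aclanthology.org/2023.acl-short.130/
     • Introducing external tools greatly improves performance on a NumGLUE task (a task that requires numerical calculation and chemistry knowledge)
     • Adding tool triggers to the few-shot examples teaches the model which tool to call in which situation
     • When a tool trigger is generated, generation stops and the output of the called external tool is appended; generation then resumes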
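The stop, call, append, resume cycle described on this slide can be sketched as a small loop. Everything concrete here is an assumption made for illustration: the `<<ToolName>> argument` trigger syntax, the `run_with_tools` and `calculator` helpers, and the scripted stand-in for the language model are not taken from the paper.

```python
import ast
import operator
import re
from typing import Callable, Dict, List

# Hypothetical trigger syntax: "<<ToolName>> argument" at the end of a chunk.
TRIGGER = re.compile(r"<<(\w+)>>\s*(.*)$")

OPS = {ast.Add: operator.add, ast.Sub: operator.sub,
       ast.Mult: operator.mul, ast.Div: operator.truediv}


def calculator(expr: str) -> str:
    """Evaluate +, -, *, / over numbers without using eval()."""
    def ev(node):
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in OPS:
            return OPS[type(node.op)](ev(node.left), ev(node.right))
        raise ValueError("unsupported expression")
    return str(ev(ast.parse(expr, mode="eval").body))


def run_with_tools(prompt: str, generate: Callable[[str], str],
                   tools: Dict[str, Callable[[str], str]], max_calls: int = 5) -> str:
    """Generate text, pausing at each tool trigger to append the tool output."""
    text = prompt
    for _ in range(max_calls + 1):
        chunk = generate(text)          # the model stops right after a trigger
        text += chunk
        match = TRIGGER.search(chunk)
        if match is None:               # no trigger: generation is finished
            return text
        name, arg = match.group(1), match.group(2).strip()
        text += f" => {tools[name](arg)}\n"   # append tool output, then resume
    return text


def scripted_model(chunks: List[str]) -> Callable[[str], str]:
    """Stand-in for an LLM: returns pre-written chunks in order."""
    it = iter(chunks)
    return lambda _text: next(it)


model = scripted_model([
    "Roger has 5 balls and buys 2 cans of 3. <<Calculator>> 5 + 2 * 3",
    "The answer is 11.",
])
result = run_with_tools("Q: How many tennis balls?\nA: ", model, {"Calculator": calculator})
print(result)
```

In a real setup the few-shot examples in `prompt` would contain the same trigger syntax, which is how the model learns when to emit it.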
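  9. Improving prompts for reasoning (p. 12)
     • Reasoning ability is essential for generalizing to prompts not seen in training
     • However, LLM reasoning still has problems, such as failing at simple addition and copying (Qian et al., 2023)
     • Part of the tutorial "Complex Reasoning in Natural Language" introduces prompts that assist reasoning
       – https://wenting-zhao.github.io/complex-reasoning-tutorial/
     [Figure: training examples "Do birds lay eggs? - Yes" and "Is quetzal a bird? - Yes"; evaluation (unseen) example "Does quetzal lay eggs?"]
     [Figure: vertical axis: accuracy; horizontal axis: number of digits; series: seen vs. unseen. Models tend to fail on numbers with many digits and on numbers with repeated digits; a larger α means more repeated digits.]
     Jing Qian, Hong Wang, Zekun Li, Shiyang Li, Xifeng Yan. Limitations of Language Models in Arithmetic and Symbolic Induction. ACL 2023. https://aclanthology.org/2023.acl-long.516/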
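  10. Least-to-Most Prompting Enables Complex Reasoning in Large Language Models (p. 13)
      Denny Zhou et al., ICLR 2023
      https://openreview.net/forum?id=WZH7099tgfM
      • Decomposing a complex problem into simpler problems improves performance, especially on tasks whose data is more complex than the data seen in training
        – On SCAN, a compositional generalization benchmark, it reaches 99% accuracy versus 16% for CoT
      • Steps: decompose the problem with the LLM; have the LLM solve the first subproblem; have the LLM solve the next subproblem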
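The three steps on this slide (decompose, solve the first subproblem, solve the next one) amount to a loop in which each solved subproblem is added to the prompt for the next. The sketch below assumes two callables, `decompose` and `solve`, that would each wrap an LLM call; here they are replaced by toy functions with fixed outputs, and the prompt layout is an assumption.

```python
from typing import Callable, List, Tuple


def least_to_most(
    question: str,
    decompose: Callable[[str], List[str]],
    solve: Callable[[str], str],
) -> Tuple[str, str]:
    """Solve subproblems in order, feeding earlier answers into later prompts."""
    context = f"Problem: {question}\n"
    answer = ""
    for sub in decompose(question):
        # Each call sees every subproblem solved so far.
        answer = solve(context + f"Q: {sub}\nA:")
        context += f"Q: {sub}\nA: {answer}\n"
    return answer, context


# Toy stand-ins for the two LLM calls (hypothetical, fixed outputs).
def toy_decompose(question: str) -> List[str]:
    return ["How many balls are in 2 cans of 3?",
            "How many balls in total with the 5 he had?"]


def toy_solve(prompt: str) -> str:
    # Answers the second subproblem only once the first answer is in the prompt.
    return "11" if "A: 6" in prompt else "6"


final, trace = least_to_most(
    "Roger has 5 tennis balls. He buys 2 more cans of tennis balls. "
    "Each can has 3 tennis balls. How many tennis balls does he have now?",
    toy_decompose,
    toy_solve,
)
print(trace)
```

The returned `trace` makes it easy to inspect which intermediate answers the final answer was conditioned on.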
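  11. Training-data creation and distillation with LLMs (p. 15)
      • In instruction tuning, the more training tasks there are, the better the generalization (Wang et al., 2022)
      • However, there is a limit to how many training tasks humans can write
        => create training data with LLMs
      • Abilities such as CoT emerge only above a certain number of parameters (emergent ability; Wei et al., 2022)
      • Can a small LM be given abilities on par with an LLM?
        => use LLM outputs to train a small LM (distillation)
      Wang et al., SUPER-NATURALINSTRUCTIONS: Generalization via Declarative Instructions on 1600+ NLP Tasks. EMNLP 2022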
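  12. Self-Instruct: Aligning Language Models with Self-Generated Instructions (p. 16)
      Yizhong Wang, Yeganeh Kordi, Swaroop Mishra, Alisa Liu, Noah A. Smith, Daniel Khashabi, Hannaneh Hajishirzi
      https://aclanthology.org/2023.acl-long.754/
      • GPT-3 trained on tasks generated by GPT-3 learns to follow instructions and greatly outperforms the original GPT-3 in zero-shot performance on 119 tasks (Super-NaturalInstructions benchmark)
      • Pipeline (figure annotations):
        – Prepare a small number of seed tasks
        – Generate instructions from the seed tasks with in-context learning
        – Generate instances in output → input order for classification tasks and in input → output order otherwise
        – Filter out low-quality and similar tasks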
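  13. Large Language Models Are Reasoning Teachers (about four similar studies were also presented) (p. 17)
      Namgyu Ho, Laura Schmid, Se-Young Yun
      https://aclanthology.org/2023.acl-long.830/
      • Have an LLM output its reasoning process with CoT, then use that reasoning as training data for a small LM
      • Reasoning improves on most tasks. On relatively easy tasks the small LM rivals the teacher LLM, while on complex tasks it falls short of the teacher.
      [Figure: horizontal axis: type of GPT-3 (175B) used as the teacher]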
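  14. Model editing (p. 18)
      • We want to update outdated knowledge that an LLM has memorized and delete privacy-related knowledge
      • However, rerunning pretraining is costly. Naively retraining a trained model can leave paraphrased knowledge un-updated or overwrite unrelated knowledge (Cao et al., 2021)
        => model editing, which updates only specific knowledge, is attracting attention
      [Figure: "Who is the prime minister of the UK?" → LLM → "Liz Truss"; "Where does Rishi Sunak live?" → LLM → "10 Downing St, London SW1A 2AA"]
      Nicola De Cao, Wilker Aziz, Ivan Titov. Editing Factual Knowledge in Language Models. EMNLP 2021.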
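  15. Can LMs Learn New Entities from Descriptions? Challenges in Propagating Injected Knowledge (p. 19)
      Yasumasa Onoe, Michael Zhang, Shankar Padmanabhan, Greg Durrett, Eunsol Choi
      https://aclanthology.org/2023.acl-long.300/
      • Focuses on whether updating a piece of knowledge also updates the knowledge inferred from it, and proposes an evaluation benchmark
        – If inferred knowledge is also updated, new knowledge can be embedded in the LLM without contradicting existing knowledge
      • Existing model editing methods can update the knowledge itself, but updating the knowledge inferred from it is difficult
        – Update accuracy is considerably lower than when d_e is simply prepended to x_e as a prompt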
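  16. Knowledge Unlearning for Mitigating Privacy Risks in Language Models (p. 20)
      Joel Jang, Dongkeun Yoon, Sohee Yang, Sungmin Cha, Moontae Lee, Lajanugen Logeswaran, Minjoon Seo
      https://aclanthology.org/2023.acl-long.805/
      • To lower the probability of a text x that the LLM should forget, increase the negative log-likelihood (NLL) objective for x
      • The more parameters a model has, the better it can forget without hurting performance on other tasks
      • It is unclear whether texts similar to the target text, or texts that entail it, are also forgotten
      [Figure: general task performance vs. unlearning performance. Labels: gradient ascent (proposed method), differential privacy decoding, baseline, training data deduplication]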
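The objective on this slide is not reproduced in the transcript, so the sketch below assumes the standard token-level negative log-likelihood, NLL(x) = -Σ_t log p(x_t), and raises it by gradient ascent. To stay self-contained it uses a deliberately tiny model: a single context-free softmax over three token ids. That model, the function names, and the learning rate are all assumptions for illustration and are far simpler than a real language model.

```python
import math
from typing import List


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def nll(logits: List[float], tokens: List[int]) -> float:
    """Negative log-likelihood of the token sequence under the toy model."""
    probs = softmax(logits)
    return -sum(math.log(probs[t]) for t in tokens)


def nll_grad(logits: List[float], tokens: List[int]) -> List[float]:
    """Gradient of the NLL with respect to the logits: n * p_j - count_j."""
    probs = softmax(logits)
    grad = [len(tokens) * p for p in probs]
    for t in tokens:
        grad[t] -= 1.0
    return grad


def unlearn(logits: List[float], tokens: List[int],
            lr: float = 0.5, steps: int = 10) -> List[float]:
    """Gradient ASCENT on the NLL: make the target sequence less likely."""
    logits = list(logits)
    for _ in range(steps):
        grad = nll_grad(logits, tokens)
        logits = [v + lr * g for v, g in zip(logits, grad)]
    return logits


before = [2.0, 0.5, 0.1]      # toy parameters that favor token 0
target = [0, 0, 0]            # sequence to forget
after = unlearn(before, target)
print(nll(before, target), nll(after, target))
```

Ordinary training would subtract the gradient; flipping the sign is the whole change, which is why the slide's open question about similar or entailed texts matters: nothing in this update targets them explicitly.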
  17. Understanding the LLM training process (p. 21)
      • In-context learning (ICL) and chain-of-thought (CoT) emerge even though LLMs are not explicitly trained for them
        – Do pretraining corpora contain few texts that explicitly include ICL or CoT? (needs verification)
      • Which training data or architecture do ICL and CoT stem from?
        => an answer would give hints for developing LLMs with more advanced abilities
      [Figure: the pretraining text example ("I can't think of any scenario where the Chiefs don't win that game ...") shown alongside the ICL and CoT tennis-ball and cafeteria examples, repeated from earlier slides]
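  18. Towards Understanding Chain-of-Thought Prompting: An Empirical Study of What Matters (p. 22)
      Boshi Wang, Sewon Min, Xiang Deng, Jiaming Shen, You Wu, Luke Zettlemoyer, Huan Sun
      https://aclanthology.org/2023.acl-long.153/
      • Replace the reasoning shown in the Chain-of-Thought demonstrations with logically invalid reasoning (Invalid Reasoning)
      • Even when the demonstrated reasoning is logically invalid, the LLM outputs reasoning with almost the same accuracy as standard CoT
        => the LLM's reasoning ability may already be acquired in pretraining, with CoT acting as a query that draws it out
      [Figure: accuracy (F1), including intermediate steps, on GSM8K by difficulty; difficulty = number of reasoning steps needed to solve the problem (# is the number of examples)]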
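  19. Understanding In-Context Learning via Supportive Pretraining Data (p. 23)
      Xiaochuang Han, Daniel Simig, Todor Mihaylov, Yulia Tsvetkov, Asli Celikyilmaz, Tianlu Wang
      https://aclanthology.org/2023.acl-long.708/
      • Goal: identify which pretraining data makes in-context learning (ICL) possible
        => identified by comparing the gradients of ICL data and pretraining data with ORCA (Han & Tsvetkov, 2022)
      • Pretraining data that helps ICL:
        – shows no domain similarity to the ICL data => ICL ability is acquired across domains
        – has a relatively flat word distribution => helps handle ICL inputs, whose word distribution differs from ordinary text
        – requires understanding longer contexts => acquiring the ability to understand long contexts contributes to the emergence of ICL
      [Figure: gradient of the ICL data, gradient of pretraining data 1, gradient of pretraining data 2. Pretraining data 1 has the more similar gradient, so it is more useful for in-context learning.]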
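The figure on this slide compares the gradient of the ICL data with the gradients of candidate pretraining examples and prefers the most similar one. A minimal sketch of that ranking step follows. It assumes cosine similarity as the similarity measure and uses hand-written two-dimensional vectors in place of real model gradients; neither detail is stated in the transcript.

```python
import math
from typing import Dict, List, Tuple


def cosine(u: List[float], v: List[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def rank_by_gradient_similarity(
    icl_grad: List[float],
    pretrain_grads: Dict[str, List[float]],
) -> List[Tuple[str, float]]:
    """Rank pretraining examples by similarity of their gradient to the ICL gradient."""
    scores = {name: cosine(icl_grad, grad) for name, grad in pretrain_grads.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Toy gradients (assumption: real ones would come from backpropagation through the LM).
icl_gradient = [1.0, 0.0]
pretraining_gradients = {
    "pretraining data 1": [0.9, 0.1],
    "pretraining data 2": [-1.0, 0.2],
}

ranking = rank_by_gradient_similarity(icl_gradient, pretraining_gradients)
print(ranking)
```

With these toy vectors, "pretraining data 1" ranks first, matching the situation drawn in the figure.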
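  20. Impressions (off-the-cuff remarks) (p. 24)
      On LLMs:
      • LLM reasoning ability will remain a major challenge
        – LLMs are trained on so much data that at first glance they appear to generalize
        – In fact, there are scattered cases where they fail to generalize to data they were not trained on (e.g., adding numbers with many digits)
        – As LLMs are applied to more advanced activities (such as research), weak reasoning will be the bottleneck
      • The question of what to train on to improve reasoning will become even more important
        – At present, prompt engineering (hints supplied by humans) is the mainstream way to improve reasoning
        – How can LLMs acquire meta-level know-how such as "decompose a complex problem into smaller problems"?
        – We need to understand what training gives rise to each of the abilities LLMs have
      On Japan:
      • Japan's share of submissions has fallen, while South Korea's presence stands out
        – Although the figures are not directly comparable, Japan slipped from 5th in submissions at ACL 2019 to 9th-10th in submitting authors at ACL 2023
        – South Korea stands out instead (8th in submissions at ACL 2019 → 3rd in submitting authors at ACL 2023)
        – Looking at Korean affiliations, collaborations with KAIST, PIs who returned from overseas, LG, and others stand out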