LLMによる日本語ニュース記事の平易化 / Japanese News Articles Simplification via Large Language Models

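These slides were presented by Urakawa of The Asahi Shimbun Company at "ChatGPT/OpenAI API/LLM Use Cases: A Public Joint Study Session by NewsPicks and The Asahi Shimbun", held on 2023/04/27. They are uploaded here with minor revisions.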

https://uzabase-tech.connpass.com/event/280301/

Transcript

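  1. The Asahi Shimbun has text simplification data

    Japanese language teachers simplified Asahi Shimbun articles, mainly for non-native speakers (about [count not preserved in the transcript] sentence pairs). Examples, translated from Japanese; the trg side is written with spaces between phrases:
    {src: "Both had their heads down and their eyes shut in the back seat of the vehicle.", trg: "Both of them were looking down with their eyes closed in the back seat of the car."}
    {src: "It turned out that if the current situation continues, the burden will increase by a total of about 20 billion yen.", trg: "It became clear that if things stay as they are, as much as about 20 billion yen more in total will have to be paid."}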
  2. Text simplification with LLMs

    Sentence Simplification via Large Language Models. Yutao Feng, Jipeng Qiang, Yun Li, Yunhao Yuan, and Yi Zhu. College of Information Engineering, Yangzhou University. {jpqiang, liyun, yhyuan, zhuyi}@yzu.edu.cn. arXiv:2302.11957v1 [cs.CL], 23 Feb 2023. https://arxiv.org/abs/2302.11957

    Abstract: Sentence Simplification aims to rephrase complex sentences into simpler sentences while retaining original meaning. Large Language models (LLMs) have demonstrated the ability to perform a variety of natural language processing tasks. However, it is not yet known whether LLMs can be served as a high-quality sentence simplification system. In this work, we empirically analyze the zero-/few-shot learning ability of LLMs by evaluating them on a number of benchmark test sets. Experimental results show LLMs outperform state-of-the-art sentence simplification methods, and are judged to be on a par with human annotators.

    1 Introduction: Sentence Simplification (SS) is a task of rephrasing a sentence into a new form that is easier to read and understand while retaining its meaning, which can be used for increasing accessibility for people with dyslexia (Rello et al., 2013), autism (Evans et al., 2014) [...] et al., 2020; Thoppilan et al., 2022; Chowdhery et al., 2022). Nevertheless, it remains unclear how LLMs perform in SS task compared to current SS methods. To address this gap in research, we undertake a systematic evaluation of the Zero-/Few-Shot learning capability of LLMs, by assessing their performance on existing SS benchmarks. We carry out an empirical comparison of the performance of ChatGPT and the most advanced GPT3.5 model (text-davinci-003). To the best of our knowledge, this is the first study of LLMs’s capabilities on SS task, aiming to provide a preliminary evaluation, including simplification prompt, multilingual simplification, and simplification robustness. The key findings and insights are summarized as follows: (1) GPT3.5 or ChatGPT based on one-shot learning outperform the state-of-the-art SS methods. We found that these models excel at deleting non-essential information and adding new information, while existing supervised SS methods tend to preserve the content without change. (2) ChatGPT is a monolithic model capable of supporting multiple languages, which makes it a comprehensive multilingual text simplification technique. After evaluating the performance of ChatGPT on the task [...]
  3. Text simplification with LLMs: Sentence Simplification via Large Language Models (Feng et al., arXiv:2302.11957v1 [cs.CL], 23 Feb 2023)

    Annotations over the first page of the paper:
    Article simplification with LLMs (GPT3.5, ChatGPT)
    Prompt: "Make it simple without changing the meaning"
    High accuracy in both automatic and human evaluation
  4. Prompt

    Instruction (shared by all three variants): I want you to replace my complex sentence with simple sentence(s). Keep the meaning same, but make them simpler. Output should be in Japanese, with spaces between morphemes.

    Zero-Shot: the instruction, followed by
    Complex: {Input} Simple:

    Single-Shot: the instruction, followed by
    Complex: {Complex Sentence} Simple: {Simple Sentence(s)}
    Complex: {Input} Simple:

    Three-Shot: the instruction, followed by
    Complex: {Complex Sentence} Simple: {Simple Sentence(s)}
    Complex: {Complex Sentence} Simple: {Simple Sentence(s)}
    Complex: {Complex Sentence} Simple: {Simple Sentence(s)}
    Complex: {Input} Simple:
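  5. Prompt: Three-Shot (Retrieval)

    I want you to replace my complex sentence with simple sentence(s). Keep the meaning same, but make them simpler. Output should be in Japanese, with spaces between morphemes.
    (The sentences below are Japanese on the slide and are translated here.)
    Complex: This is reportedly because material prices soared due to the weak yen and other factors.
    Simple: This is said to be because material prices went up due to the weak yen and other factors.
    Complex: Effects such as soaring raw material prices are reportedly being seen.
    Simple: It is thought that things such as raw material prices going up are having an effect.
    Complex: The rapid depreciation of the yen also swelled overseas sales, pushing up business results.
    Simple: Overseas sales growing larger because of the quickly weakening yen also raised business results.
    Complex: However, construction materials such as steel and lumber are largely imported, and with the effects of Russia's invasion of Ukraine compounded by the weak yen, material prices have soared.
    Simple:

    Diagram: Input (the last Complex sentence above) is matched against Knowledge (embedding representations of the training data). The examples whose cosine distance to the input is smallest are supplied as the few-shot examples.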
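  6. Actual generations

    Rows on the slide: Input, BART, Zero-Shot, Single-Shot, Three-Shot, Three-Shot (Retrieval), Reference. The texts follow in transcript order, translated from Japanese; the transcript does not preserve which row each text belongs to.
    - Immediately, the man addressed all the participants: "If anyone understood this explanation, please raise your hand." But no one raised a hand. (written without spaces)
    - The man asked the participants, "Did you understand?" But no hands went up. (written without spaces)
    - The man called out to all the participants, "If anyone understood this explanation, please raise your hand." However, no one raised a hand.
    - The man right away called out to everyone, "Anyone who understood this explanation, please raise your hand," but no one raised a hand.
    - Immediately, the man turned to everyone and called out, "Anyone who understood this explanation, please raise your hand." However, no one raised their hand. (polite form)
    - Immediately, the man addressed all the participants: "If anyone understood this explanation, please raise your hand." But no one raised a hand. (polite form, with spaces)
    - Then, right away, the man turned to all of the participants and called out, "If anyone understood today's explanation, please put your hand up." But no one put a hand up.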
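  7. Actual generations

    I want you to replace my complex sentence with simple sentence(s). Keep the meaning same, but make them simpler. Output should be in Japanese, with spaces between morphemes.
    (The sentences below are Japanese on the slide and are translated here.)
    Complex: At the Japan-South Korea summit on the 13th as well, a policy of aiming for an early resolution was confirmed, but no clear path to an agreement is in sight.
    Simple: At the summit between Japan and South Korea on the 13th as well, the idea of wanting to resolve it soon was confirmed, but how to proceed in order to bring opinions together is not known.
    Complex: The intention was announced to treat four expense items, including "international cooperation with like-minded countries", as "expenses that contribute to strengthening the comprehensive defense posture", but bewildered voices are being raised within the Ministry of Defense asking "What is international cooperation?"
    Simple: It was indicated that four expenses, including "international cooperation with friendly countries", would be treated as "money spent on strengthening the comprehensive defense posture". However, voices from the Ministry of Defense said they do not understand "what international cooperation is".
    Complex: On the other hand, he also said, "It is not that we will not talk with Ishin (in the future)," hinting at a lingering attachment to cooperation.
    Simple: On the other hand, he also said, "It is not that we will not talk with Ishin (from now on)," and showed a strong feeling of having wanted to work together.
    Complex: However, there is no sign that "international cooperation" was discussed as a major point of debate.
    Simple:

    Three-Shot (Retrieval): However, there is no sign that they talked about "international cooperation" as a main topic. (polite form)
    Three-Shot: However, there is no sign that "international cooperation" was discussed as a major point of debate. (wording unchanged, plain form)
    Reference: However, there is no sign that "international cooperation" was discussed as a main thing to talk about. (polite form)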
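  8. A newspaper in "all kinds of easy Japanese", made with Zero-Shot

    (1) Headline generation
    (2) Stepwise simplification
    (3) The reverse: making text harder
    (4) Prompt generation for four-panel comics
    (5) Extracting and explaining difficult words
    (6) Generating explanations in dialogue form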
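
The zero-, single- and three-shot prompts on the "Prompt" slide differ only in how many Complex/Simple example pairs precede the input. A minimal sketch of assembling them follows. The function name `build_prompt` and the exact line breaks are assumptions for illustration; the slide shows the field order but not the whitespace, and it does not show how the prompt is sent to a model.

```python
from typing import Sequence, Tuple

# Instruction text as it appears on the prompt slide.
INSTRUCTION = (
    "I want you to replace my complex sentence with simple sentence(s). "
    "Keep the meaning same, but make them simpler. "
    "Output should be in Japanese, with spaces between morphemes."
)


def build_prompt(source: str, examples: Sequence[Tuple[str, str]] = ()) -> str:
    """Assemble a zero-/few-shot simplification prompt (hypothetical helper).

    `examples` holds (complex, simple) pairs: no pairs gives the zero-shot
    form, one pair the single-shot form, three pairs the three-shot form.
    """
    lines = [INSTRUCTION]
    for complex_sentence, simple_sentence in examples:
        lines.append(f"Complex: {complex_sentence}")
        lines.append(f"Simple: {simple_sentence}")
    # The input goes last, with an open "Simple:" slot for the model to fill.
    lines.append(f"Complex: {source}")
    lines.append("Simple:")
    return "\n".join(lines)


if __name__ == "__main__":
    # Placeholder strings, not real data from the slides.
    demo_pairs = [("<complex 1>", "<simple 1>")]
    print(build_prompt("<input sentence>", demo_pairs))
```

Keeping the examples as a plain sequence means the same function serves all three variants, and the retrieval variant only changes which pairs are passed in.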
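
The Three-Shot (Retrieval) slide describes picking few-shot examples by cosine distance between the input and the embedded training data. The sketch below illustrates that selection step under stated assumptions: the slides do not name the embedding model, so embeddings are taken as precomputed vectors, and `cosine_similarity` and `retrieve_examples` are hypothetical names. Ranking by highest cosine similarity is equivalent to ranking by smallest cosine distance.

```python
import math
from typing import List, Sequence, Tuple

# One knowledge entry: (embedding of the complex sentence, complex, simple).
Entry = Tuple[Sequence[float], str, str]


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve_examples(
    query_vec: Sequence[float], knowledge: List[Entry], k: int = 3
) -> List[Tuple[str, str]]:
    """Return the k (complex, simple) pairs closest to the query embedding."""
    ranked = sorted(
        knowledge,
        key=lambda entry: cosine_similarity(query_vec, entry[0]),
        reverse=True,  # most similar first
    )
    return [(complex_s, simple_s) for _, complex_s, simple_s in ranked[:k]]


if __name__ == "__main__":
    # Toy 2-dimensional vectors standing in for real sentence embeddings.
    toy_knowledge: List[Entry] = [
        ([1.0, 0.0], "complex A", "simple A"),
        ([0.9, 0.1], "complex B", "simple B"),
        ([0.0, 1.0], "complex C", "simple C"),
        ([0.7, 0.7], "complex D", "simple D"),
    ]
    for pair in retrieve_examples([1.0, 0.05], toy_knowledge, k=3):
        print(pair)
```

The returned pairs can then be placed in the Complex/Simple slots of the three-shot prompt ahead of the input sentence.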