Paper Introduction: Can ChatGPT Solve Information Extraction Tasks? (Is information extraction solved by ChatGPT? An analysis of performance, evaluation criteria, robustness and errors)

Shota Kato
July 11, 2023

Transcript

  1. Paper introduction: Can ChatGPT solve information extraction tasks?
    Is information extraction solved by ChatGPT?
    An analysis of performance, evaluation criteria, robustness and errors
    Ridong Han, Tao Peng, Chaohao Yang, Benyou Wang, Lu Liu, and Xiang Wan
    Shota Kato
    [email protected]
    Department of Informatics, Graduate School of Informatics, Kyoto University
    Systems Science Course, Human Systems Laboratory
    2023/07/11, lab-wide seminar

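  2. Why did I choose this paper?
    - Information extraction is indispensable for realizing an AI that automatically builds physical models.
    - Applications that extract or summarize information from documents already exist,
      but it is not clear what they can and cannot do.
    Goal: share the current state of knowledge on the following questions
    - How are natural language processing tasks solved with ChatGPT?
    - What level of performance can be achieved?
    - What can it not do?
    - How should ChatGPT be used?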

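  3. Summary
    - The paper evaluates ChatGPT on 17 datasets covering 14 information extraction subtasks
      and shows that it does not reach the state of the art (SOTA) of existing methods.
    - It investigates how performance depends on ChatGPT's inputs and outputs and on the datasets.
    - Few-shot in-context learning improves performance but still falls short of SOTA.
    - The performance gap between chain of thought and few-shot in-context learning was small.
    - ChatGPT almost never produces output unrelated to the given input.
    - ChatGPT's performance drops sharply when the input contains irrelevant context
      or when the target types are rare (long-tail).
    - ChatGPT's outputs tended to be longer than the annotated spans.
    - Most of ChatGPT's errors were caused by extracting parts that are not annotated.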

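  4. Information extraction tasks
    Techniques that extract, from natural language text, the information needed to fill predefined slots.
    Example: named entity recognition (NER),
    the task of extracting named entities such as person names and locations from text.
    https://github.com/zliucr/CrossNER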

  5. Information extraction tasks covered in the paper
    - Named entity recognition (NER)
    - Relation extraction (RE)
    - Event extraction (EE)
    - Aspect-based sentiment analysis (ABSA)
    Examples of relations in the relation extraction dataset CoNLL04 [Roth&Yih,04]:
    Relation    Entity1  Entity2  Example
    located_in  loc      loc      (New York, US)
    work_for    per      org      (Bill Gates, Microsoft)
    live_in     per      loc      (Bush, US)
    kill        per      per      (Oswald, JFK)

  6. Background | Large language models (LLMs)
    [Figure: timeline of large language models released from 2019 to 2023, with publicly available models marked. Source: [Zhao+,23]]

  7. Prompts for large language models
    With large language models, high performance can be achieved just by designing the input (prompt) carefully.
    - In-context learning (ICL) [Brown+,20]
    - Chain-of-thought prompting (CoT) [Wei+,22]
    Standard Prompting
      Model Input
        Q: Roger has 5 tennis balls. He buys 2 more cans of tennis balls. Each can has 3 tennis balls.
           How many tennis balls does he have now?
        A: The answer is 11.
        Q: The cafeteria had 23 apples. If they used 20 to make lunch and bought 6 more,
           how many apples do they have?
      Model Output
        A: The answer is 27.
    Chain-of-Thought Prompting
      Model Input
        Q: Roger has 5 tennis balls. He buys 2 more cans of tennis balls. Each can has 3 tennis balls.
           How many tennis balls does he have now?
        A: Roger started with 5 balls. 2 cans of 3 tennis balls each is 6 tennis balls.
           5 + 6 = 11. The answer is 11.
        Q: The cafeteria had 23 apples. If they used 20 to make lunch and bought 6 more,
           how many apples do they have?
      Model Output
        A: The cafeteria had 23 apples originally. They used 20 to make lunch. So they had 23 - 20 = 3.
           They bought 6 more apples, so they have 3 + 6 = 9. The answer is 9.
    [Brown+,20]
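
The difference between the two prompting styles on this slide can be sketched in a few lines of Python. This is only an illustration: the `Demo` class and `build_prompt` function are hypothetical names introduced here, and the "Q:/A:" layout simply follows the example shown on the slide.

```python
from dataclasses import dataclass


@dataclass
class Demo:
    """One worked example shown to the model before the real question."""
    question: str
    rationale: str  # intermediate reasoning, used only for chain of thought
    answer: str


def build_prompt(demos: list[Demo], question: str, chain_of_thought: bool) -> str:
    """Concatenate the demonstrations and the new question into one prompt."""
    blocks = []
    for demo in demos:
        if chain_of_thought:
            # CoT: the demonstration answer spells out the reasoning steps.
            reply = f"{demo.rationale} The answer is {demo.answer}."
        else:
            # Standard few-shot: the demonstration shows only the final answer.
            reply = f"The answer is {demo.answer}."
        blocks.append(f"Q: {demo.question}\nA: {reply}")
    # The last question is left unanswered for the model to complete.
    blocks.append(f"Q: {question}\nA:")
    return "\n\n".join(blocks)


demo = Demo(
    question="Roger has 5 tennis balls. He buys 2 more cans of tennis balls. "
             "Each can has 3 tennis balls. How many tennis balls does he have now?",
    rationale="Roger started with 5 balls. 2 cans of 3 tennis balls each is "
              "6 tennis balls. 5 + 6 = 11.",
    answer="11",
)
target = ("The cafeteria had 23 apples. If they used 20 to make lunch and "
          "bought 6 more, how many apples do they have?")

standard_prompt = build_prompt([demo], target, chain_of_thought=False)
cot_prompt = build_prompt([demo], target, chain_of_thought=True)
print(cot_prompt)
```

The only change between the two prompts is whether the demonstration answer includes the reasoning; the question given to the model is identical.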

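  8. Purpose of the paper
    Evaluate ChatGPT's performance on information extraction tasks
    from four perspectives:
    - Performance
      - 17 datasets related to 14 information extraction tasks
      - Three kinds of prompts per dataset: zero-shot, few-shot ICL, and few-shot CoT
    - Evaluation criteria
    - Robustness
    - Causes of errors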

  9. Target tasks and datasets | NER / RE
    At least four datasets are selected for each task; 14 subtasks and 17 datasets are used in total.
    - Named entity recognition (NER)
      - Flat Entity Recognition (NER-Flat): targets entities without hierarchical structure.
        {CoNLL03 [Sang&Meulder,03], FewNERD [Ding+,21]}
      - Nested Entity Recognition (NER-Nested): targets entities with hierarchical structure.
        {ACE04 [Doddington+,04], ACE05-Ent [Walker+,06], GENIA [Ohta+,02]}
    - Relation extraction (RE)
      - Relation Classification (RE-RC): identifies the relation between entities in a text.
        {CoNLL04 [Roth&Yih,04], NYT-multi [Zeng+,18], TACRED [Zhang+,17],
        SemEval2010 [Hendrickx+,10]}
      - Relational Triplet Extraction (RE-Triplet): identifies entities and extracts their relations at the same time.
        {CoNLL04 [Roth&Yih,04], NYT-multi [Zeng+,18], SemEval2010 [Hendrickx+,10]}
    - Event extraction (EE), aspect-based sentiment analysis (ABSA)
    Note: braces { } enclose dataset names.
    Details of EE and ABSA are omitted.

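  10. Prompt design
    Each prompt contains five elements:
    the task instruction, the candidate labels (the kinds of information to extract, such as entity or relation types),
    a description of the output format, output examples (used for ICL and CoT), and the input text (the sentence to extract information from).
    [Han+,23]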
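
As a rough sketch of how the five elements could be assembled into one prompt string: the function name `build_ie_prompt`, the section headings, and the example wording below are assumptions made for illustration, not the actual prompts from [Han+,23].

```python
def build_ie_prompt(instruction, candidate_labels, output_format,
                    input_text, demonstrations=()):
    """Assemble a prompt from the five elements listed on the slide.

    demonstrations is a sequence of (text, expected_output) pairs; it stays
    empty for zero-shot prompts and is filled for ICL / CoT prompts.
    """
    parts = [
        instruction,                                          # 1. task instruction
        "Candidate labels: " + ", ".join(candidate_labels),   # 2. candidate labels
        "Output format: " + output_format,                    # 3. output format
    ]
    if demonstrations:                                        # 4. output examples
        demo_lines = [f"Text: {text}\nOutput: {output}"
                      for text, output in demonstrations]
        parts.append("Examples:\n" + "\n".join(demo_lines))
    parts.append(f"Text: {input_text}\nOutput:")              # 5. input text
    return "\n\n".join(parts)


# Hypothetical wording; only the label set follows the CoNLL03 entity types.
labels = ["LOC", "ORG", "PER", "MISC"]
zero_shot = build_ie_prompt(
    instruction="Extract all named entities from the given text.",
    candidate_labels=labels,
    output_format='a list of ["entity", "type"] pairs',
    input_text="Bill Gates works for Microsoft.",
)
few_shot = build_ie_prompt(
    instruction="Extract all named entities from the given text.",
    candidate_labels=labels,
    output_format='a list of ["entity", "type"] pairs',
    input_text="Bill Gates works for Microsoft.",
    demonstrations=[("Bush lives in the US.", '[["Bush", "PER"], ["US", "LOC"]]')],
)
print(few_shot)
```

Keeping the elements as separate arguments makes it easy to vary one of them (for example the instruction wording) while holding the others fixed.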


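  12. Experimental setup | Creating three kinds of prompts
    Zero-shot
      Five different zero-shot prompts are designed by hand.
      Because ChatGPT's performance depends on the input, the mean and standard deviation
      of performance over the five prompts are used for comparison.
    Few-shot in-context learning
      Five samples drawn at random from the training set are added
      to the prompt that achieved the best performance in the zero-shot experiment.
      Because the choice of samples affects performance,
      the random sample selection is run five times and the mean and standard deviation are compared.
    Few-shot chain-of-thought prompting
      Chain-of-thought explanations are added to the prompt used for in-context learning.
      The explanations are written by hand with the help of ChatGPT.
      The mean and standard deviation of performance over five runs are compared.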
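
The "sample five demonstrations at random, repeat five times, report mean and standard deviation" protocol can be sketched as follows. The names `run_trials` and `dummy_evaluate` are hypothetical, the evaluation function is a stand-in for an actual ChatGPT run, and the use of the sample standard deviation (`statistics.stdev`) rather than the population one is an assumption, since the slides do not say which was used.

```python
import random
import statistics
from typing import Callable, Sequence


def run_trials(
    train_set: Sequence[str],
    evaluate: Callable[[list[str]], float],
    k: int = 5,
    n_trials: int = 5,
) -> tuple[float, float]:
    """Repeat random demonstration selection and summarize the scores."""
    scores = []
    for seed in range(n_trials):
        rng = random.Random(seed)                # one reproducible draw per trial
        demos = rng.sample(list(train_set), k)   # k distinct training samples
        scores.append(evaluate(demos))
    # Sample standard deviation; swap in statistics.pstdev for the population one.
    return statistics.mean(scores), statistics.stdev(scores)


def dummy_evaluate(demos: list[str]) -> float:
    """Placeholder score: a real run would build the prompt and query the model."""
    return float(len(set(demos)))


train_set = [f"training example {i}" for i in range(100)]
mean_score, std_score = run_trials(train_set, dummy_evaluate)
print(mean_score, std_score)
```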

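  13. Experimental setup | Other conditions
    - The API provided by OpenAI is used.
    - To remove the influence of dialogue history, a response is generated separately for each test sample.
    - The number of samples in each test set is capped at 3,000
      (with more test samples the API spending limit would be reached).
      Earlier studies evaluating ChatGPT [Jiao+,23; Wei+,23] used 20–30 samples.
      [Reference]
      API cost of running one prompt on CoNLL03: $0.79
      With five prompts and three input settings (zero-shot, ICL, CoT),
      the API cost of reproducing the CoNLL03 results is $0.79 x 15 = $11.85.
      There are 49 (subtask, dataset) combinations in total.
      Approximate API cost of reproducing all experiments: $11.85 x 49 = $580.65
    - Micro-F1 (equal to accuracy) is compared against the existing state of the art (SOTA).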
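
The cost estimate on this slide is plain multiplication and can be checked as below. The figures are the ones given on the slide; the assumption, as in the slide's own rough estimate, is that every (subtask, dataset) pair costs about as much as CoNLL03. `Decimal` is used only to avoid floating-point rounding in the dollar amounts.

```python
from decimal import Decimal

cost_per_run = Decimal("0.79")  # CoNLL03 with one prompt (from the slide)
n_prompts = 5                   # prompt variants / repeated runs
n_settings = 3                  # zero-shot, ICL, CoT
n_pairs = 49                    # (subtask, dataset) combinations

# Cost of reproducing all runs for one dataset.
cost_per_dataset = cost_per_run * n_prompts * n_settings
# Rough total, assuming every pair costs the same as CoNLL03.
total_cost = cost_per_dataset * n_pairs

print(f"per dataset: ${cost_per_dataset}, all experiments: ${total_cost}")
```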

  14. Results | NER
    - There is a large gap between ChatGPT and SOTA.
    - The harder the task or scenario, the larger the performance gap.
      - Task difficulty ≈ number of types
      - Scenario difficulty ≈ complexity of the task itself (NER-Nested is a harder scenario than NER-Flat)
    - Few-shot ICL improved performance, but the additional gain from CoT was small.
    Task        Dataset    SOTA  zero-shot  zero-shot     5-shot ICL    5-shot CoT
                                 max        mean (std)    mean (std)    mean (std)
    NER-Flat    CoNLL03    94.6  65.13      60.10 (3.81)  70.53 (1.44)  74.73 (1.08)
                FewNERD    67.1  34.28      31.56 (2.44)  36.87 (0.71)  46.55 (0.64)
    NER-Nested  ACE04      88.5  29.55      27.80 (3.10)  38.52 (2.51)  40.57 (1.83)
                ACE05-Ent  87.5  24.77      23.38 (1.92)  36.17 (1.78)  33.98 (0.69)
                GENIA      81.5  39.43      38.09 (1.65)  48.82 (1.31)  50.89 (1.00)
    Note: the table is based on Table 1 of the paper; the names of the SOTA methods are omitted.

  15. Results | RE
    - There is a large gap between ChatGPT and SOTA.
    - The harder the task or scenario, the larger the performance gap.
    - Few-shot ICL improved performance, but the additional gain from CoT was small.
    Task        Dataset      SOTA  zero-shot  zero-shot     5-shot ICL    5-shot CoT
                                   max        mean (std)    mean (std)    mean (std)
    RE-RC       CoNLL04            65.82      59.21 (3.85)  55.32 (4.56)  -
                NYT-multi    93.5  38.74      30.96 (5.51)  26.88 (2.74)  -
                TACRED       75.6  21.58      19.47 (1.49)  27.84 (3.48)  -
                SemEval2010  91.3  43.32      39.27 (2.20)  39.44 (2.55)  -
    RE-Triplet  CoNLL04      78.8  23.04      17.84 (3.43)  23.30 (1.29)  11.09 (4.83)
                NYT-multi    86.8  3.79       3.48 (0.24)   12.24 (0.59)  2.33 (1.64)
                SemEval2010  73.2  7.65       5.82 (1.29)   12.85 (1.14)  -
    Note: the table is based on Table 1 of the paper; the names of the SOTA methods are omitted.

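  16. Purpose of the paper (recap)
    Evaluate ChatGPT's performance on information extraction tasks
    from four perspectives:
    - Performance
      - 17 datasets related to 14 information extraction tasks
      - Three kinds of prompts per dataset: zero-shot, few-shot ICL, and few-shot CoT
    - Evaluation criteria
    - Robustness
    - Causes of errors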

  17. Discussion | Evaluation criteria
    - ChatGPT tended to output spans longer than the gold spans.
    - When evaluated with soft matching, F1 improved by up to 14.53 points.
      Soft matching: a prediction is counted as correct if it contains the gold span
      and the similarity between the gold span and the prediction is above a threshold.
    Examples of annotated spans and predictions in NER (excerpt from Table 2):
    Annotated spans         Predicted spans
    PGA Europro Tour        2021 PGA Europro Tour
    University of Michigan  The University of Michigan
    Difference between soft matching (Soft) and the original criterion (Hard) (excerpt from Table 3):
    Task        Dataset    SOTA  Hard   Soft   ΔF1 (%)
    NER-Flat    CoNLL03    94.6  60.10  62.12  +2.02 (3.4%)
    NER-Nested  ACE05-Ent  87.5  23.38  33.97  +10.59 (45.3%)
    RE-Triplet  CoNLL04    78.8  17.84  24.75  +6.91 (38.7%)
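
A minimal sketch of hard versus soft matching, following the definition on this slide. Several details are assumptions made for illustration and are not taken from the paper: similarity is computed with `difflib.SequenceMatcher`, the threshold is 0.5, entity types are ignored, and each gold span can be matched at most once.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Character-level similarity in [0, 1] (an assumed choice of measure)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def is_match(pred: str, gold: str, soft: bool, threshold: float) -> bool:
    """Hard: exact string match. Soft: pred contains gold and is similar enough."""
    if pred == gold:
        return True
    return soft and gold in pred and similarity(pred, gold) > threshold


def micro_f1(predictions, golds, soft=False, threshold=0.5):
    """Micro-averaged F1 over sentences; each argument is a list of span lists."""
    tp = n_pred = n_gold = 0
    for preds, gold_spans in zip(predictions, golds):
        n_pred += len(preds)
        n_gold += len(gold_spans)
        unmatched = list(gold_spans)      # each gold span may be used once
        for pred in preds:
            for gold in unmatched:
                if is_match(pred, gold, soft, threshold):
                    tp += 1
                    unmatched.remove(gold)
                    break                 # stop iterating after the removal
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_gold if n_gold else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# The two span pairs from the Table 2 excerpt above.
golds = [["PGA Europro Tour"], ["University of Michigan"]]
preds = [["2021 PGA Europro Tour"], ["The University of Michigan"]]

hard_f1 = micro_f1(preds, golds, soft=False)
soft_f1 = micro_f1(preds, golds, soft=True)
print(hard_f1, soft_f1)
```

On these two examples every prediction is slightly longer than its gold span, so hard matching rejects both while soft matching accepts both, which is the effect the slide describes.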

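  18. Discussion | Robustness (output and input formats)
    - Samples whose output did not follow the specified format,
      or contained unintended content (for NER, types that were not specified),
      made up only a few percent of all samples.
    - Adding irrelevant text to the zero-shot prompt
      lowered performance on every task.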

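  19. Discussion | Robustness (type frequency and others)
    - Prediction performance on frequent types was higher
      than on infrequent types.
    - In relation extraction, swapping the order of the entities did not lower performance much,
      which indicates that ChatGPT does not understand the subject-object relation.
      Example: ✓; ✗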

  20. Discussion | Causes of errors
    Error classification results for NER-Flat (CoNLL03):
    Error type              #Error  Ratio (%)
    Missing spans           2,979   15.4
    Unmentioned spans       284     1.5
    Unannotated spans       6,361   32.9
    Incorrect span offsets  1,744   9.0
    Undefined types         883     4.6
    Incorrect types         4,296   22.2
    Other                   2,801   14.4
    Total                   19,348  100
    - Errors caused by outputting unannotated spans are the most frequent.
    - The three most frequent error types (unannotated spans, incorrect types, and missing spans)
      account for about 70% of all errors.
    - Does this suggest that the annotation quality of the datasets
      may be problematic?
      → Might ChatGPT be useful for annotation?
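
The "about 70%" figure can be rechecked from the counts in the table with a short script. The counts are copied from the table; the variable names are introduced here for illustration. Ratios recomputed this way may differ from the printed ones in the last digit because of rounding.

```python
error_counts = {
    "Missing spans": 2979,
    "Unmentioned spans": 284,
    "Unannotated spans": 6361,
    "Incorrect span offsets": 1744,
    "Undefined types": 883,
    "Incorrect types": 4296,
    "Other": 2801,
}

total = sum(error_counts.values())

# Share of each error type in percent.
ratios = {name: 100 * count / total for name, count in error_counts.items()}

# The three most frequent error types and their combined share.
top3 = sorted(error_counts, key=error_counts.get, reverse=True)[:3]
top3_share = sum(ratios[name] for name in top3)

for name in top3:
    print(f"{name}: {ratios[name]:.1f}%")
print(f"top three combined: {top3_share:.1f}% of {total} errors")
```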

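  21. Summary of the paper
    - The paper evaluates ChatGPT on 17 datasets covering 14 information extraction subtasks
      and shows that it does not reach the state of the art (SOTA) of existing methods.
    - It investigates how performance depends on ChatGPT's inputs and outputs and on the datasets.
    - Few-shot in-context learning improves performance but still falls short of SOTA.
    - The performance gap between chain of thought and few-shot in-context learning was small.
    - ChatGPT almost never produces output unrelated to the given input.
    - ChatGPT's performance drops sharply when the input contains irrelevant context
      or when the target types are rare (long-tail).
    - ChatGPT's outputs tended to be longer than the annotated spans.
    - Most of ChatGPT's errors were caused by extracting parts that are not annotated.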

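  22. Research on information extraction with ChatGPT (1) [Polak&Morgan,23]
    - Proposes a method that uses conversational LLMs such as ChatGPT to extract material property data
      (material name, value, unit) from papers in HTML format.
    - Proposes an extraction flow that first identifies passages containing data and then branches
      depending on whether a passage contains multiple data points.
    - The proposed method achieved both precision and recall of 90% or higher.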

  23. Research on information extraction with ChatGPT (2) [Wei+,23]
    Proposes ChatIE, a zero-shot framework for performing information extraction tasks with LLMs.

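  24. Conclusion
    - How are natural language processing tasks solved with ChatGPT?
      - Prompt design (in-context learning, chain of thought, few-shot, noise removal, etc.)
      - Framework design (splitting a task into multiple stages, defining a flowchart)
    - What level of performance can be achieved? What can it not do?
      - For general domains or simple tasks, it may match or exceed the best existing methods.
      - For specific domains or complex tasks, it falls short of existing methods even with the techniques above.
    - How should ChatGPT be used?
      - Break complex tasks into smaller ones.
      - Do not try to obtain many answers at once.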

  25. References
    [Han+,23] Ridong Han, Tao Peng, Chaohao Yang, Benyou Wang, Lu Liu, and Xiang Wan. Is Information
    Extraction Solved by ChatGPT? An Analysis of Performance, Evaluation Criteria, Robustness and Errors. arXiv
    preprint arXiv:2305.14450, 2023.
    [Roth&Yih,04] Dan Roth and Wen-tau Yih. A Linear Programming Formulation for Global Inference in Natural
    Language Tasks. CoNLL-2004 at HLT-NAACL, pp. 1–8, 2004.
    [Zhao+,23] Wayne Xin Zhao, Kun Zhou, Junyi Li, Tianyi Tang, Xiaolei Wang, Yupeng Hou, Yingqian Min, Beichen
    Zhang, Junjie Zhang, Zican Dong, Yifan Du, Chen Yang, Yushuo Chen, Zhipeng Chen, Jinhao Jiang, Ruiyang
    Ren, Yifan Li, Xinyu Tang, Zikang Liu, Peiyu Liu, Jian-Yun Nie, and Ji-Rong Wen. A Survey of Large Language
    Models. arXiv preprint arXiv:2303.18223, 2023.
    [Brown+,20] Tom B. Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared Kaplan, Prafulla Dhariwal,
    Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, Sandhini Agarwal, Ariel Herbert-Voss,
    Gretchen Krueger, Tom Henighan, Rewon Child, Aditya Ramesh, Daniel M. Ziegler, Jeffrey Wu, Clemens Winter,
    Christopher Hesse, Mark Chen, Eric Sigler, Mateusz Litwin, Scott Gray, Benjamin Chess, Jack Clark, Christopher
    Berner, Sam McCandlish, Alec Radford, Ilya Sutskever, and Dario Amodei. Language Models are Few-Shot
    Learners. NeurIPS, pp. 1877–1901, 2020.

  26. References
    [Wei+,22] Jason Wei, Xuezhi Wang, Dale Schuurmans, Maarten Bosma, Brian Ichter, Fei Xia, Ed Chi, Quoc Le,
    and Denny Zhou. Chain-of-Thought Prompting Elicits Reasoning in Large Language Models. arXiv preprint
    arXiv:2201.11903, 2022.
    [Sang&Meulder,03] Erik F. Tjong Kim Sang and Fien De Meulder. Introduction to the CoNLL-2003 Shared Task:
    Language-Independent Named Entity Recognition. HLT-NAACL, pp. 142–147, 2003.
    [Ding+,21] Ning Ding, Guangwei Xu, Yulin Chen, Xiaobin Wang, Xu Han, Pengjun Xie, Haitao Zheng, and Zhiyuan
    Liu. Few-NERD: A Few-shot Named Entity Recognition Dataset. ACL-IJCNLP, pp. 3198–3213, 2021.
    [Doddington+,04] George Doddington, Alexis Mitchell, Mark Przybocki, Lance Ramshaw, Stephanie Strassel, and
    Ralph Weischedel. The Automatic Content Extraction (ACE) Program – Tasks, Data, and Evaluation. LREC, 2004.
    [Walker+,06] Christopher Walker, Stephanie Strassel, Julie Medero, and Kazuaki Maeda. ACE 2005 Multilingual
    Training Corpus. LDC2006T06. Web Download. Philadelphia: Linguistic Data Consortium, 2006.
    [Ohta+,02] Tomoko Ohta, Yuka Tateisi, and Jin-Dong Kim. The GENIA corpus: an annotated research abstract
    corpus in molecular biology domain. HLT, pp. 82–86, 2002.
    [Zeng+,18] Xiangrong Zeng, Daojian Zeng, Shizhu He, Kang Liu, and Jun Zhao. Extracting Relational Facts by an
    End-to-End Neural Model with Copy Mechanism. ACL, pp. 506–514, 2018.

  27. References
    [Zhang+,17] Yuhao Zhang, Victor Zhong, Danqi Chen, Gabor Angeli, and Christopher D. Manning. Position-aware
    Attention and Supervised Data Improve Slot Filling. EMNLP, pp. 35–45, 2017.
    [Hendrickx+,10] Iris Hendrickx, Su Nam Kim, Zornitsa Kozareva, Preslav Nakov, Diarmuid Ó Séaghdha,
    Sebastian Padó, Marco Pennacchiotti, Lorenza Romano, and Stan Szpakowicz. SemEval-2010 Task 8: Multi-Way
    Classification of Semantic Relations between Pairs of Nominals. SemEval, pp. 33–38, 2010.
    [Jiao+,23] Wenxiang Jiao, Wenxuan Wang, Jen-tse Huang, Xing Wang, and Zhaopeng Tu. Is ChatGPT A Good
    Translator? Yes With GPT-4 As The Engine. arXiv preprint arXiv:2301.08745, 2023.
    [Wei+,23] Xiang Wei, Xingyu Cui, Ning Cheng, Xiaobin Wang, Xin Zhang, Shen Huang, Pengjun Xie, Jinan Xu,
    Yufeng Chen, Meishan Zhang, Yong Jiang, and Wenjuan Han. Zero-Shot Information Extraction via Chatting with
    ChatGPT. arXiv preprint arXiv:2302.10205, 2023.
    [Polak&Morgan,23] Maciej P Polak and Dane Morgan. Extracting Accurate Materials Data from Research Papers
    with Conversational Language Models and Prompt Engineering – Example of ChatGPT. arXiv preprint
    arXiv:2303.05352, 2023.

  28. Appendix

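  29. Aspect-based sentiment analysis task
    Details differ between datasets, but broadly the following three tasks are solved.
    1. Estimate the aspect categories of each sentence
       - "The pizza is delicious but the price is high"
         => categories: FOOD#QUALITY, FOOD#PRICE
    2. Extract the entity for each aspect category of each sentence
       - "The pizza is delicious but the price is high" + categories: FOOD#QUALITY, FOOD#PRICE
         => entities: FOOD#QUALITY = pizza, FOOD#PRICE = price
    3. Estimate the polarity for each aspect category of each sentence
       - "The pizza is delicious but the price is high" + categories: FOOD#QUALITY, FOOD#PRICE
         => polarity: FOOD#QUALITY = ✓, FOOD#PRICE = ✗
    https://www.slideshare.net/takahirokubo7792/ss-96203329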

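  30. Target tasks | EE
    - Event extraction (EE)
      - Event detection (EE-Trigger): identifies the words or phrases that indicate the occurrence of an event
        and classifies them into the corresponding event types.
      - Event argument extraction (EE-Argument): recognizes the entities related to a given event
        and classifies their corresponding roles.
      - Trigger-argument joint extraction (EE-Joint): identifies event triggers, event types,
        and the mentions that fill their roles at the same time.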

  31. Target tasks | ABSA
    - Aspect-based sentiment analysis (ABSA)
      - Aspect extraction (ABSA-AE): extracts all aspects from a review.
      - Opinion extraction (ABSA-OE): extracts all opinions from a review.
      - Aspect-level sentiment classification (ABSA-ALSC): predicts the sentiment polarity of a given aspect
        in a review.
      - Aspect-oriented opinion extraction (ABSA-AOE): extracts the opinion paired with each aspect of a review.
      - Aspect extraction and sentiment classification (ABSA-AESC): extracts aspects and their corresponding
        sentiment polarities at the same time.
      - Pair extraction (ABSA-Pair): extracts aspects and their corresponding opinions at the same time.
      - Triplet extraction (ABSA-Triplet): extracts all aspects together with their corresponding opinions and sentiment polarities.

  32. Dataset details | NER
    - NER-Flat
      - CoNLL03 [Sang&Meulder,03]: (English data) 22,137 sentences, 301,418 tokens,
        four entity types (locations, organizations, persons, MISC (others))
      - FewNERD [Ding+,21]: 188,238 sentences, 4,601,160 words, 66 entity types
    - NER-Nested
      - ACE04
      - ACE05-Ent
      - GENIA