Paper introduction: Can ChatGPT solve information extraction tasks? Is information extraction solved by ChatGPT? An analysis of performance, evaluation criteria, robustness and errors

Shota Kato

July 11, 2023

Transcript

  1. Paper introduction: Can ChatGPT solve information extraction tasks?

    Is information extraction solved by ChatGPT? An analysis of performance, evaluation criteria, robustness and errors
    Ridong Han, Tao Peng, Chaohao Yang, Benyou Wang, Lu Liu, and Xiang Wan

    Shota Kato [email protected]
    Human Systems Laboratory, Systems Science Course, Department of Informatics, Graduate School of Informatics, Kyoto University
    2023/07/11, lab-wide seminar
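  2. Summary

    - The paper evaluates ChatGPT on 17 datasets spanning 14 information extraction subtasks and shows that it does not reach the state-of-the-art (SOTA) performance of existing methods.
    - It investigates how performance relates to ChatGPT's inputs and outputs and to the datasets.
    - Few-shot in-context learning improves performance but still falls short of SOTA.
    - The performance gap between chain of thought and few-shot in-context learning was small.
    - ChatGPT rarely produces output that is unrelated to the given input.
    - ChatGPT's performance drops sharply when the input contains irrelevant context or when the target types are rare (long-tail).
    - ChatGPT's outputs tended to be longer than the annotated spans.
    - Most of ChatGPT's errors came from extracting parts of the text that are not annotated.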
  3. Information extraction tasks covered in the paper

    - Named entity recognition (NER)
    - Relation extraction (RE)
    - Event extraction (EE)
    - Aspect-based sentiment analysis (ABSA)

    Examples of relations in the relation extraction dataset CoNLL04 [Roth&Yih,04]:

    | Relation   | Entity1 | Entity2 | Example                 |
    |------------|---------|---------|-------------------------|
    | located_in | loc     | loc     | (New York, US)          |
    | work_for   | per     | org     | (Bill Gates, Microsoft) |
    | live_in    | per     | loc     | (Bush, US)              |
    | kill       | per     | per     | (Oswald, JFK)           |
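The relation table above can be read as a small type schema: each relation fixes the entity types of its two arguments. The following is a minimal sketch under that reading; `RELATION_SCHEMA` and `is_valid_triple` are hypothetical names introduced for illustration and do not come from the paper or the slides.

```python
# Relation schema taken from the CoNLL04 examples in the table above.
# Maps relation -> (type of the first entity, type of the second entity).
RELATION_SCHEMA = {
    "located_in": ("loc", "loc"),
    "work_for": ("per", "org"),
    "live_in": ("per", "loc"),
    "kill": ("per", "per"),
}


def is_valid_triple(relation, first_type, second_type):
    """Return True if the two entity types match the relation's signature."""
    return RELATION_SCHEMA.get(relation) == (first_type, second_type)


# (Bill Gates, Microsoft) has types (per, org), which fits work_for.
print(is_valid_triple("work_for", "per", "org"))
# Swapping the arguments breaks the signature.
print(is_valid_triple("work_for", "org", "per"))
```

A check like this only validates types; it says nothing about whether the relation actually holds in the text.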
  4. Background | Large language models (LLMs)

    [Figure: timeline of large language models released from 2019 to 2023 (for example T5, GPT-3, PaLM, ChatGPT, LLaMA, and GPT-4), with publicly available models marked. Source: [Zhao+,23]]
  5. Prompting large language models

    With large language models, high performance can be achieved simply by designing the input (the prompt) carefully.

    - In-context learning (ICL) [Brown+,20]
    - Chain-of-thought prompting (CoT) [Wei+,22]

    Standard Prompting

    Model Input:
    Q: Roger has 5 tennis balls. He buys 2 more cans of tennis balls. Each can has 3 tennis balls. How many tennis balls does he have now?
    A: The answer is 11.
    Q: The cafeteria had 23 apples. If they used 20 to make lunch and bought 6 more, how many apples do they have?

    Model Output:
    A: The answer is 27.

    Chain-of-Thought Prompting

    Model Input:
    Q: Roger has 5 tennis balls. He buys 2 more cans of tennis balls. Each can has 3 tennis balls. How many tennis balls does he have now?
    A: Roger started with 5 balls. 2 cans of 3 tennis balls each is 6 tennis balls. 5 + 6 = 11. The answer is 11.
    Q: The cafeteria had 23 apples. If they used 20 to make lunch and bought 6 more, how many apples do they have?

    Model Output:
    A: The cafeteria had 23 apples originally. They used 20 to make lunch. So they had 23 - 20 = 3. They bought 6 more apples, so they have 3 + 6 = 9. The answer is 9.

    [Brown+,20]
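The two prompt styles on this slide differ only in whether each demonstration answer carries its reasoning. The following is a minimal sketch of how such prompts could be assembled as plain strings. `build_prompt` and the dictionary keys are hypothetical names chosen for illustration; the exact prompt templates used in the paper are not shown on the slide.

```python
def build_prompt(demos, question, with_reasoning):
    """Assemble a few-shot prompt from demonstrations and a new question.

    With with_reasoning=False this gives standard prompting (answer only);
    with with_reasoning=True each demonstration answer is preceded by its
    reasoning, as in chain-of-thought prompting.
    """
    parts = []
    for demo in demos:
        answer = f"The answer is {demo['answer']}."
        if with_reasoning:
            answer = f"{demo['reasoning']} {answer}"
        parts.append(f"Q: {demo['question']}\nA: {answer}")
    # The final question is left open for the model to complete.
    parts.append(f"Q: {question}\nA:")
    return "\n\n".join(parts)


demo = {
    "question": (
        "Roger has 5 tennis balls. He buys 2 more cans of tennis balls. "
        "Each can has 3 tennis balls. How many tennis balls does he have now?"
    ),
    "reasoning": (
        "Roger started with 5 balls. 2 cans of 3 tennis balls each is "
        "6 tennis balls. 5 + 6 = 11."
    ),
    "answer": 11,
}
question = (
    "The cafeteria had 23 apples. If they used 20 to make lunch and "
    "bought 6 more, how many apples do they have?"
)

standard_prompt = build_prompt([demo], question, with_reasoning=False)
cot_prompt = build_prompt([demo], question, with_reasoning=True)
print(cot_prompt)
```

Keeping both styles behind one function makes it easy to compare them on identical demonstrations.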
  6. Target tasks and datasets | NER / RE

    At least four datasets were selected for each task; in total, 14 subtasks and 17 datasets are used. Braces { } list dataset names. Details of EE and ABSA are omitted here.

    - Named entity recognition (NER)
      - Flat Entity Recognition (NER-Flat): targets entities without hierarchical structure.
        {CoNLL03 [Sang&Meulder,03], FewNERD [Ding+,21]}
      - Nested Entity Recognition (NER-Nested): targets entities with hierarchical structure.
        {ACE04 [Doddington+,04], ACE05-Ent [Walker+,06], GENIA [Ohta+,02]}
    - Relation extraction (RE)
      - Relation Classification (RE-RC): identifies the relation between entities in the text.
        {CoNLL04 [Roth&Yih,04], NYT-multi [Zeng+,18], TACRED [Zhang+,17], SemEval2010 [Hendrickx+,10]}
      - Relational Triplet Extraction (RE-Triplet): identifies entities and extracts their relations jointly.
        {CoNLL04 [Roth&Yih,04], NYT-multi [Zeng+,18], SemEval2010 [Hendrickx+,10]}
    - Event extraction (EE), aspect-based sentiment analysis (ABSA)
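  7. Experimental setup | Building three kinds of prompts

    - Zero-shot: five different zero-shot prompts are designed by hand. Because ChatGPT's performance depends on the input, the mean and standard deviation of performance over the five prompts are used for comparison.
    - Few-shot in-context learning: five samples drawn at random from the training set are added to the prompt that achieved the best performance in the zero-shot experiments. Because the choice of samples affects performance, the random selection is repeated five times and the mean and standard deviation of performance are compared.
    - Few-shot chain-of-thought prompting: chain-of-thought explanations are added to the prompt used for in-context learning. The explanations are written by hand with the help of ChatGPT. The mean and standard deviation of performance over five runs are compared.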
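The few-shot protocol above (draw five demonstrations at random, repeat five times, report mean and standard deviation) can be sketched as follows. This is an illustration under assumptions, not the authors' code: `repeated_few_shot_eval`, `evaluate`, and `toy_evaluate` are hypothetical names, and the slide does not say whether the sample or population standard deviation is reported.

```python
import random
import statistics


def repeated_few_shot_eval(train_set, evaluate, n_shots=5, n_runs=5, seed=0):
    """Draw n_shots demonstrations at random n_runs times and summarize scores.

    `evaluate` is a hypothetical callable that takes the chosen demonstrations,
    runs the model on the test set, and returns a score such as micro-F1.
    """
    rng = random.Random(seed)  # fixed seed so the draws are reproducible
    scores = []
    for _ in range(n_runs):
        demos = rng.sample(train_set, n_shots)  # sampling without replacement
        scores.append(evaluate(demos))
    # Sample standard deviation is an assumption; the slide does not specify.
    return statistics.mean(scores), statistics.stdev(scores)


def toy_evaluate(demos):
    # Stand-in for a real evaluation: the score is the mean of the demo ids.
    return sum(demos) / len(demos)


mean_score, std_score = repeated_few_shot_eval(list(range(100)), toy_evaluate)
print(f"{mean_score:.2f} ({std_score:.2f})")  # same "mean (std)" layout as the result tables
```

In a real run, `toy_evaluate` would be replaced by a function that builds the prompt from the demonstrations, queries the model, and scores the predictions.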
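  8. Experimental setup | Other conditions

    - The API provided by OpenAI is used.
    - To remove the influence of the dialogue history, a response is generated separately for each test sample.
    - At most 3,000 samples are used from each test set, because a larger number of test samples would hit the API spending limit. Earlier studies that evaluated ChatGPT [Jiao+,23; Wei+,23] used 20–30 samples.
    - Micro-F1 (which equals accuracy when every sample has exactly one label) is compared against the existing state of the art (SOTA).

    [Reference] API cost
    - Running one prompt on CoNLL03 costs $0.79.
    - There are five prompts and three input settings (zero-shot, ICL, CoT), so reproducing the CoNLL03 results costs $0.79 x 15 = $11.85.
    - There are 49 (subtask, dataset) pairs in total.
    - Rough API cost of reproducing all experiments: $11.85 x 49 = $580.65.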
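The cost estimate above is a chain of two multiplications, reproduced here as a small sketch. It follows the slide's own rough assumption that every (subtask, dataset) pair costs the same as CoNLL03; the variable names are illustrative. `Decimal` is used so that the dollar amounts are exact.

```python
from decimal import Decimal

COST_PER_PROMPT_RUN = Decimal("0.79")  # CoNLL03, one prompt (figure from the slide)
N_PROMPTS = 5    # prompt variants
N_SETTINGS = 3   # zero-shot, ICL, CoT
N_PAIRS = 49     # (subtask, dataset) pairs

# Cost of reproducing the CoNLL03 results: 5 prompts x 3 settings = 15 runs.
cost_conll03 = COST_PER_PROMPT_RUN * N_PROMPTS * N_SETTINGS

# Rough total, assuming every pair costs as much as CoNLL03.
cost_total = cost_conll03 * N_PAIRS

print(cost_conll03)  # 11.85
print(cost_total)    # 580.65
```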
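  9. Results | NER

    - There is a large gap between ChatGPT and SOTA.
    - The harder the task or scenario, the larger the performance gap.
      - Task difficulty ≈ the number of types.
      - Scenario difficulty ≈ the complexity of the task itself (NER-Nested is a harder scenario than NER-Flat).
    - Few-shot ICL improved performance, but the gain from CoT was small.

    | Task       | Dataset   | SOTA | zero-shot max | zero-shot mean (std) | 5-shot ICL mean (std) | 5-shot CoT mean (std) |
    |------------|-----------|------|---------------|----------------------|-----------------------|-----------------------|
    | NER-Flat   | CoNLL03   | 94.6 | 65.13         | 60.10 (3.81)         | 70.53 (1.44)          | 74.73 (1.08)          |
    | NER-Flat   | FewNERD   | 67.1 | 34.28         | 31.56 (2.44)         | 36.87 (0.71)          | 46.55 (0.64)          |
    | NER-Nested | ACE04     | 88.5 | 29.55         | 27.80 (3.10)         | 38.52 (2.51)          | 40.57 (1.83)          |
    | NER-Nested | ACE05-Ent | 87.5 | 24.77         | 23.38 (1.92)         | 36.17 (1.78)          | 33.98 (0.69)          |
    | NER-Nested | GENIA     | 81.5 | 39.43         | 38.09 (1.65)         | 48.82 (1.31)          | 50.89 (1.00)          |

    Note: the table is based on Table 1 of the paper; the SOTA methods are omitted.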
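  10. Results | RE

    - There is a large gap between ChatGPT and SOTA.
    - The harder the task or scenario, the larger the performance gap.
    - Few-shot ICL improved performance, but the gain from CoT was small.

    | Task       | Dataset     | SOTA | zero-shot max | zero-shot mean (std) | 5-shot ICL mean (std) | 5-shot CoT mean (std) |
    |------------|-------------|------|---------------|----------------------|-----------------------|-----------------------|
    | RE-RC      | CoNLL04     |      | 65.82         | 59.21 (3.85)         | 55.32 (4.56)          | -                     |
    | RE-RC      | NYT-multi   | 93.5 | 38.74         | 30.96 (5.51)         | 26.88 (2.74)          | -                     |
    | RE-RC      | TACRED      | 75.6 | 21.58         | 19.47 (1.49)         | 27.84 (3.48)          | -                     |
    | RE-RC      | SemEval2010 | 91.3 | 43.32         | 39.27 (2.20)         | 39.44 (2.55)          | -                     |
    | RE-Triplet | CoNLL04     | 78.8 | 23.04         | 17.84 (3.43)         | 23.30 (1.29)          | 11.09 (4.83)          |
    | RE-Triplet | NYT-multi   | 86.8 | 3.79          | 3.48 (0.24)          | 12.24 (0.59)          | 2.33 (1.64)           |
    | RE-Triplet | SemEval2010 | 73.2 | 7.65          | 5.82 (1.29)          | 12.85 (1.14)          | -                     |

    Note: the table is based on Table 1 of the paper; the SOTA methods are omitted.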
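  11. Discussion | Evaluation criteria

    - ChatGPT tended to output spans that are longer than the gold spans.
    - When evaluated with soft matching, F1 improved by up to 14.53 points.
      Soft matching: a prediction is counted as correct if it contains the gold span and the similarity between the gold span and the prediction is above a threshold.

    Examples of annotated spans and predictions in NER (excerpt from Table 2):

    | Annotated spans        | Predicted spans            |
    |------------------------|----------------------------|
    | PGA Europro Tour       | 2021 PGA Europro Tour      |
    | University of Michigan | The University of Michigan |

    Difference between soft matching (Soft) and the original evaluation criterion (Hard) (excerpt from Table 3):

    | Task       | Dataset   | SOTA | Hard  | Soft  | ΔF1 (%)        |
    |------------|-----------|------|-------|-------|----------------|
    | NER-Flat   | CoNLL03   | 94.6 | 60.10 | 62.12 | +2.02 (3.4%)   |
    | NER-Nested | ACE05-Ent | 87.5 | 23.38 | 33.97 | +10.59 (45.3%) |
    | RE-Triplet | CoNLL04   | 78.8 | 17.84 | 24.75 | +6.91 (38.7%)  |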
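The soft-matching rule described on this slide can be sketched as follows. This is an illustration under assumptions rather than the paper's implementation: the similarity measure (`difflib.SequenceMatcher.ratio`), the threshold of 0.8, and the function names are all choices made here, and spans are pooled into flat lists without sentence ids or entity types for brevity.

```python
from difflib import SequenceMatcher


def hard_match(pred, gold):
    """Exact string match, as in the original (Hard) criterion."""
    return pred == gold


def soft_match(pred, gold, threshold=0.8):
    """Prediction must contain the gold span and be similar enough to it.

    The similarity measure and the threshold value are assumptions.
    """
    if gold not in pred:
        return False
    return SequenceMatcher(None, pred, gold).ratio() > threshold


def micro_f1(preds, golds, match):
    """Micro-F1 over pooled span lists; each gold span is matched at most once."""
    used = set()
    true_positives = 0
    for pred in preds:
        for i, gold in enumerate(golds):
            if i not in used and match(pred, gold):
                used.add(i)
                true_positives += 1
                break
    precision = true_positives / len(preds) if preds else 0.0
    recall = true_positives / len(golds) if golds else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


golds = ["PGA Europro Tour", "University of Michigan"]
preds = ["2021 PGA Europro Tour", "The University of Michigan"]

print(micro_f1(preds, golds, hard_match))  # 0.0
print(micro_f1(preds, golds, soft_match))  # 1.0
```

On the two Table 2 examples, neither prediction is an exact match, while both pass the soft criterion, which mirrors the direction of the Hard-to-Soft gains in the table above.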
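  12. Discussion | Robustness (type frequency and others)

    - Prediction performance was higher for frequent types than for infrequent types.
    - In the relation extraction task, performance did not drop much even when the order of the entities was swapped, which suggests that ChatGPT does not understand the subject-object relationship.
      Example: ✓ <Steven Paul Jobs, born_in, San Francisco>; ✗ <San Francisco, born_in, Steven Paul Jobs>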
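  13. Discussion | Causes of errors

    Classification of errors on NER-Flat (CoNLL03):

    | Error type             | #Error | Ratio (%) |
    |------------------------|--------|-----------|
    | Missing spans          | 2,979  | 15.4      |
    | Unmentioned spans      | 284    | 1.5       |
    | Unannotated spans      | 6,361  | 32.9      |
    | Incorrect span offsets | 1,744  | 9.0       |
    | Undefined types        | 883    | 4.6       |
    | Incorrect types        | 4,296  | 22.2      |
    | Other                  | 2,801  | 14.4      |
    | Total                  | 19,348 | 100       |

    - Errors caused by outputting unannotated spans are the most common.
    - The top three error types (unannotated spans, incorrect types, and missing spans) account for about 70% of all errors.
    - Does this suggest that the annotation quality of the dataset may be a problem? → Perhaps ChatGPT could be used for annotation?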
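The "about 70%" figure can be checked directly from the counts in the error table. The sketch below recomputes the total and the share of the three most frequent error types; the variable names are illustrative.

```python
# Error counts for NER-Flat (CoNLL03), copied from the table above.
ERROR_COUNTS = {
    "Missing spans": 2979,
    "Unmentioned spans": 284,
    "Unannotated spans": 6361,
    "Incorrect span offsets": 1744,
    "Undefined types": 883,
    "Incorrect types": 4296,
    "Other": 2801,
}

total = sum(ERROR_COUNTS.values())
ratios = {name: 100 * count / total for name, count in ERROR_COUNTS.items()}

# Three most frequent error types and their combined share.
top3 = sorted(ERROR_COUNTS, key=ERROR_COUNTS.get, reverse=True)[:3]
top3_share = sum(ratios[name] for name in top3)

print(total)                 # 19348
print(top3)                  # ['Unannotated spans', 'Incorrect types', 'Missing spans']
print(round(top3_share, 1))  # 70.5
```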
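  14. Summary of the paper

    - The paper evaluates ChatGPT on 17 datasets spanning 14 information extraction subtasks and shows that it does not reach the state-of-the-art (SOTA) performance of existing methods.
    - It investigates how performance relates to ChatGPT's inputs and outputs and to the datasets.
    - Few-shot in-context learning improves performance but still falls short of SOTA.
    - The performance gap between chain of thought and few-shot in-context learning was small.
    - ChatGPT rarely produces output that is unrelated to the given input.
    - ChatGPT's performance drops sharply when the input contains irrelevant context or when the target types are rare (long-tail).
    - ChatGPT's outputs tended to be longer than the annotated spans.
    - Most of ChatGPT's errors came from extracting parts of the text that are not annotated.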
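  15. Conclusions

    - How can natural language processing tasks be solved with ChatGPT?
      - Prompt design (in-context learning, chain of thought, few-shot, noise removal, and so on)
      - Framework design (splitting a task into multiple stages, defining a flowchart)
    - How much performance can be achieved, and what cannot be done?
      - For general domains and simple tasks, it may match or exceed the best existing methods.
      - For specific domains and complex tasks, it falls short of existing methods even with the techniques above.
    - What is a good way to use ChatGPT?
      - Break complex tasks into smaller ones.
      - Do not try to obtain many answers at once.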
  16. References

    - [Han+,23] Ridong Han, Tao Peng, Chaohao Yang, Benyou Wang, Lu Liu, and Xiang Wan. Is Information Extraction Solved by ChatGPT? An Analysis of Performance, Evaluation Criteria, Robustness and Errors. arXiv preprint arXiv:2305.14450, 2023.
    - [Roth&Yih,04] Dan Roth and Wen-tau Yih. A Linear Programming Formulation for Global Inference in Natural Language Tasks. CoNLL-2004 at HLT-NAACL, pp. 1–8, 2004.
    - [Zhao+,23] Wayne Xin Zhao, Kun Zhou, Junyi Li, Tianyi Tang, Xiaolei Wang, Yupeng Hou, Yingqian Min, Beichen Zhang, Junjie Zhang, Zican Dong, Yifan Du, Chen Yang, Yushuo Chen, Zhipeng Chen, Jinhao Jiang, Ruiyang Ren, Yifan Li, Xinyu Tang, Zikang Liu, Peiyu Liu, Jian-Yun Nie, and Ji-Rong Wen. A Survey of Large Language Models. arXiv preprint arXiv:2303.18223, 2023.
    - [Brown+,20] Tom B. Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, Sandhini Agarwal, Ariel Herbert-Voss, Gretchen Krueger, Tom Henighan, Rewon Child, Aditya Ramesh, Daniel M. Ziegler, Jeffrey Wu, Clemens Winter, Christopher Hesse, Mark Chen, Eric Sigler, Mateusz Litwin, Scott Gray, Benjamin Chess, Jack Clark, Christopher Berner, Sam McCandlish, Alec Radford, Ilya Sutskever, and Dario Amodei. Language Models are Few-Shot Learners. NeurIPS, pp. 1877–1901, 2020.
  17. References (continued)

    - [Wei+,22] Jason Wei, Xuezhi Wang, Dale Schuurmans, Maarten Bosma, Brian Ichter, Fei Xia, Ed Chi, Quoc Le, and Denny Zhou. Chain-of-Thought Prompting Elicits Reasoning in Large Language Models. arXiv preprint arXiv:2201.11903, 2022.
    - [Sang&Meulder,03] Tjong Kim Sang, E.F., and De Meulder, F. Introduction to the CoNLL-2003 Shared Task: Language-Independent Named Entity Recognition. HLT-NAACL, pp. 142–147, 2003.
    - [Ding+,21] Ning Ding, Guangwei Xu, Yulin Chen, Xiaobin Wang, Xu Han, Pengjun Xie, Haitao Zheng, and Zhiyuan Liu. Few-NERD: A Few-shot Named Entity Recognition Dataset. ACL-IJCNLP, pp. 3198–3213, 2021.
    - [Doddington+,04] George Doddington, Alexis Mitchell, Mark Przybocki, Lance Ramshaw, Stephanie Strassel, and Ralph Weischedel. The Automatic Content Extraction (ACE) Program – Tasks, Data, and Evaluation. LREC, 2004.
    - [Walker+,06] Christopher Walker, Stephanie Strassel, Julie Medero, and Kazuaki Maeda. ACE 2005 Multilingual Training Corpus. LDC2006T06. Web Download. Philadelphia: Linguistic Data Consortium, 2006.
    - [Ohta+,02] Tomoko Ohta, Yuka Tateisi, and Jin-Dong Kim. The GENIA corpus: an annotated research abstract corpus in molecular biology domain. HLT, pp. 82–86, 2002.
    - [Zeng+,18] Xiangrong Zeng, Daojian Zeng, Shizhu He, Kang Liu, and Jun Zhao. Extracting Relational Facts by an End-to-End Neural Model with Copy Mechanism. ACL, pp. 506–514, 2018.
  18. References (continued)

    - [Zhang+,17] Yuhao Zhang, Victor Zhong, Danqi Chen, Gabor Angeli, and Christopher D. Manning. Position-aware Attention and Supervised Data Improve Slot Filling. EMNLP, pp. 35–45, 2017.
    - [Hendrickx+,10] Iris Hendrickx, Su Nam Kim, Zornitsa Kozareva, Preslav Nakov, Diarmuid Ó Séaghdha, Sebastian Padó, Marco Pennacchiotti, Lorenza Romano, and Stan Szpakowicz. SemEval-2010 Task 8: Multi-Way Classification of Semantic Relations between Pairs of Nominals. SemEval, pp. 33–38, 2010.
    - [Jiao+,23] Wenxiang Jiao, Wenxuan Wang, Jen-tse Huang, Xing Wang, and Zhaopeng Tu. Is ChatGPT A Good Translator? Yes With GPT-4 As The Engine. arXiv preprint arXiv:2301.08745, 2023.
    - [Wei+,23] Xiang Wei, Xingyu Cui, Ning Cheng, Xiaobin Wang, Xin Zhang, Shen Huang, Pengjun Xie, Jinan Xu, Yufeng Chen, Meishan Zhang, Yong Jiang, and Wenjuan Han. Zero-Shot Information Extraction via Chatting with ChatGPT. arXiv preprint arXiv:2302.10205, 2023.
    - [Polak&Morgan,23] Maciej P Polak and Dane Morgan. Extracting Accurate Materials Data from Research Papers with Conversational Language Models and Prompt Engineering – Example of ChatGPT. arXiv preprint arXiv:2303.05352, 2023.
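  19. The aspect-based sentiment analysis task

    The details differ from dataset to dataset, but broadly the following three tasks are solved.

    1. Estimate the aspect categories of each sentence.
       "The pizza is delicious but the price is high" => categories: FOOD#QUALITY, FOOD#PRICE
    2. Extract the entity for each aspect category of each sentence.
       "The pizza is delicious but the price is high" + categories: FOOD#QUALITY, FOOD#PRICE => entities: FOOD#QUALITY = pizza, FOOD#PRICE = price
    3. Estimate the polarity for each aspect category of each sentence.
       "The pizza is delicious but the price is high" + categories: FOOD#QUALITY, FOOD#PRICE => polarity: FOOD#QUALITY = ✓, FOOD#PRICE = ✗

    Source: https://www.slideshare.net/takahirokubo7792/ss-96203329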
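  20. Target tasks | EE

    - Event extraction (EE)
      - Event detection (EE-Trigger): identifies the words or phrases that indicate the occurrence of an event and classifies them into the corresponding event types.
      - Event argument extraction (EE-Argument): recognizes the entities related to a given event and classifies their corresponding roles.
      - Trigger-argument joint extraction (EE-Joint): identifies event triggers, event types, and the mentions that fill their roles jointly.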
  21. Target tasks | ABSA

    - Aspect-based sentiment analysis (ABSA)
      - Aspect extraction (ABSA-AE): extracts all aspects from a review.
      - Opinion extraction (ABSA-OE): extracts all opinions from a review.
      - Aspect-level sentiment classification (ABSA-ALSC): predicts the sentiment polarity of a given aspect in a review.
      - Aspect-oriented opinion extraction (ABSA-AOE): extracts the opinion paired with each aspect of a review.
      - Aspect extraction and sentiment classification (ABSA-AESC): extracts aspects and their corresponding sentiment polarities jointly.
      - Pair extraction (ABSA-Pair): extracts aspects and their corresponding opinions jointly.
      - Triplet extraction (ABSA-Triplet): extracts all aspects together with their corresponding opinions and sentiment polarities.
  22. Dataset details | NER

    - NER-Flat
      - CoNLL03 [Sang&Meulder,03] (English data): 22,137 sentences, 301,418 tokens, four entity types (locations, organizations, persons, and MISC (others))
      - FewNERD [Ding+,21]: 188,238 sentences, 4,601,160 words, 66 entity types
    - NER-Nested
      - ACE04
      - ACE05-Ent
      - GENIA