
nlp-survey

A Survey of Natural Language Processing after BERT

KARAKURI Inc.

April 09, 2021

Transcript

  1. GPT family [Radford+ 2018, Radford+ 2019, Brown+ 2020] (slide 22)
     • Autoregressive pretrained language models (objective sketched below)
     • Known for their striking generation quality from GPT-2 onward
     • A forerunner of the scaling laws for parameter and data counts

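As background for the bullets above (a sketch, not from the slides): an autoregressive LM maximizes the left-to-right factorization of the sequence probability, and the scaling-law observation (in the power-law form reported by Kaplan+ 2020, which the slide does not cite) is that test loss falls predictably with parameter count N and dataset size D, for fitted constants N_c, D_c, α_N, α_D:

```latex
\mathcal{L}_{\mathrm{LM}}(\theta) = -\sum_{t=1}^{T} \log p_{\theta}(x_t \mid x_{<t}),
\qquad
L(N) \approx \left(\frac{N_c}{N}\right)^{\alpha_N},
\qquad
L(D) \approx \left(\frac{D_c}{D}\right)^{\alpha_D}
```
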
  2. UniLM [Dong+ NeurIPS 2019] (slide 29)
     • Jointly trains unidirectional, bidirectional, and seq2seq language models
     • Realizes this with different attention masks that control which context tokens each position may attend to (see the sketch below)

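To make the mask bullet concrete, a minimal PyTorch sketch (the function name and conventions are mine, not UniLM's released code): the three objectives differ only in which positions may attend to which over one packed source+target sequence.

```python
import torch

def unilm_attention_masks(src_len: int, tgt_len: int):
    """Masks for UniLM's three LM objectives (rows attend to cols; 1 = allowed)."""
    n = src_len + tgt_len
    # Bidirectional LM (BERT-style): every token may attend to every token.
    bidirectional = torch.ones(n, n)
    # Unidirectional (left-to-right) LM: causal lower-triangular mask.
    unidirectional = torch.tril(torch.ones(n, n))
    # Seq2seq LM: source attends bidirectionally within the source segment;
    # target attends to the full source and causally within the target.
    seq2seq = torch.zeros(n, n)
    seq2seq[:src_len, :src_len] = 1
    seq2seq[src_len:, :src_len] = 1
    seq2seq[src_len:, src_len:] = torch.tril(torch.ones(tgt_len, tgt_len))
    return bidirectional, unidirectional, seq2seq
```
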
  3. Next Sentence Prediction (slide 32)
     [Rogers+ 2020 A Primer in BERTology: What We Know About How BERT Works]
     [Shi+ EMNLP 2019 Next Sentence Prediction helps Implicit Discourse Relation Classification within and across Domains]

  4. Fine-tuning (slide 35)
     • Depth matters [Rogers+ 2020 A Primer in BERTology: What We Know About How BERT Works]
     • Two-stage pretraining [Pang+ 2019, Garg+ 2020, Arase & Tsujii 2019, Pruksachatkun+ 2020, Glavas & Vulic 2020]
     • Adversarial training (sketched below) [Zhu+ 2019, Jiang+ 2019]
     • Data augmentation [Lee+ 2019]

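One concrete instance of the adversarial-training bullet, as an FGM-style sketch (names are mine; the cited methods such as [Zhu+ 2019] are more elaborate): perturb the input embeddings in the loss-increasing direction and train on the perturbed batch as well.

```python
import torch

def adversarial_finetune_loss(model, loss_fn, embeds, labels, eps=1e-2):
    """Embedding-space adversarial fine-tuning (sketch). Assumes `model`
    maps input embeddings directly to logits."""
    embeds = embeds.detach().requires_grad_(True)
    clean_loss = loss_fn(model(embeds), labels)
    # Loss-ascending direction w.r.t. the embeddings; keep the graph so the
    # clean loss can still be backpropagated afterwards.
    grad, = torch.autograd.grad(clean_loss, embeds, retain_graph=True)
    r_adv = eps * grad / (grad.norm() + 1e-12)
    adv_loss = loss_fn(model(embeds + r_adv.detach()), labels)
    return clean_loss + adv_loss  # optimize on clean + perturbed inputs
```
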
  5. ALBERT [Lan+ ICLR 2020] (slide 39)
     • A lighter-weight BERT via embedding-dimension reduction and parameter sharing: the factorized embedding turns the V × H table into V × E plus E × H, with E ≪ H (sketched below)
     • Proposes sentence-order prediction, a next-sentence-prediction task with better-constructed negative examples

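A minimal sketch of the factorized embedding (the class name is mine; the dimension defaults mirror ALBERT-base's V = 30000, E = 128, H = 768):

```python
import torch.nn as nn

class FactorizedEmbedding(nn.Module):
    """ALBERT-style factorized embedding: parameters drop from V*H
    to V*E + E*H when E << H."""
    def __init__(self, vocab_size=30000, embed_dim=128, hidden_dim=768):
        super().__init__()
        self.lookup = nn.Embedding(vocab_size, embed_dim)            # V x E
        self.project = nn.Linear(embed_dim, hidden_dim, bias=False)  # E x H

    def forward(self, token_ids):
        return self.project(self.lookup(token_ids))  # (..., hidden_dim)
```

With these numbers the embedding shrinks from about 23.0M parameters (30000 × 768) to about 3.9M (30000 × 128 + 128 × 768).
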
  6. ELECTRA [Clark+ ICLR 2020] (slide 51)
     • Proposes pretraining with adversarial training (replaced token detection) in place of masked language modeling; sketched below
     • Matches XLNet and RoBERTa with at most a quarter of the compute
     • Beats GPT after four days of training on a single GPU

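A sketch of the discriminator half of replaced token detection (names and shapes are assumptions; the full ELECTRA objective also trains the small generator with an MLM loss and sums the two terms): every position gets a binary label indicating whether the generator replaced the original token.

```python
import torch
import torch.nn.functional as F

def replaced_token_detection_loss(disc_logits, original_ids, corrupted_ids):
    """ELECTRA-style discriminator loss (sketch).
    disc_logits: (batch, seq) real/replaced scores from the discriminator;
    corrupted_ids: the input after the generator fills masked positions."""
    is_replaced = (corrupted_ids != original_ids).float()  # per-token 0/1 labels
    return F.binary_cross_entropy_with_logits(disc_logits, is_replaced)
```

Because the discriminator receives a learning signal at every position rather than only at the ~15% masked ones, pretraining is markedly more compute-efficient.
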
  7. Large Models (slide 53) [State of AI Report 2020 (https://www.stateof.ai/)]
     • Megatron-LM (8B parameters) [Shoeybi+ 2019]
     • Turing-NLG (17B parameters) [Microsoft 2020]
     • GPT-3 (175B parameters) [Brown+ 2020]
     [https://www.microsoft.com/en-us/research/blog/turing-nlg-a-17-billion-parameter-language-model-by-microsoft/]

  8. Named Entity Recognition (slide 82)
     [Li+ 2020 A Survey on Deep Learning for Named Entity Recognition]

  9. LUKE [Yamada+ EMNLP 2020] (slide 83)
     • Masked language modeling over both words and named entities
     • Proposes an entity-aware self-attention that conditions on the token type (word or entity); see the sketch below

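A single-head sketch of the entity-aware self-attention (all names and shapes are mine, not LUKE's released code): the query projection is chosen per (query, key) token-type pair, while keys and values are shared.

```python
import torch
import torch.nn as nn

class EntityAwareAttention(nn.Module):
    """LUKE-style entity-aware self-attention, single head (sketch)."""
    def __init__(self, dim: int = 64):
        super().__init__()
        # One query projection per type pair: w->w, w->e, e->w, e->e.
        self.queries = nn.ModuleList([nn.Linear(dim, dim) for _ in range(4)])
        self.key = nn.Linear(dim, dim)
        self.value = nn.Linear(dim, dim)
        self.scale = dim ** 0.5

    def forward(self, hidden: torch.Tensor, is_entity: torch.Tensor):
        # hidden: (seq, dim); is_entity: (seq,) bool flags for entity tokens
        k, v = self.key(hidden), self.value(hidden)
        pair = is_entity[:, None].long() * 2 + is_entity[None, :].long()  # (seq, seq) in {0..3}
        q = torch.stack([proj(hidden) for proj in self.queries])          # (4, seq, dim)
        scores = torch.einsum('pid,jd->pij', q, k) / self.scale           # (4, seq, seq)
        scores = scores.gather(0, pair.unsqueeze(0)).squeeze(0)           # pick per-pair score
        return torch.softmax(scores, dim=-1) @ v                          # (seq, dim)
```
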
  10. The role of heads in multi-head attention (slide 89)
      • The attention patterns different heads acquire are limited in variety, but heads vary widely in how much they affect performance [Kovaleva+ EMNLP 2019]
      • Most heads can be removed without hurting performance, and head importance is fixed early in training [Michel+ NeurIPS 2019] (a crude ablation is sketched below)
      • Multiple heads matter more in encoder-decoder attention than in self-attention [Michel+ NeurIPS 2019]
      • Heads within the same layer show similar patterns [Clark+ BlackBoxNLP 2019]
      • Some heads attend to linguistic notions such as syntax and coreference [Clark+ BlackBoxNLP 2019]
      • The lottery ticket hypothesis holds [Chen+ NeurIPS 2020]

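A crude version of the ablation behind the "most heads can be removed" finding (a sketch; studies such as [Michel+ NeurIPS 2019] use gradient-based importance estimates rather than one-at-a-time zeroing):

```python
import torch

def ablate_head(attn_output: torch.Tensor, head: int) -> torch.Tensor:
    """Zero one head's output in a (batch, num_heads, seq, head_dim) tensor.
    Re-running evaluation with and without the head and comparing task loss
    gives a rough importance score for that head."""
    out = attn_output.clone()
    out[:, head] = 0.0
    return out
```
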
  11. How representations differ across layers (slide 90; probing sketch below)
      • Shallow layers acquire generic representations, deep layers task-specific ones [Aken+ CIKM 2019], [Peters+ RepL4NLP 2019], [Hao+ EMNLP 2019]
      • Shallow layers acquire representations tied to the token and its local context, which weakens through the layers [Lin+ BlackBoxNLP 2019], [Voita+ EMNLP 2019], [Ethayarajh+ EMNLP 2019], [Brunner+ ICLR 2020]
      • Deep layers acquire longer-range dependencies and more semantic representations [Raganato+ 2018], [Vig+ BlackBoxNLP 2019], [Jawahar+ ACL 2019]

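Findings like these typically come from probing: freeze the encoder, fit a small classifier on one layer's hidden states, and compare accuracy across layers. A minimal sketch (all names are mine; a real probe would evaluate on held-out data):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def probe_accuracy(layer_states, labels, num_classes, steps=200):
    """Train a linear probe on frozen hidden states from one layer.
    layer_states: (num_tokens, hidden_dim); labels: (num_tokens,)."""
    probe = nn.Linear(layer_states.size(-1), num_classes)
    opt = torch.optim.Adam(probe.parameters(), lr=1e-3)
    for _ in range(steps):
        loss = F.cross_entropy(probe(layer_states), labels)
        opt.zero_grad()
        loss.backward()
        opt.step()
    with torch.no_grad():
        return (probe(layer_states).argmax(-1) == labels).float().mean().item()
```
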
  12. Recovering linguistic structure (slide 92)
      • Linguistic information is represented in separate semantic and syntactic subspaces [Coenen+ NeurIPS 2019]
      • Syntax trees are embedded in both ELMo and BERT [Hewitt+ NAACL-HLT 2019] (the probe objective is sketched below)
      • BERT acquires syntactic representations with hierarchical structure [Goldberg 2019]
      • Contextual models acquire good syntactic representations, but for semantics they do not differ much from non-contextual methods [Tenney+ ICLR 2019]
      • BERT acquires a great deal of grammatical knowledge, but with large variance [Warstadt+ EMNLP 2019]
      • (For Japanese) BERT makes use of word-order information [Kuribayashi+ ACL 2020]

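The "syntax trees are embedded" bullet refers to the structural probe of [Hewitt+ NAACL-HLT 2019]: a linear map B is trained so that squared distances between mapped hidden vectors h_i approximate parse-tree distances d_T. Sketching the objective, with the outer sum over sentences s:

```latex
\min_{B} \; \sum_{s} \frac{1}{|s|^{2}} \sum_{i,j}
\left| \, d_{T}(w_i, w_j) - \left\lVert B(\mathbf{h}_i - \mathbf{h}_j) \right\rVert_{2}^{2} \, \right|
```
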
  13. Weaknesses of Transformer models (slide 93) [Lin+ ACL 2020]
      • Not robust to adversarial attacks [Jin+ AAAI 2020]
      • BERT exploits spurious correlations [Niven+ ACL 2019]
      • Fine-tuning on an intermediate task first can hurt downstream performance [Wang+ ACL 2020]
      • Weak at negation [Ettinger ACL 2019], [Kassner+ ACL 2020]
      • Lacks common-sense knowledge

  14. SWAG [Zellers+ EMNLP 2018] (slide 96)
      • A benchmark for common-sense inference
      • Proposes Adversarial Filtering to reduce the biases that annotation introduces

  15. A Primer in BERTology [Rogers+ 2020] (slide 103)
      • A survey of studies that empirically probe what is going on inside BERT
      • Also summarizes BERT's weaknesses as currently understood
      [https://arxiv.org/pdf/2002.12327.pdf]

  16. References (slide 104)
      • [PLMpapers](https://github.com/thunlp/PLMpapers)
      • [Highlights of ACL 2020](https://medium.com/analytics-vidhya/highlights-of-acl-2020-4ef9f27a4f0c)
      • [BERT-related Papers](https://github.com/tomohideshibata/BERT-related-papers)
      • [ML and NLP Research Highlights of 2020](https://ruder.io/research-highlights-2020/)
      • [Tracing the history of document summarization (+ trying document summarization with BERT)](https://qiita.com/siida36/items/4c0dbaa07c456a9fadd0)
      • [Trends in pretrained language models](https://speakerdeck.com/kyoun/survey-of-pretrained-language-models)
      • [NLP: a roundup of the BERT variants born in 2020](https://kai760.medium.com/nlp-2020%E5%B9%B4%E3%81%AB%E7%94%9F%E3%81%BE%E3%82%8C%E3%81%9Fbert%E3%81%AE%E6%B4%BE%E7%94%9F%E5%BD%A2%E3%81%BE%E3%81%A8%E3%82%81-36f2f455919d)
      • [The GPT-3 shock](https://deeplearning.hatenablog.com/entry/gpt3)
      • [Rogers+ 2020 A Primer in BERTology: What we know about how BERT works](https://arxiv.org/pdf/2002.12327.pdf)
      • [Tay+ 2020 Efficient Transformers: A Survey](https://arxiv.org/pdf/2009.06732.pdf)
      • [Qiu+ 2020 Pre-trained Models for Natural Language Processing: A Survey](https://arxiv.org/pdf/2003.08271.pdf)
      • [Liu+ 2020 A Survey on Contextual Embeddings](https://arxiv.org/pdf/2003.07278.pdf)
      • [Xia+ EMNLP 2020 Which *BERT? A Survey Organizing Contextualized Encoders](https://arxiv.org/pdf/2010.00854.pdf)
      • [Li+ IEEE TKDE 2020 A Survey on Deep Learning for Named Entity Recognition](https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=9039685)