nlp-survey

A Survey of Natural Language Processing after BERT

KARAKURI Inc.

April 09, 2021

Transcript

  1. A Survey of Natural Language Processing after BERT
     Shiro Takagi

  2. Goals of this survey
     1. Want to know the trends in pre-trained language models since BERT!
     • What is the latest state of the field?
     • Which improvements look genuinely meaningful?
     2. Want to know what NLP tasks have been proposed recently!
     → A broad but shallow tour of recent trends in pre-trained language models and NLP tasks

  3. Today's outline
     1. Trends in general-purpose pre-trained language models
     2. Trends in task-specific models
     3. Analysis of Transformers and rethinking evaluation
     4. Summary

  4. 1. Trends in general-purpose pre-trained language models

  5. Introduction

  6. Overview of pre-trained language models
     https://github.com/thunlp/PLMpapers

  7. Overview of pre-trained language models
     [Qiu+ 2020 Pre-trained Models for Natural Language Processing: A Survey]

  8. Pre-training & Fine-tuning
     Pre-training
     Fine-tuning

  9. Self-attention
     [Cui+ EMNLP 2019]
     (Figure: each token's Query is compared against the Keys, a softmax turns the similarities into weights, and the output is a weighted sum of the Values; Query, Key, and Value come from the projection matrices W_Q, W_K, W_V.)
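The figure's mechanics fit in a few lines. A minimal single-head sketch in NumPy (the sizes and random weight matrices are illustrative, not taken from any released model):

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    """Single-head scaled dot-product self-attention.

    X: (seq_len, d) token embeddings.
    Wq, Wk, Wv: Query/Key/Value projection matrices.
    """
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])  # (seq_len, seq_len) similarities
    weights = softmax(scores, axis=-1)       # each row sums to 1
    return weights @ V                       # weighted sum of Values

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 8))
Wq, Wk, Wv = (rng.normal(size=(8, 8)) for _ in range(3))
out = self_attention(X, Wq, Wk, Wv)          # shape (5, 8)
```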

  10. GLUE [Wang+ ICLR 2019]
      • A benchmark of overall language-understanding ability
      • Grammatical acceptability, sentiment analysis, paraphrase detection, sentence similarity, duplicate-question detection, entailment, question answering, etc.

  11. SuperGLUE [Wang+ NeurIPS 2019]
      • A harder GLUE

  12. Autoencoding models

  13. BERT [Devlin+ NAACL 2019]
      • A Transformer-based pre-trained language model that uses bidirectional context
      • Pre-trained with two tasks: Masked Language Model and Next Sentence Prediction

  14. Masked Language Model
      (Figure: BERT reads "[CLS] 武士道はその表徴たる桜花と同じく、日本の土地に固有の花である" with 武士道 and 日本 masked out, and must predict the masked tokens.)

  15. Next Sentence Prediction
      (Figure: BERT takes "[CLS] 合理性はあくまであんたの世界でのルール [SEP] その縄じゃオレはしばれねえよ [SEP]" and predicts YES/NO: does the second sentence actually follow the first?)
      [https://www.geeksforgeeks.org/understanding-bert-nlp/]

  16. MT-DNN [Liu+ ACL 2019]
      • Improves accuracy by adding multi-task learning at fine-tuning time
      • Also enables more efficient domain adaptation than BERT

  17. SpanBERT [Joshi+ TACL 2020]
      • A Masked Language Model that masks contiguous spans

  18. RoBERTa [Liu+ 2019]
      • A hyperparameter search over BERT
      • Trained with larger batch sizes, more data, and more steps
      • Drops next sentence prediction
      • SOTA on GLUE, SQuAD, and RACE

  19. DeBERTa [He+ ICLR 2021]
      • Adds tokens' absolute-position information just before the softmax
      • Proposes disentangled attention, which embeds a word's content and its position as two separate vectors
      • Surpasses human performance on SuperGLUE

  20. Autoregressive models

  21. Autoregressive Language Model
      [http://peterbloem.nl/blog/transformers]
      [Yang+ NeurIPS 2019]

  22. GPT family [Radford+ 2018, Radford+ 2019, Brown+ 2020]
      • Autoregressive pre-trained language models
      • Known for their astonishing generation results from GPT-2 onward
      • Pioneers of the scaling laws for parameter and data counts

  23. XLNet [Yang+ NeurIPS 2019]
      • A pre-trained language model that combines the advantages of autoregressive and autoencoding models
      • Trains by predicting over permutations of the input sequence
      (Figure: autoregressive vs. autoencoding factorization)

  24. Seq-to-Seq

  25. MASS [Song+ ICML 2019]
      • A pre-training method for encoder-decoder models
      • Essentially a masked language model whose output is multiple tokens

  26. BART [Lewis+ ACL 2020]
      • A language model that can generate text while still using bidirectional context
      • Documents can be noised in a variety of ways
      • Strong performance on summarization and other generation tasks

  27. T5 [Raffel+ JMLR 2020]
      • Casts every NLP task as a text-to-text mapping
      • A pre-training approach that handles diverse NLP tasks uniformly

  28. Prefix Language Model
      • Only the prefix portion is allowed to use bidirectional context

  29. UniLM [Dong+ NeurIPS 2019]
      • Jointly trains unidirectional, bidirectional, and seq2seq language models
      • Achieves this by using different attention masks to control which context each token may attend to
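The three regimes differ only in the attention mask. A sketch of the masks UniLM-style training would use (1 = position j visible to position i, 0 = blocked; sizes are illustrative):

```python
import numpy as np

def attention_mask(n, kind, n_src=None):
    """Unidirectional: lower-triangular (each token sees only its past).
    Bidirectional: all ones (every token sees every token).
    Seq2seq: the first n_src (source) tokens see the whole source;
    target tokens see the source plus their own past."""
    if kind == "unidirectional":
        return np.tril(np.ones((n, n), dtype=int))
    if kind == "bidirectional":
        return np.ones((n, n), dtype=int)
    if kind == "seq2seq":
        m = np.zeros((n, n), dtype=int)
        m[:, :n_src] = 1                                      # everyone sees the source
        m[n_src:, n_src:] = np.tril(np.ones((n - n_src, n - n_src), dtype=int))
        return m
    raise ValueError(kind)

m = attention_mask(5, "seq2seq", n_src=2)
```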

  30. On pre-training

  31. Masking
      [Rogers+ 2020 A Primer in BERTology: What We Know About How BERT Works]

  32. Next Sentence Prediction
      [Rogers+ 2020 A Primer in BERTology: What We Know About How BERT Works]
      [Shi+ ACL 2020 Next Sentence Prediction helps Implicit Discourse Relation Classification within and across Domains]

  33. Pre-training Objectives
      [Liu+ 2020 A Survey on Contextual Embeddings]

  34. On fine-tuning

  35. Fine-tuning
      • Making the model deeper matters
      • Two-stage pre-training
      • Adversarial training
      • Data augmentation
      [Rogers+ 2020 A Primer in BERTology: What We Know About How BERT Works]
      [Pang+ 2019, Garg+ 2020, Arase & Tsujii 2019, Pruksachatkun+ 2020, Glavas & Vulic 2020]
      [Zhu+ 2019, Jiang+ 2019]
      [Lee+ 2019]

  36. Making models smaller

  37. Compressed Transformers (1/2)
      [Qiu+ 2020 Pre-trained Models for Natural Language Processing: A Survey]

  38. Compressed Transformers (2/2)
      [Rogers+ 2020 A Primer in BERTology: What We Know About How BERT Works]

  39. ALBERT [Lan+ ICLR 2020]
      • Lightens BERT via factorized embeddings (reducing the embedding dimension) and cross-layer parameter sharing
      • Proposes sentence-order prediction, a next-sentence-prediction variant with better-constructed negatives
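The saving from factorized embeddings is easy to compute: the V×H embedding table becomes V×E + E×H. A sketch with illustrative ALBERT-like sizes:

```python
def embedding_params(V, H, E=None):
    """Parameter count of the input embedding table.

    BERT ties the embedding size to the hidden size: V * H.
    ALBERT factorizes it through a small dimension E: V * E + E * H.
    V: vocab size, H: hidden size, E: factorized embedding size.
    """
    if E is None:
        return V * H          # BERT-style
    return V * E + E * H      # ALBERT-style

V, H, E = 30000, 4096, 128    # illustrative sizes
bert_like = embedding_params(V, H)        # 122,880,000 parameters
albert_like = embedding_params(V, H, E)   #   4,364,288 parameters
```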

  40. DistilBERT [Sanh+ 2019]
      • A distilled version of BERT
      • 40% smaller, 60% faster, with only ~3% performance degradation
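The core of distillation is training the student on the teacher's softened output distribution. A minimal sketch of the soft-target loss (DistilBERT additionally combines this with the usual MLM loss and a hidden-state cosine loss, omitted here; the logits are illustrative):

```python
import numpy as np

def softmax(z, T=1.0):
    """Softmax with temperature T (higher T = softer distribution)."""
    z = z / T
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def distillation_loss(teacher_logits, student_logits, T=2.0):
    """Cross-entropy of the student against the teacher's softened
    distribution: the soft-target term of knowledge distillation."""
    p_teacher = softmax(teacher_logits, T)
    log_p_student = np.log(softmax(student_logits, T))
    return -(p_teacher * log_p_student).sum()

t = np.array([4.0, 1.0, 0.5])
loss_same = distillation_loss(t, t)                       # student mimics teacher
loss_diff = distillation_loss(t, np.array([0.5, 1.0, 4.0]))  # student disagrees
# mimicking the teacher gives a strictly lower loss
```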

  41. TinyBERT [Jiao+ EMNLP 2020]
      • Another distillation of BERT: 1/7 the model size, 9x faster

  42. Q-BERT [Shen+ AAAI 2020]
      • Quantizes BERT
      • Chooses where to lower precision based on the mean and variance of the Hessian's eigenvalues

  43. More efficient computation

  44. Efficient Transformers
      [Tay+ 2020 Efficient Transformers: A Survey]

  45. Sparse Transformer [Child+ 2019]
      • Proposes attention restricted to local relations

  46. Longformer [Beltagy+ 2020]
      • Combines local attention within a fixed window with task-motivated global attention
      • The attention computation drops to linear order, enabling long documents
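A sketch of the attention pattern: a sliding window of width ±w per token (cost O(n·w) instead of O(n²)), plus a few global positions that see, and are seen by, everyone. The window size and choice of global positions here are illustrative:

```python
import numpy as np

def sliding_window_mask(n, w, global_idx=()):
    """Longformer-style attention mask: 1 = attend, 0 = blocked.

    Each token attends within a window of +/- w positions. Tokens in
    global_idx (e.g. a [CLS]-like task token) attend to and are
    attended by every position.
    """
    idx = np.arange(n)
    mask = (np.abs(idx[:, None] - idx[None, :]) <= w).astype(int)
    for g in global_idx:
        mask[g, :] = 1
        mask[:, g] = 1
    return mask

m = sliding_window_mask(6, 1, global_idx=(0,))
```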

  47. Big Bird [Zaheer+ NeurIPS 2020]
      • Combines random, local, and global attention

  48. Performer [Choromanski+ ICLR 2021]
      • Linear-order attention with a theoretical guarantee that it stochastically yet accurately estimates the original attention
      • Needs no sparsity assumption, and applies beyond softmax

  49. Reformer [Kitaev+ ICLR 2020]
      • Proposes attention that assigns nearby vectors to the same hash bucket
      • Reduces the O(N^2) attention computation to O(N log N), handling long documents
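A sketch of the hashing step: angular LSH via random projections, so vectors pointing in the same direction share a bucket and attention only compares queries/keys within a bucket. The number of hashes and the multi-round scheme of the actual paper are simplified away:

```python
import numpy as np

def lsh_buckets(X, n_hashes=8, seed=0):
    """Angular LSH in the style of Reformer: project onto random
    directions and bucket by the largest signed projection, so vectors
    with the same direction land in the same bucket."""
    rng = np.random.default_rng(seed)
    R = rng.normal(size=(X.shape[1], n_hashes))   # random directions
    proj = X @ R                                  # (N, n_hashes)
    # bucket = argmax over [proj; -proj], as in the Reformer paper
    return np.argmax(np.concatenate([proj, -proj], axis=1), axis=1)

rng = np.random.default_rng(1)
v = rng.normal(size=16)
X = np.stack([v, 2.0 * v, -v])   # same direction, same direction, opposite
b = lsh_buckets(X)
# b[0] == b[1]: same-direction vectors share a bucket; the negated vector does not
```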

  50. Long Range Arena [Tay+ ICLR 2021]
      • A benchmark for processing long sequences
      • A yardstick for comparing efficient Transformers

  51. ELECTRA [Clark+ ICLR 2020]
      • Proposes pre-training with an adversarial-style objective (replaced-token detection) instead of a masked language model
      • Matches XLNet and RoBERTa with at most 1/4 of the compute
      • Outperforms GPT after 4 days of training on a single GPU
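A sketch of how replaced-token-detection data is built: a generator proposes tokens at masked positions, and the discriminator must label every position as original or replaced. The `generator_sample` callable here stands in for ELECTRA's small generator network; the example sentence is illustrative:

```python
import random

def make_electra_example(tokens, generator_sample, mask_prob=0.15, seed=0):
    """Build one replaced-token-detection example, ELECTRA-style.

    Returns the corrupted token sequence and per-position labels:
    0 = original, 1 = replaced. If the generator happens to guess the
    original token, the label stays 0 (as in the paper).
    """
    rng = random.Random(seed)
    corrupted, labels = [], []
    for tok in tokens:
        if rng.random() < mask_prob:
            new = generator_sample(tok)
            corrupted.append(new)
            labels.append(int(new != tok))
        else:
            corrupted.append(tok)
            labels.append(0)
    return corrupted, labels

tokens = "the chef cooked the meal".split()
corrupted, labels = make_electra_example(tokens, lambda t: "ate")
```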

  52. Scaling models up

  53. Large Models
      [State of AI Report 2020 (https://www.stateof.ai/)]
      • Megatron-LM (8B parameters) [Shoeybi+ 2019]
      • Turing-NLG (17B parameters) [Microsoft 2020]
      • GPT-3 (175B parameters) [Brown+ 2020]
      [https://www.microsoft.com/en-us/research/blog/turing-nlg-a-17-billion-parameter-language-model-by-microsoft/]

  54. Using external knowledge

  55. THU-ERNIE [Zhang+ ACL 2019]
      • A pre-trained language model that incorporates a knowledge graph
      • Predicts knowledge-graph entities from BERT's embeddings

  56. KnowBERT [Peters+ EMNLP-IJCNLP 2019]
      • Contextualizes BERT's embeddings with entity embeddings

  57. K-BERT [Liu+ AAAI 2020]
      • Looks up a knowledge graph first, then feeds the enriched input through BERT

  58. REALM [Guu+ 2020]
      • Complements the input with information retrieval during pre-training

  59. 2. Trends in task-specific models

  60. Question answering

  61. SQuAD [Rajpurkar+ EMNLP 2016]
      • A dataset for question answering
      • The answer appears explicitly in the passage

  62. SQuAD2.0 [Rajpurkar+ ACL 2018]
      • SQuAD plus questions that cannot be answered from the paragraph alone
      • Models must also judge which questions are unanswerable

  63. DROP [Dua+ NAACL 2019]
      • Questions that cannot be answered without combining information from several parts of the paragraph

  64. QuAC [Choi+ EMNLP 2018]
      • A conversational question-answering dataset about Wikipedia articles
      • Requires contextual understanding, e.g. questions depend on the dialogue history

  65. CoQA [Reddy+ TACL 2019]
      • A conversational question-answering dataset

  66. HotpotQA [Yang+ EMNLP 2018]
      • A question-answering dataset requiring reading across multiple paragraphs

  67. Natural Questions [Kwiatkowski+ TACL 2019]
      • An open-domain QA dataset built from real Google search queries

  68. RACE [Lai+ EMNLP 2017]
      • A dataset built from English exams in China
      • A benchmark for long-passage reading comprehension

  69. Text generation

  70. GEM [Gehrmann+ 2021]
      • A benchmark for language-generation tasks

  71. BLEURT [Sellam+ 2020]
      • Evaluation using a BERT pre-trained on noised Wikipedia and fine-tuned on human judgments

  72. Summarization

  73. ProphetNet [Qi+ EMNLP 2020]
      • Predicts up to N tokens ahead

  74. HIBERT [Zhang+ ACL 2019]
      • Extractive summarization with BERT
      • Uses document-level and sentence-level models to classify whether each sentence belongs in the summary

  75. DiscoBERT [Xu+ ACL 2020]
      • Extracts parts of sentences rather than whole sentences
      • Represents discourse flow explicitly as a graph

  76. BART [Lewis+ ACL 2020] (recap)
      • A language model that can generate text while still using bidirectional context
      • Documents can be noised in a variety of ways
      • Strong performance on summarization and other generation tasks

  77. BERTSum [Liu+ EMNLP 2019]
      • Extractive and abstractive summarization with BERT
      • Proposes two-stage fine-tuning for abstractive summarization

  78. PEGASUS [Zhang+ ICML 2020]
      • A pre-training method for abstractive summarization
      • Generates masked important words together with the rest of the text

  79. QAGS [Wang+ ACL 2020]
      • Evaluates summary quality by generating questions from the summary, answering them from both the source and the summary, and measuring agreement

  80. Summarization by feedback [Stiennon+ NeurIPS 2020]
      • Reinforcement learning with human feedback as the reward

  81. Named entity recognition

  82. Named Entity Recognition
      [Li+ 2020 A Survey on Deep Learning for Named Entity Recognition]

  83. LUKE [Yamada+ EMNLP 2020]
      • Masked language modeling over both words and entities
      • Proposes entity-aware attention that distinguishes token types (word or entity)

  84. BERT and named entities [Balasubramanian+ RepL4NLP 2020]
      • BERT is brittle to named-entity substitutions

  85. Text classification

  86. TopicBERT [Chaudhary+ 2020]
      • A BERT made more efficient for document classification by jointly using topic modeling

  87. 3. Analysis of Transformers and rethinking evaluation

  88. Analysis of Transformer models

  89. The role of heads in multi-head attention
      • The attention patterns different heads acquire are limited, yet heads vary in how much they affect performance
      • Many heads do not affect performance, and head importance is determined early in training
      • Multiple heads matter more for encoder-decoder attention than for self-attention
      • Heads in the same layer show similar patterns
      • Some heads attend to what linguists call syntax and coreference
      • The lottery-ticket hypothesis holds
      [Kovaleva+ EMNLP 2019]
      [Michel+ NeurIPS 2019]
      [Michel+ NeurIPS 2019]
      [Clark+ BlackBoxNLP 2019]
      [Clark+ BlackBoxNLP 2019]
      [Chen+ NeurIPS 2020]

  90. How representations differ across layers
      • Shallow layers acquire generic representations; deep layers acquire task-specific ones
      • Shallow layers capture token-level and local-context information, which weakens through the layers
      • Deep layers capture longer-range dependencies and more semantic representations
      [Aken+ CIKM 2019], [Peters+ RepL4NLP 2019], [Hao+ EMNLP 2019]
      [Lin+ BlackBoxNLP 2019], [Voita+ EMNLP 2019], [Ethayarajh+ EMNLP 2019], [Brunner+ ICLR 2020]
      [Raganato+ 2018], [Vig+ BlackBoxNLP 2019], [Jawahar+ ACL 2019]

  91. BERT and multilingual understanding
      • Training on a single language can yield representations that generalize to multiple languages
      • Multilingual BERT does / does not acquire language-universal representations (conflicting findings)
      • Syntax trees are embedded in multilingual BERT's representations too
      • Overlap in lexical meaning across languages turns out not to be important
      [Artetxe+ 2019]
      [Libovicky+ 2019], [Singh+ ICLR 2019]
      [Chi+ ACL 2020]
      [Wang+ ICLR 2020]

  92. Recovering linguistic structure
      [Coenen+ NeurIPS 2019]
      • Linguistic information is represented in separate semantic and syntactic subspaces
      • Syntax trees are embedded in both ELMo and BERT
      • BERT acquires hierarchical syntactic representations
      • Contextual models learn good syntactic representations, but in terms of semantics they differ little from non-contextual methods
      • BERT acquires much grammatical knowledge, but with large variance
      • (Japanese) BERT exploits word-order information
      [Hewitt+ NAACL-HLT 2019]
      [Goldberg 2019]
      [Tenney+ ICLR 2019]
      [Warstadt+ EMNLP 2019]
      [Kuribayashi+ ACL 2020]

  93. Weaknesses of Transformer models
      [Lin+ ACL 2020]
      • Not robust to adversarial attacks
      • BERT exploits spurious correlations
      • Fine-tuning on an intermediate task first can hurt downstream performance
      • Weak at negation
      • Lacks common-sense knowledge
      [Jin+ AAAI 2020]
      [Niven+ ACL 2019]
      [Wang+ ACL 2020]
      [Ettinger+ ACL 2019], [Kassner+ ACL 2020]

  94. exBERT [Hoover+ ACL 2020]
      • A tool for visualizing the representations a trained BERT has acquired

  95. Rethinking evaluation methods

  96. SWAG [Zellers+ EMNLP 2018]
      • A benchmark for common-sense inference
      • Proposes Adversarial Filtering to reduce annotation biases

  97. HAMLET [Nie+ ACL 2020]
      • Proposes a process for training language models that are not misled by spurious correlations, spanning from data collection to improvements informed by training results

  98. CheckList [Ribeiro+ ACL 2020 (Best Paper)]
      • Model evaluation via black-box testing

  99. On the problems of automatic evaluation metrics [Mathur+ ACL 2020]
      • Points out that when multiple machine-translation systems are evaluated, the presence of outlier systems makes evaluation with automatic metrics unreliable

  100. 4. Summary

  101. References and resources

  102. NLP-progress
      • Collects benchmarks and state-of-the-art results for each NLP task
      [https://github.com/sebastianruder/NLP-progress]

  103. A Primer in BERTology [Rogers+ 2020]
      • A survey of studies that empirically investigate what is going on inside BERT
      • Also summarizes BERT's current weaknesses
      [https://arxiv.org/pdf/2002.12327.pdf]

  104. References
      • [PLMpapers](https://github.com/thunlp/PLMpapers)
      • [Highlights of ACL 2020](https://medium.com/analytics-vidhya/highlights-of-acl-2020-4ef9f27a4f0c)
      • [BERT-related Papers](https://github.com/tomohideshibata/BERT-related-papers)
      • [ML and NLP Research Highlights of 2020](https://ruder.io/research-highlights-2020/)
      • [Tracing the history of document summarization (+ having BERT summarize documents)](https://qiita.com/siida36/items/4c0dbaa07c456a9fadd0)
      • [Survey of pre-trained language models](https://speakerdeck.com/kyoun/survey-of-pretrained-language-models)
      • [NLP: A roundup of BERT variants born in 2020](https://kai760.medium.com/nlp-2020%E5%B9%B4%E3%81%AB%E7%94%9F%E3%81%BE%E3%82%8C%E3%81%9Fbert%E3%81%AE%E6%B4%BE%E7%94%9F%E5%BD%A2%E3%81%BE%E3%81%A8%E3%82%81-36f2f455919d)
      • [The GPT-3 shock](https://deeplearning.hatenablog.com/entry/gpt3)
      • [Rogers+ 2020 A Primer in BERTology: What we know about how BERT works](https://arxiv.org/pdf/2002.12327.pdf)
      • [Tay+ 2020 Efficient Transformers: A Survey](https://arxiv.org/pdf/2009.06732.pdf)
      • [Qiu+ 2020 Pre-trained Models for Natural Language Processing: A Survey](https://arxiv.org/pdf/2003.08271.pdf)
      • [Liu+ 2020 A Survey on Contextual Embeddings](https://arxiv.org/pdf/2003.07278.pdf)
      • [Xia+ EMNLP 2020 Which *BERT? A Survey Organizing Contextualized Encoders](https://arxiv.org/pdf/2010.00854.pdf)
      • [Li+ IEEE TKDE 2020 A Survey on Deep Learning for Named Entity Recognition](https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=9039685)

  105. Bonus

  106. GPT-3 [Brown+ 2020]
      [https://deeplearning.hatenablog.com/entry/gpt3]

  107. GPT-3 [Brown+ 2020]
      [https://twitter.com/sharifshameem/status/1283322990625607681?ref_src=twsrc%5Etfw%7Ctwcamp%5Etweetembed%7Ctwterm%5E1283322990625607681%7Ctwgr%5E%7Ctwcon%5Es1_&ref_url=https%3A%2F%2Fdeeplearning.hatenablog.com%2Fentry%2Fgpt3]

  108. GPT-3 [Brown+ 2020]
      [https://twitter.com/sh_reya/status/1284746918959239168?ref_src=twsrc%5Etfw%7Ctwcamp%5Etweetembed%7Ctwterm%5E1284746918959239168%7Ctwgr%5E%7Ctwcon%5Es1_&ref_url=https%3A%2F%2Fdeeplearning.hatenablog.com%2Fentry%2Fgpt3]

  109. DALL•E [OpenAI 2021]
      • Reportedly a model that generates synthetic images following natural-language instructions
      • What seems impressive is that it appears to be taming the compositionality of language
      [https://openai.com/blog/dall-e/]

  110. DALL•E [OpenAI 2021]
      [https://openai.com/blog/dall-e/]

  111. DALL•E [OpenAI 2021]
      [https://openai.com/blog/dall-e/]
