Slide 1

Slide 1 text

A Survey of Natural Language Processing after BERT (Shiro Takagi) 1

Slide 2

Slide 2 text

Purpose of the survey 2
1. I want to know the trends in pre-trained language models since BERT!
• What is the latest state of the field?
• Which improvements seem meaningful?
2. I want to know what NLP tasks have been proposed recently!
→ A broad, shallow tour of recent trends in pre-trained language models and NLP tasks

Slide 3

Slide 3 text

Today's outline 3
1. Trends in general-purpose pre-trained language models
2. Trends in models specialized for individual tasks
3. Analysis of Transformers and rethinking evaluation
4. Summary

Slide 4

Slide 4 text

1. Trends in general-purpose pre-trained language models 4

Slide 5

Slide 5 text

Introduction 5

Slide 6

Slide 6 text

List of pre-trained language models 6 https://github.com/thunlp/PLMpapers

Slide 7

Slide 7 text

List of pre-trained language models 7 [Qiu+ 2020 Pre-trained Models for Natural Language Processing: A Survey]

Slide 8

Slide 8 text

Pre-training & Fine-tuning 8 (figure: a pre-training stage followed by a fine-tuning stage)

Slide 9

Slide 9 text

Self-attention 9 [Cui+ EMNLP 2019] (figure: each token is projected by weight matrices W_Q, W_K, W_V into a Query, Key, and Value; the query is compared against the keys, a softmax turns the scores into weights, and the weights are applied to the values)
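To make the diagram concrete, here is a minimal single-head self-attention sketch in NumPy. The sizes, helper names, and random inputs are illustrative assumptions, not anything from the slides:

```python
# Minimal single-head self-attention sketch (illustrative sizes and names).
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, W_q, W_k, W_v):
    """X: (seq_len, d_model). Each token is projected into query/key/value."""
    Q, K, V = X @ W_q, X @ W_k, X @ W_v
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)      # (seq_len, seq_len) token-to-token scores
    weights = softmax(scores, axis=-1)   # one attention distribution per token
    return weights @ V                   # weighted sum of value vectors

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 16))             # 5 tokens, d_model = 16
W_q, W_k, W_v = (rng.normal(size=(16, 8)) for _ in range(3))
print(self_attention(X, W_q, W_k, W_v).shape)  # (5, 8)
```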

Slide 10

Slide 10 text

GLUE [Wang+ ICLR 2019] 10
• A benchmark for measuring general language understanding ability
• Grammatical acceptability, sentiment analysis, paraphrase detection, sentence similarity, duplicate-question detection, textual entailment, question answering, and more

Slide 11

Slide 11 text

SuperGLUE [Wang+ NeurIPS 2019] 11
• A harder GLUE

Slide 12

Slide 12 text

Autoencoding models 12

Slide 13

Slide 13 text

BERT [Devlin+ NAACL 2019] 13
• A Transformer-based pre-trained language model that conditions on bidirectional context
• Two pre-training tasks: Masked Language Model and Next Sentence Prediction

Slide 14

Slide 14 text

Masked Language Model 14 (figure: BERT reads the [CLS]-prefixed sentence "Bushido is, like the cherry blossom that symbolizes it, a flower native to the soil of Japan" with the words "bushido" and "Japan" masked out, and predicts the masked words)
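A concrete sketch of the input corruption behind this figure. The 15% selection rate and the 80/10/10 mask/random/keep split are the recipe from the BERT paper; the tiny vocabulary, helper name, and raised rate in the demo call are made up for the example:

```python
# BERT-style MLM input corruption (illustrative toy example).
import random

MASK = "[MASK]"
VOCAB = ["flower", "soil", "japan", "bushido", "cherry"]  # toy replacement vocab

def corrupt_for_mlm(tokens, p_select=0.15, seed=0):
    rng = random.Random(seed)
    inputs, labels = list(tokens), [None] * len(tokens)  # None = not predicted
    for i, tok in enumerate(tokens):
        if rng.random() >= p_select:
            continue
        labels[i] = tok                     # the model must recover this token
        r = rng.random()
        if r < 0.8:
            inputs[i] = MASK                # 80%: replace with [MASK]
        elif r < 0.9:
            inputs[i] = rng.choice(VOCAB)   # 10%: replace with a random token
        # remaining 10%: keep the original token unchanged
    return inputs, labels

# Rate raised so this short example actually masks something.
print(corrupt_for_mlm("bushido is a flower native to the soil of japan".split(),
                      p_select=0.3))
```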

Slide 15

Slide 15 text

Next Sentence Prediction 15 (figure, after [https://www.geeksforgeeks.org/understanding-bert-nlp/]: two sentences are packed as [CLS] sentence A [SEP] sentence B [SEP], and BERT classifies YES/NO whether B is the actual next sentence)

Slide 16

Slide 16 text

MT-DNN [Liu+ ACL 2019] 16
• Improves accuracy by adding multi-task learning at fine-tuning time
• Also enables more efficient domain adaptation than BERT

Slide 17

Slide 17 text

SpanBERT [Joshi+ TACL 2019] 17
• A masked language model that masks contiguous spans

Slide 18

Slide 18 text

RoBERTa [Liu+ 2019] 18
• A hyperparameter search over BERT
• Trained with larger batch sizes, more data, and more steps
• Drops next sentence prediction
• SOTA on GLUE, SQuAD, and RACE

Slide 19

Slide 19 text

DeBERTa [He+ ICLR 2021] 19
• Adds absolute token-position information just before the softmax
• Proposes disentangled attention, which embeds each word's content and position as two separate vectors
• Surpasses human performance on SuperGLUE

Slide 20

Slide 20 text

Autoregressive models 20

Slide 21

Slide 21 text

Autoregressive Language Model 21 [http://peterbloem.nl/blog/transformers] [Yang+ NeurIPS 2019]
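A minimal sketch of what makes attention autoregressive: a lower-triangular (causal) mask so that position i only attends to positions at or before i, letting the model factorize p(x) left to right. Names and sizes are illustrative:

```python
# Causal attention masking sketch (illustrative).
import numpy as np

def causal_mask(seq_len):
    # True = attention allowed; lower-triangular pattern blocks the future
    return np.tril(np.ones((seq_len, seq_len), dtype=bool))

def masked_softmax(scores, mask):
    scores = np.where(mask, scores, -1e9)  # effectively zero weight where masked
    scores = scores - scores.max(axis=-1, keepdims=True)
    e = np.exp(scores)
    return e / e.sum(axis=-1, keepdims=True)

scores = np.random.default_rng(0).normal(size=(4, 4))
print(masked_softmax(scores, causal_mask(4)).round(2))
# each row i has zero weight on positions j > i
```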

Slide 22

Slide 22 text

GPT family [Radford+ 2018, Radford+ 2019, Brown+ 2020] 22
• Autoregressive pre-trained language models
• Known for their astonishing generation results from GPT-2 onward
• A forerunner of scaling laws for parameter and data counts

Slide 23

Slide 23 text

XLNet [Yang+ NeurIPS 2019] 23
• A pre-trained language model that exploits the advantages of both the autoregressive and the autoencoding approach
• Trained to predict over permutations of the input sequence (figure: autoregressive vs. autoencoding)

Slide 24

Slide 24 text

Seq-to-Seq 24

Slide 25

Slide 25 text

MASS [Song+ ICML 2019] 25
• Proposes a pre-training method for encoder-decoder models
• A masked language model whose output is a span of several tokens rather than one

Slide 26

Slide 26 text

BART [Lewis+ ACL 2020] 26
• Proposes a language model that conditions on bidirectional context while still being able to generate text
• Documents can be corrupted with a variety of noising schemes
• Strong performance on tasks such as summarization

Slide 27

Slide 27 text

T5 [Raffel+ JMLR 2020] 27
• Casts natural language tasks as text-to-text mappings
• Proposes a pre-training method that handles diverse NLP tasks uniformly

Slide 28

Slide 28 text

Prefix Language Model 28
• Allows bidirectional context only within the prefix part of the input (see the attention-mask sketch after the UniLM slide below)

Slide 29

Slide 29 text

UniLM [Dong+ NeurIPS 2019] 29
• Trains unidirectional, bidirectional, and seq-to-seq language models simultaneously
• Achieves this by using different attention masks to control which context each token may use
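The three mask patterns can be sketched directly. This construction is my own illustration of the idea, not the paper's code; note that the prefix LM from the previous slide corresponds to the seq-to-seq mask, with the prefix playing the role of the source segment:

```python
# UniLM-style attention masks over a source segment (length S) followed by a
# target segment (length T). True = attention allowed.
import numpy as np

def unilm_masks(S, T):
    N = S + T
    bidirectional = np.ones((N, N), dtype=bool)             # every token sees all
    unidirectional = np.tril(np.ones((N, N), dtype=bool))   # left-to-right only
    seq2seq = np.zeros((N, N), dtype=bool)
    seq2seq[:, :S] = True                                   # all rows see the source
    seq2seq[S:, S:] = np.tril(np.ones((T, T), dtype=bool))  # target side is causal
    # source rows never attend to target columns (left as False)
    return unidirectional, bidirectional, seq2seq

uni, bi, s2s = unilm_masks(S=3, T=2)
print(s2s.astype(int))  # source sees source; target sees source + its own past
```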

Slide 30

Slide 30 text

On pre-training 30

Slide 31

Slide 31 text

Masking 31 [Rogers+ 2020 A Primer in BERTology: What We Know About How BERT Works]

Slide 32

Slide 32 text

Next Sentence Prediction 32 [Rogers+ 2020 A Primer in BERTology: What We Know About How BERT Works] [Shi+ ACL 2020 Next Sentence Prediction helps Implicit Discourse Relation Classification within and across Domains]

Slide 33

Slide 33 text

Pre-training Objectives 33 [Liu+ 2020 A Survey on Contextual Embeddings]

Slide 34

Slide 34 text

On fine-tuning 34

Slide 35

Slide 35 text

Fine-tuning 35
• Depth matters [Rogers+ 2020 A Primer in BERTology: What We Know About How BERT Works]
• Two-stage pre-training [Pang+ 2019, Garg+ 2020, Arase & Tsujii 2019, Pruksachatkun+ 2020, Glavas & Vulic 2020]
• Adversarial training [Zhu+ 2019, Jiang+ 2019]
• Data augmentation [Lee+ 2019]

Slide 36

Slide 36 text

Making models smaller 36

Slide 37

Slide 37 text

Compressed Transformers (1/2) 37 [Qiu+ 2020 Pre-trained Models for Natural Language Processing: A Survey]

Slide 38

Slide 38 text

Compressed Transformers (2/2) 38 [Rogers+ 2020 A Primer in BERTology: What We Know About How BERT Works]

Slide 39

Slide 39 text

ALBERT [Lan+ ICLR 2020] 39
• A lighter BERT via embedding-dimension reduction and parameter sharing: the V x H embedding matrix is factorized into V x E and E x H pieces (figure: factorized embedding)
• Proposes sentence-order prediction, a next-sentence-prediction variant with better-constructed negative examples
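Back-of-the-envelope arithmetic for the factorized embedding, using illustrative sizes (a 30k vocabulary, a BERT-base-like H = 768, and E = 128; these specific numbers are assumptions for the example):

```python
# Parameter count for a full embedding matrix vs. ALBERT's factorization.
V, H, E = 30_000, 768, 128

bert_style = V * H               # one V x H embedding matrix
albert_style = V * E + E * H     # V x E lookup followed by an E x H projection

print(f"V*H       = {bert_style:,}")      # 23,040,000
print(f"V*E + E*H = {albert_style:,}")    # 3,938,304
print(f"reduction = {bert_style / albert_style:.1f}x")
```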

Slide 40

Slide 40 text

DistilBERT [Sanh+ 2019] 40
• A distilled version of BERT
• 40% smaller and 60% faster, with only a 3% drop in performance
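A minimal sketch of the soft-target loss that drives this kind of distillation: the student matches the teacher's temperature-smoothed output distribution. This shows only the distillation term (DistilBERT's full objective also includes the usual MLM loss and a hidden-state cosine loss), and the temperature value is an arbitrary choice:

```python
# Soft-target knowledge distillation loss sketch (illustrative).
import numpy as np

def softmax(z, T=1.0):
    z = z / T                      # temperature smooths the distribution
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def distillation_loss(teacher_logits, student_logits, T=2.0):
    p_teacher = softmax(teacher_logits, T)           # soft targets
    log_p_student = np.log(softmax(student_logits, T))
    return -(p_teacher * log_p_student).sum()        # cross-entropy H(p_t, p_s)

teacher = np.array([4.0, 1.0, 0.5, 0.1])
student = np.array([3.0, 1.5, 0.2, 0.3])
print(distillation_loss(teacher, student))
```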

Slide 41

Slide 41 text

TinyBERT [Jiao+ EMNLP 2020] 41
• Also a distillation of BERT: 1/7 the model size and a 9x speedup

Slide 42

Slide 42 text

Q-BERT [Shen+ AAAI 2020] 42
• Quantization of BERT
• Uses the mean and variance of the Hessian eigenvalues to decide where precision can be lowered
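Q-BERT's contribution is choosing bit widths from Hessian statistics; the underlying operation, symmetric uniform quantization to k bits, can be sketched as follows (my own illustration, not the paper's code):

```python
# Symmetric uniform k-bit quantize/dequantize sketch (illustrative).
import numpy as np

def quantize(w, k):
    """Round weights w onto a signed k-bit grid, then map back to floats."""
    qmax = 2 ** (k - 1) - 1
    scale = np.abs(w).max() / qmax             # map [-max, max] onto the grid
    q = np.clip(np.round(w / scale), -qmax, qmax)
    return q * scale                           # dequantized approximation

w = np.random.default_rng(0).normal(size=1000)
for k in (8, 4, 2):
    err = np.abs(w - quantize(w, k)).mean()    # error grows as bits shrink
    print(f"{k}-bit mean abs error: {err:.4f}")
```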

Slide 43

Slide 43 text

Making computation more efficient 43

Slide 44

Slide 44 text

Efficient Transformers 44 [Tay+ 2020 Efficient Transformers: A Survey]

Slide 45

Slide 45 text

Sparse Transformer [Child+ 2019] 45
• Proposes attention restricted to local relationships

Slide 46

Slide 46 text

Longformer [Beltagy+ 2020] 46
• Combines local attention within a fixed window with task-informed global attention
• Attention cost falls to linear order, so long documents become tractable
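The attention pattern can be visualized with a toy mask. Materializing the full N x N mask is itself O(N^2); the point is only to show which entries an efficient implementation actually computes. Window size and global positions are illustrative:

```python
# Longformer-style attention pattern: sliding window + a few global tokens.
import numpy as np

def longformer_mask(n, window=1, global_idx=(0,)):
    mask = np.zeros((n, n), dtype=bool)
    for i in range(n):
        lo, hi = max(0, i - window), min(n, i + window + 1)
        mask[i, lo:hi] = True            # local sliding window around token i
    for g in global_idx:
        mask[g, :] = True                # global token attends everywhere
        mask[:, g] = True                # and every token attends to it
    return mask

print(longformer_mask(6, window=1, global_idx=(0,)).astype(int))
```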

Slide 47

Slide 47 text

Big Bird [Zaheer+ NeurIPS 2020] 47
• Combines random, local, and global attention

Slide 48

Slide 48 text

Performer [Choromanski+ ICLR 2021] 48
• Proposes linear-order attention with a theoretical guarantee that it is a stochastically accurate estimate of the original attention
• Requires no sparsity assumption and applies to kernels other than softmax

Slide 49

Slide 49 text

Reformer [Kitaev+ ICLR 2020] 49
• Proposes attention that assigns nearby vectors to the same hash bucket
• Cuts the O(N^2) attention computation to O(N log N), handling long documents
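A minimal sketch of the angular LSH that does the bucketing: nearby vectors tend to land in the same bucket, so full attention can be restricted to within-bucket pairs. The random-projection construction follows the general LSH idea; sizes, seeds, and names are illustrative:

```python
# Angular LSH bucketing sketch (illustrative).
import numpy as np

def lsh_bucket(x, R):
    """x: (n, d) vectors; R: (d, n_buckets // 2) random projection matrix."""
    h = x @ R
    # bucket = index of the strongest direction among [h, -h]
    return np.argmax(np.concatenate([h, -h], axis=-1), axis=-1)

rng = np.random.default_rng(0)
d, n_buckets = 16, 8
R = rng.normal(size=(d, n_buckets // 2))

v = rng.normal(size=(1, d))
near = v + 0.05 * rng.normal(size=(1, d))   # small perturbation of v
far = rng.normal(size=(1, d))               # unrelated vector
print(lsh_bucket(np.vstack([v, near, far]), R))  # v and near usually collide
```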

Slide 50

Slide 50 text

Long Range Arena [Tay+ ICLR 2021] 50
• A benchmark for language processing over long texts
• A common yardstick for comparing efficient transformers

Slide 51

Slide 51 text

ELECTRA [Clark+ ICLR 2020] 51
• Proposes pre-training that uses adversarial training in place of the masked language model
• Matches XLNet and RoBERTa with at most 1/4 of the compute
• Outperforms GPT after four days of training on a single GPU
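A sketch of how replaced-token-detection examples are formed. In the paper the replacements come from a small generator network trained jointly; here random vocabulary picks stand in for generator samples, and the helper name and toy vocabulary are made up:

```python
# ELECTRA-style replaced-token-detection data sketch (illustrative).
import random

VOCAB = ["the", "chef", "cooked", "ate", "a", "meal"]  # toy vocabulary

def make_rtd_example(tokens, p_replace=0.15, seed=0):
    rng = random.Random(seed)
    corrupted, labels = [], []
    for tok in tokens:
        if rng.random() < p_replace:
            new = rng.choice(VOCAB)    # stand-in for a generator sample
            corrupted.append(new)
            # a sampled token identical to the original counts as "original"
            labels.append(1 if new != tok else 0)
        else:
            corrupted.append(tok)
            labels.append(0)           # original token
    return corrupted, labels

print(make_rtd_example("the chef cooked the meal".split(), p_replace=0.4))
```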

Slide 52

Slide 52 text

Making models larger 52

Slide 53

Slide 53 text

Large Models 53 [State of AI Report 2020 (https://www.stateof.ai/)]
• Megatron-LM (8B parameters) [Shoeybi+ ACL 2020]
• Turing-NLG (17B) [Microsoft 2020]
• GPT-3 (175B) [Brown+ 2020]
[https://www.microsoft.com/en-us/research/blog/turing-nlg-a-17-billion-parameter-language-model-by-microsoft/]

Slide 54

Slide 54 text

Leveraging external knowledge 54

Slide 55

Slide 55 text

THU-ERNIE [Zhang+ ACL 2019] 55
• A pre-trained language model that incorporates a knowledge graph
• Predicts knowledge-graph entities from BERT's embeddings

Slide 56

Slide 56 text

KnowBERT [Peters+ EMNLP-IJCNLP 2019] 56
• Contextualizes BERT's embeddings with entity embeddings

Slide 57

Slide 57 text

K-BERT [Liu+ AAAI 2020] 57
• Retrieves from a knowledge graph first, then feeds the result through BERT

Slide 58

Slide 58 text

REALM [Guu+ 2020] 58
• Complements pre-training with information obtained by retrieval

Slide 59

Slide 59 text

2. Trends in models specialized for individual tasks 59

Slide 60

Slide 60 text

Question answering 60

Slide 61

Slide 61 text

SQuAD [Rajpurkar+ EMNLP 2016] 61
• A dataset for question answering
• The answer is explicitly present in the passage

Slide 62

Slide 62 text

SQuAD 2.0 [Rajpurkar+ ACL 2018] 62
• SQuAD extended with questions that cannot be answered from the paragraph alone
• Systems must also judge which questions are unanswerable

Slide 63

Slide 63 text

DROP [Dua+ NAACL 2019] 63
• Questions that can only be answered by combining information from several parts of a paragraph

Slide 64

Slide 64 text

QuAC [Choi+ EMNLP 2018] 64
• A conversational question answering dataset over Wikipedia articles
• Requires contextual understanding, e.g., questions depend on the dialogue history

Slide 65

Slide 65 text

CoQA [Reddy+ TACL 2019] 65
• A conversational question answering dataset

Slide 66

Slide 66 text

HotpotQA [Yang+ EMNLP 2018] 66
• A question answering dataset requiring reading comprehension across multiple paragraphs

Slide 67

Slide 67 text

Natural Questions [Kwiatkowski+ TACL 2019] 67
• An open-domain QA dataset built from real Google search queries

Slide 68

Slide 68 text

RACE [Lai+ EMNLP 2017] 68
• A dataset of English exams from China
• A benchmark for long-passage reading comprehension

Slide 69

Slide 69 text

Text generation 69

Slide 70

Slide 70 text

GEM [Gehrmann+ 2021] 70
• A benchmark for language generation tasks

Slide 71

Slide 71 text

BLEURT [Sellam+ 2020] 71
• Evaluation with a BERT pre-trained on noise-perturbed Wikipedia and fine-tuned on human ratings

Slide 72

Slide 72 text

Text summarization 72

Slide 73

Slide 73 text

ProphetNet [Qi+ EMNLP 2020] 73
• Predicts up to N tokens ahead

Slide 74

Slide 74 text

HIBERT [Zhang+ ACL 2019] 74
• Extractive summarization with BERT
• Uses a document-level and a sentence-level model to classify whether each sentence belongs in the summary

Slide 75

Slide 75 text

DiscoBERT [Xu+ ACL 2020] 75
• Extracts parts of sentences rather than whole sentences
• Represents the discourse flow explicitly as a graph

Slide 76

Slide 76 text

BART [Lewis+ ACL 2020] 76 (repeated from earlier)
• Proposes a language model that conditions on bidirectional context while still being able to generate text
• Documents can be corrupted with a variety of noising schemes
• Strong performance on tasks such as summarization

Slide 77

Slide 77 text

BERTSum [Liu+ EMNLP 2019] 77
• Extractive and abstractive summarization with BERT
• Proposes two-stage fine-tuning for abstractive summarization

Slide 78

Slide 78 text

PEGASUS [Zhang+ ICML 2020] 78
• Proposes a pre-training method for abstractive summarization
• Trains to generate masked important words and the rest of the text

Slide 79

Slide 79 text

QAGS [Wang+ ACL 2020] 79
• Evaluates summary quality by generating questions from the summary, answering them from both the source and the summary, and measuring how well the answers agree

Slide 80

Slide 80 text

Summarization by feedback [Stiennon+ NeurIPS 2020] 80
• Reinforcement learning with human feedback as the reward

Slide 81

Slide 81 text

Named entity recognition 81

Slide 82

Slide 82 text

Named Entity Recognition 82 [Li+ 2020 A Survey on Deep Learning for Named Entity Recognition]

Slide 83

Slide 83 text

LUKE [Yamada+ EMNLP 2020] 83
• Masked language modeling over both words and named entities
• Proposes attention that is aware of the token type (word or entity)

Slide 84

Slide 84 text

BERT and named entities [Balasubramanian+ RepL4NLP 2020] 84
• BERT is brittle to named-entity substitutions

Slide 85

Slide 85 text

Text classification 85

Slide 86

Slide 86 text

TopicBERT [Chaudhary+ 2020] 86
• A BERT made more efficient at document classification by combining it with topic modeling

Slide 87

Slide 87 text

3. Analysis of Transformers and rethinking evaluation 87

Slide 88

Slide 88 text

Analysis of Transformer models 88

Slide 89

Slide 89 text

The role of heads in multi-head attention 89
• The patterns different heads acquire are limited, but how much a head affects performance varies [Kovaleva+ EMNLP 2019]
• Most heads do not affect performance, and head importance is settled early in training [Michel+ NeurIPS 2019]
• Multi-head attention matters more in encoder-decoder attention than in self-attention [Michel+ NeurIPS 2019]
• Heads in the same layer show similar patterns [Clark+ BlackBoxNLP 2019]
• Some heads attend to syntax and coreference in the linguistic sense [Clark+ BlackBoxNLP 2019]
• The lottery ticket hypothesis holds [Chen+ NeurIPS 2020]

Slide 90

Slide 90 text

How representations differ across layers 90
• Shallow layers acquire generic representations, deep layers task-specific ones [Aken+ CIKM 2019], [Peters+ RepL4NLP 2019], [Hao+ EMNLP 2019]
• Shallow layers acquire representations tied to the token and its local context, which weaken through the layers [Lin+ BlackBoxNLP 2019], [Voita+ EMNLP 2019], [Ethayarajh+ EMNLP 2019], [Brunner+ ICLR 2020]
• Deep layers acquire longer-range dependencies and more semantic representations [Raganato 2018], [Vig BlackBoxNLP 2019], [Jawahar ACL 2019]

Slide 91

Slide 91 text

BERT and multilingual understanding 91
• Training on a single language can yield representations that generalize to multiple languages [Artetxe 2019]
• Multilingual BERT does / does not acquire language-universal representations [Libovicky 2019], [Singh+ ICLR 2019]
• Parse trees are embedded in multilingual BERT's representations as well [Chi ACL 2020]
• Overlap in lexical meaning is not what matters [Wang ICLR 2020]

Slide 92

Slide 92 text

Recovering linguistic structure 92
• Linguistic information is represented in separate semantic and syntactic subspaces [Coenen NeurIPS 2019]
• Parse trees are embedded in both ELMo and BERT [Hewitt NAACL-HLT 2019]
• BERT acquires hierarchically structured syntactic representations [Goldberg 2019]
• Contextual models acquire good syntactic representations, but for semantics they do not differ much from non-contextual methods [Tenney ICLR 2019]
• BERT acquires much grammatical knowledge, but with large variance [Warstadt EMNLP 2019]
• (Japanese) BERT exploits word-order information [Kuribayashi ACL 2020]

Slide 93

Slide 93 text

Weaknesses of Transformer models 93
• Not robust to adversarial attacks [Jin+ AAAI 2020]
• BERT exploits spurious correlations [Niven+ ACL 2019]
• Fine-tuning on an intermediate task first can hurt performance [Wang ACL 2020]
• Weak at negation [Ettinger ACL 2019], [Kassner ACL 2020]
• Has no common sense knowledge [Lin ACL 2020]

Slide 94

Slide 94 text

exBERT [Hoover+ ACL 2020] 94
• A tool for visualizing the representations a trained BERT has acquired

Slide 95

Slide 95 text

Rethinking evaluation methods 95

Slide 96

Slide 96 text

SWAG [Zellers+ EMNLP 2018] 96
• A benchmark for common sense inference
• Proposes Adversarial Filtering to reduce annotation biases

Slide 97

Slide 97 text

HAMLET [Nie+ ACL 2020] 97
• Proposes a process, from data collection through improvements informed by training results, for training language models that are not fooled by spurious correlations

Slide 98

Slide 98 text

CheckList [Ribeiro+ ACL 2020 (Best Paper)] 98
• Model evaluation via black-box behavioral testing
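As a flavor of what a CheckList-style behavioral test looks like, here is a toy invariance test: a perturbation that should not change the prediction (swapping a person's name) is applied, and any prediction flip counts as a failure. The model here is a made-up stand-in for any text classifier:

```python
# CheckList-style invariance test sketch (illustrative; `model` is hypothetical).
def model(text):
    """Stand-in sentiment classifier for demonstration purposes."""
    return "negative" if "terrible" in text else "positive"

def invariance_test(texts, perturb, model):
    """Count inputs whose prediction changes under a label-preserving perturbation."""
    failures = [t for t in texts if model(t) != model(perturb(t))]
    return len(failures), len(texts)

texts = ["Mark's flight was terrible.", "Mark loved the food."]
perturb = lambda t: t.replace("Mark", "Maria")  # should be sentiment-neutral
print(invariance_test(texts, perturb, model))    # (0, 2) -> no failures
```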

Slide 99

Slide 99 text

On problems with automatic evaluation metrics [Mathur+ ACL 2020] 99
• Points out that when several machine translation systems are evaluated together, an outlier system makes evaluation by automatic metrics unreliable

Slide 100

Slide 100 text

4. Summary 100

Slide 101

Slide 101 text

References and resources 101

Slide 102

Slide 102 text

NLP-progress 102
• A curated summary of benchmarks and state-of-the-art results for each NLP task [https://github.com/sebastianruder/NLP-progress]

Slide 103

Slide 103 text

A Primer in BERTology [Rogers+ 2020] 103
• A survey of studies that empirically investigate what is going on inside BERT
• Also summarizes BERT's current weaknesses [https://arxiv.org/pdf/2002.12327.pdf]

Slide 104

Slide 104 text

References 104
• [MLPapers](https://github.com/thunlp/PLMpapers)
• [Highlights of ACL 2020](https://medium.com/analytics-vidhya/highlights-of-acl-2020-4ef9f27a4f0c)
• [BERT-related Papers](https://github.com/tomohideshibata/BERT-related-papers)
• [ML and NLP Research Highlights of 2020](https://ruder.io/research-highlights-2020/)
• [Tracing the history of text summarization (and having BERT summarize documents)](https://qiita.com/siida36/items/4c0dbaa07c456a9fadd0)
• [Trends in pre-trained language models](https://speakerdeck.com/kyoun/survey-of-pretrained-language-models)
• [[NLP] A roundup of BERT variants born in 2020](https://kai760.medium.com/nlp-2020%E5%B9%B4%E3%81%AB%E7%94%9F%E3%81%BE%E3%82%8C%E3%81%9Fbert%E3%81%AE%E6%B4%BE%E7%94%9F%E5%BD%A2%E3%81%BE%E3%81%A8%E3%82%81-36f2f455919d)
• [The GPT-3 shock](https://deeplearning.hatenablog.com/entry/gpt3)
• [Rogers+ 2020 A Primer in BERTology: What we know about how BERT works](https://arxiv.org/pdf/2002.12327.pdf)
• [Tay+ 2020 Efficient Transformers: A Survey](https://arxiv.org/pdf/2009.06732.pdf)
• [Qiu+ 2020 Pre-trained Models for Natural Language Processing: A Survey](https://arxiv.org/pdf/2003.08271.pdf)
• [Liu+ 2020 A Survey on Contextual Embeddings](https://arxiv.org/pdf/2003.07278.pdf)
• [Xia+ EMNLP 2020 Which *BERT? A Survey Organizing Contextualized Encoders](https://arxiv.org/pdf/2010.00854.pdf)
• [Li+ IEEE Transactions on Knowledge and Data Engineering 2018 A Survey on Deep Learning for Named Entity Recognition](https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=9039685)

Slide 105

Slide 105 text

Extras 105

Slide 106

Slide 106 text

GPT-3 [Brown+ 2020] 106 [https://deeplearning.hatenablog.com/entry/gpt3]

Slide 107

Slide 107 text

GPT-3 [Brown+ 2020] 107 [https://twitter.com/sharifshameem/status/1283322990625607681?ref_src=twsrc%5Etfw%7Ctwcamp%5Etweete mbed%7Ctwterm%5E1283322990625607681%7Ctwgr%5E%7Ctwcon%5Es1_&ref_url=https%3A%2F%2Fdeeple arning.hatenablog.com%2Fentry%2Fgpt3]

Slide 108

Slide 108 text

GPT-3 [Brown+ 2020] 108 [https://twitter.com/sh_reya/status/1284746918959239168?ref_src=twsrc%5Etfw%7Ctwcamp%5Etweetembed%7 Ctwterm%5E1284746918959239168%7Ctwgr%5E%7Ctwcon%5Es1_&ref_url=https%3A%2F%2Fdeeplearning.ha tenablog.com%2Fentry%2Fgpt3]

Slide 109

Slide 109 text

DALL•E [OpenAI 2021] 109
• Reportedly a model that generates synthetic images following natural-language instructions
• What looks impressive is that it seems to have tamed the compositionality of language [https://openai.com/blog/dall-e/]

Slide 110

Slide 110 text

DALL•E [OpenAI 2021] 110 [https://openai.com/blog/dall-e/]

Slide 111

Slide 111 text

DALL•E [OpenAI 2021] 111 [https://openai.com/blog/dall-e/]

Slide 112

Slide 112 text

112