Unified Language Model Pre-training for Natural Language Understanding and Generation
Scatter Lab Inc.
April 10, 2020
Transcript
Unified Language Model Pre-training for Natural Language Understanding and Generation
Li Dong et al., NeurIPS 2019 (Microsoft)
Presenter: ML Research Scientist, Pingpong
Contents
1. Overview of Pre-trained Language Models
2. Unified Language Model
   1. Method
   2. Pre-training step
   3. Fine-tuning step
3. Experiments
   1. NLG Task
   2. NLU Task
Overview of Pre-trained Language Models
• BERT, GPT, and ELMo each achieve strong results in their own way, but each also has drawbacks.
• (e.g. BERT's bidirectionality gives it high performance, but it cannot be used for NLG tasks.)
• Each LM objective serves a different purpose.
• Bidirectional => NLU
• Unidirectional => NLG
• Seq-to-Seq => summarization, generative question answering
Unified Language Model
• Unified pre-training shares parameters across several types of LM, so it needs only a single Transformer and the LMs do not have to be trained separately.
• Parameter sharing lets the model learn more general text representations. (Because the objectives are optimized at the same time, it overfits less than a single LM.)
• The same model can be used for both NLU and NLG.
• UNILM unifies the existing LMs.
• Each LM has its own task, so they are trained simultaneously through multi-task learning.
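• To train the different LMs, the parameters are shared but a different mask is used for each.
• To implement seq-to-seq inside a single Transformer, a specific form of masking is used.
• In actual training, tokens are replaced with [MASK] and the task of predicting them is run for each LM.
• For the bidirectional LM, NSP (next sentence prediction) is also performed.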
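The per-objective masking described above can be illustrated with a small sketch. It assumes the mask is an additive matrix with 0 where attention is allowed and -inf where it is blocked; the function name `build_mask` and the mode strings are hypothetical and do not come from the slides.

```python
NEG_INF = float("-inf")


def build_mask(mode, src_len, tgt_len=0):
    """Return an n x n matrix M (n = src_len + tgt_len).

    M[i][j] == 0.0 lets position i attend to position j;
    M[i][j] == -inf blocks it.
    """
    n = src_len + tgt_len
    mask = [[NEG_INF] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if mode == "bidirectional":
                allowed = True            # every token sees every token
            elif mode == "left_to_right":
                allowed = j <= i          # only itself and the left context
            elif mode == "right_to_left":
                allowed = j >= i          # only itself and the right context
            elif mode == "seq2seq":
                # Source tokens see the whole source; target tokens see the
                # whole source plus target tokens up to and including themselves.
                allowed = j < src_len or (i >= src_len and j <= i)
            else:
                raise ValueError(f"unknown mode: {mode}")
            if allowed:
                mask[i][j] = 0.0
    return mask
```

For example, `build_mask("seq2seq", 2, 2)` gives a 4 x 4 matrix in which the two source rows are blocked from the target columns, while each target row can see the source and its own left context.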
• [SOS] is the special start-of-sequence token.
• [EOS] is the special end-of-sequence token; in NLU tasks it marks sentence boundaries.
• Embeddings follow BERT, and text is tokenized with WordPiece.
• A different segment embedding is used for each LM task.
• Expressed as an equation, the value of the mask matrix M differs for each objective.
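For reference, the masked self-attention that this slide refers to can be written as follows. The notation follows the UniLM paper rather than the slide text, so every symbol other than M is an assumption here: H^{l-1} is the output of the previous layer, the W_l matrices are the layer's projections, and d_k is the key dimension.

```latex
\[
\begin{aligned}
Q &= H^{l-1} W_l^{Q}, \qquad K = H^{l-1} W_l^{K}, \qquad V = H^{l-1} W_l^{V} \\
M_{ij} &=
  \begin{cases}
    0, & \text{position } i \text{ is allowed to attend to position } j \\
    -\infty, & \text{otherwise}
  \end{cases} \\
A_l &= \operatorname{softmax}\!\left( \frac{Q K^{\top}}{\sqrt{d_k}} + M \right) V
\end{aligned}
\]
```

Only M changes between the bidirectional, left-to-right, right-to-left, and seq-to-seq objectives; the projections are shared.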
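Pre-training Setup
• The training objective is the sum of the individual LM objectives.
• Within one batch, the bidirectional LM objective is sampled 1/3 of the time, the sequence-to-sequence LM objective 1/3 of the time, and the left-to-right and right-to-left LM objectives 1/6 of the time each.
• Parameters are initialized from BERT_large.
• Pre-training uses English Wikipedia and BookCorpus.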
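The mixing ratios above can be sketched as a weighted draw. This is only an illustration: the slide does not say at what granularity the objective is chosen, so drawing one objective per training example is an assumption, and `OBJECTIVE_WEIGHTS` and `sample_objective` are hypothetical names.

```python
import random

# Mixing ratios as stated on the slide.
OBJECTIVE_WEIGHTS = {
    "bidirectional": 1 / 3,
    "seq2seq": 1 / 3,
    "left_to_right": 1 / 6,
    "right_to_left": 1 / 6,
}


def sample_objective(rng):
    """Draw one LM objective name according to OBJECTIVE_WEIGHTS."""
    names = list(OBJECTIVE_WEIGHTS)
    weights = [OBJECTIVE_WEIGHTS[name] for name in names]
    return rng.choices(names, weights=weights, k=1)[0]
```

Calling `sample_objective(random.Random(0))` repeatedly yields the four objective names in roughly a 2:2:1:1 proportion; the chosen name then decides which attention mask is applied to that example.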
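Pre-training Setup (continued)
• Vocabulary size is 28,996, the maximum input sequence length is 512, and the batch size is 330.
• 15% of tokens are replaced according to one of three cases:
  • 80% of the time: the token is replaced with [MASK]
  • 10% of the time: the token is replaced with a random word
  • 10% of the time: the token is left as the original word
• The masking procedure is almost the same as BERT's, with one addition: 80% of the time a single token is masked each time, and 20% of the time a bigram or trigram is masked.
• Training ran for 770,000 steps; 10,000 steps take about 7 hours on 8 V100 GPUs.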
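The corruption procedure above can be sketched as two small functions. This is a simplified illustration under stated assumptions, not the authors' implementation: positions are drawn uniformly, the budget is rounded to the nearest token, and the names `choose_positions` and `corrupt` are hypothetical.

```python
import random

MASK = "[MASK]"


def choose_positions(n_tokens, rng, mask_prob=0.15):
    """Pick positions to corrupt until about mask_prob of the tokens are covered.

    80% of the picks cover a single token; 20% cover a bigram or trigram.
    """
    if n_tokens == 0:
        return []
    budget = max(1, round(n_tokens * mask_prob))
    chosen = set()
    attempts = 0
    while len(chosen) < budget and attempts < 10 * n_tokens:
        attempts += 1
        span = 1 if rng.random() < 0.8 else rng.choice([2, 3])
        span = min(span, budget - len(chosen))   # never exceed the budget
        start = rng.randrange(0, n_tokens - span + 1)
        chosen.update(range(start, start + span))
    return sorted(chosen)


def corrupt(tokens, vocab, rng):
    """Return (corrupted tokens, {position: original token to predict})."""
    out = list(tokens)
    targets = {}
    for pos in choose_positions(len(tokens), rng):
        targets[pos] = tokens[pos]
        r = rng.random()
        if r < 0.8:
            out[pos] = MASK                # 80%: replace with [MASK]
        elif r < 0.9:
            out[pos] = rng.choice(vocab)   # 10%: replace with a random word
        # remaining 10%: keep the original token
    return out, targets
```

The model is then trained to predict the entries of `targets` from the corrupted sequence, under whichever attention mask the sampled objective prescribes.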
Fine-tuning on Downstream NLU and NLG Tasks
• For NLU fine-tuning, the [SOS] token is used as the sequence representation (the same role as [CLS] in BERT).
• For NLG fine-tuning, tokens in the target sequence are masked and the model is trained to predict them.
• Because [EOS] can also be masked in this process, the model is said to learn when it should predict [EOS].
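The NLG fine-tuning setup above can be sketched as follows. The packing order "[SOS] source [EOS] target [EOS]", the default masking rate of 0.5, and the name `build_nlg_example` are assumptions made for illustration; the slides only state that target-side tokens, including [EOS], may be masked.

```python
import random

SOS, EOS, MASK = "[SOS]", "[EOS]", "[MASK]"


def build_nlg_example(source, target, rng, mask_prob=0.5):
    """Pack source and target into one sequence and mask target-side tokens only.

    The target's closing [EOS] can be masked too, so the model also has to
    predict where the output ends.
    """
    first = [SOS] + list(source) + [EOS]
    second = list(target) + [EOS]
    tokens = first + second
    labels = {}
    for pos in range(len(first), len(tokens)):
        if rng.random() < mask_prob:
            labels[pos] = tokens[pos]   # remember what to predict
            tokens[pos] = MASK
    return tokens, labels
```

The source segment is never corrupted here, which matches the idea that the model conditions on the full input and only learns to recover the output side.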
Experiments
Experiments: Abstractive Summarization
• CNN/DailyMail => a task of reading a news article and summarizing it
• RG-N is the N-gram F1-score.
• Fine-tuned as seq-to-seq (masking, then predicting the masked tokens)
• Decoding uses beam search (duplicated trigrams are removed during beam search).
• With only 10K training samples, the gap over MASS becomes somewhat larger.
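The metric and the decoding constraint above can be illustrated with a simplified sketch. It assumes pre-tokenized input and omits the stemming and other preprocessing that standard ROUGE tooling applies; `rouge_n_f1` and `repeats_trigram` are hypothetical names, and how the check is wired into beam search is left open.

```python
from collections import Counter


def ngrams(tokens, n):
    """All contiguous n-grams of a token list, as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def rouge_n_f1(candidate, reference, n):
    """F1 over n-gram overlap between a candidate and a reference summary."""
    cand = Counter(ngrams(candidate, n))
    ref = Counter(ngrams(reference, n))
    overlap = sum((cand & ref).values())   # clipped n-gram matches
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


def repeats_trigram(prefix, next_token):
    """True if appending next_token would repeat a trigram already in prefix."""
    if len(prefix) < 2:
        return False
    new_trigram = (prefix[-2], prefix[-1], next_token)
    return new_trigram in set(ngrams(prefix, 3))
```

During decoding, a hypothesis extension for which `repeats_trigram` returns True would be dropped or penalized, which is one way to realize the duplicated-trigram removal mentioned on the slide.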
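Experiments: QA
• The first two are span prediction, done in the same way as the original BERT.
• The third is free-form, so the answer is generated through seq-to-seq.
• The input is built by concatenating the conversation history, the question, and the passage into the first segment; the answer is predicted in the second segment.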
Experiments: Question/Response Generation
• Question generation: given an answer and a passage from the SQuAD dataset, generate the question.
• The second result is performance on the DSTC7 dataset.
Experiments: GLUE
• Outperforms BERT_large on GLUE.
Thank you! If you have additional questions or anything you are curious about, feel free to get in touch at any time.
(ML Research Scientist, Pingpong)