The Natural Language Decathlon: Multitask Learning as Question Answering

스캐터랩(ScatterLab): everyday-conversation AI
ScatterLab ML Technical Seminar, Session 2 (QA)
The Natural Language Decathlon: Multitask Learning as Question Answering (McCann et al., Salesforce Research)
Presenter: 백영민, Machine Learning Engineer

#1. Concept

!3 Multitask Learning: Let's train on several tasks together!

!4 #1 Concept Transfer Learning
• Method: take a model/representation trained with a particular objective and use it on a different downstream task
  • Typically uses an objective, such as Language Modeling, that can capture the general properties of natural language
• Benefits:
  • Gives better results than starting from random initialization, and enables faster convergence
• Approaches:
  • Representation: use fixed representations such as Word2Vec and GloVe, or intermediate (context-aware) representations such as ULMFiT and ELMo, in the downstream task (a separate model exists for the task)
  • Model: take the model used in pre-training, such as BERT or GPT, and "fine-tune" it on the downstream task

!5 #1 Concept Multitask Learning
• Method: train a single model on several tasks at the same time
  • NLP tasks such as chunking, POS tagging, NER, SRL, dependency parsing, and NLI are learned simultaneously by the same model
• Benefits:
  • Several tasks can be modeled at once
  • If training goes well, it yields a "General Representation" that is not limited to one task
  • Potential application to Zero-shot Learning, Meta-Learning, and similar settings
• An area of active recent research
  • It does not yet match single-task learning in performance, but a lot of challenging work is under way
  • Image classification + NLP
• The paper presented here, "MQAN", reaches performance that is at least somewhat comparable to single-task learning (the paper predates BERT)

#2. Method

!7 Approach: Let's cast every task in QA format!

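!8 #2 Approach Question Answering
• Solves the problem of finding the answer to a given question "within the context" (assumes the answer is contained in the context)
• Ex) SQuAD, RACE…
• Usually done by finding the start index and end index of the answer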
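The start/end-index formulation above can be illustrated with a small sketch. It assumes a model has already produced one score per context token for "answer starts here" and "answer ends here"; the function name `best_span`, the `max_len` cap, and the toy scores are all hypothetical and only show how a span is picked from such scores.

```python
from typing import List, Tuple


def best_span(start_scores: List[float],
              end_scores: List[float],
              max_len: int = 30) -> Tuple[int, int]:
    """Return the (start, end) pair with the highest combined score.

    Only spans with start <= end and at most max_len tokens are considered.
    """
    best = (0, 0)
    best_score = float("-inf")
    for s, s_score in enumerate(start_scores):
        # The end index may not precede the start index.
        for e in range(s, min(len(end_scores), s + max_len)):
            score = s_score + end_scores[e]
            if score > best_score:
                best_score = score
                best = (s, e)
    return best


# Toy example with made-up scores (not model output).
context = "The decathlon has ten tasks".split()
start = [0.1, 0.2, 0.1, 2.0, 0.3]
end = [0.0, 0.1, 0.2, 0.5, 1.9]
s, e = best_span(start, end)
answer = " ".join(context[s:e + 1])
print(answer)
```

Because the answer is read directly out of the context, this formulation cannot produce text that does not appear there, which is why tasks such as translation need a different output mechanism.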

!9 #2 Approach Question Answering
Idea: turn every task into "QA format" and do "Multitask Learning"!
QA / Translation / Summary / NLI / Sentiment Analysis
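As a rough sketch of what "QA format" means in practice, each example from any task can be stored as a (question, context, answer) triple, with the task itself described by the question. The class `QAExample`, the helper names, and the exact question wordings below are illustrative assumptions, not the dataset's official strings.

```python
from dataclasses import dataclass


@dataclass
class QAExample:
    question: str  # describes the task to perform
    context: str   # the text the task operates on
    answer: str    # the target output, always plain text


def sentiment_to_qa(review: str, label: str) -> QAExample:
    # Question wording is illustrative only.
    return QAExample("Is this review negative or positive?", review, label)


def translation_to_qa(source: str, target: str) -> QAExample:
    return QAExample("What is the translation from English to German?",
                     source, target)


def nli_to_qa(premise: str, hypothesis: str, label: str) -> QAExample:
    question = (f"Hypothesis: {hypothesis} "
                "-- entailment, neutral, or contradiction?")
    return QAExample(question, premise, label)


def summarization_to_qa(document: str, summary: str) -> QAExample:
    return QAExample("What is the summary?", document, summary)


examples = [
    sentiment_to_qa("A great movie.", "positive"),
    nli_to_qa("A man is sleeping.", "A man is awake.", "contradiction"),
]
for ex in examples:
    print(ex.question, "|", ex.context, "|", ex.answer)
```

With every task reduced to the same three fields, one model with one input/output interface can be trained on all of them.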

!10 #2 Approach Decathlon
• Multitask learning over 10 tasks in total

!11 Model Architecture Multitask Question Answering Network(MQAN)

!12 #2 Architecture Overview

!13 #2 Architecture I/O
• Input:
  • Q: Question Sentences
  • C: Context Sentences
  • A: Answer Sentences (for generation - autoregressive)
• Output:
  • General QA: finds the start and end index of the answer in the Context
  • Chosen from among Q (Question Sentences) + C (Context Sentences) + Outer Vocabulary (Generation)
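One way to picture "choosing among Q, C, and the outer vocabulary" is as a mixture of three distributions: attention over question tokens, attention over context tokens, and a softmax over a fixed vocabulary. The sketch below is a minimal illustration under that assumption; the switch weights `gamma` and `lam`, the function name, and the toy numbers are hypothetical and are not taken from the slides.

```python
from collections import defaultdict
from typing import Dict, List


def mix_distributions(p_vocab: Dict[str, float],
                      context_tokens: List[str],
                      context_attn: List[float],
                      question_tokens: List[str],
                      question_attn: List[float],
                      gamma: float,
                      lam: float) -> Dict[str, float]:
    """Combine three distributions into one distribution over words.

    gamma: weight on the outer vocabulary (assumed to be in [0, 1]).
    lam:   of the remaining mass, the share given to the context.
    """
    out: Dict[str, float] = defaultdict(float)
    for word, p in p_vocab.items():
        out[word] += gamma * p
    # Copying from the context: attention mass is added to each token's word.
    for tok, a in zip(context_tokens, context_attn):
        out[tok] += (1.0 - gamma) * lam * a
    # Copying from the question, e.g. class labels named in the question.
    for tok, a in zip(question_tokens, question_attn):
        out[tok] += (1.0 - gamma) * (1.0 - lam) * a
    return dict(out)


# Toy numbers for a sentiment example; all values are made up.
dist = mix_distributions(
    p_vocab={"positive": 0.5, "negative": 0.5},
    context_tokens=["great", "movie"],
    context_attn=[0.9, 0.1],
    question_tokens=["negative", "or", "positive"],
    question_attn=[0.2, 0.1, 0.7],
    gamma=0.2,
    lam=0.25,
)
prediction = max(dist, key=dist.get)
print(prediction)
```

In this toy case most of the probability for "positive" comes from the question pointer, which is how a classification label can be produced without a task-specific output layer.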

!14 #2 Architecture Feature - Input Representation

!15 #2 Architecture Feature - Alignment <why dummy data is inserted>

!16 #2 Architecture Feature - Dual Coattention

!17 #2 Architecture Feature - Compression & Self-Attention

!18 #2 Architecture Feature - Answer Representation

!19 #2 Architecture Feature - Answer Representation

!20 #2 Architecture Feature - Answer

!21 Training Strategy Curriculum learning

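!22 #3 Training Strategy Multitask Learning Strategy
• Round-robin Algorithm
  • A CPU scheduling method that gives processes a "fair" chance to use computer resources
  • Each process is allotted a fixed time slice; when the slice expires, that process is suspended and another process gets a turn
  • Likewise, each task is allotted a fixed number of batches; once those batches are used up, that task is suspended and another task gets a turn
• Fully Joint
  • Fix the order of the tasks and sample batches with the round-robin algorithm
  • Tasks that converged within few iterations in single-task training do well, but the rest fall short of the performance they reach when trained alone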
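The fully joint, round-robin sampling described above can be sketched as a generator that visits tasks in a fixed order and takes one batch per turn. The function name, the one-batch-per-turn choice, and the placeholder batch labels are assumptions for illustration; each task is assumed to have at least one batch.

```python
from itertools import cycle, islice
from typing import Dict, Iterator, List, Tuple


def round_robin_batches(task_batches: Dict[str, List[str]],
                        task_order: List[str]) -> Iterator[Tuple[str, str]]:
    """Yield (task, batch) pairs forever, one batch per task per turn.

    Tasks are visited in the fixed order given by task_order; a task's
    batches start over once they have all been used.
    """
    iterators = {task: cycle(task_batches[task]) for task in task_order}
    for task in cycle(task_order):
        yield task, next(iterators[task])


# Placeholder batches; real batches would be tensors of examples.
batches = {
    "SQuAD": ["s0", "s1"],
    "SST": ["t0"],
    "MNLI": ["m0", "m1", "m2"],
}
schedule = list(islice(round_robin_batches(batches, ["SQuAD", "SST", "MNLI"]), 6))
for task, batch in schedule:
    print(task, batch)
```

Every task gets the same number of updates regardless of dataset size, so small datasets are revisited far more often than large ones.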

!23 #3 Training Strategy Multitask Learning Strategy
• Curriculum Strategy
  • Let's build a curriculum and train with it!
  • First Phase: train on the comparatively easy tasks first -> Second Phase: train on the hard tasks
  • First Phase (SST, QA-SRL, QA-ZRE, WOZ, WikiSQL, MWSC) -> Second Phase (Others)

!24 #3 Training Strategy Multitask Learning Strategy
• Anti-Curriculum Strategy
  • A strategy that goes "against (Anti)" the curriculum - Curriculum Learning, Bengio et al. [2009]
  • Premise: easy tasks cannot learn useful representations that help the other tasks!
  • First Phase: train on the hard tasks first -> Second Phase: train on the easy tasks
  • First Phase (SQuAD, IWSLT, CNN/DM, MNLI) -> Second Phase (Others)
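Both the curriculum and the anti-curriculum strategy amount to a two-phase task schedule, differing only in which tasks come first. The sketch below assumes round-robin ordering inside each phase and a fixed step count for the first phase; the function name and the step counts are hypothetical, and whether the second phase contains only the remaining tasks or all tasks together is left to the caller.

```python
from itertools import cycle
from typing import List


def two_phase_schedule(first_phase: List[str],
                       second_phase: List[str],
                       phase_one_steps: int,
                       total_steps: int) -> List[str]:
    """Return the task to train on at each step.

    Steps before phase_one_steps cycle over first_phase; later steps
    cycle over second_phase.
    """
    first = cycle(first_phase)
    second = cycle(second_phase)
    schedule = []
    for step in range(total_steps):
        schedule.append(next(first) if step < phase_one_steps else next(second))
    return schedule


hard = ["SQuAD", "IWSLT", "CNN/DM", "MNLI"]
easy = ["SST", "QA-SRL", "QA-ZRE", "WOZ", "WikiSQL", "MWSC"]

# Anti-curriculum: hard tasks first. Step counts are made up.
anti_curriculum = two_phase_schedule(hard, easy, phase_one_steps=4, total_steps=7)
# Curriculum: easy tasks first.
curriculum = two_phase_schedule(easy, hard, phase_one_steps=6, total_steps=8)
print(anti_curriculum)
print(curriculum)
```

Swapping the two task lists is the only change needed to move between the two strategies, which makes them easy to compare under an otherwise identical training loop.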

#3. Result

!26 Result: Who did best?

!27 #1 single vs multitask Single vs Multitask Training

!28 #2 curriculum Training Strategy

!29 #2 curriculum Pointer weight distribution

!30 Q & A Thank you
