
Unified Language Model Pre-training for Natural Language Understanding and Generation

Scatter Lab Inc.

April 10, 2020

Transcript

  1. Unified Language Model Pre-training for Natural Language Understanding and Generation
     Li Dong et al., NeurIPS 2019 (Microsoft) / ࢲ࢚਋ (ML Research Scientist, Pingpong)
  2. Contents
     1. Overview of pre-training language models
     2. Unified Language Model: 1. Method, 2. Pre-training step, 3. Fine-tuning step
     3. Experiments: 1. NLG tasks, 2. NLU tasks
  3. Overview of Pre-training Language Models
     • BERT, GPT, and ELMo each achieve strong results in their own way, but each has limitations.
     • (e.g. BERT owes its high performance to its bidirectional nature, but that same property makes it unusable for NLG tasks.)
  4. Overview of Pre-training Language Models
     • Each LM objective serves a different purpose.
     • Bidirectional => NLU
     • Unidirectional => NLG
     • Seq-to-seq => summarization, generative question answering
  5. Unified Language Model
     • Unified pre-training shares parameters across the different LM types, so a single transformer suffices and separate LMs do not have to be trained.
     • Parameter sharing makes the learned text representations more general (the objectives are optimized jointly, so the model overfits less to any single LM).
     • Usable for both NLU and NLG.
  6. Unified Language Model
     • UNILM unifies the existing LMs.
     • Since each LM is suited to its own tasks, they are trained jointly via multi-task learning.
  7. Unified Language Model
     • To train the different LMs, parameters are shared but different self-attention masks are used (see the mask sketch below).
     • A special form of masking is what allows seq-to-seq to be implemented inside a single transformer.
     • During actual training, random tokens are replaced with [MASK] and each LM is trained to predict them.
     • For the bidirectional LM, NSP is also performed, as in BERT.
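     A minimal sketch (not the authors' code) of the per-objective self-attention masks: one boolean matrix per LM type decides which positions may attend to which, while the transformer weights stay shared. The function names and the src_len/tgt_len example sizes are illustrative.

       import numpy as np

       def bidirectional_mask(seq_len):
           # Every token attends to every token (BERT-style, used for NLU).
           return np.ones((seq_len, seq_len), dtype=np.int8)

       def left_to_right_mask(seq_len):
           # Each token attends only to itself and the tokens to its left (GPT-style NLG).
           return np.tril(np.ones((seq_len, seq_len), dtype=np.int8))

       def seq_to_seq_mask(src_len, tgt_len):
           # Source tokens attend bidirectionally within the source segment;
           # target tokens attend to the whole source plus their own left context.
           total = src_len + tgt_len
           mask = np.zeros((total, total), dtype=np.int8)
           mask[:src_len, :src_len] = 1                            # source <-> source
           mask[src_len:, :src_len] = 1                            # target -> source
           mask[src_len:, src_len:] = left_to_right_mask(tgt_len)  # target -> target prefix
           return mask

       print(seq_to_seq_mask(src_len=3, tgt_len=2))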
  8. Unified Language Model
     • [SOS] is the special start-of-sequence token.
     • [EOS] marks sentence boundaries for NLU tasks and is the special end-of-sequence token.
     • Embeddings follow BERT, and text is tokenized with WordPiece.
     • A different segment embedding is used for each LM task (see the packing sketch below).
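     A rough sketch of the input packing described above, assuming whitespace splitting as a stand-in for WordPiece; pack_segments and the 0/1 segment ids are illustrative, not the paper's exact values.

       def pack_segments(first, second=""):
           # Layout: [SOS] first-segment tokens [EOS] second-segment tokens [EOS]
           tokens = ["[SOS]"] + first.split() + ["[EOS]"]
           segment_ids = [0] * len(tokens)
           if second:
               extra = second.split() + ["[EOS]"]
               tokens += extra
               segment_ids += [1] * len(extra)
           position_ids = list(range(len(tokens)))
           return tokens, segment_ids, position_ids

       tokens, segments, positions = pack_segments("how are you", "i am fine")
       # tokens:   [SOS] how are you [EOS] i am fine [EOS]
       # segments: 0     0   0   0   0     1 1  1    1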
  9. Pre-training Setup (Unified Language Model)
     • The overall training objective is the sum of the individual LM objectives.
     • Within a batch, the bidirectional LM objective is sampled 1/3 of the time, the sequence-to-sequence LM objective 1/3 of the time, and the left-to-right and right-to-left LM objectives 1/6 of the time each (see the sampling sketch below).
     • All parameters are initialized from BERT_large.
     • Pre-training uses English Wikipedia and BookCorpus.
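     A sketch, under the assumption that the mixing ratios are realized by drawing one objective per batch, of how the 1/3, 1/3, 1/6, 1/6 split could be sampled; the names are illustrative.

       import random

       OBJECTIVES = ["bidirectional", "seq2seq", "left_to_right", "right_to_left"]
       WEIGHTS    = [1/3,             1/3,       1/6,             1/6]

       def sample_objective(rng):
           # Draw which LM objective the next batch is trained on.
           return rng.choices(OBJECTIVES, weights=WEIGHTS, k=1)[0]

       rng = random.Random(0)
       counts = {name: 0 for name in OBJECTIVES}
       for _ in range(12000):
           counts[sample_objective(rng)] += 1
       print(counts)  # roughly 4000 / 4000 / 2000 / 2000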
  10. Pre-training Setup (Unified Language Model)
     • Vocabulary size is 28,996, maximum input sequence length is 512, batch size is 330.
     • 15% of tokens are selected and handled in one of three ways:
       • 80% of the time: replace the token with [MASK]
       • 10% of the time: replace the token with a random word
       • 10% of the time: keep the original token
     • The masking procedure is almost identical to BERT's, with one addition: 80% of the time a single token is masked, and 20% of the time a bigram or trigram is masked (see the sketch below).
     • Training ran for 770,000 steps; roughly 10,000 steps take about 7 hours on 8 V100 GPUs.
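     A simplified sketch of the masking scheme in this slide (special tokens are ignored, and because whole spans are masked the effective rate is only approximately 15%); mask_tokens and its arguments are illustrative.

       import random

       def mask_tokens(tokens, vocab, rng, mask_rate=0.15):
           tokens = list(tokens)
           labels = [None] * len(tokens)  # original token at each masked position
           i = 0
           while i < len(tokens):
               if rng.random() < mask_rate:
                   # 80%: mask a single token, 20%: mask a bigram or trigram.
                   span = 1 if rng.random() < 0.8 else rng.choice([2, 3])
                   for j in range(i, min(i + span, len(tokens))):
                       labels[j] = tokens[j]
                       r = rng.random()
                       if r < 0.8:
                           tokens[j] = "[MASK]"           # 80%: replace with [MASK]
                       elif r < 0.9:
                           tokens[j] = rng.choice(vocab)  # 10%: random word
                       # else 10%: keep the original token
                   i += span
               else:
                   i += 1
           return tokens, labels

       rng = random.Random(0)
       print(mask_tokens("the cat sat on the mat".split(), ["dog", "ran"], rng))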
  11. Fine-tuning on Downstream NLU and NLG Tasks (Unified Language Model)
     • For NLU fine-tuning, the [SOS] token is used as the sequence representation (equivalent to BERT's [CLS]).
     • For NLG fine-tuning, tokens in the target sequence are masked and the model is trained to predict them (see the sketch below).
     • Because the [EOS] token can also be masked in this process, the model also learns when to emit [EOS].
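     A sketch of the NLG fine-tuning example construction described above: only target-side tokens, including the trailing [EOS], are eligible for masking. The 0.7 masking probability and the function name are illustrative assumptions, not values stated on the slide.

       import random

       def build_nlg_finetune_example(src_tokens, tgt_tokens, rng, mask_prob=0.7):
           tokens = ["[SOS]"] + src_tokens + ["[EOS]"] + tgt_tokens + ["[EOS]"]
           tgt_start = len(src_tokens) + 2           # index of the first target token
           labels = [None] * len(tokens)
           for i in range(tgt_start, len(tokens)):   # the trailing [EOS] can be masked too
               if rng.random() < mask_prob:
                   labels[i] = tokens[i]
                   tokens[i] = "[MASK]"
           return tokens, labels

       rng = random.Random(0)
       print(build_nlg_finetune_example("a news article".split(), "a summary".split(), rng))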
  12. Experiments: Abstractive Summarization
     • CNN/DailyMail => the task of summarizing news articles.
     • RG-N is the N-gram F1-score (ROUGE-N).
     • Fine-tuned as seq-to-seq (mask the target, then predict the masked tokens).
     • Decoding uses beam search, with duplicated trigrams removed during the search (see the sketch below).
     • With only 10K training samples, the gap over MASS grows even larger.
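     A sketch of the duplicated-trigram rule applied during beam-search decoding: a candidate continuation is rejected if appending it would repeat a trigram already present in the hypothesis. The surrounding beam-search loop is omitted and the function name is illustrative.

       def creates_duplicate_trigram(hypothesis, candidate):
           seq = hypothesis + [candidate]
           if len(seq) < 3:
               return False
           new_trigram = tuple(seq[-3:])
           earlier = {tuple(seq[i:i + 3]) for i in range(len(seq) - 3)}
           return new_trigram in earlier

       hyp = ["the", "cat", "sat", "on", "the", "cat"]
       print(creates_duplicate_trigram(hyp, "sat"))  # True: "the cat sat" would repeat
       print(creates_duplicate_trigram(hyp, "ran"))  # False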
  13. Experiments: QA
     • The first two are span-prediction tasks, handled the same way as in BERT.
     • The third is free-form: the answer is generated via seq-to-seq.
     • The input is built by concatenating the dialog history, question, and passage into the first segment; the answer is predicted through the second segment.
  14. Experiments: Question/Response Generation
     • Question generation: given an answer and a passage from the SQuAD dataset, generate the question.
     • The second experiment reports performance on the DSTC7 dataset.