
Deep Contextualized Word Embeddings - ELMo: Embeddings from Language Models

Mayank Mishra

March 16, 2022

Transcript

  1. Talk about
     • What to expect from a language model?
     • Previous language models
     • Deep contextualized word embeddings
     • Summary
  2. What to expect from a language model? It should capture:
     • Complex characteristics of word use (syntax and semantics)
     • How usage varies across linguistic contexts, i.e., it should model polysemy
  3. Deep Contextualized Word Embeddings (cont.)
     • Forward layer
     • For a sequence of tokens, model the probability of the sentence by computing the probability of each token t_k given its history (t_1, t_2, ..., t_{k-1})
     • Cost function: see the formula below
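
The equation itself is an image in the original deck; the forward factorization being described is conventionally written (as in the ELMo paper) as

```latex
p(t_1, t_2, \ldots, t_N) = \prod_{k=1}^{N} p\left(t_k \mid t_1, t_2, \ldots, t_{k-1}\right)
```

and the forward layer is trained to maximize the log of this probability.
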
  4. Deep Contextualized Word Embeddings (cont.)
     • Backward layer
     • For a sequence of tokens, model the probability of the sentence by computing the probability of each token t_k given the future context (t_{k+1}, t_{k+2}, ..., t_N)
     • Cost function: see the formula below
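
The corresponding backward factorization (reconstructed here, since the slide's equation image is not in the transcript) is

```latex
p(t_1, t_2, \ldots, t_N) = \prod_{k=1}^{N} p\left(t_k \mid t_{k+1}, t_{k+2}, \ldots, t_N\right)
```
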
  5. Deep Contextualized Word Embeddings (cont.)
     Training:
     • The forward and backward layers are trained together
     • Optimizer: gradient descent
     • Loss function: cross entropy
     • The log likelihoods of the forward and backward directions are jointly maximized (written out below)
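
Written out as in the ELMo paper, the jointly maximized objective is

```latex
\sum_{k=1}^{N} \Big(
    \log p\left(t_k \mid t_1, \ldots, t_{k-1};\; \Theta_x, \overrightarrow{\Theta}_{LSTM}, \Theta_s\right)
  + \log p\left(t_k \mid t_{k+1}, \ldots, t_N;\; \Theta_x, \overleftarrow{\Theta}_{LSTM}, \Theta_s\right)
\Big)
```

where the token-representation parameters \Theta_x and softmax parameters \Theta_s are shared between the two directions, while the forward and backward LSTM parameters are kept separate.
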
  6. Deep Contextualized Word Embeddings (cont.)
     Input to the model:
     • Each word token needs some representation, i.e., some sort of embedding
     • A context-independent embedding is generated by a character n-gram CNN with 2048 channels (sketched below)
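
A minimal PyTorch sketch of such a character n-gram CNN encoder; the class name, character-vocabulary size, and filter configuration are illustrative assumptions (the filter widths shown sum to 2048 channels), and the highway layers and 512-dimensional projection used in ELMo are omitted.

```python
import torch
import torch.nn as nn


class CharCNNTokenEncoder(nn.Module):
    """Context-independent token embeddings from character n-gram convolutions."""

    def __init__(self, n_chars=262, char_dim=16,
                 filters=((1, 32), (2, 32), (3, 64), (4, 128),
                          (5, 256), (6, 512), (7, 1024))):
        super().__init__()
        self.char_emb = nn.Embedding(n_chars, char_dim)
        # One convolution per n-gram width; the output channels sum to 2048.
        self.convs = nn.ModuleList(
            nn.Conv1d(char_dim, n_out, kernel_size=width)
            for width, n_out in filters
        )

    def forward(self, char_ids):
        # char_ids: (batch, max_token_length) integer character ids, one row per token
        x = self.char_emb(char_ids)        # (batch, max_token_length, char_dim)
        x = x.transpose(1, 2)              # Conv1d expects (batch, channels, length)
        # Max-pool each convolution over the character positions, then concatenate.
        pooled = [conv(x).max(dim=-1).values for conv in self.convs]
        return torch.cat(pooled, dim=-1)   # (batch, 2048) context-independent embedding
```
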
  7. Deep Contextualized Word Embeddings (cont.)
     Salient features of the input embeddings:
     • They pick up morphological features that word-level embeddings could miss
     • They provide a valid representation for out-of-vocabulary words
  8. Deep Contextualized Word Embeddings (cont.)
     What if?
     • We combine the lower-level representations in some weighted fashion?
     • That results in deep, context-rich embeddings
     • What if we could combine the representations with respect to the task? How about creating task-specific deep contextual embeddings?
  9. Deep Contextualized Word Embeddings (cont.)
     Let's think:
     • Lower-level neurons capture local properties such as morphological structure and syntax-related aspects. They can be useful for dependency parsing, POS tagging, etc.
     • Higher-level neurons capture context-dependent aspects. They can be used for tasks such as word sense disambiguation, etc.
     What if we expose both and combine them into a deep contextual representation of words?
  10. Deep Contextualized Word Embeddings (cont.)
     The combination function f can be represented as shown below; gamma is a scalar that scales the entire vector.
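
In the ELMo paper this task-specific combination of the L + 1 layer representations h_{k,j}^{LM} is written as

```latex
\mathrm{ELMo}_k^{task}
  = E\left(R_k;\, \Theta^{task}\right)
  = \gamma^{task} \sum_{j=0}^{L} s_j^{task}\, \mathbf{h}_{k,j}^{LM}
```

where the s_j^{task} are softmax-normalized layer weights and \gamma^{task} is the scalar mentioned on the slide.
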
  11. Summary
     • 2-layer biLSTM with 4096 units and a 512-dimensional projection
     • Residual connection between the first and second layers
     • Task-independent word representations from a 2048-channel character n-gram CNN
     • Weights are tied between the forward and backward layers (the token embedding and softmax), and the log likelihood of both directions is jointly maximized
     • A weighted combination of the representations from all layers, followed by a scalar (a minimal sketch follows)
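
A minimal PyTorch sketch of that final weighted combination, assuming three layer representations (the token layer plus two biLSTM layers); the class name and shapes are illustrative, not the AllenNLP ScalarMix API.

```python
import torch
import torch.nn as nn


class LayerMix(nn.Module):
    """Softmax-weighted sum of layer representations, scaled by a learned scalar."""

    def __init__(self, num_layers=3):  # token layer + 2 biLSTM layers
        super().__init__()
        self.layer_weights = nn.Parameter(torch.zeros(num_layers))
        self.gamma = nn.Parameter(torch.ones(1))

    def forward(self, layer_reps):
        # layer_reps: list of num_layers tensors, each (batch, seq_len, dim)
        s = torch.softmax(self.layer_weights, dim=0)      # task-specific weights s_j
        mixed = sum(w * h for w, h in zip(s, layer_reps))  # weighted sum over layers
        return self.gamma * mixed  # the vector handed to the downstream task
```

A downstream model would learn the layer weights and gamma jointly with its own task parameters, which is what makes the combined embedding task specific.
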