Slide 8
Slide 8 text
Deep Contextualized Word Embeddings (Cont)
• Backward Layer
• For a sequence of $N$ tokens, model the probability of the sentence by computing the probability of each token $t_k$ given the tokens that follow it, $(t_{k+1}, t_{k+2}, \ldots, t_N)$ (see the factorization sketched below)
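As a minimal sketch, the backward language-model factorization this bullet describes can be written in the standard form (the token notation $t_k$ follows the slide):

$$
p(t_1, t_2, \ldots, t_N) \;=\; \prod_{k=1}^{N} p\left(t_k \mid t_{k+1}, t_{k+2}, \ldots, t_N\right)
$$

This mirrors the forward layer, except that each token is conditioned on its future context rather than its past context.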
Cost function
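The slide only names the cost function. A sketch of the standard bidirectional language-model objective used for deep contextualized embeddings (ELMo, Peters et al., 2018), assuming the forward and backward log-likelihoods are jointly maximized; the parameter symbols below follow that paper and are not taken from the slide:

$$
\mathcal{L} \;=\; \sum_{k=1}^{N} \Big( \log p\big(t_k \mid t_1, \ldots, t_{k-1};\, \Theta_x, \overrightarrow{\Theta}_{LSTM}, \Theta_s\big) \;+\; \log p\big(t_k \mid t_{k+1}, \ldots, t_N;\, \Theta_x, \overleftarrow{\Theta}_{LSTM}, \Theta_s\big) \Big)
$$

Here $\Theta_x$ denotes the token-representation parameters, $\Theta_s$ the softmax parameters, and $\overrightarrow{\Theta}_{LSTM}$, $\overleftarrow{\Theta}_{LSTM}$ the forward and backward LSTM parameters, with $\Theta_x$ and $\Theta_s$ shared between the two directions.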