
UNSUPERVISED MULTILINGUAL LEARNING FOR POS TAGGING

Yemane
August 08, 2016


Benjamin Snyder, Tahira Naseem, Jacob Eisenstein, and Regina Barzilay

Proceedings of the 2008 Conference on Empirical Methods in Natural Language Processing, pages 1041–1050,
Honolulu, October 2008. © Association for Computational Linguistics


Transcript

  1. UNSUPERVISED MULTILINGUAL LEARNING FOR POS TAGGING

    Benjamin Snyder, Tahira Naseem, Jacob Eisenstein, and Regina Barzilay. Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology. Proceedings of the 2008 Conference on Empirical Methods in Natural Language Processing, pages 1041–1050, Honolulu, October 2008. © Association for Computational Linguistics
  2. Motivation

    • Hypothesis:
      • the patterns of ambiguity found in each language at the part-of-speech level differ in systematic ways
      • by considering multiple languages simultaneously, ambiguity can be reduced in each language
  3. Introduction

    • Multilingual learning - cues are combined from multiple languages
    • The structure of each language becomes more apparent
    • Model - a hierarchical Bayesian model for jointly predicting bilingual sequences of part-of-speech tags
    • Evaluation - on six pairs of languages
    • Result - significant performance gains over a state-of-the-art monolingual baseline
  4. Introduction (2)

    • The fundamental idea: the patterns of ambiguity inherent in part-of-speech tag assignments differ across languages
    • At the lexical level, an ambiguous word in one language may correspond to an unambiguous word in the other language.
      • For example, English "can" may be an auxiliary verb, a noun, or a verb, while in Serbian each of these parts of speech is expressed with a distinct lexical item.
    • At the structural level, English articles (a, an, the) greatly reduce the ambiguity of the succeeding tag, while Serbian is a language without articles.
  5. Model

    • a bilingual model for unsupervised part-of-speech tagging that jointly tags parallel text in two languages
    • parameters are learned from an untagged bilingual parallel text
    • the model is then applied to a held-out monolingual test set
  6. Model (2)

    • a generative model in which the observed words are produced by the hidden tags and model parameters
    • a hierarchical Bayesian model that exploits both language-specific and cross-lingual patterns to explain the observed bilingual sentences
  7. Model (3)

    • Assumption - for pairs of words that share a similar semantic or syntactic function, the associated tags will be statistically correlated
    • such word pairs allow cross-lingual information to be shared via joint tagging decisions
    • Word alignment - machine-translation alignment tools are used to identify these aligned words
    • the model treats the word alignments as observed data
    • for unaligned parts of the sentence, the tag and word selections are identical to a monolingual HMM's
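The split described above can be sketched in a few lines. This is a hypothetical illustration, not the paper's code: given a one-to-one word alignment (of the kind GIZA++ might produce), aligned positions become joint tagging decisions while the rest fall back to monolingual HMM behaviour. The function name and toy alignment are invented.

```python
def split_by_alignment(len_x, len_y, alignment):
    """alignment: set of (i, j) pairs linking word x[i] to word y[j]."""
    aligned = sorted(alignment)
    aligned_x = {i for i, _ in alignment}
    aligned_y = {j for _, j in alignment}
    # positions with no aligned partner are tagged like a monolingual HMM
    unaligned_x = [i for i in range(len_x) if i not in aligned_x]
    unaligned_y = [j for j in range(len_y) if j not in aligned_y]
    return aligned, unaligned_x, unaligned_y

# a 4-word sentence aligned to a 5-word translation at two positions
pairs, ux, uy = split_by_alignment(4, 5, {(0, 0), (2, 3)})
```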
  8. Generative model

    • assumes the existence of two tagsets, T and T′, and two vocabularies, W and W′, one of each for each language (L and L′)
    • probabilities are drawn from symmetric Dirichlet priors
    1. draw emission and transition distributions for language L
    2. draw emission and transition distributions for language L′
    3. draw a bilingual coupling distribution over tag pairs T × T′
    4. for each pair of parallel sentences:
      (a) draw an alignment
      (b) draw a bilingual sequence of part-of-speech tags (x1, ..., xm), (y1, ..., yn)
      (c) for each part-of-speech tag xi in the first language, emit a word from W
      (d) for each part-of-speech tag yj in the second language, emit a word from W′
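The generative story can be sketched for one language in isolation (steps 1 and 4b-4d); the alignment step and the coupled draw for the second language are omitted here. This is a toy illustration with an invented tagset, vocabulary, and hyperparameter, not the paper's implementation.

```python
import random

def draw_symmetric_dirichlet(k, alpha=1.0, rng=random):
    """Draw a k-dimensional symmetric Dirichlet(alpha) sample via Gamma variates."""
    g = [rng.gammavariate(alpha, 1.0) for _ in range(k)]
    total = sum(g)
    return [v / total for v in g]

def generate_sentence(tags, vocab, length, rng=random):
    # step 1: draw one transition row and one emission row per tag
    trans = {t: draw_symmetric_dirichlet(len(tags), rng=rng) for t in tags}
    emit = {t: draw_symmetric_dirichlet(len(vocab), rng=rng) for t in tags}
    # step 4b: draw a tag sequence from the transition distributions
    x = [rng.choice(tags)]
    for _ in range(length - 1):
        x.append(rng.choices(tags, weights=trans[x[-1]])[0])
    # step 4c: emit one word per tag from the emission distributions
    words = [rng.choices(vocab, weights=emit[t])[0] for t in x]
    return x, words

tags_seq, word_seq = generate_sentence(["N", "V", "D"], ["the", "dog", "runs"], 5)
```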
  9. Generative model (2)

    • Given an alignment a and sets of transition parameters φ and φ′, the conditional probability of a bilingual tag sequence (x1, ..., xm), (y1, ..., yn) is factored into transition probabilities for the unaligned tags and joint probabilities over the aligned tag pairs.
    • The distribution over aligned tag pairs is defined as a product of each language's transition probability and the coupling probability ω (the multilingual anchor):
      P(xi, yj | xi−1, yj−1) = φ(xi | xi−1) · φ′(yj | yj−1) · ω(xi, yj) / Z
      where Z is the normalization constant over all tag pairs.
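A numeric sketch of the aligned-pair distribution: the probability of a tag pair is the product of both languages' transition probabilities and the coupling weight ω, normalized by Z over all tag pairs. All table values below are toy numbers for illustration, not learned parameters.

```python
def aligned_pair_prob(x, y, x_prev, y_prev, trans, trans2, omega, tags, tags2):
    """P(x, y | x_prev, y_prev) ∝ trans[x_prev][x] * trans2[y_prev][y] * omega[(x, y)]."""
    def score(a, b):
        return trans[x_prev][a] * trans2[y_prev][b] * omega[(a, b)]
    Z = sum(score(a, b) for a in tags for b in tags2)  # normalization constant
    return score(x, y) / Z

tags, tags2 = ["N", "V"], ["N'", "V'"]
trans = {"N": {"N": 0.3, "V": 0.7}}
trans2 = {"N'": {"N'": 0.4, "V'": 0.6}}
# a coupling that favors matching tags across the two languages
omega = {("N", "N'"): 2.0, ("N", "V'"): 0.5, ("V", "N'"): 0.5, ("V", "V'"): 2.0}
total = sum(aligned_pair_prob(a, b, "N", "N'", trans, trans2, omega, tags, tags2)
            for a in tags for b in tags2)
```

The coupling term pulls the two transition factors toward consistent tag pairs, which is how one language's unambiguous context disambiguates the other.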
  10. Inference

    • Gibbs sampling is used to sample from the conditional distributions
    • n(xi) is the number of occurrences of the tag xi in x−i
    • n(xi, ei) is the number of occurrences of the tag-word pair (xi, ei) in (x−i, e−i)
    • Wxi is the number of word types in the vocabulary W that can take the tag xi
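Using the counts defined above, the smoothed emission term of such a collapsed sampler has the standard form (n(x, e) + β) / (n(x) + Wx · β), where counts exclude the position being resampled. The counts and β below are toy values, and this is a sketch of the standard collapsed term rather than the paper's full conditional.

```python
from collections import Counter

def emission_prob(tag, word, tag_counts, pair_counts, word_types_per_tag, beta):
    n_xe = pair_counts[(tag, word)]   # n(x_i, e_i): tag-word pair count
    n_x = tag_counts[tag]             # n(x_i): tag count
    W_x = word_types_per_tag[tag]     # word types that can take this tag
    return (n_xe + beta) / (n_x + W_x * beta)

tag_counts = Counter({"N": 3})
pair_counts = Counter({("N", "dog"): 2})
p = emission_prob("N", "dog", tag_counts, pair_counts, {"N": 5}, beta=0.5)
```

Counter returns 0 for unseen pairs, so the Dirichlet smoothing β keeps the probability of a new tag-word pairing nonzero.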
  11. Experimental setup

    • Evaluation - given a tag dictionary (i.e., a set of possible tags for each word type), and also when only incomplete dictionaries are available; the model is trained using untagged text
    • Parallel data - Orwell's novel "Nineteen Eighty-Four"
      • Original - English
      • Translations - Bulgarian, Serbian, and Slovene
    • Training data - a random ¾ of the corpus
    • Test data - the remaining ¼
    • Word alignment - GIZA++
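One way to picture the tag-dictionary setting: a dictionary restricts each listed word to its possible tags, and under an incomplete dictionary, unlisted words fall back to the full tagset. The helper name and dictionary contents are invented examples, not the paper's data.

```python
def candidate_tags(word, tag_dict, full_tagset):
    """Tags a word may take: its dictionary entry, or every tag if unlisted."""
    return tag_dict.get(word, set(full_tagset))

tag_dict = {"can": {"AUX", "NOUN", "VERB"}, "the": {"DET"}}
full_tagset = {"AUX", "NOUN", "VERB", "DET", "ADJ"}
```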
  12. Results

    • The tagging accuracy of the bilingual models on different language pairs, when a full tag dictionary is provided:
      • First column - cross-lingual entropy of a tag when the tag of the aligned word in the other language is known
      • Second column - monolingual unsupervised baseline
      • Third column - the proposed bilingual model
      • Final column - absolute improvement over the monolingual Bayesian HMM
    • Serbian and Slovene - error reductions of 51.4% and 53.2%, respectively
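The relative error reductions quoted above follow from (baseline error − model error) / baseline error; the accuracies below are made-up numbers to show the arithmetic, not the paper's results.

```python
def error_reduction(acc_baseline, acc_model):
    """Relative reduction in error rate when moving from baseline to model."""
    err_b, err_m = 1.0 - acc_baseline, 1.0 - acc_model
    return (err_b - err_m) / err_b

r = error_reduction(0.80, 0.90)  # baseline error 0.20 halves to 0.10
```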
  13. Conclusion

    • demonstrated the effectiveness of multilingual learning for unsupervised part-of-speech tagging
    • by combining cues from multiple languages, the structure of each becomes more apparent
    • built a model that learns language-specific features while capturing cross-lingual patterns in tag distributions
    • evaluation shows significant performance gains over a state-of-the-art monolingual baseline