• Ambiguities in each language at the part-of-speech level differ in systematic ways
• By considering multiple languages simultaneously, ambiguity can be reduced in each language
• By combining cues from multiple languages, the structure of each language becomes more apparent
• Model – a hierarchical Bayesian model for jointly predicting bilingual sequences of part-of-speech tags
• Evaluation – on six pairs of languages
• Result – significant performance gains over a state-of-the-art monolingual baseline
• The ambiguity inherent in part-of-speech tag assignments differs across languages
• At the lexical level:
  • An ambiguous word in one language may correspond to an unambiguous word in the other language
  • For example, English “can” – auxiliary verb / noun / verb; in Serbian, each of these parts of speech is expressed with a distinct lexical item
• At the structural level:
  • English – articles (a, an, the) greatly reduce the ambiguity of the succeeding tag
  • Serbian – a language without articles
• The model tags parallel text in two languages
• Parameters are learned using an untagged bilingual parallel text
• The model is applied to a held-out monolingual test set
• A hierarchical Bayesian model that exploits both language-specific and cross-lingual patterns to explain the observed bilingual sentences
• Words are produced by the hidden tags and the model parameters
• When aligned words share a similar semantic or syntactic function, the associated tags are statistically correlated
• Aligned word pairs allow cross-lingual information to be shared via joint tagging decisions
• Word alignments from machine translation are used to identify these aligned words
• The model treats word alignments as observed data
• For unaligned parts of the sentence, the tag and word selections are identical to a monolingual HMM's
• Given two tagsets T and T′, and two vocabularies W and W′, one of each for each language (L and L′)
• Probabilities are drawn from symmetric Dirichlet priors
1. Draw emission and transition distributions for language L
2. Draw emission and transition distributions for language L′
3. Draw a bilingual coupling distribution over tag pairs T × T′
4. For each bilingual parallel sentence:
  (a) Draw an alignment
  (b) Draw a bilingual sequence of part-of-speech tags (x1, ..., xm), (y1, ..., yn)
  (c) For each part-of-speech tag xi in the first language, emit a word from W
  (d) For each part-of-speech tag yj in the second language, emit a word from W′
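The generative steps above can be sketched as follows. This is an illustrative sketch, not the authors' code: the tagset sizes, vocabulary sizes, and hyperparameter values are hypothetical, and the alignment step is simplified to a fixed one-to-one alignment of every position.

```python
# Sketch of the generative process: Dirichlet-distributed emission,
# transition, and coupling parameters, then a fully-aligned toy sentence.
import numpy as np

rng = np.random.default_rng(0)

T, T2 = 5, 5                 # tagset sizes for L and L' (hypothetical)
W, W2 = 20, 20               # vocabulary sizes (hypothetical)
theta, alpha = 0.1, 0.1      # symmetric Dirichlet hyperparameters (hypothetical)

# Steps 1-2: emission and transition distributions for each language
emit   = rng.dirichlet([theta] * W,  size=T)    # P(word | tag) in L
emit2  = rng.dirichlet([theta] * W2, size=T2)   # P(word | tag) in L'
trans  = rng.dirichlet([alpha] * T,  size=T)    # P(tag | prev tag) in L
trans2 = rng.dirichlet([alpha] * T2, size=T2)   # P(tag | prev tag) in L'

# Step 3: coupling distribution over tag pairs T x T'
omega = rng.dirichlet([alpha] * (T * T2)).reshape(T, T2)

def draw_aligned_pair(prev_x, prev_y):
    """Aligned tags: product of both transitions and the coupling, renormalized."""
    joint = np.outer(trans[prev_x], trans2[prev_y]) * omega
    joint /= joint.sum()                         # Z = normalization constant
    flat = rng.choice(T * T2, p=joint.ravel())
    return flat // T2, flat % T2                 # recover (x, y) from flat index

# Step 4: generate a short bilingual sentence in which every position is aligned
x, y, words, words2 = [0], [0], [], []
for _ in range(4):
    xi, yi = draw_aligned_pair(x[-1], y[-1])
    x.append(xi); y.append(yi)
    words.append(rng.choice(W,  p=emit[xi]))     # (c) emit a word from W
    words2.append(rng.choice(W2, p=emit2[yi]))   # (d) emit a word from W'

print(len(words), len(words2))
```

Unaligned positions (omitted here for brevity) would instead draw each tag from its language's transition distribution alone, exactly as in a monolingual HMM.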
• Given the sets of transition parameters φ and φ′, the conditional probability of a bilingual tag sequence (x1, ..., xm), (y1, ..., yn) is factored into transition probabilities for unaligned tags and joint probabilities over aligned tag pairs
• The distribution over an aligned tag pair is defined as the product of each language's transition probability and the coupling probability:
  P(xi, yj | xi−1, yj−1) = φ(xi | xi−1) · φ′(yj | yj−1) · ω(xi, yj) / Z
• Aligned tag pairs – multilingual anchors
• Z – normalization constant
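As a sketch, with the index sets assumed for illustration (positions outside the alignment a are "unaligned"), the factorization described above can be written as:

```latex
P\big((x_1,\dots,x_m),(y_1,\dots,y_n)\,\big|\,a\big)
  \;=\; \prod_{i \notin a} \phi(x_i \mid x_{i-1})
        \prod_{j \notin a} \phi'(y_j \mid y_{j-1})
        \prod_{(i,j) \in a}
          \frac{\phi(x_i \mid x_{i-1})\,\phi'(y_j \mid y_{j-1})\,\omega(x_i, y_j)}{Z}
```

Each unaligned tag contributes an ordinary monolingual transition term, while each aligned pair contributes both languages' transitions weighted by the coupling ω and renormalized by Z.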
• Inference – Gibbs sampling from the conditional distributions, where:
  • n(xi) is the number of occurrences of the tag xi in x−i
  • n(xi, ei) is the number of occurrences of the tag–word pair (xi, ei) in (x−i, e−i)
  • Wxi is the number of word types in the vocabulary W that can take tag xi
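With a symmetric Dirichlet prior θ over emissions, these counts plausibly combine into the standard collapsed Dirichlet–multinomial predictive for the emission term of the sampler's conditional distribution. A minimal sketch, with a hypothetical toy corpus and hyperparameter value:

```python
# Smoothed emission probability for the Gibbs conditional:
#   P(e_i | x_i, x_-i, e_-i) = (n(x_i, e_i) + theta) / (n(x_i) + W_{x_i} * theta)
# Counts exclude the position currently being resampled.

theta = 0.1                              # hypothetical hyperparameter
tags  = ["N", "V", "N", "N"]             # x_-i: tags at all other positions (toy)
words = ["dog", "runs", "cat", "dog"]    # e_-i: words at those positions (toy)
W_tag = {"N": 3, "V": 2}                 # W_{x_i}: word types allowed for each tag

def emission_prob(tag, word):
    n_tag = sum(1 for t in tags if t == tag)                                   # n(x_i)
    n_tag_word = sum(1 for t, w in zip(tags, words) if (t, w) == (tag, word))  # n(x_i, e_i)
    return (n_tag_word + theta) / (n_tag + W_tag[tag] * theta)

# n(N) = 3, n(N, dog) = 2, W_N = 3  ->  (2 + 0.1) / (3 + 0.3)
print(round(emission_prob("N", "dog"), 4))  # → 0.6364
```

The θ term smooths unseen tag–word pairs toward a uniform distribution over the Wxi word types allowed for the tag.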
• A tag dictionary gives a set of possible tags for each word type
• Also evaluated when only incomplete dictionaries are available
• The model is trained using untagged text
• Parallel data – Orwell's novel “Nineteen Eighty-Four”
  • Original – English
  • Translations – Bulgarian, Serbian, and Slovene
• Training data – a random ¾ of the sentences
• Test data – the remaining ¼
• Word alignment – GIZA++
• Results for different language pairs, when a full tag dictionary is provided:
  • First column – cross-lingual entropy of a tag when the tag of the aligned word in the other language is known
  • Second column – monolingual unsupervised baseline
  • Third column – the proposed model
  • Final column – absolute improvement over the monolingual Bayesian HMM
• Serbian and Slovene – error reductions of 51.4% and 53.2%
• Demonstrated the benefits of multilingual learning for unsupervised part-of-speech tagging
• By combining cues from multiple languages, the structure of each becomes more apparent
• Built a model that learns language-specific features while capturing cross-lingual patterns in tag distributions
• Evaluation shows significant performance gains over a state-of-the-art monolingual baseline