Natural Language Processing (3) Morphological analysis (2)

Natural Language Processing (3) Morphological Analysis (2)
Kazuhide Yamamoto
Dept. of Electrical Engineering, Nagaoka University of Technology

2 / 18 Time flies like an arrow.
• Vietnamese: Thời gian bay như một mũi tên.
• Thai: เวลาบินเหมือนลูกศร
• Chinese: 光陰似箭
• Japanese: 光陰矢の如し。
A proverb.

3 / 18 Time flies like an arrow.
• One may think that the subject of the sentence is "time" and the verb is "fly."
• But a dictionary tells us that "fly" can also be a noun, meaning a small insect.
• The word "time" is also used as a verb: if you time an action, you measure how long someone takes to do it.

4 / 18 Part-of-speech (POS) tagging
• A sentence usually has many possible POS taggings;
– each word generally has several POS candidates.
– the tags are not determined independently of each other.
• Japanese POS tagging is easier than English POS tagging;
– some Japanese words give hints about their function in the sentence.
• While conventional taggers use hand-written heuristic rules, recent taggers use language statistics, regardless of language.
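To make the ambiguity concrete, the following minimal sketch enumerates every candidate tag sequence for the example sentence. The candidate lists in `CANDIDATES` and the tag names are illustrative assumptions, not entries from a real dictionary.

```python
from itertools import product

# Hypothetical POS candidates per word (illustrative only, not from a real dictionary).
CANDIDATES = {
    "time": ["n", "v"],
    "flies": ["n", "v"],
    "like": ["v", "p"],
    "an": ["det"],
    "arrow": ["n"],
}

def tag_sequences(words):
    """Return every combination of candidate tags for the given words."""
    return list(product(*(CANDIDATES[w] for w in words)))

sentence = ["time", "flies", "like", "an", "arrow"]
sequences = tag_sequences(sentence)
print(len(sequences))  # 2 * 2 * 2 * 1 * 1 = 8 candidate sequences
```

Even this short sentence yields eight candidate sequences, which is why the tagger needs a way to score them.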

5 / 18 POS tagging model (1)
• Given a sentence "Time flies like an arrow", or $w_1, w_2, \ldots, w_n$ in general, the probability that their parts of speech are $C_1, C_2, \ldots, C_n$, respectively, is computed as follows (transformed by Bayes' theorem):

$$P(C_1, C_2, \ldots, C_n \mid w_1, w_2, \ldots, w_n) = \frac{P(C_1, C_2, \ldots, C_n)\, P(w_1, w_2, \ldots, w_n \mid C_1, C_2, \ldots, C_n)}{P(w_1, w_2, \ldots, w_n)}$$

6 / 18 Bayes' theorem / ベイズの定理

$$P(A \mid B) = \frac{P(B \mid A)\, P(A)}{P(B)}$$

• P(A): prior probability (事前確率)
• P(A|B): posterior probability (事後確率)
• P(B|A): likelihood (尤度)
• P(B) > 0: the probability of B
Bayes' theorem relates the conditional probability of event A given B to the converse conditional probability of B given A.
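As a quick numeric check of the formula, the sketch below computes a posterior from a prior and two likelihoods. All numbers are made up for illustration, and P(B) is obtained from the law of total probability over A and not-A, which is an assumption about how P(B) would be known.

```python
def posterior(prior_a, likelihood_b_given_a, likelihood_b_given_not_a):
    """Return P(A|B) via Bayes' theorem, with P(B) from the law of total probability."""
    p_b = likelihood_b_given_a * prior_a + likelihood_b_given_not_a * (1.0 - prior_a)
    if p_b <= 0.0:
        raise ValueError("P(B) must be positive")
    return likelihood_b_given_a * prior_a / p_b

# Made-up numbers: P(A) = 0.3, P(B|A) = 0.8, P(B|not A) = 0.2.
p = posterior(0.3, 0.8, 0.2)
print(round(p, 4))  # 0.24 / 0.38, about 0.6316
```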

7 / 18 POS tagging model (2)

$$P(C_1, C_2, \ldots, C_n) \approx \prod_{i=1}^{n} P(C_i \mid C_{i-1}) = \prod_{i=1}^{n} \frac{P(C_{i-1}, C_i)}{P(C_{i-1})}$$

$$P(w_1, w_2, \ldots, w_n \mid C_1, C_2, \ldots, C_n) \approx P(w_1 \mid C_1)\, P(w_2 \mid C_2) \cdots P(w_n \mid C_n) = \prod_{i=1}^{n} P(w_i \mid C_i)$$
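Both factors can be estimated from a tagged corpus by relative frequency. The sketch below is a minimal illustration under stated assumptions: the tiny corpus `TAGGED` is invented, `"<s>"` is a hypothetical sentence-start marker standing in for $C_0$, and no smoothing is applied.

```python
from collections import Counter

# Tiny hand-tagged corpus (invented for illustration); "<s>" marks the sentence start (C_0).
TAGGED = [
    [("time", "n"), ("flies", "v")],
    [("fruit", "n"), ("flies", "n"), ("like", "v"), ("bananas", "n")],
]

tag_counts = Counter()     # count(C)
bigram_counts = Counter()  # count(C_{i-1}, C_i)
emit_counts = Counter()    # count(w, C)

for sent in TAGGED:
    prev = "<s>"
    tag_counts[prev] += 1
    for word, tag in sent:
        bigram_counts[(prev, tag)] += 1
        emit_counts[(word, tag)] += 1
        tag_counts[tag] += 1
        prev = tag

def p_trans(prev, tag):
    """Relative-frequency estimate of P(C_i | C_{i-1})."""
    return bigram_counts[(prev, tag)] / tag_counts[prev]

def p_emit(word, tag):
    """Relative-frequency estimate of P(w_i | C_i)."""
    return emit_counts[(word, tag)] / tag_counts[tag]

print(p_trans("n", "v"), p_emit("flies", "n"))
```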

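8 / 18 POS tagging model (3)
The denominator $P(w_1, w_2, \ldots, w_n)$ is the same for every tag sequence, so it can be ignored. Consequently, we compute the $C_1, C_2, \ldots, C_n$ that maximize

$$\prod_{i=1}^{n} P(w_i \mid C_i)\, \frac{P(C_{i-1}, C_i)}{P(C_{i-1})}$$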
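One common way to find the maximizing tag sequence without enumerating every combination is dynamic programming (the Viterbi algorithm); the slides do not name a search method, so this is a swapped-in technique. The sketch below assumes hypothetical probability tables `TRANS` and `EMIT`, a sentence-start marker `"<s>"` for $C_0$, and a small floor value for unseen events.

```python
import math

TAGS = ["n", "v", "p", "det"]

# Hypothetical probabilities for illustration only; unseen events get a small floor value.
TRANS = {("<s>", "n"): 0.6, ("<s>", "v"): 0.2, ("n", "v"): 0.4, ("n", "n"): 0.3,
         ("v", "p"): 0.3, ("v", "n"): 0.2, ("v", "det"): 0.3, ("p", "det"): 0.6,
         ("det", "n"): 0.9}
EMIT = {("time", "n"): 0.02, ("time", "v"): 0.001, ("flies", "n"): 0.001,
        ("flies", "v"): 0.005, ("like", "v"): 0.01, ("like", "p"): 0.05,
        ("an", "det"): 0.3, ("arrow", "n"): 0.001}
FLOOR = 1e-12

def viterbi(words):
    """Return the tag sequence maximizing prod_i P(w_i|C_i) P(C_i|C_{i-1})."""
    # best[tag] = (log-probability of the best path ending in tag, that path)
    best = {"<s>": (0.0, [])}
    for w in words:
        new_best = {}
        for tag in TAGS:
            emit = math.log(EMIT.get((w, tag), FLOOR))
            score, path = max(
                (prev_score + math.log(TRANS.get((prev, tag), FLOOR)) + emit, prev_path)
                for prev, (prev_score, prev_path) in best.items()
            )
            new_best[tag] = (score, path + [tag])
        best = new_best
    return max(best.values())[1]

print(viterbi(["time", "flies", "like", "an", "arrow"]))
```

Log-probabilities are summed instead of multiplying raw probabilities, which avoids numerical underflow on long sentences.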

9 / 18 Actual statistics (1)
Conditional probability (%) of the latter POS given the former POS, highest first:
former  latter  P(latter | former)
mod     n       94.4
pref    n       92.1
p       n       49.1
adj     n       45.5
adv     n       43.3
n       p       38.5
n       n       36.0
v       auxv    31.3
auxv    n       30.3
conj    n       29.6

What comes after a noun? P(latter | n), in %:
p       38.5
n       36.0
symbol  16.0
v       4.7
auxv    2.8
pref    0.4
adj     0.1
adv     0.05
conj    0.02

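10 / 18 Actual statistics (2)
P(w_i | C_i), highest values:
.470 た (auxv), past-tense marker
.341 この (mod), "this"
.269 その (mod), "that"
.266 だ (auxv), copula
.218 する (v), "to do"
.200 の (p), particle "of"
.195 しかし (conj), "however"
.172 ない (adj), "not"
Other examples:
.133 同 (pref), "same"
.047 はい (interj), "yes"
.039 さらに (adv), "furthermore"
.012 日 (n), "day"
.008 こと (n), "thing, matter"
.007 １ (n), "one"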

11 / 18 n-gram language model
An n-gram is a contiguous subsequence of n items from a given sequence. An n-gram model predicts the next item from the last n−1 observed items.
• In particular, an n-gram is called a unigram if n=1, a bigram if n=2, and a trigram if n=3.
• The characteristics of the model depend on what the item is. The most popular choice is a sequence of words (and/or parts of speech), i.e., a word n-gram model. Character-level n-gram models are also used, for example in optical character recognition (OCR).
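The definition above can be sketched in a few lines. The helper name `ngrams` is chosen here for illustration; the same function works for word-level and character-level items.

```python
from collections import Counter

def ngrams(items, n):
    """Return all contiguous n-item subsequences of `items` as tuples."""
    return [tuple(items[i:i + n]) for i in range(len(items) - n + 1)]

words = "time flies like an arrow".split()
print(ngrams(words, 2))  # word bigrams

# Character-level trigrams, the kind of unit a character n-gram model would count.
char_trigrams = Counter(ngrams("banana", 3))
print(char_trigrams.most_common(1))
```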

12 / 18 Hidden Markov Model, HMM
The n-gram model is essentially the same as a Hidden Markov Model (HMM), a statistical model in which the system is assumed to be a Markov process with unobserved (hidden) states.
• A part-of-speech n-gram model corresponds to an (n−1)th-order HMM.

13 / 18 Let us think a little.
Once we have statistics on n-word sequences (word n-grams), we can use them to generate the word sequences that are most likely to appear. (Examples are shown in class.)
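A minimal sketch of this idea with bigram statistics: count which word follows which, then repeatedly pick the most frequent follower. The toy `CORPUS` and the boundary markers `"<s>"` and `"</s>"` are assumptions made for illustration; a real model would be trained on a large corpus and would usually sample rather than always take the top choice.

```python
from collections import Counter, defaultdict

# Toy corpus invented for illustration; a real model would use a large corpus.
CORPUS = [
    "time flies like an arrow",
    "time flies fast",
    "fruit flies like an apple",
    "he shot an arrow",
]

# followers[w] counts the words observed immediately after w.
followers = defaultdict(Counter)
for line in CORPUS:
    tokens = ["<s>"] + line.split() + ["</s>"]
    for prev, nxt in zip(tokens, tokens[1:]):
        followers[prev][nxt] += 1

def generate(max_len=10):
    """Greedily follow the most frequent next word, starting from the sentence start."""
    word, output = "<s>", []
    for _ in range(max_len):
        word = followers[word].most_common(1)[0][0]
        if word == "</s>":
            break
        output.append(word)
    return output

print(" ".join(generate()))
```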

14 / 18 Kana-Kanji conversion
Conventional Kana-Kanji converters (Japanese IMEs) rely on simple heuristics because of limited computational resources.
• latest-first heuristic
– among the Kanji candidates, the one used most recently is selected; a candidate that has been used once is likely to be used again.
• most-used-first heuristic
– among the candidates, the most frequently used one is selected.
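The two heuristics can be sketched as follows. `CandidateRanker` is a hypothetical helper written for this illustration, not the API of any real IME, and the usage history is invented.

```python
from collections import Counter

class CandidateRanker:
    """Toy ranker for one reading's Kanji candidates (hypothetical, not a real IME API)."""

    def __init__(self, candidates):
        self.candidates = list(candidates)
        self.use_counts = Counter()
        self.last_used = None

    def record(self, choice):
        """Remember that the user picked `choice`."""
        self.use_counts[choice] += 1
        self.last_used = choice

    def latest_first(self):
        """Latest-first: the most recently chosen candidate (or the default first one)."""
        return self.last_used or self.candidates[0]

    def most_used_first(self):
        """Most-used-first: the candidate chosen most often so far."""
        return max(self.candidates, key=lambda c: self.use_counts[c])

ranker = CandidateRanker(["記者", "汽車", "貴社", "喜捨"])  # candidates for きしゃ
for choice in ["汽車", "汽車", "記者"]:
    ranker.record(choice)
print(ranker.latest_first(), ranker.most_used_first())
```

With this history the two heuristics disagree: latest-first proposes 記者, while most-used-first proposes 汽車.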

15 / 18 Advanced Kana-Kanji conversion
Recent converters use different strategies:
• use of n-grams
– しんぶんきしゃ (shinbun kisha): among the candidates for きしゃ, the one most likely to follow しんぶん (新聞, newspaper) is 記者 (reporter).
• use of case frames / 格フレーム
– きしゃでかえった (kisha de kaetta): if we know the verb is 帰る (to return), the instrument should be 汽車 (train), not 貴社 (your company), 記者 (reporter), or 喜捨 (charity).
• use of co-occurrence / 共起
– モーツァルトのこうえん (Mōtsaruto no kōen): in "A of Mozart", A should be 公演 (concert), not 公園 (park) or 講演 (lecture).
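The n-gram strategy can be sketched as picking, among the homophone candidates, the one with the highest bigram probability after the previous word. The probabilities in `BIGRAM` are invented for illustration and do not come from a real corpus.

```python
# Hypothetical bigram probabilities P(candidate | previous word), invented for illustration.
BIGRAM = {
    ("新聞", "記者"): 0.020,
    ("新聞", "汽車"): 0.0001,
    ("新聞", "貴社"): 0.0002,
    ("新聞", "喜捨"): 0.00001,
}

# Homophone candidates for one reading.
HOMOPHONES = {"きしゃ": ["記者", "汽車", "貴社", "喜捨"]}

def convert(previous_word, reading):
    """Pick the candidate with the highest bigram probability after `previous_word`."""
    return max(HOMOPHONES[reading], key=lambda c: BIGRAM.get((previous_word, c), 0.0))

print(convert("新聞", "きしゃ"))  # 記者 under these assumed probabilities
```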

16 / 18 Language statistics and corpus
• A large amount of language statistics is required to compute the probabilities explained so far.
• Large collections of text used to be unavailable, so such statistics could not be obtained; thanks to the IT revolution, we can now use huge amounts of language data:
– newspaper texts
– Web texts
– patent collections
• A large collection of language data is called a corpus.

17 / 18 Constraint and preference
In NLP, what we call a "rule" is one of the following two kinds:
• constraint / 制約
– an absolute rule; it always holds, so we do not need to consider cases where it is violated.
• preference / 選好
– a rule that holds in many cases, but not always.
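One way to picture the difference in code: constraints filter candidates out entirely, while preferences only rank the survivors. The sketch below is an illustration under assumptions; the function `choose`, the example constraint, and the scores are all hypothetical.

```python
def choose(candidates, constraints, preference):
    """Drop candidates violating any constraint, then pick the most preferred survivor."""
    allowed = [c for c in candidates if all(ok(c) for ok in constraints)]
    if not allowed:
        return None
    return max(allowed, key=preference)

# Hypothetical example: tags for a word that follows a determiner.
candidates = ["n", "v", "adj"]
constraints = [lambda tag: tag != "v"]     # assumed absolute rule: no verb right after a determiner
scores = {"n": 0.7, "adj": 0.3, "v": 0.9}  # assumed preference scores
result = choose(candidates, constraints, scores.get)
print(result)
```

Here "v" has the highest preference score but is never chosen, because the constraint removes it before ranking.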

18 / 18 Summary: today's key words
• n-gram model
• Kana-Kanji conversion
• corpus


自然言語処理研究室
