Natural Language Processing (3) Morphological analysis (2)

自然言語処理研究室

September 27, 2013

Transcript

  1. 1.

    Natural Language Processing (3) Morphological Analysis (2)

    Kazuhide Yamamoto, Dept. of Electrical Engineering, Nagaoka University of Technology
  2. 2.

    Time flies like an arrow.

    Thời gian bay như một mũi tên. เวลาบินเหมือนลูกศร 光陰似箭. 光陰矢の如し。 (A proverb, shown in English, Vietnamese, Thai, Chinese, and Japanese.)
  3. 3.

    Time flies like an arrow.

    • One may think that the subject of the sentence is "time" and the verb is "fly."
    • But a dictionary tells us that "fly" is also a noun, meaning a small insect.
    • The word "time" is also used as a verb; if you time an action, you measure how long someone takes to do it.
  4. 4.

    Part-of-speech (POS) tagging

    • A sentence has many possible POS taggings:
      – each word generally has several POS candidates;
      – the tags are not determined independently of one another.
    • Japanese POS tagging is easier than English POS tagging:
      – some Japanese words give hints about their function in the sentence.
    • While conventional taggers use hand-written heuristic rules, recent taggers use language statistics, regardless of language.
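
    The number of candidate taggings grows multiplicatively with the candidates per word. The sketch below enumerates them for the example sentence; the mini-lexicon `LEXICON` and the tag names (`n`, `v`, `prep`, `det`) are assumptions made up for illustration, not part of the lecture.

```python
from itertools import product

# Hypothetical mini-lexicon: each word lists its possible parts of speech.
LEXICON = {
    "time": ["n", "v"],
    "flies": ["n", "v"],
    "like": ["v", "prep"],
    "an": ["det"],
    "arrow": ["n"],
}

def candidate_taggings(words):
    """Return every combination of POS candidates for the given words."""
    options = [LEXICON[w.lower()] for w in words]
    return [list(tags) for tags in product(*options)]

sentence = "Time flies like an arrow".split()
taggings = candidate_taggings(sentence)

# 2 * 2 * 2 * 1 * 1 = 8 candidate tag sequences
print(len(taggings))
for tags in taggings:
    print(" ".join(f"{w}/{t}" for w, t in zip(sentence, tags)))
```

    A tagger has to pick one of these sequences as a whole, which is why the tags cannot be chosen word by word.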
  5. 5.

    POS tagging model (1)

    • Given a sentence "Time flies like an arrow", or $w_1, w_2, \ldots, w_n$ in general, the probability that their parts of speech are $C_1, C_2, \ldots, C_n$, respectively, is computed as follows (transformed by Bayes' theorem):

    $$P(C_1, C_2, \ldots, C_n \mid w_1, w_2, \ldots, w_n) = \frac{P(C_1, C_2, \ldots, C_n)\, P(w_1, w_2, \ldots, w_n \mid C_1, C_2, \ldots, C_n)}{P(w_1, w_2, \ldots, w_n)}$$
  6. 6.

    Bayes' theorem / ベイズの定理

    $$P(A \mid B) = \frac{P(B \mid A)\, P(A)}{P(B)}$$

    • P(A): prior probability (事前確率)
    • P(A|B): posterior probability (事後確率)
    • P(B|A): likelihood (尤度)
    • P(B) > 0: the probability of B

    Bayes' theorem states how the conditional probability of event A given B is related to the converse conditional probability of B given A.
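
    A small numeric check of the theorem may help. In the sketch below, A is the POS of a word and B is the event of observing the word "flies"; all probabilities are invented for illustration, and P(B) is obtained by summing P(B|A)P(A) over the hypotheses.

```python
def posterior(prior, likelihood):
    """Apply Bayes' theorem, P(A|B) = P(B|A) P(A) / P(B), to each hypothesis A."""
    # P(B) by the law of total probability over all hypotheses A.
    evidence = sum(prior[a] * likelihood[a] for a in prior)
    if evidence <= 0:
        raise ValueError("P(B) must be greater than 0")
    return {a: prior[a] * likelihood[a] / evidence for a in prior}

# Hypothetical numbers (not from the lecture).
prior = {"n": 0.4, "v": 0.2, "other": 0.4}           # P(A)
likelihood = {"n": 0.001, "v": 0.004, "other": 0.0}  # P(B|A), B = "flies"

post = posterior(prior, likelihood)
for tag, p in post.items():
    print(f"P({tag} | flies) = {p:.3f}")
```

    With these made-up numbers the verb reading ends up twice as probable as the noun reading, even though nouns have the larger prior.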
  7. 7.

    POS tagging model (2)

    $$P(C_1, C_2, \ldots, C_n) \approx \prod_{i=1}^{n} P(C_i \mid C_{i-1}) = \prod_{i=1}^{n} \frac{P(C_{i-1}, C_i)}{P(C_{i-1})}$$

    $$P(w_1, w_2, \ldots, w_n \mid C_1, C_2, \ldots, C_n) \approx P(w_1 \mid C_1)\, P(w_2 \mid C_2) \cdots P(w_n \mid C_n) = \prod_{i=1}^{n} P(w_i \mid C_i)$$
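
    Under these two approximations, the score of one particular tagging is a product of per-position factors. The following sketch computes that product for two taggings of the example sentence. The probability tables are hypothetical toy values, and treating $C_0$ as a sentence-start symbol `<s>` is an assumption of this sketch.

```python
# Hypothetical toy probabilities (not from the lecture).
TRANSITION = {  # P(C_i | C_{i-1}); "<s>" stands for C_0, the sentence start
    ("<s>", "n"): 0.5, ("<s>", "v"): 0.2,
    ("n", "v"): 0.4, ("n", "n"): 0.3, ("n", "prep"): 0.2,
    ("v", "prep"): 0.3, ("v", "det"): 0.4, ("v", "n"): 0.2,
    ("prep", "det"): 0.6,
    ("det", "n"): 0.9,
}
EMISSION = {  # P(w_i | C_i)
    ("time", "n"): 0.01, ("time", "v"): 0.001,
    ("flies", "v"): 0.002, ("flies", "n"): 0.001,
    ("like", "prep"): 0.05, ("like", "v"): 0.01,
    ("an", "det"): 0.2,
    ("arrow", "n"): 0.001,
}

def sequence_score(words, tags):
    """Product over i of P(w_i | C_i) * P(C_i | C_{i-1})."""
    score = 1.0
    prev = "<s>"
    for w, t in zip(words, tags):
        score *= TRANSITION.get((prev, t), 0.0) * EMISSION.get((w, t), 0.0)
        prev = t
    return score

words = "time flies like an arrow".split()
score_a = sequence_score(words, ["n", "v", "prep", "det", "n"])  # time = subject
score_b = sequence_score(words, ["n", "n", "v", "det", "n"])     # "time flies" = insects
print(score_a > score_b)
```

    For long sentences the product becomes very small, so practical implementations usually sum log probabilities instead.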
  8. 8.

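    POS tagging model (3)

    The denominator $P(w_1, w_2, \ldots, w_n)$ does not depend on the tags, so it can be ignored. Consequently, we compute the $C_1, C_2, \ldots, C_n$ that maximize

    $$\prod_{i=1}^{n} P(w_i \mid C_i)\, \frac{P(C_{i-1}, C_i)}{P(C_{i-1})}$$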
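
    Trying every tag sequence is exponential in the sentence length. One standard way to find the maximizing sequence efficiently is the Viterbi algorithm, which the slides do not name; the sketch below is a minimal version under that assumption. The probability tables are hypothetical toy values, and `<s>` is an assumed sentence-start symbol playing the role of $C_0$.

```python
# Hypothetical toy probabilities (not from the lecture).
TRANSITION = {  # P(C_i | C_{i-1}) = P(C_{i-1}, C_i) / P(C_{i-1})
    ("<s>", "n"): 0.5, ("<s>", "v"): 0.2,
    ("n", "v"): 0.4, ("n", "n"): 0.3, ("n", "prep"): 0.2,
    ("v", "prep"): 0.3, ("v", "det"): 0.4, ("v", "n"): 0.2,
    ("prep", "det"): 0.6,
    ("det", "n"): 0.9,
}
EMISSION = {  # P(w_i | C_i)
    ("time", "n"): 0.01, ("time", "v"): 0.001,
    ("flies", "v"): 0.002, ("flies", "n"): 0.001,
    ("like", "prep"): 0.05, ("like", "v"): 0.01,
    ("an", "det"): 0.2,
    ("arrow", "n"): 0.001,
}
TAGS = ["n", "v", "prep", "det"]

def viterbi(words, tags=TAGS, transition=TRANSITION, emission=EMISSION):
    """Return (score, tag sequence) maximizing prod P(w_i|C_i) P(C_i|C_{i-1})."""
    # best[t] = (score of the best partial path ending in tag t, that path)
    best = {"<s>": (1.0, [])}
    for w in words:
        if not best:  # no tagging with non-zero probability exists
            return 0.0, []
        new_best = {}
        for t in tags:
            e = emission.get((w, t), 0.0)
            if e == 0.0:
                continue
            score, path = max(
                ((p * transition.get((prev, t), 0.0) * e, prev_path + [t])
                 for prev, (p, prev_path) in best.items()),
                key=lambda c: c[0],
            )
            if score > 0.0:
                new_best[t] = (score, path)
        best = new_best
    if not best:
        return 0.0, []
    return max(best.values(), key=lambda c: c[0])

score, tagging = viterbi("time flies like an arrow".split())
print(tagging)
```

    Only the best path per tag is kept at each position, so the cost is linear in the sentence length and quadratic in the number of tags.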
  9. 9.

    Actual statistics (1)

    Conditional probability (%) of the latter POS given the former POS:

    P (%)   former   latter
    94.4    mod      n
    92.1    pref     n
    49.1    p        n
    45.5    adj      n
    43.3    adv      n
    38.5    n        p
    36.0    n        n
    31.3    v        auxv
    30.3    auxv     n
    29.6    conj     n

    What comes after a noun?

    P (%)   latter
    38.5    p
    36.0    n
    16.0    symbol
    4.7     v
    2.8     auxv
    0.4     pref
    0.1     adj
    0.05    adv
    0.02    conj
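
    Statistics of this kind can be estimated by counting adjacent tag pairs in a POS-tagged corpus and dividing by the count of the former tag. A minimal sketch follows; the three tag sequences in `corpus` are invented for illustration and do not reproduce the figures above.

```python
from collections import Counter

def transition_probabilities(tag_sequences):
    """Estimate P(latter | former) = count(former, latter) / count(former)."""
    pair_counts = Counter()
    former_counts = Counter()
    for tags in tag_sequences:
        for former, latter in zip(tags, tags[1:]):
            pair_counts[(former, latter)] += 1
            former_counts[former] += 1
    return {
        (former, latter): count / former_counts[former]
        for (former, latter), count in pair_counts.items()
    }

# Hypothetical tagged corpus: one list of POS tags per sentence.
corpus = [
    ["n", "p", "n", "p", "v", "auxv"],
    ["mod", "n", "p", "adj", "n", "p", "v"],
    ["n", "n", "p", "v", "auxv"],
]

probs = transition_probabilities(corpus)
for (former, latter), p in sorted(probs.items(), key=lambda kv: -kv[1]):
    print(f"{100 * p:5.1f}  {former} {latter}")
```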
  10. 10.

    Actual statistics (2)

    P(w_i | C_i):

    P       word (POS)
    .470    た (auxv)
    .341    この (mod)
    .269    その (mod)
    .266    だ (auxv)
    .218    する (v)
    .200    の (p)
    .195    しかし (conj)
    .172    ない (adj)

    (others)

    P       word (POS)
    .133    同 (pref)
    .047    はい (interj)
    .039    さらに (adv)
    .012    日 (n)
    .008    こと (n)
    .007    1 (n)
  11. 11.

    n-gram language model

    An n-gram is a subsequence of n items from a given sequence. An n-gram model predicts the next item from the last n−1 observed items.
    • In particular, we call it a uni-gram if n=1, a bi-gram if n=2, and a tri-gram if n=3.
    • Its characteristics depend on what the items are. The most popular choice is a word (and/or part-of-speech) sequence, i.e., a word n-gram model. Character-level n-gram models are also used, for example in optical character recognition (OCR).
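
    Extracting n-grams is a sliding window over the sequence, and the same function works for words and for characters. A minimal sketch (the function name `ngrams` is chosen here for illustration):

```python
def ngrams(items, n):
    """Return all contiguous subsequences of length n as tuples."""
    return [tuple(items[i:i + n]) for i in range(len(items) - n + 1)]

words = "time flies like an arrow".split()
word_bigrams = ngrams(words, 2)      # word-level bi-grams
char_trigrams = ngrams("arrow", 3)   # character-level tri-grams

print(word_bigrams)
print(char_trigrams)
```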
  12. 12.

    Hidden Markov Model, HMM

    The n-gram tagging model can be viewed as a Hidden Markov Model, a statistical model in which the system is assumed to be a Markov process with unobserved (hidden) states. In POS tagging, the tags are the hidden states and the words are the observations.
    • A part-of-speech n-gram model corresponds to an (n−1)-order HMM.
  13. 13.

    Let us think a little.

    Once we have statistics of word n-grams, we can use them to generate the most likely word sequences! (See the examples in class.)
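
    As one possible illustration of this idea, the sketch below counts word bi-grams in a tiny corpus and then generates a sequence by always choosing the most frequent next word. The four sentences, the boundary symbols `<s>` and `</s>`, and the greedy strategy are all assumptions of this sketch, not the examples shown in class.

```python
from collections import Counter, defaultdict

def train_bigrams(sentences):
    """Count, for each word, which words follow it."""
    following = defaultdict(Counter)
    for sentence in sentences:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        for prev, cur in zip(tokens, tokens[1:]):
            following[prev][cur] += 1
    return following

def generate(following, max_len=20):
    """Greedily follow the most frequent next word, starting from <s>."""
    word, output = "<s>", []
    for _ in range(max_len):
        if not following[word]:
            break
        word = following[word].most_common(1)[0][0]
        if word == "</s>":
            break
        output.append(word)
    return output

# Hypothetical toy corpus.
corpus = [
    "time flies like an arrow",
    "time flies fast",
    "fruit flies like a banana",
    "birds fly like an arrow",
]

model = train_bigrams(corpus)
generated = generate(model)
print(" ".join(generated))
```

    Sampling the next word in proportion to its count, instead of always taking the most frequent one, would produce more varied sequences.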
  14. 14.

    Kana-Kanji conversion

    Conventional Kana-Kanji converters (Japanese IMEs) use heuristics because of limited computational resources.
    • latest-first heuristic
      – among the Kanji candidates, the one used most recently is selected; once a candidate has been used, it is likely to be used again.
    • most-used-first heuristic
      – among the candidates, the most frequently used one is selected.
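
    The two heuristics only need a record of the user's past choices. The class below is a minimal sketch of both; the class name, method names, and the fallback to the first dictionary candidate when no history exists are assumptions made for illustration.

```python
from collections import Counter

class HeuristicConverter:
    """Toy Kana-Kanji candidate selection based on usage history."""

    def __init__(self, candidates):
        self.candidates = candidates  # reading -> list of Kanji candidates
        self.last_used = {}           # reading -> most recently chosen candidate
        self.use_counts = Counter()   # (reading, candidate) -> times chosen

    def record(self, reading, choice):
        """Remember that the user chose `choice` for `reading`."""
        self.last_used[reading] = choice
        self.use_counts[(reading, choice)] += 1

    def latest_first(self, reading):
        """Select the candidate used most recently (first candidate if none)."""
        return self.last_used.get(reading, self.candidates[reading][0])

    def most_used_first(self, reading):
        """Select the most frequently used candidate (first one wins ties)."""
        return max(self.candidates[reading],
                   key=lambda c: self.use_counts[(reading, c)])

converter = HeuristicConverter({"きしゃ": ["記者", "汽車", "貴社", "喜捨"]})
for choice in ["汽車", "汽車", "記者"]:
    converter.record("きしゃ", choice)

print(converter.latest_first("きしゃ"))     # 記者: the last one chosen
print(converter.most_used_first("きしゃ"))  # 汽車: chosen twice
```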
  15. 15.

    Advanced Kana-Kanji conversion

    Recent converters use different strategies:
    • use of n-grams
      – しんぶんきしゃ: the most probable きしゃ to follow しんぶん (新聞, newspaper) is 記者 (reporter).
    • use of case frames / 格フレーム
      – きしゃでかえった: if we know the verb is 帰る (to return), the instrument should be 汽車 (train), not 貴社 (your company), 記者 (reporter), or 喜捨 (charity).
    • use of co-occurrence / 共起
      – モーツァルトのこうえん: in "A of Mozart", A should be 公演 (concert), not 公園 (park) or 講演 (lecture).
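
    The n-gram strategy can be sketched as choosing the candidate with the highest estimated probability given the preceding word. The bi-gram counts below are invented for illustration and are not real corpus statistics; the function name and the fallback behavior are likewise assumptions.

```python
# Hypothetical candidate lists and bi-gram counts (not real corpus statistics).
CANDIDATES = {"きしゃ": ["記者", "汽車", "貴社", "喜捨"]}
BIGRAM_COUNTS = {
    ("新聞", "記者"): 120,
    ("新聞", "汽車"): 1,
    ("新聞", "貴社"): 2,
}

def convert_with_bigram(prev_word, reading):
    """Pick the candidate c that maximizes the estimated P(c | prev_word)."""
    candidates = CANDIDATES[reading]
    counts = {c: BIGRAM_COUNTS.get((prev_word, c), 0) for c in candidates}
    total = sum(counts.values())
    if total == 0:
        # No evidence for this context: fall back to the first candidate.
        return candidates[0], 0.0
    best = max(candidates, key=lambda c: counts[c])
    return best, counts[best] / total

word, prob = convert_with_bigram("新聞", "きしゃ")
print(word, round(prob, 3))
```

    Case frames and co-occurrence could be handled similarly by scoring each candidate against the verb or against other words in the phrase, rather than only the preceding word.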
  16. 16.

    Language statistics and corpus

    • Large amounts of language statistics are required to compute the probabilities explained so far.
    • In the past, not enough language data was available, so such statistics could not be obtained; thanks to the IT revolution, we can now use huge amounts of language data:
      – newspaper texts
      – Web texts
      – patent collections
    • A large collection of language data is called a corpus.
  17. 17.

    Constraint and preference

    In NLP, what we call a "rule" is one of the following two kinds:
    • constraint / 制約
      – an absolute rule; it always holds, so we do not need to consider cases in which it fails.
    • preference / 選好
      – a rule that holds in many cases, but not all.
  18. 18.

    Summary: today's key words

    • n-gram model
    • Kana-Kanji conversion
    • corpus