Natural Language Processing (5) Grammar and parsing (2)

自然言語処理研究室 (Natural Language Processing Laboratory)

October 18, 2013

Transcript

  1. 1.

    Natural Language Processing (5) Grammar and parsing (2)

    Kazuhide Yamamoto, Dept. of Electrical Engineering, Nagaoka University of Technology
  2. 2.

    Parsing: two approaches and two strategies

    The parsing process analyzes an input and produces a tree. It involves
    • two approaches, top-down and bottom-up, and
    • two search strategies, depth-first and breadth-first.
  3. 3.

    Top-down parsing

    searches for a parse tree by trying to build it from the root node S down to the leaves.
    (example) S → NP VP → DET N VP → the N VP → the cat VP → the cat V N ... → the cat catches the mouse.
    (This shows that the sentence is derived by the grammar.)
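The derivation above can be sketched as a small depth-first, top-down recognizer. This is only an illustration under assumptions: the slide does not list the grammar, so the rules below (in particular `VP → V NP`) and the names `GRAMMAR`, `derive`, and `top_down_parse` are hypothetical.

```python
# Assumed toy grammar; the slide only shows the derivation, not the rules.
GRAMMAR = {
    "S": [["NP", "VP"]],
    "NP": [["DET", "N"]],
    "VP": [["V", "NP"]],
    "DET": [["the"]],
    "N": [["cat"], ["mouse"]],
    "V": [["catches"]],
}


def derive(symbols, words):
    """Return True if the symbol sequence can derive exactly `words`.

    Top-down: the leftmost symbol is expanded with each of its rules.
    Depth-first: each alternative is followed to the end before the next
    one is tried (backtracking happens through recursion).
    """
    if not symbols:
        return not words  # success only if all words were consumed
    first, rest = symbols[0], symbols[1:]
    if first not in GRAMMAR:  # terminal: must match the next input word
        return bool(words) and words[0] == first and derive(rest, words[1:])
    return any(derive(rhs + rest, words) for rhs in GRAMMAR[first])


def top_down_parse(sentence):
    """Start from the root node S and try to reach the words."""
    return derive(["S"], sentence.split())


print(top_down_parse("the cat catches the mouse"))
```

This naive recursion would loop forever on a left-recursive rule; the toy grammar here has none.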
  4. 4.

    Bottom-up parsing

    starts with the words of the input, applies rules from the grammar, and tries to build trees from the words upward.
    (example) the cat catches the mouse. ... → the cat V N → the cat VP → the N VP → DET N VP → NP VP → S
    (The derivation of S shows that the input sentence is grammatical, that is, accepted by the given grammar.)
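A bottom-up pass can be sketched as a greedy shift-reduce loop. This is a simplification under assumptions: the rule list is the same hypothetical toy grammar (with `VP → V NP`), the loop reduces left to right rather than in the order shown on the slide, and it has no backtracking, so it would not be adequate for ambiguous grammars.

```python
# Assumed toy grammar as (left-hand side, right-hand side) pairs.
RULES = [
    ("S", ("NP", "VP")),
    ("NP", ("DET", "N")),
    ("VP", ("V", "NP")),
    ("DET", ("the",)),
    ("N", ("cat",)),
    ("N", ("mouse",)),
    ("V", ("catches",)),
]


def shift_reduce(sentence):
    """Greedy shift-reduce recognizer; returns (accepted, trace of stacks)."""
    stack, trace = [], []
    for word in sentence.split():
        stack.append(word)  # shift the next word onto the stack
        reduced = True
        while reduced:  # reduce while some rule matches the top of the stack
            reduced = False
            for lhs, rhs in RULES:
                if tuple(stack[-len(rhs):]) == rhs:
                    stack[-len(rhs):] = [lhs]  # replace the RHS by its LHS
                    trace.append(" ".join(stack))
                    reduced = True
                    break
    return stack == ["S"], trace


accepted, trace = shift_reduce("the cat catches the mouse")
print(accepted)
for step in trace:
    print(step)
```

The input is accepted when the stack has been reduced to the single symbol S.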
  5. 5.

    Chart parsing

    • is suitable for natural language grammars and other ambiguous grammars, because it parses them efficiently.
    • uses the dynamic programming (DP) approach:
      - partial hypothesized results are stored in a structure called a chart and can be reused.
      - This eliminates backtracking and prevents a combinatorial explosion.
  6. 6.

    Chart parsing: representation

    A dot (・) is used within each rule to indicate the progress of the rule's analysis.
    (Example)
    S → ・ NP VP    Nothing has been parsed yet.
    S → NP ・ VP    NP has been parsed successfully; S is made when a VP follows.
    S → NP VP ・    The analysis is finished and S is made. A solid line is used.
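One possible way to represent a dotted rule in code is a small immutable record holding the rule and the dot position. The class name `DottedRule` and its methods are illustrative choices, not something taken from the lecture.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DottedRule:
    lhs: str       # left-hand side, e.g. "S"
    rhs: tuple     # right-hand side symbols, e.g. ("NP", "VP")
    dot: int = 0   # number of right-hand side symbols already parsed

    def is_complete(self):
        """True when the dot is at the end, i.e. the whole rule is parsed."""
        return self.dot == len(self.rhs)

    def next_symbol(self):
        """The symbol right after the dot, or None if the rule is complete."""
        return None if self.is_complete() else self.rhs[self.dot]

    def advance(self):
        """A new rule with the dot moved one symbol to the right."""
        return DottedRule(self.lhs, self.rhs, self.dot + 1)

    def __str__(self):
        symbols = list(self.rhs)
        symbols.insert(self.dot, "・")
        return f"{self.lhs} → {' '.join(symbols)}"


rule = DottedRule("S", ("NP", "VP"))
while True:
    print(rule, "| complete:", rule.is_complete())
    if rule.is_complete():
        break
    rule = rule.advance()
```

Making the record immutable lets the same dotted rule be stored in a set, which is convenient when a chart must avoid duplicate entries.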
  7. 7.

    [Figure: example of a chart graph for カレー を 食べ た, with the word arcs N, P, V, AUXV and the arcs NP → N ・, PP → NP ・ P, and S → ・ PP VP.]
  8. 8.

    Bottom-up chart parsing

    • We first add the inactive word arcs to the graph.
    • We expand these inactive arcs so that we can make active arcs.
    • Parsing succeeds if we make an (inactive) arc S that spans the whole input.
    (See the demo.)
  9. 9.

    Top-down chart parsing

    • We first add an active arc for S (S → ・ ...) to the graph.
    • The input is successfully parsed if the arc S becomes inactive, that is, if S is completed over the whole input.
    (See the demo.)
  10. 10.

    カレー/を/食べ/た (I) ate curry.

    S → PP VP
    PP → NP P
    VP → PP VP
    VP → V AUXV
    NP → N
    N → カレー (curry)
    P → を (OBJ)
    V → 食べ (to eat)
    AUXV → た (PAST)
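The bottom-up chart parsing procedure described earlier can be sketched with this grammar as follows. This is a minimal sketch, not the demo used in the lecture: the arc layout `(start, end, lhs, rhs, dot)`, the agenda-based control, and the function names are assumptions made for illustration.

```python
GRAMMAR = [
    ("S", ("PP", "VP")),
    ("PP", ("NP", "P")),
    ("VP", ("PP", "VP")),
    ("VP", ("V", "AUXV")),
    ("NP", ("N",)),
]
LEXICON = {"カレー": "N", "を": "P", "食べ": "V", "た": "AUXV"}


def bottom_up_chart_parse(words):
    """Return the set of arcs (start, end, lhs, rhs, dot) built for `words`.

    An arc is inactive (complete) when dot == len(rhs), otherwise active.
    """
    chart = set()
    # Step 1: inactive word arcs.
    agenda = [(i, i + 1, LEXICON[w], (w,), 1) for i, w in enumerate(words)]
    while agenda:
        arc = agenda.pop()
        if arc in chart:
            continue  # already stored: reuse instead of recomputing
        chart.add(arc)
        start, end, lhs, rhs, dot = arc
        if dot == len(rhs):  # inactive arc
            # Start every rule whose right-hand side begins with this category.
            for l, r in GRAMMAR:
                if r[0] == lhs:
                    agenda.append((start, end, l, r, 1))
            # Extend active arcs that end here and wait for this category.
            for s, e, l, r, d in chart:
                if e == start and d < len(r) and r[d] == lhs:
                    agenda.append((s, end, l, r, d + 1))
        else:  # active arc: look for an inactive arc that continues it
            for s, e, l, r, d in chart:
                if s == end and d == len(r) and l == rhs[dot]:
                    agenda.append((start, e, lhs, rhs, dot + 1))
    return chart


def accepts(words):
    """True if an inactive arc S spans the whole input."""
    chart = bottom_up_chart_parse(words)
    return any(s == 0 and e == len(words) and l == "S" and d == len(r)
               for s, e, l, r, d in chart)


print(accepts(["カレー", "を", "食べ", "た"]))
```

Because every arc is stored in the chart once, partial results such as the PP over カレー を are shared by the S and VP rules that both start with PP.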
  11. 11.

    CYK algorithm

    • is short for the Cocke-Younger-Kasami algorithm. It is also called the CKY algorithm.
    • is a very efficient bottom-up dynamic programming parsing algorithm (cubic time in the sentence length).
    • can be used if all rules are written in Chomsky normal form (チョムスキー標準形):
      - A → BC or A → α, where A, B, and C are non-terminals and α is a terminal.
    (I will demonstrate how it works.)
  12. 12.

    カレー/を/食べ/た (I) ate curry.

    S → PP VP
    PP → N P
    VP → PP VP
    VP → V AUXV
    N → カレー (curry)
    P → を (OBJ)
    V → 食べ (to eat)
    AUXV → た (PAST)
    (The rules are slightly changed in order to meet the requirements of Chomsky normal form. Compare with the grammar two slides before.)
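With the grammar in Chomsky normal form, the CYK table can be filled as in the sketch below. The table layout (`table[i][j]` holds the categories covering words i to j-1) and the names `BINARY`, `LEXICAL`, and `cyk` are choices made for this illustration.

```python
# Binary rules A → B C, indexed by the pair (B, C).
BINARY = {
    ("PP", "VP"): {"S", "VP"},
    ("N", "P"): {"PP"},
    ("V", "AUXV"): {"VP"},
}
# Lexical rules A → α, indexed by the word α.
LEXICAL = {"カレー": "N", "を": "P", "食べ": "V", "た": "AUXV"}


def cyk(words):
    """Fill the CYK table; table[i][j] is the set of categories for words[i:j]."""
    n = len(words)
    table = [[set() for _ in range(n + 1)] for _ in range(n)]
    for i, word in enumerate(words):  # spans of length 1
        if word in LEXICAL:
            table[i][i + 1].add(LEXICAL[word])
    for span in range(2, n + 1):  # longer spans, shortest first
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):  # split point between the two parts
                for b in table[i][k]:
                    for c in table[k][j]:
                        table[i][j] |= BINARY.get((b, c), set())
    return table


words = ["カレー", "を", "食べ", "た"]
table = cyk(words)
print("S" in table[0][len(words)])
```

The sentence is accepted when S appears in the cell covering the whole input. With this grammar that cell also contains VP, because VP → PP VP combines the same two cells as S → PP VP.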
  13. 13.

    CYK algorithm

    [Figure: empty CYK table for カレー を 食べ た. Grammar: S → PP VP, PP → N P, VP → PP VP, VP → V AUXV.]
  14. 14.

    [Figure: CYK table for カレー を 食べ た with the word cells N, P, V, AUXV filled in; the analysis target cell is marked. Same grammar as on slide 12.]
  15. 15.

    [Figure: CYK table for カレー を 食べ た with N, P, V, AUXV, and PP filled in; one pair of cells is being combined (+ = ?). Same grammar as on slide 12.]
  16. 16.

    [Figure: CYK table for カレー を 食べ た with N, P, V, AUXV, and PP filled in; one pair of cells is being combined (+ = ?). Same grammar as on slide 12.]
  17. 17.

    [Figure: CYK table for カレー を 食べ た with N, P, V, AUXV, PP, and VP filled in; one pair of cells is being combined (+ = ?). Same grammar as on slide 12.]
  18. 18.

    [Figure: CYK table for カレー を 食べ た with N, P, V, AUXV, PP, and VP filled in; two pairs of cells are being combined (+ = ? twice). Same grammar as on slide 12.]
  19. 19.

    [Figure: CYK table for カレー を 食べ た with N, P, V, AUXV, PP, and VP filled in; two pairs of cells are being combined (+ = ? twice). Same grammar as on slide 12.]
  20. 20.

    [Figure: completed CYK table for カレー を 食べ た with N, P, V, AUXV, PP, VP, and S filled in; three pairs of cells are combined (+ = ? three times). Same grammar as on slide 12.]
  21. 21.

    Summary: today's key words

    • bottom-up and top-down approach
    • chart parsing
    • CYK parsing