Natural Language Processing (4) Grammar and parsing (1)

自然言語処理研究室

October 11, 2013

Transcript

  1. 1.

    1 / 16 Natural Language Processing (4) Grammar and parsing (1)
    Kazuhide Yamamoto, Dept. of Electrical Engineering, Nagaoka University of Technology
  2. 2.

    2 / 16 Grammar and parsing
    • What is grammar?
    • How do we describe grammar?
  3. 3.

    3 / 16 Grammar and parsing
    • A formal grammar is a set of formation rules for strings, i.e. rules about which symbols may appear in which order. We simply call it a grammar for short.
    • Parsing is the process of syntactic analysis: it analyzes a given text to determine its syntactic structure with respect to a given formal grammar.
  4. 4.

    4 / 16 Phrase structure grammar / 句構造文法
    • Noam Chomsky proposed phrase structure grammar, which describes grammar in a formal manner.
    • A phrase structure grammar is defined by a set of phrase structure rules, that is, rewrite rules.
    • A rule consists of terminal and non-terminal symbols; the non-terminals are rewritten during the derivation process.
    (example rules)
    S → np vp
    np → "dogs"
    np → "birds"
    np → "two" np
    vp → "fly"
    vp → "sleep"
    (sentences generated by the rules)
    birds fly
    two dogs sleep
    two birds fly
    (?) dogs fly
    (?) two two dogs sleep
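The derivation process for the example rules can be sketched in a few lines of Python. This is only an illustration, not part of the lecture: the names `RULES` and `generate` and the `max_steps` bound are assumptions introduced here, and the bound is needed because the rule np → "two" np can be applied any number of times.

```python
# Example rules from the slide. Keys are non-terminal symbols;
# any symbol that is not a key is treated as a terminal.
RULES = {
    "S": [["np", "vp"]],
    "np": [["dogs"], ["birds"], ["two", "np"]],
    "vp": [["fly"], ["sleep"]],
}


def generate(symbols=("S",), max_steps=5):
    """Yield each sentence derivable from `symbols` in at most `max_steps` rewrites."""
    for i, sym in enumerate(symbols):
        if sym in RULES:
            # Rewrite the leftmost non-terminal with each of its rules.
            if max_steps == 0:
                return  # rewrite budget exhausted on this branch
            for rhs in RULES[sym]:
                rewritten = symbols[:i] + tuple(rhs) + symbols[i + 1:]
                yield from generate(rewritten, max_steps - 1)
            return
    # No non-terminal is left, so the string is a sentence.
    yield " ".join(symbols)


if __name__ == "__main__":
    for sentence in generate():
        print(sentence)
```

The rules only constrain word order, so the generator also produces the questionable strings listed on the slide, such as "two two dogs sleep".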
  5. 5.

    5 / 16 Chomsky hierarchy / チョムスキー階層
    In formal language theory, four classes of grammar, known as the Chomsky hierarchy, are described.
    • Type 0: unrestricted grammar / 0型文法 – generates exactly the languages that can be recognized by a Turing machine.
    • Type 1: context-sensitive grammar / 文脈依存文法 – αAβ → αγβ ; rules are written with context.
    • Type 2: context-free grammar / 文脈自由文法 – A → γ ; rules are written without context.
    • Type 3: regular grammar / 正規文法 – A → a or A → aB
    where A and B are single non-terminals, a is a single terminal, and α, β, and γ are strings of non-terminals and/or terminals.
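As a rough illustration of the rule shapes above, the following sketch classifies a single rewrite rule by the most restrictive type whose pattern it matches. It rests on several assumptions that are not in the slides: rules are pairs of tuples, non-terminals are written in upper case, γ is required to be non-empty, and the names `is_nonterminal` and `classify` are hypothetical. A grammar's type would be the least restrictive type among its rules.

```python
def is_nonterminal(symbol):
    # Assumed convention for this sketch: upper-case symbols are non-terminals.
    return symbol.isupper()


def classify(lhs, rhs):
    """Return 3, 2, 1, or 0: the most restrictive rule shape that lhs → rhs matches."""
    single = len(lhs) == 1 and is_nonterminal(lhs[0])
    # Type 3: A → a  or  A → aB
    if single and rhs and not is_nonterminal(rhs[0]):
        if len(rhs) == 1 or (len(rhs) == 2 and is_nonterminal(rhs[1])):
            return 3
    # Type 2: A → γ (γ assumed non-empty here)
    if single and rhs:
        return 2
    # Type 1: αAβ → αγβ with non-empty γ; try each non-terminal of lhs as A.
    for i, sym in enumerate(lhs):
        if not is_nonterminal(sym):
            continue
        alpha, beta = lhs[:i], lhs[i + 1:]
        gamma_len = len(rhs) - len(alpha) - len(beta)
        if (gamma_len >= 1
                and rhs[:len(alpha)] == alpha
                and rhs[len(rhs) - len(beta):] == beta):
            return 1
    # Type 0: anything else
    return 0


if __name__ == "__main__":
    print(classify(("A",), ("a", "B")))                # regular shape
    print(classify(("S",), ("NP", "VP")))              # context-free shape
    print(classify(("a", "A", "b"), ("a", "x", "b")))  # context-sensitive shape
    print(classify(("A", "B"), ("B", "A")))            # unrestricted only
```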
  6. 7.

    7 / 16 Phrase structure grammar: example
    S → NP VP
    NP → DET N N
    NP → DET N
    NP → N
    VP → V NP NP
    VP → V NP
    N → Mary | foods
    V → sold
    DET → the
    S: sentence, NP: noun phrase, VP: verb phrase, DET: determiner, N: noun, V: verb
    Terminal symbols / 終端記号: Mary, foods, sold, and the.
    Non-terminal symbols / 非終端記号: others.
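To make parsing with this grammar concrete, here is a minimal sketch of a top-down backtracking parser. It is not the method of the lecture; the function names (`parse`, `expand`, `parse_sentence`) and the nested-tuple tree format are assumptions made for illustration. The naive top-down strategy terminates here only because the grammar has no left-recursive rule.

```python
# The example grammar. Keys are non-terminals; other symbols are terminals.
GRAMMAR = {
    "S": [["NP", "VP"]],
    "NP": [["DET", "N", "N"], ["DET", "N"], ["N"]],
    "VP": [["V", "NP", "NP"], ["V", "NP"]],
    "N": [["Mary"], ["foods"]],
    "V": [["sold"]],
    "DET": [["the"]],
}


def parse(symbol, words, start):
    """Yield (tree, end) for every way `symbol` can cover words[start:end]."""
    if symbol not in GRAMMAR:
        # Terminal symbol: it must match the next word exactly.
        if start < len(words) and words[start] == symbol:
            yield symbol, start + 1
        return
    for rhs in GRAMMAR[symbol]:
        for children, end in expand(rhs, words, start):
            yield (symbol, *children), end


def expand(rhs, words, start):
    """Yield (children, end) for every way the sequence `rhs` can cover words[start:end]."""
    if not rhs:
        yield [], start
        return
    for tree, mid in parse(rhs[0], words, start):
        for rest, end in expand(rhs[1:], words, mid):
            yield [tree] + rest, end


def parse_sentence(sentence):
    """Return all parse trees of `sentence` that are rooted in S and cover every word."""
    words = sentence.split()
    return [tree for tree, end in parse("S", words, 0) if end == len(words)]


if __name__ == "__main__":
    for tree in parse_sentence("Mary sold the foods"):
        print(tree)
```

A sentence is accepted when at least one tree is returned; a word sequence that the rules cannot derive, such as "sold Mary the", returns an empty list.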
  7. 8.

    8 / 16 Finite-state automaton, FSA
    • A model that is composed of a finite number of states, transitions between states, and actions.
    (Figure: state diagram with states Start, S1, and End and transitions labeled two, birds, dogs, fly, and sleep.)
  8. 9.

    9 / 16 FSA: an example
    The following simple two-state FSA accepts the following strings:
    • st, sst, ssst, stst, stsst, ...
    but it does NOT accept:
    • s, t, stt, sts, ...
    (Figure: two-state diagram with transitions labeled s and t; the starting point and the accepted state(s) are marked.)
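The diagram itself is not recoverable from the transcript, so the sketch below is only one two-state machine that is consistent with the accepted and rejected strings listed above. The state names `q0` and `q1` and the transition table are assumptions: `q0` is taken to be both the starting point and the accepted state, which means this particular machine also accepts the empty string.

```python
START = "q0"
ACCEPTING = {"q0"}

# (state, input symbol) -> next state. A missing entry means "no transition".
TRANSITIONS = {
    ("q0", "s"): "q1",
    ("q1", "s"): "q1",
    ("q1", "t"): "q0",
}


def accepts(string):
    """Return True if the FSA ends in an accepting state after reading `string`."""
    state = START
    for symbol in string:
        state = TRANSITIONS.get((state, symbol))
        if state is None:
            return False  # no transition is defined, so the string is rejected
    return state in ACCEPTING


if __name__ == "__main__":
    for string in ["st", "sst", "ssst", "stst", "stsst", "s", "t", "stt", "sts"]:
        print(string, accepts(string))
```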
  9. 10.

    10 / 16 Example: noun + {は、が、を、に、だけ、こそ} (Note: This is incomplete!)
    (Figure: FSA with transitions labeled noun, が, を, に, は, こそ, and だけ.)
  10. 11.

    11 / 16 Context-free grammar, CFG
    • CFG is an important grammar formalism for describing block (nested) structure.
    • A language generated by a CFG can be recognized by a pushdown automaton.
    • A CFG derivation can be represented as a tree.
    (Figure: two trees for the phrase "book of you and me".)
  11. 12.

    12 / 16 Problems in phrase structure grammar
    Phrase structure grammar has problems:
    • A tree doesn't represent the meaning of a sentence – cf. case grammar.
    • Many rules are required to cover actual sentences as input.
    – e.g. in English, different rules are required to distinguish singular from plural.
    – Even more rules are required for "free word order" languages (e.g. Japanese).
  12. 13.

    13 / 16 Case grammar / 格文法
    Case grammar was created by Charles J. Fillmore in 1968. It
    • focuses on the link between the valence of a verb (the number of subjects, objects, etc. it takes) and the grammatical context it requires.
    • analyzes the surface syntactic structure of sentences by studying the combination of deep cases (i.e. semantic roles) which are required by a specific verb.
    • represents well the grammars of agglutinative languages such as Japanese and Korean.
  13. 14.

    14 / 16 Three types of natural language
    • isolating language / 孤立言語 – e.g. Chinese – Words do not change form.
    • inflectional language / 屈折言語 – e.g. English and many European languages – The form of a word changes according to (grammatical) gender, number, and case.
    • agglutinative language / 膠着言語 – e.g. Japanese and Korean – Function words marking gender, number, and case are appended.
  14. 15.

    15 / 16 Case grammar: example
    John opened the door with the key.
    John opened the door.
    The door opened.
    The key opened the door.
    The verb "open" takes the following cases:
    • agent (John)
    • objective (the door)
    • instrumental (the key)
    Note that the subject, object, etc. of the surface structure do not directly correspond to the deep cases, as can be seen above.
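The mismatch between surface structure and deep cases can be written down as data. In the sketch below, the deep cases follow the slide (agent, objective, instrumental), while the surface labels ("subject", "object", "with-phrase") and the names `ANALYSES` and `subject_case` are annotations assumed for this illustration.

```python
# For each example sentence: surface position -> (phrase, deep case of "open").
ANALYSES = {
    "John opened the door with the key.": {
        "subject": ("John", "agent"),
        "object": ("the door", "objective"),
        "with-phrase": ("the key", "instrumental"),
    },
    "John opened the door.": {
        "subject": ("John", "agent"),
        "object": ("the door", "objective"),
    },
    "The door opened.": {
        "subject": ("the door", "objective"),
    },
    "The key opened the door.": {
        "subject": ("the key", "instrumental"),
        "object": ("the door", "objective"),
    },
}


def subject_case(sentence):
    """Return the deep case that fills the surface subject of `sentence`."""
    return ANALYSES[sentence]["subject"][1]


if __name__ == "__main__":
    for sentence in ANALYSES:
        print(sentence, "->", subject_case(sentence))
```

Across the four sentences the surface subject is filled by three different deep cases, while "the door" is the objective in every sentence regardless of its surface position.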
  15. 16.

    16 / 16 Summary: today's key words
    • phrase structure grammar
    • Chomsky hierarchy
    • case grammar