Slide 1

Slide 1 text

Natural Language Processing (4): Grammar and parsing (1)
Kazuhide Yamamoto
Dept. of Electrical Engineering, Nagaoka University of Technology

Slide 2

Slide 2 text

Grammar and parsing
● What is grammar?
● How do we describe grammar?

Slide 3

Slide 3 text

Grammar and parsing
● A formal grammar is a set of formation rules for (the order of) strings. We simply call it a grammar for short.
● Parsing is the process of syntactic analysis: it analyzes a given text to determine its syntactic structure with respect to a given formal grammar.

Slide 4

Slide 4 text

Phrase structure grammar / 句構造文法
● Noam Chomsky proposed phrase structure grammar, which describes grammar in a formal manner.
● A phrase structure grammar is defined by a set of phrase structure rules, that is, rewriting rules.
● A rule consists of terminal and non-terminal symbols; the non-terminal symbols are rewritten during the derivation process.
(sentences generated by the rules)
birds fly
two dogs sleep
two birds fly
(?) dogs fly
(?) two two dogs sleep
(example rules)
S → np vp
np → "dogs"
np → "birds"
np → "two" np
vp → "fly"
vp → "sleep"
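The derivation process for the example rules can be sketched in a few lines of Python. This is only an illustration, not part of the lecture: the names `RULES`, `derives`, and `accepts` are hypothetical, and a symbol is treated as a terminal whenever it has no entry in `RULES`.

```python
# Example rules from the slide. Each non-terminal maps to its right-hand sides.
# Assumption of this sketch: a symbol without an entry in RULES is a terminal.
RULES = {
    "S": [["np", "vp"]],
    "np": [["dogs"], ["birds"], ["two", "np"]],
    "vp": [["fly"], ["sleep"]],
}

def derives(symbols, words):
    """Return True if the symbol sequence can be rewritten into exactly `words`."""
    if not symbols:
        return not words
    first, rest = symbols[0], symbols[1:]
    if first not in RULES:
        # Terminal symbol: it must match the next input word.
        return bool(words) and words[0] == first and derives(rest, words[1:])
    # Non-terminal symbol: try every rewriting rule (leftmost derivation).
    return any(derives(rhs + rest, words) for rhs in RULES[first])

def accepts(sentence):
    """Check whether the sentence can be derived from the start symbol S."""
    return derives(["S"], sentence.split())

for sentence in ["birds fly", "two dogs sleep", "dogs fly", "two two dogs sleep", "fly birds"]:
    print(sentence, "->", accepts(sentence))
```

The sentences marked (?) on the slide are also accepted: the rules say nothing about meaning or about how many times "two" may repeat, so the grammar overgenerates.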

Slide 5

Slide 5 text

Chomsky hierarchy / チョムスキー階層
In formal language theory, four classes of grammar, known as the Chomsky hierarchy, are distinguished.
● Type 0: unrestricted grammar / (0型文法)
– generates every language that can be recognized by a Turing machine.
● Type 1: context-sensitive grammar / 文脈依存文法
– αAβ → αγβ ; rules are written with context.
● Type 2: context-free grammar / 文脈自由文法
– A → γ ; rules are written without context.
● Type 3: regular grammar / 正規文法
– A → a or A → aB
where A and B are single non-terminals, a is a single terminal, and α, β, and γ are strings of non-terminals or terminals (γ must be non-empty in type 1 rules).
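The rule shapes above can be checked mechanically. The following is a minimal sketch, assuming a rule is given as two lists of symbols and that upper-case symbols are non-terminals; both conventions, and the names `is_nonterminal` and `rule_type`, are choices made for this illustration. It follows the rule forms exactly as written on the slide, so an empty right-hand side is reported as type 0.

```python
def is_nonterminal(sym):
    # Assumption for this sketch: upper-case symbols are non-terminals.
    return sym.isupper()

def rule_type(lhs, rhs):
    """Return the most restrictive type (3, 2, 1 or 0) whose rule form fits."""
    if len(lhs) == 1 and is_nonterminal(lhs[0]):
        # Type 3: A -> a  or  A -> a B
        if len(rhs) == 1 and not is_nonterminal(rhs[0]):
            return 3
        if len(rhs) == 2 and not is_nonterminal(rhs[0]) and is_nonterminal(rhs[1]):
            return 3
        # Type 2: A -> gamma (gamma non-empty in this sketch)
        if rhs:
            return 2
    # Type 1: alpha A beta -> alpha gamma beta, with gamma non-empty.
    for i, sym in enumerate(lhs):
        if not is_nonterminal(sym):
            continue
        alpha, beta = lhs[:i], lhs[i + 1:]
        long_enough = len(rhs) > len(alpha) + len(beta)
        same_prefix = rhs[:len(alpha)] == alpha
        same_suffix = rhs[len(rhs) - len(beta):] == beta
        if long_enough and same_prefix and same_suffix:
            return 1
    # Anything else is only an unrestricted (type 0) rule.
    return 0

print(rule_type(["A"], ["a", "B"]))                   # regular
print(rule_type(["S"], ["NP", "VP"]))                 # context-free
print(rule_type(["a", "B", "c"], ["a", "x", "y", "c"]))  # context-sensitive
print(rule_type(["A", "B"], ["B", "A"]))              # unrestricted
```

A grammar as a whole belongs to the least restrictive type among its rules, so one could take the minimum of `rule_type` over all rules.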

Slide 6

Slide 6 text

(Figure: the four classes of the hierarchy, each contained in the one before it: type 0: unrestricted ⊃ type 1: context-sensitive ⊃ type 2: context-free ⊃ type 3: regular)

Slide 7

Slide 7 text

Phrase structure grammar: example
S → NP VP
NP → DET N N
NP → DET N
NP → N
VP → V NP NP
VP → V NP
N → Mary | foods
V → sold
DET → the
S: sentence, NP: noun phrase, VP: verb phrase, DET: determiner, N: noun, V: verb
Terminal symbols / 終端記号: Mary, foods, sold, and the.
Non-terminal symbols / 非終端記号: others.
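To make the link between these rules and parsing concrete, here is a small top-down parser with backtracking for exactly this grammar. It is a sketch under stated assumptions rather than a method from the lecture: the names `GRAMMAR`, `parse`, `expand`, and `parse_sentence` are hypothetical, trees are plain nested tuples, and the approach only terminates because the grammar has no left-recursive rule.

```python
# The grammar from the slide; symbols without an entry are terminals.
GRAMMAR = {
    "S": [["NP", "VP"]],
    "NP": [["DET", "N", "N"], ["DET", "N"], ["N"]],
    "VP": [["V", "NP", "NP"], ["V", "NP"]],
    "N": [["Mary"], ["foods"]],
    "V": [["sold"]],
    "DET": [["the"]],
}

def parse(symbol, words, start):
    """Yield (tree, end) for every way `symbol` can cover words[start:end]."""
    if symbol not in GRAMMAR:
        # Terminal symbol: it must equal the word at this position.
        if start < len(words) and words[start] == symbol:
            yield symbol, start + 1
        return
    for rhs in GRAMMAR[symbol]:
        yield from expand(symbol, rhs, words, start, [])

def expand(symbol, rhs, words, pos, children):
    """Match the remaining right-hand side `rhs`, starting at position `pos`."""
    if not rhs:
        yield (symbol, children), pos
        return
    for child, nxt in parse(rhs[0], words, pos):
        yield from expand(symbol, rhs[1:], words, nxt, children + [child])

def parse_sentence(sentence):
    """Return all parse trees of S that cover the whole sentence."""
    words = sentence.split()
    return [tree for tree, end in parse("S", words, 0) if end == len(words)]

for tree in parse_sentence("Mary sold the foods"):
    print(tree)
```

Each tree is a `(symbol, children)` pair, so the nesting of the printed tuples mirrors the syntactic structure that parsing is meant to recover.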

Slide 8

Slide 8 text

Finite-state automaton (FSA)
● A model that is composed of a finite number of states, transitions between states, and actions.
(Figure: an FSA running from Start to End; state label: S1; transition labels: two, birds, dogs, fly, sleep)

Slide 9

Slide 9 text

FSA: an example
The following simple two-state FSA accepts these strings:
● st, sst, ssst, stst, stsst, ...
but it does NOT accept:
● s, t, stt, sts, ...
(Figure: two-state FSA with transitions labeled s and t; the starting point and the accepting state(s) are marked)
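An FSA of this kind can be simulated with a transition table. The table below is a reconstruction from the accepted and rejected strings listed on the slide, not a copy of the original diagram, and the state names `q0` and `q1` are made up for this sketch. With only two states, the start state has to double as the accepting state, so this reconstruction also accepts the empty string.

```python
# Transition table reconstructed from the example strings (an assumption:
# the original diagram is not reproduced here).
TRANSITIONS = {
    ("q0", "s"): "q1",
    ("q1", "s"): "q1",
    ("q1", "t"): "q0",
}
START = "q0"
ACCEPTING = {"q0"}

def fsa_accepts(string):
    """Run the FSA over the string; missing transitions mean rejection."""
    state = START
    for ch in string:
        state = TRANSITIONS.get((state, ch))
        if state is None:
            return False
    return state in ACCEPTING

for s in ["st", "sst", "stst", "stsst", "s", "t", "stt", "sts"]:
    print(s, "->", fsa_accepts(s))
```

The same loop works for any FSA; only the table, the start state, and the set of accepting states change.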

Slide 10

Slide 10 text

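Example: noun + {は (wa), が (ga), を (o), に (ni), だけ (dake), こそ (koso)}
(Note: This is incomplete!)
(Figure: FSA for the particles that can follow a noun; transition labels: が、を / に / は / こそ / だけ)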

Slide 11

Slide 11 text

Context-free grammar (CFG)
● CFG is an important grammar formalism for describing block structure.
● A language described by a CFG can be recognized by a pushdown automaton.
● A CFG derivation can be represented as a tree.
(Figure: two trees for the same phrase, "book of you and me")
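The phrase "book of you and me" can be used to show that one string may have more than one tree. The sketch below assumes a tiny grammar that is not given on the slide (NP → NP "of" NP, NP → NP "and" NP, NP → "book" | "you" | "me") and enumerates every tree as a bracketed string; the names `WORDS`, `NOUNS`, `CONNECTIVES`, and `trees` are hypothetical.

```python
from functools import lru_cache

WORDS = ("book", "of", "you", "and", "me")
NOUNS = {"book", "you", "me"}
CONNECTIVES = {"of", "and"}

@lru_cache(maxsize=None)
def trees(i, j):
    """Return all NP trees covering WORDS[i:j], written as bracketed strings."""
    if j - i == 1:
        return (WORDS[i],) if WORDS[i] in NOUNS else ()
    found = []
    for k in range(i + 1, j - 1):  # candidate position of the connective
        if WORDS[k] in CONNECTIVES:
            for left in trees(i, k):
                for right in trees(k + 1, j):
                    found.append(f"[{left} {WORDS[k]} {right}]")
    return tuple(found)

for t in trees(0, len(WORDS)):
    print(t)
# prints:
# [book of [you and me]]
# [[book of you] and me]
```

The two bracketings correspond to two readings: a book about both people, or a book about you together with me. The grammar alone cannot choose between them.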

Slide 12

Slide 12 text

Problems in phrase structure grammar
Phrase structure grammar has problems:
● A tree does not represent the meaning of a sentence (cf. case grammar).
● Many rules are required to cover the actual sentences given as input.
– e.g. In English, different rules are required to distinguish singular from plural.
– Even more rules are needed for "free word order" languages (e.g. Japanese).

Slide 13

Slide 13 text

Case grammar / 格文法
Case grammar was created by Charles J. Fillmore in 1968.
● It focuses on the link between the number of subjects, objects, etc., of a verb and the grammatical context.
● It analyzes the surface syntactic structure of sentences by studying the combination of deep cases (i.e. semantic roles) that are required by a specific verb.
● It represents well the grammars of agglutinative languages such as Japanese and Korean.

Slide 14

Slide 14 text

Three types of natural language
● isolating language / 孤立言語
– e.g. Chinese
– Words do not change their form.
● inflectional language / 屈折言語
– e.g. English and many European languages
– The form of a word changes according to (grammatical) gender, number, and case.
● agglutinative language / 膠着言語
– e.g. Japanese and Korean
– Function words marking gender, number, and case are appended.

Slide 15

Slide 15 text

Case grammar: example
John opened the door with the key.
John opened the door.
The door opened.
The key opened the door.
The verb "open" takes the following cases:
● agent (John)
● objective (the door)
● instrumental (the key)
Note that the subject, object, etc. of the surface structure do not directly correspond to the deep cases, as can be seen above.
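The four sentences can be written down as case frames to make the mismatch between surface subject and deep case explicit. This is a minimal sketch: the dictionary layout and the names `FRAMES`, `SUBJECT_PRECEDENCE`, and `surface_subject` are assumptions of this example, and the precedence rule (agent first, then instrumental, then objective) is the subject-selection rule commonly attributed to Fillmore's account, not something stated on the slide.

```python
# Deep cases present in each example sentence (taken from the slide).
FRAMES = {
    "John opened the door with the key.": {
        "agent": "John", "objective": "the door", "instrumental": "the key",
    },
    "John opened the door.": {"agent": "John", "objective": "the door"},
    "The door opened.": {"objective": "the door"},
    "The key opened the door.": {"instrumental": "the key", "objective": "the door"},
}

# Assumed precedence for choosing the surface subject (not from the slide).
SUBJECT_PRECEDENCE = ("agent", "instrumental", "objective")

def surface_subject(frame):
    """Return (deep case, phrase) of the phrase that surfaces as the subject."""
    for case in SUBJECT_PRECEDENCE:
        if case in frame:
            return case, frame[case]
    raise ValueError("frame has no case that can become the subject")

for sentence, frame in FRAMES.items():
    print(sentence, "->", surface_subject(frame))
```

Running it shows the subject filled by an agent, an objective, or an instrumental depending on which cases are present, which is the point of the note above.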

Slide 16

Slide 16 text

Summary: today's key words
● phrase structure grammar
● Chomsky hierarchy
● case grammar