Natural Language Processing
(4) Grammar and parsing (1)
Kazuhide Yamamoto
Dept. of Electrical Engineering
Nagaoka University of Technology
Slide 2
Grammar and parsing
● What is grammar?
● How do we describe grammar?
Slide 3
Grammar and parsing
● A formal grammar is a set of formation rules for strings, i.e., rules that specify in what order symbols may appear. We simply call it a grammar for short.
● Parsing is the process of syntactic analysis that analyzes a given text to determine its syntactic structure with respect to a given formal grammar.
Slide 4
Phrase structure grammar / 句構造文法
● Noam Chomsky proposed phrase structure grammar, which describes grammar in a formal manner.
● A phrase structure grammar is defined by a set of phrase structure rules, that is, rewriting rules.
● A rule consists of terminal and non-terminal symbols; the non-terminals are rewritten during the derivation process.
(sentences generated by the rules)
birds fly
two dogs sleep
two birds fly
(?) dogs fly
(?) two two dogs sleep
(example rules)
S → np vp
np → "dogs"
np → "birds"
np → "two" np
vp → "fly"
vp → "sleep"
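A small sketch of how these rules derive sentences, written in Python. The names RULES and generate, and the depth limit, are illustrative assumptions: because np → "two" np is recursive, the language is infinite, so the sketch only applies rules up to a fixed nesting depth.

```python
import itertools

# The example rules above; any symbol without an entry is a terminal.
RULES = {
    "S": [["np", "vp"]],
    "np": [["dogs"], ["birds"], ["two", "np"]],
    "vp": [["fly"], ["sleep"]],
}


def generate(symbol, depth):
    """Yield every word sequence derivable from `symbol`
    using at most `depth` nested rule applications (hypothetical helper)."""
    if symbol not in RULES:          # terminal symbol: nothing to rewrite
        yield [symbol]
        return
    if depth == 0:                   # cut off the recursion of np -> "two" np
        return
    for rhs in RULES[symbol]:
        # Expand each right-hand-side symbol, then combine all choices.
        expansions = [list(generate(s, depth - 1)) for s in rhs]
        for parts in itertools.product(*expansions):
            yield [word for part in parts for word in part]


sentences = {" ".join(words) for words in generate("S", 4)}
```

With a depth limit of 4 the set contains 12 sentences, including both "(?)" examples: the rules do generate "dogs fly" and "two two dogs sleep", even though these sentences are questionable as English.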
Slide 5
Chomsky hierarchy / チョムスキー階層
In formal language theory, grammars are divided into four classes, known as the Chomsky hierarchy.
● Type 0: unrestricted grammar / 0型文法
– generates the recursively enumerable languages, i.e., exactly the languages recognized by Turing machines.
● Type 1: context-sensitive grammar / 文脈依存文法
– αAβ → αγβ; A may be rewritten as γ only in the context of α and β (γ non-empty).
● Type 2: context-free grammar / 文脈自由文法
– A → γ; A may be rewritten as γ regardless of its context.
● Type 3: regular grammar / 正規文法
– A → a or A → aB
where A and B are single non-terminal symbols, a is a single terminal symbol, and α, β, and γ are strings of terminals and/or non-terminals.
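The rule shapes above can be checked mechanically. The sketch below assumes, purely for illustration, that non-terminals are written in upper case and terminals in lower case; the function names rule_type and grammar_type are hypothetical. A grammar belongs to the most restrictive type that all of its rules satisfy.

```python
def is_nonterminal(sym):
    # Assumption: non-terminals are upper case (A, B, S), terminals lower case (a, b).
    return sym.isupper()


def rule_type(lhs, rhs):
    """Return the most restrictive Chomsky type (3, 2, 1, or 0) of one rule.

    lhs and rhs are tuples of symbols, e.g. ("A",) -> ("a", "B").
    """
    single_nt = len(lhs) == 1 and is_nonterminal(lhs[0])
    # Type 3: A -> a  or  A -> aB
    if single_nt and (
        (len(rhs) == 1 and not is_nonterminal(rhs[0]))
        or (len(rhs) == 2 and not is_nonterminal(rhs[0]) and is_nonterminal(rhs[1]))
    ):
        return 3
    # Type 2: A -> gamma (a single non-terminal on the left)
    if single_nt:
        return 2
    # Type 1: alpha A beta -> alpha gamma beta, with gamma non-empty
    for i, sym in enumerate(lhs):
        if not is_nonterminal(sym):
            continue
        alpha, beta = lhs[:i], lhs[i + 1:]
        gamma_len = len(rhs) - len(alpha) - len(beta)
        if (gamma_len >= 1
                and rhs[:len(alpha)] == alpha
                and rhs[len(rhs) - len(beta):] == beta):
            return 1
    # Anything else is only unrestricted.
    return 0


def grammar_type(rules):
    # The grammar's type is the minimum over the types of its rules.
    return min(rule_type(lhs, rhs) for lhs, rhs in rules)
```

For example, the rule a A b → a c B b is classified as type 1, because A is rewritten as c B only between a and b, while A B → B A fits none of the restricted shapes and is type 0.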
Slide 6
(Figure: the four classes drawn as nested regions: type 3 (regular) ⊂ type 2 (context-free) ⊂ type 1 (context-sensitive) ⊂ type 0 (unrestricted).)
Slide 7
Phrase structure grammar: example
S → NP VP
NP → DET N N
NP → DET N
NP → N
VP → V NP NP
VP → V NP
N → Mary | foods
V → sold
DET → the
S: sentence
NP: noun phrase
VP: verb phrase
DET: determiner
N: noun
V: verb
Terminal symbols / 終端記号: Mary, foods, sold, and the.
Non-terminal symbols / 非終端記号: all the others (S, NP, VP, DET, N, V).
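To see how a parser uses these rules, here is a minimal sketch of a naive top-down (recursive-descent) parser with backtracking. It is one simple parsing strategy among many, not necessarily the one intended in this course; it works here because the grammar has no left-recursive rules. The names GRAMMAR, parse, and parse_sentence are illustrative.

```python
# The example grammar above; symbols without an entry are terminals.
GRAMMAR = {
    "S": [["NP", "VP"]],
    "NP": [["DET", "N", "N"], ["DET", "N"], ["N"]],
    "VP": [["V", "NP", "NP"], ["V", "NP"]],
    "N": [["Mary"], ["foods"]],
    "V": [["sold"]],
    "DET": [["the"]],
}


def parse(symbol, words, pos):
    """Yield (tree, next_pos) for every way `symbol` can derive words[pos:next_pos]."""
    if symbol not in GRAMMAR:                      # terminal: must match the next word
        if pos < len(words) and words[pos] == symbol:
            yield symbol, pos + 1
        return
    for rhs in GRAMMAR[symbol]:
        # Expand the right-hand side left to right, keeping every partial analysis.
        partials = [([], pos)]
        for child in rhs:
            partials = [(kids + [tree], end)
                        for kids, start in partials
                        for tree, end in parse(child, words, start)]
        for kids, end in partials:
            yield (symbol, *kids), end


def parse_sentence(sentence):
    words = sentence.split()
    # Keep only the analyses that consume the whole sentence.
    return [tree for tree, end in parse("S", words, 0) if end == len(words)]
```

parse_sentence("Mary sold the foods") returns a single tree, ("S", ("NP", ("N", "Mary")), ("VP", ("V", "sold"), ("NP", ("DET", "the"), ("N", "foods")))), whereas "Mary sold the Mary foods" receives two trees (one via VP → V NP, one via VP → V NP NP).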
Slide 8
Finite-state automaton, FSA
● A model composed of a finite number of states, transitions between states, and actions.
(Figure: an FSA from Start through S1 to End, with transitions labelled two, birds, dogs, fly, and sleep.)
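The diagram can be written as a transition table. This sketch assumes one plausible reading of it: "two" loops back to Start, "birds" or "dogs" leads to S1, and "fly" or "sleep" leads from S1 to End. Under that assumption the machine accepts the same sentences as the rules on slide 4, since their language is regular. The names TRANSITIONS and accepts are illustrative.

```python
# Transition table: (state, word) -> next state.
# Assumed reading of the diagram: "two" loops on Start,
# "birds"/"dogs" lead to S1, and "fly"/"sleep" lead to End.
TRANSITIONS = {
    ("Start", "two"): "Start",
    ("Start", "birds"): "S1",
    ("Start", "dogs"): "S1",
    ("S1", "fly"): "End",
    ("S1", "sleep"): "End",
}
ACCEPTING = {"End"}


def accepts(sentence, start="Start"):
    state = start
    for word in sentence.split():
        # A missing transition means the input is rejected.
        state = TRANSITIONS.get((state, word))
        if state is None:
            return False
    return state in ACCEPTING
```

Unlike the grammar, the FSA needs no stack or recursion: "two two dogs sleep" is accepted simply by following the loop on Start twice.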
Slide 9
FSA: an example
The simple two-state FSA shown here accepts strings such as:
● st, sst, ssst, stst, stsst, ...
but it does NOT accept:
● s, t, stt, sts, ...
(Figure: a two-state diagram with transitions labelled s and t, marking the starting point and the accepted state(s).)
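One transition structure consistent with both lists can be written down and checked mechanically. The sketch is a reconstruction, not a copy of the figure: it assumes two states, q0 and q1 (hypothetical names), with q0 serving as both the starting point and the accepted state, and treats an undefined transition as rejection.

```python
# Assumed reconstruction of the two-state machine (not read off the figure):
# q0 is both the starting point and the accepted state;
# s: q0 -> q1 and q1 -> q1;  t: q1 -> q0.
DELTA = {("q0", "s"): "q1", ("q1", "s"): "q1", ("q1", "t"): "q0"}
START, ACCEPTED = "q0", {"q0"}


def run(string):
    state = START
    for ch in string:
        state = DELTA.get((state, ch))
        if state is None:        # no transition defined: reject
            return False
    return state in ACCEPTED
```

This machine accepts exactly the strings made of blocks of one or more s followed by a single t. Note that this particular reconstruction also accepts the empty string, which the slide does not mention either way.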
Slide 10
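(Figure: an FSA starting from a noun, with transitions labelled が/を, に, は, こそ, and だけ, and further transitions labelled は.)
Example: noun + {は、が、を、に、だけ、こそ}
(Note: this FSA is incomplete!)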
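The same table-driven idea works for Japanese noun + particle sequences. This is a hypothetical reading of the diagram, not a transcription: it assumes a noun followed by one particle from {は、が、を、に、だけ、こそ}, where に, だけ, and こそ may additionally be followed by は (as in には, だけは, こそは). Nouns are abstracted to the tag "noun", and the names PARTICLE_FSA and accepts_phrase are illustrative.

```python
# Hypothetical reading of the diagram: a noun followed by one particle
# from {は, が, を, に, だけ, こそ}; に, だけ and こそ may also be followed by は.
PARTICLE_FSA = {("start", "noun"): "N"}
for particle in ["は", "が", "を"]:
    PARTICLE_FSA[("N", particle)] = "done"
for particle in ["に", "だけ", "こそ"]:
    PARTICLE_FSA[("N", particle)] = "particle"
PARTICLE_FSA[("particle", "は")] = "done"
ACCEPTING = {"particle", "done"}


def accepts_phrase(tokens):
    """tokens: e.g. ["noun", "に", "は"]; nouns are abstracted to the tag "noun"."""
    state = "start"
    for token in tokens:
        state = PARTICLE_FSA.get((state, token))
        if state is None:        # undefined transition: reject
            return False
    return state in ACCEPTING
```

Like the machine on the slide, this sketch is incomplete: particles such as で and の are not covered.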
Slide 11
Context-free grammar, CFG
●
CFG is an important grammar formation to describe block
structure.
●
CFG can be described as push down automaton.
●
CFG is can be represented as tree.
book of you and me book of you and me
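The two trees arise because the grammar allows more than one derivation for the same phrase. As a sketch, the toy grammar NP → NP "of" NP | NP "and" NP | "book" | "you" | "me" is an assumption chosen to reproduce this ambiguity (it is not given on the slide), and parse_trees is a hypothetical helper that enumerates every bracketing by splitting spans at "of" or "and", memoizing results per span.

```python
from functools import lru_cache

# Toy ambiguous CFG (an assumption, not given on the slide):
#   NP -> NP "of" NP | NP "and" NP | "book" | "you" | "me"
WORDS = {"book", "you", "me"}
OPERATORS = {"of", "and"}


def parse_trees(tokens):
    tokens = tuple(tokens)

    @lru_cache(maxsize=None)
    def np(i, j):
        """All parse trees of tokens[i:j] as an NP, as nested tuples."""
        if j - i == 1 and tokens[i] in WORDS:
            return (tokens[i],)
        trees = []
        # Try every operator position k as the top-level split NP op NP.
        for k in range(i + 1, j - 1):
            if tokens[k] in OPERATORS:
                for left in np(i, k):
                    for right in np(k + 1, j):
                        trees.append((left, tokens[k], right))
        return tuple(trees)

    return list(np(0, len(tokens)))
```

For "book of you and me" it returns exactly two trees: ("book", "of", ("you", "and", "me")) and (("book", "of", "you"), "and", "me"), i.e. a book belonging to both of us versus a book of yours together with me.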
Slide 12
Problems in phrase structure grammar
Phrase structure grammar has the following problems:
● A tree does not represent the meaning of a sentence.
– cf. case grammar.
● Many rules are required to handle actual input sentences.
– e.g. in English, separate rules are required to represent the difference between singular and plural.
– This is particularly true for languages with relatively free word order (e.g. Japanese).
Slide 13
Case grammar / 格文法
Case grammar was proposed by Charles J. Fillmore in 1968.
● It focuses on the link between the valency of a verb (the number of subjects, objects, etc. it takes) and the grammatical context it requires.
● It analyzes the surface syntactic structure of sentences by studying the combination of deep cases (i.e. semantic roles) that a specific verb requires.
● It is well suited to describing the grammar of agglutinative languages such as Japanese and Korean.
Slide 14
Three types of natural language
● isolating language / 孤立言語
– e.g. Chinese
– Words do not change form.
● inflectional language / 屈折言語
– e.g. English and many other European languages
– Word forms change according to (grammatical) gender, number, and case.
● agglutinative language / 膠着言語
– e.g. Japanese and Korean
– Function words or affixes marking gender, number, and case are attached to words.
Slide 15
Case grammar: example
John opened the door with the key.
John opened the door.
The door opened.
The key opened the door.
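The verb "open" requires the following cases:
● agent (John)
● objective (the door)
● instrumental (the key)
Note that the subject, object, etc. of the surface structure do not directly correspond to the deep cases: the surface subject is the agent in the first two sentences, the objective in "The door opened.", and the instrumental in "The key opened the door."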
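To make the mismatch concrete, the sketch below stores the analyses of the four example sentences as plain data and reads off which deep case the surface subject fills. The dictionary layout and the names OPEN_FRAME, ANALYSES, and subject_case are illustrative assumptions, not a standard case-frame format.

```python
# Hypothetical case frame for "open": the deep cases listed above.
OPEN_FRAME = ("agent", "objective", "instrumental")

# Deep-case analyses of the four example sentences,
# together with the phrase that appears as the surface subject.
ANALYSES = {
    "John opened the door with the key.": {
        "subject": "John",
        "cases": {"agent": "John", "objective": "the door", "instrumental": "the key"},
    },
    "John opened the door.": {
        "subject": "John",
        "cases": {"agent": "John", "objective": "the door"},
    },
    "The door opened.": {
        "subject": "the door",
        "cases": {"objective": "the door"},
    },
    "The key opened the door.": {
        "subject": "the key",
        "cases": {"objective": "the door", "instrumental": "the key"},
    },
}


def subject_case(analysis):
    """Return the deep case filled by the surface subject, if any."""
    for case, filler in analysis["cases"].items():
        if filler == analysis["subject"]:
            return case
    return None


for sentence, analysis in ANALYSES.items():
    print(f"{sentence:40} subject = {subject_case(analysis)}")
```

Every analysis uses only cases from the same frame, yet the surface subject maps to three different deep cases, which is the point of the note above.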
Slide 16
Summary: today's key words
● phrase structure grammar
● Chomsky hierarchy
● case grammar