Automatic selection of predicates for common sense knowledge expression
We aim to construct an easy-to-use Japanese common sense knowledge base for semantic analysis in natural language processing. We define the predicates (verbs, adjectives, and verbal nouns*1) that co-occur with a noun as the common sense knowledge of that noun. Of course, not all predicates that co-occur with a noun are appropriate as common sense knowledge. Hence, we describe how to select the appropriate predicates.
*1: A verbal noun, which we also call a sahen noun, belongs to a subgroup of nouns that can also be used as verbs when followed by the suffix “suru”.
Definition of common sense knowledge
We define the predicates that characterize a noun as its common sense knowledge, and we make the following hypotheses about their specific properties.
(1) A predicate a is common sense knowledge of a noun n when a and n frequently co-occur in sentences.
(2) A predicate a that co-occurs with any noun is not appropriate common sense knowledge, because a noun is characterized by its set of common sense knowledge and such a predicate does not distinguish one noun from another.
(3) Whether a predicate a is correct common sense knowledge depends on the number of unique nouns that co-occur with a.
Automatic selection of predicates
First, we extract the noun-predicate pairs that co-occur in Web texts and, based on hypothesis (1), sort the nouns by their number of co-occurring predicates.
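The extraction and sorting step can be sketched as follows. This is only a minimal illustration, not the procedure actually used: the `pairs` list is hypothetical toy data standing in for pairs extracted from Web texts, the function names are invented for this example, and it assumes that "number of co-occurring predicates" means the number of unique predicates per noun.

```python
from collections import defaultdict

# Hypothetical toy data: (noun, predicate) pairs already extracted from text.
pairs = [
    ("dog", "bark"), ("dog", "run"), ("dog", "eat"),
    ("cat", "eat"), ("cat", "sleep"),
    ("car", "run"),
    ("dog", "bark"),  # a repeated pair does not add a new unique predicate
]

def predicates_per_noun(pairs):
    """Map each noun to the set of unique predicates it co-occurs with."""
    preds = defaultdict(set)
    for noun, predicate in pairs:
        preds[noun].add(predicate)
    return preds

def rank_nouns(pairs):
    """Sort nouns by number of unique predicates, largest first.

    Ties are broken alphabetically so the order is deterministic.
    """
    preds = predicates_per_noun(pairs)
    return sorted(preds, key=lambda noun: (-len(preds[noun]), noun))

ranked = rank_nouns(pairs)
print(ranked)
```

Taking the first N entries of `ranked` would give the "top N nouns" referred to in the text, under the same assumption about how nouns are ranked.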
The figure shows the emergence distribution of predicates for the top 1,000 nouns (N=1,000). From this distribution, we see that the number of unique predicates increases dramatically when the number of unique nouns is either extremely large or extremely small.
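One way to compute such a distribution is sketched below. It assumes, following the axis labels of the figure, that the distribution counts how many unique predicates co-occur with exactly k unique nouns among the top N nouns. The data and the function name `emergence_distribution` are hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical toy data: (noun, predicate) pairs.
pairs = [
    ("dog", "bark"), ("dog", "run"), ("dog", "eat"),
    ("cat", "eat"), ("cat", "sleep"),
    ("car", "run"),
]

def emergence_distribution(pairs, top_nouns):
    """Return {k: number of unique predicates that co-occur with k unique nouns}.

    Only nouns listed in top_nouns are counted.
    """
    top = set(top_nouns)
    nouns_of = defaultdict(set)  # predicate -> unique nouns it co-occurs with
    for noun, predicate in pairs:
        if noun in top:
            nouns_of[predicate].add(noun)
    return Counter(len(nouns) for nouns in nouns_of.values())

dist = emergence_distribution(pairs, ["dog", "cat", "car"])
for k in sorted(dist):
    print(k, dist[k])
```

In this toy data, "bark" and "sleep" each co-occur with one noun, while "run" and "eat" each co-occur with two, so the distribution has two predicates at k=1 and two at k=2.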
Under hypothesis (2), we regard the predicates that co-occur with any noun as incorrect common sense knowledge.
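A filter in the spirit of hypothesis (2) might look like the sketch below. The cutoff `max_ratio` is an assumption introduced only for illustration; the text does not state how "co-occurs with any noun" is decided, and the data and function name are hypothetical.

```python
from collections import defaultdict

# Hypothetical toy data: "exist" co-occurs with every noun.
pairs = [
    ("dog", "bark"), ("dog", "eat"), ("dog", "exist"),
    ("cat", "eat"), ("cat", "sleep"), ("cat", "exist"),
    ("car", "run"), ("car", "exist"),
]

def drop_ubiquitous(pairs, max_ratio=0.9):
    """Keep predicates that co-occur with at most max_ratio of all nouns.

    max_ratio is an assumed, illustrative threshold, not a value from the text.
    """
    nouns_of = defaultdict(set)  # predicate -> unique nouns it co-occurs with
    all_nouns = set()
    for noun, predicate in pairs:
        nouns_of[predicate].add(noun)
        all_nouns.add(noun)
    limit = max_ratio * len(all_nouns)
    return {p for p, nouns in nouns_of.items() if len(nouns) <= limit}

kept = drop_ubiquitous(pairs)
print(sorted(kept))
```

Here "exist" is dropped because it appears with all three nouns, while "eat", which appears with only two of them, is kept.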
Under hypothesis (3), a noun that co-occurs with many predicates also has many predicates to delete. Our investigation showed that the number of predicates to delete decreases in a staircase pattern, with singular points at N=700, 1,100, 1,600, 2,500, and 3,600. Hence, we decided the number of predicates to delete for each noun based on this result.
[Figure: emergence distribution of predicates. Horizontal axis: number of unique nouns co-occurring with predicates (0 to 1,000). Vertical axis: number of unique predicates (0 to 2,500). Annotations: a region containing the incorrect predicates under hypothesis (2); confidence as common sense knowledge under hypothesis (1), from high to low.]