Slide 1

Slide 1 text

Automatic selection of predicates for common sense knowledge expression
Ai Makabi, Kazuhide Yamamoto, Hiroshi Matsumoto
Nagaoka University of Technology

Slide 2

Slide 2 text

Background
• To develop an intelligent computer
– Grammatical knowledge
– Large amount of common sense knowledge (CSK)
User's remark: I am suffering from Pochi's biting
Response: Do you browse a Web site on how to discipline a dog?
• Pochi is the name of a dog
• Overcome biting behavior → To train a dog
e.g., a conversational system

Slide 3

Slide 3 text

Background
• To develop an intelligent computer
– Grammatical knowledge
– Large amount of common sense knowledge
User's remark: I am suffering from Pochi's biting
Response: Do you browse a Web site on how to discipline a dog?
• Pochi is the name of a dog
• Overcome biting behavior → To train a dog
e.g., a conversational system
Focus on methods:
– Building a common sense knowledge base (CSKB)
– Providing an accessible representation for use in natural language processing tasks

Slide 4

Slide 4 text

Related Works 1/2
• Existing Upper Ontologies (SUMO, Cyc, etc.)
– Contain many general concepts
– e.g. Collection: book
• A Type of: Information bearing object in the form of paper
• Instance of: Kind of artifact not distinguished by brand or model
• Merits:
– Exploit rigorously-defined CSK
• Demerits:
– Knowledge representation cannot be matched fully with actual expressions

Slide 5

Slide 5 text

Related Works 2/2
• Defining CSK as relations added to sentences/words (ConceptNet)
– e.g. 犬 (dog)
• CapableOf: 散歩 (walk), 寝る (sleep)
• SymbolOf: 忠誠 (loyalty)
• Merits:
– The definition is better suited to natural language processing tasks
• Demerits:
– For the Japanese ConceptNet, most concepts are collected manually
• Coverage of CSK is exceptionally low

Slide 6

Slide 6 text

Goal of the Study
• Automatically construct a Japanese CSKB that can be utilized for semantic analysis in natural language processing
Set of predicates that co-occur with a noun → CSK
→ verb, adjective, verbal noun
CSK of “dog”:
– verb: bark, run
– adjective: pretty, cute
– verbal noun: to train, to breed

Slide 7

Slide 7 text

Final goal - Overview of the CSKB
Concept (noun) and its CSK (predicate):
– cat: mew, meow, to breed, pretty
– puppy: yelp, to breed, pretty
– dog: bark, to breed, pretty
Compare at the predicate level
Compute a similarity between nouns
Aggregate concepts as an upper concept “animal” based on the similarity
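Comparing nouns at the predicate level can be sketched as a set-overlap computation. The slide does not say which similarity measure is used, so the Jaccard coefficient below is an assumption, and `jaccard`, `similar_nouns`, and the toy `csk` data are hypothetical names introduced only for illustration.

```python
def jaccard(a, b):
    """Jaccard overlap of two predicate sets (an assumed measure, not the authors')."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def similar_nouns(csk, target, threshold):
    """Return nouns whose predicate sets overlap with the target's by at least threshold."""
    return sorted(
        noun for noun in csk
        if noun != target and jaccard(csk[noun], csk[target]) >= threshold
    )


# Toy CSK: noun -> set of predicates (English glosses from the slide plus one unrelated noun).
csk = {
    "cat": {"mew", "meow", "to breed", "pretty"},
    "puppy": {"yelp", "to breed", "pretty"},
    "dog": {"bark", "to breed", "pretty"},
    "school": {"to learn", "to enroll"},
}

group = similar_nouns(csk, "dog", 0.3)  # candidates to aggregate under "animal"
```

Nouns returned for a seed such as "dog" would be the ones grouped under a shared upper concept; the threshold of 0.3 is arbitrary.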

Slide 8

Slide 8 text

Specific Property of CSK
• We make three hypotheses:
1) A predicate a is CSK of a noun n when the pair of a and n frequently co-occurs in sentences.
2) A predicate a that co-occurs with any noun is not appropriate CSK.
3) Whether a predicate a is correct CSK depends on the number of unique nouns that co-occur with a.
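The quantities these hypotheses rely on can be sketched with two counts: the pair frequency of a predicate with a noun, and the number of unique nouns each predicate co-occurs with. The function name `cooccurrence_stats` and the toy pairs are hypothetical; how the pairs are extracted from sentences is not covered here.

```python
from collections import Counter, defaultdict


def cooccurrence_stats(pairs):
    """pairs: iterable of (noun, predicate) tuples observed in the same sentence."""
    pair_freq = Counter(pairs)  # hypothesis 1: how often a and n co-occur
    nouns_per_pred = defaultdict(set)
    for noun, pred in pair_freq:
        nouns_per_pred[pred].add(noun)
    # hypotheses 2 and 3: number of unique nouns co-occurring with each predicate
    unique_nouns = {pred: len(nouns) for pred, nouns in nouns_per_pred.items()}
    return pair_freq, unique_nouns


pairs = [
    ("dog", "bark"), ("dog", "bark"), ("dog", "be"),
    ("cat", "meow"), ("cat", "be"), ("school", "be"),
]
pair_freq, unique_nouns = cooccurrence_stats(pairs)
```

In this toy data "be" co-occurs with every noun, which is the kind of versatile predicate hypothesis 2 is meant to rule out.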

Slide 9

Slide 9 text

Specific Property of CSK
• We make three hypotheses:
1) A predicate a is CSK of a noun n when the pair of a and n frequently co-occurs in sentences.
2) A predicate a that co-occurs with any noun is not appropriate CSK.
3) Whether a predicate a is correct CSK depends on the number of unique nouns that co-occur with a.

Slide 10

Slide 10 text

Specific Property of CSK
• We make three hypotheses:

Slide 11

Slide 11 text

Automatic Selection of Predicates
The top 10 predicates added to a noun “(elementary school)” (English glosses only; the Japanese forms were lost in extraction):
(to enroll in school), (to educate), (be), (become), (to finish school), (to give lessons), (to take an exam), (attend), (to learn), (to coac...

Slide 12

Slide 12 text

Specific Property of CSK
• We make three hypotheses:
1) A predicate a is CSK of a noun n when the pair of a and n frequently co-occurs in sentences.
2) A predicate a that co-occurs with any noun is not appropriate CSK.
3) Whether a predicate a is correct CSK depends on the number of unique nouns that co-occur with a.

Slide 13

Slide 13 text

The top 10 predicates added to a noun “(elementary school)” (English glosses only; the Japanese forms were lost in extraction):
(to enroll in school), (to educate), (be), (become), (to finish school), (to give lessons), (to take an exam), (attend), (to learn), (to coac...
Predicates with high co-occurrence frequency with a noun that cannot characterize the noun → Incorrect CSK
• Versatile words
• Co-occur with many nouns

Slide 14

Slide 14 text

[Figure: Emergence distribution of predicates in the top 1,000 nouns. Axes: number of unique nouns co-occurring with predicate (0 to 1,000); number of unique predicates (0 to 2,500).]
The predicates which fall under a certain scope co-occur with many nouns → Delete these predicates as “deleting predicates”

Slide 15

Slide 15 text

[Figure: Emergence distribution of predicates in the top 1,000 nouns. Axes: number of unique nouns co-occurring with predicate (0 to 1,000); number of unique predicates (0 to 2,500).]
The predicates which fall under a certain scope co-occur with many nouns → Delete these predicates as “deleting predicates”
The number of unique predicates which co-occur with ?00 nouns (leading digit illegible in the source) is 1,000

Slide 16

Slide 16 text

[Figure: Emergence distribution of predicates in the top 1,000 nouns. Axes: number of unique nouns co-occurring with predicate (0 to 1,000); number of unique predicates (0 to 2,500). Regions labeled: co-occurring with many nouns; co-occurring with few nouns.]
The predicates which fall under a certain scope co-occur with many nouns → Delete these predicates as “deleting predicates”
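The deletion step described on these slides can be sketched as a filter on the number of unique nouns per predicate. The cut-off `max_nouns` is a hypothetical parameter: the slides derive the scope from the distribution, and the concrete rule is not reproduced here.

```python
def select_deleting_predicates(unique_nouns, max_nouns):
    """Predicates co-occurring with more than max_nouns unique nouns (assumed cut-off rule)."""
    return {pred for pred, k in unique_nouns.items() if k > max_nouns}


def filter_csk(pair_freq, deleting):
    """Drop every (noun, predicate) pair whose predicate is a deleting predicate."""
    return {
        (noun, pred): freq
        for (noun, pred), freq in pair_freq.items()
        if pred not in deleting
    }


# Toy counts: "be" and "become" are versatile, "bark" is specific.
unique_nouns = {"be": 950, "become": 820, "bark": 3, "to enroll": 12}
pair_freq = {("dog", "be"): 40, ("dog", "bark"): 25, ("school", "to enroll"): 18}

deleting = select_deleting_predicates(unique_nouns, max_nouns=700)
kept = filter_csk(pair_freq, deleting)
```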

Slide 17

Slide 17 text

[Figure: distribution plot with axis ranges 0 to 2,500 and 0 to 1,000.]
This contains the incorrect predicates based on

Slide 18

Slide 18 text

[Figure: Emergence distribution of predicates in the top 1,000 nouns, on logarithmic axes. Axes: number of unique nouns co-occurring with predicate (logarithm); number of unique predicates (logarithm). Annotations: power approximated curve, logarithmic curve, inflection point.]
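A power approximated curve of the kind annotated in this figure can be obtained by fitting a straight line in log-log space. The sketch below is a plain least-squares fit under that assumption; `fit_power_law` is a hypothetical helper, and the slide does not state how the authors fitted the curve or located the inflection point, so neither is claimed here.

```python
import math


def fit_power_law(xs, ys):
    """Least-squares fit of y = c * x**k via a straight line in log-log space."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    n = len(lx)
    mean_x, mean_y = sum(lx) / n, sum(ly) / n
    sxx = sum((a - mean_x) ** 2 for a in lx)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(lx, ly))
    k = sxy / sxx                      # slope = exponent of the power law
    c = math.exp(mean_y - k * mean_x)  # intercept = log of the coefficient
    return c, k


# Synthetic data that follows y = 100 * x**-0.5 exactly.
xs = [1, 2, 4, 8, 16]
ys = [100 * x ** -0.5 for x in xs]
c, k = fit_power_law(xs, ys)
```

All x and y values must be positive for the logarithms to be defined.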

Slide 19

Slide 19 text

Specific Property of CSK
• We make three hypotheses:
1) A predicate a is CSK of a noun n when the pair of a and n frequently co-occurs in sentences.
2) A predicate a that co-occurs with any noun is not appropriate CSK.
3) Whether a predicate a is correct CSK depends on the number of unique nouns that co-occur with a.

Slide 20

Slide 20 text

Specific Property of CSK
• We make three hypotheses:

Slide 21

Slide 21 text

[Figure: Emergence distribution of the top N predicates co-occurring with noun, shown for N=100 and N=1000.]

Slide 22

Slide 22 text

[Figure: the top N nouns co-occurring with many predicates, plotted against the number of deleting predicates; decreasing in a staircase pattern.]
The number of deleting predicates for each noun is decided based on the

Slide 23

Slide 23 text

Number of deleting predicates for each noun
Table I. Number of deleting predicates for each noun (N = the unique number of co-occurred predicates)
Scope of the nouns    Deletion
N≤700                 427
700

Slide 24

Slide 24 text

Added CSK for each noun
• The following equation computes weighted scores for predicates co-occurring with a noun using Harman normalized frequency:
\[ TF(a, n) = \frac{\log_2(n_{a,n} + 1)}{\log_2\left(\sum_k n_{k,n}\right)} \]
n: noun
a: predicate
n_{a,n}: appearance frequency of predicate a with noun n
• A predicate is regarded as correct CSK for a noun when its score is high.
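The Harman normalized frequency on this slide can be computed directly from pair counts. The following is a minimal sketch; `harman_tf` and the toy counts are hypothetical, and returning an empty result when the denominator would be zero is a choice made here, not something the slide specifies.

```python
import math


def harman_tf(pair_freq, noun):
    """TF(a, n) = log2(n_{a,n} + 1) / log2(sum_k n_{k,n}) for every predicate a of noun n."""
    counts = {pred: f for (n, pred), f in pair_freq.items() if n == noun}
    total = sum(counts.values())
    if total <= 1:
        return {}  # log2(total) is zero or undefined, so no score is produced
    denom = math.log2(total)
    return {pred: math.log2(f + 1) / denom for pred, f in counts.items()}


# Toy counts for "dog": total frequency is 8, so the denominator is log2(8) = 3.
pair_freq = {("dog", "bark"): 3, ("dog", "be"): 1, ("dog", "cute"): 4, ("cat", "meow"): 2}
scores = harman_tf(pair_freq, "dog")
ranked = sorted(scores, key=scores.get, reverse=True)
```

Predicates at the head of `ranked` are the candidates to add as CSK for the noun.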

Slide 25

Slide 25 text

Baselines
1) Do not delete any predicates; just use the predicates weighted by Harman normalized frequency (baseline 1)
2) Do not delete any predicates; just use the predicates weighted by TF-IDF score (baseline 2)
3) Remove the 427 deleting predicates for N≤700, and use the predicates weighted by Harman normalized frequency (baseline 3)
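Baseline 2 weights predicates by TF-IDF, but the slide does not give the exact variant. The sketch below assumes raw pair frequency as TF and the natural log of (number of nouns / number of nouns the predicate co-occurs with) as IDF; `tfidf_scores` and the toy counts are hypothetical.

```python
import math
from collections import defaultdict


def tfidf_scores(pair_freq):
    """Score each (noun, predicate) pair, treating each noun as a 'document' (assumed variant)."""
    nouns = {noun for noun, _ in pair_freq}
    nouns_per_pred = defaultdict(set)
    for noun, pred in pair_freq:
        nouns_per_pred[pred].add(noun)
    total = len(nouns)
    return {
        (noun, pred): freq * math.log(total / len(nouns_per_pred[pred]))
        for (noun, pred), freq in pair_freq.items()
    }


pair_freq = {("dog", "bark"): 3, ("dog", "be"): 5, ("cat", "be"): 4, ("cat", "meow"): 2}
scores = tfidf_scores(pair_freq)
```

Under this variant a predicate that co-occurs with every noun, such as "be" here, receives a score of zero, which downweights versatile predicates without deleting them outright.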

Slide 26

Slide 26 text

Example of assigned predicates
Columns: Baseline 1, Baseline 2, Baseline 3, Approach
Assigned predicates (English glosses in extraction order; the Japanese forms and the column alignment were lost): (have), (have), (do not eat), (to take out for a walk), (become), (be together), (do not breed), (bring up), (be), (to live), (bite to death), (get all thin), (cute)

Slide 27

Slide 27 text

Example of assigned predicates
Columns: Baseline 1, Baseline 2, Baseline 3, Approach
Assigned predicates (English glosses in extraction order; the Japanese forms and the column alignment were lost): (have), (have), (do not eat), (to take out for a walk), (become), (be together), (do not breed), (bring up), (be), (to live), (bite to death), (get all thin), (cute)

Slide 28

Slide 28 text

Example of assigned predicates
Columns: Baseline 1, Baseline 2, Baseline 3, Approach
Assigned predicates (English glosses in extraction order; the Japanese forms and the column alignment were lost): (have), (have), (do not eat), (to take out for a walk), (become), (be together), (do not breed), (bring up), (be), (to live), (bite to death), (get all thin), (cute)
Appropriate predicates are deleted

Slide 29

Slide 29 text

Error Analysis 1/3
• Although a predicate co-occurs with a noun many times, some pairs are unrelated
– The dependency relation between them is not checked
Solution:
Use only the predicates which depend on the target nouns as candidates for CSK

Slide 30

Slide 30 text

Error Analysis 2/3
• Could not assign appropriate predicates to nouns which can also be used as a suffix
– 美しい月です (This is the beautiful moon)
– 月ごとに決済する (We make a charge for each month)
Solution:
Utilize the relation with other co-occurring nouns
e.g., if “月” co-occurs with the noun “太陽 (sun)”, it may mean the moon

Slide 31

Slide 31 text

Error Analysis 3/3
• Nouns which are used for defining the relation between nouns are included
– 原因 (cause)
– 理由 (reason)
Solution:
Discuss how to limit the nouns targeted for adding CSK

Slide 32

Slide 32 text

Conclusion
• Described a method for selecting appropriate predicates as CSK for constructing the CSKB
– Method for statistically selecting CSK of nouns utilizing the unique number of co-occurred predicates
• Evaluated the sets of CSK assigned to each noun, compared with three baselines
– Demonstrated the assumed characteristics of CSK in our study
– Gave a subjective evaluation
• Plan to make a quantitative evaluation