Modeling Fine-Grained Entity Types with Box Embeddings

Yasumasa Onoe
September 08, 2021

Neural entity typing models typically represent entity types as vectors in a high-dimensional space, but such spaces are not well-suited to modeling these types' complex interdependencies. We study the ability of box embeddings, which represent entity types as d-dimensional hyperrectangles, to represent hierarchies of fine-grained entity type labels even when these relationships are not defined explicitly in the ontology. Our model represents both types and entity mentions as boxes. Each mention and its context are fed into a BERT-based model to embed that mention in our box space; essentially, this model leverages typological clues present in the surface text to hypothesize a type representation for the mention. Soft box containment can then be used to derive probabilities, both the posterior probability of a mention exhibiting a given type and the conditional probability relations between types themselves. We compare our approach with a strong vector-based typing model, and observe state-of-the-art performance on several entity typing benchmarks. In addition to competitive typing performance, our box-based model shows better performance in prediction consistency (predicting a supertype and a subtype together) and confidence (i.e., calibration), implying that the box-based model captures the latent type hierarchies better than the vector-based model does.
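The containment idea in the abstract can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `Box`, `box_from_center_offset`, `soft_volume` and `prob_type_given_mention` are hypothetical, the intersection is a hard min/max intersection, and the "soft" part is a softplus applied to each side length with an assumed temperature `temp`. The slides refer to Gumbel boxes for the soft volume, which this sketch simplifies away.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class Box:
    """Axis-aligned hyperrectangle given by its min and max corners."""
    lo: List[float]
    hi: List[float]


def box_from_center_offset(center: List[float], offset: List[float]) -> Box:
    """Build a box from a center and a half-width per dimension (assumed parameterization)."""
    return Box(
        lo=[c - abs(o) for c, o in zip(center, offset)],
        hi=[c + abs(o) for c, o in zip(center, offset)],
    )


def intersect(a: Box, b: Box) -> Box:
    """Hard intersection: max of the min corners, min of the max corners."""
    return Box(
        lo=[max(x, y) for x, y in zip(a.lo, b.lo)],
        hi=[min(x, y) for x, y in zip(a.hi, b.hi)],
    )


def softplus(x: float, temp: float) -> float:
    """Numerically stable temp * log(1 + exp(x / temp))."""
    z = x / temp
    return temp * (max(z, 0.0) + math.log1p(math.exp(-abs(z))))


def soft_volume(box: Box, temp: float = 0.1) -> float:
    """Product of softplus side lengths; stays positive even for empty boxes."""
    vol = 1.0
    for low, high in zip(box.lo, box.hi):
        vol *= softplus(high - low, temp)
    return vol


def prob_type_given_mention(type_box: Box, mention_box: Box, temp: float = 0.1) -> float:
    """Approximate P(type | mention) as Vol(type ∩ mention) / Vol(mention)."""
    overlap = intersect(type_box, mention_box)
    return soft_volume(overlap, temp) / soft_volume(mention_box, temp)


# Toy 2-d example with made-up coordinates.
person = Box([0.0, 0.0], [4.0, 4.0])
author = Box([1.0, 1.0], [3.0, 3.0])
politician = Box([5.0, 5.0], [6.0, 6.0])
mention = box_from_center_offset([2.0, 2.0], [0.5, 0.5])

p_person = prob_type_given_mention(person, mention)
p_politician = prob_type_given_mention(politician, mention)
p_author_given_person = prob_type_given_mention(author, person)
```

The same ratio serves both uses named in the abstract: with a mention box in the denominator it scores a type for that mention, and with a type box in the denominator it gives a conditional probability between two types.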

Transcript

  1. Hierarchical Structure in Entity Types

    [Figure labels: ARTIST, PERSON, AUTHOR, POLITICIAN, MUSICIAN; axis from coarse to fine]
    ● Type set can be large (e.g. > …k, UFET (Choi))
  2. Hierarchical Structure in Entity Types

    [Figure labels: ARTIST, PERSON, AUTHOR, POLITICIAN, MUSICIAN; axis from coarse to fine]
    ● Type set can be large (e.g. > …k, UFET (Choi))
    ● Not always like a tree… What if PERSON is AUTHOR and POLITICIAN?
  3. Box Embeddings

    [Figure, Vector Space: PERSON, POLITICIAN, ARTIST, AUTHOR]
    [Figure, Box Space: PERSON, AUTHOR, POLITICIAN, SUZANNE COLLINS; mention: Suzanne Collins]
  4. Embedded in Box Space (This Work)

    [Figure, Box Space: PERSON, AUTHOR, POLITICIAN, SUZANNE COLLINS]
    Suitable geometry
    ● Box A contains another box B → A is supertype of B
    ● Boxes overlap → can be both
    ● Boxes disjoint → mutually exclusive
  5. Applications of Box Embeddings

    ● WordNet hypernym prediction (Vilnis; Li; Dasgupta; Patel)
      ○ Model the transitive closure between words using boxes
    ● Uncertain knowledge graphs (Chen)
      ○ Entities as boxes and relations as box-to-box transformations
    ● Recommendation systems (Zhang)
      ○ Users as boxes and items as points
  6. Applications of Box Embeddings

    ● WordNet hypernym prediction (Vilnis; Li; Dasgupta; Patel)
      ○ Model the transitive closure between words using boxes
    ● Uncertain knowledge graphs (Chen)
      ○ Entities as boxes and relations as box-to-box transformations
    ● Recommendation systems (Zhang)
      ○ Users as boxes and items as points
    ● [This work] Fine-grained entity typing
      ○ Natural language input
      ○ Everything is a box (entity mentions and types)
  7. Model Overview

    ● Embed an entity mention and entity types into the same box space
    ● Use box operations (e.g. volume, intersection, etc.) to compute P(type | mention and context)
    ● Treat each type prediction as an independent binary classification
    [Figure, Box Space: PERSON, AUTHOR, POLITICIAN, SUZANNE COLLINS]
  8. Box-based Entity Typing Model

    ● Encode mention and context with BERT
    [Diagram: input "[CLS] The Hunger Games, the first of … best selling books by Suzanne Collins" (MENTION AND CONTEXT) → ENCODER → MENTION AND CONTEXT VECTOR → TRANSFORM]
  9. Box-based Entity Typing Model

    ● Encode mention and context with BERT
    ● Transform the encoding into a box
    [Diagram: input "[CLS] The Hunger Games, the first of … best selling books by Suzanne Collins" (MENTION AND CONTEXT) → ENCODER → MENTION AND CONTEXT VECTOR → TRANSFORM → CENTER, OFFSET → MAX POINT, MIN POINT]
  10. Box-based Multi-label Type Classifier

    ● P(type | mention) = Vol(type box ∩ mention box) / Vol(mention box)
    ● Compute soft volume of intersection
    [Figure: AUTHOR and POLITICIAN type boxes, Suzanne Collins mention box, Intersection Box]
  11. Box-based Multi-label Type Classifier

    ● P(type | mention) = Vol(type box ∩ mention box) / Vol(mention box)
    ● Compute soft volume of intersection
    [Figure: AUTHOR and POLITICIAN type boxes, Suzanne Collins mention box, Intersection Box; Gumbel Boxes (Dasgupta)]
  12. Box-based Multi-label Type Classifier

    ● P(type | mention) = Vol(type box ∩ mention box) / Vol(mention box)
    ● Compute soft volume of intersection
    [Figure: AUTHOR and POLITICIAN type boxes, Suzanne Collins mention box, Intersection Box; Gumbel Boxes (Dasgupta); steps: Compute Intersection Volume, Mention and Context Box Volume]
  13. Experiments

    ● Entity Typing
      ○ Does a box model improve typing performance?
    ● Consistency
      ○ When a model predicts a fine type, does it also predict the coarse type?
    ● Robustness
      ○ How much does performance degrade if labels are noisy?
    ● Box Edges
    ● Calibration
    ● Downstream Task
  14. Entity Typing: UFET (Choi)

    # of types: …k
    Baselines
    ● Vector model (BERT)
    ● Vector model (ELMo) (Onoe and Durrett)
    ● Hyperbolic model (López and Strube)
    [Chart: Dev results, Macro F1 for Box, Vector (BERT), Vector (ELMo), Hyperbolic; footnote: used different augmented data]
  15. Entity Typing: OntoNotes (Gillick)

    # of types: … types
    Baselines
    ● Vector model (BERT)
    ● Document-level (Zhang)
    ● Hierarchical ranking (Chen)
    [Chart: Test results, Macro F1 for Box, Vector (BERT), Hier. ranking, Doc-level]
    Note: We report BBN and FIGER results in our paper
  16. Entity Typing: OntoNotes (Gillick)

    # of types: … types
    Baselines
    ● Vector model (BERT)
    ● Document-level (Zhang)
    ● Hierarchical ranking (Chen)
    ● Mention-span attention (Lin and Ji)
      ○ Used large-scale augmented data
    [Chart: Test results, Macro F1 for Box, Vector (BERT), Hier. ranking, Doc-level, Mention Attn.]
    Note: We report BBN and FIGER results in our paper
  17. Entity Typing: OntoNotes (Gillick)

    # of types: … types
    Baselines
    ● Vector model (BERT)
    ● Document-level (Zhang)
    ● Hierarchical ranking (Chen)
    ● Mention-span attention (Lin and Ji)
      ○ Used large-scale augmented data
    ● Box is better than vector in general
    [Chart: Test results, Macro F1 for Box, Vector (BERT), Hier. ranking, Doc-level, Mention Attn.]
    Note: We report BBN and FIGER results in our paper
  18. Consistency

    When a model predicts a fine type (artist), does it also predict the coarse type (person)?
    ● … coarse types, … fine types
      ○ person
      ○ location
      ○ place
      ○ organization
    [Chart: Accuracy by supertype: person, location, place, organization]
  19. Consistency

    When a model predicts a fine type (artist), does it also predict the coarse type (person)?
    ● … coarse types, … fine types
      ○ person
      ○ location
      ○ place
      ○ organization
    ● Box is consistently better than vector for all coarse types
    [Chart: Accuracy by supertype: person, location, place, organization]
  20. Robustness

    How much does performance degrade if labels are noisy?
    ● Original UFET training data
    ● Noised Coarse: drop the gold coarse types with p = …
    ● Noised Fine & Ultra-fine: drop the gold fine and ultra-fine types with p = …
    ● Box is more robust against missing labels than vector
    [Chart: Macro F1 for Box and Vector]
  21. Effective Region

    [Figure: ACTOR box and PERSON box, with mention boxes d, d, d of dev examples]
    ● ACTOR box
    ● Mention box of the dev example d
    ● The effective region of the ACTOR box: the intersection of the ACTOR box AND mention boxes of all the dev examples where the model predicts the ACTOR type
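
The effective region on the last slide can be illustrated with a short sketch. It follows one possible reading of the slide, which is an assumption: each mention box is intersected with the type box, and the effective region is the smallest box covering those overlaps. The names `intersect` and `effective_region` are hypothetical, and the boxes are hard boxes rather than the Gumbel boxes mentioned earlier in the deck.

```python
from typing import List, Optional, Tuple

# A box is (min corner, max corner); this representation is an assumption.
Box = Tuple[List[float], List[float]]


def intersect(a: Box, b: Box) -> Optional[Box]:
    """Hard intersection of two boxes, or None if they are disjoint."""
    lo = [max(x, y) for x, y in zip(a[0], b[0])]
    hi = [min(x, y) for x, y in zip(a[1], b[1])]
    if any(low > high for low, high in zip(lo, hi)):
        return None
    return lo, hi


def effective_region(type_box: Box, mention_boxes: List[Box]) -> Optional[Box]:
    """Smallest box covering every (type box ∩ mention box) overlap."""
    overlaps = []
    for mention_box in mention_boxes:
        overlap = intersect(type_box, mention_box)
        if overlap is not None:
            overlaps.append(overlap)
    if not overlaps:
        return None
    dims = range(len(type_box[0]))
    lo = [min(o[0][d] for o in overlaps) for d in dims]
    hi = [max(o[1][d] for o in overlaps) for d in dims]
    return lo, hi


# Toy 2-d example with made-up coordinates; only mentions predicted as the
# type would be passed in.
actor_box: Box = ([0.0, 0.0], [10.0, 10.0])
dev_mentions: List[Box] = [
    ([1.0, 1.0], [2.0, 2.0]),
    ([3.0, 8.0], [4.0, 12.0]),
    ([20.0, 20.0], [21.0, 21.0]),  # disjoint from the type box, so ignored
]
region = effective_region(actor_box, dev_mentions)
```

Comparing `region` with the full type box shows how much of the learned box is actually used by the mentions assigned to that type.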