OpenTalks.AI - Елена Тутубалина, RuREBus-2020 Shared Task: Russian Relation Extraction for Business

February 05, 2021


Transcript

  1. RuREBus-2020 Shared Task: Russian Relation Extraction for Business

     Vitaly Ivanin, Ekaterina Artemova, Tatiana Batura, Vladimir Ivanov, Veronika Sarkisyan, Elena Tutubalina, and Ivan Smurov
     ABBYY, Russia; Moscow Institute of Physics and Technology, Russia; National Research University Higher School of Economics, Russia; Novosibirsk State University, Russia; Innopolis University, Russia; Kazan Federal University, Russia; Lomonosov Moscow State University, Russia
  2. Our motivation

     • Named entity recognition (NER) and relation extraction (RE) are established and well-researched NLP tasks.
     • Scores obtained by SOTA systems on standard academic corpora (such as CoNLL-03 and SemEval-2010 Task 8) are high and in many cases close to human performance.
     • Given these considerations, some representatives of academia claim that NER and RE are essentially "solved tasks".
     • NER and RE are widely used in business, but performance there is typically much lower than reported on academic corpora.
     • One can assume the reason is that academic corpora differ in major ways from industrial ones, so it is reasonable to create a business-oriented corpus and test modern methods on it.
  3. Academic vs. industrial corpora

     In our opinion, there are two key differences between academic and industrial corpora:
     • Academic benchmarks typically consist of well-written news or biography texts, whereas business-case texts are usually domain-specific (e.g., legal) and can contain less-than-perfect language or other irregularities.
     • Entities in academic corpora are usually compact and well defined, whereas entities in industry are often much looser, spanning many words, with less-than-clear borders.
  4. Our goals

     • Create a corpus of strategic planning documents with entity and relation annotation.
     • Organize a shared task on this corpus, thus establishing a reasonable baseline on it.
     • We intend our corpus to serve as a "lower-bound estimate" for industrial applications.

     [Figure: common NLP pipeline]
  5. Business-like features

     We intended to make our shared task as close to a real-world scenario as possible. To do this, we allowed several features that are often frowned upon in academia (to the best of our knowledge, ignored by our participants):
     • We did not restrict participation to open-source systems exclusively.
     • We allowed participants to create additional markup, provided they reported it and sent us the markup they created.
     • We provided a large corpus of unmarked texts from the same domain.
  6. Strategic planning documents

     Each Russian federal and municipal subject publishes several strategic planning documents per year. The overall collection contains more than 30 thousand documents with the following features:
     • uniformity of texts: documents share the same domain and purpose, and have very similar style and size;
     • shared scope: documents mention various types of economic and social entities and relations at different levels of management;
     • fixed modalities: a fixed list of modalities in documents covering the current state of the economy or society (problems), as well as plans for the future (actions, tasks, etc.).
  7. Description of entities

     • ACT (activity): event or specific activity. Examples: restoration work (реставрационные работы), drug prevention (профилактика наркомании)
     • BIN (binary): one-time action / binary characteristic. Examples: modernization (модернизация), invest (инвестировать)
     • CMP (compare): comparative characteristic. Examples: decrease of level (снижение уровня), more ecological (более экологичный)
     • ECO (economics): economic entity / infrastructure object. Examples: PJSC Sberbank (ПАО Сбербанк), hospital complex (больничный комплекс)
     • INST (institutions): institutions, structures and organizations. Examples: Youth Employment Center (центр занятости молодёжи), city administration (администрация города)
     • MET (metrics): numerical indicator / object on which a comparison operation is defined. Examples: unemployment rate (уровень безработицы), total length of roads (общая протяжённость дорог)
     • SOC (social): entity related to social rights or social amenities. Examples: leisure activities (досуг населения), historical heritage (историческое наследие)
     • QUA (qualitative): quality characteristic. Examples: high quality (высококачественный), stable (стабильный)
  8. Description of relations

     Relation types (modality by tense):

                  Positive (PS)   Neutral (NT)   Negative (NG)
     Past (P)     PPS             PNT            PNG
     Present (N)  NPS             NNT            NNG
     Future (F)   FPS             FNT            FNG

     Plus: GOL (abstract goals), TSK (tasks).

     Examples of annotation:
     • PNG: <ACT>landscaping work</ACT> <BIN>is not completed</BIN> (<ACT>работы по благоустройству</ACT> <BIN>не завершены</BIN>)
     • FPS: <CMP>increase of</CMP> <MET>life expectancy</MET> (<CMP>повышение</CMP> <MET>средней продолжительности жизни</MET>)
     • NNT: <MET>rate of incidence</MET> <BIN>stabilized</BIN> (<MET>темп заболеваемости</MET> <BIN>стабилизировался</BIN>)
     • GOL: <CMP>decrease of</CMP> <MET>unemployment level</MET> (<CMP>снижение</CMP> <MET>уровня безработицы</MET>)
     • TSK: <QUA>capital</QUA> <ACT>kindergarten repair</ACT> (<QUA>капитальный</QUA> <ACT>ремонт детского сада</ACT>)
  9. Why are these entities and relations useful?

     • Our main motivation was to create a showcase scenario for non-traditional entities and relations.
     • However, we believe that our markup can also be useful for the analysis of e-government documents.
     • See our article "So What's the Plan? Mining Strategic Planning Documents" at DTSG for details.
  10. Annotation pipeline: brat interface

     [Figure: brat annotation interface]

  11. Active learning
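     The slide names active learning for annotation without detail. A minimal sketch of one common selection strategy (least-confidence uncertainty sampling), not the organizers' actual pipeline:

     ```python
     import heapq

     def least_confidence_sample(scores, k):
         """Pick the k examples whose most probable label has the lowest
         model confidence (least-confidence uncertainty sampling)."""
         # scores: per-example probability distributions over labels
         uncertainties = [1.0 - max(dist) for dist in scores]
         # indices of the k most uncertain examples, to send to annotators
         return heapq.nlargest(k, range(len(scores)), key=lambda i: uncertainties[i])

     # toy model outputs for five unlabeled sentences
     probs = [
         [0.98, 0.02],  # confident
         [0.55, 0.45],  # uncertain
         [0.90, 0.10],
         [0.51, 0.49],  # most uncertain
         [0.80, 0.20],
     ]
     print(least_confidence_sample(probs, 2))  # → [3, 1]
     ```

     Annotating the most uncertain examples first tends to improve the model fastest per labeled document, which matters when annotation budget is limited.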

  12. NE statistics

     Dataset: 188 train documents, 30 test documents.

     Entity   Total   Mean len (std)
     BIN      30201   1.05 (0.28)
     MET      14161   4.23 (3.50)
     QUA       7719   1.14 (0.52)
     CMP       9288   1.16 (0.78)
     SOC      10834   2.77 (2.31)
     INST      7903   3.69 (2.81)
     ECO      24853   2.78 (2.19)
     ACT      12274   4.74 (4.57)
  13. Relations by class

     Label   Count
     TSK      4613
     GOL      3563
     FPS      1167
     NNG       844
     NPS       755
     NNT       534
     PPS       528
     FNG       229
     PNT       190
     FNT       141
     PNG        84
  14. Shared task

     1. Named Entity Recognition
        Given: raw text files
        Expected: char spans with labels
        Evaluation: micro span-based F1

     2. Relation Extraction with given Named Entities
        Given: NE spans
        Expected: relations in format (class, head span idx, tail span idx)
        Evaluation: micro F1 measure

     3. End-to-end Relation Extraction
        Given: raw text files
        Expected: NEs & relations in format (class, head idx, tail idx)
        Evaluation: micro F1 measure
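     Task 1 is scored with micro span-based F1, where a prediction counts as a true positive only on an exact (start, end, label) match. A minimal sketch of such a scorer (an illustration, not the official evaluation script):

     ```python
     def micro_span_f1(gold, pred):
         """Micro-averaged span-based F1: a predicted span is a true
         positive only if its (start, end, label) triple exactly matches
         a gold span."""
         gold_set, pred_set = set(gold), set(pred)
         tp = len(gold_set & pred_set)
         precision = tp / len(pred_set) if pred_set else 0.0
         recall = tp / len(gold_set) if gold_set else 0.0
         if precision + recall == 0:
             return 0.0
         return 2 * precision * recall / (precision + recall)

     # hypothetical char spans: (start, end, label)
     gold = [(0, 10, "ACT"), (15, 20, "MET"), (25, 30, "BIN")]
     pred = [(0, 10, "ACT"), (15, 21, "MET")]  # second span is one char off
     print(round(micro_span_f1(gold, pred), 3))  # → 0.4
     ```

     Note how the off-by-one MET span earns no credit at all; this strictness is what the "long spans problem" slide later quantifies.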
  15. Task results (span-based F1 measure)

     Team          Task 1 (NER)   Task 2 (RE with NEs)
     davletov-aa   0.561          0.394
     Sdernal       0.464          0.441
     ksmith        0.463          0.152
     viby          0.417          0.218
     dimsolo       0.400          -
     bond005       0.338          0.045
     Student2020   0.253          -
  16. Task 1

  17. Task 2

  18. After the Task 1 and Task 3 deadlines

     Team                  Task 1          Task 3
     max before deadline   0.561           0.062 (ksmith)
     davletov-aa           0.561           0.132
     bondarenko            0.498 (top 2)   -
  19. Best architectures

     NER:
     • top-1: Multilingual BERT + MLP
     • top-2: RuBERT + MLP

     RE:
     • top-1: R-BERT
     • top-2: multi-task learning NER + RE

     [Figure: R-BERT architecture]
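     R-BERT, the top RE architecture, classifies a relation from the [CLS] vector concatenated with pooled representations of the two entity spans. A simplified sketch of that feature construction, with a random toy matrix standing in for real BERT outputs (the real model also passes each part through tanh and a dense layer):

     ```python
     import numpy as np

     def rbert_features(hidden, e1_span, e2_span):
         """R-BERT-style feature vector (simplified): concatenate the [CLS]
         vector with the averaged hidden states of the two entity spans.
         `hidden` is a (seq_len, dim) matrix of encoder outputs; spans are
         (start, end) token indices, end exclusive; position 0 is [CLS]."""
         cls_vec = hidden[0]
         e1_vec = hidden[e1_span[0]:e1_span[1]].mean(axis=0)
         e2_vec = hidden[e2_span[0]:e2_span[1]].mean(axis=0)
         return np.concatenate([cls_vec, e1_vec, e2_vec])

     hidden = np.random.rand(10, 768)            # toy stand-in for BERT output
     feats = rbert_features(hidden, (2, 4), (6, 9))
     print(feats.shape)                          # (2304,): 3 x 768
     # feats would then feed a softmax classifier over the relation types
     ```

     Pooling the entity spans explicitly is what lets the classifier handle the long, loosely bounded entities typical of this corpus better than a [CLS]-only baseline.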
  20. Long spans problem

     Difference between span-based F1 and char-based F1:

     Label           ACT    BIN    CMP    ECO    INST   MET    QUA    SOC
     F1 diff         0.28   0.03   0.00   0.23   0.21   0.27   0.00   0.19
     Mean char len   34     12     10     24     27     31     12     21
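     The table shows the gap is largest for long entity types (ACT, MET): under span-based F1 a near-miss on a 34-character span scores zero, while char-based F1 gives partial credit. A small illustration with hypothetical spans:

     ```python
     def char_f1(gold_span, pred_span):
         """Character-level F1 for a single entity: partial overlap earns
         partial credit, unlike exact-match span-based F1."""
         gold_chars = set(range(*gold_span))
         pred_chars = set(range(*pred_span))
         tp = len(gold_chars & pred_chars)
         if tp == 0:
             return 0.0
         p, r = tp / len(pred_chars), tp / len(gold_chars)
         return 2 * p * r / (p + r)

     # a long ACT-like entity predicted almost right (4 trailing chars missed)
     print(char_f1((0, 34), (0, 30)))  # → 0.9375, where span-based F1 = 0.0
     ```

     This explains why the F1 difference correlates with mean character length: short types like BIN or QUA are rarely partially matched, so the two metrics nearly agree.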
  21. SemEval-2020 Task 11

     Giovanni Da San Martino, Alberto Barrón-Cedeño, Henning Wachsmuth, Rostislav Petrov, and Preslav Nakov. SemEval-2020 Task 11: Detection of propaganda techniques in news articles. In Proceedings of the 14th International Workshop on Semantic Evaluation (SemEval 2020), Barcelona, Spain, September 2020.
  22. Conclusions

     • We presented a new Russian dataset for NER and RE.
     • We presented a large raw-text corpus for this domain.
     • The proposed dataset represents a worst-case industrial application scenario.
     • The shared task results demonstrate that the dataset can be treated as a testing ground for industrial applications of NER and RE.
  23. Thanks for your attention! GitHub: https://github.com/dialogue-evaluation/RuREBus