
PACLIC2022_Japanese Named Entity Recognition from Automatic Speech Recognition Using Pre-trained Models

maskcott
October 24, 2022

Transcript

  1. Japanese Named Entity Recognition from Automatic Speech Recognition Using Pre-trained Models
     Seiichiro Kondo*1, Naoya Ueda*1, Teruaki Oka*1, Masakazu Sugiyama*2, Asahi Hentona*2, Mamoru Komachi*1
     *1 Tokyo Metropolitan University  *2 AI Shift  @PACLIC 2022
  2. Background
     • We plan to address interactive voice response services in two steps:
       1. Named entity recognition (NER)
       2. Entity linking (EL)
     • In this study, we tackle NER from speech recognition results.
     [Figure: pipeline. The user's voice input "いちごっぱの交通情報を教えて" (Give me traffic information on "いちごっぱ") goes through ASR; NER (the subject of this study) extracts "いちごっぱ"; linking against a named entity dictionary (e.g. an entry with value "国道158号" and synonyms "158", "いちごっぱ", "1コッパ", ...) resolves it; response generation then returns traffic information on "国道158号" (Japan National Route 158) to the user.]
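     The slide shows only a flattened fragment of the dictionary. A minimal reconstruction as a Python structure, assuming the "value"/"synonyms" fields visible in the fragment; the 国道157号 entry is an assumed illustration of the truncated ": '157', ] }" portion:

     ```python
     # Sketch of the named-entity dictionary implied by the slide's fragment.
     # Only the 国道158号 fields appear in the deck; the 国道157号 entry and
     # its synonym list are assumptions for illustration.
     named_entity_dictionary = [
         {
             "value": "国道157号",                        # assumed entry
             "synonyms": ["157"],                         # "157" appears in the fragment
         },
         {
             "value": "国道158号",                        # Japan National Route 158
             "synonyms": ["158", "いちごっぱ", "1コッパ"],  # aliases and ASR variants
         },
     ]
     ```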
  3. Japanese NER from Automatic Speech Recognition (ASR)
     • Problems
       ◦ ASR errors
       ◦ Unknown named entities arising from abbreviations and aliases
       ◦ Surface forms and meanings can be entirely different despite close sounds
     → NER with conventional methods is difficult
     Example: 国道158号 (Route 158; Kokudou-hyaku-goju-hachi-go)
       abbreviation: 158 (Hyaku-goju-hachi / ichi-go-hachi)
       alias: いちごっぱ (ichi-go-ppa)
       ASR errors: 1コッパ (ichikoppa), イチゴったー (ichigotta), 15PA (ichigopa)
  4. Our Setting
     • Previous studies
       ◦ Raghuvanshi et al. (2019) used additional information not contained in the text
       ◦ Omachi et al. (2021) postulated that an end-to-end (E2E) approach might be preferable
     • Processing ASR texts
       ◦ No information other than the text is used
       ◦ We use an existing ASR system to enable flexible exchange of modules and resources
  5. Our Method
     • Using pre-trained models
       ◦ Contextual information may be used effectively against ASR errors
       ◦ We assume that pre-trained models, trained on a large number of sentences, can exploit contextual information effectively
     • Models used in our experiments
       ◦ BERT-based model (Devlin et al., 2019): encoder model
       ◦ T5 (Raffel et al., 2020): encoder-decoder model
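     The deck does not show how NER is serialized as text-to-text for T5. One plausible framing, with entirely assumed template strings, pairs the utterance (prefixed by the label type) with the entity span the decoder should generate:

     ```python
     # Hypothetical text-to-text framing of NER for T5; the actual
     # input/target templates used in the paper are not shown in the deck.
     source = "route: 鯖江から福井まで国道8号線"  # label type as an assumed prefix
     target = "国道8号線"                         # entity span the decoder should emit
     ```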
  6. NER using BERT-based models
     [Figure: token classification with BERT. A token specifying the label type ("route") is prepended to the input sentence in subword units, e.g. "route | 鯖 ##江 | から | 福井 | まで | 国道 | 8 | 号 | 線" (Route 8 from Sabae to Fukui); BERT assigns each token a named-entity label such as B-route / I-route / O, with B-label marking the label-specification token.]
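     A minimal sketch of this setup with HuggingFace Transformers. The checkpoint name and label set are assumptions (the deck does not name them), and the real system presumably registers "route" as a dedicated token rather than a plain string:

     ```python
     import torch
     from transformers import AutoTokenizer, AutoModelForTokenClassification

     # Assumed Japanese BERT checkpoint and label inventory.
     MODEL = "cl-tohoku/bert-base-japanese"
     labels = ["O", "B-route", "I-route", "B-address", "I-address", "B-label"]

     tokenizer = AutoTokenizer.from_pretrained(MODEL)
     model = AutoModelForTokenClassification.from_pretrained(MODEL, num_labels=len(labels))

     # Prepend the label-specification token, mirroring the slide's input layout.
     text = "route 鯖江から福井まで国道8号線"
     inputs = tokenizer(text, return_tensors="pt")

     with torch.no_grad():
         logits = model(**inputs).logits          # (1, seq_len, num_labels)
     pred_ids = logits.argmax(dim=-1)[0].tolist()

     # Untrained classification head: this only illustrates the input/output
     # shapes, i.e. one BIO tag per subword.
     tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])
     for token, pid in zip(tokens, pred_ids):
         print(token, labels[pid])
     ```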
  7. Data details
     • Data source
       ◦ System-driven dialogue logs containing road traffic information in Fukui, Japan
       ◦ Data were collected over a certain period, not arbitrarily sampled
     • Prepared dictionary
       ◦ Two dictionaries: addresses in Fukui and routes in Fukui
       ◦ Certain aliases, abbreviations, and speech recognition errors were also registered
  8. Data details
     • Two types of data
       ◦ Match: NER succeeded in the existing system; labeled by dictionary matching
       ◦ Fallback: NER failed in the existing system; manually annotated considering ASR errors, based on whether the named entities exist in Fukui
     Examples:
       Match:    鯖江から敦賀市へ向かう高速道路 (Highway from Sabae to Tsuruga City)
       Fallback: えーとサザエさん、サザエ市春江町 (Well, Sazae-san, Sazae City, Harue-cho)
     • "サザエ (Sazae, turban shell)" is a recognition error of "鯖江 (Sabae)"
     • "春江町 (Harue-cho)" was not registered in the dictionary
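     A hedged sketch of the dictionary matching that labels the Match data; the real system's matching rules are not described in the deck, so longest-match-first over surface forms is an assumption:

     ```python
     # Sketch: label an ASR transcript by longest-first dictionary matching.
     route_dictionary = [
         {"value": "国道158号", "synonyms": ["158", "いちごっぱ", "1コッパ"]},
     ]

     def dictionary_match(text, dictionary):
         """Return (surface form, canonical value) pairs found in `text`."""
         # Try longer surface forms first so "国道158号" wins over "158".
         surfaces = sorted(
             ((s, e["value"]) for e in dictionary for s in [e["value"], *e["synonyms"]]),
             key=lambda pair: len(pair[0]),
             reverse=True,
         )
         matches, rest = [], text
         for surface, value in surfaces:
             if surface in rest:
                 matches.append((surface, value))
                 rest = rest.replace(surface, "", 1)  # avoid overlapping re-matches
         return matches

     print(dictionary_match("いちごっぱの交通情報を教えて", route_dictionary))
     # -> [('いちごっぱ', '国道158号')]
     ```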
  9. Data details
     Data statistics:
                            train    dev    test
       Match     utterance  1,757    220     220
                 address    1,220    144     147
                 route        802    104     110
       Fallback  utterance    949    118     122
                 address      197     30      26
                 route         92      8      17
     • Randomly split so that train, dev, and test are 8:1:1
     • The Match data is approximately twice as large as the Fallback data
  10. Experiment settings
     • Four NER systems
       ◦ String-matching model based on a dictionary
       ◦ Two pre-trained BERT-based models: BERT and ELECTRA
       ◦ T5
     • Pre-trained model information and fine-tuning hyperparameters:
                   lr        batch size  epochs  pre-training data
       BERT        0.00005   8           3       wiki (30M)
       ELECTRA     0.00005   8           20      mC4 (200M)
       T5          0.0005    8           20      mC4 + wiki40b
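     These hyperparameters map directly onto a standard fine-tuning configuration. A sketch with HuggingFace TrainingArguments for the BERT row; the output path is assumed, and the model/dataset objects are placeholders:

     ```python
     from transformers import TrainingArguments

     # Fine-tuning configuration matching the BERT row of the table above;
     # ELECTRA uses the same lr with 20 epochs, T5 uses lr=0.0005 with 20 epochs.
     args = TrainingArguments(
         output_dir="ner-bert",               # assumed output path
         learning_rate=5e-5,                  # 0.00005 in the table
         per_device_train_batch_size=8,
         num_train_epochs=3,
     )
     # A Trainer would then be built with the token-classification model and
     # the Match/Fallback datasets, e.g.:
     # Trainer(model=model, args=args, train_dataset=train_data).train()
     ```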
  11. Result
     • For the Match test data, the performance of the pre-trained models was equivalent to that of the dictionary match
     • For the Fallback data, using a pre-trained model improved performance over the dictionary match
     • Adding Fallback data to the training data improved performance significantly, especially for T5
     • Comparison with human recognition suggests the pre-trained models still leave room for improvement

     Match data (P: precision, R: recall):
                      Trained using all data    Trained on Match data
       method         P     R     F1            P     R     F1
       String Match   96.3  100   98.1          -     -     -
       BERT           97.3  97.3  97.3          97.3  97.3  97.3
       ELECTRA        96.9  98.1  97.5          97.7  98.1  97.9
       T5             98.0  97.7  97.9          97.3  97.7  97.5

     Fallback data:
                      Trained using all data    Trained on Match data
       method         P     R     F1            P     R     F1
       human          80.0  97.6  87.9          -     -     -
       String Match   50.0  23.3  31.7          -     -     -
       BERT           67.9  83.7  75.0          58.8  46.5  51.9
       ELECTRA        66.0  72.1  68.9          54.5  41.9  47.4
       T5             74.0  86.0  79.6          41.3  60.5  49.1
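     These scores are standard entity-level precision/recall/F1. A tiny helper, checked against the Fallback String Match row; the raw counts (10 true positives out of 20 predictions, with 43 gold entities = 26 addresses + 17 routes in the Fallback test set) are inferred for illustration, not stated in the deck:

     ```python
     def prf1(tp, fp, fn):
         """Entity-level precision, recall, and F1 from raw counts."""
         p = tp / (tp + fp) if tp + fp else 0.0
         r = tp / (tp + fn) if tp + fn else 0.0
         f1 = 2 * p * r / (p + r) if p + r else 0.0
         return p, r, f1

     # Inferred counts reproducing the Fallback String Match row:
     # P = 10/20 = 50.0, R = 10/43 = 23.3, F1 = 31.7
     print(prf1(tp=10, fp=10, fn=33))
     ```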
  12. Error analysis in fallback data
                       False positive    False negative
       method  total     NT    PM          ND    AE    others
       Human     11       8     2           0     0     1
       BERT      21      14     3           2     2     0
       T5        19      13     1           1     4     0
       NT: not tagged as a named entity in the test data
       PM: partial match to the extraction span
       ND: named entity not in the dictionary
       AE: ASR error
     • BERT's extraction errors occur more often as false positives
       ◦ Some incorrect spans are labeled starting with "I"
     • Among the false-negative errors, several challenges remain
       ◦ In many of these cases, the user's utterance was too short to make use of contextual information
       ◦ For ND, it may be necessary to use external data
       ◦ For AE, although it would be ASR-system dependent, error-type-specific data may be effective
  13. Examples
       model               text                                        translation
       BERT/T5 (address)   横倉ってどこや                              Where is Yokokura
       BERT (route)        青年の道                                    Youth Road
       BERT (address)      あの高みの方のエルパ行きのバスは取った後    After taking the bus to Elpa at that height
       T5 (address)        あの高みの方のエルパ行きのバスは取った後    (same utterance)
     Bold and underlined texts denote the reference and hypothesis.
     Glosses: 道: road; 高みの(方の): height; 方: direction
  14. Conclusion
     • Findings
       ◦ Data generated by dictionary matching was extracted equally well by the pre-trained models
       ◦ Pre-trained models can extract unknown named entities
       ◦ Adding some manually annotated data is effective
  15. T5 (Text-To-Text Transfer Transformer)
     • Encoder-decoder model
     • BERT-style pre-training
     [Figure: BERT-style pre-training diagram. A sequence x1 x2 x3 x4 x5 x6 x7 x8 is corrupted (x1 _ _ x4 x5 _ x7, masked positions shown as "_"); the decoder, starting from BOS and attending to the encoder, reconstructs the original sequence x1 x2 x3 x4 x5 x6 x7 x8.]
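     A sketch of one reading of the diagram: BERT-style denoising in which the encoder sees the masked sequence and the decoder reconstructs the original in full. The mask token name and positions (x2, x3, x6) follow the slide; the exact recipe used by the model in the paper is not given in the deck:

     ```python
     # Hedged illustration of BERT-style denoising for an encoder-decoder.
     tokens = ["x1", "x2", "x3", "x4", "x5", "x6", "x7", "x8"]
     masked = {1, 2, 5}   # 0-indexed positions shown as "_" on the slide

     # Encoder input: the corrupted sequence.
     encoder_input = [t if i not in masked else "<mask>" for i, t in enumerate(tokens)]
     # Decoder input is the target shifted right by one (teacher forcing);
     # the target is the full original sequence.
     decoder_input = ["<bos>"] + tokens[:-1]
     target = tokens

     print(encoder_input)  # ['x1', '<mask>', '<mask>', 'x4', 'x5', '<mask>', 'x7', 'x8']
     print(target)         # ['x1', 'x2', 'x3', 'x4', 'x5', 'x6', 'x7', 'x8']
     ```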
  16. Covered Evaluation
     • For the Match test data, the performance of the pre-trained models was equivalent to that of the dictionary match
     • For the Fallback data, using a pre-trained model improved performance over the dictionary match
     • Adding Fallback data to the training data improved performance significantly, especially for T5

     (c_P, c_R, c_F1 denote scores under the covered evaluation)

     Trained using all data:
       method        data      P     R     F1    c_P   c_R   c_F1
       String Match  Match     96.3  100   98.1  96.3  100   98.1
                     Fallback  50.0  23.3  31.7  50.0  23.3  31.7
       BERT          Match     97.3  97.3  97.3  99.2  99.2  99.2
                     Fallback  67.9  83.7  75.0  67.9  83.7  75.0
       ELECTRA       Match     96.9  98.1  97.5  98.1  99.2  98.6
                     Fallback  66.0  72.1  68.9  66.0  72.1  68.9
       T5            Match     98.0  97.7  97.9  99.2  98.8  99.0
                     Fallback  74.0  86.0  79.6  74.0  86.0  79.6

     Trained on Match data (String Match is not trained, so it has no entries):
       method        data      P     R     F1    c_P   c_R   c_F1
       BERT          Match     97.3  97.3  97.3  98.8  98.8  98.8
                     Fallback  58.8  46.5  51.9  58.8  46.5  51.9
       ELECTRA       Match     97.7  98.1  97.9  99.2  99.6  99.4
                     Fallback  54.5  41.9  47.4  57.6  44.2  50.0
       T5            Match     97.3  97.7  97.5  98.5  98.8  98.6
                     Fallback  41.3  60.5  49.1  42.3  62.8  50.9
  17. Additional Examples
       model               text                                translation
       BERT (address)      吉田郡永平寺町                      Yoshida-gun Eiheiji-cho
       T5 (address)        田尻町から福井市までの福井市内まで  From Tajiri-cho to Fukui City to Fukui City
       BERT/T5 (route)     イチゴったー                        Ichigotta
       T5 (address)        低い                                low
       BERT/T5 (route)     アイワかどう                        Aiwakado
     Bold and underlined texts denote the reference and hypothesis.
     • イチゴったー (ichigotta): possibly an ASR error of "いちごっぱ (ichi-go-ppa, 158)", a colloquial expression for "国道158号 (kokudou-hyaku-goju-hachi-gou, Japan National Route 158)"
     • 低い (hikui): possibly an ASR error of "福井 (Fukui)"
     • アイワかどう (aiwakado): possibly an ASR error of "舞若道 (maiwakado)", an abbreviation of "舞鶴若狭自動車道 (Maizuru-wakasa-jidosyado)"