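Ask Yourself (自問自答): Named Entity Recognition for Precision Medical Services, with Intelligent Claims as a Case Study

Yu-Lun Chiang (江侑倫), Data and Technology R&D Division (數據暨科技研發處), CTBC Bank (中國信託商業銀行)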

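© CTBC Page 1: About me
Yu-Lun Chiang (江侑倫), handle: allenyummy
• Department of Biomechatronics Engineering (undergraduate and graduate school), National Taiwan University
• Institute of Information Science, Academia Sinica
• Data and Technology R&D Division, CTBC Bank
Links: https://allenyummy.github.io, https://allenyummy.medium.com, https://medium.com/allenyummy-note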

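Page 2: Agenda
01 Intelligent medical claims: the application scenario. Starting from the intelligent-claims scenario, this part describes the current state of the industry and its needs, so the audience knows what kind of talk this is.
02 Into the world of natural language processing. This part introduces natural language processing techniques, describes the pain points and difficulties met when applying them to the scenario, and proposes a solution.
03 Results and inspiration. This part presents the results so far and shares the ideas that inspired the work along the way.
04 Q&A. Questions and answers with the audience, both to clear up doubts and to reinforce the content of the talk.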

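Page 3: 01 Intelligent medical claims: the application scenario

Demand: According to Bain & Company's 2019 report on the Asia-Pacific insurance industry, a growing middle class and continued deregulation have made the region's insurance industry the fastest growing in the world. Insurers have begun adopting advanced analytics, machine learning, and other AI-driven analysis tools to improve customer engagement, prevent fraud, and simplify cumbersome internal processes.

Case study: In January 2017 the Japanese insurance giant Fukoku Mutual Life Insurance (富國生命保險) adopted IBM Watson Explorer to review and consolidate the medical certificates, past medical records, surgery costs, and length of hospital stay in each claim, and to cross-check the claimant's contract in order to determine the payout. (News link.)

Results:
• 30%: processing is 30% more efficient than the previous manual review.
• 72%: the claims department shrank from 47 to 13 people, a 72% reduction in headcount.
• JPY 140 million: the system cost JPY 200 million to build and costs JPY 15 million a year to maintain, but it saves about JPY 140 million a year, so it pays for itself in about 2 years.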
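The figures quoted on this slide can be checked with a few lines of arithmetic. The sketch below is only an illustration: the slide does not say whether the JPY 140 million saving is quoted before or after maintenance, so both readings are computed, and all names are made up for this example.

```python
# Figures quoted on the slide, in millions of JPY.
BUILD_COST = 200.0      # one-off build cost
MAINTENANCE = 15.0      # yearly maintenance cost
YEARLY_SAVING = 140.0   # yearly saving (assumption: may or may not include maintenance)

STAFF_BEFORE, STAFF_AFTER = 47, 13


def headcount_reduction(before: int, after: int) -> float:
    """Fraction of positions removed."""
    return (before - after) / before


def payback_years(build: float, saving: float, maintenance: float = 0.0) -> float:
    """Years until the cumulative net saving covers the build cost."""
    return build / (saving - maintenance)


reduction = headcount_reduction(STAFF_BEFORE, STAFF_AFTER)
gross = payback_years(BUILD_COST, YEARLY_SAVING)
net = payback_years(BUILD_COST, YEARLY_SAVING, MAINTENANCE)

print(f"headcount reduction: {reduction:.0%}")
print(f"payback if the saving already accounts for maintenance: {gross:.2f} years")
print(f"payback if maintenance is subtracted from the saving:   {net:.2f} years")
```

Under either reading the payback period is under two years, which is consistent with the "about 2 years" stated on the slide.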

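Page 4: 01 Intelligent medical claims: the application scenario
Goal: extract structured information from diagnosis certificates (診斷證明書).
• Faster data entry and filing, cutting lengthy processing time.
• More accurate claim decisions through deep learning.
Target fields: diagnosis name, admission date, discharge date, emergency-visit date, outpatient date, surgery date, surgery type, treatment, organ, dose, frequency.
Example diagnosis certificate:
Diagnosis: 大腸癌。--以下空白-- (Colorectal cancer. Nothing follows.)
Doctor's note: 病患於西元2019年10月5日至本院入院急診，於10月7日出院。10月16日、10月21日至本院門診追蹤治療。--以下空白-- (The patient was admitted to this hospital through the emergency department on October 5, 2019 and discharged on October 7. The patient returned to this hospital's outpatient clinic for follow-up treatment on October 16 and October 21. Nothing follows.)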

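Page 5: 02 Into the world of natural language processing
Past claims process: a person reads the certificate (manual recognition) and types the fields into the system (manual entry).
• It is time-consuming and labor-intensive, and cases cannot be processed in batches, which lengthens claim review.
• When the caseload grows, accuracy fluctuates and quality cannot be controlled, which risks raising the error rate of claim review.
Intelligent claims process:
diagnosis certificate → optical character recognition (OCR, computer vision) → text → named entity recognition (NER, natural language processing) → structured information → back-end database
Example:
Input text: 大腸癌。--以下空白-- 病患於西元2019年10月5日至本院入院急診，於10月7日出院。10月16日、10月21日至本院門診追蹤治療。--以下空白--
Extracted structured information:
• Diagnosis name (診斷病名): 大腸癌
• Admission date (入院日期): 西元2019年10月5日
• Emergency-visit date (急診日期): 西元2019年10月5日
• Outpatient dates (門診日期): 10月16日、10月21日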
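To make "structured information" concrete, here is a minimal sketch of the record that the NER step could hand to the back-end database. The class, field names, and entity-type keys are hypothetical and chosen for this example; the talk does not specify a schema.

```python
from dataclasses import dataclass, field, asdict
from typing import List, Optional, Tuple


@dataclass
class ClaimRecord:
    """Hypothetical structured record built from one diagnosis certificate."""
    diagnosis: Optional[str] = None
    admission_dates: List[str] = field(default_factory=list)
    emergency_dates: List[str] = field(default_factory=list)
    discharge_dates: List[str] = field(default_factory=list)
    outpatient_dates: List[str] = field(default_factory=list)


# Maps an (assumed) entity-type key to the record field that collects it.
FIELD_FOR_TYPE = {
    "admission": "admission_dates",
    "emergency": "emergency_dates",
    "discharge": "discharge_dates",
    "outpatient": "outpatient_dates",
}


def build_record(entities: List[Tuple[str, str]]) -> ClaimRecord:
    """Group (entity_type, text) pairs emitted by an NER step into one record."""
    record = ClaimRecord()
    for entity_type, text in entities:
        if entity_type == "diagnosis":
            record.diagnosis = text
        else:
            getattr(record, FIELD_FOR_TYPE[entity_type]).append(text)
    return record


# Entities for the example certificate, written by hand for illustration.
ner_output = [
    ("diagnosis", "大腸癌"),
    ("admission", "西元2019年10月5日"),
    ("emergency", "西元2019年10月5日"),
    ("discharge", "10月7日"),
    ("outpatient", "10月16日"),
    ("outpatient", "10月21日"),
]

record = build_record(ner_output)
print(asdict(record))
```

Note that the same date string lands in two fields (admission and emergency visit); this is the situation the later slides call a multi-meaning entity.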

Page 6: 02 Into the world of natural language processing (NER as sequence labeling)
Pipeline recap: OCR (computer vision) → text → NER (natural language processing) → structured information → back-end database.

Sequence labeling (SL) is a kind of classification problem: every token of the input receives exactly one label.
$x = (x_1, x_2, x_3, x_4, \ldots, x_n)$
$y = (y_1, y_2, y_3, y_4, \ldots, y_n)$, with $y_i \in E = \{B\text{-急診}, I\text{-急診}, B\text{-門診}, I\text{-門診}, \ldots, O\}$
$y = f(x)$: the model tries to find the relation between $x$ and $y$.

Example: 病患於西元2019年10月5日至本院急診，於10月7日出院。10月16日、10月21日至本院門診追蹤治療。--以下空白--
• 西元2019年10月5日 → B-急診 I-急診 I-急診 I-急診 I-急診 I-急診 I-急診 I-急診 (emergency visit)
• 10月7日 → B-出院 I-出院 I-出院 I-出院 (discharge)
• 10月16日 → B-門診 I-門診 I-門診 I-門診 (outpatient)
• 10月21日 → B-門診 I-門診 I-門診 I-門診 (outpatient)
• every other token → O

Note: B stands for Begin, I for Inside/Intermediate, O for Outside/Other (Ramshaw and Marcus, 1999).
Note: other tagging schemes exist, such as BIOES.
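The tagging on this slide can be reproduced with a short script. This is a sketch under one assumption: the tokenizer keeps each run of digits as a single token and splits every other character, which matches the tag counts on the slide (eight tags for 西元2019年10月5日) but is not necessarily the tokenizer used in the talk. Function and variable names are illustrative.

```python
import re
from typing import List, Tuple

TEXT = "病患於西元2019年10月5日至本院急診，於10月7日出院。10月16日、10月21日至本院門診追蹤治療。"

# (surface text, label) in reading order; labels follow the slide.
ENTITIES = [
    ("西元2019年10月5日", "急診"),
    ("10月7日", "出院"),
    ("10月16日", "門診"),
    ("10月21日", "門診"),
]


def tokenize(text: str) -> List[Tuple[str, int, int]]:
    """Assumed tokenizer: a run of digits is one token, any other character is its own token."""
    return [(m.group(), m.start(), m.end()) for m in re.finditer(r"\d+|\S", text)]


def bio_tags(text: str, entities: List[Tuple[str, str]]) -> Tuple[List[str], List[str]]:
    """Return the tokens and one BIO tag per token."""
    tokens = tokenize(text)
    tags = ["O"] * len(tokens)
    cursor = 0  # search each entity after the end of the previous one
    for surface, label in entities:
        start = text.index(surface, cursor)
        end = start + len(surface)
        cursor = end
        inside = [i for i, (_, s, e) in enumerate(tokens) if start <= s and e <= end]
        for rank, i in enumerate(inside):
            tags[i] = ("B-" if rank == 0 else "I-") + label
    return [tok for tok, _, _ in tokens], tags


tokens, tags = bio_tags(TEXT, ENTITIES)
for tok, tag in zip(tokens, tags):
    print(tok, tag)
```

With N entity types this scheme needs 2N + 1 labels (a B and an I per type, plus O), and each token can carry only one of them.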

Page 7: 02 Into the world of natural language processing (NER as sequence labeling)
Pipeline recap: OCR (computer vision) → text → NER (natural language processing) → structured information → back-end database.

Transfer learning with BERT and other Transformer models has two stages.

Pretraining: learn contextualized representations from large amounts of unlabeled text through unsupervised tasks.
① Cloze test (Masked Language Modeling, MLM). Input: [CLS] 命名實體應用於精準醫療服務 [SEP]. Output: [CLS] 命名實體應用於精準醫療服務 [SEP], that is, the sentence with the masked tokens restored.
② Next Sentence Prediction (NSP). Input: [CLS] 命名實體應用於精準醫療服務 [SEP] 以智能理賠為例 [SEP]. Output: whether the second sentence follows the first, Yes or No.

Fine-tuning: learn the objective of the downstream task from labeled text.
Input: [CLS] 病患於西元2019年10月5日至本院急診，於10月7日出院。10月16日、10月21日至本院門診追蹤治療。--以下空白-- [SEP]
Output: [CLS] O O B-急診 I-急診 ..... O B-門診 I-門診 ..... O O .... O [SEP]
Downstream tasks: ① sequence labeling (part-of-speech tagging, named entity recognition, word segmentation, …) ② question answering (reading comprehension) …

Sequence labeling (SL) recap: $x = (x_1, x_2, x_3, x_4, \ldots, x_n)$, $y = (y_1, y_2, y_3, y_4, \ldots, y_n)$, $y = f(x)$, with $y_i \in E = \{B\text{-急診}, I\text{-急診}, B\text{-門診}, I\text{-門診}, \ldots, O\}$.

Note: Vaswani et al., 2017 introduced the Transformer, an architecture built on the self-attention mechanism.
Note: After Peters et al., 2018 proposed ELMo, the era of the two-stage framework of pretraining and fine-tuning began.
Note: Devlin et al., 2018 proposed BERT, which brought these ideas together.
Note: For more unsupervised learning tasks designed for Chinese, see Baidu's ERNIE trilogy.
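The two pretraining inputs above differ only in how sentences are packed between [CLS] and [SEP]. The sketch below shows that packing at character level, together with a hand-picked mask for the cloze task. It is a simplification: real BERT pretraining picks mask positions at random and uses a subword vocabulary, and the helper names here are invented for the example.

```python
from typing import Dict, List, Optional, Tuple


def build_input(first: str, second: Optional[str] = None) -> Tuple[List[str], List[int]]:
    """Pack one or two sentences BERT-style; returns (tokens, segment_ids)."""
    tokens = ["[CLS]"] + list(first) + ["[SEP]"]
    segment_ids = [0] * len(tokens)
    if second is not None:
        tail = list(second) + ["[SEP]"]
        tokens += tail
        segment_ids += [1] * len(tail)
    return tokens, segment_ids


def mask_tokens(tokens: List[str], positions: List[int]) -> Tuple[List[str], Dict[int, str]]:
    """Replace the chosen positions with [MASK]; return the masked tokens and the targets."""
    masked = list(tokens)
    targets = {}
    for position in positions:
        targets[position] = masked[position]
        masked[position] = "[MASK]"
    return masked, targets


# Task 1 (MLM): a single sentence with two hand-picked positions masked.
mlm_tokens, _ = build_input("命名實體應用於精準醫療服務")
masked, targets = mask_tokens(mlm_tokens, [3, 4])
print(" ".join(masked))
print(targets)

# Task 2 (NSP): a sentence pair; segment ids tell the two sentences apart.
nsp_tokens, nsp_segments = build_input("命名實體應用於精準醫療服務", "以智能理賠為例")
print(" ".join(nsp_tokens))
print(nsp_segments)
```

The same pair layout is what the question-answering style of fine-tuning reuses later: a question in the first segment and the passage in the second.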

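Page 8: 02 Into the world of natural language processing (multi-meaning entities)
Example: 病患於西元2019年10月5日至本院入院急診，於10月7日出院。10月16日、10月21日至本院門診追蹤治療。--以下空白--
• 西元2019年10月5日 → B-急診 I-急診 I-急診 I-急診 I-急診 I-急診 I-急診 I-急診
• 10月7日 → B-出院 I-出院 I-出院 I-出院
• 10月16日 → B-門診 I-門診 I-門診 I-門診
• 10月21日 → B-門診 I-門診 I-門診 I-門診
• every other token → O
The patient was admitted through the emergency department, so 西元2019年10月5日 is both the emergency-visit date and the admission date. Sequence labeling gives each token a single label, so once the span is tagged as 急診, what happens to the admission date (入院日期)?

Other examples of multi-meaning entities (dates use the ROC calendar, in which year 110 is 2021):
• 病患因上述疾病於110年12月07日門診手術切除囊腫,曾於110年12月09日,110年12月10日,110年12月26日至本院門診治療。 (For the above condition, the patient had a cyst removed in outpatient surgery on 110/12/07, and was treated at this hospital's outpatient clinic on 110/12/09, 110/12/10, and 110/12/26.) Here 110年12月07日 is both an outpatient date and a surgery date.
• 該員因上述病情,110年4月7日入院施行右側乳癌根除手術,到110年4月12日出院,該員於109年3月23日,110年3月31日,110年04月16日至本院就醫,共3次。(以下空白)。 (For the above condition, the patient was admitted on 110/4/7 for a right radical mastectomy and discharged on 110/4/12; the patient visited this hospital on 109/3/23, 110/3/31, and 110/04/16, 3 visits in total.)
• 因上述疾病於110年02月25日曾至門診求診,於同日再度至急診求診予入院治療,於110年02月29日出院,前後住院5日,於110年03月02日至門診追蹤。(以下空白) (For the above condition, the patient visited the outpatient clinic on 110/02/25, returned to the emergency department the same day and was admitted, was discharged on 110/02/29 after 5 days in hospital, and came for outpatient follow-up on 110/03/02.)
Note: the examples above were written by the author.

Recap of the two-stage framework:
Pretraining: learn contextualized representations from large amounts of unlabeled text through unsupervised tasks: ① cloze test (Masked Language Modeling, MLM) ② Next Sentence Prediction (NSP).
Fine-tuning (BERT and other Transformer models): learn the objective of the downstream task from labeled text: ① sequence labeling (part-of-speech tagging, named entity recognition, word segmentation, …) ② question answering (reading comprehension) …
Input: [CLS] 病患於西元2019年10月5日至本院急診，於10月7日出院。10月16日、10月21日至本院門診追蹤治療。--以下空白-- [SEP]
Output: [CLS] O O B-急診 I-急診 ..... O B-門診 I-門診 ..... O O .... O [SEP]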
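The problem raised on this slide is that a one-label-per-token scheme has no room for a second meaning on the same span. The sketch below makes the collision explicit; the token indices assume the digit-run tokenization used for the tag counts earlier in the deck, and the function name is invented for this example.

```python
from typing import List, Optional, Tuple

Span = Tuple[int, int, str]  # (start token, end token exclusive, label)


def assign_labels(n_tokens: int, spans: List[Span]) -> Tuple[List[Optional[str]], List[Span]]:
    """Give each token at most one label and collect the spans that collide."""
    labels: List[Optional[str]] = [None] * n_tokens
    conflicts: List[Span] = []
    for start, end, label in spans:
        if any(labels[i] is not None for i in range(start, end)):
            conflicts.append((start, end, label))  # the span is already taken
            continue
        for i in range(start, end):
            labels[i] = label
    return labels, conflicts


# 病 患 於 | 西 元 2019 年 10 月 5 日 | 至 本 院 入 院 急 診  -> 18 tokens.
# The date occupies tokens 3..10 and carries two meanings.
spans = [
    (3, 11, "急診"),  # emergency-visit date
    (3, 11, "入院"),  # admission date, same tokens
]

labels, conflicts = assign_labels(18, spans)
print(labels)
print("could not be represented:", conflicts)
```

Whichever label is written first wins, and the other meaning is silently lost unless the formulation changes.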

Page 9: 02 Into the world of natural language processing (NER as "ask yourself")
Ask yourself (自問自答) fuses sequence labeling with reading comprehension: a question about one entity type is placed in front of the passage, and the model tags the answer inside the passage.
$x = (q_1, q_2, \ldots, q_n, x_1, x_2, x_3, x_4, \ldots, x_n)$
$y = (y_1, y_2, y_3, y_4, \ldots, y_n)$, with $y_i \in E = \{B, I, O\}$
$y = f(x)$: the model tries to find the relation between $x$ and $y$.

For comparison, fine-tuning as plain sequence labeling (BERT and other Transformer models):
Input: [CLS] 病患於西元2019年10月5日至本院急診，於10月7日出院。10月16日、10月21日至本院門診追蹤治療。--以下空白-- [SEP]
Output: [CLS] O O B-急診 I-急診 ..... O B-門診 I-門診 ..... O O .... O [SEP]

Fine-tuning as ask yourself, one input per question, where P stands for the passage 病患於西元2019年10月5日至本院入院急診，於10月7日出院。10月16日、10月21日至本院門診追蹤治療。
① [CLS] 請找出入院日期。 [SEP] P [SEP] (Find the admission date.)
② [CLS] 請找出急診日期。 [SEP] P [SEP] (Find the emergency-visit date.)
③ [CLS] 請找出出院日期。 [SEP] P [SEP] (Find the discharge date.)
④ [CLS] 請找出門診日期。 [SEP] P [SEP] (Find the outpatient dates.)
⑤ [CLS] 請找出手術名稱。 [SEP] P [SEP] (Find the surgery name.)
…
Output, one tag sequence over the passage per question:
① B I I I I I I I on 西元2019年10月5日, O elsewhere
② B I I I I I I I on 西元2019年10月5日, O elsewhere
③ B I I I on 10月7日, O elsewhere
④ B I I I on 10月16日 and B I I I on 10月21日, O elsewhere
⑤ O everywhere (the passage mentions no surgery)

Pretraining is unchanged: learn contextualized representations from large amounts of unlabeled text through unsupervised tasks: ① cloze test (Masked Language Modeling, MLM) ② Next Sentence Prediction (NSP).
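The construction on this slide, one question per entity type with B/I/O tags over the passage, can be sketched as follows. The tokenizer (digit runs kept whole, other characters split) and the helper names are assumptions made for this example; the author's actual preprocessing lives in allenyummy/EHR_NER and may differ.

```python
import re
from typing import Dict, List, Tuple

PASSAGE = "病患於西元2019年10月5日至本院入院急診，於10月7日出院。10月16日、10月21日至本院門診追蹤治療。"

# One question per entity type, as on the slide.
QUERIES = {
    "入院日期": "請找出入院日期。",
    "急診日期": "請找出急診日期。",
    "出院日期": "請找出出院日期。",
    "門診日期": "請找出門診日期。",
    "手術名稱": "請找出手術名稱。",
}

# Answers per entity type; the same date answers two different questions.
GOLD = {
    "入院日期": ["西元2019年10月5日"],
    "急診日期": ["西元2019年10月5日"],
    "出院日期": ["10月7日"],
    "門診日期": ["10月16日", "10月21日"],
    "手術名稱": [],
}


def tokenize(text: str) -> List[Tuple[str, int, int]]:
    """Assumed tokenizer: digit runs stay whole, every other character is one token."""
    return [(m.group(), m.start(), m.end()) for m in re.finditer(r"\d+|\S", text)]


def tag_passage(passage: str, answers: List[str]) -> List[str]:
    """Tag the passage with B/I/O for one question (first occurrence of each answer)."""
    tokens = tokenize(passage)
    tags = ["O"] * len(tokens)
    for answer in answers:
        start = passage.index(answer)
        end = start + len(answer)
        inside = [i for i, (_, s, e) in enumerate(tokens) if start <= s and e <= end]
        for rank, i in enumerate(inside):
            tags[i] = "B" if rank == 0 else "I"
    return tags


def make_examples(passage: str, queries: Dict[str, str], gold: Dict[str, List[str]]) -> List[dict]:
    """Turn one passage into one training example per question."""
    passage_tokens = [tok for tok, _, _ in tokenize(passage)]
    examples = []
    for entity_type, question in queries.items():
        question_tokens = [tok for tok, _, _ in tokenize(question)]
        examples.append({
            "type": entity_type,
            "input": ["[CLS]"] + question_tokens + ["[SEP]"] + passage_tokens + ["[SEP]"],
            "tags": tag_passage(passage, gold[entity_type]),
        })
    return examples


examples = make_examples(PASSAGE, QUERIES, GOLD)
for example in examples:
    print(example["type"], " ".join(example["tags"]))
```

Because every question gets its own tag sequence, the admission-date and emergency-date questions can both mark the same span without conflict, and the label set stays at three tags no matter how many entity types are added.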

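Page 10: 02 Into the world of natural language processing (ask yourself vs. sequence labeling)

Ask yourself (自問自答), fusing sequence labeling and reading comprehension:
$x = (q_1, q_2, \ldots, q_n, x_1, x_2, x_3, x_4, \ldots, x_n)$, $y = (y_1, y_2, y_3, y_4, \ldots, y_n)$, $y = f(x)$, with $y_i \in E = \{B, I, O\}$

Sequence labeling (SL), a kind of classification problem:
$x = (x_1, x_2, x_3, x_4, \ldots, x_n)$, $y = (y_1, y_2, y_3, y_4, \ldots, y_n)$, $y = f(x)$, with $y_i \in E = \{B\text{-急診}, I\text{-急診}, B\text{-門診}, I\text{-門診}, \ldots, O\}$

                          Ask yourself                          Sequence labeling
Number of entity types    N                                     N
Number of labels          3                                     2*N+1
Dataset size              N x M                                 M
Advantages                Handles both flat and                 Handles flat entities only
                          multi-meaning entities
                          More training data                    Short training time
Disadvantages             Training time grows about N-fold,     Cannot handle multi-meaning
                          since the dataset grows from          entities
                          M to N x M examples
                          Makes data imbalance worse            Data imbalance is still present

Recap of the two-stage framework:
Pretraining: learn contextualized representations from large amounts of unlabeled text through unsupervised tasks: ① cloze test (Masked Language Modeling, MLM) ② Next Sentence Prediction (NSP).
Fine-tuning (BERT and other Transformer models): learn the objective of the downstream task from labeled text: ① sequence labeling (part-of-speech tagging, named entity recognition, word segmentation, …) ② question answering (reading comprehension) …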
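The label and dataset rows of the comparison follow directly from the two formulations. A small sketch, with N as the number of entity types and M as the number of documents; the function names and the sample values of N and M are illustrative and not taken from the talk.

```python
def sequence_labeling_cost(n_types: int, n_docs: int) -> dict:
    """One B and one I label per entity type plus O; one example per document."""
    return {"labels": 2 * n_types + 1, "examples": n_docs}


def ask_yourself_cost(n_types: int, n_docs: int) -> dict:
    """Always B, I, O; one example per (question, document) pair."""
    return {"labels": 3, "examples": n_types * n_docs}


N, M = 5, 1000  # illustrative values only
sl = sequence_labeling_cost(N, M)
qa = ask_yourself_cost(N, M)

print("sequence labeling:", sl)
print("ask yourself:     ", qa)
print("dataset growth factor:", qa["examples"] // sl["examples"])
```

The growth factor equals the number of questions asked per document, which is why training takes longer and why examples whose tags are all O (questions with no answer in the passage) become more common.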

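Page 11: 02 Into the world of natural language processing (implementation)
Tools: Huggingface transformers; allenyummy/EHR_NER [link].
Framework: pretraining (learn contextualized representations from large amounts of unlabeled text through unsupervised tasks: ① MLM ② NSP), then fine-tuning (learn the objective of the downstream task from labeled text: ① sequence labeling ② question answering …).

Choose a suitable pretrained model: Huggingface model hub [link], hfl/chinese-bert-wwm.
Note: hfl/chinese-bert-wwm was released by the Joint Laboratory of HIT and iFLYTEK Research (哈工大訊飛聯合實驗室) (Cui et al., 2019).

The details:
① Design the model architecture for the downstream task, mainly the input layer and the output layer.
② Prepare the dataset and handle it according to its characteristics: for an imbalanced dataset, adjust the loss function, add regularization, …
③ Choose suitable metrics: precision, recall, F1 score.
④ Tune hyperparameters to find the best configuration: transformers built-in functions, optuna, ray, talos.
⑤ Training techniques: early stopping, dropout, clipping, warm start, …

The three parts that come together:
Model architecture. Note: for common downstream tasks, hf has already written the architecture behind its Auto classes (for example AutoModelForTokenClassification), so calling them is enough. For other needs, subclass transformers.XXXPreTrainedModel (for example BertPreTrainedModel) and implement a custom architecture (for details see allenyummy/EHR_NER/models).
Dataset. Arrange the dataset into the inputs the input layer needs: token embeddings + segment embeddings + positional embeddings.
• Subclass torch.utils.data.Dataset (map-style) and implement __getitem__() and __len__(), or
• use the huggingface/datasets repo to prepare the inputs.
Note: Trainer builds its data stream with torch.utils.data.DataLoader, which works directly with torch.utils.data.Dataset. For details see allenyummy/EHR_NER/utils/feaproducer.py.
Trainer. Use Trainer (PyTorch) or TFTrainer (TensorFlow), pass in the model architecture, the dataset, the training hyperparameters, and the scoring function, and start fine-tuning the model parameters.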
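As an illustration of the map-style dataset described above, here is a minimal sketch that implements __getitem__() and __len__() for question-plus-passage examples. It is not the code in allenyummy/EHR_NER: the class name, the toy vocabulary, and the fixed-length padding are assumptions made for this example, and the import falls back to a plain object so the sketch also runs where PyTorch is not installed.

```python
try:
    from torch.utils.data import Dataset  # real base class when PyTorch is installed
except ImportError:  # fallback so the sketch still runs without PyTorch
    Dataset = object

IGNORE = -100  # default ignore_index of torch.nn.CrossEntropyLoss


class AskYourselfDataset(Dataset):
    """Map-style dataset: one item per (question, passage) pair."""

    def __init__(self, examples, vocab, tag_ids, max_len=32):
        self.examples = examples  # list of (question tokens, passage tokens, passage tags)
        self.vocab = vocab
        self.tag_ids = tag_ids
        self.max_len = max_len

    def __len__(self):
        return len(self.examples)

    def __getitem__(self, index):
        question, passage, tags = self.examples[index]
        tokens = ["[CLS]"] + question + ["[SEP]"] + passage + ["[SEP]"]
        # Segment 0 covers [CLS] question [SEP]; segment 1 covers passage [SEP].
        segments = [0] * (len(question) + 2) + [1] * (len(passage) + 1)
        # Only passage tokens are scored; special and question tokens are ignored.
        labels = [IGNORE] * (len(question) + 2) + [self.tag_ids[t] for t in tags] + [IGNORE]
        pad = self.max_len - len(tokens)
        if pad < 0:
            raise ValueError("example longer than max_len; truncation is not handled here")
        input_ids = [self.vocab.get(t, self.vocab["[UNK]"]) for t in tokens]
        return {
            "input_ids": input_ids + [self.vocab["[PAD]"]] * pad,
            "token_type_ids": segments + [0] * pad,
            "attention_mask": [1] * len(tokens) + [0] * pad,
            "labels": labels + [IGNORE] * pad,
        }


# Toy example with a hand-built vocabulary (a real run would use the model's tokenizer).
question = list("請找出出院日期。")
passage = ["於", "10", "月", "7", "日", "出", "院", "。"]
tags = ["O", "B", "I", "I", "I", "O", "O", "O"]

specials = ["[PAD]", "[UNK]", "[CLS]", "[SEP]"]
vocab = {tok: i for i, tok in enumerate(specials + sorted(set(question + passage)))}
tag_ids = {"O": 0, "B": 1, "I": 2}

dataset = AskYourselfDataset([(question, passage, tags)], vocab, tag_ids)
item = dataset[0]
print(len(dataset), item["labels"])
```

The items are plain Python lists here; in an actual training run they would be converted to tensors, either in __getitem__() or by a collate function.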

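Page 12: 03 Results
Source: Chiang et al., 2021, Nested Named Entity Recognition for Chinese Electronic Health Records with QA-based Sequence Labeling.

                                       Flat NER dataset          Nested NER dataset
                                       (flat entities only)      (flat and multi-meaning entities)
Number of doctor's notes (醫囑)        4,328                     7,907
Average characters per note            70.43                     76.08
Number of flat entities                21,616                    43,577
Number of multi-meaning entities       0                         6,978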

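Page 13: 03 Results
Input: 病患於西元2019年10月5日至本院入院急診，於10月7日出院。10月16日、10月21日至本院門診追蹤治療。

Sequence labeling output:
• Emergency-visit date (急診日期): 西元2019年10月5日
• Discharge date (出院日期): 10月7日
• Outpatient date (門診日期): 10月16日
• Outpatient date (門診日期): 10月21日

Ask yourself output:
• Admission date (入院日期): 西元2019年10月5日
• Emergency-visit date (急診日期): 西元2019年10月5日
• Discharge date (出院日期): 10月7日
• Outpatient date (門診日期): 10月16日
• Outpatient date (門診日期): 10月21日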
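To connect this example with the precision, recall, and F1 metrics named earlier in the deck, the sketch below scores the two outputs at entity level. The gold set is an assumption made for illustration (the five type and text pairs a human annotator would presumably mark for this sentence); the numbers it prints describe this single sentence only and are not results reported in the talk or the paper.

```python
from typing import Set, Tuple

Entity = Tuple[str, str]  # (entity type, surface text)


def prf(predicted: Set[Entity], gold: Set[Entity]) -> Tuple[float, float, float]:
    """Entity-level precision, recall, and F1 with exact match on type and text."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Assumed gold annotation for the example sentence.
GOLD = {
    ("入院日期", "西元2019年10月5日"),
    ("急診日期", "西元2019年10月5日"),
    ("出院日期", "10月7日"),
    ("門診日期", "10月16日"),
    ("門診日期", "10月21日"),
}

# Outputs as listed on the slide.
SEQUENCE_LABELING = GOLD - {("入院日期", "西元2019年10月5日")}
ASK_YOURSELF = set(GOLD)

sl_scores = prf(SEQUENCE_LABELING, GOLD)
qa_scores = prf(ASK_YOURSELF, GOLD)
print("sequence labeling  P=%.2f R=%.2f F1=%.2f" % sl_scores)
print("ask yourself       P=%.2f R=%.2f F1=%.2f" % qa_scores)
```

Sequence labeling makes no wrong prediction here but cannot emit the admission date at all, so the loss shows up in recall rather than precision.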

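Page 14: 03 Results
• Patent granted: Doctor's-Note Information Extraction System (醫囑資訊擷取系統), listed in the Taiwan Patent Search System (中華民國專利資訊檢索系統).
• Paper published: accepted as a short paper at the 33rd Conference on Computational Linguistics and Speech Processing (ROCLING 2021). Chiang et al., 2021, Nested Named Entity Recognition for Chinese Electronic Health Records with QA-based Sequence Labeling.
• Source code released: allenyummy/EHR_NER [link] on GitHub.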

Page 15: 03 Inspiration
以銅為鑑，可正衣冠；以古為鑑，可知興替；以人為鑑，可明得失。 (With bronze as a mirror, one can straighten one's attire; with history as a mirror, one can understand the rise and fall of states; with other people as a mirror, one can see one's own gains and losses.)

Stack-based approaches: extract entities from the outside in (or from the inside out).
• Alex et al., 2007, multi-layer CRFs
• Ju et al., 2018, stacked flat NER layer
• Wang et al., 2020a, pyramid layer

Graph-based approaches: use a graph to extract entities.
• Finkel and Manning, 2009, CRF with parse tree
• Lu and Roth, 2015, hypergraph
• Wang and Lu, 2018, neural segmental hypergraph
• Katiyar and Cardie, 2018, LSTM with hypergraph
• Luo and Zhao, 2020, bipartite flat graph network

Region-based approaches: first locate the entity, then assign it a label.
• Xu et al., 2017, FOFE & FFNN
• Fisher and Vlachos, 2019, merge and label
• Xia et al., 2019, detect and classify
• Zheng et al., 2019, get boundary and then classify
• Wang et al., 2020b, head-tail detector and token tagger

Machine Reading Comprehension approaches: recast NLP problems in a question-answering framework.
• Levy et al., 2017, MRC for relation extraction
• Li et al., 2019, MRC for relation extraction
• McCann et al., 2018, MRC for NLP Decathlon
• Yin et al., 2020, MRC for sentiment analysis
• Li et al., 2020, MRC for named entity recognition

Complemented by an extraction strategy:
• Segal et al., 2019, multi-span extraction

Page 16: References
[6-1] Lance A. Ramshaw and Mitchell P. Marcus. 1999. Text chunking using transformation-based learning. In Natural Language Processing Using Very Large Corpora. Springer, 157–176.
[7-1] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., ... & Polosukhin, I. 2017. Attention is all you need. In Advances in Neural Information Processing Systems, 5998–6008.
[7-2] Matthew E. Peters, Mark Neumann, Mohit Iyyer, Matt Gardner, Christopher Clark, Kenton Lee, and Luke Zettlemoyer. 2018. Deep contextualized word representations. arXiv preprint arXiv:1802.05365.
[7-3] Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2018. BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805.
[7-4] Sun, Y., Wang, S., Li, Y., Feng, S., Chen, X., Zhang, H., ... & Wu, H. 2019. ERNIE: Enhanced representation through knowledge integration. arXiv preprint arXiv:1904.09223.
[7-5] Sun, Y., Wang, S., Li, Y., Feng, S., Tian, H., Wu, H., & Wang, H. 2020. ERNIE 2.0: A continual pre-training framework for language understanding. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 34, No. 05, 8968–8975.
[7-6] Sun, Y., Wang, S., Feng, S., Ding, S., Pang, C., Shang, J., ... & Wang, H. 2021. ERNIE 3.0: Large-scale knowledge enhanced pre-training for language understanding and generation. arXiv preprint arXiv:2107.02137.
[11-1] Yiming Cui, Wanxiang Che, Ting Liu, Bing Qin, Ziqing Yang, Shijin Wang, and Guoping Hu. 2019. Pre-training with whole word masking for Chinese BERT. arXiv preprint arXiv:1906.08101.

Page 17: References (continued)
【stack-based approaches】
[15-1] Beatrice Alex, Barry Haddow, and Claire Grover. 2007. Recognising nested named entities in biomedical text. In Biological, Translational, and Clinical Language Processing, 65–72.
[15-2] Meizhi Ju, Makoto Miwa, and Sophia Ananiadou. 2018. A neural layered model for nested named entity recognition. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers), 1446–1459.
[15-3] Wang, J., Shou, L., Chen, K., & Chen, G. 2020. Pyramid: A layered model for nested named entity recognition. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, 5918–5928.
【graph-based approaches】
[15-4] Jenny Rose Finkel and Christopher D. Manning. 2009. Nested named entity recognition. In Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing, 141–150.
[15-5] Wei Lu and Dan Roth. 2015. Joint mention extraction and classification with mention hypergraphs. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, 857–867.
[15-6] Wang, B., & Lu, W. 2018. Neural segmental hypergraphs for overlapping mention recognition. arXiv preprint arXiv:1810.01817.
[15-7] Arzoo Katiyar and Claire Cardie. 2018. Nested named entity recognition revisited. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers), 861–871.
[15-8] Luo, Y., & Zhao, H. 2020. Bipartite flat-graph network for nested named entity recognition. arXiv preprint arXiv:2005.00436.
【region-based approaches】
[15-9] Mingbin Xu, Hui Jiang, and Sedtawut Watcharawittayakul. 2017. A local detection approach for named entity recognition and mention detection. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 1237–1247.
[15-10] Joseph Fisher and Andreas Vlachos. 2019. Merge and label: A novel neural network architecture for nested NER. arXiv preprint arXiv:1907.00464.
[15-11] Congying Xia, Chenwei Zhang, Tao Yang, Yaliang Li, Nan Du, Xian Wu, Wei Fan, Fenglong Ma, and Philip Yu. 2019. Multi-grained named entity recognition. arXiv preprint arXiv:1906.08449.
[15-12] Zheng, C., Cai, Y., Xu, J., Leung, H. F., & Xu, G. 2019. A boundary-aware neural model for nested named entity recognition. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP). Association for Computational Linguistics.
[15-13] Wang, Y., Li, Y., Tong, H., & Zhu, Z. 2020. HIT: Nested named entity recognition via head-tail pair and token interaction. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), 6027–6036.
【machine reading comprehension approaches】
[15-14] Levy, O., Seo, M., Choi, E., & Zettlemoyer, L. 2017. Zero-shot relation extraction via reading comprehension. arXiv preprint arXiv:1706.04115.
[15-15] Li, X., Yin, F., Sun, Z., Li, X., Yuan, A., Chai, D., ... & Li, J. 2019. Entity-relation extraction as multi-turn question answering. arXiv preprint arXiv:1905.05529.
[15-16] McCann, B., Keskar, N. S., Xiong, C., & Socher, R. 2018. The natural language decathlon: Multitask learning as question answering. arXiv preprint arXiv:1806.08730.
[15-17] Yin, D., Meng, T., & Chang, K. W. 2020. SentiBERT: A transferable transformer-based architecture for compositional sentiment semantics. arXiv preprint arXiv:2005.04114.
[15-18] Xiaoya Li, Jingrong Feng, Yuxian Meng, Qinghong Han, Fei Wu, and Jiwei Li. 2019. A unified MRC framework for named entity recognition. arXiv preprint arXiv:1910.11476.
【multi-span extraction】
[15-19] Elad Segal, Avia Efrat, Mor Shoham, Amir Globerson, and Jonathan Berant. 2019. A simple and effective model for answering multi-span questions. arXiv preprint arXiv:1909.13375.

Page 18: Q&A
Thank you for your attention. Stay tuned!
Yu-Lun Chiang (allenyummy)
Slides: https://speakerdeck.com/allenyummy/zi-wen-zi-da-ming-ming-shi-ti-shi-bie-ying-yong-yu-jing-zhun-yi-liao-fu-wu-yi-zhi-neng-li-pei-wei-li
