Slide 1

Slide 1 text

Decomposed Meta-Learning for Few-Shot Named Entity Recognition Tingting Ma1, Huiqiang Jiang2, Qianhui Wu2, Tiejun Zhao1, Chin-Yew Lin2 1 Harbin Institute of Technology, Harbin, China 2 Microsoft Research Asia ACL 2022 Presented by Toshihiko Sakai, 2023/6/5

Slide 2

Slide 2 text

What is this paper about?
・Proposes few-shot span detection and few-shot entity typing for few-shot Named Entity Recognition
・Defines few-shot span detection as a sequence labeling problem
・Trains the span detector with MAML (model-agnostic meta-learning) to find a good model parameter initialization
・Proposes MAML-ProtoNet to find a good embedding space
Key point of the proposed method
・A decomposed meta-learning procedure that trains the span detection model and the entity typing model separately
How to verify the advantage and effectiveness of the proposal
・Evaluates the proposed method on two groups of datasets and validates the design by an ablation study
・Compares the proposed method with other few-shot NER methods that use meta-learning
Related papers to read afterwards
・Triantafillou+: Meta-Dataset: A Dataset of Datasets for Learning to Learn from Few Examples, ICLR '20
2

Slide 3

Slide 3 text

Contents ■ Meta-learning (Few-shot learning) ■ Introduction ■ Methodology: Entity Span Detection / Entity Typing ■ Experiments ■ Ablation Study ■ Conclusion 3

Slide 4

Slide 4 text

Meta-learning 4 O'Connor+: Meta-Learning, https://www.mit.edu/~jda/teaching/6.884/slides/nov_13.pdf ■ Learning to learn from a few examples (few-shot learning) (Figure: episodes, each consisting of a support set and a query set)

Slide 5

Slide 5 text

Benefit of meta-learning 5 1. Learn from a few examples (few-shot learning) 2. Adapt to novel tasks quickly 3. Build more generalizable systems Meta-Learning: https://meta-learning.fastforwardlabs.com/

Slide 6

Slide 6 text

N-way K-shot setting 6 N: the number of classes per episode; K: the number of examples per class Meta-Learning: https://meta-learning.fastforwardlabs.com/ (Figure: support set, query set, and entity classes in ■ Meta-Training and ■ Meta-Testing)

Slide 7

Slide 7 text

An example of a 2-way 1-shot setting in NER 7: two entity classes, and each class has one example (shot)
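To make the episode construction concrete, here is a minimal sketch of sampling an N-way K-shot episode; the class names and sentences are purely illustrative, not taken from any of the benchmark datasets:

```python
import random

def sample_episode(examples_by_class, n_way=2, k_shot=1, n_query=1, seed=0):
    """Sample an N-way K-shot episode: K support and n_query query
    examples for each of N sampled classes."""
    rng = random.Random(seed)
    classes = rng.sample(sorted(examples_by_class), n_way)
    support, query = [], []
    for c in classes:
        # sample without replacement so support and query never overlap
        picked = rng.sample(examples_by_class[c], k_shot + n_query)
        support += [(x, c) for x in picked[:k_shot]]
        query += [(x, c) for x in picked[k_shot:]]
    return support, query

# Illustrative sentences per entity class (hypothetical data)
data = {
    "PERSON": ["Obama spoke.", "Einstein wrote.", "Curie won."],
    "LOCATION": ["Paris is big.", "Tokyo is far.", "Oslo is cold."],
}
support, query = sample_episode(data)
```

With n_way=2 and k_shot=1 this yields exactly the 2-way 1-shot setting above: a support set with one sentence per entity class, plus a query set used to evaluate adaptation.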

Slide 8

Slide 8 text

Method of meta-learning 8 1. Gradient-based (Model-agnostic meta-learning) 2. Black-box adaptation (Neural process) 3. Model-based (Prototypical network) Tomoharu Iwata (NTT Communication Science Laboratories), Introduction to Meta-Learning: https://www.kecl.ntt.co.jp/as/members/iwata/ibisml2021.pdf

Slide 9

Slide 9 text

MAML (Model-agnostic meta-learning) 9 ■ The meta-learning objective is to help the model quickly adapt to a new task ■ The key idea of MAML is to establish, during the meta-training phase, initial model parameters from which a few gradient steps maximize performance on a new task Finn+: Model-agnostic meta-learning for fast adaptation of deep networks, PMLR '17
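The idea can be sketched on a toy problem. The following first-order MAML sketch meta-learns the initialization of a one-parameter linear model over a family of regression tasks; the task family, learning rates, and model are illustrative choices, not the paper's setup:

```python
import numpy as np

def task_batch(slope, rng, n=10):
    """A toy task: fit y = slope * x from a few points."""
    x = rng.uniform(-1.0, 1.0, size=n)
    return x, slope * x

def mse(w, x, y):
    return float(np.mean((w * x - y) ** 2))

def grad(w, x, y):
    # d/dw mean((w*x - y)^2) = 2 * mean(x * (w*x - y))
    return 2.0 * np.mean(x * (w * x - y))

def maml_train(slopes, inner_lr=0.1, outer_lr=0.05, steps=500, seed=0):
    """First-order MAML: find an initialization w such that ONE inner
    gradient step on a task's support set performs well on that
    task's query set."""
    rng = np.random.default_rng(seed)
    w = 0.0  # meta-parameters (a single weight here)
    for _ in range(steps):
        meta_grad = 0.0
        for a in slopes:
            xs, ys = task_batch(a, rng)              # support set
            xq, yq = task_batch(a, rng)              # query set
            w_task = w - inner_lr * grad(w, xs, ys)  # inner adaptation
            meta_grad += grad(w_task, xq, yq)        # first-order outer grad
        w -= outer_lr * meta_grad / len(slopes)
    return w
```

After meta-training on slopes ±1, a single inner step on a handful of examples from an unseen slope already moves the model toward that task, which is exactly the "fast adaptation from a good initialization" behaviour described above.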

Slide 10

Slide 10 text

ProtoNet 10 ■ Learns a class prototype in a metric space ■ Computes each prototype as the average of the support-set feature vectors of that class ■ Uses a distance function to compute the distance between each query embedding and every prototype ■ Predicts the class with a softmax over the (negative) distances and trains with cross-entropy Snell+: Prototypical networks for few-shot learning, NIPS '17
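As a minimal numerical sketch (the embeddings here are arbitrary 2-D vectors, not BERT features), prototype classification amounts to a class-wise mean followed by a softmax over negative distances:

```python
import numpy as np

def prototypes(support_emb, support_labels, n_classes):
    """Class prototype = mean of the support embeddings of that class."""
    return np.stack([support_emb[support_labels == c].mean(axis=0)
                     for c in range(n_classes)])

def proto_predict(query_emb, protos):
    """Class probabilities: softmax over negative squared Euclidean
    distances between each query embedding and each prototype."""
    d2 = ((query_emb[:, None, :] - protos[None, :, :]) ** 2).sum(-1)
    logits = -d2
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    p = np.exp(logits)
    return p / p.sum(axis=1, keepdims=True)
```

Queries get high probability for the class whose prototype lies nearest, which is how the support set shapes predictions without any per-class weight vectors.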

Slide 11

Slide 11 text

Contents ■ Meta-learning (Few-shot learning) ■ Introduction ■ Methodology: Entity Span Detection / Entity Typing ■ Experiments ■ Ablation Study ■ Conclusion 11

Slide 12

Slide 12 text

Introduction 12 Named Entity Recognition [Sang+ 2003], [Ratinov+ 2009] Example input: "morpa is a fully implemented parser for a text-to-speech system" Sang+: Introduction to the CoNLL-2003 shared task: Language-independent named entity recognition, CoNLL '03 Ratinov+: Design challenges and misconceptions in named entity recognition, CoNLL '09

Slide 13

Slide 13 text

Introduction 13 ■ Deep neural architectures have shown great success in supervised NER with a large amount of labeled data available ■ In practical applications, NER systems are usually expected to rapidly adapt to new entity types unseen during training ■ It is costly, and not flexible, to collect a large amount of additional labeled data for these types ■ Few-shot NER has therefore attracted much attention in recent years

Slide 14

Slide 14 text

Previous studies for few-shot NER 14 ■ Token-level metric learning ● ProtoNet [Snell+ 2017]: compare each query token to the prototype of each entity class ● [Fritzler+ 2019]: compare each query token with each token of the support examples and assign the label according to their distances ■ Span-level metric learning [Yu+ 2021] ● Bypasses the issue of token-wise label dependency while explicitly utilizing phrasal representations Snell+: Prototypical networks for few-shot learning, NIPS '17 Fritzler+: Few-shot classification in named entity recognition task, ACM/SIGAPP '19 Yu+: Few-shot intent classification and slot filling with retrieved examples, NAACL '21

Slide 15

Slide 15 text

Challenges in Metric Learning 15 Challenge 1: Limited effectiveness with large domain gaps ■ Direct use of learned metrics without target-domain adaptation ■ Insufficient exploration of information from support examples Challenge 2: Limitations of span-level metric-learning methods ■ Overlapping spans require careful handling during decoding ■ Noisy class prototype for non-entities (e.g., "O") Challenge 3: Limited information for domain transfer and inference ■ Insufficient available information for transfer to different domains ■ Support examples are only used for similarity calculation during inference in previous methods

Slide 16

Slide 16 text

Challenges in Metric Learning 16 Challenge 1: Limited effectiveness with large domain gaps ■ Direct use of learned metrics without target-domain adaptation ■ Insufficient exploration of information from support examples ☑ Few-shot span detection: MAML [Finn+ 2017] finds a good model parameter initialization that can quickly adapt to new entity classes ☑ Few-shot entity typing: MAML-ProtoNet narrows the gap between the source domains and the target domain Challenge 2: Limitations of span-level metric-learning methods ■ Overlapping spans require careful handling during decoding ■ Noisy class prototype for non-entities (e.g., "O") Challenge 3: Limited information for domain transfer and inference ■ Insufficient available information for transfer to different domains

Slide 17

Slide 17 text

Challenges in Metric Learning 17 Challenge 1: Limited effectiveness with large domain gaps ■ Direct use of learned metrics without target-domain adaptation ■ Insufficient exploration of information from support examples Challenge 2: Limitations of span-level metric-learning methods ■ Overlapping spans require careful handling during decoding ■ Noisy class prototype for non-entities (e.g., "O") ☑ Few-shot span detection is cast as a sequence labeling problem, which avoids handling overlapping spans ☑ The span detection model locates named entities and is class-agnostic; detected spans are fed to the typing model for class inference, which eliminates the noisy "O" prototype Challenge 3: Limited information for domain transfer and inference ■ Insufficient available information for transfer to different domains

Slide 18

Slide 18 text

Challenges in Metric Learning 18 Challenge 2: Limitations of span-level metric-learning methods ■ Overlapping spans require careful handling during decoding ■ Noisy class prototype for non-entities (e.g., "O") Challenge 3: Limited information for domain transfer and inference ■ Insufficient available information for transfer to different domains ■ Support examples are only used for similarity calculation during inference in previous methods ☑ Few-shot span detection: the meta-learned model can better transfer to the target domain ☑ Few-shot entity typing: MAML-ProtoNet can find a better embedding space than ProtoNet to represent entity spans from different classes

Slide 19

Slide 19 text

Contents ■ Meta-learning (Few-shot learning) ■ Introduction ■ Methodology: Entity Span Detection / Entity Typing ■ Experiments ■ Ablation Study ■ Conclusion 19

Slide 20

Slide 20 text

Methodology ■ a) Entity Span Detection ■ b) Entity Typing 20

Slide 21

Slide 21 text

Methodology ■ a) Entity Span Detection ■ b) Entity Typing 21

Slide 22

Slide 22 text

Entity Span Detector 22 ■ The span detection model aims at locating all the named entities ■ MAML [Finn+ 2017] promotes learning domain-invariant internal representations rather than domain-specific features ■ The meta-learned model is expected to be more sensitive to target-domain support examples ■ Only a few fine-tuning steps on new support examples are then expected to make rapid progress without overfitting Finn+: Model-agnostic meta-learning for fast adaptation of deep networks, PMLR '17

Slide 23

Slide 23 text

Entity Span Detector 23 ■ Basic detector: a BERT encoder with a token-level classification layer, trained as a sequence labeler (Eq. (3) in the paper) ※ BERT-base-uncased
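Casting span detection as sequence labeling means the detector emits one tag per token, and spans are recovered by decoding the tag sequence, which is why overlapping spans never arise. A minimal BIO-style decoder (the paper's actual tag scheme, e.g. BIOES, may differ) looks like:

```python
def bio_to_spans(tags):
    """Decode a BIO tag sequence into half-open (start, end) spans.
    Each token carries exactly one tag, so decoded spans can never
    overlap -- no special overlap handling is required."""
    spans, start = [], None
    for i, tag in enumerate(tags):
        if tag == "B":           # a new span begins here
            if start is not None:
                spans.append((start, i))
            start = i
        elif tag == "O":         # outside any span
            if start is not None:
                spans.append((start, i))
            start = None
        # "I" simply continues the current span
    if start is not None:
        spans.append((start, len(tags)))
    return spans
```

Note that the spans are class-agnostic: the decoder says only where entities are, and the typing model decides what they are.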

Slide 24

Slide 24 text

Methodology ■ a) Entity Span Detection ■ b) Entity Typing 24

Slide 25

Slide 25 text

Entity Typing 25 ■ The entity typing model uses ProtoNet as its backbone ■ It learns over training episodes and computes the probability that a span belongs to an entity class from the distance between the span representation and that class's prototype ■ MAML-enhanced ProtoNet (MAML-ProtoNet)
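The typing step can be sketched as follows: a detected span is pooled into a single vector and assigned the class of the nearest prototype. Mean pooling is an illustrative choice here; the paper's exact span representation may differ:

```python
import numpy as np

def span_repr(token_emb, start, end):
    """Represent a (start, end) span by averaging its token embeddings
    (illustrative pooling; not necessarily the paper's span encoder)."""
    return token_emb[start:end].mean(axis=0)

def type_span(token_emb, span, protos):
    """Assign the entity class whose prototype is closest to the span
    representation (squared Euclidean distance)."""
    v = span_repr(token_emb, *span)
    d2 = ((protos - v) ** 2).sum(axis=1)
    return int(d2.argmin())
```

Because only detected spans reach this step, no prototype for the non-entity class "O" is ever built, which is how the decomposition sidesteps the noisy "O"-prototype problem from Challenge 2.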

Slide 26

Slide 26 text

Contents ■ Meta-learning (Few-shot learning) ■ Introduction ■ Methodology: Entity Span Detection / Entity Typing ■ Experiments ■ Ablation Study ■ Conclusion 26

Slide 27

Slide 27 text

Experiments 27 ■ Metric: micro F1-score over named entities ■ Datasets: Few-NERD; Cross-dataset (CoNLL-2003, GUM, WNUT-2017, OntoNotes) ※ two domains for training, one for validation, and the remaining one for test
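For reference, micro F1 over entities counts true positives globally across all sentences: an entity is correct only if both its span and its type match. A minimal sketch, where the entity tuples are hypothetical (sentence_id, start, end, type) records:

```python
def micro_f1(gold, pred):
    """Micro F1 over entity sets: precision and recall are computed
    from global true-positive counts; an entity counts as correct
    only when both its span and its type match exactly."""
    gold, pred = set(gold), set(pred)
    tp = len(gold & pred)
    if tp == 0:
        return 0.0
    precision = tp / len(pred)
    recall = tp / len(gold)
    return 2 * precision * recall / (precision + recall)
```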

Slide 28

Slide 28 text

Results 28 ■ Few-NERD: +10.60 ■ Cross-Dataset: +19.71

Slide 29

Slide 29 text

Contents ■ Meta-learning (Few-shot learning) ■ Introduction ■ Methodology: Entity Span Detection / Entity Typing ■ Experiments ■ Ablation Study ■ Conclusion 29

Slide 30

Slide 30 text

Ablation Study 30 Validate the contribution of each component of the method: 1) Ours w/o MAML 2) Ours w/o Span Detector 3) Ours w/o Span Detector w/o MAML 4) Ours w/o ProtoNet

Slide 31

Slide 31 text

1) Ours w/o MAML 31 ■ Train both the span detection model and the ProtoNet in a conventional supervised manner ■ Fine-tune with the few-shot support examples

Slide 32

Slide 32 text

2) Ours w/o Span Detector 32 ■ Remove the mention detection step and integrate MAML with token-level prototypical networks

Slide 33

Slide 33 text

3) Ours w/o Span Detector w/o MAML 33 ■ Eliminate the meta-learning procedure from 2) Ours w/o Span Detector ■ This reduces to the conventional token-level prototypical networks

Slide 34

Slide 34 text

4) Ours w/o ProtoNet 34 ■ Apply the original MAML algorithm ■ Train a BERT-based tagger for few-shot NER

Slide 35

Slide 35 text

Ablation Study Result 35 Point 1: The meta-learning procedure is effective. Exploring the information contained in support examples with the proposed meta-learning procedure helps few-shot transfer.

Slide 36

Slide 36 text

Ablation Study Result 36 Point 2: The decomposed framework (span detection + entity typing) is effective. It mitigates the problem of noisy prototypes for non-entities (Ours > 2), 1) > 3)).

Slide 37

Slide 37 text

Ablation Study Result 37 Point 3: ProtoNet is necessary. Making the model adapt the top-most classification layer without sharing knowledge across training episodes leads to unsatisfactory results.

Slide 38

Slide 38 text

How does MAML promote the span detector? 38 ■ Sup-Span: train a span detector on the full data in a fully supervised manner ■ Sup-Span-f.t.: fine-tune the model learned by Sup-Span ■ MAML-Span-f.t.: span detector trained with MAML ■ Sup-Span predicts only "Broadway", missing "New Century Theatre" → a fully supervised model cannot detect unseen entity spans

Slide 39

Slide 39 text

How does MAML promote the span detector? 39 ■ Sup-Span-f.t. can successfully detect "New Century Theatre"; however, it still wrongly detects "Broadway" → fine-tuning helps the supervised model on new entities, but it may be biased too much toward the training data ■ MAML-Span-f.t. (Ours) detects both spans successfully

Slide 40

Slide 40 text

How does MAML promote the span detector? 40 ■ The proposed meta-learning procedure can better leverage support examples from novel episodes ■ It helps the model adapt to new episodes more effectively (Few-NERD, 5-way 1-2 shot)

Slide 41

Slide 41 text

How does MAML enhance the ProtoNet? 41 ■ MAML-ProtoNet achieves superior performance to the conventional ProtoNet ■ This verifies the effectiveness of leveraging the support examples to refine the learned embedding space at test time (Analysis of entity typing under Few-NERD 5-way 1-2 shot)

Slide 42

Slide 42 text

Contents ■ Meta-learning (Few-shot learning) ■ Introduction ■ Methodology: Entity Span Detection / Entity Typing ■ Experiments ■ Ablation Study ■ Conclusion 42

Slide 43

Slide 43 text

Conclusion 43 ■ This paper proposed a decomposed meta-learning method for few-shot NER Entity span detection ● formulates few-shot span detection as a sequence labeling problem ● employs MAML to learn a good parameter initialization Entity typing ● proposes MAML-ProtoNet ● finds a better embedding space than the conventional ProtoNet to represent entity spans from different classes

Slide 44

Slide 44 text

44

Slide 45

Slide 45 text

How does MAML enhance the ProtoNet? 45

Slide 46

Slide 46 text

Meta-learning datasets 46 ■ Few-NERD: training 20,000 episodes / validation 1,000 episodes / test 5,000 episodes ■ Cross-dataset: training episodes from 2 datasets; validation and test episodes from 1 dataset each 5-shot: train/valid/test = 200/100/100 1-shot: train/valid/test = 400/100/200 ● OntoNotes: train/valid/test = 400/200/100