Decomposed Meta-Learning for Few-Shot Named Entity Recognition
Tingting Ma1, Huiqiang Jiang2, Qianhui Wu2, Tiejun Zhao1, Chin-Yew Lin2
1 Harbin Institute of Technology, Harbin, China  2 Microsoft Research Asia
ACL 2022
Presenter: Toshihiko Sakai, 2023/6/5
What is this paper about?
・Proposes few-shot span detection and few-shot entity typing for few-shot Named Entity Recognition
Key point of the proposed method
・Defines few-shot span detection as a sequence labeling problem
・Trains the span detector with MAML (model-agnostic meta-learning) to find a good model parameter initialization
・Proposes MAML-ProtoNet to find a good embedding space
Advantages compared with existing work
・A decomposed meta-learning procedure that trains the span detection model and the entity typing model separately
How to verify the advantage and effectiveness of the proposal
・Evaluates on two groups of datasets and validates the design with ablation studies
・Compares the proposed method with other few-shot NER methods that use meta-learning
Related papers to read afterwards
・Triantafillou+: Meta-Dataset: A Dataset of Datasets for Learning to Learn from Few Examples, ICLR ‘20
Meta-learning
O’Connor+: Meta-Learning, https://www.mit.edu/~jda/teaching/6.884/slides/nov_13.pdf
■ Learning to learn from few examples (few-shot learning)
(Figure: training episodes, each consisting of a support set and a query set)
Benefit of meta-learning
1. Learn from a few examples (few-shot learning)
2. Adapt to novel tasks quickly
3. Build more generalizable systems
Meta-Learning: https://meta-learning.fastforwardlabs.com/
N-way K-shot setting
N: the number of classes, K: the number of examples per class
(Figure: meta-training and meta-testing episodes, each with a support set and a query set drawn from N entity classes)
Meta-Learning: https://meta-learning.fastforwardlabs.com/
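The N-way K-shot episode construction can be sketched as follows (a minimal illustration with toy data; the dataset layout and class names are assumptions for illustration, not from the paper):

```python
import random

def sample_episode(dataset, n_way, k_shot, q_query):
    """Sample one N-way K-shot episode from a {class: [examples]} dict.

    Hypothetical helper: the data layout is illustrative, not the paper's.
    """
    classes = random.sample(sorted(dataset), n_way)     # pick N classes
    support, query = [], []
    for c in classes:
        examples = random.sample(dataset[c], k_shot + q_query)
        support += [(x, c) for x in examples[:k_shot]]  # K shots per class
        query += [(x, c) for x in examples[k_shot:]]
    return support, query

# Toy dataset: five entity classes with a few "sentences" each
data = {c: [f"{c}-ex{i}" for i in range(5)]
        for c in ["PER", "LOC", "ORG", "MISC", "EVENT"]}
support, query = sample_episode(data, n_way=3, k_shot=2, q_query=1)
print(len(support), len(query))  # 6 3
```

During meta-training many such episodes are sampled; at meta-test time the support set comes from the unseen target classes.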
MAML (Model-Agnostic Meta-Learning)
Finn+: Model-agnostic meta-learning for fast adaptation of deep networks, PMLR ‘17
■ The meta-learning objective is to help the model quickly adapt to a new task
■ The key idea of MAML is to find, during meta-training, an initialization of the model parameters from which a few gradient steps on a new task yield strong performance
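A first-order MAML sketch on a toy family of 1-D regression tasks illustrates the inner/outer-loop structure (the task family, learning rates, and step counts are illustrative assumptions; the paper applies MAML to a neural span detector instead):

```python
import numpy as np

# First-order MAML on a toy family of 1-D regression tasks y = a * x.

def loss_grad(w, x, y):
    # gradient of mean squared error for the model y_hat = w * x
    return np.mean(2 * (w * x - y) * x)

def maml_step(w, tasks, inner_lr=0.05, outer_lr=0.05, inner_steps=1):
    meta_grad = 0.0
    for x_s, y_s, x_q, y_q in tasks:
        w_task = w
        for _ in range(inner_steps):               # inner loop: adapt on support
            w_task -= inner_lr * loss_grad(w_task, x_s, y_s)
        meta_grad += loss_grad(w_task, x_q, y_q)   # outer loss on query
    return w - outer_lr * meta_grad / len(tasks)   # meta-update of the init

rng = np.random.default_rng(0)
w = 0.0                                            # meta-learned initialization
for _ in range(200):
    tasks = []
    for _ in range(4):                             # a batch of episodes
        a = rng.uniform(1.0, 3.0)                  # task-specific slope
        x = rng.uniform(-1.0, 1.0, 10)
        tasks.append((x[:5], a * x[:5], x[5:], a * x[5:]))
    w = maml_step(w, tasks)
print(round(w, 2))  # settles near 2.0, the mean of the task slopes
```

The learned initialization sits where one inner gradient step can reach any task in the family, which is exactly the "fast adaptation" property the paper exploits for new entity classes.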
ProtoNet
Snell+: Prototypical networks for few-shot learning, NIPS ‘17
■ Learns a class prototype in a metric (embedding) space
■ The prototype of each class is the mean of the embedded support examples of that class
■ A query example is scored by its distance to each class prototype under a distance function
■ Class probabilities come from a softmax over the negative distances, and the model is trained with cross-entropy
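A minimal ProtoNet classification step can be sketched as follows (toy 2-D embeddings and hypothetical class names; the paper embeds spans with a neural encoder instead):

```python
import numpy as np

def prototypes(support_emb, support_lab):
    """Mean embedding per class, in sorted label order."""
    labels = sorted(set(support_lab))
    protos = np.stack([
        np.mean([e for e, l in zip(support_emb, support_lab) if l == c], axis=0)
        for c in labels])
    return labels, protos

def classify(query_emb, protos, labels):
    # squared Euclidean distance from each query to each prototype
    d = ((query_emb[:, None, :] - protos[None, :, :]) ** 2).sum(-1)
    probs = np.exp(-d) / np.exp(-d).sum(-1, keepdims=True)  # softmax(-distance)
    return [labels[i] for i in probs.argmax(-1)]

# Toy 2-D "span embeddings" and hypothetical class names
sup = np.array([[0.0, 0.0], [0.2, 0.0], [5.0, 5.0], [5.2, 5.0]])
lab = ["LOC", "LOC", "PER", "PER"]
labels, protos = prototypes(sup, lab)
pred = classify(np.array([[0.1, 0.1], [4.9, 5.1]]), protos, labels)
print(pred)  # ['LOC', 'PER']
```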
Introduction
Named Entity Recognition [Sang+ 2003], [Ratinov+ 2009]
Input: “morpa is a fully implemented parser for a text-to-speech system”
Sang+: Introduction to the CoNLL-2003 shared task: Language-independent named entity recognition, CoNLL ‘03
Ratinov+: Design challenges and misconceptions in named entity recognition, CoNLL ‘09
Introduction
■ Deep neural architectures have shown great success in supervised NER when a large amount of labeled data is available
■ In practical applications, NER systems are usually expected to rapidly adapt to new entity types unseen during training
■ Collecting additional labeled data for these types is costly and inflexible
■ Few-shot NER has therefore attracted increasing attention in recent years
Previous studies for few-shot NER
■ Token-level metric learning
● ProtoNet [Snell+ 2017] compares each query token to the prototype of each entity class
● [Fritzler+ 2019] compares each query token with each token of the support examples and assigns labels according to their distances
■ Span-level metric learning [Yu+ 2021]
● Recent work bypasses the issue of token-wise label dependencies while explicitly utilizing phrasal representations
Snell+: Prototypical networks for few-shot learning, NIPS ‘17
Fritzler+: Few-shot classification in named entity recognition task, ACM/SIGAPP ‘19
Yu+: Few-shot intent classification and slot filling with retrieved examples, NAACL ‘21
Challenges in Metric Learning
Challenge 1: large domain gaps
■ Learned metrics are used directly, without adaptation to the target domain
■ Information in the support examples is insufficiently explored
Challenge 2: span-level metric-learning methods
■ Overlapping spans require careful handling during decoding
■ Noisy class prototype for non-entities (e.g., “O”)
Challenge 3: domain transfer
■ Insufficient available information for transfer to different domains
■ In previous methods, support examples are only used for similarity calculation during inference
Challenges in Metric Learning
Challenge 1: Limited effectiveness with large domain gaps
■ Learned metrics are used directly, without adaptation to the target domain
■ Information in the support examples is insufficiently explored
☑ Few-shot span detection: MAML [Finn+ 2017] finds a good model parameter initialization that can quickly adapt to new entity classes
☑ Few-shot entity typing: MAML-ProtoNet narrows the gap between the source domains and the target domain
Challenge 2: Limitations of span-level metric-learning methods
■ Overlapping spans require careful handling during decoding
■ Noisy class prototype for non-entities (e.g., “O”)
Challenge 3: Limited information for domain transfer and inference
■ Insufficient available information for transfer to different domains
Challenges in Metric Learning
Challenge 1: Limited effectiveness with large domain gaps
■ Learned metrics are used directly, without adaptation to the target domain
■ Information in the support examples is insufficiently explored
Challenge 2: Limitations of span-level metric-learning methods
■ Overlapping spans require careful handling during decoding
■ Noisy class prototype for non-entities (e.g., “O”)
☑ Few-shot span detection is formulated as a sequence labeling problem, which avoids handling overlapping spans
☑ The span detection model locates named entities in a class-agnostic way and feeds the detected spans to the typing model for class inference, eliminating the noisy “O” prototype
Challenge 3: Limited information for domain transfer and inference
■ Insufficient available information for transfer to different domains
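The sequence-labeling formulation can be illustrated with a simple BIO decoder: because each token receives exactly one tag, the decoded spans can never overlap (the tag scheme and decoder are a common sketch, not necessarily the paper's exact implementation):

```python
def bio_to_spans(tags):
    """Decode B/I/O tags into (start, end) spans, end exclusive."""
    spans, start = [], None
    for i, t in enumerate(tags):
        if t == "B":                      # a new span begins
            if start is not None:
                spans.append((start, i))  # close the previous span first
            start = i
        elif t == "O":                    # outside any span
            if start is not None:
                spans.append((start, i))
                start = None
        # "I" simply extends the currently open span
    if start is not None:
        spans.append((start, len(tags)))
    return spans

tokens = ["morpa", "is", "a", "fully", "implemented", "parser"]
tags   = ["B",     "O",  "O", "O",     "O",           "B"]
print(bio_to_spans(tags))  # [(0, 1), (5, 6)]
```

Note that the tags carry no entity class: span detection stays class-agnostic, and the typing model decides the class of each span afterwards.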
Challenges in Metric Learning
Challenge 2: Limitations of span-level metric-learning methods
■ Overlapping spans require careful handling during decoding
■ Noisy class prototype for non-entities (e.g., “O”)
Challenge 3: Limited information for domain transfer and inference
■ Insufficient available information for transfer to different domains
■ In previous methods, support examples are only used for similarity calculation during inference
☑ Few-shot span detection: the meta-learned model can better transfer to the target domain
☑ Few-shot entity typing: MAML-ProtoNet finds a better embedding space than ProtoNet for representing entity spans from different classes
Entity Span Detector
■ The span detection model aims at locating all named entities in the input
■ MAML [Finn+ 2017] promotes learning of domain-invariant internal representations rather than domain-specific features
■ The meta-learned model is expected to be more sensitive to target-domain support examples
■ Thus, only a few fine-tuning steps on new examples are expected to yield rapid progress without overfitting
Finn+: Model-agnostic meta-learning for fast adaptation of deep networks, PMLR ‘17
Entity Typing
■ The entity typing model uses ProtoNet as its backbone
■ It learns from training episodes and computes the probability that a span belongs to an entity class from the distance between the span representation and the class prototype
■ ProtoNet is further enhanced with MAML (MAML-ProtoNet)
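The decomposed pipeline can be sketched end to end: detected spans are typed by their distance to prototypes built only from support entity spans, so no noisy "O" prototype is ever formed (embeddings, span indices, and class names below are toy placeholders, not the paper's BERT-based encoder):

```python
import numpy as np

def type_spans(span_embs, support_embs, support_labels):
    """Assign each detected span the class of its nearest prototype."""
    classes = sorted(set(support_labels))
    protos = np.stack([
        support_embs[[l == c for l in support_labels]].mean(axis=0)
        for c in classes])
    d = ((span_embs[:, None, :] - protos[None, :, :]) ** 2).sum(-1)
    return [classes[i] for i in d.argmin(-1)]

# Support set contains only entity spans, so no noisy "O" prototype is built
support_embs = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])
support_labels = ["ORG", "ORG", "LOC", "LOC"]

# Spans the (pretend) detector located in a query sentence, with toy embeddings
detected = {(0, 1): [0.95, 0.05], (5, 7): [0.05, 0.95]}
span_embs = np.array(list(detected.values()))
pred = type_spans(span_embs, support_embs, support_labels)
print(dict(zip(detected, pred)))  # {(0, 1): 'ORG', (5, 7): 'LOC'}
```

In MAML-ProtoNet, the span encoder would additionally be fine-tuned on the support set for a few steps before the prototypes and distances are computed.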
Experiments
■ Performance is evaluated with the micro F1-score over named entities
■ Datasets
Few-NERD
Cross-dataset setting
● CoNLL-2003
● GUM
● WNUT-2017
● OntoNotes
※ two domains are used for training, one for validation, and the remaining one for test
Ablation Study Result
Point 1: the meta-learning procedure is effective
■ Exploring the information contained in the support examples with the proposed meta-learning procedure benefits few-shot transfer
Ablation Study Result
Point 2: the decomposed framework (span detection + entity typing) is effective
■ It mitigates the problem of noisy prototypes for non-entities (Ours > 2), 1) > 3))
Ablation Study Result
Point 3: ProtoNet is necessary
■ Making the model adapt only the top-most classification layer, without sharing knowledge across training episodes, leads to unsatisfactory results
How does MAML promote the span detector?
■ Sup-Span: a span detector trained on the full data in a supervised manner
■ Sup-Span-f.t.: the Sup-Span model fine-tuned on the support examples
■ MAML-Span-f.t.: the span detector trained with MAML and then fine-tuned
■ Sup-Span only predicts “Broadway”, missing “New Century Theatre”
→ a fully supervised model cannot detect unseen entity spans
How does MAML promote the span detector?
■ Sup-Span-f.t. successfully detects “New Century Theatre”, but still wrongly detects “Broadway”
→ fine-tuning helps the supervised model on new entities, but it may be biased too much toward the training data
■ MAML-Span-f.t. (Ours) detects both successfully
How does MAML promote the span detector?
■ The proposed meta-learning procedure can better leverage support examples from novel episodes
■ It helps the model adapt to new episodes more effectively
(Few-NERD 5-way 1∼2-shot)
How does MAML enhance the ProtoNet?
■ MAML-ProtoNet achieves superior performance to the conventional ProtoNet
■ This verifies the effectiveness of leveraging support examples to refine the learned embedding space at test time
(Analysis on entity typing under Few-NERD 5-way 1∼2-shot)
Conclusion
■ This paper proposed a decomposed meta-learning method for few-shot NER
Entity span detection
● formulates few-shot span detection as a sequence labeling problem
● employs MAML to learn a good parameter initialization
Entity typing
● proposes MAML-ProtoNet
● finds a better embedding space than the conventional ProtoNet for representing entity spans from different classes