Decomposed Meta-Learning for Few-Shot Named Entity Recognition
Tingting Ma1, Huiqiang Jiang2, Qianhui Wu2, Tiejun Zhao1, Chin-Yew Lin2
1 Harbin Institute of Technology, Harbin, China  2 Microsoft Research Asia
ACL 2022
Presenter: Toshihiko Sakai, 2023/6/5
What is this paper about?
・Proposes few-shot span detection and few-shot entity typing for few-shot Named Entity Recognition
Key point of the proposed method
・Defines few-shot span detection as a sequence labeling problem
・Trains the span detector with MAML (model-agnostic meta-learning) to find a good model parameter initialization
・Proposes MAML-ProtoNet to find a good embedding space
Advantages compared with existing work
・A decomposed meta-learning procedure that trains the span detection model and the entity typing model separately
How to verify the advantage and effectiveness of the proposal
・Evaluates on two groups of datasets and validates the design with ablation studies
・Compares the proposed method with other few-shot NER methods that use meta-learning
Related papers to read afterwards
・Triantafillou+: Meta-Dataset: A Dataset of Datasets for Learning to Learn from Few Examples, ICLR ‘20
Meta-learning
O’Connor+: Meta-Learning, https://www.mit.edu/~jda/teaching/6.884/slides/nov_13.pdf
■ Learning to learn from few examples (few-shot learning)
(Figure: training episodes, each consisting of a support set and a query set)
Benefit of meta-learning
1. Learn from a few examples (few-shot learning)
2. Adapt to novel tasks quickly
3. Build more generalizable systems
Meta-Learning: https://meta-learning.fastforwardlabs.com/
N-way K-shot setting
N: the number of classes, K: the number of examples per class
(Figure: meta-training and meta-testing episodes, each with a support set and a query set drawn from N entity classes)
Meta-Learning: https://meta-learning.fastforwardlabs.com/
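The N-way K-shot episode construction can be sketched as follows (a minimal illustration with toy data; the dataset layout and class names are assumptions for illustration, not from the paper):

```python
import random

def sample_episode(dataset, n_way, k_shot, q_query):
    """Sample one N-way K-shot episode from a {class: [examples]} dict.

    Hypothetical helper: the data layout is illustrative, not the paper's.
    """
    classes = random.sample(sorted(dataset), n_way)     # pick N classes
    support, query = [], []
    for c in classes:
        examples = random.sample(dataset[c], k_shot + q_query)
        support += [(x, c) for x in examples[:k_shot]]  # K shots per class
        query += [(x, c) for x in examples[k_shot:]]
    return support, query

# Toy dataset: five entity classes with a few "sentences" each
data = {c: [f"{c}-ex{i}" for i in range(5)]
        for c in ["PER", "LOC", "ORG", "MISC", "EVENT"]}
support, query = sample_episode(data, n_way=3, k_shot=2, q_query=1)
print(len(support), len(query))  # 6 3
```

During meta-training many such episodes are sampled; at meta-test time the support set comes from the unseen target classes.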
MAML (Model-Agnostic Meta-Learning)
Finn+: Model-agnostic meta-learning for fast adaptation of deep networks, PMLR ‘17
■ The meta-learning objective is to help the model quickly adapt to a new task
■ The key idea of MAML is to find, during meta-training, an initialization of the model parameters from which a few gradient steps on a new task yield strong performance
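A first-order MAML sketch on a toy family of 1-D regression tasks illustrates the inner/outer-loop structure (the task family, learning rates, and step counts are illustrative assumptions; the paper applies MAML to a neural span detector instead):

```python
import numpy as np

# First-order MAML on a toy family of 1-D regression tasks y = a * x.

def loss_grad(w, x, y):
    # gradient of mean squared error for the model y_hat = w * x
    return np.mean(2 * (w * x - y) * x)

def maml_step(w, tasks, inner_lr=0.05, outer_lr=0.05, inner_steps=1):
    meta_grad = 0.0
    for x_s, y_s, x_q, y_q in tasks:
        w_task = w
        for _ in range(inner_steps):               # inner loop: adapt on support
            w_task -= inner_lr * loss_grad(w_task, x_s, y_s)
        meta_grad += loss_grad(w_task, x_q, y_q)   # outer loss on query
    return w - outer_lr * meta_grad / len(tasks)   # meta-update of the init

rng = np.random.default_rng(0)
w = 0.0                                            # meta-learned initialization
for _ in range(200):
    tasks = []
    for _ in range(4):                             # a batch of episodes
        a = rng.uniform(1.0, 3.0)                  # task-specific slope
        x = rng.uniform(-1.0, 1.0, 10)
        tasks.append((x[:5], a * x[:5], x[5:], a * x[5:]))
    w = maml_step(w, tasks)
print(round(w, 2))  # settles near 2.0, the mean of the task slopes
```

The learned initialization sits where one inner gradient step can reach any task in the family, which is exactly the "fast adaptation" property the paper exploits for new entity classes.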
ProtoNet
Snell+: Prototypical networks for few-shot learning, NIPS ‘17
■ Learns a class prototype in a metric (embedding) space
■ The prototype of each class is the mean of the embedded support examples of that class
■ A query example is scored by its distance to each class prototype under a distance function
■ Class probabilities come from a softmax over the negative distances, and the model is trained with cross-entropy
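A minimal ProtoNet classification step can be sketched as follows (toy 2-D embeddings and hypothetical class names; the paper embeds spans with a neural encoder instead):

```python
import numpy as np

def prototypes(support_emb, support_lab):
    """Mean embedding per class, in sorted label order."""
    labels = sorted(set(support_lab))
    protos = np.stack([
        np.mean([e for e, l in zip(support_emb, support_lab) if l == c], axis=0)
        for c in labels])
    return labels, protos

def classify(query_emb, protos, labels):
    # squared Euclidean distance from each query to each prototype
    d = ((query_emb[:, None, :] - protos[None, :, :]) ** 2).sum(-1)
    probs = np.exp(-d) / np.exp(-d).sum(-1, keepdims=True)  # softmax(-distance)
    return [labels[i] for i in probs.argmax(-1)]

# Toy 2-D "span embeddings" and hypothetical class names
sup = np.array([[0.0, 0.0], [0.2, 0.0], [5.0, 5.0], [5.2, 5.0]])
lab = ["LOC", "LOC", "PER", "PER"]
labels, protos = prototypes(sup, lab)
pred = classify(np.array([[0.1, 0.1], [4.9, 5.1]]), protos, labels)
print(pred)  # ['LOC', 'PER']
```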
Introduction
Named Entity Recognition [Sang+ 2003], [Ratinov+ 2009]
Input: “morpa is a fully implemented parser for a text-to-speech system”
Sang+: Introduction to the CoNLL-2003 shared task: Language-independent named entity recognition, CoNLL ‘03
Ratinov+: Design challenges and misconceptions in named entity recognition, CoNLL ‘09
Introduction
■ Deep neural architectures have shown great success in supervised NER when a large amount of labeled data is available
■ In practical applications, NER systems are usually expected to rapidly adapt to new entity types unseen during training
■ Collecting additional labeled data for these types is costly and inflexible
■ Few-shot NER has therefore attracted increasing attention in recent years
Previous studies for few-shot NER
■ Token-level metric learning
● ProtoNet [Snell+ 2017] compares each query token to the prototype of each entity class
● [Fritzler+ 2019] compares each query token with each token of the support examples and assigns labels according to their distances
■ Span-level metric learning [Yu+ 2021]
● Recent work bypasses the issue of token-wise label dependencies while explicitly utilizing phrasal representations
Snell+: Prototypical networks for few-shot learning, NIPS ‘17
Fritzler+: Few-shot classification in named entity recognition task, ACM/SIGAPP ‘19
Yu+: Few-shot intent classification and slot filling with retrieved examples, NAACL ‘21
Challenges in Metric Learning
Challenge 1: large domain gaps
■ Learned metrics are used directly, without adaptation to the target domain
■ Information in the support examples is insufficiently explored
Challenge 2: span-level metric-learning methods
■ Overlapping spans require careful handling during decoding
■ Noisy class prototype for non-entities (e.g., “O”)
Challenge 3: domain transfer
■ Insufficient available information for transfer to different domains
■ In previous methods, support examples are only used for similarity calculation during inference
Challenges in Metric Learning
Challenge 1: Limited effectiveness with large domain gaps
■ Learned metrics are used directly, without adaptation to the target domain
■ Information in the support examples is insufficiently explored
☑ Few-shot span detection: MAML [Finn+ 2017] finds a good model parameter initialization that can quickly adapt to new entity classes
☑ Few-shot entity typing: MAML-ProtoNet narrows the gap between the source domains and the target domain
Challenge 2: Limitations of span-level metric-learning methods
■ Overlapping spans require careful handling during decoding
■ Noisy class prototype for non-entities (e.g., “O”)
Challenge 3: Limited information for domain transfer and inference
■ Insufficient available information for transfer to different domains
Challenges in Metric Learning
Challenge 1: Limited effectiveness with large domain gaps
■ Learned metrics are used directly, without adaptation to the target domain
■ Information in the support examples is insufficiently explored
Challenge 2: Limitations of span-level metric-learning methods
■ Overlapping spans require careful handling during decoding
■ Noisy class prototype for non-entities (e.g., “O”)
☑ Few-shot span detection is formulated as a sequence labeling problem, which avoids handling overlapping spans
☑ The span detection model locates named entities in a class-agnostic way and feeds the detected spans to the typing model for class inference, eliminating the noisy “O” prototype
Challenge 3: Limited information for domain transfer and inference
■ Insufficient available information for transfer to different domains
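The sequence-labeling formulation can be illustrated with a simple BIO decoder: because each token receives exactly one tag, the decoded spans can never overlap (the tag scheme and decoder are a common sketch, not necessarily the paper's exact implementation):

```python
def bio_to_spans(tags):
    """Decode B/I/O tags into (start, end) spans, end exclusive."""
    spans, start = [], None
    for i, t in enumerate(tags):
        if t == "B":                      # a new span begins
            if start is not None:
                spans.append((start, i))  # close the previous span first
            start = i
        elif t == "O":                    # outside any span
            if start is not None:
                spans.append((start, i))
                start = None
        # "I" simply extends the currently open span
    if start is not None:
        spans.append((start, len(tags)))
    return spans

tokens = ["morpa", "is", "a", "fully", "implemented", "parser"]
tags   = ["B",     "O",  "O", "O",     "O",           "B"]
print(bio_to_spans(tags))  # [(0, 1), (5, 6)]
```

Note that the tags carry no entity class: span detection stays class-agnostic, and the typing model decides the class of each span afterwards.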
Challenges in Metric Learning
Challenge 2: Limitations of span-level metric-learning methods
■ Overlapping spans require careful handling during decoding
■ Noisy class prototype for non-entities (e.g., “O”)
Challenge 3: Limited information for domain transfer and inference
■ Insufficient available information for transfer to different domains
■ In previous methods, support examples are only used for similarity calculation during inference
☑ Few-shot span detection: the meta-learned model can better transfer to the target domain
☑ Few-shot entity typing: MAML-ProtoNet finds a better embedding space than ProtoNet for representing entity spans from different classes
Entity Span Detector
■ The span detection model aims at locating all named entities in the input
■ MAML [Finn+ 2017] promotes learning of domain-invariant internal representations rather than domain-specific features
■ The meta-learned model is expected to be more sensitive to target-domain support examples
■ Thus, only a few fine-tuning steps on new examples are expected to yield rapid progress without overfitting
Finn+: Model-agnostic meta-learning for fast adaptation of deep networks, PMLR ‘17
Entity Typing
■ The entity typing model uses ProtoNet as its backbone
■ It learns from training episodes and computes the probability that a span belongs to an entity class from the distance between the span representation and the class prototype
■ ProtoNet is further enhanced with MAML (MAML-ProtoNet)
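The decomposed pipeline can be sketched end to end: detected spans are typed by their distance to prototypes built only from support entity spans, so no noisy "O" prototype is ever formed (embeddings, span indices, and class names below are toy placeholders, not the paper's BERT-based encoder):

```python
import numpy as np

def type_spans(span_embs, support_embs, support_labels):
    """Assign each detected span the class of its nearest prototype."""
    classes = sorted(set(support_labels))
    protos = np.stack([
        support_embs[[l == c for l in support_labels]].mean(axis=0)
        for c in classes])
    d = ((span_embs[:, None, :] - protos[None, :, :]) ** 2).sum(-1)
    return [classes[i] for i in d.argmin(-1)]

# Support set contains only entity spans, so no noisy "O" prototype is built
support_embs = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])
support_labels = ["ORG", "ORG", "LOC", "LOC"]

# Spans the (pretend) detector located in a query sentence, with toy embeddings
detected = {(0, 1): [0.95, 0.05], (5, 7): [0.05, 0.95]}
span_embs = np.array(list(detected.values()))
pred = type_spans(span_embs, support_embs, support_labels)
print(dict(zip(detected, pred)))  # {(0, 1): 'ORG', (5, 7): 'LOC'}
```

In MAML-ProtoNet, the span encoder would additionally be fine-tuned on the support set for a few steps before the prototypes and distances are computed.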
Experiments
■ Performance is evaluated with the micro F1-score over named entities
■ Datasets
Few-NERD
Cross-dataset setting
● CoNLL-2003
● GUM
● WNUT-2017
● OntoNotes
※ two domains are used for training, one for validation, and the remaining one for test
Ablation Study Result
Point 1: the meta-learning procedure is effective
■ Exploring the information contained in the support examples with the proposed meta-learning procedure benefits few-shot transfer
Ablation Study Result
Point 2: the decomposed framework (span detection + entity typing) is effective
■ It mitigates the problem of noisy prototypes for non-entities (Ours > 2), 1) > 3))
Ablation Study Result
Point 3: ProtoNet is necessary
■ Making the model adapt only the top-most classification layer, without sharing knowledge across training episodes, leads to unsatisfactory results
How does MAML promote the span detector?
■ Sup-Span: a span detector trained on the full data in a supervised manner
■ Sup-Span-f.t.: the Sup-Span model fine-tuned on the support examples
■ MAML-Span-f.t.: the span detector trained with MAML and then fine-tuned
■ Sup-Span only predicts “Broadway”, missing “New Century Theatre”
→ a fully supervised model cannot detect unseen entity spans
How does MAML promote the span detector?
■ Sup-Span-f.t. successfully detects “New Century Theatre”, but still wrongly detects “Broadway”
→ fine-tuning helps the supervised model on new entities, but it may be biased too much toward the training data
■ MAML-Span-f.t. (Ours) detects both successfully
How does MAML promote the span detector?
■ The proposed meta-learning procedure can better leverage support examples from novel episodes
■ It helps the model adapt to new episodes more effectively
(Few-NERD 5-way 1∼2-shot)
How does MAML enhance the ProtoNet?
■ MAML-ProtoNet achieves superior performance to the conventional ProtoNet
■ This verifies the effectiveness of leveraging support examples to refine the learned embedding space at test time
(Analysis on entity typing under Few-NERD 5-way 1∼2-shot)
Conclusion
■ This paper proposed a decomposed meta-learning method for few-shot NER
Entity span detection
● formulates few-shot span detection as a sequence labeling problem
● employs MAML to learn a good parameter initialization
Entity typing
● proposes MAML-ProtoNet
● finds a better embedding space than the conventional ProtoNet for representing entity spans from different classes