[Paper Reading] Decomposed Meta-Learning for Few-Shot Named Entity Recognition

tossy
June 05, 2023


Slides for a lab paper-reading seminar.


  1. Decomposed Meta-Learning for
    Few-Shot Named Entity Recognition
    Tingting Ma1, Huiqiang Jiang2, Qianhui Wu2, Tiejun Zhao1, Chin-Yew Lin2
    1 Harbin Institute of Technology, Harbin, China
    2 Microsoft Research Asia
    ACL2022
    Toshihiko Sakai
    2023/6/5

  2. What is this paper about?
    Proposes few-shot span detection and few-shot
    entity typing for few-shot Named Entity
    Recognition
    Advantages compared with existing work
    ・Decomposed meta-learning procedure that
    separately trains the span detection model and
    the entity typing model
    Key points of the proposed method
    ・Defines few-shot span detection as a sequence
    labeling problem
    ・Trains the span detector with
    MAML (model-agnostic meta-learning) to find a
    good model parameter initialization
    ・Proposes MAML-ProtoNet to find a good
    embedding space
    How the advantage and effectiveness of the
    proposal are verified
    ・Evaluates on two groups of datasets and
    validates by ablation study
    ・Compares the proposed method with other
    few-shot NER methods that use meta-learning
    Related papers to read afterwards
    Triantafillou+: Meta-Dataset: A Dataset of Datasets for Learning to Learn
    from Few Examples, ICLR ‘20
    2

  3. Contents
    ■ Meta-learning(Few-shot learning)
    ■ Introduction
    ■ Methodology
    Entity Span Detection
    Entity Typing
    ■ Experiments
    ■ Ablation Study
    ■ Conclusion
    3

  4. Meta-learning
    4
    O’Connor+: Meta-Learning, https://www.mit.edu/~jda/teaching/6.884/slides/nov_13.pdf
    ■ Learning to learn from few examples(few-shot learning)
    Support Set
    Query Set
    Support Set
    Query Set
    Episode

  5. Benefit of meta-learning
    5
    1. Learn from a few examples(few-shot learning)
    2. Adapting to novel tasks quickly
    3. Build more generalizable systems
    Meta-Learning: https://meta-learning.fastforwardlabs.com/

  6. N-way K-shot setting
    N: the number of classes
    K: the number of examples per class
    6
    Meta-Learning: https://meta-learning.fastforwardlabs.com/
    Support Set
    Query Set
    Entity Class
    ■ Meta-Training
    ■ Meta-Testing

  7. An example of 2-way 1-shot setting in NER
    7
    two entity classes, and each class
    has one example (shot)
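The 2-way 1-shot episode construction above can be sketched in a few lines. This is a hypothetical illustration: the `data_by_class` format, the function name, and the example entities are my own, not from the paper.

```python
import random

def sample_episode(data_by_class, n_way, k_shot, n_query):
    """Sample one N-way K-shot episode from examples grouped by class.

    data_by_class: dict mapping class name -> list of examples.
    Returns (support, query), each a {class: examples} dict with
    k_shot support examples and n_query query examples per class.
    """
    classes = random.sample(sorted(data_by_class), n_way)
    support, query = {}, {}
    for c in classes:
        examples = random.sample(data_by_class[c], k_shot + n_query)
        support[c] = examples[:k_shot]   # K labeled shots per class
        query[c] = examples[k_shot:]     # held-out examples to evaluate on
    return support, query

# 2-way 1-shot episode, as in the slide's NER example (made-up entities)
data = {
    "PER": ["Ada Lovelace", "Alan Turing", "Grace Hopper"],
    "LOC": ["Harbin", "Boston", "Kyoto"],
    "ORG": ["Microsoft", "HIT", "MIT"],
}
support, query = sample_episode(data, n_way=2, k_shot=1, n_query=1)
```

During meta-training many such episodes are drawn; meta-testing draws episodes over entity classes unseen in training.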

  8. Method of meta-learning
    8
    1. Gradient-based (model-agnostic
    meta-learning)
    2. Black-box adaptation (neural processes)
    3. Metric-based (prototypical networks)
    Tomoharu Iwata (NTT Communication Science Laboratories), Introduction to Meta-Learning: https://www.kecl.ntt.co.jp/as/members/iwata/ibisml2021.pdf

  9. MAML(Model-agnostic meta-learning)
    9
    ■ The meta-learning objective is to
    help the model quickly adapt to
    a new task
    ■ The key idea in MAML is to
    establish initial model parameters
    in the meta-training phase that
    maximize performance on a
    new task
    Finn+: Model-agnostic meta-learning for fast adaptation of deep networks, PMLR ‘17
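As a toy illustration of the MAML idea (not the paper's actual model), the sketch below meta-learns an initialization for 1-D regression tasks y = a·x using the first-order approximation of MAML. The task family, learning rates, and function names are assumptions made for this illustration.

```python
import numpy as np

def loss_grad(w, x, y):
    """Gradient of mean squared error for the toy model y_hat = w * x."""
    return 2 * np.mean((w * x - y) * x)

def maml_train(tasks, w0=0.0, inner_lr=0.1, outer_lr=0.1, steps=500):
    """First-order MAML over toy tasks (support_x, support_y, query_x, query_y).

    The outer update moves the shared initialization w so that ONE inner
    gradient step on a task's support set performs well on its query set.
    """
    w = w0
    rng = np.random.default_rng(0)
    for _ in range(steps):
        sx, sy, qx, qy = tasks[rng.integers(len(tasks))]
        # inner loop: adapt from the shared initialization to this task
        w_adapt = w - inner_lr * loss_grad(w, sx, sy)
        # outer loop (first-order approximation): update the initialization
        # with the query-set gradient evaluated at the adapted parameters
        w = w - outer_lr * loss_grad(w_adapt, qx, qy)
    return w

# three tasks y = a * x with slopes 1, 2, 3; the learned initialization
# settles between the task optima, ready to adapt quickly to any of them
x = np.linspace(-1.0, 1.0, 20)
tasks = [(x, a * x, x, a * x) for a in (1.0, 2.0, 3.0)]
w_init = maml_train(tasks)
```

The full MAML algorithm differentiates through the inner step; the first-order variant above drops that second-order term, which is a common simplification.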

  10. ProtoNet
    ■ Learn a class prototype in
    metric space
    ■ Compute the average of the
    feature vectors for each class
    in the support set
    ■ Using a distance function,
    calculate the distance
    between each query example and
    each prototype
    ■ The class is predicted and
    trained via a softmax over distances
    10
    Snell+: Prototypical networks for few-shot learning, NIPS ‘17
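The prototype construction and distance-based prediction described above can be sketched with NumPy. The embeddings and labels below are made up for illustration; squared Euclidean distance is assumed as the distance function, as in the ProtoNet paper.

```python
import numpy as np

def prototypes(support_emb, support_labels, n_classes):
    """Class prototype = mean of support embeddings for that class."""
    return np.stack([support_emb[support_labels == c].mean(axis=0)
                     for c in range(n_classes)])

def protonet_probs(query_emb, protos):
    """Softmax over negative squared Euclidean distances to each prototype."""
    d2 = ((query_emb[:, None, :] - protos[None, :, :]) ** 2).sum(-1)
    logits = -d2
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    p = np.exp(logits)
    return p / p.sum(axis=1, keepdims=True)

# toy 2-class example with 2-D embeddings
emb = np.array([[0.0, 0.0], [0.2, 0.0], [1.0, 1.0], [1.2, 1.0]])
labels = np.array([0, 0, 1, 1])
protos = prototypes(emb, labels, n_classes=2)
query = np.array([[0.1, 0.1]])
probs = protonet_probs(query, protos)  # class 0 gets the higher probability
```

Training minimizes the cross-entropy of these probabilities on the query set, which shapes the embedding space so same-class examples cluster around their prototype.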

  11. Contents
    ■ Meta-learning(Few-shot learning)
    ■ Introduction
    ■ Methodology
    Entity Span Detection
    Entity Typing
    ■ Experiments
    ■ Ablation Study
    ■ Conclusion
    11

  12. Introduction
    12
    Sang+: Introduction to the conll-2003 shared task: Language independent named entity recognition, CoNLL ‘03
    Ratinov+: Design challenges and misconceptions in named entity recognition, CoNLL ‘09
    Named Entity Recognition[Sang+ 2003],[Ratinov+ 2009]
    Input
    morpa is a fully implemented parser for a text-to-speech system

  13. Introduction
    ■ Deep neural architectures have shown great success in
    supervised NER with a large amount of labeled data available
    ■ In practical applications, NER systems are usually
    expected to rapidly adapt to new entity types
    unseen during training
    ■ It is costly and inflexible to collect a large number of
    additional labeled examples for these types
    ■ Few-shot NER has attracted increasing attention in recent years
    13

  14. Previous studies for few-shot NER
    ■ Token-level metric-learning
    ● ProtoNet[Snell+ 2017]
    compare each query token to the prototype of each entity
    class
    ● compare each query token with each token of support
    examples and assign the label according to their
    distances[Fritzler+ 2019]
    ■ Span-level metric-learning[Yu+ 2021]
    ● Recently proposed to bypass the issue of token-wise label
    dependency while explicitly utilizing phrasal representations
    14
    Snell+: Prototypical networks for few-shot learning, NIPS ‘17
    Fritzler+: Few-shot classification in named entity recognition task, ACM/SIGAPP ‘19
    Yu+: Few-shot intent classification and slot filling with retrieved examples, NAACL ‘21

  15. Challenges in Metric Learning
    Challenge 1: Limited effectiveness with large domain gaps
    ■ Direct use of learned metrics without target domain adaptation
    ■ Insufficient exploration of information from support examples
    Challenge 2: Limitations of span-level metric learning methods
    ■ Overlapping spans require careful handling during the decoding
    process
    ■ Noisy class prototype for non-entities (e.g., “O”)
    Challenge 3: Limited information for domain transfer and inference
    ■ Insufficient available information for transfer to different
    domains
    ■ Support examples only used for similarity calculation during
    inference in previous methods
    15

  16. Challenges in Metric Learning
    Challenge 1: Limited effectiveness with large domain gaps
    ■ Direct use of learned metrics without target domain adaptation
    ■ Insufficient exploration of information from support examples
    ☑ Few-shot span detection: MAML[Finn+ 2017] to find a good model parameter initialization
    that can quickly adapt to new entity classes
    ☑ Few-shot entity typing: MAML-ProtoNet to narrow the gap between source domains and
    the target domain
    Challenge 2: Limitations of span-level metric learning methods
    ■ Overlapping spans require careful handling during the decoding
    process
    ■ Noisy class prototype for non-entities (e.g., “O”)
    Challenge 3: Limited information for domain transfer and inference
    ■ Insufficient available information for domain transfer to different
    domains
    16

  17. Challenges in Metric Learning
    Challenge 1: Limited effectiveness with large domain gaps
    ■ Direct use of learned metrics without target domain adaptation
    ■ Insufficient exploration of information from support examples
    Challenge 2: Limitations of span-level metric learning methods
    ■ Overlapping spans require careful handling during the decoding
    process
    ■ Noisy class prototype for non-entities (e.g., “O”)
    ☑ Few-shot span detection: formulated as a sequence labeling problem
    to avoid handling overlapping spans
    ☑ The span detection model locates named entities class-agnostically
    and feeds entity spans to the typing model for class inference,
    eliminating the noisy “O” prototype
    Challenge 3: Limited information for domain transfer and inference
    ■ Insufficient available information for domain transfer to different
    domains
    17

  18. Challenges in Metric Learning
    Challenge 2: Limitations of span-level metric learning methods
    ■ Overlapping spans require careful handling during the decoding
    process
    ■ Noisy class prototype for non-entities (e.g., “O”)

    Challenge 3: Limited information for domain transfer and inference
    ■ Insufficient available information for domain transfer to different
    domains
    ■ Support examples only used for similarity calculation during
    inference in previous methods
    ☑ Few-shot span detection: the model can better transfer to the
    target domain
    ☑ Few-shot entity typing: MAML-ProtoNet can find a better embedding
    space than ProtoNet to represent entity spans from
    different classes
    18

  19. Contents
    ■ Meta-learning(Few-shot learning)
    ■ Introduction
    ■ Methodology
    Entity Span Detection
    Entity Typing
    ■ Experiments
    ■ Ablation Study
    ■ Conclusion
    19

  20. Methodology
    ■ a) Entity Span Detection
    ■ b) Entity Typing
    20

  21. Methodology
    ■ a) Entity Span Detection
    ■ b) Entity Typing
    21

  22. Entity Span Detector
    ■ The span detection model aims at locating all the named
    entities
    ■ MAML[Finn+ 2017] promotes the learning of domain-invariant
    internal representations rather than domain-specific features
    ■ The meta-learned model is expected to be more sensitive to
    target-domain support examples
    ■ Only a few fine-tuning steps on new examples are expected to
    make rapid progress without overfitting
    22
    Finn+: Model-agnostic meta-learning for fast adaptation of deep networks, PMLR ‘17

  23. Entity Span Detector
    ■ Basic detector
    23
    O’Connor+: Meta-Learning, https://www.mit.edu/~jda/teaching/6.884/slides/nov_13.pdf
    Eq(3)
    ※ BERT-base-uncased
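Since the paper formulates span detection as sequence labeling, decoding tag sequences into spans is straightforward and can never yield overlapping spans. A minimal sketch, assuming a plain BIO scheme for illustration (the paper's exact tagging scheme may differ):

```python
def bio_to_spans(tags):
    """Decode a BIO tag sequence into (start, end) spans, end exclusive.

    Treating span detection as sequence labeling guarantees the decoded
    spans never overlap, side-stepping the overlap issue of span-level
    metric learning.
    """
    spans, start = [], None
    for i, tag in enumerate(tags):
        if tag == "B":
            if start is not None:        # close the previous span
                spans.append((start, i))
            start = i                    # open a new span
        elif tag == "O":
            if start is not None:
                spans.append((start, i))
            start = None
        # tag == "I" extends the current span
    if start is not None:
        spans.append((start, len(tags)))
    return spans

# e.g. "morpa is a fully implemented parser ..." with two tagged spans
print(bio_to_spans(["B", "O", "O", "B", "I", "O"]))  # → [(0, 1), (3, 5)]
```

The detected spans are then passed, class-agnostically, to the entity typing model.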

  24. Methodology
    ■ a) Entity Span Detection
    ■ b) Entity Typing
    24

  25. Entity Typing
    ■ The entity typing model uses
    ProtoNet as the backbone
    ■ Learns from training episodes and
    calculates the probability that a
    span belongs to an entity
    class based on the distance
    between the span
    representation and the class
    prototype
    ■ MAML-enhanced ProtoNet
    25
    ProtoNet

  26. Contents
    ■ Meta-learning(Few-shot learning)
    ■ Introduction
    ■ Methodology
    Entity Span Detection
    Entity Typing
    ■ Experiments
    ■ Ablation Study
    ■ Conclusion
    26

  27. Experiments
    27
    ■ Evaluate performance with the micro F1-score over named entities
    ■ Datasets
    Few-NERD
    Cross-dataset
    ● CoNLL-2003
    ● GUM
    ● WNUT-2017
    ● OntoNotes
    ※ two domains for training, one for validation,
    the remaining for test
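The micro F1-score used for evaluation pools true positives, false positives, and false negatives over all sentences before computing precision and recall. A small sketch with made-up entities (the (start, end, type) triples are hypothetical):

```python
def micro_f1(pred_entities, gold_entities):
    """Micro F1 over predicted vs. gold (span, type) entities.

    pred_entities / gold_entities: list of sets, one set per sentence.
    Counts are pooled over all sentences, so frequent classes weigh
    more than rare ones (unlike macro F1).
    """
    tp = fp = fn = 0
    for pred, gold in zip(pred_entities, gold_entities):
        tp += len(pred & gold)   # exact span-and-type matches
        fp += len(pred - gold)   # spurious predictions
        fn += len(gold - pred)   # missed gold entities
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

preds = [{(0, 1, "ORG")}, {(2, 4, "PER"), (5, 6, "LOC")}]
golds = [{(0, 1, "ORG")}, {(2, 4, "PER")}]
print(round(micro_f1(preds, golds), 3))  # → 0.8
```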

  28. Results
    28
    Few-NERD
    +10.60
    Cross-Dataset
    +19.71

  29. Contents
    ■ Meta-learning(Few-shot learning)
    ■ Introduction
    ■ Methodology
    Entity Span Detection
    Entity Typing
    ■ Experiments
    ■ Ablation Study
    ■ Conclusion
    29

  30. Ablation Study
    30
    Validate the contributions of method components.
    1) Ours w/o MAML
    2) Ours w/o Span Detector
    3) Ours w/o Span Detector w/o MAML
    4) Ours w/o ProtoNet

  31. 1)Ours w/o MAML
    31
    ■ Train both the span detection model and the ProtoNet in a
    conventional supervised learning manner
    ■ Fine-tune with few-shot examples

  32. 2)Ours w/o Span Detector
    32
    ■ Remove the mention detection step and integrate MAML
    with token-level prototypical networks

  33. 3)Ours w/o Span Detector w/o MAML
    33
    ■ Eliminate the meta-learning procedure from 2) Ours w/o Span Detector
    ■ This reduces to the conventional token-level prototypical networks

  34. 4)Ours w/o ProtoNet
    34
    ■ Apply the original MAML algorithm
    ■ Train a BERT-based tagger for few-shot NER

  35. Ablation Study Result
    35
    Point 1. The meta-learning procedure is effective:
    exploring information contained in support examples with the
    proposed meta-learning procedure helps few-shot transfer

  36. Ablation Study Result
    36
    Point 2. The decomposed framework (span detection and
    entity typing) is effective:
    it mitigates the problem of noisy prototypes for non-entities
    Ours > 2)
    1) > 3)

  37. Ablation Study Result
    37
    Point 3. ProtoNet is necessary:
    making the model adapt the top-most classification layer without
    sharing knowledge across training episodes leads to
    unsatisfactory results

  38. How does MAML promote the span detector?
    38
    ■ Sup-Span: train a span detector on the full training data
    ■ Sup-Span-f.t.: fine-tune the model learned by Sup-Span
    ■ MAML-Span-f.t.: span detector trained with MAML
    ■ Sup-Span only predicts “Broadway”, missing “New
    Century Theatre”
    → a fully supervised model cannot detect unseen entity
    spans

  39. How does MAML promote the span detector?
    39
    ■ Sup-Span-f.t. can successfully detect “New Century
    Theatre”
    However, it still wrongly detects “Broadway”
    → fine-tuning helps the supervised model on new entities,
    but it may be biased too much toward the training data
    ■ MAML-Span-f.t. (Ours) detects both successfully

  40. How does MAML promote the span detector?
    40
    ■ Proposed meta-learning procedure could better leverage
    support examples from novel episodes
    ■ Help the model adapt to new episodes more effectively
    Few-NERD 5-way 1-2shot

  41. How does MAML enhance the ProtoNet?
    41
    ■ MAML-ProtoNet achieves superior performance to the
    conventional ProtoNet
    ■ This verifies the effectiveness of leveraging the support
    examples to refine the learned embedding space at test
    time
    Analysis on entity typing under Few-NERD 5-way
    1-2shot

  42. Contents
    ■ Meta-learning(Few-shot learning)
    ■ Introduction
    ■ Methodology
    Entity Span Detection
    Entity Typing
    ■ Experiments
    ■ Ablation Study
    ■ Conclusion
    42

  43. Conclusion
    43
    ■ This paper proposed a decomposed meta-learning method
    for few-shot NER
    Entity span detection
    ● formulate few-shot span detection as a
    sequence labeling problem
    ● employ MAML to learn a good parameter
    initialization
    Entity typing
    ● propose MAML-ProtoNet
    ● find a better embedding space than conventional
    ProtoNet to represent entity spans from different
    classes

  44. How does MAML enhance the ProtoNet?
    45

  45. Meta-learning datasets
    46
    ■ Few-NERD
    Training: 20,000 episodes
    Validation: 1,000 episodes
    Test: 5,000 episodes
    ■ Cross dataset
    Training episodes: 2 datasets
    Validation episodes and test episodes: 1 dataset each
    5-shot: train/valid/test = 200/100/100 episodes
    1-shot: train/valid/test = 400/100/200 episodes

    OntoNotes : train/valid/test=400/200/100
