
Revisiting Few-Shot Learning for Natural Language Understanding

wing.nus
November 19, 2021

Few-shot learning has attracted much recent attention in the NLP community because it addresses the practical real-world scenario in which fully supervised labels are scarce. A key challenge is that prior research has proceeded under impractical assumptions and has been evaluated under disparate protocols, which hinders fair comparison and obscures the field's progress. This talk covers recent advances in few-shot learning for natural language understanding. I will first identify problems with common few-shot assumptions and evaluation protocols, and then introduce and justify a practical approach to few-shot evaluation. Next, by re-evaluating state-of-the-art methods on common ground, I will arrive at several key findings that reveal problems in the field. Finally, I will introduce several possible solutions for improving few-shot robustness and performance.

Seminar page: https://wing-nus.github.io/ir-seminar/speaker-yanan
YouTube Video recording: https://www.youtube.com/watch?v=HppFsw9E50M

Transcript

  1. Roadmap
     • Few-Shot Learning for Natural Language Understanding
     • Evaluation Protocols
     • Robustness
     • Performance
     • Possible Future Directions
     • Potential follow-up?
     • No Pretraining?
  2. True Few-Shot Learning
     • Goal: quickly learn a new task with very few labeled samples.
     • Differences between True FSL and FSL: no access to multiple dataset episodes, a large validation set, or visible test data.
     [Figure: a few labeled examples repeatedly partitioned into train #1–#4 and dev #1–#4 splits; a minimal sketch of this setup follows]
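
A minimal sketch, in Python, of the split setup the figure above depicts: the same few labeled examples are repeatedly partitioned into small train/dev halves, with no large validation set and no access to test data. Function and variable names are illustrative, not taken from any released codebase.

import random

def multi_splits(labeled_examples, k=4, dev_ratio=0.5, seed=0):
    """Yield k random (train, dev) partitions of the same few labeled examples."""
    rng = random.Random(seed)
    n_dev = int(len(labeled_examples) * dev_ratio)
    for _ in range(k):
        shuffled = list(labeled_examples)
        rng.shuffle(shuffled)
        yield shuffled[n_dev:], shuffled[:n_dev]

# Example: 32 labeled samples become train #1..#4 and dev #1..#4.
examples = [f"example-{i}" for i in range(32)]
for i, (train, dev) in enumerate(multi_splits(examples), start=1):
    print(f"split {i}: {len(train)} train / {len(dev)} dev")
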
  3. Recap of Few-Shot Methods
     • We mainly focus on methods based on pretrained models.
     • Direct inference with a frozen model
     • Finetuning a pretrained model: standard finetuning or prompt-based finetuning (a minimal cloze sketch follows)
     [Figure taken from: Timo Schick and Hinrich Schütze. 2021b. It's not just size that matters: Small language models are also few-shot learners. pages 2339–2352.]
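
As referenced above, a minimal sketch of the cloze formulation that underlies both direct inference with a frozen model and prompt-based finetuning: the input is rewritten as a pattern containing a mask, and verbalizer tokens are compared at the masked position. The pattern, verbalizer, and backbone below are my own illustrative choices, not those of the cited paper; finetuning the same objective on a few labeled examples would give prompt-based finetuning.

import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

tokenizer = AutoTokenizer.from_pretrained("roberta-base")
model = AutoModelForMaskedLM.from_pretrained("roberta-base").eval()

def classify(premise, hypothesis, verbalizer=(" Yes", " No")):
    """Score an RTE-style pair by comparing verbalizer logits at the mask position."""
    text = f"{hypothesis}? {tokenizer.mask_token}, {premise}"
    inputs = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits
    mask_pos = (inputs.input_ids == tokenizer.mask_token_id).nonzero()[0, 1]
    label_ids = [tokenizer.encode(v, add_special_tokens=False)[0] for v in verbalizer]
    scores = logits[0, mask_pos, label_ids]
    return ["entailment", "not_entailment"][scores.argmax().item()]

print(classify("A cat is sleeping on the mat.", "A cat is on the mat."))
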
  4. FewNLU: Benchmarking State-of-the-Art Methods for Few-Shot Natural Language Understanding
     Motivation:
     • Under different evaluation protocols, the relative performance of different methods has been subverted.
     • Prior works have been evaluated under a diverse set of protocols:
       e.g., using pre-fixed hyper-parameters [1,4] → the risk of overestimation [2]
       e.g., using a small dev set to select hyper-parameters [3] → details such as how to split the small dev set and which data splits to use make a huge difference.
     [1] T. Schick and H. Schütze. It's not just size that matters: Small language models are also few-shot learners. pages 2339–2352, 2021.
     [2] T. Zhang, F. Wu, A. Katiyar, K. Q. Weinberger, and Y. Artzi. Revisiting few-sample BERT fine-tuning. CoRR, abs/2006.05987, 2020.
     [3] X. Liu, Y. Zheng, Z. Du, M. Ding, Y. Qian, Z. Yang, and J. Tang. GPT understands, too. CoRR, abs/2103.10385, 2021.
     [4] R. R. Menon, M. Bansal, S. Srivastava, and C. Raffel. Improving and simplifying pattern-exploiting training. CoRR, abs/2103.11955, 2021.
  5. Evaluation Framework
     Desiderata (a small numeric sketch follows):
     • Test performance of the selected hyper-parameter
     • Correlation between the small development set and the test set (over a distribution of hyper-parameters)
     • Stability w.r.t. the number of runs (the hyper-hyperparameter K)
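
To make the desiderata concrete, a small numeric sketch of how each could be measured, assuming we already have small-dev and test scores for several hyper-parameter configurations and repeated runs. All numbers are placeholders, not results from the talk.

import numpy as np
from scipy.stats import spearmanr

dev_scores  = np.array([0.62, 0.70, 0.66, 0.58])   # small-dev accuracy per hyper-parameter config
test_scores = np.array([0.60, 0.68, 0.67, 0.57])   # test accuracy per hyper-parameter config
runs        = np.array([0.68, 0.66, 0.70, 0.65])   # test accuracy of the chosen config over K runs

# (1) Test performance of the hyper-parameter selected on the small dev set.
selected = dev_scores.argmax()
print("selected config test accuracy:", test_scores[selected])

# (2) Correlation between small-dev and test scores over the hyper-parameter distribution.
rho, _ = spearmanr(dev_scores, test_scores)
print("dev/test Spearman correlation:", rho)

# (3) Stability with respect to the number of runs (the hyper-hyperparameter K).
print("standard deviation over K runs:", runs.std())
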
  6. Re-Evaluation of State-of-the-Art Few-Shot Methods
     • Finding 1. The absolute performance and the relative gap of few-shot methods were in general not accurately estimated in prior literature. This highlights the importance of evaluation for obtaining reliable conclusions.
     • Finding 2. The benefits of some few-shot methods (e.g., ADAPET) decrease on larger pretrained models such as DeBERTa. Semi-supervised few-shot methods (i.e., iPET and Noisy) generally improve 1–2 points on average over minimal few-shot methods on both models.
  7. Re-Evaluation of State-of-the-Art Few-Shot Methods
     • Finding 3. Gains of different methods are largely complementary. A combination of methods largely outperforms individual ones, performing close to a strong fully-supervised baseline with RoBERTa.
     • Finding 4. No single few-shot method dominates most NLU tasks. This highlights the need for the development of few-shot methods with more consistent and robust performance across tasks.
  8. FlipDA: Effective and Robust Data Augmentation for Few-Shot Learning
     • Observation 1: Few-shot performance with label-flipped augmented samples is generally better than with label-preserved augmented samples.
  9. FlipDA: Effective and Robust Data Augmentation for Few-Shot Learning
     • Observation 2: Both replacing and correcting noisy samples largely improve performance and prevent the failure mode. Moreover, correcting the labels brings large gains, indicating that label flipping tends to alleviate the issue.
  10. FlipDA: Effective and Robust Data Augmentation for Few-Shot Learning
     • On eight tasks and two base models of different scales, FlipDA achieves a good trade-off between effectiveness and robustness: it substantially improves many tasks while not negatively affecting the others.
  11. FlipDA: Effective and Robust Data Augmentation for Few-Shot Learning
     • Example edits: adding "not"; changing "dwindles" to its antonym "increased".
  12. FlipDA: Effective and Robust Data Augmentation for Few-Shot Learning
     • RTE task (entail / not-entail): hard-to-easy direction: from not-entail to entail; easy-to-hard direction: from entail to not-entail.
     • BoolQ task (yes / no): hard-to-easy direction: from No to Yes; easy-to-hard direction: from Yes to No.
     • Augmenting along the hard-to-easy direction benefits few-shot performance more (a simplified sketch of the flip-and-filter idea follows).
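
As referenced above, a simplified sketch of the flip-and-filter idea: propose edited candidates for a training example, assign them the flipped label (preferring the hard-to-easy direction), and keep only candidates to which the current few-shot classifier also assigns that flipped label with high confidence. The propose_edits and classifier callables are stand-ins of my own; FlipDA itself, as I understand it, generates candidates with a pretrained T5 model via cloze-style infilling.

def flipda_augment(example, flipped_label, propose_edits, classifier, threshold=0.9):
    """Keep label-flipped candidates that the classifier predicts with high confidence."""
    kept = []
    for candidate in propose_edits(example):     # e.g., insert "not", swap in an antonym
        probs = classifier(candidate)            # dict mapping label -> probability
        if probs.get(flipped_label, 0.0) >= threshold:
            kept.append((candidate, flipped_label))
    return kept

# Toy usage with hand-written stand-ins for the edit generator and the classifier.
propose = lambda x: [x.replace("dwindles", "increases"), x.replace("profit is", "profit is not")]
classifier = lambda x: {"entail": 0.95, "not_entail": 0.05}   # pretend the model is confident
print(flipda_augment("Revenue dwindles and profit is low.", "entail", propose, classifier))
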
  13. P-tuning: Few-Shot Learning with Continuous Prompts
     • Motivation: discrete patterns suffer from instability.
     • For example, even changing a single word can cause a drastic change in performance of almost 20 points.
  14. P-tuning: Few-Shot Learning with Continuous Prompts
     • The intuition is that continuous prompts incorporate a certain degree of learnability into the input, which may learn to offset the effects of minor changes in discrete prompts and thus improve training stability (a minimal sketch follows).
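
A minimal sketch of the continuous-prompt idea: a few trainable embedding vectors are prepended to the token embeddings, so the prompt is optimized by gradient descent rather than written as discrete words. This is a simplified stand-in for P-tuning, not its reference implementation (which, as I understand it, additionally reparameterizes the prompt with a small LSTM/MLP encoder).

import torch
import torch.nn as nn
from transformers import AutoTokenizer, AutoModelForMaskedLM

tokenizer = AutoTokenizer.from_pretrained("roberta-base")
model = AutoModelForMaskedLM.from_pretrained("roberta-base")

n_prompt = 8
prompt_embeds = nn.Parameter(torch.randn(n_prompt, model.config.hidden_size) * 0.02)  # trainable

def forward_with_prompt(text):
    """Prepend the trainable prompt embeddings to the token embeddings and run the model."""
    enc = tokenizer(text, return_tensors="pt")
    tok_embeds = model.get_input_embeddings()(enc.input_ids)                  # [1, T, H]
    inputs_embeds = torch.cat([prompt_embeds.unsqueeze(0), tok_embeds], dim=1)
    attn = torch.cat([torch.ones(1, n_prompt, dtype=enc.attention_mask.dtype),
                      enc.attention_mask], dim=1)
    return model(inputs_embeds=inputs_embeds, attention_mask=attn).logits

logits = forward_with_prompt(f"The movie was {tokenizer.mask_token}.")
print(logits.shape)  # [1, n_prompt + T, vocab_size]
# To train only the prompt: torch.optim.Adam([prompt_embeds], lr=1e-3) with the backbone frozen.
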
  15. A Few More Aspects to Go
     Possible follow-ups:
     • How to appropriately decide a hyper-parameter search space
     • How to further increase the diversity of FlipDA's augmented data
     • Parameter-efficient learning with P-tuning
     • Is it possible to get rid of pretraining? (NLP From Scratch Without Large-Scale Pretraining: A Simple and Efficient Framework)
  16. NLP From Scratch Without Large-Scale Pretraining: A Simple and Efficient Framework
     • Motivation: "cramming for the exams", i.e., learning only what is relevant to the task at hand instead of pretraining on everything.
  17. NLP From Scratch Without Large-Scale Pretraining: A Simple and Efficient Framework
     • TLM achieves results better than or similar to pretrained language models (e.g., RoBERTa-Large) while reducing training FLOPs by two orders of magnitude. With its high accuracy and efficiency, we hope TLM will contribute to democratizing NLP.
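
The slides do not spell out TLM's recipe, so the following is only a hedged sketch of the data-selection step as I understand the cited paper: retrieve task-relevant text from a large general corpus (the paper uses BM25; plain token overlap stands in here only to keep the sketch self-contained), then train a model from scratch jointly on the supervised task loss and a masked-LM loss over the retrieved text. Only the retrieval step is sketched, with illustrative data.

def retrieve_relevant(task_texts, corpus, k=2):
    """Rank corpus documents by token overlap with the task data (a BM25 stand-in)."""
    task_vocab = {w for t in task_texts for w in t.lower().split()}
    return sorted(corpus, key=lambda d: -len(task_vocab & set(d.lower().split())))[:k]

# Illustrative task data and general corpus.
task_texts = ["does quarterly revenue growth entail higher guidance"]
corpus = [
    "quarterly revenue and profit figures for listed companies",
    "a guide to houseplant care and watering schedules",
    "analysts ask whether strong earnings entail higher guidance",
]
print(retrieve_relevant(task_texts, corpus))
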