
A paper review of 'Deep Bayesian Active Learning for Natural Language Processing' (Siddhant & Lipton, 2018)


Shuntaro Yada

April 26, 2019


Transcript

1. [Review] Deep Bayesian Active Learning for Natural Language Processing (Siddhant & Lipton, 2018)
Shuntaro Yada, PhD Student, Library and Information Science Laboratory, The University of Tokyo
2. Basic Concepts
• Bayesian Active Learning methods
• for Bayesian Neural Network models
• on Natural Language Processing (NLP) tasks
Outline: Basic Concepts • Active Learning? • Bayesian Neural Networks? • Siddhant & Lipton, 2018 • Learning Resources
3. Active Learning?
'"Active learning" means students engage with the material, participate in the class, and collaborate with each other' — [Stanford | Teaching Commons]
[Image]: wbur.org – Here & Now (teaching-method context)
4. Active Learning?
• Only a few labelled data points vs. a vast amount of unlabelled data
• We need more labels but cannot label everything (due to budget constraints, etc.)
• How about labelling only the informative data points?
[Image]: DataCamp (machine-learning context)
5. Active Learning?
• Like semi-supervised learning, active learning tries to increase the training data (but in a different way)
• Choose an informative, as-yet-unlabelled data point, then ask an oracle (e.g., a human annotator) for its label
• Add the newly labelled data point to the training data (a minimal sketch of this loop follows below)
[Image]: DataCamp (machine-learning context)
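To make the loop concrete, here is a minimal, framework-agnostic sketch in Python (my illustration, not from the slides). The names model, acquire, and query_oracle are hypothetical stand-ins for a classifier with a scikit-learn-style fit method, an acquisition function, and the human-annotation step:

import numpy as np

def active_learning_loop(model, train_x, train_y, pool_x, acquire, query_oracle,
                         rounds=10, batch_size=10):
    # Pool-based active learning: retrain, score the unlabelled pool,
    # ask the oracle to label the most informative points, repeat.
    for _ in range(rounds):
        model.fit(train_x, train_y)                     # retrain on current labels
        scores = acquire(model, pool_x)                 # informativeness per point
        picked = np.argsort(scores)[-batch_size:]       # highest-scoring points
        new_y = query_oracle(pool_x[picked])            # e.g., human annotators
        train_x = np.vstack([train_x, pool_x[picked]])  # grow the training set
        train_y = np.concatenate([train_y, new_y])
        pool_x = np.delete(pool_x, picked, axis=0)      # shrink the pool
    return model, train_x, train_y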
6. Active Learning?
• Basically, active learning research is about how to choose informative data points from the unlabelled data
• E.g., if an ML model can output prediction probabilities, one option is to choose the points whose most likely predicted label has the lowest probability, i.e., the Least Confidence method (sketched below)
• Such selection methods are called Acquisition Functions
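A minimal sketch of the Least Confidence acquisition function, assuming a scikit-learn-style predict_proba; it can be passed as the acquire argument of the loop sketched above:

import numpy as np

def least_confidence(model, pool_x):
    # Score each unlabelled point by 1 - max_y p(y | x):
    # higher scores mean the model's best guess is less confident.
    probs = model.predict_proba(pool_x)   # shape: (n_points, n_classes)
    return 1.0 - probs.max(axis=1)        # uncertainty per point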
7. Bayesian Neural Networks?
• Neural networks (NNs) usually learn the weights as point estimates
• Bayesian NNs learn a distribution over the weights
[Image]: Blundell et al. (2015)
8. Bayesian Neural Networks?
• The benefit/motivation of Bayesian NNs is to model uncertainty in the predictions; Blundell et al. (2015) list three advantages:
1. regularisation via a compression cost on the weights,
2. richer representations and predictions from cheap model averaging, and
3. exploration in simple reinforcement learning problems such as contextual bandits.
• They are also another way to avoid over-fitting (cf. early stopping, weight decay, dropout, etc.)
(A compact formulation of the Bayesian predictive distribution follows below.)
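As a compact statement of what a distribution over weights buys us (the standard Bayesian NN predictive distribution, added here for clarity rather than taken from the slides): predictions marginalise over the weight posterior, which in practice is approximated by averaging over T weight configurations sampled from an approximate posterior q(\omega), e.g., via Bayes by Backprop or MC Dropout:

p(y \mid x, \mathcal{D}) = \int p(y \mid x, \omega)\, p(\omega \mid \mathcal{D})\, d\omega \approx \frac{1}{T} \sum_{t=1}^{T} p(y \mid x, \hat{\omega}_t), \qquad \hat{\omega}_t \sim q(\omega)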
9. Siddhant & Lipton, 2018
• Classical active learning methods capture aleatoric but not epistemic uncertainty (Kendall and Gal, 2017)
‣ Least Confidence is just a heuristic
‣ Bayesian acquisition functions can make use of genuine (epistemic) model uncertainty
• We now also have Bayesian frameworks for deep neural networks
• Hence: Bayesian active learning + Bayesian neural networks
10. Research Question
'Can active learning be applied on a new dataset with an arbitrary architecture, without peeking at the labels to perform hyperparameter tuning?'
11. Building Blocks
• Bayesian Active Learning by Disagreement (BALD) — Houlsby et al. (2011)
• Bayesian Deep Learning
‣ Monte Carlo Dropout (MC Dropout) — Gal and Ghahramani (2016)
‣ Bayes by Backprop — Blundell et al. (2015)
• Compared: BALD + MC Dropout vs. BALD + Bayes by Backprop (a BALD-with-MC-Dropout sketch follows below)
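To make the combination concrete, here is a minimal sketch (my illustration, not the paper's code) of the BALD score computed with MC Dropout: keep dropout active at test time, draw several stochastic forward passes, and score each point by the mutual information between predictions and weights. stochastic_predict is an assumed callable that returns class probabilities with dropout still enabled:

import numpy as np

def bald_mc_dropout(stochastic_predict, pool_x, n_samples=20, eps=1e-10):
    # BALD via MC Dropout (Houlsby et al., 2011; Gal & Ghahramani, 2016).
    # stochastic_predict(pool_x) must run a forward pass WITH dropout on
    # and return class probabilities of shape (n_points, n_classes).
    samples = np.stack([stochastic_predict(pool_x) for _ in range(n_samples)])
    mean_probs = samples.mean(axis=0)
    # Predictive entropy H[y | x, D] of the averaged prediction
    predictive_entropy = -(mean_probs * np.log(mean_probs + eps)).sum(axis=1)
    # Expected entropy E_w[H[y | x, w]] across the sampled dropout masks
    expected_entropy = -(samples * np.log(samples + eps)).sum(axis=2).mean(axis=0)
    # Mutual information I[y; w] = H[y|x,D] - E_w[H[y|x,w]]: high where the
    # model is uncertain overall but each dropout sample is confident
    return predictive_entropy - expected_entropy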
12. NLP Task Setup

Task  | Sentiment Classification | Named Entity Recognition          | Semantic Role Labelling
Data  | TREC, MaReview           | CoNLL2003, OntoNotes              | CoNLL2003, CoNLL2012
Model | SVM, CNN                 | CRF, CNN-CNN-LSTM, CNN-BiLSTM-CRF | BiLSTM-CRF, BiLSTM
13. Results (see Figures 1–3 in the paper)
• Acquisition functions > random sampling
• Deep active learning > non-neural active learning
• Bayesian acquisition functions > classic acquisition functions
• Deep Bayesian active learning > classic active learning
14. Learning Resources
• Code: asiddhant/Active-NLP — Bayesian Deep Active Learning for Natural Language Processing Tasks
• Textbook: Probabilistic Programming & Bayesian Methods for Hackers
‣ Bayesian statistics in general
‣ A Google Colab version is available, built on TensorFlow 2.0 alpha