Slide 1

Shuntaro Yada, PhD Student
University of Tokyo, Library and Information Science Laboratory
[Review] Deep Bayesian Active Learning for Natural Language Processing (Siddhant & Lipton, 2018)

Slide 2

Basic Concepts
• Bayesian Active Learning methods
• for Bayesian Neural Network models
• on Natural Language Processing (NLP) tasks

Outline: Basic Concepts • Active Learning? • Bayesian Neural Networks? • Siddhant & Lipton, 2018 • Learning Resources

Slide 3

Active Learning?
‘"Active learning" means students engage with the material, participate in the class, and collaborate with each other’ (Stanford Teaching Commons)
[Image: wbur.org, Here & Now (teaching method context)]

Slide 4

Active Learning?
• Only a few labelled data points vs. a vast amount of unlabelled data
• We need more labels but cannot label them all (due to budget constraints, etc.)
• How about labelling only the informative data points?
[Image: DataCamp (machine learning context)]

Slide 5

Active Learning?
• Like semi-supervised learning, active learning tries to enlarge the labelled training data, but in a different way
• Choose an informative data point that is not labelled yet, then ask an oracle (e.g., a human annotator) for its label
• Add the newly labelled data point to the training data, retrain, and repeat
[Image: DataCamp (machine learning context)]
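The three steps above form a loop that runs until the labelling budget is spent. Below is a minimal Python sketch of that pool-based loop under toy assumptions: the data points are the integers 0 to 9, a hypothetical oracle labels x as 1 iff x ≥ 5, and "training" just places a threshold halfway between the labelled classes. The acquisition rule (query the pool point nearest the current threshold) is an illustrative choice, not the method from the paper.

```python
from typing import Callable, Dict, List

def active_learning_loop(
    labelled: Dict[int, int],                               # seed set: index -> label
    pool: List[int],                                        # unlabelled indices
    acquire: Callable[[Dict[int, int], List[int]], int],    # acquisition function
    oracle: Callable[[int], int],                           # e.g. a human annotator
    budget: int,                                            # labels we can afford
) -> Dict[int, int]:
    labelled = dict(labelled)
    pool = list(pool)
    for _ in range(min(budget, len(pool))):
        idx = acquire(labelled, pool)    # 1. choose an informative unlabelled point
        labelled[idx] = oracle(idx)      # 2. ask the oracle for its label
        pool.remove(idx)                 # 3. move it into the training data
    return labelled

# --- Toy setting (hypothetical): points 0..9, true label is 1 iff x >= 5 ---
def oracle(x: int) -> int:
    return int(x >= 5)

def fit_threshold(labelled: Dict[int, int]) -> float:
    # Toy "training": boundary halfway between the largest 0 and the smallest 1
    max_neg = max(x for x, y in labelled.items() if y == 0)
    min_pos = min(x for x, y in labelled.items() if y == 1)
    return (max_neg + min_pos) / 2

def acquire_nearest_to_boundary(labelled: Dict[int, int], pool: List[int]) -> int:
    # Query the point the current model is least sure about (closest to the boundary)
    t = fit_threshold(labelled)
    return min(pool, key=lambda x: abs(x - t))

result = active_learning_loop({0: 0, 9: 1}, list(range(1, 9)),
                              acquire_nearest_to_boundary, oracle, budget=3)
print(sorted(result))         # [0, 4, 5, 6, 9]
print(fit_threshold(result))  # 4.5
```

With a budget of three queries the loop asks about 4, 6 and 5 in turn, homing in on the true boundary much like a binary search instead of labelling the pool in order.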

Slide 6

Active Learning?
• Basically, active learning research is about how to choose informative data points from unlabelled data
• E.g., if an ML model outputs a probability for its predictions, one way is to choose the data points whose top prediction has the lowest predicted probability, i.e., where the model is least confident (the Least Confidence method)
• Such selection methods are called acquisition functions
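A minimal sketch of the Least Confidence acquisition function, assuming the model's predicted class probabilities for each unlabelled point are already available as plain Python lists (the `pool_probs` values are made up): each point is scored by 1 - max_y p(y | x), and the k highest-scoring points are sent to the oracle.

```python
from typing import List, Sequence

def least_confidence_scores(probs: Sequence[Sequence[float]]) -> List[float]:
    # 1 - max_y p(y | x): high when even the model's best guess is weak
    return [1.0 - max(p) for p in probs]

def least_confidence_acquire(probs: Sequence[Sequence[float]], k: int = 1) -> List[int]:
    # Indices of the k points with the highest least-confidence score
    scores = least_confidence_scores(probs)
    return sorted(range(len(probs)), key=lambda i: scores[i], reverse=True)[:k]

# Hypothetical predicted probabilities for three unlabelled points
pool_probs = [
    [0.95, 0.03, 0.02],  # confident
    [0.40, 0.35, 0.25],  # least confident
    [0.60, 0.30, 0.10],
]
print(least_confidence_acquire(pool_probs, k=2))  # [1, 2]
```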

Slide 7

Bayesian Neural Networks?
• Neural networks (NNs) usually learn each weight as a point estimate (a single value)
• Bayesian NNs learn a probability distribution over the weights
[Image: Blundell et al. (2015)]
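To make the point-estimate vs. distribution contrast concrete, here is a sketch for a single weight of a toy model y = w·x. Following the parameterisation used in Bayes by Backprop (Blundell et al., 2015), the weight is assumed Gaussian with mean `mu` and standard deviation log(1 + exp(`rho`)); the values of `mu`, `rho` and the sample count are arbitrary. A prediction is obtained by averaging over many sampled weights.

```python
import math
import random
from statistics import mean, stdev
from typing import Tuple

def softplus(rho: float) -> float:
    # log(1 + exp(rho)): keeps the standard deviation positive
    return math.log1p(math.exp(rho))

# Standard NN: the weight is a single learned number (point estimate).
w_point = 2.0

# Bayesian NN (toy, hypothetical values): the weight follows N(mu, sigma^2),
# with sigma = softplus(rho) as in Bayes by Backprop.
mu, rho = 2.0, -1.0

def sample_weight(rng: random.Random) -> float:
    # Reparameterisation: w = mu + sigma * eps, with eps ~ N(0, 1)
    eps = rng.gauss(0.0, 1.0)
    return mu + softplus(rho) * eps

def predict(x: float, n_samples: int = 10_000, seed: int = 0) -> Tuple[float, float]:
    # Average the predictions of many sampled networks (model averaging)
    rng = random.Random(seed)
    preds = [sample_weight(rng) * x for _ in range(n_samples)]
    return mean(preds), stdev(preds)

print(w_point * 3.0)  # 6.0: one answer, no notion of uncertainty
m, s = predict(3.0)
print(m, s)           # mean near 6.0; spread near 3 * softplus(-1), about 0.94
```

The spread `s` is the extra information a Bayesian NN provides: it says how much the prediction would change under other plausible weights.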

Slide 8

Bayesian Neural Networks?
• The benefit/motivation of Bayesian NNs is to model uncertainty in the prediction; Blundell et al. (2015) list:
  1. regularisation via a compression cost on the weights,
  2. richer representations and predictions from cheap model averaging, and
  3. exploration in simple reinforcement learning problems such as contextual bandits.
• Bayesian NNs are also another way to avoid over-fitting (alongside early stopping, weight decay, dropout, etc.)
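The "compression cost on the weights" is the KL divergence between the learned weight distribution q(w) and a prior P(w), added to the usual data-fit loss (Blundell et al., 2015). The sketch below assumes a fully factorised Gaussian q and, for simplicity, a single zero-mean Gaussian prior so that the KL has a closed form; Blundell et al. actually use a scale-mixture prior and estimate this term by sampling. The `weights` list and `nll` value are hypothetical inputs.

```python
import math
from typing import Iterable, Tuple

def kl_gaussian(mu: float, sigma: float, prior_sigma: float = 1.0) -> float:
    # Closed-form KL( N(mu, sigma^2) || N(0, prior_sigma^2) ) for one weight
    return (math.log(prior_sigma / sigma)
            + (sigma ** 2 + mu ** 2) / (2 * prior_sigma ** 2)
            - 0.5)

def bbb_loss(weights: Iterable[Tuple[float, float]], nll: float,
             prior_sigma: float = 1.0) -> float:
    # Complexity cost (sum of per-weight KLs) + likelihood cost (negative log-likelihood)
    complexity = sum(kl_gaussian(m, s, prior_sigma) for m, s in weights)
    return complexity + nll

print(kl_gaussian(0.0, 1.0))  # 0.0: posterior equals prior, nothing to pay
print(kl_gaussian(0.0, 0.1))  # larger: a very confident weight costs more
```

Weights whose distribution stays wide and close to zero cost almost nothing, while very confident (small sigma) or large-mean weights are penalised; this is how the term acts as a regulariser.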

Slide 9

Siddhant & Lipton, 2018
• Classical active learning methods use aleatoric but not epistemic uncertainty (Kendall and Gal, 2017)
  ‣ Least confidence is just a heuristic
  ‣ Bayesian acquisition functions can make use of model (epistemic) uncertainty
• Now, we also have Bayesian frameworks for deep neural networks
• Hence: Bayesian active learning + Bayesian neural networks
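One common way to make the aleatoric/epistemic distinction concrete (a standard identity, not taken from the slides) is to split the predictive entropy of a Bayesian model with weight posterior p(w | D) into two terms:

```latex
\underbrace{\mathbb{H}\!\left[\mathbb{E}_{p(\mathbf{w}\mid\mathcal{D})}\, p(y \mid x, \mathbf{w})\right]}_{\text{total uncertainty}}
= \underbrace{\mathbb{E}_{p(\mathbf{w}\mid\mathcal{D})}\, \mathbb{H}\!\left[p(y \mid x, \mathbf{w})\right]}_{\text{aleatoric}}
+ \underbrace{\mathbb{I}\!\left[y; \mathbf{w} \mid x, \mathcal{D}\right]}_{\text{epistemic}}
```

Least confidence, 1 - max_y p(y | x), looks only at a single (or averaged) prediction, so it cannot tell the two terms apart. The last term, the mutual information between the label and the weights, is the quantity BALD maximises (Houlsby et al., 2011).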

Slide 10

Research Question
‘can active learning be applied on a new dataset with an arbitrary architecture, without peeking at the labels to perform hyperparameter tuning?’

Slide 11

Building Blocks
• Bayesian Active Learning by Disagreement (BALD) (Houlsby et al., 2011)
• Bayesian Deep Learning
  ‣ Monte Carlo Dropout (MC Dropout) (Gal and Ghahramani, 2016)
  ‣ Bayes by Backprop (Blundell et al., 2015)
• BALD + MC Dropout vs. BALD + Bayes by Backprop
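Putting the building blocks together, here is a minimal sketch of BALD scoring with MC Dropout: keep dropout switched on at prediction time, run T stochastic forward passes, and score each unlabelled input by H[mean prediction] - mean(H[prediction]). The network is a hypothetical two-class linear softmax model with hand-set weights and dropout on its inputs, chosen only so the behaviour is easy to follow; it is not the architecture used by Siddhant & Lipton (2018).

```python
import math
import random
from typing import List, Sequence

def entropy(p: Sequence[float]) -> float:
    # Shannon entropy in nats; 0 * log 0 is treated as 0
    return -sum(pi * math.log(pi) for pi in p if pi > 0)

def bald_score(prob_samples: Sequence[Sequence[float]]) -> float:
    # prob_samples[t][c] = p(y = c | x, w_t) for T sampled models w_t
    T = len(prob_samples)
    n_classes = len(prob_samples[0])
    mean_p = [sum(s[c] for s in prob_samples) / T for c in range(n_classes)]
    # Mutual information: H[E_w p(y|x,w)] - E_w H[p(y|x,w)]
    return entropy(mean_p) - sum(entropy(s) for s in prob_samples) / T

def softmax(z: Sequence[float]) -> List[float]:
    m = max(z)
    e = [math.exp(v - m) for v in z]
    total = sum(e)
    return [v / total for v in e]

def mc_dropout_probs(W: List[List[float]], x: List[float], T: int = 100,
                     p_drop: float = 0.5, seed: int = 0) -> List[List[float]]:
    # T stochastic forward passes with (inverted) dropout kept ON at test time
    rng = random.Random(seed)
    samples = []
    for _ in range(T):
        mask = [0.0 if rng.random() < p_drop else 1.0 / (1 - p_drop) for _ in x]
        h = [xi * mi for xi, mi in zip(x, mask)]
        logits = [sum(w * hi for w, hi in zip(row, h)) for row in W]
        samples.append(softmax(logits))
    return samples

# Hypothetical weights: feature 0 votes for class 0, feature 1 for class 1
W = [[4.0, -4.0, 0.0],
     [-4.0, 4.0, 0.0]]
pool = {
    "conflicting": [1.0, 1.0, 0.0],    # prediction flips with the dropout mask
    "uninformative": [0.0, 0.0, 1.0],  # every pass gives [0.5, 0.5]
}
scores = {name: bald_score(mc_dropout_probs(W, x)) for name, x in pool.items()}
print(max(scores, key=scores.get))  # conflicting
```

For both inputs the averaged prediction is close to [0.5, 0.5], so least confidence barely separates them; BALD picks the input on which the sampled models disagree. Replacing `mc_dropout_probs` with passes that sample weights from a learned Gaussian posterior would give the BALD + Bayes by Backprop variant instead.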

Slide 12

NLP Task Setup

      | Sentiment Classification | Named Entity Recognition | Semantic Role Labelling
Data  | TREC                     | CoNLL2003                | CoNLL2003
      | MaReview                 | OntoNotes                | CoNLL2012
Model | SVM                      | CRF                      | BiLSTM-CRF
      | CNN                      | CNN-CNN-LSTM             |
      | BiLSTM                   | CNN-BiLSTM-CRF           |

Slide 13

Results (see Figures 1–3 of the paper)
• Acquisition functions > random sampling
• Deep active learning > non-neural active learning
• Bayesian acquisition functions > classic acquisition functions
• Deep Bayesian active learning > classic active learning

Slide 14

Learning Resources
• Code: asiddhant/Active-NLP: Bayesian Deep Active Learning for Natural Language Processing Tasks
• Textbook: Probabilistic Programming & Bayesian Methods for Hackers
  ‣ Bayesian statistics in general
  ‣ A Google Colab version using TensorFlow 2.0 alpha is available