
A paper review of 'Deep Bayesian Active Learning for Natural Language Processing' (Siddhant & Lipton, 2018)


Shuntaro Yada

April 26, 2019


Transcript

1. [Review] Deep Bayesian Active Learning for Natural Language Processing (Siddhant & Lipton, 2018)
Shuntaro Yada, PhD Student, Library and Information Science Laboratory, The University of Tokyo
2. Basic Concepts
• Bayesian Active Learning methods
• for Bayesian Neural Network models
• on Natural Language Processing (NLP) tasks
Outline: Basic Concepts • Active Learning? • Bayesian Neural Networks? • Siddhant & Lipton, 2018 • Learning Resources
3. Active Learning?
'"Active learning" means students engage with the material, participate in the class, and collaborate with each other' — [Stanford | Teaching Commons]
[Image]: wbur.org – Here & Now (teaching-method context)
4. Active Learning?
• Only a few labelled data points vs. a vast amount of unlabelled data
• We need more labels but cannot label everything (due to budget constraints, etc.)
• How about labelling only the informative data points?
[Image]: DataCamp (machine-learning context)
5. Active Learning?
• Like semi-supervised learning, active learning tries to increase the training data (but in a different way)
• Choose an informative, as-yet-unlabelled data point, then ask an oracle (e.g., a human annotator) for its label
• Add the newly labelled data point to the training data (a minimal sketch of this loop follows below)
[Image]: DataCamp (machine-learning context)
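To make the loop concrete, here is a minimal, framework-agnostic sketch in Python (my illustration, not from the slides). The names model, acquire, and query_oracle are hypothetical stand-ins for a classifier with a scikit-learn-style fit method, an acquisition function, and the human-annotation step:

import numpy as np

def active_learning_loop(model, train_x, train_y, pool_x, acquire, query_oracle,
                         rounds=10, batch_size=10):
    # Pool-based active learning: retrain, score the unlabelled pool,
    # ask the oracle to label the most informative points, repeat.
    for _ in range(rounds):
        model.fit(train_x, train_y)                     # retrain on current labels
        scores = acquire(model, pool_x)                 # informativeness per point
        picked = np.argsort(scores)[-batch_size:]       # highest-scoring points
        new_y = query_oracle(pool_x[picked])            # e.g., human annotators
        train_x = np.vstack([train_x, pool_x[picked]])  # grow the training set
        train_y = np.concatenate([train_y, new_y])
        pool_x = np.delete(pool_x, picked, axis=0)      # shrink the pool
    return model, train_x, train_y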
6. Active Learning?
• Basically, active learning research is about how to choose informative data points from the unlabelled data
• E.g., if an ML model can output prediction probabilities, one option is to choose the points whose most likely predicted label has the lowest probability, i.e., the Least Confidence method (sketched below)
• Such selection methods are called Acquisition Functions
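A minimal sketch of the Least Confidence acquisition function, assuming a scikit-learn-style predict_proba; it can be passed as the acquire argument of the loop sketched above:

import numpy as np

def least_confidence(model, pool_x):
    # Score each unlabelled point by 1 - max_y p(y | x):
    # higher scores mean the model's best guess is less confident.
    probs = model.predict_proba(pool_x)   # shape: (n_points, n_classes)
    return 1.0 - probs.max(axis=1)        # uncertainty per point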
7. Bayesian Neural Networks?
• Neural networks (NNs) usually learn the weights as point estimates
• Bayesian NNs learn a distribution over the weights
[Image]: Blundell et al. (2015)
8. Bayesian Neural Networks?
• The benefit/motivation of Bayesian NNs is to model uncertainty in the predictions; Blundell et al. (2015) list three advantages:
1. regularisation via a compression cost on the weights,
2. richer representations and predictions from cheap model averaging, and
3. exploration in simple reinforcement learning problems such as contextual bandits.
• They are also another way to avoid over-fitting (cf. early stopping, weight decay, dropout, etc.)
(A compact formulation of the Bayesian predictive distribution follows below.)
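As a compact statement of what a distribution over weights buys us (the standard Bayesian NN predictive distribution, added here for clarity rather than taken from the slides): predictions marginalise over the weight posterior, which in practice is approximated by averaging over T weight configurations sampled from an approximate posterior q(\omega), e.g., via Bayes by Backprop or MC Dropout:

p(y \mid x, \mathcal{D}) = \int p(y \mid x, \omega)\, p(\omega \mid \mathcal{D})\, d\omega \approx \frac{1}{T} \sum_{t=1}^{T} p(y \mid x, \hat{\omega}_t), \qquad \hat{\omega}_t \sim q(\omega)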
9. Siddhant & Lipton, 2018
• Classical active learning methods capture aleatoric but not epistemic uncertainty (Kendall and Gal, 2017)
‣ Least Confidence is just a heuristic
‣ Bayesian acquisition functions can make use of genuine (epistemic) model uncertainty
• We now also have Bayesian frameworks for deep neural networks
• Hence: Bayesian active learning + Bayesian neural networks
10. Research Question
'Can active learning be applied on a new dataset with an arbitrary architecture, without peeking at the labels to perform hyperparameter tuning?'
11. Building Blocks
• Bayesian Active Learning by Disagreement (BALD) — Houlsby et al. (2011)
• Bayesian Deep Learning
‣ Monte Carlo Dropout (MC Dropout) — Gal and Ghahramani (2016)
‣ Bayes by Backprop — Blundell et al. (2015)
• Compared: BALD + MC Dropout vs. BALD + Bayes by Backprop (a BALD-with-MC-Dropout sketch follows below)
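To make the combination concrete, here is a minimal sketch (my illustration, not the paper's code) of the BALD score computed with MC Dropout: keep dropout active at test time, draw several stochastic forward passes, and score each point by the mutual information between predictions and weights. stochastic_predict is an assumed callable that returns class probabilities with dropout still enabled:

import numpy as np

def bald_mc_dropout(stochastic_predict, pool_x, n_samples=20, eps=1e-10):
    # BALD via MC Dropout (Houlsby et al., 2011; Gal & Ghahramani, 2016).
    # stochastic_predict(pool_x) must run a forward pass WITH dropout on
    # and return class probabilities of shape (n_points, n_classes).
    samples = np.stack([stochastic_predict(pool_x) for _ in range(n_samples)])
    mean_probs = samples.mean(axis=0)
    # Predictive entropy H[y | x, D] of the averaged prediction
    predictive_entropy = -(mean_probs * np.log(mean_probs + eps)).sum(axis=1)
    # Expected entropy E_w[H[y | x, w]] across the sampled dropout masks
    expected_entropy = -(samples * np.log(samples + eps)).sum(axis=2).mean(axis=0)
    # Mutual information I[y; w] = H[y|x,D] - E_w[H[y|x,w]]: high where the
    # model is uncertain overall but each dropout sample is confident
    return predictive_entropy - expected_entropy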
12. NLP Task Setup

Task  | Sentiment Classification | Named Entity Recognition          | Semantic Role Labelling
Data  | TREC, MaReview           | CoNLL2003, OntoNotes              | CoNLL2003, CoNLL2012
Model | SVM, CNN                 | CRF, CNN-CNN-LSTM, CNN-BiLSTM-CRF | BiLSTM-CRF, BiLSTM
13. Results (see Figures 1–3 in the paper)
• Acquisition functions > random sampling
• Deep active learning > non-neural active learning
• Bayesian acquisition functions > classic acquisition functions
• Deep Bayesian active learning > classic active learning
14. Learning Resources
• Code: asiddhant/Active-NLP — Bayesian Deep Active Learning for Natural Language Processing Tasks
• Textbook: Probabilistic Programming & Bayesian Methods for Hackers
‣ Bayesian statistics in general
‣ A Google Colab version is available, built on TensorFlow 2.0 alpha