
A paper review of 'Deep Bayesian Active Learning for Natural Language Processing' (Siddhant & Lipton, 2018)

Shuntaro Yada

April 26, 2019

  1. Shuntaro Yada
    PhD Student
    [Review]

    Deep Bayesian Active Learning
    for Natural Language Processing
    (Siddhant & Lipton, 2018)
    Library and Information Science Laboratory, The University of Tokyo


  2. Basic Concepts
    • Bayesian Active Learning methods
    • for Bayesian Neural Network models
    • on Natural Language Processing (NLP) tasks
    • Basic Concepts
    • Active Learning?
    • Bayesian Neural
    Networks?
    • Siddhant & Lipton, 2018
    • Learning Resources


  3. Active Learning?
    ‘"Active learning" means students engage with the
    material, participate in the class, and collaborate with
    each other’ —[Stanford | Teaching Commons]
    [image]: wbur.org - Hear & Now
    (Teaching method context)


  4. Active Learning?
    • Only a few labelled data points vs. a vast amount of
    unlabelled data
    • We need more labels but cannot label everything (due
    to budget constraints, etc.)
    • How about labelling only the informative data points?
    [Image]: DataCamp
    (Machine learning context)


  5. Active Learning?
    • Like semi-supervised learning, active learning tries
    to increase the training data (but in a different way)
    • Choose an informative (not yet labelled) data point,
    then ask an oracle (e.g., a human annotator) for its
    label
    • Add the newly labelled data point to the training data
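The loop above can be sketched as follows. This is a minimal, illustrative pool-based version; the function and parameter names (`oracle`, `acquire`, `rounds`, `batch`) are assumptions for the sketch, not the paper's code:

```python
import numpy as np

def active_learning_loop(model, labelled, unlabelled, oracle, acquire,
                         rounds=10, batch=100):
    """Generic pool-based active learning loop (sketch).

    `acquire` scores each unlabelled point for informativeness; the
    top-`batch` points are sent to the oracle (e.g., a human annotator)
    and moved into the labelled pool before retraining.
    """
    for _ in range(rounds):
        model.fit(*labelled)                     # retrain on current labels
        scores = acquire(model, unlabelled)      # informativeness per point
        picked = np.argsort(scores)[-batch:]     # most informative points
        new_x = unlabelled[picked]
        new_y = oracle(new_x)                    # ask the oracle for labels
        labelled = (np.vstack([labelled[0], new_x]),
                    np.concatenate([labelled[1], new_y]))
        unlabelled = np.delete(unlabelled, picked, axis=0)
    return model, labelled
```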
    [Image]: DataCamp
    (Machine learning context)


  6. Active Learning?
    • Basically, active learning research is about how to
    choose informative data points from the unlabelled data
    • E.g., if an ML model outputs a probability for its
    predictions, selecting the points whose top prediction
    has the lowest probability (i.e., the highest
    uncertainty) is one strategy (the Least Confidence
    method)
    • Such selection strategies are called acquisition functions
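As a small illustrative example of Least Confidence (not the paper's code), using NumPy:

```python
import numpy as np

def least_confidence(probs):
    """Least Confidence acquisition: score = 1 - max class probability.

    `probs` is an (n_points, n_classes) array of predicted class
    probabilities; a higher score means the model is less confident,
    so those points are the candidates to label next.
    """
    return 1.0 - probs.max(axis=1)

# The model is far less sure about the second point here:
probs = np.array([[0.90, 0.10],
                  [0.55, 0.45]])
scores = least_confidence(probs)   # ≈ [0.10, 0.45]
```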

  7. Bayesian Neural Networks?
    • Neural networks (NNs) usually learn the weights as
    point estimates
    • Bayesian NNs learn the distribution of the weights
    [Image]: Blundell et al. (2015)
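A toy illustration of the contrast, assuming a one-weight "network" whose posterior happens to be Gaussian (all numbers are made up for the sketch):

```python
import numpy as np

rng = np.random.default_rng(0)

# A point-estimate NN keeps a single value per weight; a Bayesian NN
# keeps a distribution over it, so predictions become a distribution
# too, with a spread that quantifies model uncertainty.
mu, sigma = 0.8, 0.2                           # assumed posterior over one weight
x = 3.0                                        # a single input
w_samples = rng.normal(mu, sigma, size=1000)   # draw weights from the posterior
preds = w_samples * x                          # predictive samples for x
uncertainty = preds.std()                      # ≈ sigma * |x| = 0.6
```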


  8. Bayesian Neural Networks?
    • The benefit/motivation of Bayesian NNs is to model
    uncertainty in the predictions; Blundell et al. (2015) list:
    1. regularisation via a compression cost on the weights,
    2. richer representations and predictions from cheap
    model averaging, and
    3. exploration in simple reinforcement learning
    problems such as contextual bandits
    • Also another way to avoid over-fitting (cf. early
    stopping, weight decay, dropout, etc.)

  9. Siddhant & Lipton, 2018
    • Classical active learning methods capture aleatoric but
    not epistemic uncertainty (Kendall and Gal, 2017)
    ‣ Least Confidence is just a heuristic
    ‣ Bayesian acquisition functions can make use of genuine
    model uncertainty
    • We now also have Bayesian frameworks for deep
    neural networks
    • Hence: Bayesian active learning + Bayesian neural networks

  10. Research Question
    ‘can active learning be applied on a new
    dataset with an arbitrary architecture,
    without peeking at the labels to perform
    hyperparameter tuning?’

  11. Building Blocks
    • Bayesian Active Learning by Disagreement (BALD)
    — Houlsby et al. (2011)
    • Bayesian Deep Learning
    ‣ Monte Carlo Dropout (MC Dropout) — Gal and
    Ghahramani (2016)
    ‣ Bayes by Backprop — Blundell et al. (2015)
    • BALD + MC Dropout vs. BALD + Bayes by Backprop
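The combination can be sketched in NumPy. This is an illustrative implementation of the BALD score computed from MC Dropout samples, not the authors' code; `mc_probs` would come from T stochastic forward passes with dropout kept on at test time:

```python
import numpy as np

def bald_score(mc_probs, eps=1e-12):
    """BALD acquisition from Monte Carlo Dropout samples (sketch).

    `mc_probs` has shape (T, n_points, n_classes), one slice per
    stochastic forward pass.  BALD is the mutual information between
    predictions and model weights:
        H[ E_t p_t ]  -  E_t H[ p_t ]
    i.e. total predictive entropy minus expected per-pass entropy,
    leaving the epistemic part that more labelled data could reduce.
    """
    mean_p = mc_probs.mean(axis=0)                                     # (n, C)
    entropy_of_mean = -(mean_p * np.log(mean_p + eps)).sum(axis=-1)    # H[E p]
    mean_entropy = -(mc_probs * np.log(mc_probs + eps)).sum(axis=-1).mean(axis=0)
    return entropy_of_mean - mean_entropy
```

Points where the dropout-perturbed passes disagree (high epistemic uncertainty) get a high BALD score; points where every pass gives the same confident answer score near zero, even if that answer is itself uncertain.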

  12. NLP Task Setup
    • Sentiment Classification
    ‣ Data: TREC, MaReview
    ‣ Models: SVM, CNN, BiLSTM
    • Named Entity Recognition
    ‣ Data: CoNLL2003, OntoNotes
    ‣ Models: CRF, CNN-CNN-LSTM, CNN-BiLSTM-CRF
    • Semantic Role Labelling
    ‣ Data: CoNLL2003, CoNLL2012
    ‣ Model: BiLSTM-CRF


  13. Results
    See Figures 1–3
    • Acquisition functions > random sampling
    • Deep active learning > non-neural active learning
    • Bayesian acquisition functions > classic acquisition
    functions
    • Deep Bayesian active learning > classic active
    learning

  14. Learning Resources
    • Code: asiddhant/Active-NLP: Bayesian Deep Active
    Learning for Natural Language Processing Tasks
    • Textbook: Probabilistic Programming & Bayesian
    Methods for Hackers
    ‣ Bayesian statistics in general
    ‣ A Google Colab version is available, using
    TensorFlow 2.0 alpha