
Bayesian Dropout and Beyond

Lukasz
June 29, 2017

Neural network regularization & dropout in a Bayesian Inference context.

Transcript

  1. Bayesian Dropout and Beyond
     Lukasz Krawczyk, 29th June 2017
     (title cartoon: Homo Apriorius, Homo Pragmaticus, Homo Friquentistus, Homo Sapiens, Homo Bayesianis)
  2. Agenda
     • About me
     • Bayesian Neural Networks
     • Regularization
     • Dropout
  3. About me
     • Data Scientist at Asurion Japan Holdings
     • Data Scientist & Full Stack Developer at Abeja Inc.
     • MSc degree from Jagiellonian University, Poland
     • Open Source & Bayesian Inference advocate
  4. Bayesian Inference
     • General purpose framework
     • Generative models
     • Clarity of FS + Power of ML
       – White-box modelling
       – Black-box fitting (MCMC, VI)
       – Uncertainty → Intuitive insights
     • Learning from very small datasets
     • Probabilistic Programming
  5. Bayesian Inference
     (diagram: Data + Model + Prior (μ, σ) → Inference (MCMC or VI) → Posterior → Credible Region / Uncertainty → Better Insights; assumptions about the data are controlled by the prior)
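     A minimal pymc3 sketch of this workflow (not from the slides; the toy data, priors and sample counts are illustrative):

         import numpy as np
         import pymc3 as pm

         # toy observations standing in for "Data"
         data = np.random.normal(loc=1.0, scale=2.0, size=50)

         with pm.Model():
             # prior assumptions about the data, expressed through mu and sigma
             mu = pm.Normal('mu', mu=0, sd=10)
             sigma = pm.HalfNormal('sigma', sd=10)
             pm.Normal('obs', mu=mu, sd=sigma, observed=data)

             # inference: MCMC ...
             trace = pm.sample(1000, tune=1000)
             # ... or variational inference
             # approx = pm.fit(10000, method='advi')

         # posterior mean and 95% credible region for mu
         print(trace['mu'].mean(), np.percentile(trace['mu'], [2.5, 97.5]))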
  6. Example – standard NN
     Data (x1, x2 → y): (0.1, 1.0) → 0, (0.1, -1.3) → 1, …
     (diagram: feed-forward network with sigmoid/tanh activations, trained by backpropagation)
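     A minimal Lasagne sketch of such a standard, point-estimate network (hidden size, optimizer and iteration count are illustrative assumptions; only the toy rows come from the slide's table):

         import numpy as np
         import theano
         import theano.tensor as T
         import lasagne

         # toy data from the slide's table: (x1, x2) -> y
         X = np.array([[0.1, 1.0], [0.1, -1.3]], dtype=np.float32)
         y = np.array([[0], [1]], dtype=np.float32)

         X_var, y_var = T.matrix('X'), T.matrix('y')
         l_in = lasagne.layers.InputLayer((None, 2), input_var=X_var)
         l_hid = lasagne.layers.DenseLayer(l_in, 5, nonlinearity=lasagne.nonlinearities.tanh)
         l_out = lasagne.layers.DenseLayer(l_hid, 1, nonlinearity=lasagne.nonlinearities.sigmoid)

         prediction = lasagne.layers.get_output(l_out)
         loss = lasagne.objectives.binary_crossentropy(prediction, y_var).mean()
         params = lasagne.layers.get_all_params(l_out, trainable=True)
         updates = lasagne.updates.adam(loss, params)   # backpropagation + gradient step

         train = theano.function([X_var, y_var], loss, updates=updates)
         for _ in range(200):
             train(X, y)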
  7. Example – NN using Bayesian Inference
     Data (x1, x2 → y): (0.1, 1.0) → [0, 1, ...], (0.1, -1.3) → [1, 1, ...], …
     With the Bayesian approximation each prediction is a set of posterior samples rather than a single value.
     • MCMC: NUTS, HMC, Metropolis, Gibbs
     • VI: ADVI, OPVI
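     A hedged sketch of how these back-ends are selected in pymc3 3.x (`model` stands for a Bayesian NN model like the ones on the following slides; Gibbs-style steps are only assigned to suitable discrete variables):

         with model:
             trace_nuts = pm.sample(1000, step=pm.NUTS())        # MCMC: NUTS
             trace_mh = pm.sample(5000, step=pm.Metropolis())    # MCMC: Metropolis
             approx = pm.fit(20000, method='advi')               # VI: ADVI (built on pymc3's OPVI framework)
             trace_vi = approx.sample(1000)                      # draws from the fitted approximation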
  8. Bayesian Regularization
     (diagram: a Laplace(μ, b) prior on the weights corresponds to L1 regularization, or a Normal(μ, σ) prior to L2 regularization)
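     A short pymc3 illustration of this correspondence (layer sizes are illustrative):

         import pymc3 as pm

         n_in, n_hidden = 2, 5   # illustrative sizes

         with pm.Model():
             # Laplace(0, b) prior on the weights acts like L1 regularization
             w_l1 = pm.Laplace('w_l1', mu=0, b=1, shape=(n_in, n_hidden))
             # Normal(0, sd) prior on the weights acts like L2 regularization
             w_l2 = pm.Normal('w_l2', mu=0, sd=1, shape=(n_in, n_hidden))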
  9. Code (pymc3 + Lasagne)

     with pm.Model() as model:
         # weights with L2 regularization
         w_in_1 = Normal('w_in_1', 0, sd=1, shape=(n_in, n_hidden))
         w_1_2 = Normal('w_1_2', 0, sd=1, shape=(n_hidden, n_hidden))
         w_2_out = Normal('w_2_out', 0, sd=1, shape=(n_hidden, n_out))

         # layers
         l_in = InputLayer(in_shape, input_var=X_shared)
         l_1 = DenseLayer(l_in, n_hidden, W=w_in_1, nonlinearity=tanh)
         l_2 = DenseLayer(l_1, n_hidden, W=w_1_2, nonlinearity=tanh)
         l_out = DenseLayer(l_2, n_out, W=w_2_out, nonlinearity=softmax)

         p = Deterministic('p', lasagne.layers.get_output(l_out))
         out = Categorical('out', p=p, observed=y_shared)
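     The fitting step is not shown on the slide; a minimal continuation, assuming the model above and the pymc3 3.x API, might be:

         with model:
             trace = pm.sample(1000, tune=1000)          # MCMC (NUTS)
             # approx = pm.fit(30000, method='advi')     # or VI for larger networks
             # trace = approx.sample(1000)

         # posterior mean of the class probabilities for the training inputs
         p_post = trace['p'].mean(axis=0)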
  10. Bayesian Regularization
      (diagram: the prior scale σ of the weights itself gets a prior, giving L2 regularization with automated hyperparameter optimization)
  11. Code (pymc3 + Lasagne)

      with pm.Model() as model:
          # regularization hyperparameters
          r_in_1 = HalfNormal('r_in_1', sd=1)
          r_1_2 = HalfNormal('r_1_2', sd=1)
          r_2_out = HalfNormal('r_2_out', sd=1)

          # weights with L2 regularization, scales learned from the data
          w_in_1 = Normal('w_in_1', 0, sd=r_in_1, shape=(n_in, n_hidden))
          w_1_2 = Normal('w_1_2', 0, sd=r_1_2, shape=(n_hidden, n_hidden))
          w_2_out = Normal('w_2_out', 0, sd=r_2_out, shape=(n_hidden, n_out))

          # layers
          l_in = InputLayer(in_shape, input_var=X_shared)
          l_1 = DenseLayer(l_in, n_hidden, W=w_in_1, nonlinearity=tanh)
          l_2 = DenseLayer(l_1, n_hidden, W=w_1_2, nonlinearity=tanh)
          l_out = DenseLayer(l_2, n_out, W=w_2_out, nonlinearity=softmax)

          p = Deterministic('p', lasagne.layers.get_output(l_out))
          out = Categorical('out', p=p, observed=y_shared)
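      A hedged follow-up (not on the slide): after fitting, the learned regularization strengths can be read off the posterior of the hyperparameters.

          with model:
              trace = pm.sample(1000, tune=1000)

          # posterior of the per-layer prior scales; a larger scale means weaker regularization
          pm.summary(trace, varnames=['r_in_1', 'r_1_2', 'r_2_out'])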
  12. Results
      • Learning regularization directly from the data
      • Transferring knowledge to other models (see the sketch below)
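      One reading of "transferring knowledge" (an assumption, based on the standard correspondence between a Normal(0, σ) weight prior and an L2 penalty of 1/(2σ²)): the posterior mean of the learned scale, taken from the trace in the previous note, can seed the weight-decay setting of a conventional network.

          # hypothetical: turn the learned prior scale into a conventional L2 coefficient
          sigma_learned = trace['r_in_1'].mean()
          l2_coef = 1.0 / (2.0 * sigma_learned ** 2)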
  13. Dropout
      • Standard dropout is already a form of Bayesian approximation
      • Experiments show it has a good influence on the learning process
      • The output is predicted by the weighted average of submodel predictions
      (diagram: network with units dropped with probability p = 0.5)
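      A hedged Lasagne sketch of the "average over submodels" view (Monte Carlo dropout in the sense of Gal & Ghahramani; network shape and sample count are illustrative): keep dropout active at prediction time and average several stochastic forward passes.

          import numpy as np
          import theano
          import theano.tensor as T
          import lasagne

          X_var = T.matrix('X')
          l_in = lasagne.layers.InputLayer((None, 2), input_var=X_var)
          l_hid = lasagne.layers.DenseLayer(l_in, 16, nonlinearity=lasagne.nonlinearities.tanh)
          l_drop = lasagne.layers.DropoutLayer(l_hid, p=0.5)
          l_out = lasagne.layers.DenseLayer(l_drop, 2, nonlinearity=lasagne.nonlinearities.softmax)

          # deterministic=False keeps dropout on, so every call samples a different submodel
          # (weights here are at their random initial values; in practice the network is trained first)
          predict_once = theano.function([X_var], lasagne.layers.get_output(l_out, deterministic=False))

          X_new = np.array([[0.1, 1.0]], dtype=np.float32)
          samples = np.stack([predict_once(X_new) for _ in range(100)])
          mean_prediction, uncertainty = samples.mean(axis=0), samples.std(axis=0)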
  14. Bayesian Dropout
      layer_n = σ(W · Z · layer_{n-1} + b),  where W ~ Normal(μ, σ) and Z ~ Bernoulli(p)
  15. Code (pymc3 + Lasagne)

      with pm.Model() as model:
          # weights with L2 regularization
          w_in_1 = Normal('w_in_1', 0, sd=1, shape=(n_in, n_hidden))
          w_1_2 = Normal('w_1_2', 0, sd=1, shape=(n_hidden, n_hidden))
          w_2_out = Normal('w_2_out', 0, sd=1, shape=(n_hidden, n_out))

          # dropout masks
          d_in_1 = Bernoulli('d_in_1', p=0.5, shape=n_in)
          d_1_2 = Bernoulli('d_1_2', p=0.5, shape=n_hidden)
          d_2_out = Bernoulli('d_2_out', p=0.5, shape=n_hidden)

          # layers, with each weight matrix multiplied by its dropout mask
          l_in = InputLayer(in_shape, input_var=X_shared)
          l_1 = DenseLayer(l_in, n_hidden,
                           W=T.dot(T.nlinalg.diag(d_in_1), w_in_1), nonlinearity=tanh)
          l_2 = DenseLayer(l_1, n_hidden,
                           W=T.dot(T.nlinalg.diag(d_1_2), w_1_2), nonlinearity=tanh)
          l_out = DenseLayer(l_2, n_out,
                             W=T.dot(T.nlinalg.diag(d_2_out), w_2_out), nonlinearity=softmax)

          p = Deterministic('p', lasagne.layers.get_output(l_out))
          out = Categorical('out', p=p, observed=y_shared)
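      The slide stops at the model definition; a minimal continuation under the same assumptions (pymc3 automatically assigns a Gibbs-style step to the Bernoulli masks and NUTS to the continuous weights) could be:

          with model:
              trace = pm.sample(1000, tune=1000)

          # predictive probabilities with uncertainty from the posterior samples
          p_mean = trace['p'].mean(axis=0)
          p_std = trace['p'].std(axis=0)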
  16. Results
      • https://github.com/uhho/bnn-experiments
      • We can use this information for building a standard NN
        – Feature selection (see the sketch below)
        – Architecture decisions
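      A hedged sketch of one way the dropout posteriors might inform feature selection (my reading of the bullet, using the trace from the previous note; the 0.5 threshold is arbitrary):

          # how often each input's dropout mask is "on" across posterior samples;
          # inputs whose mask is rarely on contribute little and are candidates for removal
          input_keep_prob = trace['d_in_1'].mean(axis=0)
          selected_features = [i for i, keep in enumerate(input_keep_prob) if keep > 0.5]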
  17. Summary
      Scientific perspective
      • NN models with small datasets
      • Complex hierarchical neural networks (Bayesian CNN/RNN)
      • Reduced overfitting
      • Faster training
      Business perspective
      • Clear and intuitive models
      • Uncertainty is extremely important in Finance & Insurance
      • Better trust and adoption of neural-network-based models
  18. Links & Sources
      • Code
        – https://github.com/uhho/bnn-experiments
      • Papers
        – "Dropout as a Bayesian Approximation: Representing Model Uncertainty in Deep Learning", Y. Gal, Z. Ghahramani (Cambridge University), http://mlg.eng.cam.ac.uk/yarin/PDFs/NIPS_2015_deep_learning_uncertainty.pdf
        – "Bayesian Dropout", T. Herlau, M. Mørup, M. N. Schmidt (Technical University of Denmark), https://www.researchgate.net/publication/280970177_Bayesian_Dropout
        – "A Bayesian encourages dropout", S. Maeda (Kyoto University), https://arxiv.org/pdf/1412.7003.pdf