
OpenTalks.AI - Dmitry Vetrov, Open problems in deep learning: A Bayesian solution

OpenTalks.AI

March 01, 2018

Transcript

  1. Open problems in deep learning: A Bayesian solution
     Dmitry P. Vetrov, research professor at HSE, head of the joint Samsung-HSE lab, head of the Bayesian methods research group, http://bayesgroup.ru
  2. Deep learning
     • A revolution in machine learning
     • Deep neural networks approach human-level performance on a number of problems
     • They can solve quite non-standard problems, such as image captioning (image2caption) and artistic style transfer
  3. Open problems in deep learning
     • Overfitting: neural networks are prone to catastrophic overfitting on noisy data
     • Interpretability: nobody knows HOW a neural network makes its decisions, which is crucial for healthcare and finance; legislative restrictions are expected
     • Uncertainty estimation: current neural networks are highly over-confident even when they make mistakes, while in many applications (e.g. self-driving cars) it is important to estimate the uncertainty of a prediction
     • Adversarial examples: neural networks can easily be fooled by barely visible perturbations of the data
  4. Bayesian framework
     • Treats everything as a random variable
     • Allows us to encode our ignorance in terms of distributions
     • Makes use of Bayes' theorem (written out below for reference)
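     For reference, Bayes' theorem written for the network weights θ and the training data D:

     \[ p(\theta \mid D) = \frac{p(D \mid \theta)\, p(\theta)}{p(D)} = \frac{p(D \mid \theta)\, p(\theta)}{\int p(D \mid \theta)\, p(\theta)\, d\theta}, \]

     where the prior p(θ) encodes our ignorance before seeing the data and the posterior p(θ | D) is what all of the advantages listed below are built on.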
  5. Frequentist vs. Bayesian frameworks
     • It can be shown that Bayesian inference reduces to frequentist inference in the limit of much more data than parameters (sketched below)
     • In other words, the frequentist framework is a limit case of the Bayesian one!
     • The number d of tunable parameters in modern ML models is comparable with the size n of the training data
     • We have no choice but to be Bayesian!
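     A sketch of that limit statement, under the usual asymptotic argument (the notation θ_ML is mine, not from the slide):

     \[ p(\theta \mid x_1, \dots, x_n) \;\propto\; p(\theta) \prod_{i=1}^{n} p(x_i \mid \theta) \;\xrightarrow[\,n/d \to \infty\,]{}\; \delta(\theta - \theta_{\mathrm{ML}}), \qquad \theta_{\mathrm{ML}} = \arg\max_{\theta} \prod_{i=1}^{n} p(x_i \mid \theta), \]

     i.e. with far more data than parameters the posterior collapses onto the maximum-likelihood point and Bayesian inference recovers the frequentist answer; when d is comparable with n this collapse does not happen and the full posterior carries real information.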
  6. Advantages of the Bayesian framework
     • Regularization: prevents overfitting on the training data, because the prior does not allow the parameters to be tuned too aggressively
     • Extensibility: Bayesian inference yields a posterior, which can then be used as the prior for the next model
     • Ensembling: the posterior distribution over the weights defines an ensemble of neural networks rather than a single network
     • Model selection: automatically selects the simplest possible model that explains the observed data, thus implementing Occam's razor
     • Scalability: stochastic variational inference allows posteriors to be approximated using deep neural networks (see the sketch after this list)
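     To make the scalability point concrete, here is a minimal sketch of stochastic variational inference for a single Bayesian layer in Python/PyTorch, assuming a fully factorized Gaussian approximate posterior trained with the reparameterization trick ("Bayes by Backprop" style). The class name BayesianLinear, the prior scale, and the toy mini-batch are illustrative assumptions, not code from the talk.

     import torch
     import torch.nn as nn
     import torch.nn.functional as F

     class BayesianLinear(nn.Module):
         """Linear layer with a factorized Gaussian posterior q(W) = N(mu, sigma^2)."""
         def __init__(self, n_in, n_out, prior_std=1.0):
             super().__init__()
             self.w_mu = nn.Parameter(0.01 * torch.randn(n_out, n_in))
             self.w_logsigma = nn.Parameter(torch.full((n_out, n_in), -5.0))
             self.prior_std = prior_std

         def forward(self, x):
             sigma = self.w_logsigma.exp()
             # Reparameterization trick: w = mu + sigma * eps, eps ~ N(0, I)
             w = self.w_mu + sigma * torch.randn_like(sigma)
             return F.linear(x, w)

         def kl(self):
             # KL( N(mu, sigma^2) || N(0, prior_std^2) ), summed over all weights
             sigma = self.w_logsigma.exp()
             var_ratio = (sigma / self.prior_std) ** 2
             return 0.5 * (var_ratio + (self.w_mu / self.prior_std) ** 2
                           - 1.0 - var_ratio.log()).sum()

     # One stochastic-gradient step on the negative ELBO:
     # -ELBO = NLL(mini-batch, rescaled to the full dataset) + KL(q || prior)
     layer = BayesianLinear(784, 10)
     opt = torch.optim.Adam(layer.parameters(), lr=1e-3)
     x, y = torch.randn(64, 784), torch.randint(0, 10, (64,))  # toy mini-batch
     n_train = 60_000                                           # assumed dataset size
     opt.zero_grad()
     nll = F.cross_entropy(layer(x), y, reduction="mean") * n_train
     loss = nll + layer.kl()
     loss.backward()
     opt.step()

     The KL term is the prior-driven regularizer from the Regularization bullet, and the noise injected by sampling w is exactly the gradient noise the next slide relies on.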
  7. Avoiding narrow extrema
     • [Stochastic variational] Bayesian inference corresponds to injecting noise into the gradients (see the expression below)
     • The larger the noise, the lower the spatial resolution
     • A Bayesian DNN simply DOES NOT SEE narrow local minima
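     One way to read the noise-injection claim, using the same reparameterization as in the code sketch above (my gloss, not a formula from the slide): each stochastic gradient of the data term is evaluated at a randomly perturbed weight vector,

     \[ \nabla_{\mu}\, \mathbb{E}_{q_{\mu,\sigma}(\theta)}\bigl[\log p(D \mid \theta)\bigr] \;\approx\; \nabla_{\mu} \log p\bigl(D \mid \mu + \sigma \odot \varepsilon\bigr), \qquad \varepsilon \sim \mathcal{N}(0, I), \]

     so features of the loss surface narrower than the posterior scale σ are averaged out and are effectively invisible to the optimizer.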
  8. Avoiding catastrophic overfitting
     • Bayesian model selection procedures effectively apply the well-known Occam's razor (formalized below)
     • They search for the simplest model capable of explaining the training data
     • If there are no dependencies between inputs and outputs, a Bayesian DNN will never be able to "learn" them, since there always exists a simpler NULL model
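     The standard formalization of this razor (a textbook addition, not a formula from the slide) compares models by their marginal likelihood, or evidence:

     \[ p(D \mid M) = \int p(D \mid \theta, M)\, p(\theta \mid M)\, d\theta . \]

     A model flexible enough to fit arbitrary noise must spread its prior predictive mass over many possible datasets, so on data with no input-output dependency it receives lower evidence than a NULL model that ignores the inputs, and the NULL model wins the comparison.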
  9. Ensembles of ML algorithms
     • If we have several ML algorithms, their average is generally better than applying the single best one
     • The problem is that we need to train all of them and keep all of them in memory, so the technique does not scale
     • Bayesian ensembles are very compact (yet consist of a continuum of elements): you only need to sample from the posterior (see the prediction sketch below)
     [Figure: accuracy of single algorithms vs. the single best algorithm vs. the ensemble]
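     A minimal sketch of how such a compact ensemble is used at test time, assuming a model whose forward pass samples fresh weights from the approximate posterior on every call (as the BayesianLinear sketch above does); the function and argument names are mine:

     import torch

     @torch.no_grad()
     def predict(model, x, n_samples=20):
         """Monte Carlo posterior predictive: average the softmax outputs
         over n_samples weight draws from the approximate posterior."""
         probs = torch.stack([model(x).softmax(dim=-1) for _ in range(n_samples)])
         mean = probs.mean(dim=0)                                      # ensemble prediction
         entropy = -(mean * mean.clamp_min(1e-12).log()).sum(dim=-1)   # per-example uncertainty
         return mean, entropy

     The memory cost is that of a single network (a mean and a variance per weight), while the prediction averages over as many ensemble members as we care to sample.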
  10. Robustness to adversarial attacks
      • Adversarial examples are another problem with DNNs
      • Single DNNs are very sensitive to adversarial attacks
      • Ensembles over a continuum of DNNs are almost impossible to fool
  11. Setting desirable properties
      By selecting a proper prior we may encourage the desired properties in a Bayesian DNN (one concrete example of such a prior is given below):
      • Sparsity (compression)
      • Group sparsity (acceleration)
      • Rich ensembles (improved final accuracy, better uncertainty estimation)
      • Reliability (robustness to adversarial attacks)
      • Interpretability (hard attention maps)
      Techniques expected to become Bayesian soon:
      • GANs
      • Normalization algorithms (batchnorm, weightnorm, etc.)
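      One concrete example of a sparsity-inducing prior, used in the group's sparse variational dropout line of work (the formula itself is my addition, not text from the slide): the improper log-uniform prior over each weight,

      \[ p(\log |w_{ij}|) \propto \mathrm{const} \quad\Longleftrightarrow\quad p(|w_{ij}|) \propto \frac{1}{|w_{ij}|}, \]

      encourages solutions in which most weights are dominated by noise and can be pruned after training, which yields the compression mentioned in the Sparsity bullet.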
  12. Conclusions
      • The Bayesian framework is extremely powerful and extends the ML toolbox
      • We do have scalable algorithms for approximate Bayesian inference
      • Bayes + Deep Learning =
      • Even the first attempts at NeuroBayesian inference give impressive results
      • Summer school on NeuroBayesian methods, August 2018, Moscow, http://deepbayes.ru