
OpenTalks.AI - Dmitry Vetrov, Open problems in deep learning: A Bayesian solution

OpenTalks.AI

March 01, 2018

Transcript

  1. Open problems in deep learning: A Bayesian solution
     Dmitry P. Vetrov, research professor at HSE, head of the joint Samsung-HSE lab, head of the Bayesian methods research group, http://bayesgroup.ru
  2. Deep learning
     • A revolution in machine learning
     • Deep neural networks approach human-level performance on a number of problems
     • They can solve quite non-standard problems, such as image captioning (image2caption) and artistic style transfer
  3. Open problems in deep learning
     • Overfitting: neural networks are prone to catastrophic overfitting on noisy data
     • Interpretability: nobody knows HOW a neural network makes its decisions, which is crucial for healthcare and finance; legislative restrictions are expected
     • Uncertainty estimation: current neural networks are highly over-confident even when they make mistakes, while in many applications (e.g. self-driving cars) it is important to estimate the uncertainty of a prediction
     • Adversarial examples: neural networks can easily be fooled by barely visible perturbations of the data
  4. Bayesian framework
     • Treats everything as a random variable
     • Allows us to encode our ignorance in terms of distributions
     • Makes use of Bayes' theorem (written out below for reference)
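     For reference, Bayes' theorem written for the network weights θ and the training data D:

     \[ p(\theta \mid D) = \frac{p(D \mid \theta)\, p(\theta)}{p(D)} = \frac{p(D \mid \theta)\, p(\theta)}{\int p(D \mid \theta)\, p(\theta)\, d\theta}, \]

     where the prior p(θ) encodes our ignorance before seeing the data and the posterior p(θ | D) is what all of the advantages listed below are built on.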
  5. Frequentist vs. Bayesian frameworks
     • It can be shown that Bayesian inference reduces to frequentist inference in the limit of much more data than parameters (sketched below)
     • In other words, the frequentist framework is a limit case of the Bayesian one!
     • The number d of tunable parameters in modern ML models is comparable with the size n of the training data
     • We have no choice but to be Bayesian!
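     A sketch of that limit statement, under the usual asymptotic argument (the notation θ_ML is mine, not from the slide):

     \[ p(\theta \mid x_1, \dots, x_n) \;\propto\; p(\theta) \prod_{i=1}^{n} p(x_i \mid \theta) \;\xrightarrow[\,n/d \to \infty\,]{}\; \delta(\theta - \theta_{\mathrm{ML}}), \qquad \theta_{\mathrm{ML}} = \arg\max_{\theta} \prod_{i=1}^{n} p(x_i \mid \theta), \]

     i.e. with far more data than parameters the posterior collapses onto the maximum-likelihood point and Bayesian inference recovers the frequentist answer; when d is comparable with n this collapse does not happen and the full posterior carries real information.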
  6. Advantages of the Bayesian framework
     • Regularization: prevents overfitting on the training data, because the prior does not allow the parameters to be tuned too aggressively
     • Extensibility: Bayesian inference yields a posterior, which can then be used as the prior for the next model
     • Ensembling: the posterior distribution over the weights defines an ensemble of neural networks rather than a single network
     • Model selection: automatically selects the simplest possible model that explains the observed data, thus implementing Occam's razor
     • Scalability: stochastic variational inference allows posteriors to be approximated using deep neural networks (see the sketch after this list)
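     To make the scalability point concrete, here is a minimal sketch of stochastic variational inference for a single Bayesian layer in Python/PyTorch, assuming a fully factorized Gaussian approximate posterior trained with the reparameterization trick ("Bayes by Backprop" style). The class name BayesianLinear, the prior scale, and the toy mini-batch are illustrative assumptions, not code from the talk.

     import torch
     import torch.nn as nn
     import torch.nn.functional as F

     class BayesianLinear(nn.Module):
         """Linear layer with a factorized Gaussian posterior q(W) = N(mu, sigma^2)."""
         def __init__(self, n_in, n_out, prior_std=1.0):
             super().__init__()
             self.w_mu = nn.Parameter(0.01 * torch.randn(n_out, n_in))
             self.w_logsigma = nn.Parameter(torch.full((n_out, n_in), -5.0))
             self.prior_std = prior_std

         def forward(self, x):
             sigma = self.w_logsigma.exp()
             # Reparameterization trick: w = mu + sigma * eps, eps ~ N(0, I)
             w = self.w_mu + sigma * torch.randn_like(sigma)
             return F.linear(x, w)

         def kl(self):
             # KL( N(mu, sigma^2) || N(0, prior_std^2) ), summed over all weights
             sigma = self.w_logsigma.exp()
             var_ratio = (sigma / self.prior_std) ** 2
             return 0.5 * (var_ratio + (self.w_mu / self.prior_std) ** 2
                           - 1.0 - var_ratio.log()).sum()

     # One stochastic-gradient step on the negative ELBO:
     # -ELBO = NLL(mini-batch, rescaled to the full dataset) + KL(q || prior)
     layer = BayesianLinear(784, 10)
     opt = torch.optim.Adam(layer.parameters(), lr=1e-3)
     x, y = torch.randn(64, 784), torch.randint(0, 10, (64,))  # toy mini-batch
     n_train = 60_000                                           # assumed dataset size
     opt.zero_grad()
     nll = F.cross_entropy(layer(x), y, reduction="mean") * n_train
     loss = nll + layer.kl()
     loss.backward()
     opt.step()

     The KL term is the prior-driven regularizer from the Regularization bullet, and the noise injected by sampling w is exactly the gradient noise the next slide relies on.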
  7. Avoiding narrow extrema
     • [Stochastic variational] Bayesian inference corresponds to injecting noise into the gradients (see the expression below)
     • The larger the noise, the lower the spatial resolution
     • A Bayesian DNN simply DOES NOT SEE narrow local minima
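     One way to read the noise-injection claim, using the same reparameterization as in the code sketch above (my gloss, not a formula from the slide): each stochastic gradient of the data term is evaluated at a randomly perturbed weight vector,

     \[ \nabla_{\mu}\, \mathbb{E}_{q_{\mu,\sigma}(\theta)}\bigl[\log p(D \mid \theta)\bigr] \;\approx\; \nabla_{\mu} \log p\bigl(D \mid \mu + \sigma \odot \varepsilon\bigr), \qquad \varepsilon \sim \mathcal{N}(0, I), \]

     so features of the loss surface narrower than the posterior scale σ are averaged out and are effectively invisible to the optimizer.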
  8. Avoiding catastrophic overfitting
     • Bayesian model selection procedures effectively apply the well-known Occam's razor (formalized below)
     • They search for the simplest model capable of explaining the training data
     • If there are no dependencies between inputs and outputs, a Bayesian DNN will never be able to "learn" them, since there always exists a simpler NULL model
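     The standard formalization of this razor (a textbook addition, not a formula from the slide) compares models by their marginal likelihood, or evidence:

     \[ p(D \mid M) = \int p(D \mid \theta, M)\, p(\theta \mid M)\, d\theta . \]

     A model flexible enough to fit arbitrary noise must spread its prior predictive mass over many possible datasets, so on data with no input-output dependency it receives lower evidence than a NULL model that ignores the inputs, and the NULL model wins the comparison.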
  9. Ensembles of ML algorithms
     • If we have several ML algorithms, their average is generally better than applying the single best one
     • The problem is that we need to train all of them and keep all of them in memory, so the technique does not scale
     • Bayesian ensembles are very compact (yet consist of a continuum of elements): you only need to sample from the posterior (see the prediction sketch below)
     [Figure: accuracy of single algorithms vs. the single best algorithm vs. the ensemble]
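     A minimal sketch of how such a compact ensemble is used at test time, assuming a model whose forward pass samples fresh weights from the approximate posterior on every call (as the BayesianLinear sketch above does); the function and argument names are mine:

     import torch

     @torch.no_grad()
     def predict(model, x, n_samples=20):
         """Monte Carlo posterior predictive: average the softmax outputs
         over n_samples weight draws from the approximate posterior."""
         probs = torch.stack([model(x).softmax(dim=-1) for _ in range(n_samples)])
         mean = probs.mean(dim=0)                                      # ensemble prediction
         entropy = -(mean * mean.clamp_min(1e-12).log()).sum(dim=-1)   # per-example uncertainty
         return mean, entropy

     The memory cost is that of a single network (a mean and a variance per weight), while the prediction averages over as many ensemble members as we care to sample.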
  10. Robustness to adversarial attacks
      • Adversarial examples are another problem with DNNs
      • Single DNNs are very sensitive to adversarial attacks
      • Ensembles over a continuum of DNNs are almost impossible to fool
  11. Setting desirable properties
      By selecting a proper prior we may encourage the desired properties in a Bayesian DNN (one concrete example of such a prior is given below):
      • Sparsity (compression)
      • Group sparsity (acceleration)
      • Rich ensembles (improved final accuracy, better uncertainty estimation)
      • Reliability (robustness to adversarial attacks)
      • Interpretability (hard attention maps)
      Techniques expected to become Bayesian soon:
      • GANs
      • Normalization algorithms (batchnorm, weightnorm, etc.)
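      One concrete example of a sparsity-inducing prior, used in the group's sparse variational dropout line of work (the formula itself is my addition, not text from the slide): the improper log-uniform prior over each weight,

      \[ p(\log |w_{ij}|) \propto \mathrm{const} \quad\Longleftrightarrow\quad p(|w_{ij}|) \propto \frac{1}{|w_{ij}|}, \]

      encourages solutions in which most weights are dominated by noise and can be pruned after training, which yields the compression mentioned in the Sparsity bullet.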
  12. Conclusions
      • The Bayesian framework is extremely powerful and extends the ML toolbox
      • We do have scalable algorithms for approximate Bayesian inference
      • Bayes + Deep Learning =
      • Even the first attempts at NeuroBayesian inference give impressive results
      • Summer school on NeuroBayesian methods, August 2018, Moscow, http://deepbayes.ru