Bayesian Inference & Neural Networks
Lukasz Krawczyk, 1st March 2017
[Cartoon: the evolution of statisticians, from Homo Apriorius through Homo Pragmaticus, Homo Frequentistus, and Homo Sapiens to Homo Bayesianis]
Agenda
● About me
● The Problem
● Bayesian Inference
● Hierarchical Models
● Bayesian Inference & Neural Networks
About me
● Data Scientist at Asurion Japan Holdings
● Previously: Data Scientist at Abeja Inc.
● MSc degree from Jagiellonian University, Poland
● Contributor to several ML libraries
PART 1
The Problem
"Missing Uncertainty"
Making a confident error is the worst thing we can do.
With DL models we generally have only point estimates of parameters and predictions.
It is hard to make decisions when we cannot tell whether a DL model is certain about its output.
Trust and adoption of DL is still low.
PART 2
Bayesian Inference
Bayesian Inference
[Diagram: Data + Prior (μ, σ) → Model → Inference → Posterior, yielding credible regions, uncertainty estimates, and better insights. Assumptions about the data are controlled by the prior.]
Bayes' formula

P(θ_true ∣ D) = P(D ∣ θ_true) · P(θ_true) / P(D)

● P(θ_true ∣ D): the posterior, the probability of the model parameters given the data; this is the result we want to compute.
● P(D ∣ θ_true): the likelihood, proportional to the likelihood estimation in the frequentist approach.
● P(θ_true): the model prior, which encodes what we knew about the model before the data D were applied.
● P(D): the data probability, which in practice amounts to simply a normalization term.
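As a hedged illustration (not from the slides), a grid approximation of a coin's bias θ makes each term concrete:

```python
import numpy as np
from scipy import stats

theta = np.linspace(0, 1, 1001)              # grid over the parameter
prior = stats.beta(2, 2).pdf(theta)          # P(theta): weak belief around 0.5
likelihood = stats.binom(10, theta).pmf(7)   # P(D | theta): 7 heads in 10 flips
unnormalized = likelihood * prior
posterior = unnormalized / np.trapz(unnormalized, theta)  # P(D) normalizes

print(theta[np.argmax(posterior)])           # posterior mode, ~0.67
```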
Bayesian Inference
● General-purpose framework
● Generative models
● Clarity of FS + power of ML
  – White-box modelling
  – Black-box fitting (NUTS, ADVI)
  – Uncertainty → intuitive insights
● Learning from very small datasets
● Probabilistic Programming
Bayesian Inference
● Bayesian Optimization (GP)
● Hierarchical models (badass models)
● Bonus points:
  – Robust in high dimensions
  – Minibatches
  – Knowledge transfer
Bayesian Inference
Caveat: also a very easy way to cook your laptop (sampling-based inference is computationally heavy).
PART 3
Hierarchical Models
Hierarchical Models – parameter pooling
● Pooled: one shared model generalizes well, but misses variations among groups
● Unpooled: a separate model per group gives more accurate fitting, but each group often has not enough data
● Partial pooling: combines both, generalizing well even on small datasets (see the sketch after the call-duration example below)
Example – call duration model
● Each advisor has his/her own call-duration distribution
● The overall call-center distribution is controlled by a hyperparameter
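A minimal PyMC3 sketch of this model (an assumption: the talk mentions NUTS/ADVI, which suggests PyMC3, but shows no code; the data, advisor count, and priors here are made up):

```python
import numpy as np
import pymc3 as pm

advisor = np.array([0, 0, 1, 2, 2, 2])               # advisor id per call
duration = np.array([3.1, 4.5, 7.2, 2.0, 2.4, 3.3])  # call lengths in minutes

with pm.Model():
    # Call-center-wide hyperparameters control the overall distribution.
    center_mu = pm.HalfNormal('center_mu', sd=10.)
    center_sd = pm.HalfNormal('center_sd', sd=5.)
    # Each advisor draws a personal mean duration from the shared prior,
    # so advisors with few calls borrow strength from the whole center.
    advisor_mu = pm.Gamma('advisor_mu', mu=center_mu, sd=center_sd, shape=3)
    pm.Exponential('obs', lam=1. / advisor_mu[advisor], observed=duration)
    trace = pm.sample(1000, tune=1000)  # NUTS by default
```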
Hierarchical Models – benefits
● Modelling is very easy and intuitive
● Matches the natural hierarchical structure of observational data
● Captures variation among individual groups
● Transfers knowledge between groups
Synergy
● Replace weights with probability distributions
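The core move, as a hypothetical two-line contrast (PyMC3 is an assumption here):

```python
import pymc3 as pm

w_point = 0.3  # standard NN: a weight is a single learned number

with pm.Model():
    # Bayesian NN: a weight is a random variable; training yields a
    # posterior distribution over it rather than one value.
    w_dist = pm.Normal('w', mu=0., sd=1.)
```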
Example – standard NN
Architecture: 2 hidden layers (sigmoid, tanh), trained with backpropagation.
Data:
x1    x2    y
0.1   1.0   0
0.1   -1.3  1
…
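A rough equivalent in scikit-learn (an assumption for illustration; the slide does not name a framework, and sklearn applies one activation to all hidden layers, so tanh stands in for the sigmoid/tanh pair):

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

X = np.array([[0.1, 1.0], [0.1, -1.3]])  # x1, x2 rows from the table (truncated)
y = np.array([0, 1])

# Two hidden layers trained with plain backpropagation.
clf = MLPClassifier(hidden_layer_sizes=(5, 5), activation='tanh', max_iter=2000)
clf.fit(X, y)
print(clf.predict_proba(X))  # point estimates only: no measure of uncertainty
```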
Example – NN with Bayesian Backpropagation
Architecture: the same 2 hidden layers, now trained with Bayesian backpropagation.
Data (each output is now a set of samples rather than a single label):
x1    x2    y
0.1   1.0   [0,1,...]
0.1   -1.3  [1,1,...]
…
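A hedged PyMC3 sketch of the Bayesian version: every weight matrix gets a prior instead of a point value, and fitting returns a posterior (layer sizes and priors are made up; ADVI is one of the fitting methods the talk mentions):

```python
import numpy as np
import pymc3 as pm
import theano.tensor as tt

X = np.array([[0.1, 1.0], [0.1, -1.3]])
y = np.array([0, 1])

with pm.Model() as model:
    # Priors over all weights: the network's parameters are distributions.
    w1 = pm.Normal('w1', mu=0., sd=1., shape=(2, 5))  # input -> hidden 1
    w2 = pm.Normal('w2', mu=0., sd=1., shape=(5, 5))  # hidden 1 -> hidden 2
    w3 = pm.Normal('w3', mu=0., sd=1., shape=(5, 1))  # hidden 2 -> output
    h1 = tt.nnet.sigmoid(tt.dot(X, w1))
    h2 = tt.tanh(tt.dot(h1, w2))
    p = tt.nnet.sigmoid(tt.dot(h2, w3)).flatten()
    pm.Bernoulli('obs', p=p, observed=y)
    # "Bayesian backpropagation": fit an approximate posterior with ADVI.
    approx = pm.fit(10000, method='advi')
    trace = approx.sample(1000)
```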
Results – Uncertainty
[Comparison plot: standard NN vs. NN with Bayesian backpropagation]
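Continuing the hypothetical sketch above, the uncertainty shown here comes from posterior-predictive sampling: each draw of the weights gives a different prediction, and their spread is the model's confidence.

```python
# Each posterior draw of the weights yields one prediction per input.
ppc = pm.sample_ppc(trace, samples=500, model=model)
print(ppc['obs'].mean(axis=0))  # average prediction
print(ppc['obs'].std(axis=0))   # spread: large std = the network is unsure
```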
Synergy – going deeper
[Diagram: Bayesian hierarchical model, with weights drawn from a Gaussian G(μ, σ) whose parameters have hyperpriors M and S]
Weight regularization similar to L2.
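The L2 connection: under MAP estimation, a zero-mean Gaussian prior on a weight adds a squared-weight penalty to the log posterior, i.e. standard L2 regularization. A minimal sketch of the hierarchical step the diagram suggests (names follow the diagram; shapes are made up):

```python
import pymc3 as pm

with pm.Model():
    # Hyperprior on the weight scale (the "S" node): instead of hand-tuning
    # an L2 coefficient, the model learns how strongly to shrink weights.
    s = pm.HalfNormal('s', sd=1.)
    w = pm.Normal('w', mu=0., sd=s, shape=(2, 5))  # weights ~ G(0, s)
```

Giving the mean its own hyperprior (the "M" node) lets groups of weights share information, which is the knowledge transfer highlighted on the next slide.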
Synergy – going deeper
[Same diagram, highlighting knowledge transfer: weight groups share the hyperpriors M and S.]
Why is this important?

Scientific perspective:
● NN models that work with small datasets
● Complex hierarchical neural networks (Bayesian CNN)
● Minibatches
● Knowledge transfer

Business perspective:
● Clear and intuitive models
● Uncertainty is extremely important in Finance & Insurance
● Better trust and adoption of Neural Network-based models