Bayesian Inference & Neural Networks
Lukasz Krawczyk, 1st March 2017
[Cartoon: the evolution of statisticians, from Homo Apriorius through Homo Pragmaticus, Homo Frequentistus, and Homo Sapiens to Homo Bayesianis]
Agenda
● About me
● The Problem
● Bayesian Inference
● Hierarchical Models
● Bayesian Inference & Neural Networks
About me
● Data Scientist at Asurion Japan Holdings
● Previously: Data Scientist at Abeja Inc.
● MSc degree from Jagiellonian University, Poland
● Contributor to several ML libraries
PART 1
The Problem
"Missing Uncertainty"
Making a confident error is the worst thing we can do.
With DL models we generally have only point estimates of parameters and predictions.
It is hard to make decisions when we cannot tell whether a DL model is certain about its output.
Trust and adoption of DL is still low.
PART 2
Bayesian Inference
Bayesian Inference
[Diagram: Data + Prior (μ, σ) → Model → Inference → Posterior, yielding credible regions, uncertainty estimates, and better insights. Assumptions about the data are controlled by the prior.]
Bayes' formula

P(θ_true ∣ D) = P(D ∣ θ_true) · P(θ_true) / P(D)

● P(θ_true ∣ D): the posterior, the probability of the model parameters given the data; this is the result we want to compute.
● P(D ∣ θ_true): the likelihood, proportional to the likelihood estimation in the frequentist approach.
● P(θ_true): the model prior, which encodes what we knew about the model before the data D were applied.
● P(D): the data probability, which in practice amounts to simply a normalization term.
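As a hedged illustration (not from the slides), a grid approximation of a coin's bias θ makes each term concrete:

```python
import numpy as np
from scipy import stats

theta = np.linspace(0, 1, 1001)              # grid over the parameter
prior = stats.beta(2, 2).pdf(theta)          # P(theta): weak belief around 0.5
likelihood = stats.binom(10, theta).pmf(7)   # P(D | theta): 7 heads in 10 flips
unnormalized = likelihood * prior
posterior = unnormalized / np.trapz(unnormalized, theta)  # P(D) normalizes

print(theta[np.argmax(posterior)])           # posterior mode, ~0.67
```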
Bayesian Inference
● General-purpose framework
● Generative models
● Clarity of FS + power of ML
  – White-box modelling
  – Black-box fitting (NUTS, ADVI)
  – Uncertainty → intuitive insights
● Learning from very small datasets
● Probabilistic Programming
Bayesian Inference
● Bayesian Optimization (GP)
● Hierarchical models (badass models)
● Bonus points:
  – Robust in high dimensions
  – Minibatches
  – Knowledge transfer
Bayesian Inference
Caveat: also a very easy way to cook your laptop (sampling-based inference is computationally heavy).
PART 3
Hierarchical Models
Hierarchical Models – parameter pooling
● Pooled: one shared model generalizes well, but misses variations among groups
● Unpooled: a separate model per group gives more accurate fitting, but each group often has not enough data
● Partial pooling: combines both, generalizing well even on small datasets (see the sketch after the call-duration example below)
Example – call duration model
● Each advisor has his/her own call-duration distribution
● The overall call-center distribution is controlled by a hyperparameter
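A minimal PyMC3 sketch of this model (an assumption: the talk mentions NUTS/ADVI, which suggests PyMC3, but shows no code; the data, advisor count, and priors here are made up):

```python
import numpy as np
import pymc3 as pm

advisor = np.array([0, 0, 1, 2, 2, 2])               # advisor id per call
duration = np.array([3.1, 4.5, 7.2, 2.0, 2.4, 3.3])  # call lengths in minutes

with pm.Model():
    # Call-center-wide hyperparameters control the overall distribution.
    center_mu = pm.HalfNormal('center_mu', sd=10.)
    center_sd = pm.HalfNormal('center_sd', sd=5.)
    # Each advisor draws a personal mean duration from the shared prior,
    # so advisors with few calls borrow strength from the whole center.
    advisor_mu = pm.Gamma('advisor_mu', mu=center_mu, sd=center_sd, shape=3)
    pm.Exponential('obs', lam=1. / advisor_mu[advisor], observed=duration)
    trace = pm.sample(1000, tune=1000)  # NUTS by default
```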
Hierarchical Models – benefits
● Modelling is very easy and intuitive
● Matches the natural hierarchical structure of observational data
● Captures variation among individual groups
● Transfers knowledge between groups
Synergy
● Replace weights with probability distributions
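The core move, as a hypothetical two-line contrast (PyMC3 is an assumption here):

```python
import pymc3 as pm

w_point = 0.3  # standard NN: a weight is a single learned number

with pm.Model():
    # Bayesian NN: a weight is a random variable; training yields a
    # posterior distribution over it rather than one value.
    w_dist = pm.Normal('w', mu=0., sd=1.)
```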
Example – standard NN
Architecture: 2 hidden layers (sigmoid, tanh), trained with backpropagation.
Data:
x1    x2    y
0.1   1.0   0
0.1   -1.3  1
…
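A rough equivalent in scikit-learn (an assumption for illustration; the slide does not name a framework, and sklearn applies one activation to all hidden layers, so tanh stands in for the sigmoid/tanh pair):

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

X = np.array([[0.1, 1.0], [0.1, -1.3]])  # x1, x2 rows from the table (truncated)
y = np.array([0, 1])

# Two hidden layers trained with plain backpropagation.
clf = MLPClassifier(hidden_layer_sizes=(5, 5), activation='tanh', max_iter=2000)
clf.fit(X, y)
print(clf.predict_proba(X))  # point estimates only: no measure of uncertainty
```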
Example – NN with Bayesian Backpropagation
Architecture: the same 2 hidden layers, now trained with Bayesian backpropagation.
Data (each output is now a set of samples rather than a single label):
x1    x2    y
0.1   1.0   [0,1,...]
0.1   -1.3  [1,1,...]
…
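A hedged PyMC3 sketch of the Bayesian version: every weight matrix gets a prior instead of a point value, and fitting returns a posterior (layer sizes and priors are made up; ADVI is one of the fitting methods the talk mentions):

```python
import numpy as np
import pymc3 as pm
import theano.tensor as tt

X = np.array([[0.1, 1.0], [0.1, -1.3]])
y = np.array([0, 1])

with pm.Model() as model:
    # Priors over all weights: the network's parameters are distributions.
    w1 = pm.Normal('w1', mu=0., sd=1., shape=(2, 5))  # input -> hidden 1
    w2 = pm.Normal('w2', mu=0., sd=1., shape=(5, 5))  # hidden 1 -> hidden 2
    w3 = pm.Normal('w3', mu=0., sd=1., shape=(5, 1))  # hidden 2 -> output
    h1 = tt.nnet.sigmoid(tt.dot(X, w1))
    h2 = tt.tanh(tt.dot(h1, w2))
    p = tt.nnet.sigmoid(tt.dot(h2, w3)).flatten()
    pm.Bernoulli('obs', p=p, observed=y)
    # "Bayesian backpropagation": fit an approximate posterior with ADVI.
    approx = pm.fit(10000, method='advi')
    trace = approx.sample(1000)
```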
Results – Uncertainty
[Comparison plot: standard NN vs. NN with Bayesian backpropagation]
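Continuing the hypothetical sketch above, the uncertainty shown here comes from posterior-predictive sampling: each draw of the weights gives a different prediction, and their spread is the model's confidence.

```python
# Each posterior draw of the weights yields one prediction per input.
ppc = pm.sample_ppc(trace, samples=500, model=model)
print(ppc['obs'].mean(axis=0))  # average prediction
print(ppc['obs'].std(axis=0))   # spread: large std = the network is unsure
```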
Synergy – going deeper
[Diagram: Bayesian hierarchical model, with weights drawn from a Gaussian G(μ, σ) whose parameters have hyperpriors M and S]
Weight regularization similar to L2.
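The L2 connection: under MAP estimation, a zero-mean Gaussian prior on a weight adds a squared-weight penalty to the log posterior, i.e. standard L2 regularization. A minimal sketch of the hierarchical step the diagram suggests (names follow the diagram; shapes are made up):

```python
import pymc3 as pm

with pm.Model():
    # Hyperprior on the weight scale (the "S" node): instead of hand-tuning
    # an L2 coefficient, the model learns how strongly to shrink weights.
    s = pm.HalfNormal('s', sd=1.)
    w = pm.Normal('w', mu=0., sd=s, shape=(2, 5))  # weights ~ G(0, s)
```

Giving the mean its own hyperprior (the "M" node) lets groups of weights share information, which is the knowledge transfer highlighted on the next slide.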
Synergy – going deeper
[Same diagram, highlighting knowledge transfer: weight groups share the hyperpriors M and S.]
Why is this important?

Scientific perspective:
● NN models that work with small datasets
● Complex hierarchical neural networks (Bayesian CNN)
● Minibatches
● Knowledge transfer

Business perspective:
● Clear and intuitive models
● Uncertainty is extremely important in Finance & Insurance
● Better trust and adoption of Neural Network-based models