Slide 1

Slide 1 text

Introduction to Probabilistic Programming

Slide 2

Slide 2 text

Statistical Model: a pair (S, P), where S is the set of observations and P is a set of probability distributions on S

Slide 3

Slide 3 text

Bayes' Theorem

Slide 4

Slide 4 text

P(A|B) = P(B|A) P(A) / P(B)
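As a worked illustration in plain Clojure (the function name and the numbers below are made up for this sketch, not from the deck): a test that detects a condition 90% of the time, false-alarms 5% of the time, and a condition with a 1% base rate.

;; Bayes' theorem: P(A|B) = P(B|A) * P(A) / P(B)
(defn bayes [p-b-given-a p-a p-b]
  (/ (* p-b-given-a p-a) p-b))

;; Hypothetical numbers: P(+|disease) = 0.9, P(disease) = 0.01,
;; P(+|healthy) = 0.05, so P(+) = 0.9*0.01 + 0.05*0.99
(bayes 0.9 0.01 (+ (* 0.9 0.01) (* 0.05 0.99)))
;; => ~0.154  -- a positive test still leaves only ~15% probability of disease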

Slide 5

Slide 5 text

P(■|⬤) = P(⬤|■) P(■) / P(⬤)

Slide 6

Slide 6 text

posterior = (prior × likelihood) / evidence

Slide 7

Slide 7 text

Number Guessing

Slide 8

Slide 8 text

Guess the arithmetical concept

Slide 9

Slide 9 text

2, 4, 8

Slide 10

Slide 10 text

Humans are biased towards induction

Slide 11

Slide 11 text

Humans learn from positive examples

Slide 12

Slide 12 text

Empirical Distribution

Slide 13

Slide 13 text

How to bias the machine?

Slide 14

Slide 14 text

Rule Based vs Similarity Based

Slide 15

Slide 15 text

Rule Based
Hypothesis elimination
More complex ways to generalise
Requires strong priors

Slide 16

Slide 16 text

Similarity Based
Requires negative examples
Requires a “similarity” measure
Improbable features impact the result

Slide 17

Slide 17 text

Priors (biases) and hypothesis spaces

Slide 18

Slide 18 text

Hypothesis Space H: odd numbers, even numbers, primes, powers of two, numbers ending with n, etc.
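One way to make this concrete, sketched in plain Clojure (the particular hypotheses and the 1..100 range are assumptions for illustration): each hypothesis is a predicate, and its extension is the set of numbers it covers.

;; A toy hypothesis space over the numbers 1..100
(def hypotheses
  {:odd        odd?
   :even       even?
   :prime      (fn [n] (and (> n 1) (not-any? #(zero? (mod n %)) (range 2 n))))
   :power-of-2 (fn [n] (some #(= n (long (Math/pow 2 %))) (range 0 7)))
   :ends-in-6  (fn [n] (= 6 (mod n 10)))})

;; The extension of a hypothesis: the numbers in 1..100 it covers
(defn extension [pred] (filter pred (range 1 101)))

(count (extension (:power-of-2 hypotheses)))  ;; => 7  (1 2 4 8 16 32 64)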

Slide 19

Slide 19 text

Model Averaging
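A minimal sketch of the idea in plain Clojure, using a made-up posterior over toy hypotheses: rather than committing to the single best hypothesis, the prediction for a new number averages every hypothesis's verdict, weighted by its posterior probability.

;; Hypothetical posterior after some data (numbers are made up)
(def posterior {:power-of-2 0.7 :even 0.25 :mult-of-4 0.05})

;; Each hypothesis's verdict: does 32 (or 6) belong to the concept?
(def predicts-32? {:power-of-2 1.0 :even 1.0 :mult-of-4 1.0})
(def predicts-6?  {:power-of-2 0.0 :even 1.0 :mult-of-4 0.0})

;; P(x in concept | data) = sum over h of P(x in h) * P(h | data)
(defn averaged [pred]
  (reduce + (map (fn [[h p]] (* p (pred h))) posterior)))

(averaged predicts-32?)  ;; => 1.0
(averaged predicts-6?)   ;; => 0.25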

Slide 20

Slide 20 text

Bayesian Ockham’s Razor

Slide 21

Slide 21 text

Seen: 16

Slide 22

Slide 22 text

Seen: 16, 8, 2, 64

Slide 23

Slide 23 text

Size principle
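Stated concretely: if examples are drawn uniformly from the true concept, then n examples consistent with hypothesis h have likelihood (1/|h|)^n, so smaller, more specific hypotheses are favoured. A short plain-Clojure sketch (the sizes refer to concepts over 1..100):

;; Size principle: n examples drawn uniformly from concept h have
;; likelihood (1/|h|)^n, so the smaller consistent hypothesis wins.
(defn size-likelihood [h-size n] (Math/pow (/ 1.0 h-size) n))

;; After seeing 2, 4, 8 (three examples, numbers 1..100):
(size-likelihood 7 3)   ;; powers of two, 7 members  => ~2.9e-3
(size-likelihood 50 3)  ;; even numbers, 50 members  => 8.0e-6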

Slide 24

Slide 24 text

Ruling out unnatural concepts

Slide 25

Slide 25 text

Uniform Prior

Slide 26

Slide 26 text

Prior, likelihood and posterior for Seen: 16

Slide 27

Slide 27 text

Prior, likelihood and posterior for Seen: 2, 4, 8, 16
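Putting the pieces together for this slide, a self-contained plain-Clojure sketch (the three hypotheses and their priors are assumptions): the likelihood follows the size principle and is zero for hypotheses that cannot generate the data, and the posterior is the normalised product of prior and likelihood.

;; Toy posterior for Seen: 2, 4, 8, 16
(def data [2 4 8 16])

(def hypotheses
  {:even       {:prior 0.5 :members (filter even? (range 1 101))}
   :power-of-2 {:prior 0.1 :members [1 2 4 8 16 32 64]}
   :mult-of-10 {:prior 0.4 :members (range 10 101 10)}})

;; Size-principle likelihood; zero if the hypothesis cannot generate the data
(defn likelihood [{:keys [members]}]
  (if (every? (set members) data)
    (Math/pow (/ 1.0 (count members)) (count data))
    0.0))

(def unnormalised
  (into {} (for [[k h] hypotheses] [k (* (:prior h) (likelihood h))])))

(def evidence (reduce + (vals unnormalised)))   ; P(data)

(def posterior
  (into {} (for [[k v] unnormalised] [k (/ v evidence)])))
;; => :power-of-2 ends up with almost all the mass, despite its small prior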

Slide 28

Slide 28 text

Data can overwhelm the prior.

Slide 29

Slide 29 text

Hierarchical Bayes Models

Slide 30

Slide 30 text

Machine: Seen: 16 / Seen: 60 / Seen: 2 8 16 64 / Seen: 16 23 19 20

Slide 31

Slide 31 text

Human: Seen: 16 / Seen: 60 / Seen: 2 8 16 64 / Seen: 16 23 19 20

Slide 32

Slide 32 text

To perform inference, we need to integrate; this integral can be approximated with MCMC.
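As a hedged illustration of MCMC (the model, names and numbers are assumptions, not the deck's example): a tiny Metropolis-Hastings sampler in plain Clojure that approximates the posterior over a coin's bias after seeing 7 heads and 3 tails, using a symmetric random-walk proposal.

;; Unnormalised log-posterior for a coin bias p, with a flat prior on (0, 1)
(defn log-post [p heads tails]
  (if (< 0.0 p 1.0)
    (+ (* heads (Math/log p)) (* tails (Math/log (- 1.0 p))))
    Double/NEGATIVE_INFINITY))

;; One Metropolis step: propose p' near p, accept with prob min(1, post(p')/post(p))
(defn mh-step [p]
  (let [p'    (+ p (- (rand 0.4) 0.2))          ; symmetric random-walk proposal
        log-a (- (log-post p' 7 3) (log-post p 7 3))]
    (if (< (Math/log (rand)) log-a) p' p)))

;; Run the chain and estimate the posterior mean of the bias
(let [chain (take 20000 (iterate mh-step 0.5))]
  (/ (reduce + chain) (count chain)))
;; => roughly 0.67 (the analytic posterior mean is 8/12)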

Slide 33

Slide 33 text

Monte Carlo

Slide 34

Slide 34 text

Represents a probability distribution by a set of samples from it
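A quick plain-Clojure sketch (the normal target and sample size are arbitrary choices for illustration): once a distribution is represented by samples, expectations and probabilities become simple averages over those samples.

;; Draw from a standard normal using the Box-Muller transform
(defn normal-sample []
  (* (Math/sqrt (* -2.0 (Math/log (- 1.0 (rand)))))
     (Math/cos (* 2.0 Math/PI (rand)))))

;; With samples in hand, E[X] and P(X > 1) are just averages
(let [xs (repeatedly 100000 normal-sample)]
  {:mean      (/ (reduce + xs) (count xs))        ; close to 0
   :p-above-1 (/ (count (filter #(> % 1.0) xs))   ; close to 0.159
                 100000.0)})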

Slide 35

Slide 35 text

Supported by Law Of Large Numbers™

Slide 36

Slide 36 text

Coin Flips (take 50 (sample (flip 0.5)))
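The snippet assumes helpers named flip and sample that the slide does not define; here is one plausible plain-Clojure reading under which it runs (an assumption, not any particular library's API): a distribution is a thunk, and sample turns it into an infinite lazy stream of draws.

;; 'flip' and 'sample' are not defined on the slide; this is one
;; plausible plain-Clojure reading, not a specific library's API.
(defn flip
  "A Bernoulli distribution with success probability p, as a thunk."
  [p]
  (fn [] (< (rand) p)))

(defn sample
  "An infinite lazy sequence of draws from a distribution thunk."
  [dist]
  (repeatedly dist))

(frequencies (take 50 (sample (flip 0.5))))
;; => e.g. {true 27, false 23}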

Slide 37

Slide 37 text

Coin Flips (take 50000 (sample (flip 0.4)))

Slide 38

Slide 38 text

Coin Flips (take 5000 (sample (flip 0.5)))

Slide 39

Slide 39 text

MC (Monte Carlo)

Slide 40

Slide 40 text

1. Define a domain of inputs
2. Generate inputs over the domain according to its probability distribution
3. Perform a deterministic computation on the inputs
4. Aggregate the results
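The textbook illustration of these four steps is estimating π; a plain-Clojure sketch (function and binding names are mine):

;; 1. Domain: the unit square [0,1) x [0,1)
;; 2. Inputs: points drawn uniformly at random from that square
;; 3. Deterministic computation: is the point inside the quarter circle?
(defn inside? [[x y]] (<= (+ (* x x) (* y y)) 1.0))

;; 4. Aggregate: the fraction of hits, scaled by 4, approximates pi
(let [n      1000000
      points (repeatedly n (fn [] [(rand) (rand)]))
      hits   (count (filter inside? points))]
  (* 4.0 (/ hits n)))
;; => ~3.14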

Slide 41

Slide 41 text

Reading List
Machine Learning: a Probabilistic Perspective: https://www.cs.ubc.ca/~murphyk/MLbook/
Bayesian models of cognition: https://cocosci.berkeley.edu/tom/papers/bayeschapter.pdf
Bayesian Methods for Hackers: https://github.com/CamDavidsonPilon/Probabilistic-Programming-and-Bayesian-Methods-for-Hackers
Anglican: http://www.robots.ox.ac.uk/~fwood/anglican/index.html
Introduction to Markov Chain Monte Carlo: http://www.mcmchandbook.net/HandbookChapter1.pdf
