Philipp Singer
April 20, 2016

# Introduction to Bayesian Statistics

## Transcript

1. Introduction to Bayesian Statistics
Machine Learning and Data Mining
Philipp Singer
CC image courtesy of user mattbuck007 on Flickr

2. 2
Conditional Probability

3. 3
Conditional Probability

Probability of event A given that B is true

P(cough|cold) > P(cough)

Fundamental in probability theory
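
As a minimal sketch of the definition P(A|B) = P(A and B) / P(B), the cough/cold inequality can be checked on a small table of counts. The counts below are hypothetical and chosen only for illustration; they are not from the slides.

```python
# Hypothetical survey counts (illustrative only, not from the slides).
n_people = 1000
n_cough = 200            # people who cough
n_cold = 100             # people who have a cold
n_cough_and_cold = 80    # people who cough AND have a cold

p_cough = n_cough / n_people
p_cold = n_cold / n_people
p_cough_and_cold = n_cough_and_cold / n_people

# Definition of conditional probability: P(A|B) = P(A and B) / P(B)
p_cough_given_cold = p_cough_and_cold / p_cold

print(f"P(cough)        = {p_cough:.2f}")
print(f"P(cough | cold) = {p_cough_given_cold:.2f}")
```

With these assumed counts, knowing that someone has a cold raises the probability of a cough well above its unconditional value.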

4. 4

Another perspective on conditional probability

Conditional probability via growing trimmed trees

5. 5
Bayes Theorem

6. 6
Bayes Theorem

P(A|B) is the conditional probability of observing A given that B is true

P(B|A) is the conditional probability of observing B given that A is true

P(A) and P(B) are the probabilities of A and B without conditioning on each other
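
The formula itself did not survive in this transcript. In terms of the quantities defined above, Bayes' theorem in its standard form reads:

$$
P(A \mid B) = \frac{P(B \mid A)\, P(A)}{P(B)}, \qquad P(B) > 0
$$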

7. 7
Visualize Bayes Theorem
Source: https://oscarbonilla.com/2009/05/visualizing-bayes-theorem/
All possible outcomes
Some event

8. 8
Visualize Bayes Theorem
All people in study
People having cancer

9. 9
Visualize Bayes Theorem
All people in study
People where screening test is positive

10. 10
Visualize Bayes Theorem
People having positive screening test and cancer

11. 11
Visualize Bayes Theorem

Given that the test is positive, what is the probability that the person has cancer?

12. 12
Visualize Bayes Theorem

Given that the test is positive, what is the probability that the person has cancer?
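
The screening question can be worked through numerically with Bayes' theorem. The slides give no numbers, so the prevalence, sensitivity, and false-positive rate below are assumptions made purely for illustration.

```python
# Assumed values (not from the slides):
p_cancer = 0.01              # prevalence: P(cancer)
p_pos_given_cancer = 0.90    # sensitivity: P(positive | cancer)
p_pos_given_healthy = 0.09   # false-positive rate: P(positive | no cancer)

# Law of total probability: P(positive)
p_pos = p_pos_given_cancer * p_cancer + p_pos_given_healthy * (1 - p_cancer)

# Bayes' theorem: P(cancer | positive)
p_cancer_given_pos = p_pos_given_cancer * p_cancer / p_pos

print(f"P(positive)          = {p_pos:.4f}")
print(f"P(cancer | positive) = {p_cancer_given_pos:.4f}")
```

Under these assumptions most positive tests come from healthy people, because the group without cancer is so much larger than the group with it.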

13. 13
Visualize Bayes Theorem

Given that someone has cancer, what is the probability that said person has a positive screening test?

14. 14
Example: Fake coin

Two coins
– One fair
– One unfair

What is the probability of having the fair coin?
CC image courtesy of user pagedooley on Flickr

15. 15
Example: Fake coin

16. 16
Example: Fake coin

17. 17
Update of beliefs

Allows new evidence to update beliefs

Prior can also be posterior of previous update
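
A small sketch of sequential updating for the fake-coin example: the posterior after one observed heads becomes the prior for the next. The slides do not state how the unfair coin behaves, so this sketch assumes it always lands heads and that both coins are equally likely to be picked; `update` is a hypothetical helper name.

```python
def update(prior_fair, p_heads_fair=0.5, p_heads_unfair=1.0):
    """Return P(fair | heads) given the current belief P(fair).

    p_heads_unfair=1.0 is an assumption (a coin that always shows heads).
    """
    evidence = p_heads_fair * prior_fair + p_heads_unfair * (1 - prior_fair)
    return p_heads_fair * prior_fair / evidence

belief = 0.5            # assumed initial prior: both coins equally likely
beliefs = [belief]
for _ in range(2):      # observe heads twice in a row
    belief = update(belief)   # yesterday's posterior is today's prior
    beliefs.append(belief)

print(beliefs)
```

Each observed heads lowers the belief in the fair coin, from 1/2 to 1/3 to 1/5 under these assumptions.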

18. 18
Example: Fake coin

Belief update

What is the probability of seeing a fair coin after we

19. 19
Bayesian Inference

20. 20
Source: https://xkcd.com/1132/

21. 21
Bayesian Inference

Statistical inference of parameters
Parameters
Data
knowledge

22. 22
Coin flip example

Flip a coin several times

Is it fair?

Let's use Bayesian inference

23. 23
Binomial model

Flipping heads: p

Flipping tails: 1-p

Binomial model
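
The likelihood formula is missing from the transcript. For k heads in n independent flips with heads probability p, the standard binomial likelihood is:

$$
P(k \mid n, p) = \binom{n}{k}\, p^{k} (1-p)^{n-k}
$$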

24. 24
Prior

Conjugate prior
– Posterior belongs to the same distribution family as the prior
– Beta distribution is conjugate to the binomial

Beta prior

25. 25
Beta distribution

Continuous probability distribution

Interval [0,1]

Two shape parameters: α and β
– If α, β >= 1, interpret them as pseudo counts
– α refers to flipping heads, β to flipping tails
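
Since the plots are not part of this transcript, the following sketch evaluates the Beta density for a few shape parameters using only the standard library. The function name `beta_pdf` and the chosen parameter pairs are illustrative assumptions.

```python
import math

def beta_pdf(p, alpha, beta):
    """Density of Beta(alpha, beta) at p, for 0 < p < 1."""
    log_norm = math.lgamma(alpha + beta) - math.lgamma(alpha) - math.lgamma(beta)
    log_kernel = (alpha - 1) * math.log(p) + (beta - 1) * math.log(1 - p)
    return math.exp(log_norm + log_kernel)

# Illustrative shapes: uniform, symmetric bump, skewed toward heads.
for alpha, beta in [(1, 1), (2, 2), (50, 10)]:
    mean = alpha / (alpha + beta)
    print(f"Beta({alpha},{beta}): mean={mean:.3f}, pdf(0.5)={beta_pdf(0.5, alpha, beta):.4f}")
```

Larger pseudo counts concentrate the density around α / (α + β), which is one way to read them as prior observations.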

26. 26
Beta distribution

27. 27
Beta distribution

28. 28
Beta distribution

29. 29
Beta distribution

30. 30
Beta distribution

31. 31
Posterior

Posterior also Beta distribution

For the exact derivation:
http://www.cs.cmu.edu/~10701/lecture/technote2_betabinomial.pdf
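
As a short sketch of why the posterior stays in the Beta family (assuming k heads in n flips and a Beta(α, β) prior), multiply the binomial likelihood by the prior and drop constants that do not depend on p:

$$
P(p \mid k, n) \;\propto\; p^{k}(1-p)^{n-k} \cdot p^{\alpha-1}(1-p)^{\beta-1}
= p^{\alpha+k-1}(1-p)^{\beta+n-k-1}
$$

This is the kernel of a Beta distribution, so the posterior is Beta(α + k, β + n − k).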

32. 32
Posterior

Assume
– Binomial p = 0.4
– Uniform Beta prior: α=1 and β=1
– 200 random variates from binomial distribution (Heads=80)
– Update posterior

33. 33
Posterior

Assume
– Binomial p = 0.4
– Biased Beta prior: α=50 and β=10
– 200 random variates from binomial distribution (Heads=80)
– Update posterior
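
The two updates above can be reproduced with the conjugate rule Beta(α + heads, β + tails). This is a minimal sketch using the counts stated on the slides (80 heads in 200 flips); the helper names are hypothetical.

```python
def beta_posterior(alpha, beta, heads, flips):
    """Conjugate update of a Beta(alpha, beta) prior with binomial data."""
    return alpha + heads, beta + (flips - heads)

def beta_mean(alpha, beta):
    return alpha / (alpha + beta)

heads, flips = 80, 200

uniform_post = beta_posterior(1, 1, heads, flips)    # uniform prior
biased_post = beta_posterior(50, 10, heads, flips)   # biased prior

print("uniform prior ->", uniform_post, f"mean={beta_mean(*uniform_post):.3f}")
print("biased prior  ->", biased_post, f"mean={beta_mean(*biased_post):.3f}")
```

With the uniform prior the posterior mean stays close to the observed frequency 0.4, while the biased prior pulls it up to 0.5.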

34. 34
Posterior

Convex combination of prior and data

The stronger our prior belief, the more data we need to overrule the prior

The less prior belief we have, the quicker the data overrules the prior
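
One way to make the convex combination explicit is through the posterior mean, assuming a Beta(α, β) prior and k heads in n flips:

$$
\mathbb{E}[p \mid D] = \frac{\alpha + k}{\alpha + \beta + n}
= w \cdot \frac{\alpha}{\alpha + \beta} + (1 - w) \cdot \frac{k}{n},
\qquad w = \frac{\alpha + \beta}{\alpha + \beta + n}
$$

The weight w on the prior mean shrinks as n grows, and it shrinks more slowly when the pseudo counts α + β are large.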

35. 36
So is the coin fair?

Examine posterior
– 95% highest posterior density interval (HDI)
– ROPE [1]: Region of practical equivalence for null hypothesis
– Fair coin: [0.45,0.55]

95% HDI: (0.33, 0.47)

Cannot reject null: the HDI overlaps the ROPE

More samples → we can
[1] Kruschke, John. Doing Bayesian Data Analysis: A Tutorial with R, JAGS, and Stan. Academic Press, 2014.
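
A rough sketch of the HDI and ROPE check, assuming the posterior is Beta(81, 121) (uniform prior, 80 heads in 200 flips). The interval is estimated from random draws rather than computed exactly, so its endpoints vary slightly with the seed and sample size; `hdi_from_samples` is a hypothetical helper.

```python
import random

def hdi_from_samples(samples, mass=0.95):
    """Narrowest interval containing `mass` of the sorted samples."""
    ordered = sorted(samples)
    n = len(ordered)
    window = int(mass * n)
    best = (ordered[0], ordered[window - 1])
    for i in range(n - window + 1):
        lo, hi = ordered[i], ordered[i + window - 1]
        if hi - lo < best[1] - best[0]:
            best = (lo, hi)
    return best

rng = random.Random(0)
draws = [rng.betavariate(81, 121) for _ in range(100_000)]
hdi_lo, hdi_hi = hdi_from_samples(draws)

rope_lo, rope_hi = 0.45, 0.55                       # ROPE for a fair coin
overlaps_rope = hdi_lo <= rope_hi and hdi_hi >= rope_lo
inside_rope = hdi_lo >= rope_lo and hdi_hi <= rope_hi

print(f"95% HDI ~ ({hdi_lo:.2f}, {hdi_hi:.2f})")
print("overlaps ROPE:", overlaps_rope, "| inside ROPE:", inside_rope)
```

Because the interval overlaps the ROPE without lying entirely inside or outside it, the decision stays open at this sample size.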

36. 37
Bayesian Model Comparison

Parameters marginalized out

Average of likelihood weighted by prior
Evidence
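
The equation for the evidence is missing from the transcript. In its standard form, with θ denoting the parameters of model M and D the data, the marginal likelihood is:

$$
P(D \mid M) = \int P(D \mid \theta, M)\, P(\theta \mid M)\, d\theta
$$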

37. 38
Bayesian Model Comparison

Bayes factors [1]

Ratio of marginal likelihoods

Interpretation table by Kass & Raftery [1]

>100 → decisive evidence against M2
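
For reference, the Bayes factor comparing two models M1 and M2 is the ratio of their marginal likelihoods. The notation B_{12} is an assumption, chosen so that large values count against M2:

$$
B_{12} = \frac{P(D \mid M_1)}{P(D \mid M_2)}
$$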
[1] Kass, Robert E., and Adrian E. Raftery. "Bayes factors." Journal of the American Statistical Association 90.430 (1995): 773-795.

38. 39
So is the coin fair?

Null hypothesis

Alternative hypothesis
– Anything is possible
– Beta(1,1)

Bayes factor

39. 40
So is the coin fair?

n = 200

k = 80

Bayes factor

(Decent) preference for the alternative hypothesis
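
The numeric Bayes factor did not survive extraction. The sketch below recomputes one under stated assumptions: the null is a point hypothesis p = 0.5, the alternative is a Beta(1,1) prior on p, and the ratio is taken as alternative over null. Function names are hypothetical.

```python
import math

def log_choose(n, k):
    return math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)

def log_beta_fn(a, b):
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)

def log_evidence_point(k, n, p):
    """Log marginal likelihood when p is fixed (no free parameter)."""
    return log_choose(n, k) + k * math.log(p) + (n - k) * math.log(1 - p)

def log_evidence_beta(k, n, alpha, beta):
    """Log marginal likelihood of binomial data under a Beta(alpha, beta) prior."""
    return log_choose(n, k) + log_beta_fn(alpha + k, beta + n - k) - log_beta_fn(alpha, beta)

n, k = 200, 80
log_bf = log_evidence_beta(k, n, 1, 1) - log_evidence_point(k, n, 0.5)
bf_alt_vs_null = math.exp(log_bf)

print(f"Bayes factor (alternative vs. null) ~ {bf_alt_vs_null:.2f}")
```

Working in log space avoids overflow in the binomial coefficient. The same `log_evidence_beta` function can score other Beta priors against the data.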

40. 41
Other priors

Priors can encode hypotheses (theories)

Biased hypothesis: Beta(101,11)

Haldane prior: Beta(0.001, 0.001)
– u-shaped
– high probability on p=1 or (1-p)=1

41. 42
Frequentist approach

So is the coin fair?

Binomial test with null p=0.5
– one-tailed
– p-value: 0.0028

Chi² test
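
The one-tailed binomial test can be reproduced with the standard library alone. This sketch assumes the same data as before (80 heads in 200 flips) and tests in the direction of fewer heads than expected; `binom_cdf` is a hypothetical helper.

```python
from math import comb

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p), by direct summation."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))

n, k = 200, 80
p_value = binom_cdf(k, n, 0.5)   # one-tailed: P(X <= 80 | p = 0.5)

print(f"one-tailed p-value ~ {p_value:.4f}")
```

The result should land near the 0.0028 quoted on the slide.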

42. 43
Posterior prediction

Posterior mean

With large data → converges to the MLE

MAP: Maximum a posteriori
– Bayesian estimator
– uses the posterior mode
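
For the Beta-binomial coin model (Beta(α, β) prior, k heads in n flips), the three point estimates can be written side by side. The MAP expression assumes α + k > 1 and β + n − k > 1 so that the mode lies inside (0, 1):

$$
\hat{p}_{\text{mean}} = \frac{\alpha + k}{\alpha + \beta + n},
\qquad
\hat{p}_{\text{MAP}} = \frac{\alpha + k - 1}{\alpha + \beta + n - 2},
\qquad
\hat{p}_{\text{MLE}} = \frac{k}{n}
$$

With a uniform Beta(1,1) prior the MAP estimate equals the MLE, and for large n all three agree.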

43. 44
Bayesian prediction

Posterior predictive distribution

Distribution of unobserved observations conditioned on observed data (train, test)
Frequentist
MLE
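
The equations for this slide are missing. In standard notation, with x̃ a new observation and D the observed data, the posterior predictive distribution averages over the posterior, whereas the frequentist plug-in prediction uses a single point estimate:

$$
P(\tilde{x} \mid D) = \int P(\tilde{x} \mid p)\, P(p \mid D)\, dp
\qquad \text{vs.} \qquad
P(\tilde{x} \mid \hat{p}_{\text{MLE}})
$$

For the coin model with a Beta(α, β) prior, the predictive probability of heads on the next flip equals the posterior mean (α + k) / (α + β + n).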

44. 45
Alternative Bayesian Inference

The marginal likelihood is often not easy to evaluate
– No analytical solution
– Numerical integration expensive

Alternatives
– Monte Carlo integration
  – Markov Chain Monte Carlo (MCMC)
  – Gibbs sampling
  – Metropolis-Hastings algorithm
– Laplace approximation
– Variational Bayes
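
To illustrate the sampling alternative, here is a minimal random-walk Metropolis sampler (the symmetric-proposal special case of Metropolis-Hastings) for the coin posterior. It assumes the same data as before (80 heads in 200 flips) and a Beta(1,1) prior; the step size, burn-in, and sample count are arbitrary choices, not tuned values.

```python
import math
import random

def log_posterior(p, heads=80, flips=200, alpha=1.0, beta=1.0):
    """Unnormalised log posterior of p; the evidence is never needed."""
    if not 0.0 < p < 1.0:
        return float("-inf")
    return (alpha + heads - 1) * math.log(p) + (beta + flips - heads - 1) * math.log(1 - p)

def metropolis(n_samples, step=0.05, start=0.5, seed=0):
    rng = random.Random(seed)
    current, current_lp = start, log_posterior(start)
    samples = []
    for _ in range(n_samples):
        proposal = current + rng.gauss(0.0, step)     # symmetric proposal
        proposal_lp = log_posterior(proposal)
        # Accept with probability min(1, posterior ratio).
        if rng.random() < math.exp(min(0.0, proposal_lp - current_lp)):
            current, current_lp = proposal, proposal_lp
        samples.append(current)
    return samples

chain = metropolis(21_000)[1_000:]    # drop the first 1,000 draws as burn-in
mcmc_mean = sum(chain) / len(chain)

print(f"MCMC posterior mean ~ {mcmc_mean:.3f} (exact: {81 / 202:.3f})")
```

The sampler only ever uses ratios of the unnormalised posterior, which is why it sidesteps the marginal likelihood. In this conjugate case the exact answer is known, so it serves as a sanity check.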

45. 46
Bayesian (Machine) Learning

46. 47
Bayesian Models

Example: Markov Chain Model
– Dirichlet prior, Categorical Likelihood

Bayesian networks

Topic models (LDA)

Hierarchical Bayesian models
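
As a hedged sketch of the Markov chain example, the posterior mean of each transition probability under a symmetric Dirichlet prior adds a pseudo count to every observed transition count. The toy sequence, the state names, and the function name are all assumptions made for illustration.

```python
from collections import Counter

def posterior_transition_probs(sequence, states, pseudo_count=1.0):
    """Posterior mean transition probabilities with a symmetric Dirichlet prior."""
    counts = Counter(zip(sequence, sequence[1:]))    # observed transitions
    probs = {}
    for s in states:
        row_total = sum(counts[(s, t)] for t in states) + pseudo_count * len(states)
        probs[s] = {t: (counts[(s, t)] + pseudo_count) / row_total for t in states}
    return probs

sequence = list("AABAB")    # toy data: transitions A->A, A->B, B->A, A->B
probs = posterior_transition_probs(sequence, states=["A", "B"])

for s, row in probs.items():
    print(s, {t: round(v, 3) for t, v in row.items()})
```

Transitions never observed in the data (here B→B) still receive non-zero probability, which is the practical effect of the prior.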

47. 48
Generalized Linear Model

Multiple linear regression

Logistic regression

Bayesian ANOVA

48. 49
Bayesian Statistical Tests

Alternatives to frequentist approaches

Bayesian correlation

Bayesian t-test

49. 50
Questions?
Philipp Singer
Image credit: talk of Mike West: http://www2.stat.duke.edu/~mw/ABS04/Lecture_Slides/4.Stats_Regression.pdf