Introduction to Bayesian Statistics

Introduction to Bayesian Statistics
Machine Learning and Data Mining
Philipp Singer
CC image courtesy of user mattbuck007 on Flickr

2 Conditional Probability

3 Conditional Probability
• Probability of event A given that B is true
• P(cough|cold) > P(cough)
• Fundamental in probability theory
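To make the definition concrete, here is a minimal sketch that computes P(cough|cold) from joint counts. The counts are invented for illustration and do not come from the talk.

```python
# Hypothetical counts for 1000 people (illustrative assumption, not from the talk).
counts = {
    ("cold", "cough"): 80,
    ("cold", "no cough"): 20,
    ("no cold", "cough"): 90,
    ("no cold", "no cough"): 810,
}
total = sum(counts.values())

# Marginal probabilities of coughing and of having a cold.
p_cough = (counts[("cold", "cough")] + counts[("no cold", "cough")]) / total
p_cold = (counts[("cold", "cough")] + counts[("cold", "no cough")]) / total

# Joint probability of having a cold and coughing.
p_cold_and_cough = counts[("cold", "cough")] / total

# Definition: P(cough | cold) = P(cough, cold) / P(cold).
p_cough_given_cold = p_cold_and_cough / p_cold

print(f"P(cough)        = {p_cough:.2f}")
print(f"P(cough | cold) = {p_cough_given_cold:.2f}")
```

With these made-up counts, conditioning on a cold raises the probability of a cough, which is the pattern the slide's inequality describes.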

4 Before we start with Bayes ...
• Another perspective on conditional probability
• Conditional probability via growing trimmed trees
• https://www.youtube.com/watch?v=Zxm4Xxvzohk

5 Bayes Theorem

6 Bayes Theorem
• P(A|B) is the conditional probability of observing A given that B is true
• P(B|A) is the conditional probability of observing B given that A is true
• P(A) and P(B) are the probabilities of A and B without conditioning on each other
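The formula itself did not survive in this transcript. In the notation of the bullets above, the standard statement of Bayes' theorem is:

```latex
P(A \mid B) = \frac{P(B \mid A)\, P(A)}{P(B)}
```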

7 Visualize Bayes Theorem
• Source: https://oscarbonilla.com/2009/05/visualizing-bayes-theorem/
• Figure labels: All possible outcomes; Some event

8 Visualize Bayes Theorem
• Figure labels: All people in study; People having cancer

9 Visualize Bayes Theorem
• Figure labels: All people in study; People where screening test is positive

10 Visualize Bayes Theorem
• Figure label: People having positive screening test and cancer

11 Visualize Bayes Theorem
• Given the test is positive, what is the probability that said person has cancer?

12 Visualize Bayes Theorem
• Given the test is positive, what is the probability that said person has cancer?

13 Visualize Bayes Theorem
• Given that someone has cancer, what is the probability that said person had a positive test?
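The two questions above are the two directions of the same conditional relationship. A minimal sketch with made-up numbers shows how different the answers can be. The prevalence, sensitivity, and false positive rate below are assumptions for illustration only and are not taken from the slides.

```python
# Illustrative numbers (assumptions, not from the talk).
p_cancer = 0.01              # prior: P(cancer)
p_pos_given_cancer = 0.90    # sensitivity: P(positive | cancer)
p_pos_given_healthy = 0.09   # false positive rate: P(positive | no cancer)

# Law of total probability: P(positive) over both groups of people.
p_pos = p_pos_given_cancer * p_cancer + p_pos_given_healthy * (1 - p_cancer)

# Bayes' theorem: P(cancer | positive).
p_cancer_given_pos = p_pos_given_cancer * p_cancer / p_pos

print(f"P(positive | cancer) = {p_pos_given_cancer:.3f}")
print(f"P(cancer | positive) = {p_cancer_given_pos:.3f}")
```

Under these assumed rates, a positive test still leaves cancer fairly unlikely, because the people with cancer are a small part of everyone who tests positive.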

14 Example: Fake coin
• Two coins
  – One fair
  – One unfair
• What is the probability of having the fair coin after flipping Heads?
• CC image courtesy of user pagedooley on Flickr
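The worked numbers for this example are not in the transcript, so the following is a sketch under two stated assumptions: each coin is picked with probability 1/2, and the unfair coin always lands Heads. Neither assumption comes from the slides.

```python
from fractions import Fraction

# Assumptions (not stated in the transcript):
# the coin is picked at random, and the unfair coin always shows Heads.
prior_fair = Fraction(1, 2)
p_heads_given_fair = Fraction(1, 2)
p_heads_given_unfair = Fraction(1, 1)

# Law of total probability: P(Heads).
p_heads = (p_heads_given_fair * prior_fair
           + p_heads_given_unfair * (1 - prior_fair))

# Bayes' theorem: P(fair | Heads).
posterior_fair = p_heads_given_fair * prior_fair / p_heads

print(posterior_fair)  # prints 1/3
```

A different unfair coin (for example one that shows Heads 80% of the time) only changes `p_heads_given_unfair`. The rest of the calculation stays the same.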

15 Example: Fake coin
• CC image courtesy of user pagedooley on Flickr

17 Update of beliefs
• Bayes theorem allows new evidence to update beliefs
• The prior can also be the posterior of a previous update

18 Belief update
• What is the probability of having the fair coin after we have already seen one Heads?
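Sequential updating can be sketched as a loop in which each posterior becomes the next prior. As an illustration only, the sketch assumes a starting belief of 1/2 in the fair coin and an unfair coin that always lands Heads. Both are assumptions, not values from the slides, and `update_on_heads` is a hypothetical helper name.

```python
from fractions import Fraction

def update_on_heads(prior_fair,
                    p_heads_fair=Fraction(1, 2),
                    p_heads_unfair=Fraction(1)):
    """Return P(fair | Heads) given the current belief P(fair)."""
    evidence = p_heads_fair * prior_fair + p_heads_unfair * (1 - prior_fair)
    return p_heads_fair * prior_fair / evidence

belief = Fraction(1, 2)   # assumed initial belief that we hold the fair coin
history = [belief]
for _ in range(2):        # observe Heads twice in a row
    belief = update_on_heads(belief)   # posterior becomes the next prior
    history.append(belief)

print([str(b) for b in history])  # prints ['1/2', '1/3', '1/5']
```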

19 Bayesian Inference

20 Source: https://xkcd.com/1132/

21 Bayesian Inference
• Statistical inference of parameters
• Slide labels: Parameters; Data; Additional knowledge

22 Coin flip example
• Flip a coin several times
• Is it fair?
• Let's use Bayesian inference

23 Binomial model
• Probability p of flipping heads
• Flipping tails: 1-p
• Binomial model

24 Prior
• Prior belief about parameter(s)
• Conjugate prior
  – Posterior is in the same distribution family as the prior
  – Beta distribution is conjugate to the binomial
• Beta prior

25 Beta distribution
• Continuous probability distribution
• Interval [0,1]
• Two shape parameters: α and β
  – If >= 1, interpret as pseudo counts
  – α corresponds to flipping heads, β to flipping tails
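As a rough illustration of how the two shape parameters act, the Beta density can be evaluated with the standard library alone. `beta_pdf` is a hypothetical helper name, and the parameter pairs below are just examples.

```python
import math

def beta_pdf(p, alpha, beta):
    """Density of Beta(alpha, beta) at p, for 0 < p < 1."""
    # Work in log space so large shape parameters do not overflow.
    log_norm = math.lgamma(alpha + beta) - math.lgamma(alpha) - math.lgamma(beta)
    return math.exp(log_norm
                    + (alpha - 1) * math.log(p)
                    + (beta - 1) * math.log(1 - p))

# Example shapes: flat, mildly peaked at 0.5, strongly favouring heads.
for alpha, beta in [(1, 1), (2, 2), (50, 10)]:
    mean = alpha / (alpha + beta)
    print(f"Beta({alpha},{beta}): mean={mean:.3f}, "
          f"pdf(0.5)={beta_pdf(0.5, alpha, beta):.4f}")
```

Beta(1,1) is flat over (0,1), which is why it serves as the uniform prior. Larger pseudo counts concentrate the density around α/(α+β).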

26 Beta distribution

31 Posterior
• Posterior is also a Beta distribution
• For the exact derivation: http://www.cs.cmu.edu/~10701/lecture/technote2_betabinomial.pdf
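For orientation, the conjugate update can be sketched in a few lines. Here k is the number of Heads in n flips and α, β are the prior's shape parameters. Constants that do not depend on p are dropped.

```latex
\begin{aligned}
P(p \mid k, n) &\propto
  \underbrace{p^{k}(1-p)^{n-k}}_{\text{binomial likelihood}}
  \cdot
  \underbrace{p^{\alpha-1}(1-p)^{\beta-1}}_{\text{Beta prior}} \\
&= p^{\alpha+k-1}(1-p)^{\beta+n-k-1} \\
\Rightarrow\quad p \mid k, n &\sim \mathrm{Beta}(\alpha + k,\ \beta + n - k)
\end{aligned}
```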

32 Posterior
• Assume
  – Binomial p = 0.4
  – Uniform Beta prior: α=1 and β=1
  – 200 random variates from binomial distribution (Heads=80)
  – Update posterior

33 Posterior
• Assume
  – Binomial p = 0.4
  – Biased Beta prior: α=50 and β=10
  – 200 random variates from binomial distribution (Heads=80)
  – Update posterior
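Both updates can be reproduced with the standard Beta-binomial rule: add the Heads to α and the Tails to β. The sketch below uses the numbers from the two slides above. `update_beta` is a hypothetical helper name.

```python
def update_beta(alpha, beta, heads, flips):
    """Conjugate update: Beta(alpha, beta) prior plus binomial data."""
    return alpha + heads, beta + (flips - heads)

heads, flips = 80, 200
priors = {"uniform Beta(1,1)": (1, 1), "biased Beta(50,10)": (50, 10)}

posteriors = {}
for name, (a, b) in priors.items():
    post_a, post_b = update_beta(a, b, heads, flips)
    posteriors[name] = (post_a, post_b)
    mean = post_a / (post_a + post_b)
    print(f"{name}: posterior Beta({post_a},{post_b}), mean {mean:.3f}")
```

With the uniform prior the posterior mean stays close to the observed share of Heads. The biased prior's 60 pseudo counts pull it toward higher values.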

34 Posterior
• The posterior is a convex combination of prior and data
• The stronger our prior belief, the more data we need to overrule the prior
• The less prior belief we have, the quicker the data overrules the prior
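For the Beta-binomial case, the convex combination can be written out for the posterior mean. With k Heads in n flips and a Beta(α, β) prior, the weight w is the share of pseudo counts among all counts:

```latex
\mathbb{E}[p \mid k, n]
  = \frac{\alpha + k}{\alpha + \beta + n}
  = w \cdot \underbrace{\frac{\alpha}{\alpha + \beta}}_{\text{prior mean}}
  + (1 - w) \cdot \underbrace{\frac{k}{n}}_{\text{data (MLE)}},
\qquad
w = \frac{\alpha + \beta}{\alpha + \beta + n}
```

As n grows, w goes to 0 and the data dominate. A large α + β keeps w close to 1 for longer.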

36 So is the coin fair?
• Examine posterior
  – 95% highest density interval (HDI) of the posterior
  – ROPE [1]: Region of practical equivalence for null hypothesis
  – Fair coin: [0.45,0.55]
• 95% HDI: (0.33, 0.47)
• The HDI overlaps the ROPE, so we cannot reject the null
• More samples → we can
[1] Kruschke, John. Doing Bayesian data analysis: A tutorial with R, JAGS, and Stan. Academic Press, 2014.
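One way to approximate the HDI without extra libraries is a grid: evaluate the posterior density on small cells, then collect cells from highest to lowest density until 95% of the mass is covered. The sketch assumes the Beta(81,121) posterior that results from the uniform prior and 80 Heads in 200 flips. The grid size and helper names are arbitrary choices.

```python
import math

def beta_log_pdf(p, alpha, beta):
    """Log density of Beta(alpha, beta) at p, for 0 < p < 1."""
    log_norm = math.lgamma(alpha + beta) - math.lgamma(alpha) - math.lgamma(beta)
    return log_norm + (alpha - 1) * math.log(p) + (beta - 1) * math.log(1 - p)

def hdi_grid(alpha, beta, mass=0.95, steps=20000):
    """Approximate the HDI of a unimodal Beta distribution on a grid."""
    width = 1.0 / steps
    # Use cell midpoints so p is never exactly 0 or 1.
    grid = [(i + 0.5) * width for i in range(steps)]
    weights = [math.exp(beta_log_pdf(p, alpha, beta)) * width for p in grid]
    # Add cells from highest to lowest density until the mass is covered.
    order = sorted(range(steps), key=lambda i: weights[i], reverse=True)
    covered, chosen = 0.0, []
    for i in order:
        chosen.append(i)
        covered += weights[i]
        if covered >= mass:
            break
    return grid[min(chosen)], grid[max(chosen)]

low, high = hdi_grid(81, 121)
rope = (0.45, 0.55)
overlaps_rope = low <= rope[1] and high >= rope[0]
print(f"95% HDI: ({low:.2f}, {high:.2f}); overlaps ROPE: {overlaps_rope}")
```

Kruschke's decision rule rejects the null only when the HDI lies entirely outside the ROPE, which is why an overlap means no rejection here.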

37 Bayesian Model Comparison
• Evidence (marginal likelihood)
• Parameters marginalized out
• Average of likelihood weighted by prior
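The formula on this slide is not in the transcript. In standard notation, with data D, model M, and parameters θ (these symbols are chosen here, not taken from the slide), the evidence is:

```latex
P(D \mid M) = \int P(D \mid \theta, M)\, P(\theta \mid M)\, d\theta
```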

38 Bayesian Model Comparison
• Bayes factors [1]
• Ratio of marginal likelihoods
• Interpretation table by Kass & Raftery [1]
• >100 → decisive evidence against M2
[1] Kass, Robert E., and Adrian E. Raftery. "Bayes factors." Journal of the American Statistical Association 90.430 (1995): 773-795.

39 So is the coin fair?
• Null hypothesis
• Alternative hypothesis
  – Anything is possible
  – Beta(1,1)
• Bayes factor

40 So is the coin fair?
• n = 200
• k = 80
• Bayes factor
• (Decent) preference for alt. hypothesis

41 Other priors
• Prior can encode hypotheses (theories)
• Biased hypothesis: Beta(101,11)
• Haldane prior, approximated by Beta(0.001, 0.001)
  – u-shaped
  – high probability on p=1 or (1-p)=1

42 Frequentist approach
• So is the coin fair?
• Binomial test with null p=0.5
  – one-tailed
  – p-value: 0.0028
• Chi² test
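The one-tailed binomial test can be reproduced by summing the null distribution's lower tail. The sketch assumes the same data as the earlier slides (80 Heads in 200 flips) and that the tail of interest is "80 or fewer Heads".

```python
import math

def binom_pmf(k, n, p):
    """P(exactly k heads in n flips) under heads probability p."""
    return math.comb(n, k) * p**k * (1 - p)**(n - k)

n, k, p_null = 200, 80, 0.5

# One-tailed p-value: probability of k or fewer heads if the coin is fair.
p_value_one_tailed = sum(binom_pmf(i, n, p_null) for i in range(k + 1))
print(f"one-tailed p-value: {p_value_one_tailed:.4f}")
```

Unlike the Bayes factor, this number only says how surprising the data are under the null. It does not weigh the null against a specific alternative.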

43 Posterior prediction
• Posterior mean
• If data large → converges to MLE
• MAP: Maximum a posteriori
  – Bayesian estimator
  – uses mode
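A small sketch can show the three point estimates side by side and how the posterior mean and MAP approach the MLE as data accumulate. It starts from the biased Beta(50,10) prior. The scaled-up data sets are hypothetical and simply keep the same 40% share of Heads.

```python
def posterior_mean(alpha, beta):
    """Mean of Beta(alpha, beta)."""
    return alpha / (alpha + beta)

def posterior_mode(alpha, beta):
    """MAP estimate; this closed form needs alpha > 1 and beta > 1."""
    return (alpha - 1) / (alpha + beta - 2)

prior_alpha, prior_beta = 50, 10   # biased prior from the earlier slide

estimates = []
for scale in (1, 10, 100):         # hypothetical: same share of heads, more flips
    heads, flips = 80 * scale, 200 * scale
    a = prior_alpha + heads
    b = prior_beta + flips - heads
    mle = heads / flips
    estimates.append((flips, posterior_mean(a, b), posterior_mode(a, b), mle))
    print(f"n={flips:6d}: mean={posterior_mean(a, b):.4f}, "
          f"MAP={posterior_mode(a, b):.4f}, MLE={mle:.4f}")
```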

44 Bayesian prediction
• Posterior predictive distribution
• Distribution of unobserved observations (test) conditioned on observed data (train)
• Slide label: Frequentist MLE
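For the coin example the posterior predictive has a closed form, the beta-binomial distribution. The sketch below assumes the Beta(81,121) posterior (uniform prior, 80 Heads in 200 flips) and compares it with plugging the MLE into a binomial. The choice of 10 future flips is arbitrary.

```python
import math

def log_beta_fn(a, b):
    """Log of the Beta function B(a, b)."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)

def predictive_pmf(j, m, alpha, beta):
    """P(j heads in m future flips | posterior Beta(alpha, beta))."""
    return math.comb(m, j) * math.exp(
        log_beta_fn(alpha + j, beta + m - j) - log_beta_fn(alpha, beta))

def plug_in_pmf(j, m, p):
    """Binomial prediction that treats a point estimate p as exact."""
    return math.comb(m, j) * p**j * (1 - p)**(m - j)

alpha_post, beta_post = 81, 121
mle = 80 / 200

# Probability that the very next flip is heads equals the posterior mean.
p_next_heads = predictive_pmf(1, 1, alpha_post, beta_post)

m = 10
bayes_pred = [predictive_pmf(j, m, alpha_post, beta_post) for j in range(m + 1)]
mle_pred = [plug_in_pmf(j, m, mle) for j in range(m + 1)]
print(f"P(next flip is heads) = {p_next_heads:.4f}")
print(f"P(0 heads in {m}): Bayesian {bayes_pred[0]:.4f}, plug-in {mle_pred[0]:.4f}")
```

The Bayesian prediction averages over the remaining uncertainty in p, so it is slightly wider than the plug-in binomial.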

45 Alternative Bayesian Inference
• Often marginal likelihood not easy to evaluate
  – No analytical solution
  – Numerical integration expensive
• Alternatives
  – Monte Carlo integration
    • Markov Chain Monte Carlo (MCMC)
    • Gibbs sampling
    • Metropolis-Hastings algorithm
  – Laplace approximation
  – Variational Bayes
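To give a flavour of the sampling alternatives, here is a minimal random-walk Metropolis sampler, the special case of Metropolis-Hastings with a symmetric proposal. It targets the coin posterior from the earlier slides (uniform prior, 80 Heads in 200 flips), where the exact answer is known and can serve as a check. Step size, chain length, burn-in, and seed are arbitrary choices made for this sketch.

```python
import math
import random

def log_posterior(p, heads=80, flips=200, alpha=1.0, beta=1.0):
    """Unnormalized log posterior of p; -inf outside (0, 1)."""
    if not 0.0 < p < 1.0:
        return -math.inf
    return ((alpha - 1 + heads) * math.log(p)
            + (beta - 1 + flips - heads) * math.log(1 - p))

def metropolis(n_samples=20000, step=0.05, start=0.5, seed=0):
    """Random-walk Metropolis with a Gaussian (symmetric) proposal."""
    rng = random.Random(seed)
    current, current_lp = start, log_posterior(start)
    samples = []
    for _ in range(n_samples):
        proposal = current + rng.gauss(0.0, step)
        proposal_lp = log_posterior(proposal)
        # Symmetric proposal: accept with probability min(1, posterior ratio).
        accept_prob = math.exp(min(0.0, proposal_lp - current_lp))
        if rng.random() < accept_prob:
            current, current_lp = proposal, proposal_lp
        samples.append(current)
    return samples

samples = metropolis()
kept = samples[2000:]                 # drop burn-in
mcmc_mean = sum(kept) / len(kept)
exact_mean = 81 / 202                 # mean of the exact Beta(81,121) posterior
print(f"MCMC mean {mcmc_mean:.3f} vs exact {exact_mean:.3f}")
```

Only the unnormalized posterior is needed, which is the point of the method: the marginal likelihood never has to be computed.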

46 Bayesian (Machine) Learning

47 Bayesian Models
• Example: Markov Chain Model
  – Dirichlet prior, Categorical Likelihood
• Bayesian networks
• Topic models (LDA)
• Hierarchical Bayesian models

48 Generalized Linear Model
• Multiple linear regression
• Logistic regression
• Bayesian ANOVA

49 Bayesian Statistical Tests
• Alternatives to frequentist approaches
• Bayesian correlation
• Bayesian t-test

50 Questions?
• Philipp Singer
• Image credit: talk of Mike West: http://www2.stat.duke.edu/~mw/ABS04/Lecture_Slides/4.Stats_Regression.pdf
