# Introduction to Bayesian Statistics

April 20, 2016

## Transcript

1. ### Introduction to Bayesian Statistics

Machine Learning and Data Mining, Philipp Singer. CC image courtesy of user mattbuck007 on Flickr.

3. ### 3 Conditional Probability

• Probability of event A given that B is true
• P(cough|cold) > P(cough)
• Fundamental in probability theory
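In symbols, the conditional probability on this slide is usually defined as below. The notation A ∩ B ("both A and B occur") is an assumption here and does not appear on the slide.

```latex
P(A \mid B) = \frac{P(A \cap B)}{P(B)}, \qquad P(B) > 0
```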
4. ### 4 Before we start with Bayes ...

• Another perspective on conditional probability
• Conditional probability via growing trimmed trees
• https://www.youtube.com/watch?v=Zxm4Xxvzohk

6. ### 6 Bayes Theorem

• P(A|B) is the conditional probability of observing A given that B is true
• P(B|A) is the conditional probability of observing B given that A is true
• P(A) and P(B) are the probabilities of A and B without conditioning on each other
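Written out with the four quantities listed on this slide, the standard statement of Bayes' theorem is:

```latex
P(A \mid B) = \frac{P(B \mid A)\, P(A)}{P(B)}
```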

9. ### 9 Visualize Bayes Theorem

[Figure labels: All people in study; People where screening test is positive; cancer]
11. ### 11 Visualize Bayes Theorem

• Given the test is positive, what is the probability that the person has cancer?
13. ### 13 Visualize Bayes Theorem

• Given that someone has cancer, what is the probability that the person had a positive test?
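The two questions on these slides, P(cancer | positive) and P(positive | cancer), are easy to confuse. The sketch below contrasts them. All three input numbers are hypothetical and chosen only for illustration; none of them come from the slides.

```python
# Hypothetical numbers for illustration only; none of them come from the slides.
prevalence = 0.01           # P(cancer)
sensitivity = 0.90          # P(positive | cancer)
false_positive_rate = 0.05  # P(positive | no cancer)

# Law of total probability: P(positive)
p_positive = sensitivity * prevalence + false_positive_rate * (1 - prevalence)

# Bayes' theorem: P(cancer | positive)
p_cancer_given_positive = sensitivity * prevalence / p_positive

print(f"P(positive | cancer) = {sensitivity:.3f}")
print(f"P(cancer | positive) = {p_cancer_given_positive:.3f}")
```

With a rare condition, the second probability can be far smaller than the first, even for a sensitive test.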
14. ### 14 Example: Fake coin

• Two coins
  – One fair
  – One unfair
• What is the probability of having the fair coin after flipping Heads?

CC image courtesy of user pagedooley on Flickr
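The transcript does not say how the unfair coin behaves or how the coin is chosen. As a worked example, assume that each coin is picked with probability 0.5 and that the unfair coin always lands Heads (both are assumptions, not taken from the slides):

```latex
P(\text{fair} \mid H)
  = \frac{P(H \mid \text{fair})\, P(\text{fair})}
         {P(H \mid \text{fair})\, P(\text{fair}) + P(H \mid \text{unfair})\, P(\text{unfair})}
  = \frac{0.5 \cdot 0.5}{0.5 \cdot 0.5 + 1 \cdot 0.5}
  = \frac{1}{3}
```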

17. ### 17 Update of beliefs

• Allows new evidence to update beliefs
• Prior can also be the posterior of a previous update
18. ### 18 Example: Fake coin

• Belief update
• What is the probability of the coin being fair after we have already seen one Heads?

CC image courtesy of user pagedooley on Flickr
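A minimal sketch of the belief update, reusing each posterior as the next prior. It assumes an equal starting belief in both coins and an unfair coin that always lands Heads; neither value is given in the transcript, and `update_fair` is a hypothetical helper name.

```python
from fractions import Fraction

def update_fair(prior_fair, p_heads_fair=Fraction(1, 2), p_heads_unfair=Fraction(1)):
    """Return P(fair | Heads) computed from P(fair) via Bayes' theorem."""
    numerator = p_heads_fair * prior_fair
    evidence = numerator + p_heads_unfair * (1 - prior_fair)
    return numerator / evidence

belief = Fraction(1, 2)  # assumed prior: both coins equally likely
history = []
for flip in range(1, 4):
    belief = update_fair(belief)  # the previous posterior serves as the new prior
    history.append(belief)
    print(f"P(fair) after {flip} Heads: {belief}")
```

Under these assumptions the belief in the fair coin drops from 1/3 to 1/5 to 1/9 as Heads keep coming.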

22. ### 22 Coin flip example

• Flip a coin several times
• Is it fair?
• Let's use Bayesian inference
23. ### 23 Binomial model

• Probability p of flipping heads
• Flipping tails: 1-p
• Binomial model
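For reference, the binomial model gives the probability of seeing k heads in n flips. The symbols n and k match the ones used later in the deck:

```latex
P(k \mid n, p) = \binom{n}{k}\, p^{k} (1 - p)^{n - k}
```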
24. ### 24 Prior

• Prior belief about parameter(s)
• Conjugate prior
  – Posterior is of the same distribution family as the prior
  – Beta distribution is conjugate to the binomial
• Beta prior
25. ### 25 Beta distribution

• Continuous probability distribution
• Interval [0,1]
• Two shape parameters: α and β
  – If >= 1, interpret as pseudo counts
  – α would refer to flipping heads
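To get a feel for the two shape parameters, the sketch below evaluates the Beta density and mean using only the standard library. The helper names `beta_pdf` and `beta_mean` are hypothetical; the two parameter pairs are the uniform and biased priors that appear on the posterior slides of this deck.

```python
import math

def beta_pdf(p, alpha, beta):
    """Density of Beta(alpha, beta) at p, for 0 < p < 1."""
    log_norm = math.lgamma(alpha + beta) - math.lgamma(alpha) - math.lgamma(beta)
    return math.exp(log_norm + (alpha - 1) * math.log(p) + (beta - 1) * math.log(1 - p))

def beta_mean(alpha, beta):
    """Mean of Beta(alpha, beta): the share of 'heads' pseudo counts."""
    return alpha / (alpha + beta)

for alpha, beta in [(1, 1), (50, 10)]:
    print(f"Beta({alpha},{beta}): mean={beta_mean(alpha, beta):.3f}, "
          f"pdf(0.5)={beta_pdf(0.5, alpha, beta):.4f}")
```

Beta(1,1) is flat on [0,1], while Beta(50,10) concentrates its mass around 50/60.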

31. ### 31 Posterior

• Posterior is also a Beta distribution
• For the exact derivation: http://www.cs.cmu.edu/~10701/lecture/technote2_betabinomial.pdf
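A short version of the argument, for k heads in n flips and a Beta(α, β) prior, dropping all factors that do not depend on p:

```latex
\begin{aligned}
P(p \mid k, n)
  &\propto \underbrace{p^{k} (1 - p)^{n - k}}_{\text{binomial likelihood}}
     \cdot \underbrace{p^{\alpha - 1} (1 - p)^{\beta - 1}}_{\text{Beta}(\alpha, \beta)\ \text{prior}} \\
  &= p^{\alpha + k - 1} (1 - p)^{\beta + n - k - 1} \\
  &\Rightarrow\ p \mid k, n \sim \text{Beta}(\alpha + k,\ \beta + n - k)
\end{aligned}
```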
32. ### 32 Posterior

• Assume
  – Binomial p = 0.4
  – Uniform Beta prior: α=1 and β=1
  – 200 random variates from binomial distribution (Heads=80)
  – Update posterior
33. ### 33 Posterior

• Assume
  – Binomial p = 0.4
  – Biased Beta prior: α=50 and β=10
  – 200 random variates from binomial distribution (Heads=80)
  – Update posterior
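The update for both priors can be sketched in a few lines, since the conjugate Beta posterior only adds the observed counts to the prior parameters. The function name `update_beta` is hypothetical; the priors and the data (80 Heads in 200 flips) are the ones from the slides.

```python
def update_beta(alpha, beta, heads, flips):
    """Conjugate update: Beta(alpha, beta) prior plus binomial data."""
    return alpha + heads, beta + (flips - heads)

heads, flips = 80, 200  # data from the slides
for name, (a, b) in {"uniform": (1, 1), "biased": (50, 10)}.items():
    a_post, b_post = update_beta(a, b, heads, flips)
    mean = a_post / (a_post + b_post)
    print(f"{name}: Beta({a},{b}) -> Beta({a_post},{b_post}), posterior mean {mean:.3f}")
```

The uniform prior yields Beta(81,121), whose mean sits near the data proportion of 0.4. The biased prior yields Beta(130,130), pulled up to a mean of 0.5.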
34. ### 34 Posterior

• Convex combination of prior and data
• The stronger our prior belief, the more data we need to overrule the prior
• The less prior belief we have, the quicker the data overrules the prior
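For the Beta-binomial case, the convex combination can be made explicit by rewriting the posterior mean. The weight w is notation introduced here, not on the slide:

```latex
\underbrace{\frac{\alpha + k}{\alpha + \beta + n}}_{\text{posterior mean}}
  = w \cdot \underbrace{\frac{\alpha}{\alpha + \beta}}_{\text{prior mean}}
  + (1 - w) \cdot \underbrace{\frac{k}{n}}_{\text{data proportion}},
\qquad w = \frac{\alpha + \beta}{\alpha + \beta + n}
```

A large α + β (strong prior) keeps w close to 1, while a large n pushes w toward 0.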
35. ### 36 So is the coin fair?

• Examine posterior
  – 95% highest posterior density interval (HDI)
  – ROPE: Region of practical equivalence for null hypothesis
  – Fair coin: [0.45,0.55]
• 95% HDI: (0.33, 0.47)
• Cannot reject null
• With more samples, we can

Kruschke, John. Doing Bayesian Data Analysis: A Tutorial with R, JAGS, and Stan. Academic Press, 2014.
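One way to approximate the HDI without special libraries is to sample from the posterior and take the narrowest interval that holds 95% of the samples. The sketch assumes the uniform-prior posterior Beta(81, 121) (prior Beta(1,1) plus 80 Heads in 200 flips). The helper `hdi_from_samples`, the sample size, and the seed are arbitrary choices made here.

```python
import random

def hdi_from_samples(samples, mass=0.95):
    """Narrowest interval that contains `mass` of the samples."""
    ordered = sorted(samples)
    window = int(mass * len(ordered))
    starts = range(len(ordered) - window)
    best = min(starts, key=lambda i: ordered[i + window] - ordered[i])
    return ordered[best], ordered[best + window]

random.seed(0)  # fixed seed so the sketch is repeatable
# Posterior for the uniform prior and 80 Heads in 200 flips: Beta(81, 121)
samples = [random.betavariate(81, 121) for _ in range(100_000)]
low, high = hdi_from_samples(samples)

rope = (0.45, 0.55)
overlaps_rope = low <= rope[1] and high >= rope[0]
print(f"95% HDI: ({low:.2f}, {high:.2f}), overlaps ROPE: {overlaps_rope}")
```

The interval should land close to the (0.33, 0.47) reported on the slide. Because it still overlaps the ROPE, the null is not rejected.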
36. ### 37 Bayesian Model Comparison

• Parameters marginalized out
• Average of likelihood weighted by prior
• This quantity is the evidence (marginal likelihood)
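In formula form, with D for the data, θ for the parameters and M for the model (symbols chosen here as assumptions), the evidence is:

```latex
P(D \mid M) = \int P(D \mid \theta, M)\, P(\theta \mid M)\, d\theta
```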
37. ### 38 Bayesian Model Comparison

• Bayes factors
• Ratio of marginal likelihoods
• Interpretation table by Kass & Raftery
• >100 → decisive evidence against M2

Kass, Robert E., and Adrian E. Raftery. "Bayes factors." Journal of the American Statistical Association 90.430 (1995): 773-795.
38. ### 39 So is the coin fair?

• Null hypothesis
• Alternative hypothesis
  – Anything is possible
  – Beta(1,1)
• Bayes factor
39. ### 40 So is the coin fair?

• n = 200
• k = 80
• Bayes factor
• (Decent) preference for alternative hypothesis
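A sketch of how this Bayes factor can be computed, assuming the null is the point hypothesis p = 0.5 and the alternative is the Beta(1,1) prior named on the previous slide. The function names are hypothetical, and the Beta-binomial marginal likelihood is evaluated in log space for numerical stability.

```python
import math

def log_beta_fn(a, b):
    """Logarithm of the Beta function B(a, b)."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)

def log_evidence_beta(k, n, alpha, beta):
    """Log marginal likelihood of k Heads in n flips under a Beta(alpha, beta) prior."""
    return (math.log(math.comb(n, k))
            + log_beta_fn(alpha + k, beta + n - k)
            - log_beta_fn(alpha, beta))

def log_evidence_point(k, n, p):
    """Log likelihood of k Heads in n flips for a fixed p (point null)."""
    return math.log(math.comb(n, k)) + k * math.log(p) + (n - k) * math.log(1 - p)

n, k = 200, 80
bf_alt_vs_null = math.exp(log_evidence_beta(k, n, 1, 1) - log_evidence_point(k, n, 0.5))
print(f"Bayes factor (alternative vs. null): {bf_alt_vs_null:.2f}")
```

Swapping in other priors, such as a strongly biased Beta, only changes the `alpha` and `beta` arguments.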
40. ### 41 Other priors

• Prior can encode hypotheses (theories)
• Biased hypothesis: Beta(101,11)
• Haldane prior: Beta(0.001, 0.001)
  – u-shaped
  – high probability on p=1 or (1-p)=1
41. ### 42 Frequentist approach

• So is the coin fair?
• Binomial test with null p=0.5
  – one-tailed
  – p-value: 0.0028
• Chi² test
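The one-tailed binomial test can be reproduced by summing the binomial probabilities of 80 or fewer Heads in 200 flips under p = 0.5. This is a minimal standard-library sketch; `binom_cdf` is a hypothetical helper, not a library function.

```python
import math

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p), summed exactly from the pmf."""
    return sum(math.comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

p_value = binom_cdf(80, 200, 0.5)  # one-tailed: 80 or fewer Heads under a fair coin
print(f"one-tailed p-value: {p_value:.4f}")
```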
42. ### 43 Posterior prediction

• Posterior mean
• If data large → converges to MLE
• MAP: Maximum a posteriori
  – Bayesian estimator
  – uses mode
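For the Beta(α + k, β + n − k) posterior of the coin example, the three point estimates can be written side by side. The mode formula assumes α + k > 1 and β + n − k > 1:

```latex
\hat{p}_{\text{mean}} = \frac{\alpha + k}{\alpha + \beta + n}, \qquad
\hat{p}_{\text{MAP}} = \frac{\alpha + k - 1}{\alpha + \beta + n - 2}, \qquad
\hat{p}_{\text{MLE}} = \frac{k}{n}
```

With the uniform prior α = β = 1, the MAP estimate coincides with the MLE.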
43. ### 44 Bayesian prediction

• Posterior predictive distribution
• Distribution of unobserved observations conditioned on observed data (train, test)
• Frequentist: MLE
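A generic way to write the contrast, with D for the observed data, x̃ for an unobserved observation and θ for the parameters (notation assumed here): the Bayesian prediction averages over the posterior, while the frequentist prediction plugs in a single estimate.

```latex
P(\tilde{x} \mid D) = \int P(\tilde{x} \mid \theta)\, P(\theta \mid D)\, d\theta
\qquad \text{vs.} \qquad
P(\tilde{x} \mid \hat{\theta}_{\text{MLE}})
```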
44. ### 45 Alternative Bayesian Inference

• Often marginal likelihood not easy to evaluate
  – No analytical solution
  – Numerical integration expensive
• Alternatives
  – Monte Carlo integration
    • Markov Chain Monte Carlo (MCMC)
    • Gibbs sampling
    • Metropolis-Hastings algorithm
  – Laplace approximation
  – Variational Bayes
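As an illustration of the Metropolis-Hastings idea, the sketch below samples the coin posterior (uniform prior, 80 Heads in 200 flips), where the exact answer Beta(81, 121) is known and can serve as a check. The step size, chain length, burn-in and seed are arbitrary choices made here, not values from the talk.

```python
import math
import random

def log_posterior(p, heads=80, flips=200):
    """Unnormalized log posterior for the coin under a uniform Beta(1,1) prior."""
    if not 0.0 < p < 1.0:
        return -math.inf
    return heads * math.log(p) + (flips - heads) * math.log(1.0 - p)

def metropolis_hastings(n_steps=20_000, step_size=0.05, start=0.5, seed=0):
    rng = random.Random(seed)
    current, current_lp = start, log_posterior(start)
    chain = []
    for _ in range(n_steps):
        proposal = current + rng.gauss(0.0, step_size)  # symmetric random walk
        proposal_lp = log_posterior(proposal)
        # Accept with probability min(1, posterior ratio)
        if rng.random() < math.exp(min(0.0, proposal_lp - current_lp)):
            current, current_lp = proposal, proposal_lp
        chain.append(current)
    return chain

chain = metropolis_hastings()
kept = chain[2_000:]  # discard burn-in
mcmc_mean = sum(kept) / len(kept)
print(f"MCMC posterior mean: {mcmc_mean:.3f} (exact: {81 / 202:.3f})")
```

Only the unnormalized posterior is needed, which is why the method helps when the marginal likelihood is hard to evaluate.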

46. ### 47 Bayesian Models

• Example: Markov Chain Model
  – Dirichlet prior, Categorical Likelihood
• Bayesian networks
• Topic models (LDA)
• Hierarchical Bayesian models
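For the Markov chain example, the Dirichlet prior plays the same role for each row of the transition matrix as the Beta prior did for the coin: pseudo counts are added to observed transition counts. The sketch below assumes a symmetric Dirichlet prior and uses a made-up toy sequence; the function name is hypothetical.

```python
from collections import Counter

def posterior_transition_probs(sequence, states, pseudo_count=1.0):
    """Posterior mean transition probabilities under a symmetric Dirichlet prior."""
    counts = Counter(zip(sequence, sequence[1:]))  # observed transitions
    probs = {}
    for src in states:
        total = sum(counts[(src, dst)] for dst in states) + pseudo_count * len(states)
        probs[src] = {dst: (counts[(src, dst)] + pseudo_count) / total for dst in states}
    return probs

# Hypothetical toy sequence over two states, for illustration only
sequence = list("AABABBBA")
probs = posterior_transition_probs(sequence, states="AB")
for src, row in probs.items():
    print(src, {dst: round(p, 3) for dst, p in row.items()})
```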
47. ### 48 Generalized Linear Model

• Multiple linear regression
• Logistic regression
• Bayesian ANOVA
48. ### 49 Bayesian Statistical Tests

• Alternatives to frequentist approaches
• Bayesian correlation
• Bayesian t-test
49. ### 50 Questions?

Philipp Singer, philipp.singer@gesis.org

Image credit: talk of Mike West: http://www2.stat.duke.edu/~mw/ABS04/Lecture_Slides/4.Stats_Regression.pdf