Introduction to Bayesian Statistics

Philipp Singer

April 20, 2016

Transcript

  1. Introduction to Bayesian Statistics • Machine Learning and Data Mining • Philipp Singer • CC image courtesy of user mattbuck007 on Flickr
  2. Conditional Probability • Probability of event A given that B is true • P(cough|cold) > P(cough) • Fundamental in probability theory
  3. Before we start with Bayes ... • Another perspective on conditional probability • Conditional probability via growing trimmed trees • https://www.youtube.com/watch?v=Zxm4Xxvzohk
  4. Bayes Theorem • P(A|B) is the conditional probability of observing A given that B is true • P(B|A) is the conditional probability of observing B given that A is true • P(A) and P(B) are the probabilities of A and B without conditioning on each other
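The formula itself was an image and did not survive in the transcript. In the notation of the bullets above, the standard statement of Bayes' theorem is:

```latex
P(A \mid B) = \frac{P(B \mid A)\, P(A)}{P(B)}
```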
  5. Visualize Bayes Theorem • Given the test is positive, what is the probability that said person has cancer?
  6. Visualize Bayes Theorem • Given the test is positive, what is the probability that said person has cancer?
  7. Visualize Bayes Theorem • Given that someone has cancer, what is the probability that said person had a positive test?
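The two questions on these slides, P(cancer | positive) and P(positive | cancer), are easy to confuse. The sketch below contrasts them. The slide figures are not in the transcript, so the prevalence, sensitivity and false-positive rate are hypothetical values chosen only for illustration.

```python
def posterior(prior, sensitivity, false_positive_rate):
    """P(cancer | positive test) via Bayes' theorem."""
    # P(positive) by the law of total probability
    p_positive = sensitivity * prior + false_positive_rate * (1 - prior)
    return sensitivity * prior / p_positive

# Hypothetical numbers, NOT taken from the slides
p_cancer = 0.01             # P(cancer)
p_pos_given_cancer = 0.90   # P(positive | cancer)
p_pos_given_healthy = 0.09  # P(positive | no cancer)

p_cancer_given_pos = posterior(p_cancer, p_pos_given_cancer, p_pos_given_healthy)
print(f"P(positive | cancer) = {p_pos_given_cancer:.2f}")
print(f"P(cancer | positive) = {p_cancer_given_pos:.3f}")
```

With a rare condition, the second probability is much smaller than the first, which is the point the visualization makes.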
  8. Example: Fake coin • Two coins – One fair – One unfair • What is the probability of having the fair coin after flipping Heads? • CC image courtesy of user pagedooley on Flickr
  9. Update of beliefs • Allows new evidence to update beliefs • The prior can also be the posterior of a previous update
  10. Example: Fake coin • Belief update • What is the probability of seeing a fair coin after we have already seen one Heads? • CC image courtesy of user pagedooley on Flickr
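A minimal sketch of the belief update for the two-coin example. The transcript does not say how the unfair coin is biased, so this assumes it always lands Heads (`p_heads_unfair=1.0`) and that both coins are equally likely at the start. Both are assumptions, not values from the slides.

```python
def update(prior_fair, p_heads_fair=0.5, p_heads_unfair=1.0):
    """Posterior P(fair | Heads), given the current belief P(fair)."""
    evidence = p_heads_fair * prior_fair + p_heads_unfair * (1 - prior_fair)
    return p_heads_fair * prior_fair / evidence

belief = 0.5          # assumed prior: either coin is equally likely
history = [belief]
for _ in range(3):    # observe Heads three times in a row
    belief = update(belief)   # the previous posterior becomes the new prior
    history.append(belief)

print(history)
```

Under these assumptions the belief in the fair coin drops with every additional Heads, and each step reuses the previous posterior as its prior.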
  11. Coin flip example • Flip a coin several times • Is it fair? • Let's use Bayesian inference
  12. Binomial model • Probability p of flipping heads • Flipping tails: 1-p • Binomial model
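The model equation was an image in the deck. Writing n for the number of flips and k for the number of heads (symbols the later slides also use), the standard binomial likelihood is:

```latex
P(k \mid n, p) = \binom{n}{k}\, p^{k} (1-p)^{n-k}
```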
  13. Prior • Prior belief about parameter(s) • Conjugate prior – Posterior is of the same distribution family as the prior – Beta distribution is conjugate to the binomial • Beta prior
  14. Beta distribution • Continuous probability distribution • Interval [0,1] • Two shape parameters: α and β – If >= 1, interpret as pseudo counts – α would refer to flipping heads
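For reference, the standard density of the Beta distribution over the heads probability p, with the two shape parameters named above:

```latex
\mathrm{Beta}(p \mid \alpha, \beta)
  = \frac{p^{\alpha-1} (1-p)^{\beta-1}}{B(\alpha, \beta)},
\qquad
B(\alpha, \beta) = \frac{\Gamma(\alpha)\,\Gamma(\beta)}{\Gamma(\alpha+\beta)}
```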
  15. Posterior • Posterior also Beta distribution • For exact derivation: http://www.cs.cmu.edu/~10701/lecture/technote2_betabinomial.pdf
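A short sketch of why the posterior is again a Beta distribution, assuming k heads in n flips and a Beta(α, β) prior. This is the standard conjugacy argument, not a copy of the linked note:

```latex
P(p \mid k, n)
  \;\propto\; \underbrace{p^{k} (1-p)^{n-k}}_{\text{likelihood}}
  \cdot \underbrace{p^{\alpha-1} (1-p)^{\beta-1}}_{\text{prior}}
  \;=\; p^{\alpha+k-1} (1-p)^{\beta+n-k-1}
\quad\Longrightarrow\quad
p \mid k, n \sim \mathrm{Beta}(\alpha + k,\; \beta + n - k)
```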
  16. Posterior • Assume – Binomial p = 0.4 – Uniform Beta prior: α=1 and β=1 – 200 random variates from binomial distribution (Heads=80) – Update posterior
  17. Posterior • Assume – Binomial p = 0.4 – Biased Beta prior: α=50 and β=10 – 200 random variates from binomial distribution (Heads=80) – Update posterior
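The two updates above can be reproduced with the conjugate rule (add heads to α, tails to β). A minimal sketch using the slide's numbers, with hypothetical function names:

```python
def beta_binomial_update(alpha, beta, heads, flips):
    """Return the Beta posterior parameters after `heads` heads in `flips` flips."""
    return alpha + heads, beta + (flips - heads)

def beta_mean(alpha, beta):
    """Mean of a Beta(alpha, beta) distribution."""
    return alpha / (alpha + beta)

flips, heads = 200, 80
uniform_post = beta_binomial_update(1, 1, heads, flips)    # uniform prior
biased_post = beta_binomial_update(50, 10, heads, flips)   # biased prior

print("uniform prior ->", uniform_post, round(beta_mean(*uniform_post), 3))
print("biased prior  ->", biased_post, round(beta_mean(*biased_post), 3))
```

The uniform prior leaves the posterior centred near the observed rate of 0.4, while the biased prior still pulls the posterior mean up to 0.5 after 200 flips.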
  18. Posterior • Convex combination of prior and data • The stronger our prior belief, the more data we need to overrule the prior • The less prior belief we have, the quicker the data overrules the prior
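For the Beta-binomial model the convex combination can be written out explicitly: the posterior mean is a weighted average of the prior mean and the observed heads rate, with weight (α+β)/(α+β+n) on the prior. A small sketch; the third prior, Beta(500, 100), is a hypothetical addition to show a very strong belief:

```python
def posterior_mean_as_mixture(alpha, beta, heads, flips):
    """Return (weight on the prior, posterior mean) for a Beta-binomial model."""
    prior_mean = alpha / (alpha + beta)
    mle = heads / flips
    weight = (alpha + beta) / (alpha + beta + flips)   # weight on the prior
    return weight, weight * prior_mean + (1 - weight) * mle

for a, b in [(1, 1), (50, 10), (500, 100)]:   # last prior is hypothetical
    w, mean = posterior_mean_as_mixture(a, b, heads=80, flips=200)
    print(f"Beta({a},{b}): prior weight = {w:.3f}, posterior mean = {mean:.3f}")
```

The larger α+β is relative to the number of flips, the more the posterior mean stays with the prior.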
  19. So is the coin fair? • Examine posterior – 95% posterior density interval – ROPE [1]: Region of practical equivalence for null hypothesis – Fair coin: [0.45,0.55] • 95% HDI: (0.33, 0.47) • Cannot reject null • With more samples we can • [1] Kruschke, John. Doing Bayesian Data Analysis: A Tutorial with R, JAGS, and Stan. Academic Press, 2014.
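A rough sketch of the HDI-versus-ROPE check, assuming the posterior is the Beta(81, 121) that results from the uniform prior and 80 heads in 200 flips. The HDI is approximated on a grid with the standard library only; the function name and grid size are illustrative choices, and the three-way decision rule follows the usual ROPE logic rather than anything spelled out on the slide.

```python
import math

def beta_hdi(alpha, beta, mass=0.95, grid=20001):
    """Approximate the highest-density interval of a unimodal Beta on a grid."""
    xs = [(i + 0.5) / grid for i in range(grid)]   # cell midpoints, avoids 0 and 1
    log_norm = math.lgamma(alpha + beta) - math.lgamma(alpha) - math.lgamma(beta)
    dens = [math.exp(log_norm + (alpha - 1) * math.log(x)
                     + (beta - 1) * math.log(1 - x)) for x in xs]
    # Add cells from highest to lowest density until `mass` is covered
    order = sorted(range(grid), key=lambda i: dens[i], reverse=True)
    total, lo, hi = 0.0, grid, -1
    for i in order:
        total += dens[i] / grid                    # each cell has width 1/grid
        lo, hi = min(lo, i), max(hi, i)
        if total >= mass:
            break
    return xs[lo], xs[hi]

low, high = beta_hdi(81, 121)
rope = (0.45, 0.55)                                # fair-coin ROPE from the slide
if high < rope[0] or low > rope[1]:
    decision = "reject null"       # HDI entirely outside the ROPE
elif rope[0] <= low and high <= rope[1]:
    decision = "accept null"       # HDI entirely inside the ROPE
else:
    decision = "undecided"         # HDI and ROPE overlap
print(f"95% HDI: ({low:.2f}, {high:.2f}) -> {decision}")
```

Because the upper end of the interval reaches into the ROPE, the check stays undecided, matching the slide's "cannot reject null".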
  20. Bayesian Model Comparison • Bayes factors [1] • Ratio of marginal likelihoods • Interpretation table by Kass & Raftery [1] • >100 → decisive evidence against M2 • [1] Kass, Robert E., and Adrian E. Raftery. "Bayes factors." Journal of the American Statistical Association 90.430 (1995): 773-795.
  21. So is the coin fair? • Null hypothesis • Alternative hypothesis – Anything is possible – Beta(1,1) • Bayes factor
  22. So is the coin fair? • n = 200 • k = 80 • Bayes factor • (Decent) preference for alt. hypothesis
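The Bayes factor value was shown as an image. A sketch of how it can be computed, assuming the null is the point hypothesis p = 0.5 and the alternative places a Beta(1,1) prior on p; the marginal likelihood under a Beta prior is the standard beta-binomial expression. Function names are hypothetical.

```python
import math

def log_binom_coeff(n, k):
    return math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)

def log_beta_fn(a, b):
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)

def log_marginal_point(k, n, p):
    """log P(data | M) for a point hypothesis on p (assumed null: p = 0.5)."""
    return log_binom_coeff(n, k) + k * math.log(p) + (n - k) * math.log(1 - p)

def log_marginal_beta(k, n, a, b):
    """log P(data | M) with p ~ Beta(a, b) integrated out (beta-binomial)."""
    return log_binom_coeff(n, k) + log_beta_fn(a + k, b + n - k) - log_beta_fn(a, b)

n, k = 200, 80
# Alternative over null: values above 1 favour the alternative hypothesis
bayes_factor = math.exp(log_marginal_beta(k, n, 1, 1) - log_marginal_point(k, n, 0.5))
print(f"Bayes factor (alternative vs. null): {bayes_factor:.2f}")
```

Working in log space avoids overflow in the binomial coefficient. Under these assumptions the factor comes out at a single-digit value above 1, in line with the slide's "(decent) preference" for the alternative.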
  23. Other priors • Prior can encode hypotheses (theories) • Biased hypothesis: Beta(101,11) • Haldane prior: Beta(0.001, 0.001) – U-shaped – high probability on p=1 or (1-p)=1
  24. Frequentist approach • So is the coin fair? • Binomial test with null p=0.5 – one-tailed – p-value: 0.0028 • Chi² test
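A sketch of the one-tailed binomial test, assuming the same data as before (80 heads in 200 flips) and the lower tail P(X ≤ 80) under p = 0.5. The function name is hypothetical.

```python
import math

def binom_test_lower(k, n, p=0.5):
    """One-tailed p-value P(X <= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

p_value = binom_test_lower(80, 200)
print(f"one-tailed p-value: {p_value:.4f}")
```

The result should land close to the 0.0028 quoted on the slide, well below the conventional 0.05 threshold.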
  25. Posterior prediction • Posterior mean • If data large → converges to MLE • MAP: Maximum a posteriori – Bayesian estimator – uses mode
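The three point estimates can be compared directly for the Beta-binomial case. A minimal sketch, assuming a Beta(α, β) prior and the data from the running example; the mode formula requires both posterior parameters to exceed 1.

```python
def point_estimates(alpha, beta, heads, flips):
    """MLE, posterior mean and MAP for a Beta(alpha, beta) prior."""
    a, b = alpha + heads, beta + flips - heads    # Beta posterior parameters
    return {
        "mle": heads / flips,
        "posterior_mean": a / (a + b),
        "map": (a - 1) / (a + b - 2),             # mode of Beta(a, b), needs a, b > 1
    }

est = point_estimates(1, 1, heads=80, flips=200)
print(est)
```

With the uniform Beta(1,1) prior the MAP estimate coincides with the MLE, and the posterior mean differs only slightly because 200 flips outweigh two pseudo counts.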
  26. Bayesian prediction • Posterior predictive distribution • Distribution of unobserved observations conditioned on observed data (train, test) • Frequentist: MLE
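For the coin example the posterior predictive distribution has a closed form (beta-binomial). A sketch assuming the Beta(81, 121) posterior from the uniform-prior update and a hypothetical batch of 10 future flips:

```python
import math

def log_beta_fn(a, b):
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)

def posterior_predictive(j, m, a, b):
    """P(j heads in m future flips | data), with posterior Beta(a, b)."""
    return math.comb(m, j) * math.exp(log_beta_fn(a + j, b + m - j) - log_beta_fn(a, b))

a, b = 81, 121                                   # assumed posterior parameters
pmf = [posterior_predictive(j, 10, a, b) for j in range(11)]
p_next_heads = posterior_predictive(1, 1, a, b)  # a single future flip

print(f"P(next flip is heads) = {p_next_heads:.3f}")
print("most likely heads count in 10 flips:", max(range(11), key=lambda j: pmf[j]))
```

Unlike plugging the MLE into a binomial, this averages over the remaining uncertainty about p; for a single flip it reduces to the posterior mean.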
  27. Alternative Bayesian Inference • Often marginal likelihood not easy to evaluate – No analytical solution – Numerical integration expensive • Alternatives – Monte Carlo integration • Markov Chain Monte Carlo (MCMC) • Gibbs sampling • Metropolis-Hastings algorithm – Laplace approximation – Variational Bayes
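To illustrate one of the listed alternatives, here is a small random-walk Metropolis sampler (the symmetric-proposal special case of Metropolis-Hastings) for the coin posterior. It only needs the unnormalized posterior, so the marginal likelihood is never computed. The data, uniform prior, step size and burn-in length are assumptions for the example; since this posterior is known in closed form, the sampler is purely illustrative.

```python
import math
import random

def log_posterior(p, heads=80, flips=200, alpha=1.0, beta=1.0):
    """Unnormalized log posterior of the heads probability p."""
    if not 0.0 < p < 1.0:
        return -math.inf
    return (alpha + heads - 1) * math.log(p) + (beta + flips - heads - 1) * math.log(1 - p)

def metropolis(n_samples=20000, step=0.05, start=0.5, seed=0):
    rng = random.Random(seed)
    current, current_lp = start, log_posterior(start)
    samples = []
    for _ in range(n_samples):
        proposal = current + rng.gauss(0.0, step)    # symmetric random-walk proposal
        proposal_lp = log_posterior(proposal)
        # Accept with probability min(1, posterior ratio)
        if rng.random() < math.exp(min(0.0, proposal_lp - current_lp)):
            current, current_lp = proposal, proposal_lp
        samples.append(current)
    return samples

samples = metropolis()[2000:]                        # drop burn-in
estimate = sum(samples) / len(samples)
print(f"MCMC posterior mean: {estimate:.3f} (exact: {81 / 202:.3f})")
```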
  28. Bayesian Models • Example: Markov Chain Model – Dirichlet prior, Categorical Likelihood • Bayesian networks • Topic models (LDA) • Hierarchical Bayesian models
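The Markov chain example generalizes the coin case: each row of the transition matrix gets a Dirichlet prior, and observed transition counts update it just as heads and tails update a Beta. A minimal sketch with a symmetric Dirichlet prior; the state sequence and the function name are hypothetical.

```python
from collections import Counter

def posterior_transition_probs(sequence, states, alpha=1.0):
    """Posterior mean transition probabilities under a symmetric
    Dirichlet(alpha) prior on every row of the transition matrix."""
    counts = Counter(zip(sequence, sequence[1:]))    # observed transitions
    probs = {}
    for s in states:
        row_total = sum(counts[(s, t)] for t in states) + alpha * len(states)
        probs[s] = {t: (counts[(s, t)] + alpha) / row_total for t in states}
    return probs

sequence = list("AABABBBA")                          # hypothetical observations
probs = posterior_transition_probs(sequence, states="AB")
for s, row in probs.items():
    print(s, row)
```

As with the Beta prior, α acts as a pseudo count added to every transition, so transitions that were never observed still receive non-zero probability.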
  29. Questions? • Philipp Singer • philipp.singer@gesis.org • Image credit: talk of Mike West: http://www2.stat.duke.edu/~mw/ABS04/Lecture_Slides/4.Stats_Regression.pdf