Introduction to Bayesian Statistics

Philipp Singer

April 20, 2016

Transcript

  1. Introduction to Bayesian Statistics
    Machine Learning and Data Mining
    Philipp Singer
    CC image courtesy of user mattbuck007 on Flickr
  2. Conditional Probability
  3. Conditional Probability
    • Probability of event A given that B is true
    • P(cough|cold) > P(cough)
    • Fundamental in probability theory
  4. Before we start with Bayes ...
    • Another perspective on conditional probability
    • Conditional probability via growing trimmed trees
    • https://www.youtube.com/watch?v=Zxm4Xxvzohk
  5. Bayes Theorem
  6. Bayes Theorem
    • P(A|B) is the conditional probability of observing A given that B is true
    • P(B|A) is the conditional probability of observing B given that A is true
    • P(A) and P(B) are the probabilities of A and B without conditioning on each other
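In the notation of the bullets on this slide, Bayes' theorem relates the four quantities as follows (standard statement, written out here for reference):

```latex
P(A \mid B) = \frac{P(B \mid A)\, P(A)}{P(B)}, \qquad P(B) > 0
```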
  7. Visualize Bayes Theorem
    Source: https://oscarbonilla.com/2009/05/visualizing-bayes-theorem/
    [Figure: all possible outcomes; some event]
  8. Visualize Bayes Theorem
    [Figure: all people in study; people having cancer]
  9. Visualize Bayes Theorem
    [Figure: all people in study; people where screening test is positive]
  10. Visualize Bayes Theorem
    [Figure: people having positive screening test and cancer]
  11. Visualize Bayes Theorem
    • Given the test is positive, what is the probability that said person has cancer?
  12. Visualize Bayes Theorem
    • Given the test is positive, what is the probability that said person has cancer?
  13. Visualize Bayes Theorem
    • Given that someone has cancer, what is the probability that said person had a positive test?
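The two questions on the screening slides can be sketched with counts. The numbers below are hypothetical (the slides show regions, not figures) and are chosen only to show that the two conditional probabilities differ and that Bayes' theorem links them.

```python
# Hypothetical study counts (NOT from the slides), for illustration only.
n_people = 1000     # all people in the study
n_cancer = 10       # people having cancer
n_positive = 100    # people whose screening test is positive
n_both = 8          # people with a positive test AND cancer

p_cancer = n_cancer / n_people
p_positive = n_positive / n_people
p_both = n_both / n_people

# Conditional probabilities read off the overlapping regions.
p_cancer_given_positive = p_both / p_positive   # P(cancer | positive)
p_positive_given_cancer = p_both / p_cancer     # P(positive | cancer)

# Bayes' theorem recovers one conditional from the other.
via_bayes = p_positive_given_cancer * p_cancer / p_positive

print(p_cancer_given_positive, p_positive_given_cancer, via_bayes)
```

With these made-up counts, a positive test is common among people with cancer, yet most positive tests still belong to people without cancer.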
  14. Example: Fake coin
    • Two coins
      – One fair
      – One unfair
    • What is the probability of having the fair coin after flipping Heads?
    CC image courtesy of user pagedooley on Flickr
  15. Example: Fake coin
  16. Example: Fake coin
  17. Update of beliefs
    • Allows new evidence to update beliefs
    • Prior can also be posterior of previous update
  18. Example: Fake coin
    • Belief update
    • What is the probability of having the fair coin after we have already seen one Heads?
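A minimal sketch of the fake-coin belief update follows. The transcript does not say how the unfair coin behaves, so the code assumes it always lands Heads and that both coins are equally likely to be picked at the start; `update_fair` is a hypothetical helper name.

```python
def update_fair(prior_fair, p_heads_fair=0.5, p_heads_unfair=1.0):
    """Posterior probability of holding the fair coin after observing one Heads.

    p_heads_unfair=1.0 is an assumption (a coin that always shows Heads).
    """
    evidence = prior_fair * p_heads_fair + (1.0 - prior_fair) * p_heads_unfair
    return prior_fair * p_heads_fair / evidence

after_one = update_fair(0.5)        # start from an assumed 50/50 prior
after_two = update_fair(after_one)  # the posterior becomes the next prior
print(after_one, after_two)
```

The second call shows the point of the "Update of beliefs" slide: the posterior from the first flip is reused as the prior for the second.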
  19. Bayesian Inference
  20. Source: https://xkcd.com/1132/
  21. Bayesian Inference
    • Statistical inference of parameters
    [Figure labels: parameters, data, additional knowledge]
  22. Coin flip example
    • Flip a coin several times
    • Is it fair?
    • Let's use Bayesian inference
  23. Binomial model
    • Probability p of flipping heads
    • Flipping tails: 1-p
    • Binomial model
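The binomial model can be written as a short function. This is a sketch of the standard binomial probability mass function; the name `binomial_pmf` is chosen here and does not come from the slides.

```python
from math import comb

def binomial_pmf(k, n, p):
    """Probability of exactly k heads in n independent flips with P(heads) = p."""
    return comb(n, k) * p**k * (1.0 - p)**(n - k)

# Two heads in three flips of a fair coin: 3 * 0.25 * 0.5
print(binomial_pmf(2, 3, 0.5))
```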
  24. Prior
    • Prior belief about parameter(s)
    • Conjugate prior
      – Posterior is in the same distribution family as the prior
      – Beta distribution is conjugate to the binomial
    • Beta prior
  25. Beta distribution
    • Continuous probability distribution
    • Interval [0,1]
    • Two shape parameters: α and β
      – If >= 1, interpret as pseudo counts
      – α would refer to flipping heads
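To get a feel for how α and β shape the distribution, the density can be evaluated at a few points. This is a sketch using only the standard library; the parameter pairs are examples, with (1, 1) and (50, 10) taken from the priors used later in the deck and (2, 2) added for illustration.

```python
from math import exp, lgamma, log

def beta_pdf(p, alpha, beta):
    """Density of Beta(alpha, beta) at p, for 0 < p < 1."""
    log_norm = lgamma(alpha + beta) - lgamma(alpha) - lgamma(beta)
    return exp(log_norm + (alpha - 1) * log(p) + (beta - 1) * log(1 - p))

for a, b in [(1, 1), (2, 2), (50, 10)]:
    print((a, b), [round(beta_pdf(p, a, b), 3) for p in (0.1, 0.5, 0.9)])
```

Beta(1, 1) is flat on the interval, while larger pseudo counts concentrate the density around α / (α + β).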
  26. Beta distribution
  27. Beta distribution
  28. Beta distribution
  29. Beta distribution
  30. Beta distribution

  31. Posterior
    • Posterior is also a Beta distribution
    • For exact derivation: http://www.cs.cmu.edu/~10701/lecture/technote2_betabinomial.pdf
  32. Posterior
    • Assume
      – Binomial p = 0.4
      – Uniform Beta prior: α=1 and β=1
      – 200 random variates from binomial distribution (Heads=80)
      – Update posterior
  33. Posterior
    • Assume
      – Binomial p = 0.4
      – Biased Beta prior: α=50 and β=10
      – 200 random variates from binomial distribution (Heads=80)
      – Update posterior
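The two posterior updates above can be sketched with the conjugate rule: heads are added to α and tails to β. The helper name `beta_binomial_update` is hypothetical; the priors and the 80-out-of-200 data are the ones on the slides.

```python
def beta_binomial_update(alpha, beta, heads, tails):
    """Conjugate update: a Beta(alpha, beta) prior plus binomial data gives a Beta posterior."""
    return alpha + heads, beta + tails

heads, tails = 80, 120  # 200 flips with 80 Heads, as on the slides

for name, (a, b) in {"uniform": (1, 1), "biased": (50, 10)}.items():
    post_a, post_b = beta_binomial_update(a, b, heads, tails)
    post_mean = post_a / (post_a + post_b)
    print(name, (post_a, post_b), round(post_mean, 3))
```

The uniform prior leaves the posterior mean near the observed frequency of 0.4, while the biased prior pulls it upward.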
  34. Posterior
    • Convex combination of prior and data
    • The stronger our prior belief, the more data we need to overrule the prior
    • The less prior belief we have, the quicker the data overrules the prior
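One way to make the convex-combination statement concrete is through the posterior mean. Assuming a Beta(α, β) prior and k Heads in n flips (k and n as used later in the deck), the posterior mean splits into a weighted average of the prior mean and the observed frequency:

```latex
\mathbb{E}[p \mid \text{data}]
  = \frac{\alpha + k}{\alpha + \beta + n}
  = \underbrace{\frac{\alpha + \beta}{\alpha + \beta + n}}_{\text{weight of prior}}
    \cdot \frac{\alpha}{\alpha + \beta}
  + \underbrace{\frac{n}{\alpha + \beta + n}}_{\text{weight of data}}
    \cdot \frac{k}{n}
```

The two weights sum to one. A large α + β (strong prior) keeps the prior weight high until n grows, and a small α + β lets the data dominate quickly.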
  35. So is the coin fair?
    • Examine posterior
      – 95% highest posterior density interval (HDI)
      – ROPE [1]: Region of practical equivalence for null hypothesis
      – Fair coin: [0.45,0.55]
    • 95% HDI: (0.33, 0.47)
    • Cannot reject null (the HDI overlaps the ROPE)
    • More samples → we can
    [1] Kruschke, John. Doing Bayesian data analysis: A tutorial with R, JAGS, and Stan. Academic Press, 2014.
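The HDI check can be approximated on a grid with the standard library alone. This sketch assumes the posterior is Beta(81, 121), which is the uniform Beta(1, 1) prior updated with 80 Heads in 200 flips; `grid_hdi` is a hypothetical helper and the grid approach is a simple stand-in for an exact computation.

```python
from math import exp, lgamma, log

def beta_log_pdf(p, a, b):
    """Log density of Beta(a, b) at p, for 0 < p < 1."""
    return lgamma(a + b) - lgamma(a) - lgamma(b) + (a - 1) * log(p) + (b - 1) * log(1 - p)

def grid_hdi(a, b, mass=0.95, n_grid=20001):
    """Approximate HDI of a unimodal Beta(a, b) on an evenly spaced grid."""
    step = 1.0 / n_grid
    grid = [(i + 0.5) * step for i in range(n_grid)]   # cell midpoints in (0, 1)
    weights = [exp(beta_log_pdf(p, a, b)) * step for p in grid]
    total = sum(weights)
    # Add cells from highest to lowest density until the requested mass is covered.
    order = sorted(range(n_grid), key=lambda i: weights[i], reverse=True)
    covered, chosen = 0.0, []
    for i in order:
        chosen.append(i)
        covered += weights[i] / total
        if covered >= mass:
            break
    return grid[min(chosen)], grid[max(chosen)]

lo, hi = grid_hdi(81, 121)    # assumed posterior under the uniform prior
rope = (0.45, 0.55)           # ROPE for a fair coin, from the slide
overlaps_rope = hi >= rope[0] and lo <= rope[1]
print(round(lo, 3), round(hi, 3), overlaps_rope)
```

The interval should land close to the (0.33, 0.47) reported on the slide. Because its upper end reaches into the ROPE, the null is not rejected under this decision rule.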
  36. Bayesian Model Comparison
    • Parameters marginalized out
    • Average of likelihood weighted by prior
    [Figure label: Evidence]
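The evidence described in these bullets is usually written as the integral below. The symbols D (data), θ (parameters), and M (model) are notation assumed here, since the slide's formula is not part of the transcript.

```latex
P(D \mid M) = \int P(D \mid \theta, M)\, P(\theta \mid M)\, d\theta
```

The likelihood P(D | θ, M) is averaged over all parameter values, weighted by the prior P(θ | M), so θ no longer appears in the result.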
  37. Bayesian Model Comparison
    • Bayes factors [1]
    • Ratio of marginal likelihoods
    • Interpretation table by Kass & Raftery [1]
    • >100 → decisive evidence against M2
    [1] Kass, Robert E., and Adrian E. Raftery. "Bayes factors." Journal of the American Statistical Association 90.430 (1995): 773-795.
  38. So is the coin fair?
    • Null hypothesis
    • Alternative hypothesis
      – Anything is possible
      – Beta(1,1)
    • Bayes factor
  39. So is the coin fair?
    • n = 200
    • k = 80
    • Bayes factor
    • (Decent) preference for alternative hypothesis
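A sketch of the Bayes factor computation for this example follows. The slide's formulas are not in the transcript, so the code assumes the null hypothesis fixes p = 0.5 and the alternative places a Beta(1, 1) prior on p; the function names are hypothetical.

```python
from math import comb, exp, lgamma, log

def log_beta_fn(a, b):
    """Log of the Beta function B(a, b)."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)

def log_evidence_point(k, n, p):
    """Log marginal likelihood when p is fixed (nothing to integrate out)."""
    return log(comb(n, k)) + k * log(p) + (n - k) * log(1 - p)

def log_evidence_beta(k, n, a, b):
    """Log marginal likelihood of a binomial model with a Beta(a, b) prior on p."""
    return log(comb(n, k)) + log_beta_fn(k + a, n - k + b) - log_beta_fn(a, b)

n, k = 200, 80
log_bf = log_evidence_beta(k, n, 1, 1) - log_evidence_point(k, n, 0.5)
bayes_factor = exp(log_bf)   # evidence for the alternative divided by evidence for the null
print(round(bayes_factor, 2))
```

Under these assumptions the factor comes out in the single digits, far from the >100 threshold for decisive evidence.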
  40. Other priors
    • Prior can encode hypotheses (theories)
    • Biased hypothesis: Beta(101,11)
    • Haldane prior: Beta(0.001, 0.001)
      – u-shaped
      – high probability on p=1 or (1-p)=1
  41. Frequentist approach
    • So is the coin fair?
    • Binomial test with null p=0.5
      – one-tailed
      – p-value: 0.0028
    • Chi² test
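The one-tailed binomial test can be sketched by summing the binomial probabilities of the observed count and everything more extreme. The data (80 Heads in 200 flips) is assumed to be the same as in the earlier slides; `binomial_test_lower` is a hypothetical helper.

```python
from math import comb

def binomial_test_lower(k, n, p0=0.5):
    """One-tailed p-value: probability of k or fewer heads in n flips under p0."""
    return sum(comb(n, i) * p0**i * (1 - p0)**(n - i) for i in range(k + 1))

p_value = binomial_test_lower(80, 200)
print(p_value)
```

The result should be close to the 0.0028 reported on the slide.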
  42. Posterior prediction
    • Posterior mean
    • If data large → converges to MLE
    • MAP: Maximum a posteriori
      – Bayesian estimator
      – uses mode
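The three point estimates can be compared directly for the Beta-binomial coin model. This sketch uses the biased Beta(50, 10) prior from the earlier slides and made-up sample sizes that all keep the observed frequency at 0.4, to show the convergence toward the MLE.

```python
def point_estimates(alpha, beta, k, n):
    """Posterior mean, MAP, and MLE for k heads in n flips under a Beta(alpha, beta) prior."""
    post_a, post_b = alpha + k, beta + (n - k)
    mean = post_a / (post_a + post_b)
    map_ = (post_a - 1) / (post_a + post_b - 2)   # posterior mode; valid when post_a, post_b > 1
    mle = k / n
    return mean, map_, mle

# Sample sizes are illustrative; only (80, 200) appears on the slides.
for k, n in [(8, 20), (80, 200), (8000, 20000)]:
    print(n, [round(x, 4) for x in point_estimates(50, 10, k, n)])
```

With a uniform Beta(1, 1) prior the MAP coincides with the MLE, since the prior adds nothing to the mode.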
  43. Bayesian prediction
    • Posterior predictive distribution
    • Distribution of unobserved observations conditioned on observed data (train, test)
    [Figure label: Frequentist MLE]
  44. Alternative Bayesian Inference
    • Often marginal likelihood not easy to evaluate
      – No analytical solution
      – Numerical integration expensive
    • Alternatives
      – Monte Carlo integration
        • Markov Chain Monte Carlo (MCMC)
        • Gibbs sampling
        • Metropolis-Hastings algorithm
      – Laplace approximation
      – Variational Bayes
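As an illustration of one of the listed alternatives, here is a minimal random-walk Metropolis sampler (the symmetric-proposal case of Metropolis-Hastings) for the coin posterior. The coin model has an analytical posterior, so sampling is not needed here; it is used only because the exact answer is available for comparison. The step size, sample count, and seed are arbitrary choices.

```python
import random
from math import exp, log

def log_posterior(p, k=80, n=200, alpha=1.0, beta=1.0):
    """Unnormalized log posterior: binomial likelihood times a Beta(alpha, beta) prior."""
    if not 0.0 < p < 1.0:
        return float("-inf")
    return (k + alpha - 1) * log(p) + (n - k + beta - 1) * log(1 - p)

def metropolis(n_samples=20000, burn_in=2000, step=0.05, seed=0):
    rng = random.Random(seed)
    p = 0.5
    current = log_posterior(p)
    samples = []
    for i in range(n_samples + burn_in):
        proposal = p + rng.gauss(0.0, step)      # symmetric random-walk proposal
        candidate = log_posterior(proposal)
        # Accept with probability min(1, posterior ratio); proposals outside (0, 1) are rejected.
        if rng.random() < exp(min(0.0, candidate - current)):
            p, current = proposal, candidate
        if i >= burn_in:
            samples.append(p)
    return samples

samples = metropolis()
sample_mean = sum(samples) / len(samples)
print(sample_mean)   # compare with the exact posterior mean 81/202
```

Only the unnormalized posterior is needed, which is why such samplers sidestep the marginal likelihood.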
  45. Bayesian (Machine) Learning
  46. Bayesian Models
    • Example: Markov Chain Model
      – Dirichlet prior, Categorical Likelihood
    • Bayesian networks
    • Topic models (LDA)
    • Hierarchical Bayesian models
  47. Generalized Linear Model
    • Multiple linear regression
    • Logistic regression
    • Bayesian ANOVA
  48. Bayesian Statistical Tests
    • Alternatives to frequentist approaches
    • Bayesian correlation
    • Bayesian t-test
  49. Questions?
    Philipp Singer
    philipp.singer@gesis.org
    Image credit: talk of Mike West: http://www2.stat.duke.edu/~mw/ABS04/Lecture_Slides/4.Stats_Regression.pdf