Introduction to Bayesian Statistics

Philipp Singer

April 20, 2016

Transcript

  1. Introduction to Bayesian Statistics

    Machine Learning and Data Mining
    Philipp Singer
    CC image courtesy of user mattbuck007 on Flickr
  2. Conditional Probability

  3. Conditional Probability

    • Probability of event A given that B is true
    • P(cough|cold) > P(cough)
    • Fundamental in probability theory
  4. Before we start with Bayes ...

    • Another perspective on conditional probability
    • Conditional probability via growing trimmed trees
    • https://www.youtube.com/watch?v=Zxm4Xxvzohk
  5. Bayes Theorem

  6. Bayes Theorem

    • P(A|B) is the conditional probability of observing A given that B is true
    • P(B|A) is the conditional probability of observing B given that A is true
    • P(A) and P(B) are the probabilities of A and B without conditioning on each other
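
For reference, Bayes' theorem in its standard form, using the four quantities listed on this slide:

```latex
P(A \mid B) = \frac{P(B \mid A)\, P(A)}{P(B)}
```
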
  7. Visualize Bayes Theorem

    Source: https://oscarbonilla.com/2009/05/visualizing-bayes-theorem/
    Figure labels: all possible outcomes; some event
  8. Visualize Bayes Theorem

    Figure labels: all people in study; people having cancer
  9. Visualize Bayes Theorem

    Figure labels: all people in study; people whose screening test is positive
  10. Visualize Bayes Theorem

    Figure label: people having a positive screening test and cancer
  11. Visualize Bayes Theorem

    • Given the test is positive, what is the probability that said person has cancer?
  12. Visualize Bayes Theorem

    • Given the test is positive, what is the probability that said person has cancer?
  13. Visualize Bayes Theorem

    • Given that someone has cancer, what is the probability that said person had a positive test?
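
Read as region sizes in the diagram, the two questions share the same overlap region and differ only in which region they are divided by. A short sketch of both ratios, with "pos" standing for a positive screening test:

```latex
P(\text{cancer} \mid \text{pos}) = \frac{P(\text{pos} \cap \text{cancer})}{P(\text{pos})},
\qquad
P(\text{pos} \mid \text{cancer}) = \frac{P(\text{pos} \cap \text{cancer})}{P(\text{cancer})}
```
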
  14. Example: Fake coin

    • Two coins
      – One fair
      – One unfair
    • What is the probability of having the fair coin after flipping Heads?
    CC image courtesy of user pagedooley on Flickr
  15. Example: Fake coin

  16. Example: Fake coin

  17. Update of beliefs

    • Allows new evidence to update beliefs
    • The prior can also be the posterior of a previous update
  18. Example: Fake coin

    • Belief update
    • What is the probability of having the fair coin after we have already seen one Heads?
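
The coin example can be sketched in a few lines of Python. The transcript does not say how the unfair coin behaves, so this sketch assumes it always lands Heads and that either coin is equally likely to be picked; both values are illustrative assumptions, not taken from the slides. The function name `update` is likewise hypothetical.

```python
def update(prior_fair, p_heads_fair=0.5, p_heads_unfair=1.0):
    """Return P(fair | Heads) given P(fair), via Bayes' theorem.

    p_heads_unfair=1.0 is an assumption: the unfair coin always shows Heads.
    """
    # P(Heads) by the law of total probability over the two coins
    p_heads = p_heads_fair * prior_fair + p_heads_unfair * (1.0 - prior_fair)
    return p_heads_fair * prior_fair / p_heads


prior = 0.5                    # assumption: each coin is equally likely
after_one = update(prior)      # belief in the fair coin after one Heads
after_two = update(after_one)  # the posterior serves as the next prior
print(after_one, after_two)
```

Feeding `after_one` back in as the prior is the belief update: each Heads lowers the probability that we hold the fair coin.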
  19. Bayesian Inference

  20. Source: https://xkcd.com/1132/

  21. Bayesian Inference

    • Statistical inference of parameters
    Formula labels: parameters; data; additional knowledge
  22. Coin flip example

    • Flip a coin several times
    • Is it fair?
    • Let's use Bayesian inference
  23. Binomial model

    • Probability p of flipping heads
    • Flipping tails: 1-p
    • Binomial model
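
Written out, the binomial model gives the probability of observing k heads in n independent flips. The symbols n and k are the usual notation for the number of flips and the number of heads:

```latex
P(k \mid n, p) = \binom{n}{k}\, p^{k} (1 - p)^{n - k}
```
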
  24. Prior

    • Prior belief about parameter(s)
    • Conjugate prior
      – Posterior is of the same distribution family as the prior
      – Beta distribution is conjugate to the binomial
    • Beta prior
  25. Beta distribution

    • Continuous probability distribution
    • Interval [0,1]
    • Two shape parameters: α and β
      – If >= 1, interpret as pseudo counts
      – α would refer to flipping heads
  26. Beta distribution

  27. Beta distribution

  28. Beta distribution

  29. Beta distribution

  30. Beta distribution

  31. Posterior

    • Posterior is also a Beta distribution
    • For the exact derivation: http://www.cs.cmu.edu/~10701/lecture/technote2_betabinomial.pdf
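
A compact version of the standard conjugacy argument, with k heads in n flips and B(α, β) denoting the Beta function: multiplying the Beta prior by the binomial likelihood yields another Beta density with updated shape parameters.

```latex
\mathrm{Beta}(p \mid \alpha, \beta) = \frac{p^{\alpha - 1} (1 - p)^{\beta - 1}}{B(\alpha, \beta)}

P(p \mid k, n) \;\propto\; p^{k} (1 - p)^{n - k} \cdot p^{\alpha - 1} (1 - p)^{\beta - 1}
\;\;\Longrightarrow\;\;
p \mid k, n \sim \mathrm{Beta}(\alpha + k,\; \beta + n - k)
```
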
  32. Posterior

    • Assume
      – Binomial p = 0.4
      – Uniform Beta prior: α=1 and β=1
      – 200 random variates from binomial distribution (Heads=80)
      – Update posterior
  33. Posterior

    • Assume
      – Binomial p = 0.4
      – Biased Beta prior: α=50 and β=10
      – 200 random variates from binomial distribution (Heads=80)
      – Update posterior
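
The two updates can be reproduced with a minimal sketch of the conjugate rule Beta(α, β) → Beta(α + heads, β + tails). The helper names `beta_update` and `beta_mean` are illustrative, not from the talk.

```python
def beta_update(alpha, beta, heads, tails):
    """Conjugate update: Beta(alpha, beta) prior plus binomial counts."""
    return alpha + heads, beta + tails


def beta_mean(alpha, beta):
    """Mean of a Beta(alpha, beta) distribution."""
    return alpha / (alpha + beta)


n, k = 200, 80  # flips and observed heads, as on the slides

uniform_post = beta_update(1, 1, k, n - k)   # from the uniform prior
biased_post = beta_update(50, 10, k, n - k)  # from the biased prior

print(uniform_post, beta_mean(*uniform_post))
print(biased_post, beta_mean(*biased_post))
```

With the uniform prior the posterior mean stays close to the observed 80/200, while the Beta(50, 10) prior pulls it noticeably upward.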
  34. Posterior

    • Convex combination of prior and data
    • The stronger our prior belief, the more data we need to overrule the prior
    • The less prior belief we have, the quicker the data overrules the prior
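
For the Beta-binomial case, the convex combination can be made explicit for the posterior mean. Here k is the number of heads in n flips, and the weight w grows with the prior pseudo counts α + β:

```latex
\mathbb{E}[p \mid k, n]
= \frac{\alpha + k}{\alpha + \beta + n}
= w \cdot \frac{\alpha}{\alpha + \beta} + (1 - w) \cdot \frac{k}{n},
\qquad
w = \frac{\alpha + \beta}{\alpha + \beta + n}
```

A large α + β keeps w near 1, so the prior mean dominates; as n grows, w shrinks and the data fraction k/n takes over.
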
  35. So is the coin fair?

    • Examine posterior
      – 95% highest density interval (HDI) of the posterior
      – ROPE [1]: Region of practical equivalence for null hypothesis
      – Fair coin: [0.45,0.55]
    • 95% HDI: (0.33, 0.47)
    • Cannot reject null: the HDI overlaps the ROPE
    • With more samples, we could
    [1] Kruschke, John. Doing Bayesian data analysis: A tutorial with R, JAGS, and Stan. Academic Press, 2014.
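
One way to approximate such an interval without extra libraries is a grid search. The sketch below assumes the posterior is Beta(81, 121), which is what a uniform Beta(1, 1) prior gives after 80 heads in 200 flips. The grid method and the function names are illustrative choices, not necessarily how the slide's numbers were produced.

```python
from math import exp, lgamma, log


def beta_pdf(x, a, b):
    """Density of Beta(a, b) at x in (0, 1), computed in log space."""
    log_norm = lgamma(a + b) - lgamma(a) - lgamma(b)
    return exp(log_norm + (a - 1) * log(x) + (b - 1) * log(1 - x))


def hdi_grid(a, b, mass=0.95, steps=20000):
    """Approximate HDI of a unimodal Beta(a, b) on an evenly spaced grid."""
    xs = [(i + 0.5) / steps for i in range(steps)]      # cell midpoints
    weights = [beta_pdf(x, a, b) / steps for x in xs]   # midpoint-rule masses
    # Add cells from highest to lowest density until `mass` is covered.
    order = sorted(range(steps), key=lambda i: weights[i], reverse=True)
    total, chosen = 0.0, []
    for i in order:
        chosen.append(i)
        total += weights[i]
        if total >= mass:
            break
    return xs[min(chosen)], xs[max(chosen)]


low, high = hdi_grid(81, 121)   # assumed posterior: uniform prior, 80/200 heads
rope = (0.45, 0.55)             # ROPE for a fair coin, from the slide
overlaps = low <= rope[1] and high >= rope[0]
print(round(low, 2), round(high, 2), overlaps)
```

If `overlaps` is true, the interval and the ROPE share values, so the fair-coin hypothesis cannot be rejected under this decision rule.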
  36. Bayesian Model Comparison

    • Parameters marginalized out
    • Average of likelihood weighted by prior
    Formula label: evidence
  37. Bayesian Model Comparison

    • Bayes factors [1]
    • Ratio of marginal likelihoods
    • Interpretation table by Kass & Raftery [1]
    • >100 → decisive evidence against M2
    [1] Kass, Robert E., and Adrian E. Raftery. "Bayes factors." Journal of the American Statistical Association 90.430 (1995): 773-795.
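
In standard notation, with θ for the parameters of a model M and D for the data, the evidence and the Bayes factor comparing M1 against M2 read as follows (the symbols are the usual textbook ones, not recovered from the slides):

```latex
P(D \mid M) = \int P(D \mid \theta, M)\, P(\theta \mid M)\, d\theta

B_{12} = \frac{P(D \mid M_1)}{P(D \mid M_2)}
```
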
  38. So is the coin fair?

    • Null hypothesis
    • Alternative hypothesis
      – Anything is possible
      – Beta(1,1)
    • Bayes factor
  39. So is the coin fair?

    • n = 200
    • k = 80
    • Bayes factor
    • (Decent) preference for alt. hypothesis
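
A sketch of how this Bayes factor could be computed. It assumes the null hypothesis fixes p = 0.5 (so its evidence is just the binomial likelihood) and uses the closed-form Beta-binomial evidence for the Beta(1, 1) alternative. The function names are illustrative.

```python
from math import comb, exp, lgamma, log


def log_beta(a, b):
    """Logarithm of the Beta function B(a, b)."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)


def evidence_point(k, n, p):
    """Evidence of a model that fixes p (no free parameter to integrate)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def evidence_beta(k, n, a, b):
    """Evidence of k heads in n flips under a Beta(a, b) prior on p."""
    return exp(log(comb(n, k)) + log_beta(a + k, b + n - k) - log_beta(a, b))


n, k = 200, 80
m_null = evidence_point(k, n, 0.5)  # assumed null: fair coin, p = 0.5
m_alt = evidence_beta(k, n, 1, 1)   # alternative: Beta(1, 1) prior
bayes_factor = m_alt / m_null       # > 1 favours the alternative
print(bayes_factor)
```

Passing other shape parameters to `evidence_beta` lets the same ratio compare hypotheses encoded as different Beta priors.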
  40. Other priors

    • Priors can encode hypotheses (theories)
    • Biased hypothesis: Beta(101,11)
    • Haldane prior: Beta(0.001, 0.001)
      – u-shaped
      – high probability on p=1 or (1-p)=1
  41. Frequentist approach

    • So is the coin fair?
    • Binomial test with null p=0.5
      – one-tailed
      – p-value: 0.0028
    • Chi² test
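
The one-tailed binomial test can be sketched by summing the lower tail directly. The sketch assumes the same data as the earlier coin slides (80 heads in 200 flips); the helper name `binom_cdf` is illustrative.

```python
from math import comb


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p), by direct summation."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


# One-tailed p-value: probability of 80 or fewer heads if the coin is fair
p_value = binom_cdf(80, 200, 0.5)
print(round(p_value, 4))
```
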
  42. Posterior prediction

    • Posterior mean
    • If data large → converges to MLE
    • MAP: Maximum a posteriori
      – Bayesian estimator
      – uses mode
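
For a Beta(α, β) prior and k heads in n flips, the three point estimates have closed forms. The MAP expression is the mode of the Beta posterior and holds when both posterior shape parameters exceed 1:

```latex
\hat{p}_{\text{mean}} = \frac{\alpha + k}{\alpha + \beta + n},
\qquad
\hat{p}_{\text{MAP}} = \frac{\alpha + k - 1}{\alpha + \beta + n - 2},
\qquad
\hat{p}_{\text{MLE}} = \frac{k}{n}
```

As n grows, the fixed terms α and β become negligible and both Bayesian estimates approach k/n. Under the uniform prior α = β = 1 the MAP equals the MLE exactly.
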
  43. Bayesian prediction

    • Posterior predictive distribution
    • Distribution of unobserved observations conditioned on observed data (train, test)
    Formula labels: Frequentist; MLE
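
In standard notation, with x̃ for an unobserved observation, D for the observed data, and θ for the parameters, the posterior predictive distribution averages over the posterior. A frequentist plug-in prediction instead conditions on a single point estimate such as the MLE:

```latex
p(\tilde{x} \mid D) = \int p(\tilde{x} \mid \theta)\, p(\theta \mid D)\, d\theta
\qquad \text{versus} \qquad
p(\tilde{x} \mid \hat{\theta}_{\text{MLE}})
```

For the coin with a Beta(α, β) prior and k heads in n flips, the integral reduces to the posterior mean: the predicted probability of heads on the next flip is (α + k) / (α + β + n).
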
  44. Alternative Bayesian Inference

    • Often marginal likelihood not easy to evaluate
      – No analytical solution
      – Numerical integration expensive
    • Alternatives
      – Monte Carlo integration
        • Markov Chain Monte Carlo (MCMC)
        • Gibbs sampling
        • Metropolis-Hastings algorithm
      – Laplace approximation
      – Variational Bayes
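
To illustrate the MCMC idea, here is a minimal random-walk Metropolis sampler, the special case of Metropolis-Hastings with a symmetric proposal. It is applied to the coin posterior, where the exact answer is known and can serve as a check. The data (80 heads in 200 flips), the Beta(1, 1) prior, the step size, and the burn-in length are assumptions chosen for the sketch.

```python
import random
from math import log


def log_posterior(p, k=80, n=200, a=1.0, b=1.0):
    """Unnormalised log posterior: binomial likelihood times Beta(a, b) prior."""
    if not 0.0 < p < 1.0:
        return float("-inf")
    return (k + a - 1) * log(p) + (n - k + b - 1) * log(1 - p)


def metropolis(n_samples=20000, step=0.05, start=0.5, seed=0):
    """Random-walk Metropolis sampler for the coin parameter p."""
    rng = random.Random(seed)
    current, current_lp = start, log_posterior(start)
    samples = []
    for _ in range(n_samples):
        proposal = current + rng.gauss(0.0, step)  # symmetric proposal
        proposal_lp = log_posterior(proposal)
        # Accept with probability min(1, posterior ratio); 1 - random() is in (0, 1]
        if log(1.0 - rng.random()) < proposal_lp - current_lp:
            current, current_lp = proposal, proposal_lp
        samples.append(current)
    return samples


samples = metropolis()[2000:]  # discard burn-in
posterior_mean = sum(samples) / len(samples)
print(posterior_mean)
```

Only ratios of the posterior are needed, so the marginal likelihood never has to be evaluated. For this model the sample mean can be compared against the exact posterior mean 81/202.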
  45. Bayesian (Machine) Learning

  46. Bayesian Models

    • Example: Markov Chain Model
      – Dirichlet prior, Categorical Likelihood
    • Bayesian networks
    • Topic models (LDA)
    • Hierarchical Bayesian models
  47. Generalized Linear Model

    • Multiple linear regression
    • Logistic regression
    • Bayesian ANOVA
  48. Bayesian Statistical Tests

    • Alternatives to frequentist approaches
    • Bayesian correlation
    • Bayesian t-test
  49. Questions?

    Philipp Singer
    philipp.singer@gesis.org
    Image credit: talk of Mike West: http://www2.stat.duke.edu/~mw/ABS04/Lecture_Slides/4.Stats_Regression.pdf