
Introduction to Bayesian Inference: A Coin Flipping Example


Philipp Singer

May 13, 2016

Transcript

  1. Introduction to Bayesian Statistics Data Science Philipp Singer CC image courtesy of user mattbuck007 on Flickr Additional Jupyter notebook material: http://nbviewer.jupyter.org/github/psinger/notebooks/blob/master/bayesian_inference.ipynb
  2. 2 Conditional Probability

  3. 3 Conditional Probability • Probability of event A given that B is true • P(cough|cold) > P(cough) • Fundamental in probability theory

  4. 4 Before we start with Bayes ... • Another perspective on conditional probability • Conditional probability via growing trimmed trees • https://www.youtube.com/watch?v=Zxm4Xxvzohk
  5. 5 Bayes Theorem

  6. 6 Bayes Theorem • P(A|B) is conditional probability of observing A given B is true • P(B|A) is conditional probability of observing B given A is true • P(A) and P(B) are probabilities of A and B without conditioning on each other

  7. 7 Visualize Bayes Theorem Source: https://oscarbonilla.com/2009/05/visualizing-bayes-theorem/ (figure labels: All possible outcomes; Some event)

  8. 8 Visualize Bayes Theorem (figure labels: All people in study; People having cancer)

  9. 9 Visualize Bayes Theorem (figure labels: All people in study; People where screening test is positive)

  10. 10 Visualize Bayes Theorem (figure label: People having positive screening test and cancer)

  11. 11 Visualize Bayes Theorem • Given the test is positive, what is the probability that said person has cancer?

  12. 12 Visualize Bayes Theorem • Given the test is positive, what is the probability that said person has cancer?

  13. 13 Visualize Bayes Theorem • Given that someone has cancer, what is the probability that said person had a positive test?
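
The screening question on slides 11–13 can be made concrete with a quick Bayes' theorem calculation. This is a minimal sketch; the prevalence, sensitivity, and false-positive rate below are illustrative assumptions, not numbers from the deck:

```python
# P(cancer | positive test) via Bayes' theorem -- all numbers are assumed
p_cancer = 0.01             # prior prevalence (assumed)
p_pos_given_cancer = 0.9    # sensitivity (assumed)
p_pos_given_healthy = 0.1   # false-positive rate (assumed)

# law of total probability: P(positive)
p_pos = p_pos_given_cancer * p_cancer + p_pos_given_healthy * (1 - p_cancer)

# Bayes' theorem: P(cancer | positive) = P(positive | cancer) * P(cancer) / P(positive)
p_cancer_given_pos = p_pos_given_cancer * p_cancer / p_pos
print(p_cancer_given_pos)   # ~0.083: most positive tests are false positives here
```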
  14. 14 Example: Fake coin • Two coins – One fair – One unfair • What is the probability of having the fair coin after flipping Heads? CC image courtesy of user pagedooley on Flickr
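
A minimal sketch of this calculation; the transcript does not state the unfair coin's bias, so a two-headed coin (P(Heads | unfair) = 1) is assumed here:

```python
# P(fair | Heads) for two equally likely coins; the fake coin's bias is an assumption
p_fair = 0.5            # prior: one of the two coins is picked at random
p_heads_fair = 0.5
p_heads_unfair = 1.0    # ASSUMPTION: the fake coin always shows Heads

p_heads = p_heads_fair * p_fair + p_heads_unfair * (1 - p_fair)
p_fair_given_heads = p_heads_fair * p_fair / p_heads
print(p_fair_given_heads)   # 1/3 under these assumptions
```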
  15. 15 Example: Fake coin CC image courtesy of user pagedooley on Flickr

  16. 16 Example: Fake coin CC image courtesy of user pagedooley on Flickr

  17. 17 Update of beliefs • Allows new evidence to update beliefs • Prior can also be posterior of previous update

  18. 18 Example: Fake coin CC image courtesy of user pagedooley on Flickr • Belief update • What is the probability of seeing a fair coin after we have already seen one Heads?
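
Continuing the sketch above (same assumed coins), the posterior after the first Heads simply becomes the prior for the next flip:

```python
# sequential belief update: the posterior of one step is the prior of the next
p_heads_fair, p_heads_unfair = 0.5, 1.0   # ASSUMPTION: two-headed fake coin
p_fair = 0.5                              # initial prior

for flip in range(2):                     # observe Heads twice in a row
    p_heads = p_heads_fair * p_fair + p_heads_unfair * (1 - p_fair)
    p_fair = p_heads_fair * p_fair / p_heads
    print(f"P(fair) after {flip + 1} Heads: {p_fair:.3f}")   # 0.333, then 0.200
```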
  19. 19 Bayesian Inference

  20. 20 Source: https://xkcd.com/1132/

  21. 21 Bayesian Inference • Statistical inference of parameters (figure labels: Parameters, Data, Additional knowledge)

  22. 22 Frequentist vs. Bayesian statistics • Frequentist – There is a true parameter that is fixed – Data is random – Repeated measurements (frequencies) – Point estimates • Bayesian – True parameter drawn from probability distribution – Data is fixed – Degrees of certainty – Probabilistic statements http://jakevdp.github.io/blog/2014/03/11/frequentism-and-bayesianism-a-practical-intro/

  23. 23 Coin flip example • Flip a coin several times • Which model? • How can we fit the model? • Confidence vs. credible intervals • Prediction • Hypothesis testing (data: 80 Heads, 120 Tails)
  24. 24 Binomial model: frequentist perspective • Probability p of flipping heads • Flipping tails: 1-p • Binomial model

  25. 25 Model fitting: frequentist approach • Parameter is fixed, data is random • Find estimate for parameter (point estimate) • Maximum likelihood estimation (MLE) – Estimate parameter by maximizing likelihood – Covered in previous lectures
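
For the binomial model the MLE has a closed form, the sample proportion. A small sketch with the running 80/200 example:

```python
import numpy as np
from scipy.stats import binom

k, n = 80, 200                      # heads, total flips (running example)
p_mle = k / n                       # closed-form MLE for the binomial model
print(p_mle)                        # 0.4

# sanity check: the binomial log-likelihood over a grid peaks at the same value
grid = np.linspace(0.001, 0.999, 999)
print(grid[np.argmax(binom.logpmf(k, n, grid))])   # ~0.4
```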
  26. 26 Binomial model: Bayesian perspective • Probability p of flipping heads • Flipping tails: 1-p • Binomial model

  27. 27 Full Bayesian model (formula labels: Binomial distribution, Beta distribution, Beta distribution) • Posterior combines our prior belief about the parameters with observed data. This allows us to make probabilistic statements about the parameters. Data is fixed, parameters are random!

  28. 28 Prior • Prior belief about parameter(s) • Conjugate prior – Posterior of same distribution as prior – Beta distribution conjugate to binomial • Beta prior

  29. 29 Beta distribution • Continuous probability distribution • Interval [0,1] • Two shape parameters: α and β – If ≥ 1, can be interpreted as pseudo-counts – α would refer to flipping heads
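
Conjugacy makes the posterior available in closed form: with a Beta(α, β) prior and k heads in n flips, the posterior is Beta(α + k, β + n − k). A short sketch:

```python
from scipy.stats import beta

a, b = 1, 1                           # uniform Beta prior (pseudo-counts)
k, n = 80, 200                        # observed heads / total flips
posterior = beta(a + k, b + (n - k))  # conjugate update -> Beta(81, 121)
print(posterior.mean())               # ~0.401
```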
  30. 30 Beta distribution

  31. 31 Beta distribution

  32. 32 Beta distribution

  33. 33 Beta distribution

  34. 34 Beta distribution

  35. 35 Model fitting: Bayesian approach • Data fixed, parameter chosen from probability distribution • “Learn” posterior • Posterior also Beta distribution • For the exact derivation: http://www.cs.cmu.edu/~10701/lecture/technote2_betabinomial.pdf

  36. 36 Posterior • Posterior – 80 Heads, 120 Tails – Uniform Beta prior: α=1 and β=1 – Update posterior (stepwise) This slide contains a video not working in the PDF version! The static output does not represent the final posterior, which is α=81 and β=121.

  37. 37 Posterior • Posterior – 80 Heads, 120 Tails – Biased Beta prior: α=50 and β=10 – Update posterior (stepwise) This slide contains a video not working in the PDF version! The static output does not represent the final posterior, which is α=130 and β=130.
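
A sketch of the stepwise updating shown in the (missing) videos: every observed flip simply increments the matching pseudo-count, ending at the posteriors quoted above:

```python
flips = ["H"] * 80 + ["T"] * 120              # 80 Heads, 120 Tails

for a0, b0 in [(1, 1), (50, 10)]:             # the two priors from slides 36/37
    a, b = a0, b0
    for f in flips:                           # one Bayesian update per observed flip
        if f == "H":
            a += 1
        else:
            b += 1
    print(f"prior Beta({a0},{b0}) -> posterior Beta({a},{b})")
# prior Beta(1,1)   -> posterior Beta(81,121)
# prior Beta(50,10) -> posterior Beta(130,130)
```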
  38. 38 Posterior • Convex combination of prior and data • The stronger our prior belief, the more data we need to overrule the prior • The less prior belief we have, the quicker the data overrules the prior
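
The "convex combination" can be made explicit: the posterior mean is a weighted average of the prior mean and the MLE, with the prior's weight shrinking as n grows. A quick check with the Beta(50,10) prior:

```python
a, b, k, n = 50, 10, 80, 200
prior_mean = a / (a + b)                        # ~0.833
mle = k / n                                     # 0.4
w = (a + b) / (a + b + n)                       # weight of the prior
print(w * prior_mean + (1 - w) * mle)           # 0.5
print((a + k) / (a + b + n))                    # 0.5, the posterior mean computed directly
```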
  39. 39 Confidence vs. credible intervals • Confidence interval (frequentist) – There is a true (fixed) unknown population parameter h – Derive confidence interval from sample – Intervals constructed this way will contain h 95% of the time – Again: parameter fixed, data random • Credible interval (Bayesian) – Parameter random, data fixed – Probabilistic statements about parameter – 95% credible interval: the parameter lies in the interval with 95% probability, given the data

  40. 40 Confidence vs. credible intervals • Confidence interval – Uncertainty about the interval we obtained • Credible interval – Uncertainty about the parameter • Sources: – http://jakevdp.github.io/blog/2014/06/12/frequentism-and-bayesianism-3-confidence-credibility/ – http://stats.stackexchange.com/questions/2272/whats-the-difference-between-a-confidence-interval-and-a-credible-interval – https://zenodo.org/record/16991 – http://freakonometrics.hypotheses.org/18117

  41. 41 Confidence vs. credible intervals • Confidence interval – 95%: [0.33, 0.47] • Credible interval – Directly from posterior – 95%: (0.33, 0.47)

  42. 42 Confidence vs. credible intervals • Confidence interval – 95%: [0.33, 0.47] • Credible interval – Directly from posterior – 95%: (0.33, 0.47) • Credible interval: given the observed data, there is a 95% probability that the true value of p falls within the credible interval • Confidence interval: there is a 95% probability that when I create confidence intervals of this sort, the CI will include the population parameter p

  43. 43 Confidence vs. credible intervals • Confidence interval – 95%: [0.33, 0.47] • Credible interval – Directly from posterior – 95%: (0.33, 0.47) • Confidence and credible intervals are not always equal! See: http://jakevdp.github.io/blog/2014/06/12/frequentism-and-bayesianism-3-confidence-credibility/ http://bayes.wustl.edu/etj/articles/confidence.pdf
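
Both interval types for the running example, as a sketch with scipy; the frequentist interval below uses the normal approximation (one common construction, not necessarily the one used for the slides):

```python
import numpy as np
from scipy.stats import beta

k, n = 80, 200
p_hat = k / n

# frequentist 95% CI via the normal approximation
se = np.sqrt(p_hat * (1 - p_hat) / n)
print((p_hat - 1.96 * se, p_hat + 1.96 * se))        # ~(0.33, 0.47)

# Bayesian 95% equal-tailed credible interval from the Beta(81,121) posterior
print(beta(1 + k, 1 + n - k).interval(0.95))         # ~(0.33, 0.47)
```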
  44. 44 So is the coin fair? Frequentist approach • Null hypothesis test • Binomial test with null p=0.5 – one-tailed p ≈ 0.0028 → reject null hypothesis • Alternative: Chi² test
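
A sketch of both tests with scipy (binomtest requires scipy ≥ 1.7):

```python
from scipy.stats import binomtest, chisquare

# one-tailed exact binomial test of H0: p = 0.5
print(binomtest(80, n=200, p=0.5, alternative='less').pvalue)   # ~0.003 -> reject H0

# alternative mentioned on the slide: chi-squared goodness-of-fit test (two-sided)
print(chisquare([80, 120], f_exp=[100, 100]).pvalue)            # ~0.005
```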
  45. 45 So is the coin fair? Bayesian approach • Examine posterior – 95% highest posterior density interval (HDI) – ROPE [1]: Region of practical equivalence for null hypothesis – Fair coin: [0.45, 0.55] • 95% HDI: (0.33, 0.47) • Cannot reject null • With more samples → we could [1] Kruschke, John. Doing Bayesian Data Analysis: A Tutorial with R, JAGS, and Stan. Academic Press, 2014.
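
A sketch of the ROPE check; the equal-tailed 95% interval is used here as a stand-in for the HDI (they are nearly identical for this close-to-symmetric posterior):

```python
from scipy.stats import beta

lo, hi = beta(81, 121).interval(0.95)     # ~(0.33, 0.47)
rope = (0.45, 0.55)                       # region of practical equivalence for a fair coin

overlaps = hi >= rope[0] and lo <= rope[1]
print("cannot reject null" if overlaps else "reject null")   # intervals overlap -> cannot reject
```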
  46. 46 Bayesian Model Comparison • Parameters marginalized out • Average of likelihood weighted by prior (the “evidence”)

  47. 47 Bayesian Model Comparison • Bayes factors [1] • Ratio of marginal likelihoods • Interpretation table by Kass & Raftery [1] • >100 → decisive evidence against M2 [1] Kass, Robert E., and Adrian E. Raftery. "Bayes factors." Journal of the American Statistical Association 90.430 (1995): 773-795.

  48. 48 So is the coin fair? • Null hypothesis • Alternative hypothesis – Anything is possible – Beta(1,1) • Bayes factor

  49. 49 So is the coin fair? • n = 200 • k = 80 • Bayes factor • (Decent) preference for alt. hypothesis
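
A sketch of this Bayes factor: the null's evidence is just the binomial pmf at p = 0.5, and with a Beta(1,1) prior the alternative's evidence is the beta-binomial marginal C(n,k)·B(k+1, n−k+1)/B(1,1); the ratio comes out around 5, i.e. positive but not strong evidence for the alternative on the Kass & Raftery scale:

```python
import numpy as np
from scipy.stats import binom
from scipy.special import comb, betaln

k, n = 80, 200

m_null = binom.pmf(k, n, 0.5)                                         # evidence of the fair-coin model
m_alt = comb(n, k) * np.exp(betaln(k + 1, n - k + 1) - betaln(1, 1))  # evidence under a Beta(1,1) prior

print(m_alt / m_null)   # ~5: "positive" evidence for the alternative model
```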
  50. 50 Other priors • Prior can encode theories / hypotheses • Biased hypothesis: Beta(101,11) • Haldane prior: Beta(0.001, 0.001) – u-shaped – high probability on p=1 or (1-p)=1
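
How strongly each prior is supported by the data can be compared through the same marginal-likelihood formula; a sketch for the three priors mentioned so far (uniform, biased, Haldane):

```python
import numpy as np
from scipy.special import comb, betaln

def evidence(k, n, a, b):
    """Beta-binomial marginal likelihood: C(n,k) * B(k+a, n-k+b) / B(a, b)."""
    return comb(n, k) * np.exp(betaln(k + a, n - k + b) - betaln(a, b))

k, n = 80, 200
for a, b in [(1, 1), (101, 11), (0.001, 0.001)]:     # uniform, biased, Haldane
    print((a, b), evidence(k, n, a, b))
# the biased Beta(101,11) prior concentrates near p ~ 0.9 and is heavily
# punished by the 80/200 data; its evidence is far below the uniform prior's
```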
  51. 51 Prediction: Frequentist approach • Predict based on MLE • Example: – Training: 80 Heads, 120 Tails – Test/Prediction: 2 Heads, 0 Tails
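
A sketch of the plug-in prediction: with p̂ = 0.4, the probability of the test sequence (two Heads) is simply p̂²:

```python
k, n = 80, 200
p_mle = k / n
print(p_mle ** 2)   # 0.16: plug-in probability of observing 2 Heads, 0 Tails
```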
  52. 52 Prediction: Bayesian approach • Posterior mean • If data is large → converges to MLE • MAP: Maximum a posteriori – Bayesian estimator – uses mode

  53. 53 Prediction: Bayesian approach • Posterior predictive distribution • Distribution of unseen observations conditioned on observed data (train, test)
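
A sketch for the same 2-Heads test set: integrating p² over the Beta(81,121) posterior gives the posterior predictive probability, slightly above the plug-in 0.16:

```python
import numpy as np
from scipy.special import betaln

a, b = 81, 121                                   # posterior after 80 H / 120 T with a Beta(1,1) prior
print(a / (a + b))                               # posterior mean ~0.401

# posterior predictive probability of 2 Heads in 2 new flips: E[p^2] under Beta(a,b)
print((a / (a + b)) * ((a + 1) / (a + b + 1)))   # ~0.162
print(np.exp(betaln(a + 2, b) - betaln(a, b)))   # same value, via beta functions
```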
  54. 54 Alternative Bayesian Inference • Often marginal likelihood not easy to evaluate – No analytical solution – Numerical integration expensive • Alternatives – Monte Carlo integration • Markov Chain Monte Carlo (MCMC) • Gibbs sampling • Metropolis-Hastings algorithm – Laplace approximation – Variational Bayes
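
A minimal Metropolis-Hastings sketch for the coin posterior (unnecessary here, since the Beta posterior is analytic, but it shows the mechanics):

```python
import numpy as np

rng = np.random.default_rng(0)
k, n = 80, 200

def log_post(p):
    # log binomial likelihood + flat Beta(1,1) prior, up to an additive constant
    if p <= 0 or p >= 1:
        return -np.inf
    return k * np.log(p) + (n - k) * np.log(1 - p)

samples, p = [], 0.5
for _ in range(20000):
    prop = p + rng.normal(scale=0.05)                      # random-walk proposal
    if np.log(rng.uniform()) < log_post(prop) - log_post(p):
        p = prop                                           # accept, otherwise keep current p
    samples.append(p)

burned = samples[2000:]                       # discard burn-in
print(np.mean(burned))                        # ~0.40, close to the Beta(81,121) mean
print(np.percentile(burned, [2.5, 97.5]))     # ~ the (0.33, 0.47) credible interval
```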
  55. 55 Generalized Linear Model • Multiple linear regression • Logistic regression • Bayesian ANOVA

  56. 56 Bayesian Statistical Tests • Alternatives to frequentist approaches • Bayesian correlation • Bayesian t-test

  57. 57 Resources • Harvard Data Science Course Lectures 16/17 http://cs109.github.io/2015/pages/videos.html • Doing Bayesian Data Analysis • Bayesian Methods for Hackers https://github.com/CamDavidsonPilon/Probabilistic-Programming-and-Bayesian-Methods-for-Hackers • Google :)
  58. 58

  59. 59 Questions? Philipp Singer philipp.singer@gesis.org Image credit: talk of Mike West: http://www2.stat.duke.edu/~mw/ABS04/Lecture_Slides/4.Stats_Regression.pdf