Introduction to Bayesian Statistics

Philipp Singer

April 20, 2016

Transcript

  1. Introduction to Bayesian Statistics
    Machine Learning and Data Mining
    Philipp Singer
    CC image courtesy of user mattbuck007 on Flickr

  2. Conditional Probability

  3. Conditional Probability

    Probability of event A given that B is true

    P(cough|cold) > P(cough)

    Fundamental in probability theory
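
For reference, the standard definition behind this slide can be written out as follows. The symbols A and B are the slide's own; the side condition P(B) > 0 is the usual textbook requirement and is added here as an assumption.

```latex
P(A \mid B) = \frac{P(A \cap B)}{P(B)}, \qquad P(B) > 0
```

In the cough example, conditioning on "cold" restricts attention to people who have a cold, among whom coughing is more common than in the population at large.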

  4. Before we start with Bayes ...

    Another perspective on conditional probability

    Conditional probability via growing trimmed trees

    https://www.youtube.com/watch?v=Zxm4Xxvzohk

  5. Bayes Theorem

  6. Bayes Theorem

    P(A|B) is conditional probability of observing A
    given B is true

    P(B|A) is conditional probability of observing B
    given A is true

    P(A) and P(B) are probabilities of A and B without
    conditioning on each other
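
The formula these terms belong to did not survive in the transcript. The standard statement of Bayes' theorem, using the same symbols, is:

```latex
P(A \mid B) = \frac{P(B \mid A)\, P(A)}{P(B)}
```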

  7. Visualize Bayes Theorem
    Source: https://oscarbonilla.com/2009/05/visualizing-bayes-theorem/
    [Figure labels: all possible outcomes; some event]

  8. Visualize Bayes Theorem
    [Figure labels: all people in study; people having cancer]

  9. Visualize Bayes Theorem
    [Figure labels: all people in study; people whose screening test is positive]

  10. Visualize Bayes Theorem
    [Figure label: people having positive screening test and cancer]

  11. Visualize Bayes Theorem

    Given that the test is positive, what is the probability that the
    person has cancer?

  12. Visualize Bayes Theorem

    Given that the test is positive, what is the probability that the
    person has cancer?

  13. Visualize Bayes Theorem

    Given that someone has cancer, what is the probability that the
    person had a positive test?
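
The two questions above can be made concrete with a small sketch. All counts below are hypothetical and chosen only for illustration; the slides do not give study numbers.

```python
# Hypothetical study counts (assumed for illustration, not from the slides).
n_people = 1000      # all people in the study
n_cancer = 10        # people having cancer
n_positive = 100     # people whose screening test is positive
n_both = 8           # people with a positive screening test and cancer

p_cancer = n_cancer / n_people
p_positive = n_positive / n_people
p_both = n_both / n_people

# Conditional probabilities as ratios of areas in the diagram.
p_cancer_given_positive = p_both / p_positive   # P(cancer | positive)
p_positive_given_cancer = p_both / p_cancer     # P(positive | cancer)

# Bayes' theorem links the two directions.
via_bayes = p_positive_given_cancer * p_cancer / p_positive

print(f"P(cancer | positive) = {p_cancer_given_positive:.2f}")
print(f"P(positive | cancer) = {p_positive_given_cancer:.2f}")
```

With these made-up counts the two conditional probabilities differ by a factor of ten, which is the point of asking both questions separately.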

  14. Example: Fake coin

    Two coins
    – One fair
    – One unfair

    What is the probability of having the fair coin
    after flipping Heads?
    CC image courtesy of user pagedooley on Flickr

  15. Example: Fake coin

  16. Example: Fake coin

  17. Update of beliefs

    Bayes theorem allows new evidence to update beliefs

    The prior can also be the posterior of a previous update

  18. Example: Fake coin

    Belief update

    What is the probability of having the fair coin after we
    have already seen one Heads?
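
A minimal sketch of the fake-coin update, under two assumptions that the transcript does not state: each coin is picked with probability 0.5, and the unfair coin always shows Heads. The function name `update` is hypothetical.

```python
def update(prior_fair, p_heads_fair=0.5, p_heads_unfair=1.0):
    """Return P(fair | Heads) from P(fair) using Bayes' theorem."""
    evidence = prior_fair * p_heads_fair + (1 - prior_fair) * p_heads_unfair
    return prior_fair * p_heads_fair / evidence

# Assumption: both coins are equally likely to be picked, and the
# unfair coin always lands Heads (its bias is not given in the slides).
after_one_heads = update(0.5)
after_two_heads = update(after_one_heads)   # the posterior becomes the new prior

print(f"P(fair | H)    = {after_one_heads:.3f}")
print(f"P(fair | H, H) = {after_two_heads:.3f}")
```

Feeding the first posterior back in as the prior is the belief update described on the previous slide.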

  19. Bayesian Inference

  20. Source: https://xkcd.com/1132/

  21. Bayesian Inference

    Statistical inference of parameters
    [Equation labels: parameters; data; additional knowledge]
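
The labelled equation itself is missing from the transcript. A plausible reconstruction, assuming the symbols θ for the parameters, D for the data and I for the additional knowledge (none of which are confirmed by the source), is Bayes' theorem applied to parameters:

```latex
\underbrace{P(\theta \mid D, I)}_{\text{posterior}}
  = \frac{\overbrace{P(D \mid \theta, I)}^{\text{likelihood}}\;
          \overbrace{P(\theta \mid I)}^{\text{prior}}}
         {\underbrace{P(D \mid I)}_{\text{evidence}}}
```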

  22. Coin flip example

    Flip a coin several times

    Is it fair?

    Let's use Bayesian inference

  23. Binomial model

    Probability p of flipping heads

    Flipping tails: 1-p

    Binomial model
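
The model formula is not in the transcript. The standard binomial likelihood, writing n for the number of flips and k for the number of heads (symbols assumed here), is:

```latex
P(k \mid n, p) = \binom{n}{k}\, p^{k} (1 - p)^{n - k}
```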

  24. Prior

    Prior belief about parameter(s)

    Conjugate prior
    – Posterior is in the same distribution family as the prior
    – Beta distribution is conjugate to the binomial

    Beta prior

  25. Beta distribution

    Continuous probability distribution

    Interval [0,1]

    Two shape parameters: α and β
    – If >= 1, interpret as pseudo counts
    – α would refer to flipping heads
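
For reference, the standard Beta density with shape parameters α and β, written for the heads probability p, is:

```latex
\mathrm{Beta}(p \mid \alpha, \beta)
  = \frac{\Gamma(\alpha + \beta)}{\Gamma(\alpha)\,\Gamma(\beta)}\,
    p^{\alpha - 1} (1 - p)^{\beta - 1}, \qquad p \in [0, 1]
```

Under the pseudo-count reading, β plays the same role for tails that α plays for heads.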

  26. Beta distribution

  27. Beta distribution

  28. Beta distribution

  29. Beta distribution

  30. Beta distribution

  31. Posterior

    Posterior is also a Beta distribution

    For the exact derivation:
    http://www.cs.cmu.edu/~10701/lecture/technote2_betabinomial.pdf
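
A short sketch of why the posterior stays in the Beta family, assuming k heads in n flips and a Beta(α, β) prior (normalizing constants are dropped):

```latex
\begin{aligned}
P(p \mid k, n) &\propto p^{k} (1 - p)^{n - k} \cdot p^{\alpha - 1} (1 - p)^{\beta - 1} \\
               &= p^{\alpha + k - 1} (1 - p)^{\beta + n - k - 1} \\
\Rightarrow\quad p \mid k, n &\sim \mathrm{Beta}(\alpha + k,\; \beta + n - k)
\end{aligned}
```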

  32. Posterior

    Assume
    – Binomial p = 0.4
    – Uniform Beta prior: α=1 and β=1
    – 200 random variates from binomial distribution (Heads=80)
    – Update posterior

  33. Posterior

    Assume
    – Binomial p = 0.4
    – Biased Beta prior: α=50 and β=10
    – 200 random variates from binomial distribution (Heads=80)
    – Update posterior
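
The two posterior updates (uniform prior and biased prior) can be sketched with the conjugate rule. The helper names `beta_binomial_update` and `beta_mean` are hypothetical; the numbers are the ones on the slides.

```python
def beta_binomial_update(alpha, beta, heads, flips):
    """Conjugate update: Beta(alpha, beta) prior -> Beta posterior."""
    return alpha + heads, beta + (flips - heads)

def beta_mean(alpha, beta):
    """Mean of a Beta(alpha, beta) distribution."""
    return alpha / (alpha + beta)

heads, flips = 80, 200   # data from the slides

uniform_post = beta_binomial_update(1, 1, heads, flips)    # Beta(81, 121)
biased_post = beta_binomial_update(50, 10, heads, flips)   # Beta(130, 130)

print(uniform_post, round(beta_mean(*uniform_post), 3))    # (81, 121) 0.401
print(biased_post, round(beta_mean(*biased_post), 3))      # (130, 130) 0.5
```

With the biased prior, the 60 pseudo counts still pull the posterior mean from 0.4 up to 0.5 after 200 flips.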

  34. Posterior

    Convex combination of prior and data

    The stronger our prior belief, the more data we
    need to overrule the prior

    The less prior belief we have, the quicker the
    data overrules the prior
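
One way to make the "convex combination" explicit, assuming a Beta(α, β) prior and k heads in n flips, is to split the posterior mean into a prior part and a data part:

```latex
\mathbb{E}[p \mid k, n]
  = \frac{\alpha + k}{\alpha + \beta + n}
  = \underbrace{\frac{\alpha + \beta}{\alpha + \beta + n}}_{w}\,
    \frac{\alpha}{\alpha + \beta}
  + \underbrace{\frac{n}{\alpha + \beta + n}}_{1 - w}\,
    \frac{k}{n}
```

The weight w on the prior mean shrinks as n grows relative to the pseudo counts α + β.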

  35. So is the coin fair?

    Examine posterior
    – 95% highest density interval (HDI) of the posterior
    – ROPE [1]: Region of practical equivalence for null hypothesis
    – Fair coin: [0.45,0.55]

    95% HDI: (0.33, 0.47)

    HDI overlaps the ROPE: cannot reject null

    With more samples, we could
    [1] Kruschke, John. Doing Bayesian data analysis: A tutorial
    with R, JAGS, and Stan. Academic Press, 2014.
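
A rough sketch of the HDI check, assuming the posterior is the Beta(81, 121) obtained from the uniform prior and 80 heads in 200 flips. The grid-based `hdi` helper is a hypothetical illustration, not the method used for the slide; it should land close to the reported (0.33, 0.47).

```python
import math

def beta_pdf(x, a, b):
    """Density of Beta(a, b) at x, computed via log-gamma for stability."""
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(log_norm + (a - 1) * math.log(x) + (b - 1) * math.log(1 - x))

def hdi(a, b, mass=0.95, grid=20000):
    """Approximate the highest density interval of a unimodal Beta(a, b)."""
    width = 1.0 / grid
    xs = [(i + 0.5) * width for i in range(grid)]   # cell midpoints in (0, 1)
    dens = [beta_pdf(x, a, b) for x in xs]
    order = sorted(range(grid), key=lambda i: dens[i], reverse=True)
    total, kept = 0.0, []
    for i in order:                                  # add the densest cells first
        kept.append(xs[i])
        total += dens[i] * width
        if total >= mass:
            break
    return min(kept), max(kept)

low, high = hdi(81, 121)     # assumed posterior: uniform prior, 80/200 heads
rope = (0.45, 0.55)          # fair-coin ROPE from the slide
overlaps = low <= rope[1] and high >= rope[0]
print(round(low, 2), round(high, 2), overlaps)
```

Because the interval and the ROPE overlap, the ROPE rule leaves the question undecided rather than rejecting the null.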

  36. Bayesian Model Comparison

    Evidence (marginal likelihood)
    – Parameters marginalized out
    – Average of likelihood weighted by prior
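
The evidence formula is missing from the transcript. In standard notation, assuming θ for the parameters of model M and D for the data, it reads:

```latex
P(D \mid M) = \int P(D \mid \theta, M)\, P(\theta \mid M)\, d\theta
```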

  37. Bayesian Model Comparison

    Bayes factors [1]

    Ratio of marginal likelihoods

    Interpretation table by Kass & Raftery [1]

    >100 → decisive evidence against M2
    [1] Kass, Robert E., and Adrian E. Raftery. "Bayes factors."
    Journal of the American Statistical Association 90.430 (1995): 773-795.
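
Written out, with M1 and M2 as the two models being compared (the subscript notation is an assumption), the Bayes factor is:

```latex
B_{12} = \frac{P(D \mid M_1)}{P(D \mid M_2)}
```

Values above 1 favor M1, values below 1 favor M2.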

  38. So is the coin fair?

    Null hypothesis

    Alternative hypothesis
    – Anything is possible
    – Beta(1,1)

    Bayes factor

  39. So is the coin fair?

    n = 200

    k = 80

    Bayes factor

    (Decent) preference for the alternative hypothesis
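
The Bayes factor value is not in the transcript. A minimal sketch of how it could be computed, assuming the null is the point hypothesis p = 0.5 and the alternative is the Beta(1,1) prior from the previous slide:

```python
from math import comb

n, k = 200, 80   # flips and heads from the slide

# Null hypothesis: p fixed at 0.5 (assumed form of the fair-coin null).
evidence_null = comb(n, k) * 0.5 ** n

# Alternative: p ~ Beta(1, 1). Integrating the binomial likelihood over a
# uniform prior gives 1 / (n + 1) for every k.
evidence_alt = 1 / (n + 1)

bayes_factor = evidence_alt / evidence_null
print(f"Bayes factor (alternative vs. null): {bayes_factor:.2f}")
```

Under these assumptions the factor comes out a little below 5, which fits the slide's description of a decent but not overwhelming preference.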

  40. Other priors

    Priors can encode hypotheses (theories)

    Biased hypothesis: Beta(101,11)

    Haldane prior, approximated by Beta(0.001, 0.001)
    – U-shaped
    – high probability on p=1 or (1-p)=1

  41. Frequentist approach

    So is the coin fair?

    Binomial test with null p=0.5
    – one-tailed
    – p-value: 0.0028

    Chi² test
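
The one-tailed binomial p-value can be checked with a few lines of exact arithmetic, assuming the same data as before (80 heads in 200 flips) and the lower tail as the direction of the test:

```python
from math import comb

n, k = 200, 80   # flips and heads (assumed to match the earlier slides)

# One-tailed p-value: probability of k or fewer heads in n flips of a
# fair coin (p = 0.5), summed exactly from the binomial distribution.
p_value = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
print(f"one-tailed p-value: {p_value:.4f}")
```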

  42. Posterior prediction

    Posterior mean

    With a large amount of data, it converges to the MLE

    MAP: Maximum a posteriori
    – Bayesian estimator
    – uses the posterior mode
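
A small sketch comparing the three point estimates for the coin example, assuming the uniform Beta(1,1) prior and 80 heads in 200 flips. The mode formula used for the MAP holds when both posterior shape parameters exceed 1.

```python
alpha, beta = 1, 1   # uniform Beta prior from the earlier slides
n, k = 200, 80       # flips and heads

posterior_mean = (alpha + k) / (alpha + beta + n)
map_estimate = (alpha + k - 1) / (alpha + beta + n - 2)   # mode of the Beta posterior
mle = k / n

print(round(posterior_mean, 4), map_estimate, mle)
```

With a uniform prior the MAP coincides with the MLE, while the posterior mean is pulled very slightly toward 0.5.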

  43. Bayesian prediction

    Posterior predictive distribution

    Distribution of unobserved observations
    conditioned on observed data (train, test)

    Frequentist: MLE
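
For the coin example, the posterior predictive probability of the next flip being Heads can be sketched as follows, assuming a Beta(α, β) prior and k heads in n flips (the symbol \tilde{x} for the unobserved flip is an assumption):

```latex
P(\tilde{x} = \text{Heads} \mid D)
  = \int_0^1 p \; P(p \mid D)\, dp
  = \frac{\alpha + k}{\alpha + \beta + n}
```

The frequentist counterpart plugs in the MLE and predicts Heads with probability k/n.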

  44. Alternative Bayesian Inference

    Often marginal likelihood not easy to evaluate
    – No analytical solution
    – Numerical integration expensive

    Alternatives
    – Monte Carlo integration
      · Markov Chain Monte Carlo (MCMC)
      · Gibbs sampling
      · Metropolis-Hastings algorithm
    – Laplace approximation
    – Variational Bayes
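
To illustrate the MCMC idea, here is a minimal random-walk Metropolis sketch for the coin posterior. It assumes the earlier data (80 heads in 200 flips) and a Beta(1,1) prior; the function names, step size, sample count and burn-in are arbitrary illustrative choices. The sampler only needs the unnormalized posterior, so the marginal likelihood never has to be evaluated.

```python
import math
import random

def log_posterior(p, heads=80, flips=200, alpha=1.0, beta=1.0):
    """Unnormalized log posterior: binomial likelihood times Beta prior."""
    if not 0.0 < p < 1.0:
        return -math.inf
    return ((heads + alpha - 1) * math.log(p)
            + (flips - heads + beta - 1) * math.log(1 - p))

def metropolis(n_samples=20000, step=0.05, start=0.5, seed=0):
    """Random-walk Metropolis sampler with a symmetric Gaussian proposal."""
    rng = random.Random(seed)
    current, current_lp = start, log_posterior(start)
    samples = []
    for _ in range(n_samples):
        proposal = current + rng.gauss(0.0, step)
        proposal_lp = log_posterior(proposal)
        # Accept with probability min(1, posterior ratio).
        accept_prob = math.exp(min(0.0, proposal_lp - current_lp))
        if rng.random() < accept_prob:
            current, current_lp = proposal, proposal_lp
        samples.append(current)
    return samples

samples = metropolis()[2000:]            # drop burn-in
estimate = sum(samples) / len(samples)
print(estimate)                          # should be near 81/202
```

For this conjugate model the exact posterior is known, which makes it a convenient sanity check before applying the same sampler to models without an analytical solution.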

  45. Bayesian (Machine) Learning

  46. Bayesian Models

    Example: Markov Chain Model
    – Dirichlet prior, Categorical Likelihood

    Bayesian networks

    Topic models (LDA)

    Hierarchical Bayesian models

  47. Generalized Linear Model

    Multiple linear regression

    Logistic regression

    Bayesian ANOVA

  48. Bayesian Statistical Tests

    Alternatives to frequentist approaches

    Bayesian correlation

    Bayesian t-test

  49. Questions?
    Philipp Singer
    [email protected]
    Image credit: talk of Mike West: http://www2.stat.duke.edu/~mw/ABS04/Lecture_Slides/4.Stats_Regression.pdf
