Slide 1

Jake VanderPlas SciPy 2014

Slide 2

What this talk is…
-  An introduction to the essential differences between frequentist & Bayesian analyses.
-  A brief discussion of tools available in Python to perform these analyses.
-  A thinly-veiled argument for the use of Bayesian methods in science.

Slide 3

What this talk is not… A complete discussion of frequentist/Bayesian statistics & the associated examples. (For more detail, see the accompanying SciPy proceedings paper & the references within.)

Slide 4

The frequentist/Bayesian divide is fundamentally a question of philosophy: the definition of probability.

Slide 5

What is probability?
-  Frequentists: fundamentally related to the frequencies of repeated events.
-  Bayesians: fundamentally related to our own certainty or uncertainty about events.

Slide 6

Thus we analyze…
-  Frequentists: variation of data & derived quantities, in terms of fixed model parameters.
-  Bayesians: variation of beliefs about parameters, in terms of fixed observed data.

Slide 7

Simple Example: Photon Flux
Given the observed data, what is the best estimate of the true value?

Slide 8

Frequentist Approach: Maximum Likelihood
Model: each observation F_i is drawn from a Gaussian of width e_i centered on the true flux F.

Slide 9

Building the Likelihood…
L(F) = ∏_i (2π e_i²)^(−1/2) exp( −(F_i − F)² / 2e_i² )

Slide 22

“Maximum Likelihood” estimate…

Slide 24

Analytically maximize to find the Frequentist point estimate: F̂ = Σ w_i F_i / Σ w_i with w_i = 1/e_i², and σ_F̂ = (Σ w_i)^(−1/2). For our 30 data points, we find 999 ± 4. In Python:
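A minimal sketch of this computation in NumPy (the data here are simulated stand-ins for the talk's 30 points, with an assumed true flux of 1000, so the exact numbers will differ from the slide's 999 ± 4):

```python
import numpy as np

rng = np.random.default_rng(42)

# Simulated stand-in data: 30 flux measurements F_i with errors e_i
# (hypothetical; the true flux is assumed to be 1000).
F_true, N = 1000, 30
e = np.sqrt(F_true) * np.ones(N)     # Poisson-like measurement errors
F = rng.normal(F_true, e)

# Maximum-likelihood estimate: the inverse-variance weighted mean.
w = 1.0 / e ** 2
F_hat = np.sum(w * F) / np.sum(w)    # point estimate
sigma_F = np.sum(w) ** -0.5          # its 1-sigma uncertainty

print(f"F = {F_hat:.0f} +/- {sigma_F:.0f}")
```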

Slide 25

Bayesian Approach: Posterior Probability
Compute our knowledge of F given the data, encoded as a probability: P(F | D). To compute this, we use Bayes’ Theorem.

Slide 27

Bayes’ Theorem: P(F | D) = P(D | F) P(F) / P(D), i.e. posterior = likelihood × prior / model evidence. (The evidence is often simply a normalization, but it is useful for model evaluation, etc.) Again, we find 999 ± 4.
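With a flat prior the posterior is proportional to the likelihood, so a brute-force grid evaluation reproduces the frequentist numbers. A sketch, using the same simulated stand-in data as in the frequentist example:

```python
import numpy as np

rng = np.random.default_rng(42)

# Simulated stand-in data (true flux assumed 1000).
F_true, N = 1000, 30
e = np.sqrt(F_true) * np.ones(N)
D = rng.normal(F_true, e)

# Posterior on a grid: P(F|D) is proportional to P(D|F) under a flat prior.
F_grid = np.linspace(900, 1100, 2001)
dF = F_grid[1] - F_grid[0]
logL = -0.5 * np.sum((D[:, None] - F_grid) ** 2 / e[:, None] ** 2, axis=0)
post = np.exp(logL - logL.max())
post /= post.sum() * dF              # normalize to a density

# Posterior mean and standard deviation:
F_mean = np.sum(F_grid * post) * dF
F_std = np.sqrt(np.sum((F_grid - F_mean) ** 2 * post) * dF)

print(f"F = {F_mean:.0f} +/- {F_std:.0f}")
```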

Slide 28

For very simple problems, frequentist & Bayesian results are often practically indistinguishable

Slide 29

The difference becomes apparent in more complicated situations…
-  Handling of nuisance parameters
-  Interpretation of uncertainty
-  Incorporation of prior information
-  Comparison & evaluation of models
-  etc.

Slide 31

Example 1: Nuisance Parameters

Slide 32

Nuisance Parameters: Bayes’ Billiard Game
Alice and Bob have a gambling problem… (Bayes 1763; Eddy 2004)

Slide 33

Nuisance Parameters: Bayes’ Billiard Game
Carol has designed a game for them to play…

Slide 35

Nuisance Parameters: Bayes’ Billiard Game
[Figure: a billiard table, “a black box”, divided into Alice’s area and Bob’s area]
-  The first ball divides the table
-  Additional balls give a point to Alice or Bob
-  First person to six points wins

Slide 36

Nuisance Parameters: Bayes’ Billiard Game
Question: in a certain game, Alice has 5 points and Bob has 3. What are the odds that Bob will go on to win?

Slide 37

Nuisance Parameters: Bayes’ Billiard Game
Note: the division of the table is a nuisance parameter: a parameter which affects the problem and must be accounted for, but is not of immediate interest.

Slide 38

A Frequentist Approach
Let p = the probability of Alice winning any given roll (the nuisance parameter). The maximum likelihood estimate from the observed 5-to-3 score is p̂ = 5/8. The probability of Bob winning (he needs the next 3 points) is then P(B) = (1 − p̂)³ ≈ 0.053: odds of 18 to 1 against.
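The frequentist number can be checked in a few lines (a sketch; the quoted 18-to-1 rounds the exact odds):

```python
# Maximum-likelihood estimate of the nuisance parameter:
p_hat = 5 / 8                    # Alice won 5 of the 8 observed rolls

# Bob wins only if he takes the next 3 points in a row:
P_bob = (1 - p_hat) ** 3         # = (3/8)^3
odds_against = (1 - P_bob) / P_bob

print(P_bob, odds_against)       # ~0.053, ~18 to 1 against
```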

Slide 39

A Bayesian Approach
Marginalization over the nuisance parameter p: with B = “Bob wins” and D = the observed data (Alice 5, Bob 3), compute P(B | D) = ∫ P(B, p | D) dp. Some algebraic manipulation gives P(B | D) ≈ 0.091: odds of 10 to 1 against.
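The marginalization can be done exactly: with a flat prior on p, both integrals are Beta functions, which reduce to factorials here. A sketch of the “algebraic manipulation”:

```python
from math import factorial

def beta_fn(a, b):
    # Beta function for positive integer arguments:
    # B(a, b) = (a-1)! (b-1)! / (a+b-1)!
    return factorial(a - 1) * factorial(b - 1) / factorial(a + b - 1)

# P(B|D) = integral of (1-p)^3 * p^5 (1-p)^3 dp, divided by
#          integral of p^5 (1-p)^3 dp
#        = B(6, 7) / B(6, 4)
P_bob = beta_fn(6, 7) / beta_fn(6, 4)
odds_against = (1 - P_bob) / P_bob

print(P_bob, odds_against)   # 1/11 ~ 0.091, exactly 10 to 1 against
```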

Slide 41

Bayes’ Billiard Game Results:
Frequentist: 18 to 1 odds
Bayesian: 10 to 1 odds
Difference: the Bayesian approach allows the nuisance parameter to vary, through marginalization.

Slide 42

Conditioning vs. Marginalization
[Figure: joint distribution over p and B, conditioned on a fixed p (akin to the frequentist approach here)]

Slide 43

Conditioning vs. Marginalization
[Figure: joint distribution over p and B, marginalized over p (the Bayesian approach here)]

Slide 44

Example 2: Uncertainties

Slide 47

Uncertainties: “Confidence” vs. “Credibility”
-  Frequentists: “If this experiment is repeated many times, in 95% of these cases the computed confidence interval will contain the true θ.” (θ fixed; the interval varies)
-  Bayesians: “Given our observed data, there is a 95% probability that the value of θ lies within the credible region.” (data fixed; θ varies)

Slide 48

Uncertainties: Jaynes’ Truncated Exponential
Consider the model p(x | θ) = exp(θ − x) for x > θ (and 0 otherwise). We observe D = {10, 12, 15}. What are the 95% bounds on θ? (Jaynes 1976)

Slide 49

Common-sense Approach
D = {10, 12, 15}. Each point must be greater than θ, and the smallest observed point is x = 10. Therefore we can immediately write the common-sense bound θ < 10.

Slide 51

Frequentist Approach
The expectation of x is E[x] = θ + 1, so an unbiased estimator is θ̂ = x̄ − 1. Computing the sampling distribution of the mean for p(x) then gives the 95% confidence interval 10.2 < θ < 12.2.

Slide 53

Bayesian Approach
Bayes’ Theorem: p(θ | D) ∝ p(D | θ) p(θ). The likelihood is p(D | θ) = ∏_i p(x_i | θ). With a flat prior, we get the posterior p(θ | D) ∝ e^(Nθ) for θ < min(D), and 0 otherwise. This gives the 95% credible region 9.0 < θ < 10.0.
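This posterior integrates in closed form, so the credible region can be computed directly (a sketch, taking the region's upper edge at min(D) as on the slide):

```python
import numpy as np

D = np.array([10.0, 12.0, 15.0])
N = len(D)

# Normalized posterior (flat prior): p(theta|D) = N * exp(N*(theta - min(D)))
# for theta < min(D), zero otherwise.  Its CDF is exp(N*(theta - min(D))),
# so the 95% region ending at min(D) starts where the CDF equals 0.05:
upper = D.min()
lower = upper + np.log(0.05) / N

print(f"95% credible region: {lower:.1f} < theta < {upper:.1f}")
```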

Slide 54

Jaynes’ Truncated Exponential Results:
Common-sense bound: θ < 10
Frequentist unbiased 95% confidence interval: 10.2 < θ < 12.2
Bayesian 95% credible region: 9.0 < θ < 10.0

Slide 55

Frequentism is not wrong! It’s just answering a different question than we might expect.

Slide 56

Confidence vs. Credibility
Bayesianism: a probabilistic statement about model parameters, given a fixed credible region.
Frequentism: a probabilistic statement about a recipe for generating confidence intervals, given a fixed model parameter.

Slide 59

Confidence vs. Credibility
[Figure: Bayesian credible region vs. frequentist confidence intervals; legend marks the parameter and the interval, with “our particular interval” highlighted]

Slide 60

Please Remember This: In general, a frequentist 95% Confidence Interval is not 95% likely to contain the true value! This very common mistake is a Bayesian interpretation of a frequentist construct.

Slide 61

Typical Conversation:
Statistician: “95% of such confidence intervals in repeated experiments will contain the true value.”
Scientist: “So there’s a 95% chance that the value is in this interval?”

Slide 62

Typical Conversation:
Statistician: “No: you see, parameters by definition can’t vary, so referring to chance in that context is meaningless. The 95% refers to the interval itself.”
Scientist: “Oh, so there’s a 95% chance that the value is in this interval?”

Slide 63

Typical Conversation:
Statistician: “No. It’s this: the long-term limiting frequency of the procedure for constructing this interval ensures that 95% of the resulting ensemble of intervals contains the value.”
Scientist: “Ah, I see: so there’s a 95% chance that the value is in this interval, right?”

Slide 64

Typical Conversation:
Statistician: “No… it’s that… well… just write down what I said, OK?”
Scientist: “OK, got it. The value is 95% likely to be in the interval.”

Slide 65

(Editorial aside…) Non-statisticians naturally understand uncertainty in a Bayesian manner. Wouldn’t it be less confusing if we simply used Bayesian methods?

Slide 66

A more practical example…

Slide 67

Final Example: Line of Best Fit

Slide 68

Final Example: Line of Best Fit
The Model: y = mx + b, with Gaussian errors on the observed y values. The Bayesian approach uses Bayes’ Theorem: P(m, b | D) ∝ P(D | m, b) P(m, b).

Slide 70

Final Example: Line of Best Fit
The Prior: is a flat prior on the slope appropriate? No! (A flat prior on the slope m is not symmetric under rotation: it favors steep slopes.)

Slide 71

Final Example: Line of Best Fit
By symmetry arguments, we can motivate the following uninformative prior: P(m, b) ∝ (1 + m²)^(−3/2). Or equivalently, a flat prior on the slope angle α = arctan(m) and on the intercept measured perpendicular to the line.
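Under this model and prior, the log-posterior is short enough to write out directly. A sketch (function and variable names here are my own, not from the slides):

```python
import numpy as np

def log_prior(m, b):
    # Symmetric uninformative prior: P(m, b) proportional to (1 + m^2)^(-3/2),
    # i.e. flat in the slope angle and the perpendicular intercept.
    return -1.5 * np.log(1 + m ** 2)

def log_likelihood(m, b, x, y, sigma):
    # Gaussian errors on y about the line y = m*x + b.
    return -0.5 * np.sum((y - (m * x + b)) ** 2 / sigma ** 2)

def log_posterior(m, b, x, y, sigma):
    # Bayes' theorem, up to the (parameter-independent) evidence term.
    return log_prior(m, b) + log_likelihood(m, b, x, y, sigma)
```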

Slide 72

Frequentist Result: StatsModels

Slide 73

[Plot: frequentist best-fit result]

Slide 74

Bayesian Result: emcee (1/2)

Slide 75

Bayesian Result: emcee (2/2)

Slide 76

[Plot: frequentist and emcee results overlaid]

Slide 77

Bayesian Result: pymc (1/2)

Slide 78

Bayesian Result: pymc (2/2)

Slide 79

[Plot: frequentist, emcee, and pyMC results overlaid]

Slide 80

Bayesian Result: PyStan (1/2)

Slide 81

Bayesian Result: PyStan (2/2)

Slide 82

[Plot: frequentist, emcee, pyMC, and PyStan results overlaid]

Slide 83

Conclusion:
-  Frequentism & Bayesianism fundamentally differ in their definition of probability.
-  Results are similar for simple problems, but often differ for more complicated problems.
-  Bayesianism provides a more natural handling of nuisance parameters, and a more natural interpretation of errors.
-  Both paradigms are useful in the right situation, but be careful to interpret the results (especially frequentist results) correctly!

Slide 84

Thank You!
[email protected] · @jakevdp · jakevdp · http://jakevdp.github.io
For more details on this topic, see the accompanying proceedings paper, or the blog posts at the above site.