David W Hogg
October 24, 2021

# The likelihood function and its discontents

Talk given at Spergelfest, 2021 October.

## Transcript

1. David W. Hogg — likelihood function and its discontents
The likelihood function and its discontents
David W. Hogg
NYU — MPIA — Flatiron

2. David W. Hogg — likelihood function and its discontents
What I’m not going to talk about

3. David W. Hogg — likelihood function and its discontents
Spergel changed my life
● my appointment at NYU
● my role at CCA
● saving me from myself?
○ (example: Gaia Sprints; see Price-Whelan talk)

4. David W. Hogg — likelihood function and its discontents
What I am going to talk about

5. David W. Hogg — likelihood function and its discontents
The likelihood function
● p(data | model)

6. David W. Hogg — likelihood function and its discontents
Everyone needs this function
● Frequentists: Optimize this to get a Cramér-Rao-bound-saturating estimator!
● Bayesians: The only way to update your posterior pdf is with this function!

7. David W. Hogg — likelihood function and its discontents
Ideal case
● data = (mean expectation) + noise
● mean expectation is a deterministic function of model parameters
● noise is drawn from a known distribution (God help us if it isn’t Gaussian)
● p(data | parameters)
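
A minimal sketch of the ideal case, assuming a straight-line mean expectation and independent Gaussian noise with known variances. The function name `ln_likelihood`, the linear model, and the synthetic data are illustrative assumptions, not taken from the talk.

```python
import numpy as np

def ln_likelihood(pars, x, y, sigma):
    """ln p(data | parameters) for data = (mean expectation) + Gaussian noise."""
    slope, intercept = pars
    mean = slope * x + intercept  # deterministic function of the parameters
    resid = y - mean
    return -0.5 * np.sum(resid**2 / sigma**2 + np.log(2.0 * np.pi * sigma**2))

# Synthetic data drawn from the assumed model (hypothetical values).
rng = np.random.default_rng(42)
x = np.linspace(0.0, 1.0, 20)
sigma = 0.1 * np.ones_like(x)
y = 2.0 * x + 1.0 + sigma * rng.normal(size=x.size)

ll_true = ln_likelihood((2.0, 1.0), x, y, sigma)  # parameters used to make the data
ll_bad = ln_likelihood((0.0, 0.0), x, y, sigma)   # badly wrong parameters
```

The same function serves both camps: a frequentist would maximize it over `pars`, and a Bayesian would add a log prior to it.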

8. David W. Hogg — likelihood function and its discontents
Ideal case: marginalization
● p(data | nuisances, pars)
● p(nuisances)
● p(data | pars) = ∫ p(data | nuisances, pars) p(nuisances) d(nuisances)
○ see arXiv:1205.4446
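
The marginalization integral can be checked numerically in a toy case. The sketch below assumes a single datum `y`, one parameter `mu`, and one additive nuisance offset `b` with a zero-mean Gaussian p(nuisances); all names and numbers are hypothetical. In this case the integral has a closed form (the variances add), so the two functions should agree.

```python
import numpy as np

def normal_pdf(x, mean, var):
    return np.exp(-0.5 * (x - mean) ** 2 / var) / np.sqrt(2.0 * np.pi * var)

def marginal_likelihood_numeric(y, mu, sigma, s_b, n_grid=20001):
    # p(y | mu) = integral of p(y | b, mu) p(b) db, done as a sum on a grid in b
    b = np.linspace(-10.0 * s_b, 10.0 * s_b, n_grid)
    integrand = normal_pdf(y, mu + b, sigma**2) * normal_pdf(b, 0.0, s_b**2)
    return np.sum(integrand) * (b[1] - b[0])

def marginal_likelihood_analytic(y, mu, sigma, s_b):
    # Gaussian noise plus a Gaussian nuisance offset: the variances add
    return normal_pdf(y, mu, sigma**2 + s_b**2)

numeric = marginal_likelihood_numeric(1.3, 1.0, 0.5, 0.8)
analytic = marginal_likelihood_analytic(1.3, 1.0, 0.5, 0.8)
```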

9. David W. Hogg — likelihood function and its discontents
Aside: The way I give talks is the way MK hears talks
● Kamionkowski: All I hear is “blah blah blah blah blah Marc Kamionkowski blah blah blah blah.”
● Hogg: All I say is “blah blah blah blah blah Hogg paper blah blah blah blah.”

10. David W. Hogg — likelihood function and its discontents
You don’t have to be Bayesian: profiling
● p_profile(data | pars) = max_nuisances p(data | nuisances, pars)
● In the Gaussian likelihood case, this is identical to marginalization for some
generic choice of p(nuisances)!
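
A toy check of this claim, under assumptions that are not from the talk: Gaussian noise of known width, a slope `theta` as the parameter of interest, a constant offset as the nuisance, and a very broad zero-mean Gaussian as the choice of p(nuisances). The profiled and marginalized log-likelihoods should then differ only by a constant that does not depend on `theta`.

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0.0, 2.0, 15)
sigma = 0.2
y = 1.5 * x + 0.7 + sigma * rng.normal(size=x.size)  # hypothetical data

def ln_profile(theta):
    r = y - theta * x
    b_hat = np.mean(r)  # offset that maximizes the likelihood at this theta
    return -0.5 * np.sum((r - b_hat) ** 2) / sigma**2

def ln_marginal(theta, prior_width=1.0e3):
    # Marginalizing a Gaussian offset adds prior_width**2 to every covariance element.
    r = y - theta * x
    n = x.size
    cov = sigma**2 * np.eye(n) + prior_width**2 * np.ones((n, n))
    _, logdet = np.linalg.slogdet(cov)
    return -0.5 * (r @ np.linalg.solve(cov, r) + logdet)

thetas = np.linspace(0.5, 2.5, 9)
offsets = np.array([ln_profile(t) - ln_marginal(t) for t in thetas])
# offsets should be (nearly) the same number at every theta
```

Since only differences in log-likelihood across `theta` matter for inference, a constant offset means the two procedures give the same answer here.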

11. David W. Hogg — likelihood function and its discontents
How do you share data and results?
● If someone else wants to combine your data with theirs, they need your likelihood function (LF)!
● ...or you must share something from which they can reconstruct the LF.

12. David W. Hogg — likelihood function and its discontents
Aside: The Lutz-Kelker correction and H_0
● This is absurdly Inside Baseball, but
● the Lutz-Kelker corrections for parallaxes deliver maximum-posterior estimators under a particular prior.
● If you use them in a data analysis, you are including a prior pdf you don’t control.
● They will bias your results.

13. David W. Hogg — likelihood function and its discontents
Aside: The Gaia Collaboration is awesome
● ESA Gaia isn’t releasing all of its raw data (yet).
● But it is releasing parallaxes (say) and proper motions (say).
● These are the parameters of a likelihood function!
○ see, for example, arXiv:1804.07766
○ connection to negative parallaxes.
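
A sketch of what it means to treat a catalog parallax and its uncertainty as the parameters of a likelihood function. It assumes Gaussian noise on the reported parallax and the usual relation of 1 mas of parallax at 1 kpc; the function name and the numbers are made up for illustration and are not Gaia values.

```python
import numpy as np

def ln_parallax_likelihood(distance_kpc, parallax_mas, parallax_error_mas):
    """ln p(observed parallax | distance), assuming Gaussian noise."""
    true_parallax_mas = 1.0 / distance_kpc  # 1 kpc corresponds to 1 mas
    resid = parallax_mas - true_parallax_mas
    return -0.5 * (resid**2 / parallax_error_mas**2
                   + np.log(2.0 * np.pi * parallax_error_mas**2))

# A negative measured parallax is a perfectly good datum: the likelihood is
# finite at every positive distance and simply favors larger distances.
distances = np.array([0.5, 1.0, 2.0, 5.0, 10.0])
ln_like = ln_parallax_likelihood(distances, -0.1, 0.3)
```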

14. David W. Hogg — likelihood function and its discontents
Aside: NASA WMAP and the Lambda Archive
● The NASA Lambda Archive is about sharing likelihoods.

15. David W. Hogg — likelihood function and its discontents
Gaussian likelihoods are almost always approximations
● Hennawi to Philcox: What do you do when the density field isn’t Gaussian, because then your likelihood function isn’t Gaussian?
● Philcox to Hennawi: Actually even in the Gaussian-random-field case, the likelihood isn’t Gaussian!
● CMB likelihood functions are close to Gaussian because of the central limit theorem, not because the field is Gaussian!

16. David W. Hogg — likelihood function and its discontents
Aside: CMB and LSS projects use surrogate likelihoods
● Do lots of simulations, compute mean and variance (tensor) on observables.
● Use those as the mean and variance (tensor) of a likelihood function?
● That’s a surrogate.
○ It’s clever, wrong in detail, and fine in high signal-to-noise circumstances.
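
The recipe above can be sketched with a toy simulator. Everything here is an assumption for illustration: the "observables" are mean squared amplitudes in three bands of Gaussian modes (so their true distribution is chi-squared-like, not Gaussian), and the function names are hypothetical.

```python
import numpy as np

def simulate_observables(amplitude, rng, n_modes=(8, 16, 32)):
    # Toy band powers: mean squared value of Gaussian modes in each band.
    return np.array([np.mean((amplitude * rng.normal(size=n)) ** 2)
                     for n in n_modes])

def build_surrogate(amplitude, rng, n_sims=2000):
    # Do lots of simulations; compute the mean and covariance of the observables.
    sims = np.array([simulate_observables(amplitude, rng) for _ in range(n_sims)])
    return sims.mean(axis=0), np.cov(sims, rowvar=False)

def ln_surrogate_likelihood(observed, mean, cov):
    # Gaussian surrogate: wrong in detail, since the true distribution is skewed.
    resid = observed - mean
    _, logdet = np.linalg.slogdet(2.0 * np.pi * cov)
    return -0.5 * (resid @ np.linalg.solve(cov, resid) + logdet)

rng = np.random.default_rng(7)
observed = simulate_observables(1.0, rng)   # stand-in for real data
mean_1, cov_1 = build_surrogate(1.0, rng)
mean_3, cov_3 = build_surrogate(3.0, rng)
ln_like_1 = ln_surrogate_likelihood(observed, mean_1, cov_1)
ln_like_3 = ln_surrogate_likelihood(observed, mean_3, cov_3)
```

The band with only 8 modes is where the Gaussian approximation is worst; with more modes per band the central limit theorem makes the surrogate better.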

17. David W. Hogg — likelihood function and its discontents
Non-Gaussian likelihood: Ideal case
● Likelihood function is non-Gaussian but computationally tractable.
● Frequentists: Optimize (the hard way).
● Bayesians: Use MCMC.

18. David W. Hogg — likelihood function and its discontents
MCMC is a method of last resort
● You must choose a prior pdf over all parameters.
○ No, you can’t “sample your likelihood function”.
○ No, flat priors are not equivalent to “no priors”.
○ No, you can’t undo your choice of priors later.
● You must evaluate your LF (and priors) at many useless points.
○ My most highly cited paper (arXiv:1202.3665) is on a method almost no-one should ever use!
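
To make the role of the prior concrete, here is a minimal sketch of a plain Metropolis sampler (chosen for brevity; it is not the method of the paper cited above). The model, the bounds of the flat prior, the step size, and the data are all assumptions. Note that the sampler cannot run without `ln_prior`, and that the "flat" prior still encodes choices: its bounds, and the parameterization in which it is flat.

```python
import numpy as np

def ln_prior(mu):
    # A flat prior is still a prior: bounded, and flat only in this parameterization.
    return 0.0 if -10.0 < mu < 10.0 else -np.inf

def ln_likelihood(mu, y, sigma):
    return -0.5 * np.sum((y - mu) ** 2) / sigma**2

def metropolis(y, sigma, n_steps=5000, step_size=0.3, seed=1):
    rng = np.random.default_rng(seed)
    mu = 0.0
    ln_post = ln_prior(mu) + ln_likelihood(mu, y, sigma)
    chain = np.empty(n_steps)
    for i in range(n_steps):
        proposal = mu + step_size * rng.normal()
        ln_post_new = ln_prior(proposal)
        if np.isfinite(ln_post_new):
            ln_post_new += ln_likelihood(proposal, y, sigma)
        # Accept with probability min(1, posterior ratio).
        if np.log(rng.uniform()) < ln_post_new - ln_post:
            mu, ln_post = proposal, ln_post_new
        chain[i] = mu  # rejected proposals repeat the current point
    return chain

data_rng = np.random.default_rng(11)
y = data_rng.normal(1.0, 0.5, size=25)  # hypothetical data
chain = metropolis(y, sigma=0.5)
posterior_mean = chain[1000:].mean()    # discard burn-in
```

Every step costs a likelihood evaluation, including the rejected ones, which is the "many useless points" complaint in practice.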

19. David W. Hogg — likelihood function and its discontents
The point of science is not to figure out your posterior!
○ (Because I have different knowledge...)
○ (...and because I will get new data in the future.)

20. David W. Hogg — likelihood function and its discontents
Non-Gaussian likelihood: Non-ideal case
● The likelihood function is non-Gaussian and it is computationally intractable.
● Example: In LSS, the LF for the galaxy positions
○ (You would have to marginalize out all phases and all of galaxy formation.)

21. David W. Hogg — likelihood function and its discontents
Non-Gaussian likelihood: Non-ideal case
● Frequentist: Make sensibly and cleverly defined estimators
○ (like the Landy-Szalay estimator for the correlation function)
○ (and see arXiv:2011.01836)
● Bayesian: Likelihood-free inference
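
For reference, the Landy-Szalay estimator mentioned above is usually written in terms of normalized pair counts in a separation bin $r$, where $DD$, $DR$, and $RR$ are the data-data, data-random, and random-random counts (the notation here is the standard one, not taken from the slides):

$$
\hat{\xi}(r) = \frac{DD(r) - 2\,DR(r) + RR(r)}{RR(r)}
$$

No likelihood function for the galaxy positions is needed to compute it, which is the point of the frequentist route.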

22. David W. Hogg — likelihood function and its discontents
Likelihood-free inference
● Simulate the crap out of the Universe.
● Choose the simulations that look really, really like the data.
● This makes MCMC seem like a good idea.
○ It is (usually) way, way more expensive.
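
The two steps above amount to what is often called rejection ABC (approximate Bayesian computation). The sketch below is a toy version under stated assumptions: a one-parameter Gaussian "Universe", a flat prior on `mu`, the sample mean as the summary statistic, and an absolute-difference metric with a hand-picked tolerance. All names are hypothetical.

```python
import numpy as np

def simulate(mu, rng, n=50):
    return rng.normal(mu, 1.0, size=n)

def summary(data):
    # The chosen statistic that defines "looks like the data".
    return np.mean(data)

def abc_rejection(observed, n_sims=20000, tolerance=0.05, seed=3):
    rng = np.random.default_rng(seed)
    s_obs = summary(observed)
    prior_draws = rng.uniform(-5.0, 5.0, size=n_sims)  # flat prior (assumed)
    # Keep only the parameter values whose simulation looks really like the data.
    accepted = [mu for mu in prior_draws
                if abs(summary(simulate(mu, rng)) - s_obs) < tolerance]
    return np.array(accepted)

data_rng = np.random.default_rng(5)
observed = simulate(0.8, data_rng)       # stand-in for real data
accepted = abc_rejection(observed)
acceptance_fraction = accepted.size / 20000
```

Almost all of the simulations are thrown away, even in one dimension with a cheap simulator; that is the sense in which this is more expensive than MCMC.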

23. David W. Hogg — likelihood function and its discontents
Surrogate statistics
● What does it mean for a simulation to look really, really like the data?
● You have to choose some set of statistics (and a metric).
● Choose them by intuition and cleverness!
○ or...

24. David W. Hogg — likelihood function and its discontents
Machine learning
● Here’s a responsible use of machine learning in astrophysics!
● Use the machine to find the statistics of the data that are most discriminatory