The likelihood function and its discontents

David W. Hogg
NYU, MPIA, Flatiron

What I’m not going to talk about

Spergel changed my life
• my appointment at NYU
• my role at CCA
• saving me from myself?
  ◦ (example: Gaia Sprints; see Price-Whelan talk)

What I am going to talk about

The likelihood function
• p(data | model)

Everyone needs this function
• Frequentists: Optimize this to get a Cramér-Rao-bound-saturating estimator!
• Bayesians: The only way to update your posterior pdf is with this function!

Ideal case
• data = (mean expectation) + noise
• mean expectation is a deterministic function of model parameters
• noise is drawn from a known distribution (God help us if it isn’t Gaussian)
• p(data | parameters)
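
A minimal sketch of the ideal case, assuming independent Gaussian noise with known per-point variances. The straight-line mean model, the function names `mean_model` and `ln_likelihood`, and the numbers are illustrative assumptions, not taken from the talk.

```python
import math

def mean_model(x, pars):
    """Deterministic mean expectation: a hypothetical straight line."""
    slope, intercept = pars
    return slope * x + intercept

def ln_likelihood(pars, xs, ys, sigmas):
    """ln p(data | parameters) for independent Gaussian noise of known variance."""
    total = 0.0
    for x, y, sigma in zip(xs, ys, sigmas):
        resid = y - mean_model(x, pars)
        total += -0.5 * (resid / sigma) ** 2
        total += -0.5 * math.log(2.0 * math.pi * sigma ** 2)
    return total

# Illustrative data (made up for this sketch).
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.1, 2.9, 5.2, 6.8]
sigmas = [0.2, 0.2, 0.2, 0.2]

print(ln_likelihood((2.0, 1.0), xs, ys, sigmas))  # parameters near the data
print(ln_likelihood((0.0, 0.0), xs, ys, sigmas))  # parameters far from the data
```

The same function serves both camps: a frequentist would maximize it over `pars`, and a Bayesian would add a log prior to it.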

Ideal case: marginalization
• p(data | nuisances, pars)
• p(nuisances)
• p(data | pars) = ∫ p(data | nuisances, pars) p(nuisances) d(nuisances)
  ◦ see arXiv:1205.4446
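
The marginalization integral can be checked numerically in a toy case. This sketch assumes a single datum y = theta + b + noise, with a Gaussian prior on the nuisance offset b. All names and numbers are hypothetical. In this case the integral has a closed form (a Gaussian with the two variances added), which the grid integration should reproduce.

```python
import math

def gaussian(x, mean, var):
    """Normalized one-dimensional Gaussian density."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def marginal_likelihood(y, theta, noise_var, prior_var, n_grid=4001, half_width=10.0):
    """p(y | theta): integrate p(y | b, theta) p(b) over b with the trapezoid rule."""
    s = math.sqrt(prior_var)
    lo, hi = -half_width * s, half_width * s
    db = (hi - lo) / (n_grid - 1)
    total = 0.0
    for i in range(n_grid):
        b = lo + i * db
        weight = 0.5 if i in (0, n_grid - 1) else 1.0
        total += weight * gaussian(y, theta + b, noise_var) * gaussian(b, 0.0, prior_var)
    return total * db

y, theta, noise_var, prior_var = 1.3, 0.5, 1.0, 4.0
numeric = marginal_likelihood(y, theta, noise_var, prior_var)
analytic = gaussian(y, theta, noise_var + prior_var)  # variances add
print(numeric, analytic)
```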

Aside: The way I give talks is the way MK hears talks
• Kamionkowski: All I hear is “blah blah blah blah blah Marc Kamionkowski blah blah blah blah.”
• Hogg: All I say is “blah blah blah blah blah Hogg paper blah blah blah blah.”

You don’t have to be Bayesian: profiling
• p_profile(data | pars) = max_nuisances p(data | nuisances, pars)
• In the Gaussian likelihood case, this is identical to marginalization for some generic choice of p(nuisances)!
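
A small numerical check of the profiling claim, under stated assumptions: a Gaussian likelihood, a nuisance offset b that enters linearly, and a very wide flat prior on b as the "generic choice". The data, the names, and the prior range are made up for illustration. If the claim holds in this setting, the profile and marginalized log-likelihoods should differ only by a constant that does not depend on theta.

```python
import math

# Illustrative data (made up): y = theta * x + b + noise, with equal noise sigma.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [0.9, 3.1, 5.0, 7.2, 8.8]
sigma = 0.3

def ln_like(theta, b):
    """Gaussian ln-likelihood, up to an additive constant."""
    return sum(-0.5 * ((y - theta * x - b) / sigma) ** 2 for x, y in zip(xs, ys))

def ln_profile(theta):
    """Maximize over the nuisance b; here the maximum is at the mean residual."""
    b_hat = sum(y - theta * x for x, y in zip(xs, ys)) / len(xs)
    return ln_like(theta, b_hat)

def ln_marginal(theta, lo=-20.0, hi=20.0, n_grid=40001):
    """Integrate over b with a flat prior on [lo, hi] (simple Riemann sum)."""
    db = (hi - lo) / (n_grid - 1)
    ref = ln_profile(theta)  # subtract the peak value for numerical stability
    total = sum(math.exp(ln_like(theta, lo + i * db) - ref) for i in range(n_grid))
    return ref + math.log(total * db / (hi - lo))

# The offset between the two should be the same at every theta.
for theta in (1.8, 2.0, 2.2):
    print(theta, ln_profile(theta) - ln_marginal(theta))
```

Because the offset is constant in theta, the two functions lead to the same estimates and the same intervals for theta in this toy problem.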

How do you share data and results?
• If someone else wants to combine your data with theirs, they need your likelihood function!
• So when you share your data, either share your data…
• ...or share something from which you can reconstruct the likelihood function (LF).

Aside: The Lutz-Kelker correction and H_0
• This is absurdly Inside Baseball, but
• the Lutz-Kelker corrections for parallaxes deliver maximum-posterior estimators under a particular prior.
• If you use them in a data analysis, you are including a prior pdf you don’t control.
• They will bias your results.

Aside: The Gaia Collaboration is awesome
• ESA Gaia isn’t releasing all of its raw data (yet).
• But it is releasing parallaxes (say) and proper motions (say).
• These are the parameters of a likelihood function!
  ◦ see, for example, arXiv:1804.07766
  ◦ connection to negative parallaxes.
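
One way to read the point above: a catalog parallax and its uncertainty can be treated as the mean and width of a Gaussian likelihood in parallax. The sketch below assumes exactly that, with distance in kpc and parallax in mas. The function name and the catalog values are hypothetical. Under this reading a negative measured parallax is not a problem, since it simply yields a likelihood that favors large distances.

```python
import math

def ln_likelihood_distance(distance_kpc, parallax_mas, parallax_err_mas):
    """ln p(measured parallax | distance), assuming Gaussian noise in parallax."""
    if distance_kpc <= 0.0:
        return -math.inf
    model_parallax = 1.0 / distance_kpc  # mas, for a distance in kpc
    resid = parallax_mas - model_parallax
    return (-0.5 * (resid / parallax_err_mas) ** 2
            - 0.5 * math.log(2.0 * math.pi * parallax_err_mas ** 2))

# Hypothetical catalog entry with a negative measured parallax.
parallax_mas, parallax_err_mas = -0.1, 0.3
for d in (0.5, 1.0, 3.0, 10.0):
    print(d, ln_likelihood_distance(d, parallax_mas, parallax_err_mas))
```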

Aside: NASA WMAP and the Lambda Archive
• The NASA Lambda Archive is about sharing likelihoods.

Gaussian likelihoods are almost always approximations
• Hennawi to Philcox: What do you do when the density field isn’t Gaussian, because then your likelihood function isn’t Gaussian?
• Philcox to Hennawi: Actually even in the Gaussian-random-field case, the likelihood isn’t Gaussian!
• CMB likelihood functions are close to Gaussian because of the central limit theorem, not because the field is Gaussian!

Aside: CMB and LSS projects use surrogate likelihoods
• Do lots of simulations, compute mean and variance (tensor) on observables.
• Use those as the mean and variance (tensor) of a likelihood function?
• That’s a surrogate.
  ◦ It’s clever, wrong in detail, and fine in high signal-to-noise circumstances.
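
The surrogate recipe above can be sketched with a toy simulator. Everything here is an assumption for illustration: the simulator, the two summary observables, the number of simulations, and the function names. The sketch runs many simulations at a given parameter value, measures the mean and covariance of the observables, and plugs them into a Gaussian.

```python
import math
import random

def simulate_observables(theta, rng, n_points=50):
    """Hypothetical simulator: two summary observables of one mock data set."""
    draws = [rng.gauss(theta, 1.0) for _ in range(n_points)]
    mean = sum(draws) / n_points
    var = sum((d - mean) ** 2 for d in draws) / (n_points - 1)
    return (mean, var)

def surrogate_ln_likelihood(observed, theta, n_sims=2000, seed=0):
    """Gaussian surrogate: mean and 2x2 covariance estimated from simulations."""
    rng = random.Random(seed)
    sims = [simulate_observables(theta, rng) for _ in range(n_sims)]
    m0 = sum(s[0] for s in sims) / n_sims
    m1 = sum(s[1] for s in sims) / n_sims
    c00 = sum((s[0] - m0) ** 2 for s in sims) / (n_sims - 1)
    c11 = sum((s[1] - m1) ** 2 for s in sims) / (n_sims - 1)
    c01 = sum((s[0] - m0) * (s[1] - m1) for s in sims) / (n_sims - 1)
    det = c00 * c11 - c01 ** 2
    r0, r1 = observed[0] - m0, observed[1] - m1
    # Quadratic form r^T C^{-1} r, using the explicit inverse of a 2x2 matrix.
    chi2 = (c11 * r0 * r0 - 2.0 * c01 * r0 * r1 + c00 * r1 * r1) / det
    return -0.5 * chi2 - 0.5 * math.log((2.0 * math.pi) ** 2 * det)

observed = (2.0, 1.0)  # made-up "measured" observables
print(surrogate_ln_likelihood(observed, 2.0))
print(surrogate_ln_likelihood(observed, 3.0))
```

The result is Gaussian by construction, whatever the true sampling distribution of the observables is, which is the sense in which it is a surrogate.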

Non-Gaussian likelihood: Ideal case
• Likelihood function is non-Gaussian but computationally tractable.
• Frequentists: Optimize (the hard way).
• Bayesians: Use MCMC.

MCMC is a method of last resort
• You must choose a prior pdf over all parameters.
  ◦ No, you can’t “sample your likelihood function”.
  ◦ No, flat priors are not equivalent to “no priors”.
  ◦ No, you can’t undo your choice of priors later.
• You must evaluate your LF (and priors) at many useless points.
  ◦ My most highly cited paper (arXiv:1202.3665) is on a method almost no one should ever use!
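
To make the role of the prior concrete, here is a minimal sketch of a plain random-walk Metropolis sampler. It is chosen for brevity and is not the method of the cited paper. The data, the prior range, the step size, and the function names are all illustrative assumptions. Note that the chain is driven by prior times likelihood, and that the flat prior is written out explicitly as a choice.

```python
import math
import random

def ln_prior(theta):
    """An explicit prior: flat on [-10, 10]. Flat is still a choice."""
    return 0.0 if -10.0 <= theta <= 10.0 else -math.inf

def ln_likelihood(theta, data, sigma=1.0):
    """Gaussian ln-likelihood for a constant mean, up to an additive constant."""
    return sum(-0.5 * ((y - theta) / sigma) ** 2 for y in data)

def ln_posterior(theta, data):
    lp = ln_prior(theta)
    if lp == -math.inf:
        return lp
    return lp + ln_likelihood(theta, data)

def metropolis(data, n_steps=20000, step=0.5, seed=1):
    """Random-walk Metropolis: every step costs one LF and one prior evaluation."""
    rng = random.Random(seed)
    theta = 0.0
    ln_p = ln_posterior(theta, data)
    chain = []
    for _ in range(n_steps):
        proposal = theta + rng.gauss(0.0, step)
        ln_p_new = ln_posterior(proposal, data)
        if rng.random() < math.exp(min(0.0, ln_p_new - ln_p)):
            theta, ln_p = proposal, ln_p_new
        chain.append(theta)
    return chain

data = [1.8, 2.4, 2.1, 1.7, 2.0]  # made-up measurements
chain = metropolis(data)
burned = chain[2000:]  # discard an initial burn-in segment
print(sum(burned) / len(burned))
```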

The point of science is not to figure out your posterior!
• You might care about your posterior pdf.
• But I don’t care about your posterior pdf.
• I care about the likelihood function, evaluated at your data!
  ◦ (Because I have different knowledge...)
  ◦ (...and because I will get new data in the future.)

Non-Gaussian likelihood: Non-ideal case
• The likelihood function is non-Gaussian and it is computationally intractable.
• Example: In LSS, the LF for the galaxy positions
  ◦ (You would have to marginalize out all phases and all of galaxy formation.)

Non-Gaussian likelihood: Non-ideal case
• Frequentist: Make sensibly and cleverly defined estimators
  ◦ (like the Landy-Szalay estimator for the correlation function)
  ◦ (and see arXiv:2011.01836)
• Bayesian: Likelihood-free inference
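
For reference, the Landy-Szalay estimator mentioned above is usually written in terms of normalized pair counts in a separation bin r. The symbols N_D and N_R (the numbers of data points and random points) are notation introduced here, not from the slides.

```latex
\hat{\xi}_{\mathrm{LS}}(r) = \frac{DD(r) - 2\,DR(r) + RR(r)}{RR(r)},
\qquad
DD = \frac{\text{data-data pairs}}{N_D (N_D - 1)/2},\quad
DR = \frac{\text{data-random pairs}}{N_D N_R},\quad
RR = \frac{\text{random-random pairs}}{N_R (N_R - 1)/2}
```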

Likelihood-free inference
• Simulate the crap out of the Universe.
• Choose the simulations that look really, really like the data.
• This makes MCMC seem like a good idea.
  ◦ It inherits everything that’s bad about MCMC.
  ◦ It is (usually) way, way more expensive.
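
The simplest version of this idea is rejection sampling, often called approximate Bayesian computation. The sketch below assumes a toy simulator, a flat prior, a single summary statistic (the sample mean), an absolute-difference distance, and a hand-picked tolerance. All of these, and the function names, are illustrative choices rather than anything prescribed in the talk.

```python
import random

def simulate(theta, rng, n_points=20):
    """Hypothetical simulator standing in for 'the Universe'."""
    return [rng.gauss(theta, 1.0) for _ in range(n_points)]

def summary(data):
    """The chosen statistic: here just the sample mean."""
    return sum(data) / len(data)

def abc_rejection(observed, n_draws=20000, tolerance=0.05, seed=2):
    """Keep prior draws whose simulated summary lies close to the observed one."""
    rng = random.Random(seed)
    target = summary(observed)
    accepted = []
    for _ in range(n_draws):
        theta = rng.uniform(-5.0, 5.0)  # draw from an explicit flat prior
        mock = simulate(theta, rng, len(observed))
        if abs(summary(mock) - target) < tolerance:
            accepted.append(theta)
    return accepted

observed = simulate(1.0, random.Random(0))  # stand-in for real data
accepted = abc_rejection(observed)
print(len(accepted), "accepted of 20000 simulations")
print(sum(accepted) / len(accepted), "vs observed summary", summary(observed))
```

Only a small fraction of the simulations survive the cut, which illustrates the expense, and a prior still had to be chosen, which illustrates the inheritance from MCMC.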

Surrogate statistics
• What does it mean for a simulation to look really, really like the data?
• You have to choose some set of statistics (and a metric).
• Choose them by intuition and cleverness!
  ◦ or...

Machine learning
• Here’s a responsible use of machine learning in astrophysics!
• Use the machine to find the statistics of the data that are most discriminatory about theory.
• Lots of approaches and recent results.
  ◦ (see, e.g., Zoltan Haiman’s slides from Friday!)
• My take: Impose exact symmetries!
  ◦ arXiv:2106.06610

Summary
• The likelihood function is important.
• If your likelihood function is hard to work with, I pity you.
• Machine learning is likely to play a role in the future of data analysis.
