Slide 1

The likelihood function and its discontents
David W. Hogg
NYU — MPIA — Flatiron

Slide 2

What I’m not going to talk about

Slide 3

Spergel changed my life
● my appointment at NYU
● my role at CCA
● saving me from myself?
  ○ (example: Gaia Sprints; see Price-Whelan talk)

Slide 4

What I am going to talk about

Slide 5

The likelihood function
● p(data | model)

Slide 6

Everyone needs this function
● Frequentists: Optimize this to get a Cramér-Rao-bound-saturating estimator!
● Bayesians: The only way to update your posterior pdf is with this function!

Slide 7

Ideal case
● data = (mean expectation) + noise
● the mean expectation is a deterministic function of the model parameters
● the noise is drawn from a known distribution (God help us if it isn’t Gaussian)
● p(data | parameters) (sketched below)
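A minimal sketch of this ideal case in Python, assuming a toy straight-line model with known Gaussian noise; the names here (log_likelihood, m, b) are illustrative, not from any real pipeline:

```python
import numpy as np

def log_likelihood(params, x, y, sigma):
    """Gaussian ln p(data | parameters) for a toy straight-line model."""
    m, b = params
    mean_expectation = m * x + b   # deterministic function of the model parameters
    resid = y - mean_expectation   # data = (mean expectation) + noise
    return -0.5 * np.sum(resid**2 / sigma**2 + np.log(2.0 * np.pi * sigma**2))
```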

Slide 8

Ideal case: marginalization
● p(data | nuisances, pars)
● p(nuisances)
● p(data | pars) = ∫ p(data | nuisances, pars) p(nuisances) d(nuisances)
  ○ see arXiv:1205.4446 (and the sketch below)
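A numerical sketch of that integral for a single scalar nuisance, here a hypothetical constant offset in the data, reusing log_likelihood from the previous slide; the grid quadrature and the unit Gaussian prior are illustrative choices:

```python
import numpy as np
from scipy.special import logsumexp
from scipy.stats import norm

def log_marginal_likelihood(pars, x, y, sigma, offsets):
    """ln p(data | pars): integrate the offset nuisance out on a grid."""
    log_prior = norm(0.0, 1.0).logpdf(offsets)  # p(nuisances), an illustrative Gaussian
    log_like = np.array([log_likelihood(pars, x, y - dy, sigma) for dy in offsets])
    d_offset = offsets[1] - offsets[0]          # grid spacing for the quadrature
    return logsumexp(log_like + log_prior) + np.log(d_offset)
```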

Slide 9

Aside: The way I give talks is the way MK hears talks
● Kamionkowski: All I hear is “blah blah blah blah blah Marc Kamionkowski blah blah blah blah.”
● Hogg: All I say is “blah blah blah blah blah Hogg paper blah blah blah blah.”

Slide 10

You don’t have to be Bayesian: profiling
● p_profile(data | pars) = max_nuisances p(data | nuisances, pars) (sketched below)
● In the Gaussian-likelihood case, this is identical to marginalization for some generic choice of p(nuisances)!
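The profiled version of the same sketch, maximizing over the hypothetical offset nuisance instead of integrating it out; scipy’s scalar minimizer stands in for whatever optimizer you prefer:

```python
from scipy.optimize import minimize_scalar

def log_profile_likelihood(pars, x, y, sigma):
    """ln p_profile(data | pars): maximize ln p(data | offset, pars) over the offset."""
    result = minimize_scalar(lambda dy: -log_likelihood(pars, x, y - dy, sigma))
    return -result.fun  # the maximized log-likelihood
```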

Slide 11

How do you share data and results?
● If someone else wants to combine your data with theirs, they need your likelihood function!
● So when you share your results, either share your raw data...
● ...or share something from which your likelihood function can be reconstructed.

Slide 12

Aside: The Lutz-Kelker correction and H₀
● This is absurdly Inside Baseball, but
● the Lutz-Kelker corrections for parallaxes deliver maximum-posterior estimators under a particular prior.
● If you use them in a data analysis, you are including a prior pdf you don’t control.
● They will bias your results.

Slide 13

Aside: The Gaia Collaboration is awesome
● ESA Gaia isn’t releasing all of its raw data (yet).
● But it is releasing parallaxes (say) and proper motions (say).
● These are the parameters of a likelihood function!
  ○ see, for example, arXiv:1804.07766
  ○ hence the connection to negative parallaxes: a negative measured parallax is a perfectly sensible likelihood parameter, even though true parallaxes are positive (see below).
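Concretely, a catalog parallax ϖ with reported uncertainty σ_ϖ defines, to a good approximation, a Gaussian likelihood for the true parallax, and this density is perfectly well defined even when the measured ϖ is negative (the Gaussian form is the usual approximation, not a Gaia guarantee):

p(ϖ | ϖ_true) = (2π σ_ϖ²)^(−1/2) exp[ −(ϖ − ϖ_true)² / (2 σ_ϖ²) ]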

Slide 14

Aside: NASA WMAP and the Lambda Archive
● The NASA Lambda Archive is about sharing likelihoods.

Slide 15

Gaussian likelihoods are almost always approximations
● Hennawi to Philcox: What do you do when the density field isn’t Gaussian, because then your likelihood function isn’t Gaussian?
● Philcox to Hennawi: Actually, even in the Gaussian-random-field case, the likelihood isn’t Gaussian!
● CMB likelihood functions are close to Gaussian because of the central limit theorem, not because the field is Gaussian!

Slide 16

Aside: CMB and LSS projects use surrogate likelihoods
● Do lots of simulations; compute the mean and variance (tensor) of the observables.
● Use those as the mean and variance (tensor) of a Gaussian likelihood function? (sketch below)
● That’s a surrogate.
  ○ It’s clever, wrong in detail, and fine in high signal-to-noise circumstances.
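A sketch of such a surrogate in Python, assuming sims is an (n_sims, n_obs) array of simulated observable vectors; careful analyses also debias the estimated inverse covariance (for example, with a Hartlap-style correction), which this sketch omits:

```python
import numpy as np

def surrogate_log_likelihood(observed, sims):
    """Gaussian surrogate: mean and covariance estimated from simulations."""
    mean = sims.mean(axis=0)            # simulation mean of the observables
    cov = np.cov(sims, rowvar=False)    # simulation variance (tensor)
    resid = observed - mean
    chi2 = resid @ np.linalg.solve(cov, resid)
    _, logdet = np.linalg.slogdet(cov)
    return -0.5 * (chi2 + logdet + observed.size * np.log(2.0 * np.pi))
```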

Slide 17

Non-Gaussian likelihood: Ideal case
● The likelihood function is non-Gaussian but computationally tractable.
● Frequentists: Optimize (the hard way).
● Bayesians: Use MCMC. (sketch below)
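For the Bayesian branch, a minimal emcee sketch (emcee is the package behind arXiv:1202.3665, cited on the next slide), reusing the toy log_likelihood from above and assuming data arrays x, y, sigma are in hand; the flat-prior box and all run settings are illustrative:

```python
import numpy as np
import emcee

def log_posterior(params, x, y, sigma):
    """ln prior (flat in a box) plus ln likelihood, up to a constant."""
    if np.any(np.abs(params) > 10.0):  # illustrative flat-prior bounds
        return -np.inf
    return log_likelihood(params, x, y, sigma)

ndim, nwalkers = 2, 32
p0 = 1e-3 * np.random.randn(nwalkers, ndim)  # walkers start in a tiny ball
sampler = emcee.EnsembleSampler(nwalkers, ndim, log_posterior, args=(x, y, sigma))
sampler.run_mcmc(p0, 5000)
samples = sampler.get_chain(discard=1000, flat=True)  # drop burn-in, flatten walkers
```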

Slide 18

MCMC is a method of last resort
● You must choose a prior pdf over all parameters.
  ○ No, you can’t “sample your likelihood function”.
  ○ No, flat priors are not equivalent to “no priors”.
  ○ No, you can’t undo your choice of priors later.
● You must evaluate your LF (and priors) at many useless points.
  ○ My most highly cited paper (arXiv:1202.3665) is on a method almost no one should ever use!

Slide 19

The point of science is not to figure out your posterior!
● You might care about your posterior pdf.
● But I don’t care about your posterior pdf.
● I care about the likelihood function, evaluated at your data!
  ○ (Because I have different knowledge...)
  ○ (...and because I will get new data in the future.)

Slide 20

Non-Gaussian likelihood: Non-ideal case
● The likelihood function is non-Gaussian and it is computationally intractable.
● Example: In LSS, the LF for the galaxy positions
  ○ (You would have to marginalize out all phases and all of galaxy formation.)

Slide 21

Non-Gaussian likelihood: Non-ideal case
● Frequentists: Build sensibly and cleverly defined estimators
  ○ (like the Landy-Szalay estimator for the correlation function, written out below)
  ○ (and see arXiv:2011.01836)
● Bayesians: Likelihood-free inference
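For reference, the Landy-Szalay estimator of the two-point correlation function, with DD, DR, and RR the normalized data-data, data-random, and random-random pair counts:

ξ̂(r) = [DD(r) − 2 DR(r) + RR(r)] / RR(r)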

Slide 22

Likelihood-free inference
● Simulate the crap out of the Universe.
● Choose the simulations that look really, really like the data. (sketch below)
● This makes MCMC seem like a good idea.
  ○ It inherits everything that’s bad about MCMC.
  ○ It is (usually) way, way more expensive.
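The simplest instance of this recipe is rejection ABC; in this sketch the prior sampler, simulator, summary statistic, distance, and tolerance eps are all placeholders that you must choose, which is exactly the problem of the next slide:

```python
import numpy as np

def rejection_abc(observed, sample_prior, simulate, summarize, eps, n_draws=100_000):
    """Keep the prior draws whose simulated summaries land within eps of the data's."""
    s_obs = summarize(observed)
    kept = []
    for _ in range(n_draws):
        theta = sample_prior()                   # draw parameters from the prior
        s_sim = summarize(simulate(theta))       # simulate, then summarize
        if np.linalg.norm(s_sim - s_obs) < eps:  # "looks really, really like the data"
            kept.append(theta)
    return np.array(kept)                        # approximate posterior samples
```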

Slide 23

Surrogate statistics
● What does it mean for a simulation to look really, really like the data?
● You have to choose some set of statistics (and a metric).
● Choose them by intuition and cleverness!
  ○ or...

Slide 24

Machine learning
● Here’s a responsible use of machine learning in astrophysics!
● Use the machine to find the statistics of the data that are most discriminatory about the theory.
● Lots of approaches and recent results.
  ○ (see, e.g., Zoltan Haiman’s slides from Friday!)
● My take: Impose exact symmetries!
  ○ arXiv:2106.06610

Slide 25

Summary
● The likelihood function is important.
● If your likelihood function is hard to work with, I pity you.
● Machine learning is likely to play a role in the future of data analysis.