
The likelihood function and its discontents

David W Hogg
October 24, 2021

Talk given at Spergelfest, 2021 October.


Transcript

  1. The likelihood function and its discontents
    David W. Hogg
    NYU — MPIA — Flatiron


  2. What I’m not going to talk about

  3. Spergel changed my life
    ● my appointment at NYU
    ● my role at CCA
    ● saving me from myself?
    ○ (example: Gaia Sprints; see Price-Whelan talk)

  4. What I am going to talk about

  5. The likelihood function
    ● p(data | model)

  6. Everyone needs this function
    ● Frequentists: Optimize this to get a Cramér-Rao-bound-saturating estimator!
    ● Bayesians: The only way to update your posterior pdf is with this function!

  7. Ideal case
    ● data = (mean expectation) + noise
    ● mean expectation is a deterministic function of model parameters
    ● noise is drawn from a known distribution (God help us if it isn’t Gaussian)
    ● p(data | parameters)
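    The ideal case above reduces to a chi-squared plus a normalization. A minimal sketch, assuming a toy straight-line mean model and known Gaussian noise (the model and all names here are our illustration, not from the talk):

```python
import numpy as np

def ln_likelihood(params, x, y, sigma):
    """Gaussian log-likelihood: data = (mean expectation) + known Gaussian noise.

    Toy mean model (an assumption for illustration): y_model = m * x + b.
    """
    m, b = params
    resid = y - (m * x + b)
    return -0.5 * np.sum(resid**2 / sigma**2 + np.log(2 * np.pi * sigma**2))

# Fake data drawn from the model itself:
rng = np.random.default_rng(42)
x = np.linspace(0.0, 10.0, 50)
sigma = 0.5
y = 2.0 * x + 1.0 + sigma * rng.normal(size=x.size)

# The likelihood function should prefer the true parameters over wrong ones:
ll_true = ln_likelihood((2.0, 1.0), x, y, sigma)
ll_wrong = ln_likelihood((0.0, 0.0), x, y, sigma)
```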

  8. Ideal case: marginalization
    ● p(data | nuisances, pars)
    ● p(nuisances)
    ● p(data | pars) = ∫ p(data | nuisances, pars) p(nuisances) d(nuisances)
    ○ see arXiv:1205.4446
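    The marginalization integral above can be done on a grid when there is one nuisance parameter. A toy sketch, assuming a one-number datum with mean (par + nuisance) and a Gaussian nuisance prior (all choices here are ours):

```python
import numpy as np

def likelihood(data, nuisance, par, sigma=1.0):
    # p(data | nuisance, par): toy Gaussian, mean = par + nuisance (an assumption).
    return np.exp(-0.5 * (data - (par + nuisance))**2 / sigma**2) \
        / np.sqrt(2 * np.pi * sigma**2)

def nuisance_prior(nuisance, s=2.0):
    # p(nuisance): zero-mean Gaussian prior of width s (an assumption).
    return np.exp(-0.5 * nuisance**2 / s**2) / np.sqrt(2 * np.pi * s**2)

nuis_grid = np.linspace(-10.0, 10.0, 2001)
dn = nuis_grid[1] - nuis_grid[0]

def marginal_likelihood(data, par):
    # p(data | par) = ∫ p(data | nuisance, par) p(nuisance) d(nuisance)
    integrand = likelihood(data, nuis_grid, par) * nuisance_prior(nuis_grid)
    return integrand.sum() * dn  # simple Riemann sum on the grid
```

    For this toy model the integral is analytic (a Gaussian of variance sigma² + s²), which makes the grid answer easy to check.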

  9. Aside: The way I give talks is the way MK hears talks
    ● Kamionkowski: All I hear is “blah blah blah blah blah Marc Kamionkowski blah
    blah blah blah.”
    ● Hogg: All I say is “blah blah blah blah blah Hogg paper blah blah blah blah.”

  10. You don’t have to be Bayesian: profiling
    ● p_profile(data | pars) = max_nuisances p(data | nuisances, pars)
    ● In the Gaussian likelihood case, this is identical to marginalization for some
    generic choice of p(nuisances)!
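    The Gaussian equivalence claimed above can be checked numerically. A toy sketch, assuming a two-datum Gaussian model with one offset nuisance (model and names are ours): profiling over the nuisance and marginalizing it with a flat prior give the same likelihood shape in the parameter.

```python
import numpy as np

# Toy model (an assumption for illustration):
#   y1 = par + nuisance + noise,  y2 = nuisance + noise,  noise ~ N(0, 1).
y1, y2 = 3.0, 1.0
nuis = np.linspace(-20.0, 20.0, 4001)[:, None]  # nuisance grid (column)
pars = np.linspace(-5.0, 9.0, 281)[None, :]     # parameter grid (row)

ln_like = -0.5 * ((y1 - pars - nuis)**2 + (y2 - nuis)**2)

profile = np.exp(ln_like.max(axis=0))     # max over nuisances
marginal = np.exp(ln_like).sum(axis=0)    # flat-prior grid integral

# Normalize both to peak at 1; for a Gaussian likelihood the shapes agree:
profile /= profile.max()
marginal /= marginal.max()
```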

  11. How do you share data and results?
    ● If someone else wants to combine your data with theirs, they need your
    likelihood function!
    ● So when you share your results, either share your data…
    ● …or share something from which others can reconstruct the LF.

  12. Aside: The Lutz-Kelker correction and H0
    ● This is absurdly Inside Baseball, but
    ● the Lutz-Kelker corrections for parallaxes deliver maximum-posterior
    estimators under a particular prior.
    ● If you use them in a data analysis, you are including a prior pdf you don’t
    control.
    ● They will bias your results.

  13. Aside: The Gaia Collaboration is awesome
    ● ESA Gaia isn’t releasing all of its raw data (yet).
    ● But it is releasing parallaxes (say) and proper motions (say).
    ● These are the parameters of a likelihood function!
    ○ see, for example, arXiv:1804.07766
    ○ connection to negative parallaxes.

  14. Aside: NASA WMAP and the Lambda Archive
    ● The NASA Lambda Archive is about sharing likelihoods.

  15. Gaussian likelihoods are almost always approximations
    ● Hennawi to Philcox: What do you do when the density field isn’t Gaussian,
    because then your likelihood function isn’t Gaussian?
    ● Philcox to Hennawi: Actually even in the Gaussian-random-field case, the
    likelihood isn’t Gaussian!
    ● CMB likelihood functions are close to Gaussian because of the central limit
    theorem, not because the field is Gaussian!

  16. Aside: CMB and LSS projects use surrogate likelihoods
    ● Do lots of simulations, compute mean and variance (tensor) on observables.
    ● Use those as the mean and variance (tensor) of a likelihood function?
    ● That’s a surrogate.
    ○ It’s clever, wrong in detail, and fine in high signal-to-noise circumstances.
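    The recipe above can be sketched in a few lines. Here the "simulations" are stand-in random draws (an assumption for illustration); a real analysis would substitute its ensemble of simulated observable vectors.

```python
import numpy as np

rng = np.random.default_rng(17)
n_sims, n_obs = 500, 4

# Stand-in for "do lots of simulations": fake simulated observable vectors.
sims = rng.normal(loc=[1.0, 2.0, 3.0, 4.0], scale=0.3, size=(n_sims, n_obs))

mu = sims.mean(axis=0)              # ensemble mean of the observables
cov = np.cov(sims, rowvar=False)    # ensemble variance (tensor)
cov_inv = np.linalg.inv(cov)

def ln_surrogate_likelihood(data):
    """Gaussian surrogate: -(1/2) (d - mu)^T C^-1 (d - mu) + const."""
    resid = data - mu
    sign, logdet = np.linalg.slogdet(cov)
    return -0.5 * (resid @ cov_inv @ resid + logdet + n_obs * np.log(2 * np.pi))
```

    As the slide says, this is wrong in detail: it assumes Gaussianity, and it ignores the noise in the estimated mean and covariance themselves.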

  17. Non-Gaussian likelihood: Ideal case
    ● Likelihood function is non-Gaussian but computationally tractable.
    ● Frequentists: Optimize (the hard way).
    ● Bayesians: Use MCMC.

  18. MCMC is a method of last resort
    ● You must choose a prior pdf over all parameters.
    ○ No, you can’t “sample your likelihood function”.
    ○ No, flat priors are not equivalent to “no priors”.
    ○ No, you can’t undo your choice of priors later.
    ● You must evaluate your LF (and priors) at many useless points.
    ○ My most highly cited paper (arXiv:1202.3665) is on a method almost no-one should ever use!
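    A bare Metropolis sampler makes both complaints concrete (this is not the ensemble method of arXiv:1202.3665, just the simplest possible MCMC; the target and prior here are toy assumptions): you cannot avoid choosing a prior, and most likelihood evaluations are spent at points you will never care about.

```python
import numpy as np

def ln_prior(theta):
    # A choice you cannot avoid making: flat on (-10, 10), an assumption.
    return 0.0 if -10.0 < theta < 10.0 else -np.inf

def ln_likelihood(theta):
    # Toy likelihood (an assumption): Gaussian centered at 3.
    return -0.5 * (theta - 3.0)**2

def ln_post(theta):
    return ln_prior(theta) + ln_likelihood(theta)

rng = np.random.default_rng(0)
theta, chain = 0.0, []
for _ in range(20000):
    proposal = theta + rng.normal(scale=1.0)
    if np.log(rng.uniform()) < ln_post(proposal) - ln_post(theta):
        theta = proposal              # accept the proposed step
    chain.append(theta)               # rejected steps repeat the old point
chain = np.array(chain[2000:])        # drop burn-in
```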

  19. The point of science is not to figure out your posterior!
    ● You might care about your posterior pdf.
    ● But I don’t care about your posterior pdf.
    ● I care about the likelihood function, evaluated at your data!
    ○ (Because I have different knowledge...)
    ○ (...and because I will get new data in the future.)

  20. Non-Gaussian likelihood: Non-ideal case
    ● The likelihood function is non-Gaussian and it is computationally intractable.
    ● Example: In LSS, the LF for the galaxy positions
    ○ (You would have to marginalize out all phases and all of galaxy formation.)

  21. Non-Gaussian likelihood: Non-ideal case
    ● Frequentist: Make sensibly and cleverly defined estimators
    ○ (like the Landy-Szalay estimator for the correlation function)
    ○ (and see arXiv:2011.01836)
    ● Bayesian: Likelihood-free inference
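    The Landy-Szalay estimator mentioned above is xi = (DD − 2 DR + RR) / RR, with pair counts normalized by the number of pairs. A brute-force toy in a unit box (a sketch under our own assumptions; real surveys use tree codes and careful selection-function handling):

```python
import numpy as np

def pair_fraction(a, b, r_lo, r_hi):
    """Fraction of cross-pairs of points in a, b with separation in [r_lo, r_hi)."""
    d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)
    return np.mean((d >= r_lo) & (d < r_hi))

def landy_szalay(data, randoms, r_lo, r_hi):
    dd = pair_fraction(data, data, r_lo, r_hi)       # data-data
    dr = pair_fraction(data, randoms, r_lo, r_hi)    # data-random
    rr = pair_fraction(randoms, randoms, r_lo, r_hi) # random-random
    return (dd - 2.0 * dr + rr) / rr

rng = np.random.default_rng(3)
data = rng.uniform(size=(400, 2))     # unclustered fake "galaxies"
randoms = rng.uniform(size=(400, 2))  # random catalog with the same geometry
xi = landy_szalay(data, randoms, 0.1, 0.2)  # should be near zero here
```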

  22. Likelihood-free inference
    ● Simulate the crap out of the Universe.
    ● Choose the simulations that look really, really like the data.
    ● This makes MCMC seem like a good idea.
    ○ It inherits everything that’s bad about MCMC.
    ○ It is (usually) way, way more expensive.
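    The two steps above, in their simplest rejection form (a toy sketch; the simulator, summary statistic, and tolerance are all our assumptions): draw parameters from the prior, simulate, and keep only draws whose summary lands close to the data’s.

```python
import numpy as np

rng = np.random.default_rng(8)

def simulate(mu, n=100):
    # Stand-in "Universe simulator" (an assumption): Gaussian data with mean mu.
    return rng.normal(loc=mu, scale=1.0, size=n)

observed = simulate(2.0)       # pretend this is the real data
summary = observed.mean()      # a chosen summary statistic

draws = rng.uniform(-5.0, 5.0, size=20000)            # prior draws
sims = np.array([simulate(mu).mean() for mu in draws])
kept = draws[np.abs(sims - summary) < 0.05]           # "looks like the data"

# kept approximates posterior samples for mu, given the summary statistic.
```

    Note how the expense shows: 20,000 full simulations buy only the few hundred draws that survive the cut.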

  23. Surrogate statistics
    ● What does it mean for a simulation to look really, really like the data?
    ● You have to choose some set of statistics (and a metric).
    ● Choose them by intuition and cleverness!
    ○ or...

  24. Machine learning
    ● Here’s a responsible use of machine learning in astrophysics!
    ● Use the machine to find the statistics of the data that are most discriminatory
    about theory.
    ● Lots of approaches and recent results.
    ○ (see, e.g., Zoltan Haiman’s slides from Friday!)
    ● My take: Impose exact symmetries!
    ○ arXiv:2106.06610

  25. Summary
    ● The likelihood function is important.
    ● If your likelihood function is hard to work with, I pity you.
    ● Machine learning is likely to play a role in the future of data analysis.