
DSRD.pdf

Ben Nelson
June 25, 2018

A talk on exoplanets and Bayesian statistics for the NU Data Science Research Day.

Transcript

  1. Bayesian Methods for Detecting and Characterizing Planets Around Other Stars

    Benjamin Nelson (@exobenelson), NU Data Science Scholar, Insight Data Science Fellow. June 25, 2018, Data Science Research Day.
  2. Statistical Challenges in Characterizing Exoplanets

    - Is there evidence for a planet in my data? Approach: frequentist vs. Bayesian model comparison.
    - For one system, what are the planet properties (orbital period, eccentricity, mass, etc.)? Approach: sampling in ~10s of parameters.
    - What can be inferred from populations of exoplanets? Approach: hierarchical Bayesian modeling, sampling in ~100s to 1000s of parameters.
  3. Statistical Challenges in Characterizing Exoplanets (repeat of slide 2)
  4. What does it mean to “discover” a planet?

    - Frequentist approach: reject the null hypothesis that a model without a planet could reasonably explain the data.
    - Bayesian approach: the evidence (i.e., marginalized likelihood) for a model with the planet is much greater than that for alternative models without the planet.
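A minimal sketch of the frequentist test above, under loudly hypothetical simplifications: the orbital period is known and fixed, the orbit is circular (a pure sinusoid), and the Gaussian measurement uncertainties are known. With those assumptions the no-planet model (a constant) is nested inside the planet model (constant plus sine and cosine terms), both are linear in their parameters, and the improvement in chi-square from adding the planet follows a chi-square distribution with 2 degrees of freedom when there is no planet. The names delta_chi2_fixed_period and false_alarm_probability are illustrative, not from any package. Real searches scan many trial periods, which changes the false-alarm calculation.

```python
import numpy as np
from scipy import stats


def delta_chi2_fixed_period(t, rv, sigma, period):
    """Chi-square improvement from adding a fixed-period sinusoid to a constant RV model."""
    w = 1.0 / sigma**2
    # Null model (no planet): a constant velocity, fit by the weighted mean
    mu0 = np.sum(w * rv) / np.sum(w)
    chi2_null = np.sum(w * (rv - mu0) ** 2)
    # Planet model: constant + A*sin + B*cos at the assumed period (linear in A, B)
    phase = 2.0 * np.pi * t / period
    design = np.column_stack([np.ones_like(t), np.sin(phase), np.cos(phase)])
    sw = np.sqrt(w)
    coef, *_ = np.linalg.lstsq(design * sw[:, None], rv * sw, rcond=None)
    chi2_planet = np.sum(w * (rv - design @ coef) ** 2)
    return chi2_null - chi2_planet


def false_alarm_probability(dchi2, extra_params=2):
    # Probability of a chi-square improvement at least this large if there is no planet
    # (valid here only because the models are nested, linear, and the noise is known Gaussian)
    return stats.chi2.sf(dchi2, df=extra_params)


# Hypothetical example: 40 epochs, 2 m/s errors, a 5 m/s signal at a 12.3-day period
rng = np.random.default_rng(0)
t = np.sort(rng.uniform(0.0, 100.0, 40))
sigma = np.full(40, 2.0)
rv = 5.0 * np.sin(2.0 * np.pi * t / 12.3) + rng.normal(0.0, sigma)
print(false_alarm_probability(delta_chi2_fixed_period(t, rv, sigma, 12.3)))
```

A tiny false-alarm probability means a constant model could not reasonably produce the observed improvement, so the null hypothesis is rejected.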
  5. Computing the “evidence” for an n-planet model

    $p(d \mid M) = \int p(\theta \mid M)\, p(d \mid \theta, M)\, d\theta$

    where $p(\theta \mid M)$ is the prior probability distribution, $p(d \mid \theta, M)$ is the likelihood function (i.e., sampling distribution), and $p(d \mid M)$ is the fully marginalized likelihood (i.e., the Bayesian “evidence”).
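The evidence integral above can be estimated, in the simplest (and usually inefficient) way, by averaging the likelihood over draws from the prior. The sketch uses a hypothetical one-parameter toy model rather than an n-planet RV model: data d_i ~ N(θ, σ²) with prior θ ~ N(0, τ²). That model has a closed-form evidence (the data are jointly Gaussian with covariance σ²I + τ²11ᵀ), so the Monte Carlo estimate can be checked against it. The function names are illustrative.

```python
import numpy as np
from scipy import stats
from scipy.special import logsumexp


def log_evidence_prior_mc(d, sigma, tau, n_draws=100_000, seed=0):
    """Estimate ln p(d|M) by averaging the likelihood over draws from the prior."""
    rng = np.random.default_rng(seed)
    theta = rng.normal(0.0, tau, size=n_draws)             # theta_j ~ p(theta|M)
    # ln p(d|theta_j, M) for every draw: shape (n_draws,)
    log_like = stats.norm.logpdf(d[None, :], loc=theta[:, None], scale=sigma).sum(axis=1)
    # ln[(1/N) sum_j exp(log_like_j)], computed stably
    return logsumexp(log_like) - np.log(n_draws)


def log_evidence_exact(d, sigma, tau):
    """Closed form for the toy model: d ~ N(0, sigma^2 I + tau^2 11^T)."""
    n = d.size
    cov = sigma**2 * np.eye(n) + tau**2 * np.ones((n, n))
    return stats.multivariate_normal(mean=np.zeros(n), cov=cov).logpdf(d)


d = np.random.default_rng(3).normal(1.0, 1.0, size=20)   # hypothetical data
print(log_evidence_prior_mc(d, sigma=1.0, tau=5.0), log_evidence_exact(d, sigma=1.0, tau=5.0))
```

Prior sampling wastes most draws when the posterior is much narrower than the prior, and that waste grows quickly with the number of parameters, which is why the specialized methods on slide 6 are used for multi-planet models.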
  6. Computing the “evidence” for an n-planet model, $p(d \mid M) = \int p(\theta \mid M)\, p(d \mid \theta, M)\, d\theta$: examples from the literature

    - Thermodynamic integration (HD208487; Gregory 2007)
    - Nested sampling / MultiNest (GJ667C; Feroz & Hobson 2014)
    - Geometric path Monte Carlo (GJ581; Hou+ 2014)
    - Transdimensional MCMC with nested sampling (ν Oph; Brewer & Donovan 2015)
    - Importance sampling (GJ876, Nelson+ 2016; HD9174, Jenkins+ 2017)
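Importance sampling, one of the approaches listed above, draws from a proposal density q(θ) concentrated where the posterior is and reweights: p(d|M) ≈ (1/N) Σ_j p(θ_j|M) p(d|θ_j, M) / q(θ_j). A minimal sketch on a hypothetical one-parameter toy model (d_i ~ N(θ, σ²), θ ~ N(0, τ²)) whose exact evidence is known. The Gaussian proposal, centered on the sample mean with twice the approximate posterior width, is an assumption made for illustration and is not taken from Nelson+ (2016) or Jenkins+ (2017).

```python
import numpy as np
from scipy import stats
from scipy.special import logsumexp


def log_evidence_importance(d, sigma, tau, n_draws=20_000, seed=1):
    """Importance-sampling estimate of ln p(d|M) for the toy model."""
    rng = np.random.default_rng(seed)
    n = d.size
    # Hypothetical proposal: Gaussian at the sample mean, twice the approximate posterior width
    center, width = d.mean(), 2.0 * sigma / np.sqrt(n)
    theta = rng.normal(center, width, size=n_draws)
    log_prior = stats.norm.logpdf(theta, 0.0, tau)
    log_like = stats.norm.logpdf(d[None, :], loc=theta[:, None], scale=sigma).sum(axis=1)
    log_q = stats.norm.logpdf(theta, center, width)
    log_w = log_prior + log_like - log_q                    # log importance weights
    return logsumexp(log_w) - np.log(n_draws)


def log_evidence_exact(d, sigma, tau):
    """Closed form for the toy model: d ~ N(0, sigma^2 I + tau^2 11^T)."""
    n = d.size
    cov = sigma**2 * np.eye(n) + tau**2 * np.ones((n, n))
    return stats.multivariate_normal(mean=np.zeros(n), cov=cov).logpdf(d)


d = np.random.default_rng(3).normal(1.0, 1.0, size=20)   # hypothetical data
print(log_evidence_importance(d, 1.0, 5.0), log_evidence_exact(d, 1.0, 5.0))
```

The estimate is only reliable when q has tails at least as heavy as the posterior; a proposal narrower than the posterior can produce rare, enormous weights and a high-variance answer.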
  7. Evidence Challenge

    How accurately and precisely can one compute the “evidence” for {0, 1, 2, 3} planets in RV data, given a set of priors and a likelihood function?
  8. Evidence Challenge (same question as slide 7, with the evidence integral)

    $p(d \mid M) = \int p(\theta \mid M)\, p(d \mid \theta, M)\, d\theta$
  9. EPRV3 Evidence Challenge

    More details and results at: github.com/EPRV3EvidenceChallenge/

    Methods teams submitted:
    - Frequentist: BIC; leave-one-out cross-validation; time-series cross-validation
    - Bayesian: Chib’s approximation; Laplace approximation; Laplace approximation + l1 periodogram; Perrakis estimator; importance sampling + MCMC; importance sampling + variational Bayes; nested sampling (MultiNest); nested sampling + MCMC; diffusive nested sampling (DNest4)
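Two of the cheaper approximations submitted can be compared on a hypothetical one-parameter Gaussian toy model (d_i ~ N(θ, σ²), prior θ ~ N(0, τ²)), where the exact evidence is available. The Laplace approximation replaces the posterior by a Gaussian at its peak, ln Z ≈ ln[p(θ̂|M) p(d|θ̂, M)] + (k/2) ln 2π - (1/2) ln det H, with H the negative Hessian of the log posterior; for this Gaussian toy model it happens to be exact. BIC keeps only the terms that grow with the number of data points n, ln Z ≈ ln L_max - (k/2) ln n, and ignores the prior. These functions are illustrative sketches, not any team's submission.

```python
import numpy as np
from scipy import stats


def log_evidence_laplace(d, sigma, tau):
    """Laplace approximation to ln p(d|M) for the toy model (k = 1 parameter)."""
    n = d.size
    hess = n / sigma**2 + 1.0 / tau**2            # negative Hessian of ln posterior (constant here)
    theta_map = (d.sum() / sigma**2) / hess       # posterior mode
    log_peak = stats.norm.logpdf(theta_map, 0.0, tau) + stats.norm.logpdf(d, theta_map, sigma).sum()
    # ln Z ≈ ln[prior * likelihood at the mode] + (k/2) ln(2 pi) - (1/2) ln det H
    return log_peak + 0.5 * np.log(2.0 * np.pi) - 0.5 * np.log(hess)


def log_evidence_bic(d, sigma):
    """BIC approximation ln Z ≈ -BIC/2, with BIC = k ln n - 2 ln L_max (prior ignored)."""
    n, k = d.size, 1
    log_l_max = stats.norm.logpdf(d, d.mean(), sigma).sum()   # likelihood peaks at mean(d)
    return -0.5 * (k * np.log(n) - 2.0 * log_l_max)


def log_evidence_exact(d, sigma, tau):
    """Closed form for the toy model: d ~ N(0, sigma^2 I + tau^2 11^T)."""
    n = d.size
    cov = sigma**2 * np.eye(n) + tau**2 * np.ones((n, n))
    return stats.multivariate_normal(mean=np.zeros(n), cov=cov).logpdf(d)


d = np.random.default_rng(3).normal(1.0, 1.0, size=20)   # hypothetical data
for name, value in [("exact", log_evidence_exact(d, 1.0, 5.0)),
                    ("Laplace", log_evidence_laplace(d, 1.0, 5.0)),
                    ("BIC", log_evidence_bic(d, 1.0))]:
    print(name, value)
```

For multimodal or strongly non-Gaussian RV posteriors the Laplace approximation can be poor, and the BIC value differs from the true evidence by an order-unity amount that depends on the prior width, which BIC never sees.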
  10. What different methods say about n vs. n+1 planets

    [Figure: log odds ratio for each dataset number, by method; panels labeled Broad and Narrow.]
  11. What different methods say about n vs. n+1 planets (repeat of slide 10)
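The log odds ratio combines two evidences with prior model probabilities: ln[p(M1|d)/p(M0|d)] = ln p(d|M1) - ln p(d|M0) + ln[p(M1)/p(M0)]. A minimal sketch with a hypothetical pair of models whose evidences are analytic: M0 says the data are pure noise, d_i ~ N(0, σ²); M1 adds an offset θ with prior θ ~ N(0, τ²). These stand in for "n" and "n+1" planet models only by analogy, and the function names are illustrative.

```python
import numpy as np
from scipy import stats


def log_evidence_no_signal(d, sigma):
    # M0: d_i ~ N(0, sigma^2) with no free parameters, so the evidence is just the likelihood
    return stats.norm.logpdf(d, 0.0, sigma).sum()


def log_evidence_signal(d, sigma, tau):
    # M1: d_i ~ N(theta, sigma^2), theta ~ N(0, tau^2); theta integrated out analytically
    n = d.size
    cov = sigma**2 * np.eye(n) + tau**2 * np.ones((n, n))
    return stats.multivariate_normal(mean=np.zeros(n), cov=cov).logpdf(d)


def log_odds_ratio(log_z_alt, log_z_null, log_prior_odds=0.0):
    # ln posterior odds = ln Z_alt - ln Z_null + ln prior odds (equal prior odds by default)
    return log_z_alt - log_z_null + log_prior_odds


d_signal = np.ones(20)    # hypothetical data with a clear offset
d_null = np.zeros(20)     # hypothetical data with no offset at all
for d in (d_signal, d_null):
    print(log_odds_ratio(log_evidence_signal(d, 1.0, 5.0), log_evidence_no_signal(d, 1.0)))
```

For this pair of models the log odds contain a term -(1/2) ln(1 + nτ²/σ²), so a very broad prior on θ penalizes M1 even for identical data; this is one reason prior choices matter when comparing evidences.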
  12. Statistical Challenges in Characterizing Exoplanets (repeat of slide 2)
  13. Different ways to do MCMC

    - Nelson, Ford, & Payne (2014): RUN DMC (Radial velocity Using N-body Differential evolution Markov Chain Monte Carlo)
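Differential evolution MCMC runs a population of chains and proposes a move for chain i along the difference of two other randomly chosen chains, x* = x_i + γ(x_a - x_b) + ε, so the proposal's scale and orientation adapt to the target automatically. The sketch follows the generic algorithm of ter Braak (2006) with his suggested γ = 2.38/√(2d); it is not the code from Nelson, Ford, & Payne (2014), and the function name and jitter size eps are illustrative choices. Chains are updated one at a time, so each proposal uses the current states of the others.

```python
import numpy as np


def de_mcmc(log_prob, x0, n_steps, gamma=None, eps=1e-4, seed=0):
    """Differential evolution MCMC (after ter Braak 2006).

    x0: initial population of shape (n_chains, n_dim) with n_chains >= 3.
    Returns the population at every step, shape (n_steps, n_chains, n_dim).
    """
    rng = np.random.default_rng(seed)
    x = np.array(x0, dtype=float)
    n_chains, n_dim = x.shape
    if gamma is None:
        gamma = 2.38 / np.sqrt(2 * n_dim)
    logp = np.array([log_prob(xi) for xi in x])
    out = np.empty((n_steps, n_chains, n_dim))
    for t in range(n_steps):
        for i in range(n_chains):
            # Two distinct chains other than i: draw from n_chains - 1 labels, then skip over i
            a, b = rng.choice(n_chains - 1, size=2, replace=False)
            a, b = a + (a >= i), b + (b >= i)
            proposal = x[i] + gamma * (x[a] - x[b]) + eps * rng.normal(size=n_dim)
            logp_new = log_prob(proposal)
            # Metropolis acceptance; the difference-vector proposal is symmetric
            if np.log(rng.uniform()) < logp_new - logp[i]:
                x[i], logp[i] = proposal, logp_new
        out[t] = x
    return out


# Hypothetical target: a correlated 2-D Gaussian
cov = np.array([[1.0, 0.8], [0.8, 1.0]])
prec = np.linalg.inv(cov)
x0 = np.random.default_rng(1).normal(size=(20, 2))
chain = de_mcmc(lambda x: -0.5 * x @ prec @ x, x0, n_steps=2000)
flat = chain[500:].reshape(-1, 2)      # drop burn-in, pool all chains
print(flat.mean(axis=0), np.cov(flat.T))
```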
  14. Different ways to do MCMC

    - Nelson, Ford, & Payne (2014): RUN DMC (Radial velocity Using N-body Differential evolution Markov Chain Monte Carlo)
    - Goodman & Weare (2010); Foreman-Mackey+ (2013, emcee): affine-invariant ensemble sampler
    - Salvatier, Wiecki, & Fonnesbeck (2016, PyMC3): wide variety of samplers
    - Hoffman & Gelman (2011); Carpenter+ (2017, Stan): Hamiltonian Monte Carlo, No-U-Turn Sampler

    Comparing the performance of these Python packages:
    - jakevdp.github.io/blog/2014/06/14/frequentism-and-bayesianism-4-bayesian-in-python/
    - andrewgelman.com/2015/10/15/whats-the-one-thing-you-have-to-know-about-pystan-and-pymc-click-here-to-find-out/
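The affine-invariant ensemble sampler of Goodman & Weare (2010) moves walker k along the line through another walker j: y = x_j + z(x_k - x_j), with z drawn from g(z) ∝ 1/√z on [1/a, a], and accepts with probability min(1, z^(d-1) p(y)/p(x_k)). Below is a minimal serial version of that stretch move in NumPy; it is not emcee's implementation (emcee uses a parallelized variant), and the function name is illustrative. The default a = 2 is the commonly used value.

```python
import numpy as np


def stretch_move_sampler(log_prob, x0, n_steps, a=2.0, seed=0):
    """Serial affine-invariant ensemble sampler using the stretch move.

    x0: initial walker positions, shape (n_walkers, n_dim).
    Returns walker positions at every step, shape (n_steps, n_walkers, n_dim).
    """
    rng = np.random.default_rng(seed)
    x = np.array(x0, dtype=float)
    n_walkers, n_dim = x.shape
    logp = np.array([log_prob(xi) for xi in x])
    out = np.empty((n_steps, n_walkers, n_dim))
    for t in range(n_steps):
        for k in range(n_walkers):
            j = rng.integers(n_walkers - 1)
            if j >= k:                                   # pick any walker except k
                j += 1
            # Draw z from g(z) ∝ 1/sqrt(z) on [1/a, a] by inverting its CDF
            z = (1.0 + (a - 1.0) * rng.uniform()) ** 2 / a
            proposal = x[j] + z * (x[k] - x[j])
            logp_new = log_prob(proposal)
            # The z^(d-1) factor keeps the move in detailed balance
            if np.log(rng.uniform()) < (n_dim - 1) * np.log(z) + logp_new - logp[k]:
                x[k], logp[k] = proposal, logp_new
        out[t] = x
    return out


# Hypothetical target: strongly correlated, very differently scaled 2-D Gaussian
cov = np.array([[1.0, 9.0], [9.0, 100.0]])
prec = np.linalg.inv(cov)
x0 = np.random.default_rng(2).normal(size=(32, 2)) * np.array([1.0, 10.0])
chain = stretch_move_sampler(lambda x: -0.5 * x @ prec @ x, x0, n_steps=2000)
flat = chain[500:].reshape(-1, 2)
print(np.cov(flat.T))
```

Because the move is built from differences between walkers, a linear rescaling or shear of the parameters does not change its behavior, which is why the badly scaled target above poses no special difficulty.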
  15. Statistical Challenges in Characterizing Exoplanets (repeat of slide 2)
  16. Modeling Two Overlapping Populations

    [Figure: probabilistic graphical model drawn with daft (pip install daft), with nodes for population-level parameters, individual-level parameters, and data.]
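A hierarchical model of this kind links population-level parameters to individual-level parameters to data. A minimal sketch for two overlapping populations, under hypothetical Gaussian assumptions (not the model used in the talk): each object's true value x_i comes from population 1 with probability frac, otherwise from population 2, each population Gaussian with mean mu_k and width scale_k, and each measurement y_i adds Gaussian noise with known uncertainty yerr_i. Because everything is Gaussian, each x_i can be integrated out analytically, leaving a likelihood for the population-level parameters alone; the same pieces give each object's probability of belonging to each population. In a full analysis the population-level parameters would be sampled rather than scanned on a grid, and all names here are illustrative.

```python
import numpy as np
from scipy import stats
from scipy.special import logsumexp


def _log_components(y, yerr, frac, mu, scale):
    """ln[weight_k * p(y_i | population k)] with each true value x_i integrated out."""
    mu = np.asarray(mu, dtype=float)
    scale = np.asarray(scale, dtype=float)
    # x_i ~ N(mu_k, scale_k^2) and y_i ~ N(x_i, yerr_i^2)  =>  y_i ~ N(mu_k, scale_k^2 + yerr_i^2)
    total_sd = np.sqrt(scale[None, :] ** 2 + yerr[:, None] ** 2)
    log_comp = stats.norm.logpdf(y[:, None], loc=mu[None, :], scale=total_sd)
    return log_comp + np.log([frac, 1.0 - frac])[None, :]


def mixture_log_likelihood(y, yerr, frac, mu, scale):
    """Log-likelihood of the population-level parameters (frac, mu, scale)."""
    return logsumexp(_log_components(y, yerr, frac, mu, scale), axis=1).sum()


def membership_probabilities(y, yerr, frac, mu, scale):
    """Probability that each object belongs to population 1 or 2, shape (n_obj, 2)."""
    lc = _log_components(y, yerr, frac, mu, scale)
    return np.exp(lc - logsumexp(lc, axis=1, keepdims=True))


# Hypothetical simulated catalog: 70% of objects drawn from population 1
rng = np.random.default_rng(4)
n = 500
from_pop1 = rng.uniform(size=n) < 0.7
x_true = np.where(from_pop1, rng.normal(0.0, 1.0, n), rng.normal(5.0, 1.0, n))
yerr = np.full(n, 0.5)
y = x_true + rng.normal(0.0, yerr)
fracs = np.linspace(0.01, 0.99, 99)
log_l = [mixture_log_likelihood(y, yerr, f, mu=[0.0, 5.0], scale=[1.0, 1.0]) for f in fracs]
print(fracs[np.argmax(log_l)])
```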
  17. Sampling a 300+ Dimensional Space

    Hamiltonian Monte Carlo (arXiv: 1701.02434) + No-U-Turn Sampler (Hoffman & Gelman 2011)
  18. Sampling a 300+ Dimensional Space (repeat of slide 17)
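Hamiltonian Monte Carlo treats -ln p(x) as a potential energy, gives the state a random Gaussian momentum, follows approximately energy-conserving leapfrog trajectories driven by the gradient of ln p, and then applies a Metropolis correction for the integration error. This is why it scales to spaces with hundreds of parameters. A minimal sketch with a unit mass matrix and a fixed step size and number of leapfrog steps (all illustrative choices); the No-U-Turn Sampler instead chooses the trajectory length adaptively, and Stan also tunes the step size and mass matrix during warmup. The demo target is a hypothetical 300-dimensional independent Gaussian.

```python
import numpy as np


def hmc(log_prob, grad_log_prob, x0, n_samples, step_size, n_leapfrog, seed=0):
    """Basic HMC with a unit mass matrix and a fixed trajectory length."""
    rng = np.random.default_rng(seed)
    x = np.array(x0, dtype=float)
    samples = np.empty((n_samples, x.size))
    n_accept = 0
    for s in range(n_samples):
        p = rng.normal(size=x.size)              # fresh Gaussian momentum
        x_new, p_new = x.copy(), p.copy()
        # Leapfrog: half momentum step, alternating full steps, final half momentum step
        p_new += 0.5 * step_size * grad_log_prob(x_new)
        for step in range(n_leapfrog):
            x_new += step_size * p_new
            if step < n_leapfrog - 1:
                p_new += step_size * grad_log_prob(x_new)
        p_new += 0.5 * step_size * grad_log_prob(x_new)
        # Metropolis correction with total energy H = -log_prob(x) + |p|^2 / 2
        h_old = -log_prob(x) + 0.5 * p @ p
        h_new = -log_prob(x_new) + 0.5 * p_new @ p_new
        if np.log(rng.uniform()) < h_old - h_new:
            x = x_new
            n_accept += 1
        samples[s] = x
    return samples, n_accept / n_samples


# Hypothetical target: 300 independent Gaussians with widths between 0.8 and 1.2
sigmas = np.linspace(0.8, 1.2, 300)
samples, acc_rate = hmc(
    log_prob=lambda x: -0.5 * np.sum((x / sigmas) ** 2),
    grad_log_prob=lambda x: -x / sigmas**2,
    x0=np.zeros(300), n_samples=600, step_size=0.1, n_leapfrog=15,
)
print(acc_rate, np.mean(samples[100:].var(axis=0) / sigmas**2))
```

With a fixed trajectory length, some components can swing back near where they started, which wastes computation; avoiding that by stopping when the trajectory starts to turn back is the motivation for the No-U-Turn Sampler.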
  19. Multiple hot Jupiter (HJ) populations can be inferred from current data

    - RV+Kepler data are well explained as a single population with xl ≈ 2.
    - For HAT+WASP data, 85% of HJs are consistent with a high-e migration history and 15% with a disk migration history.
    - These conclusions hold within the limitations of our chosen models…
  20. Want to play around with different sampling methods?

    chi-feng.github.io/mcmc-demo/