
Gaussian process models of correlated noise in microlensing light curves


Talk given at the 23rd International Microlensing Conference at the Flatiron Institute in NYC.

Fran Bartolić

January 30, 2019

Transcript

  1. Gaussian process models of correlated noise in microlensing light curves

    23rd International Microlensing Conference @ Flatiron CCA • Fran Bartolić, University of St Andrews • fbartolic
  2. Introduction • Observed data = deterministic forward model + probabilistic noise model
  3. Introduction • Likelihood assuming independent noise: p(f | θ) = ∏_{n=1}^{N} N(f_n | μ(t_n; θ), σ_n²), where μ is the mean function (the deterministic forward model), N is the number of data points, f_n are the measured fluxes, and σ_n are the reported error bars
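The independent-noise likelihood described on this slide can be sketched in a few lines; a minimal NumPy version, assuming the mean-function fluxes have already been computed:

```python
import numpy as np

def log_likelihood_independent(f_obs, f_model, sigma):
    """Gaussian log-likelihood assuming independent noise: each measured
    flux f_n ~ Normal(mean=f_model_n, sd=sigma_n)."""
    f_obs, f_model, sigma = map(np.asarray, (f_obs, f_model, sigma))
    resid = f_obs - f_model
    return -0.5 * np.sum(resid**2 / sigma**2 + np.log(2 * np.pi * sigma**2))
```

This is the baseline that the correlated-noise (GP) likelihood later in the talk generalizes.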
  4. Deterministic forward model: Point Source Point Lens • The mean function is parametrized by the PSPL parameters • A reparametrization is necessary for efficient sampling
  5. Deterministic forward model: Point Source Point Lens • Change of parametrization (Dominik 2009)
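For reference, the standard PSPL mean function (the Paczyński magnification curve) in the usual (t0, u0, tE) parametrization, i.e. before any change of parametrization of the kind Dominik (2009) proposes; a minimal sketch:

```python
import numpy as np

def pspl_magnification(t, t0, u0, tE):
    """Point Source Point Lens magnification (Paczynski curve).
    t0: time of peak, u0: impact parameter in Einstein radii,
    tE: Einstein-radius crossing time."""
    u = np.sqrt(u0**2 + ((t - t0) / tE) ** 2)
    return (u**2 + 2.0) / (u * np.sqrt(u**2 + 4.0))
```

At large impact parameter the magnification tends to 1, and it diverges as u0 goes to 0, which is one source of the sampling difficulties that motivate reparametrization.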
  6. The likelihood function • In matrix form, the likelihood is a multivariate Gaussian with covariance matrix C • Assuming independent noise, C is diagonal, with the reported variances σ_n² on the diagonal • A probabilistic noise model for correlated noise replaces this with a dense covariance matrix
  7. Gaussian processes (GPs) - theory • Formally, GPs are probability distributions over functions • In practice, GPs are multivariate Gaussians with a covariance matrix specified by a covariance (kernel) function • GPs are used extensively in machine learning, statistics, and astronomy for both regression and classification problems [Figure: discrete vs. continuous view of a GP]
  8. 8 Gaussian processes (GPs) - theory • The covariance function

    models the covariance between any two points • Covariance functions are parametrized with “hyperparameters” Squared exponential kernel
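A minimal implementation of the squared exponential kernel from this slide, with the amplitude and length scale as the "hyperparameters":

```python
import numpy as np

def squared_exponential(t1, t2, amplitude, length_scale):
    """Squared exponential kernel:
    k(t1, t2) = amplitude^2 * exp(-(t1 - t2)^2 / (2 * length_scale^2)).
    Returns the full covariance matrix for two sets of input times."""
    dt = np.subtract.outer(t1, t2)
    return amplitude**2 * np.exp(-0.5 * (dt / length_scale) ** 2)
```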
  9. 9 Gaussian processes (GPs) - theory Matern 3/2 kernel •

    The covariance function models the covariance between any two points • Covariance functions are parametrized with “hyperparameters”
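The Matern 3/2 kernel can be sketched the same way; the formula below is the standard one, with the same two hyperparameters:

```python
import numpy as np

def matern32(t1, t2, amplitude, length_scale):
    """Matern 3/2 kernel:
    k(r) = amplitude^2 * (1 + sqrt(3) r / l) * exp(-sqrt(3) r / l),
    where r = |t1 - t2|. Its sample paths are rougher (only once
    differentiable) than those of the squared exponential kernel."""
    r = np.abs(np.subtract.outer(t1, t2))
    arg = np.sqrt(3.0) * r / length_scale
    return amplitude**2 * (1.0 + arg) * np.exp(-arg)
```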
  10. Gaussian processes (GPs) - likelihood function • Probabilistic noise model • Inverting an N×N covariance matrix scales as O(N³) • Fortunately, Dan Foreman-Mackey's celerite library evaluates GP likelihoods in O(N) • Other approximate schemes, such as variationally sparse GPs as implemented in GPflow, are potentially interesting
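To make the O(N³) cost concrete, here is the direct Cholesky-based evaluation of the GP marginal likelihood that celerite's O(N) solver avoids; a minimal NumPy sketch over the residuals (data minus mean function):

```python
import numpy as np

def gp_log_likelihood(resid, K):
    """GP marginal log-likelihood of residuals under a zero-mean
    multivariate Gaussian with covariance matrix K. The Cholesky
    factorization below is the O(N^3) step."""
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L, resid)            # L^{-1} r
    log_det = 2.0 * np.sum(np.log(np.diag(L)))   # log |K|
    n = len(resid)
    return -0.5 * (alpha @ alpha + log_det + n * np.log(2 * np.pi))
```

With a diagonal K this reduces to the independent-noise likelihood from the introduction.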
  11. Priors • There is no such thing as a non-informative prior • Priors can only be understood in the context of the likelihood • Prior predictive checks are a great way of testing assumptions • Using Bayesian methods and GPs doesn't mean you can't overfit; use an informative prior for the length-scale hyperparameter (Fuglstad et al. 2018) • Inverse Gamma prior
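As an illustration of an inverse-gamma length-scale prior, a small sketch using scipy.stats.invgamma; the shape and scale values here are hypothetical placeholders, to be tuned via prior predictive checks against the cadence and baseline of the actual data:

```python
from scipy.stats import invgamma

# Hypothetical shape/scale values: in practice, choose them so the prior
# places little mass on length scales shorter than the sampling cadence
# (the overfitting regime) or longer than the observing baseline.
alpha, beta = 3.0, 5.0
length_scale_prior = invgamma(alpha, scale=beta)

# Prior predictive check: draw length scales and inspect the implied range.
samples = length_scale_prior.rvs(size=1000, random_state=42)
```

The inverse-gamma density vanishes at zero, which is what suppresses pathologically short length scales.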
  12. Sampling the posterior with Hamiltonian Monte Carlo (HMC) • To sample the posterior, I use Hamiltonian Monte Carlo (HMC); it's orders of magnitude more efficient than Metropolis-Hastings or affine-invariant samplers (emcee) • HMC is pretty much the only thing that works in high dimensions (10s to 100s of parameters) • HMC requires the gradient of the log-likelihood with respect to all model parameters, so automatic differentiation is key • Don't write your own HMC sampler; use existing libraries such as PyMC3, Stan, or TensorFlow, which will complain when sampling fails, which happens very often! [Equation: Bayes' theorem over the model parameters]
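Purely as an illustration of why HMC needs gradients (per the slide, use PyMC3 or Stan in practice rather than rolling your own), a single leapfrog trajectory on a 1-D standard normal target:

```python
import numpy as np

def grad_log_p(q):
    """Gradient of log N(0, 1) up to a constant; HMC needs this."""
    return -q

def leapfrog(q, p, step, n_steps):
    """Leapfrog integration of Hamiltonian dynamics for position q
    and momentum p: half momentum step, alternating full steps,
    half momentum step."""
    p = p + 0.5 * step * grad_log_p(q)
    for _ in range(n_steps - 1):
        q = q + step * p
        p = p + step * grad_log_p(q)
    q = q + step * p
    p = p + 0.5 * step * grad_log_p(q)
    return q, p

def hamiltonian(q, p):
    """Potential energy (-log target) plus kinetic energy."""
    return 0.5 * q**2 + 0.5 * p**2
```

The symplectic leapfrog integrator nearly conserves the Hamiltonian over long trajectories, which is what lets HMC make distant proposals with high acceptance rates.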
  13. 13 Results

  14. 14 Results • Generally, including a GP in the model

    leads to a different posterior over the physical parameters of interest • Just how different the posterior is depends on data quality
  15. 15 Extending the model • How to deal with outliers?

    - robust GPs with Student T noise, mixture model, sigma clipping? • How to deal with reported error bars? - hierarchical model for rescaling factors? • How to incorporate other information in the noise model? - Simultaneously fitting GPs to other stars in the field, tractable approximations of multi-dimensional GPs? • GPs with binary lens events? - Sort out the model without GPs first
  16. Take-home messages • Modeling assumptions matter • If you're doing Bayesian analysis, state your likelihood function and your priors • Clever parametrizations can speed up MCMC by several orders of magnitude • GPs provide an elegant framework for handling correlated noise, and recent innovations make them computationally tractable • If you want to use gradient-based optimizers or samplers, look into machine learning frameworks such as TensorFlow and PyTorch • Hack session ideas: microlensing data handling infrastructure and cross-matching catalogs; interfacing VBBinaryLensing with the DNest4 diffusive nested sampling code; forward modeling light curves with an inverse-ray-shooting algorithm built on TensorFlow
  17. 17 Additional slides - ugly posteriors