Fran Bartolić
January 30, 2019

Gaussian process models of correlated noise in microlensing light curves

Talk at the 23rd International Microlensing Conference at the Flatiron Institute in NYC.

Transcript

1. Gaussian process models of correlated noise in microlensing light curves

23rd International Microlensing Conference @ Flatiron CCA
Fran Bartolić, University of St Andrews (fbartolic)

3. Introduction

• Deterministic forward model: the mean function m(t; θ) predicts the flux at time t given model parameters θ
• Probabilistic noise model
• Likelihood (assuming independent noise):

ln p(F | θ) = -1/2 Σ_i [ (F_i - m(t_i; θ))^2 / σ_i^2 + ln(2π σ_i^2) ]

where N is the number of data points, F_i are the measured fluxes, and σ_i are the reported error bars.
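The independent-noise log-likelihood above takes only a few lines of NumPy. A minimal sketch (function and variable names are illustrative, not from the talk):

```python
import numpy as np

def log_likelihood(fluxes, model_fluxes, sigmas):
    """Gaussian log-likelihood assuming independent (uncorrelated) noise.

    fluxes       -- measured fluxes F_i
    model_fluxes -- mean-function predictions m(t_i; theta)
    sigmas       -- reported error bars sigma_i
    """
    resid = fluxes - model_fluxes
    return -0.5 * np.sum(resid**2 / sigmas**2 + np.log(2 * np.pi * sigmas**2))
```

Each data point contributes an independent Gaussian term, so the total is a simple sum; the correlated-noise case replaces this sum with a matrix expression, as shown on a later slide.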
4. Point Source Point Lens: deterministic forward model

• Mean function: F(t) = F_s A(u(t)) + F_b, with magnification A(u) = (u^2 + 2) / (u sqrt(u^2 + 4)) and lens-source separation u(t) = sqrt(u_0^2 + ((t - t_0)/t_E)^2)
• Parameters: (t_0, u_0, t_E, F_s, F_b)
• Reparametrization is necessary for efficient sampling
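The standard PSPL mean function can be sketched directly from these formulas (parameter names are illustrative):

```python
import numpy as np

def pspl_flux(t, t0, u0, tE, F_s, F_b):
    """Point Source Point Lens mean function.

    u(t) -- lens-source separation in Einstein radii
    A(u) -- point-source point-lens magnification
    Returns the blended model flux F_s * A(u(t)) + F_b.
    """
    u = np.sqrt(u0**2 + ((t - t0) / tE) ** 2)
    A = (u**2 + 2) / (u * np.sqrt(u**2 + 4))
    return F_s * A + F_b
```

At the peak (t = t_0) the separation is u_0, so for u_0 = 1 the magnification is 3/sqrt(5) ≈ 1.34; far from the peak the magnification tends to 1 and the flux approaches F_s + F_b.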
5. Point Source Point Lens: deterministic forward model

• Change of parametrization (Dominik 2009)
6. The likelihood function

• Probabilistic noise model with covariance matrix C
• Likelihood (assuming independent noise), written in matrix form:

ln p(F | θ) = -1/2 r^T C^{-1} r - 1/2 ln det(2πC)

where r = F - m(θ) is the vector of residuals and C = diag(σ_1^2, …, σ_N^2)
7. Gaussian processes (GPs): theory

• Formally, GPs are probability distributions over functions
• In practice, a GP evaluated at a discrete set of points is a multivariate Gaussian with a covariance matrix specified by a covariance (kernel) function
• GPs are used extensively in machine learning, statistics, and astronomy for both regression and classification problems
8. Gaussian processes (GPs): theory

• The covariance function models the covariance between any two points
• Covariance functions are parametrized with "hyperparameters"
• Squared exponential kernel: k(t_i, t_j) = σ^2 exp(-(t_i - t_j)^2 / (2ℓ^2))
9. Gaussian processes (GPs): theory

• Matérn 3/2 kernel: k(t_i, t_j) = σ^2 (1 + sqrt(3) |t_i - t_j| / ℓ) exp(-sqrt(3) |t_i - t_j| / ℓ)
• The covariance function models the covariance between any two points
• Covariance functions are parametrized with "hyperparameters"
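Both kernels can be written out directly as dense covariance matrices. A minimal sketch (function names are illustrative):

```python
import numpy as np

def squared_exponential(t1, t2, sigma, ell):
    """k(t_i, t_j) = sigma^2 * exp(-(t_i - t_j)^2 / (2 * ell^2))"""
    tau = t1[:, None] - t2[None, :]
    return sigma**2 * np.exp(-tau**2 / (2 * ell**2))

def matern32(t1, t2, sigma, ell):
    """k(t_i, t_j) = sigma^2 * (1 + sqrt(3)|tau|/ell) * exp(-sqrt(3)|tau|/ell)"""
    r = np.sqrt(3) * np.abs(t1[:, None] - t2[None, :]) / ell
    return sigma**2 * (1 + r) * np.exp(-r)
```

Both produce symmetric positive semidefinite matrices with sigma^2 on the diagonal; the hyperparameters sigma (amplitude) and ell (length scale) control how large and how smooth the correlated noise is.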
10. Gaussian processes (GPs): the likelihood function

• Probabilistic noise model: evaluating the GP likelihood requires inverting the covariance matrix
• Inverting a dense N × N matrix scales as O(N^3)
• Fortunately, Dan Foreman-Mackey's celerite library evaluates GP likelihoods in O(N) for a restricted class of kernels
• Other approximate schemes, such as variationally sparse GPs as implemented in GPflow, are potentially interesting
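A naive GP log-likelihood, evaluated via a Cholesky factorization of the dense covariance matrix, makes the O(N^3) cost concrete; celerite computes the same quantity in O(N) for its kernel family. A minimal NumPy sketch using a Matérn 3/2 kernel (names are illustrative):

```python
import numpy as np

def gp_log_likelihood(t, y, yerr, sigma, ell):
    """GP log-likelihood with a Matern 3/2 kernel plus white measurement noise.

    Builds the dense N x N covariance matrix and factorizes it with a
    Cholesky decomposition, so the cost scales as O(N^3).
    """
    r = np.sqrt(3) * np.abs(t[:, None] - t[None, :]) / ell
    K = sigma**2 * (1 + r) * np.exp(-r) + np.diag(yerr**2)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))  # K^{-1} y
    log_det = 2 * np.sum(np.log(np.diag(L)))             # ln det K
    n = len(t)
    return -0.5 * (y @ alpha + log_det + n * np.log(2 * np.pi))
```

Here y would be the residuals after subtracting the mean function. Swapping this for celerite leaves the returned number unchanged (for a celerite-representable kernel) but replaces the cubic-cost factorization with a linear-cost one.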
11. Priors

• There is no such thing as a non-informative prior
• Priors can only be understood in the context of the likelihood
• Prior predictive checks are a great way of testing assumptions
• Using Bayesian methods and GPs doesn't mean you can't overfit; you need an informative prior, such as an Inverse Gamma prior, for the length scale hyperparameter (Fuglstad et al. 2018)
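The Inverse Gamma prior suppresses implausibly short length scales, which would otherwise let the GP absorb the signal itself. A quick SciPy check, with shape and scale values chosen purely for illustration:

```python
from scipy.stats import invgamma

# Hypothetical hyperparameters; in practice, tune them so only a small
# prior mass sits below the smallest time separation in the data.
alpha, beta = 2.0, 5.0
prior = invgamma(alpha, scale=beta)

# The density vanishes rapidly as the length scale -> 0, penalizing
# overfitting, while remaining broad around the mode beta / (alpha + 1).
print(prior.pdf(0.01))       # essentially zero
print(prior.pdf(beta / (alpha + 1)))  # density at the mode
print(prior.cdf(0.5))        # small prior mass at very short length scales
```

A prior predictive check here is as simple as drawing length scales from this prior, generating GP samples with each, and asking whether the simulated light curves look like plausible data.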
12. Sampling the posterior with Hamiltonian Monte Carlo (HMC)

• To sample the posterior, I use Hamiltonian Monte Carlo (HMC); it is orders of magnitude more efficient than Metropolis-Hastings or affine-invariant samplers (emcee)
• HMC is pretty much the only thing that works in high dimensions (tens to hundreds of parameters)
• HMC requires the gradient of the log-likelihood with respect to all model parameters, so automatic differentiation is key
• Don't write your own HMC sampler; use existing libraries such as PyMC3, Stan, or TensorFlow, which will complain when sampling fails (which happens very often!)

14. Results

• Generally, including a GP in the model leads to a different posterior over the physical parameters of interest
• Just how different the posterior is depends on data quality
15. Extending the model

• How to deal with outliers? Robust GPs with Student's t noise, a mixture model, sigma clipping?
• How to deal with the reported error bars? A hierarchical model for rescaling factors?
• How to incorporate other information in the noise model? Simultaneously fitting GPs to other stars in the field, tractable approximations of multi-dimensional GPs?
• GPs with binary lens events? Sort out the model without GPs first
16. Take-home messages

• Modeling assumptions matter
• If you're doing Bayesian analysis, state your likelihood function and your priors
• Clever parametrizations can speed up MCMC by several orders of magnitude
• GPs provide an elegant framework for handling correlated noise, and recent innovations make them computationally tractable
• If you want to use gradient-based optimizers or samplers, look into machine learning frameworks such as TensorFlow and PyTorch

Hack session ideas:

• Microlensing data handling infrastructure, cross-matching catalogs
• Interfacing VBBinaryLensing with the DNest4 Diffusive Nested Sampling code
• Forward modeling light curves with an inverse-ray-shooting algorithm built on TensorFlow