
Nathan Hara

(Université de Genève)

https://s3-seminar.github.io/seminars/nathan-hara/

Title — An optimal exoplanet detection criterion

Abstract — Over 4000 exoplanets have been detected so far. They have deeply transformed our understanding of planetary system formation, and expanded our possibilities to search for life outside of Earth. The smaller exoplanets are and the farther they orbit from their host stars, the harder they are to detect. Earth « twins » orbiting solar-type stars are still out of reach because of very complex astrophysical and instrumental noise. To overcome this difficulty, we need new methods to analyse unevenly sampled, multivariate time series: better models, computational methods and decision rules to claim detections. In this talk I will mostly focus on the last aspect. Exoplanet detections are claimed based on the value of a statistical significance metric: if it is greater than a certain threshold, a detection is claimed. I will address the question of the optimal significance metric in the general setting of detection of parametric signals, and advocate for a Bayesian hypothesis testing framework where hypotheses are indexed by continuous variables.

Biography — Nathan Hara has been a research fellow at the University of Geneva since 2017, which he joined after a PhD with Jacques Laskar and Gwenaël Boué at Paris Observatory. He works on statistical techniques to detect exoplanets, in particular Earth twins, which are prime candidates for the detection of life outside of Earth, and on observational programs to unveil multi-planetary systems.

S³ Seminar

April 22, 2022

Transcript

  1. An optimal detection criterion for parametric signals
    Nathan Hara
    Université de Genève
    GPRV, Oxford
    29 March 2022
    With Thibault de Poyferré, Jean-Baptiste Delisle, Marc Hoffmann,
    Nicolas Unger, Rodrigo Díaz, Damien Ségransan

  2. Radial velocity
    [Figure labels: star, spectrograph, observer]

  3. Detecting exoplanets in RV data SCMA VII
    Radial velocities
    [Figure, two examples, each with two panels: "Motion of the star in the observational reference frame" (x, y, z in AU, with the direction to the observer) and "Radial velocity as a function of time" (velocity along the z axis in m/s versus time in days)]
    Decomposition of the signal in periodic components
    The amplitude of a periodic component is proportional to the planet projected mass

  4. Detecting exoplanets in RV data SCMA VII
    Signal shape
    The signal shape depends on the orbital eccentricity
    [Figure: orbit (star and planet) and radial velocity signal for nearly circular, eccentric, and very eccentric orbits. Credit: Perryman 2011]

  5. Detecting exoplanets in RV data SCMA VII
    Data model
    $$y(t) = \sum_{i=1}^{n_p} K_i \left( \cos(\nu_i(t) + \omega_i) + e_i \cos\omega_i \right) + (\text{other deterministic terms}) + \text{noise}$$
    $y(t)$: radial velocity time-series
    Sum over $i$: sum of Keplerian components
    Noise: stochastic term
    [Figure: radial velocity signal. Credit: Perryman 2011]
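
    The Keplerian sum of the data model can be evaluated numerically. What follows is a minimal Python sketch, assuming the usual conventions: ν is the true anomaly obtained from Kepler's equation, ω the argument of periastron, and M0 the mean anomaly at t = 0. The function and argument names are illustrative and do not come from the talk.

```python
import math

def solve_kepler(mean_anomaly, ecc, tol=1e-12, max_iter=50):
    """Solve Kepler's equation E - e sin E = M for E with Newton's method."""
    m = mean_anomaly % (2.0 * math.pi)
    ecc_anom = m if ecc < 0.8 else math.pi  # common starting guesses
    for _ in range(max_iter):
        step = (ecc_anom - ecc * math.sin(ecc_anom) - m) / (1.0 - ecc * math.cos(ecc_anom))
        ecc_anom -= step
        if abs(step) < tol:
            break
    return ecc_anom

def true_anomaly(mean_anomaly, ecc):
    """True anomaly nu from the mean anomaly, through the eccentric anomaly."""
    ecc_anom = solve_kepler(mean_anomaly, ecc)
    return 2.0 * math.atan2(math.sqrt(1.0 + ecc) * math.sin(ecc_anom / 2.0),
                            math.sqrt(1.0 - ecc) * math.cos(ecc_anom / 2.0))

def keplerian_rv(t, K, period, ecc, omega, M0):
    """One Keplerian term: K (cos(nu(t) + omega) + e cos(omega))."""
    nu = true_anomaly(M0 + 2.0 * math.pi * t / period, ecc)
    return K * (math.cos(nu + omega) + ecc * math.cos(omega))

def rv_model(t, planets, offset=0.0):
    """Sum of Keplerian terms plus a constant offset (the simplest deterministic term)."""
    return offset + sum(keplerian_rv(t, **p) for p in planets)

# Hypothetical two-planet system, evaluated at t = 3 days.
planets = [dict(K=2.0, period=10.0, ecc=0.1, omega=0.3, M0=0.5),
           dict(K=0.5, period=37.0, ecc=0.4, omega=1.2, M0=2.0)]
print(rv_model(3.0, planets, offset=1.0))
```

    For a circular orbit (ecc = 0) each term reduces to K cos(M0 + 2πt/P + ω), a pure sinusoid.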


  6. Detecting exoplanets in RV data SCMA VII
    Noise I
    Photon noise
    Instrumental systematics
    Example: SOPHIE drift as a function of time (credit: François Bouchy). Not on all spectrographs!
    Nominal error bars + jitter (instrumental and/or stellar)
    Error on measurement $i$: $\sqrt{\sigma_i^2 + \sigma_J^2}$, with $\sigma_i$ the nominal error bar and $\sigma_J$ the jitter
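
    The quadrature sum of nominal error and jitter enters a white-noise likelihood directly. The sketch below assumes independent Gaussian errors; the function names are illustrative and not taken from the talk.

```python
import math

def total_sigma(sigma_nominal, sigma_jitter):
    """Per-point error: nominal error bar and jitter added in quadrature."""
    return [math.sqrt(s * s + sigma_jitter * sigma_jitter) for s in sigma_nominal]

def white_noise_loglike(residuals, sigma_nominal, sigma_jitter):
    """Gaussian log-likelihood of residuals (data minus model) with inflated errors."""
    sig = total_sigma(sigma_nominal, sigma_jitter)
    return sum(-0.5 * (r / s) ** 2 - math.log(s) - 0.5 * math.log(2.0 * math.pi)
               for r, s in zip(residuals, sig))

# Hypothetical residuals in m/s with 1 m/s nominal errors.
residuals = [0.5, -2.1, 3.0, -0.7]
nominal = [1.0, 1.0, 1.0, 1.0]
for jitter in (0.0, 1.0, 2.0):
    print(jitter, white_noise_loglike(residuals, nominal, jitter))
```

    Because the jitter also appears in the normalisation term log(s), it can be fitted as a free parameter without growing without bound.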


  7. Detecting exoplanets in RV data SCMA VII
    Noise II: stellar activity
    Convection cells on the surface of the star (granulation): creates correlated noise
    Stochastic appearance of spots and faculae on the surface, plus inhibition of the convective blueshift: creates noise at the time scale of the stellar rotation period
    Saar & Donahue 1997, Meunier et al. 2010, Boisse et al. 2011, Dumusque et al. 2014
    See also Cegla 2019
    [Figure labels: approaching limb, receding limb; P-modes, granulation, meso-granulation, super-granulation. From Dumusque et al. 2011. Credit: NASA/SDO]

  8. Detecting exoplanets in RV data SCMA VII
    Stellar activity effect
    Simulated observations: System 1, RV fitting challenge (Dumusque et al. 2017)
    [Figure: "Ideal and noisy RV signals", RV (m/s) versus time (days), showing observations and ideal signal]
    [Figure: "Ideal and noisy RV signals, Generalized Lomb-Scargle periodogram", normalized RSS versus period (days), showing observations and ideal signal; annotations: periods of the injected planets, low-frequency structures, stellar rotation]

  9. Detecting exoplanets in RV data SCMA VII
    Radial velocity time-series: summary
    Main characteristics
    • Unevenly sampled: close to ~1 day sampling step, with missing samples
    • ~40 to 1000 data points
    • Corrupted by uncorrelated and complex correlated noise
    Objectives
    • How many planets?
    • With what orbital elements?
    [Figure: ΔRV (m/s) versus date (BJD − 2,450,000.0, in days)]

  10. Detecting exoplanets in RV data SCMA VII
    Challenge II: dealing with the complex noises
    The Sun is observed as a planet-hosting star
    [Figure: Sun radial velocities observed by HARPS-N, with the expected signal due to the Earth. Credit: Annelies Mortier. Solar image credit: NASA/SDO]
    Dumusque et al. 2021, Collier-Cameron et al. 2019
    We need to deal with these complex noises (+ instrumental effects) to detect exo-Earths

  11. RV data analysis in a nutshell
    $$p(\theta, \eta \mid y) \approx p(\theta, \eta \mid I, \widehat{RV}) = \frac{p(I, \widehat{RV} \mid \theta, \eta)\, p(\theta, \eta)}{p(I, \widehat{RV})}$$
    $$\widehat{RV} = RV_{\text{center of mass}} + RV_{\text{contam}} + \text{measurement error}$$
    $\theta$: planet parameters
    $\eta$: other parameters
    $y$: data
    $I$: shape variation indicators
    $RV_{\text{center of mass}}$ is a pure Doppler shift
    Other effects also affect the spectral shape
    How to reduce the spectrum?
    What model do I use?
    How do I compute everything?
    Based on this, how do I take a decision on the number of planets?
    Upcoming review with Eric Ford

  12. RV data analysis in a nutshell
    reduce model compute decide
    (reduce model decide)
    compute


  13. Decision: how many planets?
    Periodograms
    Fast, numerically stable, but looks for one planet at a time
    FAP < 0.1 %
    Lomb 1976, Ferraz-Mello 1981, Scargle 1982, Baluev 2008, 2009, 2013, 2015, Zechmeister & Kürster 2009, Sulis 2016
    [Figure: "Generalized Lomb-Scargle periodogram, 3 sines with SNR 10", normalized RSS versus period (days), showing periodogram, true spectrum, tallest peak]
    Bayesian techniques
    More precise, but much heavier computational workload, convergence not trivial to ensure, not giving information on the period
    $$E_{n\ \text{planets}} = \int p(y \mid \theta, n)\, p(\theta \mid n)\, d\theta$$
    (Evidence n + 1 planets) / (Evidence n planets) > 150
    Gregory 2007, Gregory & Ford 2007, Tuomi et al. 2011, Diaz et al. 2016
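
    To make the periodogram side of this comparison concrete, here is a small Python sketch of a least-squares periodogram. It is a simplification and not the generalized Lomb-Scargle periodogram cited above: it assumes no offset and equal weights, and fits A cos(ωt) + B sin(ωt) at each trial frequency. All names and the simulated data are illustrative.

```python
import math
import random

def sine_fit_power(times, values, omega):
    """Fraction of the sum of squares explained by A cos(wt) + B sin(wt)."""
    c = [math.cos(omega * t) for t in times]
    s = [math.sin(omega * t) for t in times]
    cc = sum(ci * ci for ci in c)
    ss = sum(si * si for si in s)
    cs = sum(ci * si for ci, si in zip(c, s))
    yc = sum(y * ci for y, ci in zip(values, c))
    ys = sum(y * si for y, si in zip(values, s))
    det = cc * ss - cs * cs
    a = (yc * ss - ys * cs) / det  # solution of the 2x2 normal equations
    b = (ys * cc - yc * cs) / det
    return (a * yc + b * ys) / sum(y * y for y in values)

# Unevenly sampled, noise-free sinusoid with a 7-day period (simulated).
random.seed(0)
times = sorted(random.uniform(0.0, 100.0) for _ in range(40))
values = [1.5 * math.cos(2.0 * math.pi * t / 7.0 + 0.4) for t in times]

periods = [2.0 + 0.05 * k for k in range(361)]  # trial periods from 2 to 20 days
powers = [sine_fit_power(times, values, 2.0 * math.pi / p) for p in periods]
best_period = periods[max(range(len(periods)), key=powers.__getitem__)]
print(best_period, max(powers))
```

    The tallest peak gives one candidate period at a time, which is the limitation noted above: further signals are usually searched for in the residuals.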


  14. Take 1: sparse recovery


  15. ℓ1 periodogram
    Radial velocity data analysis with compressed sensing techniques, Hara, Boué, Laskar, Correia 2017
    Based on sparse recovery techniques (Chen & Donoho 1998)
    [Figure: "Generalized Lomb-Scargle periodogram, 3 sines with SNR 10", normalized RSS versus period (days), showing periodogram, true spectrum, tallest peak]
    [Figure: "ℓ1-periodogram, 3 sines with SNR 10", RV (m/s) versus period (days), showing ℓ1-periodogram and true spectrum]
    Interpretation
    Nelson, Ford et al. 2020: 6 systems with 200 points in 22 s
    Analytical estimate of false alarm probability
    Efficient modelling of correlated noise I, Delisle, Hara, Ségransan, 2020

  16. ℓ1 periodogram

  17. ℓ1 periodogram

  18. ℓ1 periodogram

  19. ℓ1 periodogram

  20. ℓ1 periodogram

  21. ℓ1 periodogram

  22. ℓ1 periodogram

  23. ℓ1 periodogram

  24. Application
    Interpretation
    HD 158259: 5 to 6 planets
    The SOPHIE search for northern extrasolar planets. XVI. HD 158259: A compact planetary system in a near-3:2 mean motion resonance chain, Hara et al. 2020, A&A
    SOPHIE radial velocities
    [Figure: HD 158259 ℓ1 periodogram, noise model with best cross-validation score, and classical periodogram, versus period (days); annotations: "Do not transit", "Transits (TESS)", "Close to a 3:2 mean motion resonance"]
    TOI-178: 6 planets
    Six transiting planets and a chain of Laplace resonances in TOI-178, Leleu, Alibert, Hara et al. 2021, A&A
    ESPRESSO (PI) + CHEOPS GTO
    5 outer planets in a chain of Laplace resonances
    [Figure axes: equilibrium temperature (K), density (g/cm3)]
    https://github.com/nathanchara/l1periodogram

  25. Works well, but what would work best?

    Take 2: optimal detection criterion


  26. Question
    • What do we mean by optimal detection criterion?
    • What is the optimal solution?
    • How does it perform?
    Hara, de Poyferré, Delisle, Hoffmann 2022 (submitted, arXiv:2203.04957)
    Hara, Unger, Delisle, Díaz, Ségransan 2021 (A&A, accepted)

  27. What do we mean by « optimal detection criterion »?

  28. Definition of a detection
    General likelihood model: $p(y \mid (\theta_j)_{j=1..n}, \eta)$
    $y$: data
    $\theta_j$: vector of orbital elements of planet $j$
    $n$: number of planets in the model
    $\eta$: other parameters (offsets, trends, hyperparameters of a Gaussian process)
    We define a detection claim as
    « There are n planets, one planet with orbital elements $\theta \in \Theta_1$, …, one planet with orbital elements $\theta \in \Theta_n$ »
    The $\Theta_i$s are regions of the parameter space
    [Figure: parameter space with regions $\Theta_1$ and $\Theta_2$]

  29. General framework
    General likelihood model: $p(y \mid (\theta_j)_{j=1..n}, \eta)$
    $y$: data
    $\theta_j$: vector of parameters of pattern $j$
    $n$: number of patterns in the model
    $\eta$: nuisance parameters
    We define a detection claim as
    « There are n patterns, one pattern with parameters $\theta \in \Theta_1$, …, one pattern with parameters $\theta \in \Theta_n$ »
    The $\Theta_i$s are regions of the parameter space
    [Figure: parameter space with regions $\Theta_1$ and $\Theta_2$]

  30. Definition of a detection: RV case
    $p(y \mid (\theta_j)_{j=1..n}, \eta)$
    $y$: time-series of spectra, or RV
    $\theta_j$: $K_j$, $e_j$, $\varpi_j$, $M_{0j}$ and $\omega_j = 2\pi/P_j$
    Example: we claim the detection of two planets with a certain accuracy on their frequencies
    There is one planet with orbital frequency between $\omega_1 - \frac{\Delta\omega}{2}$ and $\omega_1 + \frac{\Delta\omega}{2}$
    There is one planet with orbital frequency between $\omega_2 - \frac{\Delta\omega}{2}$ and $\omega_2 + \frac{\Delta\omega}{2}$
    [Figure axis: orbital frequency]

  31. False and missed detections
    [Figure: orbital frequency axis; there are truly three planets at these frequencies]
    Correct detection
    I claimed that there is a planet with orbital frequency between $\omega_1 - \frac{\Delta\omega}{2}$ and $\omega_1 + \frac{\Delta\omega}{2}$, and there is one
    False detection
    I claimed that there is one planet with orbital frequency between $\omega_1 - \frac{\Delta\omega}{2}$ and $\omega_1 + \frac{\Delta\omega}{2}$, but there is none.
    Missed detections
    I claimed that there are two planets, but there are three (one missed detection)
    Alternatively: I missed two planets truly present

  32. What do we mean by optimal detection criterion?
    Claim: « There are n planets, one planet with orbital elements $\theta \in \Theta_1$, …, one planet with orbital elements $\theta \in \Theta_n$ »
    A decision rule selecting the $\Theta_i$, $i = 1..n$, maximizing the expected value of the utility (Von Neumann and Morgenstern 1947), or equivalently minimizing
    Cost = Number of false detections + Number of missed detections, with a relative weight $\gamma$ between the two terms
    • As a function of $\gamma$
    Or
    Minimizing (Expected number of missed detections) with constraint Expected number of false detections < $x$
    • As a function of $x$
    Expectation is taken on the posterior probability $p((\theta_j)_{j=1..n}, \eta \mid y)$

  33. Computing the cost function
    « There are n planets, one planet with orbital elements $\theta \in \Theta_1$, …, one planet with orbital elements $\theta \in \Theta_n$ »

  34. What is the optimal solution?


  35. The complicated case: overlapping detections
    This case concentrates most of the theoretical complications
    If the prior prevents two signals from being too close, everything becomes simple
    [Figure: orbital frequency axes with intervals labelled Δω and 2Δω]

  36. Solution for exoplanets I
    Minimize Cost function = Number of false detections + Number of missed detections, with a relative weight $\gamma$ between the two terms, as a function of $\gamma$
    Minimizing (Number of missed detections) with constraint on the expected Number of false detections < $x$, as a function of $x$
    Both problems have the same solution
    [Figure: orbital frequency axis with intervals labelled Δω and 2Δω]

  37. Solution for exoplanets II
    Simply compute the posterior probability to have a planet in a frequency interval
    $$\mathrm{TIP} = p\left(\text{planet with frequency in interval } \left[\omega - \tfrac{\Delta\omega}{2}, \omega + \tfrac{\Delta\omega}{2}\right] \mid y\right)$$
    $$\mathrm{FIP} = 1 - \mathrm{TIP}$$
    TIP = True inclusion probability
    FIP = False inclusion probability
    [Figure: interval of width Δω on the orbital frequency axis. Data from Lovis et al. 2011]

  38. Computing the TIP/FIP
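    Simply compute the posterior probability to have a planet in a frequency interval
    $$\mathrm{TIP} = p\left(\text{planet with frequency in interval } \left[\omega - \tfrac{\Delta\omega}{2}, \omega + \tfrac{\Delta\omega}{2}\right] \mid y\right) = \sum_{k=1}^{n_{\max}} p\left(\text{planet with frequency in interval } \left[\omega - \tfrac{\Delta\omega}{2}, \omega + \tfrac{\Delta\omega}{2}\right] \mid y, k \text{ planets}\right) p(k \text{ planets} \mid y)$$
    $$p(k \text{ planets} \mid y) = \frac{p(y \mid k \text{ planets})\, p(k \text{ planets})}{\sum_{j=1}^{n_{\max}} p(y \mid j \text{ planets})\, p(j \text{ planets})}$$
    By-products of Bayesian evidence calculations
    We use PolyChord (Handley et al. 2015a, b)
    $\Delta\omega = 2\pi / T_{\text{obs}}$
    [Figure: interval of width Δω on the orbital frequency axis]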
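
    The TIP sum can be estimated from posterior samples. The sketch below assumes that samples from the runs with k = 1..n_max planets have been pooled, each with a weight proportional to p(k planets | y) times its within-model posterior weight; this input format and the function names are assumptions made for illustration, not the interface of PolyChord or of the authors' code.

```python
import math

def true_inclusion_probability(samples, omega, delta_omega):
    """TIP: weighted fraction of samples with at least one planet
    whose frequency lies in [omega - delta_omega/2, omega + delta_omega/2].

    samples: list of (weight, [orbital frequencies of the planets in that sample]).
    """
    lo, hi = omega - delta_omega / 2.0, omega + delta_omega / 2.0
    total = sum(w for w, _ in samples)
    hit = sum(w for w, freqs in samples if any(lo <= f <= hi for f in freqs))
    return hit / total

def fip_periodogram(samples, omegas, t_obs):
    """FIP = 1 - TIP on a grid of frequencies, with interval width 2 pi / t_obs."""
    delta_omega = 2.0 * math.pi / t_obs
    return [1.0 - true_inclusion_probability(samples, w, delta_omega) for w in omegas]

# Four equally weighted toy samples (frequencies in rad/day), 100 days of data.
samples = [(1.0, [0.62]), (1.0, [0.63, 1.9]), (1.0, []), (1.0, [0.61])]
fips = fip_periodogram(samples, [0.62, 1.9], t_obs=100.0)
print(fips)
```

    In this toy case three of the four samples place a planet near 0.62 rad/day and one places a planet near 1.9 rad/day, so the FIP is low at the first frequency and high at the second.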


  39. Computational trick
    Radial velocity: $RV_{1\ \text{planet}} = A \cos\nu + B \sin\nu$
    $y = M(\theta) x$
    $\theta$: parameters on which the model depends non-linearly (eccentricity, period…)
    Gaussian mixture prior on $x$ knowing $\theta$
    $P(y \mid \theta) = \int P(y \mid \theta, x)\, p(x \mid \theta)\, dx$ has an analytical expression
    Need to explore 3 parameters per planet instead of 5
    Gaussian mixture: possibility to have a multimodal prior on mass
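
    The analytical marginalisation can be illustrated in its simplest form. The sketch below assumes a single linear amplitude x (the slide has two per planet, A and B), a model y = x·m(θ) + white Gaussian noise of standard deviation sigma, and a Gaussian mixture prior on x. It uses the Sherman-Morrison identity and the matrix determinant lemma; the names are illustrative and this is not the authors' implementation.

```python
import math

def log_marginal_gaussian(y, m, sigma, mu, s):
    """log p(y | theta) when x ~ N(mu, s^2): y ~ N(mu m, sigma^2 I + s^2 m m^T)."""
    n = len(y)
    r = [yi - mu * mi for yi, mi in zip(y, m)]
    rr = sum(ri * ri for ri in r)
    mr = sum(mi * ri for mi, ri in zip(m, r))
    mm = sum(mi * mi for mi in m)
    # Quadratic form r^T C^-1 r via Sherman-Morrison.
    quad = (rr - s * s * mr * mr / (sigma ** 2 + s * s * mm)) / sigma ** 2
    # log det C via the matrix determinant lemma.
    logdet = n * math.log(sigma ** 2) + math.log(1.0 + s * s * mm / sigma ** 2)
    return -0.5 * (n * math.log(2.0 * math.pi) + logdet + quad)

def log_marginal_mixture(y, m, sigma, components):
    """Same, for a mixture prior; components: list of (weight, mean, std)."""
    logs = [math.log(w) + log_marginal_gaussian(y, m, sigma, mu, s)
            for w, mu, s in components]
    top = max(logs)
    return top + math.log(sum(math.exp(v - top) for v in logs))

# Toy data: m holds the values of the non-linear part of the model at each epoch.
y = [0.9, -0.2, 1.4, 0.3, -1.1]
m = [1.0, -0.5, 1.2, 0.1, -0.9]
components = [(0.3, -1.0, 0.5), (0.7, 2.0, 1.5)]  # bimodal prior on x
print(log_marginal_mixture(y, m, 0.7, components))
```

    With the amplitudes integrated out in closed form, a sampler only has to explore the non-linear parameters θ.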


  40. How do we decide on n_max?
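    $$\mathrm{TIP} = \sum_{k=1}^{n_{\max}} p\left(\text{planet with frequency in interval } \left[\omega - \tfrac{\Delta\omega}{2}, \omega + \tfrac{\Delta\omega}{2}\right] \mid y, k \text{ planets}\right) p(k \text{ planets} \mid y)$$
    $$p(k \text{ planets} \mid y) = \frac{p(y \mid k \text{ planets})\, p(k \text{ planets})}{\sum_{j=1}^{n_{\max}} p(y \mid j \text{ planets})\, p(j \text{ planets})}$$
    For fixed n_max: several runs
    Increment n_max: does the FIP periodogram change?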

  41. How does our new detection criterion perform?

  42. FIP: performances
    Simulation: 1000 systems with 0, 1 or 2 planets generated on 80 time-stamps
    (Circular, log-uniform period, Rayleigh prior on K, uniform on phase)
    Search for planets with different methods with correct priors
    • Periodogram or ℓ1 periodogram + false alarm probability (FAP) or Bayes factor
    • FIP periodogram + FIP, FAP or Bayes factor
    [Figure: "Generalized Lomb-Scargle periodogram, 3 sines with SNR 10", normalized RSS versus period (days), showing periodogram, true spectrum, tallest peak]
    [Figure: "ℓ1-periodogram, 3 sines with SNR 10", RV (m/s) versus period (days), showing ℓ1-periodogram and true spectrum]
    There is one planet with orbital frequency between $\omega_1 - \Delta\omega$ and $\omega_1 + \Delta\omega$
    True detection
    False detection
    Hara et al. 2017
    github.com/nathanchara/l1periodogram

  43. FIP: performances
    Simulation: 1000 systems with 0, 1 or 2 planets
    White noise simulation
    Red noise simulation (exponential kernel)

  44. Interpretation
    Simulation: 1000 systems with 0, 1 or 2 planets
    TIP: True inclusion probability
    On average, among N independent detections with TIP = p, pN detections are correct

  45. Robustness to a prior change
    We analyse the data with the wrong prior
    Dashed lines: FIP periodogram + Bayes factor
    Solid lines: FIP periodogram + FIP
    Data generated with
    • Periods log-uniform on 1-100 days
    • Semi-amplitude Rayleigh prior with σ

  46. HD 10180
    Planets   log(E)    log(E)   PNP         Runtime
    0         -882.45   0.23     3.82e-108   17 s
    1         -839.36   0.19     1.08e-93    1 min 56 s
    2         -789.03   0.24     3.72e-76    6 min 57 s
    3         -736.95   0.04     1.19e-57    17 min 39 s
    4         -677.56   0.15     7.27e-36    38 min 41 s
    5         -603.42   0.13     1.12e-07    1 h 33 min 31 s
    6         -590.14   0.30     6.60e-02    4 h 46 min 46 s
    7         -587.49   0.22     9.34e-01    14 h 59 min 57 s
    Favours 7 planets and the evidence keeps increasing
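
    The posterior probability of each number of planets follows from the log-evidences by Bayes' rule. The sketch below assumes that log(E) is a natural logarithm and that the prior over the number of planets is uniform over the rows of the table (0 planets included); neither assumption is stated on the slide. It has only been checked against the two largest PNP entries (6 and 7 planets), which it matches to rounding.

```python
import math

def posterior_number_of_planets(log_evidences, prior=None):
    """p(k planets | y) from log-evidences, computed stably (log-sum-exp)."""
    n = len(log_evidences)
    if prior is None:
        prior = [1.0 / n] * n  # assumption: uniform prior over the models listed
    logs = [le + math.log(p) for le, p in zip(log_evidences, prior)]
    top = max(logs)
    weights = [math.exp(v - top) for v in logs]
    total = sum(weights)
    return [w / total for w in weights]

# log(E) column for 0 to 7 planets.
log_e = [-882.45, -839.36, -789.03, -736.95, -677.56, -603.42, -590.14, -587.49]
post = posterior_number_of_planets(log_e)
bayes_factor_7_vs_6 = math.exp(log_e[7] - log_e[6])
print(post[6], post[7], bayes_factor_7_vs_6)
```

    Subtracting the largest log-evidence before exponentiating avoids numerical underflow when the evidences differ by hundreds of log units.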


  47. How do I compute all this?


  48. Detecting exoplanets in RV data SCMA VII
    Numerical aspects
    Computationally heavy

  49. Detecting exoplanets in RV data SCMA VII
    Numerical methods: S+LEAF matrices
    Matrix inversions are typically in $O(N^3)$ but faster for certain matrices
    For semi-separable matrices, the inversion is in $O(N)$
    $$k(t_i, t_j) = k(\Delta t) = \sum_s \left(a_s \cos(\nu_s \Delta t) + b_s \sin(\nu_s \Delta t)\right) e^{-\lambda_s \Delta t} \quad (1)$$
    CELERITE kernels yield semi-separable covariance matrices (Foreman-Mackey et al. 2017)
    S+LEAF matrices = semi-separable matrix + Leaf matrix
    Inversion still in $O(N)$ (Delisle, Hara, Ségransan 2020b)
    $$V = \mathrm{diag}(A) + \mathrm{tril}(U S^T) + \mathrm{triu}(S U^T)$$
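
    The O(N) versus O(N^3) contrast can be seen on the simplest semi-separable case. The sketch below is neither celerite nor S+LEAF: it assumes a single exponential kernel k(Δt) = amp·exp(−lam·|Δt|) plus independent measurement errors, and evaluates the Gaussian log-likelihood in O(N) with a scalar Kalman filter, next to a dense Cholesky computation for comparison. All names are illustrative.

```python
import math

def kalman_loglike(times, y, sigmas, amp, lam):
    """O(N) log-likelihood; times must be sorted in increasing order."""
    mean, var, loglike = 0.0, amp, 0.0
    for i, (yi, si) in enumerate(zip(y, sigmas)):
        if i > 0:  # propagate the latent process to the next epoch
            phi = math.exp(-lam * (times[i] - times[i - 1]))
            mean, var = phi * mean, phi * phi * var + amp * (1.0 - phi * phi)
        innov, s = yi - mean, var + si * si
        loglike += -0.5 * (math.log(2.0 * math.pi * s) + innov * innov / s)
        gain = var / s  # update with the measurement
        mean, var = mean + gain * innov, (1.0 - gain) * var
    return loglike

def dense_loglike(times, y, sigmas, amp, lam):
    """Same quantity in O(N^3) with an explicit covariance matrix."""
    n = len(y)
    cov = [[amp * math.exp(-lam * abs(ti - tj)) + (sigmas[i] ** 2 if i == j else 0.0)
            for j, tj in enumerate(times)] for i, ti in enumerate(times)]
    low = [[0.0] * n for _ in range(n)]  # Cholesky factor, cov = low low^T
    for i in range(n):
        for j in range(i + 1):
            acc = cov[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            low[i][j] = math.sqrt(acc) if i == j else acc / low[j][j]
    z = []  # forward substitution: low z = y
    for i in range(n):
        z.append((y[i] - sum(low[i][k] * z[k] for k in range(i))) / low[i][i])
    logdet = 2.0 * sum(math.log(low[i][i]) for i in range(n))
    return -0.5 * (n * math.log(2.0 * math.pi) + logdet + sum(v * v for v in z))

times = [0.0, 0.9, 2.3, 2.8, 5.1, 7.4]
y = [0.4, 0.1, -0.8, -0.5, 1.2, 0.3]
sigmas = [0.3, 0.3, 0.4, 0.3, 0.5, 0.3]
print(kalman_loglike(times, y, sigmas, 1.5, 0.2), dense_loglike(times, y, sigmas, 1.5, 0.2))
```

    Both functions return the same value up to rounding; only the cost differs, which matters when the likelihood is evaluated many times in an evidence calculation.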


  50. Detecting exoplanets in RV data SCMA VII
    Complexity
    Semi-separable component: $V = \mathrm{diag}(A) + \mathrm{tril}(U S^T) + \mathrm{triu}(S U^T)$, where $S$ and $U$ are $n \times r$, $r \le n$
    Leaf component: $b_i$ non-zero extra-diagonal coefficients
    Inversion cost of a S+LEAF matrix: $O((\bar{b}^2 + r\bar{b} + r^2) N)$
    Leaf matrices can model calibration noise

  51. Detecting exoplanets in RV data SCMA VII
    Semi-separable + Leaf matrices
    Calibration noise
    [Figure: generated quasi-periodic signal; simulated data with calibration noise; Gaussian process prediction with quasi-periodic kernel, calibration noise ignored; Gaussian process prediction with quasi-periodic kernel and calibration component]
    Calibration component: important for densely sampled data

  52. Detecting exoplanets in RV data SCMA VII
    Gaussian processes
    From Rajpaul et al 2015
    $X$ is a Gaussian process
    Radial velocity: $RV = V_c X(t) + V_r \dot{X}(t)$
    Spectroscopic indicators: $\log R'_{HK} = L_c X(t)$ and $BIS = B_c X(t) + B_r \dot{X}(t)$
    Augmented data: $\begin{pmatrix} RV \\ \log R'_{HK} \\ BIS \end{pmatrix}$
    [Figure: "Schematic CCFs" (granular region, inter-granular region) and "Sum of CCF and bisector" (CCFs sum, bisector, FWHM, BIS bottom, BIS top), relative intensity versus wavelength lag]

  53. Detecting exoplanets in RV data SCMA VII
    Gaussian processes: a data driven approach
    From Jones et al 2017
    $X$ is a Gaussian process
    Radial velocity
    Indicators
    Let the data select which $a_{ij}$ are non-zero with BIC, AIC, cross validation…
    [Figure: "Schematic CCFs" (granular region, inter-granular region) and "Sum of CCF and bisector" (CCFs sum, bisector, FWHM, BIS bottom, BIS top), relative intensity versus wavelength lag]

  54. Detecting exoplanets in RV data SCMA VII
    Multivariate Gaussian processes
    S+LEAF still in $O(N)$ for multivariate time series
    See Delisle et al. 2022
    https://gitlab.unige.ch/Jean-Baptiste.Delisle/spleaf

  55. Back to the general case
    General likelihood model: $p(y \mid (\theta_j)_{j=1..n}, \eta)$
    $y$: data
    $\theta_j$: vector of parameters of pattern $j$
    $n$: number of patterns in the model
    $\eta$: nuisance parameters
    We define a detection claim as
    « There are n patterns, one pattern with parameters $\theta \in \Theta_1$, …, one pattern with parameters $\theta \in \Theta_n$ »
    The $\Theta_i$s are regions of the parameter space
    [Figure: parameter space with regions $\Theta_1$ and $\Theta_2$]

  56. Conclusion
    $p(y \mid (\theta_j)_{j=1..n}, \eta)$
    Have n discrete hypotheses $H_i$, $i = 1..n$
    How many are true?
    The $\theta$s are indices: $\theta_j = (i)_{i=1..m}$

  57. General context
    Works for $y = Ax + \epsilon$
    $p(y \mid (\theta_j)_{j=1..n}, \eta)$
    The $\theta$s are indices and amplitudes: $\theta_j = (i, x_i)_{i=1..m}$
    Generalises the Barbieri & Berger 2004 framework

  58. Conclusion
    Bayes factors are cool but why settle for second best?
    Just like the Bayes factor, the result heavily depends on the model
    -> average over models, check residuals
    Probability(what I’m interested in | data)