
Open software for Astronomical Data Analysis

@ NASA Goddard

Dan Foreman-Mackey

February 28, 2023

Transcript

  1. OPEN SOFTWARE FOR ASTRONOMICAL DATA ANALYSIS
    by Dan Foreman-Mackey

  3. (0) open software for astrophysics

  4. credit: Adrian Price-Whelan; data: SAO/NASA ADS

  6. many fundamental software packages have a shockingly small number of maintainers.

  7. credit: Adrian Price-Whelan

  8. * astronomical software can be very high impact
    * we should think about career trajectories & mechanisms for supporting this work

  10. (1) case study: Gaussian processes

  11. [figure: raw and de-trended signal (ppt) vs. time (days), N = 1000]
    reference: DFM+ (2017)

  12. [figure: same raw and de-trended panels as slide 11, N = 1000]
    reference: DFM+ (2017)

  13. reference: Aigrain & DFM (2022)

  14. reference: Aigrain & DFM (2022)

  15. [figure panels: ignoring correlated noise | accounting for correlated noise]
    reference: Aigrain & DFM (2022)
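Slide 15 contrasts a fit that ignores correlated noise with one that accounts for it. A minimal numpy sketch of one consequence, under an assumed noise model (white noise plus a correlated component with exponential covariance amp**2 * exp(-|dt| / tau); the function name and all parameter values are hypothetical): the uncertainty on a constant mean when every point is treated as independent versus when the full covariance matrix is used.

```python
import numpy as np


def mean_uncertainty(t, yerr, amp, tau):
    """Uncertainty on a constant mean, ignoring vs. accounting for correlated noise.

    Assumed (hypothetical) noise model: white noise with per-point errors `yerr`
    plus a correlated component with covariance amp**2 * exp(-|dt| / tau).
    """
    n = len(t)
    # ignoring correlations: same per-point variance, but every point independent
    var_ignore = 1.0 / np.sum(1.0 / (yerr**2 + amp**2))
    # accounting for correlations: full covariance matrix
    dt = t[:, None] - t[None, :]
    K = amp**2 * np.exp(-np.abs(dt) / tau) + np.diag(yerr**2)
    ones = np.ones(n)
    # generalized least squares: Var(mean) = 1 / (1^T K^{-1} 1)
    var_account = 1.0 / (ones @ np.linalg.solve(K, ones))
    return np.sqrt(var_ignore), np.sqrt(var_account)
```

With, say, 200 evenly spaced points over 25 days, yerr = 0.5, amp = 1 and tau = 2 days, the second value comes out more than twice the first: treating correlated points as independent overstates how tightly the data constrain the mean.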

  16. reference: Aigrain & DFM (2022)

  17. a Gaussian Process is a drop-in replacement for chi-squared
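To make the "drop-in replacement" concrete: for Gaussian errors, chi-squared is (up to a constant) -2 times a log-likelihood with a diagonal covariance, and a GP swaps in a full covariance matrix K. A minimal numpy sketch; the function names are hypothetical, and the Cholesky route is the standard way to evaluate this expression, not necessarily how any particular library does it.

```python
import numpy as np


def chi2_loglike(r, yerr):
    """Independent Gaussian errors: log-likelihood = -0.5 * chi^2 + constant."""
    chi2 = np.sum((r / yerr) ** 2)
    return -0.5 * (chi2 + np.sum(np.log(2 * np.pi * yerr**2)))


def gp_loglike(r, K):
    """Same log-likelihood with a full covariance K (e.g. kernel + diag(yerr**2))."""
    L = np.linalg.cholesky(K)  # K = L @ L.T
    alpha = np.linalg.solve(L, r)  # r^T K^{-1} r = alpha @ alpha
    logdet = 2.0 * np.sum(np.log(np.diag(L)))
    return -0.5 * (alpha @ alpha + logdet + len(r) * np.log(2 * np.pi))
```

With K = np.diag(yerr**2) the two functions agree exactly; accounting for correlated noise only means adding a kernel matrix to that diagonal before calling gp_loglike.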

  18. more details: Aigrain & Foreman-Mackey (2023), arXiv:2209.08940

  20. [1] model building
    [2] computational cost

  21. reference: Luger, DFM, Hedges (2021)

  22. [2] computational cost

  23. [1] bigger/better computers
    [2] exploit matrix structure
    [3] approximate linear algebra
    [4] etc.
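One concrete instance of option [2], sketched under assumptions: for a 1D exponential kernel, k(dt) = amp**2 * exp(-|dt| / tau), plus white noise, the GP is an Ornstein-Uhlenbeck process with a scalar state-space form, so a Kalman filter over sorted times gives the exact log-likelihood in O(N) instead of the O(N^3) of a dense factorization. Libraries such as celerite generalize this kind of structure to a wider family of kernels; the code below only illustrates the idea, is not their algorithm, and its function names are hypothetical.

```python
import numpy as np


def dense_loglike(t, y, yerr, amp, tau):
    """O(N^3) reference: exponential kernel amp**2 * exp(-|dt| / tau) plus white noise."""
    K = amp**2 * np.exp(-np.abs(t[:, None] - t[None, :]) / tau) + np.diag(yerr**2)
    _, logdet = np.linalg.slogdet(K)
    return -0.5 * (y @ np.linalg.solve(K, y) + logdet + len(y) * np.log(2 * np.pi))


def kalman_loglike(t, y, yerr, amp, tau):
    """O(N) log-likelihood for the same model; requires t sorted in increasing order.

    The exponential kernel is an Ornstein-Uhlenbeck process:
    x_i = phi_i * x_{i-1} + noise, with phi_i = exp(-(t_i - t_{i-1}) / tau).
    """
    m, P = 0.0, amp**2  # stationary prior on the latent state
    ll = 0.0
    for i in range(len(t)):
        if i > 0:
            # predict: propagate the state across the gap t[i] - t[i-1]
            phi = np.exp(-(t[i] - t[i - 1]) / tau)
            m = phi * m
            P = phi**2 * P + amp**2 * (1.0 - phi**2)
        # innovation (prediction error) and its variance
        r = y[i] - m
        S = P + yerr[i] ** 2
        ll += -0.5 * (r**2 / S + np.log(2 * np.pi * S))
        # update: condition the latent state on y[i]
        gain = P / S
        m = m + gain * r
        P = (1.0 - gain) * P
    return ll
```

For sorted t both functions return the same value to numerical precision, but the cost of kalman_loglike grows linearly with the number of points.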

  28. [figure: same raw and de-trended panels as slide 11, N = 1000]
    reference: DFM+ (2017)

  29. reference: Gordon, Agol, DFM (2020) / tinygp.readthedocs.io

  30. * a Gaussian Process is a drop-in replacement for chi-squared
    * model building & computational cost are (solvable!) challenges
    * you should check out tinygp!

  31. (2) case study: probabilistic inference

  32. have: physics => data

  33. want: data => physics
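One minimal way to go from "physics => data" to "data => physics" is to wrap the forward model in a likelihood and sample the posterior with MCMC. The sketch below assumes a hypothetical straight-line forward model and a flat prior, and uses plain random-walk Metropolis, which is not the affine-invariant ensemble algorithm that emcee implements.

```python
import numpy as np


def forward_model(theta, x):
    """Hypothetical "physics => data" model: a line with slope m and intercept b."""
    m, b = theta
    return m * x + b


def log_posterior(theta, x, y, yerr):
    """Flat prior plus Gaussian likelihood, i.e. -0.5 * chi-squared."""
    r = y - forward_model(theta, x)
    return -0.5 * np.sum((r / yerr) ** 2)


def metropolis(log_prob, theta0, step, n_steps, rng, *args):
    """Random-walk Metropolis: "data => physics" as samples from the posterior."""
    theta = np.asarray(theta0, dtype=float)
    lp = log_prob(theta, *args)
    chain = np.empty((n_steps, theta.size))
    for i in range(n_steps):
        proposal = theta + step * rng.normal(size=theta.size)
        lp_new = log_prob(proposal, *args)
        # accept with probability min(1, exp(lp_new - lp))
        if np.log(rng.uniform()) < lp_new - lp:
            theta, lp = proposal, lp_new
        chain[i] = theta
    return chain
```

The step size is a hand-tuned choice, and random-walk proposals tend to mix slowly as the number of parameters grows, which is part of the motivation for the gradient-based methods later in the talk.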

  34. [1] physical models
    [2] legacy code

  36. [figure: patience required vs. number of parameters (a few / tenish / not outrageously many)]
    reference: DFM (priv. comm.)

  37. [figure: patience required vs. number of parameters (a few / tenish / not outrageously many), with emcee marked]
    reference: DFM (priv. comm.)

  38. [figure: patience required vs. number of parameters (a few / tenish / not outrageously many), with emcee and "how things should be" marked]
    reference: DFM (priv. comm.)

  43. [figure: WASP-39b / NIRSpec, transit depth (%) vs. wavelength (micron) over 3.0 to 5.0 micron; legend: Alderson et al. 2023, Joint Fit (N = 50)]
    reference: Soichiro Hattori, Ruth Angus, DFM, ... (in prep)

  44. showing 23 of the 404 parameters (8 per channel + 4 shared)
    reference: Soichiro Hattori, Ruth Angus, DFM, ... (in prep)

  45. how?

  46. d(physics => data) / d(physics)

  47. automatic differentiation
    aka “backpropagation”
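Slide 46 asks for the derivative of the forward model with respect to its physical parameters. Reverse-mode automatic differentiation computes such derivatives by recording the operations of a forward pass and then applying the chain rule backwards through them. A toy scalar implementation, for illustration only: Var, exp and backward are hypothetical names, and real tools such as JAX trace whole array programs rather than individual scalars.

```python
import math


class Var:
    """Scalar node in a computation graph, for a toy reverse-mode autodiff."""

    def __init__(self, value, parents=()):
        self.value = value
        self.parents = parents  # tuple of (parent node, local partial derivative)
        self.grad = 0.0

    @staticmethod
    def lift(x):
        return x if isinstance(x, Var) else Var(x)

    def __add__(self, other):
        other = Var.lift(other)
        return Var(self.value + other.value, ((self, 1.0), (other, 1.0)))

    def __mul__(self, other):
        other = Var.lift(other)
        return Var(self.value * other.value, ((self, other.value), (other, self.value)))

    def __sub__(self, other):
        return self + (-1.0) * Var.lift(other)

    def __rsub__(self, other):
        return Var.lift(other) + (-1.0) * self

    __radd__ = __add__
    __rmul__ = __mul__


def exp(x):
    """exp for Var inputs; the local derivative of exp is exp itself."""
    v = math.exp(x.value)
    return Var(v, ((x, v),))


def backward(output):
    """Reverse sweep: accumulate d(output)/d(node) into node.grad for every node."""
    order, seen = [], set()

    def visit(node):  # depth-first topological sort: parents before children
        if id(node) not in seen:
            seen.add(id(node))
            for parent, _ in node.parents:
                visit(parent)
            order.append(node)

    visit(output)
    output.grad = 1.0
    for node in reversed(order):  # children are finished before their parents
        for parent, local in node.parents:
            parent.grad += local * node.grad


# usage: gradient of chi-squared for a hypothetical forward model a * exp(b * x)
xs, ys, sigma = [0.0, 1.0, 2.0], [1.0, 2.1, 4.2], 0.1
a, b = Var(1.0), Var(0.7)
chi2 = 0.0
for x, y in zip(xs, ys):
    r = (y - a * exp(b * x)) * (1.0 / sigma)
    chi2 = chi2 + r * r
backward(chi2)
# a.grad and b.grad now hold d(chi2)/da and d(chi2)/db
```

One backward sweep yields the derivative with respect to every input at once, which is why reverse mode suits models with many parameters and a single scalar output such as a log-likelihood.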

  49. [1] physical models
    [2] legacy code

  50. [1] domain-specific libraries
    [2] emulation
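A minimal sketch of the emulation idea, under assumptions: run a slow, non-differentiable legacy code on a grid of inputs, fit a cheap smooth surrogate to those outputs, and use the surrogate (and its derivative) in place of the original during inference. Here legacy_model is a hypothetical one-parameter stand-in and the surrogate is a Chebyshev series fit with numpy; practical emulators over many parameters are often neural networks or GPs.

```python
import numpy as np
from numpy.polynomial import Chebyshev


def legacy_model(x):
    """Hypothetical stand-in for slow, non-differentiable legacy code.

    It is evaluated one scalar at a time in a plain Python loop, so an autodiff
    tool cannot trace through it.
    """
    return np.array([np.sin(xi) * np.exp(-xi / 3.0) for xi in np.atleast_1d(x)])


# [1] run the expensive code on a modest training grid
x_train = np.linspace(0.0, 5.0, 40)
y_train = legacy_model(x_train)

# [2] fit a cheap, smooth surrogate (here a degree-15 Chebyshev series)
emulator = Chebyshev.fit(x_train, y_train, deg=15)

# [3] the surrogate has an analytic derivative, usable for gradient-based inference
d_emulator = emulator.deriv()
```

The trade-off is that the surrogate is only trustworthy inside the range it was trained on (here 0 to 5), so the training grid has to cover the region the sampler will explore.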

  52. * gradient-based inference using autodiff can improve efficiency
    * there are practical challenges with these methods in astro
    * of interest: domain-specific libraries & emulation
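As one example of gradient-based inference, Hamiltonian Monte Carlo uses the gradient of the log-posterior to propose distant moves that are still accepted with high probability. A minimal sketch with a fixed step size and number of leapfrog steps (production samplers adapt these during warm-up); the gradient is passed in as a function, which in practice could come from an autodiff tool such as JAX. The function name and signature are hypothetical.

```python
import numpy as np


def hmc(log_prob, grad_log_prob, theta0, n_samples, step_size, n_leapfrog, rng):
    """Minimal Hamiltonian Monte Carlo with a fixed step size and path length."""
    theta = np.asarray(theta0, dtype=float)
    samples = np.empty((n_samples, theta.size))
    for i in range(n_samples):
        p0 = rng.normal(size=theta.size)  # fresh Gaussian momentum
        q, p = theta.copy(), p0.copy()
        # leapfrog integration of Hamiltonian dynamics, driven by the gradient
        p += 0.5 * step_size * grad_log_prob(q)
        for _ in range(n_leapfrog - 1):
            q += step_size * p
            p += step_size * grad_log_prob(q)
        q += step_size * p
        p += 0.5 * step_size * grad_log_prob(q)
        # accept/reject corrects for the integrator's discretization error
        h_old = -log_prob(theta) + 0.5 * p0 @ p0
        h_new = -log_prob(q) + 0.5 * p @ p
        if np.log(rng.uniform()) < h_old - h_new:
            theta = q
        samples[i] = theta
    return samples
```

Because of the final accept/reject step, the chain still targets the exact posterior even though the leapfrog trajectory is only approximate.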

  53. (3) aside: JAX

  55. import numpy as np

    def linear_least_squares(x, y):
        # design matrix with columns [x, 1]
        A = np.vander(x, 2)
        # best-fit [slope, intercept]
        return np.linalg.lstsq(A, y, rcond=None)[0]

  56. import jax.numpy as jnp

    def linear_least_squares(x, y):
        # design matrix with columns [x, 1]
        A = jnp.vander(x, 2)
        # best-fit [slope, intercept]
        return jnp.linalg.lstsq(A, y, rcond=None)[0]
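The two versions of linear_least_squares differ only in np versus jnp, because jax.numpy mirrors the NumPy API. What JAX adds are function transformations. A minimal sketch using two of them, jax.grad and jax.jit, on a chi-squared for the same straight-line model; the chi2 function is a hypothetical example, and enabling 64-bit floats is an assumption made to match NumPy's default precision.

```python
import jax
import jax.numpy as jnp

# use float64 like NumPy (JAX defaults to float32)
jax.config.update("jax_enable_x64", True)


def chi2(theta, x, y, yerr):
    """Chi-squared of a straight line; theta = [slope, intercept] as with vander(x, 2)."""
    model = jnp.vander(x, 2) @ theta
    return jnp.sum(((y - model) / yerr) ** 2)


# jax.grad returns a new function computing d(chi2)/d(theta);
# jax.jit compiles a function with XLA the first time it is called
grad_chi2 = jax.jit(jax.grad(chi2))
chi2_jit = jax.jit(chi2)
```

grad_chi2(theta, x, y, yerr) returns the gradient with respect to theta only, because jax.grad differentiates with respect to the first argument by default.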

  58. (4) open research practices

  66. open software is foundational to astrophysics research

    there are opportunities at the interface of astro & applied fields

    there are ways you can participate & benefit right away

  67. I want to chat about…
    [1] your data analysis problems
    [2] building astronomical software
    [3] writing documentation & tutorials

  68. get in touch!
    dfm.io
    github.com/dfm