Slide 1

Slide 1 text

(A few examples of) Missing data in cosmology
GT ICR meeting, IAS, 17 February 2025

Slide 2

Slide 2 text

The Epoch of Reionisation
[Figure: cosmic timeline from the Big Bang to today, showing the CMB, the Dark Ages, the Cosmic Dawn and the Epoch of Reionisation. Time labels: 380 ky, 100 My, 1 Gy, 14 Gy. Redshift labels: z = 1100, z = 50, z = 5, z = 0. Lower panel: IGM ionised fraction from 0 (neutral) to 1 (ionised), on small and large scales, with galaxies and quasars marked.]
What we want to measure: the ionisation level of intergalactic matter (mostly H)

Slide 3

Slide 3 text

The cosmological 21cm signal
[Figure: hyperfine transition of neutral hydrogen, emitting a photon γ]
Signal intensity ∝ neutral H fraction × baryon density
With observations of this signal, one can trace the ionisation level and matter distribution in the Universe!

Slide 4

Slide 4 text

The cosmological 21cm signal
Hyperfine transition, λ = 21 cm, redshifted to radio frequencies
[Figure: photon γ in a static Universe vs. photon in an expanding Universe]
Because it is a spectral line, we can know when the signal was emitted and trace back the history of reionisation.
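As a worked example of why a spectral line dates the signal: the observed frequency of the line fixes the emission redshift through ν_obs = ν_21 / (1 + z). The sketch below assumes only the rest-frame frequency of the 21cm line (about 1420.4 MHz). The function names and the two test frequencies are illustrative choices, not taken from the slides.

```python
# Rest-frame frequency of the 21cm hyperfine line, in MHz.
NU_21CM_MHZ = 1420.405751768


def redshift_from_frequency(nu_obs_mhz):
    """Redshift at which 21cm emission is observed today at nu_obs_mhz."""
    return NU_21CM_MHZ / nu_obs_mhz - 1.0


def frequency_from_redshift(z):
    """Observed frequency (MHz) of 21cm emission from redshift z."""
    return NU_21CM_MHZ / (1.0 + z)


for nu in (100.0, 200.0):
    print(f"{nu:.0f} MHz -> z = {redshift_from_frequency(nu):.1f}")
# 100 MHz -> z = 13.2
# 200 MHz -> z = 6.1
```

Each frequency channel of a radio telescope therefore corresponds to one slice of cosmic history.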

Slide 5

Slide 5 text

The cosmological 21cm signal
Hyperfine transition, λ = 21 cm, redshifted to radio frequencies
Because it is a spectral line, we can know when the signal was emitted and trace back the history of reionisation.
[Figure: photon γ travelling along a timeline from the Big Bang to today]

Slide 6

Slide 6 text

The cosmological 21cm signal
Brightness temperature:
• Global signal
• Power spectrum
• Intensity mapping
[Figure: evolution with time, from neutral to ionised]

Slide 7

Slide 7 text

What is (our) power spectrum?
The power spectrum of a second-order stationary (or homogeneous, or translationally invariant) random field is the spatial Fourier transform of the covariance function of that field.
See arXiv:2407.14068
Wayne Hu
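For reference, the standard definition can be written as below. The notation is assumed here rather than taken from the slide: δ(x) is a zero-mean real field, ξ its covariance function, δ̃ its Fourier transform, P its power spectrum and δ_D the Dirac delta.

```latex
\xi(\mathbf{r}) = \left\langle \delta(\mathbf{x})\,\delta(\mathbf{x}+\mathbf{r}) \right\rangle ,
\qquad
P(\mathbf{k}) = \int \mathrm{d}^3 r \;\xi(\mathbf{r})\, e^{-i\mathbf{k}\cdot\mathbf{r}} ,
\qquad
\left\langle \tilde{\delta}(\mathbf{k})\,\tilde{\delta}^{*}(\mathbf{k}') \right\rangle
  = (2\pi)^3\, \delta_{\rm D}(\mathbf{k}-\mathbf{k}')\, P(\mathbf{k}) .
```

The last relation follows from stationarity: because ξ depends only on the separation r, different Fourier modes are uncorrelated.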

Slide 8

Slide 8 text

What is (our) power spectrum?
The power spectrum of a second-order stationary (or homogeneous, or translationally invariant) random field is the spatial Fourier transform of the covariance function of that field.
For the 21cm signal:
[Figure: reionisation history, average intensity, power spectrum, intensity map]

Slide 9

Slide 9 text

Interferometry 101
• Interferometers measure visibilities, i.e. Fourier modes on the sky
[Equation labels: baseline length b_ij, signal intensity, beam]
The baseline length b_ij is the Fourier dual of the sky angle (k_⟂)
• Dense arrays measure large-scale fluctuations (e.g. EDGES’ “table”)
• Wide arrays measure small-scale fluctuations (e.g. HERA & foreground avoidance)
An estimator of the power spectrum is built directly from the visibilities.
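For reference, a common flat-sky form of the visibility measured by antennas i and j is sketched below. The symbols are assumptions for illustration: A is the primary beam, I the sky (signal) intensity, θ the sky angle, b_ij the baseline vector and ν the observed frequency. The sign of the exponent is a convention.

```latex
V_{ij}(\nu) = \int \mathrm{d}^2\theta \; A(\boldsymbol{\theta},\nu)\, I(\boldsymbol{\theta},\nu)\,
  \exp\!\left( -2\pi i\, \frac{\nu}{c}\, \mathbf{b}_{ij}\cdot\boldsymbol{\theta} \right) ,
\qquad
\mathbf{u}_{ij} = \frac{\nu}{c}\,\mathbf{b}_{ij} .
```

Each baseline samples the beam-weighted sky at the Fourier mode u_ij, so short baselines probe large angular scales and long baselines probe small ones.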

Slide 10

Slide 10 text

The Hydrogen Epoch of Reionization Array
• 350 dishes, 14 m each
• Bandwidth: 200 MHz
• 10° stripe (beam) at fixed declination
[Figure labels: z = 13, z = 6, 100 MHz]
HERA is an official SKA precursor. The signal is faint, so HERA is huge!

Slide 11

Slide 11 text

HERA analysis
Concept: interferometer to measure the 21cm power spectrum. SKA precursor in South Africa.
Challenges:
• Data cleaning
• RFI
• Data volumes (1 TB/day, RTP)
• Component separation for foregrounds
• Characterising systematics
Slide adapted from Lisa McBride.

Slide 12

Slide 12 text

HERA analysis
What our data look like: visibility waterfalls
[Figure: two waterfall plots, amplitudes and phases, each shown as time vs. frequency]
Slide adapted from Lisa McBride.

Slide 13

Slide 13 text

HERA analysis
1. Visibility waterfall (time vs. frequency)
2. LST binning → waterfall (LST vs. frequency)
3. (Fast) Fourier transform along the frequency axis and binning → cylindrical power spectrum
4. Spherical average → power spectrum
Power spectrum = science product! We don’t make images!
Slide adapted from Lisa McBride.
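The Fourier transform along the frequency axis can be sketched on a toy waterfall as follows. This is a minimal illustration, not the HERA pipeline: the array sizes, the single-fringe "sky", the Blackman taper and the plain time average are all assumptions made for the example, and the sign of the fringe is a convention.

```python
import numpy as np

# Toy waterfall: 4 time samples x 256 frequency channels (illustrative sizes).
n_times, n_freqs = 4, 256
freqs_hz = np.linspace(100e6, 200e6, n_freqs, endpoint=False)
channel_width_hz = freqs_hz[1] - freqs_hz[0]

# A single point-like source seen by one baseline gives a pure fringe in
# frequency, exp(2*pi*i*nu*tau), at a fixed geometric delay tau.
true_delay_s = 200e-9
fringe = np.exp(2j * np.pi * freqs_hz * true_delay_s)
waterfall = np.tile(fringe, (n_times, 1))

# Taper along frequency to limit spectral leakage, then FFT along that axis.
taper = np.blackman(n_freqs)
delay_spectrum = np.fft.fft(waterfall * taper, axis=1)
delays_s = np.fft.fftfreq(n_freqs, d=channel_width_hz)

# Average |V(tau)|^2 over time: a crude stand-in for the binning step.
delay_power = np.mean(np.abs(delay_spectrum) ** 2, axis=0)
peak_delay_s = delays_s[np.argmax(delay_power)]
print(f"power peaks at a delay of {peak_delay_s * 1e9:.0f} ns")
```

The Fourier dual of frequency is a delay, which maps onto the line-of-sight wavenumber, while the baseline length sets the transverse wavenumber. Binning in these two coordinates gives the cylindrical power spectrum.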

Slide 14

Slide 14 text

HERA analysis: Foregrounds
Extremely bright foregrounds lie between the first stars and us and dominate the observed sky:
• Amplitude of the cosmological signal = 10 mK
• Amplitude of the foregrounds = 1 000 to 10 000 mK
All foreground treatment methods rely on the assumption that foregrounds are spectrally smooth.
[Figure by Vibor Jelic; axis: time/redshift]

Slide 15

Slide 15 text

HERA analysis: Calibration
1. Redundant direction-independent calibration: all baselines b_ij with the same physical separation should observe the same V_true. No sky model. Solve for g_i and V_sol at each time and frequency step.
2. Absolute calibration using a catalog
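One common way to write the redundant-calibration problem of step 1 is given below. The notation beyond g_i and V_sol is assumed for illustration: n_ij is the noise on baseline (i, j), σ_ij² its variance, and V^sol_{i−j} the single visibility shared by all baselines with the same separation as b_ij.

```latex
V^{\rm obs}_{ij}(t,\nu) = g_i(t,\nu)\, g_j^{*}(t,\nu)\, V^{\rm sol}_{i-j}(t,\nu) + n_{ij}(t,\nu) ,
\qquad
\chi^2(t,\nu) = \sum_{i<j}
  \frac{\left| V^{\rm obs}_{ij} - g_i\, g_j^{*}\, V^{\rm sol}_{i-j} \right|^2}{\sigma_{ij}^2} .
```

Minimising χ² over the gains and the unique visibilities uses only the array's redundancy. The degeneracies this leaves (such as an overall amplitude) are what the absolute calibration against a catalog fixes in step 2.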

Slide 16

Slide 16 text

HERA analysis: RFI excision
Most of the target frequency band is polluted by human emission: aviation communication, FM radio, radars, … These are called radio frequency interference (RFI).
Even the faintest outside signal is measured by our extremely sensitive telescopes → limits the amount of data we can analyse: we excise what is polluted.
[Figure labels: FM band (12 < z < 15), TV]

Slide 17

Slide 17 text

HERA analysis: RFI inpainting
Flagging masks → strong sidelobes when Fourier transforming → foreground leakage
Solution: inpaint the masked data or remove the effect of the mask on the FT
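The leakage mechanism can be reproduced on a toy spectrum. The sketch below is an illustration under stated assumptions: the foreground shape, the flag positions, the Blackman taper and the 1/8 cycle-per-channel cut are arbitrary choices, not HERA values.

```python
import numpy as np

n_freqs = 256
channels = np.arange(n_freqs)

# Spectrally smooth toy foreground (arbitrary units, amplitude ~1000).
foreground = 1000.0 * (1.0 + 0.5 * np.cos(2 * np.pi * channels / n_freqs))

# RFI flags: True marks excised channels (positions are arbitrary).
flags = np.zeros(n_freqs, dtype=bool)
flags[100:110] = True
flags[180:183] = True

taper = np.blackman(n_freqs)


def fourier_power(spectrum):
    """Power of the tapered Fourier transform along frequency."""
    return np.abs(np.fft.fft(spectrum * taper)) ** 2


# Fourier modes far from the smooth foreground (|mode| > 1/8 cycle per channel).
high_modes = np.abs(np.fft.fftfreq(n_freqs)) > 0.125

leak_full = fourier_power(foreground)[high_modes].mean()
leak_masked = fourier_power(np.where(flags, 0.0, foreground))[high_modes].mean()
print(f"high-mode power, masked / unmasked: {leak_masked / leak_full:.1e}")
```

Zeroing a handful of channels spreads the smooth foreground over all Fourier modes, including those where the much fainter cosmological signal would be searched for.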

Slide 18

Slide 18 text

HERA analysis: RFI inpainting
Methods in the literature:
• CLEAN (deconvolution algorithm for 2D images, see Högbom+1974)
Idea: iteratively remove sidelobes from regions with highest Fourier amplitudes until reaching the noise floor.
[Figure: dirty maps, from Högbom+1974]
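The idea can be sketched in one dimension, along the frequency axis, on a noise-free toy signal. This is a minimal Högbom-style loop written for illustration, not the implementation used in any pipeline: the function name `clean_1d`, the loop gain, the stopping tolerance and the two-mode test signal are all assumptions.

```python
import numpy as np


def clean_1d(data, weights, gain=0.2, max_iter=5000, tol=1e-9):
    """Hogbom-style CLEAN along one axis.

    data    : complex samples along frequency (values under flags are ignored)
    weights : 1 for good channels, 0 for flagged ones
    Returns the Fourier-space model and the final residual.
    """
    n = data.size
    residual = np.fft.fft(data * weights)      # "dirty" Fourier transform
    psf = np.fft.fft(weights) / weights.sum()  # mask response, peak normalised to 1
    model = np.zeros(n, dtype=complex)
    threshold = tol * np.abs(residual).max()
    for _ in range(max_iter):
        peak = np.argmax(np.abs(residual))
        if np.abs(residual[peak]) <= threshold:
            break
        # Move a fraction of the brightest mode into the model and subtract
        # that mode together with the sidelobes the mask gives it.
        component = gain * residual[peak]
        model[peak] += component
        residual -= component * np.roll(psf, peak)
    # Rescale so that np.fft.ifft(model) is in the units of the input samples.
    return model * n / weights.sum(), residual


n = 256
chan = np.arange(n)
signal = 3.0 * np.exp(2j * np.pi * 5 * chan / n) + np.exp(-2j * np.pi * 12 * chan / n)
weights = np.ones(n)
weights[100:110] = 0.0
weights[180:186] = 0.0

model, residual = clean_1d(signal, weights)
inpainted = np.where(weights > 0, signal, np.fft.ifft(model))
```

In this idealised case (few modes, no noise) the flagged channels are recovered almost exactly. With real data the loop stops at the noise floor and the quality of the fill depends on how compact the signal is in Fourier space.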

Slide 19

Slide 19 text

HERA analysis: RFI inpainting
Methods in the literature:
• CLEAN (deconvolution algorithm for 2D images, see Högbom+1974)
• CNN (U-Net, Pagano+2023)

Slide 20

Slide 20 text

HERA analysis: RFI inpainting
Methods in the literature:
• CLEAN (deconvolution algorithm for 2D images, see Högbom+1974)
• CNN (U-Net, Pagano+2023)
• Wiener filtering and Gaussian process regression (GPR, see Kern & Liu 2020)
Model the missing data as a Gaussian distribution with a mean and covariance.
Data d = cosmological signal + noise + foregrounds
Requires a model for each component.
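A minimal sketch of the Gaussian conditional (Wiener-filter) fill is given below. It is a toy under stated assumptions: only one smooth component plus white noise is modelled, the squared-exponential covariance, its 15 MHz length scale, the noise level and the flag positions are illustrative, and the helper `sq_exp_cov` is hypothetical.

```python
import numpy as np


def sq_exp_cov(x1, x2, amplitude, length):
    """Squared-exponential covariance: a generic model for a smooth component."""
    dx = x1[:, None] - x2[None, :]
    return amplitude**2 * np.exp(-0.5 * (dx / length) ** 2)


rng = np.random.default_rng(0)
freqs_mhz = np.linspace(100.0, 200.0, 128)
flags = np.zeros(freqs_mhz.size, dtype=bool)
flags[40:46] = True
good = ~flags

# Toy data: a smooth "foreground" plus white noise (all values illustrative).
noise_std = 0.01
truth = np.sin(2 * np.pi * (freqs_mhz - 100.0) / 60.0)
data = truth + rng.normal(0.0, noise_std, freqs_mhz.size)

# Model covariances: smooth component S (g = good, m = missing) and noise N.
S_gg = sq_exp_cov(freqs_mhz[good], freqs_mhz[good], 1.0, 15.0)
S_mg = sq_exp_cov(freqs_mhz[flags], freqs_mhz[good], 1.0, 15.0)
S_mm = sq_exp_cov(freqs_mhz[flags], freqs_mhz[flags], 1.0, 15.0)
N_gg = noise_std**2 * np.eye(good.sum())

# Conditional mean and covariance of the flagged channels given the good ones.
mean_missing = S_mg @ np.linalg.solve(S_gg + N_gg, data[good])
cov_missing = S_mm - S_mg @ np.linalg.solve(S_gg + N_gg, S_mg.T)

inpainted = data.copy()
inpainted[flags] = mean_missing
```

The conditional covariance `cov_missing` quantifies the uncertainty of the fill, but it is only as reliable as the covariance model that was assumed for each component.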

Slide 21

Slide 21 text

HERA analysis: RFI inpainting
[Figure: power spectrum, from Pagano+2023]

Slide 22

Slide 22 text

The cosmological 21cm signal
Brightness temperature:
• Global signal
• Power spectrum
• Intensity mapping
[Figure: evolution with time, from neutral to ionised]

Slide 23

Slide 23 text

Field-level inference
Why not recover the full underlying density field (pixel by pixel) + reionisation parameters without any summary statistic?
[Figure: reionisation history, average intensity, power spectrum, intensity map]

Slide 24

Slide 24 text

Field-level inference
Why not recover the full underlying density field (pixel by pixel) + cosmological parameters?
Density field → forward model → 21cm brightness temperature
It is a very high-dimensional problem.

Slide 25

Slide 25 text

Field-level inference
Why not recover the full underlying density field (pixel by pixel) + reionisation parameters?
It is a very high-dimensional problem: we use gradient descent.
[Figure: χ² at iteration #1, iteration #2, …]

Slide 26

Slide 26 text

Field-level inference with gradient descent
Why not recover the full underlying density field (pixel by pixel) + reionisation parameters?
It is a very high-dimensional problem: we use gradient descent to find the field (matter overdensity) that minimises the χ².
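A minimal sketch of gradient descent on a pixel-by-pixel field is given below. The forward model T = T0 · x_HI · (1 + δ) with a known neutral fraction, the χ² objective, the step size and all numerical values are assumptions chosen so that the example stays analytic. A realistic forward model is non-linear and its gradient would come from automatic differentiation.

```python
import numpy as np

rng = np.random.default_rng(1)
n_pix = 64

# Hypothetical toy forward model: T = T0 * x_HI * (1 + delta), with x_HI known.
T0 = 1.0
x_hi = rng.uniform(0.2, 1.0, n_pix)
delta_true = rng.normal(0.0, 0.3, n_pix)
sigma = 1.0


def forward(delta):
    """Map a matter overdensity field to a 21cm brightness temperature field."""
    return T0 * x_hi * (1.0 + delta)


data = forward(delta_true)  # noise-free mock data, for clarity


def chi2(delta):
    return np.sum((data - forward(delta)) ** 2) / sigma**2


def grad_chi2(delta):
    # d(chi2)/d(delta_i) = -2 * T0 * x_HI_i * (d_i - T_i) / sigma^2
    return -2.0 * T0 * x_hi * (data - forward(delta)) / sigma**2


delta = np.zeros(n_pix)  # starting guess: uniform density
step = 0.4
history = [chi2(delta)]
for _ in range(2000):
    delta = delta - step * grad_chi2(delta)
    history.append(chi2(delta))
```

Note that the gradient in a pixel scales with x_HI: where the gas is fully ionised (x_HI = 0) the data carry no information on the density and the pixel never moves from its starting value.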

Slide 27

Slide 27 text

Field-level inference with gradient descent
Why not recover the full underlying density field (pixel by pixel) + reionisation parameters?
It is a very high-dimensional problem: we use gradient descent and Hamiltonian Monte Carlo to sample the posteriors.
[Figure: Fig. 14 of “A Conceptual Introduction to Hamiltonian Monte Carlo”. Caption: “The exploration of a probabilistic system is mathematically equivalent to the exploration of a physical system. For example, we can interpret the mode of the target density as a massive planet and the gradient of the target density as that planet’s gravitational field. The typical set becomes the space around the planet through which we want a test object, such as a satellite, to orbit.” Added labels: high probability, walker.]
Next walker step depends on the value of the posterior and its gradient.
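How the walker uses both the posterior and its gradient can be sketched with a bare-bones Hamiltonian Monte Carlo sampler. The target here is a three-dimensional standard Gaussian, an assumption made so that the answer is known. The step size, the number of leapfrog steps and the function names are illustrative.

```python
import numpy as np


def neg_log_post(q):
    """Toy target: standard Gaussian in every dimension (for illustration)."""
    return 0.5 * np.sum(q**2)


def grad_neg_log_post(q):
    return q


def hmc_step(q, rng, step_size=0.2, n_leapfrog=8):
    """One HMC transition: draw a momentum, integrate, then accept or reject."""
    p = rng.standard_normal(q.shape)
    h_start = neg_log_post(q) + 0.5 * np.sum(p**2)

    # Leapfrog integration of Hamilton's equations.
    q_new = q.copy()
    p_new = p - 0.5 * step_size * grad_neg_log_post(q_new)
    for i in range(n_leapfrog):
        q_new = q_new + step_size * p_new
        if i < n_leapfrog - 1:
            p_new = p_new - step_size * grad_neg_log_post(q_new)
    p_new = p_new - 0.5 * step_size * grad_neg_log_post(q_new)

    # Metropolis correction for the integration error.
    h_end = neg_log_post(q_new) + 0.5 * np.sum(p_new**2)
    if np.log(rng.random()) < h_start - h_end:
        return q_new
    return q


rng = np.random.default_rng(2)
q = np.zeros(3)
samples = []
for _ in range(3000):
    q = hmc_step(q, rng)
    samples.append(q)
samples = np.array(samples)
print(samples.mean(), samples.var())
```

The gradient steers each trajectory along regions of high probability, which is what keeps the acceptance rate usable when the number of parameters is as large as the number of pixels.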

Slide 28

Slide 28 text

Field-level inference with gradient descent
Why not recover the full underlying density field (pixel by pixel) + reionisation parameters?
It is a very high-dimensional problem: we use gradient descent and Hamiltonian Monte Carlo to sample the posteriors.
[Figure: true vs. recovered fields, for the density field (“model”) and the 21cm field (“data”)]

Slide 29

Slide 29 text

Field-level inference with gradient descent
Why not recover the full underlying density field (pixel by pixel) + reionisation parameters?
It is a very high-dimensional problem: we use gradient descent and Hamiltonian Monte Carlo to sample the posteriors.

Slide 30

Slide 30 text

Field-level inference with gradient descent
Why not recover the full underlying density field (pixel by pixel) + reionisation parameters?
It is a very high-dimensional problem: we use gradient descent.
Things get messy when there are ionised “bubbles” = gaps in data.
Need to impose a prior on the density in these missing pixels:
a. Matter power spectrum (known theoretically, e.g., inpainting)
b. Cross-correlations (e.g., with CO maps, see Zhou & Mao 2023)

Slide 31

Slide 31 text

Gaussian constrained realisations
[Figure from Raghunathan+2019: regions T_1 and T_2, with the known pixels marked]
T̂: Gaussian realisation of T_1 + T_2 whose statistics are known

Slide 32

Slide 32 text

Gaussian constrained realisations
[Figure labels: T_1, T_2, T_2]
Benoit-Levy+2013, Raghunathan+2019

Slide 33

Slide 33 text

Gaussian constrained realisations
Any Gaussian realisation will not have the properties to match observations.
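A standard way to turn an arbitrary Gaussian realisation into one that matches the observed pixels is to add a correction built from the covariance between all pixels and the known ones. The sketch below is a one-dimensional toy under stated assumptions: the exponential covariance, the mask position and the variable names are illustrative, and whether this matches the exact expression shown on the slide is not confirmed by the text.

```python
import numpy as np

rng = np.random.default_rng(3)
n_pix = 100
pix = np.arange(n_pix)

# Assumed known statistics: zero mean, exponential covariance (illustrative).
cov = np.exp(-np.abs(pix[:, None] - pix[None, :]) / 10.0)
chol = np.linalg.cholesky(cov)

# Mock "sky"; only its known pixels enter the reconstruction.
observed = chol @ rng.standard_normal(n_pix)
known = np.ones(n_pix, dtype=bool)
known[40:55] = False  # masked region to fill

# Unconstrained random realisation with the same covariance.
random_field = chol @ rng.standard_normal(n_pix)

# Correct the random realisation using its mismatch on the known pixels.
C_kk = cov[np.ix_(known, known)]  # known vs. known
C_ak = cov[:, known]              # all pixels vs. known
mismatch = observed[known] - random_field[known]
constrained = random_field + C_ak @ np.linalg.solve(C_kk, mismatch)

# Keep the data where they exist and the constrained realisation elsewhere.
filled = np.where(known, observed, constrained)
```

By construction the constrained field reproduces the data on the known pixels, while inside the mask it is a random draw that stays correlated with its surroundings and carries the assumed statistics.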

Slide 34

Slide 34 text

Gaussian constrained realisations
[Figure: Planck collaboration]

Slide 35

Slide 35 text

Conclusions
Missing data problems are common in cosmology/astrophysics. Other examples:
• Masking the Galaxy or point sources
• Resolution limits, e.g., in spectroscopic surveys…
Depending on the application:
• Fill with statistical realisation or truth?
• Requires a model for all data components: what if this model is not accurate?
• Difficult to assess uncertainties pertaining to the missing data
Thank you!