A data-scientific noise-removal method for efficient submillimeter spectroscopy with single-dish telescopes / 2021-08-18

Slide 1

Slide 1 text

A data-scientific noise-removal method for efficient submillimeter spectroscopy with single-dish telescopes Akio Taniguchi (Postdoc, Astrophysics lab, Nagoya University) 2021-08-18 ALMA-J Seminar Based on: Taniguchi+19 PASJ (arxiv:1911.02574) and Taniguchi+21 AJ in press (arxiv:2107.06290)

Slide 8

Slide 8 text

This talk: data-scientific methods towards LST

Getting new knowledge
• from small (or incomplete) data (sparse modeling for super-resolution images)
• from (extremely) big data (classification of celestial objects by machine learning)

Handling "big data"
• Efficient detection or identification (background-foreground separation)
• Automation (queue observations, pipeline processing, database)

[Excerpt: Kuramochi et al., The Astrophysical Journal, 858:56 (14pp), 2018 May 1]

[...] particular, the raw reconstructed images in Figure 6 clearly show that smooth edges in the ground-truth images, which are attributed to a smooth transition in the emissivity and opacity of the plasma in the accretion flow, are much better reconstructed with ℓ1+TSV regularization. As a consequence of this, the TSV term reproduces a much clearer shadow feature in the reconstructed images. For the Free-fall model, the size of the black hole shadow is larger in the ℓ1+TSV image than the isoTV term and gets closer to the ground truth than the isoTV term. For sub-Keplerian and Keplerian models, the black hole shadow is visible in the ℓ1+TSV images but is mostly obscured (except for the darker funnel region) in the ℓ1+isoTV methods. The appearance of the reconstructed images indicates that ℓ1+TSV regularization is justified based on a more physically reasonable assumption and is therefore more suitable to image the objects seen in many astronomical observations. In the following subsections, we evaluate the images more quantitatively with the image fidelity metrics described in Section 4.

5.3. NRMSE Analysis and Optimal Beam Sizes

In Figure 7, we evaluate the NRMSE metric on the image domain and its gradient domain over various spatial scales, as in previous work (Chael et al. 2016; Akiyama et al. 2017a, 2017b). The black curves represent the ideal NRMSE curves between the original (unconvolved) ground-truth image and the ground-truth image after convolution with a Gaussian beam scaled to each resolution on the horizontal axis. These curves represent the highest fidelity available at a given resolution, as would be provided by an algorithm that reconstructs the image [...]

Figure 5. Ground-truth image (left-most) and images reconstructed with CS-CLEAN (second from left), ℓ1+isoTV (second from right), and ℓ1+TSV (right-most) regularization. All reconstructed images are convolved with elliptical Gaussian beams represented by the yellow ellipses, for which the size corresponds to the optimal resolution determined with the image-domain NRMSE curve in Figure 7 (see Section 5.3). The same transfer function is adopted for four images of each model (i.e., on each row).

[Excerpt: Morii et al., The Astrophysical Journal, 835:1 (5pp), 2017 January 20]

[...] the rank and cardinality of the low-rank and sparse matrices, respectively. For the low-rank matrix L, we check the distribution of singular values, applying SVD, as shown in the main panel of Figure 2. The matrix L is expressed as L = U D V^T, where U and V are orthogonal matrices, and D is a diagonal one. Then, we set the rank of the low-rank matrix by setting zeros for the singular values at indices larger than the rank. Within the sparse matrix S, the transient events can be easily extracted because these events are innately sparse. The time variation of the sky background can be monitored by checking the noise matrix G. After the data matrix M has been decomposed into the three matrices L, S, and G, further data processing is necessary, because otherwise the data size would be three times larger than that of the original data. The low-rank matrix L is easily compressed into three small matrices, as shown in Figure 1. For the sparse matrix S, the frames that contain a transient event(s) must be preserved, [...] movie data, due to the speed of computation and memory consumption. We have rewritten the GoDec code in C++, utilizing the OpenBLAS and LAPACK libraries. We use Quick Select, instead of full sorting, to select non-zero elements for a sparse matrix in the GoDec algorithm.

3. APPLICATION OF THE PROPOSED METHOD

We used the movie data set of a CMOS sensor for 400 frames obtained with the Tomo-e PM in 2015 December, which contains some transient events lasting for a short duration (Ohsawa et al. 2016). Panels (a) and (e) of Figure 3 show the subarray images with 300×300 pixels in two different time frames, which contained a transient point source and a meteor, respectively. We applied the decomposition to the data by setting r = 10 and k = 1 × 10^8. Panels (b)–(d) and [...]

Figure 3. Example decomposition images for movie data of the Tomo-e Gozen from two frames (top and bottom rows). Original (denoted as the matrix M in the main text), low-rank (L), sparse (S), and noise (G) images are shown in the four columns from left to right, respectively. A transient point source appears near the center of the image at the time frame of the top row, as spotted in the original image (a), in contrast to (e), which was taken in a different frame (bottom row), and as clearly visible in the sparse one (c), in contrast to (g). On the other hand, a line, which is a light trail caused by a meteor, is seen in the second time frame (bottom row), as in the original image (e) and the sparse one (g). These transient events are not recognized in the low-rank images (b) and (f). The noise images (d) and (h) do not contain any noticeable patterns.

Galaxy Zoo: Classifying Galaxies with Crowdsourcing and Active Learning || Low-rank + sparse decomposition (Morii+2017) || Sparse modeling (Kuramochi+2018, ...)
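
As a rough illustration of the low-rank + sparse decomposition described in the Morii+2017 excerpt above (M = L + S + G, with a rank r for L and a cardinality k for S), here is a minimal NumPy sketch. It is not the authors' C++ GoDec code: it simply alternates a plain truncated SVD with hard thresholding, which may differ from what the GoDec code does internally. The function names (`truncate_rank`, `keep_largest`, `decompose`), the iteration count, and the synthetic "movie" are all assumptions made for this example. `np.argpartition` stands in for the Quick Select step, since it selects the k largest entries without a full sort.

```python
import numpy as np

def truncate_rank(X, r):
    """Rank-r approximation of X: zero the singular values at indices >= r."""
    U, d, Vt = np.linalg.svd(X, full_matrices=False)
    d[r:] = 0.0
    return (U * d) @ Vt

def keep_largest(X, k):
    """Keep the k largest-magnitude entries of X and zero everything else."""
    S = np.zeros(X.shape, dtype=X.dtype)
    k = min(k, X.size)
    if k > 0:
        flat = np.abs(X).ravel()
        # Partial selection (no full sort): the last k indices hold the k largest values.
        idx = np.argpartition(flat, flat.size - k)[flat.size - k:]
        S.flat[idx] = X.flat[idx]
    return S

def decompose(M, r, k, n_iter=20):
    """Alternate the two projections; n_iter is an arbitrary choice for this sketch."""
    S = np.zeros_like(M)
    for _ in range(n_iter):
        L = truncate_rank(M - S, r)   # low-rank part (slowly varying background)
        S = keep_largest(M - L, k)    # sparse part (transient events)
    G = M - L - S                     # whatever is left is treated as noise
    return L, S, G

# Synthetic data (hypothetical): rows are time frames, columns are pixels.
rng = np.random.default_rng(0)
n_frames, n_pix = 40, 100
background = np.outer(1.0 + 0.1 * rng.random(n_frames), rng.random(n_pix))
M = background + 0.01 * rng.standard_normal((n_frames, n_pix))
M[7, 42] += 5.0                       # one injected transient

L, S, G = decompose(M, r=1, k=1)
print("non-zero entries of S at:", np.argwhere(S != 0).tolist())
```

With this synthetic input, the injected transient should end up in S while L holds the static background; real movie data would need r and k chosen from the data, as the excerpt describes.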

Slide 43

Slide 43 text

Big data from large submm single-dish telescopes

Application of data science in astronomy is inevitable!
• Petabyte (PB) class data in optical-to-infrared telescopes in 2020
• (Sub)millimeter telescopes may output terabyte (TB) class data as well ...

[Excerpt: Klaassen+2020]

Figure 4. Detector counts of several field leading millimetre or submillimetre-wave direct-detection instruments, using bolometers, transition edge sensors, or kinetic inductance detectors. The data were compiled from an amalgam of publications and websites, and are shown here solely to illustrate a trend. The best-fit log-linear relation implies the number of detectors can increase by an order of magnitude roughly every seven years, reaching the megapixel regime circa 2032. Our projection for a wide-field, Cassegrain-mounted first-generation camera, which we refer to generically as 'AtLAST Cam,' is marked with a red star. Acronyms are defined in Table A in the appendix.

[...] performance) operational goal for each instrument, and develop the technical requirements to meet that goal. We will mature this further during the design study, but at the moment we consider the following operational goals as driving: 1) full band spectral mapping of extended sources (more than a few arcminutes) as the driving goal for the high resolution spectrometer, 2) detection of high redshift galaxies for the continuum camera, and 3) redshift determination of detected galaxies for the low resolution spectrometer. Broadly, these three driving goals determine the frequency band allocation within the limited feed horn count of the high resolution spectrometer and limited focal plane area of the continuum camera, and also optimise the spectral resolution [...]

[Excerpt: Desai et al. 2019]

[...] neutrino observatories, which will produce tens of events per hour (Reitze 2019). This will require developing the cyberinfrastructure needed to combine several large-area follow-up surveys (i.e., LSST and ZTF) with real-time alerts (LIGO/Virgo, IceCube, and LISA) and analysis software tools. The white papers above provide concrete examples of how large data sets will be vital to make progress in specific science areas spanning astrophysics. Moreover, in an additional series of 6 science white papers, Fabbiano et al. (2019) emphasize that many paradigm-shifting discoveries in the 2020s will not be made through well-formulated hypotheses based on knowledge of the time, but rather by an exploratory discovery approach enabled by new telescopes and instrumentation, as well as by high-quality data products in easily accessible and interoperable science archives.

Figure 1. The 2020s and beyond will see large increases in data volumes. Approximate expected data volumes in terabytes of selected astronomical observational facilities and surveys are shown as a function of time. Symbols are plotted at the (expected) end of operations. Ongoing surveys as of this writing are plotted in 2019 with an arrow. The current size of major data centers are shown on the right axis.

# of pixels in continuum cameras || KATANA on LST (~1.5M detectors) || Estimate of the total data size || Klaassen+2020 || Desai et al. 2019 || 1 PB || 1 TB || 100000 pixels
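
The log-linear trend quoted in the Klaassen+2020 figure caption above (a factor of ten in detector count roughly every seven years, reaching about a megapixel circa 2032) can be turned into a small calculation. The sketch below only restates those two numbers as a formula; the function names and the choice to anchor the line exactly at 10^6 detectors in 2032 are assumptions for illustration, and the 1.5M figure is taken from the "KATANA on LST" label on the slide. It says nothing about when any instrument will actually be built.

```python
import math

YEARS_PER_FACTOR_TEN = 7.0   # "an order of magnitude roughly every seven years"
ANCHOR_YEAR = 2032.0         # "megapixel regime circa 2032"
ANCHOR_COUNT = 1.0e6         # assumed to mean exactly 10^6 detectors

def detector_count(year):
    """Detector count on the trend line for a given year."""
    return ANCHOR_COUNT * 10.0 ** ((year - ANCHOR_YEAR) / YEARS_PER_FACTOR_TEN)

def year_reaching(count):
    """Year at which the trend line reaches a given detector count."""
    return ANCHOR_YEAR + YEARS_PER_FACTOR_TEN * math.log10(count / ANCHOR_COUNT)

for year in (2018, 2025, 2032):
    print(year, f"{detector_count(year):.0f} detectors")

# Where a ~1.5M-detector instrument would sit on the same trend line.
print(f"1.5M detectors on the trend line: around {year_reaching(1.5e6):.1f}")
```

Each seven-year step back from 2032 divides the count by ten, which is consistent with the "100000 pixels" label sitting roughly in the mid-2020s on this trend.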
