
Applications of Data Science in the Development of Next-Generation Ground-Based Single-Dish Instruments / 2021-09-14

Akio Taniguchi
September 14, 2021


Data science has recently attracted attention as an approach for extracting new insights both from "big" data such as images and from "small" data with incomplete sampling, and a variety of methods, including machine learning and sparse modeling, are being applied to astronomy. For next-generation ground-based single-dish projects as well, the worldwide demand for simultaneous wide-field, wide-band spectral imaging makes the shift to big data inevitable, and there is an urgent need to establish methodologies for observation, data processing, and source detection that do not rely on conventional "hands-on" processing. A data-release framework designed for synergy with multi-wavelength observations is also important.

In this talk, I introduce the roles and challenges of data science in the development of next-generation single-dish instruments, together with actual application examples. In typical ground-based observations, the astronomical signal reaches the observer through multiple elements (the Earth's atmosphere, the antenna, the receiver, and the spectrometer), and signal degradation at each element is a problem. If data-science methods matched to the statistical properties of the astronomical or noise signals can be introduced, this degradation can be suppressed. As one example, the superposition of atmospheric emission on the astronomical signal is a major factor limiting sensitivity. I introduce a data-science method that automatically separates the atmospheric emission, and show that data science is indispensable for improving observing sensitivity in next-generation instrument development.

I also discuss issues in handling data from observation to release, drawing on examples from optical/infrared telescopes, where the shift to big data is further along. In particular, the conventional approach, in which observers download all the observational data locally and analyze it there, is expected to become impractical. As an alternative, I introduce the potential of science platforms that perform data analysis and visualization in the cloud, and outline the data-format and data-processing development needed to realize them.


Transcript

  1. Introduction: self-introduction || Career • 2013–2018: The University of Tokyo, Dept. of Astronomy, Kohno Lab (Ph.D.) • 2018–2021: Nagoya University, Astrophysics Lab (postdoc) || Research fields • DESHIMA project: development of an on-chip ultra-wideband spectrometer • Spectroscopic observing methods and higher-sensitivity data using data science. (Timeline figure: home town; BS, MS, Ph.D.; postdoc (DESHIMA).)
  2. Introduction: self-introduction — Yoichi Tamura / AtLAST Science Workshop 2018. Technology development: applying data science to signal processing • Reduction and analysis need to be real-time → remove the atmosphere and detect sources in (near) real time, keeping only the data that are needed • Example: atmosphere–source signal separation using sparse modeling → effective not only for source detection and data compression, but also improves observing sensitivity. (Figure: PSW data = atmospheric emission + astronomical signal + noise; Taniguchi, A., Tamura, Y., Ikeda, S., Takekoshi, T., Kawabe, R., 2021, AJ, 162, 111; Zhou & Tao 2011.) Akio Taniguchi, Yoichi Tamura (Nagoya Univ.) et al.; © K. Kohno (ASJ 2021 annual autumn meeting, Z101a)
  3. Table of contents || Data science and astronomy • What is data science? • What has data science made possible? || Data science and single-dish instrument development • Why is data science necessary for astronomical instrument development? • What is needed to apply data science? || Applications to next-generation submillimeter single dishes • Model case: applying data science to atmospheric removal • Challenges for the radio astronomy community || Summary
  4. What is data science? What can it do? — The data science Venn diagram (Drew Conway 2010): domain expertise of the research field; IT skills such as programming; knowledge of mathematics and statistics. Combining IT skills with domain expertise alone — simply applying IT to derive results — is dangerous; domain expertise plus math/statistics alone is conventional (or exploratory) research. || Data science • A discipline for extracting more generalizable knowledge from data (Dhar 2013) • Deepening, and disseminating, knowledge of how to process and analyze data (Ikeda 2016)
  5. What is data science? What can it do? — An interface between researchers and tools: astronomers provide standardized data and conversion programs, and researchers in statistical mathematics propose (develop) and provide statistical methods in return. || Data science • A discipline for extracting more generalizable knowledge from data (Dhar 2013) • Deepening, and disseminating, knowledge of how to process and analyze data (Ikeda 2016)
  6. What is data science? What can it do? || Extracting new insights (information) • Extracting more information from small data (e.g., missing data, high-dimensional small samples) • Automatically extracting features from large data (e.g., images, video) || Coping with big data • More efficient detection and discovery via pattern recognition (e.g., machine learning) • Automation of complex tasks and decision making. (Excerpts from two reviews were shown: LeCun+2015 on deep learning — ConvNets from the 2012 ImageNet breakthrough onward, including Figure 3, "From image to text," on RNN+CNN image captioning and the advantages of distributed representations — and Bouwmans+2017 on decomposition into low-rank plus sparse matrices for foreground/background separation, robust PCA, and Principal Component Pursuit.)
  7. Applications of data science to astronomy || Extracting new insights (information) • Extracting more information from small data, e.g., optimization via sparse modeling • Automatically extracting features from large data, e.g., source detection and classification via machine learning || Coping with big data • More efficient detection and discovery, e.g., foreground–background separation via feature extraction • Automation, e.g., queue observing, pipeline processing, databases. (Examples shown: sparse modeling with ℓ1+TSV regularization for black-hole shadow imaging, Kuramochi+2018; low-rank + sparse decomposition of Tomo-e Gozen movie data for transient detection, Morii+2017; Galaxy Zoo: classifying galaxies with crowdsourcing and active learning.)
  8. Applications of data science to astronomy || The application of data science to astronomy has expanded rapidly since the 2010s • In Japan, applications have increased markedly since 2015. (Chart: number of data-science-related talks at the spring and autumn annual meetings of the Astronomical Society of Japan, 2000–2020, by session category (Z excluded); counted by searching talk titles for the keywords "機械学習", "深層学習", "ニューラル", "スパース", "疎性", "超解像", "データ科学", "machine learning", "deep learning", "sparse", "neural", "GAN", "CNN".)
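The keyword tally described on this slide can be reproduced in outline with a simple title search. A minimal sketch only: the titles below are hypothetical stand-ins (the talk searched real ASJ meeting programs), while the keyword list follows the slide.

```python
import re

# Hypothetical talk titles (stand-ins for real ASJ meeting programs)
titles = [
    "機械学習による銀河の形態分類",
    "Sparse modeling of protoplanetary disk images",
    "A new 2-mm receiver for the 45-m telescope",
]

# Keyword list as given on the slide
keywords = ["機械学習", "深層学習", "ニューラル", "スパース", "疎性", "超解像",
            "データ科学", "machine learning", "deep learning", "sparse",
            "neural", "GAN", "CNN"]

# One alternation pattern; escape keywords so they match literally
pattern = re.compile("|".join(map(re.escape, keywords)), re.IGNORECASE)
n_hits = sum(bool(pattern.search(t)) for t in titles)
print(n_hits)  # 2 (the first and second titles match)
```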
  9. Table of contents || Data science and astronomy • What is data science? • What has data science made possible? || Data science and single-dish instrument development • Why is data science necessary for astronomical instrument development? • What is needed to apply data science? || Applications to next-generation submillimeter single dishes • Model case: applying data science to atmospheric removal • Challenges for the radio astronomy community || Summary
  10. Why is data science necessary for astronomical instrument development? (Figures from Klaassen+2020: primary dish diameter and spatial resolution, field of view, throughput (AΩ = FoV × collecting area, i.e., étendue), and spatial dynamic range of existing or funded single-dish telescopes observing up to at least 270 GHz — CSO, JCMT, IRAM 30m, ASTE, LMT, APEX, GLT, ACT, SPT, SO/CCATp, AtLAST. Also detector counts of leading (sub)millimeter direct-detection instruments: the best-fit log-linear relation implies the number of detectors can increase by an order of magnitude roughly every seven years, reaching the megapixel regime circa 2032, with the projected wide-field first-generation "AtLAST Cam" marked.) AtLAST survey throughput (projected); AtLAST camera detector count (projected). || With astronomical data becoming big data, applying data science is indispensable • Optical/infrared telescopes lead the way, reaching the petabyte regime in the 2020s • Submillimeter telescopes are no exception ("orders of magnitude" more in both raw data and data products). Klaassen+2020
  11. Why is data science necessary for astronomical instrument development? || With astronomical data becoming big data, applying data science is indispensable • Optical/infrared telescopes lead the way, reaching the petabyte regime in the 2020s • Submillimeter telescopes are no exception ("orders of magnitude" more in both raw data and data products). (Figures: predicted total data volumes of existing and upcoming telescopes — VLT, SDSS, VISTA, LSST, TMT — from Kremer et al. 2017; and expected data volumes in terabytes of selected observational facilities and surveys as a function of time — LSST, ASKAP, JWST, 2MASS, IRAS, Herschel, Euclid, Planck, Gaia, ALMA, WFIRST — from Desai et al. 2019, where large-area follow-up surveys must be combined with real-time alerts from LIGO/Virgo, IceCube, and LISA. Data sizes acquired "in one night" by optical/near-infrared telescopes: 90 TB, 30 TB, 0.3 TB, 0.2 TB.)
  12. Why is data science necessary for astronomical instrument development? (Same content as the previous slide, adding one data point: the superconducting imaging spectrometer KATANA, with a data rate of ~150 MB/s.)
  13. How the leading optical/infrared telescopes are responding || Cloud-based data analysis, visualization, and data queries • Users complete their work without downloading the data locally • The computing resources needed for the work are provided by the observatory (data center) • Example: the layered structure of the Rubin Science Platform (Vera C. Rubin Observatory), where a Portal Aspect covers query and exploratory capabilities and a Notebook (JupyterLab) Aspect covers analysis. https://ls.st/lse-319
  14. Applying data science to next-generation ground-based single dishes || The need to prepare the interfaces (I/F) between R&D fields • Rather than fixing everything in post-processing, condition the signal and data before they cross each interface • Standardize the data so that non-specialists can use them (an astronomy version of the "democratization of data"). (Diagram: astronomical signal → Earth's atmosphere → antenna/optics → receiver → spectrometer → signal processing → data, with an I/F at each boundary. On the hardware side: wavefront correction by millimeter-wave adaptive optics; understanding via atmospheric simulators; plug-and-play instruments via receiver integration. On the software side: data (format) standardization; cloud-based data analysis and visualization; real-time atmospheric removal, source detection, and data compression.)
  15. The atmosphere–antenna interface || Wavefront correction by millimeter-wave adaptive optics (MAO) • Correct the optics via wavefront sensing that applies interferometric techniques || End-to-end simulation of atmosphere + instrument noise (TiEMPO) • Mock observations that account for a dynamical model of the atmosphere and the instrument's noise characteristics. (Figure: system block diagram of the prototype wavefront sensor and its configuration on the Nobeyama 45 m telescope. The prototype has several radiator elements at 20 GHz, a frequency that is relatively easy to handle, and the aperture-plane interferometry is scalable to tens to hundreds of elements at higher frequencies; the top-level requirement is to measure deviations from the ideal mirror surface to 40 µm r.m.s. with 100 ms time resolution, i.e., 1 deg r.m.s. phase accuracy at 20 GHz.) Huijten+2020, Tamura+2020
  16. Interfaces around the receiver and spectrometer || Plug-and-play instruments via receiver integration • Swapping a single chip realizes instruments with different spectro-imaging parameters • Shared readout electronics, third-party chips, and high maintainability || Data-format standardization • Scalable formats that can withstand big data (xarray, dask, ...). (Figure: the superconducting imaging spectrometer system KATANA, Karatsu+2019 — silicon lens, wideband antenna array, and superconducting spectrometer circuit on a 100 mm wafer; plug-and-play concept with interchangeable chips, e.g., a chip for SZ imaging or a chip for line searches.)
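As an illustration of the kind of labeled, scalable format meant here, time-series spectra can be held in an xarray `DataArray` with named dimensions and coordinates. This is a sketch only: the dimension names, sampling rate, and frequency range are assumptions, not the KATANA or DESHIMA schema; calling `.chunk()` on the same array would back it with dask for out-of-core processing.

```python
import numpy as np
import xarray as xr

# Hypothetical time-series spectra: 10 Hz sampling over 512 frequency channels
n_time, n_chan = 1000, 512
spectra = xr.DataArray(
    np.zeros((n_time, n_chan), dtype="float32"),
    dims=("time", "chan"),
    coords={
        "time": np.arange(n_time) * 0.1,            # elapsed time in seconds
        "chan": np.linspace(220e9, 440e9, n_chan),  # frequency in Hz
    },
    name="brightness",
)

# Label-based operations replace manual index bookkeeping
mean_spectrum = spectra.mean("time")                # average over time
sub_band = spectra.sel(chan=slice(300e9, 350e9))    # select a frequency range
print(mean_spectrum.shape)  # (512,)
```

Named dimensions make reductions like "average over time" self-documenting, which matters once many groups exchange the same data across the interfaces discussed above.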
  17. Signal processing (intensity calibration, noise removal, etc.) || Data-format standardization • Scalable formats that can withstand big data (xarray, dask, ...) • Example: the planned format transition in ngCASA / CNGI — MS/Image middleware with analysis and parallelization control; ngCASA is a collection of functions that implement flagging, calibration, imaging, and plotting through the CNGI building blocks, while Pipeline, CARTA, PlotMS, and Interactive Control are applications built for specific purposes on top of those building blocks (separation of the UI layer; discussed below). https://www.safe.nrao.edu/wiki/pub/Software/CASA/CASAUsersCommittee2020/2020_CUC_ngCASA_CNGI.pdf
  18. Table of contents || Data science and astronomy • What is data science? • What has data science made possible? || Data science and single-dish instrument development • Why is data science necessary for astronomical instrument development? • What is needed to apply data science? || Applications to next-generation submillimeter single dishes • Model case: applying data science to atmospheric removal • Challenges for the radio astronomy community || Summary
  19. The lack of atmospheric modeling suited to wider bandwidths and fields of view • At (sub)millimeter wavelengths, atmospheric emission (mainly water vapor) attenuates the astronomical signal • Problems of position switching (PSW), which alternates ON- and OFF-point observations: • Subtracting two data streams that both carry thermal noise degrades the noise level by √2 (σ ≃ √2 · T_sys / √(Δν · t_on)) • Baseline undulations from subtracting atmospheric components that differ in time and space • Low on-source efficiency due to the OFF-point observations (ON-point time is at most half the total observing time). The sky model is T_sky(ν) = T*_a(ν) η_atm(ν, t) + T_atm [1 − η_atm(ν, t)], where T_sky is the observable, T*_a is the astronomical signal we want to get, η_atm is the atmospheric transmission, and the T_atm term is the atmospheric emission that must be removed. (Figure: the ON spectrum contains atmosphere + signal, the OFF spectrum contains only atmosphere, and their difference isolates the astronomical signal.)
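The √2 penalty quoted on this slide can be checked numerically. A toy sketch with illustrative numbers only: two independent streams with the single-stream radiometer noise σ = T_sys / √(Δν t) are differenced as in PSW, and the noise of the difference grows by √2.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 200_000                    # number of simulated spectral samples
t_sys = 100.0                  # system temperature in K (illustrative)
d_nu, t_on = 1e9, 1.0          # 1 GHz bandwidth, 1 s per ON dump (illustrative)
sigma = t_sys / np.sqrt(d_nu * t_on)   # radiometer equation, single stream

on = sigma * rng.standard_normal(n)    # ON: thermal noise (signal omitted)
off = sigma * rng.standard_normal(n)   # OFF: independent thermal noise
psw = on - off                         # ON - OFF subtraction

# Variances of independent streams add, so the std grows by sqrt(2)
print(psw.std() / sigma)  # ~1.414
```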
  20. Applying data science to atmospheric modeling || Element technology 1: high-rate (> 1 Hz), continuous time-series spectral data • Fully (Nyquist-) sample the atmospheric emission, which typically varies on timescales below 1 Hz || Element technology 2: modulation of the astronomical signal (the line spectrum) • Design the observation so that the astronomical signal is not continuous (correlated) across the data || Element technology 3: separating the atmospheric and astronomical signals by statistical methods (decomposition) • In principle, the atmospheric component can be estimated without the √2 degradation of the noise level. (Figure: conventional position-switching (PSW) spectra alternate ON/OFF blocks over observing time and frequency channels; the proposed continuous time-series spectra have no OFF blocks.)
  21.–23. (Build-up slides repeating the content of slide 20, contrasting conventional position-switching (PSW) spectra with the proposed continuous time-series spectra.)
  24. Applying data science to atmospheric modeling || GoDec algorithm (Zhou & Tao 2011) • A signal–background separation method that exploits the sparsity of the signal and the low-rankness of the background || Application to spectroscopic data (Taniguchi et al. 2021) • Apply GoDec, customized so that the signal is the line spectrum and the background is the atmospheric emission. (Figure: PSW data = atmospheric emission + astronomical signal + noise; Taniguchi et al. 2021; Zhou & Tao 2011.)
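A minimal sketch of the GoDec idea (not the customized implementation of Taniguchi et al. 2021): alternately fit a rank-r background by truncated SVD and a cardinality-k sparse component by hard thresholding. The function name, rank, and cardinality below are illustrative, and the synthetic data merely mimic "low-rank atmosphere plus a few strong line entries."

```python
import numpy as np

def godec(X, rank, card, n_iter=50):
    """Decompose X ~ L (low-rank background) + S (sparse signal) + G (noise)."""
    S = np.zeros_like(X)
    for _ in range(n_iter):
        # Low-rank update: best rank-`rank` approximation of X - S
        U, s, Vt = np.linalg.svd(X - S, full_matrices=False)
        L = (U[:, :rank] * s[:rank]) @ Vt[:rank]
        # Sparse update: keep the `card` largest-magnitude entries of X - L
        R = X - L
        thresh = np.partition(np.abs(R).ravel(), -card)[-card]
        S = np.where(np.abs(R) >= thresh, R, 0.0)
    return L, S, X - L - S  # residual plays the role of the noise term G

# Synthetic check: rank-3 "atmosphere" plus 40 strong "line" entries
rng = np.random.default_rng(0)
A = rng.standard_normal((60, 3)) @ rng.standard_normal((3, 80))  # low rank
S_true = np.zeros((60, 80))
S_true.ravel()[rng.choice(60 * 80, size=40, replace=False)] = 10.0
X = A + S_true + 0.01 * rng.standard_normal((60, 80))

L, S, G = godec(X, rank=3, card=40)
print(np.linalg.norm(L - A) / np.linalg.norm(A))  # small relative error
```

The appeal in this context is that the low-rank component is estimated jointly from all the data, so no dedicated OFF measurement (and hence no √2 subtraction penalty) is needed.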
  25. Application to high-z galaxy line observations with the LMT 50 m / Band 4 receiver || Application to CO(J=4–3) line data of PJ020941.3 (z = 2.55) • For the very same data, simply changing the atmospheric-removal method improves the noise level by a factor of ~1.7 • Besides avoiding noise-minus-noise subtraction, the stabilized baseline also contributes to the sensitivity. (Figure: conventional position-switching data reduction vs. the proposed reduction using the GoDec algorithm over 2.5 GHz, showing the ~1.7× improvement in noise level; Taniguchi et al. 2021, AJ.)
  26. Atmospheric removal in the ultra-wideband spectrometer DESHIMA || The challenge of ultra-wideband × high-sensitivity (long-integration, big-data) spectral observations • We want to separate astronomical signals from atmospheric emission that is more than four orders of magnitude brighter (and varies in time and space) • An additional constraint is imposed that the astronomical signal does not vary in time (SPLITTER: Brackenhoff et al., in prep.) • Estimation of continuum + line spectra is under development using the DESHIMA end-to-end simulation. (Figures: sequential estimation of a galaxy's continuum + line spectrum with SPLITTER, © A. Endo; comparison of brightness temperatures over 250–400 GHz for the Earth's atmosphere vs. the astronomical signal in space and on the ground — a difference of more than four orders of magnitude; estimates from individual time-divided chunks vs. PSW and the ground truth at 220 GHz, © Stefanie Brackenhoff.)
  27. Atmospheric removal in the ultra-wideband spectrometer DESHIMA || Inter-field (I/F) research in SPLITTER • A collaboration between experts in statistics and signal processing and the terahertz groups at TU Delft • One example of developing the people and technology that bring field-bridging data science into the community. (Members: Alle-Jan van der Veen, Circuits and Systems Group; A. Endo, Terahertz Sensing Group; Stefanie Brackenhoff, best thesis in electrical engineering of TU Delft; A. Taniguchi.)
  28. Summary || Applying data science to next-generation ground-based single-dish instrument development • Coping with big data: platforms that separate users, services, and data • Preparing the interfaces: mechanisms that do not rely on post-processing. (Diagram, as before: astronomical signal → Earth's atmosphere → antenna/optics → receiver → spectrometer → signal processing → data, with an I/F at each boundary; wavefront correction by millimeter-wave adaptive optics, understanding via atmospheric simulators, plug-and-play instruments via receiver integration, data (format) standardization, cloud-based data analysis and visualization, and real-time atmospheric removal, source detection, and data compression.)
  29. Summary || Data science and astronomy • Methods for extracting new insights from data • Coping with big data (a challenge shared across wavelengths) || Data science and submillimeter instrument development • Data science is indispensable for coping with big data • Preparing the interfaces between components is essential || Applications to next-generation submillimeter single dishes • (Spectral-line) atmospheric removal achievable within the current observing framework is coming into view • Designing instruments and observations that presuppose the application of data science is the next challenge • Scientists and engineers alike should stay conscious of their data.