Computational Requirements for Large-Scale Radio Astronomy: the ALMA Wideband Sensitivity Upgrade (WSU)

The Atacama Large Millimeter/submillimeter Array (ALMA) is an array of 66 radio telescopes operating in the millimeter and submillimeter range (from 35 GHz to 950 GHz) that has been key to understanding the formation of stars and planets, the early universe, and many other science cases. ALMA was inaugurated in March 2013, and work began soon after to keep it at the forefront of technology and scientific relevance. The first step was the definition of the ALMA 2030 roadmap. The first priority in that roadmap is the Wideband Sensitivity Upgrade (WSU), which will provide up to 4 times the current bandwidth in the signal chain, and on the order of 70 times the transmission speeds and size of the generated data. The computational needs for this upgrade are even greater in relative terms. This talk covers what the WSU program consists of, what it implies in terms of network and computing upgrades, and how ALMA will take advantage of the big data ecosystem to build the Radio Astronomy Data Processing System for ALMA (RADPS-ALMA).

See DIINF 2024 - YouTube (in Spanish): https://www.youtube.com/live/_8_O4txf9mQ?t=770

Juande Santander-Vela

October 23, 2024

Transcript

  1. Computational Requirements for Large-Scale Radio Astronomy: the ALMA Wideband Sensitivity Upgrade (WSU)

    Juande Santander-Vela, JAO Development Systems Engineer. Array 2024, USACH
  2. Who am I? • Juande Santander-Vela • Electronics Engineer, Software

    Developer, Software Analyst Background • 2009: Ph.D. on bringing radio astronomy data into the Virtual Observatory (UGR, IAA-CSIC) • 2009-2011: Applied Scientist (ESO) • 2011: ALMA Query Interface Developer (ESO) • 2012-2013: WfEver Scientist, VIA-SKA Project Manager (IAA-CSIC) • 2014-2018: System Engineer TM, SDP (SKA Organisation) • 2018-2019: Project Scientist/Engineer (MINECON, Chile) • 2019-2022: Head of Software Development (SKAO) • 2022-: Development Systems Engineer (JAO) 15+ years working at the intersection of software and instrument engineering
  3. Software is eating the world Imaging Sensors High-speed Digital Signal

    Processing 5G Networks Collaboration & VC Software
  4. Charge-Coupled Devices • Invented at Bell Labs in 1969

    by Willard Boyle and George E. Smith (while developing a new type of memory) • Adopted for astronomy in the late 70’s and early 80’s • Greater linearity and sensitivity compared to photographic plates → adequate for scientific imaging
  5. High-Speed Digitizers • Uses the Nyquist-Shannon sampling theorem • Broadband

    signals from 3G+ networks driving higher sampling rates
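As a rough illustration of the sampling theorem named on this slide: a baseband signal band-limited to f_max has to be sampled faster than 2·f_max, otherwise higher-frequency content folds back (aliases) into the sampled band. The sketch below is a minimal example under that baseband assumption. The function names and the 4 GHz and 8 GS/s figures are illustrative only and are not ALMA digitizer specifications.

```python
def nyquist_rate_hz(max_frequency_hz: float) -> float:
    """Lower bound on the sampling rate for a baseband signal band-limited
    to max_frequency_hz (the actual rate must exceed this value)."""
    if max_frequency_hz <= 0:
        raise ValueError("max_frequency_hz must be positive")
    return 2.0 * max_frequency_hz


def alias_frequency_hz(signal_hz: float, sample_rate_hz: float) -> float:
    """Apparent frequency of a pure tone after sampling, folded into the
    first Nyquist zone [0, sample_rate_hz / 2]."""
    folded = signal_hz % sample_rate_hz
    return min(folded, sample_rate_hz - folded)


# Illustrative numbers only: a 4 GHz wide baseband signal.
required_rate = nyquist_rate_hz(4e9)          # 8e9 samples per second
in_band_tone = alias_frequency_hz(3e9, 8e9)   # below fs/2, so unchanged
aliased_tone = alias_frequency_hz(7e9, 8e9)   # above fs/2, so it folds back
```

The second function shows why anti-aliasing filters sit in front of the digitizers: a 7 GHz tone sampled at 8 GS/s is indistinguishable from a 1 GHz tone.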
  6. High-Performance Field Programmable Gate Arrays (FPGAs) • Allow for custom

    low-latency, high-performance processing • Include fast IO (Ethernet 400G, Firefly…) • Avoids use of Application Specific Integrated Circuits (ASICs)
  7. High-Density, High-Speed Data Storage • High-performance SSDs, and tiered storage

    • Multiple technologies for management: • RAID, Just a Bunch of Disks (JBOD) • Btrfs, Ceph, OpenZFS, Unraid…
  8. Total volume of data stored in the ESO archives: No ALMA upgrades yet!

    [Figure: volume of data (petabytes) per year, 2005 to 2023, for ALMA and the La Silla Paranal Observatory. Labeled totals: 2.6 petabytes for La Silla Paranal Observatory, 2.0 petabytes for ALMA.]
  9. Total volume of data stored in the ESO archives: Exponential growth

    [Figure: volume of data (petabytes) per year, 2005 to 2023, for ALMA and the La Silla Paranal Observatory. Labeled totals: 2.6 petabytes for La Silla Paranal Observatory, 2.0 petabytes for ALMA.]
  10. ALMA overview

    • (Sub)millimeter interferometer located at a 5000 m site in the Atacama desert in Chile
    • International partnership of North America, Europe, and East Asia
    • 66 configurable antennas
    • Wavelength coverage: λ ≈ 0.3 − 9 mm
    • Array configurations between 0.16 and 16 km
    • Angular resolution as fine as 0.005'' at 950 GHz
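The resolution figure on this slide can be sanity-checked with the usual interferometer estimate θ ≈ λ/B, where λ is the observing wavelength and B the longest baseline. The sketch below is only an order-of-magnitude check: it ignores the order-unity factor that depends on array configuration and imaging weights, and the function names are made up for this example.

```python
import math

SPEED_OF_LIGHT_M_S = 299_792_458.0
ARCSEC_PER_RADIAN = 180.0 * 3600.0 / math.pi


def wavelength_m(frequency_hz: float) -> float:
    """Free-space wavelength for a given observing frequency."""
    return SPEED_OF_LIGHT_M_S / frequency_hz


def angular_resolution_arcsec(frequency_hz: float, baseline_m: float) -> float:
    """Diffraction-limited estimate theta ~ lambda / B, in arcseconds."""
    return wavelength_m(frequency_hz) / baseline_m * ARCSEC_PER_RADIAN


# Values taken from the slide: 950 GHz on the 16 km configuration,
# and the 35 GHz lower edge quoted in the abstract.
theta = angular_resolution_arcsec(950e9, 16_000.0)
shortest_mm = wavelength_m(950e9) * 1e3
longest_mm = wavelength_m(35e9) * 1e3
print(f"theta ~ {theta:.4f} arcsec, lambda from {shortest_mm:.2f} to {longest_mm:.2f} mm")
```

The estimate comes out at a few milliarcseconds, the same order as the 0.005'' quoted above, and the wavelength range matches the λ ≈ 0.3 − 9 mm coverage.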
  11. ALMA operation sites

    AOS (High-site) Fact sheet
    • ALMA antennas at 5000 m
    • ALMA staff obliged to wear oxygen outside
    • Oxygenated building
    • Location of correlators

    OSF (Base Camp) Fact sheet
    • OSF is at 2900 m
    • Staff sleep here
    • Astronomers work here
    • Night-time observing only, with daytime observations in Santiago

    Santiago
  12. Fomalhaut Fomalhaut is the brightest star in the constellation Piscis Austrinus and

    one of the brightest stars known to have an orbiting planet. It lies about 25 light-years from the Earth and is surrounded by a huge disc of dust. This is a partial ALMA image of the ring (in orange) superimposed on an earlier image obtained by the NASA/ESA Hubble Space Telescope.
  13. M87 Black Hole with Polarization First time that astronomers were

    able to measure polarization so close to the edge of a black hole. Credit: EHT Collaboration
  14. Milky Way Black Hole This, and the previous M87 image,

    were made with ALMA as part of the Event Horizon Telescope (EHT) collaboration. The Sgr A* black hole, being smaller, is much more dynamic than M87’s black hole, and it was therefore much more difficult to image. Credit: EHT Collaboration
  15. The importance of ALMA (and APEX) in the Event Horizon

    Telescope Credit: EHT Collaboration
  16. HL Tauri ALMA image of the young star HL Tau

    and its protoplanetary disk. This best image ever of planet formation reveals multiple rings and gaps that herald the presence of emerging planets as they sweep their orbits clear of dust and gas. It has now been cited more than 1000 times. Credit: ALMA (ESO/NAOJ/NRAO); C. Brogan, B. Saxton (NRAO/AUI/NSF)
  17. ALMA Bands and Transmission

    … to the atmospheric transmission windows. These windows and the tuning ranges are outlined in Figure 4.1. This illustrates the broad, deep absorption features, mostly due to H2O in the lower few km of the atmosphere, as well as some O2 transitions. The many narrow features seen in this plot are mostly from stratospheric O3, along with some transitions of CO and other trace species. In Cycle 9, Bands 3, 4, 5, 6, 7, 8, 9, and 10 are available, and the basic characteristics of the bands are outlined in Table 4.1. Each of the ALMA receiver bands is described in more detail in the following sections as well as in the references listed in Table 4.2.

    [Figure: Transmission in All ALMA Bands at Zenith. Transmission (%) versus Frequency (GHz), 100 to 1000 GHz, with the ALMA bands marked.]
  18. From Construction to Operations • ALMA declared in Operations in

    2013 (after ORR) • Some deferred capabilities (from 2005 rebaselining) became part of the ALMA Development Program. • Band 1 and Band 2 are the last Bands being delivered to the observatory: • Band 1 already being used since Cycle 10 (Cycle 11 just started observations on October 1st) • Band 2 passed its Manufacturing Readiness Review; construction and installation in progress.
  19. ALMA Development Program • Governed by the Principles for ALMA

    Development Program (2013) • Addresses: • Missing Capabilities • Obsolescence • New Capabilities • Establishes yearly funding level • Share is proportional to share of Operational costs
  20. ALMA Development Program • Governed by the Principles for ALMA

    Development Program (2013) • Addresses: • Missing Capabilities • Obsolescence • New Capabilities • Establishes yearly funding level • Share is proportional to share of Operational costs Key Principle: the ALMA Development Program must be driven by science. Its purpose is to enhance the scientific capability and/or impact of ALMA, within the bounds imposed by the availability of resources both for the development projects and for the ongoing operation of the observatory.
  21. Context: Facilities in the next decade • Many wonderful new

    facilities coming online over the next decade • Share many of the same science themes as ALMA • origins of galaxies • origins of stars • origins of planets ALMA TMT 30 meter ESO 39 meter JWST Nancy Grace Roman Telescope ngVLA GMT 25 meter Vera Rubin Telescope
  22. Context: Facilities in the next decade • Many wonderful new

    facilities coming online over the next decade • Share many of the same science themes as ALMA • origins of galaxies • origins of stars • origins of planets • Only one ALMA! • premier telescope for sensitive, high-angular resolution submillimeter observations • a replacement for ALMA is not on the horizon, so we must continuously enhance its capabilities ALMA TMT 30 meter ESO 39 meter JWST Nancy Grace Roman Telescope ngVLA GMT 25 meter Vera Rubin Telescope Digitization and Computing are also a big part of these projects!
  23. Context: Facilities in the next decade • Many wonderful new

    facilities coming online over the next decade • Share many of the same science themes as ALMA • origins of galaxies • origins of stars • origins of planets • Only one ALMA! • premier telescope for sensitive, high-angular resolution submillimeter observations • a replacement for ALMA is not on the horizon, so we must continuously enhance its capabilities ALMA TMT 30 meter ESO 39 meter JWST Nancy Grace Roman Telescope ngVLA GMT 25 meter Vera Rubin Telescope Need for a cohesive ALMA roadmap with a vision that keeps ALMA relevant Digitization and Computing are also a big part of these projects!
  24. ALMA 2030 Roadmap Process THE ALMA DEVELOPMENT ROADMAP J. Carpenter,

    D. Iono, L. Testi, N. Whyborn, A. Wootten, N. Evans (The ALMA Development Working Group) Approved by the Board by written procedure pursuant Art. 11 of the Board’s Rules of Procedure 2018
  25. New Fundamental Science Drivers • Origins of Galaxies: Trace the

    cosmic evolution of key elements from the first galaxies (z>10) through the peak of star formation (z=2–4) by detecting their cooling lines, both atomic ([CII], [OIII]) and molecular (CO), and dust continuum, at a rate of 1-2 galaxies per hour. • Origins of Chemical Complexity: Trace the evolution from simple to complex organic molecules through the process of star and planet formation down to solar system scales (~10-100 au) by performing full-band frequency scans at a rate of 2-4 protostars per day. • Origins of Planets: Image protoplanetary disks in nearby (150 pc) star formation regions to resolve the Earth forming zone (~ 1 au) in the dust continuum at wavelengths shorter than 1mm, enabling detection of the tidal gaps and inner holes created by planets undergoing formation.
  26. New Fundamental Science Drivers • Origins of Galaxies: Trace the

    cosmic evolution of key elements from the first galaxies (z>10) through the peak of star formation (z=2–4) by detecting their cooling lines, both atomic ([CII], [OIII]) and molecular (CO), and dust continuum, at a rate of 1-2 galaxies per hour. • Origins of Chemical Complexity: Trace the evolution from simple to complex organic molecules through the process of star and planet formation down to solar system scales (~10-100 au) by performing full-band frequency scans at a rate of 2-4 protostars per day. • Origins of Planets: Image protoplanetary disks in nearby (150 pc) star formation regions to resolve the Earth forming zone (~ 1 au) in the dust continuum at wavelengths shorter than 1mm, enabling detection of the tidal gaps and inner holes created by planets undergoing formation. The original science goals of ALMA were considered achieved in 2019!
  27. Wideband Sensitivity Upgrade (WSU): Top Priority of the ALMA 2030

    Roadmap • Upgrade of the bandwidth and throughput of the ALMA system • upgraded receivers with increased bandwidth and improved receiver temperatures • more powerful correlator • increased data reduction capacity Correlator Archives Data processing Astronomers Antennas Receivers Back end
  28. Wideband Sensitivity Upgrade (WSU): Top Priority of the ALMA 2030

    Roadmap • Upgrade of the bandwidth and throughput of the ALMA system • upgraded receivers with increased bandwidth and improved receiver temperatures • more powerful correlator • increased data reduction capacity Correlator Archives Data processing Astronomers Antennas Receivers Back end Upgrade!
  29. Antenna New or upgraded components are in blue Front End

    Receivers The Wideband Sensitivity Upgrade
  30. IF Switches & Anti-aliasing filters Digitizers & Digital Signal Processing

    Data Transmission System Antenna New or upgraded components are in blue Back End Front End Receivers The Wideband Sensitivity Upgrade
  31. Array Operations Site (AOS) at 5000m Existing Antenna to AOS

    Fibers IF Switches & Anti-aliasing filters Digitizers & Digital Signal Processing Data Transmission System Antenna New or upgraded components are in blue Back End Front End Receivers The Wideband Sensitivity Upgrade
  32. Array Operations Site (AOS) at 5000m Operations Support Facility (OSF)

    at 3000m Existing Antenna to AOS Fibers IF Switches & Anti-aliasing filters Digitizers & Digital Signal Processing Data Transmission System Antenna New or upgraded components are in blue Back End Front End Receivers New fiber The Wideband Sensitivity Upgrade
  33. Array Operations Site (AOS) at 5000m Operations Support Facility (OSF)

    at 3000m Existing Antenna to AOS Fibers IF Switches & Anti-aliasing filters Digitizers & Digital Signal Processing Data Transmission System Antenna New or upgraded components are in blue CONTROL, TelCal, Scheduling, OT, Archive, Pipeline Back End Front End Receivers New fiber 2nd Generation Correlator & Upgraded ACAS in new OSF Correlator Room The Wideband Sensitivity Upgrade
  34. Working groups created using ALMA-wide expertise to focus on the

    next step definition. The WSU program planning and implementation phases
  35. Working groups created using ALMA-wide expertise to focus on the

    next step definition. The WSU program planning and implementation phases 1 THE ALMA DEVELOPMENT ROADMAP J. Carpenter, D. Iono, L. Testi, N. Whyborn, A. Wootten, N. Evans (The ALMA Development Working Group) Approved by the Board by written procedure pursuant Art. 11 of the Board’s Rules of Procedure
  36. Working groups created using ALMA-wide expertise to focus on the

    next step definition. We’re here! The WSU program planning and implementation phases 1 THE ALMA DEVELOPMENT ROADMAP J. Carpenter, D. Iono, L. Testi, N. Whyborn, A. Wootten, N. Evans (The ALMA Development Working Group) Approved by the Board by written procedure pursuant Art. 11 of the Board’s Rules of Procedure
  37. Working groups created using ALMA-wide expertise to focus on the

    next step definition. We’re here! It’s a collection of projects The WSU program planning and implementation phases 1 THE ALMA DEVELOPMENT ROADMAP J. Carpenter, D. Iono, L. Testi, N. Whyborn, A. Wootten, N. Evans (The ALMA Development Working Group) Approved by the Board by written procedure pursuant Art. 11 of the Board’s Rules of Procedure
  38. Working groups created using ALMA-wide expertise to focus on the

    next step definition. We’re here! It’s a collection of projects The WSU program planning and implementation phases → program 1 THE ALMA DEVELOPMENT ROADMAP J. Carpenter, D. Iono, L. Testi, N. Whyborn, A. Wootten, N. Evans (The ALMA Development Working Group) Approved by the Board by written procedure pursuant Art. 11 of the Board’s Rules of Procedure
  39. Lots of work so far • Input from the WGs

    started in 2019: • Signal Chain WG • Front End/Digitizer • Second Generation Correlator • Initial CoSDD release and internal review Q2 2022. • System Requirements Review in Q4 2022 • Input from additional ICT/ ISOpT WGs in 2023 • Data Processing, Distribution, and Access • Data Acquisition • Followed by even more WGs…
  40. • Second generation ICT/ISOpT WGs: • Array Calibration & Science

    Observing Strategies (ACSOS) • Data Model (DM) • Data Processing (DP) • Data Transfer and Archive Storage (DTAS) • User Interfaces to the Data (UID) • ISOpT WGs: • Spurious Signals • IST WGs: • Data Rates Ramp-Up Plan (DRRUP) • IET/ICT/IST/ISOpT: • Deployment Concept Lots of work so far (cont.)
  41. • Second generation ICT/ISOpT WGs: • Array Calibration & Science

    Observing Strategies (ACSOS) • Data Model (DM) • Data Processing (DP) • Data Transfer and Archive Storage (DTAS) • User Interfaces to the Data (UID) • ISOpT WGs: • Spurious Signals • IST WGs: • Data Rates Ramp-Up Plan (DRRUP) • IET/ICT/IST/ISOpT: • Deployment Concept Lots of work so far (cont.) Collected in an updated Conceptual System Design Description (CoSDD) Also input to the ALMA System Technical Requirements
  42. WSU Challenge: Don’t Disturb Science • Main message from ALMA

    Science Advisory Committee: minimize the WSU impact on science. • Current Deployment Concept: Parallel Deployment
  43. Top-Level Notional Timeline: From Today to WSU Operations

    [Timeline figure covering 2024 to 2030. Labels as extracted:] Start of Cycle 16 with WSU System; WSU Call for Proposals; WSU System Review and Program Plan Review; Delta SRR/Initial Program Plan Rev.; Go/NoGo WSU ATAC ready for WSU integration tests; OCRO construction; Construction of new AOS-OSF fibers; Instrumentation starts to arrive at OSF; Dev Projects PDRs; Dev Projects CDRs; Steady retrofitting of WSU antennas #10 - #33, 2 per month; Deploy WSU components at OSF in 4 antennas; Deploy WSU comp. in 5 ant. @AOS; Steady retrofitting of WSU antennas #34 - #66, 3 per month (TBC); Science operations interleaved with WSU commissioning; Science observations in separate APE; WSU Science Verification & data release (TBC); Online software TRR and E2E tests; Data Processing Transition; Development and Commissioning of SW/Comp/SciOps deliverables for WSU AIVC; Planning of WSU Software / Computing / Sci Ops deliverables; Design/Development/Deployment/Commissioning of SW/Comp/SciOps deliverables
  44. Wideband Sensitivity Upgrade: Overview • Available bandwidth • Correlated bandwidth • Observing speed

    Factor of 2-4 increase in the available IF bandwidth.

    [Figure: Available instantaneous bandwidth per polarization (GHz), axis 0 to 32, for Bands 1 to 10 and for ALMA 2030 Bands 2, 6 and 8. Legend: Current receivers (2SB unless noted; DSB marked where applicable); Under development / construction; 2x upgrade; 4x upgrade (goal).]
  45. Wideband Sensitivity Upgrade: Overview • Available bandwidth • Correlated bandwidth • Observing speed

    Factor of 2-4 increase in the available IF bandwidth.

    [Figure: Available instantaneous bandwidth per polarization (GHz), axis 0 to 32, for Bands 1 to 10 and for ALMA 2030 Bands 2, 6 and 8. Legend: Current receivers (2SB unless noted; DSB marked where applicable); Under development / construction; 2x upgrade; 4x upgrade (goal).]

    Data holdings proportional to bandwidth at the same resolution
  46. Wideband Sensitivity Upgrade: Overview • Available bandwidth • Correlated bandwidth • Observing speed

    [Figure: Factor increase in correlated bandwidth, axis 0 to 70, for Bands 1 to 10, at low spectral resolution and at high spectral resolution (~ 0.1 km/s).]
  47. Wideband Sensitivity Upgrade: Overview • Available bandwidth • Correlated bandwidth • Observing speed

    [Figure: Factor increase in correlated bandwidth, axis 0 to 70, for Bands 1 to 10, at low spectral resolution and at high spectral resolution (~ 0.1 km/s).]

    Data holdings proportional to the physical resolution at the same bandwidth
  48. Wideband Sensitivity Upgrade: Overview • Available bandwidth • Correlated bandwidth • Observing speed

    Increase in Band 6 observing speed with ALMA 2030

    Observing mode    Increase in speed over current system*
    Continuum         4.8x (with goal of 9.6x)
    Spectral line     2.2-4.7x

    Increase in observing speed results from • improved receiver temperatures • improved digital efficiency • wider bandwidth (continuum)
    Spectral scans will see further speed increases due to larger correlated bandwidth.
    * To reach same sensitivity as current system with single tuning
  49. Wideband Sensitivity Upgrade: Overview • Available bandwidth • Correlated bandwidth • Observing speed

    Increase in Band 6 observing speed with ALMA 2030

    Observing mode    Increase in speed over current system*
    Continuum         4.8x (with goal of 9.6x)
    Spectral line     2.2-4.7x

    Increase in observing speed results from • improved receiver temperatures • improved digital efficiency • wider bandwidth (continuum)
    Spectral scans will see further speed increases due to larger correlated bandwidth.
    * To reach same sensitivity as current system with single tuning

    Data holdings proportional to the survey speed gains
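The three contributions listed on this slide combine in the way the radiometer equation suggests: noise scales as T_sys / (η · sqrt(Δν · t)), so the time needed to reach a fixed sensitivity scales as T_sys² / (η² · Δν). The sketch below turns that scaling into a speed-gain function. It rests on assumptions not stated on the slide: that system temperature can stand in for the "receiver temperatures" bullet, and that the 4.8x continuum figure corresponds to a 2x bandwidth upgrade while the 9.6x goal corresponds to 4x. All names are hypothetical.

```python
def observing_speed_gain(tsys_ratio: float,
                         efficiency_ratio: float,
                         bandwidth_ratio: float) -> float:
    """Speed-up in reaching a fixed sensitivity.

    tsys_ratio       = T_sys(old) / T_sys(new)   (> 1 when the receivers improve)
    efficiency_ratio = eta(new) / eta(old)       (> 1 when digital efficiency improves)
    bandwidth_ratio  = bandwidth(new) / bandwidth(old)
    """
    return tsys_ratio ** 2 * efficiency_ratio ** 2 * bandwidth_ratio


def implied_non_bandwidth_gain(total_gain: float, bandwidth_ratio: float) -> float:
    """Part of a quoted speed gain that is not explained by bandwidth alone."""
    return total_gain / bandwidth_ratio


# Assumption: the 4.8x continuum figure is for a 2x bandwidth upgrade.
receiver_and_digital_gain = implied_non_bandwidth_gain(4.8, 2.0)

# Under the same assumption, going to 4x bandwidth doubles the gain again.
goal_gain = receiver_and_digital_gain * 4.0
print(f"non-bandwidth gain ~ {receiver_and_digital_gain:.1f}x, goal ~ {goal_gain:.1f}x")
```

Under these assumptions the non-bandwidth part of the gain is about 2.4x, which falls inside the 2.2-4.7x range quoted for spectral lines, where the wider continuum bandwidth does not help.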
  50. Big Data Dimensions

    [Mind map around "Big Data". Node labels as extracted:] Size, Storage, Access techniques, Processing techniques, Flow, Real time, Event Processing, Offline, Data mining, Processing level, Raw Data, Processed Data, Statistics, Schemata, Structured, Tagging, Unstructured, Value, Files, Formats, Durability, Parallel Access, Capabilities, Information Extracted, Tech Debt
  51. Big Data Dimensions

    [Mind map around "Big Data". Node labels as extracted:] Size, Storage, Access techniques, Processing techniques, Flow, Real time, Event Processing, Offline, Data mining, Processing level, Raw Data, Processed Data, Statistics, Schemata, Structured, Tagging, Unstructured, Value, Files, Formats, Durability, Parallel Access, Capabilities, Information Extracted, Tech Debt

    Changed/pushed by WSU
  52. WSU: Data Volume

    Table 11. Overview of Data Volume Properties for WSU

                                             Early WSU                     Later WSU
                                         12m        7m      both       12m        7m      both
    Visibility Data Volume (Total)
      Median (TB)                      0.155     0.004     0.061     0.366     0.008     0.153
      Time Weighted Average (TB)       3.170     0.178     1.876     7.427     0.378     4.379
      Maximum (TB)                    88.656     3.283    88.656   177.312     6.565   177.312
      Total per cycle (PB)             2.067     0.036     2.103     4.815     0.077     4.892
    Visibility Data Volume (Science)
      Median (TB)                      0.101     0.002     0.038     0.254     0.005     0.092
      Time Weighted Average (TB)       2.367     0.128     1.399     5.439     0.268     3.203
      Maximum (TB)                    73.900     2.428    73.900   147.800     4.857   147.800
      Total per cycle (PB)             1.530     0.025     1.555     3.500     0.053     3.553
    Product Size (Total)
      Median (TB)                      0.052     0.001     0.016     0.127     0.003     0.038
      Time Weighted Average (TB)       5.376     0.058     3.076    11.525     0.119     6.592
      Maximum (TB)                   563.690     0.829   563.690  1127.379     1.658  1127.379
      Total per cycle (PB)             5.891     0.031     5.922    12.643     0.064    12.707
  53. Total volume of data stored in the ESO archives:

    [Figure: volume of data (petabytes) per year, 2005 to 2023, for ALMA and the La Silla Paranal Observatory (labeled totals: 2.6 petabytes La Silla Paranal Observatory, 2.0 petabytes ALMA), overlaid with a partially cropped copy of Table 11, Overview of Data Volume Properties for WSU (same values as on slide 52).]
  55. Total volume of data stored in the ESO archives:

    [Figure: volume of data (petabytes) per year, 2005 to 2023, for ALMA and the La Silla Paranal Observatory (labeled totals: 2.6 petabytes La Silla Paranal Observatory, 2.0 petabytes ALMA), overlaid with a partially cropped copy of Table 11, Overview of Data Volume Properties for WSU (same values as on slide 52).]

    ~0.5 PB ALMA Cycle
  56. Total volume of data stored in the ESO archives:

    [Figure: volume of data (petabytes) per year, 2005 to 2023, for ALMA and the La Silla Paranal Observatory (labeled totals: 2.6 petabytes La Silla Paranal Observatory, 2.0 petabytes ALMA), overlaid with a partially cropped copy of Table 11, Overview of Data Volume Properties for WSU (same values as on slide 52).]

    ~0.5 PB ALMA Cycle
    Early WSU 10x
    Later WSU 20x
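To put the "10x" and "20x" annotations in context, the per-cycle totals from Table 11 ("both" columns) can be divided by the ~0.5 PB quoted for a current ALMA cycle. The slide does not say which table row the rounded factors refer to, so the sketch below simply computes the ratio for each row. The dictionary layout and function name are invented for this example, and 0.5 PB is itself an approximate figure.

```python
CURRENT_CYCLE_PB = 0.5  # "~0.5 PB ALMA Cycle" annotation; approximate

# "Total per cycle (PB)" rows of Table 11, "both" columns.
PER_CYCLE_PB = {
    "Visibility Data Volume (Total)": {"early": 2.103, "later": 4.892},
    "Visibility Data Volume (Science)": {"early": 1.555, "later": 3.553},
    "Product Size (Total)": {"early": 5.922, "later": 12.707},
}


def growth_factors(current_pb: float, per_cycle_pb: dict) -> dict:
    """Ratio of each projected per-cycle volume to the current per-cycle volume."""
    return {
        row: {phase: volume / current_pb for phase, volume in phases.items()}
        for row, phases in per_cycle_pb.items()
    }


factors = growth_factors(CURRENT_CYCLE_PB, PER_CYCLE_PB)
for row, phases in factors.items():
    print(f"{row}: early ~{phases['early']:.1f}x, later ~{phases['later']:.1f}x")
```

Depending on the row, the ratios range from a few times to a few tens of times the current per-cycle volume, the same order as the rounded factors on the slide. In every row the Later WSU volume is a bit more than twice the Early WSU volume.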
  57. Total volume of data stored in the ESO archives:

    [Figure: volume of data (petabytes) per year, 2005 to 2023, for ALMA and the La Silla Paranal Observatory (labeled totals: 2.6 petabytes La Silla Paranal Observatory, 2.0 petabytes ALMA), overlaid with a partially cropped copy of Table 11, Overview of Data Volume Properties for WSU (same values as on slide 52).]

    ~0.5 PB ALMA Cycle
    Early WSU 10x
    Later WSU 20x
    Very high spread of use cases
  58. Dealing with Big Data • We cannot afford arbitrary queries

    ➡ We can have arbitrary processing instead • We cannot allow full data dumps ➡ We can generate data on the fly (see above)
  59. Queries as functions

    QUERY = FUNCTION(ALL DATA)
    Queries need to be precomputed
    Arbitrary queries only possible on the precomputed, smaller datasets
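The idea on this slide can be sketched in a few lines: a query is a function over all the data, so the expensive pass over everything is done once, ahead of time, and arbitrary filtering is only allowed on the resulting smaller view. The records, field names and helper functions below are hypothetical and exist only for this illustration.

```python
from typing import Callable, Dict, Hashable, Iterable, List

Record = Dict[str, object]


def precompute_view(all_data: Iterable[Record],
                    key: Callable[[Record], Hashable]) -> Dict[Hashable, List[Record]]:
    """Batch step: a single pass over ALL the data, grouping records by key."""
    view: Dict[Hashable, List[Record]] = {}
    for record in all_data:
        view.setdefault(key(record), []).append(record)
    return view


def query(view: Dict[Hashable, List[Record]],
          wanted_key: Hashable,
          predicate: Callable[[Record], bool] = lambda record: True) -> List[Record]:
    """Arbitrary filtering, but only inside one precomputed (small) slice."""
    return [record for record in view.get(wanted_key, []) if predicate(record)]


# Hypothetical observation records, not real archive entries.
observations = [
    {"band": 6, "target": "HL Tau", "size_tb": 0.4},
    {"band": 7, "target": "Fomalhaut", "size_tb": 0.1},
    {"band": 6, "target": "M87", "size_tb": 2.5},
]

by_band = precompute_view(observations, key=lambda obs: obs["band"])
large_band6 = query(by_band, 6, lambda obs: obs["size_tb"] > 1.0)
print([obs["target"] for obs in large_band6])
```

The trade-off is that only questions anticipated by a precomputed view are cheap. Anything else needs a new batch pass over the full dataset.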
  60. Lambda Architecture

    • Batch Layer: STORE MASTER DATASET; COMPUTE ARBITRARY VIEWS
    • Serving Layer: RANDOM ACCESS TO VIEWS; UPDATED BY BATCH LAYER
    • Speed Layer: FAST, INCREMENTAL ALGORITHMS; QUERIES NOT ON BATCH LAYER; COMPENSATES FOR LATENCY
  61. Batch Layer • Stores master copy of the dataset •

    Precomputes batch views on that master dataset IMMUTABLE, CONSTANTLY GROWING
  62. Serving Layer • Allows for: • batch writes of view

    updates • random reads on the views • Does not allow random writes
  63. Speed Layer • Allows for: • incremental writes of view

    updates • short-term temporal queries on the views • Can be discarded!
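The three layers described on the last few slides can be put together in a toy, in-memory form. This is a minimal sketch and not any ALMA or RADPS design: the classes, the "count observations per band" view, and the record layout are all assumptions made for illustration.

```python
from collections import Counter


class BatchLayer:
    """Holds the append-only master dataset and recomputes views from scratch."""

    def __init__(self):
        self._master = []  # records are only ever appended, never edited

    def append(self, record):
        self._master.append(dict(record))

    def compute_view(self):
        return Counter(record["band"] for record in self._master)


class ServingLayer:
    """Random reads on a view; the only write is a wholesale batch replacement."""

    def __init__(self):
        self._view = Counter()

    def load(self, view):
        self._view = Counter(view)

    def get(self, band):
        return self._view[band]


class SpeedLayer:
    """Incremental counts for records the batch view has not absorbed yet."""

    def __init__(self):
        self._recent = Counter()

    def ingest(self, record):
        self._recent[record["band"]] += 1

    def get(self, band):
        return self._recent[band]

    def discard(self):
        self._recent.clear()


def count_for_band(serving, speed, band):
    """A query merges the batch view with the recent increments."""
    return serving.get(band) + speed.get(band)


batch, serving, speed = BatchLayer(), ServingLayer(), SpeedLayer()
for rec in ({"band": 6}, {"band": 7}, {"band": 6}):
    batch.append(rec)
serving.load(batch.compute_view())

new_record = {"band": 6}            # arrives after the last batch run
batch.append(new_record)
speed.ingest(new_record)
before_rebuild = count_for_band(serving, speed, 6)

serving.load(batch.compute_view())  # the next batch run absorbs the new record
speed.discard()                     # so the speed layer state can be thrown away
after_rebuild = count_for_band(serving, speed, 6)
```

Both queries return the same count. This is why the speed layer "can be discarded": once the batch view catches up, its state is redundant, and keeping it would double-count.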
  64. Figure 2.1 The master dataset in the Lambda Architecture serves

    as the source of Not so useful for non-event driven astronomy
  65. Computing over Big Data • Batch layer as a computational

    engine on data • Need to formally specify • Inputs • Processes • Outputs
  66. Computing over Big Data • Batch layer as a computational

    engine on data • Need to formally specify • Inputs • Processes • Outputs That looks like a workflow! Or SQL querying…
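One way to read "formally specify inputs, processes, outputs" is as a small workflow description: each step names the products it consumes and the function that turns them into a new product, and the execution order follows from those dependencies. The sketch below uses `graphlib.TopologicalSorter` from the Python standard library (Python 3.9 or later). The `Step` class, the step names and the toy processing functions are hypothetical.

```python
from dataclasses import dataclass
from graphlib import TopologicalSorter
from typing import Callable, Dict, Iterable, Tuple


@dataclass(frozen=True)
class Step:
    name: str                     # also the name of the product this step outputs
    inputs: Tuple[str, ...]       # names of the products it consumes
    process: Callable[..., object]


def run_workflow(steps: Iterable[Step], initial: Dict[str, object]) -> Dict[str, object]:
    """Run every step once all of its inputs exist; return all products by name."""
    by_name = {step.name: step for step in steps}
    # Only dependencies on other steps constrain the order; initial products already exist.
    graph = {name: {i for i in step.inputs if i in by_name}
             for name, step in by_name.items()}
    products = dict(initial)
    for name in TopologicalSorter(graph).static_order():
        step = by_name[name]
        products[name] = step.process(*(products[i] for i in step.inputs))
    return products


# Toy pipeline: the arithmetic is a stand-in for real calibration and imaging.
steps = [
    Step("stats", ("calibrated", "image"), lambda cal, img: img / len(cal)),
    Step("image", ("calibrated",), lambda cal: sum(cal)),
    Step("calibrated", ("raw",), lambda raw: [2 * value for value in raw]),
]
products = run_workflow(steps, {"raw": [1, 2, 3]})
print(products["calibrated"], products["image"], products["stats"])
```

The steps are deliberately listed out of order: because inputs and outputs are declared, the engine can work out the execution order itself, which is what makes the specification look like a workflow (or a declarative query).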
  67. Lots of opportunities for all of you! • Keep an

    eye on the ALMA Job Opportunities portal: • https://www.comeet.com/jobs/almaobservatory/F5.001/
  69. The Atacama Large Millimeter/submillimeter Array (ALMA), an international astronomy

    facility, is a partnership of Europe, North America and East Asia in cooperation with the Republic of Chile. ALMA is funded in Europe by the European Southern Observatory (ESO), in North America by the U.S. National Science Foundation (NSF) in cooperation with the National Research Council of Canada (NRC) and the National Science Council of Taiwan (NSC) and in East Asia by the National Institutes of Natural Sciences (NINS) of Japan in cooperation with the Academia Sinica (AS) in Taiwan. ALMA construction and operations are led on behalf of Europe by ESO, on behalf of North America by the National Radio Astronomy Observatory (NRAO), which is managed by Associated Universities, Inc. (AUI) and on behalf of East Asia by the National Astronomical Observatory of Japan (NAOJ). The Joint ALMA Observatory (JAO) provides the unified leadership and management of the construction, commissioning and operation of ALMA.