e-Science for the Square Kilometre Array


The Square Kilometre Array (SKA) will be by far the world's largest radio interferometer. Its thousands of antennas, with up to one square kilometre of collecting area, will be distributed over thousands of kilometres and numerous stations. The SKA will constitute a continental-scale sensor network whose operation requires exascale computing systems processing exabytes of data. To meet the Big Data challenge posed by this copious data flux, astronomers will need tools that enable them to embed their algorithms within the SKA processing stream, and to provide services that can later be reused by the community. Given the need to perform multiple, geographically distributed computations on data streams generated by a widely distributed sensor network, e-Science tools are a particularly good fit. In this talk we will show how scientific workflow technologies are well suited to this task, as they provide a formal definition of processes, inputs, and outputs. This allows workflows to be included in more general scientific processing systems, to exploit the parallelism they expose, and to record the provenance of processes and results. The developments of the AMIGA group (Analysis of the interstellar Medium of Isolated GAlaxies) within the Wf4Ever FP7 project (http://amiga.iaa.es/p/212-scientific-workflows.htm) will be presented. AMIGA aims to provide scientific workflows for the 3D analysis and modelling of HI data, which can be shared and composed within the Virtual Observatory framework.


Juande Santander-Vela

August 29, 2012

Transcript

  1. e-Science for the Square Kilometre Array
    Juan de Dios Santander-Vela (IAA-CSIC), on behalf of the AMIGA team
    IAU 2012 Data Intensive Astronomy Symposium (Sp15), Beijing, August 29th 2012
  2. Talk Overview
    The Square Kilometre Array (SKA); The SKA Challenge; AMIGA & SKA; e-Science Tools for the SKA; SKA Computing Synergies; Conclusions
  3. The Square Kilometre Array
    The embodiment of the Hydrogen Array concept.
    Thousands of antennas, with up to 1 sq km of collecting area; distributed across thousands of kilometres of terrain; with enormous simultaneous bandwidth to increase survey speed; can be built incrementally.
    A CONTINENTAL-SCALE, DISTRIBUTED SENSOR NETWORK
  4. SKA Antennas
    COMBINATION OF DIFFERENT ANTENNA KINDS
    Low-Frequency Aperture Arrays: sparse aperture arrays, 70–450 MHz, multibeaming.
    Mid-frequency dishes: 13 m Gregorian-offset dishes, 450 MHz – 3 GHz, surface accuracy to 10–25 GHz.
    SKA1: 2016–2019
  5. SKA Antennas
    Mid-Frequency Aperture Arrays: dense aperture arrays, 200–500 MHz, 200 deg² FoV.
    Focal Plane Arrays: multibeam radio camera, 12 m antennas, 700 MHz – 1.8 GHz, surface accuracy to 10 GHz.
    SKA2: 2018–2023
  6. SKA Site Selection
    SKA1: SKA1_LOW (ANZ), SKA1_MID (RSA), SKA1_SURVEY (ANZ)
    SKA2: SKA2_LOW (ANZ), SKA2_MID (RSA), SKA2_AA (RSA)
    [Map of the SKA1 and SKA2 component sites]
  7. MASSIVE DATA FLOW, STORAGE & PROCESSING
    [Figure: SKA2 wide-area data flow, with link rates of 16 Tb/s, 4 Pb/s, 24 Tb/s, 20 Gb/s and 20 Gb/s. From the AA Power Challenges Workshop, June 2012; courtesy of A. Faulkner.]
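To give a rough sense of scale, the link rates quoted on the data-flow slide can be turned into daily volumes. This is back-of-the-envelope arithmetic only: it assumes each link runs continuously at its quoted rate and uses decimal prefixes, and which link each figure belongs to is not recoverable from the slide. At 16 Tb/s alone, about 173 PB accumulate every day.

```python
# Back-of-the-envelope daily volumes for the link rates quoted on the slide.
# Assumption: each link runs continuously at its quoted rate (decimal prefixes).
SECONDS_PER_DAY = 86_400


def bits_per_second_to_bytes_per_day(rate_bps: float) -> float:
    """Convert a sustained rate in bits/s into bytes accumulated per day."""
    return rate_bps * SECONDS_PER_DAY / 8


QUOTED_RATES_BPS = {
    "16 Tb/s": 16e12,
    "24 Tb/s": 24e12,
    "4 Pb/s": 4e15,
    "20 Gb/s": 20e9,
}

for label, rate in QUOTED_RATES_BPS.items():
    petabytes = bits_per_second_to_bytes_per_day(rate) / 1e15
    print(f"{label:>8} -> {petabytes:,.1f} PB/day")
```

Real links would carry protocol overhead and duty cycles below 100%, so these figures are upper bounds for the quoted rates.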
  8. AMIGA: Analysis of the interstellar Medium of Isolated GAlaxies
    Multi-wavelength, multi-object study of isolated galaxies with strict isolation criteria.
    Careful curation of data: very careful processing of new parameters from the group's own observation programmes and data reduction; literature table scanning; Virtual Observatory table harvesting and parsing.
    Emphasis on marrying astronomy and computer science, and on buy-in of the VO.
    E-SCIENCE USERS
  9. AMIGA: Analysis of the interstellar Medium of Isolated GAlaxies (continued)
    PI: L. VERDES-MONTENEGRO; SEE HER TALK FOR THE MAIN RESULTS (SPS3, SECULAR EVOLUTION)
    E-SCIENCE DEVELOPERS!
  10. AMIGA
    Project goal: providing a baseline for galaxy properties to compare with other environments.
    Interaction-free sample, ideal for tracing HI infall: we can use CIG galaxies to detect the cosmic web.
    Need for very sensitive telescopes able to resolve faint HI ➡ Square Kilometre Array & pathfinders.
    PARTICIPATING IN THE SKA.TEL.SDP PROTOCONSORTIUM
    WE NEED TOOLS FOR OUR OWN SCIENCE ANALYSIS ⤷
  11. e-Science Tools & SKA
    Distributed computing: move computation to the data; computing services; collaborative environments; linked data.
    FOR SCIENTIFIC DISCUSSION & SCIENCE EXTRACTION ➡ Science-computing
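Why "move computation to the data" matters can be seen with simple transfer-time arithmetic. The sketch below compares shipping a dataset with shipping the code that analyses it over the same link; the 20 Gb/s rate comes from the data-flow slide, while the 1 PB dataset and 10 kB script sizes are assumptions chosen for illustration. Moving 1 PB at 20 Gb/s takes about 4.6 days, against microseconds for the script.

```python
# Why "move computation to the data": compare shipping a dataset with shipping
# the code that analyses it over the same link. Sizes below are illustrative.
SECONDS_PER_DAY = 86_400


def transfer_time_seconds(n_bytes: float, link_bps: float) -> float:
    """Time to push n_bytes through a link sustaining link_bps bits per second."""
    return n_bytes * 8 / link_bps


LINK_BPS = 20e9          # 20 Gb/s, one of the rates on the data-flow slide
DATASET_BYTES = 1e15     # assumed 1 PB of data products
SCRIPT_BYTES = 10e3      # assumed 10 kB analysis script

data_days = transfer_time_seconds(DATASET_BYTES, LINK_BPS) / SECONDS_PER_DAY
code_ms = transfer_time_seconds(SCRIPT_BYTES, LINK_BPS) * 1e3
print(f"ship the data: {data_days:.2f} days; ship the code: {code_ms:.3f} ms")
```

The model ignores latency, protocol overhead, and the cost of the computation itself, but the gap of many orders of magnitude is what drives the design choice.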
  12. Defining Computations
    Events & processes; dependencies; resources; local & remote processes; sequences; concurrences; triggers.
    FORMALLY, OR AT LEAST MACHINE-READABLE ➡ WORKFLOW DEFINITION LANGUAGES
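Once processes and their dependencies are written down in machine-readable form, an engine can derive which steps must run in sequence and which may run concurrently. A minimal sketch, assuming a hypothetical processing chain (the task names are illustrative, not an actual SKA or AMIGA pipeline), uses Python's standard `graphlib.TopologicalSorter` to group tasks into stages:

```python
from graphlib import TopologicalSorter

# Hypothetical processing chain: each task maps to the set of tasks it depends on.
# Task names are illustrative only, not an actual SKA or AMIGA pipeline.
WORKFLOW = {
    "calibrate": {"ingest"},
    "flag_rfi": {"ingest"},
    "image": {"calibrate", "flag_rfi"},
    "source_find": {"image"},
    "kinematics": {"image"},
    "catalogue": {"source_find", "kinematics"},
}


def concurrent_stages(graph):
    """Group tasks into stages; tasks within one stage have no mutual
    dependencies, so an engine could run them concurrently."""
    sorter = TopologicalSorter(graph)
    sorter.prepare()  # raises graphlib.CycleError if the graph has a cycle
    stages = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        stages.append(ready)
        sorter.done(*ready)  # mark the whole stage finished before the next one
    return stages


for i, stage in enumerate(concurrent_stages(WORKFLOW)):
    print(i, stage)
```

Workflow systems such as Taverna do considerably more (remote services, triggers, provenance), but this dependency ordering is the core of exploiting concurrency from a formal definition.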
  13. AMIGA Contributions
    Wf4Ever: workflows for process and scientific methodology specification; web and command-line tools for data preservation, methodology preservation, reuse, repurposing, and collaboration; extra tools for astronomical data processing and services.
    AMIGA4GAS: use workflows as process abstraction engines; use federation and supercomputing models for Taverna; adapt Taverna (and workflows) to those computing models.
  14. EU-FUNDED FP7 STREP PROJECT, DECEMBER 2010 – DECEMBER 2013
    Partners: Intelligent Software Components (iSOCO, Spain); University of Manchester (UNIMAN, UK); Universidad Politécnica de Madrid (UPM, Spain); Poznan Supercomputing and Networking Centre (PSNC, Poland); University of Oxford (OXF, UK); Instituto de Astrofísica de Andalucía (IAA, Spain); Leiden University Medical Centre (LUMC, NL)
  15. Technological infrastructure for the preservation and efficient retrieval and reuse of scientific workflows in a range of disciplines
    Goals: archival, classification, and indexing of scientific workflows and their associated materials in scalable semantic repositories, providing advanced access and recommendation capabilities; creation of scientific communities to collaboratively share, reuse, and evolve workflows and their parts, stimulating the development of new scientific knowledge.
    Case studies: Astronomy (IAA-CSIC); Genome-wide Analysis and Biobanking.
    Core competencies (tech): Digital Libraries; Workflow Management; Semantic Web; Integrity & Authenticity; Provenance; Information Quality.
    Partners: one SME; six public organisations.
    TARGETING ALREADY ESTABLISHED COMMUNITIES: MYEXPERIMENT, VIRTUAL OBSERVATORY
  16. AstroTaverna: Astronomy plugins for Taverna Workbench
    To install the AstroTaverna plugin in Taverna: download and install Taverna 2.4; start Taverna; add the plugin site http://wf4ever.github.com/astrotaverna/; restart Taverna.
    The VO services perspective should now appear, together with various local tools, under Available Services. For more information, see the AstroTaverna wiki page.
  18. AMIGA4GAS: 3D Kinematical Modelling
    Input files: 1 data cube, 1 velocity map, 1 ROTCUR config file, 1 GALMOD config file.
    Variable parameters: INSET, RADII, WIDTHS, WEIGHT, TOLERANCE, DENS, NV, Z0, VDISP.
    ROTCUR: 12 runs (possible combinations of input parameters) ➡ 12 ASCII files.
    GALMOD: 12 cubes (4 approaching, 4 receding, 4 both).
    COPY: 8 cubes (4 approaching + receding, 4 both).
    MOMENTS: 8 velocity maps.
    00SUB: 8 residual cubes, 8 residual maps.
    00MNMX: 8 values for peaks in cubes, 8 values for peaks in maps.
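The stage counts on this slide follow from a simple parameter sweep. The sketch below reproduces them under an assumption read off the 4/4/4 split of the GALMOD cubes: that the 12 ROTCUR runs are four input-parameter combinations times three sides of the galaxy. The parameter-set labels are hypothetical, and none of the tasks is actually invoked; only the bookkeeping of the fan-out and merge is shown.

```python
from itertools import product

# Hypothetical labels: four parameter combinations (standing in for the
# INSET/RADII/WIDTHS/... variations on the slide) and the three galaxy sides.
PARAM_SETS = ["p1", "p2", "p3", "p4"]
SIDES = ["approaching", "receding", "both"]

# ROTCUR + GALMOD: one fit and one model cube per (parameter set, side).
galmod_cubes = list(product(PARAM_SETS, SIDES))

# COPY: combine the approaching and receding cubes of each parameter set into
# a single cube; the "both" cubes pass through unchanged.
copied_cubes = [(p, "approaching+receding") for p in PARAM_SETS]
copied_cubes += [(p, "both") for p in PARAM_SETS]

# MOMENTS: one velocity map per copied cube. The residual (00SUB) and peak
# (00MNMX) steps then yield one result per cube and per map.
velocity_maps = [(p, side, "velocity map") for p, side in copied_cubes]

print(len(galmod_cubes), len(copied_cubes), len(velocity_maps))  # 12 8 8
```

Expressing the sweep this way is what lets a workflow engine fan the independent runs out in parallel and record which parameter set produced each residual.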
  19. AMIGA4GAS
    Technical part, devoted to computing and data federation: heterogeneous computing federation (local computing cluster, grid, cloud computing).
    Main goals: porting the Taverna workflow engine to supercomputing environments; development of an integration layer for automatic workflow deployment.
    AMIGA for the GTC, ALMA, and SKA Pathfinders.
    IN PARTNERSHIP WITH BSC, FCSCL
    DIRECT RELEVANCE TO THE SKA SCIENCE DATA PROCESSOR
  20. FED4AMIGA: FEDERATION OF INFRASTRUCTURES
    [Diagram: infrastructure federation of grid, supercomputer, and cloud resources, with COMPSs and the Fed4AMIGA resource manager]
    • How to integrate the infrastructures in a federated system?
    • How to authenticate the users?
    • How to implement business rules to decide on which infrastructure a task should run?
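The last question on the slide can be made concrete with an ordered rule list in which the first matching rule picks the infrastructure. Everything in this sketch is hypothetical: the task attributes, thresholds, and target names are illustrative, and they are neither Fed4AMIGA's actual rules nor the COMPSs API.

```python
from dataclasses import dataclass


@dataclass
class Task:
    """Hypothetical description of a workflow task awaiting placement."""
    name: str
    cores: int
    tightly_coupled: bool    # e.g. an MPI job needing a fast interconnect
    input_size_gb: float


def choose_infrastructure(task: Task) -> str:
    """Toy ordered rule set: the first matching rule decides the target."""
    if task.tightly_coupled or task.cores > 1024:
        return "supercomputer"
    if task.input_size_gb > 500:
        return "grid"        # assumption: large inputs already sit on grid storage
    return "cloud"           # small, loosely coupled jobs use elastic resources


for t in [
    Task("galmod_sweep", cores=12, tightly_coupled=False, input_size_gb=2),
    Task("imaging", cores=4096, tightly_coupled=True, input_size_gb=800),
    Task("reprocess_archive", cores=64, tightly_coupled=False, input_size_gb=2000),
]:
    print(t.name, "->", choose_infrastructure(t))
```

Keeping the rules ordered and explicit makes the placement decision auditable, which fits the provenance goals of the workflow approach; a production resource manager would also weigh queue times, cost, and user credentials.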
  21. Wf4Ever + CyberSKA
    Workflows can be used to formally specify processing tasks.
    Especially good when computing tasks exist as services.
    Even better if data can be referenced, instead of sent over the wire.
  22. Wf4Ever + CyberSKA
    Complementary to CyberSKA (infrastructure): Wf4Ever places more emphasis on end-user tools (process creation, data manipulation, annotation, interdisciplinary algorithm repurposing) and on long-term preservation and quality assurance.
    VERY MUCH SCIENCE-ORIENTED
  23. SKA Computing Synergies
    The SKA's computing power amounts to being able to sift through the entire Internet more than 100 times per day.
    Citizens can be empowered, through SKA-like tools, to process city-wide, regional, or national data for insight.
    Intelligent sensor networks can provide tools for better, instantaneous resource planning.
    IN LINE WITH H2020 PRIORITIES
  24. Conclusions
    The SKA is the proverbial e-Science instrument.
    Workflows can be used both for machine-readable, formal process description and for human-readable scientific tool development.
    Federated, transparent workflow computing.
    Wf4Ever & AMIGA4GAS are complementary to CyberSKA and to the SKA.TEL.SDP work.
  25. Conclusions
    It is a long road towards the SKA, but we have to get involved in these problems now.
  26. Thank you!
    Julián Garrido, José Enrique Ruiz del Mazo, Susana Sánchez Expósito, Juan de Dios Santander-Vela <jdsant@iaa.es>, Lourdes Verdes-Montenegro