Wf4Ever: Scientific Workflows and Research Objects as tools for scientific insight and methodology curation

Astronomers are drowning in data: facilities like ALMA currently provide datasets in the gigabyte range, and growing, while facilities like the LSST and the SKA will generate datasets so large that downloading even the reduced data will not be feasible. In this talk we introduce Scientific Workflows, software tools that allow easy exploration of both local and remote datasets and processing services, and Research Objects, which encapsulate all relevant aspects of a scientific experiment, allow its quantitative and qualitative assessment, and enable reuse with proper attribution and linkage to publications, among other things. The AstroTaverna plugin, with astronomy-specific tools for workflow creation, will also be presented in this ALMA weekly seminar.

Juande Santander-Vela

July 04, 2013

Transcript

  1. Wf4Ever: Scientific Workflows
    and Research Objects as tools
    for scientific insight and
    methodology curation
    Juande Santander-Vela
    Instituto de Astrofísica de Andalucía-CSIC

  2. Talk Outline
    Introduction
    Current challenges for radio astronomy and science
    Potential e-Science solutions: Workflows and
    Research Objects
    Final points

  3. Introduction

  4. Who am I?
    Member of the AMIGA international collaboration,
    based at IAA-CSIC
    Ph.D. on bringing Radio Astronomical data archives and
    tools into the VO
    Applied Scientist at ESO VLT archive, Software
    Engineer/Astronomy Specialist at ALMA archive
    (May 2009-Dec 2011)
    Back to IAA-CSIC as VIA-SKA Project Manager,
    Radio Astroinformatician
    GROUP INTEREST IN TECH
    DEVELOPMENTS FOR BETTER SCIENCE

  5. Why am I here?
    Collaboration with Stephane Leon and the ALMA
    Data Management Group
    Helping bring the ALMA Science Archive to the VO
    Modelling radio data cubes
    Finding use cases for workflow technology (see later)

  6. AMIGA
    Analysis of the interstellar Medium of Isolated GAlaxies
    Multi-wavelength, multi-object study on isolated galaxies
    with strict isolation criteria
    Careful curation of data
    Very careful processing of new parameters from
    Group’s own observation programs and data reduction
    Literature table scanning
    Virtual Observatory table harvesting and parsing
    Emphasis on marrying astronomy and computer science,
    and buy-in of the VO
    E-SCIENCE USERS

  7. AMIGA
    Analysis of the interstellar Medium of Isolated GAlaxies
    Multi-wavelength, multi-object study on isolated galaxies
    with strict isolation criteria
    Careful curation of data
    Very careful processing of new parameters from
    Group’s own observation programs and data reduction
    Literature table scanning
    Virtual Observatory table harvesting and parsing
    Emphasis on marrying astronomy and computer science,
    and buy-in of the VO
    E-SCIENCE DEVELOPERS!

  8. AMIGA
    Project goal: providing a baseline for galaxy
    properties to compare with other environments
    Interaction-free sample, ideal for tracing HI infall:
    we can use CIG galaxies to detect the cosmic web
    Need for very sensitive telescopes able to resolve
    faint HI ➡ Square Kilometre Array & pathfinders
    PARTICIPATING IN SKA.TEL.SDP CONSORTIUM
    WE NEED TOOLS FOR OUR OWN SCIENCE ANALYSIS

  9. Current challenges for
    radio astronomy
    and science

  10. Data over-abundance
    Moore’s Law for Detectors ➡ Exponential increase
    of individual and accumulated data sets
    We have more data than ever… but we can’t use it:
    Because we can’t:
    Difficult to set up (for sharing)
    Difficult to find (for using)
    Difficult to document (both using and sharing)
    Difficult to deal with (because of size, formatting, purpose…)
    Because it is not in our best interest
    FULLY
    ?

  11. Courtesy J.E. Ruiz (AMIGA, Wf4Ever)

  12. Courtesy J.E. Ruiz (AMIGA, Wf4Ever)
    Tools!

  13. Data sharing
    DATA SHARING
    Sharing data is good. But sharing your own data? That can get complicated. As two research
    communities who held meetings in May on the issue report their proposals to promote data sharing
    in biology, a special issue of Nature examines the cultural and technical hurdles that can get in the
    way of good intentions.
    DATA FLIRTING
    DATA HOARDING
    IRREPRODUCIBLE
    RESEARCH
    ?

  14. Irreproducible research
    CHALLENGES IN IRREPRODUCIBLE RESEARCH
    No research paper can ever be considered to be the final word, and the replication and
    corroboration of research results is key to the scientific process. In studying complex entities,
    especially animals and human beings, the complexity of the system and of the techniques can all
    too easily lead to results that seem robust in the lab, and valid to editors and referees of journals,
    but which do not stand the test of further studies. Nature has published a series of articles about
    the worrying extent to which research results have been found wanting in this respect. The editors
    of Nature and the Nature life sciences research journals have also taken substantive steps to put
    our own houses in order, in improving the transparency and robustness of what we publish.
    Journals, research laboratories and institutions and funders all have an interest in tackling issues
    of irreproducibility. We hope that the articles contained in this collection will help.

  15. Irreproducible research

  16. Tool over-abundance
    ++

  17. Starship Asterisk*
    APOD and General Astronomy Discussion Forum
    The Engineering Deck: Astrophysics Source Code Library
    671 topics • Page 1 of 7
    ANNOUNCEMENTS | POSTED | REPLIES | VIEWS | LAST POST
    Welcome & Rules (please read before posting) | by RJN, Mon Jan 18, 2010 7:40 pm | 0 | 15666 | by RJN, Mon Jan 18, 2010 7:40 pm
    TOPICS | POSTED | REPLIES | VIEWS | LAST POST
    Guide to the Astrophysics Source Code Library | by RJN, Sat Jul 24, 2010 8:01 pm | 13 | 17027 | by owlice, Mon Jul 01, 2013 3:32 am
    Papers of Possible Interest to Astronomical Software Users | by owlice, Tue Oct 12, 2010 7:02 am | 27 | 7056 | by owlice, Wed May 15, 2013 1:31 pm
    The Astrophysics Source Code Library: New codes welcome | by RJN, Sat Jul 24, 2010 8:01 pm | 26 | 5273 | by Eran Ofek, Thu Dec 13, 2012 9:20 pm
    *Web Resources and Tools for Astrophysicists/Astronomers* | by owlice, Sat Jul 16, 2011 12:01 pm | 22 | 2750 | by owlice, Fri May 10, 2013 12:12 pm
    2011 and 2012 Additions to the ASCL | by owlice, Thu Feb 24, 2011 11:26 pm | 23 | 1693 | by owlice, Sat Dec 08, 2012 8:09 pm
    21cmFAST: Simulation of the High-Redshift 21-cm Signal | by owlice, Thu Feb 17, 2011 10:47 pm | 0 | 3443 | by owlice, Thu Feb 17, 2011 10:47 pm
    2LPTIC: 2nd-order Lagrangian Perturbation Theory Initial Con | by owlice, Tue Jan 03, 2012 5:27 am | 0 | 855 | by owlice, Tue Jan 03, 2012 5:27 am
    2MASS Kit: 2MASS Catalog Server Kit | by owlice, Sun Mar 17, 2013 5:16 pm | 0 | 214 | by owlice, Sun Mar 17, 2013 5:16 pm
    3DEX: Fast Fourier-Bessel Decomposition of Spherical 3D Surv | by owlice, Sat Nov 26, 2011 4:00 pm | 0 | 741 | by owlice, Sat Nov 26, 2011 4:00 pm
    AAOGlimpse: Three-dimensional Data Viewer | by owlice, Sat Oct 15, 2011 11:29 am | 0 | 1034 | by owlice, Sat Oct 15, 2011 11:29 am
    ACORNS-ADI: Calibration, Registration and Nulling in Imaging | by kcd, Sat Mar 30, 2013 7:40 am | 0 | 177 | by kcd, Sat Mar 30, 2013 7:40 am
    ACS: ALMA Common Software | by kcd, Sat Feb 09, 2013 3:44 am | 0 | 269 | by kcd, Sat Feb 09, 2013 3:44 am

  18. Services too!

  19. How to deal with all this?
    ++
    All of this compounds the
    problems of
    reproducibility,
    methodology
    assessment, result
    dissemination…

  20. How to deal with all this?
    AND THE
    CODE?
    WHAT
    SOFTWARE
    DOES IT
    DEPEND ON?
    WHICH
    CODE DID
    WHAT?
    NOT
    A GOOD
    SOLUTION
    TRADITIONALLY…

  21. How to deal with all this?
    ++
    ORCHESTRATION,
    ENCAPSULATION,
    DATA ACCESS,
    PROVENANCE,
    ANNOTATION…

  22. Why Workflows?
    SCIENTIFIC

  23. Workflows define
    computations
    Events & Processes
    Dependencies
    Resources
    Local & Remote Processes
    Sequences
    Concurrences
    Triggers
    FORMALLY,
    OR AT LEAST
    MACHINE READABLE

    WORKFLOW
    DEFINITION
    LANGUAGES
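
A minimal sketch of what a machine-readable workflow definition buys you: once dependencies are declared, the order of execution and the steps that may run concurrently can be derived automatically. The step names are invented for illustration, and Python's standard `graphlib` module stands in for a real workflow definition language; this is not how Taverna represents workflows internally.

```python
from graphlib import TopologicalSorter

# Hypothetical workflow: each step maps to the set of steps it depends on.
workflow = {
    "query_catalog": set(),
    "download_images": set(),
    "crossmatch": {"query_catalog"},
    "measure_sources": {"download_images", "crossmatch"},
    "plot_results": {"measure_sources"},
}


def execution_batches(dependencies):
    """Return batches of steps; steps within one batch could run concurrently."""
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()  # raises CycleError if the dependencies are circular
    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # steps whose dependencies are done
        batches.append(ready)
        sorter.done(*ready)
    return batches


batches = execution_batches(workflow)
for number, batch in enumerate(batches, start=1):
    print(f"batch {number}: {', '.join(batch)}")
```

The two independent steps land in the first batch (a concurrence), while the remaining steps form a sequence dictated purely by the declared dependencies.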

  24. Workflows enable
    distributed computing
    Distributed computing paradigm
    Move computation to the data
    Computing services
    Collaborative environments
    Linked data
    FOR SCIENTIFIC
    DISCUSSION &
    SCIENCE EXTRACTION
    ➡ Science-computing

  25. Workflows enable
    distributed computing
    Data can be anywhere
    Workflows can be constructed hierarchically
    Each workflow does useful work on its own
    The data flow can be easily followed

  26. Workflows enable
    interactive computing
    Each workflow run records its inputs, outputs,
    and intermediate results
    You can build and run workflows incrementally
    You can get (almost) immediate feedback on
    changes
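
The idea of a run that records its inputs, outputs, and intermediate results can be sketched in a few lines. This is only an illustration under stated assumptions: the `RecordedRun` class, the `fingerprint` helper, and the sample values are all hypothetical, and a real workflow engine keeps far richer provenance.

```python
import hashlib
import json


def fingerprint(value):
    """Short, stable digest of any JSON-serialisable value."""
    encoded = json.dumps(value, sort_keys=True).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()[:12]


class RecordedRun:
    """Runs steps one at a time and keeps a record of every result."""

    def __init__(self):
        self.records = []

    def step(self, name, func, *inputs):
        output = func(*inputs)
        self.records.append({
            "step": name,
            "inputs": [fingerprint(item) for item in inputs],
            "output": fingerprint(output),
            "value": output,  # intermediate result kept for later inspection
        })
        return output


run = RecordedRun()
fluxes = run.step("load", lambda: [3.0, 4.5, 1.5])  # made-up sample values
total = run.step("sum", sum, fluxes)
mean = run.step("mean", lambda t, f: t / len(f), total, fluxes)

for record in run.records:
    print(record["step"], record["inputs"], "->", record["output"])
```

Because each step is recorded as soon as it finishes, the run can be built up incrementally, and matching fingerprints show which output fed which later input.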

  27. Tools for workflow
    storage and discovery
    Workflows
    Showing 2207 results. Use the filters on the left and the search box below to refine the results.
    Sort by: Rank
    Filter by type: Taverna 2 (1111) | Taverna 1 (562) | RapidMiner (243) | Kepler (43) | Bioclipse Scri… (34) | LONI Pipeline (26) | GWorkflowDL (24) | KNIME (23) | BioExtract Ser… (18) | Galaxy (13)
    Filter by tag: example (223)
    Pathways and Gene annotations for QTL region (7)
    Download (v7) | Taverna 2 | Original Uploader: Paul Fisher
    Created: 19/11/09 @ 18:18:52 | Last updated: 07/09/12 @ 18:23:36
    Credits: Paul Fisher
    License: Creative Commons Attribution-Share Alike 3.0 Unported License
    This workflow searches for genes which reside in a QTL (Quantitative Trait Loci) region in the
    mouse, Mus musculus. The workflow requires an input of: a chromosome name or number; a QTL
    start base pair position; QTL end base pair position. Data is then extracted from BioMart to annotate
    each of the genes found in this region. The Entrez and UniProt identifiers are then sent to KEGG to
    obtain KEGG gene identifiers. The KEGG gene identifiers are then used to search for pathways in
    the KEGG path...

  28. Tools for workflow
    storage and discovery
    Workflows
    Showing 44 results. Use the filters on the left and the search box below to refine the results.
    Search query: Astrotaverna
    Sort by: Relevance
    Filter by type: Taverna 2 (44)
    Filter by tag: astronomy (43) | astrotaverna (42) | votable (40) | virtual observ… (26) | starter pack (23) | local processes (9) | taverna (9) | workflow (9) | galfit (5) | sextractor (5)
    Cocatenates several VOTables into one (3)
    Download (v3) | Taverna 2 | Original Uploader: Julian Garrido
    Created: 30/08/12 @ 10:05:29 | Last updated: 22/04/13 @ 16:52:00
    Credits: Julian Garrido
    License: Creative Commons Attribution-Share Alike 3.0 Unported License
    Snippet showing how to use AstroTaverna tool for concatenating several VOTables. The input
    is four VOTables with the same number of columns. The result if using sample values provided
    will be a four times vertically duplicated VOTable.
    Rating: 0.0 / 5 (0 ratings) | Versions: 3 | Reviews: 0 | Comments: 0 | Citations: 0
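
The concatenation described in the workflow entry above can be illustrated without any VOTable library. The sketch below represents each table as a plain pair of column names and rows, which is an assumption made for brevity; the function name and the sample values are invented, and this is not the AstroTaverna implementation.

```python
def concatenate_tables(tables):
    """Stack tables vertically; each table is a (column_names, rows) pair."""
    if not tables:
        raise ValueError("need at least one table")
    columns = tables[0][0]
    stacked_rows = []
    for names, rows in tables:
        if len(names) != len(columns):
            raise ValueError("all tables must have the same number of columns")
        stacked_rows.extend(rows)
    return columns, stacked_rows


# Invented sample table with two rows and three columns.
sample = (
    ["name", "ra_deg", "dec_deg"],
    [("source_a", 10.5, -3.2), ("source_b", 200.1, 45.0)],
)

# Feeding the same sample four times gives a four times vertically duplicated table.
columns, rows = concatenate_tables([sample, sample, sample, sample])
print(len(rows))
```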

  29. Astrotaverna Search
    Showing 44 results. Use the filters on the left and the search box below to refine the results.
    Sort by: Relevance
    Filter by type: Taverna 2 (44)
    Filter by tag: astronomy (43) | astrotaverna (42) | votable (40) | virtual observ… (26) | starter pack (23) | local processes (9) | taverna (9) | workflow (9) | galfit (5) | sextractor (5)
    Filter by user: Jose Enrique … (27) | Julian Garrido (17)
    Filter by licence: by-sa (40) | BSD (4)
    Filter by group: AMIGA (16) | Wf4Ever (4)
    Cocatenates several VOTables into one (3)
    Download (v3) | Taverna 2 | Original Uploader: Julian Garrido
    Created: 30/08/12 @ 10:05:29 | Last updated: 22/04/13 @ 16:52:00
    Credits: Julian Garrido
    License: Creative Commons Attribution-Share Alike 3.0 Unported License
    Snippet showing how to use AstroTaverna tool for concatenating several VOTables. The input
    is four VOTables with the same number of columns. The result if using sample values provided
    will be a four times vertically duplicated VOTable.
    Rating: 0.0 / 5 (0 ratings) | Versions: 3 | Reviews: 0 | Comments: 0 | Citations: 0
    Viewed: 26 times | Downloaded: 12 times
    Tags (4): astronomy | astrotaverna | cat | votable
    Create configuration files from a template... (1)
    Download (v1) | Taverna 2 | Original Uploader: Julian Garrido
    Created: 26/07/12 @ 10:56:46 | Last updated: 04/09/12 @ 07:30:55
    Credits: Julian Garrido
    License: Creative Commons Attribution-Share Alike 3.0 Unported License
    This workflow uses astrotaverna artifacts. It creates files by using a template whose keys are
    replaced by data from a votable. A configuration file is created for every row in the votable. The
    keys must appear also in the vocabulary file and match column names in the votable. A column
    in the votable must contain the name of the result configuration file.
    Rating: 0.0 / 5 (0 ratings) | Versions: 1 | Reviews: 0 | Comments: 0 | Citations: 0
    Viewed: 14 times | Downloaded: 15 times
    Tags (4): astronomy | astrotaverna | local processes | votable
    Simulates the physical, dynamical, and che... (1)
    Download (v1) | Taverna 2
    Created: 17/05/13 @ 08:03:13
    Credits: Julian Garrido
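
The template-based workflow listed above ("Create configuration files from a template...") follows a pattern that is easy to sketch: one configuration file per table row, with template keys replaced by column values and one column naming the output file. Everything concrete here is an assumption made for illustration: the `$key` template syntax, the `config_file` column name, and the sample rows are invented, and the function returns file contents instead of writing to disk.

```python
from string import Template

# Hypothetical template: $-prefixed keys are replaced by column values.
template = Template("object = $name\nra = $ra\ndec = $dec\n")

# Hypothetical vocabulary: the keys allowed to appear in the template.
vocabulary = {"name", "ra", "dec"}

# Stand-in for table rows; "config_file" names the output for each row.
rows = [
    {"name": "galaxy_a", "ra": "10.5", "dec": "-3.2", "config_file": "galaxy_a.cfg"},
    {"name": "galaxy_b", "ra": "200.1", "dec": "45.0", "config_file": "galaxy_b.cfg"},
]


def render_configs(template, vocabulary, rows, filename_column="config_file"):
    """Return {file name: file content}, one entry per table row."""
    configs = {}
    for row in rows:
        missing = vocabulary - set(row)
        if missing:
            raise KeyError(f"columns missing from table: {sorted(missing)}")
        values = {key: row[key] for key in vocabulary}
        # substitute() raises KeyError if the template uses a key that is
        # not in the vocabulary, enforcing the "keys must match" rule.
        configs[row[filename_column]] = template.substitute(values)
    return configs


configs = render_configs(template, vocabulary, rows)
print(configs["galaxy_a.cfg"])
```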

  31. Workflow Entry: Cocatenates several VOTables into one
    Created at: 30/08/12 @ 10:05:29 | Last updated: 22/04/13 @ 16:52:00
    Version 3 (latest) (of 3)
    Version created on: 22/04/13 @ 16:52:00 by: Julian Garrido
    Title: Cocatenates several VOTables into one
    Type: Taverna 2
    Original Uploader: Julian Garrido
    Credits (1): Julian Garrido
    Attributions (0): None
    | License | Credits (1) | Attributions (0) | Tags (4) | Featured in Packs (1) | Ratings (0) | Attributed By (0) | Favourited By (0) |
    | Citations (0) | Version History | Reviews (0) | Comments (0) |

  32. Version 3 (latest) (of 3)
    Version created on: 22/04/13 @ 16:52:00 by: Julian Garrido
    Title: Cocatenates several VOTables into one
    Type: Taverna 2
    Description
    Snippet showing how to use AstroTaverna tool for concatenating several VOTables. The input is four
    VOTables with the same number of columns. The result if using sample values provided will be a four times
    vertically duplicated VOTable.
    Download
    Download Workflow File/Package (T2FLOW)
    Original Uploader: Julian Garrido
    Credits (1): Julian Garrido
    Attributions (0): None
    Tags (4): astronomy | astrotaverna | cat | votable
    Shared with Groups (1): AMIGA
    Featured In Packs (1): AstroTaverna Starter Pack
    Ratings (0)
  33. Download
    Download Workflow File/Package (T2FLOW)
    Download Workflow as a Galaxy tool
    Run
    Run this Workflow in the Taverna Workbench...
    Option 1: Copy and paste this link into File > 'Open workflow location...'
    http://www.myexperiment.org/workflows/3130/download?version=3
    Workflow Components
    Authors (1) | Titles (1) | Descriptions (1) | Dependencies (0) | Inputs (4) | Processors (1) | Beanshells (0) | Outputs (1) | Datalinks (5) | Coordinations (0)
    Featured In Packs (1): AstroTaverna Starter Pack
    Ratings (0): 0.0 / 5
    Attributed By (0): None
    Favourited By (0): No one
    Statistics: 53 viewings | 75 downloads

  34. That’s not enough!
    FOR ASTRONOMERS
    FOR REPRODUCIBILITY
    AND REUSE

  35. (Map of partner institutions, numbered as listed)
    1. Intelligent Software Components
    (iSOCO, Spain)
    2. University of Manchester (UNIMAN,
    UK)
    3. Universidad Politécnica de Madrid
    (UPM, Spain)
    4. Poznan Supercomputing and
    Networking Centre (PSNC, Poland)
    5. University of Oxford (OXF, UK)
    6. Instituto de Astrofísica de Andalucía
    (IAA, Spain)
    7. Leiden University Medical Centre
    (LUMC, NL)
    EU FUNDED FP7 STREP PROJECT
    DECEMBER 2010 – DECEMBER 2013

  36. Technological infrastructure for the preservation and efficient
    retrieval and reuse of scientific workflows in a range of
    disciplines
    Case Studies
    • Astronomy (IAA-CSIC)
    • Genome-wide Analysis and Biobanking
    Goals
    • Archival, classification, and indexing
    of scientific workflows and their
    associated materials in scalable
    semantic repositories, providing
    advanced access and recommendation
    capabilities
    • Creation of scientific communities to
    collaboratively share, reuse, and evolve
    workflows and their parts, stimulating
    the development of new scientific
    knowledge
    Core Competencies (Tech)
    • Digital Libraries
    • Workflow Management
    • Semantic Web
    • Integrity & Authenticity
    • Provenance
    • Information Quality
    Partners
    • One SME
    • Six public organisations
    TARGETING ALREADY ESTABLISHED
    COMMUNITIES: MYEXPERIMENT,
    VIRTUAL OBSERVATORY

  37. What is a Scientific Workflow?
    Workflows to Access and Massage VO Data
    »  A mechanism for coordinating the execution of
    services and codes, and linking together resources.
    »  The combination of data and processes into a
    configurable, modular, structured set of steps that
    implement semi-automated computational solutions
    in scientific problem-solving.
    »  The implementation of a scientific method.
    COURTESY J.E. RUIZ
    NOT A PIPELINE!

  38. AMIGA4GAS
    3D KINEMATICAL MODELING
    INPUT FILES
    ROTCUR
    12 RUNS
    POSSIBLE COMBINATIONS
    IN INPUT PARAMETERS
    12 ASCII FILES
    GALMOD
    12 CUBES
    4 APPROACHING
    4 RECEDING
    4 BOTH
    COPY
    8 CUBES
    4 APPROACHING + RECEDING
    4 BOTH
    MOMENTS
    8 VELOCITY MAPS
    1 DATACUBE
    1 VELOCITY MAP
    1 CONFIG FILE ROTCUR
    1 CONFIG FILE GALMOD
    SUB
    8 RESIDUAL CUBES
    8 RESIDUAL MAPS
    SUB
    MNMX 8 VALUES FOR PEAKS IN CUBES
    8 VALUES FOR PEAKS IN MAPS
    VARIABLE
    PARAMS
    INSET
    RADII, WIDTHS
    WEIGHT
    TOLERANCE
    DENS
    NV
    Z0
    VDISP
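
The counts on this slide (12 runs, then 8 cubes) can be reproduced with a small enumeration. The reading used here is an assumption: three galaxy sides (approaching, receding, both) times four parameter settings gives the 12 runs, and merging the approaching and receding models per setting leaves 8 cubes. The slide does not say which of the variable parameters produce the four settings, so the labels below are placeholders.

```python
from itertools import product

SIDES = ["approaching", "receding", "both"]
# Placeholder labels: the slide does not state which parameters are varied.
PARAMETER_SETS = ["set_1", "set_2", "set_3", "set_4"]


def plan_model_runs(sides, parameter_sets):
    """One rotation-curve fit and one model cube per (side, parameter set)."""
    return [{"side": side, "params": params}
            for side, params in product(sides, parameter_sets)]


def combine_halves(runs):
    """Merge approaching and receding models; keep the 'both' models as is."""
    available = {(run["params"], run["side"]) for run in runs}
    combined = []
    for params in sorted({run["params"] for run in runs}):
        for side in ("approaching", "receding", "both"):
            if (params, side) not in available:
                raise ValueError(f"missing {side} model for {params}")
        combined.append({"params": params, "model": "approaching+receding"})
        combined.append({"params": params, "model": "both"})
    return combined


runs = plan_model_runs(SIDES, PARAMETER_SETS)
cubes = combine_halves(runs)
print(len(runs), "model runs ->", len(cubes), "cubes to compare with the data")
```

Each of the 8 cubes would then go through the moment, subtraction, and peak-finding steps, which matches the groups of 8 products listed on the slide.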

  39. How do we build
    workflows?

  40. AstroTaverna
    Taverna plugin for retrieving and manipulating
    VO Data + Catalogs on HTML Pages
    VO Services: ConeSearch, SIA, SSA, TAP coming soon
    Tabular Data (VOTables, converters from other formats)
    Crossmatching, Filtering, NameResolving, Coordinates and reference
    system transformation, Data massage… (STILTS)
    Source catalog overplotting on Images and filtering, overplot circles,
    ellipses, etc. as a function of physical magnitude. Resampling, crops,
    blinks, mosaics, movies, RGBs, fusion, diff… (through Aladin)
    VO Table rendering, SAMP for final inspection
    Image support, Spectra not yet
    PLUS ADDITIONAL
    ANALYSIS USING
    SCRIPTS
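
To make the cone search and crossmatching operations listed above concrete, here is a minimal in-memory sketch. It is not how AstroTaverna, STILTS, or any VO service implements them: it uses a brute-force nearest-neighbour match with the haversine formula, and the catalog rows and identifiers are invented.

```python
import math


def angular_separation_deg(ra1, dec1, ra2, dec2):
    """Great-circle separation in degrees (haversine); inputs in degrees."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    half_dra = (ra2 - ra1) / 2.0
    half_ddec = (dec2 - dec1) / 2.0
    h = (math.sin(half_ddec) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin(half_dra) ** 2)
    return math.degrees(2.0 * math.asin(min(1.0, math.sqrt(h))))


def cone_search(catalog, ra, dec, radius_deg):
    """Keep the catalog rows that lie within radius_deg of (ra, dec)."""
    return [row for row in catalog
            if angular_separation_deg(ra, dec, row["ra"], row["dec"]) <= radius_deg]


def crossmatch(left, right, radius_deg):
    """Pair each left row with its nearest right row, if close enough."""
    pairs = []
    for a in left:
        def distance(b):
            return angular_separation_deg(a["ra"], a["dec"], b["ra"], b["dec"])
        best = min(right, key=distance, default=None)
        if best is not None and distance(best) <= radius_deg:
            pairs.append((a["id"], best["id"]))
    return pairs


# Invented catalogs, coordinates in degrees.
optical = [{"id": "opt1", "ra": 10.0, "dec": 20.0},
           {"id": "opt2", "ra": 150.0, "dec": -5.0}]
radio = [{"id": "rad1", "ra": 10.0005, "dec": 20.0003},
         {"id": "rad2", "ra": 300.0, "dec": 60.0}]

matches = crossmatch(optical, radio, radius_deg=0.001)
nearby = cone_search(radio, ra=10.0, dec=20.0, radius_deg=0.01)
print(matches, [row["id"] for row in nearby])
```

A production crossmatch would use a spatial index instead of comparing every pair, which is one reason to delegate this step to a dedicated tool inside the workflow.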

  41. Service discovery

  42. Data massaging

  43. Data massaging
    X-Matching
    Calculation
    Additions
    Filtering
    Access

  44. Data curation
    X-Matching
    Calculation
    Additions
    Filtering
    Access

  45. Data curation
    X-Matching
    Calculation
    Additions
    Filtering
    Access

  46. Data curation
    X-Matching
    Calculation
    Additions
    Filtering
    Access

  47. Aladin scripting

  48. Interactive data inspection

  49. Interactive data inspection

  50. Learning examples

  51. Not yet enough!
    FOR REPRODUCIBILITY
    AND REUSE

  52. Research Objects

  53. Research Objects
    Content
    Process (workflows), data, external resources and bibliography
    Execution environment set-up and local software dependencies
    Experimental protocol followed
    Roles, types and relationships among all digital components
    Provenance of intermediate and final results
    Decomposable attribution and authoring
    Fine-grained access control and permissions
    Example datasets for demonstration, reproducibility, monitoring, etc
    Templates
    Placeholders to ease the aggregation process
    Completeness checking/quality assessment
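
One way to picture a Research Object with template-style completeness checking is as an aggregation of resources grouped by role, compared against a checklist. The sketch below is hypothetical throughout: the role names, the checklist, the class, and the file paths are invented for illustration and do not follow the actual Research Object model or its ontologies.

```python
from dataclasses import dataclass, field

# Hypothetical checklist, loosely following the content list above.
REQUIRED_ROLES = {"workflow", "data", "environment", "protocol", "provenance"}


@dataclass
class ResearchObject:
    title: str
    authors: list
    resources: dict = field(default_factory=dict)  # role -> list of locations

    def aggregate(self, role, location):
        """Add a resource (file path or URL) under the given role."""
        self.resources.setdefault(role, []).append(location)

    def missing_roles(self):
        """Roles from the checklist that have no aggregated resource yet."""
        return sorted(REQUIRED_ROLES - set(self.resources))


ro = ResearchObject(title="Example study", authors=["A. Researcher"])
ro.aggregate("workflow", "workflows/crossmatch.t2flow")
ro.aggregate("data", "data/sample_catalog.xml")
print(ro.missing_roles())
```

An empty result from `missing_roles()` would correspond to a Research Object that satisfies the template; anything else is a to-do list for the author.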

  54. Research Objects
    Target Audiences
    Scientists [producers] who want to share their research
    outcomes so that they are more reusable and
    reproducible – ease of sharing and citation.
    Scientists [consumers] who want to understand, reuse,
    validate and further extend existing ROs.
    Publishers can adopt the concept and principles of
    Research Objects to enable sharing of, and access to,
    the actual data and methods.
    Librarians who want to support research preservation.

  55. Semantic annotations
    Author of an annotation
    Author and co-authors of a workflow; reference link to a re-used workflow and its
    author
    Who has performed the execution of a workflow leading to the results provided in the RO
    Computing execution environment of the RO and local software dependencies
    Special access requirements to web services
    Datasets provider: person, webpage, survey, data release, etc.
    How much time does it take to run a workflow using the full data and the provided
    subsample
    The number of elements of the sample dataset over which a workflow and/or RO iterates
    Previous and subsequent workflows to be executed, as in the experimental
    protocol
    Research institution, country, and scientific domain of the RO
    The actual size of the RO and/or a folder
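
Annotations like those listed above can be thought of as small statements attached to a resource, each with its own author. The sketch below is a simplified stand-in, assuming plain tuples instead of the RDF vocabularies a real Research Object would use; all property names, paths, people, and values are made up.

```python
from collections import namedtuple

Annotation = namedtuple("Annotation", ["target", "prop", "value", "annotated_by"])

# Invented annotations on the resources of a hypothetical Research Object.
annotations = [
    Annotation("workflows/crossmatch.t2flow", "author", "A. Researcher", "A. Researcher"),
    Annotation("workflows/crossmatch.t2flow", "runtime_full_data_minutes", 90, "A. Researcher"),
    Annotation("workflows/crossmatch.t2flow", "runtime_subsample_minutes", 2, "A. Researcher"),
    Annotation("data/sample_catalog.xml", "provider", "example survey", "B. Curator"),
    Annotation(".", "scientific_domain", "astronomy", "B. Curator"),
]


def describe(target, annotations):
    """Collect every property recorded for one resource."""
    return {a.prop: a.value for a in annotations if a.target == target}


def annotated_by(person, annotations):
    """List the resources that a given person has annotated."""
    return sorted({a.target for a in annotations if a.annotated_by == person})


print(describe("workflows/crossmatch.t2flow", annotations))
print(annotated_by("B. Curator", annotations))
```

Keeping the annotation author separate from the annotated value is what lets attribution be decomposed per statement, as the Research Object content list calls for.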

  56. Semantic model
    DataLink
    MULTI
    DISCIPLINARY

  57. RO data organisation
    Recommended
    organisation provides
    automatic semantics
    for some items
    It makes it easier for
    both people and
    machines to
    understand the RO

  58. ROs in Astronomy
    ADSLabs Research Objects
    Authors
    Publications
    Journals
    Objects SIMBAD
    Tabular data behind the plots CDS
    ASCL reference of used software
    Observing time Proposals
    Used facilities, surveys or missions
    NOT JUST FROM WORKFLOWS
    POTENTIAL FOR
    RESEARCH OBJECT
    INDEXING IN ADS

  59. RO Incentive
    PAPERS WITH DATA LINKS ARE CITED MORE THAN THOSE WITHOUT
    Effect of E-printing on Citation Rates in Astronomy and Physics
    2006. Edwin A. Henneken et al.

  60. RO Incentive
    PAPERS WITH DATA LINKS ARE CITED MORE THAN THOSE WITHOUT
    Effect of E-printing on Citation Rates in Astronomy and Physics
    2006. Edwin A. Henneken et al.
    NOW YOU CAN
    CITE DATA AND
    PROCESSES, TOO

  61. Roadmap
    AstroTaverna, mostly ready: you can publish
    workflows and packs to myExperiment from
    Taverna
    myExperiment, building support for ROs
    ADS will populate myExperiment with literature-
    ROs
    Taverna will be able to publish ROs to
    myExperiment

  62. Final points
    We need something like workflows to describe
    computations in a distributed environment
    Workflows are not enough for supporting reuse
    and methodology preservation
    Research Objects are meaningful associations of
    data, operations, and provenance, which can also be
    cited
    CAN EMBED
    COMPUTATIONS IN
    SCIENCE ARCHIVES

  63. Useful Links
    http://www.wf4ever-project.org
    http://www.myexperiment.org
    http://www.researchobject.org
    http://wf4ever.github.io/astrotaverna/
    http://amiga.iaa.es

  64. Thank you!
