Galaxy: Genome Informatics 2008

James Taylor
October 01, 2008

Talk on Galaxy at Genome Informatics 2008. I was a session chair, so no oversight on this one. We were obsessed with authorization that year, and this talk is probably the most detailed ever on roles, groups, and dataset security in Galaxy. Another classic team slide.

Transcript

  1. Galaxy goals • Making large-scale computational analysis more accessible • Facilitating transparent analysis • Ensuring that analyses are reproducible
  2. What Galaxy provides • An open-source framework for integrating various computational tools and databases into a cohesive workspace • A web-based service we provide, integrating many popular tools and resources for comparative genomics • A completely self-contained application for building your own Galaxy-style sites
  3. What is a Galaxy Tool? • The basic unit of analysis in Galaxy • A program, script, external web resource, whatever... • Adapted to a standard structured interface • Parameters, data inputs, data outputs
  4. Short read sequence analysis • Analyzing read quality and filtering • Genomic analysis • Mapping against assembled genomes • Coverage, polymorphism, ... • Metagenomic analysis • Mapping against sequence databases • Taxonomy analysis, visualization, ...
  5. Galaxy workflows • Abstract description of an analysis procedure • Essentially: what tools to run, and the flow of data between tools
  6. Galaxy Data Libraries • Mechanism for storing and organizing shared datasets in a Galaxy instance • An instance can have many libraries, each containing datasets organized using folders as well as tags • Full type-specific metadata like any other dataset in Galaxy
  7. Driving use cases • Large shared datasets • Genotype data • Sequencing reads • Direct from the instrument! • Data management for distributed projects
  8. Galaxy dataset security • Fine-grained access controls for Galaxy datasets • Different actions on datasets require different permissions • Users and groups are granted these permissions • Enforced throughout Galaxy • e.g. a History can still be shared, but access to individual datasets in the history is controlled
  9. Security customization • Authentication mechanism can be replaced, or can leverage a single sign-on mechanism (e.g. through a proxying web server) • Authorization provider can be customized or replaced
  10. Completely integrated with analysis • Dataset restrictions propagate through an analysis • Analyses that combine datasets also combine their restrictions
  11. Up next... • Libraries: • sequencer integration • versioning • tagging and annotation • automatic workflow triggering • Security • configurable adapters to different authorization providers (e.g. directory services)
  12. Acknowledgements • Data and browser connections • UCSC • Biomart • GMOD • Intermine • Funding • National Science Foundation • Huck Institutes, Pennsylvania Dept. of Health
  13. The Galaxy Team • Guru Ananda | Penn State • Dan Blankenberg | Penn State • Wen-Yu Chung | Penn State • Nate Coraor | Penn State • Greg Von Kuster | Penn State • Sergei Kosakovsky | UCSD • Ross Lazarus | Harvard MS • Anton Nekrutenko | Penn State
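
Slides 8 and 10 describe per-action permissions on datasets and the rule that an analysis combining datasets also combines their restrictions. The sketch below is one way to picture that rule in Python. It is not Galaxy's actual code or API: the `Dataset` class, `can_perform`, `combine_restrictions`, the action name "access", and the principal names are all hypothetical, and treating an action with no recorded restriction as open to everyone is an assumption of this sketch.

```python
from dataclasses import dataclass, field


@dataclass
class Dataset:
    name: str
    # Hypothetical model: action name -> set of restrictions. Each restriction
    # is a frozenset of principals (users or groups); a requester must hold at
    # least one principal from EVERY restriction attached to the action.
    restrictions: dict = field(default_factory=dict)


def can_perform(principals, action, dataset):
    """Return True if the requester's principals satisfy every restriction."""
    held = set(principals)
    # Assumption: an action with no restrictions recorded is open to all.
    return all(held & required for required in dataset.restrictions.get(action, ()))


def combine_restrictions(inputs):
    """Derive an output's restrictions as the union of all input restrictions."""
    combined = {}
    for ds in inputs:
        for action, required_sets in ds.restrictions.items():
            combined.setdefault(action, set()).update(required_sets)
    return combined


# Two restricted inputs and the result of an analysis that uses both.
reads = Dataset("reads", {"access": {frozenset({"lab_a"})}})
genotypes = Dataset("genotypes", {"access": {frozenset({"lab_b"})}})
merged = Dataset("merged", combine_restrictions([reads, genotypes]))

alice = {"alice", "lab_a", "lab_b"}  # member of both groups
bob = {"bob", "lab_a"}               # member of lab_a only
```

In this sketch, `alice` can access `merged` because she satisfies both inherited restrictions, while `bob` can access `reads` but not `merged`. Keeping each restriction as its own set, rather than intersecting the allowed principals, is a design choice made here so that a user who could see every input can also see the output.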