Galaxy: Genome Informatics 2007

James Taylor
October 01, 2007

Galaxy talk presented at Genome Informatics in 2007. Note the workflow editor noodles are not yet colored correctly. I think I wrote the editor only a few months before while sitting in a lobby at ISMB in Vienna. One of the first Galaxy Team slides at the end! Dan and Greg both had posters at this meeting so they got cool slide-up animations on relevant slides and halos at the end. Fancy.

Transcript

  1. “Biology is an experimental science that is experiencing an explosion of new data. This requires biologists to increase the scale and sophistication in the information technology used for their research”
  2. Making sense of this explosion of data requires developing sophisticated computational methods ...and making these methods accessible
  3. Wasting Time
    • For developers, building user interfaces is both time-consuming and highly repetitive, yet doing it well is hard
    • Without accessible interfaces, experimentalists end up using ill-suited / inefficient tools, hiring inexperienced students, ...
    • Even with accessible interfaces, users waste time moving data between data sources and tools, converting between data formats, ...
  6. Wasting Potential
    • New technologies allow individual labs to generate massive amounts of experimental data
    • However, effectively analyzing this data still requires specific technical / computational skills
    • The easier it is for experimentalists to work with sophisticated computational tools, the greater the potential for biological discovery
  9. What is Galaxy?
    • An open-source framework for integrating various computational tools and databases into a cohesive workspace
    • A web-based service we (Penn State) provide, integrating many popular tools and resources for comparative genomics
    • A completely self-contained application for building your own Galaxy-style sites
  10. Why integrate tools with Galaxy?
    • Galaxy makes it substantially easier to give your tools user interfaces
    • The resulting user interfaces are of high quality, and continually improved
    • Your tools gain value by being integrated with data sources and other tools
  12. Galaxy tool suites
    • Out of the box
      • Operations on genomic intervals
      • Extracting regions from and manipulating genome-wide multiple sequence alignments
      • General text manipulation, filtering, sorting, grouping, graphing
      • EMBOSS
    • In development
      • Statistical genetics suite based on RGenetics
      • Phylogenetic analysis based on HyPhy
    To learn more about Galaxy’s tools for working with alignments, see Dan Blankenberg, Poster #19
  13. Dataflow in Galaxy
    • Tools in Galaxy have a well-defined abstract interface
    • Tools generate datasets of specific types (history items), which are used as the inputs of other tools
    • Can almost always determine the inputs and outputs of a tool (given the values of certain parameters for more complex tools)
  14. Workflow support future
    • Intuitive interfaces for running workflows and configuring runtime parameterization
    • Control flow constructs to allow dealing with ambiguities that are not resolved until runtime
    • Support for repetitive invocation of tools and workflows, and aggregation of results
    • Saving and sharing of workflows (reproducibility!)
    • Dealing with changes to tool interfaces
    • Running workflows without the web interface
  15. Acknowledgements
    • UCSC Genome Browser team
    • Biomart team
    • GMOD team
    • National Science Foundation
  16. The Galaxy Team
    • Guru Ananda
    • Dan Blankenberg
    • Nate Coraor
    • Jianbin He
    • Greg von Kuster
    • Ross Lazarus (Harvard)
    • Anton Nekrutenko
    • James Taylor (NYU)
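
The dataflow model on slide 13 (tools with a declared interface that produce typed history items, which then feed other tools) can be sketched in a few lines of Python. This is an illustrative sketch only, not Galaxy's actual code: the `Dataset` and `Tool` classes, the `accepts` and `execute` methods, the two toy tools, and the way the datatype names `interval` and `tabular` are used here are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Dataset:
    """Hypothetical history item: content tagged with a datatype."""
    name: str
    datatype: str
    content: List[str]


@dataclass
class Tool:
    """Hypothetical tool with a declared interface (input and output datatypes)."""
    name: str
    input_types: List[str]
    output_type: str
    run: Callable[..., List[str]]

    def accepts(self, *inputs: Dataset) -> bool:
        # Inputs match when their datatypes line up with the declaration.
        return [d.datatype for d in inputs] == self.input_types

    def execute(self, history: List[Dataset], *inputs: Dataset) -> Dataset:
        if not self.accepts(*inputs):
            raise TypeError(f"{self.name} expects {self.input_types}")
        result = Dataset(
            name=f"{self.name} on {', '.join(d.name for d in inputs)}",
            datatype=self.output_type,
            content=self.run(*(d.content for d in inputs)),
        )
        history.append(result)  # every output becomes a new history item
        return result


def keep_chr1(lines: List[str]) -> List[str]:
    # Toy interval filter: keep rows whose first column is "chr1".
    return [ln for ln in lines if ln.split("\t")[0] == "chr1"]


def count_lines(lines: List[str]) -> List[str]:
    # Toy summary: a one-row table holding the number of input rows.
    return [str(len(lines))]


filter_tool = Tool("Filter chr1", ["interval"], "interval", keep_chr1)
count_tool = Tool("Count", ["interval"], "tabular", count_lines)

history: List[Dataset] = []
regions = Dataset("regions", "interval",
                  ["chr1\t10\t20", "chr2\t5\t9", "chr1\t30\t40"])
history.append(regions)

# Chain two tools: the output of one is a typed input to the next.
filtered = filter_tool.execute(history, regions)
counted = count_tool.execute(history, filtered)
```

Because each tool declares its input and output datatypes, the interface alone is enough to tell which history items a tool can consume. In this sketch, `count_tool.accepts(counted)` is false because the count is `tabular` rather than `interval`, and that kind of check is what makes it possible to wire tools together into workflows.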