Galaxy: Genome Informatics 2007

James Taylor
October 01, 2007

Galaxy talk presented at Genome Informatics in 2007. Note the workflow editor noodles are not yet colored correctly. I think I wrote the editor only a few months before while sitting in a lobby at ISMB in Vienna. One of the first Galaxy Team slides at the end! Dan and Greg both had posters at this meeting so they got cool slide-up animations on relevant slides and halos at the end. Fancy.

Transcript

  1. Galaxy http://g2.bx.psu.edu James Taylor, Courant Institute, NYU

  2. “Biology is an experimental science that is experiencing an explosion of new data. This requires biologists to increase the scale and sophistication in the information technology used for their research”
  3. Making sense of this explosion of data requires developing sophisticated computational methods
  4. Making sense of this explosion of data requires developing sophisticated computational methods ...and making these methods accessible
  5. Success in some areas...

  6. Data warehouses (storage and querying)

  7. Data warehouses (storage and querying) Data visualization (browsers)

  8. Data warehouses (storage and querying) Data visualization (browsers) Analysis? Accessible? Efficient? Reproducible?
  9. Wasting Time
    • For developers, building user interfaces is both time-consuming and highly repetitive, yet doing it well is hard
    • Without accessible interfaces, experimentalists end up using ill-suited / inefficient tools, hiring inexperienced students, ...
    • Even with accessible interfaces, users waste time moving data between data sources and tools, converting between data formats, ...
  12. Wasting Potential
    • New technologies allow individual labs to generate massive amounts of experimental data
    • However, effectively analyzing this data still requires specific technical / computational skills
    • The easier it is for experimentalists to work with sophisticated computational tools, the greater the potential for biological discovery
  15. None
  16. What is Galaxy?
    • An open-source framework for integrating various computational tools and databases into a cohesive workspace
    • A web-based service we (Penn State) provide, integrating many popular tools and resources for comparative genomics
    • A completely self-contained application for building your own Galaxy-style sites
  17. Galaxy’s web user interface

  18. None
  19. None
  20. None
  21. None
  22. Integrating tools into Galaxy

  23. Why integrate tools with Galaxy?
    • Galaxy makes it substantially easier to give your tools user interfaces
    • The resulting user interfaces are of high quality, and continually improved
    • Your tools gain value by being integrated with data sources and other tools
  24. Integrating web-based tools

  25. None
  26. None
  27. To learn more about integrating external web sites with Galaxy, see Greg von Kuster, Poster #109
  28. Integrating command line tools

  29. None
  30. None
  31. HTML inputs generated from abstract parameter description

  32. HTML inputs generated from abstract parameter description

  33. Automatic input validation based on type, or more...
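
The idea on the last few slides (form inputs and validation both derived from one abstract parameter description) can be sketched roughly as follows. This is a minimal illustration, not Galaxy's actual tool format or API: the `Param` class, its fields, and the `render_input` and `validate` helpers are all hypothetical names chosen for this example.

```python
from dataclasses import dataclass
from html import escape


@dataclass
class Param:
    """Hypothetical abstract parameter description (not Galaxy's real schema)."""
    name: str
    type: str          # one of "integer", "float", "text"
    label: str
    default: str = ""


# Per-type converters; a failed conversion counts as a validation error.
CONVERTERS = {"integer": int, "float": float, "text": str}


def render_input(param: Param) -> str:
    """Render one parameter as a labelled HTML text input."""
    name = escape(param.name)
    return (
        f'<label for="{name}">{escape(param.label)}</label>'
        f'<input type="text" id="{name}" name="{name}" '
        f'value="{escape(param.default)}">'
    )


def validate(param: Param, raw: str):
    """Convert the submitted string to the declared type, or raise ValueError."""
    try:
        return CONVERTERS[param.type](raw)
    except ValueError:
        raise ValueError(f"{param.label}: expected {param.type}, got {raw!r}")


min_len = Param(name="min_len", type="integer", label="Minimum length", default="10")
form_html = render_input(min_len)
```

Because the same description drives both the form and the check, a tool author would only state the parameter's type once.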

  34. Tool help generated from a simple text format

  35. None
  36. Template for generating command line from parameter values
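
As a rough sketch of what generating a command line from parameter values can look like, the example below uses Python's standard `string.Template`. The slides do not name the template language Galaxy uses, so this is a stand-in; the `build_command` helper and the `filter_intervals.py` script are hypothetical.

```python
import shlex
from string import Template


def build_command(template: str, values: dict) -> str:
    """Fill a command-line template, shell-quoting every substituted value."""
    quoted = {name: shlex.quote(str(value)) for name, value in values.items()}
    # substitute() raises KeyError if the template names a missing parameter.
    return Template(template).substitute(quoted)


command = build_command(
    "filter_intervals.py --input $input --min-length $min_len --output $output",
    {"input": "regions.bed", "min_len": 100, "output": "filtered.bed"},
)
```

Quoting each value before substitution keeps user-supplied input such as file names with spaces from being split or interpreted by the shell.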

  37. Functional tests to be run with the “full stack” in place
  38. Dealing with more complex interface needs

  39. None
  40. None
  41. Repeating sets of parameters

  42. Template language for building complex command lines

  43. None
  44. None
  45. None
  46. Conditional groups; grouping constructs can be nested

  47. Command line tool expects a configuration file

  48. Configuration file is generated based on user input
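
For a tool that reads its settings from a file rather than from arguments, the generation step might look something like the sketch below. The INI-style format, the `render_config` helper, and the section and key names are assumptions made for illustration; the slides do not show which format the tool in question expects.

```python
import configparser
import io


def render_config(section: str, values: dict) -> str:
    """Serialize user-supplied parameter values as an INI-style config file."""
    # interpolation=None so that "%" in user input is written out literally.
    parser = configparser.ConfigParser(interpolation=None)
    parser[section] = {key: str(value) for key, value in values.items()}
    buffer = io.StringIO()
    parser.write(buffer)
    return buffer.getvalue()


config_text = render_config("analysis", {"replicates": 100, "method": "neighbor_joining"})
```

The generated text would then be written to a temporary file whose path is passed to the command-line tool.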

  49. Galaxy tool suites
    • Out of the box
      • Operations on genomic intervals
      • Extracting regions from and manipulating genome-wide multiple sequence alignments
      • General text manipulation, filtering, sorting, grouping, graphing
      • EMBOSS
    • In development
      • Statistical genetics suite based on RGenetics
      • Phylogenetic analysis based on HyPhy
  50. Galaxy tool suites
    • Out of the box
      • Operations on genomic intervals
      • Extracting regions from and manipulating genome-wide multiple sequence alignments
      • General text manipulation, filtering, sorting, grouping, graphing
      • EMBOSS
    • In development
      • Statistical genetics suite based on RGenetics
      • Phylogenetic analysis based on HyPhy
    To learn more about Galaxy’s tools for working with alignments, see Dan Blankenberg, Poster #19
  51. None
  52. Dataflow in Galaxy
    • Tools in Galaxy have a well-defined abstract interface
    • Tools generate datasets of specific types (history items), which are used as the inputs of other tools
    • Can almost always determine the inputs and outputs of a tool (given the values of certain parameters for more complex tools)
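
One way to picture the dataflow slide: if every tool declares the dataset types it consumes and produces, checking whether one tool's output can feed another is a simple type comparison. The sketch below is illustrative only; the `Tool` class, `can_connect`, and the tool and type names are hypothetical and do not reflect Galaxy's internal data model.

```python
from dataclasses import dataclass


@dataclass
class Tool:
    """Hypothetical abstract tool interface: typed, named inputs and outputs."""
    name: str
    inputs: dict    # input name -> dataset type it accepts
    outputs: dict   # output name -> dataset type it produces


def can_connect(upstream: Tool, output: str, downstream: Tool, input_name: str) -> bool:
    """True if the upstream output's dataset type matches the downstream input's type."""
    return upstream.outputs[output] == downstream.inputs[input_name]


extract = Tool("extract_alignments", {"regions": "interval"}, {"blocks": "maf"})
to_fasta = Tool("maf_to_fasta", {"alignment": "maf"}, {"sequences": "fasta"})

ok = can_connect(extract, "blocks", to_fasta, "alignment")
```

A workflow editor could run a check like this whenever the user draws a connection between two tools.
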
  53. Reusable analysis workflows

  54. Workflow editor

  55. None
  56. None
  57. None
  58. None
  59. None
  60. None
  61. Workflow construction by example

  62. None
  63. None
  64. None
  65. None
  66. None
  67. Workflow support future
    • Intuitive interfaces for running workflows and configuring runtime parameterization
    • Control flow constructs to allow dealing with ambiguities that are not resolved until runtime
    • Support for repetitive invocation of tools and workflows, and aggregation of results
    • Saving and sharing of workflows (reproducibility!)
    • Dealing with changes to tool interfaces
    • Running workflows without the web interface
  68. None
  69. Acknowledgements
    • UCSC Genome Browser team
    • Biomart team
    • GMOD team
    • National Science Foundation
  70. The Galaxy Team
    Guru Ananda, Dan Blankenberg, Nate Coraor, Jianbin He, Greg von Kuster, Ross Lazarus (Harvard), Anton Nekrutenko, James Taylor (NYU)