complex methods, make these methods available to everyone. Transparency: Facilitate communication of analyses and results in ways that are easy to understand while providing all details. Reproducibility*: Ensure that analyses performed in the system can be reproduced precisely and practically. *The state of which is still frighteningly bad; see doi:10.1038/nrg3305, doi:10.7717/peerj.148
tools, compute resources, terabytes of reference data and permanent storage. Open source software that makes integrating your own tools and data and customizing for your own site simple. An open extensible platform for sharing tools, datatypes, workflows, ...
applications in parallel (one per input). Merged output for subsequent processing. Dataset collections: map/reduce workflows over 1000s of datasets. Interactive tours for building realtime interactive training. Interactive Environments: custom analysis in Galaxy workflows using Jupyter, …
Galaxy can be easily customized to the needs of different types of analyses by assembling different tools, workflows, visualizations, datasets… For this to work, we need to make it as easy as possible for developers to integrate and share tools.
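The map/reduce pattern over a dataset collection can be sketched in plain Python. This is a minimal illustration of the idea only, not Galaxy's implementation or API: the names `count_lines`, `map_over_collection` and `reduce_collection` are hypothetical, and a dict of strings stands in for a collection of datasets.

```python
from concurrent.futures import ThreadPoolExecutor


def count_lines(dataset: str) -> int:
    """Hypothetical per-dataset 'tool': count the non-empty lines."""
    return sum(1 for line in dataset.splitlines() if line.strip())


def map_over_collection(tool, collection):
    """Run `tool` once per element, in parallel, keeping element names."""
    names = list(collection)
    with ThreadPoolExecutor() as pool:
        results = list(pool.map(tool, (collection[name] for name in names)))
    return dict(zip(names, results))


def reduce_collection(mapped):
    """Merge the per-element outputs into one value for later steps."""
    return sum(mapped.values())


# Stand-in collection: element name -> dataset contents (made-up data).
collection = {
    "sample_a": "read1\nread2\n",
    "sample_b": "read1\n\nread2\nread3\n",
}
mapped = map_over_collection(count_lines, collection)
total = reduce_collection(mapped)
```

Keeping the element names attached to the mapped outputs is what lets a later step merge or pair results without guessing which output came from which input.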
was very ad hoc. No tracking of wrapper version information in the Galaxy database, no standard way to share. ToolShed enables not just sharing, but global identifiers and versions across all Galaxy instances.
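To illustrate what a global, versioned tool identifier buys, here is a small sketch. The six-part layout `shed/repos/owner/repository/tool/version` is an assumption made for this example, and `ToolRef`, its fields and the example identifiers are hypothetical rather than taken from the ToolShed code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToolRef:
    """One tool at one version, as published on one tool shed."""
    shed: str
    owner: str
    repository: str
    tool: str
    version: str

    @classmethod
    def parse(cls, guid: str) -> "ToolRef":
        # Assumed layout: shed/repos/owner/repository/tool/version
        parts = guid.split("/")
        if len(parts) != 6 or parts[1] != "repos":
            raise ValueError(f"unexpected identifier layout: {guid!r}")
        shed, _, owner, repository, tool, version = parts
        return cls(shed, owner, repository, tool, version)

    def same_tool(self, other: "ToolRef") -> bool:
        """True when both refs name the same tool, whatever the version."""
        return (self.shed, self.owner, self.repository, self.tool) == (
            other.shed, other.owner, other.repository, other.tool)


# Hypothetical identifiers, for illustration only.
old = ToolRef.parse("toolshed.example.org/repos/some_owner/some_repo/some_tool/1.0.0")
new = ToolRef.parse("toolshed.example.org/repos/some_owner/some_repo/some_tool/1.1.0")
```

Because the version is part of the identifier, a workflow recorded on one Galaxy instance can name exactly the wrapper it used, and another instance can tell whether it has that wrapper or only a different version of the same tool.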
of software packages and their dependencies and switching easily between them”. ~2000 recipes for software packages*. All packages are built in a minimal environment to ensure isolation and portability. *Not even including different versions!
the kernel level up. Containers: lightweight environments with isolation enforced at the OS level, complete control over all software. Adds a complete ecosystem for sharing, versioning, managing containers, e.g. Docker Hub
in Conda/Bioconda, we can build a container with just that software on a minimal base image. If we use the same base image, we can reconstruct exactly the same container (since we archive all binary builds of all versions). With automation, these containers can be built for every package with no manual modification or intervention (e.g. mulled).
genome projects. Many relevant tools are wrapped and can now be easily used in a fully reproducible way. Custom data can easily be added to the system using “Data Managers”. Custom genomes can be added on the fly and then used in any of Galaxy’s genome analysis tools.
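The reasoning above (pinned package plus fixed base image gives a reconstructible container) can be sketched as a function that renders a container recipe from its inputs. This is not how mulled works internally. The function names, the base image name and the version and build strings below are hypothetical placeholders, and the base image is assumed to already provide `conda`.

```python
import hashlib


def render_dockerfile(base_image: str, package: str, version: str, build: str) -> str:
    """Render a recipe that installs exactly one pinned Conda build."""
    # name=version=build pins one specific binary build of the package.
    spec = f"{package}={version}={build}"
    lines = [
        f"FROM {base_image}",
        "RUN conda install --yes --channel conda-forge --channel bioconda "
        f"'{spec}'",
    ]
    return "\n".join(lines) + "\n"


def recipe_digest(dockerfile: str) -> str:
    """Fingerprint of the recipe text, usable to compare two recipes."""
    return hashlib.sha256(dockerfile.encode("utf-8")).hexdigest()


# Placeholder inputs: not a real image name, version or build string.
BASE = "example.org/minimal-conda-base:1.0"
first = render_dockerfile(BASE, "samtools", "1.0", "0")
second = render_dockerfile(BASE, "samtools", "1.0", "0")
other_build = render_dockerfile(BASE, "samtools", "1.0", "1")
```

Identical inputs give identical recipe text. That is necessary but not sufficient for identical images: it also depends on the base image being fixed and on every pinned binary build staying archived, as the slide notes.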
Martin Čech, John Chilton, Dave Clements, Nate Coraor, Carl Eberhard, Jeremy Goecks, Björn Grüning, Sam Guerler, Mo Heydarian, Jennifer Hillman-Jackson, Anton Nekrutenko, Eric Rasche, Nicola Soranzo, Marius van den Beek. JHU Data Science: Jeff Leek, Roger Peng, … Jetstream: Craig Stewart, Ian Foster, Matthew Vaughn, Nirav Merchant. BioConda: Johannes Köster, Björn Grüning, Ryan Dale, Chris Tomkins-Tinch, Brad Chapman, … Other lab members: Boris Brenerman, Min Hyung Cho, Peter DeFord, German Uritskiy, Mallory Freeberg. NHGRI (HG005133, HG004909, HG005542, HG005573, HG006620), NIDDK (DK065806) and NSF (DBI 0543285, DBI 0850103)