Nextflow workshop '17: Lessons learned and new challenges
In this presentation I gave a quick overview of the state of the Nextflow project and some of the new features we are planning to implement in upcoming releases.
DATAFLOW • Declarative computational model for concurrent processes • Processes wait for data; when an input set is ready, the process is executed • They communicate through dataflow variables, i.e. asynchronous streams of data called channels • Parallelisation and task dependencies are implicitly defined by process in/out declarations
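The dataflow model above can be illustrated with a minimal Python sketch. This is not Nextflow code and not how Nextflow is implemented: `asyncio.Queue` stands in for a channel, and the names `source`, `process`, `collect`, `to_upper`, `count_chars` and `run_pipeline` are hypothetical. It only shows the idea that each process waits on its input channel and that the execution order follows from how the channels are wired, not from an explicit schedule.

```python
import asyncio

DONE = object()  # sentinel marking the end of a channel (an assumption of this sketch)

async def source(out_ch, items):
    """Emit each item into the channel, then signal completion."""
    for item in items:
        await out_ch.put(item)
    await out_ch.put(DONE)

async def process(in_ch, out_ch, fn):
    """Wait for data on in_ch; run fn on each input as soon as it arrives.

    Inputs are handled one at a time within a stage here; a real workflow
    engine could run several tasks of the same process in parallel.
    """
    while True:
        item = await in_ch.get()
        if item is DONE:
            await out_ch.put(DONE)
            return
        await out_ch.put(await fn(item))

async def collect(in_ch):
    """Drain a channel into a list."""
    results = []
    while True:
        item = await in_ch.get()
        if item is DONE:
            return results
        results.append(item)

async def to_upper(seq):
    await asyncio.sleep(0)  # yield control, as a real task would while doing I/O
    return seq.upper()

async def count_chars(seq):
    await asyncio.sleep(0)
    return (seq, len(seq))

async def main():
    reads, upper, counted = asyncio.Queue(), asyncio.Queue(), asyncio.Queue()
    # All stages start together; dependencies come only from the channel wiring.
    results = await asyncio.gather(
        source(reads, ["acgt", "ttga", "cc"]),
        process(reads, upper, to_upper),
        process(upper, counted, count_chars),
        collect(counted),
    )
    return results[-1]  # the list returned by collect()

def run_pipeline():
    return asyncio.run(main())
```

Calling `run_pipeline()` returns the pairs produced by the second stage. Note that the second stage can begin work on the first item while the first stage is still processing later ones.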
A DATA ANALYSIS PRODUCES DIFFERENT RESULTS ON DIFFERENT SYSTEMS GIVEN THE SAME INPUTS P. Di Tommaso, et al. (2017). Nextflow enables reproducible computational workflows. Nature Biotechnology, 35(4), 316–319. doi:10.1038/nbt.3820
CONTAINERISATION • Nextflow envisioned the use of software containers to fix computational reproducibility • Mar 2014 (ver 0.7), support for Docker • Dec 2016 (ver 0.23), support for Singularity • [Diagram: Nextflow and three jobs]
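One way to picture per-task container isolation is a helper that wraps a task command in a `docker run` invocation. The sketch below is an illustration only and does not reproduce the command line Nextflow actually generates; the function `docker_wrap`, the image name `example/tools:1.0` and the work directory are hypothetical. The Docker flags used (`--rm`, `-v`, `-w`) are standard `docker run` options.

```python
import shlex

def docker_wrap(command, image, workdir):
    """Return a `docker run` argument list that executes `command` inside `image`.

    The task work directory is mounted at the same path inside the container,
    so files written by the task are visible on the host afterwards.
    """
    return [
        "docker", "run", "--rm",        # remove the container when the task ends
        "-v", f"{workdir}:{workdir}",   # share the work directory with the host
        "-w", workdir,                  # run the command from that directory
        image,
        "bash", "-c", command,
    ]

# Hypothetical task: image name and paths are placeholders.
cmd = docker_wrap(
    "samtools --version > version.txt",
    "example/tools:1.0",
    "/tmp/work/ab12",
)
print(shlex.join(cmd))
```

Because the tools live in the image instead of on the host, the same task command can run unchanged on any system that has a container runtime.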
GOLDEN RULES FOR REPRODUCIBILITY • Use Nextflow (obviously...) • Publish your pipeline project on GitHub from day one • Isolate the pipeline tools using a Docker container • Create a small dataset to quickly test your scripts and include it as default data in your project • Use a CI server (e.g. Travis) to test every change promptly
10x unique IPs downloading NF over the last year (!) • [Chart: unique IPs per month, Aug '16 to Aug '17; y-axis from 0 to 3,000]
WORKFLOW COMPOSITION • Allows the creation of a workflow by composing other NF workflows • Top requested feature • Challenging to implement • [Diagram labels: workflow A, task, workflow B, input, results]
AWS BATCH • Managed container-based computing environment in the Amazon cloud • Already integrated with Nextflow, under test • In collaboration with Francesco Strozzi
GA4GH • Participate in the Containers and Workflows working group • Task Execution API (working prototype) • Workflow Execution API • Enable interoperability with GA4GH-compliant platforms, e.g. Cancer Genomics Cloud and Broad FireCloud