Nextflow workshop '17: Lessons learned and new challenges

Paolo Di Tommaso
September 14, 2017

In this presentation I gave a quick overview of the state of the Nextflow project and some of the new features we are planning to implement in upcoming releases.

  1. DATAFLOW • Declarative computational model for concurrent processes • Processes wait for data; when an input set is ready, the process is executed • They communicate by using dataflow variables, i.e. asynchronous streams of data called channels • Parallelisation and task dependencies are implicitly defined by process in/out declarations

    THE SAME INPUTS P. Di Tommaso, et al. (2017). Nextflow enables reproducible computational workflows. Nature Biotechnology, 35(4), 316–319. doi:10.1038/nbt.3820
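
The dataflow model on the first slide can be illustrated with a small sketch. This is not Nextflow code and not how Nextflow is implemented: it is plain Python using threads and queues, and the names `process`, `run_pipeline` and the `DONE` sentinel are hypothetical, introduced only for illustration. Each "process" blocks until an item arrives on its input channel, and the dependency between the two steps is never stated explicitly; it follows from the channel they share.

```python
import queue
import threading

# Sentinel marking the end of a channel (an assumption of this sketch).
DONE = object()


def process(func, inbox, outbox):
    """Start a worker that applies `func` to every item arriving on `inbox`."""
    def worker():
        while True:
            item = inbox.get()        # block until an input is ready
            if item is DONE:
                outbox.put(DONE)      # propagate end-of-stream downstream
                return
            outbox.put(func(item))    # emit the result on the output channel

    thread = threading.Thread(target=worker)
    thread.start()
    return thread


def run_pipeline(values):
    """Push `values` through two chained processes and collect the results."""
    source, middle, sink = queue.Queue(), queue.Queue(), queue.Queue()

    # No explicit ordering is declared: the second process depends on the
    # first only because it reads the channel the first one writes to.
    threads = [
        process(lambda x: x * 2, source, middle),
        process(lambda x: x + 1, middle, sink),
    ]

    for value in values:
        source.put(value)
    source.put(DONE)

    results = []
    while (item := sink.get()) is not DONE:
        results.append(item)

    for thread in threads:
        thread.join()
    return results


if __name__ == "__main__":
    print(run_pipeline([1, 2, 3]))  # [3, 5, 7]
```

The sketch runs one worker per process, so items stay in order; a real workflow engine would typically run many tasks of the same process in parallel.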
  3. CONTAINERISATION • Nextflow envisioned the use of software containers to fix computational reproducibility • Mar 2014 (ver 0.7), support for Docker • Dec 2016 (ver 0.23), support for Singularity
  4. GOLDEN RULES FOR REPRODUCIBILITY • Use Nextflow (obviously...) • Publish your pipeline project on GitHub from day one • Isolate the pipeline tools using a Docker container • Create a small dataset to quickly test your scripts and include it as default data in your project • Use a CI server (e.g. Travis) to test every change promptly
  5. STATE OF THE PROJECT • Started in March 2013 • ~ 65k lines of code • ~ 370 stars and 70 forks on GitHub • ~ 4,000 downloads / month from Maven
  6. 10x unique IPs downloading NF over the last year (!) • [Chart: unique IPs per month, Aug '16 to Aug '17; y-axis from 0 to 3,000]
  7. WORKFLOW COMPOSITION • Allows the creation of a workflow by composing other NF workflows • Top requested feature • Challenging to implement • [Diagram labels: workflow A, task, workflow B, input, results]
  8. IMPROVE CLOUD SUPPORT • Target all major cloud providers (AWS, Azure, Google, OpenStack) • NoOps approach, i.e. deploy transient clusters on demand • Optimise remote storage usage and caching
  9. AWS BATCH • Managed container-based computing environment in the Amazon cloud • Already integrated with Nextflow, under test • In collaboration with Francesco Strozzi
  10. GA4GH • Participate in the Containers and Workflows working group • Task Execution API (working prototype) • Workflow Execution API • Enable interoperability with GA4GH-compliant platforms, e.g. Cancer Genomics Cloud and Broad FireCloud
  11. KUBERNETES • Cloud-agnostic container clustering and management • NF includes an experimental Kubernetes executor • [Diagram: k8s cluster with master, workers, NF driver and shared storage] • Add support for Kubernetes Persistent Volumes • [Diagram: the same k8s cluster, with a persistent volume]