Paolo Di Tommaso
September 14, 2017

Nextflow workshop '17: Lessons learned and new challenges

In this presentation I gave a quick overview of the state of the Nextflow project and some of the new features we are planning to implement in upcoming releases.

Transcript

  1. DATAFLOW
     • Declarative computational model for concurrent processes
     • Processes wait for data; when an input set is ready, the process is executed
     • They communicate through dataflow variables, i.e. asynchronous streams of data called channels
     • Parallelisation and task dependencies are implicitly defined by the process in/out declarations
  2. A DATA ANALYSIS PRODUCES DIFFERENT RESULTS ON DIFFERENT SYSTEMS GIVEN THE SAME INPUTS
     P. Di Tommaso, et al. (2017). Nextflow enables reproducible computational workflows. Nature Biotechnology, 35(4), 316–319. doi:10.1038/nbt.3820
  3. CONTAINERISATION
     • Nextflow envisioned the use of software containers to fix computational reproducibility
     • Mar 2014 (ver 0.7), support for Docker
     • Dec 2016 (ver 0.23), support for Singularity
     [Diagram labels: Nextflow; job (x3)]
  4. GOLDEN RULES FOR REPRODUCIBILITY
     • Use Nextflow (obviously...)
     • Publish your pipeline project on GitHub from day one
     • Isolate the pipeline tools using a Docker container
     • Create a small dataset to quickly test your scripts and include it as the default data in your project
     • Use a CI server (e.g. Travis) to test every change promptly
  5. STATE OF THE PROJECT
     • Started in March 2013
     • ~65k lines of code
     • ~370 stars and 70 forks on GitHub
     • ~4,000 downloads/month from Maven
  6. 10x unique IPs downloading NF over the last year (!)
     [Chart: unique IPs per month, Aug '16 to Aug '17; y-axis from 0 to 3,000]
  7. WORKFLOW COMPOSITION
     • Allows the creation of a workflow by composing other NF workflows
     • Top requested feature
     • Challenging to implement
     [Diagram labels: workflow A; task; workflow B; input; results]
  8. IMPROVE CLOUD SUPPORT
     • Target all major cloud providers (AWS, Azure, Google, OpenStack)
     • NoOps approach, i.e. deploy transient clusters on demand
     • Optimise remote storage usage and caching
  9. AWS BATCH
     • Managed container-based computing environment in the Amazon cloud
     • Already integrated with Nextflow, under test
     • In collaboration with Francesco Strozzi
  10. GA4GH
     • Participate in the Containers and Workflows working group
     • Task Execution API (working prototype)
     • Workflow Execution API
     • Enable interoperability with GA4GH-compliant platforms, e.g. Cancer Genomics Cloud and Broad FireCloud
  11. KUBERNETES
     • Cloud-agnostic container clustering and management
     • NF includes an experimental Kubernetes executor
     [Diagram labels: k8s cluster; master; worker (x3); NF driver; shared storage]
     • Add support for Kubernetes Persistent Volumes
     [Diagram labels: k8s cluster; master; worker (x3); NF driver; shared storage; persistent volume]
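
The dataflow model described in slide 1 can be illustrated with a small sketch. This is not Nextflow code and not how Nextflow is implemented; it is a plain Python stand-in in which queues play the role of channels and threads play the role of processes. The names `process`, `run_pipeline` and the `DONE` sentinel are hypothetical and exist only for this illustration.

```python
import queue
import threading

# Sentinel marking the end of a channel (an assumption of this sketch).
DONE = object()


def process(func, inbox, outbox):
    """Start a worker that waits on `inbox` and emits func(item) to `outbox`."""
    def run():
        while True:
            item = inbox.get()        # block until an input is ready
            if item is DONE:
                outbox.put(DONE)      # propagate end-of-stream downstream
                return
            outbox.put(func(item))    # executing the task produces an output
    worker = threading.Thread(target=run)
    worker.start()
    return worker


def run_pipeline(values):
    """Wire two processes together through channels and collect the results."""
    numbers, squares, labels = queue.Queue(), queue.Queue(), queue.Queue()
    # The dependency between the two steps is implied by the shared channel
    # `squares`: it is the output of the first and the input of the second.
    workers = [
        process(lambda n: n * n, numbers, squares),
        process(lambda n: f"result={n}", squares, labels),
    ]
    for value in values:
        numbers.put(value)
    numbers.put(DONE)

    results = []
    while True:
        item = labels.get()
        if item is DONE:
            break
        results.append(item)
    for worker in workers:
        worker.join()
    return results


if __name__ == "__main__":
    print(run_pipeline([1, 2, 3]))  # ['result=1', 'result=4', 'result=9']
```

Neither step is called explicitly by the other: each one runs whenever its input channel delivers a value, so the second step can start on the first result while the first step is still working on later inputs.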