
Nextflow workshop '17: Lessons learned and new challenges

Paolo Di Tommaso
September 14, 2017

In this presentation I gave a quick overview of the state of the Nextflow project and some of the new features we are planning to implement in upcoming releases.

Transcript

  1. LESSONS LEARNED AND NEW CHALLENGES Paolo Di Tommaso, Notredame Lab, CRG, 14 Sep 2017
  2. None
  3. HOW IS NEXTFLOW DIFFERENT?

  4. PORTABILITY

  5. SCALABILITY

  6. DATAFLOW • Declarative computational model for concurrent processes • Processes wait for data; when an input set is ready, the process is executed • They communicate by using dataflow variables, i.e. asynchronous streams of data called channels • Parallelisation and task dependencies are implicitly defined by process in/out declarations
  7. REPRODUCIBILITY

  8. A DATA ANALYSIS PRODUCES DIFFERENT RESULTS ON DIFFERENT SYSTEMS GIVEN THE SAME INPUTS • P. Di Tommaso, et al. (2017). Nextflow enables reproducible computational workflows. Nature Biotechnology, 35(4), 316–319. doi:10.1038/nbt.3820
  9. CONTAINERISATION

  10. None
  11. CONTAINERISATION • Nextflow envisioned the use of software containers to fix computational reproducibility • Mar 2014 (ver 0.7), support for Docker • Dec 2016 (ver 0.23), support for Singularity [Diagram: Nextflow, job, job, job]
  12. Orchestration & Parallelisation • Scalability & Portability • Deployment & Reproducibility [Diagram labels: containers, Git, GitHub]
  13. PUSH-THE-BUTTON REPRODUCIBILITY

  14. GOLDEN RULES FOR REPRODUCIBILITY • Use Nextflow (obviously) • Publish your pipeline project on GitHub from day one • Isolate the pipeline tools using a Docker container • Create a small dataset to quickly test your scripts and include it as default data in your project • Use a CI server (e.g. Travis) to test every change promptly
  15. STATE OF THE PROJECT • Started in March 2013 • ~65k lines of code • ~370 stars and 70 forks on GH • ~4,000 downloads / month from Maven
  16. 10x unique IPs downloading NF over the last year (!) [Chart: unique IPs per month, Aug '16 to Aug '17; y-axis 0 to 3,000]
  17. WHAT'S NEXT

  18. WORKFLOW COMPOSITION • Allows the creation of a workflow by composing other NF workflows • Top requested feature • Challenging to implement [Diagram: workflow A, task, workflow B, input, results]
  19. IMPROVE CLOUD SUPPORT • Target all major cloud providers (AWS, Azure, Google, OpenStack) • NoOps approach, i.e. deploy transient clusters on demand • Optimise remote storage usage and caching
  20. AWS BATCH • Managed container-based computing environment in the Amazon cloud • Already integrated with Nextflow, under test • In collaboration with Francesco Strozzi
  21. GA4GH • Participate in the Containers and Workflows working group • Task Execution API (working prototype) • Workflow Execution API • Enable interoperability with GA4GH-compliant platforms, e.g. Cancer Genomics Cloud and Broad FireCloud
  22. KUBERNETES • Cloud-agnostic container clustering and management • NF includes an experimental Kubernetes executor • Add support for Kubernetes Persistent Volumes [Diagrams: k8s cluster with master, three workers, NF driver and shared storage; the second diagram adds a persistent volume]
  23. None
  24. ACKNOWLEDGEMENTS Evan Floden, Emilio Palumbo, Cedric Notredame, The Nextflow community
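
The dataflow model described on slide 6 can be sketched in a few lines. What follows is a minimal Python stand-in, not Nextflow code and not a description of how Nextflow is implemented. It assumes channels can be approximated by FIFO queues and processes by threads that block until an input arrives. The names `process`, `run_pipeline` and `DONE` are hypothetical and chosen only for this illustration.

```python
import queue
import threading

# Hypothetical end-of-stream marker for this sketch (not a Nextflow concept).
DONE = object()


def process(fn, inp, out):
    """Start a worker that applies fn to each item received on `inp`
    and emits the result on `out`. Returns the started thread."""
    def run():
        while True:
            item = inp.get()      # block until an input is ready
            if item is DONE:
                out.put(DONE)     # propagate end-of-stream downstream
                return
            out.put(fn(item))

    worker = threading.Thread(target=run)
    worker.start()
    return worker


def run_pipeline(samples):
    """Wire two processes together through channels and collect the output."""
    reads, trimmed, counted = queue.Queue(), queue.Queue(), queue.Queue()

    # The dependency between the two steps is never stated explicitly:
    # it follows from the first step's output being the second step's input.
    workers = [
        process(str.strip, reads, trimmed),
        process(len, trimmed, counted),
    ]

    for sample in samples:
        reads.put(sample)
    reads.put(DONE)

    results = []
    while True:
        item = counted.get()
        if item is DONE:
            break
        results.append(item)

    for worker in workers:
        worker.join()
    return results
```

Calling `run_pipeline([" ACGT ", "AC\n"])` strips each string and then measures its length. Both steps run concurrently, and the second starts working as soon as the first emits its first result, which is the behaviour the slide attributes to processes connected by channels.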