Nextflow workshop '17: Lessons learned and new challenges

Paolo Di Tommaso
September 14, 2017

In this presentation I gave a quick overview of the state of the Nextflow project and some of the new features we are planning to implement in upcoming releases.

Transcript

  1. LESSONS LEARNED AND
    NEW CHALLENGES
    Paolo Di Tommaso
    Notredame Lab, CRG
    14 Sep 2017

  2. [Slide with no transcribed text]

  3. HOW IS NEXTFLOW
    DIFFERENT?

  4. PORTABILITY

  5. SCALABILITY

  6. DATAFLOW
    • Declarative computational model for concurrent processes
    • Processes wait for data; when an input set is ready, the
    process is executed
    • They communicate through dataflow variables, i.e.
    asynchronous streams of data called channels
    • Parallelisation and task dependencies are implicitly defined
    by process in/out declarations
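
The dataflow model in the bullets above can be illustrated with a minimal Python sketch. It is not Nextflow code and does not reflect Nextflow's internals: the `process` helper, the `DONE` sentinel and the toy sequences are assumptions made up for illustration, with standard-library queues standing in for channels. Each process simply blocks until an input arrives, so the execution order follows from how the channels are wired, not from an explicit schedule.

```python
import queue
import threading

DONE = object()  # hypothetical end-of-stream marker for this sketch


def process(func, inbox, outbox):
    """Apply func to every item arriving on inbox and emit results on outbox."""
    def run():
        while True:
            item = inbox.get()      # block until an input is ready
            if item is DONE:
                outbox.put(DONE)    # propagate end-of-stream downstream
                return
            outbox.put(func(item))
    worker = threading.Thread(target=run)
    worker.start()
    return worker


# Three channels connect two processes; no execution order is declared.
reads, trimmed, lengths = queue.Queue(), queue.Queue(), queue.Queue()
workers = [
    process(lambda s: s.strip("N"), reads, trimmed),  # toy "trim" step
    process(len, trimmed, lengths),                   # toy "measure" step
]

for seq in ["NNACGTNN", "ACGTAC", "NGGN"]:
    reads.put(seq)
reads.put(DONE)

for w in workers:
    w.join()

# Drain the final channel into a list.
results = []
while True:
    item = lengths.get()
    if item is DONE:
        break
    results.append(item)
print(results)
```

The second step starts consuming as soon as the first emits an item, which is the sense in which task dependencies are implied by the in/out wiring. This sketch runs one worker per process; it does not show the per-task parallelism a real workflow engine would add.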

  7. REPRODUCIBILITY

  8. A DATA ANALYSIS PRODUCES
    DIFFERENT RESULTS ON DIFFERENT
    SYSTEMS GIVEN THE SAME INPUTS
    P. Di Tommaso, et al. (2017). Nextflow enables reproducible computational workflows.
    Nature Biotechnology, 35(4), 316–319. doi:10.1038/nbt.3820

  9. CONTAINERISATION

  10. [Slide with no transcribed text]

  11. CONTAINERISATION
    • Nextflow envisioned the use
    of software containers to fix
    computational reproducibility
    • Mar 2014 (ver 0.7), support
    for Docker
    • Dec 2016 (ver 0.23), support
    for Singularity
    [Diagram labels: Nextflow, job, job, job]
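
As a rough illustration of running each job inside a container, the sketch below builds a `docker run` command line for a single task. It is an assumption-laden example, not the command Nextflow actually generates: the `docker_command` helper, the image name, the script and the work directory path are all hypothetical. Only standard `docker run` options are used (`--rm`, `-v`, `-w`).

```python
import shlex


def docker_command(image, script, workdir):
    """Build a `docker run` argument list that executes script inside image."""
    return [
        "docker", "run", "--rm",         # remove the container when the task ends
        "-v", f"{workdir}:{workdir}",    # mount the task directory at the same path
        "-w", workdir,                   # run the script from the task directory
        image,
        "bash", "-c", script,
    ]


# Hypothetical task: image, script and path are placeholders.
cmd = docker_command("ubuntu:16.04", "echo hello > out.txt", "/tmp/work/ab12")
print(shlex.join(cmd))
```

Because the tools come from the image and not from the host, the same task definition can run unchanged on any system with a container runtime.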

  12. Orchestration & Parallelisation
    Scalability & Portability
    Deployment & Reproducibility
    containers, Git, GitHub

  13. PUSH-THE-BUTTON
    REPRODUCIBILITY

  14. GOLDEN RULES FOR
    REPRODUCIBILITY
    • Use Nextflow (obviously...)
    • Publish your pipeline project from day one on GitHub
    • Isolate the pipeline tools using a Docker container
    • Create a small dataset to quickly test your scripts and
    include it as default data in your project
    • Use a CI server (e.g. Travis) to test every change promptly

  15. STATE OF THE PROJECT
    • Started in March 2013
    • ~ 65k lines of code
    • ~ 370 stars and 70 forks on GH
    • ~ 4,000 downloads / month from Maven

  16. 10x unique IPs downloading NF over the last year (!)
    [Chart: unique IPs per month, Aug '16 to Aug '17; y-axis from 0 to 3,000]

  17. WHAT'S NEXT

  18. WORKFLOW COMPOSITION
    • Allows the creation of a
    workflow by composing other
    NF workflows
    • Top requested feature
    • Challenging to implement
    [Diagram labels: workflow A, task, workflow B, input, results]
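
One way to picture workflow composition is to treat each workflow as a function from inputs to results and chain them. The sketch below is only an analogy in Python, not the design planned for Nextflow: `compose`, `workflow_a` and `workflow_b` are hypothetical names invented for this example.

```python
def compose(*workflows):
    """Return a workflow that feeds each workflow's results into the next."""
    def combined(inputs):
        data = inputs
        for wf in workflows:
            data = wf(data)
        return data
    return combined


def workflow_a(samples):
    # Hypothetical sub-workflow: normalise sample names.
    return [s.lower() for s in samples]


def workflow_b(samples):
    # Hypothetical sub-workflow: compute the name length of each sample.
    return {s: len(s) for s in samples}


pipeline = compose(workflow_a, workflow_b)
result = pipeline(["SampleA", "S2"])
print(result)
```

The composed `pipeline` behaves like a single workflow from the caller's point of view, which is the property that makes reuse of existing workflows attractive.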

  19. IMPROVE CLOUD SUPPORT
    • Target all major cloud providers (AWS, Azure,
    Google, OpenStack)
    • NoOps approach, i.e. deploy transient clusters on
    demand
    • Optimise remote storage usage and caching

  20. AWS BATCH
    • Managed container-based computing environment
    in the Amazon cloud
    • Already integrated with Nextflow, under test
    • In collaboration with Francesco Strozzi

  21. GA4GH
    • Participate in the Containers and Workflows working group
    • Task Execution API (working prototype)
    • Workflow Execution API
    • Enable interoperability with GA4GH-compliant
    platforms, e.g. Cancer Genomics Cloud and Broad
    FireCloud

  22. KUBERNETES
    • Cloud-agnostic container clustering and management
    • NF includes an experimental Kubernetes executor
    [Diagram: k8s cluster with master, three workers, NF driver and shared storage]
    • Add support for Kubernetes Persistent Volumes
    [Diagram: same k8s cluster, plus a persistent volume]
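
To make the persistent volume idea concrete, the sketch below builds a Kubernetes Pod manifest that runs one task with a shared volume mounted through a PersistentVolumeClaim. It is a hand-written illustration, not the manifest produced by the Nextflow executor: the `task_pod` helper, the pod name, the image, the claim name `nf-data` and the `/work` mount path are all assumptions. The manifest fields themselves (`volumeMounts`, `volumes`, `persistentVolumeClaim.claimName`) are standard Pod API fields.

```python
import json


def task_pod(name, image, command, claim_name, mount_path="/work"):
    """Build a Pod manifest that runs one task with a shared volume mounted."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            "restartPolicy": "Never",  # a finished task should not be restarted
            "containers": [{
                "name": name,
                "image": image,
                "command": ["bash", "-c", command],
                "workingDir": mount_path,
                "volumeMounts": [{"name": "shared", "mountPath": mount_path}],
            }],
            "volumes": [{
                "name": "shared",
                # The claim must already exist in the cluster (assumed here).
                "persistentVolumeClaim": {"claimName": claim_name},
            }],
        },
    }


pod = task_pod("nf-task-1", "ubuntu:16.04", "echo hello > out.txt", "nf-data")
print(json.dumps(pod, indent=2))
```

The printed JSON could be submitted with `kubectl apply -f`; every worker pod that mounts the same claim then sees the same task directory, which is the role of the shared storage in the diagram.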

  23. [Slide with no transcribed text]

  24. ACKNOWLEDGEMENTS
    Evan Floden
    Emilio Palumbo
    Cedric Notredame
    The Nextflow community!