Introduction to Nextflow

Paolo Di Tommaso
January 09, 2015

Nextflow is a fluent DSL modelled around the UNIX pipe concept that simplifies writing parallel and scalable pipelines in a portable manner.

You may reuse your existing scripts and tools and you don't need to learn a new language or API to start using it.

Transcript

  1. Paolo Di Tommaso

    Emilio Palumbo

    Cedric Notredame

    Bioinformatics and Genomics programme

    CRG

    ENCODE AWG

    9 Jan 2015

  2. WHAT IS NEXTFLOW
    A toolkit for parallel and reproducible computational pipelines

  3. PROJECT RATIONALE
    • Fast prototyping

    • Reuse any existing scripts/tools

    • High-level parallelisation model

    • Portable across platforms

    • Enables reproducible pipelines

    • Lightweight and easy to install

  4. HOW IT WORKS
    • The pipeline flow is defined in a declarative manner

    • It is composed of several processes

    • A process is defined by a set of inputs/outputs and a script snippet to be executed

    • Task dependencies and parallelisation are defined implicitly by the input/output declarations

    • Processes communicate using channels
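
The points above can be illustrated with a minimal Python sketch. It is not Nextflow code and does not use any Nextflow API: the names `process`, `drain` and `DONE` are hypothetical, queues stand in for channels, and a thread pool stands in for the executor. Each "process" only declares which channel it reads and which it writes, and one task is started per input item.

```python
from concurrent.futures import ThreadPoolExecutor
from queue import Queue

DONE = object()  # hypothetical sentinel marking the end of a channel


def process(task, source, sink, workers=4):
    """Run `task` once per item read from `source`, writing results to `sink`."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = []
        while (item := source.get()) is not DONE:
            # One independent task per item; tasks may run in parallel.
            futures.append(pool.submit(task, item))
        for future in futures:
            sink.put(future.result())  # results are emitted in submission order
    sink.put(DONE)


def drain(channel):
    """Collect every item from a channel until the end marker."""
    items = []
    while (item := channel.get()) is not DONE:
        items.append(item)
    return items


# Three channels: raw -> upper -> tagged. The second process depends on the
# first only because it reads the channel that the first one writes.
raw, upper, tagged = Queue(), Queue(), Queue()
for name in ["x", "y", "z"]:
    raw.put(name)
raw.put(DONE)

process(str.upper, raw, upper)
process(lambda s: f"data {s}", upper, tagged)
result = drain(tagged)
print(result)
```

In this simplified sketch the first process finishes before the second starts; a real dataflow engine would let both run concurrently as items become available.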

  5. HOW PARALLELISATION WORKS
    (Diagram: data x, data y and data z enter a process through a channel; the process runs task 1, task 2 and task 3, one per item, and emits data x, data y and data z.)

  6. SCATTER-GATHER
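
The slide carries only its title, so the following is a generic Python sketch of the scatter-gather pattern rather than a reconstruction of what was shown. The helper names `scatter` and `scatter_gather` are hypothetical: the input is split into chunks, each chunk is handled by an independent task, and the partial results are merged at the end.

```python
from concurrent.futures import ThreadPoolExecutor


def scatter(items, chunk_size):
    """Split `items` into consecutive chunks of at most `chunk_size` elements."""
    return [items[i:i + chunk_size] for i in range(0, len(items), chunk_size)]


def scatter_gather(items, chunk_size, work, combine):
    """Apply `work` to each chunk in parallel, then merge the partial results."""
    chunks = scatter(items, chunk_size)
    with ThreadPoolExecutor() as pool:
        partials = list(pool.map(work, chunks))  # map keeps the chunk order
    return combine(partials)


# Sum the numbers 1..10 in chunks of three, then add up the partial sums.
total = scatter_gather(list(range(1, 11)), chunk_size=3, work=sum, combine=sum)
print(total)  # 55
```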

  7. WHAT A SCRIPT LOOKS LIKE
    params.blast_db = "$baseDir/blast-db/tiny"
    params.blast_query = "$baseDir/data/sample.fa"
    params.chunk_size = 100

    seq = Channel
            .fromPath(params.blast_query)
            .splitFasta(by: params.chunk_size)

    process blast {
        input:
        file 'seq.fa' from seq

        output:
        file 'out.txt' into result

        script:
        """
        blastp -db $params.blast_db -query seq.fa -outfmt 6 > out.txt
        """
    }

    result.view { it.text }
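
The script above splits the query file into chunks of `params.chunk_size` sequences and feeds each chunk to its own `blast` task. As a rough illustration of that splitting step only, here is a Python sketch. The function `split_fasta` and the sample data are hypothetical, and this is not how Nextflow implements `splitFasta`.

```python
def split_fasta(text, by=1):
    """Yield chunks of FASTA text holding at most `by` records each."""
    records, current = [], []
    for line in text.splitlines():
        # A header line (">...") closes the record collected so far.
        if line.startswith(">") and current:
            records.append("\n".join(current) + "\n")
            current = []
        if line.strip():  # skip blank lines
            current.append(line)
    if current:
        records.append("\n".join(current) + "\n")
    for i in range(0, len(records), by):
        yield "".join(records[i:i + by])


# Made-up three-record input, split into chunks of two records.
sample = ">a\nMKV\n>b\nGGA\nTTC\n>c\nPLK\n"
chunks = list(split_fasta(sample, by=2))
print(len(chunks))  # 2
```

Each element of `chunks` plays the role of one `seq.fa` input file in the script, so the number of chunks determines how many tasks can run in parallel.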

  8. $ nextflow run blast-test.nf

    N E X T F L O W ~ version 0.12.0
    [3d/ec5c2e] Submitted process > blast (1)
    [1f/277042] Submitted process > blast (2)
    [9d/b49472] Submitted process > blast (3)
    [4a/3c2d5e] Submitted process > blast (4)
    [61/7dc8f0] Submitted process > blast (5)

    1ycsB 1YCS:B 100.00 60 0 0 1 60 170 229 3e-42 131
    1ycsB 1ABO:B 24.07 54 39 1 3 56 6 57 4e-05 28.5
    1ycsB 1ABO:A 24.07 54 39 1 3 56 6 57 4e-05 28.5
    1ycsB 1PHT:A 30.43 23 16 0 6 28 10 32 0.013 22.3
    1vie 1VIE:A 100.00 51 0 0 1 51 12 62 1e-35 108
    1pht 1PHT:A 100.00 80 0 0 1 80 5 84 1e-56 164
    1pht 1YCS:B 30.43 23 16 0 6 28 175 197 0.015 23.5
    1pht 1IHF:B 33.33 21 14 0 53 73 60 80 0.75 18.1
    1pht 1IHT:H 32.00 25 17 0 40 64 175 199 4.0 16.5
    1pht 1IHS:H 32.00 25 17 0 40 64 175 199 4.0 16.5

  9. REPRODUCIBILITY
    • Scripts can run on multiple platforms

    • A pipeline project is self-contained

    • Binary dependencies can be deployed with Docker

  10. PLATFORM AGNOSTIC
    (Diagram labels: Nextflow with the local executor on a *nix OS; Nextflow with the grid executor, a cluster engine and NFS; Docker.)

  11. SUPPORTED PLATFORMS

  12. GIT INTEGRATION
    • Handle and track changes in a consistent manner

    • Simplify the sharing of pipeline projects

    • Run and compare results of different revisions/versions

  13. (Slide contains no text)

  14. $ nextflow run cbcrg/grape-nf

  15. CONCLUSION
    • Nextflow allows you to write a pipeline by reusing your existing tools and scripts.

    • Parallelisation is managed automatically by the framework.

    • Pipeline scripts are portable across multiple execution environments.

    • The integration with Docker and Git allows pipeline projects to be shared and replicated with ease.