Introduction to Nextflow

Paolo Di Tommaso
January 09, 2015

Nextflow is a fluent DSL modelled around the UNIX pipe concept that simplifies writing parallel, scalable pipelines in a portable manner.

You may reuse your existing scripts and tools and you don't need to learn a new language or API to start using it.

Transcript

  1. PROJECT RATIONALE

    • Fast prototyping
    • Reuse any existing scripts/tools
    • High-level parallelisation model
    • Portable across platforms
    • Enables reproducible pipelines
    • Lightweight and easy to install
  2. HOW IT WORKS

    • The pipeline flow is defined in a declarative manner
    • It is composed of several processes
    • A process is defined by a set of inputs/outputs and a script snippet to be executed
    • Task dependencies and parallelisation are defined implicitly by the input/output declarations
    • Processes communicate using channels
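
To make the channel idea above concrete, here is a minimal Python sketch of two processes that communicate only through channels. It is a conceptual model, not Nextflow's API or implementation: the `process` and `drain` helpers, the `DONE` end marker and the channel names are all hypothetical. The point it illustrates is that the second process depends on the first only because it reads the channel the first one writes.

```python
import queue
import threading

DONE = object()  # hypothetical end-of-channel marker used by this sketch


def process(script, inputs, outputs):
    """Run `script` on every item read from `inputs`, writing results to `outputs`."""
    def worker():
        while True:
            item = inputs.get()
            if item is DONE:
                outputs.put(DONE)  # propagate the end marker downstream
                return
            outputs.put(script(item))

    thread = threading.Thread(target=worker)
    thread.start()
    return thread


def drain(channel):
    """Collect every item from a channel until the end marker arrives."""
    items = []
    while True:
        item = channel.get()
        if item is DONE:
            return items
        items.append(item)


# Three channels; the wiring below is the only place where dependencies appear.
raw, cleaned, counted = queue.Queue(), queue.Queue(), queue.Queue()

threads = [
    process(str.strip, raw, cleaned),  # reads `raw`, writes `cleaned`
    process(len, cleaned, counted),    # reads `cleaned`, so it follows the first
]

for line in ["  ACGT ", "TT  ", " GATTACA"]:
    raw.put(line)
raw.put(DONE)

lengths = drain(counted)
for t in threads:
    t.join()
print(lengths)  # [4, 2, 7]
```

No ordering between the two processes is written down anywhere; it falls out of which channel each one consumes.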
  3. HOW PARALLELISATION WORKS

    [Diagram labels: channel, process; data x, data y, data z → task 1, task 2, task 3 → data x, data y, data z]
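
The diagram suggests that each item arriving on a channel is handled by its own task. A rough Python sketch of that pattern follows, assuming a thread pool as the executor. The `run_process` helper and the `task` function are hypothetical stand-ins rather than anything taken from Nextflow.

```python
from concurrent.futures import ThreadPoolExecutor


def run_process(task, channel_items, max_workers=3):
    """Launch one task per item and return the results as the output channel."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map runs the tasks concurrently but yields results in input order.
        return list(pool.map(task, channel_items))


def task(data):
    # Stand-in for the script snippet that a process would execute.
    return data.upper()


output_channel = run_process(task, ["data x", "data y", "data z"])
print(output_channel)  # ['DATA X', 'DATA Y', 'DATA Z']
```

The ordered output is a property of `pool.map` in this sketch; a real workflow engine may emit results in completion order instead.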
  4. WHAT A SCRIPT LOOKS LIKE

    params.blast_db = "$baseDir/blast-db/tiny"
    params.blast_query = "$baseDir/data/sample.fa"
    params.chunk_size = 100

    seq = Channel
            .fromPath(params.blast_query)
            .splitFasta(by: params.chunk_size)

    process blast {
        input:
        file 'seq.fa' from seq

        output:
        file 'out.txt' into result

        script:
        """
        blastp -db $params.blast_db -query seq.fa -outfmt 6 > out.txt
        """
    }

    result.view { it.text }
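
In the script above, the query file is split into chunks of `params.chunk_size` sequences, and each chunk becomes the input of one `blast` task. The Python sketch below shows one way such a split could work. The `split_fasta` function is a hypothetical illustration written for this transcript, not the code behind the `splitFasta` operator, and the sample sequences are made up.

```python
def split_fasta(text, by):
    """Yield chunks of FASTA text, each holding at most `by` records."""
    records, current = [], []
    for line in text.splitlines():
        # A header line (">...") starts a new record, so flush the previous one.
        if line.startswith(">") and current:
            records.append("\n".join(current))
            current = []
        if line.strip():
            current.append(line)
    if current:
        records.append("\n".join(current))

    # Group the records into consecutive chunks of `by` records each.
    for start in range(0, len(records), by):
        yield "\n".join(records[start:start + by]) + "\n"


sample = ">seq1\nMKV\n>seq2\nGAT\nTACA\n>seq3\nLLP\n"
chunks = list(split_fasta(sample, by=2))
print(len(chunks))  # 2
```

With three records and `by=2`, the first chunk holds two records and the second holds the remaining one, so two tasks would be submitted.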
  5. $ nextflow run blast-test.nf

    N E X T F L O W  ~  version 0.12.0
    [3d/ec5c2e] Submitted process > blast (1)
    [1f/277042] Submitted process > blast (2)
    [9d/b49472] Submitted process > blast (3)
    [4a/3c2d5e] Submitted process > blast (4)
    [61/7dc8f0] Submitted process > blast (5)

    1ycsB 1YCS:B 100.00 60 0 0 1 60 170 229 3e-42 131
    1ycsB 1ABO:B 24.07 54 39 1 3 56 6 57 4e-05 28.5
    1ycsB 1ABO:A 24.07 54 39 1 3 56 6 57 4e-05 28.5
    1ycsB 1PHT:A 30.43 23 16 0 6 28 10 32 0.013 22.3
    1vie 1VIE:A 100.00 51 0 0 1 51 12 62 1e-35 108
    1pht 1PHT:A 100.00 80 0 0 1 80 5 84 1e-56 164
    1pht 1YCS:B 30.43 23 16 0 6 28 175 197 0.015 23.5
    1pht 1IHF:B 33.33 21 14 0 53 73 60 80 0.75 18.1
    1pht 1IHT:H 32.00 25 17 0 40 64 175 199 4.0 16.5
    1pht 1IHS:H 32.00 25 17 0 40 64 175 199 4.0 16.5
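
The result rows printed by the run come from `blastp -outfmt 6`, whose default layout has twelve columns: qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue and bitscore. As a small aid to reading them, here is a Python sketch that turns one row into named fields. The `Hit` type and `parse_hit` function are hypothetical helpers, not part of BLAST or Nextflow.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    qseqid: str      # query sequence id
    sseqid: str      # subject (database) sequence id
    pident: float    # percentage of identical matches
    length: int      # alignment length
    mismatch: int    # number of mismatches
    gapopen: int     # number of gap openings
    qstart: int      # alignment start in the query
    qend: int        # alignment end in the query
    sstart: int      # alignment start in the subject
    send: int        # alignment end in the subject
    evalue: float    # expect value
    bitscore: float  # bit score


# One converter per column, in the same order as the fields above.
CASTS = (str, str, float, int, int, int, int, int, int, int, float, float)


def parse_hit(line):
    """Convert one whitespace- or tab-separated row into a typed Hit."""
    fields = line.split()
    if len(fields) != len(CASTS):
        raise ValueError(f"expected {len(CASTS)} columns, got {len(fields)}")
    return Hit(*(cast(value) for cast, value in zip(CASTS, fields)))


hit = parse_hit("1ycsB 1ABO:B 24.07 54 39 1 3 56 6 57 4e-05 28.5")
print(hit.sseqid, hit.pident, hit.evalue)  # 1ABO:B 24.07 4e-05
```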
  6. REPRODUCIBILITY

    • Scripts can run on multiple platforms
    • A pipeline project is self-contained
    • Binary dependencies can be deployed with Docker
  7. GIT INTEGRATION

    • Handle and track changes in a consistent manner
    • Simplify the sharing of pipeline projects
    • Run and compare the results of different revisions/versions
  8. [No text transcribed for this slide]

  9. CONCLUSION

    • Nextflow allows you to write a pipeline by reusing your existing tools and scripts.
    • Parallelisation is managed automatically by the framework.
    • Pipeline scripts are portable across multiple execution environments.
    • The integration with Docker and Git allows pipeline projects to be shared and replicated with ease.