
Introduction to Nextflow

Paolo Di Tommaso
January 09, 2015

Nextflow is a fluent DSL modelled around the UNIX pipe concept that simplifies writing parallel and scalable pipelines in a portable manner.

You may reuse your existing scripts and tools and you don't need to learn a new language or API to start using it.

Transcript

  1. Paolo Di Tommaso, Emilio Palumbo, Cedric Notredame
     Bioinformatics and Genomics programme, CRG
     ENCODE AWG, 9 Jan 2015
  2. WHAT IS NEXTFLOW
     A toolkit for parallel and reproducible computational pipelines
  3. PROJECT RATIONALE
     • Fast prototyping
     • Reuse any existing scripts/tools
     • High-level parallelisation model
     • Portable across platforms
     • Enables reproducible pipelines
     • Lightweight and easy to install
  4. HOW IT WORKS
     • The pipeline flow is defined in a declarative manner
     • It is composed of several processes
     • A process is defined by a set of inputs/outputs and a script snippet to be executed
     • Task dependency/parallelisation is defined implicitly by input/output declarations
     • Processes communicate using channels
  5. HOW PARALLELISATION WORKS
     [Diagram labels: channel (data x, data y, data z); process (task 1, task 2, task 3); data x, data y, data z]
  6. SCATTER-GATHER
  7. WHAT A SCRIPT LOOKS LIKE

     params.blast_db = "$baseDir/blast-db/tiny"
     params.blast_query = "$baseDir/data/sample.fa"
     params.chunk_size = 100

     seq = Channel
             .fromPath(params.blast_query)
             .splitFasta(by: params.chunk_size)

     process blast {
         input:
         file 'seq.fa' from seq

         output:
         file 'out.txt' into result

         script:
         """
         blastp -db $params.blast_db -query seq.fa -outfmt 6 > out.txt
         """
     }

     result.view { it.text }

  8. $ nextflow run blast-test.nf

     N E X T F L O W  ~  version 0.12.0
     [3d/ec5c2e] Submitted process > blast (1)
     [1f/277042] Submitted process > blast (2)
     [9d/b49472] Submitted process > blast (3)
     [4a/3c2d5e] Submitted process > blast (4)
     [61/7dc8f0] Submitted process > blast (5)

     1ycsB  1YCS:B  100.00  60   0  0   1  60  170  229  3e-42  131
     1ycsB  1ABO:B   24.07  54  39  1   3  56    6   57  4e-05  28.5
     1ycsB  1ABO:A   24.07  54  39  1   3  56    6   57  4e-05  28.5
     1ycsB  1PHT:A   30.43  23  16  0   6  28   10   32  0.013  22.3
     1vie   1VIE:A  100.00  51   0  0   1  51   12   62  1e-35  108
     1pht   1PHT:A  100.00  80   0  0   1  80    5   84  1e-56  164
     1pht   1YCS:B   30.43  23  16  0   6  28  175  197  0.015  23.5
     1pht   1IHF:B   33.33  21  14  0  53  73   60   80  0.75   18.1
     1pht   1IHT:H   32.00  25  17  0  40  64  175  199  4.0    16.5
     1pht   1IHS:H   32.00  25  17  0  40  64  175  199  4.0    16.5

  9. REPRODUCIBILITY
     • Scripts can run on multiple platforms
     • A pipeline project is self-contained
     • Binary dependencies can be deployed with Docker
  10. PLATFORM AGNOSTIC
     [Diagram labels: NFS; cluster engine; local executor; grid executor; Nextflow; Docker; *nix OS]
  11. SUPPORTED PLATFORMS
  12. GIT INTEGRATION
     • Handle and track changes in a consistent manner
     • Simplify the sharing of pipeline projects
     • Run and compare results of different revisions/versions
  13. (no transcribed text)
  14. $ nextflow run cbcrg/grape-nf
  15. CONCLUSION
     • Nextflow allows you to write a pipeline by reusing your existing tools and scripts.
     • Parallelisation is managed automatically by the framework.
     • Pipeline scripts are portable across multiple execution environments.
     • The integration with Docker and Git allows pipeline projects to be shared and replicated with ease.
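
The scatter-gather pattern from slides 5 to 7 (split an input into chunks, run one task per chunk in parallel, then collect the results) can be modelled outside Nextflow as a rough illustration. The sketch below is plain Python, not the Nextflow API. The function names `split_fasta`, `count_residues` and `scatter_gather` are hypothetical, and the per-chunk task simply counts sequence letters as a stand-in for a real tool such as `blastp`.

```python
from concurrent.futures import ThreadPoolExecutor


def split_fasta(text, chunk_size):
    """Scatter step: group FASTA records into chunks of at most chunk_size records."""
    # Each record starts with '>'; re-attach the marker removed by split().
    records = [">" + r for r in text.split(">") if r.strip()]
    return ["".join(records[i:i + chunk_size])
            for i in range(0, len(records), chunk_size)]


def count_residues(chunk):
    """Stand-in for the per-chunk task: count sequence letters, skipping header lines."""
    return sum(len(line.strip())
               for line in chunk.splitlines()
               if not line.startswith(">"))


def scatter_gather(text, chunk_size, task=count_residues):
    """Run one task per chunk in a thread pool and gather results in input order."""
    chunks = split_fasta(text, chunk_size)
    with ThreadPoolExecutor() as pool:
        # pool.map yields results in the same order as the submitted chunks.
        return list(pool.map(task, chunks))


# Three toy records split into chunks of two records each.
fasta = ">a\nMKV\n>b\nLLTG\n>c\nAA\n"
per_chunk = scatter_gather(fasta, chunk_size=2)
total = sum(per_chunk)
print(per_chunk, total)
```

In this model the list of chunks plays the role of the channel and each call to the task plays the role of one process execution. In Nextflow, the same fan-out follows implicitly from the process input declaration, so no explicit pool or loop is written.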