Introduction to Nextflow

Paolo Di Tommaso
January 09, 2015

Nextflow is a fluent DSL modelled around the UNIX pipe concept that simplifies writing parallel, scalable pipelines in a portable manner.

You may reuse your existing scripts and tools and you don't need to learn a new language or API to start using it.

Transcript

  1. PROJECT RATIONALE
     • Fast prototyping
     • Reuse any existing scripts/tools
     • High-level parallelisation model
     • Portable across platforms
     • Enables reproducible pipelines
     • Lightweight and easy to install
  2. HOW IT WORKS
     • The pipeline flow is defined in a declarative manner
     • It is composed of several processes
     • A process is defined by a set of inputs/outputs and a script snippet to be executed
     • Task dependencies and parallelisation are defined implicitly by the input/output declarations
     • Processes communicate using channels
  3. HOW PARALLELISATION WORKS
     [Diagram: data x, data y and data z arrive on a channel; the process runs task 1, task 2 and task 3, one per item; data x, data y and data z leave on the output channel.]
  4. WHAT A SCRIPT LOOKS LIKE

       params.blast_db = "$baseDir/blast-db/tiny"
       params.blast_query = "$baseDir/data/sample.fa"
       params.chunk_size = 100

       seq = Channel
               .fromPath(params.blast_query)
               .splitFasta(by: params.chunk_size)

       process blast {
           input:
           file 'seq.fa' from seq

           output:
           file 'out.txt' into result

           script:
           """
           blastp -db $params.blast_db -query seq.fa -outfmt 6 > out.txt
           """
       }

       result.view { it.text }

  5. $ nextflow run blast-test.nf

       N E X T F L O W  ~  version 0.12.0
       [3d/ec5c2e] Submitted process > blast (1)
       [1f/277042] Submitted process > blast (2)
       [9d/b49472] Submitted process > blast (3)
       [4a/3c2d5e] Submitted process > blast (4)
       [61/7dc8f0] Submitted process > blast (5)

       1ycsB  1YCS:B  100.00  60   0  0   1  60  170  229  3e-42  131
       1ycsB  1ABO:B   24.07  54  39  1   3  56    6   57  4e-05  28.5
       1ycsB  1ABO:A   24.07  54  39  1   3  56    6   57  4e-05  28.5
       1ycsB  1PHT:A   30.43  23  16  0   6  28   10   32  0.013  22.3
       1vie   1VIE:A  100.00  51   0  0   1  51   12   62  1e-35  108
       1pht   1PHT:A  100.00  80   0  0   1  80    5   84  1e-56  164
       1pht   1YCS:B   30.43  23  16  0   6  28  175  197  0.015  23.5
       1pht   1IHF:B   33.33  21  14  0  53  73   60   80  0.75   18.1
       1pht   1IHT:H   32.00  25  17  0  40  64  175  199  4.0    16.5
       1pht   1IHS:H   32.00  25  17  0  40  64  175  199  4.0    16.5

  6. REPRODUCIBILITY
     • Scripts can run on multiple platforms
     • A pipeline project is self-contained
     • Binary dependencies can be deployed with Docker
  7. GIT INTEGRATION
     • Handle and track changes in a consistent manner
     • Simplify the sharing of pipeline projects
     • Run and compare results of different revisions/versions
  8. [Slide without transcribed text]

  9. CONCLUSION
     • Nextflow allows you to write a pipeline by reusing your existing tools and scripts.
     • Parallelisation is managed automatically by the framework.
     • Pipeline scripts are portable across multiple execution environments.
     • The integration with Docker and Git allows pipeline projects to be shared and replicated with ease.
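
As a rough analogy for the model described in the HOW IT WORKS and HOW PARALLELISATION WORKS slides, the sketch below mimics it in plain Python. It is not Nextflow code and not how Nextflow is implemented. The names `split_fasta`, `run_process` and `count_records` are hypothetical, and `count_records` merely stands in for the `blastp` command. A queue plays the role of a channel, and the "process" runs one task per item it receives.

```python
from concurrent.futures import ThreadPoolExecutor
from queue import Queue


def split_fasta(text, by):
    """Yield chunks of at most `by` FASTA records (hypothetical analogue of splitFasta)."""
    records = []
    for line in text.splitlines():
        if line.startswith(">"):
            records.append([line])        # a header line starts a new record
        elif records:
            records[-1].append(line)      # sequence lines belong to the last record
    for i in range(0, len(records), by):
        yield "\n".join("\n".join(r) for r in records[i:i + by]) + "\n"


def count_records(chunk):
    """Stand-in task (assumption): count the records in a chunk instead of running blastp."""
    return sum(1 for line in chunk.splitlines() if line.startswith(">"))


def run_process(task, in_channel, workers=4):
    """Run `task` once per item in `in_channel`, in parallel, and return an output channel."""
    items = []
    while not in_channel.empty():
        items.append(in_channel.get())
    out_channel = Queue()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map keeps input order; this simplifies the sketch.
        for result in pool.map(task, items):
            out_channel.put(result)
    return out_channel


fasta = ">s1\nMKV\n>s2\nGAT\n>s3\nLLP\n"

seq = Queue()                              # input channel
for chunk in split_fasta(fasta, by=2):
    seq.put(chunk)

result = run_process(count_records, seq)   # one task per chunk

counts = []
while not result.empty():
    counts.append(result.get())
print(counts)  # [2, 1]
```

The only thing connecting the splitting step to the task is the queue passed between them, which mirrors the way the deck's `seq` and `result` channels link the `blast` process to its input and output. The sketch differs from a real dataflow engine in that it drains the input before starting any task and returns results in input order.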