Slide 1

Slide 1 text

Paolo Di Tommaso, Emilio Palumbo, Cedric Notredame
Bioinformatics and Genomics programme, CRG
ENCODE AWG, 9 Jan 2015

Slide 2

Slide 2 text

WHAT IS NEXTFLOW

A toolkit for parallel and reproducible computational pipelines

Slide 3

Slide 3 text

PROJECT RATIONALE

• Fast prototyping
• Reuse any existing scripts/tools
• High-level parallelisation model
• Portable across platforms
• Enables reproducible pipelines
• Lightweight and easy to install

Slide 4

Slide 4 text

HOW IT WORKS

• The pipeline flow is defined in a declarative manner
• It is composed of several processes
• A process is defined by a set of inputs/outputs and a script snippet to be executed
• Task dependencies and parallelisation are defined implicitly by the input/output declarations
• Processes communicate using channels
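The model on this slide (processes that only talk to each other through channels) can be illustrated outside Nextflow. The following is a minimal Python sketch, not Nextflow's implementation: the names `process`, `drain` and `DONE`, and the use of `queue.Queue` as a stand-in for a channel, are assumptions made for illustration.

```python
import queue
import threading

DONE = object()  # end-of-channel marker (an assumption of this sketch)

def process(task, inbox, outbox):
    """Run `task` on every item read from `inbox`, writing results to `outbox`."""
    def worker():
        while True:
            item = inbox.get()
            if item is DONE:
                outbox.put(DONE)  # propagate the end marker downstream
                return
            outbox.put(task(item))
    thread = threading.Thread(target=worker)
    thread.start()
    return thread

def drain(channel):
    """Collect every item from a channel until the end marker arrives."""
    items = []
    while True:
        item = channel.get()
        if item is DONE:
            return items
        items.append(item)

# Two processes, linked only by the channels they read from and write to.
raw, upper, tagged = queue.Queue(), queue.Queue(), queue.Queue()
threads = [
    process(str.upper, raw, upper),
    process(lambda s: f"seq:{s}", upper, tagged),
]
for item in ["acgt", "ttga", "ccat"]:
    raw.put(item)
raw.put(DONE)

results = drain(tagged)
for t in threads:
    t.join()
print(results)  # ['seq:ACGT', 'seq:TTGA', 'seq:CCAT']
```

Neither process names the other; the execution order follows from which channel each one reads and writes, which is the sense in which dependencies are implicit.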

Slide 5

Slide 5 text

HOW PARALLELISATION WORKS

[Diagram. Labels: channel (data x, data y, data z); process (task 1, task 2, task 3); output data x, data y, data z.]

Slide 6

Slide 6 text

SCATTER-GATHER
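As a rough illustration of the scatter-gather pattern named on this slide, the Python sketch below splits an input into chunks, processes the chunks in parallel, and combines the partial results. The function names and the per-chunk task (summing sequence lengths) are hypothetical and chosen only to keep the example small; this is not how Nextflow implements the pattern.

```python
from concurrent.futures import ThreadPoolExecutor

def scatter(items, chunk_size):
    """Split a list into consecutive chunks of at most `chunk_size` items."""
    return [items[i:i + chunk_size] for i in range(0, len(items), chunk_size)]

def task(chunk):
    """Hypothetical per-chunk work: total length of the sequences in the chunk."""
    return sum(len(seq) for seq in chunk)

def scatter_gather(items, chunk_size):
    chunks = scatter(items, chunk_size)           # scatter
    with ThreadPoolExecutor() as pool:
        partials = list(pool.map(task, chunks))   # parallel tasks, order preserved
    return sum(partials)                          # gather

sequences = ["ACGT", "TT", "GGGCC", "A", "CCAT"]
total = scatter_gather(sequences, chunk_size=2)
print(total)  # 16
```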

Slide 7

Slide 7 text

WHAT A SCRIPT LOOKS LIKE

    params.blast_db = "$baseDir/blast-db/tiny"
    params.blast_query = "$baseDir/data/sample.fa"
    params.chunk_size = 100

    seq = Channel
            .fromPath(params.blast_query)
            .splitFasta(by: params.chunk_size)

    process blast {
        input:
        file 'seq.fa' from seq

        output:
        file 'out.txt' into result

        script:
        """
        blastp -db $params.blast_db -query seq.fa -outfmt 6 > out.txt
        """
    }

    result.view { it.text }
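In the script above, `splitFasta(by: params.chunk_size)` breaks the query file into chunks of a fixed number of sequences, and each chunk becomes one `blast` task. The Python sketch below shows that chunking idea only; `split_fasta` is a hypothetical helper written for this illustration and is not Nextflow's own code.

```python
def split_fasta(text, by):
    """Yield FASTA chunks holding at most `by` records each."""
    records, current = [], []
    for line in text.splitlines():
        # A header line starts a new record; flush the previous one first.
        if line.startswith(">") and current:
            records.append("\n".join(current) + "\n")
            current = []
        if line.strip():
            current.append(line)
    if current:
        records.append("\n".join(current) + "\n")
    for i in range(0, len(records), by):
        yield "".join(records[i:i + by])

fasta = ">a\nACGT\n>b\nTT\nGG\n>c\nCCAT\n"
chunks = list(split_fasta(fasta, by=2))
print(len(chunks))  # 2
```

With three records and `by=2`, the first chunk holds records `a` and `b` and the second holds `c`, so two independent tasks would be created.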

Slide 8

Slide 8 text

    $ nextflow run blast-test.nf

    N E X T F L O W ~ version 0.12.0
    [3d/ec5c2e] Submitted process > blast (1)
    [1f/277042] Submitted process > blast (2)
    [9d/b49472] Submitted process > blast (3)
    [4a/3c2d5e] Submitted process > blast (4)
    [61/7dc8f0] Submitted process > blast (5)

    1ycsB 1YCS:B 100.00 60 0 0 1 60 170 229 3e-42 131
    1ycsB 1ABO:B 24.07 54 39 1 3 56 6 57 4e-05 28.5
    1ycsB 1ABO:A 24.07 54 39 1 3 56 6 57 4e-05 28.5
    1ycsB 1PHT:A 30.43 23 16 0 6 28 10 32 0.013 22.3
    1vie 1VIE:A 100.00 51 0 0 1 51 12 62 1e-35 108
    1pht 1PHT:A 100.00 80 0 0 1 80 5 84 1e-56 164
    1pht 1YCS:B 30.43 23 16 0 6 28 175 197 0.015 23.5
    1pht 1IHF:B 33.33 21 14 0 53 73 60 80 0.75 18.1
    1pht 1IHT:H 32.00 25 17 0 40 64 175 199 4.0 16.5
    1pht 1IHS:H 32.00 25 17 0 40 64 175 199 4.0 16.5
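The result rows printed by the run are BLAST tabular output (`-outfmt 6`), whose default layout has twelve columns. As a sketch of how such rows could be consumed downstream, the Python snippet below parses them and keeps the highest-scoring hit per query. The `Hit` type and both helper functions are hypothetical names for this illustration; the three sample rows are copied from the output on this slide.

```python
from collections import namedtuple

# Default column order of BLAST+ tabular output (-outfmt 6).
FIELDS = ("qseqid sseqid pident length mismatch gapopen "
          "qstart qend sstart send evalue bitscore")
Hit = namedtuple("Hit", FIELDS)

def parse_hits(text):
    """Parse whitespace-separated outfmt 6 rows, skipping malformed lines."""
    hits = []
    for line in text.splitlines():
        cols = line.split()
        if len(cols) != 12:
            continue
        hits.append(Hit(cols[0], cols[1], float(cols[2]),
                        *map(int, cols[3:10]),
                        float(cols[10]), float(cols[11])))
    return hits

def best_hit_per_query(hits):
    """Keep, for each query, the hit with the highest bit score."""
    best = {}
    for hit in hits:
        if hit.qseqid not in best or hit.bitscore > best[hit.qseqid].bitscore:
            best[hit.qseqid] = hit
    return best

sample = """\
1ycsB 1YCS:B 100.00 60 0 0 1 60 170 229 3e-42 131
1ycsB 1ABO:B 24.07 54 39 1 3 56 6 57 4e-05 28.5
1vie 1VIE:A 100.00 51 0 0 1 51 12 62 1e-35 108
"""
best = best_hit_per_query(parse_hits(sample))
print(best["1ycsB"].sseqid)  # 1YCS:B
```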

Slide 9

Slide 9 text

REPRODUCIBILITY

• Scripts can run on multiple platforms
• A pipeline project is self-contained
• Binary dependencies can be deployed with Docker

Slide 10

Slide 10 text

PLATFORM AGNOSTIC

[Diagram. Labels: NFS; cluster engine; Nextflow (local executor); Nextflow (grid executor); Docker; *nix OS.]

Slide 11

Slide 11 text

SUPPORTED PLATFORMS

Slide 12

Slide 12 text

GIT INTEGRATION

• Handle and track changes in a consistent manner
• Simplify the sharing of pipeline projects
• Run and compare results of different revisions/versions

Slide 13

Slide 13 text


Slide 14

Slide 14 text

    $ nextflow run cbcrg/grape-nf

Slide 15

Slide 15 text

CONCLUSION

• Nextflow allows you to write a pipeline by reusing your existing tools and scripts.
• Parallelisation is managed automatically by the framework.
• Pipeline scripts are portable across multiple execution environments.
• The integration with Docker and Git allows pipeline projects to be shared and replicated with ease.