Slide 1

Slide 1 text

Paolo Di Tommaso
Comparative bioinformatics, Notredame Lab - CRG
26 Feb 2015

Slide 2

Slide 2 text

WHAT NEXTFLOW IS
• A computing runtime which executes Nextflow pipeline scripts
• A programming DSL that simplifies the writing of highly parallel computational pipelines, reusing your existing scripts and tools

Slide 3

Slide 3 text

NEXTFLOW DSL
• It is NOT a new programming language
• It extends the Groovy scripting language
• It provides a multi-paradigm programming environment

Slide 4

Slide 4 text

MULTI-PARADIGM
Imperative, object-oriented programming
+
Declarative concurrency, dataflow programming model

Slide 5

Slide 5 text

[Diagram: Nextflow runtime components: script interpreter, dataflow parallelisation & synchronisation, tasks dispatcher, executors, VFS, Groovy runtime, Java VM 7+]

Slide 6

Slide 6 text

HOW TO INSTALL
Use the following command:
    wget -qO- get.nextflow.io | bash
This creates the nextflow launcher in the current directory.

Slide 7

Slide 7 text

GET STARTED
Log in to your course laptop:
    $ cd ~/crg-course
    $ vagrant up
    $ vagrant ssh
Once in the virtual machine:
    $ cd ~/nextflow-tutorial
    $ git pull
    $ nextflow info

Slide 8

Slide 8 text

THE BASICS
Variables and assignments
    x = 1
    y = 10.5
    str = 'hello world!'
    p = x; q = y

Slide 9

Slide 9 text

THE BASICS
Printing values
    x = 1
    y = 10.5
    str = 'hello world!'
    print x
    print str
    print str + '\n'
    println str

Slide 10

Slide 10 text

THE BASICS
Printing values (with parentheses)
    x = 1
    y = 10.5
    str = 'hello world!'
    print(x)
    print(str)
    print(str + '\n')
    println(str)

Slide 11

Slide 11 text

MORE ON STRINGS
    str = 'bioinformatics'
    print str[0]

    print "$str is cool!"
    print "Current path: $PWD"

    str = '''
          multi
          line
          string
          '''

    str = """
          User: $USER
          Home: $HOME
          """
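The quoting rules used above can be checked with a short sketch. In Groovy, single-quoted strings are taken literally, while double-quoted strings replace `$name` and `${expression}` placeholders. The variable names below are made up for illustration, and the snippet is plain Groovy that runs without Nextflow.

```groovy
// Hypothetical names; plain Groovy, no Nextflow required.
def tool = 'blastp'

def literal      = 'running $tool'       // single quotes: no interpolation
def interpolated = "running $tool"       // double quotes: $tool is replaced
def computed     = "2 + 3 = ${2 + 3}"    // ${...} evaluates an expression

println literal
println interpolated
println computed
```

The same distinction applies to the triple-quoted forms: `'''...'''` is literal, `"""..."""` interpolates.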

Slide 12

Slide 12 text

COMMON STRUCTURES & PROGRAMMING IDIOMS
• Data structures: Lists & Maps
• Control statements: if, for, while, etc.
• Functions and classes
• File I/O operations
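A minimal sketch of the first three idioms in plain Groovy may help here. The sample names and read counts are invented for illustration only.

```groovy
// Hypothetical data; plain Groovy, no Nextflow required.

// Data structures: a list and a map
def samples = ['liver', 'lung', 'gut']
def reads = [liver: 120, lung: 80]

samples << 'brain'      // append an item to the list
reads['gut'] = 95       // add an entry to the map

// Control statements: sum the reads of the samples that have a count
def total = 0
for (name in samples) {
    if (reads.containsKey(name)) {
        total += reads[name]
    }
}

// Functions: a small helper returning a formatted string
def label(String name, int count) {
    return "$name has $count reads".toString()
}

println label('liver', reads.liver)
println "total = $total"
```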

Slide 13

Slide 13 text

6-PAGE PRIMER
http://refcardz.dzone.com/refcardz/groovy

Slide 14

Slide 14 text

MAIN ABSTRACTIONS
• Processes: run any piece of script
• Channels: unidirectional async queues that allow the processes to communicate
• Operators: transform channel content

Slide 15

Slide 15 text

CHANNELS
• A channel connects two processes/operators
• Write operations are NOT blocking
• Read operations are blocking
• Once an item is read, it is removed from the queue
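The semantics listed above can be imitated with a standard Java queue. This is only an analogy, not how Nextflow implements channels: it assumes an unbounded `LinkedBlockingQueue` as a stand-in for a channel and a plain thread as the reader.

```groovy
import java.util.concurrent.LinkedBlockingQueue

// Stand-in for a channel: an unbounded queue (an assumption for this sketch).
def queue = new LinkedBlockingQueue<Integer>()

// Writes return immediately, even though nobody is reading yet.
queue.put(10)
queue.put(20)

// Reader: take() blocks until an item is available.
def received = []
def reader = Thread.start {
    3.times { received << queue.take() }
}

queue.put(30)    // the third item lets the last take() complete
reader.join()

println received
println "items left in the queue: ${queue.size()}"
```

Each item is delivered once: after the reader has taken the three values, the queue is empty.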

Slide 16

Slide 16 text

CHANNELS
    some_items = Channel.from(10, 20, 30, ..)
    my_channel = Channel.create()
    single_file = Channel.fromPath('some/file/name')
    more_files = Channel.fromPath('some/data/path/*')
[Diagram: a channel emitting file x, file y, file z]

Slide 17

Slide 17 text

OPERATORS
• Functions applied to channels
• Transform channel content
• Can also be used to filter, fork and combine channels
• Operators can be chained to implement custom behaviours

Slide 18

Slide 18 text

OPERATORS
    nums = Channel.from(1,2,3,4)
    square = nums.map { it -> it * it }
[Diagram: nums (1, 2, 3, 4) -> map -> square (1, 4, 9, 16)]

Slide 19

Slide 19 text

OPERATORS CHAINING
    Channel.from(1,2,3,4)
        .map { it -> [it, it*it] }
        .subscribe { num, sqr -> println "Square of: $num is $sqr" }

    // it prints
    Square of: 1 is 1
    Square of: 2 is 4
    Square of: 3 is 9
    Square of: 4 is 16
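The closures passed to operators are ordinary Groovy closures, so the transformation logic can be tried on a plain list first. The sketch below is an analogy only: it uses Groovy's list methods `findAll` and `collect` in place of channel operators, and runs synchronously rather than as a dataflow.

```groovy
// Plain Groovy lists standing in for channels (an assumption for this sketch).
def nums = [1, 2, 3, 4]

def evens = nums.findAll { it % 2 == 0 }       // keep only some items
def pairs = nums.collect { [it, it * it] }     // transform each item

// A two-parameter closure unpacks each [num, sqr] pair
pairs.each { num, sqr -> println "Square of: $num is $sqr" }
println "Even numbers: $evens"
```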

Slide 20

Slide 20 text

SPLIT FASTA FILE(S)
    Channel.fromPath('/some/path/fasta.fa')
        .splitFasta()
        .view()

    Channel.fromPath('/some/path/fasta.fa')
        .splitFasta(by: 3)
        .view()

    Channel.fromPath('/some/path/*.fa')
        .splitFasta(by: 3)
        .view()

Slide 21

Slide 21 text

SPLITTING OPERATORS
You can split text objects or files using the splitting methods:
• splitText - line by line
• splitCsv - comma separated values format
• splitFasta - by FASTA sequences
• splitFastq - by FASTQ sequences

Slide 22

Slide 22 text

EXAMPLE 1
• Split a FASTA file into sequences
• Parse a FASTA file and count the number of sequences matching a specified ID

Slide 23

Slide 23 text

EXAMPLE 1
    $ nextflow run channel_split.nf

    $ nextflow run channel_filter.nf

Slide 24

Slide 24 text

PROCESS
    str = Channel.from('hello', 'hola', 'bonjour', 'ciao')

    process sayHello {

        input:
        val str

        output:
        stdout into result

        script:
        """
        echo $str world!
        """
    }

    result.subscribe { print it }

Slide 25

Slide 25 text

PROCESS INPUTS
    process procName {

        input:
          [from ] [attributes]

        """
        """

    }

Slide 26

Slide 26 text

PROCESS INPUTS
    process procName {

        input:
          val x from ch_1
          file y from ch_2
          file 'data.fa' from ch_3
          stdin from ch_4
          set (x, 'file.txt') from ch_5

        """
        """

    }

Slide 27

Slide 27 text

PROCESS INPUTS
    proteins = Channel.fromPath( '/some/path/data.fa' )

    process blastThemAll {

        input:
        file 'query.fa' from proteins

        "blastp -query query.fa -db nr"

    }

Slide 28

Slide 28 text

PROCESS OUTPUTS
    process randomNum {

        output:
        file 'result.txt' into numbers

        '''
        echo $RANDOM > result.txt
        '''

    }

    numbers.subscribe { println "Received: " + it.text }

Slide 29

Slide 29 text

USE YOUR FAVOURITE PROGRAMMING LANG
    process pyStuff {

        script:
        """
        #!/usr/bin/env python

        x = 'Hello'
        y = 'world!'
        # parenthesised form works with both Python 2 and Python 3
        print("%s - %s" % (x, y))
        """
    }

Slide 30

Slide 30 text

EXAMPLE 2
• Execute a process running a BLAST job given an input file
• Execute a BLAST job emitting the produced output

Slide 31

Slide 31 text

EXAMPLE 2
    $ nextflow run process_input.nf

    $ nextflow run process_output.nf

Slide 32

Slide 32 text

PIPELINE PARAMETERS
Simply declare some variables prefixed by params:
    params.p1 = 'alpha'
    params.p2 = 'beta'
    :
When launching your script you can override the default values:
    $ nextflow run  --p1 'delta' --p2 'gamma'
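The override behaviour described above can be pictured as a map of defaults updated from the command line. The sketch below is a toy model in plain Groovy, not Nextflow's own option parser: the `cliArgs` list and the parsing loop are assumptions made for illustration.

```groovy
// Defaults, as they would be declared in the script
def params = [p1: 'alpha', p2: 'beta']

// Hypothetical command-line arguments: only p1 is overridden
def cliArgs = ['--p1', 'delta']

// Toy parser: take the arguments two at a time as (flag, value) pairs
cliArgs.collate(2).each { flag, value ->
    params[flag - '--'] = value     // strip the leading '--' to get the name
}

println params
```

A parameter that is not given on the command line keeps its default value.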

Slide 33

Slide 33 text

COLLECT FILE
The collectFile operator lets you gather the items produced by upstream processes.
    my_results.collectFile(name:'result.txt')
Collects all items into a single file.

Slide 34

Slide 34 text

COLLECT FILE
The collectFile operator lets you gather the items produced by upstream processes.
    my_items.collectFile(storeDir:'path/name') {

        def key = get_a_key_from_the_item(it)
        def content = get_the_item_value(it)
        [ key, content ]

    }
Collects the items and groups them into files whose names are defined by a grouping criterion.
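The grouping idea can be sketched in plain Groovy: each item yields a key and a content, and items sharing a key are appended to the same file. This is an illustration only, not collectFile's implementation. The item format ("key content"), the `.txt` naming and the temporary directory are assumptions.

```groovy
import java.nio.file.Files

// Hypothetical items of the form "<key> <content>"
def items = ['sp1 ACGT', 'sp2 TTGA', 'sp1 GGCC']

// Temporary directory standing in for storeDir
def storeDir = Files.createTempDirectory('collect').toFile()

items.each { item ->
    def (key, content) = item.tokenize(' ')
    // Items with the same key are appended to the same file
    new File(storeDir, "${key}.txt") << "${content}\n"
}

storeDir.listFiles().toList().sort { it.name }.each {
    println "${it.name}: ${it.text.readLines()}"
}
```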

Slide 35

Slide 35 text

EXAMPLE 3
• Split a FASTA file, execute a BLAST query for each chunk and gather the results
• Split multiple FASTA files and execute a BLAST query for each chunk

Slide 36

Slide 36 text

EXAMPLE 3
    $ nextflow run split_fasta.nf

    $ nextflow run split_fasta.nf --chunkSize 2

    $ nextflow run split_fasta.nf --chunkSize 2 --query data/p\*.fa

    $ nextflow run split_and_collect.nf

Slide 37

Slide 37 text

UNDERSTANDING MULTIPLE INPUTS
[Diagram: a process with multiple input channels. Labels: task 1, task 2; inputs a, c, d, β, /END/; outputs out x, out y]

Slide 38

Slide 38 text

UNDERSTANDING MULTIPLE INPUTS
[Diagram: a process with multiple input channels. Labels: task 1, task 2, task 3, ..., task n; inputs a, c, d, β; outputs out x, out y, out z]

Slide 39

Slide 39 text

CONFIG FILE
• Pipeline configuration can be externalised to a file named nextflow.config
    • parameters
    • environment variables
    • required resources (mem, cpus, queue, etc)
    • modules/containers

Slide 40

Slide 40 text

CONFIG FILE
    params.p1 = 'alpha'
    params.p2 = 'beta'

    env.VAR_1 = 'some_value'
    env.CACHE_4_TCOFFEE = '/some/path/cache'
    env.LOCKDIR_4_TCOFFEE = '/some/path/lock'

    process.executor = 'sge'

Slide 41

Slide 41 text

CONFIG FILE
Alternative, (almost) equivalent syntax:
    params {
        p1 = 'alpha'
        p2 = 'beta'
    }

    env {
        VAR_1 = 'some_value'
        CACHE_4_TCOFFEE = '/some/path/cache'
        LOCKDIR_4_TCOFFEE = '/some/path/lock'
    }

    process {
        executor = 'sge'
    }
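To see that the dotted and block notations describe the same settings, both can be parsed and compared. The sketch below uses Groovy's standard `ConfigSlurper` as a stand-in. That is an assumption made for illustration: the deck does not say which parser Nextflow uses, and the "(almost)" caveat above is not explored here.

```groovy
// Dotted notation
def dotted = new ConfigSlurper().parse('''
    params.p1 = 'alpha'
    params.p2 = 'beta'
    process.executor = 'sge'
''')

// Block notation
def blocks = new ConfigSlurper().parse('''
    params {
        p1 = 'alpha'
        p2 = 'beta'
    }
    process {
        executor = 'sge'
    }
''')

println "dotted: p1=${dotted.params.p1}, executor=${dotted.process.executor}"
println "blocks: p1=${blocks.params.p1}, executor=${blocks.process.executor}"
```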

Slide 42

Slide 42 text

HOW TO USE DOCKER
Specify in the config file the Docker image to use:
    process {
        container =
    }
Add the -with-docker flag when launching it:
    $ nextflow run  -with-docker

Slide 43

Slide 43 text

EXAMPLE 4 Launch a pipeline using a Docker container

Slide 44

Slide 44 text

EXAMPLE 4
    $ nextflow run blast_extract.nf -with-docker

Slide 45

Slide 45 text

HOW TO USE THE CLUSTER
Define the CRG executor in nextflow.config:
    // default properties for any process
    process {
        executor = 'crg'
        queue = 'short'
        cpus = 2
        memory = '4GB'
        scratch = true
    }

Slide 46

Slide 46 text

PROCESS RESOURCES
    // default properties for any process
    process {
        executor = 'crg'
        queue = 'short'
        scratch = true
    }

    // cpus for process 'foo'
    process.$foo.cpus = 2

    // resources for 'bar'
    process.$bar.queue = 'long'
    process.$bar.cpus = 4
    process.$bar.memory = '4GB'

Slide 47

Slide 47 text

ENVIRONMENT MODULE
Specify in the config file the modules required:
    process.$foo.module = 'Bowtie2/2.2.3'

    process.$bar.module = 'TopHat/2.0.12:Boost/1.55.0'

Slide 48

Slide 48 text

EXAMPLE 5
Execute a pipeline on the cluster

Slide 49

Slide 49 text

EXAMPLE 5
Log in to ANT-LOGIN:
    $ ssh username@ant-login.linux.crg.es
If you have modules configured:
    $ module avail
    $ module purge
    $ module load nextflow/0.12.3-goolf-1.4.10-no-OFED-Java-1.7.0_21
Otherwise install it by downloading it from the internet:
    $ curl -fsSL get.nextflow.io | bash

Slide 50

Slide 50 text

EXAMPLE 5
Create the following nextflow.config file:
    process {
        executor = 'crg'
        queue = 'course'
        scratch = true
    }
Launch the pipeline execution:
    $ nextflow run rnatoy -with-docker -with-trace

Slide 51

Slide 51 text

RESOURCES
project home: http://nextflow.io
tutorials: https://github.com/nextflow-io/examples
community: http://groups.google.com/forum/#!forum/nextflow