Snake makes Vital-it a nicer place

I try to explain the logic behind build/workflow languages. Later on I pinpoint the differences between snakemake and GNU make, and at the end I introduce a template I wrote for computation on Vital-it, a Swiss LSF cluster for bioinformatics.

Kamil S Jaroň

March 29, 2018

Transcript

  1. Snake makes Vital-it a nicer place March 29, 2018

  2. ./configure
     make
     make install

  3. bash: execute line 1; execute line 2; execute line 3
    make: 1. read script; 2. define relationships between files; 3. execute what is needed to create/update the desired output
  4. bash
        command1
        command2
        ...
        wc -w hamlet.txt > word_count.txt

  5. Makefile
        target : dependencies
            recipe
    shell
        make
    make builds the target
        if the target does not exist
        if any of the dependencies has a newer timestamp
        only if all dependencies exist
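
The decision make takes for a single target can be sketched in a few lines of Python. This is only an illustration of the three conditions listed on the slide, not how GNU make is implemented; the function name `needs_rebuild` is made up for this example, and the sketch ignores that make would first try to build a missing dependency from another rule.

```python
import os
import tempfile


def needs_rebuild(target, dependencies):
    """Return True if `target` should be (re)built from `dependencies`."""
    # Condition 3: all dependencies must exist (this sketch does not try
    # to build them from other rules, which real make would attempt).
    missing = [dep for dep in dependencies if not os.path.exists(dep)]
    if missing:
        raise FileNotFoundError(f"missing dependencies: {missing}")
    # Condition 1: the target does not exist yet.
    if not os.path.exists(target):
        return True
    # Condition 2: any dependency has a newer timestamp than the target.
    target_mtime = os.path.getmtime(target)
    return any(os.path.getmtime(dep) > target_mtime for dep in dependencies)


# Small demonstration in a throwaway directory; timestamps are set by hand
# so the outcome does not depend on how fast the files are written.
with tempfile.TemporaryDirectory() as workdir:
    source = os.path.join(workdir, "hamlet.txt")
    target = os.path.join(workdir, "hamlet_wc.txt")
    with open(source, "w") as handle:
        handle.write("To be, or not to be\n")
    before_first_build = needs_rebuild(target, [source])
    with open(target, "w") as handle:
        handle.write("6\n")
    os.utime(source, (1000, 1000))
    os.utime(target, (2000, 2000))
    after_build = needs_rebuild(target, [source])
    os.utime(source, (3000, 3000))
    after_source_edit = needs_rebuild(target, [source])
```

Here `before_first_build` and `after_source_edit` come out true and `after_build` false, which is the behaviour that makes repeated `make` calls cheap.
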
  6. Rule
    Makefile
        target : dependencies
        <tab>recipe

        hamlet_wc.txt : hamlet.txt
            wc -w hamlet.txt > hamlet_wc.txt
  7. Variables
        hamlet_wc.txt : hamlet.txt
            wc -w hamlet.txt > hamlet_wc.txt
    $< the first dependency
    $^ all dependencies
    $@ target
  8. Variables
        hamlet_wc.txt : hamlet.txt
            wc -w $< > $@
    $< the first dependency
    $^ all dependencies
    $@ target
  9. Pattern rules
    Makefile
        %_wc.txt : %.txt
            wc -w $< > $@
    executed from bash
        make hamlet_wc.txt # requires hamlet.txt
        make romeo_wc.txt # requires romeo.txt
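
How a requested file name is turned into its dependency by the `%` pattern can be mimicked in Python. The two helper names below are invented for this illustration and the code only covers a single `%` per pattern; it is a sketch of the matching idea, not of make's actual implementation.

```python
def match_pattern(pattern, target):
    """Return the stem matched by '%' in `pattern`, or None if there is no match."""
    prefix, _, suffix = pattern.partition("%")
    # The stem has to be non-empty, so the target must be longer than
    # the fixed parts of the pattern.
    if len(target) <= len(prefix) + len(suffix):
        return None
    if target.startswith(prefix) and target.endswith(suffix):
        return target[len(prefix):len(target) - len(suffix)]
    return None


def dependency_for(target, target_pattern="%_wc.txt", dependency_pattern="%.txt"):
    """Substitute the stem of `target` into the dependency pattern."""
    stem = match_pattern(target_pattern, target)
    if stem is None:
        return None
    return dependency_pattern.replace("%", stem, 1)


hamlet_dependency = dependency_for("hamlet_wc.txt")
romeo_dependency = dependency_for("romeo_wc.txt")
```

With the rule from the slide, `hamlet_wc.txt` maps to `hamlet.txt` and `romeo_wc.txt` to `romeo.txt`, while a name that does not end in `_wc.txt` matches nothing.
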
  10. PHONY targets
    Makefile
        .PHONY : both
        both : hamlet_wc.txt romeo_wc.txt
        %_wc.txt : %.txt
            wc -w $< > $@
    executed from bash
        make both
  11. ./configure     # run automake to generate Makefile
      make            # compile source code into programs
      make install    # move programs somewhere (/bin/bash)
  12. Convert workflow into procedural script
        make --dry-run
    Build graph and print commands that would get executed.
  13. But... this presentation was supposed to be about snakemake

  14. GNU make                    snakemake
      exhaustive documentation    tutorial instead of documentation
      stable                      sometimes misbehaving
      limited programming         python flavoured!
      local files                 cluster support
      running in memory           lock files
      no isolation of env         virtual env
  15. snakemake rules have names
        rule bwa_map:
            input:
                "data/genome.fa",
                "data/samples/A.fastq"
            output:
                "mapped_reads/A.bam"
            shell:
                "bwa mem {input} | samtools view -Sb - > {output}"
  16. Snakefiles have verbose wildcards
        rule download_all:
            input:
                "data/monkey/genome.fa.gz",
                "data/lion/genome.fa.gz"

        rule download_genome:
            output:
                "data/{sp}/genome.fa.gz"
            shell:
                "download_genome.sh {wildcards.sp} {output}"
  17. Snakefiles have blocks of "procedural" code
        species_with_genomes = []
        with open('tables/genome_table.tsv') as tab:
            tab.readline()
            for textline in tab:
                line = textline.split()
                if line[2] != 'NA':
                    species_with_genomes.append(line[0])

        rule download_all:
            input:
                expand("data/{sp}/genome.fa.gz", sp=species_with_genomes)

        rule download_genome:
            output:
                "data/{sp}/genome.fa.gz"
            shell:
                "download_genome.sh {wildcards.sp} genome_table.tsv {output}"
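
The procedural block can be tried outside of snakemake too. The sketch below repeats the same table-reading logic in plain Python and replaces snakemake's `expand` with an ordinary list comprehension. The table contents and column names are invented for illustration (the slides do not show the real `genome_table.tsv`); the only thing taken from the slide is that the first column is used as the species and that `NA` in the third column means there is no genome.

```python
import io


def read_species_with_genomes(table):
    """Collect the first column of every row whose third column is not 'NA'."""
    table.readline()  # skip the header line, as on the slide
    selected = []
    for textline in table:
        line = textline.split()
        # rows with fewer than three columns are skipped in this sketch
        if len(line) > 2 and line[2] != "NA":
            selected.append(line[0])
    return selected


def expand_targets(pattern, species):
    """Plain-Python stand-in for expand(pattern, sp=species)."""
    return [pattern.format(sp=sp) for sp in species]


# Hypothetical table: species, common name, genome accession.
example_table = io.StringIO(
    "species\tcommon_name\taccession\n"
    "monkey\tmacaque\tACC_0001\n"
    "lion\tlion\tACC_0002\n"
    "unicorn\tunicorn\tNA\n"
)

species = read_species_with_genomes(example_table)
targets = expand_targets("data/{sp}/genome.fa.gz", species)
```

For this made-up table, `targets` holds the two genome paths for monkey and lion, so `download_all` would request exactly the files that slide 16 listed by hand.
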
  18. Running snakemake
    shell
        snakemake download_all

  19. Ok, this is not a tutorial about snakemake. What about Vital-it?
  20. Running snakemake on Vital-it
    shell
        # snakemake v3.13.0 is installed
        # Warning, the following lines are stupid
        snakemake download_all --jobs 10 --cluster "bsub \
            -J snakejobs \
            -q normal \
            -n 1 \
            -M 5000000 \
            -R \"span[hosts=1] rusage[tmp=50000] span[ptile=1]\" \
            -o \"logs/log.out\" \
            -e \"logs/log.err\""
  21. Snakefile can specify resources in rules
    Snakefile
        rule download_genome:
            threads: 1
            resources: mem=2000000, tmp=3000
            output: "data/{sp}/genome.fa.gz"
            shell: "download_genome.sh {wildcards.sp} genome_table.tsv {output}"
  22. When executed, resources are pulled for every submitted job
    shell
        snakemake download_all --jobs 10 --cluster "bsub \
            -J {rule} \
            -q normal \
            -n {threads} \
            -M {resources.mem} \
            -R \"span[hosts=1] rusage[tmp={resources.tmp}]\" \
            -o \"logs/{rule}.{wildcards}.out\" \
            -e \"logs/{rule}.{wildcards}.err\""
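
What "pulled for every submitted job" means can be shown with ordinary Python string formatting: the `--cluster` string is a template and the placeholders are filled from the rule that is being submitted. The sketch below is only a model of that substitution, not snakemake's code. The function name is made up, and rendering `{wildcards}` as `sp=monkey` is an assumption of this example; snakemake may print the wildcards differently.

```python
from types import SimpleNamespace

# The --cluster string from the slide, written on one line.
CLUSTER_TEMPLATE = (
    "bsub -J {rule} -q normal -n {threads} -M {resources.mem} "
    '-R "span[hosts=1] rusage[tmp={resources.tmp}]" '
    '-o "logs/{rule}.{wildcards}.out" -e "logs/{rule}.{wildcards}.err"'
)


def submission_command(rule, threads, resources, wildcards):
    """Fill the cluster template with the values of one job."""
    return CLUSTER_TEMPLATE.format(
        rule=rule,
        threads=threads,
        # attribute access ({resources.mem}) needs an object, not a dict
        resources=SimpleNamespace(**resources),
        # assumed rendering of the wildcards, e.g. "sp=monkey"
        wildcards=",".join(f"{key}={value}" for key, value in wildcards.items()),
    )


command = submission_command(
    rule="download_genome",
    threads=1,
    resources={"mem": 2000000, "tmp": 3000},
    wildcards={"sp": "monkey"},
)
```

For the `download_genome` rule of slide 21 this gives a `bsub` call with `-n 1`, `-M 2000000` and `rusage[tmp=3000]`, and a separate log file per rule and species, without any of these numbers being written on the command line.
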
  23. Ultimate goal: no Vital-it boilerplate code

  24. Snakemake project template https://github.com/KamilSJaron/snakemake-vital-it-template

  25. Classical job scripts
        #BSUB -L /bin/bash
        #BSUB -q normal
        #BSUB -n 16
        #BSUB -M 25165824
        #BSUB -R "rusage[tmp=70000] span[ptile=16]"

        INPUTDIR=/scratch/data/raw_reads/
        INPUT=raw_reads.fq.gz
        LOCALDIR=/scratch/local/daily/$USER/$JOBID
        TARGETPATH=/scratch/data/$1/trimmed_reads

        mkdir -p $LOCALDIR $TARGETPATH
        cp $INPUTDIR/$INPUT .
        trimmomatic PE -threads 16 ... $INPUT
        mv reads[12].fq.gz $TARGETPATH
        rm $INPUT
        rmdir $LOCALDIR
  26. My alternative
        scripts/use_local.sh <script> <arguments> <output>
    the first argument is the script that gets executed
    the last argument is the output (can be specified using wildcards or a directory)
    1. copy to local disk all arguments that are valid files
    2. execute the script (first argument)
    3. move the output back to where snakemake was executed (last argument)
    4. remove all files that were copied as an input
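
The four steps of the wrapper can be written down as a short Python function. This is not the code of `scripts/use_local.sh` from the template, only a sketch of the same idea under a few assumptions: the wrapped script is a shell script that takes its output name as the last argument, the local disk is replaced by a temporary directory (on the cluster it would be the node-local scratch), and all names in the code are made up for this example.

```python
import os
import shutil
import subprocess
import tempfile


def use_local(script, arguments, output, local_root=None):
    """Run `script` on copies of its input files in a local directory."""
    origin = os.getcwd()
    script_path = os.path.abspath(script)
    output_name = os.path.basename(os.path.normpath(output))
    destination = os.path.join(origin, os.path.normpath(output))
    local_dir = tempfile.mkdtemp(dir=local_root)
    copied, local_args = [], []
    try:
        # 1. copy to local disk all arguments that are valid files
        for argument in arguments:
            if os.path.isfile(argument):
                local_copy = os.path.join(local_dir, os.path.basename(argument))
                shutil.copy(argument, local_copy)
                copied.append(local_copy)
                local_args.append(os.path.basename(argument))
            else:
                local_args.append(argument)  # options are passed unchanged
        # 2. execute the script (first argument) inside the local directory
        subprocess.run(
            ["sh", script_path, *local_args, output_name],
            cwd=local_dir,
            check=True,
        )
        # 3. move the output back to where the wrapper was called from
        os.makedirs(os.path.dirname(destination), exist_ok=True)
        shutil.move(os.path.join(local_dir, output_name), destination)
    finally:
        # 4. remove all files that were copied as an input
        for local_copy in copied:
            if os.path.exists(local_copy):
                os.remove(local_copy)
        shutil.rmtree(local_dir, ignore_errors=True)
    return destination
```

A call such as `use_local("count.sh", ["hamlet.txt"], "results/hamlet_wc.txt")` would then run the hypothetical `count.sh` on a local copy of `hamlet.txt` and leave only the result in `results/`, which is the part of the classical job script that otherwise gets repeated in every job.
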
  28. Snakemake project template
    https://github.com/KamilSJaron/snakemake-vital-it-template
        ssh prd.vital-it.ch
        cd /scratch/beegfs/monthly/$USER
        git clone \
            git@github.com:KamilSJaron/snakemake-vital-it-template.git
        mv snakemake-vital-it-template the_coolest_study_ever
        cd the_coolest_study_ever
        git remote set ...
        ...
  29. Thank you for your attention!