Targeted Sequencing Pipeline

Targeted Sequencing Pipeline
* pipeline features
* pipeline requirements
* discovery mode
* how does it work
* improvements
* common problems
* Results and Validation

Targeted Sequencing Pipeline
MANIFEST OR NOTHING! WE HAVE A LOT OF DATAAAA
Positions known? Yes: CLASSIC VERSION. No: DISCOVERY MODE.
fastq → TSP

Targeted Sequencing Pipeline
Fiction: "Job already sent …"
Reality: "I am gonna trim these FASTQ agaiiiiin !!"

Targeted Sequencing Pipeline
[Workflow diagram. Labels: fastq, TSP, Bam, Count, Binomial, Dependencies, Coverage, Allele frequencies, log, qsub, drmaa, "rocks down", rsync, Excel, Feedback, quality]

Targeted Sequencing Pipeline Output structure
/share/lustre/archive/MiSeq/MiSeq_Analysis_Files/<run_id>/

Targeted Sequencing Pipeline Features
* Alignment using bwa-mem
* alignment problems
* improvements
* Xenograft project showed some limitations
* Binomial exact test
* Background size
* Code working on decent coverage
* No tests yet
* Variant tagging needs some more work
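The binomial exact test listed above can be sketched as follows. This is only an illustration, not the pipeline's code: the function names, the tags, the `alpha` threshold and the idea of a single per-position background error rate are all assumptions, since the slides do not show how the background is estimated. The test asks how likely it is to see at least the observed number of alternate-allele reads if they were produced by background noise alone.

```python
from math import exp, lgamma, log


def log_binomial_pmf(i, n, p):
    """Log of P(X = i) for X ~ Binomial(n, p), computed in log space
    so that deep coverage (large n) does not overflow."""
    return (lgamma(n + 1) - lgamma(i + 1) - lgamma(n - i + 1)
            + i * log(p) + (n - i) * log(1 - p))


def binomial_sf(k, n, p):
    """One-sided exact p-value: P(X >= k) for X ~ Binomial(n, p)."""
    if not 0 < p < 1:
        raise ValueError("p must be strictly between 0 and 1")
    if k <= 0:
        return 1.0
    total = sum(exp(log_binomial_pmf(i, n, p)) for i in range(k, n + 1))
    return min(1.0, total)


def tag_variant(alt_count, depth, background_rate, alpha=0.01):
    """Tag a position by comparing its alternate-allele count with the
    assumed background error rate (hypothetical tags and threshold)."""
    p_value = binomial_sf(alt_count, depth, background_rate)
    tag = "SIGNIFICANT" if p_value < alpha else "BACKGROUND"
    return p_value, tag
```

For example, `tag_variant(50, 1000, 0.001)` tags the position as significant, while `tag_variant(1, 1000, 0.001)` leaves it as background, because one alternate read in 1000 is what a 0.1 % error rate would produce on its own.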

Targeted Sequencing Pipeline

    git clone http://[email protected]/scm/pp/miseq-pipeline.git
    cd miseq-pipeline/pipeline/code
    python pipeline.py \
        --num_cpus 10 \
        --mode cluster \
        --install_dir <path_to_miseq>/software \
        <path_to_config_file>

Limitations:
- Manifest file
- positions format
- limited to point mutations

If you want to run this version:
- create_config_file_for_miseqpipeline.py: a single script that generates all the formats needed to run the pipeline

Targeted Sequencing Pipeline discovery mode

Input processing → Alignment recalibration → Variant calling and filtering → Variant characterization

* Fastq QC (not yet implemented)
* Alignment (BWA-MEM, bwa)
* (Bowtie2 to test?)
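Since Fastq QC is listed as not yet implemented, here is a minimal sketch of one check such a step might include: the fraction of reads longer than a length threshold. The function names and the choice of metric are assumptions for illustration and are not part of the pipeline. The sketch also assumes plain four-line FASTQ records.

```python
import io


def read_lengths(handle):
    """Yield the sequence length of each record in a four-line FASTQ stream."""
    for line_no, line in enumerate(handle):
        if line_no % 4 == 1:  # the second line of each record is the sequence
            yield len(line.strip())


def fraction_longer_than(handle, threshold):
    """Return the fraction of reads whose length exceeds `threshold`."""
    lengths = list(read_lengths(handle))
    if not lengths:
        return 0.0
    return sum(length > threshold for length in lengths) / len(lengths)


# Tiny in-memory example with two reads of length 4 and 8.
example = io.StringIO("@r1\nACGT\n+\nIIII\n@r2\nACGTACGT\n+\nIIIIIIII\n")
fraction = fraction_longer_than(example, 5)
```

For a real run, the in-memory handle would be replaced by an opened FASTQ file (or a `gzip.open(..., "rt")` handle for compressed input).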

6% > 200 bp garbage-in garbage-out

Input processing
* Fastq QC
* Alignment (BWA-MEM, bwa)
* (Bowtie2 to test?)

Alignment recalibration
* Recalibrate base quality from sequencers
* Create targets for local realignment (indels)
* Local realignment (corrects a small fraction of the alignment)
* Can be time consuming depending on the depth
Broad best practices information: http://goo.gl/8sRWCF

Variant calling and filtering
* Call variants (any caller); I tested UnifiedGenotyper and HaplotypeCaller
* Merge all VCFs for all the samples
* Intersect with amplicon positions and filter out calls outside the targets
* Get unique calls across all samples

Variant characterization
* Build counts file per sample
* Create foreground and background
* Binomial test and tag the variant
* Report generation
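The "intersect with amplicon positions" and "get unique calls across all samples" steps can be illustrated in plain Python. This is a sketch under stated assumptions, not the pipeline's implementation: the data layout (calls as `(chrom, pos, ref, alt)` tuples, targets as 1-based inclusive intervals per chromosome) and the function names are hypothetical, and a real run would more likely work on VCF and BED files directly.

```python
def in_targets(chrom, pos, targets):
    """Return True if `pos` falls inside any amplicon interval on `chrom`.

    `targets` maps a chromosome name to a list of (start, end) tuples,
    treated here as 1-based inclusive coordinates (an assumption).
    """
    return any(start <= pos <= end for start, end in targets.get(chrom, []))


def unique_on_target_calls(calls_by_sample, targets):
    """Merge calls from all samples, drop those outside the targets,
    and return each remaining (chrom, pos, ref, alt) once, sorted."""
    unique = set()
    for calls in calls_by_sample.values():
        for chrom, pos, ref, alt in calls:
            if in_targets(chrom, pos, targets):
                unique.add((chrom, pos, ref, alt))
    return sorted(unique)


# Hypothetical example: one amplicon and two samples.
targets = {"chr1": [(100, 200)]}
calls_by_sample = {
    "sample_1": [("chr1", 150, "A", "T"), ("chr1", 300, "G", "C")],
    "sample_2": [("chr1", 150, "A", "T"), ("chr2", 150, "A", "T")],
}
positions = unique_on_target_calls(calls_by_sample, targets)
```

The resulting list of unique on-target positions is what the later per-sample counting step would iterate over.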

Targeted Sequencing Pipeline

    git clone <stash>
    python pipeline.py \
        --num_cpus 40 \
        --mode cluster \
        --install_dir <path_to_tsp>/software/ \
        <path_to_config.yaml> \
        --dedup no \
        --targets <path_to_amplicon_targets> \
        --email your@email

Installation instructions are in the README.md of the repo.

Targeted Sequencing Pipeline Validation
Xenograft SA495: classic-bwa, classic-bowtie2, discovery-mode

Targeted Sequencing Pipeline Aligners: test, compare and pick
NSG
* classic-bwa: 86 %
* classic-bowtie2: 97 %
* discovery-mode: 97 %

Targeted Sequencing Pipeline Coverage

Targeted Sequencing Pipeline Allelic Frequencies

Targeted Sequencing Pipeline
- Bugs found and fixed in the original codebase
- Classic version on Stash (branch rad)
- Bowtie over BWA (Single Cell)
- Discovery mode (++features)
- mutation-seq deep seq model
- Discovery mode on Stash

- Classic version, factory-based version: done and being tested
- Add components for the discovery mode


More Decks by Radhouane Aniba
