
Targeted Sequencing Pipeline

Notes about the current status of the pipeline and our analyses

Radhouane Aniba

August 08, 2014

Transcript

  1. Targeted Sequencing Pipeline

    pipeline features, pipeline requirements, discovery mode, how does it work, improvements, common problems, results and validation
  2. Targeted Sequencing Pipeline

    MANIFEST! OR NOTHING! | WE HAVE A LOT OF DATAAAA | CLASSIC VERSION | DISCOVERY MODE | Positions known? Yes / No | TSP
  3. Targeted Sequencing Pipeline

    MANIFEST! OR NOTHING! | WE HAVE A LOT OF DATAAAA | CLASSIC VERSION | DISCOVERY MODE | Positions known? Yes / No | fastq | TSP
  4. Targeted Sequencing Pipeline

    WE HAVE A LOT OF DATAAAA | CLASSIC VERSION | DISCOVERY MODE | Positions known? Yes / No | fastq | TSP | Fiction: "Job already sent…"
  5. Targeted Sequencing Pipeline

    WE HAVE A LOT OF DATAAAA | CLASSIC VERSION | DISCOVERY MODE | Positions known? Yes / No | fastq | TSP | Reality: "I am gonna trim these FASTQ agaiiiiin!!"
  6. Targeted Sequencing Pipeline

    MANIFEST! OR NOTHING! | WE HAVE A LOT OF DATAAAA | CLASSIC VERSION | DISCOVERY MODE | Positions known? Yes / No | fastq | TSP | Bam | Count | Binomial | Dependencies | Coverage | Allele frequencies | log log log!! | qsub | drmaa | rocks down | rocks down
  7. Targeted Sequencing Pipeline

    MANIFEST! OR NOTHING! | WE HAVE A LOT OF DATAAAA | CLASSIC VERSION | DISCOVERY MODE | Positions known? Yes / No | fastq | TSP | Bam | Count | Binomial | Dependencies | Coverage | Allele frequencies | log log log!! | qsub | drmaa | rocks down | rocks down | rsync | Excel | Feedback | quality
  9. Targeted Sequencing Pipeline: Features

    * Alignment using bwa-mem
    * alignment problems
    * improvements
    * Xenograft project showed some limitations
    * Binomial exact test
    * Background size
    * Code working on decent coverage
    * No tests yet
    * Variant tagging needs some more work
  10. Targeted Sequencing Pipeline

    git clone http://[email protected]/scm/pp/miseq-pipeline.git
    cd miseq-pipeline/pipeline/code
    python pipeline.py \
        --num_cpus 10 \
        --mode cluster \
        --install_dir <path_to_miseq>/software \
        <path_to_config_file>

    Limitations:
    - Manifest file
    - positions format
    - limited to point mutations

    If you want to run this version:
    - create_config_file_for_miseqpipeline.py: one single script that generates all the formats needed to run the pipeline
  11. Targeted Sequencing Pipeline

    git clone http://[email protected]/scm/pp/miseq-pipeline.git
    cd miseq-pipeline/pipeline/code
    python pipeline.py \
        --num_cpus 10 \
        --mode cluster \
        --install_dir <path_to_miseq>/software \
        <path_to_config_file>
  12. Input processing | Alignment recalibration | Variant calling and filtering | Variant characterization

    Input processing:
    * Fastq QC (not yet implemented)
    * Alignment (BWA-MEM, bwa)
    * (Bowtie2 to test?)
  13. Input processing | Alignment recalibration | Variant calling and filtering | Variant characterization

    Input processing:
    * Fastq QC
    * Alignment (BWA-MEM, bwa)
    * (Bowtie2 to test?)

    Alignment recalibration:
    * Recalibrate base quality from sequencers
    * Create targets for local realignment (indels)
    * Local realignment (correction of a small fraction of the alignment)
    * Can be time-consuming depending on the depth

    Broad best practices information: http://goo.gl/8sRWCF
  14. Input processing | Alignment recalibration | Variant calling and filtering | Variant characterization

    Input processing:
    * Fastq QC
    * Alignment (BWA-MEM, bwa)
    * (Bowtie2 to test?)

    Alignment recalibration:
    * Recalibrate base quality from sequencers
    * Create targets for local realignment (indels)
    * Local realignment (correction of a small fraction of the alignment)
    * Can be time-consuming depending on the depth

    Broad best practices information: http://goo.gl/8sRWCF

    Variant calling and filtering:
    * Call variants (any caller); I tested UnifiedGenotyper and HaplotypeCaller
    * Merge all VCFs for all the samples
    * Intersect with amplicon positions and filter out calls outside the targets
    * Get unique calls across all samples
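The last two filtering steps on this slide (intersect calls with the amplicon positions, then keep the unique calls across samples) can be sketched in a few lines. This is only an illustration, not the pipeline's own code: the tuple layouts, the function names `in_targets` and `unique_on_target_calls`, and the example coordinates are all assumptions, and the real pipeline presumably works on VCF and target files rather than in-memory lists.

```python
from typing import Dict, List, Tuple

# Hypothetical in-memory representations (assumptions, not the pipeline's formats).
Call = Tuple[str, int, str, str]   # (chrom, pos, ref, alt), 1-based position
Target = Tuple[str, int, int]      # (chrom, start, end), 1-based, inclusive


def in_targets(call: Call, targets: List[Target]) -> bool:
    """Return True if the call position falls inside any amplicon target."""
    chrom, pos, _, _ = call
    return any(c == chrom and start <= pos <= end for c, start, end in targets)


def unique_on_target_calls(
    calls_by_sample: Dict[str, List[Call]], targets: List[Target]
) -> List[Call]:
    """Drop off-target calls, then collapse identical calls seen in several samples."""
    unique = set()
    for calls in calls_by_sample.values():
        unique.update(call for call in calls if in_targets(call, targets))
    return sorted(unique)


# Made-up example data.
targets = [("chr1", 100, 200), ("chr2", 500, 650)]
calls_by_sample = {
    "sampleA": [("chr1", 150, "A", "T"), ("chr1", 300, "G", "C")],
    "sampleB": [("chr1", 150, "A", "T"), ("chr2", 600, "C", "G")],
}
unique_calls = unique_on_target_calls(calls_by_sample, targets)
print(unique_calls)
```

The linear scan over targets is fine for a small amplicon panel; a large target set would call for an interval index instead.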
  15. Input processing | Alignment recalibration | Variant calling and filtering | Variant characterization

    Input processing:
    * Fastq QC
    * Alignment (BWA-MEM, bwa)
    * (Bowtie2 to test?)

    Alignment recalibration:
    * Recalibrate base quality from sequencers
    * Create targets for local realignment (indels)
    * Local realignment (correction of a small fraction of the alignment)
    * Can be time-consuming depending on the depth

    Broad best practices information: http://goo.gl/8sRWCF

    Variant calling and filtering:
    * Call variants (any caller); I tested UnifiedGenotyper and HaplotypeCaller
    * Merge all VCFs for all the samples
    * Intersect with amplicon positions and filter out calls outside the targets
    * Get unique calls across all samples

    Variant characterization:
    * Build counts file per sample
    * Create foreground and background
    * Binomial test and tag the variant
    * Report generation
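The characterization steps (counts, foreground and background, binomial test, tagging) can be sketched as below. The slides do not say how foreground and background are defined or which tags are used, so everything here is an assumption: the foreground is taken to be the alternate-allele count and depth at the candidate position, the background a pool of reads used to estimate an error rate, and the tag names, the `alpha` threshold, and the function names are hypothetical. The test itself is a standard one-sided exact binomial test, computed in log space so that deep coverage does not overflow.

```python
from math import exp, lgamma, log, log1p


def binomial_sf(k: int, n: int, p: float) -> float:
    """Exact upper tail P(X >= k) for X ~ Binomial(n, p)."""
    if k <= 0:
        return 1.0
    if p <= 0.0:
        return 0.0
    if p >= 1.0:
        return 1.0
    log_p, log_q = log(p), log1p(-p)
    total = 0.0
    for i in range(k, n + 1):
        # log of C(n, i) * p^i * (1 - p)^(n - i)
        log_pmf = (
            lgamma(n + 1) - lgamma(i + 1) - lgamma(n - i + 1)
            + i * log_p + (n - i) * log_q
        )
        total += exp(log_pmf)
    return min(total, 1.0)


def tag_variant(alt_count, depth, bg_alt_count, bg_depth, alpha=0.01):
    """Tag a candidate by testing its foreground counts against the background rate.

    The tag names and the alpha threshold are illustrative assumptions.
    """
    bg_rate = bg_alt_count / bg_depth
    p_value = binomial_sf(alt_count, depth, bg_rate)
    tag = "variant" if p_value < alpha else "background"
    return tag, p_value


# Made-up counts: background error rate of 10 / 10000 = 0.001.
print(tag_variant(alt_count=50, depth=1000, bg_alt_count=10, bg_depth=10000))
print(tag_variant(alt_count=1, depth=1000, bg_alt_count=10, bg_depth=10000))
```

With these made-up numbers, 50 alternate reads out of 1000 is far above what a 0.001 error rate would produce, while a single alternate read is not distinguishable from background.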
  16. Targeted Sequencing Pipeline

    git clone <stash>
    python pipeline.py \
        --num_cpus 40 \
        --mode cluster \
        --install_dir <path_to_tsp>/software/ \
        <path_to_config.yaml> \
        --dedup no \
        --targets <path_to_amplicon_targets> \
        --email your@email

    Installation instructions are in the README.md of the repo.
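For readers who want to see how a command line of this shape could be parsed, here is a minimal `argparse` sketch. It only mirrors the flags shown on the slide; it is not the actual `pipeline.py`. The defaults, the `local` choice for `--mode`, the `yes`/`no` choices for `--dedup`, the help strings, and the example file names are all assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser that mirrors the flags shown on the slide (sketch only)."""
    parser = argparse.ArgumentParser(prog="pipeline.py")
    parser.add_argument("config", help="path to the YAML config file")
    parser.add_argument("--num_cpus", type=int, default=1)
    # "cluster" comes from the slide; "local" is an assumed alternative.
    parser.add_argument("--mode", choices=["local", "cluster"], default="local")
    parser.add_argument("--install_dir", required=True)
    # The slide shows "--dedup no"; the "yes" value is assumed.
    parser.add_argument("--dedup", choices=["yes", "no"], default="no")
    parser.add_argument("--targets", help="path to the amplicon targets")
    parser.add_argument("--email", help="address for notifications")
    return parser


# Placeholder values standing in for the <...> paths on the slide.
args = build_parser().parse_args([
    "--num_cpus", "40",
    "--mode", "cluster",
    "--install_dir", "tsp/software/",
    "config.yaml",
    "--dedup", "no",
    "--targets", "amplicons.bed",
    "--email", "your@email",
])
print(args)
```

As on the slide, the config path is a positional argument sitting between the optional flags, which `argparse` accepts.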
  17. Targeted Sequencing Pipeline

    git clone python pipeline.py \ --install_dir <path_to_tsp> <path_to_config.yaml> --targets -- Installation instructions on the
  18. Targeted Sequencing Pipeline

    - Bugs found and fixed in the original codebase
    - Classic version on Stash (branch rad)
    - Bowtie over BWA (Single Cell)
    - Discovery mode (++features)
    - mutation-seq deep seq model
    - Discovery mode on Stash

    - Classic version, factory-based version: done and being tested
    - Add components for the discovery mode