Upgrade to Pro — share decks privately, control downloads, hide ads and more …

Other GEMINI tools

Aaron Quinlan
August 19, 2015

Other GEMINI tools

Aaron Quinlan

August 19, 2015
Tweet

More Decks by Aaron Quinlan

Other Decks in Science

Transcript

  1. Analysis options 15 min: finish up recessive and ROH 5

    min: annotate tool 5 min: variant impacts 10 min: Mendel violations 10: faster, more flexible loading (VCFANNO) 10: gene.iobio
  2. The autosomal_recessive tool. The -­‐-­‐min-­‐kindreds  option: This specifies the number

    of families required to have a variant in the same gene in order for it to be reported. For example, we may only be interested in candidates where at least 2 families have a variant in that gene.
  3. The autosomal_recessive tool: other options The —gt-­‐pl-­‐max  option: In order

    to eliminate less confident genotypes, it is possible to enforce a maximum PL value for each sample. On this scale, lower values indicate more confidence that the called genotype is correct. 10 is a reasonable value: What is the “PL”? https://samtools.github.io/hts-specs/VCFv4.2.pdf What is a “Phred scaled” genotype likelihood?
  4. The autosomal_recessive tool: other options What is a “Phred scaled”

    genotype likelihood? Example calculation based on the GATK HaplotypeCaller http://gatkforums.broadinstitute.org/discussion/5913/math-notes-how-pl-is-calculated-in-haplotypecaller
  5. bgzip homoz_region.bed tabix -p bed homoz_region.bed.gz Intersect with observed homozygosity

    region(s) (Example commands) 1. Tabix a BED file with the observed homozygosity regions gemini annotate -f homoz_region.bed.gz \ –c homoz_region \ -t boolean \ AR.db 2. Use the annotate tool to flag variants that overlap these regions. gemini autosomal_recessive AR.db --columns "chrom, start, end, ref, alt, filter, qual, gene, impact, aaf_esp_ea, aaf_1kg_eur” -–filter "filter is NULL and aaf_esp_ea < 0.1 and (impact_severity = 'HIGH' or impact_severity = 'MED') and homoz_region ==1” 3. Filter variants for those that overlap these regions.
  6. Intersect with observed homozygosity region(s) gemini roh AR.db Run the

    roh tool to search for candidate runs of homozygosity.
  7. Intersect with observed homozygosity region(s) gemini roh AR.db | sort

    -k7nr Run the roh tool to search for candidate runs of homozygosity. chrom start end sample num_of_snps density_per_kb run_length_in_bp! chr2 92134506 95353861 S101 151 0.0475 3219355! chr2 92188685 95359306 S101 75 0.0243 3170621! chr2 92189814 95353861 S106 68 0.0221 3164047! chr2 92189814 95329940 S138 50 0.0166 3140126! chr2 92262183 95365848 S106 57 0.019 3103665! chr2 92264378 95357391 S138 39 0.0133 3093013! chr2 233336080 234631638 S138 2583 1.9953 1295558! chr2 184922511 185985238 S101 1983 1.8678 1062727! chr2 119980657 121036333 S106 1921 1.8216 1055676! chr2 184994526 186030052 S101 1956 1.8908 1035526! chr2 136161952 137038729 S106 1529 1.7462 876777! chr2 119952406 120805007 S106 1601 1.8801 852601! chr2 135168500 136012610 S106 1509 1.7901 844110! chr2 136266716 137041366 S106 1372 1.7737 774650! chr2 134764013 135515261 S101 1559 2.0779 751248! chr2 134821134 135536838 S101 1477 2.0665 715704! chr2 184390618 185105579 S138 1453 2.0351 714961! chr2 187780557 188481382 S101 1279 1.8278 700825! chr2 135162250 135837013 S106 1268 1.8821 674763
  8. Caveats when screening for runs of homozygosity 1. Difficult with

    exome data. Density of markers. 2. Shorter runs of homozygosity happen often by chance. 3. Density of homozygotes is important. http://www.nature.com/nature/journal/v449/n7164/extref/nature06258-s1.pdf
  9. mendelian_error $  gemini  mendel_errors  -­‐-­‐columns  "chrom,start,end"  test.mendel.db  -­‐-­‐gt-­‐pl-­‐max  1  

    chrom              start      end            family_members              family_genotypes              violation                              violation_prob   chr1                10670      10671        dad,mom,child                G/G,G/G,G/C                        plausible  de  novo              0.962   chr1                28493      28494        dad,mom,child                T/C,T/T,C/C                        loss  of  heterozygosity    0.660   chr1                28627      28628        dad,mom,child                C/C,C/C,C/T                        plausible  de  novo              0.989   chr1                267558    267560      dad,mom,child                C/C,C/C,CT/C                      plausible  de  novo              0.896   chr1                537969    537970      dad,mom,child                C/C,C/C,C/T                        plausible  de  novo              0.928   chr1                547518    547519      dad,mom,child                G/G,G/G,G/T                        plausible  de  novo              1.000   chr1                589081    589086      dad,mom,child                G/G,GAGAA/GAGAA,G/G        uniparental  disomy            0.940   chr1                749688    749689      dad,mom,child                T/T,T/T,G/G                        implausible  de  novo          0.959   chr1                788944    788945      dad,mom,child                C/C,G/G,G/G                        uniparental  disomy            0.914   chr1                1004248  1004249    dad,mom,child                G/G,G/G,G/C                        plausible  de  novo              1.000
  10. GEMINI current • recommended, vetted annotations • fast loading •

    any organism supported by VEP / SnpEff • custom annotations treated same as vetted
  11. GEMINI future • most annotations are fixed • can add

    some custom annotations after load • loading is slow • stuck to single genome version
  12. Configuration [[annotation]]   file="ALL.wgs.phase3_shapeit2_mvncall_integrated_v5a. 20130502.sites.tidy.vcf.gz"   fields=["AF",  "AMR_AF",  "EUR_AF",  ...]

      names=["in_1kg_flag",  "aaf_1kg_amr_float",   "aaf_1kg_eas_float",  ...]   ops=["flag",  "max",  "max",  ...]   ! Specify an annotation, file, which (VCF INFO) fields to pull, and how to report them. ! We’ll include a vetted file like this for gemini for human, but users can modify it and/or create their own for other organisms. ! Possible to create custom database with only columns of interest. https://github.com/brentp/vcfanno/blob/master/example/gem.conf
  13. vcfanno loading performance • 10 million ExAC variants annotated with

    34 annotations from 11 distinct files in ~30 minutes • current gemini takes at least 40X longer to load the same number of variants.
  14. Modularize functionality • pedigree (determine modes of inheritance) • inheritance

    models (rules for autosomal rec/ dom, de novo) • effect parsing/prioritizing • normalize between SnpEff / VEP • prioritize by impact (missense over synonymous) separating functionality improves code reuse, eases testing, and simplifies code maintenance