of families required to have a variant in the same gene in order for it to be reported. For example, we may only be interested in candidates where at least 2 families have a variant in that gene.
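The family-count threshold can be illustrated with a small sketch. This is not GEMINI's implementation: the `(family_id, gene)` record layout, the gene names, and the function name `genes_meeting_min_families` are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def genes_meeting_min_families(candidates, min_families=2):
    """Return genes with candidate variants in at least `min_families` distinct families.

    `candidates` is an iterable of (family_id, gene) pairs (hypothetical layout).
    """
    families_by_gene = defaultdict(set)
    for family_id, gene in candidates:
        # A set ensures a family with several variants in one gene is counted once.
        families_by_gene[gene].add(family_id)
    return sorted(
        gene for gene, families in families_by_gene.items()
        if len(families) >= min_families
    )


# Hypothetical candidates: GENE_A is hit in two families, GENE_B twice in one family.
candidates = [
    ("fam1", "GENE_A"),
    ("fam2", "GENE_A"),
    ("fam1", "GENE_B"),
    ("fam1", "GENE_B"),
]
print(genes_meeting_min_families(candidates, min_families=2))
```

Counting distinct families rather than variants is the point of the sketch: two variants in the same family do not satisfy a threshold of 2.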
To eliminate less confident genotypes, it is possible to enforce a maximum PL value for each sample. On this scale, lower values indicate more confidence that the called genotype is correct. 10 is a reasonable value.
What is the “PL”? https://samtools.github.io/hts-specs/VCFv4.2.pdf
What is a “Phred scaled” genotype likelihood? Example calculation based on the GATK HaplotypeCaller: http://gatkforums.broadinstitute.org/discussion/5913/math-notes-how-pl-is-calculated-in-haplotypecaller
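As a rough illustration of the Phred scaling, the sketch below converts genotype likelihoods to PL values using PL = -10 * log10(likelihood), then subtracts the smallest value so that the most likely genotype gets PL 0 (the normalization described in the HaplotypeCaller notes). The input likelihoods and the function name `phred_scaled_likelihoods` are made up for the example; this is not GATK or GEMINI code.

```python
import math


def phred_scaled_likelihoods(likelihoods):
    """Convert genotype likelihoods P(data | genotype) to normalized integer PLs."""
    raw = [-10.0 * math.log10(p) for p in likelihoods]
    best = min(raw)  # the most likely genotype has the smallest raw value
    # Normalize so the most likely genotype has PL 0, then round to integers.
    return [int(round(value - best)) for value in raw]


# Hypothetical likelihoods for genotypes 0/0, 0/1 and 1/1 at one site.
pls = phred_scaled_likelihoods([1e-6, 0.5, 1e-3])
print(pls)
```

Here the heterozygous genotype is the most likely, so it receives PL 0; the larger the PL of another genotype, the less likely that genotype is relative to the best one.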
region(s) (Example commands)
1. Tabix a BED file with the observed homozygosity regions.
2. Use the annotate tool to flag variants that overlap these regions.
    gemini annotate -f homoz_region.bed.gz \
        -c homoz_region \
        -a boolean \
        AR.db
3. Filter variants for those that overlap these regions.
    gemini autosomal_recessive AR.db \
        --columns "chrom, start, end, ref, alt, filter, qual, gene, impact, aaf_esp_ea, aaf_1kg_eur" \
        --filter "filter is NULL and aaf_esp_ea < 0.1 and (impact_severity = 'HIGH' or impact_severity = 'MED') and homoz_region = 1"
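Conceptually, the boolean annotation in these steps is an interval-overlap test between each variant and the homozygosity regions. The following is a minimal sketch of that test, assuming BED-style 0-based, half-open coordinates; the function name `overlaps_any` and the example coordinates are hypothetical, and GEMINI itself relies on the tabix index rather than a linear scan.

```python
def overlaps_any(chrom, start, end, regions):
    """Return True if the interval [start, end) on `chrom` overlaps any region.

    `regions` is a list of (chrom, start, end) tuples in BED-style coordinates.
    """
    return any(
        region_chrom == chrom and start < region_end and region_start < end
        for region_chrom, region_start, region_end in regions
    )


# Hypothetical homozygosity regions and variants.
regions = [("chr1", 1000, 5000), ("chr2", 200, 900)]
variants = [("chr1", 1200, 1201), ("chr1", 5000, 5001), ("chr2", 100, 201)]

# One boolean per variant, analogous to the homoz_region column.
flags = [overlaps_any(chrom, start, end, regions) for chrom, start, end in variants]
print(flags)
```

The second variant starts exactly at the end of the first region, so under half-open coordinates it does not overlap.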
exome data. Density of markers.
2. Shorter runs of homozygosity often occur by chance.
3. Density of homozygotes is important.
http://www.nature.com/nature/journal/v449/n7164/extref/nature06258-s1.pdf
names=["in_1kg_flag", "aaf_1kg_amr_float", "aaf_1kg_eas_float", ...]
ops=["flag", "max", "max", ...]
- Specify an annotation file, which (VCF INFO) fields to pull, and how to report them.
- We’ll include a vetted file like this with gemini for human data, but users can modify it and/or create their own for other organisms.
- It is possible to create a custom database with only the columns of interest.
https://github.com/brentp/vcfanno/blob/master/example/gem.conf
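To make the pairing of `names` and `ops` concrete, here is a small sketch of how each output name could be filled by applying its operation to the values pulled from overlapping annotation records. It covers only the two operations shown above and is an illustration under assumed inputs, not vcfanno's code (vcfanno is written in Go); `apply_op`, `annotate`, and the input layout are hypothetical.

```python
def apply_op(op, values):
    """Reduce the values collected from overlapping annotation records."""
    if op == "flag":
        # Presence/absence: True if any overlapping record was found.
        return len(values) > 0
    if op == "max":
        # Largest value among overlapping records, or None if there were none.
        return max(values) if values else None
    raise ValueError(f"unsupported op in this sketch: {op}")


def annotate(names, ops, pulled_values):
    """Pair each output name with its op and reduce that field's pulled values."""
    return {
        name: apply_op(op, values)
        for name, op, values in zip(names, ops, pulled_values)
    }


names = ["in_1kg_flag", "aaf_1kg_amr_float", "aaf_1kg_eas_float"]
ops = ["flag", "max", "max"]
# Hypothetical values from two overlapping annotation records for one variant.
pulled_values = [[True, True], [0.01, 0.03], [0.2, 0.05]]
result = annotate(names, ops, pulled_values)
print(result)
```

The lists are positional: the first entry of `ops` applies to the first entry of `names`, and so on, which is why both lists must have the same length.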