following Github Gist to find each command for this session. Commands should be copy/pasted from this Gist Aaron Quinlan University of Utah ! ! ! ! ! quinlanlab.org 1 https://gist.github.com/arq5x/9e1928638397ba45da2e#file-autosomal-dominant-sh
VCF has been normalized and decomposed with VT 2. The VCF has been annotated with VEP. $ curl https://s3.amazonaws.com/gemini-‐tutorials/trio.trim.vep.vcf.gz > trio.trim.vep.vcf.gz $ curl https://s3.amazonaws.com/gemini-‐tutorials/dominant.ped > dominant.ped $ gemini load -‐-‐cores 4 \ -‐v trio.trim.vep.vcf.gz \ -‐t VEP \ -‐-‐skip-‐gene-‐tables \ -‐p dominant.ped \ ! trio.trim.vep.dominant.db Note: copy and paste the full command from the Github Gist to avoid errors http://gemini.readthedocs.org/en/latest/content/preprocessing.html#step-1-split-left-align-and-trim-variants 4
gene, impact, cadd_raw" \ --filter "filter is NULL and impact_severity != 'LOW' and (aaf_esp_ea <= 0.01 or aaf_esp_ea is NULL) and (aaf_exac_all <= 0.01 or aaf_exac_all is NULL)" trio.trim.vep.dominant.db \ wc -l Use ESP and ExAC to focus on rare variants Note: copy and paste the full command from the Github Gist 40 candidates 9
gene, impact, cadd_raw" \ --filter "filter is NULL and impact_severity == 'HIGH' and (aaf_esp_ea <= 0.01 or aaf_esp_ea is NULL) and (aaf_exac_all <= 0.01 or aaf_exac_all is NULL)" trio.trim.vep.dominant.db \ wc -l Let’s be more strict with the functional consequence Note: copy and paste the full command from the Github Gist chrom start end ref alt gene impact cadd_raw variant_id family_id family_members family_genotypes samples family_count chr17 28778816 28778817 T C CPD stop_loss 1.09 11860 family1 1805,1847,4805 T/C,T/T,T/C 1805,4805 1 chr2 21236250 21236251 G A APOB stop_gain 7.82 492 family1 1805,1847,4805 G/A,G/G,G/A 1805,4805 1 chr22 21363743 21363744 T C TUBA3FP splice_acceptor 2.46 15272 family1 1805,1847,4805 T/C,T/T,T/C 1805,4805 1 10
phenotype? chrom start end ref alt gene impact cadd_raw variant_id family_id family_members family_genotypes samples family_count chr17 28778816 28778817 T C CPD stop_loss 1.09 11860 family1 1805,1847,4805 T/C,T/T,T/C 1805,4805 1 chr2 21236250 21236251 G A APOB stop_gain 7.82 492 family1 1805,1847,4805 G/A,G/G,G/A 1805,4805 1 chr22 21363743 21363744 T C TUBA3FP splice_acceptor 2.46 15272 family1 1805,1847,4805 T/C,T/T,T/C 1805,4805 1 How could you extend the query from the previous slide to answer this? Hint: use the gemini documentation 11
command from the Github Gist $ gemini query \ -q "SELECT chrom, start, end, ref, alt, gene, impact, (gts).(*) \ FROM variants" \ --header \ --gt-filter "(gt_types).(phenotype==2).(==HET).(all) \ and (gt_types).(phenotype==1).(==HOM_REF).(all)” \ trio.trim.vep.dominant.db \ | head \ | column -t Affected individuals must be HET Unffected individuals must be HOM_REF chrom start end ref alt gene impact gts.1805 gts.1847 gts.4805 chr2 229933 229934 A G SH3YL1 intron A/G A/A A/G chr2 277249 277250 G A ACP1 downstream G/A G/G G/A chr2 675830 675831 G T TMEM18 intron G/T G/G G/T chr2 905441 905442 A T AC113607.2 upstream A/T A/A A/T chr2 1636903 1636904 C T PXDN UTR_3_prime C/T C/C C/T chr2 1642789 1642790 T C PXDN intron T/C T/T T/C chr2 3653841 3653842 C A COLEC11 intron C/A C/C C/A chr2 3660868 3660869 G A COLEC11 intron G/A G/G G/A chr2 3661001 3661002 G A COLEC11 intron G/A G/G G/A 16
Note: copy and paste the full command from the Github Gist $ gemini query \ -q "SELECT chrom, start, end, ref, alt, gene, impact, \ (gts).(*), (gt_depths).(*) \ FROM variants \ WHERE filter is NULL and impact_severity == 'HIGH' and (aaf_esp_ea <= 0.01 or aaf_esp_ea is NULL) and (aaf_exac_all <= 0.01 or aaf_exac_all is NULL)" \ --header \ --gt-filter "(gt_types).(phenotype==2).(==HET).(all) \ and (gt_types).(phenotype==1).(==HOM_REF).(all) \ and (gt_depths).(*).(>=20).(all)" \ trio.trim.vep.dominant.db \ | column -t chrom start end ref alt gene impact gts.1805 gts.1847 gts.4805 gt_depths.1805 gt_depths.1847 gt_depths.4805 chr2 21236250 21236251 G A APOB stop_gain G/A G/G G/A 46 72 112 chr17 28778816 28778817 T C CPD stop_loss T/C T/T T/C 144 171 207 chr22 21363743 21363744 T C TUBA3FP splice_acceptor T/C T/T T/C 36 56 241 Note that (pleasingly) these are the same three candidates as detected with the autosomal_dominant tool. 18