Identifying de novo mutations with GEMINI

Aaron Quinlan
August 18, 2015

Transcript

  1. Identifying de novo mutations with GEMINI

    Aaron Quinlan, University of Utah (quinlanlab.org)

    Please refer to the following GitHub Gist to find each command for this session. Commands should be copied and pasted from this Gist:
    https://gist.github.com/arq5x/9e1928638397ba45da2e#file-denovo-sh
  3. Why are there so many artifacts?

    • Prior probabilities: the more interesting something is, the less likely it is to be real.
    • If something can go wrong, it will.
    • Incorrect genotype assignment
    • Low coverage in one or more of the individuals in the family (especially the parents… why?)
    • Mismapping
    • Misalignment
    • Paralogy
    • Systematic artifacts
    • Somatic events
  4. Create a GEMINI database from a VCF

    Notes:
    1. The VCF has been normalized and decomposed with vt.
    2. The VCF has been annotated with VEP.

        $ curl https://s3.amazonaws.com/gemini-tutorials/trio.trim.vep.vcf.gz > trio.trim.vep.vcf.gz
        $ curl https://s3.amazonaws.com/gemini-tutorials/denovo.ped > denovo.ped
        $ gemini load --cores 4 \
                      -v trio.trim.vep.vcf.gz \
                      -t VEP \
                      --skip-gene-tables --skip-cadd --skip-gerp-bp \
                      -p denovo.ped \
                      trio.trim.vep.denovo.db

    Note: copy and paste the full command from the GitHub Gist to avoid errors. Run time: ~8 minutes.
    http://gemini.readthedocs.org/en/latest/content/preprocessing.html#step-1-split-left-align-and-trim-variants
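Because the load step runs for several minutes, it can be worth confirming that the downloaded VCF is intact before starting. The sketch below is one way to do that with standard tools. The function name `check_vcf_gz` is hypothetical and is not part of GEMINI.

```shell
# Hypothetical helper (not part of GEMINI): sanity-check a gzipped VCF.
check_vcf_gz() {
  vcf="$1"
  # The file must exist and be non-empty.
  [ -s "$vcf" ] || { echo "missing or empty: $vcf" >&2; return 1; }
  # gzip -t tests the compressed stream without writing any output.
  gzip -t "$vcf" 2>/dev/null || { echo "not a valid gzip file: $vcf" >&2; return 1; }
  # A VCF starts with a "##fileformat=VCF..." meta line.
  first_line=$(gzip -dc "$vcf" | head -n 1)
  case "$first_line" in
    '##fileformat=VCF'*) return 0 ;;
    *) echo "no VCF header in: $vcf" >&2; return 1 ;;
  esac
}
```

A possible use is to gate the load on the check, for example `check_vcf_gz trio.trim.vep.vcf.gz && gemini load ...` with the arguments from the slide.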
  5. Normalization and decomposition are required preprocessing steps

    Variant decomposition: http://genome.sph.umich.edu/wiki/Vt#Decompose
    Variant normalization: http://genome.sph.umich.edu/wiki/File:Normalization_mnp.png
    Details can be found in the GEMINI documentation: http://gemini.readthedocs.org/en/latest/content/preprocessing.html#preprocessing-and-loading-a-vcf-file-into-gemini
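To make "decomposition" concrete: a multiallelic record lists several ALT alleles on one line, and decomposition rewrites it as one record per ALT allele. The awk sketch below illustrates only that splitting step on the first five VCF columns of a made-up record. It is a toy, not a substitute for vt, which also rewrites the INFO and genotype fields; the function name `decompose_alts` is hypothetical.

```shell
# Toy illustration only: split a multiallelic VCF record into one line per ALT allele.
# Handles just CHROM, POS, ID, REF, ALT; real tools such as vt also fix INFO and genotypes.
decompose_alts() {
  awk 'BEGIN { FS = OFS = "\t" }
       /^#/ { print; next }              # pass header lines through unchanged
       {
         n = split($5, alts, ",")        # column 5 is ALT; alleles are comma-separated
         for (i = 1; i <= n; i++) {
           $5 = alts[i]                  # emit the record once per ALT allele
           print
         }
       }'
}

# Made-up example record with two ALT alleles (C and T).
printf 'chr1\t100\t.\tA\tC,T\n' | decompose_alts
```

The example record comes out as two lines, one with ALT `C` and one with ALT `T`.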
  6. Information overload

    There are currently 115 columns in the variants table. Perhaps a bit of overkill for a typical analysis.
    http://gemini.readthedocs.org/en/latest/content/database_schema.html#the-variants-table
  7. Limit the attributes returned with the --columns option.

        $ gemini de_novo \
            --columns "chrom, start, end, ref, alt, \
                       filter, qual, gene, impact" \
            trio.trim.vep.denovo.db

    Note: copy and paste the full command from the GitHub Gist.
  8. Limit the attributes returned with the --columns option.

    http://gemini.readthedocs.org/en/latest/content/tools.html#common-args-common-arguments

        $ gemini de_novo \
            --columns "chrom, start, end, ref, alt, \
                       filter, qual, gene, impact" \
            trio.trim.vep.denovo.db

    Note: copy and paste the full command from the GitHub Gist.
  9. Better, but there are still so many (likely false) candidates.

        $ gemini de_novo \
            --columns "chrom, start, end, ref, alt, \
                       filter, qual, gene, impact" \
            trio.trim.vep.denovo.db | wc -l

    Note: copy and paste the full command from the GitHub Gist.

    771 candidates!
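If the tool prints a header row, as in the candidate table shown later in this session, `wc -l` counts that row as well; the slides do not say whether the reported counts include it. A minimal sketch for counting only data rows follows. `count_rows` is a hypothetical helper name, and the input here is a made-up stand-in for tool output.

```shell
# Hypothetical helper: count data rows on stdin, skipping a one-line header.
count_rows() {
  tail -n +2 | wc -l | tr -d ' '
}

# Stand-in for tool output: one header line plus two made-up rows.
printf 'chrom\tstart\tend\nchr1\t10\t11\nchr2\t20\t21\n' | count_rows
```

In practice the helper would replace `wc -l` at the end of the pipeline.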
  10. Let's enforce a minimum sequence depth for each subject: -d

        $ gemini de_novo \
            --columns "chrom, start, end, ref, alt, \
                       filter, qual, gene, impact" \
            -d 15 \
            trio.trim.vep.denovo.db | wc -l

    Note: copy and paste the full command from the GitHub Gist.

    676 candidates
  11. Require that the mutation passes GATK QC with --filter

        $ gemini de_novo \
            --columns "chrom, start, end, ref, alt, \
                       filter, qual, gene, impact" \
            -d 15 \
            --filter "filter is NULL" \
            trio.trim.vep.denovo.db | wc -l

    Note: copy and paste the full command from the GitHub Gist.

    55 candidates
  12. Require that the mutation is likely to have a functional consequence

        $ gemini de_novo \
            --columns "chrom, start, end, ref, alt, \
                       filter, qual, gene, impact" \
            -d 15 \
            --filter "filter is NULL and impact_severity != 'LOW'" \
            trio.trim.vep.denovo.db | wc -l

    Note: copy and paste the full command from the GitHub Gist.

    13 candidates
  13. Require that the mutation is not likely to be a known polymorphism

        $ gemini de_novo \
            --columns "chrom, start, end, ref, alt, \
                       filter, qual, gene, impact" \
            -d 15 \
            --filter "filter is NULL \
                      and is_coding = 1 and impact_severity != 'LOW' \
                      and (aaf_1kg_eur <= 0.005 or aaf_1kg_eur is NULL) \
                      and (aaf_esp_ea <= 0.005 or aaf_esp_ea is NULL)" \
            trio.trim.vep.denovo.db | wc -l

    Note: copy and paste the full command from the GitHub Gist.

    6 candidates!
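As the filter grows, quoting mistakes (for example curly quotes picked up from a slide) become more likely. One way to keep the quoting under control is to build the filter in a shell variable, one clause per line. This is a sketch rather than the session's own method: the variable names `FILTER` and `COLS` and the function `run_de_novo` are hypothetical, and only options already shown on the slides are used.

```shell
# Build the filter one clause at a time, using plain ASCII quotes only.
FILTER="filter is NULL"
FILTER="$FILTER and is_coding = 1 and impact_severity != 'LOW'"
FILTER="$FILTER and (aaf_1kg_eur <= 0.005 or aaf_1kg_eur is NULL)"
FILTER="$FILTER and (aaf_esp_ea <= 0.005 or aaf_esp_ea is NULL)"

# Columns to report (hypothetical variable name).
COLS="chrom, start, end, ref, alt, filter, qual, gene, impact"

# Hypothetical wrapper around the command from the slide; $1 is the database.
run_de_novo() {
  gemini de_novo --columns "$COLS" -d 15 --filter "$FILTER" "$1"
}

# Run only when gemini and the database are present; otherwise show the filter.
if command -v gemini >/dev/null 2>&1 && [ -f trio.trim.vep.denovo.db ]; then
  run_de_novo trio.trim.vep.denovo.db
else
  printf '%s\n' "$FILTER"
fi
```

Printing the assembled filter before running makes it easy to spot a stray quote or a missing `and`.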
  14. 6 candidates. Which is causal? Requires manual inspection…

        $ gemini de_novo \
            --columns "chrom, start, end, ref, alt, \
                       filter, qual, gene, impact" \
            -d 15 \
            --filter "filter is NULL \
                      and is_coding = 1 and impact_severity != 'LOW' \
                      and (aaf_1kg_eur <= 0.005 or aaf_1kg_eur is NULL) \
                      and (aaf_esp_ea <= 0.005 or aaf_esp_ea is NULL)" \
            trio.trim.vep.denovo.db

        chrom  start     end       ref  alt  filter  qual     gene      impact          variant_id  family_id  family_members  family_genotypes  samples  family_count
        chr2   96525735  96525736  T    C    None    1929.31  ANKRD36C  non_syn_coding  2537        family1    1805,1847,4805  T/T,T/T,T/C       4805     1
        chr2   96525749  96525750  T    A    None    1513.36  ANKRD36C  non_syn_coding  2538        family1    1805,1847,4805  T/T,T/T,T/A       4805     1
        chr2   96525754  96525755  A    T    None    1699.28  ANKRD36C  non_syn_coding  2539        family1    1805,1847,4805  A/A,A/A,A/T       4805     1
        chr15  41229630  41229631  T    G    None    2116.49  DLL4      non_syn_coding  7892        family1    1805,1847,4805  T/T,T/T,T/G       4805     1
        chr17  55183812  55183813  A    G    None    2155.84  AKAP1     non_syn_coding  13311       family1    1805,1847,4805  A/A,A/A,A/G       4805     1
        chr22  43027436  43027437  C    T    None    1320.03  CYB5R3    non_syn_coding  16718       family1    1805,1847,4805  C/C,C/C,C/T       4805     1

    Phenotype: blue skin disease

    Which gene can we rule out at a glance?
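One quick, non-definitive check is whether several candidates fall close together in one gene, since clusters of apparent de novo calls can point to the mismapping or paralogy artifacts listed earlier in the session. The sketch below uses four columns (chrom, start, end, gene) copied from the table above. The function names are hypothetical, and a cluster is only a hint that still needs manual inspection.

```shell
# Candidate rows copied from the table: chrom, start, end, gene.
candidates() {
  cat <<'EOF'
chr2 96525735 96525736 ANKRD36C
chr2 96525749 96525750 ANKRD36C
chr2 96525754 96525755 ANKRD36C
chr15 41229630 41229631 DLL4
chr17 55183812 55183813 AKAP1
chr22 43027436 43027437 CYB5R3
EOF
}

# For each gene, print: gene, number of candidates, and the distance in bp
# between the first and last candidate start position.
summarize_by_gene() {
  awk '{
         pos = $2 + 0                      # force numeric comparison
         n[$4]++
         if (!($4 in lo) || pos < lo[$4]) lo[$4] = pos
         if (!($4 in hi) || pos > hi[$4]) hi[$4] = pos
       }
       END { for (g in n) print g, n[g], hi[g] - lo[g] }' | sort
}

candidates | summarize_by_gene
```

A gene with several candidates within a few dozen base pairs stands out immediately in this summary.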
  15. Load the following files into IGV (Load from URL) and inspect your candidates

    BAM alignment files:
    https://s3.amazonaws.com/gemini-tutorials/1805.workshop.bam
    https://s3.amazonaws.com/gemini-tutorials/1847.workshop.bam
    https://s3.amazonaws.com/gemini-tutorials/4805.workshop.bam

    VCF variant file:
    https://s3.amazonaws.com/gemini-tutorials/trio.trim.vep.vcf.gz
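To jump to each candidate in IGV, it helps to have locus strings ready to paste into the locus box. The sketch below pads each candidate by a fixed number of bases on either side. It relies on GEMINI's convention that `start` is 0-based and `end` is 1-based, so `end` is the 1-based position of a SNP. The function name `to_igv_loci` and the default padding of 50 bp are assumptions made for illustration.

```shell
# Hypothetical helper: turn "chrom start end" rows into padded IGV locus strings.
# GEMINI's start is 0-based and end is 1-based, so end is the 1-based SNP position.
to_igv_loci() {
  pad="${1:-50}"                           # padding in bp on each side (default 50)
  awk -v pad="$pad" '{
    lo = $3 - pad
    if (lo < 1) lo = 1                     # do not run off the start of the chromosome
    printf "%s:%d-%d\n", $1, lo, $3 + pad
  }'
}

# First candidate from the table on the previous slide.
printf 'chr2 96525735 96525736\n' | to_igv_loci 50
```

Each output line has the form `chrom:from-to` and can be pasted directly into IGV.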