Identifying de novo mutations with GEMINI

Aaron Quinlan

August 18, 2015

Transcript

  1. Identifying de novo mutations with GEMINI

    Please refer to the following GitHub Gist to find each command for this session. Commands should be copied and pasted from this Gist: https://gist.github.com/arq5x/9e1928638397ba45da2e#file-denovo-sh

    Aaron Quinlan, University of Utah, quinlanlab.org
  2. Automated tools for disease inheritance models

  3. Automated tools for disease inheritance models

  4. Automated tools for disease inheritance models

  5. Common options for disease model tools.

  6. Why search for de novo mutations? (Brian O’Roak)

  7. High impact variants (Brian O’Roak)

  8. De novo mutations

  9. How many de novo mutations should we expect?

  10. De novo mutations (rough expectations)
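
As a rough worked example of the expectation this slide refers to, the sketch below multiplies a per-base, per-generation mutation rate by the number of bases a child inherits. The rate (1.2e-8) and the genome and exome sizes are assumed, commonly quoted ballpark values, not figures taken from the slides, and the function name `expected_dnm` is hypothetical.

```shell
# Rough expectation: de novo SNVs per child = rate x 2 (diploid) x target size.
# ASSUMPTIONS (not from the slides): rate ~1.2e-8 per bp per generation,
# ~3.1e9 bp haploid genome, ~3.0e7 bp of coding sequence.
expected_dnm() {
    # $1 = mutation rate per bp per generation, $2 = haploid target size in bp
    LC_ALL=C awk -v rate="$1" -v size="$2" 'BEGIN { printf "%.1f\n", rate * 2 * size }'
}

expected_dnm 1.2e-8 3.1e9   # whole genome
expected_dnm 1.2e-8 3.0e7   # exome (coding sequence only)
```

Under these assumptions the two calls print 74.4 and 0.7: tens of de novo SNVs per genome and about one per exome. A candidate list far longer than that is likely dominated by artifacts.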

  11. In practice, it’s not so simple. (Brian O’Roak)

  12. (No text on this slide.)

  13. Why are there so many artifacts?

    • Prior probabilities: the more interesting something is, the less likely it is to be real!
    • If something can go wrong, it will.
    • Incorrect genotype assignment
    • Low coverage in one or more of the individuals in the family (especially the parents…why?)
    • Mismapping
    • Misalignment
    • Paralogy
    • Systematic artifacts
    • Somatic events
  14. Detective work with GEMINI

  15. The de_novo tool in GEMINI

    http://gemini.readthedocs.org/en/latest/content/tools.html#de-novo-identifying-potential-de-novo-mutations
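
The genotype pattern behind a de novo candidate can be sketched without GEMINI: both parents are homozygous for the reference allele while the child is heterozygous. The function below is a hypothetical illustration that checks one comma-separated genotype string (parent, parent, child order is an assumption); it is not how the de_novo tool is implemented.

```shell
# Succeed (exit 0) if the genotypes look like a de novo event:
# both parents homozygous reference, child heterozygous.
# ASSUMPTION: input is "parent1,parent2,child" with alleles joined by "/",
# and the reference allele is passed separately.
is_de_novo() {
    # $1 = reference allele, $2 = genotypes, e.g. "T/T,T/T,T/C"
    awk -v ref="$1" -v gts="$2" 'BEGIN {
        n = split(gts, g, ",")
        if (n != 3) exit 1
        homref = ref "/" ref
        split(g[3], a, "/")
        child_het = (a[1] != a[2])
        exit !(g[1] == homref && g[2] == homref && child_het)
    }'
}

is_de_novo T "T/T,T/T,T/C" && echo "candidate de novo"
is_de_novo T "T/T,T/C,T/C" || echo "not a de novo pattern"
```

The check trusts the genotype calls completely, so a wrong parental call still passes. That is why the later filters on depth and quality matter.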

  16. Create a GEMINI database from a VCF

    Notes:
    1. The VCF has been normalized and decomposed with VT.
    2. The VCF has been annotated with VEP.

        $ curl https://s3.amazonaws.com/gemini-tutorials/trio.trim.vep.vcf.gz > trio.trim.vep.vcf.gz
        $ curl https://s3.amazonaws.com/gemini-tutorials/denovo.ped > denovo.ped
        $ gemini load --cores 4 \
            -v trio.trim.vep.vcf.gz \
            -t VEP \
            --skip-gene-tables --skip-cadd --skip-gerp-bp \
            -p denovo.ped \
            trio.trim.vep.denovo.db

    Note: copy and paste the full command from the GitHub Gist to avoid errors. Expected run time: ~8 minutes.

    http://gemini.readthedocs.org/en/latest/content/preprocessing.html#step-1-split-left-align-and-trim-variants
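
The `-p` option attaches a PED file that tells GEMINI who is related to whom. As a hedged illustration of the six-column PED layout, the sketch below writes a hypothetical trio file and lists affected samples whose parents are both present. The sample IDs come from this session, but the parent roles and every sex value are invented for illustration; the real denovo.ped may differ. Both function names are hypothetical.

```shell
# Write a hypothetical trio PED file. Columns (tab-separated):
# family_id, sample_id, paternal_id, maternal_id, sex (1=male, 2=female),
# phenotype (1=unaffected, 2=affected).
# ASSUMPTION: which parent is 1805 vs 1847, and all sex values, are invented.
write_example_ped() {
    printf 'family1\t1805\t0\t0\t1\t1\n'
    printf 'family1\t1847\t0\t0\t2\t1\n'
    printf 'family1\t4805\t1805\t1847\t1\t2\n'
}

# List affected samples whose two parents are both present in the file.
affected_with_parents() {
    awk -F'\t' '
        { seen[$2] = 1; dad[$2] = $3; mom[$2] = $4; pheno[$2] = $6; order[NR] = $2 }
        END {
            for (i = 1; i <= NR; i++) {
                s = order[i]
                if (pheno[s] == 2 && (dad[s] in seen) && (mom[s] in seen)) print s
            }
        }' "$1"
}

write_example_ped > example_trio.ped
affected_with_parents example_trio.ped
```

A de novo search needs a complete trio, so a check like this is a cheap way to confirm the pedigree before loading.
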
  17. Normalization and decomposition are required preprocessing steps

    Variant decomposition: http://genome.sph.umich.edu/wiki/Vt#Decompose
    Variant normalization: http://genome.sph.umich.edu/wiki/File:Normalization_mnp.png
    Details can be found in the GEMINI documentation: http://gemini.readthedocs.org/en/latest/content/preprocessing.html#preprocessing-and-loading-a-vcf-file-into-gemini
  18. Running the de_novo tool

        $ gemini de_novo trio.trim.vep.denovo.db

    Note: copy and paste the full command from the GitHub Gist.
  19. Information overload

    There are currently 115 columns in the variants table, perhaps a bit of overkill for a typical analysis.

    http://gemini.readthedocs.org/en/latest/content/database_schema.html#the-variants-table
  20. Limit the attributes returned with the --columns option.

        $ gemini de_novo \
            --columns "chrom, start, end, ref, alt, \
                       filter, qual, gene, impact" \
            trio.trim.vep.denovo.db

    Note: copy and paste the full command from the GitHub Gist.
  21. Limit the attributes returned with the --columns option.

    http://gemini.readthedocs.org/en/latest/content/tools.html#common-args-common-arguments

        $ gemini de_novo \
            --columns "chrom, start, end, ref, alt, \
                       filter, qual, gene, impact" \
            trio.trim.vep.denovo.db

    Note: copy and paste the full command from the GitHub Gist.
  22. Better, but there are still so many (likely false) candidates.

        $ gemini de_novo \
            --columns "chrom, start, end, ref, alt, \
                       filter, qual, gene, impact" \
            trio.trim.vep.denovo.db | wc -l

    Note: copy and paste the full command from the GitHub Gist.

    771 candidates!
  23. Causes of erroneous genotype predictions: lack of depth

  24. Let’s enforce a minimum sequence depth for each subject: -d

        $ gemini de_novo \
            --columns "chrom, start, end, ref, alt, \
                       filter, qual, gene, impact" \
            -d 15 \
            trio.trim.vep.denovo.db | wc -l

    Note: copy and paste the full command from the GitHub Gist.

    676 candidates
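
A minimal sketch of what a per-sample depth cutoff does, assuming a hypothetical tab-separated table of read depths (variant ID, then one depth per family member). It mirrors the idea behind `-d 15` but is not GEMINI's implementation. Parental depth matters most: with only a few reads, a heterozygous parent can be miscalled as homozygous reference, which makes an inherited variant look de novo.

```shell
# Keep only variants where every family member has at least MIN reads.
# ASSUMPTION: hypothetical input of variant_id followed by one depth column
# per sample (parent, parent, child); the name min_depth_filter is made up.
min_depth_filter() {
    # $1 = minimum depth; rows are read from stdin
    awk -F'\t' -v min="$1" '{
        for (i = 2; i <= NF; i++) if ($i + 0 < min + 0) next
        print $1
    }'
}

printf 'v1\t40\t38\t45\nv2\t6\t41\t39\nv3\t22\t15\t30\n' | min_depth_filter 15
```

Here v2 is dropped because one parent has only 6 reads, while v3 survives because a depth of exactly 15 meets the threshold.
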
  25. Causes of erroneous genotype predictions: low quality variants

  26. Require that the mutation passes GATK QC with --filter

        $ gemini de_novo \
            --columns "chrom, start, end, ref, alt, \
                       filter, qual, gene, impact" \
            -d 15 \
            --filter "filter is NULL" \
            trio.trim.vep.denovo.db | wc -l

    Note: copy and paste the full command from the GitHub Gist.

    55 candidates
  27. Require that the mutation is likely to have functional consequence

  28. Require that the mutation is likely to have functional consequence

        $ gemini de_novo \
            --columns "chrom, start, end, ref, alt, \
                       filter, qual, gene, impact" \
            -d 15 \
            --filter "filter is NULL and impact_severity != 'LOW'" \
            trio.trim.vep.denovo.db | wc -l

    Note: copy and paste the full command from the GitHub Gist.

    13 candidates
  29. Require that the mutation is not likely to be a known polymorphism

  30. Require that the mutation is not likely to be a known polymorphism

    Note: copy and paste the full command from the GitHub Gist.

        $ gemini de_novo \
            --columns "chrom, start, end, ref, alt, \
                       filter, qual, gene, impact" \
            -d 15 \
            --filter "filter is NULL \
                      and is_coding = 1 and impact_severity != 'LOW' \
                      and (aaf_1kg_eur <= 0.005 or aaf_1kg_eur is NULL) \
                      and (aaf_esp_ea <= 0.005 or aaf_esp_ea is NULL)" \
            trio.trim.vep.denovo.db | wc -l

    6 candidates!
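
The `or ... is NULL` clauses in this filter matter. In SQL a comparison against NULL is never true, so `aaf_1kg_eur <= 0.005` on its own would discard variants that were never observed in the population database, which are the ones most likely to be novel. The sketch below reproduces that logic on a hypothetical two-column table; writing a missing frequency as the string "None" and the function name are assumptions for illustration.

```shell
# Mimic "(aaf <= MAX or aaf is NULL)": keep variants that are rare OR absent
# from the population database.
# ASSUMPTION: hypothetical input of variant_id and allele frequency, where a
# missing frequency is written as the literal string "None".
rare_or_unknown() {
    # $1 = maximum allele frequency; rows are read from stdin
    awk -F'\t' -v max="$1" '$2 == "None" || $2 + 0 <= max + 0 { print $1 }'
}

printf 'v1\t0.2\nv2\tNone\nv3\t0.001\nv4\t0.005\n' | rare_or_unknown 0.005
```

Only v1, the common variant, is removed; v2 is kept even though it has no recorded frequency.
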
  31. 6 candidates. Which is causal? Requires manual inspection…

        $ gemini de_novo \
            --columns "chrom, start, end, ref, alt, \
                       filter, qual, gene, impact" \
            -d 15 \
            --filter "filter is NULL \
                      and is_coding = 1 and impact_severity != 'LOW' \
                      and (aaf_1kg_eur <= 0.005 or aaf_1kg_eur is NULL) \
                      and (aaf_esp_ea <= 0.005 or aaf_esp_ea is NULL)" \
            trio.trim.vep.denovo.db

        chrom  start     end       ref  alt  filter  qual     gene      impact          variant_id  family_id  family_members  family_genotypes  samples  family_count
        chr2   96525735  96525736  T    C    None    1929.31  ANKRD36C  non_syn_coding  2537        family1    1805,1847,4805  T/T,T/T,T/C       4805     1
        chr2   96525749  96525750  T    A    None    1513.36  ANKRD36C  non_syn_coding  2538        family1    1805,1847,4805  T/T,T/T,T/A       4805     1
        chr2   96525754  96525755  A    T    None    1699.28  ANKRD36C  non_syn_coding  2539        family1    1805,1847,4805  A/A,A/A,A/T       4805     1
        chr15  41229630  41229631  T    G    None    2116.49  DLL4      non_syn_coding  7892        family1    1805,1847,4805  T/T,T/T,T/G       4805     1
        chr17  55183812  55183813  A    G    None    2155.84  AKAP1     non_syn_coding  13311       family1    1805,1847,4805  A/A,A/A,A/G       4805     1
        chr22  43027436  43027437  C    T    None    1320.03  CYB5R3    non_syn_coding  16718       family1    1805,1847,4805  C/C,C/C,C/T       4805     1

    Phenotype: blue skin disease

    Which gene can we rule out at a glance?
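
One heuristic for a first pass over a short candidate list is to count hits per gene. Several candidates packed into the same gene within a few bases fit mismapping or paralogy better than several independent de novo events. The sketch below applies that count to the chrom, start, and gene values of the six candidates on this slide; the function name is hypothetical, and the result is a prompt for manual inspection rather than a verdict.

```shell
# Count candidates per gene and report genes with more than one hit.
# Input rows are chrom, start, gene (tab-separated) read from stdin.
flag_clustered_genes() {
    awk -F'\t' '{ n[$3]++ } END { for (g in n) if (n[g] > 1) print g "\t" n[g] }' | sort
}

# The six candidates from this slide (chrom, start, gene).
printf '%s\t%s\t%s\n' \
    chr2 96525735 ANKRD36C \
    chr2 96525749 ANKRD36C \
    chr2 96525754 ANKRD36C \
    chr15 41229630 DLL4 \
    chr17 55183812 AKAP1 \
    chr22 43027436 CYB5R3 | flag_clustered_genes
```

This prints ANKRD36C with a count of 3, since its three candidates sit within 20 bp of each other on chr2.
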
  32. Load the following files into IGV (Load from URL) and inspect your candidates

    BAM alignment files:
        https://s3.amazonaws.com/gemini-tutorials/1805.workshop.bam
        https://s3.amazonaws.com/gemini-tutorials/1847.workshop.bam
        https://s3.amazonaws.com/gemini-tutorials/4805.workshop.bam

    VCF variant file:
        https://s3.amazonaws.com/gemini-tutorials/trio.trim.vep.vcf.gz
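
To jump to each candidate in IGV, it can help to turn the chrom, start, and end columns into locus strings for the IGV search box. The sketch below is a hypothetical helper: it assumes the 0-based start and 1-based end used in GEMINI's variants table, and the 50 bp padding is an arbitrary choice.

```shell
# Turn "chrom start end" rows into IGV-style locus strings with padding.
# ASSUMPTION: start is 0-based, so 1 is added to get a 1-based display
# coordinate; the padding value is arbitrary.
to_igv_locus() {
    # $1 = padding in bp; rows are read from stdin
    awk -v pad="$1" '{
        s = $2 + 1 - pad; if (s < 1) s = 1
        printf "%s:%d-%d\n", $1, s, $3 + pad
    }'
}

printf 'chr22 43027436 43027437\n' | to_igv_locus 50
```

For the chr22 candidate this prints chr22:43027387-43027487, which can be pasted into the IGV locus box once the BAM and VCF files are loaded.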