Seminar at Sanger Institute 10-5-2017

Steve Munger
October 05, 2017

Transcript

  1. Harnessing genetic diversity to discover protein regulatory networks Steven Munger

    The Jackson Laboratory, Bar Harbor, ME USA @stevemunger The problem – and the power – of genetic diversity in genomics studies
  2. Until recently, our understanding of gene regulation stopped at the transcript.

    [Diagram: DNA → RNA → Protein, with replication, transcription, and translation labeled]
  3. Goal: Expanding our understanding of gene regulation to the proteome. How does genetic variation affect transcript and protein abundance?

    [Diagram: DNA → pre-mRNA → mRNA → Protein, annotated with regulatory processes: replication, epigenetic modification, methylation, A-to-I editing, miRNA, siRNA, RNA interference, polyuridylation, protein splicing, phosphorylation, acetylation, ubiquitination, N-linked glycosylation, O-linked glycosylation, amidation, sulfation, hydroxylation, stoichiometric buffering… and more!]
  4. Diversity Outbred (DO) mice: A reservoir of natural genetic perturbations.

    - 40M+ SNPs
    - 2M+ indels
    - Balanced population structure
    - Each individual unique
    - 400+ recombinations in each animal
    - High heterozygosity
  5. DO mice are genetically and phenotypically diverse

    [Figure: body weight (gm, axis 10 to 50) versus date (7/11/2014, 7/31/2014, 8/20/2014), one panel for female DO mice and one for male DO mice. Alan Attie & Mark Keller]
  6. How does genetic variation influence transcript and protein abundance?

    [Diagram: 192 DO livers. Transcripts → short reads (RNA-Seq) → eQTL mapping → eQTL. Proteins → peptides (MS/MS) → pQTL mapping → pQTL. Compare eQTL and pQTL?] Munger et al. 2014; Chick*, Munger* et al. Nature, 2016
  7. The perfect read: 1 read = 1 unique alignment.

    [Diagram: a 100bp read (ACATGCTGCGGA) matches a single location (✓) among Chr 1, Chr 2, and Chr 3]
  8. Some reads will align equally well to multiple locations: “multireads”.

    [Diagram: a 100bp read (ACATGCTGCGGA) matches three identical locations, one marked ✓ and two marked ✗] 1 read, 3 valid alignments, but only 1 alignment is correct. Read “mappability”: www.genetics.org/content/198/1/59
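The multiread idea on this slide can be illustrated with a minimal sketch. The toy reference sequences and the helper name `find_alignments` below are hypothetical and only serve the illustration; real aligners tolerate mismatches and use indexed search rather than exact string matching.

```python
def find_alignments(read, reference):
    """Return every (chromosome, 0-based offset) where the read matches exactly."""
    hits = []
    for chrom, seq in reference.items():
        start = seq.find(read)
        while start != -1:
            hits.append((chrom, start))
            start = seq.find(read, start + 1)
    return hits


# Hypothetical toy reference: the slide's 12-base example sequence is embedded
# once in each of three made-up chromosomes.
reference = {
    "chr1": "TTGACATGCTGCGGATTC",
    "chr2": "GGACATGCTGCGGACCAT",
    "chr3": "CACATGCTGCGGAGGTTA",
}

multiread = "ACATGCTGCGGA"        # matches all three chromosomes
unique_read = "GACATGCTGCGGATT"   # flanking bases make it specific to chr1

multi_hits = find_alignments(multiread, reference)
unique_hits = find_alignments(unique_read, reference)

print(f"multiread: {len(multi_hits)} valid alignments {multi_hits}")
print(f"unique read: {len(unique_hits)} valid alignment {unique_hits}")
```

Extending a read with flanking sequence, as in `unique_read`, is what makes it unambiguous here. With only the 12 shared bases there is no way to tell which of the three locations produced the read.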
  9. How does genetic variation affect alignment of RNA-seq reads? Start

    with a simple comparison of two inbred strains. CAST/EiJ C57BL/6J ≈ ≠
  10. Based on known gene annotations, we expect that >50% of

    100bp CAST reads will have at least one SNP that differs from the reference. Sanger Mouse Genomes Project – Thank you thank you thank you Thomas and colleagues
  11. 100bp SE Reads from CAST liver Compare alignment results to

    ground truth Align to CAST Pseudotranscriptome 5’-ATCGGCGTCTTACATTAGCTCAAGGGTGCC-3’ 5’-ATCGGCGTCTTGCTCAAGGGTGCC-3’ Align to B6 Transcriptome 5’-ATCGGCGTCTTACATTAGCTCAAGGGTGCC-3’ To what degree do these differences affect alignment of RNA-Seq reads and gene abundance estimates? Simulated reads Real data
  12. Simulated CAST reads map more accurately and uniquely to the

    CAST transcriptome. 458,297 out of ~10M reads improve by alignment to CAST; 10,533 reads improve by alignment to the reference.
  13. One real CAST sample: 2,984 genes differ by > 10% by alignment alone.

    For these genes in the simulated data:
    - 2,242: CAST alignment gave better estimate
    - 439: REF alignment gave better estimate
    - 71: CAST estimate = REF estimate
    - 232: No results in simulation
  14. Every DO sample will have a unique gene set that

    is sensitive to alignment errors from reference alignment…
  15. Solution: Construct individualized diploid transcriptomes for RNA-seq alignment with Seqnature.

    Munger et al. 2014; Gatti et al. 2014
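As a rough illustration of what building an individualized sequence involves, the sketch below applies a list of known variants to a reference sequence. It is not Seqnature's implementation or API. The function name `apply_variants` and the variant tuple layout are hypothetical, and the 6-base deletion is inferred by comparing the two example sequences shown on slide 11 (which strain carries which version is an assumption here).

```python
def apply_variants(reference, variants):
    """Return the reference with (pos, ref_allele, alt_allele) edits applied.

    pos is a 0-based coordinate on the reference. SNPs, insertions and
    deletions are all expressed as "replace ref_allele with alt_allele".
    """
    pieces = []
    cursor = 0
    for pos, ref_allele, alt_allele in sorted(variants):
        if pos < cursor:
            raise ValueError(f"variant at {pos} overlaps a previous variant")
        if reference[pos:pos + len(ref_allele)] != ref_allele:
            raise ValueError(f"reference allele mismatch at {pos}")
        pieces.append(reference[cursor:pos])   # unchanged stretch
        pieces.append(alt_allele)              # strain-specific allele
        cursor = pos + len(ref_allele)
    pieces.append(reference[cursor:])
    return "".join(pieces)


# Example sequence from slide 11, treated here as the reference version.
reference_seq = "ATCGGCGTCTTACATTAGCTCAAGGGTGCC"

# Assumed variant list: one 6-base deletion (ACATTA) starting at offset 11.
strain_variants = [(11, "ACATTA", "")]

strain_seq = apply_variants(reference_seq, strain_variants)
print(strain_seq)
```

A diploid transcriptome would repeat this once per haplotype, each with its own variant list, so that reads can align to either parental copy of a transcript.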
  16. Analysis Pipeline

    ~30 million SE 100bp reads
    1. Align reads to transcriptome (Yfg shown for Mouse 1, Mouse 2, Mouse 3, … x 272 mice).
    2. Estimate gene and isoform expression. RSEM (Li and Dewey 2010)
    3. Map expression QTL
  17. Alignment to individualized transcriptomes results in fewer spurious liver eQTL.

    [Figure panels: Rps12-ps2, aligned to NCBIm37 vs. aligned to DO IRGs] Lesson 1: One false read alignment can cause two false positive genetic associations.
  18. Alignment to individualized transcriptomes unmasks significant local eQTLs for 2,000+ genes.

    [Figure panels: Hebp1, aligned to NCBIm37 vs. aligned to DO IRGs] Lesson 2: Alignment of all samples to a single reference genome obscures a huge amount of real regulatory variation.
  19. The founder origin of each allele provides direct estimates of

    allele specific expression. Only alleles derived from 129S1 express Gm12976 in the DO population.
  20. Lesson 3: Allele specific expression is the rule rather than

    the exception in genetically diverse individuals.
  21. Lesson 4: Most expressed genes have eQTL (>75%). The DO is a reservoir of genetic perturbations.

    [Figure: gene location versus eQTL location, with chromosomes 1 to 19 and X on both axes]
  22. How does genetic variation affect protein abundance?

    [Diagram: 192 DO livers. Transcripts → short reads (RNA-Seq) → eQTL mapping → eQTL. Proteins → peptides (MS/MS) → pQTL mapping → pQTL. Compare eQTL and pQTL?] Munger et al. 2014; Chick*, Munger* et al. Nature 2016
  23. An unprecedented view of protein regulation. 2,866 pQTL detected for 2,552 proteins.

    [Venn diagram, total QTL, N=6707, FDR < 0.1: 2306 eQTL only, 1152 pQTL only, 1400 both; p < 2.2e-16]
  24. 80% of proteins with local pQTL have concordant local eQTL

    [Venn diagram, local QTL: 1819 eQTL only, 344 pQTL only, 1392 both] [Diagram: QTL, RNA, Protein, with cis effects on both RNA and protein]
  25. 20% of proteins with local pQTL lack concordant local eQTL

    [Venn diagram, local QTL: 1819 eQTL only, 344 pQTL only, 1392 both] [Diagram: QTL, RNA, Protein, with a single cis effect]
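The 80% and 20% figures on slides 24 and 25 follow from the local Venn counts. The short check below assumes the three numbers read as eQTL only (1819), pQTL only (344), and shared (1392). That ordering is an inference from the flattened figure, supported by the total counts on slide 23, where 1152 + 1400 equals the 2,552 proteins with pQTL.

```python
# Local QTL counts from the Venn diagram (assumed order: eQTL only, pQTL only, both).
eqtl_only, pqtl_only, shared = 1819, 344, 1392

local_pqtl = pqtl_only + shared          # proteins with a local pQTL
concordant = shared / local_pqtl         # local pQTL with a matching local eQTL
discordant = pqtl_only / local_pqtl      # local pQTL without a matching local eQTL

print(f"proteins with local pQTL: {local_pqtl}")
print(f"with concordant local eQTL: {concordant:.1%}")
print(f"without concordant local eQTL: {discordant:.1%}")
```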
  26. 25% of expressed proteins appear buffered from local transcriptional variation

    [Venn diagram, local QTL: 1819 eQTL only, 344 pQTL only, 1392 both] [Diagram: QTL, RNA, Protein, with a single cis effect]
  27. Only 9 out of 1130 distant pQTL have concordant distant eQTL.

    [Venn diagram, distant QTL, FDR < 0.1: 915 eQTL only, 1039 pQTL only, 9 both] [Diagram: QTL, RNA, Protein, with cis and trans effects]
  28. What post-transcriptional mechanism is acting in trans to control these proteins?

    [Venn diagram, distant QTL: 915 eQTL only, 1039 pQTL only, 9 both] [Diagram: QTL, RNA, Protein, with a trans effect]
  29. Searching for protein and transcript mediators of distant pQTL –> Mediation Analysis

    [Diagram: a QTL acts in trans on the target protein; the RNA and protein of a gene in cis to the QTL are candidate causal intermediates]
    Models compared:
    - Target Protein ~ pQTLdistant
    - Target Protein ~ pQTLdistant + MediatorProtein (x 8000 proteins)
    - Target Protein ~ pQTLdistant + MediatorRNA (x 21000 transcripts)
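The model comparison on this slide can be sketched on simulated data. The example below is a simplified illustration, not the pipeline used in the talk. It codes the QTL as a single additive genotype (0/1/2) rather than founder haplotype probabilities, uses ordinary least squares, and the function names `rss` and `qtl_lod` are hypothetical. The idea is that if a candidate truly mediates the distant pQTL, adding it as a covariate should make the QTL's LOD score drop sharply, while an unrelated candidate should leave it roughly unchanged.

```python
import numpy as np


def rss(y, covariates):
    """Residual sum of squares of y regressed on an intercept plus covariates."""
    X = np.column_stack([np.ones(len(y)), covariates])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return float(resid @ resid)


def qtl_lod(target, genotype, mediator=None):
    """LOD score for the genotype effect on target, optionally conditioning
    on a candidate mediator (Target ~ QTL versus Target ~ QTL + Mediator)."""
    n = len(target)
    base = np.empty((n, 0)) if mediator is None else np.asarray(mediator)
    rss_null = rss(target, base)                              # without the QTL
    rss_full = rss(target, np.column_stack([base, genotype]))  # with the QTL
    return (n / 2) * np.log10(rss_null / rss_full)


# Simulated data (all values are made up): the QTL changes the mediator,
# and the mediator in turn changes the target protein.
rng = np.random.default_rng(0)
n = 192
genotype = rng.integers(0, 3, size=n).astype(float)
true_mediator = genotype + rng.normal(0, 0.5, size=n)
target = 0.8 * true_mediator + rng.normal(0, 0.5, size=n)
unrelated = rng.normal(size=n)   # a candidate that plays no role

lod_base = qtl_lod(target, genotype)
lod_given_mediator = qtl_lod(target, genotype, true_mediator)
lod_given_unrelated = qtl_lod(target, genotype, unrelated)

print(f"LOD, Target ~ QTL:                 {lod_base:.1f}")
print(f"LOD, Target ~ QTL + true mediator: {lod_given_mediator:.1f}")
print(f"LOD, Target ~ QTL + unrelated:     {lod_given_unrelated:.1f}")
```

In a genome-wide scan the same conditional LOD would be computed once per candidate (the slide cites roughly 8000 proteins and 21000 transcripts), and candidates producing an unusually large drop would be flagged as putative causal intermediates.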
  30. Mediation analysis reveals causal intermediates.

    [Diagram: Nnt/NNT (cis) shown as the intermediate for a trans effect on the target, Tmem68/TMEM68. Other labels: 13, 3]
  31. Low abundance of NNT in C57BL/6J drives low abundance of TMEM68.

    - 43,102 SNPs in region
    - 3 candidate SNPs
    - 1 short deletion
    - 1 long deletion of exons 7-11
    Nnt eQTL: B6 alleles do not express Nnt.
  32. Protein complex members are tightly coregulated, with one member adopting

    the “regulatory” role. Chaperonin containing TCP1 complex
  33. Stoichiometric buffering of protein abundance

    [Figure: members of the chaperonin containing TCP1 complex (TCP1, CCT2, CCT3, CCT4, CCT5, CCT6A, CCT7, CCT8). Label: Stable]
  34. Mediation reveals higher order protein networks

    [Network figure. Legend: eQTL, pQTL, cis, trans, co-regulated. Labeled groups: Wash Complex, Ccc Complex, Exocyst Complex, Arp2/3 Complex, Endosome. Gene labels: Wash, Kiaa1033, Kiaa0196, Fam21, Zw10, Vcp, Ccdc43, llph, Spg20, Fam45a, Ccdc22, Gaa, Atg16l1, Rufy1, Ccdc93, Commd10, Commd9, 9030624J02Rik, Commd5, Commd7, Commd3, Commd2, Commd4, Dscr3, H2-Q10, Commd1, Pum1, 1110004F10Rik, Exoc6, Exoc2, Exoc7, Exoc8, Exoc5, Exoc4, Exoc1, Ttc39b, Arpc3, Gckr, Arpc5, Actr3, Arpc4, Arpc2, Actr2, Rala, Coro1b, Exoc3]
  35. In Progress: Using genetic diversity to identify kinases for specific

    phosphorylation sites. Liver Phospho-Proteome. Kinase <–> phospho site identification by mediation
  36. Collaborative Cross strains can be used to validate predictions from

    the DO and build new models. CC001– 98% Homozygous
  37. Looking ahead: Pathway-centered predictive genomics Example: Drug metabolism pathways are

    enriched for genes with significant liver pQTL. Tamoxifen
  38. Predict and test CC strain crosses that will produce progeny with compromised drug metabolism.

    Pathway-centered prediction (toy example):

    CC Strain  Cyp3a13  Cyp3a16  Cyp2d10  Cyp2d22  Fmo1  Fmo5  Prediction
    CC001      ++       +        -        +++      -     +     Highest
    CC002      -        +        +        -        -     +     Medium
    CC003      -        -        -        +        +     +     Medium
    CC004      -        +        ---      +        --    -     Lowest
    CC005      +        --       ++       -        +     -     Medium
    CC006      +        -        -        -        -     +     Low
    CC007      +        -        +        -        +     +     High

    Test!
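One way to read the toy table on this slide is as a net score per strain, counting each "+" as one unit of increased protein abundance and each "-" as one unit of decreased abundance. That scoring rule is an assumption made for this sketch, not something the slide states, although the resulting ranking agrees with the slide's prediction labels.

```python
# Toy effects from the slide: strain -> symbols for
# Cyp3a13, Cyp3a16, Cyp2d10, Cyp2d22, Fmo1, Fmo5.
toy_effects = {
    "CC001": ["++", "+", "-", "+++", "-", "+"],
    "CC002": ["-", "+", "+", "-", "-", "+"],
    "CC003": ["-", "-", "-", "+", "+", "+"],
    "CC004": ["-", "+", "---", "+", "--", "-"],
    "CC005": ["+", "--", "++", "-", "+", "-"],
    "CC006": ["+", "-", "-", "-", "-", "+"],
    "CC007": ["+", "-", "+", "-", "+", "+"],
}


def net_score(symbols):
    """Assumed scoring rule: +1 per '+' and -1 per '-' across the pathway."""
    return sum(s.count("+") - s.count("-") for s in symbols)


scores = {strain: net_score(symbols) for strain, symbols in toy_effects.items()}

# Rank strains from highest to lowest predicted pathway output.
ranking = sorted(scores, key=scores.get, reverse=True)
for strain in ranking:
    print(f"{strain}: net score {scores[strain]:+d}")
```

A real prediction would replace the symbols with estimated allele effects on protein abundance and would likely weight enzymes by their role in the pathway. The equal weighting here is only for illustration.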
  39. Conclusions

    • Most genetic variation that affects transcript abundance does not affect protein abundance.
      – For local genetic variation that does affect protein abundance, 80% act proximally on transcription (standard model).
    • 99+% of distant pQTL act on the target protein’s abundance independent of the target’s transcript abundance.
    • Mediation analysis identifies 700 RNA/protein causal intermediates of distant pQTL and infers >5000 protein interactions.
    • Stoichiometric buffering is a common post-translational mechanism governing protein abundance of binding partners and complex members.
    • We can apply our new understanding of the genome-proteome map in DO mice to tune output of liver pathways.