Using natural genetic diversity to discover protein regulatory networks

Slides from my seminar at the Vanderbilt Genetics Institute on 6-21-2017

Steve Munger

June 21, 2017

Transcript

  1. Harnessing genetic diversity to
    discover protein regulatory networks
    Steve Munger
    The Jackson Laboratory

  2. How does genetic variation affect
    transcript and protein abundance?
    DNA -> RNA -> Protein
    Transcription, Translation
    Francis Crick 1956

  3. Nature July 2013
    “We estimate that approximately one-half
    of pQTLs are probably also eQTLs. However,
    many pQTLs do not correspond to eQTLs,
    even at a relaxed stringency.”
    Accumulating evidence of a disconnect between
    transcript and protein expression.
    September 2014
    February 2015
    “QTLs affecting mRNA levels are, on
    average, attenuated or buffered at
    the protein level…”

  4. “Next Generation” genetic models: The
    founder strains of the mouse Diversity
    Outbred stock
    CAST
    129S1
    WSB NZO
    A/J
    B6
    PWK
    NOD

  5. Diversity Outbred (DO)
    Heterogeneous Stock

  6. Diversity Outbred (DO) mice:
    A reservoir of natural genetic perturbations.
    - 45M+ SNPs
    - 2M+ indels
    - Balanced population structure
    - Each individual unique
    - 400+ recombinations in each animal
    - High heterozygosity

  7. DO mice are genetically and phenotypically diverse
    [Figure: body weight (gm) vs. date, 7/11/2014 to 8/20/2014; panels: Female DO mice, Male DO mice]
    Alan Attie & Mark Keller

  8. Diversity Outbred mice exhibit phenotypes far exceeding the
    range observed in the founder strains.

  9. How does genetic variation influence transcript
    and protein abundance?
    192 DO Livers
    Transcripts: Short Reads -> RNA-Seq -> eQTL Mapping
    Proteins: Peptides -> MS/MS -> pQTL Mapping
    Compare eQTL and pQTL?
    Munger et al. 2014; Chick*, Munger* et al. Nature, 2016

  10. Challenge: Every mouse is a unique diploid
    combination of 10M+ SNPs and 500K+ indels.

  11. Munger et al. 2014
    Gatti et al. 2014
    Constructing individualized diploid transcriptomes for
    RNA-seq alignment with Seqnature.
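
As a rough illustration of what an individualized diploid transcript looks like, the sketch below substitutes SNPs into a reference sequence to produce two haplotype-specific copies. It is not Seqnature's interface: the function names, the toy reference, and the variant positions are all hypothetical, and indels (which shift coordinates) are deliberately left out.

```python
from typing import Dict, Tuple


def apply_snps(reference: str, snps: Dict[int, str]) -> str:
    """Return a copy of `reference` with each 0-based SNP position substituted."""
    seq = list(reference)
    for pos, alt in snps.items():
        if not 0 <= pos < len(seq):
            raise ValueError(f"SNP position {pos} is outside the sequence")
        seq[pos] = alt
    return "".join(seq)


def diploid_transcript(
    reference: str, hap_a: Dict[int, str], hap_b: Dict[int, str]
) -> Tuple[str, str]:
    """Build one sequence per haplotype so reads can align to either allele."""
    return apply_snps(reference, hap_a), apply_snps(reference, hap_b)


# Hypothetical toy inputs: a 10 bp "transcript" and two haplotypes.
reference = "ACGTACGTAC"
hap_a = {2: "T"}           # haplotype A differs from the reference at one site
hap_b = {2: "T", 7: "A"}   # haplotype B carries an additional private SNP

allele_a, allele_b = diploid_transcript(reference, hap_a, hap_b)
print(allele_a)
print(allele_b)
```

Aligning reads to both copies, rather than to the reference alone, is what lets reads that span a variant map without mismatches.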

  12. Seqnature
    Munger et al. 2014
    Gatti et al. 2014
    Al Simons
    Narayanan Raghupathy
    Kwangbom Choi
    Dan Gatti

  13. Every DO sample will have a unique gene set that is sensitive
    to alignment errors from reference alignment…

  14. Analysis Pipeline
    ~30 million SE 100bp reads
    Mouse 1, Mouse 2, Mouse 3, ... x 272 mice
    1. Align reads to transcriptome.
    2. Estimate gene and isoform expression. RSEM (Li and Dewey 2010)
    3. Map expression QTL
    [Diagram: gene "Yfg" shown for each mouse]
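
Step 3 can be sketched as a regression of one gene's expression on founder haplotype dosages at a single locus, summarized as a LOD score. This is a minimal sketch on simulated data, assuming an additive eight-founder model with no kinship or covariate adjustment; it is not the mapping software used in the talk, and `lod_score` and all simulated inputs are hypothetical.

```python
import numpy as np


def lod_score(y, haplotype_dosage):
    """LOD at one locus: (n / 2) * log10(RSS_null / RSS_full)."""
    y = np.asarray(y, dtype=float)
    X = np.asarray(haplotype_dosage, dtype=float)
    n = y.shape[0]
    # Null model: intercept only.
    rss_null = np.sum((y - y.mean()) ** 2)
    # Full model: founder dosages. Each row sums to 2, so the dosage columns
    # already span the intercept and no separate constant column is needed.
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss_full = np.sum((y - X @ beta) ** 2)
    return (n / 2.0) * np.log10(rss_null / rss_full)


rng = np.random.default_rng(0)
n_mice, n_founders = 192, 8

# Simulated genotypes: each mouse carries two founder haplotypes at the locus.
hap1 = rng.integers(0, n_founders, size=n_mice)
hap2 = rng.integers(0, n_founders, size=n_mice)
dosage = np.zeros((n_mice, n_founders))
np.add.at(dosage, (np.arange(n_mice), hap1), 1.0)
np.add.at(dosage, (np.arange(n_mice), hap2), 1.0)

# Simulated expression: one founder allele raises expression, the rest do not.
founder_effects = np.array([0, 0, 0, 0, 0, 0, 0, 1.5])
linked_gene = dosage @ founder_effects + rng.normal(0, 1, n_mice)
unlinked_gene = rng.normal(0, 1, n_mice)

print("LOD, gene with local effect:", lod_score(linked_gene, dosage))
print("LOD, gene with no effect:   ", lod_score(unlinked_gene, dosage))
```

A genome scan repeats this calculation at every locus for every gene; the fitted `beta` values are the founder allele coefficients.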

  15. Alignment to individualized transcriptomes results
    in fewer spurious liver eQTL.
    Rps12-ps2 Aligned to NCBIm37
    Aligned to DO IRGs

  16. Hebp1 Aligned to NCBIM37
    Aligned to DO IRGs
    Alignment to individualized transcriptomes
    reveals significant local eQTLs for 2,000+ genes.

  17. Munger et al. 2014
    Are these unmasked local eQTLs real? Yes.
    CC/DO Founder Strain samples

  18. The founder origin of each allele provides direct estimates of
    allele-specific expression.
    Only alleles derived from 129S1
    express Gm12976 in the
    DO population.

  19. Allele-specific expression is the rule rather than the exception in
    genetically diverse individuals.

  20. The DO is a reservoir of genetic
    perturbations. ~75% of genes have eQTL.
    [Figure: axes: Gene Location and eQTL Location, chromosomes 1-19 and X]

  21. How does genetic variation affect protein abundance?
    192 DO Livers
    Transcripts: Short Reads -> RNA-Seq -> eQTL Mapping
    Proteins: Peptides -> MS/MS -> pQTL Mapping
    Compare eQTL and pQTL?
    Munger et al. 2014; Chick*, Munger* et al. Nature 2016

  22. An unprecedented view of protein regulation.
    2,866 pQTL detected for 2,552 proteins.
    Total
    eQTL
    pQTL
    2306
    1152 1400
    N=6707
    p < 2.2e-16
    FDR < 0.1

  23. 80% of proteins with local pQTL have
    concordant local eQTL
    Local
    eQTL
    pQTL
    1819
    344 1392
    QTL RNA Protein
    cis cis

  24. 20% of proteins with local pQTL
    lack concordant local eQTL
    Local
    eQTL
    pQTL
    1819
    344 1392
    QTL RNA Protein
    cis
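
The 20% figure here, and the 80% figure on the previous slide, can be reproduced from the Venn counts. The transcript does not label the counts, so the short calculation below assumes that 1392 is the overlap (proteins with both a local pQTL and a local eQTL) and 344 is the pQTL-only region.

```python
# Assumed reading of the unlabeled Venn counts on the slide.
shared_local = 1392     # assumed: local pQTL with a concordant local eQTL
pqtl_only_local = 344   # assumed: local pQTL without a local eQTL

local_pqtl_total = shared_local + pqtl_only_local
concordant_pct = 100 * shared_local / local_pqtl_total
discordant_pct = 100 * pqtl_only_local / local_pqtl_total

print(f"{local_pqtl_total} proteins with a local pQTL")
print(f"{concordant_pct:.1f}% with concordant local eQTL")
print(f"{discordant_pct:.1f}% without concordant local eQTL")
```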

  25. 25% of expressed proteins appear buffered
    from local transcriptional variation
    Local
    eQTL
    pQTL
    1819
    344 1392
    QTL RNA Protein
    cis

  26. 1,130 distant pQTL indicate extensive
    trans regulation of protein abundance.

  27. Only 9 out of 1130 distant pQTL have
    concordant distant eQTL.
    Distant
    eQTL
    pQTL
    915
    1039 9
    cis
    RNA Protein
    QTL
    trans
    FDR < 0.1

  28. What post-transcriptional mechanism is acting
    in trans to control these proteins?
    Distant
    eQTL
    pQTL
    915
    1039 9
    RNA Protein
    QTL
    trans

  29. Searching for protein and transcript mediators
    of distant pQTL –> Mediation Analysis
    [Diagram: QTL, RNA, Protein; labels: trans, cis, Target, Causal Intermediates]
    Target Protein ~ pQTLdistant
    Target Protein ~ pQTLdistant + MediatorProtein (x 8000 proteins)
    Target Protein ~ pQTLdistant + MediatorRNA (x 21000 transcripts)
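
The model comparison above can be sketched as follows: compute the LOD of the distant pQTL on the target, then recompute it with each candidate mediator added as a covariate, and look for candidates that make the LOD collapse. This is a minimal sketch on simulated data, assuming a single additive genotype (0/1/2) in place of eight founder alleles; the function names and all simulated values are hypothetical and are not taken from the talk.

```python
import numpy as np


def rss(y, X):
    """Residual sum of squares from an ordinary least-squares fit."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return float(np.sum((y - X @ beta) ** 2))


def lod(y, qtl, covariates=None):
    """LOD of the QTL term, conditional on optional covariates."""
    n = len(y)
    if covariates is None:
        base = np.ones((n, 1))
    else:
        base = np.column_stack([np.ones(n), covariates])
    full = np.column_stack([base, qtl])
    return (n / 2.0) * np.log10(rss(y, base) / rss(y, full))


def mediation_scan(target, qtl, mediators):
    """LOD of the distant pQTL after conditioning on each candidate mediator."""
    return {name: lod(target, qtl, m) for name, m in mediators.items()}


rng = np.random.default_rng(1)
n = 192
qtl = rng.integers(0, 3, size=n).astype(float)      # simulated genotype
true_mediator = qtl + rng.normal(0, 0.3, n)         # regulated by the QTL
bystander = rng.normal(0, 1, n)                     # unrelated to the QTL
target = true_mediator + rng.normal(0, 0.3, n)      # driven by the mediator

baseline = lod(target, qtl)
scan = mediation_scan(
    target, qtl, {"true_mediator": true_mediator, "bystander": bystander}
)
print("baseline LOD:", baseline)
for name, value in scan.items():
    print(f"LOD given {name}:", value)
```

In the full analysis this scan runs over every measured protein and transcript, and a large LOD drop flags the candidate as a possible causal intermediate.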

  30. Mediation analysis reveals causal intermediates.
    pQTLD
    Tmem68 TMEM68
    trans
    13
    Target
    3 cis

  31. Tmem68 TMEM68
    trans
    13
    cis
    Target
    3 cis
    cis
    Nnt NNT
    Mediation analysis reveals causal intermediates.

  32. 43,102 SNPs in region
    3 Candidate SNPs
    1 Short Deletion
    Nnt eQTL
    B6 alleles do not express Nnt
    Low abundance of NNT in C57BL/6J
    drives low abundance of TMEM68.

  33. We re-discovered a known deficiency in C57BL/6J and
    assigned Tmem68 to a pathway.
    Free Radical Biology and Medicine 2013

  34. TMEM68-NNT Next Step: Validation
    Transcript Abundance Protein Abundance
    Tmem68 TMEM68
    trans
    13
    cis
    3 cis
    cis
    Nnt NNT
    Alex Stanton – Tufts predoc
    Tmem68 TMEM68
    3 cis
    Question 1: Where does the disconnect between Tmem68 transcript and protein occur?
    At the level of translation, or by post-translational mechanisms?
    ?

  35. Genetic variation may affect translation or post-translational
    mechanisms.
    Tmem68 (Total RNA) -> TMEM68 (Translated Protein) -> TMEM68 (“Steady state” Protein)
    Translation: Translation Efficiency, Translation pausing, Alternative ORF
    Post-Translation: Folding, Stability, Post-translational mods (Phosphorylation, Acetylation, Ubiquitination, Methylation), Localization/Exportation

  36. Brar and Weissman 2015
    Not all transcripts are translated equally

  37. Protein complex members are tightly coregulated,
    with one member adopting the “regulatory” role.
    Chaperonin containing TCP1 complex

  38. CCT2 Mediation Analysis

  39. Cct6a
    Low expression of Cct6a in NOD/ShiLtJ
    drives low expression of the CCT complex

  40. Stoichiometric buffering of protein abundance
    [Diagram: CCT complex drawn with members CCT2, CCT3, CCT4, CCT5, CCT6A, CCT7, CCT8, TCP1; free subunit labels: TCP1 x7, CCT2 x9, CCT3 x6, CCT4 x10, CCT5 x6, CCT6A x2, CCT7 x8, CCT8 x4; assembled complex labeled "Stable"]
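
One way to picture stoichiometric buffering is a toy model in which every complex needs one copy of each member and any subunit left outside a complex is not stable. Under those assumptions (which are mine, not a model stated in the talk), the scarcest member caps the stable abundance of all the others. The subunit counts below are hypothetical.

```python
def stable_abundance(synthesized):
    """Toy model: with 1:1 stoichiometry, the scarcest subunit caps every member."""
    complexes = min(synthesized.values())
    return {name: complexes for name in synthesized}


# Hypothetical numbers of synthesized copies per subunit.
synthesized = {
    "TCP1": 8, "CCT2": 9, "CCT3": 6, "CCT4": 10,
    "CCT5": 6, "CCT6A": 2, "CCT7": 8, "CCT8": 4,
}

stable = stable_abundance(synthesized)
limiting = min(synthesized, key=synthesized.get)

print("limiting member:", limiting)
print("stable copies per member:", stable)
```

In this toy setting, lowering the supply of the limiting member lowers the stable abundance of every other member, while changing the supply of a non-limiting member has no effect.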

  41. Stoichiometric buffering of CCT complex: Next steps
    •  Observation: DO animals with NOD
    allele at Cct6a have lower protein
    abundance of all CCT members.
    •  NOD mutation affects both Cct6a
    transcript and protein levels.
    •  NOD has a transversion substitution in
    conserved KLF4 binding domain in Cct6a
    promoter.
    •  Luciferase assays to quantify effect of
    this substitution on promoter strength.
    •  CRISPR to “fix” mutation in NOD and
    introduce same mutation into B6.
    •  Test stoichiometric buffering hypothesis
    by introducing mutation into other CCT
    member -> Can we transfer the
    regulatory role in the complex by
    knocking down the expression of
    another protein?

  42. One NOD private SNP (G>T Transversion)
    200bp upstream of the TSS.

  43. Mediation identifies known and novel protein
    interactions

  44. Mediation reveals higher order protein networks
    [Network diagram. Groups: Wash Complex, Ccc Complex, Exocyst Complex, Arp2/3 Complex, Endosome. Legend: eQTL, pQTL, Co-regulated, Cis, Trans.]
    Node labels: Wash, Kiaa1033, Kiaa0196, Fam21, Zw10, Vcp, Ccdc43, llph, Spg20, Fam45a, Ccdc22, Gaa, Atg16l1, Rufy1, Ccdc93, Commd10, Commd9, 9030624J02Rik, Commd5, Commd7, Commd3, Commd2, Commd4, Dscr3, H2-Q10, Commd1, Pum1, 1110004F10Rik, Exoc6, Exoc2, Exoc7, Exoc8, Exoc5, Exoc4, Exoc1, Ttc39b, Arpc3, Gckr, Arpc5, Actr3, Arpc4, Arpc2, Actr2, Rala, Coro1b, Exoc3
  45. Natural genetic perturbations +
    Mediation analysis = Predictive
    Protein Network

  46. In Progress: Using genetic diversity to identify
    kinases for specific phosphorylation sites.
    Liver Phospho-Proteome
    Kinase <–> phospho site identification by mediation

  47. Collaborative Cross strains can be used to validate
    predictions from the DO and build new models.
    CC001– 98% Homozygous

  48. Prediction of protein abundance in
    Founder and Collaborative Cross Strains.

  49. Looking ahead: Pathway-centered predictive genomics
    Example: Drug metabolism pathways are enriched for genes
    with significant liver pQTL.
    Tamoxifen

  50. Predict and test CC strain crosses that will produce
    progeny with compromised drug metabolism.
    CC Strain  Cyp3a13  Cyp3a16  Cyp2d10  Cyp2d22  Fmo1  Fmo5  Prediction
    CC001      ++       +        -        +++      -     +     Highest
    CC002      -        +        +        -        -     +     Medium
    CC003      -        -        -        +        +     +     Medium
    CC004      -        +        ---      +        --    -     Lowest
    CC005      +        --       ++       -        +     -     Medium
    CC006      +        -        -        -        -     +     Low
    CC007      +        -        +        -        +     +     High
    Pathway-centered prediction: Toy Example
    Test!

  51. Looking ahead: “Pulling the (genetic) weeds”
    PRODH2
    “Weeds”
    Step 1: Condition out the giant cis effect.

  52. “Pulling the weeds” to identify subtle genetic interactions
    Step 2: Mediate all subthreshold peaks LOD > 5.
    Aka “Pull the weeds”.
    Step 3: Repeat process for all 8k proteins.
    Step 4: Find proteins that share same
    subthreshold peak and mediator
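
Step 4 amounts to grouping mediation hits by their (peak, mediator) pair and keeping the pairs shared by more than one protein. The sketch below shows that bookkeeping only; the record layout, the function name, and the placeholder protein, peak, and mediator labels are hypothetical rather than results from the talk.

```python
from collections import defaultdict


def shared_peak_groups(hits, min_size=2):
    """Group proteins by (peak, mediator) and keep groups with several members."""
    groups = defaultdict(set)
    for protein, peak, mediator in hits:
        groups[(peak, mediator)].add(protein)
    return {
        key: sorted(members)
        for key, members in groups.items()
        if len(members) >= min_size
    }


# Hypothetical mediation hits: (target protein, subthreshold peak, best mediator).
hits = [
    ("ProteinA", "chr5:100Mb", "MediatorX"),
    ("ProteinB", "chr5:100Mb", "MediatorX"),
    ("ProteinC", "chr5:100Mb", "MediatorY"),
    ("ProteinD", "chr11:40Mb", "MediatorZ"),
    ("ProteinE", "chr5:100Mb", "MediatorX"),
]

for (peak, mediator), proteins in shared_peak_groups(hits).items():
    print(peak, mediator, proteins)
```

The resulting protein lists are the input to the enrichment step that follows.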

  53. Lpgat1
    Dync1h1
    Actr1a
    Nup210
    Arl6ip5
    Dync1li2
    Prodh2
    Wbscr16
    Etfa
    Ywhah
    Rock2
    Ctsf
    Mcm2
    Myo9b
    Ehd4
    Mcm5
    Frmd8
    Mrps18c
    Maoa
    Ttll12
    Actr1b
    Apba3
    Lrrc59
    Cdc34
    Uaca
    Ptcd2
    Dctn1
    Tmem63a
    Nfkb2
    Plcb3
    Ap3b1
    Rnf135
    Ccdc6
    Trap1
    Yars2
    Ggct
    Mtpap
    Dctn3
    Diap1
    Etnppl
    Cps1
    Usp40
    Aass
    Etfb
    Dctn2
    Rbm25
    Ptk2b
    Mme
    Mkl2
    Ube2l3
    Actr10
    Ccdc91
    Ebna1bp2
    Cecr5
    Tgfbrap1
    Dync1i2
    Lyrm4
    Dctn4
    Klc4
    Sin3a
    Psmg1
    Cpsf7
    Clpp
    Slco2a1
    Mapkapk2
    Dync1li1
    Smu1
    Naga
    Scpep1
    Step 5: Look for enriched annotations among list.
    Dynein complex, mitochondria, amine catabolism.

  54. Conclusions
    •  Most genetic variation that affects transcript abundance does not
    affect protein abundance.
    –  For local genetic variation that does affect protein abundance,
    80% act proximally on transcription (standard model).
    •  99+% of distant pQTL act on the target protein’s
    abundance independent of the target’s
    transcript abundance.
    •  Mediation analysis identifies 700 RNA/protein
    causal intermediates of distant pQTL and infers
    >5000 protein interactions.
    •  Stoichiometric buffering is a common post-translational
    mechanism governing protein abundance of binding partners and
    complex members.
    [Diagram: CCT complex members CCT2, CCT3, CCT4, CCT5, CCT6A, CCT7, CCT8, TCP1]

  55. Acknowledgments

  57. Thank you!

  58. Genetic variation affects protein abundance.
    Liver Kidney

  59. Transcriptional mechanisms underlie most local pQTL.
    Over half of local pQTL affect both liver and kidney.
    [Figure: Liver and Kidney panels; axes: Protein Location and pQTL Location, chromosomes 1-19 and X]
    [Diagram: QTL, RNA, Protein; labels: cis, cis]

  60. Example: DHTKD1– Cis pQTL liver, Cis pQTL kidney
    Liver
    Kidney

  61. Genetic variant(s) with conserved effects in liver & kidney.
    Founder strain coefficients
    at peak QTL SNP
    r = 0.99
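
The r value compares the eight founder allele coefficients estimated in each tissue at the peak marker. A minimal sketch of that comparison with NumPy follows; the coefficient values are made up for illustration and are not the values behind the slide.

```python
import numpy as np

founders = ["A/J", "B6", "129S1", "NOD", "NZO", "CAST", "PWK", "WSB"]

# Hypothetical founder allele effects on protein abundance at the peak marker.
liver_coef = np.array([0.5, -0.2, 0.1, -0.4, 0.3, 1.2, -1.0, -0.5])
kidney_coef = np.array([0.4, -0.1, 0.2, -0.5, 0.2, 1.0, -0.9, -0.3])

# Pearson correlation between the two tissues' founder effects.
r = np.corrcoef(liver_coef, kidney_coef)[0, 1]
print(f"r = {r:.2f}")

# Flipping the sign of one tissue's effects flips the sign of r, which is the
# pattern expected when a variant has opposite effects in the two tissues.
r_opposite = np.corrcoef(liver_coef, -kidney_coef)[0, 1]
print(f"r (opposite effects) = {r_opposite:.2f}")
```

A value of r near 1 means the founder alleles rank the same way in both tissues, even if the effect is too small to pass the QTL threshold in one of them.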

  62. Example: LDHA – Cis pQTL liver, Cis pQTL kidney
    Kidney
    Liver

  63. Local variant(s) cause opposite tissue effects on LDHA expression.
    r = -0.88

  64. Example: MESDC2 – Cis pQTL liver, No pQTL kidney
    Liver
    Kidney

  65. No cis pQTL in kidney, but …
    Kidney founder coefficients
    match what we see in the liver.
    Effects are subtle, but there.
    r = 0.93

    View Slide

  66. Distant pQTL are abundant.
    Liver Kidney

  67. Post-transcriptional mechanisms underlie
    nearly all distant pQTL.
    [Figure: Liver and Kidney panels; axes: Protein Location and pQTL Location, chromosomes 1-19 and X]
    [Diagram: QTL, RNA, Protein; labels: cis, trans]

  68. Post-transcriptional mechanisms underlie distant pQTL.
    Almost no overlap between liver and kidney.
    RNA Protein
    QTL
    trans
    Liver Kidney

  72. Slides can be downloaded at https://speakerdeck.com/stevemunger
    Lab website: mungerlab.com
    Twitter: @stevemunger
