Seminar at Sanger Institute 10-5-2017

Steve Munger
October 05, 2017

Transcript

  1. Harnessing genetic diversity
    to discover protein regulatory networks
    Steven Munger
    The Jackson Laboratory, Bar Harbor, ME USA @stevemunger
    The problem – and the power – of genetic diversity in genomics studies

  2. Until recently, our understanding of gene
    regulation stopped at the transcript.
    DNA RNA Protein
    transcription translation
    replication

  3. DNA pre-mRNA Protein
    mRNA
    mRNA
    mRNA
    miRNA
    polyuridylation
    ubiquitination
    siRNA
    RNA interference
    protein splicing,
    phosphorylation,
    acetylation,
    N-linked glycosylation,
    amidation
    sulfation… and more!
    hydroxylation
    methylation
    O-linked glycosylation
    epigenetic modification
    A-to-I
    editing
    replication
    stoichiometric buffering
    Goal: Expanding our understanding
    of gene regulation to the proteome.
    How does genetic variation affect transcript and protein abundance?

  4. “Next Generation” genetic models:
    The mouse Collaborative Cross and
    Diversity Outbred stock
    Founder strains: A/J, B6, 129S1, NOD, NZO, CAST, PWK, WSB

  5. Diversity Outbred (DO) mice:
    A reservoir of natural genetic perturbations.
    - 40M+ SNPs
    - 2M+ indels
    - Balanced population structure
    - Each individual unique
    - 400+ recombinations in each animal
    - High heterozygosity

  6. Diversity Outbred (DO) Heterogeneous Stock
    A natural reservoir of genetic perturbations

  7. DO mice are genetically and phenotypically diverse
    [Figure: body weight (gm) by date, 7/11/2014 to 8/20/2014; panels: female DO mice, male DO mice]
    Alan Attie & Mark Keller

  8. Diversity Outbred mice exhibit phenotypes far exceeding the
    range observed in the founder strains.

  9. Some combinations of genetic variants
    produce very long-lived mice.

  10. 192 DO Livers
    Transcripts
    Short Reads
    RNA-Seq
    eQTL pQTL
    eQTL Mapping pQTL Mapping
    Proteins
    Peptides
    MS/MS
    Compare
    ?
    Munger et al. 2014; Chick*, Munger* et al. Nature, 2016
    How does genetic variation
    influence transcript and protein abundance?

  11. Challenge: Every DO mouse is a unique diploid
    combination of 10M+ SNPs and 500K+ indels.

  12. Alignment 101
    ACATGCTGCGGA
    ACATGCTGCGGA
    100bp Read
    Chr 1
    Chr 2
    Chr 3

  13. The perfect read: 1 read = 1 unique alignment.
    ACATGCTGCGGA
    ACATGCTGCGGA
    100bp Read

    Chr 1
    Chr 2
    Chr 3

  14. Some reads will align equally well to
    multiple locations. “Multireads”
    ACATGCTGCGGA
    ACATGCTGCGGA
    ACATGCTGCGGA
    ACATGCTGCGGA
    100bp Read
    1 read
    3 valid alignments
    Only 1 alignment is correct
    Read “Mappability” – www.genetics.org/content/198/1/59
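
The multiread idea on this slide can be illustrated with a minimal sketch. It uses exact string matching against toy sequences, which is an assumption made for brevity; real aligners tolerate mismatches and work on full transcriptomes. The function names and the three toy "chromosomes" are hypothetical.

```python
def find_alignments(read, references):
    """Return (sequence name, offset) for every exact match of `read`."""
    hits = []
    for name, seq in references.items():
        start = seq.find(read)
        while start != -1:
            hits.append((name, start))
            start = seq.find(read, start + 1)
    return hits


def classify(read, references):
    """Label a read as 'unaligned', 'unique', or 'multiread'."""
    n_hits = len(find_alignments(read, references))
    if n_hits == 0:
        return "unaligned"
    return "unique" if n_hits == 1 else "multiread"


# Toy sequences (hypothetical, not real mouse chromosomes). Each one
# contains the read from the slide, so the read has three valid alignments.
references = {
    "chr1": "TTGACATGCTGCGGATT",
    "chr2": "GGACATGCTGCGGACC",
    "chr3": "CCCACATGCTGCGGAGG",
}
read = "ACATGCTGCGGA"

hits = find_alignments(read, references)
label = classify(read, references)
```

A read that overlaps sequence specific to one location (here `TTGACATG`, found only in the toy `chr1`) is classified as unique, which is the "perfect read" case from the previous slide.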

  15. How does genetic variation affect alignment
    of RNA-seq reads? Start with a simple
    comparison of two inbred strains.
    CAST/EiJ
    C57BL/6J

  16. Based on known gene annotations, we expect that
    >50% of 100bp CAST reads will have at least one SNP
    that differs from the reference.
    Sanger Mouse Genomes Project – Thank you thank you thank you Thomas and colleagues

  17. 100bp SE Reads from CAST liver
    Compare alignment results to ground truth
    Align to CAST Pseudotranscriptome
    5’-ATCGGCGTCTTACATTAGCTCAAGGGTGCC-3’
    5’-ATCGGCGTCTTGCTCAAGGGTGCC-3’
    Align to B6 Transcriptome
    5’-ATCGGCGTCTTACATTAGCTCAAGGGTGCC-3’
    To what degree do these differences affect alignment of RNA-Seq reads
    and gene abundance estimates?
    Simulated reads
    Real data

  18. Simulated CAST reads map more accurately and
    uniquely to the CAST transcriptome.
    458,297 out of ~10M reads improve by alignment to CAST
    10,533 reads improve by alignment to the reference.

  19. Gene-level abundance estimates are improved
    by alignment to CAST transcriptome.
    RSEM, Li and Dewey 2010

  20. What about real CAST data?
    One CAST RNA-seq sample

  21. For 2,984 genes, abundance estimates
    differ by > 10% by alignment approach alone.

  22. For these genes in the simulated data,
    2,242 – CAST alignment gave better estimate
    439 – REF alignment gave better estimate
    71 – CAST estimate = REF estimate
    232 – No results in simulation
    One real CAST sample
    2,984 genes differ by > 10% by alignment alone.
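
As a rough sketch of how genes might be flagged as differing by more than 10%, and of how the four categories on this slide add up: the slide does not say how the difference is computed, so measuring it relative to the reference-alignment estimate is an assumption here, and the function name and toy abundances are hypothetical.

```python
def flag_divergent_genes(estimates, threshold=0.10):
    """Return genes whose two abundance estimates differ by more than `threshold`.

    `estimates` maps gene -> (reference_alignment, strain_alignment).
    Assumption: the difference is taken relative to the reference-alignment
    estimate; genes with a zero reference estimate are skipped.
    """
    flagged = []
    for gene, (ref_est, strain_est) in estimates.items():
        if ref_est == 0:
            continue
        if abs(strain_est - ref_est) / ref_est > threshold:
            flagged.append(gene)
    return flagged


# Hypothetical abundance estimates, for illustration only.
toy_estimates = {
    "geneA": (100.0, 125.0),  # 25% apart: flagged
    "geneB": (100.0, 104.0),  # 4% apart: not flagged
    "geneC": (0.0, 3.0),      # skipped (zero reference estimate)
}
divergent = flag_divergent_genes(toy_estimates)

# Category counts as reported on the slide.
breakdown = {
    "cast_better": 2242,
    "ref_better": 439,
    "equal": 71,
    "no_simulation_result": 232,
}
total = sum(breakdown.values())
share_cast_better = breakdown["cast_better"] / total
```

The four categories sum to the 2,984 genes quoted on the slide, and roughly three quarters of them fall in the "CAST alignment gave better estimate" group.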

  23. Every DO sample will have a unique gene set that is
    sensitive to alignment errors from reference alignment…

  24. Munger et al. 2014
    Gatti et al. 2014
    Solution: Construct individualized diploid
    transcriptomes for RNA-seq alignment with Seqnature.
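
To make the idea of an individualized sequence concrete, here is a minimal sketch that applies known variants to a reference sequence. It is not Seqnature's interface; `apply_variants` is a hypothetical helper, it handles a single haplotype rather than a diploid pair, and it assumes non-overlapping variants with 0-based reference coordinates. The example sequence is the one shown on slide 17, where the two versions differ by a 6-bp deletion; which strain carries which version is not stated in the transcript.

```python
def apply_variants(reference, variants):
    """Apply (position, ref_allele, alt_allele) variants to a sequence.

    Positions are 0-based on the reference. Variants are applied from right
    to left so that earlier coordinates stay valid after indels.
    Assumption: variants do not overlap.
    """
    seq = reference
    for pos, ref_allele, alt_allele in sorted(variants, reverse=True):
        if seq[pos:pos + len(ref_allele)] != ref_allele:
            raise ValueError(f"reference allele mismatch at position {pos}")
        seq = seq[:pos] + alt_allele + seq[pos + len(ref_allele):]
    return seq


# Sequence from slide 17; the 6-bp deletion reproduces the shorter version.
reference = "ATCGGCGTCTTACATTAGCTCAAGGGTGCC"
variants = [(11, "ACATTA", "")]
individual = apply_variants(reference, variants)
```

A SNP is the same operation with alleles of equal length, for example `(2, "C", "T")`. Reads from the variant-carrying animal match `individual` exactly, whereas against `reference` they would need a gapped alignment.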

  25. Seqnature
    Munger et al. 2014
    Gatti et al. 2014
    Choi Raghupathy et al., Submitted

  26. Analysis Pipeline
    ~ 30 million SE 100bp reads
    Diagram labels: Yfg; Mouse 1, Mouse 2, Mouse 3, x 272 mice
    1. Align reads to transcriptome.
    2. Estimate gene and isoform expression. RSEM (Li and Dewey 2010)
    3. Map expression QTL
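
Step 3 of the pipeline can be sketched as a single-marker regression scan. This is a simplification: the talk does not specify the mapping model, and coding each marker as one additive genotype (0, 1, 2) is an assumption made to keep the example short. Marker names and expression values are hypothetical. The LOD score used is the standard regression form, LOD = (n/2) log10(RSS_null / RSS_alt).

```python
import math


def lod_score(genotypes, expression):
    """LOD for the model expression ~ genotype versus an intercept-only model."""
    n = len(expression)
    mean_x = sum(genotypes) / n
    mean_y = sum(expression) / n
    sxx = sum((x - mean_x) ** 2 for x in genotypes)
    syy = sum((y - mean_y) ** 2 for y in expression)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(genotypes, expression))
    if sxx == 0 or syy == 0:
        return 0.0  # no variation in genotype or expression
    rss_null = syy
    rss_alt = syy - sxy ** 2 / sxx
    rss_alt = max(rss_alt, 1e-12 * syy)  # guard against a perfect fit
    return (n / 2) * math.log10(rss_null / rss_alt)


def scan(markers, expression):
    """Compute a LOD score at every marker."""
    return {name: lod_score(geno, expression) for name, geno in markers.items()}


# Hypothetical genotypes for eight animals at three markers.
markers = {
    "m1": [0, 1, 2, 0, 1, 2, 0, 1],
    "m2": [0, 0, 1, 1, 2, 2, 1, 0],
    "m3": [2, 1, 0, 2, 0, 1, 1, 2],
}
# Hypothetical expression values constructed to track marker m2.
expression = [5.1, 5.0, 6.9, 7.2, 9.1, 8.8, 7.0, 5.2]

lods = scan(markers, expression)
peak_marker = max(lods, key=lods.get)
```

In this toy data the peak falls at `m2`. A real scan would also include covariates and a permutation-based significance threshold, neither of which is shown here.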

  27. Alignment to individualized transcriptomes
    results in fewer spurious liver eQTL.
    Rps12-ps2 Aligned to NCBIm37
    Aligned to DO IRGs
    Lesson 1: One false read alignment can cause two false positive genetic associations.

  28. Hebp1 Aligned to NCBIm37
    Aligned to DO IRGs
    Alignment to individualized transcriptomes unmasks
    significant local eQTLs for 2,000+ genes.
    Lesson 2: Alignment of all samples to a single reference genome obscures
    a huge amount of real regulatory variation.

  29. Munger et al. 2014
    Are these unmasked local eQTLs real? Yes.
    CC/DO Founder Strain samples

  30. The founder origin of each allele provides direct
    estimates of allele specific expression.
    Only alleles derived from 129S1
    express Gm12976 in the
    DO population.

  31. Lesson 3: Allele specific expression is the rule rather than the
    exception in genetically diverse individuals.

  32. Lesson 4: Most expressed genes have eQTL (>75%).
    The DO is a reservoir of genetic perturbations.
    [Figure: eQTL location plotted against gene location; both axes span chromosomes 1-19 and X]

  33. 192 DO Livers
    Transcripts
    Short Reads
    RNA-Seq
    eQTL pQTL
    eQTL Mapping pQTL Mapping
    Proteins
    Peptides
    MS/MS
    Compare
    ?
    Munger et al. 2014; Chick*, Munger* et al. Nature 2016
    How does genetic variation affect protein abundance?

  34. Global, multiplexed, and quantitative:
    A new era in proteomics

  35. An unprecedented view of protein regulation.
    2,866 pQTL detected for 2,552 proteins.
    [Venn diagram, total eQTL vs. pQTL; counts shown: 2306, 1152, 1400; N=6707; p < 2.2e-16; FDR < 0.1]

  36. 80% of proteins with local pQTL have
    concordant local eQTL
    [Venn diagram, local eQTL vs. pQTL; counts shown: 1819, 344, 1392. Schematic labels: QTL, RNA, Protein; cis, cis]

  37. 20% of proteins with local pQTL
    lack concordant local eQTL
    [Venn diagram, local eQTL vs. pQTL; counts shown: 1819, 344, 1392. Schematic labels: QTL, RNA, Protein; cis]

  38. 25% of expressed proteins appear buffered
    from local transcriptional variation
    [Venn diagram, local eQTL vs. pQTL; counts shown: 1819, 344, 1392. Schematic labels: QTL, RNA, Protein; cis]

  39. 1,130 distant pQTL indicate extensive
    trans regulation of protein abundance.

  40. Only 9 out of 1130 distant pQTL have
    concordant distant eQTL.
    [Venn diagram, distant eQTL vs. pQTL; counts shown: 915, 1039, 9. Schematic labels: QTL, RNA, Protein; cis, trans; FDR < 0.1]
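
The percentages quoted on slides 35 to 40 can be checked from the counts given in the transcript. The transcript does not label which Venn region each number belongs to, so treating 1392 as the local overlap and 344 as local pQTL without a local eQTL is an assumption; it is the assignment that reproduces the stated 80% and 20%.

```python
def pct(part, whole):
    """Percentage of `whole` represented by `part`."""
    return 100.0 * part / whole


# Slide 35: two of the three Venn counts sum to the 2,552 proteins with a pQTL.
pqtl_regions = (1152, 1400)
proteins_with_pqtl = 2552

# Slides 36-37 (assumed region assignment, see lead-in).
local_shared = 1392      # local pQTL with a concordant local eQTL
local_pqtl_only = 344    # local pQTL without a concordant local eQTL
local_pqtl_total = local_shared + local_pqtl_only

local_concordant_pct = pct(local_shared, local_pqtl_total)
local_discordant_pct = pct(local_pqtl_only, local_pqtl_total)

# Slide 40: 9 of 1,130 distant pQTL have a concordant distant eQTL.
distant_pqtl = 1130
distant_concordant = 9
distant_independent_pct = pct(distant_pqtl - distant_concordant, distant_pqtl)

print(f"local concordant:    {local_concordant_pct:.1f}%")
print(f"local discordant:    {local_discordant_pct:.1f}%")
print(f"distant independent: {distant_independent_pct:.1f}%")
```

Under that assignment the local split rounds to 80% and 20%, and the distant figure is consistent with the "99+%" stated in the conclusions.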

  41. What post-transcriptional mechanism is acting
    in trans to control these proteins?
    [Venn diagram, distant eQTL vs. pQTL; counts shown: 915, 1039, 9. Schematic labels: QTL, RNA, Protein; trans]

  42. Searching for protein and transcript mediators
    of distant pQTL –> Mediation Analysis
    Schematic labels: QTL, RNA, Protein, Target, Causal Intermediates; cis and trans effects
    Target Protein ~ pQTL_distant
    Target Protein ~ pQTL_distant + Mediator_Protein (x 8000 proteins)
    Target Protein ~ pQTL_distant + Mediator_RNA (x 21000 transcripts)
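
The model comparison on this slide can be sketched as a "LOD drop": fit the target protein on the distant pQTL alone, then again with a candidate mediator in the model, and see how much of the QTL signal disappears. The sketch below is one plausible reading under stated assumptions: a single additive genotype, one mediator at a time, and a conditional fit done by residualizing on the mediator. All names and data are hypothetical, and this is not presented as the analysis code behind the talk.

```python
import math


def residuals(y, x):
    """Residuals of y after simple linear regression on x (with intercept)."""
    n = len(y)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx if sxx else 0.0
    return [(b - mean_y) - slope * (a - mean_x) for a, b in zip(x, y)]


def lod(y, x):
    """LOD for y ~ x versus an intercept-only model."""
    n = len(y)
    mean_y = sum(y) / n
    rss_null = sum((b - mean_y) ** 2 for b in y)
    if rss_null == 0:
        return 0.0
    rss_alt = max(sum(r * r for r in residuals(y, x)), 1e-12 * rss_null)
    return (n / 2) * math.log10(rss_null / rss_alt)


def mediation_lod_drop(target, qtl, mediator):
    """Return (LOD without mediator, LOD given mediator, drop)."""
    base = lod(target, qtl)  # Target ~ pQTL
    # Target ~ pQTL + Mediator, compared against Target ~ Mediator:
    # regressing both target and genotype on the mediator first gives the
    # same residual sum of squares as the two-predictor fit.
    conditional = lod(residuals(target, mediator), residuals(qtl, mediator))
    return base, conditional, base - conditional


# Hypothetical data: genotype drives the mediator, the mediator drives the target.
qtl = [0, 0, 1, 1, 2, 2, 0, 1, 2, 1]
mediator = [1.0, 1.2, 2.1, 1.9, 3.0, 3.2, 0.9, 2.0, 2.9, 2.2]
target = [2.05, 2.35, 4.3, 3.7, 6.0, 6.45, 1.75, 4.0, 5.9, 4.3]
unrelated = [0.3, 1.1, 0.2, 0.9, 0.5, 0.4, 1.0, 0.6, 0.8, 0.1]

base, cond_true, drop_true = mediation_lod_drop(target, qtl, mediator)
_, cond_unrelated, drop_unrelated = mediation_lod_drop(target, qtl, unrelated)
```

With the true mediator in the model the QTL signal nearly vanishes, while an unrelated candidate leaves it essentially unchanged. Repeating this over every measured protein and transcript, as the slide describes, would rank candidates by the size of the drop.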

  43. Mediation analysis reveals causal intermediates.
    pQTLD
    Tmem68 TMEM68
    trans
    13
    Target
    3 cis

  44. Tmem68 TMEM68
    trans
    13
    cis
    Target
    3 cis
    cis
    Nnt NNT
    Mediation analysis reveals causal intermediates.

  45. 43,102 SNPs in region
    3 Candidate SNPs
    1 short deletion
    1 long deletion of Exons 7-11
    Nnt eQTL
    B6 alleles do not express Nnt
    Low abundance of NNT in C57BL/6J
    drives low abundance of TMEM68.

  46. Protein complex members are tightly coregulated,
    with one member adopting the “regulatory” role.
    Chaperonin containing TCP1 complex

  47. CCT2 Mediation Analysis

  48. Cct6a
    Low expression of Cct6a in NOD/ShiLtJ
    drives low expression of CCT complex

  49. Stoichiometric buffering of protein abundance
    [Figure: CCT complex subunits CCT2, CCT3, CCT4, CCT5, CCT6A, CCT7, CCT8, TCP1; panel label: Stable]

  50. Mediation identifies known and novel protein
    interactions

  51. Mediation reveals higher order protein networks
    [Network figure. Groups: Wash Complex, Ccc Complex, Exocyst Complex, Arp2/3 Complex; Endosome.
    Legend: eQTL, pQTL, Co-regulated; Cis and Trans effects.]
    Genes shown: Wash, Kiaa1033, Kiaa0196, Fam21, Zw10, Vcp, Ccdc43, llph, Spg20, Fam45a,
    Ccdc22, Gaa, Atg16l1, Rufy1, Ccdc93, Commd10, Commd9, 9030624J02Rik, Commd5, Commd7,
    Commd3, Commd2, Commd4, Dscr3, H2-Q10, Commd1, Pum1, 1110004F10Rik, Exoc6, Exoc2,
    Exoc7, Exoc8, Exoc5, Exoc4, Exoc1, Exoc3, Ttc39b, Arpc3, Gckr, Arpc5, Actr3, Arpc4,
    Arpc2, Actr2, Rala, Coro1b

  52. Natural genetic perturbations
    + Mediation analysis
    = Predictive protein network

  53. In Progress: Using genetic diversity to identify
    kinases for specific phosphorylation sites.
    Liver Phospho-Proteome
    Kinase <–> phospho site identification by mediation

  54. Collaborative Cross strains can be used to validate
    predictions from the DO and build new models.
    CC001– 98% Homozygous

  55. Accurate prediction of protein abundance
    in Founder and Collaborative Cross Strains.
    Chick Munger et al. 2016

  56. Looking ahead: Pathway-centered predictive genomics
    Example: Drug metabolism pathways are enriched for genes
    with significant liver pQTL.
    Tamoxifen

  57. Predict and test CC strain crosses that will produce
    progeny with compromised drug metabolism.
    CC Strain  Cyp3a13  Cyp3a16  Cyp2d10  Cyp2d22  Fmo1  Fmo5  Prediction
    CC001      ++       +        -        +++      -     +     Highest
    CC002      -        +        +        -        -     +     Medium
    CC003      -        -        -        +        +     +     Medium
    CC004      -        +        ---      +        --    -     Lowest
    CC005      +        --       ++       -        +     -     Medium
    CC006      +        -        -        -        -     +     Low
    CC007      +        -        +        -        +     +     High
    Pathway-centered prediction Toy Example
    Test!
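
One way to read the toy table is as a signed score per strain. The slide does not say how the marks are combined; summing them (each `+` as +1, each `-` as -1) is an assumption, though it does reproduce the ordering of the prediction labels. The helper name is hypothetical.

```python
def mark_to_score(mark):
    """Convert a run of '+' or '-' into a signed integer ('++' -> 2, '---' -> -3)."""
    if mark and set(mark) == {"+"}:
        return len(mark)
    if mark and set(mark) == {"-"}:
        return -len(mark)
    raise ValueError(f"unexpected mark: {mark!r}")


# Marks from the slide, in column order:
# Cyp3a13, Cyp3a16, Cyp2d10, Cyp2d22, Fmo1, Fmo5
table = {
    "CC001": ["++", "+", "-", "+++", "-", "+"],
    "CC002": ["-", "+", "+", "-", "-", "+"],
    "CC003": ["-", "-", "-", "+", "+", "+"],
    "CC004": ["-", "+", "---", "+", "--", "-"],
    "CC005": ["+", "--", "++", "-", "+", "-"],
    "CC006": ["+", "-", "-", "-", "-", "+"],
    "CC007": ["+", "-", "+", "-", "+", "+"],
}

scores = {strain: sum(mark_to_score(m) for m in marks)
          for strain, marks in table.items()}

# Strains ordered from highest to lowest predicted pathway output.
ranking = sorted(scores, key=scores.get, reverse=True)
```

Under this scoring CC001 ranks first and CC004 last, with CC007 second and CC006 second to last, matching the Highest, Lowest, High and Low labels; the three Medium strains tie in between.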

  58. Conclusions
    • Most genetic variation that affects transcript abundance does
      not affect protein abundance.
      – For local genetic variation that does affect protein abundance, 80%
        acts proximally on transcription (standard model).
    • 99+% of distant pQTL act on the target protein’s abundance
      independent of the target’s transcript abundance.
    • Mediation analysis identifies 700 RNA/protein causal
      intermediates of distant pQTL and infers >5000 protein
      interactions.
    • Stoichiometric buffering is a common post-translational
      mechanism governing protein abundance of binding
      partners and complex members.
    • We can apply our new understanding of the genome-proteome
      map in DO mice to tune output of liver pathways.

  59. Acknowledgments

  60. (No text on this slide.)

  61. Slides can be downloaded:
    https://speakerdeck.com/stevemunger/seminar-at-sanger-institute-10-5-2017