Expression Profiles in GTEx Data

Amy Peterson
February 19, 2018

Transcript

  1. Expression Profiles in GTEx Data
    Amy Peterson
    Advisers:
    Andrew Jaffe, PhD
    Leonardo Collado-Torres, PhD

  2. ABOUT ME
    • Current MPH student, 2017-2018
    • Concentration: Epidemiology & Biostatistics
    • Boston University
    • BA in Neuroscience, 2012
    • Duke University Medical Center, 2012-2017
    • Research Analyst, Laboratory for AIDS
    Vaccine Research & Development

  3. EXPRESSION
    Methods
    • Examine RNA expression variability across
    brain regions
    • Compute mean base-pair level coverage
    for all brain samples and for each of the 13
    brain sub-tissues in GTEx using data from
    recount2 (Collado-Torres et al. 2017a, Ellis et al. 2017)
    • Scaled by 40 million reads of 100 base-pairs
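
The scaling and averaging step above can be sketched as follows. This is a minimal illustration, not the recount2 pipeline: it assumes each sample's base-pair coverage is available as a plain list and that scaling uses the sample's total coverage (area under the coverage curve, AUC). The function names `scale_coverage` and `mean_coverage` and the toy numbers are hypothetical.

```python
from statistics import fmean

TARGET_READS = 40_000_000   # target library size from the slide
READ_LENGTH = 100           # read length in base-pairs from the slide


def scale_coverage(coverage, auc):
    """Rescale one sample so its total coverage matches the target library.

    `auc` is the sample's total coverage summed over the whole genome
    (an assumed normalization; the slide only states the target size).
    """
    factor = TARGET_READS * READ_LENGTH / auc
    return [depth * factor for depth in coverage]


def mean_coverage(samples):
    """Average scaled coverage position by position across samples.

    `samples` is a list of (coverage, auc) pairs with equal-length coverage.
    """
    scaled = [scale_coverage(coverage, auc) for coverage, auc in samples]
    return [fmean(position) for position in zip(*scaled)]


# Toy example: the second sample was sequenced twice as deeply.
# The AUC values stand for genome-wide totals, not just these 3 positions.
samples = [([1, 2, 3], 4e9), ([2, 4, 6], 8e9)]
print(mean_coverage(samples))
```

After scaling, the two toy samples contribute equally to the mean even though their raw depths differ by a factor of two.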

  4. EXPRESSED REGIONS
    Expressed regions defined by global cutoff
    (Collado-Torres et al. 2017b)
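
As a rough sketch of the idea, a global cutoff turns a mean coverage vector into expressed regions by keeping every run of consecutive bases above the cutoff. The code below is plain Python, not derfinder's implementation: the names `find_expressed_regions` and `summarize_regions`, the strict "greater than" comparison, and the toy coverage values are all assumptions made for illustration. The summary mirrors the quantities plotted on the later slides (region count, mean, median and IQR of width, percent of the genome).

```python
from statistics import fmean, median, quantiles


def find_expressed_regions(mean_cov, cutoff):
    """Return half-open (start, end) runs where mean coverage exceeds cutoff."""
    regions, start = [], None
    for pos, depth in enumerate(mean_cov):
        if depth > cutoff and start is None:
            start = pos                      # a region opens here
        elif depth <= cutoff and start is not None:
            regions.append((start, pos))     # region closes before pos
            start = None
    if start is not None:                    # region runs to the last base
        regions.append((start, len(mean_cov)))
    return regions


def summarize_regions(regions, genome_length):
    """Count, width summaries and genome share for a set of regions.

    Needs at least two regions, because `quantiles` requires two data points.
    """
    widths = [end - start for start, end in regions]
    q1, _, q3 = quantiles(widths, n=4, method="inclusive")
    return {
        "n_regions": len(widths),
        "mean_width": fmean(widths),
        "median_width": median(widths),
        "iqr_width": q3 - q1,
        "percent_genome": 100 * sum(widths) / genome_length,
    }


mean_cov = [0, 0, 5, 6, 5, 0, 0, 7, 8, 0]    # toy 10 bp "genome"
regions = find_expressed_regions(mean_cov, cutoff=1)
print(regions)
print(summarize_regions(regions, genome_length=len(mean_cov)))
```

Repeating the two calls over a grid of cutoffs would give one summary value per cutoff, which is the shape of the curves shown for all GTEx samples and for the brain sub-tissues.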

  5. All GTEx Samples: Mean Region Width

  6. Overall Brain and Sub-tissues: Mean Region Width

  7. Overall Brain and Sub-tissues: Median Region Width

  8. Overall Brain and Sub-tissues: IQR Region Width

  9. All GTEx Samples: Number of Regions

  10. Overall Brain and Sub-Tissues: Number of Regions

  11. All GTEx Samples: Percent of the Genome

  12. Overall Brain and Sub-Tissues: Percent of the Genome

  13. EXPRESSION
    Next steps
    • Examine width distribution of known exons
    • Compare distributions from all brain
    samples and each of the 13 brain sub-tissues

  14. Overall Brain and Sub-Tissues: Mean Disjoint Exon Length

  15. Overall Brain and Sub-Tissues: Median Disjoint Exon Length

  16. Overall Brain and Sub-Tissues: IQR Disjoint Exon Length

  17. Overall Brain and Sub-Tissues: Number of Disjoint Exons

  18. Overall Brain and Sub-Tissues Exon Distribution: Percent of Genome

  19. EXPRESSION
    Collaborator suggestions:
    Mina Ryten Lab
    UCL Institute of Neurology

  20. Expectations
    • Many short regions are identified together
    with the true exons, so the width distribution
    is skewed toward short regions
    • When the mean region length is then calculated,
    we would expect a shift toward short values
    • When the median region length is calculated,
    there is some protection from this effect, but
    it is still present
    Measuring noise at the “bottom”
    (Mina Ryten)
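
The expected effect on the mean and the median can be checked with a small numeric example. The widths below are made up purely for illustration (they are not GTEx values): a handful of "true exon" widths plus a few very short regions called from background noise.

```python
from statistics import fmean, median

# Hypothetical region widths in base-pairs, chosen only for illustration.
true_exons = [100, 120, 150, 180, 200]
noise_regions = [2, 3, 5]            # short regions called from background

all_regions = true_exons + noise_regions

# How far each summary moves once the short noise regions are included.
mean_shift = fmean(true_exons) - fmean(all_regions)
median_shift = median(true_exons) - median(all_regions)

print(f"mean:   {fmean(true_exons):.1f} -> {fmean(all_regions):.1f}")
print(f"median: {median(true_exons):.1f} -> {median(all_regions):.1f}")
```

With these particular numbers both summaries move toward shorter values, and the mean moves further than the median, which is consistent with the expectation that the median offers only partial protection.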

  21. High cutoff applied to a single tissue
    Expectations
    • Many short regions are identified together with some true
    exons, with a skew toward short regions, BUT there is less noise at
    the “top” than at the “bottom”
    • When the mean region length is then calculated, we would
    expect a shift toward short values, which accounts for the
    appearance of an optimum in the mean region length
    • This is dampened when median values are used because
    there is simply less noise at the “top”
    Measuring noise at the “top”
    (Mina Ryten)

  22. Expectations
    • The noise is summed across regions, and when a low
    cutoff is used this generates regions that effectively
    merge with true exons to produce long regions
    in some cases, which shifts the mean upwards.
    • Noise at the “top” will not generate as many problems
    but is more likely to collapse over true exons, and so the
    grey line starts following the rest.
    Collapsing noise at the “bottom”
    Now take another 20 noise profiles…. and combine this with a low cut off….
    (Mina Ryten)
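
The merging effect described on this slide can be reproduced with a toy profile. Everything here is a simplifying assumption: the exon signal, the constant per-profile background level, and the helper names `region_widths` and `summed_coverage` are hypothetical, and real noise would vary by position.

```python
from itertools import groupby


def region_widths(coverage, cutoff):
    """Widths of consecutive runs of positions with coverage above cutoff."""
    runs = groupby(coverage, key=lambda depth: depth > cutoff)
    return [sum(1 for _ in run) for above, run in runs if above]


# Hypothetical data: two 3 bp exons separated by a 3 bp gap.
signal = [0, 0, 10, 10, 10, 0, 0, 0, 10, 10, 10, 0, 0]
noise_per_profile = 0.05             # constant background, for simplicity


def summed_coverage(n_profiles):
    """Exon signal plus background noise summed over n_profiles profiles."""
    return [depth + n_profiles * noise_per_profile for depth in signal]


low_cutoff, high_cutoff = 0.5, 5
print(region_widths(summed_coverage(1), low_cutoff))     # one noise profile
print(region_widths(summed_coverage(21), low_cutoff))    # 21 noise profiles
print(region_widths(summed_coverage(21), high_cutoff))   # same data, higher cutoff
```

In this toy case, summing 21 noise profiles lifts the background above the low cutoff, so the two exons and the gap between them are called as a single long region, while a higher cutoff recovers the two separate exons.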

  23. Conclusions
    • The cutoff should be applied for each region separately.
    • Noise at the “bottom” is a much bigger problem than noise at the
    “top”.
    • There should be an optimum solution, and it might be expected to
    vary by tissue/experiment because the factors contributing to noise at
    the “bottom” would be expected to vary (e.g. library prep – total
    RNA would be expected to increase noise at the “bottom”).
    • We could use exon-exon junction reads as external information to
    find this optimum solution.
    • The cutoff that results in the highest usage of exon-exon
    junction reads is the correct cutoff.
    • Defining “usage” might not be straightforward, though, and the
    definition chosen will affect which cutoff is selected.
    (Mina Ryten)
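
One way the junction-based selection could look in code is sketched below. Since the slide notes that "usage" is not yet defined, this sketch picks one possible definition as an assumption: a junction counts as used when an expressed region ends exactly at its donor side and another starts exactly at its acceptor side. The function names, the candidate cutoffs, and the toy coverage are all hypothetical.

```python
def find_regions(coverage, cutoff):
    """Half-open (start, end) runs where coverage is above cutoff."""
    padded = [0] + [int(depth > cutoff) for depth in coverage] + [0]
    n = len(coverage)
    # A 0 -> 1 step marks a region start, a 1 -> 0 step marks its end.
    starts = [i for i in range(n) if padded[i + 1] > padded[i]]
    ends = [i for i in range(1, n + 1) if padded[i] > padded[i + 1]]
    return list(zip(starts, ends))


def junction_usage(regions, junctions):
    """Fraction of junctions whose two ends both coincide with region edges.

    A junction is a half-open (intron_start, intron_end) pair: the donor
    exon ends at intron_start and the acceptor exon starts at intron_end.
    This definition of "usage" is an assumption, not the slide's.
    """
    region_ends = {end for _, end in regions}
    region_starts = {start for start, _ in regions}
    used = sum(1 for a, b in junctions
               if a in region_ends and b in region_starts)
    return used / len(junctions)


def best_cutoff(coverage, junctions, cutoffs):
    """Candidate cutoff with the highest junction usage (first one on ties)."""
    return max(
        cutoffs,
        key=lambda c: junction_usage(find_regions(coverage, c), junctions),
    )


coverage = [0.2, 0.2, 8, 9, 8, 0.6, 0.6, 7, 8, 7, 0.2, 0.2]
junctions = [(5, 7)]                 # one hypothetical intron at positions 5-6
print(best_cutoff(coverage, junctions, cutoffs=[0.1, 0.5, 1, 8.5]))
```

In the toy data, cutoffs that are too low merge the two exons across the intron and a cutoff that is too high trims the exons, so only the intermediate cutoff yields region edges that match the junction. A different definition of usage could select a different cutoff.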

  24. Future Directions
    • Combine expressed regions across
    different sub-tissues at the same cutoff
    • Adapt derfinder to use variable cutoffs that
    are tissue and context-specific
    • Examine part of the genome that shows
    higher expression
    • Test higher cutoff, determine noise-appropriate level

  25. References
    Collado-Torres, L., Nellore, A., Kammers, K., Ellis, S.E., Taub, M.A.,
    Hansen, K.D., Jaffe, A.E., Langmead, B. and Leek, J.T. 2017a.
    Reproducible RNA-seq analysis using recount2. Nature Biotechnology
    35(4), pp. 319–321.
    Collado-Torres, L., Nellore, A., Frazee, A.C., Wilks, C., Love, M.I.,
    Langmead, B., . . . Jaffe, A.E. 2017b. Flexible expressed region analysis
    for RNA-seq with derfinder. Nucleic Acids Research 45(2), e9.
    doi:10.1093/nar/gkw852
    Ellis, S.E., Collado-Torres, L. and Leek, J. 2017. Improving the value of
    public RNA-seq expression data by phenotype prediction. bioRxiv.
