Expression Profiles in GTEx Data

1 Expression Profiles in GTEx Data
Amy Peterson
Advisers: Andrew Jaffe, PhD; Leonardo Collado-Torres, PhD

2 ABOUT ME
• Current MPH student, 2017-2018
• Concentration: Epidemiology & Biostatistics
• Boston University
• BA in Neuroscience, 2012
• Duke University Medical Center, 2012-2017
• Research Analyst, Laboratory for AIDS Vaccine Research & Development

3 EXPRESSION: Methods
• Examine RNA expression variability across brain regions
• Compute mean base-pair level coverage for all brain samples and for each of the 13 brain sub-tissues in GTEx using data from recount2 (Collado-Torres et al. 2017a, Ellis et al. 2017)
• Coverage scaled to a library size of 40 million reads of 100 base-pairs
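The scaling and averaging step can be illustrated with a small sketch. It is not the recount2 implementation: it assumes each sample comes with a per-base coverage vector and its total coverage (area under the curve), and the names `scale_coverage` and `mean_coverage` are hypothetical.

```python
from statistics import fmean

TARGET_READS = 40_000_000  # target library size, as on the Methods slide
READ_LENGTH = 100          # base-pairs per read, as on the Methods slide


def scale_coverage(coverage, auc, target_reads=TARGET_READS, read_length=READ_LENGTH):
    """Rescale one sample's per-base coverage to the target library size.

    Assumption: `auc` is the sample's total coverage summed over the genome,
    so a library of exactly 40 million 100 bp reads has auc == 4e9.
    """
    factor = target_reads * read_length / auc
    return [value * factor for value in coverage]


def mean_coverage(samples):
    """Average scaled coverage across samples, base by base.

    `samples` is a list of (coverage, auc) pairs with equal-length coverage.
    """
    scaled = [scale_coverage(coverage, auc) for coverage, auc in samples]
    return [fmean(column) for column in zip(*scaled)]
```

For example, a sample with half the target library size has its coverage doubled before it enters the mean.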

4 EXPRESSED REGIONS
Expressed regions defined by global cutoff (Collado-Torres et al. 2017b)
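As a rough illustration of what a global cutoff does, the sketch below calls an expressed region wherever a mean coverage vector stays above the cutoff for a contiguous stretch. It is a simplification, not derfinder's code: the function name `expressed_regions`, the strict `>` comparison, and the 0-based half-open coordinates are assumptions made for this example.

```python
def expressed_regions(mean_coverage, cutoff):
    """Return half-open (start, end) runs where mean coverage exceeds the cutoff."""
    regions = []
    start = None
    for pos, value in enumerate(mean_coverage):
        if value > cutoff:
            if start is None:
                start = pos  # a new region opens here
        elif start is not None:
            regions.append((start, pos))  # the region closes before this base
            start = None
    if start is not None:
        # The last region runs to the end of the vector.
        regions.append((start, len(mean_coverage)))
    return regions
```

Raising the cutoff can only shrink or split regions, which is why region width, region count, and genome coverage all change with the cutoff.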

5 All GTEx Samples: Mean Region Width

6 Overall Brain and Sub-tissues: Mean Region Width

7 Overall Brain and Sub-tissues: Median Region Width

8 Overall Brain and Sub-tissues: IQR Region Width

9 All GTEx Samples: Number of Regions

10 Overall Brain and Sub-Tissues: Number of Regions

11 All GTEx Samples: Percent of the Genome

12 Overall Brain and Sub-Tissues: Percent of the Genome
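The summaries plotted on slides 5 to 12 (mean, median, and IQR of region width, number of regions, percent of the genome) could be computed along these lines for one set of regions at one cutoff. This is a sketch under stated assumptions: regions are non-overlapping half-open (start, end) pairs, `genome_length` is supplied by the caller, and the quartile method ("inclusive") is an arbitrary choice, since the slides do not say which was used.

```python
from statistics import fmean, median, quantiles


def summarize_regions(regions, genome_length):
    """Summarize region widths for one cutoff (hypothetical helper)."""
    widths = sorted(end - start for start, end in regions)
    if not widths:
        return {"n_regions": 0, "mean_width": 0.0, "median_width": 0.0,
                "iqr_width": 0.0, "percent_genome": 0.0}
    if len(widths) >= 2:
        q1, _, q3 = quantiles(widths, n=4, method="inclusive")
        iqr = q3 - q1
    else:
        iqr = 0.0  # a single region has no spread
    return {
        "n_regions": len(widths),
        "mean_width": fmean(widths),
        "median_width": median(widths),
        "iqr_width": iqr,
        # Assumes regions do not overlap, so widths can simply be summed.
        "percent_genome": 100 * sum(widths) / genome_length,
    }
```

Calling this once per cutoff and per tissue would give the series needed to draw curves like those on the slides.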

13 EXPRESSION: Next steps
• Examine width distribution of known exons
• Compare distributions from all brain samples and each of the 13 brain sub-tissues

14 Overall Brain and Sub-Tissues: Mean Disjoint Exon Length

15 Overall Brain and Sub-Tissues: Median Disjoint Exon Length

16 Overall Brain and Sub-Tissues: IQR Disjoint Exon Length

17 Overall Brain and Sub-Tissues: Number of Disjoint Exons

18 Overall Brain and Sub-Tissues Exon Distribution: Percent of Genome
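The disjoint exons summarized on slides 14 to 18 are non-overlapping pieces obtained by cutting annotated exons at every exon boundary, similar in spirit to the disjoin operation in Bioconductor's GenomicRanges. The sketch below shows the idea for one chromosome and one strand; the function name and the half-open coordinates are assumptions for illustration, and it is written for clarity rather than speed.

```python
def disjoin(exons):
    """Split possibly overlapping half-open (start, end) exons into disjoint pieces."""
    # Every exon start or end is a potential cut point.
    breakpoints = sorted({point for start, end in exons for point in (start, end)})
    pieces = []
    for left, right in zip(breakpoints, breakpoints[1:]):
        # Keep the segment only if some exon fully covers it.
        if any(start <= left and right <= end for start, end in exons):
            pieces.append((left, right))
    return pieces
```

The widths of these pieces are what a mean, median, or IQR of disjoint exon length would be computed on.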

19 EXPRESSION: Collaborator suggestions
Mina Ryten Lab, UCL Institute of Neurology

20 Measuring noise at the “bottom” (Mina Ryten)
Expectations
• Many short regions are identified together with the true exons, with a skew to short regions
• When the mean region length is then calculated, we would expect a shift to short values
• When the median region length is calculated there is some protection from this effect, but it is still present
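A toy calculation makes the mean-versus-median expectation concrete. The widths below are made up for illustration and do not come from GTEx; `center_shift` is a hypothetical helper.

```python
from statistics import fmean, median

# Made-up widths: five "true exon" regions plus two short noise regions.
true_exon_widths = [100, 120, 140, 160, 180]
noise_widths = [5, 5]


def center_shift(true_widths, noise):
    """How far the mean and median drop once noise regions are included."""
    all_widths = true_widths + noise
    return {
        "mean_shift": fmean(true_widths) - fmean(all_widths),
        "median_shift": median(true_widths) - median(all_widths),
    }


shifts = center_shift(true_exon_widths, noise_widths)
```

With these numbers both summaries move toward shorter values, and the mean moves further than the median, which matches the "some protection, but still present" expectation.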

21 Measuring noise at the “top” (Mina Ryten)
High cut off applied to a single tissue
Expectations
• Many short regions are identified together with some true exons, with a skew to short regions, BUT there is less noise at the “top” than at the “bottom”
• When the mean region length is then calculated, we would expect a shift to short values, which accounts for the appearance of an optimum in the mean region length
• This is dampened when you use median values because there is simply less noise at the “top”

22 Collapsing noise at the “bottom” (Mina Ryten)
Now take another 20 noise profiles…. and combine this with a low cut off….
Expectations
• The noise is summed across regions, and when a low cut off is used this will generate regions that effectively merge with true exons to produce long region lengths in some cases, which shift the mean upwards.
• Noise at the “top” will not generate as many problems but is more likely to collapse over true exons, and so the grey line starts following the rest.
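The merging effect can be reproduced on a toy coverage vector: two well-covered exons separated by a short stretch of low-level noise. The numbers and the helper `region_widths` are invented for illustration, and regions are taken to be runs of coverage strictly above the cutoff.

```python
from itertools import groupby


def region_widths(coverage, cutoff):
    """Widths of contiguous runs where coverage is above the cutoff."""
    widths = []
    for above, run in groupby(coverage, key=lambda value: value > cutoff):
        length = sum(1 for _ in run)
        if above:
            widths.append(length)
    return widths


# Two 5 bp "exons" at coverage 10, joined by 3 bp of noise at coverage 1.
toy_coverage = [10] * 5 + [1] * 3 + [10] * 5

low_cutoff_widths = region_widths(toy_coverage, 0.5)
higher_cutoff_widths = region_widths(toy_coverage, 2)
```

At the low cutoff the noise bridges the gap and the two exons are reported as a single long region; at the higher cutoff they are recovered separately.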

23 Conclusions (Mina Ryten)
• The cut off should be applied for each region separately
• Noise at the “bottom” is a much bigger problem than noise at the “top”.
• There should be an optimum solution and it might be expected to vary by tissue/expt because the factors contributing to noise at the “bottom” would be expected to vary (e.g. library prep – total RNA would be expected to increase noise at the “bottom”).
• We could use exon-exon junction reads as external information to find this optimum solution.
• The cut off which results in the highest usage of exon-exon junction reads is the correct cut off.
• Defining “usage” might not be straightforward though and will make a difference to this
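One way the junction-based idea could be prototyped is sketched below. The slide leaves “usage” undefined, so the definition here is only an assumption for illustration: a junction counts as used when one region ends exactly where its intron begins and another region starts exactly where its intron ends, weighted by junction read count. All function names are hypothetical.

```python
from itertools import groupby


def call_regions(coverage, cutoff):
    """Half-open (start, end) runs where coverage is above the cutoff."""
    regions, pos = [], 0
    for above, run in groupby(coverage, key=lambda value: value > cutoff):
        length = sum(1 for _ in run)
        if above:
            regions.append((pos, pos + length))
        pos += length
    return regions


def junction_usage(regions, junctions):
    """Fraction of junction reads whose intron is flanked by region boundaries.

    `junctions` is a list of (intron_start, intron_end, reads), half-open.
    This definition of "usage" is an assumption, not taken from the slides.
    """
    region_ends = {end for _, end in regions}
    region_starts = {start for start, _ in regions}
    total = sum(reads for _, _, reads in junctions)
    used = sum(reads for start, end, reads in junctions
               if start in region_ends and end in region_starts)
    return used / total if total else 0.0


def best_cutoff(coverage, junctions, cutoffs):
    """Pick the candidate cutoff with the highest junction usage."""
    return max(cutoffs,
               key=lambda cutoff: junction_usage(call_regions(coverage, cutoff), junctions))
```

On a toy vector such as `[10] * 5 + [1] * 3 + [10] * 5` with one junction `(5, 8, 20)`, a cutoff that is too low merges the exons and a cutoff that is too high removes them, so only an intermediate cutoff uses the junction. A more tolerant definition (for example, allowing boundaries within a few bases) would likely change which cutoff wins.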

24 EXPRESSION: Future Directions
• Combining expressed regions across different sub-tissues at the same cutoff
• Adapt derfinder to use variable cutoffs that are tissue and context-specific
• Examine part of the genome that shows higher expression
• Test higher cutoff, determine noise-appropriate level

25 References
Collado-Torres, L., Nellore, A., Kammers, K., Ellis, S.E., Taub, M.A., Hansen, K.D., Jaffe, A.E., Langmead, B. and Leek, J.T. 2017a. Reproducible RNA-seq analysis using recount2. Nature Biotechnology 35(4), pp. 319–321.
Collado-Torres, L., Nellore, A., Frazee, A.C., Wilks, C., Love, M.I., Langmead, B., . . . Jaffe, A.E. 2017b. Flexible expressed region analysis for RNA-seq with derfinder. Nucleic Acids Research 45(2), e9. doi:10.1093/nar/gkw852
Ellis, S.E., Collado-Torres, L. and Leek, J. 2017. Improving the value of public RNA-seq expression data by phenotype prediction. bioRxiv.
