N=9,662 TCGA N=11,284 SRA N=49,848
samples
expression estimates: gene, exon, junctions, ERs
Answer meaningful questions about human biology and expression
slide adapted from Shannon Ellis
male 1240 Male 141 Total 3640
Even when information is provided, it’s not always clear…
sra_meta$Sex: “1 Male, 2 Female”, “2 Male, 1 Female”, “3 Female”, “DK”, “male and female”, “Male (note: ….)”, “missing”, “mixed”, “mixture”, “N/A”, “Not available”, “not applicable”, “not collected”, “not determined”, “pooled male and female”, “U”, “unknown”, “Unknown”
slide adapted from Shannon Ellis
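One way to tame labels like these is to collapse them into a few categories before analysis. The sketch below is only an illustration in Python: the function name `normalize_sex`, the output categories, and the rule that pooled samples become "mixed" are all assumptions, not the approach used for recount.

```python
import re
from collections import Counter

# Missing-value markers copied from the slide; the set itself is an assumption.
MISSING = {
    "dk", "missing", "n/a", "not available", "not applicable",
    "not collected", "not determined", "u", "unknown",
}

def normalize_sex(raw):
    """Map a free-text sex label to 'male', 'female', 'mixed', or None."""
    if raw is None:
        return None
    text = raw.strip().lower()
    if text == "" or text in MISSING:
        return None
    if text in {"mixed", "mixture"}:
        return "mixed"
    # Whole-word matches, so "female" is never counted as "male".
    has_male = re.search(r"\bmale\b", text) is not None
    has_female = re.search(r"\bfemale\b", text) is not None
    if has_male and has_female:
        return "mixed"
    if has_female:
        return "female"
    if has_male:
        return "male"
    return None  # anything unrecognized is treated as missing

examples = [
    "male", "Male", "1 Male, 2 Female", "3 Female",
    "DK", "pooled male and female", "Unknown",
]
counts = Counter(normalize_sex(value) for value in examples)
print(counts)
```

Labels the rules do not recognize fall through to `None`, so they stay visible as missing rather than being guessed.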
samples in recount: gene, exon, exon-exon junction and expressed region RNA-Seq data
SRA (Sequence Read Archive) N=49,848
GTEx (Genotype Tissue Expression Project) N=9,662
TCGA (The Cancer Genome Atlas) N=11,284
• divide samples: training set, test set
• build and optimize phenotype predictor
• test accuracy of predictor
• predict phenotypes across samples in TCGA
• predict phenotypes across SRA samples
slide adapted from Shannon Ellis
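The divide, build, test, predict sequence on this slide can be sketched on toy data. This Python example swaps in a simple nearest-centroid classifier as a stand-in and is not the predictor built for recount. The expression values, the two-feature layout, and every function name are made up for illustration.

```python
from collections import defaultdict
from math import dist

def split_by_label(samples, train_fraction=0.5):
    """Split (features, label) pairs into training and test sets, per label."""
    by_label = defaultdict(list)
    for features, label in samples:
        by_label[label].append((features, label))
    train, test = [], []
    for group in by_label.values():
        cut = max(1, int(len(group) * train_fraction))
        train.extend(group[:cut])
        test.extend(group[cut:])
    return train, test

def fit_centroids(train):
    """Return the mean feature vector of each label in the training set."""
    sums, counts = {}, defaultdict(int)
    for features, label in train:
        if label not in sums:
            sums[label] = [0.0] * len(features)
        sums[label] = [s + x for s, x in zip(sums[label], features)]
        counts[label] += 1
    return {
        label: [s / counts[label] for s in total]
        for label, total in sums.items()
    }

def predict(centroids, features):
    """Assign the label whose centroid is closest in Euclidean distance."""
    return min(centroids, key=lambda label: dist(centroids[label], features))

def accuracy(centroids, test):
    """Fraction of test samples whose predicted label matches the known one."""
    correct = sum(predict(centroids, f) == label for f, label in test)
    return correct / len(test)

# Made-up expression values for two hypothetical genes; not real data.
labeled = [
    ([9.0, 1.0], "female"), ([8.5, 0.5], "female"),
    ([9.5, 1.5], "female"), ([8.0, 1.0], "female"),
    ([1.0, 7.0], "male"), ([0.5, 8.0], "male"),
    ([1.5, 7.5], "male"), ([1.0, 9.0], "male"),
]

train, test = split_by_label(labeled)      # divide samples
centroids = fit_centroids(train)           # build the predictor
test_accuracy = accuracy(centroids, test)  # test accuracy on held-out samples

# Samples with no recorded phenotype, standing in for unlabeled studies.
unlabeled = [[8.8, 0.9], [0.7, 8.2]]
predictions = [predict(centroids, f) for f in unlabeled]
print(test_accuracy, predictions)
```

The split is done per label so that both classes appear in the training set. With real data a random, seeded split would be the more usual choice.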
Black Hispanic White
Tissue Site 1: Cerebral cortex, Hippocampus, Brainstem, Cerebellum
Tissue Site 2: Frontal lobe, Temporal lobe, Midbrain, Basal ganglia
Tissue Site 3: Dorsolateral prefrontal cortex, Superior temporal gyrus, Substantia nigra, Caudate
Hemisphere: Left, Right
Brodmann Area: 1-52
Disease Status: Disease, Neurological control
Disease: Brain tumor, Alzheimer’s disease, Parkinson’s disease, Bipolar disorder
Tumor Type: Glioblastoma, Astrocytoma, Oligodendroglioma, Ependymoma
Clinical Stage 1: Grade I, Grade II, Grade III, Grade IV
Clinical Stage 2: Primary, Secondary, Recurrent
Viability: Postmortem, Biopsy
Preparation: Frozen, Thawed
Ashkaun Razmara, et al doi.org/10.1101/618025
T. Leek
University of Toronto: Dustin J. Sokolowski, Michael D. Wilson
NIH: Sean Davis
LIBD: Andrew E. Jaffe
Funding: NIH R01 GM105705, NIH 1R21MH109956, NIH R01 GM121459, CIHR, NSERC, Ontario Ministry of Research
IDIES SciServer Hosting recount2
github.com/LieberInstitute/recount-brain
= 36
Discovery data
Jaffe et al, Nat. Neuroscience, 2015
Postmortem Human Brain Samples: Fetal, Infant, Child, Teen, Adult, 50+
6 / group, N = 36
Replication data
project involves the Hansen, Leek, Langmead and Battle labs at JHU & the Nellore lab at OHSU & the Jaffe lab at LIBD
Contact:
• Kasper D. Hansen www.hansenlab.org
• Jeff Leek jtleek.com/
• Ben Langmead www.langmead-lab.org/
• Alexis Battle battlelab.jhu.edu/
• Abhinav Nellore nellore.bio/
• Andrew Jaffe aejaffe.com/
+ Leonardo Collado-Torres lcolladotor.github.io/
Hadoop cluster
• No major updates & human-only: “where is my favorite dataset?”
• Requires a lot of manual post-alignment work: hard to auto-update
• Annotation choice (Gencode v25) is ingrained in the R files
& human: studies + collections
• Can be auto-updated (hopefully!)
• Several annotation choices included
• Faster tools for re-annotation quantification (faster than rtracklayer, bwtool, …)
• R interface is more flexible: builds RangedSummarizedExperiment objects on the fly
Coming to your nearest Bioconductor mirror in 2020!