"Navigating human brain gene expression measurements at different resolutions to study psychiatric disorders" seminar on 2023-05-22 at UCL Great Ormond Street Institute of Child Health
@lcolladotor lcolladotor.github.io lcolladotor.github.io/bioc_team_ds Navigating human brain gene expression measurements at different resolutions to study psychiatric disorders Leonardo Collado Torres, Investigator UCL Great Ormond Street Institute of Child Health May 22 2023 Slides available at speakerdeck.com/lcolladotor
Postmortem Human Brain Samples
Discovery data: Fetal, Infant, Child, Teen, Adult, 50+ (6 / group, N = 36)
Replication data: Fetal, Infant, Child, Teen, Adult, 50+ (6 / group, N = 36)
Andrew E Jaffe @andrewejaffe Ph.D. co-advisor
Developmental regulation of human cortex transcription and its clinical relevance at single base resolution doi.org/10.1038/nn.3898 github.com/leekgroup/libd_n36
DERs outside of “known genes”
BrainSpan data
doi.org/10.1038/s41593-018-0197-y Andrew E Jaffe @andrewejaffe LIBD former boss Developmental and genetic regulation of the human cortex transcriptome illuminate schizophrenia pathogenesis LIBD BrainSEQ Phase 1 eqtl.brainseq.org/phase1/
doi.org/10.1016/j.neuron.2019.05.013 Regional Heterogeneity in Gene Expression, Regulation, and Coherence in the Frontal Cortex and Hippocampus across Development and Schizophrenia LIBD BrainSEQ Phase 2: DLPFC + HPC eqtl.brainseq.org/phase2/
LIBD BrainSEQ Phase 2: DLPFC + HPC doi.org/10.1016/j.neuron.2019.05.013 eqtl.brainseq.org/phase2/
qSVA framework for RNA quality correction in differential expression analysis doi.org/10.1073/pnas.1617384114 Amy Peterson @amptrsn
Zoom in: more data! Ben Langmead @BenLangmead Abhinav Nellore @nellore (GitHub) Christopher Wilks @chrisnwilks Shannon Ellis @Shannon_E_Ellis Kasper Daniel Hansen @KasperDHansen Andrew E Jaffe @andrewejaffe Ph.D. co-advisor + LIBD former boss Jeff Leek @jtleek Ph.D. advisor
doi.org/10.1038/nrg.2017.113 #RailRNA Cloud computing for genomic data analysis and collaboration Ben Langmead @BenLangmead Abhinav Nellore @nellore (GitHub)
expression data for ~70,000 human samples
samples: GTEx N=9,962, TCGA N=11,284, SRA N=49,848
phenotypes: ?
expression estimates: gene, exon, junctions, ERs
Answer meaningful questions about human biology and expression
slide adapted from Shannon Ellis
Reproducible RNA-seq analysis using #recount2 doi.org/10.1038/nbt.3838
Improving the value of public RNA-seq expression data by phenotype prediction doi.org/10.1093/nar/gky102
SRA phenotype information is far from complete
      SubjectID  Sex     Tissue  Race  Age
6620  NA         female  liver   NA    NA
6621  NA         female  liver   NA    NA
6622  NA         female  liver   NA    NA
6623  NA         female  liver   NA    NA
6624  NA         female  liver   NA    NA
6625  NA         male    liver   NA    NA
6626  NA         male    liver   NA    NA
6627  NA         male    liver   NA    NA
6628  NA         male    liver   NA    NA
6629  NA         male    liver   NA    NA
6630  NA         male    liver   NA    NA
6631  NA         NA      blood   NA    NA
6632  NA         NA      blood   NA    NA
6633  NA         NA      blood   NA    NA
6634  NA         NA      blood   NA    NA
6635  NA         NA      blood   NA    NA
6636  NA         NA      blood   NA    NA
slide adapted from Shannon Ellis @Shannon_E_Ellis
Even when information is provided, it’s not always clear…
sra_meta$Sex
Category  Frequency
F         95
female    2036
Female    51
M         77
male      1240
Male      141
Total     3640
Other values: “1 Male, 2 Female”, “2 Male, 1 Female”, “3 Female”, “DK”, “male and female”, “Male (note: ….)”, “missing”, “mixed”, “mixture”, “N/A”, “Not available”, “not applicable”, “not collected”, “not determined”, “pooled male and female”, “U”, “unknown”, “Unknown”
slide adapted from Shannon Ellis @Shannon_E_Ellis
Improving the value of public RNA-seq expression data by phenotype prediction doi.org/10.1093/nar/gky102
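The inconsistent labels above can be collapsed with simple string harmonization before any analysis. What follows is a minimal Python sketch and not the approach of the cited paper, which predicts phenotypes from expression data. The mapping `CANONICAL` and both function names are hypothetical; the counts are copied from the frequency table.

```python
from collections import Counter

# Counts copied from the slide's frequency table for sra_meta$Sex.
raw_counts = {
    "F": 95, "female": 2036, "Female": 51,
    "M": 77, "male": 1240, "Male": 141,
}

# Hypothetical mapping from lower-cased labels to canonical values.
# Anything not listed here (e.g. "mixed", "pooled male and female")
# is treated as ambiguous rather than guessed.
CANONICAL = {"f": "female", "female": "female", "m": "male", "male": "male"}


def harmonize_sex(label):
    """Return 'female', 'male', or None when the label is missing or ambiguous."""
    if label is None:
        return None
    return CANONICAL.get(label.strip().lower())


def harmonize_counts(counts):
    """Sum the frequencies of raw labels that map to the same canonical value."""
    merged = Counter()
    for label, n in counts.items():
        merged[harmonize_sex(label)] += n
    return merged


harmonized = harmonize_counts(raw_counts)
print(dict(harmonized))
```

Returning `None` for unrecognized labels keeps ambiguous entries visible instead of silently assigning them to a category.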
http://research.libd.org/recount-brain/ doi.org/10.1101/618025 recount-brain: a curated repository of human brain RNA-seq datasets metadata Ashkaun Razmara @ashkaun_razmara
recount3: over 700,000 human and mouse RNA-seq samples #recount3: summaries and queries for large-scale RNA-seq expression and splicing Christopher Wilks @chrisnwilks research.libd.org/recount3-docs/ doi.org/10.1186/s13059-021-02533-6
bioconductor.org/packages/megadepth doi.org/10.1093/bioinformatics/btab152 #Megadepth: efficient coverage quantification for BigWigs and BAMs David Zhang @dyzhang32 Christopher Wilks @chrisnwilks
doi.org/10.1093/nar/gkac1056 doi.org/10.1101/2023.03.29.534370 #IntroVerse: a comprehensive database of introns across human tissues + Splicing accuracy varies across human introns, tissues and age Sonia García-Ruiz @sonigruiz
Zoom in: snRNA-seq → deconvolution of bulk RNA-seq Matthew N Tran @mattntran Kristen R Maynard @kr_maynard Louise A Huuki-Myers @lahuuki Keri Martinowich @martinowk Stephanie C Hicks @stephaniehicks
Motivation
● Improve deconvolution algorithms by considering differences in size and RNA content between cell types
● Use smFISH with RNAscope to establish data set of:
○ Cellular composition
○ Nuclei sizes of major cell types
○ Average nuclei RNA content of major cell types
How do we measure total RNA content of a cell if we can only observe a few genes at a time? Use a TREG
Data-driven Identification of Total RNA Expression Genes (TREGs) for Estimation of RNA Abundance in Heterogeneous Cell Types research.libd.org/TREG/ doi.org/10.1101/2022.04.28.489923 Louise A Huuki-Myers @lahuuki
What is a TREG?
● Total RNA Expression Gene
● Expression is proportional to the overall RNA expression in a nucleus
● In smFISH the count of TREG puncta in a nucleus can estimate the RNA content
Data-driven Identification of Total RNA Expression Genes (TREGs) for Estimation of RNA Abundance in Heterogeneous Cell Types research.libd.org/TREG/ doi.org/10.1101/2022.04.28.489923
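The proportionality idea behind a TREG can be illustrated with a toy calculation. The Python sketch below is not the procedure implemented in the TREG package; it assumes, purely for illustration, that a candidate gene is scored by its Pearson correlation with total RNA counts per nucleus, and that mean puncta counts are then compared between cell types. All numbers, gene names, and cell type names are invented.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Invented toy data: total RNA counts for four nuclei and two candidate genes.
total_rna = [1000, 2000, 4000, 8000]
candidates = {
    "gene_a": [10, 20, 40, 80],   # tracks total RNA
    "gene_b": [40, 10, 30, 20],   # does not track total RNA
}

# Assumed scoring rule: higher correlation with total RNA is better.
scores = {gene: pearson(counts, total_rna) for gene, counts in candidates.items()}
best_gene = max(scores, key=scores.get)


def relative_rna_content(puncta_by_cell_type, reference):
    """Mean puncta per nucleus for each cell type, relative to a reference type."""
    means = {ct: sum(v) / len(v) for ct, v in puncta_by_cell_type.items()}
    return {ct: m / means[reference] for ct, m in means.items()}


# Invented smFISH puncta counts per nucleus for the chosen gene.
puncta = {"type_x": [12, 15, 9], "type_y": [30, 33, 27]}
relative = relative_rna_content(puncta, reference="type_x")
print(best_gene, relative)
```

If the chosen gene really is proportional to total RNA, the ratio of mean puncta counts between two cell types approximates the ratio of their RNA content.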
#deconvochallenge Challenges and opportunities to computationally deconvolve heterogeneous tissue with varying cell sizes using single cell RNA-sequencing datasets doi.org/10.48550/arXiv.2305.06501 Sean Maden @MadenSean
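Why cell size matters for deconvolution can be shown with a small worked example. The Python sketch below rests on a simplifying assumption that may differ from the formulation in the cited preprint: the share of bulk RNA contributed by a cell type is proportional to its cell proportion times its RNA content per cell. The function name, cell type labels, and numbers are hypothetical.

```python
def rna_fraction_to_cell_proportion(rna_fractions, cell_sizes):
    """Convert RNA fractions into cell proportions using per-cell-type size factors.

    Assumes rna_fraction[k] is proportional to proportion[k] * cell_size[k],
    so proportion[k] is proportional to rna_fraction[k] / cell_size[k].
    """
    for cell_type, size in cell_sizes.items():
        if size <= 0:
            raise ValueError(f"cell size for {cell_type!r} must be positive")
    scaled = {ct: rna_fractions[ct] / cell_sizes[ct] for ct in rna_fractions}
    total = sum(scaled.values())
    return {ct: value / total for ct, value in scaled.items()}


# Invented example: neurons contribute 80% of the RNA but carry four times
# as much RNA per cell as glia.
rna_fractions = {"neuron": 0.8, "glia": 0.2}
cell_sizes = {"neuron": 4.0, "glia": 1.0}

proportions = rna_fraction_to_cell_proportion(rna_fractions, cell_sizes)
print(proportions)
```

In this toy case the two cell types are equally abundant even though neurons dominate the RNA signal, which is the kind of bias a size-aware method aims to correct.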
Zoom in: spatial omics Kristen R Maynard @kr_maynard Keri Martinowich @martinowk Stephanie C Hicks @stephaniehicks Andrew E Jaffe @andrewejaffe Stephanie C Page @CerceoPage
DOI: 10.1038/s41593-020-00787-0 twitter.com/lcolladotor/status/1233661576433061888 from 2020-02-29 Andrew E Jaffe @andrewejaffe Kristen R Maynard @kr_maynard Keri Martinowich @martinowk
DOI 10.1093/bioinformatics/btac299 Since Feb 2020 spatialLIBD::fetch_data() provides access to SpatialExperiment R/Bioconductor objects Stephanie C Hicks @stephaniehicks Lukas M Weber @lmweber
twitter.com/CrowellHL/status/1597579271945715717
Hypothesis Local tissue microenvironments in close proximity to AD-related neuropathology have distinct cellular and molecular signatures. Sang Ho Kwon
bioconductor.org/packages/spatialLIBD Pardo et al, 2022 DOI 10.1186/s12864-022-08601-w Maynard, Collado-Torres, 2021 DOI 10.1038/s41593-020-00787-0 Brenda Pardo @PardoBree Abby Spangler @abspangler Louise A. Huuki-Myers @lahuuki
Boxplots of non-DE genes with DE tx
Ankrd11: acts in head morphogenesis; expressed in cerebral cortex
Trpc4: acts upstream of or within gamma-aminobutyric acid secretion and oligodendrocyte differentiation; expressed in brain
Scaf11: predicted to be involved in spliceosomal complex assembly; expressed in diencephalon lateral wall ventricular layer; midbrain ventricular layer; and telencephalon ventricular layer
Daianna Gonzalez-Padilla @daianna_glez
Boxplots of DEG with Up and Down DE tx Dcun1d5: predicted to be involved in protein modification by small protein conjugation or removal, protein neddylation, and regulation of cell growth; expressed in NS Pnisr: predicted to be active in presynaptic active zone; expressed in NS
Figure 4: RNA quality surrogate variable assessment of Lieber Institute datasets. Comparing gene-level degradation effects in the full degradation experiment (all regions) vs. t-statistics from differential expression of schizophrenia cases vs. controls for five publicly available Lieber Institute datasets (rows; TODO supp table) over six different models (columns). Backgrounds shaded by value of absolute correlation. Joshua M Stolz @JoshStolz2 Hédia Tnani @TnaniHedia #qsvaR
Figure 5: Effect of correcting models on reproducibility of differential expression. The replication rate over p-value cutoffs for all available models between A. BSP1 and BSP2 DLPFC, B. CMC and BSP1, C. CMC and BSP2 DLPFC. Joshua M Stolz @JoshStolz2
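The slide does not spell out how the replication rate is computed. One assumed definition, sketched below in Python, is the fraction of genes significant in a discovery dataset at a given p-value cutoff that are also significant in a replication dataset with the same direction of effect. The function name, the default `replication_cutoff` of 0.05, and all gene statistics are hypothetical and are not taken from the figure.

```python
def replication_rate(discovery, replication, cutoff, replication_cutoff=0.05):
    """Fraction of discovery-significant genes that replicate.

    discovery / replication: dict mapping gene -> (t_statistic, p_value).
    A gene replicates when its replication p-value is below
    replication_cutoff and its t-statistic has the same sign.
    Returns None when no gene passes the discovery cutoff.
    """
    tested = [
        gene for gene, (_, p) in discovery.items()
        if p < cutoff and gene in replication
    ]
    if not tested:
        return None
    replicated = 0
    for gene in tested:
        disc_t, _ = discovery[gene]
        rep_t, rep_p = replication[gene]
        same_direction = (disc_t > 0) == (rep_t > 0)
        if same_direction and rep_p < replication_cutoff:
            replicated += 1
    return replicated / len(tested)


# Invented toy statistics for four genes in two datasets.
discovery = {"g1": (4.2, 1e-5), "g2": (-3.1, 2e-3), "g3": (2.0, 0.04), "g4": (0.5, 0.6)}
replication = {"g1": (3.0, 0.004), "g2": (2.5, 0.01), "g3": (1.9, 0.06), "g4": (0.1, 0.9)}

# Evaluate the rate over several discovery p-value cutoffs.
rates = {c: replication_rate(discovery, replication, c) for c in (1e-4, 0.01, 0.05)}
print(rates)
```

Computing the rate over a grid of cutoffs, separately for each model, would give curves comparable in spirit to those described in the caption.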
lcolladotor.github.io/#projects
● Every assay has caveats
● We re-use tricks: think adding 0, multiplying by 1
● It nearly always takes a team
● Data sharing accelerates science + democratizes access to it
● Zooming in allows us to reduce the heterogeneity
● We can learn from each other: from uniformly processing our data & re-using it → replicate / validate?
There are increasingly more data & tools
- Greater demand for data skills: wrangling, visualization, analysis
- LIBD itself generates results that are large data collections
- Greater demand across LIBD scientists to learn how to work with data
https://ceramics.org/ceramic-tech-today/supercomputer-powered-materials-database-unleashes-data-deluge
… and many more
Protected time goes both ways
- You need protected time to learn, guide, build training material
- You also need time for collaborating
- It’s important to respect both and plan accordingly
Chart labels: 20%, 80% research
jhpce.jhu.edu/knowledge-base/knowledge-base-articles-from-lieber-institute/ research.libd.org/rstatsclub/ Join us Fridays at 9 AM (check the code of conduct please!)
www.youtube.com/@lcolladotor/playlists Videos allow us to multiply ourselves We can make you custom selections of videos for a specific problem on DSgs sessions
@MadhaviTippani Madhavi Tippani @HeenaDivecha Heena R Divecha @lmwebr Lukas M Weber @stephaniehicks Stephanie C Hicks @abspangler Abby Spangler @martinowk Keri Martinowich @CerceoPage Stephanie C Page @kr_maynard Kristen R Maynard @lcolladotor Leonardo Collado-Torres @Nick-Eagles (GH) Nicholas J Eagles Kelsey D Montgomery Sang Ho Kwon Image Analysis Expression Analysis Data Generation Thomas M Hyde @lahuuki Louise A Huuki-Myers @BoyiGuo Boyi Guo @mattntran Matthew N Tran @sowmyapartybun Sowmya Parthiban Slides available at speakerdeck.com/lcolladotor + Many more LIBD, JHU, and external collaborators @mgrantpeters Melissa Grant-Peters @prashanthi-ravichandran (GH) Prashanthi Ravichandran