
UCL-ICH-2023


"Navigating human brain gene expression measurements at different resolutions to study psychiatric disorders" seminar on 2023-05-22 at UCL Great Ormond Street Institute of Child Health


Transcript

  1. @lcolladotor lcolladotor.github.io lcolladotor.github.io/bioc_team_ds Navigating human brain gene expression measurements at

    different resolutions to study psychiatric disorders Leonardo Collado Torres, Investigator UCL Great Ormond Street Institute of Child Health May 22 2023 Slides available at speakerdeck.com/lcolladotor
  2. Zoom in: base pair resolution Jeff Leek @jtleek Ph.D. advisor

    Andrew E Jaffe @andrewejaffe Ph.D. co-advisor
  3. Postmortem Human Brain Samples

    Discovery data: Fetal, Infant, Child, Teen, Adult, 50+; 6 / group, N = 36
    Replication data: Fetal, Infant, Child, Teen, Adult, 50+; 6 / group, N = 36
    Andrew E Jaffe @andrewejaffe Ph.D. co-advisor
    Developmental regulation of human cortex transcription and its clinical relevance at single base resolution doi.org/10.1038/nn.3898 github.com/leekgroup/libd_n36
  4. doi.org/10.1038/nn.3898 Developmental regulation of human cortex transcription and its clinical

    relevance at single base resolution github.com/leekgroup/libd_n36
  5. DERs outside of “known genes” Developmental regulation of human cortex

    transcription and its clinical relevance at single base resolution doi.org/10.1038/nn.3898 github.com/leekgroup/libd_n36
  6. doi.org/10.1038/nn.3898 BrainSpan data Developmental regulation of human cortex transcription and

    its clinical relevance at single base resolution github.com/leekgroup/libd_n36
  7. doi.org/10.1038/s41593-018-0197-y Andrew E Jaffe @andrewejaffe LIBD former boss Developmental and

    genetic regulation of the human cortex transcriptome illuminate schizophrenia pathogenesis LIBD BrainSEQ Phase 1 eqtl.brainseq.org/phase1/
  8. doi.org/10.1038/s41593-018-0197-y Andrew E Jaffe @andrewejaffe Developmental and genetic regulation of

    the human cortex transcriptome illuminate schizophrenia pathogenesis LIBD BrainSEQ Phase 1 eqtl.brainseq.org/phase1/
  9. doi.org/10.1016/j.neuron.2019.05.013 Regional Heterogeneity in Gene Expression, Regulation, and Coherence in

    the Frontal Cortex and Hippocampus across Development and Schizophrenia LIBD BrainSEQ Phase 2: DLPFC + HPC eqtl.brainseq.org/phase2/
  10. doi.org/10.1016/j.neuron.2019.05.013 Regional Heterogeneity in Gene Expression, Regulation, and Coherence in

    the Frontal Cortex and Hippocampus across Development and Schizophrenia LIBD BrainSEQ Phase 2: DLPFC + HPC eqtl.brainseq.org/phase2/
  11. doi.org/10.1016/j.neuron.2019.05.013 Regional Heterogeneity in Gene Expression, Regulation, and Coherence in

    the Frontal Cortex and Hippocampus across Development and Schizophrenia LIBD BrainSEQ Phase 2: DLPFC + HPC eqtl.brainseq.org/phase2/
  12. doi.org/10.1016/j.neuron.2019.05.013 Regional Heterogeneity in Gene Expression, Regulation, and Coherence

    in the Frontal Cortex and Hippocampus across Development and Schizophrenia LIBD BrainSEQ Phase 2: DLPFC + HPC eqtl.brainseq.org/phase2/ doi.org/10.1073/pnas.1617384114 qSVA framework for RNA quality correction in differential expression analysis Amy Peterson @amptrsn
  13. doi.org/10.1016/j.neuron.2019.05.013 LIBD BrainSEQ Phase 2: DLPFC + HPC eqtl.brainseq.org/phase2/

    48 Supplementary Figures 😅 data.mendeley.com/datasets/3j93ybf4md/1

           DLPFC_donor_i  HPC_donor_i
    g_1                5           10
    g_2                6           12
    …                  …            …
    g_k               10           20
  14. Zoom in: more data! Ben Langmead @BenLangmead Abhinav Nellore @nellore

    (GitHub) Christopher Wilks @chrisnwilks Shannon Ellis @Shannon_E_Ellis Kasper Daniel Hansen @KasperDHansen Andrew E Jaffe @andrewejaffe Ph.D. co-advisor + LIBD former boss Jeff Leek @jtleek Ph.D. advisor
  15. expression data for ~70,000 human samples

    GTEx N=9,962; TCGA N=11,284; SRA N=49,848
    samples / expression estimates: gene, exon, junctions, ERs
    samples / phenotypes: ?
    Answer meaningful questions about human biology and expression
    slide adapted from Shannon Ellis
    Reproducible RNA-seq analysis using #recount2 + Improving the value of public RNA-seq expression data by phenotype prediction doi.org/10.1038/nbt.3838 doi.org/10.1093/nar/gky102
  16. SRA phenotype information is far from complete

          SubjectID  Sex     Tissue  Race  Age
    6620  NA         female  liver   NA    NA
    6621  NA         female  liver   NA    NA
    6622  NA         female  liver   NA    NA
    6623  NA         female  liver   NA    NA
    6624  NA         female  liver   NA    NA
    6625  NA         male    liver   NA    NA
    6626  NA         male    liver   NA    NA
    6627  NA         male    liver   NA    NA
    6628  NA         male    liver   NA    NA
    6629  NA         male    liver   NA    NA
    6630  NA         male    liver   NA    NA
    6631  NA         NA      blood   NA    NA
    6632  NA         NA      blood   NA    NA
    6633  NA         NA      blood   NA    NA
    6634  NA         NA      blood   NA    NA
    6635  NA         NA      blood   NA    NA
    6636  NA         NA      blood   NA    NA
    …

    slide adapted from Shannon Ellis @Shannon_E_Ellis
  17. Even when information is provided, it’s not always clear…

    sra_meta$Sex

    Category  Frequency
    F                95
    female         2036
    Female           51
    M                77
    male           1240
    Male            141
    Total          3640

    “1 Male, 2 Female”, “2 Male, 1 Female”, “3 Female”, “DK”, “male and female”, “Male (note: ….)”, “missing”, “mixed”, “mixture”, “N/A”, “Not available”, “not applicable”, “not collected”, “not determined”, “pooled male and female”, “U”, “unknown”, “Unknown”

    slide adapted from Shannon Ellis @Shannon_E_Ellis
    Improving the value of public RNA-seq expression data by phenotype prediction doi.org/10.1093/nar/gky102
  18. install.packages("BiocManager") BiocManager::install("recount") recount::download_study() load() …

    #recountWorkflow: Accessing over 70,000 human RNA-seq samples with Bioconductor doi.org/10.12688/f1000research.12223.1
  19. related projects

    • Bioconductor recountWorkflow: doi.org/10.12688/f1000research.12223.1
    • Shannon Ellis & Leek: phenotype prediction doi.org/10.1093/nar/gky102
    • Jack Fu & Taub: transcript estimations doi.org/10.1101/247346
    • Madugundu & Pandey (JHU): proteomics doi.org/10.1002/pmic.201800315
    • Luidi-Imada & Marchionni (JHU): cancer and FANTOM doi.org/10.1101/659490
    • Kuri-Magaña & Martínez-Barnetche (INSP Mexico): immune expression doi.org/10.3389/fimmu.2018.02679
    • Ryten (UCL): Mina Ryten @MinaRyten
      ◦ Guelfi: validating expressed regions (ERs) eQTLs doi.org/10.1038/s41467-020-14483-x
      ◦ Zhang: improving the detection of ERs doi.org/10.1126/sciadv.aay8299
    • ??? 🤔
  20. recount3: over 700,000 human and mouse RNA-seq samples #recount3: summaries

    and queries for large-scale RNA-seq expression and splicing Christopher Wilks @chrisnwilks research.libd.org/recount3-docs/ doi.org/10.1186/s13059-021-02533-6
  21. doi.org/10.1093/nar/gkac1056 doi.org/10.1101/2023.03.29.534370 #IntroVerse: a comprehensive database of introns across human

    tissues + Splicing accuracy varies across human introns, tissues and age Sonia García-Ruiz @sonigruiz
  22. Zoom in: snRNA-seq → deconvolution of bulk RNA-seq Matthew N

    Tran @mattntran Kristen R Maynard @kr_maynard Louise A Huuki-Myers @lahuuki Keri Martinowich @martinowk Stephanie C Hicks @stephaniehicks
  23. 10x snRNA-seq Reference Data

    Tran, Maynard et al., Neuron, 2021

             AMY  DLPFC   HPC    NAc  sACC
    Astro   1638    782  1170   1099   907
    Endo      31      0     0      0     0
    Macro      0     10     0     22     0
    Micro   1168    388  1126    492   784
    Mural     39     18    43      0     0
    Oligo   6080   5455  5912   6134  4584
    OPC     1459    572   838    669   911
    Tcell     31      9    26      0     0
    Excit    443   2388   623      0  4163
    Inhib   3117   1580   366  11476  3974

    Matthew N Tran @mattntran
  24. 1vAll Markers vs. Mean Ratio Markers 39 Louise A Huuki-Myers

    @lahuuki research.libd.org/DeconvoBuddies/
  25. Peric = Mural + Endo Mean Proportions By Region: Tran

    et al, Neuron, 2021 (8 donors, 10 cell types) Louise A Huuki-Myers @lahuuki
  26. Motivation

    • Improve Deconvolution algorithms by considering differences in size and RNA content between cell types
    • Use smFISH with RNAscope to establish data set of:
      ◦ Cellular composition
      ◦ Nuclei sizes of major cell types
      ◦ Average nuclei RNA content of major cell types

    How do we measure total RNA content of a cell if we can only observe a few genes at a time? Use a TREG

    Data-driven Identification of Total RNA Expression Genes (TREGs) for Estimation of RNA Abundance in Heterogeneous Cell Types research.libd.org/TREG/ doi.org/10.1101/2022.04.28.489923 Louise A Huuki-Myers @lahuuki
  27. What is a TREG?

    • Total RNA Expression Gene
    • Expression is proportional to the overall RNA expression in a nucleus
    • In smFISH the count of TREG puncta in a nucleus can estimate the RNA content

    Data-driven Identification of Total RNA Expression Genes (TREGs) for Estimation of RNA Abundance in Heterogeneous Cell Types research.libd.org/TREG/ doi.org/10.1101/2022.04.28.489923
  28. #deconvochallenge Challenges and opportunities to computationally deconvolve heterogeneous tissue with

    varying cell sizes using single cell RNA-sequencing datasets doi.org/10.48550/arXiv.2305.06501 Sean Maden @MadenSean
  29. Zoom in: spatial omics Kristen R Maynard @kr_maynard Keri Martinowich

    @martinowk Stephanie C Hicks @stephaniehicks Andrew E Jaffe @andrewejaffe Stephanie C Page @CerceoPage
  30. DOI: 10.1038/s41593-020-00787-0 twitter.com/lcolladotor/status/1233661576433061888 from 2020-02-29 DOI 10.1093/bioinformatics/btac299 Since Feb 2020

    spatialLIBD::fetch_data() provides access to SpatialExperiment R/Bioconductor objects Stephanie C Hicks @stephaniehicks Lukas M Weber @lmweber
  31. Visium spatial clustering works for variables with high % variance

    explained. But what about other ones? DOI: 10.1038/s41593-020-00787-0
  32. Hypothesis Local tissue microenvironments in close proximity to AD-related neuropathology

    have distinct cellular and molecular signatures. Sang Ho Kwon
  33. AD pathology signal is too small to detect by spatially-resolved

    gene expression alone research.libd.org/Visium_SPG_AD/
  34. bioconductor.org/packages/spatialLIBD Pardo et al, 2022 DOI 10.1186/s12864-022-08601-w Maynard, Collado-Torres, 2021

    DOI 10.1038/s41593-020-00787-0 Brenda Pardo Abby Spangler @PardoBree @abspangler Louise A. Huuki-Myers @lahuuki
  35. Boxplots of non-DE genes with DE tx Ankrd11: acts in

    head morphogenesis; expressed in cerebral cortex Trpc4: acts upstream of or within gamma-aminobutyric acid secretion and oligodendrocyte differentiation; expressed in brain Scaf11: predicted to be involved in spliceosomal complex assembly; expressed in diencephalon lateral wall ventricular layer; midbrain ventricular layer; and telencephalon ventricular layer Daianna Gonzalez-Padilla @daianna_glez
  36. Boxplots of DEG with Up and Down DE tx Dcun1d5:

    predicted to be involved in protein modification by small protein conjugation or removal, protein neddylation, and regulation of cell growth; expressed in NS Pnisr: predicted to be active in presynaptic active zone; expressed in NS
  37. Transcript reconstruction from read mappings to the genome

    [Figure: RNA sequencing (paired reads); spliced alignment (HISAT2, STAR) of isoform1 and isoform2 reads to the genome sequence, with GT/AG splice sites between exon1, exon2 and exon3; transcript assembly (Cufflinks, StringTie) into isoform1 and isoform2]
    exons & introns do not have to be defined in the reference annotation
    captures potentially "novel" isoforms
  38. Transcriptional noise makes transcript reconstruction difficult - inflation of "novel" transfrags

    [Figure tracks: read alignments; assembled transfrags (transcript fragments); observed junctions; read coverage]
  39. Figure 4: RNA quality surrogate variable assessment of Lieber Institute

    Datasets. Comparing gene-level degradation effects in the full degradation experiment (all regions) vs. t-statistic from Differential expression of case vs. schizophrenia for five Lieber Institute publicly available datasets (rows TODO supp table) over six different models (columns). Backgrounds shaded by value of absolute correlation. Joshua M Stolz @JoshStolz2 Hédia Tnani @TnaniHedia #qsvaR
  40. Figure 5: Effect of correcting models on reproducibility of differential

    expression. The replication rate between over p-value cutoffs for all available models for A. BSP1 and BSP2 DLPFC B. CMC and BSP1 C. CMC and BSP2 DLPFC Joshua M Stolz @JoshStolz2
  41. lcolladotor.github.io/#projects

    • Every assay has caveats
    • We re-use tricks: think adding 0, multiplying by 1
    • It nearly always takes a team
    • Data sharing accelerates science + democratizes access to it
    • Zooming in allows us to reduce the heterogeneity
    • We can learn from each other: from uniformly processing our data & re-using it → replicate / validate?
  42. There is increasingly more data & tools

    - Greater demand for data skills: wrangling, visualization, analysis
    - LIBD itself generates results that are large data collections
    - Greater demand across LIBD scientists to learn how to work with data

    https://ceramics.org/ceramic-tech-today/supercomputer-powered-materials-database-unleashes-data-deluge … and many more
  43. Protected time goes both ways

    - You need protected time to learn, guide, build training material
    - You need time also for collaborating
    - It’s important to respect both and plan accordingly

    2 20% 80% research
  44. www.youtube.com/@lcolladotor/playlists

    Videos allow us to multiply ourselves.
    We can make you custom selections of videos for a specific problem on DSgs sessions.
  45. @MadhaviTippani Madhavi Tippani @HeenaDivecha Heena R Divecha @lmwebr Lukas M

    Weber @stephaniehicks Stephanie C Hicks @abspangler Abby Spangler @martinowk Keri Martinowich @CerceoPage Stephanie C Page @kr_maynard Kristen R Maynard @lcolladotor Leonardo Collado-Torres @Nick-Eagles (GH) Nicholas J Eagles Kelsey D Montgomery Sang Ho Kwon Image Analysis Expression Analysis Data Generation Thomas M Hyde @lahuuki Louise A Huuki-Myers @BoyiGuo Boyi Guo @mattntran Matthew N Tran @sowmyapartybun Sowmya Parthiban Slides available at speakerdeck.com/lcolladotor + Many more LIBD, JHU, and external collaborators @mgrantpeters Melissa Grant-Peters @prashanthi-ravichandran (GH) Prashanthi Ravichandran
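
Note on slide 17: the frequency table shows one variable (sex) recorded under six spellings. A minimal base R sketch of how such labels could be harmonized is below. The `lookup` table and the `harmonize_sex()` helper are hypothetical names introduced for illustration; they are not taken from the talk or from the recount packages, and the counts are the ones shown on the slide.

```r
# Counts of the unambiguous raw labels, as shown on slide 17
raw_counts <- c(F = 95, female = 2036, Female = 51, M = 77, male = 1240, Male = 141)

# Hypothetical lookup table: lower-cased label -> harmonized value
lookup <- c(f = "female", female = "female", m = "male", male = "male")

harmonize_sex <- function(x) {
    # Labels missing from the lookup (e.g. "mixed", "DK", "N/A") become NA
    unname(lookup[tolower(trimws(x))])
}

# Collapse the six spellings into two categories and add up their counts
harmonized <- harmonize_sex(names(raw_counts))
totals <- tapply(raw_counts, harmonized, sum)
print(totals)
```

Ambiguous entries such as “pooled male and female” are deliberately left as NA in this sketch rather than guessed.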
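
Note on slide 24: the slide contrasts 1vAll markers with Mean Ratio markers but does not define the statistic. The base R sketch below assumes that the mean ratio of a gene for a target cell type is its mean expression in the target divided by its highest mean expression among the non-target cell types. The toy matrix, the cell type labels, and the `mean_ratio()` helper are hypothetical; this is not the DeconvoBuddies API.

```r
# Toy expression matrix: genes x cells (hypothetical values)
expr <- matrix(
    c(
        5, 6, 1, 1, 0, 1, # geneA: high in Oligo only
        4, 4, 3, 3, 3, 3, # geneB: only slightly higher in Oligo
        0, 1, 4, 5, 1, 0 # geneC: high in Astro
    ),
    nrow = 3, byrow = TRUE,
    dimnames = list(c("geneA", "geneB", "geneC"), paste0("cell", 1:6))
)
cell_type <- c("Oligo", "Oligo", "Astro", "Astro", "Micro", "Micro")

mean_ratio <- function(expr, cell_type, target) {
    # Mean expression of every gene within each cell type (genes x cell types)
    type_means <- sapply(split(seq_along(cell_type), cell_type), function(idx) {
        rowMeans(expr[, idx, drop = FALSE])
    })
    others <- setdiff(colnames(type_means), target)
    # Highest mean among the non-target cell types, per gene
    next_highest <- apply(type_means[, others, drop = FALSE], 1, max)
    type_means[, target] / next_highest
}

ratios <- mean_ratio(expr, cell_type, target = "Oligo")
print(sort(ratios, decreasing = TRUE))
```

Under this assumed definition, a gene ranks highly only when it is well separated from every other cell type, not merely from the pooled rest.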
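
Note on slide 26: the motivation is that cell types differ in size and RNA content. The arithmetic sketch below shows one way to read that point, assuming a bulk-based estimate tracks each cell type's share of total RNA rather than its share of cells. All numbers, the two cell type labels, and the `adjust_for_size()` helper are hypothetical and are not the method from the cited preprints.

```r
# Hypothetical values: true cell proportions and relative RNA content per cell
cell_prop <- c(Neuron = 0.3, Glia = 0.7)
rna_per_cell <- c(Neuron = 10, Glia = 2)

# Share of total RNA contributed by each cell type
rna_fraction <- cell_prop * rna_per_cell / sum(cell_prop * rna_per_cell)

# Rescale RNA fractions back to cell proportions using the RNA content estimates
adjust_for_size <- function(rna_fraction, rna_per_cell) {
    scaled <- rna_fraction / rna_per_cell[names(rna_fraction)]
    scaled / sum(scaled)
}

recovered <- adjust_for_size(rna_fraction, rna_per_cell)
print(round(rbind(cell_prop, rna_fraction, recovered), 3))
```

In this toy case the cell type with more RNA per cell accounts for a larger share of RNA than of cells, and dividing by the RNA content estimate recovers the original proportions.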