Slide 1

Slide 1 text

@lcolladotor lcolladotor.github.io lcolladotor.github.io/bioc_team_ds Navigating human brain gene expression measurements at different resolutions to study psychiatric disorders Leonardo Collado Torres, Investigator UCL Great Ormond Street Institute of Child Health May 22 2023 Slides available at speakerdeck.com/lcolladotor

Slide 2

Slide 2 text

doi.org/10.1016/j.biopsych.2020.06.005 Michael Gandal @mikejg84 Transcriptomic Insight Into the Polygenic Mechanisms Underlying Psychiatric Disorders

Slide 3

Slide 3 text

https://twitter.com/lcolladotor/status/1506998412809412612 Keri Martinowich @martinowk

Slide 4

Slide 4 text

Zoom in: base pair resolution Jeff Leek @jtleek Ph.D. advisor Andrew E Jaffe @andrewejaffe Ph.D. co-advisor

Slide 5

Slide 5 text

Slide adapted from Ben Langmead

Slide 6

Slide 6 text

#recountWorkflow: Accessing over 70,000 human RNA-seq samples with Bioconductor doi.org/10.12688/f1000research.12223.1

Slide 7

Slide 7 text

#recountWorkflow recount workflow: Accessing over 70,000 human RNA-seq samples with Bioconductor doi.org/10.12688/f1000research.12223.1

Slide 8

Slide 8 text

Flexible expressed region analysis for RNA-seq with #derfinder doi.org/10.1093/nar/gkw852 Jeff Leek @jtleek Ph.D. advisor

Slide 9

Slide 9 text

Postmortem Human Brain Samples
Discovery data: Fetal, Infant, Child, Teen, Adult, 50+ (6 / group, N = 36)
Replication data: Fetal, Infant, Child, Teen, Adult, 50+ (6 / group, N = 36)
Andrew E Jaffe @andrewejaffe Ph.D. co-advisor
Developmental regulation of human cortex transcription and its clinical relevance at single base resolution doi.org/10.1038/nn.3898 github.com/leekgroup/libd_n36

Slide 10

Slide 10 text

doi.org/10.1038/nn.3898 Developmental regulation of human cortex transcription and its clinical relevance at single base resolution github.com/leekgroup/libd_n36

Slide 11

Slide 11 text

DERs outside of “known genes” Developmental regulation of human cortex transcription and its clinical relevance at single base resolution doi.org/10.1038/nn.3898 github.com/leekgroup/libd_n36

Slide 12

Slide 12 text

doi.org/10.1038/nn.3898 BrainSpan data Developmental regulation of human cortex transcription and its clinical relevance at single base resolution github.com/leekgroup/libd_n36

Slide 13

Slide 13 text

doi.org/10.1038/s41593-018-0197-y Andrew E Jaffe @andrewejaffe LIBD former boss Developmental and genetic regulation of the human cortex transcriptome illuminate schizophrenia pathogenesis LIBD BrainSEQ Phase 1 eqtl.brainseq.org/phase1/

Slide 14

Slide 14 text

doi.org/10.1038/s41593-018-0197-y Andrew E Jaffe @andrewejaffe Developmental and genetic regulation of the human cortex transcriptome illuminate schizophrenia pathogenesis LIBD BrainSEQ Phase 1 eqtl.brainseq.org/phase1/

Slide 15

Slide 15 text

doi.org/10.1016/j.neuron.2019.05.013 Regional Heterogeneity in Gene Expression, Regulation, and Coherence in the Frontal Cortex and Hippocampus across Development and Schizophrenia LIBD BrainSEQ Phase 2: DLPFC + HPC eqtl.brainseq.org/phase2/

Slide 16

Slide 16 text

doi.org/10.1016/j.neuron.2019.05.013 Regional Heterogeneity in Gene Expression, Regulation, and Coherence in the Frontal Cortex and Hippocampus across Development and Schizophrenia LIBD BrainSEQ Phase 2: DLPFC + HPC eqtl.brainseq.org/phase2/

Slide 17

Slide 17 text

doi.org/10.1016/j.neuron.2019.05.013 Regional Heterogeneity in Gene Expression, Regulation, and Coherence in the Frontal Cortex and Hippocampus across Development and Schizophrenia LIBD BrainSEQ Phase 2: DLPFC + HPC eqtl.brainseq.org/phase2/

Slide 18

Slide 18 text

doi.org/10.1016/j.neuron.2019.05.013 Regional Heterogeneity in Gene Expression, Regulation, and Coherence in the Frontal Cortex and Hippocampus across Development and Schizophrenia LIBD BrainSEQ Phase 2: DLPFC + HPC eqtl.brainseq.org/phase2/ doi.org/10.1073/pnas.1617384114 qSVA framework for RNA quality correction in differential expression analysis Amy Peterson @amptrsn

Slide 19

Slide 19 text

doi.org/10.1016/j.neuron.2019.05.013 LIBD BrainSEQ Phase 2: DLPFC + HPC eqtl.brainseq.org/phase2/ 48 Supplementary Figures 😅 data.mendeley.com/datasets/3j93ybf4md/1

      DLPFC_donor i   HPC_donor i
g 1   5               10
g 2   6               12
…     …               …
g k   10              20

Slide 20

Slide 20 text

Zoom in: more data! Ben Langmead @BenLangmead Abhinav Nellore @nellore (GitHub) Christopher Wilks @chrisnwilks Shannon Ellis @Shannon_E_Ellis Kasper Daniel Hansen @KasperDHansen Andrew E Jaffe @andrewejaffe Ph.D. co-advisor + LIBD former boss Jeff Leek @jtleek Ph.D. advisor

Slide 21

Slide 21 text

doi.org/10.1038/nrg.2017.113 #RailRNA Cloud computing for genomic data analysis and collaboration Ben Langmead @BenLangmead Abhinav Nellore @nellore (GitHub)

Slide 22

Slide 22 text

doi.org/10.1038/543007a

Slide 23

Slide 23 text

https://jhubiostatistics.shinyapps.io/recount/ doi.org/10.1038/nbt.3838

Slide 24

Slide 24 text

Expression data for ~70,000 human samples
Samples: GTEx N=9,962, TCGA N=11,284, SRA N=49,848
Expression estimates: gene, exon, junctions, ERs
Phenotypes: ?
Answer meaningful questions about human biology and expression
Slide adapted from Shannon Ellis
Reproducible RNA-seq analysis using #recount2 + Improving the value of public RNA-seq expression data by phenotype prediction doi.org/10.1038/nbt.3838 doi.org/10.1093/nar/gky102

Slide 25

Slide 25 text

SRA phenotype information is far from complete

      SubjectID  Sex     Tissue  Race  Age
6620  NA         female  liver   NA    NA
6621  NA         female  liver   NA    NA
6622  NA         female  liver   NA    NA
6623  NA         female  liver   NA    NA
6624  NA         female  liver   NA    NA
6625  NA         male    liver   NA    NA
6626  NA         male    liver   NA    NA
6627  NA         male    liver   NA    NA
6628  NA         male    liver   NA    NA
6629  NA         male    liver   NA    NA
6630  NA         male    liver   NA    NA
6631  NA         NA      blood   NA    NA
6632  NA         NA      blood   NA    NA
6633  NA         NA      blood   NA    NA
6634  NA         NA      blood   NA    NA
6635  NA         NA      blood   NA    NA
6636  NA         NA      blood   NA    NA

Slide adapted from Shannon Ellis
Shannon Ellis @Shannon_E_Ellis

Slide 26

Slide 26 text

Even when information is provided, it’s not always clear…

sra_meta$Sex
Category  Frequency
F         95
female    2036
Female    51
M         77
male      1240
Male      141
Total     3640

“1 Male, 2 Female”, “2 Male, 1 Female”, “3 Female”, “DK”, “male and female”, “Male (note: ….)”, “missing”, “mixed”, “mixture”, “N/A”, “Not available”, “not applicable”, “not collected”, “not determined”, “pooled male and female”, “U”, “unknown”, “Unknown”

Slide adapted from Shannon Ellis
Shannon Ellis @Shannon_E_Ellis
Improving the value of public RNA-seq expression data by phenotype prediction doi.org/10.1093/nar/gky102
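The inconsistent labels above can be collapsed with a few lines of base R. This is a minimal sketch, not the approach used in the phenotype prediction paper: the helper name `harmonize_sex()` and the decision to map every ambiguous entry to NA are assumptions made for illustration. The frequencies are copied from the slide.

```r
# Observed label frequencies, copied from the slide.
sex_counts <- c(F = 95, female = 2036, Female = 51, M = 77, male = 1240, Male = 141)

# Hypothetical helper: collapse spelling variants into "female" or "male".
# Anything else (pooled samples, "unknown", "N/A", ...) becomes NA.
harmonize_sex <- function(x) {
    x <- tolower(trimws(x))
    out <- rep(NA_character_, length(x))
    out[x %in% c("f", "female")] <- "female"
    out[x %in% c("m", "male")] <- "male"
    out
}

# Sum the frequencies of all variants that map to the same harmonized label.
harmonized <- harmonize_sex(names(sex_counts))
collapsed <- tapply(sex_counts, harmonized, sum)

# Ambiguous free-text entries are deliberately left as NA.
ambiguous <- harmonize_sex(c("pooled male and female", "unknown", "N/A"))
```

Keeping ambiguous entries as NA, rather than guessing, leaves them available for a later prediction step such as the one described in the paper cited on the slide.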

Slide 27

Slide 27 text

install.packages("BiocManager")
BiocManager::install("recount")
recount::download_study()
load()
…
#recountWorkflow: Accessing over 70,000 human RNA-seq samples with Bioconductor doi.org/10.12688/f1000research.12223.1
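A slightly fuller version of these steps might look like the sketch below. The wrapper name `fetch_gene_rse()` is hypothetical, and the study accession `SRP009615` is an assumption chosen for illustration (any recount2 project ID could be used). The function is only defined here, not called, because calling it downloads data and requires the recount package to be installed.

```r
# Hypothetical wrapper around the steps on the slide.
# Calling it needs the recount package and an internet connection.
fetch_gene_rse <- function(project = "SRP009615", outdir = project) {
    # Download the gene-level RangedSummarizedExperiment file for one study.
    recount::download_study(project, type = "rse-gene", outdir = outdir)

    # The downloaded file defines an object named `rse_gene`;
    # load it into a private environment to avoid clobbering the workspace.
    env <- new.env()
    load(file.path(outdir, "rse_gene.Rdata"), envir = env)

    # recount2 stores coverage-based counts; scale them before analysis.
    recount::scale_counts(env$rse_gene)
}
```

Usage would be along the lines of `rse <- fetch_gene_rse()`, after which `rse` can be passed to the usual Bioconductor differential expression tools.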

Slide 28

Slide 28 text

http://research.libd.org/recount-brain/ doi.org/10.1101/618025 recount-brain: a curated repository of human brain RNA-seq datasets metadata Ashkaun Razmara @ashkaun_razmara

Slide 29

Slide 29 text

http://research.libd.org/recount-brain/ recount-brain: a curated repository of human brain RNA-seq datasets metadata doi.org/10.1101/618025 RNASE2

Slide 30

Slide 30 text

related projects
• Bioconductor recountWorkflow: doi.org/10.12688/f1000research.12223.1
• Shannon Ellis & Leek: phenotype prediction doi.org/10.1093/nar/gky102
• Jack Fu & Taub: transcript estimations doi.org/10.1101/247346
• Madugundu & Pandey (JHU): proteomics doi.org/10.1002/pmic.201800315
• Luidi-Imada & Marchionni (JHU): cancer and FANTOM doi.org/10.1101/659490
• Kuri-Magaña & Martínez-Barnetche (INSP Mexico): immune expression doi.org/10.3389/fimmu.2018.02679
• Ryten (UCL):
  Guelfi: validating expressed regions (ERs) eQTLs doi.org/10.1038/s41467-020-14483-x
  Zhang: improving the detection of ERs doi.org/10.1126/sciadv.aay8299
Mina Ryten @MinaRyten
??? 🤔

Slide 31

Slide 31 text

recount3: over 700,000 human and mouse RNA-seq samples #recount3: summaries and queries for large-scale RNA-seq expression and splicing Christopher Wilks @chrisnwilks research.libd.org/recount3-docs/ doi.org/10.1186/s13059-021-02533-6

Slide 32

Slide 32 text

#recount3: summaries and queries for large-scale RNA-seq expression and splicing rna.recount.bio doi.org/10.1186/s13059-021-02533-6

Slide 33

Slide 33 text

bioconductor.org/packages/recount3 #recount3: summaries and queries for large-scale RNA-seq expression and splicing

Slide 34

Slide 34 text

bioconductor.org/packages/megadepth doi.org/10.1093/bioinformatics/btab152 #Megadepth: efficient coverage quantification for BigWigs and BAMs David Zhang @dyzhang32 Christopher Wilks @chrisnwilks

Slide 35

Slide 35 text

doi.org/10.1093/nar/gkac1056 doi.org/10.1101/2023.03.29.534370 #IntroVerse: a comprehensive database of introns across human tissues + Splicing accuracy varies across human introns, tissues and age Sonia García-Ruiz @sonigruiz

Slide 36

Slide 36 text

Zoom in: snRNA-seq → deconvolution of bulk RNA-seq Matthew N Tran @mattntran Kristen R Maynard @kr_maynard Louise A Huuki-Myers @lahuuki Keri Martinowich @martinowk Stephanie C Hicks @stephaniehicks

Slide 37

Slide 37 text

10x snRNA-seq Reference Data
Tran, Maynard et al., Neuron, 2021

        AMY    DLPFC  HPC    NAc    sACC
Astro   1638   782    1170   1099   907
Endo    31     0      0      0      0
Macro   0      10     0      22     0
Micro   1168   388    1126   492    784
Mural   39     18     43     0      0
Oligo   6080   5455   5912   6134   4584
OPC     1459   572    838    669    911
Tcell   31     9      26     0      0
Excit   443    2388   623    0      4163
Inhib   3117   1580   366    11476  3974

Matthew N Tran @mattntran
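As a rough illustration of how these nuclei counts become the per-region cell type proportions that deconvolution methods are compared against, here is a base R sketch. It assumes the slide's table has cell types as rows and the five brain regions as columns; the object names are made up for this example.

```r
# Nuclei counts per cell type (rows) and brain region (columns),
# transcribed from the slide under the layout assumption stated above.
regions <- c("AMY", "DLPFC", "HPC", "NAc", "sACC")
nuclei <- rbind(
    Astro = c(1638, 782, 1170, 1099, 907),
    Endo  = c(31, 0, 0, 0, 0),
    Macro = c(0, 10, 0, 22, 0),
    Micro = c(1168, 388, 1126, 492, 784),
    Mural = c(39, 18, 43, 0, 0),
    Oligo = c(6080, 5455, 5912, 6134, 4584),
    OPC   = c(1459, 572, 838, 669, 911),
    Tcell = c(31, 9, 26, 0, 0),
    Excit = c(443, 2388, 623, 0, 4163),
    Inhib = c(3117, 1580, 366, 11476, 3974)
)
colnames(nuclei) <- regions

# Fraction of nuclei per cell type within each region (each column sums to 1).
props <- prop.table(nuclei, margin = 2)
round(props, 3)
```

These are proportions of captured nuclei, which need not match the true tissue composition; that gap is part of what motivates the cell size and RNA content work later in the deck.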

Slide 38

Slide 38 text

Sean Maden @MadenSean Sang Ho Kwon @sanghokwon17 #deconvochallenge

Slide 39

Slide 39 text

1vAll Markers vs. Mean Ratio Markers Louise A Huuki-Myers @lahuuki research.libd.org/DeconvoBuddies/

Slide 40

Slide 40 text

Peric = Mural + Endo Mean Proportions By Region: Tran et al, Neuron, 2021 (8 donors, 10 cell types) Louise A Huuki-Myers @lahuuki

Slide 41

Slide 41 text

Motivation
● Improve deconvolution algorithms by considering differences in size and RNA content between cell types
● Use smFISH with RNAscope to establish a data set of:
  ○ Cellular composition
  ○ Nuclei sizes of major cell types
  ○ Average nuclei RNA content of major cell types
How do we measure total RNA content of a cell if we can only observe a few genes at a time? Use a TREG
Data-driven Identification of Total RNA Expression Genes (TREGs) for Estimation of RNA Abundance in Heterogeneous Cell Types research.libd.org/TREG/ doi.org/10.1101/2022.04.28.489923
Louise A Huuki-Myers @lahuuki

Slide 42

Slide 42 text

What is a TREG?
● Total RNA Expression Gene
● Expression is proportional to the overall RNA expression in a nucleus
● In smFISH the count of TREG puncta in a nucleus can estimate the RNA content
Data-driven Identification of Total RNA Expression Genes (TREGs) for Estimation of RNA Abundance in Heterogeneous Cell Types research.libd.org/TREG/ doi.org/10.1101/2022.04.28.489923
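The proportionality property can be illustrated with a toy simulation in base R. This is not the selection procedure from the TREG paper or package; it is only a sketch of why a gene whose expression tracks total RNA is useful while a cell type marker is not. All numbers (RNA totals, the 1% fraction, the marker rates) are made-up assumptions.

```r
set.seed(20230522)
n_nuclei <- 200

# Simulated total RNA content per nucleus: four populations of 50 nuclei
# with increasing RNA content (arbitrary units, assumed for illustration).
total_rna <- rep(c(2000, 5000, 10000, 20000), each = n_nuclei / 4)

# Candidate TREG: expected counts are a fixed fraction (1%) of total RNA.
treg_counts <- rpois(n_nuclei, lambda = 0.01 * total_rna)

# Cell type marker: high in one population only, regardless of total RNA.
marker_counts <- rpois(n_nuclei, lambda = ifelse(total_rna == 2000, 50, 1))

# Rank correlation of each gene with total RNA content across nuclei.
cor_treg <- cor(treg_counts, total_rna, method = "spearman")
cor_marker <- cor(marker_counts, total_rna, method = "spearman")

# With a TREG, puncta counts can be rescaled into an RNA content estimate.
estimated_total <- treg_counts / 0.01
```

In this simulation the TREG-like gene ranks nuclei by RNA content almost perfectly, whereas the marker mostly reports which population a nucleus belongs to.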

Slide 43

Slide 43 text

#deconvochallenge Challenges and opportunities to computationally deconvolve heterogeneous tissue with varying cell sizes using single cell RNA-sequencing datasets doi.org/10.48550/arXiv.2305.06501 Sean Maden @MadenSean

Slide 44

Slide 44 text

Zoom in: spatial omics Kristen R Maynard @kr_maynard Keri Martinowich @martinowk Stephanie C Hicks @stephaniehicks Andrew E Jaffe @andrewejaffe Stephanie C Page @CerceoPage

Slide 45

Slide 45 text

DOI: 10.1038/s41593-020-00787-0 twitter.com/lcolladotor/status/1233661576433061888 from 2020-02-29 Andrew E Jaffe @andrewejaffe Kristen R Maynard @kr_maynard Keri Martinowich @martinowk

Slide 46

Slide 46 text

DOI: 10.1038/s41593-020-00787-0 twitter.com/lcolladotor/status/1233661576433061888 from 2020-02-29 DOI 10.1093/bioinformatics/btac299 Since Feb 2020 spatialLIBD::fetch_data() provides access to SpatialExperiment R/Bioconductor objects Stephanie C Hicks @stephaniehicks Lukas M Weber @lmweber

Slide 47

Slide 47 text

DOI: 10.1038/s41593-020-00787-0 twitter.com/lcolladotor/status/1233661576433061888 from 2020-02-29 twitter.com/CrowellHL/status/1597579271945715717 DOI 10.1093/bioinformatics/btac299 Since Feb 2020 spatialLIBD::fetch_data() provides access to SpatialExperiment R/Bioconductor objects

Slide 48

Slide 48 text

#spatialDLPFC doi.org/10.1101/2023.02.15.528722 Louise A Huuki-Myers @lahuuki Abby Spangler @abspangler Nicholas J Eagles @Nick-Eagles (GitHub)

Slide 49

Slide 49 text

Visium spatial clustering works for variables with high % variance explained. But what about other ones? DOI: 10.1038/s41593-020-00787-0

Slide 50

Slide 50 text

twitter.com/sanghokwon17/status/1650589385379962881 from 2023-04-24 Sang Ho Kwon @sanghokwon17 DOI: 10.1101/2023.04.20.537710 #Visium_SPG_AD

Slide 51

Slide 51 text

Hypothesis Local tissue microenvironments in close proximity to AD-related neuropathology have distinct cellular and molecular signatures. Sang Ho Kwon

Slide 52

Slide 52 text

Visium Spatial Proteogenomics (Visium-SPG) Visium-SPG = Visium SRT + immunofluorescence (using identical tissue samples) Sang Ho Kwon

Slide 53

Slide 53 text

Experimental design & study overview Braak V-VI & CERAD frequent Sang Ho Kwon

Slide 54

Slide 54 text

AD pathology signal is too small to detect by spatially-resolved gene expression alone research.libd.org/Visium_SPG_AD/

Slide 55

Slide 55 text

Estimating pathological burden per spot to generate transcriptome-scale maps of AD pathology

Slide 56

Slide 56 text

Identifying transcriptional signatures of AD-related neuropathology Sang Ho Kwon

Slide 57

Slide 57 text

bioconductor.org/packages/spatialLIBD Pardo et al, 2022 DOI 10.1186/s12864-022-08601-w Maynard, Collado-Torres, 2021 DOI 10.1038/s41593-020-00787-0 Brenda Pardo @PardoBree Abby Spangler @abspangler Louise A. Huuki-Myers @lahuuki

Slide 58

Slide 58 text

Zoom in: transcripts? work in progress

Slide 59

Slide 59 text

Boxplots of non-DE genes with DE tx Ankrd11: acts in head morphogenesis; expressed in cerebral cortex Trpc4: acts upstream of or within gamma-aminobutyric acid secretion and oligodendrocyte differentiation; expressed in brain Scaf11: predicted to be involved in spliceosomal complex assembly; expressed in diencephalon lateral wall ventricular layer; midbrain ventricular layer; and telencephalon ventricular layer Daianna Gonzalez-Padilla @daianna_glez

Slide 60

Slide 60 text

Boxplots of DEG with Up and Down DE tx Dcun1d5: predicted to be involved in protein modification by small protein conjugation or removal, protein neddylation, and regulation of cell growth; expressed in NS Pnisr: predicted to be active in presynaptic active zone; expressed in NS

Slide 61

Slide 61 text

Transcript reconstruction from read mappings to the genome
1. RNA sequencing (paired reads)
2. Spliced alignment (HISAT2, STAR)
3. Transcript assembly (Cufflinks, StringTie)
Exons & introns do not have to be defined in the reference annotation; captures potentially "novel" isoforms
[Figure: reads from two isoforms built from exon1, exon2, and exon3, aligned across GT/AG splice sites on the genome sequence and assembled back into isoform1 and isoform2]

Slide 62

Slide 62 text

Transcriptional noise makes transcript reconstruction difficult: inflation of "novel" transfrags
[Figure tracks: read alignments, read coverage, observed junctions, assembled transfrags (transcript fragments)]

Slide 63

Slide 63 text

Figure 4: RNA quality surrogate variable assessment of Lieber Institute datasets. Comparing gene-level degradation effects in the full degradation experiment (all regions) vs. t-statistics from differential expression of schizophrenia cases vs. controls for five publicly available Lieber Institute datasets (rows TODO supp table) over six different models (columns). Backgrounds are shaded by the value of the absolute correlation. Joshua M Stolz @JoshStolz2 Hédia Tnani @TnaniHedia #qsvaR

Slide 64

Slide 64 text

Correlate Proportion Cell Type vs. qSVs Louise A Huuki-Myers @lahuuki

Slide 65

Slide 65 text

Figure 5: Effect of correcting models on reproducibility of differential expression. The replication rate between datasets over p-value cutoffs for all available models: A. BSP1 and BSP2 DLPFC, B. CMC and BSP1, C. CMC and BSP2 DLPFC. Joshua M Stolz @JoshStolz2

Slide 66

Slide 66 text

doi.org/10.1186/s12859-021-04142-3 Nicholas J Eagles @Nick-Eagles (GitHub) Upcoming LIBD Data portal built on top of SPEAQeasy & PopTop & BiocMAP

Slide 67

Slide 67 text

lcolladotor.github.io/#projects
● Every assay has caveats
● We re-use tricks: think adding 0, multiplying by 1
● It nearly always takes a team
● Data sharing accelerates science + democratizes access to it
● Zooming in allows us to reduce the heterogeneity
● We can learn from each other: from uniformly processing our data & re-using it → replicate / validate?

Slide 68

Slide 68 text

https://youtu.be/33scakbTNO0

Slide 69

Slide 69 text

Another type of data science group Leonardo Collado Torres lcolladotor.github.io 2020-08-19

Slide 70

Slide 70 text

There is increasingly more data & tools
- Greater demand for data skills: wrangling, visualization, analysis
- LIBD itself generates results that are large data collections
- Greater demand across LIBD scientists to learn how to work with data
https://ceramics.org/ceramic-tech-today/supercomputer-powered-materials-database-unleashes-data-deluge
… and many more

Slide 71

Slide 71 text

Protected time goes both ways
- You need protected time to learn, guide, build training material
- You need time also for collaborating
- It’s important to respect both and plan accordingly
20% 80% research

Slide 72

Slide 72 text

jhpce.jhu.edu/knowledge-base/knowledge-base-articles-from-lieber-institute/ research.libd.org/rstatsclub/ Join us Fridays at 9 AM (check the code of conduct please!)

Slide 73

Slide 73 text

www.youtube.com/@lcolladotor/playlists Videos allow us to multiply ourselves. We can make you custom selections of videos for a specific problem in DSgs sessions.

Slide 74

Slide 74 text

20 chapters and counting! lcolladotor.github.io/bioc_team_ds

Slide 75

Slide 75 text

Endorsed by your UCL colleague: Melissa Grant-Peters @mgrantpeters

Slide 76

Slide 76 text

lcolladotor.github.io/pkgs lcolladotor.github.io/biocthis

Slide 77

Slide 77 text

@MadhaviTippani Madhavi Tippani @HeenaDivecha Heena R Divecha @lmwebr Lukas M Weber @stephaniehicks Stephanie C Hicks @abspangler Abby Spangler @martinowk Keri Martinowich @CerceoPage Stephanie C Page @kr_maynard Kristen R Maynard @lcolladotor Leonardo Collado-Torres @Nick-Eagles (GH) Nicholas J Eagles Kelsey D Montgomery Sang Ho Kwon Image Analysis Expression Analysis Data Generation Thomas M Hyde @lahuuki Louise A Huuki-Myers @BoyiGuo Boyi Guo @mattntran Matthew N Tran @sowmyapartybun Sowmya Parthiban Slides available at speakerdeck.com/lcolladotor + Many more LIBD, JHU, and external collaborators @mgrantpeters Melissa Grant-Peters @prashanthi-ravichandran (GH) Prashanthi Ravichandran

Slide 78

Slide 78 text

#GBD23 Thank you for having us over in the UK!

Slide 79

Slide 79 text

lcolladotor.github.io @lcolladotor