2023-07-18_Verge_Genomics

Lessons from working on the edge of human brain transcriptomics with spatially-resolved transcriptomics and deconvolution
Leonardo Collado-Torres, Investigator
Verge Genomics, July 18 2023
@lcolladotor lcolladotor.github.io lcolladotor.github.io/bioc_team_ds
Slides available at speakerdeck.com/lcolladotor

doi.org/10.1016/j.biopsych.2020.06.005 Michael Gandal @mikejg84 Transcriptomic Insight Into the Polygenic Mechanisms
Underlying Psychiatric Disorders

Background: Human DLPFC slideshare.net Louise A Huuki-Myers @lahuuki

Zoom in: snRNA-seq → deconvolution of bulk RNA-seq Matthew N
Tran @mattntran Kristen R Maynard @kr_maynard Louise A Huuki-Myers @lahuuki Keri Martinowich @martinowk Stephanie C Hicks @stephaniehicks

What is Deconvolution?
• Inferring the cell type composition of bulk RNA-seq data
Louise A Huuki-Myers @lahuuki

Interaction eQTLs with cell type proportions github.com/LieberInstitute/goesHyde_mdd_rnaseq/tree/master/eqtl/code

Reference Single Cell Data
deconvolution(Y, Z) = Proportion of Cell Types
Louise A Huuki-Myers @lahuuki

10x snRNA-seq Reference Data
Tran, Maynard et al., Neuron, 2021
Cell type    AMY  DLPFC    HPC    NAc   sACC
Astro       1638    782   1170   1099    907
Endo          31      0      0      0      0
Macro          0     10      0     22      0
Micro       1168    388   1126    492    784
Mural         39     18     43      0      0
Oligo       6080   5455   5912   6134   4584
OPC         1459    572    838    669    911
Tcell         31      9     26      0      0
Excit        443   2388    623      0   4163
Inhib       3117   1580    366  11476   3974
Matthew N Tran @mattntran

Marker Finding
deconvolution(Y, Z) = Proportion of Cell Types
Louise A Huuki-Myers @lahuuki

1vAll Markers vs. Mean Ratio Markers
Louise A Huuki-Myers @lahuuki
research.libd.org/DeconvoBuddies/
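
The slide names the two marker statistics without defining them, so the sketch below rests on an assumption: the mean ratio of a gene is taken to be its mean expression in the target cell type divided by the highest mean expression among the other cell types, while the 1vAll comparison is simplified here to a fold change against all other cells pooled together (the real 1vAll approach typically uses a test statistic rather than a plain ratio). The function names and toy numbers are hypothetical and this is not the DeconvoBuddies API.

```python
def one_vs_all_fold(mean_expr, n_cells, target):
    """Target mean divided by the pooled mean of all non-target cells."""
    others = [ct for ct in mean_expr if ct != target]
    pooled_total = sum(mean_expr[ct] * n_cells[ct] for ct in others)
    pooled_cells = sum(n_cells[ct] for ct in others)
    return mean_expr[target] / (pooled_total / pooled_cells)


def mean_ratio(mean_expr, target):
    """Target mean divided by the highest mean among non-target cell types."""
    highest_other = max(v for ct, v in mean_expr.items() if ct != target)
    return mean_expr[target] / highest_other


# Toy data (hypothetical): mean expression per cell type for two genes.
n_cells = {"Excit": 100, "Inhib": 100, "Oligo": 800}
gene_a = {"Excit": 12.0, "Inhib": 8.0, "Oligo": 0.0}  # shared with Inhib
gene_b = {"Excit": 6.0, "Inhib": 0.5, "Oligo": 0.5}   # specific to Excit

for name, gene in [("gene_a", gene_a), ("gene_b", gene_b)]:
    print(
        name,
        round(one_vs_all_fold(gene, n_cells, "Excit"), 2),
        round(mean_ratio(gene, "Excit"), 2),
    )
```

In this toy case the pooled comparison ranks gene_a first because the abundant Oligo cells dilute its Inhib expression, whereas the mean ratio ranks gene_b first because no single other cell type expresses it strongly.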


Methods
deconvolution(Y, Z) = Proportion of Cell Types
Louise A Huuki-Myers @lahuuki

Method Summary
• MuSiC (Wang et al, Nature Communications, 2019)
  ◦ Regression: W-NNLS regression (weighted non-negative least squares)
  ◦ Correction for technical variation: None
  ◦ Other features: Tree guided deconvolution, good for closely related cell types
• Bisque (Jew et al, Nature Communications, 2020)
  ◦ Regression: NNLS regression
  ◦ Correction for technical variation: Gene specific transformation of bulk data
  ◦ Other features: Leverage overlapping bulk & sc data
• SCDC (Dong et al, Briefings in Bioinformatics, 2020)
  ◦ Regression: W-NNLS framework proposed by MuSiC
  ◦ Correction for technical variation: Option for gene specific transformation of bulk data (from Bisque)
  ◦ Other features: Multiple reference datasets can be used, results combined with ENSEMBLE weights
• DWLS (Tsoucas et al, Nature Communications, 2019)
  ◦ Regression: Dampened weighted least squares
  ◦ Correction for technical variation: None
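
As a rough illustration of the NNLS idea these methods build on, here is a minimal sketch. It assumes Y is a bulk expression vector over marker genes and Z is a genes x cell types signature matrix of mean reference expression, which is one reading of deconvolution(Y, Z) and not something the slides define. The solver is plain projected gradient descent, chosen only to keep the example dependency-free; it is not the solver or weighting scheme used by MuSiC, Bisque, SCDC, or DWLS, and the function name and toy numbers are hypothetical.

```python
def nnls_proportions(Z, y, n_iter=2000):
    """Estimate non-negative cell type proportions p so that Z p approximates y.

    Z: signature matrix as a list of rows, one row per gene and one column
       per cell type (e.g. mean reference expression per cell type).
    y: bulk expression, one value per gene, in the same gene order as Z.
    """
    n_genes, n_types = len(Z), len(Z[0])
    # The squared Frobenius norm bounds the largest eigenvalue of Z^T Z,
    # so 1 / frob_sq is a safe gradient step size.
    frob_sq = sum(v * v for row in Z for v in row)
    step = 1.0 / frob_sq
    p = [1.0 / n_types] * n_types
    for _ in range(n_iter):
        residual = [
            sum(Z[g][k] * p[k] for k in range(n_types)) - y[g]
            for g in range(n_genes)
        ]
        grad = [
            sum(Z[g][k] * residual[g] for g in range(n_genes))
            for k in range(n_types)
        ]
        # Gradient step followed by projection onto p >= 0.
        p = [max(0.0, p[k] - step * grad[k]) for k in range(n_types)]
    total = sum(p)
    if total == 0:
        raise ValueError("all coefficients are zero; cannot normalize")
    # Rescale the non-negative coefficients so they sum to 1.
    return [v / total for v in p]


# Toy data (hypothetical): 4 marker genes x 3 cell types.
Z = [
    [10.0, 1.0, 0.0],
    [0.0, 8.0, 1.0],
    [1.0, 0.0, 6.0],
    [2.0, 2.0, 2.0],
]
true_props = [0.5, 0.3, 0.2]
# Simulate a noise-free bulk sample as the mixture Z @ true_props.
y = [sum(z * p for z, p in zip(row, true_props)) for row in Z]

estimated = nnls_proportions(Z, y)
print([round(v, 3) for v in estimated])
```

On noise-free toy data the estimate recovers the simulated proportions; real bulk data adds technical differences between assays, which is what the correction steps listed above try to address.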

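Columns: Method, Regression Method, Run Time, Marker Evaluation, Adjust for snRNA-seq vs. Bulk, Tissues Tested, Consider Cell Size, Reference Set
• MuSiC
  ◦ Regression method: W-NNLS
  ◦ Run time: Min.
  ◦ Marker evaluation: Internal Weighting
  ◦ Adjust for snRNA-seq vs. bulk: No
  ◦ Tissues tested: Pancreatic Islet, Rat & Mouse Kidney
  ◦ Consider cell size: Yes
• Bisque
  ◦ Regression method: NNLS
  ◦ Run time: Min.
  ◦ Marker evaluation: No
  ◦ Adjust for snRNA-seq vs. bulk: Yes
  ◦ Tissues tested: Adipose, DLPFC
  ◦ Reference set: Recommend 3+ donors
• SCDC
  ◦ Regression method: W-NNLS
  ◦ Run time: Min.
  ◦ Marker evaluation: Internal Weighting
  ◦ Adjust for snRNA-seq vs. bulk: Yes
  ◦ Tissues tested: Pancreatic Islet, mouse mammary
  ◦ Reference set: Can input multiple references
• DWLS
  ◦ Regression method: DWLS
  ◦ Run time: Hours
  ◦ Marker evaluation: Internal Selection
  ◦ Adjust for snRNA-seq vs. bulk: No
  ◦ Tissues tested: Mouse kidney, lung, liver, small intestine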

Which Method is the Most Accurate?
• Benchmarking shows that different methods perform best on different data sets (Cobos et al, Nature Communications, 2020)
• Benchmarking results from different papers on “real” data
  ◦ MuSiC paper: MuSiC > NNLS > BSEQ-sc > CIBERSORT
    ▪ Pancreatic Islet: Beta cells vs. HbA1c (Fig 2a)
  ◦ Bisque paper: Bisque > MuSiC > CIBERSORT
    ▪ DLPFC: Microglia vs. Braak stage, Neuron vs. Cognitive diagnostic category (Fig 4)
  ◦ SCDC paper: SCDC > MuSiC > Bisque > DWLS > CIBERSORT
    ▪ Pancreatic Islet: Beta cells vs. HbA1c (Fig 4b)
  ◦ Cobos benchmark: DWLS > MuSiC > Bisque > deconvSeq
    ▪ Human PBMC flow sorted (Fig 7)
Louise A Huuki-Myers @lahuuki

Results + Validation
deconvolution(Y, Z) = Proportion of Cell Types
Louise A Huuki-Myers @lahuuki

Mean Proportions By Region: Tran et al, bioRxiv, 2020 (5
donors, 6 cell types) Louise A Huuki-Myers @lahuuki

Mean Proportions By Region: Tran et al, Neuron, 2021 (8 donors, 10 cell types)
Peric = Mural + Endo
Louise A Huuki-Myers @lahuuki

Method Sensitivity to Marker Set: 25 vs. 20 Genes
• Run with sets of 20 & 25 marker genes per cell type
• Bisque is more robust to changes in the marker set than MuSiC
Louise A Huuki-Myers @lahuuki

Sean Maden @MadenSean Sang Ho Kwon @sanghokwon17 #deconvochallenge doi.org/10.48550/arXiv.2305.06501

#deconvochallenge Challenges and opportunities to computationally deconvolve heterogeneous tissue with
varying cell sizes using single cell RNA-sequencing datasets doi.org/10.48550/arXiv.2305.06501 Sean Maden @MadenSean


Motivation
• Improve deconvolution algorithms by considering differences in size and RNA content between cell types
• Use smFISH with RNAscope to establish a data set of:
  ◦ Cellular composition
  ◦ Nuclei sizes of major cell types
  ◦ Average nuclei RNA content of major cell types
How do we measure total RNA content of a cell if we can only observe a few genes at a time? Use a TREG
Data-driven Identification of Total RNA Expression Genes (TREGs) for Estimation of RNA Abundance in Heterogeneous Cell Types
research.libd.org/TREG/ doi.org/10.1101/2022.04.28.489923
Louise A Huuki-Myers @lahuuki #TREG

What is a TREG?
• Total RNA Expression Gene
• Expression is proportional to the overall RNA expression in a nucleus
• In smFISH the count of TREG puncta in a nucleus can estimate the RNA content
Data-driven Identification of Total RNA Expression Genes (TREGs) for Estimation of RNA Abundance in Heterogeneous Cell Types
research.libd.org/TREG/ doi.org/10.1101/2022.04.28.489923 #TREG

Validate with RNAscope
• DLPFC from control, sectioned at 10 μm
• 3 slides with 3 sections each
  ◦ TREG candidate + cell type marker genes
• Images analyzed with HALO
TREG Gene      Cell Type Markers
AKT3           GAD1, SLC17A7, MBP
ARID1B         GAD1, SLC17A7, MBP
MALAT1/POLR2A  SLC17A7, MBP
Cell Type  Marker
Excit      SLC17A7
Inhib      GAD1
Oligo      MBP
Kelsey D Montgomery, Sang Ho Kwon
research.libd.org/TREG/ doi.org/10.1101/2022.04.28.489923

Patterns of Observed Puncta
• TREGs were expressed in most cells
• AKT3 tracks really well with pattern of expression seen in snRNA-seq (ARID1B is also pretty good)
snRNA-seq / RNAscope
Gene       Mean Prop. Cells with Expression  Prop. non-zero in DLPFC snRNA  Standardized β (95% CI)
AKT3       0.948                             0.92                           -1.38 (-1.39, -1.37)
ARID1B     0.908                             0.94                           -0.62 (-0.62, -0.61)
MALAT1     0.910                             1.00                           -0.11 (-0.12, -0.11)
POLR2A     0.853                             0.30                           -0.98 (-0.99, -0.98)
snRNA-seq  NA                                NA                             -1.33 (-1.35, -1.31)
Remember: MALAT1’s puncta data is unreliable
research.libd.org/TREG/ doi.org/10.1101/2022.04.28.489923

research.libd.org/TREG/ doi.org/10.1101/2022.04.28.489923 Louise A Huuki-Myers @lahuuki


Sean Maden @MadenSean #deconvochallenge doi.org/10.48550/arXiv.2305.06501

Zoom in: spatial omics Kristen R Maynard @kr_maynard Keri Martinowich
@martinowk Stephanie C Hicks @stephaniehicks Andrew E Jaffe @andrewejaffe Stephanie C Page @CerceoPage

Visium Platform for Spatial Gene Expression
Image from 10x Genomics
- A slide contains 4 capture areas, each with thousands of 55 µm-wide “spots” (often containing 1-10 cells)
- Each spot carries a unique barcode that tags the transcripts captured there; after sequencing, gene expression can be tied back to exact spots, forming a spatial map
Kristen R. Maynard

bioconductor.org/packages/spatialLIBD
Pardo et al, 2022 DOI 10.1186/s12864-022-08601-w
Maynard, Collado-Torres, et al, 2021 DOI 10.1038/s41593-020-00787-0
Brenda Pardo @PardoBree, Abby Spangler @abspangler, Louise A. Huuki-Myers @lahuuki

2 pairs of spatially adjacent replicates per subject x 3 subjects = 12 sections
Subject 1, Subject 2, Subject 3
Adjacent spatial replicates (0 μm); adjacent spatial replicates (300 μm)
PCP4
Maynard, Collado-Torres, et al, Nat Neuro, 2021

“Pseudo-bulking” collapses data: spot to layer level
Maynard, Collado-Torres, et al, Nat Neuro, 2021
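
The collapsing step can be sketched as a sum of spot-level counts within each sample and layer. This is a minimal illustration on hypothetical toy data, assuming each spot already has a layer label; it is not the code used in the paper, and the function and variable names are made up for the example.

```python
from collections import defaultdict


def pseudobulk(spot_counts, sample_ids, layers):
    """Sum spot-level counts into one profile per (sample, layer) pair.

    spot_counts: one dict per spot mapping gene -> UMI count.
    sample_ids, layers: one label per spot, in the same order.
    """
    totals = defaultdict(lambda: defaultdict(int))
    for counts, sample, layer in zip(spot_counts, sample_ids, layers):
        for gene, n in counts.items():
            totals[(sample, layer)][gene] += n
    return {key: dict(genes) for key, genes in totals.items()}


# Toy data (hypothetical): 4 spots from 2 samples.
spot_counts = [
    {"PCP4": 3, "MBP": 0},
    {"PCP4": 1, "MBP": 1},
    {"PCP4": 0, "MBP": 7},
    {"PCP4": 2, "MBP": 0},
]
sample_ids = ["s1", "s1", "s1", "s2"]
layers = ["L5", "L5", "WM", "L5"]

for key, profile in pseudobulk(spot_counts, sample_ids, layers).items():
    print(key, profile)
```

Each pseudo-bulked profile then behaves like a small bulk RNA-seq sample, so layer-level comparisons can reuse bulk-style statistics.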

DOI: 10.1038/s41593-020-00787-0 twitter.com/lcolladotor/status/1233661576433061888 from 2020-02-29 Andrew E Jaffe @andrewejaffe Kristen
R Maynard @kr_maynard Keri Martinowich @martinowk

DOI: 10.1038/s41593-020-00787-0 twitter.com/lcolladotor/status/1233661576433061888 from 2020-02-29 DOI 10.1093/bioinformatics/btac299 Since Feb 2020
spatialLIBD::fetch_data() provides access to SpatialExperiment R/Bioconductor objects Stephanie C Hicks @stephaniehicks Lukas M Weber @lmweber

DOI: 10.1038/s41593-020-00787-0 twitter.com/lcolladotor/status/1233661576433061888 from 2020-02-29 twitter.com/CrowellHL/status/1597579271945715717 DOI 10.1093/bioinformatics/btac299 Since Feb
2020 spatialLIBD::fetch_data() provides access to SpatialExperiment R/Bioconductor objects

#spatialDLPFC doi.org/10.1101/2023.02.15.528722
Louise A Huuki-Myers @lahuuki, Abby Spangler @abspangler, Nicholas J Eagles @Nick-Eagles (GitHub)

BayesSpace clustering with batch correction worked best for multiple samples
doi.org/10.1101/2023.02.15.528722 twitter.com/CrowellHL/status/1597579271945715717

Different Resolutions of BayesSpace Clustering
k = number of clusters
• k=2: separates white vs. grey matter
• k=9: best recapitulates the histological layers
• k=16: data-driven optimal k based on the fast H+ statistic
More Clusters = More Complexity
doi.org/10.1101/2023.02.15.528722

Spatial Registration Adds Anatomical Context
• Validate detection of laminar structure
• Correlate enrichment t-statistics for top marker genes of reference
  ◦ Cluster vs. manual annotation
• Annotate with strongly associated histological layer
Label format: SpkDd~L (domain d at resolution k, annotated with layer L)
doi.org/10.1101/2023.02.15.528722

Spatial Registration of Spatial Domains
• Map SpDs to the manually annotated layers from Maynard et al.
• Highlight most strongly associated histological layer to add biological context
doi.org/10.1101/2023.02.15.528722
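
The registration idea, correlating enrichment t-statistics between each data-driven domain and each manually annotated layer and then keeping the best match, can be sketched as follows. This is a minimal illustration with hypothetical toy t-statistics over the same four ordered marker genes; it is not the spatialLIBD implementation, and all names are made up for the example.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def register(domain_t, layer_t):
    """Annotate each domain with its most strongly correlated layer.

    domain_t, layer_t: dicts mapping a name to enrichment t-statistics,
    listed over the same genes in the same order.
    """
    annotations = {}
    for domain, stats in domain_t.items():
        cors = {layer: pearson(stats, ref) for layer, ref in layer_t.items()}
        best = max(cors, key=cors.get)
        annotations[domain] = (best, cors[best])
    return annotations


# Toy t-statistics (hypothetical) for 4 marker genes.
layer_t = {
    "L1": [5.0, -1.0, -2.0, -1.0],
    "L5": [-2.0, 4.0, -1.0, 0.0],
    "WM": [-3.0, -2.0, 6.0, 1.0],
}
domain_t = {
    "domain_1": [4.2, -0.5, -1.8, -0.9],
    "domain_2": [-2.5, -1.0, 5.1, 0.3],
}

for domain, (layer, cor) in register(domain_t, layer_t).items():
    print(domain, "~", layer, round(cor, 2))
```

Keeping the correlation value alongside the label makes it possible to flag weak matches instead of forcing every domain onto a layer.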

Identify Layer Associated Neuron Populations
• Apply Spatial Registration with manual layers
• 13 layer-level cell types
  ◦ Assign histological layers to excitatory neurons
  ◦ Pool other cell type groups
Kelsey D Montgomery

Layer-level Cell Type Marker genes Louise A Huuki-Myers @lahuuki

Spot Deconvolution
Single-Nucleus: count matrix of genes (Gene 1 … Gene i) by cells (Cell 1 … Cell N)
Spatial: count matrix of genes (Gene 1 … Gene j) by spots (Spot 1 … Spot M)
Deconvolved Results: count matrix of spots (Spot 1 … Spot M) by cell types (Astro, Excit, …, Inhib)
Nicholas J Eagles @Nick-Eagles (GitHub)

Existing Spot Deconvolution Software
- Explored 3 novel software methods from the literature
Software name                          Overall approach                  Input Cell Counts     Output
Tangram (Biancalani et al.)            Mapping individual cells          Every spot            Integer counts
Cell2location (Kleshchevnikov et al.)  Matching gene-expression profile  Average across spots  Decimal counts
SPOTlight (Elosua-Bayes et al.)        Matching gene-expression profile  Not used              Proportions
Figure: Excit L5 Counts

Benchmarking Spot Deconvolution Software: Theory
- How do we measure performance or accuracy of cell-type predictions?
- Make orthogonal measurements*: image-derived counts
- Leverage prior knowledge: neurons localize to gray matter?
- Self-consistency of results: broad vs. fine cell-type results
Nicholas J Eagles @Nick-Eagles (GitHub) doi.org/10.1101/2023.02.15.528722

Visium Spatial Proteogenomics (SPG) Images as an Orthogonal Measurement

Visium Spatial Proteogenomics (Visium-SPG) Visium-SPG = Visium SRT + immunofluorescence
(using identical tissue samples) Sang Ho Kwon @sanghokwon17

Visium Spatial Proteogenomics (Visium-SPG)
Visium-SPG = Visium SRT + immunofluorescence (using identical tissue samples)
- Gene expression captured like ordinary Visium
- Multi-channel fluorescent images captured of the same tissue
- Channels measure proteins marking specific cell types
Fluorescent Protein  Cell Type
TMEM119              Microglia
NeuN                 Neurons
OLIG2                Oligodendrocytes
GFAP                 Astrocytes
Kristen R. Maynard, Sang Ho Kwon

Segmenting Cells on Visium-SPG IF Images
1. Segment cells on IF image
2. Manually label N cells
3. Train cell-type classifier and apply on remaining data
Sriworarat, 2023. samuibrowser.com doi.org/10.1101/2023.01.28.525943
Nicholas J Eagles @Nick-Eagles (GitHub)

Constructing Dataset of Labeled Cells
1. Segment cells on IF image
2. Manually label example cells
3. Train cell-type classifier and apply on remaining data
Figure panels: Image Channels; Classified Cell Type; Cell Mask
Annie B. Nguyen

Constructing Dataset of Labeled Cells
1. Segment cells on IF image
2. Manually label N cells
3. Train cell-type classifier and apply on remaining data
4 sections * 5 cell types * 30 cells = 600 manually labeled cells
Annie B. Nguyen
Sriworarat, 2023. samuibrowser.com doi.org/10.1101/2023.01.28.525943

Addressing Bias in Cell Selection
- Trained logistic regression model on 600-cell dataset
- Broke cells into 4 quartiles based on model confidence
- Labelled 320 more cells, evenly sampled from all 4 quartiles
4 quartiles * 4 sections * 5 cell types * 4 cells = 320 new cells
600 old cells + 320 new cells = 920 total cells
Less-confident neuron (Confidence = 0.45)
Cell Type  Probability
Astro      0.2
Oligo      0.3
Micro      0.1
Neuron     0.45
Other      0.05
More-confident neuron (Confidence = 0.93)
Cell Type  Probability
Astro      0.01
Oligo      0.02
Micro      0.01
Neuron     0.93
Other      0.03
Nicholas J Eagles @Nick-Eagles (GitHub)
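
A minimal sketch of the quartile-based sampling, assuming confidence is the highest class probability (consistent with the two example neurons) and that the sampling runs once per section and predicted cell type. The function names, seed, and toy cells are hypothetical, and the logistic regression model itself is not reproduced here.

```python
import random


def confidence(probs):
    """Model confidence for one cell: the highest class probability."""
    return max(probs.values())


def sample_by_quartile(cells, per_quartile, seed=0):
    """Rank cells by confidence, split into 4 quartiles, sample from each.

    cells: list of (cell_id, probs) pairs for one section and cell type.
    Returns 4 lists of sampled cell ids, lowest-confidence quartile first.
    """
    ranked = sorted(cells, key=lambda cell: confidence(cell[1]))
    n = len(ranked)
    rng = random.Random(seed)
    picks = []
    for q in range(4):
        chunk = ranked[q * n // 4:(q + 1) * n // 4]
        picks.append([cell_id for cell_id, _ in rng.sample(chunk, per_quartile)])
    return picks


# The two example neurons from the slide.
less_confident = {"Astro": 0.2, "Oligo": 0.3, "Micro": 0.1, "Neuron": 0.45, "Other": 0.05}
more_confident = {"Astro": 0.01, "Oligo": 0.02, "Micro": 0.01, "Neuron": 0.93, "Other": 0.03}
print(confidence(less_confident), confidence(more_confident))

# Toy cells (hypothetical): 40 predicted neurons with increasing confidence.
cells = [(i, {"Neuron": 0.55 + 0.01 * i, "Other": 0.45 - 0.01 * i}) for i in range(40)]
picks = sample_by_quartile(cells, per_quartile=4)
print(picks)

# Totals from the slide.
n_new = 4 * 4 * 5 * 4   # quartiles * sections * cell types * cells
n_total = 600 + n_new
print(n_new, n_total)
```

Sampling evenly across quartiles means the newly labeled cells include ones the first model was unsure about, instead of only the easy, high-confidence cases.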

Training Cell-Type Classifier
1. Segment cells on IF image
2. Manually label N cells
3. Train cell-type classifier and apply on remaining data
Data
Dataset   #    Training  Test  Split
Old       600  480       120   80/20
New       320  240       80    75/25
Combined  920  720       200   ~78/22
Model
Grid search with 5-fold CV for each model to select hyperparameters
Final model chosen
Model          Test Precision  Test Recall
Decision tree  0.86            0.87

Benchmark Results: Make Orthogonal Measurements
- Sum across finer cell types to compare against broader
- Drop EndoMural for comparison to decision tree
Figure labels: Neuron; Layer; Broad; Decision Tree; Software Predictions
Nicholas J Eagles @Nick-Eagles (GitHub)

Benchmark Results: Make Orthogonal Measurements
Micro (Br6522_Ant_IF)
All points in A -> one point in B

Benchmark Results: Make Orthogonal Measurements
Broad cell type level; layer level (Excit by layer)

Benchmark Results: Leverage Prior Knowledge
Figure legend: Max across layers; Not max

Benchmark Results: Leverage Prior Knowledge

Benchmark Results: Self-Consistency of Results
Counts from software results using both broad and layer-level cell types were compared, by “collapsing” onto just 4 major cell types. We expect results perfectly on the diagonal!
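
The collapsing step can be sketched as a lookup from each broad or layer-level cell type to a major cell type, followed by a sum. The mappings below are hypothetical: the slides do not list which labels feed into which of the 4 major cell types, so treat the label names and groupings as assumptions made for the example.

```python
# Hypothetical mappings onto 4 major cell types; unmapped labels are dropped.
LAYER_TO_MAJOR = {
    "Excit_L2/3": "Neuron",
    "Excit_L5": "Neuron",
    "Inhib": "Neuron",
    "Astro": "Astro",
    "Oligo": "Oligo",
    "Micro": "Micro",
}
BROAD_TO_MAJOR = {
    "Excit": "Neuron",
    "Inhib": "Neuron",
    "Astro": "Astro",
    "Oligo": "Oligo",
    "Micro": "Micro",
}


def collapse(counts, mapping):
    """Sum per-cell-type counts for one spot onto the major cell types."""
    collapsed = {}
    for cell_type, n in counts.items():
        major = mapping.get(cell_type)
        if major is None:
            continue  # label without a major cell type, e.g. EndoMural here
        collapsed[major] = collapsed.get(major, 0) + n
    return collapsed


# Toy results (hypothetical) for one spot from the same software,
# run once with broad and once with layer-level cell types.
broad_result = {"Excit": 3, "Inhib": 1, "Astro": 1, "Oligo": 0, "Micro": 0, "EndoMural": 1}
layer_result = {"Excit_L2/3": 2, "Excit_L5": 1, "Inhib": 1, "Astro": 1,
                "Oligo": 0, "Micro": 0, "EndoMural": 1}

broad_major = collapse(broad_result, BROAD_TO_MAJOR)
layer_major = collapse(layer_result, LAYER_TO_MAJOR)
print(broad_major)
print(layer_major)
print("self-consistent:", broad_major == layer_major)
```

A self-consistent method gives the same collapsed counts at both resolutions, which is what landing on the diagonal means in the plot.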

Benchmark Summary
Metric                    Tangram  Cell2location  SPOTlight  Metric Type
Avg. cor (spot-level)     0.31     0.30           0.21       Orthogonal measurements
Avg. RMSE (spot-level)    1.35     1.24           1.3        Orthogonal measurements
Overall prop.: (KL Div.)  0.44     0.49           0.41       Orthogonal measurements
Overall prop.: (cor.)     0.46     0.37           0.47       Orthogonal measurements
Overall prop.: (RMSE)     3020     3890           3040       Orthogonal measurements
Histological mapping      0.69     0.77           0.23       Leverage known biology
Broad vs. layer (cor.)    1.00     0.77           -0.36      Self-consistency of results
Broad vs. layer (RMSE)    102      4200           4220       Self-consistency of results
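
The three kinds of metric in the table (correlation, RMSE, and KL divergence) can be sketched as follows for one software's predicted counts against image-derived counts. This is an illustration only: the toy numbers are hypothetical, and the direction of the KL divergence (observed relative to predicted) is an assumption, since the slide does not state it.

```python
from math import log, sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def rmse(predicted, observed):
    """Root mean squared error between predicted and observed counts."""
    n = len(predicted)
    return sqrt(sum((p - o) ** 2 for p, o in zip(predicted, observed)) / n)


def kl_divergence(observed, predicted):
    """KL divergence between proportions derived from two count vectors.

    Both vectors are normalized to sum to 1. Terms with an observed
    proportion of 0 contribute 0; predicted proportions must be positive
    wherever the observed proportion is positive.
    """
    obs_total, pred_total = sum(observed), sum(predicted)
    divergence = 0.0
    for o, p in zip(observed, predicted):
        if o > 0:
            divergence += (o / obs_total) * log((o / obs_total) / (p / pred_total))
    return divergence


# Toy totals (hypothetical) for Astro, Micro, Oligo, Neuron.
image_derived = [100.0, 100.0, 50.0, 150.0]
software = [120.0, 80.0, 40.0, 160.0]

print("cor:", round(pearson(software, image_derived), 3))
print("RMSE:", round(rmse(software, image_derived), 3))
print("KL div.:", round(kl_divergence(image_derived, software), 4))
```

Correlation rewards getting the pattern right even when the scale is off, RMSE penalizes scale errors, and KL divergence compares overall composition, which is one reason the table reports all three.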

Viewing Spot Deconvolution Results: Samui Browser
- View:
  - Fluorescence channels
  - Spot deconvo results
  - Segmented cells
  - Gene expression
- Interactive
- Quickly zoom/scroll
- Full-resolution images
samuibrowser.com/from?url=data2.loopybrowser.com/VisiumIF/&s=Br2720_Ant_IF&s=Br6432_Ant_IF&s=Br6522_Ant_IF&s=Br8667_Post_IF
Sriworarat, 2023.

Viewing Spot Deconvolution Results: spatialLIBD apps
- View:
  - spot deconvolution results
  - spatial domains/clusters
  - gene expression
- Huge amount of aesthetic customization
https://libd.shinyapps.io/spatialDLPFC_Visium_SPG/

How Spot Deconvolution Results Were Used
A. Better characterize unsupervised spatial domains
B. Cell-cell communication; cell-type-informed ligand-receptor interactions in the context of schizophrenia risk
Boyi Guo, Melissa Grant-Peters

Visium spatial clustering works for variables with high % variance
explained. But what about other ones? DOI: 10.1038/s41593-020-00787-0

twitter.com/sanghokwon17/status/1650589385379962881 from 2023-04-24 Sang Ho Kwon @sanghokwon17 DOI: 10.1101/2023.04.20.537710 #Visium_SPG_AD

Experimental design & study overview Braak V-VI & CERAD frequent
Sang Ho Kwon

AD pathology signal is too small to detect by spatially-resolved
gene expression alone research.libd.org/Visium_SPG_AD/

Identifying transcriptional signatures of AD-related neuropathology Sang Ho Kwon

Some challenges ⚠ with Visium

sc/snRNA-seq QC metrics such as number of detected genes, number of UMIs, and mitochondrial expression % are likely biologically related!

Prashanthi Ravichandran @prashanthi-ravichandran (GH) Artifacts in general are normalized away
by library size, though there are caveats

Diffusion issues: could be related to permeabilization step

Having more data is useful to provide context! Here 4
new samples have low sequencing saturation (outliers) but are within range of good samples from other studies

Having more data is useful to provide context! Those 4
samples have great median UMI counts per spot ^_^

Software keeps evolving and as leaders in the field we aim to use the best methods
Moses, L., Pachter, L. Museum of spatial transcriptomics. Nat Methods 19, 534–546 (2022). https://doi.org/10.1038/s41592-022-01409-2

The Development Process - Making a module
- New, experimental software can change dramatically (function and syntax) between versions
- Modules promote collaboration by allowing two researchers to share exact code and instantly run software without special set-up
SpatialExperiment release 3.14
SpatialExperiment devel 3.15
module load tangram/1.0.2
module load cell2location/0.8a0
module load spagcn/1.2.0
https://github.com/LieberInstitute/jhpce_mod_source
https://github.com/LieberInstitute/jhpce_module_config
Nicholas J Eagles @Nick-Eagles (GitHub)

The Development Process
- Regular interaction with software authors to clarify functionality and report bugs
- Documentation for code and author responsiveness on GitHub can be critical in successfully applying software to our data
Nicholas J Eagles @Nick-Eagles (GitHub)

Documentation + wrapper functions + tests (GitHub Actions + Bioconductor)
bioconductor.org/packages/spatialLIBD

More challenges ahead
Working with multiple capture areas per tissue
Spot diameter error: ~1.8 → ~1.1
Another pair: ~2.8 → ~0.76
Nicholas J Eagles @Nick-Eagles (GitHub), Prashanthi Ravichandran @prashanthi-ravichandran (GH)

lcolladotor.github.io/#projects
• Every assay has caveats
• We re-use tricks: think adding 0, multiplying by 1
• It nearly always takes a team
• Data sharing accelerates science + democratizes access to it
• Zooming in allows us to reduce the heterogeneity
• We can learn from each other: from uniformly processing our data & re-using it → replicate / validate?

jhpce.jhu.edu/knowledge-base/knowledge-base-articles-from-lieber-institute/ research.libd.org/rstatsclub/ Join us Fridays at 9 AM (check the
code of conduct please!)

www.youtube.com/@lcolladotor/playlists
Videos allow us to multiply ourselves
We can make you custom selections of videos for a specific problem on DSgs sessions

20 chapters and counting! lcolladotor.github.io/bioc_team_ds

lcolladotor.github.io/pkgs lcolladotor.github.io/biocthis

Groups: Image Analysis, Expression Analysis, Data Generation
@MadhaviTippani Madhavi Tippani
@HeenaDivecha Heena R Divecha
@lmwebr Lukas M Weber
@stephaniehicks Stephanie C Hicks
@abspangler Abby Spangler
@martinowk Keri Martinowich
@CerceoPage Stephanie C Page
@kr_maynard Kristen R Maynard
@lcolladotor Leonardo Collado-Torres
@Nick-Eagles (GH) Nicholas J Eagles
Kelsey D Montgomery
Sang Ho Kwon
Thomas M Hyde
@lahuuki Louise A Huuki-Myers
@BoyiGuo Boyi Guo
@mattntran Matthew N Tran
@sowmyapartybun Sowmya Parthiban
@mgrantpeters Melissa Grant-Peters
@prashanthi-ravichandran (GH) Prashanthi Ravichandran
+ Many more LIBD, JHU, and external collaborators
Slides available at speakerdeck.com/lcolladotor

lcolladotor.github.io @lcolladotor
