Slide 1

Slide 1 text

@lcolladotor
lcolladotor.github.io
lcolladotor.github.io/bioc_team_ds
Studying the human prefrontal cortex transcriptome at different resolutions
Leonardo Collado Torres, Investigator
#LCG20 at LCG-UNAM, August 31 2023
Slides available at speakerdeck.com/lcolladotor

Slide 2

Slide 2 text

Interested people → Users → Developers
How do we encourage this step in Mexico and Latin America?

Slide 3

Slide 3 text

• 2017:
  ○ idea at BioC2017 and start of the founding of CDSB
• 2018:
  ○ first workshop ^_^, with Bioconductor instructors: Martin Morgan & Benilton Carvalho
• 2019:
  ○ BioC2019: support for scholarship applications
  ○ Workshop with materials adapted from RStudio
• 2020:
  ○ regutools: first package in Bioconductor
  ○ Workshop with RStudio & Bioconductor
• 2021:
  ○ first time with 2 workshops
https://comunidadbioinfo.github.io/

Slide 4

Slide 4 text

No content

Slide 5

Slide 5 text

https://doi.org/10.1093/bioinformatics/btaa575

Slide 6

Slide 6 text

CDSB Board of Directors until 2019

Slide 7

Slide 7 text

comunidadbioinfo.github.io

Slide 8

Slide 8 text

@lcolladotor
lcolladotor.github.io
lcolladotor.github.io/bioc_team_ds
Studying the human prefrontal cortex transcriptome at different resolutions
Leonardo Collado Torres, Investigator
#LCG20 at LCG-UNAM, August 31 2023
Slides available at speakerdeck.com/lcolladotor

Slide 9

Slide 9 text

doi.org/10.1016/j.biopsych.2020.06.005 Michael Gandal @mikejg84 Transcriptomic Insight Into the Polygenic Mechanisms Underlying Psychiatric Disorders

Slide 10

Slide 10 text

Background: Human DLPFC
slideshare.net
Louise A Huuki-Myers @lahuuki

Slide 11

Slide 11 text

Zoom in: base pair resolution Jeff Leek @jtleek Ph.D. advisor Andrew E Jaffe @andrewejaffe Ph.D. co-advisor

Slide 12

Slide 12 text

lcolladotor.github.io/talk/lcg2014/
#LCG10 was organized by Esperanza Martínez Romero et al.

Slide 13

Slide 13 text

Flexible expressed region analysis for RNA-seq with #derfinder doi.org/10.1093/nar/gkw852 Jeff Leek @jtleek Ph.D. advisor

Slide 14

Slide 14 text

Postmortem Human Brain Samples
Discovery data: Fetal, Infant, Child, Teen, Adult, 50+ (6 / group, N = 36)
Replication data: Fetal, Infant, Child, Teen, Adult, 50+ (6 / group, N = 36)
Andrew E Jaffe @andrewejaffe, Ph.D. co-advisor
Developmental regulation of human cortex transcription and its clinical relevance at single base resolution
doi.org/10.1038/nn.3898
github.com/leekgroup/libd_n36

Slide 15

Slide 15 text

doi.org/10.1038/nn.3898 Developmental regulation of human cortex transcription and its clinical relevance at single base resolution github.com/leekgroup/libd_n36

Slide 16

Slide 16 text

Zoom in: more data! Ben Langmead @BenLangmead Abhinav Nellore @nellore (GitHub) Christopher Wilks @chrisnwilks Shannon Ellis @Shannon_E_Ellis Kasper Daniel Hansen @KasperDHansen Andrew E Jaffe @andrewejaffe Ph.D. co-advisor + LIBD former boss Jeff Leek @jtleek Ph.D. advisor

Slide 17

Slide 17 text

doi.org/10.1038/543007a

Slide 18

Slide 18 text

Expression data for ~70,000 human samples
Samples: GTEx N=9,962; TCGA N=11,284; SRA N=49,848
Expression estimates: gene, exon, junctions, ERs
Sample phenotypes: ?
Answer meaningful questions about human biology and expression
Slide adapted from Shannon Ellis
Reproducible RNA-seq analysis using #recount2: doi.org/10.1038/nbt.3838
Improving the value of public RNA-seq expression data by phenotype prediction: doi.org/10.1093/nar/gky102

Slide 19

Slide 19 text

recount3: over 700,000 human and mouse RNA-seq samples #recount3: summaries and queries for large-scale RNA-seq expression and splicing Christopher Wilks @chrisnwilks research.libd.org/recount3-docs/ doi.org/10.1186/s13059-021-02533-6

Slide 20

Slide 20 text

Zoom in: snRNA-seq → deconvolution of bulk RNA-seq Matthew N Tran @mattntran Kristen R Maynard @kr_maynard Louise A Huuki-Myers @lahuuki Keri Martinowich @martinowk Stephanie C Hicks @stephaniehicks

Slide 21

Slide 21 text

What is Deconvolution?
● Inferring the composition of different cell types in bulk RNA-seq data
Louise A Huuki-Myers @lahuuki

Slide 22

Slide 22 text

Interaction eQTLs with cell type proportions github.com/LieberInstitute/goesHyde_mdd_rnaseq/tree/master/eqtl/code

Slide 23

Slide 23 text

Reference Single Cell Data
deconvolution(Y, Z) = Proportion of Cell Types
Louise A Huuki-Myers @lahuuki
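The deconvolution(Y, Z) notation can be illustrated with a minimal sketch. It assumes Y is a bulk expression vector over marker genes and Z is a genes × cell types matrix of mean reference expression; the slides do not define either symbol, so this reading is an assumption. The solver is a plain non-negative least squares via projected gradient descent, chosen for illustration only; it is not the algorithm of any package named in this talk.

```python
def deconvolution(Y, Z, n_iter=2000):
    """Estimate cell type proportions for one bulk sample (illustrative sketch).

    Y: bulk expression values, one per marker gene (assumed meaning).
    Z: one row per gene, holding the mean reference expression of that
       gene in each cell type (assumed meaning).
    """
    n_genes, n_types = len(Z), len(Z[0])
    # The squared Frobenius norm of Z bounds the largest eigenvalue of Z^T Z,
    # so this step size keeps the gradient descent from diverging.
    step = 1.0 / sum(z * z for row in Z for z in row)
    w = [1.0 / n_types] * n_types
    for _ in range(n_iter):
        residual = [sum(Z[g][k] * w[k] for k in range(n_types)) - Y[g]
                    for g in range(n_genes)]
        grad = [sum(Z[g][k] * residual[g] for g in range(n_genes))
                for k in range(n_types)]
        # Projection step: weights are not allowed to become negative.
        w = [max(0.0, w[k] - step * grad[k]) for k in range(n_types)]
    total = sum(w)
    return [x / total for x in w] if total > 0 else w


# Toy data (hypothetical): 4 marker genes, 3 cell types, known mixture.
Z = [[10, 0, 1], [0, 8, 1], [1, 1, 9], [5, 5, 0]]
true_props = [0.5, 0.3, 0.2]
Y = [sum(z * p for z, p in zip(row, true_props)) for row in Z]
estimated = deconvolution(Y, Z)
```

On noise-free toy data like this the estimate should recover the mixture used to build Y; real bulk data adds noise, platform differences, and cell size effects that this sketch ignores.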

Slide 24

Slide 24 text

10x snRNA-seq Reference Data

Cell type    AMY  DLPFC    HPC    NAc  sACC
Astro       1638    782   1170   1099   907
Endo          31      0      0      0     0
Macro          0     10      0     22     0
Micro       1168    388   1126    492   784
Mural         39     18     43      0     0
Oligo       6080   5455   5912   6134  4584
OPC         1459    572    838    669   911
Tcell         31      9     26      0     0
Excit        443   2388    623      0  4163
Inhib       3117   1580    366  11476  3974

Matthew N Tran @mattntran
doi.org/10.1016/j.neuron.2021.09.001

Slide 25

Slide 25 text

Marker Finding
deconvolution(Y, Z) = Proportion of Cell Types
Louise A Huuki-Myers @lahuuki

Slide 26

Slide 26 text

1vAll Markers vs. Mean Ratio Markers
Louise A Huuki-Myers @lahuuki
research.libd.org/DeconvoBuddies/
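The contrast between the two marker strategies can be sketched as follows. The sketch assumes that the mean ratio is the mean expression in the target cell type divided by the highest mean expression among the other cell types; check the DeconvoBuddies documentation for the exact definition. The `one_vs_all_ratio` helper is a simplified, hypothetical stand-in for a 1vAll comparison (target against all other cells pooled), not the statistical test a real pipeline would use, and the expression values are invented.

```python
from statistics import mean


def mean_ratio(expr_by_type, target):
    """Target mean divided by the highest mean among non-target cell types."""
    target_mean = mean(expr_by_type[target])
    next_highest = max(mean(v) for k, v in expr_by_type.items() if k != target)
    return target_mean / next_highest if next_highest > 0 else float("inf")


def one_vs_all_ratio(expr_by_type, target):
    """Target mean divided by the mean of all non-target cells pooled together."""
    others = [x for k, v in expr_by_type.items() if k != target for x in v]
    return mean(expr_by_type[target]) / mean(others)


# Hypothetical gene A: high in Astro, but also high in a rare cell type (Micro).
gene_a = {"Astro": [8, 10, 12], "Oligo": [0, 1, 0, 0, 1, 0], "Micro": [9, 9]}
# Hypothetical gene B: high in Astro only.
gene_b = {"Astro": [8, 10, 12], "Oligo": [1, 1, 1, 1, 1, 1], "Micro": [1, 1]}

ratios = {
    "gene_a": (one_vs_all_ratio(gene_a, "Astro"), mean_ratio(gene_a, "Astro")),
    "gene_b": (one_vs_all_ratio(gene_b, "Astro"), mean_ratio(gene_b, "Astro")),
}
```

In this toy case gene A still looks enriched when all other cells are pooled, because the abundant Oligo cells dominate the pooled mean, while its mean ratio stays close to 1 and flags it as a poor Astro-specific marker.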

Slide 27

Slide 27 text

1vAll Markers vs. Mean Ratio Markers
Louise A Huuki-Myers @lahuuki
research.libd.org/DeconvoBuddies/

Slide 28

Slide 28 text

Methods
deconvolution(Y, Z) = Proportion of Cell Types
Louise A Huuki-Myers @lahuuki

Slide 29

Slide 29 text

Which Method is the Most Accurate?
● Benchmarking shows that different methods perform best on different data sets (Cobos et al, Nature Communications, 2020)
● Benchmarking results from different papers on “real” data
  ○ MuSiC paper: MuSiC > NNLS > BSEQ-sc > CIBERSORT
    ■ Pancreatic Islet: Beta cells vs. HbA1c (Fig 2a)
  ○ Bisque paper: Bisque > MuSiC > CIBERSORT
    ■ DLPFC: Microglia vs. Braak stage, Neuron vs. Cognitive diagnostic category (Fig 4)
  ○ SCDC paper: SCDC > MuSiC > Bisque > DWLS > CIBERSORT
    ■ Pancreatic Islet: Beta cells vs. HbA1c (Fig 4b)
  ○ Cobos benchmark: DWLS > MuSiC > Bisque > deconvoSeq
    ■ Human PBMC flow sorted (Fig 7)
Louise A Huuki-Myers @lahuuki

Slide 30

Slide 30 text

Results + Validation
deconvolution(Y, Z) = Proportion of Cell Types
Louise A Huuki-Myers @lahuuki

Slide 31

Slide 31 text

Mean Proportions By Region: Tran et al, bioRxiv, 2020 (5 donors, 6 cell types) Louise A Huuki-Myers @lahuuki

Slide 32

Slide 32 text

Peric = Mural + Endo Mean Proportions By Region: Tran et al, Neuron, 2021 (8 donors, 10 cell types) Louise A Huuki-Myers @lahuuki

Slide 33

Slide 33 text

Sean Maden @MadenSean Sang Ho Kwon @sanghokwon17 #deconvochallenge doi.org/10.48550/arXiv.2305.06501

Slide 34

Slide 34 text

#deconvochallenge Challenges and opportunities to computationally deconvolve heterogeneous tissue with varying cell sizes using single cell RNA-sequencing datasets doi.org/10.48550/arXiv.2305.06501 Sean Maden @MadenSean

Slide 35

Slide 35 text

Sean Maden @MadenSean Sang Ho Kwon @sanghokwon17 #deconvochallenge doi.org/10.48550/arXiv.2305.06501

Slide 36

Slide 36 text

research.libd.org/TREG/ doi.org/10.1101/2022.04.28.489923 Louise A Huuki-Myers @lahuuki

Slide 37

Slide 37 text

Zoom in: spatial omics Kristen R Maynard @kr_maynard Keri Martinowich @martinowk Stephanie C Hicks @stephaniehicks Andrew E Jaffe @andrewejaffe Stephanie C Page @CerceoPage

Slide 38

Slide 38 text

Visium Platform for Spatial Gene Expression
Image from 10x Genomics
- A slide contains 4 capture areas, each full of thousands of 55 µm-wide “spots” (often containing 1-10 cells)
- Unique barcodes in each spot bind to particular genes; after sequencing, gene expression can be tied back to exact spots, forming a spatial map
Kristen R. Maynard

Slide 39

Slide 39 text

2 pairs of spatially adjacent replicates × 3 subjects = 12 sections
Figure panels: Subject 1, Subject 2, Subject 3; adjacent spatial replicates (0 μm) and (300 μm); gene shown: PCP4
Maynard, Collado-Torres, et al, Nat Neuro, 2021

Slide 40

Slide 40 text

“Pseudo-bulking” collapses data: spot to layer level
Maynard, Collado-Torres, et al, Nat Neuro, 2021
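As a rough illustration of collapsing spot-level data to the layer level, the sketch below sums UMI counts over all spots that share the same sample and layer label. Summation is assumed here as the aggregation rule, and the function name, spot identifiers, and counts are all hypothetical.

```python
from collections import defaultdict


def pseudo_bulk(spot_counts, spot_labels):
    """Sum gene counts over spots sharing the same (sample, layer) label.

    spot_counts: dict of spot_id -> dict of gene -> UMI count
    spot_labels: dict of spot_id -> (sample_id, layer)
    """
    bulk = defaultdict(lambda: defaultdict(int))
    for spot_id, counts in spot_counts.items():
        group = spot_labels[spot_id]
        for gene, n in counts.items():
            bulk[group][gene] += n
    return {group: dict(genes) for group, genes in bulk.items()}


# Toy data (hypothetical): 4 spots from one sample, 2 layers, 2 genes.
spot_counts = {
    "spot1": {"PCP4": 5, "MBP": 0},
    "spot2": {"PCP4": 7, "MBP": 1},
    "spot3": {"PCP4": 0, "MBP": 9},
    "spot4": {"PCP4": 1, "MBP": 12},
}
spot_labels = {
    "spot1": ("sample1", "L5"),
    "spot2": ("sample1", "L5"),
    "spot3": ("sample1", "WM"),
    "spot4": ("sample1", "WM"),
}
layer_counts = pseudo_bulk(spot_counts, spot_labels)
```

Each (sample, layer) pair then behaves like one bulk RNA-seq sample, which is what lets bulk-style statistical models be applied at the layer level.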

Slide 41

Slide 41 text

bioconductor.org/packages/spatialLIBD
Pardo et al, 2022 DOI 10.1186/s12864-022-08601-w
Maynard, Collado-Torres, 2021 DOI 10.1038/s41593-020-00787-0
Brenda Pardo @PardoBree
Abby Spangler @abspangler
Louise A. Huuki-Myers @lahuuki

Slide 42

Slide 42 text

DOI: 10.1038/s41593-020-00787-0 twitter.com/lcolladotor/status/1233661576433061888 from 2020-02-29 Andrew E Jaffe @andrewejaffe Kristen R Maynard @kr_maynard Keri Martinowich @martinowk

Slide 43

Slide 43 text

DOI: 10.1038/s41593-020-00787-0 twitter.com/lcolladotor/status/1233661576433061888 from 2020-02-29 DOI 10.1093/bioinformatics/btac299 Since Feb 2020 spatialLIBD::fetch_data() provides access to SpatialExperiment R/Bioconductor objects Stephanie C Hicks @stephaniehicks Lukas M Weber @lmweber

Slide 44

Slide 44 text

DOI: 10.1038/s41593-020-00787-0 twitter.com/lcolladotor/status/1233661576433061888 from 2020-02-29 twitter.com/CrowellHL/status/1597579271945715717 DOI 10.1093/bioinformatics/btac299 Since Feb 2020 spatialLIBD::fetch_data() provides access to SpatialExperiment R/Bioconductor objects

Slide 45

Slide 45 text

#spatialDLPFC
doi.org/10.1101/2023.02.15.528722
Louise A Huuki-Myers @lahuuki
Abby Spangler @abspangler
Nicholas J Eagles @Nick-Eagles (GitHub)

Slide 46

Slide 46 text

Different Resolutions of BayesSpace Clustering
k = number of clusters
● k=2: separates white vs. grey matter
● k=9: best recapitulates the histological layers
● k=16: data-driven optimal k based on the fast H+ statistic
More Clusters = More Complexity
doi.org/10.1101/2023.02.15.528722

Slide 47

Slide 47 text

Spatial Registration Adds Anatomical Context
● Validate detection of laminar structure
● Correlate enrichment t-statistics for top marker genes of reference
  ○ Cluster vs. manual annotation
● Annotate with strongly associated histological layer
Label notation: Sp_k D_d ~ L
doi.org/10.1101/2023.02.15.528722
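The correlation step described on this slide can be sketched as below: each cluster's enrichment t-statistics are correlated against those of every manually annotated layer over the same marker genes, and the cluster is labeled with the best-correlated layer. Pearson correlation, the function names, and all numbers are assumptions made for illustration; the spatialLIBD package provides the authors' own implementation.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length numeric lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def register(cluster_t, layer_t):
    """Annotate each cluster with its most strongly correlated reference layer.

    cluster_t, layer_t: dict of name -> enrichment t-statistics, listed over
    the same ordered set of reference marker genes.
    """
    annotation = {}
    for cluster, t_c in cluster_t.items():
        cors = {layer: pearson(t_c, t_l) for layer, t_l in layer_t.items()}
        best = max(cors, key=cors.get)
        annotation[cluster] = (best, cors[best])
    return annotation


# Toy t-statistics (hypothetical) over 4 marker genes.
layer_t = {"L1": [5, -1, -2, -1], "L5": [-1, 6, -2, -1], "WM": [-2, -2, -1, 7]}
cluster_t = {"SpD1": [4, 0, -1, -2], "SpD2": [-1, -1, 0, 5]}
annotation = register(cluster_t, layer_t)
```

A real analysis would also inspect the full correlation matrix, since a cluster can correlate well with more than one layer.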

Slide 48

Slide 48 text

Spatial Registration of Spatial Domains
● Map SpDs to Maynard et al. manually annotated layers
● Highlight most strongly associated histological layer to add biological context
doi.org/10.1101/2023.02.15.528722

Slide 49

Slide 49 text

Identify Layer Associated Neuron Populations
● Apply Spatial Registration with manual layers
● 13 layer-level cell types
  ○ Assign Excitatory Neurons histological layers
  ○ Pool other cell type groups
Kelsey D Montgomery

Slide 50

Slide 50 text

Layer-level Cell Type Marker genes
Louise A Huuki-Myers @lahuuki

Slide 51

Slide 51 text

Spot Deconvolution

Single-Nucleus
          Cell 1  Cell 2  …  Cell N
Gene 1         0       0  …       0
Gene 2         2       5  …       3
…              …       …  …       …
Gene i         1       0  …       0

Spatial
          Spot 1  Spot 2  …  Spot M
Gene 1         1       0  …       3
Gene 2         0       1  …       0
…              …       …  …       …
Gene j         4       2  …       2

Deconvolved Results
          Astro  Excit  …  Inhib
Spot 1        1      1  …      1
Spot 2        …      …  …      …
…             …      …  …      …
Spot M        1      0  …      2

Nicholas J Eagles @Nick-Eagles (GitHub)

Slide 52

Slide 52 text

Existing Spot Deconvolution Software
- Explored 3 novel software methods from the literature

Software name                           Overall approach                   Input cell counts     Output
Tangram (Biancalani et al.)             Mapping individual cells           Every spot            Integer counts
Cell2location (Kleshchevnikov et al.)   Matching gene-expression profile   Average across spots  Decimal counts
SPOTlight (Elosua-Bayes et al.)         Matching gene-expression profile   Not used              Proportions

Figure: Excit L5 counts

Slide 53

Slide 53 text

Benchmarking Spot Deconvolution Software: Theory
- How do we measure performance or accuracy of cell-type predictions?
  - Make orthogonal measurements*: image-derived counts
  - Leverage prior knowledge: neurons localize to gray matter?
  - Self-consistency of results: broad vs. fine cell-type results
Nicholas J Eagles @Nick-Eagles (GitHub)
doi.org/10.1101/2023.02.15.528722

Slide 54

Slide 54 text

Visium Spatial Proteogenomics (SPG) Images as an Orthogonal Measurement
Nicholas J Eagles @Nick-Eagles (GitHub)

Slide 55

Slide 55 text

Visium Spatial Proteogenomics (Visium-SPG) Visium-SPG = Visium SRT + immunofluorescence (using identical tissue samples) Sang Ho Kwon @sanghokwon17

Slide 56

Slide 56 text

Visium Spatial Proteogenomics (Visium-SPG)
Visium-SPG = Visium SRT + immunofluorescence (using identical tissue samples)
- Gene expression captured like ordinary Visium
- Multi-channel fluorescent images captured of the same tissue
- Channels measure proteins marking specific cell types

Fluorescent Protein   Cell Type
TMEM119               Microglia
NeuN                  Neurons
OLIG2                 Oligodendrocytes
GFAP                  Astrocytes

Kristen R. Maynard
Sang Ho Kwon

Slide 57

Slide 57 text

Benchmark Results: Leverage Prior Knowledge
Figure legend: Max across layers / Not max

Slide 58

Slide 58 text

Benchmark Results: Leverage Prior Knowledge

Slide 59

Slide 59 text

Benchmark Summary

Metric                     Tangram  Cell2location  SPOTlight  Metric Type
Avg. cor (spot-level)         0.31           0.30       0.21  Orthogonal measurements
Avg. RMSE (spot-level)        1.35           1.24       1.3   Orthogonal measurements
Overall prop.: (KL Div.)      0.44           0.49       0.41  Orthogonal measurements
Overall prop.: (cor.)         0.46           0.37       0.47  Orthogonal measurements
Overall prop.: (RMSE)         3020           3890       3040  Orthogonal measurements
Histological mapping          0.69           0.77       0.23  Leverage known biology
Broad vs. layer (cor.)        1.00           0.77      -0.36  Self-consistency of results
Broad vs. layer (RMSE)         102           4200       4220  Self-consistency of results
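The three kinds of metric in this table (correlation, RMSE, KL divergence) can be sketched on toy numbers as follows. The sketch assumes Pearson correlation, a KL divergence taken from image-derived to predicted proportions, and invented counts; the exact definitions behind the table are in the preprint, not here.

```python
from math import log, sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length numeric lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))


def rmse(pred, obs):
    """Root mean squared error between predicted and observed values."""
    return sqrt(sum((p - o) ** 2 for p, o in zip(pred, obs)) / len(pred))


def kl_divergence(p, q):
    """KL(p || q) for two proportion vectors; q must be > 0 wherever p > 0."""
    return sum(pi * log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def to_proportions(counts):
    """Rescale non-negative counts so that they sum to 1."""
    total = sum(counts)
    return [c / total for c in counts]


# Spot-level toy example (hypothetical): counts of one cell type in 5 spots.
image_counts = [3, 0, 2, 5, 1]   # orthogonal, image-derived counts
predicted = [2, 1, 2, 4, 1]      # counts predicted by a deconvolution tool
spot_cor = pearson(predicted, image_counts)
spot_rmse = rmse(predicted, image_counts)

# Overall toy example (hypothetical): total counts for 3 cell types.
image_totals = [120, 60, 20]
predicted_totals = [100, 70, 30]
overall_kl = kl_divergence(to_proportions(image_totals),
                           to_proportions(predicted_totals))
```

Higher correlation is better, while lower RMSE and lower KL divergence are better, which is why no single row of the table ranks the three tools on its own.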

Slide 60

Slide 60 text

Viewing Spot Deconvolution Results: Samui Browser
- View:
  - Fluorescence channels
  - Spot deconvo results
  - Segmented cells
  - Gene expression
- Interactive
  - Quickly zoom/scroll
  - Full-resolution images
samuibrowser.com/from?url=data2.loopybrowser.com/VisiumIF/&s=Br2720_Ant_IF&s=Br6432_Ant_IF&s=Br6522_Ant_IF&s=Br8667_Post_IF
Sriworarat, 2023.

Slide 61

Slide 61 text

Viewing Spot Deconvolution Results: spatialLIBD apps
- View:
  - spot deconvolution results
  - spatial domains/clusters
  - gene expression
- Huge amount of aesthetic customization
https://libd.shinyapps.io/spatialDLPFC_Visium_SPG/

Slide 62

Slide 62 text

How Spot Deconvolution Results Were Used
A. Better characterize unsupervised spatial domains
B. Cell-cell communication; cell-type-informed ligand-receptor interactions in the context of schizophrenia risk
Boyi Guo
Melissa Grant-Peters

Slide 63

Slide 63 text

Visium spatial clustering works for variables with high % variance explained. But what about other ones? DOI: 10.1038/s41593-020-00787-0

Slide 64

Slide 64 text

twitter.com/sanghokwon17/status/1650589385379962881 from 2023-04-24 Sang Ho Kwon @sanghokwon17 DOI: 10.1101/2023.04.20.537710 #Visium_SPG_AD

Slide 65

Slide 65 text

Experimental design & study overview Braak V-VI & CERAD frequent Sang Ho Kwon

Slide 66

Slide 66 text

AD pathology signal is too small to detect by spatially-resolved gene expression alone research.libd.org/Visium_SPG_AD/

Slide 67

Slide 67 text

Identifying transcriptional signatures of AD-related neuropathology Sang Ho Kwon

Slide 68

Slide 68 text

Some challenges ⚠ with Visium

Slide 69

Slide 69 text

sc/snRNA-seq QC metrics, such as the number of detected genes, the number of UMIs, and the percentage of mitochondrial expression, are likely biologically related!

Slide 70

Slide 70 text

Prashanthi Ravichandran @prashanthi-ravichandran (GH) Artifacts in general are normalized away by library size, though there are caveats
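A minimal sketch of library-size normalization follows, to show why an artifact that only scales the total counts of a spot disappears after normalization. The log2 counts-per-million form, the pseudocount, and the toy counts are assumptions for illustration and not necessarily the normalization used in these analyses.

```python
from math import log2


def log_cpm(counts, scale=1e6, pseudocount=1.0):
    """Divide each gene count by the spot's library size, rescale, and log-transform.

    counts: dict of gene -> UMI count for a single spot.
    """
    library_size = sum(counts.values())
    return {gene: log2(c / library_size * scale + pseudocount)
            for gene, c in counts.items()}


# Two hypothetical spots with the same composition but a 10x depth difference.
spot_low = {"gene1": 10, "gene2": 30}
spot_high = {"gene1": 100, "gene2": 300}
norm_low = log_cpm(spot_low)
norm_high = log_cpm(spot_high)
```

This only corrects for differences in total counts per spot; an artifact that changes which genes are captured, rather than how many molecules, would not be removed by it.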

Slide 71

Slide 71 text

Diffusion issues: could be related to permeabilization step

Slide 72

Slide 72 text

Having more data is useful to provide context! Here 4 new samples have low sequencing saturation (outliers) but are within range of good samples from other studies

Slide 73

Slide 73 text

Having more data is useful to provide context! Those 4 samples have great median UMI counts per spot ^_^

Slide 74

Slide 74 text

Software keeps evolving and as leaders in the field we aim to use the best methods
Moses, L., Pachter, L. Museum of spatial transcriptomics. Nat Methods 19, 534–546 (2022). https://doi.org/10.1038/s41592-022-01409-2

Slide 75

Slide 75 text

The Development Process
- Making a module
- New, experimental software can change dramatically (function and syntax) between versions
- Promotes collaboration by allowing two researchers to share exact code and instantly run software without special set-up
SpatialExperiment release 3.14
SpatialExperiment devel 3.15
module load tangram/1.0.2
module load cell2location/0.8a0
module load spagcn/1.2.0
https://github.com/LieberInstitute/jhpce_mod_source
https://github.com/LieberInstitute/jhpce_module_config
Nicholas J Eagles @Nick-Eagles (GitHub)

Slide 76

Slide 76 text

The Development Process
- Regular interaction with software authors to clarify functionality and report bugs
- Documentation for code and author responsiveness on GitHub can be critical in successfully applying software to our data
Nicholas J Eagles @Nick-Eagles (GitHub)

Slide 77

Slide 77 text

Documentation + wrapper functions + tests (GitHub Actions + Bioconductor)
bioconductor.org/packages/spatialLIBD

Slide 78

Slide 78 text

More challenges ahead
Working with multiple capture areas per tissue
Spot diameter error: ~1.8 → ~1.1
Another pair: ~2.8 → ~0.76
Nicholas J Eagles @Nick-Eagles (GitHub)
Prashanthi Ravichandran @prashanthi-ravichandran (GH)

Slide 79

Slide 79 text

lcolladotor.github.io/#projects
● Every assay has caveats
● We re-use tricks: think adding 0, multiplying by 1
● It nearly always takes a team
● Data sharing accelerates science + democratizes access to it
● Zooming in allows us to reduce the heterogeneity
● We can learn from each other: from uniformly processing our data & re-using it → replicate / validate?

Slide 80

Slide 80 text

Teams: Image Analysis, Expression Analysis, Data Generation
Madhavi Tippani @MadhaviTippani
Heena R Divecha @HeenaDivecha
Lukas M Weber @lmwebr
Stephanie C Hicks @stephaniehicks
Abby Spangler @abspangler
Keri Martinowich @martinowk
Stephanie C Page @CerceoPage
Kristen R Maynard @kr_maynard
Leonardo Collado-Torres @lcolladotor
Nicholas J Eagles @Nick-Eagles (GH)
Kelsey D Montgomery
Sang Ho Kwon
Thomas M Hyde
Louise A Huuki-Myers @lahuuki
Boyi Guo @BoyiGuo
Matthew N Tran @mattntran
Sowmya Parthiban @sowmyapartybun
Melissa Grant-Peters @mgrantpeters
Prashanthi Ravichandran @prashanthi-ravichandran (GH)
+ Many more LIBD, JHU, and external collaborators
Slides available at speakerdeck.com/lcolladotor

Slide 81

Slide 81 text

lcolladotor.github.io @lcolladotor