present and preparing the future Psychgenomics: Icahn School of Medicine Leonardo Collado Torres @lcolladotor 2022-03-03 https://lcolladotor.github.io/bioc_team_ds/
Huuki-Myers @kr_maynard Kristen R Maynard @stephaniehicks Stephanie C Hicks @martinowk Keri Martinowich NIH grant R01MH123183 @CerceoPage Stephanie C Page
made of complex tissues consisting of different types of cells
• Some Dx associated with changes in cell type specific expression
  ◦ Ex. Pitt-Hopkins syndrome and oligodendrocytes (Phan et al, Nature Neuroscience, 2020)
bulk RNA-seq data
What is Deconvolution?
• Get single cell like data from bulk RNA-seq
[Figure labels: Tissue, Bulk RNA-seq, snRNA-seq, Deconvolution, Estimated proportions; cost labels: $$$, $, Free!]
https://twitter.com/BoXia7/status/1261464021322137600
can differ in cell type composition due to biology or dissection
• Different cell types express genes at different levels
• Check for differences between groups (e.g. case vs. control)
• Controlling for cell fractions between samples can make differential expression analysis cleaner
  ◦ Cell type composition is a confounding factor in differential expression analysis
  ◦ Deconvoluting data can prevent false positives and false negatives
Data selection
  ◦ Region specific or all available data
  ◦ Cell type resolution: Broad vs. specific
  ◦ NeuN + non-NeuN sorted
• Marker Gene Selection
  ◦ How to find best markers?
  ◦ How many markers?
• Validating Results
  ◦ What do we expect to find?
In the Field
• Differences between snRNA-seq and bulk RNA-seq
  ◦ Nuclear vs. cytoplasmic compartments
  ◦ Missing cell types in reference
• Cell sizes
• RNA-fraction vs. Cell fraction
Sosina et al, F1000Research, 2021
Future Work: Maynard + Hicks R01MH123183
specific expression profiles, regress out optimal cell proportions from bulk data
  ◦ Examples: CIBERSORT, OLS, nnls, RLR, deconvSeq
• Single Cell RNA-seq reference based
  ◦ Use an annotated single cell expression data set as reference
  ◦ Examples: DWLS, MuSiC, Bisque, SCDC, SPLITR
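The regression idea behind reference-based methods can be sketched with plain non-negative least squares. This is a minimal illustration, not the implementation used by MuSiC, Bisque, SCDC, or DWLS (those add weighting or transformation steps). The reference matrix, the cell type names, and the function `deconvolve_nnls` are hypothetical and made up for this example; the only library call assumed is `scipy.optimize.nnls`.

```python
import numpy as np
from scipy.optimize import nnls

# Hypothetical reference: rows are marker genes, columns are cell types.
# Each column is the mean expression profile of one cell type.
cell_types = ["Excit", "Inhib", "Oligo"]
reference = np.array([
    [10.0, 1.0, 0.5],
    [1.0, 8.0, 0.5],
    [0.5, 0.5, 12.0],
    [6.0, 5.0, 1.0],
])

# Simulated bulk sample: a known mixture of the reference profiles.
true_props = np.array([0.5, 0.2, 0.3])
bulk = reference @ true_props


def deconvolve_nnls(reference, bulk):
    """Estimate proportions by solving min ||reference @ x - bulk|| with x >= 0."""
    coef, _residual_norm = nnls(reference, bulk)
    # Rescale the non-negative coefficients so they sum to 1.
    return coef / coef.sum()


estimated = deconvolve_nnls(reference, bulk)
for name, prop in zip(cell_types, estimated):
    print(f"{name}: {prop:.3f}")
```

Because the simulated bulk sample is an exact mixture of the reference profiles, the estimate recovers the mixing proportions; real bulk data adds noise and missing cell types, which is where the methods above differ.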
MuSiC | Wang et al, Nature Communications, 2019 | W-NNLS regression (weighted non-negative least squares) | None | Tree guided deconvolution, good for closely related cell types
Bisque | Jew et al, Nature Communications, 2020 | NNLS regression | Gene specific transformation of bulk data | Leverage overlapping bulk & sc data
SCDC | Dong et al, Briefings in Bioinformatics, 2020 | W-NNLS framework proposed by MuSiC | Option for gene specific transformation of bulk data (from Bisque) | Multiple reference datasets can be used, results combined with ENSEMBLE weights
DWLS | Tsoucas et al, Nature Communications, 2019 | Dampened weighted least squares | None |
vs. Bulk | Tissues Tested | Consider Cell Size | Reference Set
MuSiC | W-NNLS | Min. | Internal Weighting | No | Pancreatic Islet, Rat & Mouse Kidney | Yes |
Bisque | NNLS | Min. | No | Yes | Adipose, DLPFC | | Recommend 3+ donors
SCDC | W-NNLS | Min. | Internal Weighting | Yes | Pancreatic Islet, mouse mammary | | Can input multiple references
DWLS | DWLS | Hours | Internal Selection | No | Mouse kidney, lung, liver, small intestine | |
expressed between cell types
• Historically
  ◦ Known markers associated with key cell types
  ◦ Ex. MBP: major constituent of the myelin sheath, marker for oligodendrocytes
• What does the data tell us?
  ◦ Human vs. model organisms
  ◦ Regional
  ◦ Technical differences
and bulk data
• Looking for genes expressed in only one cell type
  ◦ Test for specificity of each gene for each cell type
• Observe expression of selected marker genes
  ◦ Heat maps of pseudo-bulked data
    ▪ Summation of counts from nuclei from one donor + cell type
  ◦ Violin plots by cell type
Marker Genes shared by sn & bulk RNA-seq
[Figure: The Ideal Heatmap. snRNA-seq data, pseudo-bulked by cell type + donor]
Stephanie C Hicks
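Pseudo-bulking as described here (summing counts over nuclei that share a donor and a cell type) can be sketched in a few lines. The toy nuclei, donor names, and the function `pseudobulk` are hypothetical; a real analysis would operate on a full count matrix rather than small dictionaries.

```python
from collections import defaultdict

# Hypothetical nuclei: (donor, cell type, {gene: count}).
nuclei = [
    ("donor1", "Oligo", {"MBP": 5, "GAD1": 0}),
    ("donor1", "Oligo", {"MBP": 7, "GAD1": 0}),
    ("donor1", "Inhib", {"MBP": 0, "GAD1": 4}),
    ("donor2", "Oligo", {"MBP": 3, "GAD1": 1}),
]


def pseudobulk(nuclei):
    """Sum counts over all nuclei sharing a (donor, cell type) pair."""
    totals = defaultdict(lambda: defaultdict(int))
    for donor, cell_type, counts in nuclei:
        for gene, count in counts.items():
            totals[(donor, cell_type)][gene] += count
    # Convert nested defaultdicts to plain dicts for easier inspection.
    return {key: dict(genes) for key, genes in totals.items()}


pb = pseudobulk(nuclei)
for (donor, cell_type), genes in sorted(pb.items()):
    print(donor, cell_type, genes)
```

Each resulting (donor, cell type) entry corresponds to one column of the heatmaps described above.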
cell type vs. all other cells
• Select genes with highest fold change
• Observed lots of noise between different cell types
[Figure: The Actual Heatmap. snRNA-seq sACC, pseudo-bulked by cell type + donor]
cell type / Mean Expression highest non-target cell type = Mean Ratio
Higher mean ratio:
• the more specific that gene is to the target cell type
• the better a marker gene it is
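As a rough sketch of the mean ratio, assuming the numerator is the mean expression in the target cell type: divide it by the mean expression in the non-target cell type with the highest mean. The expression values, the gene name `GENE_X`, and the function `mean_ratio` are made up for illustration and are not taken from the talk's data.

```python
from statistics import mean

# Hypothetical per-nucleus expression values, grouped by cell type.
expression = {
    "MBP": {
        "Oligo": [8.0, 9.0, 10.0],
        "Excit": [1.0, 1.0, 1.0],
        "Inhib": [0.5, 0.5, 0.5],
    },
    "GENE_X": {
        "Oligo": [8.0, 9.0, 10.0],
        "Excit": [6.0, 6.0, 6.0],
        "Inhib": [0.1, 0.1, 0.1],
    },
}


def mean_ratio(gene_expr, target):
    """Mean expression in the target divided by the highest non-target mean."""
    target_mean = mean(gene_expr[target])
    highest_other = max(
        mean(values) for cell_type, values in gene_expr.items()
        if cell_type != target
    )
    if highest_other == 0:
        # Gene is not expressed outside the target cell type.
        return float("inf")
    return target_mean / highest_other


for gene, gene_expr in expression.items():
    print(gene, mean_ratio(gene_expr, "Oligo"))
```

In this toy data both genes have the same mean in Oligo, but `GENE_X` is also well expressed in Excit, so its mean ratio is much lower than that of MBP.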
ratio also have high fold changes
• Not all genes with high fold changes have high mean ratios
• Selecting marker genes by mean ratio helps avoid “noisy” genes
in the “worst” cell type
  ◦ Least amount of signal
  ◦ Balance overfitting vs. adding noise
  ◦ Looking at Inhib: we chose 25 markers
• Same number for each cell type
• Typical methods [ findMarkers() ] are useful for finding cell identity markers, not necessarily good cell decomposition markers
specific cell types
• We are looking for genes with big differences between cell types, which is why it’s easier to work at the broad cell type resolution
• Robust to marker set
• Robust to library prep
• More reasonable estimates on GTEx dataset
• With final reference dataset (8 donors) we have more than the 4 donors recommended
• Deconvolute three large human brain datasets
  ◦ GTEx, MAYO, ROSMAP
• Introduce new deconvolution algorithm: SPLITR
• 48 donor reference set
• Validate with previous composition estimates + immunostaining
• Variation across brain regions
Figure 2
vs. SPLITR
• MuSiC predicts large proportions of Endo + Mural (Peric)
• Both estimate lower proportions of Excit
  ◦ MuSiC is more extreme and also predicts a low proportion of Inhib
Bisque & MuSiC vs SPLITR
Different deconvolution methods, bulk RNA-seq data source, marker genes, and reference snRNA-seq data
different cell sizes and transcriptional activity
  ◦ Ex. Neurons are more active than Glia
  ◦ Can only predict the fraction of RNA, not the fraction of cells in tissue
  ◦ Sosina et al., F1000Research, 2021
[Figure labels: MDDseq, Bisque]
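A small worked example may help show why RNA fractions and cell fractions differ. It assumes the average RNA content per cell is known for each cell type, which is exactly the quantity that is hard to measure; the numbers below and the function `rna_to_cell_fractions` are hypothetical.

```python
def rna_to_cell_fractions(rna_fractions, rna_per_cell):
    """Convert RNA fractions to cell fractions given RNA content per cell.

    A cell type's share of cells is proportional to its share of RNA
    divided by how much RNA each of its cells contributes.
    """
    weights = {
        cell_type: rna_fractions[cell_type] / rna_per_cell[cell_type]
        for cell_type in rna_fractions
    }
    total = sum(weights.values())
    return {cell_type: w / total for cell_type, w in weights.items()}


# Hypothetical tissue: neurons carry twice as much RNA per cell as glia.
rna_fractions = {"Neuron": 2 / 3, "Glia": 1 / 3}
rna_per_cell = {"Neuron": 2.0, "Glia": 1.0}

cell_fractions = rna_to_cell_fractions(rna_fractions, rna_per_cell)
print(cell_fractions)
```

In this toy case neurons contribute two thirds of the RNA while making up only half of the cells, so reading the RNA fraction as a cell fraction would overestimate the neuronal proportion.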
  ◦ Kristen Maynard, Stephanie C Hicks
• Use six slices of DLPFC to generate corresponding RNA-seq & RNAscope data
• This information will be useful to evaluate and design deconvolution algorithms
[Figure labels: DLPFC, Bulk RNA-seq, snRNA-seq, Spatial, RNAscope, polyA, RiboZero]
and RNA content between cell types
• Use smFISH with RNAscope to establish data set of:
  ◦ Cellular composition
  ◦ Nuclei sizes of major cell types
  ◦ Average nuclei RNA content of major cell types
How do we measure total RNA content of a cell if we can only observe a few genes at a time?
Use a TREG
Expression is proportional to the overall RNA expression in a cell
• In smFISH the count of TREG puncta in a cell can estimate the RNA content
  ◦ Linking RNA content to nucleus size
• Expression tracks with total expression (should vary)
• Involved in basic cellular function
• Consistent across tissues
• Expressed evenly across tissue
• Expressed in every cell
TREGs
We will compare candidate TREGs to the housekeeping gene POLR2A to examine differences
across tissues and cell types
• Expressed at a constant level with respect to other genes across different cell types
RNAscope
• Expressed in top 50% of genes for easy detection
• Show a dynamic range of puncta*
• Individual puncta are countable*
* Will need to be evaluated experimentally
snRNA-seq
  ◦ Tran, Maynard et al., Neuron, 2021
  ◦ Eight donors, 5 Brain Regions
  ◦ Nine broad cell types
• smFISH with RNAscope in DLPFC tissue
  ◦ 3 tissue sections from one donor
Method Overview
1. Filter to top 50% expressed genes
2. Filter out genes with high proportion of zero expression
3. Select genes with high Rank Invariance as candidate TREGs
4. Evaluate candidate gene performance in RNAscope experiment
@mattntran Matthew N Tran
Kelsey D Montgomery
Sang Ho Kwon
genes in dataset
• Compute Proportion Zero
• Used distribution to pick 0.75 as max Proportion Zero cutoff
  ◦ For a gene to pass, its maximum proportion of zeros among groups must be less than the cutoff
• Only 877 (3.8%) of genes pass
  ◦ POLR2A doesn’t pass
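The Proportion Zero filter can be sketched as follows, using the 0.75 cutoff from the slide. The gene names, counts, and the helper functions `proportion_zero` and `passes_prop_zero` are hypothetical and only illustrate the rule; they are not the functions of the TREG package.

```python
def proportion_zero(counts):
    """Fraction of nuclei in which the gene has a count of zero."""
    return sum(1 for c in counts if c == 0) / len(counts)


def passes_prop_zero(counts_by_group, cutoff=0.75):
    """A gene passes if its worst (maximum) group proportion is below the cutoff."""
    worst = max(proportion_zero(counts) for counts in counts_by_group.values())
    return worst < cutoff


# Hypothetical counts per nucleus, grouped by cell type.
genes = {
    "GENE_A": {"Excit": [3, 0, 5, 2], "Oligo": [1, 2, 0, 4]},
    "GENE_B": {"Excit": [4, 6, 1, 2], "Oligo": [0, 0, 0, 0]},
}

kept = [gene for gene, groups in genes.items() if passes_prop_zero(groups)]
print(kept)
```

Taking the maximum over groups means a gene is dropped if it is mostly absent from even one cell type, which `GENE_B` illustrates: it is well expressed in Excit but never detected in Oligo.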
Rank of a gene within a cell type? Between cell types?
• Expressed at a constant level with respect to other genes = High Rank Invariance
  ◦ Evaluate within and between cell types
[Figure axis: Expression Rank]
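One simplified way to picture rank invariance: rank the genes inside every cell, rank them by mean expression inside every cell type, and add up how far each gene's per-cell rank strays from its cell type rank. This is only a sketch of the idea and is not claimed to match the exact procedure in the TREG package; the data, gene names, and the functions `ranks` and `rank_deviation` are hypothetical. A lower total deviation corresponds to higher rank invariance.

```python
def ranks(values):
    """Rank of each value, 1 = smallest (ties broken by position)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, idx in enumerate(order, start=1):
        result[idx] = rank
    return result


def rank_deviation(expr_by_group, genes):
    """Sum of |rank in cell - rank in group| per gene, over all cells."""
    deviation = {gene: 0 for gene in genes}
    for cells in expr_by_group.values():
        n_cells = len(cells)
        # Rank genes by their mean expression within this cell type.
        group_means = [
            sum(cell[j] for cell in cells) / n_cells for j in range(len(genes))
        ]
        group_rank = ranks(group_means)
        # Compare each cell's gene ranks against the cell type's ranks.
        for cell in cells:
            for j, cell_rank in enumerate(ranks(cell)):
                deviation[genes[j]] += abs(cell_rank - group_rank[j])
    return deviation


genes = ["GENE_A", "GENE_B", "GENE_C"]
# Hypothetical expression: each inner list is one cell, aligned to `genes`.
expr_by_group = {
    "Excit": [[10, 5, 1], [12, 2, 6], [11, 7, 3]],
    "Oligo": [[8, 1, 4], [9, 6, 2]],
}

deviation = rank_deviation(expr_by_group, genes)
most_invariant = min(deviation, key=deviation.get)
print(deviation, most_invariant)
```

Here `GENE_A` keeps the same rank in every cell of both cell types, so it has zero deviation and would be the strongest candidate in this toy example.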
candidate TREGs
  ◦ In top 10 RI values, and have available probes
• TREGs show consistent Expression Ranks within all cells and within most cell types
[Figure axes: Expression Rank]
slides
  ◦ 90k-100k per sample
• Some QC was required to remove out-of-focus regions
• HALO data tracks really well with what we see visually in the slides and the expected anatomy of the tissue
  ◦ Neurons are larger
  ◦ Neurons are transcriptionally more active
  ◦ Excit, Inhib and Oligo are where you would expect them to be
Typo** GAD1
cells
• AKT3 tracks really well with pattern of expression seen in snRNA-seq (ARID1B is also pretty good)
[Figure panels: snRNA-seq, RNAscope]
Gene | Mean Prop. Cells with Expression | Prop. non-zero in DLPFC snRNA | Standardized β (95% CI)
AKT3 | 0.948 | 0.92 | -1.38 (-1.39, -1.37)
ARID1B | 0.908 | 0.94 | -0.62 (-0.62, -0.61)
MALAT1 | 0.910 | 1.00 | -0.11 (-0.12, -0.11)
POLR2A | 0.853 | 0.30 | -0.98 (-0.99, -0.98)
snRNA-seq | NA | NA | -1.33 (-1.35, -1.31)
Remember: MALAT1’s puncta data is unreliable
of candidate TREGs in snRNA-seq
• AKT3 appears to be a TREG compatible with RNAscope in the human brain
• TREGs would allow the observation of total RNA expression via RNAscope
  ◦ Allow us to collect data that can be used to improve deconvolution algorithms
over MuSiC for postmortem human brain data
• You might want to use the mean ratio method for selecting snRNA-seq cell type marker genes
• You’ll benefit from having more than 3 donors (we had 8) in your snRNA-seq data
• Plan to generate RNAscope data for some major cell types using adjacent tissue dissections: could be a gold standard
• Use a TREG (AKT3) in your RNAscope to estimate nuclei RNA and size: could be useful input parameters in future deconvolution algorithms
https://research.libd.org/DeconvoBuddies/
http://research.libd.org/TREG/
Weber @stephaniehicks Stephanie C Hicks @abspangler Abby Spangler @martinowk Keri Martinowich @CerceoPage Stephanie C Page @kr_maynard Kristen R Maynard @lcolladotor Leonardo Collado-Torres @Nick-Eagles (GH) Nicholas J Eagles Kelsey D Montgomery Sang Ho Kwon Image Analysis Expression Analysis Data Generation Thomas M Hyde @lahuuki Louise A Huuki-Myers @JoshStolz2 Joshua M Stolz Moods Workgroup Fernando Goes Patricia Braun Peter Zandi Thomas Hyde Joel Kleinman Christopher Ross Shizhong Han @mattntran Matthew N Tran @sowmyapartybun Sowmya Parthiban https://www.libd.org/careers/