Estimating cell type composition in whole blood using differentially methylated regions
Stephanie Hicks
Assistant Professor, Biostatistics
Johns Hopkins Bloomberg School of Public Health
DNA methylation in whole blood correlates with age at this one CpG
[Figure: methylation (y-axis) versus age (x-axis) at one CpG; data from GSE32148]
Slide courtesy of A. Jaffe and R. Irizarry
Blood is a mixture of many cell types
[Figure: heatmap of reference methylation with CpGs as rows and cell types as columns (NK, CD8T, CD4T, Gran, Bcell, Mono; six samples each)]
Whole blood cell types:
• T cells
  • CD8T
  • CD4T
• Natural killer (NK) cells
• B cells
• Granulocytes
• Monocytes
Bioconductor data package available:
• Data originally from Reinius et al. (2012)
> library(FlowSorted.Blood.450k)
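The heatmap reflects CpGs whose methylation distinguishes one cell type from the others. A minimal sketch of one way to pick such CpGs from a purified reference matrix (for example, one built from the FlowSorted.Blood.450k samples) ranks CpGs by the absolute difference between the mean in one cell type and the mean in all other cell types. This mean-difference heuristic is an assumption for illustration; published pipelines typically rank CpGs with t- or F-statistics. The function `select_specific_cpgs` and the toy matrix are hypothetical.

```python
import numpy as np

def select_specific_cpgs(ref, cell_types, n_per_type=2):
    """Pick CpGs whose mean methylation in one cell type differs most from the rest.

    ref: (J, S) reference methylation matrix (CpGs x purified samples)
    cell_types: length-S sequence of cell type labels for the columns of ref
    Returns a dict mapping each cell type to row indices of its selected CpGs.
    """
    labels = np.asarray(cell_types)
    selected = {}
    for ct in np.unique(labels):
        in_type = ref[:, labels == ct].mean(axis=1)
        rest = ref[:, labels != ct].mean(axis=1)
        # Large absolute difference: hypo- or hypermethylated in this cell type
        diff = np.abs(in_type - rest)
        selected[str(ct)] = np.argsort(diff)[::-1][:n_per_type].tolist()
    return selected

# Hypothetical toy reference: 4 CpGs x 6 purified samples (2 per cell type)
ref = np.array([
    [0.9, 0.9, 0.1, 0.1, 0.1, 0.1],  # methylated only in NK
    [0.1, 0.1, 0.9, 0.9, 0.1, 0.1],  # methylated only in Gran
    [0.1, 0.1, 0.1, 0.1, 0.9, 0.9],  # methylated only in Mono
    [0.5] * 6,                       # uninformative
])
labels = ["NK", "NK", "Gran", "Gran", "Mono", "Mono"]
print(select_specific_cpgs(ref, labels, n_per_type=1))
```

Each cell type picks the single CpG that is methylated only in that cell type, while the uninformative row is never chosen.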
Cell composition changes with age
• Different cell compositions in whole blood imply different observed whole blood DNA methylation profiles
• It is therefore important to estimate differences in cell composition across samples
Jaffe and Irizarry (2014). Genome Biology
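A small worked example of the first bullet: at a single CpG, the whole blood measurement is a proportion-weighted average of the cell type levels, so two samples with identical cell type profiles but different compositions give different observed values. The two-cell-type split and all numbers below are hypothetical.

```python
def whole_blood_methylation(proportions, profiles):
    """Observed methylation at one CpG as a proportion-weighted mix of cell types."""
    return sum(p * m for p, m in zip(proportions, profiles))

# Hypothetical levels at one CpG: granulocytes mostly methylated,
# lymphocytes mostly unmethylated.
profiles = [0.9, 0.1]   # [granulocytes, lymphocytes]
sample_a = [0.6, 0.4]   # hypothetical cell proportions
sample_b = [0.7, 0.3]

print(whole_blood_methylation(sample_a, profiles))  # about 0.58
print(whole_blood_methylation(sample_b, profiles))  # about 0.66
```

A 10-point shift in granulocyte proportion alone moves the observed methylation by 0.08, with no change in any cell's methylation.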
Statistical model: Houseman et al. (2012)
$$ Y_{ij} = \sum_{k=1}^{K} \pi_{ik} X_{jk} + \varepsilon_{ij} $$
or, in matrix form for whole blood sample i,
$$ Y_i = X \pi_i + E_i, \qquad Y_i \in \mathbb{R}^{J \times 1},\; X \in \mathbb{R}^{J \times K},\; \pi_i \in \mathbb{R}^{K \times 1},\; E_i \in \mathbb{R}^{J \times 1} $$
• Y_i: observed methylation at J CpGs in whole blood sample i
• X: J × K matrix of cell type profiles (one column per cell type: NK, CD8T, CD4T, Gran, Bcell, Mono)
• π_i: relative cell type proportions
• E_i (entries ε_ij): measurement error
• i = 1, …, N whole blood samples; j = 1, …, J CpGs; k = 1, …, K cell type profiles
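Given reference profiles X, the model above is a regression of each sample's methylation on the cell type profiles. Houseman et al. (2012) solve it as a constrained least squares problem (proportions non-negative and summing to at most one). The sketch below is a simplified stand-in that keeps only the non-negativity constraint, using `scipy.optimize.nnls`; the function name `estimate_proportions` and the simulated data are hypothetical.

```python
import numpy as np
from scipy.optimize import nnls

def estimate_proportions(X, Y):
    """Estimate cell type proportions for each whole blood sample.

    X: (J, K) reference methylation profiles (CpGs x cell types)
    Y: (J, N) observed whole blood methylation (CpGs x samples)
    Returns an (N, K) array of non-negative proportion estimates.
    """
    N = Y.shape[1]
    K = X.shape[1]
    pi_hat = np.zeros((N, K))
    for i in range(N):
        # Solve min ||X pi - Y[:, i]||^2 subject to pi >= 0
        pi_hat[i], _ = nnls(X, Y[:, i])
    return pi_hat

# Simulate data from the model Y_i = X pi_i + E_i
rng = np.random.default_rng(0)
J, K, N = 200, 6, 3
X = rng.uniform(0, 1, size=(J, K))
pi_true = rng.dirichlet(np.ones(K), size=N)   # (N, K), rows sum to 1
Y = X @ pi_true.T + rng.normal(0, 0.01, size=(J, N))

pi_hat = estimate_proportions(X, Y)
print(np.round(pi_hat, 3))
print(np.round(pi_true, 3))
```

Adding the sum-to-at-most-one constraint would require a quadratic programming solver rather than NNLS.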
New platform technologies are emerging
First approach:
• Apply the Houseman method using the new platform technology
Problems with this approach:
1. Not all CpGs are included in new platforms
2. Observed methylation levels depend on the platform used
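To make problem 2 concrete, here is a toy calculation with two CpGs and two cell types under a made-up platform effect (new level = 0.1 + 0.8 × old level). Reference profiles measured on one platform are combined with whole blood data reported by another, and the proportions are solved exactly with `np.linalg.solve`. The distortion and all numbers are hypothetical.

```python
import numpy as np

# Hypothetical reference profiles measured on the original platform:
# rows are 2 CpGs, columns are 2 cell types.
X_ref = np.array([[0.9, 0.1],
                  [0.1, 0.9]])
pi_true = np.array([0.7, 0.3])

# Whole blood methylation as the original platform would report it.
y_ref_platform = X_ref @ pi_true

# Hypothetical platform effect: the new platform compresses levels toward 0.5.
y_new_platform = 0.1 + 0.8 * y_ref_platform

# Naively estimate proportions with the original platform's reference profiles.
pi_naive = np.linalg.solve(X_ref, y_new_platform)
print(pi_true, pi_naive)
```

The naive estimate is about 0.66 and 0.34 instead of 0.7 and 0.3: compressing methylation levels toward 0.5 shrinks the estimated proportions toward equal mixing. Problem 1 would further restrict X to the rows for CpGs measured on both platforms.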
Platform-dependent differences between 450k array and RRBS platforms
[Figure: density of methylation levels (0 to 1) in regions that are not methylated versus methylated, shown separately for the 450k and RRBS platforms]
Estimation
Use informative genomic regions that are clearly methylated or unmethylated for each cell type:
1. Initialize parameter values
   $$ \theta_i^{(0)} = \left( \pi_{i1}^{(0)}, \pi_{i2}^{(0)}, \ldots, \pi_{iK}^{(0)}, \alpha_0^{(0)}, \alpha_1^{(0)}, (\sigma_0^2)^{(0)}, (\sigma_1^2)^{(0)}, (\sigma^2)^{(0)} \right) $$
2. Use the EM algorithm for estimation
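The parameter vector includes α0, α1, σ0² and σ1², assumed here to be the mean levels and variances of unmethylated and methylated regions. As a hedged illustration of steps 1 and 2 only, the sketch below runs EM on a simpler, hypothetical problem: a two-component Gaussian mixture of region-level methylation values. It is not the methylCC likelihood, which also involves the cell type proportions π_ik and the measurement error σ². The function `em_two_component` and the simulated values are hypothetical.

```python
import numpy as np

def normal_pdf(x, mean, var):
    return np.exp(-(x - mean) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)

def em_two_component(y, alpha0, alpha1, var0, var1, w1=0.5, n_iter=200, tol=1e-8):
    """EM for a two-component Gaussian mixture of region methylation levels.

    Component 0 (assumed unmethylated): mean alpha0, variance var0.
    Component 1 (assumed methylated): mean alpha1, variance var1.
    w1 is the probability that a region is methylated.
    The arguments are the initial values (step 1).
    """
    y = np.asarray(y, dtype=float)
    prev_ll = -np.inf
    for _ in range(n_iter):
        # E-step: posterior probability that each region is methylated
        p0 = (1 - w1) * normal_pdf(y, alpha0, var0)
        p1 = w1 * normal_pdf(y, alpha1, var1)
        gamma = p1 / (p0 + p1)
        # Observed-data log-likelihood at the parameters used in this E-step
        ll = np.sum(np.log(p0 + p1))
        # M-step: weighted means, variances, and mixing weight
        w1 = gamma.mean()
        alpha0 = np.sum((1 - gamma) * y) / np.sum(1 - gamma)
        alpha1 = np.sum(gamma * y) / np.sum(gamma)
        var0 = np.sum((1 - gamma) * (y - alpha0) ** 2) / np.sum(1 - gamma)
        var1 = np.sum(gamma * (y - alpha1) ** 2) / np.sum(gamma)
        # Stop once the log-likelihood stops improving
        if ll - prev_ll < tol:
            break
        prev_ll = ll
    return alpha0, alpha1, var0, var1, w1

# Hypothetical data: 300 unmethylated and 200 methylated regions
rng = np.random.default_rng(1)
y = np.concatenate([rng.normal(0.1, 0.05, 300), rng.normal(0.8, 0.08, 200)])
alpha0, alpha1, var0, var1, w1 = em_two_component(
    y, alpha0=0.3, alpha1=0.6, var0=0.05, var1=0.05
)
print(alpha0, alpha1, var0, var1, w1)
```

Because the platform-specific means and variances are estimated from the data rather than fixed, the same code can fit 450k-like or RRBS-like distributions; the toy Gaussian components are not restricted to [0, 1].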
Constructing the likelihood
• Define the complete-data vector and the complete-data likelihood
• Then we just need the conditional distributions
Indices:
• i = 1, …, N whole blood samples
• r = 1, …, R differentially methylated regions
• k = 1, …, K cell types
For more information
• methylCC: https://github.com/stephaniehicks/methylCC
• Pre-print on bioRxiv: https://www.biorxiv.org/content/early/2017/11/03/213769
• Comments/suggestions: GitHub & Twitter @stephaniehicks