methylCC presentation

Stephanie Hicks

April 24, 2018

Transcript

  1. Estimating cell type composition
    in whole blood using
    differentially methylated regions
    Stephanie Hicks
    Assistant Professor, Biostatistics
    Johns Hopkins Bloomberg School of Public Health

  2. What is DNA Methylation?
    ATCGCGTTACTGCGGAA
    TAGCGCAATGTCGCCTT
    (Figure: six methyl marks, "m", on the sequence.)

  3. What is DNA Methylation?
    ATCGCGTTACTGCGGAA
    TAGCGCAATGTCGCCTT
    (Figure: three methyl marks, "m", on the sequence.)

  4. What is DNA Methylation?
    ATCGCGTTACTGCGGAA
    TAGCGCAATGTCGCCTT
    (Figure: three methyl marks, "m", on the sequence.)

  5. DNA methylation in whole blood
    correlates with age at this one CpG
    (Figure: "Methylation", ticks 0.02 to 0.10, plotted against "Age", ticks 20 to 70. Data from GSE32148.)
    Slide courtesy of A. Jaffe and R. Irizarry

  6. Blood is a mixture of many cell types
    (Figure: methylation profiles with axes "CpGs" and "Cell types"; labels NK, CD8T, CD4T, Gran, Bcell, Mono, each repeated six times.)
    Whole blood cell types:
    • Tcells
    • CD8T
    • CD4T
    • Natural Killer
    • Bcells
    • Granulocytes
    • Monocytes
    Bioconductor data package available:
    • Data originally from Reinius et al. (2012)
    > library(FlowSorted.Blood.450k)

  7. Jaffe and Irizarry (2014). Genome Biology
    • Different cell compositions in whole blood imply different
    observed whole blood DNA methylation profiles
    • Important to estimate differences in cell composition
    Cell composition changes with age

  8. Statistical Model: Houseman et al. (2012)
    $$Y_{ij} = \sum_{k=1}^{K} \pi_{ik} X_{jk} + \varepsilon_{ij}$$
    i = (1,..., N) = whole blood samples
    j = (1,..., J) = CpGs
    k = (1,..., K) = cell type profiles
    In matrix form, for one whole blood sample:
    $$\underset{(J \times 1)}{Y} = \underset{(J \times K)}{X} \, \underset{(K \times 1)}{\pi} + \underset{(J \times 1)}{E}$$
    • Y: whole blood sample (J CpGs)
    • X: K cell type profiles
    • π: relative cell type proportions
    • E: measurement error
    (Figure: cell type profiles labelled NK, CD8T, CD4T, Gran, Bcell, Mono, each repeated six times.)
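
The mixture model on this slide can be illustrated with a minimal sketch. It is written in Python for illustration only (the talk's own tooling is R), the error term is set to zero, and the profile values, proportions and the function name `mix_profiles` are made up rather than taken from the talk.

```python
def mix_profiles(profiles, proportions):
    """Return whole-blood methylation Y_j = sum_k pi_k * X_jk (no error term).

    profiles: list of J rows, each holding K cell-type-specific methylation values.
    proportions: list of K relative cell type proportions.
    """
    return [sum(p * x for p, x in zip(proportions, row)) for row in profiles]


# Hypothetical example: J = 3 CpGs, K = 2 cell types.
X = [
    [0.9, 0.1],  # CpG 1: mostly methylated in cell type 1
    [0.2, 0.8],  # CpG 2: mostly methylated in cell type 2
    [0.5, 0.5],  # CpG 3: same level in both cell types, so uninformative
]
pi = [0.25, 0.75]

Y = mix_profiles(X, pi)
print([round(y, 3) for y in Y])
```

A CpG whose level is the same in every cell type, like the third row, contributes nothing to telling the proportions apart, which is why cell type-specific CpGs are the useful ones.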

  9. New platform technologies emerging
    First approach
    • Apply Houseman method using new platform technology

  10. Cell composition estimates from whole blood samples
    measured on two platforms (Houseman method)
    (Figure: four panels (Mono, Tcell, Bcell, Gran) with axes "450K platform" and "RRBS platform", both from 0.00 to 1.00.)
    Consider n = 10 whole blood samples measured on two platforms:
    • 450k (microarray)
    • RRBS (sequencing)
    450k samples (n = 10): 485,512 CpGs
    RRBS samples (n = 10): 6,823,620 CpGs
    Total CpG overlaps: 142,002 CpGs
    Houseman cell type-specific CpG overlaps: 91/600 CpGs

  11. New platform technologies emerging
    First approach
    • Apply Houseman method using new platform technology
    Problems with this approach
    1. Not all CpGs are included in new platforms
    2. Observed methylation levels depend on platform used

  12. Cell types preserve their
    methylation state across regions
    (Figure: Chromosome 14, 102.6767 mb to 102.6773 mb, with CpG and DMR tracks.)
    First panel: observed methylation, 0 to 0.8, by cell type (CD8T, CD4T, NK, Bcell, Mono, Gran).
    Beta values (purified cell types measured on microarray platform)
    Second panel: observed methylation, 0 to 0.8, by platform (450k, RRBS).
    Beta values (one whole blood sample measured on sequencing platform)

  13. Cell types preserve their
    methylation state across regions
    • Identify regions using bumphunter BioC pkg
    (Figure: Chromosome 14, 102.6767 mb to 102.6773 mb, with CpG and DMR tracks; a cell type-specific CpG and a cell type-specific region are marked.)
    First panel: observed methylation, 0 to 0.8, by cell type (CD8T, CD4T, NK, Bcell, Mono, Gran).
    Beta values (purified cell types measured on 450k array)
    Second panel: observed methylation, 0 to 0.8, by platform (450k, RRBS).
    Beta values (one whole blood sample):
                     Microarray platform   Sequencing platform
    Using CpGs       0.45                  NA
    Using Regions    0.55                  0.50
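
The table's point, that a single CpG may be missing on one platform while a region average is still available, can be sketched as follows. This is an illustrative Python sketch, not the bumphunter workflow; the genomic positions, the beta values and the helper names `cpg_value` and `region_mean` are hypothetical, chosen only so the output mirrors the table.

```python
def cpg_value(sample, position):
    """Beta value at a single CpG, or None if the platform did not measure it."""
    return sample.get(position)


def region_mean(sample, start, end):
    """Mean beta value over all measured CpGs with start <= position <= end."""
    values = [beta for pos, beta in sample.items() if start <= pos <= end]
    return sum(values) / len(values) if values else None


# Hypothetical positions and beta values for one whole blood sample.
microarray = {1000: 0.45, 1150: 0.65}              # CpGs fixed by the array design
sequencing = {1040: 0.40, 1090: 0.55, 1120: 0.55}  # different CpGs, same region

# Using one CpG: available on the array, missing (None) in the sequencing data.
print(cpg_value(microarray, 1000), cpg_value(sequencing, 1000))

# Using the region: both platforms give a value.
print(region_mean(microarray, 1000, 1200), region_mean(sequencing, 1000, 1200))
```

Averaging over a region trades single-CpG resolution for a quantity that both platforms can measure.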

  14. New platform technologies emerging
    First approach
    • Apply Houseman method using new platform technology
    Problems with this approach
    1. Not all CpGs are included in new platforms
    2. Observed methylation levels depend on platform used

  15. Platform-dependent differences between
    microarray (450k) and sequencing (RRBS) platforms
    (Figure: Chromosome 6, 33.257 mb to 33.264 mb; observed methylation, 0 to 0.8, by platform (450k, RRBS).)

  16. Platform-dependent differences
    between 450k array and RRBS platforms
    (Figure: Chromosome 6, 33.284 mb to 33.289 mb; observed methylation, 0 to 0.8, by platform (450k, RRBS).)
    (Figure: density plot, "density" ticks 0 to 100; legend "Regions": Not methylated, Methylated; "Platform": 450k.)

  17. Platform-dependent differences
    between 450k array and RRBS platforms
    (Figure: density plot with axes "Methylation", 0.00 to 1.00, and "density", ticks 0 to 100; legend "Regions": Not methylated, Methylated; "Platform": 450k.)

  18. Platform-dependent differences
    between 450k array and RRBS platforms
    (Figure: density plot with axes "Methylation", 0.00 to 1.00, and "density", ticks 0 to 100; legend "Regions": Not methylated, Methylated; "Platform": 450k, RRBS.)

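  19. Recall Houseman Model:
    $$Y_{ij} = \sum_{k=1}^{K} \pi_{ik} X_{jk} + \varepsilon_{ij}$$
    i = (1,..., N) = whole blood samples
    j = (1,..., J) = CpGs
    k = (1,..., K) = cell type profiles
    For 1 whole blood sample:
    $$\underset{(J \times 1)}{Y} = \underset{(J \times K)}{X} \, \underset{(K \times 1)}{\pi} + \underset{(J \times 1)}{E}$$
    • Y: 1 whole blood sample (J CpGs)
    • X: platform-dependent methylation profiles
    • π: relative cell type proportions
    • E: measurement error
    $$X = \begin{pmatrix}
    0.78 & 0.77 & 0.85 & 0.82 \\
    0.05 & 0.73 & 0.81 & 0.77 \\
    0.79 & 0.02 & 0.73 & 0.84 \\
    0.83 & 0.80 & 0.03 & 0.78 \\
    0.87 & 0.89 & 0.83 & 0.07 \\
    \vdots & \vdots & \vdots & \vdots \\
    0.06 & 0.09 & 0.81 & 0.08 \\
    0.07 & 0.06 & 0.03 & 0.77 \\
    0.02 & 0.04 & 0.08 & 0.03
    \end{pmatrix}$$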

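  20. Our proposed model:
    $$\underset{(R \times 1)}{Y} = \underset{(R \times K)}{X} \, \underset{(K \times 1)}{\pi} + \underset{(R \times 1)}{E}$$
    $$\underset{(R \times K)}{X} = \underset{(R \times K)}{(1 - Z)} \, \delta_0 + \underset{(R \times K)}{Z} \, \delta_1$$
    • Y: 1 whole blood sample (R regions)
    • π: relative cell type proportions
    • E: measurement error
    r = (1,..., R) = differentially methylated regions
    k = (1,..., K) = cell types
    $$Z_{rk} = \begin{cases} 1 & \text{if region } r \text{ is methylated in cell type } k \\ 0 & \text{otherwise} \end{cases}$$
    Example values:
    $$Z = \begin{pmatrix}
    1 & 1 & 1 & 1 \\
    0 & 1 & 1 & 1 \\
    1 & 0 & 1 & 1 \\
    1 & 1 & 0 & 1 \\
    1 & 1 & 1 & 0 \\
    \vdots & \vdots & \vdots & \vdots \\
    0 & 0 & 1 & 0 \\
    0 & 0 & 0 & 1 \\
    0 & 0 & 0 & 0
    \end{pmatrix}, \quad
    1 - Z = \begin{pmatrix}
    0 & 0 & 0 & 0 \\
    1 & 0 & 0 & 0 \\
    0 & 1 & 0 & 0 \\
    0 & 0 & 1 & 0 \\
    0 & 0 & 0 & 1 \\
    \vdots & \vdots & \vdots & \vdots \\
    1 & 1 & 0 & 1 \\
    1 & 1 & 1 & 0 \\
    1 & 1 & 1 & 1
    \end{pmatrix}$$
    $$\delta_0 = (0.05, 0.08, 0.02, 0.04, 0.05, \ldots, 0.09, 0.07, 0.06)^T$$
    $$\delta_1 = (0.87, 0.89, 0.75, 0.82, 0.79, \ldots, 0.81, 0.76, 0.90)^T$$
    (The slide also lists three distributional statements of the form "symbol ~ ..."; their contents did not survive extraction.)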
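
One way to read the proposed model is that each entry of X takes the region's unmethylated level when Z is 0 and its methylated level when Z is 1. The Python sketch below assumes that element-wise reading, which is an interpretation of the slide rather than something it states; it uses the first three rows of the slide's example Z, δ0 and δ1, while the proportions and the function name `build_profiles` are hypothetical.

```python
def build_profiles(Z, delta0, delta1):
    """Assumed reading: X_rk = (1 - Z_rk) * delta0_r + Z_rk * delta1_r."""
    return [
        [(1 - z) * d0 + z * d1 for z in row]
        for row, d0, d1 in zip(Z, delta0, delta1)
    ]


# First three rows of the example values shown on the slide.
Z = [
    [1, 1, 1, 1],
    [0, 1, 1, 1],
    [1, 0, 1, 1],
]
delta0 = [0.05, 0.08, 0.02]  # unmethylated levels per region
delta1 = [0.87, 0.89, 0.75]  # methylated levels per region

X = build_profiles(Z, delta0, delta1)

# Hypothetical cell type proportions (K = 4), error term omitted.
pi = [0.1, 0.2, 0.3, 0.4]
Y = [sum(p * x for p, x in zip(pi, row)) for row in X]

for row, y in zip(X, Y):
    print(row, round(y, 3))
```

Under this reading the binary matrix Z carries the cell type-specific pattern, and only δ0 and δ1 need to absorb platform-dependent levels.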

  21. Platform-dependent differences
    between 450k array and RRBS platforms
    (Figure: density plot with axes "Methylation", 0.00 to 1.00, and "density", ticks 0 to 100; legend "Regions": Not methylated, Methylated; "Platform": 450k, RRBS.)

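  22. Our proposed model:
    $$\underset{(R \times 1)}{Y} = \underset{(R \times K)}{X} \, \underset{(K \times 1)}{\pi} + \underset{(R \times 1)}{E}$$
    $$\underset{(R \times K)}{X} = \underset{(R \times K)}{(1 - Z)} \, \delta_0 + \underset{(R \times K)}{Z} \, \delta_1$$
    • Y: 1 whole blood sample (R regions)
    • π: relative cell type proportions
    • E: measurement error
    r = (1,..., R) = differentially methylated regions
    k = (1,..., K) = cell types
    $$Z_{rk} = \begin{cases} 1 & \text{if region } r \text{ is methylated in cell type } k \\ 0 & \text{otherwise} \end{cases}$$
    Example values:
    $$Z = \begin{pmatrix}
    1 & 1 & 1 & 1 \\
    0 & 1 & 1 & 1 \\
    1 & 0 & 1 & 1 \\
    1 & 1 & 0 & 1 \\
    1 & 1 & 1 & 0 \\
    \vdots & \vdots & \vdots & \vdots \\
    0 & 0 & 1 & 0 \\
    0 & 0 & 0 & 1 \\
    0 & 0 & 0 & 0
    \end{pmatrix}, \quad
    1 - Z = \begin{pmatrix}
    0 & 0 & 0 & 0 \\
    1 & 0 & 0 & 0 \\
    0 & 1 & 0 & 0 \\
    0 & 0 & 1 & 0 \\
    0 & 0 & 0 & 1 \\
    \vdots & \vdots & \vdots & \vdots \\
    1 & 1 & 0 & 1 \\
    1 & 1 & 1 & 0 \\
    1 & 1 & 1 & 1
    \end{pmatrix}$$
    $$\delta_0 = (0.05, 0.08, 0.02, 0.04, 0.05, \ldots, 0.09, 0.07, 0.06)^T$$
    $$\delta_1 = (0.87, 0.89, 0.75, 0.82, 0.79, \ldots, 0.81, 0.76, 0.90)^T$$
    (The slide also lists three distributional statements of the form "symbol ~ ..."; their contents did not survive extraction.)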

  23. Cell type−specific DNAm profiles
    Methylated (black), Unmethylated (white)
    (Figure: the “Z” matrix, with axes "Regions (R=212)", ticks 50 to 200, and "Cell types": Tcell, Bcell, Mono, Gran.)

  24. Estimation
    Use informative genomic regions that are clearly methylated or
    unmethylated for each cell type
    1. Initialize parameter values
    2. Use EM algorithm for estimation
    $$\theta_i^{(0)} = \left(\pi_{i1}^{(0)}, \pi_{i2}^{(0)}, \ldots, \pi_{iK}^{(0)}, \alpha_0^{(0)}, \alpha_1^{(0)}, (\sigma_0^2)^{(0)}, (\sigma_1^2)^{(0)}, (\sigma^2)^{(0)}\right)$$

  25. Just need the conditional
    distributions:
    Constructing the likelihood
    Complete-data likelihood:
    Complete-data vector:
    i = (1,..., N) = whole blood samples
    r = (1,...., R) = differentially methylated regions
    k = (1,...,K) = cell types

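  26. Conditional distribution
    Theorem
    If
    $$\underset{((r+s) \times 1)}{X} = \begin{pmatrix} \underset{(r \times 1)}{X_1} \\ \underset{(s \times 1)}{X_2} \end{pmatrix} \sim N_{r+s}(\mu, \Sigma)$$
    where
    $$\underset{((r+s) \times 1)}{\mu} = \begin{pmatrix} \mu_1 \\ \mu_2 \end{pmatrix} \quad \text{and} \quad \Sigma = \begin{pmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{pmatrix}$$
    Then,
    $$X_2 \mid X_1 \sim N_s\left(\mu_2 + \Sigma_{21} \Sigma_{11}^{-1} (X_1 - \mu_1),\; \Sigma_{22} - \Sigma_{21} \Sigma_{11}^{-1} \Sigma_{12}\right)$$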
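
As a worked instance of the theorem, the smallest case r = s = 1 reduces every block to a scalar, with Σ21 = Σ12. The Python sketch below covers only that bivariate case; the function name and the numbers are illustrative and not from the talk.

```python
def conditional_normal(mu1, mu2, s11, s12, s22, x1):
    """Mean and variance of X2 | X1 = x1 for a bivariate normal (r = s = 1).

    mean     = mu2 + s21 * s11^-1 * (x1 - mu1), with s21 = s12 in the scalar case
    variance = s22 - s21 * s11^-1 * s12
    """
    mean = mu2 + s12 / s11 * (x1 - mu1)
    variance = s22 - s12 * s12 / s11
    return mean, variance


# Standard bivariate normal with correlation 0.5, conditioning on X1 = 2.
mean, variance = conditional_normal(mu1=0.0, mu2=0.0, s11=1.0, s12=0.5, s22=1.0, x1=2.0)
print(mean, variance)  # prints: 1.0 0.75
```

Observing X1 two units above its mean shifts the expected X2 by half that amount and reduces its variance from 1 to 0.75.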

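  27. Conditional distribution
    (Equations: the conditional distribution, with a "where" clause and a note that a similar step applies to a second quantity; the symbols did not survive extraction.)
    Use conditional distribution for the Expectation Step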

  28. Maximization Step
    (Equations: four parameter updates, each written as a sum over an index starting at 1, followed by a "where" clause; the symbols did not survive extraction.)

  29. Maximization Step
    Use quadratic programming:
    solve.QP() in quadprog R package
    (nonnegative and constrained)

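
The constrained update for the proportions can be illustrated without a QP solver. The sketch below is not `solve.QP()`: it swaps in projected gradient descent in plain Python, and it assumes that "nonnegative and constrained" means proportions that are nonnegative and sum to one, which the slide does not spell out. The data and function names are hypothetical.

```python
def project_to_simplex(v):
    """Euclidean projection of v onto {p : p_k >= 0, sum(p) = 1}."""
    u = sorted(v, reverse=True)
    cumulative, theta = 0.0, 0.0
    for j, value in enumerate(u, start=1):
        cumulative += value
        candidate = (cumulative - 1.0) / j
        if value - candidate > 0:
            theta = candidate  # keep the threshold from the largest valid j
    return [max(x - theta, 0.0) for x in v]


def estimate_proportions(X, y, iterations=500):
    """Minimise ||y - X p||^2 over the simplex by projected gradient descent."""
    K = len(X[0])
    # 1 / trace(X'X) never exceeds 1 / (largest eigenvalue), so the step is safe.
    step = 1.0 / sum(x * x for row in X for x in row)
    p = [1.0 / K] * K
    for _ in range(iterations):
        residual = [sum(pk * x for pk, x in zip(p, row)) - yj for row, yj in zip(X, y)]
        gradient = [sum(r * row[k] for r, row in zip(residual, X)) for k in range(K)]
        p = project_to_simplex([pk - step * g for pk, g in zip(p, gradient)])
    return p


# Hypothetical noise-free data generated with proportions (0.25, 0.75).
X = [[0.9, 0.1], [0.2, 0.8], [0.5, 0.5]]
y = [0.30, 0.65, 0.50]

p_hat = estimate_proportions(X, y)
print([round(p, 4) for p in p_hat])
```

A dedicated QP solver reaches the same kind of solution directly; the iterative version is shown only because it needs no dependencies.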

  30. How does our model perform?

  31. N = 800 whole blood samples run on 450k
    microarray platform
    (Figure: scatter plots with axes "True Cell Composition (measured with flow cytometry)" and "Model−based Cell Composition Estimates", both from 0.00 to 1.00; panels Lymph, Mono, Gran for each of two methods, Houseman and Hicks.)
    RMSE:
    0.0385
    RMSE:
    0.0531
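
For reference, the root mean squared error between true and estimated proportions can be computed as in the Python sketch below. How the slide's RMSE values were pooled across samples and cell types is not stated, so this shows only the generic formula, with made-up numbers.

```python
import math


def rmse(truth, estimate):
    """Root mean squared error between two equal-length sequences."""
    if len(truth) != len(estimate):
        raise ValueError("inputs must have the same length")
    squared_errors = [(t - e) ** 2 for t, e in zip(truth, estimate)]
    return math.sqrt(sum(squared_errors) / len(truth))


# Hypothetical proportions for one sample and three cell types.
true_props = [0.60, 0.30, 0.10]
estimated = [0.50, 0.30, 0.20]

print(round(rmse(true_props, estimated), 4))
```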

  32. Model−based cell composition estimates from whole blood samples (n=689, Li et al. 2013)
    N = 689 whole blood samples run on 450k
    microarray platform
    (Figure: scatter plots with axes "Reference−based model (Houseman)" and "Proposed model (Hicks)", both from 0.00 to 1.00; panels Tcell, Bcell, Mono, Gran.)

  33. Simulation Study
    (Figure, panel A: "Simulated platform−dependent random effects"; axes "Methylation", 0.0 to 1.0, and "density", 0 to 40; platforms 450k and RRBS.)
    (Figure, panel B: "Simulated data from 450k platform"; "RMSE", 0.03 to 0.09, by "Cell composition estimation method": Houseman, methylCC.)
    (Figure, panel C: "Simulated data from RRBS platform"; "RMSE", 0.03 to 0.09, by "Cell composition estimation method": Houseman, methylCC.)

  34. Cell composition estimates from whole blood samples
    measured on two platforms
    N = 10 samples measured
    on two platforms:
    • 450k microarray
    • RRBS sequencing
    (Figure: four panels (Mono, Gran, Bcell, Tcell) with axes "450K platform" and "RRBS platform", both from 0.00 to 1.00; legend "Method": Our method, Houseman.)

  35. For more information
    methylCC:
    https://github.com/stephaniehicks/methylCC
    Comments/Suggestions:
    email: [email protected]
    GitHub & Twitter: @stephaniehicks
    Pre-print on bioRxiv:
    https://www.biorxiv.org/content/early/2017/11/03/213769
    CCG
    Me
