Bioconductor 2017

Slides from my talk at Bioconductor 2017 in Boston, MA

Stephanie Hicks

July 27, 2017

Transcript

  1. Estimating cell type composition
    in whole blood using
    differentially methylated regions
    Stephanie Hicks
    Bioconductor 2017

  2. What is DNA Methylation?
    ATCGCGTTACTGCGGAA
    TAGCGCAATGTCGCCTT
    [Figure: the two DNA strands with methylation marks ("m") on the sequence]

  3. What is DNA Methylation?
    ATCGCGTTACTGCGGAA
    TAGCGCAATGTCGCCTT
    [Figure: the two DNA strands with methylation marks ("m") on the sequence]

  4. What is DNA Methylation?
    ATCGCGTTACTGCGGAA
    TAGCGCAATGTCGCCTT
    [Figure: the two DNA strands with methylation marks ("m") on the sequence]

  5. DNA methylation in whole blood
    correlates with age at this one CpG
    [Figure: methylation (about 0.02 to 0.10) versus age (20 to 70) at one CpG; data from GSE32148]
    Slide courtesy of A. Jaffe and R. Irizarry

  6. Blood is a mixture of many cell types
    [Figure: heatmap of methylation at CpGs across cell types (NK, CD8T, CD4T, Gran, Bcell, Mono)]
    Whole blood cell types:
    •  T cells
        •  CD8T
        •  CD4T
    •  Natural Killer
    •  B cells
    •  Granulocytes
    •  Monocytes
    Bioconductor data package available:
    •  Data originally from Reinius et al. (2012)
    > library(FlowSorted.Blood.450k)

  7. Cell composition changes with age
    Jaffe and Irizarry (2014). Genome Biology
    •  Different cell compositions in whole blood imply different
    observed whole blood DNA methylation profiles
    •  Important to estimate differences in cell composition
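To make the first bullet concrete, here is a toy sketch with hypothetical numbers: two made-up cell types "A" and "B" with fixed reference methylation levels at three CpGs, mixed in two different proportions. Only the mixing proportions change, yet the observed whole blood profile changes at every cell type-specific CpG; the CpG with the same level in both cell types does not move.

```r
# Toy illustration (hypothetical numbers): the same two cell type reference
# profiles mixed in different proportions give different whole blood profiles.
X <- matrix(c(0.9, 0.1,    # CpG 1: methylated mostly in cell type A
              0.2, 0.8,    # CpG 2: methylated mostly in cell type B
              0.5, 0.5),   # CpG 3: same level in both cell types
            nrow = 3, byrow = TRUE,
            dimnames = list(paste0("CpG", 1:3), c("A", "B")))

pi_1 <- c(A = 0.6, B = 0.4)   # hypothetical composition, sample 1
pi_2 <- c(A = 0.4, B = 0.6)   # hypothetical composition, sample 2

# Observed profile = weighted average of the cell type profiles
Y_1 <- drop(X %*% pi_1)   # 0.58 0.44 0.50
Y_2 <- drop(X %*% pi_2)   # 0.42 0.56 0.50
Y_1
Y_2
```

A difference in observed methylation between two groups (for example, by age) can therefore come from composition alone, without any change inside a cell type.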

  8. Statistical Model: Houseman et al. (2012)
    $$Y_{ij} = \sum_{k=1}^{K} X_{jk}\,\pi_{ik} + \varepsilon_{ij}$$
    Matrix form for one whole blood sample: Y (J×1) = X (J×K) π (K×1) + E (J×1)
    •  Y: whole blood sample measured at J CpGs
    •  X: K cell type profiles
    •  π: relative cell type proportions
    •  E: measurement error
    i = 1, ..., N: whole blood samples
    j = 1, ..., J: CpGs
    k = 1, ..., K: cell type profiles
    [Figure: X shown as a heatmap of CpGs by cell type profiles (NK, CD8T, CD4T, Gran, Bcell, Mono)]
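A minimal simulation sketch of this model for one whole blood sample. The reference profiles X, the proportions π and the noise level are all hypothetical. Houseman et al. estimate π by constrained quadratic programming; this sketch swaps in base R's `optim()` with box constraints (each proportion in [0, 1], no sum-to-one constraint) only to keep the example dependency-free.

```r
# Minimal sketch of Y = X pi + E for one whole blood sample,
# with simulated (hypothetical) reference profiles and proportions.
set.seed(2012)

J <- 200                                          # number of CpGs (hypothetical)
cell_types <- c("CD8T", "CD4T", "NK", "Bcell", "Mono", "Gran")
K <- length(cell_types)                           # number of cell type profiles

# Hypothetical reference profiles X (J x K): at each CpG a cell type is either
# mostly unmethylated (Beta(2, 30), mean ~0.06) or mostly methylated (Beta(30, 4), mean ~0.88)
methylated <- runif(J * K) < 0.5
X <- matrix(ifelse(methylated, rbeta(J * K, 30, 4), rbeta(J * K, 2, 30)),
            nrow = J, ncol = K, dimnames = list(NULL, cell_types))

# Hypothetical true relative cell type proportions pi (K x 1)
pi_true <- c(CD8T = 0.15, CD4T = 0.20, NK = 0.05, Bcell = 0.10, Mono = 0.10, Gran = 0.40)

# Observed whole blood sample: Y = X pi + measurement error
Y <- drop(X %*% pi_true) + rnorm(J, sd = 0.02)

# Least squares with each proportion bounded to [0, 1]; a stand-in for the
# quadratic program used by Houseman et al., not their implementation
rss <- function(p) sum((Y - X %*% p)^2)
fit <- optim(par = rep(1 / K, K), fn = rss, method = "L-BFGS-B",
             lower = 0, upper = 1)
pi_hat <- setNames(fit$par, cell_types)
round(pi_hat, 3)
```

The key requirement is that X is known: the reference profiles must be measured on the same platform, at the same CpGs, as the whole blood sample.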

  9. New platform technologies emerging
    First approach
    •  Apply Houseman method using new platform technology
    Problems with this approach
    1.  Observed methylation levels depend on platform used
    2.  Not all CpGs are included in new platforms

  10. Platform-dependent differences
    between 450k array and RRBS platforms
    [Figure: observed methylation (0 to 0.8) at CpGs on chromosome 6 (about 33.284 to 33.289 Mb) measured with the 450k array and RRBS, plus a density panel of methylation in not-methylated and methylated regions by platform]

  11. Platform-dependent differences
    between 450k array and RRBS platforms
    [Figure: density of methylation values (0 to 1) in not-methylated and methylated regions, 450k versus RRBS]

  12. New platform technologies emerging
    First approach
    •  Apply Houseman method using new platform technology
    Problems with this approach
    1.  Observed methylation levels depend on platform
    2.  Not all CpGs are included in new platforms

  13. Cell types preserve their
    methylation state across regions
    Cell type-specific CpG vs. cell type-specific region
    Beta values
    (Purified cell types measured on 450k array)
    •  Identify regions using
    bumphunter BioC pkg
    [Figure: observed methylation (0 to 0.8) on chromosome 14 for purified CD8T, CD4T, NK, Bcell, Mono and Gran samples, marking a cell type-specific CpG and a DMR]
    [Figure: beta values for one whole blood sample measured with 450k and RRBS, about 102.6767 to 102.6773 Mb]
    Beta values (one whole blood sample):
                        Microarray platform   Sequencing platform
        Using CpGs      0.45                  NA
        Using Regions   0.55                  0.50
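The NA in the "Using CpGs" row shows the second problem from the earlier slide: a CpG probed on the array may simply not be measured by sequencing, while a region usually contains some CpGs on both platforms. A minimal sketch with hypothetical positions and beta values; the region is given by hand here, whereas in the talk regions come from the bumphunter package.

```r
# Minimal sketch with hypothetical data: summarize beta values over a region
# instead of matching individual CpGs across platforms.

# Hypothetical differentially methylated region (positions are made up)
region <- list(start = 1000, end = 1800)

# Hypothetical CpG-level beta values for one whole blood sample;
# each platform measures a different set of CpG positions
array_cpgs <- data.frame(pos  = c(1010, 1200, 1500, 2500),
                         beta = c(0.62, 0.70, 0.66, 0.12))
seq_cpgs   <- data.frame(pos  = c(1050, 1210, 1420, 1700, 3000),
                         beta = c(0.58, 0.64, 0.61, 0.69, 0.20))

# CpG-level matching: the array CpG at position 1200 is not measured by sequencing
seq_at_array_cpg <- seq_cpgs$beta[match(1200, seq_cpgs$pos)]   # NA

# Region-level summary: mean beta value of all CpGs inside the region
region_mean <- function(cpgs, region) {
  inside <- cpgs$pos >= region$start & cpgs$pos <= region$end
  mean(cpgs$beta[inside])
}
array_region <- region_mean(array_cpgs, region)   # mean of 0.62, 0.70, 0.66
seq_region   <- region_mean(seq_cpgs, region)     # mean of 0.58, 0.64, 0.61, 0.69
c(array = array_region, sequencing = seq_region)
```

Both platforms now report a value for the region, but the two values still differ. That platform-dependent level difference is the first problem, which the region-level model with per-region methylated and unmethylated levels is meant to absorb.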

  14. Recall Houseman Model:
    $$Y_{ij} = \sum_{k=1}^{K} X_{jk}\,\pi_{ik} + \varepsilon_{ij}$$
    Y (J×1) = X (J×K) π (K×1) + E (J×1)
    •  Y: 1 whole blood sample measured at J CpGs
    •  X: platform-dependent methylation profiles
    •  π: relative cell type proportions
    •  E: measurement error
    Example X (⋮ marks rows not shown):
        0.78 0.77 0.85 0.82
        0.05 0.73 0.81 0.77
        0.79 0.02 0.73 0.84
        0.83 0.80 0.03 0.78
        0.87 0.89 0.83 0.07
         ⋮    ⋮    ⋮    ⋮
        0.06 0.09 0.81 0.08
        0.07 0.06 0.03 0.77
        0.02 0.04 0.08 0.03
    i = 1, ..., N: whole blood samples
    j = 1, ..., J: CpGs
    k = 1, ..., K: cell type profiles
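The example profile matrix is platform-dependent in its exact values, but its pattern of high and low entries is not. As a small sketch, thresholding the visible rows of the example X at 0.5 recovers that on/off pattern. The 0.5 cutoff is a hypothetical rule used only for illustration; it is not how the talk defines methylation states.

```r
# Visible rows of the example profile matrix X on this slide
# (rows elided by the vertical dots are omitted)
X <- matrix(c(0.78, 0.77, 0.85, 0.82,
              0.05, 0.73, 0.81, 0.77,
              0.79, 0.02, 0.73, 0.84,
              0.83, 0.80, 0.03, 0.78,
              0.87, 0.89, 0.83, 0.07,
              0.06, 0.09, 0.81, 0.08,
              0.07, 0.06, 0.03, 0.77,
              0.02, 0.04, 0.08, 0.03),
            ncol = 4, byrow = TRUE)

# Hypothetical rule: call an entry methylated (1) when X > 0.5, else 0
Z <- (X > 0.5) * 1L
Z
```

The resulting binary matrix carries the cell type-specific structure without the platform-specific levels, which is the idea behind the indicator Z_rk in the proposed model.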

  15. Our proposed model:
    Y (R×1) = X (R×K) π (K×1) + E (R×1)
    •  Y: 1 whole blood sample measured at R regions
    •  π: relative cell type proportions
    •  E: measurement error
    $$Y_r = \sum_{k=1}^{K} \pi_k \left[(1 - Z_{rk})\,\delta_{0,r} + Z_{rk}\,\delta_{1,r}\right] + \varepsilon_r$$
    r = 1, ..., R: differentially methylated regions
    k = 1, ..., K: cell types
    $$\delta_{0,r} \sim N(\alpha_0, \sigma_0^2), \quad \delta_{1,r} \sim N(\alpha_1, \sigma_1^2), \quad \varepsilon_r \sim N(0, \sigma^2)$$
    Z_rk = 1 if region r is methylated in cell type k, 0 otherwise
    so the profile matrix decomposes as X_rk = (1 − Z_rk) δ0,r + Z_rk δ1,r
    Example Z (R×K) and 1 − Z (R×K) (⋮ marks rows not shown):
        Z          1 − Z
        1 1 1 1    0 0 0 0
        0 1 1 1    1 0 0 0
        1 0 1 1    0 1 0 0
        1 1 0 1    0 0 1 0
        1 1 1 0    0 0 0 1
        ⋮          ⋮
        0 0 1 0    1 1 0 1
        0 0 0 1    1 1 1 0
        0 0 0 0    1 1 1 1
    Example δ0 (R×1): 0.05, 0.08, 0.02, 0.04, 0.05, ⋮, 0.09, 0.07, 0.06
    Example δ1 (R×1): 0.87, 0.89, 0.75, 0.82, 0.79, ⋮, 0.81, 0.76, 0.90
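A minimal simulation sketch of this model, under two assumptions not stated on the slide: Z is known (for example, from purified reference samples) and the proportions sum to one. Then E[Y_r] = α0 + Σ_k (α1 − α0) π_k Z_rk, so an ordinary regression of Y on Z recovers π up to the factor (α1 − α0), whatever the platform's α0 and α1 are. This estimator is an illustration only, not the methylCC implementation; sizes, proportions and variances are hypothetical.

```r
# Minimal sketch (not the methylCC estimator): simulate one whole blood sample
# from the region-level model on two hypothetical platforms and recover pi.
set.seed(2017)

R <- 300                                   # number of regions (hypothetical)
K <- 4                                     # number of cell types (hypothetical)
pi_true <- c(0.20, 0.30, 0.10, 0.40)       # hypothetical proportions, sum to one

# Z[r, k] = 1 if region r is methylated in cell type k (assumed known here)
Z <- matrix(rbinom(R * K, size = 1, prob = 0.5), nrow = R, ncol = K)

# Draw one whole blood sample from the region-level model
simulate_sample <- function(Z, props, alpha0, alpha1,
                            sigma0 = 0.02, sigma1 = 0.05, sigma = 0.02) {
  n_regions <- nrow(Z)
  delta0 <- rnorm(n_regions, mean = alpha0, sd = sigma0)  # delta_{0,r}
  delta1 <- rnorm(n_regions, mean = alpha1, sd = sigma1)  # delta_{1,r}
  # X[r, k] = (1 - Z[r, k]) * delta0[r] + Z[r, k] * delta1[r]
  # (a length-R vector is recycled down each column of the R x K matrix)
  X <- (1 - Z) * delta0 + Z * delta1
  drop(X %*% props) + rnorm(n_regions, mean = 0, sd = sigma)  # Y_r
}

# With sum(props) == 1, E[Y_r] = alpha0 + sum_k (alpha1 - alpha0) * props[k] * Z[r, k],
# so the regression slopes are proportional to the proportions
estimate_props <- function(Y, Z) {
  slopes <- coef(lm(Y ~ Z))[-1]            # drop the intercept (an estimate of alpha0)
  unname(slopes / sum(slopes))
}

# Same cell composition on two hypothetical platforms whose
# unmethylated/methylated levels (alpha0, alpha1) differ
Y_array <- simulate_sample(Z, pi_true, alpha0 = 0.05, alpha1 = 0.85)
Y_seq   <- simulate_sample(Z, pi_true, alpha0 = 0.15, alpha1 = 0.70)

pi_array <- estimate_props(Y_array, Z)
pi_seq   <- estimate_props(Y_seq, Z)
round(pi_array, 3)
round(pi_seq, 3)
```

Because the methylated and unmethylated levels are treated as sample-level quantities rather than copied from a reference, the shift between platforms lands in α0 and α1 instead of biasing π.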

  16. How does our model perform?

  17. [Figure: model-based cell composition estimates (y-axis, 0 to 1) versus true cell composition measured with flow cytometry (x-axis, 0 to 1) for Lymph, Mono and Gran; one row of panels for Houseman and one for Hicks]
    N = 800 whole blood samples run on 450k
    microarray platform
    RMSE: 0.0385
    RMSE: 0.0531
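RMSE summarizes how far the model-based estimates fall from the flow cytometry measurements. A minimal sketch with hypothetical numbers for two samples; pooling over all samples and cell types (and also reporting a per-cell-type RMSE) is an assumption, since the slide does not say how its RMSE values were aggregated.

```r
# Root mean squared error between estimated and true proportions
rmse <- function(estimated, truth) sqrt(mean((estimated - truth)^2))

# Hypothetical proportions for two samples (rows) and three cell types (columns)
cell_types <- c("Lymph", "Mono", "Gran")
truth <- matrix(c(0.30, 0.10, 0.60,
                  0.25, 0.12, 0.63),
                nrow = 2, byrow = TRUE, dimnames = list(NULL, cell_types))
estimated <- matrix(c(0.32, 0.08, 0.60,
                      0.25, 0.15, 0.60),
                    nrow = 2, byrow = TRUE, dimnames = list(NULL, cell_types))

overall  <- rmse(estimated, truth)                     # pooled over all entries
per_type <- sqrt(colMeans((estimated - truth)^2))      # one value per cell type
round(overall, 4)
round(per_type, 4)
```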

  18. Cell composition estimates from whole blood samples
    measured on two platforms
    N = 12 samples measured
    on two platforms:
    •  450k microarray
    •  RRBS sequencing
    [Figure: estimated proportions (0 to 1) of Mono, Gran, Bcell and Tcell from the 450K platform plotted against those from the RRBS platform, for our method and the Houseman method]

  20. For more information
    methylCC:
    https://github.com/stephaniehicks/methylCC
    Comments/Suggestions:
    email: [email protected]
    GitHub & Twitter: @stephaniehicks
    CCG
    Me
    #BioC2017
    #RLadies
    #dataparasite
