
Simulating DNA methylation data

Presented at the Australian Statistics Conference in conjunction with the Institute of Mathematical Statistics Annual Meeting, held in Sydney, Australia, on 10 July 2014.

Simulating DNA methylation data by Peter Hickey is licensed under a Creative Commons Attribution 4.0 International License.

Peter Hickey

July 10, 2014

Transcript

  1. DNA methylation

    [Figure: the sequence ACGCGAAACGTTCTATCG with three methyl (CH3) groups marked.]

    Peter Hickey (@PeteHaitch) Simulating DNA methylation data 10 July 2014 1 / 14
  3. Measuring DNA methylation

    β_i = 3/3, β_{i+1} = 4/4, β_{i+2} = 2/4, β_{i+3} = 0/4
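A minimal sketch of how the β values on this slide arise, assuming each fraction is (methylated reads) / (total reads covering the CpG). The function name `beta_values` and the `(methylated, total)` input layout are illustrative assumptions, not part of the talk or of methsim.

```python
def beta_values(counts):
    """Return methylated/total for each CpG; None where no read covers the site.

    `counts` is a list of (methylated_reads, total_reads) tuples (assumed layout).
    """
    return [m / n if n > 0 else None for m, n in counts]


# The four CpGs shown on the slide: 3/3, 4/4, 2/4 and 0/4.
slide_counts = [(3, 3), (4, 4), (2, 4), (0, 4)]
print(beta_values(slide_counts))  # [1.0, 1.0, 0.5, 0.0]
```

Each β value summarises a handful of reads, so the same β (for example 0.5) can come from very different read-level patterns.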
  5. Differentially methylated regions (DMRs) [1]

    [Figure: methylation β-values (axis ticks 0.2, 0.5, 0.8) against position (bp) for normals and cancers, with CpG islands (CGIs) marked; scale bar 1 kb.]

    [1] Hansen, K. D. et al. Nat Genet 43, 768–775 (2011)
  8. Why I care about simulating DNA methylation data

    Methods development and validation
    - Do methods designed to find DMRs actually work?
    - What method reigns supreme?

    How to decide?
    - No “gold standard” data ⇒ simulate
    - No simulation software ⇒ I’m writing methsim.
  16. Simulation approaches

    Simulate β-values
    - Simulate independent β_i ~ Beta(µ_i, ν_i) and induce correlation via a variogram model.
    - Re-sample real data in a way that tries to preserve the correlation structure.
    - β-values are summarised measurements.
    - Correlations of β-values are spurious.

    Simulate individual methylation events
    - Higher resolution.
    - Contains the mechanistic dependence structure.
    - Difficult given current data.
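A rough sketch of the first approach, drawing independent β values from Beta distributions. The slide does not say how µ and ν parametrise the distribution; this sketch assumes µ is the mean and ν a precision, giving shape parameters a = µν and b = (1 − µ)ν. The function name is hypothetical, and the variogram step that would induce correlation is not shown.

```python
import random


def simulate_betas(means, precisions, rng):
    """Draw one independent beta value per CpG.

    Assumed parametrisation: mean mu in (0, 1) and precision nu > 0,
    converted to the usual Beta shape parameters a and b.
    """
    betas = []
    for mu, nu in zip(means, precisions):
        a = mu * nu
        b = (1.0 - mu) * nu
        betas.append(rng.betavariate(a, b))
    return betas


rng = random.Random(1)  # fixed seed so the run is repeatable
sim = simulate_betas([0.9, 0.9, 0.1], [50.0, 50.0, 50.0], rng)
print(sim)
```

Because every draw is independent, neighbouring CpGs show no correlation unless a further step adds it.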
  18. My solution

    methsim: An R package for simulating whole-genome DNA methylation data.
    - Parameter distributions estimated from input data.
    - Parts written in C++ (via Rcpp).
    - Results today from a preliminary version of methsim.

    Outline of methsim
    1. Segment genome into “regions of similarity” (MethylSeekR [a]).
    2. Simulate “meta-haplotypes” within each region using a Markov model.
    3. Simulate sequencing of reads.

    [a] Burger, L., Gaidatzis, D., Schübeler, D. & Stadler, M. B. Nucleic Acids Res (2013). doi:10.1093/nar/gkt599
  19. Simulating meta-haplotypes

    (2) For each region:
    - Simulate each meta-haplotype using a Markov model.
    - Transition matrices depend on the distance between CGs and the type of region.
    - Assign haplotype i in region r frequency q_{i,r}.

    [Diagram: haplotype frequencies q_{1,r}, q_{i,r}, q_{H,r} for region r and q_{1,r+1}, q_{i,r+1}, q_{H,r+1} for region r+1.]

    (3)
    - Simulate read positions.
    - Simulate reads for region r by sampling from the ith haplotype with probability q_{i,r}.
    - Simulate sequencing error.
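Steps (2) and (3) could be sketched roughly as follows. This is not methsim's code: the exponential decay of dependence with distance, the `scale_bp` parameter, the per-region methylation level `p_meth`, and both function names are assumptions made for illustration. For simplicity every read covers the whole region, so read positions are not simulated.

```python
import math
import random


def simulate_haplotype(positions, p_meth, rng, scale_bp=100.0):
    """Simulate one meta-haplotype (1 = methylated, 0 = unmethylated).

    Two-state Markov chain whose transition probabilities depend on the
    distance to the previous CpG: rho = exp(-distance / scale_bp) is the
    assumed dependence, and p_meth is the assumed marginal level.
    """
    if not positions:
        return []
    states = [1 if rng.random() < p_meth else 0]
    for prev_pos, pos in zip(positions, positions[1:]):
        rho = math.exp(-(pos - prev_pos) / scale_bp)
        if states[-1] == 1:
            p_next = p_meth + (1.0 - p_meth) * rho  # P(methylated | methylated)
        else:
            p_next = p_meth * (1.0 - rho)  # P(methylated | unmethylated)
        states.append(1 if rng.random() < p_next else 0)
    return states


def simulate_reads(haplotypes, freqs, n_reads, error_rate, rng):
    """Sample reads from haplotypes with probabilities freqs (the q_{i,r}),
    then flip each call independently with probability error_rate."""
    reads = []
    for _ in range(n_reads):
        hap = rng.choices(haplotypes, weights=freqs, k=1)[0]
        reads.append([s ^ 1 if rng.random() < error_rate else s for s in hap])
    return reads


rng = random.Random(0)
cpg_positions = [10, 25, 300]  # hypothetical CpG coordinates in one region
haps = [simulate_haplotype(cpg_positions, 0.7, rng) for _ in range(3)]
reads = simulate_reads(haps, [0.5, 0.3, 0.2], n_reads=10, error_rate=0.01, rng=rng)
print(reads)
```

A different region type would be represented here by passing a different `p_meth` or `scale_bp`; the talk instead describes transition matrices estimated from input data.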
  21. Distribution of β values

    [Figure: density of β values in CGI and non-CGI panels; data: Real (ADS) vs methsim.]
  22. Within haplotype co-methylation at neighbouring CpGs

    [Figure: median log odds ratio against distance between CpGs (bp, 0 to 200) in CGI and non-CGI panels; data: Real (ADS) vs methsim.]
  24. Within haplotype co-methylation at neighbouring CpGs

    [Figure: median log odds ratio (80% percentile band) against distance between CpGs (bp, 0 to 200), all regions; data: ADS vs MySim.]
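One way to compute a within-haplotype log odds ratio for a pair of CpGs is sketched below, assuming each read that covers both sites contributes one (state at first CpG, state at second CpG) observation. The 0.5 pseudocount that keeps the ratio finite and the function name are assumptions; the talk does not state how the plotted values were computed.

```python
import math


def comethylation_log_odds(pairs, pseudocount=0.5):
    """Log odds ratio of the 2x2 table of (first CpG, second CpG) states.

    `pairs` holds one (a, b) tuple per read, with 1 = methylated and
    0 = unmethylated. Positive values mean the two CpGs tend to agree.
    """
    table = {(1, 1): 0, (1, 0): 0, (0, 1): 0, (0, 0): 0}
    for a, b in pairs:
        table[(a, b)] += 1
    mm = table[(1, 1)] + pseudocount
    mu = table[(1, 0)] + pseudocount
    um = table[(0, 1)] + pseudocount
    uu = table[(0, 0)] + pseudocount
    return math.log((mm * uu) / (mu * um))


concordant = [(1, 1)] * 4 + [(0, 0)] * 4
print(comethylation_log_odds(concordant) > 0)  # True
```

Taking the median of such values over all CpG pairs at a given distance would give a curve of the kind plotted on the slide.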
  25. Correlations of pairs of β values

    [Figure: Pearson correlation against distance between CpGs (bp, 0 to 1000) in CGI and non-CGI panels; data: Real (ADS) vs methsim.]
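A small sketch of the statistic named on this slide: the Pearson correlation of β values for pairs of CpGs a fixed distance apart. How pairs were actually selected for the plot is not stated, so the exact-distance matching and the function names below are assumptions.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences.

    Raises ZeroDivisionError if either sequence has zero variance.
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def beta_pairs_at_distance(positions, betas, distance_bp):
    """Pair each CpG's beta value with that of the CpG exactly distance_bp downstream."""
    by_position = dict(zip(positions, betas))
    return [
        (b, by_position[p + distance_bp])
        for p, b in zip(positions, betas)
        if p + distance_bp in by_position
    ]


positions = [0, 10, 20, 30, 40]  # hypothetical CpG coordinates
betas = [0.1, 0.2, 0.4, 0.8, 0.9]  # hypothetical beta values
pairs = beta_pairs_at_distance(positions, betas, 10)
first, second = zip(*pairs)
print(pearson(first, second))
```

Repeating this for each distance, separately for real and simulated data, would give curves comparable to those in the figure.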
  28. Summary

    - methsim models the mechanistic dependence structure of DNA methylation data.
    - Will be using methsim to simulate data with inserted DMRs and compare DMR-detection methods.
    - methsim is open source and developed on GitHub.
  29. Thanks

    For advice and supervision: Terry Speed (WEHI) and Peter Hall (University of Melbourne).
    For data: Ryan Lister (UWA).
    For R and C++ help: Bioconductor and Rcpp mailing lists, especially Martin Morgan.
    For funding: Australian Postgraduate Award, Victorian Life Sciences Computing Initiative.
    For sanity: Friends and family.