Tensor decomposition based unsupervised feature extraction with optimized standard deviation applied to differentially expressed genes, DNA methylation and histone modification

Presentation at ISMB2022
11th July 2022, COSI: Function
https://www.iscb.org/ismb2022

A recording of this presentation can be watched at the link below.
https://www.youtube.com/watch?v=pfpnyZR7b24

Ryo Ishibashi

July 04, 2022

Transcript

  1. Function COSI ISMB2022: Tensor decomposition based unsupervised feature extraction with optimized standard deviation applied to differentially expressed genes, DNA methylation and histone modification

    Y-h. Taguchi, Ryo Ishibashi. Department of Physics, Chuo University, Tokyo, Japan
  2. Basic claims

    1. Our method, applied to the identification of differentially expressed genes (DEGs), can outperform various state-of-the-art methods when the standard deviations (SDs) used to generate the null hypothesis are optimized. 2. It is also applicable to the identification of differentially methylated cytosines (DMCs) and of differential histone modification, without any specific modification of the method.
  3. The contents were published in the following three preprints on bioRxiv.

    • https://doi.org/10.1101/2022.02.18.481115 • https://doi.org/10.1101/2022.04.02.486807 • https://doi.org/10.1101/2022.04.29.490081
  4. Outline

    ⚫ Background
      ◆ P >> N problem (large P, small N problem)
      ◆ Why do we choose PCA and TD?
    ⚫ Methods
      ◆ What is TD?
      ◆ Analysis procedure
    ⚫ Results
      ◆ Benchmark data set for DEG
      ◆ DNA methylation
      ◆ Histone modification
    ⚫ Conclusion
    For detailed information, refer to this book: https://link.springer.com/book/10.1007/978-3-030-22456-1
  6. P >> N problem (large P, small N problem)

    Large P: the number of variables is typically large (e.g. genes, epigenomes). Small N: the number of samples is small (e.g. subjects, experimental animals, or cultured cells). → Difficult to handle computationally.
  8. Why do we choose PCA and TD?

    Merits: ⚫ Highly versatile ⚫ Easy interpretation of results ⚫ Easy to implement programs. Effective for complicated life science data.
  10. What is tensor decomposition?

    HOSVD (Higher Order Singular Value Decomposition): $x_{ijk} \simeq \sum_{l_1=1}^{L_1} \sum_{l_2=1}^{L_2} \sum_{l_3=1}^{L_3} G(l_1 l_2 l_3) \, u_{l_1 i} \, u_{l_2 j} \, u_{l_3 k}$. Example: N: number of genes (i); M: number of samples (j); K: number of tissues (k); $x_{ijk}$: gene expression.
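
The HOSVD on this slide can be sketched in a few lines of NumPy. This is a minimal illustration, not the presenters' implementation: the function names `unfold`, `hosvd` and `reconstruct` and the random test tensor are assumptions made here. Each factor matrix is taken from the SVD of the corresponding unfolding, and the core tensor G is obtained by projecting x onto those factors.

```python
import numpy as np

def unfold(x, mode):
    """Matricize tensor x along `mode`: rows index that mode, columns the rest."""
    return np.moveaxis(x, mode, 0).reshape(x.shape[mode], -1)

def hosvd(x):
    """Untruncated HOSVD of a 3-mode tensor x[i, j, k].

    Returns the core tensor G[l1, l2, l3] and factor matrices (U1, U2, U3);
    column l of U1, U2, U3 holds u_{l i}, u_{l j}, u_{l k} respectively.
    """
    factors = []
    for mode in range(3):
        u, _, _ = np.linalg.svd(unfold(x, mode), full_matrices=False)
        factors.append(u)
    u1, u2, u3 = factors
    # Project x onto the singular vectors of every mode to obtain the core.
    core = np.einsum("ijk,ia,jb,kc->abc", x, u1, u2, u3)
    return core, (u1, u2, u3)

def reconstruct(core, factors):
    """Evaluate sum_{l1,l2,l3} G(l1 l2 l3) u_{l1 i} u_{l2 j} u_{l3 k}."""
    u1, u2, u3 = factors
    return np.einsum("abc,ia,jb,kc->ijk", core, u1, u2, u3)

# Hypothetical toy data: 50 genes x 6 samples x 4 tissues.
rng = np.random.default_rng(0)
x = rng.normal(size=(50, 6, 4))
core, factors = hosvd(x)
```

Without truncation the reconstruction is exact up to rounding; keeping only the leading L1, L2, L3 columns of each factor gives the approximation written on the slide.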
  11. Interpretation of TD (1/2)

    j: samples (healthy control vs. patients): $u_{l_2 j}$ for some specific $l_2$. k: tissues (tissue specific expression): $u_{l_3 k}$ for some specific $l_3$.
  12. Interpretation of TD (2/2)

    i: genes: $u_{l_1 i}$ for some specific $l_1$ with max $|G(l_1 l_2 l_3)|$ ($l_2$, $l_3$ fixed). tDEG: tissue specific Differentially Expressed Genes. If $G(l_1 l_2 l_3) > 0$: tDEG: healthy controls < patients; tDEG: healthy controls > patients.
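
The selection rule on this slide (fix $l_2$ and $l_3$, then take the $l_1$ with the largest $|G(l_1 l_2 l_3)|$) amounts to an argmax over one fibre of the core tensor. A minimal sketch, assuming a NumPy core tensor indexed as `core[l1, l2, l3]`; the function name and the toy core are hypothetical.

```python
import numpy as np

def select_gene_component(core, l2, l3):
    """Return the l1 with the largest |G(l1 l2 l3)| for fixed l2, l3, and the sign of G there."""
    fibre = core[:, l2, l3]
    l1 = int(np.argmax(np.abs(fibre)))
    return l1, float(np.sign(fibre[l1]))

# Hypothetical toy core tensor with one dominant entry.
core = np.zeros((3, 2, 2))
core[0, 1, 0] = 2.0
core[1, 1, 0] = -5.0
l1, sign = select_gene_component(core, l2=1, l3=0)
```

The returned sign matters because, together with the signs of the sample and gene vectors, it determines which direction of change a gene with large positive or negative $u_{l_1 i}$ represents.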
  14. Analysis Procedure (1/2)

    Matrix (gene × sample) → PCA → gene vectors, sample vectors. Tensor (gene × sample × tissue) → TD → gene vectors, sample vectors, tissue vectors.
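  15. Analysis Procedure (2/2)

    Gene selection (figure labels: Gaussian dist.; axis 1-P from 0 to 1; DEG). Novelty: the σ used to generate the null hypothesis is optimized. $\sigma_h = \sqrt{\frac{1}{N_n} \sum_{n<n_0} \left( h_n - \langle h_n \rangle \right)^2}$, where $N_n$ is the number of bins with $n < n_0$.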
  17. MAQC (benchmark data set for DEG *)

    RNA-seq: $x_{ij}$ represents the expression of the ith gene in the jth sample. Samples: seven Universal Human Reference RNA (UHRR) vs. seven Human Brain Reference RNA (HBRR). Measured for 40933 genes (done by the presenter). (*) https://www.fda.gov/science-research/bioinformatics-tools/microarraysequencing-quality-control-maqcseqc
  18. MA plot

    Gene-wise log FC ratio between two classes against gene-wise mean log $x_{ij}$, with density distribution.
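
The two MA-plot coordinates named on this slide can be computed directly from an expression matrix. The sketch below is illustrative only: the base-2 logarithm, the pseudocount of 1 and the function name `ma_values` are assumptions, since the slide does not state them.

```python
import numpy as np

def ma_values(x, is_class_a, pseudocount=1.0):
    """Gene-wise log fold change (M) and gene-wise mean log expression (A).

    x is a genes x samples matrix; is_class_a is a boolean mask over samples.
    """
    logx = np.log2(x + pseudocount)
    a = logx.mean(axis=1)
    m = logx[:, is_class_a].mean(axis=1) - logx[:, ~is_class_a].mean(axis=1)
    return m, a

# Hypothetical toy matrix: 2 genes, 2 samples per class.
x = np.array([[1.0, 1.0, 3.0, 3.0],
              [7.0, 7.0, 7.0, 7.0]])
is_class_a = np.array([True, True, False, False])
m, a = ma_values(x, is_class_a)
```

Here the first gene has M = -1 (lower in class A) and A = 1.5, while the second gene is unchanged between classes (M = 0, A = 3).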
  19. Sample-wise principal components

    $v_{1j}$: corresponds to mean log $x_{ij}$. $v_{2j}$: corresponds to log FC ratio. PCA corresponds to the MA plot.
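
The correspondence claimed on this slide can be checked on synthetic data. The sketch below assumes that each sample is standardized across genes before an SVD, which is one common way to obtain gene-wise scores $u_{li}$ and sample-wise components $v_{lj}$; the slide does not spell out the preprocessing, and the function name and simulated data are hypothetical.

```python
import numpy as np

def pca_embeddings(x):
    """PCA of a genes x samples matrix via SVD.

    Each sample (column) is standardized across genes first. Returns gene-wise
    vectors u[:, l], singular values s, and sample-wise vectors v[:, l].
    """
    z = (x - x.mean(axis=0)) / x.std(axis=0)
    u, s, vt = np.linalg.svd(z, full_matrices=False)
    return u, s, vt.T

# Synthetic log expression: 500 genes, 3 + 3 samples, first 50 genes shifted in class B.
rng = np.random.default_rng(1)
n_genes = 500
baseline = rng.normal(5.0, 2.0, size=n_genes)
shift = np.zeros(n_genes)
shift[:50] = 3.0
class_b = np.array([False, False, False, True, True, True])
noise = rng.normal(0.0, 0.1, size=(n_genes, class_b.size))
x = baseline[:, None] + shift[:, None] * class_b[None, :] + noise

u, s, v = pca_embeddings(x)
```

In this toy setting `v[:, 0]` has the same sign for every sample, so `u[:, 0]` tracks the overall expression level, while `v[:, 1]` changes sign between the two classes, so `u[:, 1]` tracks the between-class difference.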
  20. Gene-wise embedding by PCA

    $u_{1i}$ against $u_{2i}$, with density distribution.
  21. Null hypothesis: $u_{2i}$ obeys a Gaussian distribution

    $P_i$ is computed with the cumulative χ2 distribution. Histogram of $1-P_i$, with $h_n$ the count in the nth bin; $n_0$ is the bin at which adjusted $P(n_0) = 0.1$. Histogram panels: Left: [unlabeled]; Right: optimal σ, which minimizes $\sigma_h$. Select genes with adjusted $P_i < 0.1$.
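
The procedure on this slide can be sketched as follows. This is a rough illustration under several assumptions that the slides do not state: a χ2 distribution with one degree of freedom, Benjamini-Hochberg adjustment, 100 histogram bins, and a simple grid search restricted to 0.3 to 1.0 times the naive SD. The restriction keeps the search away from very small σ, where almost every gene would be selected and few bins would remain. All function names are hypothetical.

```python
import math
import numpy as np

_erfc = np.vectorize(math.erfc, otypes=[float])

def chi2_p_values(u, sigma):
    """P-values of (u / sigma)^2 under a chi-squared distribution with 1 degree of freedom."""
    return _erfc(np.abs(u) / (sigma * math.sqrt(2.0)))

def bh_adjust(p):
    """Benjamini-Hochberg adjusted P-values."""
    n = p.size
    order = np.argsort(p)
    scaled = p[order] * n / np.arange(1, n + 1)
    scaled = np.minimum.accumulate(scaled[::-1])[::-1]
    adjusted = np.empty(n)
    adjusted[order] = np.minimum(scaled, 1.0)
    return adjusted

def histogram_sd(u, sigma, n_bins=100, threshold=0.1):
    """SD of the counts h_n of 1 - P_i over the bins n < n0 that hold no selected gene."""
    p = chi2_p_values(u, sigma)
    selected = bh_adjust(p) < threshold
    counts, edges = np.histogram(1.0 - p, bins=n_bins, range=(0.0, 1.0))
    cutoff = 1.0 - p[selected].max() if selected.any() else 1.0
    n0 = int(np.sum(edges[1:] <= cutoff))
    return counts[:n0].std() if n0 > 1 else np.inf

def optimize_sigma(u, factors=np.linspace(0.3, 1.0, 36)):
    """Grid search for the sigma whose null histogram is flattest."""
    candidates = factors * u.std()
    scores = [histogram_sd(u, s) for s in candidates]
    return float(candidates[int(np.argmin(scores))])

# Synthetic gene-wise scores: 100 strongly shifted genes followed by 2000 null genes.
rng = np.random.default_rng(0)
u = np.concatenate([rng.choice([-8.0, 8.0], size=100), rng.normal(size=2000)])
sigma_naive = float(u.std())
sigma_opt = optimize_sigma(u)
selected = bh_adjust(chi2_p_values(u, sigma_opt)) < 0.1
```

In this example the naive SD is inflated by the shifted genes, whereas the optimized σ lies close to the SD of the null genes, which is what makes the histogram of $1-P_i$ flat away from the bin containing the selected genes.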
  22. “Highly expressed genes should be more likely to be selected”

    DESeq2: empirical dispersion relation (axes: μ, σ²). PCA: naturally satisfied (axis: σ²).
  23. Biological validation: PCA vs. DESeq2 (tissue specificity)

  24. PCA based unsupervised FE with optimized SD

    outperforms various state-of-the-art methods while assuming neither an empirical dispersion relation nor a negative binomial distribution.
  26. Application to DMC identification (EH1072, sequencing)

    Figure: per chromosome, t test of P-values attributed by PCA between DHS and non-DHS.
  27. PCA and TD based unsupervised FE with optimized SD

    can be applied to the identification of DMCs without any specific modification of the method. • https://doi.org/10.1101/2022.04.02.486807
  29. Application to differential histone modification

    Histograms of $1-P_i$ do not obey Gaussian (double peak) but…. Panels: H3K4me3, H3K27me3, H3K27ac.
  30. Comparisons with other methods (H3K9me3, GSE24850)

    The number of histone modification experiments overlapped with selected genes.
  31. PCA and TD based unsupervised FE with optimized SD

    can be applied to the identification of differential histone modification without any specific modification of the method. • https://doi.org/10.1101/2022.04.29.490081
  33. Conclusions

    1. Principal component analysis (PCA) based and tensor decomposition (TD) based unsupervised feature extraction (FE), applied to the identification of differentially expressed genes (DEGs), can outperform various state-of-the-art methods, including DESeq2, when the standard deviations (SDs) used to generate the null hypothesis (Gaussian distribution of principal components) are optimized. 2. They are also applicable to the identification of differentially methylated cytosines (DMCs) and of differential histone modification, without any specific modification of the method.