Integrated Analysis of single cell multi omics data sets using tensor decomposition based unsupervised feature extraction

Y-h. Taguchi
November 27, 2021

Poster presentation at MBSJ44
https://www2.aeplan.co.jp/mbsj2021/

Presentation video is here
https://youtu.be/n2QPBDflkZc

Paper is here
https://doi.org/10.3390/genes12091442

Transcript

  1. MBSJ 44 Integrated Analysis of single cell multi omics data sets using tensor decomposition based unsupervised feature extraction

    Y-h. Taguchi, Department of Physics, Chuo University, Tokyo, Japan. The contents of this poster were published in: Taguchi, Y.-h.; Turki, T. Tensor-Decomposition-Based Unsupervised Feature Extraction in Single-Cell Multiomics Data Analysis. Genes 2021, 12, 1442. doi:10.3390/genes12091442
  2. Introduction

    Integrated analysis of single cell multi-omics data is difficult because:
    1. The number of features differs between individual omics (genes $\sim 10^4$, DNA methylation/accessibility $\sim 10^8$).
    2. The data are full of missing values. The percentages of missing values are 70% for gene expression and >90% for DNA methylation and accessibility.
    Careful pre-processing is usually required.
  3. In this poster, we propose the use of tensor decomposition to integrate gene expression, DNA methylation, and DNA accessibility without any particular pre-processing.
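  4. Singular value decomposition (SVD)

    A matrix $x_{ij} \in \mathbb{R}^{N \times M}$ is approximated by the product of $u_{li}$ ($L \times N$, transposed), the diagonal $\lambda_l$ ($L \times L$), and $v_{lj}$ ($L \times M$):
    $$x_{ij} \simeq \sum_{l=1}^{L} u_{li} \lambda_l v_{lj}$$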
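
The rank-$L$ approximation on this slide can be sketched with NumPy as follows. This is a minimal illustration on random toy data; the sizes `N`, `M`, `L` and the matrix itself are assumptions made for the example and do not come from the poster.

```python
import numpy as np

rng = np.random.default_rng(0)
N, M, L = 50, 8, 3            # hypothetical sizes: features, cells, kept components
x = rng.normal(size=(N, M))   # toy matrix standing in for x_ij

# Thin SVD: x = U @ diag(lam) @ Vt, singular values sorted in decreasing order.
U, lam, Vt = np.linalg.svd(x, full_matrices=False)

# Keep the top L components:
#   u_li -> U[:, l], lambda_l -> lam[l], v_lj -> Vt[l, :]
x_approx = (U[:, :L] * lam[:L]) @ Vt[:L, :]

# Relative error of the truncated reconstruction.
rel_err = np.linalg.norm(x - x_approx) / np.linalg.norm(x)
```

With all components retained the product reproduces `x` exactly; truncating to `L` components discards the part of `x` carried by the smaller singular values.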
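  5. HOSVD (Higher Order Singular Value Decomposition)

    Extension to tensors: a tensor $x_{ijk} \in \mathbb{R}^{N \times M \times K}$ is approximated by a core tensor $G \in \mathbb{R}^{L_1 \times L_2 \times L_3}$ and singular value vectors $u_{l_1 i}$, $u_{l_2 j}$, $u_{l_3 k}$:
    $$x_{ijk} \simeq \sum_{l_1=1}^{L_1} \sum_{l_2=1}^{L_2} \sum_{l_3=1}^{L_3} G(l_1 l_2 l_3) u_{l_1 i} u_{l_2 j} u_{l_3 k}$$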
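
A compact way to see how HOSVD works is to compute it by hand: take the SVD of each mode unfolding to get the factor matrices, then contract the tensor with their transposes to get the core. The sketch below assumes a small dense toy tensor; the helper names `unfold` and `hosvd` are hypothetical and are not taken from the poster or the paper.

```python
import numpy as np

def unfold(t, mode):
    """Matricize tensor t so that `mode` indexes the rows."""
    return np.moveaxis(t, mode, 0).reshape(t.shape[mode], -1)

def hosvd(t):
    """Return the core tensor G and one factor matrix per mode.

    Column l of factors[0] plays the role of u_{l1 i}, and so on.
    """
    factors = [np.linalg.svd(unfold(t, m), full_matrices=False)[0]
               for m in range(t.ndim)]
    core = t
    for m, u in enumerate(factors):
        # Contract mode m of the core with u^T, keeping the mode in place.
        core = np.moveaxis(np.tensordot(u.T, core, axes=(1, m)), 0, m)
    return core, factors

rng = np.random.default_rng(0)
x = rng.normal(size=(6, 5, 4))          # toy N x M x K tensor
G, (u1, u2, u3) = hosvd(x)

# x_ijk = sum over l1, l2, l3 of G(l1 l2 l3) u_{l1 i} u_{l2 j} u_{l3 k}
x_rebuilt = np.einsum("abc,ia,jb,kc->ijk", G, u1, u2, u3)
```

No truncation is applied here, so the reconstruction is exact up to floating-point error; restricting each sum to fewer components gives the approximation written on the slide.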
  6. Application to integrated analysis of multiomics data set

    Individual omics data are associated with distinct features, so the number of features $N_k$ depends on the omics $k$.
    $N_k$: number of features of the $k$th omics data
    $M$: number of single cells
    $K$: number of omics ($K=3$ in the present study)
    $$x_{ijk} \in \mathbb{R}^{N_k \times M \times K}$$
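  7. Apply SVD to $x_{ijk}$ for each omics $k$:
    $$x_{ijk} \simeq \sum_{l=1}^{L} u_{lik} \lambda_l v_{ljk}$$

    Project $x_{ijk}$ onto $u_{lik}$, so that every omics has the same number of rows, $L$:
    $$x_{ljk} = \sum_{i=1}^{N_k} u_{lik} x_{ijk} \in \mathbb{R}^{L \times M \times K}$$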
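
The projection step can be sketched as below: each omics matrix, with its own number of features, is reduced to `L` rows so that the results stack into one tensor. The feature counts, cell count, rank, and the function name `project_omics` are all hypothetical, and the toy matrices are dense with no missing values, which the real data sets are not.

```python
import numpy as np

def project_omics(mats, L):
    """Project each (N_k x M) matrix onto its top-L left singular vectors.

    Returns a tensor of shape (L, M, K) and the per-omics bases (N_k x L).
    """
    projected, bases = [], []
    for x_k in mats:
        U, _, _ = np.linalg.svd(x_k, full_matrices=False)
        U_L = U[:, :L]                  # u_{lik} for l = 1..L
        bases.append(U_L)
        projected.append(U_L.T @ x_k)   # x_{ljk} = sum_i u_{lik} x_{ijk}
    return np.stack(projected, axis=-1), bases

rng = np.random.default_rng(0)
M, L = 12, 5                      # hypothetical cell count and rank
feature_counts = [40, 200, 150]   # hypothetical N_k for K = 3 omics
mats = [rng.normal(size=(n, M)) for n in feature_counts]

tensor, bases = project_omics(mats, L)
print(tensor.shape)  # (5, 12, 3)
```

Because each matrix is projected onto its own singular vectors, the differing feature counts disappear and only the shared cell axis and the omics axis remain alongside the `L` components.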
  8. Apply HOSVD to $x_{ljk}$ as
    $$x_{ljk} \simeq \sum_{l_1=1}^{L} \sum_{l_2=1}^{M} \sum_{l_3=1}^{K} G(l_1 l_2 l_3) u_{l_1 l} u_{l_2 j} u_{l_3 k}$$

    Apply UMAP to $u_{l_2 j}$ ($1 \le l_2 \le 3L$, $L=10$).
  9. Datasets

    Dataset 1: The multiomics dataset retrieved from GEO ID GSE154762, which is denoted as Dataset 1 in this study, is composed of 899 single cells for which gene expression, DNA methylation, and DNA accessibility were measured. These single cells represent human oocyte maturation.

    Dataset 2: The multiomics dataset retrieved from GEO ID GSE154762, which is denoted as Dataset 2 in this study, is composed of 852 single cells for which DNA methylation and DNA accessibility were measured, as well as 758 single cells for which gene expression was measured. These single cells represent the four time points of the mouse embryo.
  10. Identify which $u_{l_2 j}$ are coincident with the classification.

    Categorical regression:
    $$u_{l_2 j} = a_{l_2} + \sum_{s=1}^{S} b_{l_2 s} \delta_{js}$$
    $\delta_{js} = 1$ when $j$ belongs to the $s$th category, otherwise 0. $a_{l_2}$, $b_{l_2 s}$: regression coefficients.
    Datasets 1 and 2: 18 $l_2$s are associated with corrected P-values less than 0.05.
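
The categorical regression can be sketched as an ordinary least-squares fit on indicator variables. In this illustration one indicator column is dropped, because an intercept plus all $S$ indicators would be collinear; that coding choice, the function name `categorical_fit`, and the toy labels are assumptions, not details given in the poster. The function returns an F statistic rather than a P-value; a P-value could then be read from the F distribution (for example with `scipy.stats.f.sf`), and the multiple-testing correction used in the poster is not specified here.

```python
import numpy as np

def categorical_fit(u, labels):
    """Least-squares fit of u_j = a + sum_s b_s * delta_js.

    Returns the coefficients (intercept first) and the F statistic
    comparing the fit against an intercept-only model.
    """
    cats = np.unique(labels)
    # Indicator columns delta_js for every category except the first.
    delta = (labels[:, None] == cats[None, 1:]).astype(float)
    design = np.column_stack([np.ones(len(u)), delta])
    coef, *_ = np.linalg.lstsq(design, u, rcond=None)
    resid = u - design @ coef
    rss = resid @ resid
    tss = np.sum((u - u.mean()) ** 2)
    df_model = len(cats) - 1
    df_resid = len(u) - len(cats)
    f_stat = ((tss - rss) / df_model) / (rss / df_resid)
    return coef, f_stat

rng = np.random.default_rng(0)
labels = np.repeat([0, 1, 2], 10)        # toy classification of 30 cells
noise = rng.normal(scale=0.1, size=30)
u_signal = labels.astype(float) + noise  # component that tracks the classes
u_noise = rng.normal(size=30)            # component unrelated to the classes

coef_signal, f_signal = categorical_fit(u_signal, labels)
coef_noise, f_noise = categorical_fit(u_noise, labels)
```

A component that separates the categories yields a much larger F statistic than one that does not, which is what makes it "coincident with the classification".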
  11. Select the $l_1$ that has the largest
    $$\sum_{l_2} \sum_{l_3=1}^{K} \left| G(l_1 l_2 l_3) \right|$$

    where the $l_2$s are restricted to the selected 18 $l_2$s, since such an $l_1$ should be associated with the classifications. For datasets 1 and 2, $l_1=1$ has the largest value.
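
Selecting $l_1$ amounts to summing absolute core-tensor entries over the chosen $l_2$s and all $l_3$, then taking the largest. The sketch below uses a tiny hand-made core tensor and 0-based indices; the function name `select_l1` and the numbers are illustrative only.

```python
import numpy as np

def select_l1(core, selected_l2):
    """Return the l1 index maximising sum_{l2 in selected} sum_{l3} |G(l1, l2, l3)|."""
    score = np.abs(core[:, selected_l2, :]).sum(axis=(1, 2))
    return int(np.argmax(score)), score

core = np.zeros((3, 4, 2))   # toy core tensor G(l1, l2, l3)
core[0, 1, 0] = 5.0
core[1, 1, 1] = -7.0         # the sign is ignored by the absolute value
core[2, 0, :] = 10.0         # large, but l2 = 0 is not among the selected l2s

best_l1, score = select_l1(core, [1, 3])
print(best_l1, score)  # 1 [5. 7. 0.]
```

Entries belonging to unselected $l_2$s do not contribute, so a large core value there does not influence the choice.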
  12. Gene selection

    Generate $u_{l_1 i}$ from $u_{l_1 l}$:
    $$u_{l_1 i} = \sum_{l=1}^{L} u_{l_1 l} u_{l i 1}$$
    Attribute a P-value to gene $i$, assuming that $u_{l_1 i}$ obeys a Gaussian distribution:
    $$P_i = P_{\chi^2}\left[ > \left( \frac{u_{1i}}{\sigma_1} \right)^2 \right]$$
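
The gene-selection step can be sketched in three parts: map the selected component back to gene space, convert each gene's value to a chi-squared P-value, and adjust for multiple testing. The sketch assumes one degree of freedom, takes the standard deviation from the values themselves, and uses the Benjamini-Hochberg adjustment; the poster does not name the correction method, so that choice, the function names, and the toy data with five planted genes are all assumptions.

```python
import math
import numpy as np

def gene_vector(U1, w):
    """u_{l1 i} = sum_l u_{l1 l} u_{l i 1}; U1 is (N_1 x L), w has length L."""
    return U1 @ w

def chi2_pvalues(u):
    """Upper-tail chi-squared (1 d.o.f.) P-values for (u_i / sigma)^2."""
    z2 = (u / np.std(u)) ** 2
    # For one degree of freedom, P[chi2 > x] = erfc(sqrt(x / 2)).
    return np.array([math.erfc(math.sqrt(v / 2.0)) for v in z2])

def bh_adjust(p):
    """Benjamini-Hochberg adjusted P-values."""
    n = len(p)
    order = np.argsort(p)
    ranked = p[order] * n / np.arange(1, n + 1)
    # Enforce monotonicity, starting from the largest P-value.
    ranked = np.minimum.accumulate(ranked[::-1])[::-1]
    adjusted = np.empty(n)
    adjusted[order] = np.minimum(ranked, 1.0)
    return adjusted

rng = np.random.default_rng(0)
L = 4
U1 = rng.normal(size=(1000, L))   # toy stand-in for u_{li1} (gene expression)
U1[:5, :] = 10.0                  # five planted outlier genes
w = np.full(L, 0.5)               # toy stand-in for u_{l1 l}, unit length

u_gene = gene_vector(U1, w)
adjusted = bh_adjust(chi2_pvalues(u_gene))
selected = np.where(adjusted < 0.01)[0]
```

Genes whose values sit far in the tails of the assumed Gaussian receive small adjusted P-values and are the ones selected.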
  13. 45 genes and 175 genes were selected for datasets 1 and 2, respectively, as those associated with adjusted P-values less than 0.01. Enrichment analysis was performed on these genes in order to validate the selection biologically.
  14. Dataset 1

    Forty-seven genes were enriched by H3K36me3 based on “ENCODE Histone Modifications 2015”; H3K36me3 is known to play critical roles during oocyte maturation [12]. Forty-seven genes were also targeted by MYC based on “ENCODE and ChEA Consensus TFs from ChIP-X”; Myc is known to play critical roles in oogenesis [13]. Forty-seven genes were also targeted by TAF7 based on “ENCODE and ChEA Consensus TFs from ChIP-X” and “ENCODE TF ChIP-seq 2015”; TAF7 is known to play critical roles during oocyte growth [14]. Forty-seven genes were also targeted by ATF2 based on “ENCODE and ChEA Consensus TFs from ChIP-X”; the expression of ATF2 is known to be altered during oocyte development [15].
  15. Dataset 2

    One hundred and seventy-five genes were enriched by H3K36me3 based on “ENCODE Histone Modifications 2015”; H3K36me3 is known to play critical roles during gastrulation [17]. One hundred and seventy-five genes were also targeted by MYC based on “ENCODE and ChEA Consensus TFs from ChIP-X”; Myc is also known to play critical roles in gastrulation [18]. One hundred and seventy-five genes were also targeted by TAF7 based on “ENCODE and ChEA Consensus TFs from ChIP-X” and “ENCODE TF ChIP-seq 2015”; TAF7 is known to play critical roles during gastrulation [19]. One hundred and seventy-five genes were also targeted by ATF2 based on “ENCODE and ChEA Consensus TFs from ChIP-X”; the expression of ATF2 is known to be maintained during gastrulation [20].
  16. This suggests that TD-based unsupervised FE correctly selected biologically reasonable genes.
  17. TD can deal with data sets full of missing values.
  18. Conclusions

    TD is an effective method that can integrate single cell multiomics data sets as they are, without specifically pre-processing the individual omics data sets.