
Application of Tensor Decomposition-Based Unsupervised Feature Extraction to Single-Cell Gene Expression Profile Analysis


Slides for the presentation given at the 66th SIG Bioinformatics research meeting (SIGBIO66):
https://www.ipsj.or.jp/kenkyukai/event/mps133bio66.html


Y-h. Taguchi

June 30, 2021

Transcript

  1. SIGBIO66 1 Application of tensor decomposition based unsupervised feature extraction to single cell gene expression profile analysis

    Y-h. Taguchi (Department of Physics, Faculty of Science and Engineering, Chuo University) and Turki Turki (King Abdulaziz University)
  2. The method used in this presentation is fully described in my book,

    published by Springer International in September 2019. I would be glad if the audience would buy it and learn how to apply this method to their own research!
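  3. Singular value decomposition (example): $x_{ij} \simeq \sum_{l=1}^{L} u_{li} \lambda_l v_{lj}$

    N: number of genes (i); M: number of samples (j); $x_{ij}$: gene expression. In matrix form, the N × M matrix $x_{ij}$ is approximated by the product of $(u_{li})^T$ (N × L), the diagonal matrix of $\lambda_l$ (L × L), and $v_{lj}$ (L × M).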
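The decomposition on this slide can be sketched with NumPy. This is only an illustration on synthetic data: the toy sizes and the random matrix are assumptions, not the data used in the talk.

```python
import numpy as np

rng = np.random.default_rng(0)
N, M, L = 100, 12, 3            # genes, samples, retained components (toy sizes)
x = rng.normal(size=(N, M))     # synthetic stand-in for gene expression x_ij

# Thin SVD: x = U diag(lam) Vt, singular values sorted in descending order.
U, lam, Vt = np.linalg.svd(x, full_matrices=False)

u = U[:, :L].T                  # u[l, i]: l-th gene singular value vector
v = Vt[:L, :]                   # v[l, j]: l-th sample singular value vector

# Rank-L approximation: x_ij ~ sum_l u_li * lam_l * v_lj
x_approx = np.einsum("li,l,lj->ij", u, lam[:L], v)
rel_error = np.linalg.norm(x - x_approx) / np.linalg.norm(x)
print(f"relative error of the rank-{L} approximation: {rel_error:.3f}")
```

Keeping all components instead of the first L reproduces x exactly (up to rounding), which is a quick way to check the index conventions.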
  4. Interpretation (figure)

    For some specific l, $v_{lj}$ (j: samples) distinguishes healthy controls from patients. The corresponding $u_{li}$ (i: genes) then identifies DEGs (differentially expressed genes), both those with healthy controls < patients and those with healthy controls > patients.
  5. Extension to tensor: HOSVD (Higher Order Singular Value Decomposition) (example)

    $x_{ijk} \simeq \sum_{l_1=1}^{L_1} \sum_{l_2=1}^{L_2} \sum_{l_3=1}^{L_3} G(l_1 l_2 l_3) u_{l_1 i} u_{l_2 j} u_{l_3 k}$, where N: number of genes (i); M: number of samples (j); K: number of tissues (k); $x_{ijk}$: gene expression. The N × M × K tensor $x_{ijk}$ is approximated by the $L_1 \times L_2 \times L_3$ core tensor G multiplied by $u_{l_1 i}$, $u_{l_2 j}$, and $u_{l_3 k}$.
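A minimal HOSVD sketch in NumPy follows, assuming the textbook construction (an SVD of each mode unfolding, then projection to obtain the core tensor). The helper name `hosvd` and the toy tensor are illustrative assumptions, not code from the talk.

```python
import numpy as np

def hosvd(x, ranks):
    """Truncated HOSVD of a 3-mode tensor x[i, j, k].

    Returns the core tensor G[l1, l2, l3] and the factor matrices stored as
    u1[l1, i], u2[l2, j], u3[l3, k], matching the index order on the slide.
    """
    factors = []
    for mode, rank in enumerate(ranks):
        # Unfold along `mode`: rows index that mode, columns everything else.
        unfolded = np.moveaxis(x, mode, 0).reshape(x.shape[mode], -1)
        left, _, _ = np.linalg.svd(unfolded, full_matrices=False)
        factors.append(left[:, :rank].T)
    u1, u2, u3 = factors
    # Core tensor: project x onto the three sets of singular value vectors.
    G = np.einsum("ijk,ai,bj,ck->abc", x, u1, u2, u3)
    return G, u1, u2, u3

rng = np.random.default_rng(0)
N, M, K = 8, 6, 4                       # genes, samples, tissues (toy sizes)
x = rng.normal(size=(N, M, K))          # synthetic stand-in for x_ijk

# Full ranks: the sum over l1, l2, l3 reproduces x up to rounding.
G, u1, u2, u3 = hosvd(x, ranks=(N, M, K))
x_rebuilt = np.einsum("abc,ai,bj,ck->ijk", G, u1, u2, u3)
print("max reconstruction error:", np.abs(x - x_rebuilt).max())

# Truncated ranks give the approximation written on the slide.
G_small, *_ = hosvd(x, ranks=(3, 3, 2))
print("truncated core shape:", G_small.shape)
```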
  6. Interpretation (figure)

    For some specific $l_2$, $u_{l_2 j}$ (j: samples) distinguishes healthy controls from patients. For some specific $l_3$, $u_{l_3 k}$ (k: tissues) shows tissue-specific expression.
  7. tDEG: tissue-specific differentially expressed genes (figure)

    With $l_2$ and $l_3$ fixed, $u_{l_1 i}$ (i: genes) is taken for the specific $l_1$ with max $|G(l_1 l_2 l_3)|$. The figure marks two groups of tDEGs, healthy controls < patients and healthy controls > patients, for the case $G(l_1 l_2 l_3) > 0$.
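The choice of $l_1$ can be sketched as a lookup in the core tensor. The arrays below are random placeholders (hypothetical, not real results), and the indices are 0-based, unlike the 1-based indices on the slides.

```python
import numpy as np

rng = np.random.default_rng(1)
G = rng.normal(size=(5, 4, 3))    # hypothetical core tensor G[l1, l2, l3]
u1 = rng.normal(size=(5, 20))     # hypothetical gene vectors u1[l1, i]

# l2 and l3 are assumed to have been fixed already, by inspecting which
# u_{l2 j} separates the sample classes and which u_{l3 k} is tissue specific.
l2, l3 = 1, 0

# With l2 and l3 fixed, take the l1 whose core element has the largest magnitude.
l1 = int(np.argmax(np.abs(G[:, l2, l3])))
core_sign = np.sign(G[l1, l2, l3])   # the slide's direction labels assume G > 0

u_selected = u1[l1]               # gene singular value vector used to pick tDEGs
print(f"selected l1 = {l1}, sign of G = {core_sign:+.0f}")
```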
  8. Integrated analysis of multiple matrices and/or tensors

    $x_{ij}$: expression of gene i in sample j; $x_{kj}$: methylation of region k in sample j. The tensor is defined as $x_{ijk} \equiv x_{ij} \times x_{kj}$ and decomposed by HOSVD: $x_{ijk} \simeq \sum_{l_1=1}^{L_1} \sum_{l_2=1}^{L_2} \sum_{l_3=1}^{L_3} G(l_1 l_2 l_3) u_{l_1 i} u_{l_2 j} u_{l_3 k}$ (N × M × K tensor, $L_1 \times L_2 \times L_3$ core tensor G).
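Building the product tensor is a one-line operation with `numpy.einsum`. The sketch below uses small random matrices as stand-ins for the expression and methylation profiles; all sizes are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
N, K, M = 50, 40, 8                   # genes, methylation regions, samples (toy sizes)
x_expr = rng.normal(size=(N, M))      # x_ij: expression of gene i in sample j
x_meth = rng.normal(size=(K, M))      # x_kj: methylation of region k in sample j

# x_ijk = x_ij * x_kj: the two matrices are multiplied along the shared sample index j.
x = np.einsum("ij,kj->ijk", x_expr, x_meth)
print(x.shape)                        # (N, M, K)
```

The resulting array has N × M × K elements, so for real data the memory footprint should be estimated before forming it explicitly.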
  9. Interpretation (figure)

    For some specific $l_2$, $u_{l_2 j}$ (j: samples) distinguishes healthy controls from patients.
  10. For gene expression: DEG (differentially expressed genes) (figure)

    With $l_2$ fixed, $u_{l_1 i}$ (i: genes) is taken for the specific $l_1$, $l_3$ with max $|G(l_1 l_2 l_3)|$. The figure marks two groups of DEGs, healthy controls < patients and healthy controls > patients, for the case $G(l_1 l_2 l_3) > 0$.
  11. For methylation: DMR (differentially methylated regions) (figure)

    $u_{l_3 k}$ (k: regions) marks two groups of DMRs, healthy controls < patients and healthy controls > patients.
  12. Tensor Decomposition-Based Unsupervised Feature Extraction Applied to Single-Cell Gene Expression Analysis

    Y-h. Taguchi and Turki Turki, Frontiers in Genetics, Volume 10, Article 864, 2019. doi: 10.3389/fgene.2019.00864 (open access)
  13. Data set: GSE76381, scRNA-seq of human and mouse midbrain development

    Human: $x_{ij} \in \mathbb{R}^{19531 \times 1977}$; mouse: $x_{ik} \in \mathbb{R}^{24378 \times 1907}$ (i: genes; j, k: cells). Purpose of the analysis: to select genes associated with midbrain development in both human and mouse.
  14. Cell numbers and time points

    Human: 6w: 287 cells, 7w: 131 cells, 8w: 331 cells, 9w: 322 cells, 10w: 509 cells, 11w: 397 cells; in total, 1977 cells (w: week).
    Mouse: E11.5: 349 cells, E12.5: 350 cells, E13.5: 345 cells, E14.5: 308 cells, E15.5: 356 cells, E18.5: 142 cells, unknown: 57 cells; in total, 1907 cells.
  15. Tensor decomposition: the tensor is generated from the product of cells, using the 13,384 genes common to human and mouse

    $x_{ijk} = x_{ij} \times x_{ik} \in \mathbb{R}^{13384 \times 1977 \times 1907}$ (i: genes; j, k: cells). Because this tensor is too large, its size must be reduced: $x_{jk} = \sum_i x_{ijk}$, and $x_{jk}$ is decomposed by singular value decomposition. $v_{lj}$: lth human cell singular value vectors; $v_{lk}$: lth mouse cell singular value vectors.
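Since $x_{jk} = \sum_i x_{ij} x_{ik}$, the reduced matrix can be computed as a plain matrix product without ever forming the full tensor, which would have about 5 × 10^10 elements at the sizes quoted on the slide. The sketch below checks this on toy random data (an assumption; the real matrices are much larger).

```python
import numpy as np

rng = np.random.default_rng(0)
n_genes, n_human, n_mouse = 200, 30, 25           # toy sizes, not the real ones
x_human = rng.normal(size=(n_genes, n_human))     # x_ij over the common genes
x_mouse = rng.normal(size=(n_genes, n_mouse))     # x_ik over the common genes

# x_jk = sum_i x_ijk = sum_i x_ij * x_ik, i.e. a matrix product over genes.
x_jk = x_human.T @ x_mouse

# On toy data the explicit tensor is affordable, so the shortcut can be verified.
x_ijk = np.einsum("ij,ik->ijk", x_human, x_mouse)
assert np.allclose(x_ijk.sum(axis=0), x_jk)

# SVD of x_jk yields the cell singular value vectors for both species.
U, lam, Vt = np.linalg.svd(x_jk, full_matrices=False)
v_human = U.T      # v_human[l, j]: l-th human cell singular value vector
v_mouse = Vt       # v_mouse[l, k]: l-th mouse cell singular value vector
print(v_human.shape, v_mouse.shape)
```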
  16. $v_{lj} = a_l + \sum_t b_{lt} \delta_{jt}$, $v_{lk} = a'_l + \sum_t b'_{lt} \delta_{kt}$

    $\delta_{jt}$, $\delta_{kt}$: 1 when cell j (or k) is measured at time point t, 0 otherwise. The $v_{lj}$ and $v_{lk}$ that show any kind of time dependence are selected with categorical regression (ANOVA).
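The categorical regression above is equivalent to a one-way ANOVA with the time point as the factor, so one way to sketch the selection is with `scipy.stats.f_oneway`. The time labels, the toy vectors, and the helper names `bh_adjust` and `anova_pvalue` are assumptions for illustration. The 0.01 threshold on adjusted P-values is the one quoted on the next slide.

```python
import numpy as np
from scipy.stats import f_oneway

def bh_adjust(p):
    """Benjamini-Hochberg adjusted P-values."""
    p = np.asarray(p, dtype=float)
    n = p.size
    order = np.argsort(p)
    scaled = p[order] * n / np.arange(1, n + 1)
    # Enforce monotonicity, starting from the largest P-value.
    scaled = np.minimum.accumulate(scaled[::-1])[::-1]
    adjusted = np.empty(n)
    adjusted[order] = np.clip(scaled, 0.0, 1.0)
    return adjusted

def anova_pvalue(values, labels):
    """P-value of the F-test that all time points share the same mean."""
    groups = [values[labels == t] for t in np.unique(labels)]
    return f_oneway(*groups).pvalue

rng = np.random.default_rng(0)
time_labels = np.repeat(np.arange(6), 40)   # hypothetical: 6 time points, 40 cells each

# Toy singular value vectors v[l, j]: row 0 depends on time, the others are noise.
v = rng.normal(size=(10, time_labels.size))
v[0] += 0.8 * time_labels

p_values = np.array([anova_pvalue(row, time_labels) for row in v])
adjusted = bh_adjust(p_values)
selected = np.flatnonzero(adjusted < 0.01)
print("selected singular value vectors:", selected)
```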
  17. How common are the selected singular value vectors?

    Singular value vectors associated with adjusted P-values less than 0.01 are selected. Venn diagram: 23 (human only), 12 (mouse only), 32 (common).
  18. $u_{li}$ are generated from $v_{lj}$ and $v_{lk}$

    $u_{li}^{(j)} = \sum_j v_{lj} x_{ij}$: lth human gene singular value vectors; $u_{li}^{(k)} = \sum_k v_{lk} x_{ik}$: lth mouse gene singular value vectors. P-values are attributed to the gene singular value vectors using the χ2 distribution and corrected by the BH criterion; genes associated with adjusted P-values less than 0.01 are selected.
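The projection that yields the gene singular value vectors is a single matrix product. The sketch below shows the human case on random placeholder data; the array names and sizes are assumptions, and the mouse case is identical with $v_{lk}$ and $x_{ik}$.

```python
import numpy as np

rng = np.random.default_rng(0)
n_genes, n_human, n_selected = 200, 30, 5
x_human = rng.normal(size=(n_genes, n_human))      # x_ij (toy data)
v_human = rng.normal(size=(n_selected, n_human))   # selected v_lj (hypothetical values)

# u_li^(j) = sum_j v_lj * x_ij: project each gene's expression profile
# onto the selected cell singular value vectors.
u_human = v_human @ x_human.T                      # u_human[l, i]
print(u_human.shape)                               # (n_selected, n_genes)
```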
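  19. P-values by χ2 distribution: $P_i = P_{\chi^2}\left[ > \sum_l \left( \frac{u_{li}}{\sigma} \right)^2 \right]$

    Genes with Benjamini-Hochberg corrected P < 0.01 are selected (figure: P(p) plotted against 1-p, from 0 to 1). Selected genes (Venn diagram): 151 (human only), 200 (mouse only), 305 (common).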
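A hedged sketch of the P-value step follows. It assumes that σ is the standard deviation of each $u_l$ across genes and that the degrees of freedom equal the number of summed singular value vectors; the slide does not spell out either choice, and the planted outlier genes are synthetic.

```python
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(0)
n_selected, n_genes = 4, 1000
u = rng.normal(size=(n_selected, n_genes))   # toy u[l, i]
u[:, :20] += 6.0                             # plant 20 outlier genes (synthetic)

# Assumption: sigma is the standard deviation of each u_l over all genes.
sigma = u.std(axis=1, keepdims=True)

# P_i = P[chi-squared > sum_l (u_li / sigma)^2], one degree of freedom per l.
stat = ((u / sigma) ** 2).sum(axis=0)
p_values = chi2.sf(stat, df=n_selected)

# Benjamini-Hochberg correction.
order = np.argsort(p_values)
scaled = p_values[order] * n_genes / np.arange(1, n_genes + 1)
scaled = np.minimum.accumulate(scaled[::-1])[::-1]
adjusted = np.empty(n_genes)
adjusted[order] = np.minimum(scaled, 1.0)

selected_genes = np.flatnonzero(adjusted < 0.01)
print("number of selected genes:", selected_genes.size)
```

Under the null assumption that the $u_{li}$ are Gaussian, most genes receive large P-values and only the outliers survive the correction.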
  20. Enrichr (figure)

  21. Validation: the selected genes were uploaded to Enrichr (an enrichment server)

    "Allen Brain Atlas", top five ranked terms: for both human and mouse, four of the top five are related to the hypothalamus, which belongs to the midbrain.
  22. Enrichr (figure)

  23. Enrichr (figure)

  24. Enrichr (figure)

  25. Enrichr (figure)

  26. Comparisons with other methods

    Highly variable genes, selected genes (Venn diagram): 144 (human only), 127 (mouse only), 44 (common). There is less overlap between human and mouse, and no biological terms related to the brain are enriched. More comparisons are available in the following paper: Y-h. Taguchi, ICIC2018 (2018), "Principal Component Analysis-Based Unsupervised Feature Extraction Applied to Single-Cell Gene Expression Analysis", https://doi.org/10.1007/978-3-319-95933-7_90
  27. Conclusions

    Tensor decomposition based unsupervised feature extraction is applicable to massive single-cell RNA-seq data and is capable of selecting biologically reasonable genes. Since it is an unsupervised method, it is easy to use and is applicable to a wide range of scRNA-seq data sets.