decomposition and principal component analysis as feature selection tools
Y-h. Taguchi, Department of Physics, Chuo University, Tokyo, Japan.
Turki Turki, King Abdulaziz University, Jeddah, Saudi Arabia.
This was rejected by Conference Journal Track, but can be read as a preprint.
BioRxiv doi: https://doi.org/10.1101/2020.10.02.324616
international. I would be glad if the audience could buy it and learn my method.
Y-h. Taguchi, Unsupervised Feature Extraction Applied to Bioinformatics --- A PCA and TD Based Approach --- Springer International (2020)
Principal component analysis (PCA) and tensor decomposition (TD) based unsupervised feature selection are described in detail in the book mentioned on the previous page.
$XX^T u_l = \lambda_l u_l, \quad u_l \in \mathbb{R}^N$: generate the N × N matrix $XX^T$ and obtain the eigenvector $u_l$ attributed to feature i.
$v_l = X^T u_l \in \mathbb{R}^M$: compute the eigenvector $v_l$ attributed to sample j.
Identify which $v_l$ is biologically interesting.
$P_i = P_{\chi^2}\left[ > \left( \frac{u_{li}}{\sigma_l} \right)^2 \right]$: attribute P-values to feature i, assuming that $u_l$ obeys a Gaussian distribution.
The $P_i$ are corrected by the Benjamini-Hochberg criterion, and the features i associated with corrected $P_i < 0.01$ are selected.
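The steps above can be sketched in a few lines of NumPy. This is only a minimal illustration, not the authors' implementation. It assumes that the chi-squared distribution has one degree of freedom, that $\sigma_l$ is the standard deviation of the components of $u_l$, and that the interesting component index `l` has already been chosen by inspecting $v_l$. The function names are hypothetical.

```python
import math
import numpy as np


def chi2_upper_tail(x):
    """P(chi-squared with 1 degree of freedom > x); 1 d.o.f. is an assumption."""
    return np.array([math.erfc(math.sqrt(v / 2.0)) for v in np.asarray(x, dtype=float)])


def benjamini_hochberg(p):
    """Benjamini-Hochberg adjusted P-values, returned in the input order."""
    p = np.asarray(p, dtype=float)
    n = p.size
    order = np.argsort(p)
    scaled = p[order] * n / np.arange(1, n + 1)
    # Enforce monotonicity, starting from the largest P-value.
    adjusted = np.minimum.accumulate(scaled[::-1])[::-1]
    out = np.empty(n)
    out[order] = np.minimum(adjusted, 1.0)
    return out


def pca_unsupervised_fe(X, l=0, threshold=0.01):
    """X is N features x M samples; l is the component judged interesting."""
    eigvals, eigvecs = np.linalg.eigh(X @ X.T)   # eigenpairs of the N x N matrix
    order = np.argsort(eigvals)[::-1]            # largest eigenvalue first
    u_l = eigvecs[:, order[l]]                   # attributed to features i
    v_l = X.T @ u_l                              # attributed to samples j
    sigma_l = u_l.std()                          # assumed definition of sigma_l
    p = chi2_upper_tail((u_l / sigma_l) ** 2)
    p_adj = benjamini_hochberg(p)
    selected = np.flatnonzero(p_adj < threshold)
    return selected, v_l, p_adj
```

In practice `v_l` would be inspected for several values of `l` before one is chosen. For large N, the eigenvectors of the M × M matrix $X^T X$ could be used instead to avoid forming an N × N matrix.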
$\ldots \sum_{l_1=1}^{N} G(l_1, l_2, l_3)\, u_{l_1 i}\, u_{l_2 j}\, u_{l_3 k}$
$G \in \mathbb{R}^{N \times M \times K}, \quad u_{l_1 i} \in \mathbb{R}^{N \times N}, \quad u_{l_2 j} \in \mathbb{R}^{M \times M}, \quad u_{l_3 k} \in \mathbb{R}^{K \times K}$
ith feature attributed to samples with the jth and kth experimental conditions.
N: number of features; M, K: number of conditions (samples).
Identify the biologically interesting $l_2, l_3$, and find the $l_1$ whose $G(l_1, l_2, l_3)$ has a large absolute value for the identified $l_2, l_3$.
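$P_i = P_{\chi^2}\left[ > \left( \frac{u_{l_1 i}}{\sigma_{l_1}} \right)^2 \right]$: attribute P-values to feature i, assuming that $u_{l_1}$ obeys a Gaussian distribution.
The $P_i$ are corrected by the Benjamini-Hochberg criterion, and the features i associated with corrected $P_i < 0.01$ are selected.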
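A rough sketch of the TD based procedure, using a higher-order singular value decomposition built from NumPy's SVD, might look as follows. It is not the authors' code. It assumes a three-mode array `x` of shape (N, M, K), one degree of freedom for the chi-squared distribution, and $\sigma_{l_1}$ taken as the standard deviation of the components of $u_{l_1}$. The indices `l2` and `l3` are supplied by the user after inspecting the sample-side singular vectors. Function names are hypothetical.

```python
import math
import numpy as np


def hosvd(x):
    """Core tensor G and per-mode singular-vector matrices of a 3-mode array."""
    factors = []
    for mode in range(3):
        # Unfold along this mode: rows index the mode, columns the rest.
        unfolded = np.moveaxis(x, mode, 0).reshape(x.shape[mode], -1)
        U, _, _ = np.linalg.svd(unfolded, full_matrices=False)
        factors.append(U)   # column l holds the l-th singular vector
    G = np.einsum("ijk,ia,jb,kc->abc", x, *factors)
    return G, factors


def td_unsupervised_fe(x, l2, l3, threshold=0.01):
    """Select features i given the interesting sample-side indices l2, l3."""
    G, (U1, _, _) = hosvd(x)
    # l1 with the largest |G(l1, l2, l3)| for the chosen l2, l3.
    l1 = int(np.argmax(np.abs(G[:, l2, l3])))
    u = U1[:, l1]
    z2 = (u / u.std()) ** 2            # assumed definition of sigma_{l1}
    # Chi-squared upper tail with 1 degree of freedom (an assumption).
    p = np.array([math.erfc(math.sqrt(v / 2.0)) for v in z2])
    # Benjamini-Hochberg adjustment.
    n = p.size
    order = np.argsort(p)
    scaled = p[order] * n / np.arange(1, n + 1)
    p_adj = np.empty(n)
    p_adj[order] = np.minimum(np.minimum.accumulate(scaled[::-1])[::-1], 1.0)
    return np.flatnonzero(p_adj < threshold), l1, p_adj
```

Because the thin SVD is used, the feature-side matrix has at most min(N, MK) columns rather than N, which is sufficient to reconstruct `x` exactly.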
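$x_{ik} = \sum_j x_{ij} x_{kj} \in \mathbb{R}^{N \times K}$
$x_{ik} = \sum_{l=1}^{\min(N,K)} \lambda_l u_{li} u_{lk}$
$v_{lj}^{\mathrm{mRNA}} = \sum_i x_{ij} u_{li}, \quad v_{lj}^{\mathrm{miRNA}} = \sum_k x_{kj} u_{lk}$
$P_i = P_{\chi^2}\left[ > \left( \frac{u_{li}}{\sigma_l} \right)^2 \right], \quad P_k = P_{\chi^2}\left[ > \left( \frac{u_{lk}}{\sigma_l} \right)^2 \right]$
N mRNAs, K miRNAs.
72 mRNAs and 11 miRNAs are selected.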
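The integration step above can be illustrated with a short NumPy sketch. It assumes two hypothetical arrays, `x_mrna` (N × M) and `x_mirna` (K × M), measured on the same M samples, and uses the ordinary SVD to obtain $\lambda_l$, $u_{li}$ and $u_{lk}$. It is an illustration under these assumptions, not the authors' pipeline.

```python
import numpy as np


def integrate_by_product(x_mrna, x_mirna):
    """Decompose the N x K product matrix x_ik = sum_j x_ij x_kj."""
    x_ik = x_mrna @ x_mirna.T
    U, lam, Vt = np.linalg.svd(x_ik, full_matrices=False)
    u_mrna = U                       # u_mrna[i, l]  = u_{li}
    u_mirna = Vt.T                   # u_mirna[k, l] = u_{lk}
    # Sample-side vectors, one column per component l.
    v_mrna = x_mrna.T @ u_mrna       # v_lj^mRNA  = sum_i x_ij u_li
    v_mirna = x_mirna.T @ u_mirna    # v_lj^miRNA = sum_k x_kj u_lk
    return lam, u_mrna, u_mirna, v_mrna, v_mirna
```

It follows from these definitions that $\sum_j v_{lj}^{\mathrm{mRNA}} v_{lj}^{\mathrm{miRNA}} = \lambda_l$, so a component with a large $\lambda_l$ is one on which the two sample profiles agree. P-values would then be attributed to the columns of `u_mrna` and `u_mirna` using the formulas for $P_i$ and $P_k$.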
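$y_j = \begin{cases} \ldots, & j \le M_T \\ \frac{M}{M_N}, & M_T < j \le M \end{cases}$
$b_i = \sum_j x_{ij} y_j, \quad b_k = \sum_j x_{kj} y_j$
$P_i = P_{\chi^2}\left[ > \left( \frac{b_i}{\sigma_b} \right)^2 \right], \quad P_k = P_{\chi^2}\left[ > \left( \frac{b_k}{\sigma_b} \right)^2 \right]$
$M_T$: number of tumors; $M_N$: number of normal kidneys.
73 mRNAs and 18 miRNAs are selected.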
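The projection onto the class labels can be sketched as below. The label vector `y` is taken as an input, since its exact definition is not fully recoverable here. One degree of freedom for the chi-squared distribution and $\sigma_b$ as the standard deviation of the $b_i$ are assumptions, and the function name is hypothetical.

```python
import math
import numpy as np


def label_projection_p_values(X, y):
    """P-values for b_i = sum_j x_ij y_j, with X of shape (features, samples)."""
    b = X @ y
    sigma_b = b.std()                # assumed definition of sigma_b
    z2 = (b / sigma_b) ** 2
    # Chi-squared upper tail with 1 degree of freedom (an assumption).
    return np.array([math.erfc(math.sqrt(v / 2.0)) for v in z2])
```

The same function applies to the miRNA matrix to obtain $P_k$. The resulting P-values can then be thresholded in the same way as for the unsupervised methods.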
Although the null hypothesis of a Gaussian distribution is not fulfilled, it is empirically coincident with the null distribution generated by shuffling, though the threshold P-values differ (0.01 for TD based unsupervised FE and 0.1 for the null distribution).