“Animals have evolved a greater diversity of cell types in a multicellular body (100–150 different cell types)”

Old methods
• Surface markers
• Morphology

New methods
• Single-cell RNA-Seq
algorithms available
• Single-cell data is new and high-dimensional
• The standard robust and efficient algorithm is k-means

Problems with new algorithms:
• Parameters
• Speed
• Scalability
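To make the k-means bullet concrete, here is a minimal sketch of Lloyd's algorithm in plain Python. The function name `kmeans`, the helper `sq_dist`, and the choice to seed the centroids with the first k points are assumptions made for illustration; they are not taken from the slides, and real implementations normally use random or k-means++ initialisation.

```python
def sq_dist(x, y):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(x, y))


def kmeans(points, k, n_iter=100):
    """Cluster `points` (a list of numeric lists) into k groups.

    Assumption: centroids are seeded with the first k points so the
    result is deterministic. This is for illustration only.
    """
    if k < 1 or k > len(points):
        raise ValueError("k must be between 1 and the number of points")
    centroids = [list(p) for p in points[:k]]
    labels = None
    for _ in range(n_iter):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: sq_dist(p, centroids[c]))
            for p in points
        ]
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
    return labels, centroids


cells = [[0, 0], [0, 1], [10, 10], [10, 11]]
labels, centroids = kmeans(cells, k=2)
```

The only parameter the user must supply is k, which is part of why k-means is attractive here: there is little to tune, and each iteration is linear in the number of cells.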
[Figure: Dimensionality reduction pipeline. Input: N cells × genes → gene filter, cell filter → distance (Euclidean, Minkowski, Manhattan) → reduction of dimensionality (d = first d eigenvectors) → k-means → k clusters. k is known!]
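The three distances named in the pipeline are closely related: Manhattan and Euclidean are the Minkowski distance with p = 1 and p = 2. A minimal sketch of the distance step follows; the function names and the toy expression values are assumptions for illustration, not part of the slides.

```python
def minkowski(x, y, p):
    """Minkowski distance of order p between two equal-length vectors."""
    if p < 1:
        raise ValueError("p must be >= 1")
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1.0 / p)


def manhattan(x, y):
    """Manhattan distance: Minkowski with p = 1."""
    return minkowski(x, y, 1)


def euclidean(x, y):
    """Euclidean distance: Minkowski with p = 2."""
    return minkowski(x, y, 2)


def distance_matrix(cells, metric):
    """N x N matrix of pairwise distances between cells (rows)."""
    return [[metric(a, b) for b in cells] for a in cells]


# Toy data: 3 cells, 2 genes each (hypothetical values).
cells = [[0, 0], [3, 4], [6, 8]]
D = distance_matrix(cells, euclidean)
```

The resulting N × N matrix is what the dimensionality reduction step would take as input before k-means is run on the first d eigenvectors.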
between two clusterings
• If ARI = 0.8, the clustering is very good

[Figure: Pipeline evaluated against a gold standard. Input: N cells × genes → gene filter, cell filter → distance (Pearson, Spearman, Euclidean, Minkowski, Manhattan) → dimensionality reduction (PCA, Spectral, MDS, Spectral Reg.; d = first d eigenvectors) → k clusters. Gold standard is known!]
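When a gold standard is known, the ARI (adjusted Rand index) can be computed directly from the two label vectors. Below is a minimal sketch using the standard pair-counting formula; the function name `adjusted_rand_index` and the toy labels are assumptions for illustration.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """ARI between two clusterings of the same items.

    1.0 means identical partitions (up to renaming of labels);
    values near 0 mean agreement no better than chance.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("label vectors must have the same length")
    n = len(labels_a)
    if n < 2:
        raise ValueError("need at least two items")
    # Pairs of items placed together in both clusterings.
    together_both = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    # Pairs placed together in each clustering separately.
    together_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    together_b = sum(comb(c, 2) for c in Counter(labels_b).values())
    expected = together_a * together_b / comb(n, 2)
    maximum = (together_a + together_b) / 2
    if maximum == expected:
        # Degenerate case (e.g. both clusterings put everything in one cluster).
        return 1.0
    return (together_both - expected) / (maximum - expected)


gold = [0, 0, 1, 1]
found = [1, 1, 0, 0]  # same partition, labels renamed
score = adjusted_rand_index(gold, found)  # 1.0
```

Because the index depends only on which cells are grouped together, renaming the clusters does not change the score, so the k-means labels can be compared with the gold standard without matching cluster names first.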