Reference-free comparative transcriptomics

Single-cell RNA-seq is a powerful technology for identifying novel and known cell types, but its power is limited to organisms with well-annotated genomes. We present a reference-free method to compare single cells both within and across species. In this method, k-mers from each cell’s RNA-seq profile are randomly subsampled into a compressed representation called a “sketch” using the document-comparison algorithms MinHash or HyperLogLog. For within-species comparisons, the RNA sketches are sufficient. For cross-species comparisons, because protein sequence is more conserved across species than RNA sequence, we translate the RNA k-mers into protein k-mers using six-frame translation and discard all protein k-mers that contain stop codons. We show that this method can “lift over” single-cell RNA-seq annotations from mouse to human, and we compare it to an approach that uses only 1:1 orthologous genes. Thus, k-mer sketches are an efficient way to find shared and unique cell types both within and across species without the need for a reference genome or transcriptome.
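
The pipeline described above can be illustrated with a minimal sketch in Python. It is not the implementation used in this work: it assumes the standard genetic code, a bottom-k variant of MinHash (HyperLogLog is not shown), SHA-1 as the hash function, and reads given in the DNA alphabet (with U mapped to T). All function names are hypothetical, and dropping k-mers that contain ambiguous residues ("X") is an added assumption beyond the stop-codon filter.

```python
import hashlib
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order (assumed here).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA-alphabet sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def translate(seq):
    """Translate complete codons; unknown codons (e.g. containing N) become 'X'."""
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def six_frame_translations(seq):
    """Yield the three forward and three reverse-complement translations."""
    for strand in (seq, reverse_complement(seq)):
        for offset in range(3):
            yield translate(strand[offset:])


def kmers(seq, k):
    """Yield every overlapping substring of length k."""
    return (seq[i:i + k] for i in range(len(seq) - k + 1))


def protein_kmers(read, k):
    """Protein k-mers from all six frames, discarding any with a stop codon."""
    read = read.upper().replace("U", "T")
    kept = set()
    for protein in six_frame_translations(read):
        for kmer in kmers(protein, k):
            # '*' marks a stop codon; skipping 'X' is an extra assumption.
            if "*" not in kmer and "X" not in kmer:
                kept.add(kmer)
    return kept


def hash_kmer(kmer):
    """Map a k-mer to a 64-bit integer using the first 8 bytes of SHA-1."""
    return int.from_bytes(hashlib.sha1(kmer.encode()).digest()[:8], "big")


def minhash_sketch(kmer_set, num_hashes):
    """Bottom-k MinHash: keep the num_hashes smallest hash values."""
    return set(sorted(hash_kmer(kmer) for kmer in kmer_set)[:num_hashes])


def jaccard_estimate(sketch_a, sketch_b, num_hashes):
    """Estimate Jaccard similarity from two bottom-k sketches."""
    union_bottom = sorted(sketch_a | sketch_b)[:num_hashes]
    if not union_bottom:
        return 0.0
    shared = sum(1 for h in union_bottom if h in sketch_a and h in sketch_b)
    return shared / len(union_bottom)


if __name__ == "__main__":
    print(sorted(protein_kmers("ATGGCCTAA", 2)))
    # -> ['GH', 'GL', 'LG', 'MA', 'RP', 'WP']
```

In this sketch, each cell would be represented by one `minhash_sketch` built from the union of the protein k-mers of its reads, and two cells would be compared with `jaccard_estimate`. For within-species comparison, the same `kmers` and `minhash_sketch` functions could be applied directly to the nucleotide reads, skipping translation.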


Olga Botvinnik

May 18, 2019