Single-cell RNA sequencing is a powerful tool for identifying known and novel cell types. However, identifying even known cell types in species with poorly annotated genomes is nontrivial, as 99.999% of the predicted 8.7 million eukaryotic species on Earth have no submitted genome assembly. Additionally, current best practices in comparative transcriptomics rely on identifying orthologous genes, which remains an open problem [3, 4]. Thus, there is an unmet need to quantitatively compare single-cell transcriptomes across species without requiring orthologous gene mapping, gene annotations, or a reference genome. We introduce `kmermaid`, a novel computational method for identifying orthologous cell types and discovering *de novo* orthologous genes across species. After extracting putative protein-coding sequences from RNA-seq reads, `kmermaid` randomly samples k-mers in reduced amino acid alphabets [5-11], which allows transcriptomes spanning a wide range of divergence times to be embedded into a common subspace. We benchmark this genome-agnostic method on the Quest for Orthologs Opisthokonta dataset, demonstrating that k-mers from the human proteome in reduced amino acid alphabets are sufficient to estimate orthology. Using human amino acid sequences, we extract putative protein-coding reads from 239 Opisthokonta species in ENSEMBL and report the best k-mer size and reduced amino acid alphabet for divergence times of up to 1,105 million years ago. Because `kmermaid` skips both traditional alignment and gene orthology assignment, it can a) be applied to transcriptomes from organisms with no or poorly annotated genomes, b) predict protein-coding sequences from raw RNA-seq reads, and c) identify putative functions of protein sequences contributing to shared cell types.
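To make the k-mer pipeline concrete, the sketch below translates a read in all six frames, keeps candidate coding frames, re-encodes them in a reduced amino acid alphabet, and subsamples k-mers by hash value before comparing sets with Jaccard similarity. It is an illustration under stated assumptions, not `kmermaid`'s implementation: the stop-codon filter is a simplistic stand-in for the human-proteome-guided extraction described above, Dayhoff's six amino acid classes are used as one example of a reduced alphabet, and the hash function (BLAKE2b truncated to 64 bits), the default `k=7`, the `scaled` threshold (in the spirit of sourmash's scaled sketches) and all function names are hypothetical choices.

```python
import hashlib

# Standard genetic code; codons enumerated in TCAG order at each position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
CODON_TABLE = dict(zip(CODONS, AMINO_ACIDS))

# Dayhoff's six amino acid classes (one example reduced alphabet);
# the letters a-f are arbitrary labels chosen for this sketch.
DAYHOFF_GROUPS = ["C", "AGPST", "DENQ", "HKR", "ILMV", "FWY"]
DAYHOFF = {aa: "abcdef"[i] for i, group in enumerate(DAYHOFF_GROUPS) for aa in group}

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def translate(nucleotides):
    """Translate one reading frame; codons with ambiguous bases become 'X'."""
    return "".join(
        CODON_TABLE.get(nucleotides[i:i + 3], "X")
        for i in range(0, len(nucleotides) - 2, 3)
    )


def six_frames(read):
    """Translate the three forward and three reverse-complement frames."""
    read = read.upper()
    reverse_complement = read.translate(COMPLEMENT)[::-1]
    return [translate(strand[offset:])
            for strand in (read, reverse_complement)
            for offset in range(3)]


def putative_coding_frames(read):
    """Hypothetical heuristic: keep only frames without a stop codon."""
    return [protein for protein in six_frames(read) if "*" not in protein]


def reduce_alphabet(protein, mapping=DAYHOFF):
    """Re-encode a protein; residues outside the mapping (e.g. X) become 'x'."""
    return "".join(mapping.get(aa, "x") for aa in protein)


def hash_kmer(kmer):
    """Map a k-mer to a 64-bit integer (hash choice is an assumption)."""
    digest = hashlib.blake2b(kmer.encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def sample_kmers(read, k=7, scaled=1, mapping=DAYHOFF):
    """Keep hashes of reduced-alphabet k-mers that fall below 2**64 // scaled."""
    max_hash = 2**64 // scaled
    sampled = set()
    for protein in putative_coding_frames(read):
        reduced = reduce_alphabet(protein, mapping)
        for i in range(len(reduced) - k + 1):
            kmer = reduced[i:i + k]
            if "x" in kmer:
                continue  # skip k-mers that span ambiguous residues
            h = hash_kmer(kmer)
            if h < max_hash:
                sampled.add(h)
    return sampled


def jaccard(a, b):
    """Jaccard similarity of two sets of sampled hashes (0.0 if both empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0
```

Comparing two reads then reduces to set operations, e.g. `jaccard(sample_kmers(read_a), sample_kmers(read_b))`. Because all six frames are translated, a read and its reverse complement yield the same sampled set, and because the Dayhoff classes merge residues such as I, L, M and V, conservative substitutions between species leave the reduced k-mers unchanged. Raising `scaled` keeps a deterministic subset of hashes, trading sensitivity for smaller sketches.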
By enabling analyses across divergent species' transcriptomes in an orthology-, genome-, and gene annotation-agnostic manner, `kmermaid` illustrates the potential of non-model organisms for building the cell type evolutionary tree of life.
 Paper draft: https://czbiohub.github.io/de-novo-orthology-paper/