Towards shared and standardized cell annotations, HCA General Meeting 2021
Cell Annotation Platform
Presentation: Towards shared and standardized cell annotations
Speaker: Peter Kharchenko
28 June 2021
prior biological knowledge, and annotate cells.
Problem
There’s no standard source of accumulated annotations with associated molecular data for researchers to explore.
Kameneva et al., Nature Genetics 2021 https://www.nature.com/articles/s41588-021-00818-x
Problem
• Individual research groups end up annotating (potentially millions of) cells manually, which results in inconsistent terms and labels between groups.
• This approach cannot scale. We need a solution for creating comprehensive references with a standardized nomenclature for all species.
• There’s no medium for researchers to compare annotations across studies and potentially resolve conflicting results.
• There’s no central location to access annotations used in publications.
Opportunity
A standardized source of annotations could:
1. Make life easier
2. Enable machine learning efforts
3. Facilitate common terminology, including use of standardized ontology terms
and store annotations
• Infrastructure to accumulate, share, and analyze annotation terms with associated molecular signatures to interpret cellular identities
• Centralized repository of cell annotations, empowering researchers to reproduce analyses and to investigate relationships between annotation terms
• Encourage researchers to converge upon consensus nomenclature
2021
• Download annotations in standardized formats
• Search published annotations organized by metadata
• View alternative versions and explore data
• Analyze relationships between terms, including potential synonyms and hierarchies
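One way the relationship between two annotation terms could be analyzed is by comparing the sets of cells that carry each label. The sketch below is an illustration under stated assumptions, not the platform's actual method: the function name `relate_terms`, the 0.8 threshold, and the returned labels are all hypothetical. It uses Jaccard similarity to flag potential synonyms and set containment to flag a potential hierarchy.

```python
def relate_terms(cells_a, cells_b, synonym_threshold=0.8):
    """Classify how two annotation terms relate, given the cell IDs each one labels.

    Hypothetical helper: the threshold and category names are assumptions.
    """
    a, b = set(cells_a), set(cells_b)
    if not a or not b:
        return "unknown"

    shared = a & b
    # Jaccard similarity: shared cells divided by all cells labeled by either term.
    jaccard = len(shared) / len(a | b)

    if jaccard >= synonym_threshold:
        return "potential synonym"
    if a < b:  # every cell labeled A is also labeled B
        return "a is narrower than b"
    if b < a:  # every cell labeled B is also labeled A
        return "b is narrower than a"
    if shared:
        return "overlapping"
    return "unrelated"
```

For example, a term that labels a strict subset of another term's cells would be reported as the narrower of the two, which hints at a parent/child relationship worth reviewing by hand.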
a paper
• Contains one or more datasets, which can be from different organisms, platforms, etc.
• Can be released to the public (published)
• Project publications are tracked and assigned accession IDs that can be cited in manuscripts
• Additional files can be attached to a publication
• Published annotations can be downloaded in standardized formats
CAP website: Project
deployed via Cloud Run
• Workloads distributed with Cloud Load Balancing
• Bioinformatic processing on autoscaled VMs
• Event-driven hooks with Pub/Sub messaging
• Data structured and stored in Cloud SQL
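To make the event-driven hooks concrete, here is a minimal sketch of how a service might decode a Pub/Sub push delivery and route it to a handler. The envelope shape (a `message` object with base64-encoded `data` and optional `attributes`) follows the Pub/Sub push format. Everything else is an assumption for illustration: the `event_type` attribute, the `dataset.uploaded` event name, the `dataset_id` field, and the handler registry are hypothetical and are not taken from the platform.

```python
import base64
import json

# Hypothetical registry mapping an event type to the function that handles it.
HANDLERS = {}


def on(event_type):
    """Decorator that registers a handler for one (assumed) event type."""
    def register(func):
        HANDLERS[event_type] = func
        return func
    return register


def decode_push_envelope(envelope):
    """Return (attributes, payload) from a Pub/Sub push request body."""
    message = envelope["message"]
    # Push deliveries carry the message body as base64-encoded bytes.
    payload = json.loads(base64.b64decode(message["data"]).decode("utf-8"))
    return message.get("attributes", {}), payload


@on("dataset.uploaded")  # hypothetical event name
def start_processing(payload):
    # Placeholder for kicking off bioinformatic processing on a worker VM.
    return f"queue processing for {payload['dataset_id']}"


def dispatch(envelope):
    """Route a decoded message to its handler; return None if none is registered."""
    attributes, payload = decode_push_envelope(envelope)
    handler = HANDLERS.get(attributes.get("event_type"))
    return handler(payload) if handler else None
```

Keeping the decoding step separate from the handlers means new hooks can be added by registering another function, without touching the transport code.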
TypeScript
• Recoil for advanced state management
• GraphQL Apollo Client for data management
• Tailwind CSS for styling
• deck.gl for complex data visualization
• TileDB and SQL for data management
• Elasticsearch for dataset queries
• Term Editing, with Suggestions
• Confidence scores
• Integration of Cell Ontology
• Helping to Annotate
• Annotation Transfer support
• Registry of Automated Annotators
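The features above suggest a record that ties a free-text label to a confidence score and an optional Cell Ontology term. The following is a minimal sketch under stated assumptions: the `Annotation` fields, the `suggest_terms` helper, and the three-entry vocabulary are hypothetical and only illustrate the idea. A real integration would query the full Cell Ontology rather than a hard-coded dictionary.

```python
from dataclasses import dataclass
from typing import Optional

# Tiny illustrative vocabulary: label -> Cell Ontology ID.
VOCABULARY = {
    "T cell": "CL:0000084",
    "B cell": "CL:0000236",
    "neuron": "CL:0000540",
}


@dataclass
class Annotation:
    """One label applied to a set of cells (hypothetical schema)."""
    cell_set: str
    label: str
    confidence: float  # assumed to be a score between 0 and 1
    ontology_id: Optional[str] = None

    def __post_init__(self):
        if not 0.0 <= self.confidence <= 1.0:
            raise ValueError("confidence must be between 0 and 1")


def suggest_terms(partial, vocabulary=VOCABULARY):
    """Return (label, ontology ID) pairs whose label contains the typed text."""
    query = partial.strip().lower()
    if not query:
        return []
    return sorted(
        (name, term_id)
        for name, term_id in vocabulary.items()
        if query in name.lower()
    )
```

A term editor could call `suggest_terms` as the user types and store the chosen ID on the `Annotation`, so that free-text labels stay linked to a standardized term.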
Collections
• Many datasets are integrated
• Batch information is critical for ML
• Dataset IDs / Cell IDs are poorly defined
• Granularity and Annotation Help
• Different granularities are needed
• Suggestions vs. Batch Automated Annotations
(Broad Institute)
• Tim Tickle
• Helmholtz Zentrum München
• EMBL-EBI
• eBook Applications
Special acknowledgements to: Nick Akhmetov, Denis Ilguzin, Evan Biederstedt, Maxim Svetlakov, Colin Maher