Towards shared and standardized cell annotations, HCA General Meeting 2021

Evan Biederstedt
June 30, 2021

HCA General Meeting 2021
Cell Annotation Platform
Presentation: Towards shared and standardized cell annotations
Speaker: Peter Kharchenko
28 June 2021

Transcript

  1. Motivation
    Researchers manually examine prominent molecular patterns in light of prior biological knowledge, and annotate cells.
    Problem: There’s no standard source of accumulated annotations with associated molecular data for researchers to explore.
    Kameneva et al., Nature Genetics 2021 https://www.nature.com/articles/s41588-021-00818-x
  2. Motivation
    Problem: There’s no standard source of accumulated annotations with associated molecular data for researchers to explore.
    Kameneva et al., Nature Genetics 2021 https://www.nature.com/articles/s41588-021-00818-x
  3. Motivation
    Problem: There’s no standard source of accumulated annotations with associated molecular data for researchers to explore.
    • Individual research groups end up annotating (potentially millions of) cells manually, which leaves cells with inconsistent terms and labels between groups.
    • This approach cannot scale. We need a solution for creating comprehensive references with a standardized nomenclature for all species.
    • There’s no medium for researchers to compare annotations across studies, potentially resolving conflicting results.
    • There’s no central location to access annotations used in publications.
    Kameneva et al., Nature Genetics 2021 https://www.nature.com/articles/s41588-021-00818-x
  4. Motivation
    Problem: There’s no standard source of accumulated annotations with associated molecular data for researchers to explore.
    Opportunity: A standardized source of annotations could:
    1. Make life easier
    2. Enable machine learning efforts
    3. Facilitate common terminology, including use of standardized ontology terms
    Kameneva et al., Nature Genetics 2021 https://www.nature.com/articles/s41588-021-00818-x
  5. Cell Annotation Platform (CAP)
    • Community-driven platform to create, explore, and store annotations
    • Infrastructure to accumulate, share, and analyze annotation terms with associated molecular signatures to interpret cellular identities
    • Centralized repository of cell annotations, empowering researchers to reproduce analyses and to investigate relationships between annotation terms
    • Encourage researchers to converge upon consensus nomenclature
  6. Cell Annotation Platform (CAP)
    Main Components
    • Data Repository
    • Annotation Upload and Publication
    • Annotation UI
    • “CellCards” Reference Summaries
  8. Cell Annotation Platform (CAP)
    Release, Summer 2021
    • Download annotations in standardized formats
    • Search published annotations organized by metadata
    • View alternative versions and explore data
    • Analyze relationships between terms, including potential synonyms and hierarchies
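One simple way to probe the relationship between two annotation terms, as the last bullet describes, is to compare the sets of cells each term labels. The following is a minimal sketch under stated assumptions, not CAP's actual method: the function name `relate_terms`, the 0.9 threshold, and the cell IDs are all hypothetical.

```python
def relate_terms(cells_a, cells_b, threshold=0.9):
    """Guess how two terms relate from the cells they label (illustrative heuristic)."""
    a, b = set(cells_a), set(cells_b)
    if not a or not b:
        return "unknown"
    shared = len(a & b)
    if shared == 0:
        return "disjoint"
    # Jaccard index: shared cells divided by all cells carrying either term.
    if shared / len(a | b) >= threshold:
        return "possible synonym"
    # Nearly all cells of one term fall inside the other: candidate hierarchy.
    if shared / len(a) >= threshold:
        return "a within b"
    if shared / len(b) >= threshold:
        return "b within a"
    return "partial overlap"


# Hypothetical cell IDs: the second term covers the first plus two more cells.
relationship = relate_terms({"c1", "c2"}, {"c1", "c2", "c3", "c4"})
```

Here `relationship` is `"a within b"`, hinting that the first term may be a subtype of the second. Cell overlap only makes sense when both terms annotate the same dataset; comparing terms across datasets would need molecular signatures instead.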
  9. CAP website: Homepage
    • Browse Datasets & Cell Labels
    • Upload
    • Search Quick Queries
    • Species
    • Organs
    • Log-in & Sign up
    • User authentication & permissions
  10. CAP website: Uploads
    Upload annotation files and metadata
    File formats we currently support:
    • AnnData
    • Seurat
    • SingleCellExperiment
    • CAP annotation file format
  11. CAP website: Project
    • Project: “Repo” for researchers to organize annotations
    • Datasets: Cell annotations with/without molecular data
    • Cell Label: Term associated with a cell or molecular subpopulation
  12. CAP website: Project
    • A collection of dataset annotations
    • Typically corresponding to a paper
    • Contains one or more datasets
      • Can be from different organisms, platforms, etc.
    • Can be released to the public (published)
    • Project publications are tracked and assigned accession IDs that can be cited in manuscripts
    • Additional files can be attached to a publication
    • Published annotations can be downloaded in standardized formats
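The project, dataset, and cell label concepts described on these slides can be pictured as a small data model. This is only a sketch to make the relationships concrete: the class names, field names, and the accession ID string are assumptions, not CAP's schema.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Dataset:
    name: str
    organism: str
    platform: str
    # Cell label -> IDs of the cells carrying that label.
    cell_labels: Dict[str, List[str]] = field(default_factory=dict)
    has_molecular_data: bool = True  # annotations may come with or without molecular data


@dataclass
class Project:
    title: str
    datasets: List[Dataset] = field(default_factory=list)
    accession_id: Optional[str] = None  # set only once the project is published

    @property
    def is_published(self) -> bool:
        return self.accession_id is not None

    def publish(self, accession_id: str) -> None:
        """Release the project publicly under a citable accession ID."""
        if not self.datasets:
            raise ValueError("a project must contain at least one dataset")
        self.accession_id = accession_id


# Datasets in one project may come from different organisms and platforms.
project = Project("Example paper")
project.datasets.append(Dataset("ds-1", "mouse", "platform-x", {"neuron": ["c1", "c2"]}))
project.datasets.append(Dataset("ds-2", "human", "platform-y", has_molecular_data=False))
project.publish("EXAMPLE-0001")  # placeholder string, not a real accession format
```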
  13. CAP website: Profile
    • User profile page (based on GitHub)
    • Username & Short Bio
    • Location to organize projects
  14. CAP website: Dataset Search
    Browse Datasets
    Search Quick Queries
    • Project
    • Dataset
    • Species
    • Organs
    • Assay
    • Number of Cells
    Examples:
    • Find “mouse brain” datasets. Filter by assay.
    • Find “immune cells, human liver”
    • Find datasets with >1e6 cells
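The example queries on this slide amount to filtering dataset records by metadata facets. A minimal in-memory sketch follows; the `DatasetRecord` schema, the `search` function, and the catalog entries are made up for illustration and do not reflect CAP's search API.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class DatasetRecord:
    # Fields mirror the quick-query facets on the slide; the schema is hypothetical.
    project: str
    dataset: str
    species: str
    organ: str
    assay: str
    n_cells: int


def search(
    records: Iterable[DatasetRecord],
    species: Optional[str] = None,
    organ: Optional[str] = None,
    assay: Optional[str] = None,
    more_than_cells: Optional[float] = None,
) -> List[DatasetRecord]:
    """Return records matching every facet that was supplied."""
    def matches(r: DatasetRecord) -> bool:
        return (
            (species is None or r.species == species)
            and (organ is None or r.organ == organ)
            and (assay is None or r.assay == assay)
            and (more_than_cells is None or r.n_cells > more_than_cells)
        )
    return [r for r in records if matches(r)]


# Made-up catalog entries for illustration only.
CATALOG = [
    DatasetRecord("project-a", "ds-1", "mouse", "brain", "assay-x", 250_000),
    DatasetRecord("project-a", "ds-2", "mouse", "brain", "assay-y", 40_000),
    DatasetRecord("project-b", "ds-3", "human", "liver", "assay-x", 900_000),
    DatasetRecord("project-c", "ds-4", "human", "lung", "assay-x", 1_500_000),
]

mouse_brain = search(CATALOG, species="mouse", organ="brain")
mouse_brain_y = search(CATALOG, species="mouse", organ="brain", assay="assay-y")
large = search(CATALOG, more_than_cells=1e6)
```

The “immune cells, human liver” example would additionally need a lookup over cell labels, which this sketch leaves out.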
  15. CAP website: Data visualization
    • Initial Data Exploration UI
    • Analyze annotations & metadata of embeddings (if available)
    • Selection of clusters, or individual cells
    • Future releases
      • Cell x Gene Integration
      • Annotation UI
  16. Architecture
    Google Cloud Platform
    • All applications containerized and versioned, deployed via Cloud Run
    • Workloads distributed with Cloud Load Balancing
    • Bioinformatic processing on autoscaled VMs
    • Event-driven hooks with Pub/Sub messaging
    • Data structured and stored in Cloud SQL
  17. Architecture
    Tech Stack
    • React + Next.js frameworks implemented with TypeScript
    • Recoil for advanced state management
    • GraphQL Apollo Client for data management
    • Tailwind CSS for styling
    • deck.gl for complex data visualization
    • TileDB and SQL for data management
    • Elasticsearch for dataset queries
  18. Cell Annotation Platform (CAP)
    Next Steps
    • Towards Common Nomenclature
      • Term Editing, with Suggestions
      • Confidence scores
      • Integration of Cell Ontology
    • Helping to Annotate
      • Annotation Transfer support
      • Registry of Automated Annotators
  19. Upcoming Features
    Interactive Annotation UI
    · View molecular data
    · Differential expression analysis
    · Interactively annotate molecular subpopulations
    “CellCards” View
    · Informative reference pages for each cell term
    · Molecular signatures + marker genes - interactive rows/columns
    · Related datasets
    · External links to references
    Annotation
    · CellName: cell label, with definition
    · CellSets: cell labels mapped to cell IDs
    · Metadata: species, protocol, organ, tissue, etc.
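The three parts of an annotation listed above (CellName, CellSets, Metadata) can be pictured as a nested mapping. The sketch below is an assumption about shape only, not the CAP annotation file format: the key names, labels, cell IDs, and helper functions are hypothetical, and the definitions are placeholders.

```python
from typing import Dict, List

annotation = {
    # CellName: cell label, with definition (placeholder text here).
    "cell_names": {
        "T cell": "placeholder definition",
        "CD8+ T cell": "placeholder definition",
    },
    # CellSets: cell labels mapped to cell IDs.
    "cell_sets": {
        "T cell": ["c1", "c2", "c3"],
        "CD8+ T cell": ["c2"],
    },
    # Metadata: species, protocol, organ, tissue, etc.
    "metadata": {"species": "human", "organ": "liver"},
}


def labels_per_cell(cell_sets: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Invert label -> cell IDs into cell ID -> sorted list of labels."""
    per_cell: Dict[str, List[str]] = {}
    for label, cell_ids in cell_sets.items():
        for cell_id in cell_ids:
            per_cell.setdefault(cell_id, []).append(label)
    return {cell_id: sorted(labels) for cell_id, labels in per_cell.items()}


def undefined_labels(record: dict) -> List[str]:
    """List labels used in cell_sets that have no entry in cell_names."""
    return sorted(set(record["cell_sets"]) - set(record["cell_names"]))


by_cell = labels_per_cell(annotation["cell_sets"])
missing = undefined_labels(annotation)
```

A cell may carry more than one label, for example a broad term and a finer subtype, so the inverted view keeps a list of labels per cell rather than a single value.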
  20. Cell Annotation Platform (CAP)
    Questions / Uncertainties
    • Datasets vs. Collections
      • Many datasets are integrated
      • Batch information is critical for ML
      • Dataset IDs / Cell IDs are poorly defined
    • Granularity and Annotation Help
      • Different granularities are needed
      • Suggestions vs. Batch Automated Annotations
  21. Jane Cooper Development
    • Anna Hupalowska
    • Single Cell Portal (Broad Institute)
    • Tim Tickle
    • Helmholtz Zentrum München
    • EMBL-EBI
    • eBook Applications
    Special acknowledgements to: Nick Akhmetov, Denis Ilguzin, Evan Biederstedt, Maxim Svetlakov, Colin Maher