
HCA Biological Network Seminar, 9 Feb 2023

Evan Biederstedt
February 19, 2023

HCA Biological Network Seminar: Updates on Atlas Integration and Cell Annotation Platform (CAP)
https://www.youtube.com/watch?v=jGgShE-s_Hg

CAP walkthrough as of Feb 2023:
https://www.youtube.com/watch?v=QfaTrY9Fn-U


Transcript

  1. Evan Biederstedt. Defining Cell Types and States for the Human Cell Atlas
     and Beyond. Cell Annotation Platform
  2. Outline: 1. Motivation 2. CAP overview 3. Current features & live demo
     4. Upcoming features 5. Outreach / questions
  3. Motivation: Cell Annotations. Researchers manually examine prominent
     molecular patterns in light of prior biological knowledge, and annotate
     cells. https://www.nature.com/articles/s41588-021-00818-x
  4. Motivation: Cell Annotations. MKI67: gene associated with cellular
     proliferation. HBB: gene associated with hemoglobin production. PAX2: gene
     associated with kidney development.
     https://www.nature.com/articles/s41588-021-00818-x
  5. Motivation: Cell Annotations. Problem: our community has been manually
     creating cell annotations. Researchers then publish figures with cell
     labels in scientific journals. Such an approach cannot scale, and it is
     not consistent enough to build precise & accurate atlases.
     https://www.nature.com/articles/s41588-021-00818-x
  6. Problems. (Figure: example conflicting annotations: NK Cells / Cytokines
     / Monocytes.) (A) Need a medium for researchers to compare annotations
     across studies, potentially resolving conflicting results. (B) Individual
     research groups end up annotating (potentially millions of) cells
     manually, which results in cells with inconsistent terms and labelings
     between groups. (C) This approach cannot scale. We need a solution for
     creating comprehensive, accurate references with a standardized
     nomenclature.
  7. Problems. (Figure: example conflicting annotations: NK Cells / Cytokines
     / Monocytes.) (A) Need a medium for researchers to compare annotations
     across studies, potentially resolving conflicting results. (B) Individual
     research groups end up annotating (potentially millions of) cells
     manually, which results in cells with inconsistent terms and labelings
     between groups. (C) This approach cannot scale. We need a solution for
     creating comprehensive, accurate references with a standardized
     nomenclature.
  8. Why do we need a medium to compare annotations? • Research groups end up
     calling different cells by the same term • Research groups study the same
     thing & call it something different. Comparisons between publication plots
     are usually not informative. How did research group A define cell types?
     Why/how do research groups A & B disagree?
  9. Problems. (Figure: example conflicting annotations: NK Cells / Cytokines
     / Monocytes.) (A) There's no medium for researchers to compare annotations
     across studies, potentially resolving conflicting results. (B) Individual
     research groups end up annotating (potentially millions of) cells
     manually, which results in cells with inconsistent terms and labelings
     between groups. (C) This approach cannot scale. We need a solution for
     creating comprehensive references with a standardized nomenclature for all
     species.
  10. Inconsistent terms • Free text • Abbreviations • "Broad" labels vs
     "precise" labels. https://www.nature.com/articles/s41588-021-00818-x
     https://www.science.org/doi/10.1126/science.aay0267
  11. Problems. (Figure: example conflicting annotations: NK Cells / Cytokines
     / Monocytes.) (A) There's no medium for researchers to compare annotations
     across studies, potentially resolving conflicting results. (B) Individual
     research groups end up annotating (potentially millions of) cells
     manually, which results in cells with inconsistent terms and labelings
     between groups. (C) This approach cannot scale. We need a solution for
     creating comprehensive, accurate references with a standardized
     nomenclature.
  12. Manual annotation alone: • Time-consuming • Error-prone •
     Reproducibility issues. Not scalable for the challenge of reference
     atlases.
  13. Cell Annotation Platform (CAP) • Community-driven platform to create,
     explore, and store annotations • Infrastructure to accumulate, share, and
     analyze annotation terms with associated molecular signatures to interpret
     cellular identities • Encourage researchers to converge upon a consensus
     nomenclature
  14. Cell Annotation Platform (CAP): Main Components • Data Repository •
     Annotation Upload and Publication • Annotation UI: Browse & Create
     Annotations • "CellCards" Reference Summaries
  15. Cell Annotation Platform (CAP): Basic User Workflow. 1. Upload HCA
     datasets 2. Generate cell annotations 3. Publish for the public 4. Browse
     cell annotations + molecular signatures 5. Download standardized cell
     annotations (sketch below)
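
[Editor's note] To make the five steps concrete, here is a minimal runnable
mock of the workflow in Python. Every function and field name below is
invented for illustration; CAP's real interface is browser-based and is not
described at this level in the talk.

```python
# Runnable mock of the five-step CAP workflow. All names are invented
# for illustration and bear no relation to CAP's actual API.
def upload_dataset(store: dict, name: str) -> None:                       # step 1
    store[name] = {"annotations": {}, "public": False}

def annotate(store: dict, name: str, cluster: str, label: str) -> None:   # step 2
    store[name]["annotations"][cluster] = label

def publish(store: dict, name: str) -> None:                              # step 3
    store[name]["public"] = True   # on CAP: timestamped, versioned, DOI-issued

def browse(store: dict) -> dict:                                          # step 4
    return {n: d["annotations"] for n, d in store.items() if d["public"]}

datasets: dict = {}
upload_dataset(datasets, "lung_scRNAseq")
annotate(datasets, "lung_scRNAseq", "cluster_3", "natural killer cell")
publish(datasets, "lung_scRNAseq")
print(browse(datasets))   # step 5: inspect/download standardized annotations
```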
  16. CAP organization • Workspace: Collaborative "repo" for researchers to
     organize datasets • Publication: Version for the public, corresponding to
     a scientific publication • Datasets: Cell annotations with molecular data
     • Cell Label: Term associated with a cell or molecular subpopulation.
     (sketch below)
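
[Editor's note] One way to picture this organization is as a nested structure,
Workspace containing Datasets and Publications, Datasets containing Cell
Labels. The keys in this sketch are illustrative assumptions, not CAP's
schema.

```python
# The Workspace > Dataset > Cell Label hierarchy, with Publications as
# public snapshots; all keys are illustrative assumptions.
workspace = {
    "name": "lung-bionetwork",                      # collaborative "repo"
    "datasets": [
        {
            "name": "donor1_scRNAseq",              # annotations + molecular data
            "cell_labels": [
                {"term": "natural killer cell", "ontology_id": "CL:0000623"},
            ],
        },
    ],
    "publications": [
        {"title": "Lung atlas annotations", "version": "1.0"},  # public snapshot
    ],
}
print(workspace["datasets"][0]["cell_labels"][0]["term"])
```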
  17. Controlled Data Access. User roles: • viewer (read-only) • editor (write
     access) • owner (administrative). Keep your data private until ready!
     Save until ready for the public, and then publish. (sketch below)
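
[Editor's note] A three-tier role model like this is commonly enforced with a
simple ordered permission check. The sketch below is a generic illustration of
that pattern, not CAP's implementation.

```python
# Generic role-based access check; illustrative only, not CAP's code.
from enum import IntEnum

class Role(IntEnum):
    VIEWER = 1   # read-only
    EDITOR = 2   # write access
    OWNER = 3    # administrative (publish, manage members)

def can(role: Role, action: str) -> bool:
    """A user may perform an action if their role meets the minimum level."""
    required = {"view": Role.VIEWER, "edit": Role.EDITOR, "publish": Role.OWNER}
    return role >= required[action]

assert can(Role.EDITOR, "edit") and not can(Role.EDITOR, "publish")
```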
  18. Publications • Collections of datasets, typically corresponding to a
     scientific journal article • Timestamped • DOIs for citations in journals
     • Versioning • Download annotations in standardized formats (sketch below)
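
[Editor's note] Single-cell annotations are commonly exchanged as AnnData
(.h5ad) files; assuming the standardized download follows that convention (an
assumption, the talk does not specify the format), consuming one might look
like the sketch below. The column names are hypothetical.

```python
# Reading downloaded annotations, assuming an AnnData (.h5ad) export;
# the obs column names below are hypothetical.
import anndata as ad

adata = ad.read_h5ad("publication_v1.h5ad")

# Cell-level annotations live in the per-cell metadata table (adata.obs)
print(adata.obs[["cell_label", "cell_ontology_id"]].head())  # hypothetical columns
print(adata.obs["cell_label"].value_counts())
```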
  19. Publications • Collections of datasets, typically corresponding to a
     scientific journal article • Timestamped • DOIs for citations in journals
     • Versioning • Download annotations in standardized formats
  20. Why Versioning? Cell annotations become more refined over time, and
     revisions need coherent tracking of changes. • Timestamped • DOIs for
     citations in journals • Versioning
  21. Workspace • Collaborative space to edit collections of annotations &
     other relevant metadata • Advanced user form • Allow user to "hide"
     irrelevant metadata within a dataset • Specify which annotations & which
     metadata fields are relevant
  22. Workspace • Autocomplete recommendations (with synonyms and related
     terms) from Open Biomedical Ontologies • "Nudges" to encourage consensus
     and standardization (if possible) but no requirements (sketch below)
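
[Editor's note] Autocomplete against OBO ontologies is often backed by a term
lookup service such as EBI's Ontology Lookup Service (OLS). The sketch below
queries the Cell Ontology through OLS as one plausible backend; the service
and response fields CAP actually uses are assumptions.

```python
# Suggest Cell Ontology terms for an autocomplete box via EBI's
# Ontology Lookup Service. One plausible backend; not necessarily
# the service CAP uses.
import requests

def suggest_terms(prefix: str, limit: int = 5) -> list[tuple[str, str]]:
    resp = requests.get(
        "https://www.ebi.ac.uk/ols4/api/search",
        params={"q": prefix, "ontology": "cl", "rows": limit},
        timeout=10,
    )
    resp.raise_for_status()
    docs = resp.json()["response"]["docs"]
    return [(d["label"], d.get("obo_id", "")) for d in docs]

# e.g. suggest_terms("natural killer") -> [("natural killer cell", "CL:0000623"), ...]
```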
  23. Record: Synonyms & Evidence & Marker Genes • Synonyms • Categories, e.g.
     "CD8+ T cell" is a subset of "T Lymphocyte" • Evidence: rationales for
     annotating the cell • List of marker genes used (sketch below)
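
[Editor's note] Gathered together, such a record might look like the sketch
below. The field names are illustrative assumptions, not CAP's published
schema.

```python
# Illustrative annotation record; field names are assumptions.
from dataclasses import dataclass, field

@dataclass
class AnnotationRecord:
    label: str                          # e.g. "CD8+ T cell"
    synonyms: list[str] = field(default_factory=list)
    parent_category: str | None = None  # e.g. "T Lymphocyte"
    evidence: str = ""                  # rationale for the annotation
    marker_genes: list[str] = field(default_factory=list)

rec = AnnotationRecord(
    label="CD8+ T cell",
    synonyms=["CD8-positive T cell"],
    parent_category="T Lymphocyte",
    evidence="High CD8A/CD8B expression; cytotoxic program",
    marker_genes=["CD8A", "CD8B", "GZMK"],
)
```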
  24. Interactive Exploration: Molecular Data. For every dataset on CAP, any
     user may: • Explore the annotations associated with the dataset • Select
     cells on the embedding • Explore heat maps with precalculated DE values
     for each annotation • Using the selection tool, select cells and calculate
     new DE values (sketch below)
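
[Editor's note] Per-annotation differential expression is the kind of
computation scanpy's rank_genes_groups performs. The sketch below shows the
analogous computation done locally; it illustrates the statistics, not CAP's
backend, and the "cell_label" column is a hypothetical name.

```python
# DE values per annotation, analogous to CAP's precalculated heat maps;
# uses scanpy locally. The "cell_label" obs column is hypothetical.
import scanpy as sc

adata = sc.read_h5ad("dataset.h5ad")          # annotated dataset
sc.pp.normalize_total(adata, target_sum=1e4)  # standard preprocessing
sc.pp.log1p(adata)

# Wilcoxon rank-sum DE of each annotated group vs the rest
sc.tl.rank_genes_groups(adata, groupby="cell_label", method="wilcoxon")

# Top differential genes for one annotation, e.g. NK cells
df = sc.get.rank_genes_groups_df(adata, group="NK cell")
print(df.head(10))
```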
  25. Manual Annotation via the Browser. Annotate cells via the browser: •
     Users select cells (based either on predefined clusters, or selections
     via the selection tool), and add cell annotations • UI basis for cell
     predictions (next slides) (sketch below)
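
[Editor's note] The programmatic equivalent of this point-and-click step is
mapping cluster IDs to labels. A minimal sketch, assuming a clustered AnnData
object with a "leiden" column; the cluster-to-label mapping is invented.

```python
# Script-side equivalent of annotating selected clusters in the UI;
# assumes a clustered AnnData object with a "leiden" obs column.
import scanpy as sc

adata = sc.read_h5ad("clustered.h5ad")

cluster_to_label = {                 # decided by inspecting marker genes
    "0": "monocyte",
    "1": "natural killer cell",
    "2": "CD8+ T cell",
}
adata.obs["cell_label"] = (
    adata.obs["leiden"].astype(str).map(cluster_to_label).fillna("unannotated")
)
```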
  26. Problems. (Figure: example conflicting annotations: NK Cells / Cytokines
     / Monocytes.) (A) Need a medium for researchers to compare annotations
     across studies, potentially resolving conflicting results. (B) Individual
     research groups end up annotating (potentially millions of) cells
     manually, which results in cells with inconsistent terms and labelings
     between groups. (C) This approach cannot scale. We need a solution for
     creating comprehensive, accurate references with a standardized
     nomenclature.
  27. Solution. (A) Need a medium for researchers to compare annotations
     across studies, potentially resolving conflicting results. (B) Individual
     research groups end up annotating (potentially millions of) cells
     manually, which results in cells with inconsistent terms and labelings
     between groups. (C) This approach cannot scale. We need a solution for
     creating comprehensive references with a standardized nomenclature for
     all species. • Researchers can now explore DEGs defining cell types (on
     all HCA data hosted on CAP) • UI allows users within BioNetworks to begin
     resolving differences based on the molecular data (more on this later)
  28. Solution. (B) Individual research groups end up annotating (potentially
     millions of) cells manually, which results in cells with inconsistent
     terms and labelings between groups. (C) This approach cannot scale. We
     need a solution for creating comprehensive references with a standardized
     nomenclature for all species. • Required metadata associated with cell
     labels • A single string +/- ontology ID is not enough: sufficient for a
     scientific publication, but not sufficient for building accurate atlases.
  29. Solution. (B) Individual research groups end up annotating (potentially
     millions of) cells manually, which results in cells with inconsistent
     terms and labelings between groups. (C) This approach cannot scale. We
     need a solution for creating comprehensive references with a standardized
     nomenclature for all species. Existing strategies: • Free text •
     Free-text label paired with an associated Cell Ontology label. Proposed:
     stronger metadata requirements for cell annotations. Goal: • Build
     precise, accurate atlases • Refine the Cell Ontology
  30. Solution: annotation schema proposal (discussions underway).
     https://www.nature.com/articles/s41556-021-00787-7 • What information
     would we need to improve the Cell Ontology? • What fields/information
     have biologists asked to see on CAP? • What information would we need to
     resolve differences between cell annotations? • What information could we
     collect to accurately construct a cell atlas?
  31. Basic User Workflow: predictions are hints. Users must
     accept/decline/edit these individually. Uncertainty -> manual curation.
     (Yosef Lab) (sketch below)
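
[Editor's note] The "uncertainty -> manual curation" rule can be pictured as a
confidence gate over model output. A minimal sketch, assuming predictions
carry per-cell confidence scores (an assumption; the talk does not describe
the model output format, and all column names are hypothetical).

```python
# Triage predicted labels: surface confident ones as accept/decline
# hints, queue the uncertain rest for manual curation. Columns and
# threshold are hypothetical.
import pandas as pd

preds = pd.DataFrame({
    "cell_id":    ["c1", "c2", "c3"],
    "predicted":  ["NK cell", "monocyte", "CD8+ T cell"],
    "confidence": [0.97, 0.41, 0.88],
})

THRESHOLD = 0.8
suggested = preds[preds["confidence"] >= THRESHOLD]   # shown as hints in the UI
to_curate = preds[preds["confidence"] < THRESHOLD]    # routed to manual annotation

print(f"{len(suggested)} hints, {len(to_curate)} cells need manual review")
```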
  32. Annotation Transfer • REF dataset used to transfer cell annotations to a
     QUERY dataset • Promises to overcome the bottleneck posed by cell
     annotation. Clarke, Z.A., Andrews, T.S., Atif, J. et al. Tutorial:
     guidelines for annotating single-cell transcriptomic maps using automated
     and manual methods. Nat Protoc 16, 2749–2764 (2021).
  33. Annotation Transfer: User Workflow • User chooses model & transfer
     algorithm • View predictions imposed on molecular data:
     accept/edit/decline • Publish/share/compare. Clarke, Z.A., Andrews, T.S.,
     Atif, J. et al. Tutorial: guidelines for annotating single-cell
     transcriptomic maps using automated and manual methods. Nat Protoc 16,
     2749–2764 (2021). (sketch below)
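
[Editor's note] Many transfer algorithms could sit behind this workflow;
scanpy's ingest is one widely used example of REF-to-QUERY label transfer,
shown here purely as an illustration, not as the algorithm CAP uses. The
"cell_label" column and file names are assumptions.

```python
# REF -> QUERY label transfer with scanpy's ingest; one example
# algorithm, not necessarily the one behind CAP's UI.
import scanpy as sc

ref = sc.read_h5ad("annotated_reference.h5ad")   # has ref.obs["cell_label"]
query = sc.read_h5ad("unannotated_query.h5ad")

# Restrict both to shared genes so the embeddings are comparable
shared = ref.var_names.intersection(query.var_names)
ref, query = ref[:, shared].copy(), query[:, shared].copy()

# Fit the embedding on the reference, then map the query into it
sc.pp.pca(ref)
sc.pp.neighbors(ref)
sc.tl.umap(ref)
sc.tl.ingest(query, ref, obs="cell_label")       # transferred predictions

print(query.obs["cell_label"].value_counts())
```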
  34. De novo predictions (Felix Fischer) • (Iterative) training on all
     available data • Current model accurately predicts coarse cell types on
     unseen data • Improvements for more granular distinctions (sketch below)
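
[Editor's note] The talk does not specify the model architecture. As a
stand-in illustration of the idea (train on annotated cells, predict labels
for unseen cells), a logistic-regression baseline might look like the sketch
below; the "cell_label" column and file names are assumptions.

```python
# Toy stand-in for de novo cell-type prediction: train a classifier on
# annotated cells, predict on unseen cells. Not the actual model.
import scanpy as sc
from sklearn.linear_model import LogisticRegression

train = sc.read_h5ad("annotated_training_set.h5ad")
unseen = sc.read_h5ad("unseen_cells.h5ad")

# Align to shared genes so feature spaces match (order preserved)
shared = train.var_names.intersection(unseen.var_names)
train, unseen = train[:, shared].copy(), unseen[:, shared].copy()

clf = LogisticRegression(max_iter=1000)        # accepts sparse expression matrices
clf.fit(train.X, train.obs["cell_label"])      # hypothetical label column

unseen.obs["predicted_label"] = clf.predict(unseen.X)
print(unseen.obs["predicted_label"].value_counts())
```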
  35. Current • Data Repository • Annotation Upload and Publication •
     Annotation UI: Browse & Create Annotations • Annotation predictions •
     "CellCards" Reference Summaries
  36. Upcoming • Annotation predictions (talk to us about adding new models!)
     • Improvements to the UI for exploring molecular data + manual
     annotations • Feature requests tailored to individual BioNetworks, e.g.
     consensus for annotating integrated atlases • "CellCards" summary pages
  37. CAP Outreach Activities • BioNetwork user interviews: Work with each HCA
     BioNetwork to develop tailored features and assist with human cell atlas
     annotations • Seminars & hands-on workshops: Users can learn to
     use/navigate CAP for annotating new and/or integrated datasets •
     BioNetwork annotation jamborees: Leverage CAP to empower BioNetworks to
     reach consensus annotations for their HCA v1.0 integration efforts
     through jamborees organized in collaboration with different partners
     (e.g. CZI)
  38. Thank you! • Fabian Theis • Nils Gehlenborg • David Osumi-Sutherland •
     Aviv Regev • John Marioni • Peter Kharchenko • Chloé Villani
  39. Denis Ilguzin, Maxim Svetlakov, Levon Ghukasyan, Michael Loktionov,
     Sultan Arapov, Mary Futey, Nick Akhmetov, Lusine Barseghyan, Tigran
     Markosjan, Konstantin Boyandin, Uğur Bayindir, Pavel Istomin, Dennis
     Bolgov, Andrey Isaev, Evan Biederstedt