HCA Biological Network Seminar, 9 Feb 2023

Evan Biederstedt
February 19, 2023

HCA Biological Network Seminar: Updates on Atlas Integration and Cell Annotation Platform (CAP)
https://www.youtube.com/watch?v=jGgShE-s_Hg

CAP walkthrough as of Feb 2023:
https://www.youtube.com/watch?v=QfaTrY9Fn-U

Transcript

  1. Cell Annotation Platform: Defining Cell Types and States for the Human Cell Atlas and Beyond
    Evan Biederstedt
  2. Outline: 1. Motivation; 2. CAP overview; 3. Current features & live demo; 4. Upcoming features; 5. Outreach / questions
  3. Motivation: Cell Annotations
    Researchers manually examine prominent molecular patterns in light of prior biological knowledge and annotate cells.
    https://www.nature.com/articles/s41588-021-00818-x
  4. Motivation: Cell Annotations
    • MKI67: gene associated with cellular proliferation
    • HBB: gene associated with hemoglobin production
    • PAX2: gene associated with kidney development
    https://www.nature.com/articles/s41588-021-00818-x
  5. Motivation: Cell Annotations
    Our community has been creating cell annotations manually. Researchers then publish figures with cell labels in scientific journals.
    Problem: Such an approach cannot scale, and it is not consistent enough to build precise and accurate atlases.
    https://www.nature.com/articles/s41588-021-00818-x
  6. Problems (figure: NK cells, cytokines, monocytes)
    (A) We need a medium for researchers to compare annotations across studies, potentially resolving conflicting results.
    (B) Individual research groups end up annotating (potentially millions of) cells manually, which results in inconsistent terms and labels between groups.
    (C) This approach cannot scale. We need a solution for creating comprehensive, accurate references with a standardized nomenclature.
  8. Why do we need a medium to compare annotations?
    • Research groups end up labeling different cells with the same term
    • Research groups study the same thing and call it something different
    Comparisons between publication plots are usually not informative: How did research group A define cell types? Why/how do research groups A & B disagree?
  9. Problems (figure: NK cells, cytokines, monocytes)
    (A) There’s no medium for researchers to compare annotations across studies, potentially resolving conflicting results.
    (B) Individual research groups end up annotating (potentially millions of) cells manually, which results in inconsistent terms and labels between groups.
    (C) This approach cannot scale. We need a solution for creating comprehensive references with a standardized nomenclature for all species.
  10. Inconsistent terms
    • Free text
    • Abbreviations
    • “Broad” labels vs “precise” labels
    https://www.nature.com/articles/s41588-021-00818-x
    https://www.science.org/doi/10.1126/science.aay0267
  11. Problems (figure: NK cells, cytokines, monocytes)
    (A) There’s no medium for researchers to compare annotations across studies, potentially resolving conflicting results.
    (B) Individual research groups end up annotating (potentially millions of) cells manually, which results in inconsistent terms and labels between groups.
    (C) This approach cannot scale. We need a solution for creating comprehensive, accurate references with a standardized nomenclature.
  12. Manual annotation alone
    • Time-consuming
    • Error-prone
    • Reproducibility issues
    Not scalable for the challenge of building reference atlases
  13. Cell Annotation Platform (CAP)
    • Community-driven platform to create, explore, and store annotations
    • Infrastructure to accumulate, share, and analyze annotation terms with associated molecular signatures to interpret cellular identities
    • Encourage researchers to converge upon consensus nomenclature
  14. Cell Annotation Platform (CAP): Main Components
    • Data Repository
    • Annotation Upload and Publication
    • Annotation UI: Browse & Create Annotations
    • “CellCards” Reference Summaries
  15. Cell Annotation Platform (CAP): Basic User Workflow
    1. Upload HCA datasets
    2. Generate cell annotations
    3. Publish for the public
    4. Browse cell annotations + molecular signatures
    5. Download standardized cell annotations
  16. CAP organization
    • Workspace: Collaborative “repo” for researchers to organize datasets
    • Publication: Public version, corresponding to a scientific publication
    • Datasets: Cell annotations with molecular data
    • Cell Label: Term associated with a cell or molecular subpopulation
    (Diagram: Workspace, Publication)
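The hierarchy above can be pictured as a small data model. The sketch below is a hypothetical illustration only: the class names `Workspace`, `Publication`, `Dataset`, `CellLabel` and the `publish` method are assumptions, not CAP's API. It shows one design point the slide implies: a publication is a snapshot, so later edits in the workspace do not silently change what was made public.

```python
import copy
from dataclasses import dataclass, field

# Hypothetical data model; all names are illustrative assumptions.

@dataclass
class CellLabel:
    term: str              # e.g. "NK cell"
    cell_ids: list[str]    # cells (or a molecular subpopulation) carrying the term

@dataclass
class Dataset:
    name: str
    labels: list[CellLabel] = field(default_factory=list)

@dataclass
class Publication:
    title: str
    datasets: list[Dataset]

@dataclass
class Workspace:
    name: str
    datasets: list[Dataset] = field(default_factory=list)

    def publish(self, title: str) -> Publication:
        # Deep-copy so the public version is frozen at publication time,
        # even if the workspace's datasets keep being edited afterwards.
        return Publication(title=title, datasets=copy.deepcopy(self.datasets))
```

A shallow `list(self.datasets)` would share the same `Dataset` objects, so edits in the workspace would leak into the published version; the deep copy avoids that.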
  17. User roles and controlled data access
    User roles:
    • viewer (read-only)
    • editor (write access)
    • owner (administrative)
    Controlled Data Access: Keep your data private until ready! Save until ready for the public, and then publish.
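A role-based permission check is one way to read the slide's three roles together with "private until ready". In this sketch the `Role` enum, the `PERMISSIONS` table, and the `can` helper are hypothetical; in particular, the action names and the choice that only owners may publish are assumptions the slide does not state.

```python
from enum import Enum
from typing import Optional

class Role(Enum):
    VIEWER = "viewer"  # read-only
    EDITOR = "editor"  # write access
    OWNER = "owner"    # administrative

# Assumed action sets per role; the slide only names the three roles.
PERMISSIONS = {
    Role.VIEWER: {"read"},
    Role.EDITOR: {"read", "write"},
    Role.OWNER: {"read", "write", "manage_members", "publish"},
}

def can(role: Optional[Role], action: str, published: bool = False) -> bool:
    """Return True if a user holding `role` in a workspace may perform `action`.

    role=None means the user has no role in the workspace: they may read the
    data only once it has been published ("private until ready").
    """
    if role is None:
        return published and action == "read"
    return action in PERMISSIONS[role]
```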
  18. Publications
    • Collections of datasets, typically corresponding to a scientific journal article
    • Timestamped
    • DOIs for citations in journals
    • Versioning
    • Downloadable annotations in standardized formats
  20. Why Versioning?
    • Timestamped
    • DOIs for citations in journals
    • Versioning
    Cell annotations become more refined through revisions, so we need coherent tracking of changes.
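"Coherent tracking of changes" can be sketched as an append-only, timestamped history in which a cited version never changes. The `Version` and `VersionHistory` classes and the `diff` helper below are hypothetical and only illustrate the idea; they are not how CAP stores versions.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional

@dataclass(frozen=True)
class Version:
    number: int
    created: datetime                     # UTC timestamp of the revision
    labels: tuple[tuple[str, str], ...]   # immutable (cell_id, label) pairs

class VersionHistory:
    """Append-only history: committing never overwrites an earlier version."""

    def __init__(self) -> None:
        self._versions: list[Version] = []

    def commit(self, labels: dict[str, str]) -> Version:
        version = Version(
            number=len(self._versions) + 1,
            created=datetime.now(timezone.utc),
            labels=tuple(sorted(labels.items())),
        )
        self._versions.append(version)
        return version

    def get(self, number: int) -> Version:
        if not 1 <= number <= len(self._versions):
            raise KeyError(number)
        return self._versions[number - 1]

    def diff(self, old: int, new: int) -> dict[str, tuple[Optional[str], Optional[str]]]:
        """Map each cell whose label changed to its (old_label, new_label)."""
        a, b = dict(self.get(old).labels), dict(self.get(new).labels)
        return {c: (a.get(c), b.get(c)) for c in a.keys() | b.keys() if a.get(c) != b.get(c)}
```

For example, refining "T cell" to "CD8+ T cell" in a second commit shows up in `diff(1, 2)` while version 1 stays citable unchanged.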
  21. Workspace
    • Collaborative space to edit collections of annotations & other relevant metadata
    • Advanced user form
    • Allow user to “hide” irrelevant metadata within a dataset
    • Specify which annotations & which metadata fields are relevant
  22. Workspace
    • Autocomplete recommendations (with synonyms and related terms) from Open Biomedical Ontologies
    • “Nudges” to encourage consensus and standardization (where possible), but no requirements
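Synonym-aware autocomplete is what lets a user who types "NK" be nudged toward the canonical term. The sketch below uses a tiny hand-written vocabulary (a real system would load terms and synonyms from an ontology such as the Cell Ontology); the `TERMS` table, the `suggest` function, and its ranking rule (prefix matches before substring matches) are assumptions for illustration.

```python
# Toy vocabulary: canonical term -> synonyms. Hand-written for illustration.
TERMS = {
    "T cell": ["T lymphocyte", "T-cell"],
    "natural killer cell": ["NK cell"],
    "monocyte": [],
    "B cell": ["B lymphocyte"],
}

def suggest(query: str, terms: dict[str, list[str]] = TERMS, limit: int = 5) -> list[str]:
    """Return canonical terms whose name or any synonym matches `query`.

    Prefix matches rank before substring matches; each canonical term appears
    at most once, even if several of its synonyms match.
    """
    q = query.strip().lower()
    scored = []
    for canonical, synonyms in terms.items():
        best = None  # 0 = prefix match, 1 = substring match
        for name in [canonical, *synonyms]:
            n = name.lower()
            if n.startswith(q):
                best = 0
                break
            if q in n:
                best = 1
        if best is not None:
            scored.append((best, canonical))
    scored.sort()
    return [canonical for _, canonical in scored[:limit]]
```

Returning the canonical term rather than the matched synonym is the "nudge": the user may type an abbreviation, but the suggestion steers them to the standardized name.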
  23. Record: Synonyms & Evidence & Marker Genes
    • Synonyms
    • Categories, e.g. “CD8+ T cell” is a subset of “T Lymphocyte”
    • Evidence: rationales for annotating a cell
    • List of marker genes used
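The record fields above map naturally onto a small structure, and the "subset of" relation can be checked by walking a parent map. This is a hypothetical sketch: `AnnotationRecord`, `PARENT`, `is_subset_of`, and `category_consistent` are illustrative names, and the two-entry parent map is a toy stand-in for an ontology's "is a" hierarchy.

```python
from dataclasses import dataclass, field
from typing import Optional

# Toy "is a" hierarchy (child -> parent), standing in for the ontology.
PARENT = {
    "CD8+ T cell": "T lymphocyte",
    "T lymphocyte": "lymphocyte",
}

@dataclass
class AnnotationRecord:
    label: str
    synonyms: list[str] = field(default_factory=list)
    category: Optional[str] = None          # broader term this label falls under
    evidence: str = ""                      # rationale for annotating the cells
    marker_genes: list[str] = field(default_factory=list)

def is_subset_of(term: str, ancestor: str, parent: dict[str, str] = PARENT) -> bool:
    """True if `term` sits (directly or transitively) under `ancestor`."""
    while term in parent:
        term = parent[term]
        if term == ancestor:
            return True
    return False

def category_consistent(record: AnnotationRecord) -> bool:
    """A record's category, if given, must be an ancestor of its label."""
    return record.category is None or is_subset_of(record.label, record.category)
```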
  24. Interactive Exploration: Molecular Data
    For every dataset on CAP, any user may:
    • Explore the annotations associated with the dataset
    • Select cells on the embedding
    • Explore heat maps with precalculated differential expression (DE) values for each annotation
    • Select cells with the selection tool and calculate new DE values
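To make "select cells and calculate new DE values" concrete, here is one simple DE score: the log2 fold change of mean expression in the selected cells versus all other cells. The `selection_de` function and its `pseudocount` parameter are hypothetical, and this is not necessarily what CAP computes; real pipelines usually add a statistical test (e.g. a Wilcoxon rank-sum test) on top of the fold change.

```python
import numpy as np

def selection_de(expr: np.ndarray, selected: np.ndarray, genes: list[str],
                 pseudocount: float = 1.0, top: int = 5) -> list[tuple[str, float]]:
    """Rank genes by log2 fold change, selected cells vs. the rest.

    expr: cells x genes matrix of normalized expression values.
    selected: boolean mask over cells (e.g. from a lasso selection).
    """
    if selected.all() or not selected.any():
        raise ValueError("selection must be a non-empty proper subset of cells")
    mean_in = expr[selected].mean(axis=0)
    mean_out = expr[~selected].mean(axis=0)
    # The pseudocount avoids division by zero for genes absent in one group.
    lfc = np.log2((mean_in + pseudocount) / (mean_out + pseudocount))
    order = np.argsort(-lfc)[:top]  # largest fold change first
    return [(genes[i], float(lfc[i])) for i in order]
```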
  25. Manual Annotation via the Browser
    Annotate cells via the browser:
    • Users select cells (based either on predefined clusters or on selections via the selection tool) and add cell annotations
    • UI basis for cell predictions (next slides)
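The two ways of choosing cells (a predefined cluster or an ad hoc selection) reduce to the same operation: build a boolean mask and assign a label to the masked cells. The `annotate` function below is a hypothetical sketch of that step, not CAP code.

```python
from typing import Optional

import numpy as np

def annotate(labels: np.ndarray, label: str, *,
             clusters: Optional[np.ndarray] = None, cluster_id=None,
             selection: Optional[np.ndarray] = None) -> np.ndarray:
    """Return a copy of `labels` with `label` assigned to the chosen cells.

    Pass either `clusters` + `cluster_id` (a predefined cluster) or
    `selection` (a boolean mask, e.g. from a lasso tool), not both.
    """
    if (clusters is None) == (selection is None):
        raise ValueError("pass either clusters + cluster_id or a selection mask")
    mask = selection if selection is not None else clusters == cluster_id
    out = labels.copy()  # leave the caller's array untouched
    out[mask] = label
    return out
```

Creating `labels` with `dtype=object` (e.g. `np.full(n, "", dtype=object)`) matters here: a fixed-width NumPy string array would silently truncate labels longer than its initial width.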
  27. Solution
    (A) We need a medium for researchers to compare annotations across studies, potentially resolving conflicting results.
    (B) Individual research groups end up annotating (potentially millions of) cells manually, which results in inconsistent terms and labels between groups.
    (C) This approach cannot scale. We need a solution for creating comprehensive references with a standardized nomenclature for all species.
    • Researchers can now explore the differentially expressed genes (DEGs) defining cell types (on all HCA data hosted on CAP)
    • The UI allows users within BioNetworks to begin resolving differences based on the molecular data (more on this later)
  28. Solution
    (B) Individual research groups end up annotating (potentially millions of) cells manually, which results in inconsistent terms and labels between groups.
    (C) This approach cannot scale. We need a solution for creating comprehensive references with a standardized nomenclature for all species.
    • Required metadata associated with cell labels
    • A single string +/- ontology ID is not enough: sufficient for scientific publication, but not for building accurate atlases
  29. Solution
    (B) Individual research groups end up annotating (potentially millions of) cells manually, which results in inconsistent terms and labels between groups.
    (C) This approach cannot scale. We need a solution for creating comprehensive references with a standardized nomenclature for all species.
    • Required metadata associated with cell labels
    • A single string +/- ontology ID is not enough
    • Existing strategies: free text; free-text label paired with an associated Cell Ontology label
    • Stronger metadata requirements for cell annotations
    Goal:
    • Build precise, accurate atlases
    • Refine the Cell Ontology
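"Stronger metadata requirements" can be enforced as a simple completeness check before a label is accepted. In this sketch the `REQUIRED_FIELDS` tuple and the `missing_metadata` helper are hypothetical; the field names are assumptions, since the annotation schema itself is still under discussion.

```python
# Assumed required fields; not CAP's schema (which is still being discussed).
REQUIRED_FIELDS = ("label", "ontology_id", "marker_genes", "evidence")

def missing_metadata(annotation: dict) -> list[str]:
    """Return the required fields that are absent or empty, in schema order."""
    return [name for name in REQUIRED_FIELDS if not annotation.get(name)]
```

Under this check both existing strategies fall short: a free-text label alone misses three fields, and a free-text label plus an ontology ID still lacks marker genes and evidence.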
  30. Solution: Annotation schema proposal (discussions underway)
    • What information would we need to improve the Cell Ontology?
    • What fields/information have biologists asked us to see on CAP?
    • What information would we need to resolve differences between cell annotations?
    • What information could we collect to accurately construct a cell atlas?
    https://www.nature.com/articles/s41556-021-00787-7
  31. Basic User Workflow (Yosef Lab)
    • Predictions are hints
    • Users must accept/decline/edit these individually
    • Uncertainty -> manual curation
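The workflow above (predictions stay pending until a user decides, and uncertain ones go to manual curation) can be sketched as follows. The `Prediction` class, the `triage` and `review` functions, and the 0.8 confidence threshold are all hypothetical choices for illustration.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Prediction:
    cell_id: str
    label: str
    confidence: float
    status: str = "pending"            # pending | accepted | edited | declined
    final_label: Optional[str] = None  # set only after a user decision

def triage(preds: list[Prediction], threshold: float = 0.8):
    """Split predictions into (suggested, needs_manual_curation).

    The threshold is an arbitrary assumption; nothing is applied automatically.
    """
    suggested = [p for p in preds if p.confidence >= threshold]
    curate = [p for p in preds if p.confidence < threshold]
    return suggested, curate

def review(pred: Prediction, decision: str, new_label: Optional[str] = None) -> Prediction:
    """Record a user's accept / edit / decline decision on one prediction."""
    if decision == "accept":
        pred.status, pred.final_label = "accepted", pred.label
    elif decision == "edit":
        if not new_label:
            raise ValueError("edit requires a new label")
        pred.status, pred.final_label = "edited", new_label
    elif decision == "decline":
        pred.status, pred.final_label = "declined", None
    else:
        raise ValueError(f"unknown decision: {decision}")
    return pred
```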
  32. Annotation Transfer
    • A REF dataset is used to transfer cell annotations to a QUERY dataset
    • Promises to overcome the bottleneck posed by cell annotation
    Clarke, Z.A., Andrews, T.S., Atif, J. et al. Tutorial: guidelines for annotating single-cell transcriptomic maps using automated and manual methods. Nat Protoc 16, 2749–2764 (2021).
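One of the simplest reference-based transfer methods is k-nearest-neighbour voting: each QUERY cell takes the majority label among its k closest REF cells. The `knn_transfer` function below is a minimal sketch that assumes REF and QUERY are already embedded in a shared space (e.g. PCA or a learned latent space); it is not the transfer algorithm used by CAP.

```python
from collections import Counter

import numpy as np

def knn_transfer(ref_emb: np.ndarray, ref_labels: list[str],
                 query_emb: np.ndarray, k: int = 3) -> list[tuple[str, float]]:
    """Return (predicted_label, vote_fraction) for each query cell."""
    k = min(k, len(ref_labels))  # cannot use more neighbours than REF cells
    # Pairwise squared Euclidean distances, shape (n_query, n_ref).
    d2 = ((query_emb[:, None, :] - ref_emb[None, :, :]) ** 2).sum(axis=-1)
    neighbours = np.argsort(d2, axis=1)[:, :k]
    out = []
    for row in neighbours:
        label, votes = Counter(ref_labels[i] for i in row).most_common(1)[0]
        out.append((label, votes / k))  # vote fraction as a rough confidence
    return out
```

The vote fraction gives a crude uncertainty signal: a query cell whose neighbours disagree is a natural candidate for manual curation rather than automatic acceptance.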
  33. Annotation Transfer: User Workflow
    • User chooses model & transfer algorithm
    • View predictions overlaid on molecular data: accept/edit/decline
    • Publish/share/compare
    Clarke, Z.A., Andrews, T.S., Atif, J. et al. Tutorial: guidelines for annotating single-cell transcriptomic maps using automated and manual methods. Nat Protoc 16, 2749–2764 (2021).
  34. De novo predictions (Felix Fischer)
    • (Iterative) training on all available data
    • Current model accurately predicts coarse cell types on unseen data
    • Improvements for more granular distinctions
  35. Current
    • Data Repository
    • Annotation Upload and Publication
    • Annotation UI: Browse & Create Annotations
    • Annotation predictions
    • “CellCards” Reference Summaries
  36. Upcoming
    • Annotation predictions (talk to us about adding new models!)
    • Improvements to the UI for exploring molecular data + manual annotations
    • Feature requests tailored to individual BioNetworks, e.g. consensus for annotating integrated atlases
    • “CellCards” summary pages
  37. CAP Outreach Activities
    • BioNetwork user interviews: Work with each HCA BioNetwork to develop tailored features and assist with Human Cell Atlas annotations
    • Seminars & hands-on workshops: Users can learn to use/navigate CAP for annotating new and/or integrated datasets
    • BioNetwork annotation jamborees: Leverage CAP to empower BioNetworks to reach consensus annotations for their HCA v1.0 integration efforts, through jamborees to be organized in collaboration with different partners (e.g. CZI)
  38. Thank you! • Fabian Theis • Nils Gehlenborg • David Osumi-Sutherland • Aviv Regev • John Marioni • Peter Kharchenko • Chloé Villani
  39. Denis Ilguzin, Maxim Svetlakov, Levon Ghukasyan, Michael Loktionov, Sultan Arapov, Mary Futey, Nick Akhmetov, Lusine Barseghyan, Tigran Markosjan, Konstantin Boyandin, Uğur Bayindir, Pavel Istomin, Dennis Bolgov, Andrey Isaev, Evan Biederstedt