Effective and Comparative Methods for Single-Cell Embedding Visualizations

Fritz Lekschas

November 13, 2024

Transcript

  1. Effective and Comparative Methods for Single-Cell Embedding Visualizations

    Fritz Lekschas, Head of Visualization Research at Ozette Technologies. lekschas.de, linkedin.com/in/flekschas. November 13, 2024, Visual Analytics Lab at Tufts University.
  2. MASSIVE SHOUT OUTS!

    • Trevor Manz, PhD Candidate at HMS: first author of CEV paper and former Ozette intern
    • Evan Greene, Ozette Co-Founder: first author and creator of data transformation methods
    • Nezar Abdennur, Asst. Prof. at UMASS MED: long-term collaborator and embedding nerd
    • Arpan Neupane, Principal Computational Biologist: helps me better understand immunology
  3. EDUCATION: PhD '21 in CS from Harvard University; MSc '16 in Bioinformatics from Freie Universität Berlin

    RESEARCH: Visualization Human-Centered ML Design
    WORK: Head of Visualization Research at Ozette
  4. From General Cell Types to High-Resolution Cell Phenotypes

    [Figure: embeddings labeled "From" and "To"; panel titles: General Cell Types, High-Resolution Cell Phenotypes, Well-Resolved High-Resolution Cell Phenotypes. Legend: Cytotoxic T Cells, T Helper Cells, B Cells, Naïve T Cells. Data from Mair et al., 2022, Nature.]
  5. Healthy Tissue and Cancer Tissue

    [Figure: embeddings of Healthy Tissue and Cancer Tissue. Legend: Cytotoxic T Cells, T Helper Cells, B Cells, Naïve T Cells. Data from Mair et al., 2022, Nature.]
  6. Single-Cell Embeddings

    FEATURES: Chromatin Accessibility Peaks; Cell-Surface Antibodies; Genes
    Sources: Greene et al., 2021, Patterns. Granja et al., 2020, Nature Biotechnology. Tabula Sapiens Consortium, 2022, Science.
  7. Why Visualize Embeddings?

    OVERVIEW: Broad distribution & cell heterogeneity; Hypothesis Generation
    COMPARE: Relative similarity of cell populations; Trajectory Analysis
    CLUSTER: Identify cell types/phenotypes; Annotate clusters
  8. Why Visualize Embeddings?

    Same text as slide 7. [Figure from Sheih et al., 2020, Nature Communications.]
  9. Why Visualize Embeddings?

    Same text as slide 7. [Figure with CD4+ annotation, from Tabula Sapiens Consortium, 2022, Science.]
  10. Why Visualize Embeddings?

    Same text as slide 7. [Figure panels: CD4 Expression (CD4+), CD3 Expression (CD3+), PD-1 Expression (PD-1+), HLADR Expression (HLADR+). Greene et al., 2021, Patterns.]
  11. Visualization Challenges

    CLUSTER RESOLUTION: Focus on general or specific cellular phenotypes?
    SAMPLE COMPARISON: How to handle batch effects and align embeddings?
    [Figure labels: CD8+ T Cells, T Helper Cells, B Cells, Naive T Cells; Sample A vs Sample B]
  12. Visualization Challenges

    CLUSTER RESOLUTION: Focus on general or specific cellular phenotypes?
    SAMPLE COMPARISON: How to handle batch effects and align embeddings?
    EXPLORATION VS EXPLANATION: Is the visualization a representation of the clustering?
    [Figure labels: CD8+ T Cells, T Helper Cells, B Cells, Naive T Cells; Sample A vs Sample B]
  13. Healthy Tissue and Cancer Tissue

    [Figure: embeddings of Healthy Tissue and Cancer Tissue. Legend: Cytotoxic T Cells, T Helper Cells, B Cells, Naïve T Cells. Data from Mair et al., 2022, Nature.]
  14. FAUST Annotation + Clustering

    ANNOTATE: Define expression levels, e.g., Positive / Negative
    Fully interpretable clusters
    Greene et al., 2021, Patterns.
  17. Data Transformation

    FOR EACH PHENOTYPE:
    1. Remove outlier expression values: winsorize to the [1st, 99th] percentile
    2. Remove inter-marker differences: normalize to zero mean and unit variance
    3. Align marker expressions by their expression level: translate mean to a fixed value
  18. Data Transformation

    Steps as on slide 17. [Figure: 0. Raw Expression; phenotype CD3+ CD4+ CD8-]
  20. Data Transformation

    Steps as on slide 17. [Figure: 0. Raw Expression; 1. Winsorized Expression; phenotype CD3+ CD4+ CD8-]
  21. Data Transformation

    Steps as on slide 17. [Figure: 0. Raw Expression; 1. Winsorized Expression; 2. Normalized Expression; phenotype CD3+ CD4+ CD8-]
  22. Data Transformation

    Steps as on slide 17. [Figure: 0. Raw Expression; 1. Winsorized Expression; 2. Normalized Expression; 3. Translated Expression; phenotype CD3+ CD4+ CD8-]
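
The three transformation steps listed on the Data Transformation slides can be sketched for a single marker within a single phenotype as follows. This is a minimal illustration, not the authors' implementation: the function name `transform_phenotype` and the `target_mean` parameter are assumptions made for this example, and the slides do not say which fixed value the mean is translated to.

```python
import numpy as np

def transform_phenotype(expr, target_mean=0.0, lower=1.0, upper=99.0):
    """Winsorize, standardize, and translate one marker's values.

    expr: 1D array with one marker's expression for the cells of one phenotype.
    target_mean: hypothetical fixed value the mean is translated to (step 3).
    """
    expr = np.asarray(expr, dtype=float)

    # Step 1: winsorize, i.e. clip values to the [1st, 99th] percentile range.
    lo, hi = np.percentile(expr, [lower, upper])
    clipped = np.clip(expr, lo, hi)

    # Step 2: normalize to zero mean and unit variance.
    centered = clipped - clipped.mean()
    std = clipped.std()
    normalized = centered / std if std > 0 else centered

    # Step 3: translate so that the mean sits at the fixed target value.
    return normalized + target_mean
```

In use, the function would be applied once per marker and phenotype, for example `transform_phenotype(raw_cd38, target_mean=2.0)`, where `raw_cd38` and the value `2.0` are placeholders.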
  23. CD38 Expression Difference

    [Figure: Untransformed and Transformed embeddings of tumor sample 6 from Mair et al., 2022, Nature.]
    Phenotypes shown:
    CD4- CD8+ CD3+ CD45RA- CD27+ CD19- CD103+ CD28+ CD69+ PD1+ HLADR- GranzymeB- CD25- ICOS- TCRgd- CD38- CD127- Tim3-
    CD4- CD8+ CD3+ CD45RA- CD27+ CD19- CD103+ CD28+ CD69+ PD1+ HLADR- GranzymeB- CD25- ICOS- TCRgd- CD38+ CD127- Tim3-
  24. CD38 Expression Difference

    Same figure and phenotypes as slide 23, with the two phenotypes labeled CD38- and CD38+.
  25. CD38 Expression Difference

    Same figure and phenotypes as slide 24, with the quote: “Our study suggest that increased CD38 expression defines tumor-infiltrating CD8+ T cells been pre-activated …” Wu et al., 2021, Cancer Immunology, Immunotherapy.
  26. Joint Embedding

    [Figure: Untransformed and Transformed joint embeddings; labels: Tumor 27, Tissue 138. Data from Mair et al., 2022, Nature.]
    Phenotype shown: CD8- CD4+ CD45RA- CD27+ CD103- CD69- CD28+ HLADR+ GranzymeB- PD1+ CD25+ ICOS+ TCRgd- CD38+ Tim3+
  27. SEMI-CONCLUSION

    • “Tune” the data and the embedding method
    • Use a data transformation close to your objective
    • The annotation transformation is not bound to FAUST
  28. Healthy Tissue and Cancer Tissue

    [Figure: embeddings of Healthy Tissue and Cancer Tissue. Legend: Cytotoxic T Cells, T Helper Cells, B Cells, Naïve T Cells. Data from Mair et al., 2022, Nature.]
  29. SAME DATA

    [Figure: two embeddings of the same data, Seed 42 and Seed 123. Data from Mair et al., 2022, Nature.]
    High Visual Similarity
  30. SAME DATA

    [Figure: two embeddings of the same data, Seed 42 and Seed 123. Data from Mair et al., 2022, Nature.]
    High Visual Similarity, Low Jaccard Similarity
    [Plot: Jaccard similarity in kNN graphs for different set sizes; axes: Point-wise Similarity, Cumulative Probability.]
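
The point-wise Jaccard similarity of kNN graphs mentioned on slide 30 can be illustrated with a small sketch. It assumes two embeddings of the same points in the same row order. The function names, the brute-force neighbor search, and the default `k=15` are choices made for this example only, and the slide does not specify how the reported values were computed.

```python
import numpy as np

def knn_indices(points, k):
    """Indices of the k nearest neighbors of every point (brute force, small n)."""
    points = np.asarray(points, dtype=float)
    diff = points[:, None, :] - points[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    np.fill_diagonal(dist, np.inf)  # a point is not its own neighbor
    return np.argsort(dist, axis=1)[:, :k]

def pointwise_jaccard(embedding_a, embedding_b, k=15):
    """Per-point Jaccard similarity between the kNN sets of two embeddings."""
    nn_a = knn_indices(embedding_a, k)
    nn_b = knn_indices(embedding_b, k)
    scores = np.empty(len(nn_a))
    for i, (a, b) in enumerate(zip(nn_a, nn_b)):
        set_a, set_b = set(a.tolist()), set(b.tolist())
        scores[i] = len(set_a & set_b) / len(set_a | set_b)
    return scores
```

Sorting the returned scores and plotting their cumulative distribution for several values of `k` would give a plot in the spirit of the one on the slide. The pairwise distance matrix grows quadratically with the number of points, so a spatial index would be the more practical choice for real datasets.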
  31. How can we facilitate more effective and systematic comparisons of these complex 2D scatters?

    [Figure panels: TUMOR, TISSUE]
  32. How can we facilitate more effective and systematic comparisons of these complex 2D scatters?

    [Figure panels: TUMOR, TISSUE]
    Challenge: establish meaningful relationships between points in different views
  33. Class-based comparison

    • Compare groups of points rather than individual points
    • Flexible comparisons at various abstraction levels
    • Key considerations:
      • Intermixing / separation
      • Similarity / cohesion of neighbor groups
      • Shifts in relative size (for data comparison)
  34. Class-based comparison

    Bullets as on slide 33.
    Where do class labels come from?
  35. Class-based comparison

    Bullets as on slide 33.
    Where do class labels come from?
    • External metadata (e.g., ground truth)
    • Unsupervised methods (e.g., clustering algorithms)
    • Can be hierarchical (animal: ..., fruit: ...)
  36. Confusion: the degree of intermixing between points of the same label and others.

    [Figure panels: Embedding, Confusion, Neighborhood, Size. Legend: Orange, Plum, Lime, Blueberry. Label: Core]
  37. Neighborhood stability: the degree to which local neighbors are shared between visualizations.

    [Figure panels: Embedding, Confusion, Neighborhood, Size. Legend: Orange, Plum, Lime, Blueberry. Label: Context]
  38. Size: the change in relative class-label sizes with respect to the neighborhood.

    [Figure panels: Embedding, Confusion, Neighborhood, Size. Legend: Orange, Plum, Lime, Blueberry. Label: Combined]
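
As a rough illustration of the size property, the sketch below computes how the relative share of each class label changes between two labeled datasets. It is a simplification: it measures only the global shift in label proportions and does not account for the neighborhood, which the slide's definition includes. The function names are hypothetical.

```python
from collections import Counter

def relative_sizes(labels):
    """Fraction of points carrying each class label."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}

def size_shift(labels_a, labels_b):
    """Change in relative class-label size from dataset A to dataset B."""
    rel_a, rel_b = relative_sizes(labels_a), relative_sizes(labels_b)
    return {
        label: rel_b.get(label, 0.0) - rel_a.get(label, 0.0)
        for label in sorted(set(rel_a) | set(rel_b))
    }
```

A positive value means the label makes up a larger share of the second dataset than of the first. Labels that occur in only one dataset are treated as having a share of zero in the other.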
  39. Methodology

    • Create Delaunay graph
    • For each label: conduct breadth-first search for every point with that label
    • Points within one hop count toward the label's confusion set
    • Points at 1+ hops that are not in the confusion set count toward the neighborhood set
  50. Methodology

    Steps as on slide 39. [Figure: Candidate Confusion Set for Yellow]
  51. Methodology

    Steps as on slide 39. Confusion Distance Adjustment. [Figure: Distance Cutoff]
  52. Methodology

    Steps as on slide 39. Confusion Distance Adjustment. [Figure: Final Confusion Set for Yellow]
  54. Methodology

    Steps as on slide 39. [Figure: Neighborhood Set for Yellow]
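
The graph-based steps of the methodology can be sketched as below. This is one possible reading of the slide bullets, not the authors' implementation. It assumes that the searches from all points of a label can be combined into a single multi-source breadth-first search, that only points with a different label are collected, and that the search stops after `max_hops` hops. The parameter `max_hops` and all function names are hypothetical, and the distance-cutoff adjustment shown on slides 51 and 52 is left out.

```python
from collections import deque

def delaunay_edges(points):
    """Undirected Delaunay edges of 2D points (requires SciPy, imported lazily)."""
    from scipy.spatial import Delaunay
    edges = set()
    for simplex in Delaunay(points).simplices:
        for i in range(3):
            a, b = int(simplex[i]), int(simplex[(i + 1) % 3])
            edges.add((min(a, b), max(a, b)))
    return edges

def hop_distances(edges, sources, max_hops):
    """Multi-source breadth-first search; returns {node: hops} up to max_hops."""
    adjacency = {}
    for a, b in edges:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    hops = {s: 0 for s in sources}
    queue = deque(sources)
    while queue:
        node = queue.popleft()
        if hops[node] == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbor in adjacency.get(node, ()):
            if neighbor not in hops:
                hops[neighbor] = hops[node] + 1
                queue.append(neighbor)
    return hops

def confusion_and_neighborhood(edges, labels, label, max_hops=3):
    """Split other-labeled points near `label` into confusion and neighborhood sets."""
    sources = [i for i, l in enumerate(labels) if l == label]
    hops = hop_distances(edges, sources, max_hops)
    # Other-labeled points one hop away form the confusion set.
    confusion = {i for i, h in hops.items() if h == 1 and labels[i] != label}
    # Other-labeled points further away (within max_hops) form the neighborhood set.
    neighborhood = {i for i, h in hops.items() if h > 1 and labels[i] != label}
    return confusion, neighborhood
```

For an embedding stored as an array `xy` of 2D coordinates with a list `labels`, a call would look like `confusion_and_neighborhood(delaunay_edges(xy), labels, "yellow")`, where `xy` and `"yellow"` are placeholders.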
  55. Methodology: Neighborhood connectivity-based adjustment

    Scale neighborhood strength of each neighboring label by:
    1. Average number of connections between all labels
    2. Average distances of connections between all labels
    [Figure annotations: 5 connections to blue; 2 connections to gray; 1 connection to green and purple]
  56. Methodology: Neighborhood connectivity-based adjustment

    Scaling factors as on slide 55. [Figure: Neighborhood Likelihoods for Yellow: 1.0, 0.3, 0.6, 0.8]
  57. VS

  58. In summary

    • We compare embedding visualizations based on class labels
    • Labels can be defined dynamically and at different levels of abstraction
    • Addresses limitations of traditional point-based methods
    • Evaluation study: Guided comparisons increase confidence in findings
    • Note: Intended to complement embedding quality assessment methods
  59. Thanks! What’s Next?

    • How to compare more than two embedding plots?
    • How to conditionally and dynamically balance feature importance?
    • How to dynamically and continuously adjust local vs global patterns?
    November 13, 2024, Visual Analytics Lab at Tufts University