LCG20

Leonardo Collado-Torres

August 30, 2023
Transcript

  1. @lcolladotor
    lcolladotor.github.io
    lcolladotor.github.io/bioc_team_ds
    Studying the human prefrontal
    cortex transcriptome at different
    resolutions
    Leonardo Collado Torres, Investigator
    #LCG20 at LCG-UNAM
    August 31 2023
    Slides available at speakerdeck.com/lcolladotor

  2. Interested people → Users → Developers
    How do we encourage this step
    in Mexico and Latin America?

  3. • 2017:
    ○ idea at BioC2017 and start of the founding of CDSB
    • 2018:
    ○ first workshop ^_^, with Bioconductor instructors: Martin Morgan &
    Benilton Carvalho
    • 2019:
    ○ BioC2019: support for scholarship applications
    ○ Workshop with materials adapted from RStudio
    • 2020:
    ○ regutools: first package in Bioconductor
    ○ Workshop with RStudio & Bioconductor
    • 2021:
    ○ first time with 2 workshops https://comunidadbioinfo.github.io/

  4. (image-only slide, no transcribed text)

  5. https://doi.org/10.1093/bioinformatics/btaa575

  6. CDSB Board of Directors through 2019

  7. comunidadbioinfo.github.io

  8. @lcolladotor
    lcolladotor.github.io
    lcolladotor.github.io/bioc_team_ds
    Studying the human prefrontal
    cortex transcriptome at different
    resolutions
    Leonardo Collado Torres, Investigator
    #LCG20 at LCG-UNAM
    August 31 2023
    Slides available at speakerdeck.com/lcolladotor

  9. doi.org/10.1016/j.biopsych.2020.06.005
    Michael Gandal
    @mikejg84
    Transcriptomic
    Insight Into the
    Polygenic
    Mechanisms
    Underlying
    Psychiatric
    Disorders

  10. Background: Human DLPFC
    slideshare.net
    Louise A Huuki-Myers
    @lahuuki

  11. Zoom in: base pair resolution
    Jeff Leek
    @jtleek
    Ph.D. advisor
    Andrew E Jaffe
    @andrewejaffe
    Ph.D. co-advisor

  12. lcolladotor.github.io/talk/lcg2014/
    #LCG10 was organized by
    Esperanza Martínez Romero
    et al.

  13. Flexible expressed region analysis for RNA-seq with #derfinder
    doi.org/10.1093/nar/gkw852
    Jeff Leek
    @jtleek
    Ph.D. advisor

  14. Postmortem Human Brain Samples
    Discovery data: Fetal, Infant, Child, Teen, Adult, 50+ (6 / group, N = 36)
    Replication data: Fetal, Infant, Child, Teen, Adult, 50+ (6 / group, N = 36)
    Andrew E Jaffe
    @andrewejaffe
    Ph.D. co-advisor
    Developmental regulation of human cortex transcription and its clinical relevance at single base resolution
    doi.org/10.1038/nn.3898
    github.com/leekgroup/libd_n36

  15. doi.org/10.1038/nn.3898
    Developmental regulation of human cortex transcription and its clinical relevance at single base resolution
    github.com/leekgroup/libd_n36

  16. Zoom in: more data!
    Ben Langmead
    @BenLangmead
    Abhinav Nellore
    @nellore (GitHub)
    Christopher Wilks
    @chrisnwilks
    Shannon Ellis
    @Shannon_E_Ellis
    Kasper Daniel Hansen
    @KasperDHansen
    Andrew E Jaffe
    @andrewejaffe
    Ph.D. co-advisor
    + former LIBD boss
    Jeff Leek
    @jtleek
    Ph.D. advisor

  17. doi.org/10.1038/543007a

  18. expression data for ~70,000 human samples
    Samples: GTEx (N=9,962), TCGA (N=11,284), SRA (N=49,848)
    Expression estimates: gene, exon, junctions, ERs (expressed regions)
    Sample phenotypes: ?
    Answer meaningful questions about human biology and expression
    slide adapted from Shannon Ellis
    Reproducible RNA-seq analysis using #recount2
    doi.org/10.1038/nbt.3838
    + Improving the value of public RNA-seq expression data by phenotype prediction
    doi.org/10.1093/nar/gky102

  19. recount3: over 700,000 human and mouse RNA-seq samples
    #recount3: summaries and queries for large-scale RNA-seq expression and splicing
    Christopher Wilks
    @chrisnwilks
    research.libd.org/recount3-docs/
    doi.org/10.1186/s13059-021-02533-6

  20. Zoom in: snRNA-seq → deconvolution of bulk RNA-seq
    Matthew N Tran
    @mattntran
    Kristen R Maynard
    @kr_maynard
    Louise A Huuki-Myers
    @lahuuki
    Keri Martinowich
    @martinowk
    Stephanie C Hicks
    @stephaniehicks

  21. What is Deconvolution?
    ● Inferring the cell type composition of bulk RNA-seq data
    Louise A Huuki-Myers
    @lahuuki

  22. Interaction eQTLs with cell type proportions
    github.com/LieberInstitute/goesHyde_mdd_rnaseq/tree/master/eqtl/code

  23. Reference Single Cell Data
    deconvolution(Y, Z) = Proportion of Cell Types
    Louise A Huuki-Myers
    @lahuuki
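
The deconvolution(Y, Z) idea on this slide can be sketched as a small non-negative least squares problem. This is only an illustration, not the method used in the talk: it assumes Y is a bulk expression vector over marker genes and Z is a genes × cell types matrix of reference mean expression, and it uses a plain projected gradient solver. The function name `deconvolve_nnls` and all numbers are hypothetical.

```python
def deconvolve_nnls(bulk, signature, n_iter=2000):
    """Estimate cell type proportions by non-negative least squares.

    bulk: expression values, one per marker gene (assumed role of Y).
    signature: one row per gene, holding the mean expression of that gene
        in every reference cell type (assumed role of Z).
    """
    n_genes = len(signature)
    n_types = len(signature[0])
    # The sum of squared entries bounds the largest eigenvalue of Z^T Z,
    # so its inverse is a safe (conservative) gradient step size.
    step = 1.0 / sum(v * v for row in signature for v in row)
    weights = [0.0] * n_types
    for _ in range(n_iter):
        residual = [
            sum(signature[g][k] * weights[k] for k in range(n_types)) - bulk[g]
            for g in range(n_genes)
        ]
        gradient = [
            sum(signature[g][k] * residual[g] for g in range(n_genes))
            for k in range(n_types)
        ]
        # Gradient step, then projection onto the non-negative orthant.
        weights = [max(0.0, w - step * d) for w, d in zip(weights, gradient)]
    total = sum(weights)
    # Rescale so the estimates sum to 1 and read as proportions.
    return [w / total for w in weights]


# Toy reference: 4 genes x 3 cell types (made-up values).
Z = [[10.0, 0.0, 1.0], [0.0, 8.0, 1.0], [1.0, 1.0, 9.0], [5.0, 5.0, 5.0]]
true_props = [0.5, 0.3, 0.2]
# Simulated bulk sample: a noise-free mixture of the reference profiles.
Y = [sum(z * p for z, p in zip(row, true_props)) for row in Z]
estimated = deconvolve_nnls(Y, Z)
```

With noise-free input the estimate recovers the simulated proportions; real bulk data adds noise, platform differences, and cell size effects that this sketch ignores.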

  24. 10x snRNA-seq Reference Data
              AMY    DLPFC  HPC    NAc    sACC
    Astro     1638   782    1170   1099   907
    Endo      31     0      0      0      0
    Macro     0      10     0      22     0
    Micro     1168   388    1126   492    784
    Mural     39     18     43     0      0
    Oligo     6080   5455   5912   6134   4584
    OPC       1459   572    838    669    911
    Tcell     31     9      26     0      0
    Excit     443    2388   623    0      4163
    Inhib     3117   1580   366    11476  3974
    @mattntran
    Matthew N Tran
    doi.org/10.1016/j.neuron.2021.09.001

  25. Marker Finding
    deconvolution(Y, Z) = Proportion of Cell Types
    Louise A Huuki-Myers
    @lahuuki

  26. 1vAll Markers vs. Mean Ratio Markers
    Louise A Huuki-Myers
    @lahuuki
    research.libd.org/DeconvoBuddies/

  27. 1vAll Markers vs. Mean Ratio Markers
    Louise A Huuki-Myers
    @lahuuki
    research.libd.org/DeconvoBuddies/
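
A rough sketch of the mean ratio idea, under an assumption the slide text does not spell out: that a gene's mean ratio for a target cell type is its mean expression in the target divided by its highest mean expression among the other cell types. The function name, gene names, and values below are hypothetical, and this is not the DeconvoBuddies implementation.

```python
def mean_ratio_markers(mean_expr, target):
    """Rank genes as markers of `target` by mean ratio.

    mean_expr: {gene: {cell_type: mean expression}}.
    Returns (gene, ratio) pairs sorted from best to worst marker.
    """
    ranked = []
    for gene, by_type in mean_expr.items():
        target_mean = by_type[target]
        # Highest mean among all non-target cell types.
        next_highest = max(v for ct, v in by_type.items() if ct != target)
        if next_highest > 0:
            ranked.append((gene, target_mean / next_highest))
    return sorted(ranked, key=lambda pair: pair[1], reverse=True)


# Made-up mean expression values for three hypothetical genes.
mean_expr = {
    "geneA": {"Astro": 8.0, "Oligo": 1.0, "Micro": 2.0},
    "geneB": {"Astro": 6.0, "Oligo": 5.0, "Micro": 0.5},
    "geneC": {"Astro": 1.0, "Oligo": 9.0, "Micro": 1.0},
}
astro_markers = mean_ratio_markers(mean_expr, "Astro")
```

In this toy input geneB is high in Astro but nearly as high in Oligo, so its ratio stays close to 1. A comparison against the average of all other cell types would penalize that overlap less.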

  28. Methods
    deconvolution(Y, Z) = Proportion of Cell Types
    Louise A Huuki-Myers
    @lahuuki

  29. Which Method is the Most Accurate?
    ● Benchmarking shows that different methods perform best on different data sets (Cobos et al, Nature Communications, 2020)
    ● Benchmarking results from different papers on “real” data
    ○ MuSiC paper: MuSiC > NNLS > BSEQ-sc > CIBERSORT
    ■ Pancreatic Islet: Beta cells vs. HbA1c (Fig 2a)
    ○ Bisque paper: Bisque > MuSiC > CIBERSORT
    ■ DLPFC: Microglia vs. Braak stage, Neuron vs. Cognitive diagnostic category (Fig 4)
    ○ SCDC paper: SCDC > MuSiC > Bisque > DWLS > CIBERSORT
    ■ Pancreatic Islet: Beta cells vs. HbA1c (Fig 4b)
    ○ Cobos benchmark: DWLS > MuSiC > Bisque > deconvSeq
    ■ Human PBMC flow sorted (Fig 7)
    Louise A Huuki-Myers
    @lahuuki

  30. Results + Validation
    deconvolution(Y, Z) = Proportion of Cell Types
    Louise A Huuki-Myers
    @lahuuki

  31. Mean Proportions By Region: Tran et al, bioRxiv, 2020 (5 donors, 6 cell types)
    Louise A Huuki-Myers
    @lahuuki

  32. Peric = Mural + Endo
    Mean Proportions By Region: Tran et al, Neuron, 2021 (8 donors, 10 cell types)
    Louise A Huuki-Myers
    @lahuuki

  33. Sean Maden
    @MadenSean
    Sang Ho Kwon
    @sanghokwon17
    #deconvochallenge doi.org/10.48550/arXiv.2305.06501

  34. #deconvochallenge
    Challenges and opportunities to computationally deconvolve heterogeneous tissue with varying cell sizes using single cell RNA-sequencing datasets
    doi.org/10.48550/arXiv.2305.06501
    Sean Maden
    @MadenSean

  35. Sean Maden
    @MadenSean
    Sang Ho Kwon
    @sanghokwon17
    #deconvochallenge
    doi.org/10.48550/arXiv.2305.06501

  36. research.libd.org/TREG/
    doi.org/10.1101/2022.04.28.489923
    Louise A Huuki-Myers
    @lahuuki

  37. Zoom in: spatial omics
    Kristen R Maynard
    @kr_maynard
    Keri Martinowich
    @martinowk
    Stephanie C Hicks
    @stephaniehicks
    Andrew E Jaffe
    @andrewejaffe
    Stephanie C Page
    @CerceoPage

  38. Visium Platform for Spatial Gene Expression
    Image from 10x Genomics
    - A slide contains 4 capture areas, each full of thousands of 55 µm-wide “spots” (often containing 1-10 cells)
    - Uniquely barcoded capture probes in each spot bind mRNA transcripts; after sequencing, gene expression can be tied back to exact spots, forming a spatial map
    Kristen R. Maynard

  39. 2 pairs of spatially adjacent replicates × 3 subjects = 12 sections
    Subject 1
    Subject 2
    Subject 3
    Adjacent spatial replicates (0μm) Adjacent spatial replicates (300μm)
    PCP4
    Maynard, Collado-Torres, et al, Nat Neuro, 2021

  40. “Pseudo-bulking” collapses data: spot to layer level
    Maynard, Collado-Torres, et al, Nat Neuro, 2021
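
As a minimal sketch of what "pseudo-bulking" means here, assuming it amounts to summing raw counts over all spots that share a sample and a layer label: the function name, the spot IDs, the gene name geneX, and every count below are hypothetical.

```python
from collections import defaultdict


def pseudobulk(spot_counts, spot_labels):
    """Sum spot-level counts within each (sample, layer) group.

    spot_counts: {spot_id: {gene: count}}
    spot_labels: {spot_id: (sample_id, layer)}
    """
    totals = defaultdict(lambda: defaultdict(int))
    for spot_id, counts in spot_counts.items():
        group = spot_labels[spot_id]
        for gene, count in counts.items():
            totals[group][gene] += count
    # Convert the nested defaultdicts into plain dictionaries.
    return {group: dict(genes) for group, genes in totals.items()}


# Three made-up spots from one sample, two of them in the same layer.
spot_counts = {
    "spot1": {"PCP4": 4, "geneX": 0},
    "spot2": {"PCP4": 3, "geneX": 1},
    "spot3": {"PCP4": 0, "geneX": 6},
}
spot_labels = {
    "spot1": ("sample1", "Layer5"),
    "spot2": ("sample1", "Layer5"),
    "spot3": ("sample1", "WM"),
}
layer_counts = pseudobulk(spot_counts, spot_labels)
```

Collapsing to one profile per sample and layer trades spatial detail for less sparse counts, which suits bulk-style differential expression models.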

  41. bioconductor.org/packages/spatialLIBD
    Pardo et al, 2022 DOI 10.1186/s12864-022-08601-w
    Maynard, Collado-Torres, 2021 DOI 10.1038/s41593-020-00787-0
    Brenda Pardo
    @PardoBree
    Abby Spangler
    @abspangler
    Louise A. Huuki-Myers
    @lahuuki

  42. DOI: 10.1038/s41593-020-00787-0
    twitter.com/lcolladotor/status/1233661576433061888 from 2020-02-29
    Andrew E Jaffe
    @andrewejaffe
    Kristen R Maynard
    @kr_maynard
    Keri Martinowich
    @martinowk

  43. DOI: 10.1038/s41593-020-00787-0
    twitter.com/lcolladotor/status/1233661576433061888 from 2020-02-29
    DOI 10.1093/bioinformatics/btac299
    Since Feb 2020: spatialLIBD::fetch_data() provides access to SpatialExperiment R/Bioconductor objects
    Stephanie C Hicks
    @stephaniehicks
    Lukas M Weber
    @lmweber

  44. DOI: 10.1038/s41593-020-00787-0
    twitter.com/lcolladotor/status/1233661576433061888 from 2020-02-29
    twitter.com/CrowellHL/status/1597579271945715717
    DOI 10.1093/bioinformatics/btac299
    Since Feb 2020: spatialLIBD::fetch_data() provides access to SpatialExperiment R/Bioconductor objects

  45. #spatialDLPFC
    doi.org/10.1101/2023.02.15.528722
    Louise A Huuki-Myers
    @lahuuki
    Abby Spangler
    @abspangler
    Nicholas J Eagles
    @Nick-Eagles (GitHub)

  46. Different Resolutions of BayesSpace Clustering
    k = number of clusters
    ● k=2: separates white vs. grey matter
    ● k=9: best recapitulates histological layers
    ● k=16: data-driven optimal k based on the fast H+ statistic
    More Clusters = More Complexity
    doi.org/10.1101/2023.02.15.528722

  47. Spatial Registration Adds Anatomical Context
    ● Validate detection of laminar structure
    ● Correlate enrichment t-statistics for top marker genes of reference
    ○ Cluster vs. manual annotation
    ● Annotate with strongly associated histological layer
    Sp_k D_d ~ L
    doi.org/10.1101/2023.02.15.528722
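
The registration step described above can be sketched as follows, assuming enrichment t-statistics are already available for a shared set of reference marker genes: correlate a cluster's t-statistics with those of each manually annotated layer, then keep the layer with the highest correlation. The function names, gene names, and values are hypothetical, and this is not the spatialLIBD implementation.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def register_cluster(cluster_t, reference_t):
    """Correlate one cluster's t-statistics with each reference layer.

    cluster_t: {gene: t-statistic} for the data-driven cluster.
    reference_t: {layer: {gene: t-statistic}} from the manual annotation.
    Returns the best-matching layer and all correlations.
    """
    correlations = {}
    for layer, layer_t in reference_t.items():
        # Only genes measured in both sets can be compared.
        genes = sorted(set(cluster_t) & set(layer_t))
        correlations[layer] = pearson(
            [cluster_t[g] for g in genes], [layer_t[g] for g in genes]
        )
    best_layer = max(correlations, key=correlations.get)
    return best_layer, correlations


# Made-up t-statistics for four hypothetical marker genes.
reference_t = {
    "L1": {"g1": 5.0, "g2": -1.0, "g3": -2.0, "g4": 0.0},
    "L5": {"g1": -2.0, "g2": 6.0, "g3": 0.0, "g4": -1.0},
    "WM": {"g1": -1.0, "g2": -2.0, "g3": 7.0, "g4": 0.0},
}
cluster_t = {"g1": -1.5, "g2": 4.0, "g3": 0.5, "g4": -0.5}
best_layer, correlations = register_cluster(cluster_t, reference_t)
```

In this toy input the cluster correlates positively only with L5, so it would be annotated as L5-associated.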

  48. Spatial Registration of Spatial Domains
    ● Map SpDs to Maynard et al. manually annotated layers
    ● Highlight most strongly associated histological layer to add biological context
    doi.org/10.1101/2023.02.15.528722

  49. Identify Layer Associated Neuron Populations
    ● Apply Spatial Registration with manual layers
    ● 13 layer-level cell types
    ○ Assign excitatory neurons to histological layers
    ○ Pool other cell type groups
    Kelsey D Montgomery

  50. Layer-level Cell Type Marker genes
    Louise A Huuki-Myers
    @lahuuki

  51. Spot Deconvolution
    Single-Nucleus
              Cell 1  Cell 2  …  Cell N
    Gene 1    0       0       …  0
    Gene 2    2       5       …  3
    …         …       …       …  …
    Gene i    1       0       …  0
    Spatial
              Spot 1  Spot 2  …  Spot M
    Gene 1    1       0       …  3
    Gene 2    0       1       …  0
    …         …       …       …  …
    Gene j    4       2       …  2
    Deconvolved Results
              Astro   Excit   …  Inhib
    Spot 1    1       1       …  1
    Spot 2    …
    …         …       …       …  …
    Spot M    1       0       …  2
    Nicholas J Eagles
    @Nick-Eagles (GitHub)

  52. Existing Spot Deconvolution Software
    - Explored 3 novel software methods from the literature
    Software name                          Overall approach                  Input Cell Counts     Output
    Tangram (Biancalani et al.)            Mapping individual cells          Every spot            Integer counts
    Cell2location (Kleshchevnikov et al.)  Matching gene-expression profile  Average across spots  Decimal counts
    SPOTlight (Elosua-Bayes et al.)        Matching gene-expression profile  Not used              Proportions
    Excit L5 Counts

  53. Benchmarking Spot Deconvolution Software: Theory
    - How do we measure performance or accuracy of cell-type predictions?
    - Make orthogonal measurements*: image-derived counts
    - Leverage prior knowledge: neurons localize to gray matter?
    - Self-consistency of results: broad vs. fine cell-type results
    Nicholas J Eagles
    @Nick-Eagles (GitHub)
    doi.org/10.1101/2023.02.15.528722

  54. Visium Spatial Proteogenomics (SPG) Images as an Orthogonal Measurement
    Nicholas J Eagles
    @Nick-Eagles (GitHub)

  55. Visium Spatial Proteogenomics (Visium-SPG)
    Visium-SPG = Visium SRT + immunofluorescence
    (using identical tissue samples)
    Sang Ho Kwon
    @sanghokwon17

  56. Visium Spatial Proteogenomics (Visium-SPG)
    - Gene expression captured like ordinary Visium
    - Multi-channel fluorescent images captured of the same tissue
    - Channels measure proteins that mark specific cell types
    Kristen R. Maynard
    Sang Ho Kwon
    Visium-SPG = Visium SRT + immunofluorescence
    (using identical tissue samples)
    Fluorescent Protein   Cell Type
    TMEM119               Microglia
    NeuN                  Neurons
    OLIG2                 Oligodendrocytes
    GFAP                  Astrocytes

  57. Benchmark Results: Leverage Prior Knowledge
    Figure legend: Max across layers / Not max

  58. Benchmark Results: Leverage Prior Knowledge

  59. Benchmark Summary
    Metric                     Tangram  Cell2location  SPOTlight  Metric Type
    Avg. cor (spot-level)      0.31     0.30           0.21       Orthogonal measurements
    Avg. RMSE (spot-level)     1.35     1.24           1.3        Orthogonal measurements
    Overall prop.: (KL Div.)   0.44     0.49           0.41       Orthogonal measurements
    Overall prop.: (cor.)      0.46     0.37           0.47       Orthogonal measurements
    Overall prop.: (RMSE)      3020     3890           3040       Orthogonal measurements
    Histological mapping       0.69     0.77           0.23       Leverage known biology
    Broad vs. layer (cor.)     1.00     0.77           -0.36      Self-consistency of results
    Broad vs. layer (RMSE)     102      4200           4220       Self-consistency of results
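
The three kinds of metric in this table (correlation, RMSE, KL divergence) can be sketched as below, comparing software-predicted cell counts against image-derived counts. The exact definitions used in the benchmark are not given on the slide, so these are textbook versions. The direction of the KL divergence (predicted relative to observed) is an assumption, and all counts are made up.

```python
from math import log, sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def rmse(predicted, observed):
    """Root mean squared error between two equal-length sequences."""
    n = len(predicted)
    return sqrt(sum((p - o) ** 2 for p, o in zip(predicted, observed)) / n)


def kl_divergence(predicted, observed):
    """KL divergence between the cell type proportions implied by two
    count vectors. Assumes every observed count is positive."""
    p_total, q_total = sum(predicted), sum(observed)
    divergence = 0.0
    for p_count, q_count in zip(predicted, observed):
        p, q = p_count / p_total, q_count / q_total
        if p > 0:  # terms with p == 0 contribute nothing
            divergence += p * log(p / q)
    return divergence


# Made-up total counts for four cell types across a section.
observed_counts = [120, 300, 80, 100]   # e.g. image-derived
predicted_counts = [100, 320, 90, 90]   # e.g. from a deconvolution tool
cor_value = pearson(predicted_counts, observed_counts)
rmse_value = rmse(predicted_counts, observed_counts)
kl_value = kl_divergence(predicted_counts, observed_counts)
```

Correlation rewards getting the ranking of cell types right, RMSE penalizes absolute count errors, and KL divergence compares the overall composition, so a method can do well on one and poorly on another, as the table shows.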

  60. Viewing Spot Deconvolution Results: Samui Browser
    - View:
      - Fluorescence channels
      - Spot deconvo results
      - Segmented cells
      - Gene expression
    - Interactive
      - Quickly zoom/scroll
      - Full-resolution images
    samuibrowser.com/from?url=data2.loopybrowser.com/VisiumIF/&s=Br2720_Ant_IF&s=Br6432_Ant_IF&s=Br6522_Ant_IF&s=Br8667_Post_IF
    Sriworarat, 2023.

  61. Viewing Spot Deconvolution Results: spatialLIBD apps
    - View:
      - spot deconvolution results
      - spatial domains/clusters
      - gene expression
    - Huge amount of aesthetic customization
    https://libd.shinyapps.io/spatialDLPFC_Visium_SPG/

  62. How Spot Deconvolution Results Were Used
    A. Better characterize unsupervised spatial domains
    B. Cell-cell communication; cell-type-informed ligand-receptor interactions in the context of schizophrenia risk
    Boyi Guo
    Melissa Grant-Peters

  63. Visium spatial clustering works for variables with high % variance explained. But what about other ones?
    DOI: 10.1038/s41593-020-00787-0

  64. twitter.com/sanghokwon17/status/1650589385379962881 from 2023-04-24
    Sang Ho Kwon
    @sanghokwon17
    DOI: 10.1101/2023.04.20.537710
    #Visium_SPG_AD

  65. Experimental design & study overview
    Braak V-VI & CERAD frequent
    Sang Ho Kwon

  66. AD pathology signal is too small to detect by spatially-resolved gene expression alone
    research.libd.org/Visium_SPG_AD/

  67. Identifying transcriptional signatures of AD-related neuropathology
    Sang Ho Kwon

  68. Some challenges ⚠ with Visium

  69. sc/snRNA-seq QC metrics such as # detected genes, # UMI, and mitochondrial expression % are likely biologically related!

  70. Prashanthi Ravichandran
    @prashanthi-ravichandran (GH)
    Artifacts in general are normalized away by library size, though there are caveats
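
A small sketch of what normalizing by library size can look like, assuming a counts-per-million scaling followed by a log transform. The normalization actually used in the analysis is not stated on the slide, and the function name, gene names, and counts are hypothetical.

```python
from math import log2


def log_cpm(counts, pseudocount=1.0):
    """Scale raw counts by library size (counts per million), then log2.

    counts: {gene: raw count} for one spot or sample.
    """
    library_size = sum(counts.values())
    return {
        gene: log2(count / library_size * 1_000_000 + pseudocount)
        for gene, count in counts.items()
    }


# Two made-up spots with the same composition but a 10x depth difference,
# as might happen when an artifact lowers the counts captured in one spot.
shallow_spot = {"geneA": 10, "geneB": 30}
deep_spot = {"geneA": 100, "geneB": 300}
shallow_norm = log_cpm(shallow_spot)
deep_norm = log_cpm(deep_spot)
```

After scaling, the two spots look identical. The caveat is that this only removes artifacts acting as a uniform depth change; anything that shifts the relative composition of a spot survives the normalization.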

  71. Diffusion issues: could be related to permeabilization step

  72. Having more data is useful to provide context!
    Here 4 new samples have low sequencing saturation (outliers) but are within range of good samples from other studies

  73. Having more data is useful to provide context!
    Those 4 samples have great median UMI counts per spot ^_^

  74. Software keeps evolving and as leaders in the field we aim to use the best methods
    Moses, L., Pachter, L. Museum of spatial transcriptomics. Nat Methods 19, 534–546 (2022). https://doi.org/10.1038/s41592-022-01409-2

  75. The Development Process
    - Making a module
    - New, experimental software can change dramatically (function and syntax) between versions
    - Promotes collaboration by allowing two researchers to share exact code and instantly run software without special set-up
    SpatialExperiment release 3.14
    SpatialExperiment devel 3.15
    module load tangram/1.0.2
    module load cell2location/0.8a0
    module load spagcn/1.2.0
    https://github.com/LieberInstitute/jhpce_mod_source
    https://github.com/LieberInstitute/jhpce_module_config
    Nicholas J Eagles
    @Nick-Eagles (GitHub)

  76. The Development Process
    - Regular interaction with software authors to clarify functionality and report bugs
    - Documentation for code and author responsiveness on GitHub can be critical in successfully applying software to our data
    Nicholas J Eagles
    @Nick-Eagles (GitHub)

  77. Documentation + wrapper functions + tests (GitHub Actions + Bioconductor)
    bioconductor.org/packages/spatialLIBD

  78. More challenges ahead
    Working with multiple capture areas per tissue
    Nicholas J Eagles
    @Nick-Eagles (GitHub)
    Prashanthi Ravichandran
    @prashanthi-ravichandran (GH)
    Spot diameter error: ~1.8 → ~1.1
    Another pair: ~2.8 → ~0.76

  79. lcolladotor.github.io/#projects
    ● Every assay has caveats
    ● We re-use tricks: think adding 0, multiplying by 1
    ● It nearly always takes a team
    ● Data sharing accelerates science + democratizes access to it
    ● Zooming in allows us to reduce the heterogeneity
    ● We can learn from each other: from uniformly processing our data & re-using it → replicate / validate?

  80. @MadhaviTippani
    Madhavi Tippani
    @HeenaDivecha
    Heena R Divecha
    @lmweber
    Lukas M Weber
    @stephaniehicks
    Stephanie C Hicks
    @abspangler
    Abby Spangler
    @martinowk
    Keri Martinowich
    @CerceoPage
    Stephanie C Page
    @kr_maynard
    Kristen R Maynard
    @lcolladotor
    Leonardo Collado-Torres
    @Nick-Eagles (GH)
    Nicholas J Eagles
    Kelsey D Montgomery
    Sang Ho Kwon
    Image Analysis
    Expression Analysis
    Data Generation
    Thomas M Hyde
    @lahuuki
    Louise A Huuki-Myers
    @BoyiGuo
    Boyi Guo
    @mattntran
    Matthew N Tran
    @sowmyapartybun
    Sowmya Parthiban
    Slides available at speakerdeck.com/lcolladotor
    + Many more LIBD, JHU, and external collaborators
    @mgrantpeters
    Melissa Grant-Peters
    @prashanthi-ravichandran (GH)
    Prashanthi Ravichandran

  81. lcolladotor.github.io
    @lcolladotor
