UCL-ICH-2023

"Navigating human brain gene expression measurements at different resolutions to study psychiatric disorders" seminar on 2023-05-22 at UCL Great Ormond Street Institute of Child Health

Transcript

  1. @lcolladotor
    lcolladotor.github.io
    lcolladotor.github.io/bioc_team_ds
    Navigating human brain gene
    expression measurements at
    different resolutions to study
    psychiatric disorders
    Leonardo Collado Torres, Investigator
    UCL Great Ormond Street Institute of Child Health
    May 22 2023
    Slides available at speakerdeck.com/lcolladotor

  2. doi.org/10.1016/j.biopsych.2020.06.005
    Michael Gandal
    @mikejg84
    Transcriptomic Insight Into the Polygenic Mechanisms Underlying Psychiatric Disorders

  3. https://twitter.com/lcolladotor/status/1506998412809412612
    Keri Martinowich
    @martinowk

  4. Zoom in: base pair resolution
    Jeff Leek
    @jtleek
    Ph.D. advisor
    Andrew E Jaffe
    @andrewejaffe
    Ph.D. co-advisor

  5. Slide adapted from Ben Langmead

  6. #recountWorkflow: Accessing over 70,000 human RNA-seq samples with Bioconductor
    doi.org/10.12688/f1000research.12223.1

  7. #recountWorkflow
    recount workflow: Accessing over 70,000 human RNA-seq samples with Bioconductor
    doi.org/10.12688/f1000research.12223.1

  8. Flexible expressed region analysis for RNA-seq with #derfinder
    doi.org/10.1093/nar/gkw852
    Jeff Leek
    @jtleek
    Ph.D. advisor

  9. Postmortem Human Brain Samples
    Discovery data: Fetal, Infant, Child, Teen, Adult, 50+ (6 / group, N = 36)
    Replication data: Fetal, Infant, Child, Teen, Adult, 50+ (6 / group, N = 36)
    Andrew E Jaffe
    @andrewejaffe
    Ph.D. co-advisor
    Developmental regulation of human cortex transcription and its clinical relevance at single base resolution
    doi.org/10.1038/nn.3898
    github.com/leekgroup/libd_n36

  10. doi.org/10.1038/nn.3898
    Developmental regulation of human cortex transcription and its clinical relevance at single base resolution
    github.com/leekgroup/libd_n36

  11. DERs outside of “known genes”
    Developmental regulation of human cortex transcription and its clinical relevance at single base resolution
    doi.org/10.1038/nn.3898
    github.com/leekgroup/libd_n36

  12. doi.org/10.1038/nn.3898 BrainSpan data
    Developmental regulation of human cortex transcription and its clinical relevance at single base resolution
    github.com/leekgroup/libd_n36

  13. doi.org/10.1038/s41593-018-0197-y
    Andrew E Jaffe
    @andrewejaffe
    LIBD former boss
    Developmental and genetic regulation of the human cortex transcriptome illuminate schizophrenia pathogenesis
    LIBD BrainSEQ Phase 1
    eqtl.brainseq.org/phase1/

  14. doi.org/10.1038/s41593-018-0197-y
    Andrew E Jaffe
    @andrewejaffe
    Developmental and genetic regulation of the human cortex transcriptome illuminate schizophrenia pathogenesis
    LIBD BrainSEQ Phase 1
    eqtl.brainseq.org/phase1/

  15. doi.org/10.1016/j.neuron.2019.05.013
    Regional Heterogeneity in Gene Expression, Regulation, and Coherence in the Frontal Cortex and Hippocampus across Development and Schizophrenia
    LIBD BrainSEQ Phase 2: DLPFC + HPC
    eqtl.brainseq.org/phase2/

  16. doi.org/10.1016/j.neuron.2019.05.013
    Regional Heterogeneity in Gene Expression, Regulation, and Coherence in the Frontal Cortex and Hippocampus across Development and Schizophrenia
    LIBD BrainSEQ Phase 2: DLPFC + HPC
    eqtl.brainseq.org/phase2/

  17. doi.org/10.1016/j.neuron.2019.05.013
    Regional Heterogeneity in Gene Expression, Regulation, and Coherence in the Frontal Cortex and Hippocampus across Development and Schizophrenia
    LIBD BrainSEQ Phase 2: DLPFC + HPC
    eqtl.brainseq.org/phase2/

  18. doi.org/10.1016/j.neuron.2019.05.013
    Regional Heterogeneity in Gene Expression, Regulation, and Coherence in the Frontal Cortex and Hippocampus across Development and Schizophrenia
    LIBD BrainSEQ Phase 2: DLPFC + HPC
    eqtl.brainseq.org/phase2/
    doi.org/10.1073/pnas.1617384114
    qSVA framework for RNA quality correction in differential expression analysis
    Amy Peterson
    @amptrsn

  19. doi.org/10.1016/j.neuron.2019.05.013
    LIBD BrainSEQ Phase 2: DLPFC + HPC
    eqtl.brainseq.org/phase2/
    48 Supplementary Figures 😅
    data.mendeley.com/datasets/3j93ybf4md/1
    gene  DLPFC_donor_i  HPC_donor_i
    g1                5           10
    g2                6           12
    …                 …            …
    gk               10           20

  20. Zoom in: more data!
    Ben Langmead
    @BenLangmead
    Abhinav Nellore
    @nellore (GitHub)
    Christopher Wilks
    @chrisnwilks
    Shannon Ellis
    @Shannon_E_Ellis
    Kasper Daniel Hansen
    @KasperDHansen
    Andrew E Jaffe
    @andrewejaffe
    Ph.D. co-advisor + LIBD former boss
    Jeff Leek
    @jtleek
    Ph.D. advisor

  21. doi.org/10.1038/nrg.2017.113
    #RailRNA
    Cloud computing for genomic data analysis and collaboration
    Ben Langmead
    @BenLangmead
    Abhinav Nellore
    @nellore (GitHub)

  22. doi.org/10.1038/543007a

  23. https://jhubiostatistics.shinyapps.io/recount/
    doi.org/10.1038/nbt.3838

  24. Expression data for ~70,000 human samples
    GTEx: N = 9,962
    TCGA: N = 11,284
    SRA: N = 49,848
    Samples × expression estimates: gene, exon, junctions, ERs
    Samples × phenotypes: ?
    Answer meaningful questions about human biology and expression
    Slide adapted from Shannon Ellis
    Reproducible RNA-seq analysis using #recount2 + Improving the value of public RNA-seq expression data by phenotype prediction
    doi.org/10.1038/nbt.3838
    doi.org/10.1093/nar/gky102
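As a quick check of the "~70,000" figure, the three per-source sample counts listed on this slide can be summed. A trivial base-R sketch (the variable names are illustrative only):

```r
# Sample counts per data source, as listed on the slide
n_samples <- c(GTEx = 9962, TCGA = 11284, SRA = 49848)

total <- sum(n_samples)
total  # 71094, which the slide rounds to "~70,000"

# Share of all samples contributed by each source
round(n_samples / total, 3)
```

Most of the samples (roughly 70%) come from the SRA, which is why the SRA metadata problems on the next slides matter so much.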

  25. SRA phenotype information is far from complete
         SubjectID    Sex Tissue Race Age
    6620        NA female  liver   NA  NA
    6621        NA female  liver   NA  NA
    6622        NA female  liver   NA  NA
    6623        NA female  liver   NA  NA
    6624        NA female  liver   NA  NA
    6625        NA   male  liver   NA  NA
    6626        NA   male  liver   NA  NA
    6627        NA   male  liver   NA  NA
    6628        NA   male  liver   NA  NA
    6629        NA   male  liver   NA  NA
    6630        NA   male  liver   NA  NA
    6631        NA     NA  blood   NA  NA
    6632        NA     NA  blood   NA  NA
    6633        NA     NA  blood   NA  NA
    6634        NA     NA  blood   NA  NA
    6635        NA     NA  blood   NA  NA
    6636        NA     NA  blood   NA  NA
    Slide adapted from Shannon Ellis
    Shannon Ellis
    @Shannon_E_Ellis

  26. Category Frequency
      F               95
      female        2036
      Female          51
      M               77
      male          1240
      Male           141
      Total         3640
    Even when information is provided, it’s not always clear…
    sra_meta$Sex: “1 Male, 2 Female”, “2 Male, 1 Female”, “3 Female”, “DK”, “male and female”, “Male (note: ….)”, “missing”, “mixed”, “mixture”, “N/A”, “Not available”, “not applicable”, “not collected”, “not determined”, “pooled male and female”, “U”, “unknown”, “Unknown”
    Slide adapted from Shannon Ellis
    Shannon Ellis
    @Shannon_E_Ellis
    Improving the value of public RNA-seq expression data by phenotype prediction
    doi.org/10.1093/nar/gky102
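The frequency table shows six spellings that really encode only two categories. A minimal base-R sketch of that harmonization, using the counts from the slide and a hypothetical `normalize_sex()` helper: it lower-cases and trims each label, maps single-letter codes to full words, and (as an assumption for this sketch) sets every other value, including the ambiguous ones listed on the slide, to `NA`.

```r
# Frequencies of the six sex labels shown on the slide
sex_counts <- c(F = 95, female = 2036, Female = 51, M = 77, male = 1240, Male = 141)

# Hypothetical helper: normalize case and whitespace, map "f"/"m" to full words,
# and return NA for anything not in the lookup (e.g. "U", "mixed", "DK")
normalize_sex <- function(x) {
  key <- tolower(trimws(x))
  lookup <- c(f = "female", female = "female", m = "male", male = "male")
  unname(lookup[key])
}

harmonized <- normalize_sex(names(sex_counts))
by_sex <- tapply(sex_counts, harmonized, sum)
by_sex  # female 2182, male 1458; together they give the 3640 total on the slide

# Ambiguous labels are left missing rather than guessed
normalize_sex(c("U", "mixed", "pooled male and female"))  # NA NA NA
```

Treating "pooled male and female" or "mixture" as `NA` is a conservative choice; a curated resource may prefer an explicit "mixed" category instead.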

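  27. install.packages("BiocManager")
    BiocManager::install("recount")
    ## SRP009615 is an example study ID (the one used in the recount vignette)
    url <- recount::download_study("SRP009615")
    ## download_study() saves rse_gene.Rdata in a directory named after the study
    load(file.path("SRP009615", "rse_gene.Rdata"))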

    #recountWorkflow: Accessing over 70,000 human RNA-seq samples with Bioconductor
    doi.org/10.12688/f1000research.12223.1

  28. http://research.libd.org/recount-brain/
    doi.org/10.1101/618025
    recount-brain: a curated repository of human brain RNA-seq datasets metadata
    Ashkaun Razmara
    @ashkaun_razmara

  29. http://research.libd.org/recount-brain/
    recount-brain: a curated repository of human brain RNA-seq datasets metadata
    doi.org/10.1101/618025
    RNASE2

  30. related projects
    • Bioconductor recountWorkflow: doi.org/10.12688/f1000research.12223.1
    • Shannon Ellis & Leek: phenotype prediction doi.org/10.1093/nar/gky102
    • Jack Fu & Taub: transcript estimations doi.org/10.1101/247346
    • Madugundu & Pandey (JHU):
    proteomics doi.org/10.1002/pmic.201800315
    • Luidi-Imada & Marchionni (JHU):
    cancer and FANTOM doi.org/10.1101/659490
    • Kuri-Magaña & Martínez-Barnetche (INSP Mexico):
    immune expression doi.org/10.3389/fimmu.2018.02679
    • Ryten (UCL):
    Guelfi: validating expressed regions (ERs) eQTLs
    doi.org/10.1038/s41467-020-14483-x
    Zhang: improving the detection of ERs doi.org/10.1126/sciadv.aay8299
    Mina Ryten
    @MinaRyten ??? 🤔

  31. recount3: over 700,000 human and mouse RNA-seq samples
    #recount3: summaries and queries for large-scale RNA-seq expression and splicing
    Christopher Wilks
    @chrisnwilks
    research.libd.org/recount3-docs/
    doi.org/10.1186/s13059-021-02533-6

  32. #recount3: summaries and queries for large-scale RNA-seq expression and splicing
    rna.recount.bio
    doi.org/10.1186/s13059-021-02533-6

  33. bioconductor.org/packages/recount3
    #recount3: summaries and queries for large-scale RNA-seq expression and splicing

  34. bioconductor.org/packages/megadepth
    doi.org/10.1093/bioinformatics/btab152
    #Megadepth: efficient coverage quantification for BigWigs and BAMs
    David Zhang
    @dyzhang32
    Christopher Wilks
    @chrisnwilks

  35. doi.org/10.1093/nar/gkac1056
    doi.org/10.1101/2023.03.29.534370
    #IntroVerse: a comprehensive database of introns across human tissues + Splicing accuracy varies across human introns, tissues and age
    Sonia García-Ruiz
    @sonigruiz

  36. Zoom in: snRNA-seq → deconvolution of bulk RNA-seq
    Matthew N Tran
    @mattntran
    Kristen R Maynard
    @kr_maynard
    Louise A Huuki-Myers
    @lahuuki
    Keri Martinowich
    @martinowk
    Stephanie C Hicks
    @stephaniehicks

  37. 10x snRNA-seq Reference Data
    Tran, Maynard et al., Neuron, 2021
    Cell type   AMY DLPFC   HPC   NAc  sACC
    Astro      1638   782  1170  1099   907
    Endo         31     0     0     0     0
    Macro         0    10     0    22     0
    Micro      1168   388  1126   492   784
    Mural        39    18    43     0     0
    Oligo      6080  5455  5912  6134  4584
    OPC        1459   572   838   669   911
    Tcell        31     9    26     0     0
    Excit       443  2388   623     0  4163
    Inhib      3117  1580   366 11476  3974
    @mattntran
    Matthew N Tran

  38. Sean Maden
    @MadenSean
    Sang Ho Kwon
    @sanghokwon17
    #deconvochallenge

  39. 1vAll Markers vs. Mean Ratio Markers
    Louise A Huuki-Myers
    @lahuuki
    research.libd.org/DeconvoBuddies/
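The slide contrasts two ways of ranking marker genes. A toy base-R sketch, assuming the mean ratio definition as we understand it from the DeconvoBuddies documentation (mean expression in the target cell type divided by the highest mean expression among the non-target cell types). The pooled "target vs. everything else" ratio is only a simplified stand-in for the 1vAll idea; it is not how DeconvoBuddies computes its 1vAll markers. The expression values and the helpers `mean_ratio()` and `one_vs_all_ratio()` are hypothetical.

```r
# Hypothetical toy data: 3 genes x 6 nuclei (normalized expression)
expr <- rbind(
  geneA = c(10, 10, 1, 1, 5, 5),
  geneB = c(4, 4, 1, 1, 1, 1),
  geneC = c(2, 2, 2, 2, 2, 2)
)
cell_type <- c("Astro", "Astro", "Oligo", "Oligo", "Excit", "Excit")

# Mean expression of each gene within each cell type (genes x cell types)
type_means <- sapply(split(seq_along(cell_type), cell_type), function(idx) {
  rowMeans(expr[, idx, drop = FALSE])
})

# Mean ratio: target mean divided by the largest non-target cell type mean
mean_ratio <- function(type_means, target) {
  others <- setdiff(colnames(type_means), target)
  type_means[, target] / apply(type_means[, others, drop = FALSE], 1, max)
}

# Simplified pooled ratio: target mean divided by the mean of all other nuclei
one_vs_all_ratio <- function(expr, cell_type, target) {
  rowMeans(expr[, cell_type == target, drop = FALSE]) /
    rowMeans(expr[, cell_type != target, drop = FALSE])
}

mean_ratio(type_means, "Astro")                # geneA = 2, geneB = 4, geneC = 1
one_vs_all_ratio(expr, cell_type, "Astro")     # geneA = 3.33, geneB = 4, geneC = 1
```

In this toy case, geneA looks like a strong Astro marker when all other nuclei are pooled, but the mean ratio penalizes it because Excit nuclei also express it; geneB, which is specific to Astro, ranks first under the mean ratio.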

  40. Peric = Mural + Endo
    Mean Proportions By Region: Tran et al., Neuron, 2021 (8 donors, 10 cell types)
    Louise A Huuki-Myers
    @lahuuki

  41. Motivation
    ● Improve deconvolution algorithms by considering differences in size and RNA content between cell types
    ● Use smFISH with RNAscope to establish a data set of:
    ○ Cellular composition
    ○ Nuclei sizes of major cell types
    ○ Average nuclear RNA content of major cell types
    How do we measure the total RNA content of a cell if we can only observe a few genes at a time? Use a TREG.
    Data-driven Identification of Total RNA Expression Genes (TREGs) for Estimation of RNA Abundance in Heterogeneous Cell Types
    research.libd.org/TREG/
    doi.org/10.1101/2022.04.28.489923
    Louise A Huuki-Myers
    @lahuuki

  42. What is a TREG?
    ● Total RNA Expression Gene
    ● Expression is proportional to the overall RNA expression in a nucleus
    ● In smFISH, the count of TREG puncta in a nucleus can estimate the RNA content
    Data-driven Identification of Total RNA Expression Genes (TREGs) for Estimation of RNA Abundance in Heterogeneous Cell Types
    research.libd.org/TREG/
    doi.org/10.1101/2022.04.28.489923
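If TREG expression is proportional to total RNA in a nucleus, then comparing mean TREG puncta counts across cell types gives a relative RNA content estimate. A minimal base-R sketch with hypothetical per-nucleus counts (not data from the study), scaling by the lowest cell type as an arbitrary reference:

```r
# Hypothetical smFISH output: TREG puncta per segmented nucleus, with cell type calls
nuclei <- data.frame(
  cell_type   = c("Excit", "Excit", "Excit", "Inhib", "Inhib", "Oligo", "Oligo", "Oligo"),
  treg_puncta = c(30, 34, 26, 20, 24, 8, 10, 12)
)

# Mean TREG puncta per nucleus within each cell type
mean_puncta <- tapply(nuclei$treg_puncta, nuclei$cell_type, mean)

# Relative RNA content, assuming puncta counts scale with total RNA;
# the cell type with the fewest puncta is set to 1
relative_rna <- mean_puncta / min(mean_puncta)
round(relative_rna, 2)  # Excit 3, Inhib 2.2, Oligo 1
```

Such relative RNA content factors are the kind of cell type-specific size and RNA information that the motivation slide proposes feeding into deconvolution.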

  43. #deconvochallenge
    Challenges and opportunities to computationally deconvolve heterogeneous tissue with varying cell sizes using single cell RNA-sequencing datasets
    doi.org/10.48550/arXiv.2305.06501
    Sean Maden
    @MadenSean

  44. Zoom in: spatial omics
    Kristen R Maynard
    @kr_maynard
    Keri Martinowich
    @martinowk
    Stephanie C Hicks
    @stephaniehicks
    Andrew E Jaffe
    @andrewejaffe
    Stephanie C Page
    @CerceoPage

  45. DOI: 10.1038/s41593-020-00787-0
    twitter.com/lcolladotor/status/1233661576433061888 from 2020-02-29
    Andrew E Jaffe
    @andrewejaffe
    Kristen R Maynard
    @kr_maynard
    Keri Martinowich
    @martinowk

  46. DOI: 10.1038/s41593-020-00787-0
    twitter.com/lcolladotor/status/1233661576433061888 from 2020-02-29
    DOI 10.1093/bioinformatics/btac299
    Since Feb 2020
    spatialLIBD::fetch_data()
    provides access to
    SpatialExperiment
    R/Bioconductor objects
    Stephanie C Hicks
    @stephaniehicks
    Lukas M Weber
    @lmweber

  47. DOI: 10.1038/s41593-020-00787-0
    twitter.com/lcolladotor/status/1233661576433061888 from 2020-02-29
    twitter.com/CrowellHL/status/1597579271945715717
    DOI 10.1093/bioinformatics/btac299
    Since Feb 2020
    spatialLIBD::fetch_data()
    provides access to
    SpatialExperiment
    R/Bioconductor objects

  48. #spatialDLPFC
    doi.org/10.1101/2023.02.15.528722
    Louise A Huuki-Myers
    @lahuuki
    Abby Spangler
    @abspangler
    Nicholas J Eagles
    @Nick-Eagles (GitHub)

  49. Visium spatial clustering works for variables with high % variance explained. But what about other ones?
    DOI: 10.1038/s41593-020-00787-0

  50. twitter.com/sanghokwon17/status/1650589385379962881 from 2023-04-24
    Sang Ho Kwon
    @sanghokwon17
    DOI: 10.1101/2023.04.20.537710
    #Visium_SPG_AD

  51. Hypothesis
    Local tissue microenvironments in close proximity to AD-related neuropathology have distinct cellular and molecular signatures.
    Sang Ho Kwon

  52. Visium Spatial Proteogenomics (Visium-SPG)
    Visium-SPG = Visium SRT + immunofluorescence
    (using identical tissue samples)
    Sang Ho Kwon

  53. Experimental design & study overview
    Braak V-VI & CERAD frequent
    Sang Ho Kwon

  54. AD pathology signal is too small to detect by spatially resolved gene expression alone
    research.libd.org/Visium_SPG_AD/

  55. Estimating pathological burden per spot to generate transcriptome-scale maps of AD pathology

  56. Identifying transcriptional signatures of AD-related neuropathology
    Sang Ho Kwon

  57. bioconductor.org/packages/spatialLIBD
    Pardo et al., 2022 DOI 10.1186/s12864-022-08601-w
    Maynard, Collado-Torres, 2021 DOI 10.1038/s41593-020-00787-0
    Brenda Pardo
    @PardoBree
    Abby Spangler
    @abspangler
    Louise A. Huuki-Myers
    @lahuuki

  58. Zoom in: transcripts?
    work in progress

  59. Boxplots of non-DE genes with DE tx
    Ankrd11: acts in head morphogenesis; expressed in cerebral cortex
    Trpc4: acts upstream of or within gamma-aminobutyric acid secretion and oligodendrocyte differentiation; expressed in brain
    Scaf11: predicted to be involved in spliceosomal complex assembly; expressed in diencephalon lateral wall ventricular layer, midbrain ventricular layer, and telencephalon ventricular layer
    Daianna Gonzalez-Padilla
    @daianna_glez

  60. Boxplots of DEG with Up and Down DE tx
    Dcun1d5: predicted to be involved in protein modification by small protein conjugation or removal, protein neddylation, and regulation of cell growth; expressed in nervous system
    Pnisr: predicted to be active in presynaptic active zone; expressed in nervous system

  61. Transcript reconstruction from read mappings to the genome
    [Diagram: RNA sequencing (paired reads) → spliced alignment to the genome sequence (HISAT2, STAR) → transcript assembly (Cufflinks, StringTie) → isoform1 and isoform2 built from exon1, exon2, exon3]
    ● exons & introns do not have to be defined in the reference annotation
    ● captures potentially "novel" isoforms

  62. Transcriptional noise makes transcript reconstruction difficult: inflation of "novel" transfrags
    [Figure panels: read alignments; assembled transfrags (transcript fragments); observed junctions; read coverage]

  63. Figure 4: RNA quality surrogate variable assessment of Lieber Institute datasets. Comparing gene-level degradation effects in the full degradation experiment (all regions) vs. t-statistics from differential expression of schizophrenia cases vs. controls for five publicly available Lieber Institute datasets (rows) over six different models (columns). Backgrounds are shaded by the absolute correlation.
    Joshua M Stolz
    @JoshStolz2
    Hédia Tnani
    @TnaniHedia
    #qsvaR

  64. Correlate Proportion Cell Type vs. qSVs
    Louise A Huuki-Myers
    @lahuuki

  65. Figure 5: Effect of correcting models on reproducibility of differential expression. The replication rate across p-value cutoffs for all available models for (A) BSP1 and BSP2 DLPFC, (B) CMC and BSP1, and (C) CMC and BSP2 DLPFC.
    Joshua M Stolz
    @JoshStolz2
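The caption does not define the replication rate, so this base-R sketch assumes one common definition: among genes significant at a given cutoff in the discovery dataset, the fraction that are also significant at that cutoff in the replication dataset with the same sign of the t-statistic. The function name `replication_rate()` and the toy p-values and t-statistics are hypothetical.

```r
# Assumed definition of the replication rate at one p-value cutoff
replication_rate <- function(p_disc, t_disc, p_rep, t_rep, cutoff) {
  hits <- p_disc < cutoff              # genes significant in the discovery data
  if (!any(hits)) return(NA_real_)     # undefined when nothing is significant
  # replicated = significant in the replication data AND same direction of effect
  mean(p_rep[hits] < cutoff & sign(t_rep[hits]) == sign(t_disc[hits]))
}

# Hypothetical results for four genes in two datasets
p_disc <- c(0.001, 0.010, 0.200, 0.030)
t_disc <- c(5, -3, 1, 2)
p_rep  <- c(0.002, 0.500, 0.010, 0.040)
t_rep  <- c(4, -2, 1, -2)

# Sweep the cutoff, as in the figure
cutoffs <- c(0.01, 0.05)
rates <- sapply(cutoffs, function(cut) replication_rate(p_disc, t_disc, p_rep, t_rep, cut))
rates  # 1 at 0.01; 1/3 at 0.05
```

At the looser cutoff the fourth gene is significant in both datasets but with opposite signs, so it does not count as replicated.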

  66. doi.org/10.1186/s12859-021-04142-3
    Nicholas J Eagles
    @Nick-Eagles (GitHub)
    Upcoming LIBD Data portal built on top of SPEAQeasy & PopTop & BiocMAP

  67. lcolladotor.github.io/#projects
    ● Every assay has caveats
    ● We re-use tricks: think adding 0, multiplying by 1
    ● It nearly always takes a team
    ● Data sharing accelerates science + democratizes access to it
    ● Zooming in allows us to reduce the heterogeneity
    ● We can learn from each other: from uniformly processing our data & re-using it → replicate / validate?

  68. https://youtu.be/33scakbTNO0

  69. Another type of data science group
    Leonardo Collado Torres
    lcolladotor.github.io
    2020-08-19

  70. There are increasingly more data & tools
    - Greater demand for data skills: wrangling, visualization, analysis
    - LIBD itself generates results that are large data collections
    - Greater demand among LIBD scientists to learn how to work with data
    https://ceramics.org/ceramic-tech-today/supercomputer-powered-materials-database-unleashes-data-deluge
    … and many more

  71. Protected time goes both ways
    - You need protected time to learn, guide, build training material
    - You need time also for collaborating
    - It’s important to respect both and plan accordingly
    [Chart: 20% / 80% research]

  72. jhpce.jhu.edu/knowledge-base/knowledge-base-articles-from-lieber-institute/
    research.libd.org/rstatsclub/
    Join us Fridays at 9 AM (please check the code of conduct!)

  73. www.youtube.com/@lcolladotor/playlists
    Videos allow us to multiply ourselves
    We can make you custom selections of videos for a specific problem on DSgs sessions

  74. 20 chapters and counting!
    lcolladotor.github.io/bioc_team_ds

  75. Endorsed by your UCL colleague:
    Melissa Grant-Peters
    @mgrantpeters

  76. lcolladotor.github.io/pkgs
    lcolladotor.github.io/biocthis

  77. Madhavi Tippani @MadhaviTippani
    Heena R Divecha @HeenaDivecha
    Lukas M Weber @lmwebr
    Stephanie C Hicks @stephaniehicks
    Abby Spangler @abspangler
    Keri Martinowich @martinowk
    Stephanie C Page @CerceoPage
    Kristen R Maynard @kr_maynard
    Leonardo Collado-Torres @lcolladotor
    Nicholas J Eagles @Nick-Eagles (GH)
    Kelsey D Montgomery
    Sang Ho Kwon
    Thomas M Hyde
    Louise A Huuki-Myers @lahuuki
    Boyi Guo @BoyiGuo
    Matthew N Tran @mattntran
    Sowmya Parthiban @sowmyapartybun
    Melissa Grant-Peters @mgrantpeters
    Prashanthi Ravichandran @prashanthi-ravichandran (GH)
    Image Analysis / Expression Analysis / Data Generation
    + Many more LIBD, JHU, and external collaborators
    Slides available at speakerdeck.com/lcolladotor

  78. #GBD23 Thank you for having us over in the UK!

  79. lcolladotor.github.io
    @lcolladotor
