
bsp2-sobp2018


  1. 11
    Unique Molecular Correlates of Schizophrenia and Its Genetic Risk in the Hippocampus Compared to Frontal Cortex
    L. Collado-Torres
    @fellgernon @LieberInstitute #SOBP18
    Andrew Jaffe’s data science team


  2. THE NEED
    URGENT UNMET NEED
    Neuropsychiatric disorders cost the U.S. economy > $80 billion/year.
    Neuropsychiatric conditions are the leading cause of disability in young people worldwide.
    70% of youth in the juvenile justice system are living with at least one mental health condition.
    Traumatic brain injury is the leading cause of long-term disability in children and adults younger than 35 years.
    1 in 4 people experience mental illness in a given year.
    More veterans die of suicide than in combat, at a rate of 20 suicides per day.

  3. THE SCIENTIFIC FRONTIER
    65+ YEARS: WAITING FOR A BREAKTHROUGH
    Molecular targets of all current psychotherapeutic drugs are the same as their 1950s prototypes.
    1952: Discovery of the antipsychotic chlorpromazine (DRD2 blockade)
    1957: Sputnik I
    2018: Elon Musk put a Tesla in space
    2018: Antipsychotics for the treatment of schizophrenia all work via DRD2 blockade
    ?

  4. Animal Models Neuronal Cell Models
    Drug Discovery
    New Treatments
    2300+
    Human postmortem brains
    1000+
    Cell lines from individuals
    Genomics + Transcriptomics + Proteomics
    8
    Mechanisms of Illness
    Clinical Genetics
    BrainSeq: A Human Brain Genomics Consortium
    THE SCIENTIFIC FRONTIER


  5. Animal Models Neuronal Cell Models
    Drug Discovery
    New Treatments
    2300+
    Human postmortem brains
    1000+
    Cell lines from individuals
    Genomics + Transcriptomics + Proteomics
    8
    Mechanisms of Illness
    Clinical Genetics
    BrainSeq: A Human Brain Genomics Consortium
    DLPFC
    495 samples
    BrainSeq Phase I
    polyA+
    Jaffe et al., Nature Neuroscience, 2018
    THE SCIENTIFIC FRONTIER


  6. Animal Models Neuronal Cell Models
    Drug Discovery
    New Treatments
    2300+
    Human postmortem brains
    1000+
    Cell lines from individuals
    Genomics + Transcriptomics + Proteomics
    8
    Mechanisms of Illness
    Clinical Genetics
    BrainSeq: A Human Brain Genomics Consortium
    DLPFC
    495 samples
    BrainSeq Phase I
    polyA+
    DLPFC
    453 samples
    HIPPO
    447 samples
    BrainSeq Phase II
    RiboZero
    THE SCIENTIFIC FRONTIER
    Jaffe et al., Nature Neuroscience, 2018


  7. DATA
    BrainSeq Phase II RNA-seq samples (all samples)
    Age group            DLPFC   HIPPO   Total
    Adult (age >= 18)      374     370     744
    Prenatal                29      28      57
    0 <= age < 18           50      49      99
    Total                  453     447     900
    • Non-psychiatric control and schizophrenia-affected individuals
    • Two brain regions: dorsolateral prefrontal cortex (DLPFC) and hippocampus (HIPPO)

  8. DATA
    BrainSeq Phase II RNA-seq samples: by case status
    Controls
    Age group            DLPFC   HIPPO   Total
    Adult                  222     238     460
    Prenatal                29      28      57
    0 <= age < 18           49      48      97
    Total                  300     314     614
    Schizophrenia cases
    Age group            DLPFC   HIPPO   Total
    Adult                  152     132     284
    Prenatal                 0       0       0
    0 <= age < 18            1       1       2
    Total                  153     133     286
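
The control and case tables should add up, cell by cell, to the all-sample table on slide 7. A short Python cross-check, with every count transcribed from those two slides (the dictionary layout is just an illustrative choice):

```python
# (DLPFC, HIPPO) sample counts per age group, transcribed from slides 7 and 8.
all_samples = {"adult": (374, 370), "prenatal": (29, 28), "0<=age<18": (50, 49)}
controls = {"adult": (222, 238), "prenatal": (29, 28), "0<=age<18": (49, 48)}
cases = {"adult": (152, 132), "prenatal": (0, 0), "0<=age<18": (1, 1)}


def region_totals(table):
    """Sum the per-age-group counts into (DLPFC, HIPPO) totals."""
    dlpfc = sum(d for d, _ in table.values())
    hippo = sum(h for _, h in table.values())
    return dlpfc, hippo


# Controls plus cases must reproduce every cell of the all-sample table.
for group, (d_all, h_all) in all_samples.items():
    d_ctl, h_ctl = controls[group]
    d_case, h_case = cases[group]
    assert (d_ctl + d_case, h_ctl + h_case) == (d_all, h_all), group

print(region_totals(all_samples))  # (453, 447)
print(region_totals(controls))     # (300, 314)
print(region_totals(cases))        # (153, 133)
```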

  9. DATA ANALYSIS
    Focus on being conservative
    1. Use well-established processing methods
    2. Apply strict expression cutoffs
    3. Use replication when possible
    4. Adjust for confounding by RNA quality (degradation)
       • Using the qSVA method
    5. Avoid potential batch effects
       • Drop problematic samples
    6. Take into account correlation at the individual level

  10. RNA-SEQ APPROACH
    BrainSeq Phase II
    Age range: prenatal to adult (birth marked on the timeline)
    Groups: unaffected controls; patients with schizophrenia
    Data: RNA sequencing + genotyping
    Expression features: genes, exons, expressed regions, transcripts, junctions
    Associations with: age; genotype (CC / CA / AA); diagnosis (SZ vs CONT); brain region (DLPFC, HIPPO) + region differences

  11. DATA ANALYSIS
    Main processing steps
    1. Quality check (QC) on raw reads (FastQC)
    2. Failed QC? Then trim reads (Trimmomatic)
    3. Align reads to the genome (HISAT2)
    4. Count features (featureCounts + others)
    5. Calculate coverage (bam2wig)
    6. Quantify transcripts (Salmon)
    7. Create count tables (R)
    8. Genotype samples (samtools + vcftools)
    L. Collado-Torres & Emily E. Burke

  12. DATA ANALYSIS
    Main processing steps
    1. Quality check (QC) on raw reads (FastQC)
    2. Failed QC? Then trim reads (Trimmomatic)
    3. Align reads to the genome (HISAT2)
    4. Count features (featureCounts + others)
    5. Calculate coverage (bam2wig)
    6. Quantify transcripts (Salmon)
    7. Create count tables (R): RangedSummarizedExperiment objects
    8. Genotype samples (samtools + vcftools)
    L. Collado-Torres & Emily E. Burke
    Nextflow version in preparation with Winter Genomics

  13. DATA ANALYSIS
    Filter features with low expression
    [Figure: number of genes above the cutoff (# genes > cutoff) versus the mean RPKM expression cutoff (0.0 to 1.0)]
    Feature   Cutoff   Unit
    Gene      0.25     RPKM
    Exon      0.30     RPKM
    Jxn       0.46     RP10M
    Tx        0.38     TPM
    • Used the mean across all 900 samples
    • All analyses use the filtered data
    L. Collado-Torres
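
The filtering rule on this slide (keep a feature if its mean expression across all samples exceeds a per-feature-type cutoff) can be sketched in NumPy. This is a minimal illustration, not the pipeline's code: it assumes library size is the column sum of the count matrix, and it does not reproduce how the cutoff values in the table were chosen.

```python
import numpy as np


def rpkm(counts, lengths_bp):
    """Reads per kilobase per million for a features x samples count matrix.
    Assumption: library size = column sum of `counts` (pipelines often use
    the total number of mapped reads instead)."""
    counts = np.asarray(counts, dtype=float)
    millions = counts.sum(axis=0) / 1e6                             # one value per sample
    kilobases = np.asarray(lengths_bp, dtype=float)[:, None] / 1e3  # one value per feature
    return counts / millions / kilobases


def keep_expressed(expr, cutoff):
    """True for features whose mean expression across ALL samples exceeds `cutoff`."""
    return np.asarray(expr, dtype=float).mean(axis=1) > cutoff


def n_features_above(expr, cutoffs):
    """Number of features retained at each candidate cutoff (the curve in the figure)."""
    means = np.asarray(expr, dtype=float).mean(axis=1)
    return [int((means > c).sum()) for c in cutoffs]


# Toy RPKM values for 3 features x 3 samples (made-up numbers).
expr = [[0.1, 0.2, 0.3], [0.3, 0.3, 0.3], [1.0, 0.0, 0.0]]
print(keep_expressed(expr, 0.25).tolist())        # gene cutoff from the table
print(n_features_above(expr, [0.0, 0.25, 0.5]))
```

Using the mean over every sample, rather than per group, keeps the filter blind to diagnosis and region, which fits the conservative approach described earlier.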

  14. DATA ANALYSIS
    Differential expression models
    • Region-specific, by age group (adult or prenatal)
      • Using only adult samples or only prenatal samples
      • Test for differences between DLPFC and HIPPO
    • Development
      • Linear age splines with breakpoints at developmental stages
      • Test for an interaction between age and brain region at these splines
    • Case-control
      • By brain region
      • Test for differences between non-psychiatric controls and individuals with schizophrenia
    • For the first two models, we account for the fact that an individual can contribute two correlated samples: one for each brain region
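
A linear age spline adds one hinge term (age - b)+ per breakpoint b, so the slope can change at each developmental stage while the fitted curve stays continuous; multiplying those columns by a region indicator gives the age-by-region interaction being tested. A NumPy sketch follows. The breakpoint values are HYPOTHETICAL (the slides only name the stages), and the slide's exact definitions of the fetal/birth/infant/child/teen/adult terms may differ.

```python
import numpy as np

# HYPOTHETICAL breakpoints in years of age; not taken from the slides.
BREAKPOINTS = {"birth": 0.0, "infant": 1.0, "child": 10.0, "teen": 20.0, "adult": 50.0}


def linear_spline_basis(age, breakpoints=BREAKPOINTS):
    """Columns: age, then the hinge (age - b)_+ for every breakpoint b.
    Prenatal samples can be given negative ages."""
    age = np.asarray(age, dtype=float)
    cols = [age] + [np.clip(age - b, 0.0, None) for b in breakpoints.values()]
    return np.column_stack(cols)


def region_interaction_design(age, is_hippo):
    """Intercept, region indicator, spline basis, and spline-by-region interactions."""
    basis = linear_spline_basis(age)
    region = np.asarray(is_hippo, dtype=float)[:, None]
    return np.hstack([np.ones_like(region), region, basis, basis * region])


X = region_interaction_design([-0.5, 0.5, 5.0, 30.0, 60.0], [0, 1, 0, 1, 0])
print(X.shape)  # (5, 14)
```

This design ignores the within-individual correlation mentioned above; handling it needs a mixed model or a correlation estimate across the two regions from the same donor.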

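  15. DATA ANALYSIS
    Differential expression models
    • Region-specific, by age group (adult or prenatal)
      • Alternative: $\mathrm{Expr} = \beta_0 + \mathrm{age} + \mathrm{Sex} + \sum_{i=1}^{K} \mathrm{snpPC}_i + \mathrm{mitoRate} + \mathrm{totalAssignedGene} + \mathrm{RIN} + \mathrm{Region}$
    • Development
      • Alternative: $\mathrm{Expr} = \beta_0 + \mathrm{age} \times \mathrm{Region} + \mathrm{fetal} \times \mathrm{Region} + \mathrm{birth} \times \mathrm{Region} + \mathrm{infant} \times \mathrm{Region} + \mathrm{child} \times \mathrm{Region} + \mathrm{teen} \times \mathrm{Region} + \mathrm{adult} \times \mathrm{Region} + \mathrm{Sex} + \sum_{i=1}^{K} \mathrm{snpPC}_i + \mathrm{mitoRate} + \mathrm{totalAssignedGene} + \mathrm{RIN} + \mathrm{Region}$
    • Case-control
      • Alternative: $\mathrm{Expr} = \beta_0 + \mathrm{age} + \mathrm{Sex} + \mathrm{mitoRate} + \mathrm{rRNA\_rate} + \sum_{i=1}^{K} \mathrm{snpPC}_i + \mathrm{totalAssignedGene} + \mathrm{RIN} + \mathrm{regionSpecific\_qSVs} + \mathrm{Diagnosis}$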
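
Each alternative model is a linear model with the tested term (Region, the age-by-region interactions, or Diagnosis) alongside adjustment covariates. The sketch below fits one gene by ordinary least squares and returns the t-statistic for a chosen column, using simulated data and a reduced covariate set (age, sex, RIN) chosen for illustration. It is plain OLS, not limma's moderated t, and it does not model the two-samples-per-individual correlation.

```python
import numpy as np


def coef_t_stat(y, X, j):
    """OLS fit of y on design X (samples x terms, first column = intercept).
    Returns (estimate, t statistic) for column j."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = (resid @ resid) / (n - p)               # residual variance
    se = np.sqrt(sigma2 * np.linalg.inv(X.T @ X)[j, j])
    return float(beta[j]), float(beta[j] / se)


# Simulated single gene (made-up data) with a true diagnosis effect of 1.0.
rng = np.random.default_rng(1)
n = 200
dx = np.repeat([0.0, 1.0], n // 2)                   # 0 = control, 1 = schizophrenia
age = rng.uniform(18, 90, n)
sex = rng.integers(0, 2, n).astype(float)
rin = rng.uniform(5, 9, n)
y = 2.0 + 1.0 * dx + 0.01 * age + 0.3 * sex + 0.2 * rin + rng.normal(0, 1, n)

X = np.column_stack([np.ones(n), dx, age, sex, rin])
est, t = coef_t_stat(y, X, j=1)                      # column 1 = diagnosis
print(round(est, 2), round(t, 1))
```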

  16. DATA ANALYSIS
    Using BrainSpan for replication: region-specific model
    • Replication: p-value < 0.05 in BrainSpan with a consistent direction of effect
    Similar results for the development model
    L. Collado-Torres
    [Figure: replication rate (0 to 1) versus p-value threshold (p < 0.05 down to p < 1e−06), in adult and fetal panels, for exon, gene, and junction (jxn) features]
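
The replication rate on this slide can be read as: among features passing a discovery p-value threshold, the fraction with p < 0.05 in BrainSpan and the same sign of effect. A NumPy sketch of that definition, with toy inputs; the argument names are hypothetical.

```python
import numpy as np


def replication_rate(p_disc, t_disc, p_rep, t_rep, threshold, rep_alpha=0.05):
    """Fraction of discovery hits (p_disc < threshold) that replicate:
    p_rep < rep_alpha AND the same direction of effect (sign of t)."""
    p_disc, t_disc = np.asarray(p_disc), np.asarray(t_disc)
    p_rep, t_rep = np.asarray(p_rep), np.asarray(t_rep)
    hits = p_disc < threshold
    if not hits.any():
        return float("nan")                          # no discovery hits at this threshold
    replicated = (p_rep < rep_alpha) & (np.sign(t_disc) == np.sign(t_rep))
    return float(replicated[hits].mean())


# Toy example: 4 features (made-up statistics).
p_d, t_d = [1e-7, 1e-3, 0.04, 0.5], [5.0, -3.0, 2.0, 0.1]
p_r, t_r = [0.01, 0.2, 0.01, 0.01], [4.0, -1.0, -2.0, 1.0]
for thr in (0.05, 1e-6):
    print(thr, replication_rate(p_d, t_d, p_r, t_r, thr))
```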

  17. DATA ANALYSIS
    Region-specific by age results
    • For adults: 1,612 DE genes with Bonferroni-adjusted p < 0.01 that replicate in BrainSpan
    • For prenatal samples: 32 DE genes
    L. Collado-Torres
    [Figure: Venn diagrams of DE features (gene, exon, jxn) grouped by gene ID, for prenatal and adult samples]
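
"Grouped by gene ID" means each DE exon or junction is mapped to its gene, and the Venn diagram then compares gene sets across feature types. A small Python sketch of both steps; the feature and gene IDs below are hypothetical.

```python
from itertools import combinations


def genes_with_de_features(feature_to_gene, de_feature_ids):
    """Collapse DE feature IDs (exons, junctions, ...) to the genes they belong to."""
    return {feature_to_gene[f] for f in de_feature_ids}


def venn_regions(sets):
    """Size of every exclusive Venn region for a dict of named sets."""
    names = list(sets)
    regions = {}
    for k in range(1, len(names) + 1):
        for combo in combinations(names, k):
            inside = set.intersection(*(sets[name] for name in combo))
            outside = set().union(*(sets[name] for name in names if name not in combo))
            regions[":".join(combo)] = len(inside - outside)
    return regions


# Hypothetical IDs: exons e1 and e3 belong to G1, exon e2 to G3.
exon_map = {"e1": "G1", "e2": "G3", "e3": "G1"}
sets = {
    "gene": {"G1", "G2"},
    "exon": genes_with_de_features(exon_map, ["e1", "e2"]),
    "jxn": {"G1", "G2", "G4"},
}
print(venn_regions(sets))
```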

  18. DATA ANALYSIS
    Region-specific by age results: examples with top results
    Panels: adult and prenatal; x-axis: brain region (DLPFC, HIPPO); y-axis: log2(CPM + 0.5), covariate effects removed
    • ENSG00000268089.2 GABRQ, FDR 2.87e−84
    • ENSG00000250479.8 CHCHD10, FDR 4.04e−06
    L. Collado-Torres
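
The y-axis unit log2(CPM + 0.5) is counts per million, offset by 0.5 so zero counts stay finite, on a log2 scale. A NumPy sketch of the literal formula, assuming library size is each sample's total count (some tools, such as limma-voom, use slightly different offsets):

```python
import numpy as np


def log2_cpm(counts, offset=0.5):
    """log2(CPM + offset) for a features x samples count matrix.
    Assumption: library size = each sample's column sum."""
    counts = np.asarray(counts, dtype=float)
    cpm = counts / counts.sum(axis=0) * 1e6
    return np.log2(cpm + offset)


# Two samples with 1 million counts each (made-up numbers).
x = log2_cpm([[0, 500_000], [1_000_000, 500_000]])
print(x[0, 0])  # a zero count maps to log2(0.5)
```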

  19. DATA ANALYSIS
    Region-specific by age results: adult-only enriched biological processes
    L. Collado-Torres
    G−protein coupled serotonin receptor signaling pathway
    serotonin receptor signaling pathway
    pallium development
    cGMP−mediated signaling
    small GTPase mediated signal transduction
    regulation of small GTPase mediated signal transduction
    positive regulation of GTPase activity
    behavior
    regulation of GTPase activity
    axon development
    axonogenesis
    regulation of membrane potential
    regulation of hormone levels
    positive regulation of neuron projection development
    positive regulation of cell development
    regulation of cell morphogenesis
    positive regulation of neuron differentiation
    regulation of neuron projection development
    positive regulation of neurogenesis
    positive regulation of nervous system development
    cell morphogenesis involved in neuron differentiation
    G:E:J (714), E:J (583), G:E (260)
    [Figure legend: enriched GO biological processes (ontology: BP) compared across gene groups; dot size: GeneRatio; color: adjusted p-value (p.adjust). G: gene, E: exon, J: exon-exon junction]

  20. DATA ANALYSIS
    Development model: similar cell type composition across regions in prenatal samples, estimated from DNA methylation (DNAm)
    [Figure: cell type proportion by age group (Fetal, Infant, Child, Teens, Adult, 50+) for the Pluripotency and Glial panels, by region (DLPFC, Hippo)]
    Stephen A. Semick

  21. DATA ANALYSIS
    Development model results
    • 5,982 genes (~55%) contain differentially expressed exons and splice junctions that replicated in BrainSpan (Bonferroni-adjusted p < 0.01)
    L. Collado-Torres
    [Figure: Venn diagram of DE features (gene, exon, jxn) grouped by gene ID]

  22. DATA ANALYSIS
    Development model results
    • Normalized expression over age for GABRD
    L. Collado-Torres
    [Figure: ENSG00000187730.7 GABRD (p-bonf 0); log2(CPM + 0.5) versus age (post-conception weeks, PCW, for prenatal samples, then years), for DLPFC and HIPPO]

  23. DATA ANALYSIS
    Development model results
    • Normalized expression over age after removing the effect of the null-model terms
    L. Collado-Torres
    For more, check LieberInstitute/jaffelab::cleaningY() #rstats
    [Figure: ENSG00000187730.7 GABRD (p-bonf 0); log2(CPM + 0.5) with covariate effects removed versus age (PCW for prenatal samples, then years), for DLPFC and HIPPO]
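
"Removing the effect of terms from the null model" amounts to fitting the full design, then subtracting the fitted contribution of the nuisance columns while keeping the terms of interest. The NumPy sketch below follows that idea; it is not a port of jaffelab::cleaningY(), and the function name and the `n_keep` convention (protect the first `n_keep` design columns) are assumptions made for this example.

```python
import numpy as np


def remove_covariate_effects(y, mod, n_keep):
    """y: features x samples; mod: samples x p design matrix.
    Fit y on mod by least squares, then subtract the fitted contribution of
    every column after the first `n_keep` (the nuisance terms)."""
    y = np.asarray(y, dtype=float)
    mod = np.asarray(mod, dtype=float)
    beta, *_ = np.linalg.lstsq(mod, y.T, rcond=None)   # p x features
    nuisance = mod[:, n_keep:] @ beta[n_keep:, :]      # samples x features
    return y - nuisance.T


# Toy example: keep intercept + age, remove a batch effect of 3.
age = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
batch = np.array([0.0, 1.0, 0.0, 1.0, 1.0, 0.0])
mod = np.column_stack([np.ones(6), age, batch])
y = (2 + 0.5 * age + 3 * batch)[None, :]
cleaned = remove_covariate_effects(y, mod, n_keep=2)
```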

  24. qSVA WORKFLOW
    Slide adapted from Amy Peterson
    Jaffe et al., PNAS, 2017
    [Figure: Model 1 (6429 genes), log2 FC Dx versus log2 FC degradation]

  25. qSVA WORKFLOW
    Slide adapted from Amy Peterson

  26. qSVA WORKFLOW
    Slide adapted from Amy Peterson

  27. qSVA WORKFLOW
    PCA
    Slide adapted from Amy Peterson
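
The PCA step of qSVA takes coverage over degradation-susceptible regions for each sample and keeps the top principal component scores as quality surrogate variables (qSVs), which then enter the DE model as covariates. A NumPy sketch, assuming the coverage matrix is already log-transformed and that the number of components `k` is chosen elsewhere (these slides don't show how):

```python
import numpy as np


def quality_surrogate_variables(coverage, k):
    """coverage: samples x regions (log-scale) coverage over degradation-
    susceptible regions. Returns the top-k PC scores (samples x k)."""
    x = np.asarray(coverage, dtype=float)
    x = x - x.mean(axis=0)                       # center every region
    u, s, _ = np.linalg.svd(x, full_matrices=False)
    return u[:, :k] * s[:k]                      # principal component scores


# Random stand-in for 20 samples x 50 regions.
rng = np.random.default_rng(0)
cov = rng.normal(size=(20, 50))
qsvs = quality_surrogate_variables(cov, 3)
print(qsvs.shape)  # (20, 3)
```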

  28. DEqual HIPPO
    DEqual plots demonstrate the effectiveness of statistical correction
    Model 1. Naïve model
    $E[Y] = \beta_0 + \beta_1 \mathrm{Dx}$
    HIPPO, 333 samples
    [Figure: Model 1 (6429 genes), log2 FC Dx versus log2 FC degradation, r = 0.412]
    Slide adapted from Amy Peterson

  29. DEqual HIPPO
    DEqual plots demonstrate the effectiveness of statistical correction
    Model 1. Naïve model
    $E[Y] = \beta_0 + \beta_1 \mathrm{Dx}$
    Model 2. Added RNA-quality and demographic covariates
    $E[Y] = \beta_0 + \beta_1 \mathrm{Dx} + \beta_2 \mathrm{age} + \beta_3 \mathrm{Sex} + \beta_4 \mathrm{mitoRate} + \beta_5 \mathrm{rRNArate} + \beta_6 \mathrm{geneRate} + \beta_7 \mathrm{RIN} + \sum_{i=1}^{K} \gamma_i \mathrm{snpPC}_i$
    HIPPO, 333 samples
    [Figure: log2 FC Dx versus log2 FC degradation for Model 1 (6429 genes, r = 0.412), and log2 FC Dx adjusted for covariates versus log2 FC degradation for Model 2 (63 genes, r = 0.0712)]
    Slide adapted from Amy Peterson

  30. DEqual HIPPO
    DEqual plots demonstrate the effectiveness of statistical correction
    Model 1. Naïve model
    $E[Y] = \beta_0 + \beta_1 \mathrm{Dx}$
    Model 2. Added RNA-quality and demographic covariates
    $E[Y] = \beta_0 + \beta_1 \mathrm{Dx} + \beta_2 \mathrm{age} + \beta_3 \mathrm{Sex} + \beta_4 \mathrm{mitoRate} + \beta_5 \mathrm{rRNArate} + \beta_6 \mathrm{geneRate} + \beta_7 \mathrm{RIN} + \sum_{i=1}^{K} \gamma_i \mathrm{snpPC}_i$
    Model 3. Added qSVs
    $E[Y] = \beta_0 + \beta_1 \mathrm{Dx} + \beta_2 \mathrm{age} + \beta_3 \mathrm{Sex} + \beta_4 \mathrm{mitoRate} + \beta_5 \mathrm{rRNArate} + \beta_6 \mathrm{geneRate} + \beta_7 \mathrm{RIN} + \sum_{i=1}^{K} \gamma_i \mathrm{snpPC}_i + \sum_{i=1}^{Q} \delta_i \mathrm{qSV}_i$
    HIPPO, 333 samples
    [Figure: log2 FC Dx versus log2 FC degradation for Model 1 (6429 genes, r = 0.412), Model 2 adjusted for covariates (63 genes, r = 0.0712), and Model 3 adjusted for qSVs (48 genes, r = -0.00173)]
    Slide adapted from Amy Peterson
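
The r values in the DEqual plots are correlations, across genes, between the diagnosis log2 fold changes and the log2 fold changes from the degradation experiment; r near 0 suggests the diagnosis signal no longer tracks RNA degradation. A minimal NumPy version of that summary statistic (the function name is hypothetical):

```python
import numpy as np


def dequal_r(log2fc_dx, log2fc_degradation):
    """Pearson correlation across genes between diagnosis and degradation log2 FCs."""
    return float(np.corrcoef(log2fc_dx, log2fc_degradation)[0, 1])


# Toy fold changes for four genes (made-up numbers).
print(dequal_r([0.1, -0.2, 0.3, 0.0], [0.2, -0.4, 0.6, 0.0]))
```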

  31. DATA ANALYSIS
    Case-control: by region
    • 48 DE genes at FDR < 5% in the hippocampus versus 243 in the DLPFC (FDR < 5%), suggesting regional heterogeneity in the molecular correlates of schizophrenia diagnosis
    • DLPFC results agree with BrainSeq Phase I
    L. Collado-Torres & Amy Peterson
    [Figure: Venn diagram of DE genes by region and direction (DLPFC Ctrl > SCZD, DLPFC Ctrl < SCZD, HIPPO Ctrl > SCZD, HIPPO Ctrl < SCZD) at DLPFC FDR 10%, HIPPO FDR 20%]
    [Figure: t-statistics, DLPFC versus BrainSeq Phase I (BSP1) DLPFC, r = 0.809]
    [Figure: t-statistics, HIPPO versus DLPFC, r = 0.644]
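
The FDR thresholds above refer to false discovery rate adjusted p-values. A NumPy sketch of the Benjamini-Hochberg adjustment (the step-up procedure that R's p.adjust(method = "BH") computes); the slides don't say which implementation the analysis used.

```python
import numpy as np


def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    p = np.asarray(pvalues, dtype=float)
    n = p.size
    order = np.argsort(p)                          # ascending p-values
    scaled = p[order] * n / np.arange(1, n + 1)    # p_(i) * n / i
    # Step-up: each adjusted value is the minimum over all larger ranks.
    scaled = np.minimum.accumulate(scaled[::-1])[::-1]
    adjusted = np.empty(n)
    adjusted[order] = np.minimum(scaled, 1.0)
    return adjusted


adj = bh_adjust([0.01, 0.04, 0.03, 0.005])
print(int((adj < 0.05).sum()), "genes at FDR < 5%")
```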

  32. DATA ANALYSIS
    Case-control: by region
    • Enrichment was found only among genes with Control > SCZD expression
      • Immune processes
    L. Collado-Torres & Amy Peterson
    regulation of lymphocyte proliferation
    neutrophil chemotaxis
    regulation of B cell activation
    positive regulation of leukocyte activation
    B cell receptor signaling pathway
    myeloid leukocyte activation
    positive regulation of cell activation
    B cell activation
    lymphocyte activation
    lymphocyte migration
    response to organophosphorus
    positive regulation of T cell activation
    leukocyte chemotaxis
    positive regulation of cell adhesion
    leukocyte migration
    cell chemotaxis
    positive regulation of cell−cell adhesion
    protein folding
    protein folding in endoplasmic reticulum
    [Figure legend: enriched GO biological processes (ontology: BP) for HIPPO control > SCZD (H_c, 136) and DLPFC control > SCZD (D_c, 294); dot size: GeneRatio; color: p.adjust]
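
GO over-representation results like these are commonly computed with a one-sided hypergeometric test, with GeneRatio = (genes in the list annotated to the term) / (genes in the list). The slides don't state the tool used, so this is a generic sketch of that test using exact integer arithmetic from the standard library:

```python
from math import comb


def hypergeom_enrichment(k, n, K, N):
    """One-sided P(X >= k), X ~ Hypergeometric: n list genes drawn from a
    universe of N genes, K of which carry the GO term. Also returns GeneRatio k/n."""
    upper = min(n, K)
    p = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1)) / comb(N, n)
    return p, k / n


# Toy universe of 10 genes, 5 annotated; all 5 list genes are annotated.
p, gene_ratio = hypergeom_enrichment(k=5, n=5, K=5, N=10)
print(p, gene_ratio)
```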

  33. DATA ANALYSIS
    HIPPO eQTLs
    • 11,237,357 eQTL associations (FDR < 1%) across genes, exons, and junctions, corresponding to 17,719 genes
    • Includes 26 risk SNPs from PGC2
    Emily E. Burke
    [Figure: Venn diagram of eQTLs (gene, exon, jxn) grouped by gene ID]
    [Figure: NDRG4 eQTL examples for rs42945, residualized expression by genotype (0, 1, 2): junction chr16:58509353−58512046(+), p = 5.21e−42; transcript ENST00000565981.5, p = 5.43e−30]
    rs7188697 in NDRG4 has also been reported in Watanabe et al., J Clin Psychopharmacol., 2017
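
The example plots show residualized expression against genotype coded as 0, 1, or 2 allele copies, which corresponds to a simple linear regression of expression on dosage. A NumPy sketch returning the slope and its t-statistic; the slides don't specify the eQTL software or model, so treat this as an illustration of the idea only.

```python
import numpy as np


def eqtl_slope_t(dosage, expr):
    """Regress residualized expression on genotype dosage (0/1/2).
    Returns (slope, t statistic)."""
    g = np.asarray(dosage, dtype=float)
    y = np.asarray(expr, dtype=float)
    n = g.size
    gc = g - g.mean()
    slope = (gc @ (y - y.mean())) / (gc @ gc)
    intercept = y.mean() - slope * g.mean()
    resid = y - (intercept + slope * g)
    se = np.sqrt((resid @ resid) / (n - 2) / (gc @ gc))
    return float(slope), float(slope / se)


# Toy data: each extra allele copy raises expression by 0.5.
dosage = [0, 0, 1, 1, 2, 2]
expr = [1.01, 0.99, 1.52, 1.48, 2.01, 1.99]
slope, t = eqtl_slope_t(dosage, expr)
```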

  34. DATA ANALYSIS
    Region-dependent eQTLs
    • 81,837 region-dependent eQTLs (FDR < 1%) corresponding to 1,484 genes
    • Includes 5 PGC2 schizophrenia risk loci
    • We will soon update our eQTL browser at http://eqtl.brainseq.org/
    Emily E. Burke
    [Figure: Venn diagrams of region-dependent eQTLs (gene, exon, jxn) grouped by SNP ID and by gene ID]
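
One common way to call an eQTL region-dependent is to test a genotype-by-region interaction: a nonzero interaction means the genotype effect differs between DLPFC and HIPPO. The slides don't state the exact test, so the following is a plain OLS sketch on simulated data that also ignores the correlation between two samples from the same donor.

```python
import numpy as np


def interaction_t(dosage, is_hippo, expr):
    """OLS of expr ~ 1 + dosage + region + dosage:region.
    Returns (interaction estimate, t statistic)."""
    g = np.asarray(dosage, dtype=float)
    r = np.asarray(is_hippo, dtype=float)
    y = np.asarray(expr, dtype=float)
    X = np.column_stack([np.ones_like(g), g, r, g * r])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    n, p = X.shape
    sigma2 = (resid @ resid) / (n - p)
    se = np.sqrt(sigma2 * np.linalg.inv(X.T @ X)[3, 3])
    return float(beta[3]), float(beta[3] / se)


# Simulated SNP whose effect is 0.2 in DLPFC and 1.0 in HIPPO (made-up data).
rng = np.random.default_rng(7)
n = 200
g = rng.integers(0, 3, n).astype(float)
region = np.tile([0.0, 1.0], n // 2)
y = 0.2 * g + 0.8 * g * region + rng.normal(0, 0.5, n)
est, t = interaction_t(g, region, y)
```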

  35. WRAPPING UP
    Summary
    • Used conservative methods/options to reduce false positives
    • Quantified expression at different feature levels
    • Widespread developmental differences between HIPPO and DLPFC in postnatal life
    • Adapted the qSVA framework for 2 brain regions and laid the groundwork for N > 2 regions
    • Results suggest regional specificity of case-control effects, with enrichment among genes with decreased expression in schizophrenia
    • Regionally targeted therapies for schizophrenia may be needed, because schizophrenia risk seems region-specific
    In progress: preprint

  36. Acknowledgements
    • Leonardo Collado-Torres
    o @fellgernon
    • Emily E. Burke
    • Amy Peterson (JHU MPH class 2018)
    o amy-peterson.github.io
    • Joo Heon Shin
    • Stephen A. Semick
    • Anandita Rajpurohit
    • Courtney Williams
    • Ran Tao
    • Amy Deep-Soboslay
    • Thomas M. Hyde
    • Joel E. Kleinman
    • Daniel R. Weinberger+
    • Andrew E. Jaffe+
    o @andrewjaffe
    • BrainSeq Consortium
    • LIBD @lieberinstitute
    Funding
    We are hiring! Multiple positions open
