$30 off During Our Annual Pro Sale. View Details »

liigh2019

 liigh2019

BrainSEQ Phase II and recount-brain presentation at LIIGH-UNAM 2019

Leonardo Collado-Torres

August 05, 2019
Tweet

More Decks by Leonardo Collado-Torres

Other Decks in Science

Transcript

  1. 11
    Analyzing BrainSEQ Phase II and generating the
    recount-brain resource
    Leonardo Collado-Torres
    Staff Scientist II
    @fellgernon
    Data Science I with Andrew Jaffe

    View Slide

  2. Questions I’ll answer today
    • Who am I?
    • What is the BrainSeq Phase II project?
    • What is the recount-brain resource?

    View Slide

  3. What defines me
    • Bioinformatics
    • R and Bioconductor
    • Reproducibility and best practices
    • Outreach and community building
    • Back in 2005 at @LCGUNAM:
    I like math and coding; biology provides the challenging problems

    View Slide

  4. History
    2005-2009
    Undergrad in
    Genomic Sciences
    2009-2011 2011-2016 August 2016+
    Data Science
    Division Leader
    !
    !
    PIs:
    • Jeff Leek: 2012+
    • Andrew Jaffe: 2013+
    Ph.D. Biostatistics Staff Scientist I → II
    Data Science Team I
    PI: Andrew Jaffe

    View Slide

  5. 2008+
    • BioC 2008-2011, 2014, 2017
    • useR!2013
    • rOpenSci unconf 2018
    • RStudio::conf 2019
    @fellgernon
    2010+
    Interests
    !
    @LIBDrstats
    2018+
    @CDSBMexico
    2018+
    Defunct: BmoreBiostats, Biostats Cultural Mixers
    Guest
    @RLadiesBmore
    Blog: http://lcolladotor.github.io
    2011+
    FB: 75k, Tw: 66k
    weekly

    View Slide

  6. Software tools I use every day
    Mercurial/git user 2009+
    https://github.com/lcolladotor
    From: Feb 21, 2019
    Communication tool for work and communities

    View Slide

  7. My path to LIBD
    2005-2009
    Undergrad in
    Genomic Sciences
    2009-2011 2011-2016 August 2016+
    Data Science
    Division Leader
    !
    !
    PIs:
    • Jeff Leek: 2012+
    • Andrew Jaffe: 2013+
    Ph.D. Biostatistics Staff Scientist
    Data Science Team I
    Microarrays,
    little RNA-seq
    N = 12
    (bacteria)
    N = 59
    N = 72
    potential for 1,000
    BSP2: N = 900
    + many more

    View Slide

  8. Ph.D. overview
    The goal was to develop statistical methods and software [...] in
    RNA-seq […]. We applied these methods to further our
    understanding of neuropsychiatric disorders using the Lieber Institute
    for Brain Development human brains collection (> 1000 samples).

    View Slide

  9. Lieber Institute for Brain Development
    • Role: Staff Scientist II
    • PI: Andrew Jaffe
    Team:
    • Staff Scientist: Emily Burke
    • Research Associate: Madhavi Tippani
    • Grad students: Matt Nguyen, Brianna Barry,
    Kira Perzel Mandell
    • Research Assistant: Nick Eagles
    • Recent Alumni: Amanda Price, Stephen Semick
    • Close collaborators: Carrie Wright, Nina Rajpurohit
    • Role details: like a postdoc (major role in some projects) with some support projects
    Lab at Amy Peterson’s MPH capstone presentation
    amy-peterson.github.io

    View Slide

  10. Papers since 2016

    View Slide

  11. Papers since 2016
    RNA-seq
    WGBS
    Biostatistics
    RNA-seq
    > 70,000 samples

    View Slide

  12. Lieber Institute for Brain Development
    • DNAm on WGBS across development & cell types biorxiv.org/content/early/2018/09/29/428391
    • RNA-seq from stem cells biorxiv.org/content/early/2018/07/31/380758
    Peer reviewed:
    • RNA-seq DE in Schizophrenia disorder & 2 brain regions doi.org/10.1016/j.neuron.2019.05.013
    • miRNA kit prep comparison doi.org/10.1186/s12864
    • DNAm and gene DE in Alzheimer’s disease doi.org/10.1007/s00401-019-01966-5
    • RNA-seq smoking during pregnancy doi.org/10.1038/s41380-018-0223-1
    • RNA-seq DE in Schizophrenia disorder on DLPFC doi.org/10.1038/s41593-018-0197-y
    • Histamine signaling in autism spectrum disorder doi.org/10.1038/tp.2017.87
    Pre-prints:

    View Slide

  13. Today
    BrainSeq Phase II
    brain

    View Slide

  14. View Slide

  15. BrainSeq: A Human Brain Genomics Consortium
    DLPFC
    495
    samples
    BrainSeq Phase I
    polyA+
    Jaffe et al., Nature Neuroscience, 2018
    DLPFC
    453
    samples
    HIPPO
    447
    samples
    BrainSeq Phase II
    RiboZero
    THE SCIENTIFIC FRONTIER
    Neuron, 2015
    Collado-Torres et al, bioRxiv, 2018 → Neuron, 2019

    View Slide

  16. DATA
    16
    BrainSeq Phase II RNA-seq samples
    DLPFC HIPPO total
    adult (age >= 18) 374 370 744
    prenatal 29 28 57
    0 <= age < 18 50 49 99
    total 453 447 900
    • Non-psychiatric control and schizophrenia affected individuals
    • Two brain regions: dorsolateral prefrontal cortex and hippocampus
    All
    samples

    View Slide

  17. DATA
    17
    BrainSeq Phase II RNA-seq samples: by case status
    DLPFC HIPPO total
    adult 222 238 460
    prenatal 29 28 57
    0 <= age < 18 49 48 97
    total 300 314 614
    DLPFC HIPPO total
    adult 152 132 284
    prenatal 0 0 0
    0 <= age < 18 1 1 2
    total 153 133 286
    Control
    Schizophrenia
    cases

    View Slide

  18. DATA ANALYSIS
    18
    Focus on being conservative
    1.Use well established processing methods
    2.Apply strict expression cutoffs
    3.Use replication when possible
    4.Adjust for RNA quality degradation confounding
    • Using the qSVA method
    5.Avoid potential batch effects
    • Drop problematic samples
    6.Take into account correlation at the individual level

    View Slide

  19. RNA-SEQ APPROACH
    19
    BrainSeq Phase II
    Pre-natal
    Adult
    ACTG
    Birth
    Unaffected
    Controls
    Patients with
    Schizophrenia
    RNA Sequencing
    Genotyping
    + +
    +
    Gene Exons Expressed Regions
    Transcripts Junctions
    Age
    CC CA AA
    SZ CONT
    DLPFC
    HIPPO
    + region differences
    For public data, check recount2

    View Slide

  20. DATA ANALYSIS
    20
    Main processing steps
    1. Quality check (QC) on raw reads (FastQC)
    2. Failed QC? Then trim reads (Trimmomatic)
    3. Align reads to the genome (HISAT2)
    4. Count features (featureCounts + others)
    5. Calculate coverage (bam2wig)
    6. Quantify transcripts (Salmon)
    7. Create count tables (R)
    8. Genotype samples (samtools + vcftools)
    Emily E. Burke
    Nextflow version in
    preparation with
    Winter Genomics
    and Nick Eagles

    View Slide

  21. DATA ANALYSIS
    21
    Filter features with low expression


















    ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●
    0.0 0.2 0.4 0.6 0.8 1.0
    20000 30000 40000 50000
    mean expression cutoff
    Number of features > cutoff
    Feature Cutoff Unit
    Gene 0.25 RPKM
    Exon 0.30 RPKM
    Jxn 0.46 RP10M
    Tx 0.38 TPM
    Gene
    • Used mean across all 900 samples
    • All analyses with filtered data
    Mean
    RPKM
    # genes >
    cutoff
    Based on the segmented R package:
    break-point estimation
    jaffelab::expression_cutoff()

    View Slide

  22. DATA ANALYSIS
    22
    Differential expression models
    • Region-specific for adult or fetal ages
    • Using only adult samples or only prenatal samples
    • Test for differences between DLFPC and HIPPO
    • Development
    • Linear age splines with breakpoints at developmental stages
    • Test for interaction between age and brain region at these splines
    • Case-control
    • By brain region
    • Test for differences between non-psychiatric controls and individuals with
    schizophrenia
    • For the first two models, we account for the fact that an individual can have two
    correlated samples: one for each brain region
    limma::duplicateCorrelation()

    View Slide

  23. DATA ANALYSIS
    23
    Differential expression models
    • Region-specific for adult or prenatal ages
    • Alternative: !"#$ = &' + )*+ + ,+" +
    ∑./0
    1 23#45. + 6789:)8+ + 898);<227*3+=>+3+ + :?@ + :+*793
    • Development
    • Alternative: !"#$ = &' + )*+ ∗ :+*793 + B+8); ∗ :+*793 +
    C7$8ℎ ∗ :+*793 + 73B)38 ∗ :+*793 + Eℎ7;= ∗ :+*793 +
    8++3 ∗ :+*793 + )=F;8 ∗ :+*793 + ,+" +
    ∑./0
    1 23#45. + 6789:)8+ + 898);<227*3+=>+3+ + :?@ + :+*793
    • Case-control
    • Alternative: !"#$ = &' + )*+ + ,+" + 6789:)8+ +
    $:@∑./0
    1 23#45. + 898);<227*3+=>+3+ + :?@ + $+*793,#+E7B7EK,L2 + M7)*39272

    View Slide

  24. DATA ANALYSIS
    24
    Using BrainSpan for replication: region-specific model
    • P-value < 0.05 in BrainSpan, consistent direction
    Similar results for the development model
    prenatal adult
    gene exon jxn
    p<0.05
    p<0.01
    p<0.001
    p<1e−04
    p<1e−05
    p<1e−06
    p<0.05
    p<0.01
    p<0.001
    p<1e−04
    p<1e−05
    p<1e−06
    0.00
    0.25
    0.50
    0.75
    1.00
    0.00
    0.25
    0.50
    0.75
    1.00
    0.00
    0.25
    0.50
    0.75
    1.00
    p−value threshold
    Replication rate

    View Slide

  25. DATA ANALYSIS
    25
    Region-specific by age results
    • For adults: 1,612 DE genes with Bonferroni < 0.01 that replicate in BrainSpan
    • For prenatal samples: 32 DE genes
    23
    6
    2
    0
    3
    0
    3
    gene exon
    jxn
    DE features grouped by gene id (prenatal)
    328
    422
    778
    23
    839
    647
    388
    gene exon
    jxn
    DE features grouped by gene id (adult)

    View Slide

  26. DATA ANALYSIS
    26
    Region-specific by age results: adult enriched biological processes by region
    G: gene, E: exon, J: exon-exon junction
    D: DLPFC, H: HIPPO
    cell−matrix adhesion
    extracellular matrix organization
    ameboidal−type cell migration
    microtubule bundle formation
    axoneme assembly
    cilium movement
    establishment of synaptic vesicle localization
    synaptic vesicle transport
    synaptic vesicle localization
    signal release from synapse
    neurotransmitter secretion
    presynaptic process involved in chemical synaptic transmission
    positive regulation of GTPase activity
    regulation of GTPase activity
    synaptic vesicle exocytosis
    neurotransmitter transport
    regulation of calcium ion−dependent exocytosis
    synaptic vesicle cycle
    regulation of synapse organization
    positive regulation of synaptic transmission
    regulation of neuron projection development
    regulation of small GTPase mediated signal transduction
    extracellular structure organization
    positive regulation of nervous system development
    positive regulation of neurogenesis
    positive regulation of cell development
    axon development
    axonogenesis
    regulation of hormone levels
    cardiac chamber development
    cardiac septum development
    kidney epithelium development
    nephron development
    urogenital system development
    cardiac chamber morphogenesis
    cardiac septum morphogenesis
    renal system development
    kidney development
    regulation of transmembrane transport
    actomyosin structure organization
    regulation of trans−synaptic signaling
    modulation of chemical synaptic transmission
    synapse organization
    regulation of cell morphogenesis
    synaptic transmission, glutamatergic
    regulation of synaptic transmission, glutamatergic
    potassium ion transmembrane transport
    cellular potassium ion transport
    potassium ion transport
    G.D
    (609)
    G.H
    (573)
    E.D
    (1101)
    E.H
    (1087)
    J.D
    (850)
    J.H
    (823)
    GeneRatio
    0.02
    0.03
    0.04
    0.05
    0.06
    0.01
    0.02
    0.03
    0.04
    p.adjust
    ontology: BP

    View Slide

  27. DATA ANALYSIS
    27
    Development model: similar composition in prenatal across regions by RNA fraction







































































































































































































    ● ●















    ● ●


















    FQN Neurons Oligodendrocytes
    Prenatal
    Infant
    C
    hild
    Teen
    Adult
    50+
    Prenatal
    Infant
    C
    hild
    Teen
    Adult
    50+
    Prenatal
    Infant
    C
    hild
    Teen
    Adult
    50+
    0.0
    0.2
    0.4
    0.0
    0.2
    0.4
    0.6
    0.0
    0.2
    0.4
    0.6
    Cell Type Proportion
    Region
    DLPFC
    HIPPO

    View Slide

















  28. ● ●











































    Pluripotency Glial
    Fetal
    Infant
    C
    hild
    Teens
    Adult
    50+
    Fetal
    Infant
    C
    hild
    Teens
    Adult
    50+
    0.4
    0.6
    0.8
    0.0
    0.1
    0.2
    0.3
    Age Group
    Cell Type Proportion
    Region
    DLPFC
    Hippo
    DATA ANALYSIS
    28
    Development model: similar composition in prenatal across regions by DNAm
    Stephen A. Semick

    View Slide

  29. DATA ANALYSIS
    29
    Development model results
    • 5,982 (~55%) genes contain differentially expressed exons and splice junctions that
    replicated in BrainSpan (Bonferroni < 1%)
    2354
    2260
    1762
    243
    5982
    8501
    1558
    gene exon
    jxn
    DE features grouped by gene id

    View Slide

  30. DATA ANALYSIS
    30
    Development model results
    • Normalized expression over age for GABRD
    −2 0 2 4 6
    log2(CPM + 0.5)
    14 16 18 20 22
    PCW
    0.0 0.2 0.4 2 4 6 8 12 14 16 18 2020 30 40 50 50 55 60 65 70 75 80 85
    ENSG00000187730.7 GABRD p−bonf 0
    Age
    DLPFC HIPPO

    View Slide

  31. DATA ANALYSIS
    31
    Development model results
    • Normalized expression over age after removing effect of terms from the null model
    jaffelab::cleaningY() and jaffelab::agePlotter()
    −2 −1 0 1 2 3
    log2(CPM + 0.5) − covariate effects removed
    14 16 18 20 22
    PCW
    0.0 0.2 0.4 2 4 6 8 12 14 16 18 2020 30 40 50 50 55 60 65 70 75 80 85
    ENSG00000187730.7 GABRD p−bonf 0
    Age
    DLPFC HIPPO

    View Slide

  32. qSVA WORKFLOW
    32 Slide adapted from Amy Peterson
    Jaffe et al., PNAS, 2017
    Model 1 (6429 genes)
    Log2 FC Dx
    Log2 FC Degradation
    http://research.libd.org/rstatsclub/2018/12/11/quality-surrogate-variable-analysis/

    View Slide

  33. DEqual HIPPO
    33
    Model 1 (6429 genes)
    Model 1. Naïve model
    E / = 10
    + 11
    45
    DEqual plots demonstrate effectiveness of statistical correction
    r = 0.412
    Slide adapted from Amy Peterson
    Log2 FC Dx
    Log2 FC Degradation
    HIPPO
    333
    samples

    View Slide

  34. DEqual HIPPO
    34
    Model 1 (6429 genes) Model 2 (63 genes)
    Model 1. Naïve model
    E / = 10 + 1145
    Model 2. Added RNA-quality and demographic covariates
    E / = 10 + 1145 + 12FGH + 13JH5+ 14LMNOPFNH+ 15RPSTRFte + 16GHVHPFNH
    + 17PXS + ∑Z[\
    ] γMJV_`aM
    DEqual plots demonstrate effectiveness of statistical correction
    r = 0.412 r = 0.0712
    Slide adapted from Amy Peterson
    Log2 FC Dx Log2 FC Dx, adj. cov
    Log2 FC Degradation
    Log2 FC Degradation
    HIPPO
    333
    samples

    View Slide

  35. DEqual HIPPO
    35
    Model 1 (6429 genes) Model 2 (63 genes) Model 3 (48 genes)
    Model 1. Naïve model
    E / = 10 + 1145
    Model 2. Added RNA-quality and demographic covariates
    E / = 10 + 1145 + 12FGH + 13JH5+ 14LMNOPFNH+ 15RPSTRFte + 16GHVHPFNH
    + 17PXS + ∑Z[\
    ] γMJV_`aM
    Model 3. Added qSVs
    E / = 10 + 1145 + 12FGH + 13JH5+ 14LMNOPFNH+ 15RPSTRFte + 16GHVHPFNH
    + 17PXS + ∑Z[\
    ] γMJV_`aM
    + ∑Z[\
    d eMfghM
    DEqual plots demonstrate effectiveness of statistical correction
    HIPPO
    333
    samples
    r = 0.412 r = 0.0712 r = -0.00173
    Slide adapted from Amy Peterson
    Log2 FC Dx Log2 FC Dx, adj. cov Log2 FC Dx, adj
    qSVs
    Log2 FC Degradation
    Log2 FC Degradation
    Log2 FC Degradation

    View Slide

  36. DATA ANALYSIS
    36
    Case-control: by region
    • 48 DE genes at FDR <5% in hippocampus, 243 in DLPFC (FDR <5%)
    −6 −4 −2 0 2 4
    −6 −4 −2 0 2 4 6
    t−statistic HIPPO
    t−statistic DLPFC
    r = 0.276
    Amy Peterson
    DLPFC FDR<10%, HIPPO FDR<20%

    View Slide

  37. DATA ANALYSIS
    37
    Case-control: by region
    −6 −4 −2 0 2 4 6
    −6 −4 −2 0 2 4 6
    t−statistic DLPFC
    t−statistic BSP1
    r = 0.413

    View Slide

  38. axon development
    axonogenesis
    regulation of axonogenesis
    regulation of neuron projection development
    neutrophil chemotaxis
    regulation of B cell proliferation
    regulation of B cell activation
    positive regulation of B cell activation
    B cell receptor signaling pathway
    B cell activation
    lymphocyte activation
    regulation of leukocyte cell−cell adhesion
    regulation of cell−cell adhesion
    positive regulation of alpha−beta T cell activation
    regulation of leukocyte activation
    positive regulation of lymphocyte activation
    positive regulation of cell activation
    positive regulation of leukocyte activation
    leukocyte migration
    positive regulation of leukocyte cell−cell adhesion
    positive regulation of T cell activation
    leukocyte chemotaxis
    response to organophosphorus
    ATF6−mediated unfolded protein response
    positive regulation of cell adhesion
    cell chemotaxis
    positive regulation of cell−cell adhesion
    protein folding
    protein folding in endoplasmic reticulum
    HgC
    (137)
    HeC
    (306)
    DgC
    (297)
    DeS
    (250)
    GeneRatio
    0.025
    0.050
    0.075
    0.100
    0.01
    0.02
    0.03
    0.04
    p.adjust
    ontology: BP
    DATA ANALYSIS
    38
    • BP enrichment in Control > SCZD at gene level
    • Immune processes
    g: gene, e: exon, j: exon-exon junction
    D: DLPFC, H: HIPPO, C: control, S: SCZD
    Case-control: by region

    View Slide

  39. Gene Individual 1 Individual 2
    1 10 15
    2 5 22

    DLPFC
    Gene Individual 1 Individual 2
    1 10 17
    2 6 16

    HIPPO
    correlation( (10, 5, …), (10, 6, …) ) =~ 0.9
    correlation( (15, 22, …), (17, 16, …) ) =~ 0.7
    Individual 1
    Individual 2
    Control vs SCZD

    View Slide

  40. Gene: 0.0164
    Exon: 0.0499
    Jxn: 1.72 * 10-5
    Tx: 0.00992
    Control SCZD
    0.54 0.55 0.56 0.57
    cleaned expr (keeping Dx) − jxnRp10m
    p−value: 1.72e−05
    SCZD diagnosis
    Correlation
    Decreased coherence in SCZD

    View Slide

  41. DATA ANALYSIS
    41
    HIPPO eQTLs
    • 11,237,357 eQTL associations (FDR <1%) across genes, exons and junctions
    corresponding to 17,719 genes
    Emily E. Burke
    2061
    3183
    2163
    274
    5945
    2915
    1178
    gene exon
    jxn
    eQTLs grouped by gene id
    Includes 60 risk SNPs from PGC2
    8
    +,332
    E
    0 1 2
    −1 0 1 2 3 4
    chr2:5822204358229ï-[Q
    rs74563533:58250433:G:A
    r2:58 250 433
    Residualized Expression







    ● ●










































































































































    ●●







































    ● ●
























































    ● ●



    ● ●





    ● ●
















    ● ●


    ●●




































    ●●



























    ● ●











    ● ●




























    ● ●


    0 1 2
    −1 0 1 2 3 4
    chr2:5822204358229ï-[Q
    rs75575209:58138192:A:T
    r2:58 138 192
    Residualized Expression



    ● ●
































    ●●
    ● ●






    ● ●









    ●●





























    ●●























































    ● ●

    ●●




































    ● ●
























































    ●●


    ● ●


    ● ●























    ● ●







    ● ●

















    ●●






    ● ●
    ● ●



























    ● ●























    ● ● ●








    ●●




    ● ● ●


    p=5.75e−34
    p=3.82e−81
    F

    View Slide

  42. DATA ANALYSIS
    42
    Region dependent eQTLs
    • 205,618 region-dependent eQTLs (FDR <1%) corresponding to 1,484 genes
    • Includes 3 PCG2 schizophrenia risk loci
    Emily E. Burke
    19394
    10734
    6493
    2622
    3937
    4398
    34259
    gene exon
    jxn
    eQTLs grouped by SNP id
    488
    246
    319
    18
    99
    63
    251
    gene exon
    jxn
    eQTLs grouped by gene id

    View Slide

  43. DATA ANALYSIS Region dependent eQTLs
    0.D 1.D 2.D 0.H 1.H 2.H
    −1 0 1 2 3
    Overlaps gene NRGN
    chr11:124742693−124745500(+) (Jxn)
    rs12293670:124612932:A:G
    Residualized Expression
    p=5.27e−26

    View Slide

  44. A B
    0.D 1.D 2.D 0.H 1.H 2.H
    4.0 4.4 4.8 5.2
    LRP1(16(
    (Exon)
    rs324015
    Residualized Expression
    p=3.91e−05
    0.D 1.D 2.D 0.H 1.H 2.H
    −1 0 1 2 3
    2YHUODSVJHQH15*1
    FKUï-[Q
    rs12293670:124612932:A:G
    Residualized Expression
    p=5.27e−26
    C D
    0.D 1.D 2.D 0.H 1.H 2.H
    −1 0 1 2 3 4
    2YHUODSVJHQH1*()
    FKUïï-[Q
    rs4144797:233562197:T:C
    Residualized Expression
    p=3.5e−14
    0.D 1.D 2.D 0.H 1.H 2.H
    6.8 7.2 7.6 8.0
    LRP1
    FKUï-[Q
    rs324015
    Residualized Expression
    p=2.02e−06
    D: DLPFC, H: HIPPO
    DATA ANALYSIS
    Region dependent eQTLs
    These 3 SNPs are also
    meQTLs (Jaffe et al, 2016)

    View Slide

  45. Overall eQTLs at FDR < 1% - rAggr eQTL analysis with PCG2 risk SNPs
    Hippocampus DLPFC
    # unique SNPs (unique index SNPs) 5510 (103) 6780 (116)
    # unique features 1731 2525
    # Unique genes 123 171
    # Unique transcripts 244 332
    # Unique exons 857 1363
    # Unique junctions 507 659
    Emily E. Burke
    21 8
    95
    DLPFC HIPPO
    In BSP2: 163/179 (91.1%)
    FDR <1%: 124/163 (76.6%)

    View Slide

  46. • We will soon update our eQTL browser at http://eqtl.brainseq.org/ Bill Ulrich

    View Slide

  47. TWAS
    Gusev et al, Nature Genetics, 2016

    View Slide

  48. TWAS
    SNP ID chromosome position
    rs101 12 1
    rs102 13 10000

    SNP ID chromosome position
    rs101:A:T 12 1
    rs102 13 10005

    SNP ID chromosome position
    rs101 12 1
    rs102 13 10000

    LD reference LIBD BSP2 SNPs
    GWAS summary SNPs
    https://github.com/LieberInstitute/brainseq_phase2/blob/master/twas/README.md
    1,190,321 82,393,211
    6,500,475
    CLOZUK+PGC2

    View Slide

  49. TWAS
    SNP ID chromosome position
    rs101 12 1
    rs102 13 10000

    SNP ID chromosome position
    rs101:A:T 12 1
    rs102 13 10005

    SNP ID chromosome position
    rs101 12 1
    rs102 13 10000

    LD reference LIBD BSP2 SNPs
    GWAS summary SNPs
    https://github.com/LieberInstitute/brainseq_phase2/blob/master/twas/README.md
    1,190,321 82,393,211
    6,500,475
    CLOZUK+PGC2

    View Slide

  50. TWAS
    SNP ID chromosome position
    rs101:A:T 12 1
    rs102 13 10005

    SNP ID chromosome position
    rs101:A:T 12 1
    rs102 13 10005

    SNP ID chromosome position
    rs101:A:T 12 1
    rs102 13 10005

    LD reference LIBD BSP2 SNPs
    GWAS summary SNPs
    https://github.com/LieberInstitute/brainseq_phase2/blob/master/twas/README.md
    1,022,527
    1,022,546
    4,316,013
    CLOZUK+PGC2

    View Slide

  51. TWAS
    DLPFC HIPPO
    gene 7,683 (32.8%) 5,866 (25.1%)
    exon 61,678 (16.2%) 47,583 (12.5%)
    jxn 33,754 (12.5%) 27,088 (10.1%)
    tx 14,118 (15.9%) 11,567 (13%)
    Features with TWAS weights
    Removed sex, snp PCs 1-5, expr PCs
    Kept diagnosis
    jaffelab::cleaningY()

    View Slide

  52. gene exon jxn tx
    Other Risk Locus
    −10 −5 0 5 −10 −5 0 5 −10 −5 0 5 −10 −5 0 5
    −5
    0
    5
    −5
    0
    5
    DLPFC
    HIPPO
    in both
    FALSE
    TRUE
    FDR <5%
    None
    DLPFC
    HIPPO
    Both
    TWAS Z by brain region
    TWAS

    View Slide

  53. View Slide

  54. View Slide

  55. View Slide

  56. View Slide

  57. View Slide

  58. View Slide

  59. View Slide

  60. View Slide

  61. View Slide

  62. View Slide

  63. View Slide

  64. View Slide

  65. View Slide

  66. View Slide

  67. View Slide

  68. View Slide

  69. View Slide

  70. View Slide

  71. View Slide

  72. View Slide

  73. View Slide

  74. View Slide

  75. View Slide

  76. View Slide

  77. View Slide

  78. View Slide

  79. View Slide

  80. View Slide

  81. View Slide

  82. View Slide

  83. View Slide

  84. View Slide

  85. View Slide