Reproducibility and best practices • Outreach and community building • Back in 2005 at @LCGUNAM: I like math and coding; biology provides the challenging problems
Data Science Division Leader ! ! PIs: • Jeff Leek: 2012+ • Andrew Jaffe: 2013+ Ph.D. Biostatistics Staff Scientist I → II Data Science Team I PI: Andrew Jaffe
2011-2016 August 2016+ Data Science Division Leader ! ! PIs: • Jeff Leek: 2012+ • Andrew Jaffe: 2013+ Ph.D. Biostatistics Staff Scientist Data Science Team I Microarrays, little RNA-seq N = 12 (bacteria) N = 59 N = 72 potential for 1,000 BSP2: N = 900 + many more
software [...] in RNA-seq […]. We applied these methods to further our understanding of neuropsychiatric disorders using the Lieber Institute for Brain Development human brains collection (> 1000 samples).
• PI: Andrew Jaffe Team: • Staff Scientist: Emily Burke • Research Associate: Madhavi Tippani • Grad students: Matt Nguyen, Brianna Barry, Kira Perzel Mandell • Research Assistant: Nick Eagles • Recent Alumni: Amanda Price, Stephen Semick • Close collaborators: Carrie Wright, Nina Rajpurohit • Role details: like a postdoc (major role in some projects) with some support projects Lab at Amy Peterson’s MPH capstone presentation amy-peterson.github.io
development & cell types biorxiv.org/content/early/2018/09/29/428391 • RNA-seq from stem cells biorxiv.org/content/early/2018/07/31/380758 Peer reviewed: • RNA-seq DE in Schizophrenia disorder & 2 brain regions doi.org/10.1016/j.neuron.2019.05.013 • miRNA kit prep comparison doi.org/10.1186/s12864 • DNAm and gene DE in Alzheimer’s disease doi.org/10.1007/s00401-019-01966-5 • RNA-seq smoking during pregnancy doi.org/10.1038/s41380-018-0223-1 • RNA-seq DE in Schizophrenia disorder on DLPFC doi.org/10.1038/s41593-018-0197-y • Histamine signaling in autism spectrum disorder doi.org/10.1038/tp.2017.87 Pre-prints:
processing methods 2.Apply strict expression cutoffs 3.Use replication when possible 4.Adjust for RNA quality degradation confounding • Using the qSVA method 5.Avoid potential batch effects • Drop problematic samples 6.Take into account correlation at the individual level
Unaffected Controls Patients with Schizophrenia RNA Sequencing Genotyping + + + Gene Exons Expressed Regions Transcripts Junctions Age CC CA AA SZ CONT DLPFC HIPPO + region differences For public data, check recount2
on raw reads (FastQC) 2. Failed QC? Then trim reads (Trimmomatic) 3. Align reads to the genome (HISAT2) 4. Count features (featureCounts + others) 5. Calculate coverage (bam2wig) 6. Quantify transcripts (Salmon) 7. Create count tables (R) 8. Genotype samples (samtools + vcftools) Emily E. Burke Nextflow version in preparation with Winter Genomics and Nick Eagles
or fetal ages • Using only adult samples or only prenatal samples • Test for differences between DLFPC and HIPPO • Development • Linear age splines with breakpoints at developmental stages • Test for interaction between age and brain region at these splines • Case-control • By brain region • Test for differences between non-psychiatric controls and individuals with schizophrenia • For the first two models, we account for the fact that an individual can have two correlated samples: one for each brain region limma::duplicateCorrelation()
1,612 DE genes with Bonferroni < 0.01 that replicate in BrainSpan • For prenatal samples: 32 DE genes 23 6 2 0 3 0 3 gene exon jxn DE features grouped by gene id (prenatal) 328 422 778 23 839 647 388 gene exon jxn DE features grouped by gene id (adult)
processes by region G: gene, E: exon, J: exon-exon junction D: DLPFC, H: HIPPO cell−matrix adhesion extracellular matrix organization ameboidal−type cell migration microtubule bundle formation axoneme assembly cilium movement establishment of synaptic vesicle localization synaptic vesicle transport synaptic vesicle localization signal release from synapse neurotransmitter secretion presynaptic process involved in chemical synaptic transmission positive regulation of GTPase activity regulation of GTPase activity synaptic vesicle exocytosis neurotransmitter transport regulation of calcium ion−dependent exocytosis synaptic vesicle cycle regulation of synapse organization positive regulation of synaptic transmission regulation of neuron projection development regulation of small GTPase mediated signal transduction extracellular structure organization positive regulation of nervous system development positive regulation of neurogenesis positive regulation of cell development axon development axonogenesis regulation of hormone levels cardiac chamber development cardiac septum development kidney epithelium development nephron development urogenital system development cardiac chamber morphogenesis cardiac septum morphogenesis renal system development kidney development regulation of transmembrane transport actomyosin structure organization regulation of trans−synaptic signaling modulation of chemical synaptic transmission synapse organization regulation of cell morphogenesis synaptic transmission, glutamatergic regulation of synaptic transmission, glutamatergic potassium ion transmembrane transport cellular potassium ion transport potassium ion transport G.D (609) G.H (573) E.D (1101) E.H (1087) J.D (850) J.H (823) GeneRatio 0.02 0.03 0.04 0.05 0.06 0.01 0.02 0.03 0.04 p.adjust ontology: BP
contain differentially expressed exons and splice junctions that replicated in BrainSpan (Bonferroni < 1%) 2354 2260 1762 243 5982 8501 1558 gene exon jxn DE features grouped by gene id
al., PNAS, 2017 Model 1 (6429 genes) Log2 FC Dx Log2 FC Degradation http://research.libd.org/rstatsclub/2018/12/11/quality-surrogate-variable-analysis/
model E / = 10 + 11 45 DEqual plots demonstrate effectiveness of statistical correction r = 0.412 Slide adapted from Amy Peterson Log2 FC Dx Log2 FC Degradation HIPPO 333 samples
development neutrophil chemotaxis regulation of B cell proliferation regulation of B cell activation positive regulation of B cell activation B cell receptor signaling pathway B cell activation lymphocyte activation regulation of leukocyte cell−cell adhesion regulation of cell−cell adhesion positive regulation of alpha−beta T cell activation regulation of leukocyte activation positive regulation of lymphocyte activation positive regulation of cell activation positive regulation of leukocyte activation leukocyte migration positive regulation of leukocyte cell−cell adhesion positive regulation of T cell activation leukocyte chemotaxis response to organophosphorus ATF6−mediated unfolded protein response positive regulation of cell adhesion cell chemotaxis positive regulation of cell−cell adhesion protein folding protein folding in endoplasmic reticulum HgC (137) HeC (306) DgC (297) DeS (250) GeneRatio 0.025 0.050 0.075 0.100 0.01 0.02 0.03 0.04 p.adjust ontology: BP DATA ANALYSIS 38 • BP enrichment in Control > SCZD at gene level • Immune processes g: gene, e: exon, j: exon-exon junction D: DLPFC, H: HIPPO, C: control, S: SCZD Case-control: by region