Slide 1

Slide 1 text

Reproducible RNA-seq analysis with recount2
Leonardo Collado-Torres @fellgernon #PSB19
Data Science I with Andrew Jaffe
Slides: speakerdeck.com/lcolladotor/psb-recount2

Slide 2

Slide 2 text

https://jhubiostatistics.shinyapps.io/recount/ #PSB19

Slide 3

Slide 3 text

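> library('recount')
> # Download the gene-level RangedSummarizedExperiment for study ERP001942
> download_study('ERP001942', type = 'rse-gene')
> # Load the downloaded object (named rse_gene) from the study directory
> load(file.path('ERP001942', 'rse_gene.Rdata'))
> # Scale the coverage-based counts so they can be used like read counts
> rse <- scale_counts(rse_gene)

github.com/leekgroup/recount-analyses/ #PSB19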

Slide 4

Slide 4 text

doi.org/10.12688/f1000research.12223.1 Powered by @bioconductor #rstats #PSB19

Slide 5

Slide 5 text

[Figure: a gene with isoform 1, isoform 2 and potential isoform 3; exons 1 to 4; exon-exon junctions (jx) 1 to 6; reads and coverage tracks; expressed region 1 marks potential exon 5.]
doi.org/10.12688/f1000research.12223.1 #PSB19

Slide 6

Slide 6 text

slide adapted from Jeff Leek #PSB19

Slide 7

Slide 7 text

Related projects
• Bioconductor recountWorkflow: use and documentation doi.org/10.12688/f1000research.12223.1
• Snaptron by Christopher Wilks & Langmead: exon-exon junctions doi.org/10.1093/bioinformatics/btx547
• Shannon Ellis & Leek: phenotype prediction doi.org/10.1093/nar/gky102
• Jack Fu & Taub: transcript estimations biorxiv.org/content/early/2018/05/25/247346
• Madugundu & Pandey (JHU): proteomics
• Luidy-Imada & Marchionni (JHU): cancer
• Kuri-Magaña & Martínez-Barnetche (INSP Mexico): immune expression doi.org/10.3389/fimmu.2018.02679
• D. Zhang & S. Guelfi with Ryten (UCL): improving annotation biorxiv.org/content/early/2018/12/19/499103
#PSB19

Slide 8

Slide 8 text

expression data for ~70,000 human samples
• GTEx N=9,662
• TCGA N=11,284
• SRA N=49,848
expression estimates per sample: gene, exon, junctions, ERs
phenotypes per sample: ?
Answer meaningful questions about human biology and expression
slide adapted from Shannon Ellis #PSB19 doi.org/10.1093/nar/gky102

Slide 9

Slide 9 text

Category  Frequency
F         95
female    2036
Female    51
M         77
male      1240
Male      141
Total     3640

Even when information is provided, it’s not always clear…

sra_meta$Sex
“1 Male, 2 Female”, “2 Male, 1 Female”, “3 Female”, “DK”, “male and female”, “Male (note: ….)”, “missing”, “mixed”, “mixture”, “N/A”, “Not available”, “not applicable”, “not collected”, “not determined”, “pooled male and female”, “U”, “unknown”, “Unknown”

slide adapted from Shannon Ellis #PSB19 doi.org/10.1093/nar/gky102
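
The inconsistent spellings listed on this slide can be collapsed with a small lookup. The following is a minimal base R sketch, not the approach taken in the cited paper (which predicts phenotypes from expression data). The vector `sex_raw` and the function `normalize_sex()` are hypothetical names introduced for illustration; `sex_raw` stands in for `sra_meta$Sex` and holds a few of the values shown on the slide.

```r
# Hypothetical stand-in for sra_meta$Sex; values copied from the slide.
sex_raw <- c("F", "female", "Female", "M", "male", "Male",
             "mixed", "N/A", "pooled male and female", "unknown")

# Lower-case and trim each label, then map known spellings to two categories.
# Ambiguous or missing labels (pooled, mixed, N/A, ...) become NA rather
# than a guess.
normalize_sex <- function(x) {
  key <- tolower(trimws(x))
  out <- rep(NA_character_, length(key))
  out[key %in% c("f", "female")] <- "female"
  out[key %in% c("m", "male")] <- "male"
  out
}

sex_clean <- normalize_sex(sex_raw)

# Tabulate the cleaned labels, keeping NA as its own category.
table(sex_clean, useNA = "ifany")
```

Labels that cannot be resolved this way stay NA, which is the gap that predicting phenotypes from the expression data is meant to fill.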

Slide 10

Slide 10 text

Goal: to accurately predict critical phenotype information for all samples in recount
(gene, exon, exon-exon junction and expressed region RNA-Seq data)

Data sources
• SRA (Sequence Read Archive) N=49,848
• GTEx (Genotype-Tissue Expression Project) N=9,662
• TCGA (The Cancer Genome Atlas) N=11,284

Workflow
• GTEx: divide samples into a training set and a test set
• training set: build and optimize phenotype predictor
• test set: test accuracy of predictor
• predict phenotypes across samples in TCGA
• predict phenotypes across SRA samples

slide adapted from Shannon Ellis #PSB19 doi.org/10.1093/nar/gky102
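
The divide, build and test steps on this slide can be sketched in base R as follows. This is only an illustration under stated assumptions: the data are simulated (one made-up informative gene, 200 samples), and a logistic regression stands in for the predictor, which is not necessarily the model used in the cited paper. All object names are hypothetical.

```r
set.seed(1)  # make the simulation reproducible

# Simulated stand-in for expression data: one informative gene, 200 samples.
# These values are made up; a real predictor would use recount2 expression.
n <- 200
sex <- factor(sample(c("female", "male"), n, replace = TRUE))
expr <- rnorm(n, mean = ifelse(sex == "male", 2, 0), sd = 1)
dat <- data.frame(sex = sex, expr = expr)

# Divide samples into a training set and a test set.
train_idx <- sample(seq_len(n), size = n / 2)
train <- dat[train_idx, ]
test <- dat[-train_idx, ]

# Build the predictor on the training set only.
# With a factor response, glm() models the probability of the second
# level ("male").
fit <- glm(sex ~ expr, data = train, family = binomial())

# Predict phenotypes for the held-out samples and measure accuracy.
prob_male <- predict(fit, newdata = test, type = "response")
pred <- ifelse(prob_male > 0.5, "male", "female")
accuracy <- mean(pred == test$sex)
accuracy
```

Keeping the test samples out of model fitting is what makes the accuracy estimate meaningful before the predictor is applied to samples with missing phenotype labels.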

Slide 11

Slide 11 text

Sex: Female, Male
Age/Development: Fetus, Child, Adolescent, Adult
Race/Ethnicity: Asian, Black, Hispanic, White
Tissue Site 1: Cerebral cortex, Hippocampus, Brainstem, Cerebellum
Tissue Site 2: Frontal lobe, Temporal lobe, Midbrain, Basal ganglia
Tissue Site 3: Dorsolateral prefrontal cortex, Superior temporal gyrus, Substantia nigra, Caudate
Hemisphere: Left, Right
Brodmann Area: 1-52
Disease Status: Disease, Neurological control
Disease: Brain tumor, Alzheimer’s disease, Parkinson’s disease, Bipolar disorder
Tumor Type: Glioblastoma, Astrocytoma, Oligodendroglioma, Ependymoma
Clinical Stage 1: Grade I, Grade II, Grade III, Grade IV
Clinical Stage 2: Primary, Secondary, Recurrent
Viability: Postmortem, Biopsy
Preparation: Frozen, Thawed

Ashkaun Razmara, in prep. github.com/LieberInstitute/recount-brain #PSB19

Slide 12

Slide 12 text

The recount2 team
Hopkins: Kai Kammers, Shannon E. Ellis, Margaret Taub, Kasper Hansen, Jeff T. Leek, Ben Langmead
OHSU: Abhinav Nellore
LIBD: Leonardo Collado-Torres, Andrew E. Jaffe
recount-brain: Ashkaun Razmara, Dustin J. Sokolowski, Michael D. Wilson, Sean Davis
Funding: NIH R01 GM105705, NIH 1R21MH109956, CONACyT 351535, AWS in Education, Seven Bridges
IDIES SciServer (hosting recount2)
#PSB19

Slide 13

Slide 13 text

Research Symbiont awards: http://researchsymbionts.org/ Apply!!! #PSB19

Slide 14

Slide 14 text

expression data for ~70,000 human samples

(Multiple) Postdoc positions available to
- develop methods to process and analyze data from recount2
- use recount2 to address specific biological questions

This project involves the Hansen, Leek, Langmead and Battle labs at JHU
Contact: Kasper D. Hansen (khansen@jhsph.edu | www.hansenlab.org) #PSB19
Or ask me @fellgernon and I’ll put you in touch with the #recount2 PIs