Slide 1

Slide 1 text

recount workflow: Accessing over 70,000 human RNA-seq samples with Bioconductor
Leonardo Collado-Torres
@fellgernon
sites.google.com/view/lalresearchgroup/our-weekly-webinar
speakerdeck.com/lcolladotor/recount-webinar
April 17, 2018

Slide 2

Slide 2 text

Reference genome; Reads
f1000research.com/articles/6-1558/v1

Slide 3

Slide 3 text

f1000research.com/articles/6-1558/v1

Slide 4

Slide 4 text

GTEx; TCGA
Slide adapted from Shannon Ellis

Slide 5

Slide 5 text

SRA

Slide 6

Slide 6 text

http://rail.bio/
Slide adapted from Ben Langmead

Slide 7

Slide 7 text

http://blogs.citrix.com/2012/10/17/announcing-general-availability-of-sharefile-with-storagezones/

Slide 8

Slide 8 text

https://jhubiostatistics.shinyapps.io/recount/

Slide 9

Slide 9 text

Figure labels: Reads; Coverage; Gene (exon 1, exon 2, exon 3, exon 4); Isoform 1; Isoform 2; Potential isoform 3; exon-exon junctions jx 1 to jx 6; Expressed region 1: potential exon 5
f1000research.com/articles/6-1558/v1

Slide 10

Slide 10 text

f1000research.com/articles/6-1558/v1

Slide 11

Slide 11 text

exon 1; exon 2; exon 3
f1000research.com/articles/6-1558/v1

Slide 12

Slide 12 text

disjoint exon 1; disjoint exon 2; disjoint exon 3
f1000research.com/articles/6-1558/v1
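The disjoint exons labelled on this slide can be illustrated with a small sketch: overlapping exon intervals are cut at every start and end position so that no base is counted twice. This is a base R illustration only, not the code used to build recount2. The function name `disjoin_intervals` and the toy coordinates are hypothetical; for real annotations, the `disjoin()` function in the Bioconductor GenomicRanges package performs this operation on `GRanges` objects.

```r
# Split possibly overlapping intervals into non-overlapping (disjoint) pieces.
# start and end are 1-based, inclusive coordinates on the same chromosome and strand.
disjoin_intervals <- function(start, end) {
  stopifnot(length(start) == length(end), all(start <= end))
  # Every interval start, and every position just past an interval end, is a cut point
  breaks <- sort(unique(c(start, end + 1)))
  seg_start <- head(breaks, -1)
  seg_end <- tail(breaks, -1) - 1
  # Keep only the segments that lie inside at least one original interval
  covered <- vapply(
    seq_along(seg_start),
    function(i) any(start <= seg_start[i] & end >= seg_end[i]),
    logical(1)
  )
  data.frame(start = seg_start[covered], end = seg_end[covered])
}

# Toy example (hypothetical coordinates): three overlapping exons
exons <- data.frame(start = c(1, 50, 120), end = c(100, 150, 200))
disjoint <- disjoin_intervals(exons$start, exons$end)
print(disjoint)
```

With these toy exons the result has five disjoint pieces: 1-49, 50-100, 101-119, 120-150 and 151-200. Counting coverage over disjoint pieces avoids assigning the same bases to more than one exon.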

Slide 13

Slide 13 text

f1000research.com/articles/6-1558/v1

Slide 14

Slide 14 text

f1000research.com/articles/6-1558/v1

Slide 15

Slide 15 text

> library('recount')
> download_study('ERP001942', type = 'rse-gene')
> load(file.path('ERP001942', 'rse_gene.Rdata'))
> rse <- scale_counts(rse_gene)
github.com/leekgroup/recount-analyses/
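The `scale_counts()` call on this slide turns recount2's base-pair coverage counts into counts on a read-like scale. A minimal base R sketch of the idea follows, assuming scaling by each sample's total coverage (area under coverage, AUC) to a target library size of 40 million reads. The helper `scale_by_auc`, the toy matrix and the AUC values are hypothetical, and this is an illustration rather than the package's implementation.

```r
# Scale a genes x samples matrix of coverage counts to a common library size.
# auc holds one total-coverage value per sample (per column).
scale_by_auc <- function(coverage_counts, auc, target_size = 4e7) {
  stopifnot(ncol(coverage_counts) == length(auc), all(auc > 0))
  # Multiply each column by its own factor: target_size / auc
  scaled <- sweep(coverage_counts, 2, target_size / auc, `*`)
  round(scaled)
}

# Toy data (hypothetical): sample2 was sequenced four times deeper than sample1
coverage_counts <- matrix(
  c(1000, 3000, 4000, 12000),
  nrow = 2,
  dimnames = list(c('gene1', 'gene2'), c('sample1', 'sample2'))
)
auc <- c(sample1 = 4e8, sample2 = 1.6e9)

scaled <- scale_by_auc(coverage_counts, auc)
print(scaled)
```

After scaling, both toy samples have the same values for each gene, which is the point of the step: differences in sequencing depth no longer dominate the counts.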

Slide 16

Slide 16 text

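> library('recount')
> download_study('SRP029880', type = 'rse-gene')
> download_study('SRP059039', type = 'rse-gene')
> # Each load() creates an object named rse_gene, so keep the first before loading the second
> load(file.path('SRP029880', 'rse_gene.Rdata'))
> rse_SRP029880 <- rse_gene
> load(file.path('SRP059039', 'rse_gene.Rdata'))
> dat <- list(rse_SRP029880, rse_gene)
> mdat <- do.call(cbind, dat)
github.com/leekgroup/recount-analyses/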

Slide 17

Slide 17 text

Collado-Torres et al. Nat. Biotech 2017

Slide 18

Slide 18 text

Can combine with genotype data to identify eQTLs
Slide adapted from Kai Kammers

Slide 19

Slide 19 text

biorxiv.org/content/early/2018/01/12/247346

Slide 20

Slide 20 text

Can we use expression data to predict tissue?
http://www.rna-seqblog.com/
Slide adapted from Shannon Ellis

Slide 21

Slide 21 text

Tissue prediction is accurate across data sets
Number of Regions: 589 | 589 | 589 | 589
Number of Samples (N): 4,769 | 4,769 | 7,193 | 8,951
Prediction accuracy: 97.3% | 96.5% | 71.9% | 50.6%
doi.org/10.1093/nar/gky102

Slide 22

Slide 22 text

Prediction is more accurate in healthy tissue
Number of Regions: 589 | 589 | 589 | 589 | 589
Number of Samples (N): 4,769 | 4,769 | 613 | 6,579 | 8,951
Prediction accuracy: 97.3% | 96.5% | 91.0% | 70.2% | 50.6%
doi.org/10.1093/nar/gky102

Slide 23

Slide 23 text

> library('recount')
> download_study('ERP001942', type = 'rse-gene')
> load(file.path('ERP001942', 'rse_gene.Rdata'))
> rse <- scale_counts(rse_gene)
> rse_with_pred <- add_predictions(rse_gene)
github.com/leekgroup/recount-analyses/

Slide 24

Slide 24 text

Ashkaun Razmara, in prep.

Slide 25

Slide 25 text

Expression data for ~70,000 human samples
GTEx N=9,962; TCGA N=11,284; SRA N=49,848
Samples x expression estimates: gene, exon, junctions, ERs
Samples x phenotypes: ? (for example sex and tissue: M, Blood; F, Heart; F, Liver)
Answer meaningful questions about human biology and expression
Slide adapted from Shannon Ellis

Slide 26

Slide 26 text

Code example:
research.libd.org/recount-brain/example_PMI/example_PMI.html
research.libd.org/recount-brain/example_PMI/example_PMI.Rmd
Replicates part of the GTEx PMI paper by Ferreira et al., doi.org/10.1038/s41467-017-02772-x
Ashkaun Razmara, in prep.

Slide 27

Slide 27 text

slide adapted from Jeff Leek

Slide 28

Slide 28 text

The recount2 team
Hopkins: Kai Kammers, Shannon Ellis, Margaret Taub, Kasper Hansen, Jeff Leek, Ben Langmead
OHSU: Abhinav Nellore
LIBD: Leonardo Collado-Torres, Andrew Jaffe
recount-brain: Ashkaun Razmara
Funding and hosting: NIH R01 GM105705, NIH 1R21MH109956, CONACyT 351535, AWS in Education, Seven Bridges, IDIES SciServer

Slide 29

Slide 29 text

Expression data for ~70,000 human samples
(Multiple) Postdoc positions available to
- develop methods to process and analyze data from recount2
- use recount2 to address specific biological questions
This project involves the Hansen, Leek, Langmead and Battle labs at JHU.
Contact: Kasper D. Hansen ([email protected] | www.hansenlab.org)
Code for making a 5 min video on recount2: github.com/lcolladotor/biopeerprize2018