Slide 1

Slide 1 text

Making the Most of Petabases of Genomic Data
Ben Langmead, Assistant Professor, JHU Computer Science
langmead-lab.org, @BenLangmead
IBM Research, Almaden, October 25, 2018

Slide 2

Slide 2 text

No content

Slide 3

Slide 3 text

No content

Slide 4

Slide 4 text

2nd-gen sequencing: the (Lego) Movie
bit.ly/2genseq_1 bit.ly/2genseq_2 bit.ly/2genseq_3
Figure (DNA polymerase adding single bases): four copies of the template CCATAGTATATCTCGGCTCTAGGCCCTCATTTTTT, each broken into fragments at different positions:
CCATAGTA TATCTCGG CTCTAGGCCCTC ATTTTTT
CCA TAGTATAT CTCGGCTCTAGGCCCTCA TTTTTT
CCATAGTAT ATCTCGGCTCTAG GCCCTCA TTTTTT
CCATAG TATATCT CGGCTCTAGGCCCT CATTTTTT

Slide 5

Slide 5 text

Input DNA: CGTCTGGGGGGTATGCACGCGATAGCATTGCGAGACGCTGGAGCCGGAGCACCCTATGTCGCAGTATCTGTCTTTGATTCCTG
Reads: GTATGCACGCGATAG TATGTCGCAGTATCT CACCCTATGTCGCAG GAGACGCTGGAGCCG

Slide 6

Slide 6 text

Input DNA: CGTCTGGGGGGTATGCACGCGATAGCATTGCGAGACGCTGGAGCCGGAGCACCCTATGTCGCAGTATCTGTCTTTGATTCCTG
Reads: GTATGCACGCGATAG TATGTCGCAGTATCT CACCCTATGTCGCAG GAGACGCTGGAGCCG TAGCATTGCGAGACG GGTATGCACGCGATA TGGAGCCGGAGCACC CGCTGGAGCCGGAGC

Slide 7

Slide 7 text

Input DNA: CGTCTGGGGGGTATGCACGCGATAGCATTGCGAGACGCTGGAGCCGGAGCACCCTATGTCGCAGTATCTGTCTTTGATTCCTG
Reads: GTATGCACGCGATAG TATGTCGCAGTATCT CACCCTATGTCGCAG GAGACGCTGGAGCCG TAGCATTGCGAGACG GGTATGCACGCGATA TGGAGCCGGAGCACC CGCTGGAGCCGGAGC TGTCTTTGATTCCTG CGCGATAGCATTGCG GCATTGCGAGACGCT CCTATGTCGCAGTAT

Slide 8

Slide 8 text

Input DNA: CGTCTGGGGGGTATGCACGCGATAGCATTGCGAGACGCTGGAGCCGGAGCACCCTATGTCGCAGTATCTGTCTTTGATTCCTG
Reads: GTATGCACGCGATAG TATGTCGCAGTATCT CACCCTATGTCGCAG GAGACGCTGGAGCCG TAGCATTGCGAGACG GGTATGCACGCGATA TGGAGCCGGAGCACC CGCTGGAGCCGGAGC TGTCTTTGATTCCTG CGCGATAGCATTGCG GCATTGCGAGACGCT CCTATGTCGCAGTAT GACGCTGGAGCCGGA GCACCCTATGTCGCA GTATCTGTCTTTGAT CCTCATCCTATTATT TATCGCACCTACGTT CAATATTCGATCATG GATCACAGGTCTATC ACCCTATTAACCACT TGCATTTGGTATTTT CGTCTGGGGGGTATG CACGCGATAGCATTG GTATGCACGCGATAG ACCTACGTTCAATAT TATTTATCGCACCTA CCACTCACGGGAGCT GCGAGACGCTGGAGC CTATCACCCTATTAA CTGTCTTTGATTCCT ACTCACGGGAGCTCT CCTACGTTCAATATT GCACCTACGTTCAAT GTCTGGGGGGTATGC AGCCGGAGCACCCTA GACGCTGGAGCCGGA GCACCCTATGTCGCA GTATCTGTCTTTGAT CCTCATCCTATTATT TATCGCACCTACGTT CAATATTCGATCATG GATCACAGGTCTATC ACCCTATTAACCACT CACGGGAGCTCTCCA TGCATTTGGTATTTT CGTCTGGGGGGTATG CACGCGATAGCATTG CACGGGAGCTCTCCA

Slide 9

Slide 9 text

Input DNA: 100,000,000 nt
Reads: 100 nt each

Slide 10

Slide 10 text

Input DNA: ? (100,000,000 nt)
Reads: 100 nt each

Slide 11

Slide 11 text

Input DNA: ? (100,000,000 nt)
Reads + reference genome

Slide 12

Slide 12 text

Input DNA
Reads + reference genome
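The read-placement idea on the preceding slides can be sketched in a few lines. This is only an illustration, assuming exact matches and a simple k-mer lookup table; it is not how Bowtie or any production aligner works, and the names `build_kmer_index`, `map_read`, and the seed length `K = 5` are hypothetical choices made for the example. The reference and reads are the ones shown on Slide 5.

```python
from collections import defaultdict

# Input DNA and reads copied from Slide 5
REFERENCE = "CGTCTGGGGGGTATGCACGCGATAGCATTGCGAGACGCTGGAGCCGGAGCACCCTATGTCGCAGTATCTGTCTTTGATTCCTG"
READS = ["GTATGCACGCGATAG", "TATGTCGCAGTATCT", "CACCCTATGTCGCAG", "GAGACGCTGGAGCCG"]


def build_kmer_index(reference, k):
    """Map every length-k substring of the reference to its start offsets."""
    index = defaultdict(list)
    for start in range(len(reference) - k + 1):
        index[reference[start:start + k]].append(start)
    return index


def map_read(read, reference, index, k):
    """Return the 0-based offsets where the read matches the reference exactly.

    The first k bases of the read act as a seed; each seed hit is then
    verified by comparing the full read against the reference.
    """
    candidates = index.get(read[:k], [])
    return [pos for pos in candidates if reference[pos:pos + len(read)] == read]


K = 5  # hypothetical seed length
INDEX = build_kmer_index(REFERENCE, K)

for read in READS:
    print(read, map_read(read, REFERENCE, INDEX, K))
```

Real reads contain sequencing errors and genuine differences from the reference, so practical aligners tolerate mismatches and gaps and use compressed indexes rather than a Python dictionary.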

Slide 13

Slide 13 text

Sequence Read Archive: currently ~20 petabases
Langmead B, Nellore A. Cloud computing for genomic data analysis and collaboration. Nat Rev Genet. 2018 Apr;19(4):208-219.

Slide 14

Slide 14 text

Lab goals
To make high-throughput life science data as usable as possible for scientific labs, especially small ones
Efficient
• Software: Bowtie 1&2, Arioc, Dashing
• Topics: applied algorithms, text indexing, sketching, thread scaling
Scalable
• Software: Rail-RNA, recount2, Snaptron, Boiler
• Topics: parallel and high-performance computing, cloud computing, indexing
Interpretable
• Software: Qtip, FORGe
• Topics: modeling mapping quality, graph-genome variants, addressing biases

Slide 15

Slide 15 text

Themes
• Cloud computing & supercomputing are poised to add big value to archived sequencing data
• Archives can tell us how much we don't know about something
• Archives can generate hypotheses, inform experimental design, even validate results
• When one door opens, another one opens

Slide 16

Slide 16 text

Sequence Read Archive: currently ~20 petabases
Langmead B, Nellore A. Cloud computing for genomic data analysis and collaboration. Nat Rev Genet. 2018 Apr;19(4):208-219.

Slide 17

Slide 17 text

"An index is a great leveler" (GB Shaw)
"Even a summary would be an improvement" (not GB Shaw)

Slide 18

Slide 18 text

Public summaries Langmead B, Nellore A. Cloud computing for genomic data analysis and collaboration. Nat Rev Genet. 2018 Apr;19(4):208-219.

Slide 19

Slide 19 text

Indexing raw sequencing data
Sequence Bloom Trees:
• Solomon B, Kingsford C. Fast search of thousands of short-read sequencing experiments. Nat Biotechnol. 2016 Mar;34(3):300-2.
• Solomon B, Kingsford C. Improved Search of Large Transcriptomic Sequencing Databases Using Split Sequence Bloom Trees. J Comput Biol. 2018 Mar 12.
• Sun C, Harris RS, Chikhi R, Medvedev P. AllSome Sequence Bloom Trees. J Comput Biol. 2018 May;25(5):467-479.
Mantis: Ferdman, M., Johnson, R., & Patro, R. Mantis: A Fast, Small, and Exact Large-Scale Sequence-Search Index. In Research in Computational Molecular Biology (p. 271). Springer.
BIGSI: Bradley, P., den Bakker, H., Rocha, E., McVean, G., & Iqbal, Z. (2017). Real-time search of all bacterial and viral genomic data. bioRxiv, 234955.
1000 Genomes FM Index: Dolle DD, Liu Z, Cotten M, Simpson JT, Iqbal Z, Durbin R, McCarthy SA, Keane TM. Using reference-free compressed data structures to analyze sequencing reads from thousands of human genomes. Genome Res. 2017 Feb;27(2):300-309.
Images from the Mantis paper and the Split SBT paper

Slide 20

Slide 20 text

A search engine for RNA-seq
Snaptron: index & query engine w/ REST API
• snaptron.cs.jhu.edu
• doi:10.1093/bioinformatics/btx547
recount2: summaries of data and metadata, packaged as R objects
• jhubiostatistics.shinyapps.io/recount/
• doi:10.1038/nbt.3838
Rail-RNA: scalable, cloud-based spliced alignment of archived RNA-seq datasets
• rail.bio
• doi:10.1093/bioinformatics/btw575

Slide 21

Slide 21 text

RNA-seq
DNA to RNA (transcription), RNA to protein (translation)
Picture from: Roy H, Ibba M. Molecular biology: sticky end in protein synthesis. Nature. 2006 Sep 7;443(7107):41-2.

Slide 22

Slide 22 text

Splicing
Figure: a gene drawn as two exons separated by an intron

Slide 23

Slide 23 text

Splicing AGGGCTGGGCATAAAAGTCAGGGCAGAGCCATCTATTGCTTACATTTGCTTCTGACACAACTGTGTTCACTAGCAAC CTCAAACAGACACCATGGTGCATCTGACTCCTGAGGAGAAGTCTGCCGTTACTGCCCTGTGGGGCAAGGTGAACGTG GATGAAGTTGGTGGTGAGGCCCTGGGCAGGTTGGTATCAAGGTTACAAGACAGGTTTAAGGAGACCAATAGAAACTG GGCATGTGGAGACAGAGAAGACTCTTGGGTTTCTGATAGGCACTGACTCTCTCTGCCTATTGGTCTATTTTCCCACC CTTAGGCTGCTGGTGGTCTACCCTTGGACCCAGAGGTTCTTTGAGTCCTTTGGGGATCTGTCCACTCCTGATGCTGT TATGGGCAACCCTAAGGTGAAGGCTCATGGCAAGAAAGTGCTCGGTGCCTTTAGTGATGGCCTGGCTCACCTGGACA ACCTCAAGGGCACCTTTGCCACACTGAGTGAGCTGCACTGTGACAAGCTGCACGTGGATCCTGAGAACTTCAGGGTG AGTCTATGGGACGCTTGATGTTTTCTTTCCCCTTCTTTTCTATGGTTAAGTTCATGTCATAGGAAGGGGATAAGTAA CAGGGTACAGTTTAGAATGGGAAACAGACGAATGATTGCATCAGTGTGGAAGTCTCAGGATCGTTTTAGTTTCTTTT ATTTGCTGTTCATAACAATTGTTTTCTTTTGTTTAATTCTTGCTTTCTTTTTTTTTCTTCTCCGCAATTTTTACTAT TATACTTAATGCCTTAACATTGTGTATAACAAAAGGAAATATCTCTGAGATACATTAAGTAACTTAAAAAAAAACTT TACACAGTCTGCCTAGTACATTACTATTTGGAATATATGTGTGCTTATTTGCATATTCATAATCTCCCTACTTTATT TTCTTTTATTTTTAATTGATACATAATCATTATACATATTTATGGGTTAAAGTGTAATGTTTTAATATGTGTACACA TATTGACCAAATCAGGGTAATTTTGCATTTGTAATTTTAAAAAATGCTTTCTTCTTTTAATATACTTTTTTGTTTAT CTTATTTCTAATACTTTCCCTAATCTCTTTCTTTCAGGGCAATAATGATACAATGTATCATGCCTCTTTGCACCATT CTAAAGAATAACAGTGATAATTTCTGGGTTAAGGCAATAGCAATATCTCTGCATATAAATATTTCTGCATATAAATT GTAACTGATGTAAGAGGTTTCATATTGCTAATAGCAGCTACAATCCAGCTACCATTCTGCTTTTATTTTATGGTTGG GATAAGGCTGGATTATTCTGAGTCCAAGCTAGGCCCTTTTGCTAATCATGTTCATACCTCTTATCTTCCTCCCACAG CTCCTGGGCAACGTGCTGGTCTGTGTGCTGGCCCATCACTTTGGCAAAGAATTCACCCCACCAGTGCAGGCTGCCTA TCAGAAAGTGGTGGCTGGTGTGGCTAATGCCCTGGCCCACAAGTATCACTAAGCTCGCTTTCTTGCTGTCCAATTTC TATTAAAGGTTCCTTTGTTCCCTAAGTCCAACTACTAAACTGGGGGATATTATGAAGGGCCTTGAGCATCTGGATTC intron 1 intron 2 exon 1 exon 2 exon 3 ATGGTGCATCTGACTCCTGAGGAGAAGTCTGCCGTTACTGCCCTGTGGGGCAAGGTGAACGTGGATGAAGTTGGTGGTGAGGCCCTGGGCAGGCTGC TGGTGGTCTACCCTTGGACCCAGAGGTTCTTTGAGTCCTTTGGGGATCTGTCCACTCCTGATGCTGTTATGGGCAACCCTAAGGTGAAGGCTCATGG CAAGAAAGTGCTCGGTGCCTTTAGTGATGGCCTGGCTCACCTGGACAACCTCAAGGGCACCTTTGCCACACTGAGTGAGCTGCACTGTGACAAGCTG 
CACGTGGATCCTGAGAACTTCAGGCTCCTGGGCAACGTGCTGGTCTGTGTGCTGGCCCATCACTTTGGCAAAGAATTCACCCCACCAGTGCAGGCTG CCTATCAGAAAGTGGTGGCTGGTGTGGCTAATGCCCTGGCCCACAAGTATCACTAA exon 1 exon 2 exon 3
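The relationship between the two sequences on this slide (the gene with introns, then the exons joined together) can be sketched as follows. The toy sequence and exon coordinates below are hypothetical and much shorter than the slide's; uppercase marks exons and lowercase marks introns purely for readability.

```python
def splice(pre_mrna, exons):
    """Join exon intervals (0-based, end-exclusive) in order, dropping introns."""
    return "".join(pre_mrna[start:end] for start, end in exons)


def introns_between(exons):
    """Return the intervals lying between consecutive exons."""
    return [(exons[i][1], exons[i + 1][0]) for i in range(len(exons) - 1)]


# Hypothetical toy transcript: exon 1, intron 1, exon 2, intron 2, exon 3
PRE_MRNA = "ATGGTG" + "gtaagtcttag" + "CATCTG" + "gtgagtttcag" + "ACTTAA"
EXONS = [(0, 6), (17, 23), (34, 40)]

spliced = splice(PRE_MRNA, EXONS)
print(spliced)

# Introns in this toy example start with "gt" and end with "ag",
# the most common splice-site motif.
for start, end in introns_between(EXONS):
    print(PRE_MRNA[start:end])
```

An RNA-seq read drawn from the spliced sequence may span an exon-exon junction, which is why aligning it back to the genome requires a spliced aligner.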

Slide 24

Slide 24 text

Alternative splicing
Genes can have many isoforms
Exons can be independently included/excluded; boundaries can shift

Slide 25

Slide 25 text

Gene annotation
Gene annotation: curated collection of isoforms
Image: UCSC genome browser

Slide 26

Slide 26 text

Abhinav Nellore, OHSU; Jeff Leek, JHU
Image by Rgocs
http://rail.bio
Nellore A, Collado-Torres L, Jaffe AE, Alquicira-Hernández J, Wilks C, Pritt J, Morton J, Leek JT, Langmead B. Rail-RNA: scalable analysis of RNA-seq splicing and coverage. Bioinformatics. 2016 Sep 4.

Slide 27

Slide 27 text

Spliced RNA-seq aligner for analyzing many samples at once
• Aggregate across samples to borrow strength and eliminate redundant alignment work
• Let data prune false junction calls, not annotation
• Concise outputs: junctions, junction evidence, coverage vectors; no alignments, unless asked for
• Ready for commercial AWS cloud, other clusters
http://rail.bio
Nellore A, Collado-Torres L, Jaffe AE, Alquicira-Hernández J, Wilks C, Pritt J, Morton J, Leek JT, Langmead B. Rail-RNA: scalable analysis of RNA-seq splicing and coverage. Bioinformatics. 2016 Sep 4.

Slide 28

Slide 28 text

dbGaP http://docs.rail.bio/dbgap/ Nellore A, Wilks C, Hansen KD, Leek JT, Langmead B. Rail-dbGaP: analyzing dbGaP-protected data in the cloud with Amazon Elastic MapReduce. Bioinformatics. 2016 Aug 15;32(16):2551-3.

Slide 29

Slide 29 text

Toward recount2
• Analyzed ~21,500 human RNA-seq samples with Rail-RNA; about 62 Tbp
• http://github.com/nellore/runs (commands we used to run on AWS)
• ~$0.72 / sample (compare to sequencing costs)
Figure labels: jxs (junctions), samples
http://intropolis.rail.bio
Nellore A, et al. Human splicing diversity and the extent of unannotated splice junctions across human RNA-seq samples on the Sequence Read Archive. Genome Biol. 2016 Dec 30;17(1):266.
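A quick back-of-the-envelope check puts the numbers on this slide in perspective. The inputs are the slide's approximate figures (~21,500 samples, ~62 Tbp, ~$0.72 per sample); the derived totals are therefore rough estimates, not reported results.

```python
SAMPLES = 21_500            # approximate sample count from the slide
COST_PER_SAMPLE_USD = 0.72  # approximate cloud cost per sample from the slide
TOTAL_BASES = 62e12         # about 62 Tbp from the slide

total_cost_usd = SAMPLES * COST_PER_SAMPLE_USD
bases_per_sample = TOTAL_BASES / SAMPLES
cost_per_gbp_usd = total_cost_usd / (TOTAL_BASES / 1e9)

print(f"total compute cost: about ${total_cost_usd:,.0f}")
print(f"average sample size: about {bases_per_sample / 1e9:.1f} Gbp")
print(f"cost per gigabase: about ${cost_per_gbp_usd:.2f}")
```

Under these figures the whole reanalysis costs on the order of fifteen thousand dollars, which is the point of the slide's comparison with sequencing costs.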

Slide 30

Slide 30 text

Figure (panels a, b, c): junction (jx) count J plotted against the minimum number S of samples in which a jx is called. Junction categories: novel, alternative donor/acceptor, exon skip, fully annotated. Labels on the plot: 18.6%, 56,861 jx, 100%, 96.5%, 81.4%, 85.8%.
Annotation includes: UCSC, GENCODE v19 & v24, RefSeq, CCDS, MGC, lincRNAs, SIB genes, AceView, Vega
http://intropolis.rail.bio
Nellore A, et al. Human splicing diversity and the extent of unannotated splice junctions across human RNA-seq samples on the Sequence Read Archive. Genome Biol. 2016 Dec 30;17(1):266.
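The quantity on this figure's axes, the number of junctions J called in at least S samples and the share of them that are annotated, can be computed with a simple threshold sweep. The sketch below uses made-up toy data and a hypothetical function name, `junction_curve`; it is not the analysis code behind the figure.

```python
def junction_curve(junctions, thresholds):
    """For each threshold S, count junctions seen in at least S samples.

    junctions: list of (sample_count, is_annotated) pairs.
    Returns a list of (S, junction_count, percent_annotated) tuples.
    """
    rows = []
    for s in thresholds:
        kept = [annotated for count, annotated in junctions if count >= s]
        total = len(kept)
        pct_annotated = 100.0 * sum(kept) / total if total else 0.0
        rows.append((s, total, pct_annotated))
    return rows


# Toy data (hypothetical): (number of samples with the junction, annotated?)
TOY_JUNCTIONS = [
    (1, False), (1, False), (2, False),
    (5, True), (40, True), (900, True), (1200, False),
]

for s, total, pct in junction_curve(TOY_JUNCTIONS, [1, 5, 1000]):
    print(f"S >= {s}: {total} junctions, {pct:.1f}% annotated")
```

Raising S filters out junctions supported by only a handful of samples, which is where sequencing and alignment artifacts are expected to concentrate.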

Slide 31

Slide 31 text

Toward recount2
• Discovery of new splicing has leveled off
• Time ripe for a more complete annotation?
http://intropolis.rail.bio
Nellore A, et al. Human splicing diversity and the extent of unannotated splice junctions across human RNA-seq samples on the Sequence Read Archive. Genome Biol. 2016 Dec 30;17(1):266.

Slide 32

Slide 32 text

recount2
• >50K human RNA-seq samples from SRA (open)
• >10K human RNA-seq samples spanning cancer types in The Cancer Genome Atlas (dbGaP)
• >10K human RNA-seq samples from Genotype-Tissue Expression (GTEx) project (dbGaP)
• Total: ~4.4 trillion reads, 100s of terabases
Images: https://www.sevenbridges.com/welcome-to-the-cancer-genomics-cloud-2/ and doi:10.1038/ng.2653
Collado-Torres L, Nellore A, Kammers K, Ellis SE, Taub MA, Hansen KD, Jaffe AE, Langmead B, Leek JT. Reproducible RNA-seq analysis using recount2. Nature Biotechnology. 2017 Apr 11;35(4):319-321.

Slide 33

Slide 33 text

recount2
https://jhubiostatistics.shinyapps.io/recount/
Leo Collado Torres, Abhinav Nellore
Collado-Torres L, Nellore A, Kammers K, Ellis SE, Taub MA, Hansen KD, Jaffe AE, Langmead B, Leek JT. Reproducible RNA-seq analysis using recount2. Nature Biotechnology. 2017 Apr 11;35(4):319-321.

Slide 34

Slide 34 text

Snaptron: search engine for RNA-seq

Slide 35

Slide 35 text

Snaptron
Query planner delegates query components to appropriate systems (SQLite, Tabix, Lucene) and indexes (R-tree, B-tree, Lucene inverted text index)
Chris Wilks
Diagram: a query goes to the Snaptron query planner, which consults three data stores/indexes (Tabix/R-tree index, Lucene/inverted document index, SQLite/B-tree index). A region filter limits junction records to a region, and a sample filter limits them to matching samples; the output is junction records and sample metadata records. Example inverted index of sample metadata terms: "Brain" maps to samples 1,2,3,6; "Liver" maps to samples 4,6,9,11.
Wilks C, Gaddipati P, Nellore A, Langmead B. Snaptron: querying splicing patterns across tens of thousands of RNA-seq samples. Bioinformatics. 2018 Jan 1;34(1):114-116.
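The delegation idea on this slide, one component answering "which junctions fall in this region" and another answering "which samples match this metadata term", can be sketched with plain Python structures. This is a toy stand-in under stated assumptions: a linear scan replaces the Tabix/R-tree index, a dictionary replaces the Lucene inverted index, and all names and junction records are hypothetical. Only the "Brain" and "Liver" sample sets come from the slide.

```python
# Inverted index from metadata term to sample IDs (values from the slide)
INVERTED_INDEX = {"brain": {1, 2, 3, 6}, "liver": {4, 6, 9, 11}}

# Hypothetical junction records: coordinates plus per-sample read counts
JUNCTIONS = [
    {"chrom": "chr1", "start": 100, "end": 200, "samples": {1: 5, 4: 2}},
    {"chrom": "chr1", "start": 150, "end": 400, "samples": {9: 7}},
    {"chrom": "chr2", "start": 100, "end": 200, "samples": {2: 3}},
]


def samples_matching(term):
    """Sample filter: look up a metadata term in the inverted index."""
    return INVERTED_INDEX.get(term.lower(), set())


def junctions_in_region(chrom, start, end):
    """Region filter: junctions fully contained in the interval (linear scan)."""
    return [j for j in JUNCTIONS
            if j["chrom"] == chrom and j["start"] >= start and j["end"] <= end]


def query(chrom, start, end, term=None):
    """Plan: apply the region filter first, then the optional sample filter."""
    hits = junctions_in_region(chrom, start, end)
    if term is None:
        return hits
    wanted = samples_matching(term)
    results = []
    for junction in hits:
        kept = {s: c for s, c in junction["samples"].items() if s in wanted}
        if kept:  # drop junctions with no reads in the selected samples
            results.append({**junction, "samples": kept})
    return results


print(query("chr1", 50, 500, term="Brain"))
```

Keeping the two filters separate is what lets each be served by the index best suited to it: interval lookups by a spatial index, term lookups by a text index.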

Slide 36

Slide 36 text

Snaptron
Provides a command-line tool and REST API for querying junctions, gene & exon expression, coverage
Wilks C, Gaddipati P, Nellore A, Langmead B. Snaptron: querying splicing patterns across tens of thousands of RNA-seq samples. Bioinformatics. 2018 Jan 1;34(1):114-116.
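A REST query is ultimately just a URL, so a client can be as small as a function that assembles one. The sketch below only builds the URL and does not contact the server. The path layout (`/<compilation>/snaptron`), the parameter names `regions` and `rfilter`, the filter syntax `samples_count>:100`, and the compilation name `srav2` are assumptions made for illustration and should be checked against the Snaptron documentation before use.

```python
from urllib.parse import urlencode

BASE_URL = "http://snaptron.cs.jhu.edu"  # host named on the slides


def build_query_url(compilation, region, min_samples=None):
    """Assemble a junction query URL (assumed endpoint and parameter names).

    compilation: data compilation to search, e.g. "srav2" (assumed name)
    region: a gene name or a chrom:start-end interval
    min_samples: if given, keep junctions found in at least this many samples
    """
    params = {"regions": region}
    if min_samples is not None:
        params["rfilter"] = f"samples_count>:{min_samples}"
    return f"{BASE_URL}/{compilation}/snaptron?{urlencode(params)}"


print(build_query_url("srav2", "ABCD3", min_samples=100))
```

The resulting URL could then be fetched with any HTTP client; the response format is not shown here because it is not described on the slides.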

Slide 37

Slide 37 text

Snaptron: example queries
• How prevalent is each junction in gene ABCD3 in each of 50K public datasets?
• What is a junction's tissue specificity in the GTEx dataset?
• In which samples is splicing pattern A overrepresented relative to B?
http://snaptron.cs.jhu.edu
Wilks C, Gaddipati P, Nellore A, Langmead B. Snaptron: querying splicing patterns across tens of thousands of RNA-seq samples. Bioinformatics. 2018 Jan 1;34(1):114-116.
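The third example query, finding samples where splicing pattern A is overrepresented relative to B, comes down to scoring each sample from two read counts. One plausible score is a normalized difference, (A - B) / (A + B + 1), which ranges from just under 1 (only A) to just above -1 (only B). The formula, function names, and counts below are assumptions for illustration and are not necessarily the statistic Snaptron itself reports.

```python
def relative_usage(a_reads, b_reads):
    """Normalized difference in [-1, 1]; the +1 damps low-coverage samples."""
    return (a_reads - b_reads) / (a_reads + b_reads + 1)


def rank_samples(counts):
    """Sort samples from most A-biased to most B-biased.

    counts: {sample_id: (reads supporting A, reads supporting B)}
    """
    scored = [(relative_usage(a, b), sample) for sample, (a, b) in counts.items()]
    return sorted(scored, reverse=True)


# Hypothetical per-sample junction read counts
TOY_COUNTS = {"s1": (30, 0), "s2": (5, 5), "s3": (0, 19)}

for score, sample in rank_samples(TOY_COUNTS):
    print(f"{sample}: {score:+.3f}")
```

The added 1 in the denominator keeps a sample with a single supporting read from scoring as extreme as one with hundreds.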

Slide 38

Slide 38 text

Snaptron case studies
Figure: shared sample count (SSC; axis 0 to 20,000) by data compilation (GTEx, SRAv2), with validation status (failed, passed). Panels: A. ABCD3, B. KMT2E, C. ALKATI.
Wilks C, Gaddipati P, Nellore A, Langmead B. Snaptron: querying splicing patterns across tens of thousands of RNA-seq samples. Bioinformatics. 2018 Jan 1;34(1):114-116.

Slide 39

Slide 39 text

In the field: splicing factors
Dr. Ling studies how splicing factors affect certain cryptic splicing patterns
• cryptic: infrequent, not conserved, "shouldn't happen"
Figure labels: Jonathan Ling, Seth Blackshaw, TDP-43

Slide 40

Slide 40 text

In the field: splicing factors Ling JP, Pletnikova O, Troncoso JC, Wong PC. TDP-43 repression of nonconserved cryptic exons is compromised in ALS-FTD. Science. 2015 Aug 7;349(6248):650-5.

Slide 41

Slide 41 text

In the field: splicing factors
Diagram: splicing factors, splicing patterns

Slide 42

Slide 42 text

Rod photoreceptors
Rods have characteristic patterns of exon usage
Ling JP, Wilks C, Charles R, Ghosh D, Jiang L, Santiago CP, Pang B, Venkataraman A, Clark BS, Nellore A, Langmead B, Blackshaw S. ASCOT identifies key regulators of photoreceptor-specific splicing. In preparation.

Slide 43

Slide 43 text

Rod photoreceptors
Exon usage is a useful cell-type signature; often not visible at the gene level
Ling JP, Wilks C, Charles R, Ghosh D, Jiang L, Santiago CP, Pang B, Venkataraman A, Clark BS, Nellore A, Langmead B, Blackshaw S. ASCOT identifies key regulators of photoreceptor-specific splicing. In preparation.

Slide 44

Slide 44 text

Rod photoreceptors
Certain exons are used only in rods
Ling JP, Wilks C, Charles R, Ghosh D, Jiang L, Santiago CP, Pang B, Venkataraman A, Clark BS, Nellore A, Langmead B, Blackshaw S. ASCOT identifies key regulators of photoreceptor-specific splicing. In preparation.

Slide 45

Slide 45 text

Rod photoreceptors
Certain splicing factors are specific to rods; could they drive rod-specific splicing?
Ling JP, Wilks C, Charles R, Ghosh D, Jiang L, Santiago CP, Pang B, Venkataraman A, Clark BS, Nellore A, Langmead B, Blackshaw S. ASCOT identifies key regulators of photoreceptor-specific splicing. In preparation.

Slide 46

Slide 46 text

Rod photoreceptors
Up-regulating those splicing factors yields rod-like splicing
Ling JP, Wilks C, Charles R, Ghosh D, Jiang L, Santiago CP, Pang B, Venkataraman A, Clark BS, Nellore A, Langmead B, Blackshaw S. ASCOT identifies key regulators of photoreceptor-specific splicing. In preparation.

Slide 47

Slide 47 text

Future: public data
Rod photoreceptor study involved >90K public datasets
Most figures I showed used public data only
Desire: querying public data = everyday activity in bio research
• "Leveler" in a field of haves & have-nots
Image: World View column ("A personal take on events") by Franziska Denk, "Don't let useful data go to waste: Researchers must seek out others' deposited biological sequences in community databases."
Excerpt: One of the best ways for a neuroscientist like me to keep up to date with what colleagues are working on is to attend conferences. But on recent trips I have noticed a problem. Too few researchers are consulting and using publicly available data, my own included. [...] Why are so many bench biologists overlooking this wealth of cell-type-specific expression data? My hunch is there are two reasons. First, researchers underestimate how many of these data have been published over the past few years because they are being generated across so many different fields.

Slide 48

Slide 48 text

Future: cloud computing
Clouds are a natural fit for reanalyzing public data and for far-flung genomics collaborations
• Elasticity, security, reproducibility, less copying
Image: first page of the review "Cloud computing for genomic data analysis and collaboration" by Ben Langmead and Abhinav Nellore.
Abstract: Next-generation sequencing has made major strides in the past decade. Studies based on large sequencing data sets are growing in number, and public archives for raw sequencing data have been doubling in size every 18 months. Leveraging these data requires researchers to use large-scale computational resources. Cloud computing, a model whereby users rent computers and storage from large data centres, is a solution that is gaining traction in genomics research. Here, we describe how cloud computing is used in genomics for research and large-scale collaborations, and argue that its elasticity, reproducibility and privacy features make it ideally suited for the large-scale reanalysis of publicly available archived data, including privacy-protected data.
Langmead B, Nellore A. Cloud computing for genomic data analysis and collaboration. Nature Reviews Genetics. 2018 Apr;19(4):208-219.
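The abstract's statement that archives have been doubling in size every 18 months implies simple exponential growth. The sketch below projects archive size under that assumption, starting from the ~20 petabases quoted earlier in the talk; it is an extrapolation for illustration, not a forecast, and the function names are hypothetical.

```python
import math


def projected_size(current, months, doubling_months=18.0):
    """Size after `months`, assuming steady doubling every `doubling_months`."""
    return current * 2 ** (months / doubling_months)


def months_until(current, target, doubling_months=18.0):
    """Months needed to grow from `current` to `target` at the same rate."""
    return doubling_months * math.log2(target / current)


START_PETABASES = 20  # approximate archive size quoted in the talk

for years in (1.5, 3, 5):
    size = projected_size(START_PETABASES, years * 12)
    print(f"after {years} years: about {size:.0f} petabases")

print(f"months to reach 160 petabases: {months_until(START_PETABASES, 160):.0f}")
```

If growth continued at this rate, the archive would grow roughly tenfold in five years, which is the scale argument for moving computation to where the data lives.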

Slide 49

Slide 49 text

Future: data science
Spectrum: from one dataset to all of SRA
Public data quickly confronts us with technical confounders & missing/incorrect metadata
What questions can we answer robustly? At what points on the spectrum?
Is metadata fixable?
Ellis SE, Collado-Torres L, Jaffe A, Leek JT. Improving the value of public RNA-seq expression data by phenotype prediction. Nucleic Acids Res. 2018 May 18;46(9):e54.

Slide 50

Slide 50 text

Thank you: Jeff Leek, Jacob Pritt, Abhinav Nellore, Kasper Hansen, Leo Collado Torres, Chris Wilks, Andrew Jaffe, José Alquicira-Hernández, Jamie Morton, Kai Kammers, Shannon Ellis, Margaret Taub, Jonathan Ling, Seth Blackshaw
• NIH R01GM118568
• NSF CAREER IIS-1349906
• Sloan Research Fellowship
• IDIES Seed Funding program
• Amazon Web Services
• NIH R01GM105705 (Leek)
Logos: IDIES Seed funding, SciServer, SciServer Compute
langmead-lab.org, @BenLangmead