COS Talk: Enabling Reproducible Bioinformatics Research with Service and Training

Enabling Reproducible Bioinformatics Research with Service and Training
Stephen D. Turner, Ph.D., Bioinformatics Core Director, @genetics_blog
April 22, 2014
Slides at stephenturner.us/slides

What is bioinformatics? (Modified from @drewconway)

Subdisciplines
• Sequence alignment
• Genome assembly
• Metagenomics
• Genome annotation
• Evolutionary biology / comparative genomics
• Analysis of gene expression
• Analysis of gene regulation
• Genotype-phenotype association
• Mutation analysis
• Structural biology
• Biomarker identification
• Pathway analysis / "systems biology"
• Literature analysis / text mining

UVA Bioinformatics Core
• Founded October 2011
• Mission: to build and maintain a centralized resource for expert bioinformatics consulting & data analysis, and to help collaborators fund & publish their work
  1. Service
  2. Training
  3. Infrastructure building

UVA Bioinformatics Core

Recent work
• Microbiome 2:22 (2014). doi: 10.1186/2049-2618-2-22
• Rhinovirus challenge
• Isolate/sequence DNA from nasal lavage fluid samples
• Analyze phylogenetic content

Recent work
• Nature Comm. 5:3273 (2014). doi: 10.1038/ncomms427
• Deleted a gene in mouse
• Gene expression profiling reveals a B-cell gene program and constrains differentiation

Recent work
• Cell Metab. 19:667 (2014). doi: 10.1016/j.cmet.2014.03.005
• Mouse model of type 2 diabetes (T2DM)
• Gene expression profiling reveals a new mechanism of insulin secretion suppression

Bioinformatics Challenges
• Data integration: how to best integrate multiple disparate data types?
  - See “data integration” talk at stephenturner.us/slides
• New technologies: how to best support new and emerging technologies?
  - See “new technologies” talk at stephenturner.us/slides
• Transparency & reproducibility!
• Training

Reproducibility barriers
• Data: not all available, difficult to access
• Tools: inaccessible, poor version control
• Publication: results, data, methods separate
• Incentives:
  - Scarce funding
  - Reward for being “first”
  - Career incentives not obvious
• Training: scientists aren’t taught these skills!

Enabling Reproducibility
• Version control (git/GitHub)
• Dynamic documents
  - R, RStudio, knitr: Markdown + embedded R code » HTML/PDF report
  - IPython notebook
• Galaxy
  - Web-based bioinformatics toolkit
  - Tracks history, versions, parameters, data
• Wiki
  - Version-controlled place for code, scripts, data, and results used for client projects
• Training
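Galaxy's tracking of history, versions, parameters, and data can be approximated for script-based analyses too. The Python sketch below is illustrative only, not how Galaxy or the Core actually do it: the helper names `sha256sum` and `record_provenance` are hypothetical, and the script simply writes the run's parameters, a SHA-256 checksum of each input file, and the Python and platform versions to a JSON file that can be committed to git next to the code.

```python
"""Minimal provenance record for an analysis run (illustrative sketch)."""
import hashlib
import json
import platform
import sys
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def sha256sum(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks to limit memory use."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def record_provenance(params: dict, inputs: list, out_path: Path) -> dict:
    """Hypothetical helper: write parameters, input checksums, and software
    versions to a JSON file and return the same record as a dict."""
    record = {
        "timestamp_utc": datetime.now(timezone.utc).isoformat(),
        "python_version": sys.version.split()[0],
        "platform": platform.platform(),
        "parameters": params,
        "inputs": {str(p): sha256sum(Path(p)) for p in inputs},
    }
    out_path.write_text(json.dumps(record, indent=2, sort_keys=True))
    return record


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        counts = Path(tmp) / "counts.tsv"  # hypothetical input file
        counts.write_text("gene\tsample1\nGeneA\t10\n")
        rec = record_provenance({"min_count": 10}, [counts], Path(tmp) / "provenance.json")
        print(json.dumps(rec["parameters"]))
```

Checksums let a collaborator confirm they are re-running on byte-identical inputs. A fuller version would also record package versions (for example with `importlib.metadata.version`) and the current git commit.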

Training
• Software Carpentry (software-carpentry.org)
  - Volunteer organization to teach basic computing skills to scientists
  - Core curriculum:
    ‣ Basic programming
    ‣ Version control
    ‣ Automation
    ‣ Testing
  - Two-day bootcamps
  - Coming soon: train-the-trainer program
• Workshops (bioconnector.github.io/workshops)
  - All course material on GitHub
  - All R-related materials compiled as RMarkdown dynamic documents
  - Courses:
    ‣ Introduction to R for life scientists
    ‣ RNA-seq data analysis (coming soon)
    ‣ Data visualization with R and ggplot2 (coming soon)
    ‣ Data manipulation with data.table and dplyr (coming soon)
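The "Testing" item in the core curriculum can be made concrete with a small example. This is a hypothetical sketch, not taken from the Software Carpentry or workshop materials: a `gc_content` function for a DNA sequence plus assertion-based tests, the kind of check that catches silent errors when the code is changed later.

```python
"""A small function plus tests, in the style of a unit-testing exercise (illustrative sketch)."""


def gc_content(seq: str) -> float:
    """Return the fraction of G and C bases in a DNA sequence.

    Case-insensitive. Raises ValueError on an empty sequence or on characters
    other than A, C, G, T, N. N counts toward the length but not toward GC.
    """
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    invalid = set(seq) - set("ACGTN")
    if invalid:
        raise ValueError(f"invalid bases: {sorted(invalid)}")
    return (seq.count("G") + seq.count("C")) / len(seq)


def test_gc_content() -> None:
    """Plain assert-based tests."""
    assert gc_content("GGCC") == 1.0
    assert gc_content("ATAT") == 0.0
    assert gc_content("acgt") == 0.5  # lower case is accepted
    try:
        gc_content("")
    except ValueError:
        pass
    else:
        raise AssertionError("empty sequence should raise ValueError")


if __name__ == "__main__":
    test_gc_content()
    print("all tests passed")
```

Because the test function's name starts with `test_`, pytest would also discover it if the code were saved in a file named like `test_gc.py`. Treating N as "counted but not GC" is one design choice among several; a test pins that choice down so it cannot change unnoticed.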

The end? / Open questions
1. What does reproducibility mean?
2. How can we incentivize open science + reproducibility for “traditional” scientists? How can we change culture at the senior faculty level?
3. What are the technical barriers (if any) to open science and reproducibility? How can they be solved?
4. Training: how can we make it sustainable & scalable?
twitter @genetics_blog
web stephenturner.us
core bioinformatics.virginia.edu
