Slide 1

Slide 1 text

Enabling Reproducible Bioinformatics Research with Service and Training
Stephen D. Turner, Ph.D.
Bioinformatics Core Director
@genetics_blog
April 22, 2014
Slides at stephenturner.us/slides

Slide 2

Slide 2 text

What is bioinformatics?
Modified from @drewconway

Slide 3

Slide 3 text


Slide 4

Slide 4 text

Subdisciplines
• Sequence alignment
• Genome assembly
• Metagenomics
• Genome annotation
• Evolutionary biology / comparative genomics
• Analysis of gene expression
• Analysis of gene regulation
• Genotype-phenotype association
• Mutation analysis
• Structural biology
• Biomarker identification
• Pathway analysis / "systems biology"
• Literature analysis / text-mining

Slide 5

Slide 5 text

UVA Bioinformatics Core
• Founded October 2011
• Mission: to build and maintain a centralized resource for expert bioinformatics consulting & data analysis, and to help collaborators fund & publish their work
  1. Service
  2. Training
  3. Infrastructure building

Slide 6

Slide 6 text

UVA Bioinformatics Core

Slide 7

Slide 7 text

Recent work
• Microbiome 2:22 (2014). doi: 10.1186/2049-2618-2-22
• Rhinovirus challenge
• Isolate/sequence DNA from nasal lavage fluid samples
• Analyze phylogenetic content

Slide 8

Slide 8 text

Recent work
• Nat. Commun. 5:3273 (2014). doi: 10.1038/ncomms427
• Deleted a gene in mouse.
• Gene expression profiling reveals a B-cell gene program and constrains differentiation.

Slide 9

Slide 9 text

Recent work
• Cell Metab. 19:667 (2014). doi: 10.1016/j.cmet.2014.03.005
• Mouse model of T2DM (type 2 diabetes mellitus)
• Gene expression profiling reveals a new mechanism of insulin secretion suppression.

Slide 10

Slide 10 text

Bioinformatics Challenges
• Data integration: how to best integrate multiple disparate data types?
  - See the “data integration” talk at stephenturner.us/slides
• New technologies: how to best support new and emerging technologies?
  - See the “new technologies” talk at stephenturner.us/slides
• Transparency & reproducibility!
• Training

Slide 11

Slide 11 text

Reproducibility barriers
• Data: not all available; difficult to access.
• Tools: inaccessible; poor version control.
• Publication: results, data, and methods are kept separate.
• Incentives:
  - Scarce funding
  - Reward for being “first”
  - Career incentives not obvious
• Training: scientists aren’t taught these skills!

Slide 12

Slide 12 text

Enabling Reproducibility
• Version control (git/GitHub)
• Dynamic documents
  - R, RStudio, knitr: Markdown + embedded R code » HTML/PDF report
  - IPython notebook
• Galaxy
  - Web-based bioinformatics toolkit
  - Tracks history, versions, parameters, data
• Wiki
  - Version-controlled place for code, scripts, data, and results used for client projects
• Training
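knitr is an R package and no code for it appears in these slides; purely to illustrate the "Markdown + embedded code » report" idea, here is a toy Python sketch of the weave step, not knitr's implementation. Assumptions: chunks are delimited Noweb-style with `<<>>=` and `@` (the syntax Sweave and knitr's .Rnw files use), the `knit` function is hypothetical, and printed results are prefixed with `## ` in imitation of knitr's default output comment. Chunks run with `exec` in one shared namespace, so this should only ever be pointed at trusted documents.

```python
import contextlib
import io
import re

# Noweb-style chunk delimiters, as in Sweave/knitr .Rnw files:
#   <<>>=
#   code lines
#   @
CHUNK = re.compile(r"^<<.*?>>=\n(.*?)^@$", re.DOTALL | re.MULTILINE)


def knit(source: str) -> str:
    """Hypothetical toy 'weave' step: run every code chunk in one shared
    namespace and replace it with the echoed code plus its printed output."""
    namespace: dict = {}

    def run_chunk(match: re.Match) -> str:
        code = match.group(1)
        buffer = io.StringIO()
        # Capture anything the chunk prints; exec() runs arbitrary code,
        # so only use this on trusted documents.
        with contextlib.redirect_stdout(buffer):
            exec(code, namespace)
        echoed = "".join("    " + line + "\n" for line in code.splitlines())
        # Prefix results with "## ", imitating knitr's default output comment.
        result = "".join(
            "    ## " + line + "\n" for line in buffer.getvalue().splitlines()
        )
        return echoed + result

    # re.sub calls run_chunk on chunks in document order, so later
    # chunks can use variables defined by earlier ones.
    return CHUNK.sub(run_chunk, source)


if __name__ == "__main__":
    report = (
        "# Example report\n\n"
        "<<>>=\n"
        "counts = [12, 7, 31]\n"
        "print(sum(counts))\n"
        "@\n\n"
        "Because chunks share one namespace, later chunks can reuse counts.\n"
    )
    print(knit(report))
```

Running the script prints the woven report: the chunk's code appears indented, followed by its result as `## 50`. Real dynamic-document tools add much more (chunk options, caching, figures, rendering to HTML/PDF); the point here is only that narrative and code live in one file, and every number in the report is regenerated from the code on each run.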

Slide 13

Slide 13 text

Training
• Software Carpentry (software-carpentry.org)
  - Volunteer organization that teaches basic computing skills to scientists
  - Core curriculum:
    ‣ Basic programming
    ‣ Version control
    ‣ Automation
    ‣ Testing
  - Two-day bootcamps
  - Coming soon: train-the-trainer program
• Workshops (bioconnector.github.io/workshops)
  - All course material on GitHub
  - All R-related materials compiled as R Markdown dynamic documents
  - Courses:
    ‣ Introduction to R for life scientists
    ‣ RNA-seq data analysis (coming soon)
    ‣ Data visualization with R and ggplot2 (coming soon)
    ‣ Data manipulation with data.table and dplyr (coming soon)
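To make the "Testing" item in the core curriculum concrete, here is a minimal sketch in the spirit of what such lessons teach; it is not taken from the Software Carpentry materials. The `gc_content` function is hypothetical, and the test checks typical inputs, lowercase input, and an empty-sequence edge case. The `test_` prefix lets a runner such as pytest discover the test when the file name also follows pytest's `test_*.py` convention, but running the file with plain `python` works too.

```python
def gc_content(seq: str) -> float:
    """Hypothetical helper: fraction of G or C bases in a DNA sequence
    (case-insensitive)."""
    if not seq:
        raise ValueError("empty sequence")
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def test_gc_content():
    assert gc_content("GGCC") == 1.0
    assert gc_content("ATAT") == 0.0
    assert gc_content("atgc") == 0.5  # lowercase input is handled
    # Edge case: an empty sequence should raise, not divide by zero.
    try:
        gc_content("")
    except ValueError:
        pass
    else:
        raise AssertionError("empty input should raise ValueError")


if __name__ == "__main__":
    test_gc_content()
    print("all tests passed")
```

If someone later edits `gc_content` so that, for example, lowercase bases are no longer counted, the lowercase check fails immediately instead of silently skewing downstream results.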

Slide 14

Slide 14 text

The end? / Open questions
1. What does reproducibility mean?
2. How can we incentivize open science + reproducibility for “traditional” scientists? How can we change the culture at the senior faculty level?
3. What are the technical barriers (if any) to open science and reproducibility? How can they be overcome?
4. Training: how can we make it sustainable & scalable?

twitter: @genetics_blog
web: stephenturner.us
core: bioinformatics.virginia.edu
email: [email protected]