COS Talk: Enabling Reproducible Bioinformatics Research with Service and Training

Slides for talk at Center for Open Science

Stephen Turner

July 22, 2014

Transcript

  1. Enabling Reproducible Bioinformatics Research with Service and Training. Stephen D. Turner, Ph.D., Bioinformatics Core Director, @genetics_blog. April 22, 2014. Slides at stephenturner.us/slides
  2. What is bioinformatics? Modified from @drewconway

  3. 3

  4. Subdisciplines • Sequence alignment • Genome assembly • Metagenomics • Genome annotation • Evolutionary biology / comparative genomics • Analysis of gene expression • Analysis of gene regulation • Genotype-phenotype association • Mutation analysis • Structural biology • Biomarker identification • Pathway analysis / "systems biology" • Literature analysis / text-mining
  5. UVA Bioinformatics Core • Founded October 2011 • Mission: build and maintain a centralized resource for expert bioinformatics consulting & data analysis and to help collaborators fund & publish their work - 1. Service - 2. Training - 3. Infrastructure building
  6. UVA Bioinformatics Core

  7. Recent work • Microbiome 2:22 (2014) doi: 10.1186/2049-2618-2-22 • Rhinovirus challenge • Isolate/sequence DNA from nasal lavage fluid samples • Analyze phylogenetic content
  8. Recent work • Nature comm. 5:3273 (2014). doi: 10.1038/ncomms427 • Deleted a gene in mouse. • Gene expression profiling reveals B-cell gene program and constrains differentiation.
  9. Recent work • Cell metab. 19:667 (2014). doi: 10.1016/j.cmet.2014.03.005 • Mouse model of T2DM • Gene expression profiling reveals new mechanism of insulin secretion suppression.
  10. Bioinformatics Challenges • Data integration: how to best integrate multiple disparate data types? - See “data integration” talk at stephenturner.us/slides • New technologies: how to best support new and emerging technologies? - See “new technologies” talk at stephenturner.us/slides • Transparency & reproducibility! • Training
  11. Reproducibility barriers • Data: not all available, difficult to access. • Tools: inaccessible, poor version control. • Publication: results, data, methods separate. • Incentives: - Scarce funding - Reward for being “first” - Career incentives not obvious • Training: scientists aren’t taught these skills!
  12. Enabling Reproducibility • Version control (git/GitHub) • Dynamic documents - R, RStudio, knitr: Markdown + embedded R code » HTML/PDF report - IPython notebook • Galaxy - Web-based bioinformatics toolkit - Tracks history, versions, parameters, data • Wiki - Version-controlled place to keep code, scripts, data, and results used for client projects • Training
  13. Training • Software Carpentry (software-carpentry.org) - Volunteer organization to teach basic computing skills to scientists - Core curriculum: ‣ Basic programming ‣ Version control ‣ Automation ‣ Testing - Two-day bootcamps - Coming soon: train-the-trainer program • Workshops (bioconnector.github.io/workshops) - All course material on GitHub - All R-related materials compiled as RMarkdown dynamic document - Courses: ‣ Introduction to R for life scientists ‣ RNA-seq data analysis (coming soon) ‣ Data visualization with R and ggplot2 (coming soon) ‣ Data manipulation with data.table and dplyr (coming soon)
  14. The end? / Open questions 1. What does reproducibility mean? 2. How to incentivize open science + reproducibility for “traditional” scientists? How to change culture at the senior faculty level? 3. What are the technical barriers (if any) to open science and reproducibility? How to solve them? 4. Training: how to make sustainable & scalable? Contact: twitter @genetics_blog, web stephenturner.us, core bioinformatics.virginia.edu, email turner@virginia.edu
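
Slide 12 mentions tools that track history, versions, parameters, and data for an analysis. As a rough illustration of what such a record could contain, here is a minimal Python sketch. It is not how Galaxy or the UVA core implements tracking; the function names (`sha256_of`, `provenance_record`, `write_provenance`) and the JSON layout are assumptions made up for this example.

```python
import hashlib
import json
import platform
from datetime import datetime, timezone
from pathlib import Path


def sha256_of(path):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def provenance_record(inputs, parameters, tool_versions):
    """Build a dictionary describing one analysis run (hypothetical layout)."""
    return {
        "timestamp_utc": datetime.now(timezone.utc).isoformat(),
        "python_version": platform.python_version(),
        "platform": platform.platform(),
        # Versions of external tools, supplied by the caller.
        "tool_versions": dict(tool_versions),
        # Parameters the analysis was run with.
        "parameters": dict(parameters),
        # Checksums tie the record to the exact input files used.
        "inputs": {str(p): sha256_of(p) for p in inputs},
    }


def write_provenance(record, out_path):
    """Save the record as sorted, indented JSON so it diffs well under git."""
    Path(out_path).write_text(json.dumps(record, indent=2, sort_keys=True))
```

A record like this could be written next to each set of results and committed alongside the analysis scripts, so the inputs, parameters, and tool versions behind a given result can be checked later.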