2013 JMU Biology Seminar (Bioinformatics-as-a-Service)

Seminar given at the JMU Biology Department, October 11, 2013. Title: "Bioinformatics-as-a-Service: Applications, Opportunities, and Challenges with Large-Scale -Omics Data." Audience: undergraduates, graduate students, and faculty.

Stephen Turner

October 11, 2013

Transcript

  1. Bioinformatics-as-a-Service: Applications, Opportunities, and Challenges with Large-Scale -Omics Data

    Stephen D. Turner, Ph.D.
    Bioinformatics Core Director
    bioinformatics@virginia.edu
    bioinformatics.virginia.edu
    Slides available at: stephenturner.us/slides
  2. Today’s Talk

    1.  My Background: Genetics, Statistics, & Bioinformatics
    2.  Bioinformatics: Origins & Contemporary Applications
    3.  Bioinformatics Core: Staying Relevant; Research Vignettes
    October 10, 2013 bioinformatics.virginia.edu
  3. JMU 2002-2006

    •  Gene expression in amphibian tail development
    •  Biosymposium 2006 slides: stephenturner.us/slides
  4. Grad School: 5 years in 5 minutes

    •  Ph.D. Human Genetics, M.S. Applied Statistics
      –  Research: Genetic Epidemiology
      –  Working Hypothesis: common disease, common variant
    •  Lipids:
      –  Risk factors for CVD
        •  1 mg/dL ↑ LDL = 1% ↑ risk for CV event.
        •  1 mg/dL ↓ HDL = 6% ↑ risk for CV event.
      –  Therapeutic targets
      –  Easy to phenotype
      –  Heritable (HDL ~70% heritable!)
      –  Finding genetic factors ~ HDL = new bio = new treatments.
  5. eMERGE

    •  The electronic Medical Records and GEnomics Network: a consortium of biorepositories linked to electronic medical records data for conducting genomic studies.
    •  Many phenotypes - no “ascertainment” necessary.
    •  Multiple sites = replication, joint analysis.
    •  One of the goals: assessment. Will this work?
  6. Genome-Wide Association Study

    Manolio TA. N Engl J Med 2010;363:166-176.
    SNP (say “snip”) = Single Nucleotide Polymorphism. A common variant (mutation) in the population. Millions exist throughout the human genome.
  7. Quality Control

    •  Marker call rate
    •  Sample call rate
    •  Mendelian errors
    •  Discordant calls
    •  Minor allele frequency
    •  Hardy-Weinberg equilibrium
    •  …
    Ancestry: PCA
    Relatedness: used linear mixed effects model
    $$y_i = \mu + \sum_j \beta_j c_{ji} + G_i + e_i, \qquad G \sim MVN(0, \sigma^2_G \Phi)$$
    $$\hat{e}_i = y_i - \Big(\hat{\mu} + \sum_j \hat{\beta}_j c_{ji} + \hat{G}_i\Big)$$
    $$\hat{e}_i = \mu + \beta g_i + e_i$$
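Two of the quality-control checks listed on this slide, minor allele frequency and Hardy-Weinberg equilibrium, can be illustrated with a short Python sketch. This is not the pipeline used in the talk: the function names, the example genotype counts, and the choice of a 1-degree-of-freedom chi-square goodness-of-fit test are assumptions made for illustration.

```python
import math


def minor_allele_frequency(n_aa, n_ab, n_bb):
    """Return the minor allele frequency from genotype counts (AA, AB, BB)."""
    total_alleles = 2 * (n_aa + n_ab + n_bb)
    freq_a = (2 * n_aa + n_ab) / total_alleles
    return min(freq_a, 1.0 - freq_a)


def hwe_chi_square(n_aa, n_ab, n_bb):
    """Return (chi-square statistic, p-value) for a 1-df test of HWE."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p
    if p == 0.0 or q == 0.0:
        # Monomorphic marker: no deviation from HWE can be measured.
        return 0.0, 1.0
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Upper tail of a chi-square distribution with 1 df: erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Hypothetical genotype counts for one SNP (not from the talk).
counts = (360, 480, 160)
print("MAF:", round(minor_allele_frequency(*counts), 3))
print("HWE chi-square, p-value:", hwe_chi_square(*counts))
```

In practice a marker would be dropped when its MAF or HWE p-value falls below a study-specific cutoff; the slide does not state which cutoffs were used.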
  8. GWAS: HDL-Cholesterol

    [Figure: pathway diagram. Labels: Peripheral Cell Lipid Source; ABCA1; FC; CE; LCAT; Peripheral Cell Lipid Destination; LIPC (TG→FFA); LIPG (PL→FFA); LPL (TG→FFA); TG; CETP; Hepatobiliary Elimination]
  9. None
  10. Epistasis

    B/_; E/_    b/b; E/_    _/_; e/e
  11. GxG Interaction

    •  “Missing heritability” may be found in GxG interactions.
      –  B. Maher 2008 Nature
      –  T. Manolio, F. Collins, et al. Nature 2009
      –  T. Manolio 2010 NEJM
      –  Mouse, E. coli, S. cerevisiae
    •  GxG hard to test:
      –  GWAS: 1.25×10¹¹ tests
      –  Computationally difficult
      –  Multiple testing
    •  Limit GxG tests to:
      –  SNPs with large main effect
      –  GxG with biological relevance
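The order of magnitude of the pairwise test count on this slide can be reproduced with a few lines of Python. The sketch assumes a panel of 500,000 SNPs and a family-wise error rate of 0.05; neither number appears on the slide, and both are chosen only because they give a count of about 1.25×10¹¹.

```python
import math

N_SNPS = 500_000  # assumed panel size; not stated on the slide
ALPHA = 0.05      # assumed family-wise error rate

# Every unordered pair of SNPs is one GxG test.
n_pairs = math.comb(N_SNPS, 2)

# Bonferroni correction: divide alpha by the number of tests.
bonferroni_threshold = ALPHA / n_pairs

print(f"{n_pairs:.3e} pairwise tests")                        # 1.250e+11 pairwise tests
print(f"Bonferroni threshold: {bonferroni_threshold:.1e}")    # Bonferroni threshold: 4.0e-13
```

A per-test threshold this small is one reason the slide proposes limiting GxG tests to a filtered subset of SNP pairs.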
  12. Results

    •  Main effects of each SNP in each dataset reduce HDL.
    •  Interaction effect raises HDL.
      –  Joint effect is nonlinear.
      –  Epistasis – heterogeneity, not synergy.
    •  LPL mediates the release of FFA and TG from HDL particles.
    •  ABCA1 shuttles FC into HDL particles during intravascular remodeling.

    | SNP 1 | Gene 1 | SNP 2     | Gene 2 | MF β1 | MF β2 | MF β3 | MF P  | BioVU β1 | BioVU β2 | BioVU β3 | BioVU P |
    | rs253 | LPL    | rs2515614 | ABCA1  | -     | -     | +     | 0.006 | -        | -        | +        | 0.001   |
    | rs253 | LPL    | rs2472509 | ABCA1  | -     | -     | +     | 0.006 | -        | -        | +        | 0.001   |
  13. Results

    •  Main effects of each SNP in each dataset reduce HDL.
    •  Interaction effect coefficient is positive
      –  Joint effect is nonlinear.
      –  Epistasis – heterogeneity, not synergy.
    •  LPL mediates the release of FFA and TG from HDL particles.
    •  ABCA1 shuttles FC into HDL particles during intravascular remodeling.

    | SNP 1 | Gene 1 | SNP 2     | Gene 2 | MF β1 | MF β2 | MF β3 | MF P  | BioVU β1 | BioVU β2 | BioVU β3 | BioVU P |
    | rs253 | LPL    | rs2515614 | ABCA1  | -     | -     | +     | 0.006 | -        | -        | +        | 0.001   |
    | rs253 | LPL    | rs2472509 | ABCA1  | -     | -     | +     | 0.006 | -        | -        | +        | 0.001   |

    [Figure: pathway diagram. Labels: Peripheral Cell Lipid Source; ABCA1; FC; CE; LCAT; Peripheral Cell Lipid Destination; LIPC (TG→FFA); LIPG (PL→FFA); LPL (TG→FFA); TG; CETP; Hepatobiliary Elimination]
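The sign pattern in the results table (β1 and β2 negative, β3 positive) can be made concrete with a small sketch of a two-SNP interaction model, HDL = β0 + β1·SNP1 + β2·SNP2 + β3·SNP1·SNP2. The coefficient values and the 0/1 carrier coding below are hypothetical and chosen only to match the signs reported on the slide; they are not estimates from the study.

```python
def predicted_hdl(snp1, snp2, b0, b1, b2, b3):
    """Predicted HDL under a linear model with a SNP1 x SNP2 interaction term."""
    return b0 + b1 * snp1 + b2 * snp2 + b3 * snp1 * snp2


# Hypothetical coefficients: negative main effects, positive interaction.
B0, B1, B2, B3 = 50.0, -1.0, -1.0, 1.5

for snp1 in (0, 1):
    for snp2 in (0, 1):
        hdl = predicted_hdl(snp1, snp2, B0, B1, B2, B3)
        print(f"SNP1={snp1} SNP2={snp2} -> predicted HDL {hdl}")

# If the two main effects simply added, carriers of both variants would sit here:
additive_expectation = B0 + B1 + B2
# With the interaction term, the joint prediction is higher than that.
joint_prediction = predicted_hdl(1, 1, B0, B1, B2, B3)
print("additive expectation:", additive_expectation)
print("joint prediction:", joint_prediction)
```

With these illustrative numbers each variant alone lowers predicted HDL, but carrying both lowers it by less than the sum of the two main effects, which is the nonlinear joint effect the slide describes.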
  14. Grammatical Evolution of Neural Networks

    •  Turner SD, Ritchie MD, Bush WS. Conquering the Needle-in-a-Haystack: How Correlated Input Variables Beneficially Alter the Fitness Landscape for Neural Networks. Lec Notes Comp Sci. 5483:80-91 (2009).
    •  Turner SD, Dudek SM, Ritchie MD. Grammatical Evolution of Neural Networks for Discovering Epistasis among Quantitative Trait Loci. Lec Notes Comp Sci. 6023:86-97 (2010).
    •  Holzinger ER, Buchanan C, Turner SD, Dudek SM, Torstenson ES, Ritchie MD. Initialization Parameter Sweep in ATHENA: Optimizing Neural Networks for Detecting Gene-Gene Interactions in the Presence of Small Main Effects. Genetic and Evolutionary Computation Conference – GECCO 2010: 203-210. ACM Press (2010).
    •  Turner SD, Dudek SM, Ritchie MD. Incorporating Domain Knowledge into Evolutionary Computing for Discovering Gene-Gene Interaction. 11th Int’l Conference on Parallel Problem Solving From Nature (PPSN), Lecture Notes in Computer Science. 6238(I): 394-403 (2010).
    •  Turner SD, Dudek SM, Ritchie MD. ATHENA: A Knowledge-Based Hybrid Backpropagation-Grammatical Evolution Neural Network Algorithm for Discovering Epistasis among Quantitative Trait Loci. BMC BioData Mining. 3:5 (2010).

    [Figure: neural network diagram with inputs x, summation nodes Σ, and output y]
  15. Postdoc: Obesity Epidemiology

    [Maps: 1990, 1999, 2009. Legend: No Data, <10%, 10%–14%, 15%–19%, 20%–24%, 25%–29%, ≥30%]
  16. Obesity

    [Maps: 1990, 1999, 2009. Legend: No Data, <10%, 10%–14%, 15%–19%, 20%–24%, 25%–29%, ≥30%]
    Central obesity; Liver Fat
  17. Obesity

    Central obesity; Liver Fat
    DXA ($$); MRI ($$$)

  18. Obesity

    Central obesity; Liver Fat
    DXA ($$); MRI ($$$); Biomarkers ($)
    BMI; WHR; Lipidomics; Adipokines; Cytokines
  19. Study Design

    •  MEC: >215,000 adults
    •  5 ethnic groups: AA, JA, H, NHW, NH.
    •  30 JA, 30 NHW postmenopausal women
    •  Anthro data + 60 biomarkers for adipocytokines, inflammation, insulin resistance & lipid profile.
    •  Random Forest to predict:
      –  Total body fat (DXA)
      –  Trunk:periphery fat ratio (DXA)
      –  Hepatic adiposity (MRI)
  20. Random Forest: Results

    •  Automatic variable selection using RF: model explains fat distribution better with biomarkers (vs anthro alone).
    •  Important biomarkers varied by trait.
    •  RF >>> linear regression.
    PLoS ONE Aug 2012 7(8):e43502
  21. Other UHCC projects

    •  Obesity biomarkers / RF
    •  Rare variant analysis in IGF1
    •  GWAS in MEC, pathway analysis
    •  Other statistical analysis, pathway analysis, etc.
    •  All of the above, and others at UHCC and USC: host genetics x microbiome U01 proposal (funded Aug 2012)
  22. Meanwhile…

    •  Democratization of next-gen sequencing
    •  UHCC role: Bioinformatics-as-a-service
    •  Interest in consulting / contract research
    What does bioinformatics mean in 2013?
  23. Bioinformatics Origins

    •  Rooted in sequence analysis
    •  Driven by need to:
      -  Collect
      -  Annotate
      -  Analyze
  24. Margaret Dayhoff 1925-1983

    •  Collected all known protein sequences
    •  Published in 1965
    •  Pioneered algorithm development for
      –  Comparison of protein sequences
      –  Derivation of evolutionary histories from alignments
    “In this paper we shall describe a completed computer program for the IBM 7090, which to our knowledge is the first successful attempt at aiding the analysis of the amino acid chain structure of protein.”
  25. IBM 7090

  26. Margaret Dayhoff 1925-1983

    “There is a tremendous amount of information regarding evolutionary history and biochemical function implicit in each sequence and the number of known sequences is growing explosively. We feel it is important to collect this significant information, correlate it into a unified whole and interpret it.” M. Dayhoff, February 27, 1967
  27. What is bioinformatics? Modified from @drewconway

  28. What is bioinformatics?

    [Timeline: 1960, 1970, 1980, 1990, 2000, 2010]
  29. Between April-October 2012: Cost of a human genome: +$717 (+12%)

    genome.gov/sequencingcosts
  30. After the Gold Rush…

    •  Hall, N. “After the Gold Rush”. Genome Biol 2013.
    •  What if microscopes got 10x more powerful every year…
      –  Could do the same experiment every few months with the same slide.
      –  Make new discoveries! Publish interesting findings!
    •  Not too different from genomics…
      –  Sequence a Human Genome (HGP 2001)
      –  Sequence 1000 human genomes (1000genomes.org)
      –  Sequence 2000 human genomes (1000genomes.org)
      –  Sequence Human Microbiomes (hmpdacc.org)
      –  Sequence Earth (earthmicrobiome.org)
  31. After the Gold Rush…

    •  What’s possible next year will be the same as what’s possible now.
    •  Fresh ideas needed!
    •  Stability will be good for us in the end.
  32. Challenges in Bioinformatics

    •  Data integration (see data integration talk from 2012 ISMB at stephenturner.us/slides)
    •  Training: how to make scalable and sustainable?
    •  New technologies: how to best support new and emerging technologies?
  33. UVA Bioinformatics Core

    •  A centralized resource for providing expert and timely bioinformatics consulting and data analysis.
    •  Main goals: help collaborators publish and get funding.
      –  1. Service
      –  2. Training
      –  3. Infrastructure
  34. [Figure: workflow diagram. Labels: Sample prep; Sequencing; Raw data; Differential expression; Gene identification; Novel Genes; Discoveries; …etc.]

    This is the “stuff” we do in the bioinformatics core! Find out what this “stuff” is at bioinformatics.virginia.edu
  35. bioinformatics.virginia.edu/services

    •  Gene expression: Microarray Analysis
    •  Gene expression: RNA-seq Analysis
    •  Pathway analysis
    •  DNA Methylation
    •  DNA Binding / ChIP-Seq
    •  DNA Variation
    •  Metagenomics
    •  Grant / Manuscript support
    •  Custom development
  36. Bioinformatics in a world of Genome Factories

    •  Adaptation to the environment
    •  Bundled analysis – easy answers
    •  Collaboration
    •  Automation vs. innovation
    •  Downstream analysis
    •  New tech: no pre-built pipelines
    •  Training & Infrastructure: help collaborators help themselves!
  37. BioConnector (bioconnector.virginia.edu)

    •  Partnership between
      –  Bioinformatics core
      –  Health Sciences Library
      –  Div. Clinical Informatics
    •  Mission: Get researchers connected to the tools and people they need.
    •  Tools:
      –  Galaxy server
      –  VIVO (collaboration)
      –  Wiki (documentation)
      –  CDR
      –  Awesome space
  38. Research Vignettes

  39. Research Vignette #1: Valeria Mas

    •  Kidney transplant health (GFR)
    •  Integrate microarray analysis with clinical data using machine learning
    •  Made the cover of Transplantation (2012)
  40. Research Vignette #2: Gomez/Belyea

    •  Mouse model of leukemia
  41. Research Vignette #2: Gomez/Belyea

    •  Mouse model of leukemia
      –  How does gene knockout result in leukemia?
      –  What are the downstream molecular effects?
    •  Gene Expression Microarray: QC, differential gene expression, pathway analysis
    •  Results:
      –  KO de-represses a B cell specific gene program
      –  Increased cell-cycle progression
    •  Now: looking for mutations in the human gene
  42. Research Vignette #3: Deb Lannigan

    •  Personalizing breast cancer chemotherapy
    •  Current state of the art: test survival in 2D culture
    [Figure: Excise cancer cells; Survival assays in 2D culture]
  43. Research Vignette #3: Deb Lannigan

    •  Goal: Develop 3D culture system to mimic in situ cancer
    •  Bioinformatics:
      –  How does “tumoroid” grown in 3D compare to tumor tissue?
      –  Gene expression profiling of multiple samples comparing tumoroids to tumors, and cells isolated from tumor margins.
  44. Research Vignette #4: U.S. Government

    •  Microbial forensics: analysis and interpretation of evidence for attribution of an act of bioterrorism, biocrime, hoax, or inadvertent release of a toxin or biological threat agent.
    Figures from Turner et al 2013 Report to DOD, “Harnessing Next-Generation Sequencing Capabilities for Microbial Forensics.”
  45. Other Current Projects

    •  Metagenomics & Microbial Forensics
    •  Microarray analysis
    •  RNA-seq
    •  MeDIP-seq
    •  ChIP-seq
    •  GWAS
    •  Predictive analysis & machine learning for biomarker discovery
    •  Acquisition and Analysis of public data (GEO, SRA, dbGaP, etc.)
    •  Grant preparation
    •  Literature & database searching for gene expression signatures
    •  Pathway analysis: gettinggeneticsdone.blogspot.com/2012/03/pathway-analysis-for-high-throughput.html
    •  Gene ID conversion: gettinggeneticsdone.blogspot.com/2012/03/video-tip-convert-gene-ids-with-biomart.html
    •  Array annotation: gettinggeneticsdone.blogspot.com/2012/01/annotating-limma-results-with-gene.html
  46. Thank you

    Web: bioinformatics.virginia.edu
    E-mail: bioinformatics@virginia.edu
    Blog: www.GettingGeneticsDone.com
    Twitter: @genetics_blog
    Slides available at: stephenturner.us/slides