ISMB 2012 Data Integration Talk

"From Data to Knowledge: Extracting Biological Insight from Diverse Data Sources" talk from the 2012 ISMB meeting, Bioinformatics Core session.

Stephen Turner

July 16, 2012

Transcript

  1. From Data to Knowledge: Extracting Biological Insight from Diverse Data Sources
     Stephen D. Turner, Ph.D.
     Bioinformatics Core Director
     bioinformatics@virginia.edu
     bioinformatics.virginia.edu
  2. GWAS: One gene, one enzyme, one function?
     Manolio TA. N Engl J Med 2010;363:166–176.
     Genome.gov/GWAStudies
     October 9, 2013   bioinformatics.virginia.edu
  3. DNA Variation: Limitations
     GWAS DOES NOT INFORM:
     •  Which gene is affected
     •  How gene function is perturbed
     •  How biological processes are altered
  4. One gene, one enzyme, one function?
     Jeong H, et al. (2001). Nature 411:41–42.
     Ptacek J, et al. (2005). Nature 438:679–684.
     Guimera and Amaral (2005). Nature 433:895–900.
     Tong AH, et al. (2001). Science 294:2364–2368.
     Zhu X, et al. (2007). Genes & Dev 21:1010–1024.
  5. Distribution of Disease Genes
     Diseases connected if same gene implicated in both.
     Genes connected if implicated in the same disorder.
     Goh et al. (2007). PNAS 104:8685.
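Slide 5 describes two projections of a single bipartite gene-disease graph. The sketch below is minimal and is not the pipeline Goh et al. used. It assumes associations arrive as (gene, disease) pairs and builds both networks by linking every pair of nodes that share a partner. The gene and disease names are hypothetical placeholders.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical toy gene-disease associations (placeholders, not real data).
ASSOCIATIONS = [
    ("GENE_A", "Disease1"),
    ("GENE_B", "Disease1"),
    ("GENE_B", "Disease2"),
    ("GENE_C", "Disease2"),
    ("GENE_D", "Disease3"),
]


def project(associations):
    """Project a bipartite gene-disease graph onto each node set.

    Returns (disease_edges, gene_edges), each a set of sorted name pairs:
    two diseases are linked if at least one gene is implicated in both;
    two genes are linked if both are implicated in the same disorder.
    """
    genes_by_disease = defaultdict(set)
    diseases_by_gene = defaultdict(set)
    for gene, disease in associations:
        genes_by_disease[disease].add(gene)
        diseases_by_gene[gene].add(disease)

    disease_edges = set()
    for diseases in diseases_by_gene.values():
        # Every pair of diseases sharing this gene gets an edge.
        disease_edges.update(combinations(sorted(diseases), 2))

    gene_edges = set()
    for genes in genes_by_disease.values():
        # Every pair of genes sharing this disorder gets an edge.
        gene_edges.update(combinations(sorted(genes), 2))

    return disease_edges, gene_edges


disease_edges, gene_edges = project(ASSOCIATIONS)
print(sorted(disease_edges))  # [('Disease1', 'Disease2')]
print(sorted(gene_edges))     # [('GENE_A', 'GENE_B'), ('GENE_B', 'GENE_C')]
```

GENE_D is the only gene implicated in Disease3, so GENE_D and Disease3 are both left isolated in their respective projections.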
  6. Distribution of Disease Genes
     Protein-protein interactions
     Genes connected if implicated in the same disorder.
     Overlay with PPI data
     Goh et al. (2007). PNAS 104:8685.
  7. Distribution of Disease Genes
     Protein-protein interactions
     Genes connected if implicated in the same disorder.
     Overlay with PPI data
     Genes contributing to a common disease tend to interact through protein-protein interactions.
     Goh et al. (2007). PNAS 104:8685.
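One simple way to ask whether same-disease genes interact through protein-protein interactions is to compare two fractions. The first is the share of same-disease gene pairs that are PPI partners. The second is the same share among all pairs of disease genes. The sketch assumes an undirected PPI edge list, and the genes and edges are hypothetical. A real analysis would judge significance against randomized networks rather than read off a raw ratio.

```python
from itertools import combinations

# Hypothetical toy inputs (placeholders, not data from Goh et al.).
GENES_BY_DISEASE = {
    "Disease1": {"GENE_A", "GENE_B", "GENE_C"},
    "Disease2": {"GENE_D", "GENE_E"},
}
PPI_EDGES = [("GENE_A", "GENE_B"), ("GENE_B", "GENE_C"), ("GENE_C", "GENE_D")]


def ppi_fractions(genes_by_disease, ppi_edges):
    """Return (fraction of same-disease gene pairs that are PPI partners,
    fraction of all disease-gene pairs that are PPI partners)."""
    ppi = {frozenset(edge) for edge in ppi_edges}  # undirected: A-B == B-A

    same_disease = set()
    for genes in genes_by_disease.values():
        same_disease.update(frozenset(p) for p in combinations(sorted(genes), 2))

    all_genes = sorted(set().union(*genes_by_disease.values()))
    all_pairs = {frozenset(p) for p in combinations(all_genes, 2)}

    frac_same = len(same_disease & ppi) / len(same_disease)
    frac_all = len(all_pairs & ppi) / len(all_pairs)
    return frac_same, frac_all


frac_same, frac_all = ppi_fractions(GENES_BY_DISEASE, PPI_EDGES)
print(frac_same, frac_all)  # 0.5 0.3
```

Here half of the same-disease pairs interact, against 30% of all disease-gene pairs. That is the kind of excess the slide's overlay is meant to reveal.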
  8. Distribution of Disease Genes
     k = degree = # interaction partners
     •  “Essential” genes
        –  Encode hubs
        –  Are expressed globally
     •  “Non-essential” disease genes
        –  Do not encode hubs
        –  Tissue-specific expression
     Nonrandom placement of disease genes in interactome!
     Seebacher and Gavin (2011). Cell 144:1000–1001.
     Goh et al. (2007). PNAS 104:8685.
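The degree k on this slide is a count of distinct interaction partners per protein. The sketch below is minimal and assumes an undirected edge list. The hub cutoff `min_degree` is a hypothetical parameter, because hub definitions vary between studies. Ignoring self-interactions is also an assumption.

```python
from collections import defaultdict

# Hypothetical toy interactome (placeholder names, not real proteins).
EDGES = [("HUB", "A"), ("HUB", "B"), ("HUB", "C"), ("HUB", "D"), ("A", "B")]


def degrees(edges):
    """Degree k of each node = number of distinct interaction partners."""
    partners = defaultdict(set)
    for a, b in edges:
        if a == b:
            continue  # assumption: self-interactions do not count toward k
        partners[a].add(b)
        partners[b].add(a)
    return {node: len(p) for node, p in partners.items()}


def hubs(edges, min_degree):
    """Nodes whose degree meets a (hypothetical) hub cutoff."""
    return sorted(node for node, k in degrees(edges).items() if k >= min_degree)


k = degrees(EDGES)
print(k["HUB"], k["A"], k["C"])   # 4 2 1
print(hubs(EDGES, min_degree=3))  # ['HUB']
```

Partners are stored in sets, so an interaction reported twice (for example by two screens) is counted only once.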
  9. Interactome Mapping & Data Integration
     Vidal et al., Cell 2011.
  10. Data Integration: Genetic Variation & Gene Expression
      Are DNA variants that are associated with disease also associated with gene expression levels?
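The question on this slide is usually answered with an expression QTL (eQTL) test, which regresses a gene's expression level on genotype at a variant. The sketch below is minimal. It assumes an additive coding (0, 1 or 2 copies of the alternate allele) and no covariates, and the genotypes and expression values are made up. Converting t to a p-value needs a Student's t CDF, for example from SciPy, because the standard library does not provide one.

```python
import math

# Hypothetical toy data for one variant and one gene in six individuals.
DOSAGE = [0, 0, 1, 1, 2, 2]                  # alt-allele copies (additive model)
EXPRESSION = [1.0, 1.2, 2.0, 2.2, 3.0, 3.2]  # e.g. normalized expression


def eqtl_test(dosage, expression):
    """Simple linear regression of expression on allele dosage.

    Returns (slope, r, t): the per-allele effect, the Pearson correlation,
    and the t statistic for slope != 0 with n - 2 degrees of freedom.
    """
    n = len(dosage)
    mean_x = sum(dosage) / n
    mean_y = sum(expression) / n
    sxx = sum((x - mean_x) ** 2 for x in dosage)
    syy = sum((y - mean_y) ** 2 for y in expression)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(dosage, expression))
    slope = sxy / sxx
    r = sxy / math.sqrt(sxx * syy)
    if abs(r) >= 1.0:
        t = math.copysign(math.inf, r)  # perfect fit: t is unbounded
    else:
        t = r * math.sqrt((n - 2) / (1.0 - r * r))
    return slope, r, t


slope, r, t = eqtl_test(DOSAGE, EXPRESSION)
print(f"slope={slope:.2f} r={r:.3f} t={t:.1f}")  # slope=1.00 r=0.993 t=16.3
```

A slope of 1.0 means each extra copy of the alternate allele raises expression by one unit in this toy data. Genome-wide eQTL scans repeat this test for many variant-gene pairs and must then correct for multiple testing.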
  11. Data Integration: Gene expression + DNA Binding
      Gene expression arrays + ChIP-Seq
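A common first pass at combining expression arrays with ChIP-seq asks whether genes with a nearby binding peak are over-represented among differentially expressed genes. The sketch below uses a one-sided hypergeometric test. That is one standard choice, not necessarily what any study on this slide used. Peak-to-gene assignment and differential-expression calling are assumed to happen upstream, and the gene sets are hypothetical.

```python
from math import comb


def hypergeom_sf(k, N, K, n):
    """P(X >= k) for X ~ Hypergeometric(population N, K successes, n draws)."""
    upper = min(K, n)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / comb(N, n)


def binding_expression_overlap(universe, bound, differential):
    """Overlap between ChIP-bound genes and differentially expressed genes,
    with a one-sided enrichment p-value. Genes outside `universe`
    (all genes measured on the array) are ignored."""
    bound = set(bound) & universe
    differential = set(differential) & universe
    overlap = bound & differential
    p = hypergeom_sf(len(overlap), len(universe), len(bound), len(differential))
    return sorted(overlap), p


# Hypothetical toy gene sets.
UNIVERSE = {f"G{i}" for i in range(1, 11)}  # 10 measured genes
BOUND = {"G1", "G2", "G3"}                  # genes with a nearby ChIP-seq peak
DIFFERENTIAL = {"G1", "G2", "G4"}           # genes changed on the arrays

overlap, p = binding_expression_overlap(UNIVERSE, BOUND, DIFFERENTIAL)
print(overlap, round(p, 4))  # ['G1', 'G2'] 0.1833
```

With only ten genes, an overlap of two is unremarkable (p = 22/120). On a real array the universe would contain thousands of genes.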
  12. Data Integration: 4 Dimensions
      Probabilistic Bayesian network integrating:
      1. Genetic variation
      2. Gene expression
      3. Protein-protein interactions
      4. Transcription factor binding
      Schadt et al. (2009). Network view of disease and compound screening. Nat Rev Drug Discovery 8:286.
  13. Data Integration: 6 Dimensions
      1. Metabolite concentrations
      2. RNA expression
      3. DNA variation
      4. DNA-protein binding
      5. Protein-protein interaction
      6. Protein-metabolite interaction
      •  Metabolites linked to DNA variants (MetQTLs)
      •  MetQTLs co-localize with eQTLs
      •  Using a Bayesian network
         –  Nodes: DNA variation, gene expression, metabolite concentration
         –  Priors: protein-DNA binding, protein-protein interaction, metabolite-protein interaction
         –  Edges: inferred relationships → mechanism
      Infer causality
      Zhu J, … Schadt EE. 2012. Stitching together Multiple Data Dimensions Reveals Interacting Metabolomic and Transcriptomic Networks that Modulate Cell Regulation. PLoS Biol.
      Special Session 4: Bioinformatic Integration of Diverse Experimental Data Sources
      Today, room 201A, 2:30–4:25pm
      Part D (4:00–4:25): Stitching together multiple data dimensions…
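The claim that MetQTLs co-localize with eQTLs can be screened crudely by position: does any eQTL fall within some window of each MetQTL? The sketch below is minimal. It assumes each QTL is summarized by a (chromosome, position) lead variant, and the `window` size is hypothetical. This positional proxy is not the method of Zhu et al. Formal co-localization analyses instead test whether the two signals share a causal variant.

```python
from bisect import bisect_left
from collections import defaultdict


def colocalized(metqtls, eqtls, window):
    """Return IDs of MetQTLs with at least one eQTL on the same chromosome
    within `window` base pairs.

    metqtls: iterable of (qtl_id, chrom, position)
    eqtls:   iterable of (chrom, position)
    """
    eqtl_positions = defaultdict(list)
    for chrom, pos in eqtls:
        eqtl_positions[chrom].append(pos)
    for positions in eqtl_positions.values():
        positions.sort()  # sorted so each lookup is a binary search

    hits = []
    for qtl_id, chrom, pos in metqtls:
        positions = eqtl_positions.get(chrom, [])
        i = bisect_left(positions, pos - window)  # first eQTL >= pos - window
        if i < len(positions) and positions[i] <= pos + window:
            hits.append(qtl_id)
    return hits


# Hypothetical toy QTL positions.
EQTLS = [("chr1", 1_000), ("chr1", 50_000), ("chr2", 300)]
METQTLS = [("m1", "chr1", 1_200), ("m2", "chr1", 20_000), ("m3", "chr3", 300)]
print(colocalized(METQTLS, EQTLS, window=500))  # ['m1']
```

Results depend heavily on the window. Widening it to 30 kb also captures `m2`, which is one reason positional overlap alone is weak evidence of a shared mechanism.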
  14. Data Integration: Mouse Cis-Regulatory Map
      •  RNA-Seq and ChIP-Seq for 6 DNA-binding factors × 19 cell types
         –  ChIP: PolII, H3K4me3, H3K4me1, H3K27ac, P300, CTCF
         –  Adult tissues: bone marrow, cerebellum, cortex, heart, intestine, kidney, liver, lung, olfactory bulb, placenta, spleen, testis, thymus
         –  Embryonic tissues: brain, heart, limb, liver
         –  Cell lines: mESCs, MEFs
      •  Found 300,000 cis-regulatory features
         –  11% of the mouse genome
         –  70% of conserved non-coding sequence
      Shen et al. A map of the cis-regulatory sequences in the mouse genome. Nature, July 2012.
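A figure like "11% of the mouse genome" is a coverage statistic: the total length of the union of feature intervals divided by genome size. The sketch below shows that calculation, assuming BED-style half-open (chrom, start, end) intervals. Overlapping features are merged so that shared bases count only once. The intervals and genome size are hypothetical, and this is not a claim about how Shen et al. computed their number.

```python
from collections import defaultdict


def covered_fraction(intervals, genome_size):
    """Fraction of the genome covered by the union of half-open
    (chrom, start, end) intervals; overlapping intervals are merged."""
    by_chrom = defaultdict(list)
    for chrom, start, end in intervals:
        by_chrom[chrom].append((start, end))

    covered = 0
    for spans in by_chrom.values():
        spans.sort()
        cur_start, cur_end = spans[0]
        for start, end in spans[1:]:
            if start <= cur_end:            # overlaps or abuts the current run
                cur_end = max(cur_end, end)
            else:                           # gap: close the current run
                covered += cur_end - cur_start
                cur_start, cur_end = start, end
        covered += cur_end - cur_start      # close the last run
    return covered / genome_size


# Hypothetical toy features on a 1,000 bp "genome".
FEATURES = [("chr1", 0, 100), ("chr1", 50, 150), ("chr1", 200, 250), ("chr2", 0, 50)]
print(covered_fraction(FEATURES, genome_size=1_000))  # 0.25
```

Summing raw interval lengths would give 0.30 here, because bases 50 to 99 on chr1 would be counted twice. Merging is what keeps the percentage honest.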
  15. Data Integration: Epigenome & Transcriptome
      •  Zhang JA, Mortazavi A, Williams BA, Wold BJ, Rothenberg EV. Dynamic Transformations of Genome-wide Epigenetic Marking and Transcriptional Control Establish T Cell Identity. Cell 2012.
      •  ChIP-Seq + RNA-Seq in sequential T-cell developmental stages
      •  Changes in gene expression co-occur with histone modification at cis-regulatory sites.
  16. Summary
      •  Data is cheap and diverse.
         –  Genetic variation: GWAS, next-gen sequencing
         –  Gene expression: microarray, RNA-seq
         –  Proteomics: Y2H, CoAP/MS
      •  Cellular components interact in a network with other cellular components.
      •  Disease is the result of an abnormality in that network.
      •  Integrate multiple data types, understand the network, understand disease.
  17. Thank you
      Web: bioinformatics.virginia.edu
      E-mail: bioinformatics@virginia.edu
      Blog: www.GettingGeneticsDone.com
      Twitter: twitter.com/genetics_blog