BIO 299: Lecture 12 (NAU Spring 2014)

Greg Caporaso
February 20, 2014

Transcript

  1. 1.
  2. 2.

    Application presentations
    1. What is the biological problem that the authors are trying to address?
    2. What is the motivation for addressing this problem?
    3. What previous work has been done in this area? Are there pre-existing tools that address this problem?
    4. What computational technologies did the authors make use of to create this tool?
  3. 3.

    Application presentations
    5. What pre-existing biological resources did the authors make use of?
    6. What is the input to this tool?
    7. What is the output from this tool?
    8. How did the authors test this tool? Was performance benchmarking included in their paper?
    9. How did the authors evaluate whether this tool was giving biologically meaningful results?
  4. 4.

    QIIME: Quantitative Insights Into Microbial Ecology
    • Nature Methods (2010)
    • Presented as a Letter to the Editor; most of the descriptive text is in the supplementary text
    • Project is still under active development (current version 1.8.0)
    • www.qiime.org
  5. 5.

  6. 7.

    Using barcoded DNA sequencing, we can study the rRNA gene composition of many samples in a single DNA sequencing reaction.
    • Barcode the rRNA on a per-sample basis.
    • Pool samples and sequence.
    Example barcoded reads:
      >GCACCTGAGGACAGGCATGAGGAA…
      >GCACCTGAGGACAGGGGAGGAGGA…
      >TCACATGAACCTAGGCAGGACGAA…
      >CTACCGGAGGACAGGCATGAGGAT…
      >TCACATGAACCTAGGCAGGAGGAA…
      >GCACCTGAGGACACGCAGGACGAC…
      >CTACCGGAGGACAGGCAGGAGGAA…
      >CTACCGGAGGACACACAGGAGGAA…
      >GAACCTTCACATAGGCAGGAGGAT…
      >TCACATGAACCTAGGGGCAAGGAA…
      >GCACCTGAGGACAGGCAGGAGGAA…
    Micah Hamady, et al., Nature Methods, 2008. Error-correcting barcoded primers for pyrosequencing hundreds of samples in multiplex.
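Because many samples share one sequencing run, a sequencing error inside a barcode could send a read to the wrong sample. Below is a minimal sketch of barcode error correction. It assumes the valid barcodes are far enough apart (at least 3 mismatches between any pair) that a barcode with a single error is still closest to its true barcode. It uses plain nearest-neighbor matching by Hamming distance, a simplification rather than the decoding procedure from Hamady et al.; the barcodes, sample names, and function names are hypothetical.

```python
from typing import Dict, Optional

# Hypothetical barcodes (8 bases, 5 mismatches apart) mapped to sample IDs.
VALID_BARCODES: Dict[str, str] = {
    "ACGTACGT": "sample_A",
    "TTGCAAGC": "sample_B",
}


def hamming(a: str, b: str) -> int:
    """Count the positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("barcodes must have equal length")
    return sum(x != y for x, y in zip(a, b))


def correct_barcode(observed: str, valid: Dict[str, str], max_dist: int = 1) -> Optional[str]:
    """Return the sample ID of the nearest valid barcode, or None.

    The read is kept only if the nearest barcode is within `max_dist`
    mismatches and no other barcode is equally near (ties are ambiguous).
    """
    ranked = sorted((hamming(observed, bc), sample) for bc, sample in valid.items())
    best_dist, best_sample = ranked[0]
    tied = len(ranked) > 1 and ranked[1][0] == best_dist
    return best_sample if best_dist <= max_dist and not tied else None


print(correct_barcode("ACGTACGA", VALID_BARCODES))  # one error: sample_A
print(correct_barcode("ACGTTTGA", VALID_BARCODES))  # three errors: None
```

Reads whose barcodes cannot be confidently corrected are discarded rather than guessed, which keeps a noisy read from contaminating another sample.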
  7. 8.

    Assign millions of sequences from thousands of samples to reference sequences
    • Assign reads to samples
    • Compare samples statistically and visually
    • www.qiime.org
    [Figure: the eleven barcoded reads from the previous slide, assigned to samples and then matched to reference sequences RefSeq 1 through RefSeq 10.]
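The "assign reads to samples" and "assign to reference" steps can be sketched as building a sample-by-reference count table. This is a toy illustration, not QIIME's implementation. It assumes (hypothetically) that each read begins with an 8-base barcode and that reference assignment is an exact sequence match; real pipelines use similarity search or classifiers. All names, barcodes, and sequences are made up.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Iterator, Tuple

BARCODE_LEN = 8  # assumption: each read starts with an 8-base barcode


def demultiplex(reads: Iterable[str], barcode_to_sample: Dict[str, str]) -> Iterator[Tuple[str, str]]:
    """Yield (sample_id, sequence_without_barcode); reads with unknown barcodes are dropped."""
    for read in reads:
        barcode, rest = read[:BARCODE_LEN], read[BARCODE_LEN:]
        sample = barcode_to_sample.get(barcode)
        if sample is not None:
            yield sample, rest


def count_table(reads, barcode_to_sample, references):
    """Build {sample_id: Counter({reference_id: n_reads})}.

    Toy reference assignment: a read is assigned only if its sequence exactly
    matches a reference; real tools use similarity search instead.
    """
    table = defaultdict(Counter)
    for sample, seq in demultiplex(reads, barcode_to_sample):
        ref = references.get(seq)
        if ref is not None:
            table[sample][ref] += 1
    return table


barcodes = {"ACGTACGT": "gut_1", "TTGCAAGC": "palm_1"}
references = {"GGAA": "RefSeq 1", "CCTT": "RefSeq 2"}
reads = ["ACGTACGTGGAA", "ACGTACGTGGAA", "ACGTACGTCCTT", "TTGCAAGCCCTT", "NNNNNNNNGGAA"]
table = count_table(reads, barcodes, references)
print(dict(table))
```

The resulting table (samples by reference sequences) is the kind of object that the "compare samples statistically and visually" step works from.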
  8. 9.

    Do differences in our microbiota matter?
    Peter J. Turnbaugh et al., Nature 2006. An obesity-associated gut microbiome with increased capacity for energy harvest.
  9. 10.

    Previous work and pre-existing tools
    • Mothur: can't support very high-throughput analysis; not extensively tested.
    • Many tools for individual steps, but no comprehensive pipeline: independent tools require a lot of work to plug together.
  10. 11.

    Computational technologies
    • Python and third-party Python modules (PyCogent, numpy, matplotlib, …)
    • Revision control and code reviews via GitHub.
    • Technical support: QIIME forum, online documentation as IPython Notebooks, developer-led workshops.
  11. 12.

    Computational technologies
    • Many existing software packages are wrapped in QIIME: uclust, BLAST, muscle, PyNAST, RDP classifier, cdhit (full list in install documents).
    • Software can be installed natively on OS X and Linux, or via Virtual Machines. Pre-built Virtual Machines are available for all releases.
    • Parallel computing is supported across many environments (multi-core/cluster/grid/cloud computers).
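A common way to spread a pipeline step over many cores or cluster nodes is to split the input into chunks, run each chunk as an independent job, and merge the results. The following minimal sketch shows this split/run/merge pattern, assuming each chunk can be processed independently of the others. It is a generic sketch of the pattern, not QIIME's own parallel code, and the `gc_content` job is a hypothetical stand-in for a real step.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(items, n_jobs):
    """Split items into n_jobs contiguous chunks whose sizes differ by at most one."""
    size, extra = divmod(len(items), n_jobs)
    chunks, start = [], 0
    for i in range(n_jobs):
        end = start + size + (1 if i < extra else 0)
        chunks.append(items[start:end])
        start = end
    return chunks


def run_parallel(items, job, n_jobs, executor):
    """Run `job` on each chunk via `executor`, then concatenate results in input order."""
    merged = []
    # Executor.map yields results in the order the chunks were submitted.
    for part in executor.map(job, split_into_chunks(items, n_jobs)):
        merged.extend(part)
    return merged


def gc_content(seqs):
    """Hypothetical per-chunk job: GC fraction of each sequence."""
    return [(s.count("G") + s.count("C")) / len(s) for s in seqs]


seqs = ["GGCC", "ATAT", "GCAT", "GGGA", "CCAA"]
with ThreadPoolExecutor(max_workers=2) as pool:
    gc = run_parallel(seqs, gc_content, n_jobs=2, executor=pool)
print(gc)
```

`ThreadPoolExecutor` keeps the demo portable. For CPU-bound jobs on a multi-core machine, `ProcessPoolExecutor` (with the job defined at module top level) is the usual swap, and cluster or cloud back ends follow the same split/merge shape.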
  12. 13.

    Biological resources
    • Existing bioinformatics tools
    • Databases for reference-based sequence alignment and taxonomy classification (e.g., the Greengenes database)
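Here is a sketch of reference-based taxonomy classification. It assigns a query the taxonomy of the reference sequence that shares the most 4-mers with it (by Jaccard similarity), and reports "Unassigned" below a similarity cutoff. This is a deliberately simple toy, not the RDP classifier's naive Bayes method or a BLAST/uclust search. The reference sequences, the Greengenes-style taxonomy strings, and the 0.5 cutoff are hypothetical.

```python
def kmers(seq: str, k: int = 4) -> set:
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def classify(query, references, k=4, min_similarity=0.5):
    """Return the taxonomy of the most k-mer-similar reference, or 'Unassigned'."""
    q = kmers(query, k)
    best_taxonomy, best_score = None, 0.0
    for ref_seq, taxonomy in references.items():
        r = kmers(ref_seq, k)
        union = q | r
        score = len(q & r) / len(union) if union else 0.0  # Jaccard similarity
        if score > best_score:
            best_taxonomy, best_score = taxonomy, score
    return best_taxonomy if best_score >= min_similarity else "Unassigned"


# Hypothetical toy references with Greengenes-style taxonomy strings.
REFERENCES = {
    "ACGTACGTTGCA": "k__Bacteria; p__Firmicutes",
    "TTTTGGGGCCCC": "k__Bacteria; p__Bacteroidetes",
}

print(classify("ACGTACGTTGCC", REFERENCES))
print(classify("AAAAAAAAAAAA", REFERENCES))
```

The quality of such an assignment depends heavily on the reference database, which is why curated resources like Greengenes matter.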
  13. 14.

    Input
    • Raw sequencing data, which can be generated on various platforms
    • Per-sample metadata (barcodes, environmental data; tab-separated text)
    • Optional: reference sequence data (e.g., the Greengenes database)
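The per-sample metadata is a tab-separated text file with one row per sample. The sketch below parses such a file into a dictionary keyed by sample ID. It assumes a layout loosely modeled on QIIME 1 mapping files: a header line starting with `#SampleID`, a `BarcodeSequence` column, and other `#` lines treated as comments. The `BodySite` column and the sample rows are hypothetical.

```python
import csv

# Hypothetical mapping file; a real one would be read from disk.
MAPPING_TEXT = """#SampleID\tBarcodeSequence\tBodySite\tDescription
# comment lines after the header are ignored
S1\tACGTACGT\tgut\tsubject 1 day 1
S2\tTTGCAAGC\tleft palm\tsubject 1 day 1
"""


def parse_mapping(text):
    """Parse tab-separated per-sample metadata into {sample_id: {column: value}}."""
    lines = [line for line in text.splitlines() if line.strip()]
    header = lines[0].lstrip("#").split("\t")  # "#SampleID" -> "SampleID"
    rows = [line for line in lines[1:] if not line.startswith("#")]
    metadata = {}
    for row in csv.reader(rows, delimiter="\t"):
        record = dict(zip(header, row))
        metadata[record[header[0]]] = record
    return metadata


metadata = parse_mapping(MAPPING_TEXT)
barcode_to_sample = {rec["BarcodeSequence"]: sid for sid, rec in metadata.items()}
print(barcode_to_sample)
```

The same file serves two purposes: its barcode column drives demultiplexing, and its environmental columns (here `BodySite`) are used later to group and color samples in plots.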
  14. 15.

    Output
    • Many, depending on which steps are run. Primary results are visualizations such as 3D PCoA plots, alpha rarefaction curves, and taxonomic summaries.
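An alpha rarefaction curve shows how the number of observed OTUs in a sample grows as more reads are drawn from it. Below is a minimal sketch, assuming OTU counts for one sample are given as a dictionary. At each depth it repeatedly subsamples reads without replacement and averages the number of OTUs seen. The counts and helper names are hypothetical, and observed OTUs is just one of several alpha diversity metrics.

```python
import random
from collections import Counter


def rarefy(counts, depth, rng):
    """Subsample `depth` reads without replacement from {otu: count}."""
    pool = [otu for otu, n in counts.items() for _ in range(n)]
    return Counter(rng.sample(pool, depth))


def observed_otus(counts):
    """Alpha diversity as the number of OTUs with at least one read."""
    return sum(1 for n in counts.values() if n > 0)


def rarefaction_curve(counts, depths, iterations=10, seed=0):
    """Return [(depth, mean observed OTUs over `iterations` subsamples)]."""
    rng = random.Random(seed)  # seeded so the curve is reproducible
    curve = []
    for depth in depths:
        values = [observed_otus(rarefy(counts, depth, rng)) for _ in range(iterations)]
        curve.append((depth, sum(values) / iterations))
    return curve


sample = {"OTU1": 50, "OTU2": 30, "OTU3": 15, "OTU4": 5}  # hypothetical counts
curve = rarefaction_curve(sample, depths=[1, 10, 50, 100])
print(curve)
```

A curve that is still rising at the largest depth suggests that deeper sequencing would reveal more OTUs in that sample.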
  15. 16.

    Testing
    • Extensive unit tests via Python's unittest module (and extensions in PyCogent).
    • Script interface testing via qcli.
    • Continuous integration testing via Jenkins.
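To illustrate the unit-testing style named on this slide, the sketch below tests a small sequence utility with Python's standard `unittest` module. The `reverse_complement` function is a hypothetical example, not part of QIIME or PyCogent.

```python
import unittest


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A/C/G/T only)."""
    complement = {"A": "T", "T": "A", "G": "C", "C": "G"}
    return "".join(complement[base] for base in reversed(seq.upper()))


class ReverseComplementTests(unittest.TestCase):
    def test_palindrome(self):
        # ACGT is its own reverse complement.
        self.assertEqual(reverse_complement("ACGT"), "ACGT")

    def test_simple(self):
        self.assertEqual(reverse_complement("AAGC"), "GCTT")

    def test_lowercase_input(self):
        self.assertEqual(reverse_complement("aagc"), "GCTT")

    def test_invalid_base_raises(self):
        # Ambiguous bases such as N are not handled by this toy function.
        with self.assertRaises(KeyError):
            reverse_complement("ACGN")
```

If the code is saved as, e.g., `test_seq_utils.py` (a hypothetical name), the tests run with `python -m unittest test_seq_utils`. A continuous integration server such as Jenkins typically runs the same command on every commit.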
  16. 18.

    Biological evaluation, but little performance benchmarking
    • Evaluated via a proof-of-concept meta-analysis of ten 454 FLX runs (3.8 million sequences)
  17. 19.
  18. 21.

    Moving Pictures of the Human Microbiome
    • Two subjects sampled daily, one for six months and one for 18 months
    • Four body sites: tongue, palm of left hand, palm of right hand, and gut (via fecal swabs)
  19. 22.

    Moving Pictures of the Human Microbiome: QIIME tutorial (demo)
    • Investigate the relative temporal variability of body sites.
    • Uses a small subset of the full data set (~0.1% of the full sequence collection) to keep run time short.
    • Sequenced on Illumina GAIIx; a subset of the samples was also sequenced on 454.
    • The online tutorial contains details on all of the steps.
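One simple way to quantify the relative temporal variability of a body site is the average Bray-Curtis distance between consecutive samples from that site. Higher values mean the community changes more from day to day. The sketch below assumes each sample is a dictionary of OTU counts, listed in time order. The two toy time series are hypothetical, and the tutorial itself may compare body sites differently (e.g., through PCoA plots of beta diversity).

```python
from statistics import mean


def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two {otu: count} samples (0 = identical, 1 = no shared OTUs)."""
    otus = set(a) | set(b)
    numerator = sum(abs(a.get(o, 0) - b.get(o, 0)) for o in otus)
    denominator = sum(a.get(o, 0) + b.get(o, 0) for o in otus)
    return numerator / denominator if denominator else 0.0


def temporal_variability(time_series):
    """Mean Bray-Curtis distance between consecutive samples (needs >= 2 samples)."""
    return mean(bray_curtis(x, y) for x, y in zip(time_series, time_series[1:]))


# Hypothetical daily samples from two body sites, in time order.
gut = [{"A": 10, "B": 10}, {"A": 10, "B": 10}, {"A": 12, "B": 8}]
palm = [{"A": 10, "B": 10}, {"C": 20}, {"A": 5, "C": 15}]
print(temporal_variability(gut), temporal_variability(palm))
```

In this toy data the palm series scores higher than the gut series, so it would be read as the more temporally variable site.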
  20. 23.

    This work is licensed under the Creative Commons Attribution 3.0 United States License. To view a copy of this license, visit http://creativecommons.org/licenses/by/3.0/us/ or send a letter to Creative Commons, 171 Second Street, Suite 300, San Francisco, California, 94105, USA.
    Feel free to use or modify these slides, but please credit me by placing the following attribution information where you feel that it makes sense: Greg Caporaso, www.caporaso.us.