Platforms for scientific data analysis

Presented at a Big Data seminar series at Novo Nordisk recently

Deepak Singh

June 11, 2014

Transcript

  1. There is no magic, only awesome
     Building a platform for scientific data analysis
     Deepak Singh (@mndoci)
     Principal Product Manager - Amazon EC2
  2. Genomic data is assembled and analyzed in Complete Genomics' data center and then is securely transferred over a dedicated network to Amazon Web Services (AWS) for delivery to customers either by shipping hard disk drives or electronically.
     Automatic transfer of raw data to the cloud in real time during the course of the sequencing run
     http://blog.basespace.illumina.com/2012/08/10/basespace-growth-the-numbers/
     • 70% of all installed MiSeqs have connected to BaseSpace
     • BaseSpace on HiSeq in Q4 2012
     Sequencing data going straight to the cloud
  3. Computational compound analysis
     Solar panel material
     Estimated computation time: 264 years
     156,314 core cluster across 8 regions
     1.21 petaFLOPS (Rpeak)
     Simulated 205,000 materials
     18 hours for $33,000
     16¢ per molecule
     c|net news: http://news.cnet.com/8301-1001_3-57611919-92/supercomputing-simulation-employs-156000-amazon-processor-cores/
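The per-molecule figure on this slide follows directly from the quoted totals. The short Python sketch below reproduces it; the core-hour and cost-per-core-hour values are an upper-bound illustration that assumes every core ran for the full 18 hours, which the slide does not state.

```python
# Figures quoted on the slide.
materials = 205_000
total_cost_usd = 33_000
cores = 156_314
wall_clock_hours = 18

# Cost per simulated material, in US dollars.
cost_per_molecule = total_cost_usd / materials

# ASSUMPTION (not on the slide): all cores were busy for the whole run,
# so this is an upper bound on the core-hours actually consumed.
core_hours = cores * wall_clock_hours
cost_per_core_hour = total_cost_usd / core_hours

print(f"cost per molecule: {cost_per_molecule * 100:.0f} cents")  # 16 cents
print(f"core-hours (upper bound): {core_hours:,}")
print(f"cost per core-hour (lower bound): ${cost_per_core_hour:.4f}")
```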
  4. Migrated clinical trials simulations platform
     Simulations in 1.2hrs vs. 60hrs!
     64% reduction in costs
     Research Application Portfolio: Clinical Pharmacology & Pharmacometrics; Molecular Dynamics; Computational Genomics
     98% time saved for clinical trial simulations

                                                            Internal System   Cloud
     Individual Clinical Trial Simulation Run Time (Min)    56                56
     Total Number of Clinical Trial Simulations             2000              2000
     No. Servers                                            2                 256
     No. CPUs                                               32                2048
     Total Analysis Run Time (hr)                           60                1.2
     Cost                                                   ??                $336
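The run-time numbers in the table above are roughly what a perfectly parallel workload would give. The Python sketch below checks that, assuming (this is not stated on the slide) that each simulation occupies one CPU for its full 56 minutes and that simulations are independent of one another.

```python
# Figures quoted in the table.
sim_minutes = 56
n_sims = 2000
internal_cpus, cloud_cpus = 32, 2048
internal_hours, cloud_hours = 60, 1.2

# Total work, in CPU-hours, is the same on both systems.
total_cpu_hours = sim_minutes * n_sims / 60

# ASSUMPTION: one simulation per CPU, no scheduling or startup overhead.
ideal_internal_hours = total_cpu_hours / internal_cpus
ideal_cloud_hours = total_cpu_hours / cloud_cpus

# Reported wall-clock improvement.
time_saved = 1 - cloud_hours / internal_hours
speedup = internal_hours / cloud_hours

print(f"total work: {total_cpu_hours:.0f} CPU-hours")
print(f"ideal internal run: {ideal_internal_hours:.1f} h (reported {internal_hours} h)")
print(f"ideal cloud run: {ideal_cloud_hours:.2f} h (reported {cloud_hours} h)")
print(f"time saved: {time_saved:.0%}, speedup: {speedup:.0f}x")
```

Under these assumptions the ideal times land slightly below the reported ones, which is consistent with some overhead for provisioning and data movement. The reported 98% time saving matches 1.2 hours against 60 hours.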
  5. • An online environment to drive collaboration among researchers
     • Synapse hosts clinical-genomic datasets
     • Provides a shared compute space and suite of analysis tools for researchers
  6. Global Collaboration for Global Manufacturing
     Cloud provides a global, distributed, secure, and scalable environment for collaborative design and manufacturing
  7. [plugin ipcluster]
     setup_class = ipcluster.IPCluster
     enable_notebook = true
     notebook_passwd = YOUR-PASS

     [cluster qiime]
     node_image_id = ami-2faa7346
     keyname = YOUR-KEY
     cluster_size = 4
     node_instance_type = m2.4xlarge
     plugins = ipcluster

     $ starcluster start -c qiime myqiime

     Source: Justin Riley
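The configuration on this slide is INI-style, so it can be sanity-checked before launching anything. The sketch below uses only Python's standard `configparser` module. It is an independent illustration, not StarCluster's own validation, and the check it performs (every plugin named by the cluster has a matching `[plugin NAME]` section) is an assumption about what is worth verifying.

```python
import configparser

# The settings from the slide; YOUR-PASS and YOUR-KEY are the slide's placeholders.
CONFIG_TEXT = """
[plugin ipcluster]
setup_class = ipcluster.IPCluster
enable_notebook = true
notebook_passwd = YOUR-PASS

[cluster qiime]
node_image_id = ami-2faa7346
keyname = YOUR-KEY
cluster_size = 4
node_instance_type = m2.4xlarge
plugins = ipcluster
"""

parser = configparser.ConfigParser()
parser.read_string(CONFIG_TEXT)

cluster = parser["cluster qiime"]
cluster_size = cluster.getint("cluster_size")
plugin_names = [name.strip() for name in cluster["plugins"].split(",")]

# Hypothetical check: each plugin listed by the cluster should be defined.
missing = [name for name in plugin_names if f"plugin {name}" not in parser]

notebook_enabled = parser["plugin ipcluster"].getboolean("enable_notebook")

print(f"{cluster_size} x {cluster['node_instance_type']}, plugins: {plugin_names}")
print(f"notebook enabled: {notebook_enabled}, undefined plugins: {missing}")
```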