Right Place, Right Time, Right Science: Lessons Learned from Utility HPC/Big Data - James Cuff

Advancing Autism Discovery Workshop, April 22, 2013. James Cuff, CTO - Cycle Computing.

Transcript

  1. Right Place, Right Time, Right Science: Lessons Learned from Utility HPC / Big Data. James Cuff, CTO, @jamesdotcuff, @cyclecomputing

  2. For over sixteen years I have seen a steady evolution in research computing: 1996 @ Oxford, 1 CPU @ 200MHz / 18GB; 2000 @ Sanger / EBI, 360 CPU @ 168GHz / 50TB; 2003-2006 @ Harvard / MIT, 200 CPU @ 400GHz / 250-600TB; 2012 @ Harvard, >25,000 CPU @ 32THz / 10.0PB.

  3. Massive increases in compute. Massive increases in storage. Massive increases in scale. Massive increases in throughput. Massive increases in performance. Researchers need it all, and all need it now! Computational growth is not going away. We need new systems, teams and methods to effectively support our scholarly and scientific research.

  4. I don’t try and keep up with the big boys and girls…

  5. University of Manchester, British Nuclear Fuels Limited, Oxford University, European Bioinformatics Institute, Inpharmatica, Wellcome Trust Sanger Institute, Whitehead Genome Center, Broad Institute of MIT and Harvard, Harvard University, Cycle Computing.

  6. The Human Genome Project ca. 2000: 360-node DEC Alpha DS10L 1U – 9 racks – 100kW power – 1,440 Cat5 crimps… – 466MHz x 360 CPU = 168GHz.

  7. THIS is not HPC…

  8. Neither is THIS…

  9. Or THIS … !!

  10. THIS is HPC!

  11. So…

  12. At Cycle, we believe Scientists and Researchers are shackled by a lack of access to compute.

  13. History teaches us how to collaborate and remove the shackles surrounding our science.

  14. The 60’s, the 70’s, the 80’s, the 90’s, the 00’s, the 10’s: from centralized to decentralized, collaborative to independent, and right back again! Mainframes → VAX → the PC → Beowulf clusters → central clusters → clouds/VMware (IaaS, SaaS, PaaS). Centers provide access to compute; the supercomputing famine and funding gap; individual computing; computing too big to fit under the desk, Linux explodes. Sharing: 100% → 60% → 0% → 40% → ???%. Networks: ~0 Mbit → ~1 Mbit → ~10 Mbit → ~1,000 Mbit → ~10,000 Mbit. Bigger and better, but further and further away from the scientist’s lab.

  15. The Scientific Method: Ask a Question → Hypothesize → Predict → Experiment / Test → Analyze → Final Results. “Test and Analyze” require the most time, compute, data and effort.

  16. The Scientific Method: Ask a Question → Hypothesize → Predict → Experiment / Test → Analyze → Final Results. Any improvements to this cycle yield multiplicative benefits.

  17. If we democratize access to high performance compute, we WILL accelerate science.

  18. Not neuroscience…

  19. Or a good excuse for MRI data…

  20. We make software tools to easily orchestrate complex workloads and data access across Utility HPC. NIMBUS Discovery: 12 years of compute in 3 hours, $20M of infrastructure for < $3,000. Big 10 Pharma: built a 10,600-server cluster ($44M) in 2 hours, 40 years of compute in 11 hours for $4,372. Genomics Research Institute: 1 million hours, or 115 years of compute, in 1 week for $19,555 (a back-of-the-envelope check follows below).
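
A quick way to sanity-check "X years of compute in Y hours for $Z" claims like the ones above is to convert core-hours into single-core years and into an effective price per core-hour. The short Python sketch below does that for the Genomics Research Institute figures on this slide; the helper names and the 8,760 hours-per-year convention are my own illustrative choices, not anything from Cycle Computing's tooling, and the result lands within rounding of the slide's "115 years".

    # Convert core-hours into "years of compute" and an effective price per
    # core-hour, using the figures quoted on the slide above.
    HOURS_PER_YEAR = 24 * 365  # 8,760

    def compute_years(core_hours):
        """Express a pile of core-hours as years of work on a single core."""
        return core_hours / HOURS_PER_YEAR

    def cost_per_core_hour(total_cost, core_hours):
        """Effective dollars paid per core-hour of work."""
        return total_cost / core_hours

    if __name__ == "__main__":
        core_hours = 1_000_000   # Genomics Research Institute example
        cost = 19_555
        print(f"{compute_years(core_hours):.0f} years of compute")            # ~114
        print(f"${cost_per_core_hour(cost, core_hours):.4f} per core-hour")   # ~$0.0196
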
  21. Utility HPC in the News: WSJ, NYTimes, Wired, Bio-IT World, BusinessWeek.

  22. We solve this challenge across many industries: SLED/PS, Insurance, Financial Services, Life Sciences, Manufacturing & Electronics, Energy, Media & Other.

  23. Before, Local Cluster: too small when you need it most, too large every other time…

  24. Remember this from earlier? 360-node DEC Alpha DS10L 1U – 9 racks – 100kW power – 1,440 Cat5 crimps… – 466MHz x 360 CPU = 168GHz.

  25. The world is a very different place now…

  26. Life Science Activities: Compute vs. Data. A chart placing NGS, Molecular Modeling, PK/PD, CAD/CAM, GWAS, Neuroscience, Genomics, Proteomics, Biomarker/Image Analysis and Sensor Data Import along compute and data/bandwidth axes (“Creating Fake Charts, with Fake Data”).

  27. When you’re not limited by fixed-size compute & data, what happens?

  28. #1: “Better” Science. “Answer the question we want to ask”: not constrained to what fits on local compute power; all desired samples; all desired queries; easier collaboration.

  29. #2: “Faster” Science. Run our “better” science, that would have taken months or years, in hours or days.

  30. A couple of use cases…

  31. Scientific Process for Molecular Modeling. Before: trade off compute time vs. accuracy. Now: better analysis, fewer false negatives, better results faster. The funnel runs Initial Coarse Screen → Higher Quality Analysis → Highest Quality Analysis → Best Quality, and otherwise unexplored molecules now also get higher quality analysis (a sketch of the pattern follows below).
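
The funnel above is, at its core, a staged filter: score everything with a cheap method, keep the most promising fraction, and re-score the survivors with progressively more expensive methods. The Python sketch below illustrates only that general shape; the scoring functions, library size and 10% cutoff are placeholders of mine, not Novartis's or Cycle's actual pipeline, and with utility-scale compute you can simply widen the later stages instead of narrowing so aggressively.

    import random

    # Illustrative tiered screening funnel: cheap scoring on everything,
    # progressively more expensive scoring on the survivors. The scoring
    # functions are random stand-ins for real docking / modeling codes.

    def coarse_score(mol):        # fast, noisy initial screen
        return random.random()

    def refined_score(mol):       # slower, higher-quality analysis
        return random.random()

    def best_quality_score(mol):  # most expensive, most accurate
        return random.random()

    def funnel(molecules, keep_fraction=0.10):
        """Run each stage on the top fraction of the previous stage's output."""
        survivors = list(molecules)
        for score in (coarse_score, refined_score, best_quality_score):
            ranked = sorted(survivors, key=score, reverse=True)
            survivors = ranked[: max(1, int(len(ranked) * keep_fraction))]
        return survivors

    if __name__ == "__main__":
        library = [f"mol_{i}" for i in range(100_000)]
        print(f"{len(funnel(library))} candidates survive the funnel")  # 100
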
  32. Computational Chemistry: Novartis. Need: enable push-button Utility Supercomputing for molecular modeling. Solution: a 30,000-CPU run across US/EU cloud (AWS); 10 years of compute in 8 hours for $10,000; found 3 compounds now in the wetlab!

  33. Lessons learned: $$$/science; application-aware data management; data security.

  34. Another big 10 pharma… built a 10,600-server cluster ($44M) in 2 hours, running 40 years of compute in 11 hours for $4,372.

  35. Big 10 Pharma created a 10,600-instance cluster ($44M) in 2 hours, running 40 years of compute in 11 hours for $4,372.

  36. Lessons learned: capacity is no longer an issue; hardware = software; testing (error handling, unit testing, etc.); Cycle has spent >$1M on AWS over 5 years.

  37. Gene Expression Analysis: Morgridge Institute for Research. Need: run a comparison of 78TB of stem cell RNA samples to build a unique gene expression database; make it easier to replicate disease in petri dishes with induced stem cells. Solution: enable a massive RNA-seq run using Bowtie that was impossible before (a sketch of the fan-out follows below).
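
A run like this is embarrassingly parallel at the sample level: each FASTQ file can be aligned independently and the jobs fanned out across as many nodes as the cloud will rent you. The sketch below shows that shape with a local process pool standing in for a real cluster scheduler; the index name, paths and Bowtie flags are my own illustrative assumptions and would differ in the actual Morgridge workflow.

    import glob
    import subprocess
    from multiprocessing import Pool

    # Fan out one Bowtie alignment per FASTQ file. A process pool stands in
    # for the cluster scheduler; on Utility HPC each call would become a
    # separate job on its own node. Paths, index and flags are illustrative.

    INDEX = "hg19"           # assumed pre-built Bowtie index basename
    THREADS_PER_JOB = "4"

    def align(fastq_path):
        sam_path = fastq_path.replace(".fastq", ".sam")
        subprocess.run(
            ["bowtie", "-S", "-p", THREADS_PER_JOB, INDEX, fastq_path, sam_path],
            check=True,
        )
        return sam_path

    if __name__ == "__main__":
        samples = sorted(glob.glob("samples/*.fastq"))
        with Pool(processes=8) as pool:
            for sam in pool.imap_unordered(align, samples):
                print("aligned:", sam)
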
  38. 1 million compute hours: 115 years of computing in 1 week for $19,555.

  39. What have we learned here?

  40. Servers are not house plants

  41. Servers are wheat!

  42. Protein Binding / GPU: Large BioTech. A 128-GPU cluster delivered 13 GPU-years of computing in 1.5 months for $150,000, vs. 5 months of CPU for $450,000: 3x the science, ¼ the cost. The architecture: a drug designer and local data behind the corporate firewall, with scheduled data movement to a secure HPC cluster (8 TB file system, 128-GPU cluster) in the external cloud.

  43. Genomic Analysis: Research Lab. Internal HPC (HiSeq instruments, LIMS, internal compute, a 100TB HPC file system with tracked directories) connected by data scheduling to an auto-scaling external environment (cloud HPC cluster, cloud filer, S3 blob data, Glacier archive).

  44. DataManager: a data-aware scheduler for science and HPC (a toy placement sketch follows below).

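A data-aware scheduler, reduced to its essence, places each job where its input data already lives and only schedules a transfer when no such site has free capacity. The Python toy below illustrates that placement heuristic under my own assumptions; it is not the actual DataManager implementation.

    from dataclasses import dataclass, field

    # Toy data-aware placement: prefer the site that already holds a job's
    # inputs; otherwise fall back to the emptiest site and record the data
    # that must be staged. Illustrative only, not Cycle's DataManager.

    @dataclass
    class Site:
        name: str
        free_slots: int
        datasets: set = field(default_factory=set)

    @dataclass
    class Job:
        name: str
        needs: set  # dataset names this job reads

    def place(job, sites):
        local = [s for s in sites if job.needs <= s.datasets and s.free_slots > 0]
        if local:
            chosen, to_stage = local[0], set()
        else:
            chosen = max(sites, key=lambda s: s.free_slots)
            to_stage = job.needs - chosen.datasets
        chosen.free_slots -= 1
        chosen.datasets |= job.needs
        return chosen, to_stage

    if __name__ == "__main__":
        sites = [Site("internal-hpc", 2, {"run42.bam"}), Site("cloud", 100, set())]
        for job in (Job("variant-call", {"run42.bam"}), Job("align", {"run43.fastq"})):
            site, staged = place(job, sites)
            print(f"{job.name} -> {site.name}, staging {staged or 'nothing'}")
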
  45. So we can do reliable HPC and data movement in the cloud…

  46. Hardware is HARD! Great software tools yield happiness; IT solving scientific problems vs. low-level ops; replicating to AWS Glacier offers DR options… (a lifecycle sketch follows below).
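
On the Glacier point: the simplest disaster-recovery pattern is to replicate results into an S3 bucket and let a lifecycle rule transition them to Glacier after a set number of days. The boto3 sketch below shows that pattern; the bucket name, prefix and 30-day threshold are placeholders, and boto3 itself postdates this talk, so treat it as a present-day illustration rather than what was used at the time.

    import boto3

    # Illustrative DR pattern: replicate results into S3, then let a lifecycle
    # rule transition them to Glacier after 30 days. Bucket name, prefix and
    # retention period are placeholders.

    s3 = boto3.client("s3", region_name="us-east-1")

    s3.put_bucket_lifecycle_configuration(
        Bucket="my-hpc-results",  # hypothetical bucket
        LifecycleConfiguration={
            "Rules": [
                {
                    "ID": "archive-results-to-glacier",
                    "Filter": {"Prefix": "results/"},
                    "Status": "Enabled",
                    "Transitions": [{"Days": 30, "StorageClass": "GLACIER"}],
                }
            ]
        },
    )
    print("results/ objects will move to Glacier after 30 days")
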
  47. Take advantage of cloud storage scale (S3): capacity isn’t an issue. Large public data sets plus secure, massive compute provide huge opportunities for new science (an example follows below).
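
As a concrete instance of "large public data sets plus massive compute", AWS hosts the 1000 Genomes Project in a public S3 bucket that can be read without credentials and fed straight into cloud clusters. The sketch below just lists a few objects anonymously with boto3; the bucket is the real public dataset, but the prefix is an assumption and the wiring into a cluster is left as an exercise.

    import boto3
    from botocore import UNSIGNED
    from botocore.config import Config

    # Anonymously list a few objects from the 1000 Genomes public dataset on
    # S3. In a Utility HPC workflow these keys become inputs to cluster jobs.

    s3 = boto3.client("s3", region_name="us-east-1",
                      config=Config(signature_version=UNSIGNED))

    resp = s3.list_objects_v2(
        Bucket="1000genomes",   # AWS-hosted public dataset
        Prefix="phase3/",       # assumed prefix; adjust to the files you need
        MaxKeys=5,
    )
    for obj in resp.get("Contents", []):
        print(obj["Key"], obj["Size"], "bytes")
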
  48. Let us quickly recap

  49. This isn’t neuroscience…

  50. Servers are not house plants!

  51. Servers are wheat!

  52. At Cycle, we believe Scientists and Researchers are shackled by a lack of access to compute.

  53. The Scientific Method on Utility HPC: Ask a Question → Hypothesize → Predict → Experiment / Test → Analyze → Final Results. Yields “Better”, “Faster” research for way less $$$.
  54. None
  55. Oh, and one more thing…

  56. 2013 BigScience Challenge: $10,000 of free AWS and Cycle Computing-powered services to any science benefitting humanity. The 2012 winner was a 115-year genomic analysis. Enter at: cyclecomputing.com/big-science-challenge/enter

  57. Thank You! Questions? blog.cyclecomputing.com www.cyclecomputing.com @cyclecomputing @jamesdotcuff