Right Place, Right Time, Right Science: Lessons Learned from Utility HPC/Big Data - James Cuff

Advancing Autism Discovery Workshop, April 22, 2013. James Cuff, CTO - Cycle Computing.

Transcript

  1. 1.

    Right Place, Right Time, Right Science: Lessons Learned from Utility HPC / Big Data
    James Cuff, CTO (@jamesdotcuff, @cyclecomputing)
  2. 2.

    For over sixteen years I have seen a steady evolution in research computing (CPUs @ aggregate clock / storage):
    1996 @ Oxford: 1 CPU @ 200 MHz / 18 GB
    2000 @ Sanger / EBI: 360 CPUs @ 168 GHz / 50 TB
    2003-2006 @ Harvard / MIT: 200 CPUs @ 400 GHz / 250-600 TB
    2012 @ Harvard: >25,000 CPUs @ 32 THz / 10.0 PB
  3. 3.

    Massive increases in compute
    Massive increases in storage
    Massive increases in scale
    Massive increases in throughput
    Massive increases in performance
    Researchers need it all, and all need it now!
    Computational growth is not going away.
    We need new systems, teams and methods to effectively support our scholarly and scientific research.
  4. 5.

    University of Manchester, British Nuclear Fuels Limited, Oxford University, European Bioinformatics Institute, Inpharmatica, Wellcome Trust Sanger Institute, Whitehead Genome Center, Broad Institute of MIT and Harvard, Harvard University, Cycle Computing
  5. 6.

    The Human Genome Project ca. 2000
    – 360-node DEC Alpha DS10L 1U cluster, 9 racks
    – 100 kW power
    – 1,440 Cat5 crimps…
    – 360 CPUs × 466 MHz = 168 GHz
  6. 11.
  7. 14.

    From centralized to decentralized, collaborative to independent, and right back again!
    The 60’s → The 70’s → The 80’s → The 90’s → The 00’s → The 10’s
    Mainframes → VAX → The PC → Beowulf Clusters → Central Clusters → Clouds/VMware
    – Centers provide access to compute
    – The supercomputing famine, funding gap
    – Individual computing
    – Computing is too big to fit under the desk; Linux explodes
    – IaaS, SaaS, PaaS
    Sharing: 100% → 60% → 0% → 40% → ???%
    Network: ~0 Mbit → ~1 Mbit → ~10 Mbit → ~1,000 Mbit → ~10,000 Mbit
    Bigger and better, but further and further away from the scientist’s lab
  8. 15.

    The Scientific Method
    Ask a Question → Hypothesize → Predict → Experiment / Test → Analyze → Final Results
    “Test” and “Analyze” require the most time, compute, data and effort.
  9. 16.

    The Scientific Method
    Ask a Question → Hypothesize → Predict → Experiment / Test → Analyze → Final Results
    Any improvement to this cycle yields multiplicative benefits.
  10. 20.

    We make software tools to easily orchestrate complex workloads and data access across Utility HPC:
    – NIMBUS Discovery: 12 years of compute in 3 hours; $20M of infrastructure for < $3,000
    – Big 10 Pharma: built a 10,600-server cluster ($44M) in 2 hours; 40 years of compute in 11 hours for $4,372
    – Genomics Research Institute: 1 million hours, or 115 years, of compute in 1 week for $19,555
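The headline figures above can be reduced to common units with a little arithmetic. The sketch below is a back-of-envelope check, not Cycle Computing's tooling. It assumes "years of compute" means single-core years of 365 × 24 = 8,760 hours, and `run_summary` is a hypothetical helper name. It reports total core-hours, the average number of cores that must be busy at once to finish in the quoted wall-clock time, and the implied price per core-hour.

```python
# Hypothetical helper for back-of-envelope arithmetic on the runs quoted above.
# Assumption: "years of compute" = single-core years of 365 * 24 = 8,760 core-hours.
HOURS_PER_YEAR = 365 * 24


def run_summary(compute_years: float, wall_hours: float, cost_usd: float) -> dict:
    """Return total core-hours, average concurrency and cost per core-hour."""
    core_hours = compute_years * HOURS_PER_YEAR
    return {
        "core_hours": core_hours,
        # Average number of cores busy at once over the wall-clock window.
        "avg_concurrency": core_hours / wall_hours,
        "usd_per_core_hour": cost_usd / core_hours,
    }


if __name__ == "__main__":
    # (compute years, wall-clock hours, cost in USD) as quoted on the slide;
    # the NIMBUS cost is an upper bound ("< $3,000").
    runs = {
        "NIMBUS Discovery": (12, 3, 3_000),
        "Big 10 Pharma": (40, 11, 4_372),
        "Genomics Research Institute": (115, 7 * 24, 19_555),
    }
    for name, (years, hours, cost) in runs.items():
        s = run_summary(years, hours, cost)
        print(f"{name}: {s['core_hours']:,.0f} core-hours, "
              f"~{s['avg_concurrency']:,.0f} cores busy on average, "
              f"${s['usd_per_core_hour']:.4f}/core-hour")
```

Under that assumption, the 40-year, 11-hour run kept roughly 32,000 cores busy on average at about 1.2 cents per core-hour. The 115-year run comes to 1,007,400 core-hours, which is consistent with the quoted "1 million hours".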
  11. 22.

    We solve this challenge across many industries: SLED/PS (state, local and education / public sector), Insurance, Financial Services, Life Sciences, Manufacturing & Electronics, Energy, Media & Other
  12. 23.

    Before, Local Cluster:
    Too small when you need it most, too large every other time…
  13. 24.

    Remember this from earlier?
    – 360-node DEC Alpha DS10L 1U cluster, 9 racks
    – 100 kW power
    – 1,440 Cat5 crimps…
    – 360 CPUs × 466 MHz = 168 GHz
  14. 26.

    Life Science Activities: Compute vs. Data
    [Chart, axes: Compute and Data/Bandwidth. Activities plotted: NGS, Molecular Modeling, PK/PD, CAD/CAM, GWAS, Neuroscience, Genomics, Proteomics, Biomarker/Image Analysis, Sensor Data Import. Annotation: “Creating fake charts, with fake data”]
  15. 28.

    #1: “Better” Science
    “Answer the question we want to ask,” not one constrained to what fits on local compute power:
    – all desired samples
    – all desired queries
    – easier collaboration
  16. 31.

    Scientific Process for Molecular Modeling
    Before: trade off compute time vs. accuracy
    Now: better analysis, fewer false negatives, better results faster
    [Diagram labels: Initial Coarse Screen; Higher Quality Analysis; Best Quality; Highest Quality Analysis; Otherwise Unexplored Molecules]
  17. 32.

    Computational Chemistry: Novartis
    Need:
    – Enable push-button Utility Supercomputing for molecular modeling
    Solution:
    – 30,000-CPU run across US/EU cloud (AWS)
    – 10 years of compute in 8 hours for $10,000
    – Found 3 compounds now in the wet lab!
  18. 34.

    Another Big 10 pharma… built a 10,600-server cluster ($44M) in 2 hours, running 40 years of compute in 11 hours for $4,372
  19. 35.

    Big 10 Pharma created a 10,600-instance cluster ($44M) in 2 hours, running 40 years of compute in 11 hours for $4,372
  20. 36.

    Lessons learned
    – Capacity is no longer an issue
    – Hardware = software
    – Testing (error handling, unit testing, etc.)
    – Cycle has spent >$1M on AWS over 5 years
  21. 37.

    Gene Expression Analysis: Morgridge Institute for Research
    Need:
    – Run a comparison of 78 TB of stem cell RNA samples to build a unique gene expression database
    – Make it easier to replicate disease in petri dishes with induced stem cells
    Solution:
    – Enable a massive RNA-seq run using Bowtie that was impossible before
  22. 42.

    Protein Binding / GPU: Large BioTech
    128-GPU cluster: 13 GPU-years of computing in 1.5 months for $150,000, vs. 5 months of CPU for $450,000
    3x the science, ¼ the cost
    [Diagram labels: Drug designer; Local Data; Corporate Firewall; Scheduled Data; External Cloud; Secure HPC Cluster; 8 TB FS; 128 GPU cluster]
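The GPU comparison reduces to two ratios, computed here directly from the quoted durations and prices. `gpu_vs_cpu` is a hypothetical helper, and the sketch assumes the stated costs and durations are the whole comparison, with no other expenses included.

```python
def gpu_vs_cpu(gpu_months: float, gpu_cost: float,
               cpu_months: float, cpu_cost: float) -> tuple:
    """Return (speed-up of the GPU run over the CPU run,
    GPU cost as a fraction of the CPU cost)."""
    speedup = cpu_months / gpu_months
    cost_fraction = gpu_cost / cpu_cost
    return speedup, cost_fraction


if __name__ == "__main__":
    # Figures as quoted on the slide: 1.5 months / $150,000 vs. 5 months / $450,000.
    speedup, cost_fraction = gpu_vs_cpu(1.5, 150_000, 5.0, 450_000)
    print(f"GPU run: {speedup:.2f}x faster at {cost_fraction:.2f} of the CPU cost")
```

On the listed figures, the GPU run finishes about 3.3x sooner at one third of the CPU cost.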
  23. 43.

    Genomic Analysis: Research Lab
    [Architecture diagram labels: HiSeq instruments; LIMS; Internal HPC; Internal compute; HPC Cluster; Data Scheduling; Auto-scaling external environment; Cloud HPC File System (100 TB, Track Directories); Cloud Filer; Blob data (S3); Glacier (Archive)]
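The "auto-scaling external environment" in this design addresses the earlier complaint that a fixed local cluster is too small when you need it most and too large every other time. Below is a minimal sketch of one possible scale-to-demand rule. It is an assumption for illustration, not the policy Cycle Computing's software actually uses, and `desired_nodes`, `jobs_per_node`, `min_nodes` and `max_nodes` are hypothetical names.

```python
import math


def desired_nodes(queued_jobs: int, running_jobs: int, jobs_per_node: int,
                  min_nodes: int = 0, max_nodes: int = 1000) -> int:
    """Hypothetical scale-to-demand rule: request enough nodes to run every
    queued and running job at once, clamped to [min_nodes, max_nodes]."""
    if jobs_per_node <= 0:
        raise ValueError("jobs_per_node must be positive")
    # Round up so a partially filled node is still provisioned.
    needed = math.ceil((queued_jobs + running_jobs) / jobs_per_node)
    # Clamp: min_nodes keeps a warm floor, max_nodes acts as a budget cap.
    return max(min_nodes, min(max_nodes, needed))


if __name__ == "__main__":
    for queued in (0, 1_000, 100_000):
        print(queued, "queued ->", desired_nodes(queued, 0, jobs_per_node=8), "nodes")
```

With 8 jobs per node, an empty queue scales to zero nodes and 1,000 queued jobs request 125 nodes. A 100,000-job burst is clamped to the 1,000-node cap, which stands in for a budget limit.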
  24. 46.

    – Hardware is HARD!
    – Great software tools yield happiness
    – IT solving scientific problems vs. low-level ops
    – Replicating to AWS Glacier offers disaster recovery (DR) options…
  25. 47.

    – Take advantage of cloud storage scale (S3)
    – Capacity isn’t an issue
    – Large public data sets + secure, massive compute provide huge opportunities for new science
  26. 53.

    The Scientific Method on Utility HPC
    Ask a Question → Hypothesize → Predict → Experiment / Test → Analyze → Final Results
    Yields “better”, “faster” research for way less $$$
  27. 54.
  28. 56.

    2013 BigScience Challenge: $10,000 of free AWS and Cycle Computing-powered services to any science benefitting humanity. The 2012 winner was a 115-year genomic analysis.
    Enter at: cyclecomputing.com/big-science-challenge/enter