Right Place, Right Time, Right Science: Lessons Learned from Utility HPC/Big Data - James Cuff

Advancing Autism Discovery Workshop, April 22, 2013. James Cuff, CTO - Cycle Computing.

Transcript

  1. Right Place, Right Time,
    Right Science:
    Lessons Learned from
    Utility HPC / Big Data
    James Cuff, CTO
    @jamesdotcuff, @cyclecomputing


  2. For over sixteen years I have seen a
    steady evolution in research computing:
    1996 @ Oxford: 1 CPU @ 200MHz / 18GB
    2000 @ Sanger / EBI: 360 CPU @ 168GHz / 50TB
    2003-2006 @ Harvard / MIT: 200 CPU @ 400GHz / 250-600TB
    2012 @ Harvard: >25,000 CPU @ 32THz / 10.0PB


  3. Massive increases in compute
    Massive increases in storage
    Massive increases in scale
    Massive increases in throughput
    Massive increases in performance
    Researchers need it all, and they need it all now!
    Computational growth is not going away
    We need new systems, teams and methods to
    effectively support
    our scholarly and scientific research


  4. I don’t try to keep up with the
    big boys and girls…


  5. University of Manchester
    British Nuclear Fuels Limited
    Oxford University
    European Bioinformatics Institute
    Inpharmatica
    Wellcome Trust Sanger Institute
    Whitehead Genome Center
    Broad Institute of MIT and Harvard
    Harvard University
    Cycle Computing


  6. The Human Genome Project ca. 2000:
    – 360-node DEC Alpha DS10L 1U
    – 9 racks
    – 100KW power
    – 1,440 cat5 crimps…
    – 466MHz x 360 CPU = 168GHz


  7. THIS is not HPC…


  8. Neither is THIS…


  9. Or THIS … !!


  10. THIS is HPC!


  11. At Cycle, we believe
    Scientists and Researchers
    are shackled by a
    lack of access to compute


  12. History teaches how to
    collaborate and remove
    shackles surrounding our
    science


  13. From centralized to decentralized, collaborative to
    independent and right back again!
    The 60’s: Mainframes. Centers provide access to compute.
    The 70’s: VAX. The supercomputing famine, funding gap.
    The 80’s: The PC. Individual computing.
    The 90’s: Beowulf clusters. Computing is too big to fit
    under a desk; Linux explodes.
    The 00’s: Central clusters.
    The 10’s: Clouds/VMware; IaaS, SaaS, PaaS.
    Sharing across those decades: 100%, 60%, 0%, 40%, ???%
    Bandwidth: ~0 Mbit, ~1 Mbit, ~10 Mbit, ~1,000 Mbit,
    ~10,000 Mbit
    Bigger, better, but further and further away from the
    scientist’s lab.


  14. The Scientific Method:
    Ask a Question → Hypothesize → Predict →
    Experiment / Test → Analyze → Final Results
    “Test and Analyze” require the most time,
    compute, data and effort


  15. The Scientific Method:
    Ask a Question → Hypothesize → Predict →
    Experiment / Test → Analyze → Final Results
    Any improvements to this cycle yield
    multiplicative benefits


  16. If we democratize access to
    high performance compute,
    we WILL accelerate science


  17. Not neuroscience…


  18. Or a good excuse for MRI data…


  19. We make software tools to easily orchestrate complex
    workloads and data access across Utility HPC.
    NIMBUS Discovery: 12 years of compute in 3 hours;
    $20M of infrastructure for < $3,000.
    Big 10 Pharma: built a 10,600-server cluster ($44M) in
    2 hours; 40 years of compute in 11 hours for $4,372.
    Genomics Research Institute: 1 million hours, or 115 years,
    of compute in 1 week for $19,555.

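What “orchestrate” looks like in practice: rent a fleet, run the science, give the machines back. Here is a minimal sketch of that loop, not Cycle’s actual tooling, using the boto3 AWS SDK; the AMI ID, instance type, and fleet size are placeholders.

```python
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

# Rent a fleet of identical workers (placeholder AMI and instance type).
resp = ec2.run_instances(
    ImageId="ami-0123456789abcdef0",  # hypothetical worker image
    InstanceType="c5.large",
    MinCount=100,
    MaxCount=100,
)
ids = [i["InstanceId"] for i in resp["Instances"]]

# Wait for the fleet to boot, hand it to the job scheduler, then tear it
# down so the meter stops the moment the science is done.
ec2.get_waiter("instance_running").wait(InstanceIds=ids)
try:
    pass  # submit and drain the workload here
finally:
    ec2.terminate_instances(InstanceIds=ids)
```

The headline numbers on the surrounding slides are this pattern at scale: capacity exists, and is billed, only while jobs are running.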

  20. Utility HPC in the News
    WSJ, NYTimes, Wired, Bio-IT World, BusinessWeek


  21. We solve this challenge across many industries:
    SLED/PS, Insurance, Financial Services, Life Sciences,
    Manufacturing & Electronics, Energy, Media & Other


  22. Before, with a local cluster:
    too small when you need it most,
    too large every other time…


  23. Remember this from earlier?
    – 360-node DEC Alpha DS10L 1U
    – 9 racks
    – 100KW power
    – 1,440 cat5 crimps…
    – 466MHz x 360 CPU = 168GHz


  24. The world is a very different place now…


  25. Life Science Activities: Compute vs. Data
    [Chart plotting activities by compute vs. data/bandwidth
    demands: NGS, molecular modeling, PK/PD, CAD/CAM, GWAS,
    neuroscience, genomics, proteomics, biomarker/image
    analysis, sensor data import, and the speaker’s gag entry,
    “Creating Fake Charts, with Fake Data”.]


  26. When you’re not limited by
    fixed-size compute & data,
    what happens?


  27. #1: “Better” Science
    “Answer the question we want to ask,”
    not constrained by what fits on local
    compute power:
    all desired samples,
    all desired queries,
    easier collaboration


  28. #2: “Faster” Science
    Run our “better” science
    that would have taken
    months or years
    in hours or days


  29. A couple of use cases…


  30. Scientific Process for Molecular Modeling
    Before: trade off compute time vs. accuracy.
    Initial coarse screen → higher-quality analysis →
    best quality.
    Now: highest-quality analysis, including otherwise
    unexplored molecules → higher-quality analysis →
    best quality.
    Better analysis, fewer false negatives.
    Better results, faster.


  31. Computational Chemistry
    Novartis
    - Need
      - Enable push-button Utility Supercomputing for
        molecular modeling
    - Solution
      - 30,000-CPU run across US/EU Cloud (AWS)
      - 10 years of compute in 8 hours for $10,000
      - Found 3 compounds now in the wetlab!


  32. Lessons learned
    - $$$/science
    - Application-aware data management
    - Data security

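On the data security lesson: a first concrete step is refusing to land results in cloud storage unencrypted. A hedged boto3 sketch; the file, bucket, and key names are hypothetical.

```python
import boto3

s3 = boto3.client("s3")

# Upload one result file, encrypted at rest via S3 server-side encryption.
s3.upload_file(
    Filename="sample_001.bam",                     # hypothetical result file
    Bucket="pharma-results",                       # hypothetical bucket
    Key="runs/2013-04/sample_001.bam",
    ExtraArgs={"ServerSideEncryption": "AES256"},
)
```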

  33. Another big 10 pharma…
    Built a 10,600-server cluster
    ($44M) in 2 hours, running
    40 years of compute
    in 11 hours for $4,372


  34. Big 10 Pharma
    created a 10,600-instance
    cluster
    ($44M) in 2 hours,
    running
    40 years of compute
    in 11 hours for
    $4,372


  35. Lessons learned
    - Capacity is no longer an issue
    - Hardware = software
    - Testing (error handling, unit testing, etc.)
    - Cycle has spent >$1M on AWS over 5yrs

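“Hardware = software” means node failures stop being pages to a human and become ordinary exceptions handled in code. A minimal sketch of that habit using only the standard library; `submit_job` and `job` are hypothetical names.

```python
import time

def with_retries(op, attempts=5, base_delay=1.0):
    """Run op(), retrying transient failures with exponential backoff."""
    for attempt in range(attempts):
        try:
            return op()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of retries: surface the failure
            time.sleep(base_delay * 2 ** attempt)

# Usage (submit_job and job are hypothetical):
# with_retries(lambda: submit_job(job))
```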

  36. Gene Expression Analysis
    Morgridge Institute for Research
    - Need
      - Run a comparison of 78TB of stem cell RNA samples to
        build a unique gene expression database
      - Make it easier to replicate disease in petri dishes
        w/ induced stem cells
    - Solution
      - Enable a massive RNA-seq run using Bowtie that was
        impossible before

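The shape of that run is one aligner fanned out over many cores. Here is a toy, single-machine version of the same pattern: align a directory of FASTQ samples in parallel with Bowtie. It assumes `bowtie` is on the PATH, a prebuilt index named `hg19`, and a `reads/` directory; all three are placeholders, and the real run spread the same unit of work across thousands of cloud cores.

```python
import subprocess
from concurrent.futures import ProcessPoolExecutor
from pathlib import Path

def align(fastq: Path) -> Path:
    """Align one single-end FASTQ file to SAM (Bowtie 1 syntax)."""
    sam = fastq.with_suffix(".sam")
    subprocess.run(["bowtie", "-S", "hg19", str(fastq), str(sam)], check=True)
    return sam

if __name__ == "__main__":
    samples = sorted(Path("reads").glob("*.fastq"))  # placeholder layout
    with ProcessPoolExecutor() as pool:              # one process per core
        for sam in pool.map(align, samples):
            print("aligned:", sam)
```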

  37. 1 Million compute hours
    115 years of computing in
    1 week for $19,555


  38. What have we learned
    here?


  39. Servers are not
    house plants


  40. Servers are wheat!


  41. Protein Binding / GPU
    Large BioTech
    128-GPU cluster: 13 GPU-years of computing in 1.5 months
    for $150,000, vs. 5 months of CPU for $450,000.
    3x the science, ¼ the cost.
    [Diagram: the drug designer and local data sit behind the
    corporate firewall; scheduled data moves out to a secure
    HPC cluster in the external cloud, with an 8 TB file
    system and the 128-GPU cluster.]


  42. Genomic Analysis
    Research Lab
    [Diagram: HiSeq instruments and internal compute feed a
    100TB file system (track directories) and LIMS on the
    internal HPC side; data scheduling moves blob data to S3
    and a cloud filer, with Glacier as archive, backing an
    auto-scaling external cloud HPC cluster.]


  43. DataManager:
    a data-aware scheduler
    for science and HPC

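DataManager itself is Cycle’s product, but the core idea of “data aware” scheduling can be sketched independently: a job is not eligible to run until every input it declares has already been staged where the compute can see it. A minimal sketch; the bucket and key names are hypothetical.

```python
import boto3
from botocore.exceptions import ClientError

s3 = boto3.client("s3")

def staged(bucket: str, key: str) -> bool:
    """True if the input object has already landed in S3."""
    try:
        s3.head_object(Bucket=bucket, Key=key)
        return True
    except ClientError:
        return False

def runnable(job: dict) -> bool:
    """A job may run only when every declared input is staged."""
    # job = {"cmd": [...], "inputs": [("genomics-inputs", "lane1.fastq")]}
    return all(staged(b, k) for b, k in job["inputs"])

# The scheduler loop dispatches only runnable jobs, leaving the rest
# queued while transfers complete in the background.
```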

  44. So we can do reliable HPC and data
    movement in the cloud…


  45. - Hardware is HARD!
    - Great software tools yield happiness
    - IT solving scientific problems vs. low-level ops
    - Replicating to AWS Glacier offers DR options…

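The usual low-effort way to get that Glacier DR copy is a lifecycle rule that migrates aging objects automatically. A hedged boto3 sketch; the bucket name, prefix, and 90-day cutoff are assumptions.

```python
import boto3

s3 = boto3.client("s3")

# Transition objects under runs/ to Glacier once they are 90 days old.
s3.put_bucket_lifecycle_configuration(
    Bucket="genomics-archive",  # hypothetical bucket
    LifecycleConfiguration={
        "Rules": [{
            "ID": "archive-old-runs",
            "Status": "Enabled",
            "Filter": {"Prefix": "runs/"},
            "Transitions": [{"Days": 90, "StorageClass": "GLACIER"}],
        }]
    },
)
```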

  46. - Take advantage of Cloud storage scale (S3)
    - Capacity isn’t an issue
    - Large public data sets + secure, massive
      compute provide huge opportunities for new
      science


  47. Let us quickly recap


  48. This isn’t neuroscience…


  49. Servers are not
    house plants!


  50. Servers are wheat!


  51. At Cycle, we believe
    Scientists and Researchers
    are shackled by a
    lack of access to compute


  52. The Scientific Method on Utility HPC:
    Ask a Question → Hypothesize → Predict →
    Experiment / Test → Analyze → Final Results
    Yields “Better”, “Faster”
    research for way less $$$


  53. Oh, and one more thing…


  54. 2013 BigScience Challenge
    $10,000 of free AWS and Cycle Computing-powered
    services for any science benefiting humanity
    The 2012 winner was a 115-year genomic analysis
    Enter at:
    cyclecomputing.com/big-science-challenge/enter


  55. Thank You!
    Questions?
    blog.cyclecomputing.com
    www.cyclecomputing.com
    @cyclecomputing
    @jamesdotcuff
