Enabling Research in the Cloud

Matt Wood
April 28, 2012

These slides, from a talk at Bio-IT World Expo 2012, introduce how the cloud is enabling modern research: from data collection, computation and collaboration to scientific reproducibility and reuse.

Transcript

  1. Enabling Research in the Cloud
    Matt Wood
    Data Intensive & High Performance Computing

  2. Hello.

  3. [No text on this slide]

  4. [No text on this slide]

  5. Thank you

  6. Infrastructure
    building blocks

  7. [No text on this slide]

  8. Consumer
    business
    Seller
    business

  9. Decades of experience
    Operations, management and scale

  10. Programmatic access

  11. Unexpected innovation

  12. Blinding flash of the
    obvious

  13. [No text on this slide]

  14. 6 years young
    Amazon S3 launched on March 14th, 2006

  15. Objects in S3
    [Chart: billions of objects per quarter, Q4 2006 to Q1 2012; y-axis from 0 to 1,000]
    906B
    600k+ peak transactions per second

  16. 99.999999999%
    durability
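
A back-of-the-envelope reading of the durability figure above, as a minimal sketch. It assumes the percentage is an annual, per-object survival probability and that losses are independent; the function names are illustrative and not part of any AWS API.

```python
from fractions import Fraction

# Durability figure from the slide (eleven nines), read here as an annual
# per-object probability of survival. That reading is an assumption.
DURABILITY = Fraction(99_999_999_999, 100_000_000_000)


def expected_annual_losses(num_objects: int) -> Fraction:
    """Expected number of objects lost per year at the stated durability."""
    return num_objects * (1 - DURABILITY)


def years_per_expected_loss(num_objects: int) -> Fraction:
    """Average number of years between single-object losses."""
    return 1 / expected_annual_losses(num_objects)


if __name__ == "__main__":
    # Exact rational arithmetic avoids floating-point error in 1 - 0.99999999999.
    print(years_per_expected_loss(10_000))  # 10000000
```

Under these assumptions, storing 10,000 objects corresponds to an expected loss of one object roughly every 10,000,000 years.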

  17. Life sciences

  18. A T C G

  19. Storage Compute Databases Tools

  20. Collection Computation Collaboration

  21. Collection Computation Collaboration

  22. Collection Computation Collaboration

  23. Collection Computation Collaboration

  24. Availability

  25. Availability
    Programmable
    On-demand

  26. Flexibility

  27. Elasticity

  28. Collection Computation Collaboration

  29. Collection Computation Collaboration

  30. [No text on this slide]

  31. Data stays local

  32. Availability zones
    Design for durability

  33. Shared responsibility

  34. Data movement

  35. Data movement
    Upload with large
    object support

  36. Data movement
    Upload with large
    object support
    Multi-part,
    parallel uploads

  37. Data movement
    Upload with large
    object support
    Multi-part,
    parallel uploads
    Physical media

  38. Data movement
    Upload with large
    object support
    Multi-part,
    parallel uploads
    Physical media
    Private network
    connection
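
The multi-part, parallel upload idea above can be sketched as follows. This is a minimal illustration and not the S3 API: `upload_part` is a hypothetical stand-in that only checksums each chunk, and the part size and worker count are arbitrary choices.

```python
from concurrent.futures import ThreadPoolExecutor
import hashlib


def plan_parts(total_size: int, part_size: int) -> list[tuple[int, int, int]]:
    """Split total_size bytes into (part_number, offset, length) tuples."""
    if total_size <= 0 or part_size <= 0:
        raise ValueError("sizes must be positive")
    parts = []
    offset = 0
    number = 1
    while offset < total_size:
        # The final part may be shorter than part_size.
        length = min(part_size, total_size - offset)
        parts.append((number, offset, length))
        offset += length
        number += 1
    return parts


def upload_part(data: bytes, part: tuple[int, int, int]) -> tuple[int, str]:
    """Hypothetical stand-in for a network call: returns (part_number, checksum)."""
    number, offset, length = part
    chunk = data[offset:offset + length]
    return number, hashlib.sha256(chunk).hexdigest()


def parallel_upload(data: bytes, part_size: int, workers: int = 4) -> list[tuple[int, str]]:
    """Send all parts concurrently and return receipts ordered by part number."""
    parts = plan_parts(len(data), part_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        receipts = list(pool.map(lambda p: upload_part(data, p), parts))
    return sorted(receipts)


if __name__ == "__main__":
    payload = bytes(range(256)) * 10  # 2,560 bytes of sample data
    for number, checksum in parallel_upload(payload, part_size=1000):
        print(number, checksum[:12])
```

Because each part is independent, a failed part can be retried on its own rather than restarting the whole transfer, which is the main appeal of this approach for large objects.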

  39. AWS Direct Connect

  40. Direct connection
    to AWS regions

  41. Consistent network
    performance

  42. Private connectivity

  43. Elastic
    1 Gbps and 10 Gbps

  44. Reduced bandwidth
    costs
    ISP and lower Direct Connect pricing

  45. Globus Online
    3.8 PB moved (as of this morning!)

  46. Aspera

  47. Public Datasets

  48. 1000 Genomes
    Project
    aws.amazon.com/1000genomes

  49. Collection Computation Collaboration

  50. Scale

  51. Scale
    How much can I get?
    What job size will get me compute time most quickly?

  52. Value
    How much do I need?
    What value does it have for me?

  53. Economies of scale

  54. 19 price drops
    Committed to passing savings to customers

  55. Utilisation

  56. Achieving economies of scale
    100%
    Time

  57. Achieving economies of scale
    100%
    Reserved capacity

  58. Achieving economies of scale
    100%
    Reserved capacity
    On-demand

  59. Achieving economies of scale
    100%
    Reserved capacity
    On-demand
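
One way to read the reserved versus on-demand slides above is as a baseline-plus-peaks cost model. The sketch below is an interpretation, not the speaker's method: the demand profile and both hourly rates are hypothetical values chosen only to make the arithmetic easy to follow.

```python
def blended_cost(hourly_demand, reserved, reserved_rate, on_demand_rate):
    """Total cost when `reserved` instances cover the baseline and
    on-demand instances cover any demand above it."""
    # Reserved capacity is paid for every hour, whether or not it is used.
    reserved_cost = reserved * len(hourly_demand) * reserved_rate
    # On-demand capacity is paid for only in the hours demand exceeds the baseline.
    overflow_hours = sum(max(0, demand - reserved) for demand in hourly_demand)
    return reserved_cost + overflow_hours * on_demand_rate


def cheapest_reservation(hourly_demand, reserved_rate, on_demand_rate):
    """Reservation level (0..peak demand) with the lowest blended cost."""
    candidates = range(max(hourly_demand) + 1)
    return min(
        candidates,
        key=lambda r: blended_cost(hourly_demand, r, reserved_rate, on_demand_rate),
    )


if __name__ == "__main__":
    demand = [2, 2, 10, 2]  # hypothetical instances needed in each hour
    best = cheapest_reservation(demand, reserved_rate=0.5, on_demand_rate=1.0)
    print(best, blended_cost(demand, best, 0.5, 1.0))
```

With this hypothetical profile, reserving for the steady baseline and bursting on demand for the single peak hour costs less than either reserving for the peak or running everything on demand.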

  60. Spot market
    Choose your own price for compute

  61. Scale out

  62. 30k cores
    On the spot market. $1279 per hour.

  63. 50k cores
    Schrodinger and Cycle Computing

  64. 51,132 cores
    Schrodinger and Cycle Computing
    6742 instances. $4828 per hour
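
The cluster figures quoted on these slides can be turned into unit costs with simple division. The sketch below uses only the numbers shown above, and treats "30k cores" as exactly 30,000, which is an assumption.

```python
def cost_per_core_hour(cores: int, dollars_per_hour: float) -> float:
    """Hourly cluster price divided evenly across all cores."""
    return dollars_per_hour / cores


def cores_per_instance(cores: int, instances: int) -> float:
    """Average number of cores per instance in the cluster."""
    return cores / instances


# (cores, dollars per hour) as quoted on the slides.
RUNS = {
    "30k cores": (30_000, 1279.0),
    "51,132 cores": (51_132, 4828.0),
}

if __name__ == "__main__":
    for name, (cores, price) in RUNS.items():
        print(f"{name}: ${cost_per_core_hour(cores, price):.4f} per core-hour")
    print(f"average cores per instance: {cores_per_instance(51_132, 6742):.2f}")
```

On these figures the two runs work out to roughly $0.04 and $0.09 per core-hour respectively, and the larger cluster averages about 7.6 cores per instance.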

  65. Elastic MapReduce
    Myrna. Crossbow.

  66. Scale up

  67. CC2
    Tightly coupled workflows

  68. 240 TFLOPS
    42nd fastest supercomputer

  69. Scale cores

  70. GPU on demand
    AMBER

  71. Scale?

  72. Getting stuff done

  73. StarCluster

  74. Cloud BioLinux
    Ready to roll with 1000 Genomes data

  75. Collection Computation Collaboration

  76. Galaxy

  77. [No text on this slide]

  78. synapse.sagebase.org
    Collaboration platform for clinical genomic datasets

  79. [No text on this slide]

  80. AWS for Education
    aws.amazon.com/education

  81. Storage Compute Databases Tools

  82. Materials Methods Hypotheses Results

  83. Data Code Pipeline Infrastructure

  84. Data Code Pipeline Infrastructure

  85. Fully defined
    Data sources. Infrastructure stack. Metadata.

  86. Data Code Pipeline Infrastructure
    Results

  87. Data Code Pipeline Infrastructure
    Results
    Data Code Pipeline Infrastructure
    Results

  88. Data Code Pipeline Infrastructure
    Results
    Data Code Pipeline Infrastructure
    New results

  89. Data Code Pipeline Infrastructure
    Results
    v2
    Data Code Pipeline Infrastructure
    New results
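
The reproduce-and-remix sequence on these slides can be sketched as a fully defined experiment record. This is an illustration only: the `Experiment` class, its field values and the choice of which component becomes "v2" are all hypothetical.

```python
from dataclasses import asdict, dataclass, replace
import hashlib
import json


@dataclass(frozen=True)
class Experiment:
    """The four components named on the slides, each pinned to a version."""
    data: str
    code: str
    pipeline: str
    infrastructure: str

    def fingerprint(self) -> str:
        """Stable hash of the full definition; equal inputs give equal hashes."""
        payload = json.dumps(asdict(self), sort_keys=True).encode("utf-8")
        return hashlib.sha256(payload).hexdigest()


# Hypothetical version labels, for illustration only.
original = Experiment(
    data="dataset-v1",
    code="analysis-v1",
    pipeline="pipeline-v1",
    infrastructure="stack-v1",
)
rerun = replace(original)                      # reproduce: same definition
remix = replace(original, code="analysis-v2")  # remix: one component changed

if __name__ == "__main__":
    print(original.fingerprint() == rerun.fingerprint())  # True
    print(original.fingerprint() == remix.fingerprint())  # False
```

An identical definition yields an identical fingerprint, so matching results are expected; changing any one component yields a new fingerprint, flagging that new results are expected.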

  90. Reproduce.
    Remix. Reuse.

  91. Enabled by
    programmable
    infrastructure

  92. Enabling science

  93. aws.amazon.com/genomics

  94. Airbnb, CapitalIQ, Marketshare,
    Bioproximity, Schrodinger and MIT
    http://aws.amazon.com/big-data-and-hpc-event/boston/

  95. Thank you!

  96. [No text on this slide]

  97. Introducing the panel...

  98. Angel Pizarro - U. Pennsylvania
    Anushka Brownley - Complete Genomics
    Stephen Litster - Novartis
