From a talk at Bio-IT World Expo 2012, these slides introduce how the cloud is enabling modern research, from data collection, computation and collaboration, to scientific reproducibility and reuse.
Enabling Research in the Cloud: Data Intensive & High Performance Computing. Matt Wood
Hello.
Thank you
Infrastructure building blocks
Consumer business. Seller business.
Decades of experience: operations, management and scale
Programmatic access
Unexpected innovation
Blinding flash of the obvious
6 years young: Amazon S3 launched on March 14th, 2006
Objects in S3 (chart: billions of objects, Q4 2006 to Q1 2012): 906B objects, 600k+ peak transactions per second
99.999999999% durability
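One way to read the eleven nines, assuming the figure is an annual, per-object probability of survival and that losses are independent: the expected number of objects lost per year is the object count multiplied by 10^-11. A minimal sketch using exact decimal arithmetic (the object count is an arbitrary example):

```python
from decimal import Decimal

# Assumption: durability is the annual probability that any one object survives,
# and object losses are independent of one another.
DURABILITY = Decimal("0.99999999999")  # eleven nines


def expected_annual_loss(num_objects: int, durability: Decimal = DURABILITY) -> Decimal:
    """Expected number of objects lost in one year under the assumptions above."""
    return num_objects * (1 - durability)


if __name__ == "__main__":
    lost = expected_annual_loss(10_000_000)  # arbitrary example object count
    print(f"expected objects lost per year: {lost}")
    print(f"about one object every {int(1 / lost):,} years")
```

For ten million objects this works out to 0.0001 objects per year, or roughly one object every 10,000 years.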
Life sciences
A T C G
Storage Compute Databases Tools
Collection Computation Collaboration
Availability: programmable, on-demand
Flexibility
Elasticity
Data stays local
Availability zones: design for durability
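The reasoning behind spreading copies across availability zones can be sketched with a single probability, under a strong and hypothetical assumption that zones lose data independently: if one zone loses its copy with probability p in a year, all k copies are lost with probability p^k. This illustrates the design principle only; it is not how S3's durability figure is derived.

```python
def prob_all_copies_lost(p_zone_loss: float, zones: int) -> float:
    """Probability that every copy is lost, assuming zones fail independently.

    p_zone_loss is a hypothetical per-zone probability of losing its copy.
    """
    if not 0.0 <= p_zone_loss <= 1.0:
        raise ValueError("p_zone_loss must be a probability")
    if zones < 1:
        raise ValueError("need at least one zone")
    return p_zone_loss ** zones


if __name__ == "__main__":
    p = 0.01  # hypothetical: 1% chance that a single zone loses its copy in a year
    for k in (1, 2, 3):
        print(f"{k} zone(s): {prob_all_copies_lost(p, k):.0e}")
```

With a hypothetical p of 1%, three zones bring the chance of losing every copy to one in a million. Correlated failures would make the real number worse than this model suggests, which is why zones are built to fail independently.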
Shared responsibility
Data movement: upload with large object support, multi-part parallel uploads, physical media, private network connection
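A minimal sketch of multi-part, parallel upload: split the object into numbered byte ranges, then send the ranges concurrently. The 5 MiB minimum part size and 10,000-part ceiling are S3's multipart limits; the 8 MiB default part size is an arbitrary choice, and `upload_part` here is a hypothetical callable standing in for a real client call (boto3, for example, exposes `create_multipart_upload`, `upload_part` and `complete_multipart_upload` on its S3 client).

```python
import math
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Tuple

MIN_PART = 5 * 1024 * 1024  # S3 minimum part size (every part except the last)
MAX_PARTS = 10_000          # S3 maximum number of parts per upload


def plan_parts(total_size: int, part_size: int = 8 * 1024 * 1024) -> List[Tuple[int, int, int]]:
    """Split an object into (part_number, offset, length) ranges, numbered from 1."""
    if total_size <= 0:
        raise ValueError("total_size must be positive")
    part_size = max(part_size, MIN_PART)
    # Grow the part size if the requested one would need more than MAX_PARTS parts.
    part_size = max(part_size, math.ceil(total_size / MAX_PARTS))
    parts, offset, number = [], 0, 1
    while offset < total_size:
        length = min(part_size, total_size - offset)
        parts.append((number, offset, length))
        offset += length
        number += 1
    return parts


def upload_parallel(data: bytes, upload_part: Callable[[int, bytes], str],
                    workers: int = 4) -> List[Tuple[int, str]]:
    """Send parts concurrently; upload_part (hypothetical) returns an ETag per part."""
    parts = plan_parts(len(data))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [(n, pool.submit(upload_part, n, data[off:off + ln]))
                   for n, off, ln in parts]
        # Completing a multipart upload needs (part number, ETag) pairs in ascending order.
        return [(n, f.result()) for n, f in futures]


if __name__ == "__main__":
    fake_upload = lambda n, chunk: f"etag-{n}-{len(chunk)}"  # stand-in for a real client
    print(upload_parallel(b"x" * (12 * 1024 * 1024), fake_upload))
```

Parts can finish in any order; collecting results in part-number order keeps the list ready for the completion step.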
AWS Direct Connect
Direct connection to AWS regions
Consistent network performance
Private connectivity
Elastic: 1 Gbps and 10 Gbps
Reduced bandwidth costs: ISP and lower Direct Connect pricing
Globus Online: 3.8 PB moved (as of this morning!)
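Link speed sets a hard floor on how long a transfer takes, which is why both dedicated connections and physical media appear in the data movement story. A back-of-the-envelope sketch, assuming decimal units (1 TB = 10^12 bytes, 1 Gbps = 10^9 bits per second) and a hypothetical efficiency factor for protocol overhead:

```python
def transfer_hours(size_bytes: float, link_gbps: float, efficiency: float = 1.0) -> float:
    """Hours to move size_bytes over a link of link_gbps.

    efficiency is a hypothetical fraction of the line rate actually achieved.
    """
    if link_gbps <= 0 or not 0 < efficiency <= 1:
        raise ValueError("link speed must be positive and efficiency in (0, 1]")
    bits = size_bytes * 8
    seconds = bits / (link_gbps * 1e9 * efficiency)
    return seconds / 3600


if __name__ == "__main__":
    one_tb = 1e12  # decimal terabyte
    for gbps in (1, 10):
        print(f"1 TB at {gbps} Gbps: {transfer_hours(one_tb, gbps):.2f} h")
```

At full line rate, 1 TB takes about 2.2 hours over 1 Gbps and about 13 minutes over 10 Gbps; 100 TB over 1 Gbps takes roughly 9 days.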
Aspera
Public Datasets
1000 Genomes Project: aws.amazon.com/1000genomes
Scale: How much can I get? What size will get me time most quickly?
Value: How much do I need? What value does it have for me?
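The scale and value questions interact through billing granularity. EC2 at the time billed by the instance-hour, with partial hours rounded up, so for a perfectly parallel job, adding cores shortens the run at the same cost until the run drops below a whole billed hour. A sketch under those assumptions, simplified to per-core billing at a hypothetical price:

```python
import math


def run_cost(core_hours_of_work: float, cores: int,
             price_per_core_hour: float) -> tuple:
    """Return (wall-clock hours, cost) for a perfectly parallel job.

    Assumes linear speed-up and billing in whole hours per core (partial hours
    rounded up); price_per_core_hour is a hypothetical rate.
    """
    if cores <= 0:
        raise ValueError("cores must be positive")
    wall_hours = core_hours_of_work / cores
    billed_hours = math.ceil(wall_hours)
    return wall_hours, billed_hours * cores * price_per_core_hour


if __name__ == "__main__":
    for cores in (100, 1000, 1500):
        hours, cost = run_cost(1000, cores, 0.10)
        print(f"{cores} cores: {hours:.2f} h, ${cost:.2f}")
```

With 1,000 core-hours of work at a hypothetical $0.10 per core-hour, 100 cores finish in 10 hours and 1,000 cores in 1 hour, both for $100; 1,500 cores finish in about 40 minutes but are billed for a full hour each, $150.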
Economies of scale
19 price drops: committed to passing savings to customers
Utilisation
Achieving economies of scale (chart: utilisation up to 100% over time, met by reserved capacity and on-demand)
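The reserved-plus-on-demand picture can be sketched as a cost model: a fixed reserved fleet covers the baseline and on-demand capacity absorbs the peaks. Real reserved instances combined an upfront fee with a lower hourly rate; here that is collapsed into a hypothetical effective hourly rate, and the demand profile and prices are made up for illustration.

```python
from typing import Sequence


def mixed_cost(demand: Sequence[int], reserved: int,
               reserved_hourly: float, on_demand_hourly: float) -> float:
    """Cost of serving hourly demand with a fixed reserved fleet plus on-demand overflow.

    Simplified model: each reserved instance is paid for every hour, used or not,
    at an effective hourly rate; demand above the reserved fleet runs on demand.
    """
    reserved_cost = reserved * len(demand) * reserved_hourly
    overflow_hours = sum(max(0, d - reserved) for d in demand)
    return reserved_cost + overflow_hours * on_demand_hourly


def best_reserved(demand: Sequence[int], reserved_hourly: float,
                  on_demand_hourly: float) -> int:
    """Reserved fleet size that minimises total cost (brute force over 0..peak)."""
    return min(range(max(demand) + 1),
               key=lambda r: mixed_cost(demand, r, reserved_hourly, on_demand_hourly))


if __name__ == "__main__":
    # Hypothetical demand profile: steady baseline of 10 instances, short peak of 30.
    demand = [10, 10, 10, 10, 30, 30, 10, 10]
    r = best_reserved(demand, reserved_hourly=0.05, on_demand_hourly=0.10)
    print(r, mixed_cost(demand, r, 0.05, 0.10))
```

With these rates, reserving the baseline of 10 instances and running the peak on demand is cheapest. In this model an extra reserved instance pays off only when it is busy for more than `reserved_hourly / on_demand_hourly` of the hours, here half.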
Spot market: choose your own price for compute
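In the spot market as it worked at the time, you named a maximum price (a bid); the instance ran while the market spot price stayed at or below it, you were charged the spot price rather than your bid, and the instance was interrupted when the price rose above the bid. A simplified simulation of that model, ignoring details such as how partial hours were billed, with a hypothetical price series:

```python
from typing import Sequence, Tuple


def simulate_spot(prices: Sequence[float], bid: float) -> Tuple[int, float]:
    """Hours run and amount paid for one instance under a bid-based spot model.

    Simplified model: the instance runs while the hourly spot price is at or below
    the bid, is charged the spot price (not the bid) for each hour, and stops the
    first time the price rises above the bid.
    """
    hours, paid = 0, 0.0
    for price in prices:
        if price > bid:
            break  # outbid: the instance is interrupted
        hours += 1
        paid += price
    return hours, paid


if __name__ == "__main__":
    prices = [0.03, 0.04, 0.03, 0.12, 0.03]  # hypothetical hourly spot prices, $/hour
    for bid in (0.05, 0.15):
        hours, paid = simulate_spot(prices, bid)
        print(f"bid ${bid:.2f}: ran {hours} h, paid ${paid:.2f}")
```

A low bid runs for 3 hours and pays $0.10 before the price spike interrupts it; a higher bid survives the spike and runs all 5 hours for $0.25, still paying the market price each hour.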
Scale out
30k cores: on the spot market, $1,279 per hour.
50k cores: Schrodinger and Cycle Computing
51,132 cores: Schrodinger and Cycle Computing. 6,742 instances, $4,828 per hour.
Elastic MapReduce: Myrna, Crossbow
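Dividing the hourly prices quoted above by their core counts gives an average cost per core-hour. The runs likely used different instance types, so the figures are not directly comparable; this is only arithmetic on the slide numbers.

```python
def per_core_hour(dollars_per_hour: float, cores: int) -> float:
    """Average cost of one core for one hour."""
    if cores <= 0:
        raise ValueError("cores must be positive")
    return dollars_per_hour / cores


# Figures quoted on the slides.
print(f"30k-core spot run: ${per_core_hour(1279, 30_000):.4f} per core-hour")
print(f"51,132-core run:   ${per_core_hour(4828, 51_132):.4f} per core-hour")
print(f"cores per instance in the larger run: {51_132 / 6742:.1f} on average")
```

That comes to about $0.043 and $0.094 per core-hour respectively, with roughly 7.6 cores per instance in the larger run.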
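Elastic MapReduce runs Hadoop, and Myrna (RNA-seq differential expression) and Crossbow (alignment and SNP calling) are Hadoop-based pipelines. The programming model they build on can be sketched in a single process: a map step emits key/value pairs, a shuffle groups them by key, and a reduce step aggregates each group. Counting k-mers in reads is used here purely as an illustration; it is not what Myrna or Crossbow do.

```python
from collections import defaultdict
from itertools import chain
from typing import Dict, Iterable, Iterator, List, Tuple


def map_kmers(read: str, k: int) -> Iterator[Tuple[str, int]]:
    """Map step: emit (k-mer, 1) for every k-length window in a read."""
    for i in range(len(read) - k + 1):
        yield read[i:i + k], 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle step: group values by key, as the framework does between phases."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_counts(groups: Dict[str, List[int]]) -> Dict[str, int]:
    """Reduce step: sum the values collected for each key."""
    return {key: sum(values) for key, values in groups.items()}


def count_kmers(reads: Iterable[str], k: int) -> Dict[str, int]:
    mapped = chain.from_iterable(map_kmers(read, k) for read in reads)
    return reduce_counts(shuffle(mapped))


if __name__ == "__main__":
    print(count_kmers(["ATCGA", "TCGAT"], 3))
```

On a cluster, the map and reduce steps run on many machines in parallel and the framework performs the shuffle across the network; the logic per record stays this simple.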
Scale up
CC2: tightly coupled workflows
240 TFLOPS: 42nd fastest supercomputer
Scale cores
GPU on demand: AMBER
Scale?
Getting stuff done
StarCluster
Cloud BioLinux: ready to roll with 1000 Genomes data
Galaxy
synapse.sagebase.org: collaboration platform for clinical genomic datasets
AWS for Education: aws.amazon.com/education
Materials Methods Hypotheses Results
Data Code Pipeline Infrastructure
Fully defined: data sources, infrastructure stack, metadata.
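A fully defined experiment can be captured as one record of its data sources, code version, pipeline, infrastructure and metadata. A minimal sketch, with a hypothetical `ExperimentManifest` class and field names, that derives a stable fingerprint from a canonical JSON encoding so two runs can be checked for identical inputs:

```python
import hashlib
import json
from dataclasses import asdict, dataclass, field
from typing import Dict, Tuple


@dataclass(frozen=True)
class ExperimentManifest:
    """Hypothetical record of everything needed to re-run an analysis."""
    data_sources: Tuple[str, ...]  # e.g. dataset URLs or object keys
    code_version: str              # e.g. a version-control commit id
    pipeline: str                  # name and version of the workflow
    infrastructure: str            # e.g. machine image and instance type
    metadata: Dict[str, str] = field(default_factory=dict)

    def fingerprint(self) -> str:
        """SHA-256 over a canonical (sorted-key, compact) JSON encoding."""
        canonical = json.dumps(asdict(self), sort_keys=True, separators=(",", ":"))
        return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


if __name__ == "__main__":
    # All values below are hypothetical placeholders.
    m = ExperimentManifest(
        data_sources=("s3://example-bucket/reads.bam",),
        code_version="commit-a",
        pipeline="align-and-call v1",
        infrastructure="image-1 / cc2",
        metadata={"reference": "example-reference"},
    )
    print(m.fingerprint())
```

Any change to data, code, pipeline or infrastructure changes the fingerprint, so a published result can be tied to exactly the inputs that produced it.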
Data Code Pipeline Infrastructure: Results v2. Data Code Pipeline Infrastructure: New results.
Reproduce. Remix. Reuse.
Enabled by programmable infrastructure
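Reproduce, remix and reuse can be sketched as a result store keyed by a content hash of the run description: an identical run reuses the stored result, while swapping any one component (new data, say) produces and keeps a new result alongside the old. The `Run` fields, `ResultStore` class and stand-in executor are hypothetical.

```python
import hashlib
import json
from dataclasses import asdict, dataclass, replace
from typing import Callable, Dict


@dataclass(frozen=True)
class Run:
    """Hypothetical description of one analysis run."""
    data: str
    code: str
    pipeline: str
    infrastructure: str


def run_key(run: Run) -> str:
    """Content hash of the run description, used as a cache key."""
    payload = json.dumps(asdict(run), sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


class ResultStore:
    """Reuse results for identical runs; execute again only when a component changes."""

    def __init__(self, execute: Callable[[Run], str]) -> None:
        self._execute = execute
        self._results: Dict[str, str] = {}
        self.executions = 0

    def get(self, run: Run) -> str:
        key = run_key(run)
        if key not in self._results:
            self._results[key] = self._execute(run)
            self.executions += 1
        return self._results[key]


if __name__ == "__main__":
    store = ResultStore(lambda run: f"results from {run.data}")  # stand-in executor
    v1 = Run(data="reads-v1", code="commit-a", pipeline="align-and-call", infrastructure="image-1")
    print(store.get(v1))               # computed
    print(store.get(v1))               # reused from the store
    v2 = replace(v1, data="reads-v2")  # remix: swap in new data
    print(store.get(v2))               # computed, new results
    print(store.executions)
```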
Enabling science
aws.amazon.com/genomics
Airbnb, CapitalIQ, Marketshare, Bioproximity, Schrodinger and MIT: http://aws.amazon.com/big-data-and-hpc-event/boston/
Thank you!
Q & A
Introducing the panel...
Angel Pizarro - U. Pennsylvania
Anushka Brownley - Complete Genomics
Stephen Litster - Novartis