Slide 1
Slide 1 text
Data Intensive & High Performance Computing in the Cloud: Enabling Research. Matt Wood
Slide 2
Slide 2 text
Hello.
Slide 3
Slide 3 text
No content
Slide 4
Slide 4 text
No content
Slide 5
Slide 5 text
Thank you
Slide 6
Slide 6 text
Infrastructure building blocks
Slide 7
Slide 7 text
No content
Slide 8
Slide 8 text
Consumer business Seller business
Slide 9
Slide 9 text
Decades of experience Operations, management and scale
Slide 10
Slide 10 text
Programmatic access
Slide 11
Slide 11 text
Unexpected innovation
Slide 12
Slide 12 text
Blinding flash of the obvious
Slide 13
Slide 13 text
No content
Slide 14
Slide 14 text
6 years young Amazon S3 launched on March 14th, 2006
Slide 15
Slide 15 text
Objects in S3 [chart: billions of objects per quarter, Q4 2006 to Q1 2012, y-axis 0 to 1000]. 906B objects. 600k+ peak transactions per second.
Slide 16
Slide 16 text
99.999999999% durability
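To put eleven nines in concrete terms, here is a minimal arithmetic sketch. It assumes the figure is an annual, per-object durability and that losses are independent; neither assumption is stated on the slide, and the object counts are illustrative only.

```python
from fractions import Fraction

# Assumption: the slide's figure applies per object, per year.
DURABILITY = Fraction(99_999_999_999, 100_000_000_000)  # 99.999999999%


def expected_annual_losses(object_count: int) -> Fraction:
    """Expected number of objects lost in one year, assuming independent losses."""
    return object_count * (1 - DURABILITY)


def years_per_single_loss(object_count: int) -> Fraction:
    """Average number of years before one object is lost, under the same assumption."""
    return 1 / expected_annual_losses(object_count)


if __name__ == "__main__":
    for count in (10_000, 1_000_000):
        years = years_per_single_loss(count)
        print(f"{count:>9,} objects: about one loss every {int(years):,} years")
```

Exact fractions are used instead of floats because subtracting a number this close to 1 loses precision in floating point.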
Slide 17
Slide 17 text
Life sciences
Slide 18
Slide 18 text
A T C G
Slide 19
Slide 19 text
Storage Compute Databases Tools
Slide 20
Slide 20 text
Collection Computation Collaboration
Slide 21
Slide 21 text
Collection Computation Collaboration
Slide 22
Slide 22 text
Collection Computation Collaboration
Slide 23
Slide 23 text
Collection Computation Collaboration
Slide 24
Slide 24 text
Availability
Slide 25
Slide 25 text
Availability Programmable On-demand
Slide 26
Slide 26 text
Flexibility
Slide 27
Slide 27 text
Elasticity
Slide 28
Slide 28 text
Collection Computation Collaboration
Slide 29
Slide 29 text
Collection Computation Collaboration
Slide 30
Slide 30 text
No content
Slide 31
Slide 31 text
Data stays local
Slide 32
Slide 32 text
Availability zones Design for durability
Slide 33
Slide 33 text
Shared responsibility
Slide 34
Slide 34 text
Data movement
Slide 35
Slide 35 text
Data movement Upload with large object support
Slide 36
Slide 36 text
Data movement Upload with large object support Multi-part, parallel uploads
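The idea behind multi-part, parallel uploads can be sketched as follows: split a large object into numbered byte ranges, then send the ranges concurrently. This is an illustration only. `fake_upload_part` is a hypothetical stand-in for a real storage call, and the part size and worker count are arbitrary assumptions.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Tuple

Part = Tuple[int, int, int]  # (part_number, offset, length)


def plan_parts(total_size: int, part_size: int) -> List[Part]:
    """Split total_size bytes into numbered byte ranges of at most part_size bytes."""
    if total_size < 0 or part_size <= 0:
        raise ValueError("total_size must be >= 0 and part_size must be > 0")
    parts: List[Part] = []
    offset, number = 0, 1
    while offset < total_size:
        length = min(part_size, total_size - offset)
        parts.append((number, offset, length))
        offset += length
        number += 1
    return parts


def upload_in_parallel(
    data: bytes,
    part_size: int,
    upload_part: Callable[[int, bytes], str],
    workers: int = 4,
) -> List[Tuple[int, str]]:
    """Send every part through upload_part concurrently.

    Returns (part_number, receipt) pairs in part order, which is what a
    final "complete upload" step would typically need.
    """
    def send(part: Part) -> Tuple[int, str]:
        number, offset, length = part
        return number, upload_part(number, data[offset:offset + length])

    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() yields results in input order even if parts finish out of order.
        return list(pool.map(send, plan_parts(len(data), part_size)))


def fake_upload_part(part_number: int, chunk: bytes) -> str:
    """Hypothetical stand-in for a storage API call; returns a checksum as a receipt."""
    return hashlib.sha256(chunk).hexdigest()


if __name__ == "__main__":
    for number, receipt in upload_in_parallel(b"ACGT" * 1000, 1024, fake_upload_part):
        print(number, receipt[:12])
```

Because each part is independent, a failed part can be retried on its own rather than restarting the whole transfer.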
Slide 37
Slide 37 text
Data movement Upload with large object support Multi-part, parallel uploads Physical media
Slide 38
Slide 38 text
Data movement Upload with large object support Multi-part, parallel uploads Physical media Private network connection
Slide 39
Slide 39 text
AWS Direct Connect
Slide 40
Slide 40 text
Direct connection to AWS regions
Slide 41
Slide 41 text
Consistent network performance
Slide 42
Slide 42 text
Private connectivity
Slide 43
Slide 43 text
Elastic 1 Gbps and 10 Gbps
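A rough way to compare the two port speeds is to estimate the transfer time for a dataset of a given size. The sketch below assumes decimal units (1 TB = 10^12 bytes) and a link that sustains its full rate unless an `efficiency` factor is supplied; the 100 TB dataset is a made-up example, not a figure from the talk.

```python
def transfer_days(terabytes: float, gbps: float, efficiency: float = 1.0) -> float:
    """Days needed to move `terabytes` of data over a `gbps` link.

    efficiency is the assumed fraction of line rate actually achieved (0 < efficiency <= 1).
    """
    if terabytes < 0 or gbps <= 0 or not 0 < efficiency <= 1:
        raise ValueError("invalid size, rate, or efficiency")
    bits = terabytes * 1e12 * 8                  # decimal terabytes to bits
    seconds = bits / (gbps * 1e9 * efficiency)   # bits divided by bits per second
    return seconds / 86_400                      # seconds per day


if __name__ == "__main__":
    for rate in (1, 10):
        print(f"100 TB at {rate:>2} Gbps: {transfer_days(100, rate):.2f} days")
```

At these idealised rates, 100 TB takes a little over nine days at 1 Gbps and under one day at 10 Gbps, which is the kind of comparison that helps when choosing between a network transfer and shipping physical media.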
Slide 44
Slide 44 text
Reduced bandwidth costs ISP and lower Direct Connect pricing
Slide 45
Slide 45 text
Globus Online 3.8 PB moved (as of this morning!)
Slide 46
Slide 46 text
Aspera
Slide 47
Slide 47 text
Public Datasets
Slide 48
Slide 48 text
1000 Genomes Project aws.amazon.com/1000genomes
Slide 49
Slide 49 text
Collection Computation Collaboration
Slide 50
Slide 50 text
Scale
Slide 51
Slide 51 text
Scale How much can I get? What size will get me time most quickly?
Slide 52
Slide 52 text
Value How much do I need? What value does it have for me?
Slide 53
Slide 53 text
Economies of scale
Slide 54
Slide 54 text
19 price drops Committed to passing savings to customers
Slide 55
Slide 55 text
Utilisation
Slide 56
Slide 56 text
Achieving economies of scale 100% Time
Slide 57
Slide 57 text
Achieving economies of scale 100% Reserved capacity
Slide 58
Slide 58 text
Achieving economies of scale 100% Reserved capacity On-demand
Slide 59
Slide 59 text
Achieving economies of scale 100% Reserved capacity On-demand
Slide 60
Slide 60 text
Spot market Choose your own price for compute
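"Choose your own price" can be illustrated with a deliberately simplified model: capacity runs during any hour in which the market price is at or below your bid, and you are charged the market price for that hour. Both rules are assumptions made for this sketch, and the prices are invented.

```python
from typing import Iterable, List, Tuple


def simulate_spot(bid: float, hourly_prices: Iterable[float]) -> Tuple[List[bool], float]:
    """Return (hours_running, total_cost) for one instance under a fixed bid.

    Simplified model (an assumption): the instance runs in every hour where
    price <= bid and pays that hour's market price, not the bid.
    """
    running: List[bool] = []
    cost = 0.0
    for price in hourly_prices:
        active = price <= bid
        running.append(active)
        if active:
            cost += price
    return running, cost


if __name__ == "__main__":
    prices = [0.05, 0.08, 0.12, 0.07]  # hypothetical dollars per hour
    hours, cost = simulate_spot(bid=0.10, hourly_prices=prices)
    print(f"ran {sum(hours)} of {len(hours)} hours for ${cost:.2f}")
```

The interrupted hour in the example is why workloads that can checkpoint or be split into independent tasks are the natural fit for this kind of pricing.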
Slide 61
Slide 61 text
Scale out
Slide 62
Slide 62 text
30k cores On the spot market. $1279 per hour.
Slide 63
Slide 63 text
50k cores Schrodinger and Cycle Computing
Slide 64
Slide 64 text
51,132 cores Schrodinger and Cycle Computing 6742 instances. $4828 per hour
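The hourly figures on these slides are easier to compare per core. The sketch below uses only the numbers quoted in the slides; note that "30k cores" is a rounded figure, so its per-core cost is approximate.

```python
def cost_per_core_hour(dollars_per_hour: float, cores: int) -> float:
    """Hourly cluster cost divided evenly across its cores."""
    if cores <= 0:
        raise ValueError("cores must be positive")
    return dollars_per_hour / cores


def cores_per_instance(cores: int, instances: int) -> float:
    """Average number of cores per instance in the cluster."""
    if instances <= 0:
        raise ValueError("instances must be positive")
    return cores / instances


if __name__ == "__main__":
    # Figures as quoted on the slides.
    print(f"30k-core run:    ${cost_per_core_hour(1279, 30_000):.4f} per core-hour")
    print(f"51,132-core run: ${cost_per_core_hour(4828, 51_132):.4f} per core-hour")
    print(f"51,132-core run: {cores_per_instance(51_132, 6742):.1f} cores per instance on average")
```

Both runs come out at well under ten cents per core-hour.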
Slide 65
Slide 65 text
Elastic MapReduce Myrna. Crossbow.
Slide 66
Slide 66 text
Scale up
Slide 67
Slide 67 text
CC2 Tightly coupled workflows
Slide 68
Slide 68 text
240 TFLOPS 42nd fastest supercomputer
Slide 69
Slide 69 text
Scale cores
Slide 70
Slide 70 text
GPU on demand AMBER
Slide 71
Slide 71 text
Scale?
Slide 72
Slide 72 text
Getting stuff done
Slide 73
Slide 73 text
StarCluster
Slide 74
Slide 74 text
Cloud BioLinux Ready to roll with 1000 Genomes data
Slide 75
Slide 75 text
Collection Computation Collaboration
Slide 76
Slide 76 text
Galaxy
Slide 77
Slide 77 text
No content
Slide 78
Slide 78 text
synapse.sagebase.org Collaboration platform for clinical genomic datasets
Slide 79
Slide 79 text
No content
Slide 80
Slide 80 text
AWS for Education aws.amazon.com/education
Slide 81
Slide 81 text
Storage Compute Databases Tools
Slide 82
Slide 82 text
Materials Methods Hypotheses Results
Slide 83
Slide 83 text
Data Code Pipeline Infrastructure
Slide 84
Slide 84 text
Data Code Pipeline Infrastructure
Slide 85
Slide 85 text
Fully defined Data sources. Infrastructure stack. Metadata.
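One way to make an analysis "fully defined" is to record its data sources, code, pipeline, infrastructure stack and metadata in a single manifest and derive a fingerprint from it. The sketch below is an assumption about how that might look, not something shown in the talk; every field name and example value is hypothetical.

```python
import hashlib
import json
from dataclasses import asdict, dataclass, field
from typing import Dict, List


@dataclass(frozen=True)
class ExperimentManifest:
    """Hypothetical record of everything needed to rerun an analysis."""
    data_sources: List[str]          # where the input data lives
    code_revision: str               # e.g. a version-control commit id
    pipeline_steps: List[str]        # ordered commands or stage names
    infrastructure: Dict[str, str]   # machine image, instance type, and so on
    metadata: Dict[str, str] = field(default_factory=dict)

    def fingerprint(self) -> str:
        """Stable SHA-256 over a canonical JSON form of the manifest."""
        canonical = json.dumps(asdict(self), sort_keys=True, separators=(",", ":"))
        return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


if __name__ == "__main__":
    original = ExperimentManifest(
        data_sources=["s3://example-bucket/reads/"],   # hypothetical location
        code_revision="abc1234",                        # hypothetical commit id
        pipeline_steps=["align", "call-variants"],
        infrastructure={"image": "example-image", "instance_type": "example-type"},
        metadata={"author": "example"},
    )
    remix = ExperimentManifest(
        data_sources=original.data_sources,
        code_revision="def5678",                        # only the code changed
        pipeline_steps=original.pipeline_steps,
        infrastructure=original.infrastructure,
        metadata=original.metadata,
    )
    print(original.fingerprint()[:16], remix.fingerprint()[:16])
```

Two runs with the same fingerprint were defined identically, while a changed fingerprint signals that a result comes from a modified dataset, codebase, pipeline or stack.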
Slide 86
Slide 86 text
Data Code Pipeline Infrastructure Results
Slide 87
Slide 87 text
Data Code Pipeline Infrastructure Results Data Code Pipeline Infrastructure Results
Slide 88
Slide 88 text
Data Code Pipeline Infrastructure Results Data Code Pipeline Infrastructure New results
Slide 89
Slide 89 text
Data Code Pipeline Infrastructure Results v2 Data Code Pipeline Infrastructure New results
Slide 90
Slide 90 text
Reproduce. Remix. Reuse.
Slide 91
Slide 91 text
Enabled by programmable infrastructure
Slide 92
Slide 92 text
Enabling science
Slide 93
Slide 93 text
aws.amazon.com/genomics
Slide 94
Slide 94 text
Airbnb, CapitalIQ, Marketshare, Bioproximity, Schrodinger and MIT http://aws.amazon.com/big-data-and-hpc-event/boston/
Slide 95
Slide 95 text
Thank you!
Slide 96
Slide 96 text
Q & A
[email protected]
Slide 97
Slide 97 text
Introducing the panel...
Slide 98
Slide 98 text
Angel Pizarro - U. Pennsylvania Anushka Brownley - Complete Genomics Stephen Litster - Novartis