Slide 1

Slide 1 text

Enabling Research in the Cloud. Matt Wood. Data Intensive & High Performance Computing

Slide 2

Slide 2 text

Hello.

Slide 3

Slide 3 text

No content

Slide 4

Slide 4 text

No content

Slide 5

Slide 5 text

Thank you

Slide 6

Slide 6 text

Infrastructure building blocks

Slide 7

Slide 7 text

No content

Slide 8

Slide 8 text

Consumer business Seller business

Slide 9

Slide 9 text

Decades of experience Operations, management and scale

Slide 10

Slide 10 text

Programmatic access

Slide 11

Slide 11 text

Unexpected innovation

Slide 12

Slide 12 text

Blinding flash of the obvious

Slide 13

Slide 13 text

No content

Slide 14

Slide 14 text

6 years young Amazon S3 launched on March 14th, 2006

Slide 15

Slide 15 text

Objects in S3 [chart: billions of objects per quarter, Q4 2006 to Q1 2012, y-axis 0 to 1,000] 906B objects. 600k+ peak transactions per second

Slide 16

Slide 16 text

99.999999999% durability
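
To make the eleven nines concrete, here is a minimal arithmetic sketch. It assumes the figure is an annual, per-object durability (the slide does not state the period), and the collection size of 10,000 objects is a hypothetical chosen only for illustration.

```python
from fractions import Fraction

# Durability from the slide, read here as an annual per-object figure (assumption).
DURABILITY = Fraction(99_999_999_999, 100_000_000_000)


def expected_annual_losses(num_objects: int) -> Fraction:
    """Expected number of objects lost per year under the assumed durability."""
    return num_objects * (1 - DURABILITY)


# Hypothetical collection of 10,000 objects.
losses = expected_annual_losses(10_000)
years_per_loss = 1 / losses

print(f"Expected losses per year: {float(losses):.0e}")
print(f"Average years per lost object: {int(years_per_loss):,}")
```

Exact fractions are used instead of floats so that the tiny loss probability is not distorted by rounding.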

Slide 17

Slide 17 text

Life sciences

Slide 18

Slide 18 text

A T C G

Slide 19

Slide 19 text

Storage Compute Databases Tools

Slide 20

Slide 20 text

Collection Computation Collaboration

Slide 21

Slide 21 text

Collection Computation Collaboration

Slide 22

Slide 22 text

Collection Computation Collaboration

Slide 23

Slide 23 text

Collection Computation Collaboration

Slide 24

Slide 24 text

Availability

Slide 25

Slide 25 text

Availability Programmable On-demand

Slide 26

Slide 26 text

Flexibility

Slide 27

Slide 27 text

Elasticity

Slide 28

Slide 28 text

Collection Computation Collaboration

Slide 29

Slide 29 text

Collection Computation Collaboration

Slide 30

Slide 30 text

No content

Slide 31

Slide 31 text

Data stays local

Slide 32

Slide 32 text

Availability zones Design for durability

Slide 33

Slide 33 text

Shared responsibility

Slide 34

Slide 34 text

Data movement

Slide 35

Slide 35 text

Data movement Upload with large object support

Slide 36

Slide 36 text

Data movement Upload with large object support Multi-part, parallel uploads

Slide 37

Slide 37 text

Data movement Upload with large object support Multi-part, parallel uploads Physical media

Slide 38

Slide 38 text

Data movement Upload with large object support Multi-part, parallel uploads Physical media Private network connection
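
The multi-part, parallel upload idea can be sketched as follows. This is an illustration of the general technique only: `upload_part` is a hypothetical stand-in that counts bytes rather than calling any real storage API, and the part size and worker count are arbitrary assumptions.

```python
from concurrent.futures import ThreadPoolExecutor


def part_ranges(total_size: int, part_size: int) -> list[tuple[int, int, int]]:
    """Split [0, total_size) into (part_number, start, end) byte ranges."""
    if part_size <= 0:
        raise ValueError("part_size must be positive")
    return [
        (number, start, min(start + part_size, total_size))
        for number, start in enumerate(range(0, total_size, part_size), start=1)
    ]


def upload_part(data: bytes, part: tuple[int, int, int]) -> tuple[int, int]:
    """Hypothetical stand-in for a real upload call.

    Returns (part_number, bytes_sent) without touching the network.
    """
    number, start, end = part
    return number, len(data[start:end])


def parallel_upload(data: bytes, part_size: int, workers: int = 4) -> list[tuple[int, int]]:
    """Send all parts concurrently; return results ordered by part number."""
    parts = part_ranges(len(data), part_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(lambda p: upload_part(data, p), parts))
    return sorted(results)


# Hypothetical 25-byte object split into 10-byte parts.
print(parallel_upload(bytes(25), part_size=10))
```

Because each part is independent, a failed part can be retried on its own instead of restarting the whole transfer, which is the main appeal for large objects.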

Slide 39

Slide 39 text

AWS Direct Connect

Slide 40

Slide 40 text

Direct connection to AWS regions

Slide 41

Slide 41 text

Consistent network performance

Slide 42

Slide 42 text

Private connectivity

Slide 43

Slide 43 text

Elastic 1 Gbps and 10 Gbps

Slide 44

Slide 44 text

Reduced bandwidth costs ISP and lower Direct Connect pricing

Slide 45

Slide 45 text

Globus Online 3.8 PB moved (as of this morning!)

Slide 46

Slide 46 text

Aspera

Slide 47

Slide 47 text

Public Datasets

Slide 48

Slide 48 text

1000 Genomes Project aws.amazon.com/1000genomes

Slide 49

Slide 49 text

Collection Computation Collaboration

Slide 50

Slide 50 text

Scale

Slide 51

Slide 51 text

Scale How much can I get? What size will get me time most quickly?

Slide 52

Slide 52 text

Value How much do I need? What value does it have for me?

Slide 53

Slide 53 text

Economies of scale

Slide 54

Slide 54 text

19 price drops Committed to passing savings to customers

Slide 55

Slide 55 text

Utilisation

Slide 56

Slide 56 text

Achieving economies of scale 100% Time

Slide 57

Slide 57 text

Achieving economies of scale 100% Reserved capacity

Slide 58

Slide 58 text

Achieving economies of scale 100% Reserved capacity On-demand

Slide 59

Slide 59 text

Achieving economies of scale 100% Reserved capacity On-demand

Slide 60

Slide 60 text

Spot market Choose your own price for compute

Slide 61

Slide 61 text

Scale out

Slide 62

Slide 62 text

30k cores On the spot market. $1279 per hour.

Slide 63

Slide 63 text

50k cores Schrodinger and Cycle Computing

Slide 64

Slide 64 text

51,132 cores Schrodinger and Cycle Computing 6742 instances. $4828 per hour
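
The hourly prices quoted for the two spot-market runs can be turned into a rough cost per core-hour. The sketch below uses only the figures on the slides and treats "30k" as exactly 30,000 cores, which is an assumption. The two runs may have used different instance types, so the numbers are not directly comparable.

```python
# Figures quoted on the slides; "30k" is taken as exactly 30,000 (assumption).
runs = {
    "30k cores": {"cores": 30_000, "usd_per_hour": 1279},
    "51,132 cores": {"cores": 51_132, "usd_per_hour": 4828},
}


def cost_per_core_hour(cores: int, usd_per_hour: float) -> float:
    """Hourly cluster price divided by the number of cores."""
    return usd_per_hour / cores


for name, run in runs.items():
    print(f"{name}: ${cost_per_core_hour(**run):.4f} per core-hour")
```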

Slide 65

Slide 65 text

Elastic MapReduce Myrna. Crossbow.

Slide 66

Slide 66 text

Scale up

Slide 67

Slide 67 text

CC2 Tightly coupled workflows

Slide 68

Slide 68 text

240 TFLOPS 42nd fastest supercomputer

Slide 69

Slide 69 text

Scale cores

Slide 70

Slide 70 text

GPU on demand AMBER

Slide 71

Slide 71 text

Scale?

Slide 72

Slide 72 text

Getting stuff done

Slide 73

Slide 73 text

StarCluster

Slide 74

Slide 74 text

Cloud BioLinux Ready to roll with 1000 Genomes data

Slide 75

Slide 75 text

Collection Computation Collaboration

Slide 76

Slide 76 text

Galaxy

Slide 77

Slide 77 text

No content

Slide 78

Slide 78 text

synapse.sagebase.org Collaboration platform for clinical genomic datasets

Slide 79

Slide 79 text

No content

Slide 80

Slide 80 text

AWS for Education aws.amazon.com/education

Slide 81

Slide 81 text

Storage Compute Databases Tools

Slide 82

Slide 82 text

Materials Methods Hypotheses Results

Slide 83

Slide 83 text

Data Code Pipeline Infrastructure

Slide 84

Slide 84 text

Data Code Pipeline Infrastructure

Slide 85

Slide 85 text

Fully defined Data sources. Infrastructure stack. Metadata.
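
One way to picture a "fully defined" analysis is a manifest that records the data sources, infrastructure stack and metadata, and can be fingerprinted so that two runs are easy to compare. The sketch below is illustrative only: the class name, field names, bucket path and instance type are all hypothetical and do not come from the talk.

```python
import hashlib
import json
from dataclasses import asdict, dataclass, field


@dataclass(frozen=True)
class AnalysisManifest:
    """Hypothetical record of everything needed to rerun an analysis."""

    data_sources: tuple[str, ...]
    infrastructure: dict = field(default_factory=dict)
    metadata: dict = field(default_factory=dict)

    def fingerprint(self) -> str:
        """SHA-256 of a canonical JSON form; identical manifests give identical hashes."""
        canonical = json.dumps(asdict(self), sort_keys=True)
        return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


original = AnalysisManifest(
    data_sources=("s3://example-bucket/reads.fastq",),  # hypothetical path
    infrastructure={"instance_type": "example.large", "instance_count": 4},
    metadata={"pipeline_version": "1.0"},
)
rerun = AnalysisManifest(
    data_sources=("s3://example-bucket/reads.fastq",),
    infrastructure={"instance_count": 4, "instance_type": "example.large"},
    metadata={"pipeline_version": "1.0"},
)
remix = AnalysisManifest(
    data_sources=("s3://example-bucket/reads.fastq",),
    infrastructure={"instance_type": "example.large", "instance_count": 4},
    metadata={"pipeline_version": "2.0"},
)

print(original.fingerprint() == rerun.fingerprint())
print(original.fingerprint() == remix.fingerprint())
```

Sorting keys before hashing means the order in which settings were written down does not change the fingerprint, while any change to data, stack or metadata does.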

Slide 86

Slide 86 text

Data Code Pipeline Infrastructure Results

Slide 87

Slide 87 text

Data Code Pipeline Infrastructure Results Data Code Pipeline Infrastructure Results

Slide 88

Slide 88 text

Data Code Pipeline Infrastructure Results Data Code Pipeline Infrastructure New results

Slide 89

Slide 89 text

Data Code Pipeline Infrastructure Results v2 Data Code Pipeline Infrastructure New results

Slide 90

Slide 90 text

Reproduce. Remix. Reuse.

Slide 91

Slide 91 text

Enabled by programmable infrastructure

Slide 92

Slide 92 text

Enabling science

Slide 93

Slide 93 text

aws.amazon.com/genomics

Slide 94

Slide 94 text

Airbnb, CapitalIQ, Marketshare, Bioproximity, Schrodinger and MIT http://aws.amazon.com/big-data-and-hpc-event/boston/

Slide 95

Slide 95 text

Thank you!

Slide 96

Slide 96 text

No content

Slide 97

Slide 97 text

Introducing the panel...

Slide 98

Slide 98 text

Angel Pizarro - U. Pennsylvania Anushka Brownley - Complete Genomics Stephen Litster - Novartis