Slide 1

Slide 1 text

Science without constraints
Deepak Singh, Principal Product Manager, Amazon EC2
@mndoci

Slide 2

Slide 2 text

Good morning!

Slide 3

Slide 3 text

No content

Slide 4

Slide 4 text

No content

Slide 5

Slide 5 text

No content

Slide 6

Slide 6 text

No content

Slide 7

Slide 7 text

Bioinformatics

Slide 8

Slide 8 text

collection

Slide 9

Slide 9 text

curation

Slide 10

Slide 10 text

analysis

Slide 11

Slide 11 text

So what?

Slide 12

Slide 12 text

Image: Yael Fitzpatrick (AAAS)

Slide 13

Slide 13 text

lots of data

Slide 14

Slide 14 text

lots of people

Slide 15

Slide 15 text

lots of places

Slide 16

Slide 16 text

constant change

Slide 17

Slide 17 text

How can we be more effective with data?

Slide 18

Slide 18 text

Versioning

Slide 19

Slide 19 text

Provenance

Slide 20

Slide 20 text

Filter

Slide 21

Slide 21 text

Aggregate
Image: Chris Heiler

Slide 22

Slide 22 text

Extend
Image: Bethan

Slide 23

Slide 23 text

Interfaces
Image: Sebastian Anthony

Slide 24

Slide 24 text

Share

Slide 25

Slide 25 text

Communicate
Image: Leo Reynolds

Slide 26

Slide 26 text

Hard Problem

Slide 27

Slide 27 text

Really Hard Problem

Slide 28

Slide 28 text

How do we scale?

Slide 29

Slide 29 text

How can we be more effective with data?

Slide 30

Slide 30 text

Wrong question

Slide 31

Slide 31 text

How can we be more effective with our people?

Slide 32

Slide 32 text

Remove constraints
Image: Pieter Musterd

Slide 33

Slide 33 text

Move data to the users?

Slide 34

Slide 34 text

Move data to the users? X

Slide 35

Slide 35 text

Move tools to the data

Slide 36

Slide 36 text

Give tools access to data

Slide 37

Slide 37 text

Give data access to tools

Slide 38

Slide 38 text

APIs

Slide 39

Slide 39 text

Small things, loosely coupled

Slide 40

Slide 40 text

No content

Slide 41

Slide 41 text

Amazon EC2

Slide 42

Slide 42 text

ec2-run-instances
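
The command on this slide launches instances from a shell; the same request can also be made from code. The following is a minimal sketch, assuming the boto3 library as the client (the slides do not name one). The helper names `build_run_request` and `launch`, the region default, and the placeholder AMI ID and key name are all hypothetical. Only the request is built here; `launch` is defined but never called, since it needs AWS credentials.

```python
def build_run_request(image_id, instance_type, count, key_name):
    """Build keyword arguments for boto3's EC2 run_instances call."""
    if count < 1:
        raise ValueError("count must be at least 1")
    return {
        "ImageId": image_id,
        "InstanceType": instance_type,
        # MinCount == MaxCount: EC2 launches nothing if it cannot start all of them.
        "MinCount": count,
        "MaxCount": count,
        "KeyName": key_name,
    }


def launch(request, region="us-east-1"):
    """Send the request to EC2. Needs boto3 and AWS credentials; not called here."""
    import boto3  # imported lazily so the sketch loads without boto3 installed

    ec2 = boto3.client("ec2", region_name=region)
    response = ec2.run_instances(**request)
    return [instance["InstanceId"] for instance in response["Instances"]]


# Placeholder values (assumptions): substitute a real AMI ID and key pair name.
one = build_run_request("ami-EXAMPLE", "m1.small", 1, "YOUR-KEY")
many = build_run_request("ami-EXAMPLE", "m1.small", 1000, "YOUR-KEY")
print(one["MaxCount"], many["MaxCount"])
```

Going from one instance to many changes only the count in the request, which is the sense in which the infrastructure is programmable.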

Slide 43

Slide 43 text

starcluster start

Slide 44

Slide 44 text

No content

Slide 45

Slide 45 text

Programmable

Slide 46

Slide 46 text

No content

Slide 47

Slide 47 text

No content

Slide 48

Slide 48 text

Elastic

Slide 49

Slide 49 text

Go from one instance...

Slide 50

Slide 50 text

To Thousands

Slide 51

Slide 51 text

Building Blocks

Slide 52

Slide 52 text

Instance Types

Slide 53

Slide 53 text

General Purpose (M1, M3)
Memory Optimized (M2, CR1)
Compute Optimized (C1, CC2)
Storage Optimized (HI1, HS1)
GPU Optimized (CG1)

Slide 54

Slide 54 text

240 Teraflops

Slide 55

Slide 55 text

120,000 IOPS

Slide 56

Slide 56 text

Amazon S3

Slide 57

Slide 57 text

> 2 Trillion Objects

Slide 58

Slide 58 text

No content

Slide 59

Slide 59 text

Secured Uplink Planning

Slide 60

Slide 60 text

Architecture diagram. Labels: JPL Data Center; Polyphony; Amazon SWF; Decider; Decision Tasks; File Transfer Tasks; File Transfer Workers; Data Processing Tasks; Data Processing Workers; Create EC2 Instances; Upload and Download File Chunks; EC2 (x4); S3
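
The diagram labels on this slide describe a decider that schedules file transfer tasks and data processing tasks for separate workers. The pattern can be sketched in a single process as below. This is an illustration only: it does not use the Amazon SWF API, the dictionary named `store` is a hypothetical stand-in for S3, and all function names are assumptions rather than anything taken from the slides.

```python
from collections import deque


def decider(file_names):
    """Decide which tasks to schedule: transfer each file, then process it."""
    tasks = deque()
    for name in file_names:
        tasks.append(("transfer", name))
        tasks.append(("process", name))
    return tasks


def file_transfer_worker(name, store):
    """Stand-in for uploading file chunks to shared storage."""
    store[name] = f"chunks-of-{name}"


def data_processing_worker(name, store):
    """Stand-in for a processing job that reads the uploaded data."""
    return f"processed:{store[name]}"


def run_workflow(file_names):
    """Run every scheduled task in order and collect the processing results."""
    store = {}  # hypothetical in-memory substitute for S3
    results = []
    tasks = decider(file_names)
    while tasks:
        kind, name = tasks.popleft()
        if kind == "transfer":
            file_transfer_worker(name, store)
        else:
            results.append(data_processing_worker(name, store))
    return results


print(run_workflow(["image_a", "image_b"]))
```

In a real deployment the task queue would be held by the workflow service and the workers would run on separate machines; the sketch keeps only the division of roles.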

Slide 61

Slide 61 text

SWF, EC2, S3, SimpleDB, CloudWatch, IAM, ELB
5 gigapixels in 5 minutes!

Slide 62

Slide 62 text

No content

Slide 63

Slide 63 text

Zero to Internet-Scale in One Week!

Slide 64

Slide 64 text

Mars Science Laboratory: Live Video Streaming Architecture
Architecture diagram. Labels: Telestream Wirecast; Availability Zone: us-east-1a; Availability Zone: us-west-1b; CloudFront streaming for museum partners. Per-stack labels (each shown twice): Adobe Flash Media Server; Elastic Load Balancer; Tier 1 Nginx Cache; Tier 2 Nginx Cache; CloudFormation Stack

Slide 65

Slide 65 text

No content

Slide 66

Slide 66 text

Reproduce. Reuse. Remix

Slide 67

Slide 67 text

No content

Slide 68

Slide 68 text

No content

Slide 69

Slide 69 text

[plugin ipcluster]
setup_class = ipcluster.IPCluster
enable_notebook = true
notebook_passwd = YOUR-PASS

[cluster qiime]
node_image_id = ami-2faa7346
keyname = YOUR-KEY
cluster_size = 4
node_instance_type = m2.4xlarge
plugins = ipcluster

$ starcluster start -c qiime myqiime

Source: Justin Riley
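
The configuration on this slide uses an INI-style layout: a plugin section and a cluster section that refers to the plugin by name. As a rough illustration of that structure, the sketch below reads the same settings with Python's standard `configparser` module. This is not a description of how StarCluster itself loads the file; the variable names are assumptions, and `YOUR-PASS` and `YOUR-KEY` remain placeholders as on the slide.

```python
import configparser

# The settings from the slide, one per line.
CONFIG_TEXT = """
[plugin ipcluster]
setup_class = ipcluster.IPCluster
enable_notebook = true
notebook_passwd = YOUR-PASS

[cluster qiime]
node_image_id = ami-2faa7346
keyname = YOUR-KEY
cluster_size = 4
node_instance_type = m2.4xlarge
plugins = ipcluster
"""

parser = configparser.ConfigParser()
parser.read_string(CONFIG_TEXT)

cluster = parser["cluster qiime"]
size = cluster.getint("cluster_size")

# The cluster names its plugin; look up the matching [plugin ...] section.
plugin_section = "plugin " + cluster["plugins"]
notebook_enabled = parser[plugin_section].getboolean("enable_notebook")

print(size, cluster["node_instance_type"])
print(notebook_enabled)
```

The whole cluster definition is a dozen lines of text, so it can be versioned and shared alongside the analysis it supports.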

Slide 70

Slide 70 text

No content

Slide 71

Slide 71 text

1000 Genomes

Slide 72

Slide 72 text

1000 Genomes Cloud BioLinux

Slide 73

Slide 73 text

No content

Slide 74

Slide 74 text

Your HiSeq data Illumina BaseSpace

Slide 75

Slide 75 text

Your data GenomeSpace

Slide 76

Slide 76 text

No content

Slide 77

Slide 77 text

No content

Slide 78

Slide 78 text

No content

Slide 79

Slide 79 text

No content

Slide 80

Slide 80 text

Reproduce. Reuse. Remix

Slide 81

Slide 81 text

Making people more effective
Image: Pieter Musterd

Slide 82

Slide 82 text

Thank You