Slide 1

Slide 1 text

HPC with Amazon EC2 Deepak Singh @mndoci Principal Product Manager

Slide 2

Slide 2 text

Amazon Web Services

Slide 3

Slide 3 text

4

Slide 4

Slide 4 text

2

Slide 5

Slide 5 text

1. Infrastructure

Slide 6

Slide 6 text

No content

Slide 7

Slide 7 text

ec2-run-instances

Slide 8

Slide 8 text

No content

Slide 9

Slide 9 text

secure, global, on demand

Slide 10

Slide 10 text

programmable

Slide 11

Slide 11 text

No content

Slide 12

Slide 12 text

No content

Slide 13

Slide 13 text

No content

Slide 14

Slide 14 text

elastic

Slide 15

Slide 15 text

No content

Slide 16

Slide 16 text

instance types

Slide 17

Slide 17 text

standard (m1), high memory (m2), high CPU (c1), t1.micro

Slide 18

Slide 18 text

high performance

Slide 19

Slide 19 text

“Our 40-instance (m2.2xlarge) cluster can scan, filter, and aggregate 1 billion rows in 950 milliseconds.” Mike Driscoll - Metamarkets

Slide 20

Slide 20 text

cluster computing

Slide 21

Slide 21 text

MPI

Slide 22

Slide 22 text

bandwidth intensive

Slide 23

Slide 23 text

Cluster Compute Instance

Slide 24

Slide 24 text

cc1.4xlarge: 2 x Intel Xeon 5570, 8 cores w/HT, 23 GB RAM, 1.7 TB disk, HVM

Slide 25

Slide 25 text

10 GigE

Slide 26

Slide 26 text

Placement Group

Slide 27

Slide 27 text

Placement group full-bisection
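As a rough sketch of how a placement group might be used with the EC2 API tools: create the group with the cluster strategy, then launch Cluster Compute instances into it in a single request. The group name, AMI ID, key pair, and instance count below are hypothetical placeholders, not values from the talk. The commands are only printed, not executed, so the sketch runs without AWS credentials. The flags are written as I understand the EC2 API tools to accept them and should be checked against the tools' reference before use.

```shell
#!/bin/sh
# Dry-run sketch: print the commands that would create a placement group
# and launch cc1.4xlarge instances into it. Nothing is sent to AWS.
# All names below are hypothetical placeholders, not values from the talk.
GROUP="hpc-demo-pg"      # hypothetical placement group name
AMI="ami-XXXXXXXX"       # placeholder for an HVM AMI ID
KEYPAIR="my-keypair"     # hypothetical key pair name
COUNT=8                  # hypothetical number of instances

placement_group_commands() {
  # The "cluster" strategy asks EC2 to place instances close together
  # on the network.
  echo "ec2-create-placement-group $GROUP -s cluster"
  # Launch all instances in one request, targeting the group just created.
  echo "ec2-run-instances $AMI -n $COUNT -t cc1.4xlarge -k $KEYPAIR --placement-group $GROUP"
}

placement_group_commands
```

To run the commands for real, the printed lines could be piped to `sh` once the placeholders are replaced with actual values.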

Slide 28

Slide 28 text

linpack

Slide 29

Slide 29 text

Cores: 7040, Rmax: 41.82 TFlop/s, Rpeak: 82.51 TFlop/s

Slide 30

Slide 30 text

Top500 rank: 231 (November 2010)

Slide 31

Slide 31 text

Top500 rank: 451 (June 2011)

Slide 32

Slide 32 text

WIEN2K Parallel Performance
H size 56,000 (25GB)
Runtime (16x8 processors):
  Local (Infiniband): 3h:48
  Cloud (10Gbps): 1h:30 ($40)
1200 atom unit cell; SCALAPACK+MPI diagonalization, matrix size 50k-100k
Credit: K. Jorissen, F. D. Villa, and J. J. Rehr (U. Washington)

Slide 33

Slide 33 text

New Cluster Compute Instance

Slide 34

Slide 34 text

cc2.8xlarge: 2 x Intel Xeon, 16 cores w/HT, 60.5 GB RAM, 3.4 TB disk, HVM

Slide 35

Slide 35 text

linpack

Slide 36

Slide 36 text

Cores: 17024, Rmax: 240.09 TFlop/s, Rpeak: 354.12 TFlop/s
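One way to read the two Linpack results is as an efficiency figure, the ratio of measured Rmax to theoretical Rpeak. The sketch below works that out from the numbers on the two Linpack slides. The function name is illustrative. Because the ratio is unitless, it does not depend on which units the slides use.

```shell
#!/bin/sh
# Linpack efficiency = Rmax / Rpeak, expressed as a percentage.
# Arguments: measured Rmax and theoretical Rpeak, in the same units.
linpack_efficiency() {
  LC_ALL=C awk -v rmax="$1" -v rpeak="$2" \
    'BEGIN { printf "%.1f\n", 100 * rmax / rpeak }'
}

# Figures taken from the two Linpack slides above.
echo "7040-core run:  $(linpack_efficiency 41.82 82.51)%"    # 50.7%
echo "17024-core run: $(linpack_efficiency 240.09 354.12)%"  # 67.8%
```

On these figures the larger run gets closer to its theoretical peak, about 68% against about 51%.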

Slide 37

Slide 37 text

Top500 rank: 42 (November 2011)

Slide 38

Slide 38 text

optimizing costs

Slide 39

Slide 39 text

on-demand

Slide 40

Slide 40 text

reserved

Slide 41

Slide 41 text

spot

Slide 42

Slide 42 text

No content

Slide 43

Slide 43 text

No content

Slide 44

Slide 44 text

No content

Slide 45

Slide 45 text

30,472 cores

Slide 46

Slide 46 text

$1279/hr
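Assuming the two figures above describe the same cluster run (the slides do not say so explicitly), the hourly price can be turned into a cost per core-hour. The function name below is illustrative.

```shell
#!/bin/sh
# Cost per core-hour = hourly cluster price / number of cores.
cost_per_core_hour() {
  LC_ALL=C awk -v price="$1" -v cores="$2" \
    'BEGIN { printf "%.3f\n", price / cores }'
}

# Figures from the two slides above: $1279/hr for 30,472 cores.
echo "USD per core-hour: $(cost_per_core_hour 1279 30472)"   # 0.042
```

That works out to roughly four cents per core per hour.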

Slide 47

Slide 47 text

2. Orchestration

Slide 48

Slide 48 text

No content

Slide 49

Slide 49 text

AWS CloudFormation

Slide 50

Slide 50 text

bootstrap

Slide 51

Slide 51 text

cloud-init

Slide 52

Slide 52 text

#cloud-config
packages:
 - httpd
runcmd:
 - /etc/init.d/httpd start
 - echo "hello, world" > /var/www/html/index.html

Slide 53

Slide 53 text

#!/bin/sh
ec2-run-instances ami-8c1fece5 \
  -n 1 \
  -t m1.small \
  -g deesinghdemo-SG \
  -k deesinghdemo-keypair \
  --user-data-file ./cloudconfig.txt
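Before launching, it can help to write the user-data file and sanity-check it locally. The sketch below writes the cloud-config from the previous slide to `./cloudconfig.txt`, then checks for the `#cloud-config` header and the 16 KB EC2 user-data limit. The helper name `check_user_data` is hypothetical. The service path is written as `/etc/init.d/httpd`, on the assumption that this is what the slide intended.

```shell
#!/bin/sh
# Write the cloud-config to a file, then sanity-check it before it is
# passed to ec2-run-instances via --user-data-file.
USER_DATA="./cloudconfig.txt"

cat > "$USER_DATA" <<'EOF'
#cloud-config
packages:
 - httpd
runcmd:
 - /etc/init.d/httpd start
 - echo "hello, world" > /var/www/html/index.html
EOF

check_user_data() {   # hypothetical helper, not part of any AWS tool
  file="$1"
  # cloud-init treats the file as cloud-config only if this is line 1.
  if [ "$(head -n 1 "$file")" != "#cloud-config" ]; then
    echo "missing #cloud-config header" >&2
    return 1
  fi
  # EC2 limits user data to 16 KB (16384 bytes).
  size=$(wc -c < "$file" | tr -d ' ')
  if [ "$size" -gt 16384 ]; then
    echo "user data too large: $size bytes" >&2
    return 1
  fi
  echo "ok: $file ($size bytes)"
}

check_user_data "$USER_DATA"
```

The check is deliberately shallow. It does not validate the YAML itself, only the two conditions that tend to fail silently at launch time.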

Slide 54

Slide 54 text

chef/puppet

Slide 55

Slide 55 text

familiar tools

Slide 56

Slide 56 text

LSF

Slide 57

Slide 57 text

Grid Engine

Slide 58

Slide 58 text

Bright Cluster Manager

Slide 59

Slide 59 text

combining worlds

Slide 60

Slide 60 text

MIT StarCluster

Slide 61

Slide 61 text

$ starcluster start mycluster
$ starcluster listclusters

Slide 62

Slide 62 text

http://www.bioteam.net/2011/03/dude-you-got-some-chef-in-my-starcluster/

Slide 63

Slide 63 text

No content

Slide 64

Slide 64 text

Provisions Cluster
Shared Storage
Monitoring
Bootstraps StarCluster
Includes 200 GB Public Dataset
Provisioned Stack = Submit jobs to Grid Engine

Slide 65

Slide 65 text

No content

Slide 66

Slide 66 text

No content

Slide 67

Slide 67 text

No content

Slide 68

Slide 68 text

Image: Chris Dagdigian