“AWS enables Pfizer to explore difficult or deep scientific questions in a timely, scalable manner and helps us make better decisions more quickly”
Michael Miller, Pfizer
Slide 33
Slide 33 text
THE STORY OF ANALYTICS
2
Slide 34
Slide 34 text
EC2
Utility computing.
6 years young.
Slide 35
Slide 35 text
Embarrassingly parallel problems.
Scale out systems
Queue based distribution.
Small, medium and high scale.
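A minimal sketch of queue-based distribution for an embarrassingly parallel job, using local threads in place of EC2 instances. The `square` task and the worker count are hypothetical stand-ins. In a real deployment the queue would be a network service and each worker a separate machine, but the pattern is the same: independent tasks go into a queue, and any number of workers pull from it until it is empty.

```python
import queue
import threading


def square(n):
    """Hypothetical stand-in for one independent unit of work."""
    return n * n


def run_workers(tasks, num_workers=4):
    """Distribute independent tasks to workers through a shared queue."""
    work = queue.Queue()
    for task in tasks:
        work.put(task)

    results = {}
    lock = threading.Lock()

    def worker():
        while True:
            try:
                item = work.get_nowait()
            except queue.Empty:
                return  # queue drained: this worker is done
            value = square(item)
            with lock:  # protect the shared results dict
                results[item] = value

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results
```

Because the tasks share no state, changing `num_workers` changes only how fast the queue drains, not the result. That is what makes scaling out from small to high scale straightforward.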
Slide 39
Slide 39 text
EC2
Utility computing.
6 years young.
Cost optimization.
Slide 40
Slide 40 text
Achieving economies of scale
[Chart: axes 100% and Time]
Slide 41
Slide 41 text
Achieving economies of scale
[Chart: axes 100% and Time; label: Reserved capacity]
Slide 42
Slide 42 text
Achieving economies of scale
[Chart: axes 100% and Time; labels: Reserved capacity, On-demand]
Slide 43
Slide 43 text
Achieving economies of scale
[Chart: axes 100% and Time; labels: Reserved capacity, On-demand, Unused capacity]
Slide 44
Slide 44 text
Bid on unused EC2 capacity.
Spot Instances
Very large discount.
Perfect for batch runs.
Balance cost and scale.
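A rough sketch of the cost-balancing idea: cover steady load with reserved capacity, overflow with on-demand, and interruptible batch work with Spot. All prices and the demand profile below are hypothetical, chosen only to make the arithmetic easy to follow. The model also treats the Spot price as fixed and capacity as always available, which real Spot bidding does not guarantee.

```python
# All prices are hypothetical, in cents per instance-hour.
PRICES = {"on_demand": 100, "reserved": 60, "spot": 30}

# Hypothetical hourly demand: steady instances and interruptible batch instances.
STEADY = [10, 10, 14, 20, 12, 10]
BATCH = [0, 0, 8, 8, 0, 0]


def mixed_cost(steady, batch, reserved, prices=PRICES):
    """Total cost when a reserved baseline, on-demand overflow and Spot batch are combined."""
    total = 0
    for need, batch_need in zip(steady, batch):
        total += reserved * prices["reserved"]  # paid every hour, used or not
        total += max(0, need - reserved) * prices["on_demand"]  # overflow above the baseline
        total += batch_need * prices["spot"]  # batch runs on unused capacity
    return total


def on_demand_only_cost(steady, batch, prices=PRICES):
    """Total cost when every instance-hour is bought on demand."""
    return sum((need + batch_need) * prices["on_demand"]
               for need, batch_need in zip(steady, batch))
```

Calling `mixed_cost(STEADY, BATCH, reserved=10)` and `on_demand_only_cost(STEADY, BATCH)` compares the two strategies. Varying `reserved` shows the trade-off between paying for idle reserved capacity and paying on-demand rates for overflow.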
Slide 45
Slide 45 text
$650 per hour
Slide 46
Slide 46 text
Pattern for distributed computing.
Map/reduce
Software frameworks such as Hadoop.
Write two functions. Scale up.
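To make "write two functions" concrete, here is a minimal in-process sketch of the map/reduce pattern as a word count. It is not Hadoop code: `run_mapreduce` is a hypothetical single-machine driver standing in for the framework, which would normally shuffle the intermediate pairs across a cluster.

```python
from collections import defaultdict


def map_fn(line):
    """Map: emit a (word, 1) pair for every word in one input record."""
    for word in line.lower().split():
        yield word, 1


def reduce_fn(key, values):
    """Reduce: combine all values emitted for one key."""
    return key, sum(values)


def run_mapreduce(records, map_fn, reduce_fn):
    """Hypothetical local driver: map every record, group by key, reduce each group."""
    groups = defaultdict(list)
    for record in records:
        for key, value in map_fn(record):
            groups[key].append(value)  # the "shuffle" step, done in memory here
    return dict(reduce_fn(key, values) for key, values in groups.items())
```

Only `map_fn` and `reduce_fn` are specific to the problem. The driver does the grouping, which is the part a framework distributes when the job is scaled up.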
Slide 47
Slide 47 text
Pattern for distributed computing.
Map/reduce
Software frameworks such as Hadoop.
Write two functions. Scale up.
Complex cluster configuration and management.
Slide 48
Slide 48 text
Managed Hadoop clusters.
Amazon Elastic MapReduce
Easy to provision and monitor.
Write two functions. Scale up.
Optimized for S3 access.
Slide 49
Slide 49 text
Input data: S3
UNDER THE HOOD
Slide 50
Slide 50 text
Elastic MapReduce
Code
Input data: S3
UNDER THE HOOD
Slide 51
Slide 51 text
Elastic MapReduce
Code
Name node
Input data: S3
UNDER THE HOOD
Slide 52
Slide 52 text
Elastic MapReduce
Code
Name node
Input data: S3
Elastic cluster
UNDER THE HOOD
Slide 53
Slide 53 text
Elastic MapReduce
Code
Name node
Input data: S3
Elastic cluster
HDFS
UNDER THE HOOD
Slide 54
Slide 54 text
Elastic MapReduce
Code
Name node
Input data: S3
Elastic cluster
HDFS
Queries + BI via JDBC, Pig, Hive
UNDER THE HOOD
Slide 55
Slide 55 text
Elastic MapReduce
Code
Name node
Output: S3 + SimpleDB
Input data: S3
Elastic cluster
HDFS
Queries + BI via JDBC, Pig, Hive
UNDER THE HOOD
Slide 56
Slide 56 text
Output: S3 + SimpleDB
Input data: S3
UNDER THE HOOD
Slide 71
Slide 71 text
Performance
Slide 72
Slide 72 text
Performance
Compute performance
Slide 73
Slide 73 text
Intel Xeon E5-2670
Cluster Compute
10 GigE non-blocking network
Placement groups
60.5 GB of memory
UNDER THE HOOD
Slide 74
Slide 74 text
Intel Xeon E5-2670
Cluster Compute
10 GigE non-blocking network
Placement groups
60.5 GB of memory
UNDER THE HOOD
+ GPU-enabled instances
Slide 75
Slide 75 text
Performance
Compute performance
Slide 76
Slide 76 text
Performance
Compute performance
IO performance
Slide 77
Slide 77 text
NoSQL
Unstructured data storage.
Slide 78
Slide 78 text
Predictable, consistent performance
DynamoDB
Unlimited storage
No schema for unstructured data
Single-digit millisecond latencies
Backed by solid-state drives
Slide 79
Slide 79 text
...and SSDs for all.
New Hi1 storage instances.
Slide 80
Slide 80 text
2 x 1 TB SSDs
hi1.4xlarge
10 GigE network
HVM: 90k IOPS read, 9k to 75k write
PV: 120k IOPS read, 10k to 85k write
UNDER THE HOOD
Slide 81
Slide 81 text
Netflix
“The hi1.4xlarge configuration is about half the system cost for the same throughput.”
http://techblog.netflix.com/2012/07/benchmarking-high-performance-io-with.html