Slide 1

Slide 1 text

Latency’s Worst Nightmare: Performance Tuning Tips and Tricks. Matt Wood, Principal Data Scientist, @mza

Slide 2

Slide 2 text

Hello.

Slide 3

Slide 3 text

Let’s talk about performance...

Slide 4

Slide 4 text

Productivity and response time. Figure 3: Interactive user productivity versus computer response time for human-intensive interactions for system A. [Plot: interactive user productivity (IUP), its human-intensive component, and measured data, against computer response time in seconds.] A. J. Thadhani, IBM Systems Journal 20 (4), 1981

Slide 5

Slide 5 text

Page load time and average daily searches per user. [Chart: percent change, on an axis from 0.00% to -0.70%, for 50ms pre-header, 100ms pre-header, 200ms post-header, 200ms post-ads and 400ms post-header delays.] http://www.webperformancetoday.com/2013/04/10/cloud-connect-2013-web-acceleration-and-front-end-optimization-slides/

Slide 6

Slide 6 text

Page load delay and business metrics. [Chart: percent change, on an axis from 0.00% to -5.00%, against added delay (50, 200, 500, 1000, 2000) for queries per visitor, query refinement, revenue per visitor, any clicks and satisfaction.] http://www.webperformancetoday.com/2013/04/10/cloud-connect-2013-web-acceleration-and-front-end-optimization-slides/

Slide 7

Slide 7 text

2.2s reduction in page load time. 15.4% increase in conversion rate. https://blog.mozilla.org/metrics/2010/04/05/firefox-page-load-speed-%E2%80%93-part-ii/

Slide 8

Slide 8 text

“Speed is more than a feature.” @fredwilson

Slide 9

Slide 9 text

Let’s talk about a web request...

Slide 10

Slide 10 text

Initial connection SSL negotiation Time to first byte Content download %CPU kbps

Slide 11

Slide 11 text

Initial connection SSL negotiation Time to first byte Content download %CPU kbps Remote

Slide 12

Slide 12 text

Initial connection SSL negotiation Time to first byte Content download %CPU kbps Remote Browser

Slide 13

Slide 13 text

Initial connection SSL negotiation Time to first byte Content download %CPU kbps Remote Browser

Slide 14

Slide 14 text

http://www.stevesouders.com/blog/2012/02/10/the-performance-golden-rule/

Slide 15

Slide 15 text

No content

Slide 16

Slide 16 text

No content

Slide 17

Slide 17 text

Let’s talk about a web request...

Slide 18

Slide 18 text

Web server Application logic Database

Slide 19

Slide 19 text

Web server Application logic Database

Slide 20

Slide 20 text

Web server Application logic Database

Slide 21

Slide 21 text

Web server Application logic Database

Slide 22

Slide 22 text

Web server Application logic Database

Slide 23

Slide 23 text

Web server Application logic Database

Slide 24

Slide 24 text

Web server Application logic Database

Slide 25

Slide 25 text

Web server Application logic Database

Slide 26

Slide 26 text

Web server Application logic Database 3 4 6 5 7 2 1

Slide 27

Slide 27 text

Web server Application logic Database 3 4 6 5 7 2 1

Slide 28

Slide 28 text

Initial connection SSL negotiation Time to first byte Content download %CPU kbps TTFB

Slide 29

Slide 29 text

Initial connection SSL negotiation Time to first byte Content download %CPU kbps Network latency Download + negotiation time

Slide 30

Slide 30 text

No content

Slide 31

Slide 31 text

Variable TTFB based on geographic location. NYC London Sydney

Slide 32

Slide 32 text

Reduce internet induced latency. CloudFront content delivery network. Lower latency. Faster downloads.

Slide 33

Slide 33 text

Reduce internet induced latency. CloudFront content delivery network. Lower latency. Faster downloads.

Slide 34

Slide 34 text

CloudFront Edge Locations
Europe: Amsterdam (2); Dublin; Frankfurt (2); London (2); Madrid; Milan; Paris (2); Stockholm
South America: Sao Paulo
North America: Ashburn, VA (2); Dallas, TX (2); Hayward, CA; Jacksonville, FL; Los Angeles, CA (2); Miami, FL; Newark, NJ; New York, NY (3); Palo Alto, CA; Seattle, WA; San Jose, CA; South Bend, IN; St. Louis, MO

Slide 35

Slide 35 text

Static and dynamic content. Cache dynamic pages (search results). Use query strings or cookies for cache keys. Network and path optimizations accelerate even unique content.
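
The idea of keying a cache on query strings or cookies can be sketched in a few lines of Python. This is only an illustration, not how CloudFront computes its keys internally: the function name `cache_key`, the `|` separator and the cookie whitelist are all assumptions made for the example.

```python
from urllib.parse import urlsplit, parse_qsl, urlencode


def cache_key(url, cookies=None, cookie_names=()):
    """Build a deterministic key from the path, the sorted query string,
    and a whitelist of cookies (hypothetical scheme, for illustration)."""
    parts = urlsplit(url)
    # Sort parameters so ?a=1&b=2 and ?b=2&a=1 share one cache entry.
    query = urlencode(sorted(parse_qsl(parts.query, keep_blank_values=True)))
    cookies = cookies or {}
    # Only whitelisted cookies vary the key; session noise is ignored.
    cookie_part = "&".join(
        f"{name}={cookies[name]}" for name in sorted(cookie_names) if name in cookies
    )
    return f"{parts.path}?{query}|{cookie_part}"
```

Two search URLs that differ only in parameter order map to the same key, so a cached search results page can be served for both.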

Slide 36

Slide 36 text

Web server Application logic Database 3 4 6 5 7 2 1 Content delivery

Slide 37

Slide 37 text

7 Web server Application logic Database 3 4 6 5 1 2 Content delivery

Slide 38

Slide 38 text

Web server Application server Database

Slide 39

Slide 39 text

Web server Application server Database

Slide 40

Slide 40 text

Application server Database

Slide 41

Slide 41 text

Application server Thread safe? Application state? Database

Slide 42

Slide 42 text

Web server Application server Database

Slide 43

Slide 43 text

Web server Application server Database Unit of scale

Slide 44

Slide 44 text

Web server Application server Database Load balancer

Slide 45

Slide 45 text

Web server Application server Database Web server Application server Web server Application server Load balancer

Slide 46

Slide 46 text

Load balancer Web server Application server Database Web server Application server Web server Application server

Slide 47

Slide 47 text

Load balancer Web server Application server Database Web server Application server Web server Application server

Slide 48

Slide 48 text

Decouple your service tiers. Separation of concerns. Easier to manage. Drives higher availability.

Slide 49

Slide 49 text

Build for horizontal scale. Decrease request contention. Reduce capacity planning headaches. Requires a stateless application architecture.

Slide 50

Slide 50 text

Small things, loosely coupled. Do one thing, and do it well. The Unix Way. Take a look at the Unicorn and Rainbows approach. Asynchronous by default (where possible).

Slide 51

Slide 51 text

Reduce response time. Concurrency limits can reappear quickly. Limit impact on performance through rapid scaling.

Slide 52

Slide 52 text

Load balancer

Slide 53

Slide 53 text

Load balancer

Slide 54

Slide 54 text

Load balancer

Slide 55

Slide 55 text

Fast booting with EBS-backed instances. Linux is faster to boot than Windows. EBS-backed instances are faster than S3 backed.

Slide 56

Slide 56 text

Pre-baked EBS-backed AMIs. Each code deployment creates a new AMI. AMI is the unit of deployment.

Slide 57

Slide 57 text

Automate response with Auto Scaling. Set operational thresholds. Faster than manual response (especially at 3am).
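
A threshold-based scaling rule can be sketched as a plain function. This is not the Auto Scaling API; the function name, the CPU thresholds and the fleet limits below are hypothetical values chosen for the example.

```python
def desired_capacity(current, cpu_percent,
                     scale_out_at=70.0, scale_in_at=30.0,
                     minimum=2, maximum=20):
    """Return the instance count a simple threshold policy would ask for.

    All thresholds and limits are illustrative assumptions.
    """
    if cpu_percent >= scale_out_at:
        target = current + 1      # over the upper threshold: add an instance
    elif cpu_percent <= scale_in_at:
        target = current - 1      # under the lower threshold: remove one
    else:
        target = current          # inside the band: leave the fleet alone
    # Never leave the configured fleet bounds.
    return max(minimum, min(maximum, target))
```

Encoding the thresholds once means the same decision is made at 3am as at 3pm, with no operator in the loop.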

Slide 58

Slide 58 text

Time-based response with OpsWorks. Set operational times. ‘Follow the sun’ response.

Slide 59

Slide 59 text

Pre-emptive scaling. Contact your account manager, or get in touch via a Premium Support ticket.

Slide 60

Slide 60 text

Reserve capacity. Lower costs. Guaranteed availability.

Slide 61

Slide 61 text

7 Database 3 4 6 5 1 2 Content delivery Horizontal scale

Slide 62

Slide 62 text

7 Database 4 6 5 1 2 Content delivery Horizontal scale 3

Slide 63

Slide 63 text

A range of instance types. Full spectrum of price/performance options.

Slide 64

Slide 64 text

Choosing the right instance type. Application specific. Benchmark. CloudWatch metrics.

Slide 65

Slide 65 text

Benchmark on business metrics. Relate application metrics to business metrics. Customers supported/instance. Photos processed/dollar.
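
A metric such as photos processed per dollar is a simple ratio of measured throughput to hourly price. The sketch below assumes made-up instance names, throughputs and prices; real figures would come from your own benchmarks and the current price list.

```python
def units_per_dollar(units_per_hour, price_per_hour):
    """Business metric: work completed for every dollar spent."""
    return units_per_hour / price_per_hour


def best_value(candidates):
    """Pick the candidate with the highest units per dollar.

    candidates maps a name to (units_per_hour, price_per_hour).
    """
    return max(candidates, key=lambda name: units_per_dollar(*candidates[name]))


# Hypothetical benchmark results: (photos per hour, dollars per hour).
benchmarks = {
    "type-a": (1000, 0.25),
    "type-b": (1800, 0.50),
}
```

With these invented numbers the smaller instance wins on photos per dollar even though the larger one processes more photos per hour, which is the kind of result a raw throughput benchmark would hide.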

Slide 66

Slide 66 text

The Canary in the Coal Mine. Standardize on 64-bit AMIs. Deploy across instance types. Evaluate new instance types with real traffic.

Slide 67

Slide 67 text

No content

Slide 68

Slide 68 text

No content

Slide 69

Slide 69 text

No content

Slide 70

Slide 70 text

7 Database 4 6 5 1 2 Content delivery Horizontal scale 3 Instance selection

Slide 71

Slide 71 text

7 Database 6 5 1 2 Content delivery Horizontal scale 3 Instance selection 4

Slide 72

Slide 72 text

Interface with the data store. Faster if you don’t have to go to disk. Increased concurrency.

Slide 73

Slide 73 text

Caching. Store query results in memory. Writes go to disk.
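
A minimal sketch of this pattern, assuming a dict stands in for the database and that invalidate-on-write is an acceptable consistency choice. The class and attribute names are hypothetical.

```python
class ReadThroughCache:
    """Serve repeated reads from memory; send every write to the store."""

    def __init__(self, store):
        self.store = store        # stands in for the on-disk database
        self.memory = {}          # in-memory copy of query results
        self.store_reads = 0      # counts how often we had to hit the store

    def get(self, key):
        if key in self.memory:
            return self.memory[key]       # hit: no trip to the store
        self.store_reads += 1
        value = self.store[key]           # miss: read from the store
        self.memory[key] = value
        return value

    def put(self, key, value):
        self.store[key] = value           # writes always go to the store
        self.memory.pop(key, None)        # drop the stale cached copy
```

The second read of the same key never reaches the store, which is where both the latency and the concurrency gain come from.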

Slide 74

Slide 74 text

Amazon ElastiCache. Deploy, operate and scale in-memory caches.

Slide 75

Slide 75 text

Managing state. Transient data only. Web server state, high score tables, etc. Time consuming task results (many to many query results).

Slide 76

Slide 76 text

Best practices. Assume cold cache latency in application architecture. Set appropriate time-to-live. Batch requests rather than issuing sequential single requests. Architect for cache failure.
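
Three of these practices (cold caches, time-to-live, cache failure) can be sketched together. The sketch assumes an in-process cache and uses `ConnectionError` to stand in for whatever error a real cache client raises; `TTLCache`, `fetch` and `load_from_db` are hypothetical names, and a `None` result is treated as a miss.

```python
import time


class TTLCache:
    """In-memory cache whose entries expire after ttl_seconds."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl_seconds = ttl_seconds
        self.clock = clock
        self.entries = {}  # key -> (expires_at, value)

    def get(self, key):
        entry = self.entries.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self.clock() >= expires_at:
            del self.entries[key]  # expired: treat as a miss
            return None
        return value

    def set(self, key, value):
        self.entries[key] = (self.clock() + self.ttl_seconds, value)


def fetch(key, cache, load_from_db):
    """Return the value for key, surviving cold caches and cache outages."""
    try:
        cached = cache.get(key)
    except ConnectionError:
        return load_from_db(key)  # cache is down: fall back to the database
    if cached is not None:
        return cached
    value = load_from_db(key)     # cold or expired entry
    try:
        cache.set(key, value)
    except ConnectionError:
        pass  # failing to populate the cache must not fail the request
    return value
```

Because `fetch` always has a path to the database, the cache only ever makes a request faster; losing it makes requests slower, not broken.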

Slide 77

Slide 77 text

7 Database 6 5 1 2 Content delivery Horizontal scale 3 Instance selection 4 Caching

Slide 78

Slide 78 text

7 Database 6 1 2 Content delivery Horizontal scale 3 Instance selection 4 Caching 5

Slide 79

Slide 79 text

Accelerating reads. Vertical scale vs horizontal scale.

Slide 80

Slide 80 text

Horizontal scale. Add additional DB resources for scale. Scale out for reads. Shard for reads and writes.
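
Sharding needs a stable rule that maps each key to one database. A minimal sketch, assuming hash-modulo placement over a fixed number of shards (one common choice among several; the function name is hypothetical):

```python
import hashlib


def shard_for(key, shard_count):
    """Map a key to a shard index in [0, shard_count).

    Uses a stable hash so every application server picks the same
    shard for the same key. MD5 is used for spreading, not security.
    """
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count
```

Both reads and writes for a key go to the shard this function returns. Note that changing `shard_count` remaps most keys, so a scheme like this suits a shard count that changes rarely.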

Slide 81

Slide 81 text

Vertical scale. More resources for a single DB engine. Add memory for DB caches. Add CPU for more intensive queries.

Slide 82

Slide 82 text

Scaling IO. Provision throughput for your app. Reserved. Available. Elastic. Available on EBS, Amazon RDS and DynamoDB.

Slide 83

Slide 83 text

Provisioned throughput is consistent. Consistent, predictable performance. Relational databases with RDS. NoSQL stores with DynamoDB. Relational & NoSQL with EC2 and EBS.

Slide 84

Slide 84 text

Provisioned throughput with RDS. 12.5k IOPS on MySQL. 25k IOPS on Oracle. 10k IOPS on SQL Server. Provision up to 30k for reduced latency.

Slide 85

Slide 85 text

Provisioned throughput and instance types. Optimized for provisioned IO storage. m1.large: 500 Mbps. m1.xlarge, m2.xlarge, m2.4xlarge: 1000 Mbps.

Slide 86

Slide 86 text

Provisioned throughput and DynamoDB. Consistent performance with unlimited throughput and storage. Single-digit millisecond latency.

Slide 87

Slide 87 text

Build for a uniform workload. Evenly distribute query patterns by key. Use a wide range of key values.

Slide 88

Slide 88 text

7 6 1 2 Content delivery Horizontal scale 3 Instance selection 4 Caching 5 Read capacity

Slide 89

Slide 89 text

7 1 2 Content delivery Horizontal scale 3 Instance selection 4 Caching 5 Read capacity 6

Slide 90

Slide 90 text

Standard EBS volumes. Moderate or bursty workloads. 100 IOPS, bursting to hundreds of IOPS. Bursting is good for boot volumes.

Slide 91

Slide 91 text

Provisioned IOPS with EBS volumes. Predictable, high performance for IO intensive workloads. 2000 IOPS per volume. Stripe volumes for additional IO. Delivers within 10% of the provisioned performance, 99.9% of the time.
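
The striping arithmetic can be sketched as follows, using the 2000 IOPS per volume figure from the slide. The sketch assumes IOPS add up linearly across striped volumes and ignores any instance-level bandwidth limit; both function names are hypothetical.

```python
PER_VOLUME_IOPS = 2000  # per-volume figure quoted on the slide


def volumes_needed(target_iops, per_volume_iops=PER_VOLUME_IOPS):
    """Smallest number of volumes whose combined IOPS reach the target."""
    return -(-target_iops // per_volume_iops)  # ceiling division on integers


def striped_iops(volume_count, per_volume_iops=PER_VOLUME_IOPS):
    """Aggregate IOPS of a stripe set, assuming linear scaling."""
    return volume_count * per_volume_iops
```

For example, a 7,500 IOPS target needs four volumes, which together provide a nominal 8,000 IOPS.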

Slide 92

Slide 92 text

EBS-Optimized instances. m1.large, m2.2xlarge, m3.xlarge: 500 Mbps. m1.xlarge, m2.4xlarge, m3.2xlarge, c1.xlarge: 1000 Mbps.

Slide 93

Slide 93 text

High bandwidth networking. cg1.4xlarge, cc2.8xlarge, hi1.4xlarge, hs1.8xlarge and cr1.8xlarge run on non-blocking, 10 gigabit networking. Not EBS-Optimized, but can be used with provisioned IOPS volumes.

Slide 94

Slide 94 text

High I/O instances. Designed for high throughput database workloads. 2 x 1TB SSDs. 2 GB/s for reads. 1.1 GB/s for writes.

Slide 95

Slide 95 text

Para-virtual instances. 4 KB random reads: 120k IOPS. 4 KB random writes: 10k - 80k IOPS.

Slide 96

Slide 96 text

HVM instances (including Windows). 4 KB random reads: 90k IOPS. 4 KB random writes: 9k - 75k IOPS.

Slide 97

Slide 97 text

High Storage instances. High sequential IO. 24 x 2TB drives. 2.4 GB/s of 2MiB sequential reads. 2.6 GB/s for sequential writes.

Slide 98

Slide 98 text

7 1 2 Content delivery Horizontal scale 3 Instance selection 4 Caching 5 Read capacity 6 Block store

Slide 99

Slide 99 text

Initial connection SSL negotiation Time to first byte Content download %CPU kbps Back end Front end

Slide 100

Slide 100 text

http://amzn.to/170S1VV http://amzn.to/Zod4PY

Slide 101

Slide 101 text

14 rules for faster loading web sites. http://stevesouders.com/hpws/rules.php
1. Make fewer HTTP requests.
2. Use a content delivery network.
3. Add an expires header.
4. GZIP components.
5. Put style sheets at the top.
6. Put scripts at the bottom.
7. Avoid CSS expressions.
8. Make JavaScript and CSS external.
9. Reduce DNS lookups.
10. Minify JavaScript.
11. Avoid redirects.
12. Remove duplicate scripts.
13. Configure ETags.
14. Make AJAX cacheable.
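
Rules 3 and 4 (expires headers and GZIP) can be sketched with the Python standard library. This is an illustration of the two headers, not a recommended server setup: in practice the web server or CDN usually handles both, and the one-year default lifetime and the function name `static_response` are assumptions for the example.

```python
import gzip
import time
from email.utils import formatdate


def static_response(body, max_age_seconds=31536000, now=None):
    """Return (headers, compressed_body) for a cacheable static asset."""
    now = time.time() if now is None else now
    compressed = gzip.compress(body)  # rule 4: GZIP components
    headers = {
        "Content-Encoding": "gzip",
        "Content-Length": str(len(compressed)),
        "Cache-Control": f"public, max-age={max_age_seconds}",
        # Rule 3: a far-future Expires header in HTTP date format.
        "Expires": formatdate(now + max_age_seconds, usegmt=True),
    }
    return headers, compressed
```

A client that honours these headers skips the request entirely on repeat visits, and downloads fewer bytes on the first one.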

Slide 102

Slide 102 text

7 1 3 4 6 5 Content delivery Instance selection Caching Read capacity Front end optimization 2 Horizontal scale Block store

Slide 103

Slide 103 text

One more thing...

Slide 104

Slide 104 text

US East

Slide 105

Slide 105 text

US East Sydney Dublin Route 53

Slide 106

Slide 106 text

Thank you! matthew@amazon.com @mza