Latency's Worst Nightmare: Performance Tuning Tips and Tricks

Matt Wood
Principal Data Scientist, @mza

Hello.

Let’s talk about performance...

[Figure 3: Interactive user productivity (IUP) versus computer response time (s) for human-intensive interactions, system A. Plotted: IUP, the human-intensive component of IUP, and measured data for the human-intensive component.]
Productivity and response time. A. J. Thadhani, IBM Systems Journal 20 (4), 1981

Page load time and average daily searches per user.
[Chart: change in average daily searches per user (axis from -0.70% to 0.00%) for added delays of 50ms pre-header, 100ms pre-header, 200ms post-header, 200ms post-ads and 400ms post-header.]
http://www.webperformancetoday.com/2013/04/10/cloud-connect-2013-web-acceleration-and-front-end-optimization-slides/

Page load delay and business metrics.
[Chart: percent change (axis from -5.00% to 0.00%) against added delay of 50, 200, 500, 1000 and 2000. Series: Queries per visitor, Query refinement, Revenue per visitor, Any clicks, Satisfaction.]
http://www.webperformancetoday.com/2013/04/10/cloud-connect-2013-web-acceleration-and-front-end-optimization-slides/

2.2s reduction in page load time. 15.4% increase in conversion rate.
https://blog.mozilla.org/metrics/2010/04/05/firefox-page-load-speed-%E2%80%93-part-ii/

“Speed is more than a feature.” @fredwilson

Let’s talk about a web request...

[Chart: request waterfall. Phases: Initial connection, SSL negotiation, Time to first byte, Content download. Traces: %CPU, kbps. Time split between Remote and Browser.]

http://www.stevesouders.com/blog/2012/02/10/the-performance-golden-rule/

Let’s talk about a web request...

Web server Application logic Database

[Diagram: Web server, Application logic, Database, with seven numbered points (1 to 7).]

%CPU kbps TTFB

%CPU kbps Network latency Download + negotiation time

Variable TTFB based on geographic location. NYC London Sydney
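The phases on the request timeline can be measured from any client, which makes it easy to compare locations. The sketch below is a minimal illustration using only the Python standard library; `time_request` is a hypothetical helper name, it covers plain HTTP only (no SSL negotiation phase), and it treats the arrival of the response headers as the first byte.

```python
import http.client
import time


def time_request(host, port=80, path="/", timeout=10.0):
    """Return (connect_s, ttfb_s, download_s) for one plain-HTTP GET.

    Hypothetical helper for illustration; not part of the original talk.
    """
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        t0 = time.perf_counter()
        conn.connect()                 # initial connection (TCP handshake)
        t1 = time.perf_counter()
        conn.request("GET", path)
        resp = conn.getresponse()      # returns once status line and headers arrive
        t2 = time.perf_counter()
        resp.read()                    # content download
        t3 = time.perf_counter()
    finally:
        conn.close()
    return t1 - t0, t2 - t1, t3 - t2
```

Running the same call from hosts in different cities against one origin would show how much of the time to first byte is network distance rather than server work.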

Reduce internet-induced latency. CloudFront content delivery network. Lower latency.
Faster downloads.

CloudFront Edge Locations
Europe: Amsterdam (2), Dublin, Frankfurt (2), London (2), Madrid, Milan, Paris (2), Stockholm
South America: Sao Paulo
North America: Ashburn, VA (2); Dallas, TX (2); Hayward, CA; Jacksonville, FL; Los Angeles, CA (2); Miami, FL; Newark, NJ; New York, NY (3); Palo Alto, CA; Seattle, WA; San Jose, CA; South Bend, IN; St. Louis, MO

Static and dynamic content. Cache dynamic pages (search results). Use
query strings or cookies for cache keys. Network and path optimizations accelerate even unique content.
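To make the cache key idea concrete: two requests can share a cached dynamic page only if everything that changes the page is part of the key, and nothing else is. The sketch below illustrates that principle; it is not CloudFront's actual key format, and `cache_key`, the parameter names `q` and `page`, and the cookie name `lang` are all hypothetical.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit


def cache_key(url, cookies, vary_params=("q", "page"), vary_cookies=("lang",)):
    """Build a key from the path plus only the inputs that change the page.

    Illustrative only: the whitelists are assumptions, not CloudFront settings.
    """
    parts = urlsplit(url)
    # Keep whitelisted query parameters, sorted so ordering does not matter.
    params = sorted((k, v) for k, v in parse_qsl(parts.query) if k in vary_params)
    # Keep whitelisted cookies; session or tracking cookies are ignored.
    kept = sorted((k, v) for k, v in cookies.items() if k in vary_cookies)
    return parts.path + "?" + urlencode(params) + "|" + urlencode(kept)
```

With these assumptions, `/search?q=aws&utm=x` and `/search?utm=y&q=aws` map to the same key for users with the same `lang` cookie, so one cached search result serves both.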

[Diagram: Web server, Application logic, Database. Covered so far: 1. Content delivery.]

Web server Application server Database

Application server Database

Application server Thread safe? Application state? Database

Web server Application server Database

Web server Application server Database Unit of scale

Web server Application server Database Load balancer

Web server Application server Database Web server Application server Web
server Application server Load balancer

Load balancer Web server Application server Database Web server Application
server Web server Application server

Decouple your service tiers. Separation of concerns. Easier to manage.
Drives higher availability.

Build for horizontal scale. Decrease request contention. Reduce capacity planning
headaches. Requires a stateless application architecture.

Small things, loosely coupled. Do one thing, and do it
well. The Unix Way. Take a look at the Unicorn and Rainbows approach. Asynchronous by default (where possible).

Reduce response time. Concurrency limits can reappear quickly. Limit impact
on performance through rapid scaling.

Load balancer

Fast booting with EBS-backed instances. Linux is faster to boot
than Windows. EBS-backed instances are faster to boot than S3-backed instances.

Pre-baked EBS-backed AMIs. Each code deployment creates a new AMI.
AMI is the unit of deployment.

Automate response with Auto Scaling. Set operational thresholds. Faster than
manual response (especially at 3am).
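As a rough picture of what setting operational thresholds means, the sketch below turns one metric reading into a capacity decision. It is a simplified illustration, not the Auto Scaling API: `desired_capacity` and every threshold and limit in it are hypothetical values chosen for the example.

```python
def desired_capacity(current, cpu_percent,
                     scale_out_at=70.0, scale_in_at=30.0,
                     minimum=2, maximum=20):
    """Return the instance count a threshold policy would ask for.

    All thresholds and limits are assumed values for illustration.
    """
    if cpu_percent >= scale_out_at:
        target = current + 1       # above the upper threshold: add an instance
    elif cpu_percent <= scale_in_at:
        target = current - 1       # below the lower threshold: remove one
    else:
        target = current           # inside the band: leave the fleet alone
    # Never go outside the configured fleet bounds.
    return max(minimum, min(maximum, target))
```

The gap between the two thresholds keeps the fleet from flapping when load sits near a single cut-off.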

Time-based response with OpsWorks. Set operational times. ‘Follow the sun’
response.

Pre-emptive scaling. Contact your account manager, or get in touch
via a Premium Support ticket.

Reserve capacity. Lower costs. Guaranteed availability.

[Diagram: Database. Covered so far: 1. Content delivery; 2. Horizontal scale.]

A range of instance types. Full spectrum of price/performance options.

Choosing the right instance type. Application specific. Benchmark. CloudWatch metrics.

Benchmark on business metrics. Relate application metrics to business metrics.
Customers supported/instance. Photos processed/dollar.
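A business metric such as photos processed per dollar is simple arithmetic once a benchmark has produced throughput figures. The sketch below assumes made-up instance names, throughputs and hourly prices purely for illustration; none of these numbers come from the talk or from AWS pricing.

```python
def photos_per_dollar(photos_per_hour, price_per_hour_usd):
    """Throughput bought by one dollar of instance time."""
    return photos_per_hour / price_per_hour_usd


def best_value(benchmarks):
    """Return the instance name with the highest photos per dollar.

    `benchmarks` maps a name to (photos_per_hour, price_per_hour_usd).
    """
    return max(benchmarks, key=lambda name: photos_per_dollar(*benchmarks[name]))


# Hypothetical benchmark results: (photos per hour, USD per hour).
example_benchmarks = {
    "small-type": (12000, 0.25),
    "large-type": (40000, 1.00),
}
```

Here `best_value(example_benchmarks)` picks `small-type`: the larger instance is faster in absolute terms, but each dollar processes fewer photos.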

The Canary in the Coal Mine. Standardize on 64-bit AMIs.
Deploy across instance types. Evaluate new instance types with real traffic.

[Diagram: Database. Covered so far: 1. Content delivery; 2. Horizontal scale; 3. Instance selection.]

Interface with the data store. Faster if you don’t have
to go to disk. Increased concurrency.

Caching. Store query results in memory. Writes go to disk.

Amazon ElastiCache. Deploy, operate and scale in-memory caches.

Managing state. Transient data only. Web server state, high score
tables, etc. Time consuming task results (many to many query results).

Best practices. Assume cold cache latency in application architecture. Set
an appropriate time-to-live. Batch requests rather than issuing single requests sequentially. Architect for cache failure.
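These practices fit together in a cache-aside read path. The sketch below is a minimal illustration with an in-process dictionary standing in for a real cache service such as ElastiCache; `TTLCache` and `cached_query` are hypothetical names, and the database call is whatever function the caller passes in.

```python
import time


class TTLCache:
    """In-memory stand-in for a cache service; entries expire after a TTL."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._store = {}

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            del self._store[key]   # expired: behave like a miss
            return None
        return value

    def set(self, key, value):
        self._store[key] = (value, self._clock() + self._ttl)


def cached_query(cache, key, run_query):
    """Serve from cache when possible; fall back to the database otherwise."""
    try:
        hit = cache.get(key)
    except Exception:
        hit = None                 # cache failure is treated as a miss
    if hit is not None:
        return hit
    result = run_query()           # cold cache: pay full database latency
    try:
        cache.set(key, result)
    except Exception:
        pass                       # a broken cache must not fail the request
    return result
```

Because every path ends at `run_query`, the application still works with an empty or unreachable cache, only more slowly, which is why the architecture has to budget for cold-cache latency.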

[Diagram: Database. Covered so far: 1. Content delivery; 2. Horizontal scale; 3. Instance selection; 4. Caching.]

Accelerating reads. Vertical scale vs horizontal scale.

Horizontal scale. Add additional DB resources for scale. Scale out
for reads. Shard for reads and writes.
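The two horizontal options can be sketched in a few lines: read replicas take the read traffic while writes stay on one primary, and sharding splits both reads and writes by key. This is a simplified illustration under stated assumptions; `ReadWriteRouter` and `shard_for` are hypothetical names, the endpoints are plain strings, and MD5 is used only as a stable hash, not for security.

```python
import hashlib
import itertools


class ReadWriteRouter:
    """Send writes to the primary and rotate reads across replicas."""

    def __init__(self, primary, replicas):
        self.primary = primary
        # With no replicas configured, reads fall back to the primary.
        self._replicas = itertools.cycle(replicas or [primary])

    def for_write(self):
        return self.primary

    def for_read(self):
        return next(self._replicas)


def shard_for(key, shards):
    """Pick a shard for a key; the same key always maps to the same shard."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return shards[int.from_bytes(digest[:8], "big") % len(shards)]
```

Note that simple modulo sharding remaps most keys when the shard count changes, so a production design would need a resharding plan; that is outside the scope of this sketch.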

Vertical scale. More resources for a single DB engine. Add
memory for DB caches. Add CPU for more intensive queries.

Scaling IO. Provision throughput for your app. Reserved. Available. Elastic.
Available on EBS, Amazon RDS and DynamoDB.

Provisioned throughput is consistent. Consistent, predictable performance. Relational databases with
RDS. NoSQL stores with DynamoDB. Relational & NoSQL with EC2 and EBS.

Provisioned throughput with RDS. 12.5k IOPS on MySQL. 25k IOPS
on Oracle. 10k IOPS on SQL Server. Provision up to 30k for reduced latency.

Provisioned throughput and instance types. Optimized for provisioned IO storage:
m1.large: 500 Mbps
m1.xlarge, m2.xlarge, m2.4xlarge: 1000 Mbps

Provisioned throughput and DynamoDB. Consistent performance with unlimited throughput and
storage. Single-digit millisecond latency.

Build for a uniform workload. Evenly distribute query patterns by
key. Use a wide range of key values.
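The effect of key choice on load distribution is easy to see with a toy model. The sketch below hashes request keys onto a fixed number of partitions and reports the share of traffic landing on the busiest one. The hashing scheme and partition count are assumptions for illustration only and do not describe DynamoDB's internal partitioning; `hottest_partition_share` is a hypothetical name.

```python
import hashlib
from collections import Counter


def hottest_partition_share(keys, partitions=8):
    """Fraction of requests that land on the busiest partition.

    Toy model: partition = stable hash of the key, modulo partition count.
    """
    load = Counter()
    for key in keys:
        digest = hashlib.md5(key.encode("utf-8")).digest()
        load[int.from_bytes(digest[:8], "big") % partitions] += 1
    return max(load.values()) / len(keys)
```

In this model, a thousand requests that all use the same date string as their key put 100% of the load on one partition, while ten thousand distinct user IDs spread it close to evenly, so provisioned throughput is actually usable.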

[Diagram: Covered so far: 1. Content delivery; 2. Horizontal scale; 3. Instance selection; 4. Caching; 5. Read capacity.]

Standard EBS volumes. Moderate or bursty workloads. 100 IOPS, bursting
to hundreds of IOPS. Bursting is good for boot volumes.

Provisioned IOPS with EBS volumes. Predictable, high performance for IO
intensive workloads. 2000 IOPS per volume. Stripe volumes for additional IO. Delivers within 10% of the provisioned performance, 99.9% of the time.
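The striping arithmetic is worth writing down. The sketch below uses the 2000 IOPS per volume figure from the slide and assumes, as an idealization, that striping scales linearly across volumes; real striped arrays may fall short of that. The function names are hypothetical.

```python
IOPS_PER_VOLUME = 2000  # per-volume figure quoted on the slide


def volumes_needed(target_iops, per_volume=IOPS_PER_VOLUME):
    """Smallest number of striped volumes that reaches the target IOPS.

    Assumes ideal linear scaling across the stripe.
    """
    return (target_iops + per_volume - 1) // per_volume   # integer ceiling


def minimum_expected_iops(volume_count, per_volume=IOPS_PER_VOLUME):
    """Lower bound implied by 'within 10% of provisioned performance'."""
    return volume_count * per_volume * 90 // 100
```

For example, a 10,000 IOPS target needs five volumes under these assumptions, and the "within 10%" figure then implies at least 9,000 IOPS from the stripe for 99.9% of the time.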

EBS-Optimized instances.
m1.large, m2.2xlarge, m3.xlarge: 500 Mbps
m1.xlarge, m2.4xlarge, m3.2xlarge, c1.xlarge: 1000 Mbps

High bandwidth networking. cg1.4xlarge, cc2.8xlarge, hi1.4xlarge, hs1.8xlarge and cr1.8xlarge run
on non-blocking, 10 gigabit networking. Not EBS-Optimized, but can be used with provisioned IOPS volumes.

High I/O instances. Designed for high throughput database workloads.
2 x 1TB SSDs. 2 GB/s for reads. 1.1 GB/s for writes.

Para-virtual instances. 4 KB random reads: 120k IOPS. 4 KB random writes:
10k to 80k IOPS.

HVM instances (including Windows). 4 KB random reads: 90k IOPS. 4 KB
random writes: 9k to 75k IOPS.

High Storage instances. High sequential IO. 24 x 2TB drives.
2.4 GB/s of 2 MiB sequential reads. 2.6 GB/s for sequential writes.

[Diagram: Covered so far: 1. Content delivery; 2. Horizontal scale; 3. Instance selection; 4. Caching; 5. Read capacity; 6. Block store.]

%CPU kbps Back end Front end

http://amzn.to/170S1VV http://amzn.to/Zod4PY

14 rules for faster loading web sites
1. Make fewer HTTP requests.
2. Use a content delivery network.
3. Add an Expires header.
4. Gzip components.
5. Put style sheets at the top.
6. Put scripts at the bottom.
7. Avoid CSS expressions.
8. Make JavaScript and CSS external.
9. Reduce DNS lookups.
10. Minify JavaScript.
11. Avoid redirects.
12. Remove duplicate scripts.
13. Configure ETags.
14. Make AJAX cacheable.
http://stevesouders.com/hpws/rules.php
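Two of these rules, the Expires header and gzip, can be shown together in a small server-side helper. The sketch below is an illustration using only the Python standard library; `static_response` is a hypothetical name and the one-year lifetime is an assumed default, not a recommendation from the talk.

```python
import gzip
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime


def static_response(body, max_age_days=365, now=None):
    """Return (headers, compressed_body) for a long-lived static asset.

    Only appropriate when the request carried 'Accept-Encoding: gzip'.
    """
    now = now or datetime.now(timezone.utc)
    compressed = gzip.compress(body)
    headers = {
        "Content-Encoding": "gzip",
        "Content-Length": str(len(compressed)),
        "Cache-Control": f"max-age={max_age_days * 86400}",
        # HTTP dates are expressed in GMT, hence usegmt=True.
        "Expires": format_datetime(now + timedelta(days=max_age_days), usegmt=True),
    }
    return headers, compressed
```

A far-future expiry only works if asset URLs change when their content changes, for example by putting a version or content hash in the file name.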

[Diagram: 1. Content delivery; 2. Horizontal scale; 3. Instance selection; 4. Caching; 5. Read capacity; 6. Block store; 7. Front end optimization.]

One more thing...

US East

US East Sydney Dublin Route 53

Thank you! @mza
