A needle in the haystack: optimizing cloud configurations for price-performance

by Stefano Doni CTO @ akamas.io

The Problem

Cloud compute services offer overwhelming choices
EC2 instance costs range from $3.4 to $19,482 per month (on demand).
https://www.slideshare.net/AmazonWebServices/deep-dive-on-amazon-ec2-instances-performance-optimization-best-practices-cmp307r1-aws-reinvent-2018

Cloud storage services provide various price and performance points
EBS costs range from $0.025 to $0.125 per GB-month, plus provisioned IOPS.
https://www.slideshare.net/AmazonWebServices/deep-dive-on-amazon-elastic-block-storage-amazon-ebs-stg310r1-aws-reinvent-2018

Cloud compute instances and storage types are interdependent
The EC2-to-EBS network can limit actual volume performance (e.g. IOPS) and become a bottleneck!
https://www.slideshare.net/AmazonWebServices/deep-dive-on-amazon-elastic-block-storage-amazon-ebs-stg310r1-aws-reinvent-2018
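
The interdependence above can be sketched as a simple minimum: the IOPS an application actually sees is capped by whichever is lower, the IOPS provisioned on the volume or the limit of the instance's link to EBS. This is a simplified illustration, and the numbers are hypothetical placeholders, not AWS figures.

```python
def effective_iops(volume_iops: int, instance_ebs_iops_limit: int) -> int:
    """Return the IOPS the application can actually reach.

    Assumption: the instance-to-EBS link acts as a hard cap, so the
    effective value is the smaller of the two limits.
    """
    return min(volume_iops, instance_ebs_iops_limit)


# Hypothetical example values, for illustration only.
provisioned = 10_000     # IOPS paid for on the volume
instance_limit = 4_000   # IOPS the instance can push to EBS
usable = effective_iops(provisioned, instance_limit)
wasted = provisioned - usable
print(f"usable={usable} IOPS, paid for but unreachable={wasted} IOPS")
```

In this made-up case the volume is over-provisioned for the instance: a larger instance or a cheaper volume would both be better choices, which is why the two cannot be sized separately.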

The current approaches

The model-based approach, aka cloud right sizing recommendations https://cloud.google.com/compute/docs/instances/apply-sizing-recommendations-for-instances

The experimental approach, aka load test your app
“There is no substitute for measuring the performance of your entire application, because application performance can be impacted by the underlying infrastructure or by software and architectural limitations. We recommend application-level testing, including the use of application profiling and load testing tools and services”
https://aws.amazon.com/ec2/instance-types/

A bigger problem: same specs, different performance across different cloud providers
“CockroachDB 2.1 achieves 40% more throughput (tpmC) on TPC-C when tested on AWS using c5d.4xlarge than on GCP via n1-standard-16. We were shocked that AWS offered such superior performance” (Cockroach Labs)
https://www.cockroachlabs.com/blog/2018_cloud_report/

Why can’t current approaches assure optimal application performance and low costs?
• May not consider end-to-end application performance
• May not capture hidden bottlenecks
• May not capture unique application / workload behaviour
• May not factor in cloud-specific platforms and implementations (e.g. hypervisors, CPU architectures)
• Can’t scale to the sheer complexity of cloud options

The new AI-driven approach

Key capabilities
• Powered by AI
• Automated
• Full-stack
• Goal-driven

A new vision: continuous and self-driving optimization
[Diagram: a loop of Configure, Performance Test, and Measure steps, driven by a Goal]
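
The loop on this slide can be sketched as follows. This is a minimal illustration and not the Akamas engine: plain random search stands in for the AI component, the goal is assumed to be minimizing price divided by throughput, and `fake_load_test` is a hypothetical stub that returns made-up numbers instead of provisioning anything.

```python
import random
from typing import Callable, Dict, List, Tuple

Config = Dict[str, str]
Metrics = Tuple[float, float]  # (monthly price, throughput in queries/sec)


def optimize(
    candidates: List[Config],
    run_load_test: Callable[[Config], Metrics],
    experiments: int,
    seed: int = 0,
) -> Tuple[Config, float]:
    """Configure -> performance test -> measure, repeated toward a goal.

    Random search is only a placeholder for the AI-driven suggestion step.
    """
    if experiments < 1 or not candidates:
        raise ValueError("need at least one experiment and one candidate")
    rng = random.Random(seed)
    best_config, best_score = candidates[0], float("inf")
    for _ in range(experiments):
        config = rng.choice(candidates)             # configure
        price, throughput = run_load_test(config)   # performance test
        score = price / throughput                  # measure against the goal
        if score < best_score:
            best_config, best_score = config, score
    return best_config, best_score


def fake_load_test(config: Config) -> Metrics:
    # Hypothetical results, for illustration only.
    table = {
        "small": (50.0, 1000.0),
        "medium": (100.0, 4000.0),
        "large": (200.0, 5000.0),
    }
    return table[config["size"]]


candidates = [{"size": s} for s in ("small", "medium", "large")]
best, score = optimize(candidates, fake_load_test, experiments=20)
print(best, score)
```

A real engine would use the measurements from earlier experiments to choose the next configuration rather than sampling blindly, which is what keeps the number of costly load tests low.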

A real example: optimizing MongoDB on AWS

The use case
Goal
• Minimize price/performance of a MongoDB database hosted on AWS
• Performance is the throughput of the database (queries/sec); price is the monthly AWS price for the provisioned resources
Scenario
• Akamas driving automated optimization, including application load tests
• Workflow to provision AWS EC2 and EBS resources as suggested by the AI engine
Optimization scope
• AWS EC2 instances and EBS storage volumes powering MongoDB

Modeling the cloud cost-optimization problem
EC2 (example: c5d.2xlarge)
• Instance family (c)
• Instance generation (5)
• Additional capabilities (d)
• Instance size (2xlarge)
EBS (example: io1, 70 GB, 1000 IOPS)
• Volume type (io1)
• Volume size (70 GB)
• Volume IOPS (1000 IOPS)
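
One way to picture this model is as a set of parameters whose combinations form the search space. The sketch below splits an EC2 instance type name into those parameters and counts the combinations of a small space. The regular expression and the candidate value lists are illustrative assumptions, not the parameter ranges actually used in the study.

```python
import re
from itertools import product
from typing import Dict

# Assumed pattern: family letters, generation digits,
# optional capability letters, a dot, then the size.
_INSTANCE_RE = re.compile(r"^([a-z]+)(\d+)([a-z]*)\.(\w+)$")


def parse_instance_type(name: str) -> Dict[str, str]:
    """Split a name such as 'c5d.2xlarge' into its modeled parameters."""
    match = _INSTANCE_RE.match(name)
    if match is None:
        raise ValueError(f"unrecognized instance type: {name!r}")
    family, generation, capabilities, size = match.groups()
    return {
        "family": family,
        "generation": generation,
        "capabilities": capabilities,
        "size": size,
    }


print(parse_instance_type("c5d.2xlarge"))

# Hypothetical candidate values for each parameter.
space = {
    "family": ["c", "m", "r"],
    "generation": ["4", "5"],
    "capabilities": ["", "a", "d"],
    "size": ["large", "xlarge", "2xlarge"],
    "volume_type": ["gp2", "io1"],
    "volume_size_gb": [70, 100, 150],
    "volume_iops": [1000, 2000],
}

# Naive cross product. Not every combination is a real or valid offering
# (provisioned IOPS, for instance, only apply to some volume types),
# so a real model would filter this list.
configurations = list(product(*space.values()))
print(len(configurations), "candidate configurations")
```

Even this toy space yields hundreds of candidates, and each one costs a provisioning step plus a load test to evaluate, which is the motivation for a guided search instead of exhaustive testing.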

Results

AI-driven price-performance optimization results
Baseline configuration: price/performance of r4.large, gp2 70 GB
Best configuration: -68% price/performance after 18 experiments, or approx. 22 hours

Best configuration: for the same price, 3x throughput and -90% latency
• Price: -2.9%, 65.52 (best) vs 67.48 (baseline) €/month
• Throughput: +205%, 7605 (best) vs 2493 (baseline) queries/sec
• Latency (avg): -90%, 1330 (best) vs 14575 (baseline) milliseconds
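
The percentages can be re-derived from the raw figures on this slide. The sketch below does so, reading price/performance as monthly price divided by throughput. That formula is an assumption, although it reproduces the -68% reported on the results slide.

```python
def pct_change(new: float, old: float) -> float:
    """Relative change of `new` versus `old`, in percent."""
    return (new - old) / old * 100.0


# Figures as reported on the slide.
baseline = {"price_eur_month": 67.48, "queries_per_sec": 2493.0, "latency_ms": 14575.0}
best = {"price_eur_month": 65.52, "queries_per_sec": 7605.0, "latency_ms": 1330.0}

for metric in baseline:
    print(f"{metric}: {pct_change(best[metric], baseline[metric]):+.1f}%")

# Assumed definition: price divided by throughput (lower is better).
pp_baseline = baseline["price_eur_month"] / baseline["queries_per_sec"]
pp_best = best["price_eur_month"] / best["queries_per_sec"]
print(f"price/performance: {pct_change(pp_best, pp_baseline):+.1f}%")
```

Almost all of the gain comes from the throughput side: the price is nearly unchanged, so tripling the throughput cuts the price paid per unit of throughput by about two thirds.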

How did AI achieve that? A look at the best configuration
The best configuration for this workload is: m5d.large

HW specs comparison
Instance Name         Use cases          vCPUs                                  Memory (GiB)   Instance Storage      Block Storage (EBS)
r4.large (baseline)   Memory optimized   2 x Intel Xeon E5-2686                 15.25          -                     gp2 70 GB
m5d.large (best)      General purpose    2 x Custom Intel Xeon Platinum 8175M   8              1 x 150 GB NVMe SSD   n/a

AI can find unusual configurations: AMD CPUs with half memory can cut costs and still improve throughput
The cheapest configuration for this workload is m5a.large: -24% cost with +12% throughput

HW specs comparison
Instance Name          Use cases          vCPUs                    Memory (GiB)   Instance Storage   Block Storage (EBS)
r4.large (baseline)    Memory optimized   2 x Intel Xeon E5-2686   15.25          -                  gp2 70 GB
m5a.large (cheapest)   Memory optimized   2 x AMD EPYC             8              -                  gp2 114 GB

[Chart: Searching instances with EBS storage, top 5 best configurations]

Debunking a common myth: high resource usage != application performance bottleneck
Throughput is +12% higher for the m5a.large (cheapest) vs the r4.large (baseline) instance ...
… despite m5a.large (cheapest) having half the memory of r4.large (baseline)
[Charts: Memory used and Throughput, r4.large vs m5a.large]

Conclusions

Takeaways
• The technology landscape is becoming more and more complex
• Traditional approaches are not effective and can’t scale: significant optimization opportunities are left on the table
• AI for IT optimization is required and can reach previously unthinkable benefits, beyond what human experts can do
• In the cloud, 70% price/performance improvements are possible by properly exploiting the choices we have
• Cloud rightsizing recommendations may suggest higher-priced options

Q & A