
Predictability of Performance in Public Clouds - Some Empirical Data and Lessons Learned for Software Performance Testing

Invited Presentation at the LTB Workshop 2017 @ ICPE

xLeitix

April 23, 2017


Transcript

  1. software evolution & architecture lab, University of Zurich, Switzerland. Predictability of Performance in Public Clouds: Some Empirical Data and Lessons Learned for Software Performance Testing. Dr. Philipp Leitner, @xLeitix
  2. What’s the problem? [Figure: IO Bandwidth [Mb/s] over Measurement Runtime [h] for Instance 9097 and Instance 14704]
  3. What’s the problem? [Figure: same plot as above] Two identical instances, very different performance.
  4. What’s the problem? [Figure: same plot as above] Same instance over time, also very different performance.
  5. Two Kinds of Predictability
     • Inter-Instance Predictability: “How similar is the performance of multiple identical instances?”
     • Intra-Instance Predictability: “How self-similar is the performance of a single instance over time?”
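
One concrete way to read these two notions is to compute a dispersion measure such as the coefficient of variation (CoV) at two levels: across the mean performance of several identical instances (inter-instance), and within the measurement series of one instance (intra-instance). A minimal Python sketch, where the `samples` values and instance IDs are hypothetical stand-ins for traces like the IO bandwidth plot above, not data from the talk:

```python
import statistics

def cov(xs):
    """Coefficient of variation: stdev relative to the mean."""
    m = statistics.mean(xs)
    return statistics.stdev(xs) / m if m else float("nan")

# Hypothetical IO bandwidth samples [Mb/s], keyed by instance ID
samples = {
    "instance-9097":  [48.2, 47.9, 21.3, 22.0, 47.5],
    "instance-14704": [20.1, 19.8, 20.4, 20.2, 19.9],
}

# Intra-instance predictability: dispersion of one instance over time
for iid, xs in samples.items():
    print(f"{iid}: intra-instance CoV = {cov(xs):.2f}")

# Inter-instance predictability: dispersion across the instance means
means = [statistics.mean(xs) for xs in samples.values()]
print(f"inter-instance CoV of means = {cov(means):.2f}")
```
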
  6. Data Collection Approach (Tooling): J. Scheuner, P. Leitner, J. Cito, and H. C. Gall: Cloud WorkBench - Infrastructure-as-Code Based Cloud Benchmarking. 2014 IEEE 6th International Conference on Cloud Computing Technology and Science, Singapore, 2014, pp. 246-253. doi: 10.1109/CloudCom.2014.98. Code: https://github.com/sealuzh/cloud-workbench Demo: https://www.youtube.com/watch?v=0yGFGvHvobk Collected ~54,000 data points over 2 months.
  7. Hardware Heterogeneity: CPU models we observed in our tests (for m1.small and Azure Small in North America).
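
Because nominally identical instance types can be backed by different CPU models, it helps to record the processor of every instance alongside each measurement so that later variance can be explained. A minimal sketch that reads the model name from /proc/cpuinfo on Linux; the helper is mine, not part of the talk's tooling:

```python
def cpu_model(path="/proc/cpuinfo"):
    """Return the first 'model name' entry on Linux, or None."""
    with open(path) as f:
        for line in f:
            if line.startswith("model name"):
                return line.split(":", 1)[1].strip()
    return None

# Log the CPU model next to every benchmark result, e.g.
# "Intel(R) Xeon(R) CPU E5-2650 0 @ 2.00GHz"
print(cpu_model())
```
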
  8. What does this mean for performance testing? Examples from microbenchmarking io.protostuff on baremetal versus a single cloud instance on GCE. [Figures: ops/sec for generated_serialize_1_int_field and runtime_serialize_1_int_field, baremetal versus cloud]
  9. A/A Testing. Basic idea: compare two identical configurations and (hopefully) observe no diff. [Table: empty 5x5 matrix of pairwise p-values between runs 1-5 for runtime_serialize_1_int_field]
  10. A/A Testing. Basic idea: compare two identical configurations and (hopefully) observe no diff. Pairwise p-values between five runs of runtime_serialize_1_int_field:

      Runs  1         2         3         4         5
      1     X         0.001041  0.000472  0.04298   0.2211
      2     0.001041  X         0.862     0.4291    0.003211
      3     0.000472  0.862     X         0.4135    0.007995
      4     0.04298   0.4291    0.4135    X         0.04909
      5     0.2211    0.003211  0.007995  0.04909   X
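
The matrix makes the punchline visible: these are pairwise tests between runs of the *same* configuration, yet several p-values fall below the usual 0.05 threshold, so a naive reading would declare identical setups "different". A minimal sketch of producing such a matrix, assuming five hypothetical lists of ops/sec samples and using a two-sided Mann-Whitney U test (the slides do not name the exact test, so treat that choice as illustrative):

```python
from itertools import combinations
from scipy.stats import mannwhitneyu

# Hypothetical ops/sec samples from five runs of the *same* configuration
runs = [
    [41.2, 40.8, 41.5, 40.9, 41.1],
    [39.7, 40.1, 39.9, 40.3, 39.8],
    [40.0, 39.8, 40.2, 40.1, 39.9],
    [40.5, 40.9, 40.2, 40.7, 40.4],
    [41.0, 40.6, 41.3, 40.8, 41.2],
]

# Pairwise A/A tests: ideally, no pair should differ significantly
for (i, a), (j, b) in combinations(enumerate(runs, start=1), 2):
    p = mannwhitneyu(a, b, alternative="two-sided").pvalue
    flag = "  <- 'significant' despite identical configs" if p < 0.05 else ""
    print(f"run {i} vs run {j}: p = {p:.4f}{flag}")
```

If A/A comparisons already produce "significant" differences of a given magnitude, then an A/B difference of similar magnitude on the same setup cannot be trusted.
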
  11. Mitigation Strategies
      (Level 0) Use baremetal / dedicated hardware if feasible
      (Level 1) A/A testing is key
      (Level 2) Test different providers
      (Level 3) Scale up
  12. Mitigation Strategies
      (Level 0) Use baremetal / dedicated hardware if feasible
      (Level 1) A/A testing is key
      (Level 2) Test different providers
      (Level 3) Scale up
      (Level 4) Experiment interleaving
  13. Experiment Interleaving: alternate the versions under test on the same instance (V1 V2 V1 V2 V1 …). Listen to this talk @ ICPE for more info and experiments :)
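
The point of interleaving is that slow drift in instance performance affects both versions roughly equally, instead of biasing whichever version happens to run later. A minimal scheduling sketch with hypothetical benchmark callables; it illustrates the V1 V2 V1 V2 pattern, not the harness actually used in the experiments:

```python
import time

def run_trial(bench):
    """Time one trial of a benchmark callable (wall-clock, illustrative)."""
    start = time.perf_counter()
    bench()
    return time.perf_counter() - start

def interleave(bench_v1, bench_v2, trials=10):
    """Alternate V1 and V2 trials on the same instance (V1 V2 V1 V2 ...),
    so drift over time hits both versions about equally."""
    v1_times, v2_times = [], []
    for _ in range(trials):
        v1_times.append(run_trial(bench_v1))
        v2_times.append(run_trial(bench_v2))
    return v1_times, v2_times

# Hypothetical workloads standing in for two versions under test
v1, v2 = interleave(lambda: sum(range(100_000)), lambda: sum(range(110_000)))
print(f"V1 mean: {sum(v1)/len(v1):.6f}s, V2 mean: {sum(v2)/len(v2):.6f}s")
```
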
  14. Mitigation Strategies
      (Level 0) Use baremetal / dedicated hardware if feasible
      (Level 1) A/A testing is key
      (Level 2) Test different providers
      (Level 3) Scale up
      (Level 4) Experiment interleaving
      (Level 5) Admit defeat (fine-grained diffs might just not be discoverable for you)
  15. Summary (software evolution & architecture lab): Inter-Instance Predictability and Intra-Instance Predictability. [Figure: IO Bandwidth [Mb/s] over Measurement Runtime [h] for Instance 9097 and Instance 14704] Philipp Leitner @xLeitix. Main contacts: Jürgen Cito @citostyle, Christoph Laaber @chrstphlbr, Joel Scheuner @joe4dev
  16. Summary (software evolution & architecture lab). Dr. Philipp Leitner, University of Zurich, Switzerland. Some insights from benchmarking EC2, GCE, Azure, and Softlayer. More info: Philipp Leitner and Jürgen Cito. 2016. Patterns in the Chaos - A Study of Performance Variation and Predictability in Public IaaS Clouds. ACM Trans. Internet Technol. 16, 3, Article 15 (April 2016), 23 pages. DOI: http://dx.doi.org/10.1145/2885497. Philipp Leitner @xLeitix. Main contacts: Jürgen Cito @citostyle, Christoph Laaber @chrstphlbr, Joel Scheuner @joe4dev