Slide 1

software evolution & architecture lab, University of Zurich, Switzerland
Predictability of Performance in Public Clouds
Some Empirical Data and Lessons Learned for Software Performance Testing
Dr. Philipp Leitner @xLeitix

Slide 2

Joint Work with … Cloud Benchmarking Performance Testing

Slide 3

Performance Testing …
• Load Tests
• (Micro-)Benchmarking

Slide 4

A/B Testing: Basic Statistical Approach. Compare measurements of version V1 against version V2 with a significance test (e.g., p = 0.0123).
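
A minimal sketch of such a comparison, assuming the measurements are available as two lists of throughput samples; the values and the choice of SciPy's Mann-Whitney U test are illustrative assumptions, not the exact procedure behind the slide:

    # Sketch: compare benchmark samples of two versions with a non-parametric test.
    # All sample values below are hypothetical.
    from scipy import stats

    v1 = [101.2, 99.8, 100.5, 102.1, 98.9]    # hypothetical V1 measurements (ops/sec)
    v2 = [105.4, 104.9, 106.2, 103.8, 105.1]  # hypothetical V2 measurements (ops/sec)

    # Is the difference between the two samples statistically significant?
    _, p = stats.mannwhitneyu(v1, v2, alternative="two-sided")
    print(f"p = {p:.4f}")  # a small p (like the 0.0123 on the slide) suggests a real difference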

Slide 5

Execution Environment: Cloud vs. Baremetal. Cloud is cheap and readily available.

Slide 6

What’s the problem?
[Plot: IO Bandwidth [Mb/s] over Measurement Runtime [h] for Instance 9097 and Instance 14704]

Slide 7

What’s the problem?
[Plot: IO Bandwidth [Mb/s] over Measurement Runtime [h] for Instance 9097 and Instance 14704]
Two identical instances - very different performance

Slide 8

What’s the problem?
[Plot: IO Bandwidth [Mb/s] over Measurement Runtime [h] for Instance 9097 and Instance 14704]
Same instance over time - also very different performance

Slide 9

Two Kinds of Predictability
• Inter-Instance Predictability: “How similar is the performance of multiple identical instances?”
• Intra-Instance Predictability: “How self-similar is the performance of a single instance over time?”
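
Both notions can be quantified with the relative standard deviation (standard deviation divided by mean) that the later slides report; below is a minimal sketch with made-up IO bandwidth numbers:

    # Sketch: relative standard deviation (RSD) as a measure of both kinds of
    # predictability. The bandwidth numbers are invented for illustration.
    import statistics

    def rsd(samples):
        """Relative standard deviation: stdev / mean."""
        return statistics.stdev(samples) / statistics.mean(samples)

    # One mean IO bandwidth [Mb/s] per identical instance (inter-instance view)
    per_instance_means = [38.0, 22.5, 41.3, 19.8]
    # Repeated measurements on a single instance over time (intra-instance view)
    single_instance_runs = [37.5, 39.1, 21.0, 38.8, 40.2]

    print("inter-instance RSD:", round(rsd(per_instance_means), 2))
    print("intra-instance RSD:", round(rsd(single_instance_runs), 2))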

Slide 10

Let’s look at some data!

Slide 11

Data Collection Approach - Benchmarks

Slide 12

Data Collection Approach - Tooling
J. Scheuner, P. Leitner, J. Cito and H.C. Gall: Cloud Work Bench - Infrastructure-as-Code Based Cloud Benchmarking. 2014 IEEE 6th International Conference on Cloud Computing Technology and Science, Singapore, 2014, pp. 246-253. doi: 10.1109/CloudCom.2014.98
Code: https://github.com/sealuzh/cloud-workbench
Demo: https://www.youtube.com/watch?v=0yGFGvHvobk
Collected ~54,000 data points over 2 months

Slide 13

Inter-Instance Predictability

Slide 14

Inter-Instance Predictability: Relative Standard Deviations

Slide 15

Hardware Heterogeneity: CPU Models we observed in our tests (for m1.small and Azure Small in North America)

Slide 16

Intra-Instance Predictability: Relative Standard Deviations of Benchmark Runs Within the Same Instance

Slide 17

What does this mean for performance testing?
Examples from microbenchmarking io.protostuff on baremetal versus a single cloud instance on GCE
[Plots: ops / sec for generated_serialize_1_int_field and runtime_serialize_1_int_field, Baremetal vs. Cloud]

Slide 18

What does this mean for performance testing? Enter AWS

Slide 19

Sanity Checking Approach: A/A Testing. Compare version V1 against itself (e.g., p = 0.5467).

Slide 20

A/A Testing
Basic idea: Compare two identical configurations, (hopefully) observe no diff.
[Table skeleton: pairwise p-values between runs 1-5 of runtime_serialize_1_int_field]

Slide 21

A/A Testing
Basic idea: Compare two identical configurations, (hopefully) observe no diff.
Pairwise p-values between runs of runtime_serialize_1_int_field:

Runs    1          2          3          4          5
1       X          0.001041   0.000472   0.04298    0.2211
2       0.001041   X          0.862      0.4291     0.003211
3       0.000472   0.862      X          0.4135     0.007995
4       0.04298    0.4291     0.4135     X          0.04909
5       0.2211     0.003211   0.007995   0.04909    X
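
A matrix like the one above could be produced along the lines of this sketch; the per-run samples are hypothetical placeholders, and the Mann-Whitney U test is an assumed choice of significance test:

    # Sketch: build a pairwise p-value matrix from five A/A runs of the same
    # benchmark. All sample values are hypothetical placeholders.
    from itertools import combinations
    from scipy import stats

    runs = {
        1: [101.0, 99.5, 100.2, 98.7, 101.3],
        2: [104.1, 103.6, 105.0, 104.4, 103.9],
        3: [104.0, 103.8, 104.9, 104.2, 104.5],
        4: [102.2, 101.9, 103.1, 102.7, 102.4],
        5: [100.8, 99.9, 101.5, 100.3, 100.9],
    }

    for a, b in combinations(runs, 2):
        _, p = stats.mannwhitneyu(runs[a], runs[b], alternative="two-sided")
        note = "  <-- 'difference' between identical configurations!" if p < 0.05 else ""
        print(f"runs {a} vs {b}: p = {p:.4f}{note}")

Any p-value below 0.05 here is a false alarm, which is exactly what the A/A sanity check is meant to expose before trusting an A/B result.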

Slide 22

Mitigation Strategies
(Level 0) Use baremetal / dedicated if feasible
(Level 1) A/A testing is key
(Level 2) Test different providers
(Level 3) Scale up

Slide 23

Scaling Up
Instance 1: V1 V2 V1 …
Instance 2: V1 V2 V1 …
…
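
One way to use scaled-up experiments is a paired per-instance comparison, sketched below; run_benchmark(), the instance speed factors, and the assumed 5% speedup of V2 are all hypothetical, not part of the talk:

    # Sketch: run both versions on every instance and compare the paired
    # per-instance results, so no single slow or fast machine dominates.
    # run_benchmark() is a hypothetical stand-in for a real benchmark harness.
    import random
    from scipy import stats

    def run_benchmark(version, instance_speed):
        speedup = 1.05 if version == "V2" else 1.0       # V2 assumed 5% faster
        return 100.0 * instance_speed * speedup * random.gauss(1.0, 0.02)

    instance_speeds = [random.uniform(0.6, 1.0) for _ in range(10)]  # heterogeneous hardware
    v1 = [run_benchmark("V1", s) for s in instance_speeds]
    v2 = [run_benchmark("V2", s) for s in instance_speeds]

    # Paired test: each instance contributes one V1/V2 pair measured on the same hardware.
    _, p = stats.wilcoxon(v1, v2)
    print(f"paired comparison across {len(instance_speeds)} instances: p = {p:.4f}")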

Slide 24

Mitigation Strategies
(Level 0) Use baremetal / dedicated if feasible
(Level 1) A/A testing is key
(Level 2) Test different providers
(Level 3) Scale up
(Level 4) Experiment interleaving

Slide 25

Experiment Interleaving
Instance: V1 V2 V1 V2 V1 …
Listen to this talk @ ICPE for more info and experiments :)
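
A sketch of interleaving two versions on one instance whose performance drifts over time; run_trial(), the drift model, and the assumed 5% speedup are hypothetical illustrations:

    # Sketch: alternate V1 and V2 trials on a single instance so that slow and
    # fast phases of the instance affect both versions equally.
    # run_trial() is a hypothetical stand-in for one benchmark execution.
    import random
    from scipy import stats

    def run_trial(version, t):
        drift = 1.0 - 0.3 * (t / 100)                    # instance slowly degrades over time
        speedup = 1.05 if version == "V2" else 1.0       # V2 assumed 5% faster
        return 100.0 * drift * speedup * random.gauss(1.0, 0.02)

    v1, v2 = [], []
    for t in range(100):
        version = "V1" if t % 2 == 0 else "V2"           # interleave: V1 V2 V1 V2 ...
        (v1 if version == "V1" else v2).append(run_trial(version, t))

    _, p = stats.mannwhitneyu(v1, v2, alternative="two-sided")
    print(f"interleaved comparison: p = {p:.4f}")

Because both versions experience the same slow and fast phases of the instance, the drift no longer biases the comparison toward either version.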

Slide 26

Mitigation Strategies
(Level 0) Use baremetal / dedicated if feasible
(Level 1) A/A testing is key
(Level 2) Test different providers
(Level 3) Scale up
(Level 4) Experiment interleaving
(Level 5) Admit defeat (fine-grained diffs might just not be discoverable for you)

Slide 27

software evolution & architecture lab
Summary
• Inter-Instance Predictability
• Intra-Instance Predictability
[Plot: IO Bandwidth [Mb/s] over Measurement Runtime [h], Instance 9097 vs. Instance 14704]
Philipp Leitner @xLeitix
Main contacts: Jürgen Cito @citostyle, Christoph Laaber @chrstphlbr, Joel Scheuner @joe4dev

Slide 28

software evolution & architecture lab
Dr. Philipp Leitner, University of Zurich, Switzerland
Summary: Some insights from benchmarking EC2, GCE, Azure, and Softlayer
More info: Philipp Leitner and Jürgen Cito. 2016. Patterns in the Chaos — A Study of Performance Variation and Predictability in Public IaaS Clouds. ACM Trans. Internet Technol. 16, 3, Article 15 (April 2016), 23 pages. DOI: http://dx.doi.org/10.1145/2885497
Philipp Leitner @xLeitix
Main contacts: Jürgen Cito @citostyle, Christoph Laaber @chrstphlbr, Joel Scheuner @joe4dev