How to Scale in the Cloud: Chargeback is Back, Baby!

The need for system administrators—especially Linux sys admins—to do performance management has returned with a vengeance. Why? The cloud. Resource consumption in the cloud is all about run now, pay later (AKA 'chargeback' in mainframe-ese). This talk will show you how performance models can help to find the most cost-effective deployment of your applications on Amazon Web Services (AWS). The same technique should be transferable to other cloud services. Presented at Rocky Mountain CMG, Denver, Colorado.

Dr. Neil Gunther

December 05, 2019

Transcript

  1. How to Scale in the Cloud: Chargeback is Back, Baby!

    Neil J. Gunther (@DrQz), Performance Dynamics
    Rocky Mountain CMG, Denver, Colorado, December 5, 2019
    © 2019 Performance Dynamics · How to Scale in the Cloud · December 6, 2019 · 1 / 38
  2. Everything old is new again: Abstract

    The need for system administrators, especially Linux sys admins, to do performance management has returned with a vengeance. Why? The cloud. Resource consumption in the cloud is all about run now, pay later¹ (AKA chargeback² in mainframe-ese). This talk will show you how performance models can help to find the most cost-effective deployment of your applications on Amazon Web Services (AWS). The same technique should be transferable to other cloud services.

    ¹ Chargeback disappeared with the arrival of the PC revolution and the advent of distributed client-server architectures.
    ² Chargeback underpins the cloud business model, especially when it comes to the development of hot applications, e.g., “Microsoft wants every developer to be an AI developer, which would help its already booming Azure Cloud business do better still: AI demands data, which requires cloud processing power and generates bills.” (The Register, May 2018)
  3. Previous work

    1. Joint work with Mohit Chawla, Senior Systems Engineer, Hamburg, Germany
         First foray into modeling cloud applications with PDQ
         Period from June 2016 to April 2018
         First validated cloud queueing model (AFAIK)
    2. Presented jointly at CMG cloudXchange, July 2018
    3. Published in Linux Magazin, February 2019 (in German)
         English manuscript available on arXiv.org
  4. AWS cloud environment: Outline

    1. AWS cloud environment
    2. Production data acquisition
    3. Initial scaling model
    4. Corrected scaling model
    5. AWS auto-scaling costs
    6. Cloudy economics
  5. AWS cloud environment: AWS Cloud Application Platform

    Entire application runs in the Amazon cloud
      Elastic Load Balancer (ELB)
      Amazon Elastic Compute Cloud (EC2), instance type m4.10xlarge: 20 CPUs = 40 vCPUs
      Auto Scaling group (A/S)
    Mobile users make requests to the Apache HTTP server³ via the ELB on EC2
    Tomcat thread server⁴ on EC2 calls external services (i.e., 3rd-party web servers)
    A/S controls the number of active EC2 instances based on incoming traffic and configured policies
    ELB balances incoming traffic across all active EC2 nodes in the AWS cluster

    ³ Versions 2.2 and 2.4
    ⁴ Versions 7 and 8
  6. AWS cloud environment: Request Processing Workflow

    On a single EC2 instance:
    1. An incoming HTTP request from a mobile user is processed by Apache + Tomcat
    2. Tomcat then sends multiple requests to external services based on the original request
    3. External services respond, and Tomcat computes business logic based on all those responses
    4. Tomcat sends the final response back to the originating mobile user
  7. Section 2 of the Outline: Production data acquisition
  8. Production data acquisition: Data Tools and Scripts

    JMX (Java Management Extensions) data from the JVM: jmxterm, VisualVM, Java Mission Control
    Datadog dd-agent (Datadog also integrates with AWS CloudWatch metrics)
    Collectd: Linux performance data collection
    Graphite and statsd: application metrics collection and storage
    Grafana: time-series data plotting
    Custom data collection scripts by M. Chawla
    R statistical libs and RStudio IDE
    PDQ performance modeling tool by NJG
  9. Production data acquisition: Distilled EC2 Instance Data

    These few performance metrics are sufficient to parameterize our PDQ model:

    Timestamp,     Xdat,       Nest,       Sest,     Rdat,     Udat
    1486771200000, 502.171674, 170.266663, 0.000912, 0.336740, 0.458120
    1486771500000, 494.403035, 175.375000, 0.001043, 0.355975, 0.515420
    1486771800000, 509.541751, 188.866669, 0.000885, 0.360924, 0.450980
    1486772100000, 507.089094, 188.437500, 0.000910, 0.367479, 0.461700
    1486772400000, 532.803039, 191.466660, 0.000880, 0.362905, 0.468860
    1486772700000, 528.587722, 201.187500, 0.000914, 0.366283, 0.483160
    1486773000000, 533.439054, 202.600006, 0.000892, 0.378207, 0.476080
    1486773300000, 531.708059, 208.187500, 0.000909, 0.392556, 0.483160
    1486773600000, 532.693783, 203.266663, 0.000894, 0.379749, 0.476020
    1486773900000, 519.748550, 200.937500, 0.000895, 0.381078, 0.465260
    ...

    The Unix timestamps are in milliseconds, and the interval between rows is 300 seconds.
    Little's law (LL) gives relationships between the above metrics:
    1. Nest = Xdat × Rdat: macroscopic LL ⇒ thread concurrency
    2. Udat = Xdat × Sest: microscopic LL ⇒ resource service times
    LL provides a consistency check of the data.
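The two Little's law relationships can be checked mechanically, row by row. Below is a minimal Python sketch. It assumes the rows are available as CSV text with the column names shown. The helper name `littles_law_check` is hypothetical. Sest is itself estimated as Udat / Xdat (see the Dummy Queues slide), so the microscopic check mostly confirms rounding. The macroscopic check is the more informative one.

```python
import csv
import io

# The ten rows of distilled EC2 data shown above (Timestamp is Unix time in ms).
DATA = """\
Timestamp,Xdat,Nest,Sest,Rdat,Udat
1486771200000,502.171674,170.266663,0.000912,0.336740,0.458120
1486771500000,494.403035,175.375000,0.001043,0.355975,0.515420
1486771800000,509.541751,188.866669,0.000885,0.360924,0.450980
1486772100000,507.089094,188.437500,0.000910,0.367479,0.461700
1486772400000,532.803039,191.466660,0.000880,0.362905,0.468860
1486772700000,528.587722,201.187500,0.000914,0.366283,0.483160
1486773000000,533.439054,202.600006,0.000892,0.378207,0.476080
1486773300000,531.708059,208.187500,0.000909,0.392556,0.483160
1486773600000,532.693783,203.266663,0.000894,0.379749,0.476020
1486773900000,519.748550,200.937500,0.000895,0.381078,0.465260
"""


def littles_law_check(text):
    """Return (timestamp, N relative error, U relative error) for each row."""
    results = []
    for row in csv.DictReader(io.StringIO(text)):
        x = float(row["Xdat"])
        n_ll = x * float(row["Rdat"])  # macroscopic LL: N = X * R
        u_ll = x * float(row["Sest"])  # microscopic LL: U = X * S
        n_err = abs(n_ll - float(row["Nest"])) / float(row["Nest"])
        u_err = abs(u_ll - float(row["Udat"])) / float(row["Udat"])
        results.append((int(row["Timestamp"]), n_err, u_err))
    return results


if __name__ == "__main__":
    for ts, n_err, u_err in littles_law_check(DATA):
        print(f"{ts}  N error {n_err:6.2%}  U error {u_err:6.2%}")
```

On these ten rows, X × R agrees with Nest to within about 4 percent. X × S agrees with Udat to within about 0.1 percent.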
  10. Section 3 of the Outline: Initial scaling model
  11. Initial scaling model: Usual Time Series (Monitoring) View

    Our brains are not built for this.
    Want the best impedance match for cognitive processing⁵.

    ⁵ “Seeing It All at Once with Barry,” Gunther and Jauvin, CMG 2007
  12. Initial scaling model: Time-Independent (Steady State) View

    [Figure, two panels: X vs. N, canonical closed-queue throughput; R vs. N, canonical closed-queue latency]
    Queueing theory with finite requests tells us what to expect:
      Relationship between metrics, e.g., X and N
      Number of requests seen in our daily data: N up to about 500
      Throughput X approaches a saturation ceiling as N → 500 (concave)
      Response time R grows linearly as a “hockey stick handle” (convex)
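Standard asymptotic bound analysis of a closed queue makes the two expected shapes concrete. Throughput is capped by both N/(D + Z) and 1/Dmax, which gives the concave X(N). Response time is bounded below by both D and N·Dmax − Z, which gives the hockey stick R(N). Here D is the total service demand and Dmax is the bottleneck demand. The sketch below uses hypothetical demand values, not values fitted to the production data. The function names are also hypothetical.

```python
def closed_bounds(n, d_total, d_max, z=0.0):
    """Asymptotic bounds for a closed queueing model with n requests.

    d_total: total service demand D (the response time when n = 1)
    d_max:   largest single service demand Dmax (the bottleneck)
    z:       think time Z (0 for the mobile users here)
    Returns (upper bound on X, lower bound on R).
    """
    x_upper = min(n / (d_total + z), 1.0 / d_max)
    r_lower = max(d_total, n * d_max - z)
    return x_upper, r_lower


def knee(d_total, d_max, z=0.0):
    """Concurrency N* where the two throughput asymptotes cross."""
    return (d_total + z) / d_max


if __name__ == "__main__":
    D, DMAX = 0.4, 0.001  # hypothetical demands in seconds, not fitted values
    for n in (100, 200, 400, 600, 800):
        x, r = closed_bounds(n, D, DMAX)
        print(f"N={n:4d}  X<={x:7.1f} req/s  R>={r:5.3f} s")
    print("knee N* =", knee(D, DMAX))
```

Below the knee, throughput rises linearly with N and R stays flat at D. Above it, X is pinned at 1/Dmax and R climbs along the hockey-stick handle.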
  13. Initial scaling model: Production X-N Data, July 2016

    [Figure: Production Data July 2016; throughput (req/s) vs. concurrent users]
  14. Initial scaling model: PDQ Model of Throughput X(N)

    [Figure: PDQ Model of Production Data July 2016; throughput (req/s) vs. concurrent users; legend: Data, PDQ; annotations: Nopt = 174.5367, thrds = 250.00]
  15. Initial scaling model: PDQ Model of Response Time R(N)

    [Figure: PDQ Model of Production Data July 2016; response time (s) vs. concurrent users; legend: Data, PDQ; annotations: Nopt = 174.5367, thrds = 250.00]
  16. Initial scaling model: PDQ (closed) Queueing Model

    [Figure: closed queueing model; N requests with think time Z feed a single queue with service time S; R(N), X(N), and λ(N) marked]
    Finite N mobile user-requests
    Think time Z = 0 for mobile
    Only 1 service time measured: Tomcat on CPU, S = 0.8 ms
    Only 1 queue is definable
      The queue represents the Tomcat server on an EC2 instance
    λ(N): mean request rate
    R(N): Tomcat response time
    X(N): Tomcat throughput
    X(N) = λ(N) in steady state
  17. Initial scaling model: PDQ (closed) Queueing Model (the previous slide repeated, with one addition)

    Erm ... except there's a small problem ...
  18. Initial scaling model: Dummy Queues

    A single request (N = 1) takes about 1 ms.
    But the first data point in the plot occurs at Nest = 133.8338 (see Slide 15, PDQ Model of Response Time R(N)):

    Nest      Xdat      Sest     Rdat     Udat
    133.8338  416.4605  0.00088  0.32136  0.36642

    for which R = 321.36 ms.

    Precise service time from LL, Sest = Udat / Xdat:
    > 0.36642 / 416.4605
    [1] 0.0008798433
    Sest = 0.0008798433 s = 0.8798433 ms ≈ 1 ms.

    Using the linear hockey-handle characteristic R = N × Sest (Z = 0):
    > 321.36 - (0.8798433 * 133.8338)
    [1] 203.6072
    The handle therefore underestimates the measured 321.36 ms by 203.6 ms.
    Compensate with 200 dummy queues, each with S of at most about 1 ms.
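PDQ solves closed models of this kind with mean value analysis (MVA). The sketch below is a standalone exact-MVA implementation in Python, not PDQ's own code. It shows how dummy queues raise the response time at the first data point. It assumes the 200 dummy queues all have S equal to Sest (0.88 ms), which is the largest value the rule "dummy queues cannot have S > Sbottleneck" allows. The names `mva`, `S_TOMCAT`, and `N_DATA` are hypothetical.

```python
def mva(service_times, n_max, think_time=0.0):
    """Exact MVA for a closed, single-class network of FCFS queues.

    Returns [(N, X, R), ...] for N = 1 .. n_max.
    """
    queue_lengths = [0.0] * len(service_times)
    results = []
    for n in range(1, n_max + 1):
        # Arrival theorem: an arriving request sees the queues as they were with N - 1.
        residence = [s * (1.0 + q) for s, q in zip(service_times, queue_lengths)]
        r = sum(residence)
        x = n / (r + think_time)
        # Little's law at each queue gives the new mean queue lengths.
        queue_lengths = [x * rk for rk in residence]
        results.append((n, x, r))
    return results


S_TOMCAT = 0.00088   # Sest from the row above, in seconds
N_DATA = 134         # first data point, Nest = 133.8338 rounded
tomcat_only = mva([S_TOMCAT], N_DATA)
with_dummies = mva([S_TOMCAT] * 201, N_DATA)  # hypothetical: 200 dummies at S_TOMCAT

if __name__ == "__main__":
    print("R(134), Tomcat queue only:", round(tomcat_only[-1][2], 4), "s")
    print("R(134), with 200 dummies: ", round(with_dummies[-1][2], 4), "s")
```

With Z = 0, R(134) is about 0.118 s for the Tomcat queue alone and about 0.294 s with the dummies. The dummies move the model toward the measured 321 ms, but they have no obvious physical counterpart.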
  19. Initial scaling model: 3-Tier Model, CMG 2001

    Similar model in Chap. 12 of my Perl::PDQ book
    [Figure: N clients (Z = 0 ms) send requests to a Web Server (Dws), App Server (Das), and DBMS Server (Ddb), plus a set of dummy servers; responses return to the clients]
    Based on the CMG 2001 Best Paper by Buch & Pentkovski (Intel Corp.)
    Tricky: dummy queues cannot have S > Sbottleneck
    (The meaning of these dummy latencies was never resolved)
  20. Initial scaling model: Outstanding Issues

    The PDQ Tomcat model fits the data visually, but ...
    Need ∼200 dummy queues to get the correct Rmin
      What do they represent in the actual Tomcat server?
    Service time Sest ≈ 0.001 s = 1 ms
      From the table on Slide 9
      CPU time derived from Udat = ρCPU in /proc
    Oh yeah, and what about those external service times?
    Hypotheses:
      (a) Successive polling (visits) to external services? (MC)
      (b) Some kind of hidden parallelism? (NJG) ... see Slide 23, “Hypothesis (b) ... Backwards”
  21. Initial scaling model: Outstanding Issues (the previous slide repeated, with one addition)

    All this remained unresolved until ...
  22. Initial scaling model: New Data Breaks PDQ Model

    Guerrilla mantra 1.16: Data comes from the Devil, only models come from God. (except when they don't)
  23. Section 4 of the Outline: Corrected scaling model
  24. Corrected scaling model: Surprisingly ... Less (Data) Is Better

    Too much initial data clouded⁶ the actual scaling behavior.

    ⁶ See what I did there?
  25. Corrected scaling model: Hypothesis (b) ... Backwards⁷

    [Excerpt from the published paper; the table's column headings are not recoverable from the transcript]
    October 2016  Sched  300  300  444.41  411.62 ± 7.36   675.05  651.54 ± 3.66   Approx. 10%
    March 2018    Spot   254  254  203.60  199.36 ± 1.48  1247.54  1192.03 ± 8.54  Approx. 90%
    † Nknee is an input parameter to the PDQ model
    ‡ Corrected PDQ model

    Parallel is Just Fast Serial
    From the standpoint of queueing theory, parallel processing can be regarded as a form of fast serial processing. The left side of the diagram shows a pair of parallel queues. Requests arriving from outside at rate λ are split equally, so each of the two queues receives the reduced rate λ/2. Assume λ = 0.5 requests/second and S = 1 second. When a request joins the tail of one of the parallel waiting lines, its expected time to get through that queue (waiting + service) is given by equation (1) in Berechenbare Performance [9], namely:

    $$T_{\text{para}} = \frac{S}{1 - (\lambda/2)\,S} = 1.33 \text{ seconds} \tag{1}$$

    The right side of the diagram shows two queues in tandem, each twice as fast (S/2) as a parallel queue. Since the arrival flow is not split, the expected time to get through both queues is the sum of the times spent in each queue:

    $$T_{\text{serial}} = \frac{S/2}{1 - \lambda\,(S/2)} + \frac{S/2}{1 - \lambda\,(S/2)} = \frac{S}{1 - \lambda\,(S/2)} = 1.33 \text{ seconds} \tag{2}$$

    T_serial in equation (2) is identical to T_para in equation (1). Conversely, multi-stage serial processing can be transformed into an equivalent form of parallel processing [6, 8]. This insight helped identify the “hidden parallelism” in the July and October 2016 performance data, which led to the correction of the initial PDQ Tomcat model.

    ⁷ Inspired by a CMG 1993 paper, I developed an algorithm to solve parallel queues in the PDQ analyzer circa 1994, based on my observation above, and used it in my 1998 book The Practical Performance Analyst.
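Equations (1) and (2) can be checked numerically. The sketch below assumes the residence-time formula S/(1 − ρ) for a single-server queue with Poisson arrivals and exponential service (M/M/1). That assumption is what both equations use. The function names are hypothetical.

```python
def t_parallel(lam, s):
    """Residence time in one of two parallel M/M/1 queues, each fed at rate lam/2."""
    rho = (lam / 2.0) * s
    if rho >= 1.0:
        raise ValueError("unstable: utilization must be below 1")
    return s / (1.0 - rho)


def t_serial(lam, s):
    """Total residence time through two tandem M/M/1 queues, each with service time s/2."""
    half = s / 2.0
    rho = lam * half  # the full arrival stream visits both stages
    if rho >= 1.0:
        raise ValueError("unstable: utilization must be below 1")
    per_stage = half / (1.0 - rho)
    return per_stage + per_stage


if __name__ == "__main__":
    # lambda = 0.5 req/s and S = 1 s, as in the text: both print 4/3 s (about 1.33 s).
    print(t_parallel(0.5, 1.0), t_serial(0.5, 1.0))
```

The equality holds for any stable λ because both queues see the same utilization, λS/2.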
  26. Corrected scaling model: Parallel PDQ Model of Throughput X(N)

    [Figure: PDQ Model of Oct 2016 Data; throughput (req/s) vs. concurrent users; corrected PDQ model (blue dots)]
  27. Corrected scaling model: Parallel PDQ Model of Response Time R(N)

    [Figure: PDQ Model of Oct 2016 Data; response time (s) vs. concurrent users; legend: Data, PDQ; corrected PDQ model (blue dots)]
  28. Corrected scaling model: Parallel PDQ Model

    [Figure: closed queueing model; N requests with think time Z; a single waiting line W feeds m parallel service facilities, each with service time S; R(N), X(N), and λ(N) marked]
    Key differences:
      Rmin is dominated by time inside external services
      True service time is Rmin: S = 444.4 ms (not CPU time)
      Tomcat threads are now parallel service facilities
      A single waiting line (W) produces the hockey handle
        Like Fry's, where every customer waits in one line for their own cashier
      But where is W located in the EC2 system? (still unresolved)
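Here is a minimal sketch of the corrected model's throughput and response time. It assumes the slide's setup: Z = 0, a single waiting line, and m parallel Tomcat threads, each with service time S. With Z = 0, every request is always at the station, so exactly min(N, m) servers are busy.

Taking m = Nknee and S = Rmin, the sketch reproduces the PDQ throughput values quoted in the table on the "Hypothesis (b)" slide:
- 300 / 0.44441 s ≈ 675.05 req/s for October 2016
- 254 / 0.20360 s ≈ 1247.54 req/s for March 2018

The function name is hypothetical, and this is not PDQ's multi-server solver.

```python
def parallel_model(n, m, s):
    """Corrected-model sketch: n requests, Z = 0, one waiting line, m parallel servers.

    Every request is always present at the station, so min(n, m) servers are busy.
    Returns (X, R): throughput in req/s and response time in seconds.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    x = min(n, m) / s
    r = n / x  # Little's law with Z = 0: R = N / X
    return x, r


if __name__ == "__main__":
    # October 2016 inputs: m = Nknee = 300 threads, S = 0.44441 s.
    for n in (100, 300, 450, 600):
        x, r = parallel_model(n, m=300, s=0.44441)
        print(f"N={n:3d}  X={x:7.2f} req/s  R={r:.4f} s")
```

Below m, R stays at S and X grows linearly. Above m, X is flat at m/S and R grows as N·S/m, which is the single-waiting-line hockey handle.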
  29. Corrected scaling model: PDQ Numerical Validation 2016

    [Figure, two panels: PDQ Model of Oct 2016 Data, throughput (req/s) vs. concurrent users; PDQ Model of Oct 2016 Data, response time (s) vs. concurrent users; legend: Data, PDQ]
  30. Corrected scaling model: Auto-Scaling Knee and Pseudo-Saturation

    [Figure: PDQ Model of Oct 2016 Data; throughput (req/s) vs. concurrent users; vertical line at Nknee]
    The A/S policy is triggered when instance CPU busy > 75%
    This induces pseudo-saturation at Nknee = 300 threads (vertical line)
    No additional Tomcat threads are invoked above Nknee in this instance
    A/S spins up additional new EC2 instances (elastic capacity)
  31. Section 5 of the Outline: AWS auto-scaling costs
  32. AWS auto-scaling costs: AWS Scheduled Scaling

    A/S policy threshold: CPU > 75%
    Additional EC2 instances require up to 10 minutes to spin up
    Based on the PDQ model, we considered pre-emptive scheduling of EC2 instances (by clock)
      Cheaper than A/S, but only 10% savings
    Use N service threads to size the number of EC2 instances required for incoming traffic
    Removes expected spikes in latency and traffic (seen in time-series analysis)
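One way to read "use N service threads to size the number of EC2 instances" is as a sketch like the one below. It assumes each instance should carry at most Nknee concurrent threads (300 in October 2016) and that an hourly forecast of N is available. If the forecast is in traffic instead, Little's law N = λ × R converts it. The forecast numbers and function names are hypothetical, and this is not the scheduling logic actually used in production.

```python
import math


def concurrency(rate, resp_time):
    """Little's law: expected concurrent requests N = lambda * R."""
    return rate * resp_time


def instances_needed(n, n_knee):
    """Fewest instances that keep per-instance concurrency at or below n_knee."""
    return max(1, math.ceil(n / n_knee))


def schedule(forecast, n_knee):
    """Map each hour's forecast concurrency to an instance count."""
    return {hour: instances_needed(n, n_knee) for hour, n in forecast.items()}


# Hypothetical hourly forecast of concurrent requests (UTC hour -> N).
FORECAST = {0: 180, 6: 220, 12: 450, 18: 610}
N_KNEE = 300  # October 2016 per-instance knee (Auto-Scaling Knee slide)

if __name__ == "__main__":
    print(schedule(FORECAST, N_KNEE))  # {0: 1, 6: 1, 12: 2, 18: 3}
```

A clock-driven schedule built this way can start instances ahead of the expected load, instead of waiting up to 10 minutes after the CPU threshold fires.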
  33. AWS auto-scaling costs: AWS Spot Pricing

    Spot instances: 90% discount on On-demand pricing
    Challenging to diversify instance types and sizes across the same group, e.g.,
      Default instance type: m4.10xlarge
      Spot market only has the smaller m4.2xlarge type
      This forces manual reconfiguration of the application
    CPU (ρ), latency (R), and traffic (λ) are no longer useful metrics for the A/S policy
    Instead, use concurrency N as the primary metric in the A/S policy
  34. AWS auto-scaling costs: PDQ Numerical Validation 2018 (cf. Slide 27, PDQ Numerical Validation 2016)

    [Figure, two panels: PDQ Model of Prod Data Mar 2018, throughput (req/sec) vs. concurrent users; PDQ Model of Prod Data Mar 2018, response time (s) vs. concurrent users; annotations: Rmin = 0.2236, Xknee = 1137.65, Nknee = 254.35]
  35. AWS auto-scaling costs: Performance Improvements 2016–2018

    [Figure, two panels: 2016 daily user profile and 2018 daily user profile; user requests (N) vs. UTC time (hours)]
    Typical numero uno traffic profile discussed in my GCAP performance class
    Increasing cost-effective performance:

    Date      Rmin (ms)  Xmax (RPS)  NA/S
    Jul 2016  394.1       761.23     350
    Oct 2016  444.4       675.07     300
    ...       ...         ...        ...
    Mar 2018  223.6      1135.96     254

    Less variation in X and R due to improved traffic distribution
  36. Section 6 of the Outline: Cloudy economics
  37. Cloudy economics: EC2 Instance Pricing

    [Figure: instance capacity lines⁸; instances vs. time; labels: Max capacity line, Spot instances, On-demand instances, Reserved instances, Higher risk capex, Lower risk capex, Missed revenue?]
    This is how AWS sees its own infrastructure capacity.

    ⁸ J.D. Mills, “Amazon Lambda and the Transition to Cloud 2.0”, SF Bay ACM meetup, May 16, 2018
  38. Cloudy economics: Name of the Game is Chargeback

    Google Compute Engine also offers reserved and spot pricing.

    Table 1: Google VM per-hour pricing⁹
    Machine       vCPUs  RAM (GB)  Price ($)  Preempt ($)
    n1-umem-40       40       938     6.3039       1.3311
    n1-umem-80       80      1922    12.6078       2.6622
    n1-umem-96       96      1433    10.6740       2.2600
    n1-umem-160     160      3844    25.2156       5.3244

    Capacity planning has not gone away because of the cloud:
      Capacity Planning For The Cloud: A New Way Of Thinking Needed (DZone, April 25, 2018)
      Cloud Capacity Management (DZone, July 10, 2018)

    ⁹ TechCrunch, May 2018
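The chargeback arithmetic in Table 1 can be made concrete with a small sketch. It computes each machine's preemptible discount and its on-demand price per vCPU-hour, using only the numbers in the table. The function names are hypothetical.

```python
# Rows of Table 1: (machine, vCPUs, RAM in GB, on-demand $/h, preemptible $/h).
PRICES = [
    ("n1-umem-40", 40, 938, 6.3039, 1.3311),
    ("n1-umem-80", 80, 1922, 12.6078, 2.6622),
    ("n1-umem-96", 96, 1433, 10.6740, 2.2600),
    ("n1-umem-160", 160, 3844, 25.2156, 5.3244),
]


def preemptible_discount(on_demand, preempt):
    """Fractional saving of the preemptible price relative to on-demand."""
    return 1.0 - preempt / on_demand


def per_vcpu_hour(price, vcpus):
    """Hourly price per vCPU."""
    return price / vcpus


if __name__ == "__main__":
    for name, vcpus, ram_gb, price, preempt in PRICES:
        print(f"{name:12s} discount {preemptible_discount(price, preempt):.1%}  "
              f"on-demand ${per_vcpu_hour(price, vcpus):.4f}/vCPU-h")
```

For every row, the preemptible price works out to a discount of roughly 79 percent. The on-demand price is about $0.158 per vCPU-hour for every machine except the 96-vCPU type, which comes to about $0.111.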
  39. Cloudy economics: Summary

    Cloud services are more about economic benefit for the hosting company than they are about technological innovation for the consumer
      Old-fashioned chargeback is back!
      It is incumbent on you, the customer, to minimize your own cloud services costs
      Evolving services: containers, microservices, “serverless” (e.g., AWS Lambda)
    Performance and capacity management have not gone away
    The PDQ Tomcat model is a relatively simple yet insightful example of a cloud sizing tool
    EC2 instance scalability was not a significant issue for this application
    You can pay LESS for MORE cloud performance!
  40. Cloudy economics: Questions?

    www.perfdynamics.com
    Castro Valley, California
    Training (note the PDQ Workshop)
    Blog, Twitter, Facebook
    info@perfdynamics.com (outstanding questions)
    +1-510-537-5758