How to Scale in the Cloud: Chargeback is Back, Baby!

The need for system administrators—especially Linux sys admins—to do performance management has returned with a vengeance. Why? The cloud. Resource consumption in the cloud is all about run now, pay later (AKA 'chargeback' in mainframe-ese). This talk will show you how performance models can help to find the most cost-effective deployment of your applications on Amazon Web Services (AWS). The same technique should be transferable to other cloud services. Presented at Rocky Mountain CMG, Denver, Colorado.

Dr. Neil Gunther

December 05, 2019

Transcript

  1. How to Scale in the Cloud
    Chargeback is Back, Baby!
    Neil J. Gunther @DrQz
    Performance Dynamics
    Rocky Mountain CMG
    Denver, Colorado
    December 5, 2019
    © 2019 Performance Dynamics How to Scale in the Cloud December 6, 2019 1 / 38


  2. Everything old is new again
    Abstract
    The need for system administrators—especially Linux sys admins—to do performance
    management has returned with a vengeance. Why? The cloud. Resource
    consumption in the cloud is all about run now, pay later1 (AKA chargeback2 in
    mainframe-ese). This talk will show you how performance models can help to find
    the most cost-effective deployment of your applications on Amazon Web Services
    (AWS). The same technique should be transferable to other cloud services.
    1 Chargeback disappeared with the arrival of the PC revolution and the advent of distributed client-server architectures.
    2 Chargeback underpins the cloud business model, especially when it comes to the development of hot applications, e.g.,
    “Microsoft wants every developer to be an AI developer, which would help its already booming Azure Cloud business do better
    still: AI demands data, which requires cloud processing power and generates bills.” —The Register, May 2018

  3. Previous work
    1 Joint work with Mohit Chawla
    Senior Systems Engineer, Hamburg, Germany
    First foray into modeling cloud applications with PDQ
    Period from June 2016 to April 2018
    First validated cloud queueing model (AFAIK)
    2 Presented jointly at CMG cloudXchange, July 2018
    3 Published in Linux Magazin
    February 2019 (in German)
    English manuscript available on arXiv.org

  4. AWS cloud environment
    Outline
    1 AWS cloud environment
    2 Production data acquisition
    3 Initial scaling model
    4 Corrected scaling model
    5 AWS auto-scaling costs
    6 Cloudy economics

  5. AWS cloud environment
    AWS Cloud Application Platform
    Entire application runs in the Amazon cloud
    Elastic load balancer (ELB)
    AWS Elastic Compute Cloud (EC2) instance
    type m4.10xlarge: 20 CPUs = 40 vCPUs
    Auto Scaling group (A/S)
    Mobile users make requests to Apache HTTP
    server3 via ELB on EC2
    Tomcat thread server4 on EC2 calls external
    services (i.e., 3rd-party web servers)
    A/S controls number of active EC2 instances
    based on incoming traffic and configured policies
    ELB balances incoming traffic across all active
    EC2 nodes in the AWS cluster
    3 Versions 2.2 and 2.4
    4 Versions 7 and 8

  6. AWS cloud environment
    Request Processing Workflow
    On a single EC2 instance:
    1 Incoming HTTP Request from mobile user processed by Apache + Tomcat
    2 Tomcat then sends multiple requests to External Services based on original request
    3 External services respond and Tomcat computes business logic based on all those
    Responses
    4 Tomcat sends the final Response back to originating mobile user
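The four-step workflow above can be sketched as a concurrent fan-out. Everything in this sketch (service names, delays, the asyncio framing) is invented for illustration; the real stack is Apache + Tomcat, not Python.

```python
import asyncio

async def call_external(name, delay):
    # Stand-in for an HTTP call to one external (3rd-party) service.
    await asyncio.sleep(delay)
    return f"{name}-response"

async def handle_request():
    # Step 2: Tomcat issues multiple requests to external services.
    responses = await asyncio.gather(
        call_external("svc-a", 0.01),   # hypothetical service names
        call_external("svc-b", 0.02),
    )
    # Step 3: business logic computed over all the responses.
    return "+".join(responses)

# Step 4: the combined result is returned to the mobile user.
print(asyncio.run(handle_request()))
```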

  7. Production data acquisition
    Outline
    1 AWS cloud environment
    2 Production data acquisition
    3 Initial scaling model
    4 Corrected scaling model
    5 AWS auto-scaling costs
    6 Cloudy economics

  8. Production data acquisition
    Data Tools and Scripts
    JMX (Java Management Extensions) data from JVM
    jmxterm
    VisualVM
    Java Mission Control
    Datadog dd-agent
    Datadog — also integrates with AWS CloudWatch metrics
    Collectd — Linux performance data collection
    Graphite and statsd — application metrics collection & storage
    Grafana — time-series data plotting
    Custom data collection scripts by M. Chawla
    R statistical libs and RStudio IDE
    PDQ performance modeling tool by NJG

  9. Production data acquisition
    Distilled EC2 Instance Data
    These few perf metrics are sufficient to parameterize our PDQ model
    Timestamp, Xdat, Nest, Sest, Rdat, Udat
    1486771200000, 502.171674, 170.266663, 0.000912, 0.336740, 0.458120
    1486771500000, 494.403035, 175.375000, 0.001043, 0.355975, 0.515420
    1486771800000, 509.541751, 188.866669, 0.000885, 0.360924, 0.450980
    1486772100000, 507.089094, 188.437500, 0.000910, 0.367479, 0.461700
    1486772400000, 532.803039, 191.466660, 0.000880, 0.362905, 0.468860
    1486772700000, 528.587722, 201.187500, 0.000914, 0.366283, 0.483160
    1486773000000, 533.439054, 202.600006, 0.000892, 0.378207, 0.476080
    1486773300000, 531.708059, 208.187500, 0.000909, 0.392556, 0.483160
    1486773600000, 532.693783, 203.266663, 0.000894, 0.379749, 0.476020
    1486773900000, 519.748550, 200.937500, 0.000895, 0.381078, 0.465260
    ...
    Interval between Unix Timestamp rows is 300 seconds
    Little’s law (LL) gives relationships between above metrics:
    1 Nest = Xdat × Rdat : macroscopic LL =⇒ thread concurrency
    2 Udat = Xdat × Sest : microscopic LL =⇒ resource service times
    LL provides a consistency check of the data
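As a quick sketch, that consistency check can be run directly on a data row; the numbers below are copied from the first row of the table above.

```python
# Little's law consistency check on one row of the distilled EC2 data
# (timestamp 1486771200000).
X = 502.171674   # Xdat: throughput (req/s)
N = 170.266663   # Nest: estimated concurrency (threads)
S = 0.000912     # Sest: estimated service time (s)
R = 0.336740     # Rdat: measured response time (s)
U = 0.458120     # Udat: CPU utilization

N_ll = X * R     # macroscopic LL: thread concurrency
U_ll = X * S     # microscopic LL: resource utilization

print(f"N = {N_ll:.2f} (measured {N:.2f})")
print(f"U = {U_ll:.4f} (measured {U:.4f})")
```

Both products land within about 1% of the measured values, which is what a consistent data set should show.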

  10. Initial scaling model
    Outline
    1 AWS cloud environment
    2 Production data acquisition
    3 Initial scaling model
    4 Corrected scaling model
    5 AWS auto-scaling costs
    6 Cloudy economics

  11. Initial scaling model
    Usual Time Series (Monitoring) View
    Our brains are not built for this
    Want best impedance match for
    cognitive processing5
    5 “Seeing It All at Once with Barry,” Gunther and Jauvin, CMG 2007 (PDF)

  12. Initial scaling model
    Time-Independent (Steady State) View
    [Plots: canonical closed-queue throughput X vs. N, and canonical closed-queue latency R vs. N]
    Queueing theory with finite requests tells us what to expect:
    Relationship between metrics, e.g., X and N
    Number of requests seen in our daily data: N ≈ 500
    Throughput X approaches a saturation ceiling as N → 500 (concave)
    Response time R grows linearly as a “hockey stick handle” (convex)

  13. Initial scaling model
    Production X-N Data: July 2016
    [Plot: “Production Data July 2016”; Throughput (req/s) vs. Concurrent users]

  14. Initial scaling model
    PDQ Model of Throughput X(N)
    [Plot: “PDQ Model of Production Data July 2016”; Throughput (req/s) vs. Concurrent users; Data and PDQ curves]
    Nopt = 174.5367
    thrds = 250.00

  15. Initial scaling model
    PDQ Model of Response Time R(N)
    [Plot: “PDQ Model of Production Data July 2016”; Response time (s) vs. Concurrent users; Data and PDQ curves]
    Nopt = 174.5367
    thrds = 250.00

  16. Initial scaling model
    PDQ (closed) Queueing Model
    [Diagram: closed queueing model; N requests with think time Z feed a queue with service time S; arrival rate λ(N), response time R(N), throughput X(N)]
    Finite N mobile user-requests
    Think time Z = 0 for mobile
    Only 1 service time measured
    Tomcat on CPU: S = 0.8 ms
    Only 1 queue is definable
    Queue represents Tomcat
    server on an EC2 instance
    λ(N): mean request rate
    R(N): Tomcat response time
    X(N): Tomcat throughput
    X(N) = λ(N) in steady state

  17. Initial scaling model
    PDQ (closed) Queueing Model
    [Diagram: closed queueing model; N requests with think time Z feed a queue with service time S; arrival rate λ(N), response time R(N), throughput X(N)]
    Finite N mobile user-requests
    Think time Z = 0 for mobile
    Only 1 service time measured
    Tomcat on CPU: S = 0.8 ms
    Only 1 queue is definable
    Queue represents Tomcat
    server on an EC2 instance
    λ(N): mean request rate
    R(N): Tomcat response time
    X(N): Tomcat throughput
    X(N) = λ(N) in steady state
    Erm ... except there’s a small problem ...

  18. Initial scaling model
    Dummy Queues
    A single request (N = 1) takes about 1 ms, but the first data point in the plot occurs at
    Nest = 133.8338 (see Slide 15)
    Nest Xdat Sest Rdat Udat
    133.8338 416.4605 0.00088 0.32136 0.36642
    for which R = 321.36 ms.
    Precise service time from LL: Sest = Udat / Xdat
    > 0.36642 / 416.4605
    [1] 0.0008798433
    Sest = 0.0008798433 s = 0.8798433 ms ≈ 1 ms.
    Using the linear hockey-handle characteristic (Z = 0), the estimate
    > 321.36 - (0.8798433 * 133.8338)
    [1] 203.6072
    underestimates the measured 321.36 ms by 203.6 ms.
    Compensate with 200 dummy queues, each with S ≈ 1 ms
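The same arithmetic as the R console snippets above, collected into one short sketch (values copied from the table on this slide):

```python
# Reproducing the dummy-queue arithmetic: how much latency the linear
# (Z = 0) hockey-handle estimate fails to explain.
N_est = 133.8338   # first measured concurrency
X_dat = 416.4605   # throughput (req/s)
U_dat = 0.36642    # CPU utilization
R_dat = 0.32136    # measured response time (s)

S_est = U_dat / X_dat               # precise service time via Little's law
R_linear = S_est * N_est            # linear hockey-handle estimate (s)
gap_ms = (R_dat - R_linear) * 1000  # latency the dummy queues must supply

print(f"S_est = {S_est * 1000:.4f} ms")
print(f"gap   = {gap_ms:.1f} ms")
```

The ~204 ms gap divided by ~1 ms per dummy queue is what motivates the 200 dummy queues.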

  19. Initial scaling model
    3-Tier Model CMG 2001
    Similar model in Chap. 12 of my Perl::PDQ book
    Dws
    Das
    Ddb
    N clients
    Z = 0 ms
    Web Server App Server DBMS Server
    Requests Responses
    Dummy Servers
    Based on CMG 2001 Best Paper by Buch & Pentkovski (Intel Corp.)
    Tricky: dummy queues cannot have S > Sbottleneck
    (The meaning of these dummy latencies was never resolved)

  20. Initial scaling model
    Outstanding Issues
    PDQ Tomcat model fits data visually but ...
    Need ∼ 200 dummy queues to get correct Rmin
    What do they represent in the actual Tomcat server?
    Service time Sest ≈ 0.001 s = 1 ms
    From table on Slide 9
    CPU time derived from Udat
    = ρCPU
    in /proc
    Oh yeah, and what about those external service times?
    Hypotheses:
    (a) Successive polling (visits) to external services? (MC)
    (b) Some kind of hidden parallelism? (NJG) ... see Slide 23

  21. Initial scaling model
    Outstanding Issues
    PDQ Tomcat model fits data visually but ...
    Need ∼ 200 dummy queues to get correct Rmin
    What do they represent in the actual Tomcat server?
    Service time Sest ≈ 0.001 s = 1 ms
    From table on Slide 9
    CPU time derived from Udat
    = ρCPU
    in /proc
    Oh yeah, and what about those external service times?
    Hypotheses:
    (a) Successive polling (visits) to external services? (MC)
    (b) Some kind of hidden parallelism? (NJG) ... see Slide 23
    All this remained unresolved until ...

  22. Initial scaling model
    New Data Breaks PDQ Model
    Guerrilla mantra 1.16
    Data comes from the Devil, only models come from God.
    (except when they don’t)

  23. Corrected scaling model
    Outline
    1 AWS cloud environment
    2 Production data acquisition
    3 Initial scaling model
    4 Corrected scaling model
    5 AWS auto-scaling costs
    6 Cloudy economics

  24. Corrected scaling model
    Surprisingly ... Less (Data) Is Better
    Too much initial data clouded6 the actual scaling behavior
    6 See what I did there?

  25. Corrected scaling model
    Hypothesis (b) ... Backwards 7
    Date          Pricing  Nknee†  NA/S  R (ms)   R‡ (ms)         X (req/s)  X‡ (req/s)       Savings
    October 2016  Sched    300     300   444.41   411.62 ± 7.36   675.05     651.54 ± 3.66    Approx. 10%
    March 2018    Spot     254     254   203.60   199.36 ± 1.48   1247.54    1192.03 ± 8.54   Approx. 90%
    † Nknee is an input parameter to the PDQ model
    ‡ Corrected PDQ model
    Parallel is Just Fast Serial
    From the standpoint of queueing theory, parallel processing can be regarded as a form of fast serial
    processing. The left side of the diagram shows a pair of parallel queues, where requests arriving from
    outside at rate λ are split equally to arrive with reduced rate λ/2 into either one of the two queues.
    Assume λ = 0.5 requests/second and S = 1 second. When a request joins the tail of one of the parallel
    waiting lines, its expected time to get through that queue (waiting + service) is given by equation (1)
    in Berechenbare Performance [9], namely:
    Tpara = S / (1 − (λ/2) S) = 1.33 seconds (1)
    The right side of the diagram shows two queues in tandem, each twice as fast (S/2) as a parallel queue.
    Since the arrival flow is not split, the expected time to get through both queues is the sum of the
    times spent in each queue:
    Tserial = (S/2) / (1 − λ(S/2)) + (S/2) / (1 − λ(S/2)) = S / (1 − λ(S/2)) = 1.33 seconds (2)
    Tserial in equation (2) is identical to Tpara in equation (1). Conversely, multi-stage serial processing
    can be transformed into an equivalent form of parallel processing [6, 8]. This insight helped identify
    the “hidden parallelism” in the July and October 2016 performance data that led to the correction of
    the initial PDQ Tomcat model.
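The parallel/serial equivalence is easy to verify numerically with the stated parameters (λ = 0.5 req/s, S = 1 s):

```python
# Two parallel M/M/1 queues (arrivals split as lam/2 each) vs. two tandem
# queues that are each twice as fast (S/2): same residence time.
lam = 0.5   # total arrival rate (req/s)
S = 1.0     # service time (s)

T_para = S / (1 - (lam / 2) * S)              # eq. (1)
T_serial = 2 * (S / 2) / (1 - lam * (S / 2))  # eq. (2): sum of two tandem stages

print(f"T_para   = {T_para:.4f} s")
print(f"T_serial = {T_serial:.4f} s")
```

Both evaluate to 4/3 ≈ 1.33 seconds, as claimed.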
    7 Inspired by a CMG 1993 paper, I developed an algorithm to solve parallel queues in the PDQ analyzer circa 1994, based
    on my observation above, and used it in my 1998 book The Practical Performance Analyst.

  26. Corrected scaling model
    Parallel PDQ Model of Throughput X(N)
    [Plot: “PDQ Model of Oct 2016 Data”; Throughput (req/s) vs. Concurrent users]
    Corrected PDQ model (blue dots)

  27. Corrected scaling model
    Parallel PDQ Model of Response Time R(N)
    [Plot: “PDQ Model of Oct 2016 Data”; Response time (s) vs. Concurrent users; Data and PDQ curves]
    Corrected PDQ model (blue dots)

  28. Corrected scaling model
    Parallel PDQ Model
    [Diagram: parallel PDQ model; N requests with think time Z feed a single waiting line W that dispatches to m parallel servers, each with service time S; arrival rate λ(N), response time R(N), throughput X(N)]
    Key differences:
    Rmin dominated by time inside external services
    True service time is Rmin: S = 444.4 ms (not CPU)
    Tomcat threads are now parallel service facilities
    Single waiting line (W) produces hockey handle
    Like every Fry’s customer waits for their own cashier
    But where is W located in the EC2 system? (still unresolved)
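The knee behavior of this parallel model can be sketched with simple asymptotic bounds. This is a back-of-envelope sketch, not the PDQ solver: m = 300 threads (the Nknee value), each holding a request for S = Rmin = 444.4 ms, with Z = 0.

```python
# Asymptotic bounds for m parallel servers with service time S and Z = 0.
m = 300        # Tomcat thread limit per EC2 instance (Nknee)
S = 0.4444     # true service time = Rmin in seconds (Oct 2016)

def X_bound(N):
    # Below the knee every request has its own thread, so X grows as N/S;
    # above it, throughput saturates at m/S.
    return min(N / S, m / S)

def R_bound(N):
    # Below the knee R stays at S; above it, queueing grows linearly.
    return max(S, N * S / m)

print(f"Xmax   = {m / S:.2f} req/s")
print(f"R(450) = {R_bound(450):.4f} s")
```

The saturation bound m/S ≈ 675 req/s agrees with the measured Oct 2016 Xmax of 675.07 req/s.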

  29. Corrected scaling model
    PDQ Numerical Validation 2016
    [Plots: “PDQ Model of Oct 2016 Data”; Throughput (req/s) and Response time (s) vs. Concurrent users; Data and PDQ curves]

  30. Corrected scaling model
    Auto-Scaling knee and pseudo-saturation
    [Plot: “PDQ Model of Oct 2016 Data”; Throughput (req/s) vs. Concurrent users]
    A/S policy triggered when instance CPU busy > 75%
    Induces pseudo-saturation at Nknee = 300 threads (vertical line)
    No additional Tomcat threads invoked above Nknee in this instance
    A/S spins up additional new EC2 instances (elastic capacity)

  31. AWS auto-scaling costs
    Outline
    1 AWS cloud environment
    2 Production data acquisition
    3 Initial scaling model
    4 Corrected scaling model
    5 AWS auto-scaling costs
    6 Cloudy economics

  32. AWS auto-scaling costs
    AWS Scheduled Scaling
    A/S policy threshold CPU > 75%
    Additional EC2 instances require up to
    10 minutes to spin up
    Based on PDQ model, considered
    pre-emptive scheduling of EC2s (clock)
    Cheaper than A/S but only
    10% savings
    Use N service threads to size the
    number of EC2 instances required for
    incoming traffic
    Removes expected spikes in latency
    and traffic (seen in time series analysis)
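A minimal sketch of sizing the fleet from thread concurrency N rather than CPU utilization. The knee value comes from the model; the headroom factor is an assumption for illustration, not the deployed policy.

```python
import math

# Size the number of EC2 instances from expected peak concurrency.
N_KNEE = 300      # threads per instance before pseudo-saturation (from PDQ model)
HEADROOM = 0.8    # assumed: keep each instance below 80% of its knee

def instances_needed(peak_concurrency):
    # Each instance can comfortably carry HEADROOM * N_KNEE concurrent requests.
    return math.ceil(peak_concurrency / (HEADROOM * N_KNEE))

print(instances_needed(450))    # modest peak
print(instances_needed(1000))   # larger peak
```

With a forecast daily traffic profile, this rule can be driven by the clock (scheduled scaling) instead of reacting to CPU spikes.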

  33. AWS auto-scaling costs
    AWS Spot Pricing
    Spot instances for 90% discount on
    On-demand pricing
    Challenging to diversify instance types
    and sizes across the same group, e.g.,
    Default instance type
    m4.10xlarge
    Spot market only has smaller
    m4.2xlarge type
    This forces manual reconfiguration
    of the application
    CPU (ρ), latency (R), traffic (λ) are no
    longer useful metrics for A/S policy
    Instead, use concurrency N as primary
    metric in A/S policy
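As a back-of-envelope illustration of the 90% figure cited above, the hourly rate and fleet size below are assumptions, not the production values:

```python
# Illustrative spot vs. on-demand monthly cost comparison.
ON_DEMAND = 2.00       # assumed $/hr for one m4.10xlarge (varies by region/time)
SPOT_DISCOUNT = 0.90   # the ~90% discount cited on this slide
HOURS_PER_MONTH = 730
FLEET = 4              # assumed number of instances

od_cost = ON_DEMAND * HOURS_PER_MONTH * FLEET
spot_cost = od_cost * (1 - SPOT_DISCOUNT)
print(f"on-demand: ${od_cost:.0f}/mo, spot: ${spot_cost:.0f}/mo")
```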

  34. AWS auto-scaling costs
    PDQ Numerical Validation 2018 (cf. slide 27)
    [Plots: “PDQ Model of Prod Data Mar 2018”; Throughput (req/sec) and Response time (s) vs. Concurrent users]
    Rmin = 0.2236
    Xknee = 1137.65
    Nknee = 254.35
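The knee parameters reported on this slide can be cross-checked with Little's law at the saturation point, Nknee = Xknee × Rmin:

```python
# Consistency sketch for the 2018 knee parameters.
Rmin = 0.2236     # seconds
Xknee = 1137.65   # req/s
Nknee = 254.35    # reported knee concurrency

print(f"Xknee * Rmin = {Xknee * Rmin:.2f} (reported Nknee = {Nknee})")
```

The product agrees with the reported Nknee to within a few hundredths of a request.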

  35. AWS auto-scaling costs
    Performance Improvements 2016–2018
    [Plots: 2016 and 2018 daily user profiles; User requests (N) vs. UTC time (hours)]
    Typical numero uno traffic profile
    discussed in my GCAP performance class
    Increasing cost-effective performance
    Date Rmin (ms) Xmax (RPS) NA/S
    Jul 2016 394.1 761.23 350
    Oct 2016 444.4 675.07 300
    · · · · · · · · · · · ·
    Mar 2018 223.6 1135.96 254
    Less variation in X and R due to improved traffic dsn

  36. Cloudy economics
    Outline
    1 AWS cloud environment
    2 Production data acquisition
    3 Initial scaling model
    4 Corrected scaling model
    5 AWS auto-scaling costs
    6 Cloudy economics

  37. Cloudy economics
    EC2 Instance Pricing
    [Chart: instance capacity lines over time; reserved instances (lower-risk capex), on-demand instances, spot instances (higher-risk capex), the max capacity line, and potential missed revenue above it]
    This is how AWS sees their own infrastructure capacity
    8 J.D. Mills, “Amazon Lambda and the Transition to Cloud 2.0”, SF Bay ACM meetup, May 16, 2018

  38. Cloudy economics
    Name of the Game is Chargeback
    Google Compute Engine also offers reserved and spot pricing
    Table 1: Google VM per-hour pricing9
    Machine vCPUs RAM (GB) Price ($) Preempt ($)
    n1-umem-40 40 938 6.3039 1.3311
    n1-umem-80 80 1922 12.6078 2.6622
    n1-umem-96 96 1433 10.6740 2.2600
    n1-umem-160 160 3844 25.2156 5.3244
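Two quantities worth reading off Table 1 are the per-vCPU-hour rate and the implied preemptible discount. The ~79% discount below is computed from the table, not a quoted figure:

```python
# Per-vCPU price and preemptible discount derived from Table 1.
def per_vcpu_and_discount(vcpus, price, preempt):
    return price / vcpus, 1 - preempt / price

rows = {
    "n1-umem-40":  (40, 6.3039, 1.3311),
    "n1-umem-80":  (80, 12.6078, 2.6622),
    "n1-umem-96":  (96, 10.6740, 2.2600),
    "n1-umem-160": (160, 25.2156, 5.3244),
}
for name, (vcpus, price, preempt) in rows.items():
    pv, disc = per_vcpu_and_discount(vcpus, price, preempt)
    print(f"{name}: ${pv:.4f}/vCPU-hr, preemptible discount {disc:.0%}")
```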
    Capacity planning has not gone away because of the cloud
    Capacity Planning For The Cloud: A New Way Of Thinking Needed
    (DZone April 25, 2018)
    Cloud Capacity Management (DZone July 10, 2018)
    9 TechCrunch, May 2018

  39. Cloudy economics
    Summary
    Cloud services are more about economic benefit for
    the hosting company than they are about technological
    innovation for the consumer
    Old-fashioned chargeback is back!
    Incumbent on you the customer to minimize your own
    cloud services costs
    Evolving services: containers, microservices,
    “serverless,” (e.g., AWS Lambda)
    Performance and capacity management have not gone
    away
    PDQ Tomcat model is a relatively simple yet insightful
    example of a cloud sizing-tool
    EC2 instance scalability was not a significant issue for
    this application
    You can pay LESS for MORE cloud performance!

  40. Cloudy economics
    Questions?
    www.perfdynamics.com
    Castro Valley, California
    Training —note the PDQ Workshop
    Blog
    Twitter
    Facebook
    [email protected] —outstanding questions
    +1-510-537-5758