Cost versus Performance: Walking the Tightrope

How do you manage cloud costs while maintaining performance and user experience? For many, it’s like walking a tightrope; lean too far in either direction and you’ll suffer. Analytics can help you find just the right balance by revealing actionable insights about the tradeoffs between performance and cost. This presentation offers (anonymized) case studies that highlight both the analytics used and the cost savings realized.

Elizabeth Nichols

June 09, 2016

Transcript

  1. Cost vs Performance: Walking the Tight-Rope. Betsy Nichols, Ph.D., Chief Data Scientist. June 2016, DevOpsDays DC. @eanTweet
  2. www.netuitive.com  Abstract
    •  Abstract: How do you manage cloud costs while maintaining performance and user experience? For many, it’s like walking a tight-rope; lean too far in either direction and you’ll suffer. Analytics can help you find just the right balance by revealing actionable insights about the tradeoffs between performance and cost. This presentation offers (anonymized) case studies that highlight both the analytics used and the cost savings realized.
    •  Description: This presentation is for members of DevOps teams who have dual responsibilities for delivering industrial-strength, responsive services while managing specific budgets and controlling cloud costs. Specific analytics and models will be presented that can provide valuable insights regarding interactions between such factors as capacity and cost, capacity utilization and performance, and reliability and cost.
    •  It’s easy to make mistakes when you aren’t looking at the complete picture. If you are under-provisioned, that can result in cost savings at the expense of customer experience. If you are over-provisioned, that can lead to better customer experience but also waste resources and incur needless expense. This is the tightrope. If you lean too far in one direction, you’ll suffer undesirable consequences.
    •  Building on insights from analytics that quantify the interactions between cost and performance, concrete recommendations can be derived that help strike an appropriate balance. This presentation describes such analytics and provides (anonymized) cases from AWS environments where they have been effective.
  3. The stories you are about to hear are true. The names have been changed to protect the innocent.
  4. Conceptual Consumer Web Site
    [Diagram: user, Web tier, App tier, DB tier; labels: Capacity Added Only When Needed, Current Compute Capacity]
  5. AWS Services (more detail): •  Route 53 •  EC2 •  ELB •  EBS •  S3 •  RDS •  EMR •  CloudFront •  CloudWatch •  ElastiCache •  Kinesis •  AWS Config •  …
  6. One Proven Path: Spend Distribution → Capacity Allocation → Under-Utilization → Workload Distribution → Pricing Options Models
  7. EC2 Census
    Tier          EC2 Count
    Web                  10
    Application         120
    Database             15
    Analytics            10
    Other                25
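The census is easier to act on as shares of the fleet. A minimal sketch, using only the counts from the slide; note these are shares of instance count, not of spend, since the instance types and prices per tier are not given.

```python
# EC2 counts per tier, as listed on the EC2 Census slide.
EC2_CENSUS = {
    "Web": 10,
    "Application": 120,
    "Database": 15,
    "Analytics": 10,
    "Other": 25,
}


def tier_shares(census: dict[str, int]) -> dict[str, float]:
    """Return each tier's fraction of the total EC2 instance count."""
    total = sum(census.values())
    return {tier: count / total for tier, count in census.items()}


if __name__ == "__main__":
    print(f"Total EC2 instances: {sum(EC2_CENSUS.values())}")
    # Largest tiers first: that is where right-sizing has the most leverage.
    for tier, share in sorted(tier_shares(EC2_CENSUS).items(), key=lambda kv: -kv[1]):
        print(f"{tier:<12} {EC2_CENSUS[tier]:>4}  {share:6.1%}")
```

With these counts the Application tier is two thirds of the fleet, which is why the later slides focus on its auto-scaling groups.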
  8. [Diagram: three EC2 instances annotated with metrics. Arrival Rate, Queue Length, Response Time, Completion Rate, # EC2s, Mean CPU Utilization; Resilience: MTTR, MTBF, Availability; Cost: # EC2s, Mean CPU Utilization]
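Several of these metrics are tied together by standard identities. A minimal sketch of two of them: Little's law (mean number of requests in the system equals arrival rate times mean response time) and the usual steady-state availability formula A = MTBF / (MTBF + MTTR). The input values are hypothetical.

```python
def mean_in_system(arrival_rate: float, mean_response_time: float) -> float:
    """Little's law: L = lambda * W (requests waiting plus in service)."""
    return arrival_rate * mean_response_time


def availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Steady-state availability from mean time between failures and mean time to repair."""
    return mtbf_hours / (mtbf_hours + mttr_hours)


if __name__ == "__main__":
    # Hypothetical inputs for illustration only.
    print(f"requests in system: {mean_in_system(arrival_rate=50.0, mean_response_time=0.4):.1f}")
    print(f"availability: {availability(mtbf_hours=720.0, mttr_hours=0.5):.5f}")
```

Little's law holds for any stable queue regardless of distribution, so it is a quick consistency check between measured arrival rate, queue length, and response time.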
  9. EC2 Costs by Micro Service (ASG)
    [Chart: micro services (ASGs) A through L and W through Z against Average EC2 CPU Utilization; annotation: # Nodes = 11; Utilization = 20%]
  10. Make It Work
    [Chart: ASG Nodes and Workload (Requests/sec) over Time; annotations: Max Workload, Max Requests/sec, # Nodes = 11, Response Time < 3 sec]
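Sizing a fixed fleet for the maximum workload is what produces the low average utilization shown for the ASG. A back-of-envelope sketch using the slide figures (11 nodes, about 20% average CPU), assuming CPU is the binding resource so that utilization is a fair proxy for capacity in use; the 730-hour month is an assumption for illustration.

```python
HOURS_PER_MONTH = 730  # assumed average month length (8760 hours / 12)


def busy_node_equivalents(nodes: int, mean_cpu_utilization: float) -> float:
    """Nodes' worth of capacity actually in use, if CPU is the bottleneck."""
    return nodes * mean_cpu_utilization


def idle_node_hours(nodes: int, mean_cpu_utilization: float,
                    hours: float = HOURS_PER_MONTH) -> float:
    """Node-hours paid for but not used under static provisioning."""
    return nodes * (1.0 - mean_cpu_utilization) * hours


if __name__ == "__main__":
    # Figures from the slides: 11 nodes sized for max workload, ~20% average CPU.
    print(f"busy node-equivalents: {busy_node_equivalents(11, 0.20):.1f}")
    print(f"idle node-hours per month: {idle_node_hours(11, 0.20):,.0f}")
```

Roughly 2.2 nodes' worth of work spread over 11 paid nodes is the gap that scaling with the workload aims to close, as long as the response-time goal still holds at peak.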
  11. EC2 On-Demand SKUs¹
    ¹ As of May 2016. Includes on-demand and reserved pricing only, including SKUs for defunct but still active pricing. Spot pricing does not have SKUs. Source: https://pricing.us-east-1.amazonaws.com/offers/v1.0/aws/AmazonEC2/current/index.csv
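The SKU count comes from AWS's bulk price-list file. A minimal sketch of counting distinct SKUs per term type, run here on a tiny hypothetical excerpt. It assumes the file opens with metadata rows, followed by a header row whose first column is "SKU" and which contains a "TermType" column; check those assumptions against the actual file before relying on the counts.

```python
import csv
import io
from collections import defaultdict

# Tiny hypothetical excerpt shaped like the bulk price-list CSV:
# metadata rows, then a header row, then one row per (SKU, term) offer.
SAMPLE = """\
"FormatVersion","v1.0"
"OfferCode","AmazonEC2"
"SKU","OfferTermCode","TermType","PricePerUnit"
"SKU-A","TERM-1","OnDemand","0.10"
"SKU-A","TERM-2","Reserved","0.06"
"SKU-B","TERM-1","OnDemand","0.20"
"""


def count_skus_by_term(lines) -> dict[str, int]:
    """Count distinct SKUs per TermType, skipping metadata rows before the header."""
    reader = csv.reader(lines)
    header = None
    for row in reader:
        if row and row[0] == "SKU":  # assumed: the header row starts with "SKU"
            header = row
            break
    if header is None:
        raise ValueError("no header row starting with 'SKU' found")
    sku_col, term_col = header.index("SKU"), header.index("TermType")
    skus: dict[str, set[str]] = defaultdict(set)
    for row in reader:  # continues with the rows after the header
        if len(row) > max(sku_col, term_col):
            skus[row[term_col]].add(row[sku_col])
    return {term: len(ids) for term, ids in skus.items()}


if __name__ == "__main__":
    print(count_skus_by_term(io.StringIO(SAMPLE)))
    # For a downloaded copy of the file:
    # with open("index.csv", newline="") as f:
    #     print(count_skus_by_term(f))
```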
  12. Pricing Options Compared
    Feature               | onDemand | Reserved        | Spot
    Purchase term         | 1 hour   | 1, 3 years      | 1-6 hrs
    Up front $$           | None     | 0%, 50%, 100%   | None
    Delivery              | Guaranteed | Guaranteed    | As available
    Termination           | By user  | By term         | 1 hr: price; Blk: term
    Risk                  | Low      | Very Low        | 1 hr: High; Blk: Med.
    Cost                  | High     | Medium          | Low
    Savings over onDemand | $0.00    | Up to 75%¹      | Market
    Blk = Spot block (defined-duration Spot).
    ¹ https://aws.amazon.com/ec2/purchasing-options/reserved-instances/
  13. The RI bill does not depend on usage. OnDemand and Spot bills do.
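Because the RI bill is fixed while the on-demand bill scales with hours used, a reservation pays off only above a breakeven utilization equal to the ratio of the two hourly rates. A minimal sketch, assuming the RI's upfront payment has already been amortized into an effective hourly rate; the prices are hypothetical.

```python
def breakeven_utilization(ri_effective_hourly: float, on_demand_hourly: float) -> float:
    """Fraction of hours an instance must run for an RI to cost no more than on-demand."""
    return ri_effective_hourly / on_demand_hourly


def monthly_costs(hours_used: float, ri_effective_hourly: float,
                  on_demand_hourly: float, hours_in_month: float = 730.0) -> tuple[float, float]:
    """Return (reserved, on_demand) cost for one instance over a month."""
    reserved = ri_effective_hourly * hours_in_month  # billed whether used or not
    on_demand = on_demand_hourly * hours_used        # billed only for hours used
    return reserved, on_demand


if __name__ == "__main__":
    # Hypothetical: $1.00/hr on-demand, and an RI at the "up to 75%" discount.
    od, ri = 1.00, 0.25
    print(f"breakeven utilization: {breakeven_utilization(ri, od):.0%}")
    for used in (100, 182.5, 500):
        print(used, monthly_costs(used, ri, od))
```

At the maximum 75% discount an RI breaks even at 25% utilization; smaller discounts push the breakeven higher, so lightly used instances are better left on-demand.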
  14. M/M/c Queuing Model
    •  λ = arrival rate; arrivals form a Poisson process
    •  µ = service rate; service times are exponentially distributed with mean 1/µ
    •  c = number of servers
    •  Servers serve jobs from the front of the queue (FCFS)
    •  If there are fewer than c jobs, some servers are idle
    •  If there are more than c jobs, the excess jobs wait in a buffer
    •  The buffer is of infinite size
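The model's assumptions can be checked independently of the closed-form results with a small discrete-event simulation, a different technique from the analytic model in the deck. This sketch draws exponential inter-arrival and service times, assigns each arriving job in FCFS order to the server that frees up first, and averages response time (queueing delay plus service). The job count and seed are arbitrary choices.

```python
import heapq
import random


def simulate_mmc(arrival_rate: float, service_rate: float, servers: int,
                 n_jobs: int = 200_000, seed: int = 1) -> float:
    """Estimate the mean response time of an M/M/c FCFS queue by simulation.

    Poisson arrivals with rate arrival_rate, exponential service times with
    rate service_rate, `servers` identical servers, and an unbounded buffer.
    """
    rng = random.Random(seed)
    free_at = [0.0] * servers  # min-heap: time at which each server becomes free
    heapq.heapify(free_at)
    clock = 0.0
    total_response = 0.0
    for _ in range(n_jobs):
        clock += rng.expovariate(arrival_rate)   # next arrival time
        earliest = heapq.heappop(free_at)        # FCFS: first server to free up
        start = max(clock, earliest)             # wait in the buffer if all are busy
        finish = start + rng.expovariate(service_rate)
        heapq.heappush(free_at, finish)
        total_response += finish - clock         # queueing delay + service time
    return total_response / n_jobs


if __name__ == "__main__":
    # M/M/1 with lambda = 0.5/s, mu = 1/s: theory gives 1/(mu - lambda) = 2 s.
    print(f"M/M/1 estimate: {simulate_mmc(0.5, 1.0, 1):.3f} s")
    print(f"M/M/2 estimate: {simulate_mmc(1.0, 1.0, 2):.3f} s")
```

The estimates should land close to the analytic values (2 s for the M/M/1 case, 4/3 s for M/M/2 with λ = µ = 1); a large mismatch against measured data suggests the Poisson or exponential assumptions do not fit the workload.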
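  15. Equations (Ref: https://en.wikipedia.org/wiki/M/M/c_queue)
    Mean response time, with arrival rate λ, service rate µ, and c servers:
    $$E[T] = \frac{C\!\left(c, \tfrac{\lambda}{\mu}\right)}{c\,\mu - \lambda} + \frac{1}{\mu}$$
    The probability that an arriving job is forced to queue is given by Erlang's C formula:
    $$C\!\left(c, \tfrac{\lambda}{\mu}\right) = \frac{\dfrac{(c\rho)^c}{c!}\,\dfrac{1}{1-\rho}}{\displaystyle\sum_{k=0}^{c-1}\frac{(c\rho)^k}{k!} + \frac{(c\rho)^c}{c!}\,\frac{1}{1-\rho}}$$
    Intensity (utilization), which must satisfy ρ < 1 for a stable queue:
    $$\rho = \frac{\lambda}{c\,\mu}$$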
  16. M/M/c Model Results
    [Plot: Response Time from the M/M/c model, with µ = 1.0 per second (mean service time 1.0 s)]
  17. M/M/c Model Application
    INPUT:
    •  Service Time (µ)
    •  Arrival rate of requests (λ)
    •  Response Time goal (3 s)
    OUTPUT:
    •  Response Time
    •  Mapping: arrival rate → # nodes, with constraint: response time < 3 s
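The input/output mapping can be sketched by searching for the smallest node count whose M/M/c mean response time meets the goal. This assumes the 3 s goal applies to the mean response time (a percentile goal would need the waiting-time distribution instead). Erlang C is computed through the standard Erlang B recursion, which avoids the overflow that summing factorial terms directly can hit at large loads.

```python
import math


def erlang_c(servers: int, offered_load: float) -> float:
    """Probability an arrival must queue in M/M/c; offered_load = lambda / mu < servers."""
    b = 1.0  # Erlang B with 0 servers
    for k in range(1, servers + 1):
        b = offered_load * b / (k + offered_load * b)  # Erlang B recursion
    return servers * b / (servers - offered_load * (1.0 - b))


def mean_response_time(arrival_rate: float, service_rate: float, servers: int) -> float:
    """E[T] = C / (c*mu - lambda) + 1/mu; infinite when the queue is unstable."""
    if arrival_rate >= servers * service_rate:
        return math.inf
    c_prob = erlang_c(servers, arrival_rate / service_rate)
    return c_prob / (servers * service_rate - arrival_rate) + 1.0 / service_rate


def min_nodes(arrival_rate: float, service_rate: float,
              target_response_time: float, max_nodes: int = 10_000) -> int:
    """Smallest node count whose mean response time is below the target."""
    if 1.0 / service_rate >= target_response_time:
        raise ValueError("target must exceed the mean service time 1/mu")
    for c in range(1, max_nodes + 1):
        if mean_response_time(arrival_rate, service_rate, c) < target_response_time:
            return c
    raise ValueError("no node count up to max_nodes meets the target")


if __name__ == "__main__":
    # Mean service time 1 s (mu = 1/s) and the 3 s goal from the slides.
    for rate in (0.5, 2.0, 5.0, 10.0):
        print(f"{rate:5.1f} req/s -> {min_nodes(rate, 1.0, 3.0)} nodes")
```

Running this over an hourly arrival-rate series turns a workload forecast into a node-count schedule, which is the input the scaling and pricing decisions need.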
  18. Save $$ with Scaling Model
    [Chart legend: Reserve, OnDemand, Spot, Arrival Rate; annotation: Response Time < 3 sec]
  20. Risk Mitigation Factors
    •  Target response time: 3 sec
    •  Workload metric: aws.elb.elb-c.arrivalrate
    •  Update frequency: hourly
    •  Max scale-in rate, Max scale-out rate, Min instance counts, Zero thresholds (slide values: 1%, 2, 3, unlimited)
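Guard rails like these are typically applied to each recommendation before it is acted on. A minimal sketch with a hypothetical `RiskLimits` type and `apply_limits` helper: it enforces a minimum instance count and caps how many instances one update may remove or add. The default values are illustrative only and are not a reading of the slide's values; how the deck's tool actually applied these factors is not described.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RiskLimits:
    """Hypothetical guard rails; values are illustrative only."""
    min_instances: int = 2
    max_scale_in: int = 1                # max instances removed per update
    max_scale_out: Optional[int] = None  # None means unlimited scale-out


def apply_limits(current: int, recommended: int, limits: RiskLimits) -> int:
    """Clamp a recommended node count to the configured risk limits."""
    target = max(recommended, limits.min_instances)
    if target < current:
        # Scale in gradually so a bad forecast cannot drain capacity at once.
        target = max(target, current - limits.max_scale_in)
    elif target > current and limits.max_scale_out is not None:
        target = min(target, current + limits.max_scale_out)
    return target


if __name__ == "__main__":
    limits = RiskLimits()
    print(apply_limits(current=10, recommended=4, limits=limits))
    print(apply_limits(current=3, recommended=20, limits=limits))
```

The asymmetry (cautious scale-in, unrestricted scale-out) favors the performance side of the tightrope: an under-forecast costs response time, while an over-forecast only costs money for one update period.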
  21. Save $$ with Scaling Model
    [Chart legend: Reserve, OnDemand, Spot, Arrival Rate]
    Savings: RI + EC2 count + Spots
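The three savings levers can be combined in a simple cost model over an hourly node-count series. A minimal sketch with hypothetical prices and node counts: reserve the k-th instance only if it is needed in more than the breakeven share of hours (RI rate / on-demand rate), then cover the remaining node-hours with on-demand, optionally shifting a hypothetical `spot_fraction` of them to Spot. Spot is market-priced and can be reclaimed, so that share carries real interruption risk.

```python
def ri_count_by_breakeven(hourly_nodes: list[int], ri_hourly: float, od_hourly: float) -> int:
    """Reserve the k-th instance only if it is needed in more than the breakeven share of hours."""
    if ri_hourly <= 0 or od_hourly <= 0:
        raise ValueError("prices must be positive")
    hours = len(hourly_nodes)
    breakeven = ri_hourly / od_hourly
    k = 0
    while sum(1 for n in hourly_nodes if n >= k + 1) / hours > breakeven:
        k += 1
    return k


def blended_cost(hourly_nodes: list[int], ri_count: int, ri_hourly: float, od_hourly: float,
                 spot_hourly: float = 0.0, spot_fraction: float = 0.0) -> float:
    """Cost of RIs for the baseline plus on-demand/Spot for node-hours above it."""
    ri_cost = ri_count * ri_hourly * len(hourly_nodes)       # fixed, used or not
    excess = sum(max(n - ri_count, 0) for n in hourly_nodes)  # node-hours above baseline
    burst_rate = spot_fraction * spot_hourly + (1.0 - spot_fraction) * od_hourly
    return ri_cost + excess * burst_rate


if __name__ == "__main__":
    NODES = [2, 2, 3, 5, 11, 6]      # hypothetical hourly counts from the scaling model
    OD, RI, SPOT = 1.00, 0.60, 0.30  # hypothetical $/hour
    k = ri_count_by_breakeven(NODES, RI, OD)
    print("static peak, on-demand:", max(NODES) * len(NODES) * OD)
    print("scaled, on-demand:     ", blended_cost(NODES, 0, RI, OD))
    print(f"RIs ({k}) + on-demand:   ", round(blended_cost(NODES, k, RI, OD), 2))
    print(f"RIs ({k}) + OD/Spot 50%: ", round(blended_cost(NODES, k, RI, OD, SPOT, 0.5), 2))
```

Each step removes a different kind of waste: scaling removes idle peak capacity, RIs discount the steady baseline, and Spot discounts part of the burst in exchange for risk.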
  22. 1. AWS (and others) sell you capacity.
      2. How you use it is up to you.
      3. Know yourself.
      4. Leverage models to map your context to cost-saving strategies.
      5. You can walk this tightrope!
  23. “Success consists of going from failure to failure without loss of enthusiasm.” - Winston Churchill
  24. Contact Info: Elizabeth Nichols, Ph.D., Chief Data Scientist. @eanTweet · www.netuitive.com · @netuitive · (703) 464-1500