Stefano Doni
January 12, 2024

Productivity: A New Metric for Performance Analysis of Multi-Core and Multi-Threaded Processors

CPU Utilization is the most important metric used in computer performance analysis, capacity planning and tuning. However, due to recent advancements in CPU architecture such as dynamic frequency scaling and hyper-threading, it is becoming trickier to interpret and less useful than it used to be.

Can we design a new metric that represents CPU capacity and overcomes the above issues?

In this session, I present a research project aimed at building CPU Productivity, a new metric that leverages hardware performance counters found in modern CPUs to improve the accuracy of our performance work.

Presented at CMG IMPACT 2016:
https://www.linkedin.com/pulse/cmgimpact-conference-performance-capacity-anoush-najarian/

Transcript

  1. Productivity: A New Metric for Performance Analysis of Multi-Core and Multi-Threaded Processors. Stefano Doni, @stef3a, linkedin.com/in/stefanodoni
  2. Agenda: (1) Motivation: why care? (2) The Problem with Utilization on modern processors (3) The Design of Productivity, a new metric for x86 CPUs (4) Results & Benefits
  3. Business is calling for IT to become a key partner. More and more, IT is rightfully asked to size capacity over business goals. Average IT Manager: "Hi John, our CTO needs to know if we’d be able to sustain 30% more users on our ecommerce site." Random Capacity Planner: "Sure, I’m on it! I’ll roll the dice and get back to you!"
  4. A Business-centric Capacity Modelling framework. [Chart: IT Resource Utilization % versus Application or Business Volume (e.g. # of Users), marking the Current Working Area, the IT Saturation Threshold, the Residual Capacity and the Maximum Business Capacity.] "This system can manage up to 14k users!"
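The chart on this slide can be read as a straight-line extrapolation: fit utilization against business volume in the current working area, then find the volume at which the line crosses the saturation threshold. The sketch below is a minimal illustration of that reading, not the framework's actual implementation. The function name, the 70% threshold and the sample observations are all hypothetical; the data is chosen only so that the result lands on the 14k users quoted on the slide.

```python
import numpy as np

def max_business_capacity(volume, utilization_pct, saturation_pct):
    """Fit utilization = slope * volume + intercept and return the
    volume at which the fitted line reaches the saturation threshold."""
    slope, intercept = np.polyfit(volume, utilization_pct, 1)
    return (saturation_pct - intercept) / slope

# Hypothetical observations from the current working area.
users = np.array([2000.0, 4000.0, 6000.0, 8000.0])
utilization = np.array([10.0, 20.0, 30.0, 40.0])

# Assumed saturation threshold of 70% (not stated in the talk).
capacity = max_business_capacity(users, utilization, saturation_pct=70.0)

# Residual capacity: headroom between the current peak volume and the maximum.
residual = capacity - users.max()
```

The rest of the talk argues that this extrapolation is only as good as the linearity of the metric on the vertical axis.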
  5. CPU Utilization Mysteries on Modern Processors … 1/3. 180k predicted, 130k actual. Q: What is the cause? A: Hyper-Threading!
  6. CPU Utilization Mysteries on Modern Processors … 2/3. Service Demand = 2 ms, Service Demand = 1 ms, Service Demand = 1.5 ms; 1.2 GHz, 2.0 GHz. Q: What is the cause? A: Frequency Scaling!
  7. CPU Utilization Mysteries on Modern Processors … 3/3. 150k @ 2.9 GHz, 110k @ 2 GHz. Q: What is the cause? A: Turbo Boost!
  8. What about Response Time on Modern Processors? Q: What is the cause? A: Frequency Scaling!
  9. The Design of Productivity, a new metric for x86 CPUs. It looks like “CPU Utilization” does not tell you the utilization of the CPU! What can we do then?
  10. Productivity Design Goals: (1) Quantify actual processor capacity usage and headroom (0-100%). (2) Support capacity prediction, i.e. be linearly correlated with application throughput.
  11. Hardware Performance Counters to the Rescue! Modern processors are equipped with built-in facilities, called Hardware Performance Counters (HPCs, or PMCs, …), that can provide low-level information about the inner workings of all of the main processor components. • Can measure CPU cycles, instructions, cache, memory bandwidth, frequency, … • Can be accessed by using tools such as Linux perf. The only problem with HPCs is how to make sense of them! • Hundreds of HPCs available in common x86 processors • No “Utilization 2.0” counter available OOTB!
  12. Explorative Benchmarking Methodology & Tools. SUT: Intel Core i7 3537 dual core, with HT and TurboBoost; Ubuntu Linux 15.04, kernel 3.19. Workload Generator: SPECpower_ssj2008 benchmark. Data Gathering (✔ Benchmark ✔ OS metrics ✔ Perf counters): sar for OS metrics, perf for perf counters. Statistical Analysis & Modeling: in-house Python tool using pandas, numpy and sklearn. Automate & iterate.
  13. The perfect performance counter for capacity planning? Retired Instructions is linearly correlated with application throughput!
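One way to check a claim like this on your own measurements is to compute the Pearson correlation between the counter and the benchmark throughput. The sketch below assumes two equally long series sampled over the same intervals. The numbers are hypothetical placeholders, not data from the talk.

```python
import numpy as np

def pearson_r(x, y):
    """Pearson correlation coefficient between two equally long series."""
    return float(np.corrcoef(x, y)[0, 1])

# Hypothetical samples: benchmark throughput (ops/s) and instructions
# retired per second over the same measurement intervals.
throughput = np.array([20_000.0, 40_000.0, 60_000.0, 80_000.0])
instructions_retired = np.array([1.02e9, 1.98e9, 3.05e9, 3.97e9])

r = pearson_r(throughput, instructions_retired)
```

A value of `r` close to 1 across the whole load range, including high load, is what makes a counter usable for linear capacity extrapolation.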
  14. Toward a New, Work-based Processor Metric. Key Question: How to estimate the number of instructions a processor can retire at peak capacity?
  15. A Basic Processor Performance Model. If we could estimate the workload Instructions Per Cycle (IPC), then:
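A minimal sketch of such a model, under the assumption that peak capacity is the workload IPC multiplied by the maximum number of cycles available in the measurement interval, and that the metric is retired instructions expressed as a percentage of that peak. The function names, parameters and example values are illustrative and are not taken from the talk.

```python
def peak_instructions(ipc, max_freq_hz, interval_s):
    """Instructions a core could retire in the interval if it ran at its
    maximum frequency the whole time (assumed model: IPC * max cycles)."""
    return ipc * max_freq_hz * interval_s

def productivity_pct(instructions_retired, ipc, max_freq_hz, interval_s):
    """Retired instructions as a percentage of the assumed peak capacity."""
    return 100.0 * instructions_retired / peak_instructions(ipc, max_freq_hz, interval_s)

# Hypothetical example: IPC of 1.5 on a 2.0 GHz core, 1.2e9 instructions
# retired during a one-second interval.
example = productivity_pct(1.2e9, ipc=1.5, max_freq_hz=2.0e9, interval_s=1.0)
```

Because the denominator uses the maximum cycle count rather than the cycles actually elapsed, a core throttled to a lower frequency shows up as unused capacity instead of inflated utilization.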
  16. Enhanced Model for Hyper-Threaded Processors. [Diagram: HyperThread 1 and HyperThread 2 alternate between halted and unhalted states along the Number of Cycles axis, which splits the timeline into T1 and T2 intervals.] T1: core operating with 1 active HT. T2: core operating with 2 active HT. The processor core capacity is:
  17. Estimating the Model using Intel x86 Hardware Performance Counters. Unfortunately, we are not aware of any performance counter that counts instructions retired while operating with two active hyper-threads. To derive IPC at T2 we perform a multivariate regression analysis on: Where the following metrics are measured on the live system: • Instructions retired (IR) by the core is derived as the sum of the IR by its hyper-threads • Cycles at T1 and T2 are not measured directly, but can be derived from existing counters. The max number of cycles at T2 is determined based on processor specs and OS power management configurations.
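The regression step can be sketched as an ordinary least-squares fit with no intercept. This assumes the model is IR_core = IPC_T1 * cycles_T1 + IPC_T2 * cycles_T2, and that core capacity is IPC_T2 multiplied by the maximum number of cycles at T2. Both forms are assumptions inferred from the slide text. The function names and the synthetic four-point data set are hypothetical.

```python
import numpy as np

def fit_ipc(cycles_t1, cycles_t2, instructions):
    """Least-squares estimate of (IPC_T1, IPC_T2) from per-interval samples,
    assuming instructions = IPC_T1 * cycles_T1 + IPC_T2 * cycles_T2."""
    X = np.column_stack([cycles_t1, cycles_t2])
    coef, *_ = np.linalg.lstsq(X, instructions, rcond=None)
    return float(coef[0]), float(coef[1])

def core_productivity_pct(instructions, ipc_t2, max_cycles_t2):
    """Retired instructions as a percentage of the assumed core capacity."""
    return 100.0 * instructions / (ipc_t2 * max_cycles_t2)

# Synthetic samples at four low-load points, generated from known IPC
# values (1.6 at T1, 2.0 at T2) so the fit can be checked.
c_t1 = np.array([4.0e8, 6.0e8, 7.0e8, 8.0e8])
c_t2 = np.array([1.0e8, 2.0e8, 4.0e8, 6.0e8])
ir = 1.6 * c_t1 + 2.0 * c_t2

ipc_t1, ipc_t2 = fit_ipc(c_t1, c_t2, ir)

# Hypothetical maximum of 2.9e9 cycles at T2 in a one-second interval.
prod = core_productivity_pct(ir[-1], ipc_t2, max_cycles_t2=2.9e9)
```

On real counters the samples would be noisy, so the recovered IPC values would be estimates rather than exact, and more than four points may be needed for a stable fit.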
  18. The End Result: Productivity! Productivity is linearly correlated with throughput from 0 to 100%! Models estimated at low load (four points).
  19. Predicting System Capacity: Productivity vs Utilization. 112k Prod. @ 80%, 180k Util. @ 80%. Productivity reduces max capacity estimation error by 7x compared with Utilization. Models estimated at low load (four points).
  20. Conclusions and Key Takeaways: (1) Metrics lie, be sure to know what they actually measure! This affects everything: on prem, virtualization, cloud, VMs, containers … (2) «CPU Utilization» is the most important metric in performance evaluation, yet it can be broken and unreliable on modern processors. (3) We need better metrics! Productivity is our contribution and can help to understand the actual residual capacity and bottlenecks in modern systems.
  21. Headquarters: Via Schiaffino 11C, 20158 Milan, Italy, T +39-024951-7001. USA East: 283 Franklin Street, Boston, MA 02110, T +1-617-936-0212. USA West: 425 Broadway Street, Redwood City, CA 94063, T +1-650-226-4274. Contacts: @moviri, moviricorp, moviri, +moviri
  22. Simultaneous Multi-Threading impact on CPU Utilization. HT off: • Utilization 100% • 2 Transactions/sec. HT on, 1 thread per core: • Utilization 50% • 2 Transactions/sec. HT on, two threads on the same core: • Utilization 50% • 1.25 Transactions/sec*. Same CPU utilization, but radically different throughput! Hyper-Threading alters the semantics of traditional utilization metrics and greatly impacts the understanding of the real CPU capacity. 50% can be the new 80%, or even 100%! * 1.25 Trans/sec assuming 25% speedup due to HT.
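The arithmetic behind these three scenarios can be reproduced with a toy model. It assumes a dual-core processor, one transaction per second for a thread running alone on a core, and the 25% Hyper-Threading speedup stated on the slide. The function name and its parameters are hypothetical.

```python
def scenario(threads_per_core, ht_enabled, ht_speedup=0.25, tps_per_core=1.0):
    """Return (utilization %, transactions/sec) for a toy multi-core CPU.

    threads_per_core lists how many busy threads run on each physical core.
    Utilization is busy logical CPUs over total logical CPUs, which is how
    an OS counts it. Throughput assumes a lone thread delivers tps_per_core
    and two threads sharing a core deliver (1 + ht_speedup) times that.
    """
    logical_per_core = 2 if ht_enabled else 1
    total_logical = logical_per_core * len(threads_per_core)
    utilization = 100.0 * sum(threads_per_core) / total_logical

    throughput = 0.0
    for n in threads_per_core:
        if n == 1:
            throughput += tps_per_core
        elif n == 2:
            throughput += tps_per_core * (1.0 + ht_speedup)
    return utilization, throughput

ht_off = scenario([1, 1], ht_enabled=False)        # (100.0, 2.0)
ht_on_spread = scenario([1, 1], ht_enabled=True)   # (50.0, 2.0)
ht_on_packed = scenario([2, 0], ht_enabled=True)   # (50.0, 1.25)
```

The last two cases report the same 50% utilization while differing in throughput, which is the ambiguity the talk attributes to Hyper-Threading.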