Apidays Singapore 2024 - Application and Platform Optimization through Power Analysis with Kepler by Philippe Benedetti (Palo IT) and Balakrishnan B (Red Hat)

Application and Platform Optimization through Power Analysis with Kepler
Philippe Benedetti, Head of Application Development - Palo IT
Balakrishnan B, Principal Architect - Red Hat

Apidays Singapore 2024: Connecting Customers, Business and Technology (April 17 & 18, 2024)

------

Check out our conferences at https://www.apidays.global/

Do you want to sponsor or talk at one of our conferences?
https://apidays.typeform.com/to/ILJeAaV8

Learn more on APIscene, the global media made by the community for the community:
https://www.apiscene.io

Explore the API ecosystem with the API Landscape:
https://apilandscape.apiscene.io/

apidays

May 04, 2024
Transcript

  1. Application and Platform optimization through power analysis with Kepler

    Balakrishnan B, Principal Architect, Red Hat
    Philippe Benedetti, Head of application development, Palo IT
  2. Introduction: Who are we?

    Philippe Benedetti, Head of application development
    PALO IT is an Agile software development consultancy dedicated to helping organisations in Singapore and across the world embrace tech as a force for good.

    Balakrishnan, Principal Architect
    Red Hat is the world’s leading provider of enterprise open source software solutions around hybrid cloud infrastructure, cloud native application development, management and automation solutions. Award-winning support, training, and consulting services make Red Hat a trusted adviser to the Fortune 500.
  3. Introduction and Context: Energy consumption of the digital sector is growing

    - According to Gartner, an ACM technology brief estimated that the information technology sector accounts for between 1.8% and 3.9% of global carbon emissions.
    - Cloud energy consumption is growing exponentially.
    - Governments are requesting that companies report the energy consumption of their AI-based workloads.
    4%
  4. Introduction and Context: Cloud providers' energy consumption reporting is not mature

    Three difficulties:
    - Difficulty to attribute
    - Difficulty to gather data
    - Difficulty to measure

    Open questions:
    - How to attribute power on shared resources to processes, containers or Pods?
    - How do we ensure data collection is exhaustive and correct?
    - How do we standardise the data collected from a multi-cloud infrastructure?
    - How to measure energy consumption indirectly?
    - How to measure the energy consumption of workloads?
  5. What is Kepler? Kubernetes-based Efficient Power Level Exporter

    Kepler monitors performance counters, kernel scheduling parameters, and system configurations, and exposes the energy consumption of each container and Pod as Prometheus metrics. These metrics can be used for customer sustainability reporting, or by OpenShift controllers to optimize workload scheduling and configuration to achieve energy conservation goals.
  6. What is Kepler? Cloud native sustainability stack

    - Power Exporter (main repository)
    - Model Database (saved trained models)
    - Kepler Operator
    - Kepler Estimator (to load trained power models)
    - CLEVER (Container Level Energy-efficient VPA Recommender for Kubernetes)
    - PEAKS (Power Efficiency Aware Kubernetes Scheduler)
    - Label Exporter (to aggregate metrics)
    - Model Server (power model creator)
  7. What is Kepler? Kepler's growing community

    - Open source project
    - GitHub repository: https://github.com/sustainable-computing-io/kepler
    - CNCF Sandbox project. CNCF describes itself this way: "As part of the Linux Foundation, we provide support, oversight and direction for fast-growing, cloud native projects, including Kubernetes, Envoy, and Prometheus."
    - And more…
  8. What is Kepler? What can we measure?

    Container energy consumption, as a number of joules:
    - total
    - per Core (CPU cores)
    - per Uncore (last-level cache, integrated GPU, memory controller)
    - per GPU

    Container resource utilization (using hardware counters):
    - CPU time used by the container
    - CPU cycles used by the container
    - CPU instructions used by the container
    - Total cache misses that occurred for the container

    Similar metrics are available at Node level.
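Because the energy metrics listed on this slide are cumulative joule counts, turning them into power means dividing an energy difference by a time difference. The sketch below is a minimal illustration of that arithmetic, assuming samples have already been scraped as (timestamp, joules) pairs; the function name and the sample values are hypothetical and are not taken from the talk or from Kepler itself.

```python
from typing import Sequence, Tuple

# One sample of a cumulative energy counter:
# (unix timestamp in seconds, joules consumed since the counter started).
Sample = Tuple[float, float]


def average_power_watts(samples: Sequence[Sample]) -> float:
    """Return average power (W) between the first and last sample.

    Power is energy over time: watts = delta joules / delta seconds.
    """
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    (t0, j0), (t1, j1) = samples[0], samples[-1]
    if t1 <= t0:
        raise ValueError("timestamps must be strictly increasing")
    if j1 < j0:
        # A cumulative counter should never go down; a drop usually
        # means the exporter restarted and the counter was reset.
        raise ValueError("counter decreased (possible reset)")
    return (j1 - j0) / (t1 - t0)


if __name__ == "__main__":
    # Hypothetical samples for one container, taken 30 s apart.
    container_samples = [(0.0, 1200.0), (30.0, 1290.0), (60.0, 1380.0)]
    print(average_power_watts(container_samples))  # 180 J over 60 s -> 3.0
```

In a Prometheus setup the same computation is normally expressed with the `rate()` function over the joules counter rather than in application code; the exact metric and label names depend on the Kepler version deployed and should be checked against its documentation.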
  9. What is Kepler? How does it work?

    1 - Collect data from:
        - Resource usage (using eBPF)
        - Node energy consumption
    2 - Leverage the K8s API:
        - Convert Container ID to Pod
    3 - Create and export Prometheus metrics
    4 - Visualise in Prometheus

    Measure vs Estimate: when data points are not available, Kepler leverages ML models to estimate the consumption and attribute it to the relevant Pods and workflows. Power estimation modeling estimates power by using usage metrics as input features of the trained model.
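The estimate-and-attribute idea on this slide can be sketched in a few lines. This is a deliberately simplified illustration, not Kepler's actual model: it assumes a linear power model with made-up weights and a proportional split of node energy by a single usage counter. All names, weights, and numbers below are hypothetical.

```python
from typing import Dict


def estimate_node_watts(
    features: Dict[str, float], weights: Dict[str, float], idle_watts: float
) -> float:
    """Estimate node power from usage metrics with a linear model.

    Stands in for a trained power model: usage metrics are the input
    features, and the output is an estimated power in watts.
    """
    return idle_watts + sum(weights[name] * value for name, value in features.items())


def attribute_energy(node_joules: float, usage: Dict[str, float]) -> Dict[str, float]:
    """Split node energy across pods in proportion to their usage counter."""
    total = sum(usage.values())
    if total <= 0:
        # No usage signal at all: fall back to an even split.
        share = node_joules / len(usage) if usage else 0.0
        return {pod: share for pod in usage}
    return {pod: node_joules * value / total for pod, value in usage.items()}


if __name__ == "__main__":
    # Hypothetical per-pod CPU instruction counts over one sampling window.
    instructions = {"frontend": 3.0e9, "backend": 6.0e9, "batch": 1.0e9}
    # 500 J consumed by the node in that window (measured or estimated).
    for pod, joules in attribute_energy(500.0, instructions).items():
        print(f"{pod}: {joules:.1f} J")

    # Hypothetical model weights; a real model would be trained on data.
    watts = estimate_node_watts(
        features={"cpu_instructions_per_s": 2.0e9, "cache_misses_per_s": 1.0e6},
        weights={"cpu_instructions_per_s": 1.0e-8, "cache_misses_per_s": 5.0e-6},
        idle_watts=40.0,
    )
    print(f"estimated node power: {watts:.1f} W")
```

The proportional split is the simplest possible attribution rule; it is used here only to make the "measure at node level, attribute to Pods" step concrete.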
  10. Demo 1: Application optimization

    Optimise application power usage:
    - Simple Spring Boot app rewritten in Quarkus
    - Run load on the app
    - Compare compute and power efficiency between the apps
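One way to make the comparison step of such a demo concrete is to normalise energy by work done, for example joules per request. The sketch below assumes each load test yields a total energy figure, a request count, and CPU seconds; the class, field names, and numbers are placeholders for illustration and are not results from the talk.

```python
from dataclasses import dataclass


@dataclass
class LoadTestResult:
    """Totals collected for one application over one load test run."""

    name: str
    joules: float        # energy attributed to the app during the run
    requests: int        # requests served during the run
    cpu_seconds: float   # CPU time used during the run

    @property
    def joules_per_request(self) -> float:
        return self.joules / self.requests

    @property
    def cpu_ms_per_request(self) -> float:
        return 1000.0 * self.cpu_seconds / self.requests


def relative_saving(baseline: float, candidate: float) -> float:
    """Fraction saved by the candidate relative to the baseline (0.25 = 25%)."""
    return 1.0 - candidate / baseline


if __name__ == "__main__":
    # Placeholder numbers only; substitute values from your own runs.
    baseline = LoadTestResult("baseline", joules=2000.0, requests=10_000, cpu_seconds=120.0)
    candidate = LoadTestResult("candidate", joules=1500.0, requests=10_000, cpu_seconds=80.0)

    energy = relative_saving(baseline.joules_per_request, candidate.joules_per_request)
    cpu = relative_saving(baseline.cpu_ms_per_request, candidate.cpu_ms_per_request)
    print(f"energy per request saved: {energy:.0%}")
    print(f"CPU time per request saved: {cpu:.0%}")
```

Normalising per request keeps the comparison fair when the two runs do not serve exactly the same number of requests.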
  11. Demo 2: Platform optimization

    Information Classification: CONFIDENTIAL (sensitive business information, the level of protection is dictated by legal agreements)

    Auto reschedule the app to a power-efficient node based on power consumption:
    - Custom scheduler monitors node power metrics
    - Reschedule the app to the node with the least power consumption
    - Scale the app pods up/down based on power metrics
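The rescheduling decision described on this slide can be sketched as a small pure function. This is an assumption-laden illustration, not the custom scheduler shown in the demo: it assumes per-node power readings in watts are already available, and it adds a hypothetical minimum-gain threshold so the app is not moved back and forth for negligible differences.

```python
from typing import Dict, Optional


def pick_lowest_power_node(node_watts: Dict[str, float]) -> str:
    """Return the name of the node currently drawing the least power."""
    if not node_watts:
        raise ValueError("no nodes to choose from")
    return min(node_watts, key=node_watts.get)


def plan_move(
    current_node: str, node_watts: Dict[str, float], min_gain_watts: float = 5.0
) -> Optional[str]:
    """Return the node to move the app to, or None to stay put.

    The app only moves when the best node draws at least `min_gain_watts`
    less than the current one (a hypothetical anti-flapping threshold).
    """
    target = pick_lowest_power_node(node_watts)
    if target == current_node:
        return None
    gain = node_watts[current_node] - node_watts[target]
    return target if gain >= min_gain_watts else None


if __name__ == "__main__":
    # Hypothetical node power readings in watts.
    readings = {"node-a": 120.0, "node-b": 95.5, "node-c": 110.0}
    print(plan_move("node-a", readings))  # node-b
    print(plan_move("node-b", readings))  # None
```

A real scheduler would also need to check that the target node has capacity for the app and would act through the Kubernetes API; both are out of scope for this sketch.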
  12. Demo 2: Platform optimization