Achieve Superhuman Performance with Machine Learning

Stefano Doni
January 12, 2024

How to get 2x the throughput from a MongoDB database by tuning the configuration of the software stack (MongoDB and the Linux OS) with machine learning


  1. Configuration complexity is huge and on the rise

    MySQL configuration parameters have grown 5x in the last 10 years. Oracle HotSpot JVM versions have more than 700 configuration parameters (chart label: 757). How do configurations impact application performance and infrastructure efficiency?
  2. Configurations significantly impact performance and costs

    JBoss JVM performance tuning. Before tuning: 60% CPU utilization. After tuning: CPU cut by 5x. [Chart: workload (transactions/sec) and CPU utilization (%) over working days.] Source: Moviri, Computer Measurement Group Best Paper, 2015.
  3. Hyper-configuration beyond human scale

    Stack layers: hardware, (cloud) VM instance, operating system, container, Java Virtual Machine, middleware & framework, application, (cloud) network, (cloud) storage. Number of parameters (chart values, in extraction order): 700, 100, 500, 200, 10, 10, 10, 10. Looking for the optimal settings? It’s easy, just try 2^100 = 1,267,650,600,228,229,401,496,703,205,376 configurations…
  4. Optimizing a core banking platform

    • Goal
      • Increase the key business service metric: payments per second
      • … while keeping latency under SLAs
      • … without additional infrastructure and license costs
    • Optimization scope
      • Java OpenJDK 8
      • JBoss
      • Red Hat Data Grid (Infinispan)
      • Linux
  5. We outperformed experts and identified the best configuration to increase payments per second by 55%

    1.55x performance achieved after 20h of automated tuning. Baseline: manually tuned by experts → score = 100%.
  6. Optimization outcomes: best configuration for different goals

    What if you could run a series of automated performance tests for 24 hours where the outcome is the optimized configuration settings, across your stack, for
    • Throughput
    • Latency
    • Resource utilization
    • Cloud costs
    • …
  7. MongoDB Performance Optimization

    • Goal
      • Increase database throughput (queries/sec)
      • Decrease query latency
      • Save cloud costs
    • Optimization scope
      • MongoDB
      • Linux
  8. Results

    Tuning MongoDB (~10 params.): +30% query/sec over vendor default
    Tuning MongoDB + Linux kernel (~40 params.): 2x query/sec over vendor default
  9. AI can explain where performance really comes from

    Q: Which parameters actually made the 2x throughput possible?
    A: Out of 40 Linux kernel and MongoDB parameters, just three have a significant impact.
    No silver bullet! This is the result of a specific optimization. It depends on the application, workload, hardware, cloud options, optimization objectives, etc.
  10. AI can efficiently solve complex optimization spaces

    Baseline (vendor default): MongoDB cache=15GB, Linux read ahead=0
    Optimized (2x query/sec): MongoDB cache=30GB, Linux read ahead=8
  11. Why full-stack optimization? The effects of Linux tuning

    Baseline vs. optimized: disk IOPS cut to 1/3 and disk latency doubled, apparently making things slower, but… this resulted in a 2x MongoDB throughput increase
  12. AI can find counter-intuitive settings experts never tried

    Baseline (vendor default): MongoDB cache=15GB, MongoDB dirty target=5%
    Optimized (+20% query/sec): MongoDB cache=4.3GB, MongoDB dirty target=58%
  13. Drivers for adopting the new AI-driven optimization approach

    • Costs: CAPEX reduced due to increased performance / deferral of new investments; OPEX reduced with full automation of the testing, analysis & tuning cycle
    • Revenue: quality of service improved for customer-facing services or batch processes
    • Agility: new app releases faster, shrinking the optimization cycle from months to days
    • Strategy: innovation programs accelerated thanks to automated performance optimization
    • Risks: service outage or slowdown risks reduced thanks to optimized configurations
  14. Conclusions

    • Today's software stack is far too complex for our human brains
    • Business impact: significant performance left on the table, lower agility
    • Machine learning can smartly navigate complex optimizations and find counter-intuitive, unexplored settings, beating experts and yielding big gains
    • A new AI-driven approach to performance optimization is required to achieve the benefits in modern DevOps settings - #AIDevOps
    • Key capabilities include end-to-end automation of performance experiments, full-stack coverage, fast and robust AI optimization of user-driven goals
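
The idea of end-to-end automated performance experiments can be illustrated with a very small tuning loop. The sketch below is only an illustration and makes several assumptions: it uses plain random search, not the ML optimizer the deck refers to; `synthetic_benchmark` is a made-up stand-in for a real load test; and the parameter names and ranges in `SEARCH_SPACE` are hypothetical, loosely inspired by the cache, dirty target, and read-ahead settings mentioned in the slides.

```python
import random

# Hypothetical search space: names and ranges are illustrative only,
# not the parameters or bounds used in the talk.
SEARCH_SPACE = {
    "cache_gb": (1.0, 30.0),
    "dirty_target_pct": (1.0, 80.0),
    "read_ahead": (0.0, 64.0),
}


def synthetic_benchmark(config):
    """Stand-in for a real load test; returns a made-up throughput score.

    A real experiment would apply the config, run a workload, and measure
    queries/sec. This toy function only exists so the loop can run.
    """
    score = 1000.0
    score -= 2.0 * (config["cache_gb"] - 12.0) ** 2
    score -= 0.1 * (config["dirty_target_pct"] - 40.0) ** 2
    score -= 0.5 * (config["read_ahead"] - 8.0) ** 2
    return score


def sample_config(rng):
    """Draw one candidate configuration uniformly from the search space."""
    return {name: rng.uniform(lo, hi) for name, (lo, hi) in SEARCH_SPACE.items()}


def random_search(benchmark, baseline, trials, seed=0):
    """Run `trials` experiments and keep the best-scoring configuration.

    Random search is used here for simplicity; it is an assumption of this
    sketch, not the optimization algorithm described in the deck.
    """
    rng = random.Random(seed)
    best_config, best_score = dict(baseline), benchmark(baseline)
    for _ in range(trials):
        candidate = sample_config(rng)
        score = benchmark(candidate)
        if score > best_score:
            best_config, best_score = candidate, score
    return best_config, best_score


if __name__ == "__main__":
    # Hypothetical baseline, standing in for a vendor default.
    baseline = {"cache_gb": 15.0, "dirty_target_pct": 5.0, "read_ahead": 0.0}
    best, score = random_search(synthetic_benchmark, baseline, trials=200)
    print("baseline score:", synthetic_benchmark(baseline))
    print("best score:", score)
    print("best config:", best)
```

In a real setup, the benchmark function would deploy the candidate configuration, run a load test, and return the measured goal (throughput, latency, or cost). Random search wastes many trials on poor regions of the space, which is why a sample-efficient optimizer matters when each experiment takes minutes or hours.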