WTF is Modeling, Anyway!?

A video conversation with performance and capacity management veteran Boris Zibitsker about how I saved a multi-million-dollar computing platform using a one-line performance model (at 21:50 minutes). "Best practices" caused the problem.

Dr. Neil Gunther

August 02, 2017

Transcript

  1. WTF is “Modeling”, Anyway!?

    Video conversation with Boris Zibitsker on the BEZNext Channel
    Dr. Neil J. Gunther (@DrQz), Performance Dynamics
    August 2, 2017
    © 2018 Performance Dynamics
  2. (No slide text other than the footer.)
  3. Types of Models

    This word “model” is overloaded in both English and technology:

      - UML software modeling
      - model train set
      - Kim Kardashian
      - financial/accounting models
      - Amdahl’s law
      - statistical regression
      - numerical mesh simulation
      - benchmark workload simulation
      - support vector machines
      - convolutional neural nets

    We need to specify clearly and unambiguously which model
  4. What is a Performance Model?

    A performance model is a mathematical framework used to assess the validity of performance data (an overlooked necessity):

        data + model = information

      1. Select performance metrics as inputs: λ, R, S, Q, ...
      2. Model is a relationship between those metrics: Q = λR
      3. Model outputs are calculated metrics
      4. Compare calculated metrics with (other) measured metrics
      5. Repeat until satisfied

    Can then project metric values into circumstances that are not measured or not measurable
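
The validation loop on this slide can be sketched in a few lines of Python, using the slide's own relationship Q = λR (Little's law). The function names and all numeric values below are hypothetical, chosen only for illustration; they are not measurements from the talk.

```python
def queue_length_model(arrival_rate, residence_time):
    """Model: Q = lambda * R (the relationship named on the slide)."""
    return arrival_rate * residence_time


def relative_error(measured, calculated):
    """Signed relative difference between a measured and a calculated metric."""
    return (measured - calculated) / measured


# Hypothetical inputs (NOT from the talk): measured arrival rate and residence time.
arrival_rate = 50.0      # lambda, requests per second
residence_time = 0.2     # R, seconds

# Hypothetical independent measurement of the same quantity the model calculates.
measured_queue_length = 10.4

calculated_queue_length = queue_length_model(arrival_rate, residence_time)
error = relative_error(measured_queue_length, calculated_queue_length)

print(f"calculated Q = {calculated_queue_length:.2f}")
print(f"measured   Q = {measured_queue_length:.2f}")
print(f"relative error = {error:.1%}")
```

If the calculated and measured values disagree badly, either the data or the choice of model is suspect, which is the "repeat until satisfied" step.
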
  5. A Real Simple Real Model
  6. The Environment

    [Diagram of the production environment. Labels: S390, robotic tape silos, IBM AIX/SP-2 50 nodes (twice), SP2 (twice), FDDI rings, user Tek X-terminals.]
  7. The Data

    Problem:
      - Home-grown application could take > 60 seconds to launch
      - IBM cluster would cost $millions to replace
  8. Data Sample

    Table: Mean launch time in seconds

        Server   Files   Time
        Xfs1     8371    18.57
        Xfs2     7113    16.72
        NFS1     4781    17.01
        NFS2      109     9.41

    Observation:
      - 109 files is nearly $128 = 2^{7}$; $\log_2(128) = 7$ is close to 9 seconds
      - 4781 is near $4096 = 2^{12}$; $\log_2(4096) = 12$ is close to 17 seconds
      - 8371 is near $8192 = 2^{13}$; $\log_2(8192) = 13$ is close to 18 seconds
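
The power-of-two observation can be checked directly. This is a minimal sketch that takes the file counts and launch times from the table and prints the base-2 logarithm of each file count next to the measured time. The helper name `nearest_power_of_two_exponent` is made up for illustration.

```python
import math

# (files, mean launch time in seconds) from the Data Sample table.
launch_data = {
    "Xfs1": (8371, 18.57),
    "Xfs2": (7113, 16.72),
    "NFS1": (4781, 17.01),
    "NFS2": (109, 9.41),
}


def nearest_power_of_two_exponent(n):
    """Exponent e such that 2**e is the power of two closest to n on a log scale."""
    return round(math.log2(n))


for server, (files, seconds) in launch_data.items():
    exponent = nearest_power_of_two_exponent(files)
    print(f"{server}: {files} files, log2 = {math.log2(files):.2f}, "
          f"nearest power 2^{exponent} = {2 ** exponent}, time = {seconds} s")
```

The point is only that launch time tracks the logarithm of the file count far more closely than the file count itself.
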
  9. Visual Confirmation

    [Figure: "Log-Linear Plot" of mean launch time (seconds) against log(number of files), showing the data and an LSQ fit. © 2017 Performance Dynamics]
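
A least-squares fit like the one in this plot can be sketched without any plotting library. The slide says only "LSQ fit"; the sketch below assumes a straight line through the origin in log10(N), which is my assumption and not something the slide states. The function name `fit_through_origin` is hypothetical. The data are the four rows of the Data Sample table.

```python
import math

# (files, mean launch time in seconds) from the Data Sample table.
samples = [(8371, 18.57), (7113, 16.72), (4781, 17.01), (109, 9.41)]


def fit_through_origin(xs, ys):
    """Least-squares slope k for y = k * x (no intercept): k = sum(x*y) / sum(x*x)."""
    numerator = sum(x * y for x, y in zip(xs, ys))
    denominator = sum(x * x for x in xs)
    return numerator / denominator


log_files = [math.log10(files) for files, _ in samples]
launch_times = [seconds for _, seconds in samples]

k = fit_through_origin(log_files, launch_times)
print(f"fitted slope k = {k:.4f} seconds per decade of files")
```

Under that assumption the fitted slope comes out close to 4.57, the proportionality constant quoted later in the deck.
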
  10. Visual Confirmation

    [Figure: "Log Model" plot of mean launch time (seconds) against number of files on linear axes, showing the data and the model. © 2017 Performance Dynamics]
  11. One-Line Performance Model

    Mean launch time R:

        $R = k \log_{10}(N)$

    where N is the number of remote-server files and k = 4.57 is the proportionality constant for base-10 logarithms

    Table: Log model of mean R times

        Remote server   Measured seconds   Log model    %Error
        Xfs1            18.57              17.929948     3.446698
        Xfs2            16.72              17.606686    -5.303144
        NFS1            17.01              16.818079     1.128281
        NFS2             9.41               9.312522     1.035894

    Model is accurate to within about 5%

    But where does logarithmic behavior come from?
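
The table can be reproduced approximately with the one-line model itself. This sketch uses the rounded constant k = 4.57 quoted on the slide, so the outputs differ from the table in the third decimal place. The function names are hypothetical, and the error convention (measured minus model, divided by measured) is inferred from the tabulated values rather than stated in the talk.

```python
import math

K = 4.57  # proportionality constant for base-10 logarithms, as quoted on the slide

# (files, measured mean launch time in seconds) from the Data Sample table.
measurements = {
    "Xfs1": (8371, 18.57),
    "Xfs2": (7113, 16.72),
    "NFS1": (4781, 17.01),
    "NFS2": (109, 9.41),
}


def predict_launch_time(files, k=K):
    """One-line model: R = k * log10(N)."""
    return k * math.log10(files)


def percent_error(measured, modeled):
    """Signed error as a percentage of the measured value (inferred convention)."""
    return 100.0 * (measured - modeled) / measured


for server, (files, measured) in measurements.items():
    modeled = predict_launch_time(files)
    print(f"{server}: measured {measured:6.2f} s, "
          f"model {modeled:6.2f} s, error {percent_error(measured, modeled):+.2f}%")
```
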
  12. To Get a Log, You Need a Tree

    [Figure: tree diagram with levels 0, 1, 2 containing 1, 10, 100 nodes. Annotation: this (Level) is the logarithm of this (Number).]
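
The relationship in the figure, where the level is the logarithm of the number of nodes, can be illustrated with a small sketch. It assumes a balanced tree with a fanout of 10, chosen to match the figure's 1, 10, 100 progression. The function name `tree_level` is hypothetical, and nothing here is a claim about how the actual file servers were organized.

```python
def tree_level(n_items, fanout=10):
    """Smallest level of a balanced tree (root is level 0) wide enough for n_items.

    Level L of a tree with the given fanout holds fanout**L nodes, so the
    answer grows like the base-`fanout` logarithm of n_items.
    """
    if n_items < 1:
        raise ValueError("n_items must be at least 1")
    level = 0
    capacity = 1
    while capacity < n_items:
        capacity *= fanout
        level += 1
    return level


for n in (1, 10, 100, 10000):
    print(f"{n:>6} items -> level {tree_level(n)}")
```

Multiplying the number of items by ten adds only one level, which is why a cost proportional to tree depth shows up as a logarithm of the item count.
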
  13. Modeling Results

      1. Saved $multi-million IBM SP2 cluster
      2. SP2 replacement would NOT have solved anything
      3. Problem caused by “best practices” for system management
      4. Performance management was completely overlooked
      5. Font server held ∼15000 files but only ∼1000 needed
      6. Simple log performance model told the whole story
      7. Simple fix with no CapEx cost: prune the tree!
      8. 300% performance win in shortened launch times!
      9. Log model more about explanation than prediction/forecasting
  14. Contact Information

    Performance Dynamics Company
    Castro Valley, California
    www.perfdynamics.com
    perfdynamics.blogspot.com
    facebook.com/PerformanceDynamics
    twitter.com/DrQz
    info@perfdynamics.com
    +1-510-537-5758