How do we do Benchmark? Impressions from Conversations with the Community

Stefan Marr
September 30, 2021

Transcript

  1. How do we do Benchmark? Impressions from Conversations with the Community

    VMM’21, Virtual. Stefan Marr
  2. Got a Question? Feel free to interrupt me!

  3. There’s Work on What We Should be Doing

  4. There’s Work on What We Should be Doing

  5. There’s Work on What We Should be Doing

  6. There’s Work on What We Should be Doing

  7. What are we doing? and What are we struggling with?

    I had a different question
  8. Outline

    1. My Interview Methodology 2. Some Anecdotes 3. Our struggles 4. Our best practices
  9. MY METHODOLOGY

  10. My Methodology • 21 interviews • tiny sample

  11. My Methodology

    • 21 interviews • With groups in the field of “programming languages and systems” • tiny sample • not representative
  12. My Methodology

    • 21 interviews • With groups in the field of “programming languages and systems” • Semi-structured interviews • tiny sample • not representative • not the same for all interviews
  13. My Methodology

    • 21 interviews • With groups in the field of “programming languages and systems” • Semi-structured interviews • Ad hoc result analysis • tiny sample • not representative • not the same for all interviews • Interpretation biases
  14. This isn’t Data. It’s Anecdotes.

  15. Use of Automated Test, Continuous Integration

    Do you use some form of automated testing/CI? Using Zoom’s Reactions, likely at the bottom of the screen
  16. Use of Automated Test, Continuous Integration

    Do you use some form of automated testing/CI? Using Zoom’s Reactions, likely at the bottom of the screen: 👏 yes ❤ sometimes 😮 no
  17. Use of Automated Test, Continuous Integration

    Out of 21 groups, >75% sometimes use CI
  18. Use of Automated Test, Continuous Integration

    Out of 21 groups, >75% sometimes use CI. But • Can differ per student • Per project • …
  19. Use of Automated Test, Continuous Integration

    Out of 21 groups, >75% sometimes use CI. But • Can differ per student • Per project • … 😱 In academia, testing is not “standard practice”
  20. Benchmarking and Frequency

    Do you run benchmarks for your day-to-day engineering? 👏 don’t need it ❤ yes 😮 only for a paper
  21. Benchmarking and Frequency

    Do you run benchmarks for your day-to-day engineering? Out of 21 groups, do it at least for some projects: ≈30% for every pull request, ≈30% at some interval, ≈50% only for a paper
  22. Hardware Setup

    Out of 21 groups, for at least some projects: >55% dedicated, self-hosted; ≈15% bare-metal cloud; ≈20% multi-tenant cloud; ≈15% developer machine
  23. Hardware Setup

    Out of 21 groups, for at least some projects: >55% dedicated, self-hosted; ≈15% bare-metal cloud; ≈20% multi-tenant cloud; ≈15% developer machine. 60% of groups: high cost/effort of maintaining machines and tools
  24. Are the Machines Prepared in Some Way?

    Out of 21 groups: >70% do some preparation, <30% do no preparation. Preparation may include • disabling daemons, disk usage, Address Space Layout Randomization • disabling turbo boost, frequency scaling • NUMA-node pinning, thread pinning
  25. Are the Machines Prepared in Some Way?

    Out of 21 groups: >70% do some preparation, <30% do no preparation. Preparation may include • disabling daemons, disk usage, Address Space Layout Randomization • disabling turbo boost, frequency scaling • NUMA-node pinning, thread pinning. 👍 for awareness. But it requires expertise and is not trivial
  26. Data Provenance

    Did you ever have an issue like: – Unsure what was measured? – Mixed up data from experiments? – Unsure which parameters were used? 👏 yes ❤ no
  27. Data Provenance

    Out of 21 groups, for some projects: <50% track it systematically, >60% do not track it
  28. Data Provenance

    Out of 21 groups, for some projects: <50% track it systematically, >60% do not track it. Common issues named: • Comparing wrong data, only noticed by inconsistencies • Losing track of what’s what • Parameters/setup details not recorded
  29. Data Processing

    The 21 groups named the following: >71% Python, >40% Matplotlib, ≈40% R, ≈33% Spreadsheets, and other things
  30. Data Processing

    The 21 groups named the following: >71% Python, >40% Matplotlib, ≈40% R, ≈33% Spreadsheets, and other things. Concerns • Too much time spent analyzing data • Often the same, but no reuse
  31. Data Processing

    Of the 21 groups: >88% do something manual, >70% have some things scripted, 2 groups automate everything, including generating LaTeX macros
  32. STRUGGLES AND BEST PRACTICES

  33. Our Struggles

    • Finding good benchmarks • Setup and maintain machines, minimizing measurement error • Tracking data provenance • Historic data available/useful • Standard analyses, data processing, and statistics
  34. Best Practices

    • Use CI/Automated Testing – At the very least, check that benchmarks produce correct results • Use same setup for day-to-day engineering as for producing data for papers – The setup is already debugged! • Most CI systems can store artifacts – Basic provenance tracking for results! • Automate data handling – Spreadsheets can import data from external data sources – Avoid manually copying data around • Define workflow that works for your group – And teach it!
  35. Questions?

    Our Struggles • Finding good benchmarks • Setup and maintain machines, minimizing measurement error • Tracking data provenance • Historic data available/useful • Standard analyses, data processing, and statistics. Best Practices • Use CI/Automated Testing • Use same setup for day-to-day engineering as for producing data for papers • Most CI systems can store artifacts • Automate data handling • Define workflow that works for your group
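
Slides 24 and 25 list machine preparation steps such as disabling Address Space Layout Randomization, turbo boost, and frequency scaling. As a minimal sketch of how such a setup could be checked before a run, the Python below reads the corresponding Linux settings. The file paths are the usual Linux locations (the turbo file exists only with the `intel_pstate` driver). The names `read_machine_state` and `is_prepared`, the `root` parameter, and the policy itself are assumptions for illustration and are not from the talk.

```python
from pathlib import Path

# Usual Linux locations; each file may be absent depending on kernel and driver.
SETTINGS = {
    "aslr": "proc/sys/kernel/randomize_va_space",  # "0" means ASLR is disabled
    "no_turbo": "sys/devices/system/cpu/intel_pstate/no_turbo",  # "1" means turbo boost is off
    "governor": "sys/devices/system/cpu/cpu0/cpufreq/scaling_governor",
}


def read_machine_state(root="/"):
    """Return each setting's current value, or None if its file is missing or unreadable."""
    state = {}
    for name, rel_path in SETTINGS.items():
        try:
            state[name] = (Path(root) / rel_path).read_text().strip()
        except OSError:
            state[name] = None
    return state


def is_prepared(state):
    """Hypothetical policy: ASLR off, turbo boost off, 'performance' governor."""
    return (
        state.get("aslr") == "0"
        and state.get("no_turbo") == "1"
        and state.get("governor") == "performance"
    )


if __name__ == "__main__":
    current = read_machine_state()
    print(current, "prepared" if is_prepared(current) else "not prepared")
```

Storing the returned dictionary next to the measurements would also document the machine state for each run. The sketch only inspects CPU 0 and does not cover daemons, NUMA-node pinning, or thread pinning.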
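
Slides 26 to 28 name provenance problems: being unsure what was measured, mixing up data, and not recording parameters. One possible way to address this, sketched below under assumptions, is to write every result file together with the context needed to interpret it. The record layout and the function names `current_commit` and `save_results` are hypothetical; the talk does not prescribe a format.

```python
import json
import platform
import subprocess
from datetime import datetime, timezone


def current_commit():
    """Return the current git commit hash, or None if git or a checkout is unavailable."""
    try:
        out = subprocess.run(
            ["git", "rev-parse", "HEAD"],
            capture_output=True, text=True, check=True,
        )
        return out.stdout.strip()
    except (OSError, subprocess.CalledProcessError):
        return None


def save_results(path, measurements, parameters):
    """Write measurements plus the context needed to interpret them later."""
    record = {
        "recorded_at": datetime.now(timezone.utc).isoformat(),
        "host": platform.node(),
        "commit": current_commit(),
        "parameters": parameters,
        "measurements": measurements,
    }
    with open(path, "w", encoding="utf-8") as f:
        json.dump(record, f, indent=2)
    return record
```

A call such as `save_results("run.json", [12.1, 11.9], {"iterations": 2})` produces one self-describing file per run, which a CI system could then store as an artifact.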
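
Slide 31 mentions that two groups automate everything, including generating LaTeX macros. How those groups do it is not described in the slides; the following is only a sketch of the general idea, with a hypothetical function name `latex_macros` and made-up example values. Each result becomes a `\newcommand`, so the paper text never contains hand-copied numbers.

```python
def latex_macros(values, digits=2):
    """Render a mapping of macro name to number as LaTeX \\newcommand definitions."""
    lines = []
    for name, value in sorted(values.items()):
        # LaTeX command names may only consist of letters.
        if not (name.isascii() and name.isalpha()):
            raise ValueError(f"invalid LaTeX macro name: {name!r}")
        lines.append(f"\\newcommand{{\\{name}}}{{{value:.{digits}f}}}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Illustrative values only, not data from the talk.
    print(latex_macros({"SpeedupMedian": 1.5, "OverheadPercent": 12.0}), end="")
```

The generated text can be written to a file and pulled into the paper with `\input`, so rerunning the analysis updates every number in the text.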
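
Slide 34 recommends, at the very least, checking that benchmarks produce correct results. A minimal sketch of that idea, assuming a benchmark is a plain Python callable with a known expected result, is shown below. The harness name `run_checked` and its parameters are hypothetical.

```python
import time


def run_checked(benchmark, expected, iterations=5):
    """Time a benchmark and fail loudly if any iteration returns a wrong result."""
    times = []
    for i in range(iterations):
        start = time.perf_counter()
        result = benchmark()
        elapsed = time.perf_counter() - start
        if result != expected:
            raise AssertionError(
                f"iteration {i}: expected {expected!r}, got {result!r}"
            )
        times.append(elapsed)
    return times


if __name__ == "__main__":
    timings = run_checked(lambda: sum(range(1000)), expected=499500)
    print(f"{len(timings)} verified iterations")
```

The correctness check runs outside the timed region, so it does not affect the measurements, and a failing check keeps wrong results out of the data set.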