Slide 1

Slide 1 text

How do we do Benchmarking?
Impressions from Conversations with the Community
VMM’21, Virtual
Stefan Marr

Slide 2

Slide 2 text

Got a Question? Feel free to interrupt me!

Slide 3

Slide 3 text

There’s Work on What We Should be Doing

Slide 4

Slide 4 text

There’s Work on What We Should be Doing

Slide 5

Slide 5 text

There’s Work on What We Should be Doing

Slide 6

Slide 6 text

There’s Work on What We Should be Doing

Slide 7

Slide 7 text

I had a different question
What are we doing? and What are we struggling with?

Slide 8

Slide 8 text

Outline
1. My Interview Methodology
2. Some Anecdotes
3. Our struggles
4. Our best practices

Slide 9

Slide 9 text

MY METHODOLOGY

Slide 10

Slide 10 text

My Methodology
• 21 interviews
• tiny sample

Slide 11

Slide 11 text

My Methodology
• 21 interviews
• With groups in the field of “programming languages and systems”
• tiny sample
• not representative

Slide 12

Slide 12 text

My Methodology
• 21 interviews
• With groups in the field of “programming languages and systems”
• Semi-structured interviews
• tiny sample
• not representative
• not the same for all interviews

Slide 13

Slide 13 text

My Methodology
• 21 interviews
• With groups in the field of “programming languages and systems”
• Semi-structured interviews
• Ad hoc result analysis
• tiny sample
• not representative
• not the same for all interviews
• Interpretation biases

Slide 14

Slide 14 text

This isn’t Data
It’s Anecdotes

Slide 15

Slide 15 text

Use of Automated Testing, Continuous Integration
Do you use some form of automated testing/CI?
Using Zoom’s Reactions, likely at the bottom of the screen

Slide 16

Slide 16 text

Use of Automated Testing, Continuous Integration
Do you use some form of automated testing/CI?
Using Zoom’s Reactions, likely at the bottom of the screen
👏 yes
❤ sometimes
😮 no

Slide 17

Slide 17 text

Use of Automated Testing, Continuous Integration
Out of 21 groups, >75% use CI at least sometimes

Slide 18

Slide 18 text

Use of Automated Testing, Continuous Integration
Out of 21 groups, >75% use CI at least sometimes
But
• Can differ per student
• Per project
• …

Slide 19

Slide 19 text

Use of Automated Testing, Continuous Integration
Out of 21 groups, >75% use CI at least sometimes
But
• Can differ per student
• Per project
• …
😱 In academia, testing is not “standard practice”

Slide 20

Slide 20 text

Benchmarking and Frequency
Do you run benchmarks for your day-to-day engineering?
👏 don’t need it
❤ yes
😮 only for a paper

Slide 21

Slide 21 text

Benchmarking and Frequency
Do you run benchmarks for your day-to-day engineering?
Out of 21 groups, for at least some projects:
≈30% for every pull request
≈30% at some interval
≈50% only for a paper

Slide 22

Slide 22 text

Hardware Setup
Out of 21 groups, for at least some projects:
>55% dedicated, self-hosted
≈15% bare-metal cloud
≈20% multi-tenant cloud
≈15% developer machine

Slide 23

Slide 23 text

Hardware Setup
Out of 21 groups, for at least some projects:
>55% dedicated, self-hosted
≈15% bare-metal cloud
≈20% multi-tenant cloud
≈15% developer machine
60% of groups: high cost/effort of maintaining machines and tools

Slide 24

Slide 24 text

Are the Machines Prepared in Some Way?
Out of 21 groups:
>70% do some preparation
<30% do no preparation
Preparation may include
• disabling daemons, disk usage, Address Space Layout Randomization
• disabling turbo boost, frequency scaling
• NUMA-node pinning, thread pinning

Slide 25

Slide 25 text

Are the Machines Prepared in Some Way?
Out of 21 groups:
>70% do some preparation
<30% do no preparation
Preparation may include
• disabling daemons, disk usage, Address Space Layout Randomization
• disabling turbo boost, frequency scaling
• NUMA-node pinning, thread pinning
👍 for awareness
But it requires expertise and is not trivial
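
Some of the preparation steps named on this slide can at least be checked automatically before a benchmark run. The following is a minimal sketch, not a complete setup: it assumes a Linux machine, the file paths are the usual locations for these settings (the turbo boost file only exists with the `intel_pstate` driver), and treating the `performance` governor as "frequency scaling disabled" is a simplification. The function names are hypothetical. The check only reads the settings and does not change them.

```python
import os
from pathlib import Path

# Assumed locations on a typical Linux system; they may be absent elsewhere.
SETTINGS = {
    "aslr": Path("/proc/sys/kernel/randomize_va_space"),
    "no_turbo": Path("/sys/devices/system/cpu/intel_pstate/no_turbo"),
    "governor": Path("/sys/devices/system/cpu/cpu0/cpufreq/scaling_governor"),
}


def read_setting(path):
    """Return the stripped file content, or None if it cannot be read."""
    try:
        return path.read_text().strip()
    except OSError:
        return None


def preparation_warnings(values):
    """List settings that look unprepared; unknown (None) values are skipped."""
    warnings = []
    if values.get("aslr") not in (None, "0"):
        warnings.append("ASLR is enabled")
    if values.get("no_turbo") not in (None, "1"):
        warnings.append("turbo boost is enabled")
    if values.get("governor") not in (None, "performance"):
        warnings.append("frequency governor is not 'performance'")
    return warnings


def pin_to_cpu(cpu):
    """Pin the current process to one CPU where the OS supports it."""
    if hasattr(os, "sched_setaffinity"):  # Linux only
        os.sched_setaffinity(0, {cpu})
        return True
    return False


def check_machine():
    """Read all known settings and return the resulting warnings."""
    values = {name: read_setting(path) for name, path in SETTINGS.items()}
    return preparation_warnings(values)
```

Calling `check_machine()` at the start of a benchmark script and storing its warnings next to the results is one way to notice an unprepared machine after the fact.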

Slide 26

Slide 26 text

Data Provenance
Did you ever have an issue like:
– Unsure what was measured?
– Mixed up data from experiments?
– Unsure which parameters were used?
👏 yes
❤ no

Slide 27

Slide 27 text

Data Provenance
Out of 21 groups, for some projects:
<50% track it systematically
>60% do not track it

Slide 28

Slide 28 text

Data Provenance
Out of 21 groups, for some projects:
<50% track it systematically
>60% do not track it
Common issues named:
• Comparing wrong data, only noticed by inconsistencies
• Losing track of what’s what
• Parameters/setup details not recorded
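
One lightweight way to avoid unrecorded parameters and setup details is to store them in the same file as the measurements. The sketch below is an illustration only, assuming the benchmark is launched from a Python script inside a git checkout. The function names and the record layout are hypothetical, and the commit is recorded as `None` when git is unavailable.

```python
import json
import platform
import subprocess
import sys
from datetime import datetime, timezone
from pathlib import Path


def current_git_commit():
    """Return the current commit hash, or None outside a usable git checkout."""
    try:
        out = subprocess.run(
            ["git", "rev-parse", "HEAD"],
            capture_output=True, text=True, check=True,
        )
    except (OSError, subprocess.CalledProcessError):
        return None
    return out.stdout.strip()


def provenance_record(parameters):
    """Collect what was measured, where, when, and with which parameters."""
    return {
        "recorded_at": datetime.now(timezone.utc).isoformat(),
        "git_commit": current_git_commit(),
        "host": platform.node(),
        "python": platform.python_version(),
        "command": list(sys.argv),
        "parameters": parameters,
    }


def save_results(path, measurements, parameters):
    """Write measurements and their provenance into a single JSON file."""
    payload = {
        "provenance": provenance_record(parameters),
        "measurements": measurements,
    }
    Path(path).write_text(json.dumps(payload, indent=2))
    return payload
```

Keeping provenance and data in one file means the two cannot be separated by accident when results are copied between machines.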

Slide 29

Slide 29 text

Data Processing
The 21 groups named the following:
>71% Python
>40% Matplotlib
≈40% R
≈33% Spreadsheets
and other things

Slide 30

Slide 30 text

Data Processing
The 21 groups named the following:
>71% Python
>40% Matplotlib
≈40% R
≈33% Spreadsheets
and other things
Concerns
• Too much time spent analyzing data
• Often the same, but no reuse

Slide 31

Slide 31 text

Data Processing
Of the 21 groups,
>88% do something manual
>70% have some things scripted
2 groups automate everything, including generating LaTeX macros
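
Generating LaTeX macros from results means numbers in a paper never have to be copied by hand. The following sketch shows one possible shape of such a script. It is not the approach of any of the interviewed groups, the macro names are hypothetical, and the speedup is computed as the ratio of the two medians, which is only one of several reasonable summaries.

```python
import re
import statistics


def macro(name, value, digits=2):
    """Format one number as a LaTeX \\newcommand definition."""
    # LaTeX control words may only consist of letters.
    if not re.fullmatch(r"[A-Za-z]+", name):
        raise ValueError(f"invalid LaTeX macro name: {name!r}")
    return f"\\newcommand{{\\{name}}}{{{value:.{digits}f}}}"


def macros_for_runtimes(baseline, optimized):
    """Summarize two lists of run times as a block of macro definitions."""
    base_median = statistics.median(baseline)
    opt_median = statistics.median(optimized)
    lines = [
        macro("BaselineMedian", base_median),
        macro("OptimizedMedian", opt_median),
        macro("Speedup", base_median / opt_median),  # ratio of medians
    ]
    return "\n".join(lines) + "\n"
```

The returned text can be written to a file such as `results.tex` (a hypothetical name) and pulled into the paper with `\input`, so that rerunning the experiments updates every number in the text.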

Slide 32

Slide 32 text

STRUGGLES AND BEST PRACTICES

Slide 33

Slide 33 text

Our Struggles
• Finding good benchmarks
• Setting up and maintaining machines, minimizing measurement error
• Tracking data provenance
• Keeping historic data available and useful
• Standard analyses, data processing, and statistics

Slide 34

Slide 34 text

Best Practices
• Use CI/Automated Testing
  – At the very least, check that benchmarks produce correct results
• Use same setup for day-to-day engineering as for producing data for papers
  – The setup is already debugged!
• Most CI systems can store artifacts
  – Basic provenance tracking for results!
• Automate data handling
  – Spreadsheets can import data from external data sources
  – Avoid manually copying data around
• Define workflow that works for your group
  – And teach it!
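
The first practice, checking that benchmarks produce correct results, can be built into the measurement loop itself. The sketch below assumes each benchmark is a Python callable with a known expected result. The names `run_checked_benchmark` and `sum_to_thousand` are hypothetical, and the workload is only a stand-in.

```python
import time


def run_checked_benchmark(benchmark, expected, iterations=5):
    """Time `benchmark` and fail loudly if any iteration returns a wrong result."""
    timings = []
    for i in range(iterations):
        start = time.perf_counter()
        result = benchmark()
        elapsed = time.perf_counter() - start
        # Verify outside the timed region so the check does not skew timings.
        if result != expected:
            raise ValueError(
                f"iteration {i}: expected {expected!r}, got {result!r}"
            )
        timings.append(elapsed)
    return timings


def sum_to_thousand():
    """Stand-in workload; a real benchmark would go here."""
    return sum(range(1000))
```

Running `run_checked_benchmark(sum_to_thousand, expected=499500)` in a CI job then fails the build when a change breaks the benchmark's result, instead of silently timing wrong behavior.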

Slide 35

Slide 35 text

Questions?
Our Struggles
• Finding good benchmarks
• Setting up and maintaining machines, minimizing measurement error
• Tracking data provenance
• Keeping historic data available and useful
• Standard analyses, data processing, and statistics
Best Practices
• Use CI/Automated Testing
• Use same setup for day-to-day engineering as for producing data for papers
• Most CI systems can store artifacts
• Automate data handling
• Define workflow that works for your group