How to Conduct an Experiment

Slide 1

Slide 1 text

How to Conduct an Experiment
Kotaro Hara | Assistant Professor | Singapore Management University
2023/03/21

Slide 2

Slide 2 text

Research Contributions in HCI
Empirical Research Contribution
“Empirical research contributions […] provide new knowledge through findings based on […] experiments, user tests, field observations, interviews, surveys, focus groups, diaries, ethnographies, sensors, log files, and many others.”
Jacob O. Wobbrock and Julie A. Kientz (2016) Research Contributions in Human-Computer Interaction

Slide 3

Slide 3 text

Formative and Summative Research
Formative
• Design of technology
• Exploratory studies to learn about the phenomenon of interest
Summative
• Validation
Kasper Hornbæk (2011) Some Whys and Hows of Experiments in Human-Computer Interaction

Slide 4

Slide 4 text

Experiment and Observational Study
[Figure: study types placed along two axes, formative vs. summative and experimental vs. observational. Labeled examples: ethnography, interview, controlled experiment.]
Howard J. Seltman (2014) Experimental Design and Analysis

Slide 5

Slide 5 text

Research Question, Hypothesis, Intervention, and Causality
Research Question: An interesting question that allows you to make a strong comparison
H0 / H1: When turned into a statement, the question becomes a refutable hypothesis
An experimenter introduces an intervention to separate the conditions
If we observe that the intervention (and only the intervention) introduces the difference in the experimental outcome, we can claim a “causal” influence of the intervention on the outcome
Kasper Hornbæk (2011) Some Whys and Hows of Experiments in Human-Computer Interaction

Slide 6

Slide 6 text

Example
“Can we create an AI-powered tool to help people author audio descriptions?”
• H0 : The tool does not help people author audio descriptions.
• H1 : The tool helps people author audio descriptions.
Slide annotations: Intervention, Comparison

Slide 7

Slide 7 text

Today’s Goals
• Concepts and terminologies around experiments in HCI
• How one could design and conduct an experiment

Slide 8

Slide 8 text

Levels of Measurement, Independent Variables, and Dependent Variables
An experiment deals with objectively measured data.
Levels of measurement: nominal data, ordered data, interval, ratio
An experiment encourages one to ask a refutable question, like: “The change in design (A vs B) affects the usability (e.g., task duration).”
• Independent variable: the design (A vs B)
• Dependent variable: the usability (e.g., task duration)
Howard J. Seltman (2014) Experimental Design and Analysis

Slide 9

Slide 9 text

Causation
Does a change in design cause a change in usability?
• Design A (Control Condition)
• Design B (Treatment Condition)
One recruits participants, asks them to perform tasks, and measures some metrics to infer whether a change in design causes a change in usability
David M. Diez, Christopher D. Barr, and Mine Çetinkaya-Rundel (2015) OpenIntro Statistics, Third Edition
Howard J. Seltman (2014) Experimental Design and Analysis

Slide 10

Slide 10 text

Confounding Factors and Control
[Figure: does design cause usability? Device, OS, user group, and individual are shown as confounding factors.]
Confounding factors obscure the relationship between the independent and dependent variables.
David M. Diez, Christopher D. Barr, and Mine Çetinkaya-Rundel (2015) OpenIntro Statistics, Third Edition
Howard J. Seltman (2014) Experimental Design and Analysis

Slide 11

Slide 11 text

Confounding Factors and Control
[Figure: does design cause usability? Device, OS, user group, and individual are shown as confounding factors.]
• Keep factors constant to reduce the confounding effect (i.e., control)
• Hard-to-control confounding effects should be mitigated through randomization and the use of statistical tools like a random effects model
David M. Diez, Christopher D. Barr, and Mine Çetinkaya-Rundel (2015) OpenIntro Statistics, Third Edition
Howard J. Seltman (2014) Experimental Design and Analysis

Slide 12

Slide 12 text

Between-subjects and Within-subjects Designs
• Between-subjects design: 60 participants are split into a control condition (30 participants) and a treatment condition (30 participants)
• Within-subjects design: all 60 participants take part in both the control condition (60 participants) and the treatment condition (60 participants)
Splitting the participants into two groups at random reduces the confound due to individual differences. This is called randomization.
David M. Diez, Christopher D. Barr, and Mine Çetinkaya-Rundel (2015) OpenIntro Statistics, Third Edition
Howard J. Seltman (2014) Experimental Design and Analysis
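
The random split in a between-subjects design can be sketched in a few lines of Python. This is only an illustration, not the presenter's procedure: the function name `assign_between_subjects`, the participant IDs, and the seed are all hypothetical.

```python
import random


def assign_between_subjects(participant_ids, seed=None):
    """Randomly split participants into control and treatment groups.

    Hypothetical helper for illustration; a seed is accepted only so the
    assignment can be reproduced and audited later.
    """
    rng = random.Random(seed)
    shuffled = list(participant_ids)
    rng.shuffle(shuffled)  # every ordering is equally likely
    half = len(shuffled) // 2
    return {"control": shuffled[:half], "treatment": shuffled[half:]}


# 60 made-up participant IDs, split 30 / 30 as on the slide.
participants = [f"P{i:02d}" for i in range(1, 61)]
groups = assign_between_subjects(participants, seed=42)
```

Because the shuffle ignores every participant attribute, individual differences are spread across the two groups by chance rather than by the experimenter's choice.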

Slide 13

Slide 13 text

Population, Sample, and Sampling Methods
In HCI, a population often refers to the entirety of the user group of interest. In conducting an experiment, you select a sample, a subset of the population (the target user population → a sample of target users). There are different sampling approaches:
• Simple Random Sampling: Recruit participants uniformly at random from the population
• Convenience Sampling: Recruit participants who are easily reachable and match the profile of your target users
David M. Diez, Christopher D. Barr, and Mine Çetinkaya-Rundel (2015) OpenIntro Statistics, Third Edition
Howard J. Seltman (2014) Experimental Design and Analysis
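
As a rough illustration of simple random sampling, the sketch below draws a sample without replacement from a list that stands in for the target user population. The population list, the helper name `simple_random_sample`, and the sample size are assumptions made for the example; in practice a complete list of the population is often unavailable, which is one reason convenience sampling is common.

```python
import random


def simple_random_sample(population, sample_size, seed=None):
    """Draw `sample_size` distinct members uniformly at random.

    Hypothetical helper; assumes the whole population can be enumerated.
    """
    rng = random.Random(seed)
    return rng.sample(list(population), sample_size)


# A made-up population of 500 user IDs and a sample of 20 of them.
population = [f"U{i:03d}" for i in range(1, 501)]
sample = simple_random_sample(population, 20, seed=7)
```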

Slide 14

Slide 14 text

How to Conduct an Experiment
Experimental Design → Data Collection → Statistical Analysis

Slide 15

Slide 15 text

Null Hypothesis and Alternative Hypothesis
H0 vs. H1
Describe your research question in terms of a null hypothesis (H0 ) and an alternative hypothesis (H1 )
For example:
• H0 (Null Hypothesis): Difference in the interface design does not affect the usability of the system
• H1 (Alternative Hypothesis): Difference in the interface design affects the usability of the system
Howard J. Seltman (2014) Experimental Design and Analysis

Slide 16

Slide 16 text

Usability Metrics and Construct Validity
“Usability” is a construct, a concept that cannot be observed directly
Instead, we measure usability metrics that are observable:
• task completion rate
• task duration
• survey responses
The degree to which these metrics accurately represent the intended construct is known as construct validity
Howard J. Seltman (2014) Experimental Design and Analysis

Slide 17

Slide 17 text

Critical Value, Power, Effect Size, and Sample Size
The number of participants to recruit (sample size) is determined through power analysis
Power analysis uses a significance level, power, and effect size to determine the appropriate sample size
Howard J. Seltman (2014) Experimental Design and Analysis
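
To make the relationship between these quantities concrete, here is a minimal sketch of a power analysis for comparing two group means. It relies on the normal approximation n = 2((z_{1-α/2} + z_{1-β}) / d)^2 per group, where d is the standardized effect size (Cohen's d); this is an assumption chosen for simplicity, not the method prescribed in the slides, and dedicated tools based on the t distribution typically return a slightly larger number. The function name and the example values are hypothetical.

```python
import math
from statistics import NormalDist


def sample_size_per_group(effect_size, alpha=0.05, power=0.80):
    """Approximate participants needed per group for a two-sided,
    two-sample comparison of means (normal approximation).

    effect_size: standardized mean difference (Cohen's d), assumed known.
    alpha: significance level; power: desired probability of detecting
    the effect when it exists.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value
    z_power = NormalDist().inv_cdf(power)
    n = 2 * ((z_alpha + z_power) / effect_size) ** 2
    return math.ceil(n)


# A medium effect (d = 0.5) at alpha = 0.05 and power = 0.80.
n_medium = sample_size_per_group(0.5)
# A smaller expected effect needs many more participants.
n_small = sample_size_per_group(0.2)
```

Under these assumptions `n_medium` is 63 per group. Halving the effect size roughly quadruples the required sample, which is why the expected effect size matters so much when planning recruitment.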

Slide 18

Slide 18 text

Inclusion and Exclusion Criteria, Sampling, and External Validity
Decide on the inclusion and exclusion criteria and choose a sampling method
The extent to which the participants represent the population of interest is an important aspect of external validity (i.e., generalizability)

Slide 19

Slide 19 text

Data Collection, Control, and Internal Validity
Decide on how to measure usability metrics (e.g., surveys, interviews, observations, logs)
Control for as many confounding variables as possible.
The degree to which one can claim that the independent variable (e.g., UI design) causally influences the usability metrics is known as internal validity

Slide 20

Slide 20 text

Statistical Test
Based on the design of the experiment, choose a statistical test (e.g., two-sample t-test, ANOVA)
Interpret the result:
• A p-value measures the strength of the evidence against H0 (i.e., $P(\text{data} \mid H_0)$, the probability of observing data at least as extreme as the collected data if H0 is true)
• A confidence interval is a range within which a population parameter is estimated to fall with a specified confidence (e.g., 95%)
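
The idea behind a p-value can be illustrated without any statistics library. The sketch below uses an exact permutation test rather than the t-test or ANOVA named on the slide, because it needs only the standard library; it asks how often a relabeling of the participants produces a mean difference at least as large as the observed one. The task durations are made-up numbers and the function name is hypothetical.

```python
from itertools import combinations
from statistics import mean


def exact_permutation_p_value(group_a, group_b):
    """Two-sided exact permutation test on the difference in means.

    Enumerates every way to relabel the pooled observations into groups
    of the original sizes, so it is only practical for small samples.
    """
    pooled = list(group_a) + list(group_b)
    indices = range(len(pooled))
    observed = abs(mean(group_a) - mean(group_b))
    extreme = 0
    total = 0
    for chosen in combinations(indices, len(group_a)):
        chosen_set = set(chosen)
        a = [pooled[i] for i in chosen]
        b = [pooled[i] for i in indices if i not in chosen_set]
        # Small tolerance guards against floating-point ties.
        if abs(mean(a) - mean(b)) >= observed - 1e-12:
            extreme += 1
        total += 1
    return extreme / total


# Hypothetical task durations in seconds (not real data).
control = [41.0, 38.5, 44.2, 40.1, 39.7]
treatment = [33.9, 35.2, 31.8, 36.4, 34.0]
p_value = exact_permutation_p_value(control, treatment)
```

With these invented numbers every treatment duration is shorter than every control duration, so only 2 of the 252 possible relabelings are as extreme as the observed split and the p-value is 2/252, about 0.008.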

Slide 21

Slide 21 text

Type 1 Error and Type 2 Error
Falsely rejecting H0 is called a Type 1 error. The lower the significance level, the lower the chance of a Type 1 error.
Incorrectly retaining H0 is called a Type 2 error. The Type 2 error rate equals 1 minus power. The higher the power, the lower the chance of a Type 2 error.
H1 is false (no reason to reject H0 ):
• Test does not reject H0 : Good!
• Test rejects H0 : Type 1 Error
H1 is true (a “correct” test result should reject H0 ):
• Test rejects H0 : Good!
• Test does not reject H0 : Type 2 Error
Howard J. Seltman (2014) Experimental Design and Analysis
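
A small simulation can make the two error rates tangible. The sketch below is illustrative only: it assumes normally distributed outcomes with a known standard deviation of 1 and uses a two-sided z-test, and the function name, group sizes, trial count, and seed are arbitrary choices. When the true difference is zero, the rejection rate estimates the Type 1 error rate; when a real difference exists, the rejection rate estimates power, and 1 minus that is the Type 2 error rate.

```python
import random
from statistics import NormalDist


def rejection_rate(true_difference, n_per_group, alpha=0.05,
                   trials=2000, seed=0):
    """Fraction of simulated experiments in which H0 is rejected.

    Both groups are drawn from normal distributions with sigma = 1
    (an assumption), so a z-test with known variance applies.
    """
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    standard_error = (2 / n_per_group) ** 0.5
    rejections = 0
    for _ in range(trials):
        control = [rng.gauss(0.0, 1.0) for _ in range(n_per_group)]
        treatment = [rng.gauss(true_difference, 1.0)
                     for _ in range(n_per_group)]
        diff = sum(treatment) / n_per_group - sum(control) / n_per_group
        if abs(diff / standard_error) > z_crit:
            rejections += 1
    return rejections / trials


# H0 is true (no difference): rejections are Type 1 errors.
type_1_rate = rejection_rate(true_difference=0.0, n_per_group=30)
# H1 is true (difference of 0.5 SD): rejections are correct decisions.
power = rejection_rate(true_difference=0.5, n_per_group=63)
type_2_rate = 1 - power
```

Under these assumptions `type_1_rate` should land near the chosen significance level of 0.05, and `power` near 0.80 for 63 participants per group.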

Slide 22

Slide 22 text

Ethical Considerations
• It is important that you obtain informed consent from participants. In the process, you should provide adequate information about the experiment and the risks involved. If your institution has an IRB office, your consent process should be reviewed by it.
• As a researcher, you have a responsibility to minimize potential physical, psychological, or social harm to participants.
• You are also responsible for protecting the privacy of the participants and maintaining the confidentiality of the data that you collect.
• When working with vulnerable populations, such as children, people with disabilities, illiterate people, etc.

Slide 23

Slide 23 text

Summary
• An experiment is a powerhouse that drives empirical HCI research
• With an experiment, one can argue for a causal relationship between independent and dependent variables
• The presentation introduced concepts and terminologies around experiments
• We also saw a flow of how an experiment is designed and conducted
References
• David M. Diez, Christopher D. Barr, and Mine Çetinkaya-Rundel (2015) OpenIntro Statistics, Third Edition
• Kasper Hornbæk (2011) Some Whys and Hows of Experiments in Human-Computer Interaction
• Howard J. Seltman (2014) Experimental Design and Analysis
• Jacob O. Wobbrock and Julie A. Kientz (2016) Research Contributions in Human-Computer Interaction
Questions?
@kotarohara_en