[…] provide new knowledge through findings based on […] experiments, user tests, field observations, interviews, surveys, focus groups, diaries, ethnographies, sensors, log files, and many others.” Jacob O. Wobbrock and Julie A. Kientz (2016) Research Contributions in Human-Computer Interaction
Experiments are studies to learn about the phenomenon of interest; they can serve formative, summative, or validation purposes. Kasper Hornbæk (2011) Some Whys and Hows of Experiments in Human-Computer Interaction
Research Question: an interesting question that allows you to make a strong comparison. When turned into a statement, it becomes a refutable hypothesis (H0 vs. H1). An experimenter introduces an intervention to separate the conditions. If we observe that the intervention (and only the intervention) introduces the difference in the experimental outcome, we can claim a "causal" influence of the intervention on the outcome. Kasper Hornbæk (2011) Some Whys and Hows of Experiments in Human-Computer Interaction
Example research question: "Does the tool support people in authoring audio descriptions?" • H0: The tool does not support people in authoring audio descriptions. • H1: The tool supports people in authoring audio descriptions. Here the tool is the intervention, and the comparison is between conditions with and without it.
An experiment deals with objectively measured data: nominal, ordered (ordinal), interval, or ratio. It encourages one to ask a refutable question, like: "The change in design (A vs. B, the independent variable) affects the usability (e.g., task duration, the dependent variable)." Howard J. Seltman (2014) Experimental Design and Analysis
Design A (Control Condition) vs. Design B (Treatment Condition): one recruits participants, asks them to perform tasks, and measures some metrics to infer whether the change in design causes a change in usability. David M. Diez, Christopher D. Barr, and Mine Çetinkaya-Rundel (2015) OpenIntro Statistics, Third Edition; Howard J. Seltman (2014) Experimental Design and Analysis
What if the difference comes from individual causes? Confounding factors obscure the relationship between the independent and dependent variables. David M. Diez, Christopher D. Barr, and Mine Çetinkaya-Rundel (2015) OpenIntro Statistics, Third Edition; Howard J. Seltman (2014) Experimental Design and Analysis
Keep factors constant to reduce confounding effects (i.e., control). Hard-to-control confounding effects should be mitigated through randomization and statistical tools like random-effects models (a sketch follows below). David M. Diez, Christopher D. Barr, and Mine Çetinkaya-Rundel (2015) OpenIntro Statistics, Third Edition; Howard J. Seltman (2014) Experimental Design and Analysis
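As an illustrative sketch (not from the slides), a random-effects model can absorb individual differences that are hard to control directly. The data, column names, and effect sizes below are hypothetical; the model is statsmodels' mixed linear model with a random intercept per participant:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical within-subjects data: each participant completes both
# conditions, and each person has their own baseline speed.
rng = np.random.default_rng(0)
rows = []
for i in range(20):
    baseline = rng.normal(60, 8)  # per-participant baseline duration (s)
    rows.append({"participant": f"p{i}", "condition": "A",
                 "duration": baseline + rng.normal(0, 3)})
    rows.append({"participant": f"p{i}", "condition": "B",
                 "duration": baseline - 5 + rng.normal(0, 3)})  # B ~5 s faster
df = pd.DataFrame(rows)

# The random intercept per participant soaks up individual baselines, so the
# fixed effect of "condition" is estimated against within-person variation.
model = smf.mixedlm("duration ~ condition", df, groups=df["participant"])
print(model.fit().summary())
```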
Between-subjects design: each participant experiences one condition (e.g., 60 participants split into a control group of 30 and a treatment group of 30). Within-subjects design: each participant experiences both conditions (e.g., all 60 participants complete both). Splitting the participants into two groups at random reduces the confound due to individual influence. This is called randomization. David M. Diez, Christopher D. Barr, and Mine Çetinkaya-Rundel (2015) OpenIntro Statistics, Third Edition; Howard J. Seltman (2014) Experimental Design and Analysis
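A minimal sketch of randomization for a between-subjects design; the participant IDs and group sizes are illustrative placeholders:

```python
import random

participants = [f"P{i:02d}" for i in range(1, 61)]  # 60 recruited participants

random.seed(42)               # fixed seed so the assignment is reproducible
random.shuffle(participants)  # uniformly random ordering

control = participants[:30]    # Design A (control condition)
treatment = participants[30:]  # Design B (treatment condition)
print(len(control), len(treatment))  # 30 30
```

Shuffling before splitting gives every participant the same chance of landing in either condition, which is what spreads individual differences evenly across the two groups.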
Population, Sample, and Sampling Methods: the population is the user group of interest. In conducting an experiment, you select a sample, a subset of the population. There are different sampling approaches: • Simple Random Sampling: recruit participants uniformly at random from the population • Convenience Sampling: recruit participants who are easily reachable and match the profile of your target users. David M. Diez, Christopher D. Barr, and Mine Çetinkaya-Rundel (2015) OpenIntro Statistics, Third Edition; Howard J. Seltman (2014) Experimental Design and Analysis
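A minimal sketch of simple random sampling, assuming a sampling frame (a list of reachable members of the target population) is available; the frame and its size are hypothetical:

```python
import random

# Hypothetical sampling frame for the target user population.
population_frame = [f"user{i}" for i in range(10_000)]

random.seed(7)
# Simple random sampling: every member of the frame has an equal
# probability of being selected.
sample = random.sample(population_frame, k=100)
print(sample[:5])
```

Convenience sampling, by contrast, takes whoever is easiest to reach; it cannot be expressed as a uniform draw from the full frame, which is why it risks a less representative sample.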
Null Hypothesis and Alternative Hypothesis (H0 vs. H1). An experiment proceeds through Experimental Design, Data Collection, and Statistical Analysis; in the design stage, one states a null hypothesis (H0) and an alternative hypothesis (H1). For example: • H0 (Null Hypothesis): Difference in the interface design does not affect the usability of the system • H1 (Alternative Hypothesis): Difference in the interface design affects the usability of the system. Howard J. Seltman (2014) Experimental Design and Analysis
"Usability" is a construct, a concept that cannot be observed directly. Instead, we measure usability metrics that are observable (e.g., task duration, survey responses). The degree to which these metrics accurately represent the intended construct is known as construct validity. Howard J. Seltman (2014) Experimental Design and Analysis
The number of participants to recruit (sample size) is determined through power analysis. Power analysis uses a significance level, power, and effect size to determine the appropriate sample size, as in the sketch below. Howard J. Seltman (2014) Experimental Design and Analysis
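A minimal sketch of a power analysis using statsmodels; the effect size (Cohen's d = 0.5) is an assumed value that you would justify from pilot data or prior work:

```python
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()  # power analysis for an independent two-sample t-test
n_per_group = analysis.solve_power(
    effect_size=0.5,  # expected standardized mean difference (Cohen's d)
    alpha=0.05,       # significance level (Type I error rate)
    power=0.8,        # desired power (1 - Type II error rate)
)
print(f"Recruit about {n_per_group:.0f} participants per condition")  # ~64
```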
Define the inclusion and exclusion criteria and choose a sampling method. The extent to which the participants represent the population of interest is an important aspect of external validity (i.e., generalizability).
Measure usability metrics (e.g., surveys, interviews, observations, logs). Control for as many confounding variables as possible. The degree to which one can claim that the independent variable (e.g., UI design) causally influences the usability metrics is known as internal validity.
Run a statistical test (e.g., two-sample t-test, ANOVA) and interpret the result: • A p-value measures the strength of the evidence against H0, i.e., the probability of observing data at least as extreme as what was obtained, assuming H0 is true: P(data | H0) • A confidence interval is a range within which a population parameter is estimated to fall with a specified confidence (e.g., 95%). A sketch follows below.
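A minimal sketch of a two-sample t-test with SciPy on hypothetical task-duration data; the means, standard deviations, and sample sizes are made up for illustration:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
control = rng.normal(60, 10, 30)    # Design A task durations (seconds)
treatment = rng.normal(52, 10, 30)  # Design B task durations (seconds)

# Welch's two-sample t-test (does not assume equal variances).
res = stats.ttest_ind(control, treatment, equal_var=False)
print(f"t = {res.statistic:.2f}, p = {res.pvalue:.4f}")

# 95% confidence interval for the difference in means, computed manually
# with the Welch-Satterthwaite degrees of freedom.
n1, n2 = len(control), len(treatment)
v1, v2 = control.var(ddof=1), treatment.var(ddof=1)
diff = control.mean() - treatment.mean()
se = np.sqrt(v1 / n1 + v2 / n2)
dof = se**4 / ((v1 / n1) ** 2 / (n1 - 1) + (v2 / n2) ** 2 / (n2 - 1))
t_crit = stats.t.ppf(0.975, dof)
print(f"95% CI for the mean difference: "
      f"[{diff - t_crit * se:.2f}, {diff + t_crit * se:.2f}]")
```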
Type I Error and Type II Error. Incorrectly rejecting H0 when it is true is called a Type I error; the Type I error rate equals the significance level, so the lower the significance level, the lower the chance of a Type I error. Incorrectly retaining H0 when H1 is true is called a Type II error; the Type II error rate equals 1 minus power, so the higher the power, the lower the chance of a Type II error.

|                         | H1 is false (no reason to reject H0) | H1 is true (a correct test rejects H0) |
| Test does not reject H0 | Good!                                | Type II error                          |
| Test rejects H0         | Type I error                         | Good!                                  |

Howard J. Seltman (2014) Experimental Design and Analysis
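To make the Type I error rate concrete, here is a small simulation (not from the slides): when H0 is true, a test at significance level 0.05 should incorrectly reject H0 in about 5% of repeated experiments:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
alpha, runs, rejections = 0.05, 2000, 0

for _ in range(runs):
    # Both groups come from the SAME distribution, so H0 is true by design.
    a = rng.normal(60, 10, 30)
    b = rng.normal(60, 10, 30)
    if stats.ttest_ind(a, b).pvalue < alpha:
        rejections += 1  # any rejection here is a Type I error

print(f"Empirical Type I error rate: {rejections / runs:.3f}")  # close to 0.05
```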
Obtain informed consent from participants. In the process, you should provide adequate information about the experiment and the risks involved. If your institution has an IRB office, your consent process should be reviewed by it. • As a researcher, you have a responsibility to minimize potential physical, psychological, or social harm to participants. • You also have a responsibility to protect the privacy of the participants and to maintain the confidentiality of the data that you collect. • When working with vulnerable populations, such as children, people with disabilities, and illiterate people, take extra care to protect them.
Summary • Experiments allow one to argue for a causal relationship between independent and dependent variables in HCI research • The presentation introduced concepts and terminology around experiments • We also saw a flow of how an experiment is designed and conducted

References • David M. Diez, Christopher D. Barr, and Mine Çetinkaya-Rundel (2015) OpenIntro Statistics, Third Edition • Kasper Hornbæk (2011) Some Whys and Hows of Experiments in Human-Computer Interaction • Howard J. Seltman (2014) Experimental Design and Analysis • Jacob O. Wobbrock and Julie A. Kientz (2016) Research Contributions in Human-Computer Interaction

Questions? @kotarohara_en