[…] provide new knowledge through findings based on […] experiments, user tests, field observations, interviews, surveys, focus groups, diaries, ethnographies, sensors, log files, and many others.” Jacob O. Wobbrock and Julie A. Kientz (2016) Research Contributions in Human-Computer Interaction
Experiments are studies to learn about the phenomenon of interest; they can serve formative, summative, or validation purposes. Kasper Hornbæk (2011) Some Whys and Hows of Experiments in Human-Computer Interaction
Research Question: an interesting question that allows you to make a strong comparison. When turned into a statement, it becomes a refutable hypothesis (H0 vs. H1). An experimenter introduces an intervention to separate the conditions. If we observe that the intervention (and only the intervention) introduces the difference in the experimental outcome, we can claim a "causal" influence of the intervention on the outcome. Kasper Hornbæk (2011) Some Whys and Hows of Experiments in Human-Computer Interaction
Example research question: "Does the tool support people in authoring audio descriptions?" • H0: The tool does not support people in authoring audio descriptions. • H1: The tool supports people in authoring audio descriptions. Here the tool is the intervention, and the comparison is between conditions with and without it.
An experiment deals with objectively measured data: nominal, ordered (ordinal), interval, or ratio. It encourages one to ask a refutable question, like: "The change in design (A vs. B, the independent variable) affects the usability (e.g., task duration, the dependent variable)." Howard J. Seltman (2014) Experimental Design and Analysis
Design A (Control Condition) vs. Design B (Treatment Condition): one recruits participants, asks them to perform tasks, and measures some metrics to infer whether the change in design causes a change in usability. David M. Diez, Christopher D. Barr, and Mine Çetinkaya-Rundel (2015) OpenIntro Statistics, Third Edition; Howard J. Seltman (2014) Experimental Design and Analysis
What if the difference comes from individual causes? Confounding factors obscure the relationship between the independent and dependent variables. David M. Diez, Christopher D. Barr, and Mine Çetinkaya-Rundel (2015) OpenIntro Statistics, Third Edition; Howard J. Seltman (2014) Experimental Design and Analysis
Keep factors constant to reduce confounding effects (i.e., control). Hard-to-control confounding effects should be mitigated through randomization and statistical tools like random-effects models (a sketch follows below). David M. Diez, Christopher D. Barr, and Mine Çetinkaya-Rundel (2015) OpenIntro Statistics, Third Edition; Howard J. Seltman (2014) Experimental Design and Analysis
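As an illustrative sketch (not from the slides), a random-effects model can absorb individual differences that are hard to control directly. The data, column names, and effect sizes below are hypothetical; the model is statsmodels' mixed linear model with a random intercept per participant:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical within-subjects data: each participant completes both
# conditions, and each person has their own baseline speed.
rng = np.random.default_rng(0)
rows = []
for i in range(20):
    baseline = rng.normal(60, 8)  # per-participant baseline duration (s)
    rows.append({"participant": f"p{i}", "condition": "A",
                 "duration": baseline + rng.normal(0, 3)})
    rows.append({"participant": f"p{i}", "condition": "B",
                 "duration": baseline - 5 + rng.normal(0, 3)})  # B ~5 s faster
df = pd.DataFrame(rows)

# The random intercept per participant soaks up individual baselines, so the
# fixed effect of "condition" is estimated against within-person variation.
model = smf.mixedlm("duration ~ condition", df, groups=df["participant"])
print(model.fit().summary())
```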
Between-subjects design: each participant experiences one condition (e.g., 60 participants split into a control group of 30 and a treatment group of 30). Within-subjects design: each participant experiences both conditions (e.g., all 60 participants complete both). Splitting the participants into two groups at random reduces the confound due to individual influence. This is called randomization. David M. Diez, Christopher D. Barr, and Mine Çetinkaya-Rundel (2015) OpenIntro Statistics, Third Edition; Howard J. Seltman (2014) Experimental Design and Analysis
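A minimal sketch of randomization for a between-subjects design; the participant IDs and group sizes are illustrative placeholders:

```python
import random

participants = [f"P{i:02d}" for i in range(1, 61)]  # 60 recruited participants

random.seed(42)               # fixed seed so the assignment is reproducible
random.shuffle(participants)  # uniformly random ordering

control = participants[:30]    # Design A (control condition)
treatment = participants[30:]  # Design B (treatment condition)
print(len(control), len(treatment))  # 30 30
```

Shuffling before splitting gives every participant the same chance of landing in either condition, which is what spreads individual differences evenly across the two groups.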
Population, Sample, and Sampling Methods: the population is the user group of interest. In conducting an experiment, you select a sample, a subset of the population. There are different sampling approaches: • Simple Random Sampling: recruit participants uniformly at random from the population • Convenience Sampling: recruit participants who are easily reachable and match the profile of your target users. David M. Diez, Christopher D. Barr, and Mine Çetinkaya-Rundel (2015) OpenIntro Statistics, Third Edition; Howard J. Seltman (2014) Experimental Design and Analysis
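A minimal sketch of simple random sampling, assuming a sampling frame (a list of reachable members of the target population) is available; the frame and its size are hypothetical:

```python
import random

# Hypothetical sampling frame for the target user population.
population_frame = [f"user{i}" for i in range(10_000)]

random.seed(7)
# Simple random sampling: every member of the frame has an equal
# probability of being selected.
sample = random.sample(population_frame, k=100)
print(sample[:5])
```

Convenience sampling, by contrast, takes whoever is easiest to reach; it cannot be expressed as a uniform draw from the full frame, which is why it risks a less representative sample.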
Null Hypothesis and Alternative Hypothesis (H0 vs. H1). An experiment proceeds through Experimental Design, Data Collection, and Statistical Analysis; in the design stage, one states a null hypothesis (H0) and an alternative hypothesis (H1). For example: • H0 (Null Hypothesis): Difference in the interface design does not affect the usability of the system • H1 (Alternative Hypothesis): Difference in the interface design affects the usability of the system. Howard J. Seltman (2014) Experimental Design and Analysis
"Usability" is a construct, a concept that cannot be observed directly. Instead, we measure usability metrics that are observable (e.g., task duration, survey responses). The degree to which these metrics accurately represent the intended construct is known as construct validity. Howard J. Seltman (2014) Experimental Design and Analysis
The number of participants to recruit (sample size) is determined through power analysis. Power analysis uses a significance level, power, and effect size to determine the appropriate sample size, as in the sketch below. Howard J. Seltman (2014) Experimental Design and Analysis
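A minimal sketch of a power analysis using statsmodels; the effect size (Cohen's d = 0.5) is an assumed value that you would justify from pilot data or prior work:

```python
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()  # power analysis for an independent two-sample t-test
n_per_group = analysis.solve_power(
    effect_size=0.5,  # expected standardized mean difference (Cohen's d)
    alpha=0.05,       # significance level (Type I error rate)
    power=0.8,        # desired power (1 - Type II error rate)
)
print(f"Recruit about {n_per_group:.0f} participants per condition")  # ~64
```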
Define the inclusion and exclusion criteria and choose a sampling method. The extent to which the participants represent the population of interest is an important aspect of external validity (i.e., generalizability).
Measure usability metrics (e.g., surveys, interviews, observations, logs). Control for as many confounding variables as possible. The degree to which one can claim that the independent variable (e.g., UI design) causally influences the usability metrics is known as internal validity.
Run a statistical test (e.g., two-sample t-test, ANOVA) and interpret the result: • A p-value measures the strength of the evidence against H0, i.e., the probability of observing data at least as extreme as what was obtained, assuming H0 is true: P(data | H0) • A confidence interval is a range within which a population parameter is estimated to fall with a specified confidence (e.g., 95%). A sketch follows below.
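A minimal sketch of a two-sample t-test with SciPy on hypothetical task-duration data; the means, standard deviations, and sample sizes are made up for illustration:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
control = rng.normal(60, 10, 30)    # Design A task durations (seconds)
treatment = rng.normal(52, 10, 30)  # Design B task durations (seconds)

# Welch's two-sample t-test (does not assume equal variances).
res = stats.ttest_ind(control, treatment, equal_var=False)
print(f"t = {res.statistic:.2f}, p = {res.pvalue:.4f}")

# 95% confidence interval for the difference in means, computed manually
# with the Welch-Satterthwaite degrees of freedom.
n1, n2 = len(control), len(treatment)
v1, v2 = control.var(ddof=1), treatment.var(ddof=1)
diff = control.mean() - treatment.mean()
se = np.sqrt(v1 / n1 + v2 / n2)
dof = se**4 / ((v1 / n1) ** 2 / (n1 - 1) + (v2 / n2) ** 2 / (n2 - 1))
t_crit = stats.t.ppf(0.975, dof)
print(f"95% CI for the mean difference: "
      f"[{diff - t_crit * se:.2f}, {diff + t_crit * se:.2f}]")
```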
Type I Error and Type II Error. Incorrectly rejecting H0 when it is true is called a Type I error; the Type I error rate equals the significance level, so the lower the significance level, the lower the chance of a Type I error. Incorrectly retaining H0 when H1 is true is called a Type II error; the Type II error rate equals 1 minus power, so the higher the power, the lower the chance of a Type II error.

|                         | H1 is false (no reason to reject H0) | H1 is true (a correct test rejects H0) |
| Test does not reject H0 | Good!                                | Type II error                          |
| Test rejects H0         | Type I error                         | Good!                                  |

Howard J. Seltman (2014) Experimental Design and Analysis
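To make the Type I error rate concrete, here is a small simulation (not from the slides): when H0 is true, a test at significance level 0.05 should incorrectly reject H0 in about 5% of repeated experiments:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
alpha, runs, rejections = 0.05, 2000, 0

for _ in range(runs):
    # Both groups come from the SAME distribution, so H0 is true by design.
    a = rng.normal(60, 10, 30)
    b = rng.normal(60, 10, 30)
    if stats.ttest_ind(a, b).pvalue < alpha:
        rejections += 1  # any rejection here is a Type I error

print(f"Empirical Type I error rate: {rejections / runs:.3f}")  # close to 0.05
```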
Obtain informed consent from participants. In the process, you should provide adequate information about the experiment and the risks involved. If your institution has an IRB office, your consent process should be reviewed by it. • As a researcher, you have a responsibility to minimize potential physical, psychological, or social harm to participants. • You also have a responsibility to protect the privacy of the participants and to maintain the confidentiality of the data that you collect. • When working with vulnerable populations, such as children, people with disabilities, and illiterate people, take extra care to protect them.
Summary • Experiments allow one to argue for a causal relationship between independent and dependent variables in HCI research • The presentation introduced concepts and terminology around experiments • We also saw a flow of how an experiment is designed and conducted

References • David M. Diez, Christopher D. Barr, and Mine Çetinkaya-Rundel (2015) OpenIntro Statistics, Third Edition • Kasper Hornbæk (2011) Some Whys and Hows of Experiments in Human-Computer Interaction • Howard J. Seltman (2014) Experimental Design and Analysis • Jacob O. Wobbrock and Julie A. Kientz (2016) Research Contributions in Human-Computer Interaction

Questions? @kotarohara_en