How to Conduct an Experiment

A talk given at the Singapore HCI 2023 Annual Gathering at SMU.

Kotaro Hara

March 21, 2023

Transcript

  1. How to Conduct an Experiment
    Kotaro Hara | Assistant Professor | Singapore Management University
    2023/03/21
  2. Research Contributions in HCI
    Empirical Research Contribution
    “Empirical research contributions […] provide new knowledge through findings based on […] experiments, user tests, field observations, interviews, surveys, focus groups, diaries, ethnographies, sensors, log files, and many others.”
    Jacob O. Wobbrock and Julie A. Kientz (2016) Research Contributions in Human-Computer Interaction
  3. Formative and Summative Research
    Formative
    • Design of technology
    • Exploratory studies to learn about the phenomenon of interest
    Summative
    • Validation
    Kasper Hornbæk (2011) Some Whys and Hows of Experiments in Human-Computer Interaction
  4. Experiment and Observational Study
    [Figure: study types arranged along two axes, formative vs. summative and experimental vs. observational; items shown: ethnography, interview, controlled experiment]
    Howard J. Seltman (2014) Experimental Design and Analysis
  5. Research Question, Hypothesis, Intervention, and Causality
    • Research question: an interesting question that allows you to make a strong comparison
    • Hypothesis (H0 / H1): when turned into a statement, the research question becomes a refutable hypothesis
    • Intervention: an experimenter introduces an intervention to separate the conditions
    • Causality: if we observe that the intervention (and only the intervention) introduces the difference in the experimental outcome, we can claim a “causal” influence of the intervention on the outcome
    Kasper Hornbæk (2011) Some Whys and Hows of Experiments in Human-Computer Interaction
  6. Example
    “Can we create an AI-powered tool to help people author audio descriptions?”
    • H0: The tool does not help people author audio descriptions.
    • H1: The tool helps people author audio descriptions.
    (Diagram labels: Intervention, Comparison)
  7. Levels of Measurement, Independent Variables, and Dependent Variables
    [Figure: levels of measurement: nominal data (e.g., A / B), ordered data, interval, and ratio (0, 1, 2)]
    An experiment deals with objectively measured data. It encourages one to ask a refutable question, like: “The change in design (A vs B) affects the usability (e.g., task duration).” Here the design is the independent variable and the usability metric is the dependent variable.
    Howard J. Seltman (2014) Experimental Design and Analysis
  8. Causation
    [Diagram: design → usability (causes?)]
    • Design A (Control Condition)
    • Design B (Treatment Condition)
    One recruits participants, asks them to perform tasks, and measures some metrics to infer whether a change in design causes a change in usability.
    David M. Diez, Christopher D. Barr, and Mine Çetinkaya-Rundel (2015) OpenIntro Statistics, Third Edition
    Howard J. Seltman (2014) Experimental Design and Analysis
  9. Confounding Factors and Control
    [Diagram: design → usability (causes?), with device, OS, user group, and individual as other factors]
    Confounding factors obscure the relationship between the independent and dependent variables.
    David M. Diez, Christopher D. Barr, and Mine Çetinkaya-Rundel (2015) OpenIntro Statistics, Third Edition
    Howard J. Seltman (2014) Experimental Design and Analysis
  10. Confounding Factors and Control
    [Diagram: design → usability (causes?), with device, OS, user group, and individual as other factors]
    • Keep factors constant to reduce confounding effects (i.e., control)
    • Hard-to-control confounding effects should be mitigated through randomization and statistical tools such as random-effects models
    David M. Diez, Christopher D. Barr, and Mine Çetinkaya-Rundel (2015) OpenIntro Statistics, Third Edition
    Howard J. Seltman (2014) Experimental Design and Analysis
  11. Between-subjects and Within-subjects Designs
    • Between-subjects design: 60 participants are split into a control condition (30 participants) and a treatment condition (30 participants)
    • Within-subjects design: all 60 participants take part in both the control condition and the treatment condition
    Splitting the participants into two groups at random reduces the confound due to individual differences. This is called randomization.
    David M. Diez, Christopher D. Barr, and Mine Çetinkaya-Rundel (2015) OpenIntro Statistics, Third Edition
    Howard J. Seltman (2014) Experimental Design and Analysis
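As an illustration of randomization in a between-subjects design, the sketch below shuffles a list of participants and splits it in half. The participant IDs, the function name `assign_between_subjects`, and the fixed seed are assumptions made for this example; they are not part of the talk.

```python
import random


def assign_between_subjects(participant_ids, seed=None):
    """Randomly split participants into control and treatment groups.

    Hypothetical helper for illustration only.
    """
    rng = random.Random(seed)          # seeded so the assignment can be reproduced
    shuffled = list(participant_ids)   # copy so the caller's list is untouched
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return {"control": shuffled[:half], "treatment": shuffled[half:]}


# Hypothetical participant IDs P01..P60, matching the 60 participants on the slide
participants = [f"P{i:02d}" for i in range(1, 61)]
groups = assign_between_subjects(participants, seed=42)
print(len(groups["control"]), len(groups["treatment"]))  # 30 30
```

Because the split depends only on the shuffle, individual characteristics such as prior experience are spread across both conditions by chance rather than by the experimenter's choice.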
  12. Population, Sample, and Sampling Methods
    In HCI, a population often refers to the entirety of the user group of interest. In conducting an experiment, you select a sample, a subset of the population. There are different sampling approaches:
    • Simple Random Sampling: Recruit participants uniformly at random from the population
    • Convenience Sampling: Recruit participants who are easily reachable and match the profile of your target users
    [Diagram: target user population, and a sample of target users drawn from it]
    David M. Diez, Christopher D. Barr, and Mine Çetinkaya-Rundel (2015) OpenIntro Statistics, Third Edition
    Howard J. Seltman (2014) Experimental Design and Analysis
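Simple random sampling can be sketched in a few lines. The example assumes you hold a complete list of the target population (here a made-up list of 1,000 user IDs), which is often not the case in practice and is one reason convenience sampling is common.

```python
import random


def simple_random_sample(population, k, seed=None):
    """Draw k distinct members uniformly at random from the population."""
    rng = random.Random(seed)
    return rng.sample(population, k)


# Hypothetical population of 1,000 target users
population = [f"user{i}" for i in range(1000)]
sample = simple_random_sample(population, 60, seed=7)
print(len(sample), len(set(sample)))  # 60 60
```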
  13. Null Hypothesis and Alternative Hypothesis
    Describe your research question in terms of a null hypothesis (H0) and an alternative hypothesis (H1). For example:
    • H0 (Null Hypothesis): Difference in the interface design does not affect the usability of the system
    • H1 (Alternative Hypothesis): Difference in the interface design affects the usability of the system
    Howard J. Seltman (2014) Experimental Design and Analysis
    Experimental Design → Data Collection → Statistical Analysis
  14. Usability Metrics and Construct Validity
    [Diagram: design → usability, measured through task completion rate, task duration, and survey responses]
    • “Usability” is a construct, a concept that cannot be observed directly
    • Instead, we measure usability metrics that are observable
    • The degree to which these metrics accurately represent the intended construct is known as construct validity
    Howard J. Seltman (2014) Experimental Design and Analysis
  15. Critical Value, Power, Effect Size, and Sample Size
    • The number of participants to recruit (sample size) is determined through power analysis
    • Power analysis uses a significance level, power, and effect size to determine the appropriate sample size
    Howard J. Seltman (2014) Experimental Design and Analysis
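To make power analysis concrete, here is a rough sketch for one specific case: a two-sided comparison of two independent group means, using the normal approximation n = 2((z_{1-α/2} + z_{power}) / d)^2 per group, where d is the standardized effect size (Cohen's d). The choice of this formula, the function name, and the default values are assumptions for illustration; dedicated tools that use the t distribution typically give a slightly larger n.

```python
import math
from statistics import NormalDist


def sample_size_per_group(effect_size, alpha=0.05, power=0.80):
    """Approximate participants needed per group (normal approximation).

    effect_size: standardized mean difference (Cohen's d), must be > 0.
    """
    if effect_size <= 0:
        raise ValueError("effect_size must be positive")
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_power = std_normal.inv_cdf(power)
    n = 2 * ((z_alpha + z_power) / effect_size) ** 2
    return math.ceil(n)


print(sample_size_per_group(0.5))  # 63 per group for a medium effect
print(sample_size_per_group(0.8))  # 25 per group for a large effect
```

Note how quickly the required sample grows as the expected effect shrinks: halving the effect size roughly quadruples the number of participants.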
  16. Inclusion and Exclusion Criteria, Sampling, and External Validity
    • Decide on the inclusion and exclusion criteria and choose a sampling method
    • The extent to which the participants represent the population of interest is an important aspect of external validity (i.e., generalizability)
    [Diagram labels: X, Y, Y’, ?]
  17. Data Collection, Control, and Internal Validity
    • Decide on how to measure usability metrics (e.g., surveys, interviews, observations, logs)
    • Control for as many confounding variables as possible
    • The degree to which one can claim that the independent variable (e.g., UI design) causally influences the usability metrics is known as internal validity
    [Diagram labels: X, Y, Z, ?]
  18. Statistical Test
    Based on the design of the experiment, choose a statistical test (e.g., two-sample t-test, ANOVA). Interpret the result:
    • A p-value measures the strength of the evidence against H0 (i.e., P(data | H0), the probability of observing data at least as extreme as yours if H0 is true)
    • A confidence interval is a range within which a population parameter is estimated to fall with a specified confidence (e.g., 95%)
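The idea of a p-value as P(data | H0) can be shown without any statistics library. The sketch below uses an exact permutation test rather than the t-test named on the slide: under H0 the condition labels are interchangeable, so it enumerates every way of relabeling the participants and counts how often the difference in means is at least as large as the one observed. The task durations are made-up numbers, and in practice one would more likely call an established routine such as `scipy.stats.ttest_ind`.

```python
from itertools import combinations
from statistics import mean


def permutation_test(a, b):
    """Two-sided exact permutation test on the difference in group means."""
    observed = abs(mean(a) - mean(b))
    pooled = list(a) + list(b)
    total = sum(pooled)
    n_a, n_b = len(a), len(b)
    as_extreme = 0
    n_splits = 0
    # Every way of choosing which n_a observations are labeled "group A"
    for idx in combinations(range(len(pooled)), n_a):
        sum_a = sum(pooled[i] for i in idx)
        diff = abs(sum_a / n_a - (total - sum_a) / n_b)
        n_splits += 1
        if diff >= observed - 1e-12:  # small tolerance for float rounding
            as_extreme += 1
    return as_extreme / n_splits


# Hypothetical task durations in seconds for two designs
design_a = [30, 32, 35, 31, 33, 34]
design_b = [25, 27, 26, 28, 24, 29]
p_value = permutation_test(design_a, design_b)
print(round(p_value, 4))  # 0.0022
```

Only 2 of the 924 possible relabelings produce a difference as large as the observed one, so the data would be very surprising if the design had no effect.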
  19. Type I Error and Type II Error
    • Falsely rejecting H0 is called a Type 1 error. The lower the significance level, the lower the chance of a Type 1 error.
    • Incorrectly retaining H0 is called a Type 2 error. The Type 2 error rate equals 1 minus power. The higher the power, the lower the chance of a Type 2 error.
    Possible outcomes of a test:
    • H1 is false (no reason to reject H0): if the test does not reject H0, good! If the test rejects H0, Type 1 error.
    • H1 is true (a “correct” test result should reject H0): if the test does not reject H0, Type 2 error. If the test rejects H0, good!
    Howard J. Seltman (2014) Experimental Design and Analysis
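Both error rates can be observed in simulation. The sketch below repeatedly runs a simulated two-group experiment with normally distributed scores and a known standard deviation of 1, applies a two-sided z-test, and counts how often H0 is rejected. All settings (63 participants per group, a true effect of 0.5, 2,000 simulated experiments, the seed) are assumptions chosen for illustration.

```python
import random
from statistics import NormalDist


def rejection_rate(true_effect, n_per_group, alpha=0.05, n_sims=2000, seed=0):
    """Fraction of simulated experiments in which a two-sided z-test rejects H0."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    std_err = (2 / n_per_group) ** 0.5  # SE of the mean difference when sigma = 1
    rejections = 0
    for _ in range(n_sims):
        control = [rng.gauss(0.0, 1.0) for _ in range(n_per_group)]
        treatment = [rng.gauss(true_effect, 1.0) for _ in range(n_per_group)]
        diff = sum(treatment) / n_per_group - sum(control) / n_per_group
        if abs(diff / std_err) > z_crit:
            rejections += 1
    return rejections / n_sims


# H0 is true (no effect): every rejection is a Type 1 error
type1_rate = rejection_rate(true_effect=0.0, n_per_group=63)
# H1 is true (effect of 0.5 SD): every rejection is a correct detection
power = rejection_rate(true_effect=0.5, n_per_group=63)
print(f"Type 1 error rate: {type1_rate:.3f}")
print(f"Power: {power:.3f}, Type 2 error rate: {1 - power:.3f}")
```

With these settings the Type 1 error rate should land near the significance level of 0.05 and the power near 0.8, so the Type 2 error rate is roughly 0.2.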
  20. Ethical Considerations
    • It is important that you obtain informed consent from participants. In the process, you should provide adequate information about the experiment and the risks involved. If your institution has an IRB office, your consent process should be reviewed by it.
    • As a researcher, you have a responsibility to minimize potential physical, psychological, or social harm to participants.
    • You are also responsible for protecting the privacy of the participants and maintaining the confidentiality of the data that you collect.
    • When working with vulnerable populations, such as children, people with disabilities, illiterate people, etc.
  21. Summary
    • An experiment is a powerhouse that drives empirical HCI research
    • One could argue for a causal relationship between independent and dependent variables
    • The presentation introduced concepts and terminology around experiments
    • We also saw a flow of how an experiment is designed and conducted
    References
    • David M. Diez, Christopher D. Barr, and Mine Çetinkaya-Rundel (2015) OpenIntro Statistics, Third Edition
    • Kasper Hornbæk (2011) Some Whys and Hows of Experiments in Human-Computer Interaction
    • Howard J. Seltman (2014) Experimental Design and Analysis
    • Jacob O. Wobbrock and Julie A. Kientz (2016) Research Contributions in Human-Computer Interaction
    Questions? @kotarohara_en