How to Conduct an Experiment

A talk given at the Singapore HCI 2023 Annual Gathering at SMU.

Kotaro Hara

March 21, 2023
Transcript

  1. How to Conduct an Experiment
    Kotaro Hara | Assistant Professor | Singapore Management University
    2023/03/21


  2. Research Contributions in HCI
    Empirical Research Contribution
    “Empirical research contributions […]
    provide new knowledge through findings
    based on […] experiments, user tests,
    field observations, interviews, surveys,
    focus groups, diaries, ethnographies,
    sensors, log files, and many others.”
    Jacob O. Wobbrock and Julie A. Kientz (2016)
    Research Contributions in Human-Computer Interaction


  3. Formative and Summative Research
    Formative:
    • Design of technology
    • Exploratory studies to learn about the phenomenon of interest
    Summative:
    • Validation
    Kasper Hornbæk (2011) Some Whys and Hows of Experiments in Human-Computer Interaction


  4. Experiment and Observational Study
    Studies can be experimental or observational, and formative or summative. Ethnography and interviews are observational and typically formative; the controlled experiment is experimental and typically summative.
    Howard J. Seltman (2014) Experimental Design and Analysis


  5. Research Question, Hypothesis, Intervention, and Causality
    Research Question: an interesting question that allows you to make a strong comparison.
    When turned into a statement, it becomes a refutable hypothesis (H0 / H1).
    An experimenter introduces an intervention to separate the conditions.
    If we observe that the intervention (and only the intervention) introduces the difference in the experimental outcome, we could claim a “causal” influence of the intervention on the outcome.
    Kasper Hornbæk (2011) Some Whys and Hows of Experiments in Human-Computer Interaction


  6. Example
    “Can we create an AI-powered tool to help people author audio descriptions?”
    • H0: The tool does not help people author audio descriptions.
    • H1: The tool helps people author audio descriptions.
    The tool is the intervention; authoring with the tool versus without it provides the comparison.


  7. Today’s Goals
    • Concepts and terminology around experiments in HCI
    • How one could design and conduct an experiment


  8. Levels of Measurement, Independent Variables, and Dependent Variables
    An experiment deals with objectively measured data. Data come at different levels of measurement: nominal data (e.g., A / B), ordered (ordinal) data (e.g., 0 < 1 < 2), interval data, and ratio data.
    This encourages one to ask a refutable question, like: “The change in design (A vs. B) affects the usability (e.g., task duration).” Here, the design is the independent variable and the usability measure is the dependent variable.
    Howard J. Seltman (2014) Experimental Design and Analysis


  9. Causation
    Does a change in design cause a change in usability? Design A is the control condition; Design B is the treatment condition.
    One recruits participants, asks them to perform tasks, and measures some metrics to infer whether the change in design causes a change in usability.
    David M. Diez, Christopher D. Barr, and Mine Çetinkaya-Rundel (2015) OpenIntro Statistics, Third Edition
    Howard J. Seltman (2014) Experimental Design and Analysis


  10. Confounding Factors and Control
    Does design cause usability, or do factors like device, OS, user group, and the individual explain the difference?
    Confounding factors obscure the relationship between the independent and dependent variables.
    David M. Diez, Christopher D. Barr, and Mine Çetinkaya-Rundel (2015) OpenIntro Statistics, Third Edition
    Howard J. Seltman (2014) Experimental Design and Analysis


  11. Confounding Factors and Control
    Keep factors (e.g., device, OS, user group) constant to reduce the confounding effect (i.e., control).
    Hard-to-control confounding effects should be mitigated through randomization and the use of statistical tools like a random effects model, as sketched below.
    David M. Diez, Christopher D. Barr, and Mine Çetinkaya-Rundel (2015) OpenIntro Statistics, Third Edition
    Howard J. Seltman (2014) Experimental Design and Analysis
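    A minimal sketch (not from the talk) of a random effects model in Python with statsmodels; the data, column names, and values are illustrative assumptions:

        import pandas as pd
        import statsmodels.formula.api as smf

        # Hypothetical within-subjects data: each participant performs
        # tasks under both designs
        data = pd.DataFrame({
            "participant": ["p1", "p1", "p2", "p2", "p3", "p3",
                            "p4", "p4", "p5", "p5", "p6", "p6"],
            "design":      ["A", "B", "A", "B", "A", "B",
                            "A", "B", "A", "B", "A", "B"],
            "duration":    [52.0, 45.1, 60.3, 50.2, 48.7, 44.9,
                            55.4, 47.8, 63.1, 52.5, 50.9, 46.0],
        })

        # A random intercept per participant absorbs stable individual
        # differences that would otherwise confound the design comparison
        model = smf.mixedlm("duration ~ design", data, groups=data["participant"])
        print(model.fit().summary())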


  12. Between-subjects and Within-subjects Designs
    • Between-subjects design: 60 participants are split into two groups of 30; one group experiences the control condition and the other the treatment condition.
    • Within-subjects design: all 60 participants experience both the control condition and the treatment condition.
    Splitting the participants into two groups at random reduces the confound due to individual influence. This is called randomization (see the sketch below).
    David M. Diez, Christopher D. Barr, and Mine Çetinkaya-Rundel (2015) OpenIntro Statistics, Third Edition
    Howard J. Seltman (2014) Experimental Design and Analysis
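    A minimal sketch of random assignment for a between-subjects design; the participant IDs are hypothetical:

        import random

        participants = [f"P{i:02d}" for i in range(1, 61)]  # 60 hypothetical participant IDs

        random.shuffle(participants)   # put participants in a random order
        control = participants[:30]    # first half -> control condition
        treatment = participants[30:]  # second half -> treatment condition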


  13. Population, Sample, and Sampling Methods
    In HCI, a population often refers to the entirety of the user group of interest. In conducting an experiment, you select a sample, a subset of the population.
    There are different sampling approaches (illustrated below):
    • Simple Random Sampling: Recruit participants uniformly at random from the population
    • Convenience Sampling: Recruit participants who are easily reachable and match the profile of your target users
    [Figure: a sample of target users drawn from the target user population]
    David M. Diez, Christopher D. Barr, and Mine Çetinkaya-Rundel (2015) OpenIntro Statistics, Third Edition
    Howard J. Seltman (2014) Experimental Design and Analysis
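    A minimal illustration of simple random sampling, assuming a hypothetical pool of candidate users:

        import random

        population = [f"user_{i}" for i in range(1000)]  # hypothetical pool of target users
        sample = random.sample(population, k=30)         # draw 30 users uniformly at random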


  14. How to Conduct an Experiment
    Experimental Design → Data Collection → Statistical Analysis


  15. Null Hypothesis and Alternative Hypothesis
    Describe your research question in terms of a null hypothesis (H0) and an alternative hypothesis (H1). For example:
    • H0 (Null Hypothesis): Difference in the interface design does not affect the usability of the system
    • H1 (Alternative Hypothesis): Difference in the interface design affects the usability of the system
    Howard J. Seltman (2014) Experimental Design and Analysis

  16. Usability Metrics and Construct Validity
    “Usability” is a construct, a concept that cannot be observed directly.
    Instead, we measure usability metrics that are observable, such as task completion rate, task duration, and survey responses.
    The degree to which these metrics accurately represent the intended construct is known as construct validity.
    Howard J. Seltman (2014) Experimental Design and Analysis

  17. Critical Value, Power, Effect Size, and Sample Size
    The number of participants to recruit (the sample size) is determined through power analysis.
    Power analysis uses a significance level, power, and effect size to determine the appropriate sample size, as in the sketch below.
    Howard J. Seltman (2014) Experimental Design and Analysis
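    A minimal power analysis sketch in Python with statsmodels; the effect size, significance level, and power values are illustrative assumptions:

        from statsmodels.stats.power import TTestIndPower

        # Per-group sample size for a two-sample t-test, assuming a medium
        # effect size (Cohen's d = 0.5), alpha = 0.05, and power = 0.8
        n_per_group = TTestIndPower().solve_power(effect_size=0.5, alpha=0.05, power=0.8)
        print(round(n_per_group))  # about 64 participants per condition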

  18. Inclusion and Exclusion Criteria, Sampling, and External Validity
    Decide on the inclusion and exclusion criteria and choose a sampling method.
    The extent to which the participants represent the population of interest is an important aspect of external validity (i.e., generalizability).
    [Figure: does the relationship between X and Y observed in the sample generalize to the population?]

  19. Data Collection, Control, and Internal Validity
    Decide on how to measure usability metrics (e.g., surveys, interviews, observations, logs).
    Control for as many confounding variables as possible. The degree to which one can claim that the independent variable (e.g., UI design) causally influences the usability metrics is known as internal validity.
    [Figure: does X cause Y, or does a confounding variable Z explain the relationship?]

  20. Statistical Test
    Based on the design of the experiment, choose a statistical test (e.g., a two-sample t-test, ANOVA; see the sketch below).
    Interpret the result:
    • A p-value measures the strength of the evidence against H0 (i.e., P(data | H0))
    • A confidence interval is a range within which a population parameter is estimated to fall with a specified confidence (e.g., 95%)
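    A minimal sketch of a two-sample t-test with SciPy; the task durations are made-up data for the two conditions:

        from scipy import stats

        # Hypothetical task durations (seconds) under each condition
        control = [52.1, 48.3, 55.0, 60.2, 49.7, 53.8, 51.4, 57.6]
        treatment = [45.2, 41.8, 47.9, 50.3, 44.0, 46.5, 43.1, 48.8]

        result = stats.ttest_ind(control, treatment)  # two-sample (independent) t-test
        print(result.pvalue)                          # strength of the evidence against H0
        # Recent SciPy versions (1.11+) also expose a confidence interval
        # for the difference in means:
        print(result.confidence_interval(confidence_level=0.95))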

  21. Type I Error and Type II Error
    Falsely rejecting H0 is called a Type 1 error. The lower the significance level, the lower the chance of a Type 1 error.
    Incorrectly retaining H0 is called a Type 2 error. The Type 2 error rate equals 1 minus power: the higher the power, the lower the chance of a Type 2 error.

    |                         | H1 is false (no reason to reject H0) | H1 is true (a “correct” test result should reject H0) |
    | Test does not reject H0 | Good!                                | Type 2 Error                                          |
    | Test rejects H0         | Type 1 Error                         | Good!                                                 |

    Howard J. Seltman (2014) Experimental Design and Analysis
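    To make the link between the significance level and the Type 1 error rate concrete, here is a small simulation (not from the talk): when H0 is true, a test at alpha = 0.05 rejects about 5% of the time.

        import random
        from scipy import stats

        random.seed(0)
        trials, rejections = 2000, 0
        for _ in range(trials):
            # Both groups are drawn from the same distribution, so H0 is true
            a = [random.gauss(0, 1) for _ in range(30)]
            b = [random.gauss(0, 1) for _ in range(30)]
            if stats.ttest_ind(a, b).pvalue < 0.05:
                rejections += 1
        print(rejections / trials)  # approximately 0.05 (the Type 1 error rate)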

  22. Ethical Considerations
    • It is important that you obtain informed consent from participants. In the process, you should provide adequate information about the experiment and the risks involved. If your institution has an IRB office, your consent process should be reviewed by it.
    • As a researcher, you have a responsibility to minimize potential physical, psychological, or social harm to participants.
    • You also have a responsibility to protect the privacy of the participants and to maintain the confidentiality of the data that you collect.
    • Take extra care when working with vulnerable populations, such as children, people with disabilities, or illiterate people.


  23. Summary
    • An experiment is a powerhouse that drives empirical HCI research
    • One could argue for a causal relationship between independent and dependent variables
    • The presentation introduced concepts and terminology around experiments
    • We also saw a flow of how an experiment is designed and conducted

    References
    • David M. Diez, Christopher D. Barr, and Mine Çetinkaya-Rundel (2015) OpenIntro Statistics, Third Edition
    • Kasper Hornbæk (2011) Some Whys and Hows of Experiments in Human-Computer Interaction
    • Howard J. Seltman (2014) Experimental Design and Analysis
    • Jacob O. Wobbrock and Julie A. Kientz (2016) Research Contributions in Human-Computer Interaction

    Questions?
    @kotarohara_en
