
What does Fairness in Information Access Mean and Can We Achieve It?

wing.nus
March 13, 2023

Abstract: Bias in data and a lack of transparency and fairness in algorithms are not new problems, but with increasing scale, complexity, and adoption, most AI systems now suffer from these issues at an unprecedented level. Information access systems are not spared, since almost all large-scale information access today is mediated by algorithms. These algorithms are optimized not only for relevance, which is subjective to begin with, but also for measures of engagement and impressions. They pick up signals of what may be 'good' from individuals and perpetuate them through learning methods that are opaque and hard to debug. Considering 'fairness' and introducing more transparency can help, but they can also backfire or create other issues. We also need to understand how and why users of these systems engage with content. In this talk, I will share some of our attempts to bring fairness to ranking systems and then discuss why the solutions are not that simple.

Speaker Bio: Dr. Chirag Shah is a Professor in the Information School, an Adjunct Professor in the Paul G. Allen School of Computer Science & Engineering, and an Adjunct Professor in Human Centered Design & Engineering (HCDE) at the University of Washington (UW). He is the Founding Director of the InfoSeeking Lab and the Founding Co-Director of RAISE, a Center for Responsible AI. His research revolves around intelligent systems. He received his PhD in Information Science from the University of North Carolina (UNC) at Chapel Hill.


Transcript

  1. What does Fairness in Information Access Mean and Can We Achieve It?
    Chirag Shah
    @chirag_shah


  2. We live in a biased world
    • We are biased.
    • Any dataset can be biased.
    • Any model can be biased.
    • No data is a perfect representation of the world; tradeoffs are made during data collection, storage, and analysis.


  3. Fairness = Lack of Bias?
    • Bias is not always bad.
    • Three definitions of fairness (the first two are sketched in code after this list):
    • Statistical parity
    • Disparate impact
    • Disparate treatment
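
    To make the first two notions concrete, here is a minimal sketch (not from the talk; all names and data are hypothetical) that checks them on a ranked list whose documents fall into topical clusters. Disparate treatment concerns the decision process itself, whether the protected attribute is used at all, so it cannot be checked from the output ranking alone.

```python
# Hypothetical sketch (not from the talk): checking two fairness notions
# on a ranked list whose documents belong to topical clusters.
from collections import Counter

def cluster_shares(ranking, cluster_of):
    """Fraction of the ranked list occupied by each cluster."""
    counts = Counter(cluster_of[doc] for doc in ranking)
    return {c: counts[c] / len(ranking) for c in counts}

def statistical_parity(ranking, cluster_of, tol=0.05):
    """Every cluster gets (roughly) an equal share of the list."""
    shares = cluster_shares(ranking, cluster_of)
    target = 1 / len(shares)
    return all(abs(s - target) <= tol for s in shares.values())

def disparate_impact(ranking, cluster_of, population_share, tol=0.05):
    """Each cluster's share of the list tracks its share of the
    underlying collection (e.g. 70%/30%) rather than being equal."""
    shares = cluster_shares(ranking, cluster_of)
    return all(abs(shares.get(c, 0.0) - p) <= tol
               for c, p in population_share.items())

# A top-10 list where 7 documents come from cluster "A", 3 from "B".
cluster_of = {f"d{i}": ("A" if i < 7 else "B") for i in range(10)}
top10 = [f"d{i}" for i in range(10)]
print(statistical_parity(top10, cluster_of))                      # False
print(disparate_impact(top10, cluster_of, {"A": 0.7, "B": 0.3}))  # True
```

    The same list can satisfy one definition and fail the other, which is why the choice of definition matters in the re-ranking work that follows.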


  4. Addressing fairness through diversity
    ● Took a sliver of search data (queries, top results).
    ● Clustered the results and quantified the amount of topical bias.
    ● Designed new algorithms to re-rank those results to produce a fairer ranking.
    ● Two forms of fairness:
    ○ Statistical parity
    ○ Disparate impact
    Ruoyuan Gao (Amazon)
    Gao, R. & Shah, C. (2020). Toward Creating a Fairer Ranking in Search Engine Results. Information Processing & Management (IP&M), 57(1).
    Gao, R. & Shah, C. (2019). How Fair Can We Go: Detecting the Boundaries of Fairness Optimization in Information Retrieval. In Proceedings of the ACM International Conference on Theory of Information Retrieval (ICTIR), pp. 229-236. October 2-5, 2019. Santa Clara, CA, USA.
    Gao, R., Ge, Y., & Shah, C. (2022). FAIR: Fairness-Aware Information Retrieval Evaluation. Journal of the Association for Information Science and Technology (JASIST).


  5. Datasets
    ● Google
    ○ From Google Trends (June 23-June 29, 2019)
    ○ 100 queries
    ○ Top 100 results per query
    ● New York Times
    ○ 1.8M articles published by NYT
    ○ 50 queries
    ○ Top 100 results per query
    ● Clustering with two subtopics per query (a sketch of quantifying the resulting topical bias follows)
    Example queries: madden shooting, hurricane lane update, jacksonville shooting video, shanann watts, holy fire update, fortnite galaxy skin, new deadly spider, stolen plane

  6. Creating a Fair Top-10 List from Top-100


  7. Creating a Fair Top-10 List from Top-100


  8. Statistical parity
    (Figure: the two clusters receive equal shares of the top-10.)


  9. Disparate impact
    (Figure: the two clusters receive 70% / 30% shares of the top-10, matching their shares in the collection.)
    Problem: we are disregarding relevance.


  10. Top-Top with Statistical Parity
    (Figure: equal shares, drawn from the top of each cluster.)


  11. Top-Top with Disparate Impact
    (Figure: 70% / 30% shares, drawn from the top of each cluster.)
    Problem: subsequent documents may not have as much novelty.


  12. Page-wise with Statistical Parity
    (Figure: each result page filled with equal shares from the two clusters.)


  13. Page-wise with Disparate Impact
    (Figure: each result page filled with 70% / 30% shares from the two clusters; a sketch of page-wise filling follows.)
    Problem: we are not getting enough diversity by sampling from the tops.
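
    A minimal sketch of the page-wise idea, under my assumption (not spelled out on the slides) that each 10-result page is filled from per-cluster rankings in a target proportion: 50/50 for statistical parity, 70/30 for disparate impact.

```python
# Hypothetical sketch of the page-wise strategy: fill each result page
# from per-cluster rankings in a target proportion. Targets of 0.5/0.5
# give statistical parity; 0.7/0.3 gives disparate impact.
def page_wise_page(cluster_rankings, proportions, start, page_size=10):
    """Build one page; `start` is how many results earlier pages used."""
    page = []
    for c, p in proportions.items():
        quota = round(p * page_size)
        offset = round(p * start)  # this cluster's contribution so far
        page.extend(cluster_rankings[c][offset:offset + quota])
    return page

clusters = {"A": [f"a{i}" for i in range(50)], "B": [f"b{i}" for i in range(50)]}
print(page_wise_page(clusters, {"A": 0.5, "B": 0.5}, start=0))   # 5 + 5
print(page_wise_page(clusters, {"A": 0.7, "B": 0.3}, start=0))   # 7 + 3
print(page_wise_page(clusters, {"A": 0.7, "B": 0.3}, start=10))  # next page
```

    Because every page samples from the top of each cluster, later pages keep repeating near-duplicates of the head, which is the diversity problem the slide points out.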


  14. ε-greedy
    ● Explore the results with probability ε, exploit with probability 1-ε.
    ● ε = 0.0 → no exploration
    ● ε = 1.0 → full exploration (randomness)
    ● Non-fair (naïve) ε-greedy:
    ○ with probability ε, randomly select from the entire rank-list (100)
    ○ with probability 1-ε, pick from the top
    ● Fair ε-greedy:
    ○ with probability ε, randomly select a cluster, then pick the top result from that cluster
    ○ with probability 1-ε, pick the "fair" cluster, then pick the top result from that cluster
    ● Works with either notion: statistical parity | disparate impact (see the sketch below)
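
    The fair ε-greedy procedure above can be sketched as follows. One detail the slide leaves open is how the "fair" cluster is chosen; this sketch assumes it is the cluster currently furthest below its target share (targets of 0.5/0.5 encode statistical parity; collection proportions such as 0.7/0.3 encode disparate impact).

```python
# Sketch of fair epsilon-greedy as described on the slide. The rule for
# picking the "fair" cluster is an assumption: the cluster currently
# furthest below its target share of the output list.
import random

def fair_epsilon_greedy(cluster_rankings, targets, k=10, epsilon=0.3, seed=0):
    rng = random.Random(seed)
    cursors = {c: 0 for c in cluster_rankings}  # next unused doc per cluster
    counts = {c: 0 for c in cluster_rankings}   # docs emitted per cluster
    out = []
    while len(out) < k:
        available = [c for c in cluster_rankings
                     if cursors[c] < len(cluster_rankings[c])]
        if not available:
            break
        if rng.random() < epsilon:
            # explore: random cluster, then its top remaining result
            c = rng.choice(available)
        else:
            # exploit: the most "owed" cluster, then its top remaining result
            c = max(available,
                    key=lambda cl: targets[cl] - counts[cl] / max(len(out), 1))
        out.append(cluster_rankings[c][cursors[c]])
        cursors[c] += 1
        counts[c] += 1
    return out

clusters = {"A": [f"a{i}" for i in range(50)], "B": [f"b{i}" for i in range(50)]}
print(fair_epsilon_greedy(clusters, {"A": 0.5, "B": 0.5}))  # statistical parity
print(fair_epsilon_greedy(clusters, {"A": 0.7, "B": 0.3}))  # disparate impact
```

    Setting epsilon to 0.0 or 1.0 reproduces the two extremes on the slide: pure exploitation of the fair cluster versus fully random cluster choice.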


  15. Implications: text search


  16. Measuring impacts on users
    http://fate.infoseeking.org/googleornot.php


  17. (Image-only slide; no transcript text.)

  18. Translating systems to experiences
    • Most people can't tell the difference between the original Google results and those re-ranked with ε = 0.3.
    • But they can when ε > 0.5.
    • Lesson: we can carefully introduce diversity into search results, reducing bias while maintaining user satisfaction.


  19. Implications: image search


  20. Query “CEO United States”


  21. Query “CEO UK”


  22. Feng, Y. & Shah, C. (2022). Has CEO Gender Bias Really Been Fixed? Adversarial Attacking and Improving Gender Fairness in Image Search. AAAI Conference on Artificial Intelligence. February 22-March 1, 2022. Vancouver, Canada.


  23. Reducing bias using the "fair-greedy" approach
    Feng, Y. & Shah, C. (2022). Has CEO Gender Bias Really Been Fixed? Adversarial Attacking and Improving Gender Fairness in Image Search. AAAI Conference on Artificial Intelligence. February 22-March 1, 2022. Vancouver, Canada.
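
    The paper's algorithm is not reproduced on the slide, so the following is only one plausible reading of a "fair-greedy" re-ranker (all names, the scoring rule, and the weighting are my assumptions): greedily pick the next image that best trades off relevance against the running deviation from a target gender proportion.

```python
# One plausible reading of a "fair-greedy" image re-ranker (not the
# paper's exact algorithm): at each step pick the image that maximizes
# relevance minus a penalty for moving the running gender mix away from
# a target proportion.
def fair_greedy(images, relevance, gender_of, target_female=0.5, k=10, lam=1.0):
    chosen, pool = [], list(images)
    while pool and len(chosen) < k:
        def score(img):
            females = sum(1 for i in chosen if gender_of[i] == "female")
            females += gender_of[img] == "female"
            skew = abs(females / (len(chosen) + 1) - target_female)
            return relevance[img] - lam * skew
        best = max(pool, key=score)
        chosen.append(best)
        pool.remove(best)
    return chosen

# Toy example: male-labeled images dominate the relevance scores.
images = [f"img{i}" for i in range(6)]
relevance = {img: 1.0 - 0.1 * i for i, img in enumerate(images)}
gender_of = {img: ("male" if i < 4 else "female") for i, img in enumerate(images)}
print(fair_greedy(images, relevance, gender_of, k=4))
```

    The weight lam controls the relevance-fairness trade-off, which is exactly the kind of multi-objective tension the next slide calls hard and not always well-defined.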


  24. But…
    • [Technical] Multi-objective optimization (fairness in a marketplace) is hard and not always well-defined.
    • [Business] Re-ranking brings additional costs.
    • [Social] Our notions of what's biased, what's fair, and what's good keep changing.


  25. Summary
    • Large-scale information access systems suffer from problems of bias, unfairness, and opaqueness – some due to technical issues, some due to business objectives, and some due to social issues.
    • We could audit these systems and create education, awareness, and advocacy around them.
    • Ideally, we need a multifaceted approach, similar to how smoking was curbed.


  26. Thank you!
