Slide 1


What does Fairness in Information Access Mean and Can We Achieve It?
Chirag Shah (@chirag_shah)

Slide 2


We live in a biased world
• We are biased.
• Any dataset can be biased.
• Any model can be biased.
• No data is a perfect representation of the world; tradeoffs are made during data collection, storage, and analysis.

Slide 3


Fairness = Lack of Bias?
• Bias is not always bad.
• Three definitions of fairness:
  ○ Statistical parity
  ○ Disparate impact
  ○ Disparate treatment
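The first two definitions can be checked directly on a ranked list. A minimal Python sketch (not from the talk; the per-document `group` label is an assumed field) that measures both:

```python
from collections import Counter

def group_proportions(ranking, top_k):
    """Share of each group among the top-k results."""
    counts = Counter(doc["group"] for doc in ranking[:top_k])
    return {g: c / top_k for g, c in counts.items()}

def statistical_parity_gap(ranking, top_k):
    """0.0 means every group is equally represented in the top-k."""
    props = group_proportions(ranking, top_k)
    return max(props.values()) - min(props.values())

def disparate_impact_ratio(ranking, top_k, protected, reference):
    """Ratio of the protected group's share to the reference group's share.
    A common rule of thumb flags ratios below 0.8."""
    props = group_proportions(ranking, top_k)
    return props.get(protected, 0.0) / props.get(reference, 1e-9)
```

For example, a top-10 list with seven "A" documents and three "B" documents has a statistical-parity gap of 0.4 and a B-to-A disparate-impact ratio of 3/7.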

Slide 4


Addressing fairness through diversity
● Took a sliver of search data (queries, top results).
● Clustered the results and quantified the amount of topical bias.
● Designed new algorithms to re-rank those results into a fairer ranking.
● Two forms of fairness:
  ○ Statistical parity
  ○ Disparate impact

Ruoyuan Gao (Amazon)

Gao, R. & Shah, C. (2020). Toward Creating a Fairer Ranking in Search Engine Results. Information Processing and Management (IP&M), 57(1).
Gao, R. & Shah, C. (2019). How Fair Can We Go: Detecting the Boundaries of Fairness Optimization in Information Retrieval. In Proceedings of the ACM International Conference on Theory of Information Retrieval (ICTIR), pp. 229-236. October 2-5, 2019. Santa Clara, CA, USA.
Gao, R., Ge, Y., & Shah, C. (2022). FAIR: Fairness-Aware Information Retrieval Evaluation. Journal of the Association for Information Science and Technology (JASIST).

Slide 5


Datasets
● Google
  ○ From Google Trends (June 23-June 29, 2019)
  ○ 100 queries
  ○ Top 100 results per query
● New York Times
  ○ 1.8M articles published by NYT
  ○ 50 queries
  ○ Top 100 results per query
● Clustering with two subtopics

Sample queries: madden shooting, hurricane lane update, jacksonville shooting video, shanann watts, holy fire update, fortnite galaxy skin, new deadly spider, stolen plane, …

Slide 6


Creating a Fair Top-10 List from Top-100

Slide 7


Creating a Fair Top-10 List from Top-100

Slide 8


Statistical parity
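One simple way to build a statistical-parity top-10 from the top-100, assuming the results have already been clustered into subtopics and each cluster is sorted by relevance, is to round-robin over the clusters. This is an illustrative sketch, not the paper's exact algorithm:

```python
def fair_top_k_statistical_parity(clusters, k=10):
    """Round-robin over relevance-sorted clusters so that every
    cluster gets an (approximately) equal share of the top-k."""
    result, i = [], 0
    while len(result) < k and any(i < len(c) for c in clusters):
        for cluster in clusters:
            if i < len(cluster) and len(result) < k:
                result.append(cluster[i])
        i += 1
    return result
```

With two clusters of five documents each and k=4, the output interleaves the two clusters' top documents one by one.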

Slide 9


Disparate impact (70% / 30%)
Problem: we are disregarding relevance.
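Under disparate impact, the top-k slots are instead allocated in proportion to each cluster's share of relevant results (the 70% / 30% split above), so relevance is no longer ignored. A minimal sketch of the idea, assuming relevance-sorted clusters; this is an illustration, not the paper's implementation:

```python
def fair_top_k_disparate_impact(clusters, proportions, k=10):
    """Allocate top-k slots to clusters in proportion to their share of
    relevant results (e.g. 0.7 / 0.3), then fill each quota with that
    cluster's most relevant documents."""
    quotas = [round(p * k) for p in proportions]
    quotas[0] += k - sum(quotas)  # fix rounding so quotas sum to k
    picked = [c[:q] for c, q in zip(clusters, quotas)]
    return [doc for group in picked for doc in group]
```

With a 0.7 / 0.3 split and k=10, the first cluster contributes its top seven documents and the second its top three.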

Slide 10


Top-Top with Statistical Parity

Slide 11


Top-Top with Disparate Impact (70% / 30%)
Problem: subsequent documents may not have as much novelty.

Slide 12


Page-wise with Statistical Parity

Slide 13


Page-wise with Disparate Impact (70% / 30%)
Problem: we are not getting enough diversity by sampling from the tops.

Slide 14


ε-greedy
● Explore the results with probability ε, exploit with probability 1−ε
● ε = 0.0 → no exploration
● ε = 1.0 → full exploration (randomness)
● Non-fair (naïve) ε-greedy:
  ○ with probability ε, randomly select from the entire rank-list (100)
  ○ with probability 1−ε, pick from the top
● Fair ε-greedy:
  ○ with probability ε, randomly select a cluster, then pick the top result from that cluster
  ○ with probability 1−ε, pick the "fair" cluster, then pick the top result from that cluster
● Works with either notion: statistical parity | disparate impact
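The fair ε-greedy procedure above can be sketched as follows. This is an illustration of the idea, not the authors' implementation; here the "fair" cluster is taken to be the one currently farthest below its target share (targets of 0.5/0.5 give statistical parity; relevance-based targets give disparate impact):

```python
import random

def fair_epsilon_greedy(clusters, target, k=10, epsilon=0.3, rng=random):
    """Fair ε-greedy re-ranking sketch.

    clusters: relevance-sorted document lists, one per subtopic cluster.
    target:   desired share of the top-k for each cluster (sums to 1).

    With probability ε, explore: pick a random non-empty cluster.
    With probability 1−ε, exploit: pick the cluster farthest below its
    target share so far. Either way, take that cluster's most relevant
    remaining document.
    """
    remaining = [list(c) for c in clusters]
    counts = [0] * len(clusters)
    result = []
    while len(result) < k and any(remaining):
        candidates = [i for i, c in enumerate(remaining) if c]
        if rng.random() < epsilon:
            choice = rng.choice(candidates)
        else:
            # deficit = target share minus achieved share so far
            n = max(len(result), 1)
            choice = max(candidates, key=lambda i: target[i] - counts[i] / n)
        result.append(remaining[choice].pop(0))
        counts[choice] += 1
    return result
```

With ε = 0.0 the procedure is deterministic and, for equal targets, alternates between the two clusters' top documents.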

Slide 15


Implications: text search

Slide 16


Measuring impacts on users
http://fate.infoseeking.org/googleornot.php

Slide 17


Slide 18


Translating systems to experiences
• Most people can't tell the difference between original Google results and those re-ranked with ε = 0.3.
• But they can if ε > 0.5.
• Lesson: we can carefully introduce diversity into search results in a way that reduces bias while preserving user satisfaction.

Slide 19


Implications: image search

Slide 20


Query “CEO United States”

Slide 21


Query “CEO UK”

Slide 22


Feng, Y. & Shah, C. (2022). Has CEO Gender Bias Really Been Fixed? Adversarial Attacking and Improving Gender Fairness in Image Search. AAAI Conference on Artificial Intelligence. February 22-March 1, 2022. Vancouver, Canada.

Slide 23


Reducing bias using the "fair-greedy" approach
Feng, Y. & Shah, C. (2022). Has CEO Gender Bias Really Been Fixed? Adversarial Attacking and Improving Gender Fairness in Image Search. AAAI Conference on Artificial Intelligence. February 22-March 1, 2022. Vancouver, Canada.

Slide 24


But…
• [Technical] Multi-objective optimization (fairness in a marketplace) is hard and not always well-defined.
• [Business] Re-ranking brings additional costs.
• [Social] Our notions of what's biased, what's fair, and what's good keep changing.

Slide 25


Summary
• Large-scale information access systems suffer from problems of bias, unfairness, and opaqueness: some due to technical issues, some to business objectives, and some to social issues.
• We could audit these systems and build education, awareness, and advocacy around them.
• Ideally, we need a multifaceted approach, similar to how smoking was curbed.

Slide 26


Thank you!