When Data Bias Gets Real

Detecting & Preventing the Most Common Data Biases
With Shiran Krasnov, Data Analyst & blogger (Shiran.Tips)
Wednesday, 24.2.2021, at 20:00 (link to the lecture in the event)


Ain’t I a woman? Joy Buolamwini, https://youtu.be/UG_X_7g63rY?t=48

The Metropolitan Police in London used facial-recognition cameras to scan for wanted people. Credit: Kelvin Chan/AP/Shutterstock

Hello! I am Shiran Krasnov. I am a Data Analyst | FP&A at Ex Libris and a blogger: https://shiran.tips/
You can find me at [email protected]

What are we going to talk about?
- Simple, real-life examples
- The most common types of data biases
- Why it’s only getting worse
- How to prevent this phenomenon

1. Survivorship Bias
When only the parts of the data that ‘survived’ some selection criterion are kept, the data set no longer represents the whole population, which may lead to false conclusions.

Abraham Wald
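Abraham Wald’s well-known analysis of returning WWII bombers is the classic case: the planes that never came back are missing from the data. The same mechanism can be sketched in a few lines. The fund returns below are made-up numbers, and the rule "funds that lost money were closed" is an assumed selection criterion chosen only to illustrate the effect:

```python
# Hypothetical annual returns (%) for ten funds; illustrative numbers only.
ALL_FUNDS = [-12.0, -8.0, -5.0, -1.0, 2.0, 4.0, 6.0, 9.0, 11.0, 14.0]


def mean(values: list[float]) -> float:
    """Arithmetic mean of a non-empty list."""
    return sum(values) / len(values)


# Assumed selection criterion: funds that lost money were closed,
# so only the non-negative performers "survive" into the data we see.
SURVIVORS = [r for r in ALL_FUNDS if r >= 0]

if __name__ == "__main__":
    print(f"Mean return, all funds:      {mean(ALL_FUNDS):.1f}%")   # 2.0%
    print(f"Mean return, survivors only: {mean(SURVIVORS):.1f}%")  # 7.7%
```

Averaging only the survivors nearly quadruples the apparent return, even though no single number in the data is wrong.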

2. Danger of Summary Metrics
When you look only at summary statistics such as the mean, variance and correlation, you may miss big differences in the raw data.
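A standard illustration of this danger (not necessarily the one used on the slide) is Anscombe’s quartet: four small data sets whose means, variances and correlations are almost identical, yet which look completely different when plotted. The sketch below computes those summary metrics with Python’s `statistics` module and a hand-written Pearson correlation:

```python
import math
import statistics

# Anscombe's quartet (F. J. Anscombe, 1973).
X_123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
QUARTET = {
    "I":   (X_123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II":  (X_123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (X_123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV":  ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
            [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def summarize(xs, ys):
    """Return (mean x, mean y, sample variance x, sample variance y, r)."""
    return (statistics.mean(xs), statistics.mean(ys),
            statistics.variance(xs), statistics.variance(ys),
            pearson_r(xs, ys))


if __name__ == "__main__":
    for name, (xs, ys) in QUARTET.items():
        mx, my, vx, vy, r = summarize(xs, ys)
        print(f"{name:>3}: mean x={mx:.2f}  mean y={my:.2f}  "
              f"var x={vx:.2f}  var y={vy:.2f}  r={r:.3f}")
```

All four rows print essentially the same numbers, while the raw data differ sharply: set IV, for example, has a constant x except for a single outlier. Only plotting (or otherwise inspecting) the raw points reveals this.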

3. Correlation vs. Causality
When two events move together, it does not mean one causes the other: the relationship may have no reason at all (coincidence) or a hidden one (a third factor driving both).

https://www.tylervigen.com/spurious-correlations
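The "hidden reason" case can be sketched directly. In this minimal example all numbers are hypothetical: daily temperature drives both ice-cream sales and sunburn cases, and neither series is computed from the other, yet the two are almost perfectly correlated:

```python
import math

# Hypothetical daily temperatures (°C): the hidden common cause.
TEMPERATURE = [12, 15, 18, 21, 24, 27, 30, 33]

# Both series depend ONLY on temperature (plus small fixed offsets);
# there is no causal link between ice cream and sunburn.
ICE_CREAM_SALES = [3 * t + d for t, d in zip(TEMPERATURE, [2, -1, 0, 3, -2, 1, -3, 0])]
SUNBURN_CASES = [2 * t + d for t, d in zip(TEMPERATURE, [-1, 2, 1, -2, 0, -1, 2, 1])]


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


if __name__ == "__main__":
    r = pearson_r(ICE_CREAM_SALES, SUNBURN_CASES)
    print(f"corr(ice cream, sunburn) = {r:.3f}")  # close to 1, but no causation
```

Banning ice cream would not prevent a single sunburn here; the correlation only reflects the shared dependence on temperature.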

4. Simpson’s Paradox
A trend that appears in different subsets of the data reverses when the groups are combined.

Status          Female   Male
Applicants      2000     2000
Admitted        372      568
Admitted (%)    19%      28%

Subject    Female               Male
Math       15% (270 of 1800)    14% (168 of 1200)
Physics    51% (102 of 200)     50% (400 of 800)

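The reversal in the admission tables can be reproduced from the raw counts. This sketch uses only the numbers already in the tables; the key step is that the overall rate pools applicants across subjects rather than averaging the per-subject percentages:

```python
# (admitted, applicants) per subject, copied from the admission tables.
ADMISSIONS = {
    "Female": {"Math": (270, 1800), "Physics": (102, 200)},
    "Male":   {"Math": (168, 1200), "Physics": (400, 800)},
}


def rate(admitted: int, applicants: int) -> float:
    """Admission rate for one subgroup."""
    return admitted / applicants


def overall_rate(by_subject: dict[str, tuple[int, int]]) -> float:
    """Pool all subjects first, then divide (not the mean of per-subject rates)."""
    admitted = sum(a for a, _ in by_subject.values())
    applicants = sum(n for _, n in by_subject.values())
    return admitted / applicants


if __name__ == "__main__":
    for subject in ("Math", "Physics"):
        f = rate(*ADMISSIONS["Female"][subject])
        m = rate(*ADMISSIONS["Male"][subject])
        print(f"{subject:<8} female {f:.1%}  male {m:.1%}")
    print(f"{'Overall':<8} female {overall_rate(ADMISSIONS['Female']):.1%}  "
          f"male {overall_rate(ADMISSIONS['Male']):.1%}")
```

Women have the higher rate in both subjects (15.0% vs 14.0%, 51.0% vs 50.0%) but the lower rate overall (18.6% vs 28.4%), because 1800 of the 2000 female applicants applied to Math, the subject with the much lower admission rate.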

5. Gambler’s Fallacy
The mistaken belief that a random event is more or less likely to happen because of a previous series of events.
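A quick simulation makes the fallacy concrete. Assuming a fair coin with independent flips (simulated with Python’s `random` module and a fixed seed), the chance of heads right after five tails in a row is still about one half:

```python
import random


def prob_heads_after_tails_streak(n_flips: int, streak: int, seed: int = 0) -> float:
    """Estimate P(heads | the previous `streak` flips were all tails) for a fair coin."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    flips = [rng.random() < 0.5 for _ in range(n_flips)]  # True = heads
    followers = [
        flips[i]
        for i in range(streak, n_flips)
        if not any(flips[i - streak:i])  # previous `streak` flips were all tails
    ]
    return sum(followers) / len(followers)


if __name__ == "__main__":
    p = prob_heads_after_tails_streak(n_flips=500_000, streak=5)
    print(f"P(heads | 5 tails in a row) ~ {p:.3f}")  # close to 0.5, not higher
```

The coin has no memory: conditioning on the past streak does not move the estimate away from 0.5 beyond ordinary sampling noise.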

Recap
- Survivorship Bias: when only the data that ‘survived’ some selection criterion is kept, the data set does not represent the whole population, which may lead to false conclusions.
- Danger of Summary Metrics: looking only at statistics such as the mean, variance and correlation may hide big differences in the raw data.
- Simpson’s Paradox: a trend that appears in different subsets of the data reverses when the groups are combined.
- Correlation vs. Causality: two events may move together with no causal link, either by coincidence or because of a hidden factor.
- Gambler’s Fallacy: the mistaken belief that a random event is more or less likely because of a previous series of events.

AI, Algorithms, Machine Learning

“Algorithms, like viruses, can spread bias on a massive scale and at a rapid pace.”

https://youtu.be/heQzqX35c9A?t=7

Remember the facial (un)recognition?

Biased AI Recruiting Tool

4K every second, 243K every minute, 350M every day
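The three figures are the same rate at different time scales. A quick check, taking the 350M-per-day figure as given (the slide does not say what is being counted, so the sketch just calls them "items"):

```python
# Figure from the slide; "items" is a placeholder because the unit is not stated.
PER_DAY = 350_000_000

MINUTES_PER_DAY = 24 * 60
SECONDS_PER_DAY = 24 * 60 * 60

per_minute = PER_DAY / MINUTES_PER_DAY  # about 243,056, i.e. "243K every minute"
per_second = PER_DAY / SECONDS_PER_DAY  # about 4,051, i.e. "4K every second"

if __name__ == "__main__":
    print(f"{per_minute:,.0f} items per minute, {per_second:,.0f} items per second")
```

At that pace, even a small bias in an automated decision is repeated thousands of times every second.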

“I had pasta tonight”

Can we change it? Yes!
- Who codes matters: an inclusive team can check each other’s blind spots.
- How we code matters: factor in fairness and think about ethics.
- Why we code matters: social change should be a priority, not an afterthought.
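One concrete way to factor fairness into how we code is to evaluate a model per demographic group instead of reporting a single pooled accuracy, in the spirit of audits like the facial-recognition example earlier in the talk. In this sketch the groups, counts and the `make_records` helper are all hypothetical:

```python
from collections import defaultdict


def make_records(group: str, n_total: int, n_correct: int) -> list[tuple[str, bool]]:
    """Hypothetical helper: n_total predictions for `group`, n_correct of them right."""
    return [(group, i < n_correct) for i in range(n_total)]


def accuracy_by_group(records: list[tuple[str, bool]]) -> dict[str, float]:
    """Accuracy computed separately for each group, not one pooled number."""
    correct: dict[str, int] = defaultdict(int)
    total: dict[str, int] = defaultdict(int)
    for group, is_correct in records:
        total[group] += 1
        correct[group] += is_correct
    return {g: correct[g] / total[g] for g in total}


# Hypothetical audit: group B is under-represented in the data and poorly served.
RECORDS = make_records("group A", 800, 784) + make_records("group B", 200, 140)

if __name__ == "__main__":
    overall = sum(ok for _, ok in RECORDS) / len(RECORDS)
    print(f"overall: {overall:.1%}")  # 92.4%
    for g, acc in accuracy_by_group(RECORDS).items():
        print(f"{g}: {acc:.1%}")       # group A: 98.0%, group B: 70.0%
```

The pooled 92.4% looks acceptable; the breakdown shows that almost one in three predictions for group B is wrong.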

Thanks! Any questions? You can find me at https://shiran.tips/ or [email protected]
Shiran Krasnov (Shiran Zakaim Krasnov)
