When Data Bias Gets Real

shirankrasnov
February 24, 2021

Transcript

  1. When Data Bias Gets Real: Detecting & Preventing the Most Common Data Biases. With: Shiran Krasnov, Data Analyst & blogger at Shiran.Tips. Wednesday, 24.2.2021, at 20:00. A link to the talk is in the event.
  3. The Metropolitan Police in London used facial-recognition cameras to scan for wanted people. Credit: Kelvin Chan/AP/Shutterstock
  4. Hello! I am Shiran Krasnov. I am a Data Analyst | FP&A at Ex Libris and a blogger: https://shiran.tips/ You can find me at [email protected]
  5. What are we going to talk about? Simple, real-life examples. The most common types of data biases. Why is it only getting worse? How to prevent this phenomenon.
  6. #1 Survivorship Bias: When only the parts of the data that have 'survived' some selection criteria are observed, the data set does not represent the whole population => may cause false conclusions.
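
A minimal sketch of survivorship bias on made-up data. The fund returns, the zero cutoff, and the names `population` and `survivors` are illustrative assumptions and are not taken from the talk. It shows how a summary computed only on the rows that passed a selection criterion can look much better than the same summary on the whole population.

```python
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable

# Hypothetical yearly returns (in %) for 1,000 made-up funds, centred on 0.
population = [random.gauss(0, 10) for _ in range(1000)]

# Assumed selection criterion: only funds with a positive return "survive"
# and end up in the data set we get to analyse.
survivors = [r for r in population if r > 0]

population_mean = statistics.mean(population)
survivors_mean = statistics.mean(survivors)

print(f"all funds:      n={len(population)}, mean return={population_mean:.1f}%")
print(f"survivors only: n={len(survivors)}, mean return={survivors_mean:.1f}%")
```

Because every survivor is above the cutoff and every dropped fund is at or below it, the survivors' mean is always higher than the population mean, whatever the true performance was.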
  7. #2 Danger of Summary Metrics: When looking only at statistical measures such as mean, variance and correlation => you may miss big differences in the raw data.
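
A small sketch of why summary metrics can hide differences. The two samples below are made up for illustration (they are not data from the talk), and the helper names `summary` and `near_mean` are hypothetical. Both samples share the same mean and the same population variance, yet their raw values look nothing alike. The sketch covers only mean and variance, not correlation.

```python
import statistics

# Two made-up samples (illustrative only, not data from the talk).
clustered = [2, 4, 4, 4, 5, 5, 7, 9]   # most values sit near the middle
two_groups = [3, 3, 3, 3, 7, 7, 7, 7]  # two separate groups, nothing in between

def summary(data):
    """Return (mean, population variance) for a list of numbers."""
    return statistics.mean(data), statistics.pvariance(data)

def near_mean(data, width=1):
    """Count values within `width` of the sample mean."""
    m = statistics.mean(data)
    return sum(1 for x in data if abs(x - m) <= width)

for name, data in [("clustered", clustered), ("two_groups", two_groups)]:
    mean, var = summary(data)
    print(f"{name:>10}: mean={mean}, variance={var}, "
          f"values near the mean={near_mean(data)}, raw={sorted(data)}")
```

Both samples report a mean of 5 and a variance of 4, but the second one has no value anywhere near its own mean. Plotting or at least printing the raw values is what reveals this.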
  8. #3 Correlation vs Causality: When two events follow each other => the relationship between them may have no causal basis, or a hidden one.
  10. #4 Simpson’s Paradox: When a trend appears in different subsets of the data => it reverses when the groups are combined.
  11. Subject   Female                  Male
      Math      15% (270 out of 1800)   14% (168 out of 1200)
      Physics   51% (102 out of 200)    50% (400 out of 800)
  12. Subject   Female                  Male
      Math      15% (270 out of 1800)   14% (168 out of 1200)
      Physics   51% (102 out of 200)    50% (400 out of 800)

      Status         Female   Male
      Applicants     2000     2000
      Admitted       372      568
      Admitted (%)   19%      28%
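
The arithmetic behind the admissions example on the slide above can be checked with a short sketch. The counts are copied from the slide; the dictionary layout and the helper name `rate` are assumptions made for illustration.

```python
# (admitted, applicants) per subject and gender, copied from the slide.
admissions = {
    "Math":    {"Female": (270, 1800), "Male": (168, 1200)},
    "Physics": {"Female": (102, 200),  "Male": (400, 800)},
}

def rate(admitted, applicants):
    """Admission rate as a percentage."""
    return 100 * admitted / applicants

# Within each subject, compare the two rates.
for subject, groups in admissions.items():
    f = rate(*groups["Female"])
    m = rate(*groups["Male"])
    print(f"{subject:<8} Female {f:.1f}%  Male {m:.1f}%")

# Combine the subjects by summing counts, not by averaging percentages.
totals = {}
for gender in ("Female", "Male"):
    admitted = sum(groups[gender][0] for groups in admissions.values())
    applicants = sum(groups[gender][1] for groups in admissions.values())
    totals[gender] = (admitted, applicants)
    print(f"Overall  {gender}: {admitted} of {applicants} "
          f"= {rate(admitted, applicants):.1f}%")
```

Women have the higher admission rate in each subject, yet the lower rate overall. The reversal comes from the mix of applications: 1800 of the 2000 female applicants chose Math, where rates are low for everyone, compared with 1200 of the 2000 male applicants.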
  13. #5 Gambler’s Fallacy: When an individual erroneously believes that a certain random event is less likely or more likely => given a previous series of events.
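
A simulation sketch of the gambler's fallacy, assuming a fair coin with independent flips. The streak length, the number of flips, and the variable names are illustrative choices, not material from the talk. It estimates the chance of heads right after a run of tails and compares it with the overall chance of heads.

```python
import random

random.seed(42)  # fixed seed so the run is repeatable

# Simulate a fair coin: True = heads, False = tails.
flips = [random.random() < 0.5 for _ in range(200_000)]

STREAK = 3  # assumed streak length, chosen only for illustration

# Collect every flip that comes right after STREAK tails in a row.
after_streak = [
    flips[i]
    for i in range(STREAK, len(flips))
    if not any(flips[i - STREAK:i])
]

p_heads_overall = sum(flips) / len(flips)
p_heads_after_streak = sum(after_streak) / len(after_streak)

print(f"P(heads) overall:               {p_heads_overall:.3f}")
print(f"P(heads) after {STREAK} tails in a row: {p_heads_after_streak:.3f}")
```

Both estimates should land close to 0.5: heads is not "due" after a run of tails, because the coin has no memory of earlier flips.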
  16. Recap
    Survivorship Bias: When only the parts of the data that have 'survived' some selection criteria are observed, the data set does not represent the whole population => may cause false conclusions.
    Danger of Summary Metrics: When looking only at statistical measures such as mean, variance and correlation => you may miss big differences in the raw data.
    Simpson’s Paradox: When a trend appears in different subsets of the data => it reverses when the groups are combined.
    Correlation vs Causality: When two events follow each other => the relationship between them may have no causal basis, or a hidden one.
    Gambler’s Fallacy: When an individual erroneously believes that a certain random event is less likely or more likely => given a previous series of events.
  18. Can we change it? Yes! Who codes matters: an inclusive team can check each other's blind spots. How we code matters: factoring in fairness and thinking about ethics. Why we code matters: social change should be a priority and not an afterthought.