The Data Lorax: Planting The Seeds of Fairness in Data Products

OmaymaS
November 07, 2018

Invited talk at #DatafestTbilisi2018.
https://datafest.ge/

Transcript

  1. THE DATA LORAX: PLANTING THE SEEDS OF FAIRNESS IN DATA PRODUCTS
    OMAYMA SAID, DATA SCIENTIST
  2. (no text)
  3. The Once-ler

  4. LET’S UNLOCK THE VALUE OF THE THNEED!

  5. The Lorax: “I SPEAK FOR THE TREES!”

  6. EVERYBODY NEEDS A THNEED!

  7. (no text)
  8. (no text)
  9. THNEED AT SCALE!

  10. (no text)
  11. (no text)
  12. (no text)
  13. (no text)
  14. UNLESS WHAT?

  15. “DATA IS THE NEW OIL”

  16. “Data is the new oil, in the way that oil is a ubiquitous commodity that requires incredible resource allocation to extract value from, deep expertise to manage, and, even when all that goes well, can have universally consequential negative externalities.”
    Drew Conway, Founder & CEO
  17. AI-POWERED [----] ML-ENABLED [----]

  18. MACHINES LEARN. WHO IS THE TEACHER?

  19. KODAK SHIRLEY CARDS (1960s & 1970s)

  20. SHIRLEY CARDS: Several “Shirley cards” from the 1960s and 1970s.
  21. SHIRLEY CARDS: Mixed-color photos by Walt Jabsco, 1960s & 1970s
  22. SHIRLEY CARDS: Mixed-color photos by Walt Jabsco, 1960s & 1970s
    PRODUCT FAILED DUE TO SOMETHING INDIVIDUALS CAN’T CHANGE ABOUT THEMSELVES!
  23. IMAGE DATASETS (NOW)

  24. NOW IN A DIFFERENT CONTEXT

  25. NOW IN A DIFFERENT CONTEXT OPEN IMAGES

  26. NOW MORE DIVERSITY? OPEN IMAGES IN A DIFFERENT CONTEXT

  27. “No Classification without Representation: Assessing Geodiversity Issues in Open Data Sets for the Developing World”
    Shreya Shankar, Yoni Halpern, Eric Breck, James Atwood, Jimbo Wilson, D. Sculley (Google Brain Team)
    [Figure: geographic distribution of images in Open Images and ImageNet, with the US labeled in each]
  28. WEDDING PHOTOS: Photos of bridegrooms from different countries, aligned by the log-likelihood that a classifier trained on Open Images assigns to the “bridegroom” class (Source).
    BETTER AND MORE CONSISTENT CLASSIFICATION
  29. “The WEIRDest people in the world?”
    Joseph Henrich, Steven J. Heine, Ara Norenzayan (University of British Columbia)
  30. “The WEIRDest people in the world?”
    WEIRD: Western, Educated, Industrialized, Rich, Democratic
  31. INCLUSIVE IMAGES COMPETITION: Wedding photographs (donated by Googlers), labeled by a classifier trained on the Open Images dataset.
    Source: Introducing The Inclusive Images Competition
  32. (no text)
  33. “Amazon’s system TAUGHT ITSELF that male candidates were preferable. It penalized resumes that included the word ‘women’s,’ as in ‘women’s chess club captain.’ And it downgraded graduates of two all-women’s colleges, according to people familiar with the matter. They did not specify the names of the schools.”
  34. (Same quote as slide 33, annotated:) LEARNED FROM HUMANS
  35. HUMAN BIAS AMPLIFICATION

  36. UNFAIRNESS @ SCALE

  37. “I worry all the time about building things and not having the foresight, because I'm just as flawed and imperfect as everybody else, to know the consequences of what I am doing, and hurting people who can't bear the cost nearly as well as I can.”
    Josh Wills, Software Engineer (former Director of Data Engineering), “I Build The Black Box: Grappling with Product and Policy”
  38. REMEMBER THAT IT IS PEOPLE WHO COLLECT/LABEL DATA, BUILD DATA PRODUCTS, DEFINE METRICS
  39. REMEMBER THAT IT IS PEOPLE WHO COLLECT/LABEL DATA, BUILD DATA PRODUCTS, DEFINE METRICS
    COLLECT/LABEL DATA: BIAS IN
    - REPRESENTATION
    - DISTRIBUTION
    - LABELS
    AND MORE…
  40. REMEMBER THAT IT IS PEOPLE WHO COLLECT/LABEL DATA, BUILD DATA PRODUCTS, DEFINE METRICS
    BUILD DATA PRODUCTS:
    - TRAIN/TEST SPLIT
    - FEATURES/PROXIES
    - COMPLEX MODELS/INTERPRETABILITY
    AND MORE…
  41. REMEMBER THAT IT IS PEOPLE WHO COLLECT/LABEL DATA, BUILD DATA PRODUCTS, DEFINE METRICS
    DEFINE METRICS:
    - WHAT DO YOU OPTIMIZE FOR?
    - WHAT IS THE IMPACT OF DIFFERENT ERROR TYPES ON DIFFERENT GROUPS?
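The question about error types on slide 41 becomes concrete once error rates are broken down per group instead of reported as one overall number. The following is a minimal Python sketch, not taken from the talk: it assumes binary labels (1 = positive), one group attribute per record, and false-positive/false-negative rates as the error types of interest. The function name `error_rates_by_group` and the toy data are hypothetical.

```python
from collections import defaultdict


def error_rates_by_group(y_true, y_pred, groups):
    """Return {group: {"fpr": ..., "fnr": ...}} for binary labels (1 = positive).

    Hypothetical helper: a rate is NaN when the group has no negatives
    (for FPR) or no positives (for FNR).
    """
    counts = defaultdict(lambda: {"fp": 0, "fn": 0, "pos": 0, "neg": 0})
    for truth, pred, group in zip(y_true, y_pred, groups):
        c = counts[group]
        if truth == 1:
            c["pos"] += 1
            if pred == 0:
                c["fn"] += 1  # a real positive the model missed
        else:
            c["neg"] += 1
            if pred == 1:
                c["fp"] += 1  # a negative the model wrongly flagged
    return {
        group: {
            "fpr": c["fp"] / c["neg"] if c["neg"] else float("nan"),
            "fnr": c["fn"] / c["pos"] if c["pos"] else float("nan"),
        }
        for group, c in counts.items()
    }


# Toy data (made up for illustration): true labels, predictions, group membership.
y_true = [1, 0, 1, 0, 1, 0, 1, 0]
y_pred = [1, 0, 1, 1, 0, 1, 0, 0]
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]

rates = error_rates_by_group(y_true, y_pred, groups)
for group in sorted(rates):
    print(group, rates[group])
```

On this toy data both groups share the same false-positive rate (0.5), but every positive in group b is missed (false-negative rate 1.0 versus 0.0 for group a). A single accuracy figure of 0.5 would hide which group bears the cost of the errors.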
  42. “What we're still missing is an understanding for how to put ethics into practice in data as well as the overall product development process.”
    DJ Patil; Hilary Mason, GM of Machine Learning; Mike Loukides, Vice President, Content Strategy
  43. THINGS WON’T GET BETTER

  44. “UNLESS SOMEONE LIKE YOU CARES A WHOLE AWFUL LOT, NOTHING IS GOING TO GET BETTER. IT’S NOT!”
  45. THE DATA LORAX: PLANTING THE SEEDS OF FAIRNESS IN DATA PRODUCTS
    OMAYMA SAID, DATA SCIENTIST