Machine Learning Bias - Pycon India 2019

Bias Bias Bias

AbdulMajedRaja RS

October 12, 2019

Transcript

  1. Machine Learning Bias
    AbdulMajedRaja RS

  2. Outline
    ● Recognizing the Problem
    ● What’s Machine Learning Bias?
    ● Definition of “Fairness”
    ● Interpretable Machine Learning
    ● Python Tools
    ● Case Study

  3. Thoughts?
    What if I told you Computers can lie?
    Would you believe me?

  4. ImageNet Roulette
    https://nerdcore.de/2019/03/12/imagenet-roulette/

  5. The Problem - Samples

  6. (Biased?) Google Translation at Work

  7. But Wait, Why is this concerning?
    After all, This is just Google Translate

  8. Biased-Google Photos App at Work

  9. Perhaps, That’s just Google?
    Nope, Here’s Microsoft!

  10. Microsoft’s super-cool Teen Tweeting Bot Tay

  11. Oops, Got it!
    There definitely is Bias!
    What’s next?

  12. ML Bias - What

  13. What’s Machine Learning Bias?
    A Machine Learning Algorithm being
    “unfair” with its Predictions
    A Machine Learning Algorithm missing
    “Fairness”

  14. ML Bias - (un)Fairness

  15. Disclaimer
    No Common Consensus / Standard
    definition of Fairness

  16. ML Bias - (un)Fairness
    ● Group Fairness - partitions a population into groups defined by protected
    attributes (such as gender, caste, or religion) and requires some statistical
    measure to be equal across those groups (see the sketch below).
    ● Individual Fairness - similar individuals should be treated similarly.
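
    As a rough illustration of group fairness, the sketch below computes a
    demographic-parity gap: the difference in positive-prediction rates between
    groups defined by a protected attribute. The column names and data are
    hypothetical, made up purely for the example.

    # Sketch (hypothetical data): demographic parity as one group-fairness measure --
    # the gap in positive-prediction rates across groups of a protected attribute.
    import pandas as pd

    df = pd.DataFrame({
        "gender":     ["F", "F", "M", "M", "F", "M"],   # protected attribute
        "prediction": [  1,   0,   1,   1,   0,   1],   # model output (1 = positive)
    })

    rate_per_group = df.groupby("gender")["prediction"].mean()
    parity_gap = rate_per_group.max() - rate_per_group.min()

    print(rate_per_group)
    print(f"Demographic parity gap: {parity_gap:.2f}")  # 0 would mean equal rates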

  17. ML Bias - Causes (Data)
    ● Skewed sample
    ● Tainted examples
    ● Limited features
    ● Sample size disparity
    ● Proxies (see the sketch below)
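
    Proxies are easy to miss: a feature that is not itself protected can still
    encode a protected attribute. One rough way to screen for them, using a crude
    label-encoded correlation and hypothetical file and column names, is sketched
    below.

    # Sketch (hypothetical dataset/columns): flag features that are strongly
    # associated with a protected attribute and may be acting as proxies for it.
    import pandas as pd

    df = pd.read_csv("employees.csv")                         # hypothetical dataset
    protected = df["gender"].astype("category").cat.codes     # encode protected attribute

    candidate_features = ["pincode", "team", "shift"]         # hypothetical features
    for col in candidate_features:
        encoded = df[col].astype("category").cat.codes
        corr = encoded.corr(protected)                        # crude association check
        if abs(corr) > 0.4:                                   # arbitrary screening threshold
            print(f"{col} may be a proxy for gender (corr={corr:.2f})")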

  18. ML Bias - Mitigate

  19. Mitigation
    Also means Improving Fairness

  20. ML Bias - Improving Fairness
    ● Pre-Processing: learn a new representation that is free of the sensitive
    variable, yet preserves the information in the data
    ● Training (Optimization): add a constraint or a regularization term to the
    objective
    ● Post-Processing: find a proper threshold using the original score function
    (see the sketch below)
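
    As one deliberately simplified illustration of the post-processing approach
    above, the sketch below keeps the original score function and picks
    group-specific thresholds so that both groups end up with roughly the same
    selection rate. The scores, groups, and target rate are hypothetical.

    # Sketch (hypothetical scores/groups): post-processing by choosing per-group
    # thresholds on the original score function so selection rates roughly match.
    import numpy as np

    rng = np.random.default_rng(0)
    scores = rng.uniform(size=200)                    # original model scores
    groups = rng.choice(["A", "B"], size=200)         # protected attribute

    target_rate = 0.30                                # desired selection rate per group
    thresholds = {}
    for g in ["A", "B"]:
        group_scores = scores[groups == g]
        # The (1 - target_rate) quantile keeps roughly target_rate of this group.
        thresholds[g] = np.quantile(group_scores, 1 - target_rate)

    decisions = np.array([s >= thresholds[g] for s, g in zip(scores, groups)])
    for g in ["A", "B"]:
        print(g, decisions[groups == g].mean())       # both rates ≈ target_rate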

  21. ML Bias - Happening

  22. Mention of ML Fairness in Research Papers

  23. Difficulties in ensuring an ML Algorithm is unbiased

  24. Interpretable
    Machine Learning

  25. Today - Modelling Architecture

  26. IML - Definition
    Interpretable Machine Learning refers to methods and
    models that make the behavior and predictions of
    machine learning systems understandable
    to humans.

  27. IML - Benefits
    ● Fairness: Ensuring that predictions are unbiased and do not implicitly or explicitly
    discriminate against protected groups. An interpretable model can tell you why it
    has decided that a certain person should not get a loan, and it becomes easier for a
    human to judge whether the decision is based on a learned demographic (e.g.
    racial) bias.
    ● Privacy: Ensuring that sensitive information in the data is protected.
    ● Reliability or Robustness: Ensuring that small changes in the input do not lead to
    large changes in the prediction.
    ● Causality: Check that only causal relationships are picked up.
    ● Trust: It is easier for humans to trust a system that explains its decisions
    compared to a black box.

  28. Modelling Architecture - with IML

  29. Preferred Explaining - Model Interpretation

  30. Python Tools

  31. Explainability and Fairness - Just one `pip` away
    ● lime - https://github.com/marcotcr/lime
    ● shap - https://github.com/slundberg/shap (expanded in the sketch below)
    explainer = shap.TreeExplainer(model)
    shap_values = explainer.shap_values(X)
    ● eli5 - https://github.com/TeamHG-Memex/eli5
    ● scikit-lego - https://github.com/koaning/scikit-lego
    from sklego.preprocessing import InformationFilter
    from sklego.linear_model import FairClassifier
    ● What-if Tool - https://pair-code.github.io/what-if-tool/
    ● Captum - https://github.com/pytorch/captum
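
    To make the `shap` lines above concrete, here is a minimal end-to-end sketch:
    synthetic data, a scikit-learn random forest, and the TreeExplainer shown on
    the slide. The dataset and model are placeholders, not the case-study model.

    # Minimal sketch: explain a tree-based model's predictions with SHAP.
    import shap
    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier

    X, y = make_classification(n_samples=500, n_features=8, random_state=42)
    model = RandomForestClassifier(random_state=42).fit(X, y)

    explainer = shap.TreeExplainer(model)       # model-specific explainer for trees
    shap_values = explainer.shap_values(X)      # per-feature contribution per sample

    # Global view: which features drive the model's predictions overall.
    shap.summary_plot(shap_values, X)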

  32. Attrition Prediction - People Analytics
    Objective:
    ● Building a Machine Learning Solution to Predict Employee Attrition in the
    Organization for the next Quarter
    Data used:
    ● Demographic data
    ● Compensation data
    ● Promotion data
    ● Reward & recognition Data

  33. Attrition Prediction - People Analytics
    Final Model:
    ● Ensemble of Bagging (RandomForest) and Boosting (xgboost) with a weighted
    average (see the sketch below)
    Accuracy: Acceptable
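
    A "weighted average" ensemble can be as simple as blending the predicted
    probabilities of the two models. The sketch below (synthetic data, arbitrary
    weights) shows one way such an ensemble might look; it is not the exact model
    from the case study.

    # Sketch (synthetic data, arbitrary weights): weighted average of a bagging
    # model (random forest) and a boosting model (xgboost).
    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import train_test_split
    from xgboost import XGBClassifier

    X, y = make_classification(n_samples=1000, n_features=12, random_state=0)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    rf = RandomForestClassifier(random_state=0).fit(X_train, y_train)
    xgb = XGBClassifier(random_state=0).fit(X_train, y_train)

    w_rf, w_xgb = 0.4, 0.6                            # assumed weights
    proba = (w_rf * rf.predict_proba(X_test)[:, 1]
             + w_xgb * xgb.predict_proba(X_test)[:, 1])
    pred = (proba >= 0.5).astype(int)
    print("Ensemble accuracy:", (pred == y_test).mean())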

  34. All Good?
    Can we go ahead and Productionize?
    But, Wait!

  35. Attrition Prediction - People Analytics
    Interpreting the Model:
    ● Variable Importance Plot - for unboxing black-box methods (see the sketch below)
    Result:
    ● Maternity Leave (x) is one of the most important variables for predicting Attrition (y)
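
    The variable-importance plot mentioned above can be produced directly from a
    fitted random forest. A minimal sketch, with stand-in data and hypothetical
    feature names (including the `Maternity Leave` flag), is shown below.

    # Sketch (stand-in data, hypothetical feature names): variable importance from
    # a fitted random forest -- the plot used to unbox the black-box model.
    import matplotlib.pyplot as plt
    import pandas as pd
    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier

    X, y = make_classification(n_samples=800, n_features=6, random_state=1)
    feature_names = ["tenure", "salary_band", "maternity_leave",
                     "promotions", "rewards", "team_size"]      # hypothetical names

    model = RandomForestClassifier(random_state=1).fit(X, y)
    importances = pd.Series(model.feature_importances_, index=feature_names)

    importances.sort_values().plot(kind="barh", title="Variable importance")
    plt.tight_layout()
    plt.show()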

  36. Machine Learning Ethics?

  37. Attrition Prediction - People Analytics
    [Diagram: Maternity Leave → Married, Female → Gender]
    Implications of Proceeding with this Model
    ● Female Employees taking Maternity Leave would be suspected of Leaving the Job
    soon
    ● Future Hiring of Married Female Employees would be scrutinized

  38. Attrition Prediction - People Analytics
    Result
    ● Retrained the model with `Maternity Leave` treated as a `protected attribute` and
    kept `unaware` from the model during training (see the sketch below)
    ● Thus, the newly built model excludes the sensitive variable (`Maternity Leave`) that
    led to bias against a particular segment (`Female & Married`)
    Impact
    ● Reduction in the model's accuracy score
    ● But the job was delivered to the HR Department with a model of no obvious bias in it
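
    "Unaware" here simply means the protected column never reaches the model. A
    minimal sketch of that retraining step, with a hypothetical file name and
    column names, could look like this:

    # Sketch (hypothetical dataset/columns): fairness through unawareness --
    # drop the protected attribute so the model never sees it during training.
    import pandas as pd
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import train_test_split

    df = pd.read_csv("hr_attrition.csv")                  # hypothetical dataset
    protected = ["maternity_leave"]                       # protected attribute(s)

    X = df.drop(columns=protected + ["attrition"])        # model is "unaware" of them
    y = df["attrition"]

    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
    model = RandomForestClassifier(random_state=0).fit(X_train, y_train)
    print("Accuracy without the protected attribute:", model.score(X_test, y_test))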

  39. Attrition Prediction - People Analytics
    Lessons Learnt
    ● Unlike cases where obvious bias in the data is transferred to the model, in
    this case there was no bias (as such) in the data
    ● Rather, the model learnt it during training (feature engineering), and that is
    what led to the bias
    ● Mostly, it comes down to a trade-off between Accuracy and Responsible Data
    Science
    ● Better techniques than just `unaware` could have been used to minimize the
    accuracy loss (see the sketch below)
    ● Machine Learning Ethics matter if we are to build something that's fair to everyone
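
    One such technique is the `InformationFilter` from scikit-lego (already listed
    on the tools slide), which tries to remove the sensitive information from the
    remaining features rather than merely dropping the column. A rough sketch,
    assuming the scikit-lego API as of 2019 and hypothetical column names:

    # Rough sketch (hypothetical columns; scikit-lego API assumed): filter the
    # sensitive information out of the features instead of just dropping a column.
    import pandas as pd
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.pipeline import make_pipeline
    from sklego.preprocessing import InformationFilter

    df = pd.read_csv("hr_attrition.csv")                  # hypothetical dataset
    X = df.drop(columns=["attrition"])
    y = df["attrition"]

    pipeline = make_pipeline(
        InformationFilter(["maternity_leave"]),           # decorrelate features from the sensitive column
        RandomForestClassifier(random_state=0),
    )
    pipeline.fit(X, y)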

  40. Inspirations / References
    ● Artificial Intelligence needs all of us | Rachel Thomas Ph.D. | TEDxSanFrancisco
    ● Machine Learning Fairness - Google
    ● A Tutorial on Fairness in Machine Learning - Ziyuan Zhong
    ● Reducing bias and ensuring fairness in data science by Henry Hinnefeld
    ● The Trouble with Bias - NIPS 2017 Keynote - Kate Crawford
    ● Interpretable Machine Learning - Christoph Molnar
    ● What’s in a Name? Reducing Bias in Bios without Access to Protected Attributes (arXiv)
    ● Vincent Warmerdam: How to Constrain Artificial Stupidity | PyData London 2019 (YouTube)

  41. It’s easier to be just another cool Data Scientist
    What’s tougher is to be a
    Responsible, Ethics-driven Data Scientist
    And this is a choice only you can make!
    Thank you!
