California Law Review, Vol. 104, 2016 (http://papers.ssrn.com/sol3/papers.cfm?abstract_id=2477899):

“Data mining can go wrong in any number of ways: by choosing a target variable that correlates to protected class more than others would, by injecting current or past prejudice into the decision about what makes a good training example, by choosing too small a feature set, or by not diving deep enough into each feature.”

Worse, many datasets encode actual discrimination even if they were collected fairly. Outright racial barriers to housing purchases were common 50 years ago and still exist today. If ZIP code is a feature in your model, it may reflect this discrimination, and you may have to actively guard against it (a toy check is sketched below).

Of course, these are not new problems in statistics, but people sometimes presume that because an algorithm is doing the analysis, we are somehow freed from human bias and demographic differences. That's just not true!
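One way to start guarding against a proxy feature is to test whether something like ZIP code tracks a protected class and whether model outcomes differ across groups. The sketch below is a minimal, illustrative check on a hypothetical records list; the field names, ZIP codes, and group labels are made up, and in practice you would run a check like this on your own data with a real fairness metric.

```python
from collections import defaultdict

# Hypothetical toy data: (zip_code, protected_group, model_approved).
# All values are illustrative, not drawn from any real dataset.
records = [
    ("60615", "A", True),  ("60615", "A", True),  ("60615", "B", False),
    ("60637", "B", False), ("60637", "B", False), ("60637", "A", True),
    ("60614", "A", True),  ("60614", "A", True),  ("60614", "B", True),
]

def approval_rate_by(field_index, records):
    """Approval rate grouped by the field at field_index."""
    counts = defaultdict(lambda: [0, 0])  # key -> [approvals, total]
    for rec in records:
        counts[rec[field_index]][0] += int(rec[2])
        counts[rec[field_index]][1] += 1
    return {key: approvals / total for key, (approvals, total) in counts.items()}

# If approval rates differ sharply by protected group, the model may be
# discriminating, possibly through ZIP code acting as a proxy.
print("approval rate by protected group:", approval_rate_by(1, records))
print("approval rate by ZIP code:       ", approval_rate_by(0, records))

# A crude proxy check: how concentrated is each ZIP code in a single group?
# Highly concentrated ZIP codes can leak group membership into the model.
group_counts = defaultdict(lambda: defaultdict(int))
for zip_code, group, _ in records:
    group_counts[zip_code][group] += 1
for zip_code, counts in group_counts.items():
    dominant_share = max(counts.values()) / sum(counts.values())
    print(f"ZIP {zip_code}: {dominant_share:.0%} of records from one group")
```

A check like this only flags a possible problem; deciding what counts as an unacceptable disparity, and how to mitigate it, is a human judgment the algorithm cannot make for you.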