Avrim Blum, Kevin Stangl
License: Creative Commons Attribution 3.0 Unported (CC BY 3.0)
Multiple fairness constraints have been proposed in the literature, motivated by a range of concerns about how demographic groups might be treated unfairly by machine learning classifiers. In this work we consider a different motivation: learning from biased training data. We posit several ways in which training data may be biased, including a noisier or negatively biased labeling process for members of a disadvantaged group, a decreased prevalence of positive or negative examples from that group, or both. Given such biased training data, Empirical Risk Minimization (ERM) may produce a classifier that is not only biased but also has suboptimal accuracy on the true data distribution. We examine the ability of fairness-constrained ERM to correct this problem. In particular, we find that the Equal Opportunity fairness constraint [Hardt et al., 2016] combined with ERM will provably recover the Bayes optimal classifier under a range of bias models. We also consider other recovery methods, including re-weighting the training data, Equalized Odds, Demographic Parity, and Calibration. These theoretical results provide additional motivation for considering fairness interventions even if an actor cares primarily about accuracy.
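For context, the Equal Opportunity constraint of Hardt et al. [2016] referenced in the abstract requires a classifier to have equal true positive rates across groups. A minimal statement, using notation assumed here (A for the group attribute, Y for the true label, and \hat{Y} for the classifier's prediction) rather than taken from the paper itself:

% Equal Opportunity (Hardt et al., 2016): equal true positive rates across groups.
\Pr[\hat{Y} = 1 \mid Y = 1, A = a] \;=\; \Pr[\hat{Y} = 1 \mid Y = 1, A = b] \quad \text{for all groups } a, b.

Roughly, if the training labels for a disadvantaged group are corrupted (e.g., some positive examples mislabeled as negative), unconstrained ERM may learn to under-predict positives for that group; equalizing true positive rates counteracts that distortion, which is the kind of mechanism the recovery results above rely on.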
@InProceedings{blum_et_al:LIPIcs.FORC.2020.3,
author = {Blum, Avrim and Stangl, Kevin},
title = {{Recovering from Biased Data: Can Fairness Constraints Improve Accuracy?}},
booktitle = {1st Symposium on Foundations of Responsible Computing (FORC 2020)},
pages = {3:1--3:20},
series = {Leibniz International Proceedings in Informatics (LIPIcs)},
ISBN = {978-3-95977-142-9},
ISSN = {1868-8969},
year = {2020},
volume = {156},
editor = {Roth, Aaron},
publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
address = {Dagstuhl, Germany},
URL = {https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.FORC.2020.3},
URN = {urn:nbn:de:0030-drops-120192},
doi = {10.4230/LIPIcs.FORC.2020.3},
annote = {Keywords: fairness in machine learning, equal opportunity, bias, machine learning}
}