Abstracting Fairness: Oracles, Metrics, and Interpretability

Dwork, Cynthia; Ilvento, Christina; Rothblum, Guy N.; Sur, Pragya

doi:10.4230/LIPIcs.FORC.2020.8

Abstract

It is well understood that classification algorithms, for example, for deciding on loan applications, cannot be evaluated for fairness without taking context into account. We examine what can be learned from a fairness oracle equipped with an underlying understanding of "true" fairness. The oracle takes as input a (context, classifier) pair satisfying an arbitrary fairness definition, and accepts or rejects the pair according to whether the classifier satisfies the underlying fairness truth. Our principal conceptual result is an extraction procedure that learns the underlying truth; moreover, the procedure can learn an approximation to this truth given access to a weak form of the oracle. Since every "truly fair" classifier induces a coarse metric, in which those receiving the same decision are at distance zero from one another and those receiving different decisions are at distance one, this extraction process provides the basis for ensuring a rough form of metric fairness, also known as individual fairness.
Our principal technical result is a higher fidelity extractor under a mild technical constraint on the weak oracle’s conception of fairness. Our framework permits the scenario in which many classifiers, with differing outcomes, may all be considered fair.
Our results have implications for interpretability, a highly desired but poorly defined property of classification systems that endeavors to permit a human arbiter to reject classifiers deemed to be "unfair" or illegitimately derived.
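
The coarse metric mentioned in the abstract can be illustrated with a minimal sketch. It assumes a deterministic binary classifier and reads metric fairness as the pairwise condition |C(a) - C(b)| <= d(a, b); that reading is an assumption here, since the abstract does not state the formal condition. The names `coarse_metric`, `respects_metric`, and the loan rule `approve` are hypothetical and are not taken from the paper.

```python
from itertools import combinations
from typing import Callable, Iterable, TypeVar

X = TypeVar("X")


def coarse_metric(classifier: Callable[[X], int]) -> Callable[[X, X], int]:
    """Return the 0/1 distance induced by a deterministic binary classifier."""
    def distance(a: X, b: X) -> int:
        # Same decision -> distance 0; different decisions -> distance 1.
        return 0 if classifier(a) == classifier(b) else 1
    return distance


def respects_metric(
    classifier: Callable[[X], int],
    distance: Callable[[X, X], int],
    individuals: Iterable[X],
) -> bool:
    """Check |C(a) - C(b)| <= d(a, b) on every pair (assumed fairness condition)."""
    return all(
        abs(classifier(a) - classifier(b)) <= distance(a, b)
        for a, b in combinations(list(individuals), 2)
    )


# Hypothetical loan rule, used only for illustration.
def approve(score: int) -> int:
    return 1 if score >= 600 else 0


d = coarse_metric(approve)
applicants = [550, 610, 700]
print(d(610, 700))  # both applicants receive the same decision
print(d(550, 610))  # the two applicants receive different decisions
print(respects_metric(approve, d, applicants))
```

Under these assumptions, a classifier always satisfies the pairwise condition with respect to its own coarse metric, while a different classifier that separates two individuals at distance zero violates it.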

