Abstracting Fairness: Oracles, Metrics, and Interpretability

Authors: Cynthia Dwork, Christina Ilvento, Guy N. Rothblum, Pragya Sur

Author Details

Cynthia Dwork
  • Harvard John A Paulson School of Engineering and Applied Sciences, Cambridge, MA, USA
  • Radcliffe Institute for Advanced Study, Cambridge, MA, USA
  • Microsoft Research, Mountain View, CA, USA
Christina Ilvento
  • Harvard John A Paulson School of Engineering and Applied Sciences, Cambridge, MA, USA
Guy N. Rothblum
  • Department of Computer Science and Applied Mathematics, Weizmann Institute of Science, Rehovot, Israel
Pragya Sur
  • Harvard University, Center for Research on Computation and Society, Cambridge, MA, USA


This research was conducted, in part, while the authors were at Microsoft Research, Silicon Valley.

Cite As

Cynthia Dwork, Christina Ilvento, Guy N. Rothblum, and Pragya Sur. Abstracting Fairness: Oracles, Metrics, and Interpretability. In 1st Symposium on Foundations of Responsible Computing (FORC 2020). Leibniz International Proceedings in Informatics (LIPIcs), Volume 156, pp. 8:1-8:16, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2020)


It is well understood that classification algorithms, for example, for deciding on loan applications, cannot be evaluated for fairness without taking context into account. We examine what can be learned from a fairness oracle equipped with an underlying understanding of "true" fairness. The oracle takes as input a (context, classifier) pair satisfying an arbitrary fairness definition, and accepts or rejects the pair according to whether the classifier satisfies the underlying fairness truth. Our principal conceptual result is an extraction procedure that learns the underlying truth; moreover, the procedure can learn an approximation to this truth given access to a weak form of the oracle. Since every "truly fair" classifier induces a coarse metric, in which those receiving the same decision are at distance zero from one another and those receiving different decisions are at distance one, this extraction process provides the basis for ensuring a rough form of metric fairness, also known as individual fairness. Our principal technical result is a higher fidelity extractor under a mild technical constraint on the weak oracle’s conception of fairness. Our framework permits the scenario in which many classifiers, with differing outcomes, may all be considered fair. Our results have implications for interpretability, a highly desired but poorly defined property of classification systems that endeavors to permit a human arbiter to reject classifiers deemed to be "unfair" or illegitimately derived.
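
The coarse metric mentioned in the abstract can be illustrated with a minimal sketch, assuming a deterministic classifier with a small set of possible decisions. The names `coarse_metric` and `approve_loan`, and the score threshold of 600, are hypothetical choices made for illustration; they do not come from the paper, and the sketch shows only the induced metric, not the extraction procedure.

```python
from typing import Callable, Hashable, TypeVar

T = TypeVar("T")


def coarse_metric(classifier: Callable[[T], Hashable]) -> Callable[[T, T], int]:
    """Return the 0/1 distance induced by a classifier.

    Two individuals are at distance 0 if they receive the same decision
    and at distance 1 if they receive different decisions.
    """
    def distance(x: T, y: T) -> int:
        return 0 if classifier(x) == classifier(y) else 1

    return distance


# Hypothetical toy classifier (an assumption, not from the paper):
# approve a loan when the applicant's score is at least 600.
def approve_loan(score: int) -> bool:
    return score >= 600


d = coarse_metric(approve_loan)
same_decision = d(650, 700)       # both approved
different_decision = d(550, 700)  # one rejected, one approved
```

Under these assumptions, `d` is symmetric, assigns distance zero to any individual and itself, and satisfies the triangle inequality, so it behaves as a (pseudo)metric over individuals.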

Subject Classification

ACM Subject Classification
  • Theory of computation → Machine learning theory

Keywords
  • Algorithmic fairness
  • fairness definitions
  • causality-based fairness
  • interpretability
  • individual fairness
  • metric fairness



