Loss Minimization Through the Lens of Outcome Indistinguishability

Authors: Parikshit Gopalan, Lunjia Hu, Michael P. Kim, Omer Reingold, Udi Wieder

  • Filesize: 0.7 MB
  • 20 pages

Author Details

Parikshit Gopalan
  • Apple, Cupertino, CA, USA
Lunjia Hu
  • Stanford University, CA, USA
Michael P. Kim
  • Miller Institute, UC Berkeley, CA, USA
Omer Reingold
  • Stanford University, CA, USA
Udi Wieder
  • VMware Research, Palo Alto, CA, USA


Acknowledgements

We would like to thank Konstantinos Stavropoulos for finding a bug in an earlier version of this paper and suggesting a fix. PG and MPK would like to thank Mihir Singhal and Shengjia Zhao for several discussions while working on [Parikshit Gopalan et al., 2022] which inspired some of this work. PG would like to thank Adam Klivans, Aravind Gollakota, and Konstantinos Stavropoulos for helpful discussions and comments on earlier versions of this paper, and Raghu Meka and Varun Kanade for pointers to the literature.

Cite As

Parikshit Gopalan, Lunjia Hu, Michael P. Kim, Omer Reingold, and Udi Wieder. Loss Minimization Through the Lens of Outcome Indistinguishability. In 14th Innovations in Theoretical Computer Science Conference (ITCS 2023). Leibniz International Proceedings in Informatics (LIPIcs), Volume 251, pp. 60:1-60:20, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2023)


We present a new perspective on loss minimization and the recent notion of omniprediction through the lens of Outcome Indistinguishability. For a collection of losses and a hypothesis class, omniprediction requires that a predictor provide a loss-minimization guarantee simultaneously for every loss in the collection, compared to the best (loss-specific) hypothesis in the class. We present a generic template to learn predictors satisfying a guarantee we call Loss Outcome Indistinguishability (Loss OI). For a set of statistical tests, based on a collection of losses and a hypothesis class, a predictor is Loss OI if it is indistinguishable (according to the tests) from Nature's true probabilities over outcomes. By design, Loss OI implies omniprediction in a direct and intuitive manner. We simplify Loss OI further, decomposing it into a calibration condition plus multiaccuracy for a class of functions derived from the loss and hypothesis classes. By careful analysis of this class, we give efficient constructions of omnipredictors for interesting classes of loss functions, including non-convex losses. This decomposition highlights the utility of a new multi-group fairness notion that we call calibrated multiaccuracy, which lies between multiaccuracy and multicalibration. We show that calibrated multiaccuracy implies Loss OI for the important set of convex losses arising from Generalized Linear Models, without requiring full multicalibration. For such losses, we show an equivalence between our computational notion of Loss OI and a geometric notion of indistinguishability, formulated as Pythagorean theorems in the associated Bregman divergence. We give an efficient algorithm for calibrated multiaccuracy with computational complexity comparable to that of multiaccuracy. In all, calibrated multiaccuracy offers an interesting tradeoff point between efficiency and generality in the omniprediction landscape.
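To make the two conditions in the decomposition concrete, the following sketch checks them empirically on synthetic data. It is illustrative only, not from the paper: the data distribution, the deliberately miscalibrated predictor, and the small class of test functions are all assumptions chosen for the example. Multiaccuracy asks that residuals y - p(x) be nearly uncorrelated with every test function in a class C; calibration asks that, on each level set of the predictor, outcomes average to the predicted value.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: features x, Nature's true probability p*(x), binary outcomes y.
n = 100_000
x = rng.normal(size=(n, 2))
p_star = 1 / (1 + np.exp(-(x[:, 0] - 0.5 * x[:, 1])))
y = rng.random(n) < p_star

# A candidate predictor (slightly shrunk toward 1/2, so it is miscalibrated),
# rounded to a coarse grid so calibration can be checked level set by level set.
p = np.clip(np.round((0.9 * p_star + 0.05) * 20) / 20, 0.0, 1.0)

# Multiaccuracy w.r.t. a small class C: require |E[c(x) * (y - p(x))]| <= alpha.
tests = {"c1(x)=x0": x[:, 0], "c2(x)=x1": x[:, 1], "c3(x)=1": np.ones(n)}
for name, c in tests.items():
    corr = np.mean(c * (y - p))
    print(f"{name}: correlation with residual = {corr:+.4f}")

# Calibration: on each level set {x : p(x) = v}, outcomes should average to ~v.
for v in np.unique(p):
    mask = p == v
    print(f"p(x)={v:.2f}: E[y | p(x)=v] = {y[mask].mean():.3f} (n={mask.sum()})")
```

Running the sketch shows the shrunk predictor passing the constant-test multiaccuracy check (residuals average to roughly zero overall) while failing calibration on extreme level sets, which is exactly the gap between multiaccuracy and calibrated multiaccuracy that the paper exploits.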

Subject Classification

ACM Subject Classification
  • Theory of computation → Theory and algorithms for application domains

Keywords
  • Loss Minimization
  • Indistinguishability




  1. Alan Agresti. Foundations of Linear and Generalized Linear Models. Wiley, 2015.
  2. Peter Auer, Mark Herbster, and Manfred K. Warmuth. Exponentially many local minima for single neurons. In Advances in Neural Information Processing Systems 8, NIPS, Denver, CO, USA, November 27-30, 1995, pages 316-322. MIT Press, 1995.
  3. Shai Ben-David, Philip M. Long, and Yishay Mansour. Agnostic boosting. In 14th Annual Conference on Computational Learning Theory, COLT, 2001.
  4. Stephen Boyd and Lieven Vandenberghe. Convex Optimization. Cambridge University Press, 2004.
  5. Thomas M. Cover and Joy A. Thomas. Elements of Information Theory (2nd ed.). Wiley, 2006. URL: http://www.elementsofinformationtheory.com/.
  6. Cynthia Dwork, Michael P. Kim, Omer Reingold, Guy N. Rothblum, and Gal Yona. Outcome indistinguishability. In ACM Symposium on Theory of Computing (STOC'21), 2021. URL: http://arxiv.org/abs/2011.13426.
  7. Dean P. Foster and Rakesh V. Vohra. Asymptotic calibration. Biometrika, 85(2):379-390, 1998.
  8. Surbhi Goel, Varun Kanade, Adam R. Klivans, and Justin Thaler. Reliably learning the ReLU in polynomial time. In Proceedings of the 30th Conference on Learning Theory, COLT 2017, volume 65 of Proceedings of Machine Learning Research, pages 1004-1042. PMLR, 2017. URL: http://proceedings.mlr.press/v65/goel17a.html.
  9. Parikshit Gopalan, Lunjia Hu, Michael P. Kim, Omer Reingold, and Udi Wieder. Loss minimization through the lens of outcome indistinguishability. arXiv preprint arXiv:2210.08649, 2022.
  10. Parikshit Gopalan, Adam Tauman Kalai, Omer Reingold, Vatsal Sharan, and Udi Wieder. Omnipredictors. In Innovations in Theoretical Computer Science (ITCS 2022), 2022. URL: http://arxiv.org/abs/2109.05389.
  11. Parikshit Gopalan, Michael P. Kim, Mihir Singhal, and Shengjia Zhao. Low-degree multicalibration. In Conference on Learning Theory, 2-5 July 2022, London, UK, volume 178 of Proceedings of Machine Learning Research, pages 3193-3234. PMLR, 2022.
  12. Parikshit Gopalan, Omer Reingold, Vatsal Sharan, and Udi Wieder. Multicalibrated partitions for importance weights. In International Conference on Algorithmic Learning Theory, 29 March - 1 April 2022, Paris, France, volume 167 of Proceedings of Machine Learning Research, pages 408-435. PMLR, 2022.
  13. Moritz Hardt and Benjamin Recht. Patterns, Predictions, and Actions: A Story About Machine Learning. arXiv preprint, 2021. URL: http://arxiv.org/abs/2102.05242.
  14. Úrsula Hébert-Johnson, Michael P. Kim, Omer Reingold, and Guy N. Rothblum. Multicalibration: Calibration for the (computationally-identifiable) masses. In Proceedings of the 35th International Conference on Machine Learning, ICML, 2018.
  15. Sham M. Kakade, Adam Kalai, Varun Kanade, and Ohad Shamir. Efficient learning of generalized linear and single index models with isotonic regression. In 25th Annual Conference on Neural Information Processing Systems 2011, pages 927-935, 2011.
  16. Adam Kalai. Learning monotonic linear functions. In Learning Theory, 17th Annual Conference on Learning Theory, COLT 2004, volume 3120 of Lecture Notes in Computer Science, pages 487-501. Springer, 2004. URL: https://doi.org/10.1007/978-3-540-27819-1_34.
  17. Adam Tauman Kalai, Yishay Mansour, and Elad Verbin. On agnostic boosting and parity learning. In Proceedings of the 40th Annual ACM Symposium on Theory of Computing, pages 629-638. ACM, 2008. URL: https://doi.org/10.1145/1374376.1374466.
  18. Adam Tauman Kalai and Ravi Sastry. The Isotron algorithm: High-dimensional isotonic regression. In COLT 2009 - The 22nd Conference on Learning Theory, Montreal, Quebec, Canada, June 18-21, 2009, 2009.
  19. Adam Tauman Kalai and Rocco A. Servedio. Boosting in the presence of noise. Journal of Computer and System Sciences, 71(3):266-290, 2005.
  20. Varun Kanade. Computational Learning Theory: Learning Real-Valued Functions. Lecture notes, Michaelmas Term 2018. URL: https://www.cs.ox.ac.uk/people/varun.kanade/teaching/CLT-MT2018/.
  21. Michael J. Kearns and Umesh Vazirani. An Introduction to Computational Learning Theory. MIT Press, 1994.
  22. Michael P. Kim, Amirata Ghorbani, and James Zou. Multiaccuracy: Black-box post-processing for fairness in classification. In Proceedings of the 2019 AAAI/ACM Conference on AI, Ethics, and Society, pages 247-254, 2019.
  23. Michael P. Kim and Juan C. Perdomo. Making decisions under outcome performativity. arXiv preprint arXiv:2210.01745, 2022.
  24. Yishay Mansour and David McAllester. Boosting using branching programs. Journal of Computer and System Sciences, 64(1):103-112, 2002.
  25. Hamed Masnadi-Shirazi and Nuno Vasconcelos. On the design of loss functions for classification: theory, robustness to outliers, and SavageBoost. Advances in Neural Information Processing Systems, 21, 2008.
  26. P. McCullagh and J. A. Nelder. Generalized Linear Models (2nd ed.). Chapman and Hall, 1989.
  27. Frank Nielsen. An elementary introduction to information geometry. CoRR, abs/1808.08271, 2018. URL: http://arxiv.org/abs/1808.08271.
  28. Philippe Rigollet. Statistics for Applications, Lecture Notes. Lecture 10: Generalized Linear Models, Fall 2016.
  29. Guy N. Rothblum and Gal Yona. Multi-group agnostic PAC learnability. In International Conference on Machine Learning, pages 9107-9115. PMLR, 2021.
  30. Shai Shalev-Shwartz and Shai Ben-David. Understanding Machine Learning: From Theory to Algorithms. Cambridge University Press, 2014.
  31. Shai Shalev-Shwartz et al. Online learning and online convex optimization. Foundations and Trends® in Machine Learning, 4(2):107-194, 2012.
  32. Shai Shalev-Shwartz, Ohad Shamir, and Karthik Sridharan. Learning kernel-based halfspaces with the 0-1 loss. SIAM J. Comput., 40(6):1623-1646, 2011. URL: https://doi.org/10.1137/100806126.
  33. Robert Tibshirani. Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society: Series B (Methodological), 58(1):267-288, 1996.
  34. Bianca Zadrozny and Charles Elkan. Obtaining calibrated probability estimates from decision trees and naive Bayesian classifiers. In Proceedings of the Eighteenth International Conference on Machine Learning (ICML 2001), pages 609-616. Morgan Kaufmann, 2001.