Authors Parikshit Gopalan , Adam Tauman Kalai, Omer Reingold, Vatsal Sharan, Udi Wieder

Thumbnail PDF


  • Filesize: 0.9 MB
  • 21 pages

Document Identifiers

Author Details

Parikshit Gopalan
  • VMware Research, Palo Alto, California, USA
Adam Tauman Kalai
  • Microsoft Research, Boston, MA, USA
Omer Reingold
  • Stanford University, CA, USA
Vatsal Sharan
  • University of Southern California, Los Angeles, CA, USA
Udi Wieder
  • VMware Research, Palo Alto, California, USA

Cite AsGet BibTex

Parikshit Gopalan, Adam Tauman Kalai, Omer Reingold, Vatsal Sharan, and Udi Wieder. Omnipredictors. In 13th Innovations in Theoretical Computer Science Conference (ITCS 2022). Leibniz International Proceedings in Informatics (LIPIcs), Volume 215, pp. 79:1-79:21, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2022)


Loss minimization is a dominant paradigm in machine learning, where a predictor is trained to minimize some loss function that depends on an uncertain event (e.g., "will it rain tomorrow?"). Different loss functions imply different learning algorithms and, at times, very different predictors. While widespread and appealing, a clear drawback of this approach is that the loss function may not be known at the time of learning, requiring the algorithm to use a best-guess loss function. Alternatively, the same classifier may be used to inform multiple decisions, which correspond to multiple loss functions, requiring multiple learning algorithms to be run on the same data. We suggest a rigorous new paradigm for loss minimization in machine learning where the loss function can be ignored at the time of learning and only be taken into account when deciding an action. We introduce the notion of an (L,𝒞)-omnipredictor, which could be used to optimize any loss in a family L. Once the loss function is set, the outputs of the predictor can be post-processed (a simple univariate data-independent transformation of individual predictions) to do well compared with any hypothesis from the class C. The post processing is essentially what one would perform if the outputs of the predictor were true probabilities of the uncertain events. In a sense, omnipredictors extract all the predictive power from the class 𝒞, irrespective of the loss function in L. We show that such "loss-oblivious" learning is feasible through a connection to multicalibration, a notion introduced in the context of algorithmic fairness. A multicalibrated predictor doesn’t aim to minimize some loss function, but rather to make calibrated predictions, even when conditioned on inputs lying in certain sets c belonging to a family 𝒞 which is weakly learnable. We show that a 𝒞-multicalibrated predictor is also an (L,𝒞)-omnipredictor, where L contains all convex loss functions with some mild Lipschitz conditions. The predictors are even omnipredictors with respect to sparse linear combinations of functions in 𝒞. As a corollary, we deduce that distribution-specific weak agnostic learning is complete for a large class of loss minimization tasks. In addition, we show how multicalibration can be viewed as a solution concept for agnostic boosting, shedding new light on past results. Finally, we transfer our insights back to the context of algorithmic fairness by providing omnipredictors for multi-group loss minimization.

Subject Classification

ACM Subject Classification
  • Computing methodologies → Machine learning approaches
  • Loss-minimzation
  • multi-group fairness
  • agnostic learning
  • boosting


  • Access Statistics
  • Total Accesses (updated on a weekly basis)
    PDF Downloads


  1. N Barda, G Yona, GN Rothblum, P Greenland, M Leibowitz, R Balicer, E Bachmat, and N. Dagan. Addressing bias in prediction models by improving subpopulation calibration. J Am Med Inform Assoc., 28(3):549-558, 2021. Google Scholar
  2. Noam Barda, Dan Riesel, Amichay Akriv, Joseph Levy, Uriah Finkel, Gal Yona, Daniel Greenfeld, Shimon Sheiba, Jonathan Somer, Eitan Bachmat, Guy N. Rothblum, Uri Shalit, Doron Netzer, Ran Balicer, and Noa Dagan. Developing a COVID-19 mortality risk prediction model when individual-level data are not available. Nat Commun, 11, 2020. Google Scholar
  3. Peter L Bartlett, Michael I Jordan, and Jon D McAuliffe. Convexity, classification, and risk bounds. Journal of the American Statistical Association, 101(473):138-156, 2006. Google Scholar
  4. Shai Ben-David, Philip M. Long, and Yishay Mansour. Agnostic boosting. In 14th Annual Conference on Computational Learning Theory, COLT, 2001. Google Scholar
  5. Avrim Blum and Thodoris Lykouris. Advancing Subgroup Fairness via Sleeping Experts. In 11th Innovations in Theoretical Computer Science Conference (ITCS 2020), volume 151, pages 55:1-55:24, 2020. Google Scholar
  6. Stephen Boyd and Lieven Vandenberghe. Convex Optimization. Cambridge University Press, 2004. Google Scholar
  7. A. P. Dawid. Objective probability forecasts. University College London, Dept. of Statistical Science. Research Report 14, 1982. Google Scholar
  8. Krzysztof Dembczyński, Wojciech Kotłowski, Oluwasanmi Koyejo, and Nagarajan Natarajan. Consistency analysis for binary classification revisited. In Doina Precup and Yee Whye Teh, editors, Proceedings of the 34th International Conference on Machine Learning, volume 70 of Proceedings of Machine Learning Research, pages 961-969. PMLR, 06-11 Aug 2017. URL:
  9. Cynthia Dwork, Michael P. Kim, Omer Reingold, Guy N. Rothblum, and Gal Yona. Outcome indistinguishability. In ACM Symposium on Theory of Computing (STOC'21), 2021. URL:
  10. Vitaly Feldman. Distribution-specific agnostic boosting. arXiv preprint arXiv:0909.2927, 2009. Google Scholar
  11. Yoav Freund and Robert E Schapire. A decision-theoretic generalization of on-line learning and an application to boosting. Journal of computer and system sciences, 55(1):119-139, 1997. Google Scholar
  12. Sumegha Garg, Michael P. Kim, and Omer Reingold. Tracking and improving information in the service of fairness. In Proceedings of the 2019 ACM Conference on Economics and Computation, EC 2019, Phoenix, AZ, USA, June 24-28, 2019, pages 809-824, 2019. Google Scholar
  13. Parikshit Gopalan, Adam Tauman Kalai, Omer Reingold, Vatsal Sharan, and Udi Wieder. Omnipredictors, 2021. URL:
  14. Parikshit Gopalan, Omer Reingold, Vatsal Sharan, and Udi Wieder. Multicalibrated partitions for importance weights. arXiv preprint arXiv:2103.05853, 2021. Google Scholar
  15. Bruce E Hansen. Autoregressive conditional density estimation. International Economic Review, pages 705-730, 1994. Google Scholar
  16. Úrsula Hébert-Johnson, Michael P. Kim, Omer Reingold, and Guy N. Rothblum. Multicalibration: Calibration for the (computationally-identifiable) masses. In Proceedings of the 35th International Conference on Machine Learning, ICML, 2018. Google Scholar
  17. Christopher Jung, Changhwa Lee, Mallesh M. Pai, Aaron Roth, and Rakesh Vohra. Moment multicalibration for uncertainty estimation. CoRR, abs/2008.08037, 2020. URL:
  18. Adam Kalai. Learning monotonic linear functions. In John Shawe-Taylor and Yoram Singer, editors, Learning Theory, 17th Annual Conference on Learning Theory, COLT 2004, volume 3120 of Lecture Notes in Computer Science, pages 487-501. Springer, 2004. URL:
  19. Adam Kalai and Varun Kanade. Potential-based agnostic boosting. In Y. Bengio, D. Schuurmans, J. Lafferty, C. Williams, and A. Culotta, editors, Advances in Neural Information Processing Systems, volume 22. Curran Associates, Inc., 2009. URL:
  20. Adam Tauman Kalai. Learning nested halfspaces and uphill decision trees. In Learning Theory, 20th Annual Conference on Learning Theory, COLT 2007, volume 4539 of Lecture Notes in Computer Science, pages 378-392. Springer, 2007. URL:
  21. Adam Tauman Kalai, Adam R. Klivans, Yishay Mansour, and Rocco A. Servedio. Agnostically learning halfspaces. SIAM J. Comput., 37(6):1777-1805, 2008. URL:
  22. Adam Tauman Kalai, Yishay Mansour, and Elad Verbin. On agnostic boosting and parity learning. In Proceedings of the 40th Annual ACM Symposium on Theory of Computing, pages 629-638. ACM, 2008. URL:
  23. Adam Tauman Kalai and Rocco A Servedio. Boosting in the presence of noise. Journal of Computer and System Sciences, 71(3):266-290, 2005. Google Scholar
  24. Michael Kearns, Seth Neel, Aaron Roth, and Zhiwei Steven Wu. Preventing fairness gerrymandering: Auditing and learning for subgroup fairness. In International Conference on Machine Learning, pages 2564-2572, 2018. Google Scholar
  25. Michael J. Kearns and Yishay Mansour. On the boosting ability of top-down decision tree learning algorithms. J. Comput. Syst. Sci., 58(1):109-128, 1999. URL:
  26. Michael P. Kim. A complexity-theoretic perspective on fairness. PhD thesis, Stanford University, 2020. Google Scholar
  27. Michael P. Kim, Amirata Ghorbani, and James Zou. Multiaccuracy: Black-box post-processing for fairness in classification. In Proceedings of the 2019 AAAI/ACM Conference on AI, Ethics, and Society, pages 247-254, 2019. Google Scholar
  28. Jon M. Kleinberg, Sendhil Mullainathan, and Manish Raghavan. Inherent trade-offs in the fair determination of risk scores. In 8th Innovations in Theoretical Computer Science Conference, ITCS, 2017. Google Scholar
  29. Gábor Lugosi, Nicolas Vayatis, et al. On the bayes-risk consistency of regularized boosting methods. The Annals of statistics, 32(1):30-55, 2004. Google Scholar
  30. Yishay Mansour and David McAllester. Boosting using branching programs. Journal of Computer and System Sciences, 64(1):103-112, 2002. Google Scholar
  31. Nagarajan Natarajan, Oluwasanmi Koyejo, Pradeep Ravikumar, and Inderjit S. Dhillon. Optimal decision-theoretic classification using non-decomposable performance metrics. CoRR, abs/1505.01802, 2015. URL:
  32. John C. Platt. Probabilistic outputs for support vector machines and comparisons to regularized likelihood methods. In ADVANCES IN LARGE MARGIN CLASSIFIERS, pages 61-74. MIT Press, 1999. Google Scholar
  33. Guy N. Rothblum and Gal Yona. Multi-group agnostic PAC learnability. CoRR, abs/2105.09989, 2021. URL:
  34. Robert E. Schapire and Yoav Freund. Boosting: Foundations and Algorithms. MIT Press, 2012. Google Scholar
  35. Mark J. Schervish. A general method for comparing probability assessors. The Annals of Statistics, 17(4):1856-1879, 1989. Google Scholar
  36. Clayton Scott. Calibrated asymmetric surrogate losses. Electronic Journal of Statistics, 6:958-992, 2012. Google Scholar
  37. Shai Shalev-Shwartz and Shai Ben-David. Understanding Machine Learning - From Theory to Algorithms. Cambridge University Press, 2014. Google Scholar
  38. Ingo Steinwart. Consistency of support vector machines and other regularized kernel classifiers. IEEE transactions on information theory, 51(1):128-142, 2005. Google Scholar
  39. Ambuj Tewari and Peter L Bartlett. On the consistency of multiclass classification methods. Journal of Machine Learning Research, 8(5), 2007. Google Scholar
  40. Bianca Zadrozny and Charles Elkan. Obtaining calibrated probability estimates from decision trees and naive bayesian classifiers. In Proceedings of the Eighteenth International Conference on Machine Learning (ICML 2001), pages 609-616. Morgan Kaufmann, 2001. Google Scholar