Omnipredictors

Authors: Parikshit Gopalan, Adam Tauman Kalai, Omer Reingold, Vatsal Sharan, Udi Wieder



PDF

  • LIPIcs.ITCS.2022.79.pdf (0.9 MB, 21 pages)

Document Identifiers

  • DOI: 10.4230/LIPIcs.ITCS.2022.79

Author Details

Parikshit Gopalan
  • VMware Research, Palo Alto, California, USA
Adam Tauman Kalai
  • Microsoft Research, Boston, MA, USA
Omer Reingold
  • Stanford University, CA, USA
Vatsal Sharan
  • University of Southern California, Los Angeles, CA, USA
Udi Wieder
  • VMware Research, Palo Alto, California, USA

Cite As

Parikshit Gopalan, Adam Tauman Kalai, Omer Reingold, Vatsal Sharan, and Udi Wieder. Omnipredictors. In 13th Innovations in Theoretical Computer Science Conference (ITCS 2022). Leibniz International Proceedings in Informatics (LIPIcs), Volume 215, pp. 79:1-79:21, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2022). https://doi.org/10.4230/LIPIcs.ITCS.2022.79

Abstract

Loss minimization is a dominant paradigm in machine learning, where a predictor is trained to minimize some loss function that depends on an uncertain event (e.g., "will it rain tomorrow?"). Different loss functions imply different learning algorithms and, at times, very different predictors. While this approach is widespread and appealing, a clear drawback is that the loss function may not be known at the time of learning, requiring the algorithm to use a best-guess loss function. Alternatively, the same classifier may be used to inform multiple decisions, which correspond to multiple loss functions, requiring multiple learning algorithms to be run on the same data. We suggest a rigorous new paradigm for loss minimization in machine learning where the loss function can be ignored at the time of learning and only taken into account when deciding an action.
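In symbols (our notation, not verbatim from the paper): given a distribution 𝒟 over examples (x, y), a hypothesis class ℋ, and a loss ℓ fixed in advance, the standard paradigm solves

    $$ \hat{h} = \arg\min_{h \in \mathcal{H}} \ \mathbb{E}_{(x,y) \sim \mathcal{D}} \big[ \ell(h(x), y) \big] $$

so changing ℓ means training again from scratch on the same data.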
We introduce the notion of an (L,𝒞)-omnipredictor, which can be used to optimize any loss in a family L. Once the loss function is set, the outputs of the predictor can be post-processed (a simple, univariate, data-independent transformation of individual predictions) to do well compared with any hypothesis from the class 𝒞. The post-processing is essentially what one would perform if the outputs of the predictor were true probabilities of the uncertain events. In a sense, omnipredictors extract all the predictive power from the class 𝒞, irrespective of the loss function in L.
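To make the post-processing step concrete, here is a minimal Python sketch (our own illustration, assuming binary outcomes y ∈ {0,1}; the function name, the action grid, and the example losses are ours, not from the paper). Treating a prediction p as if it were the true probability that y = 1, it selects the action with the smallest expected loss under that belief:

    import numpy as np

    def bayes_optimal_action(p, loss, actions):
        """Action minimizing expected loss if p were the true P(y = 1).

        p       : predicted probability that the outcome y equals 1
        loss    : function loss(action, y) -> float
        actions : array of candidate actions (here, a grid over [0, 1])
        """
        # Expected loss of each candidate action under y ~ Bernoulli(p).
        expected = np.array([p * loss(a, 1) + (1 - p) * loss(a, 0) for a in actions])
        return actions[int(np.argmin(expected))]

    grid = np.linspace(0, 1, 1001)
    sq = lambda a, y: (a - y) ** 2   # squared loss: optimal action is p itself
    ab = lambda a, y: abs(a - y)     # absolute loss: optimal action thresholds at p = 1/2
    print(bayes_optimal_action(0.7, sq, grid))  # ≈ 0.7
    print(bayes_optimal_action(0.7, ab, grid))  # ≈ 1.0

The omnipredictor guarantee is that applying this loss-specific transformation to a single fixed predictor already competes with the best hypothesis in 𝒞, simultaneously for every loss in L.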
We show that such "loss-oblivious" learning is feasible through a connection to multicalibration, a notion introduced in the context of algorithmic fairness. A multicalibrated predictor does not aim to minimize some loss function; rather, it makes calibrated predictions, even when conditioned on inputs lying in certain sets c belonging to a weakly learnable family 𝒞. We show that a 𝒞-multicalibrated predictor is also an (L,𝒞)-omnipredictor, where L contains all convex loss functions satisfying some mild Lipschitz conditions. The predictors are even omnipredictors with respect to sparse linear combinations of functions in 𝒞. As a corollary, we deduce that distribution-specific weak agnostic learning is complete for a large class of loss minimization tasks.
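For intuition, the multicalibration condition can be sketched as follows (our paraphrase; the paper's formal definition may differ in its quantitative details): a predictor p̃ is 𝒞-multicalibrated if for every set c ∈ 𝒞 and every prediction value v,

    $$ \mathbb{E}\big[\, y \mid \tilde{p}(x) = v,\ x \in c \,\big] \approx v, $$

that is, the predictions are calibrated not only on average but also conditioned on membership in each set that 𝒞 can identify.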
In addition, we show how multicalibration can be viewed as a solution concept for agnostic boosting, shedding new light on past results. Finally, we transfer our insights back to the context of algorithmic fairness by providing omnipredictors for multi-group loss minimization.

Subject Classification

ACM Subject Classification
  • Computing methodologies → Machine learning approaches
Keywords
  • Loss minimization
  • multi-group fairness
  • agnostic learning
  • boosting
