Loss Minimization Yields Multicalibration for Large Neural Networks

Authors Jarosław Błasiok, Parikshit Gopalan, Lunjia Hu, Adam Tauman Kalai, Preetum Nakkiran




Author Details

Jarosław Błasiok
  • ETH Zürich, Switzerland
Parikshit Gopalan
  • Apple, Palo Alto, CA, USA
Lunjia Hu
  • Stanford University, CA, USA
Adam Tauman Kalai
  • Microsoft Research, Cambridge, MA, USA
Preetum Nakkiran
  • Apple, Palo Alto, CA, USA

Acknowledgements

We thank Ryan O'Donnell and Rocco Servedio for suggesting the current proof of Lemma 14. We also thank Barry-John Theobald for comments on an early draft.

Cite As

Jarosław Błasiok, Parikshit Gopalan, Lunjia Hu, Adam Tauman Kalai, and Preetum Nakkiran. Loss Minimization Yields Multicalibration for Large Neural Networks. In 15th Innovations in Theoretical Computer Science Conference (ITCS 2024). Leibniz International Proceedings in Informatics (LIPIcs), Volume 287, pp. 17:1-17:21, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024)
https://doi.org/10.4230/LIPIcs.ITCS.2024.17

Abstract

Multicalibration is a notion of fairness for predictors that requires them to provide calibrated predictions across a large set of protected groups. Multicalibration is known to be a distinct goal from loss minimization, even for simple predictors such as linear functions. In this work, we consider the setting where the protected groups can be represented by neural networks of size k, and the predictors are neural networks of size n > k. We show that minimizing the squared loss over all neural nets of size n implies multicalibration for all but a bounded number of unlucky values of n. We also give evidence that our bound on the number of unlucky values is tight, given our proof technique. Previously, results showing that loss minimization yields multicalibration were known only for predictors that were near the ground truth, and hence were rather limited in applicability. Unlike these, our results rely on the expressivity of neural nets and utilize the representation of the predictor.
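
To make the difference between overall calibration and multicalibration concrete, here is a minimal Python sketch. It is not taken from the paper: it uses a common binned formulation with explicit membership lists as groups, whereas the paper considers groups computed by neural networks of size k. The function name `multicalibration_error`, the binning scheme, and the toy data are all assumptions made for illustration.

```python
from typing import Sequence


def multicalibration_error(
    preds: Sequence[float],
    labels: Sequence[int],
    groups: Sequence[Sequence[int]],
    n_bins: int = 10,
) -> float:
    """Worst binned calibration residual over all groups (illustrative only).

    For each group (a 0/1 membership list) and each prediction bin, sum the
    residuals (label - prediction) over group members whose prediction falls
    in that bin, normalise by the total sample size, and return the largest
    absolute value found.
    """
    n = len(preds)
    worst = 0.0
    for members in groups:
        residual = [0.0] * n_bins
        for p, y, m in zip(preds, labels, members):
            if m:
                # Clamp so that a prediction of exactly 1.0 lands in the last bin.
                b = min(int(p * n_bins), n_bins - 1)
                residual[b] += y - p
        worst = max(worst, max(abs(r) for r in residual) / n)
    return worst


# Hypothetical toy data: two subgroups with opposite outcomes.
labels = [1, 1, 0, 0]
everyone = [1, 1, 1, 1]
group_a = [1, 1, 0, 0]
group_b = [0, 0, 1, 1]

constant_preds = [0.5, 0.5, 0.5, 0.5]     # ignores group membership
group_aware_preds = [1.0, 1.0, 0.0, 0.0]  # matches each subgroup's outcome rate

err_overall = multicalibration_error(constant_preds, labels, [everyone])
err_groups = multicalibration_error(constant_preds, labels, [group_a, group_b])
err_aware = multicalibration_error(
    group_aware_preds, labels, [everyone, group_a, group_b]
)
```

In this toy example the constant predictor is calibrated on the whole population (`err_overall` is 0) but miscalibrated on each subgroup (`err_groups` is 0.25), while the group-aware predictor has zero error on every group.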

Subject Classification

ACM Subject Classification
  • Theory of computation → Machine learning theory
  • Computing methodologies → Neural networks
Keywords
  • Multi-group fairness
  • loss minimization
  • neural networks

