Comparative Learning: A Sample Complexity Theory for Two Hypothesis Classes

Authors: Lunjia Hu, Charlotte Peale

Author Details

Lunjia Hu
  • Computer Science Department, Stanford University, CA, USA
Charlotte Peale
  • Computer Science Department, Stanford University, CA, USA

Cite As

Lunjia Hu and Charlotte Peale. Comparative Learning: A Sample Complexity Theory for Two Hypothesis Classes. In 14th Innovations in Theoretical Computer Science Conference (ITCS 2023). Leibniz International Proceedings in Informatics (LIPIcs), Volume 251, pp. 72:1-72:30, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2023)
https://doi.org/10.4230/LIPIcs.ITCS.2023.72

Abstract

In many learning theory problems, a central role is played by a hypothesis class: we might assume that the data is labeled according to a hypothesis in the class (usually referred to as the realizable setting), or we might evaluate the learned model by comparing it with the best hypothesis in the class (the agnostic setting). Taking a step beyond these classic setups that involve only a single hypothesis class, we study a variety of problems that involve two hypothesis classes simultaneously. We introduce comparative learning as a combination of the realizable and agnostic settings in PAC learning: given two binary hypothesis classes S and B, we assume that the data is labeled according to a hypothesis in the source class S and require the learned model to achieve an accuracy comparable to the best hypothesis in the benchmark class B. Even when both S and B have infinite VC dimensions, comparative learning can still have a small sample complexity. We show that the sample complexity of comparative learning is characterized by the mutual VC dimension VC(S,B), which we define to be the maximum size of a subset shattered by both S and B. We also show a similar result in the online setting, where we give a regret characterization in terms of the analogous mutual Littlestone dimension Ldim(S,B). These results also hold for partial hypotheses. We additionally show that the insights necessary to characterize the sample complexity of comparative learning can be applied to other tasks involving two hypothesis classes. In particular, we characterize the sample complexity of realizable multiaccuracy and multicalibration using the mutual fat-shattering dimension, an analogue of the mutual VC dimension for real-valued hypotheses. This not only solves an open problem proposed by Hu, Peale, and Reingold (2022), but also leads to independently interesting results extending classic ones about regression, boosting, and covering numbers to our two-hypothesis-class setting.
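
To make the definition of the mutual VC dimension concrete, here is a minimal brute-force sketch for finite classes over a finite domain. It is an illustration only and not taken from the paper: the representation of a hypothesis as a dict from points to labels, and the names `shatters`, `mutual_vc_dimension`, and `all_functions_on`, are assumptions made for this example.

```python
from itertools import combinations, product

def shatters(hypotheses, points):
    """True if the class realizes all 2^|points| labelings of `points`."""
    patterns = {tuple(h[x] for x in points) for h in hypotheses}
    return len(patterns) == 2 ** len(points)

def mutual_vc_dimension(source, benchmark, domain):
    """Largest size of a subset of `domain` shattered by both classes."""
    best = 0
    for k in range(1, len(domain) + 1):
        if any(shatters(source, pts) and shatters(benchmark, pts)
               for pts in combinations(domain, k)):
            best = k
        else:
            # Every subset of a mutually shattered set is mutually shattered,
            # so if no set of size k works, no larger set can work either.
            break
    return best

def all_functions_on(support, domain):
    """Hypothetical helper: all binary functions free on `support`, 0 elsewhere."""
    out = []
    for bits in product([0, 1], repeat=len(support)):
        h = {x: 0 for x in domain}
        h.update(zip(support, bits))
        out.append(h)
    return out

domain = [0, 1, 2, 3]
S = all_functions_on([0, 1], domain)     # shatters {0, 1}
B = all_functions_on([1, 2, 3], domain)  # shatters {1, 2, 3}

print(mutual_vc_dimension(S, S, domain))  # VC(S) = 2
print(mutual_vc_dimension(B, B, domain))  # VC(B) = 3
print(mutual_vc_dimension(S, B, domain))  # VC(S,B) = 1
```

In this toy example the only nonempty set shattered by both classes is {1}, so the mutual VC dimension is 1 even though the individual VC dimensions are 2 and 3. The same gap is what allows VC(S,B) to stay finite when both classes have infinite VC dimension.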

Subject Classification

ACM Subject Classification
  • Theory of computation → Machine learning theory
  • Theory of computation → Sample complexity and generalization bounds
  • Computing methodologies → Learning settings
Keywords
  • Comparative learning
  • mutual VC dimension
  • realizable multiaccuracy and multicalibration
  • sample complexity

References

  1. Noga Alon, Omri Ben-Eliezer, Yuval Dagan, Shay Moran, Moni Naor, and Eylon Yogev. Adversarial laws of large numbers and optimal regret in online classification. In STOC '21 - Proceedings of the 53rd Annual ACM SIGACT Symposium on Theory of Computing, pages 447-455. ACM, New York, 2021. URL: https://doi.org/10.1145/3406325.3451041.
  2. Noga Alon, Nicolò Cesa-Bianchi, Shai Ben-David, and David Haussler. Scale-sensitive dimensions, uniform convergence, and learnability. In 34th Annual Symposium on Foundations of Computer Science (Palo Alto, CA, 1993), pages 292-301. IEEE Comput. Soc. Press, Los Alamitos, CA, 1993. URL: https://doi.org/10.1109/SFCS.1993.366858.
  3. Noga Alon, Steve Hanneke, Ron Holzman, and Shay Moran. A theory of PAC learnability of partial concept classes. In 2021 IEEE 62nd Annual Symposium on Foundations of Computer Science - FOCS 2021, pages 658-671. IEEE Computer Soc., Los Alamitos, CA, 2022. URL: https://doi.org/10.1109/FOCS52979.2021.00070.
  4. Noga Alon, Roi Livni, Maryanthe Malliaris, and Shay Moran. Private PAC learning implies finite Littlestone dimension. In STOC'19 - Proceedings of the 51st Annual ACM SIGACT Symposium on Theory of Computing, pages 852-860. ACM, New York, 2019. URL: https://doi.org/10.1145/3313276.3316312.
  5. S. Artstein, V. Milman, S. Szarek, and N. Tomczak-Jaegermann. On convexified packing and entropy duality. Geom. Funct. Anal., 14(5):1134-1141, 2004. URL: https://doi.org/10.1007/s00039-004-0486-3.
  6. S. Artstein, V. Milman, and S. J. Szarek. Duality of metric entropy. Ann. of Math. (2), 159(3):1313-1328, 2004. URL: https://doi.org/10.4007/annals.2004.159.1313.
  7. Maria-Florina Balcan, Alina Beygelzimer, and John Langford. Agnostic active learning. J. Comput. System Sci., 75(1):78-89, 2009. URL: https://doi.org/10.1016/j.jcss.2008.07.003.
  8. Maria-Florina Balcan, Steve Hanneke, and Jennifer Wortman Vaughan. The true sample complexity of active learning. Mach. Learn., 80(2-3):111-139, 2010. URL: https://doi.org/10.1007/s10994-010-5174-y.
  9. Peter L Bartlett and Philip M Long. More theorems about scale-sensitive dimensions and learning. In Proceedings of the eighth annual conference on Computational learning theory, pages 392-401, 1995.
  10. Peter L. Bartlett and Philip M. Long. Prediction, learning, uniform convergence, and scale-sensitive dimensions. In Eighth Annual Workshop on Computational Learning Theory (COLT) (Santa Cruz, CA, 1995), volume 56, pages 174-190. Journal of Computer and System Sciences, 1998. URL: https://doi.org/10.1006/jcss.1997.1557.
  11. Peter L. Bartlett, Philip M. Long, and Robert C. Williamson. Fat-shattering and the learnability of real-valued functions. In Seventh Annual Workshop on Computational Learning Theory (COLT) (New Brunswick, NJ, 1994), volume 52, pages 434-452. Journal of Computer and System Sciences, 1996. URL: https://doi.org/10.1006/jcss.1996.0033.
  12. Shai Ben-David, Nicolò Cesa-Bianchi, David Haussler, and Philip M. Long. Characterizations of learnability for classes of {0,…,n}-valued functions. J. Comput. System Sci., 50(1):74-86, 1995. URL: https://doi.org/10.1006/jcss.1995.1008.
  13. Shai Ben-David, Dávid Pál, and Shai Shalev-Shwartz. Agnostic online learning. In COLT, volume 3, page 1, 2009.
  14. Eric Blais, Renato Ferreira Pinto, Jr., and Nathaniel Harms. VC dimension and distribution-free sample-based testing. In STOC '21 - Proceedings of the 53rd Annual ACM SIGACT Symposium on Theory of Computing, pages 504-517. ACM, New York, 2021. URL: https://doi.org/10.1145/3406325.3451104.
  15. Avrim Blum, Merrick Furst, Jeffrey Jackson, Michael Kearns, Yishay Mansour, and Steven Rudich. Weakly learning DNF and characterizing statistical query learning using Fourier analysis. In Proceedings of the twenty-sixth annual ACM symposium on Theory of computing, pages 253-262, 1994.
  16. Avrim Blum and Thodoris Lykouris. Advancing subgroup fairness via sleeping experts. In Thomas Vidick, editor, 11th Innovations in Theoretical Computer Science Conference, ITCS 2020, January 12-14, 2020, Seattle, Washington, USA, volume 151 of LIPIcs, pages 55:1-55:24. Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 2020. URL: https://doi.org/10.4230/LIPIcs.ITCS.2020.55.
  17. Anselm Blumer, Andrzej Ehrenfeucht, David Haussler, and Manfred K. Warmuth. Learnability and the Vapnik-Chervonenkis dimension. J. Assoc. Comput. Mach., 36(4):929-965, 1989. URL: https://doi.org/10.1145/76359.76371.
  18. J. Bourgain, A. Pajor, S. J. Szarek, and N. Tomczak-Jaegermann. On the duality problem for entropy numbers of operators. In Geometric aspects of functional analysis (1987-88), volume 1376 of Lecture Notes in Math., pages 50-63. Springer, Berlin, 1989. URL: https://doi.org/10.1007/BFb0090048.
  19. Nataly Brukhim, Daniel Carmon, Irit Dinur, Shay Moran, and Amir Yehudayoff. A characterization of multiclass learnability. arXiv preprint, 2022. URL: http://arxiv.org/abs/2203.01550.
  20. Mark Bun, Roi Livni, and Shay Moran. An equivalence between private classification and online prediction. In 2020 IEEE 61st Annual Symposium on Foundations of Computer Science, pages 389-402. IEEE Computer Soc., Los Alamitos, CA, 2020. URL: https://doi.org/10.1109/FOCS46700.2020.00044.
  21. Ofir David, Shay Moran, and Amir Yehudayoff. Supervised learning through the lens of compression. Advances in Neural Information Processing Systems, 29, 2016.
  22. Emily Diana, Wesley Gill, Michael Kearns, Krishnaram Kenthapadi, Aaron Roth, and Saeed Sharifi-Malvajerdi. Multiaccurate proxies for downstream fairness. In 2022 ACM Conference on Fairness, Accountability, and Transparency, FAccT '22, pages 1207-1239, New York, NY, USA, 2022. Association for Computing Machinery. URL: https://doi.org/10.1145/3531146.3533180.
  23. Cynthia Dwork, Michael P. Kim, Omer Reingold, Guy N. Rothblum, and Gal Yona. Outcome indistinguishability. In STOC '21 - Proceedings of the 53rd Annual ACM SIGACT Symposium on Theory of Computing, pages 1095-1108. ACM, New York, 2021. URL: https://doi.org/10.1145/3406325.3451064.
  24. Cynthia Dwork, Michael P Kim, Omer Reingold, Guy N Rothblum, and Gal Yona. Beyond Bernoulli: Generating random outcomes that cannot be distinguished from nature. In International Conference on Algorithmic Learning Theory, pages 342-380. PMLR, 2022.
  25. Vitaly Feldman. Distribution-specific agnostic boosting. In Andrew Chi-Chih Yao, editor, Innovations in Computer Science - ICS 2010, Tsinghua University, Beijing, China, January 5-7, 2010. Proceedings, pages 241-250. Tsinghua University Press, 2010. URL: http://conference.iiis.tsinghua.edu.cn/ICS2010/content/papers/20.html.
  26. Yuval Filmus, Steve Hanneke, Idan Mehalel, and Shay Moran. Optimal prediction using expert advice and randomized Littlestone dimension, 2022.
  27. Badih Ghazi, Noah Golowich, Ravi Kumar, and Pasin Manurangsi. Sample-efficient proper PAC learning with approximate differential privacy. In STOC '21 - Proceedings of the 53rd Annual ACM SIGACT Symposium on Theory of Computing, pages 183-196. ACM, New York, 2021. URL: https://doi.org/10.1145/3406325.3451028.
  28. Ira Globus-Harris, Varun Gupta, Christopher Jung, Michael Kearns, Jamie Morgenstern, and Aaron Roth. Multicalibrated regression for downstream fairness. arXiv preprint, 2022. URL: http://arxiv.org/abs/2209.07312.
  29. Ira Globus-Harris, Michael Kearns, and Aaron Roth. An algorithmic framework for bias bounties. In 2022 ACM Conference on Fairness, Accountability, and Transparency, FAccT '22, pages 1106-1124, New York, NY, USA, 2022. Association for Computing Machinery. URL: https://doi.org/10.1145/3531146.3533172.
  30. Oded Goldreich, Shafi Goldwasser, and Dana Ron. Property testing and its connection to learning and approximation. In 37th Annual Symposium on Foundations of Computer Science (Burlington, VT, 1996), pages 339-348. IEEE Comput. Soc. Press, Los Alamitos, CA, 1996. URL: https://doi.org/10.1109/SFCS.1996.548493.
  31. Noah Golowich. Differentially private nonparametric regression under a growth condition. In Conference on Learning Theory, pages 2149-2192. PMLR, 2021.
  32. Alon Gonen, Shachar Lovett, and Michal Moshkovitz. Towards a combinatorial characterization of bounded-memory learning. Advances in Neural Information Processing Systems, 33:9028-9038, 2020.
  33. Parikshit Gopalan, Lunjia Hu, Michael P Kim, Omer Reingold, and Udi Wieder. Loss minimization through the lens of outcome indistinguishability. arXiv preprint, 2022. URL: http://arxiv.org/abs/2210.08649.
  34. Parikshit Gopalan, Adam Tauman Kalai, Omer Reingold, Vatsal Sharan, and Udi Wieder. Omnipredictors. In Mark Braverman, editor, 13th Innovations in Theoretical Computer Science Conference, ITCS 2022, January 31 - February 3, 2022, Berkeley, CA, USA, volume 215 of LIPIcs, pages 79:1-79:21. Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 2022. URL: https://doi.org/10.4230/LIPIcs.ITCS.2022.79.
  35. Parikshit Gopalan, Michael P Kim, Mihir A Singhal, and Shengjia Zhao. Low-degree multicalibration. In Conference on Learning Theory, pages 3193-3234. PMLR, 2022.
  36. Parikshit Gopalan, Omer Reingold, Vatsal Sharan, and Udi Wieder. Multicalibrated partitions for importance weights. In International Conference on Algorithmic Learning Theory, pages 408-435. PMLR, 2022.
  37. Thore Graepel, Ralf Herbrich, and John Shawe-Taylor. PAC-Bayesian compression bounds on the prediction error of learning algorithms for classification. Machine Learning, 59(1):55-76, 2005.
  38. Nika Haghtalab, Tim Roughgarden, and Abhishek Shetty. Smoothed analysis of online and differentially private learning. Advances in Neural Information Processing Systems, 33:9203-9215, 2020.
  39. Nika Haghtalab, Tim Roughgarden, and Abhishek Shetty. Smoothed analysis with adaptive adversaries. In 2021 IEEE 62nd Annual Symposium on Foundations of Computer Science - FOCS 2021, pages 942-953. IEEE Computer Soc., Los Alamitos, CA, 2022. URL: https://doi.org/10.1109/FOCS52979.2021.00095.
  40. D. Haussler, N. Littlestone, and M. K. Warmuth. Predicting {0,1}-functions on randomly drawn points (extended abstracts). In Proceedings of the 1988 Workshop on Computational Learning Theory (Cambridge, MA, 1988), pages 280-296. Morgan Kaufmann, San Mateo, CA, 1989. URL: https://doi.org/10.1016/0315-0860(89)90031-1.
  41. Ursula Hébert-Johnson, Michael Kim, Omer Reingold, and Guy Rothblum. Multicalibration: Calibration for the (computationally-identifiable) masses. In International Conference on Machine Learning, pages 1939-1948. PMLR, 2018.
  42. Max Hopkins, Daniel Kane, and Shachar Lovett. The power of comparisons for actively learning linear classifiers. Advances in Neural Information Processing Systems, 33:6342-6353, 2020.
  43. Max Hopkins, Daniel Kane, Shachar Lovett, and Gaurav Mahajan. Noise-tolerant, reliable active classification with comparison queries. In Conference on Learning Theory, pages 1957-2006. PMLR, 2020.
  44. Max Hopkins, Daniel Kane, Shachar Lovett, and Gaurav Mahajan. Point location and active learning: learning halfspaces almost optimally. In 2020 IEEE 61st Annual Symposium on Foundations of Computer Science, pages 1034-1044. IEEE Computer Soc., Los Alamitos, CA, 2020. URL: https://doi.org/10.1109/FOCS46700.2020.00100.
  45. Max Hopkins, Daniel M Kane, Shachar Lovett, and Gaurav Mahajan. Realizable learning is all you need. In Conference on Learning Theory, pages 3015-3069. PMLR, 2022.
  46. Lunjia Hu, Inbal Livni-Navon, Omer Reingold, and Chutong Yang. Omnipredictors for constrained optimization. arXiv preprint, 2022. URL: http://arxiv.org/abs/2209.07463.
  47. Lunjia Hu and Charlotte Peale. Comparative learning: A sample complexity theory for two hypothesis classes. arXiv preprint, 2022. URL: http://arxiv.org/abs/2211.09101.
  48. Lunjia Hu, Charlotte Peale, and Omer Reingold. Metric entropy duality and the sample complexity of outcome indistinguishability. In International Conference on Algorithmic Learning Theory, pages 515-552. PMLR, 2022.
  49. Christopher Jung, Changhwa Lee, Mallesh Pai, Aaron Roth, and Rakesh Vohra. Moment multicalibration for uncertainty estimation. In Conference on Learning Theory, pages 2634-2678. PMLR, 2021.
  50. Young Jung, Baekjin Kim, and Ambuj Tewari. On the equivalence between online and private learnability beyond binary classification. Advances in Neural Information Processing Systems, 33:16701-16710, 2020.
  51. Adam Tauman Kalai, Yishay Mansour, and Elad Verbin. On agnostic boosting and parity learning. In STOC'08, pages 629-638. ACM, New York, 2008. URL: https://doi.org/10.1145/1374376.1374466.
  52. Daniel M. Kane, Shachar Lovett, Shay Moran, and Jiapeng Zhang. Active classification with comparison queries. In 58th Annual IEEE Symposium on Foundations of Computer Science - FOCS 2017, pages 355-366. IEEE Computer Soc., Los Alamitos, CA, 2017. URL: https://doi.org/10.1109/FOCS.2017.40.
  53. Michael Kearns. Efficient noise-tolerant learning from statistical queries. In Proceedings of the twenty-fifth annual ACM symposium on Theory of Computing, pages 392-401, 1993.
  54. Michael Kearns, Seth Neel, Aaron Roth, and Zhiwei Steven Wu. Preventing fairness gerrymandering: Auditing and learning for subgroup fairness. In International Conference on Machine Learning, pages 2564-2572. PMLR, 2018.
  55. Michael Kearns and Dana Ron. Testing problems with sublearning sample complexity. J. Comput. System Sci., 61(3):428-456, 2000. URL: https://doi.org/10.1006/jcss.1999.1656.
  56. Michael J. Kearns and Robert E. Schapire. Efficient distribution-free learning of probabilistic concepts. In 31st Annual Symposium on Foundations of Computer Science (FOCS) (St. Louis, MO, 1990), volume 48, pages 464-497. Journal of Computer and System Sciences, 1994. URL: https://doi.org/10.1016/S0022-0000(05)80062-5.
  57. Michael P Kim, Amirata Ghorbani, and James Zou. Multiaccuracy: Black-box post-processing for fairness in classification. In Proceedings of the 2019 AAAI/ACM Conference on AI, Ethics, and Society, pages 247-254, 2019.
  58. Michael P Kim, Christoph Kern, Shafi Goldwasser, Frauke Kreuter, and Omer Reingold. Universal adaptability: Target-independent inference that competes with propensity scoring. Proceedings of the National Academy of Sciences, 119(4):e2108097119, 2022.
  59. J. Kivinen. Learning reliably and with one-sided error. Math. Systems Theory, 28(2):141-172, 1995. URL: https://doi.org/10.1007/BF01191474.
  60. Jyrki Kivinen. Reliable and useful learning. In Proceedings of the Second Annual Workshop on Computational Learning Theory (Santa Cruz, CA, 1989), pages 365-380. Morgan Kaufmann, San Mateo, CA, 1989.
  61. Jyrki Kivinen. Reliable and useful learning with uniform probability distributions. In ALT, pages 209-222, 1990.
  62. Yi Li, Philip M. Long, and Aravind Srinivasan. Improved bounds on the sample complexity of learning. In Proceedings of the Eleventh Annual ACM-SIAM Symposium on Discrete Algorithms (San Francisco, CA, 2000), pages 309-318. ACM, New York, 2000.
  63. Nathan Linial, Yishay Mansour, and Ronald L. Rivest. Results on learnability and the Vapnik-Chervonenkis dimension. Inform. and Comput., 90(1):33-49, 1991. URL: https://doi.org/10.1016/0890-5401(91)90058-A.
  64. Nick Littlestone. Learning quickly when irrelevant attributes abound: A new linear-threshold algorithm. Machine learning, 2(4):285-318, 1988.
  65. Nick Littlestone and Manfred Warmuth. Relating data compression and learnability, 1986.
  66. Philip M. Long. On agnostic learning with {0,*,1}-valued and real-valued hypotheses. In Computational learning theory (Amsterdam, 2001), volume 2111 of Lecture Notes in Comput. Sci., pages 289-302. Springer, Berlin, 2001. URL: https://doi.org/10.1007/3-540-44581-1_19.
  67. Emanuel Milman. A remark on two duality relations. Integral Equations Operator Theory, 57(2):217-228, 2007. URL: https://doi.org/10.1007/s00020-006-1479-4.
  68. Albrecht Pietsch. Theorie der Operatorenideale (Zusammenfassung). Wissenschaftliche Beiträge der Friedrich-Schiller-Universität Jena. Friedrich-Schiller-Universität, Jena, 1972.
  69. Ronald L. Rivest and Robert Sloan. Learning complicated concepts reliably and usefully (extended abstract). In Proceedings of the 1988 Workshop on Computational Learning Theory (Cambridge, MA, 1988), pages 69-79. Morgan Kaufmann, San Mateo, CA, 1989.
  70. Harrison Rosenberg, Robi Bhattacharjee, Kassem Fawaz, and Somesh Jha. An exploration of multicalibration uniform convergence bounds. arXiv preprint, 2022. URL: http://arxiv.org/abs/2202.04530.
  71. Guy N Rothblum and Gal Yona. Multi-group agnostic PAC learnability. In International Conference on Machine Learning, pages 9107-9115. PMLR, 2021.
  72. Eliran Shabat, Lee Cohen, and Yishay Mansour. Sample complexity of uniform convergence for multicalibration. Advances in Neural Information Processing Systems, 33:13331-13340, 2020.
  73. Satchit Sivakumar, Mark Bun, and Marco Gaboardi. Multiclass versus binary differentially private PAC learning. Advances in Neural Information Processing Systems, 34:22943-22954, 2021.
  74. Christopher J Tosh and Daniel Hsu. Simple and near-optimal algorithms for hidden stratification and multi-group learning. In International Conference on Machine Learning, pages 21633-21657. PMLR, 2022.
  75. Leslie G Valiant. A theory of the learnable. Communications of the ACM, 27(11):1134-1142, 1984.
  76. V. N. Vapnik and A. Y. Chervonenkis. On the uniform convergence of relative frequencies of events to their probabilities. Theory Probab. Appl., 16:264-280, 1971.
  77. Shengjia Zhao, Michael Kim, Roshni Sahoo, Tengyu Ma, and Stefano Ermon. Calibrating predictions to decisions: A novel approach to multi-class calibration. Advances in Neural Information Processing Systems, 34:22313-22324, 2021.