Hardness of Learning Boolean Functions from Label Proportions

Guruswami, Venkatesan; Saket, Rishi

doi:10.4230/LIPIcs.FSTTCS.2023.37

File

LIPIcs.FSTTCS.2023.37.pdf

Filesize: 0.74 MB
15 pages

Document Identifiers

DOI: 10.4230/LIPIcs.FSTTCS.2023.37
URN: urn:nbn:de:0030-drops-194106

Author Details

Venkatesan Guruswami

Department of EECS and Simons Institute for the Theory of Computing, University of California, Berkeley, CA, USA

Rishi Saket

Google Research India, Bangalore, India

Cite As

Venkatesan Guruswami and Rishi Saket. Hardness of Learning Boolean Functions from Label Proportions. In 43rd IARCS Annual Conference on Foundations of Software Technology and Theoretical Computer Science (FSTTCS 2023). Leibniz International Proceedings in Informatics (LIPIcs), Volume 284, pp. 37:1-37:15, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2023)
https://doi.org/10.4230/LIPIcs.FSTTCS.2023.37

Abstract

In recent years the framework of learning from label proportions (LLP) has been gaining importance in machine learning. In this setting, the training examples are aggregated into subsets or bags, and only the average label per bag is available for learning an example-level predictor. This generalizes traditional PAC learning, which is the special case of unit-sized bags. The computational learning aspects of LLP were studied in recent works [R. Saket, 2021; R. Saket, 2022], which showed algorithms and hardness for learning halfspaces in the LLP setting. In this work we focus on the intractability of LLP learning of Boolean functions. Our first result shows that, given a collection of bags of size at most 2 which are consistent with an OR function, it is NP-hard to find a CNF of constantly many clauses which satisfies any constant fraction of the bags. This is in contrast with the work of [R. Saket, 2021], which gave a (2/5)-approximation for learning ORs using a halfspace. Thus, our result provides a separation between constant-clause CNFs and halfspaces as hypotheses for LLP learning of ORs. Next, we prove the hardness of satisfying more than a (1/2 + o(1))-fraction of such bags using a t-DNF (i.e., a DNF where each term has ≤ t literals) for any constant t. In usual PAC learning such a hardness was known [S. Khot and R. Saket, 2008] only for learning noisy ORs. We also study the learnability of parities and show that it is NP-hard to satisfy more than a (q/2^{q-1} + o(1))-fraction of q-sized bags which are consistent with a parity using a parity, while a random-parity-based algorithm achieves a (1/2^{q-2})-approximation.
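To make the notion of a hypothesis "satisfying" a bag concrete, here is a minimal Python sketch on a toy instance. It assumes the usual LLP convention that a bag is satisfied when the average of the predicted labels over the bag equals the given label proportion. The helper names (`make_or`, `satisfies`, `fraction_satisfied`) and the example bags are hypothetical illustrations and are not taken from the paper.

```python
from fractions import Fraction
from typing import Callable, Sequence

Point = tuple[int, ...]  # a Boolean feature vector with entries in {0, 1}
Hypothesis = Callable[[Point], int]


def make_or(variables: Sequence[int]) -> Hypothesis:
    """Return the OR of the given coordinates (hypothetical helper)."""
    return lambda x: int(any(x[i] for i in variables))


def satisfies(h: Hypothesis, points: Sequence[Point], proportion: Fraction) -> bool:
    """Assumed convention: h satisfies a bag if its average prediction
    over the bag equals the bag's label proportion exactly."""
    return Fraction(sum(h(x) for x in points), len(points)) == proportion


def fraction_satisfied(h: Hypothesis, bags) -> Fraction:
    """Fraction of bags (points, proportion) that h satisfies."""
    return Fraction(sum(satisfies(h, pts, p) for pts, p in bags), len(bags))


# Toy instance: bags of size at most 2, labeled by the target OR x0 v x2.
target = make_or([0, 2])
raw_bags = [
    [(1, 0, 0), (0, 0, 0)],
    [(0, 1, 0), (0, 0, 1)],
    [(0, 1, 0)],
    [(1, 1, 0), (0, 0, 1)],
]
bags = [
    (pts, Fraction(sum(target(x) for x in pts), len(pts)))
    for pts in raw_bags
]

# A different OR, x0 v x1, used as a candidate hypothesis.
candidate = make_or([0, 1])

print(fraction_satisfied(target, bags))     # 1
print(fraction_satisfied(candidate, bags))  # 1/2
```

In this toy instance the candidate satisfies the second bag even though it mislabels both of its examples, since only the bag average is checked. This is one way in which bag-level consistency is weaker than example-level consistency.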

Subject Classification

ACM Subject Classification

Theory of computation → Problems, reductions and completeness

Keywords

Learning from label proportions
Computational learning
Hardness
Boolean functions

References

S. Arora, L. Babai, J. Stern, and Z. Sweedyk. The hardness of approximate optima in lattices, codes, and systems of linear equations. J. Comput. Syst. Sci., 54(2):317-331, 1997.
S. Arora, C. Lund, R. Motwani, M. Sudan, and M. Szegedy. Proof verification and the hardness of approximation problems. J. ACM, 45(3):501-555, 1998.
S. Arora and S. Safra. Probabilistic checking of proofs: A new characterization of NP. J. ACM, 45(1):70-122, 1998.
D. Barucic and J. Kybic. Fast learning from label proportions with small bags. CoRR, abs/2110.03426, 2021. URL: https://arxiv.org/abs/2110.03426.
G. Bortsova, F. Dubost, S. N. Ørting, I. Katramados, L. Hogeweg, L. H. Thomsen, M. M. W. Wille, and M. de Bruijne. Deep learning from label proportions for emphysema quantification. In MICCAI, volume 11071 of Lecture Notes in Computer Science, pages 768-776. Springer, 2018. URL: https://arxiv.org/abs/1807.08601.
R. I. Busa-Fekete, H. Choi, T. Dick, C. Gentile, and A. M. Medina. Easy learning from label proportions. arXiv, 2023. URL: https://arxiv.org/abs/2302.03115.
L. Chen, T. Fu, A. Karbasi, and V. Mirrokni. Learning from aggregated data: Curated bags versus random bags. arXiv, 2023. URL: https://arxiv.org/abs/2305.09557.
L. Chen, Z. Huang, and R. Ramakrishnan. Cost-based labeling of groups of mass spectra. In Proc. ACM SIGMOD International Conference on Management of Data, pages 167-178, 2004.
L. M. Dery, B. Nachman, F. Rubbo, and A. Schwartzman. Weakly supervised classification in high energy physics. Journal of High Energy Physics, 2017(5):1-11, 2017.
V. Feldman, V. Guruswami, P. Raghavendra, and Y. Wu. Agnostic learning of monomials by halfspaces is hard. SIAM J. Comput., 41(6):1558-1590, 2012.
S. Ghoshal and R. Saket. Hardness of learning DNFs using halfspaces. In Proc. STOC, pages 467-480, 2021.
V. Guruswami, P. Raghavendra, R. Saket, and Y. Wu. Bypassing UGC from some optimal geometric inapproximability results. ACM Trans. Algorithms, 12(1):6:1-6:25, 2016. URL: http://eccc.hpi-web.de/report/2010/177.
J. Håstad. Some optimal inapproximability results. J. ACM, 48(4):798-859, 2001.
J. Hernández-González, I. Inza, L. Crisol-Ortíz, M. A. Guembe, M. J. Iñarra, and J. A. Lozano. Fitting the data from embryo implantation prediction: Learning from label proportions. Statistical Methods in Medical Research, 27(4):1056-1066, 2018.
S. Khot and R. Saket. Hardness of minimizing and learning DNF expressions. In Proc. FOCS, pages 231-240, 2008.
C. O'Brien, A. Thiagarajan, S. Das, R. Barreto, C. Verma, T. Hsu, J. Neufeld, and J. J. Hunt. Challenges and approaches to privacy preserving post-click conversion prediction. CoRR, abs/2201.12666, 2022. URL: https://arxiv.org/abs/2201.12666.
S. N. Ørting, J. Petersen, M. Wille, L. Thomsen, and M. de Bruijne. Quantifying emphysema extent from weakly labeled CT scans of the lungs using label proportions learning. In The Sixth International Workshop on Pulmonary Image Analysis, pages 31-42, 2016.
R. O’Donnell. Analysis of Boolean functions. Cambridge University Press, 2014.
R. Raz. A parallel repetition theorem. SIAM J. Comput., 27(3):763-803, 1998.
S. Rueping. SVM classifier estimation from group probabilities. In Proc. ICML, pages 911-918, 2010.
R. Saket. Learnability of linear thresholds from label proportions. In Proc. NeurIPS, 2021. URL: https://openreview.net/forum?id=5BnaKeEwuYk.
R. Saket. Algorithms and hardness for learning linear thresholds from label proportions. In Proc. NeurIPS, 2022. URL: https://openreview.net/forum?id=4LZo68TuF-4.
L. G. Valiant. A theory of the learnable. Commun. ACM, 27(11):1134-1142, 1984.
J. Wojtusiak, K. Irvin, A. Birerdinc, and A. V. Baranova. Using published medical results and non-homogenous data in rule learning. In Proc. International Conference on Machine Learning and Applications and Workshops, volume 2, pages 84-89. IEEE, 2011.
F. X. Yu, K. Choromanski, S. Kumar, T. Jebara, and S. F. Chang. On learning from label proportions. CoRR, abs/1402.5902, 2014. URL: https://arxiv.org/abs/1402.5902.
