Symmetric Sparse Boolean Matrix Factorization and Applications

Chen, Sitan; Song, Zhao; Tao, Runzhou; Zhang, Ruizhe

doi:10.4230/LIPIcs.ITCS.2022.46

Abstract

In this work, we study a variant of nonnegative matrix factorization where we wish to find a symmetric factorization of a given input matrix into a sparse, Boolean matrix. Formally speaking, given {𝐌} ∈ {ℤ}^{m× m}, we want to find {𝐖} ∈ {0,1}^{m× r} such that ‖ {𝐌} - {𝐖} {𝐖}^⊤ ‖₀ is minimized among all {𝐖} for which each row is k-sparse. This question turns out to be closely related to a number of questions like recovering a hypergraph from its line graph, as well as reconstruction attacks for private neural network training. As this problem is hard in the worst-case, we study a natural average-case variant that arises in the context of these reconstruction attacks: {𝐌} = {𝐖} {𝐖}^{⊤} for {𝐖} a random Boolean matrix with k-sparse rows, and the goal is to recover {𝐖} up to column permutation. Equivalently, this can be thought of as recovering a uniformly random k-uniform hypergraph from its line graph. Our main result is a polynomial-time algorithm for this problem based on bootstrapping higher-order information about {𝐖} and then decomposing an appropriate tensor. The key ingredient in our analysis, which may be of independent interest, is to show that such a matrix {𝐖} has full column rank with high probability as soon as m = Ω̃(r), which we do using tools from Littlewood-Offord theory and estimates for binary Krawtchouk polynomials.

Amir Abboud. Fine-grained reductions and quantum speedups for dynamic programming. In 46th International Colloquium on Automata, Languages, and Programming (ICALP 2019). Schloss Dagstuhl-Leibniz-Zentrum fuer Informatik, 2019.
Amir Abboud and Kevin Lewi. Exact weight subgraphs and the k-sum conjecture. In International Colloquium on Automata, Languages, and Programming (ICALP), 2013.
Amir Abboud, Kevin Lewi, and Ryan Williams. Losing weight by gaining edges. In European Symposium on Algorithms (ESA), 2014.
Alfred V Aho, Jeffrey D Ullman, and Mihalis Yannakakis. On notions of information transfer in vlsi circuits. In Proceedings of the fifteenth annual ACM symposium on Theory of computing, pages 133-139, 1983.
Elad Aigner-Horev and Yury Person. On sparse random combinatorial matrices. arXiv preprint, 2020. URL: http://arxiv.org/abs/2010.07648.
Nima Anari, Constantinos Daskalakis, Wolfgang Maass, Christos Papadimitriou, Amin Saberi, and Santosh Vempala. Smoothed analysis of discrete tensor decomposition and assemblies of neurons. In Advances in Neural Information Processing Systems, volume 31, 2018.
Sanjeev Arora, Rong Ge, Ravindran Kannan, and Ankur Moitra. Computing a nonnegative matrix factorization-provably. In Proceedings of the forty-fourth annual ACM symposium on Theory of computing (STOC), pages 145-162. https://arxiv.org/pdf/1111.0952.pdf, 2012.
Sanjeev Arora, Rong Ge, Sushant Sachdeva, and Grant Schoenebeck. Finding overlapping communities in social networks: toward a rigorous approach. In Proceedings of the 13th ACM Conference on Electronic Commerce, pages 37-54, 2012.
Megasthenis Asteris and Alexandros G Dimakis. Repairable fountain codes. IEEE Journal on Selected Areas in Communications, 32(5):1037-1047, 2014.
Frank Ban, Vijay Bhattiprolu, Karl Bringmann, Pavel Kolev, Euiwoong Lee, and David P Woodruff. A ptas for lp-low rank approximation. In Proceedings of the Thirtieth Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), pages 747-766. SIAM, 2019.
Radim Belohlavek and Vilem Vychodil. Discovery of optimal factors in binary data via a novel method of matrix decomposition. Journal of Computer and System Sciences, 76(1):3-20, 2010.
Arnab Bhattacharyya, Piotr Indyk, David P Woodruff, and Ning Xie. The complexity of linear dependence problems in vector spaces. In Innovations in Computer Science (ICS), pages 496-508, 2011.
Nicholas Carlini, Samuel Deng, Sanjam Garg, Somesh Jha, Saeed Mahloujifar, Mohammad Mahmoody, Shuang Song, Abhradeep Thakurta, and Florian Tramer. An attack on instahide: Is private learning possible with instance encoding? arXiv preprint, 2020. URL: http://arxiv.org/abs/2011.05315.
David Cattanéo and Simon Perdrix. The parameterized complexity of domination-type problems and application to linear codes. In International Conference on Theory and Applications of Models of Computation, pages 86-103. Springer, 2014.
Parinya Chalermsook, Sandy Heydrich, Eugenia Holm, and Andreas Karrenbauer. Nearly tight approximability results for minimum biclique cover and partition. In European Symposium on Algorithms (ESA), pages 235-246. Springer, 2014.
Sunil Chandran, Davis Issac, and Andreas Karrenbauer. On the parameterized complexity of biclique cover and partition. In 11th International Symposium on Parameterized and Exact Computation (IPEC). Schloss Dagstuhl-Leibniz-Zentrum fuer Informatik, 2017.
Sitan Chen, Xiaoxiao Li, Zhao Song, and Danyang Zhuo. On instahide, phase retrieval, and sparse matrix factorization. In ICLR. arXiv preprint, 2021. URL: http://arxiv.org/abs/2011.11181.
Yanhua Chen, Manjeet Rege, Ming Dong, and Jing Hua. Non-negative matrix factorization for semi-supervised data clustering. Knowledge and Information Systems, 17(3):355-379, 2008.
Jeremy E Cohen and Nicolas Gillis. Nonnegative low-rank sparse component analysis. In ICASSP 2019-2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 8226-8230. IEEE, 2019.
Kevin p Costello and Van Vu. On the rank of random sparse matrices. Combinatorics, Probability and Computing, 19(3):321-342, 2010.
Kevin P Costello and Van H Vu. The rank of random graphs. Random Structures & Algorithms, 33(3):269-285, 2008.
Ruairí de Fréin, Konstantinos Drakakis, Scott Rickard, and Andrzej Cichocki. Analysis of financial data using non-negative matrix factorization. international mathematical forum, 3(38):1853-1870, 2008.
Daniele Giorgio Degiorgi and Klaus Simon. A dynamic algorithm for line graph recognition. In International Workshop on Graph-Theoretic Concepts in Computer Science, pages 37-48. Springer, 1995.
Chris Ding, Xiaofeng He, and Horst D Simon. On the equivalence of nonnegative matrix factorization and spectral clustering. In Proceedings of the SIAM international conference on data mining (ICDM), pages 606-610. SIAM, 2005.
Irit Dinur and Pasin Manurangsi. Eth-hardness of approximating 2-csps and directed steiner network. arXiv preprint, 2018. URL: http://arxiv.org/abs/1805.03867.
Radu-Alexandru Dragomir, Jérôme Bolte, and Alexandre d’Aspremont. Fast gradient methods for symmetric nonnegative matrix factorization. arXiv preprint, 2019. URL: http://arxiv.org/abs/1901.10791.
Alessandro Epasto and Eli Upfal. Efficient approximation for restricted biclique cover problems. Algorithms, 11(6):84, 2018.
Paul Erdös. On a lemma of littlewood and offord. Bulletin of the American Mathematical Society, 51(12):898-902, 1945.
Paul Erdős and Alfréd Rényi. On the evolution of random graphs. Publ. Math. Inst. Hung. Acad. Sci, 5(1):17-60, 1960.
Jeff Erickson. Lower bounds for linear satisfiability problems. In ACM-SIAM Symposium on Discrete Algorithms (SODA), pages 388-395, 1995.
Asaf Ferber, Vishesh Jain, Kyle Luh, and Wojciech Samotij. On the counting problem in inverse littlewood-offord theory. Journal of the London Mathematical Society, 2020.
Asaf Ferber, Matthew Kwan, and Lisa Sauermann. Singularity of sparse random matrices: simple proofs. arXiv preprint, 2020. URL: http://arxiv.org/abs/2011.01291.
Fedor V Fomin, Petr A Golovach, Daniel Lokshtanov, Fahad Panolan, and Saket Saurabh. Approximation schemes for low-rank binary matrix approximation problems. ACM Transactions on Algorithms (TALG), 16(1):1-39, 2019.
Yuan Gao and George Church. Improving molecular cancer class discovery through sparse non-negative matrix factorization. Bioinformatics, 21(21):3970-3975, 2005.
Nicolas Gillis. Sparse and unique nonnegative matrix factorization through data preprocessing. The Journal of Machine Learning Research, 13(1):3349-3386, 2012.
RA HARSHMAN. Foundations of the parafac procedure: Models and conditions for an" explanatory" multi-mode factor analysis. UCLA Working Papers in Phonetics, 16:1-84, 1970.
Zhaoshui He, Shengli Xie, Rafal Zdunek, Guoxu Zhou, and Andrzej Cichocki. Symmetric nonnegative matrix factorization: Algorithms and applications to probabilistic clustering. IEEE Transactions on Neural Networks, 22(12):2117-2131, 2011.
Patrik O Hoyer. Non-negative matrix factorization with sparseness constraints. Journal of machine learning research, 5(Nov):1457-1469, 2004.
Baihe Huang, Zhao Song, Runzhou Tao, Ruizhe Zhang, and Danyang Zhuo. Instahide’s sample complexity when mixing two private images. arXiv preprint, 2020. URL: http://arxiv.org/abs/2011.11877.
Jiaoyang Huang. Invertibility of adjacency matrices for random d-regular graphs. arXiv preprint, 2018. URL: http://arxiv.org/abs/1807.06465.
Yangsibo Huang, Zhao Song, Kai Li, and Sanjeev Arora. Instahide: Instance-hiding schemes for private distributed learning. In International Conference on Machine Learning (ICML), pages 4507-4518, 2020.
Michael S Jacobson, André E Kézdy, and Jenő Lehel. Recognizing intersection graphs of linear uniform hypergraphs. Graphs and Combinatorics, 13(4):359-367, 1997.
Vishesh Jain. Approximate spielman-teng theorems for the least singular value of random combinatorial matrices. arXiv preprint, 2019. URL: http://arxiv.org/abs/1904.10592.
Tomas Juškevičius and Grazvydas Šemetulskis. Optimal littlewood-offord inequalities in groups. arXiv preprint, 2017. URL: http://arxiv.org/abs/1707.01085.
Vassilis Kalofolias and Efstratios Gallopoulos. Computing symmetric nonnegative rank factorizations. Linear algebra and its applications, 436(2):421-435, 2012.
Richard M Karp. Reducibility among combinatorial problems. In Complexity of computer computations, pages 85-103. Springer, 1972.
Jingu Kim and Haesun Park. Sparse nonnegative matrix factorization for clustering. Technical report, Georgia Institute of Technology, 2008.
Da Kuang, Chris Ding, and Haesun Park. Symmetric nonnegative matrix factorization for graph clustering. In Proceedings of the SIAM International Conference on Data Mining (ICDM), pages 106-117. SIAM, 2012.
Richard Kueng and Joel A Tropp. Binary component decomposition part ii: The asymmetric case. arXiv preprint, 2019. URL: http://arxiv.org/abs/1907.13602.
Richard Kueng and Joel A Tropp. Binary component decomposition part i: the positive-semidefinite case. SIAM Journal on Mathematics of Data Science, 3(2):544-572, 2021.
Ravi Kumar, Rina Panigrahy, Ali Rahimi, and David Woodruff. Faster algorithms for binary matrix factorization. In International Conference on Machine Learning (ICML), pages 3551-3559, 2019.
Daniel D Lee and H Sebastian Seung. Learning the parts of objects by non-negative matrix factorization. Nature, 401(6755):788-791, 1999.
Philippe GH Lehot. An optimal algorithm to detect a line graph and output its root graph. Journal of the ACM (JACM), 21(4):569-575, 1974.
Sue E Leurgans, Robert T Ross, and Rebecca B Abel. A decomposition for three-way arrays. SIAM Journal on Matrix Analysis and Applications, 14(4):1064-1083, 1993.
Dajie Liu, Stojan Trajanovski, and Piet Van Mieghem. Iligra: an efficient inverse line graph algorithm. Journal of Mathematical Modelling and Algorithms in Operations Research, 14(1):13-33, 2015.
L Lovász. Problem, beitrag zur graphentheorie und deren auwendungen, vorgstragen auf dem intern. koll, 1977.
Haibing Lu, Jaideep Vaidya, Vijayalakshmi Atluri, and Yuan Hong. Constraint-aware role mining via extended boolean matrix decomposition. IEEE Transactions on Dependable and Secure Computing, 9(5):655-669, 2012.
Songtao Lu, Mingyi Hong, and Zhengdao Wang. A nonconvex splitting method for symmetric nonnegative matrix factorization: Convergence analysis and optimality. IEEE Transactions on Signal Processing, 65(12):3120-3135, 2017.
Michael Luby. Lt codes. In The 43rd Annual IEEE Symposium on Foundations of Computer Science, 2002. Proceedings., pages 271-271. IEEE Computer Society, 2002.
David JC MacKay. Fountain codes. IEE Proceedings-Communications, 152(6):1062-1068, 2005.
Yury Metelsky and Regina Tyshkevich. On line graphs of linear 3-uniform hypergraphs. Journal of Graph Theory, 25(4):243-251, 1997.
Pauli Miettinen, Taneli Mielikäinen, Aristides Gionis, Gautam Das, and Heikki Mannila. The discrete basis problem. IEEE Transactions on Knowledge and Data Engineering (TKDE), 20(10):1348-1362, 2008.
Pauli Miettinen and Stefan Neumann. Recent developments in boolean matrix factorization. arXiv preprint, 2020. URL: http://arxiv.org/abs/2012.03127.
Barsha Mitra, Shamik Sural, Jaideep Vaidya, and Vijayalakshmi Atluri. A survey of role mining. ACM Computing Surveys (CSUR), 48(4):1-37, 2016.
Ankur Moitra. An almost optimal algorithm for computing nonnegative rank. In Proceedings of the Twenty-Fourth Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), pages 1454-1464. SIAM, 2013.
Hoi H Nguyen. On the singularity of random combinatorial matrices. SIAM Journal on Discrete Mathematics, 27(1):447-458, 2013.
Noam Nisan. Lower bounds for non-commutative computation. In Proceedings of the twenty-third annual ACM symposium on Theory of computing, pages 410-418, 1991.
James Orlin. Contentment in graph theory: Covering graphs with cliques. Indagationes Mathematicae (Proceedings), 80(5):406-424, 1977.
Mihai Patrascu. Towards polynomial lower bounds for dynamic problems. In Proceedings of the forty-second ACM symposium on Theory of computing (STOC), pages 603-610, 2010.
Robert Peharz and Franz Pernkopf. Sparse nonnegative matrix factorization with 𝓁₀-constraints. Neurocomputing, 80:38-46, 2012.
Svatopluk Poljak, Vojtěch Rödl, and Daniel Turzik. Complexity of representation of graphs by set systems. Discrete Applied Mathematics, 3(4):301-312, 1981.
Yury Polyanskiy. Hypercontractivity of spherical averages in hamming space. SIAM Journal on Discrete Mathematics, 33(2):731-754, 2019.
Siamak Ravanbakhsh, Barnabás Póczos, and Russell Greiner. Boolean matrix factorization and noisy completion via message passing. In Proceedings of the 33rd International Conference on International Conference on Machine Learning (ICML), pages 945-954, 2016.
Ilya Razenshteyn, Zhao Song, and David P Woodruff. Weighted low rank approximations with provable guarantees. In Proceedings of the forty-eighth annual ACM symposium on Theory of Computing (STOC), pages 250-263, 2016.
Nicholas D Roussopoulos. A max m , n algorithm for determining the graph h from its line graph g. Information Processing Letters, 2(4):108-112, 1973.
Mark Rudelson and Roman Vershynin. The littlewood-offord problem and invertibility of random matrices. Advances in Mathematics, 218(2):600-633, 2008.
Roman Sandler and Michael Lindenbaum. Nonnegative matrix factorization with earth mover’s distance metric for image analysis. IEEE Transactions on Pattern Analysis and Machine Intelligence, 33(8):1590-1602, 2011.
Mikkel N Schmidt and Rasmus K Olsson. Single-channel speech separation using sparse non-negative matrix factorization. In Ninth International Conference on Spoken Language Processing, 2006.
Jouni K Seppänen, Ella Bingham, and Heikki Mannila. A simple algorithm for topic identification in 0-1 data. In European Conference on Principles of Data Mining and Knowledge Discovery, pages 423-434. Springer, 2003.
Tomáš Šingliar and Miloš Hauskrecht. Noisy-or component analysis and its application to link analysis. Journal of Machine Learning Research (JMLR), 7(Oct):2189-2213, 2006.
PV Skums, SV Suzdal, and RI Tyshkevich. Edge intersection graphs of linear 3-uniform hypergraphs. Electronic Notes in Discrete Mathematics, 22:33-40, 2005.
Paris Smaragdis and Judith C Brown. Non-negative matrix factorization for polyphonic music transcription. In 2003 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (IEEE Cat. No. 03TH8684), pages 177-180. IEEE, 2003.
Zhao Song, David P Woodruff, and Peilin Zhong. Low rank approximation with entrywise l1-norm error. In Proceedings of the 49th Annual ACM SIGACT Symposium on Theory of Computing (STOC), pages 688-701, 2017.
Zhao Song, David P Woodruff, and Peilin Zhong. Relative error tensor low rank approximation. In Proceedings of the Thirtieth Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), pages 2772-2789. SIAM, 2019.
Maciej M Syslo. A labeling algorithm to recognize a line digraph and output its root graph. Information Processing Letters, 15(1):28-30, 1982.
Jaideep Vaidya, Vijayalakshmi Atluri, and Qi Guo. The role mining problem: finding a minimal descriptive set of roles. In Proceedings of the 12th ACM symposium on Access control models and technologies, pages 175-184, 2007.
Arnaud Vandaele, Nicolas Gillis, Qi Lei, Kai Zhong, and Inderjit Dhillon. Efficient and non-convex coordinate descent for symmetric nonnegative matrix factorization. IEEE Transactions on Signal Processing, 64(21):5571-5584, 2016.
Van Vu. Random discrete matrices. In Horizons of combinatorics, pages 257-280. Springer, 2008.
Fei Wang, Tao Li, Xin Wang, Shenghuo Zhu, and Chris Ding. Community discovery using nonnegative matrix factorization. Data Mining and Knowledge Discovery, 22(3):493-521, 2011.
Hassler Whitney. Congruent graphs and the connectivity of graphs. In Hassler Whitney Collected Papers, pages 61-79. Springer, 1992.
Wei Xu, Xin Liu, and Yihong Gong. Document clustering based on non-negative matrix factorization. In Proceedings of the 26th annual international ACM SIGIR conference on Research and development in informaion retrieval, pages 267-273, 2003.
Xiaohui Yan, Jiafeng Guo, Shenghua Liu, Xueqi Cheng, and Yanfeng Wang. Learning topics in short texts by non-negative matrix factorization on term correlation matrix. In proceedings of the SIAM International Conference on Data Mining (ICDM), pages 749-757. SIAM, 2013.
Zhirong Yang, Tele Hao, Onur Dikmen, Xi Chen, and Erkki Oja. Clustering by nonnegative matrix factorization using graph random walk. Advances in Neural Information Processing Systems (NeurIPS), 25:1079-1087, 2012.
Ron Zass and Amnon Shashua. A unifying approach to hard and probabilistic clustering. In Tenth IEEE International Conference on Computer Vision (ICCV), pages 294-301. IEEE, 2005.
Hongyi Zhang, Moustapha Cisse, Yann N Dauphin, and David Lopez-Paz. mixup: Beyond empirical risk minimization. arXiv preprint, 2017. URL: http://arxiv.org/abs/1710.09412.
Zhong-Yuan Zhang, Yong Wang, and Yong-Yeol Ahn. Overlapping community detection in complex networks using symmetric binary matrix factorization. Physical Review E, 87(6):062803, 2013.
Zhongyuan Zhang, Tao Li, Chris Ding, and Xiangsun Zhang. Binary matrix factorization with applications. In Seventh IEEE international conference on data mining (ICDM), pages 391-400. IEEE, 2007.
Ruicong Zhi, Markus Flierl, Qiuqi Ruan, and W Bastiaan Kleijn. Graph-preserving sparse nonnegative matrix factorization with application to facial expression recognition. IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics), 41(1):38-52, 2010.
Zhihui Zhu, Xiao Li, Kai Liu, and Qiuwei Li. Dropping symmetry for fast symmetric nonnegative matrix factorization. Advances in Neural Information Processing Systems (NeurIPS), 31:5154-5164, 2018.

Symmetric Sparse Boolean Matrix Factorization and Applications

Authors Sitan Chen, Zhao Song, Runzhou Tao, Ruizhe Zhang

File

Document Identifiers

Author Details

Cite AsGet BibTex

Abstract

Subject Classification

ACM Subject Classification

Keywords

Metrics

References

Thanks for your feedback!

Could not send message

Symmetric Sparse Boolean Matrix Factorization and Applications

Authors Sitan Chen, Zhao Song, Runzhou Tao, Ruizhe Zhang

File

Document Identifiers

Author Details

Funding

Cite AsGet BibTex

Abstract

Subject Classification

ACM Subject Classification

Keywords

Metrics

Related Versions

References

Thanks for your feedback!

Could not send message