Symmetric Sparse Boolean Matrix Factorization and Applications

Authors Sitan Chen, Zhao Song, Runzhou Tao, Ruizhe Zhang

Thumbnail PDF


  • Filesize: 0.89 MB
  • 25 pages

Document Identifiers

Author Details

Sitan Chen
  • University of California, Berkeley, CA, USA
Zhao Song
  • Adobe Research, Seattle, WA, USA
Runzhou Tao
  • Columbia University, New York, NY, USA
Ruizhe Zhang
  • University of Texas at Austin, TX, USA

Cite AsGet BibTex

Sitan Chen, Zhao Song, Runzhou Tao, and Ruizhe Zhang. Symmetric Sparse Boolean Matrix Factorization and Applications. In 13th Innovations in Theoretical Computer Science Conference (ITCS 2022). Leibniz International Proceedings in Informatics (LIPIcs), Volume 215, pp. 46:1-46:25, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2022)


In this work, we study a variant of nonnegative matrix factorization where we wish to find a symmetric factorization of a given input matrix into a sparse, Boolean matrix. Formally speaking, given {𝐌} ∈ {ℤ}^{m× m}, we want to find {𝐖} ∈ {0,1}^{m× r} such that ‖ {𝐌} - {𝐖} {𝐖}^⊤ ‖₀ is minimized among all {𝐖} for which each row is k-sparse. This question turns out to be closely related to a number of questions like recovering a hypergraph from its line graph, as well as reconstruction attacks for private neural network training. As this problem is hard in the worst-case, we study a natural average-case variant that arises in the context of these reconstruction attacks: {𝐌} = {𝐖} {𝐖}^{⊤} for {𝐖} a random Boolean matrix with k-sparse rows, and the goal is to recover {𝐖} up to column permutation. Equivalently, this can be thought of as recovering a uniformly random k-uniform hypergraph from its line graph. Our main result is a polynomial-time algorithm for this problem based on bootstrapping higher-order information about {𝐖} and then decomposing an appropriate tensor. The key ingredient in our analysis, which may be of independent interest, is to show that such a matrix {𝐖} has full column rank with high probability as soon as m = Ω̃(r), which we do using tools from Littlewood-Offord theory and estimates for binary Krawtchouk polynomials.

Subject Classification

ACM Subject Classification
  • Theory of computation → Design and analysis of algorithms
  • Matrix factorization
  • tensors
  • random matrices
  • average-case complexity


  • Access Statistics
  • Total Accesses (updated on a weekly basis)
    PDF Downloads


  1. Amir Abboud. Fine-grained reductions and quantum speedups for dynamic programming. In 46th International Colloquium on Automata, Languages, and Programming (ICALP 2019). Schloss Dagstuhl-Leibniz-Zentrum fuer Informatik, 2019. Google Scholar
  2. Amir Abboud and Kevin Lewi. Exact weight subgraphs and the k-sum conjecture. In International Colloquium on Automata, Languages, and Programming (ICALP), 2013. Google Scholar
  3. Amir Abboud, Kevin Lewi, and Ryan Williams. Losing weight by gaining edges. In European Symposium on Algorithms (ESA), 2014. Google Scholar
  4. Alfred V Aho, Jeffrey D Ullman, and Mihalis Yannakakis. On notions of information transfer in vlsi circuits. In Proceedings of the fifteenth annual ACM symposium on Theory of computing, pages 133-139, 1983. Google Scholar
  5. Elad Aigner-Horev and Yury Person. On sparse random combinatorial matrices. arXiv preprint, 2020. URL:
  6. Nima Anari, Constantinos Daskalakis, Wolfgang Maass, Christos Papadimitriou, Amin Saberi, and Santosh Vempala. Smoothed analysis of discrete tensor decomposition and assemblies of neurons. In Advances in Neural Information Processing Systems, volume 31, 2018. Google Scholar
  7. Sanjeev Arora, Rong Ge, Ravindran Kannan, and Ankur Moitra. Computing a nonnegative matrix factorization-provably. In Proceedings of the forty-fourth annual ACM symposium on Theory of computing (STOC), pages 145-162., 2012.
  8. Sanjeev Arora, Rong Ge, Sushant Sachdeva, and Grant Schoenebeck. Finding overlapping communities in social networks: toward a rigorous approach. In Proceedings of the 13th ACM Conference on Electronic Commerce, pages 37-54, 2012. Google Scholar
  9. Megasthenis Asteris and Alexandros G Dimakis. Repairable fountain codes. IEEE Journal on Selected Areas in Communications, 32(5):1037-1047, 2014. Google Scholar
  10. Frank Ban, Vijay Bhattiprolu, Karl Bringmann, Pavel Kolev, Euiwoong Lee, and David P Woodruff. A ptas for lp-low rank approximation. In Proceedings of the Thirtieth Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), pages 747-766. SIAM, 2019. Google Scholar
  11. Radim Belohlavek and Vilem Vychodil. Discovery of optimal factors in binary data via a novel method of matrix decomposition. Journal of Computer and System Sciences, 76(1):3-20, 2010. Google Scholar
  12. Arnab Bhattacharyya, Piotr Indyk, David P Woodruff, and Ning Xie. The complexity of linear dependence problems in vector spaces. In Innovations in Computer Science (ICS), pages 496-508, 2011. Google Scholar
  13. Nicholas Carlini, Samuel Deng, Sanjam Garg, Somesh Jha, Saeed Mahloujifar, Mohammad Mahmoody, Shuang Song, Abhradeep Thakurta, and Florian Tramer. An attack on instahide: Is private learning possible with instance encoding? arXiv preprint, 2020. URL:
  14. David Cattanéo and Simon Perdrix. The parameterized complexity of domination-type problems and application to linear codes. In International Conference on Theory and Applications of Models of Computation, pages 86-103. Springer, 2014. Google Scholar
  15. Parinya Chalermsook, Sandy Heydrich, Eugenia Holm, and Andreas Karrenbauer. Nearly tight approximability results for minimum biclique cover and partition. In European Symposium on Algorithms (ESA), pages 235-246. Springer, 2014. Google Scholar
  16. Sunil Chandran, Davis Issac, and Andreas Karrenbauer. On the parameterized complexity of biclique cover and partition. In 11th International Symposium on Parameterized and Exact Computation (IPEC). Schloss Dagstuhl-Leibniz-Zentrum fuer Informatik, 2017. Google Scholar
  17. Sitan Chen, Xiaoxiao Li, Zhao Song, and Danyang Zhuo. On instahide, phase retrieval, and sparse matrix factorization. In ICLR. arXiv preprint, 2021. URL:
  18. Yanhua Chen, Manjeet Rege, Ming Dong, and Jing Hua. Non-negative matrix factorization for semi-supervised data clustering. Knowledge and Information Systems, 17(3):355-379, 2008. Google Scholar
  19. Jeremy E Cohen and Nicolas Gillis. Nonnegative low-rank sparse component analysis. In ICASSP 2019-2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 8226-8230. IEEE, 2019. Google Scholar
  20. Kevin p Costello and Van Vu. On the rank of random sparse matrices. Combinatorics, Probability and Computing, 19(3):321-342, 2010. Google Scholar
  21. Kevin P Costello and Van H Vu. The rank of random graphs. Random Structures & Algorithms, 33(3):269-285, 2008. Google Scholar
  22. Ruairí de Fréin, Konstantinos Drakakis, Scott Rickard, and Andrzej Cichocki. Analysis of financial data using non-negative matrix factorization. international mathematical forum, 3(38):1853-1870, 2008. Google Scholar
  23. Daniele Giorgio Degiorgi and Klaus Simon. A dynamic algorithm for line graph recognition. In International Workshop on Graph-Theoretic Concepts in Computer Science, pages 37-48. Springer, 1995. Google Scholar
  24. Chris Ding, Xiaofeng He, and Horst D Simon. On the equivalence of nonnegative matrix factorization and spectral clustering. In Proceedings of the SIAM international conference on data mining (ICDM), pages 606-610. SIAM, 2005. Google Scholar
  25. Irit Dinur and Pasin Manurangsi. Eth-hardness of approximating 2-csps and directed steiner network. arXiv preprint, 2018. URL:
  26. Radu-Alexandru Dragomir, Jérôme Bolte, and Alexandre d’Aspremont. Fast gradient methods for symmetric nonnegative matrix factorization. arXiv preprint, 2019. URL:
  27. Alessandro Epasto and Eli Upfal. Efficient approximation for restricted biclique cover problems. Algorithms, 11(6):84, 2018. Google Scholar
  28. Paul Erdös. On a lemma of littlewood and offord. Bulletin of the American Mathematical Society, 51(12):898-902, 1945. Google Scholar
  29. Paul Erdős and Alfréd Rényi. On the evolution of random graphs. Publ. Math. Inst. Hung. Acad. Sci, 5(1):17-60, 1960. Google Scholar
  30. Jeff Erickson. Lower bounds for linear satisfiability problems. In ACM-SIAM Symposium on Discrete Algorithms (SODA), pages 388-395, 1995. Google Scholar
  31. Asaf Ferber, Vishesh Jain, Kyle Luh, and Wojciech Samotij. On the counting problem in inverse littlewood-offord theory. Journal of the London Mathematical Society, 2020. Google Scholar
  32. Asaf Ferber, Matthew Kwan, and Lisa Sauermann. Singularity of sparse random matrices: simple proofs. arXiv preprint, 2020. URL:
  33. Fedor V Fomin, Petr A Golovach, Daniel Lokshtanov, Fahad Panolan, and Saket Saurabh. Approximation schemes for low-rank binary matrix approximation problems. ACM Transactions on Algorithms (TALG), 16(1):1-39, 2019. Google Scholar
  34. Yuan Gao and George Church. Improving molecular cancer class discovery through sparse non-negative matrix factorization. Bioinformatics, 21(21):3970-3975, 2005. Google Scholar
  35. Nicolas Gillis. Sparse and unique nonnegative matrix factorization through data preprocessing. The Journal of Machine Learning Research, 13(1):3349-3386, 2012. Google Scholar
  36. RA HARSHMAN. Foundations of the parafac procedure: Models and conditions for an" explanatory" multi-mode factor analysis. UCLA Working Papers in Phonetics, 16:1-84, 1970. Google Scholar
  37. Zhaoshui He, Shengli Xie, Rafal Zdunek, Guoxu Zhou, and Andrzej Cichocki. Symmetric nonnegative matrix factorization: Algorithms and applications to probabilistic clustering. IEEE Transactions on Neural Networks, 22(12):2117-2131, 2011. Google Scholar
  38. Patrik O Hoyer. Non-negative matrix factorization with sparseness constraints. Journal of machine learning research, 5(Nov):1457-1469, 2004. Google Scholar
  39. Baihe Huang, Zhao Song, Runzhou Tao, Ruizhe Zhang, and Danyang Zhuo. Instahide’s sample complexity when mixing two private images. arXiv preprint, 2020. URL:
  40. Jiaoyang Huang. Invertibility of adjacency matrices for random d-regular graphs. arXiv preprint, 2018. URL:
  41. Yangsibo Huang, Zhao Song, Kai Li, and Sanjeev Arora. Instahide: Instance-hiding schemes for private distributed learning. In International Conference on Machine Learning (ICML), pages 4507-4518, 2020. Google Scholar
  42. Michael S Jacobson, André E Kézdy, and Jenő Lehel. Recognizing intersection graphs of linear uniform hypergraphs. Graphs and Combinatorics, 13(4):359-367, 1997. Google Scholar
  43. Vishesh Jain. Approximate spielman-teng theorems for the least singular value of random combinatorial matrices. arXiv preprint, 2019. URL:
  44. Tomas Juškevičius and Grazvydas Šemetulskis. Optimal littlewood-offord inequalities in groups. arXiv preprint, 2017. URL:
  45. Vassilis Kalofolias and Efstratios Gallopoulos. Computing symmetric nonnegative rank factorizations. Linear algebra and its applications, 436(2):421-435, 2012. Google Scholar
  46. Richard M Karp. Reducibility among combinatorial problems. In Complexity of computer computations, pages 85-103. Springer, 1972. Google Scholar
  47. Jingu Kim and Haesun Park. Sparse nonnegative matrix factorization for clustering. Technical report, Georgia Institute of Technology, 2008. Google Scholar
  48. Da Kuang, Chris Ding, and Haesun Park. Symmetric nonnegative matrix factorization for graph clustering. In Proceedings of the SIAM International Conference on Data Mining (ICDM), pages 106-117. SIAM, 2012. Google Scholar
  49. Richard Kueng and Joel A Tropp. Binary component decomposition part ii: The asymmetric case. arXiv preprint, 2019. URL:
  50. Richard Kueng and Joel A Tropp. Binary component decomposition part i: the positive-semidefinite case. SIAM Journal on Mathematics of Data Science, 3(2):544-572, 2021. Google Scholar
  51. Ravi Kumar, Rina Panigrahy, Ali Rahimi, and David Woodruff. Faster algorithms for binary matrix factorization. In International Conference on Machine Learning (ICML), pages 3551-3559, 2019. Google Scholar
  52. Daniel D Lee and H Sebastian Seung. Learning the parts of objects by non-negative matrix factorization. Nature, 401(6755):788-791, 1999. Google Scholar
  53. Philippe GH Lehot. An optimal algorithm to detect a line graph and output its root graph. Journal of the ACM (JACM), 21(4):569-575, 1974. Google Scholar
  54. Sue E Leurgans, Robert T Ross, and Rebecca B Abel. A decomposition for three-way arrays. SIAM Journal on Matrix Analysis and Applications, 14(4):1064-1083, 1993. Google Scholar
  55. Dajie Liu, Stojan Trajanovski, and Piet Van Mieghem. Iligra: an efficient inverse line graph algorithm. Journal of Mathematical Modelling and Algorithms in Operations Research, 14(1):13-33, 2015. Google Scholar
  56. L Lovász. Problem, beitrag zur graphentheorie und deren auwendungen, vorgstragen auf dem intern. koll, 1977. Google Scholar
  57. Haibing Lu, Jaideep Vaidya, Vijayalakshmi Atluri, and Yuan Hong. Constraint-aware role mining via extended boolean matrix decomposition. IEEE Transactions on Dependable and Secure Computing, 9(5):655-669, 2012. Google Scholar
  58. Songtao Lu, Mingyi Hong, and Zhengdao Wang. A nonconvex splitting method for symmetric nonnegative matrix factorization: Convergence analysis and optimality. IEEE Transactions on Signal Processing, 65(12):3120-3135, 2017. Google Scholar
  59. Michael Luby. Lt codes. In The 43rd Annual IEEE Symposium on Foundations of Computer Science, 2002. Proceedings., pages 271-271. IEEE Computer Society, 2002. Google Scholar
  60. David JC MacKay. Fountain codes. IEE Proceedings-Communications, 152(6):1062-1068, 2005. Google Scholar
  61. Yury Metelsky and Regina Tyshkevich. On line graphs of linear 3-uniform hypergraphs. Journal of Graph Theory, 25(4):243-251, 1997. Google Scholar
  62. Pauli Miettinen, Taneli Mielikäinen, Aristides Gionis, Gautam Das, and Heikki Mannila. The discrete basis problem. IEEE Transactions on Knowledge and Data Engineering (TKDE), 20(10):1348-1362, 2008. Google Scholar
  63. Pauli Miettinen and Stefan Neumann. Recent developments in boolean matrix factorization. arXiv preprint, 2020. URL:
  64. Barsha Mitra, Shamik Sural, Jaideep Vaidya, and Vijayalakshmi Atluri. A survey of role mining. ACM Computing Surveys (CSUR), 48(4):1-37, 2016. Google Scholar
  65. Ankur Moitra. An almost optimal algorithm for computing nonnegative rank. In Proceedings of the Twenty-Fourth Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), pages 1454-1464. SIAM, 2013. Google Scholar
  66. Hoi H Nguyen. On the singularity of random combinatorial matrices. SIAM Journal on Discrete Mathematics, 27(1):447-458, 2013. Google Scholar
  67. Noam Nisan. Lower bounds for non-commutative computation. In Proceedings of the twenty-third annual ACM symposium on Theory of computing, pages 410-418, 1991. Google Scholar
  68. James Orlin. Contentment in graph theory: Covering graphs with cliques. Indagationes Mathematicae (Proceedings), 80(5):406-424, 1977. Google Scholar
  69. Mihai Patrascu. Towards polynomial lower bounds for dynamic problems. In Proceedings of the forty-second ACM symposium on Theory of computing (STOC), pages 603-610, 2010. Google Scholar
  70. Robert Peharz and Franz Pernkopf. Sparse nonnegative matrix factorization with 𝓁₀-constraints. Neurocomputing, 80:38-46, 2012. Google Scholar
  71. Svatopluk Poljak, Vojtěch Rödl, and Daniel Turzik. Complexity of representation of graphs by set systems. Discrete Applied Mathematics, 3(4):301-312, 1981. Google Scholar
  72. Yury Polyanskiy. Hypercontractivity of spherical averages in hamming space. SIAM Journal on Discrete Mathematics, 33(2):731-754, 2019. Google Scholar
  73. Siamak Ravanbakhsh, Barnabás Póczos, and Russell Greiner. Boolean matrix factorization and noisy completion via message passing. In Proceedings of the 33rd International Conference on International Conference on Machine Learning (ICML), pages 945-954, 2016. Google Scholar
  74. Ilya Razenshteyn, Zhao Song, and David P Woodruff. Weighted low rank approximations with provable guarantees. In Proceedings of the forty-eighth annual ACM symposium on Theory of Computing (STOC), pages 250-263, 2016. Google Scholar
  75. Nicholas D Roussopoulos. A max m , n algorithm for determining the graph h from its line graph g. Information Processing Letters, 2(4):108-112, 1973. Google Scholar
  76. Mark Rudelson and Roman Vershynin. The littlewood-offord problem and invertibility of random matrices. Advances in Mathematics, 218(2):600-633, 2008. Google Scholar
  77. Roman Sandler and Michael Lindenbaum. Nonnegative matrix factorization with earth mover’s distance metric for image analysis. IEEE Transactions on Pattern Analysis and Machine Intelligence, 33(8):1590-1602, 2011. Google Scholar
  78. Mikkel N Schmidt and Rasmus K Olsson. Single-channel speech separation using sparse non-negative matrix factorization. In Ninth International Conference on Spoken Language Processing, 2006. Google Scholar
  79. Jouni K Seppänen, Ella Bingham, and Heikki Mannila. A simple algorithm for topic identification in 0-1 data. In European Conference on Principles of Data Mining and Knowledge Discovery, pages 423-434. Springer, 2003. Google Scholar
  80. Tomáš Šingliar and Miloš Hauskrecht. Noisy-or component analysis and its application to link analysis. Journal of Machine Learning Research (JMLR), 7(Oct):2189-2213, 2006. Google Scholar
  81. PV Skums, SV Suzdal, and RI Tyshkevich. Edge intersection graphs of linear 3-uniform hypergraphs. Electronic Notes in Discrete Mathematics, 22:33-40, 2005. Google Scholar
  82. Paris Smaragdis and Judith C Brown. Non-negative matrix factorization for polyphonic music transcription. In 2003 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (IEEE Cat. No. 03TH8684), pages 177-180. IEEE, 2003. Google Scholar
  83. Zhao Song, David P Woodruff, and Peilin Zhong. Low rank approximation with entrywise l1-norm error. In Proceedings of the 49th Annual ACM SIGACT Symposium on Theory of Computing (STOC), pages 688-701, 2017. Google Scholar
  84. Zhao Song, David P Woodruff, and Peilin Zhong. Relative error tensor low rank approximation. In Proceedings of the Thirtieth Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), pages 2772-2789. SIAM, 2019. Google Scholar
  85. Maciej M Syslo. A labeling algorithm to recognize a line digraph and output its root graph. Information Processing Letters, 15(1):28-30, 1982. Google Scholar
  86. Jaideep Vaidya, Vijayalakshmi Atluri, and Qi Guo. The role mining problem: finding a minimal descriptive set of roles. In Proceedings of the 12th ACM symposium on Access control models and technologies, pages 175-184, 2007. Google Scholar
  87. Arnaud Vandaele, Nicolas Gillis, Qi Lei, Kai Zhong, and Inderjit Dhillon. Efficient and non-convex coordinate descent for symmetric nonnegative matrix factorization. IEEE Transactions on Signal Processing, 64(21):5571-5584, 2016. Google Scholar
  88. Van Vu. Random discrete matrices. In Horizons of combinatorics, pages 257-280. Springer, 2008. Google Scholar
  89. Fei Wang, Tao Li, Xin Wang, Shenghuo Zhu, and Chris Ding. Community discovery using nonnegative matrix factorization. Data Mining and Knowledge Discovery, 22(3):493-521, 2011. Google Scholar
  90. Hassler Whitney. Congruent graphs and the connectivity of graphs. In Hassler Whitney Collected Papers, pages 61-79. Springer, 1992. Google Scholar
  91. Wei Xu, Xin Liu, and Yihong Gong. Document clustering based on non-negative matrix factorization. In Proceedings of the 26th annual international ACM SIGIR conference on Research and development in informaion retrieval, pages 267-273, 2003. Google Scholar
  92. Xiaohui Yan, Jiafeng Guo, Shenghua Liu, Xueqi Cheng, and Yanfeng Wang. Learning topics in short texts by non-negative matrix factorization on term correlation matrix. In proceedings of the SIAM International Conference on Data Mining (ICDM), pages 749-757. SIAM, 2013. Google Scholar
  93. Zhirong Yang, Tele Hao, Onur Dikmen, Xi Chen, and Erkki Oja. Clustering by nonnegative matrix factorization using graph random walk. Advances in Neural Information Processing Systems (NeurIPS), 25:1079-1087, 2012. Google Scholar
  94. Ron Zass and Amnon Shashua. A unifying approach to hard and probabilistic clustering. In Tenth IEEE International Conference on Computer Vision (ICCV), pages 294-301. IEEE, 2005. Google Scholar
  95. Hongyi Zhang, Moustapha Cisse, Yann N Dauphin, and David Lopez-Paz. mixup: Beyond empirical risk minimization. arXiv preprint, 2017. URL:
  96. Zhong-Yuan Zhang, Yong Wang, and Yong-Yeol Ahn. Overlapping community detection in complex networks using symmetric binary matrix factorization. Physical Review E, 87(6):062803, 2013. Google Scholar
  97. Zhongyuan Zhang, Tao Li, Chris Ding, and Xiangsun Zhang. Binary matrix factorization with applications. In Seventh IEEE international conference on data mining (ICDM), pages 391-400. IEEE, 2007. Google Scholar
  98. Ruicong Zhi, Markus Flierl, Qiuqi Ruan, and W Bastiaan Kleijn. Graph-preserving sparse nonnegative matrix factorization with application to facial expression recognition. IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics), 41(1):38-52, 2010. Google Scholar
  99. Zhihui Zhu, Xiao Li, Kai Liu, and Qiuwei Li. Dropping symmetry for fast symmetric nonnegative matrix factorization. Advances in Neural Information Processing Systems (NeurIPS), 31:5154-5164, 2018. Google Scholar