DROPS

Document

DOI: 10.4230/LIPIcs.ITCS.2024.93

Training Multi-Layer Over-Parametrized Neural Network in Subquadratic Time

Authors: Zhao Song, Lichen Zhang, and Ruizhe Zhang

Published in: LIPIcs, Volume 287, 15th Innovations in Theoretical Computer Science Conference (ITCS 2024)

Abstract

We consider the problem of training a multi-layer over-parametrized neural network to minimize the empirical risk induced by a loss function. In the typical setting of over-parametrization, the network width m is much larger than the data dimension d and the number of training samples n (m = poly(n,d)), which induces a prohibitive large weight matrix W ∈ ℝ^{m× m} per layer. Naively, one has to pay O(m²) time to read the weight matrix and evaluate the neural network function in both forward and backward computation. In this work, we show how to reduce the training cost per iteration. Specifically, we propose a framework that uses m² cost only in the initialization phase and achieves a truly subquadratic cost per iteration in terms of m, i.e., m^{2-Ω(1)} per iteration. Our result has implications beyond standard over-parametrization theory, as it can be viewed as designing an efficient data structure on top of a pre-trained large model to further speed up the fine-tuning process, a core procedure to deploy large language models (LLM).

Cite as

Zhao Song, Lichen Zhang, and Ruizhe Zhang. Training Multi-Layer Over-Parametrized Neural Network in Subquadratic Time. In 15th Innovations in Theoretical Computer Science Conference (ITCS 2024). Leibniz International Proceedings in Informatics (LIPIcs), Volume 287, pp. 93:1-93:15, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024)

Copy BibTex To Clipboard

@InProceedings{song_et_al:LIPIcs.ITCS.2024.93,
  author =	{Song, Zhao and Zhang, Lichen and Zhang, Ruizhe},
  title =	{{Training Multi-Layer Over-Parametrized Neural Network in Subquadratic Time}},
  booktitle =	{15th Innovations in Theoretical Computer Science Conference (ITCS 2024)},
  pages =	{93:1--93:15},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-309-6},
  ISSN =	{1868-8969},
  year =	{2024},
  volume =	{287},
  editor =	{Guruswami, Venkatesan},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ITCS.2024.93},
  URN =		{urn:nbn:de:0030-drops-196212},
  doi =		{10.4230/LIPIcs.ITCS.2024.93},
  annote =	{Keywords: Deep learning theory, Nonconvex optimization}
}

Document

Track A: Algorithms, Complexity and Games

DOI: 10.4230/LIPIcs.ICALP.2023.108

A Hyperbolic Extension of Kadison-Singer Type Results

Authors: Ruizhe Zhang and Xinzhi Zhang

Published in: LIPIcs, Volume 261, 50th International Colloquium on Automata, Languages, and Programming (ICALP 2023)

Abstract

In 2013, Marcus, Spielman, and Srivastava resolved the famous Kadison-Singer conjecture. It states that for n independent random vectors v_1,⋯, v_n that have expected squared norm bounded by ε and are in the isotropic position in expectation, there is a positive probability that the determinant polynomial det(xI - ∑_{i=1}^n v_i v_i^⊤) has roots bounded by (1 + √ε)². An interpretation of the Kadison-Singer theorem is that we can always find a partition of the vectors v_1,⋯,v_n into two sets with a low discrepancy in terms of the spectral norm (in other words, rely on the determinant polynomial). In this paper, we provide two results for a broader class of polynomials, the hyperbolic polynomials. Furthermore, our results are in two generalized settings: - The first one shows that the Kadison-Singer result requires a weaker assumption that the vectors have a bounded sum of hyperbolic norms. - The second one relaxes the Kadison-Singer result’s distribution assumption to the Strongly Rayleigh distribution. To the best of our knowledge, the previous results only support determinant polynomials [Anari and Oveis Gharan'14, Kyng, Luh and Song'20]. It is unclear whether they can be generalized to a broader class of polynomials. In addition, we also provide a sub-exponential time algorithm for constructing our results.

Cite as

Ruizhe Zhang and Xinzhi Zhang. A Hyperbolic Extension of Kadison-Singer Type Results. In 50th International Colloquium on Automata, Languages, and Programming (ICALP 2023). Leibniz International Proceedings in Informatics (LIPIcs), Volume 261, pp. 108:1-108:14, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2023)

Copy BibTex To Clipboard

@InProceedings{zhang_et_al:LIPIcs.ICALP.2023.108,
  author =	{Zhang, Ruizhe and Zhang, Xinzhi},
  title =	{{A Hyperbolic Extension of Kadison-Singer Type Results}},
  booktitle =	{50th International Colloquium on Automata, Languages, and Programming (ICALP 2023)},
  pages =	{108:1--108:14},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-278-5},
  ISSN =	{1868-8969},
  year =	{2023},
  volume =	{261},
  editor =	{Etessami, Kousha and Feige, Uriel and Puppis, Gabriele},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICALP.2023.108},
  URN =		{urn:nbn:de:0030-drops-181606},
  doi =		{10.4230/LIPIcs.ICALP.2023.108},
  annote =	{Keywords: Kadison-Singer conjecture, Hyperbolic polynomials, Strongly-Rayleigh distributions, Interlacing polynomials}
}

Document

RANDOM

DOI: 10.4230/LIPIcs.APPROX/RANDOM.2022.10

Hyperbolic Concentration, Anti-Concentration, and Discrepancy

Authors: Zhao Song and Ruizhe Zhang

Published in: LIPIcs, Volume 245, Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2022)

Abstract

Chernoff bound is a fundamental tool in theoretical computer science. It has been extensively used in randomized algorithm design and stochastic type analysis. Discrepancy theory, which deals with finding a bi-coloring of a set system such that the coloring of each set is balanced, has a huge number of applications in approximation algorithms design. Chernoff bound [Che52] implies that a random bi-coloring of any set system with n sets and n elements will have discrepancy O(√{n log n}) with high probability, while the famous result by Spencer [Spe85] shows that there exists an O(√n) discrepancy solution. The study of hyperbolic polynomials dates back to the early 20th century when used to solve PDEs by Gårding [Går59]. In recent years, more applications are found in control theory, optimization, real algebraic geometry, and so on. In particular, the breakthrough result by Marcus, Spielman, and Srivastava [MSS15] uses the theory of hyperbolic polynomials to prove the Kadison-Singer conjecture [KS59], which is closely related to discrepancy theory. In this paper, we present a list of new results for hyperbolic polynomials: - We show two nearly optimal hyperbolic Chernoff bounds: one for Rademacher sum of arbitrary vectors and another for random vectors in the hyperbolic cone. - We show a hyperbolic anti-concentration bound. - We generalize the hyperbolic Kadison-Singer theorem [Brä18] for vectors in sub-isotropic position, and prove a hyperbolic Spencer theorem for any constant hyperbolic rank vectors. The classical matrix Chernoff and discrepancy results are based on determinant polynomial which is a special case of hyperbolic polynomials. To the best of our knowledge, this paper is the first work that shows either concentration or anti-concentration results for hyperbolic polynomials. We hope our findings provide more insights into hyperbolic and discrepancy theories.

Cite as

Zhao Song and Ruizhe Zhang. Hyperbolic Concentration, Anti-Concentration, and Discrepancy. In Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2022). Leibniz International Proceedings in Informatics (LIPIcs), Volume 245, pp. 10:1-10:19, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2022)

Copy BibTex To Clipboard

@InProceedings{song_et_al:LIPIcs.APPROX/RANDOM.2022.10,
  author =	{Song, Zhao and Zhang, Ruizhe},
  title =	{{Hyperbolic Concentration, Anti-Concentration, and Discrepancy}},
  booktitle =	{Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2022)},
  pages =	{10:1--10:19},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-249-5},
  ISSN =	{1868-8969},
  year =	{2022},
  volume =	{245},
  editor =	{Chakrabarti, Amit and Swamy, Chaitanya},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.APPROX/RANDOM.2022.10},
  URN =		{urn:nbn:de:0030-drops-171324},
  doi =		{10.4230/LIPIcs.APPROX/RANDOM.2022.10},
  annote =	{Keywords: Hyperbolic polynomial, Chernoff bound, Concentration, Discrepancy theory, Anti-concentration}
}

Document

RANDOM

DOI: 10.4230/LIPIcs.APPROX/RANDOM.2022.16

Eigenstripping, Spectral Decay, and Edge-Expansion on Posets

Authors: Jason Gaitonde, Max Hopkins, Tali Kaufman, Shachar Lovett, and Ruizhe Zhang

Published in: LIPIcs, Volume 245, Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2022)

Abstract

Fast mixing of random walks on hypergraphs (simplicial complexes) has recently led to myriad breakthroughs throughout theoretical computer science. Many important applications, however, (e.g. to LTCs, 2-2 games) rely on a more general class of underlying structures called posets, and crucially take advantage of non-simplicial structure. These works make it clear that the global expansion properties of posets depend strongly on their underlying architecture (e.g. simplicial, cubical, linear algebraic), but the overall phenomenon remains poorly understood. In this work, we quantify the advantage of different poset architectures in both a spectral and combinatorial sense, highlighting how regularity controls the spectral decay and edge-expansion of corresponding random walks. We show that the spectra of walks on expanding posets (Dikstein, Dinur, Filmus, Harsha APPROX-RANDOM 2018) concentrate in strips around a small number of approximate eigenvalues controlled by the regularity of the underlying poset. This gives a simple condition to identify poset architectures (e.g. the Grassmann) that exhibit strong (even exponential) decay of eigenvalues, versus architectures like hypergraphs whose eigenvalues decay linearly - a crucial distinction in applications to hardness of approximation and agreement testing such as the recent proof of the 2-2 Games Conjecture (Khot, Minzer, Safra FOCS 2018). We show these results lead to a tight characterization of edge-expansion on expanding posets in the 𝓁₂-regime (generalizing recent work of Bafna, Hopkins, Kaufman, and Lovett (SODA 2022)), and pay special attention to the case of the Grassmann where we show our results are tight for a natural set of sparsifications of the Grassmann graphs. We note for clarity that our results do not recover the characterization of expansion used in the proof of the 2-2 Games Conjecture which relies on 𝓁_∞ rather than 𝓁₂-structure.

Cite as

Jason Gaitonde, Max Hopkins, Tali Kaufman, Shachar Lovett, and Ruizhe Zhang. Eigenstripping, Spectral Decay, and Edge-Expansion on Posets. In Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2022). Leibniz International Proceedings in Informatics (LIPIcs), Volume 245, pp. 16:1-16:24, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2022)

Copy BibTex To Clipboard

@InProceedings{gaitonde_et_al:LIPIcs.APPROX/RANDOM.2022.16,
  author =	{Gaitonde, Jason and Hopkins, Max and Kaufman, Tali and Lovett, Shachar and Zhang, Ruizhe},
  title =	{{Eigenstripping, Spectral Decay, and Edge-Expansion on Posets}},
  booktitle =	{Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2022)},
  pages =	{16:1--16:24},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-249-5},
  ISSN =	{1868-8969},
  year =	{2022},
  volume =	{245},
  editor =	{Chakrabarti, Amit and Swamy, Chaitanya},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.APPROX/RANDOM.2022.16},
  URN =		{urn:nbn:de:0030-drops-171381},
  doi =		{10.4230/LIPIcs.APPROX/RANDOM.2022.16},
  annote =	{Keywords: High-dimensional expanders, posets, eposets}
}

Document

DOI: 10.4230/LIPIcs.ITCS.2022.46

Symmetric Sparse Boolean Matrix Factorization and Applications

Authors: Sitan Chen, Zhao Song, Runzhou Tao, and Ruizhe Zhang

Published in: LIPIcs, Volume 215, 13th Innovations in Theoretical Computer Science Conference (ITCS 2022)

Abstract

In this work, we study a variant of nonnegative matrix factorization where we wish to find a symmetric factorization of a given input matrix into a sparse, Boolean matrix. Formally speaking, given {𝐌} ∈ {ℤ}^{m× m}, we want to find {𝐖} ∈ {0,1}^{m× r} such that ‖ {𝐌} - {𝐖} {𝐖}^⊤ ‖₀ is minimized among all {𝐖} for which each row is k-sparse. This question turns out to be closely related to a number of questions like recovering a hypergraph from its line graph, as well as reconstruction attacks for private neural network training. As this problem is hard in the worst-case, we study a natural average-case variant that arises in the context of these reconstruction attacks: {𝐌} = {𝐖} {𝐖}^{⊤} for {𝐖} a random Boolean matrix with k-sparse rows, and the goal is to recover {𝐖} up to column permutation. Equivalently, this can be thought of as recovering a uniformly random k-uniform hypergraph from its line graph. Our main result is a polynomial-time algorithm for this problem based on bootstrapping higher-order information about {𝐖} and then decomposing an appropriate tensor. The key ingredient in our analysis, which may be of independent interest, is to show that such a matrix {𝐖} has full column rank with high probability as soon as m = Ω̃(r), which we do using tools from Littlewood-Offord theory and estimates for binary Krawtchouk polynomials.

Cite as

Sitan Chen, Zhao Song, Runzhou Tao, and Ruizhe Zhang. Symmetric Sparse Boolean Matrix Factorization and Applications. In 13th Innovations in Theoretical Computer Science Conference (ITCS 2022). Leibniz International Proceedings in Informatics (LIPIcs), Volume 215, pp. 46:1-46:25, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2022)

Copy BibTex To Clipboard

@InProceedings{chen_et_al:LIPIcs.ITCS.2022.46,
  author =	{Chen, Sitan and Song, Zhao and Tao, Runzhou and Zhang, Ruizhe},
  title =	{{Symmetric Sparse Boolean Matrix Factorization and Applications}},
  booktitle =	{13th Innovations in Theoretical Computer Science Conference (ITCS 2022)},
  pages =	{46:1--46:25},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-217-4},
  ISSN =	{1868-8969},
  year =	{2022},
  volume =	{215},
  editor =	{Braverman, Mark},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ITCS.2022.46},
  URN =		{urn:nbn:de:0030-drops-156422},
  doi =		{10.4230/LIPIcs.ITCS.2022.46},
  annote =	{Keywords: Matrix factorization, tensors, random matrices, average-case complexity}
}

Document

DOI: 10.4230/LIPIcs.ITCS.2022.47

Quantum Meets the Minimum Circuit Size Problem

Authors: Nai-Hui Chia, Chi-Ning Chou, Jiayu Zhang, and Ruizhe Zhang

Published in: LIPIcs, Volume 215, 13th Innovations in Theoretical Computer Science Conference (ITCS 2022)

Abstract

In this work, we initiate the study of the Minimum Circuit Size Problem (MCSP) in the quantum setting. MCSP is a problem to compute the circuit complexity of Boolean functions. It is a fascinating problem in complexity theory - its hardness is mysterious, and a better understanding of its hardness can have surprising implications to many fields in computer science. We first define and investigate the basic complexity-theoretic properties of minimum quantum circuit size problems for three natural objects: Boolean functions, unitaries, and quantum states. We show that these problems are not trivially in NP but in QCMA (or have QCMA protocols). Next, we explore the relations between the three quantum MCSPs and their variants. We discover that some reductions that are not known for classical MCSP exist for quantum MCSPs for unitaries and states, e.g., search-to-decision reductions and self-reductions. Finally, we systematically generalize results known for classical MCSP to the quantum setting (including quantum cryptography, quantum learning theory, quantum circuit lower bounds, and quantum fine-grained complexity) and also find new connections to tomography and quantum gravity. Due to the fundamental differences between classical and quantum circuits, most of our results require extra care and reveal properties and phenomena unique to the quantum setting. Our findings could be of interest for future studies, and we post several open problems for further exploration along this direction.

Cite as

Nai-Hui Chia, Chi-Ning Chou, Jiayu Zhang, and Ruizhe Zhang. Quantum Meets the Minimum Circuit Size Problem. In 13th Innovations in Theoretical Computer Science Conference (ITCS 2022). Leibniz International Proceedings in Informatics (LIPIcs), Volume 215, pp. 47:1-47:16, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2022)

Copy BibTex To Clipboard

@InProceedings{chia_et_al:LIPIcs.ITCS.2022.47,
  author =	{Chia, Nai-Hui and Chou, Chi-Ning and Zhang, Jiayu and Zhang, Ruizhe},
  title =	{{Quantum Meets the Minimum Circuit Size Problem}},
  booktitle =	{13th Innovations in Theoretical Computer Science Conference (ITCS 2022)},
  pages =	{47:1--47:16},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-217-4},
  ISSN =	{1868-8969},
  year =	{2022},
  volume =	{215},
  editor =	{Braverman, Mark},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ITCS.2022.47},
  URN =		{urn:nbn:de:0030-drops-156433},
  doi =		{10.4230/LIPIcs.ITCS.2022.47},
  annote =	{Keywords: Quantum Computation, Quantum Complexity, Minimum Circuit Size Problem}
}

Document

DOI: 10.4230/LIPIcs.CCC.2020.16

On the Quantum Complexity of Closest Pair and Related Problems

Authors: Scott Aaronson, Nai-Hui Chia, Han-Hsuan Lin, Chunhao Wang, and Ruizhe Zhang

Published in: LIPIcs, Volume 169, 35th Computational Complexity Conference (CCC 2020)

Abstract

The closest pair problem is a fundamental problem of computational geometry: given a set of n points in a d-dimensional space, find a pair with the smallest distance. A classical algorithm taught in introductory courses solves this problem in O(n log n) time in constant dimensions (i.e., when d = O(1)). This paper asks and answers the question of the problem’s quantum time complexity. Specifically, we give an Õ(n^(2/3)) algorithm in constant dimensions, which is optimal up to a polylogarithmic factor by the lower bound on the quantum query complexity of element distinctness. The key to our algorithm is an efficient history-independent data structure that supports quantum interference. In polylog(n) dimensions, no known quantum algorithms perform better than brute force search, with a quadratic speedup provided by Grover’s algorithm. To give evidence that the quadratic speedup is nearly optimal, we initiate the study of quantum fine-grained complexity and introduce the Quantum Strong Exponential Time Hypothesis (QSETH), which is based on the assumption that Grover’s algorithm is optimal for CNF-SAT when the clause width is large. We show that the naïve Grover approach to closest pair in higher dimensions is optimal up to an n^o(1) factor unless QSETH is false. We also study the bichromatic closest pair problem and the orthogonal vectors problem, with broadly similar results.

Cite as

Scott Aaronson, Nai-Hui Chia, Han-Hsuan Lin, Chunhao Wang, and Ruizhe Zhang. On the Quantum Complexity of Closest Pair and Related Problems. In 35th Computational Complexity Conference (CCC 2020). Leibniz International Proceedings in Informatics (LIPIcs), Volume 169, pp. 16:1-16:43, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2020)

Copy BibTex To Clipboard

@InProceedings{aaronson_et_al:LIPIcs.CCC.2020.16,
  author =	{Aaronson, Scott and Chia, Nai-Hui and Lin, Han-Hsuan and Wang, Chunhao and Zhang, Ruizhe},
  title =	{{On the Quantum Complexity of Closest Pair and Related Problems}},
  booktitle =	{35th Computational Complexity Conference (CCC 2020)},
  pages =	{16:1--16:43},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-156-6},
  ISSN =	{1868-8969},
  year =	{2020},
  volume =	{169},
  editor =	{Saraf, Shubhangi},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CCC.2020.16},
  URN =		{urn:nbn:de:0030-drops-125681},
  doi =		{10.4230/LIPIcs.CCC.2020.16},
  annote =	{Keywords: Closest pair, Quantum computing, Quantum fine grained reduction, Quantum strong exponential time hypothesis, Fine grained complexity}
}

Search Results

Documents authored by Zhang, Ruizhe

Training Multi-Layer Over-Parametrized Neural Network in Subquadratic Time

Abstract

Cite as

A Hyperbolic Extension of Kadison-Singer Type Results

Abstract

Cite as

Hyperbolic Concentration, Anti-Concentration, and Discrepancy

Abstract

Cite as

Eigenstripping, Spectral Decay, and Edge-Expansion on Posets

Abstract

Cite as

Symmetric Sparse Boolean Matrix Factorization and Applications

Abstract

Cite as

Quantum Meets the Minimum Circuit Size Problem

Abstract

Cite as

On the Quantum Complexity of Closest Pair and Related Problems

Abstract

Cite as

Thanks for your feedback!

Could not send message