DROPS

Document

DOI: 10.4230/LIPIcs.ITCS.2024.93

Training Multi-Layer Over-Parametrized Neural Network in Subquadratic Time

Authors: Zhao Song, Lichen Zhang, and Ruizhe Zhang

Published in: LIPIcs, Volume 287, 15th Innovations in Theoretical Computer Science Conference (ITCS 2024)

Abstract

We consider the problem of training a multi-layer over-parametrized neural network to minimize the empirical risk induced by a loss function. In the typical setting of over-parametrization, the network width m is much larger than the data dimension d and the number of training samples n (m = poly(n,d)), which induces a prohibitive large weight matrix W ∈ ℝ^{m× m} per layer. Naively, one has to pay O(m²) time to read the weight matrix and evaluate the neural network function in both forward and backward computation. In this work, we show how to reduce the training cost per iteration. Specifically, we propose a framework that uses m² cost only in the initialization phase and achieves a truly subquadratic cost per iteration in terms of m, i.e., m^{2-Ω(1)} per iteration. Our result has implications beyond standard over-parametrization theory, as it can be viewed as designing an efficient data structure on top of a pre-trained large model to further speed up the fine-tuning process, a core procedure to deploy large language models (LLM).

Cite as

Zhao Song, Lichen Zhang, and Ruizhe Zhang. Training Multi-Layer Over-Parametrized Neural Network in Subquadratic Time. In 15th Innovations in Theoretical Computer Science Conference (ITCS 2024). Leibniz International Proceedings in Informatics (LIPIcs), Volume 287, pp. 93:1-93:15, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024)

Copy BibTex To Clipboard

@InProceedings{song_et_al:LIPIcs.ITCS.2024.93,
  author =	{Song, Zhao and Zhang, Lichen and Zhang, Ruizhe},
  title =	{{Training Multi-Layer Over-Parametrized Neural Network in Subquadratic Time}},
  booktitle =	{15th Innovations in Theoretical Computer Science Conference (ITCS 2024)},
  pages =	{93:1--93:15},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-309-6},
  ISSN =	{1868-8969},
  year =	{2024},
  volume =	{287},
  editor =	{Guruswami, Venkatesan},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.ITCS.2024.93},
  URN =		{urn:nbn:de:0030-drops-196212},
  doi =		{10.4230/LIPIcs.ITCS.2024.93},
  annote =	{Keywords: Deep learning theory, Nonconvex optimization}
}

Document

Track A: Algorithms, Complexity and Games

DOI: 10.4230/LIPIcs.ICALP.2022.79

A Faster Interior-Point Method for Sum-Of-Squares Optimization

Authors: Shunhua Jiang, Bento Natura, and Omri Weinstein

Published in: LIPIcs, Volume 229, 49th International Colloquium on Automata, Languages, and Programming (ICALP 2022)

Abstract

We present a faster interior-point method for optimizing sum-of-squares (SOS) polynomials, which are a central tool in polynomial optimization and capture convex programming in the Lasserre hierarchy. Let p = ∑_i q²_i be an n-variate SOS polynomial of degree 2d. Denoting by L : = binom(n+d,d) and U : = binom(n+2d,2d) the dimensions of the vector spaces in which q_i’s and p live respectively, our algorithm runs in time Õ(LU^{1.87}). This is polynomially faster than state-of-art SOS and semidefinite programming solvers [Jiang et al., 2020; Huang et al., 2021; Papp and Yildiz, 2019], which achieve runtime Õ(L^{0.5} min{U^{2.37}, L^{4.24}}). The centerpiece of our algorithm is a dynamic data structure for maintaining the inverse of the Hessian of the SOS barrier function under the polynomial interpolant basis [Papp and Yildiz, 2019], which efficiently extends to multivariate SOS optimization, and requires maintaining spectral approximations to low-rank perturbations of elementwise (Hadamard) products. This is the main challenge and departure from recent IPM breakthroughs using inverse-maintenance, where low-rank updates to the slack matrix readily imply the same for the Hessian matrix.

Cite as

Shunhua Jiang, Bento Natura, and Omri Weinstein. A Faster Interior-Point Method for Sum-Of-Squares Optimization. In 49th International Colloquium on Automata, Languages, and Programming (ICALP 2022). Leibniz International Proceedings in Informatics (LIPIcs), Volume 229, pp. 79:1-79:20, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2022)

Copy BibTex To Clipboard

@InProceedings{jiang_et_al:LIPIcs.ICALP.2022.79,
  author =	{Jiang, Shunhua and Natura, Bento and Weinstein, Omri},
  title =	{{A Faster Interior-Point Method for Sum-Of-Squares Optimization}},
  booktitle =	{49th International Colloquium on Automata, Languages, and Programming (ICALP 2022)},
  pages =	{79:1--79:20},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-235-8},
  ISSN =	{1868-8969},
  year =	{2022},
  volume =	{229},
  editor =	{Boja\'{n}czyk, Miko{\l}aj and Merelli, Emanuela and Woodruff, David P.},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.ICALP.2022.79},
  URN =		{urn:nbn:de:0030-drops-164205},
  doi =		{10.4230/LIPIcs.ICALP.2022.79},
  annote =	{Keywords: Interior Point Methods, Sum-of-squares Optimization, Dynamic Matrix Inverse}
}

Document

DOI: 10.4230/LIPIcs.ITCS.2021.44

Total Functions in the Polynomial Hierarchy

Authors: Robert Kleinberg, Oliver Korten, Daniel Mitropolsky, and Christos Papadimitriou

Published in: LIPIcs, Volume 185, 12th Innovations in Theoretical Computer Science Conference (ITCS 2021)

Abstract

We identify several genres of search problems beyond NP for which existence of solutions is guaranteed. One class that seems especially rich in such problems is PEPP (for "polynomial empty pigeonhole principle"), which includes problems related to existence theorems proved through the union bound, such as finding a bit string that is far from all codewords, finding an explicit rigid matrix, as well as a problem we call Complexity, capturing Complexity Theory’s quest. When the union bound is generous, in that solutions constitute at least a polynomial fraction of the domain, we have a family of seemingly weaker classes α-PEPP, which are inside FP^NP|poly. Higher in the hierarchy, we identify the constructive version of the Sauer-Shelah lemma and the appropriate generalization of PPP that contains it, as well as the problem of finding a king in a tournament (a vertex k such that all other vertices are defeated by k, or by somebody k defeated).

Cite as

Robert Kleinberg, Oliver Korten, Daniel Mitropolsky, and Christos Papadimitriou. Total Functions in the Polynomial Hierarchy. In 12th Innovations in Theoretical Computer Science Conference (ITCS 2021). Leibniz International Proceedings in Informatics (LIPIcs), Volume 185, pp. 44:1-44:18, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2021)

Copy BibTex To Clipboard

@InProceedings{kleinberg_et_al:LIPIcs.ITCS.2021.44,
  author =	{Kleinberg, Robert and Korten, Oliver and Mitropolsky, Daniel and Papadimitriou, Christos},
  title =	{{Total Functions in the Polynomial Hierarchy}},
  booktitle =	{12th Innovations in Theoretical Computer Science Conference (ITCS 2021)},
  pages =	{44:1--44:18},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-177-1},
  ISSN =	{1868-8969},
  year =	{2021},
  volume =	{185},
  editor =	{Lee, James R.},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.ITCS.2021.44},
  URN =		{urn:nbn:de:0030-drops-135835},
  doi =		{10.4230/LIPIcs.ITCS.2021.44},
  annote =	{Keywords: total complexity, polynomial hierarchy, pigeonhole principle}
}

Document

DOI: 10.4230/LIPIcs.ITCS.2021.63

Training (Overparametrized) Neural Networks in Near-Linear Time

Authors: Jan van den Brand, Binghui Peng, Zhao Song, and Omri Weinstein

Published in: LIPIcs, Volume 185, 12th Innovations in Theoretical Computer Science Conference (ITCS 2021)

Abstract

The slow convergence rate and pathological curvature issues of first-order gradient methods for training deep neural networks, initiated an ongoing effort for developing faster second-order optimization algorithms beyond SGD, without compromising the generalization error. Despite their remarkable convergence rate (independent of the training batch size n), second-order algorithms incur a daunting slowdown in the cost per iteration (inverting the Hessian matrix of the loss function), which renders them impractical. Very recently, this computational overhead was mitigated by the works of [Zhang et al., 2019; Cai et al., 2019], yielding an O(mn²)-time second-order algorithm for training two-layer overparametrized neural networks of polynomial width m. We show how to speed up the algorithm of [Cai et al., 2019], achieving an Õ(mn)-time backpropagation algorithm for training (mildly overparametrized) ReLU networks, which is near-linear in the dimension (mn) of the full gradient (Jacobian) matrix. The centerpiece of our algorithm is to reformulate the Gauss-Newton iteration as an 𝓁₂-regression problem, and then use a Fast-JL type dimension reduction to precondition the underlying Gram matrix in time independent of M, allowing to find a sufficiently good approximate solution via first-order conjugate gradient. Our result provides a proof-of-concept that advanced machinery from randomized linear algebra - which led to recent breakthroughs in convex optimization (ERM, LPs, Regression) - can be carried over to the realm of deep learning as well.

Cite as

Jan van den Brand, Binghui Peng, Zhao Song, and Omri Weinstein. Training (Overparametrized) Neural Networks in Near-Linear Time. In 12th Innovations in Theoretical Computer Science Conference (ITCS 2021). Leibniz International Proceedings in Informatics (LIPIcs), Volume 185, pp. 63:1-63:15, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2021)

Copy BibTex To Clipboard

@InProceedings{vandenbrand_et_al:LIPIcs.ITCS.2021.63,
  author =	{van den Brand, Jan and Peng, Binghui and Song, Zhao and Weinstein, Omri},
  title =	{{Training (Overparametrized) Neural Networks in Near-Linear Time}},
  booktitle =	{12th Innovations in Theoretical Computer Science Conference (ITCS 2021)},
  pages =	{63:1--63:15},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-177-1},
  ISSN =	{1868-8969},
  year =	{2021},
  volume =	{185},
  editor =	{Lee, James R.},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.ITCS.2021.63},
  URN =		{urn:nbn:de:0030-drops-136025},
  doi =		{10.4230/LIPIcs.ITCS.2021.63},
  annote =	{Keywords: Deep learning theory, Nonconvex optimization}
}

Document

DOI: 10.4230/LIPIcs.ESA.2020.68

Settling the Relationship Between Wilber’s Bounds for Dynamic Optimality

Authors: Victor Lecomte and Omri Weinstein

Published in: LIPIcs, Volume 173, 28th Annual European Symposium on Algorithms (ESA 2020)

Abstract

In FOCS 1986, Wilber proposed two combinatorial lower bounds on the operational cost of any binary search tree (BST) for a given access sequence X ∈ [n]^m. Both bounds play a central role in the ongoing pursuit of the dynamic optimality conjecture (Sleator and Tarjan, 1985), but their relationship remained unknown for more than three decades. We show that Wilber’s Funnel bound dominates his Alternation bound for all X, and give a tight Θ(lg lg n) separation for some X, answering Wilber’s conjecture and an open problem of Iacono, Demaine et. al. The main ingredient of the proof is a new symmetric characterization of Wilber’s Funnel bound, which proves that it is invariant under rotations of X. We use this characterization to provide initial indication that the Funnel bound matches the Independent Rectangle bound (Demaine et al., 2009), by proving that when the Funnel bound is constant, IRB_upRect is linear. To the best of our knowledge, our results provide the first progress on Wilber’s conjecture that the Funnel bound is dynamically optimal (1986).

Cite as

Victor Lecomte and Omri Weinstein. Settling the Relationship Between Wilber’s Bounds for Dynamic Optimality. In 28th Annual European Symposium on Algorithms (ESA 2020). Leibniz International Proceedings in Informatics (LIPIcs), Volume 173, pp. 68:1-68:21, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2020)

Copy BibTex To Clipboard

@InProceedings{lecomte_et_al:LIPIcs.ESA.2020.68,
  author =	{Lecomte, Victor and Weinstein, Omri},
  title =	{{Settling the Relationship Between Wilber’s Bounds for Dynamic Optimality}},
  booktitle =	{28th Annual European Symposium on Algorithms (ESA 2020)},
  pages =	{68:1--68:21},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-162-7},
  ISSN =	{1868-8969},
  year =	{2020},
  volume =	{173},
  editor =	{Grandoni, Fabrizio and Herman, Grzegorz and Sanders, Peter},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.ESA.2020.68},
  URN =		{urn:nbn:de:0030-drops-129342},
  doi =		{10.4230/LIPIcs.ESA.2020.68},
  annote =	{Keywords: data structures, binary search trees, dynamic optimality, lower bounds}
}

Document

DOI: 10.4230/LIPIcs.APPROX-RANDOM.2017.46

The Minrank of Random Graphs

Authors: Alexander Golovnev, Oded Regev, and Omri Weinstein

Published in: LIPIcs, Volume 81, Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2017)

Abstract

The minrank of a directed graph G is the minimum rank of a matrix M that can be obtained from the adjacency matrix of G by switching some ones to zeros (i.e., deleting edges) and then setting all diagonal entries to one. This quantity is closely related to the fundamental information-theoretic problems of (linear) index coding (Bar-Yossef et al., FOCS'06), network coding and distributed storage, and to Valiant's approach for proving superlinear circuit lower bounds (Valiant, Boolean Function Complexity '92). We prove tight bounds on the minrank of directed Erdos-Renyi random graphs G(n,p) for all regimes of 0<p<1. In particular, for any constant p, we show that minrk(G) = Theta(n/log n) with high probability, where G is chosen from G(n,p). This bound gives a near quadratic improvement over the previous best lower bound of Omega(sqrt{n}) (Haviv and Langberg, ISIT'12), and partially settles an open problem raised by Lubetzky and Stav (FOCS '07). Our lower bound matches the well-known upper bound obtained by the "clique covering" solution, and settles the linear index coding problem for random graphs. Finally, our result suggests a new avenue of attack, via derandomization, on Valiant's approach for proving superlinear lower bounds for logarithmic-depth semilinear circuits.

Cite as

Alexander Golovnev, Oded Regev, and Omri Weinstein. The Minrank of Random Graphs. In Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2017). Leibniz International Proceedings in Informatics (LIPIcs), Volume 81, pp. 46:1-46:13, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2017)

Copy BibTex To Clipboard

@InProceedings{golovnev_et_al:LIPIcs.APPROX-RANDOM.2017.46,
  author =	{Golovnev, Alexander and Regev, Oded and Weinstein, Omri},
  title =	{{The Minrank of Random Graphs}},
  booktitle =	{Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2017)},
  pages =	{46:1--46:13},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-044-6},
  ISSN =	{1868-8969},
  year =	{2017},
  volume =	{81},
  editor =	{Jansen, Klaus and Rolim, Jos\'{e} D. P. and Williamson, David P. and Vempala, Santosh S.},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.APPROX-RANDOM.2017.46},
  URN =		{urn:nbn:de:0030-drops-75953},
  doi =		{10.4230/LIPIcs.APPROX-RANDOM.2017.46},
  annote =	{Keywords: circuit complexity, index coding, information theory}
}

Document

DOI: 10.4230/LIPIcs.ESA.2016.41

Distributed Signaling Games

Authors: Moran Feldman, Moshe Tennenholtz, and Omri Weinstein

Published in: LIPIcs, Volume 57, 24th Annual European Symposium on Algorithms (ESA 2016)

Abstract

The study of the algorithmic and computational complexity of designing efficient signaling schemes for mechanisms aiming to optimize social welfare or revenue is a recurring theme in recent computer science literature. In reality, however, information is typically not held by a central authority, but is distributed among multiple sources (third-party "mediators"), a fact that dramatically changes the strategic and combinatorial nature of the signaling problem. In this paper we introduce distributed signaling games, while using display advertising as a canonical example for introducing this foundational framework. A distributed signaling game may be a pure coordination game (i.e., a distributed optimization task), or a non-cooperative game. In the context of pure coordination games, we show a wide gap between the computational complexity of the centralized and distributed signaling problems, proving that distributed coordination on revenue-optimal signaling is a much harder problem than its "centralized" counterpart. In the context of non-cooperative games, the outcome generated by the mediators' signals may have different value to each. The reason for that is typically the desire of the auctioneer to align the incentives of the mediators with his own by a compensation relative to the marginal benefit from their signals. We design a mechanism for this problem via a novel application of Shapley's value, and show that it possesses a few interesting economical properties.

Cite as

Moran Feldman, Moshe Tennenholtz, and Omri Weinstein. Distributed Signaling Games. In 24th Annual European Symposium on Algorithms (ESA 2016). Leibniz International Proceedings in Informatics (LIPIcs), Volume 57, pp. 41:1-41:16, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2016)

Copy BibTex To Clipboard

@InProceedings{feldman_et_al:LIPIcs.ESA.2016.41,
  author =	{Feldman, Moran and Tennenholtz, Moshe and Weinstein, Omri},
  title =	{{Distributed Signaling Games}},
  booktitle =	{24th Annual European Symposium on Algorithms (ESA 2016)},
  pages =	{41:1--41:16},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-015-6},
  ISSN =	{1868-8969},
  year =	{2016},
  volume =	{57},
  editor =	{Sankowski, Piotr and Zaroliagis, Christos},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.ESA.2016.41},
  URN =		{urn:nbn:de:0030-drops-63536},
  doi =		{10.4230/LIPIcs.ESA.2016.41},
  annote =	{Keywords: Signaling, display advertising, mechanism design, shapley value}
}

7 Search Results for "Weinstein, Omri"

Training Multi-Layer Over-Parametrized Neural Network in Subquadratic Time

Abstract

Cite as

A Faster Interior-Point Method for Sum-Of-Squares Optimization

Abstract

Cite as

Total Functions in the Polynomial Hierarchy

Abstract

Cite as

Training (Overparametrized) Neural Networks in Near-Linear Time

Abstract

Cite as

Settling the Relationship Between Wilber’s Bounds for Dynamic Optimality

Abstract

Cite as

The Minrank of Random Graphs

Abstract

Cite as

Distributed Signaling Games

Abstract

Cite as

Thanks for your feedback!

Could not send message