7 Search Results for "Lee, Yin Tat"


Document
Training Multi-Layer Over-Parametrized Neural Network in Subquadratic Time

Authors: Zhao Song, Lichen Zhang, and Ruizhe Zhang

Published in: LIPIcs, Volume 287, 15th Innovations in Theoretical Computer Science Conference (ITCS 2024)


Abstract
We consider the problem of training a multi-layer over-parametrized neural network to minimize the empirical risk induced by a loss function. In the typical setting of over-parametrization, the network width m is much larger than the data dimension d and the number of training samples n (m = poly(n,d)), which induces a prohibitively large weight matrix W ∈ ℝ^{m×m} per layer. Naively, one has to pay O(m²) time to read the weight matrix and evaluate the neural network function in both forward and backward computation. In this work, we show how to reduce the training cost per iteration. Specifically, we propose a framework that pays the m² cost only in the initialization phase and achieves truly subquadratic cost per iteration in terms of m, i.e., m^{2-Ω(1)}. Our result has implications beyond standard over-parametrization theory, as it can be viewed as designing an efficient data structure on top of a pre-trained large model to further speed up fine-tuning, a core procedure in deploying large language models (LLMs).
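
As a rough, hypothetical illustration of the bottleneck described above (not the paper's data structure; the sparsity pattern here is faked), the following Python sketch contrasts a naive forward step, which reads all m² entries of W, with a structured step that only touches the k ≪ m rows of W belonging to neurons assumed to activate:

import numpy as np

m, k = 4096, 128                                # width m; assume only k << m neurons fire
rng = np.random.default_rng(0)
W = rng.standard_normal((m, m)) / np.sqrt(m)    # one layer's m x m weight matrix
x = rng.standard_normal(m)

# Naive forward step: O(m^2) work, since every entry of W is read.
dense_out = np.maximum(W @ x, 0.0)

# Hypothetical structured step: if an initialization-time data structure could
# report which neurons may fire on this input (faked here by a random subset),
# the matrix-vector product would cost O(k*m) instead of O(m^2).
active = rng.choice(m, size=k, replace=False)
sparse_out = np.zeros(m)
sparse_out[active] = np.maximum(W[active] @ x, 0.0)

The point of the sketch is only the cost accounting; maintaining the active set correctly across gradient updates is exactly what the paper's framework addresses.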

Cite as

Zhao Song, Lichen Zhang, and Ruizhe Zhang. Training Multi-Layer Over-Parametrized Neural Network in Subquadratic Time. In 15th Innovations in Theoretical Computer Science Conference (ITCS 2024). Leibniz International Proceedings in Informatics (LIPIcs), Volume 287, pp. 93:1-93:15, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024)


BibTeX

@InProceedings{song_et_al:LIPIcs.ITCS.2024.93,
  author =	{Song, Zhao and Zhang, Lichen and Zhang, Ruizhe},
  title =	{{Training Multi-Layer Over-Parametrized Neural Network in Subquadratic Time}},
  booktitle =	{15th Innovations in Theoretical Computer Science Conference (ITCS 2024)},
  pages =	{93:1--93:15},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-309-6},
  ISSN =	{1868-8969},
  year =	{2024},
  volume =	{287},
  editor =	{Guruswami, Venkatesan},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.ITCS.2024.93},
  URN =		{urn:nbn:de:0030-drops-196212},
  doi =		{10.4230/LIPIcs.ITCS.2024.93},
  annote =	{Keywords: Deep learning theory, Nonconvex optimization}
}
Document
RANDOM
A Unified Approach to Discrepancy Minimization

Authors: Nikhil Bansal, Aditi Laddha, and Santosh Vempala

Published in: LIPIcs, Volume 245, Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2022)


Abstract
We study a unified approach and algorithm for constructive discrepancy minimization based on a stochastic process. By varying the parameters of the process, one can recover various state-of-the-art results. We demonstrate the flexibility of the method by deriving a discrepancy bound for smoothed instances, which interpolates between known bounds for worst-case and random instances.
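
To give a flavor of the stochastic-process viewpoint (this is a generic frozen-coordinate random walk, not the authors' algorithm; the step size and set system are illustrative), the sketch below evolves a fractional coloring x ∈ [-1,1]^n by small random increments and freezes each coordinate once it reaches ±1:

import numpy as np

rng = np.random.default_rng(1)
n = 200
A = rng.integers(0, 2, size=(n, n))    # random set system: rows are set indicator vectors
x = np.zeros(n)                         # fractional coloring, driven to {-1, +1}^n
step = 0.05

alive = np.ones(n, dtype=bool)
while alive.any():
    g = step * rng.standard_normal(n)               # one increment of the driving process
    x[alive] = np.clip(x[alive] + g[alive], -1.0, 1.0)
    alive = np.abs(x) < 1.0                         # freeze coordinates at the boundary

disc = np.max(np.abs(A @ x))                        # discrepancy of the final coloring
print(f"discrepancy = {disc:.1f} vs sqrt(n log n) = {np.sqrt(n * np.log(n)):.1f}")

In the paper's framework it is the choice of the increment distribution (e.g., a Gaussian restricted to a carefully chosen subspace) that varies and recovers the different state-of-the-art bounds; the plain isotropic increments above are the simplest instance.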

Cite as

Nikhil Bansal, Aditi Laddha, and Santosh Vempala. A Unified Approach to Discrepancy Minimization. In Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2022). Leibniz International Proceedings in Informatics (LIPIcs), Volume 245, pp. 1:1-1:22, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2022)


BibTeX

@InProceedings{bansal_et_al:LIPIcs.APPROX/RANDOM.2022.1,
  author =	{Bansal, Nikhil and Laddha, Aditi and Vempala, Santosh},
  title =	{{A Unified Approach to Discrepancy Minimization}},
  booktitle =	{Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2022)},
  pages =	{1:1--1:22},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-249-5},
  ISSN =	{1868-8969},
  year =	{2022},
  volume =	{245},
  editor =	{Chakrabarti, Amit and Swamy, Chaitanya},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.APPROX/RANDOM.2022.1},
  URN =		{urn:nbn:de:0030-drops-171238},
  doi =		{10.4230/LIPIcs.APPROX/RANDOM.2022.1},
  annote =	{Keywords: Discrepancy theory, smoothed analysis}
}
Document
RANDOM
Hyperbolic Concentration, Anti-Concentration, and Discrepancy

Authors: Zhao Song and Ruizhe Zhang

Published in: LIPIcs, Volume 245, Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2022)


Abstract
The Chernoff bound is a fundamental tool in theoretical computer science, used extensively in randomized algorithm design and stochastic analysis. Discrepancy theory, which deals with finding a bi-coloring of a set system such that the coloring of each set is balanced, has many applications in the design of approximation algorithms. The Chernoff bound [Che52] implies that a random bi-coloring of any set system with n sets and n elements will have discrepancy O(√{n log n}) with high probability, while the famous result by Spencer [Spe85] shows that there exists an O(√n) discrepancy solution. The study of hyperbolic polynomials dates back to the early 20th century, when Gårding [Går59] used them to solve PDEs. In recent years, more applications have been found in control theory, optimization, real algebraic geometry, and so on. In particular, the breakthrough result by Marcus, Spielman, and Srivastava [MSS15] uses the theory of hyperbolic polynomials to prove the Kadison-Singer conjecture [KS59], which is closely related to discrepancy theory. In this paper, we present a list of new results for hyperbolic polynomials:
- We show two nearly optimal hyperbolic Chernoff bounds: one for the Rademacher sum of arbitrary vectors and another for random vectors in the hyperbolic cone.
- We show a hyperbolic anti-concentration bound.
- We generalize the hyperbolic Kadison-Singer theorem [Brä18] to vectors in sub-isotropic position, and prove a hyperbolic Spencer theorem for vectors of any constant hyperbolic rank.
The classical matrix Chernoff and discrepancy results are based on the determinant polynomial, which is a special case of a hyperbolic polynomial. To the best of our knowledge, this paper is the first work to show either concentration or anti-concentration results for hyperbolic polynomials. We hope our findings provide more insight into hyperbolic and discrepancy theories.
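
For context, the O(√{n log n}) bound for a random coloring quoted above follows from the Chernoff–Hoeffding bound together with a union bound; the standard one-step derivation (not specific to this paper) is: for any set S ⊆ [n] and i.i.d. uniform signs ε_1, …, ε_n ∈ {-1,+1}, Hoeffding's inequality gives Pr[|∑_{i∈S} ε_i| > t] ≤ 2 exp(-t²/(2|S|)) ≤ 2 exp(-t²/(2n)); taking t = 2√{n ln n} makes this at most 2/n², so a union bound over the n sets leaves failure probability at most 2/n, i.e., a uniformly random coloring has discrepancy O(√{n log n}) with high probability.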

Cite as

Zhao Song and Ruizhe Zhang. Hyperbolic Concentration, Anti-Concentration, and Discrepancy. In Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2022). Leibniz International Proceedings in Informatics (LIPIcs), Volume 245, pp. 10:1-10:19, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2022)


BibTeX

@InProceedings{song_et_al:LIPIcs.APPROX/RANDOM.2022.10,
  author =	{Song, Zhao and Zhang, Ruizhe},
  title =	{{Hyperbolic Concentration, Anti-Concentration, and Discrepancy}},
  booktitle =	{Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2022)},
  pages =	{10:1--10:19},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-249-5},
  ISSN =	{1868-8969},
  year =	{2022},
  volume =	{245},
  editor =	{Chakrabarti, Amit and Swamy, Chaitanya},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.APPROX/RANDOM.2022.10},
  URN =		{urn:nbn:de:0030-drops-171324},
  doi =		{10.4230/LIPIcs.APPROX/RANDOM.2022.10},
  annote =	{Keywords: Hyperbolic polynomial, Chernoff bound, Concentration, Discrepancy theory, Anti-concentration}
}
Document
Invited Talk
The Manifold Joys of Sampling (Invited Talk)

Authors: Yin Tat Lee and Santosh S. Vempala

Published in: LIPIcs, Volume 229, 49th International Colloquium on Automata, Languages, and Programming (ICALP 2022)


Abstract
We survey recent progress and many open questions in the field of sampling high-dimensional distributions, with a specific focus on sampling with non-Euclidean metrics.

Cite as

Yin Tat Lee and Santosh S. Vempala. The Manifold Joys of Sampling (Invited Talk). In 49th International Colloquium on Automata, Languages, and Programming (ICALP 2022). Leibniz International Proceedings in Informatics (LIPIcs), Volume 229, pp. 4:1-4:20, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2022)


BibTeX

@InProceedings{lee_et_al:LIPIcs.ICALP.2022.4,
  author =	{Lee, Yin Tat and Vempala, Santosh S.},
  title =	{{The Manifold Joys of Sampling}},
  booktitle =	{49th International Colloquium on Automata, Languages, and Programming (ICALP 2022)},
  pages =	{4:1--4:20},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-235-8},
  ISSN =	{1868-8969},
  year =	{2022},
  volume =	{229},
  editor =	{Boja\'{n}czyk, Miko{\l}aj and Merelli, Emanuela and Woodruff, David P.},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.ICALP.2022.4},
  URN =		{urn:nbn:de:0030-drops-163459},
  doi =		{10.4230/LIPIcs.ICALP.2022.4},
  annote =	{Keywords: Sampling, Diffusion, Optimization, High Dimension}
}
Document
Invited Talk
Convex Optimization and Dynamic Data Structure (Invited Talk)

Authors: Yin Tat Lee

Published in: LIPIcs, Volume 182, 40th IARCS Annual Conference on Foundations of Software Technology and Theoretical Computer Science (FSTTCS 2020)


Abstract
In the last three years, there have been many breakthroughs in optimization, such as nearly quadratic time algorithms for bipartite matching and linear programming algorithms that are as fast as solving Ax = b. All of these algorithms are based on a careful combination of optimization techniques and dynamic data structures. In this talk, we will explain the framework underlying all the recent breakthroughs. Joint work with Jan van den Brand, Michael B. Cohen, Sally Dong, Haotian Jiang, Tarun Kathuria, Danupon Nanongkai, Swati Padmanabhan, Richard Peng, Thatchaphol Saranurak, Aaron Sidford, Zhao Song, Di Wang, Sam Chiu-wai Wong, Guanghao Ye, and Qiuyi Zhang.

Cite as

Yin Tat Lee. Convex Optimization and Dynamic Data Structure (Invited Talk). In 40th IARCS Annual Conference on Foundations of Software Technology and Theoretical Computer Science (FSTTCS 2020). Leibniz International Proceedings in Informatics (LIPIcs), Volume 182, p. 3:1, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2020)


BibTeX

@InProceedings{lee:LIPIcs.FSTTCS.2020.3,
  author =	{Lee, Yin Tat},
  title =	{{Convex Optimization and Dynamic Data Structure}},
  booktitle =	{40th IARCS Annual Conference on Foundations of Software Technology and Theoretical Computer Science (FSTTCS 2020)},
  pages =	{3:1--3:1},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-174-0},
  ISSN =	{1868-8969},
  year =	{2020},
  volume =	{182},
  editor =	{Saxena, Nitin and Simon, Sunil},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.FSTTCS.2020.3},
  URN =		{urn:nbn:de:0030-drops-132440},
  doi =		{10.4230/LIPIcs.FSTTCS.2020.3},
  annote =	{Keywords: Convex Optimization, Dynamic Data Structure}
}
Document
New Query Lower Bounds for Submodular Function Minimization

Authors: Andrei Graur, Tristan Pollner, Vidhya Ramaswamy, and S. Matthew Weinberg

Published in: LIPIcs, Volume 151, 11th Innovations in Theoretical Computer Science Conference (ITCS 2020)


Abstract
We consider submodular function minimization in the oracle model: given black-box access to a submodular set function f:2^[n] → ℝ, find an element of arg min_S {f(S)} using as few queries to f(⋅) as possible. State-of-the-art algorithms succeed with Õ(n²) queries [Yin Tat Lee et al., 2015], yet the best-known lower bound has never been improved beyond n [Nicholas J. A. Harvey, 2008]. We provide a query lower bound of 2n for submodular function minimization, a 3n/2-2 query lower bound for the non-trivial minimizer of a symmetric submodular function, and an (n choose 2) query lower bound for the non-trivial minimizer of an asymmetric submodular function. Our 3n/2-2 lower bound results from a connection between SFM lower bounds and a novel concept we term the cut dimension of a graph. Interestingly, this yields a 3n/2-2 cut-query lower bound for finding the global mincut in an undirected, weighted graph, but we also prove it cannot yield a lower bound better than n+1 for s-t mincut, even in a directed, weighted graph.
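
To make the oracle model concrete, here is a small self-contained illustration (a toy example, not taken from the paper): cut functions of graphs are symmetric submodular functions, and a brute-force search over non-trivial sets issues 2^n - 2 queries, the naive benchmark against which both the Õ(n²) upper bound and the lower bounds above are measured.

from itertools import combinations

# Toy symmetric submodular function: the cut function of a small graph.
edges = [(0, 1), (1, 2), (2, 3), (0, 3), (1, 3)]
n = 4

def cut(S):
    # Number of edges with exactly one endpoint in S; cut functions are submodular.
    return sum((u in S) != (v in S) for u, v in edges)

# Brute force over all non-trivial subsets: 2^n - 2 oracle queries.
queries, best_val, best_set = 0, float("inf"), None
for k in range(1, n):
    for combo in combinations(range(n), k):
        S = set(combo)
        val = cut(S)                 # one oracle query
        queries += 1
        if val < best_val:
            best_val, best_set = val, S

print(f"non-trivial min cut = {best_val} at {best_set} after {queries} queries")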

Cite as

Andrei Graur, Tristan Pollner, Vidhya Ramaswamy, and S. Matthew Weinberg. New Query Lower Bounds for Submodular Function Minimization. In 11th Innovations in Theoretical Computer Science Conference (ITCS 2020). Leibniz International Proceedings in Informatics (LIPIcs), Volume 151, pp. 64:1-64:16, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2020)


BibTeX

@InProceedings{graur_et_al:LIPIcs.ITCS.2020.64,
  author =	{Graur, Andrei and Pollner, Tristan and Ramaswamy, Vidhya and Weinberg, S. Matthew},
  title =	{{New Query Lower Bounds for Submodular Function Minimization}},
  booktitle =	{11th Innovations in Theoretical Computer Science Conference (ITCS 2020)},
  pages =	{64:1--64:16},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-134-4},
  ISSN =	{1868-8969},
  year =	{2020},
  volume =	{151},
  editor =	{Vidick, Thomas},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.ITCS.2020.64},
  URN =		{urn:nbn:de:0030-drops-117493},
  doi =		{10.4230/LIPIcs.ITCS.2020.64},
  annote =	{Keywords: submodular functions, query lower bounds, min cut}
}
Document
RANDOM
Optimal Convergence Rate of Hamiltonian Monte Carlo for Strongly Logconcave Distributions

Authors: Zongchen Chen and Santosh S. Vempala

Published in: LIPIcs, Volume 145, Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2019)


Abstract
We study Hamiltonian Monte Carlo (HMC) for sampling from a strongly logconcave density proportional to e^{-f}, where f: ℝ^d → ℝ is μ-strongly convex and L-smooth (the condition number is κ = L/μ). We show that the relaxation time (inverse of the spectral gap) of ideal HMC is O(κ), improving on the previous best bound of O(κ^{1.5}); we complement this with an example where the relaxation time is Ω(κ). When implemented using a nearly optimal ODE solver, HMC returns an ε-approximate point in 2-Wasserstein distance using Õ((κd)^{0.5} ε^{-1}) gradient evaluations per step and Õ((κd)^{1.5} ε^{-1}) total time.
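
For a concrete baseline, below is a minimal leapfrog HMC sampler for a toy strongly logconcave target (an axis-aligned Gaussian, so μ and L are simply the smallest and largest curvatures); this is the textbook discretization with a Metropolis correction, not the nearly optimal ODE solver analyzed in the paper, and the step size and trajectory length are illustrative:

import numpy as np

rng = np.random.default_rng(2)
curv = np.array([1.0, 10.0])        # f(x) = 0.5 * sum(curv * x^2): mu = 1, L = 10, kappa = 10

def f(x):
    return 0.5 * np.dot(curv, x * x)

def grad_f(x):
    return curv * x

def leapfrog(x, p, eps, steps):
    # Standard leapfrog integration of Hamiltonian dynamics for H(x,p) = f(x) + |p|^2/2.
    p = p - 0.5 * eps * grad_f(x)
    for _ in range(steps):
        x = x + eps * p
        p = p - eps * grad_f(x)
    p = p + 0.5 * eps * grad_f(x)   # convert the last full momentum step into a half step
    return x, p

x, eps, steps = np.zeros(2), 0.1, 10
samples = []
for _ in range(2000):
    p = rng.standard_normal(2)                       # resample momentum each iteration
    x_new, p_new = leapfrog(x, p, eps, steps)
    # Metropolis correction for the discretization error of leapfrog.
    if np.log(rng.uniform()) < (f(x) + 0.5 * p @ p) - (f(x_new) + 0.5 * p_new @ p_new):
        x = x_new
    samples.append(x)

print("empirical variances:", np.var(np.array(samples)[500:], axis=0))  # target: [1.0, 0.1]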

Cite as

Zongchen Chen and Santosh S. Vempala. Optimal Convergence Rate of Hamiltonian Monte Carlo for Strongly Logconcave Distributions. In Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2019). Leibniz International Proceedings in Informatics (LIPIcs), Volume 145, pp. 64:1-64:12, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2019)


BibTeX

@InProceedings{chen_et_al:LIPIcs.APPROX-RANDOM.2019.64,
  author =	{Chen, Zongchen and Vempala, Santosh S.},
  title =	{{Optimal Convergence Rate of Hamiltonian Monte Carlo for Strongly Logconcave Distributions}},
  booktitle =	{Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2019)},
  pages =	{64:1--64:12},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-125-2},
  ISSN =	{1868-8969},
  year =	{2019},
  volume =	{145},
  editor =	{Achlioptas, Dimitris and V\'{e}gh, L\'{a}szl\'{o} A.},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.APPROX-RANDOM.2019.64},
  URN =		{urn:nbn:de:0030-drops-112790},
  doi =		{10.4230/LIPIcs.APPROX-RANDOM.2019.64},
  annote =	{Keywords: logconcave distribution, sampling, Hamiltonian Monte Carlo, spectral gap, strong convexity}
}
  • Refine by Author
  • 2 Lee, Yin Tat
  • 2 Song, Zhao
  • 2 Vempala, Santosh S.
  • 2 Zhang, Ruizhe
  • 1 Bansal, Nikhil
  • Show More...

  • Refine by Classification
  • 2 Theory of computation → Design and analysis of algorithms
  • 2 Theory of computation → Randomness, geometry and discrete structures
  • 1 Mathematics of computing → Mathematical optimization
  • 1 Theory of computation → Machine learning theory
  • 1 Theory of computation → Nonconvex optimization
  • Show More...

  • Refine by Keyword
  • 2 Discrepancy theory
  • 1 Anti-concentration
  • 1 Chernoff bound
  • 1 Concentration
  • 1 Convex Optimization
  • Show More...

  • Refine by Type
  • 7 document

  • Refine by Publication Year
  • 3 2022
  • 2 2020
  • 1 2019
  • 1 2024
