Document

**Published in:** LIPIcs, Volume 287, 15th Innovations in Theoretical Computer Science Conference (ITCS 2024)

We consider the problem of training a multi-layer over-parametrized neural network to minimize the empirical risk induced by a loss function. In the typical setting of over-parametrization, the network width m is much larger than the data dimension d and the number of training samples n (m = poly(n,d)), which induces a prohibitive large weight matrix W ∈ ℝ^{m× m} per layer. Naively, one has to pay O(m²) time to read the weight matrix and evaluate the neural network function in both forward and backward computation. In this work, we show how to reduce the training cost per iteration. Specifically, we propose a framework that uses m² cost only in the initialization phase and achieves a truly subquadratic cost per iteration in terms of m, i.e., m^{2-Ω(1)} per iteration. Our result has implications beyond standard over-parametrization theory, as it can be viewed as designing an efficient data structure on top of a pre-trained large model to further speed up the fine-tuning process, a core procedure to deploy large language models (LLM).

Zhao Song, Lichen Zhang, and Ruizhe Zhang. Training Multi-Layer Over-Parametrized Neural Network in Subquadratic Time. In 15th Innovations in Theoretical Computer Science Conference (ITCS 2024). Leibniz International Proceedings in Informatics (LIPIcs), Volume 287, pp. 93:1-93:15, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024)

Copy BibTex To Clipboard

@InProceedings{song_et_al:LIPIcs.ITCS.2024.93, author = {Song, Zhao and Zhang, Lichen and Zhang, Ruizhe}, title = {{Training Multi-Layer Over-Parametrized Neural Network in Subquadratic Time}}, booktitle = {15th Innovations in Theoretical Computer Science Conference (ITCS 2024)}, pages = {93:1--93:15}, series = {Leibniz International Proceedings in Informatics (LIPIcs)}, ISBN = {978-3-95977-309-6}, ISSN = {1868-8969}, year = {2024}, volume = {287}, editor = {Guruswami, Venkatesan}, publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik}, address = {Dagstuhl, Germany}, URL = {https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ITCS.2024.93}, URN = {urn:nbn:de:0030-drops-196212}, doi = {10.4230/LIPIcs.ITCS.2024.93}, annote = {Keywords: Deep learning theory, Nonconvex optimization} }

Document

Track A: Algorithms, Complexity and Games

**Published in:** LIPIcs, Volume 261, 50th International Colloquium on Automata, Languages, and Programming (ICALP 2023)

We study the problem of solving linear program in the streaming model. Given a constraint matrix A ∈ ℝ^{m×n} and vectors b ∈ ℝ^m, c ∈ ℝ^n, we develop a space-efficient interior point method that optimizes solely on the dual program. To this end, we obtain efficient algorithms for various different problems:
- For general linear programs, we can solve them in Õ(√n log(1/ε)) passes and Õ(n²) space for an ε-approximate solution. To the best of our knowledge, this is the most efficient LP solver in streaming with no polynomial dependence on m for both space and passes.
- For bipartite graphs, we can solve the minimum vertex cover and maximum weight matching problem in Õ(√m) passes and Õ(n) space.
In addition to our space-efficient IPM, we also give algorithms for solving SDD systems and isolation lemma in Õ(n) spaces, which are the cornerstones for our graph results.

S. Cliff Liu, Zhao Song, Hengjie Zhang, Lichen Zhang, and Tianyi Zhou. Space-Efficient Interior Point Method, with Applications to Linear Programming and Maximum Weight Bipartite Matching. In 50th International Colloquium on Automata, Languages, and Programming (ICALP 2023). Leibniz International Proceedings in Informatics (LIPIcs), Volume 261, pp. 88:1-88:14, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2023)

Copy BibTex To Clipboard

@InProceedings{liu_et_al:LIPIcs.ICALP.2023.88, author = {Liu, S. Cliff and Song, Zhao and Zhang, Hengjie and Zhang, Lichen and Zhou, Tianyi}, title = {{Space-Efficient Interior Point Method, with Applications to Linear Programming and Maximum Weight Bipartite Matching}}, booktitle = {50th International Colloquium on Automata, Languages, and Programming (ICALP 2023)}, pages = {88:1--88:14}, series = {Leibniz International Proceedings in Informatics (LIPIcs)}, ISBN = {978-3-95977-278-5}, ISSN = {1868-8969}, year = {2023}, volume = {261}, editor = {Etessami, Kousha and Feige, Uriel and Puppis, Gabriele}, publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik}, address = {Dagstuhl, Germany}, URL = {https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICALP.2023.88}, URN = {urn:nbn:de:0030-drops-181408}, doi = {10.4230/LIPIcs.ICALP.2023.88}, annote = {Keywords: Convex optimization, interior point method, streaming algorithm} }

Document

RANDOM

**Published in:** LIPIcs, Volume 245, Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2022)

Chernoff bound is a fundamental tool in theoretical computer science. It has been extensively used in randomized algorithm design and stochastic type analysis. Discrepancy theory, which deals with finding a bi-coloring of a set system such that the coloring of each set is balanced, has a huge number of applications in approximation algorithms design. Chernoff bound [Che52] implies that a random bi-coloring of any set system with n sets and n elements will have discrepancy O(√{n log n}) with high probability, while the famous result by Spencer [Spe85] shows that there exists an O(√n) discrepancy solution.
The study of hyperbolic polynomials dates back to the early 20th century when used to solve PDEs by Gårding [Går59]. In recent years, more applications are found in control theory, optimization, real algebraic geometry, and so on. In particular, the breakthrough result by Marcus, Spielman, and Srivastava [MSS15] uses the theory of hyperbolic polynomials to prove the Kadison-Singer conjecture [KS59], which is closely related to discrepancy theory.
In this paper, we present a list of new results for hyperbolic polynomials:
- We show two nearly optimal hyperbolic Chernoff bounds: one for Rademacher sum of arbitrary vectors and another for random vectors in the hyperbolic cone.
- We show a hyperbolic anti-concentration bound.
- We generalize the hyperbolic Kadison-Singer theorem [Brä18] for vectors in sub-isotropic position, and prove a hyperbolic Spencer theorem for any constant hyperbolic rank vectors.
The classical matrix Chernoff and discrepancy results are based on determinant polynomial which is a special case of hyperbolic polynomials. To the best of our knowledge, this paper is the first work that shows either concentration or anti-concentration results for hyperbolic polynomials. We hope our findings provide more insights into hyperbolic and discrepancy theories.

Zhao Song and Ruizhe Zhang. Hyperbolic Concentration, Anti-Concentration, and Discrepancy. In Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2022). Leibniz International Proceedings in Informatics (LIPIcs), Volume 245, pp. 10:1-10:19, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2022)

Copy BibTex To Clipboard

@InProceedings{song_et_al:LIPIcs.APPROX/RANDOM.2022.10, author = {Song, Zhao and Zhang, Ruizhe}, title = {{Hyperbolic Concentration, Anti-Concentration, and Discrepancy}}, booktitle = {Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2022)}, pages = {10:1--10:19}, series = {Leibniz International Proceedings in Informatics (LIPIcs)}, ISBN = {978-3-95977-249-5}, ISSN = {1868-8969}, year = {2022}, volume = {245}, editor = {Chakrabarti, Amit and Swamy, Chaitanya}, publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik}, address = {Dagstuhl, Germany}, URL = {https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.APPROX/RANDOM.2022.10}, URN = {urn:nbn:de:0030-drops-171324}, doi = {10.4230/LIPIcs.APPROX/RANDOM.2022.10}, annote = {Keywords: Hyperbolic polynomial, Chernoff bound, Concentration, Discrepancy theory, Anti-concentration} }

Document

**Published in:** LIPIcs, Volume 215, 13th Innovations in Theoretical Computer Science Conference (ITCS 2022)

In this work, we study a variant of nonnegative matrix factorization where we wish to find a symmetric factorization of a given input matrix into a sparse, Boolean matrix. Formally speaking, given {𝐌} ∈ {ℤ}^{m× m}, we want to find {𝐖} ∈ {0,1}^{m× r} such that ‖ {𝐌} - {𝐖} {𝐖}^⊤ ‖₀ is minimized among all {𝐖} for which each row is k-sparse. This question turns out to be closely related to a number of questions like recovering a hypergraph from its line graph, as well as reconstruction attacks for private neural network training.
As this problem is hard in the worst-case, we study a natural average-case variant that arises in the context of these reconstruction attacks: {𝐌} = {𝐖} {𝐖}^{⊤} for {𝐖} a random Boolean matrix with k-sparse rows, and the goal is to recover {𝐖} up to column permutation. Equivalently, this can be thought of as recovering a uniformly random k-uniform hypergraph from its line graph.
Our main result is a polynomial-time algorithm for this problem based on bootstrapping higher-order information about {𝐖} and then decomposing an appropriate tensor. The key ingredient in our analysis, which may be of independent interest, is to show that such a matrix {𝐖} has full column rank with high probability as soon as m = Ω̃(r), which we do using tools from Littlewood-Offord theory and estimates for binary Krawtchouk polynomials.

Sitan Chen, Zhao Song, Runzhou Tao, and Ruizhe Zhang. Symmetric Sparse Boolean Matrix Factorization and Applications. In 13th Innovations in Theoretical Computer Science Conference (ITCS 2022). Leibniz International Proceedings in Informatics (LIPIcs), Volume 215, pp. 46:1-46:25, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2022)

Copy BibTex To Clipboard

@InProceedings{chen_et_al:LIPIcs.ITCS.2022.46, author = {Chen, Sitan and Song, Zhao and Tao, Runzhou and Zhang, Ruizhe}, title = {{Symmetric Sparse Boolean Matrix Factorization and Applications}}, booktitle = {13th Innovations in Theoretical Computer Science Conference (ITCS 2022)}, pages = {46:1--46:25}, series = {Leibniz International Proceedings in Informatics (LIPIcs)}, ISBN = {978-3-95977-217-4}, ISSN = {1868-8969}, year = {2022}, volume = {215}, editor = {Braverman, Mark}, publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik}, address = {Dagstuhl, Germany}, URL = {https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ITCS.2022.46}, URN = {urn:nbn:de:0030-drops-156422}, doi = {10.4230/LIPIcs.ITCS.2022.46}, annote = {Keywords: Matrix factorization, tensors, random matrices, average-case complexity} }

Document

Track A: Algorithms, Complexity and Games

**Published in:** LIPIcs, Volume 198, 48th International Colloquium on Automata, Languages, and Programming (ICALP 2021)

For a directed graph G with n vertices and a start vertex u_start, we wish to (approximately) sample an L-step random walk over G starting from u_start with minimum space using an algorithm that only makes few passes over the edges of the graph. This problem found many applications, for instance, in approximating the PageRank of a webpage. If only a single pass is allowed, the space complexity of this problem was shown to be Θ̃(n ⋅ L). Prior to our work, a better space complexity was only known with Õ(√L) passes.
We essentially settle the space complexity of this random walk simulation problem for two-pass streaming algorithms, showing that it is Θ̃(n ⋅ √L), by giving almost matching upper and lower bounds. Our lower bound argument extends to every constant number of passes p, and shows that any p-pass algorithm for this problem uses Ω̃(n ⋅ L^{1/p}) space. In addition, we show a similar Θ̃(n ⋅ √L) bound on the space complexity of any algorithm (with any number of passes) for the related problem of sampling an L-step random walk from every vertex in the graph.

Lijie Chen, Gillat Kol, Dmitry Paramonov, Raghuvansh R. Saxena, Zhao Song, and Huacheng Yu. Near-Optimal Two-Pass Streaming Algorithm for Sampling Random Walks over Directed Graphs. In 48th International Colloquium on Automata, Languages, and Programming (ICALP 2021). Leibniz International Proceedings in Informatics (LIPIcs), Volume 198, pp. 52:1-52:19, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2021)

Copy BibTex To Clipboard

@InProceedings{chen_et_al:LIPIcs.ICALP.2021.52, author = {Chen, Lijie and Kol, Gillat and Paramonov, Dmitry and Saxena, Raghuvansh R. and Song, Zhao and Yu, Huacheng}, title = {{Near-Optimal Two-Pass Streaming Algorithm for Sampling Random Walks over Directed Graphs}}, booktitle = {48th International Colloquium on Automata, Languages, and Programming (ICALP 2021)}, pages = {52:1--52:19}, series = {Leibniz International Proceedings in Informatics (LIPIcs)}, ISBN = {978-3-95977-195-5}, ISSN = {1868-8969}, year = {2021}, volume = {198}, editor = {Bansal, Nikhil and Merelli, Emanuela and Worrell, James}, publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik}, address = {Dagstuhl, Germany}, URL = {https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICALP.2021.52}, URN = {urn:nbn:de:0030-drops-141218}, doi = {10.4230/LIPIcs.ICALP.2021.52}, annote = {Keywords: streaming algorithms, random walk sampling} }

Document

**Published in:** LIPIcs, Volume 185, 12th Innovations in Theoretical Computer Science Conference (ITCS 2021)

The slow convergence rate and pathological curvature issues of first-order gradient methods for training deep neural networks, initiated an ongoing effort for developing faster second-order optimization algorithms beyond SGD, without compromising the generalization error. Despite their remarkable convergence rate (independent of the training batch size n), second-order algorithms incur a daunting slowdown in the cost per iteration (inverting the Hessian matrix of the loss function), which renders them impractical. Very recently, this computational overhead was mitigated by the works of [Zhang et al., 2019; Cai et al., 2019], yielding an O(mn²)-time second-order algorithm for training two-layer overparametrized neural networks of polynomial width m.
We show how to speed up the algorithm of [Cai et al., 2019], achieving an Õ(mn)-time backpropagation algorithm for training (mildly overparametrized) ReLU networks, which is near-linear in the dimension (mn) of the full gradient (Jacobian) matrix. The centerpiece of our algorithm is to reformulate the Gauss-Newton iteration as an 𝓁₂-regression problem, and then use a Fast-JL type dimension reduction to precondition the underlying Gram matrix in time independent of M, allowing to find a sufficiently good approximate solution via first-order conjugate gradient. Our result provides a proof-of-concept that advanced machinery from randomized linear algebra - which led to recent breakthroughs in convex optimization (ERM, LPs, Regression) - can be carried over to the realm of deep learning as well.

Jan van den Brand, Binghui Peng, Zhao Song, and Omri Weinstein. Training (Overparametrized) Neural Networks in Near-Linear Time. In 12th Innovations in Theoretical Computer Science Conference (ITCS 2021). Leibniz International Proceedings in Informatics (LIPIcs), Volume 185, pp. 63:1-63:15, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2021)

Copy BibTex To Clipboard

@InProceedings{vandenbrand_et_al:LIPIcs.ITCS.2021.63, author = {van den Brand, Jan and Peng, Binghui and Song, Zhao and Weinstein, Omri}, title = {{Training (Overparametrized) Neural Networks in Near-Linear Time}}, booktitle = {12th Innovations in Theoretical Computer Science Conference (ITCS 2021)}, pages = {63:1--63:15}, series = {Leibniz International Proceedings in Informatics (LIPIcs)}, ISBN = {978-3-95977-177-1}, ISSN = {1868-8969}, year = {2021}, volume = {185}, editor = {Lee, James R.}, publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik}, address = {Dagstuhl, Germany}, URL = {https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ITCS.2021.63}, URN = {urn:nbn:de:0030-drops-136025}, doi = {10.4230/LIPIcs.ITCS.2021.63}, annote = {Keywords: Deep learning theory, Nonconvex optimization} }

Document

**Published in:** LIPIcs, Volume 80, 44th International Colloquium on Automata, Languages, and Programming (ICALP 2017)

Sketching has emerged as a powerful technique for speeding up problems in numerical linear algebra, such as regression. In the overconstrained regression problem, one is given an n x d matrix A, with n >> d, as well as an n x 1 vector b, and one wants to find a vector \hat{x} so as to minimize the residual error ||Ax-b||_2. Using the sketch and solve paradigm, one first computes S \cdot A and S \cdot b for a randomly chosen matrix S, then outputs x' = (SA)^{\dagger} Sb so as to minimize || SAx' - Sb||_2.
The sketch-and-solve paradigm gives a bound on ||x'-x^*||_2 when A is well-conditioned. Our main result is that, when S is the subsampled randomized Fourier/Hadamard transform, the error x' - x^* behaves as if it lies in a "random" direction within this bound: for any fixed direction a in R^d, we have with 1 - d^{-c} probability that
(1) \langle a, x'-x^* \rangle \lesssim \frac{ \|a\|_2\|x'-x^*\|_2}{d^{\frac{1}{2}-\gamma}},
where c, \gamma > 0 are arbitrary constants. This implies ||x'-x^*||_{\infty} is a factor d^{\frac{1}{2}-\gamma} smaller than ||x'-x^*||_2. It also gives a better bound on the generalization of x' to new examples: if rows of A correspond to examples and columns to features, then our result gives a better bound for the error introduced by sketch-and-solve when classifying fresh examples. We show that not all oblivious subspace embeddings S satisfy these properties. In particular, we give counterexamples showing that matrices based on Count-Sketch or leverage score sampling do not satisfy these properties.
We also provide lower bounds, both on how small ||x'-x^*||_2 can be, and for our new guarantee (1), showing that the subsampled randomized Fourier/Hadamard transform is nearly optimal. Our lower bound on ||x'-x^*||_2 shows that there is an O(1/epsilon) separation in the dimension of the optimal oblivious subspace embedding required for outputting an x' for which ||x'-x^*||_2 <= epsilon ||Ax^*-b||_2 \cdot ||A^{\dagger}||_2$, compared to the dimension of the optimal oblivious subspace embedding required for outputting an x' for which ||Ax'-b||_2 <= (1+epsilon)||Ax^*-b||_2, that is, the former problem requires dimension Omega(d/epsilon^2) while the latter problem can be solved with dimension O(d/epsilon). This explains the reason known upper bounds on the dimensions of these two variants of regression have differed in prior work.

Eric Price, Zhao Song, and David P. Woodruff. Fast Regression with an $ell_infty$ Guarantee. In 44th International Colloquium on Automata, Languages, and Programming (ICALP 2017). Leibniz International Proceedings in Informatics (LIPIcs), Volume 80, pp. 59:1-59:14, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2017)

Copy BibTex To Clipboard

@InProceedings{price_et_al:LIPIcs.ICALP.2017.59, author = {Price, Eric and Song, Zhao and Woodruff, David P.}, title = {{Fast Regression with an \$ell\underlineinfty\$ Guarantee}}, booktitle = {44th International Colloquium on Automata, Languages, and Programming (ICALP 2017)}, pages = {59:1--59:14}, series = {Leibniz International Proceedings in Informatics (LIPIcs)}, ISBN = {978-3-95977-041-5}, ISSN = {1868-8969}, year = {2017}, volume = {80}, editor = {Chatzigiannakis, Ioannis and Indyk, Piotr and Kuhn, Fabian and Muscholl, Anca}, publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik}, address = {Dagstuhl, Germany}, URL = {https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICALP.2017.59}, URN = {urn:nbn:de:0030-drops-74488}, doi = {10.4230/LIPIcs.ICALP.2017.59}, annote = {Keywords: Linear regression, Count-Sketch, Gaussians, Leverage scores, ell\underlineinfty-guarantee} }

Document

**Published in:** LIPIcs, Volume 53, 15th Scandinavian Symposium and Workshops on Algorithm Theory (SWAT 2016)

We present two improved algorithms for weighted discrete p-center problem for tree networks with n vertices. One of our proposed algorithms runs in O(n*log(n) + p*log^2(n) * log(n/p)) time. For all values of p, our algorithm thus runs as fast as or faster than the most efficient O(n*log^2(n)) time algorithm obtained by applying Cole's [1987] speed-up technique to the algorithm due to Megiddo and Tamir [1983], which has remained unchallenged for nearly 30 years.
Our other algorithm, which is more practical, runs in O(n*log(n) + p^2*log^2(n/p)) time, and when p=O(sqrt(n)) it is faster than Megiddo and Tamir's O(n*log^2(n) * log(log(n))) time algorithm [1983].

Aritra Banik, Binay Bhattacharya, Sandip Das, Tsunehiko Kameda, and Zhao Song. The p-Center Problem in Tree Networks Revisited. In 15th Scandinavian Symposium and Workshops on Algorithm Theory (SWAT 2016). Leibniz International Proceedings in Informatics (LIPIcs), Volume 53, pp. 6:1-6:15, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2016)

Copy BibTex To Clipboard

@InProceedings{banik_et_al:LIPIcs.SWAT.2016.6, author = {Banik, Aritra and Bhattacharya, Binay and Das, Sandip and Kameda, Tsunehiko and Song, Zhao}, title = {{The p-Center Problem in Tree Networks Revisited}}, booktitle = {15th Scandinavian Symposium and Workshops on Algorithm Theory (SWAT 2016)}, pages = {6:1--6:15}, series = {Leibniz International Proceedings in Informatics (LIPIcs)}, ISBN = {978-3-95977-011-8}, ISSN = {1868-8969}, year = {2016}, volume = {53}, editor = {Pagh, Rasmus}, publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik}, address = {Dagstuhl, Germany}, URL = {https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.SWAT.2016.6}, URN = {urn:nbn:de:0030-drops-60296}, doi = {10.4230/LIPIcs.SWAT.2016.6}, annote = {Keywords: Facility location, p-center, parametric search, tree network, sorting network} }

X

Feedback for Dagstuhl Publishing

Feedback submitted

Please try again later or send an E-mail