14 Search Results for "Song, Zhao"


Document
Training Multi-Layer Over-Parametrized Neural Network in Subquadratic Time

Authors: Zhao Song, Lichen Zhang, and Ruizhe Zhang

Published in: LIPIcs, Volume 287, 15th Innovations in Theoretical Computer Science Conference (ITCS 2024)


Abstract
We consider the problem of training a multi-layer over-parametrized neural network to minimize the empirical risk induced by a loss function. In the typical setting of over-parametrization, the network width m is much larger than the data dimension d and the number of training samples n (m = poly(n,d)), which induces a prohibitive large weight matrix W ∈ ℝ^{m× m} per layer. Naively, one has to pay O(m²) time to read the weight matrix and evaluate the neural network function in both forward and backward computation. In this work, we show how to reduce the training cost per iteration. Specifically, we propose a framework that uses m² cost only in the initialization phase and achieves a truly subquadratic cost per iteration in terms of m, i.e., m^{2-Ω(1)} per iteration. Our result has implications beyond standard over-parametrization theory, as it can be viewed as designing an efficient data structure on top of a pre-trained large model to further speed up the fine-tuning process, a core procedure to deploy large language models (LLM).

Cite as

Zhao Song, Lichen Zhang, and Ruizhe Zhang. Training Multi-Layer Over-Parametrized Neural Network in Subquadratic Time. In 15th Innovations in Theoretical Computer Science Conference (ITCS 2024). Leibniz International Proceedings in Informatics (LIPIcs), Volume 287, pp. 93:1-93:15, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024)


Copy BibTex To Clipboard

@InProceedings{song_et_al:LIPIcs.ITCS.2024.93,
  author =	{Song, Zhao and Zhang, Lichen and Zhang, Ruizhe},
  title =	{{Training Multi-Layer Over-Parametrized Neural Network in Subquadratic Time}},
  booktitle =	{15th Innovations in Theoretical Computer Science Conference (ITCS 2024)},
  pages =	{93:1--93:15},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-309-6},
  ISSN =	{1868-8969},
  year =	{2024},
  volume =	{287},
  editor =	{Guruswami, Venkatesan},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.ITCS.2024.93},
  URN =		{urn:nbn:de:0030-drops-196212},
  doi =		{10.4230/LIPIcs.ITCS.2024.93},
  annote =	{Keywords: Deep learning theory, Nonconvex optimization}
}
Document
Short Paper
Evaluating the Effectiveness of Large Language Models in Representing Textual Descriptions of Geometry and Spatial Relations (Short Paper)

Authors: Yuhan Ji and Song Gao

Published in: LIPIcs, Volume 277, 12th International Conference on Geographic Information Science (GIScience 2023)


Abstract
This research focuses on assessing the ability of large language models (LLMs) in representing geometries and their spatial relations. We utilize LLMs including GPT-2 and BERT to encode the well-known text (WKT) format of geometries and then feed their embeddings into classifiers and regressors to evaluate the effectiveness of the LLMs-generated embeddings for geometric attributes. The experiments demonstrate that while the LLMs-generated embeddings can preserve geometry types and capture some spatial relations (up to 73% accuracy), challenges remain in estimating numeric values and retrieving spatially related objects. This research highlights the need for improvement in terms of capturing the nuances and complexities of the underlying geospatial data and integrating domain knowledge to support various GeoAI applications using foundation models.

Cite as

Yuhan Ji and Song Gao. Evaluating the Effectiveness of Large Language Models in Representing Textual Descriptions of Geometry and Spatial Relations (Short Paper). In 12th International Conference on Geographic Information Science (GIScience 2023). Leibniz International Proceedings in Informatics (LIPIcs), Volume 277, pp. 43:1-43:6, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2023)


Copy BibTex To Clipboard

@InProceedings{ji_et_al:LIPIcs.GIScience.2023.43,
  author =	{Ji, Yuhan and Gao, Song},
  title =	{{Evaluating the Effectiveness of Large Language Models in Representing Textual Descriptions of Geometry and Spatial Relations}},
  booktitle =	{12th International Conference on Geographic Information Science (GIScience 2023)},
  pages =	{43:1--43:6},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-288-4},
  ISSN =	{1868-8969},
  year =	{2023},
  volume =	{277},
  editor =	{Beecham, Roger and Long, Jed A. and Smith, Dianna and Zhao, Qunshan and Wise, Sarah},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.GIScience.2023.43},
  URN =		{urn:nbn:de:0030-drops-189381},
  doi =		{10.4230/LIPIcs.GIScience.2023.43},
  annote =	{Keywords: LLMs, foundation models, GeoAI}
}
Document
Short Paper
The Ethics of AI-Generated Maps: DALL·E 2 and AI’s Implications for Cartography (Short Paper)

Authors: Qianheng Zhang, Yuhao Kang, and Robert Roth

Published in: LIPIcs, Volume 277, 12th International Conference on Geographic Information Science (GIScience 2023)


Abstract
The rapid advancement of artificial intelligence (AI) such as the emergence of large language models ChatGPT and DALL·E 2 has brought both opportunities for improving productivity and raised ethical concerns. This paper investigates the ethics of using artificial intelligence (AI) in cartography, with a particular focus on the generation of maps using DALL·E 2. To accomplish this, we first created an open-sourced dataset that includes synthetic (AI-generated) and real-world (human-designed) maps at multiple scales with a variety of settings. We subsequently examined four potential ethical concerns that may arise from the characteristics of DALL·E 2 generated maps, namely inaccuracies, misleading information, unanticipated features, and irreproducibility. We then developed a deep learning-based model to identify those AI-generated maps. Our research emphasizes the importance of ethical considerations in the development and use of AI techniques in cartography, contributing to the growing body of work on trustworthy maps. We aim to raise public awareness of the potential risks associated with AI-generated maps and support the development of ethical guidelines for their future use.

Cite as

Qianheng Zhang, Yuhao Kang, and Robert Roth. The Ethics of AI-Generated Maps: DALL·E 2 and AI’s Implications for Cartography (Short Paper). In 12th International Conference on Geographic Information Science (GIScience 2023). Leibniz International Proceedings in Informatics (LIPIcs), Volume 277, pp. 93:1-93:6, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2023)


Copy BibTex To Clipboard

@InProceedings{zhang_et_al:LIPIcs.GIScience.2023.93,
  author =	{Zhang, Qianheng and Kang, Yuhao and Roth, Robert},
  title =	{{The Ethics of AI-Generated Maps: DALL·E 2 and AI’s Implications for Cartography}},
  booktitle =	{12th International Conference on Geographic Information Science (GIScience 2023)},
  pages =	{93:1--93:6},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-288-4},
  ISSN =	{1868-8969},
  year =	{2023},
  volume =	{277},
  editor =	{Beecham, Roger and Long, Jed A. and Smith, Dianna and Zhao, Qunshan and Wise, Sarah},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.GIScience.2023.93},
  URN =		{urn:nbn:de:0030-drops-189886},
  doi =		{10.4230/LIPIcs.GIScience.2023.93},
  annote =	{Keywords: Ethics, GeoAI, DALL-E, Cartography}
}
Document
Track A: Algorithms, Complexity and Games
Space-Efficient Interior Point Method, with Applications to Linear Programming and Maximum Weight Bipartite Matching

Authors: S. Cliff Liu, Zhao Song, Hengjie Zhang, Lichen Zhang, and Tianyi Zhou

Published in: LIPIcs, Volume 261, 50th International Colloquium on Automata, Languages, and Programming (ICALP 2023)


Abstract
We study the problem of solving linear program in the streaming model. Given a constraint matrix A ∈ ℝ^{m×n} and vectors b ∈ ℝ^m, c ∈ ℝ^n, we develop a space-efficient interior point method that optimizes solely on the dual program. To this end, we obtain efficient algorithms for various different problems: - For general linear programs, we can solve them in Õ(√n log(1/ε)) passes and Õ(n²) space for an ε-approximate solution. To the best of our knowledge, this is the most efficient LP solver in streaming with no polynomial dependence on m for both space and passes. - For bipartite graphs, we can solve the minimum vertex cover and maximum weight matching problem in Õ(√m) passes and Õ(n) space. In addition to our space-efficient IPM, we also give algorithms for solving SDD systems and isolation lemma in Õ(n) spaces, which are the cornerstones for our graph results.

Cite as

S. Cliff Liu, Zhao Song, Hengjie Zhang, Lichen Zhang, and Tianyi Zhou. Space-Efficient Interior Point Method, with Applications to Linear Programming and Maximum Weight Bipartite Matching. In 50th International Colloquium on Automata, Languages, and Programming (ICALP 2023). Leibniz International Proceedings in Informatics (LIPIcs), Volume 261, pp. 88:1-88:14, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2023)


Copy BibTex To Clipboard

@InProceedings{liu_et_al:LIPIcs.ICALP.2023.88,
  author =	{Liu, S. Cliff and Song, Zhao and Zhang, Hengjie and Zhang, Lichen and Zhou, Tianyi},
  title =	{{Space-Efficient Interior Point Method, with Applications to Linear Programming and Maximum Weight Bipartite Matching}},
  booktitle =	{50th International Colloquium on Automata, Languages, and Programming (ICALP 2023)},
  pages =	{88:1--88:14},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-278-5},
  ISSN =	{1868-8969},
  year =	{2023},
  volume =	{261},
  editor =	{Etessami, Kousha and Feige, Uriel and Puppis, Gabriele},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.ICALP.2023.88},
  URN =		{urn:nbn:de:0030-drops-181408},
  doi =		{10.4230/LIPIcs.ICALP.2023.88},
  annote =	{Keywords: Convex optimization, interior point method, streaming algorithm}
}
Document
RANDOM
Hyperbolic Concentration, Anti-Concentration, and Discrepancy

Authors: Zhao Song and Ruizhe Zhang

Published in: LIPIcs, Volume 245, Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2022)


Abstract
Chernoff bound is a fundamental tool in theoretical computer science. It has been extensively used in randomized algorithm design and stochastic type analysis. Discrepancy theory, which deals with finding a bi-coloring of a set system such that the coloring of each set is balanced, has a huge number of applications in approximation algorithms design. Chernoff bound [Che52] implies that a random bi-coloring of any set system with n sets and n elements will have discrepancy O(√{n log n}) with high probability, while the famous result by Spencer [Spe85] shows that there exists an O(√n) discrepancy solution. The study of hyperbolic polynomials dates back to the early 20th century when used to solve PDEs by Gårding [Går59]. In recent years, more applications are found in control theory, optimization, real algebraic geometry, and so on. In particular, the breakthrough result by Marcus, Spielman, and Srivastava [MSS15] uses the theory of hyperbolic polynomials to prove the Kadison-Singer conjecture [KS59], which is closely related to discrepancy theory. In this paper, we present a list of new results for hyperbolic polynomials: - We show two nearly optimal hyperbolic Chernoff bounds: one for Rademacher sum of arbitrary vectors and another for random vectors in the hyperbolic cone. - We show a hyperbolic anti-concentration bound. - We generalize the hyperbolic Kadison-Singer theorem [Brä18] for vectors in sub-isotropic position, and prove a hyperbolic Spencer theorem for any constant hyperbolic rank vectors. The classical matrix Chernoff and discrepancy results are based on determinant polynomial which is a special case of hyperbolic polynomials. To the best of our knowledge, this paper is the first work that shows either concentration or anti-concentration results for hyperbolic polynomials. We hope our findings provide more insights into hyperbolic and discrepancy theories.

Cite as

Zhao Song and Ruizhe Zhang. Hyperbolic Concentration, Anti-Concentration, and Discrepancy. In Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2022). Leibniz International Proceedings in Informatics (LIPIcs), Volume 245, pp. 10:1-10:19, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2022)


Copy BibTex To Clipboard

@InProceedings{song_et_al:LIPIcs.APPROX/RANDOM.2022.10,
  author =	{Song, Zhao and Zhang, Ruizhe},
  title =	{{Hyperbolic Concentration, Anti-Concentration, and Discrepancy}},
  booktitle =	{Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2022)},
  pages =	{10:1--10:19},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-249-5},
  ISSN =	{1868-8969},
  year =	{2022},
  volume =	{245},
  editor =	{Chakrabarti, Amit and Swamy, Chaitanya},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.APPROX/RANDOM.2022.10},
  URN =		{urn:nbn:de:0030-drops-171324},
  doi =		{10.4230/LIPIcs.APPROX/RANDOM.2022.10},
  annote =	{Keywords: Hyperbolic polynomial, Chernoff bound, Concentration, Discrepancy theory, Anti-concentration}
}
Document
Symmetric Sparse Boolean Matrix Factorization and Applications

Authors: Sitan Chen, Zhao Song, Runzhou Tao, and Ruizhe Zhang

Published in: LIPIcs, Volume 215, 13th Innovations in Theoretical Computer Science Conference (ITCS 2022)


Abstract
In this work, we study a variant of nonnegative matrix factorization where we wish to find a symmetric factorization of a given input matrix into a sparse, Boolean matrix. Formally speaking, given {𝐌} ∈ {ℤ}^{m× m}, we want to find {𝐖} ∈ {0,1}^{m× r} such that ‖ {𝐌} - {𝐖} {𝐖}^⊤ ‖₀ is minimized among all {𝐖} for which each row is k-sparse. This question turns out to be closely related to a number of questions like recovering a hypergraph from its line graph, as well as reconstruction attacks for private neural network training. As this problem is hard in the worst-case, we study a natural average-case variant that arises in the context of these reconstruction attacks: {𝐌} = {𝐖} {𝐖}^{⊤} for {𝐖} a random Boolean matrix with k-sparse rows, and the goal is to recover {𝐖} up to column permutation. Equivalently, this can be thought of as recovering a uniformly random k-uniform hypergraph from its line graph. Our main result is a polynomial-time algorithm for this problem based on bootstrapping higher-order information about {𝐖} and then decomposing an appropriate tensor. The key ingredient in our analysis, which may be of independent interest, is to show that such a matrix {𝐖} has full column rank with high probability as soon as m = Ω̃(r), which we do using tools from Littlewood-Offord theory and estimates for binary Krawtchouk polynomials.

Cite as

Sitan Chen, Zhao Song, Runzhou Tao, and Ruizhe Zhang. Symmetric Sparse Boolean Matrix Factorization and Applications. In 13th Innovations in Theoretical Computer Science Conference (ITCS 2022). Leibniz International Proceedings in Informatics (LIPIcs), Volume 215, pp. 46:1-46:25, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2022)


Copy BibTex To Clipboard

@InProceedings{chen_et_al:LIPIcs.ITCS.2022.46,
  author =	{Chen, Sitan and Song, Zhao and Tao, Runzhou and Zhang, Ruizhe},
  title =	{{Symmetric Sparse Boolean Matrix Factorization and Applications}},
  booktitle =	{13th Innovations in Theoretical Computer Science Conference (ITCS 2022)},
  pages =	{46:1--46:25},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-217-4},
  ISSN =	{1868-8969},
  year =	{2022},
  volume =	{215},
  editor =	{Braverman, Mark},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.ITCS.2022.46},
  URN =		{urn:nbn:de:0030-drops-156422},
  doi =		{10.4230/LIPIcs.ITCS.2022.46},
  annote =	{Keywords: Matrix factorization, tensors, random matrices, average-case complexity}
}
Document
Differentially Oblivious Database Joins: Overcoming the Worst-Case Curse of Fully Oblivious Algorithms

Authors: Shumo Chu, Danyang Zhuo, Elaine Shi, and T-H. Hubert Chan

Published in: LIPIcs, Volume 199, 2nd Conference on Information-Theoretic Cryptography (ITC 2021)


Abstract
Numerous high-profile works have shown that access patterns to even encrypted databases can leak secret information and sometimes even lead to reconstruction of the entire database. To thwart access pattern leakage, the literature has focused on oblivious algorithms, where obliviousness requires that the access patterns leak nothing about the input data. In this paper, we consider the Join operator, an important database primitive that has been extensively studied and optimized. Unfortunately, any fully oblivious Join algorithm would require always padding the result to the worst-case length which is quadratic in the data size N. In comparison, an insecure baseline incurs only O(R + N) cost where R is the true result length, and in the common case in practice, R is relatively short. As a typical example, when R = O(N), any fully oblivious algorithm must inherently incur a prohibitive, N-fold slowdown relative to the insecure baseline. Indeed, the (non-private) database and algorithms literature invariably focuses on studying the instance-specific rather than worst-case performance of database algorithms. Unfortunately, the stringent notion of full obliviousness precludes the design of efficient algorithms with non-trivial instance-specific performance. To overcome this worst-case performance barrier of full obliviousness and enable algorithms with good instance-specific performance, we consider a relaxed notion of access pattern privacy called (ε, δ)-differential obliviousness (DO), originally proposed in the seminal work of Chan et al. (SODA'19). Rather than insisting that the access patterns leak no information whatsoever, the relaxed DO notion requires that the access patterns satisfy (ε, δ)-differential privacy. We show that by adopting the relaxed DO notion, we can obtain efficient database Join mechanisms whose instance-specific performance approximately matches the insecure baseline, while still offering a meaningful notion of privacy to individual users. Complementing our upper bound results, we also prove new lower bounds regarding the performance of any DO Join algorithm. Differential obliviousness (DO) is a new notion and is a relatively unexplored territory. Following the pioneering investigations by Chan et al. and others, our work is among the very first to formally explore how DO can help overcome the worst-case performance curse of full obliviousness; moreover, we motivate our work with database applications. Our work shows new evidence why DO might be a promising notion, and opens up several exciting future directions.

Cite as

Shumo Chu, Danyang Zhuo, Elaine Shi, and T-H. Hubert Chan. Differentially Oblivious Database Joins: Overcoming the Worst-Case Curse of Fully Oblivious Algorithms. In 2nd Conference on Information-Theoretic Cryptography (ITC 2021). Leibniz International Proceedings in Informatics (LIPIcs), Volume 199, pp. 19:1-19:24, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2021)


Copy BibTex To Clipboard

@InProceedings{chu_et_al:LIPIcs.ITC.2021.19,
  author =	{Chu, Shumo and Zhuo, Danyang and Shi, Elaine and Chan, T-H. Hubert},
  title =	{{Differentially Oblivious Database Joins: Overcoming the Worst-Case Curse of Fully Oblivious Algorithms}},
  booktitle =	{2nd Conference on Information-Theoretic Cryptography (ITC 2021)},
  pages =	{19:1--19:24},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-197-9},
  ISSN =	{1868-8969},
  year =	{2021},
  volume =	{199},
  editor =	{Tessaro, Stefano},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.ITC.2021.19},
  URN =		{urn:nbn:de:0030-drops-143386},
  doi =		{10.4230/LIPIcs.ITC.2021.19},
  annote =	{Keywords: differentially oblivious, database join, instance-specific performance}
}
Document
Track A: Algorithms, Complexity and Games
Near-Optimal Two-Pass Streaming Algorithm for Sampling Random Walks over Directed Graphs

Authors: Lijie Chen, Gillat Kol, Dmitry Paramonov, Raghuvansh R. Saxena, Zhao Song, and Huacheng Yu

Published in: LIPIcs, Volume 198, 48th International Colloquium on Automata, Languages, and Programming (ICALP 2021)


Abstract
For a directed graph G with n vertices and a start vertex u_start, we wish to (approximately) sample an L-step random walk over G starting from u_start with minimum space using an algorithm that only makes few passes over the edges of the graph. This problem found many applications, for instance, in approximating the PageRank of a webpage. If only a single pass is allowed, the space complexity of this problem was shown to be Θ̃(n ⋅ L). Prior to our work, a better space complexity was only known with Õ(√L) passes. We essentially settle the space complexity of this random walk simulation problem for two-pass streaming algorithms, showing that it is Θ̃(n ⋅ √L), by giving almost matching upper and lower bounds. Our lower bound argument extends to every constant number of passes p, and shows that any p-pass algorithm for this problem uses Ω̃(n ⋅ L^{1/p}) space. In addition, we show a similar Θ̃(n ⋅ √L) bound on the space complexity of any algorithm (with any number of passes) for the related problem of sampling an L-step random walk from every vertex in the graph.

Cite as

Lijie Chen, Gillat Kol, Dmitry Paramonov, Raghuvansh R. Saxena, Zhao Song, and Huacheng Yu. Near-Optimal Two-Pass Streaming Algorithm for Sampling Random Walks over Directed Graphs. In 48th International Colloquium on Automata, Languages, and Programming (ICALP 2021). Leibniz International Proceedings in Informatics (LIPIcs), Volume 198, pp. 52:1-52:19, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2021)


Copy BibTex To Clipboard

@InProceedings{chen_et_al:LIPIcs.ICALP.2021.52,
  author =	{Chen, Lijie and Kol, Gillat and Paramonov, Dmitry and Saxena, Raghuvansh R. and Song, Zhao and Yu, Huacheng},
  title =	{{Near-Optimal Two-Pass Streaming Algorithm for Sampling Random Walks over Directed Graphs}},
  booktitle =	{48th International Colloquium on Automata, Languages, and Programming (ICALP 2021)},
  pages =	{52:1--52:19},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-195-5},
  ISSN =	{1868-8969},
  year =	{2021},
  volume =	{198},
  editor =	{Bansal, Nikhil and Merelli, Emanuela and Worrell, James},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.ICALP.2021.52},
  URN =		{urn:nbn:de:0030-drops-141218},
  doi =		{10.4230/LIPIcs.ICALP.2021.52},
  annote =	{Keywords: streaming algorithms, random walk sampling}
}
Document
Training (Overparametrized) Neural Networks in Near-Linear Time

Authors: Jan van den Brand, Binghui Peng, Zhao Song, and Omri Weinstein

Published in: LIPIcs, Volume 185, 12th Innovations in Theoretical Computer Science Conference (ITCS 2021)


Abstract
The slow convergence rate and pathological curvature issues of first-order gradient methods for training deep neural networks, initiated an ongoing effort for developing faster second-order optimization algorithms beyond SGD, without compromising the generalization error. Despite their remarkable convergence rate (independent of the training batch size n), second-order algorithms incur a daunting slowdown in the cost per iteration (inverting the Hessian matrix of the loss function), which renders them impractical. Very recently, this computational overhead was mitigated by the works of [Zhang et al., 2019; Cai et al., 2019], yielding an O(mn²)-time second-order algorithm for training two-layer overparametrized neural networks of polynomial width m. We show how to speed up the algorithm of [Cai et al., 2019], achieving an Õ(mn)-time backpropagation algorithm for training (mildly overparametrized) ReLU networks, which is near-linear in the dimension (mn) of the full gradient (Jacobian) matrix. The centerpiece of our algorithm is to reformulate the Gauss-Newton iteration as an 𝓁₂-regression problem, and then use a Fast-JL type dimension reduction to precondition the underlying Gram matrix in time independent of M, allowing to find a sufficiently good approximate solution via first-order conjugate gradient. Our result provides a proof-of-concept that advanced machinery from randomized linear algebra - which led to recent breakthroughs in convex optimization (ERM, LPs, Regression) - can be carried over to the realm of deep learning as well.

Cite as

Jan van den Brand, Binghui Peng, Zhao Song, and Omri Weinstein. Training (Overparametrized) Neural Networks in Near-Linear Time. In 12th Innovations in Theoretical Computer Science Conference (ITCS 2021). Leibniz International Proceedings in Informatics (LIPIcs), Volume 185, pp. 63:1-63:15, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2021)


Copy BibTex To Clipboard

@InProceedings{vandenbrand_et_al:LIPIcs.ITCS.2021.63,
  author =	{van den Brand, Jan and Peng, Binghui and Song, Zhao and Weinstein, Omri},
  title =	{{Training (Overparametrized) Neural Networks in Near-Linear Time}},
  booktitle =	{12th Innovations in Theoretical Computer Science Conference (ITCS 2021)},
  pages =	{63:1--63:15},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-177-1},
  ISSN =	{1868-8969},
  year =	{2021},
  volume =	{185},
  editor =	{Lee, James R.},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.ITCS.2021.63},
  URN =		{urn:nbn:de:0030-drops-136025},
  doi =		{10.4230/LIPIcs.ITCS.2021.63},
  annote =	{Keywords: Deep learning theory, Nonconvex optimization}
}
Document
Invited Talk
Convex Optimization and Dynamic Data Structure (Invited Talk)

Authors: Yin Tat Lee

Published in: LIPIcs, Volume 182, 40th IARCS Annual Conference on Foundations of Software Technology and Theoretical Computer Science (FSTTCS 2020)


Abstract
In the last three years, there are many breakthroughs in optimization such as nearly quadratic time algorithms for bipartite matching, linear programming algorithms that are as fast as Ax = b. All of these algorithms are based on a careful combination of optimization techniques and dynamic data structures. In this talk, we will explain the framework underlying all the recent breakthroughs. Joint work with Jan van den Brand, Michael B. Cohen, Sally Dong, Haotian Jiang, Tarun Kathuria, Danupon Nanongkai, Swati Padmanabhan, Richard Peng, Thatchaphol Saranurak, Aaron Sidford, Zhao Song, Di Wang, Sam Chiu-wai Wong, Guanghao Ye, Qiuyi Zhang.

Cite as

Yin Tat Lee. Convex Optimization and Dynamic Data Structure (Invited Talk). In 40th IARCS Annual Conference on Foundations of Software Technology and Theoretical Computer Science (FSTTCS 2020). Leibniz International Proceedings in Informatics (LIPIcs), Volume 182, p. 3:1, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2020)


Copy BibTex To Clipboard

@InProceedings{lee:LIPIcs.FSTTCS.2020.3,
  author =	{Lee, Yin Tat},
  title =	{{Convex Optimization and Dynamic Data Structure}},
  booktitle =	{40th IARCS Annual Conference on Foundations of Software Technology and Theoretical Computer Science (FSTTCS 2020)},
  pages =	{3:1--3:1},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-174-0},
  ISSN =	{1868-8969},
  year =	{2020},
  volume =	{182},
  editor =	{Saxena, Nitin and Simon, Sunil},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.FSTTCS.2020.3},
  URN =		{urn:nbn:de:0030-drops-132440},
  doi =		{10.4230/LIPIcs.FSTTCS.2020.3},
  annote =	{Keywords: Convex Optimization, Dynamic Data Structure}
}
Document
Track A: Algorithms, Complexity and Games
Estimating the Frequency of a Clustered Signal

Authors: Xue Chen and Eric Price

Published in: LIPIcs, Volume 132, 46th International Colloquium on Automata, Languages, and Programming (ICALP 2019)


Abstract
We consider the problem of locating a signal whose frequencies are "off grid" and clustered in a narrow band. Given noisy sample access to a function g(t) with Fourier spectrum in a narrow range [f_0 - Delta, f_0 + Delta], how accurately is it possible to identify f_0? We present generic conditions on g that allow for efficient, accurate estimates of the frequency. We then show bounds on these conditions for k-Fourier-sparse signals that imply recovery of f_0 to within Delta + O~(k^3) from samples on [-1, 1]. This improves upon the best previous bound of O(Delta + O~(k^5))^{1.5}. We also show that no algorithm can do better than Delta + O~(k^2). In the process we provide a new O~(k^3) bound on the ratio between the maximum and average value of continuous k-Fourier-sparse signals, which has independent application.

Cite as

Xue Chen and Eric Price. Estimating the Frequency of a Clustered Signal. In 46th International Colloquium on Automata, Languages, and Programming (ICALP 2019). Leibniz International Proceedings in Informatics (LIPIcs), Volume 132, pp. 36:1-36:13, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2019)


Copy BibTex To Clipboard

@InProceedings{chen_et_al:LIPIcs.ICALP.2019.36,
  author =	{Chen, Xue and Price, Eric},
  title =	{{Estimating the Frequency of a Clustered Signal}},
  booktitle =	{46th International Colloquium on Automata, Languages, and Programming (ICALP 2019)},
  pages =	{36:1--36:13},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-109-2},
  ISSN =	{1868-8969},
  year =	{2019},
  volume =	{132},
  editor =	{Baier, Christel and Chatzigiannakis, Ioannis and Flocchini, Paola and Leonardi, Stefano},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.ICALP.2019.36},
  URN =		{urn:nbn:de:0030-drops-106128},
  doi =		{10.4230/LIPIcs.ICALP.2019.36},
  annote =	{Keywords: sublinear algorithms, Fourier transform}
}
Document
Fast Regression with an $ell_infty$ Guarantee

Authors: Eric Price, Zhao Song, and David P. Woodruff

Published in: LIPIcs, Volume 80, 44th International Colloquium on Automata, Languages, and Programming (ICALP 2017)


Abstract
Sketching has emerged as a powerful technique for speeding up problems in numerical linear algebra, such as regression. In the overconstrained regression problem, one is given an n x d matrix A, with n >> d, as well as an n x 1 vector b, and one wants to find a vector \hat{x} so as to minimize the residual error ||Ax-b||_2. Using the sketch and solve paradigm, one first computes S \cdot A and S \cdot b for a randomly chosen matrix S, then outputs x' = (SA)^{\dagger} Sb so as to minimize || SAx' - Sb||_2. The sketch-and-solve paradigm gives a bound on ||x'-x^*||_2 when A is well-conditioned. Our main result is that, when S is the subsampled randomized Fourier/Hadamard transform, the error x' - x^* behaves as if it lies in a "random" direction within this bound: for any fixed direction a in R^d, we have with 1 - d^{-c} probability that (1) \langle a, x'-x^* \rangle \lesssim \frac{ \|a\|_2\|x'-x^*\|_2}{d^{\frac{1}{2}-\gamma}}, where c, \gamma > 0 are arbitrary constants. This implies ||x'-x^*||_{\infty} is a factor d^{\frac{1}{2}-\gamma} smaller than ||x'-x^*||_2. It also gives a better bound on the generalization of x' to new examples: if rows of A correspond to examples and columns to features, then our result gives a better bound for the error introduced by sketch-and-solve when classifying fresh examples. We show that not all oblivious subspace embeddings S satisfy these properties. In particular, we give counterexamples showing that matrices based on Count-Sketch or leverage score sampling do not satisfy these properties. We also provide lower bounds, both on how small ||x'-x^*||_2 can be, and for our new guarantee (1), showing that the subsampled randomized Fourier/Hadamard transform is nearly optimal. Our lower bound on ||x'-x^*||_2 shows that there is an O(1/epsilon) separation in the dimension of the optimal oblivious subspace embedding required for outputting an x' for which ||x'-x^*||_2 <= epsilon ||Ax^*-b||_2 \cdot ||A^{\dagger}||_2$, compared to the dimension of the optimal oblivious subspace embedding required for outputting an x' for which ||Ax'-b||_2 <= (1+epsilon)||Ax^*-b||_2, that is, the former problem requires dimension Omega(d/epsilon^2) while the latter problem can be solved with dimension O(d/epsilon). This explains the reason known upper bounds on the dimensions of these two variants of regression have differed in prior work.

Cite as

Eric Price, Zhao Song, and David P. Woodruff. Fast Regression with an $ell_infty$ Guarantee. In 44th International Colloquium on Automata, Languages, and Programming (ICALP 2017). Leibniz International Proceedings in Informatics (LIPIcs), Volume 80, pp. 59:1-59:14, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2017)


Copy BibTex To Clipboard

@InProceedings{price_et_al:LIPIcs.ICALP.2017.59,
  author =	{Price, Eric and Song, Zhao and Woodruff, David P.},
  title =	{{Fast Regression with an \$ell\underlineinfty\$ Guarantee}},
  booktitle =	{44th International Colloquium on Automata, Languages, and Programming (ICALP 2017)},
  pages =	{59:1--59:14},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-041-5},
  ISSN =	{1868-8969},
  year =	{2017},
  volume =	{80},
  editor =	{Chatzigiannakis, Ioannis and Indyk, Piotr and Kuhn, Fabian and Muscholl, Anca},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.ICALP.2017.59},
  URN =		{urn:nbn:de:0030-drops-74488},
  doi =		{10.4230/LIPIcs.ICALP.2017.59},
  annote =	{Keywords: Linear regression, Count-Sketch, Gaussians, Leverage scores, ell\underlineinfty-guarantee}
}
Document
Counting Hypergraph Matchings up to Uniqueness Threshold

Authors: Renjie Song, Yitong Yin, and Jinman Zhao

Published in: LIPIcs, Volume 60, Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2016)


Abstract
We study the problem of approximately counting matchings in hypergraphs of bounded maximum degree and maximum size of hyperedges. With an activity parameter lambda, each matching M is assigned a weight lambda^{|M|}. The counting problem is formulated as computing a partition function that gives the sum of the weights of all matchings in a hypergraph. This problem unifies two extensively studied statistical physics models in approximate counting: the hardcore model (graph independent sets) and the monomer-dimer model (graph matchings). For this model, the critical activity lambda_c= (d^d)/(k (d-1)^{d+1}) is the threshold for the uniqueness of Gibbs measures on the infinite (d+1)-uniform (k+1)-regular hypertree. Consider hypergraphs of maximum degree at most k+1 and maximum size of hyperedges at most d+1. We show that when lambda < lambda_c, there is an FPTAS for computing the partition function; and when lambda = lambda_c, there is a PTAS for computing the log-partition function. These algorithms are based on the decay of correlation (strong spatial mixing) property of Gibbs distributions. When lambda > 2lambda_c, there is no PRAS for the partition function or the log-partition function unless NP=RP. Towards obtaining a sharp transition of computational complexity of approximate counting, we study the local convergence from a sequence of finite hypergraphs to the infinite lattice with specified symmetry. We show a surprising connection between the local convergence and the reversibility of a natural random walk. This leads us to a barrier for the hardness result: The non-uniqueness of infinite Gibbs measure is not realizable by any finite gadgets.

Cite as

Renjie Song, Yitong Yin, and Jinman Zhao. Counting Hypergraph Matchings up to Uniqueness Threshold. In Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2016). Leibniz International Proceedings in Informatics (LIPIcs), Volume 60, pp. 46:1-46:29, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2016)


Copy BibTex To Clipboard

@InProceedings{song_et_al:LIPIcs.APPROX-RANDOM.2016.46,
  author =	{Song, Renjie and Yin, Yitong and Zhao, Jinman},
  title =	{{Counting Hypergraph Matchings up to Uniqueness Threshold}},
  booktitle =	{Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2016)},
  pages =	{46:1--46:29},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-018-7},
  ISSN =	{1868-8969},
  year =	{2016},
  volume =	{60},
  editor =	{Jansen, Klaus and Mathieu, Claire and Rolim, Jos\'{e} D. P. and Umans, Chris},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.APPROX-RANDOM.2016.46},
  URN =		{urn:nbn:de:0030-drops-66690},
  doi =		{10.4230/LIPIcs.APPROX-RANDOM.2016.46},
  annote =	{Keywords: approximate counting; phase transition; spatial mixing}
}
Document
The p-Center Problem in Tree Networks Revisited

Authors: Aritra Banik, Binay Bhattacharya, Sandip Das, Tsunehiko Kameda, and Zhao Song

Published in: LIPIcs, Volume 53, 15th Scandinavian Symposium and Workshops on Algorithm Theory (SWAT 2016)


Abstract
We present two improved algorithms for weighted discrete p-center problem for tree networks with n vertices. One of our proposed algorithms runs in O(n*log(n) + p*log^2(n) * log(n/p)) time. For all values of p, our algorithm thus runs as fast as or faster than the most efficient O(n*log^2(n)) time algorithm obtained by applying Cole's [1987] speed-up technique to the algorithm due to Megiddo and Tamir [1983], which has remained unchallenged for nearly 30 years. Our other algorithm, which is more practical, runs in O(n*log(n) + p^2*log^2(n/p)) time, and when p=O(sqrt(n)) it is faster than Megiddo and Tamir's O(n*log^2(n) * log(log(n))) time algorithm [1983].

Cite as

Aritra Banik, Binay Bhattacharya, Sandip Das, Tsunehiko Kameda, and Zhao Song. The p-Center Problem in Tree Networks Revisited. In 15th Scandinavian Symposium and Workshops on Algorithm Theory (SWAT 2016). Leibniz International Proceedings in Informatics (LIPIcs), Volume 53, pp. 6:1-6:15, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2016)


Copy BibTex To Clipboard

@InProceedings{banik_et_al:LIPIcs.SWAT.2016.6,
  author =	{Banik, Aritra and Bhattacharya, Binay and Das, Sandip and Kameda, Tsunehiko and Song, Zhao},
  title =	{{The p-Center Problem in Tree Networks Revisited}},
  booktitle =	{15th Scandinavian Symposium and Workshops on Algorithm Theory (SWAT 2016)},
  pages =	{6:1--6:15},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-011-8},
  ISSN =	{1868-8969},
  year =	{2016},
  volume =	{53},
  editor =	{Pagh, Rasmus},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.SWAT.2016.6},
  URN =		{urn:nbn:de:0030-drops-60296},
  doi =		{10.4230/LIPIcs.SWAT.2016.6},
  annote =	{Keywords: Facility location, p-center, parametric search, tree network, sorting network}
}
  • Refine by Author
  • 8 Song, Zhao
  • 3 Zhang, Ruizhe
  • 2 Price, Eric
  • 2 Zhang, Lichen
  • 1 Banik, Aritra
  • Show More...

  • Refine by Classification
  • 4 Theory of computation → Streaming, sublinear and near linear time algorithms
  • 3 Theory of computation → Design and analysis of algorithms
  • 2 Theory of computation → Nonconvex optimization
  • 1 Computing methodologies → Artificial intelligence
  • 1 Human-centered computing → Human computer interaction (HCI)
  • Show More...

  • Refine by Keyword
  • 2 Deep learning theory
  • 2 GeoAI
  • 2 Nonconvex optimization
  • 1 Anti-concentration
  • 1 Cartography
  • Show More...

  • Refine by Type
  • 14 document

  • Refine by Publication Year
  • 3 2021
  • 3 2023
  • 2 2016
  • 2 2022
  • 1 2017
  • Show More...

Questions / Remarks / Feedback
X

Feedback for Dagstuhl Publishing


Thanks for your feedback!

Feedback submitted

Could not send message

Please try again later or send an E-mail