Document

Track A: Algorithms, Complexity and Games

**Published in:** LIPIcs, Volume 261, 50th International Colloquium on Automata, Languages, and Programming (ICALP 2023)

We consider the design of sublinear space and query complexity algorithms for estimating the cost of a minimum spanning tree (MST) and the cost of a minimum traveling salesman (TSP) tour in a metric on n points. We start by exploring this estimation task in the regime of o(n) space, when the input is presented as a stream of all (n choose 2) entries of the metric in an arbitrary order (a metric stream). For any α ≥ 2, we show that both MST and TSP cost can be α-approximated using Õ(n/α) space, and moreover, Ω(n/α²) space is necessary for this task. We further show that even if the streaming algorithm is allowed p passes over a metric stream, it still requires Ω̃(√(n/(αp²))) space.
We next consider the well-studied semi-streaming regime. In this regime, it is straightforward to compute MST cost exactly even in the case where the input stream only contains the edges of a weighted graph that induce the underlying metric (a graph stream), and the main challenging problem is to estimate TSP cost to within a factor that is strictly better than 2. We show that in graph streams, for any ε > 0, any one-pass (2-ε)-approximation of TSP cost requires Ω(ε² n²) space. On the other hand, we show that there is an Õ(n) space two-pass algorithm that approximates the TSP cost to within a factor of 1.96.
Finally, we consider the query complexity of estimating metric TSP cost to within a factor that is strictly better than 2 when the algorithm is given access to an n × n matrix that specifies pairwise distances between n points. The problem of MST cost estimation in this model is well-understood and a (1+ε)-approximation is achievable by Õ(n/ε^{O(1)}) queries. However, for estimating TSP cost, it is known that an analogous result requires Ω(n²) queries even for (1,2)-TSP, and for general metrics, no algorithm that achieves a better than 2-approximation with o(n²) queries is known. We make progress on this task by designing an algorithm that performs Õ(n^{1.5}) distance queries and achieves a strictly better than 2-approximation when either the metric is known to contain a spanning tree supported on weight-1 edges or the algorithm is given access to a minimum spanning tree of the graph. Prior to our work, such results were only known for the special cases of graphic TSP and (1,2)-TSP.
In terms of techniques, our algorithms for metric TSP cost estimation in both streaming and query settings rely on estimating the cover advantage, which intuitively measures the cost needed to turn an MST into an Eulerian graph. One of our main algorithmic contributions is to show that this quantity can be meaningfully estimated by a sublinear number of queries in the query model. On one hand, the fact that a metric stream reveals pairwise distances for all pairs of vertices provably helps algorithmically. On the other hand, it also seems to render useless techniques for proving space lower bounds via reductions from well-known hard communication problems. Our main technical contribution in lower bounds is to identify and characterize the communication complexity of new problems that can serve as canonical starting points for proving metric stream lower bounds.
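The factor-2 baseline mentioned in the abstract comes from the classical bounds MST ≤ TSP ≤ 2·MST, which hold in any metric. The following is a minimal sketch of that baseline only, assuming the whole distance matrix is stored in memory (so it is neither sublinear in space nor in queries); the name `mst_cost` and the example points are illustrative and do not come from the paper.

```python
def mst_cost(dist):
    """Exact MST cost of a complete graph given as a symmetric distance matrix.

    Plain O(n^2) Prim's algorithm; reads every entry of the matrix.
    """
    n = len(dist)
    if n == 0:
        return 0.0
    in_tree = [False] * n
    best = [float("inf")] * n  # cheapest known edge from the tree to each vertex
    best[0] = 0.0
    total = 0.0
    for _ in range(n):
        # Pick the closest vertex that is not yet in the tree.
        u = min((v for v in range(n) if not in_tree[v]), key=lambda v: best[v])
        in_tree[u] = True
        total += best[u]
        for v in range(n):
            if not in_tree[v] and dist[u][v] < best[v]:
                best[v] = dist[u][v]
    return total


# Hypothetical example: four points on a line at positions 0, 1, 2, 4.
points = [0, 1, 2, 4]
dist = [[abs(a - b) for b in points] for a in points]
mst = mst_cost(dist)
tsp_lower, tsp_upper = mst, 2 * mst  # MST <= TSP <= 2 * MST
```

For these points the optimal tour walks to the far end and back, so its cost equals the upper end of the interval. Beating the factor 2 is exactly where the cover advantage described above comes in.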

Yu Chen, Sanjeev Khanna, and Zihan Tan. Sublinear Algorithms and Lower Bounds for Estimating MST and TSP Cost in General Metrics. In 50th International Colloquium on Automata, Languages, and Programming (ICALP 2023). Leibniz International Proceedings in Informatics (LIPIcs), Volume 261, pp. 37:1-37:16, Schloss Dagstuhl - Leibniz-Zentrum für Informatik (2023)

@InProceedings{chen_et_al:LIPIcs.ICALP.2023.37,
  author = {Chen, Yu and Khanna, Sanjeev and Tan, Zihan},
  title = {{Sublinear Algorithms and Lower Bounds for Estimating MST and TSP Cost in General Metrics}},
  booktitle = {50th International Colloquium on Automata, Languages, and Programming (ICALP 2023)},
  pages = {37:1--37:16},
  series = {Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN = {978-3-95977-278-5},
  ISSN = {1868-8969},
  year = {2023},
  volume = {261},
  editor = {Etessami, Kousha and Feige, Uriel and Puppis, Gabriele},
  publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address = {Dagstuhl, Germany},
  URL = {https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICALP.2023.37},
  URN = {urn:nbn:de:0030-drops-180892},
  doi = {10.4230/LIPIcs.ICALP.2023.37},
  annote = {Keywords: Minimum spanning tree, travelling salesman problem, streaming algorithms}
}

**Published in:** LIPIcs, Volume 215, 13th Innovations in Theoretical Computer Science Conference (ITCS 2022)

We resolve the space complexity of one-pass streaming algorithms for Minimum Dominating Set (MDS) in both insertion-only and insertion-deletion streams (up to poly-logarithmic factors) where an input graph is revealed by a sequence of edge updates. Recently, streaming algorithms for the related Set Cover problem have received significant attention. Even though MDS can be viewed as a special case of Set Cover, it is however harder to solve in the streaming setting since the input stream consists of individual edges rather than entire vertex-neighborhoods, as is the case in Set Cover.
We prove the following results (n is the number of vertices of the input graph):
1) In insertion-only streams, we give a one-pass semi-streaming algorithm (meaning Õ(n) space) with approximation factor Õ(√n). We also prove that every one-pass streaming algorithm with space o(n) has an approximation factor of Ω(n/log n).
Combined with a result by [Assadi et al., STOC'16] for Set Cover which, translated to MDS, shows that space Θ̃(n² / α) is necessary and sufficient for computing an α-approximation for every α = o(√n), this completely settles the space requirements for MDS in the insertion-only setting.
2) In insertion-deletion streams, we prove that space Ω(n² / (α log n)) is necessary for every approximation factor α ≤ Θ(n / log³ n). Combined with the Set Cover algorithm of [Assadi et al., STOC'16], which can be adapted to MDS even in the insertion-deletion setting to give an α-approximation in Õ(n² / α) space, this completely settles the space requirements for MDS in the insertion-deletion setting.

Sanjeev Khanna and Christian Konrad. Optimal Bounds for Dominating Set in Graph Streams. In 13th Innovations in Theoretical Computer Science Conference (ITCS 2022). Leibniz International Proceedings in Informatics (LIPIcs), Volume 215, pp. 93:1-93:23, Schloss Dagstuhl - Leibniz-Zentrum für Informatik (2022)

@InProceedings{khanna_et_al:LIPIcs.ITCS.2022.93,
  author = {Khanna, Sanjeev and Konrad, Christian},
  title = {{Optimal Bounds for Dominating Set in Graph Streams}},
  booktitle = {13th Innovations in Theoretical Computer Science Conference (ITCS 2022)},
  pages = {93:1--93:23},
  series = {Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN = {978-3-95977-217-4},
  ISSN = {1868-8969},
  year = {2022},
  volume = {215},
  editor = {Braverman, Mark},
  publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address = {Dagstuhl, Germany},
  URL = {https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ITCS.2022.93},
  URN = {urn:nbn:de:0030-drops-156894},
  doi = {10.4230/LIPIcs.ITCS.2022.93},
  annote = {Keywords: Streaming algorithms, communication complexity, information complexity, dominating set}
}

**Published in:** LIPIcs, Volume 204, 29th Annual European Symposium on Algorithms (ESA 2021)

We study the problem of finding a spanning forest in an undirected, n-vertex multi-graph under two basic query models. One is Linear queries, which are linear measurements on the incidence vector induced by the edges; the other is the weaker OR queries, which only reveal whether a given subset of plausible edges is empty or not. At the heart of our study lies a fundamental problem which we call the single element recovery problem: given a non-negative vector x ∈ ℝ^{N}_{≥ 0}, the objective is to return a single element x_j > 0 from the support. Queries can be made in rounds, and our goal is to understand the trade-offs between the query complexity and the rounds of adaptivity needed to solve these problems, for both deterministic and randomized algorithms. These questions have connections and ramifications to multiple areas such as sketching, streaming, graph reconstruction, and compressed sensing. Our main results are as follows:
- For the single element recovery problem, it is easy to obtain a deterministic, r-round algorithm which makes (N^{1/r}-1)-queries per-round. We prove that this is tight: any r-round deterministic algorithm must make ≥ (N^{1/r} - 1) Linear queries in some round. In contrast, a 1-round O(polylog)-query randomized algorithm is known to exist.
- We design a deterministic O(r)-round, Õ(n^{1+1/r})-OR query algorithm for graph connectivity. We complement this with an Ω̃(n^{1 + 1/r})-lower bound for any r-round deterministic algorithm in the OR-model.
- We design a randomized, 2-round algorithm for the graph connectivity problem which makes Õ(n)-OR queries. In contrast, we prove that any 1-round algorithm (possibly randomized) requires Ω̃(n²)-OR queries. A randomized, 1-round algorithm making Õ(n)-Linear queries is already known.
All our algorithms, in fact, work with more natural graph query models which are special cases of the above, and have been extensively studied in the literature. These are Cross queries (cut-queries) and BIS (bipartite independent set) queries.
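
The "easy" deterministic r-round upper bound for single element recovery can be sketched as follows. This is one natural way to realize it, not code from the paper: each round splits the current index range into at most b blocks, where b is the smallest integer with b^r ≥ N, and issues a block-sum Linear query for all blocks except the last. The function name and return format are assumptions made for illustration, and x is assumed to be non-negative and not identically zero.

```python
def single_element_recovery(x, r):
    """Return (j, queries_per_round) with x[j] > 0, using r rounds of block-sum queries.

    Assumes x is non-negative, non-empty, and has at least one positive entry.
    """
    n = len(x)
    b = 1
    while b ** r < n:  # smallest integer b with b**r >= n
        b += 1
    lo, hi = 0, n
    queries_per_round = []
    for _ in range(r):
        size = -(-(hi - lo) // b)  # ceil((hi - lo) / b)
        blocks = [(s, min(s + size, hi)) for s in range(lo, hi, size)]
        # All queries of a round are issued together (no adaptivity inside a round).
        # The last block needs no query: if all others sum to 0, it must be non-zero.
        answers = [sum(x[s:e]) for s, e in blocks[:-1]]
        queries_per_round.append(len(answers))
        lo, hi = next(
            (blk for blk, a in zip(blocks, answers) if a > 0), blocks[-1]
        )
    return lo, queries_per_round


# Hypothetical input: N = 27, r = 3, so b = 3 and each round uses at most 2 queries.
x = [0.0] * 27
x[20] = 3.5
j, per_round = single_element_recovery(x, 3)
```

Because x is non-negative, a block sum is positive exactly when the block intersects the support, which is why a single Linear query per block suffices. When N = b^r, the per-round count is b − 1 = N^{1/r} − 1, matching the bound stated in the first bullet.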

Sepehr Assadi, Deeparnab Chakrabarty, and Sanjeev Khanna. Graph Connectivity and Single Element Recovery via Linear and OR Queries. In 29th Annual European Symposium on Algorithms (ESA 2021). Leibniz International Proceedings in Informatics (LIPIcs), Volume 204, pp. 7:1-7:19, Schloss Dagstuhl - Leibniz-Zentrum für Informatik (2021)

@InProceedings{assadi_et_al:LIPIcs.ESA.2021.7,
  author = {Assadi, Sepehr and Chakrabarty, Deeparnab and Khanna, Sanjeev},
  title = {{Graph Connectivity and Single Element Recovery via Linear and OR Queries}},
  booktitle = {29th Annual European Symposium on Algorithms (ESA 2021)},
  pages = {7:1--7:19},
  series = {Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN = {978-3-95977-204-4},
  ISSN = {1868-8969},
  year = {2021},
  volume = {204},
  editor = {Mutzel, Petra and Pagh, Rasmus and Herman, Grzegorz},
  publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address = {Dagstuhl, Germany},
  URL = {https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ESA.2021.7},
  URN = {urn:nbn:de:0030-drops-145880},
  doi = {10.4230/LIPIcs.ESA.2021.7},
  annote = {Keywords: Query Models, Graph Connectivity, Group Testing, Duality}
}

Track A: Algorithms, Complexity and Games

**Published in:** LIPIcs, Volume 198, 48th International Colloquium on Automata, Languages, and Programming (ICALP 2021)

The problem of sparsifying a graph or a hypergraph while approximately preserving its cut structure has been extensively studied and has many applications. In a seminal work, Benczúr and Karger (1996) showed that given any n-vertex undirected weighted graph G and a parameter ε ∈ (0,1), there is a near-linear time algorithm that outputs a weighted subgraph G' of G of size Õ(n/ε²) such that the weight of every cut in G is preserved to within a (1 ± ε)-factor in G'. The graph G' is referred to as a (1 ± ε)-approximate cut sparsifier of G. Subsequent recent work has obtained a similar result for the more general problem of hypergraph cut sparsifiers. However, all known sparsification algorithms require Ω(n + m) time where n denotes the number of vertices and m denotes the number of hyperedges in the hypergraph. Since m can be exponentially large in n, a natural question is if it is possible to create a hypergraph cut sparsifier in time polynomial in n, independent of the number of edges. We resolve this question in the affirmative, giving the first sublinear time algorithm for this problem, given appropriate query access to the hypergraph.
Specifically, we design an algorithm that constructs a (1 ± ε)-approximate cut sparsifier of a hypergraph H(V,E) in polynomial time in n, independent of the number of hyperedges, when given access to the hypergraph using the following two queries:
1) given any cut (S, S̄), return the size |δ_E(S)| (cut value queries); and
2) given any cut (S, S̄), return an edge crossing the cut chosen uniformly at random (cut edge sample queries).
Our algorithm outputs a sparsifier with Õ(n/ε²) edges, which is essentially optimal. We then extend our results to show that cut value and cut edge sample queries can also be used to construct hypergraph spectral sparsifiers in poly(n) time, independent of the number of hyperedges.
We complement the algorithmic results above by showing that any algorithm that has access to only one of the above two types of queries cannot give a hypergraph cut sparsifier in time that is polynomial in n. Finally, we show that our algorithmic results also hold if we replace the cut edge sample queries with a pair neighbor sample query that, for any pair of vertices, returns a random edge incident on them. In contrast, we show that having access only to cut value queries and queries that return a random edge incident on a given single vertex is not sufficient.
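
To make the two query types concrete, here is a toy oracle over an explicitly stored hypergraph. It is only an interface illustration: the class and method names are hypothetical, and this implementation scans all hyperedges per query, whereas the paper's setting assumes the oracle is given and counts only the algorithm's own work.

```python
import random


class HypergraphCutOracle:
    """Toy oracle answering cut value and cut edge sample queries."""

    def __init__(self, hyperedges, seed=None):
        self.hyperedges = [frozenset(e) for e in hyperedges]
        self.rng = random.Random(seed)

    def _crossing(self, S):
        S = set(S)
        # A hyperedge crosses the cut if it has a vertex inside S and one outside.
        return [e for e in self.hyperedges if e & S and e - S]

    def cut_value(self, S):
        """Number of hyperedges crossing (S, complement of S)."""
        return len(self._crossing(S))

    def cut_edge_sample(self, S):
        """A uniformly random crossing hyperedge, or None if the cut is empty."""
        crossing = self._crossing(S)
        return self.rng.choice(crossing) if crossing else None


# Hypothetical hypergraph on vertices 1..4.
oracle = HypergraphCutOracle([{1, 2, 3}, {3, 4}, {1, 4}, {2, 3}], seed=0)
value = oracle.cut_value({1, 2})
sample = oracle.cut_edge_sample({1, 2})
```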

Yu Chen, Sanjeev Khanna, and Ansh Nagda. Sublinear Time Hypergraph Sparsification via Cut and Edge Sampling Queries. In 48th International Colloquium on Automata, Languages, and Programming (ICALP 2021). Leibniz International Proceedings in Informatics (LIPIcs), Volume 198, pp. 53:1-53:21, Schloss Dagstuhl - Leibniz-Zentrum für Informatik (2021)

@InProceedings{chen_et_al:LIPIcs.ICALP.2021.53,
  author = {Chen, Yu and Khanna, Sanjeev and Nagda, Ansh},
  title = {{Sublinear Time Hypergraph Sparsification via Cut and Edge Sampling Queries}},
  booktitle = {48th International Colloquium on Automata, Languages, and Programming (ICALP 2021)},
  pages = {53:1--53:21},
  series = {Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN = {978-3-95977-195-5},
  ISSN = {1868-8969},
  year = {2021},
  volume = {198},
  editor = {Bansal, Nikhil and Merelli, Emanuela and Worrell, James},
  publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address = {Dagstuhl, Germany},
  URL = {https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICALP.2021.53},
  URN = {urn:nbn:de:0030-drops-141227},
  doi = {10.4230/LIPIcs.ICALP.2021.53},
  annote = {Keywords: hypergraphs, graph sparsification, cut queries}
}

Track A: Algorithms, Complexity and Games

**Published in:** LIPIcs, Volume 168, 47th International Colloquium on Automata, Languages, and Programming (ICALP 2020)

We consider the problem of designing sublinear time algorithms for estimating the cost of a minimum metric traveling salesman (TSP) tour. Specifically, given access to an n × n distance matrix D that specifies pairwise distances between n points, the goal is to estimate the TSP cost by performing only sublinear (in the size of D) queries. For the closely related problem of estimating the weight of a metric minimum spanning tree (MST), it is known that for any ε > 0, there exists an Õ(n/ε^{O(1)}) time algorithm that returns a (1 + ε)-approximate estimate of the MST cost. This result immediately implies an Õ(n/ε^{O(1)}) time algorithm to estimate the TSP cost to within a (2 + ε) factor for any ε > 0. However, no o(n²) time algorithms are known to approximate metric TSP to a factor that is strictly better than 2. On the other hand, there were also no known barriers that rule out the existence of (1 + ε)-approximate estimation algorithms for metric TSP with Õ(n) time for any fixed ε > 0. In this paper, we make progress on both algorithms and lower bounds for estimating metric TSP cost.
On the algorithmic side, we first consider the graphic TSP problem where the metric D corresponds to shortest path distances in a connected unweighted undirected graph. We show that there exists an Õ(n) time algorithm that estimates the cost of graphic TSP to within a factor of (2-ε₀) for some ε₀ > 0. This is the first sublinear cost estimation algorithm for graphic TSP that achieves an approximation factor less than 2. We also consider another well-studied special case of metric TSP, namely, (1,2)-TSP where all distances are either 1 or 2, and give an Õ(n^{1.5}) time algorithm to estimate optimal cost to within a factor of 1.625. Our estimation algorithms for graphic TSP as well as for (1,2)-TSP naturally lend themselves to Õ(n) space streaming algorithms that give an 11/6-approximation for graphic TSP and a 1.625-approximation for (1,2)-TSP. These results motivate the natural question of whether, analogously to metric MST, for any ε > 0, (1 + ε)-approximate estimates can be obtained for graphic TSP and (1,2)-TSP using Õ(n) queries. We answer this question in the negative: there exists an ε₀ > 0, such that any algorithm that estimates the cost of graphic TSP ((1,2)-TSP) to within a (1 + ε₀)-factor, necessarily requires Ω(n²) queries. This lower bound result highlights a sharp separation between the metric MST and metric TSP problems.
Similarly to many classical approximation algorithms for TSP, our sublinear time estimation algorithms utilize subroutines for estimating the size of a maximum matching in the underlying graph. We show that this is not merely an artifact of our approach, and that for any ε > 0, any algorithm that estimates the cost of graphic TSP or (1,2)-TSP to within a (1 + ε)-factor, can also be used to estimate the size of a maximum matching in a bipartite graph to within an ε n additive error. This connection allows us to translate known lower bounds for matching size estimation in various models to similar lower bounds for metric TSP cost estimation.
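
The step from a (1 + ε)-approximate MST estimate to a (2 + ε)-approximate TSP estimate, stated at the start of this abstract, can be spelled out with the standard argument below. The symbol \hat{M} is introduced only for this illustration and is assumed to satisfy MST ≤ \hat{M} ≤ (1 + ε)·MST.

```latex
\[
\mathrm{MST} \;\le\; \mathrm{TSP} \;\le\; 2\,\mathrm{MST}
\quad\Longrightarrow\quad
\mathrm{TSP} \;\le\; 2\,\mathrm{MST} \;\le\; 2\hat{M}
\;\le\; 2(1+\varepsilon)\,\mathrm{MST} \;\le\; (2+2\varepsilon)\,\mathrm{TSP}.
\]
```

The left inequality holds because deleting one edge of a tour leaves a spanning path, and the right one because doubling the MST edges and shortcutting repeated vertices yields a tour. Outputting 2\hat{M} therefore overestimates the TSP cost by a factor of at most 2 + 2ε, and running the MST estimator with ε/2 in place of ε gives the stated 2 + ε.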

Yu Chen, Sampath Kannan, and Sanjeev Khanna. Sublinear Algorithms and Lower Bounds for Metric TSP Cost Estimation. In 47th International Colloquium on Automata, Languages, and Programming (ICALP 2020). Leibniz International Proceedings in Informatics (LIPIcs), Volume 168, pp. 30:1-30:19, Schloss Dagstuhl - Leibniz-Zentrum für Informatik (2020)

@InProceedings{chen_et_al:LIPIcs.ICALP.2020.30,
  author = {Chen, Yu and Kannan, Sampath and Khanna, Sanjeev},
  title = {{Sublinear Algorithms and Lower Bounds for Metric TSP Cost Estimation}},
  booktitle = {47th International Colloquium on Automata, Languages, and Programming (ICALP 2020)},
  pages = {30:1--30:19},
  series = {Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN = {978-3-95977-138-2},
  ISSN = {1868-8969},
  year = {2020},
  volume = {168},
  editor = {Czumaj, Artur and Dawar, Anuj and Merelli, Emanuela},
  publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address = {Dagstuhl, Germany},
  URL = {https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICALP.2020.30},
  URN = {urn:nbn:de:0030-drops-124372},
  doi = {10.4230/LIPIcs.ICALP.2020.30},
  annote = {Keywords: sublinear algorithms, TSP, streaming algorithms, query complexity}
}

Track A: Algorithms, Complexity and Games

**Published in:** LIPIcs, Volume 168, 47th International Colloquium on Automata, Languages, and Programming (ICALP 2020)

We give the first polynomial-time approximation scheme (PTAS) for the stochastic load balancing problem when the job sizes follow Poisson distributions. This improves upon the 2-approximation algorithm due to Goel and Indyk (FOCS'99). Moreover, our approximation scheme is an efficient PTAS that has a running time doubly exponential in 1/ε but nearly-linear in n, where n is the number of jobs and ε is the target error. Previously, a PTAS (not efficient) was only known for jobs that obey exponential distributions (Goel and Indyk, FOCS'99).
Our algorithm relies on several probabilistic ingredients, including some (seemingly) new results on scaling and the so-called "focusing effect" of the maximum of Poisson random variables, which might be of independent interest.

Anindya De, Sanjeev Khanna, Huan Li, and Hesam Nikpey. An Efficient PTAS for Stochastic Load Balancing with Poisson Jobs. In 47th International Colloquium on Automata, Languages, and Programming (ICALP 2020). Leibniz International Proceedings in Informatics (LIPIcs), Volume 168, pp. 37:1-37:18, Schloss Dagstuhl - Leibniz-Zentrum für Informatik (2020)

@InProceedings{de_et_al:LIPIcs.ICALP.2020.37,
  author = {De, Anindya and Khanna, Sanjeev and Li, Huan and Nikpey, Hesam},
  title = {{An Efficient PTAS for Stochastic Load Balancing with Poisson Jobs}},
  booktitle = {47th International Colloquium on Automata, Languages, and Programming (ICALP 2020)},
  pages = {37:1--37:18},
  series = {Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN = {978-3-95977-138-2},
  ISSN = {1868-8969},
  year = {2020},
  volume = {168},
  editor = {Czumaj, Artur and Dawar, Anuj and Merelli, Emanuela},
  publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address = {Dagstuhl, Germany},
  URL = {https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICALP.2020.37},
  URN = {urn:nbn:de:0030-drops-124449},
  doi = {10.4230/LIPIcs.ICALP.2020.37},
  annote = {Keywords: Efficient PTAS, Makespan Minimization, Scheduling, Stochastic Load Balancing, Poisson Distribution}
}

**Published in:** LIPIcs, Volume 124, 10th Innovations in Theoretical Computer Science Conference (ITCS 2019)

In the subgraph counting problem, we are given a (large) input graph G(V, E) and a (small) target graph H (e.g., a triangle); the goal is to estimate the number of occurrences of H in G. Our focus here is on designing sublinear-time algorithms for approximately computing the number of occurrences of H in G in the setting where the algorithm is given query access to G. This problem has been studied in several recent papers which primarily focused on specific families of graphs H such as triangles, cliques, and stars. However, not much is known about approximate counting of arbitrary graphs H in the literature. This is in sharp contrast to the closely related subgraph enumeration problem that has received significant attention in the database community as the database join problem. The AGM bound shows that the maximum number of occurrences of any arbitrary subgraph H in a graph G with m edges is O(m^{ρ(H)}), where ρ(H) is the fractional edge-cover number of H, and enumeration algorithms with matching runtime are known for any H.
We bridge this gap between subgraph counting and subgraph enumeration by designing a simple sublinear-time algorithm that can estimate the number of occurrences of any arbitrary graph H in G, denoted by #H, to within a (1 ± ε)-approximation with high probability in O(m^{ρ(H)}/#H) · poly(log n, 1/ε) time. Our algorithm is allowed the standard set of queries for general graphs, namely degree queries, pair queries and neighbor queries, plus an additional edge-sample query that returns an edge chosen uniformly at random. The performance of our algorithm matches that of Eden et al. [FOCS 2015, STOC 2018] for counting triangles and cliques and extends it to all choices of subgraph H under the additional assumption of edge-sample queries.
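
To illustrate how the four query types combine, here is a simple unbiased estimator for the special case where H is a triangle. It is a basic sampling scheme in the same query model, not the algorithm of the paper, and it carries no claim about the paper's running time guarantee. The function name and input format are assumptions; the adjacency dictionary stands in for the query oracle.

```python
import random


def estimate_triangles(adj, edges, samples, seed=0):
    """Unbiased estimate of the number of triangles from uniform edge samples.

    adj:   dict mapping each vertex to a list of its neighbours (simple graph)
    edges: list of (u, v) pairs, each undirected edge listed exactly once
    """
    rng = random.Random(seed)
    m = len(edges)
    neighbour_sets = {v: set(ns) for v, ns in adj.items()}  # stands in for pair queries
    total = 0.0
    for _ in range(samples):
        u, v = rng.choice(edges)           # edge-sample query
        if len(adj[u]) > len(adj[v]):      # degree queries
            u, v = v, u                    # u is now the lower-degree endpoint
        w = rng.choice(adj[u])             # neighbour query on u
        if w in neighbour_sets[v]:         # pair query: is (v, w) an edge?
            total += m * len(adj[u])
    # Every triangle is found through each of its 3 edges, hence the division by 3.
    return total / (3 * samples)


# Hypothetical input: the complete graph K4, which contains exactly 4 triangles.
adj = {i: [j for j in range(4) if j != i] for i in range(4)}
edges = [(i, j) for i in range(4) for j in range(i + 1, 4)]
estimate = estimate_triangles(adj, edges, samples=20000)
```

For a sampled edge (u, v), the neighbour w completes a triangle with probability t(u, v)/deg(u), where t(u, v) is the number of triangles containing the edge, so the expected contribution m · deg(u) · t(u, v)/deg(u) summed over uniformly sampled edges equals three times the triangle count. Picking the lower-degree endpoint only reduces variance; it does not affect unbiasedness.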

Sepehr Assadi, Michael Kapralov, and Sanjeev Khanna. A Simple Sublinear-Time Algorithm for Counting Arbitrary Subgraphs via Edge Sampling. In 10th Innovations in Theoretical Computer Science Conference (ITCS 2019). Leibniz International Proceedings in Informatics (LIPIcs), Volume 124, pp. 6:1-6:20, Schloss Dagstuhl - Leibniz-Zentrum für Informatik (2019)

@InProceedings{assadi_et_al:LIPIcs.ITCS.2019.6,
  author = {Assadi, Sepehr and Kapralov, Michael and Khanna, Sanjeev},
  title = {{A Simple Sublinear-Time Algorithm for Counting Arbitrary Subgraphs via Edge Sampling}},
  booktitle = {10th Innovations in Theoretical Computer Science Conference (ITCS 2019)},
  pages = {6:1--6:20},
  series = {Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN = {978-3-95977-095-8},
  ISSN = {1868-8969},
  year = {2019},
  volume = {124},
  editor = {Blum, Avrim},
  publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address = {Dagstuhl, Germany},
  URL = {https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ITCS.2019.6},
  URN = {urn:nbn:de:0030-drops-100996},
  doi = {10.4230/LIPIcs.ITCS.2019.6},
  annote = {Keywords: Sublinear-time algorithms, Subgraph counting, AGM bound}
}

**Published in:** OASIcs, Volume 61, 1st Symposium on Simplicity in Algorithms (SOSA 2018)

Given a non-negative real matrix A, the matrix scaling problem is to determine if it is possible to scale the rows and columns so that each row and each column sums to a specified target value for it.
The matrix scaling problem arises in many algorithmic applications, perhaps most notably as a preconditioning step in solving linear systems of equations. One of the most natural and by now classical approaches to matrix scaling is the Sinkhorn-Knopp algorithm (also known as the RAS method) where one alternately scales either all rows or all columns to meet the target values. In addition to being extremely simple and natural, another appeal of this procedure is that it easily lends itself to parallelization. A central question is to understand the rate of convergence of the Sinkhorn-Knopp algorithm.
Specifically, given a suitable error metric to measure deviations from target values, and an error bound ε, how quickly does the Sinkhorn-Knopp algorithm converge to an error below ε? While there are several non-trivial convergence results known about the Sinkhorn-Knopp algorithm, perhaps somewhat surprisingly, even for natural error metrics such as ℓ₁-error or ℓ₂-error, this is not entirely understood.
In this paper, we present an elementary convergence analysis for the Sinkhorn-Knopp algorithm that improves upon the previous best bound. In a nutshell, our approach is to show (i) a simple bound on the number of iterations needed so that the KL-divergence between the current row-sums and the target row-sums drops below a specified threshold δ, and (ii) then show that for a suitable choice of δ, whenever KL-divergence is below δ, then the ℓ₁-error or the ℓ₂-error is below ε. The well-known Pinsker's inequality immediately allows us to translate a bound on the KL-divergence to a bound on ℓ₁-error. To bound the ℓ₂-error in terms of the KL-divergence, we establish a new inequality, referred to as (KL vs ℓ₁/ℓ₂) inequality in the paper. This new inequality is a strengthening of Pinsker's inequality that we believe is of independent interest. Our analysis of ℓ₂-error significantly improves upon the best previous convergence bound for ℓ₂-error.
The idea of studying Sinkhorn-Knopp convergence via KL-divergence is not new and has indeed been previously explored. Our contribution is an elementary, self-contained presentation of this approach and an interesting new inequality that yields a significantly stronger convergence guarantee for the extensively studied ℓ₂-error.
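
The alternating row/column scaling described above can be sketched in a few lines. This is a minimal pure-Python version, assuming a strictly positive input matrix and targets whose row total equals their column total; the stopping rule (ℓ₁-error of the column sums measured right after a row step) and the parameter names are choices made for this illustration, not taken from the paper.

```python
def sinkhorn_knopp(A, row_targets, col_targets, eps=1e-9, max_iters=10_000):
    """Alternately rescale rows and columns of a positive matrix A.

    Returns (B, iterations). After a row step the row sums of B are exact,
    so the loop stops once the ell_1-error of the column sums is at most eps.
    """
    B = [[float(v) for v in row] for row in A]
    n_rows, n_cols = len(B), len(B[0])
    for it in range(1, max_iters + 1):
        # Row step: scale each row so that it sums to its target.
        for i in range(n_rows):
            s = sum(B[i])
            B[i] = [v * row_targets[i] / s for v in B[i]]
        col_sums = [sum(B[i][j] for i in range(n_rows)) for j in range(n_cols)]
        err = sum(abs(cs - ct) for cs, ct in zip(col_sums, col_targets))
        if err <= eps:
            return B, it
        # Column step: scale each column so that it sums to its target.
        for j in range(n_cols):
            factor = col_targets[j] / col_sums[j]
            for i in range(n_rows):
                B[i][j] *= factor
    return B, max_iters


# Hypothetical example: scale a 2 x 2 positive matrix to be doubly stochastic.
B, iterations = sinkhorn_knopp([[1, 2], [3, 4]], [1, 1], [1, 1])
```

Each row within the row step, and each column within the column step, is rescaled independently of the others, which is the source of the easy parallelization mentioned above.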

Deeparnab Chakrabarty and Sanjeev Khanna. Better and Simpler Error Analysis of the Sinkhorn-Knopp Algorithm for Matrix Scaling. In 1st Symposium on Simplicity in Algorithms (SOSA 2018). Open Access Series in Informatics (OASIcs), Volume 61, pp. 4:1-4:11, Schloss Dagstuhl - Leibniz-Zentrum für Informatik (2018)

@InProceedings{chakrabarty_et_al:OASIcs.SOSA.2018.4,
  author = {Chakrabarty, Deeparnab and Khanna, Sanjeev},
  title = {{Better and Simpler Error Analysis of the Sinkhorn-Knopp Algorithm for Matrix Scaling}},
  booktitle = {1st Symposium on Simplicity in Algorithms (SOSA 2018)},
  pages = {4:1--4:11},
  series = {Open Access Series in Informatics (OASIcs)},
  ISBN = {978-3-95977-064-4},
  ISSN = {2190-6807},
  year = {2018},
  volume = {61},
  editor = {Seidel, Raimund},
  publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address = {Dagstuhl, Germany},
  URL = {https://drops.dagstuhl.de/entities/document/10.4230/OASIcs.SOSA.2018.4},
  URN = {urn:nbn:de:0030-drops-83045},
  doi = {10.4230/OASIcs.SOSA.2018.4},
  annote = {Keywords: Matrix Scaling, Entropy Minimization, KL Divergence Inequalities}
}

**Published in:** LIPIcs, Volume 48, 19th International Conference on Database Theory (ICDT 2016)

Provisioning is a technique for avoiding repeated expensive computations in what-if analysis. Given a query, an analyst formulates k hypotheticals, each retaining some of the tuples of a database instance, possibly overlapping, and she wishes to answer the query under scenarios, where a scenario is defined by a subset of the hypotheticals that are "turned on". We say that a query admits compact provisioning if given any database instance and any k hypotheticals, one can create a poly-size (in k) sketch that can then be used to answer the query under any of the 2^k possible scenarios without accessing the original instance.
In this paper, we focus on provisioning complex queries that combine relational algebra (the logical component), grouping, and statistics/analytics (the numerical component). We first show that queries that compute quantiles or linear regression (as well as simpler queries that compute count and sum/average of positive values) can be compactly provisioned to provide (multiplicative) approximate answers to an arbitrary precision. In contrast, exact provisioning for each of these statistics requires the sketch size to be exponential in k. We then establish that for any complex query whose logical component is a positive relational algebra query, as long as the numerical component can be compactly provisioned, the complex query itself can be compactly provisioned. On the other hand, introducing negation or recursion in the logical component again requires the sketch size to be exponential in k. While our positive results use algorithms that do not access the original instance after a scenario is known, we prove our lower bounds even for the case when, knowing the scenario, limited access to the instance is allowed.

Sepehr Assadi, Sanjeev Khanna, Yang Li, and Val Tannen. Algorithms for Provisioning Queries and Analytics. In 19th International Conference on Database Theory (ICDT 2016). Leibniz International Proceedings in Informatics (LIPIcs), Volume 48, pp. 18:1-18:18, Schloss Dagstuhl - Leibniz-Zentrum für Informatik (2016)

@InProceedings{assadi_et_al:LIPIcs.ICDT.2016.18,
  author = {Assadi, Sepehr and Khanna, Sanjeev and Li, Yang and Tannen, Val},
  title = {{Algorithms for Provisioning Queries and Analytics}},
  booktitle = {19th International Conference on Database Theory (ICDT 2016)},
  pages = {18:1--18:18},
  series = {Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN = {978-3-95977-002-6},
  ISSN = {1868-8969},
  year = {2016},
  volume = {48},
  editor = {Martens, Wim and Zeume, Thomas},
  publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address = {Dagstuhl, Germany},
  URL = {https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2016.18},
  URN = {urn:nbn:de:0030-drops-57877},
  doi = {10.4230/LIPIcs.ICDT.2016.18},
  annote = {Keywords: What-if Analysis, Provisioning, Data Compression, Approximate Query Answering}
}

**Published in:** LIPIcs, Volume 45, 35th IARCS Annual Conference on Foundations of Software Technology and Theoretical Computer Science (FSTTCS 2015)

In this paper, we introduce a new model for sublinear algorithms called dynamic sketching. In this model, the underlying data is partitioned into a large static part and a small dynamic part and the goal is to compute a summary of the static part (i.e, a sketch) such that given any update for the dynamic part, one can combine it with the sketch to compute a given function. We say that a sketch is compact if its size is bounded by a polynomial function of the length of the dynamic data, (essentially) independent of the size of the static part.
A graph optimization problem P in this model is defined as follows. The input is a graph G(V,E) and a set T \subseteq V of k terminals; the edges between the terminals are the dynamic part and the other edges in G are the static part. The goal is to summarize the graph G into a compact sketch (of size poly(k)) such that given any set Q of edges between the terminals, one can answer the problem P for the graph obtained by inserting all edges in Q to G, using only the sketch.
We study the fundamental problem of computing a maximum matching and prove tight bounds on the sketch size. In particular, we show that there exists a (compact) dynamic sketch of size O(k^2) for the matching problem and any such sketch has to be of size \Omega(k^2). Our sketch for matchings can be further used to derive compact dynamic sketches for other fundamental graph problems involving cuts and connectivities. Interestingly, our sketch for matchings can also be used to give an elementary construction of a cut-preserving vertex sparsifier with space O(kC^2) for k-terminal graphs, which matches the best known upper bound; here C is the total capacity of the edges incident on the terminals. Additionally, we give an improved lower bound (in terms of C) of \Omega(C/\log C) on the size of cut-preserving vertex sparsifiers, and establish that progress on dynamic sketching of the s-t max-flow problem (either upper bound or lower bound) immediately leads to better bounds for the size of cut-preserving vertex sparsifiers.

Sepehr Assadi, Sanjeev Khanna, Yang Li, and Val Tannen. Dynamic Sketching for Graph Optimization Problems with Applications to Cut-Preserving Sketches. In 35th IARCS Annual Conference on Foundations of Software Technology and Theoretical Computer Science (FSTTCS 2015). Leibniz International Proceedings in Informatics (LIPIcs), Volume 45, pp. 52-68, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2015)


@InProceedings{assadi_et_al:LIPIcs.FSTTCS.2015.52,
  author = {Assadi, Sepehr and Khanna, Sanjeev and Li, Yang and Tannen, Val},
  title = {{Dynamic Sketching for Graph Optimization Problems with Applications to Cut-Preserving Sketches}},
  booktitle = {35th IARCS Annual Conference on Foundations of Software Technology and Theoretical Computer Science (FSTTCS 2015)},
  pages = {52--68},
  series = {Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN = {978-3-939897-97-2},
  ISSN = {1868-8969},
  year = {2015},
  volume = {45},
  editor = {Harsha, Prahladh and Ramalingam, G.},
  publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address = {Dagstuhl, Germany},
  URL = {https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.FSTTCS.2015.52},
  URN = {urn:nbn:de:0030-drops-56361},
  doi = {10.4230/LIPIcs.FSTTCS.2015.52},
  annote = {Keywords: Small-space Algorithms, Maximum Matchings, Vertex Sparsifiers}
}

Document

**Published in:** LIPIcs, Volume 2, IARCS Annual Conference on Foundations of Software Technology and Theoretical Computer Science (2008)

We study the problem of space-efficient polynomial-time algorithms for \emph{directed st-connectivity} (STCON). Given a directed graph $G$, and a pair of vertices $s, t$, the STCON problem is to decide if there exists a path from $s$ to $t$ in $G$. For general graphs, the best polynomial-time algorithm for STCON uses space that is only slightly sublinear. However, for special classes of directed graphs, polynomial-time poly-logarithmic-space algorithms are known for STCON. In this paper, we continue this thread of research and study a class of graphs called \emph{unique-path graphs with respect to source $s$}, where there is at most one simple path from $s$ to any vertex in the graph. For these graphs, we give a polynomial-time algorithm that uses $\tilde O(n^{\varepsilon})$ space for any constant $\varepsilon \in (0,1]$. We also give a polynomial-time, $\tilde O(n^\varepsilon)$-space algorithm to \emph{recognize} unique-path graphs. Unique-path graphs are related to configuration graphs of unambiguous log-space computations, but they can have some directed cycles. Our results may be viewed along the continuum of sublinear-space polynomial-time algorithms for STCON in different classes of directed graphs, from slightly sublinear-space algorithms for general graphs to $O(\log n)$ space algorithms for trees.
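
To make the two definitions in this abstract concrete, here is a minimal Python sketch of STCON and of the unique-path property. It is only a linear-space baseline for illustration and is not the space-efficient algorithm of the paper. The function names `stcon` and `is_unique_path_graph` and the adjacency-dictionary representation are assumptions introduced for this example.

```python
from collections import deque


def stcon(graph, s, t):
    """Decide whether t is reachable from s in a directed graph.

    `graph` maps each vertex to a list of its out-neighbours.
    Plain BFS: polynomial time, but linear space (not sublinear).
    """
    seen = {s}
    queue = deque([s])
    while queue:
        u = queue.popleft()
        if u == t:
            return True
        for v in graph.get(u, ()):
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return False


def is_unique_path_graph(graph, s):
    """Check that there is at most one simple path from s to every vertex.

    Each DFS call at vertex u corresponds to one distinct simple path
    from s to u, so reaching any vertex a second time means two simple
    paths exist. The search stops at the first such vertex.
    """
    arrivals = {}

    def dfs(u, on_path):
        arrivals[u] = arrivals.get(u, 0) + 1
        if arrivals[u] > 1:
            return False
        for v in graph.get(u, ()):
            if v in on_path:
                continue  # following this edge would not give a simple path
            on_path.add(v)
            ok = dfs(v, on_path)
            on_path.discard(v)
            if not ok:
                return False
        return True

    return dfs(s, {s})


# A unique-path graph that still contains a directed cycle (a -> b -> a).
cyclic = {"s": ["a"], "a": ["b"], "b": ["a", "t"], "t": []}
# A diamond: two distinct simple paths from s to t.
diamond = {"s": ["a", "b"], "a": ["t"], "b": ["t"], "t": []}

print(stcon(cyclic, "s", "t"), stcon(cyclic, "t", "s"))
print(is_unique_path_graph(cyclic, "s"), is_unique_path_graph(diamond, "s"))
```

The `cyclic` example reflects the remark that unique-path graphs may contain directed cycles: the edge from `b` back to `a` does not create a second simple path from `s`. The recursive DFS is meant for small inputs only.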

Sampath Kannan, Sanjeev Khanna, and Sudeepa Roy. STCON in Directed Unique-Path Graphs. In IARCS Annual Conference on Foundations of Software Technology and Theoretical Computer Science. Leibniz International Proceedings in Informatics (LIPIcs), Volume 2, pp. 256-267, Schloss Dagstuhl - Leibniz-Zentrum für Informatik (2008)


@InProceedings{kannan_et_al:LIPIcs.FSTTCS.2008.1758,
  author = {Kannan, Sampath and Khanna, Sanjeev and Roy, Sudeepa},
  title = {{STCON in Directed Unique-Path Graphs}},
  booktitle = {IARCS Annual Conference on Foundations of Software Technology and Theoretical Computer Science},
  pages = {256--267},
  series = {Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN = {978-3-939897-08-8},
  ISSN = {1868-8969},
  year = {2008},
  volume = {2},
  editor = {Hariharan, Ramesh and Mukund, Madhavan and Vinay, V},
  publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address = {Dagstuhl, Germany},
  URL = {https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.FSTTCS.2008.1758},
  URN = {urn:nbn:de:0030-drops-17589},
  doi = {10.4230/LIPIcs.FSTTCS.2008.1758},
  annote = {Keywords: Algorithm, complexity, st-connectivity}
}
