11 Search Results for "Saha, Barna"


Document
Invited Talk
Role of Structured Matrices in Fine-Grained Algorithm Design (Invited Talk)

Authors: Barna Saha

Published in: LIPIcs, Volume 322, 35th International Symposium on Algorithms and Computation (ISAAC 2024)


Abstract
Fine-grained complexity attempts to precisely determine the time complexity of a problem and has emerged as a guide for algorithm design in recent times. Some of the central problems in fine-grained complexity deal with the computation of distances: for example, computing all-pairs shortest paths in a weighted graph, computing the edit distance between two sequences or two trees, and computing the distance of a sequence from a context-free language. Many of these problems reduce to the computation of matrix products over various algebraic structures, predominantly over the (min,+) semiring. Obtaining a truly subcubic algorithm for the (min,+) product is one of the outstanding open questions in computer science. Interestingly, many of the aforementioned distance computation problems have additional structural properties. Specifically, when we perturb the inputs slightly, we do not expect a huge change in the output. This simple yet powerful observation has led to better algorithms for many problems whose running times we were able to improve after several decades, including Language Edit Distance, RNA Folding, and Dyck Edit Distance. Indeed, this structure in the problem leads to matrices that have the Lipschitz property, and we gave the first truly subcubic time algorithm for computing the (min,+) product over such Lipschitz matrices. Follow-up work by several researchers obtained improved bounds for monotone matrices, and for (min,+) convolution under similar structures, leading to improved bounds for a series of optimization problems. These results yield not only faster algorithms for exact computation but also better approximation algorithms. In particular, we show how fast (min,+) product computation over monotone matrices can lead to better additive approximation algorithms for computing all-pairs shortest paths on unweighted undirected graphs, improving on bounds that stood for twenty-four years.
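
Since the talk revolves around the (min,+) matrix product, a minimal reference implementation may help fix notation. The sketch below is the naive cubic algorithm in Python (the function name and toy input are assumptions of this sketch), not the subcubic algorithm for Lipschitz or monotone matrices discussed in the talk.

# Naive (min,+) product: C[i][j] = min_k (A[i][k] + B[k][j]).
# This is the cubic baseline; the talk concerns beating it when A and B are structured.

def min_plus_product(A, B):
    n, m, p = len(A), len(B), len(B[0])
    C = [[float("inf")] * p for _ in range(n)]
    for i in range(n):
        for k in range(m):
            a = A[i][k]
            for j in range(p):
                if a + B[k][j] < C[i][j]:
                    C[i][j] = a + B[k][j]
    return C

# Example: distance products compose shortest paths.
A = [[0, 3], [2, 0]]
print(min_plus_product(A, A))   # two-step shortest-path distances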

Cite as

Barna Saha. Role of Structured Matrices in Fine-Grained Algorithm Design (Invited Talk). In 35th International Symposium on Algorithms and Computation (ISAAC 2024). Leibniz International Proceedings in Informatics (LIPIcs), Volume 322, p. 3:1, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024)


BibTeX

@InProceedings{saha:LIPIcs.ISAAC.2024.3,
  author =	{Saha, Barna},
  title =	{{Role of Structured Matrices in Fine-Grained Algorithm Design}},
  booktitle =	{35th International Symposium on Algorithms and Computation (ISAAC 2024)},
  pages =	{3:1--3:1},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-354-6},
  ISSN =	{1868-8969},
  year =	{2024},
  volume =	{322},
  editor =	{Mestre, Juli\'{a}n and Wirth, Anthony},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ISAAC.2024.3},
  URN =		{urn:nbn:de:0030-drops-221303},
  doi =		{10.4230/LIPIcs.ISAAC.2024.3},
  annote =	{Keywords: Fine-Grained Complexity, Fast Algorithms}
}
Document
On the Complexity of Algorithms with Predictions for Dynamic Graph Problems

Authors: Monika Henzinger, Barna Saha, Martin P. Seybold, and Christopher Ye

Published in: LIPIcs, Volume 287, 15th Innovations in Theoretical Computer Science Conference (ITCS 2024)


Abstract
Algorithms with predictions is a new research direction that leverages machine-learned predictions for algorithm design. So far, a plethora of recent works have incorporated predictions to improve on worst-case bounds for online problems. In this paper, we initiate the study of the complexity of dynamic data structures with predictions, including dynamic graph algorithms. Unlike online algorithms, the goal in dynamic data structures is to maintain the solution efficiently with every update. We investigate three natural models of prediction: (1) δ-accurate predictions, where each predicted request matches the true request with probability δ, (2) list-accurate predictions, where a true request comes from a list of possible requests, and (3) bounded delay predictions, where the true requests are a permutation of the predicted requests. We give general reductions among the prediction models, showing that bounded delay is the strongest prediction model, followed by list-accurate and δ-accurate. Further, we identify two broad problem classes based on lower bounds due to the Online Matrix Vector (OMv) conjecture. Specifically, we show that locally correctable dynamic problems have strong conditional lower bounds for list-accurate predictions that are equivalent to the non-prediction setting, unless list-accurate predictions are perfect. Moreover, we show that locally reducible dynamic problems have time complexity that degrades gracefully with the quality of bounded delay predictions. We categorize problems with known OMv lower bounds accordingly and give several upper bounds in the delay model showing that our lower bounds are almost tight. We note that concurrent work by v.d.Brand et al. [SODA '24] and Liu and Srinivas [arXiv:2307.08890] independently studies dynamic graph algorithms with predictions, but their work is mostly focused on showing upper bounds.
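
The three prediction models can be illustrated with small generators that produce a predicted request sequence from a true one. Everything below (the request universe, list size, and helper names) is an assumption of this sketch, intended only to make the definitions concrete; it is not part of the paper's constructions.

import random

def delta_accurate(true_requests, universe, delta):
    # Each predicted request equals the true request with probability delta,
    # and is otherwise an arbitrary element of the universe.
    return [r if random.random() < delta else random.choice(universe)
            for r in true_requests]

def list_accurate(true_requests, universe, list_size):
    # Each prediction is a short list guaranteed to contain the true request.
    preds = []
    for r in true_requests:
        others = random.sample([u for u in universe if u != r], list_size - 1)
        preds.append([r] + others)
    return preds

def bounded_delay(true_requests):
    # The true requests are a permutation of the predicted requests
    # (here the prediction is simply a shuffled copy of the truth).
    pred = list(true_requests)
    random.shuffle(pred)
    return pred

universe = list(range(10))
true = [3, 1, 4, 1, 5]
print(delta_accurate(true, universe, delta=0.8))
print(list_accurate(true, universe, list_size=3))
print(bounded_delay(true))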

Cite as

Monika Henzinger, Barna Saha, Martin P. Seybold, and Christopher Ye. On the Complexity of Algorithms with Predictions for Dynamic Graph Problems. In 15th Innovations in Theoretical Computer Science Conference (ITCS 2024). Leibniz International Proceedings in Informatics (LIPIcs), Volume 287, pp. 62:1-62:25, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024)


BibTeX

@InProceedings{henzinger_et_al:LIPIcs.ITCS.2024.62,
  author =	{Henzinger, Monika and Saha, Barna and Seybold, Martin P. and Ye, Christopher},
  title =	{{On the Complexity of Algorithms with Predictions for Dynamic Graph Problems}},
  booktitle =	{15th Innovations in Theoretical Computer Science Conference (ITCS 2024)},
  pages =	{62:1--62:25},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-309-6},
  ISSN =	{1868-8969},
  year =	{2024},
  volume =	{287},
  editor =	{Guruswami, Venkatesan},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ITCS.2024.62},
  URN =		{urn:nbn:de:0030-drops-195907},
  doi =		{10.4230/LIPIcs.ITCS.2024.62},
  annote =	{Keywords: Dynamic Graph Algorithms, Algorithms with Predictions}
}
Document
An Algorithmic Bridge Between Hamming and Levenshtein Distances

Authors: Elazar Goldenberg, Tomasz Kociumaka, Robert Krauthgamer, and Barna Saha

Published in: LIPIcs, Volume 251, 14th Innovations in Theoretical Computer Science Conference (ITCS 2023)


Abstract
The edit distance between strings classically assigns unit cost to every character insertion, deletion, and substitution, whereas the Hamming distance only allows substitutions. In many real-life scenarios, insertions and deletions (abbreviated indels) appear frequently but significantly less so than substitutions. To model this, we consider substitutions being cheaper than indels, with cost 1/a for a parameter a ≥ 1. This basic variant, denoted ED_a, bridges classical edit distance (a = 1) with Hamming distance (a → ∞), leading to interesting algorithmic challenges: Does the time complexity of computing ED_a interpolate between that of Hamming distance (linear time) and edit distance (quadratic time)? What about approximating ED_a? We first present a simple deterministic exact algorithm for ED_a and further prove that it is near-optimal assuming the Orthogonal Vectors Conjecture. Our main result is a randomized algorithm computing a (1+ε)-approximation of ED_a(X,Y), given strings X,Y of total length n and a bound k ≥ ED_a(X,Y). For simplicity, let us focus on k ≥ 1 and a constant ε > 0; then, our algorithm takes Õ(n/a + ak³) time. Unless a = Õ(1), in which case ED_a resembles the standard edit distance, and for the most interesting regime of small enough k, this running time is sublinear in n. We also consider a very natural version that asks to find a (k_I, k_S)-alignment, i.e., an alignment with at most k_I indels and k_S substitutions. In this setting, we give an exact algorithm and, more importantly, an Õ((nk_I)/k_S + k_S k_I³)-time (1,1+ε)-bicriteria approximation algorithm. The latter solution is based on the techniques we develop for ED_a for a = Θ(k_S/k_I), and its running time is again sublinear in n whenever k_I ≪ k_S and the overall distance is small enough. These bounds are in stark contrast to unit-cost edit distance, where state-of-the-art algorithms are far from achieving (1+ε)-approximation in sublinear time, even for a favorable choice of k.
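
A textbook O(n²) dynamic program for ED_a as defined above, with insertions and deletions costing 1 and substitutions costing 1/a for a ≥ 1, is sketched below. It is only the exact quadratic baseline (names are assumptions of this sketch), not the paper's near-optimal exact algorithm or its sublinear-time approximation.

def ed_a(x: str, y: str, a: float) -> float:
    # Weighted edit distance: indels cost 1, substitutions cost 1/a.
    sub = 1.0 / a
    n, m = len(x), len(y)
    prev = [float(j) for j in range(m + 1)]            # row for the empty prefix of x
    for i in range(1, n + 1):
        cur = [float(i)] + [0.0] * m
        for j in range(1, m + 1):
            cur[j] = min(prev[j] + 1,                  # delete x[i-1]
                         cur[j - 1] + 1,               # insert y[j-1]
                         prev[j - 1] + (0.0 if x[i - 1] == y[j - 1] else sub))
        prev = cur
    return prev[m]

print(ed_a("abcd", "abxd", a=4))   # one substitution: 0.25
print(ed_a("abcd", "abd", a=4))    # one deletion: 1.0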

Cite as

Elazar Goldenberg, Tomasz Kociumaka, Robert Krauthgamer, and Barna Saha. An Algorithmic Bridge Between Hamming and Levenshtein Distances. In 14th Innovations in Theoretical Computer Science Conference (ITCS 2023). Leibniz International Proceedings in Informatics (LIPIcs), Volume 251, pp. 58:1-58:23, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2023)


BibTeX

@InProceedings{goldenberg_et_al:LIPIcs.ITCS.2023.58,
  author =	{Goldenberg, Elazar and Kociumaka, Tomasz and Krauthgamer, Robert and Saha, Barna},
  title =	{{An Algorithmic Bridge Between Hamming and Levenshtein Distances}},
  booktitle =	{14th Innovations in Theoretical Computer Science Conference (ITCS 2023)},
  pages =	{58:1--58:23},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-263-1},
  ISSN =	{1868-8969},
  year =	{2023},
  volume =	{251},
  editor =	{Tauman Kalai, Yael},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ITCS.2023.58},
  URN =		{urn:nbn:de:0030-drops-175615},
  doi =		{10.4230/LIPIcs.ITCS.2023.58},
  annote =	{Keywords: edit distance, Hamming distance, Longest Common Extension queries}
}
Document
APPROX
Approximating LCS and Alignment Distance over Multiple Sequences

Authors: Debarati Das and Barna Saha

Published in: LIPIcs, Volume 245, Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2022)


Abstract
We study the problem of aligning multiple sequences with the goal of finding an alignment that either maximizes the number of aligned symbols (the longest common subsequence (LCS) problem) or minimizes the number of unaligned symbols (the alignment distance, a.k.a. the complement of LCS). Multiple sequence alignment is a well-studied problem in bioinformatics and is used routinely to identify regions of similarity among DNA, RNA, or protein sequences to detect functional, structural, or evolutionary relationships among them. It is known that exact computation of the LCS or alignment distance of m sequences, each of length n, requires Θ(n^m) time unless the Strong Exponential Time Hypothesis is false. However, unlike the case of two strings, fast algorithms to approximate the LCS and alignment distance of multiple sequences are lacking in the literature. A major challenge in this area is to break the triangle inequality barrier. Specifically, by splitting the m sequences into two (roughly) equal-sized groups, computing the alignment distance in each group, and finally combining them using the triangle inequality, it is possible to achieve a 2-approximation in Õ_m(n^⌈m/2⌉) time. But an approximation factor below 2, which would require breaking the triangle inequality barrier, is not known in O(n^{α m}) time for any α < 1. We make significant progress in this direction. First, we consider a semi-random model in which we show that if just one out of the m sequences is (p,B)-pseudorandom, then we can get a below-two approximation in Õ_m(nB^{m-1}+n^{⌊m/2⌋+3}) time. Such semi-random models are very well studied in the two-string setting; however, directly extending those works requires all but one of the sequences to be pseudorandom and would only give an O(1/p) approximation. We overcome these obstacles with significant new ideas. Specifically, an ingredient of this proof is a new algorithm that achieves a below-2 approximation in Õ_m(n^{⌊m/2⌋+2}) time when the alignment distance is large. This could be of independent interest. Next, for the LCS of m sequences, each of length n, we show that if the optimum LCS is λn for some λ ∈ [0,1], then in Õ_m(n^{⌊m/2⌋+1}) time we can return a common subsequence of length at least λ²n/(2+ε) for any arbitrary constant ε > 0. In contrast, for two strings, the best known subquadratic algorithm may return a common subsequence of length Θ(λ⁴ n).
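
To make the Θ(n^m) barrier concrete, the sketch below computes the LCS of m sequences exactly by memoized dynamic programming over the m-dimensional index space. It is a brute-force reference point for small inputs (names and the toy example are assumptions of this sketch), not the approximation algorithms developed in the paper.

from functools import lru_cache

def multi_lcs(seqs):
    # Exact LCS of m sequences: Theta(n^m) states, illustrating the barrier.
    m = len(seqs)

    @lru_cache(maxsize=None)
    def lcs(idx):
        if any(i == 0 for i in idx):
            return 0
        chars = {seqs[t][idx[t] - 1] for t in range(m)}
        best = 0
        if len(chars) == 1:            # all m current characters match: align them
            best = 1 + lcs(tuple(i - 1 for i in idx))
        for t in range(m):             # otherwise, drop one character from some sequence
            best = max(best, lcs(tuple(i - (1 if s == t else 0)
                                       for s, i in enumerate(idx))))
        return best

    return lcs(tuple(len(s) for s in seqs))

print(multi_lcs(["ABCBDAB", "BDCABA", "BADACB"]))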

Cite as

Debarati Das and Barna Saha. Approximating LCS and Alignment Distance over Multiple Sequences. In Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2022). Leibniz International Proceedings in Informatics (LIPIcs), Volume 245, pp. 54:1-54:21, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2022)


BibTeX

@InProceedings{das_et_al:LIPIcs.APPROX/RANDOM.2022.54,
  author =	{Das, Debarati and Saha, Barna},
  title =	{{Approximating LCS and Alignment Distance over Multiple Sequences}},
  booktitle =	{Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2022)},
  pages =	{54:1--54:21},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-249-5},
  ISSN =	{1868-8969},
  year =	{2022},
  volume =	{245},
  editor =	{Chakrabarti, Amit and Swamy, Chaitanya},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.APPROX/RANDOM.2022.54},
  URN =		{urn:nbn:de:0030-drops-171762},
  doi =		{10.4230/LIPIcs.APPROX/RANDOM.2022.54},
  annote =	{Keywords: String Algorithms, Approximation Algorithms}
}
Document
Track A: Algorithms, Complexity and Games
Improved Approximation Algorithms for Dyck Edit Distance and RNA Folding

Authors: Debarati Das, Tomasz Kociumaka, and Barna Saha

Published in: LIPIcs, Volume 229, 49th International Colloquium on Automata, Languages, and Programming (ICALP 2022)


Abstract
The Dyck language, which consists of well-balanced sequences of parentheses, is one of the most fundamental context-free languages. The Dyck edit distance quantifies the number of edits (character insertions, deletions, and substitutions) required to make a given length-n parenthesis sequence well-balanced. RNA Folding involves a similar problem, where a closing parenthesis can match an opening parenthesis of the same type irrespective of their ordering. For example, in RNA Folding, both () and )( are valid matches, whereas the Dyck language only allows () as a match. Both of these problems have been studied extensively in the literature. Using fast matrix multiplication, it is possible to compute their exact solutions in time O(n^2.687) (Chi, Duan, Xie, Zhang, STOC'22), and a (1+ε)-multiplicative approximation is known with a running time of Ω(n^2.372). The impracticality of fast matrix multiplication often makes combinatorial algorithms much more desirable. Unfortunately, it is known that the problems of (exactly) computing the Dyck edit distance and the folding distance are at least as hard as Boolean matrix multiplication. Thereby, they are unlikely to admit truly subcubic-time combinatorial algorithms. In terms of fast approximation algorithms that are combinatorial in nature, the state of the art for Dyck edit distance is an O(log n)-factor approximation algorithm that runs in near-linear time (Saha, FOCS'14), whereas for RNA Folding only an ε n-additive approximation in Õ(n²/ε) time (Saha, FOCS'17) is known. In this paper, we make substantial improvements to the state of the art for Dyck edit distance (with any number of parenthesis types). We design a constant-factor approximation algorithm that runs in Õ(n^1.971) time (the first constant-factor approximation in subquadratic time). Moreover, we develop a (1+ε)-factor approximation algorithm running in Õ(n²/ε) time, which improves upon the earlier additive approximation. Finally, we design a (3+ε)-approximation that takes Õ(nd/ε) time, where d ≥ 1 is an upper bound on the sought distance. As for RNA folding, for any s ≥ 1, we design a factor-s approximation algorithm that runs in O(n+(n/s)³) time. To the best of our knowledge, this is the first nontrivial approximation algorithm for RNA Folding that can go below the n² barrier. All our algorithms are combinatorial in nature.
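
For reference, the sketch below is a standard cubic interval DP for Dyck edit distance over a bracket alphabet; the alphabet, helper names, and presentation of the recurrence are assumptions of this sketch. The paper's contribution is to approximate this quantity in subquadratic time, which the sketch does not attempt.

PAIRS = {"(": ")", "[": "]", "{": "}"}     # assumed bracket alphabet
OPEN, CLOSE = set(PAIRS), set(PAIRS.values())

def pair_cost(a, b):
    # Edits needed to turn characters a, b into a matching open/close pair.
    if a in OPEN and b in CLOSE:
        return 0 if PAIRS[a] == b else 1   # substitute one bracket type
    if a in CLOSE and b in OPEN:
        return 2                            # both characters must change
    return 1                                # exactly one of them must change

def dyck_edit_distance(s: str) -> int:
    from functools import lru_cache

    @lru_cache(maxsize=None)
    def d(i, j):                            # distance for the substring s[i..j]
        if i > j:
            return 0
        if i == j:
            return 1                        # lone bracket: delete it (or insert a partner)
        best = pair_cost(s[i], s[j]) + d(i + 1, j - 1)
        for k in range(i, j):               # or split into two balanced pieces
            best = min(best, d(i, k) + d(k + 1, j))
        return best

    return d(0, len(s) - 1)

print(dyck_edit_distance("()[]"))   # 0
print(dyck_edit_distance("([)]"))   # 2
print(dyck_edit_distance(")("))     # 2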

Cite as

Debarati Das, Tomasz Kociumaka, and Barna Saha. Improved Approximation Algorithms for Dyck Edit Distance and RNA Folding. In 49th International Colloquium on Automata, Languages, and Programming (ICALP 2022). Leibniz International Proceedings in Informatics (LIPIcs), Volume 229, pp. 49:1-49:20, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2022)


BibTeX

@InProceedings{das_et_al:LIPIcs.ICALP.2022.49,
  author =	{Das, Debarati and Kociumaka, Tomasz and Saha, Barna},
  title =	{{Improved Approximation Algorithms for Dyck Edit Distance and RNA Folding}},
  booktitle =	{49th International Colloquium on Automata, Languages, and Programming (ICALP 2022)},
  pages =	{49:1--49:20},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-235-8},
  ISSN =	{1868-8969},
  year =	{2022},
  volume =	{229},
  editor =	{Boja\'{n}czyk, Miko{\l}aj and Merelli, Emanuela and Woodruff, David P.},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICALP.2022.49},
  URN =		{urn:nbn:de:0030-drops-163902},
  doi =		{10.4230/LIPIcs.ICALP.2022.49},
  annote =	{Keywords: Dyck Edit Distance, RNA Folding, String Algorithms}
}
Document
Invited Talk
Sublinear Algorithms for Edit Distance (Invited Talk)

Authors: Barna Saha

Published in: LIPIcs, Volume 202, 46th International Symposium on Mathematical Foundations of Computer Science (MFCS 2021)


Abstract
The edit distance is a way of quantifying how similar two strings are to one another by counting the minimum number of character insertions, deletions, and substitutions required to transform one string into the other. A simple dynamic program computes the edit distance between two strings of length n in O(n²) time, and a more sophisticated algorithm runs in time O(n+t²), where t is the distance (Landau, Myers and Schmidt, SICOMP 1998). In pursuit of faster running times, the last couple of decades have seen a flurry of research on approximating edit distance, including a polylogarithmic approximation in near-linear time (Andoni, Krauthgamer and Onak, FOCS 2010) and a constant-factor approximation in subquadratic time (Chakrabarty, Das, Goldenberg, Koucký and Saks, FOCS 2018). In this talk, we will discuss recent progress that goes beyond linear time and studies sublinear-time algorithms for edit distance. We will also discuss the role preprocessing might play in designing fast algorithms. This is joint work with Elazar Goldenberg, Tomasz Kociumaka, Robert Krauthgamer, and Aviad Rubinstein.
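
As a companion to the O(n²) DP and the O(n+t²) bound quoted above, here is a simple band-limited DP that computes the edit distance exactly whenever it is at most a threshold t, in O(nt) time. It is a simpler relative of, not the same as, the Landau-Myers-Schmidt algorithm; names and the threshold interface are assumptions of this sketch.

def edit_distance_at_most(x: str, y: str, t: int):
    # Returns the exact edit distance if it is <= t, otherwise None.
    n, m = len(x), len(y)
    if abs(n - m) > t:
        return None                     # the distance certainly exceeds t
    INF = t + 1
    prev = {j: j for j in range(0, min(m, t) + 1)}
    for i in range(1, n + 1):
        cur = {}
        lo, hi = max(0, i - t), min(m, i + t)
        for j in range(lo, hi + 1):
            best = prev.get(j, INF) + 1                      # deletion
            if j > 0:
                best = min(best, cur.get(j - 1, INF) + 1)    # insertion
                best = min(best, prev.get(j - 1, INF)
                           + (0 if x[i - 1] == y[j - 1] else 1))  # (mis)match
            cur[j] = best
        prev = cur
    d = prev.get(m, INF)
    return d if d <= t else None

print(edit_distance_at_most("kitten", "sitting", 3))   # 3
print(edit_distance_at_most("kitten", "sitting", 2))   # None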

Cite as

Barna Saha. Sublinear Algorithms for Edit Distance (Invited Talk). In 46th International Symposium on Mathematical Foundations of Computer Science (MFCS 2021). Leibniz International Proceedings in Informatics (LIPIcs), Volume 202, p. 5:1, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2021)


BibTeX

@InProceedings{saha:LIPIcs.MFCS.2021.5,
  author =	{Saha, Barna},
  title =	{{Sublinear Algorithms for Edit Distance}},
  booktitle =	{46th International Symposium on Mathematical Foundations of Computer Science (MFCS 2021)},
  pages =	{5:1--5:1},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-201-3},
  ISSN =	{1868-8969},
  year =	{2021},
  volume =	{202},
  editor =	{Bonchi, Filippo and Puglisi, Simon J.},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.MFCS.2021.5},
  URN =		{urn:nbn:de:0030-drops-144452},
  doi =		{10.4230/LIPIcs.MFCS.2021.5},
  annote =	{Keywords: Edit distance, sublinear algorithms, string processing}
}
Document
RANDOM
Connectivity of Random Annulus Graphs and the Geometric Block Model

Authors: Sainyam Galhotra, Arya Mazumdar, Soumyabrata Pal, and Barna Saha

Published in: LIPIcs, Volume 145, Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2019)


Abstract
The random geometric graph (Gilbert, 1961) is a basic model of random graphs for spatial networks, proposed shortly after the introduction of the Erdős-Rényi random graph. The geometric block model (GBM) is a probabilistic model for community detection defined over random geometric graphs (RGGs), similar in spirit to the popular stochastic block model, which is defined over Erdős-Rényi random graphs. The GBM naturally inherits many desirable properties of RGGs, such as transitivity ("friends having common friends"), and has been shown to model many real-world networks better than the stochastic block model. Analyzing the properties of a GBM requires new tools and perspectives to handle correlation in edge formation. In this paper, we study the necessary and sufficient conditions for community recovery over the GBM in the connectivity regime. We provide efficient algorithms that recover the communities exactly with high probability and match the lower bound within a small constant factor. This requires us to prove new connectivity results for vertex-random graphs, or random annulus graphs, which are natural generalizations of random geometric graphs. A vertex-random graph is a model of random graphs where the randomness lies in the vertices, as opposed to an Erdős-Rényi random graph, where the randomness lies in the edges. A vertex-random graph G(n, [r_1, r_2]), 0 ≤ r_1 < r_2 ≤ 1, with n vertices is defined by assigning a real number in [0,1] randomly and uniformly to each vertex and adding an edge between two vertices if the "distance" between the corresponding two random numbers is between r_1 and r_2. For the special case of r_1 = 0, this corresponds to a random geometric graph in one dimension. We can extend this model naturally to higher dimensions; these higher-dimensional counterparts are referred to as random annulus graphs. Random annulus graphs appear naturally whenever the well-known Goldilocks principle ("not too close, not too far") holds in a network. In this paper, we study the connectivity properties of such graphs, providing both necessary and sufficient conditions. We show a surprising long-edge phenomenon for vertex-random graphs: the minimum gap between r_1 and r_2 required for connectivity is significantly smaller when r_1 > 0 than when r_1 = 0 (RGG). We then extend the connectivity results to higher dimensions. These results play a crucial role in analyzing the GBM.
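
The vertex-random graph definition above is easy to simulate. The sketch below samples a one-dimensional G(n, [r_1, r_2]) and checks connectivity by BFS; treating the "distance" on [0,1] as circular (wrap-around) is an assumption of this sketch, and the code is a simulation aid, not part of the paper's proofs.

import random
from collections import deque

def sample_vertex_random_graph(n, r1, r2, seed=None):
    rng = random.Random(seed)
    x = [rng.random() for _ in range(n)]           # one random label per vertex

    def dist(a, b):                                 # circular distance on [0, 1]
        d = abs(a - b)
        return min(d, 1 - d)

    adj = [[] for _ in range(n)]
    for u in range(n):
        for v in range(u + 1, n):
            if r1 <= dist(x[u], x[v]) <= r2:
                adj[u].append(v)
                adj[v].append(u)
    return adj

def is_connected(adj):
    n = len(adj)
    seen, queue = {0}, deque([0])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return len(seen) == n

adj = sample_vertex_random_graph(n=500, r1=0.02, r2=0.05, seed=1)
print(is_connected(adj))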

Cite as

Sainyam Galhotra, Arya Mazumdar, Soumyabrata Pal, and Barna Saha. Connectivity of Random Annulus Graphs and the Geometric Block Model. In Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2019). Leibniz International Proceedings in Informatics (LIPIcs), Volume 145, pp. 53:1-53:23, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2019)


BibTeX

@InProceedings{galhotra_et_al:LIPIcs.APPROX-RANDOM.2019.53,
  author =	{Galhotra, Sainyam and Mazumdar, Arya and Pal, Soumyabrata and Saha, Barna},
  title =	{{Connectivity of Random Annulus Graphs and the Geometric Block Model}},
  booktitle =	{Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2019)},
  pages =	{53:1--53:23},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-125-2},
  ISSN =	{1868-8969},
  year =	{2019},
  volume =	{145},
  editor =	{Achlioptas, Dimitris and V\'{e}gh, L\'{a}szl\'{o} A.},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.APPROX-RANDOM.2019.53},
  URN =		{urn:nbn:de:0030-drops-112682},
  doi =		{10.4230/LIPIcs.APPROX-RANDOM.2019.53},
  annote =	{Keywords: random graphs, geometric graphs, community detection, block model}
}
Document
Correlation Clustering with Same-Cluster Queries Bounded by Optimal Cost

Authors: Barna Saha and Sanjay Subramanian

Published in: LIPIcs, Volume 144, 27th Annual European Symposium on Algorithms (ESA 2019)


Abstract
Several clustering frameworks with interactive (semi-supervised) queries have been studied in the past. Recently, clustering with same-cluster queries has become popular. An algorithm in this setting has access to an oracle with full knowledge of an optimal clustering, and the algorithm can ask the oracle queries of the form, "Does the optimal clustering put vertices u and v in the same cluster?" Due to its simplicity, this querying model can easily be implemented in real crowd-sourcing platforms and has attracted a lot of recent work. In this paper, we study the popular correlation clustering problem (Bansal et al., 2002) under the same-cluster querying framework. Given a complete graph G=(V,E) with positive and negative edge labels, the correlation clustering objective is to compute a clustering that minimizes the total number of disagreements, that is, the number of negative intra-cluster edges plus positive inter-cluster edges. In a recent work, Ailon et al. (2018b) provided an approximation algorithm for correlation clustering that approximates the correlation clustering objective within (1+epsilon) using O((k^{14} log{n} log{k})/epsilon^6) queries when the number of clusters, k, is fixed. For many applications, k is not fixed and can grow with |V|. Moreover, the k^{14} dependency in the query complexity renders the algorithm impractical even for datasets with small values of k. In this paper, we take a different approach. Let C_{OPT} be the number of disagreements made by the optimal clustering. We present algorithms for correlation clustering whose error and query bounds are parameterized by C_{OPT} rather than by the number of clusters. Indeed, a good clustering must have small C_{OPT}. Specifically, we present an efficient algorithm that recovers an exact optimal clustering using at most 2C_{OPT} queries and an efficient algorithm that outputs a 2-approximation using at most C_{OPT} queries. In addition, we show that, under a plausible complexity assumption, there does not exist any polynomial-time algorithm with an approximation ratio better than 1+alpha, for an absolute constant alpha > 0, that uses o(C_{OPT}) queries. Therefore, our first algorithm achieves the optimal query bound within a factor of 2. We extensively evaluate our methods on several synthetic and real-world datasets using real crowd-sourced oracles. Moreover, we compare our approach against known correlation clustering algorithms that do not perform querying. In all cases, our algorithms exhibit superior performance.
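
The two central definitions, the disagreement objective and the same-cluster query, can be stated in a few lines of code. The sketch below only evaluates a clustering and simulates an oracle (names and the toy instance are assumptions of this sketch); it does not implement the paper's query-efficient algorithms.

from itertools import combinations

def disagreements(labels, clustering):
    # labels[(u, v)] is +1 (similar) or -1 (dissimilar) for every pair u < v;
    # clustering maps each vertex to a cluster id.
    cost = 0
    for (u, v), sign in labels.items():
        same = clustering[u] == clustering[v]
        if (sign == -1 and same) or (sign == +1 and not same):
            cost += 1
    return cost

def same_cluster_oracle(optimal_clustering):
    # The querying model: "does the optimal clustering put u and v together?"
    return lambda u, v: optimal_clustering[u] == optimal_clustering[v]

vertices = range(4)
labels = {(u, v): +1 if (u < 2) == (v < 2) else -1
          for u, v in combinations(vertices, 2)}
opt = {0: "A", 1: "A", 2: "B", 3: "B"}
print(disagreements(labels, opt))    # 0 disagreements on this instance
ask = same_cluster_oracle(opt)
print(ask(0, 1), ask(0, 2))          # True False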

Cite as

Barna Saha and Sanjay Subramanian. Correlation Clustering with Same-Cluster Queries Bounded by Optimal Cost. In 27th Annual European Symposium on Algorithms (ESA 2019). Leibniz International Proceedings in Informatics (LIPIcs), Volume 144, pp. 81:1-81:17, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2019)


BibTeX

@InProceedings{saha_et_al:LIPIcs.ESA.2019.81,
  author =	{Saha, Barna and Subramanian, Sanjay},
  title =	{{Correlation Clustering with Same-Cluster Queries Bounded by Optimal Cost}},
  booktitle =	{27th Annual European Symposium on Algorithms (ESA 2019)},
  pages =	{81:1--81:17},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-124-5},
  ISSN =	{1868-8969},
  year =	{2019},
  volume =	{144},
  editor =	{Bender, Michael A. and Svensson, Ola and Herman, Grzegorz},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ESA.2019.81},
  URN =		{urn:nbn:de:0030-drops-112020},
  doi =		{10.4230/LIPIcs.ESA.2019.81},
  annote =	{Keywords: Clustering, Approximation Algorithm, Crowdsourcing, Randomized Algorithm}
}
Document
Approximating Language Edit Distance Beyond Fast Matrix Multiplication: Ultralinear Grammars Are Where Parsing Becomes Hard!

Authors: Rajesh Jayaram and Barna Saha

Published in: LIPIcs, Volume 80, 44th International Colloquium on Automata, Languages, and Programming (ICALP 2017)


Abstract
In 1975, a breakthrough result of L. Valiant showed that parsing context-free grammars can be reduced to Boolean matrix multiplication, resulting in a running time of O(n^omega) for parsing, where omega <= 2.373 is the exponent of fast matrix multiplication and n is the string length. Recently, Abboud, Backurs and V. Williams (FOCS 2015) demonstrated that this is likely optimal; moreover, a combinatorial o(n^3) algorithm is unlikely to exist for the general parsing problem. The language edit distance problem is a significant generalization of the parsing problem, which computes the minimum edit distance of a given string (using insertions, deletions, and substitutions) to any valid string in the language, and it has received significant attention both in theory and in practice since the seminal work of Aho and Peterson in 1972. Clearly, the lower bound for parsing rules out any algorithm running in o(n^omega) time that can return a nontrivial multiplicative approximation of the language edit distance. Furthermore, combinatorial algorithms with cubic running time, or algorithms that use fast matrix multiplication, are often not desirable in practice. To break this n^omega hardness barrier, in this paper we study additive approximation algorithms for language edit distance. We provide two explicit combinatorial algorithms to obtain a string with minimum edit distance, with performance depending on either the number of non-linear productions, k^*, or the number of nested non-linear productions, k, used in the optimal derivation. Explicitly, we give an additive O(k^* gamma) approximation in time O(|G|(n^2 + (n/gamma)^3)) and an additive O(k gamma) approximation in time O(|G|(n^2 + n^3/gamma^2)), where |G| is the grammar size and n is the string length. In particular, we obtain tight approximations for an important subclass of context-free grammars known as ultralinear grammars, for which k and k^* are naturally bounded. Interestingly, we show that the same conditional lower bound for parsing context-free grammars holds for the class of ultralinear grammars as well, clearly marking the boundary where parsing becomes hard!
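
To fix what "parsing" means in this context, here is the classical combinatorial CYK membership test for a grammar in Chomsky normal form, with a toy Dyck-1 (balanced parentheses) grammar as an assumed example. Valiant's reduction and the paper's additive approximations for language edit distance operate in this setting but are not reproduced here.

def cyk(word, start, unary, binary):
    # unary: {terminal: set of nonterminals A with A -> terminal}
    # binary: {(B, C): set of nonterminals A with A -> B C}
    n = len(word)
    if n == 0:
        return False
    table = [[set() for _ in range(n)] for _ in range(n)]
    for i, ch in enumerate(word):
        table[i][i] = set(unary.get(ch, ()))
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span - 1
            for k in range(i, j):
                for B in table[i][k]:
                    for C in table[k + 1][j]:
                        table[i][j] |= binary.get((B, C), set())
    return start in table[0][n - 1]

# Non-empty balanced parentheses in CNF:
#   S -> L R | L X | S S,   X -> S R,   L -> '(',   R -> ')'
unary = {"(": {"L"}, ")": {"R"}}
binary = {("L", "R"): {"S"}, ("L", "X"): {"S"}, ("S", "S"): {"S"}, ("S", "R"): {"X"}}
print(cyk("(())()", "S", unary, binary))   # True
print(cyk("(()", "S", unary, binary))      # False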

Cite as

Rajesh Jayaram and Barna Saha. Approximating Language Edit Distance Beyond Fast Matrix Multiplication: Ultralinear Grammars Are Where Parsing Becomes Hard!. In 44th International Colloquium on Automata, Languages, and Programming (ICALP 2017). Leibniz International Proceedings in Informatics (LIPIcs), Volume 80, pp. 19:1-19:15, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2017)


BibTeX

@InProceedings{jayaram_et_al:LIPIcs.ICALP.2017.19,
  author =	{Jayaram, Rajesh and Saha, Barna},
  title =	{{Approximating Language Edit Distance Beyond Fast Matrix Multiplication: Ultralinear Grammars Are Where Parsing Becomes Hard!}},
  booktitle =	{44th International Colloquium on Automata, Languages, and Programming (ICALP 2017)},
  pages =	{19:1--19:15},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-041-5},
  ISSN =	{1868-8969},
  year =	{2017},
  volume =	{80},
  editor =	{Chatzigiannakis, Ioannis and Indyk, Piotr and Kuhn, Fabian and Muscholl, Anca},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICALP.2017.19},
  URN =		{urn:nbn:de:0030-drops-74548},
  doi =		{10.4230/LIPIcs.ICALP.2017.19},
  annote =	{Keywords: Approximation, Edit Distance, Dynamic Programming, Context Free Grammar, Hardness}
}
Document
Renting a Cloud

Authors: Barna Saha

Published in: LIPIcs, Volume 24, IARCS Annual Conference on Foundations of Software Technology and Theoretical Computer Science (FSTTCS 2013)


Abstract
We consider the problem of efficiently scheduling jobs on data centers to minimize the cost of renting machines from "the cloud." In the most basic cloud service model, cloud providers offer computers on demand from large pools installed in data centers. Clients pay for use at an hourly rate. In order to minimize cost, each client needs to decide on the number of machines to be rented and the duration of renting each machine. This suggests the following optimization problem, which we call Rent Minimization. There is a set J={j_1,j_2,...,j_n} of n jobs. Job j_i is released at time r_i >= 0, has a deadline of d_i, and requires p_i > 0 units of contiguous processing time, where r_i, d_i, p_i are real numbers. The jobs need to be scheduled on identical parallel machines. Machines may be rented for any length of time; however, the cost of renting a machine for l >= 0 time units is ⌈l/D⌉ (the smallest integer >= l/D) dollars, for some given large real D; in particular, one pays 2 dollars whether the machine is rented for D+1 or 2D time units. The goal is to schedule all the jobs in a way that minimizes the incurred rental cost. In this paper, we develop offline and online algorithms for the Rent Minimization problem. The algorithms achieve a constant-factor approximation for the offline version and an O(log(p_max/p_min)) approximation for the online version, where p_max and p_min are the maximum and minimum processing times of the jobs, respectively. We also show that no deterministic online algorithm can achieve an approximation factor better than log_{3}(p_max/p_min) within a constant factor. Both of these algorithms use the well-studied Machine Minimization problem as a subroutine. Machine Minimization is a special case of Rent Minimization in which D = max_{i}{d_i}. In the process of solving the Rent Minimization problem, we also develop the first online algorithm for Machine Minimization.
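
The rental cost function is easy to state in code. The sketch below evaluates the ⌈l/D⌉ objective for a given set of rental intervals; the schedule representation is an assumption of this sketch, and the code is not the paper's offline or online algorithm.

import math

def rental_cost(intervals, D):
    # intervals: list of (start, end) rental intervals, one per rented machine.
    # Each machine costs ceil(length / D) dollars.
    return sum(math.ceil((end - start) / D) for start, end in intervals)

D = 10
print(rental_cost([(0, 11)], D))            # renting for D+1 time units costs 2
print(rental_cost([(0, 20)], D))            # renting for 2D time units also costs 2
print(rental_cost([(0, 7), (3, 12)], D))    # two machines: 1 + 1 = 2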

Cite as

Barna Saha. Renting a Cloud. In IARCS Annual Conference on Foundations of Software Technology and Theoretical Computer Science (FSTTCS 2013). Leibniz International Proceedings in Informatics (LIPIcs), Volume 24, pp. 437-448, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2013)


BibTeX

@InProceedings{saha:LIPIcs.FSTTCS.2013.437,
  author =	{Saha, Barna},
  title =	{{Renting  a Cloud}},
  booktitle =	{IARCS Annual Conference on Foundations of Software Technology and Theoretical Computer Science (FSTTCS 2013)},
  pages =	{437--448},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-939897-64-4},
  ISSN =	{1868-8969},
  year =	{2013},
  volume =	{24},
  editor =	{Seth, Anil and Vishnoi, Nisheeth K.},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.FSTTCS.2013.437},
  URN =		{urn:nbn:de:0030-drops-43918},
  doi =		{10.4230/LIPIcs.FSTTCS.2013.437},
  annote =	{Keywords: Scheduling Algorithm, Online Algorithm, Approximation Algorithm}
}
Document
Energy Efficient Scheduling via Partial Shutdown

Authors: Samir Khuller, Jian Li, and Barna Saha

Published in: Dagstuhl Seminar Proceedings, Volume 10071, Scheduling (2010)


Abstract
We define a collection of new problems referred to as "machine activation" problems. The central framework we introduce considers a collection of M machines (unrelated or related), with machine i having an activation cost of a_i. There is also a collection of N jobs that need to be performed, and p_{ij} is the processing time of job j on machine i. Standard scheduling models assume that the set of machines is fixed and all machines are available. We assume that there is an activation cost budget of A: we would like to select a subset S of the machines to activate with total cost a(S) <= A and find a schedule for the jobs on the machines in S minimizing the makespan. In this work we develop bi-criteria approximation algorithms for this problem based on both LP rounding and a greedy approach.
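
A small evaluator may help parse the definitions above: given an activated set S (implied by a job assignment) it reports the activation cost a(S), the makespan, and whether the budget A is respected. The data layout and names are assumptions of this sketch; the paper's LP-rounding and greedy algorithms are not reproduced here.

def evaluate(activation_cost, processing_time, assignment, budget):
    # activation_cost[i] = a_i; processing_time[i][j] = p_ij;
    # assignment maps each job j to the machine i it runs on.
    active = set(assignment.values())
    a_S = sum(activation_cost[i] for i in active)
    load = {i: 0 for i in active}
    for j, i in assignment.items():
        load[i] += processing_time[i][j]
    return a_S, max(load.values()), a_S <= budget

a = {0: 5, 1: 3}
p = {0: {"j1": 2, "j2": 4}, 1: {"j1": 3, "j2": 3}}
assignment = {"j1": 0, "j2": 1}
print(evaluate(a, p, assignment, budget=10))   # (8, 3, True)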

Cite as

Samir Khuller, Jian Li, and Barna Saha. Energy Efficient Scheduling via Partial Shutdown. In Scheduling. Dagstuhl Seminar Proceedings, Volume 10071, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2010)


BibTeX

@InProceedings{khuller_et_al:DagSemProc.10071.5,
  author =	{Khuller, Samir and Li, Jian and Saha, Barna},
  title =	{{Energy Efficient Scheduling via Partial Shutdown}},
  booktitle =	{Scheduling},
  series =	{Dagstuhl Seminar Proceedings (DagSemProc)},
  ISSN =	{1862-4405},
  year =	{2010},
  volume =	{10071},
  editor =	{Susanne Albers and Sanjoy K. Baruah and Rolf H. M\"{o}hring and Kirk Pruhs},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/DagSemProc.10071.5},
  URN =		{urn:nbn:de:0030-drops-25435},
  doi =		{10.4230/DagSemProc.10071.5},
  annote =	{Keywords: Unrelated parallel machine scheduling, approximation algorithms}
}
