DROPS

Document

DOI: 10.4230/LIPIcs.ITCS.2023.31

Clustering Permutations: New Techniques with Streaming Applications

Authors: Diptarka Chakraborty, Debarati Das, and Robert Krauthgamer

Published in: LIPIcs, Volume 251, 14th Innovations in Theoretical Computer Science Conference (ITCS 2023)

Abstract

We study the classical metric k-median clustering problem over a set of input rankings (i.e., permutations), which has myriad applications, from social-choice theory to web search and databases. A folklore algorithm provides a 2-approximate solution in polynomial time for all k = O(1), and works irrespective of the underlying distance measure, so long it is a metric; however, going below the 2-factor is a notorious challenge. We consider the Ulam distance, a variant of the well-known edit-distance metric, where strings are restricted to be permutations. For this metric, Chakraborty, Das, and Krauthgamer [SODA, 2021] provided a (2-δ)-approximation algorithm for k = 1, where δ≈ 2^{-40}. Our primary contribution is a new algorithmic framework for clustering a set of permutations. Our first result is a 1.999-approximation algorithm for the metric k-median problem under the Ulam metric, that runs in time (k log (nd))^{O(k)} nd³ for an input consisting of n permutations over [d]. In fact, our framework is powerful enough to extend this result to the streaming model (where the n input permutations arrive one by one) using only polylogarithmic (in n) space. Additionally, we show that similar results can be obtained even in the presence of outliers, which is presumably a more difficult problem.

Cite as

Diptarka Chakraborty, Debarati Das, and Robert Krauthgamer. Clustering Permutations: New Techniques with Streaming Applications. In 14th Innovations in Theoretical Computer Science Conference (ITCS 2023). Leibniz International Proceedings in Informatics (LIPIcs), Volume 251, pp. 31:1-31:24, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2023)

Copy BibTex To Clipboard

@InProceedings{chakraborty_et_al:LIPIcs.ITCS.2023.31,
  author =	{Chakraborty, Diptarka and Das, Debarati and Krauthgamer, Robert},
  title =	{{Clustering Permutations: New Techniques with Streaming Applications}},
  booktitle =	{14th Innovations in Theoretical Computer Science Conference (ITCS 2023)},
  pages =	{31:1--31:24},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-263-1},
  ISSN =	{1868-8969},
  year =	{2023},
  volume =	{251},
  editor =	{Tauman Kalai, Yael},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ITCS.2023.31},
  URN =		{urn:nbn:de:0030-drops-175340},
  doi =		{10.4230/LIPIcs.ITCS.2023.31},
  annote =	{Keywords: Clustering, Approximation Algorithms, Ulam Distance, Rank Aggregation, Streaming}
}

Document

APPROX

DOI: 10.4230/LIPIcs.APPROX/RANDOM.2022.54

Approximating LCS and Alignment Distance over Multiple Sequences

Authors: Debarati Das and Barna Saha

Published in: LIPIcs, Volume 245, Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2022)

Abstract

We study the problem of aligning multiple sequences with the goal of finding an alignment that either maximizes the number of aligned symbols (the longest common subsequence (LCS) problem), or minimizes the number of unaligned symbols (the alignment distance aka the complement of LCS). Multiple sequence alignment is a well-studied problem in bioinformatics and is used routinely to identify regions of similarity among DNA, RNA, or protein sequences to detect functional, structural, or evolutionary relationships among them. It is known that exact computation of LCS or alignment distance of m sequences each of length n requires Θ(n^m) time unless the Strong Exponential Time Hypothesis is false. However, unlike the case of two strings, fast algorithms to approximate LCS and alignment distance of multiple sequences are lacking in the literature. A major challenge in this area is to break the triangle inequality. Specifically, by splitting m sequences into two (roughly) equal sized groups, then computing the alignment distance in each group and finally combining them by using triangle inequality, it is possible to achieve a 2-approximation in Õ_m(n^⌈m/2⌉) time. But, an approximation factor below 2 which would need breaking the triangle inequality barrier is not known in O(n^{α m}) time for any α < 1. We make significant progress in this direction. First, we consider a semi-random model where, we show if just one out of m sequences is (p,B)-pseudorandom then, we can get a below-two approximation in Õ_m(nB^{m-1}+n^{⌊m/2⌋+3}) time. Such semi-random models are very well-studied for two strings scenario, however directly extending those works require one but all sequences to be pseudorandom, and would only give an O(1/p) approximation. We overcome these with significant new ideas. Specifically an ingredient to this proof is a new algorithm that achives below 2 approximations when alignment distance is large in Õ_m(n^{⌊m/2⌋+2}) time. This could be of independent interest. Next, for LCS of m sequences each of length n, we show if the optimum LCS is λ n for some λ ∈ [0,1], then in Õ_m(n^{⌊m/2⌋+1}) time, we can return a common subsequence of length at least λ²n/(2+ε) for any arbitrary constant ε > 0. In contrast, for two strings, the best known subquadratic algorithm may return a common subsequence of length Θ(λ⁴ n).

Cite as

Debarati Das and Barna Saha. Approximating LCS and Alignment Distance over Multiple Sequences. In Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2022). Leibniz International Proceedings in Informatics (LIPIcs), Volume 245, pp. 54:1-54:21, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2022)

Copy BibTex To Clipboard

@InProceedings{das_et_al:LIPIcs.APPROX/RANDOM.2022.54,
  author =	{Das, Debarati and Saha, Barna},
  title =	{{Approximating LCS and Alignment Distance over Multiple Sequences}},
  booktitle =	{Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2022)},
  pages =	{54:1--54:21},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-249-5},
  ISSN =	{1868-8969},
  year =	{2022},
  volume =	{245},
  editor =	{Chakrabarti, Amit and Swamy, Chaitanya},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.APPROX/RANDOM.2022.54},
  URN =		{urn:nbn:de:0030-drops-171762},
  doi =		{10.4230/LIPIcs.APPROX/RANDOM.2022.54},
  annote =	{Keywords: String Algorithms, Approximation Algorithms}
}

Document

Track A: Algorithms, Complexity and Games

DOI: 10.4230/LIPIcs.ICALP.2022.49

Improved Approximation Algorithms for Dyck Edit Distance and RNA Folding

Authors: Debarati Das, Tomasz Kociumaka, and Barna Saha

Published in: LIPIcs, Volume 229, 49th International Colloquium on Automata, Languages, and Programming (ICALP 2022)

Abstract

The Dyck language, which consists of well-balanced sequences of parentheses, is one of the most fundamental context-free languages. The Dyck edit distance quantifies the number of edits (character insertions, deletions, and substitutions) required to make a given length-n parenthesis sequence well-balanced. RNA Folding involves a similar problem, where a closing parenthesis can match an opening parenthesis of the same type irrespective of their ordering. For example, in RNA Folding, both () and )( are valid matches, whereas the Dyck language only allows () as a match. Both of these problems have been studied extensively in the literature. Using fast matrix multiplication, it is possible to compute their exact solutions in time O(n^2.687) (Chi, Duan, Xie, Zhang, STOC'22), and a (1+ε)-multiplicative approximation is known with a running time of Ω(n^2.372). The impracticality of fast matrix multiplication often makes combinatorial algorithms much more desirable. Unfortunately, it is known that the problems of (exactly) computing the Dyck edit distance and the folding distance are at least as hard as Boolean matrix multiplication. Thereby, they are unlikely to admit truly subcubic-time combinatorial algorithms. In terms of fast approximation algorithms that are combinatorial in nature, the state of the art for Dyck edit distance is an O(log n)-factor approximation algorithm that runs in near-linear time (Saha, FOCS'14), whereas for RNA Folding only an ε n-additive approximation in Õ(n²/ε) time (Saha, FOCS'17) is known. In this paper, we make substantial improvements to the state of the art for Dyck edit distance (with any number of parenthesis types). We design a constant-factor approximation algorithm that runs in Õ(n^1.971) time (the first constant-factor approximation in subquadratic time). Moreover, we develop a (1+ε)-factor approximation algorithm running in Õ(n²/ε) time, which improves upon the earlier additive approximation. Finally, we design a (3+ε)-approximation that takes Õ(nd/ε) time, where d ≥ 1 is an upper bound on the sought distance. As for RNA folding, for any s ≥ 1, we design a factor-s approximation algorithm that runs in O(n+(n/s)³) time. To the best of our knowledge, this is the first nontrivial approximation algorithm for RNA Folding that can go below the n² barrier. All our algorithms are combinatorial in nature.

Cite as

Debarati Das, Tomasz Kociumaka, and Barna Saha. Improved Approximation Algorithms for Dyck Edit Distance and RNA Folding. In 49th International Colloquium on Automata, Languages, and Programming (ICALP 2022). Leibniz International Proceedings in Informatics (LIPIcs), Volume 229, pp. 49:1-49:20, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2022)

Copy BibTex To Clipboard

@InProceedings{das_et_al:LIPIcs.ICALP.2022.49,
  author =	{Das, Debarati and Kociumaka, Tomasz and Saha, Barna},
  title =	{{Improved Approximation Algorithms for Dyck Edit Distance and RNA Folding}},
  booktitle =	{49th International Colloquium on Automata, Languages, and Programming (ICALP 2022)},
  pages =	{49:1--49:20},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-235-8},
  ISSN =	{1868-8969},
  year =	{2022},
  volume =	{229},
  editor =	{Boja\'{n}czyk, Miko{\l}aj and Merelli, Emanuela and Woodruff, David P.},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICALP.2022.49},
  URN =		{urn:nbn:de:0030-drops-163902},
  doi =		{10.4230/LIPIcs.ICALP.2022.49},
  annote =	{Keywords: Dyck Edit Distance, RNA Folding, String Algorithms}
}

Document

DOI: 10.4230/LIPIcs.FSTTCS.2021.11

Approximate Trace Reconstruction via Median String (In Average-Case)

Authors: Diptarka Chakraborty, Debarati Das, and Robert Krauthgamer

Published in: LIPIcs, Volume 213, 41st IARCS Annual Conference on Foundations of Software Technology and Theoretical Computer Science (FSTTCS 2021)

Abstract

We consider an approximate version of the trace reconstruction problem, where the goal is to recover an unknown string s ∈ {0,1}ⁿ from m traces (each trace is generated independently by passing s through a probabilistic insertion-deletion channel with rate p). We present a deterministic near-linear time algorithm for the average-case model, where s is random, that uses only three traces. It runs in near-linear time Õ(n) and with high probability reports a string within edit distance Õ(p² n) from s, which significantly improves over the straightforward bound of O(pn). Technically, our algorithm computes a (1+ε)-approximate median of the three input traces. To prove its correctness, our probabilistic analysis shows that an approximate median is indeed close to the unknown s. To achieve a near-linear time bound, we have to bypass the well-known dynamic programming algorithm that computes an optimal median in time O(n³).

Cite as

Diptarka Chakraborty, Debarati Das, and Robert Krauthgamer. Approximate Trace Reconstruction via Median String (In Average-Case). In 41st IARCS Annual Conference on Foundations of Software Technology and Theoretical Computer Science (FSTTCS 2021). Leibniz International Proceedings in Informatics (LIPIcs), Volume 213, pp. 11:1-11:23, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2021)

Copy BibTex To Clipboard

@InProceedings{chakraborty_et_al:LIPIcs.FSTTCS.2021.11,
  author =	{Chakraborty, Diptarka and Das, Debarati and Krauthgamer, Robert},
  title =	{{Approximate Trace Reconstruction via Median String (In Average-Case)}},
  booktitle =	{41st IARCS Annual Conference on Foundations of Software Technology and Theoretical Computer Science (FSTTCS 2021)},
  pages =	{11:1--11:23},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-215-0},
  ISSN =	{1868-8969},
  year =	{2021},
  volume =	{213},
  editor =	{Boja\'{n}czyk, Miko{\l}aj and Chekuri, Chandra},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.FSTTCS.2021.11},
  URN =		{urn:nbn:de:0030-drops-155228},
  doi =		{10.4230/LIPIcs.FSTTCS.2021.11},
  annote =	{Keywords: Trace Reconstruction, Approximation Algorithms, Edit Distance, String Median}
}

Document

Track A: Algorithms, Complexity and Games

DOI: 10.4230/LIPIcs.ICALP.2021.39

A Linear-Time n^{0.4}-Approximation for Longest Common Subsequence

Authors: Karl Bringmann and Debarati Das

Published in: LIPIcs, Volume 198, 48th International Colloquium on Automata, Languages, and Programming (ICALP 2021)

Abstract

We consider the classic problem of computing the Longest Common Subsequence (LCS) of two strings of length n. While a simple quadratic algorithm has been known for the problem for more than 40 years, no faster algorithm has been found despite an extensive effort. The lack of progress on the problem has recently been explained by Abboud, Backurs, and Vassilevska Williams [FOCS'15] and Bringmann and Künnemann [FOCS'15] who proved that there is no subquadratic algorithm unless the Strong Exponential Time Hypothesis fails. This major roadblock for getting faster exact algorithms has led the community to look for subquadratic approximation algorithms for the problem. Yet, unlike the edit distance problem for which a constant-factor approximation in almost-linear time is known, very little progress has been made on LCS, making it a notoriously difficult problem also in the realm of approximation. For the general setting (where we make no assumption on the length of the optimum solution or the alphabet size), only a naive O(n^{ε/2})-approximation algorithm with running time Õ(n^{2-ε}) has been known, for any constant 0 < ε ≤ 1. Recently, a breakthrough result by Hajiaghayi, Seddighin, Seddighin, and Sun [SODA'19] provided a linear-time algorithm that yields a O(n^{0.497956})-approximation in expectation; improving upon the naive O(√n)-approximation for the first time. In this paper, we provide an algorithm that in time O(n^{2-ε}) computes an Õ(n^{2ε/5})-approximation with high probability, for any 0 < ε ≤ 1. Our result (1) gives an Õ(n^{0.4})-approximation in linear time, improving upon the bound of Hajiaghayi, Seddighin, Seddighin, and Sun, (2) provides an algorithm whose approximation scales with any subquadratic running time O(n^{2-ε}), improving upon the naive bound of O(n^{ε/2}) for any ε, and (3) instead of only in expectation, succeeds with high probability.

Cite as

Karl Bringmann and Debarati Das. A Linear-Time n^{0.4}-Approximation for Longest Common Subsequence. In 48th International Colloquium on Automata, Languages, and Programming (ICALP 2021). Leibniz International Proceedings in Informatics (LIPIcs), Volume 198, pp. 39:1-39:20, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2021)

Copy BibTex To Clipboard

@InProceedings{bringmann_et_al:LIPIcs.ICALP.2021.39,
  author =	{Bringmann, Karl and Das, Debarati},
  title =	{{A Linear-Time n^\{0.4\}-Approximation for Longest Common Subsequence}},
  booktitle =	{48th International Colloquium on Automata, Languages, and Programming (ICALP 2021)},
  pages =	{39:1--39:20},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-195-5},
  ISSN =	{1868-8969},
  year =	{2021},
  volume =	{198},
  editor =	{Bansal, Nikhil and Merelli, Emanuela and Worrell, James},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICALP.2021.39},
  URN =		{urn:nbn:de:0030-drops-141082},
  doi =		{10.4230/LIPIcs.ICALP.2021.39},
  annote =	{Keywords: approximation algorithm, longest common subsequence, string algorithm}
}

Document

DOI: 10.4230/LIPIcs.FSTTCS.2019.10

Approximate Online Pattern Matching in Sublinear Time

Authors: Diptarka Chakraborty, Debarati Das, and Michal Koucký

Published in: LIPIcs, Volume 150, 39th IARCS Annual Conference on Foundations of Software Technology and Theoretical Computer Science (FSTTCS 2019)

Abstract

We consider the approximate pattern matching problem under edit distance. In this problem we are given a pattern P of length m and a text T of length n over some alphabet Sigma, and a positive integer k. The goal is to find all the positions j in T such that there is a substring of T ending at j which has edit distance at most k from the pattern P. Recall, the edit distance between two strings is the minimum number of character insertions, deletions, and substitutions required to transform one string into the other. For a position t in {1,...,n}, let k_t be the smallest edit distance between P and any substring of T ending at t. In this paper we give a constant factor approximation to the sequence k_1,k_2,...,k_n. We consider both offline and online settings. In the offline setting, where both P and T are available, we present an algorithm that for all t in {1,...,n}, computes the value of k_t approximately within a constant factor. The worst case running time of our algorithm is O~(n m^(3/4)). In the online setting, we are given P and then T arrives one symbol at a time. We design an algorithm that upon arrival of the t-th symbol of T computes k_t approximately within O(1)-multiplicative factor and m^(8/9)-additive error. Our algorithm takes O~(m^(1-(7/54))) amortized time per symbol arrival and takes O~(m^(1-(1/54))) additional space apart from storing the pattern P. Both of our algorithms are randomized and produce correct answer with high probability. To the best of our knowledge this is the first algorithm that takes worst-case sublinear (in the length of the pattern) time and sublinear extra space for the online approximate pattern matching problem. To get our result we build on the technique of Chakraborty, Das, Goldenberg, Koucký and Saks [FOCS'18] for computing a constant factor approximation of edit distance in sub-quadratic time.

Cite as

Diptarka Chakraborty, Debarati Das, and Michal Koucký. Approximate Online Pattern Matching in Sublinear Time. In 39th IARCS Annual Conference on Foundations of Software Technology and Theoretical Computer Science (FSTTCS 2019). Leibniz International Proceedings in Informatics (LIPIcs), Volume 150, pp. 10:1-10:15, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2019)

Copy BibTex To Clipboard

@InProceedings{chakraborty_et_al:LIPIcs.FSTTCS.2019.10,
  author =	{Chakraborty, Diptarka and Das, Debarati and Kouck\'{y}, Michal},
  title =	{{Approximate Online Pattern Matching in Sublinear Time}},
  booktitle =	{39th IARCS Annual Conference on Foundations of Software Technology and Theoretical Computer Science (FSTTCS 2019)},
  pages =	{10:1--10:15},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-131-3},
  ISSN =	{1868-8969},
  year =	{2019},
  volume =	{150},
  editor =	{Chattopadhyay, Arkadev and Gastin, Paul},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.FSTTCS.2019.10},
  URN =		{urn:nbn:de:0030-drops-115726},
  doi =		{10.4230/LIPIcs.FSTTCS.2019.10},
  annote =	{Keywords: Approximate Pattern Matching, Online Pattern Matching, Edit Distance, Sublinear Algorithm, Streaming Algorithm}
}

Document

DOI: 10.4230/LIPIcs.ESA.2018.12

Space-Optimal Quasi-Gray Codes with Logarithmic Read Complexity

Authors: Diptarka Chakraborty, Debarati Das, Michal Koucký, and Nitin Saurabh

Published in: LIPIcs, Volume 112, 26th Annual European Symposium on Algorithms (ESA 2018)

Abstract

A quasi-Gray code of dimension n and length l over an alphabet Sigma is a sequence of distinct words w_1,w_2,...,w_l from Sigma^n such that any two consecutive words differ in at most c coordinates, for some fixed constant c>0. In this paper we are interested in the read and write complexity of quasi-Gray codes in the bit-probe model, where we measure the number of symbols read and written in order to transform any word w_i into its successor w_{i+1}. We present construction of quasi-Gray codes of dimension n and length 3^n over the ternary alphabet {0,1,2} with worst-case read complexity O(log n) and write complexity 2. This generalizes to arbitrary odd-size alphabets. For the binary alphabet, we present quasi-Gray codes of dimension n and length at least 2^n - 20n with worst-case read complexity 6+log n and write complexity 2. This complements a recent result by Raskin [Raskin '17] who shows that any quasi-Gray code over binary alphabet of length 2^n has read complexity Omega(n). Our results significantly improve on previously known constructions and for the odd-size alphabets we break the Omega(n) worst-case barrier for space-optimal (non-redundant) quasi-Gray codes with constant number of writes. We obtain our results via a novel application of algebraic tools together with the principles of catalytic computation [Buhrman et al. '14, Ben-Or and Cleve '92, Barrington '89, Coppersmith and Grossman '75].

Cite as

Diptarka Chakraborty, Debarati Das, Michal Koucký, and Nitin Saurabh. Space-Optimal Quasi-Gray Codes with Logarithmic Read Complexity. In 26th Annual European Symposium on Algorithms (ESA 2018). Leibniz International Proceedings in Informatics (LIPIcs), Volume 112, pp. 12:1-12:15, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2018)

Copy BibTex To Clipboard

@InProceedings{chakraborty_et_al:LIPIcs.ESA.2018.12,
  author =	{Chakraborty, Diptarka and Das, Debarati and Kouck\'{y}, Michal and Saurabh, Nitin},
  title =	{{Space-Optimal Quasi-Gray Codes with Logarithmic Read Complexity}},
  booktitle =	{26th Annual European Symposium on Algorithms (ESA 2018)},
  pages =	{12:1--12:15},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-081-1},
  ISSN =	{1868-8969},
  year =	{2018},
  volume =	{112},
  editor =	{Azar, Yossi and Bast, Hannah and Herman, Grzegorz},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ESA.2018.12},
  URN =		{urn:nbn:de:0030-drops-94750},
  doi =		{10.4230/LIPIcs.ESA.2018.12},
  annote =	{Keywords: Gray code, Space-optimal counter, Decision assignment tree, Cell probe model}
}

Document

DOI: 10.4230/LIPIcs.SWAT.2018.15

Sparse Weight Tolerant Subgraph for Single Source Shortest Path

Authors: Diptarka Chakraborty and Debarati Das

Published in: LIPIcs, Volume 101, 16th Scandinavian Symposium and Workshops on Algorithm Theory (SWAT 2018)

Abstract

In this paper we address the problem of computing a sparse subgraph of any weighted directed graph such that the exact distances from a designated source vertex to all other vertices are preserved under bounded weight increment. Finding a small sized subgraph that preserves distances between any pair of vertices is a well studied problem. Since in the real world any network is prone to failures, it is natural to study the fault tolerant version of the above problem. Unfortunately, it turns out that there may not always exist such a sparse subgraph even under single edge failure [Demetrescu et al. '08]. However in real applications it is not always the case that a link (edge) in a network becomes completely faulty. Instead, it can happen that some links become more congested which can be captured by increasing weight on the corresponding edges. Thus it makes sense to try to construct a sparse distance preserving subgraph under the above weight increment model where total increase in weight in the whole network (graph) is bounded by some parameter k. To the best of our knowledge this problem has not been studied so far. In this paper we show that given any weighted directed graph with n vertices and a source vertex, one can construct a subgraph of size at most e * (k-1)!2^kn such that it preserves distances between the source and all other vertices as long as the total weight increment is bounded by k and we are allowed to only have integer valued (can be negative) weight on edges and also weight of an edge can only be increased by some positive integer. Next we show a lower bound of c * 2^kn, for some constant c >= 5/4, on the size of such a subgraph. We further argue that the restrictions of integral weight and integral weight increment are actually essential by showing that if we remove any one of these two we may need to store Omega(n^2) edges to preserve the distances.

Cite as

Diptarka Chakraborty and Debarati Das. Sparse Weight Tolerant Subgraph for Single Source Shortest Path. In 16th Scandinavian Symposium and Workshops on Algorithm Theory (SWAT 2018). Leibniz International Proceedings in Informatics (LIPIcs), Volume 101, pp. 15:1-15:15, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2018)

Copy BibTex To Clipboard

@InProceedings{chakraborty_et_al:LIPIcs.SWAT.2018.15,
  author =	{Chakraborty, Diptarka and Das, Debarati},
  title =	{{Sparse Weight Tolerant Subgraph for Single Source Shortest Path}},
  booktitle =	{16th Scandinavian Symposium and Workshops on Algorithm Theory (SWAT 2018)},
  pages =	{15:1--15:15},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-068-2},
  ISSN =	{1868-8969},
  year =	{2018},
  volume =	{101},
  editor =	{Eppstein, David},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.SWAT.2018.15},
  URN =		{urn:nbn:de:0030-drops-88413},
  doi =		{10.4230/LIPIcs.SWAT.2018.15},
  annote =	{Keywords: Shortest path, fault tolerant, distance preserver, graph algorithm}
}

Document

DOI: 10.4230/LIPIcs.STACS.2018.23

Lower Bounds for Combinatorial Algorithms for Boolean Matrix Multiplication

Authors: Debarati Das, Michal Koucký, and Michael Saks

Published in: LIPIcs, Volume 96, 35th Symposium on Theoretical Aspects of Computer Science (STACS 2018)

Abstract

In this paper we propose models of combinatorial algorithms for the Boolean Matrix Multiplication (BMM), and prove lower bounds on computing BMM in these models. First, we give a relatively relaxed combinatorial model which is an extension of the model by Angluin (1976), and we prove that the time required by any algorithm for the BMM is at least Omega(n^3 / 2^{O( sqrt{ log n })}). Subsequently, we propose a more general model capable of simulating the "Four Russian Algorithm". We prove a lower bound of Omega(n^{7/3} / 2^{O(sqrt{ log n })}) for the BMM under this model. We use a special class of graphs, called (r,t)-graphs, originally discovered by Rusza and Szemeredi (1978), along with randomization, to construct matrices that are hard instances for our combinatorial models.

Cite as

Debarati Das, Michal Koucký, and Michael Saks. Lower Bounds for Combinatorial Algorithms for Boolean Matrix Multiplication. In 35th Symposium on Theoretical Aspects of Computer Science (STACS 2018). Leibniz International Proceedings in Informatics (LIPIcs), Volume 96, pp. 23:1-23:14, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2018)

Copy BibTex To Clipboard

@InProceedings{das_et_al:LIPIcs.STACS.2018.23,
  author =	{Das, Debarati and Kouck\'{y}, Michal and Saks, Michael},
  title =	{{Lower Bounds for Combinatorial Algorithms for Boolean Matrix Multiplication}},
  booktitle =	{35th Symposium on Theoretical Aspects of Computer Science (STACS 2018)},
  pages =	{23:1--23:14},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-062-0},
  ISSN =	{1868-8969},
  year =	{2018},
  volume =	{96},
  editor =	{Niedermeier, Rolf and Vall\'{e}e, Brigitte},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.STACS.2018.23},
  URN =		{urn:nbn:de:0030-drops-85050},
  doi =		{10.4230/LIPIcs.STACS.2018.23},
  annote =	{Keywords: Lower bounds, Combinatorial algorithm, Boolean matrix multiplication}
}

Document

DOI: 10.4230/LIPIcs.FSTTCS.2015.221

Dimension, Pseudorandomness and Extraction of Pseudorandomness

Authors: Manindra Agrawal, Diptarka Chakraborty, Debarati Das, and Satyadev Nandakumar

Published in: LIPIcs, Volume 45, 35th IARCS Annual Conference on Foundations of Software Technology and Theoretical Computer Science (FSTTCS 2015)

Abstract

In this paper we propose a quantification of distributions on a set of strings, in terms of how close to pseudorandom a distribution is. The quantification is an adaptation of the theory of dimension of sets of infinite sequences introduced by Lutz. Adapting Hitchcock's work, we also show that the logarithmic loss incurred by a predictor on a distribution is quantitatively equivalent to the notion of dimension we define. Roughly, this captures the equivalence between pseudorandomness defined via indistinguishability and via unpredictability. Later we show some natural properties of our notion of dimension. We also do a comparative study among our proposed notion of dimension and two well known notions of computational analogue of entropy, namely HILL-type pseudo min-entropy and next-bit pseudo Shannon entropy. Further, we apply our quantification to the following problem. If we know that the dimension of a distribution on the set of n-length strings is s in (0,1], can we extract out O(sn) pseudorandom bits out of the distribution? We show that to construct such extractor, one need at least Omega(log n) bits of pure randomness. However, it is still open to do the same using O(log n) random bits. We show that deterministic extraction is possible in a special case - analogous to the bit-fixing sources introduced by Chor et al., which we term nonpseudorandom bit-fixing source. We adapt the techniques of Gabizon, Raz and Shaltiel to construct a deterministic pseudorandom extractor for this source. By the end, we make a little progress towards P vs. BPP problem by showing that existence of optimal stretching function that stretches O(log n) input bits to produce n output bits such that output distribution has dimension s in (0,1], implies P=BPP.

Cite as

Manindra Agrawal, Diptarka Chakraborty, Debarati Das, and Satyadev Nandakumar. Dimension, Pseudorandomness and Extraction of Pseudorandomness. In 35th IARCS Annual Conference on Foundations of Software Technology and Theoretical Computer Science (FSTTCS 2015). Leibniz International Proceedings in Informatics (LIPIcs), Volume 45, pp. 221-235, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2015)

Copy BibTex To Clipboard

@InProceedings{agrawal_et_al:LIPIcs.FSTTCS.2015.221,
  author =	{Agrawal, Manindra and Chakraborty, Diptarka and Das, Debarati and Nandakumar, Satyadev},
  title =	{{Dimension, Pseudorandomness and Extraction of Pseudorandomness}},
  booktitle =	{35th IARCS Annual Conference on Foundations of Software Technology and Theoretical Computer Science (FSTTCS 2015)},
  pages =	{221--235},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-939897-97-2},
  ISSN =	{1868-8969},
  year =	{2015},
  volume =	{45},
  editor =	{Harsha, Prahladh and Ramalingam, G.},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.FSTTCS.2015.221},
  URN =		{urn:nbn:de:0030-drops-56184},
  doi =		{10.4230/LIPIcs.FSTTCS.2015.221},
  annote =	{Keywords: Pseudorandomness, Dimension, Martingale, Unpredictability, Pseudoentropy, Pseudorandom Extractor, Hard Function}
}

@InProceedings{agrawal_et_al:LIPIcs.FSTTCS.2015.221,
  author =	{Agrawal, Manindra and Chakraborty, Diptarka and Das, Debarati and Nandakumar, Satyadev},
  title =	{{Dimension, Pseudorandomness and Extraction of Pseudorandomness}},
  booktitle =	{35th IARCS Annual Conference on Foundations of Software Technology and Theoretical Computer Science (FSTTCS 2015)},
  pages =	{221--235},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-939897-97-2},
  ISSN =	{1868-8969},
  year =	{2015},
  volume =	{45},
  editor =	{Harsha, Prahladh and Ramalingam, G.},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.FSTTCS.2015.221},
  URN =		{urn:nbn:de:0030-drops-56184},
  doi =		{10.4230/LIPIcs.FSTTCS.2015.221},
  annote =	{Keywords: Pseudorandomness, Dimension, Martingale, Unpredictability, Pseudoentropy, Pseudorandom Extractor, Hard Function}
}

Search Results

Documents authored by Das, Debarati

Clustering Permutations: New Techniques with Streaming Applications

Abstract

Cite as

Approximating LCS and Alignment Distance over Multiple Sequences

Abstract

Cite as

Improved Approximation Algorithms for Dyck Edit Distance and RNA Folding

Abstract

Cite as

Approximate Trace Reconstruction via Median String (In Average-Case)

Abstract

Cite as

A Linear-Time n^{0.4}-Approximation for Longest Common Subsequence

Abstract

Cite as

Approximate Online Pattern Matching in Sublinear Time

Abstract

Cite as

Space-Optimal Quasi-Gray Codes with Logarithmic Read Complexity

Abstract

Cite as

Sparse Weight Tolerant Subgraph for Single Source Shortest Path

Abstract

Cite as

Lower Bounds for Combinatorial Algorithms for Boolean Matrix Multiplication

Abstract

Cite as

Dimension, Pseudorandomness and Extraction of Pseudorandomness

Abstract

Cite as