DROPS

Document

DOI: 10.4230/LIPIcs.STACS.2026.62

Relative Compressed Reverse Suffix Array

Authors: Muhammed Oguzhan Kulekci, Mano Prakash Parthasarathi, Rahul Shah, and Sharma V. Thankachan

Published in: LIPIcs, Volume 364, 43rd International Symposium on Theoretical Aspects of Computer Science (STACS 2026)

Abstract

Suffix trees and suffix arrays are two fundamental data structures in the field of string algorithms. For a string (a.k.a. text or sequence) of length n over an alphabet of size σ, these structures typically require O(nlog n) bits of space. The FM-index provides a compressed representation of the suffix array in ≈ nlog σ bits, allowing for efficient queries on both the suffix array and its inverse array in near logarithmic time. In certain applications, such as approximate pattern matching (i.e., with wildcards, mismatches, edits), there is a need to access the suffix array of a text, as well as the suffix array of text’s reverse. Motivated by this, we explore the possibility of encoding the suffix array of the reversed text in a compact form, assuming the availability of the FM-index for the original text. Our first solution is an O(n)-bit (relative) encoding of the suffix array of the reversed text, with the time for decoding an entry being only O(log^*n) times that of decoding an entry in the text’s suffix array using FM-index. We then demonstrate how to reduce the space to O(n/κ) bits for a parameter κ, while multiplicative factor in time becomes approximately O(κlog^*n+κ³). We can also support inverse suffix array and longest common extension queries on the reversed text. These results are achieved through some careful and non-trivial application of various succinct data structure techniques.

Cite as

Muhammed Oguzhan Kulekci, Mano Prakash Parthasarathi, Rahul Shah, and Sharma V. Thankachan. Relative Compressed Reverse Suffix Array. In 43rd International Symposium on Theoretical Aspects of Computer Science (STACS 2026). Leibniz International Proceedings in Informatics (LIPIcs), Volume 364, pp. 62:1-62:21, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2026)

Copy BibTex To Clipboard

@InProceedings{kulekci_et_al:LIPIcs.STACS.2026.62,
  author =	{Kulekci, Muhammed Oguzhan and Parthasarathi, Mano Prakash and Shah, Rahul and Thankachan, Sharma V.},
  title =	{{Relative Compressed Reverse Suffix Array}},
  booktitle =	{43rd International Symposium on Theoretical Aspects of Computer Science (STACS 2026)},
  pages =	{62:1--62:21},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-412-3},
  ISSN =	{1868-8969},
  year =	{2026},
  volume =	{364},
  editor =	{Mahajan, Meena and Manea, Florin and McIver, Annabelle and Thắng, Nguy\~{ê}n Kim},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.STACS.2026.62},
  URN =		{urn:nbn:de:0030-drops-255512},
  doi =		{10.4230/LIPIcs.STACS.2026.62},
  annote =	{Keywords: String Matching, Text Indexing, Data Structures, Suffix Trees}
}

Document

DOI: 10.4230/OASIcs.Manzini.11

Circular Dictionary Matching Using Extended BWT

Authors: Wing-Kai Hon, Rahul Shah, and Sharma V. Thankachan

Published in: OASIcs, Volume 131, The Expanding World of Compressed Data: A Festschrift for Giovanni Manzini's 60th Birthday (2025)

Abstract

The dictionary matching problem involves preprocessing a set of strings (patterns) into a data structure that efficiently identifies all occurrences of these patterns within a query string (text). In this work, we investigate a variation of this problem, termed circular dictionary matching, where the patterns are circular, meaning their cyclic shifts are also considered valid patterns. Such patterns naturally occur in areas such as bioinformatics and computational geometry. Based on the extended Burrows-Wheeler Transformation (eBWT), we design a space-efficient solution for this problem. Specifically, we show that a dictionary of d circular patterns of total length n can be indexed in nlog σ + O(n+dlog n+σ log n) bits of space and support circular dictionary matching on a query text T in O((|T|+occ)log n) time, where σ represents the size of the underlying alphabet and occ represents the output size.

Cite as

Wing-Kai Hon, Rahul Shah, and Sharma V. Thankachan. Circular Dictionary Matching Using Extended BWT. In The Expanding World of Compressed Data: A Festschrift for Giovanni Manzini's 60th Birthday. Open Access Series in Informatics (OASIcs), Volume 131, pp. 11:1-11:14, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2025)

Copy BibTex To Clipboard

@InProceedings{hon_et_al:OASIcs.Manzini.11,
  author =	{Hon, Wing-Kai and Shah, Rahul and Thankachan, Sharma V.},
  title =	{{Circular Dictionary Matching Using Extended BWT}},
  booktitle =	{The Expanding World of Compressed Data: A Festschrift for Giovanni Manzini's 60th Birthday},
  pages =	{11:1--11:14},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-95977-390-4},
  ISSN =	{2190-6807},
  year =	{2025},
  volume =	{131},
  editor =	{Ferragina, Paolo and Gagie, Travis and Navarro, Gonzalo},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/OASIcs.Manzini.11},
  URN =		{urn:nbn:de:0030-drops-239195},
  doi =		{10.4230/OASIcs.Manzini.11},
  annote =	{Keywords: String algorithms, Burrows-Wheeler transformation, suffix trees, succinct data structures}
}

Document

DOI: 10.4230/LIPIcs.STACS.2025.38

Two-Dimensional Longest Common Extension Queries in Compact Space

Authors: Arnab Ganguly, Daniel Gibney, Rahul Shah, and Sharma V. Thankachan

Published in: LIPIcs, Volume 327, 42nd International Symposium on Theoretical Aspects of Computer Science (STACS 2025)

Abstract

For a length n text over an alphabet of size σ, we can encode the suffix tree data structure in 𝒪(nlog σ) bits of space. It supports suffix array (SA), inverse suffix array (ISA), and longest common extension (LCE) queries in 𝒪(log^ε_σ n) time, which enables efficient pattern matching; here ε > 0 is an arbitrarily small constant. Further improvements are possible for LCE queries, where 𝒪(1) time queries can be achieved using an index of space 𝒪(nlog σ) bits. However, compactly indexing a two-dimensional text (i.e., an n× n matrix) has been a major open problem. We show progress in this direction by first presenting an 𝒪(n²log σ)-bit structure supporting LCE queries in near 𝒪((log_σ n)^{2/3}) time. We then present an 𝒪(n²log σ + n²log log n)-bit structure supporting ISA queries in near 𝒪(log n ⋅ (log_σ n)^{2/3}) time. Within a similar space, achieving SA queries in poly-logarithmic (even strongly sub-linear) time is a significant challenge. However, our 𝒪(n²log σ + n²log log n)-bit structure can support SA queries in 𝒪(n²/(σ log n)^c) time, where c is an arbitrarily large constant, which enables pattern matching in time faster than what is possible without preprocessing. We then design a repetition-aware data structure. The δ_2D compressibility measure for two-dimensional texts was recently introduced by Carfagna and Manzini [SPIRE 2023]. The measure ranges from 1 to n², with smaller δ_2D indicating a highly compressible two-dimensional text. The current data structure utilizing δ_2D allows only element access. We obtain the first structure based on δ_2D for LCE queries. It takes 𝒪^{~}(n^{5/3} + n^{8/5}δ_2D^{1/5}) space and answers queries in 𝒪(log n) time.

Cite as

Arnab Ganguly, Daniel Gibney, Rahul Shah, and Sharma V. Thankachan. Two-Dimensional Longest Common Extension Queries in Compact Space. In 42nd International Symposium on Theoretical Aspects of Computer Science (STACS 2025). Leibniz International Proceedings in Informatics (LIPIcs), Volume 327, pp. 38:1-38:17, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2025)

Copy BibTex To Clipboard

@InProceedings{ganguly_et_al:LIPIcs.STACS.2025.38,
  author =	{Ganguly, Arnab and Gibney, Daniel and Shah, Rahul and Thankachan, Sharma V.},
  title =	{{Two-Dimensional Longest Common Extension Queries in Compact Space}},
  booktitle =	{42nd International Symposium on Theoretical Aspects of Computer Science (STACS 2025)},
  pages =	{38:1--38:17},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-365-2},
  ISSN =	{1868-8969},
  year =	{2025},
  volume =	{327},
  editor =	{Beyersdorff, Olaf and Pilipczuk, Micha{\l} and Pimentel, Elaine and Thắng, Nguy\~{ê}n Kim},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.STACS.2025.38},
  URN =		{urn:nbn:de:0030-drops-228649},
  doi =		{10.4230/LIPIcs.STACS.2025.38},
  annote =	{Keywords: String matching, text indexing, two-dimensional text}
}

Document

Track A: Algorithms, Complexity and Games

DOI: 10.4230/LIPIcs.ICALP.2022.65

Fully Functional Parameterized Suffix Trees in Compact Space

Authors: Arnab Ganguly, Rahul Shah, and Sharma V. Thankachan

Published in: LIPIcs, Volume 229, 49th International Colloquium on Automata, Languages, and Programming (ICALP 2022)

Abstract

Two equal length strings are a parameterized match (p-match) iff there exists a one-to-one function that renames the symbols in one string to those in the other. The Parameterized Suffix Tree (PST) [Baker, STOC' 93] is a fundamental data structure that handles various string matching problems under this setting. The PST of a text T[1,n] over an alphabet Σ of size σ takes O(nlog n) bits of space. It can report any entry in (parameterized) (i) suffix array, (ii) inverse suffix array, and (iii) longest common prefix (LCP) array in O(1) time. Given any pattern P as a query, a position i in T is an occurrence iff T[i,i+|P|-1] and P are a p-match. The PST can count the number of occurrences of P in T in time O(|P|log σ) and then report each occurrence in time proportional to that of accessing a suffix array entry. An important question is, can we obtain a compressed version of PST that takes space close to the text’s size of nlogσ bits and still support all three functionalities mentioned earlier? In SODA' 17, Ganguly et al. answered this question partially by presenting an O(nlogσ) bit index that can support (parameterized) suffix array and inverse suffix array operations in O(log n) time. However, the compression of the (parameterized) LCP array and the possibility of faster suffix array and inverse suffix array queries in compact space were left open. In this work, we obtain a compact representation of the (parameterized) LCP array. With this result, in conjunction with three new (parameterized) suffix array representations, we obtain the first set of PST representations in o(nlog n) bits (when logσ = o(log n)) as follows. Here ε > 0 is an arbitrarily small constant. - Space O(n logσ) bits and query time O(log_σ^ε n); - Space O(n logσlog log_σ n) bits and query time O(log log_σ n); and - Space O(n logσ log^ε_σ n) bits and query time O(1). The first trade-off is an improvement over Ganguly et al.’s result, whereas our third trade-off matches the optimal time performance of Baker’s PST while squeezing the space by a factor roughly log_σ n. We highlight that our trade-offs match the space-and-time bounds of the best-known compressed text indexes for exact pattern matching and further improvement is highly unlikely.

Cite as

Arnab Ganguly, Rahul Shah, and Sharma V. Thankachan. Fully Functional Parameterized Suffix Trees in Compact Space. In 49th International Colloquium on Automata, Languages, and Programming (ICALP 2022). Leibniz International Proceedings in Informatics (LIPIcs), Volume 229, pp. 65:1-65:18, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2022)

Copy BibTex To Clipboard

@InProceedings{ganguly_et_al:LIPIcs.ICALP.2022.65,
  author =	{Ganguly, Arnab and Shah, Rahul and Thankachan, Sharma V.},
  title =	{{Fully Functional Parameterized Suffix Trees in Compact Space}},
  booktitle =	{49th International Colloquium on Automata, Languages, and Programming (ICALP 2022)},
  pages =	{65:1--65:18},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-235-8},
  ISSN =	{1868-8969},
  year =	{2022},
  volume =	{229},
  editor =	{Boja\'{n}czyk, Miko{\l}aj and Merelli, Emanuela and Woodruff, David P.},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICALP.2022.65},
  URN =		{urn:nbn:de:0030-drops-164061},
  doi =		{10.4230/LIPIcs.ICALP.2022.65},
  annote =	{Keywords: Data Structures, Suffix Trees, String Algorithms, Compression}
}

Document

DOI: 10.4230/LIPIcs.ISAAC.2021.60

Inverse Suffix Array Queries for 2-Dimensional Pattern Matching in Near-Compact Space

Authors: Dhrumil Patel and Rahul Shah

Published in: LIPIcs, Volume 212, 32nd International Symposium on Algorithms and Computation (ISAAC 2021)

Abstract

In a 2-dimensional (2D) pattern matching problem, the text is arranged as a matrix 𝖬[1..n, 1..n] and consists of N = n × n symbols drawn from alphabet set Σ of size σ. The query consists of a m × m square matrix 𝖯[1..m, 1..m] drawn from the same alphabet set Σ and the task is to find all the locations in 𝖬 where 𝖯 appears as a (contiguous) submatrix. The patterns can be of any size, but as long as they are square in shape data structures like suffix trees and suffix array exist [Raffaele Giancarlo, 1995; Dong Kyue Kim et al., 1998] for the task of efficient pattern matching. These are essentially 2D counterparts of classic suffix trees and arrays known for traditional 1-dimensional (1D) pattern matching. They work based on linearization of 2D suffixes which would preserve the prefix match property (i.e., every pattern match is a prefix of some suffix). The main limitation of the suffix trees and the suffix arrays (in 1D) was their space utilization of O(N log N) bits, where N is the size of the text. This was suboptimal compared to Nlog σ bits of space, which is information theoretic optimal for the text. With the advent of the field of succinct/compressed data structures, it was possible to develop compressed variants of suffix trees and array based on Burrows-Wheeler Tansform and LF-mapping (or Φ function) [Roberto Grossi and Jeffrey Scott Vitter, 2005; Paolo Ferragina and Giovanni Manzini, 2005; Kunihiko Sadakane, 2007]. These data structures indeed achieve O(N log σ) bits of space or better. This gives rise to the question: analogous to 1D case, can we design a succinct or compressed index for 2D pattern matching? Can there be a 2D compressed suffix tree? Are there analogues of Burrows-Wheeler Transform or LF-mapping? The problem has been acknowledged for over a decade now and there have been a few attempts at applying Φ function [Ankur Gupta, 2004] and achieving entropy based compression [Veli Mäkinen and Gonzalo Navarro, 2008]. However, achieving the complexity breakthrough akin to 1D case has yet to be found. In this paper, we still do not know how to answer suffix array queries in O(N log σ) bits of space - which would have led to efficient pattern matching. However, for the first time, we show an interesting result that it is indeed possible to compute inverse suffix array (ISA) queries in near compact space in O(polylog n) time. Our 2D succinct text index design is based on two 1D compressed suffix trees and it takes O(N log log N + N logσ) bits of space which is much smaller than its naive design that takes O(N log N) bits. Although the main problem is still evasive, this index gives a hope on the existence of a full 2D succinct index with all functionalities similar to that of 1D case.

Cite as

Dhrumil Patel and Rahul Shah. Inverse Suffix Array Queries for 2-Dimensional Pattern Matching in Near-Compact Space. In 32nd International Symposium on Algorithms and Computation (ISAAC 2021). Leibniz International Proceedings in Informatics (LIPIcs), Volume 212, pp. 60:1-60:14, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2021)

Copy BibTex To Clipboard

@InProceedings{patel_et_al:LIPIcs.ISAAC.2021.60,
  author =	{Patel, Dhrumil and Shah, Rahul},
  title =	{{Inverse Suffix Array Queries for 2-Dimensional Pattern Matching in Near-Compact Space}},
  booktitle =	{32nd International Symposium on Algorithms and Computation (ISAAC 2021)},
  pages =	{60:1--60:14},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-214-3},
  ISSN =	{1868-8969},
  year =	{2021},
  volume =	{212},
  editor =	{Ahn, Hee-Kap and Sadakane, Kunihiko},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ISAAC.2021.60},
  URN =		{urn:nbn:de:0030-drops-154932},
  doi =		{10.4230/LIPIcs.ISAAC.2021.60},
  annote =	{Keywords: Pattern Matching, Succinct Data Structures}
}

Document

Track A: Algorithms, Complexity and Games

DOI: 10.4230/LIPIcs.ICALP.2021.71

LF Successor: Compact Space Indexing for Order-Isomorphic Pattern Matching

Authors: Arnab Ganguly, Dhrumil Patel, Rahul Shah, and Sharma V. Thankachan

Published in: LIPIcs, Volume 198, 48th International Colloquium on Automata, Languages, and Programming (ICALP 2021)

Abstract

Two strings are order isomorphic iff the relative ordering of their characters is the same at all positions. For a given text T[1,n] over an ordered alphabet of size σ, we can maintain an order-isomorphic suffix tree/array in O(nlog n) bits and support (order-isomorphic) pattern/substring matching queries efficiently. It is interesting to know if we can encode these structures in space close to the text’s size of nlogσ bits. We answer this question positively by presenting an O(nlog σ)-bit index that allows access to any entry in order-isomorphic suffix array (and its inverse array) in t_{SA} = {O}(log²n/logσ) time. For any pattern P given as a query, this index can count the number of substrings of T that are order-isomorphic to P (denoted by occ) in {O}((|P|logσ+t_{SA})log n) time using standard techniques. Also, it can report the locations of those substrings in additional O(occ ⋅ t_{SA}) time.

Cite as

Arnab Ganguly, Dhrumil Patel, Rahul Shah, and Sharma V. Thankachan. LF Successor: Compact Space Indexing for Order-Isomorphic Pattern Matching. In 48th International Colloquium on Automata, Languages, and Programming (ICALP 2021). Leibniz International Proceedings in Informatics (LIPIcs), Volume 198, pp. 71:1-71:19, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2021)

Copy BibTex To Clipboard

@InProceedings{ganguly_et_al:LIPIcs.ICALP.2021.71,
  author =	{Ganguly, Arnab and Patel, Dhrumil and Shah, Rahul and Thankachan, Sharma V.},
  title =	{{LF Successor: Compact Space Indexing for Order-Isomorphic Pattern Matching}},
  booktitle =	{48th International Colloquium on Automata, Languages, and Programming (ICALP 2021)},
  pages =	{71:1--71:19},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-195-5},
  ISSN =	{1868-8969},
  year =	{2021},
  volume =	{198},
  editor =	{Bansal, Nikhil and Merelli, Emanuela and Worrell, James},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICALP.2021.71},
  URN =		{urn:nbn:de:0030-drops-141400},
  doi =		{10.4230/LIPIcs.ICALP.2021.71},
  annote =	{Keywords: Succinct data structures, Pattern Matching}
}

Document

DOI: 10.4230/LIPIcs.ICDT.2019.9

Categorical Range Reporting with Frequencies

Authors: Arnab Ganguly, J. Ian Munro, Yakov Nekrich, Rahul Shah, and Sharma V. Thankachan

Published in: LIPIcs, Volume 127, 22nd International Conference on Database Theory (ICDT 2019)

Abstract

In this paper, we consider a variant of the color range reporting problem called color reporting with frequencies. Our goal is to pre-process a set of colored points into a data structure, so that given a query range Q, we can report all colors that appear in Q, along with their respective frequencies. In other words, for each reported color, we also output the number of times it occurs in Q. We describe an external-memory data structure that uses O(N(1+log^2D/log N)) words and answers one-dimensional queries in O(1 +K/B) I/Os, where N is the total number of points in the data structure, D is the total number of colors in the data structure, K is the number of reported colors, and B is the block size. Next we turn to an approximate version of this problem: report all colors sigma that appear in the query range; for every reported color, we provide a constant-factor approximation on its frequency. We consider color reporting with approximate frequencies in two dimensions. Our data structure uses O(N) space and answers two-dimensional queries in O(log_B N +log^*B + K/B) I/Os in the special case when the query range is bounded on two sides. As a corollary, we can also answer one-dimensional approximate queries within the same time and space bounds.

Cite as

Arnab Ganguly, J. Ian Munro, Yakov Nekrich, Rahul Shah, and Sharma V. Thankachan. Categorical Range Reporting with Frequencies. In 22nd International Conference on Database Theory (ICDT 2019). Leibniz International Proceedings in Informatics (LIPIcs), Volume 127, pp. 9:1-9:19, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2019)

Copy BibTex To Clipboard

@InProceedings{ganguly_et_al:LIPIcs.ICDT.2019.9,
  author =	{Ganguly, Arnab and Munro, J. Ian and Nekrich, Yakov and Shah, Rahul and Thankachan, Sharma V.},
  title =	{{Categorical Range Reporting with Frequencies}},
  booktitle =	{22nd International Conference on Database Theory (ICDT 2019)},
  pages =	{9:1--9:19},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-101-6},
  ISSN =	{1868-8969},
  year =	{2019},
  volume =	{127},
  editor =	{Barcelo, Pablo and Calautti, Marco},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2019.9},
  URN =		{urn:nbn:de:0030-drops-103115},
  doi =		{10.4230/LIPIcs.ICDT.2019.9},
  annote =	{Keywords: Data Structures, Range Reporting, Range Counting, Categorical Range Reporting, Orthogonal Range Query}
}

Document

DOI: 10.4230/LIPIcs.ISAAC.2017.35

Structural Pattern Matching - Succinctly

Authors: Arnab Ganguly, Rahul Shah, and Sharma V. Thankachan

Published in: LIPIcs, Volume 92, 28th International Symposium on Algorithms and Computation (ISAAC 2017)

Abstract

Let T be a text of length n containing characters from an alphabet \Sigma, which is the union of two disjoint sets: \Sigma_s containing static characters (s-characters) and \Sigma_p containing parameterized characters (p-characters). Each character in \Sigma_p has an associated complementary character from \Sigma_p. A pattern P (also over \Sigma) matches an equal-length substring $S$ of T iff the s-characters match exactly, there exists a one-to-one function that renames the p-characters in S to the p-characters in P, and if a p-character x is renamed to another p-character y then the complement of x is renamed to the complement of y. The task is to find the starting positions (occurrences) of all such substrings S. Previous indexing solution [Shibuya, SWAT 2000], known as Structural Suffix Tree, requires \Theta(n\log n) bits of space, and can find all occ occurrences in time O(|P|\log \sigma+ occ), where \sigma = |\Sigma|. In this paper, we present the first succinct index for this problem, which occupies n \log \sigma + O(n) bits and offers O(|P|\log\sigma+ occ\cdot \log n \log\sigma) query time.

Cite as

Arnab Ganguly, Rahul Shah, and Sharma V. Thankachan. Structural Pattern Matching - Succinctly. In 28th International Symposium on Algorithms and Computation (ISAAC 2017). Leibniz International Proceedings in Informatics (LIPIcs), Volume 92, pp. 35:1-35:13, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2017)

Copy BibTex To Clipboard

@InProceedings{ganguly_et_al:LIPIcs.ISAAC.2017.35,
  author =	{Ganguly, Arnab and Shah, Rahul and Thankachan, Sharma V.},
  title =	{{Structural Pattern Matching - Succinctly}},
  booktitle =	{28th International Symposium on Algorithms and Computation (ISAAC 2017)},
  pages =	{35:1--35:13},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-054-5},
  ISSN =	{1868-8969},
  year =	{2017},
  volume =	{92},
  editor =	{Okamoto, Yoshio and Tokuyama, Takeshi},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ISAAC.2017.35},
  URN =		{urn:nbn:de:0030-drops-82566},
  doi =		{10.4230/LIPIcs.ISAAC.2017.35},
  annote =	{Keywords: Parameterized Pattern Matching, Suffix tree, Burrows-Wheeler Transform, Wavelet Tree, Fully-functional succinct tree}
}

Document

DOI: 10.4230/LIPIcs.ISAAC.2016.34

Space-Time Trade-Offs for the Shortest Unique Substring Problem

Authors: Arnab Ganguly, Wing-Kai Hon, Rahul Shah, and Sharma V. Thankachan

Published in: LIPIcs, Volume 64, 27th International Symposium on Algorithms and Computation (ISAAC 2016)

Abstract

Given a string X[1, n] and a position k in [1, n], the Shortest Unique Substring of X covering k, denoted by S_k, is a substring X[i, j] of X which satisfies the following conditions: (i) i leq k leq j, (ii) i is the only position where there is an occurrence of X[i, j], and (iii) j - i is minimized. The best-known algorithm [Hon et al., ISAAC 2015] can find S k for all k in [1, n] in time O(n) using the string X and additional 2n words of working space. Let tau be a given parameter. We present the following new results. For any given k in [1, n], we can compute S_k via a deterministic algorithm in O(n tau^2 log n tau) time using X and additional O(n/tau) words of working space. For every k in [1, n], we can compute S_k via a deterministic algorithm in O(n tau^2 log n/tau) time using X and additional O(n/tau) words and 4n + o(n) bits of working space. For both problems above, we present an O(n tau log^{c+1} n)-time randomized algorithm that uses n/ log c n words in addition to that mentioned above, where c geq 0 is an arbitrary constant. In this case, the reported string is unique and covers k, but with probability at most n^{-O(1)} , may not be the shortest. As a consequence of our techniques, we also obtain similar space-and-time tradeoffs for a related problem of finding Maximal Unique Matches of two strings [Delcher et al., Nucleic Acids Res. 1999].

Cite as

Arnab Ganguly, Wing-Kai Hon, Rahul Shah, and Sharma V. Thankachan. Space-Time Trade-Offs for the Shortest Unique Substring Problem. In 27th International Symposium on Algorithms and Computation (ISAAC 2016). Leibniz International Proceedings in Informatics (LIPIcs), Volume 64, pp. 34:1-34:13, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2016)

Copy BibTex To Clipboard

@InProceedings{ganguly_et_al:LIPIcs.ISAAC.2016.34,
  author =	{Ganguly, Arnab and Hon, Wing-Kai and Shah, Rahul and Thankachan, Sharma V.},
  title =	{{Space-Time Trade-Offs for the Shortest Unique Substring Problem}},
  booktitle =	{27th International Symposium on Algorithms and Computation (ISAAC 2016)},
  pages =	{34:1--34:13},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-026-2},
  ISSN =	{1868-8969},
  year =	{2016},
  volume =	{64},
  editor =	{Hong, Seok-Hee},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ISAAC.2016.34},
  URN =		{urn:nbn:de:0030-drops-68041},
  doi =		{10.4230/LIPIcs.ISAAC.2016.34},
  annote =	{Keywords: Suffix Tree, Sparsification, Rabin-Karp Fingerprint, Probabilistic z-Fast Trie, Succinct Data-Structures}
}

Document

DOI: 10.4230/LIPIcs.CPM.2016.2

Space-Efficient Dictionaries for Parameterized and Order-Preserving Pattern Matching

Authors: Arnab Ganguly, Wing-Kai Hon, Kunihiko Sadakane, Rahul Shah, Sharma V. Thankachan, and Yilin Yang

Published in: LIPIcs, Volume 54, 27th Annual Symposium on Combinatorial Pattern Matching (CPM 2016)

Abstract

Let S and S' be two strings of the same length.We consider the following two variants of string matching. * Parameterized Matching: The characters of S and S' are partitioned into static characters and parameterized characters. The strings are parameterized match iff the static characters match exactly and there exists a one-to-one function which renames the parameterized characters in S to those in S'. * Order-Preserving Matching: The strings are order-preserving match iff for any two integers i,j in [1,|S|], S[i] <= S[j] iff S'[i] <= S'[j]. Let P be a collection of d patterns {P_1, P_2, ..., P_d} of total length n characters, which are chosen from an alphabet Sigma. Given a text T, also over Sigma, we consider the dictionary indexing problem under the above definitions of string matching. Specifically, the task is to index P, such that we can report all positions j where at least one of the patterns P_i in P is a parameterized-match (resp. order-preserving match) with the same-length substring of $T$ starting at j. Previous best-known indexes occupy O(n * log(n)) bits and can report all occ positions in O(|T| * log(|Sigma|) + occ) time. We present space-efficient indexes that occupy O(n * log(|Sigma|+d) * log(n)) bits and reports all occ positions in O(|T| * (log(|Sigma|) + log_{|Sigma|}(n)) + occ) time for parameterized matching and in O(|T| * log(n) + occ) time for order-preserving matching.

Cite as

Arnab Ganguly, Wing-Kai Hon, Kunihiko Sadakane, Rahul Shah, Sharma V. Thankachan, and Yilin Yang. Space-Efficient Dictionaries for Parameterized and Order-Preserving Pattern Matching. In 27th Annual Symposium on Combinatorial Pattern Matching (CPM 2016). Leibniz International Proceedings in Informatics (LIPIcs), Volume 54, pp. 2:1-2:12, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2016)

Copy BibTex To Clipboard

@InProceedings{ganguly_et_al:LIPIcs.CPM.2016.2,
  author =	{Ganguly, Arnab and Hon, Wing-Kai and Sadakane, Kunihiko and Shah, Rahul and Thankachan, Sharma V. and Yang, Yilin},
  title =	{{Space-Efficient Dictionaries for Parameterized and Order-Preserving Pattern Matching}},
  booktitle =	{27th Annual Symposium on Combinatorial Pattern Matching (CPM 2016)},
  pages =	{2:1--2:12},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-012-5},
  ISSN =	{1868-8969},
  year =	{2016},
  volume =	{54},
  editor =	{Grossi, Roberto and Lewenstein, Moshe},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2016.2},
  URN =		{urn:nbn:de:0030-drops-60736},
  doi =		{10.4230/LIPIcs.CPM.2016.2},
  annote =	{Keywords: Parameterized Matching, Order-preserving Matching, Dictionary Indexing, Aho-Corasick Automaton, Sparsification}
}

Document

DOI: 10.4230/LIPIcs.SWAT.2016.10

A Framework for Dynamic Parameterized Dictionary Matching

Authors: Arnab Ganguly, Wing-Kai Hon, and Rahul Shah

Published in: LIPIcs, Volume 53, 15th Scandinavian Symposium and Workshops on Algorithm Theory (SWAT 2016)

Abstract

Two equal-length strings S and S' are a parameterized-match (p-match) iff there exists a one-to-one function that renames the characters in S to those in S'. Let P be a collection of d patterns of total length n characters that are chosen from an alphabet Sigma of cardinality sigma. The task is to index P such that we can support the following operations. * search(T): given a text T, report all occurrences <j,P_i> such that there exists a pattern P_i in P that is a p-match with the substring T[j,j+|P_i|-1]. * ins(P_i)/del(P_i): modify the index when a pattern P_i is inserted/deleted. We present a linear-space index that occupies O(n*log n) bits and supports (i) search(T) in worst-case O(|T|*log^2 n + occ) time, where occ is the number of occurrences reported, and (ii) ins(P_i) and del(P_i) in amortized O(|P_i|*polylog(n)) time. Then, we present a succinct index that occupies (1+o(1))n*log sigma + O(d*log n) bits and supports (i) search(T) in worst-case O(|T|*log^2 n + occ) time, and (ii) ins(P_i) and del(P_i) in amortized O(|P_i|*polylog(n)) time. We also present results related to the semi-dynamic variant of the problem, where deletion is not allowed.

Cite as

Arnab Ganguly, Wing-Kai Hon, and Rahul Shah. A Framework for Dynamic Parameterized Dictionary Matching. In 15th Scandinavian Symposium and Workshops on Algorithm Theory (SWAT 2016). Leibniz International Proceedings in Informatics (LIPIcs), Volume 53, pp. 10:1-10:14, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2016)

Copy BibTex To Clipboard

@InProceedings{ganguly_et_al:LIPIcs.SWAT.2016.10,
  author =	{Ganguly, Arnab and Hon, Wing-Kai and Shah, Rahul},
  title =	{{A Framework for Dynamic Parameterized Dictionary Matching}},
  booktitle =	{15th Scandinavian Symposium and Workshops on Algorithm Theory (SWAT 2016)},
  pages =	{10:1--10:14},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-011-8},
  ISSN =	{1868-8969},
  year =	{2016},
  volume =	{53},
  editor =	{Pagh, Rasmus},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.SWAT.2016.10},
  URN =		{urn:nbn:de:0030-drops-60256},
  doi =		{10.4230/LIPIcs.SWAT.2016.10},
  annote =	{Keywords: Parameterized Dictionary Indexing, Generalized Suffix Tree, Succinct Data Structures, Sparsification}
}

Document

DOI: 10.4230/LIPIcs.FSTTCS.2015.320

Forbidden Extension Queries

Authors: Sudip Biswas, Arnab Ganguly, Rahul Shah, and Sharma V. Thankachan

Published in: LIPIcs, Volume 45, 35th IARCS Annual Conference on Foundations of Software Technology and Theoretical Computer Science (FSTTCS 2015)

Abstract

Document retrieval is one of the most fundamental problem in information retrieval. The objective is to retrieve all documents from a document collection that are relevant to an input pattern. Several variations of this problem such as ranked document retrieval, document listing with two patterns and forbidden patterns have been studied. We introduce the problem of document retrieval with forbidden extensions. Let D={T_1,T_2,...,T_D} be a collection of D string documents of n characters in total, and P^+ and P^- be two query patterns, where P^+ is a proper prefix of P^-. We call P^- as the forbidden extension of the included pattern P^+. A forbidden extension query < P^+,P^- > asks to report all occ documents in D that contains P^+ as a substring, but does not contain P^- as one. A top-k forbidden extension query < P^+,P^-,k > asks to report those k documents among the occ documents that are most relevant to P^+. We present a linear index (in words) with an O(|P^-| + occ) query time for the document listing problem. For the top-k version of the problem, we achieve the following results, when the relevance of a document is based on PageRank: - an O(n) space (in words) index with O(|P^-|log sigma+ k) query time, where sigma is the size of the alphabet from which characters in D are chosen. For constant alphabets, this yields an optimal query time of O(|P^-|+ k). - for any constant epsilon > 0, a |CSA| + |CSA^*| + Dlog frac{n}{D} + O(n) bits index with O(search(P)+ k cdot tsa cdot log ^{2+epsilon} n) query time, where search(P) is the time to find the suffix range of a pattern P, tsa is the time to find suffix (or inverse suffix) array value, and |CSA^*| denotes the maximum of the space needed to store the compressed suffix array CSA of the concatenated text of all documents, or the total space needed to store the individual CSA of each document.

Cite as

Sudip Biswas, Arnab Ganguly, Rahul Shah, and Sharma V. Thankachan. Forbidden Extension Queries. In 35th IARCS Annual Conference on Foundations of Software Technology and Theoretical Computer Science (FSTTCS 2015). Leibniz International Proceedings in Informatics (LIPIcs), Volume 45, pp. 320-335, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2015)

Copy BibTex To Clipboard

@InProceedings{biswas_et_al:LIPIcs.FSTTCS.2015.320,
  author =	{Biswas, Sudip and Ganguly, Arnab and Shah, Rahul and Thankachan, Sharma V.},
  title =	{{Forbidden Extension Queries}},
  booktitle =	{35th IARCS Annual Conference on Foundations of Software Technology and Theoretical Computer Science (FSTTCS 2015)},
  pages =	{320--335},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-939897-97-2},
  ISSN =	{1868-8969},
  year =	{2015},
  volume =	{45},
  editor =	{Harsha, Prahladh and Ramalingam, G.},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.FSTTCS.2015.320},
  URN =		{urn:nbn:de:0030-drops-56522},
  doi =		{10.4230/LIPIcs.FSTTCS.2015.320},
  annote =	{Keywords: document retrieval, suffix trees, range queries, succinct data structure}
}

Document

DOI: 10.4230/LIPIcs.ICDT.2015.277

Shared-Constraint Range Reporting

Authors: Sudip Biswas, Manish Patil, Rahul Shah, and Sharma V. Thankachan

Published in: LIPIcs, Volume 31, 18th International Conference on Database Theory (ICDT 2015)

Abstract

Orthogonal range reporting is one of the classic and most fundamental data structure problems. (2,1,1) query is a 3 dimensional query with two-sided constraint on the first dimension and one sided constraint on each of the 2nd and 3rd dimension. Given a set of N points in three dimension, a particular formulation of such a (2,1,1) query (known as four-sided range reporting in three-dimension) asks to report all those K points within a query region [a, b]X(-infinity, c]X[d, infinity). These queries have overall 4 constraints. In Word-RAM model, the best known structure capable of answering such queries with optimal query time takes O(N log^{epsilon} N) space, where epsilon>0 is any positive constant. It has been shown that any external memory structure in optimal I/Os must use Omega(N log N/ log log_B N) space (in words), where B is the block size [Arge et al., PODS 1999]. In this paper, we study a special type of (2,1,1) queries, where the query parameters a and c are the same i.e., a=c. Even though the query is still four-sided, the number of independent constraints is only three. In other words, one constraint is shared. We call this as a Shared-Constraint Range Reporting (SCRR) problem. We study this problem in both internal as well as external memory models. In RAM model where coordinates can only be compared, we achieve linear-space and O(log N+K) query time solution, matching the best-known three dimensional dominance query bound. Whereas in external memory, we present a linear space structure with O(log_B N + log log N + K/B) query I/Os. We also present an I/O-optimal (i.e., O(log_B N+K/B) I/Os) data structure which occupies O(N log log N)-word space. We achieve these results by employing a novel divide and conquer approach. SCRR finds application in database queries containing sharing among the constraints. We also show that SCRR queries naturally arise in many well known problems such as top-k color reporting, range skyline reporting and ranked document retrieval.

Cite as

Sudip Biswas, Manish Patil, Rahul Shah, and Sharma V. Thankachan. Shared-Constraint Range Reporting. In 18th International Conference on Database Theory (ICDT 2015). Leibniz International Proceedings in Informatics (LIPIcs), Volume 31, pp. 277-290, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2015)

Copy BibTex To Clipboard

@InProceedings{biswas_et_al:LIPIcs.ICDT.2015.277,
  author =	{Biswas, Sudip and Patil, Manish and Shah, Rahul and Thankachan, Sharma V.},
  title =	{{Shared-Constraint Range Reporting}},
  booktitle =	{18th International Conference on Database Theory (ICDT 2015)},
  pages =	{277--290},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-939897-79-8},
  ISSN =	{1868-8969},
  year =	{2015},
  volume =	{31},
  editor =	{Arenas, Marcelo and Ugarte, Mart{\'\i}n},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2015.277},
  URN =		{urn:nbn:de:0030-drops-49900},
  doi =		{10.4230/LIPIcs.ICDT.2015.277},
  annote =	{Keywords: data structure, shared constraint, multi-slab, point partitioning}
}

Search Results

Documents authored by Shah, Rahul

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as

Thanks for your feedback!

Could not send message