DROPS

Document

DOI: 10.4230/LIPIcs.STACS.2026.62

Relative Compressed Reverse Suffix Array

Authors: Muhammed Oguzhan Kulekci, Mano Prakash Parthasarathi, Rahul Shah, and Sharma V. Thankachan

Published in: LIPIcs, Volume 364, 43rd International Symposium on Theoretical Aspects of Computer Science (STACS 2026)

Abstract

Suffix trees and suffix arrays are two fundamental data structures in the field of string algorithms. For a string (a.k.a. text or sequence) of length n over an alphabet of size σ, these structures typically require O(nlog n) bits of space. The FM-index provides a compressed representation of the suffix array in ≈ nlog σ bits, allowing for efficient queries on both the suffix array and its inverse array in near logarithmic time. In certain applications, such as approximate pattern matching (i.e., with wildcards, mismatches, edits), there is a need to access the suffix array of a text, as well as the suffix array of text’s reverse. Motivated by this, we explore the possibility of encoding the suffix array of the reversed text in a compact form, assuming the availability of the FM-index for the original text. Our first solution is an O(n)-bit (relative) encoding of the suffix array of the reversed text, with the time for decoding an entry being only O(log^*n) times that of decoding an entry in the text’s suffix array using FM-index. We then demonstrate how to reduce the space to O(n/κ) bits for a parameter κ, while multiplicative factor in time becomes approximately O(κlog^*n+κ³). We can also support inverse suffix array and longest common extension queries on the reversed text. These results are achieved through some careful and non-trivial application of various succinct data structure techniques.

Cite as

Muhammed Oguzhan Kulekci, Mano Prakash Parthasarathi, Rahul Shah, and Sharma V. Thankachan. Relative Compressed Reverse Suffix Array. In 43rd International Symposium on Theoretical Aspects of Computer Science (STACS 2026). Leibniz International Proceedings in Informatics (LIPIcs), Volume 364, pp. 62:1-62:21, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2026)

Copy BibTex To Clipboard

@InProceedings{kulekci_et_al:LIPIcs.STACS.2026.62,
  author =	{Kulekci, Muhammed Oguzhan and Parthasarathi, Mano Prakash and Shah, Rahul and Thankachan, Sharma V.},
  title =	{{Relative Compressed Reverse Suffix Array}},
  booktitle =	{43rd International Symposium on Theoretical Aspects of Computer Science (STACS 2026)},
  pages =	{62:1--62:21},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-412-3},
  ISSN =	{1868-8969},
  year =	{2026},
  volume =	{364},
  editor =	{Mahajan, Meena and Manea, Florin and McIver, Annabelle and Thắng, Nguy\~{ê}n Kim},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.STACS.2026.62},
  URN =		{urn:nbn:de:0030-drops-255512},
  doi =		{10.4230/LIPIcs.STACS.2026.62},
  annote =	{Keywords: String Matching, Text Indexing, Data Structures, Suffix Trees}
}

Document

Research

DOI: 10.4230/OASIcs.Grossi.16

Faster Range LCP Queries in Linear Space

Authors: Yakov Nekirch and Sharma V. Thankachan

Published in: OASIcs, Volume 132, From Strings to Graphs, and Back Again: A Festschrift for Roberto Grossi's 60th Birthday (2025)

Abstract

A range LCP query rlcp(α,β) on a text T[1 .. n] asks to return the length of the longest common prefix of any two suffixes of T with starting positions in a range [α,β]. In this paper we describe a data structure that uses O(n) space and supports range LCP queries in time O(log^ε n) for any constant ε > 0. Our result is the fastest currently known linear-space solution for this problem.

Cite as

Yakov Nekirch and Sharma V. Thankachan. Faster Range LCP Queries in Linear Space. In From Strings to Graphs, and Back Again: A Festschrift for Roberto Grossi's 60th Birthday. Open Access Series in Informatics (OASIcs), Volume 132, pp. 16:1-16:6, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2025)

Copy BibTex To Clipboard

@InProceedings{nekirch_et_al:OASIcs.Grossi.16,
  author =	{Nekirch, Yakov and Thankachan, Sharma V.},
  title =	{{Faster Range LCP Queries in Linear Space}},
  booktitle =	{From Strings to Graphs, and Back Again: A Festschrift for Roberto Grossi's 60th Birthday},
  pages =	{16:1--16:6},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-95977-391-1},
  ISSN =	{2190-6807},
  year =	{2025},
  volume =	{132},
  editor =	{Conte, Alessio and Marino, Andrea and Rosone, Giovanna and Vitter, Jeffrey Scott},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/OASIcs.Grossi.16},
  URN =		{urn:nbn:de:0030-drops-238158},
  doi =		{10.4230/OASIcs.Grossi.16},
  annote =	{Keywords: Data Structures, String Algorithms, Longest Common Prefix}
}

Document

Track A: Algorithms, Complexity and Games

DOI: 10.4230/LIPIcs.ICALP.2025.88

Repetition Aware Text Indexing for Matching Patterns with Wildcards

Authors: Daniel Gibney, Jackson Huffstutler, Mano Prakash Parthasarathi, and Sharma V. Thankachan

Published in: LIPIcs, Volume 334, 52nd International Colloquium on Automata, Languages, and Programming (ICALP 2025)

Abstract

We study the problem of indexing a text T[1..n] to support pattern matching with wildcards. The input of a query is a pattern P[1..m] containing h ∈ [0, k] wildcard (a.k.a. don't care) characters and the output is the set of occurrences of P in T (i.e., starting positions of substrings of T that matches P), where k = o(log n) is fixed at index construction. A classic solution by Cole et al. [STOC 2004] provides an index with space complexity O(n ⋅ (clog n)^k/k!)) and query time O(m+2^h log log n+occ), where c > 1 is a constant, and occ denotes the number of occurrences of P in T. We introduce a new data structure that significantly reduces space usage for highly repetitive texts while maintaining efficient query processing. Its space (in words) and query time are as follows: O(δ log (n/δ)⋅ c^k (1+(log^k (δ log n))/k!)) and O((m+2^h +occ)log n)) The parameter δ, known as substring complexity, is a recently introduced measure of repetitiveness that serves as a unifying and lower-bounding metric for several popular measures, including the number of phrases in the LZ77 factorization (denoted by z) and the number of runs in the Burrows-Wheeler Transform (denoted by r). Moreover, O(δ log (n/δ)) represents the optimal space required to encode the data in terms of n and δ, helping us see how close our space is to the minimum required. In another trade-off, we match the query time of Cole et al.’s index using O(n+δ log (n/δ) ⋅ (clogδ)^{k+ε}/k!) space, where ε > 0 is an arbitrarily small constant. We also demonstrate how these techniques can be applied to a more general indexing problem, where the query pattern includes k-gaps (a gap can be interpreted as a contiguous sequence of wildcard characters).

Cite as

Daniel Gibney, Jackson Huffstutler, Mano Prakash Parthasarathi, and Sharma V. Thankachan. Repetition Aware Text Indexing for Matching Patterns with Wildcards. In 52nd International Colloquium on Automata, Languages, and Programming (ICALP 2025). Leibniz International Proceedings in Informatics (LIPIcs), Volume 334, pp. 88:1-88:20, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2025)

Copy BibTex To Clipboard

@InProceedings{gibney_et_al:LIPIcs.ICALP.2025.88,
  author =	{Gibney, Daniel and Huffstutler, Jackson and Parthasarathi, Mano Prakash and Thankachan, Sharma V.},
  title =	{{Repetition Aware Text Indexing for Matching Patterns with Wildcards}},
  booktitle =	{52nd International Colloquium on Automata, Languages, and Programming (ICALP 2025)},
  pages =	{88:1--88:20},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-372-0},
  ISSN =	{1868-8969},
  year =	{2025},
  volume =	{334},
  editor =	{Censor-Hillel, Keren and Grandoni, Fabrizio and Ouaknine, Jo\"{e}l and Puppis, Gabriele},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICALP.2025.88},
  URN =		{urn:nbn:de:0030-drops-234656},
  doi =		{10.4230/LIPIcs.ICALP.2025.88},
  annote =	{Keywords: Pattern Matching, Text Indexing, Wildcard Matching}
}

Document

DOI: 10.4230/LIPIcs.CPM.2025.24

Sorted Consecutive Occurrence Queries in Substrings

Authors: Waseem Akram and Takuya Mieno

Published in: LIPIcs, Volume 331, 36th Annual Symposium on Combinatorial Pattern Matching (CPM 2025)

Abstract

The string indexing problem is a fundamental computational problem with numerous applications, including information retrieval and bioinformatics. It aims to efficiently solve the pattern matching problem: given a text T of length n for preprocessing and a pattern P of length m as a query, the goal is to report all occurrences of P as substrings of T. Navarro and Thankachan [CPM 2015, Theor. Comput. Sci. 2016] introduced a variant of this problem called the gap-bounded consecutive occurrence query, which reports pairs of consecutive occurrences of P in T such that their gaps (i.e., the distances between them) lie within a query-specified range [g₁, g₂]. Recently, Bille et al. [FSTTCS 2020, Theor. Comput. Sci. 2022] proposed the top-k close consecutive occurrence query, which reports the k closest consecutive occurrences of P in T, sorted in non-decreasing order of distance. Both problems are optimally solved in query time with O(n log n)-space data structures. In this paper, we generalize these problems to the range query model, which focuses only on occurrences of P in a specified substring T[a.. b] of T. Our contributions are as follows: - We propose an O(n log² n)-space data structure that answers the range top-k consecutive occurrence query in O(|P| + log log n + k) time. - We propose an O(n log^{2+ε} n)-space data structure that answers the range gap-bounded consecutive occurrence query in O(|P| + log log n + output) time, where ε is a positive constant and output denotes the number of outputs. Additionally, as by-products, we present algorithms for geometric problems involving weighted horizontal segments in a 2D plane, which are of independent interest.

Cite as

Waseem Akram and Takuya Mieno. Sorted Consecutive Occurrence Queries in Substrings. In 36th Annual Symposium on Combinatorial Pattern Matching (CPM 2025). Leibniz International Proceedings in Informatics (LIPIcs), Volume 331, pp. 24:1-24:15, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2025)

Copy BibTex To Clipboard

@InProceedings{akram_et_al:LIPIcs.CPM.2025.24,
  author =	{Akram, Waseem and Mieno, Takuya},
  title =	{{Sorted Consecutive Occurrence Queries in Substrings}},
  booktitle =	{36th Annual Symposium on Combinatorial Pattern Matching (CPM 2025)},
  pages =	{24:1--24:15},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-369-0},
  ISSN =	{1868-8969},
  year =	{2025},
  volume =	{331},
  editor =	{Bonizzoni, Paola and M\"{a}kinen, Veli},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2025.24},
  URN =		{urn:nbn:de:0030-drops-231187},
  doi =		{10.4230/LIPIcs.CPM.2025.24},
  annote =	{Keywords: string algorithm, consecutive occurrences, suffix tree}
}

Document

DOI: 10.4230/LIPIcs.CPM.2018.8

Non-Overlapping Indexing - Cache Obliviously

Authors: Sahar Hooshmand, Paniz Abedin, M. Oguzhan Külekci, and Sharma V. Thankachan

Published in: LIPIcs, Volume 105, 29th Annual Symposium on Combinatorial Pattern Matching (CPM 2018)

Abstract

The non-overlapping indexing problem is defined as follows: pre-process a given text T[1,n] of length n into a data structure such that whenever a pattern P[1,p] comes as an input, we can efficiently report the largest set of non-overlapping occurrences of P in T. The best known solution is by Cohen and Porat [ISAAC, 2009]. Their index size is O(n) words and query time is optimal O(p+nocc), where nocc is the output size. We study this problem in the cache-oblivious model and present a new data structure of size O(n log n) words. It can answer queries in optimal O(p/(B)+log_B n+nocc/B) I/Os, where B is the block size.

Cite as

Sahar Hooshmand, Paniz Abedin, M. Oguzhan Külekci, and Sharma V. Thankachan. Non-Overlapping Indexing - Cache Obliviously. In 29th Annual Symposium on Combinatorial Pattern Matching (CPM 2018). Leibniz International Proceedings in Informatics (LIPIcs), Volume 105, pp. 8:1-8:9, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2018)

Copy BibTex To Clipboard

@InProceedings{hooshmand_et_al:LIPIcs.CPM.2018.8,
  author =	{Hooshmand, Sahar and Abedin, Paniz and K\"{u}lekci, M. Oguzhan and Thankachan, Sharma V.},
  title =	{{Non-Overlapping Indexing - Cache Obliviously}},
  booktitle =	{29th Annual Symposium on Combinatorial Pattern Matching (CPM 2018)},
  pages =	{8:1--8:9},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-074-3},
  ISSN =	{1868-8969},
  year =	{2018},
  volume =	{105},
  editor =	{Navarro, Gonzalo and Sankoff, David and Zhu, Binhai},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2018.8},
  URN =		{urn:nbn:de:0030-drops-87009},
  doi =		{10.4230/LIPIcs.CPM.2018.8},
  annote =	{Keywords: Suffix Trees, Cache Oblivious, Data Structure, String Algorithms}
}

Document

DOI: 10.4230/LIPIcs.CPM.2018.20

The Heaviest Induced Ancestors Problem Revisited

Authors: Paniz Abedin, Sahar Hooshmand, Arnab Ganguly, and Sharma V. Thankachan

Published in: LIPIcs, Volume 105, 29th Annual Symposium on Combinatorial Pattern Matching (CPM 2018)

Abstract

We revisit the heaviest induced ancestors problem, which has several interesting applications in string matching. Let T_1 and T_2 be two weighted trees, where the weight W(u) of a node u in either of the two trees is more than the weight of u's parent. Additionally, the leaves in both trees are labeled and the labeling of the leaves in T_2 is a permutation of those in T_1. A node x in T_1 and a node y in T_2 are induced, iff their subtree have at least one common leaf label. A heaviest induced ancestor query HIA(u_1,u_2) is: given a node u_1 in T_1 and a node u_2 in T_2, output the pair (u_1^*,u_2^*) of induced nodes with the highest combined weight W(u^*_1) + W(u^*_2), such that u_1^* is an ancestor of u_1 and u^*_2 is an ancestor of u_2. Let n be the number of nodes in both trees combined and epsilon >0 be an arbitrarily small constant. Gagie et al. [CCCG' 13] introduced this problem and proposed three solutions with the following space-time trade-offs: - an O(n log^2n)-word data structure with O(log n log log n) query time - an O(n log n)-word data structure with O(log^2 n) query time - an O(n)-word data structure with O(log^{3+epsilon}n) query time. In this paper, we revisit this problem and present new data structures, with improved bounds. Our results are as follows. - an O(n log n)-word data structure with O(log n log log n) query time - an O(n)-word data structure with O(log^2 n/log log n) query time. As a corollary, we also improve the LZ compressed index of Gagie et al. [CCCG' 13] for answering longest common substring (LCS) queries. Additionally, we show that the LCS after one edit problem of size n [Amir et al., SPIRE' 17] can also be reduced to the heaviest induced ancestors problem over two trees of n nodes in total. This yields a straightforward improvement over its current solution of O(n log^3 n) space and O(log^3 n) query time.

Cite as

Paniz Abedin, Sahar Hooshmand, Arnab Ganguly, and Sharma V. Thankachan. The Heaviest Induced Ancestors Problem Revisited. In 29th Annual Symposium on Combinatorial Pattern Matching (CPM 2018). Leibniz International Proceedings in Informatics (LIPIcs), Volume 105, pp. 20:1-20:13, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2018)

Copy BibTex To Clipboard

@InProceedings{abedin_et_al:LIPIcs.CPM.2018.20,
  author =	{Abedin, Paniz and Hooshmand, Sahar and Ganguly, Arnab and Thankachan, Sharma V.},
  title =	{{The Heaviest Induced Ancestors Problem Revisited}},
  booktitle =	{29th Annual Symposium on Combinatorial Pattern Matching (CPM 2018)},
  pages =	{20:1--20:13},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-074-3},
  ISSN =	{1868-8969},
  year =	{2018},
  volume =	{105},
  editor =	{Navarro, Gonzalo and Sankoff, David and Zhu, Binhai},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2018.20},
  URN =		{urn:nbn:de:0030-drops-86898},
  doi =		{10.4230/LIPIcs.CPM.2018.20},
  annote =	{Keywords: Data Structure, String Algorithms, Orthogonal Range Queries}
}

6 Search Results for "Abedin, Paniz"

Relative Compressed Reverse Suffix Array

Abstract

Cite as

Faster Range LCP Queries in Linear Space

Abstract

Cite as

Repetition Aware Text Indexing for Matching Patterns with Wildcards

Abstract

Cite as

Sorted Consecutive Occurrence Queries in Substrings

Abstract

Cite as

Non-Overlapping Indexing - Cache Obliviously

Abstract

Cite as

The Heaviest Induced Ancestors Problem Revisited

Abstract

Cite as

Thanks for your feedback!

Could not send message