DROPS

Document

Circular Dictionary Matching Using Extended BWT

Authors: Wing-Kai Hon, Rahul Shah, and Sharma V. Thankachan

Published in: OASIcs, Volume 131, The Expanding World of Compressed Data: A Festschrift for Giovanni Manzini's 60th Birthday (2025)

Abstract

The dictionary matching problem involves preprocessing a set of strings (patterns) into a data structure that efficiently identifies all occurrences of these patterns within a query string (text). In this work, we investigate a variation of this problem, termed circular dictionary matching, where the patterns are circular, meaning their cyclic shifts are also considered valid patterns. Such patterns naturally occur in areas such as bioinformatics and computational geometry. Based on the extended Burrows-Wheeler Transformation (eBWT), we design a space-efficient solution for this problem. Specifically, we show that a dictionary of d circular patterns of total length n can be indexed in nlog σ + O(n+dlog n+σ log n) bits of space and support circular dictionary matching on a query text T in O((|T|+occ)log n) time, where σ represents the size of the underlying alphabet and occ represents the output size.

Cite as

Wing-Kai Hon, Rahul Shah, and Sharma V. Thankachan. Circular Dictionary Matching Using Extended BWT. In The Expanding World of Compressed Data: A Festschrift for Giovanni Manzini's 60th Birthday. Open Access Series in Informatics (OASIcs), Volume 131, pp. 11:1-11:14, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2025)

Copy BibTex To Clipboard

@InProceedings{hon_et_al:OASIcs.Manzini.11,
  author =	{Hon, Wing-Kai and Shah, Rahul and Thankachan, Sharma V.},
  title =	{{Circular Dictionary Matching Using Extended BWT}},
  booktitle =	{The Expanding World of Compressed Data: A Festschrift for Giovanni Manzini's 60th Birthday},
  pages =	{11:1--11:14},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-95977-390-4},
  ISSN =	{2190-6807},
  year =	{2025},
  volume =	{131},
  editor =	{Ferragina, Paolo and Gagie, Travis and Navarro, Gonzalo},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/OASIcs.Manzini.11},
  URN =		{urn:nbn:de:0030-drops-239195},
  doi =		{10.4230/OASIcs.Manzini.11},
  annote =	{Keywords: String algorithms, Burrows-Wheeler transformation, suffix trees, succinct data structures}
}

Document

DOI: 10.4230/LIPIcs.CPM.2025.18

Improved Circular Dictionary Matching

Authors: Nicola Cotumaccio

Published in: LIPIcs, Volume 331, 36th Annual Symposium on Combinatorial Pattern Matching (CPM 2025)

Abstract

The circular dictionary matching problem is an extension of the classical dictionary matching problem where every string in the dictionary is interpreted as a circular string: after reading the last character of a string, we can move back to its first character. The circular dictionary matching problem is motivated by applications in bioinformatics and computational geometry. In 2011, Hon et al. [ISAAC 2011] showed how to efficiently solve circular dictionary matching queries within compressed space by building on Mantaci et al.’s eBWT and Sadakane’s compressed suffix tree. The proposed solution is based on the assumption that the strings in the dictionary are all distinct and non-periodic, no string is a circular rotation of some other string, and the strings in the dictionary have similar lengths. In this paper, we consider arbitrary dictionaries, and we show how to solve circular dictionary matching queries in O((m + occ) log n) time within compressed space using n log σ (1 + o(1)) + O(n) + O(d log n) bits, where n is the total length of the dictionary, m is the length of the pattern, occ is the number of occurrences, d is the number of strings in the dictionary and σ is the size of the alphabet. Our solution is based on an extension of the suffix array to arbitrary dictionaries and a sampling mechanism for the LCP array of a dictionary inspired by recent results in graph indexing and compression.

Cite as

Nicola Cotumaccio. Improved Circular Dictionary Matching. In 36th Annual Symposium on Combinatorial Pattern Matching (CPM 2025). Leibniz International Proceedings in Informatics (LIPIcs), Volume 331, pp. 18:1-18:17, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2025)

Copy BibTex To Clipboard

@InProceedings{cotumaccio:LIPIcs.CPM.2025.18,
  author =	{Cotumaccio, Nicola},
  title =	{{Improved Circular Dictionary Matching}},
  booktitle =	{36th Annual Symposium on Combinatorial Pattern Matching (CPM 2025)},
  pages =	{18:1--18:17},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-369-0},
  ISSN =	{1868-8969},
  year =	{2025},
  volume =	{331},
  editor =	{Bonizzoni, Paola and M\"{a}kinen, Veli},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2025.18},
  URN =		{urn:nbn:de:0030-drops-231122},
  doi =		{10.4230/LIPIcs.CPM.2025.18},
  annote =	{Keywords: Circular pattern matching, dictionary matching, suffix tree, compressed suffix tree, suffix array, LCP array, Burrows-Wheeler Transform, FM-index}
}

Document

DOI: 10.4230/LIPIcs.CPM.2025.21

Compressed Dictionary Matching on Run-Length Encoded Strings

Authors: Philip Bille, Inge Li Gørtz, Simon J. Puglisi, and Simon R. Tarnow

Published in: LIPIcs, Volume 331, 36th Annual Symposium on Combinatorial Pattern Matching (CPM 2025)

Abstract

Given a set of pattern strings 𝒫 = {P₁, P₂,… P_k} and a text string S, the classic dictionary matching problem is to report all occurrences of each pattern in S. We study the dictionary problem in the compressed setting, where the pattern strings and the text string are compressed using run-length encoding, and the goal is to solve the problem without decompression and achieve efficient time and space in the size of the compressed strings. Let m and n be the total length of the patterns 𝒫 and the length of the text string S, respectively, and let ̅m and ̅n be the total number of runs in the run-length encoding of the patterns in 𝒫 and S, respectively. Our main result is an algorithm that achieves O(( ̅m + ̅n)log log m + occ) expected time, and O( ̅m) space, where occ is the total number of occurrences of patterns in S. This is the first non-trivial solution to the problem. Since any solution must read the input, our time bound is optimal within an log log m factor. We introduce several new techniques to achieve our bounds, including a new compressed representation of the classic Aho-Corasick automaton and a new efficient string index that supports fast queries in run-length encoded strings.

Cite as

Philip Bille, Inge Li Gørtz, Simon J. Puglisi, and Simon R. Tarnow. Compressed Dictionary Matching on Run-Length Encoded Strings. In 36th Annual Symposium on Combinatorial Pattern Matching (CPM 2025). Leibniz International Proceedings in Informatics (LIPIcs), Volume 331, pp. 21:1-21:16, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2025)

Copy BibTex To Clipboard

@InProceedings{bille_et_al:LIPIcs.CPM.2025.21,
  author =	{Bille, Philip and G{\o}rtz, Inge Li and Puglisi, Simon J. and Tarnow, Simon R.},
  title =	{{Compressed Dictionary Matching on Run-Length Encoded Strings}},
  booktitle =	{36th Annual Symposium on Combinatorial Pattern Matching (CPM 2025)},
  pages =	{21:1--21:16},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-369-0},
  ISSN =	{1868-8969},
  year =	{2025},
  volume =	{331},
  editor =	{Bonizzoni, Paola and M\"{a}kinen, Veli},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2025.21},
  URN =		{urn:nbn:de:0030-drops-231158},
  doi =		{10.4230/LIPIcs.CPM.2025.21},
  annote =	{Keywords: Dictionary matching, run-length encoding, compressed pattern matching}
}

Document

DOI: 10.4230/LIPIcs.CPM.2022.10

Partial Permutations Comparison, Maintenance and Applications

Authors: Avivit Levy, Ely Porat, and B. Riva Shalom

Published in: LIPIcs, Volume 223, 33rd Annual Symposium on Combinatorial Pattern Matching (CPM 2022)

Abstract

This paper focuses on the concept of partial permutations and their use in algorithmic tasks. A partial permutation over Σ is a bijection π_{par}: Σ₁↦Σ₂ mapping a subset Σ₁ ⊂ Σ to a subset Σ₂ ⊂ Σ, where |Σ₁| = |Σ₂| (|Σ| denotes the size of a set Σ). Intuitively, two partial permutations agree if their mapping pairs do not form conflicts. This notion, which is formally defined in this paper, enables a consistent as well as informatively rich comparison between partial permutations. We formalize the Partial Permutations Agreement problem (PPA), as follows. Given two sets A₁, A₂ of partial permutations over alphabet Σ, each of size n, output all pairs (π_i, π_j), where π_i ∈ A₁, π_j ∈ A₂ and π_i agrees with π_j. The possibility of having a data structure for efficiently maintaining a dynamic set of partial permutations enabling to retrieve agreement of partial permutations is then studied, giving both negative and positive results. Applying our study enables to point out fruitful versus futile methods for efficient genes sequences comparison in database or automatic color transformation data augmentation technique for image processing through neural networks. It also shows that an efficient solution of strict Parameterized Dictionary Matching with One Gap (PDMOG) over general dictionary alphabets is not likely, unless the Strong Exponential Time Hypothesis (SETH) fails, thus negatively answering an open question posed lately.

Cite as

Avivit Levy, Ely Porat, and B. Riva Shalom. Partial Permutations Comparison, Maintenance and Applications. In 33rd Annual Symposium on Combinatorial Pattern Matching (CPM 2022). Leibniz International Proceedings in Informatics (LIPIcs), Volume 223, pp. 10:1-10:17, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2022)

Copy BibTex To Clipboard

@InProceedings{levy_et_al:LIPIcs.CPM.2022.10,
  author =	{Levy, Avivit and Porat, Ely and Shalom, B. Riva},
  title =	{{Partial Permutations Comparison, Maintenance and Applications}},
  booktitle =	{33rd Annual Symposium on Combinatorial Pattern Matching (CPM 2022)},
  pages =	{10:1--10:17},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-234-1},
  ISSN =	{1868-8969},
  year =	{2022},
  volume =	{223},
  editor =	{Bannai, Hideo and Holub, Jan},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2022.10},
  URN =		{urn:nbn:de:0030-drops-161376},
  doi =		{10.4230/LIPIcs.CPM.2022.10},
  annote =	{Keywords: Partial permutations, Partial words, Genes comparison, Color transformation, Dictionary matching with gaps, Parameterized matching, SETH hypothesis}
}

Document

DOI: 10.4230/LIPIcs.ISAAC.2016.12

Mind the Gap: Essentially Optimal Algorithms for Online Dictionary Matching with One Gap

Authors: Amihood Amir, Tsvi Kopelowitz, Avivit Levy, Seth Pettie, Ely Porat, and B. Riva Shalom

Published in: LIPIcs, Volume 64, 27th International Symposium on Algorithms and Computation (ISAAC 2016)

Abstract

We examine the complexity of the online Dictionary Matching with One Gap Problem (DMOG) which is the following. Preprocess a dictionary D of d patterns, where each pattern contains a special gap symbol that can match any string, so that given a text that arrives online, a character at a time, we can report all of the patterns from D that are suffixes of the text that has arrived so far, before the next character arrives. In more general versions the gap symbols are associated with bounds determining the possible lengths of matching strings. Online DMOG captures the difficulty in a bottleneck procedure for cyber-security, as many digital signatures of viruses manifest themselves as patterns with a single gap. In this paper, we demonstrate that the difficulty in obtaining efficient solutions for the DMOG problem, even in the offline setting, can be traced back to the infamous 3SUM conjecture. We show a conditional lower bound of Omega(delta(G_D)+op) time per text character, where G_D is a bipartite graph that captures the structure of D, delta(G_D) is the degeneracy of this graph, and op is the output size. Moreover, we show a conditional lower bound in terms of the magnitude of gaps for the bounded case, thereby showing that some known offline upper bounds are essentially optimal. We also provide matching upper-bounds (up to sub-polynomial factors), in terms of the degeneracy, for the online DMOG problem. In particular, we introduce algorithms whose time cost depends linearly on delta(G_D). Our algorithms make use of graph orientations, together with some additional techniques. These algorithms are of practical interest since although delta(G_D) can be as large as sqrt(d), and even larger if G_D is a multi-graph, it is typically a very small constant in practice. Finally, when delta(G_D) is large we are able to obtain even more efficient solutions.

Cite as

Amihood Amir, Tsvi Kopelowitz, Avivit Levy, Seth Pettie, Ely Porat, and B. Riva Shalom. Mind the Gap: Essentially Optimal Algorithms for Online Dictionary Matching with One Gap. In 27th International Symposium on Algorithms and Computation (ISAAC 2016). Leibniz International Proceedings in Informatics (LIPIcs), Volume 64, pp. 12:1-12:12, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2016)

Copy BibTex To Clipboard

@InProceedings{amir_et_al:LIPIcs.ISAAC.2016.12,
  author =	{Amir, Amihood and Kopelowitz, Tsvi and Levy, Avivit and Pettie, Seth and Porat, Ely and Shalom, B. Riva},
  title =	{{Mind the Gap: Essentially Optimal Algorithms for Online Dictionary Matching with One Gap}},
  booktitle =	{27th International Symposium on Algorithms and Computation (ISAAC 2016)},
  pages =	{12:1--12:12},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-026-2},
  ISSN =	{1868-8969},
  year =	{2016},
  volume =	{64},
  editor =	{Hong, Seok-Hee},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ISAAC.2016.12},
  URN =		{urn:nbn:de:0030-drops-67841},
  doi =		{10.4230/LIPIcs.ISAAC.2016.12},
  annote =	{Keywords: Pattern matching, Dictionary matching, 3SUM, Triangle reporting}
}

5 Search Results for "Shalom, B. Riva"

Circular Dictionary Matching Using Extended BWT

Abstract

Cite as

Improved Circular Dictionary Matching

Abstract

Cite as

Compressed Dictionary Matching on Run-Length Encoded Strings

Abstract

Cite as

Partial Permutations Comparison, Maintenance and Applications

Abstract

Cite as

Mind the Gap: Essentially Optimal Algorithms for Online Dictionary Matching with One Gap

Abstract

Cite as

Thanks for your feedback!

Could not send message