DROPS

Document

DOI: 10.4230/LIPIcs.CPM.2024.10

Searching 2D-Strings for Matching Frames

Authors: Itai Boneh, Dvir Fried, Shay Golan, Matan Kraus, Adrian Miclăuş, and Arseny Shur

Published in: LIPIcs, Volume 296, 35th Annual Symposium on Combinatorial Pattern Matching (CPM 2024)

Abstract

We study a natural type of repetitions in 2-dimensional strings. Such a repetition, called a matching frame, is a rectangular substring of size at least 2×2 with equal marginal rows and equal marginal columns. Matching frames first appeared in the literature in the context of Wang tiles. We present two algorithms finding a matching frame with the maximum perimeter in a given n×m input string. The first algorithm solves the problem exactly in Õ(n^{2.5}) time (assuming n ≥ m). The second algorithm finds a (1-ε)-approximate solution in Õ((nm)/ε⁴) time, which is near linear in the size of the input for constant ε. In particular, by setting ε = O(1) the second algorithm decides the existence of a matching frame in a given string in Õ(nm) time. Some technical elements and structural properties used in these algorithms can be of independent interest.
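The definition of a matching frame can be illustrated with a brute-force search. The following is a minimal sketch, not the paper's Õ(n^{2.5}) or approximate algorithm: it tries every rectangle and is only meant to make the definition concrete. The function names, the representation of the 2D string as a list of equal-length rows, and the perimeter formula 2·(height + width) are assumptions made for illustration.

```python
def is_matching_frame(grid, top, bottom, left, right):
    """Return True if rows top..bottom and columns left..right (inclusive)
    form a matching frame: at least 2x2, equal first and last rows,
    and equal first and last columns."""
    if bottom <= top or right <= left:
        return False  # a frame must span at least 2 rows and 2 columns
    # Marginal rows: the top and bottom rows of the rectangle must be equal.
    if grid[top][left:right + 1] != grid[bottom][left:right + 1]:
        return False
    # Marginal columns: the left and right columns must be equal.
    return all(grid[r][left] == grid[r][right] for r in range(top, bottom + 1))


def max_perimeter_frame_bruteforce(grid):
    """Try every rectangle and return (top, bottom, left, right) of a
    matching frame with the largest perimeter, or None if there is none.
    Perimeter is taken as 2 * (height + width), an illustrative assumption."""
    n = len(grid)
    m = len(grid[0]) if grid else 0
    best, best_perimeter = None, 0
    for top in range(n):
        for bottom in range(top + 1, n):
            for left in range(m):
                for right in range(left + 1, m):
                    if is_matching_frame(grid, top, bottom, left, right):
                        perimeter = 2 * ((bottom - top + 1) + (right - left + 1))
                        if perimeter > best_perimeter:
                            best_perimeter = perimeter
                            best = (top, bottom, left, right)
    return best
```

This enumeration takes time polynomial of much higher degree than the bounds quoted in the abstract, so it is useful only as a reference for testing on small inputs.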

Cite as

Itai Boneh, Dvir Fried, Shay Golan, Matan Kraus, Adrian Miclăuş, and Arseny Shur. Searching 2D-Strings for Matching Frames. In 35th Annual Symposium on Combinatorial Pattern Matching (CPM 2024). Leibniz International Proceedings in Informatics (LIPIcs), Volume 296, pp. 10:1-10:19, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024)

@InProceedings{boneh_et_al:LIPIcs.CPM.2024.10,
  author =	{Boneh, Itai and Fried, Dvir and Golan, Shay and Kraus, Matan and Micl\u{a}u\c{s}, Adrian and Shur, Arseny},
  title =	{{Searching 2D-Strings for Matching Frames}},
  booktitle =	{35th Annual Symposium on Combinatorial Pattern Matching (CPM 2024)},
  pages =	{10:1--10:19},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-326-3},
  ISSN =	{1868-8969},
  year =	{2024},
  volume =	{296},
  editor =	{Inenaga, Shunsuke and Puglisi, Simon J.},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2024.10},
  URN =		{urn:nbn:de:0030-drops-201205},
  doi =		{10.4230/LIPIcs.CPM.2024.10},
  annote =	{Keywords: 2D string, matching frame, LCP, multidimensional range query}
}

Document

DOI: 10.4230/LIPIcs.CPM.2024.11

Hairpin Completion Distance Lower Bound

Authors: Itai Boneh, Dvir Fried, Shay Golan, and Matan Kraus

Published in: LIPIcs, Volume 296, 35th Annual Symposium on Combinatorial Pattern Matching (CPM 2024)

Abstract

Hairpin completion, derived from the hairpin formation observed in DNA biochemistry, is an operation applied to strings, particularly useful in DNA computing. Conceptually, a right hairpin completion operation transforms a string S into S⋅S' where S' is the reverse complement of a prefix of S. Similarly, a left hairpin completion operation transforms a string S into S'⋅S where S' is the reverse complement of a suffix of S. The hairpin completion distance from S to T is the minimum number of hairpin completion operations needed to transform S into T. Recently, Boneh et al. [Itai Boneh et al., 2023] showed an O(n²) time algorithm for finding the hairpin completion distance between two strings of length at most n. In this paper we show that for any ε > 0 there is no O(n^{2-ε})-time algorithm for the hairpin completion distance problem unless the Strong Exponential Time Hypothesis (SETH) is false. Thus, under SETH, the time complexity of the hairpin completion distance problem is quadratic, up to sub-polynomial factors.
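The two operations can be sketched directly from the conceptual description above. This is an illustration only: it assumes the DNA alphabet {A, C, G, T} with the usual pairing A↔T and C↔G, takes the prefix or suffix length k as an explicit parameter, and ignores any further constraints the paper's formal definition may impose. All function names are hypothetical.

```python
# Watson-Crick pairing over the DNA alphabet (an assumption of this sketch).
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def reverse_complement(s):
    """Reverse the string and complement every symbol."""
    return "".join(COMPLEMENT[c] for c in reversed(s))


def right_hairpin_completion(s, k):
    """Transform S into S.S', where S' is the reverse complement
    of the length-k prefix of S."""
    if not 0 <= k <= len(s):
        raise ValueError("k must be between 0 and len(s)")
    return s + reverse_complement(s[:k])


def left_hairpin_completion(s, k):
    """Transform S into S'.S, where S' is the reverse complement
    of the length-k suffix of S."""
    if not 0 <= k <= len(s):
        raise ValueError("k must be between 0 and len(s)")
    return reverse_complement(s[len(s) - k:]) + s
```

For example, with S = "ACG" and k = 2, the right operation appends the reverse complement of "AC" and the left operation prepends the reverse complement of "CG".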

Cite as

Itai Boneh, Dvir Fried, Shay Golan, and Matan Kraus. Hairpin Completion Distance Lower Bound. In 35th Annual Symposium on Combinatorial Pattern Matching (CPM 2024). Leibniz International Proceedings in Informatics (LIPIcs), Volume 296, pp. 11:1-11:16, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024)

@InProceedings{boneh_et_al:LIPIcs.CPM.2024.11,
  author =	{Boneh, Itai and Fried, Dvir and Golan, Shay and Kraus, Matan},
  title =	{{Hairpin Completion Distance Lower Bound}},
  booktitle =	{35th Annual Symposium on Combinatorial Pattern Matching (CPM 2024)},
  pages =	{11:1--11:16},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-326-3},
  ISSN =	{1868-8969},
  year =	{2024},
  volume =	{296},
  editor =	{Inenaga, Shunsuke and Puglisi, Simon J.},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2024.11},
  URN =		{urn:nbn:de:0030-drops-201215},
  doi =		{10.4230/LIPIcs.CPM.2024.11},
  annote =	{Keywords: Fine-grained complexity, Hairpin completion, LCS}
}

Document

DOI: 10.4230/LIPIcs.CPM.2023.19

String Factorization via Prefix Free Families

Authors: Matan Kraus, Moshe Lewenstein, Alexandru Popa, Ely Porat, and Yonathan Sadia

Published in: LIPIcs, Volume 259, 34th Annual Symposium on Combinatorial Pattern Matching (CPM 2023)

Abstract

A factorization of a string S is a partition of S into substrings u_1, …, u_k such that S = u_1 u_2 ⋯ u_k. Such a partition is called equality-free if no two factors are equal: u_i ≠ u_j for all i, j with i ≠ j. The maximum equality-free factorization problem is to find, for a given string S, the largest integer k for which S admits an equality-free factorization with k factors. Equality-free factorizations have lately received attention because of their applications in DNA self-assembly. The best approximation algorithm known for the problem is the natural greedy algorithm, which iteratively chooses, from left to right, the shortest factor that has not appeared before. This algorithm has a √n approximation ratio (SOFSEM 2020) and it is an open problem whether there is a better solution. Our main result is to show that the natural greedy algorithm is a Θ(n^{1/4}) approximation algorithm for the maximum equality-free factorization problem. Thus, we disprove one of the conjectures of Mincu and Popa (SOFSEM 2020) according to which the greedy algorithm is a Θ(√n) approximation. The most challenging part of the proof is to show that the greedy algorithm is an O(n^{1/4}) approximation. We obtain this algorithm via prefix free factor families, i.e., a set of non-overlapping factors of the string which are pairwise non-prefixes of each other. In the paper we show the relation between prefix free factor families and the maximum equality-free factorization. Moreover, as a byproduct we present another approximation algorithm that achieves an approximation ratio of O(n^{1/4}) that we believe is of independent interest and may lead to improved algorithms. We then show that the natural greedy algorithm has an approximation ratio that is Ω(n^{1/4}) via a clever analysis which shows that the greedy algorithm is Θ(n^{1/4}) for the maximum equality-free factorization problem.
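The natural greedy algorithm described in the abstract can be sketched as follows. This is a simple illustration, not the authors' implementation: the function name is hypothetical, and the handling of a leftover suffix (appending it to the last factor when every prefix of the remainder has already been used) is an assumption, since the abstract does not say how the end of the string is treated.

```python
def greedy_equality_free_factorization(s):
    """Scan s from left to right, each time cutting off the shortest
    prefix of the remainder that has not been used as a factor yet."""
    factors = []
    seen = set()
    i, n = 0, len(s)
    while i < n:
        j = i + 1
        # Extend the candidate factor until it differs from all earlier factors.
        while j <= n and s[i:j] in seen:
            j += 1
        if j > n:
            # Every prefix of the remainder is already a factor.
            # Assumption of this sketch: merge the remainder into the last factor.
            # (factors is non-empty here, because seen is non-empty.)
            seen.discard(factors[-1])
            factors[-1] += s[i:]
            seen.add(factors[-1])
            break
        factors.append(s[i:j])
        seen.add(s[i:j])
        i = j
    return factors
```

The merged last factor stays distinct from all earlier ones: each factor is the shortest unused prefix at its position, so all of its proper prefixes were used before it, and a later factor can therefore never be a proper prefix of an earlier one.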

Cite as

Matan Kraus, Moshe Lewenstein, Alexandru Popa, Ely Porat, and Yonathan Sadia. String Factorization via Prefix Free Families. In 34th Annual Symposium on Combinatorial Pattern Matching (CPM 2023). Leibniz International Proceedings in Informatics (LIPIcs), Volume 259, pp. 19:1-19:10, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2023)

@InProceedings{kraus_et_al:LIPIcs.CPM.2023.19,
  author =	{Kraus, Matan and Lewenstein, Moshe and Popa, Alexandru and Porat, Ely and Sadia, Yonathan},
  title =	{{String Factorization via Prefix Free Families}},
  booktitle =	{34th Annual Symposium on Combinatorial Pattern Matching (CPM 2023)},
  pages =	{19:1--19:10},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-276-1},
  ISSN =	{1868-8969},
  year =	{2023},
  volume =	{259},
  editor =	{Bulteau, Laurent and Lipt\'{a}k, Zsuzsanna},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2023.19},
  URN =		{urn:nbn:de:0030-drops-179738},
  doi =		{10.4230/LIPIcs.CPM.2023.19},
  annote =	{Keywords: string factorization, NP-hard problem, approximation algorithm}
}

Document

DOI: 10.4230/LIPIcs.CPM.2020.5

Time-Space Tradeoffs for Finding a Long Common Substring

Authors: Stav Ben-Nun, Shay Golan, Tomasz Kociumaka, and Matan Kraus

Published in: LIPIcs, Volume 161, 31st Annual Symposium on Combinatorial Pattern Matching (CPM 2020)

Abstract

We consider the problem of finding, given two documents of total length n, a longest string occurring as a substring of both documents. This problem, known as the Longest Common Substring (LCS) problem, has a classic 𝒪(n)-time solution dating back to the discovery of suffix trees (Weiner, 1973) and their efficient construction for integer alphabets (Farach-Colton, 1997). However, these solutions require Θ(n) space, which is prohibitive in many applications. To address this issue, Starikovskaya and Vildhøj (CPM 2013) showed that for n^{2/3} ≤ s ≤ n, the LCS problem can be solved in 𝒪(s) space and 𝒪̃(n²/s) time. Kociumaka et al. (ESA 2014) generalized this tradeoff to 1 ≤ s ≤ n, thus providing a smooth time-space tradeoff from constant to linear space. In this paper, we obtain a significant speed-up for instances where the length L of the sought LCS is large. For 1 ≤ s ≤ n, we show that the LCS problem can be solved in 𝒪(s) space and 𝒪̃(n²/(L⋅s) + n) time. The result is based on techniques originating from the LCS with Mismatches problem (Flouri et al., 2015; Charalampopoulos et al., CPM 2018), on space-efficient locally consistent parsing (Birenzwige et al., SODA 2020), and on the structure of maximal repetitions (runs) in the input documents.
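To make the problem statement concrete, here is a textbook dynamic-programming baseline for the Longest Common Substring problem. It is not the suffix-tree solution or any of the time-space tradeoffs cited in the abstract: it runs in time proportional to the product of the two lengths and keeps only one table row over the shorter document. The function name is hypothetical.

```python
def longest_common_substring(a, b):
    """Return one longest string occurring as a substring of both a and b.

    prev[j] holds the length of the longest common suffix of a[:i-1] and b[:j];
    only the previous row of the dynamic-programming table is kept in memory."""
    if len(b) > len(a):
        a, b = b, a  # keep the table row over the shorter document
    prev = [0] * (len(b) + 1)
    best_len, best_end = 0, 0  # best_end is an end position in a (exclusive)
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                # Extend the common suffix that ended one position earlier.
                cur[j] = prev[j - 1] + 1
                if cur[j] > best_len:
                    best_len, best_end = cur[j], i
        prev = cur
    return a[best_end - best_len:best_end]
```

A baseline like this is mainly useful for checking the output of more space- or time-efficient implementations on small inputs.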

Cite as

Stav Ben-Nun, Shay Golan, Tomasz Kociumaka, and Matan Kraus. Time-Space Tradeoffs for Finding a Long Common Substring. In 31st Annual Symposium on Combinatorial Pattern Matching (CPM 2020). Leibniz International Proceedings in Informatics (LIPIcs), Volume 161, pp. 5:1-5:14, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2020)

@InProceedings{bennun_et_al:LIPIcs.CPM.2020.5,
  author =	{Ben-Nun, Stav and Golan, Shay and Kociumaka, Tomasz and Kraus, Matan},
  title =	{{Time-Space Tradeoffs for Finding a Long Common Substring}},
  booktitle =	{31st Annual Symposium on Combinatorial Pattern Matching (CPM 2020)},
  pages =	{5:1--5:14},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-149-8},
  ISSN =	{1868-8969},
  year =	{2020},
  volume =	{161},
  editor =	{G{\o}rtz, Inge Li and Weimann, Oren},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2020.5},
  URN =		{urn:nbn:de:0030-drops-121302},
  doi =		{10.4230/LIPIcs.CPM.2020.5},
  annote =	{Keywords: longest common substring, time-space tradeoff, local consistency, periodicity}
}

Search Results

Documents authored by Kraus, Matan
