DROPS

Document

DOI: 10.4230/LIPIcs.CPM.2025.20

Text Indexing for Simple Regular Expressions

Authors: Hideo Bannai, Philip Bille, Inge Li Gørtz, Gad M. Landau, Gonzalo Navarro, Nicola Prezza, Teresa Anna Steiner, and Simon Rumle Tarnow

Published in: LIPIcs, Volume 331, 36th Annual Symposium on Combinatorial Pattern Matching (CPM 2025)

Abstract

We study the problem of indexing a text T[1..n] ∈ Σⁿ so that, later, given a query regular expression pattern R of size m = |R|, we can report all the occ substrings T[i..j] of T matching R. The problem is known to be hard for arbitrary patterns R, so in this paper, we consider the following two types of patterns. (1) Character-class Kleene-star patterns of the form P₁ D^* P₂, where P₁ and P₂ are strings and D = {c₁, …, c_k} ⊂ Σ is a character-class (shorthand for the regular expression (c₁ | c₂ | ⋯ | c_k)) and (2) String Kleene-star patterns of the form P₁ P^* P₂ where P, P₁ and P₂ are strings. In case (1), we describe an index of O(nlog^{1+ε}n) space (for any constant ε > 0) solving queries in time O(m + log n/log log n + occ) on constant-sized alphabets. We also describe a general solution for any alphabet size. This result is conditioned on the existence of an anchor: a character of P₁P₂ that does not belong to D. We justify this assumption by proving that no efficient indexing solution can exist if an anchor is not present unless the Set Disjointness Conjecture fails. In case (2), we describe an index of size O(n) answering queries in time O(m + (occ+1)log^{ε}n) on any alphabet size.

Cite as

Hideo Bannai, Philip Bille, Inge Li Gørtz, Gad M. Landau, Gonzalo Navarro, Nicola Prezza, Teresa Anna Steiner, and Simon Rumle Tarnow. Text Indexing for Simple Regular Expressions. In 36th Annual Symposium on Combinatorial Pattern Matching (CPM 2025). Leibniz International Proceedings in Informatics (LIPIcs), Volume 331, pp. 20:1-20:16, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2025)

Copy BibTex To Clipboard

@InProceedings{bannai_et_al:LIPIcs.CPM.2025.20,
  author =	{Bannai, Hideo and Bille, Philip and G{\o}rtz, Inge Li and Landau, Gad M. and Navarro, Gonzalo and Prezza, Nicola and Steiner, Teresa Anna and Tarnow, Simon Rumle},
  title =	{{Text Indexing for Simple Regular Expressions}},
  booktitle =	{36th Annual Symposium on Combinatorial Pattern Matching (CPM 2025)},
  pages =	{20:1--20:16},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-369-0},
  ISSN =	{1868-8969},
  year =	{2025},
  volume =	{331},
  editor =	{Bonizzoni, Paola and M\"{a}kinen, Veli},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2025.20},
  URN =		{urn:nbn:de:0030-drops-231143},
  doi =		{10.4230/LIPIcs.CPM.2025.20},
  annote =	{Keywords: Text indexing, regular expressions, data structures}
}

Document

DOI: 10.4230/LIPIcs.CPM.2023.13

Order-Preserving Squares in Strings

Authors: Paweł Gawrychowski, Samah Ghazawi, and Gad M. Landau

Published in: LIPIcs, Volume 259, 34th Annual Symposium on Combinatorial Pattern Matching (CPM 2023)

Abstract

An order-preserving square in a string is a fragment of the form uv where u ≠ v and u is order-isomorphic to v. We show that a string w of length n over an alphabet of size σ contains 𝒪(σn) order-preserving squares that are distinct as words. This improves the upper bound of 𝒪(σ²n) by Kociumaka, Radoszewski, Rytter, and Waleń [TCS 2016]. Further, for every σ and n we exhibit a string with Ω(σn) order-preserving squares that are distinct as words, thus establishing that our upper bound is asymptotically tight. Finally, we design an 𝒪(σn) time algorithm that outputs all order-preserving squares that occur in a given string and are distinct as words. By our lower bound, this is optimal in the worst case.

Cite as

Paweł Gawrychowski, Samah Ghazawi, and Gad M. Landau. Order-Preserving Squares in Strings. In 34th Annual Symposium on Combinatorial Pattern Matching (CPM 2023). Leibniz International Proceedings in Informatics (LIPIcs), Volume 259, pp. 13:1-13:19, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2023)

Copy BibTex To Clipboard

@InProceedings{gawrychowski_et_al:LIPIcs.CPM.2023.13,
  author =	{Gawrychowski, Pawe{\l} and Ghazawi, Samah and Landau, Gad M.},
  title =	{{Order-Preserving Squares in Strings}},
  booktitle =	{34th Annual Symposium on Combinatorial Pattern Matching (CPM 2023)},
  pages =	{13:1--13:19},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-276-1},
  ISSN =	{1868-8969},
  year =	{2023},
  volume =	{259},
  editor =	{Bulteau, Laurent and Lipt\'{a}k, Zsuzsanna},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2023.13},
  URN =		{urn:nbn:de:0030-drops-179676},
  doi =		{10.4230/LIPIcs.CPM.2023.13},
  annote =	{Keywords: repetitions, distinct squares, order-isomorphism}
}

Document

DOI: 10.4230/LIPIcs.CPM.2020.3

Double String Tandem Repeats

Authors: Amihood Amir, Ayelet Butman, Gad M. Landau, Shoshana Marcus, and Dina Sokol

Published in: LIPIcs, Volume 161, 31st Annual Symposium on Combinatorial Pattern Matching (CPM 2020)

Abstract

A tandem repeat is an occurrence of two adjacent identical substrings. In this paper, we introduce the notion of a double string, which consists of two parallel strings, and we study the problem of locating all tandem repeats in a double string. The problem introduced here has applications beyond actual double strings, as we illustrate by solving two different problems with the algorithm of the double string tandem repeats problem. The first problem is that of finding all corner-sharing tandems in a 2-dimensional text, defined by Apostolico and Brimkov. The second problem is that of finding all scaled tandem repeats in a 1d text, where a scaled tandem repeat is defined as a string UU' such that U' is discrete scale of U. In addition to the algorithms for exact tandem repeats, we also present algorithms that solve the problem in the inexact sense, allowing up to k mismatches. We believe that this framework will open a new perspective for other problems in the future.

Cite as

Amihood Amir, Ayelet Butman, Gad M. Landau, Shoshana Marcus, and Dina Sokol. Double String Tandem Repeats. In 31st Annual Symposium on Combinatorial Pattern Matching (CPM 2020). Leibniz International Proceedings in Informatics (LIPIcs), Volume 161, pp. 3:1-3:13, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2020)

Copy BibTex To Clipboard

@InProceedings{amir_et_al:LIPIcs.CPM.2020.3,
  author =	{Amir, Amihood and Butman, Ayelet and Landau, Gad M. and Marcus, Shoshana and Sokol, Dina},
  title =	{{Double String Tandem Repeats}},
  booktitle =	{31st Annual Symposium on Combinatorial Pattern Matching (CPM 2020)},
  pages =	{3:1--3:13},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-149-8},
  ISSN =	{1868-8969},
  year =	{2020},
  volume =	{161},
  editor =	{G{\o}rtz, Inge Li and Weimann, Oren},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2020.3},
  URN =		{urn:nbn:de:0030-drops-121283},
  doi =		{10.4230/LIPIcs.CPM.2020.3},
  annote =	{Keywords: double string, tandem repeat, 2-dimensional, scale}
}

Document

DOI: 10.4230/LIPIcs.CPM.2020.14

On Indeterminate Strings Matching

Authors: Paweł Gawrychowski, Samah Ghazawi, and Gad M. Landau

Published in: LIPIcs, Volume 161, 31st Annual Symposium on Combinatorial Pattern Matching (CPM 2020)

Abstract

Given two indeterminate equal-length strings p and t with a set of characters per position in both strings, we obtain a determinate string p_w from p and a determinate string t_w from t by choosing one character per position. Then, we say that p and t match when p_w and t_w match for some choice of the characters. While the most standard notion of a match for determinate strings is that they are simply identical, in certain applications it is more appropriate to use other definitions, with the prime examples being parameterized matching, order-preserving matching, and the recently introduced Cartesian tree matching. We provide a systematic study of the complexity of string matching for indeterminate equal-length strings, for different notions of matching. We use n to denote the length of both strings, and r to be an upper-bound on the number of uncertain characters per position. First, we provide the first polynomial time algorithm for the Cartesian tree version that runs in deterministic 𝒪(nlog² n) and expected 𝒪(nlog nlog log n) time using 𝒪(nlog n) space, for constant r. Second, we establish NP-hardness of the order-preserving version for r=2, thus solving a question explicitly stated by Henriques et al. [CPM 2018], who showed hardness for r=3. Third, we establish NP-hardness of the parameterized version for r=2. As both parameterized and order-preserving indeterminate matching reduce to the standard determinate matching for r=1, this provides a complete classification for these three variants.

Cite as

Paweł Gawrychowski, Samah Ghazawi, and Gad M. Landau. On Indeterminate Strings Matching. In 31st Annual Symposium on Combinatorial Pattern Matching (CPM 2020). Leibniz International Proceedings in Informatics (LIPIcs), Volume 161, pp. 14:1-14:14, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2020)

Copy BibTex To Clipboard

@InProceedings{gawrychowski_et_al:LIPIcs.CPM.2020.14,
  author =	{Gawrychowski, Pawe{\l} and Ghazawi, Samah and Landau, Gad M.},
  title =	{{On Indeterminate Strings Matching}},
  booktitle =	{31st Annual Symposium on Combinatorial Pattern Matching (CPM 2020)},
  pages =	{14:1--14:14},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-149-8},
  ISSN =	{1868-8969},
  year =	{2020},
  volume =	{161},
  editor =	{G{\o}rtz, Inge Li and Weimann, Oren},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2020.14},
  URN =		{urn:nbn:de:0030-drops-121393},
  doi =		{10.4230/LIPIcs.CPM.2020.14},
  annote =	{Keywords: string matching, indeterminate strings, Cartesian trees, order-preserving matching, parameterized matching}
}

Document

DOI: 10.4230/LIPIcs.ISAAC.2019.4

Top Tree Compression of Tries

Authors: Philip Bille, Paweł Gawrychowski, Inge Li Gørtz, Gad M. Landau, and Oren Weimann

Published in: LIPIcs, Volume 149, 30th International Symposium on Algorithms and Computation (ISAAC 2019)

Abstract

We present a compressed representation of tries based on top tree compression [ICALP 2013] that works on a standard, comparison-based, pointer machine model of computation and supports efficient prefix search queries. Namely, we show how to preprocess a set of strings of total length n over an alphabet of size sigma into a compressed data structure of worst-case optimal size O(n/log_sigma n) that given a pattern string P of length m determines if P is a prefix of one of the strings in time O(min(m log sigma,m + log n)). We show that this query time is in fact optimal regardless of the size of the data structure. Existing solutions either use Omega(n) space or rely on word RAM techniques, such as tabulation, hashing, address arithmetic, or word-level parallelism, and hence do not work on a pointer machine. Our result is the first solution on a pointer machine that achieves worst-case o(n) space. Along the way, we develop several interesting data structures that work on a pointer machine and are of independent interest. These include an optimal data structures for random access to a grammar-compressed string and an optimal data structure for a variant of the level ancestor problem.

Cite as

Philip Bille, Paweł Gawrychowski, Inge Li Gørtz, Gad M. Landau, and Oren Weimann. Top Tree Compression of Tries. In 30th International Symposium on Algorithms and Computation (ISAAC 2019). Leibniz International Proceedings in Informatics (LIPIcs), Volume 149, pp. 4:1-4:18, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2019)

Copy BibTex To Clipboard

@InProceedings{bille_et_al:LIPIcs.ISAAC.2019.4,
  author =	{Bille, Philip and Gawrychowski, Pawe{\l} and G{\o}rtz, Inge Li and Landau, Gad M. and Weimann, Oren},
  title =	{{Top Tree Compression of Tries}},
  booktitle =	{30th International Symposium on Algorithms and Computation (ISAAC 2019)},
  pages =	{4:1--4:18},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-130-6},
  ISSN =	{1868-8969},
  year =	{2019},
  volume =	{149},
  editor =	{Lu, Pinyan and Zhang, Guochuan},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ISAAC.2019.4},
  URN =		{urn:nbn:de:0030-drops-115000},
  doi =		{10.4230/LIPIcs.ISAAC.2019.4},
  annote =	{Keywords: pattern matching, tree compression, top trees, pointer machine}
}

Document

DOI: 10.4230/LIPIcs.CPM.2019.16

Cartesian Tree Matching and Indexing

Authors: Sung Gwan Park, Amihood Amir, Gad M. Landau, and Kunsoo Park

Published in: LIPIcs, Volume 128, 30th Annual Symposium on Combinatorial Pattern Matching (CPM 2019)

Abstract

We introduce a new metric of match, called Cartesian tree matching, which means that two strings match if they have the same Cartesian trees. Based on Cartesian tree matching, we define single pattern matching for a text of length n and a pattern of length m, and multiple pattern matching for a text of length n and k patterns of total length m. We present an O(n+m) time algorithm for single pattern matching, and an O((n+m) log k) deterministic time or O(n+m) randomized time algorithm for multiple pattern matching. We also define an index data structure called Cartesian suffix tree, and present an O(n) randomized time algorithm to build the Cartesian suffix tree. Our efficient algorithms for Cartesian tree matching use a representation of the Cartesian tree, called the parent-distance representation.

Cite as

Sung Gwan Park, Amihood Amir, Gad M. Landau, and Kunsoo Park. Cartesian Tree Matching and Indexing. In 30th Annual Symposium on Combinatorial Pattern Matching (CPM 2019). Leibniz International Proceedings in Informatics (LIPIcs), Volume 128, pp. 16:1-16:14, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2019)

Copy BibTex To Clipboard

@InProceedings{park_et_al:LIPIcs.CPM.2019.16,
  author =	{Park, Sung Gwan and Amir, Amihood and Landau, Gad M. and Park, Kunsoo},
  title =	{{Cartesian Tree Matching and Indexing}},
  booktitle =	{30th Annual Symposium on Combinatorial Pattern Matching (CPM 2019)},
  pages =	{16:1--16:14},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-103-0},
  ISSN =	{1868-8969},
  year =	{2019},
  volume =	{128},
  editor =	{Pisanti, Nadia and P. Pissis, Solon},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2019.16},
  URN =		{urn:nbn:de:0030-drops-104879},
  doi =		{10.4230/LIPIcs.CPM.2019.16},
  annote =	{Keywords: Cartesian tree matching, Pattern matching, Indexing, Parent-distance representation}
}

Document

DOI: 10.4230/LIPIcs.MFCS.2018.66

Fast Entropy-Bounded String Dictionary Look-Up with Mismatches

Authors: Pawel Gawrychowski, Gad M. Landau, and Tatiana Starikovskaya

Published in: LIPIcs, Volume 117, 43rd International Symposium on Mathematical Foundations of Computer Science (MFCS 2018)

Abstract

We revisit the fundamental problem of dictionary look-up with mismatches. Given a set (dictionary) of d strings of length m and an integer k, we must preprocess it into a data structure to answer the following queries: Given a query string Q of length m, find all strings in the dictionary that are at Hamming distance at most k from Q. Chan and Lewenstein (CPM 2015) showed a data structure for k = 1 with optimal query time O(m/w + occ), where w is the size of a machine word and occ is the size of the output. The data structure occupies O(w d log^{1+epsilon} d) extra bits of space (beyond the entropy-bounded space required to store the dictionary strings). In this work we give a solution with similar bounds for a much wider range of values k. Namely, we give a data structure that has O(m/w + log^k d + occ) query time and uses O(w d log^k d) extra bits of space.

Cite as

Pawel Gawrychowski, Gad M. Landau, and Tatiana Starikovskaya. Fast Entropy-Bounded String Dictionary Look-Up with Mismatches. In 43rd International Symposium on Mathematical Foundations of Computer Science (MFCS 2018). Leibniz International Proceedings in Informatics (LIPIcs), Volume 117, pp. 66:1-66:15, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2018)

Copy BibTex To Clipboard

@InProceedings{gawrychowski_et_al:LIPIcs.MFCS.2018.66,
  author =	{Gawrychowski, Pawel and Landau, Gad M. and Starikovskaya, Tatiana},
  title =	{{Fast Entropy-Bounded String Dictionary Look-Up with Mismatches}},
  booktitle =	{43rd International Symposium on Mathematical Foundations of Computer Science (MFCS 2018)},
  pages =	{66:1--66:15},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-086-6},
  ISSN =	{1868-8969},
  year =	{2018},
  volume =	{117},
  editor =	{Potapov, Igor and Spirakis, Paul and Worrell, James},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.MFCS.2018.66},
  URN =		{urn:nbn:de:0030-drops-96486},
  doi =		{10.4230/LIPIcs.MFCS.2018.66},
  annote =	{Keywords: Dictionary look-up, Hamming distance, compact data structures}
}

Document

DOI: 10.4230/LIPIcs.ESA.2018.2

Two-Dimensional Maximal Repetitions

Authors: Amihood Amir, Gad M. Landau, Shoshana Marcus, and Dina Sokol

Published in: LIPIcs, Volume 112, 26th Annual European Symposium on Algorithms (ESA 2018)

Abstract

Maximal repetitions or runs in strings have a wide array of applications and thus have been extensively studied. In this paper, we extend this notion to 2-dimensions, precisely defining a maximal 2D repetition. We provide initial bounds on the number of maximal 2D repetitions that can occur in a matrix. The main contribution of this paper is the presentation of the first algorithm for locating all maximal 2D repetitions in a matrix. The algorithm is efficient and straightforward, with runtime O(n^2 log n log log n+ rho log n), where n^2 is the size of the input, and rho is the number of 2D repetitions in the output.

Cite as

Amihood Amir, Gad M. Landau, Shoshana Marcus, and Dina Sokol. Two-Dimensional Maximal Repetitions. In 26th Annual European Symposium on Algorithms (ESA 2018). Leibniz International Proceedings in Informatics (LIPIcs), Volume 112, pp. 2:1-2:14, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2018)

Copy BibTex To Clipboard

@InProceedings{amir_et_al:LIPIcs.ESA.2018.2,
  author =	{Amir, Amihood and Landau, Gad M. and Marcus, Shoshana and Sokol, Dina},
  title =	{{Two-Dimensional Maximal Repetitions}},
  booktitle =	{26th Annual European Symposium on Algorithms (ESA 2018)},
  pages =	{2:1--2:14},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-081-1},
  ISSN =	{1868-8969},
  year =	{2018},
  volume =	{112},
  editor =	{Azar, Yossi and Bast, Hannah and Herman, Grzegorz},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ESA.2018.2},
  URN =		{urn:nbn:de:0030-drops-94652},
  doi =		{10.4230/LIPIcs.ESA.2018.2},
  annote =	{Keywords: pattern matching algorithms, repetitions, periodicity, two-dimensional}
}

Document

DOI: 10.4230/LIPIcs.ICALP.2018.63

A Faster Construction of Greedy Consensus Trees

Authors: Pawel Gawrychowski, Gad M. Landau, Wing-Kin Sung, and Oren Weimann

Published in: LIPIcs, Volume 107, 45th International Colloquium on Automata, Languages, and Programming (ICALP 2018)

Abstract

A consensus tree is a phylogenetic tree that captures the similarity between a set of conflicting phylogenetic trees. The problem of computing a consensus tree is a major step in phylogenetic tree reconstruction. It is also central for predicting a species tree from a set of gene trees, as indicated recently in [Nature 2013]. This paper focuses on two of the most well-known and widely used consensus tree methods: the greedy consensus tree and the frequency difference consensus tree. Given k conflicting trees each with n leaves, the previous fastest algorithms for these problems were O(k n^2) for the greedy consensus tree [J. ACM 2016] and O~(min{k n^2, k^2n}) for the frequency difference consensus tree [ACM TCBB 2016]. We improve these running times to O~(k n^{1.5}) and O~(k n) respectively.

Cite as

Pawel Gawrychowski, Gad M. Landau, Wing-Kin Sung, and Oren Weimann. A Faster Construction of Greedy Consensus Trees. In 45th International Colloquium on Automata, Languages, and Programming (ICALP 2018). Leibniz International Proceedings in Informatics (LIPIcs), Volume 107, pp. 63:1-63:14, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2018)

Copy BibTex To Clipboard

@InProceedings{gawrychowski_et_al:LIPIcs.ICALP.2018.63,
  author =	{Gawrychowski, Pawel and Landau, Gad M. and Sung, Wing-Kin and Weimann, Oren},
  title =	{{A Faster Construction of Greedy Consensus Trees}},
  booktitle =	{45th International Colloquium on Automata, Languages, and Programming (ICALP 2018)},
  pages =	{63:1--63:14},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-076-7},
  ISSN =	{1868-8969},
  year =	{2018},
  volume =	{107},
  editor =	{Chatzigiannakis, Ioannis and Kaklamanis, Christos and Marx, D\'{a}niel and Sannella, Donald},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICALP.2018.63},
  URN =		{urn:nbn:de:0030-drops-90676},
  doi =		{10.4230/LIPIcs.ICALP.2018.63},
  annote =	{Keywords: phylogenetic trees, greedy consensus trees, dynamic trees}
}

Document

DOI: 10.4230/LIPIcs.CPM.2016.25

The Nearest Colored Node in a Tree

Authors: Pawel Gawrychowski, Gad M. Landau, Shay Mozes, and Oren Weimann

Published in: LIPIcs, Volume 54, 27th Annual Symposium on Combinatorial Pattern Matching (CPM 2016)

Abstract

We start a systematic study of data structures for the nearest colored node problem on trees. Given a tree with colored nodes and weighted edges, we want to answer queries (v,c) asking for the nearest node to node v that has color c. This is a natural generalization of the well-known nearest marked ancestor problem. We give an O(n)-space O(log log n)-query solution and show that this is optimal. We also consider the dynamic case where updates can change a node's color and show that in O(n) space we can support both updates and queries in O(log n) time. We complement this by showing that O(polylog n) update time implies Omega(log n \ log log n) query time. Finally, we consider the case where updates can change the edges of the tree (link-cut operations). There is a known (top-tree based) solution that requires update time that is roughly linear in the number of colors. We show that this solution is probably optimal by showing that a strictly sublinear update time implies a strictly subcubic time algorithm for the classical all pairs shortest paths problem on a general graph. We also consider versions where the tree is rooted, and the query asks for the nearest ancestor/descendant of node v that has color c, and present efficient data structures for both variants in the static and the dynamic setting.

Cite as

Pawel Gawrychowski, Gad M. Landau, Shay Mozes, and Oren Weimann. The Nearest Colored Node in a Tree. In 27th Annual Symposium on Combinatorial Pattern Matching (CPM 2016). Leibniz International Proceedings in Informatics (LIPIcs), Volume 54, pp. 25:1-25:12, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2016)

Copy BibTex To Clipboard

@InProceedings{gawrychowski_et_al:LIPIcs.CPM.2016.25,
  author =	{Gawrychowski, Pawel and Landau, Gad M. and Mozes, Shay and Weimann, Oren},
  title =	{{The Nearest Colored Node in a Tree}},
  booktitle =	{27th Annual Symposium on Combinatorial Pattern Matching (CPM 2016)},
  pages =	{25:1--25:12},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-012-5},
  ISSN =	{1868-8969},
  year =	{2016},
  volume =	{54},
  editor =	{Grossi, Roberto and Lewenstein, Moshe},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2016.25},
  URN =		{urn:nbn:de:0030-drops-60674},
  doi =		{10.4230/LIPIcs.CPM.2016.25},
  annote =	{Keywords: Marked ancestor, Vertex-label distance oracles, Nearest colored descend- ant, Top-trees}
}

Document

DOI: 10.4230/LIPIcs.STACS.2009.1804

A Unified Algorithm for Accelerating Edit-Distance Computation via Text-Compression

Authors: Danny Hermelin, Gad M. Landau, Shir Landau, and Oren Weimann

Published in: LIPIcs, Volume 3, 26th International Symposium on Theoretical Aspects of Computer Science (2009)

Abstract

The edit distance problem is a classical fundamental problem in computer science in general, and in combinatorial pattern matching in particular. The standard dynamic-programming solution for this problem computes the edit-distance between a pair of strings of total length $O(N)$ in $O(N^2)$ time. To this date, this quadratic upper-bound has never been substantially improved for general strings. However, there are known techniques for breaking this bound in case the strings are known to compress well under a particular compression scheme. The basic idea is to first compress the strings, and then to compute the edit distance between the compressed strings. As it turns out, practically all known $o(N^2)$ edit-distance algorithms work, in some sense, under the same paradigm described above. It is therefore natural to ask whether there is a single edit-distance algorithm that works for strings which are compressed under any compression scheme. A rephrasing of this question is to ask whether a single algorithm can exploit the compressibility properties of strings under any compression method, even if each string is compressed using a different compression. In this paper we set out to answer this question by using \emph{straight-line programs}. These provide a generic platform for representing many popular compression schemes including the LZ-family, Run-Length Encoding, Byte-Pair Encoding, and dictionary methods. For two strings of total length $N$ having straight-line program representations of total size $n$, we present an algorithm running in $O(n^{1.4}N^{1.2})$ time for computing the edit-distance of these two strings under any rational scoring function, and an $O(n^{1.34}N^{1.34})$-time algorithm for arbitrary scoring functions. This improves on a recent algorithm of Tiskin that runs in $O(nN^{1.5})$ time, and works only for rational scoring functions.

Cite as

Danny Hermelin, Gad M. Landau, Shir Landau, and Oren Weimann. A Unified Algorithm for Accelerating Edit-Distance Computation via Text-Compression. In 26th International Symposium on Theoretical Aspects of Computer Science. Leibniz International Proceedings in Informatics (LIPIcs), Volume 3, pp. 529-540, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2009)

Copy BibTex To Clipboard

@InProceedings{hermelin_et_al:LIPIcs.STACS.2009.1804,
  author =	{Hermelin, Danny and Landau, Gad M. and Landau, Shir and Weimann, Oren},
  title =	{{A Unified Algorithm for Accelerating Edit-Distance Computation via Text-Compression}},
  booktitle =	{26th International Symposium on Theoretical Aspects of Computer Science},
  pages =	{529--540},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-939897-09-5},
  ISSN =	{1868-8969},
  year =	{2009},
  volume =	{3},
  editor =	{Albers, Susanne and Marion, Jean-Yves},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.STACS.2009.1804},
  URN =		{urn:nbn:de:0030-drops-18040},
  doi =		{10.4230/LIPIcs.STACS.2009.1804},
  annote =	{Keywords: Edit distance, Straight-line Programs, Dynamic programming acceleration via compression, Combinatorial pattern matching}
}

Search Results

Documents authored by Landau, Gad M.

Text Indexing for Simple Regular Expressions

Abstract

Cite as

Order-Preserving Squares in Strings

Abstract

Cite as

Double String Tandem Repeats

Abstract

Cite as

On Indeterminate Strings Matching

Abstract

Cite as

Top Tree Compression of Tries

Abstract

Cite as

Cartesian Tree Matching and Indexing

Abstract

Cite as

Fast Entropy-Bounded String Dictionary Look-Up with Mismatches

Abstract

Cite as

Two-Dimensional Maximal Repetitions

Abstract

Cite as

A Faster Construction of Greedy Consensus Trees

Abstract

Cite as

The Nearest Colored Node in a Tree

Abstract

Cite as

A Unified Algorithm for Accelerating Edit-Distance Computation via Text-Compression

Abstract

Cite as

Thanks for your feedback!

Could not send message