DROPS

Document

Invited Talk

DOI: 10.4230/LIPIcs.CPM.2022.3

Compact Text Indexing for Advanced Pattern Matching Problems: Parameterized, Order-Isomorphic, 2D, etc. (Invited Talk)

Authors: Sharma V. Thankachan

Published in: LIPIcs, Volume 223, 33rd Annual Symposium on Combinatorial Pattern Matching (CPM 2022)

Abstract

In the past two decades, we have witnessed the design of various compact data structures for pattern matching over an indexed text [Navarro, 2016]. Popular indexes like the FM-index [Paolo Ferragina and Giovanni Manzini, 2005], compressed suffix arrays/trees [Roberto Grossi and Jeffrey Scott Vitter, 2005; Kunihiko Sadakane, 2007], the recent r-index [Travis Gagie et al., 2020; Takaaki Nishimoto and Yasuo Tabei, 2021], etc., capture the key functionalities of classic suffix arrays/trees [Udi Manber and Eugene W. Myers, 1993; Peter Weiner, 1973] in compact space. Mostly, they rely on the Burrows-Wheeler Transform (BWT) and its associated operations [Burrows and Wheeler, 1994]. However, compactly encoding some advanced suffix tree (ST) variants, like parameterized ST [Brenda S. Baker, 1993; S. Rao Kosaraju, 1995; Juan Mendivelso et al., 2020], order-isomorphic/preserving ST [Maxime Crochemore et al., 2016], two-dimensional ST [Raffaele Giancarlo, 1995; Dong Kyue Kim et al., 1998], etc. [Sung Gwan Park et al., 2019; Tetsuo Shibuya, 2000]- collectively known as suffix trees with missing suffix links [Richard Cole and Ramesh Hariharan, 2003], has been challenging. The previous techniques are not easily extendable because these variants do not hold some structural properties of the standard ST that enable compression. However, some limited progress has been made in these directions recently [Arnab Ganguly et al., 2017; Travis Gagie et al., 2017; Gianni Decaroli et al., 2017; Dhrumil Patel and Rahul Shah, 2021; Arnab Ganguly et al., 2021; Sung{-}Hwan Kim and Hwan{-}Gue Cho, 2021; Sung{-}Hwan Kim and Hwan{-}Gue Cho, 2021; Arnab Ganguly et al., 2017; Arnab Ganguly et al., 2022; Arnab Ganguly et al., 2021]. This talk will briefly survey them and highlight some interesting open problems.

Cite as

Sharma V. Thankachan. Compact Text Indexing for Advanced Pattern Matching Problems: Parameterized, Order-Isomorphic, 2D, etc. (Invited Talk). In 33rd Annual Symposium on Combinatorial Pattern Matching (CPM 2022). Leibniz International Proceedings in Informatics (LIPIcs), Volume 223, pp. 3:1-3:3, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2022)

Copy BibTex To Clipboard

@InProceedings{thankachan:LIPIcs.CPM.2022.3,
  author =	{Thankachan, Sharma V.},
  title =	{{Compact Text Indexing for Advanced Pattern Matching Problems: Parameterized, Order-Isomorphic, 2D, etc.}},
  booktitle =	{33rd Annual Symposium on Combinatorial Pattern Matching (CPM 2022)},
  pages =	{3:1--3:3},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-234-1},
  ISSN =	{1868-8969},
  year =	{2022},
  volume =	{223},
  editor =	{Bannai, Hideo and Holub, Jan},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2022.3},
  URN =		{urn:nbn:de:0030-drops-161300},
  doi =		{10.4230/LIPIcs.CPM.2022.3},
  annote =	{Keywords: Text Indexing, Suffix Trees, String Matching}
}

Document

DOI: 10.4230/LIPIcs.CPM.2022.13

Back-To-Front Online Lyndon Forest Construction

Authors: Golnaz Badkobeh, Maxime Crochemore, Jonas Ellert, and Cyril Nicaud

Published in: LIPIcs, Volume 223, 33rd Annual Symposium on Combinatorial Pattern Matching (CPM 2022)

Abstract

A Lyndon word is a word that is lexicographically smaller than all of its non-trivial rotations (e.g. ananas is a Lyndon word; banana is not a Lyndon word due to its smaller rotation abanan). The Lyndon forest (or equivalently Lyndon table) identifies maximal Lyndon factors of a word, and is of great combinatoric interest, e.g. when finding maximal repetitions in words. While optimal linear time algorithms for computing the Lyndon forest are known, none of them work in an online manner. We present algorithms that compute the Lyndon forest of a word in a reverse online manner, processing the input word from back to front. We assume a general ordered alphabet, i.e. the only elementary operations on symbols are comparisons of the form less-equal-greater. We start with a naive algorithm and show that, despite its quadratic worst-case behaviour, it already takes expected linear time on words drawn uniformly at random. We then introduce a much more sophisticated algorithm that takes linear time in the worst case. It borrows some ideas from the offline algorithm by Bille et al. (ICALP 2020), combined with new techniques that are necessary for the reverse online setting. While the back-to-front approach for this computation is rather natural (see Franek and Liut, PSC 2019), the steps required to achieve linear time are surprisingly intricate. We envision that our algorithm will be useful for the online computation of maximal repetitions in words.

Cite as

Golnaz Badkobeh, Maxime Crochemore, Jonas Ellert, and Cyril Nicaud. Back-To-Front Online Lyndon Forest Construction. In 33rd Annual Symposium on Combinatorial Pattern Matching (CPM 2022). Leibniz International Proceedings in Informatics (LIPIcs), Volume 223, pp. 13:1-13:23, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2022)

Copy BibTex To Clipboard

@InProceedings{badkobeh_et_al:LIPIcs.CPM.2022.13,
  author =	{Badkobeh, Golnaz and Crochemore, Maxime and Ellert, Jonas and Nicaud, Cyril},
  title =	{{Back-To-Front Online Lyndon Forest Construction}},
  booktitle =	{33rd Annual Symposium on Combinatorial Pattern Matching (CPM 2022)},
  pages =	{13:1--13:23},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-234-1},
  ISSN =	{1868-8969},
  year =	{2022},
  volume =	{223},
  editor =	{Bannai, Hideo and Holub, Jan},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2022.13},
  URN =		{urn:nbn:de:0030-drops-161404},
  doi =		{10.4230/LIPIcs.CPM.2022.13},
  annote =	{Keywords: Lyndon factorisation, Lyndon forest, Lyndon table, Lyndon array, right Lyndon tree, Cartesian tree, standard factorisation, online algorithms}
}

Document

DOI: 10.4230/LIPIcs.CPM.2022.22

Linear-Time Computation of Shortest Covers of All Rotations of a String

Authors: Maxime Crochemore, Costas S. Iliopoulos, Jakub Radoszewski, Wojciech Rytter, Juliusz Straszyński, Tomasz Waleń, and Wiktor Zuba

Published in: LIPIcs, Volume 223, 33rd Annual Symposium on Combinatorial Pattern Matching (CPM 2022)

Abstract

We show that lengths of shortest covers of all rotations of a length-n string over an integer alphabet can be computed in 𝒪(n) time in the word-RAM model, thus improving an 𝒪(n log n)-time algorithm from Crochemore et al. (Theor. Comput. Sci., 2021). Similarly as Crochemore et al., we use a relation of covers of rotations of a string S to seeds and squares in S³. The crucial parameter of a string S is the number ξ(S) of primitive covers of all rotations of S. We show first that the time complexity of the algorithm from Crochemore et al. can be slightly improved which results in time complexity Θ(ξ(S)). However, we also show that in the worst case ξ(S) is Ω(|S|log |S|). This is the main difficulty in obtaining a linear time algorithm. We overcome it and obtain yet another application of runs in strings.

Cite as

Maxime Crochemore, Costas S. Iliopoulos, Jakub Radoszewski, Wojciech Rytter, Juliusz Straszyński, Tomasz Waleń, and Wiktor Zuba. Linear-Time Computation of Shortest Covers of All Rotations of a String. In 33rd Annual Symposium on Combinatorial Pattern Matching (CPM 2022). Leibniz International Proceedings in Informatics (LIPIcs), Volume 223, pp. 22:1-22:15, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2022)

Copy BibTex To Clipboard

@InProceedings{crochemore_et_al:LIPIcs.CPM.2022.22,
  author =	{Crochemore, Maxime and Iliopoulos, Costas S. and Radoszewski, Jakub and Rytter, Wojciech and Straszy\'{n}ski, Juliusz and Wale\'{n}, Tomasz and Zuba, Wiktor},
  title =	{{Linear-Time Computation of Shortest Covers of All Rotations of a String}},
  booktitle =	{33rd Annual Symposium on Combinatorial Pattern Matching (CPM 2022)},
  pages =	{22:1--22:15},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-234-1},
  ISSN =	{1868-8969},
  year =	{2022},
  volume =	{223},
  editor =	{Bannai, Hideo and Holub, Jan},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2022.22},
  URN =		{urn:nbn:de:0030-drops-161495},
  doi =		{10.4230/LIPIcs.CPM.2022.22},
  annote =	{Keywords: cover, quasiperiod, cyclic rotation, seed, run}
}

Document

DOI: 10.4230/LIPIcs.CPM.2021.9

Constructing Strings Avoiding Forbidden Substrings

Authors: Giulia Bernardini, Alberto Marchetti-Spaccamela, Solon P. Pissis, Leen Stougie, and Michelle Sweering

Published in: LIPIcs, Volume 191, 32nd Annual Symposium on Combinatorial Pattern Matching (CPM 2021)

Abstract

We consider the problem of constructing strings over an alphabet Σ that start with a given prefix u, end with a given suffix v, and avoid occurrences of a given set of forbidden substrings. In the decision version of the problem, given a set S_k of forbidden substrings, each of length k, over Σ, we are asked to decide whether there exists a string x over Σ such that u is a prefix of x, v is a suffix of x, and no s ∈ S_k occurs in x. Our first result is an 𝒪(|u|+|v|+k|S_k|)-time algorithm to decide this problem. In the more general optimization version of the problem, given a set S of forbidden arbitrary-length substrings over Σ, we are asked to construct a shortest string x over Σ such that u is a prefix of x, v is a suffix of x, and no s ∈ S occurs in x. Our second result is an 𝒪(|u|+|v|+||S||⋅|Σ|)-time algorithm to solve this problem, where ||S|| denotes the total length of the elements of S. Interestingly, our results can be directly applied to solve the reachability and shortest path problems in complete de Bruijn graphs in the presence of forbidden edges or of forbidden paths. Our algorithms are motivated by data privacy, and in particular, by the data sanitization process. In the context of strings, sanitization consists in hiding forbidden substrings from a given string by introducing the least amount of spurious information. We consider the following problem. Given a string w of length n over Σ, an integer k, and a set S_k of forbidden substrings, each of length k, over Σ, construct a shortest string y over Σ such that no s ∈ S_k occurs in y and the sequence of all other length-k fragments occurring in w is a subsequence of the sequence of the length-k fragments occurring in y. Our third result is an 𝒪(nk|S_k|⋅|Σ|)-time algorithm to solve this problem.

Cite as

Giulia Bernardini, Alberto Marchetti-Spaccamela, Solon P. Pissis, Leen Stougie, and Michelle Sweering. Constructing Strings Avoiding Forbidden Substrings. In 32nd Annual Symposium on Combinatorial Pattern Matching (CPM 2021). Leibniz International Proceedings in Informatics (LIPIcs), Volume 191, pp. 9:1-9:18, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2021)

Copy BibTex To Clipboard

@InProceedings{bernardini_et_al:LIPIcs.CPM.2021.9,
  author =	{Bernardini, Giulia and Marchetti-Spaccamela, Alberto and Pissis, Solon P. and Stougie, Leen and Sweering, Michelle},
  title =	{{Constructing Strings Avoiding Forbidden Substrings}},
  booktitle =	{32nd Annual Symposium on Combinatorial Pattern Matching (CPM 2021)},
  pages =	{9:1--9:18},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-186-3},
  ISSN =	{1868-8969},
  year =	{2021},
  volume =	{191},
  editor =	{Gawrychowski, Pawe{\l} and Starikovskaya, Tatiana},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2021.9},
  URN =		{urn:nbn:de:0030-drops-139604},
  doi =		{10.4230/LIPIcs.CPM.2021.9},
  annote =	{Keywords: string algorithms, forbidden strings, de Bruijn graphs, data sanitization}
}

Document

DOI: 10.4230/LIPIcs.CPM.2019.25

Quasi-Linear-Time Algorithm for Longest Common Circular Factor

Authors: Mai Alzamel, Maxime Crochemore, Costas S. Iliopoulos, Tomasz Kociumaka, Jakub Radoszewski, Wojciech Rytter, Juliusz Straszyński, Tomasz Waleń, and Wiktor Zuba

Published in: LIPIcs, Volume 128, 30th Annual Symposium on Combinatorial Pattern Matching (CPM 2019)

Abstract

We introduce the Longest Common Circular Factor (LCCF) problem in which, given strings S and T of length at most n, we are to compute the longest factor of S whose cyclic shift occurs as a factor of T. It is a new similarity measure, an extension of the classic Longest Common Factor. We show how to solve the LCCF problem in O(n log^4 n) time using O(n log^2 n) space.

Cite as

Mai Alzamel, Maxime Crochemore, Costas S. Iliopoulos, Tomasz Kociumaka, Jakub Radoszewski, Wojciech Rytter, Juliusz Straszyński, Tomasz Waleń, and Wiktor Zuba. Quasi-Linear-Time Algorithm for Longest Common Circular Factor. In 30th Annual Symposium on Combinatorial Pattern Matching (CPM 2019). Leibniz International Proceedings in Informatics (LIPIcs), Volume 128, pp. 25:1-25:14, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2019)

Copy BibTex To Clipboard

@InProceedings{alzamel_et_al:LIPIcs.CPM.2019.25,
  author =	{Alzamel, Mai and Crochemore, Maxime and Iliopoulos, Costas S. and Kociumaka, Tomasz and Radoszewski, Jakub and Rytter, Wojciech and Straszy\'{n}ski, Juliusz and Wale\'{n}, Tomasz and Zuba, Wiktor},
  title =	{{Quasi-Linear-Time Algorithm for Longest Common Circular Factor}},
  booktitle =	{30th Annual Symposium on Combinatorial Pattern Matching (CPM 2019)},
  pages =	{25:1--25:14},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-103-0},
  ISSN =	{1868-8969},
  year =	{2019},
  volume =	{128},
  editor =	{Pisanti, Nadia and P. Pissis, Solon},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2019.25},
  URN =		{urn:nbn:de:0030-drops-104961},
  doi =		{10.4230/LIPIcs.CPM.2019.25},
  annote =	{Keywords: longest common factor, circular pattern matching, internal pattern matching, intersection of hyperrectangles}
}

@InProceedings{alzamel_et_al:LIPIcs.CPM.2019.25,
  author =	{Alzamel, Mai and Crochemore, Maxime and Iliopoulos, Costas S. and Kociumaka, Tomasz and Radoszewski, Jakub and Rytter, Wojciech and Straszy\'{n}ski, Juliusz and Wale\'{n}, Tomasz and Zuba, Wiktor},
  title =	{{Quasi-Linear-Time Algorithm for Longest Common Circular Factor}},
  booktitle =	{30th Annual Symposium on Combinatorial Pattern Matching (CPM 2019)},
  pages =	{25:1--25:14},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-103-0},
  ISSN =	{1868-8969},
  year =	{2019},
  volume =	{128},
  editor =	{Pisanti, Nadia and P. Pissis, Solon},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2019.25},
  URN =		{urn:nbn:de:0030-drops-104961},
  doi =		{10.4230/LIPIcs.CPM.2019.25},
  annote =	{Keywords: longest common factor, circular pattern matching, internal pattern matching, intersection of hyperrectangles}
}

Document

DOI: 10.4230/LIPIcs.CPM.2018.23

Linear-Time Algorithm for Long LCF with k Mismatches

Authors: Panagiotis Charalampopoulos, Maxime Crochemore, Costas S. Iliopoulos, Tomasz Kociumaka, Solon P. Pissis, Jakub Radoszewski, Wojciech Rytter, and Tomasz Walen

Published in: LIPIcs, Volume 105, 29th Annual Symposium on Combinatorial Pattern Matching (CPM 2018)

Abstract

In the Longest Common Factor with k Mismatches (LCF_k) problem, we are given two strings X and Y of total length n, and we are asked to find a pair of maximal-length factors, one of X and the other of Y, such that their Hamming distance is at most k. Thankachan et al. [Thankachan et al. 2016] show that this problem can be solved in O(n log^k n) time and O(n) space for constant k. We consider the LCF_k(l) problem in which we assume that the sought factors have length at least l. We use difference covers to reduce the LCF_k(l) problem with l=Omega(log^{2k+2}n) to a task involving m=O(n/log^{k+1}n) synchronized factors. The latter can be solved in O(m log^{k+1}m) time, which results in a linear-time algorithm for LCF_k(l) with l=Omega(log^{2k+2}n). In general, our solution to the LCF_k(l) problem for arbitrary l takes O(n + n log^{k+1} n/sqrt{l}) time.

Cite as

Panagiotis Charalampopoulos, Maxime Crochemore, Costas S. Iliopoulos, Tomasz Kociumaka, Solon P. Pissis, Jakub Radoszewski, Wojciech Rytter, and Tomasz Walen. Linear-Time Algorithm for Long LCF with k Mismatches. In 29th Annual Symposium on Combinatorial Pattern Matching (CPM 2018). Leibniz International Proceedings in Informatics (LIPIcs), Volume 105, pp. 23:1-23:16, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2018)

Copy BibTex To Clipboard

@InProceedings{charalampopoulos_et_al:LIPIcs.CPM.2018.23,
  author =	{Charalampopoulos, Panagiotis and Crochemore, Maxime and Iliopoulos, Costas S. and Kociumaka, Tomasz and Pissis, Solon P. and Radoszewski, Jakub and Rytter, Wojciech and Walen, Tomasz},
  title =	{{Linear-Time Algorithm for Long LCF with k Mismatches}},
  booktitle =	{29th Annual Symposium on Combinatorial Pattern Matching (CPM 2018)},
  pages =	{23:1--23:16},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-074-3},
  ISSN =	{1868-8969},
  year =	{2018},
  volume =	{105},
  editor =	{Navarro, Gonzalo and Sankoff, David and Zhu, Binhai},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2018.23},
  URN =		{urn:nbn:de:0030-drops-86869},
  doi =		{10.4230/LIPIcs.CPM.2018.23},
  annote =	{Keywords: longest common factor, longest common substring, Hamming distance, heavy-light decomposition, difference cover}
}

@InProceedings{charalampopoulos_et_al:LIPIcs.CPM.2018.23,
  author =	{Charalampopoulos, Panagiotis and Crochemore, Maxime and Iliopoulos, Costas S. and Kociumaka, Tomasz and Pissis, Solon P. and Radoszewski, Jakub and Rytter, Wojciech and Walen, Tomasz},
  title =	{{Linear-Time Algorithm for Long LCF with k Mismatches}},
  booktitle =	{29th Annual Symposium on Combinatorial Pattern Matching (CPM 2018)},
  pages =	{23:1--23:16},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-074-3},
  ISSN =	{1868-8969},
  year =	{2018},
  volume =	{105},
  editor =	{Navarro, Gonzalo and Sankoff, David and Zhu, Binhai},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2018.23},
  URN =		{urn:nbn:de:0030-drops-86869},
  doi =		{10.4230/LIPIcs.CPM.2018.23},
  annote =	{Keywords: longest common factor, longest common substring, Hamming distance, heavy-light decomposition, difference cover}
}

Document

DOI: 10.4230/LIPIcs.WABI.2017.9

Towards Distance-Based Phylogenetic Inference in Average-Case Linear-Time

Authors: Maxime Crochemore, Alexandre P. Francisco, Solon P. Pissis, and Cátia Vaz

Published in: LIPIcs, Volume 88, 17th International Workshop on Algorithms in Bioinformatics (WABI 2017)

Abstract

Computing genetic evolution distances among a set of taxa dominates the running time of many phylogenetic inference methods. Most of genetic evolution distance definitions rely, even if indirectly, on computing the pairwise Hamming distance among sequences or profiles. We propose here an average-case linear-time algorithm to compute pairwise Hamming distances among a set of taxa under a given Hamming distance threshold. This article includes both a theoretical analysis and extensive experimental results concerning the proposed algorithm. We further show how this algorithm can be successfully integrated into a well known phylogenetic inference method.

Cite as

Maxime Crochemore, Alexandre P. Francisco, Solon P. Pissis, and Cátia Vaz. Towards Distance-Based Phylogenetic Inference in Average-Case Linear-Time. In 17th International Workshop on Algorithms in Bioinformatics (WABI 2017). Leibniz International Proceedings in Informatics (LIPIcs), Volume 88, pp. 9:1-9:14, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2017)

Copy BibTex To Clipboard

@InProceedings{crochemore_et_al:LIPIcs.WABI.2017.9,
  author =	{Crochemore, Maxime and Francisco, Alexandre P. and Pissis, Solon P. and Vaz, C\'{a}tia},
  title =	{{Towards Distance-Based Phylogenetic Inference in Average-Case Linear-Time}},
  booktitle =	{17th International Workshop on Algorithms in Bioinformatics (WABI 2017)},
  pages =	{9:1--9:14},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-050-7},
  ISSN =	{1868-8969},
  year =	{2017},
  volume =	{88},
  editor =	{Schwartz, Russell and Reinert, Knut},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.WABI.2017.9},
  URN =		{urn:nbn:de:0030-drops-76529},
  doi =		{10.4230/LIPIcs.WABI.2017.9},
  annote =	{Keywords: computational biology, phylogenetic inference, Hamming distance}
}

Document

DOI: 10.4230/DagRep.4.3.28

Combinatorics and Algorithmics of Strings (Dagstuhl Seminar 14111)

Authors: Maxime Crochemore, James D. Currie, Gregory Kucherov, and Dirk Nowotka

Published in: Dagstuhl Reports, Volume 4, Issue 3 (2014)

Abstract

Strings (aka sequences or words) form the most basic and natural data structure. They occur whenever information is electronically transmitted (as bit streams), when natural language text is spoken or written down (as words over, for example, the Latin alphabet), in the process of heredity transmission in living cells (through DNA sequences) or the protein synthesis (as sequence of amino acids), and in many more different contexts. Given this universal form of representing information, the need to process strings is apparent and is actually a core purpose of computer use. Algorithms to efficiently search through, analyze, (de-)compress, match, encode and decode strings are therefore of chief interest. Combinatorial problems about strings lie at the core of such algorithmic questions. Many such combinatorial problems are common in the string processing efforts in the different fields of application. The purpose of this seminar is to bring together researchers from different disciplines whose interests are string processing algorithms and related combinatorial problems on words. The two main areas of interest for this seminar are Combinatorics on Words and Stringology. This report documents the program and the outcomes of Dagstuhl Seminar 14111 "Combinatorics and Algorithmics of Strings".

Cite as

Maxime Crochemore, James D. Currie, Gregory Kucherov, and Dirk Nowotka. Combinatorics and Algorithmics of Strings (Dagstuhl Seminar 14111). In Dagstuhl Reports, Volume 4, Issue 3, pp. 28-46, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2014)

Copy BibTex To Clipboard

@Article{crochemore_et_al:DagRep.4.3.28,
  author =	{Crochemore, Maxime and Currie, James D. and Kucherov, Gregory and Nowotka, Dirk},
  title =	{{Combinatorics and Algorithmics of Strings (Dagstuhl Seminar 14111)}},
  pages =	{28--46},
  journal =	{Dagstuhl Reports},
  ISSN =	{2192-5283},
  year =	{2014},
  volume =	{4},
  number =	{3},
  editor =	{Crochemore, Maxime and Currie, James D. and Kucherov, Gregory and Nowotka, Dirk},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/DagRep.4.3.28},
  URN =		{urn:nbn:de:0030-drops-45524},
  doi =		{10.4230/DagRep.4.3.28},
  annote =	{Keywords: combinatorics on words, string algorithms, automata}
}

Document

DOI: 10.4230/DagRep.1.2.47

Combinatorial and Algorithmic Aspects of Sequence Processing (Dagstuhl Seminar 11081)

Authors: Maxime Crochemore, Lila Kari, Mehryar Mohri, and Dirk Nowotka

Published in: Dagstuhl Reports, Volume 1, Issue 2 (2011)

Abstract

Sequences form the most basic and natural data structure. They occur whenever information is electronically transmitted (as bit streams), when natural language text is spoken or written down (as words over, for example, the latin alphabet), in the process of heredity transmission in living cells (as DNA sequence) or the protein synthesis (as sequence of amino acids), and in many more different contexts. Given this universal form of representing information, the need to process strings is apparent and actually a core purpose of computer use. Algorithms to efficiently search through, analyze, (de-)compress, match, learn, and encode/decode strings are therefore of chief interest. Combinatorial problems about strings lie at the core of such algorithmic questions. Many such combinatorial problems are common in the string processing efforts in the different fields of application. Scientists working in the fields of Combinatorics on Words, Computational Biology, Stringology, Natural Computing, and Machine Learning were invited to consider the seminar's topic from a~wide range of perspectives. This report documents the program and the outcomes of Dagstuhl Seminar 11081 ``Combinatorial and Algorithmic Aspects of Sequence Processing''.

Cite as

Maxime Crochemore, Lila Kari, Mehryar Mohri, and Dirk Nowotka. Combinatorial and Algorithmic Aspects of Sequence Processing (Dagstuhl Seminar 11081). In Dagstuhl Reports, Volume 1, Issue 2, pp. 47-66, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2011)

Copy BibTex To Clipboard

@Article{crochemore_et_al:DagRep.1.2.47,
  author =	{Crochemore, Maxime and Kari, Lila and Mohri, Mehryar and Nowotka, Dirk},
  title =	{{Combinatorial and Algorithmic Aspects of Sequence Processing (Dagstuhl Seminar 11081)}},
  pages =	{47--66},
  journal =	{Dagstuhl Reports},
  ISSN =	{2192-5283},
  year =	{2011},
  volume =	{1},
  number =	{2},
  editor =	{Crochemore, Maxime and Kari, Lila and Mohri, Mehryar and Nowotka, Dirk},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/DagRep.1.2.47},
  URN =		{urn:nbn:de:0030-drops-31554},
  doi =		{10.4230/DagRep.1.2.47},
  annote =	{Keywords: Combinatorics on words, computational biology, stringology, natural computing, machine learning}
}

Document

DOI: 10.4230/LIPIcs.STACS.2009.1825

Reverse Engineering Prefix Tables

Authors: Julien Clement, Maxime Crochemore, and Giuseppina Rindone

Published in: LIPIcs, Volume 3, 26th International Symposium on Theoretical Aspects of Computer Science (2009)

Abstract

The Prefix table of a string reports for each position the maximal length of its prefixes starting here. The Prefix table and its dual Suffix table are basic tools used in the design of the most efficient string-matching and pattern extraction algorithms. These tables can be computed in linear time independently of the alphabet size. We give an algorithmic characterisation of a Prefix table (it can be adapted to a Suffix table). Namely, the algorithm tests if an integer table of size $n$ is the Prefix table of some word and, if successful, it constructs the lexicographically smallest string having it as a Prefix table. We show that the alphabet of the string can be bounded to $\log_2 n$ letters. The overall algorithm runs in $O(n)$ time.

Cite as

Julien Clement, Maxime Crochemore, and Giuseppina Rindone. Reverse Engineering Prefix Tables. In 26th International Symposium on Theoretical Aspects of Computer Science. Leibniz International Proceedings in Informatics (LIPIcs), Volume 3, pp. 289-300, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2009)

Copy BibTex To Clipboard

@InProceedings{clement_et_al:LIPIcs.STACS.2009.1825,
  author =	{Clement, Julien and Crochemore, Maxime and Rindone, Giuseppina},
  title =	{{Reverse Engineering Prefix Tables}},
  booktitle =	{26th International Symposium on Theoretical Aspects of Computer Science},
  pages =	{289--300},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-939897-09-5},
  ISSN =	{1868-8969},
  year =	{2009},
  volume =	{3},
  editor =	{Albers, Susanne and Marion, Jean-Yves},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.STACS.2009.1825},
  URN =		{urn:nbn:de:0030-drops-18258},
  doi =		{10.4230/LIPIcs.STACS.2009.1825},
  annote =	{Keywords: Design and analysis of algorithms, Algorithms on strings, Pattern matching, String matching, Combinatorics on words, Prefix table, Suffix table}
}

Document

DOI: 10.4230/LIPIcs.STACS.2008.1326

Preface -- 25th International Symposium on Theoretical Aspects of Computer Science

Authors: Susanne Albers and Pascal Weil

Published in: LIPIcs, Volume 1, 25th International Symposium on Theoretical Aspects of Computer Science (2008)

Abstract

The interest in STACS has remained at a high level over the past years. The STACS 2008 call for papers led to approximately 200 submissions from 38 countries. Each was assigned to at least three program committee members. The program committee held a 2-week long electronic meeting at the end of November, to select 54 papers. As co-chairs of this committee, we would like to sincerely thank its members and the many external referees for the valuable work they put into the reviewing process. The overall very high quality of the papers that were submitted to the conference made this selection a difficult task. We would like to express our thanks to the three invited speakers, Maxime Crochemore, Thomas Schwentick and Mihalis Yannakakis, for their contributions to the proceedings. Special thanks are due to A. Voronkov for his EasyChair software (www.easychair.org) which gives the organisers of conferences such as STACS a remarkable level of comfort; to Ralf Klasing for helping us explore the many possibilities of this brilliant software; to Emilka Bojanczyk for the design of the STACS poster, proceedings and logo; and to the members of the Organizing Committee, chaired by David Janin. An innovation in this year's STACS is the electronic format of the publication. A printed version was also available at the conference, with ISBN 978-3-939897-06-4. The electronic proceedings are available through several portals, and in particular through HAL and DROPS. HAL is an electronic repository managed by several French research agencies, and DROPS is the Dagstuhl Research Online Publication Server. We want to thank both these servers for hosting the proceedings of STACS and guaranteeing them perennial availability. The rights on the articles in the proceedings are kept with the authors and the papers are available freely, under a Creative Commons license (see www.stacs-conf.org/faq.html for more details).

Cite as

Susanne Albers and Pascal Weil. Preface -- 25th International Symposium on Theoretical Aspects of Computer Science. In 25th International Symposium on Theoretical Aspects of Computer Science. Leibniz International Proceedings in Informatics (LIPIcs), Volume 1, pp. 1-10, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2008)

Copy BibTex To Clipboard

@InProceedings{albers_et_al:LIPIcs.STACS.2008.1326,
  author =	{Albers, Susanne and Weil, Pascal},
  title =	{{Preface -- 25th International Symposium on Theoretical Aspects of Computer Science}},
  booktitle =	{25th International Symposium on Theoretical Aspects of Computer Science},
  pages =	{1--10},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-939897-06-4},
  ISSN =	{1868-8969},
  year =	{2008},
  volume =	{1},
  editor =	{Albers, Susanne and Weil, Pascal},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.STACS.2008.1326},
  URN =		{urn:nbn:de:0030-drops-13263},
  doi =		{10.4230/LIPIcs.STACS.2008.1326},
  annote =	{Keywords: Symposium, theoretical computer science, algorithms and data structures, automata and formal languages, computational and structural complexity, logic in computer science, applications}
}

Document

DOI: 10.4230/LIPIcs.STACS.2008.1344

Understanding Maximal Repetitions in Strings

Authors: Maxime Crochemore and Lucian Ilie

Published in: LIPIcs, Volume 1, 25th International Symposium on Theoretical Aspects of Computer Science (2008)

Abstract

The cornerstone of any algorithm computing all repetitions in a string of length $n$ in ${mathcal O(n)$ time is the fact that the number of runs (or maximal repetitions) is ${mathcal O(n)$. We give a simple proof of this result. As a consequence of our approach, the stronger result concerning the linearity of the sum of exponents of all runs follows easily.

Cite as

Maxime Crochemore and Lucian Ilie. Understanding Maximal Repetitions in Strings. In 25th International Symposium on Theoretical Aspects of Computer Science. Leibniz International Proceedings in Informatics (LIPIcs), Volume 1, pp. 11-16, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2008)

Copy BibTex To Clipboard

@InProceedings{crochemore_et_al:LIPIcs.STACS.2008.1344,
  author =	{Crochemore, Maxime and Ilie, Lucian},
  title =	{{Understanding Maximal Repetitions in Strings}},
  booktitle =	{25th International Symposium on Theoretical Aspects of Computer Science},
  pages =	{11--16},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-939897-06-4},
  ISSN =	{1868-8969},
  year =	{2008},
  volume =	{1},
  editor =	{Albers, Susanne and Weil, Pascal},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.STACS.2008.1344},
  URN =		{urn:nbn:de:0030-drops-13442},
  doi =		{10.4230/LIPIcs.STACS.2008.1344},
  annote =	{Keywords: Combinatorics on words, repetitions in strings, runs, maximal repetitions, maximal periodicities, sum of exponents}
}

Document

DOI: 10.4230/LIPIcs.STACS.2008.1359

Improved Algorithms for the Range Next Value Problem and Applications

Authors: Costas S. Iliopoulos, Maxime Crochemore, Marcin Kubica, M. Sohel Rahman, and Tomasz Walen

Published in: LIPIcs, Volume 1, 25th International Symposium on Theoretical Aspects of Computer Science (2008)

Abstract

The Range Next Value problem (Problem RNV) is a recent interesting variant of the range search problems, where the query is for the immediate next (or equal) value of a given number within a given interval of an array. Problem RNV was introduced and studied very recently by Crochemore et. al [Finding Patterns In Given Intervals, MFCS 2007]. In this paper, we present improved algorithms for Problem RNV. We also show how this problem can be used to achieve optimal query time for a number of interesting variants of the classic pattern matching problems.

Cite as

Costas S. Iliopoulos, Maxime Crochemore, Marcin Kubica, M. Sohel Rahman, and Tomasz Walen. Improved Algorithms for the Range Next Value Problem and Applications. In 25th International Symposium on Theoretical Aspects of Computer Science. Leibniz International Proceedings in Informatics (LIPIcs), Volume 1, pp. 205-216, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2008)

Copy BibTex To Clipboard

@InProceedings{iliopoulos_et_al:LIPIcs.STACS.2008.1359,
  author =	{Iliopoulos, Costas S. and Crochemore, Maxime and Kubica, Marcin and Rahman, M. Sohel and Walen, Tomasz},
  title =	{{Improved Algorithms for the Range Next Value Problem and Applications}},
  booktitle =	{25th International Symposium on Theoretical Aspects of Computer Science},
  pages =	{205--216},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-939897-06-4},
  ISSN =	{1868-8969},
  year =	{2008},
  volume =	{1},
  editor =	{Albers, Susanne and Weil, Pascal},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.STACS.2008.1359},
  URN =		{urn:nbn:de:0030-drops-13596},
  doi =		{10.4230/LIPIcs.STACS.2008.1359},
  annote =	{Keywords: Algorithms, Data structures}
}

13 Search Results for "Crochemore, Maxime"

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as

Thanks for your feedback!

Could not send message