Document

**Published in:** LIPIcs, Volume 223, 33rd Annual Symposium on Combinatorial Pattern Matching (CPM 2022)

A Lyndon word is a word that is lexicographically smaller than all of its non-trivial rotations (e.g. ananas is a Lyndon word; banana is not a Lyndon word due to its smaller rotation abanan). The Lyndon forest (or equivalently Lyndon table) identifies maximal Lyndon factors of a word, and is of great combinatoric interest, e.g. when finding maximal repetitions in words. While optimal linear time algorithms for computing the Lyndon forest are known, none of them work in an online manner. We present algorithms that compute the Lyndon forest of a word in a reverse online manner, processing the input word from back to front. We assume a general ordered alphabet, i.e. the only elementary operations on symbols are comparisons of the form less-equal-greater. We start with a naive algorithm and show that, despite its quadratic worst-case behaviour, it already takes expected linear time on words drawn uniformly at random. We then introduce a much more sophisticated algorithm that takes linear time in the worst case. It borrows some ideas from the offline algorithm by Bille et al. (ICALP 2020), combined with new techniques that are necessary for the reverse online setting. While the back-to-front approach for this computation is rather natural (see Franek and Liut, PSC 2019), the steps required to achieve linear time are surprisingly intricate. We envision that our algorithm will be useful for the online computation of maximal repetitions in words.

Golnaz Badkobeh, Maxime Crochemore, Jonas Ellert, and Cyril Nicaud. Back-To-Front Online Lyndon Forest Construction. In 33rd Annual Symposium on Combinatorial Pattern Matching (CPM 2022). Leibniz International Proceedings in Informatics (LIPIcs), Volume 223, pp. 13:1-13:23, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2022)

Copy BibTex To Clipboard

@InProceedings{badkobeh_et_al:LIPIcs.CPM.2022.13, author = {Badkobeh, Golnaz and Crochemore, Maxime and Ellert, Jonas and Nicaud, Cyril}, title = {{Back-To-Front Online Lyndon Forest Construction}}, booktitle = {33rd Annual Symposium on Combinatorial Pattern Matching (CPM 2022)}, pages = {13:1--13:23}, series = {Leibniz International Proceedings in Informatics (LIPIcs)}, ISBN = {978-3-95977-234-1}, ISSN = {1868-8969}, year = {2022}, volume = {223}, editor = {Bannai, Hideo and Holub, Jan}, publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik}, address = {Dagstuhl, Germany}, URL = {https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2022.13}, URN = {urn:nbn:de:0030-drops-161404}, doi = {10.4230/LIPIcs.CPM.2022.13}, annote = {Keywords: Lyndon factorisation, Lyndon forest, Lyndon table, Lyndon array, right Lyndon tree, Cartesian tree, standard factorisation, online algorithms} }

Document

**Published in:** LIPIcs, Volume 223, 33rd Annual Symposium on Combinatorial Pattern Matching (CPM 2022)

We show that lengths of shortest covers of all rotations of a length-n string over an integer alphabet can be computed in 𝒪(n) time in the word-RAM model, thus improving an 𝒪(n log n)-time algorithm from Crochemore et al. (Theor. Comput. Sci., 2021). Similarly as Crochemore et al., we use a relation of covers of rotations of a string S to seeds and squares in S³. The crucial parameter of a string S is the number ξ(S) of primitive covers of all rotations of S. We show first that the time complexity of the algorithm from Crochemore et al. can be slightly improved which results in time complexity Θ(ξ(S)). However, we also show that in the worst case ξ(S) is Ω(|S|log |S|). This is the main difficulty in obtaining a linear time algorithm. We overcome it and obtain yet another application of runs in strings.

Maxime Crochemore, Costas S. Iliopoulos, Jakub Radoszewski, Wojciech Rytter, Juliusz Straszyński, Tomasz Waleń, and Wiktor Zuba. Linear-Time Computation of Shortest Covers of All Rotations of a String. In 33rd Annual Symposium on Combinatorial Pattern Matching (CPM 2022). Leibniz International Proceedings in Informatics (LIPIcs), Volume 223, pp. 22:1-22:15, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2022)

Copy BibTex To Clipboard

@InProceedings{crochemore_et_al:LIPIcs.CPM.2022.22, author = {Crochemore, Maxime and Iliopoulos, Costas S. and Radoszewski, Jakub and Rytter, Wojciech and Straszy\'{n}ski, Juliusz and Wale\'{n}, Tomasz and Zuba, Wiktor}, title = {{Linear-Time Computation of Shortest Covers of All Rotations of a String}}, booktitle = {33rd Annual Symposium on Combinatorial Pattern Matching (CPM 2022)}, pages = {22:1--22:15}, series = {Leibniz International Proceedings in Informatics (LIPIcs)}, ISBN = {978-3-95977-234-1}, ISSN = {1868-8969}, year = {2022}, volume = {223}, editor = {Bannai, Hideo and Holub, Jan}, publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik}, address = {Dagstuhl, Germany}, URL = {https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2022.22}, URN = {urn:nbn:de:0030-drops-161495}, doi = {10.4230/LIPIcs.CPM.2022.22}, annote = {Keywords: cover, quasiperiod, cyclic rotation, seed, run} }

Document

**Published in:** LIPIcs, Volume 128, 30th Annual Symposium on Combinatorial Pattern Matching (CPM 2019)

We introduce the Longest Common Circular Factor (LCCF) problem in which, given strings S and T of length at most n, we are to compute the longest factor of S whose cyclic shift occurs as a factor of T. It is a new similarity measure, an extension of the classic Longest Common Factor. We show how to solve the LCCF problem in O(n log^4 n) time using O(n log^2 n) space.

Mai Alzamel, Maxime Crochemore, Costas S. Iliopoulos, Tomasz Kociumaka, Jakub Radoszewski, Wojciech Rytter, Juliusz Straszyński, Tomasz Waleń, and Wiktor Zuba. Quasi-Linear-Time Algorithm for Longest Common Circular Factor. In 30th Annual Symposium on Combinatorial Pattern Matching (CPM 2019). Leibniz International Proceedings in Informatics (LIPIcs), Volume 128, pp. 25:1-25:14, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2019)

Copy BibTex To Clipboard

@InProceedings{alzamel_et_al:LIPIcs.CPM.2019.25, author = {Alzamel, Mai and Crochemore, Maxime and Iliopoulos, Costas S. and Kociumaka, Tomasz and Radoszewski, Jakub and Rytter, Wojciech and Straszy\'{n}ski, Juliusz and Wale\'{n}, Tomasz and Zuba, Wiktor}, title = {{Quasi-Linear-Time Algorithm for Longest Common Circular Factor}}, booktitle = {30th Annual Symposium on Combinatorial Pattern Matching (CPM 2019)}, pages = {25:1--25:14}, series = {Leibniz International Proceedings in Informatics (LIPIcs)}, ISBN = {978-3-95977-103-0}, ISSN = {1868-8969}, year = {2019}, volume = {128}, editor = {Pisanti, Nadia and P. Pissis, Solon}, publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik}, address = {Dagstuhl, Germany}, URL = {https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2019.25}, URN = {urn:nbn:de:0030-drops-104961}, doi = {10.4230/LIPIcs.CPM.2019.25}, annote = {Keywords: longest common factor, circular pattern matching, internal pattern matching, intersection of hyperrectangles} }

Document

**Published in:** LIPIcs, Volume 105, 29th Annual Symposium on Combinatorial Pattern Matching (CPM 2018)

In the Longest Common Factor with k Mismatches (LCF_k) problem, we are given two strings X and Y of total length n, and we are asked to find a pair of maximal-length factors, one of X and the other of Y, such that their Hamming distance is at most k. Thankachan et al. [Thankachan et al. 2016] show that this problem can be solved in O(n log^k n) time and O(n) space for constant k. We consider the LCF_k(l) problem in which we assume that the sought factors have length at least l. We use difference covers to reduce the LCF_k(l) problem with l=Omega(log^{2k+2}n) to a task involving m=O(n/log^{k+1}n) synchronized factors. The latter can be solved in O(m log^{k+1}m) time, which results in a linear-time algorithm for LCF_k(l) with l=Omega(log^{2k+2}n). In general, our solution to the LCF_k(l) problem for arbitrary l takes O(n + n log^{k+1} n/sqrt{l}) time.

Panagiotis Charalampopoulos, Maxime Crochemore, Costas S. Iliopoulos, Tomasz Kociumaka, Solon P. Pissis, Jakub Radoszewski, Wojciech Rytter, and Tomasz Walen. Linear-Time Algorithm for Long LCF with k Mismatches. In 29th Annual Symposium on Combinatorial Pattern Matching (CPM 2018). Leibniz International Proceedings in Informatics (LIPIcs), Volume 105, pp. 23:1-23:16, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2018)

Copy BibTex To Clipboard

@InProceedings{charalampopoulos_et_al:LIPIcs.CPM.2018.23, author = {Charalampopoulos, Panagiotis and Crochemore, Maxime and Iliopoulos, Costas S. and Kociumaka, Tomasz and Pissis, Solon P. and Radoszewski, Jakub and Rytter, Wojciech and Walen, Tomasz}, title = {{Linear-Time Algorithm for Long LCF with k Mismatches}}, booktitle = {29th Annual Symposium on Combinatorial Pattern Matching (CPM 2018)}, pages = {23:1--23:16}, series = {Leibniz International Proceedings in Informatics (LIPIcs)}, ISBN = {978-3-95977-074-3}, ISSN = {1868-8969}, year = {2018}, volume = {105}, editor = {Navarro, Gonzalo and Sankoff, David and Zhu, Binhai}, publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik}, address = {Dagstuhl, Germany}, URL = {https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2018.23}, URN = {urn:nbn:de:0030-drops-86869}, doi = {10.4230/LIPIcs.CPM.2018.23}, annote = {Keywords: longest common factor, longest common substring, Hamming distance, heavy-light decomposition, difference cover} }

Document

**Published in:** LIPIcs, Volume 88, 17th International Workshop on Algorithms in Bioinformatics (WABI 2017)

Computing genetic evolution distances among a set of taxa dominates the running time of many phylogenetic inference methods. Most of genetic evolution distance definitions rely, even if indirectly, on computing the pairwise Hamming distance among sequences or profiles. We propose here an average-case linear-time algorithm to compute pairwise Hamming distances among a set of taxa under a given Hamming distance threshold. This article includes both a theoretical analysis and extensive experimental results concerning the proposed algorithm. We further show how this algorithm can be successfully integrated into a well known phylogenetic inference method.

Maxime Crochemore, Alexandre P. Francisco, Solon P. Pissis, and Cátia Vaz. Towards Distance-Based Phylogenetic Inference in Average-Case Linear-Time. In 17th International Workshop on Algorithms in Bioinformatics (WABI 2017). Leibniz International Proceedings in Informatics (LIPIcs), Volume 88, pp. 9:1-9:14, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2017)

Copy BibTex To Clipboard

@InProceedings{crochemore_et_al:LIPIcs.WABI.2017.9, author = {Crochemore, Maxime and Francisco, Alexandre P. and Pissis, Solon P. and Vaz, C\'{a}tia}, title = {{Towards Distance-Based Phylogenetic Inference in Average-Case Linear-Time}}, booktitle = {17th International Workshop on Algorithms in Bioinformatics (WABI 2017)}, pages = {9:1--9:14}, series = {Leibniz International Proceedings in Informatics (LIPIcs)}, ISBN = {978-3-95977-050-7}, ISSN = {1868-8969}, year = {2017}, volume = {88}, editor = {Schwartz, Russell and Reinert, Knut}, publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik}, address = {Dagstuhl, Germany}, URL = {https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.WABI.2017.9}, URN = {urn:nbn:de:0030-drops-76529}, doi = {10.4230/LIPIcs.WABI.2017.9}, annote = {Keywords: computational biology, phylogenetic inference, Hamming distance} }

Document

**Published in:** Dagstuhl Reports, Volume 4, Issue 3 (2014)

Strings (aka sequences or words) form the most basic and natural data structure.
They occur whenever information is electronically transmitted (as bit streams),
when natural language text is spoken or written down (as words over, for example, the Latin alphabet), in the process of heredity transmission in living cells (through DNA sequences) or the protein synthesis (as sequence of amino acids), and in many more different contexts. Given this universal form of representing information, the need to process strings is apparent and is actually a core purpose of computer use. Algorithms to efficiently search through, analyze, (de-)compress, match, encode and decode strings are therefore of chief interest. Combinatorial problems about strings lie at the core of such algorithmic questions. Many such combinatorial problems are common in the string processing efforts in the different fields of application.
The purpose of this seminar is to bring together researchers from different disciplines whose interests are string processing algorithms and related combinatorial problems on words. The two main areas of interest for this seminar are Combinatorics on Words and Stringology. This report documents the program and the outcomes of Dagstuhl Seminar 14111 "Combinatorics and Algorithmics of Strings".

Maxime Crochemore, James D. Currie, Gregory Kucherov, and Dirk Nowotka. Combinatorics and Algorithmics of Strings (Dagstuhl Seminar 14111). In Dagstuhl Reports, Volume 4, Issue 3, pp. 28-46, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2014)

Copy BibTex To Clipboard

@Article{crochemore_et_al:DagRep.4.3.28, author = {Crochemore, Maxime and Currie, James D. and Kucherov, Gregory and Nowotka, Dirk}, title = {{Combinatorics and Algorithmics of Strings (Dagstuhl Seminar 14111)}}, pages = {28--46}, journal = {Dagstuhl Reports}, ISSN = {2192-5283}, year = {2014}, volume = {4}, number = {3}, editor = {Crochemore, Maxime and Currie, James D. and Kucherov, Gregory and Nowotka, Dirk}, publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik}, address = {Dagstuhl, Germany}, URL = {https://drops.dagstuhl.de/entities/document/10.4230/DagRep.4.3.28}, URN = {urn:nbn:de:0030-drops-45524}, doi = {10.4230/DagRep.4.3.28}, annote = {Keywords: combinatorics on words, string algorithms, automata} }

Document

**Published in:** Dagstuhl Reports, Volume 1, Issue 2 (2011)

Sequences form the most basic and natural data
structure. They occur whenever information is electronically
transmitted (as bit streams), when natural language text is spoken or
written down (as words over, for example, the latin alphabet), in the
process of heredity transmission in living cells (as DNA sequence) or
the protein synthesis (as sequence of amino acids), and in many more
different contexts. Given this universal form of representing
information, the need to process strings is apparent and actually a
core purpose of computer use. Algorithms to efficiently search
through, analyze, (de-)compress, match, learn, and encode/decode strings
are therefore of chief interest. Combinatorial problems about strings
lie at the core of such algorithmic questions. Many such combinatorial
problems are common in the string processing efforts in the different
fields of application.
Scientists working in the fields of Combinatorics on Words, Computational Biology, Stringology, Natural Computing, and Machine Learning were invited to consider the seminar's topic from a~wide range of perspectives. This report documents the program and the outcomes of Dagstuhl Seminar 11081
``Combinatorial and Algorithmic Aspects of Sequence Processing''.

Maxime Crochemore, Lila Kari, Mehryar Mohri, and Dirk Nowotka. Combinatorial and Algorithmic Aspects of Sequence Processing (Dagstuhl Seminar 11081). In Dagstuhl Reports, Volume 1, Issue 2, pp. 47-66, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2011)

Copy BibTex To Clipboard

@Article{crochemore_et_al:DagRep.1.2.47, author = {Crochemore, Maxime and Kari, Lila and Mohri, Mehryar and Nowotka, Dirk}, title = {{Combinatorial and Algorithmic Aspects of Sequence Processing (Dagstuhl Seminar 11081)}}, pages = {47--66}, journal = {Dagstuhl Reports}, ISSN = {2192-5283}, year = {2011}, volume = {1}, number = {2}, editor = {Crochemore, Maxime and Kari, Lila and Mohri, Mehryar and Nowotka, Dirk}, publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik}, address = {Dagstuhl, Germany}, URL = {https://drops.dagstuhl.de/entities/document/10.4230/DagRep.1.2.47}, URN = {urn:nbn:de:0030-drops-31554}, doi = {10.4230/DagRep.1.2.47}, annote = {Keywords: Combinatorics on words, computational biology, stringology, natural computing, machine learning} }

Document

**Published in:** LIPIcs, Volume 3, 26th International Symposium on Theoretical Aspects of Computer Science (2009)

The Prefix table of a string reports for each position the maximal length of its prefixes starting here. The Prefix table and its dual Suffix table are basic tools used in the design of the most efficient string-matching and pattern extraction algorithms. These tables can be computed in linear time independently of the alphabet size.
We give an algorithmic characterisation of a Prefix table (it can be adapted to a Suffix table). Namely, the algorithm tests if an integer table of size $n$ is the Prefix table of some word and, if successful, it constructs the lexicographically smallest string having it as a Prefix table. We show that the alphabet of the string can be bounded to $\log_2 n$ letters. The overall algorithm runs in $O(n)$ time.

Julien Clement, Maxime Crochemore, and Giuseppina Rindone. Reverse Engineering Prefix Tables. In 26th International Symposium on Theoretical Aspects of Computer Science. Leibniz International Proceedings in Informatics (LIPIcs), Volume 3, pp. 289-300, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2009)

Copy BibTex To Clipboard

@InProceedings{clement_et_al:LIPIcs.STACS.2009.1825, author = {Clement, Julien and Crochemore, Maxime and Rindone, Giuseppina}, title = {{Reverse Engineering Prefix Tables}}, booktitle = {26th International Symposium on Theoretical Aspects of Computer Science}, pages = {289--300}, series = {Leibniz International Proceedings in Informatics (LIPIcs)}, ISBN = {978-3-939897-09-5}, ISSN = {1868-8969}, year = {2009}, volume = {3}, editor = {Albers, Susanne and Marion, Jean-Yves}, publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik}, address = {Dagstuhl, Germany}, URL = {https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.STACS.2009.1825}, URN = {urn:nbn:de:0030-drops-18258}, doi = {10.4230/LIPIcs.STACS.2009.1825}, annote = {Keywords: Design and analysis of algorithms, Algorithms on strings, Pattern matching, String matching, Combinatorics on words, Prefix table, Suffix table} }

Document

**Published in:** LIPIcs, Volume 1, 25th International Symposium on Theoretical Aspects of Computer Science (2008)

The cornerstone of any algorithm computing all repetitions in a
string of length $n$ in ${mathcal O(n)$ time is the fact that
the number of runs (or maximal repetitions) is ${mathcal O(n)$.
We give a simple proof of this result. As a consequence of our
approach, the stronger result concerning the linearity of the sum
of exponents of all runs follows easily.

Maxime Crochemore and Lucian Ilie. Understanding Maximal Repetitions in Strings. In 25th International Symposium on Theoretical Aspects of Computer Science. Leibniz International Proceedings in Informatics (LIPIcs), Volume 1, pp. 11-16, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2008)

Copy BibTex To Clipboard

@InProceedings{crochemore_et_al:LIPIcs.STACS.2008.1344, author = {Crochemore, Maxime and Ilie, Lucian}, title = {{Understanding Maximal Repetitions in Strings}}, booktitle = {25th International Symposium on Theoretical Aspects of Computer Science}, pages = {11--16}, series = {Leibniz International Proceedings in Informatics (LIPIcs)}, ISBN = {978-3-939897-06-4}, ISSN = {1868-8969}, year = {2008}, volume = {1}, editor = {Albers, Susanne and Weil, Pascal}, publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik}, address = {Dagstuhl, Germany}, URL = {https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.STACS.2008.1344}, URN = {urn:nbn:de:0030-drops-13442}, doi = {10.4230/LIPIcs.STACS.2008.1344}, annote = {Keywords: Combinatorics on words, repetitions in strings, runs, maximal repetitions, maximal periodicities, sum of exponents} }

Document

**Published in:** LIPIcs, Volume 1, 25th International Symposium on Theoretical Aspects of Computer Science (2008)

The Range Next Value problem (Problem RNV) is a recent interesting
variant of the range search problems, where the query is for the
immediate next (or equal) value of a given number within a given
interval of an array. Problem RNV was introduced and studied very
recently by Crochemore et. al [Finding Patterns In Given
Intervals, MFCS 2007]. In this paper, we present improved
algorithms for Problem RNV. We also show how this problem can be
used to achieve optimal query time for a number of interesting
variants of the classic pattern matching problems.

Costas S. Iliopoulos, Maxime Crochemore, Marcin Kubica, M. Sohel Rahman, and Tomasz Walen. Improved Algorithms for the Range Next Value Problem and Applications. In 25th International Symposium on Theoretical Aspects of Computer Science. Leibniz International Proceedings in Informatics (LIPIcs), Volume 1, pp. 205-216, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2008)

Copy BibTex To Clipboard

@InProceedings{iliopoulos_et_al:LIPIcs.STACS.2008.1359, author = {Iliopoulos, Costas S. and Crochemore, Maxime and Kubica, Marcin and Rahman, M. Sohel and Walen, Tomasz}, title = {{Improved Algorithms for the Range Next Value Problem and Applications}}, booktitle = {25th International Symposium on Theoretical Aspects of Computer Science}, pages = {205--216}, series = {Leibniz International Proceedings in Informatics (LIPIcs)}, ISBN = {978-3-939897-06-4}, ISSN = {1868-8969}, year = {2008}, volume = {1}, editor = {Albers, Susanne and Weil, Pascal}, publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik}, address = {Dagstuhl, Germany}, URL = {https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.STACS.2008.1359}, URN = {urn:nbn:de:0030-drops-13596}, doi = {10.4230/LIPIcs.STACS.2008.1359}, annote = {Keywords: Algorithms, Data structures} }