Document

**Published in:** LIPIcs, Volume 296, 35th Annual Symposium on Combinatorial Pattern Matching (CPM 2024)

The LCP array is an important tool in stringology, allowing to speed up pattern matching algorithms and enabling compact representations of the suffix tree. Recently, Conte et al. [DCC 2023] and Cotumaccio et al. [SPIRE 2023] extended the definition of this array to Wheeler DFAs and, ultimately, to arbitrary labeled graphs, proving that it can be used to efficiently solve matching statistics queries on the graph’s paths. In this paper, we provide the first efficient algorithm building the LCP array of a directed labeled graph with n nodes and m edges labeled over an alphabet of size σ. The first step is to transform the input graph G into a deterministic Wheeler pseudoforest G_{is} with O(n) edges encoding the lexicographically- smallest and largest strings entering in each node of the original graph. Using state-of-the-art algorithms, this step runs in O(min{mlog n, m+n²}) time on arbitrary labeled graphs, and in O(m) time on Wheeler DFAs. The LCP array of G stores the longest common prefixes between those strings, i.e. it can easily be derived from the LCP array of G_{is}. After arguing that the natural generalization of a compact-space LCP-construction algorithm by Beller et al. [J. Discrete Algorithms 2013] runs in time Ω(nσ) on pseudoforests, we present a new algorithm based on dynamic range stabbing building the LCP array of G_{is} in O(nlog σ) time and O(nlogσ) bits of working space. Combined with our reduction, we obtain the first efficient algorithm to build the LCP array of an arbitrary labeled graph. An implementation of our algorithm is publicly available at https://github.com/regindex/Labeled-Graph-LCP.

Jarno N. Alanko, Davide Cenzato, Nicola Cotumaccio, Sung-Hwan Kim, Giovanni Manzini, and Nicola Prezza. Computing the LCP Array of a Labeled Graph. In 35th Annual Symposium on Combinatorial Pattern Matching (CPM 2024). Leibniz International Proceedings in Informatics (LIPIcs), Volume 296, pp. 1:1-1:15, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024)

Copy BibTex To Clipboard

@InProceedings{alanko_et_al:LIPIcs.CPM.2024.1, author = {Alanko, Jarno N. and Cenzato, Davide and Cotumaccio, Nicola and Kim, Sung-Hwan and Manzini, Giovanni and Prezza, Nicola}, title = {{Computing the LCP Array of a Labeled Graph}}, booktitle = {35th Annual Symposium on Combinatorial Pattern Matching (CPM 2024)}, pages = {1:1--1:15}, series = {Leibniz International Proceedings in Informatics (LIPIcs)}, ISBN = {978-3-95977-326-3}, ISSN = {1868-8969}, year = {2024}, volume = {296}, editor = {Inenaga, Shunsuke and Puglisi, Simon J.}, publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik}, address = {Dagstuhl, Germany}, URL = {https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2024.1}, URN = {urn:nbn:de:0030-drops-201113}, doi = {10.4230/LIPIcs.CPM.2024.1}, annote = {Keywords: LCP array, Wheeler automata, prefix sorting, pattern matching, sorting} }

Document

**Published in:** LIPIcs, Volume 289, 41st International Symposium on Theoretical Aspects of Computer Science (STACS 2024)

The model of generalized automata, introduced by Eilenberg in 1974, allows representing a regular language more concisely than conventional automata by allowing edges to be labeled not only with characters, but also strings. Giammaresi and Montalbano introduced a notion of determinism for generalized automata [STACS 1995]. While generalized deterministic automata retain many properties of conventional deterministic automata, the uniqueness of a minimal generalized deterministic automaton is lost.
In the first part of the paper, we show that the lack of uniqueness can be explained by introducing a set 𝒲(𝒜) associated with a generalized automaton 𝒜. The set 𝒲(𝒜) is always trivially equal to the set of all prefixes of the language recognized by the automaton, if 𝒜 is a conventional automaton, but this need not be true for generalized automata. By fixing 𝒲(𝒜), we are able to derive for the first time a full Myhill-Nerode theorem for generalized automata, which contains the textbook Myhill-Nerode theorem for conventional automata as a degenerate case.
In the second part of the paper, we show that the set 𝒲(𝒜) leads to applications for pattern matching and data compression. Wheeler automata [TCS 2017, SODA 2020] are a popular class of automata that can be compactly stored using e log σ (1 + o(1)) + O(e) bits (e being the number of edges, σ being the size of the alphabet) in such a way that pattern matching queries can be solved in Õ(m) time (m being the length of the pattern). In the paper, we show how to extend these results to generalized automata. More precisely, a Wheeler generalized automata can be stored using 𝔢 log σ (1 + o(1)) + O(e + rn) bits so that pattern matching queries can be solved in Õ(rm) time, where 𝔢 is the total length of all edge labels, r is the maximum length of an edge label and n is the number of states.

Nicola Cotumaccio. A Myhill-Nerode Theorem for Generalized Automata, with Applications to Pattern Matching and Compression. In 41st International Symposium on Theoretical Aspects of Computer Science (STACS 2024). Leibniz International Proceedings in Informatics (LIPIcs), Volume 289, pp. 26:1-26:19, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024)

Copy BibTex To Clipboard

@InProceedings{cotumaccio:LIPIcs.STACS.2024.26, author = {Cotumaccio, Nicola}, title = {{A Myhill-Nerode Theorem for Generalized Automata, with Applications to Pattern Matching and Compression}}, booktitle = {41st International Symposium on Theoretical Aspects of Computer Science (STACS 2024)}, pages = {26:1--26:19}, series = {Leibniz International Proceedings in Informatics (LIPIcs)}, ISBN = {978-3-95977-311-9}, ISSN = {1868-8969}, year = {2024}, volume = {289}, editor = {Beyersdorff, Olaf and Kant\'{e}, Mamadou Moustapha and Kupferman, Orna and Lokshtanov, Daniel}, publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik}, address = {Dagstuhl, Germany}, URL = {https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.STACS.2024.26}, URN = {urn:nbn:de:0030-drops-197369}, doi = {10.4230/LIPIcs.STACS.2024.26}, annote = {Keywords: Generalized Automata, Myhill-Nerode Theorem, Regular Languages, Wheeler Graphs, FM-index, Burrows-Wheeler Transform} }

Document

**Published in:** LIPIcs, Volume 283, 34th International Symposium on Algorithms and Computation (ISAAC 2023)

In the past thirty years, numerous algorithms for building the suffix array of a string have been proposed. In 2021, the notion of suffix array was extended from strings to DFAs, and it was shown that the resulting data structure can be built in O(m² + n^{5/2}) time, where n is the number of states and m is the number of edges [SODA 2021]. Recently, algorithms running in O(mn) and O(n²log n) time have been described [CPM 2023].
In this paper, we improve the previous bounds by proposing an O(n²) recursive algorithm inspired by Farach’s algorithm for building a suffix tree [FOCS 1997]. To this end, we provide insight into the rich lexicographic and combinatorial structure of a graph, so contributing to the fascinating journey which might lead to solve the long-standing open problem of building the suffix tree of a graph.

Nicola Cotumaccio. Prefix Sorting DFAs: A Recursive Algorithm. In 34th International Symposium on Algorithms and Computation (ISAAC 2023). Leibniz International Proceedings in Informatics (LIPIcs), Volume 283, pp. 22:1-22:15, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2023)

Copy BibTex To Clipboard

@InProceedings{cotumaccio:LIPIcs.ISAAC.2023.22, author = {Cotumaccio, Nicola}, title = {{Prefix Sorting DFAs: A Recursive Algorithm}}, booktitle = {34th International Symposium on Algorithms and Computation (ISAAC 2023)}, pages = {22:1--22:15}, series = {Leibniz International Proceedings in Informatics (LIPIcs)}, ISBN = {978-3-95977-289-1}, ISSN = {1868-8969}, year = {2023}, volume = {283}, editor = {Iwata, Satoru and Kakimura, Naonori}, publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik}, address = {Dagstuhl, Germany}, URL = {https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ISAAC.2023.22}, URN = {urn:nbn:de:0030-drops-193242}, doi = {10.4230/LIPIcs.ISAAC.2023.22}, annote = {Keywords: Suffix Array, Burrows-Wheeler Transform, FM-index, Recursive Algorithms, Graph Theory, Pattern Matching} }

X

Feedback for Dagstuhl Publishing

Feedback submitted

Please try again later or send an E-mail