Document

**Published in:** LIPIcs, Volume 274, 31st Annual European Symposium on Algorithms (ESA 2023)

Wheeler nondeterministic finite automata (WNFAs) were introduced in (Gagie et al., TCS 2017) as a powerful generalization of prefix sorting from strings to labeled graphs. WNFAs admit optimal solutions to classic hard problems on labeled graphs and languages such as compression and regular expression matching. The problem of deciding whether a given NFA is Wheeler is known to be NP-complete (Gibney and Thankachan, ESA 2019). Recently, however, Alanko et al. (Information and Computation 2021) showed how to side-step this complexity by switching to preorders: letting Q be the set of states and δ the set of transitions, they provided a O(|δ|⋅|Q|²)-time algorithm computing a totally-ordered partition (i.e. equivalence relation) of the WNFA’s states such that (1) equivalent states recognize the same regular language, and (2) the order of (the classes of) non-equivalent states is consistent with any Wheeler order, when one exists. As a result, the output is a preorder of the states as useful for pattern matching as standard Wheeler orders.
Further extensions of this line of work (Cotumaccio et al., SODA 2021 and DCC 2022) generalized these concepts to arbitrary NFAs by introducing co-lex partial preorders: in general, any NFA admits a partial preorder of its states reflecting the co-lexicographic order of their accepted strings; the smaller the width of such preorder is, the faster regular expression matching queries can be performed. To date, the fastest algorithm for computing the smallest-width partial preorder on NFAs runs in O(|δ|² + |Q|^{5/2}) time (Cotumaccio, DCC 2022), while on DFAs the same task can be accomplished in O(min(|Q|²log|Q|, |δ|⋅|Q|)) time (Kim et al., CPM 2023).
In this paper, we provide much more efficient solutions to the co-lex order computation problem. Our results are achieved by extending a classic algorithm for the relational coarsest partition refinement problem of Paige and Tarjan to work with ordered partitions. More specifically, we provide a O(|δ|log|Q|)-time algorithm computing a co-lex total preorder when the input is a Wheeler NFA, and an algorithm with the same time complexity computing the smallest-width co-lex partial order of any DFA. In addition, we present implementations of our algorithms and show that they are very efficient also in practice.

Ruben Becker, Manuel Cáceres, Davide Cenzato, Sung-Hwan Kim, Bojana Kodric, Francisco Olivares, and Nicola Prezza. Sorting Finite Automata via Partition Refinement. In 31st Annual European Symposium on Algorithms (ESA 2023). Leibniz International Proceedings in Informatics (LIPIcs), Volume 274, pp. 15:1-15:15, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2023)

Copy BibTex To Clipboard

@InProceedings{becker_et_al:LIPIcs.ESA.2023.15, author = {Becker, Ruben and C\'{a}ceres, Manuel and Cenzato, Davide and Kim, Sung-Hwan and Kodric, Bojana and Olivares, Francisco and Prezza, Nicola}, title = {{Sorting Finite Automata via Partition Refinement}}, booktitle = {31st Annual European Symposium on Algorithms (ESA 2023)}, pages = {15:1--15:15}, series = {Leibniz International Proceedings in Informatics (LIPIcs)}, ISBN = {978-3-95977-295-2}, ISSN = {1868-8969}, year = {2023}, volume = {274}, editor = {G{\o}rtz, Inge Li and Farach-Colton, Martin and Puglisi, Simon J. and Herman, Grzegorz}, publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik}, address = {Dagstuhl, Germany}, URL = {https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ESA.2023.15}, URN = {urn:nbn:de:0030-drops-186684}, doi = {10.4230/LIPIcs.ESA.2023.15}, annote = {Keywords: Wheeler automata, prefix sorting, pattern matching, graph compression, sorting, partition refinement} }

Document

**Published in:** LIPIcs, Volume 259, 34th Annual Symposium on Combinatorial Pattern Matching (CPM 2023)

Sorting is a fundamental algorithmic pre-processing technique which often allows to represent data more compactly and, at the same time, speeds up search queries on it. In this paper, we focus on the well-studied problem of sorting and indexing string sets. Since the introduction of suffix trees in 1973, dozens of suffix sorting algorithms have been described in the literature. In 2017, these techniques were extended to sets of strings described by means of finite automata: the theory of Wheeler graphs [Gagie et al., TCS'17] introduced automata whose states can be totally-sorted according to the co-lexicographic (co-lex in the following) order of the prefixes of words accepted by the automaton. More recently, in [Cotumaccio, Prezza, SODA'21] it was shown how to extend these ideas to arbitrary automata by means of partial co-lex orders. This work showed that a co-lex order of minimum width (thus optimizing search query times) on deterministic finite automata (DFAs) can be computed in O(m² + n^{5/2}) time, m being the number of transitions and n the number of states of the input DFA.
In this paper, we exhibit new combinatorial properties of the minimum-width co-lex order of DFAs and exploit them to design faster prefix sorting algorithms. In particular, we describe two algorithms sorting arbitrary DFAs in O(mn) and O(n² log n) time, respectively, and an algorithm sorting acyclic DFAs in O(m log n) time. Within these running times, all algorithms compute also a smallest chain partition of the partial order (required to index the DFA). We present an experiment result to show that an optimized implementation of the O(n² log n)-time algorithm exhibits a nearly-linear behaviour on large deterministic pan-genomic graphs and is thus also of practical interest.

Sung-Hwan Kim, Francisco Olivares, and Nicola Prezza. Faster Prefix-Sorting Algorithms for Deterministic Finite Automata. In 34th Annual Symposium on Combinatorial Pattern Matching (CPM 2023). Leibniz International Proceedings in Informatics (LIPIcs), Volume 259, pp. 16:1-16:16, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2023)

Copy BibTex To Clipboard

@InProceedings{kim_et_al:LIPIcs.CPM.2023.16, author = {Kim, Sung-Hwan and Olivares, Francisco and Prezza, Nicola}, title = {{Faster Prefix-Sorting Algorithms for Deterministic Finite Automata}}, booktitle = {34th Annual Symposium on Combinatorial Pattern Matching (CPM 2023)}, pages = {16:1--16:16}, series = {Leibniz International Proceedings in Informatics (LIPIcs)}, ISBN = {978-3-95977-276-1}, ISSN = {1868-8969}, year = {2023}, volume = {259}, editor = {Bulteau, Laurent and Lipt\'{a}k, Zsuzsanna}, publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik}, address = {Dagstuhl, Germany}, URL = {https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2023.16}, URN = {urn:nbn:de:0030-drops-179707}, doi = {10.4230/LIPIcs.CPM.2023.16}, annote = {Keywords: String Matching, Deterministic Finite Automata, Graph Indexing, Co-lexicographical Sorting} }

X

Feedback for Dagstuhl Publishing

Feedback submitted

Please try again later or send an E-mail