LIPIcs.STACS.2024.26.pdf
- Filesize: 0.79 MB
- 19 pages
The model of generalized automata, introduced by Eilenberg in 1974, allows representing a regular language more concisely than conventional automata by allowing edges to be labeled not only with characters, but also with strings. Giammaresi and Montalbano introduced a notion of determinism for generalized automata [STACS 1995]. While generalized deterministic automata retain many properties of conventional deterministic automata, the uniqueness of a minimal generalized deterministic automaton is lost. In the first part of the paper, we show that the lack of uniqueness can be explained by introducing a set 𝒲(𝒜) associated with a generalized automaton 𝒜. When 𝒜 is a conventional automaton, the set 𝒲(𝒜) is always trivially equal to the set of all prefixes of the language recognized by the automaton, but this need not be true for generalized automata. By fixing 𝒲(𝒜), we are able to derive for the first time a full Myhill-Nerode theorem for generalized automata, which contains the textbook Myhill-Nerode theorem for conventional automata as a degenerate case. In the second part of the paper, we show that the set 𝒲(𝒜) leads to applications for pattern matching and data compression. Wheeler automata [TCS 2017, SODA 2020] are a popular class of automata that can be compactly stored using e log σ (1 + o(1)) + O(e) bits (e being the number of edges, σ being the size of the alphabet) in such a way that pattern matching queries can be solved in Õ(m) time (m being the length of the pattern). In the paper, we show how to extend these results to generalized automata. More precisely, a Wheeler generalized automaton can be stored using 𝔢 log σ (1 + o(1)) + O(e + rn) bits so that pattern matching queries can be solved in Õ(rm) time, where 𝔢 is the total length of all edge labels, r is the maximum length of an edge label and n is the number of states.
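To make the notion of string-labeled edges concrete, the following is a minimal Python sketch of a generalized automaton together with the size parameters named in the abstract (n, e, 𝔢, r). It is an illustration only and is not taken from the paper: the class name `GeneralizedAutomaton`, its methods, and the example automaton are all hypothetical, and the sketch covers neither determinism in the sense of Giammaresi and Montalbano, nor the set 𝒲(𝒜), nor the Wheeler property.

```python
from collections import deque


class GeneralizedAutomaton:
    """Hypothetical illustration: an automaton whose edges carry non-empty strings."""

    def __init__(self, initial, finals, edges):
        self.initial = initial
        self.finals = set(finals)
        # Each edge is a triple (source, label, target); labels are non-empty strings.
        self.edges = list(edges)
        for _, label, _ in self.edges:
            if not label:
                raise ValueError("edge labels must be non-empty strings")

    def accepts(self, word):
        """Breadth-first search over (state, position-in-word) pairs."""
        start = (self.initial, 0)
        seen = {start}
        queue = deque([start])
        while queue:
            state, pos = queue.popleft()
            if pos == len(word) and state in self.finals:
                return True
            for source, label, target in self.edges:
                # An edge can be taken only if its whole label matches at `pos`.
                if source == state and word.startswith(label, pos):
                    nxt = (target, pos + len(label))
                    if nxt not in seen:
                        seen.add(nxt)
                        queue.append(nxt)
        return False

    def parameters(self):
        """Return n, e, the total label length, and r as named in the abstract."""
        states = {self.initial} | self.finals
        for source, _, target in self.edges:
            states.update((source, target))
        lengths = [len(label) for _, label, _ in self.edges]
        return {
            "n": len(states),                    # number of states
            "e": len(self.edges),                # number of edges
            "total_label_length": sum(lengths),  # written 𝔢 in the abstract
            "r": max(lengths, default=0),        # maximum edge-label length
        }


# Example automaton (an assumption for illustration) recognizing ab c*:
# a single edge labeled "ab" replaces two single-character edges.
demo = GeneralizedAutomaton(initial=0, finals={1}, edges=[(0, "ab", 1), (1, "c", 1)])
print(demo.accepts("abcc"), demo.accepts("a"))
print(demo.parameters())
```

In this example the string "a" is a prefix of a recognized word, yet no state is reached after reading exactly "a", because the label "ab" is consumed as a whole. A conventional automaton for the same language would need an intermediate state between the two characters.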