The Rational Construction of a Wheeler DFA

Authors Giovanni Manzini, Alberto Policriti, Nicola Prezza, Brian Riccardi




Author Details

Giovanni Manzini
  • Dept. of Computer Science, University of Pisa, Italy
Alberto Policriti
  • Dept. of Mathematics, Computer Science and Physics, University of Udine, Italy
Nicola Prezza
  • Dept. of Environmental Sciences, Informatics and Statistics, Ca' Foscari University of Venice, Italy
Brian Riccardi
  • Dept. of Informatics, Systems and Communication, University of Milano-Bicocca, Italy

Cite As

Giovanni Manzini, Alberto Policriti, Nicola Prezza, and Brian Riccardi. The Rational Construction of a Wheeler DFA. In 35th Annual Symposium on Combinatorial Pattern Matching (CPM 2024). Leibniz International Proceedings in Informatics (LIPIcs), Volume 296, pp. 23:1-23:15, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024)
https://doi.org/10.4230/LIPIcs.CPM.2024.23

Abstract

Deterministic Finite Wheeler Automata are a natural generalisation to regular languages of the theory of compressed data structures that originated with the introduction of the Burrows-Wheeler transform. Indeed, if we can find a Wheeler automaton recognizing a given language L, such an automaton can be used to design time- and space-efficient algorithms for representing and searching L. In this paper we introduce an alternative representation of Deterministic Wheeler Automata by showing that a natural map between strings and rational numbers in ℚ ∩ [0,1) can be extended to represent the automaton’s states as intervals in ℚ ∩ [0,1). This representation reveals a natural relationship between automata properties and some properties of real numbers. In addition, it enables us to formulate problems related to automata in a numerical setting. Although at the moment the numerical approach does not lead to time-efficient algorithms, we believe this new perspective deserves further consideration. As a further demonstration of the convenience of this new representation, we use it to provide a simple proof of an unexpected result on regular languages. More precisely, we compare the size of the smallest Wheeler automaton recognizing a given language L with the size of the smallest automaton, possibly non-Wheeler, recognizing the same language. We show settings in which there can be an exponential gap between the two sizes, and we discuss the implications of this result for the problem of representing regular languages.
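
To give a feel for what a map between strings and rationals in [0,1) can look like, here is a minimal sketch. It is an illustrative assumption, not necessarily the map defined in the paper: each character is treated as a nonzero digit in base σ+1 (σ being the alphabet size), and the string is read from its last character to its first, so that the numerical order of the values agrees with the co-lexicographic order of the strings. The names `colex_value` and `colex_key` are hypothetical.

```python
from fractions import Fraction


def colex_value(s, alphabet):
    """Map string s to a rational in [0, 1).

    Assumption for illustration: characters become digits 1..sigma in base
    sigma + 1, read from the LAST character to the first, so comparing the
    resulting numbers matches co-lexicographic comparison of the strings.
    """
    base = len(alphabet) + 1
    # Digit 0 is never used, so appending characters always increases the value.
    digit = {c: i + 1 for i, c in enumerate(sorted(alphabet))}
    value = Fraction(0)
    weight = Fraction(1, base)
    for c in reversed(s):
        value += digit[c] * weight
        weight /= base
    return value


def colex_key(s):
    """Co-lexicographic order is lexicographic order on reversed strings."""
    return s[::-1]


words = ["", "a", "b", "ab", "ba", "aa", "bb", "aab"]
by_value = sorted(words, key=lambda w: colex_value(w, "ab"))
by_colex = sorted(words, key=colex_key)
```

Under these assumptions the two sorted lists coincide, and every value lies in [0,1). Exact `Fraction` arithmetic is used so that no rounding can disturb the order.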

Subject Classification

ACM Subject Classification
  • Theory of computation → Pattern matching
  • Theory of computation → Formal languages and automata theory
Keywords
  • String Matching
  • Deterministic Finite Automata
  • Wheeler languages
  • Graph Indexing
  • Co-lexicographical Sorting

