Random Wheeler Automata

Authors: Ruben Becker, Davide Cenzato, Sung-Hwan Kim, Bojana Kodric, Riccardo Maso, Nicola Prezza




Author Details

Ruben Becker
  • Ca' Foscari University of Venice, Italy
Davide Cenzato
  • Ca' Foscari University of Venice, Italy
Sung-Hwan Kim
  • Ca' Foscari University of Venice, Italy
Bojana Kodric
  • Ca' Foscari University of Venice, Italy
Riccardo Maso
  • Ca' Foscari University of Venice, Italy
Nicola Prezza
  • Ca' Foscari University of Venice, Italy

Cite As

Ruben Becker, Davide Cenzato, Sung-Hwan Kim, Bojana Kodric, Riccardo Maso, and Nicola Prezza. Random Wheeler Automata. In 35th Annual Symposium on Combinatorial Pattern Matching (CPM 2024). Leibniz International Proceedings in Informatics (LIPIcs), Volume 296, pp. 5:1-5:15, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024)
https://doi.org/10.4230/LIPIcs.CPM.2024.5

Abstract

Wheeler automata were introduced in 2017 as a tool to generalize existing indexing and compression techniques based on the Burrows-Wheeler transform. Intuitively, an automaton is said to be Wheeler if there exists a total order on its states reflecting the natural co-lexicographic order of the strings labeling the automaton’s paths; this property makes it possible to represent the automaton’s topology in a constant number of bits per transition and to efficiently solve pattern matching queries on its accepted regular language. Since their introduction, Wheeler automata have been the subject of a prolific line of research, from both the algorithmic and the language-theoretic points of view. A recurring issue faced in these studies is the lack of large datasets of Wheeler automata on which the developed algorithms and theories could be tested. One possible way to overcome this issue is to generate random Wheeler automata. Motivated by this practical observation, in this paper we initiate the theoretical study of random Wheeler automata, focusing our attention on the deterministic case (Wheeler DFAs, or WDFAs). We start by naturally extending the Erdős-Rényi random graph model to WDFAs, and proceed by providing an algorithm generating uniform WDFAs according to this model. Our algorithm generates a uniform WDFA with n states, m transitions, and alphabet cardinality σ in O(m) expected time (O(m log m) time w.h.p.) and constant working space for all alphabets of size σ ≤ m/ln m. The generated WDFA is streamed directly to the output. As a by-product, we also give formulas for the number of distinct WDFAs and obtain that nσ + (n − σ) log σ bits are necessary and sufficient to encode a WDFA with n states and alphabet of size σ, up to an additive Θ(n) term. We present an implementation of our algorithm and show that it is extremely fast in practice, with a throughput of over 8 million transitions per second.
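The Wheeler property described above can be made concrete with a small checker. The sketch below tests whether a given total order on the states of an edge-labeled graph satisfies the three Wheeler axioms as stated by Gagie, Manzini and Sirén (2017): states without incoming edges come first, edges with a smaller label enter smaller states, and edges with equal labels preserve the relative order of their sources. The function name `is_wheeler_order` and the `(source, target, label)` edge tuples are hypothetical choices made for illustration. The pairwise check is deliberately naive (quadratic in the number of edges); this is not the authors' code.

```python
from typing import List, Tuple

# Hypothetical edge representation: (source state, target state, label).
Edge = Tuple[int, int, str]


def is_wheeler_order(num_states: int, edges: List[Edge], order: List[int]) -> bool:
    """Return True if `order` (smallest state first) satisfies the Wheeler axioms.

    (i)   states with no incoming edge precede all states that have one;
    (ii)  for edges (u, u2, a) and (v, v2, b): a < b implies u2 < v2;
    (iii) for edges with a == b: u < v implies u2 <= v2.
    Labels are compared with Python's built-in ordering.
    """
    rank = {state: position for position, state in enumerate(order)}
    has_incoming = [False] * num_states
    for _, target, _ in edges:
        has_incoming[target] = True

    # Axiom (i): after the first state with an incoming edge, no source may appear.
    seen_non_source = False
    for state in order:
        if has_incoming[state]:
            seen_non_source = True
        elif seen_non_source:
            return False

    # Axioms (ii) and (iii): naive O(|E|^2) comparison of all edge pairs.
    for u, u2, a in edges:
        for v, v2, b in edges:
            if a < b and not rank[u2] < rank[v2]:
                return False
            if a == b and rank[u] < rank[v] and not rank[u2] <= rank[v2]:
                return False
    return True


if __name__ == "__main__":
    dfa = [(0, 1, "a"), (1, 2, "b"), (0, 2, "b")]
    print(is_wheeler_order(3, dfa, [0, 1, 2]))  # True
    print(is_wheeler_order(3, dfa, [0, 2, 1]))  # False
    print(is_wheeler_order(2, [(0, 1, "a"), (0, 1, "b")], [0, 1]))  # False
```

In the first example, state 1 is reached only by strings ending in `a` and state 2 only by strings ending in `b`, so the co-lexicographic order places 1 before 2. In the last example a single state is entered by both `a` and `b`, which makes axiom (ii) fail for every order; this is why each non-source state of a Wheeler automaton has a single incoming label.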
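As a worked illustration of the encoding bound in the abstract, the sketch below evaluates the leading term nσ + (n − σ) log σ for concrete sizes. It assumes base-2 logarithms and 1 ≤ σ ≤ n, and it ignores the additive Θ(n) term. For contrast, it also computes a hypothetical naive layout that stores, for each of the nσ (state, label) pairs, either "no transition" or one of n target states. Both helper names are illustrative only.

```python
import math


def wdfa_bits_leading_term(n: int, sigma: int) -> float:
    """Leading term n*sigma + (n - sigma)*log2(sigma) of the WDFA encoding bound.

    Assumes 1 <= sigma <= n and base-2 logarithms; the additive Theta(n) term
    from the abstract is deliberately ignored.
    """
    if not 1 <= sigma <= n:
        raise ValueError("expected 1 <= sigma <= n")
    return n * sigma + (n - sigma) * math.log2(sigma)


def naive_dfa_bits(n: int, sigma: int) -> int:
    """Hypothetical baseline: each (state, label) slot stores 'absent' or one of n targets."""
    bits_per_slot = n.bit_length()  # enough bits for the n + 1 values 0..n
    return n * sigma * bits_per_slot


if __name__ == "__main__":
    n, sigma = 1000, 4
    print(wdfa_bits_leading_term(n, sigma))  # 5992.0
    print(naive_dfa_bits(n, sigma))          # 40000
```

With n = 1000 and σ = 4, the leading term is 4000 + 996 · 2 = 5992 bits, about six bits per state. The naive layout needs 40000 bits because it spends ⌈log₂(n + 1)⌉ = 10 bits on every (state, label) slot.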

ACM Subject Classification
  • Theory of computation → Generating random combinatorial structures
  • Theory of computation → Sorting and searching
  • Theory of computation → Graph algorithms analysis
Keywords
  • Wheeler automata
  • Burrows-Wheeler transform
  • random graphs
