Search Results

Documents authored by Maso, Riccardo


Document
Random Wheeler Automata

Authors: Ruben Becker, Davide Cenzato, Sung-Hwan Kim, Bojana Kodric, Riccardo Maso, and Nicola Prezza

Published in: LIPIcs, Volume 296, 35th Annual Symposium on Combinatorial Pattern Matching (CPM 2024)


Abstract
Wheeler automata were introduced in 2017 as a tool to generalize existing indexing and compression techniques based on the Burrows-Wheeler transform. Intuitively, an automaton is said to be Wheeler if there exists a total order on its states reflecting the natural co-lexicographic order of the strings labeling the automaton’s paths; this property makes it possible to represent the automaton’s topology in a constant number of bits per transition, as well as efficiently solving pattern matching queries on its accepted regular language. After their introduction, Wheeler automata have been the subject of a prolific line of research, both from the algorithmic and language-theoretic points of view. A recurring issue faced in these studies is the lack of large datasets of Wheeler automata on which the developed algorithms and theories could be tested. One possible way to overcome this issue is to generate random Wheeler automata. Motivated by this observation of practical nature, in this paper we initiate the theoretical study of random Wheeler automata, focusing our attention on the deterministic case (Wheeler DFAs - WDFAs). We start by naturally extending the Erdős-Rényi random graph model to WDFAs, and proceed by providing an algorithm generating uniform WDFAs according to this model. Our algorithm generates a uniform WDFA with n states, m transitions, and alphabet’s cardinality σ in O(m) expected time (O(mlog m) time w.h.p.) and constant working space for all alphabets of size σ ≤ m/ln m. The output WDFA is streamed directly to the output. As a by-product, we also give formulas for the number of distinct WDFAs and obtain that nσ + (n - σ) log σ bits are necessary and sufficient to encode a WDFA with n states and alphabet of size σ, up to an additive Θ(n) term. We present an implementation of our algorithm and show that it is extremely fast in practice, with a throughput of over 8 million transitions per second.

Cite as

Ruben Becker, Davide Cenzato, Sung-Hwan Kim, Bojana Kodric, Riccardo Maso, and Nicola Prezza. Random Wheeler Automata. In 35th Annual Symposium on Combinatorial Pattern Matching (CPM 2024). Leibniz International Proceedings in Informatics (LIPIcs), Volume 296, pp. 5:1-5:15, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024)


Copy BibTex To Clipboard

@InProceedings{becker_et_al:LIPIcs.CPM.2024.5,
  author =	{Becker, Ruben and Cenzato, Davide and Kim, Sung-Hwan and Kodric, Bojana and Maso, Riccardo and Prezza, Nicola},
  title =	{{Random Wheeler Automata}},
  booktitle =	{35th Annual Symposium on Combinatorial Pattern Matching (CPM 2024)},
  pages =	{5:1--5:15},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-326-3},
  ISSN =	{1868-8969},
  year =	{2024},
  volume =	{296},
  editor =	{Inenaga, Shunsuke and Puglisi, Simon J.},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2024.5},
  URN =		{urn:nbn:de:0030-drops-201157},
  doi =		{10.4230/LIPIcs.CPM.2024.5},
  annote =	{Keywords: Wheeler automata, Burrows-Wheeler transform, random graphs}
}
Questions / Remarks / Feedback
X

Feedback for Dagstuhl Publishing


Thanks for your feedback!

Feedback submitted

Could not send message

Please try again later or send an E-mail