Sorting Finite Automata via Partition Refinement

Authors Ruben Becker , Manuel Cáceres , Davide Cenzato , Sung-Hwan Kim , Bojana Kodric , Francisco Olivares , Nicola Prezza



PDF
Thumbnail PDF

File

LIPIcs.ESA.2023.15.pdf
  • Filesize: 1.15 MB
  • 15 pages

Document Identifiers

Author Details

Ruben Becker
  • Ca' Foscari University of Venice, Italy
Manuel Cáceres
  • University of Helsinki, Finland
Davide Cenzato
  • Ca' Foscari University of Venice, Italy
Sung-Hwan Kim
  • Ca' Foscari University of Venice, Italy
Bojana Kodric
  • Ca' Foscari University of Venice, Italy
Francisco Olivares
  • CeBiB - Centre for Biotechnology and Bioengineering, Santiago, Chile
  • Department of Computer Science, University of Chile, Santiago, Chile
Nicola Prezza
  • Ca' Foscari University of Venice, Italy

Cite AsGet BibTex

Ruben Becker, Manuel Cáceres, Davide Cenzato, Sung-Hwan Kim, Bojana Kodric, Francisco Olivares, and Nicola Prezza. Sorting Finite Automata via Partition Refinement. In 31st Annual European Symposium on Algorithms (ESA 2023). Leibniz International Proceedings in Informatics (LIPIcs), Volume 274, pp. 15:1-15:15, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2023)
https://doi.org/10.4230/LIPIcs.ESA.2023.15

Abstract

Wheeler nondeterministic finite automata (WNFAs) were introduced in (Gagie et al., TCS 2017) as a powerful generalization of prefix sorting from strings to labeled graphs. WNFAs admit optimal solutions to classic hard problems on labeled graphs and languages such as compression and regular expression matching. The problem of deciding whether a given NFA is Wheeler is known to be NP-complete (Gibney and Thankachan, ESA 2019). Recently, however, Alanko et al. (Information and Computation 2021) showed how to side-step this complexity by switching to preorders: letting Q be the set of states and δ the set of transitions, they provided a O(|δ|⋅|Q|²)-time algorithm computing a totally-ordered partition (i.e. equivalence relation) of the WNFA’s states such that (1) equivalent states recognize the same regular language, and (2) the order of (the classes of) non-equivalent states is consistent with any Wheeler order, when one exists. As a result, the output is a preorder of the states as useful for pattern matching as standard Wheeler orders. Further extensions of this line of work (Cotumaccio et al., SODA 2021 and DCC 2022) generalized these concepts to arbitrary NFAs by introducing co-lex partial preorders: in general, any NFA admits a partial preorder of its states reflecting the co-lexicographic order of their accepted strings; the smaller the width of such preorder is, the faster regular expression matching queries can be performed. To date, the fastest algorithm for computing the smallest-width partial preorder on NFAs runs in O(|δ|² + |Q|^{5/2}) time (Cotumaccio, DCC 2022), while on DFAs the same task can be accomplished in O(min(|Q|²log|Q|, |δ|⋅|Q|)) time (Kim et al., CPM 2023). In this paper, we provide much more efficient solutions to the co-lex order computation problem. Our results are achieved by extending a classic algorithm for the relational coarsest partition refinement problem of Paige and Tarjan to work with ordered partitions. More specifically, we provide a O(|δ|log|Q|)-time algorithm computing a co-lex total preorder when the input is a Wheeler NFA, and an algorithm with the same time complexity computing the smallest-width co-lex partial order of any DFA. In addition, we present implementations of our algorithms and show that they are very efficient also in practice.

Subject Classification

ACM Subject Classification
  • Theory of computation → Sorting and searching
  • Theory of computation → Graph algorithms analysis
  • Theory of computation → Pattern matching
Keywords
  • Wheeler automata
  • prefix sorting
  • pattern matching
  • graph compression
  • sorting
  • partition refinement

Metrics

  • Access Statistics
  • Total Accesses (updated on a weekly basis)
    0
    PDF Downloads

References

  1. Jarno Alanko, Giovanna D'Agostino, Alberto Policriti, and Nicola Prezza. Regular languages meet prefix sorting. In Proceedings of the Fourteenth Annual ACM-SIAM Symposium on Discrete Algorithms, pages 911-930. SIAM, 2020. Google Scholar
  2. Jarno Alanko, Giovanna D'Agostino, Alberto Policriti, and Nicola Prezza. Wheeler languages. Information and Computation, 281:104820, 2021. URL: https://doi.org/10.1016/j.ic.2021.104820.
  3. Christel Baier and Joost-Pieter Katoen. Principles of model checking. MIT Press, 2008. Google Scholar
  4. Kuan-Hao Chao, Pei-Wei Chen, Sanjit A Seshia, and Ben Langmead. WGT: Tools and algorithms for recognizing, visualizing and generating Wheeler graphs. bioRxiv, 2022. URL: https://doi.org/10.1101/2022.10.15.512390.
  5. Computational Pan-Genomics Consortium. Computational pan-genomics: status, promises and challenges. Briefings in bioinformatics, 19(1):118-135, 2018. Google Scholar
  6. Nicola Cotumaccio. Graphs can be succinctly indexed for pattern matching in O(| E| ²+| V| ^5/2) time. In Ali Bilgin, Michael W. Marcellin, Joan Serra-Sagristà, and James A. Storer, editors, Data Compression Conference, DCC 2022, Snowbird, UT, USA, March 22-25, 2022, pages 272-281. IEEE, 2022. URL: https://doi.org/10.1109/DCC52660.2022.00035.
  7. Nicola Cotumaccio, Giovanna D'Agostino, Alberto Policriti, and Nicola Prezza. Co-lexicographically ordering automata and regular languages. Part I, 2022. URL: https://doi.org/10.48550/arXiv.2208.04931.
  8. Nicola Cotumaccio and Nicola Prezza. On indexing and compressing finite automata. In Dániel Marx, editor, Proceedings of the 2021 ACM-SIAM Symposium on Discrete Algorithms, SODA 2021, Virtual Conference, January 10 - 13, 2021, pages 2585-2599. SIAM, 2021. URL: https://doi.org/10.1137/1.9781611976465.153.
  9. Robert P. Dilworth. A Decomposition Theorem for Partially Ordered Sets. Annals of Mathematics, 51(1):161-166, 1950. URL: https://doi.org/10.2307/1969503.
  10. Massimo Equi, Veli Mäkinen, and Alexandru I. Tomescu. Graphs cannot be indexed in polynomial time for sub-quadratic time string matching, unless seth fails. In Proceedings of the 47th International Conference on Current Trends in Theory and Practice of Computer Science (SOFSEM), pages 608-622, 2021. URL: https://doi.org/10.1007/978-3-030-67731-2_44.
  11. Massimo Equi, Veli Mäkinen, Alexandru I Tomescu, and Roberto Grossi. On the complexity of string matching for graphs. ACM Transactions on Algorithms, 19(3):1-25, 2023. Google Scholar
  12. Travis Gagie, Paweł Gawrychowski, Juha Kärkkäinen, Yakov Nekrich, and Simon J. Puglisi. Lz77-based self-indexing with faster pattern matching. In Alberto Pardo and Alfredo Viola, editors, LATIN 2014: Theoretical Informatics, pages 731-742, Berlin, Heidelberg, 2014. Springer Berlin Heidelberg. Google Scholar
  13. Travis Gagie, Giovanni Manzini, and Jouni Sirén. Wheeler graphs: A framework for BWT-based data structures. Theoretical computer science, 698:67-78, 2017. Google Scholar
  14. Travis Gagie, Gonzalo Navarro, and Nicola Prezza. Optimal-Time Text Indexing in BWT-runs Bounded Space. In Proceedings of the 2018 Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), pages 1459-1477, 2018. URL: https://doi.org/10.1137/1.9781611975031.96.
  15. Daniel Gibney and Sharma V. Thankachan. On the hardness and inapproximability of recognizing wheeler graphs. In Michael A. Bender, Ola Svensson, and Grzegorz Herman, editors, 27th Annual European Symposium on Algorithms, ESA 2019, September 9-11, 2019, Munich/Garching, Germany, volume 144 of LIPIcs, pages 51:1-51:16. Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 2019. Google Scholar
  16. Paris C. Kanellakis and Scott A. Smolka. CCS expressions, finite state processes, and three problems of equivalence. Inf. Comput., 86(1):43-68, 1990. Google Scholar
  17. Sung-Hwan Kim, Francisco Olivares, and Nicola Prezza. Faster Prefix-Sorting Algorithms for Deterministic Finite Automata. In 34th Annual Symposium on Combinatorial Pattern Matching (CPM 2023), pages 16:1-16:16, 2023. URL: https://doi.org/10.4230/LIPIcs.CPM.2023.16.
  18. Udi Manber and Gene Myers. Suffix Arrays: A New Method for on-Line String Searches. In Proceedings of the 1st Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), pages 319-327, 1990. URL: https://doi.org/10.1137/0222058.
  19. Veli Mäkinen, Gonzalo Navarro, Jouni Sirén, and Niko Välimäki. Storage and retrieval of highly repetitive sequence collections. Journal of computational biology : a journal of computational molecular cell biology, 17:281-308, March 2010. URL: https://doi.org/10.1089/cmb.2009.0169.
  20. Gonzalo Navarro. Compact Data Structures - A practical approach. Cambridge University Press, 2016. ISBN 978-1-107-15238-0. 536 pages. Google Scholar
  21. Robert Paige and Robert Endre Tarjan. Three partition refinement algorithms. SIAM J. Comput., 16(6):973-989, 1987. URL: https://doi.org/10.1137/0216062.
  22. Jouni Sirén, Jean Monlong, Xian Chang, Adam M Novak, Jordan M Eizenga, Charles Markello, Jonas A Sibbesen, Glenn Hickey, Pi-Chuan Chang, Andrew Carroll, et al. Pangenomics enables genotyping of known structural variants in 5202 diverse genomes. Science, 374(6574):abg8871, 2021. Google Scholar
  23. Jouni Sirén, Niko Välimäki, and Veli Mäkinen. Indexing graphs for path queries with applications in genome research. IEEE/ACM Transactions on Computational Biology and Bioinformatics, 11(2):375-388, 2014. URL: https://doi.org/10.1109/TCBB.2013.2297101.
  24. Peter Weiner. Linear pattern matching algorithms. In Switching and Automata Theory, 1973. SWAT'08. IEEE Conference Record of 14th Annual Symposium on, pages 1-11. IEEE, 1973. Google Scholar
  25. Ryan Williams. A new algorithm for optimal 2-constraint satisfaction and its implications. Theoretical Computer Science, 348(2-3):357-365, 2005. Google Scholar
Questions / Remarks / Feedback
X

Feedback for Dagstuhl Publishing


Thanks for your feedback!

Feedback submitted

Could not send message

Please try again later or send an E-mail