Prefix Sorting DFAs: A Recursive Algorithm

Author Nicola Cotumaccio



File

LIPIcs.ISAAC.2023.22.pdf
  • Filesize: 0.69 MB
  • 15 pages

Author Details

Nicola Cotumaccio
  • Gran Sasso Science Institute, L'Aquila, Italy
  • Dalhousie University, Halifax, Canada

Acknowledgements

I thank Nicola Prezza for pointing out the paper [Sung-Hwan Kim et al., 2023].

Cite As

Nicola Cotumaccio. Prefix Sorting DFAs: A Recursive Algorithm. In 34th International Symposium on Algorithms and Computation (ISAAC 2023). Leibniz International Proceedings in Informatics (LIPIcs), Volume 283, pp. 22:1-22:15, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2023)
https://doi.org/10.4230/LIPIcs.ISAAC.2023.22

Abstract

In the past thirty years, numerous algorithms for building the suffix array of a string have been proposed. In 2021, the notion of suffix array was extended from strings to DFAs, and it was shown that the resulting data structure can be built in O(m² + n^{5/2}) time, where n is the number of states and m is the number of edges [SODA 2021]. Recently, algorithms running in O(mn) and O(n² log n) time have been described [CPM 2023]. In this paper, we improve the previous bounds by proposing an O(n²) recursive algorithm inspired by Farach’s algorithm for building a suffix tree [FOCS 1997]. To this end, we provide insight into the rich lexicographic and combinatorial structure of a graph, thereby contributing to the fascinating journey that might lead to solving the long-standing open problem of building the suffix tree of a graph.
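
To make the starting point of the abstract concrete, the following is a minimal sketch of the suffix array of a string, the object whose notion was extended from strings to DFAs. It is a naive comparison-based construction written purely for illustration. It is not the O(n²) recursive algorithm of the paper, nor any of the cited linear-time constructions, and the function name `suffix_array` is an assumption introduced here.

```python
def suffix_array(text: str) -> list[int]:
    """Return the starting positions of the suffixes of `text`,
    listed in lexicographic order of the suffixes.

    Naive illustration only: sorting n suffixes by direct string
    comparison costs O(n^2 log n) time in the worst case.
    """
    # Each index i stands for the suffix text[i:]; sort indices by that suffix.
    return sorted(range(len(text)), key=lambda i: text[i:])


if __name__ == "__main__":
    print(suffix_array("banana"))  # [5, 3, 1, 0, 4, 2]
```

For "banana" the suffixes in sorted order are a, ana, anana, banana, na, nana, which gives the starting positions 5, 3, 1, 0, 4, 2.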

Subject Classification

ACM Subject Classification
  • Theory of computation → Graph algorithms analysis
  • Theory of computation → Pattern matching
Keywords
  • Suffix Array
  • Burrows-Wheeler Transform
  • FM-index
  • Recursive Algorithms
  • Graph Theory
  • Pattern Matching

References

  1. Jarno Alanko, Giovanna D'Agostino, Alberto Policriti, and Nicola Prezza. Regular languages meet prefix sorting. In Shuchi Chawla, editor, Proc. of the 31st Symposium on Discrete Algorithms, (SODA'20), pages 911-930. SIAM, 2020. URL: https://doi.org/10.1137/1.9781611975994.55.
  2. Ruben Becker, Manuel Cáceres, Davide Cenzato, Sung-Hwan Kim, Bojana Kodric, Francisco Olivares, and Nicola Prezza. Sorting Finite Automata via Partition Refinement. In Inge Li Gørtz, Martin Farach-Colton, Simon J. Puglisi, and Grzegorz Herman, editors, 31st Annual European Symposium on Algorithms (ESA 2023), volume 274 of Leibniz International Proceedings in Informatics (LIPIcs), pages 15:1-15:15, Dagstuhl, Germany, 2023. Schloss Dagstuhl - Leibniz-Zentrum für Informatik. URL: https://doi.org/10.4230/LIPIcs.ESA.2023.15.
  3. Alexander Bowe, Taku Onodera, Kunihiko Sadakane, and Tetsuo Shibuya. Succinct de Bruijn graphs. In Ben Raphael and Jijun Tang, editors, Algorithms in Bioinformatics, pages 225-235, Berlin, Heidelberg, 2012. Springer Berlin Heidelberg.
  4. M. Burrows and D. J. Wheeler. A block-sorting lossless data compression algorithm. Technical report, Systems Research Center, 1994.
  5. Alessio Conte, Nicola Cotumaccio, Travis Gagie, Giovanni Manzini, Nicola Prezza, and Marinella Sciortino. Computing matching statistics on Wheeler DFAs. In 2023 Data Compression Conference (DCC), pages 150-159, 2023. URL: https://doi.org/10.1109/DCC55655.2023.00023.
  6. Nicola Cotumaccio. Graphs can be succinctly indexed for pattern matching in O(|E|² + |V|^{5/2}) time. In 2022 Data Compression Conference (DCC), pages 272-281, 2022. URL: https://doi.org/10.1109/DCC52660.2022.00035.
  7. Nicola Cotumaccio. Prefix sorting DFAs: a recursive algorithm, 2023. URL: https://arxiv.org/abs/2305.02526.
  8. Nicola Cotumaccio, Giovanna D’Agostino, Alberto Policriti, and Nicola Prezza. Co-lexicographically ordering automata and regular languages - part I. J. ACM, 70(4), August 2023. URL: https://doi.org/10.1145/3607471.
  9. Nicola Cotumaccio and Nicola Prezza. On indexing and compressing finite automata. In Dániel Marx, editor, Proc. of the 32nd Symposium on Discrete Algorithms, (SODA'21), pages 2585-2599. SIAM, 2021. URL: https://doi.org/10.1137/1.9781611976465.153.
  10. Massimo Equi, Veli Mäkinen, and Alexandru I. Tomescu. Graphs cannot be indexed in polynomial time for sub-quadratic time string matching, unless SETH fails. In Tomáš Bureš, Riccardo Dondi, Johann Gamper, Giovanna Guerrini, Tomasz Jurdziński, Claus Pahl, Florian Sikora, and Prudence W.H. Wong, editors, SOFSEM 2021: Theory and Practice of Computer Science, pages 608-622, Cham, 2021. Springer International Publishing.
  11. Massimo Equi, Veli Mäkinen, Alexandru I. Tomescu, and Roberto Grossi. On the complexity of string matching for graphs. ACM Trans. Algorithms, 19(3), April 2023. URL: https://doi.org/10.1145/3588334.
  12. M. Farach. Optimal suffix tree construction with large alphabets. In Proceedings 38th Annual Symposium on Foundations of Computer Science, pages 137-143, 1997. URL: https://doi.org/10.1109/SFCS.1997.646102.
  13. P. Ferragina, F. Luccio, G. Manzini, and S. Muthukrishnan. Structuring labeled trees for optimal succinctness, and beyond. In proc. 46th Annual IEEE Symposium on Foundations of Computer Science (FOCS'05), pages 184-193, 2005. URL: https://doi.org/10.1109/SFCS.2005.69.
  14. P. Ferragina and G. Manzini. Opportunistic data structures with applications. In Proc. 41st Annual Symposium on Foundations of Computer Science (FOCS'00), pages 390-398, 2000. URL: https://doi.org/10.1109/SFCS.2000.892127.
  15. Paolo Ferragina, Fabrizio Luccio, Giovanni Manzini, and S. Muthukrishnan. Compressing and indexing labeled trees, with applications. J. ACM, 57(1), November 2009. URL: https://doi.org/10.1145/1613676.1613680.
  16. Paolo Ferragina and Giovanni Manzini. Indexing compressed text. J. ACM, 52(4):552-581, July 2005. URL: https://doi.org/10.1145/1082036.1082039.
  17. Travis Gagie, Giovanni Manzini, and Jouni Sirén. Wheeler graphs: A framework for BWT-based data structures. Theoretical Computer Science, 698:67-78, 2017. Algorithms, Strings and Theoretical Approaches in the Big Data Era (In Honor of the 60th Birthday of Professor Raffaele Giancarlo). URL: https://doi.org/10.1016/j.tcs.2017.06.016.
  18. Dan Gusfield. Algorithms on Strings, Trees, and Sequences: Computer Science and Computational Biology. Cambridge University Press, 1997. URL: https://doi.org/10.1017/CBO9780511574931.
  19. Ramana M. Idury and Michael S. Waterman. A new algorithm for DNA sequence assembly. Journal of Computational Biology: A Journal of Computational Molecular Cell Biology, 2(2):291-306, 1995.
  20. Juha Kärkkäinen, Peter Sanders, and Stefan Burkhardt. Linear work suffix array construction. J. ACM, 53(6):918-936, November 2006. URL: https://doi.org/10.1145/1217856.1217858.
  21. Dong Kyue Kim, Jeong Seop Sim, Heejin Park, and Kunsoo Park. Constructing suffix arrays in linear time. Journal of Discrete Algorithms, 3(2):126-142, 2005. Combinatorial Pattern Matching (CPM) Special Issue. URL: https://doi.org/10.1016/j.jda.2004.08.019.
  22. Sung-Hwan Kim, Francisco Olivares, and Nicola Prezza. Faster prefix-sorting algorithms for deterministic finite automata. In Laurent Bulteau and Zsuzsanna Lipták, editors, 34th Annual Symposium on Combinatorial Pattern Matching, CPM 2023, June 26-28, 2023, Marne-la-Vallée, France, volume 259 of LIPIcs, pages 16:1-16:16. Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 2023. URL: https://doi.org/10.4230/LIPIcs.CPM.2023.16.
  23. Pang Ko and Srinivas Aluru. Space efficient linear time construction of suffix arrays. Journal of Discrete Algorithms, 3(2):143-156, 2005. Combinatorial Pattern Matching (CPM) Special Issue. URL: https://doi.org/10.1016/j.jda.2004.08.002.
  24. Veli Mäkinen, Niko Välimäki, and Jouni Sirén. Indexing graphs for path queries with applications in genome research. IEEE/ACM Transactions on Computational Biology and Bioinformatics, 11:375-388, 2014.
  25. U. Manber and G. Myers. Suffix arrays: A new method for on-line string searches. SIAM J. Comput., 22(5):935-948, 1993. URL: https://doi.org/10.1137/0222058.
  26. Joong Chae Na. Linear-time construction of compressed suffix arrays using o(n log n)-bit working space for large alphabets. In Alberto Apostolico, Maxime Crochemore, and Kunsoo Park, editors, Combinatorial Pattern Matching, pages 57-67, Berlin, Heidelberg, 2005. Springer Berlin Heidelberg.
  27. Robert Paige and Robert E. Tarjan. Three partition refinement algorithms. SIAM Journal on Computing, 16(6):973-989, 1987. URL: https://doi.org/10.1137/0216062.
  28. Simon J. Puglisi, W. F. Smyth, and Andrew H. Turpin. A taxonomy of suffix array construction algorithms. ACM Comput. Surv., 39(2):4-es, July 2007. URL: https://doi.org/10.1145/1242471.1242472.
  29. Kunihiko Sadakane. Compressed suffix trees with full functionality. Theory Comput. Syst., 41(4):589-607, 2007. URL: https://doi.org/10.1007/s00224-006-1198-x.
  30. Jared T. Simpson and Richard Durbin. Efficient construction of an assembly string graph using the FM-index. Bioinformatics, 26(12):i367-i373, June 2010. URL: https://doi.org/10.1093/bioinformatics/btq217.
  31. P. Weiner. Linear pattern matching algorithms. In Proc. 14th IEEE Annual Symposium on Switching and Automata Theory, pages 1-11, 1973. URL: https://doi.org/10.1109/SWAT.1973.13.