Computing the LCP Array of a Labeled Graph

Authors: Jarno N. Alanko, Davide Cenzato, Nicola Cotumaccio, Sung-Hwan Kim, Giovanni Manzini, Nicola Prezza


File

LIPIcs.CPM.2024.1.pdf
  • Filesize: 0.81 MB
  • 15 pages

Author Details

Jarno N. Alanko
  • University of Helsinki, Finland
Davide Cenzato
  • Ca' Foscari University of Venice, Italy
Nicola Cotumaccio
  • University of Helsinki, Finland
Sung-Hwan Kim
  • Ca' Foscari University of Venice, Italy
Giovanni Manzini
  • University of Pisa, Italy
Nicola Prezza
  • Ca' Foscari University of Venice, Italy

Cite As

Jarno N. Alanko, Davide Cenzato, Nicola Cotumaccio, Sung-Hwan Kim, Giovanni Manzini, and Nicola Prezza. Computing the LCP Array of a Labeled Graph. In 35th Annual Symposium on Combinatorial Pattern Matching (CPM 2024). Leibniz International Proceedings in Informatics (LIPIcs), Volume 296, pp. 1:1-1:15, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024)
https://doi.org/10.4230/LIPIcs.CPM.2024.1

Abstract

The LCP array is an important tool in stringology: it speeds up pattern matching algorithms and enables compact representations of the suffix tree. Recently, Conte et al. [DCC 2023] and Cotumaccio et al. [SPIRE 2023] extended the definition of this array to Wheeler DFAs and, ultimately, to arbitrary labeled graphs, proving that it can be used to efficiently solve matching statistics queries on the graph’s paths. In this paper, we provide the first efficient algorithm building the LCP array of a directed labeled graph with n nodes and m edges labeled over an alphabet of size σ. The first step is to transform the input graph G into a deterministic Wheeler pseudoforest G_{is} with O(n) edges encoding the lexicographically smallest and largest strings entering each node of the original graph. Using state-of-the-art algorithms, this step runs in O(min{m log n, m + n²}) time on arbitrary labeled graphs, and in O(m) time on Wheeler DFAs. The LCP array of G stores the longest common prefixes between those strings, and can therefore easily be derived from the LCP array of G_{is}. After arguing that the natural generalization of a compact-space LCP-construction algorithm by Beller et al. [J. Discrete Algorithms 2013] runs in time Ω(nσ) on pseudoforests, we present a new algorithm based on dynamic range stabbing that builds the LCP array of G_{is} in O(n log σ) time and O(n log σ) bits of working space. Combined with our reduction, we obtain the first efficient algorithm to build the LCP array of an arbitrary labeled graph. An implementation of our algorithm is publicly available at https://github.com/regindex/Labeled-Graph-LCP.
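For readers less familiar with the object being generalized, the classical LCP array of a single string can be sketched as below. This is only an illustration of the string case, written naively in quadratic time; it is not the algorithm of the paper and does not handle graphs. The function names (`suffix_array`, `common_prefix_len`, `lcp_array`) are hypothetical and chosen for this example.

```python
def suffix_array(text):
    # Naive construction: sort suffix start positions by the suffix they denote.
    return sorted(range(len(text)), key=lambda i: text[i:])


def common_prefix_len(a, b):
    # Length of the longest common prefix of strings a and b.
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def lcp_array(text):
    # lcp[r] is the longest common prefix length between the suffixes of
    # rank r-1 and r in lexicographic order; lcp[0] is set to 0 by convention.
    sa = suffix_array(text)
    lcp = [0] * len(sa)
    for r in range(1, len(sa)):
        lcp[r] = common_prefix_len(text[sa[r - 1]:], text[sa[r]:])
    return sa, lcp


if __name__ == "__main__":
    sa, lcp = lcp_array("banana")
    print(sa)   # suffix start positions in lexicographic order
    print(lcp)  # common prefix lengths between adjacent suffixes
```

In the graph setting described in the abstract, the role of the sorted suffixes is played by the lexicographically smallest and largest strings entering each node, and the array stores longest common prefixes between those strings.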

Subject Classification

ACM Subject Classification
  • Theory of computation → Sorting and searching
  • Theory of computation → Graph algorithms analysis
  • Theory of computation → Pattern matching
Keywords
  • LCP array
  • Wheeler automata
  • prefix sorting
  • pattern matching
  • sorting


References

  1. Mohamed Ibrahim Abouelhoda, Stefan Kurtz, and Enno Ohlebusch. Replacing suffix trees with enhanced suffix arrays. Journal of Discrete Algorithms, 2(1):53-86, 2004.
  2. Jarno Alanko, Giovanna D'Agostino, Alberto Policriti, and Nicola Prezza. Regular Languages meet Prefix Sorting, pages 911-930. SIAM, 2020. URL: https://doi.org/10.1137/1.9781611975994.55.
  3. Jarno N. Alanko, Elena Biagi, and Simon J. Puglisi. Longest common prefix arrays for succinct k-spectra. In Franco Maria Nardini, Nadia Pisanti, and Rossano Venturini, editors, String Processing and Information Retrieval - 30th International Symposium, SPIRE 2023, Pisa, Italy, September 26-28, 2023, Proceedings, volume 14240 of Lecture Notes in Computer Science, pages 1-13. Springer, 2023. URL: https://doi.org/10.1007/978-3-031-43980-3_1.
  4. Jarno N. Alanko, Simon J. Puglisi, and Jaakko Vuohtoniemi. Small searchable κ-spectra via subset rank queries on the spectral Burrows-Wheeler transform. In Jonathan W. Berry, David B. Shmoys, Lenore Cowen, and Uwe Naumann, editors, SIAM Conference on Applied and Computational Discrete Algorithms, ACDA 2023, Seattle, WA, USA, May 31 - June 2, 2023, pages 225-236. SIAM, 2023. URL: https://doi.org/10.1137/1.9781611977714.20.
  5. Ruben Becker, Manuel Cáceres, Davide Cenzato, Sung-Hwan Kim, Bojana Kodric, Francisco Olivares, and Nicola Prezza. Sorting Finite Automata via Partition Refinement. In Inge Li Gørtz, Martin Farach-Colton, Simon J. Puglisi, and Grzegorz Herman, editors, 31st Annual European Symposium on Algorithms (ESA 2023), volume 274 of Leibniz International Proceedings in Informatics (LIPIcs), pages 15:1-15:15, Dagstuhl, Germany, 2023. Schloss Dagstuhl - Leibniz-Zentrum für Informatik. URL: https://doi.org/10.4230/LIPIcs.ESA.2023.15.
  6. T. Beller, S. Gog, E. Ohlebusch, and T. Schnattinger. Computing the longest common prefix array based on the Burrows-Wheeler transform. J. Discrete Algorithms, 18:22-31, 2013.
  7. Christina Boucher, Alex Bowe, Travis Gagie, Simon J. Puglisi, and Kunihiko Sadakane. Variable-order de Bruijn graphs. In 2015 Data Compression Conference, pages 383-392, 2015. URL: https://doi.org/10.1109/DCC.2015.70.
  8. M. Burrows and D.J. Wheeler. A Block Sorting data Compression Algorithm. Technical report, DEC Systems Research Center, 1994.
  9. Alessio Conte, Nicola Cotumaccio, Travis Gagie, Giovanni Manzini, Nicola Prezza, and Marinella Sciortino. Computing matching statistics on Wheeler DFAs. In 2023 Data Compression Conference (DCC), pages 150-159, 2023. URL: https://doi.org/10.1109/DCC55655.2023.00023.
  10. Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest, and Clifford Stein. Introduction to Algorithms (4th ed.). The MIT Press, 2022.
  11. Nicola Cotumaccio. Graphs can be succinctly indexed for pattern matching in O(|E|² + |V|^{5/2}) time. In 2022 Data Compression Conference (DCC), pages 272-281, 2022. URL: https://doi.org/10.1109/DCC52660.2022.00035.
  12. Nicola Cotumaccio. Prefix Sorting DFAs: A Recursive Algorithm. In Satoru Iwata and Naonori Kakimura, editors, 34th International Symposium on Algorithms and Computation (ISAAC 2023), volume 283 of Leibniz International Proceedings in Informatics (LIPIcs), pages 22:1-22:15, Dagstuhl, Germany, 2023. Schloss Dagstuhl - Leibniz-Zentrum für Informatik. URL: https://doi.org/10.4230/LIPIcs.ISAAC.2023.22.
  13. Nicola Cotumaccio. A Myhill-Nerode Theorem for Generalized Automata, with Applications to Pattern Matching and Compression. In Olaf Beyersdorff, Mamadou Moustapha Kanté, Orna Kupferman, and Daniel Lokshtanov, editors, 41st International Symposium on Theoretical Aspects of Computer Science (STACS 2024), volume 289 of Leibniz International Proceedings in Informatics (LIPIcs), pages 26:1-26:19, Dagstuhl, Germany, 2024. Schloss Dagstuhl - Leibniz-Zentrum für Informatik. URL: https://doi.org/10.4230/LIPIcs.STACS.2024.26.
  14. Nicola Cotumaccio. Enhanced graph pattern matching, 2024. URL: https://arxiv.org/abs/2402.16205.
  15. Nicola Cotumaccio, Giovanna D’Agostino, Alberto Policriti, and Nicola Prezza. Co-lexicographically ordering automata and regular languages - Part I. J. ACM, 70(4), August 2023. URL: https://doi.org/10.1145/3607471.
  16. Nicola Cotumaccio, Travis Gagie, Dominik Köppl, and Nicola Prezza. Space-time trade-offs for the LCP array of Wheeler DFAs. In Franco Maria Nardini, Nadia Pisanti, and Rossano Venturini, editors, String Processing and Information Retrieval - 30th International Symposium, SPIRE 2023, Pisa, Italy, September 26-28, 2023, Proceedings, volume 14240 of Lecture Notes in Computer Science, pages 143-156. Springer, 2023. URL: https://doi.org/10.1007/978-3-031-43980-3_12.
  17. Nicola Cotumaccio and Nicola Prezza. On Indexing and Compressing Finite Automata, pages 2585-2599. SIAM, 2021. URL: https://doi.org/10.1137/1.9781611976465.153.
  18. P. Ferragina and G. Manzini. Opportunistic data structures with applications. In Proceedings 41st Annual Symposium on Foundations of Computer Science, pages 390-398, 2000. URL: https://doi.org/10.1109/SFCS.2000.892127.
  19. Travis Gagie, Giovanni Manzini, and Jouni Sirén. Wheeler graphs: A framework for BWT-based data structures. Theoretical Computer Science, 698:67-78, 2017. URL: https://doi.org/10.1016/j.tcs.2017.06.016.
  20. Travis Gagie, Simon J. Puglisi, and Andrew Turpin. Range quantile queries: Another virtue of wavelet trees. In Jussi Karlgren, Jorma Tarhio, and Heikki Hyyrö, editors, String Processing and Information Retrieval, pages 1-6, Berlin, Heidelberg, 2009. Springer Berlin Heidelberg.
  21. Roberto Grossi, Ankur Gupta, and Jeffrey Scott Vitter. High-order entropy-compressed text indexes. In Proceedings of the Fourteenth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA '03, pages 841-850, USA, 2003. Society for Industrial and Applied Mathematics.
  22. Sung-Hwan Kim, Francisco Olivares, and Nicola Prezza. Faster Prefix-Sorting Algorithms for Deterministic Finite Automata. In Laurent Bulteau and Zsuzsanna Lipták, editors, 34th Annual Symposium on Combinatorial Pattern Matching (CPM 2023), volume 259 of Leibniz International Proceedings in Informatics (LIPIcs), pages 16:1-16:16, Dagstuhl, Germany, 2023. Schloss Dagstuhl - Leibniz-Zentrum für Informatik. URL: https://doi.org/10.4230/LIPIcs.CPM.2023.16.
  23. U. Manber and G. Myers. Suffix arrays: A new method for on-line string searches. SIAM J. Comput., 22(5):935-948, 1993. URL: https://doi.org/10.1137/0222058.
  24. Yakov Nekrich. A Dynamic Stabbing-Max Data Structure with Sub-Logarithmic Query Time. In Proceedings of the 22nd International Symposium on Algorithms and Computations (ISAAC), pages 170-179, 2011. URL: https://doi.org/10.1007/978-3-642-25591-5_19.
  25. Franco P. Preparata and Michael Ian Shamos. Computational Geometry: An Introduction. Springer-Verlag, 1985.
  26. Nicola Prezza and Giovanna Rosone. Space-efficient construction of compressed suffix trees. Theoretical Computer Science, 852:138-156, 2021.
  27. Rajeev Raman, Venkatesh Raman, and Srinivasa Rao Satti. Succinct indexable dictionaries with applications to encoding k-ary trees, prefix sums and multisets. ACM Trans. Algorithms, 3(4):43-es, November 2007. URL: https://doi.org/10.1145/1290672.1290680.
  28. Thomas Schnattinger, Enno Ohlebusch, and Simon Gog. Bidirectional search in a string with wavelet trees and bidirectional matching statistics. Information and Computation, 213:13-22, 2012. Special Issue: Combinatorial Pattern Matching (CPM 2010). URL: https://doi.org/10.1016/j.ic.2011.03.007.