Algorithms and Complexity on Indexing Elastic Founder Graphs

Authors: Massimo Equi, Tuukka Norri, Jarno Alanko, Bastien Cazaux, Alexandru I. Tomescu, Veli Mäkinen



File

LIPIcs.ISAAC.2021.20.pdf
  • Filesize: 1.26 MB
  • 18 pages

Author Details

Massimo Equi
  • Department of Computer Science, University of Helsinki, Finland
Tuukka Norri
  • Department of Computer Science, University of Helsinki, Finland
Jarno Alanko
  • Department of Computer Science, University of Helsinki, Finland
  • Faculty of Computer Science, Dalhousie University, Halifax, Canada
Bastien Cazaux
  • LIRMM, Univ. Montpellier, CNRS, France
Alexandru I. Tomescu
  • Department of Computer Science, University of Helsinki, Finland
Veli Mäkinen
  • Department of Computer Science, University of Helsinki, Finland

Cite As

Massimo Equi, Tuukka Norri, Jarno Alanko, Bastien Cazaux, Alexandru I. Tomescu, and Veli Mäkinen. Algorithms and Complexity on Indexing Elastic Founder Graphs. In 32nd International Symposium on Algorithms and Computation (ISAAC 2021). Leibniz International Proceedings in Informatics (LIPIcs), Volume 212, pp. 20:1-20:18, Schloss Dagstuhl - Leibniz-Zentrum für Informatik (2021)
https://doi.org/10.4230/LIPIcs.ISAAC.2021.20

Abstract

We study the problem of matching a string in a labeled graph. Previous research has shown that unless the Orthogonal Vectors Hypothesis (OVH) is false, one cannot solve this problem in strongly sub-quadratic time, nor index the graph in polynomial time to answer queries efficiently (Equi et al. ICALP 2019, SOFSEM 2021). These conditional lower bounds cover even deterministic graphs with a binary alphabet, but there also naturally exist graph classes that are easy to index: for example, Wheeler graphs (Gagie et al. Theor. Comp. Sci. 2017) cover graphs admitting a Burrows-Wheeler transform-based indexing scheme. However, it is NP-complete to recognize whether a graph is a Wheeler graph (Gibney, Thankachan, ESA 2019). We propose an approach to alleviate the construction bottleneck of Wheeler graphs. Rather than starting from an arbitrary graph, we study graphs induced from multiple sequence alignments. Elastic degenerate strings (Bernardini et al. SPIRE 2017, ICALP 2019) can be seen as such graphs, and we introduce here their generalization: elastic founder graphs. We first prove that even such induced graphs are hard to index under OVH. Then we introduce two subclasses that are easy to index. Moreover, we give a near-linear time algorithm to construct indexable elastic founder graphs. This algorithm is based on an earlier segmentation algorithm for gapless multiple sequence alignments inducing non-elastic founder graphs (Mäkinen et al., WABI 2020), but uses more involved techniques to cope with repetitive string collections synchronized with gaps. Finally, we show that one of the subclasses admits a reduction to Wheeler graphs in polynomial time.
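
As a rough illustration of the objects named in the abstract, the Python sketch below induces a block graph from a gapped multiple sequence alignment and a given column segmentation, then checks by brute force whether a pattern is spelled along some path. It is not the paper's indexing scheme or its segmentation algorithm. The function names `induce_efg` and `occurs`, the boundary-list input format, and the requirement that no row consists only of gaps inside a segment are assumptions made for this example.

```python
def induce_efg(msa, boundaries, gap="-"):
    """Induce a block graph from an MSA and a column segmentation.

    msa: list of equal-length rows; boundaries: sorted cut points, e.g. [0, 2, 5].
    Returns (blocks, edges). blocks[k] lists the distinct gap-free strings
    spelled by the rows in segment k; edges link labels of consecutive
    blocks that are spelled by the same row.
    Assumption of this sketch: no row is all gaps within a segment.
    """
    spelled = []
    for start, end in zip(boundaries, boundaries[1:]):
        labels = [row[start:end].replace(gap, "") for row in msa]
        if any(not s for s in labels):
            raise ValueError("a row spells the empty string in a segment")
        spelled.append(labels)
    blocks = [sorted(set(labels)) for labels in spelled]
    edges = set()
    for k in range(len(spelled) - 1):
        for a, b in zip(spelled[k], spelled[k + 1]):
            edges.add(((k, a), (k + 1, b)))
    return blocks, edges


def occurs(pattern, blocks, edges):
    """Naive check: is pattern a substring of the label of some path?"""
    succ = {}
    for u, v in edges:
        succ.setdefault(u, []).append(v)

    def extend(node, offset, i):
        # Compare as much of pattern[i:] as fits in this node's label.
        label = node[1]
        n = min(len(label) - offset, len(pattern) - i)
        if label[offset:offset + n] != pattern[i:i + n]:
            return False
        i += n
        if i == len(pattern):
            return True
        # Label exhausted: continue into every successor node.
        return any(extend(v, 0, i) for v in succ.get(node, []))

    return any(
        extend((k, label), off, 0)
        for k, block in enumerate(blocks)
        for label in block
        for off in range(len(label))
    )


msa = ["AC-GT", "ACTGT", "AG-GA"]
blocks, edges = induce_efg(msa, [0, 2, 5])
print(blocks)                         # distinct labels per block
print(occurs("CTG", blocks, edges))   # spans two blocks along row 2
print(occurs("CGA", blocks, edges))   # would need an edge no row supports
```

The matcher retries every start position and follows every outgoing edge, so it only illustrates the query semantics; the conditional lower bounds and the indexable subclasses discussed in the abstract concern how much better than this one can do.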

Subject Classification

ACM Subject Classification
  • Theory of computation → Problems, reductions and completeness
  • Theory of computation → Graph algorithms analysis
  • Theory of computation → Pattern matching
  • Theory of computation → Sorting and searching
  • Theory of computation → Dynamic programming
  • Applied computing → Genomics
Keywords
  • orthogonal vectors hypothesis
  • multiple sequence alignment
  • segmentation

References

  1. Jarno Alanko, Giovanna D'Agostino, Alberto Policriti, and Nicola Prezza. Regular languages meet prefix sorting. In Shuchi Chawla, editor, Proceedings of the 2020 ACM-SIAM Symposium on Discrete Algorithms, SODA 2020, Salt Lake City, UT, USA, January 5-8, 2020, pages 911-930. SIAM, 2020.
  2. Mai Alzamel, Lorraine A. K. Ayad, Giulia Bernardini, Roberto Grossi, Costas S. Iliopoulos, Nadia Pisanti, Solon P. Pissis, and Giovanna Rosone. Comparing degenerate strings. Fundam. Informaticae, 175(1-4):41-58, 2020.
  3. Amihood Amir, Moshe Lewenstein, and Noa Lewenstein. Pattern matching in hypertext. J. Algorithms, 35(1):82-99, 2000.
  4. Kotaro Aoyama, Yuto Nakashima, Tomohiro I, Shunsuke Inenaga, Hideo Bannai, and Masayuki Takeda. Faster Online Elastic Degenerate String Matching. In Gonzalo Navarro, David Sankoff, and Binhai Zhu, editors, Annual Symposium on Combinatorial Pattern Matching (CPM 2018), volume 105 of Leibniz International Proceedings in Informatics (LIPIcs), pages 9:1-9:10, Dagstuhl, Germany, 2018. Schloss Dagstuhl-Leibniz-Zentrum fuer Informatik. URL: https://doi.org/10.4230/LIPIcs.CPM.2018.9.
  5. Giulia Bernardini, Pawel Gawrychowski, Nadia Pisanti, Solon P. Pissis, and Giovanna Rosone. Even Faster Elastic-Degenerate String Matching via Fast Matrix Multiplication. In Christel Baier, Ioannis Chatzigiannakis, Paola Flocchini, and Stefano Leonardi, editors, 46th International Colloquium on Automata, Languages, and Programming (ICALP 2019), volume 132 of Leibniz International Proceedings in Informatics (LIPIcs), pages 21:1-21:15, Dagstuhl, Germany, 2019. Schloss Dagstuhl-Leibniz-Zentrum fuer Informatik. URL: https://doi.org/10.4230/LIPIcs.ICALP.2019.21.
  6. Giulia Bernardini, Nadia Pisanti, Solon P. Pissis, and Giovanna Rosone. Pattern matching on elastic-degenerate text with errors. In Gabriele Fici, Marinella Sciortino, and Rossano Venturini, editors, String Processing and Information Retrieval - 24th International Symposium, SPIRE 2017, Palermo, Italy, September 26-29, 2017, Proceedings, volume 10508 of Lecture Notes in Computer Science, pages 74-90. Springer, 2017. URL: https://doi.org/10.1007/978-3-319-67428-5_7.
  7. Giulia Bernardini, Nadia Pisanti, Solon P. Pissis, and Giovanna Rosone. Approximate pattern matching on elastic-degenerate text. Theor. Comput. Sci., 812:109-122, 2020. URL: https://doi.org/10.1016/j.tcs.2019.08.012.
  8. Bastien Cazaux, Dmitry Kosolobov, Veli Mäkinen, and Tuukka Norri. Linear time maximum segmentation problems in column stream model. In Nieves R. Brisaboa and Simon J. Puglisi, editors, String Processing and Information Retrieval - 26th International Symposium, SPIRE 2019, Segovia, Spain, October 7-9, 2019, Proceedings, volume 11811 of Lecture Notes in Computer Science, pages 322-336. Springer, 2019.
  9. Maria Chatzou, Cedrik Magis, Jia-Ming Chang, Carsten Kemena, Giovanni Bussotti, Ionas Erb, and Cedric Notredame. Multiple sequence alignment modeling: methods and applications. Briefings in Bioinformatics, 17(6):1009-1023, November 2015.
  10. Massimo Equi, Roberto Grossi, Veli Mäkinen, and Alexandru I. Tomescu. On the complexity of string matching for graphs. In Christel Baier, Ioannis Chatzigiannakis, Paola Flocchini, and Stefano Leonardi, editors, 46th International Colloquium on Automata, Languages, and Programming, ICALP 2019, July 9-12, 2019, Patras, Greece, volume 132 of LIPIcs, pages 55:1-55:15. Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 2019.
  11. Massimo Equi, Veli Mäkinen, and Alexandru I. Tomescu. Graphs cannot be indexed in polynomial time for sub-quadratic time string matching, unless SETH fails. In Tomás Bures, Riccardo Dondi, Johann Gamper, Giovanna Guerrini, Tomasz Jurdzinski, Claus Pahl, Florian Sikora, and Prudence W. H. Wong, editors, SOFSEM 2021: Theory and Practice of Computer Science - 47th International Conference on Current Trends in Theory and Practice of Computer Science, SOFSEM 2021, Bolzano-Bozen, Italy, January 25-29, 2021, Proceedings, volume 12607 of Lecture Notes in Computer Science, pages 608-622. Springer, 2021. URL: https://doi.org/10.1007/978-3-030-67731-2_44.
  12. Martin Farach. Optimal suffix tree construction with large alphabets. In Proceedings 38th Annual Symposium on Foundations of Computer Science, pages 137-143. IEEE, 1997.
  13. Travis Gagie, Giovanni Manzini, and Jouni Sirén. Wheeler graphs: A framework for BWT-based data structures. Theor. Comput. Sci., 698:67-78, 2017.
  14. Travis Gagie and Gonzalo Navarro. Compressed indexes for repetitive textual datasets. In Sherif Sakr and Albert Y. Zomaya, editors, Encyclopedia of Big Data Technologies. Springer, 2019.
  15. Travis Gagie, Gonzalo Navarro, and Nicola Prezza. Fully functional suffix trees and optimal text searching in BWT-runs bounded space. J. ACM, 67(1):2:1-2:54, 2020.
  16. Daniel Gibney. An efficient elastic-degenerate text index? Not likely. In International Symposium on String Processing and Information Retrieval, pages 76-88. Springer, 2020.
  17. Daniel Gibney and Sharma V. Thankachan. On the hardness and inapproximability of recognizing Wheeler graphs. In Michael A. Bender, Ola Svensson, and Grzegorz Herman, editors, 27th Annual European Symposium on Algorithms, ESA 2019, September 9-11, 2019, Munich/Garching, Germany, volume 144 of LIPIcs, pages 51:1-51:16. Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 2019.
  18. Costas S. Iliopoulos, Ritu Kundu, and Solon P. Pissis. Efficient pattern matching in elastic-degenerate texts. In Frank Drewes, Carlos Martín-Vide, and Bianca Truthe, editors, Language and Automata Theory and Applications - 11th International Conference, LATA 2017, Umeå, Sweden, March 6-9, 2017, Proceedings, volume 10168 of Lecture Notes in Computer Science, pages 131-142, 2017. URL: https://doi.org/10.1007/978-3-319-53733-7_9.
  19. Costas S. Iliopoulos and Jakub Radoszewski. Truly subquadratic-time extension queries and periodicity detection in strings with uncertainties. In Roberto Grossi and Moshe Lewenstein, editors, 27th Annual Symposium on Combinatorial Pattern Matching, CPM 2016, June 27-29, 2016, Tel Aviv, Israel, volume 54 of LIPIcs, pages 8:1-8:12. Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 2016.
  20. Russell Impagliazzo and Ramamohan Paturi. On the Complexity of k-SAT. Journal of Computer and System Sciences, 62(2):367-375, 2001.
  21. David Maier. The complexity of some problems on subsequences and supersequences. J. ACM, 25(2):322-336, April 1978. URL: https://doi.org/10.1145/322063.322075.
  22. Veli Mäkinen, Djamal Belazzougui, Fabio Cunial, and Alexandru I. Tomescu. Genome-Scale Algorithm Design: Biological Sequence Analysis in the Era of High-Throughput Sequencing. Cambridge University Press, 2015. URL: https://doi.org/10.1017/CBO9781139940023.
  23. Veli Mäkinen, Bastien Cazaux, Massimo Equi, Tuukka Norri, and Alexandru I. Tomescu. Linear time construction of indexable founder block graphs. In Carl Kingsford and Nadia Pisanti, editors, 20th International Workshop on Algorithms in Bioinformatics, WABI 2020, September 7-9, 2020, Pisa, Italy (Virtual Conference), volume 172 of LIPIcs, pages 7:1-7:18. Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 2020. URL: https://doi.org/10.4230/LIPIcs.WABI.2020.7.
  24. Veli Mäkinen, Gonzalo Navarro, Jouni Sirén, and Niko Välimäki. Storage and retrieval of highly repetitive sequence collections. Journal of Computational Biology, 17(3):281-308, 2010.
  25. U. Manber and S. Wu. Approximate string matching with arbitrary costs for text and hypertext. In IAPR Workshop on Structural and Syntactic Pattern Recognition, Bern, Switzerland, pages 22-33, 1992.
  26. Tobias Marschall, Manja Marz, Thomas Abeel, Louis Dijkstra, Bas E Dutilh, Ali Ghaffaari, Paul Kersey, Wigard Kloosterman, Veli Mäkinen, Adam Novak, et al. Computational pan-genomics: status, promises and challenges. BioRxiv, page 043430, 2016.
  27. Joong Na, Hyunjoon Kim, Seunghwan Min, Heejin Park, Thierry Lecroq, Martine Leonard, Laurent Mouchard, and Kunsoo Park. FM-index of alignment with gaps. Theoretical Computer Science, 710, June 2016. URL: https://doi.org/10.1016/j.tcs.2017.02.020.
  28. Joong Chae Na, Hyunjoon Kim, Heejin Park, Thierry Lecroq, Martine Léonard, Laurent Mouchard, and Kunsoo Park. FM-index of alignment: A compressed index for similar strings. Theoretical Computer Science, 638:159-170, 2016. Pattern Matching, Text Data Structures and Compression. URL: https://doi.org/10.1016/j.tcs.2015.08.008.
  29. Joong Chae Na, Heejin Park, Maxime Crochemore, Jan Holub, Costas S. Iliopoulos, Laurent Mouchard, and Kunsoo Park. Suffix tree of alignment: An efficient index for similar data. In Thierry Lecroq and Laurent Mouchard, editors, Combinatorial Algorithms - 24th International Workshop, IWOCA 2013, Rouen, France, July 10-12, 2013, Revised Selected Papers, volume 8288 of Lecture Notes in Computer Science, pages 337-348. Springer, 2013.
  30. Joong Chae Na, Heejin Park, Sunho Lee, Minsung Hong, Thierry Lecroq, Laurent Mouchard, and Kunsoo Park. Suffix array of alignment: A practical index for similar data. In Oren Kurland, Moshe Lewenstein, and Ely Porat, editors, String Processing and Information Retrieval - 20th International Symposium, SPIRE 2013, Jerusalem, Israel, October 7-9, 2013, Proceedings, volume 8214 of Lecture Notes in Computer Science, pages 243-254. Springer, 2013.
  31. Tuukka Norri, Bastien Cazaux, Dmitry Kosolobov, and Veli Mäkinen. Linear time minimum segmentation enables scalable founder reconstruction. Algorithms Mol. Biol., 14(1):12:1-12:15, 2019.
  32. Mikko Rautiainen and Tobias Marschall. Aligning sequences to general graphs in O(V + mE) time. bioRxiv, page 216127, 2017.
  33. Kunihiko Sadakane. Compressed suffix trees with full functionality. Theory Comput. Syst., 41(4):589-607, 2007. URL: https://doi.org/10.1007/s00224-006-1198-x.
  34. Jouni Sirén, Niko Välimäki, and Veli Mäkinen. Indexing graphs for path queries with applications in genome research. IEEE/ACM Transactions on Computational Biology and Bioinformatics, 11(2):375-388, 2014.
  35. Chris Thachuk. Indexing hypertext. Journal of Discrete Algorithms, 18:113-122, 2013. Selected papers from the 18th International Symposium on String Processing and Information Retrieval (SPIRE 2011).
  36. Ryan Williams. A new algorithm for optimal 2-constraint satisfaction and its implications. Theor. Comput. Sci., 348(2-3):357-365, 2005. URL: https://doi.org/10.1016/j.tcs.2005.09.023.