On the Hardness and Inapproximability of Recognizing Wheeler Graphs

Acknowledgements

We would like to thank T. Gagie and N. Prezza for their valuable feedback.

Cite As

Daniel Gibney and Sharma V. Thankachan. On the Hardness and Inapproximability of Recognizing Wheeler Graphs. In 27th Annual European Symposium on Algorithms (ESA 2019). Leibniz International Proceedings in Informatics (LIPIcs), Volume 144, pp. 51:1-51:16, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2019)
https://doi.org/10.4230/LIPIcs.ESA.2019.51

Abstract

In recent years, several compressed indexes based on variants of the Burrows-Wheeler transformation have been introduced. Some of these index structures far more complex than the single string originally targeted by the FM-index [Ferragina and Manzini, J. ACM 2005]. As such, there has been an increasing effort to better understand under which conditions such an indexing scheme is possible. This has led to the introduction of Wheeler graphs [Gagie et al., Theor. Comput. Sci., 2017]. Gagie et al. showed that de Bruijn graphs, generalized compressed suffix arrays, and several other BWT-related structures can be represented as Wheeler graphs, and that Wheeler graphs can be indexed in a space-efficient way. Hence, being able to recognize whether a given graph is a Wheeler graph, or being able to approximate a given graph by a Wheeler graph, could have numerous applications in indexing. Here we resolve the open question of whether there exists an efficient algorithm for recognizing if a given graph is a Wheeler graph. Our results are as follows:

- The problem of recognizing whether a given graph G = (V, E) is a Wheeler graph is NP-complete for any edge label alphabet of size sigma >= 2, even when G is a DAG. This holds even on a restricted subset of graphs called d-NFAs for d >= 5. This is in contrast to recent results demonstrating that the problem can be solved in polynomial time for d-NFAs where d <= 2. We also show that the recognition problem can be solved in linear time for sigma = 1.
- There exists a 2^{e log sigma + O(n + e)} time exact algorithm, where n = |V| and e = |E|. This algorithm relies on graph isomorphism being computable in strictly sub-exponential time.
- We define an optimization variant of the problem called Wheeler Graph Violation, abbreviated WGV, where the aim is to remove the minimum number of edges in order to obtain a Wheeler graph. We show that WGV is APX-hard, even when G is a DAG, implying there exists a constant C >= 1 for which there is no C-approximation algorithm (unless P = NP). Also, conditioned on the Unique Games Conjecture, for all C >= 1, it is NP-hard to find a C-approximation.
- We define the Wheeler Subgraph problem, abbreviated WS, where the aim is to find the largest subgraph which is a Wheeler graph (the dual of WGV). In contrast to WGV, we prove that the WS problem is in APX for sigma = O(1).

The above findings suggest that most problems under this theme are computationally difficult. However, we identify a class of graphs for which the recognition problem is polynomial-time solvable, raising the open question of which parameters determine this problem's difficulty.
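
To make the recognition problem concrete, the following is a minimal Python sketch. It assumes the standard definition from Gagie et al. (2017), which the abstract does not restate: a node ordering is a Wheeler order if nodes with in-degree 0 precede all other nodes and, for any two edges (u, v) labeled a and (u', v') labeled a', a < a' implies v < v', and a = a' with u < u' implies v <= v'. The function names and the edge encoding as (source, target, label) triples are hypothetical choices made for illustration. Checking a given ordering takes polynomial time, while the recognizer shown here simply tries every permutation; it is a naive baseline and not the exact algorithm described in the paper.

```python
from itertools import permutations


def is_wheeler_order(order, edges):
    """Check whether `order` (a list of nodes) is a Wheeler order.

    `edges` is a list of (source, target, label) triples; labels must be
    comparable (e.g. single characters). This encoding is an assumption
    made for this sketch.
    """
    pos = {node: i for i, node in enumerate(order)}
    has_incoming = {v for _, v, _ in edges}

    # Nodes with in-degree 0 must precede every node with an incoming edge.
    seen_incoming = False
    for node in order:
        if node in has_incoming:
            seen_incoming = True
        elif seen_incoming:
            return False

    # Pairwise edge conditions, checked over all ordered pairs of edges.
    for u, v, a in edges:
        for u2, v2, a2 in edges:
            # Smaller label forces a strictly smaller target.
            if a < a2 and not pos[v] < pos[v2]:
                return False
            # Equal labels: smaller source forces a target that is not larger.
            if a == a2 and pos[u] < pos[u2] and not pos[v] <= pos[v2]:
                return False
    return True


def is_wheeler_graph(nodes, edges):
    """Brute-force recognition: try all n! orderings (exponential time)."""
    return any(is_wheeler_order(list(p), edges) for p in permutations(nodes))


if __name__ == "__main__":
    # A path 0 -a-> 1 -a-> 2 admits the Wheeler order [0, 1, 2].
    path = [(0, 1, "a"), (1, 2, "a")]
    print(is_wheeler_graph([0, 1, 2], path))

    # A node entered by edges with two different labels can never satisfy
    # the first edge condition, so this graph is not Wheeler.
    clash = [(0, 2, "a"), (1, 2, "b")]
    print(is_wheeler_graph([0, 1, 2], clash))
```

The polynomial-time check in `is_wheeler_order` is what places recognition in NP: an ordering serves as a certificate. The hardness results above concern finding such an ordering, which the sketch does only by exhaustive search.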

Subject Classification

ACM Subject Classification
• Theory of computation → Design and analysis of algorithms
Keywords
• Burrows–Wheeler transform
• string algorithms
• suffix trees
• NP-completeness

References

1. Alfred V. Aho and Margaret J. Corasick. Efficient String Matching: An Aid to Bibliographic Search. Commun. ACM, 18(6):333-340, 1975. URL: https://doi.org/10.1145/360825.360855.
2. Jarno Alanko, Travis Gagie, Gonzalo Navarro, and Louisa Seelbach Benkner. Tunneling on Wheeler Graphs. CoRR, abs/1811.02457, 2018. URL: http://arxiv.org/abs/1811.02457.
3. Jarno Alanko, Alberto Policriti, and Nicola Prezza. On Prefix-Sorting Finite Automata, 2019. URL: http://arxiv.org/abs/1902.01088.
4. László Babai and Eugene M. Luks. Canonical Labeling of Graphs. In Proceedings of the 15th Annual ACM Symposium on Theory of Computing, 25-27 April, 1983, Boston, Massachusetts, USA, pages 171-183, 1983. URL: https://doi.org/10.1145/800061.808746.
5. Djamal Belazzougui. Succinct Dictionary Matching with No Slowdown. In Combinatorial Pattern Matching, 21st Annual Symposium, CPM 2010, New York, NY, USA, June 21-23, 2010. Proceedings, pages 88-100, 2010. URL: https://doi.org/10.1007/978-3-642-13509-5_9.
6. Alexander Bowe, Taku Onodera, Kunihiko Sadakane, and Tetsuo Shibuya. Succinct de Bruijn Graphs. In Algorithms in Bioinformatics - 12th International Workshop, WABI 2012, Ljubljana, Slovenia, September 10-12, 2012. Proceedings, pages 225-235, 2012. URL: https://doi.org/10.1007/978-3-642-33122-0_18.
7. Michael Burrows and David J. Wheeler. A block-sorting lossless data compression algorithm, 1994.
8. Jianer Chen, Yang Liu, Songjian Lu, Barry O'Sullivan, and Igor Razgon. A fixed-parameter algorithm for the directed feedback vertex set problem. J. ACM, 55(5):21:1-21:19, 2008. URL: https://doi.org/10.1145/1411509.1411511.
9. Francisco Claude, Gonzalo Navarro, and Alberto Ordóñez Pereira. The wavelet matrix: An efficient wavelet tree for large alphabets. Inf. Syst., 47:15-32, 2015. URL: https://doi.org/10.1016/j.is.2014.06.002.
10. Nicolaas Govert De Bruijn. A combinatorial problem. Koninklijke Nederlandse Akademie v. Wetenschappen, 49(49):758-764, 1946.
11. Vida Dujmovic and David R. Wood. On Linear Layouts of Graphs. Discrete Mathematics & Theoretical Computer Science, 6(2):339-358, 2004. URL: http://dmtcs.episciences.org/317.
12. Massimo Equi, Roberto Grossi, and Veli Mäkinen. On the Complexity of Exact Pattern Matching in Graphs: Binary Strings and Bounded Degree. CoRR, abs/1901.05264, 2019. URL: http://arxiv.org/abs/1901.05264.
13. Massimo Equi, Roberto Grossi, Alexandru I. Tomescu, and Veli Mäkinen. On the Complexity of Exact Pattern Matching in Graphs: Determinism and Zig-Zag Matching. CoRR, abs/1902.03560, 2019. URL: http://arxiv.org/abs/1902.03560.
14. Paolo Ferragina, Fabrizio Luccio, Giovanni Manzini, and S. Muthukrishnan. Compressing and indexing labeled trees, with applications. J. ACM, 57(1):4:1-4:33, 2009. URL: https://doi.org/10.1145/1613676.1613680.
15. Paolo Ferragina and Giovanni Manzini. Indexing compressed text. J. ACM, 52(4):552-581, 2005. URL: https://doi.org/10.1145/1082036.1082039.
16. Paolo Ferragina and Rossano Venturini. The compressed permuterm index. ACM Trans. Algorithms, 7(1):10:1-10:21, 2010. URL: https://doi.org/10.1145/1868237.1868248.
17. Travis Gagie, Giovanni Manzini, and Jouni Sirén. Wheeler graphs: A framework for BWT-based data structures. Theor. Comput. Sci., 698:67-78, 2017. URL: https://doi.org/10.1016/j.tcs.2017.06.016.
18. Arnab Ganguly, Rahul Shah, and Sharma V. Thankachan. pBWT: Achieving Succinct Data Structures for Parameterized Pattern Matching and Related Problems. In Proceedings of the Twenty-Eighth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2017, Barcelona, Spain, Hotel Porta Fira, January 16-19, pages 397-407, 2017. URL: https://doi.org/10.1137/1.9781611974782.25.
19. Daniel Gibney and Sharma V. Thankachan. On the Hardness and Inapproximability of Recognizing Wheeler Graphs, 2019. URL: http://arxiv.org/abs/1902.01960.
20. Venkatesan Guruswami, Rajsekar Manokaran, and Prasad Raghavendra. Beating the Random Ordering is Hard: Inapproximability of Maximum Acyclic Subgraph. In 49th Annual IEEE Symposium on Foundations of Computer Science, FOCS 2008, October 25-28, 2008, Philadelphia, PA, USA, pages 573-582, 2008. URL: https://doi.org/10.1109/FOCS.2008.51.
21. Lenwood S. Heath and Sriram V. Pemmaraju. Stack and Queue Layouts of Directed Acyclic Graphs: Part II. SIAM J. Comput., 28(5):1588-1626, 1999. URL: https://doi.org/10.1137/S0097539795291550.
22. Lenwood S. Heath, Sriram V. Pemmaraju, and Ann N. Trenk. Stack and Queue Layouts of Directed Acyclic Graphs: Part I. SIAM J. Comput., 28(4):1510-1539, 1999. URL: https://doi.org/10.1137/S0097539795280287.
23. Lenwood S. Heath and Arnold L. Rosenberg. Laying out Graphs Using Queues. SIAM J. Comput., 21(5):927-958, 1992. URL: https://doi.org/10.1137/0221055.
24. Wing-Kai Hon, Tsung-Han Ku, Rahul Shah, Sharma V. Thankachan, and Jeffrey Scott Vitter. Faster compressed dictionary matching. Theor. Comput. Sci., 475:113-119, 2013. URL: https://doi.org/10.1016/j.tcs.2012.10.050.
25. Viggo Kann. On the approximability of NP-complete optimization problems. PhD thesis, Royal Institute of Technology Stockholm, 1992.
26. Sabrina Mantaci, Antonio Restivo, Giovanna Rosone, and Marinella Sciortino. An Extension of the Burrows Wheeler Transform and Applications to Sequence Comparison and Data Compression. In Combinatorial Pattern Matching, 16th Annual Symposium, CPM 2005, Jeju Island, Korea, June 19-22, 2005, Proceedings, pages 178-189, 2005. URL: https://doi.org/10.1007/11496656_16.
27. Sabrina Mantaci, Antonio Restivo, Giovanna Rosone, and Marinella Sciortino. An extension of the Burrows-Wheeler Transform. Theor. Comput. Sci., 387(3):298-312, 2007. URL: https://doi.org/10.1016/j.tcs.2007.07.014.
28. Gary L. Miller. Graph Isomorphism, General Remarks. J. Comput. Syst. Sci., 18(2):128-142, 1979. URL: https://doi.org/10.1016/0022-0000(79)90043-6.
29. Adam M. Novak, Erik Garrison, and Benedict Paten. A graph extension of the positional Burrows-Wheeler transform and its applications. Algorithms for Molecular Biology, 12(1):18:1-18:12, 2017. URL: https://doi.org/10.1186/s13015-017-0109-9.
30. Jaroslav Opatrny. Total Ordering Problem. SIAM J. Comput., 8(1):111-114, 1979. URL: https://doi.org/10.1137/0208008.
31. Jouni Sirén, Niko Välimäki, and Veli Mäkinen. Indexing graphs for path queries with applications in genome research. IEEE/ACM Transactions on Computational Biology and Bioinformatics (TCBB), 11(2):375-388, 2014.
32. D. Younger. Minimum feedback arc sets for a directed graph. IEEE Transactions on Circuit Theory, 10(2):238-245, 1963.