Finding an Optimal Alphabet Ordering for Lyndon Factorization Is Hard

Authors: Daniel Gibney, Sharma V. Thankachan




Author Details

Daniel Gibney
  • Department of Computer Science, University of Central Florida, Orlando, FL, USA
Sharma V. Thankachan
  • Department of Computer Science, University of Central Florida, Orlando, FL, USA

Cite As

Daniel Gibney and Sharma V. Thankachan. Finding an Optimal Alphabet Ordering for Lyndon Factorization Is Hard. In 38th International Symposium on Theoretical Aspects of Computer Science (STACS 2021). Leibniz International Proceedings in Informatics (LIPIcs), Volume 187, pp. 35:1-35:15, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2021)
https://doi.org/10.4230/LIPIcs.STACS.2021.35

Abstract

This work establishes several strong hardness results on the problem of finding an ordering on a string’s alphabet that either minimizes or maximizes the number of factors in that string’s Lyndon factorization. In doing so, we demonstrate that these ordering problems are sufficiently complex to model a wide variety of ordering constraint satisfaction problems (OCSPs). Based on this, we prove that (i) the decision versions of both the minimization and maximization problems are NP-complete, (ii) for both the minimization and maximization problems there does not exist a constant-factor approximation algorithm running in polynomial time under the Unique Games Conjecture, and (iii) there does not exist an algorithm to solve the minimization problem in time poly(|T|) ⋅ 2^{o(σ log σ)} for a string T over an alphabet of size σ under the Exponential Time Hypothesis (essentially, the brute-force approach of trying every alphabet order is hard to improve significantly).
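To make the brute-force baseline mentioned in (iii) concrete, the following is a minimal sketch, not taken from the paper. It computes a Lyndon factorization under a chosen alphabet order using Duval's algorithm, then tries all σ! orderings. The function names `lyndon_factors` and `best_orderings` are hypothetical and chosen only for illustration.

```python
from itertools import permutations


def lyndon_factors(text, order):
    """Lyndon factorization of `text` under the alphabet order `order`
    (smallest symbol first), computed with Duval's algorithm."""
    rank = {ch: i for i, ch in enumerate(order)}
    s = [rank[ch] for ch in text]  # compare symbols by their rank
    n = len(s)
    factors = []
    i = 0
    while i < n:
        j, k = i + 1, i
        # Extend the current run while it remains a power of a Lyndon
        # word followed by a (possibly empty) prefix of that word.
        while j < n and s[k] <= s[j]:
            k = i if s[k] < s[j] else k + 1
            j += 1
        # Emit every complete copy of the Lyndon word of length j - k.
        while i <= k:
            factors.append(text[i:i + j - k])
            i += j - k
    return factors


def best_orderings(text):
    """Brute force over all sigma! alphabet orderings (illustrative only).
    Returns ((min_order, min_count), (max_order, max_count))."""
    alphabet = sorted(set(text))
    counts = {p: len(lyndon_factors(text, p)) for p in permutations(alphabet)}
    lo = min(counts, key=counts.get)
    hi = max(counts, key=counts.get)
    return (lo, counts[lo]), (hi, counts[hi])


if __name__ == "__main__":
    print(lyndon_factors("banana", "abn"))  # order a < b < n
    print(lyndon_factors("banana", "ban"))  # order b < a < n
    print(best_orderings("banana"))
```

Since σ! = 2^{Θ(σ log σ)}, this enumeration runs in poly(|T|) ⋅ 2^{O(σ log σ)} time, which is the running time that result (iii) says cannot be improved significantly for the minimization problem under the Exponential Time Hypothesis.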

Subject Classification

ACM Subject Classification
  • Theory of computation → Problems, reductions and completeness
Keywords
  • Lyndon Factorization
  • String Algorithms
  • Burrows-Wheeler Transform

