The Fine-Grained Complexity of Median and Center String Problems Under Edit Distance

Authors Gary Hoppenworth, Jason W. Bentley, Daniel Gibney, Sharma V. Thankachan



PDF
Thumbnail PDF

File

LIPIcs.ESA.2020.61.pdf
  • Filesize: 0.6 MB
  • 19 pages

Document Identifiers

Author Details

Gary Hoppenworth
  • Department of Computer Science, University of Central Florida, Orlando, FL, USA
Jason W. Bentley
  • Department of Mathematics, University of Central Florida, Orlando, FL, USA
Daniel Gibney
  • Department of Computer Science, University of Central Florida, Orlando, FL,USA
Sharma V. Thankachan
  • Department of Computer Science, University of Central Florida, Orlando, FL, USA

Cite AsGet BibTex

Gary Hoppenworth, Jason W. Bentley, Daniel Gibney, and Sharma V. Thankachan. The Fine-Grained Complexity of Median and Center String Problems Under Edit Distance. In 28th Annual European Symposium on Algorithms (ESA 2020). Leibniz International Proceedings in Informatics (LIPIcs), Volume 173, pp. 61:1-61:19, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2020)
https://doi.org/10.4230/LIPIcs.ESA.2020.61

Abstract

We present the first fine-grained complexity results on two classic problems on strings. The first one is the k-Median-Edit-Distance problem, where the input is a collection of k strings, each of length at most n, and the task is to find a new string that minimizes the sum of the edit distances from itself to all other strings in the input. Arising frequently in computational biology, this problem provides an important generalization of edit distance to multiple strings and is similar to the multiple sequence alignment problem in bioinformatics. We demonstrate that for any ε > 0 and k ≥ 2, an O(n^{k-ε}) time solution for the k-Median-Edit-Distance problem over an alphabet of size O(k) refutes the Strong Exponential Time Hypothesis (SETH). This provides the first matching conditional lower bound for the O(n^k) time algorithm established in 1975 by Sankoff. The second problem we study is the k-Center-Edit-Distance problem. Here also, the input is a collection of k strings, each of length at most n. The task is to find a new string that minimizes the maximum edit distance from itself to any other string in the input. We prove that the same conditional lower bound as before holds. Our results also imply new conditional lower bounds for the k-Tree-Alignment and the k-Bottleneck-Tree-Alignment problems studied in phylogenetics.

Subject Classification

ACM Subject Classification
  • Theory of computation → Design and analysis of algorithms
Keywords
  • Edit Distance
  • Median String
  • Center String
  • SETH

Metrics

  • Access Statistics
  • Total Accesses (updated on a weekly basis)
    0
    PDF Downloads

References

  1. Amir Abboud, Arturs Backurs, Thomas Dueholm Hansen, Virginia Vassilevska Williams, and Or Zamir. Subtree isomorphism revisited. ACM Trans. Algorithms, 14(3):27:1-27:23, 2018. URL: https://doi.org/10.1145/3093239.
  2. Amir Abboud, Arturs Backurs, and Virginia Vassilevska Williams. Tight hardness results for LCS and other sequence similarity measures. In IEEE 56th Annual Symposium on Foundations of Computer Science, FOCS 2015, Berkeley, CA, USA, 17-20 October, 2015, pages 59-78, 2015. URL: https://doi.org/10.1109/FOCS.2015.14.
  3. Amir Abboud, Karl Bringmann, Danny Hermelin, and Dvir Shabtay. Seth-based lower bounds for subset sum and bicriteria path. In Proceedings of the Thirtieth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2019, San Diego, California, USA, January 6-9, 2019, pages 41-57, 2019. URL: https://doi.org/10.1137/1.9781611975482.3.
  4. Amir Abboud, Fabrizio Grandoni, and Virginia Vassilevska Williams. Subcubic equivalences between graph centrality problems, APSP and diameter. In Piotr Indyk, editor, Proceedings of the Twenty-Sixth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2015, San Diego, CA, USA, January 4-6, 2015, pages 1681-1697. SIAM, 2015. URL: https://doi.org/10.1137/1.9781611973730.112.
  5. Amir Abboud, Thomas Dueholm Hansen, Virginia Vassilevska Williams, and Ryan Williams. Simulating branching programs with edit distance and friends: or: a polylog shaved is a lower bound made. In Proceedings of the 48th Annual ACM SIGACT Symposium on Theory of Computing, STOC 2016, Cambridge, MA, USA, June 18-21, 2016, pages 375-388, 2016. URL: https://doi.org/10.1145/2897518.2897653.
  6. Amir Abboud and Virginia Vassilevska Williams. Popular conjectures imply strong lower bounds for dynamic problems. In 55th IEEE Annual Symposium on Foundations of Computer Science, FOCS 2014, Philadelphia, PA, USA, October 18-21, 2014, pages 434-443, 2014. URL: https://doi.org/10.1109/FOCS.2014.53.
  7. Amir Abboud, Virginia Vassilevska Williams, and Oren Weimann. Consequences of faster alignment of sequences. In Javier Esparza, Pierre Fraigniaud, Thore Husfeldt, and Elias Koutsoupias, editors, Automata, Languages, and Programming - 41st International Colloquium, ICALP 2014, Copenhagen, Denmark, July 8-11, 2014, Proceedings, Part I, volume 8572 of Lecture Notes in Computer Science, pages 39-51. Springer, 2014. URL: https://doi.org/10.1007/978-3-662-43948-7_4.
  8. Amir Abboud, Virginia Vassilevska Williams, and Huacheng Yu. Matching triangles and basing hardness on an extremely popular conjecture. SIAM J. Comput., 47(3):1098-1122, 2018. URL: https://doi.org/10.1137/15M1050987.
  9. Arturs Backurs and Piotr Indyk. Which regular expression patterns are hard to match? In IEEE 57th Annual Symposium on Foundations of Computer Science, FOCS 2016, 9-11 October 2016, Hyatt Regency, New Brunswick, New Jersey, USA, pages 457-466, 2016. URL: https://doi.org/10.1109/FOCS.2016.56.
  10. Arturs Backurs and Piotr Indyk. Edit distance cannot be computed in strongly subquadratic time (unless SETH is false). SIAM J. Comput., 47(3):1087-1097, 2018. URL: https://doi.org/10.1137/15M1053128.
  11. Christina Boucher and Mohamed Omar. On the hardness of counting and sampling center strings. IEEE/ACM Trans. Comput. Biology Bioinform., 9(6):1843-1846, 2012. URL: https://doi.org/10.1109/TCBB.2012.84.
  12. Karl Bringmann. Why walking the dog takes time: Frechet distance has no strongly subquadratic algorithms unless SETH fails. In 55th IEEE Annual Symposium on Foundations of Computer Science, FOCS 2014, Philadelphia, PA, USA, October 18-21, 2014, pages 661-670, 2014. URL: https://doi.org/10.1109/FOCS.2014.76.
  13. Karl Bringmann and Marvin Künnemann. Quadratic conditional lower bounds for string problems and dynamic time warping. In IEEE 56th Annual Symposium on Foundations of Computer Science, FOCS 2015, Berkeley, CA, USA, 17-20 October, 2015, pages 79-97, 2015. URL: https://doi.org/10.1109/FOCS.2015.15.
  14. Karl Bringmann and Marvin Künnemann. Multivariate fine-grained complexity of longest common subsequence. In Proceedings of the Twenty-Ninth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2018, New Orleans, LA, USA, January 7-10, 2018, pages 1216-1235, 2018. URL: https://doi.org/10.1137/1.9781611975031.79.
  15. Timothy M. Chan and Moshe Lewenstein. Clustered integer 3sum via additive combinatorics. In Rocco A. Servedio and Ronitt Rubinfeld, editors, Proceedings of the Forty-Seventh Annual ACM on Symposium on Theory of Computing, STOC 2015, Portland, OR, USA, June 14-17, 2015, pages 31-40. ACM, 2015. URL: https://doi.org/10.1145/2746539.2746568.
  16. Yi-Jun Chang. Hardness of RNA folding problem with four symbols. Theor. Comput. Sci., 757:11-26, 2019. URL: https://doi.org/10.1016/j.tcs.2018.07.010.
  17. Lijie Chen and Ryan Williams. An equivalence class for orthogonal vectors. In Proceedings of the Thirtieth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2019, San Diego, California, USA, January 6-9, 2019, pages 21-40, 2019. URL: https://doi.org/10.1137/1.9781611975482.2.
  18. Yen Hung Chen and Chuan Yi Tang. On the bottleneck tree alignment problems. Inf. Sci., 180(11):2134-2141, 2010. URL: https://doi.org/10.1016/j.ins.2010.02.008.
  19. Raphaël Clifford, Allan Grønlund, Kasper Green Larsen, and Tatiana Starikovskaya. Upper and lower bounds for dynamic data structures on strings. In Rolf Niedermeier and Brigitte Vallée, editors, 35th Symposium on Theoretical Aspects of Computer Science, STACS 2018, February 28 to March 3, 2018, Caen, France, volume 96 of LIPIcs, pages 22:1-22:14. Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 2018. URL: https://doi.org/10.4230/LIPIcs.STACS.2018.22.
  20. Colin de la Higuera and Francisco Casacuberta. Topology of strings: Median string is np-complete. Theor. Comput. Sci., 230(1-2):39-48, 2000. URL: https://doi.org/10.1016/S0304-3975(97)00240-5.
  21. Erik D. Demaine, Andrea Lincoln, Quanquan C. Liu, Jayson Lynch, and Virginia Vassilevska Williams. Fine-grained I/O complexity via reductions: New lower bounds, faster algorithms, and a time hierarchy. In Anna R. Karlin, editor, 9th Innovations in Theoretical Computer Science Conference, ITCS 2018, January 11-14, 2018, Cambridge, MA, USA, volume 94 of LIPIcs, pages 34:1-34:23. Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 2018. URL: https://doi.org/10.4230/LIPIcs.ITCS.2018.34.
  22. Lech Duraj, Marvin Künnemann, and Adam Polak. Tight conditional lower bounds for longest common increasing subsequence. Algorithmica, 81(10):3968-3992, 2019. URL: https://doi.org/10.1007/s00453-018-0485-7.
  23. Massimo Equi, Roberto Grossi, Veli Mäkinen, and Alexandru I. Tomescu. On the complexity of string matching for graphs. In Christel Baier, Ioannis Chatzigiannakis, Paola Flocchini, and Stefano Leonardi, editors, 46th International Colloquium on Automata, Languages, and Programming, ICALP 2019, July 9-12, 2019, Patras, Greece, volume 132 of LIPIcs, pages 55:1-55:15. Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 2019. URL: https://doi.org/10.4230/LIPIcs.ICALP.2019.55.
  24. Isaac Goldstein, Moshe Lewenstein, and Ely Porat. On the hardness of set disjointness and set intersection with bounded universe. In Pinyan Lu and Guochuan Zhang, editors, 30th International Symposium on Algorithms and Computation, ISAAC 2019, December 8-11, 2019, Shanghai University of Finance and Economics, Shanghai, China, volume 149 of LIPIcs, pages 7:1-7:22. Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 2019. URL: https://doi.org/10.4230/LIPIcs.ISAAC.2019.7.
  25. Jens Gramm, Rolf Niedermeier, and Peter Rossmanith. Fixed-parameter algorithms for CLOSEST STRING and related problems. Algorithmica, 37(1):25-42, 2003. URL: https://doi.org/10.1007/s00453-003-1028-3.
  26. Franziska Hufsky, Léon Kuchenbecker, Katharina Jahn, Jens Stoye, and Sebastian Böcker. Swiftly computing center strings. In Algorithms in Bioinformatics, 10th International Workshop, WABI 2010, Liverpool, UK, September 6-8, 2010. Proceedings, pages 325-336, 2010. URL: https://doi.org/10.1007/978-3-642-15294-8_27.
  27. Franziska Hufsky, Léon Kuchenbecker, Katharina Jahn, Jens Stoye, and Sebastian Böcker. Swiftly computing center strings. BMC Bioinformatics, 12:106, 2011. URL: https://doi.org/10.1186/1471-2105-12-106.
  28. Tao Jiang, Lusheng Wang, and Kaizhong Zhang. Alignment of trees - an alternative to tree edit. In Maxime Crochemore and Dan Gusfield, editors, Combinatorial Pattern Matching, 5th Annual Symposium, CPM 94, Asilomar, California, USA, June 5-8, 1994, Proceedings, volume 807 of Lecture Notes in Computer Science, pages 75-86. Springer, 1994. URL: https://doi.org/10.1007/3-540-58094-8_7.
  29. Xiaoyi Jiang, Horst Bunke, and Janos Csirik. Median strings: A review. In Data Mining in Time Series Databases, pages 173-192. World Scientific, 2004. Google Scholar
  30. Teuvo Kohonen. Median strings. Pattern Recognition Letters, 3(5):309-313, 1985. URL: https://doi.org/10.1016/0167-8655(85)90061-3.
  31. Tsvi Kopelowitz, Seth Pettie, and Ely Porat. Higher lower bounds from the 3sum conjecture. In Proceedings of the Twenty-Seventh Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2016, Arlington, VA, USA, January 10-12, 2016, pages 1272-1287, 2016. URL: https://doi.org/10.1137/1.9781611974331.ch89.
  32. Robert Krauthgamer and Ohad Trabelsi. Conditional lower bounds for all-pairs max-flow. ACM Trans. Algorithms, 14(4):42:1-42:15, 2018. URL: https://doi.org/10.1145/3212510.
  33. Ferenc Kruzslicz. Improved greedy algorithm for computing approximate median strings. Acta Cybern., 14(2):331-339, 1999. URL: http://www.inf.u-szeged.hu/actacybernetica/edb/vol14n2/Kruzslicz_1999_ActaCybernetica.xml.
  34. J. Kevin Lanctôt, Ming Li, Bin Ma, Shaojiu Wang, and Louxin Zhang. Distinguishing string selection problems. Inf. Comput., 185(1):41-55, 2003. URL: https://doi.org/10.1016/S0890-5401(03)00057-9.
  35. Ming Li, Bin Ma, and Lusheng Wang. On the closest string and substring problems. J. ACM, 49(2):157-171, 2002. URL: https://doi.org/10.1145/506147.506150.
  36. Bin Ma and Xiaoming Sun. More efficient algorithms for closest string and substring problems. SIAM J. Comput., 39(4):1432-1443, 2009. URL: https://doi.org/10.1137/080739069.
  37. Hiromitsu Maji and Taisuke Izumi. Listing center strings under the edit distance metric. In Combinatorial Optimization and Applications - 9th International Conference, COCOA 2015, Houston, TX, USA, December 18-20, 2015, Proceedings, pages 771-782, 2015. URL: https://doi.org/10.1007/978-3-319-26626-8_57.
  38. Carlos D. Martínez-Hinarejos, Alfons Juan, Francisco Casacuberta, and Ramón Alberto Mollineda. Reducing the computational cost of computing approximated median strings. In Terry Caelli, Adnan Amin, Robert P. W. Duin, Mohamed S. Kamel, and Dick de Ridder, editors, Structural, Syntactic, and Statistical Pattern Recognition, Joint IAPR International Workshops SSPR 2002 and SPR 2002, Windsor, Ontario, Canada, August 6-9, 2002, Proceedings, volume 2396 of Lecture Notes in Computer Science, pages 47-55. Springer, 2002. URL: https://doi.org/10.1007/3-540-70659-3_4.
  39. François Nicolas and Eric Rivals. Hardness results for the center and median string problems under the weighted and unweighted edit distances. J. Discrete Algorithms, 3(2-4):390-415, 2005. URL: https://doi.org/10.1016/j.jda.2004.08.015.
  40. R. Ravi and John D. Kececioglu. Approximation algorithms for multiple sequence alignment under a fixed evolutionary tree. Discrete Applied Mathematics, 88(1-3):355-366, 1998. URL: https://doi.org/10.1016/S0166-218X(98)00079-1.
  41. David Sankoff. Minimal mutation trees of sequences. SIAM Journal on Applied Mathematics, 28(1):35-42, 1975. Google Scholar
  42. Jeong Seop Sim and Kunsoo Park. The consensus string problem for a metric is np-complete. J. Discrete Algorithms, 1(1):111-117, 2003. URL: https://doi.org/10.1016/S1570-8667(03)00011-X.
  43. Andrés Varón and Ward C. Wheeler. The tree alignment problem. BMC Bioinformatics, 13:293, 2012. URL: https://doi.org/10.1186/1471-2105-13-293.
  44. T. K. Vintsyuk. Speech discrimination by dynamic programming. Cybernetics, 4(1):52-57, 1968. Russian Kibernetika 4(1):81-88 (1968). Google Scholar
  45. Lusheng Wang and Dan Gusfield. Improved approximation algorithms for tree alignment. J. Algorithms, 25(2):255-273, 1997. URL: https://doi.org/10.1006/jagm.1997.0882.
  46. Ryan Williams. A new algorithm for optimal 2-constraint satisfaction and its implications. Theor. Comput. Sci., 348(2-3):357-365, 2005. URL: https://doi.org/10.1016/j.tcs.2005.09.023.
Questions / Remarks / Feedback
X

Feedback for Dagstuhl Publishing


Thanks for your feedback!

Feedback submitted

Could not send message

Please try again later or send an E-mail