Fast Matching-based Approximations for Maximum Duo-Preservation String Mapping and its Weighted Variant

Author Brian Brubach



PDF
Thumbnail PDF

File

LIPIcs.CPM.2018.5.pdf
  • Filesize: 0.49 MB
  • 14 pages

Document Identifiers

Author Details

Brian Brubach
  • Department of Computer Science, University of Maryland, College Park, MD 20742, USA

Cite AsGet BibTex

Brian Brubach. Fast Matching-based Approximations for Maximum Duo-Preservation String Mapping and its Weighted Variant. In 29th Annual Symposium on Combinatorial Pattern Matching (CPM 2018). Leibniz International Proceedings in Informatics (LIPIcs), Volume 105, pp. 5:1-5:14, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2018)
https://doi.org/10.4230/LIPIcs.CPM.2018.5

Abstract

We present a new approach to approximating the Maximum Duo-Preservation String Mapping Problem (MPSM) based on massaging the constraints into a tractable matching problem. MPSM was introduced in Chen, Chen, Samatova, Peng, Wang, and Tang [Chen et al., 2014] as the complement to the well-studied Minimum Common String Partition problem (MCSP). Prior work also considers the k-MPSM and k-MCSP variants in which each letter occurs at most k times in each string. The authors of [Chen et al., 2014] showed a k^2-appoximation for k >= 3 and 2-approximation for k = 2. Boria, Kurpisz, Leppänen, and Mastrolilli [Boria et al., 2014] gave a 4-approximation independent of k and showed that even 2-MPSM is APX-Hard. A series of improvements led to the current best bounds of a (2 + epsilon)-approximation for any epsilon > 0 in n^{O(1/epsilon)} time for strings of length n and a 2.67-approximation running in O(n^2) time, both by Dudek, Gawrychowski, and Ostropolski-Nalewaja [Dudek et al., 2017]. Here, we show that a 2.67-approximation can surprisingly be achieved in O(n) time for alphabets of constant size and O(n + alpha^7) for alphabets of size alpha. Recently, Mehrabi [Mehrabi, 2017] introduced the more general weighted variant, Maximum Weight Duo-Preservation String Mapping (MWPSM) and provided a 6-approximation. Our approach gives a 2.67-approximation to this problem running in O(n^3) time. This approach can also find an 8/(3(1-epsilon))-approximation to MWPSM for any epsilon > 0 in O(n^2 epsilon^{-1} lg{epsilon^{-1}}) time using the approximate weighted matching algorithm of Duan and Pettie [Duan and Pettie, 2014]. Finally, we introduce the first streaming algorithm for MPSM. We show that a single pass suffices to find a 4-approximation on the size of an optimal solution using only O(alpha^2 lg{n}) space.

Subject Classification

ACM Subject Classification
  • Theory of computation → Pattern matching
Keywords
  • approximation algorithm
  • maximum duo-preservation string mapping
  • minimum common string partition
  • string comparison
  • streaming algorithm
  • comparative genomics

Metrics

  • Access Statistics
  • Total Accesses (updated on a weekly basis)
    0
    PDF Downloads

References

  1. Pankaj K. Agarwal and R. Sharathkumar. Approximation algorithms for bipartite matching with metric and geometric costs. In Proceedings of the Forty-sixth Annual ACM Symposium on Theory of Computing, STOC '14, pages 555-564, New York, NY, USA, 2014. ACM. Google Scholar
  2. R. Bar-Yehuda and S. Even. A local-ratio theorem for approximating the weighted vertex cover problem. In G. Ausiello and M. Lucertini, editors, Analysis and Design of Algorithms for Combinatorial Problems, volume 109 of North-Holland Mathematics Studies, pages 27-45. North-Holland, 1985. Google Scholar
  3. Stefano Beretta, Mauro Castelli, and Riccardo Dondi. Parameterized tractability of the maximum-duo preservation string mapping problem. CoRR, abs/1512.03220, 2015. URL: http://arxiv.org/abs/1512.03220.
  4. Christian Blum, José A. Lozano, and Pinacho Davidson. Mathematical programming strategies for solving the minimum common string partition problem. European Journal of Operational Research, 242(3):769-777, 2015. Google Scholar
  5. Nicolas Boria, Gianpiero Cabodi, Paolo Camurati, Marco Palena, Paolo Pasini, and Stefano Quer. A 7/2-approximation algorithm for the maximum duo-preservation string mapping problem. In Roberto Grossi and Moshe Lewenstein, editors, 27th Annual Symposium on Combinatorial Pattern Matching (CPM 2016), volume 54 of Leibniz International Proceedings in Informatics (LIPIcs), pages 11:1-11:8, Dagstuhl, Germany, 2016. Schloss Dagstuhl-Leibniz-Zentrum fuer Informatik. Google Scholar
  6. Nicolas Boria, Adam Kurpisz, Samuli Leppänen, and Monaldo Mastrolilli. Improved approximation for the maximum duo-preservation string mapping problem. In Algorithms in Bioinformatics: 14th International Workshop, WABI 2014, Wroclaw, Poland, September 8-10, 2014. Proceedings, pages 14-25, Berlin, Heidelberg, 2014. Springer Berlin Heidelberg. Google Scholar
  7. Brian Brubach. Further improvement in approximating the maximum duo-preservation string mapping problem. In Martin Frith and Christian Nørgaard Storm Pedersen, editors, Algorithms in Bioinformatics, pages 52-64, Cham, 2016. Springer International Publishing. Google Scholar
  8. Laurent Bulteau, Guillaume Fertin, Christian Komusiewicz, and Irena Rusu. A fixed-parameter algorithm for minimum common string partition with few duplications. In Aaron Darling and Jens Stoye, editors, Algorithms in Bioinformatics: 13th International Workshop, WABI 2013, Sophia Antipolis, France, September 2-4, 2013. Proceedings, pages 244-258, Berlin, Heidelberg, 2013. Springer Berlin Heidelberg. Google Scholar
  9. Laurent Bulteau and Christian Komusiewicz. Minimum common string partition parameterized by partition size is fixed-parameter tractable. In Proceedings of the Twenty-fifth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA '14, pages 102-121, Philadelphia, PA, USA, 2014. Society for Industrial and Applied Mathematics. Google Scholar
  10. Wenbin Chen, Zhengzhang Chen, Nagiza F. Samatova, Lingxi Peng, Jianxiong Wang, and Maobin Tang. Solving the maximum duo-preservation string mapping problem with linear programming. Theoretical Computer Science, 530:1-11, 2014. Google Scholar
  11. Xin Chen, Jie Zheng, Zheng Fu, Peng Nan, Yang Zhong, S. Lonardi, and Tao Jiang. Assignment of orthologous genes via genome rearrangement. IEEE/ACM Transactions on Computational Biology and Bioinformatics, 2(4):302-315, Oct 2005. Google Scholar
  12. Marek Chrobak, Petr Kolman, and Jiří Sgall. The greedy algorithm for the minimum common string partition problem. In Klaus Jansen, Sanjeev Khanna, José D. P. Rolim, and Dana Ron, editors, Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques: 7th International Workshop on Approximation Algorithms for Combinatorial Optimization Problems, APPROX 2004, and 8th International Workshop on Randomization and Computation, RANDOM 2004, Cambridge, MA, USA, August 22-24, 2004. Proceedings, pages 84-95, Berlin, Heidelberg, 2004. Springer Berlin Heidelberg. Google Scholar
  13. Graham Cormode and S. Muthukrishnan. The string edit distance matching problem with moves. ACM Trans. Algorithms, 3(1):2:1-2:19, 2007. Google Scholar
  14. Peter Damaschke. Minimum common string partition parameterized. In Keith A. Crandall and Jens Lagergren, editors, Algorithms in Bioinformatics: 8th International Workshop, WABI 2008, Karlsruhe, Germany, September 15-19, 2008. Proceedings, pages 87-98, Berlin, Heidelberg, 2008. Springer Berlin Heidelberg. Google Scholar
  15. Ran Duan and Seth Pettie. Linear-time approximation for maximum weight matching. J. ACM, 61(1):1:1-1:23, 2014. Google Scholar
  16. Bartlomiej Dudek, Pawel Gawrychowski, and Piotr Ostropolski-Nalewaja. A family of approximation algorithms for the maximum duo-preservation string mapping problem. In Juha Kärkkäinen, Jakub Radoszewski, and Wojciech Rytter, editors, 28th Annual Symposium on Combinatorial Pattern Matching (CPM 2017), volume 78 of Leibniz International Proceedings in Informatics (LIPIcs), pages 10:1-10:14, Dagstuhl, Germany, 2017. Schloss Dagstuhl-Leibniz-Zentrum fuer Informatik. Google Scholar
  17. S. M. Ferdous and M. Sohel Rahman. Solving the minimum common string partition problem with the help of ants. In Ying Tan, Yuhui Shi, and Hongwei Mo, editors, Advances in Swarm Intelligence: 4th International Conference, ICSI 2013, Harbin, China, June 12-15, 2013, Proceedings, Part I, pages 306-313, Berlin, Heidelberg, 2013. Springer Berlin Heidelberg. Google Scholar
  18. S. M. Ferdous and M. Sohel Rahman. An integer programming formulation of the minimum common string partition problem. PLoS ONE, 10(7):1-16, 07 2015. Google Scholar
  19. Michael L. Fredman and Robert Endre Tarjan. Fibonacci heaps and their uses in improved network optimization algorithms. J. ACM, 34(3):596-615, jul 1987. Google Scholar
  20. Avraham Goldstein, Petr Kolman, and Jie Zheng. Minimum common string partition problem: Hardness and approximations. In Proceedings of the 15th International Conference on Algorithms and Computation, ISAAC'04, pages 484-495, Berlin, Heidelberg, 2004. Springer-Verlag. Google Scholar
  21. Isaac Goldstein and Moshe Lewenstein. Quick greedy computation for minimum common string partition. Theor. Comput. Sci., 542:98-107, 2014. Google Scholar
  22. RC Hardison. Comparative genomics. PLoS Biol, 1(2):e58, 2003. Google Scholar
  23. Haitao Jiang, Binhai Zhu, Daming Zhu, and Hong Zhu. Minimum common string partition revisited. Journal of Combinatorial Optimization, 23(4):519-527, 2012. Google Scholar
  24. Haim Kaplan and Nira Shafrir. The greedy algorithm for edit distance with moves. Information Processing Letters, 97(1):23-27, 2006. Google Scholar
  25. Petr Kolman and Tomasz Waleń. Approximating reversal distance for strings with bounded number of duplicates. Discrete Applied Mathematics, 155(3):327-336, 2007. Google Scholar
  26. Petr Kolman and Tomasz Waleń. Reversal distance for strings with duplicates: Linear time approximation using hitting set. In Thomas Erlebach and Christos Kaklamanis, editors, Approximation and Online Algorithms: 4th International Workshop, WAOA 2006, Zurich, Switzerland, September 14-15, 2006. Revised Papers, pages 279-289, Berlin, Heidelberg, 2007. Springer Berlin Heidelberg. Google Scholar
  27. Christian Komusiewicz, Mateus de Oliveira Oliveira, and Meirav Zehavi. Revisiting the parameterized complexity of maximum-duo preservation string mapping. In Juha Kärkkäinen, Jakub Radoszewski, and Wojciech Rytter, editors, 28th Annual Symposium on Combinatorial Pattern Matching (CPM 2017), volume 78 of Leibniz International Proceedings in Informatics (LIPIcs), pages 11:1-11:17, Dagstuhl, Germany, 2017. Schloss Dagstuhl-Leibniz-Zentrum fuer Informatik. Google Scholar
  28. Saeed Mehrabi. Approximating weighted duo-preservation in comparative genomics. In Yixin Cao and Jianer Chen, editors, Computing and Combinatorics, pages 396-406, Cham, 2017. Springer International Publishing. Google Scholar
  29. A. R. Mushegian. Foundations of Comparative Genomics. Elsevier, 1 2007. Google Scholar
  30. James B. Orlin. Max flows in O(nm) time, or better. In Proceedings of the Forty-fifth Annual ACM Symposium on Theory of Computing, STOC '13, pages 765-774, New York, NY, USA, 2013. ACM. Google Scholar
  31. Krister M. Swenson, Mark Marron, Joel V. Earnest-Deyoung, and Bernard M. E. Moret. Approximating the true evolutionary distance between two genomes. J. Exp. Algorithmics, 12:3.5:1-3.5:17, 2008. Google Scholar
  32. Yao Xu, Yong Chen, Guohui Lin, Tian Liu, Taibo Luo, and Peng Zhang. A (1.4 + epsilon)-approximation algorithm for the 2-max-duo problem. In Yoshio Okamoto and Takeshi Tokuyama, editors, 28th International Symposium on Algorithms and Computation (ISAAC 2017), volume 92 of Leibniz International Proceedings in Informatics (LIPIcs), pages 66:1-66:12, Dagstuhl, Germany, 2017. Schloss Dagstuhl-Leibniz-Zentrum fuer Informatik. Google Scholar
Questions / Remarks / Feedback
X

Feedback for Dagstuhl Publishing


Thanks for your feedback!

Feedback submitted

Could not send message

Please try again later or send an E-mail