Can We Recover the Cover?

Amir, Amihood; Levy, Avivit; Lewenstein, Moshe; Lubin, Ronit; Porat, Benny

doi:10.4230/LIPIcs.CPM.2017.25

Abstract

Data analysis typically involves error recovery and detection of regularities as two different key tasks. In this paper we show that there are data types for which these two tasks can be powerfully combined. A common notion of regularity in strings is that of a cover. Data describing measurements of a natural coverable phenomenon may be corrupted by errors caused by the measurement process, or by the inexact features of the phenomenon itself. For this reason, different variants of approximate covers have been introduced, some of which are NP-hard to compute. In this paper we assume that the Hamming distance metric measures the amount of corruption experienced, and study the problem of recovering the correct cover from data corrupted by mismatch errors, formally defined as the cover recovery problem (CRP). We show that for the Hamming distance metric, coverability is a powerful property that allows detecting the original cover and correcting the data, under suitable conditions.
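To make the two notions concrete, here is a minimal Python sketch. It assumes the standard definitions, which the abstract does not spell out: a string C covers a string T if every position of T lies inside some occurrence of C in T, and the Hamming distance counts mismatching positions between two equal-length strings. The function names `hamming` and `is_cover` are illustrative and do not come from the paper.

```python
def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between two equal-length strings."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs equal-length strings")
    return sum(x != y for x, y in zip(a, b))


def is_cover(c: str, t: str) -> bool:
    """True if every position of t lies inside some occurrence of c in t."""
    m, n = len(c), len(t)
    if m == 0 or m > n:
        return False
    covered_up_to = 0  # t[:covered_up_to] is covered by occurrences seen so far
    for start in range(n - m + 1):
        if start > covered_up_to:
            return False  # position covered_up_to is not inside any occurrence
        if t.startswith(c, start):
            covered_up_to = start + m
    return covered_up_to == n


clean = "abaababaaba"      # covered by "aba" (occurrences at 0, 3, 5, 8)
corrupted = "abaabbbaaba"  # the same string with one mismatch at index 5
print(is_cover("aba", clean), is_cover("aba", corrupted), hamming(clean, corrupted))
```

In this toy instance a single mismatch destroys exact coverability, which is the situation the cover recovery problem addresses: given only the corrupted string, find the original cover.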

We also study another problem, the approximate cover problem (ACP). Since the ACP has been proved to be NP-hard [Amir,Levy,Lubin,Porat, CPM 2017], we study a relaxation, which we call the candidate-relaxation of the ACP, and show that it can be solved in polynomial time. As a result, the ACP can also be solved in polynomial time in many practical situations. An important application of our study of the ACP relaxation is a polynomial time algorithm for the cover recovery problem (CRP).
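The following Python sketch shows one way a candidate-relaxation could be solved in polynomial time. It rests on an assumption about the problem statement, which the abstract does not give: a text T and a candidate C are both supplied, and the goal is the minimum Hamming distance between T and any string of the same length that C covers. The dynamic program is a straightforward illustration, not the algorithm from the paper, and the name `candidate_relaxation_distance` is hypothetical.

```python
from typing import Optional


def candidate_relaxation_distance(t: str, c: str) -> Optional[int]:
    """Minimum Hamming distance from t to a string of length len(t) covered by c.

    Returns None when no string of that length is covered by c.
    """
    n, m = len(t), len(c)
    if m == 0 or m > n:
        raise ValueError("candidate must be non-empty and no longer than the text")

    # Two consecutive occurrences of c may start d positions apart only if they
    # agree on their overlap, i.e. c[d:] == c[:m - d]; d == m means no overlap.
    shifts = [d for d in range(1, m + 1) if c[d:] == c[: m - d]]

    inf = float("inf")
    # best[e]: cheapest way to turn t[:e] into a string covered by c whose
    # last occurrence of c ends exactly at position e.
    best = [inf] * (n + 1)
    best[m] = sum(x != y for x, y in zip(t[:m], c))

    for e in range(m + 1, n + 1):
        for d in shifts:
            prev = e - d
            if prev < m or best[prev] == inf:
                continue
            # The next occurrence contributes only its last d characters.
            cost = best[prev] + sum(x != y for x, y in zip(t[prev:e], c[m - d:]))
            if cost < best[e]:
                best[e] = cost

    return None if best[n] == inf else int(best[n])


print(candidate_relaxation_distance("abaabbbaaba", "aba"))
```

This naive version runs in O(n·m²) time for a text of length n and a candidate of length m, which is enough to illustrate why fixing the candidate makes the problem tractable under these assumptions.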

References

Karl R. Abrahamson. Generalized string matching. SIAM J. Comput., 16(6):1039-1051, 1987. URL: http://dx.doi.org/10.1137/0216067.
Amihood Amir, Mika Amit, Gad M. Landau, and Dina Sokol. Period recovery over the Hamming and edit distances. In Evangelos Kranakis, Gonzalo Navarro, and Edgar Chávez, editors, Proceedings of the 12th Latin American Symposium on Theoretical Informatics (LATIN 2016), volume 9644 of LNCS, pages 55-67. Springer, 2016. URL: http://dx.doi.org/10.1007/978-3-662-49529-2_5.
Amihood Amir, Estrella Eisenberg, and Avivit Levy. Approximate periodicity. In Otfried Cheong, Kyung-Yong Chwa, and Kunsoo Park, editors, Proceedings of the 21st International Symposium on Algorithms and Computation (ISAAC 2010), volume 6506 of LNCS, pages 25-36. Springer, 2010. URL: http://dx.doi.org/10.1007/978-3-642-17517-6_5.
Amihood Amir, Estrella Eisenberg, Avivit Levy, Ely Porat, and Natalie Shapira. Cycle detection and correction. ACM Trans. Algorithms, 9(1):13:1-13:20, 2012. URL: http://dx.doi.org/10.1145/2390176.2390189.
Amihood Amir, Avivit Levy, Ronit Lubin, and Ely Porat. Approximate cover of strings. In Juha Kärkkäinen, Jakub Radoszewski, and Wojciech Rytter, editors, Proceedings of the 28th Annual Symposium on Combinatorial Pattern Matching (CPM 2017), volume 78 of LIPIcs, pages 26:1-26:14. Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 2017. URL: http://dx.doi.org/10.4230/LIPIcs.CPM.2017.26.
Pavlos Antoniou, Maxime Crochemore, Costas S. Iliopoulos, Inuka Jayasekera, and Gad M. Landau. Conservative string covering of indeterminate strings. In Jan Holub and Jan Zdárek, editors, Proceedings of the Prague Stringology Conference (PSC 2008), pages 108-115. Czech Technical University in Prague, 2008. URL: http://www.stringology.org/event/2008/p10.html.
Alberto Apostolico and Dany Breslauer. Of periods, quasiperiods, repetitions and covers. In Jan Mycielski, Grzegorz Rozenberg, and Arto Salomaa, editors, Structures in Logic and Computer Science: A Selection of Essays in Honor of Andrzej Ehrenfeucht, volume 1261 of LNCS, pages 236-248. Springer, 1997. URL: http://dx.doi.org/10.1007/3-540-63246-8_14.
Alberto Apostolico and Andrzej Ehrenfeucht. Efficient detection of quasiperiodicities in strings. Theor. Comput. Sci., 119(2):247-265, 1993. URL: http://dx.doi.org/10.1016/0304-3975(93)90159-Q.
Alberto Apostolico, Martin Farach, and Costas S. Iliopoulos. Optimal superprimitivity testing for strings. Inf. Process. Lett., 39(1):17-20, 1991. URL: http://dx.doi.org/10.1016/0020-0190(91)90056-N.
Dany Breslauer. An on-line string superprimitivity test. Inf. Process. Lett., 44(6):345-347, 1992. URL: http://dx.doi.org/10.1016/0020-0190(92)90111-8.
Dany Breslauer. Testing string superprimitivity in parallel. Inf. Process. Lett., 49(5):235-241, 1994. URL: http://dx.doi.org/10.1016/0020-0190(94)90060-4.
Manolis Christodoulakis, Costas S. Iliopoulos, Kunsoo Park, and Jeong Seop Sim. Approximate seeds of strings. J. Autom. Lang. Comb., 10(5/6):609-626, 2005.
Tim Crawford, Costas S. Iliopoulos, and Rajeev Raman. String-matching techniques for musical similarity and melodic recognition. In Walter B. Hewlett and Eleanor S. Field, editors, Melodic Similarity: Concepts, Procedures, and Applications, volume 11 of Computing in Musicology, pages 73-100. MIT Press, Cambridge, Massachusetts, 1998.
Maxime Crochemore, Costas S. Iliopoulos, Solon P. Pissis, and German Tischler. Cover array string reconstruction. In Amihood Amir and Laxmi Parida, editors, Proceedings of the 21st Annual Symposium on Combinatorial Pattern Matching (CPM 2010), volume 6129 of LNCS, pages 251-259. Springer, 2010. URL: http://dx.doi.org/10.1007/978-3-642-13509-5_23.
Maxime Crochemore, Costas S. Iliopoulos, and Hiafeng Yu. Algorithms for computing evolutionary chains in molecular and musical sequences. In Costas S. Iliopoulos, editor, Proceedings of the 9th Australian Workshop on Combinatorial Algorithms (AWOCA 1998), pages 172-184, France, 1998. URL: https://hal-upec-upem.archives-ouvertes.fr/hal-00619988/file/9807-EC.pdf.
Tomás Flouri, Costas S. Iliopoulos, Tomasz Kociumaka, Solon P. Pissis, Simon J. Puglisi, William F. Smyth, and Wojciech Tyczyński. Enhanced string covering. Theor. Comput. Sci., 506:102-114, 2013. URL: http://dx.doi.org/10.1016/j.tcs.2013.08.013.
Ondřej Guth and Bořivoj Melichar. Using finite automata approach for searching approximate seeds of strings. In Xu Huang, Sio-Iong Ao, and Oscar Castillo, editors, Intelligent Automation and Computer Engineering, volume 52 of Lecture Notes in Electrical Engineering, pages 347-360. Springer Netherlands, 2010. URL: http://dx.doi.org/10.1007/978-90-481-3517-2_27.
Ondřej Guth, Bořivoj Melichar, and Miroslav Balík. Searching all approximate covers and their distance using finite automata. In Peter Vojtáš, editor, Proceedings of the Conference on Theory and Practice of Information Technologies (ITAT 2008), volume 414 of CEUR Workshop Proceedings, pages 21-26, 2009. URL: http://ceur-ws.org/Vol-414/paper4.pdf.
Costas S. Iliopoulos, Dennis W. G. Moore, and Kunsoo Park. Covering a string. Algorithmica, 16(3):288-297, 1996. URL: http://dx.doi.org/10.1007/BF01955677.
Costas S. Iliopoulos and Laurent Mouchard. Quasiperiodicity and string covering. Theor. Comput. Sci., 218(1):205-216, 1999. URL: http://dx.doi.org/10.1016/S0304-3975(98)00260-6.
Costas S. Iliopoulos and William F. Smyth. An on-line algorithm of computing a minimum set of k-covers of a string. In Costas S. Iliopoulos, editor, Proceedings of the 9th Australian Workshop on Combinatorial Algorithms (AWOCA 1998), pages 97-106, 1998.
Donald E. Knuth, James H. Morris Jr., and Vaughan R. Pratt. Fast pattern matching in strings. SIAM J. Comput., 6(2):323-350, 1977. URL: http://dx.doi.org/10.1137/0206024.
Tomasz Kociumaka, Solon P. Pissis, Jakub Radoszewski, Wojciech Rytter, and Tomasz Waleń. Fast algorithm for partial covers in words. Algorithmica, 73(1):217-233, 2015. URL: http://dx.doi.org/10.1007/s00453-014-9915-3.
Roman M. Kolpakov and Gregory Kucherov. Finding approximate repetitions under Hamming distance. Theor. Comput. Sci., 303(1):135-156, 2003. URL: http://dx.doi.org/10.1016/S0304-3975(02)00448-6.
Gad M. Landau and Jeanette P. Schmidt. An algorithm for approximate tandem repeats. In Alberto Apostolico, Maxime Crochemore, Zvi Galil, and Udi Manber, editors, Proceedings of the 4th Annual Symposium on Combinatorial Pattern Matching (CPM 1993), volume 684 of LNCS, pages 120-133. Springer, 1993. URL: http://dx.doi.org/10.1007/BFb0029801.
Gad M. Landau, Jeanette P. Schmidt, and Dina Sokol. An algorithm for approximate tandem repeats. J. Comput. Biol., 8(1):1-18, 2001. URL: http://dx.doi.org/10.1089/106652701300099038.
Yin Li and William F. Smyth. Computing the cover array in linear time. Algorithmica, 32(1):95-106, 2002. URL: http://dx.doi.org/10.1007/s00453-001-0062-2.
M. Lothaire, editor. Combinatorics on words. Cambridge Mathematical Library. Cambridge University Press, 1997. URL: http://dx.doi.org/10.1017/CBO9780511566097.
Dennis Moore and William F. Smyth. An optimal algorithm to compute all the covers of a string. Inf. Process. Lett., 50(5):239-246, 1994. URL: http://dx.doi.org/10.1016/0020-0190(94)00045-X.
Dennis Moore and William F. Smyth. A correction to "An optimal algorithm to compute all the covers of a string". Inf. Process. Lett., 54(2):101-103, 1995. URL: http://dx.doi.org/10.1016/0020-0190(94)00235-Q.
Jeong Seop Sim, Costas S. Iliopoulos, Kunsoo Park, and William F. Smyth. Approximate periods of strings. Theor. Comput. Sci., 262(1):557-568, 2001. URL: http://dx.doi.org/10.1016/S0304-3975(00)00365-0.
William F. Smyth. Repetitive perhaps, but certainly not boring. Theor. Comput. Sci., 249(2):343-355, 2000. URL: http://dx.doi.org/10.1016/S0304-3975(00)00067-0.
Hui Zhang, Qing Guo, and Costas S. Iliopoulos. Algorithms for computing the lambda-regularities in strings. Fundam. Inform., 84(1):33-49, 2008. URL: http://content.iospress.com/articles/fundamenta-informaticae/fi84-1-04.
Hui Zhang, Qing Guo, and Costas S. Iliopoulos. Varieties of regularities in weighted sequences. In Bo Chen, editor, Proceedings of the 6th International Conference on Algorithmic Aspects in Information and Management (AAIM 2010), volume 6124 of LNCS, pages 271-280. Springer, 2010. URL: http://dx.doi.org/10.1007/978-3-642-14355-7_28.
