The Number of Repetitions in 2D-Strings

Charalampopoulos, Panagiotis; Radoszewski, Jakub; Rytter, Wojciech; Waleń, Tomasz; Zuba, Wiktor

doi:10.4230/LIPIcs.ESA.2020.32

Abstract

The notions of periodicity and repetitions in strings, and hence these of runs and squares, naturally extend to two-dimensional strings. We consider two types of repetitions in 2D-strings: 2D-runs and quartics (quartics are a 2D-version of squares in standard strings). Amir et al. introduced 2D-runs, showed that there are 𝒪(n³) of them in an n × n 2D-string and presented a simple construction giving a lower bound of Ω(n²) for their number (Theoretical Computer Science, 2020). We make a significant step towards closing the gap between these bounds by showing that the number of 2D-runs in an n × n 2D-string is 𝒪(n² log² n). In particular, our bound implies that the 𝒪(n²log n + output) run-time of the algorithm of Amir et al. for computing 2D-runs is also 𝒪(n² log² n). We expect this result to allow for exploiting 2D-runs algorithmically in the area of 2D pattern matching.
A quartic is a 2D-string composed of 2 × 2 identical blocks (2D-strings) that was introduced by Apostolico and Brimkov (Theoretical Computer Science, 2000), where by quartics they meant only primitively rooted quartics, i.e. built of a primitive block. Here our notion of quartics is more general and analogous to that of squares in 1D-strings. Apostolico and Brimkov showed that there are 𝒪(n² log² n) occurrences of primitively rooted quartics in an n × n 2D-string and that this bound is attainable. Consequently the number of distinct primitively rooted quartics is 𝒪(n² log² n). The straightforward bound for the maximal number of distinct general quartics is 𝒪(n⁴). Here, we prove that the number of distinct general quartics is also 𝒪(n² log² n). This extends the rich combinatorial study of the number of distinct squares in a 1D-string, that was initiated by Fraenkel and Simpson (Journal of Combinatorial Theory, Series A, 1998), to two dimensions.
Finally, we show some algorithmic applications of 2D-runs. Specifically, we present algorithms for computing all occurrences of primitively rooted quartics and counting all general distinct quartics in 𝒪(n² log² n) time, which is quasi-linear with respect to the size of the input. The former algorithm is optimal due to the lower bound of Apostolico and Brimkov. The latter can be seen as a continuation of works on enumeration of distinct squares in 1D-strings using runs (Crochemore et al., Theoretical Computer Science, 2014). However, the methods used in 2D are different because of different properties of 2D-runs and quartics.

Amihood Amir, Gary Benson, and Martin Farach. An alphabet independent approach to two-dimensional pattern matching. SIAM Journal on Computing, 23(2):313-323, 1994. URL: https://doi.org/10.1137/S0097539792226321.
Amihood Amir, Itai Boneh, Panagiotis Charalampopoulos, and Eitan Kondratovsky. Repetition detection in a dynamic string. In 27th Annual European Symposium on Algorithms, ESA 2019, volume 144 of LIPIcs, pages 5:1-5:18. Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 2019. URL: https://doi.org/10.4230/LIPIcs.ESA.2019.5.
Amihood Amir, Ayelet Butman, Gad M. Landau, Shoshana Marcus, and Dina Sokol. Double string tandem repeats. In 31st Annual Symposium on Combinatorial Pattern Matching, CPM 2020, volume 161 of LIPIcs, pages 3:1-3:13. Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 2020. URL: https://doi.org/10.4230/LIPIcs.CPM.2020.3.
Amihood Amir and Martin Farach. Efficient 2-dimensional approximate matching of non-rectangular figures. In Proceedings of the Second Annual ACM/SIGACT-SIAM Symposium on Discrete Algorithms, pages 212-223. ACM/SIAM, 1991. URL: http://dl.acm.org/citation.cfm?id=127787.127829.
Amihood Amir, Gad M. Landau, Shoshana Marcus, and Dina Sokol. Two-dimensional maximal repetitions. In 26th Annual European Symposium on Algorithms, ESA 2018, volume 112 of LIPIcs, pages 2:1-2:14. Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 2018. URL: https://doi.org/10.4230/LIPIcs.ESA.2018.2.
Amihood Amir, Gad M. Landau, Shoshana Marcus, and Dina Sokol. Two-dimensional maximal repetitions. Theoretical Computer Science, 812:49-61, 2020. URL: https://doi.org/10.1016/j.tcs.2019.07.006.
Alberto Apostolico and Valentin E. Brimkov. Fibonacci arrays and their two-dimensional repetitions. Theoretical Computer Science, 237(1-2):263-273, 2000. URL: https://doi.org/10.1016/S0304-3975(98)00182-0.
Alberto Apostolico and Valentin E. Brimkov. Optimal discovery of repetitions in 2D. Discrete Applied Mathematics, 151(1-3):5-20, 2005. URL: https://doi.org/10.1016/j.dam.2005.02.019.
Theodore P. Baker. A technique for extending rapid exact-match string matching to arrays of more than one dimension. SIAM Journal on Computing, 7(4):533-541, 1978. URL: https://doi.org/10.1137/0207043.
Hideo Bannai, Tomohiro I, Shunsuke Inenaga, Yuto Nakashima, Masayuki Takeda, and Kazuya Tsuruta. The "runs" theorem. SIAM Journal on Computing, 46(5):1501-1514, 2017. URL: https://doi.org/10.1137/15M1011032.
Hideo Bannai, Shunsuke Inenaga, and Dominik Köppl. Computing all distinct squares in linear time for integer alphabets. In 28th Annual Symposium on Combinatorial Pattern Matching, CPM 2017, volume 78 of LIPIcs, pages 22:1-22:18. Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 2017. URL: https://doi.org/10.4230/LIPIcs.CPM.2017.22.
Jon Louis Bentley. Algorithms for Klee’s rectangle problems. Unpublished notes, Computer Science Department, Carnegie Mellon University, 1977.
Omer Berkman, Baruch Schieber, and Uzi Vishkin. Optimal doubly logarithmic parallel algorithms based on finding all nearest smaller values. Journal of Algorithms, 14(3):344-370, 1993. URL: https://doi.org/10.1006/jagm.1993.1018.
Richard S. Bird. Two dimensional pattern matching. Information Processing Letters, 6(5):168-170, 1977. URL: https://doi.org/10.1016/0020-0190(77)90017-5.
Panagiotis Charalampopoulos, Tomasz Kociumaka, Manal Mohamed, Jakub Radoszewski, Wojciech Rytter, Juliusz Straszyński, Tomasz Waleń, and Wiktor Zuba. Counting distinct patterns in internal dictionary matching. In 31st Annual Symposium on Combinatorial Pattern Matching, CPM 2020, volume 161 of LIPIcs, pages 8:1-8:15. Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 2020. URL: https://doi.org/10.4230/LIPIcs.CPM.2020.8.
Panagiotis Charalampopoulos, Tomasz Kociumaka, Manal Mohamed, Jakub Radoszewski, Wojciech Rytter, and Tomasz Waleń. Internal dictionary matching. In 30th International Symposium on Algorithms and Computation, ISAAC 2019, volume 149 of LIPIcs, pages 22:1-22:17. Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 2019. URL: https://doi.org/10.4230/LIPIcs.ISAAC.2019.22.
Maxime Crochemore. An optimal algorithm for computing the repetitions in a word. Information Processing Letters, 12(5):244-250, 1981. URL: https://doi.org/10.1016/0020-0190(81)90024-7.
Maxime Crochemore, Christophe Hancart, and Thierry Lecroq. Algorithms on strings. Cambridge University Press, 2007.
Maxime Crochemore and Lucian Ilie. Analysis of maximal repetitions in strings. In Mathematical Foundations of Computer Science 2007, 32nd International Symposium, MFCS 2007, volume 4708 of Lecture Notes in Computer Science, pages 465-476. Springer, 2007. URL: https://doi.org/10.1007/978-3-540-74456-6_42.
Maxime Crochemore and Lucian Ilie. Maximal repetitions in strings. Journal of Computer and System Sciences, 74(5):796-807, 2008. URL: https://doi.org/10.1016/j.jcss.2007.09.003.
Maxime Crochemore, Lucian Ilie, and Liviu Tinta. The "runs" conjecture. Theoretical Computer Science, 412(27):2931-2941, 2011. URL: https://doi.org/10.1016/j.tcs.2010.06.019.
Maxime Crochemore, Costas S. Iliopoulos, Marcin Kubica, Jakub Radoszewski, Wojciech Rytter, and Tomasz Waleń. Extracting powers and periods in a word from its runs structure. Theoretical Computer Science, 521:29-41, 2014. URL: https://doi.org/10.1016/j.tcs.2013.11.018.
Maxime Crochemore and Robert Mercaş. On the density of Lyndon roots in factors. Theoretical Computer Science, 656:234-240, 2016. URL: https://doi.org/10.1016/j.tcs.2016.02.015.
Maxime Crochemore and Wojciech Rytter. Squares, cubes, and time-space efficient string searching. Algorithmica, 13(5):405-425, 1995. URL: https://doi.org/10.1007/BF01190846.
Maxime Crochemore and Wojciech Rytter. Jewels of stringology. World Scientific, 2002. URL: https://doi.org/10.1142/4838.
Antoine Deza, Frantisek Franek, and Adrien Thierry. How many double squares can a string contain? Discrete Applied Mathematics, 180:52-69, 2015. URL: https://doi.org/10.1016/j.dam.2014.08.016.
Nathan J. Fine and Herbert S. Wilf. Uniqueness theorems for periodic functions. Proceedings of the American Mathematical Society, 16(1):109-114, 1965. URL: https://doi.org/10.2307/2034009.
Johannes Fischer, Stepan Holub, Tomohiro I, and Moshe Lewenstein. Beyond the runs theorem. In String Processing and Information Retrieval - 22nd International Symposium, SPIRE 2015, volume 9309 of Lecture Notes in Computer Science, pages 277-286. Springer, 2015. URL: https://doi.org/10.1007/978-3-319-23826-5_27.
Aviezri S. Fraenkel and Jamie Simpson. How many squares can a string contain? Journal of Combinatorial Theory, Series A, 82(1):112-120, 1998. URL: https://doi.org/10.1006/jcta.1997.2843.
Frantisek Franek and Qian Yang. An asymptotic lower bound for the maximal number of runs in a string. International Journal of Foundations of Computer Science, 19(1):195-203, 2008. URL: https://doi.org/10.1142/S0129054108005620.
Mathieu Giraud. Not so many runs in strings. In Language and Automata Theory and Applications, Second International Conference, LATA 2008, volume 5196 of Lecture Notes in Computer Science, pages 232-239. Springer, 2008. URL: https://doi.org/10.1007/978-3-540-88282-4_22.
Amy Glen and Jamie Simpson. The total run length of a word. Theoretical Computer Science, 501:41-48, 2013. URL: https://doi.org/10.1016/j.tcs.2013.06.004.
Dan Gusfield and Jens Stoye. Linear time algorithms for finding and representing all the tandem repeats in a string. Journal of Computer and System Sciences, 69(4):525-546, 2004. URL: https://doi.org/10.1016/j.jcss.2004.03.004.
Lucian Ilie. A note on the number of squares in a word. Theoretical Computer Science, 380(3):373-376, 2007. URL: https://doi.org/10.1016/j.tcs.2007.03.025.
Costas S. Iliopoulos, Dennis W. G. Moore, and William F. Smyth. A characterization of the squares in a Fibonacci string. Theoretical Computer Science, 172(1-2):281-291, 1997. URL: https://doi.org/10.1016/S0304-3975(96)00141-7.
Dominik Kempa and Tomasz Kociumaka. String synchronizing sets: sublinear-time BWT construction and optimal LCE data structure. In Proceedings of the 51st Annual ACM SIGACT Symposium on Theory of Computing, STOC 2019, pages 756-767. ACM, 2019. URL: https://doi.org/10.1145/3313276.3316368.
Tomasz Kociumaka, Jakub Radoszewski, Wojciech Rytter, and Tomasz Waleń. Internal pattern matching queries in a text and applications. In Piotr Indyk, editor, Proceedings of the Twenty-Sixth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2015, pages 532-551. SIAM, 2015. URL: https://doi.org/10.1137/1.9781611973730.36.
Roman M. Kolpakov and Gregory Kucherov. Finding maximal repetitions in a word in linear time. In 40th Annual Symposium on Foundations of Computer Science, FOCS 1999, pages 596-604. IEEE Computer Society, 1999. URL: https://doi.org/10.1109/SFFCS.1999.814634.
M. Lothaire. Combinatorics on words, Second Edition. Cambridge mathematical library. Cambridge University Press, 1997.
Shoshana Marcus and Dina Sokol. 2D Lyndon words and applications. Algorithmica, 77(1):116-133, 2017. URL: https://doi.org/10.1007/s00453-015-0065-z.
Wataru Matsubara, Kazuhiko Kusano, Akira Ishino, Hideo Bannai, and Ayumi Shinohara. New lower bounds for the maximum number of runs in a string. In Proceedings of the Prague Stringology Conference 2008, pages 140-145, 2008. URL: http://www.stringology.org/event/2008/p13.html.
Simon J. Puglisi, Jamie Simpson, and William F. Smyth. How many runs can a string contain? Theoretical Computer Science, 401(1-3):165-171, 2008. URL: https://doi.org/10.1016/j.tcs.2008.04.020.
Wojciech Rytter. The number of runs in a string: Improved analysis of the linear upper bound. In 23rd Annual Symposium on Theoretical Aspects of Computer Science, STACS 2006, volume 3884 of Lecture Notes in Computer Science, pages 184-195. Springer, 2006. URL: https://doi.org/10.1007/11672142_14.
Wojciech Rytter. The number of runs in a string. Information and Computation, 205(9):1459-1469, 2007. URL: https://doi.org/10.1016/j.ic.2007.01.007.
Jamie Simpson. Modified Padovan words and the maximum number of runs in a word. The Australasian Journal of Combinatorics, 46:129-146, 2010. URL: http://ajc.maths.uq.edu.au/pdf/46/ajc_v46_p129.pdf.
Jens Stoye and Dan Gusfield. Simple and flexible detection of contiguous repeats using a suffix tree. Theoretical Computer Science, 270(1-2):843-856, 2002. URL: https://doi.org/10.1016/S0304-3975(01)00121-9.
Esko Ukkonen. On-line construction of suffix trees. Algorithmica, 14(3):249-260, 1995. URL: https://doi.org/10.1007/BF01206331.

The Number of Repetitions in 2D-Strings

Authors Panagiotis Charalampopoulos , Jakub Radoszewski , Wojciech Rytter , Tomasz Waleń , Wiktor Zuba

File

Document Identifiers

Subject Classification

ACM Subject Classification

Keywords

Metrics

Abstract

Cite As Get BibTex

Author Details

References

Thanks for your feedback!

Could not send message

The Number of Repetitions in 2D-Strings

Authors Panagiotis Charalampopoulos , Jakub Radoszewski , Wojciech Rytter , Tomasz Waleń , Wiktor Zuba

File

Document Identifiers

Subject Classification

ACM Subject Classification

Keywords

Metrics

Abstract

Cite As Get BibTex

Author Details

Funding

References

Thanks for your feedback!

Could not send message