# Maximal Number of Subword Occurrences in a Word

## Acknowledgements

I would like to thank Stéphane Vialette for bringing the question of the maximal number of subword occurrences in a given word to my attention, and for suggesting an idea for Proposition 3.4.

## Cite As

Wenjie Fang. Maximal Number of Subword Occurrences in a Word. In 35th International Conference on Probabilistic, Combinatorial and Asymptotic Methods for the Analysis of Algorithms (AofA 2024). Leibniz International Proceedings in Informatics (LIPIcs), Volume 302, pp. 3:1-3:12, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024)
https://doi.org/10.4230/LIPIcs.AofA.2024.3

## Abstract

We consider the number of occurrences of subwords (not necessarily consecutive subsequences) in a given word. We first define the notion of subword entropy of a given word, which measures the maximal number of occurrences among all possible subwords. We then give upper and lower bounds on the minimal subword entropy for words of fixed length over a fixed alphabet, and also show that the minimal subword entropy per letter has a limit value. A better upper bound on the minimal subword entropy for a binary alphabet is then obtained by looking at certain families of periodic words. We also give some conjectures based on experimental observations.
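To make the quantities in the abstract concrete, here is a minimal brute-force sketch in Python. It is not the author's software; the function names `occurrences`, `max_occurrences`, and `subword_entropy` are hypothetical. It also assumes that the entropy is the base-2 logarithm of the maximal occurrence count, a normalization the abstract does not state.

```python
from itertools import product
from math import log2


def occurrences(word, sub):
    """Count the occurrences of `sub` in `word` as a subword,
    i.e. as a not necessarily consecutive subsequence."""
    # ways[j] = number of ways to embed sub[:j] into the part of word read so far
    ways = [1] + [0] * len(sub)
    for letter in word:
        # Scan right to left so that the current letter of `word`
        # is used at most once in each embedding.
        for j in range(len(sub), 0, -1):
            if sub[j - 1] == letter:
                ways[j] += ways[j - 1]
    return ways[len(sub)]


def max_occurrences(word):
    """Maximal number of occurrences over all candidate subwords.
    Exhaustive search: exponential in len(word), small inputs only."""
    alphabet = sorted(set(word))
    best = 0
    for k in range(len(word) + 1):
        for candidate in product(alphabet, repeat=k):
            best = max(best, occurrences(word, candidate))
    return best


def subword_entropy(word):
    """ASSUMPTION: entropy taken as log2 of the maximal occurrence count."""
    return log2(max_occurrences(word))


if __name__ == "__main__":
    print(occurrences("abab", "ab"))
    print(max_occurrences("aaaa"))
    print(subword_entropy("aaaa"))
```

For example, `ab` occurs in `abab` at the position pairs (1,2), (1,4) and (3,4), and in `aaaa` the subword `aa` is the most frequent, with one occurrence per pair of positions. The exhaustive search is only meant for short words.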

## Subject Classification

##### ACM Subject Classification
• Mathematics of computing → Enumeration
• Mathematics of computing → Combinatorics on words
##### Keywords
• Subword occurrence
• subword entropy
• enumeration
• periodic words
