Enumerating Regular Languages with Bounded Delay

Amarilli, Antoine; Monet, Mikaël

doi:10.4230/LIPIcs.STACS.2023.8

Abstract

We study the task, for a given language L, of enumerating the (generally infinite) sequence of its words, without repetitions, while bounding the delay between two consecutive words. To allow for delay bounds that do not depend on the current word length, we assume a model where we produce each word by editing the preceding word with a small edit script, rather than writing out the word from scratch. In particular, this witnesses that the language is orderable, i.e., we can write its words as an infinite sequence such that the Levenshtein edit distance between any two consecutive words is bounded by a value that depends only on the language. For instance, (a+b)^* is orderable (with a variant of the Gray code), but a^* + b^* is not.
We characterize which regular languages are enumerable in this sense, and show that this can be decided in PTIME in an input deterministic finite automaton (DFA) for the language. In fact, we show that, given a DFA A, we can compute in PTIME automata A₁, …, A_t such that L(A) is partitioned as L(A₁) ⊔ … ⊔ L(A_t) and every L(A_i) is orderable in this sense. Further, we show that the value of t obtained is optimal, i.e., we cannot partition L(A) into less than t orderable languages.
In the case where L(A) is orderable (i.e., t = 1), we show that the ordering can be produced by a bounded-delay algorithm: specifically, the algorithm runs in a suitable pointer machine model, and produces a sequence of bounded-length edit scripts to visit the words of L(A) without repetitions, with bounded delay - exponential in |A| - between each script. In fact, we show that we can achieve this while only allowing the edit operations push and pop at the beginning and end of the word, which implies that the word can in fact be maintained in a double-ended queue.
By contrast, when fixing the distance bound d between consecutive words and the number of classes of the partition, it is NP-hard in the input DFA A to decide if L(A) is orderable in this sense, already for finite languages. 
Last, we study the model where push-pop edits are only allowed at the end of the word, corresponding to a case where the word is maintained on a stack. We show that these operations are strictly weaker and that the slender languages are precisely those that can be partitioned into finitely many languages that are orderable in this sense. For the slender languages, we can again characterize the minimal number of languages in the partition, and achieve bounded-delay enumeration.

Cite As Get BibTex

Antoine Amarilli and Mikaël Monet. Enumerating Regular Languages with Bounded Delay. In 40th International Symposium on Theoretical Aspects of Computer Science (STACS 2023). Leibniz International Proceedings in Informatics (LIPIcs), Volume 254, pp. 8:1-8:18, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2023) https://doi.org/10.4230/LIPIcs.STACS.2023.8

Author Details

Antoine Amarilli

LTCI, Télécom Paris, Institut Polytechnique de Paris, France

Mikaël Monet

Univ. Lille, Inria, CNRS, Centrale Lille, UMR 9189 CRIStAL, F-59000 Lille, France

Funding

Amarilli, Antoine: Partially supported by the ANR project EQUUS ANR-19-CE48-0019 and by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) – 431183758.

Acknowledgements

We thank Florent Capelli and Charles Paperman for their insights during initial discussions about this problem. We thank user pcpthm on the Theoretical Computer Science Stack Exchange forum for giving the argument for Proposition 6.1 in https://cstheory.stackexchange.com/users/65605/pcpthm. We thank Jeffrey Shallit for pointing us to related work. We thank Torsten Mütze and Arturo Merino for other helpful pointers. We thank the anonymous reviewers for their valuable feedback. Finally, we are grateful to Louis Jachiet and Lê Thành Dũng (Tito) Nguyễn for feedback on the draft.

References

Margareta Ackerman and Erkki Mäkinen. Three new algorithms for regular language enumeration. In ICCC, 2009. URL: https://maya-ackerman.com/wp-content/uploads/2018/09/ThreeNewAlgorithmsForRegularLanEnum.pdf.
Margareta Ackerman and Jeffrey Shallit. Efficient enumeration of words in regular languages. Theoretical Computer Science, 410(37), 2009. URL: https://maya-ackerman.com/wp-content/uploads/2018/09/Enumeration_AckermanShallit_TCS.pdf.
Antoine Amarilli, Pierre Bourhis, Stefan Mengel, and Matthias Niewerth. Constant-delay enumeration for nondeterministic document spanners. In ICDT, 2019. URL: http://arxiv.org/abs/1807.09320.
Antoine Amarilli, Pierre Bourhis, Stefan Mengel, and Matthias Niewerth. Enumeration on trees with tractable combined complexity and efficient updates. In https://sigmod2019.org/, 2019. URL: http://arxiv.org/abs/1812.09519.
Antoine Amarilli and Mikaël Monet. Enumerating regular languages with bounded delay, 2023. Full version with proofs. URL: http://arxiv.org/abs/2209.14878.
Guillaume Bagan. MSO queries on tree decomposable structures are computable with linear delay. In CSL, 2006.
Guillaume Bagan, Arnaud Durand, and Étienne Grandjean. On acyclic conjunctive queries and constant delay enumeration. In CSL, 2007. URL: https://grandjean.users.greyc.fr/Recherche/PublisGrandjean/EnumAcyclicCSL07.pdf.
Ronald Fagin, Benny Kimelfeld, Frederick Reiss, and Stijn Vansummeren. Document spanners: A formal approach to information extraction. J. ACM, 62(2), 2015. URL: https://pdfs.semanticscholar.org/8df0/ad1c6aa0df93e58071b8afe3371a16a3182f.pdf, URL: https://doi.org/10.1145/2699442.
Rainer Feldmann and Peter Mysliwietz. The shuffle exchange network has a Hamiltonian path. Mathematical systems theory, 29(5), 1996.
Lukas Fleischer and Jeffrey Shallit. Recognizing lexicographically smallest words and computing successors in regular languages. International Journal of Foundations of Computer Science, 32(06), 2021.
Fernando Florenzano, Cristian Riveros, Martin Ugarte, Stijn Vansummeren, and Domagoj Vrgoc. Constant delay algorithms for regular document spanners. In PODS, 2018. URL: http://arxiv.org/abs/1803.05277.
Alon Itai, Christos H Papadimitriou, and Jayme Luiz Szwarcfiter. Hamilton paths in grid graphs. SIAM Journal on Computing, 11(4):676-686, 1982. URL: http://www.cs.technion.ac.il/~itai/publications/Algorithms/Hamilton-paths.pdf.
Jerome J. Karaganis. On the cube of a graph. Canadian Mathematical Bulletin, 11(2), 1968.
Wojciech Kazana and Luc Segoufin. Enumeration of monadic second-order queries on trees. TOCL, 14(4), 2013. URL: https://hal.archives-ouvertes.fr/docs/00/90/70/85/PDF/cdlin-survey.pdf.
Erkki Mäkinen. On lexicographic enumeration of regular and context-free languages. Acta Cybernetica, 13(1):55-61, 1997. URL: http://cyber.bibl.u-szeged.hu/index.php/actcybern/article/view/3479/3464.
Torsten Mütze. Proof of the middle levels conjecture. Proceedings of the London Mathematical Society, 112(4):677-713, 2016. URL: http://arxiv.org/abs/1404.4442.
Torsten Mütze. Combinatorial Gray codes - An updated survey, 2022. URL: http://arxiv.org/abs/2202.01280.
pcpthm (https://cstheory.stackexchange.com/users/65605/pcpthm). Enumerating finite set of words with Hamming distance 1. Theoretical Computer Science Stack Exchange. Version: 2022-07-02. URL: https://cstheory.stackexchange.com/q/51653.
Jean-Éric Pin. Mathematical foundations of automata theory, 2019. URL: https://www.irif.fr/~jep/PDF/MPRI/MPRI.pdf.
Jakub Radoszewski and Wojciech Rytter. Hamiltonian paths in the square of a tree. In ISAAC, 2011. URL: https://www.mimuw.edu.pl/~rytter/MYPAPERS/isaac2011_rytter.pdf.
Frank Ruskey. Combinatorial generation. Preliminary working draft, 2003. URL: https://page.math.tu-berlin.de/~felsner/SemWS17-18/Ruskey-Comb-Gen.pdf.
Milan Sekanina. On an ordering of the set of vertices of a connected graph. Publ. Fac. Sci. Univ. Brno, 412, 1960.
Yann Strozecki et al. Enumeration complexity. Bulletin of EATCS, 3(129), 2019. URL: http://eatcs.org/beatcs/index.php/beatcs/article/view/596.
Robert Endre Tarjan. A class of algorithms which require nonlinear time to maintain disjoint sets. Journal of computer and system sciences, 18(2):110-127, 1979.
Takeaki Uno. Two general methods to reduce delay and change of enumeration algorithms. Technical report, National Institute of Informatics, 2003. URL: https://www.nii.ac.jp/TechReports/public_html/03-004E.pdf.
Kunihiro Wasa. Enumeration of enumeration algorithms. CoRR, 2016. URL: http://arxiv.org/abs/1605.05102.

Enumerating Regular Languages with Bounded Delay

Authors Antoine Amarilli , Mikaël Monet

File

Document Identifiers

Subject Classification

ACM Subject Classification

Keywords

Metrics

Abstract

Cite As Get BibTex

Author Details

Acknowledgements

References

Thanks for your feedback!

Could not send message

Enumerating Regular Languages with Bounded Delay

Authors Antoine Amarilli , Mikaël Monet

File

Document Identifiers

Related Versions

Subject Classification

ACM Subject Classification

Keywords

Metrics

Abstract

Cite As Get BibTex

Author Details

Funding

Acknowledgements

References

Thanks for your feedback!

Could not send message