Passive Learning of Regular Data Languages in Polynomial Time and Data

Balachander, Mrudula; Filiot, Emmanuel; Gentilini, Raffaella

doi:10.4230/LIPIcs.CONCUR.2024.10

Abstract

A regular data language is a language over an infinite alphabet recognized by a deterministic register automaton (DRA), as defined by Benedikt, Ley and Puppis. This model, which is expressively equivalent to the deterministic finite-memory automata introduced earlier by Francez and Kaminski, enjoys unique minimal automata (up to isomorphism), based on a Myhill-Nerode theorem.
In this paper, we introduce a polynomial-time passive learning algorithm for regular data languages from positive and negative samples. Following Gold’s model for learning languages, we prove that our algorithm can identify in the limit any regular data language L, i.e., it returns a minimal DRA recognizing L if a characteristic sample set for L is provided as input. We prove that there exist characteristic sample sets of polynomial size with respect to the size of the minimal DRA recognizing L. To the best of our knowledge, this is the first passive learning algorithm for data languages, and the first learning algorithm which is fully polynomial, both with respect to time complexity and size of the characteristic sample set.
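
To make the objects in the abstract concrete, the following is a minimal sketch of a one-register deterministic register automaton and of the consistency condition that a passive learner's output must satisfy on its samples. It is not the paper's algorithm or its exact formalism: the class `OneRegisterDRA`, the transition encoding, the example language (words of length at least 2 whose last datum equals the first) and the helper `consistent` are all assumptions made for illustration.

```python
from typing import Dict, Hashable, Iterable, Sequence, Tuple

_EMPTY = object()  # sentinel for "register not yet set" (illustrative choice)

# Hypothetical encoding: (state, datum_equals_register) -> (store_datum, next_state)
Delta = Dict[Tuple[str, bool], Tuple[bool, str]]


class OneRegisterDRA:
    """Simplified deterministic register automaton with a single register."""

    def __init__(self, initial: str, accepting: Iterable[str], delta: Delta):
        self.initial = initial
        self.accepting = set(accepting)
        self.delta = delta

    def accepts(self, word: Sequence[Hashable]) -> bool:
        state, register = self.initial, _EMPTY
        for datum in word:
            # The only test available is equality with the stored datum.
            matches = register is not _EMPTY and datum == register
            key = (state, matches)
            if key not in self.delta:
                return False  # missing transition: reject
            store, state = self.delta[key]
            if store:
                register = datum
        return state in self.accepting


# Assumed example language: length >= 2 and last datum equals first datum.
first_equals_last = OneRegisterDRA(
    initial="init",
    accepting={"eq"},
    delta={
        ("init", False): (True, "neq"),   # store the first datum
        ("neq", True): (False, "eq"),
        ("neq", False): (False, "neq"),
        ("eq", True): (False, "eq"),
        ("eq", False): (False, "neq"),
    },
)


def consistent(dra: OneRegisterDRA, positive, negative) -> bool:
    """True if dra accepts every positive sample and rejects every negative one."""
    return all(dra.accepts(w) for w in positive) and not any(
        dra.accepts(w) for w in negative
    )


positive = [[1, 1], [7, 3, 7]]
negative = [[1, 2], [4]]
print(consistent(first_equals_last, positive, negative))
```

Because the automaton only compares data for equality, acceptance depends on the pattern of equalities in a word and not on the concrete values: `[1, 2, 1]` and `["a", "b", "a"]` are treated alike. A passive learner receives only the two sample sets and must return an automaton for which `consistent` holds; the abstract's claim is that, on a characteristic sample set, the returned automaton is moreover the minimal DRA for the target language.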

Parosh Aziz Abdulla, C. Aiswarya, and Mohamed Faouzi Atig. Data communicating processes with unreliable channels. In Martin Grohe, Eric Koskinen, and Natarajan Shankar, editors, Proceedings of the 31st Annual ACM/IEEE Symposium on Logic in Computer Science, LICS '16, New York, NY, USA, July 5-8, 2016, pages 166-175. ACM, 2016. URL: https://doi.org/10.1145/2933575.2934535.
Parosh Aziz Abdulla, Mohamed Faouzi Atig, Ahmet Kara, and Othmane Rezine. Verification of dynamic register automata. In Venkatesh Raman and S. P. Suresh, editors, 34th International Conference on Foundation of Software Technology and Theoretical Computer Science, FSTTCS 2014, December 15-17, 2014, New Delhi, India, volume 29 of LIPIcs, pages 653-665. Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 2014. URL: https://doi.org/10.4230/LIPIcs.FSTTCS.2014.653.
László Babai. On the length of subgroup chains in the symmetric group. Communications in Algebra, 14(9):1729-1736, 1986. URL: https://doi.org/10.1080/00927878608823393.
Christel Baier and Joost-Pieter Katoen. Principles of model checking. MIT Press, 2008.
Mrudula Balachander, Emmanuel Filiot, and Jean-François Raskin. LTL reactive synthesis with a few hints. In Sriram Sankaranarayanan and Natasha Sharygina, editors, Tools and Algorithms for the Construction and Analysis of Systems - 29th International Conference, TACAS 2023, Held as Part of the European Joint Conferences on Theory and Practice of Software, ETAPS 2023, Paris, France, April 22-27, 2023, Proceedings, Part II, volume 13994 of Lecture Notes in Computer Science, pages 309-328. Springer, 2023. URL: https://doi.org/10.1007/978-3-031-30820-8_20.
Michael Benedikt, Clemens Ley, and Gabriele Puppis. What you must remember when processing data words. In Alberto H. F. Laender and Laks V. S. Lakshmanan, editors, Proceedings of the 4th Alberto Mendelzon International Workshop on Foundations of Data Management, Buenos Aires, Argentina, May 17-20, 2010, volume 619 of CEUR Workshop Proceedings. CEUR-WS.org, 2010. URL: https://ceur-ws.org/Vol-619/paper11.pdf.
Therese Berg, Bengt Jonsson, and Harald Raffelt. Regular inference for state machines using domains with equality tests. In José Luiz Fiadeiro and Paola Inverardi, editors, Fundamental Approaches to Software Engineering, 11th International Conference, FASE 2008, Held as Part of the Joint European Conferences on Theory and Practice of Software, ETAPS 2008, Budapest, Hungary, March 29-April 6, 2008. Proceedings, volume 4961 of Lecture Notes in Computer Science, pages 317-331. Springer, 2008. URL: https://doi.org/10.1007/978-3-540-78743-3_24.
León Bohn and Christof Löding. Constructing deterministic ω-automata from examples by an extension of the RPNI algorithm. In Filippo Bonchi and Simon J. Puglisi, editors, 46th International Symposium on Mathematical Foundations of Computer Science, MFCS 2021, August 23-27, 2021, Tallinn, Estonia, volume 202 of LIPIcs, pages 20:1-20:18. Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 2021. URL: https://doi.org/10.4230/LIPIcs.MFCS.2021.20.
León Bohn and Christof Löding. Passive learning of deterministic Büchi automata by combinations of DFAs. In Mikolaj Bojanczyk, Emanuela Merelli, and David P. Woodruff, editors, 49th International Colloquium on Automata, Languages, and Programming, ICALP 2022, July 4-8, 2022, Paris, France, volume 229 of LIPIcs, pages 114:1-114:20. Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 2022. URL: https://doi.org/10.4230/LIPIcs.ICALP.2022.114.
Mikolaj Bojanczyk. Orbit-finite sets and their algorithms (invited talk). In Ioannis Chatzigiannakis, Piotr Indyk, Fabian Kuhn, and Anca Muscholl, editors, 44th International Colloquium on Automata, Languages, and Programming, ICALP 2017, July 10-14, 2017, Warsaw, Poland, volume 80 of LIPIcs, pages 1:1-1:14. Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 2017. URL: https://doi.org/10.4230/LIPIcs.ICALP.2017.1.
Mikolaj Bojanczyk, Bartek Klin, and Slawomir Lasota. Automata theory in nominal sets. Log. Methods Comput. Sci., 10(3), 2014. URL: https://doi.org/10.2168/LMCS-10(3:4)2014.
Mikolaj Bojanczyk, Anca Muscholl, Thomas Schwentick, Luc Segoufin, and Claire David. Two-variable logic on words with data. In 21th IEEE Symposium on Logic in Computer Science (LICS 2006), 12-15 August 2006, Seattle, WA, USA, Proceedings, pages 7-16. IEEE Computer Society, 2006. URL: https://doi.org/10.1109/LICS.2006.51.
Benedikt Bollig. An automaton over data words that captures EMSO logic. In Joost-Pieter Katoen and Barbara König, editors, CONCUR 2011 - Concurrency Theory - 22nd International Conference, CONCUR 2011, Aachen, Germany, September 6-9, 2011. Proceedings, volume 6901 of Lecture Notes in Computer Science, pages 171-186. Springer, 2011. URL: https://doi.org/10.1007/978-3-642-23217-6_12.
Benedikt Bollig, Aiswarya Cyriac, Paul Gastin, and K. Narayan Kumar. Model checking languages of data words. In Lars Birkedal, editor, Foundations of Software Science and Computational Structures - 15th International Conference, FOSSACS 2012, Held as Part of the European Joint Conferences on Theory and Practice of Software, ETAPS 2012, Tallinn, Estonia, March 24 - April 1, 2012. Proceedings, volume 7213 of Lecture Notes in Computer Science, pages 391-405. Springer, 2012. URL: https://doi.org/10.1007/978-3-642-28729-9_26.
Julien Carme, Rémi Gilleron, Aurélien Lemay, and Joachim Niehren. Interactive learning of node selecting tree transducer. Mach. Learn., 66(1):33-67, 2007. URL: https://doi.org/10.1007/s10994-006-9613-8.
Colin de la Higuera. Characteristic sets for polynomial grammatical inference. Mach. Learn., 27(2):125-138, 1997. URL: https://doi.org/10.1023/A:1007353007695.
Colin de la Higuera, José Oncina, and Enrique Vidal. Identification of DFA: data-dependent vs data-independent algorithms. In Laurent Miclet and Colin de la Higuera, editors, Grammatical Inference: Learning Syntax from Sentences, 3rd International Colloquium, ICGI-96, Montpellier, France, September 25-27, 1996, Proceedings, volume 1147 of Lecture Notes in Computer Science, pages 313-325. Springer, 1996. URL: https://doi.org/10.1007/BFb0033365.
Pierre Dupont. Incremental regular inference. In Laurent Miclet and Colin de la Higuera, editors, Grammatical Inference: Learning Syntax from Sentences, pages 222-237, Berlin, Heidelberg, 1996. Springer Berlin Heidelberg. URL: https://doi.org/10.1007/BFb0033357.
Léo Exibard, Emmanuel Filiot, and Ayrat Khalimov. Church synthesis on register automata over linearly ordered data domains. Formal Methods Syst. Des., 61(2):290-337, 2022. URL: https://doi.org/10.1007/s10703-023-00435-w.
Léo Exibard, Emmanuel Filiot, and Pierre-Alain Reynier. Synthesis of data word transducers. Log. Methods Comput. Sci., 17(1), 2021. URL: https://lmcs.episciences.org/7279.
Dana Fisman, Hadar Frenkel, and Sandra Zilles. Inferring symbolic automata. Log. Methods Comput. Sci., 19(2), 2023. URL: https://doi.org/10.46298/lmcs-19(2:5)2023.
Pedro García and Jose Oncina. Inference of recognizable tree sets. Tech. rep., Departamento de Sistemas Informáticos y Computación, Universidad de Alicante. DSIC-II/47/93, 1993.
E. Mark Gold. Language identification in the limit. Inf. Control., 10(5):447-474, 1967. URL: https://doi.org/10.1016/S0019-9958(67)91165-5.
Radu Grigore and Nikos Tzevelekos. History-register automata. Log. Methods Comput. Sci., 12(1), 2016. URL: https://doi.org/10.2168/LMCS-12(1:7)2016.
Falk Howar, Bernhard Steffen, Bengt Jonsson, and Sofia Cassel. Inferring canonical register automata. In Viktor Kuncak and Andrey Rybalchenko, editors, Verification, Model Checking, and Abstract Interpretation - 13th International Conference, VMCAI 2012, Philadelphia, PA, USA, January 22-24, 2012. Proceedings, volume 7148 of Lecture Notes in Computer Science, pages 251-266. Springer, 2012. URL: https://doi.org/10.1007/978-3-642-27940-9_17.
Malte Isberner, Falk Howar, and Bernhard Steffen. Learning register automata: from languages to program structures. Mach. Learn., 96(1-2):65-98, 2014. URL: https://doi.org/10.1007/s10994-013-5419-7.
Michael Kaminski and Nissim Francez. Finite-memory automata. Theoretical Computer Science, 134(2):329-363, 1994. URL: https://doi.org/10.1016/0304-3975(94)90242-9.
Ayrat Khalimov and Orna Kupferman. Register-bounded synthesis. In Wan J. Fokkink and Rob van Glabbeek, editors, 30th International Conference on Concurrency Theory, CONCUR 2019, August 27-30, 2019, Amsterdam, the Netherlands, volume 140 of LIPIcs, pages 25:1-25:16. Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 2019. URL: https://doi.org/10.4230/LIPIcs.CONCUR.2019.25.
Martin Leucker. Learning meets verification. In Frank S. de Boer, Marcello M. Bonsangue, Susanne Graf, and Willem P. de Roever, editors, Formal Methods for Components and Objects, 5th International Symposium, FMCO 2006, Amsterdam, The Netherlands, November 7-10, 2006, Revised Lectures, volume 4709 of Lecture Notes in Computer Science, pages 127-151. Springer, 2006. URL: https://doi.org/10.1007/978-3-540-74792-5_6.
Damián López and Pedro García. On the inference of finite state automata from positive and negative data. Topics in Grammatical Inference, pages 73-112, 2016.
Amaldev Manuel, Anca Muscholl, and Gabriele Puppis. Walking on data words. Theory Comput. Syst., 59(2):180-208, 2016. URL: https://doi.org/10.1007/s00224-014-9603-3.
Joshua Moerman, Matteo Sammartino, Alexandra Silva, Bartek Klin, and Michal Szynwelski. Learning nominal automata. In Giuseppe Castagna and Andrew D. Gordon, editors, Proceedings of the 44th ACM SIGPLAN Symposium on Principles of Programming Languages, POPL 2017, Paris, France, January 18-20, 2017, pages 613-625. ACM, 2017. URL: https://doi.org/10.1145/3009837.3009879.
Andrzej S. Murawski, Steven J. Ramsay, and Nikos Tzevelekos. Bisimilarity in fresh-register automata. In Proceedings of the 2015 30th Annual ACM/IEEE Symposium on Logic in Computer Science (LICS), LICS '15, pages 156-167, USA, 2015. IEEE Computer Society. URL: https://doi.org/10.1109/LICS.2015.24.
Jose Oncina and Pedro García. Inferring regular languages in polynomial update time. World Scientific, January 1992. URL: https://doi.org/10.1142/9789812797902_0004.
Nikos Tzevelekos. Fresh-register automata. In Thomas Ball and Mooly Sagiv, editors, Proceedings of the 38th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, POPL 2011, Austin, TX, USA, January 26-28, 2011, pages 295-306. ACM, 2011. URL: https://doi.org/10.1145/1926385.1926420.
