Automata Learning with an Incomplete Teacher

Authors: Mark Moeller, Thomas Wiener, Alaia Solko-Breslin, Caleb Koch, Nate Foster, Alexandra Silva

Author Details

Mark Moeller
  • Cornell University, Ithaca, NY, USA
Thomas Wiener
  • Cornell University, Ithaca, NY, USA
Alaia Solko-Breslin
  • University of Pennsylvania, Philadelphia, PA, USA
Caleb Koch
  • Stanford University, CA, USA
Nate Foster
  • Cornell University, Ithaca, NY, USA
Alexandra Silva
  • Cornell University, Ithaca, NY, USA


We thank Marijn Heule, Martin Leucker, and Arlindo Oliveira for their efforts in providing us access to their code and benchmarks. We also thank Akshat Singh and Sheetal Athrey, with whom this project began as an undergraduate research project.

Cite As

Mark Moeller, Thomas Wiener, Alaia Solko-Breslin, Caleb Koch, Nate Foster, and Alexandra Silva. Automata Learning with an Incomplete Teacher. In 37th European Conference on Object-Oriented Programming (ECOOP 2023). Leibniz International Proceedings in Informatics (LIPIcs), Volume 263, pp. 21:1-21:30, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2023)


The preceding decade has seen significant interest in the use of active learning to build models of programs and protocols. But existing algorithms assume the existence of an idealized oracle, a so-called Minimally Adequate Teacher (MAT), that cannot be fully realized in practice and so is usually approximated with testing. This work proposes a new framework for active learning based on an incomplete teacher. This new formulation, called iMAT, neatly handles scenarios in which the teacher has access to only a finite number of tests or otherwise has gaps in its knowledge. We adapt Angluin’s L* algorithm for learning finite automata to incomplete teachers, and we build a prototype implementation in OCaml that uses an SMT solver to help fill in information not supplied by the teacher. We demonstrate the behavior of our iMAT prototype on a variety of learning problems from a standard benchmark suite.
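To make the idea of a teacher with gaps in its knowledge concrete, here is a minimal Python sketch. It is an illustration only, not the paper's OCaml prototype or its formal definition of iMAT. It assumes the teacher's knowledge is just two finite sets of labelled strings. The names `DFA`, `IncompleteTeacher`, `membership`, and `counterexample` are hypothetical, and the SMT-based step that fills in unknown answers is not shown.

```python
from typing import Dict, Optional, Set, Tuple


class DFA:
    """A complete deterministic finite automaton over single-character symbols."""

    def __init__(self, start: int, accepting: Set[int],
                 delta: Dict[Tuple[int, str], int]) -> None:
        self.start = start
        self.accepting = set(accepting)
        self.delta = dict(delta)

    def accepts(self, word: str) -> bool:
        state = self.start
        for symbol in word:
            state = self.delta[(state, symbol)]
        return state in self.accepting


class IncompleteTeacher:
    """Hypothetical teacher that knows the labels of finitely many strings."""

    def __init__(self, accepted: Set[str], rejected: Set[str]) -> None:
        if accepted & rejected:
            raise ValueError("a string cannot be both accepted and rejected")
        self.accepted = set(accepted)
        self.rejected = set(rejected)

    def membership(self, word: str) -> Optional[bool]:
        """Return True or False when the label is known, None otherwise."""
        if word in self.accepted:
            return True
        if word in self.rejected:
            return False
        return None  # the teacher has no information about this word

    def counterexample(self, hypothesis: DFA) -> Optional[str]:
        """Return the shortest known string the hypothesis mislabels, if any."""
        known = sorted(self.accepted | self.rejected, key=lambda w: (len(w), w))
        for word in known:
            if hypothesis.accepts(word) != self.membership(word):
                return word
        return None  # consistent with everything the teacher knows


# Assumed example data: the teacher has seen four strings over the alphabet {a}.
teacher = IncompleteTeacher(accepted={"a", "aaa"}, rejected={"", "aa"})

# Hypothesis 1: accepts strings with an odd number of a's.
odd_as = DFA(start=0, accepting={1}, delta={(0, "a"): 1, (1, "a"): 0})

# Hypothesis 2: accepts every string.
accept_all = DFA(start=0, accepting={0}, delta={(0, "a"): 0})
```

In this sketch, `teacher.membership("aaaa")` returns `None` because the string lies outside the teacher's knowledge, `teacher.counterexample(odd_as)` returns `None` because the hypothesis agrees with every known label, and `teacher.counterexample(accept_all)` returns the empty string, which the teacher rejects. A learner in this setting has to choose values for the unknown answers itself; the abstract states that the authors' prototype uses an SMT solver for that step.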

Subject Classification

ACM Subject Classification
  • Theory of computation → Active learning
Keywords
  • Finite Automata
  • Active Learning
  • SMT Solvers

