Proactive Synthesis of Recursive Tree-to-String Functions from Examples

Authors Mikaël Mayer, Jad Hamza, Viktor Kuncak



PDF
Thumbnail PDF

File

LIPIcs.ECOOP.2017.19.pdf
  • Filesize: 0.73 MB
  • 30 pages

Document Identifiers

Author Details

Mikaël Mayer
Jad Hamza
Viktor Kuncak

Cite AsGet BibTex

Mikaël Mayer, Jad Hamza, and Viktor Kuncak. Proactive Synthesis of Recursive Tree-to-String Functions from Examples. In 31st European Conference on Object-Oriented Programming (ECOOP 2017). Leibniz International Proceedings in Informatics (LIPIcs), Volume 74, pp. 19:1-19:30, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2017)
https://doi.org/10.4230/LIPIcs.ECOOP.2017.19

Abstract

Synthesis from examples enables non-expert users to generate programs by specifying examples of their behavior. A domain-specific form of such synthesis has been recently deployed in a widely used spreadsheet software product. In this paper we contribute to foundations of such techniques and present a complete algorithm for synthesis of a class of recursive functions defined by structural recursion over a given algebraic data type definition. The functions we consider map an algebraic data type to a string; they are useful for, e.g., pretty printing and serialization of programs and data. We formalize our problem as learning deterministic sequential top-down tree-to-string transducers with a single state (1STS). The first problem we consider is learning a tree-to-string transducer from any set of input/output examples provided by the user. We show that, given a set of input/output examples, checking whether there exists a 1STS consistent with these examples is NP-complete in general. In contrast, the problem can be solved in polynomial time under a (practically useful) closure condition that each subtree of a tree in the input/output example set is also part of the input/output examples. Because coming up with relevant input/output examples may be difficult for the user while creating hard constraint problems for the synthesizer, we also study a more automated active learning scenario in which the algorithm chooses the inputs for which the user provides the outputs. Our algorithm asks a worst-case linear number of queries as a function of the size of the algebraic data type definition to determine a unique transducer. To construct our algorithms we present two new results on formal languages. First, we define a class of word equations, called sequential word equations, for which we prove that satisfiability can be solved in deterministic polynomial time. This is in contrast to the general word equations for which the best known complexity upper bound is in linear space. Second, we close a long-standing open problem about the asymptotic size of test sets for context-free languages. A test set of a language of words L is a subset T of L such that any two word homomorphisms equivalent on T are also equivalent on L. We prove that it is possible to build test sets of cubic size for context-free languages, matching for the first time the lower bound found 20 years ago.
Keywords
  • programming by example
  • active learning
  • program synthesis

Metrics

  • Access Statistics
  • Total Accesses (updated on a weekly basis)
    0
    PDF Downloads

References

  1. Aws Albarghouthi, Sumit Gulwani, and Zachary Kincaid. Recursive program synthesis. In International Conference on Computer Aided Verification, 2013. Google Scholar
  2. Jürgen Albert, Karel Culik, and Juhani Karhumäki. Test sets for context free languages and algebraic systems of equations over a free monoid. Information and Control, 52(2):172-186, 1982. Google Scholar
  3. Michael H Albert and J Lawrence. A proof of Ehrenfeucht’s conjecture. Theoretical Computer Science, 41:121-123, 1985. Google Scholar
  4. Rajeev Alur and Loris D’Antoni. Streaming tree transducers. In Automata, Languages, and Programming, pages 42-53. Springer, 2012. Google Scholar
  5. Rajeev Alur and Pavol Černý. Expressiveness of streaming string transducers. In Kamal Lodaya and Meena Mahajan, editors, IARCS Annual Conference on Foundations of Software Technology and Theoretical Computer Science, FSTTCS 2010, December 15-18, 2010, Chennai, India, volume 8 of LIPIcs, pages 1-12. Schloss Dagstuhl - Leibniz-Zentrum fuer Informatik, 2010. Google Scholar
  6. Dana Angluin. Learning regular sets from queries and counterexamples. Information and computation, pages 87-106, 1987. Google Scholar
  7. Patrick Bahr and Laurence E. Day. Programming macro tree transducers. In Proceedings of the 9th ACM SIGPLAN workshop on Generic programming, pages 61-72. ACM, 2013. Google Scholar
  8. Jérôme Besombes and Jean-Yves Marion. Learning tree languages from positive examples and membership queries. In Shai Ben-David, John Case, and Akira Maruoka, editors, Algorithmic Learning Theory, 15th International Conference, ALT 2004, Padova, Italy, October 2-5, 2004, Proceedings, volume 3244 of Lecture Notes in Computer Science, pages 440-453. Springer, 2004. URL: http://dx.doi.org/10.1007/978-3-540-30215-5_33.
  9. Adrien Boiret. Normal Form on Linear Tree-to-word Transducers. In 10th International Conference on Language and Automata Theory and Applications, 2016. Google Scholar
  10. Adrien Boiret and Raphaela Palenta. Deciding equivalence of linear tree-to-word transducers in polynomial time. CoRR, abs/1606.03758, 2016. Google Scholar
  11. Joost Engelfriet and Sebastian Maneth. Output string languages of compositions of deterministic macro tree transducers. Journal of Computer and System Sciences, 64(2):350-395, 2002. Google Scholar
  12. John K. Feser, Swarat Chaudhuri, and Isil Dillig. Synthesizing Data Structure Transformations from Input-output Examples. In Proceedings of the 36th ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI 2015, pages 229-239, New York, NY, USA, 2015. ACM. Google Scholar
  13. Jonathan Frankle, Peter-Michael Osera, David Walker, and Steve Zdancewic. Example-directed synthesis: a type-theoretic interpretation. In Proceedings of the 43rd Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, POPL 2016, St. Petersburg, FL, USA, January 20 - 22, 2016, 2016. Google Scholar
  14. Jonathan Graehl and Kevin Knight. Training tree transducers. Technical report, DTIC Document, 2004. Google Scholar
  15. Victor Sergeevich Guba. Equivalence of infinite systems of equations in free groups and semigroups to finite subsystems. Mathematical Notes, 40(3):688-690, 1986. Google Scholar
  16. Sumit Gulwani. Synthesis from Examples. In WAMBSE Special Issue, Infosys Labs Briefings, volume 10(2), 2012. Google Scholar
  17. Tihomir Gvero, Viktor Kuncak, Ivan Kuraj, and Ruzica Piskac. Complete completion using types and weights. In Hans-Juergen Boehm and Cormac Flanagan, editors, ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI '13, Seattle, WA, USA, June 16-19, 2013, pages 27-38. ACM, 2013. URL: http://dx.doi.org/10.1145/2462156.2462192.
  18. Tihomir Gvero, Viktor Kuncak, and Ruzica Piskac. Interactive Synthesis of Code Snippets. In Proceedings of the 23rd International Conference on Computer Aided Verification, CAV'11, pages 418-423, Berlin, Heidelberg, 2011. Springer-Verlag. Google Scholar
  19. Helmut Seidl, Sebastian Maneth, and Gregor Kemper. Equivalence of deterministic top-down tree-to-string transducers is decidable. In Foundations of Computer Science (FOCS), 2015 IEEE 56th Annual Symposium on, pages 943-962. IEEE, 2015. Google Scholar
  20. Thibaud Hottelier, Ras Bodik, and Kimiko Ryokai. Programming by manipulation for layout. In Proceedings of the 27th annual ACM symposium on User interface software and technology, 2014. Google Scholar
  21. Patrik Jansson. Functional Polytypic Programming. PhD thesis, Institutionen för datavetenska, Göteborg : Chalmers University of Technology, 2000. Google Scholar
  22. Artur Jeż. Word equations in linear space. arXiv preprint arXiv:1702.00736, 2017. Google Scholar
  23. Jean-Pierre Jouannaud, Claude Kirchner, Hélène Kirchner, and Aristide Megrelis. Programming with equalities, subsorts, overloading, and parametrization in OBJ. The Journal of Logic Programming, 12(3):257-279, 1992. Google Scholar
  24. Dileep Kini and Sumit Gulwani. Flashnormalize: Programming by examples for text normalization. In Qiang Yang and Michael Wooldridge, editors, Proceedings of the Twenty-Fourth International Joint Conference on Artificial Intelligence, IJCAI 2015, Buenos Aires, Argentina, July 25-31, 2015, pages 776-783. AAAI Press, 2015. URL: http://ijcai.org/Abstract/15/115.
  25. Manos Koukoutos, Etienne Kneuss, and Viktor Kuncak. An update on deductive synthesis and repair in the leon tool. In Ruzica Piskac and Rayna Dimitrova, editors, Proceedings Fifth Workshop on Synthesis, SYNT@CAV 2016, Toronto, Canada, July 17-18, 2016., volume 229 of EPTCS, pages 100-111, 2016. URL: http://dx.doi.org/10.4204/EPTCS.229.9.
  26. Grégoire Laurence. Normalisation et Apprentissage de Transductions d'Arbres en Mots. PhD thesis, Université des Sciences et Technologie de Lille-Lille I, 2014. Google Scholar
  27. Grégoire Laurence, Aurélien Lemay, Joachim Niehren, Sławek Staworko, and Marc Tommasi. Learning sequential tree-to-word transducers. In International Conference on Language and Automata Theory and Applications, pages 490-502. Springer, 2014. Google Scholar
  28. Vu Le and Sumit Gulwani. FlashExtract: A framework for data extraction by examples. In Proceedings of the 35th ACM SIGPLAN Conference on Programming Language Design and Implementation, page 55. ACM, 2014. Google Scholar
  29. Aurélien Lemay, Sebastian Maneth, and Joachim Niehren. A learning algorithm for top-down XML transformations. In Jan Paredaens and Dirk Van Gucht, editors, Proceedings of the Twenty-Ninth ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems, PODS 2010, June 6-11, 2010, Indianapolis, Indiana, USA, pages 285-296. ACM, 2010. URL: http://dx.doi.org/10.1145/1807085.1807122.
  30. M Lothaire. Combinatorics on words, volume 17. Cambridge University Press, 1997. Google Scholar
  31. Sebastian Maneth and Helmut Seidl. Deciding equivalence of top-down XML transformations in polynomial time. In PLAN-X, pages 73-79, 2007. Google Scholar
  32. Mikaël Mayer and Jad Hamza. Optimal test sets for context-free languages. CoRR, abs/1611.06703, 2016. URL: http://arxiv.org/abs/1611.06703.
  33. Mikaël Mayer, Jad Hamza, and Viktor Kuncak. Polynomial-time proactive synthesis of tree-to-string functions from examples. CoRR, abs/1701.04288, 2017. URL: http://arxiv.org/abs/1701.04288.
  34. Mikaël Mayer, Gustavo Soares, Maxim Grechkin, Vu Le, Mark Marron, Alex Polozov, Rishabh Singh, Ben Zorn, and Sumit Gulwani. User interaction models for disambiguation in programming by example. In 28th ACM User Interface Software and Technology Symposium, 2015. Google Scholar
  35. Heather Miller, Philipp Haller, Eugene Burmako, and Martin Odersky. Instant pickles: generating object-oriented pickler combinators for fast and extensible serialization. In Proceedings of the 2013 ACM SIGPLAN International Conference on Object Oriented Programming Systems Languages & Applications, OOPSLA 2013, part of SPLASH 2013, Indianapolis, IN, USA, October 26-31, 2013, pages 183-202, 2013. Google Scholar
  36. Terence Parr and Jurgen Vinju. Towards a universal code formatter through machine learning. In Proceedings of the 2016 ACM SIGPLAN International Conference on Software Language Engineering, pages 137-151. ACM, 2016. Google Scholar
  37. Daniel Perelman, Sumit Gulwani, Dan Grossman, and Peter Provost. Test-driven synthesis. In Proceedings of the 35th ACM SIGPLAN Conference on Programming Language Design and Implementation, page 43. ACM, 2014. Google Scholar
  38. Wojciech Plandowski. Testing equivalence of morphisms on context-free languages. In European Symposium on Algorithms, pages 460-470. Springer, 1994. Google Scholar
  39. Wojciech Plandowski. The complexity of the morphism equivalence problem for context-free languages. PhD thesis, Department of Mathematics, Informatics, and Mechanics, Warsaw University, 1995. Google Scholar
  40. Wojciech Plandowski. Satisfiability of word equations with constants is in PSPACE. In Foundations of Computer Science, 1999. 40th Annual Symposium on, pages 495-500. IEEE, 1999. Google Scholar
  41. Nadia Polikarpova, Ivan Kuraj, and Armando Solar-Lezama. Program synthesis from polymorphic refinement types. In Proceedings of the 37th ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI 2016, Santa Barbara, CA, USA, June 13-17, 2016, 2016. Google Scholar
  42. Oleksandr Polozov and Sumit Gulwani. Flashmeta: a framework for inductive program synthesis. In Proceedings of the 2015 ACM SIGPLAN International Conference on Object-Oriented Programming, Systems, Languages, and Applications, OOPSLA 2015, part of SPLASH 2015, Pittsburgh, PA, USA, October 25-30, 2015, pages 107-126, 2015. URL: http://dx.doi.org/10.1145/2814270.2814310.
  43. Rishabh Singh and Sumit Gulwani. Synthesizing number transformations from input-output examples. In Proc. of the 24th CAV conference, pages 634-651, Berlin, Heidelberg, 2012. Springer-Verlag. Google Scholar
  44. Sławomir Staworko, Grégoire Laurence, Aurélien Lemay, and Joachim Niehren. Equivalence of deterministic nested word to word transducers. In International Symposium on Fundamentals of Computation Theory, pages 310-322. Springer, 2009. Google Scholar
  45. Kuat Yessenov, Shubham Tulsiani, Aditya Krishna Menon, Robert C. Miller, Sumit Gulwani, Butler W. Lampson, and Adam Kalai. A colorful approach to text processing by example. In Shahram Izadi, Aaron J. Quigley, Ivan Poupyrev, and Takeo Igarashi, editors, The 26th Annual ACM Symposium on User Interface Software and Technology, UIST'13, St. Andrews, United Kingdom, October 8-11, 2013, pages 495-504. ACM, 2013. URL: http://dx.doi.org/10.1145/2501988.2502040.
  46. Chicheng Zhang and Kamalika Chaudhuri. Active learning from weak and strong labelers. In Advances in Neural Information Processing Systems, 2015. Google Scholar