Proactive Synthesis of Recursive Tree-to-String Functions from Examples

Mayer, Mikaël; Hamza, Jad; Kuncak, Viktor

doi:10.4230/LIPIcs.ECOOP.2017.19

Abstract

Synthesis from examples enables non-expert users to generate programs by specifying examples of their behavior. A domain-specific form of such synthesis has been recently deployed in a widely used spreadsheet software product. In this paper we contribute to foundations of such techniques and present a complete algorithm for synthesis of a class of recursive functions defined by structural recursion over a given algebraic data type definition. The functions we consider map an algebraic data type to a string; they are useful for, e.g., pretty printing and serialization of programs and data.  We formalize our problem as learning deterministic sequential
top-down tree-to-string transducers with a single state (1STS).

The first problem we consider is learning a tree-to-string
transducer from any set of input/output examples provided by the user. We show that, given a set of input/output examples, checking whether there exists a 1STS consistent with these examples is NP-complete in general. In contrast, the problem can be solved in polynomial time under a (practically useful) closure condition that each
subtree of a tree in the input/output example set is also
part of the input/output examples.

Because coming up with relevant input/output examples may be
difficult for the user while creating hard constraint problems
for the synthesizer, we also study a more automated
active learning scenario in which the algorithm chooses the
inputs for which the user provides the outputs. Our
algorithm asks a worst-case linear number of queries as a
function of the size of the algebraic data type definition
to determine a unique transducer.

To construct our algorithms we present two new results on
formal languages.

First, we define a class of word equations, called
sequential word equations, for which we prove that
satisfiability can be solved in deterministic polynomial
time. This is in contrast to the general word equations for
which the best known complexity upper bound is in linear space.

Second, we close a long-standing open problem about the
asymptotic size of test sets for context-free languages. A
test set of a language of words L is a subset T of L
such that any two word homomorphisms equivalent on T are
also equivalent on L. We prove that it is possible to
build test sets of cubic size for context-free languages,
matching for the first time the lower bound found 20 years
ago.

Aws Albarghouthi, Sumit Gulwani, and Zachary Kincaid. Recursive program synthesis. In International Conference on Computer Aided Verification, 2013.
Jürgen Albert, Karel Culik, and Juhani Karhumäki. Test sets for context free languages and algebraic systems of equations over a free monoid. Information and Control, 52(2):172-186, 1982.
Michael H Albert and J Lawrence. A proof of Ehrenfeucht’s conjecture. Theoretical Computer Science, 41:121-123, 1985.
Rajeev Alur and Loris D’Antoni. Streaming tree transducers. In Automata, Languages, and Programming, pages 42-53. Springer, 2012.
Rajeev Alur and Pavol Černý. Expressiveness of streaming string transducers. In Kamal Lodaya and Meena Mahajan, editors, IARCS Annual Conference on Foundations of Software Technology and Theoretical Computer Science, FSTTCS 2010, December 15-18, 2010, Chennai, India, volume 8 of LIPIcs, pages 1-12. Schloss Dagstuhl - Leibniz-Zentrum fuer Informatik, 2010.
Dana Angluin. Learning regular sets from queries and counterexamples. Information and computation, pages 87-106, 1987.
Patrick Bahr and Laurence E. Day. Programming macro tree transducers. In Proceedings of the 9th ACM SIGPLAN workshop on Generic programming, pages 61-72. ACM, 2013.
Jérôme Besombes and Jean-Yves Marion. Learning tree languages from positive examples and membership queries. In Shai Ben-David, John Case, and Akira Maruoka, editors, Algorithmic Learning Theory, 15th International Conference, ALT 2004, Padova, Italy, October 2-5, 2004, Proceedings, volume 3244 of Lecture Notes in Computer Science, pages 440-453. Springer, 2004. URL: http://dx.doi.org/10.1007/978-3-540-30215-5_33.
Adrien Boiret. Normal Form on Linear Tree-to-word Transducers. In 10th International Conference on Language and Automata Theory and Applications, 2016.
Adrien Boiret and Raphaela Palenta. Deciding equivalence of linear tree-to-word transducers in polynomial time. CoRR, abs/1606.03758, 2016.
Joost Engelfriet and Sebastian Maneth. Output string languages of compositions of deterministic macro tree transducers. Journal of Computer and System Sciences, 64(2):350-395, 2002.
John K. Feser, Swarat Chaudhuri, and Isil Dillig. Synthesizing Data Structure Transformations from Input-output Examples. In Proceedings of the 36th ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI 2015, pages 229-239, New York, NY, USA, 2015. ACM.
Jonathan Frankle, Peter-Michael Osera, David Walker, and Steve Zdancewic. Example-directed synthesis: a type-theoretic interpretation. In Proceedings of the 43rd Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, POPL 2016, St. Petersburg, FL, USA, January 20 - 22, 2016, 2016.
Jonathan Graehl and Kevin Knight. Training tree transducers. Technical report, DTIC Document, 2004.
Victor Sergeevich Guba. Equivalence of infinite systems of equations in free groups and semigroups to finite subsystems. Mathematical Notes, 40(3):688-690, 1986.
Sumit Gulwani. Synthesis from Examples. In WAMBSE Special Issue, Infosys Labs Briefings, volume 10(2), 2012.
Tihomir Gvero, Viktor Kuncak, Ivan Kuraj, and Ruzica Piskac. Complete completion using types and weights. In Hans-Juergen Boehm and Cormac Flanagan, editors, ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI '13, Seattle, WA, USA, June 16-19, 2013, pages 27-38. ACM, 2013. URL: http://dx.doi.org/10.1145/2462156.2462192.
Tihomir Gvero, Viktor Kuncak, and Ruzica Piskac. Interactive Synthesis of Code Snippets. In Proceedings of the 23rd International Conference on Computer Aided Verification, CAV'11, pages 418-423, Berlin, Heidelberg, 2011. Springer-Verlag.
Helmut Seidl, Sebastian Maneth, and Gregor Kemper. Equivalence of deterministic top-down tree-to-string transducers is decidable. In Foundations of Computer Science (FOCS), 2015 IEEE 56th Annual Symposium on, pages 943-962. IEEE, 2015.
Thibaud Hottelier, Ras Bodik, and Kimiko Ryokai. Programming by manipulation for layout. In Proceedings of the 27th annual ACM symposium on User interface software and technology, 2014.
Patrik Jansson. Functional Polytypic Programming. PhD thesis, Institutionen för datavetenska, Göteborg : Chalmers University of Technology, 2000.
Artur Jeż. Word equations in linear space. arXiv preprint arXiv:1702.00736, 2017.
Jean-Pierre Jouannaud, Claude Kirchner, Hélène Kirchner, and Aristide Megrelis. Programming with equalities, subsorts, overloading, and parametrization in OBJ. The Journal of Logic Programming, 12(3):257-279, 1992.
Dileep Kini and Sumit Gulwani. Flashnormalize: Programming by examples for text normalization. In Qiang Yang and Michael Wooldridge, editors, Proceedings of the Twenty-Fourth International Joint Conference on Artificial Intelligence, IJCAI 2015, Buenos Aires, Argentina, July 25-31, 2015, pages 776-783. AAAI Press, 2015. URL: http://ijcai.org/Abstract/15/115.
Manos Koukoutos, Etienne Kneuss, and Viktor Kuncak. An update on deductive synthesis and repair in the leon tool. In Ruzica Piskac and Rayna Dimitrova, editors, Proceedings Fifth Workshop on Synthesis, SYNT@CAV 2016, Toronto, Canada, July 17-18, 2016., volume 229 of EPTCS, pages 100-111, 2016. URL: http://dx.doi.org/10.4204/EPTCS.229.9.
Grégoire Laurence. Normalisation et Apprentissage de Transductions d'Arbres en Mots. PhD thesis, Université des Sciences et Technologie de Lille-Lille I, 2014.
Grégoire Laurence, Aurélien Lemay, Joachim Niehren, Sławek Staworko, and Marc Tommasi. Learning sequential tree-to-word transducers. In International Conference on Language and Automata Theory and Applications, pages 490-502. Springer, 2014.
Vu Le and Sumit Gulwani. FlashExtract: A framework for data extraction by examples. In Proceedings of the 35th ACM SIGPLAN Conference on Programming Language Design and Implementation, page 55. ACM, 2014.
Aurélien Lemay, Sebastian Maneth, and Joachim Niehren. A learning algorithm for top-down XML transformations. In Jan Paredaens and Dirk Van Gucht, editors, Proceedings of the Twenty-Ninth ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems, PODS 2010, June 6-11, 2010, Indianapolis, Indiana, USA, pages 285-296. ACM, 2010. URL: http://dx.doi.org/10.1145/1807085.1807122.
M Lothaire. Combinatorics on words, volume 17. Cambridge University Press, 1997.
Sebastian Maneth and Helmut Seidl. Deciding equivalence of top-down XML transformations in polynomial time. In PLAN-X, pages 73-79, 2007.
Mikaël Mayer and Jad Hamza. Optimal test sets for context-free languages. CoRR, abs/1611.06703, 2016. URL: http://arxiv.org/abs/1611.06703.
Mikaël Mayer, Jad Hamza, and Viktor Kuncak. Polynomial-time proactive synthesis of tree-to-string functions from examples. CoRR, abs/1701.04288, 2017. URL: http://arxiv.org/abs/1701.04288.
Mikaël Mayer, Gustavo Soares, Maxim Grechkin, Vu Le, Mark Marron, Alex Polozov, Rishabh Singh, Ben Zorn, and Sumit Gulwani. User interaction models for disambiguation in programming by example. In 28th ACM User Interface Software and Technology Symposium, 2015.
Heather Miller, Philipp Haller, Eugene Burmako, and Martin Odersky. Instant pickles: generating object-oriented pickler combinators for fast and extensible serialization. In Proceedings of the 2013 ACM SIGPLAN International Conference on Object Oriented Programming Systems Languages & Applications, OOPSLA 2013, part of SPLASH 2013, Indianapolis, IN, USA, October 26-31, 2013, pages 183-202, 2013.
Terence Parr and Jurgen Vinju. Towards a universal code formatter through machine learning. In Proceedings of the 2016 ACM SIGPLAN International Conference on Software Language Engineering, pages 137-151. ACM, 2016.
Daniel Perelman, Sumit Gulwani, Dan Grossman, and Peter Provost. Test-driven synthesis. In Proceedings of the 35th ACM SIGPLAN Conference on Programming Language Design and Implementation, page 43. ACM, 2014.
Wojciech Plandowski. Testing equivalence of morphisms on context-free languages. In European Symposium on Algorithms, pages 460-470. Springer, 1994.
Wojciech Plandowski. The complexity of the morphism equivalence problem for context-free languages. PhD thesis, Department of Mathematics, Informatics, and Mechanics, Warsaw University, 1995.
Wojciech Plandowski. Satisfiability of word equations with constants is in PSPACE. In Foundations of Computer Science, 1999. 40th Annual Symposium on, pages 495-500. IEEE, 1999.
Nadia Polikarpova, Ivan Kuraj, and Armando Solar-Lezama. Program synthesis from polymorphic refinement types. In Proceedings of the 37th ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI 2016, Santa Barbara, CA, USA, June 13-17, 2016, 2016.
Oleksandr Polozov and Sumit Gulwani. Flashmeta: a framework for inductive program synthesis. In Proceedings of the 2015 ACM SIGPLAN International Conference on Object-Oriented Programming, Systems, Languages, and Applications, OOPSLA 2015, part of SPLASH 2015, Pittsburgh, PA, USA, October 25-30, 2015, pages 107-126, 2015. URL: http://dx.doi.org/10.1145/2814270.2814310.
Rishabh Singh and Sumit Gulwani. Synthesizing number transformations from input-output examples. In Proc. of the 24th CAV conference, pages 634-651, Berlin, Heidelberg, 2012. Springer-Verlag.
Sławomir Staworko, Grégoire Laurence, Aurélien Lemay, and Joachim Niehren. Equivalence of deterministic nested word to word transducers. In International Symposium on Fundamentals of Computation Theory, pages 310-322. Springer, 2009.
Kuat Yessenov, Shubham Tulsiani, Aditya Krishna Menon, Robert C. Miller, Sumit Gulwani, Butler W. Lampson, and Adam Kalai. A colorful approach to text processing by example. In Shahram Izadi, Aaron J. Quigley, Ivan Poupyrev, and Takeo Igarashi, editors, The 26th Annual ACM Symposium on User Interface Software and Technology, UIST'13, St. Andrews, United Kingdom, October 8-11, 2013, pages 495-504. ACM, 2013. URL: http://dx.doi.org/10.1145/2501988.2502040.
Chicheng Zhang and Kamalika Chaudhuri. Active learning from weak and strong labelers. In Advances in Neural Information Processing Systems, 2015.

Proactive Synthesis of Recursive Tree-to-String Functions from Examples

Authors Mikaël Mayer, Jad Hamza, Viktor Kuncak

File

Document Identifiers

Author Details

Cite As Get BibTex

Abstract

Subject Classification

Keywords

Metrics

References

Thanks for your feedback!

Could not send message

Proactive Synthesis of Recursive Tree-to-String Functions from Examples

Authors Mikaël Mayer, Jad Hamza, Viktor Kuncak

File

Document Identifiers

Author Details

Cite As Get BibTex

Abstract

Subject Classification

Keywords

Metrics

Supplementary Materials

References

Thanks for your feedback!

Could not send message