Abstract
Synthesis from examples enables nonexpert users to generate programs by specifying examples of their behavior. A domainspecific form of such synthesis has been recently deployed in a widely used spreadsheet software product. In this paper we contribute to foundations of such techniques and present a complete algorithm for synthesis of a class of recursive functions defined by structural recursion over a given algebraic data type definition. The functions we consider map an algebraic data type to a string; they are useful for, e.g., pretty printing and serialization of programs and data. We formalize our problem as learning deterministic sequential
topdown treetostring transducers with a single state (1STS).
The first problem we consider is learning a treetostring
transducer from any set of input/output examples provided by the user. We show that, given a set of input/output examples, checking whether there exists a 1STS consistent with these examples is NPcomplete in general. In contrast, the problem can be solved in polynomial time under a (practically useful) closure condition that each
subtree of a tree in the input/output example set is also
part of the input/output examples.
Because coming up with relevant input/output examples may be
difficult for the user while creating hard constraint problems
for the synthesizer, we also study a more automated
active learning scenario in which the algorithm chooses the
inputs for which the user provides the outputs. Our
algorithm asks a worstcase linear number of queries as a
function of the size of the algebraic data type definition
to determine a unique transducer.
To construct our algorithms we present two new results on
formal languages.
First, we define a class of word equations, called
sequential word equations, for which we prove that
satisfiability can be solved in deterministic polynomial
time. This is in contrast to the general word equations for
which the best known complexity upper bound is in linear space.
Second, we close a longstanding open problem about the
asymptotic size of test sets for contextfree languages. A
test set of a language of words L is a subset T of L
such that any two word homomorphisms equivalent on T are
also equivalent on L. We prove that it is possible to
build test sets of cubic size for contextfree languages,
matching for the first time the lower bound found 20 years
ago.
BibTeX  Entry
@InProceedings{mayer_et_al:LIPIcs:2017:7257,
author = {Mika{\"e}l Mayer and Jad Hamza and Viktor Kuncak},
title = {{Proactive Synthesis of Recursive TreetoString Functions from Examples}},
booktitle = {31st European Conference on ObjectOriented Programming (ECOOP 2017)},
pages = {19:119:30},
series = {Leibniz International Proceedings in Informatics (LIPIcs)},
ISBN = {9783959770354},
ISSN = {18688969},
year = {2017},
volume = {74},
editor = {Peter M{\"u}ller},
publisher = {Schloss DagstuhlLeibnizZentrum fuer Informatik},
address = {Dagstuhl, Germany},
URL = {http://drops.dagstuhl.de/opus/volltexte/2017/7257},
URN = {urn:nbn:de:0030drops72575},
doi = {10.4230/LIPIcs.ECOOP.2017.19},
annote = {Keywords: programming by example, active learning, program synthesis}
}
Keywords: 

programming by example, active learning, program synthesis 
Seminar: 

31st European Conference on ObjectOriented Programming (ECOOP 2017) 
Issue Date: 

2017 
Date of publication: 

13.06.2017 