Combining Generalization Algorithms
in Regular Collapse-Free Theories
Abstract
We look at the generalization problem modulo some equational theories. This problem is dual to the unification problem: given two input terms, we want to find a common term whose respective two instances are equivalent to the original terms modulo the theory.
There exist algorithms for finding generalizations over various equational theories. We focus on modular construction of equational generalization algorithms for the union of signature-disjoint theories. Specifically, we consider the class of regular and collapse-free theories, showing how to combine existing generalization algorithms to produce specific solutions in these cases.
Additionally, we identify a class of theories that admit a generalization algorithm based on the application of axioms to resolve the problem. To define this class, we rely on the notion of syntactic theories, a concept originally introduced to develop unification procedures similar to the one known for syntactic unification. We demonstrate that syntactic theories are also helpful in developing generalization procedures similar to those used for syntactic generalization.
Keywords and phrases:
Generalization, Anti-unification, Equational theories, CombinationFunding:
Mauricio Ayala-Rincón: Brazilian National Council for Scientific and Technological Development (CNPq), Grant Universal 409003/21-2 and RG 313290/21-0.Copyright and License:
2012 ACM Subject Classification:
Theory of computation Equational logic and rewriting ; Computing methodologies Symbolic and algebraic manipulationEditors:
Maribel FernándezSeries and Publisher:
Leibniz International Proceedings in Informatics, Schloss Dagstuhl – Leibniz-Zentrum für Informatik
1 Introduction
The problem of generalization of two terms and asks to find a term such that and are substitution instances of . The generalizations of interest are those that maximally retain similarities between the given terms while abstracting their differences uniformly. In other words, they should be the least general among generalizations of the given terms. The generalization problem is also called the problem of anti-unification since it can be seen as a dual to unification: while unification focuses on making two expressions more specific by finding their common instance via unifiers, anti-unification abstracts them to a more general form.
The first generalization algorithms were developed in the 1970s [38, 41], motivated by the usefulness of this technique for inductive reasoning. Nowadays, application areas of anti-unification are quite diverse, including, e.g., automated program repair [8, 19, 44], software code or specification synthesis [21, 40, 49], code similarity detection and change analysis [48, 9, 33], learning-related tasks [18, 36, 39], natural language processing [3, 24, 45], indexing/compression [12, 37, 25], just to name a few. Generalization computation techniques have been investigated for various structures (see, e.g., the recent survey [16] for an overview), among them, for first-order equational theories, which is also the subject of this paper.
In equational anti-unification, syntactic equality is replaced by equality modulo the given equational theory. As a consequence, the given terms do not necessarily have a single least general generalization. Instead, problems are characterized by their minimal complete set of generalizations, which might be a singleton, a finite set, an infinite set, or might not exist at all due to a clash between the minimality and completeness requirements. Several authors considered (first-order) equational anti-unification, studying the problem in associative, commutative, associative-commutative, unital, idempotent, absorptive theories, semirings, theories that lead to regular congruence classes, etc. [1, 2, 4, 5, 11, 13, 14, 15, 31, 47].
A more general question is related to anti-unification in a combined equational theory: Is it possible to derive a generalization algorithm for a union of equational theories from the existing generalization algorithms for the component theories? This is called the combination problem for generalization algorithms and is very important for applications. To the best of our knowledge, our work is the first attempt to address the modular construction of generalization algorithms.
On the other hand, the combination problem has been studied quite intensively for unification; see, e.g., [7, 20, 22, 26, 28, 30, 42, 43, 46]. These modularity results impose certain restrictions on the component theories to guarantee the existence and good properties of the combined algorithm, the main restriction being that the theories are signature-disjoint. Considering regular collapse-free theories is also a classical restriction. In the class of regular collapse-free theories, the left-hand side and the right-hand side of any equational axiom are non-variable terms with exactly the same variables. A combination method is known for unification algorithms in any union of disjoint regular collapse-free theories [46], and another one exists for matching algorithms in any union of disjoint regular theories [35] where regular theories may have collapse axioms between a non-variable left-hand side and a variable right-hand side. It is important to notice that when we relax the restriction on equational theories, we may need to impose a stronger restriction on the kind of algorithms available in the component theories. Hence, the combination method introduced in [7] for the unification problem works in any union of disjoint theories. Still, as a counterpart, it assumes the existence of general unification procedures capable of dealing with free function symbols.
Our contributions
We consider the combination problem for generalization algorithms in the union of equational theories and over the signatures and and a set of free constants , such that:
-
, i.e., the theories are signature-disjoint (and holds for , since consists of free constants);
-
both and are regular collapse-free;
-
for each , , there exists a complete generalization algorithm which
-
–
handles ground terms built over , and
-
–
for any given ground terms and returns a triple , where is a generalization of with respective solved substitutions , meaning that application of to does not lead to further (more specific) generalizations.
-
–
For such equational theories and , a combination algorithm for generalization problems in is designed, which relies on -generalization algorithms for , and its completeness is shown. This is our main result, which we further refine in two ways, as explained below.
-
1.
The class of finite theories (i.e., theories where every equivalence class is finite) satisfies the above-mentioned conditions, and we show that combined generalization problems in this class can be solved in a modular way. Furthermore, we provide a rule-based generalization algorithm for the subclass of finite syntactic theories. The notion of syntactic theories was originally introduced to develop unification procedures similar to the one known for syntactic unification (cf. [27, 29, 34]). We demonstrate that syntactic theories are also helpful in developing a rule-based generalization algorithm that extends syntactic generalization. This extension is obtained by introducing a new mutation rule where the resolvent axioms of the theory are applied to simplify the generalization problems.
-
2.
For theories that admit rule-based generalization algorithms working on configurations of a special form, we design a combination algorithm that uses the rules of the component algorithms (instead of using them as black-boxes), thus obtaining a combined rule-based algorithm with rules of the same form as the component algorithms. We call it the white-box combination method. This leads to a modular result: a combined algorithm obtained in such a way can be used as a component algorithm for further white-box combinations. This white-box combination method applies to any instance of the rule-based generalization algorithm introduced for finite syntactic theories.
Organization
After this introduction, Section 2 presents the required background. Then, Section 3 presents some results connecting equational generalization to free generalization, showing that equational generalization in finite theories is reducible to free generalization, and so in finite theories equational generalization is finitary. In Section 4, we discuss the combination of generalization for regular collapse-free theories. Then, Section 5 explores the class of finite syntactic theories by showing a rule-based generalization algorithm. Section 6 considers the problem of combining rule-based generalization algorithms. Finally, Section 7 concludes and briefly discusses possible future work.
2 Preliminaries
2.1 Terms and Substitutions
We consider a first-order alphabet consisting of a signature (set of fixed arity function symbols) and a set of variables . The set of terms over and is defined in the standard way: iff is defined by the grammar , where and is an -ary symbol with .
We denote arbitrary function symbols by , constants by , variables by , and terms by . The notions of term depth, term size, and a position in a term are defined in the standard way, see, e.g., [6]. By , we denote the subterm of at position , and by , a term obtained from by replacing the subterm at position with the term . For any position in a term (including the root position ), is the symbol at position . The set of all variables in is denoted by . A term is called linear if no variable occurs in it more than once, and ground if .
A sequence of terms may be written when is clear from the context.
Given any signature and any finite set of constants such that , the signature is denoted by .
A substitution is a mapping from to , which is the identity almost everywhere. We use the Greek letters to denote substitutions, except for the identity substitution, which is written as . We represent substitutions using the usual set notation, e.g., where is the domain of , denoted by . Application of a substitution to a term , denoted by , is defined as and . Substitution composition is defined as a composition of mappings. It is associative but not commutative, with playing the role of the unit element. We write for the composition of with .
2.2 Equational Theories
Let be a set of term pairs, i.e., elements of . An equational -theory generated by , denoted by , is the least congruence relation on that is closed under substitution application and contains . We call a presentation of , and the elements of are called the axioms of , written as . If , we say that is equal modulo to . When , every term is equal only to itself, and the theory is called the empty or free theory.
It is common to slightly abuse the terminology in the literature by calling both and an equational theory. For a given , we denote by the set of all function symbols occurring in .
A sequence of -equalities may be written when is clear from the context.
An axiom is regular if . An axiom is collapse-free if and are non-variable terms. An axiom is shallow if variables can only occur at a position at depth at most in both and . An equational theory is regular (resp., collapse-free/shallow) if all its axioms are regular (resp., collapse-free/shallow). An equational theory is finite if, for each term , there are only finitely many terms such that . An equational theory is subterm collapse-free if there are no terms such that and is a strict subterm of . A subterm collapse-free theory is necessarily collapse-free, and a finite theory is necessarily regular and subterm collapse-free.
A theory is syntactic if it has a finite resolvent presentation , defined as a finite set of axioms such that each equality has an equational proof with at most one equational step applied at the root position. Note that any axiom in can be applied in both directions: from the left to the right and from the right to the left. Any shallow theory is syntactic [17], and any collapse-free theory with finitary unification is syntactic [29].
2.3 Combination of Equational Theories
The rest of the paper assumes that is a regular collapse-free -theory for , where and are disjoint signatures. For the sake of brevity, the combined signature is abbreviated to , and the combined theory is also denoted by .
A -rooted term is a term whose root symbol is in . Given any -term where is -rooted, an alien subterm of is any subterm of which is not -rooted and such that any proper superterm of in is -rooted. The set of alien subterms of is denoted by . Given any term , its height of theory is formally defined as follows: . Given a finite set of ground -terms, we consider and a mapping such that is a set of fresh free constants () and for any , iff . Conversely, we define as a mapping such that for any , . The -abstraction of any term is denoted by and is inductively defined as follows:
-
if ,
-
if ,
-
if for .
Lemma 1 (Correspondence for Equality [46]).
Let and be two signature-disjoint regular collapse-free theories. For and any -rooted terms and , iff .
2.4 Generalization Problems
Given an equational theory , a term is more general than modulo , or is an -generalization of , if there exists a substitution such that . In such a case, we write and also say the term is less general than modulo or that is an -instance of . The relation is a quasi-order and generates the equivalence relation, denoted by . The strict part of is denoted by .
An equational generalization problem is formulated as follows:
- Given:
-
an equational theory and two terms and ;
- Find:
-
a term such that and ( is said to be an -generalization of and ).
Given an equational theory and two terms and , a set of terms is called a complete set of -generalizations of and if it satisfies the following properties:
-
1.
Soundness: every element of is an -generalization of and ,
-
2.
Completeness: for every -generalization of and there exists such that .
The set is called a minimal complete set of -generalizations of and , denoted if it, in addition, satisfies the following:
-
3.
Minimality: if there exist such that , then .
The existence and cardinality of define what is called the generalization type of an equational theory as follows: has the generalization type 0 (or is nullary) if there are two terms and such that does not exist. Otherwise, the generalization type of is
-
unitary, if is a singleton for all and (in this case, the sole element of is called a least general -generalization of and and is denoted by ),
-
finitary, if is finite for all and , and there exists at least one pair of terms for which this set is not a singleton,
-
infinitary, if there exist and for which is infinite.
In the rest of the paper, we consider generalization algorithms defined as terminating inference systems transforming anti-unification triples (AUTs, for short) written where is a variable, and and are the two ground terms to generalize. More precisely, we are focusing on rule-based systems working on configurations of the form , where and are sets of AUTs, called the active set and store of , respectively, and is a substitution, called the generalization component of . From now on, is said to be a PSS-configuration; it consists of a problem together with a store and a substitution. Given a set of AUTs , we associate two substitutions: is the left substitution of and is the right substitution of . A set of AUTs may be written using the disjoint union symbol to isolate a specific AUT from that set.
3 Equational Generalization vs Free Generalization
The following two theorems characterize equational generalizations with the help of free least general generalizations.
Theorem 2 (Free ’s versus -Generalizations).
Given an equational theory and two terms and , if a term is an -generalization of and , then there exist terms , , and such that , , is a free least general generalization of and , and .
Proof.
From being an -generalization of and we get the existence of two substitutions and such that and . Take and . Then we have and . Moreover, is a free generalization of and . Then there exists a free least general generalization of and , and we get .
Theorem 3 (Completeness and Finiteness through Free ’s).
Let be an equational theory, and be two terms, and be the set of terms . Then
-
1.
is a complete set of -generalizations of and .
-
2.
is not necessarily minimal.
-
3.
If is a finite theory, then is finite modulo variable renaming. Consequently, every finite theory has the -generalization type at most finitary.
-
4.
For any , if is finite, then (as well as ) is also finite modulo variable renaming.
Proof.
-
1.
Every element is an -generalization of and , because there exist substitutions and such that and . Thus, together with Theorem 2, we may conclude that for every -generalization of and there exists an -generalization of and such that and , which implies the completeness of .
-
2.
Let be the equational theory that asserts the commutativity of , and take and . Then , because and . Obviously, is more general than and, hence, is not minimal.
-
3.
If is a finite theory, then and are finite sets. Free least general generalizations are unique (modulo variable renaming). Hence, is finite modulo renaming. Since -matching is finitary in any finite theory , can always be minimized, and we get that -generalization type in finite theories is at most finitary.
-
4.
First, note that the depth of a free generalization of two terms never exceeds the minimum of their depths, and such a generalization does not contain any function symbol that does not appear in both of them. Therefore, for a fixed , the depth of is bounded by the depth of , and the set of function symbols that potentially may appear in is restricted to the set of function symbols of . Hence, only finitely many terms (over a fixed-arity alphabet like ours) can satisfy these conditions. Hence, whatever we take, the candidates for may come only from a fixed finite (modulo variable renaming) set that depends on only: the elements of this set are terms whose free instance is . Therefore, for the given , the set of all terms is finite. Since by assumption is finite, we get that is a finite union of finite sets, which implies that it is finite.
Example 4.
This example illustrates an infinite for that is non-finite subterm-collapse free (the simplest class of non-finite theories according to [10]). Let , , and . Then
In fact, this set contains an infinite chain
4 Combined Generalization in Regular Collapse-Free Theories
In this section, we consider regular collapse-free theories . When is regular, the -equivalence class of any ground term only contains ground terms. Thus, any solution of an -matching problem necessarily maps a variable occurring in the problem to a ground term. As a consequence, the following property holds when is assumed to be regular: given any generalization of a pair of ground terms , any substitutions such that and satisfy that and are ground for any variable . This assumption is useful in the following definition, where the considered substitutions are necessarily ground.
Definition 5 (Generalizations with Solved Instantiations).
Let be a regular -theory. Given any pair of ground -terms , an -generalization of with respective instantiations is a -term together with a pair of substitutions such that , and . Given any pair of ground -terms , a triple representing an -generalization of with solved instantiations (an -GSI triple of , for short) is any triple such that
-
is an -generalization of with respective instantiations ,
-
for any , there is no non-variable -generalization of ,
-
for any , and implies 111This third condition may be dropped when considering linear generalizations..
A set of -GSI triples of is said to be complete if is a complete set of -generalizations of . We say that admits an algorithm computing a complete set of -generalizations with solved instantiations (GSI algorithm, for short) if there exists a computable function returning, for each input pair of ground -terms , a complete set of -GSI triples of denoted by .
A GSI algorithm can be constructed via the repeated application of a generalization algorithm together with a matching algorithm. In the case of a finite theory, the generalization problem is finitary, and the matching problem as well. Consequently, this iteration can be implemented, and it is necessarily terminating in the case of a finite theory since the size of any generalization is necessarily bounded. Thus, we have that:
Lemma 6.
Any finite theory admits a GSI algorithm.
Remark 7.
Example 8.
Section 5 introduces another class of theories admitting a GSI algorithm. This class includes non-finite theories.
In general, for the theories we want to combine, we could imagine deriving GSI algorithms from existing rule-based generalization algorithms transforming PSS-configurations (see Section 6). Then, the resulting GSI algorithms can be combined directly as black-boxes, provided the theories are regular collapse-free.
Theorem 9 (Modular Property for GSI Algorithm).
The class of regular collapse-free theories admitting a GSI algorithm and with decidable equality is closed by disjoint union.
Proof.
The rule-based procedure given in Figure 1 can be shown terminating thanks to a complexity measure defined as follows: to each configuration , we associate a pair where is the multiset of heights of theory , and is the cardinality of . These pairs are ordered lexicographically, the first component being ordered by the multiset extension of the (Noetherian) ordering on positive integers, while the second component is ordered by the (Noetherian) ordering on positive integers. One can easily verify that -Gen and -Sol strictly decrease the first component of the pair, and -Mer does not increase the first component but strictly decreases the second one.
The normal forms with respect to the rule-based procedure are the configurations of the form upon which -Mer does not apply. Indeed, any configuration with is necessarily reducible by either -Gen or -Sol.
For any derivation starting with the input configuration and leading to a final configuration , we can associate a tuple where
-
,
-
.
Then, the set of all such tuples defines a as stated in Definition 5. This is a consequence of the following facts:
-
-Sol is sound and complete since and are collapse-free.
-Gen: Generalization in the -theory
where
-
,
-
,
-
,
-
.
-Sol: Solving in the combined theory
if and .
-Mer: Merging in the combined theory
if and .
We now show the completeness of the algorithm presented in Figure 1 by considering any generalization of the given terms over the combined theory and showing that we can construct from it an -generalization, where depends on the root symbol, that uses the same variable instantiations as the combined generalization. In essence, our combination algorithm builds on the completeness of the algorithms for the individual theories and is thus complete as well.
The following technical lemma and corollary are used in the proof of Theorem 9.
Lemma 10 (Correspondence for Generalization).
Let and be two signature-disjoint regular collapse-free theories over respectively the signatures and , and any -rooted terms for some , and any -generalization of with respective instantiations . Consider a bijective mapping where is a set of fresh variables and the following terms and substitutions:
-
is the -pure term obtained from by replacing each by .
-
and are two substitutions such that and defined as follows:
-
–
for any , and ,
-
–
for any , .
-
–
Then, is an -generalization of with respective instantiations , and for any , is an -generalization of with respective instantiations .
Proof.
By assumption, and . By Lemma 1, and . Then, by definition of , and , we have and . Consequently, and .
For each , by definition of and , we have that and . Applying on these identities, we get and . By definition of , for any term . Consequently, and .
Corollary 11.
Let and be two signature-disjoint regular collapse-free theories over respectively the signatures and , and any -rooted terms where . Then, there is no non-variable -generalization of iff there is no non-variable -generalization of .
5 Rule-Based Generalization in Finite Syntactic Theories
In this section, we explore a class of equational theories admitting a rule-based generalization algorithm similar to the one known for the empty theory. For the class of finite syntactic theories, we construct a rule-based algorithm fully parametrized by the axioms of a resolvent presentation of a finite syntactic theory. A similar rule-based algorithm can be found in [27].
Example 12.
Lemma 13.
Proof.
The rule-based procedure in Figure 4 terminates since is assumed to be finite. Indeed, the size of the generalizations associated with configurations is necessarily bounded, and so there are only finitely many applications of Mut. The rule Sol is sound and complete: when an AUT cannot be transformed by Mut or Dec, we cannot have a generalization rooted by a function symbol; a variable is the only possible way to get a generalization for this AUT. The soundness and completeness of Mut and Dec follow because is a resolvent presentation of .
For any input PSS-configuration where are ground -terms, the procedure terminates by computing finitely many PSS-configurations of the form and the union of all the terms corresponds to a complete set of -generalizations of , since all the rules are sound and complete.
Remark 14.
For the inference system given in Figure 2, an input problem is such that iff all the normal forms reached by this inference system are the empty set.
For the inference system in Figure 3, the set of solutions of an input -matching problem consists of all the normal forms reached by this inference system corresponding to solved forms where are pairwise distinct variables.
Mut=:
where is a fresh renaming of an axiom in , and is a substitution such that .
Dec=:
where .
Mut≤: Mutation rule for matching
where is a fresh renaming of an axiom in , and is a substitution such that .
Dec≤: Decomposition rule for matching
where .
Mer≤: Merging rule for matching
if .
MutLR: Left-Right Mutation rule for generalization
where , , and .
MutR: Right Mutation rule for generalization
where , and .
MutL: Left Mutation rule for generalization
where , and .
Dec: Decomposition rule for generalization
where and are fresh variables.
Sol: Solving rule for generalization
if no rule in applies.
Mer: Merging rule for generalization
if and .
The class of finite syntactic theories is known to be closed by disjoint union [34]. Actually, for a union of two disjoint finite theories with respective resolvent presentations and , we have that is a resolvent presentation corresponding to a finite theory. Consequently, the inference system in Figure 4 can be directly applied to any union of disjoint finite syntactic theories for which the resolvent presentations are known. However, finding a resolvent presentation may be a difficult task, especially if -unification is not known to be finitary. In the case we already have a GSI algorithm for a finite theory , we can combine it directly using the approach introduced in Section 4; we don’t have to find a resolvent presentation (if there exists one for ).
Remark 15.
The empty theory is a finite syntactic theory, and its resolvent presentation is empty. In the case , one can notice that the inference system from Figure 4 corresponds to the classical rule-based generalization algorithm known for the empty theory.
6 White-Box Combination of Rule-Based Generalization Algorithms
This section focuses on the modular construction of generalization algorithms transforming PSS-configurations. Let us now define precisely this kind of rule-based generalization algorithm.
Definition 16.
Let be a regular theory. An -generalization algorithm transforming PSS-configurations is an inference system such that for any initial configuration , it derives a finite set of final configurations of the form for which is a complete set of -generalizations of . Furthermore, for any configuration derived from the initial configuration , the following (invariant) properties must hold:
-
is an -generalization of with respective instantiations where (resp., ) is the left (resp., right) substitution of ,
-
for any in , there is no non-variable -generalization of .
Remark 17.
An -generalization algorithm transforming PSS-configurations leads to a GSI-algorithm. Consider an initial configuration and any triple such that where is the set of final configurations given in Definition 16 and (resp., ) denotes the left (resp., right) substitution of . Then, is actually a .
The inference procedure in Figure 4 corresponds to a generalization algorithm transforming PSS-configurations, thus leading to a GSI algorithm. Conversely, note that a GSI algorithm can be turned into a generalization algorithm transforming PSS-configurations where all derivations are of length .
Corollary 18.
A theory admits a generalization algorithm transforming PSS-configurations iff it admits a GSI algorithm.
Generalization algorithms transforming PSS-configurations correspond to GSI algorithms, and so they can be combined using the method given in Figure 1, provided that the respective theories and are regular collapse-free. Since this method is given as an inference system manipulating PSS-configurations, it can be reworded as an -generalization algorithm transforming PSS-configurations. In this new presentation, we do not directly use the triples from -sets, but we just apply one step of the inference system corresponding to an -generalization algorithm transforming PSS-configurations, see Figure 5. This is sufficient to move forward with a solution.
Given two generalization algorithms and transforming PSS-configurations, let be the inference system given by the set of rules where -Step is introduced in Figure 5 while -Sol and -Mer are from Figure 1.
-Step:
where and .
Theorem 19 (Modular Construction for Rule-Based Generalization).
Let and be two signature-disjoint regular collapse-free theories. If is an -generalization algorithm transforming PSS-configurations for , then is an -generalization algorithm transforming PSS-configurations.
Proof.
The inference system is terminating since and are terminating, and there are finitely many alternations of calls to and to (see Lemma 10). Actually, corresponds to an -generalization algorithm transforming PSS-configurations since the -Step rule from Figure 5 is sound and complete by Lemma 10 and Corollary 11.
Remark 20.
In both rules -Step and -Gen, the constant abstraction and the term concretization should be considered as transparent operations. With , aliens are viewed as free constants, with the restriction that two -equal aliens must be viewed as the same free constant. Conversely, with , these free constants are viewed back as terms in the combined theory . By slightly abusing notation, if we omit to apply these transformations, then the rules -Step and -Gen become easy to express.
Note that for free, associative, and commutative theories, all four algorithms for the combined theories , , , described in [2] can be obtained as a combination of generalization algorithms transforming PSS-configurations.
Example 21.
Consider and . For , one can show the existence of an -generalization algorithm transforming PSS-configurations, say , given by the set of rules , where can be found in Figure 4, and the mutation rules -Mut is defined as follows:
-Mut:
For , we call the -generalization algorithm transforming PSS-configurations given in Figure 4, with the following resolvent presentation of shown in [29]:
Let us now detail how the combined generalization algorithm is applied to solve
We focus on two particular derivations leading to final PSS-configurations. The derived PSS-configurations are listed in the table below, where the rightmost column shows the configuration number, while the first column indicates how the configuration in that row is obtained (which generalization algorithm is applied to which configuration):
| (0) | ||
| (1) | ||
| (2) | ||
| (3) | ||
| (4) | ||
| (5) | ||
| (6) | ||
| (7) | ||
| (8) | ||
| (9) | ||
| (10) |
To get (5), is applied with the axioms and in . To get (8), is applied with the axiom in . Notice that could be applied instead of on (6), (8), (9) since both and can handle any AUT between two constants in . Conversely, could be applied instead of on (3) since both and include a Sol rule. Eventually, (4) and (10) are two final configurations:
-
,
-
.
The resulting generalizations are:
-
, with respective instantiations ,
-
, with respective instantiations .
The combined generalization algorithm derives additional final configurations. Still, one can check that the corresponding generalizations are always more general than , just like is more general than .
7 Conclusion
We have introduced a combination method for the generalization problem in any union of disjoint regular collapse-free theories. Our framework requires regularity to ensure that any matching problem only has ground solutions. With this property, we have only to consider the generalization of ground terms involving possibly free constants. Collapse-freeness is also very useful for solving easily the generalization of two terms rooted in distinct theories. In future work, we plan to investigate the possibility of relaxing this regular and collapse-free assumption.
The study of non-disjoint unions of theories would also be a natural continuation. In particular, we plan to consider the case of shared constructors, relying on the combination results available for the matching problem in unions of theories sharing only constructors [23]. When the constructors do not appear in the root of either side of the axioms, it might be possible to solve the combined generalization problem similarly to the disjoint case. More generally, allowing only shared constants at the root of the axioms could also be helpful to solve the problem.
For the class of regular collapse-free theories, we have investigated the disjoint combination of rule-based generalization algorithms transforming PSS-configurations. For the subclass of finite syntactic theories, we have shown how the axioms of a resolvent presentation can be used to construct a generalization algorithm transforming PSS-configurations. More broadly, we are interested in identifying examples of non-finite (yet regular collapse-free) theories that admit a generalization algorithm transforming PSS-configurations, to which our combination approach would be applicable. Finally, we conjecture that this combination approach can be naturally extended to generalization procedures that are not necessarily terminating.
References
- [1] María Alpuente, Demis Ballis, Angel Cuenca-Ortega, Santiago Escobar, and José Meseguer. ACUOS: A High-Performance System for Modular ACU Generalization with Subtyping and Inheritance. In Francesco Calimeri, Nicola Leone, and Marco Manna, editors, Logics in Artificial Intelligence - 16th European Conference, JELIA 2019, Rende, Italy, May 7-11, 2019, Proceedings, volume 11468 of Lecture Notes in Computer Science, pages 171–181. Springer, 2019. doi:10.1007/978-3-030-19570-0_11.
- [2] María Alpuente, Santiago Escobar, Javier Espert, and José Meseguer. A modular order-sorted equational generalization algorithm. Inf. Comput., 235:98–136, 2014. doi:10.1016/J.IC.2014.01.006.
- [3] Nino Amiridze and Temur Kutsia. Anti-unification and natural language processing. EasyChair Preprint no. 203, EasyChair, 2018. doi:10.29007/fkrh.
- [4] Mauricio Ayala-Rincón, David M. Cerna, Andres Felipe Gonzalez Barragan, and Temur Kutsia. Equational anti-unification over absorption theories. In Christoph Benzmüller, Marijn J. H. Heule, and Renate A. Schmidt, editors, Automated Reasoning - 12th International Joint Conference, IJCAR 2024, Nancy, France, July 3-6, 2024, Proceedings, Part II, volume 14740 of Lecture Notes in Computer Science, pages 317–337. Springer, 2024. doi:10.1007/978-3-031-63501-4_17.
- [5] Franz Baader. Unification, weak unification, upper bound, lower bound, and generalization problems. In Ronald V. Book, editor, Rewriting Techniques and Applications, 4th International Conference, RTA-91, Como, Italy, April 10-12, 1991, Proceedings, volume 488 of Lecture Notes in Computer Science, pages 86–97. Springer, 1991. doi:10.1007/3-540-53904-2_88.
- [6] Franz Baader and Tobias Nipkow. Term rewriting and all that. Cambridge University Press, 1998. doi:10.1017/CBO9781139172752.
- [7] Franz Baader and Klaus U. Schulz. Unification in the union of disjoint equational theories: Combining decision procedures. J. Symb. Comput., 21(2):211–243, 1996. doi:10.1006/JSCO.1996.0009.
- [8] Johannes Bader, Andrew Scott, Michael Pradel, and Satish Chandra. Getafix: learning to fix bugs automatically. Proc. ACM Program. Lang., 3(OOPSLA):159:1–159:27, 2019. doi:10.1145/3360585.
- [9] Adam D. Barwell, Christopher Brown, and Kevin Hammond. Finding parallel functional pearls: Automatic parallel recursion scheme detection in Haskell functions via anti-unification. Future Gener. Comput. Syst., 79:669–686, 2018. doi:10.1016/j.future.2017.07.024.
- [10] Hans-Jürgen Bürckert, Alexander Herold, and Manfred Schmidt-Schauß. On equational theories, unification, and (un)decidability. J. Symb. Comput., 8(1/2):3–49, 1989. doi:10.1016/S0747-7171(89)80021-5.
- [11] Jochen Burghardt. E-generalization using grammars. Artif. Intell., 165(1):1–35, 2005. doi:10.1016/J.ARTINT.2005.01.008.
- [12] David Cao, Rose Kunkel, Chandrakana Nandi, Max Willsey, Zachary Tatlock, and Nadia Polikarpova. babble: Learning Better Abstractions with E-Graphs and Anti-unification. Proceedings of the ACM on Programming Languages, 7(POPL):396–424, 2023. doi:10.1145/3571207.
- [13] David M. Cerna. Anti-unification and the theory of semirings. Theor. Comput. Sci., 848:133–139, 2020. doi:10.1016/J.TCS.2020.10.020.
- [14] David M. Cerna and Temur Kutsia. Idempotent anti-unification. ACM Trans. Comput. Log., 21(2):10:1–10:32, 2020. doi:10.1145/3359060.
- [15] David M. Cerna and Temur Kutsia. Unital anti-unification: Type and algorithms. In Zena M. Ariola, editor, 5th International Conference on Formal Structures for Computation and Deduction, FSCD 2020, June 29-July 6, 2020, Paris, France (Virtual Conference), volume 167 of LIPIcs, pages 26:1–26:20. Schloss Dagstuhl – Leibniz-Zentrum für Informatik, 2020. doi:10.4230/LIPICS.FSCD.2020.26.
- [16] David M. Cerna and Temur Kutsia. Anti-unification and generalization: A survey. In Proceedings of the Thirty-Second International Joint Conference on Artificial Intelligence, IJCAI 2023, 19th-25th August 2023, Macao, SAR, China, pages 6563–6573. ijcai.org, 2023. doi:10.24963/IJCAI.2023/736.
- [17] Hubert Comon, Marianne Haberstrau, and Jean-Pierre Jouannaud. Syntacticness, cycle-syntacticness, and shallow theories. Inf. Comput., 111(1):154–191, 1994. doi:10.1006/inco.1994.1043.
- [18] Andrew Cropper, Sebastijan Dumancic, Richard Evans, and Stephen H. Muggleton. Inductive logic programming at 30. Mach. Learn., 111(1):147–172, 2022. doi:10.1007/S10994-021-06089-1.
- [19] Reudismam Rolim de Sousa, Gustavo Soares, Rohit Gheyi, Titus Barik, and Loris D’Antoni. Learning quick fixes from code repositories. In Cristiano D. Vasconcellos, Karina Girardi Roggia, Vanessa Collere, and Paulo Bousfield, editors, 35th Brazilian Symposium on Software Engineering, SBES 2021, Joinville, Santa Catarina, Brazil, 27 September 2021 - 1 October 2021, pages 74–83. ACM, 2021. doi:10.1145/3474624.3474650.
- [20] Eric Domenjoud, Francis Klay, and Christophe Ringeissen. Combination techniques for non-disjoint equational theories. In Alan Bundy, editor, Automated Deduction - CADE-12, 12th International Conference on Automated Deduction, Nancy, France, June 26 - July 1, 1994, Proceedings, volume 814 of Lecture Notes in Computer Science, pages 267–281. Springer, 1994. doi:10.1007/3-540-58156-1_19.
- [21] Rui Dong, Zhicheng Huang, Ian Iong Lam, Yan Chen, and Xinyu Wang. WebRobot: web robotic process automation using interactive programming-by-demonstration. In Ranjit Jhala and Isil Dillig, editors, PLDI ’22: 43rd ACM SIGPLAN International Conference on Programming Language Design and Implementation, San Diego, CA, USA, June 13 - 17, 2022, pages 152–167. ACM, 2022. doi:10.1145/3519939.3523711.
- [22] Serdar Erbatur, Andrew M. Marshall, and Christophe Ringeissen. Non-disjoint combined unification and closure by equational paramodulation. In Boris Konev and Giles Reger, editors, Frontiers of Combining Systems - 13th International Symposium, FroCoS 2021, Birmingham, UK, September 8-10, 2021, Proceedings, volume 12941 of Lecture Notes in Computer Science, pages 25–42. Springer, 2021. doi:10.1007/978-3-030-86205-3_2.
- [23] Serdar Erbatur, Andrew M. Marshall, and Christophe Ringeissen. Combined Hierarchical Matching: the Regular Case. In Amy P. Felty, editor, 7th International Conference on Formal Structures for Computation and Deduction, FSCD 2022, August 2-5, 2022, Haifa, Israel, volume 228 of LIPIcs, pages 6:1–6:22. Schloss Dagstuhl – Leibniz-Zentrum für Informatik, 2022. doi:10.4230/LIPICS.FSCD.2022.6.
- [24] Boris Galitsky. Developing Enterprise Chatbots - Learning Linguistic Structures. Springer, 2019. doi:10.1007/978-3-030-04299-8.
- [25] Peter Graf. Substitution tree indexing. In Jieh Hsiang, editor, Rewriting Techniques and Applications, 6th International Conference, RTA-95, Kaiserslautern, Germany, April 5-7, 1995, Proceedings, volume 914 of Lecture Notes in Computer Science, pages 117–131. Springer, 1995. doi:10.1007/3-540-59200-8_52.
- [26] Alexander Herold. Combination of unification algorithms in equational theories. PhD thesis, Kaiserslautern University of Technology, Germany, 1987. URL: https://d-nb.info/880327979.
- [27] Jean-Pierre Jouannaud. Syntactic theories. In Branislav Rovan, editor, Mathematical Foundations of Computer Science 1990, MFCS’90, Banská Bystrica, Czechoslovakia, August 27-31, 1990, Proceedings, volume 452 of Lecture Notes in Computer Science, pages 15–25. Springer, 1990. doi:10.1007/BFB0029593.
- [28] Claude Kirchner. Méthodes et outils de conception systématique d’algorithmes d’unification dans les théories équationnelles. Thèses d’état, Université de Nancy I, 1985.
- [29] Claude Kirchner and Francis Klay. Syntactic theories and unification. In Proceedings of the Fifth Annual Symposium on Logic in Computer Science (LICS ’90), Philadelphia, Pennsylvania, USA, June 4-7, 1990, pages 270–277. IEEE Computer Society, 1990. doi:10.1109/LICS.1990.113753.
- [30] Hélène Kirchner and Christophe Ringeissen. Combining symbolic constraint solvers on algebraic domains. J. Symb. Comput., 18(2):113–155, 1994. doi:10.1006/JSCO.1994.1040.
- [31] Boris Konev and Temur Kutsia. Anti-unification of concepts in description logic EL. In Chitta Baral, James P. Delgrande, and Frank Wolter, editors, Principles of Knowledge Representation and Reasoning: Proceedings of the Fifteenth International Conference, KR 2016, Cape Town, South Africa, April 25-29, 2016, pages 227–236. AAAI Press, 2016. URL: http://www.aaai.org/ocs/index.php/KR/KR16/paper/view/12880.
- [32] Christopher Lynch and Barbara Morawska. Basic syntactic mutation. In Andrei Voronkov, editor, Automated Deduction - CADE-18, 18th International Conference on Automated Deduction, Copenhagen, Denmark, July 27-30, 2002, Proceedings, volume 2392 of Lecture Notes in Computer Science, pages 471–485. Springer, 2002. doi:10.1007/3-540-45620-1_37.
- [33] Sonu Mehta, Ranjita Bhagwan, Rahul Kumar, Chetan Bansal, Chandra Shekhar Maddila, B. Ashok, Sumit Asthana, Christian Bird, and Aditya Kumar. Rex: Preventing bugs and misconfiguration in large services using correlated change analysis. In Ranjita Bhagwan and George Porter, editors, 17th USENIX Symposium on Networked Systems Design and Implementation, NSDI 2020, Santa Clara, CA, USA, February 25-27, 2020, pages 435–448. USENIX Association, 2020. URL: https://www.usenix.org/conference/nsdi20/presentation/mehta.
- [34] Tobias Nipkow. Proof transformations for equational theories. In Proceedings of the Fifth Annual Symposium on Logic in Computer Science (LICS ’90), Philadelphia, Pennsylvania, USA, June 4-7, 1990, pages 278–288. IEEE Computer Society, 1990. doi:10.1109/LICS.1990.113754.
- [35] Tobias Nipkow. Combining matching algorithms: The regular case. J. Symb. Comput., 12(6):633–654, 1991. doi:10.1016/S0747-7171(08)80145-9.
- [36] Julian Parsert and Elizabeth Polgreen. Reinforcement learning and data-generation for syntax-guided synthesis. In Michael J. Wooldridge, Jennifer G. Dy, and Sriraam Natarajan, editors, Thirty-Eighth AAAI Conference on Artificial Intelligence, AAAI 2024, Thirty-Sixth Conference on Innovative Applications of Artificial Intelligence, IAAI 2024, Fourteenth Symposium on Educational Advances in Artificial Intelligence, EAAI 2014, February 20-27, 2024, Vancouver, Canada, pages 10670–10678. AAAI Press, 2024. doi:10.1609/AAAI.V38I9.28938.
- [37] Brigitte Pientka. Higher-order term indexing using substitution trees. ACM Trans. Comput. Log., 11(1):6:1–6:40, 2009. doi:10.1145/1614431.1614437.
- [38] Gordon D. Plotkin. A note on inductive generalization. Machine Intel., 5(1):153–163, 1970.
- [39] Stanislaw J. Purgal, David M. Cerna, and Cezary Kaliszyk. Learning higher-order logic programs from failures. In Luc De Raedt, editor, Proceedings of the Thirty-First International Joint Conference on Artificial Intelligence, IJCAI 2022, Vienna, Austria, 23-29 July 2022, pages 2726–2733. ijcai.org, 2022. doi:10.24963/IJCAI.2022/378.
- [40] Mohammad Raza, Sumit Gulwani, and Natasa Milic-Frayling. Programming by Example Using Least General Generalizations. In Carla E. Brodley and Peter Stone, editors, Proceedings of the Twenty-Eighth AAAI Conference on Artificial Intelligence, July 27 -31, 2014, Québec City, Québec, Canada, pages 283–290. AAAI Press, 2014. doi:10.1609/AAAI.V28I1.8744.
- [41] John C. Reynolds. Transformational systems and the algebraic structure of atomic formulas. Machine Intel., 5(1):135–151, 1970.
- [42] Manfred Schmidt-Schauß. Unification in a combination of arbitrary disjoint equational theories. J. Symb. Comput., 8(1/2):51–99, 1989. doi:10.1016/S0747-7171(89)80022-7.
- [43] Erik Tidén. Unification in combinations of collapse-free theories with disjoint sets of function symbols. In Jörg H. Siekmann, editor, 8th International Conference on Automated Deduction, Oxford, England, July 27 - August 1, 1986, Proceedings, volume 230 of Lecture Notes in Computer Science, pages 431–449. Springer, 1986. doi:10.1007/3-540-16780-3_110.
- [44] Emily Rowan Winter, Vesna Nowack, David Bowes, Steve Counsell, Tracy Hall, Sæmundur Óskar Haraldsson, John R. Woodward, Serkan Kirbas, Etienne Windels, Olayori McBello, Abdurahman Atakishiyev, Kevin Kells, and Matthew W. Pagano. Towards developer-centered automatic program repair: findings from Bloomberg. In Abhik Roychoudhury, Cristian Cadar, and Miryung Kim, editors, Proceedings of the 30th ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering, ESEC/FSE, Singapore, November 14-18, 2022, pages 1578–1588. ACM, 2022. doi:10.1145/3540250.3558953.
- [45] Kaiyu Yang and Jia Deng. Learning symbolic rules for reasoning in quasi-natural language. Trans. Mach. Learn. Res., 2023, 2023. URL: https://openreview.net/forum?id=gwRwHUZUgz.
- [46] Katherine A. Yelick. Unification in combinations of collapse-free regular theories. J. Symb. Comput., 3(1/2):153–181, 1987. doi:10.1016/S0747-7171(87)80025-1.
- [47] Gonzague Yernaux and Wim Vanhoof. Anti-unification of unordered goals. In Florin Manea and Alex Simpson, editors, 30th EACSL Annual Conference on Computer Science Logic, CSL 2022, February 14-19, 2022, Göttingen, Germany (Virtual Conference), volume 216 of LIPIcs, pages 37:1–37:17. Schloss Dagstuhl – Leibniz-Zentrum für Informatik, 2022. doi:10.4230/LIPICS.CSL.2022.37.
- [48] Gonzague Yernaux and Wim Vanhoof. On detecting semantic clones in constraint logic programs. In 16th IEEE International Workshop on Software Clones, IWSC 2022, Limassol, Cyprus, October 2, 2022, pages 32–38. IEEE, 2022. doi:10.1109/IWSC55060.2022.00014.
- [49] Dongjun Youn, Wonho Shin, Jaehyun Lee, Sukyoung Ryu, Joachim Breitner, Philippa Gardner, Sam Lindley, Matija Pretnar, Xiaojia Rao, Conrad Watt, and Andreas Rossberg. Bringing the WebAssembly Standard up to Speed with SpecTec. Proc. ACM Program. Lang., 8(PLDI):1559–1584, 2024. doi:10.1145/3656440.
