Abstract 1 Introduction 2 Background 3 E-graphs and Equality Saturation 4 Equality saturation and the chase 5 The termination theorems of equality saturation 6 Weak term acyclity for equality saturation termination 7 Conclusion References

Semantic Foundations of Equality Saturation

Dan Suciu ORCID University of Washington, Seattle, WA, USA Yisu Remy Wang ORCID University of California Los Angeles, CA, USA Yihong Zhang ORCID University of Washington, Seattle, WA, USA
Abstract

Equality saturation is an emerging technique for program and query optimization developed in the programming language community. It performs term rewriting over an E-graph, a data structure that compactly represents a program space. Despite its popularity, the theory of equality saturation lags behind the practice. In this paper, we define a fixpoint semantics of equality saturation based on tree automata and uncover deep connections between equality saturation and the chase. We characterize the class of chase sequences that correspond to equality saturation. We study the complexities of terminations of equality saturation in three cases: single-instance, all-term-instance, and all-E-graph-instance. Finally, we define a syntactic criterion based on acyclicity that implies equality saturation termination.

Keywords and phrases:
the chase, equality saturation, term rewriting, tree automata, query optimization
Copyright and License:
[Uncaptioned image] © Dan Suciu, Yisu Remy Wang, and Yihong Zhang; licensed under Creative Commons License CC-BY 4.0
2012 ACM Subject Classification:
Theory of computation Equational logic and rewriting
; Theory of computation Rewrite systems
Related Version:
Full Version: https://arxiv.org/abs/2501.02413  [27]
Funding:
This work is supported by NSF IIS 2314527 and NSF SHF 2312195, and is conducted while some of the authors participated in the Simons Program on Logic and Algorithms in Databases and AI.
Editors:
Sudeepa Roy and Ahmet Kara

1 Introduction

Given a set of identities between terms, the word problem asks whether the identities imply two ground terms t1,t2 are equivalent, i.e. t1t2. This fundamental problem has applications including automated theorem proving, program verification, and query equivalence checking. In his Ph.D. thesis, Nelson [22] introduced a data structured called E-graph for efficiently answering the word problem. At the core, an E-graph is a compact representation of an equivalence relation over a possibly infinite set of ground terms. During the 2000s, researchers applied E-graphs to program optimization [14, 28]. The compiler populates an E-graph with many equivalent programs, using axiomatic rewrites, then extracts the best program from the equivalent ones. In particular, Tate et.al. [28] coined the term equality saturation (EqSat) and gave a procedural description of the algorithm. In 2021, Willsey et al. [32] proposed egg, which introduced important algorithmic improvements and made EqSat practical. Since 2021, EqSat has been applied to a wide range of topics in domain-specific program optimization, including floating-point computation [24] computational fabrication [20], machine learning systems [33], and hardware design [29, 4]. There is also a growing interest in using EqSat for query optimization in data management. For example, EqSat is used to optimize queries in OLAP [6], linear algebra [30], tensor algebra [25], and Datalog [31].

The equality saturation procedure consists of repeatedly selecting an identity uv from the given set, matching the term u with the E-graph, then adding the term v to the E-graph, if it wasn’t already there. Equality saturation terminates when no new terms can be added. There are striking connections between equality saturation and database concepts. Zhang et al. [35] observed that the matching step is the same as conjunctive query evaluation, and described significant speedups in egg by using a Worst Case Optimal Join algorithm [23] for matching. A recent system, egglog [34], unified EqSat and Datalog to improve egg’s support for program optimization and program analysis.

In this paper we study another deep connection between equality saturation and the chase procedure for Tuple Generating Dependencies (TGDs) and Equality Generating Dependencies (EGDs) [8]. Our hope is that these results will help solve some of the open problems in equality saturation by using techniques and results for the chase procedure. Before describing our results we give a gentle introduction to EqSat and describe some of its open problems.

Example 1.

Consider a simple language with two binary operators f,g and constant a. We want to optimize the following term t (the “8th power” of f on a):

t= f(f(f(a,a),f(a,a)),f(f(a,a),f(a,a))) (1)

We are given a single identity, f(x,x)g(x,x), which says two terms f(t1,t2) and g(t1,t2) are equivalent, provided that t1,t2 are equivalent. Starting with the initial term t, EqSat constructs an E-graph G and grows it to represent all terms equivalent to t. The literature defines an E-graph as a set of E-classes, where each E-class is a set of E-nodes, and each E-node is labeled with a function symbol and has a number of E-class children equal to the arity of the symbol. EqSat starts by constructing an E-graph G representing t, shown on the left in Figure 1. There are 4 E-classes, each consisting of one single E-node; the E-class c4 represents precisely the term t. Next, EqSat repeatedly applies the identity f(x,x)g(x,x), by matching the left-hand side f(x,x) to the E-graph, then adding the right-hand side g(x,x) to the E-graph: we formalize this in Sec. 3. The resulting E-graph H is on right of Figure 1. There are 4 E-classes, c1,,c4, each consisting of 1 or 2 E-nodes. For example, c4 has two E-nodes, and represents two equivalent terms, f(t1,t2)g(t1,t2), where t1,t2 are any terms represented by c3. By continuing this reasoning, we observe that c4, represents a total of 27 possible terms, namely all terms of the form: h(h(h(a,a),h(a,a)),h(h(a,a),h(a,a))) where each h can be either f or g.

G:

H:

Figure 1: Two E-graphs G,H, before and after EqSat.

Open problems about EqSat.

We still understand very little about equality saturation. Most descriptions of EqSat focus on an imperative understanding111 A notable exception is egglog [34], whose semantics is based on fixpoints instead of implementation details. Some early works also define E-graphs (under a different name like abstract congruence closure) as tree automata similar to ours [26, 2, 12]. of equality saturation and E-graphs. E-graphs are described by their individual components (e.g., a hashconsing data structure, a union-find, etc.), and EqSat is commonly defined in pseudocode as a sequence of operations. In other words, the semantics of EqSat is the output of the specific algorithm, if it terminates; if the algorithm diverges, the semantics is undefined.

We also do not know much about when EqSat terminates. The termination problem asks, given a set of rewrite rules, whether EqSat terminates on a given input E-graph, or whether it terminates on all input E-graphs. This is a fundamental problem of EqSat and has applications in program/query optimization and equivalence checking: If EqSat terminates on the symmetric closure of a set of (variable-preserving) rewrite rules , it decides the word problem of the equational theory defined by (22). With an appropriate cost model, EqSat can further pick the optimal program among all programs equivalent to the input, e.g. by using Knuth’s algorithm [15].

Our contribution.

After a review of some background material in Sec. 2, we introduce E-graphs and EqSat in Sec. 3. Our definition, in Sec. 3.1, applies even to the cases when equality saturation does not terminate and, for that purpose, we define the E-graph to be a reachable, deterministic tree automaton, with possibly infinitely many states. By explicitly allowing infinite E-graphs we can define a formal semantics even when EqSat does not terminate. We show that concepts in tree automata are in 1-1 correspondence with those in E-graphs: the automaton states correspond to E-classes, and the transitions correspond to E-nodes. A term is represented by an E-graph iff it is accepted by the E-graph viewed as a standard tree automaton. We prove that, for any two E-graphs there exists at most one homomorphism between them, and, therefore, E-graphs are rigid tree automata. Next, in Sec. 3.2, we define a few basic operations on E-graphs, such as E-matching, insertion, congruence closure, and least upper bounds, by relying on tree-automata concepts. Using these operations, we define in Sec. 3.3 an immediate consequence operator (ICO), and define EqSat formally as the least fixpoint of the ICO. The least fixpoint always exists and is unique, even if the fixpoint procedure does not terminate, in which case the least fixpoint may be infinite. Finally, we prove an important lemma, called the Finite Convergence Lemma, stating that, if the least fixpoint is finite, then equality saturation procedure converges in finitely many steps. This is not immediately obvious because, while E-matching and insertion strictly increase the size of the E-graph, congruence closure may decrease it. A similar proposition fails for TGDs and EGDs: there exists an infinite chase where all instances have bounded size, hence its “limit” is finite.

Next, in Sec. 4 we describe the connection between EqSat and the chase. After a brief review of the chase in Sec. 4.1, we start by presenting a reduction from the Skolem chase to equality saturation, denoted SklChEqSat (Sec. 4.2), then from equality saturation to the standard chase, denoted EqSatStdCh (Sec. 4.3). For SklChEqSat, given a set of dependencies, we show there exists a set of rewrite rules where EqSat produces an encoded result of the Skolem chase and has the same termination behavior. For EqSatStdCh, we show that, given a set of rewrite rules, there exists a set of dependencies where the standard chase produces an encoded result of EqSat (whether it terminates). Since the standard chase is a non-deterministic process, we characterize the type of chase sequences that terminate when EqSat terminates (30). We call them EGD-fair chase sequences. Intuitively, a chase sequence is called EGD-fair if it applies EGDs to a fixpoint frequently enough. We show in 30 that,

EqSat terminates one chase sequence terminates
all EGD-fair chase sequences terminate.

The notion of EGD-fair chase sequences is of independent interest.

Finally, we present our main decidability results for EqSat in Sec. 5: we show that the single-instance termination problem of EqSat, denoted as 𝒯GEqSat, is R.E.-complete, and the all-term-instance termination problem of EqSat, denoted as 𝒯tEqSat, is Π2-complete. Our proof is based on a non-trivial reduction from the Turing machine, first presented in the undecidability proof of the finiteness of congruence classes defined by string rewriting systems [21]. While the single-instance case easily follows from the undecidability of Skolem chase termination, our approach allows us to also prove the Π2-completeness of the all-term-instance termination case by a reduction from the universal halting problem. We also show the all-E-graph-instance termination problem of EqSat, denoted as 𝒯GEqSat, is undecidable, although the exact upper bound is open.

We contrast the termination problems of EqSat with those of the Skolem chase and the standard chase. The single-instance termination problems are R.E.-complete in all three cases [18, 5], and the all-instance termination of the Skolem chase (𝒯SklCh) is R.E.-complete as well [18, 10]. This shows that 𝒯SklCh is easier than 𝒯tEqSat. The case for the standard chase is more interesting. There are two all-instance termination problems of the standard chase: for all database instances, whether all chase sequences terminate in finitely many steps (𝒯,StdCh), and whether there exists at least one chase sequence that terminate (𝒯,StdCh). It has been shown 𝒯,StdCh is Π2-complete [11], but the exact complexity of 𝒯,StdCh is open. Grahne and Onet showed if we allow one denial constraint, 𝒯,StdCh is Π2-complete [11], although Gogacz and Marcinkowski [10] conjectured that this problem is indeed in R.E.

In Sec. 6 we propose a sufficient syntactic criterion that guarantees EqSat termination, called weak term acyclicity, which is based on the classic notion of weak acyclicity [8]. If a set of rewrite rules is weakly term acyclic, then EqSat terminates for all input E-graphs.

2 Background

2.1 Term Rewriting Systems

We review briefly the standard definition of a term rewriting system from [1]. A signature is a finite set Σ of function symbols with given arities. If V is a set of variables, then T(Σ,V) denotes the set of terms constructed inductively using symbols from Σ and variables from V. Members of T(Σ,V) are called patterns, and members of T(Σ)=defT(Σ,) are called ground terms, or simply terms thereafter. A substitution is a function σ:VT(Σ); if u is a pattern, then we denote by u[σ] the term obtained by applying the substitution σ to u. A rewrite rule r has the form lhsrhs where lhs and rhs are patterns and the variables in rhs are a subset of those lhs, Var(rhs)Var(lhs). A term rewriting system (TRS), , is a set of rewrite rules. defines a rewrite relation as follows: lhs[σ]rhs[σ] for any substitution σ and rule lhsrhs in , and, if uv then f(w1,,wi1,u,wi+1,wk)f(w1,,wi1,v,wi+1,wk) for any function symbol fΣ of arity k, and any terms wj, j=1,k;ji. Let be the reflexive and transitive closure of . We define ()=def()1, ()=def()(), and ()=def(). is a congruence relation. We define the set of reachable terms R(t)={ttt}. If a term rewriting system is variable-preserving (i.e., Var(lhs)=Var(rhs) for all rules), we define 1={rhslhs(lhsrhs)}. It follows that ((1))=().

2.2 Tree automata

Let Σ be a signature. A (bottom-up) tree automaton is a tuple 𝒜=Q,Σ,Qfinal,Δ, where Q is a (potentially infinite)222 In this paper, we allow tree automata (and thus E-graphs) to have an infinite number of states and transitions. Talking about infinite E-graphs allow us to define the semantics of equality saturation even when the algorithm does not terminate. set of states, QfinalQ is a set of final states, and Δ is a set of transitions of the form f(q1,,qn)q where q,q1,,qnQ, and fΣ. Denote by ΣQ the signature Σ extended with Q where each state is viewed as a symbol of arity 0. Then Δ is a term rewriting system for ΣQ, and we will denote by 𝒜 (rather than Δ) the rewrite relation defined by Δ. A term tT(Σ) is accepted by a state q if t𝒜q, and we write (q) for the set of terms accepted by q. The language accepted by 𝒜 is (𝒜)=def{tT(Σ)qfQfinal,t𝒜qf}. A tree language LT(Σ) is called regular if it is accepted by some finite tree automaton.

Fix two tree automata 𝒜=Q,Σ,Qfinal,Δ,=Q,Σ,Qfinal,Δ. A homomorphism, h:𝒜, is a function h:QQ that maps final states to final states, and, for every transition f(c1,,ck)c in 𝒜 there exists a transition f(h(c1),,h(ck))h(c) in . An isomorphism333Notice that a bijective homomorphism is not necessarily an isomorphism. is a homomorphism h:𝒜 for which there exists an inverse homomorphism h1:𝒜 such that h1h=id𝒜 and hh1=id. The following holds:

Lemma 2.

Let h:𝒜 be a homomorphism, tT(Σ), and c be a state of 𝒜. If t𝒜c, then th(c). In particular, (𝒜)().

Proof.

We prove the statement by induction on the structure of the term tT(Σ). Assuming t=f(t1,,tk) for k0444 The base case is covered by the case k=0. and t𝒜c, then there exists states c1,,ck such that ti𝒜ci and 𝒜 contains the transition f(c1,,ck)c. By induction hypothesis tih(ci) for i=1,,k, and since h is a homomorphism, there exists a transition f(h(c1),,h(ck))h(c) in , proving that th(c).

We write 𝒜 whenever there exists a homomorphism 𝒜. Observe that is a preorder relation. In the next section, we show that this preorder relation becomes a partial order when restricted to E-graphs (12).

We call an automaton 𝒜 deterministic if t𝒜q1 and t𝒜q2 implies q1=q2 for states q1,q2. We call 𝒜 reachable if every state q accepts some ground term: tT(Σ), t𝒜q.

3 E-graphs and Equality Saturation

Most papers discussing E-graphs use an operational definition not suitable for a theoretical analysis. We introduce an equivalent definition of E-graphs in terms of tree automata, similar to Kozen’s partial tree automata [17]. Throughout this section we fix the signature Σ.

3.1 E-graphs

Definition 3.

An E-graph is a deterministic and reachable tree automaton G=Q,Σ,Δ (without a set of final state Qfinal).

Our definition maps one-to-one to the classical definition of E-graphs: An E-class is a state cQ of the tree automaton, and an E-node is a transition f(c1,,ck)c. A term t is represented by the E-class c if t is accepted by c, i.e. tGc. In the literature, the sets of E-classes and E-nodes are denoted C and N respectively. We will use states/E-classes and transitions/E-nodes interchangeably in this paper. E-graphs do not define a set of “final” E-classes, and for that reason we omit the final states Qfinal from 3555 Alternatively, consider Qfinal=Q. , similarly to [17].

Example 4.

The E-graph H in Figure 1 is the automaton Q,Σ,Δ, where Σ={a,f(,),g(,)}, there are four states Q={c1,,c4}, and Δ consists of seven transitions:

a() c1 f(c1,c1) c2 g(c1,c1) c2 f(c3,c3) c4 g(c3,c3) c4

An example of rewritings is f(a,a)Hf(c1,a)Hf(c1,c1)Hc2, showing that the term f(a,a) is represented by the E-class c2.

It is folklore that E-graphs represent equivalences of terms. We make this observation formal, by defining the semantics of an E-graph to be a certain partial congruence. A partial equivalence relation, or PER, on a set A is a binary relation that is symmetric and transitive. Its support is the set supp()=def{xxx}A. Equivalently, a PER can be described by its support and an equivalence relation on the support. A PER on the set of terms T(Σ) is congruent if siti for i=1,,n and f(s1,,sn)supp() implies f(s1,,sn)f(t1,,tn). A PER is reachable if f(s1,,sn)supp() implies sisupp(), for i=1,,n. A Partial Congruence Relation (PCR)666PCRs are studied in the literature as congruences on partial algebras (e.g., [17]). on T(Σ) is a congruent and reachable PER.

An E-graph G induces a PCR G defined as follows: t1t2 if there exists some E-class c in G that accepts both t1 and t2, i.e. t1GcGt2. We check that G is a PCR: G is symmetric by definition, and transitivity follows from determinacy, because t1GcGt2 and t2GcGt3 implies t1Gc=cGt3. Suppose f(s1,,sn)Gc: then there exists states ci s.t. siGci, and a transition f(c1,,cn)c, proving reachability; if, in addition, siGti for i=1,,n, then tiGci, which implies f(t1,,tn)Gc, proving congruence, f(s1,,sn)Gf(t1,,tn).

Definition 5.

The semantics of an E-graph G is the PCR G.

Theorem 6.

For any PCR over T(Σ) there exists a unique G such that (G)=().

Proof sketch.

The states of G are the equivalence classes of , denoted as [t] for tsupp(), and the transitions are f([t1],,[tn])[f(t1,,tn)] for all t1,,tn,f(t1,,tn) in the support of . One can check by induction on the size of t that tsupp() iff tsupp(G), and tG[s] iff ts, proving that (G)=().

Thus, the semantics of an E-graph G is a PCR G, which is a congruence on (G)=defsupp(G). We say that G represents the set of terms (G).

Example 7.

Continuing 1, the semantics of the E-graph H in Figure 1 is the PCR H that equates aHa (witnessed by state c1), f(a,a)Hg(a,a) (by state c2), f(f(a,a),g(a,a))Hg(f(a,a),f(a,a)) (by state c3), etc.

Example 8.

Let Σ={a,f()}. Consider the E-graph G with a single state c and transitions a()c, f(c)c. It represents infinitely many terms, f(k)(a), for k0, and its semantics is the PCR aGf(a)Gf(f(a))G

Example 9.

Let Σ={a,f(),g()} and consider the infinite E-graph G with states c,c0,c1,c2, and transitions

a c0 f(ci) ci+1 g(ci) c i=0,1,2,

The PCR consists of g(a)Gg(f(a))Gg(f(f(a)))G, defined by the state c. No other distinct terms are in G, for example f(a)Gf(f(a)) because they are represented by the distinct states c1 and c2 respectively. Although G represents a regular language, {f(k)(a)k0}{g(f(k)(a))k0}, its semantics G cannot be captured by a finite E-graph. This example shows that G differs from the Myhill-Nerode equivalence relation [16], under which all terms f(k)(a) would be equivalent. It also illustrates the subtle distinction between tree automata and E-graphs. An optimizer that wants to use the identity g(x)=g(f(x)), but not x=f(x), needs this E-graph to represent all terms equivalent to g(a), and cannot use the finite tree automaton accepting the regular language (c) because that would incorrectly equate all terms f(k)(a).

Recall the definitions of tree automata homomorphisms in Sec. 2.2. When restricted to E-graphs, homomorphisms have some interesting properties:

Lemma 10.

If h:GH is a homomorphism, then (G)(H).

Proof.

Assume t1Gt2. Then there exists some E-class c where t1GcGt2. By 2, t1Hh(c)Ht2, implying t1Ht2.

Lemma 11.

There exists at most one homomorphism h:GH.

Proof.

Call the weight of a state c in G the size of the smallest term t such that tGc. Since G is reachable, every state has a finite weight. Given two homomorphisms h1,h2:GH, we prove by induction on the weight of c that h1(c)=h2(c). Let t be a term of minimal size such that tGc, and assume t=f(t1,,tk), for k0. Then there exists states c1,,ck such that tiGci, i=1,k, and a transition f(c1,,ck)c in G. By induction hypothesis h1(ci)=h2(ci) for i=1,k. By the definition of a homomorphism, H contains both transitions f(h1(c1),,h1(ck))h1(c) and f(h2(c1),,h2(ck))h2(c), and we conclude h1(c)=h2(c) because H is deterministic.

We call a tree automaton 𝒜 rigid [13] if the identity mapping is the only homomorphism 𝒜𝒜. It follows from 11 that every E-graph is a rigid tree automaton.

Lemma 12.

over E-graphs forms a partial order up to isomorphism.

Proof.

Obviously is reflexive and transitive. To prove anti-symmetry, assume two homomorphisms h:GH, h:HG. The composition hh is a homomorphism GG, and, by uniqueness, it must be the identity on G; similarly, hh is the identity on H, proving that h is an isomorphism, thus G,H are isomorphic.

Next, we define models for term rewriting systems.

Definition 13.

We say that an E-graph H=Q,Σ,Δ is a model of a TRS if, for every rule lhsrhs in and any substitution σ:Var(lhs)Q, if lhs[σ]Gc then rhs[σ]Gc. If G is another E-graph, then we say that H is a model for the pair ,G if GH and H is a model of . H is a universal model if for any other model H, it holds that HH.

When it exists, the universal model is unique up to isomorphism, because HH and HH implies H,H are isomorphic.

Continuing 7, let consists of the rule f(x,x)g(x,x), and let G,H be the E-graphs in Figure 1. G is not a model of , because for the substitution σ(x)=c1 we have lhs[σ]=f(c1,c1)Gc2, but rhs[σ]=g(c1,c1)↛Gc2. On the other hand, one can check that H is a model of ; in fact it is a model of ,G, because GH.

Given an E-graph G and a TRS , equality saturation constructs a universal model H of ,G, by repeatedly applying some simple operations on G, which we define next.

3.2 Operations over E-graphs

E-matching, Insertion, and Rebuilding are the building blocks of equality saturation. They defined in the literature operationally [19]. We provide here a formal definition, using tree automata terminology. Throughout this section we fix an E-graph G=Q,Σ,Δ.

E-matching a rule lhsrhs in G returns the set of pairs (σ,c), where σ:Var(lhs)Q is a substitution such that lhs[σ]Gc. For example, considering the E-graph on the left of Figure 1 and the rule f(x,x)g(x,x), E-matching returns ({xci,yci},ci+1) for i=1,2,3. E-matching is analogous to computing the triggers for a TGD or EGD (Sec. 4.1).

The Insertion of a pair (t,c) into G, where tT(ΣQ) and cQ, returns an automaton 𝒜 such that G𝒜 such that t𝒜c. To define 𝒜, we need the following:

Definition 14.

Fix a term tT(ΣQ) and a state cQ. The flattening for t with root c, in notation FL(tc), or just FL when t,c are clear from the context, is an E-graph that has one distinct state qu for each subterm u of t, and has a transition f(qu1,,quk)qu, for all subterms u of the form u=f(u1,,uk) and fΣ. Moreover, it is enforced that the state of the root node is c (i.e., qt=c). One can check that tFLc, and that FL is the identity on all subterms of t. Flattening is also called normalization [9].

G:

𝒜:

H:

Figure 2: Example E-graphs on insertion and rebuilding.

The result of inserting (t,c) in G is 𝒜=defGFL(tc) (i.e. we take the set-union of all states and all transitions). The result 𝒜 is a reachable tree automaton, but it is non-deterministic in general, thus it is not an E-graph; the next operation, rebuilding, converts it back into an E-graph. G𝒜 holds, because the inclusion G𝒜 is a homomorphism.

As an example, if we insert the pair (f(ca),cb) in the E-graph G in Figure 2, the result is 𝒜 in the center of the figure; flattening FL(f(ca)cb) has a single transition f(ca)cb.

Rebuilding converts 𝒜 into a deterministic automaton. Formally, let 𝒜 be any reachable tree automaton, and recall that 𝒜 may be infinite. The congruence closure CC(𝒜) is an E-graph (i.e. deterministic, reachable automaton) such that 𝒜CC(𝒜) and, for any other E-graph G, if 𝒜G, then CC(𝒜)G. We prove the following in the full version:

Lemma 15.

For any reachable tree automaton 𝒜, CC(𝒜) exists and is unique.

The procedure of computing CC(𝒜), also known as rebuilding in EqSat literature [32], can be done efficiently in the finite case, for instance with Tarjan’s algorithm [7]. The idea is to repeatedly find violating transitions f(c1,,ck)c and f(c1,,ck)c with cc, and replace every occurrence of c with c, until fixpoint. This is similar to determinizing 𝒜, but instead of constructing powerset states like {c1,c4,c7}, we equate states c1=c4=c7; thus, CC(𝒜) has at most as many states as 𝒜, and the procedure always terminates for finite 𝒜, as merging shrinks the number of states.

Example 16.

The tree automaton 𝒜 in Figure 2 is non-deterministic, because f(a)cb and f(a)cf. The congruence closure algorithm merges cb and cf, and produces the E-graph H in Figure 2. Notice that H represents strictly more terms than 𝒜. For example, H represents g(b,b), because g(b,b)Hcg, but 𝒜 does not represent g(b,b).

Least upper bound of E-graphs.

Let (Gi)iI be a (possibly infinite) family of E-graphs. Recall that their least upper bound G is an E-graph such that G is an upper bound for every E-graph in the set, GiG for all iI, and for any other upper bound G, it holds that GG. We prove the following in the full version:

Lemma 17 (Least upper bound).

The least upper bound exists and is given by CC(𝒜), where 𝒜 is the automaton consisting of the disjoint union of the states and the disjoint union of the transitions of all E-graphs Gi.

We will denote the least upper bound by iIGi. It is also possible to show that every family of E-graphs admits a greatest lower bound, by using a product construction [12], but we do not need it in this paper.

3.3 Equality saturation

The standard definition of equality saturation in the literature is procedural: given an E-graph G=Q,Σ,Δ and TRS , equality saturation repeatedly applies matching/insertion/rebuilding. EqSat is undefined when this process does not terminate. We provide here an alternative definition, as the least fixpoint of an immediate consequence operator (ICO), and prove that it always exists. We start by introducing the ICO:

ICO=def CCT (2)

T is the match/apply operator: it computes all E-matches then inserts the rhs’s into G:

T(G)=defG{FL(rhs[σ]Gc)(lhsrhs),σ:Var(lhs)Q,lhs[σ]Gc}

CC is the rebuilding operator of 15.

Lemma 18.

ICO is inflationary (GICO(G) for all G) and monotone.

Proof.

That ICO is inflationary follows from GT(G)CC(T(G)). We prove that both T and CC are monotone. Let H=def{FL(rhs[σ]c)(lhsrhs),σ:Var(lhs)Q,lhs[σ]Gc}, thus T(G)=GH. Any homomorphism GG can be extended to a homomorphism GHGH, which proves that T is monotone. Consider two automata 𝒜,𝒜 and assume 𝒜𝒜, i.e. there exists a homomorphism from 𝒜 to 𝒜. Denote G=defCC(𝒜), G=defCC(𝒜). Then, 𝒜𝒜G, which implies 𝒜G. By the definition of G=CC(𝒜), we have GG, proving that CC is monotone.

With Lemmas 17 and 18, we show the following in the full version:

Theorem 19.

Fix an E-graph G, and consider the class 𝒞G of E-graphs GG. Then ICO:𝒞G𝒞G has a least fixpoint, given by

EqSat(,G)=def i0ICO(i)(G) (3)

Furthermore, EqSat(,G) is a universal model of ,G; we call it equality saturation.

Given G,, our semantics of EqSat is the least fixpoint in (3), which is also the unique universal model of G,. When EqSat(,G) is finite, then this coincides with the standard procedural definition in the literature. A common case (e.g., in program and query optimization settings) is when G represents a single term t, more precisely G=FL(tc) with fresh state c; in that case we denote EqSat(,G) as EqSat(,t).

Properties of equality saturation.

We establish several basic facts of EqSat.

Lemma 20 (Inflationary).

GEqSat(,G).

Corollary 21.

(G)(EqSat(,G)) and (G)(EqSat(,G)).

20 follows from 18, while 21 follows from 2 and 10. Thus, EqSat(,G) represents more terms than G, and identifies more pairs of terms than G. Next, we examine the relationship between the PCR defined by EqSat(,G) and the relations and defined by the TRS (see Sec. 2.1). If 1,2 are two PCRs, then we denote by 12 the smallest PCR that contains both. We prove:

Lemma 22 (Representation).

Let wT(Σ) be a term represented by some state of the E-graph H=defEqSat(,G). The following hold: (w)[w]H[w]G.

Proof.

The definitions of E-matching and insertion imply that, if uv and u is represented by some state c of some E-graph K, then v is represented by the same state of the E-graph ICO(K)=CC(T(K)). Therefore, if uv and u is represented by some state of H, then v is represented by the same state of ICO(H)=H (because H is a fixpoint of ICO). This implies that (w)[w]H.

For the second part, we denote by Gk=ICO(k)(G), and check by induction on k that [w]Gk[w]G. When k=0 then G0=G and the claim is obvious. For the inductive step we observe that the only new identities introduced by Gk+1 are justified by .

In other words, if wv, then EqSat(,G) will equate w with v; and if EqSat(,G) equates w with v then this can be derived from and G. In general, wv does not imply wHv. For a simple example, let ={ab}, thus ab, and let G represent only the term b. Then H=EqSat(,G)=G represents only the term b, thus aHb.

Example 23.

For some TRS, the starting E-graph G determines whether EqSat terminates in a finite number of steps. For a simple example, consider Σ={f(),g(),a}, ={f(g(x))g(f(x))}. If the initial E-graph G represents only the term f(g(a)) (and its subterms), then EqSat terminates, and the resulting H=defEqSat(,G) represents a PCR where f(g(a))Hg(f(a)). On the other hand, if G is the E-graph with states cf,cg and transitions {g(cf)cg,f(cg)cf,acf}, then G already represents infinitely many terms (cf)(cg), where (cf)={a,f(g(a)),f(g(f(g(a)))),} and (cg)={g(a),g(f(g(a))),}. After equality saturation, H=EqSat(,G) represents all terms in T(Σ) where the numbers of occurrences #f,#g of f,g satisfy #f#g#f+1. This is not a regular language, hence H is infinite, and EqSat will not terminate in a finite number of steps.

Example 24.

We show that both inclusions in 22 can be strict. Let ={ab,cb} and let G be the E-graph representing a single term f(a,b) with transitions {aca,bcb,f(ca,cb)cf}. Then H=EqSat(,G) has transitions: {aca,bca¯,f(ca,ca¯)cf}. We have:777 We use f(a|b,b) as a shorthand for {f(t,b)t=at=b} (as in regular expressions) and similarly for other terms. (f(a,b))=f(a|b,b), [f(a,b)]H=f(a|b,a|b), and [f(a,b)]G=[f(a,b)]trs=f(a|b|c,a|b|c). All three sets are different.

However, the three expressions in  22 are equal in an important special case:

Corollary 25.

Suppose is a variable-preserving term rewriting system. Let Sym()=1, and let wT(Σ) be a term represented by some state of the E-graph H=defEqSat(Sym(),G). The following hold: (Sym())(w)=[w]H=[w]G.

Let G be a finite E-graph. If EqSat(,G) is infinite, then EqSat does not terminate in a finite number of steps. Somewhat surprisingly, the converse does hold: if EqSat(,G) is finite, then EqSat terminates in a finite number of steps. This follows from the next lemma, whose proof is deferred to the full version.

Lemma 26 (Finite convergence).

Let 𝒢:G1G2 be an ascending sequence of finite E-graphs. If G=iGi is finite, then the sequence 𝒢 is finite.

Corollary 27 (Finite convergence of EqSat).

Let be a term rewriting system and G be a finite E-graph. If EqSat(,G) is finite, EqSat converges in a finite number of steps.

4 Equality saturation and the chase

In this section, we briefly review necessary background on databases and the chase. Then, we will show the fundamental connections between equality saturation and the chase.

4.1 The Chase Procedure

Databases and conjunctive queries.

A relational database schema is a tuple of relation names 𝒮=(R1,,Rm) with associated arities ar(Ri). A database instance is a tuple of relation instances I=(R1I,,RmI), where RiIDomar(Ri) for some domain Dom. We allow an instance to be infinite. We often view a tuple a in RiI as an atom Ri(a), and view the instance I as a set of atoms. The domain Dom is the disjoint union of set of constants and a set of marked nulls.

A conjunctive query λ(x) is a formula with free variables x of the form R1(x1)Rk(xk), where each xi is a tuple of variables from x. The canonical database of a conjunctive query consists of all the tuples Ri(xi), where the variables x are considered marked nulls.

Let I, J be two database instances. A homomorphism from I to J, in notation h:IJ, is a a function h:Dom(I)Dom(J) that is the identity on the set of constants, and maps each atom R(a)I to an atom R(h(a))J. The notion of homomorphism immediately extends to conjunctive queries and/or database instances. The output of a conjunctive query λ(x) on a database I is defined as the set of homomorphisms from λ(x) to I. We say that a database instance I satisfies a conjunctive query λ(x), denoted by Ixλ(x), if there exists a homomorphism λ(x)I.

Dependencies.

TGDs and EGDs describe semantic constraints between relations. A TGD is a first-order formula of the form λ(x,y)z.ρ(x,z) where λ(x,z) and ρ(x,y) are conjunctive queries with free variables in xy and xz. An EGD is a first-order formula of the form λ(x)xi=xj where λ(x) is a conjunctive query with free variables in x and {xi,xj}x.

Fix a set of TGDs and EGDs Γ. If I is a database instance and dΓ, then a trigger for d in I is a homomorphism from λ(x,y) (resp. λ(x)) to I. An active trigger is a trigger h such that, if d is a TGD, then no extension h to a homomorphism h:ρ(x,z)I exists, and, if d is an EGD, then h(xi)h(xj). We say that I is model for Γ, and write IΓ, if it has no active triggers.

Given Γ and I we say that some database instance J is a model for Γ,I, if JΓ and there exists a homomorphism IJ. J is called universal model if there is a homomorphism from J to every model of Γ and I. Universal models are unique up to homomorphisms.

The chase.

The chase is a fixpoint algorithm for computing universal models. We consider two variants of the chase here: the standard chase and Skolem chase. Both the standard chase and the Skolem chase produce a universal model of Γ,I [8, 3, 18]. The standard chase computes answers by deriving a sequence of chase steps until all dependencies are satisfied. A chase step, denoted as Id,hJ, takes as inputs an instance I, a homomorphism h, and a dependency d, where h is an active trigger of d in I, and produces an output instance J by adding some tuples (for TGDs) or collapsing some elements (for EGDs). Specifically, if d is a TGD, the chase step extends I with the tuple h(ρ(x,z)), where h is an extension of h that maps the variables z on which h is undefined to fresh marked nulls. If d is an EGD, if h(xi) (or h(xj)) is a marked null, a chase step replaces in I every occurrence of h(xi) with h(xj) (or h(xj) with h(xi)). If neither h(xi) nor h(xj) is a marked null and h(xi)h(xj), then the chase fails.

A standard chase sequence starting at I0 is a sequence of successful chase steps I0d1,h1I1d2,h2 that is fair: for all i0, for each dependency d and active trigger h of d in Ii, some ji must exist such that h is no longer an active trigger of d in Ij. The result of a (possibly infinite) chase sequence is i0jiIj [3]. A chase sequence is terminating if it ends with In and InΓ, in which case In is the result of the chase sequence. The standard chase is non-deterministic: depending on the order of firing, the chase sequence can be different. Different chase sequences can even differ on whether they terminate.

The Skolem chase, discussed in [18], differs from the standard chase in several ways. It first skolemizes each TGD d:λ(x,y)z.ρ(x,z1,,zk) to λ(x,y)ρ(x,fz1d(x),,fzkd(x)), where each fzjd is an uninterpreted function from Dom|x| to Dom. The result of the Skolem chase, denoted as SklCh(Γ,I), is the least fixpoint of the immediate consequence operator (ICO) of the Skolemized TGDs. Note that the Skolem chase does not directly handle EGDs but uses a technique called singularization [18] to simulate EGDs with TGDs.

4.2 Reducing the Skolem chase to equality saturation

In this section, we show how to reduce the Skolem chase to EqSat. We only consider TGDs, since in the Skolem chase, EGDs are modeled as TGDs using singularization [18].

We show an encoding where there exists a simple mapping from E-graphs to database instances, defined by

ξ(G)={R(c1,,ck)(G)R is a relation symbol in 𝒮}

such that, given a set of dependencies Γ, running EqSat on an encoded term rewriting system from Γ corresponds to running the Skolem chase on the set of dependencies via ξ. Intuitively, given an E-grpah, ξ collects every term that corresponds to a tuple from the language of G. An illustration of ξ is shown in Figure 3.

Theorem 28.

Given a database schema 𝒮=(R1,,Rm), a set of TGDs Γ, and an initial database I, it is possible to define a signature Σ, a term rewriting system over Σ, and an initial term t such that

ξ(EqSat(,t))=SklCh(Γ,I).

Moreover, the Skolem chase terminates if and only if equality saturation terminates.

The intuition for the construction is that we can uniformly treat relational atoms as E-nodes contained in a special E-class, and Skolem functions naturally correspond to terms in EqSat. More specifically,

  • Add symbols {,(,)} to the signature Σ. Add rewrite r:(,) to . Let the initial term t be .

  • Add every Skolem function symbol to Σ, and for every n-ary relational symbol R𝒮, add a n-ary function symbol to Σ, and add rewrite rule rR:R(x1,,xn).

  • For every Skolemized TGD

    d:R1(x1,y1)Rn(xn,yn)R1(x1,fzd1)Rm(xm,fzdm),

    replace the conjunctions in the head and body with nested applications of and :

    rd:(R1(x1,y1),((Rn(xn,yn),)))(R1(x1,fzd1),((Rm(xm,fzdm),)))

    and add rd to .

  • For each constant c in the input database I, add a nullary function symbol c to Σ. For each tuple t=R(c1,,cn) in the input database I, add rewrite rt:R(c1,,cn).

Figure 3: Mapping results of encoded EqSat back to database instances.

Proof of 28.

See the full version.

4.3 Reducing equality saturation to the standard chase

We show how to reduce equality saturation to the standard chase. The encoding itself is straightforward. However, the standard chase is non-deterministic and can have different chase sequences, so a natural question is what kind of the chase sequence will converge finitely, given that EqSat terminates, and vice versa. We show that as long as the chase sequence applies EGDs frequent enough, the chase sequence will always converge. We capture this notion as EGD-fairness.

Definition 29.

Given a database schema 𝒮, a set of dependencies Γ over 𝒮, and an initial database I0. We call a chase sequence I0,I1, of Γ and I EGD-fair if for every i, either Ii is a model of Γ and the chase terminates, or there exists some j>i such that Ij is a model of the EGD subset of Γ.

Given that EqSat terminates, what can we say about chase sequences that are not EGD-fair? In fact, such chase sequences may not terminate. Despite this, it can be shown that the result of such chase sequences, terminating or not, is isomorphic to the result of equality saturation (when encoded as a database). On the other hand, to show that equality saturation terminates, it is sufficient to show an arbitrary chase sequence terminates.

The following theorem shows the connection between EqSat and the standard chase.

Theorem 30.

Given signature Σ, a set of rewrite rules over Σ, and an initial E-graph G, it is possible to define a relational schema 𝒮, a set of dependencies Γ over 𝒮, and an initial database I over 𝒮. The following three statements are equivalent:

  1. 1.

    Equality saturation terminates for and t.

  2. 2.

    There exists a terminating chase sequence of the standard chase for Γ and I.

  3. 3.

    All EGD-fair chase sequences of the standard chase terminate for Γ and I.

Moreover, denote the result of an arbitrary chase sequence as I. If equality saturation terminates, I is isomorphic to the database encoding the resulting E-graph of EqSat(,G).

The encoding consists of two steps. First, we can encode an E-graph as a database. Second, we encode the match/apply operator and congruence closure operator as a set of TGDs and EGDs. To encode an E-graph as a database:

  • Take the domain Dom to be the set of all E-classes, which are treated as marked nulls.

  • For every function symbol f of arity n, add relation symbol Rf of arity n+1 to 𝒮.

  • For every E-node f(c1,,cn)c, add a tuple Rf(c1,,cn,c) to the database I.

Under this encoding, each E-class is treated as a marked null, and each E-node is treated as a tuple.

The encoding of the match/apply operator and congruence closure operator is plain:

  • For every function symbol f of arity n, add a functional dependency Rf(x1,,xn,x)Rf(x1,,xn,x)x=x to Γ.

  • For every rewrite rule lhsrhs in , flatten the left- and right-hand side into conjunctions of relational atoms, unify the variable denoting the root node of lhs with that of rhs, and add existential quantifiers to the head accordingly. For example, rule f(f(x,y),z)f(x,f(y,z)) is flattened into Rf(x,y,w1)Rf(w1,z,r)w2,Rf(x,y,w2)Rf(w2,v,r).

    There are two corner cases to the above translations. First, if lhs is a single variable x, we need to introduce n rules of the form Rf(y1,,yk,x), one for each function symbol, to “ground” x. For instance, suppose Σ={f(,),g()}, rewrite rule xg(x) is flattened into two dependencies: Rf(y1,y2,x)Rg(x,x) and Rg(y1,x)Rg(x,x). Second, in the case that the right-hand side is a single variable x, we need to add an EGD instead of a TGD. For example, rule f(x,y)x is encoded as an EGD Rf(x,y,r)x=r.

Proof of 30.

See the full version.

5 The termination theorems of equality saturation

Finally, we present our main results here.

Theorem 31 (Single-instance termination).

The following problem is R.E.-complete:

  • Instance: A term rewriting system R, a term t.

  • Question: Does EqSat terminate with R and t?

Theorem 32 (All-term-instance termination).

The following problem is Π2-complete:

  • Instance: A term rewriting system R.

  • Question: Does EqSat terminate with R and t for all terms t?

While 31 follows immediately from the fact the the Skolem chase is undecidable, our proof in the full version is based on Narendran et al. [21], which allows us to also show 32. We encode a Turing machine as a term rewriting system with the property that the congruence classes of initial configurations corresponds to traces of running such configurations, and that EqSat terminates if and only if congruence class is finite. For the all-term-instance case, we then show that the congruence class of an arbitrary term is infinite if and only if the congruence class of an initial configuration is. The actual proof is slightly more involved so we refer the reader to the full version for more details.

The technique above does not apply to the all-E-graph-instance case, however. The all-E-graph-instance termination can be thought of as having inputs both a term and a set of ground identities, and we have no control over the latter. Still, we are able to prove that this problem is undecidable by a reduction from the Post correspondence problem (see the full version), while the exact upper bound is unknown.

Theorem 33 (All-E-graph-instance termination).

The following problem is undecidable:

  • Instance: A term rewriting system R.

  • Question: Does EqSat terminate with R and G for all E-graphs G?

6 Weak term acyclity for equality saturation termination

(a) 34.
(b) 35.
Figure 4: Example weak term dependency graphs. Special edges are marked with .

We can adapt the classic weak acyclicity criterion [8], which is used to show the termination of the chase algorithm, to equality saturation. The adapted criterion, which we call weak term acyclicity, is more powerful than simply translating EqSat rules to TGDs/EGDs and applying weak acyclicity. We demonstrate weak term acyclicity with two examples, and the full definition can be found at the full version.

Example 34.

Consider ={f(f(x,y),z)g(f(z,x))}. This ruleset is weakly term acyclic, with the weak term dependency graph shown in 4(a). Note however if we derive the dependencies Γ using the method in Sec. 4.3 from , Γ is not weakly acyclic.

Example 35.

Consider ={g(f(x1,y1),f(z1,x1))g(z1,f(y1,x1)),g(x2,y2)h(y2,g(y2,x2))}. This ruleset is weakly term acyclic. Its weak term dependency graph is shown in 4(b).

7 Conclusion

We have presented a semantic foundation for E-graphs and EqSat: We identified E-graphs as reachable and deterministic tree automata and defined the result of EqSat as the least fixpoint according to E-graph homomoprhisms. We defined the universal model of E-graphs and showed the fixpoint EqSat produces is the universal model (19). We showed several basic properties about E-graphs, including a finite convergence lemma (26). We then established connections between EqSat and the chase in both directions (Sec. 4) and characterize chase sequences that correspond to EqSat with EGD-fairness (29). Our main results are on the terminations of EqSat in three cases: single-instance, all-term-instance, and all-E-graph-instance. Finally, adapting ideas from weak acyclicity for the chase, we defined weak term acyclicity which implies EqSat termination.

The correspondence between EqSat and the chase established in this paper may help further port the rich results of database theory to EqSat, as the current paper only scratches the surface of the deep literatures of the chase. Another direction is to use our better understanding of EqSat to design more efficient and expressive EqSat tools and better support downstream applications of EqSat. Finally, many problems about EqSat are still open. For example, the exact upper bound of the all-E-graph-instance termination is not known. Other problems include rule scheduling, evaluation algorithm, and E-graph extraction.

References

  • [1] Franz Baader and Tobias Nipkow. Term Rewriting and All That. Cambridge University Press, USA, 1999.
  • [2] Leo Bachmair, Ashish Tiwari, and Laurent Vigneron. Abstract congruence closure. J. Autom. Reason., 31(2):129–168, 2003. doi:10.1023/B:JARS.0000009518.26415.49.
  • [3] Michael Benedikt, George Konstantinidis, Giansalvatore Mecca, Boris Motik, Paolo Papotti, Donatello Santoro, and Efthymia Tsamoura. Benchmarking the chase. In Proceedings of the 36th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems, PODS ’17, pages 37–52, New York, NY, USA, 2017. Association for Computing Machinery. doi:10.1145/3034786.3034796.
  • [4] Samuel Coward, George A. Constantinides, and Theo Drane. Automating constraint-aware datapath optimization using e-graphs. 2023 60th ACM/IEEE Design Automation Conference (DAC), pages 1–6, 2023. URL: https://api.semanticscholar.org/CorpusID:257353847.
  • [5] Alin Deutsch, Alan Nash, and Jeff Remmel. The chase revisited. In Proceedings of the Twenty-Seventh ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems, PODS ’08, pages 149–158, New York, NY, USA, 2008. Association for Computing Machinery. doi:10.1145/1376916.1376938.
  • [6] RisingLight developers. RisingLight: An Educational OLAP Database System, December 2022. URL: https://github.com/risinglightdb/risinglight.
  • [7] Peter J. Downey, Ravi Sethi, and Robert Endre Tarjan. Variations on the common subexpression problem. Journal of the ACM, 27(4):758–771, October 1980. doi:10.1145/322217.322228.
  • [8] Ronald Fagin, Phokion G. Kolaitis, Renée J. Miller, and Lucian Popa. Data exchange: semantics and query answering. Theoretical Computer Science, 336(1):89–124, 2005. Database Theory. doi:10.1016/j.tcs.2004.10.033.
  • [9] Thomas Genet. Termination criteria for tree automata completion. Journal of Logical and Algebraic Methods in Programming, 85(1, Part 1):3–33, 2016. Rewriting Logic and its Applications. doi:10.1016/j.jlamp.2015.05.003.
  • [10] Tomasz Gogacz and Jerzy Marcinkowski. All–instances termination of chase is undecidable. In Javier Esparza, Pierre Fraigniaud, Thore Husfeldt, and Elias Koutsoupias, editors, Automata, Languages, and Programming, pages 293–304, Berlin, Heidelberg, 2014. Springer Berlin Heidelberg. doi:10.1007/978-3-662-43951-7_25.
  • [11] Gösta Grahne and Adrian Onet. Anatomy of the chase. Fundam. Informaticae, 157(3):221–270, 2018. doi:10.3233/FI-2018-1627.
  • [12] Sumit Gulwani, Ashish Tiwari, and George C. Necula. Join algorithms for the theory of uninterpreted functions. In Kamal Lodaya and Meena Mahajan, editors, FSTTCS 2004: Foundations of Software Technology and Theoretical Computer Science, pages 311–323, Berlin, Heidelberg, 2005. Springer Berlin Heidelberg.
  • [13] Pavol Hell and Jaroslav Nesetril. Images of rigid digraphs. Eur. J. Comb., 12(1):33–42, 1991. doi:10.1016/S0195-6698(13)80005-4.
  • [14] Rajeev Joshi, Greg Nelson, and Keith Randall. Denali: A goal-directed superoptimizer. SIGPLAN Not., 37(5):304–314, May 2002. doi:10.1145/543552.512566.
  • [15] Donald E. Knuth. A generalization of dijkstra’s algorithm. Inf. Process. Lett., 6(1):1–5, 1977. doi:10.1016/0020-0190(77)90002-3.
  • [16] Dexter Kozen. On the myhill-nerode theorem for trees. Bulletin of the EATCS, 47:170–173, 1992.
  • [17] Dexter Kozen. Partial Automata and Finitely Generated Congruences: An Extension of Nerode’s Theorem, pages 490–511. Birkhäuser Boston, Boston, MA, 1993. doi:10.1007/978-1-4612-0325-4_16.
  • [18] Bruno Marnette. Generalized schema-mappings: From termination to tractability. In Proceedings of the Twenty-Eighth ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems, PODS ’09, pages 13–22, New York, NY, USA, 2009. Association for Computing Machinery. doi:10.1145/1559795.1559799.
  • [19] Leonardo Moura and Nikolaj Bjørner. Efficient e-matching for smt solvers. In Proceedings of the 21st International Conference on Automated Deduction: Automated Deduction, CADE-21, pages 183–198, Berlin, Heidelberg, 2007. Springer-Verlag. doi:10.1007/978-3-540-73595-3_13.
  • [20] Chandrakana Nandi, Max Willsey, Adam Anderson, James R. Wilcox, Eva Darulova, Dan Grossman, and Zachary Tatlock. Synthesizing structured CAD models with equality saturation and inverse transformations. In Proceedings of the 41st ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI 2020, pages 31–44, New York, NY, USA, 2020. Association for Computing Machinery. doi:10.1145/3385412.3386012.
  • [21] Paliath Narendran, Colm Ó’Dúnlaing, and Heinrich Rolletschek. Complexity of certain decision problems about congruential languages. Journal of Computer and System Sciences, 30(3):343–358, 1985. doi:10.1016/0022-0000(85)90051-0.
  • [22] Charles Gregory Nelson. Techniques for Program Verification. PhD thesis, Stanford University, Stanford, CA, USA, 1980. AAI8011683.
  • [23] Hung Q. Ngo, Christopher Ré, and Atri Rudra. Skew strikes back: new developments in the theory of join algorithms. SIGMOD Rec., 42(4):5–16, 2013. doi:10.1145/2590989.2590991.
  • [24] Pavel Panchekha, Alex Sanchez-Stern, James R. Wilcox, and Zachary Tatlock. Automatically improving accuracy for floating point expressions. SIGPLAN Not., 50(6):1–11, June 2015. doi:10.1145/2813885.2737959.
  • [25] Maximilian Schleich, Amir Shaikhha, and Dan Suciu. Optimizing tensor programs on flexible storage, 2022. doi:10.48550/arXiv.2210.06267.
  • [26] Wayne Snyder. A fast algorithm for generating reduced ground rewriting systems from a set of ground equations. J. Symb. Comput., 15(4):415–450, April 1993. doi:10.1006/jsco.1993.1029.
  • [27] Dan Suciu, Yisu Remy Wang, and Yihong Zhang. Semantic foundations of equality saturation, 2025. arXiv:2501.02413.
  • [28] Ross Tate, Michael Stepp, Zachary Tatlock, and Sorin Lerner. Equality saturation: A new approach to optimization. In Proceedings of the 36th Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, POPL ’09, pages 264–276, New York, NY, USA, 2009. ACM. doi:10.1145/1480881.1480915.
  • [29] Alexa VanHattum, Rachit Nigam, Vincent T. Lee, James Bornholt, and Adrian Sampson. Vectorization for Digital Signal Processors via Equality Saturation, pages 874–886. Association for Computing Machinery, New York, NY, USA, 2021. doi:10.1145/3445814.3446707.
  • [30] Yisu Remy Wang, Shana Hutchison, Jonathan Leang, Bill Howe, and Dan Suciu. SPORES: Sum-product optimization via relational equality saturation for large scale linear algebra. Proceedings of the VLDB Endowment, 2020.
  • [31] Yisu Remy Wang, Mahmoud Abo Khamis, Hung Q Ngo, Reinhard Pichler, and Dan Suciu. Optimizing recursive queries with program synthesis. arXiv preprint arXiv:2202.10390, 2022. arXiv:2202.10390.
  • [32] Max Willsey, Chandrakana Nandi, Yisu Remy Wang, Oliver Flatt, Zachary Tatlock, and Pavel Panchekha. Egg: Fast and extensible equality saturation. Proc. ACM Program. Lang., 5(POPL), January 2021. doi:10.1145/3434304.
  • [33] Yichen Yang, Phitchaya Mangpo Phothilimtha, Yisu Remy Wang, Max Willsey, Sudip Roy, and Jacques Pienaar. Equality saturation for tensor graph superoptimization. In Proceedings of Machine Learning and Systems, 2021. arXiv:2101.01332.
  • [34] Yihong Zhang, Yisu Remy Wang, Oliver Flatt, David Cao, Philip Zucker, Eli Rosenthal, Zachary Tatlock, and Max Willsey. Better together: Unifying datalog and equality saturation. Proc. ACM Program. Lang., 7(PLDI), June 2023. doi:10.1145/3591239.
  • [35] Yihong Zhang, Yisu Remy Wang, Max Willsey, and Zachary Tatlock. Relational e-matching. Proc. ACM Program. Lang., 6(POPL), January 2022. doi:10.1145/3498696.