
The Parameterized Complexity of Learning Monadic Second-Order Logic

Steffen van Bergerem, Humboldt-Universität zu Berlin, Germany; Martin Grohe, RWTH Aachen University, Germany; Nina Runde, RWTH Aachen University, Germany
Abstract

Within the model-theoretic framework for supervised learning introduced by Grohe and Turán (TOCS 2004), we study the parameterized complexity of learning concepts definable in monadic second-order logic (MSO). We show that the problem of learning an MSO-definable concept from a training sequence of labeled examples is fixed-parameter tractable on graphs of bounded clique-width, and that it is hard for the parameterized complexity class para-NP on general graphs.

It turns out that an important distinction to be made is between 1-dimensional and higher-dimensional concepts, where the instances of a k-dimensional concept are k-tuples of vertices of a graph. For the higher-dimensional case, we give a learning algorithm that is fixed-parameter tractable in the size of the graph, but not in the size of the training sequence, and we give a hardness result showing that this is optimal. By comparison, in the 1-dimensional case, we obtain an algorithm that is fixed-parameter tractable in both.

Keywords and phrases:
monadic second-order definable concept learning, agnostic probably approximately correct learning, parameterized complexity, clique-width, fixed-parameter tractable, Boolean classification, supervised learning, monadic second-order logic
Funding:
Steffen van Bergerem: This work was funded by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) – project number 431183758.
Martin Grohe: Funded by the European Union (ERC, SymSim, 101054974). Views and opinions expressed are however those of the author(s) only and do not necessarily reflect those of the European Union or the European Research Council. Neither the European Union nor the granting authority can be held responsible for them.
Nina Runde: This work was funded by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) – project number 453349072.
Copyright and License:
© Steffen van Bergerem, Martin Grohe, and Nina Runde; licensed under Creative Commons License CC-BY 4.0
2012 ACM Subject Classification:
Theory of computation → Logic; Theory of computation → Complexity theory and logic; Theory of computation → Fixed parameter tractability; Computing methodologies → Logical and relational learning; Computing methodologies → Supervised learning
Related Version:
Full Version: https://arxiv.org/abs/2309.10489 [10]
Editors:
Jörg Endrullis and Sylvain Schmitz

1 Introduction

We study abstract machine-learning problems in a logical framework with a declarative view on learning, where the (logical) specification of concepts is separated from the choice of specific machine-learning models and algorithms (such as neural networks). Here we are concerned with the computational complexity of learning problems in this logical learning framework, that is, the descriptive complexity of learning [8].

Specifically, we consider Boolean classification problems that can be specified in monadic second-order logic (MSO). The input elements for the classification task come from a set 𝕏, the instance space. A classifier on 𝕏 is a function c : 𝕏 → {+,−}. Given a training sequence S of labeled examples (x,λ) ∈ 𝕏 × {+,−}, we want to find a classifier, called a hypothesis, that explains the labels given in S and that can also be used to predict the labels of elements from 𝕏 not given as examples. In the logical setting, the instance space 𝕏 is a set of tuples from a (relational) structure, called the background structure, and classifiers are described by formulas of some logic, in our case MSO, using parameters from the background structure. This model-theoretic learning framework was introduced by Grohe and Turán [33] and further studied in [30, 32, 31, 7, 9, 11, 8].

We study these problems within the following well-known settings from computational learning theory. In the consistent-learning model, the examples are assumed to be generated using an unknown classifier, the target concept, from a known concept class. The task is to find a hypothesis that is consistent with the training sequence S, i.e. a function h : 𝕏 → {+,−} such that h(x) = λ for all (x,λ) ∈ S. In Haussler’s model of agnostic probably approximately correct (PAC) learning [35], a generalization of Valiant’s PAC learning model [50], an (unknown) probability distribution 𝒟 on 𝕏 × {+,−} is assumed, and training examples are drawn independently from this distribution. The goal is to find a hypothesis that generalizes well, i.e. one is interested in algorithms that return with high probability a hypothesis with a small expected error on new instances drawn from the same distribution. For more background on PAC learning, we refer to [37, 41, 46]. In both settings, we require our algorithms to return a hypothesis from a predefined hypothesis class.

Our Contributions

In this paper, we study the parameterized complexity of the consistent-learning problem MSO-Consistent-Learn and the PAC-learning problem MSO-PAC-Learn. In both problems, we are given a graph G (the background structure) and a sequence of labeled training examples of the form (v¯,λ), where v¯ is a k-tuple of vertices from G and λ ∈ {+,−}. The goal is to find a hypothesis of the form hφ,w¯ for an MSO formula φ(x¯;y¯) and a tuple w¯ with hφ,w¯(v¯) = + if G ⊨ φ(v¯;w¯) and hφ,w¯(v¯) = − otherwise. For MSO-Consistent-Learn, this hypothesis should be consistent with the given training examples. For MSO-PAC-Learn, the hypothesis should generalize well. We restrict the complexity of allowed hypotheses by giving a bound q on the quantifier rank of φ and a bound ℓ on the length of w¯. Both q and ℓ, as well as the dimension k of the problem, that is, the length of the tuples to classify, are part of the parameterization of the problems. A detailed description of MSO-Consistent-Learn is given in Section 3. The problem MSO-PAC-Learn is formally introduced in Section 5.

Example 1.1.

Assume we are given the graph G depicted in Figure 1, the training sequence S = ((v1,+),(v3,+),(v4,−),(v5,−)), and k = 1, ℓ = 1, q = 3. Note that k = 1 indicates that the instances are vertices of the input graph G. Furthermore, ℓ = 1 indicates that the specification may involve one vertex of the input graph as a parameter. Finally, q = 3 indicates that the formula specifying the hypothesis must have quantifier rank at most 3.

Our choice of a hypothesis h : V(G) → {+,−} consistent with S says that “there is a bipartite partition of the graph such that all positive instances x are on the same side as v2 and all negative examples are on the other side.” This hypothesis can be formally specified in MSO as hφ,w¯ for the MSO formula φ(x;y) = ∃Z(ψbipartite(Z) ∧ Z(x) ∧ Z(y)) and parameter setting w¯ = (v2), where ψbipartite(Z) = ∀z1∀z2(E(z1,z2) → ¬(Z(z1) ↔ Z(z2))).

Figure 1: Graph G for Example 1.1. Positive examples are shown in purple, and negative examples are shown in orange.
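
To make the hypothesis from Example 1.1 concrete, the following sketch evaluates hφ,w¯ for this particular φ: it checks whether the graph admits a proper bipartition with the instance x and the parameter vertex on the same side. The graph representation and all function names are ours, and the adjacency list at the end is only a hypothetical stand-in chosen to be consistent with S, since the edges of Figure 1 are not reproduced here.

```python
from collections import deque

def two_coloring(adj):
    """BFS 2-coloring; maps each vertex to (component root, color in {0, 1}),
    or returns None if the graph contains an odd cycle (i.e. is not bipartite)."""
    color = {}
    for start in adj:
        if start in color:
            continue
        color[start] = (start, 0)
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in color:
                    color[v] = (color[u][0], 1 - color[u][1])
                    queue.append(v)
                elif color[v][1] == color[u][1]:
                    return None
    return color

def h(adj, x, w):
    """Hypothesis of Example 1.1: '+' iff some bipartition Z of the whole graph
    contains both x and the parameter vertex w (here w = v2)."""
    col = two_coloring(adj)
    if col is None:
        return "-"                                   # no bipartition exists at all
    (cx, px), (cw, pw) = col[x], col[w]
    return "+" if cx != cw or px == pw else "-"      # separate components can be flipped freely

# Hypothetical adjacency list standing in for Figure 1 (the real edges are in the figure).
adj = {"v1": ["v4"], "v2": ["v4", "v5"], "v3": [], "v4": ["v1", "v2"], "v5": ["v2"]}
S = [("v1", "+"), ("v3", "+"), ("v4", "-"), ("v5", "-")]
print(all(h(adj, v, "v2") == label for v, label in S))   # consistency of h with S
```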

For the 1-dimensional case of MSO-Consistent-Learn, called 1D-MSO-Consistent-Learn, [31, 30] gave algorithms that run in time sublinear in the size of the background structure after a linear-time pre-processing stage for the case that the background structure is a string or a tree. This directly implies that 1D-MSO-Consistent-Learn can be solved in time f(ℓ,q)·n for some function f, that is, in fixed-parameter linear time, if the background structure is a string or a tree. Here n is the size of the background structure and ℓ, q are the parameters of the learning problem described above. We generalize the results to labeled graphs of bounded clique-width. Graphs of clique-width c can be described by a c-expression, that is, an expression in a certain graph grammar that only uses c labels (see Section 2.1 for details). In our algorithmic results for graphs of bounded clique-width, we always assume that the graphs are given in the form of a c-expression. We treat c as just another parameter of our algorithms. By the results of Oum and Seymour [44], we can always compute a 2^𝒪(c)-expression for a graph of clique-width c by a fixed-parameter tractable algorithm.

Theorem 1.2.

Let 𝒞 be a class of labeled graphs of bounded clique-width. Then 1D-MSO-Consistent-Learn is fixed-parameter linear on 𝒞.

Since graphs of bounded tree-width also have bounded clique-width, our result directly implies fixed-parameter linearity on graph classes of bounded tree-width.

Our proof for Theorem 1.2 relies on the model-checking techniques due to Courcelle, Makowsky, and Rotics for graph classes of bounded clique-width [23]. To make use of them, we encode the training examples into the graph as new labels. While this construction works for k=1, it fails for higher dimensions if there are too many examples to encode.

As far as we are aware, all previous results for learning MSO formulas are restricted to the one-dimensional case of the problem. We give the first results for k>1, presenting two different approaches that yield tractability results in higher dimensions.

As we discuss in Section 5, for the PAC-learning problem MSO-PAC-Learn in higher dimensions, we can restrict the number of examples to consider to a constant. In this way, we obtain fixed-parameter tractability results for learning MSO-definable concepts in higher dimensions, similar to results for first-order logic on nowhere dense classes [9, 8].

Theorem 1.3.

Let 𝒞 be a class of labeled graphs of bounded clique-width. Then MSO-PAC-Learn is fixed-parameter linear on 𝒞.

In the second approach to higher-dimensional tractability, and as the main result of this paper, we show in Section 6 that a consistent hypothesis can be learned on graphs of bounded clique-width with a quadratic running time in terms of the size of the graph.

Theorem 1.4.

There is a function g : ℕ⁵ → ℕ such that, for a Λ-labeled graph G of clique-width cw(G) ≤ c and a training sequence S of size |S| = m, the problem MSO-Consistent-Learn can be solved in time

𝒪((m+1)^g(c,|Λ|,q,k,ℓ) · |V(G)|²).

While this is not strictly a fixed-parameter tractability result, since we usually do not consider m to be part of the parameterization, we show in Section 7 that this bound is optimal. Technically, this result is much more challenging than Theorems 1.3 and 1.2. While we still use an overall dynamic-programming strategy that involves computing MSO types, here we need to consider MSO types over sequences of tuples. The number of such sequence types is not constantly bounded, but exponential in the length of the sequence. The core of our argument is to prove that the number of relevant types can be polynomially bounded. This fundamentally distinguishes our approach from typical MSO/automata arguments, where types are from a bounded set (and they correspond to the states of a finite automaton).

Lastly, we study MSO-Consistent-Learn on arbitrary classes of labeled graphs. Analogously to the hardness of learning FO-definable concepts and the relation to the FO-model-checking problem discussed in [9], we are interested specifically in the relation of MSO-Consistent-Learn to the MSO-model-checking problem MSO-Mc. We show that MSO-Mc can already be reduced to the 1-dimensional case of MSO-Consistent-Learn, even with a training sequence of size two. This yields the following hardness result that we prove in Section 4.

Theorem 1.5.

1D-MSO-Consistent-Learn is para-NP-hard under fpt Turing reductions.

Related Work

The model-theoretic learning framework studied in this paper was introduced in [33]. There, the authors give information-theoretic learnability results for hypothesis classes that can be defined using first-order and monadic second-order logic on restricted classes of structures.

Algorithmic aspects of the framework were first studied in [32], where it was proved that concepts definable in first-order logic can be learned in time polynomial in the degree of the background structure and the number of labeled examples the algorithm receives as input, independently of the size of the background structure. This was generalized to first-order logic with counting [7] and with weight aggregation [11]. On structures of polylogarithmic degree, the results yield learning algorithms running in time sublinear in the size of the background structure. It was shown in [31, 7] that sublinear-time learning is no longer possible if the degree is unrestricted. To address this issue, in [31], it was proposed to introduce a preprocessing phase where, before seeing any labeled examples, the background structure is converted to a data structure that supports sublinear-time learning later. This model was applied to monadic second-order logic on strings [31] and trees [30].

The parameterized complexity of learning first-order logic was first studied in [9]. Via a reduction from the model-checking problem, the authors show that on arbitrary relational structures, learning hypotheses definable in FO is AW[∗]-hard. In contrast to this, they show that the problem is fixed-parameter tractable on nowhere dense graph classes. This result has been extended to nowhere dense structure classes in [8]. Although not stated as fpt results, the results in [31, 30] yield fixed-parameter tractability for learning MSO-definable concepts on strings and trees if the problem is restricted to the 1-dimensional case where the tuples to classify are single vertices.

The logical learning framework is related to, but different from the framework of inductive logic programming (see, e. g., [21, 42, 43]), which may be viewed as the classical logic-learning framework. In the database literature, there are various approaches to learning queries from examples [6, 5, 34, 36, 38, 13, 49, 1, 2, 14, 48, 17]. Many of these are concerned with active learning scenarios, whereas we are in a statistical learning setting. Moreover, most of the results are concerned with conjunctive queries or queries outside the relational database model, whereas we focus on monadic second-order logic. Another related subject in the database literature is the problem of learning schema mappings from examples [3, 15, 18, 19, 29]. In formal verification, related logical learning frameworks [20, 25, 28, 40, 52] have been studied as well. In algorithmic learning theory, related works study the parameterized complexity of several learning problems [4, 39] including, quite recently, learning propositional CNF and DNF formulas and learning solutions to graph problems in the PAC setting [16].

2 Preliminaries

We let ℕ denote the set of non-negative integers. For m,n ∈ ℕ, we let [m,n] ≔ {i ∈ ℕ ∣ m ≤ i ≤ n} and [n] ≔ [1,n]. For a set V, we let 2^V ≔ {V′ ∣ V′ ⊆ V}.

2.1 Clique-Width

In this paper, graphs are always undirected and simple (no loops or parallel edges); we view them as {E}-structures for a binary relation symbol E, and we denote the set of vertices of a graph G by V(G). A label set is a set Λ of unary relation symbols, and a Λ-graph or Λ-labeled graph is the expansion of a graph to the vocabulary {E} ∪ Λ. A labeled graph is a Λ-graph for any label set Λ.

In the following, we define expressions to represent labeled graphs. A base graph is a labeled graph of order 1. For every base graph G, we introduce a base expression β that represents G. Moreover, we have the following operations.

Disjoint union:

For disjoint Λ-graphs G1,G2, we define G1 ⊎ G2 to be the union of G1 and G2. If G1 and G2 are not disjoint, then G1 ⊎ G2 is undefined.

Adding edges:

For a Λ-graph G and unary relation symbols P,Q ∈ Λ with P ≠ Q, we let ηP,Q(G) be the Λ-graph obtained from G by adding an edge between every pair of distinct vertices v ∈ P(G), w ∈ Q(G). That is, E(ηP,Q(G)) ≔ E(G) ∪ {(v,w),(w,v) ∣ v ∈ P(G), w ∈ Q(G), v ≠ w}.

Relabeling:

For a Λ-graph G and unary relation symbols P,Q ∈ Λ with P ≠ Q, we let ρP,Q(G) be the Λ-graph obtained from G by relabeling all vertices in P by Q, that is, V(ρP,Q(G)) ≔ V(G), P(ρP,Q(G)) ≔ ∅, Q(ρP,Q(G)) ≔ Q(G) ∪ P(G), and R(ρP,Q(G)) ≔ R(G) for all R ∈ Λ ∖ {P,Q}.

Deleting labels:

For a Λ-graph G and a unary relation symbol P ∈ Λ, we let δP(G) be the restriction of G to Λ ∖ {P}, that is, the (Λ ∖ {P})-graph obtained from G by removing the relation P(G).

We also introduce a modification of the disjoint-union operator, namely the ordered-disjoint-union operator ⊎<, which is used in Section 6 to simplify notations.

Ordered disjoint union:

To introduce this operator, we need two distinguished unary relation symbols P1< and P2<. For disjoint Λ-graphs G1,G2, where we assume P1<,P2< ∉ Λ, we let G1 ⊎< G2 be the (Λ ∪ {P1<,P2<})-expansion of the disjoint union G1 ⊎ G2 with P1<(G1 ⊎< G2) ≔ V(G1) and P2<(G1 ⊎< G2) ≔ V(G2). By deleting the relations P1<,P2< immediately after introducing them in an ordered disjoint union, we can simulate a standard disjoint union by an ordered disjoint union, that is, G1 ⊎ G2 = δP1<(δP2<(G1 ⊎< G2)).

A Λ-expression is a term formed from base expressions β, whose label set is a subset of Λ, using unary operators ηP,Q, ρP,Q, δP for P,Q ∈ Λ with P ≠ Q, and the binary operators ⊎ and ⊎<. We require Λ-expressions to be well-formed, that is, all base expressions represent base graphs with mutually distinct vertices, and the label sets fit the operators.

Every Λ-expression Ξ describes a Λ′-graph GΞ for some Λ′ ⊆ Λ. Note that there is a one-to-one correspondence between the base expressions in Ξ and the vertices of GΞ. Actually, we may simply identify the vertices of GΞ with the base expressions in Ξ. We let VΞ ≔ V(GΞ) be the set of these base expressions. We may then view an expression Ξ as a tree where VΞ is the set of leaves of this tree. We let |Ξ| be the number of nodes of the tree. We have |GΞ| = |VΞ| ≤ |Ξ|. In general, we cannot bound |Ξ| in terms of |GΞ|, but for every Λ-expression Ξ, we can find a Λ-expression Ξ′ such that GΞ′ = GΞ and |Ξ′| ∈ 𝒪(|Λ|² · |GΞ|).

Each subexpression Ξ′ of Ξ describes a labeled graph GΞ′ on a subset VΞ′ ⊆ VΞ consisting of all base expressions in Ξ′. Note that, in general, GΞ′ is not a subgraph of GΞ.

For c ∈ ℕ, a c-expression is a Λ-expression for a label set Λ of size |Λ| = c. It is easy to see that every labeled graph of order n is described by an n-expression. The clique-width cw(G) of a (labeled) graph G is the least c such that G is described by a c-expression.

We remark that our notion of clique-width differs slightly from the one given by Courcelle and Olariu [24], since we allow vertices to have multiple labels, and we also allow the deletion of labels. Thus, our definition is similar to the definition of multi-clique-width [27]. However, for our algorithmic results, the definitions are equivalent, since we have cw(G) ≤ cw′(G) and cw′(G) ≤ 2^𝒪(cw(G)) for every (labeled) graph G, where cw′ denotes the notion of clique-width from [24].
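
To make the graph grammar above concrete, here is a minimal sketch (in Python; the graph representation and all names are ours) that applies the operators bottom-up and evaluates an expression to the labeled graph it describes. The small example at the end builds the path a–b–c with the two labels A and B only; the label deletion allowed by our definition is what makes two labels suffice here.

```python
def base(v, labels=()):
    """Base expression: a single vertex v carrying the given labels."""
    return {"V": {v}, "E": set(), "lab": {P: {v} for P in labels}}

def union(G1, G2):
    """Disjoint union; undefined (here: an assertion error) on overlapping vertex sets."""
    assert not (G1["V"] & G2["V"])
    lab = {P: G1["lab"].get(P, set()) | G2["lab"].get(P, set())
           for P in set(G1["lab"]) | set(G2["lab"])}
    return {"V": G1["V"] | G2["V"], "E": G1["E"] | G2["E"], "lab": lab}

def eta(G, P, Q):
    """eta_{P,Q}: add an edge between every P-labeled and every Q-labeled vertex."""
    new = {frozenset((v, w)) for v in G["lab"].get(P, set())
                             for w in G["lab"].get(Q, set()) if v != w}
    return {"V": set(G["V"]), "E": G["E"] | new, "lab": dict(G["lab"])}

def rho(G, P, Q):
    """rho_{P,Q}: relabel all P-vertices to Q (P becomes empty)."""
    lab = dict(G["lab"])
    lab[Q] = lab.get(Q, set()) | lab.get(P, set())
    lab[P] = set()
    return {"V": set(G["V"]), "E": set(G["E"]), "lab": lab}

def delta(G, P):
    """delta_P: delete the label P."""
    return {"V": set(G["V"]), "E": set(G["E"]),
            "lab": {R: S for R, S in G["lab"].items() if R != P}}

# The path a-b-c from a 2-expression: create edge a-b, forget a's label, attach c to b.
G = eta(union(base("a", ("A",)), base("b", ("B",))), "A", "B")
G = delta(G, "A")
G = eta(union(G, base("c", ("A",))), "A", "B")
print(sorted(map(sorted, G["E"])))   # [['a', 'b'], ['b', 'c']]
```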

Lemma 2.1 ([44]).

For a graph G with n vertices and clique-width c ≔ cw(G), there is an algorithm that outputs a c′-expression for G, where c′ = 2^(3c+2) − 1. The algorithm has a running time of 𝒪(n⁹ log n).

2.2 Monadic Second-Order Logic

We consider monadic second-order (MSO) logic, which is a fragment of second-order logic where we only quantify over unary relations (sets). In MSO, we consider two kinds of free variables, which we call set variables (uppercase X,Y,Xi) and individual variables (lowercase x,y,xi). The quantifier rank qr(φ) of a formula φ is the nesting depth of its quantifiers.

Let τ be a relational vocabulary and q ∈ ℕ. By MSO(τ,q), we denote the set of all MSO formulas of quantifier rank at most q using only relation symbols in τ, and we let MSO(τ) ≔ ⋃_{q∈ℕ} MSO(τ,q). By MSO(τ,q,k,s), we denote the set of all MSO(τ,q) formulas with free individual variables in {x1,…,xk} and free set variables in {X1,…,Xs}. In particular, MSO(τ,q,0,0) denotes the set of sentences. Moreover, it will be convenient to separate the free individual variables into instance variables (x1,x2,…) and parameter variables (y1,y2,…). For this, we let MSO(τ,q,k,ℓ,s) denote the set of all MSO(τ,q) formulas with free instance variables in {x1,…,xk}, free parameter variables in {y1,…,yℓ}, and free set variables in {X1,…,Xs}. Furthermore, we write φ(x¯,y¯,X¯) to denote that the formula φ has its free instance variables among the entries of x¯, its free parameter variables among the entries of y¯, and its free set variables among the entries of X¯.

We normalize formulas such that the set of normalized formulas in MSO(τ,q,k,,s) is finite, and there is an algorithm that, given an arbitrary formula in MSO(τ,q,k,,s), decides if the formula is normalized, and if not, computes an equivalent normalized formula. In the following, we assume that all formulas are normalized.

In this paper, all structures we consider will be labeled graphs for some label set Λ. In notations such as MSO(τ,…), it will be convenient to write MSO(Λ,…) if τ = {E} ∪ Λ. For a Λ-labeled graph G and a tuple v¯ ∈ (V(G))^k, the q-type of v¯ in G is the set 𝗍𝗉qG(v¯) of all formulas φ(x¯) ∈ MSO(Λ,q,k,0) such that G ⊨ φ(v¯).

2.3 VC Dimension

For q,k,ℓ ∈ ℕ, a formula φ(x¯,y¯) ∈ MSO(Λ,q,k,ℓ,0), a Λ-labeled graph G, and a tuple w¯ ∈ (V(G))^ℓ, we let

φ(G,w¯) ≔ {v¯ ∈ (V(G))^k ∣ G ⊨ φ(v¯,w¯)}.

For a set X ⊆ (V(G))^k, we let

Hφ(G,X) ≔ {X ∩ φ(G,w¯) ∣ w¯ ∈ (V(G))^ℓ}.

We say that X is shattered by φ if Hφ(G,X) = 2^X. The VC dimension VC(φ,G) of φ on G is the maximum d such that there is a set X ⊆ (V(G))^k of cardinality |X| = d that is shattered by φ. In this paper, we are only interested in finite graphs, but for infinite G, we let VC(φ,G) ≔ ∞ if the maximum does not exist. For a class 𝒞 of Λ-labeled graphs, the VC dimension of φ over 𝒞, VC(φ,𝒞), is the least d such that VC(φ,G) ≤ d for all G ∈ 𝒞 if such a d exists, and ∞ otherwise.
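
For intuition, shattering and the VC dimension can be checked by brute force on small instances. In the sketch below (Python; names ours, exponential and meant only for illustration), the ground set plays the role of (V(G))^k and the family the role of {φ(G,w¯) ∣ w¯ ∈ (V(G))^ℓ}.

```python
from itertools import combinations

def shatters(family, X):
    """X is shattered iff all 2^|X| subsets of X occur as S ∩ X for some S in the family."""
    return len({frozenset(S & X) for S in family}) == 2 ** len(X)

def vc_dimension(ground_set, family):
    """Largest d such that some d-element subset of the ground set is shattered."""
    d = 0
    for size in range(1, len(ground_set) + 1):
        if any(shatters(family, set(X)) for X in combinations(ground_set, size)):
            d = size     # some set of this size is shattered; try larger sizes
        else:
            break        # no set of this size is shattered, hence none of any larger size
    return d

# Singletons over a 3-element ground set shatter sets of size 1 but not of size 2.
print(vc_dimension({1, 2, 3}, [set(), {1}, {2}, {3}]))   # 1
```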

Lemma 2.2 ([33, Theorem 17]).

There is a function g : ℕ⁵ → ℕ such that the following holds. Let Λ be a label set, let c ∈ ℕ, let 𝒞 be the class of all Λ-graphs of clique-width at most c, and let q,k,ℓ ∈ ℕ. Then VC(φ,𝒞) ≤ g(c,|Λ|,q,k,ℓ) for all φ ∈ MSO(Λ,q,k,ℓ,0).

2.4 Parameterized Complexity

A parameterization κ is a function mapping the input x of a problem to a natural number κ(x). An algorithm 𝔸 is an fpt algorithm with respect to κ if there is a computable function f : ℕ → ℕ and a polynomial p such that, for every input x, the running time of 𝔸 is at most f(κ(x)) · p(|x|).

A parameterized problem is a tuple (Q,κ). We say (Q,κ) ∈ FPT or (Q,κ) is fixed-parameter tractable if there is an fpt algorithm with respect to κ for Q, and we say (Q,κ) is fixed-parameter linear if the polynomial in the running time of the fpt algorithm is linear. We say (Q,κ) ∈ para-NP if there is a nondeterministic fpt algorithm with respect to κ for Q. If the parameterization is clear from the context, then we omit it.

For two parameterized problems (Q,κ) and (Q′,κ′), an fpt Turing reduction from (Q,κ) to (Q′,κ′) is an algorithm 𝔸 with oracle access to Q′ such that 𝔸 decides Q, 𝔸 is an fpt algorithm with respect to κ, and there is a computable function g : ℕ → ℕ such that, on input x, we have κ′(x′) ≤ g(κ(x)) for all oracle queries with oracle input x′.

For additional background on parameterized complexity, we refer to [26].

3 Tractability for One-Dimensional Training Data on Well-Behaved Classes

We start by formalizing the parameterized version of the problem MSO-Consistent-Learn described in the introduction. For a training sequence S, a graph G, and a hypothesis hφ,w¯, we say hφ,w¯ is consistent with S on G if, for every positive example (v¯,+) ∈ S, we have G ⊨ φ(v¯,w¯), and, for every negative example (v¯,−) ∈ S, we have G ⊭ φ(v¯,w¯).

MSO-Consistent-Learn
Instance: Λ-labeled graph G, q,k,ℓ ∈ ℕ, training sequence S ∈ ((V(G))^k × {+,−})^m
Parameter: κ ≔ |Λ| + q + k + ℓ
Problem: Return a hypothesis hφ,w¯ consisting of a formula φ ∈ MSO(Λ,q,k,ℓ,0) and a parameter setting w¯ ∈ (V(G))^ℓ such that hφ,w¯ is consistent with the training sequence S on G, if such a hypothesis exists. Reject if there is no consistent hypothesis.

The problem 1D-MSO-Consistent-Learn refers to the 1-dimensional version of the problem MSO-Consistent-Learn where the arity k of the training examples is 1. The tractability results for 1D-MSO-Consistent-Learn are significantly more straightforward than those for the higher-dimensional problem. This is due to the fact that the full training sequence can be encoded into the graph by only adding two new labels, and a parameter setting can be encoded with ℓ more new labels.

As discussed in Section 2.2, there is a function f : ℕ⁴ → ℕ such that |MSO(Λ,q,k,ℓ,0)| ≤ f(|Λ|,q,k,ℓ). Therefore, to solve 1D-MSO-Consistent-Learn, we can iterate over all formulas φ ∈ MSO(Λ,q,k,ℓ,0) and focus on finding a parameter setting w¯ ∈ (V(G))^ℓ such that hφ,w¯ is consistent with S on G. Moreover, if the model-checking problem on a graph class with additional labels is tractable, then finding a consistent parameter setting is tractable as well by performing model checking on the graph with the encoded training sequence.
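
A sketch of this reduction for k = 1 (Python; the graph representation, the purely string-level handling of formulas, and the function mso_model_check are ours — the latter is an assumed black box standing in for any model checker that is fixed-parameter tractable on the class at hand, not an implemented routine):

```python
def encode_examples(G, S):
    """Copy of G with two fresh labels marking the positive and negative examples (k = 1)."""
    H = {"V": set(G["V"]), "E": set(G["E"]), "lab": dict(G["lab"])}
    H["lab"]["P_pos"] = {v for v, label in S if label == "+"}
    H["lab"]["P_neg"] = {v for v, label in S if label == "-"}
    return H

def consistency_sentence(phi, ell):
    """MSO sentence (as a plain string) stating that some parameter setting makes phi
    consistent with the encoded examples; phi is a string with free variables x, y1..yell."""
    ys = ", ".join(f"y{j}" for j in range(1, ell + 1))
    return (f"exists {ys}: forall x: "
            f"((P_pos(x) -> ({phi})) and (P_neg(x) -> not ({phi})))")

def learn_1d(G, S, formulas, ell, mso_model_check):
    """Iterate over the finitely many normalized formulas and return one that admits a
    consistent parameter setting; a witness can then be extracted as in Lemma 3.1."""
    H = encode_examples(G, S)
    for phi in formulas:
        if mso_model_check(H, consistency_sentence(phi, ell)):
            return phi
    return None   # no consistent hypothesis exists
```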

Lemma 3.1.

Let 𝒞 be a class of labeled graphs, let 𝒞i be the class of all extensions of graphs from 𝒞 by i additional labels for all i ∈ ℕ, let f : ℕ → ℕ be a function, and let c ∈ ℕ such that the MSO-model-checking problem on 𝒞i can be solved in time f(|φ|) · |V(G)|^c for all i ∈ ℕ, where φ is the MSO sentence and G ∈ 𝒞i is the labeled graph given as input. There is a function g : ℕ³ → ℕ such that 1D-MSO-Consistent-Learn can be solved on 𝒞 in time g(|Λ|,q,ℓ) · |V(G)|^(c+1).

The formal proof of this result can be found in the full version [10]. If the input graph is given in the form of a c-expression, then the MSO model-checking problem is fixed-parameter linear on classes of bounded clique-width [23]. Therefore, Lemma 3.1 implies that there is a function g : ℕ⁴ → ℕ such that 1D-MSO-Consistent-Learn can be solved in time g(cw(G),|Λ|,q,ℓ) · |G|².

Theorem 1.2 improves this bound for classes of graphs of bounded clique-width even further, showing that the problem 1D-MSO-Consistent-Learn can be solved in time linear in the size of the graph. This can be done by again encoding the training sequence into the graph, but then extracting a consistent parameter setting directly, following techniques similar to the ones used by Courcelle and Seese for the corresponding model-checking problem on graphs of bounded clique-width. The full proof of Theorem 1.2 can be found in [10].

Since graphs of tree-width ct have a clique-width of at most 3 · 2^(ct−1) [22], Theorem 1.2 implies that for classes of graphs of bounded tree-width, 1D-MSO-Consistent-Learn is fixed-parameter linear as well. Moreover, although all background structures we consider in this paper are labeled graphs, we remark that the result for classes of bounded tree-width also holds on arbitrary relational structures and a corresponding version of 1D-MSO-Consistent-Learn.

4 Hardness for One-Dimensional Training Data

Previously, we restricted the input graph of the MSO-learning problem to certain well-behaved classes. Now, we consider the problem MSO-Consistent-Learn without any restrictions. Van Bergerem, Grohe, and Ritzert showed in [9] that there is a close relation between first-order model checking (FO-Mc) and learning first-order formulas. The fpt-reduction in [9] from model checking to learning yields AW[∗]-hardness for learning first-order formulas on classes of structures that are not nowhere dense. It is simple to show (and not surprising) that MSO-Consistent-Learn is at least as hard as FO-Mc. The more interesting question is whether MSO-Consistent-Learn is at least as hard as the model-checking problem for MSO sentences (MSO-Mc), which is defined as follows.

MSO-Mc
Instance: Λ-labeled graph G, MSO(Λ) sentence φ
Parameter: |φ|
Problem: Decide whether G ⊨ φ holds.

We give a positive answer, which even holds for the MSO-learning problem with only one-dimensional training data where we restrict the training sequence to contain at most two training examples.

Lemma 4.1.

The model-checking problem MSO-Mc is fpt Turing reducible to 1D-MSO-Consistent-Learn where we restrict the training sequence S given as input to have length at most 2.

MSO-Mc is para-NP-hard under fpt Turing reductions as even for some fixed sentence φ, the corresponding model-checking problem can be NP-hard (for example for a formula defining 3-Colorability, see [26] for details). Hence, Lemma 4.1 proves Theorem 1.5. We give a proof sketch for Lemma 4.1. The full proof can be found in [10].

Proof sketch of Lemma 4.1.

We describe an fpt algorithm solving MSO-Mc using access to a 1D-MSO-Consistent-Learn oracle. Let G be a Λ-labeled graph, and let φ be an MSO(Λ) sentence. We decide whether G ⊨ φ holds recursively by decomposing the input formula. While handling negation and Boolean connectives is easy, the crucial part of the computation is handling quantification. Thus, we assume that φ = ∃x ψ or φ = ∃X ψ for some MSO formula ψ. For both types of quantifiers, we use the 1D-MSO-Consistent-Learn oracle to identify a small set of candidate vertices or sets such that ψ holds for some vertex or set if and only if it holds for one of the identified candidates. Then, since the number of candidates will only depend on |ψ|, we can check recursively whether ψ holds for any of them and thereby decide MSO-Mc with an fpt algorithm.

More specifically, using the 1D-MSO-Consistent-Learn oracle, we partition the vertices and sets based on their qr(ψ)-type, and we only check one candidate for each class of the partition. Intuitively, for every pair of vertices, we call the oracle with one vertex as a positive example and the other vertex as a negative example. The oracle returns a hypothesis if and only if the types of the two vertices differ. When partitioning the sets of vertices, we encode the two sets to check into the graph before calling the oracle. Moreover, instead of calling the oracle for every pair of sets (which would lead to a running time that is exponential in the size of the graph), we group the sets based on their size, start by finding a small family of candidate sets of size 1, and we use these candidates iteratively to find a small family of sets with one additional vertex. With this technique, the number of oracle calls is only quadratic in the size of the graph.
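
The vertex case of this candidate search can be sketched as follows (Python; oracle stands in for the assumed 1D-MSO-Consistent-Learn oracle, which returns a consistent hypothesis or None, and the helper names are ours). Since the hypothesis class is closed under negation, a single oracle call per pair suffices to compare types.

```python
def same_type(G, u, v, oracle):
    """u and v have the same qr(psi)-type iff no hypothesis in the class separates them,
    i.e. iff the oracle finds no hypothesis consistent with ((u, +), (v, -))."""
    return oracle(G, [(u, "+"), (v, "-")]) is None

def candidate_vertices(G, oracle):
    """One representative vertex per type class; an existential quantifier over vertices
    only needs to be checked on these candidates."""
    reps = []
    for v in G["V"]:
        if not any(same_type(G, v, r, oracle) for r in reps):
            reps.append(v)
    return reps
```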

5 PAC Learning in Higher Dimensions

So far, we considered the consistent-learning setting, where the goal is to return a hypothesis that is consistent with the given examples. In this section, we study the MSO-learning problem in the agnostic PAC-learning setting. There, for an instance space 𝕏, we assume an (unknown) probability distribution 𝒟 on 𝕏 × {+,−}. The learner's goal is to find a hypothesis h : 𝕏 → {+,−}, using an oracle to draw training examples randomly from 𝒟, such that h (approximately) minimizes the generalization error

err𝒟(h) ≔ Pr_(x,λ)∼𝒟 (h(x) ≠ λ).

For every Λ-labeled graph G and q,k,ℓ ∈ ℕ, let ℋq,k,ℓ(G) be the hypothesis class

ℋq,k,ℓ(G) ≔ {hφ,w¯ ∣ φ ∈ MSO(Λ,q,k,ℓ,0), w¯ ∈ (V(G))^ℓ}.

Formally, we define the MSO PAC-learning problem as follows.

MSO-PAC-Learn
Instance: Λ-labeled graph G, numbers k,ℓ,q ∈ ℕ, δ,ε ∈ (0,1), oracle access to a probability distribution 𝒟 on (V(G))^k × {+,−}
Parameter: κ ≔ |Λ| + k + ℓ + q + 1/δ + 1/ε
Problem: Return a hypothesis hφ,w¯ ∈ ℋq,k,ℓ(G) such that, with probability of at least 1 − δ over the choice of examples drawn i.i.d. from 𝒟, it holds that
err𝒟(hφ,w¯) ≤ min_{h ∈ ℋq,k,ℓ(G)} err𝒟(h) + ε.

The remainder of this section is dedicated to the proof of Theorem 1.3, that is, we want to show that MSO-PAC-Learn is fixed-parameter linear on classes of bounded clique-width when the input graph is given as a c-expression. To solve the problem algorithmically, we can follow the Empirical Risk Minimization (ERM) rule [51, 46], that is, our algorithm should minimize the training error (or empirical risk)

errS(h) ≔ (1/|S|) · |{(v¯,λ) ∈ S ∣ h(v¯) ≠ λ}|

on the training sequence S of queried examples. Roughly speaking, an algorithm can solve MSO-PAC-Learn by querying a sufficient number of examples and then following the ERM rule. To bound the number of needed examples, we combine a fundamental result of statistical learning [12, 46], which bounds the number of needed examples in terms of the VC dimension of a hypothesis class, with Lemma 2.2, a result due to Grohe and Turán [33], which bounds the VC dimension of MSO-definable hypothesis classes on graphs of bounded clique-width. Together, they imply the following result. See the full version [10] for details.

Lemma 5.1.

There is a computable function m : ℕ⁵ × (0,1)² → ℕ such that any algorithm that proceeds as follows solves the problem MSO-PAC-Learn. Given a Λ-labeled graph G of clique-width at most c, numbers k,ℓ,q ∈ ℕ, δ,ε ∈ (0,1), and oracle access to a probability distribution 𝒟 on (V(G))^k × {+,−}, the algorithm queries at least m(c,|Λ|,q,k,ℓ,δ,ε) many examples from 𝒟 and then follows the ERM rule.

Using this lemma, we can now give a proof sketch for Theorem 1.3, showing that MSO-PAC-Learn is fixed-parameter linear on classes of bounded clique-width if the input graph is given as a c-expression, even for dimensions k>1. The full proof can be found in [10].

Proof sketch of Theorem 1.3.

Let G be a Λ-labeled graph, let k,ℓ,q ∈ ℕ and δ,ε ∈ (0,1), and assume we are given oracle access to a probability distribution 𝒟 on (V(G))^k × {+,−}. Moreover, let c ∈ ℕ, and let Ξ be a c-expression given as input that describes G.

Let s ≔ m(c,|Λ|,q,k,ℓ,δ,ε), where m is the function from Lemma 5.1. We sample s examples from 𝒟 and call the resulting sequence of training examples S. Then, for every subsequence S′ of S, we make use of the techniques in Section 3 (adapted to higher-dimensional training data) and compute a hypothesis hS′ ∈ ℋq,k,ℓ that is consistent with S′ if such a hypothesis exists. This can be computed by an fpt-algorithm with parameters c, |Λ|, k, ℓ, q, δ, and ε. Finally, from all subsequences with a consistent hypothesis, we choose a subsequence S′ of maximum length and return hS′. Note that this procedure minimizes the training error on the training sequence S, i.e., errS(hS′) = min_{h ∈ ℋq,k,ℓ} errS(h). Hence, the procedure follows the ERM rule, and, by Lemma 5.1, it solves MSO-PAC-Learn. Since the number of subsequences to check can be bounded by 2^s, all in all, the described procedure is an fpt-algorithm that solves MSO-PAC-Learn.
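
A sketch of this ERM procedure (Python; find_consistent stands in for the adapted Section 3 routine and is an assumed black box that returns a hypothesis consistent with a given subsequence, or None; the names are ours):

```python
from itertools import combinations

def erm_hypothesis(S, find_consistent):
    """Minimize the training error on S by returning a hypothesis that is consistent
    with a longest possible subsequence of S."""
    for size in range(len(S), 0, -1):                  # try the largest subsequences first
        for positions in combinations(range(len(S)), size):
            h = find_consistent([S[i] for i in positions])
            if h is not None:
                return h                               # training error is (len(S) - size) / len(S)
    return find_consistent([])                         # the empty subsequence is always consistent
```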

6 Consistent Learning in Higher Dimensions

In this section, we consider the problem MSO-Consistent-Learn with dimension k > 1 on labeled graphs of bounded clique-width. A brute-force attempt yields a solution in time g(c,|Λ|,q,k,ℓ) · |V(G)|^(ℓ+1) · m, where m is the length of the training sequence, for some g : ℕ⁵ → ℕ. This is achieved by iterating over all formulas, then iterating over all parameter assignments, and then performing model checking for each training example. We assume that the graph G is considerably larger in scale than the sequence of training examples S. Therefore, Theorem 1.4 significantly improves the running time to 𝒪((m+1)^g(c,|Λ|,q,k,ℓ) · |V(G)|²). While Theorem 1.4 is not a fixed-parameter tractability result in the classical sense, we show that this is optimal in Section 7. The present section is dedicated to the proof of Theorem 1.4.

Until now, we have viewed the training sequence as a sequence of tuples S ∈ ((V(G))^k × {+,−})^m. In the following, it is useful to split the training sequence into two parts, a sequence of vertex tuples 𝒶 ∈ ((V(G))^k)^m and a function σ : [m] → {+,−} which assigns the corresponding label to each tuple. Let G be a Λ-labeled graph, 𝒶 = (v¯1,…,v¯m) ∈ ((V(G))^k)^m, σ : [m] → {+,−}, and φ(x¯,y¯) ∈ MSO(Λ,q,k,ℓ,0). We call (G,𝒶,σ) φ-consistent if there is a parameter setting w¯ ∈ (V(G))^ℓ such that, for all i ∈ [m], G ⊨ φ(v¯i,w¯) ⟺ σ(i) = +. We say that w¯ is a φ-witness for (G,𝒶,σ). This notation allows us to state the main technical ingredient for the proof of Theorem 1.4 as follows.

Theorem 6.1.

There is a computable function g : ℕ⁵ → ℕ and an algorithm that, given a Λ-graph G of clique-width cw(G) ≤ c, a sequence 𝒶 = (v¯1,…,v¯m) ∈ ((V(G))^k)^m, a function σ : [m] → {+,−}, and a formula φ(x¯,y¯) ∈ MSO(Λ,q,k,ℓ,0), decides if (G,𝒶,σ) is φ-consistent in time (m+1)^g(c,|Λ|,q,k,ℓ) · |G|.

Using Theorem 6.1, we can now prove Theorem 1.4.

Proof of Theorem 1.4.

Given a Λ-labeled graph G and a training sequence S = ((v¯1,λ1),…,(v¯m,λm)) ∈ ((V(G))^k × {+,−})^m, we let 𝒶 ≔ (v¯1,…,v¯m) ∈ ((V(G))^k)^m and σ : [m] → {+,−}, i ↦ λi. We iterate over all formulas φ ∈ MSO(Λ,q,k,ℓ,0) and use Theorem 6.1 to check whether (G,𝒶,σ) is φ-consistent. If there is no φ such that (G,𝒶,σ) is φ-consistent, then we reject the input. Otherwise, let φ ∈ MSO(Λ,q,k,ℓ,0) be such that (G,𝒶,σ) is φ-consistent. We compute a φ-witness following the same construction as in the proof of Lemma 3.1. That is, using a fresh label, we encode the parameter choice of a single variable into the graph, and then we check whether consistency still holds for the corresponding formula φ′ that enforces this parameter choice. In total, we perform up to ℓ · |V(G)| such consistency checks to compute a φ-witness w¯. The consistent formula φ together with the φ-witness w¯ can then be returned as a hypothesis hφ,w¯ that is consistent with S on G and therefore a solution to MSO-Consistent-Learn.
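
In code, the top-level loop of this proof looks roughly as follows (Python; is_consistent stands in for the algorithm of Theorem 6.1 and pin_parameter for the fresh-label construction from the proof of Lemma 3.1 — both are assumed black boxes, and all names are ours):

```python
def mso_consistent_learn(G, S, formulas, ell, is_consistent, pin_parameter):
    """Return a consistent hypothesis (phi, w) or None if none exists."""
    a = [v for v, _ in S]                  # the sequence of example tuples
    sigma = [label for _, label in S]      # their labels
    for phi in formulas:                   # finitely many normalized formulas
        if not is_consistent(G, a, sigma, phi):
            continue
        w, H, psi = [], G, phi
        for j in range(ell):               # fix the parameters one vertex at a time
            for v in H["V"]:
                H2, psi2 = pin_parameter(H, psi, j, v)   # mark v, force y_j onto the mark
                if is_consistent(H2, a, sigma, psi2):
                    w.append(v)
                    H, psi = H2, psi2
                    break
        return phi, w                      # h_{phi, w} is consistent with S on G
    return None
```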

The remainder of this section is dedicated to the proof of Theorem 6.1. We start by introducing the formal definitions for types and sets of types over sequences of elements.

6.1 Type Definitions

Let G be a Λ-labeled graph and v¯ ∈ (V(G))^k. Recall that the q-type of v¯ in G is the set 𝗍𝗉qG(v¯) of all formulas φ(x¯) ∈ MSO(Λ,q,k,0) such that G ⊨ φ(v¯). A (Λ,q,k)-type is a set θ ⊆ MSO(Λ,q,k,0) such that, for each φ ∈ MSO(Λ,q,k,0), either φ ∈ θ or ¬φ ∈ θ. We denote the set of all (Λ,q,k)-types by 𝖳𝗉(Λ,q,k). Note that 𝗍𝗉qG(v¯) ∈ 𝖳𝗉(Λ,q,k). For a type θ ∈ 𝖳𝗉(Λ,q,k), we write G ⊨ θ(v¯) if G ⊨ φ(v¯) for all φ(x¯) ∈ θ. Observe that G ⊨ θ(v¯) ⟺ 𝗍𝗉qG(v¯) = θ. We say that a type θ ∈ 𝖳𝗉(Λ,q,k) is realizable if there is some Λ-labeled graph G and tuple v¯ ∈ (V(G))^k such that θ = 𝗍𝗉qG(v¯). We are not particularly interested in types that are not realizable, but it is undecidable if a type θ is realizable, whereas the sets 𝖳𝗉(Λ,q,k) are decidable. (More precisely, there is an algorithm that, given Λ, q, k and a set θ of formulas, decides if θ ∈ 𝖳𝗉(Λ,q,k).) For a (Λ,q,k)-type θ and a Λ-labeled graph G, we let

θ(G) ≔ {v¯ ∈ (V(G))^k ∣ G ⊨ θ(v¯)}.

If θ(G) ≠ ∅, we say that θ is realizable in G.

As for formulas, we split the variables for types into two parts, so we consider (Λ,q,k,ℓ)-types θ ⊆ MSO(Λ,q,k,ℓ,0), and we denote the set of all these types by 𝖳𝗉(Λ,q,k,ℓ). For a Λ-labeled graph G and tuples v¯ ∈ (V(G))^k, w¯ ∈ (V(G))^ℓ, we often think of 𝗍𝗉qG(v¯,w¯) as the q-type of w¯ over v¯ in G. Moreover, we let

θ(v¯,G) ≔ {w¯ ∈ (V(G))^ℓ ∣ G ⊨ θ(v¯,w¯)}.

If θ(v¯,G) ≠ ∅, we say that θ is realizable over v¯ in G.

For a vector k¯ = (k1,…,km) ∈ ℕ^m and a set V, we let V^k¯ be the set of all sequences 𝒶 = (v¯1,…,v¯m) of tuples v¯i ∈ V^ki. Let G be a labeled graph, 𝒶 = (v¯1,…,v¯m) ∈ (V(G))^k¯ for some k¯ ∈ ℕ^m, and w¯ ∈ (V(G))^ℓ. We define the q-type of w¯ over 𝒶 in G to be the tuple

𝗍𝗉qG(𝒶,w¯) ≔ (𝗍𝗉qG(v¯1,w¯),…,𝗍𝗉qG(v¯m,w¯)).

Again, we need an “abstract” notion of type over a sequence. A (Λ,q,k¯,ℓ)-type for a tuple k¯ = (k1,…,km) ∈ ℕ^m is an element of

𝖳𝗉(Λ,q,k¯,ℓ) ≔ ∏_{i=1}^{m} 𝖳𝗉(Λ,q,ki,ℓ).

Let θ¯ = (θ1,…,θm) ∈ 𝖳𝗉(Λ,q,k¯,ℓ). For a labeled graph G, a sequence 𝒶 = (v¯1,…,v¯m) ∈ (V(G))^k¯, and a tuple w¯ ∈ (V(G))^ℓ, we write G ⊨ θ¯(𝒶,w¯) if G ⊨ θi(v¯i,w¯) for all i ∈ [m]. Note that G ⊨ θ¯(𝒶,w¯) ⟺ 𝗍𝗉qG(𝒶,w¯) = θ¯. For a type θ¯ ∈ 𝖳𝗉(Λ,q,k¯,ℓ), a Λ-labeled graph G, and a sequence 𝒶 ∈ (V(G))^k¯, we let

θ¯(𝒶,G) ≔ {w¯ ∈ (V(G))^ℓ ∣ G ⊨ θ¯(𝒶,w¯)}.

If θ¯(𝒶,G) ≠ ∅, we say that θ¯ is realizable over 𝒶 in G.

6.2 Computing the Realizable Types

For the proof of Theorem 6.1, we use the following result that allows us to compute the realizable types of an expression.

Lemma 6.2.

There is a computable function f : ℕ⁴ → ℕ and an algorithm that, given c,q,k,ℓ,m ∈ ℕ, a vector k¯ = (k1,…,km) ∈ ℕ^m with ki ≤ k for all i ∈ [m], a Λ-expression Ξ with |Λ| ≤ c, and a sequence 𝒶 ∈ (VΞ)^k¯, computes the set of all θ¯ ∈ 𝖳𝗉(Λ,q,k¯,ℓ) that are realizable over 𝒶 in GΞ in time

𝒪((m+1)^f(c,q,k,ℓ) · |Ξ|).

Before proving this result, we first show that it implies Theorem 6.1.

Proof of Theorem 6.1.

We assume that the input graph G is given as a c-expression. To check whether (G,𝒶,σ) is φ-consistent, we compute the set of all θ¯ ∈ 𝖳𝗉(Λ,q,k¯,ℓ) that are realizable over 𝒶 in G, using Lemma 6.2 (here k¯ = (k,…,k)). Then, for each θ¯ = (θ1,…,θm), we check whether φ ∈ θi ⟺ σ(i) = + for all i ∈ [m]. If we find such a θ¯, then (G,𝒶,σ) is φ-consistent; otherwise it is not.
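
As a small sketch of this final check (Python; types are represented as Python sets of formulas, and realizable_types is assumed to be the output of the Lemma 6.2 algorithm):

```python
def phi_consistent(realizable_types, phi, sigma):
    """(G, a, sigma) is phi-consistent iff some realizable type tuple contains phi in
    exactly the components whose examples are labeled positively."""
    return any(all((phi in theta_i) == (sigma[i] == "+")
                   for i, theta_i in enumerate(theta_bar))
               for theta_bar in realizable_types)
```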

It remains to prove Lemma 6.2. In a bottom-up algorithm, starting at the leaves of Ξ, we compute the set of all tuples θ¯ = (θ1,…,θm) of (Λ,q,ki,ℓ)-types θi that are realizable over 𝒶 in G. This is possible because the realizable tuples of types of an expression can be computed from the tuples of types of its subexpressions. We formally prove this for the operators ηP,Q, ρP,Q, δP, and ⊎< in the full version [10].

The difficulty with this approach is that we are talking about m-tuples of types. In general, the number of such tuples is exponential in m, and hence the size of the set we aim to compute could be exponentially large. Fortunately, this does not happen in graphs of bounded clique-width. By Lemma 2.2, we can bound the VC dimension of MSO formulas over classes of graphs of bounded clique-width. Further, we show in Lemma 6.3 that this suffices to give a polynomial bound on the number of realizable tuples.

Lemma 6.3.

Let d,q,k,ℓ ∈ ℕ, let t ≔ |𝖳𝗉(Λ,q,k,ℓ)|, and let G be a Λ-labeled graph such that VC(φ,G) ≤ d for all φ ∈ MSO(Λ,q,k,ℓ,0). Let 𝒶 ∈ (V(G))^k¯ for some k¯ ∈ {0,…,k}^m. Then at most (k+1) · g(d,m) · t types in 𝖳𝗉(Λ,q,k¯,ℓ) are realizable over 𝒶 in G.

The proof of Lemma 6.3 is based on the Sauer–Shelah Lemma [45, 47]. See [10] for proof details. Based on this, we can now prove Lemma 6.2.
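
For orientation, the Sauer–Shelah Lemma bounds the number of distinct traces that a set family ℋ of VC dimension at most d can leave on an m-element set X, and this is where the polynomial dependence on m comes from; the bound g(d,m) in Lemma 6.3 can be read against this estimate (our reading, not a restatement of the full-version proof):

```latex
% Sauer–Shelah: a family of VC dimension at most d leaves at most this many
% distinct traces on an m-element set X, which is O(m^d) for fixed d.
\bigl|\{\, S \cap X : S \in \mathcal{H} \,\}\bigr|
  \;\le\; \sum_{i=0}^{d} \binom{m}{i}
  \;\le\; (m+1)^{d}
```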

Proof of Lemma 6.2.

As argued in Section 2, we may assume that Ξ only contains ordered-disjoint-union operators and no plain disjoint-union operators.

For every subexpression Ξ′ of Ξ, we let ΛΞ′ be the set of labels of Ξ′, that is, the set of unary relation symbols such that GΞ′ is a ΛΞ′-graph. Moreover, let 𝒶Ξ′ ≔ 𝒶 ∩ VΞ′, and let k¯Ξ′ ∈ ℕ^m such that 𝒶Ξ′ ∈ (VΞ′)^k¯Ξ′.

We inductively construct, for every subexpression Ξ′ of Ξ and every ℓ′ with 0 ≤ ℓ′ ≤ ℓ, the set ℛℓ′(Ξ′) of all types θ¯ ∈ 𝖳𝗉(ΛΞ′,q,k¯Ξ′,ℓ′) that are realizable over 𝒶Ξ′ in GΞ′.

Case 1: Ξ′ is a base expression.

In this case, for each ℓ′ ∈ [0,ℓ], we can construct ℛℓ′(Ξ′) by brute force in time f1(c,q,k,ℓ) · m for a suitable (computable) function f1. Let (k1,…,km) ≔ k¯Ξ′. We compute θi by iterating over all formulas φ with ki + ℓ′ free variables and evaluating φ on the single-vertex graph GΞ′.

Case 2: Ξ′ = ηP,Q(Ξ′′).

Let 0 ≤ ℓ′ ≤ ℓ, and let k¯ = (k1,…,km) ≔ k¯Ξ′ = k¯Ξ′′. As we show in the full version [10], there is a computable mapping Tη,P,Q : 𝖳𝗉(ΛΞ′) → 2^𝖳𝗉(ΛΞ′′) such that ℛℓ′(Ξ′) is the set of all θ¯ = (θ1,…,θm) ∈ 𝖳𝗉(ΛΞ′,q,k¯,ℓ′) such that there is a θ¯′ = (θ′1,…,θ′m) ∈ ℛℓ′(Ξ′′) with θ′i ∈ Tη,P,Q(θi) for all i ∈ [m]. Moreover, for every realizable type θ′ ∈ 𝖳𝗉(ΛΞ′′), we guarantee that there is at most one type θ ∈ 𝖳𝗉(ΛΞ′) such that θ′ ∈ Tη,P,Q(θ). To compute the set ℛℓ′(Ξ′), we step through all θ¯′ ∈ ℛℓ′(Ξ′′). For each such θ¯′ = (θ′1,…,θ′m) and all i ∈ [m], we compute the unique θi ∈ 𝖳𝗉(ΛΞ′,q,ki,ℓ′) such that θ′i ∈ Tη,P,Q(θi). If, for some i ∈ [m], no such θi exists, we move on to the next θ¯′. Otherwise, we add θ¯ = (θ1,…,θm) to ℛℓ′(Ξ′).

Case 3: Ξ′ = ρP,Q(Ξ′′).

Analogous to Case 2, again based on results from the full version [10].

Case 4: Ξ′ = δP(Ξ′′).

Analogous to Case 2, based on results from [10].

Case 5: Ξ′ = Ξ1 ⊎< Ξ2.

Let Λ′ ≔ ΛΞ′, V′ ≔ VΞ′, k¯ = (k1,…,km) ≔ k¯Ξ′, and 𝒶′ = (v¯1,…,v¯m) ≔ 𝒶Ξ′ = 𝒶 ∩ V′. For j = 1,2, let Λj ≔ ΛΞj, Vj ≔ VΞj, k¯j = (kj1,…,kjm) ≔ k¯Ξj, and, for all i ∈ [m], let Kji ⊆ [ki] be such that v¯i ∩ Vj = (v¯i)Kji. Let 0 ≤ ℓ′ ≤ ℓ.

For all L1,L2 ⊆ [ℓ′] such that L2 = [ℓ′] ∖ L1, we let ℛL1,L2 be the set of all θ¯ = (θ1,…,θm) ∈ 𝖳𝗉(Λ′,q,k¯,ℓ′) such that, for j = 1,2, we have

θ¯j ≔ (Tℓ′,j,Lj,Kj1(θ1),…,Tℓ′,j,Lj,Kjm(θm)) ∈ ℛ|Lj|(Ξj),

where Tℓ′,j,Lj,Kji : 𝖳𝗉(Λ′) → 𝖳𝗉(Λj) for i ∈ [m] is a computable mapping that we give in the full version [10]. Then, as we also show in [10],

ℛℓ′(Ξ′) = ⋃_{L1 ⊆ [ℓ′], L2 = [ℓ′] ∖ L1} ℛL1,L2.

To compute ℛL1,L2, we iterate over all θ¯1 = (θ11,…,θ1m) ∈ ℛ|L1|(Ξ1). For all i ∈ [m], we compute the unique θi ∈ 𝖳𝗉(Λ′,q,ki,ℓ′) such that Tℓ′,1,L1,K1i(θi) = θ1i. If, for some i ∈ [m], no such θi exists, then we move on to the next θ¯1. Otherwise, we compute

θ¯2 ≔ (Tℓ′,2,L2,K21(θ1),…,Tℓ′,2,L2,K2m(θm))

and check if θ¯2 ∈ ℛ|L2|(Ξ2). If it is, we add θ¯ ≔ (θ1,…,θm) to ℛL1,L2. Otherwise, we move on to the next θ¯1.

This completes the description of our algorithm. To analyze the running time, let

r ≔ max_{Ξ′,ℓ′} |ℛℓ′(Ξ′)|,

where Ξ′ ranges over all subexpressions of Ξ and ℓ′ over [0,ℓ]. By Lemmas 2.2 and 6.3, there is a computable function f2 : ℕ⁴ → ℕ such that

r ≤ (m+1)^f2(c,q,k,ℓ).

The running time of each step of the constructions can be bounded by f3(c,q,k,ℓ) · r for a suitable computable function f3. We need to make |Ξ| steps. Thus, overall, we obtain the desired running time.
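
The overall recursion of this proof can be summarized as follows (Python; the expression-tree representation and the per-operator transfer maps collected in handlers are assumed black boxes corresponding to Cases 1–5, and all names are ours):

```python
def realizable_type_sets(xi, a, ell, handlers):
    """Compute, for every ell' <= ell, the set R_{ell'}(xi) of type tuples realizable
    over a in G_xi, by structural recursion on the expression tree xi."""
    if xi.op == "base":                                   # Case 1: brute force on one vertex
        return {l: handlers["base"](xi, a, l) for l in range(ell + 1)}
    if xi.op in ("eta", "rho", "delta"):                  # Cases 2-4: translate each inner tuple
        inner = realizable_type_sets(xi.children[0], a, ell, handlers)
        return {l: handlers[xi.op](xi, inner[l]) for l in range(ell + 1)}
    if xi.op == "ounion":                                 # Case 5: split the parameters among the halves
        left = realizable_type_sets(xi.children[0], a, ell, handlers)
        right = realizable_type_sets(xi.children[1], a, ell, handlers)
        return {l: handlers["ounion"](xi, left, right, l) for l in range(ell + 1)}
    raise ValueError(f"unknown operator: {xi.op}")
```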

7 Hardness of Checking Consistency in Higher Dimensions

The following result shows, under the assumption FPT ≠ W[1], that Theorem 6.1 cannot be improved to an fpt result.

Theorem 7.1.

There is a q ∈ ℕ such that the following parameterized problem is W[1]-hard.

Instance: graph G of clique-width at most 2, sequence 𝒶 = (a¯1,…,a¯m) ∈ ((V(G))²)^m, function σ : [m] → {+1,−1}, formula φ(x¯,y¯) ∈ MSO(Λ,q,2,ℓ,0)
Parameter: ℓ
Problem: Decide if (G,𝒶,σ) is φ-consistent.

Proof sketch.

We prove this by a reduction from the W[1]-complete weighted satisfiability problem for Boolean formulas in 2-conjunctive normal form [26]. The weight of an assignment to a set of Boolean variables is the number of variables set to 1.

WSat(2-CNF)
Instance: Boolean formula Φ in 2-CNF
Parameter: ℓ
Problem: Decide if Φ has a satisfying assignment of weight ℓ.

Given a 2-CNF formula Φ = ⋀_{i=1}^{m} (Li,1 ∨ Li,2) in the variables {X1,…,Xn} and ℓ ∈ ℕ, we construct an instance (G,𝒶,σ,φ) of the consistency problem from Theorem 7.1 where the graph G is a forest that encodes Φ. Moreover, 𝒶, φ, and σ are chosen to verify that a φ-witness w¯ ∈ (V(G))^ℓ for (G,𝒶,σ) encodes exactly ℓ variables among X1,…,Xn that can be set to 1 to satisfy Φ. Hence, there is a φ-witness for (G,𝒶,σ) if and only if Φ has a satisfying assignment of weight ℓ. Details on the construction can be found in the full version [10].

8 Conclusion

Just like model checking and the associated counting and enumeration problems, the learning problem we study here is a natural algorithmic problem for logics on finite structures. All these problems are related, but each has its own challenges requiring different techniques. Where model checking and enumeration are motivated by automated verification and database systems, we view our work as part of a descriptive complexity theory of machine learning [8].

The first problem we studied is 1D-MSO-Consistent-Learn, where the instances to classify consist of single vertices, and we extended the previous fixed-parameter tractability results for strings and trees [31, 30] to (labeled) graphs of bounded clique-width. Moreover, on general graphs, we showed that the problem is hard for the complexity class para-NP.

For MSO-learning problems in higher dimensions, we presented two different approaches that yield tractability results on graphs of bounded clique-width. For the agnostic PAC-learning problem MSO-PAC-Learn, we described a fixed-parameter tractable learning algorithm. Furthermore, in the consistent-learning setting for higher dimensions, we gave an algorithm that solves the learning problem and is fixed-parameter tractable in the size of the input graph. However, the algorithm is not fixed-parameter tractable in the size of the training sequence, and we showed that this is optimal.

In the learning problems considered so far, hypotheses are built using MSO formulas and tuples of vertices as parameters. We think that the algorithms presented in this paper for the 1-dimensional case could also be extended to hypothesis classes that allow tuples of sets as parameters. Finally, utilizing the full power of MSO, one could also consider a learning problem where, instead of classifying tuples of vertices, we are interested in classifying sets of vertices. That is, for a graph G, we are given labeled subsets of V(G) and want to find a hypothesis h : 2^V(G) → {+,−} that is consistent with the given examples. It is easy to see that the techniques used in our hardness result also apply to this modified problem, proving that it is para-NP-hard. However, it remains open whether our tractability results could also be lifted to this version of the problem.

References

  • [1] Azza Abouzied, Dana Angluin, Christos H. Papadimitriou, Joseph M. Hellerstein, and Avi Silberschatz. Learning and verifying quantified boolean queries by example. In Proceedings of the 32nd ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems, PODS 2013, New York, NY, USA, June 22–27, 2013, pages 49–60. ACM, 2013. doi:10.1145/2463664.2465220.
  • [2] Howard Aizenstein, Tibor Hegedüs, Lisa Hellerstein, and Leonard Pitt. Complexity theoretic hardness results for query learning. Comput. Complex., 7(1):19–53, 1998. doi:10.1007/PL00001593.
  • [3] Bogdan Alexe, Balder ten Cate, Phokion G. Kolaitis, and Wang Chiew Tan. Characterizing schema mappings via data examples. ACM Trans. Database Syst., 36(4):23:1–23:48, 2011. doi:10.1145/2043652.2043656.
  • [4] Vikraman Arvind, Johannes Köbler, and Wolfgang Lindner. Parameterized learnability of k-juntas and related problems. In Algorithmic Learning Theory, 18th International Conference, ALT 2007, Sendai, Japan, October 1–4, 2007, volume 4754 of Lecture Notes in Computer Science, pages 120–134. Springer, 2007. doi:10.1007/978-3-540-75225-7_13.
  • [5] Pablo Barceló, Alexander Baumgartner, Víctor Dalmau, and Benny Kimelfeld. Regularizing conjunctive features for classification. J. Comput. Syst. Sci., 119:97–124, 2021. doi:10.1016/j.jcss.2021.01.003.
  • [6] Pablo Barceló and Miguel Romero. The complexity of reverse engineering problems for conjunctive queries. In 20th International Conference on Database Theory, ICDT 2017, Venice, Italy, March 21–24, 2017, volume 68 of LIPIcs, pages 7:1–7:17. Schloss Dagstuhl – Leibniz-Zentrum für Informatik, 2017. doi:10.4230/LIPIcs.ICDT.2017.7.
  • [7] Steffen van Bergerem. Learning concepts definable in first-order logic with counting. In 34th Annual ACM/IEEE Symposium on Logic in Computer Science, LICS 2019, Vancouver, BC, Canada, June 24–27, 2019, pages 1–13. IEEE, 2019. doi:10.1109/LICS.2019.8785811.
  • [8] Steffen van Bergerem. Descriptive Complexity of Learning. PhD thesis, RWTH Aachen University, Germany, 2023. doi:10.18154/RWTH-2023-02554.
  • [9] Steffen van Bergerem, Martin Grohe, and Martin Ritzert. On the parameterized complexity of learning first-order logic. In PODS 2022: International Conference on Management of Data, Philadelphia, PA, USA, June 12–17, 2022, pages 337–346. ACM, 2022. doi:10.1145/3517804.3524151.
  • [10] Steffen van Bergerem, Martin Grohe, and Nina Runde. The parameterized complexity of learning monadic second-order logic, 2023. doi:10.48550/arXiv.2309.10489.
  • [11] Steffen van Bergerem and Nicole Schweikardt. Learning concepts described by weight aggregation logic. In 29th EACSL Annual Conference on Computer Science Logic, CSL 2021, Ljubljana, Slovenia (Virtual Conference), January 25–28, 2021, volume 183 of LIPIcs, pages 10:1–10:18. Schloss Dagstuhl – Leibniz-Zentrum für Informatik, 2021. doi:10.4230/LIPIcs.CSL.2021.10.
  • [12] Anselm Blumer, A. Ehrenfeucht, David Haussler, and Manfred K. Warmuth. Learnability and the Vapnik-Chervonenkis dimension. J. ACM, 36(4):929–965, October 1989. doi:10.1145/76359.76371.
  • [13] Angela Bonifati, Radu Ciucanu, and Aurélien Lemay. Learning path queries on graph databases. In Proceedings of the 18th International Conference on Extending Database Technology, EDBT 2015, Brussels, Belgium, March 23–27, 2015, pages 109–120. OpenProceedings.org, 2015. doi:10.5441/002/edbt.2015.11.
  • [14] Angela Bonifati, Radu Ciucanu, and Slawek Staworko. Learning join queries from user examples. ACM Trans. Database Syst., 40(4):24:1–24:38, 2016. doi:10.1145/2818637.
  • [15] Angela Bonifati, Ugo Comignani, Emmanuel Coquery, and Romuald Thion. Interactive mapping specification with exemplar tuples. ACM Trans. Database Syst., 44(3):10:1–10:44, 2019. doi:10.1145/3321485.
  • [16] Cornelius Brand, Robert Ganian, and Kirill Simonov. A parameterized theory of PAC learning. In Thirty-Seventh AAAI Conference on Artificial Intelligence, AAAI 2023, Thirty-Fifth Conference on Innovative Applications of Artificial Intelligence, IAAI 2023, Thirteenth Symposium on Educational Advances in Artificial Intelligence, EAAI 2023, Washington, DC, USA, February 7–14, 2023, pages 6834–6841. AAAI Press, 2023. doi:10.1609/aaai.v37i6.25837.
  • [17] Balder ten Cate and Víctor Dalmau. Conjunctive queries: Unique characterizations and exact learnability. In 24th International Conference on Database Theory, ICDT 2021, Nicosia, Cyprus, March 23–26, 2021, volume 186 of LIPIcs, pages 9:1–9:24. Schloss Dagstuhl – Leibniz-Zentrum für Informatik, 2021. doi:10.4230/LIPIcs.ICDT.2021.9.
  • [18] Balder ten Cate, Víctor Dalmau, and Phokion G. Kolaitis. Learning schema mappings. ACM Trans. Database Syst., 38(4):28:1–28:31, 2013. doi:10.1145/2539032.2539035.
  • [19] Balder ten Cate, Phokion G. Kolaitis, Kun Qian, and Wang-Chiew Tan. Active learning of GAV schema mappings. In Proceedings of the 37th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems, Houston, TX, USA, June 10–15, 2018, pages 355–368. ACM, 2018. doi:10.1145/3196959.3196974.
  • [20] Adrien Champion, Tomoya Chiba, Naoki Kobayashi, and Ryosuke Sato. ICE-based refinement type discovery for higher-order functional programs. J. Autom. Reason., 64(7):1393–1418, 2020. doi:10.1007/s10817-020-09571-y.
  • [21] William W. Cohen and C. David Page Jr. Polynomial learnability and inductive logic programming: Methods and results. New Gener. Comput., 13(3&4):369–409, December 1995. doi:10.1007/BF03037231.
  • [22] Derek G. Corneil and Udi Rotics. On the relationship between clique-width and treewidth. SIAM J. Comput., 34(4):825–847, 2005. doi:10.1137/S0097539701385351.
  • [23] Bruno Courcelle, Johann A. Makowsky, and Udi Rotics. Linear time solvable optimization problems on graphs of bounded clique-width. Theory of Computing Systems, 33(2):125–150, 2000. doi:10.1007/s002249910009.
  • [24] Bruno Courcelle and Stephan Olariu. Upper bounds to the clique width of graphs. Discret. Appl. Math., 101(1-3):77–114, 2000. doi:10.1016/S0166-218X(99)00184-5.
  • [25] P. Ezudheen, Daniel Neider, Deepak D’Souza, Pranav Garg, and P. Madhusudan. Horn-ICE learning for synthesizing invariants and contracts. Proc. ACM Program. Lang., 2(OOPSLA):131:1–131:25, 2018. doi:10.1145/3276501.
  • [26] Jörg Flum and Martin Grohe. Parameterized complexity theory. Texts in Theoretical Computer Science. An EATCS Series. Springer, 2006. doi:10.1007/3-540-29953-X.
  • [27] Martin Fürer. Multi-clique-width. In 8th Innovations in Theoretical Computer Science Conference, ITCS 2017, Berkeley, CA, USA, January 9–11, 2017, volume 67 of LIPIcs, pages 14:1–14:13. Schloss Dagstuhl – Leibniz-Zentrum für Informatik, 2017. doi:10.4230/LIPICS.ITCS.2017.14.
  • [28] Pranav Garg, Christof Löding, P. Madhusudan, and Daniel Neider. ICE: A robust framework for learning invariants. In Computer Aided Verification - 26th International Conference, CAV 2014, Held as Part of the Vienna Summer of Logic, VSL 2014, Vienna, Austria, July 18–22, 2014, volume 8559 of Lecture Notes in Computer Science, pages 69–87. Springer, 2014. doi:10.1007/978-3-319-08867-9_5.
  • [29] Georg Gottlob and Pierre Senellart. Schema mapping discovery from data instances. J. ACM, 57(2):6:1–6:37, 2010. doi:10.1145/1667053.1667055.
  • [30] Émilie Grienenberger and Martin Ritzert. Learning definable hypotheses on trees. In 22nd International Conference on Database Theory, ICDT 2019, Lisbon, Portugal, March 26–28, 2019, volume 127 of LIPIcs, pages 24:1–24:18. Schloss Dagstuhl – Leibniz-Zentrum für Informatik, 2019. doi:10.4230/LIPIcs.ICDT.2019.24.
  • [31] Martin Grohe, Christof Löding, and Martin Ritzert. Learning MSO-definable hypotheses on strings. In International Conference on Algorithmic Learning Theory, ALT 2017, Kyoto University, Kyoto, Japan, October 15–17, 2017, volume 76 of Proceedings of Machine Learning Research, pages 434–451. PMLR, October 2017. ISSN: 2640-3498. URL: https://proceedings.mlr.press/v76/grohe17a.html.
  • [32] Martin Grohe and Martin Ritzert. Learning first-order definable concepts over structures of small degree. In 32nd Annual ACM/IEEE Symposium on Logic in Computer Science, LICS 2017, Reykjavík, Iceland, June 20–23, 2017, pages 1–12. IEEE, June 2017. doi:10.1109/LICS.2017.8005080.
  • [33] Martin Grohe and György Turán. Learnability and definability in trees and similar structures. Theory Comput. Syst., 37(1):193–220, January 2004. doi:10.1007/s00224-003-1112-8.
  • [34] David Haussler. Learning conjunctive concepts in structural domains. Mach. Learn., 4:7–40, 1989. doi:10.1007/BF00114802.
  • [35] David Haussler. Decision theoretic generalizations of the PAC model for neural net and other learning applications. Inf. Comput., 100(1):78–150, 1992. doi:10.1016/0890-5401(92)90010-D.
  • [36] Kouichi Hirata. On the hardness of learning acyclic conjunctive queries. In Algorithmic Learning Theory, 11th International Conference, ALT 2000, Sydney, Australia, December 11–13, 2000, volume 1968 of Lecture Notes in Computer Science, pages 238–251. Springer, 2000. doi:10.1007/3-540-40992-0_18.
  • [37] Michael J. Kearns and Umesh V. Vazirani. An Introduction to Computational Learning Theory. MIT Press, 1994. URL: https://mitpress.mit.edu/books/introduction-computational-learning-theory.
  • [38] Benny Kimelfeld and Christopher Ré. A relational framework for classifier engineering. ACM Trans. Database Syst., 43(3):11:1–11:36, 2018. doi:10.1145/3268931.
  • [39] Yuanzhi Li and Yingyu Liang. Learning mixtures of linear regressions with nearly optimal complexity. In Conference On Learning Theory, COLT 2018, Stockholm, Sweden, July 6–9, 2018, volume 75 of Proceedings of Machine Learning Research, pages 1125–1144. PMLR, 2018. URL: http://proceedings.mlr.press/v75/li18b.html.
  • [40] Christof Löding, P. Madhusudan, and Daniel Neider. Abstract learning frameworks for synthesis. In Tools and Algorithms for the Construction and Analysis of Systems - 22nd International Conference, TACAS 2016, Held as Part of the European Joint Conferences on Theory and Practice of Software, ETAPS 2016, Eindhoven, The Netherlands, April 2–8, 2016, volume 9636 of Lecture Notes in Computer Science, pages 167–185. Springer, 2016. doi:10.1007/978-3-662-49674-9_10.
  • [41] Mehryar Mohri, Afshin Rostamizadeh, and Ameet Talwalkar. Foundations of Machine Learning. Adaptive computation and machine learning. MIT Press, 2nd edition, 2018.
  • [42] Stephen H. Muggleton. Inductive logic programming. New Gener. Comput., 8(4):295–318, February 1991. doi:10.1007/BF03037089.
  • [43] Stephen H. Muggleton and Luc De Raedt. Inductive logic programming: Theory and methods. J. Log. Program., 19/20:629–679, 1994. doi:10.1016/0743-1066(94)90035-3.
  • [44] Sang-il Oum and Paul D. Seymour. Approximating clique-width and branch-width. J. Comb. Theory, Ser. B, 96(4):514–528, 2006. doi:10.1016/j.jctb.2005.10.006.
  • [45] Norbert Sauer. On the density of families of sets. J. Comb. Theory, Ser. A, 13(1):145–147, 1972. doi:10.1016/0097-3165(72)90019-2.
  • [46] Shai Shalev-Shwartz and Shai Ben-David. Understanding Machine Learning - From Theory to Algorithms. Cambridge University Press, Cambridge, 2014. doi:10.1017/CBO9781107298019.
  • [47] Saharon Shelah. A combinatorial problem: stability and order for models and theories in infinitary languages. Pacific Journal of Mathematics, 41(1):247–261, 1972. doi:10.2140/pjm.1972.41.247.
  • [48] Robert H. Sloan, Balázs Szörényi, and György Turán. Learning Boolean functions with queries. In Boolean Models and Methods in Mathematics, Computer Science, and Engineering, pages 221–256. Cambridge University Press, 2010. doi:10.1017/cbo9780511780448.010.
  • [49] Slawek Staworko and Piotr Wieczorek. Learning twig and path queries. In 15th International Conference on Database Theory, ICDT 2012, Berlin, Germany, March 26–29, 2012, pages 140–154. ACM, 2012. doi:10.1145/2274576.2274592.
  • [50] Leslie G. Valiant. A theory of the learnable. Commun. ACM, 27(11):1134–1142, November 1984. doi:10.1145/1968.1972.
  • [51] Vladimir Vapnik. Principles of risk minimization for learning theory. In Advances in Neural Information Processing Systems 4, [NIPS Conference, Denver, Colorado, USA, December 2–5, 1991], pages 831–838, 1991. URL: https://proceedings.neurips.cc/paper_files/paper/1991/file/ff4d5fbbafdf976cfdc032e3bde78de5-Paper.pdf.
  • [52] He Zhu, Stephen Magill, and Suresh Jagannathan. A data-driven CHC solver. In Proceedings of the 39th ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI 2018, Philadelphia, PA, USA, June 18–22, 2018, pages 707–721. ACM, 2018. doi:10.1145/3192366.3192416.