
Elements for Weighted Answer-Set Programming

Francisco Coelho (corresponding author), NOVA-LINCS, University of Évora, Portugal; Bruno Dinis, CIMA, University of Évora, Portugal; Dietmar Seipel, Universität Würzburg, Germany; Salvador Abreu, NOVA-LINCS, University of Évora, Portugal
Abstract

Logic programs, more specifically answer-set programs, can be annotated with probabilities on facts to express uncertainty. We address the problem of propagating weight annotations on facts (e.g. probabilities) of an answer-set program to its stable models, and from there to events (defined as sets of atoms) in a dataset over the program’s domain.

We propose a novel approach which is algebraic in the sense that it relies on an equivalence relation over the set of events. Uncertainty is then described as polynomial expressions over variables.

We propagate the weight function in the space of models and events, rather than doing so within the syntax of the program. As evidence that our approach is sound, we show that certain facts behave as expected. Our approach allows us to investigate weight annotated programs and to determine how suitable a given one is for modeling a given dataset containing events. Its core is illustrated by a running example and the encoding of a Bayesian network.

Keywords and phrases:
Answer-Set Programming, Stable Models, Probabilistic Logic Programming
Copyright and License:
© Francisco Coelho, Bruno Dinis, Dietmar Seipel, and Salvador Abreu; licensed under Creative Commons License CC-BY 4.0
2012 ACM Subject Classification:
Theory of computation Program semantics
Related Version:
Full Version: https://arxiv.org/abs/2503.20849
Acknowledgements:
The authors are grateful to Lígia Henriques-Rodrigues, Matthias Knorr and Ricardo Gonçalves for valuable comments on a preliminary version of this paper, and Alice Martins for contributions on software development.
Funding:
This work is supported by UID/04516/NOVA Laboratory for Computer Science and Informatics (NOVA LINCS) with the financial support of FCT.IP.
Editors:
Jorge Baptista and José Barateiro

1 Introduction

Using a logic program (LP) to model and reason over a real world scenario is often difficult because of uncertainty underlying the problem being worked on. Classic LPs represent knowledge in precise and complete terms, which turns out to be problematic when the scenario is characterized by stochastic or observability factors. We aim to explore how answer-set programs plus weight annotated facts can lead to useful characterizations for this class of problems. To set up a working framework, we make the following assumption:

Assumption 1 (System Representation, Data and Propagation).

Consider a system whose states are partially observable (i.e. observations can miss some state information) or stochastic (i.e. observed values are affected by random noise). We assume that knowledge about such a system features a formal specification including weighted facts and empirical data such that:

Representation

The system has a formal representation (we use “representation” instead of “model” to avoid confusion with the stable models of answer-set programs) in the form of a certain logic program; the program’s stable models correspond one-to-one with the system states.

Data

Data is a set of observations; a single observation (of the system states) results from a set of (boolean) sensors.

Propagation

The weights in facts are propagated to the stable models of the representation.

In this setting, data from observations can be used to estimate some parameters used in the propagation process and, more importantly, to address the question of “How accurate is the representation of the system?”. Other probabilistic logic programming (PLP) systems such as Problog [7], P-log [4] or LPMLN [15], in line with Kifer and Subrahmanian [13], derive probability distributions from program syntax, limiting them to prior information. We question such methods and address stable model (SM) uncertainty by including parameters for unknown quantities, estimable with data. To frame this setting we extend our assumptions:

Assumption 2 (Sensor Representation and Events).

 

Sensors

The sensors of the system’s states are associated to some atoms in the representation;

Events

An event is a set of atoms from the representation.

Following the terminology set by Calimeri et al. [5], sensors activate atoms (a) whereas no activation is represented by default negation (naf), ∼a. Negated atoms (¬a) work similarly. By assumption 2, events can represent hidden or faulty sensors, as in {a, ¬a, ∼b, ∼¬b, ∼c, ¬c}, where a and ¬a are both activated (a faulty reading), b is hidden, and ¬c is consistently activated. Not all events are observations, and some may not uniquely determine system states; how to associate events to stable models and, thus, to system states, is addressed in Section 4.

If we (i) omit the naf-literals, (ii) use x̄ to denote the classical ¬x, and (iii) use expressions like ab to denote sets of literals such as {a, b}, then the event {a, ¬a, ∼b, ∼¬b, ∼c, ¬c} can be shortened to the equivalent form a ā c̄. We follow the convention of denoting a model by its "true" atoms (positive or negative), with "falsehood" resulting from default negation [11]. Models include atoms such as a or b̄ but not naf-literals such as ∼a. Sensor input is represented using positive and negative atoms, coming either from different sensors or from a single sensor yielding positive/negative values.

Weights, not probabilities, are used to define a weight function on SMs, which is then extended to all events. The step from facts to SMs is non-deterministic, and parameters represent the non-unique choices, estimable with data. ASP systems, based on the stable model semantics, use SAT solving [10, 1, 19] or top-down search [2, 3, 18]. The distribution semantics (DS) [25, 23] extends logic programs with probabilistic reasoning. We are particularly interested in the following setting and application scenarios of such an extension to logic programs:

Partial observability

A system’s state can have hidden variables, not reported by the sensors.

Sensor error

Information gathered from the sensors can carry stochastic perturbations.

Representation induction

Combine representations and data to induce more accurate representations.

Probabilistic tasks

Support common probabilistic tasks such as maximum a posteriori (MAP), maximum likelihood estimation (MLE) and Bayesian inference (BI) on the representation domain, i.e. the set of all events.

Probabilistic tasks and simple examples
Maximum a posteriori (MAP)

MAP estimates model parameters by combining prior beliefs with data likelihood to find the most probable values:

θ̂_MAP = argmax_θ P(θ ∣ X)

For a biased coin with heads probability θ following a Beta distribution, observing 7 heads in 10 flips, the MAP estimate of θ maximizes the posterior distribution combining the prior Beta and the likelihood of the observations.

Maximum likelihood estimation (MLE)

MLE estimates model parameters by maximizing the likelihood of observed data:

θ̂_MLE = argmax_θ P(X ∣ θ)

For 7 heads in 10 flips, the MLE estimate of the probability of heads is θ=0.7.

Bayesian inference (BI)

Bayesian Inference updates hypothesis probabilities by combining prior beliefs with observed data to produce a new probability distribution.

P(θ ∣ X) = P(X ∣ θ) P(θ) / P(X)

Bayesian Inference updates a Beta prior (α,β) with 7 heads and 3 tails from 10 flips, resulting in a Beta posterior (α+7,β+3).
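As a concrete companion to these three tasks, the following minimal Python sketch reproduces the coin numbers used above; the Beta(2, 2) prior is our own assumption, since the text only says "a Beta distribution".

```python
from math import isclose

heads, tails = 7, 3          # the observations: 7 heads in 10 flips
alpha, beta = 2, 2           # assumed Beta(2, 2) prior; the text only says "a Beta distribution"

# MLE: maximize the binomial likelihood P(X | theta).
theta_mle = heads / (heads + tails)                                    # 0.7

# BI: with a Beta prior the posterior is again a Beta distribution.
posterior = (alpha + heads, beta + tails)                              # Beta(9, 5)

# MAP: the mode of that posterior (valid for shape parameters > 1).
theta_map = (alpha + heads - 1) / (alpha + beta + heads + tails - 2)   # 8/12 ≈ 0.667

assert isclose(theta_mle, 0.7)
print(theta_mle, theta_map, posterior)
```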

The remainder of this article is structured as follows: Section 2 provides the necessary context. In Section 3 we discuss the syntax and semantics of our proposed language for weighted answer-set programs (WASPs). We also define a weight function over total choices and address the issue of how to propagate these weights from facts to events in Section 4. This method relies on an equivalence relation on the set of events. Furthermore, we express uncertainty by polynomial expressions over variables which depend on the total choices and on the stable models. By then, the Partial Observability and Sensor Error points have been addressed. Evidence that our approach is sound is given by Equation 28, where we show that replacing certain facts (with weight 1.0) by deterministic facts does not change the probability. Some final remarks and ideas for future developments, including Representation Induction and the Probabilistic Tasks, are presented in Section 5.

2 Framework

We start by refining assumptions 1 and 2 to set “representation” as an “ASP with weights”:

Assumption 3 (Representation by answer-set programs with weights).

 

answer-set programs and weights

A representation of a system is an answer-set program that includes weighted facts.

A weighted fact (WF), or an annotated fact, has the form “a : w” where a is an atom and w ∈ [0,1], and “derives” the disjunctive fact “a ∨ ā”. A model may include either a or ā but never both.

Selecting one of a, ā for each WF in a program leads to a total choice (TC). We use the term “choice” for historical reasons (e.g. see [6]), even though it is not related to the usual “choice” elements, atoms or rules from e.g. [5]. Propagating weights from WFs to TCs is relatively straightforward (see Equation 6), but propagation to events requires a step through the program’s stable models, addressed in Section 4.

About propagating weights from total choices

Propagating weights from TCs to SMs and events faces non-determinism, as seen in program P1 from ex. 4, where multiple SMs (ab, ac) result from a single TC (a) without clear weight assignment. We use algebraic variables to address this, allowing deterministic propagation. Variable values can be estimated from data, contributing to Representation Induction. Related works use credal sets [6] or Bayesian approaches [28] to handle uncertainty, while our method remains algebraic.

3 Syntax and Semantics of Weighted ASP

We start the formal definition of weighted answer-set programs with the setup and discussion of a minimal syntax and semantics of propositional ASP, without variables, functors or relation symbols, but enough to illustrate our method to propagate weights from annotated facts to events. From now on “¬x” and “x̄” denote classical negation and “∼x” default negation.

Syntax

We slightly adapt Calimeri et al.’s approach [5]. Let 𝒜 be a finite set of symbols, the positive atoms. For a ∈ 𝒜, the expressions a and ¬a (the latter a negative atom, also denoted ā) are (classical) atoms. If a is an atom, the expressions a and ∼a are (naf-)literals. A rule is of the form

h_1 ∨ … ∨ h_n ← b_1 ∧ … ∧ b_m

where the h_i are atoms and the b_j are literals. The symbol “←” separates the head from the body. A rule is a constraint (an “integrity constraint” in [5]) if n = 0, normal if n = 1, disjunctive if n > 1, and a fact if m = 0.

An answer-set program (ASP) is a set P of facts and other rules; we denote the set of facts and the set of the remaining rules of P by ℱ(P) and ℛ(P), respectively, or simply ℱ and ℛ. In a normal program all the rules are normal. Notice that programs with constraint or disjunctive rules can be converted into normal programs [9].

Semantics

The standard semantics of an ASP has a few different, but equivalent, definitions [17]. A common definition is as follows [11]: let P be a normal program. The Gelfond/Lifschitz reduct of P relative to the set X of atoms results from (i) deleting every rule that contains a literal of the form ∼p in the body with p ∈ X and then (ii) deleting the remaining literals of the form ∼q from the bodies of the remaining rules. Now, M is a stable model (SM) of P if it is the minimal model of the reduct of P relative to M. We denote by 𝒮(P), or simply 𝒮, the set of stable models of the program P.
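For illustration only, here is a brute-force Python sketch of this definition for ground normal programs (the rule encoding and all names are ours): it builds the Gelfond/Lifschitz reduct of a program relative to a set of atoms and checks which sets of atoms are stable models.

```python
from itertools import chain, combinations

# A ground normal rule is a triple (head, positive_body, naf_body).
def reduct(rules, X):
    """Gelfond/Lifschitz reduct of `rules` relative to the atom set X."""
    out = []
    for head, pos, naf in rules:
        if naf & X:                 # (i) delete rules with a naf-literal "not p", p in X
            continue
        out.append((head, pos))     # (ii) delete the remaining naf-literals
    return out

def least_model(definite_rules):
    """Least model of a naf-free program, by fixpoint iteration."""
    M, changed = set(), True
    while changed:
        changed = False
        for head, pos in definite_rules:
            if pos <= M and head not in M:
                M.add(head)
                changed = True
    return M

def stable_models(rules, atoms):
    """Brute force: X is stable iff X is the least model of the reduct w.r.t. X."""
    candidates = chain.from_iterable(combinations(atoms, k) for k in range(len(atoms) + 1))
    return [set(X) for X in candidates if least_model(reduct(rules, set(X))) == set(X)]

# Example: p <- not q.    q <- not p.    (two stable models)
program = [("p", frozenset(), frozenset({"q"})),
           ("q", frozenset(), frozenset({"p"}))]
print(stable_models(program, ["p", "q"]))     # [{'p'}, {'q'}]
```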

Evaluation without grounding

While the most common way to generate stable models is based on grounding, a different approach is the one supported by s(CASP), which evaluates ASP programs with function symbols and constraints without grounding, enabling human-readable explanations and avoiding the exponential blow-up and unnecessary computation of complete models that affect grounding-based solvers.

WASPs and their Derived Programs

If a is an atom and w ∈ [0,1], a weighted fact (WF) (or a fact with weight annotation) is a : w. Notice that, although w ∈ [0,1], w is interpreted as a balance between the choices a and ā, and not as a probability.

weighted answer-set programs (WASPs) extend ASPs by adding WFs. We denote the set of weighted facts of a program P by 𝒲(P), and 𝒜𝒲(P) the set of positive atoms in 𝒲. When possible we simplify notation as 𝒲,𝒜𝒲.

Our definition of weighted answer-set programs is restricted to what is needed to illustrate weight propagation from TCs to events, excluding logical variables, relation symbols, and functors. Weight annotations are not used on rule heads or disjunctions, but this does not limit expressiveness: an annotated rule can be rewritten, using a new atom γ, as

(α : w) ← β   ⟿   { γ : w,   α ← β ∧ γ },   (1)

while an annotated disjunctive fact can be rewritten as

(α ∨ β) : w   ⟿   { γ : w,   α ∨ β ← γ,   ᾱ ← γ̄,   β̄ ← γ̄ }.   (2)
Derived program

The derived program of a WASP is the ASP obtained by replacing each weighted fact a : w by the derived disjunction a ∨ ā. The stable models of a WASP are the stable models of its derived program. So, we also denote the set of SMs of a (derived or) WASP program P by 𝒮(P) or 𝒮.

Events

An event of a program P is a set of atoms from P. We denote the set of events by (P) or simply . An event e which includes a set {x,x¯} is said to be inconsistent; otherwise it is consistent. The set of consistent events is denoted by 𝒞.

Example 4 (A simple weighted answer-set program).

Consider the following WASP P_1:

P_1 = { a : 0.3,   b ∨ c ← a }   (3)

with weighted facts 𝒲={a:0.3}. The derived program is the ASP

P_1 = { a ∨ ā,   b ∨ c ← a },   (4)

with SMs 𝒮 = {ā, ab, ac}. The atoms are 𝒜 = {a, ā, b, b̄, c, c̄} and the events are ℰ = 2^𝒜 (here 2^X is the power set of X: A ∈ 2^X ⟺ A ⊆ X).
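Because the derived program of P_1 contains no default negation, its stable models are just its minimal classical models, so a brute-force check suffices to recover 𝒮 = {ā, ab, ac}. The Python sketch below (ā is encoded as the fresh atom "na", a standard trick; all names are ours) does exactly that.

```python
from itertools import chain, combinations

ATOMS = ["a", "na", "b", "c"]        # "na" encodes the classically negated atom ¬a

def holds(model, heads, body):
    """A (positive) disjunctive rule heads <- body holds in `model`."""
    return not (body <= model) or bool(heads & model)

# Derived program of P1:   a ∨ ¬a.    b ∨ c ← a.
RULES = [(frozenset({"a", "na"}), frozenset()),
         (frozenset({"b", "c"}), frozenset({"a"}))]

def classical_models():
    candidates = chain.from_iterable(combinations(ATOMS, k) for k in range(len(ATOMS) + 1))
    return [frozenset(m) for m in candidates
            if all(holds(frozenset(m), h, b) for h, b in RULES)]

def minimal(models):
    return [m for m in models if not any(other < m for other in models)]

# The three stable models: {¬a}, {a, b} and {a, c}.
print(minimal(classical_models()))
```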

Total Choices and their Weights

A disjunctive head a ∨ ā in the derived program represents a single choice, either a or ā. We define the set of total choices (TCs) of a set of atoms by the recursion

𝒯(∅) = {∅},
𝒯(X ∪ {a}) = ⋃_{t ∈ 𝒯(X)} {t ∪ {a}}  ∪  ⋃_{t ∈ 𝒯(X)} {t ∪ {ā}}   (5)

where X is a set of atoms and a is an atom. The total choices of a WASP P are the TCs of the positive atoms in its weighted facts: 𝒯(P) = 𝒯(𝒜_𝒲(P)). When possible we write simply 𝒯. Given a WASP, the weight of the total choice t ∈ 𝒯 is given by the product

ω_𝒯(t) = ∏_{a:w ∈ 𝒲, a ∈ t} w  ×  ∏_{a:w ∈ 𝒲, ā ∈ t} w̄.   (6)

Here w̄ = 1 − w, and we use the subscript in ω_𝒯 to explicitly state that this function concerns total choices. Later we will use the subscripts 𝒮 and ℰ for the weight functions of stable models and events, ω_𝒮 and ω_ℰ. Some stable models are entailed by some total choices while other SMs are entailed by other TCs. We write 𝒮(t) to denote the set of stable models entailed by the total choice t ∈ 𝒯. Our goal can now be rephrased as how to propagate the weights of the program’s total choices, ω_𝒯 in Equation 6, to the program’s events, ω_ℰ, to be defined later in Equations 21a and 21b.
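A direct transcription of Equations 5 and 6 for the single weighted fact of P_1 (Python sketch; the dictionary encoding of 𝒲 and the "~a" spelling of ā are our own choices):

```python
from itertools import product

WEIGHTED_FACTS = {"a": 0.3}          # W = {a : 0.3}; "~a" spells the negative choice ā

def total_choices(positive_atoms):
    """Equation 5: pick a or ~a for each positive atom of a weighted fact."""
    pairs = [(x, "~" + x) for x in positive_atoms]
    return [frozenset(choice) for choice in product(*pairs)]

def weight_tc(t, weighted_facts):
    """Equation 6: multiply w for chosen atoms and 1 - w for chosen negations."""
    w = 1.0
    for a, p in weighted_facts.items():
        w *= p if a in t else 1.0 - p
    return w

for t in total_choices(WEIGHTED_FACTS):
    print(sorted(t), weight_tc(t, WEIGHTED_FACTS))
# ['a'] 0.3
# ['~a'] 0.7
```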

Propagation of weights

As a first step to propagate weight from total choices to events, consider the program P_1 of Equation 3 and a possible propagation of ω_𝒯 : 𝒯 → [0,1] from total choices to the stable models, ω_𝒮 : 𝒮 → [0,1] (still informal; see Equation 16). It might seem straightforward, in ex. 4, to set ω_𝒮(ā) = 0.7, but there is no explicit way to assign values to ω_𝒮(ab) and ω_𝒮(ac). We represent this non-determinism by a parameter θ as in

ω_𝒮(ab) = 0.3 θ,
ω_𝒮(ac) = 0.3 (1 − θ)   (7)

to express our knowledge that ab and ac are models entailed from a specific choice and, simultaneously, the inherent non-determinism of that entailment. In general, it might be necessary to have several such parameters, each associated to a given stable model s (in Equation 7, s=ab in the first line and s=ac in the second line) and a total choice t (t=a above), so we write θs,t. Obviously, for reasonable θs,t, the total choice t must be a subset of the stable model s.

Unless we introduce some bias, such as θ = 0.5 as in LPMLN [15], the values for θ_{s,t} cannot be determined just with the information given in the program. But they might be estimated with the help of further information, such as an empirical distribution from a dataset. Further discussion of this point is outside the scope of this paper.

Now consider the program

{ a : 0.3,   b ← a ∧ ∼b }   (8)

that has a single SM, ā. Since the weights are not interpreted as probabilities, there is no need for the sum over the stable models to equal 1. So the weights in the TCs of Equation 8 only set

ω_𝒮(ā) = 0.7.

In this case, if we were to derive a probability of the SMs, normalization would give P(a¯)=1.0.

Also facts without annotations can be transformed into facts with weight 1:

a ⟿ a : 1.0.   (9)

The method that we are proposing does not follow the framework of Kifer and Subrahmanian [13] and others, where the syntax of the program determines the propagation from probabilities explicitly set either in facts or other elements of the program. Our approach requires that we consider the semantics, i.e. the stable models of the program, independently of the syntax that provided them. From there we propagate weights to the program’s events and then, if required, normalization provides the final probabilities. Moreover, we allow the occurrence of variables in the weights, in order to deal with the non-determinism that results from the non-uniqueness of SMs entailed from a single TC. These variables can be later estimated from available data.

Related Approaches and Systems

The core problem of setting a semantics for probabilistic logic programs, the propagation of probabilities from total choices to stable models in the case of ASP or to other objects in other logic programming systems (e.g. to possible worlds in Problog), has been studied for some time [13, 25]. For example, the credal set approach [6] defines P_𝒯 in a way similar to Equation 6 but then, for a ∈ 𝒜 and t ∈ 𝒯, the probability P(a ∣ t) is unknown but bounded by a lower probability P̲(a ∣ t) and an upper probability P̄(a ∣ t), which can be explicitly estimated from the program.

Problog [8, 28] extends Prolog with probabilistic facts so that a program specifies a probability distribution over possible worlds. A world is a model of T ∪ R, where T is a total choice and R the set of rules of a program. The semantics is only defined for sound programs [24], i.e., programs for which each possible total choice T leads to a well-founded model that is two-valued or total. The probability of a possible world that is a model of the program is the probability of the total choice; otherwise the probability is 0 [24, 27]. Another system, based on Markov Logic [22], is LPMLN [15, 16], whose models result from weighted rules of the form a ← b ∧ n, where a is a disjunction of atoms, b is a conjunction of atoms and n is constructed from atoms using conjunction, disjunction and negation. For each model there is a unique maximal set of rules satisfied by it, and the respective weights determine the weight of that model, which can be normalized to a probability.

Towards Propagating Weights from Total Choices to Events

The program P1 in Equation 3 from ex. 4 showcases the problem of propagating weights from total choices to stable models and then to events. The main issue arises from the lack of information in the program on how to assign un-biased weights to the stable models. This becomes crucial in situations where multiple stable models result from a single total choice.

Our assumptions 1, 2, and 3 enunciate that a WASP program represents a system; the states of that system, which are partially observable and stochastic, are associated to the program’s stable models; and state observations are encoded as events, i.e. sets of atoms of the program. Then:

  1. With a weight set for the stable models, we extend it to any event in the program domain.

  2. If statistical knowledge is available, it is considered external and does not affect the propagation procedure.

  3. However, that knowledge can be used to estimate the parameters θ_{s,t} and to “score” the program.

  4. If a program is one of many candidates, its score can be used as fitness by algorithms searching for optimal programs in a dataset.

  5. If events are not consistent with the program, then we ought to conclude that the program is wrong and must be changed accordingly.

We address propagating weights from stable models to events using parameters like θ. This defines a function μ that can be normalized to set probabilities P, ensuring consistent probabilistic reasoning with the ASP program.

4 Propagating Weights

Figure 1: This diagram shows events related to the stable models of program P_1. Circle nodes are total choices, shaded nodes are SMs. Solid lines show SM relations, dashed lines show subset/superset relations. The set of events in all SMs, Λ, is {λ} because ā ∩ ab ∩ ac = ∅.

The diagram in Figure 1 shows the challenge of propagating weights from total choices to stable models and then to general events, where node values depend on neighbours. This leads to interpretation issues, such as assigning unexplained values to nodes like bc. We propose basing propagation on the event’s relation to stable models instead.

4.1 An Equivalence Relation

Figure 2: Classes of (consistent) events related to the stable models of P1 are defined through sub-/super-set relations. In this picture we can see, for example, that {c¯ab,ab,b} and {a,abc} are part of different classes, represented by different fillings. As before, the circle nodes are total choices and shaded nodes are stable models. Notice that bc is not in a filled area.
Figure 3: This diagram shows the lattice of stable cores from ex. 4. Nodes are stable cores derived from stable models, plus the inconsistent class. The bottom node is the independent events class, with no relation to SMs. The top node represents events related to all SMs (consequences). Shaded nodes are SMs.

Our approach to propagating weights views stable models as prime factors or “irreducible events”. Events are considered based on their relation to SMs. In ex. 4, a relates to ab and ac, while c only relates to ac. Thus, a and c relate to different SMs, but a and abc relate to the same SMs. We formalize this relation. The stable core (SC) of the event e is

⟨e⟩ := { s ∈ 𝒮 ∣ s ⊆ e ∨ e ⊆ s }   (10)

where 𝒮 is the set of stable models.

Notice that the minimality of stable models implies that either e is a stable model or at least one of ∃s (s ⊆ e), ∃s (e ⊆ s) is false, i.e., no stable model contains another. We now define an equivalence relation so that two events are related if either both are inconsistent or both are consistent and, in the latter case, have the same stable core.

Definition 5 (Equivalence Relation on Events).

For a given program, let u, v ∈ ℰ. The equivalence relation u ∼ v is defined by

u ∼ v  :⟺  u, v ∉ 𝒞  ∨  (u, v ∈ 𝒞 ∧ ⟨u⟩ = ⟨v⟩).   (11)

This equivalence relation defines a partition on the set of events, where each class holds a unique relation with the stable models. In particular we denote each class by:

[e] := ⊥ := ℰ ∖ 𝒞,                  if e ∉ 𝒞,
[e] := { u ∈ 𝒞 ∣ ⟨u⟩ = ⟨e⟩ },       if e ∈ 𝒞,   (12)

where ⊥ denotes the set ℰ ∖ 𝒞 of inconsistent events, i.e. events that contain {x, x̄} for some atom x.

Let λ be the empty event (notice that ⟨λ⟩ = 𝒮); we adopt the notation “λ” for the empty word, from formal languages, to distinguish the empty event from the empty set “∅”. Let Λ be the consequence class of (consistent) events related with all the stable models. Then

[λ]=Λ. (13)

The combinations of stable models, i.e. the stable cores, together with the set ⊥ of inconsistent events, form a set of representatives for the equivalence relation ∼. Since all events within a consistent equivalence class have the same stable core, we are interested in functions (including weight assignments) that are constant within classes. A function f : ℰ → Y, where Y is any set, is said to be coherent if

∀e ∈ ℰ ∀u ∈ [e] (f(u) = f(e)).   (14)

Considering coherent functions, in the specific case of Equation 3, instead of dealing with the 2^6 = 64 events, we need to consider only the 2^3 + 1 = 9 classes, well defined in terms of combinations of the stable models, to define coherent functions. In general, a program with n atoms and m stable models has 2^{2n} events and 2^m + 1 stable cores – but easily m ≫ n.
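The class count for P_1 can be checked mechanically. The sketch below (Python, reusing the "na"/"nb"/"nc" encoding of negative atoms; all names are ours) enumerates the 64 events, computes each stable core per Equation 10, and groups events into classes: the inconsistent class ⊥ (37 events), the independent class (9 events), and the cores {ā} (9), {ab} (3), {ac} (3), {ab, ac} (2) and Λ (1); the cores {ā, ab} and {ā, ac} label empty classes.

```python
from itertools import chain, combinations
from collections import Counter

SMS = [frozenset({"na"}), frozenset({"a", "b"}), frozenset({"a", "c"})]
ATOMS = ["a", "na", "b", "nb", "c", "nc"]

def events():
    return [frozenset(e) for e in
            chain.from_iterable(combinations(ATOMS, k) for k in range(len(ATOMS) + 1))]

def consistent(e):
    return not any(x in e and "n" + x in e for x in ("a", "b", "c"))

def core(e):
    """Stable core (Equation 10): SMs s with s ⊆ e or e ⊆ s."""
    return frozenset(s for s in SMS if s <= e or e <= s)

sizes = Counter(core(e) if consistent(e) else "inconsistent" for e in events())
print(len(events()))                  # 64 events
for key, n in sizes.items():          # 37 inconsistent, 9 independent, 9/3/3/2/1 for the cores
    print(sorted(map(sorted, key)) if key != "inconsistent" else key, n)
```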

4.2 From Total Choices to Events

The “propagation” phase, traced by Equation 6 and Equations 16 to 21b, starts with the weight of total choices, ω_𝒯(t), propagates it to the stable models, ω_𝒮(s), and then, within the equivalence relation from Equation 11, to a coherent weight of events, ω_ℰ(e). So we are specifying a sequence of functions

ω_𝒯 ⟶ ω_𝒮 ⟶ ω_{[ℰ]} ⟶ ω_ℰ   (15)

on successively larger domains

𝒯 ⟶ 𝒮 ⟶ [ℰ] ⟶ ℰ

such that the last function (ω) is a finite coherent function on the set of events and thus, as a final step, it can easily be used to define a probability distribution of events by normalization:

ω_ℰ ⟶ P.

Total choices and stable models

Let’s start by looking into the first two steps of the sequence of functions Equation 15: ω𝒯 and ω𝒮. The weight ω𝒯 of the total choice t𝒯 is already given by Equation 6. Recall that each total choice t𝒯, together with the rules and the other facts of a program, defines the set 𝒮(t) of stable models associated with that choice. Given a total choice t𝒯, a stable model s𝒮, and formal variables or values θs,t[0,1] such that s𝒮(t)θs,t=1, we define

ω_𝒮(s, t) := θ_{s,t}  if s ∈ 𝒮(t);   0  otherwise.   (16)

The θs,t parameters in Equation 16 express the program’s lack of information about the weight assignment, when a single total choice entails more than one stable model. We address this issue by assigning a possibly unknown parameter, i.e. a formal algebraic variable (θs,t) associated with a total choice (t) and a stable model (s). This allows the expression of a quantity that does not result from the program’s syntax but can be determined or estimated given more information, e.g. observed data.

As sets, two stable models can have non-empty intersection. But because different SMs represent different states of a system, which are disjoint events, we assume that the algebra of the stable models is σ-additive.

Assumption 6 (stable models as disjoint events).

For any set X of stable models and any total choice t,

ω_𝒮(X, t) = ∑_{s ∈ X} ω_𝒮(s, t).   (17)

Equation 17 is the basis for Equation 19a and effectively extends ω_𝒮 from 𝒮 × 𝒯 to 2^𝒮 × 𝒯. Now the pre-condition of Equation 16 can be stated as ω_𝒮(𝒮(t), t) = 1.

Classes

The next step in the sequence of Equation 15 is the function ω_{[ℰ]} on [ℰ]. Each class of the relation ∼ (Equation 11) is either the inconsistent class (⊥) or is associated with a stable core, i.e. a set of stable models. Therefore, ω_{[ℰ]} is defined considering the following two cases.

Inconsistent class

This class contains inconsistent events; they should not be observed and have weight zero.

ω_{[ℰ]}(⊥, t) := 0.   (18)
Consistent classes

To be coherent, the propagation function must be constant within a class and its value dependent only on the stable core:

ω_{[ℰ]}([e], t) := ω_𝒮(⟨e⟩, t) = ∑_{s ∈ ⟨e⟩} ω_𝒮(s, t),   (19a)
and we further define the following:
ω_{[ℰ]}([e]) := ∑_{t ∈ 𝒯} ω_𝒯(t) ω_{[ℰ]}([e], t).   (19b)

Equation 19a states that the weight of a class [e] is the weight of its stable core (⟨e⟩) and Equation 19b averages Equation 19a over the total choices. Notice that Equation 19a also applies to the independent class, i.e. the class of events e ∈ 𝒞 with ⟨e⟩ = ∅, because events in this class are not related with any stable model:

ω_{[ℰ]}([e], t) = ∑_{s ∈ ∅} ω_𝒮(s, t) = 0.   (20)

Events and Probability

Each consistent event e is in the class defined by its stable core ⟨e⟩. So, denoting the number of elements of X by #X, we set:

ω_ℰ(e, t) := ω_{[ℰ]}([e], t) / #[e]  if #[e] > 0;   0  otherwise,   (21a)
and, by averaging over the total choices:
ω_ℰ(e) := ∑_{t ∈ 𝒯} ω_𝒯(t) ω_ℰ(e, t).   (21b)

Equation 21b is the main goal of this paper: to propagate the weights associated to the facts of a WASP to the set of all events of that program. In order to get a probability from Equation 21b, concerning the Probabilistic Tasks goal, we define the normalizing factor:

Z := ∑_{e ∈ ℰ} ω_ℰ(e) = ∑_{[e] ∈ [ℰ]} ω_{[ℰ]}([e]),   (22)

and now Equation 21b provides a straightforward way to define the probability of a single event e:

P(e) := ω_ℰ(e) / Z.   (23)

Equation 23 defines a coherent prior probability of events (prior in the Bayesian sense that future observations might update this probability) and, together with external statistical knowledge, can be used to learn about the initial probabilities of the atoms.

The effect of propagation

To assess the effect of propagating weights from SMs to events, we compare P with the probabilities syntactically induced by the fact weights. It suffices to check whether P(t) = P_𝒯(t) for all total choices, where P_𝒯 results from normalizing the TC weights: for t ∈ 𝒯,

P_𝒯(t) = ω_𝒯(t) / ∑_{τ ∈ 𝒯} ω_𝒯(τ).   (24)

And now we ask if these probabilities coincide on 𝒯: is ∀t ∈ 𝒯 (P(t) = P_𝒯(t))?

It is easy to see that, in general, this cannot be true. While the domain of P_𝒯 is the set of total choices, the domain of P is much larger, including all the events. Except for trivial programs, some events other than total choices will have non-zero weight: if a program has a consistent event e ∈ 𝒞 ∖ 𝒯 such that P(e) ≠ 0 then there is at least one t ∈ 𝒯 such that

P_𝒯(t) ≠ P(t).   (25)

The essential, perhaps counter-intuitive, conclusion of Equation 25 is that we are dealing with two distributions: P𝒯, restricted to the total choices, results syntactically from the annotations of the program, while P, extended to events, results from both the annotations and the program’s semantics, i.e. the stable models. For ex. 4:

P_𝒯(a) = 0.3 from the program and P(a) = 3/46 from Equation 29.

Now P : ℰ → [0,1] can be extended to P : 2^ℰ → [0,1] by abusing notation and setting, for X ⊆ ℰ,

P(X) = ∑_{x ∈ X} P(x).   (26)

It is straightforward to verify that the latter satisfies the Kolmogorov axioms of probability.

We can now properly state the following property about certain facts such as a : 1.0. Consider a program A with the weighted fact α : 1.0 and a program B that results from A by replacing that fact by the deterministic fact α. Then

∀e ∈ ℰ (ω_A(e) = ω_B(e)).   (27)

Normalization of Equation 27 entails that, for P_A given by Equation 23 for A and P_B for B,

∀e ∈ ℰ (P_A(e) = P_B(e)).   (28)
Example 7 (Probability of Events).

The coherent prior probability of events of program P1 in ex. 4 is

  e      ⊥    ∅    ā        ab       ac       ā,ab    ā,ac    ab,ac    Λ
  P(e)   0    0    7/207    θ/23     θ̄/23     0       0       3/46     10/23        (29)

To compute the probability of e find the column of the event’s stable core:

  • P(ab) = θ/23, because ab is the only SM related with ab, so ⟨ab⟩ = {ab} and the value is found in the respective column of Equation 29.

  • P(abc) = 3/46 because abc ⊇ ab and abc ⊇ ac, so ⟨abc⟩ = {ab, ac}.

  • P(bc) = 0 because there is no SM s such that either s ⊆ bc or bc ⊆ s, so ⟨bc⟩ = ∅, i.e. bc is an independent event.

  • P(ā) = 7/207 and P(a) = 3/46.

Notice that P(ā) + P(a) ≠ 1. This highlights the fundamental difference between P and P_𝒯 (cf. Equation 25), where the former results from the lattice of the stable cores and the latter directly from the explicit assignment of probabilities to literals. Related with this case, consider the complement of a consistent event e, denoted by e^∁ (all the usual set operations hold on the complement; for example, (X^∁)^∁ = X). To calculate P(e^∁), find the classes in [ℰ] that are not [e], i.e. the complement of e’s class within [ℰ], [e]^∁. Considering that [ℰ] is in a one-to-one correspondence with the stable cores plus ⊥,

[ℰ] ≃ { ⊥, ∅, {ā}, {ab}, {ac}, {ā,ab}, {ā,ac}, {ab,ac}, Λ }.

In particular, for a^∁, since ⟨a⟩ = {ab, ac}, we have [a]^∁ = [ℰ] ∖ [a] and P(a^∁) = P([ℰ] ∖ [a]) = 1 − P(a). Also, P(ā^∁) = 1 − P(ā).
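Continuing the same brute-force encoding, the sketch below runs the whole chain ω_𝒯 → ω_𝒮 → ω_[ℰ] → ω_ℰ → P for P_1 and reproduces the entries of Equation 29; θ is left as a numeric parameter (Equation 29 keeps it symbolic), and all names are ours.

```python
from fractions import Fraction
from itertools import chain, combinations
from collections import defaultdict

theta = Fraction(2, 5)        # any value in [0, 1]; Equation 29 keeps it symbolic

ATOMS = ["a", "na", "b", "nb", "c", "nc"]
SM = {"na": frozenset({"na"}), "ab": frozenset({"a", "b"}), "ac": frozenset({"a", "c"})}

# Equations 6 and 16 for P1: TC {a} entails ab and ac; TC {~a} entails the SM ¬a.
omega_T = {"a": Fraction(3, 10), "na": Fraction(7, 10)}
omega_S = {("ab", "a"): theta, ("ac", "a"): 1 - theta, ("na", "na"): Fraction(1)}

def events():
    return [frozenset(e) for e in
            chain.from_iterable(combinations(ATOMS, k) for k in range(len(ATOMS) + 1))]

def consistent(e):
    return not any(x in e and "n" + x in e for x in ("a", "b", "c"))

def core(e):
    """Stable core (Equation 10), as a set of SM names."""
    return frozenset(name for name, s in SM.items() if s <= e or e <= s)

classes = defaultdict(list)                         # Equation 12: group consistent events
for e in events():
    if consistent(e):
        classes[core(e)].append(e)

def omega_E(e):
    """Equations 19a to 21b folded together: the event weight, averaged over total choices."""
    if not consistent(e):
        return Fraction(0)                          # Equation 18: inconsistent events weigh 0
    c = core(e)
    if not c:
        return Fraction(0)                          # Equation 20: independent events weigh 0
    w = sum(omega_T[t] * sum(omega_S.get((n, t), Fraction(0)) for n in c) for t in omega_T)
    return w / len(classes[c])

Z = sum(omega_E(e) for e in events())               # Equation 22
P = lambda e: omega_E(frozenset(e)) / Z             # Equation 23

print(Z)                                                 # 23/10
print(P({"na"}), P({"a", "b"}), P({"a"}), P(set()))      # 7/207, theta/23 (= 2/115 here), 3/46, 10/23
```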

While not illustrated in our examples, this method also applies to programs that have more than one probabilistic fact, like

{ a : 0.3,   b : 0.6,   c ∨ d ← a ∧ b }.

4.3 Encoding of Bayesian Networks

P(M ∣ A):
            m      m̄
  a         0.9    0.1
  ā         0.05   0.95

P(J ∣ A):
            j      j̄
  a         0.7    0.3
  ā         0.01   0.99

P(A ∣ B, E):
            a      ā
  b e       0.95   0.05
  b ē       0.94   0.06
  b̄ e       0.29   0.71
  b̄ ē       0.001  0.999
Figure 4: The Earthquake, Burglary, Alarm model.

Our approach generalizes to Bayesian networks similarly to previous works [6, 21, 12, 26]. Acyclic propositional programs can be seen as Bayesian networks with binary variables, where the structure is the dependency graph, and variables correspond to atoms with probabilities from facts and rules. Conversely, any binary Bayesian network can be specified by an acyclic non-disjunctive WASP. A classic example in Bayesian networks involves the events Burglary (B), Earthquake (E), Alarm (A), and calls from Mary (M) and John (J) [20]. These events have conditional probabilities, such as the alarm going off given a burglary or earthquake, and neighbors calling if they hear the alarm. The example illustrates reasoning under uncertainty. We follow the convention of representing the (upper case) random variable X by the (lower case) positive atom x. From Figure 4 we obtain the following specification:

{ b : 0.001,   e : 0.002 }

For the table giving the probability P(M|A) we obtain, cf. Equation 1, the program

{ m : 0.9 ← a,   m : 0.05 ← ā }

Similarly, for the probability P(J|A) we obtain

{ j : 0.7 ← a,   j : 0.01 ← ā }

Finally, for the probability P(A ∣ B, E) we obtain

{ a : 0.95 ← b ∧ e,   a : 0.94 ← b ∧ ē,   a : 0.29 ← b̄ ∧ e,   a : 0.001 ← b̄ ∧ ē }.

One can then proceed as in the previous subsection and analyze this example. The details of such analysis are not given here since they are analogous, albeit admittedly more cumbersome.
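To make the numbers in Figure 4 concrete, the sketch below computes the joint distribution of the network by ordinary Bayesian-network arithmetic; this is standard BN bookkeeping, not the weight propagation of Section 4, and all names are ours.

```python
from itertools import product

P_B, P_E = 0.001, 0.002
P_A = {(True, True): 0.95, (True, False): 0.94, (False, True): 0.29, (False, False): 0.001}
P_J = {True: 0.7, False: 0.01}       # P(J = true | A)
P_M = {True: 0.9, False: 0.05}       # P(M = true | A)

def joint(b, e, a, j, m):
    """P(B, E, A, J, M) for one assignment of the five binary variables."""
    p = (P_B if b else 1 - P_B) * (P_E if e else 1 - P_E)
    p *= P_A[(b, e)] if a else 1 - P_A[(b, e)]
    p *= P_J[a] if j else 1 - P_J[a]
    p *= P_M[a] if m else 1 - P_M[a]
    return p

total = sum(joint(*world) for world in product([True, False], repeat=5))
print(round(total, 12))                       # 1.0 (sanity check)
print(joint(False, False, True, True, True))  # P(¬b, ¬e, a, j, m) ≈ 0.000628
```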

5 Discussion and Future Work

This work introduces weight assignments for answer-set programs by means of algebraic expressions, and much remains to be explored regarding the full expressive power of ASPs, including recursion and logical variables. The theory and tools of Bayesian networks can be adapted, while connections to Markov random fields [14] and applications to program selection are left as future work. The equivalence relation of Section 4.1 only captures subset/superset relations between events and stable models, and finer relations between SMs are possible. The weight of inconsistent events is set to 0, but inconsistencies resulting from noise in the observations may require reconsidering that choice.

References

  • [1] Weronika T Adrian, Mario Alviano, Francesco Calimeri, Bernardo Cuteri, Carmine Dodaro, Wolfgang Faber, Davide Fuscà, Nicola Leone, Marco Manna, Simona Perri, et al. The asp system dlv: advancements and applications. KI-Künstliche Intelligenz, 32:177–179, 2018. doi:10.1007/S13218-018-0533-0.
  • [2] Marco Alberti, Elena Bellodi, Giuseppe Cota, Fabrizio Riguzzi, and Riccardo Zese. cplint on swish: Probabilistic logical inference with a web browser. Intelligenza Artificiale, 11(1):47–64, 2017. doi:10.3233/IA-170106.
  • [3] Joaquín Arias, Manuel Carro, Zhuo Chen, and Gopal Gupta. Justifications for goal-directed constraint answer set programming. arXiv preprint arXiv:2009.10238, 2020.
  • [4] Chitta Baral, Michael Gelfond, and Nelson Rushton. Probabilistic reasoning with Answer Sets. Theory and Practice of Logic Programming, 9(1):57–144, 2009. doi:10.1017/S1471068408003645.
  • [5] Francesco Calimeri, Wolfgang Faber, Martin Gebser, Giovambattista Ianni, Roland Kaminski, Thomas Krennwallner, Nicola Leone, Marco Maratea, Francesco Ricca, and Torsten Schaub. Asp-core-2 input language format. Theory and Practice of Logic Programming, 20(2):294–309, 2020. doi:10.1017/S1471068419000450.
  • [6] Fabio Gagliardi Cozman and Denis Deratani Mauá. The joy of probabilistic answer set programming: semantics, complexity, expressivity, inference. International Journal of Approximate Reasoning, 125:218–239, 2020. doi:10.1016/J.IJAR.2020.07.004.
  • [7] Luc De Raedt, Angelika Kimmig, Hannu Toivonen, and M Veloso. Problog: A probabilistic Prolog and its application in link discovery. In IJCAI 2007, Proceedings of the 20th international joint conference on artificial intelligence, pages 2462–2467. IJCAI-INT JOINT CONF ARTIF INTELL, 2007.
  • [8] Daan Fierens, Guy Van den Broeck, Joris Renkens, Dimitar Shterionov, Bernd Gutmann, Ingo Thon, Gerda Janssens, and Luc De Raedt. Inference and learning in probabilistic logic programs using weighted boolean formulas. Theory and Practice of Logic Programming, 15(3):358–401, 2015. doi:10.1017/S1471068414000076.
  • [9] Martin Gebser, Roland Kaminski, Benjamin Kaufmann, and Torsten Schaub. Answer set solving in practice. Springer Nature, 2022.
  • [10] Martin Gebser, Benjamin Kaufmann, Roland Kaminski, Max Ostrowski, Torsten Schaub, and Marius Schneider. Potassco: The Potsdam answer set solving collection. AI Communications, 24(2):107–124, 2011. doi:10.3233/AIC-2011-0491.
  • [11] Michael Gelfond and Vladimir Lifschitz. The stable model semantics for logic programming. In ICLP/SLP, volume 88, pages 1070–1080. Cambridge, MA, 1988.
  • [12] Werner Kießling, Helmut Thöne, and Ulrich Güntzer. Database support for problematic knowledge. In Advances in Database Technology - EDBT’92: 3rd International Conference on Extending Database Technology Vienna, Austria, March 23–27, 1992 Proceedings 3, pages 421–436. Springer, 1992. doi:10.1007/BFB0032446.
  • [13] Michael Kifer and VS Subrahmanian. Theory of generalized annotated logic programming and its applications. The Journal of Logic Programming, 12(4):335–367, 1992. doi:10.1016/0743-1066(92)90007-P.
  • [14] Ross Kindermann and J. Laurie Snell. Markov random fields and their applications, volume 1 of Contemporary Mathematics. American Mathematical Society, Providence, RI, 1980.
  • [15] Joohyung Lee and Yi Wang. Weighted rules under the stable model semantics. In Fifteenth international conference on the principles of knowledge representation and reasoning, 2016.
  • [16] Joohyung Lee and Zhun Yang. Lpmln, Weak Constraints, and P-log. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 31, 2017.
  • [17] Vladimir Lifschitz. Twelve definitions of a stable model. In International Conference on Logic Programming, pages 37–51. Springer, 2008. doi:10.1007/978-3-540-89982-2_8.
  • [18] Kyle Marple, Elmer Salazar, and Gopal Gupta. Computing stable models of normal logic programs without grounding. arXiv preprint arXiv:1709.00501, 2017. arXiv:1709.00501.
  • [19] Ilkka Niemelä and Patrik Simons. Smodels - an implementation of the stable model and well-founded semantics for normal logic programs. In Logic Programming And Nonmonotonic Reasoning: 4th International Conference, LPNMR’97 Dagstuhl Castle, Germany, July 28–31, 1997 Proceedings 4, pages 420–429. Springer, 1997.
  • [20] Judea Pearl. Probabilistic reasoning in intelligent systems: networks of plausible inference. The Morgan Kaufmann Series in Representation and Reasoning. Morgan Kaufmann, San Mateo, CA, 1988.
  • [21] Luc De Raedt, Kristian Kersting, Sriraam Natarajan, and David Poole. Statistical relational artificial intelligence: Logic, probability, and computation. Synthesis lectures on artificial intelligence and machine learning, 10(2):1–189, 2016.
  • [22] Matthew Richardson and Pedro Domingos. Markov logic networks. Machine learning, 62:107–136, 2006. doi:10.1007/S10994-006-5833-1.
  • [23] Fabrizio Riguzzi. Foundations of Probabilistic Logic Programming: Languages, Semantics, Inference and Learning. River Publishers, New York, 1 edition, September 2022. doi:10.1201/9781003338192.
  • [24] Fabrizio Riguzzi and Terrance Swift. Well–definedness and efficient inference for probabilistic logic programming under the distribution semantics. Theory and practice of logic programming, 13(2):279–302, 2013. doi:10.1017/S1471068411000664.
  • [25] Taisuke Sato. A statistical learning method for logic programs with distribution semantics. In International Conference on Logic Programming, 1995.
  • [26] Helmut Thöne, Ulrich Güntzer, and Werner Kießling. Increased robustness of bayesian networks through probability intervals. International Journal of Approximate Reasoning, 17(1):37–76, 1997. doi:10.1016/S0888-613X(96)00138-7.
  • [27] Allen Van Gelder, Kenneth A Ross, and John S Schlipf. The well-founded semantics for general logic programs. Journal of the ACM (JACM), 38(3):619–649, 1991. doi:10.1145/116825.116838.
  • [28] Victor Verreet, Vincent Derkinderen, Pedro Zuidberg Dos Martires, and Luc De Raedt. Inference and learning with model uncertainty in probabilistic logic programs. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 36, pages 10060–10069, 2022. doi:10.1609/AAAI.V36I9.21245.