peer-to-peer query answering with inconsistent knowledgesheila/publications/bin-mci... · 2009. 4....

11
Peer-to-peer Query Answering with Inconsistent Knowledge Arnold Binas and Sheila A. McIlraith Department of Computer Science University of Toronto Toronto, Ontario, Canada {abinas,sheila}@cs.toronto.edu Abstract Decentralized reasoning is receiving increasing attention due to the distributed nature of knowledge on the Web. We ad- dress the problem of answering queries to distributed propo- sitional reasoners which may be mutually inconsistent. This paper provides a formal characterization of a prioritized peer- to-peer query answering framework that exploits a priority ordering over the peers, as well as a distributed entailment re- lation as an extension to established work on argumentation frameworks. We develop decentralized algorithms for com- puting query answers according to distributed entailment and prove their soundness and completeness. To improve the effi- ciency of query answering, we propose an ordering heuristic that exploits the peers’ priority ordering and empirically eval- uate its effectiveness. 1 Introduction With the advent of the Web has come a significant increase in the availability of information from a variety of sources, not all of which are mutually consistent or equally reliable. These information sources are often databases, but many foresee a future in which some of them will be deductive databases, logic programs, or even full-fledged logical rea- soners. Motivated by this general problem, this paper ad- dresses the problem of peer-to-peer (P2P) query answering over distributed propositional information sources that may be mutually inconsistent. We assume the existence of a pri- ority ordering over the peers to discriminate between peers with conflicting information. This ordering may reflect an individual’s level of trust in an information source or it may be obtained from an objective third party rating. We pro- vide a formal characterization of a prioritized P2P query answering framework and a distributed entailment relation related to argued entailment [5] and argumentation frame- works [13]. To realize the specification of our problem, we develop decentralized algorithms based on [1] for com- puting answers to arbitrary CNF queries according to dis- tributed entailment and prove their soundness and complete- ness. To improve the efficiency of reasoning, we propose heuristic and pruning techniques that exploit the priority or- dering over peers and their knowledge and empirically illus- trate their effectiveness. Copyright c 2008, American Association for Artificial Intelli- gence (www.aaai.org). All rights reserved. There has been significant previous work on distributed logical reasoning. For example, Amir and McIlraith first in- troduced partition-based logical reasoning for propositional and first-order logic (FOL) to reason with consistent dis- tributed knowledge bases (KBs) or with large KBs that they partitioned. Their message-passing approach to conse- quence finding was limited to partitioned KBs that were con- sistent and connected in a tree topology [3]. Adjiman et al. introduced the first consequence-finding algorithm for dis- tributed propositional theories connected by graphs of arbi- trary topology, but limited such systems to be globally con- sistent and not include priorities [1]. Chatalic et al. extended this approach to allow for mutually inconsistent peers that are connected by mapping clauses [12]. However, their ap- proach allows for a formula and its negation to be derived as consequences at the same time. We build on this work, addressing a different general problem. We discuss other related work in the final section. 2 P2P Query Answering Systems In this section we formalize a P2P query answering frame- work and provide a distributed entailment relation for such systems. We illustrate concepts via a running example. 2.1 Framework A P2P query answering system (PQAS) consists of multi- ple peers, all of which are assigned priorities to reflect user- specific preferences or trust. Priorities could be universal or user specific—distributed to the peers when a user joins the network. In this paper, priorities are drawn from a to- tally ordered set of comparable elements (such as the set of integers), and a priority is better than another priority if its value is lower. The peers’ priorities define a total or partial preference ordering over the peers. Knowledge from a peer with better priority is preferred over that from a peer with worse priority. Each peer hosts a consistent propositional local KB and consequence relation, which may be classi- cal entailment or any other consequence relation. Multiple peers’ KBs may be mutually inconsistent. In the general framework, local consequence relations may vary to allow for systems of different desired behaviors (classical, locally or globally non-monotonic, etc.). In Section 3, we develop a query answering algorithm for the special but interesting

Upload: others

Post on 07-Mar-2021

3 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Peer-to-peer Query Answering with Inconsistent Knowledgesheila/publications/bin-mci... · 2009. 4. 30. · Peer-to-peer Query Answering with Inconsistent Knowledge Arnold Binas and

Peer-to-peer Query Answering with Inconsistent Knowledge

Arnold Binas and Sheila A. McIlraithDepartment of Computer Science

University of TorontoToronto, Ontario, Canada

{abinas,sheila}@cs.toronto.edu

Abstract

Decentralized reasoning is receiving increasing attention dueto the distributed nature of knowledge on the Web. We ad-dress the problem of answering queries to distributed propo-sitional reasoners which may be mutually inconsistent. Thispaper provides a formal characterization of a prioritized peer-to-peer query answering framework that exploits a priorityordering over the peers, as well as a distributed entailment re-lation as an extension to established work on argumentationframeworks. We develop decentralized algorithms for com-puting query answers according to distributed entailment andprove their soundness and completeness. To improve the effi-ciency of query answering, we propose an ordering heuristicthat exploits the peers’ priority ordering and empirically eval-uate its effectiveness.

1 Introduction

With the advent of the Web has come a significant increasein the availability of information from a variety of sources,not all of which are mutually consistent or equally reliable.These information sources are often databases, but manyforesee a future in which some of them will be deductivedatabases, logic programs, or even full-fledged logical rea-soners. Motivated by this general problem, this paper ad-dresses the problem of peer-to-peer (P2P) query answeringover distributed propositional information sources that maybe mutually inconsistent. We assume the existence of a pri-ority ordering over the peers to discriminate between peerswith conflicting information. This ordering may reflect anindividual’s level of trust in an information source or it maybe obtained from an objective third party rating. We pro-vide a formal characterization of a prioritized P2P queryanswering framework and a distributed entailment relationrelated to argued entailment [5] and argumentation frame-works [13]. To realize the specification of our problem,we develop decentralized algorithms based on [1] for com-puting answers to arbitrary CNF queries according to dis-tributed entailment and prove their soundness and complete-ness. To improve the efficiency of reasoning, we proposeheuristic and pruning techniques that exploit the priority or-dering over peers and their knowledge and empirically illus-trate their effectiveness.

Copyright c© 2008, American Association for Artificial Intelli-gence (www.aaai.org). All rights reserved.

There has been significant previous work on distributedlogical reasoning. For example, Amir and McIlraith first in-troduced partition-based logical reasoning for propositionaland first-order logic (FOL) to reason with consistent dis-tributed knowledge bases (KBs) or with large KBs thatthey partitioned. Their message-passing approach to conse-quence finding was limited to partitioned KBs that were con-sistent and connected in a tree topology [3]. Adjiman et al.introduced the first consequence-finding algorithm for dis-tributed propositional theories connected by graphs of arbi-trary topology, but limited such systems to be globally con-sistent and not include priorities [1]. Chatalic et al. extendedthis approach to allow for mutually inconsistent peers thatare connected by mapping clauses [12]. However, their ap-proach allows for a formula and its negation to be derivedas consequences at the same time. We build on this work,addressing a different general problem. We discuss otherrelated work in the final section.

2 P2P Query Answering Systems

In this section we formalize a P2P query answering frame-work and provide a distributed entailment relation for suchsystems. We illustrate concepts via a running example.

2.1 Framework

A P2P query answering system (PQAS) consists of multi-ple peers, all of which are assigned priorities to reflect user-specific preferences or trust. Priorities could be universalor user specific—distributed to the peers when a user joinsthe network. In this paper, priorities are drawn from a to-tally ordered set of comparable elements (such as the set ofintegers), and a priority is better than another priority if itsvalue is lower. The peers’ priorities define a total or partialpreference ordering over the peers. Knowledge from a peerwith better priority is preferred over that from a peer withworse priority. Each peer hosts a consistent propositionallocal KB and consequence relation, which may be classi-cal entailment or any other consequence relation. Multiplepeers’ KBs may be mutually inconsistent. In the generalframework, local consequence relations may vary to allowfor systems of different desired behaviors (classical, locallyor globally non-monotonic, etc.). In Section 3, we developa query answering algorithm for the special but interesting

Page 2: Peer-to-peer Query Answering with Inconsistent Knowledgesheila/publications/bin-mci... · 2009. 4. 30. · Peer-to-peer Query Answering with Inconsistent Knowledge Arnold Binas and

case in which each peer’s local consequence relation is clas-sical entailment. Peers are connected by edges, which arelabeled by the sets of variables that determine the languagein which formulas can be passed between peers. This corre-sponds to restrictions on the information that KBs will sharewith each other.

Definition 1 (Peers). A peer Pi is a triple (KBi, Consi, Ii)comprising a local propositional knowledge base KBi, itslocal consequence relation Consi, and the peer’s priorityIi. The notation Σ |=i φ is used to denote φ ∈ Consi(Σ),where Σ is a set of formulas. Peer Pi is said to have betterpriority than peer Pj iff Ii < Ij . A peer’s signature Li is theset of propositional symbols in its knowledge base.

Definition 2 (P2P query answering system (PQAS)). A P2Pquery answering system P is a tuple (P,G) where P ={Pi}

ni=1 is a set of n peers and G is a graph (V,E) describ-

ing the communication connections between those peers.Sn

i=1KBi is called the global theory of P. Each peer’s sig-

nature Li is a subset of the global signature L =Sn

j=1Lj of

P, which gives rise to the global language L. V is the set{1, . . . , n} of vertices in G, with vertex i corresponding topeer Pi. E is the set of labeled edges (i, j, Lij) in G, whereLij is the edge label (link signature) between peers Pi andPj and Lij ⊆ Li ∩ Lj .

Figure 1 is a small example of a PQAS consisting of thepeers P = {Bookstore, V isa, CCOne,CCTwo,Customer}corresponding to peers P1–P5, respectively. The priority Ii

of peer i is displayed on its upper right corner. The book-store’s KB, KB1, contains the axioms ¬pmt rcvd ∧ ord →¬del ontime and pmt rcvd ∧ ord→ del ontime, stating thatthe book is delivered on time if payment is received and anorder placed and not delivered on time if no payment is re-ceived. The Visa peer contains the KB KB2 = {paid →pmt rcvd}, stating that payment is received if money is paid.CCTwo (KB4) additionally requires confirmation (conf).The unreliable CCOne has worse priority than the Visa peerand contains the KBKB3 = {¬pmt rcvd, paid→ lost}, stat-ing that payment will not be received. The edges are labeledby the propositions which pairs of peers may use to com-municate. An edge label between two peers only containspropositions which are mentioned in both peers’ local KBs,but not necessarily all such propositions (perhaps for reasonsof confidentiality). For example, the Visa and CCOne peersboth contain the propositions pmt rcvd and paid, but are notconnected by an edge labeled with them as they representconfidential financial information.

2.2 Distributed Entailment

The distributed entailment we wish to achieve is best intro-duced through our example in Figure 1. Consider the casewhere Customer has asserted ord and paid and poses thequery del ontime. According to Bookstore, the book is de-livered on time if an order has been placed and payment re-ceived. If the book is ordered and payment not received, thebook will not be delivered on time (¬del ontime). Accord-ing to Visa, payment is received (paid and pmt rcvd∨¬paidimply pmt rcvd), and del ontime derived. However, ac-cording to credit card peer CCOne, payment will not be

pmt rcvd ¬ord ¬del ontime¬pmt rcvd ¬ord del ontime

BOOKSTORE 3

pmt rcvd ¬paidVISA 5

¬pmt rcvd¬paid lostCCOne 10

{pmt rcvd}{pmt rcvd}

{paid} {paid, conf}

{pmt rcvd}

{ord}

paidord

CUSTOMER 3

{paid}

pmt rcvd ¬paid ¬confCCTwo 3

Figure 1: Bookstore example

received by the bookstore (¬pmt rcvd) and ¬del ontime isderived. Clearly we have a contradiction (del ontime and¬del ontime). But since Visa is preferred over CCOne dueto its better priority, we would like to receive the answer itsupports. Hence we need del ontime to be entailed by thesystem according to distributed entailment.

In the following, we formally define this new notion ofentailment by appealing to argumentation frameworks [13]and extending them to the distributed and prioritized case.To this end, we employ the notion of reasons which wasused to define argued entailment for prioritized centralizedKBs in [5]. Benferhat et al. defined a formula to be entailedby argued entailment if a better reason exists for it than forits negation. A reason for a formula is a consistent subsetof the KB that entails the formula (since an inconsistent onecould derive any formula and would thus be meaningless).In our distributed setting, we extend this definition to ac-count for several, distributed KBs, possibly different localconsequence relations, and restricted sharing of informationbetween KBs. To this end, we define a support relation in-tended to mention all formulas that can be derived from con-sistent subsets of the global theory while obeying the linklanguages between peers. All formulas that can be derivedby an individual peer from its local KB are mentioned inthe relation. Inductively, all formulas that can be derivedat an individual peer using its local KB as well as formu-las already mentioned by the support relation which can becommunicated to the peer according to the link languagesshared with its neighbors are also in the support relation.

Definition 3 (Support of a formula by a reason). Let P =(P,G) be a PQAS with P = {P1, . . . , Pn} and G = (V,E).The support relation SUPPP ⊆ {1, . . . , n}×2KB1∪···∪KBn×L is the smallest set such that:

• If Σ ⊆ KBi, Σ |=i ρ, and for no Σ∗ ⊂ Σ, Σ∗ |=i ρ,(i,Σ, ρ) ∈ SUPPP .

• If

– there exists a set of formulas ψ1, . . . , ψm s.t.{ψ1, . . . , ψm} |=i ρ,

– for no Ψ ⊂ {ψ1, . . . , ψm}, Ψ |=i ρ,

– for each ψj , there exists a peer Pk and a set of formulas

Page 3: Peer-to-peer Query Answering with Inconsistent Knowledgesheila/publications/bin-mci... · 2009. 4. 30. · Peer-to-peer Query Answering with Inconsistent Knowledge Arnold Binas and

Σj s.t. (k,Σj , ψj) ∈ SUPPP and (i, k, Lik) ∈ E withLik ⊇ sig(ψj), and

–Sm

j=1Σj 6|= ⊥,

then (i,Sm

j=1Σj , ρ) ∈ SUPPP .

The formula φ is supported in P by the reason Σ, denotedP |=Σ φ, if for some i, (i,Σ, φ) ∈ SUPPP .

Thus a reason for a formula φ exists in a PQAS P if someconsistent subset of the peers’ local KBs derives φ given thepeers’ local consequence relations and given that whenevertwo formulas of different peers need to interact, an edge be-tween the peers exists and is labeled by all variables men-tioned in the formula. Note a fine subtlety regarding theformula sets in the definition. Formulas coming from a sin-gle peer should be minimal, i.e. a reason should contain alland only those formulas required to entail the given formula.Eliminating extraneous formulas in the reason avoids thegeneration of unnecessary inconsistencies with other formu-las that will be added to the reason in later inductive ap-plications of the definition. The minimality constraint doesnot apply for merged sets of formulas from several peers be-cause of the potentially limited connections between them.In other words, two seemingly redundant formulas from twodifferent peers’ KBs may both be required to derive a de-sired consequence because they cannot travel between thetwo peers if the peers are not directly connected or the edgeconnecting them is not labeled by all the symbols occurringin the formula.

Since a reason may exist for both a formula φ and its nega-tion ¬φ, we need to weigh the priorities of the formulas inboth reasons in order to decide whether to believe φ or ¬φ.The priority of a formula is that of its host peer. A rea-son along with its conclusion will be called an argument tolay the basis for applying Dung’s argumentation theory [13].The rank of an argument is the priority of the worst-priorityformula in its reason.

Definition 4 (Arguments (after [13])). An argument A is apair A = (Σ, φ) s.t. Σ is a reason for φ. φ is called theconclusion of the argument.

Definition 5 (Rank of an argument). Let A = (Σ, φ) be anargument. The rank of A, denoted R(A), is the priority ofthe worst-priority formula in A’s reason Σ. I.e. R(A) =maxσ∈Σ(σ.prio).

While arguments are, by definition, based on reasons thatare consistent in themselves, they may be contradicted, orattacked, by other arguments of better rank. The set of ar-guments along with this attack relation is referred to as theargumentation framework induced by a PQAS.

Definition 6 (Attacking arguments (after [13])). Let A =(ΣA, φA) and B = (ΣB , φB) be arguments. A is said to at-tack B iff φA ∪ ΣB |= ⊥ and R(A) ≤ R(B).

Definition 7 (Argumentation frameworks (after [13])).The argumentation framework induced by a PQAS P isAF (P) = (AR, attacks), where AR is the set of argumentsin P and attacks their attack relation.

The following definitions from [13] then develop the no-tion of a preferred extension of an argumentation frame-work.

Definition 8 (Conflict-free argument sets [13]). A set S ofarguments is conflict-free if no argument in S is attacked byanother argument in S.

Definition 9 (Acceptable arguments and admissible argu-ment sets [13]). Let AF (P) be a PQAS-induced argumen-tation framework. An argument A ∈ AR is acceptable wrt.an argument set S ⊆ AR iff for each argument B ∈ AR, ifB attacks A then B is attacked by some argument in S. Aconflict-free set of arguments S is admissible iff each argu-ment in S is acceptable wrt. S.

Definition 10 (Preferred extensions [13]). A preferred ex-tension of an argumentation framework AF is a maximaladmissible set of AF .

Given the notion of distributed arguments and thepriority-based attack relation between them developedabove, distributed entailment is then defined in terms ofDung’s preferred extensions on the induced argumentationframework. More specifically, we will believe formulas thatare conclusions of arguments that are in some preferred ex-tension of the induced argumentation framework, i.e. thosesupported and not successfully attacked by the most pre-ferred peers of P.

Definition 11 (Distributed Entailment). A formula φ is en-tailed by the PQAS P by distributed entailment, denotedP |=D φ, iff φ is the conclusion of an argument in somepreferred extension of AF (P).

Note that this argumentation-based definition of dis-tributed entailment means that both a formula and its nega-tion can be entailed by distributed entailment at the sametime. This may occur in cases where arguments for andagainst the query have the same rank. To illustrate this, con-sider the simple example of a PQAS-induced argumentationframework containing only the arguments A = ({p}, p) sup-porting the formula p and B = ({¬p},¬p) supporting theformula ¬p. If both A and B have the same rank, A attacksB and B attacks A. Thus there are two preferred extensions:one containing only A and one containing only B. Suchcases are easily detected and can be reported by a queryanswering algorithm, as will be done by the algorithm wepresent in the next section.

While in the general case, a PQAS’s peers may be mu-tually inconsistent, an inconsistency may not always be de-rived. This could be because all peers are mutually con-sistent or because the inconsistency is not derivable withthe restricted communication imposed by peer connectiv-ity. To distinguish between PQASs that can derive contra-dictions and those that cannot, we introduce the concept ofP-consistency.

Definition 12 (P-consistency). A PQAS P is said to be P-inconsistent if there exists a formula φ s.t. there exists both areason for φ and a reason for ¬φ in P. Otherwise P is saidto be P-consistent.

Since in a P-consistent system no contradictions (attacks)can be derived, the existence of a reason for a formula φ

guarantees that the corresponding argument is in the pre-ferred extension formed by the entire argument set and thusthat φ is entailed by distributed entailment in P.

Page 4: Peer-to-peer Query Answering with Inconsistent Knowledgesheila/publications/bin-mci... · 2009. 4. 30. · Peer-to-peer Query Answering with Inconsistent Knowledge Arnold Binas and

The following theorem shows that for fully connected sys-tems with classical local entailment, P -consistency is equiv-alent to the satisfiability of the global theory.

Theorem 1. Let P = (P,G) be a PQAS such that if forany two peers Pi and Pj , Lij = Li ∩ Lj is non-empty,(i, j, Lij) ∈ E and for all peers Pi, Consi is the classicalentailment relation |=. Then P is P-consistent iff

Sn

i=1KBi

is satisfiable. Otherwise it is P-inconsistent.

Proof sketch. Since P is fully connected, every formula inthe system can interact with any other formula in the sys-tems as if they were part of a single theory. Hence, there isan argument for each formula that can be derived from theglobal theory.

⇒: If P is P-consistent, there exists no formula φ s.t. thereis a reason for both φ and ¬φ (Definition 12). Hence sincethe system is fully connected, for no formula φ, both φ and¬φ can be derived from the global theory (Definition 3).Thus the empty clause cannot be derived from the globaltheory, which is hence satisfiable.

⇐: If the global theory is satisfiable, there does not exist aformula φ s.t. both φ and ¬φ can be derived from the globaltheory. Thus since the system is fully connected, there ex-ists no formula φ s.t. there exists a reason for both φ and¬φ (Definition 3) and therefore P is P-consistent (Defini-tion 12). �

The following theorem further establishes that in a P-consistent system with full peer connectivity and classicallocal entailment, distributed entailment as defined throughpreferred extensions in the induced argumentation frame-work is equivalent to classical entailment from the globaltheory.

Theorem 2. Let P = (P,G) be a P-consistent PQAS suchthat if for any two peers Pi and Pj , Lij = Li ∩ Lj is non-empty, (i, j, Lij) ∈ E and for all peers Pi, Consi is the clas-sical entailment relation |=. Then P |=D φ iff

S

i KBi |= φ.

Proof sketch. Since P is fully connected, every formula inthe system can interact with any other formula in the sys-tems as if they were part of a single theory. Hence, thereis an argument for each formula that can be derived fromthe global theory. Since P is also P-consistent, no argumentattacks any other argument and the argument set containingat least one argument for each formula in the classical de-ductive closure of

S

i KBi is the lone preferred extension ofAF (P); i.e. each formula that is in the deductive closure ofS

i KBi is also entailed in P by distributed entailment andvice versa. �

3 Query Answering in a PQAS

Given the PQAS framework, we want to pose a formula as aquery and have the system determine the status of the queryaccording to our definition of distributed entailment by usingindividual peers’ local knowledge and reasoning capabili-ties. The goal of this section is to develop a message-passingalgorithm that solves this problem for arbitrary CNF queriesin the special case of a possibly P-inconsistent PQAS in

which all peers use classical entailment as their local con-sequence relation.

A consequence-finding algorithm for systems of mutuallyconsistent peers already exists [1]. Adjiman et al.’s algo-rithm computes consequences of individual literals for ar-bitrary peer topologies and shall serve as the basis for ourquery answering algorithm. Here we modify and extend Ad-jiman et al.’s algorithm to find the distributed consequencesof individual literals as well as their reasons and prioritiesin a possibly P-inconsistent PQAS P. These consequencesand their reasons are then used to compute query answers ac-cording to distributed entailment. Our focus on prioritizedpeers enables a search space pruning technique and searchheuristic that drastically improves performance. A query isposed by a query peer that may be one of the peers of P oran additional peer connected to P.

3.1 Algorithm

To answer an arbitrary CNF query Q to peer P accordingto distributed entailment, Algorithm 1 finds the ranks of thebest-rank arguments from preferred extensions ofAF (P) forboth Q and ¬Q by passing the negation of each subquery toFind Bottom Distr of Algorithm 2 for a refutation proof andreturns the answer corresponding to the best-rank argument.BOTH is returned if the best arguments for and against thequery have equal rank and UNK (for unknown) if no argu-ment for either exists in any preferred extension.

Find Bottom Distr(Q,P, best) finds the best-rank way to de-rive the empty clause (�) from Q relying on an argument ofrank no worse than best and which is in a preferred extensionof AF (P). Arguments with reason Σ that are in no preferredextension due to being attacked are detected by recursivecalls to Find Bottom Distr(Σ, P, best) and ignored. To find thebest-rank way of deriving the empty clause from the CNFquery Q, it is split into its individual literals, and each lit-eral’s distributed consequences that rely on arguments frompreferred extensions (OA sets in the algorithms) are foundby sending a forth message for each. The returned backmessages are collected and merged back together to form theconsequences of the original clauses in Q. We will outlinethe workings of the message-passing procedures and givean example in a moment. Given the consequences of Q,their OA sets (reasons) are tested for satisfiability (to com-ply with Definition 3) and for being in a preferred extensionof the argumentation framework induced by the PQAS (tocomply with Definition 11). The latter is done by recursivecalls to Find Bottom Distr, which is only asked to find con-tradicting arguments of rank worse or equal to the reason at-tacks on which are to be found. Find Bottom Distr then callsthe subroutine Find Empty Clauses to find contradictions inthe deductive closure of all the consequences of the passed-in (negated) query Q, which indicate a refutation proof forthe original query. The satisfiability and attack checks arerepeated on those proof candidates and the rank of the best-rank argument from a preferred extension returned.

Reasons for individual literal’s consequences are com-puted by the message-passing Algorithms 3, 4, 5, and 6 andthen combined to find reasons for the original CNF query.In order to compute reasons for a query literal p’s conse-

Page 5: Peer-to-peer Query Answering with Inconsistent Knowledgesheila/publications/bin-mci... · 2009. 4. 30. · Peer-to-peer Query Answering with Inconsistent Knowledge Arnold Binas and

quences, a forth message is sent to a peer whose signaturecontains p and back messages with the consequences col-lected. A history is used to keep track of which literalsand clauses the currently processed literal depended on, en-abling the algorithm to detect if a literal has been processedby the same peer before or whether it depends on its ownnegation in the history and thus derives the empty clause.Each derived clause has an associated OA set (for “originalancestor”) which contains all original parent clauses and isused to reconstruct the reasons for consequences and theirranks. The following notation, proposed by [1], is used inthe message-passing algorithms, which extend Adjiman etal.’s by priorities, OA sets, and attack checks.

Definition 13 (History ([1])). A history hist is a list of tu-ples (l, P, c) of a literal l, a peer P , and a clause c. c is aconsequence of l in P , and l is a literal of the clause of theprevious tuple in the hist list.

Definition 14 (Local consequences ([1])). Resolvent(l, Pi)is the set {c|KBi |=i c ∨ ¬l} of local consequences in peerPi of the literal l.

Definition 15 (Acquainted peers by literal ([1])). ACQ(l, P )is the set {P ′ ∈ V |(P, P ′, Lij) ∈ E, l ∈ Lij} of peers sharingan edge labeled by Lij with peer P , where l ∈ Lij .

Definition 16 (Distributed disjunction ([1])). The dis-tributed disjunction operator ⊗ in A ⊗ B forms clauses bydisjoining all combinations of clauses in the sets A andB . For an indexed set of clause sets {Ai}i, the notation⊗i∈{1,2,3}Ai means A1 ⊗A2 ⊗A3.

Each peer can send and receive four message types duringthe search for consequences of the query. Forth messages(Algorithm 3) request the search for consequences of a lit-eral by neighboring peers. The receiving peer computes alllocal consequences of the literal of the message (line 15).When two clauses are resolved, their original ancestors areadded to the resolvent’s OA set. Any consequences withconsistent OA sets are returned to the sender peer via aback message if a recursive call to Find Bottom Distr verifiesthat their reasons are not attacked without defense (line 23).Then the local consequence clauses are split into their indi-vidual literals and each sent to the connected peers (if any)via another forth message (line 35). Back messages (Al-gorithm 4) are sent back to the sender peer when a conse-quence with satisfiable and non-attacked OA set is derived.Upon receiving a back message, a peer stores the containedconsequence of the literal of the corresponding forth mes-sage (line 3). If their respective OA sets are mutually sat-isfiable and not attacked, one back message for each com-bination of per-literal consequences is sent to the last peerin the history hist (line 17). Final messages (Algorithm 5)indicate that the exploration of a particular search branch iscompleted (Algorithm 3, lines 3, 5, 28, and 37). Prio mes-sages (Algorithm 6) are sent when a new argument from apreferred extension is founds whose rank is better than thatof any such argument for the original query known so far(Algorithm 2, line 29). Prio messages serve to update eachpeer’s local value of best(id), which determines the priorityat which peers and clauses of the reasoning process id can

Algorithm 1 Distributed query answering algorithm

1: Answer Query Distr(Q)2: Q← ¬Q in CNF3: for all P ∈ P do4: posP ← Find Bottom Distr(Q, P,∞)5: negP ← Find Bottom Distr(Q,P,∞)6: pos← minP (posP )7: neg ← minP (negP )8: if pos < neg then9: return Y ES

10: if neg < pos then11: return NO12: if neg = pos & pos <∞ then13: return BOTH

14: return UNK

Algorithm 2 Finding best-rank arguments

1: Find Bottom Distr(Q,P, best)2: id← newID()3: best(id)← best4: for all clauses c ∈ Q do5: for all literals l ∈ c do6: send m(Self, P, forth, [(l, Self, c)|hist], l, id)7: Consc,l ← ∅8: best prio←∞9: while final message not received do

10: for all back messages received for clause c and literal l withID id, containing consequence cons do

11: Consc,l ← Consc,l ∪ {cons}12: for all clauses c ∈ Q do13: Consc ← ⊗l∈cConsc,l

14: c.prio← max(l.prio)15: c.OA←

S

l∈c l.OA

16: Cons←S

c∈QConsc

17: for all c ∈ Cons do18: if c.prio > best then19: Cons← Cons\{c}20: for all c ∈ Cons do21: if c.OA is UNSAT or

Find Bottom Distr(c.OA, P, c.prio) ≤ c.prio then22: Cons← Cons\{c}23: �Cons ← Find Empty Clauses(Cons)24: for all � ∈ �Cons do25: if �.OA is UNSAT or

Find Bottom Distr(�.OA, P,�.prio)≤ �.prio then26: �Cons ← �Cons\�27: best prio← min( min�∈�Cons

(�.prio), best prio )28: if best prio < best(id) then29: send m(Self, P, prio, best prio, id)

30: return best prio

be safely pruned from the search space. This pruning of thesearch space deserves highlighting and results in a signifi-cant improvement in the performance of the system.

3.2 Example

Returning to the bookstore example of Figure 1, Customerasserts ord and paid and asks at the bookstore whether thebook will be delivered on time. This is a single-literal querywhich Algorithm 1 asks positively and negatively via Algo-

Page 6: Peer-to-peer Query Answering with Inconsistent Knowledgesheila/publications/bin-mci... · 2009. 4. 30. · Peer-to-peer Query Answering with Inconsistent Knowledge Arnold Binas and

Algorithm 3 Forth message algorithm

1: ReceiveForthMessage(m(Sender, Self, forth, hist, p, id))2: if (p, Self, ) ∈ hist then3: send m(Self, Sender, final, [(p, Self, true)|hist], id)4: else if p.prio > best(id) or Self.prio > best(id) then5: send m(Self, Sender, final, [(p, Self, true)|hist], id)6: else7: if (¬p, , ) ∈ hist then8: hist is of the form [(l′, , c′)|hist′]9: if c′.OA is SAT then

10: let � be a new empty clause11: �.OA← c′.OA12: �.prio← c′.prio13: if Find Bottom Distr(�.OA, Self,�.prio)>

�.prio then14: send m(Self, Sender, back,

[(p, Self,�)|hist],�, id)15: LOCAL(Self )← {p} ∪Resolvent(p, Self)16: for all c ∈ LOCAL(Self ) do17: let {c∗i }i be the set of clauses that went into c18: c.OA←

S

ic∗i .OA (recursively)

19: c.prio← maxi(c∗i .prio) (recursively)

20: LOCAL(Self )← {c ∈LOCAL(Self )|c.prio ≤ best(id)}

21: temp min←∞22: for all c ∈ LOCAL(Self ) s.t. c.OA is SAT do23: if Find Bottom Distr(c.OA, Self, c.prio)> c.prio

then24: send m(Self,Sender,back,[(p, Self, c)|hist],c,id)25: temp min← min(temp min, c.prio)26: LOCAL(Self )← {c ∈ LOCAL(Self ) | all literals in c are

shared}27: if LOCAL(Self ) = ∅ then28: send m(Self,Sender,final,[(p, Self, true)|hist],id)29: for all c ∈ LOCAL(Self ) do30: for all l ∈ c do31: CONS(l, [(p, Self, c)|hist])← ∅32: ACQ∗ ← {P ′ ∈ ACQ(l, Self)|P ′.prio ≤

best(id)}33: for all RP ∈ ACQ∗ do34: FINAL(l, [(p, Self, c)|hist], RP )← false35: send m(Self,RP,forth,[(p, Self, c)|hist],l,id)36: if no forth message sent then37: send m(Self,Sender,final,[(p, Self, true)|hist],id)

rithm 2. Algorithm 2 passes each on as a forth message.At the Bookstore peer, ¬del ontime generates the local con-sequence ¬pmt rcvd ∨ ¬ord which is split into its individ-ual literals. A forth message containing ¬pmt rcvd is sentto Visa, CCOne, and CCTwo. Visa generates ¬paid whichresolves with paid to the empty clause at Customer. Theempty clause with its OA set is passed back to Visa and thento Bookstore via back messages. ¬pmt rcvd generates noconsequences in CCOne or CCTwo, and final messages aresent for these branches. The ¬ord of the consequence inBookstore is sent as a forth message to Customer where itresolves with ord to the empty clause and is passed backas a back message. The empty clauses for ¬pmt rcvd and¬ord are merged back together at the bookstore and the re-sulting OA set {¬pmt rcvd∨¬ord∨ del ontime, pmt rcvd∨¬paid, paid, ord} is determined to be consistent. Thus a rea-

Algorithm 4 Back message algorithm

1: ReceiveBackMessage(m(Sender, Self, back, hist, c∗, id))2: hist is of the form [(l′, Sender, c′), (p, Self, c)|hist′]3: CONS(l′, [(p, Self, c)|hist′])←

CONS(l′, [(p, Self, c)|hist′]) ∪ c∗

4: RESULT← (⊗l∈c\{l′}CONS(l, [(p, Self, c)|hist′]))⊗{c∗}5: for all cl ∈ RESULT s.t. cl contains a distributed consequence

for each l ∈ c do6: let {c∗i }i be the set of distributed consequence clauses that

went into cl7: cl.OA←

S

ic∗i .OA

8: cl.prio← maxi(c∗i .prio)

9: RESULT← {cl ∈ RESULT|cl.prio ≤ best(id) and cl.OAis SAT}

10: if hist′ = ∅ then11: U ← User12: else13: U ← the first peer P ′ of hist′

14: temp min←∞15: for all cl′ ∈ RESULT do16: if Find Bottom Distr(cl′.OA, Self, cl′.prio)> cl′.prio

then17: send m(Self, U, back, [(p, Self, c)|hist′], cl′, id)18: temp min← min(temp min, cl′.prio)

Algorithm 5 Final message algorithm

1: ReceiveFinalMessage(m(Sender, Self, final, hist, id))2: hist is of the form [(l′, Sender, true), (p, Self, c)|hist′]3: FINAL(l′, [(p, Self, c)|hist′], Sender)← true4: if ∀c∗ ∈ LOCAL(Self ) and ∀l ∈ c∗,

FINAL(l, [(p, Self, c∗)|hist′], ) = true then5: if hist′ = ∅ then6: U ← User7: else8: U ← the first peer P ′ of hist′

9: send m(Self, U, final, [(p, Self, true)|hist′], id)

Algorithm 6 Prio message algorithm

1: ReceivePrioMessage(m(Sender, Self, prio, x, id))2: if x < best(id) then3: best(id)s← x4: for all RP ∈ ACQ( , Self) s.t. RP 6= Sender do5: send m(Self,RP, prio, best(id), id)

son for del ontime is found with rank 5 (max over the threepeers contributing clauses to the reason). It is verified thatthe argument this reason is in is in a preferred extension byrecursively passing it to Algorithm 2 and observing that nocontradiction is derived (and the reason is thus not attacked).Similarly the empty clause can be derived from del ontime

using Bookstore, CCOne, and Customer, resulting in a rea-son of rank 10 for ¬del ontime. The argument with thisreason, however, is determined to be attacked by a recursivecall to Algorithm 2 and thus ignored. Since del ontime isinferred with better rank than ¬del ontime, YES is returnedas the answer in Algorithm 1.

Page 7: Peer-to-peer Query Answering with Inconsistent Knowledgesheila/publications/bin-mci... · 2009. 4. 30. · Peer-to-peer Query Answering with Inconsistent Knowledge Arnold Binas and

3.3 Analysis

We proved various results for the algorithms and restate themost important theorems here. The proofs are outlined inas much detail as space permits. Our ultimate goal in thisanalysis is to show that Algorithm 1 is sound and completewith respect to distributed entailment (Definition 11). Tothis end, we will first establish that Algorithms 3, 4, and 5are sound and complete wrt. computing the reasons of dis-tributed consequences of individual literals and then use thisresult to show that Algorithm 2 computes the rank of thebest-rank argument for the query that belongs to a preferredextension of the argumentation framework AF (P) inducedby the PQAS P. In order to make the line of argumenta-tion easier to follow, we will first consider the algorithmswithout priority-based pruning and without recursive callsto Find Bottom Distr of Algorithm 2 (i.e. without checkingreasons for being attacked). Without attack checks, the ar-guments that the computed reasons belong to are valid argu-ments, but do not necessarily belong to a preferred extensionof AF (P), which is necessary for distributed entailment. Wegradually add both attack checks and priority-based pruningback in to arrive at our final soundness and completenessresult.

We proceed by establishing the following in this order,each bullet corresponding to a theorem.

• Computation of reasons for consequences of individualliterals by Algorithms 3, 4, and 5 without priority-basedpruning and attack checks

• Computation of reasons for arbitrary CNF queries byAlgorithm 2 without priority-based pruning and attackchecks

• Computation of reasons for arbitrary CNF queries by Al-gorithm 2 without priority-based pruning but with attackchecks

• Termination of Algorithm 2

• Correctness of priority-based pruning

• Soundness and completeness of Algorithm 1 wrt. dis-tributed entailment

First, the following theorem states that, without priority-based pruning and attack checks, the message-passing algo-rithms find the reasons of distributed consequences of indi-vidual literals. Both theorem and proof somewhat resem-ble those in [1], but additionally take into account link lan-guages and validity of reasons (satisfiability of OA sets).

Theorem 3 (Soundness and completeness wrt. computingreasons for consequences of individual literals). Let P be apossibly P-inconsistent PQAS. The following holds for Al-gorithms 3, 4, and 5 without calls to Find Bottom Distr ofAlgorithm 2 and without priority-based pruning. There ex-ist peers Pi and Pj and a literal p in Lij s.t. Pi receivesthe message m(Pj , Pi, back, [(p, Pj , c)], c) as a response to themessage m(Pi, Pj , forth, ∅, p) iff there is a set of formulas Σs.t. P |=Σ p→ c, where c is a clause.

Proof sketch. The message passing Algorithms 3, 4, and 5can be shown to be sound and complete for computing dis-tributed consequences of an individual literal, i.e. those for

which a reason exists (by Definition 3). This is proved byinduction on the length of the history hist by relating themerging of distributed consequences of individual literalsof a clause (line 4 of Algorithm 4) to the inductive step ofDefinition 3 and checking the satisfiability of OA sets to in-sure consistent reasons. The full proof is very similar to thatin [1], but additionally accounts for satisfiability checks ofOA sets and for the constraints imposed by link languages(in the inductive step of Definition 3). �

Theorem 4 extends the above result to full CNF queriesto Algorithm 2. I.e., it is shown that without priority-basedpruning and attach checks, reasons are correctly computed,although they may still be attacked.

Theorem 4 (Best-rank reasons). Let P be a pos-sibly P-inconsistent PQAS. Then for some peer P ,Find Bottom Distr(Q,P,∞) of Algorithm 2 without recursivecalls to itself and without priority-based pruning returns therank of the best-rank reason for ¬Q.

Proof sketch. Theorem 3 establishes that the message-passing Algorithms 3, 4, and 5, invoked at some peer Pconnected to the peer relevant for the query, find all con-sequences of individual literals along with their support-ing reasons. For each clause in the negated query Q inCNF, Algorithm 2 calls the message-passing algorithms tofind its per-literal consequences (line 6) and merges themback together to obtain the consequences of the negatedquery (line 13). All contradictions ultimately caused by thenegated query (since all consequences resulting from unsat-isfiableOA sets are ignored, lines 21 and 25) are found in thedeductive closure of the per-clause consequences (line 23)and the priority of the best one, corresponding to the best-rank reason, is returned. �

The result obtained in Theorem 4 is extended below toconsider only reasons of arguments for the query that are ina preferred extension of AF (P). This is achieved when con-sidering recursive calls to Find Bottom Distr of Algorithm 2,which serve as attack checks on the argument with the rea-son under consideration. Theorem 5 is the core of establish-ing the soundness and completeness of Algorithm 1 as ar-guments in preferred extensions are the basis for distributedentailment (Definition 11).

Theorem 5 (Best-rank reasons of preferred extensions). LetP be a possibly P-inconsistent PQAS. Then for some peerP , Find Bottom Distr(Q,P,∞) of Algorithm 2 with recursivecalls to Find Bottom Distr but without priority-based pruningreturns the rank of the best-rank reason for ¬Q that belongsto an argument in a preferred extension of AF (P).

Proof sketch. We already know that, without recursive callsto Find Bottom Distr, the algorithms find the best-rank reasonfor the query (Theorem 4). It remains to show that reasonswhich are in no preferred extension ofAF (P) are filtered outby posing the reasons to recursive calls of Find Bottom Distr

to find attacks.This is proved by induction on the number of recursive

calls to Find Bottom Distr. The intuition underlying the proofis as follows. In the base case, Algorithm 2 returns ∞

Page 8: Peer-to-peer Query Answering with Inconsistent Knowledgesheila/publications/bin-mci... · 2009. 4. 30. · Peer-to-peer Query Answering with Inconsistent Knowledge Arnold Binas and

because no consequence of Q has better or equal prioritythan best and all of Q’s consequences are thus disregarded(line 19). In this case there is no argument attacking Q, so Qis in a preferred extension of AF (P).

We briefly outline the induction step as follows. In thisstep, a recursive call to Find Bottom Distr returns a prioritybetter than best, indicating that an argument attacking Q ex-ists. Call that argument Q′. Since Q is attacked by Q′, it isdisregarded by the algorithm (lines 21 and 25). There aretwo cases.

(1): Q′ is not attacked by any other argument of betterrank. Then Q′ is a valid attack on Q and Q is thus in nopreferred extension of AF (P) and hence Q is rightfully dis-regarded by the algorithm.

(2): Q′ is itself attacked by an argumentQ′′ of better rank.Then either (a) Q∪Q′′ 6|= ⊥ (i.e. both reasons are consistent)or (b) Q ∪Q′′ |= ⊥ (i.e. they are not consistent).

(a): Q′′ is in the same conflict-free set withQ and thus de-fendsQ against the attack fromQ′. In this case the algorithmrightfully disregards Q′ as an attacking reason.

(b): Q′′ also attacks Q directly (since both are mutuallyinconsistent). Thus the recursive call to Find Bottom Distr

will return a better priority than best due to Q′′ already anddisregarding Q′ will do no harm. �

Theorem 6 below establishes that Algorithm 2 terminates,a necessary condition for soundness and completeness.

Theorem 6 (Termination (partially following [1])). Algo-rithm 2 terminates.

Proof sketch. There are two recursions in the overall algo-rithm; one grows the history to find consequences of conse-quences using the message passing algorithms, and the otherchecks whether reasons are attacked via recursive calls toFind Bottom Distr.

For each forth message, a new element is added to the his-tory. There are finitely many variables and peers, so there areonly finitely many possible history elements. Thus a historycan only be of a finite length. Therefore, only finitely manyforth messages can be sent. A forth message necessarily re-sults in a final message and thus this recursion terminates.

The number of recursions of Find Bottom Distr cannot beinfinite as for each priority level (parameter best in Algo-rithm 2), there are finitely many possible attacks (or incon-sistencies of better or equal rank, since the global theory isfinite) and the priority limit best at which possible attacksare valid is monotonically increasing (i.e. worsening) witheach recursive call to Find Bottom Distr. �

Given termination of Algorithm 2, it is easily shown thatAlgorithm 1 terminates.

In the following, we reintroduce priority-based pruningand show that soundness and completeness are retained. Thepruning strategy, while not necessary for the correctness ofthe algorithm, allows for large performance improvementsdue to a significantly smaller search space.

Theorem 7 (Pruning by priority limit). The following holdfor a PQAS running Algorithms 2, 3, 4, 5, and 6. (i) When-ever a peer updates its local value of best, there exists a

reason for the original query or its negation with prioritybest. (ii) Forth messages and acquainted peers ignored by apeer P cannot result in a reason for the original query or itsnegation with better or equal rank than the current value ofbest.

Proof sketch. (i) The algorithms only send prio messages ifa new argument from a preferred extension for either thequery or its negation has been found (line 29 Algorithm 2and line 5 of Algorithm 6).

(ii) Ignoring a forth message or a peer of a certain pri-ority amounts to ignoring a clause of that priority. In ahistory [(ln, Pn, cn), . . . , (li, Pi, ci), . . . , (l1, P1, c1)], the prior-ities ci.prio are monotonically increasing with increasing i

due to the priority of a resolvent being the max-value of thepriorities of the input clauses. Thus an empty clause derivedfrom a clause c with priority c.prio > best cannot have apriority better than or equal to best. The empty clause cor-responds to the reason (via its OA set) and its priority to thereason’s rank. �

Since Algorithm 2 finds the rank of the best-rank argu-ment for the query (Theorem 5), and the algorithm termi-nates (Theorem 6), asking all peers the query and its nega-tion results in an answer according to distributed entailment(Algorithm 1). The following theorem formalizes this mainresult of the present paper regarding our set of query answer-ing algorithms.

Theorem 8 (Soundness and completeness wrt. distributedentailment). Given a PQAS P, Answer Query(Q) of Algo-rithm 1 returns YES iff P |=D Q and P 6|=D ¬Q, NO iffP |=D ¬Q and P 6|=D Q, BOTH iff P |=D Q and P |=D ¬Qand UNK otherwise.

Proof sketch. Assume that P |=D Q and P 6|=D ¬Q. ThenAnswer Query(Q) of Algorithm 1 collects the ranks of thebest-rank arguments for and against the query resulting byasking them at all peers in posP and negP , respectively(lines 4 and 5). The overall best ranks for and against thequery are found (lines 6 and 7) and known to be correct(Theorem 5). Since P |=D Q and P 6|=D ¬Q, we will havepos < neg and YES is returned by the algorithm (line 9).The argument for the algorithm returning NO if P |=D ¬Qand P 6|=D Q is symmetric.

Assume that P |=D Q and P |=D ¬Q. Then we will havepos = neg <∞ and Algorithm 1 returns BOTH (line 13).

If neither pos < ∞ nor neg < ∞, the algorithm returnsUNK (line 14).

The reverse direction can be shown similarly. �

3.4 Pruning and Ordering Heuristic

Time can be saved by the message-passing algorithms whensearching for the best-rank reasons for a query and its nega-tion simultaneously by exploiting the priority ordering overthe peers and the resulting priority ordering over formu-las. Since we are only interested in the best-rank argu-ment from a preferred extension, and since clauses and peerswith worse priority than the currently best known such argu-ment are pruned away, it generally pays off for each peer to

Page 9: Peer-to-peer Query Answering with Inconsistent Knowledgesheila/publications/bin-mci... · 2009. 4. 30. · Peer-to-peer Query Answering with Inconsistent Knowledge Arnold Binas and

process messages with consequences of better priority first.This technique ensures that better-rank reasons are foundearlier and the priority limit best is updated more quickly,resulting in a larger part of the search space being pruned.Furthermore, the best-rank reason from a preferred exten-sion will be found earlier (although not necessarily first asthe algorithm runs concurrently across distributed peers). InSection 4, we present empirical results illustrating the effec-tiveness of this priority-ordering heuristic.

3.5 Discussion

Since we are reasoning with peer KBs that may be mutu-ally inconsistent, we want to ensure that any conclusionswe draw are derived from a consistent subset of these KBs.Hence, before sending a found consequence in a back mes-sage, the algorithm checks the satisfiability of the clause’sOA set, since this set could be inconsistent. This may bedone either by a call to a SAT solver or by comparing theclauses in the OA set against cached nogoods as in [12].Both approaches have their merits—one needs no precom-puted nogoods and the other may amortize their computationover multiple queries.

In the special case of a P-consistent PQAS the computa-tion of query answers is less costly. Since no contradictionscan be derived in such a system, a reason only exists eitherfor the query, its negation, or neither. Furthermore, the satis-fiability of clause sets and therefore also the acceptability ofreasons is guaranteed in a P-consistent system and OA setsneed not be tracked nor their consistency checked. Prioritiesneed not be tracked when trying to find the answer to a queryonly. In this case, the first reason for the query found willsuffice and the algorithm may terminate. If the best-rank rea-son for the answers is of interest (for example, if prioritiesrepresent reliability), priorities must be tracked, but only thequery and not its negation asked in a P-consistent system.

4 Implementation and Experiments

To evaluate our pruning technique and ordering heuristic,we created an implementation that simulates on a single ma-chine a PQAS where reasons are not checked for being at-tacked (to either leave the freedom to judge acceptability tothe querying peer or when it can be assumed that reasons arenot attacked in the given PQAS). Although this is a limitedcase of the full system, this demonstrates the effectivenessof the pruning strategy. There is a message queue for eachpeer, and peers take turns to process one message each. Thedefault message queue is a first-in first-out (FIFO) queue,ensuring that messages are processed in the order received.The ordering heuristic uses a priority message queue, whichis sorted by the priorities of clauses and peers of the mes-sages and returns better-priority messages first. Our prun-ing strategy prunes consequences and peers of worse prior-ity than the currently best known argument from a preferredextension.

We executed experiments to evaluate the effectiveness ofboth our priority-ordering heuristic and our pruning strategy.The quality of query answers need not be empirically evalu-ated as a full implementation of the algorithms would neces-

sarily return the globally correct answer due to their sound-ness and completeness wrt. distributed entailment (Theo-rem 8). We ran three variants of the algorithm queryingeach proposition of each of a set of several hundred ran-domly generated PQAS instances. For a given number of ofpeers, instances were generated by giving each peer the samenumber of (distinct) propositions, sharing a random subsetof them with a random selection of other peers through la-beled connections, and randomly generating a fixed numberof clauses of length 1–3 within each peer from its local vo-cabulary. Each instance had 4–20 peers with 4–8 clausesper peer, 1–6 shared propositions per peer, and a total of 8–96 distinct propositions. An incomplete attempt to answera query by a given variant of the algorithm was aborted af-ter ten minutes. Out of all queries given, the naive, prun-ing, and ordering methods solved 1341, 2292, and 2842queries, respectively. The average number of messages re-quired to solve a query for the three methods were 2735,1266, and 456 messages, respectively. Figure 2 shows theresults for individual queries in terms of the number of mes-sages passed throughout the system to solve a query. Thetop plot contrasts the total number of messages sent until ter-mination of the naive algorithm (without pruning) with thenumber of messages used when pruning. Problem instancesare sorted by the number of messages used by the naive ap-proach. The bottom plot compares the number of messagessent by the ordering heuristic version with the pruning-onlyversion and is sorted by the hardness of problems for the lat-ter. We chose the number of messages sent to answer a queryas a performance measure as in a real-world distributed sys-tem, messages would be sent over a network, creating a bot-tleneck for the system. In both plots only non-trivial queriesthat have been solved by at least one of the three methodsare shown. A non-trivial query is one that took the naive al-gorithm at least 300 messages to solve. Such queries took onaverage 52, 52, and 46 messages to solve for the naive, prun-ing, and ordering versions of the algorithm, respectively, soignoring such trivial queries does not hurt the evaluation ofthe naive approach in the comparison with the other two ap-proaches. Figure 3 shows the number of messages savedby the pruning and priority-ordering techniques comparedto not using them for the same set of queries (and in thesame order) as in Figure 2.

The graphs show that pruning can lead to a drastic re-duction in the number of messages generated over the naivealgorithm, and in turn, that the ordering heuristic in con-junction with pruning often beats pruning alone. In a largenumber of cases, the savings due to both pruning and order-ing are very significant. In some cases there are no or almostno savings, and in many cases the savings are somewhere in-between. In instances where no consequences or peers canbe pruned, our pruning strategy may require slightly moremessages than the naive approach (prio message overhead).In a few degenerate cases, the heuristic version of the algo-rithm can use more messages than the pruning-only version.This occurs when a medium-priority message generating areason and best update is at the front of the FIFO queue inthe pruning-only algorithm, but the heuristic version pro-cesses better-priority messages that lead to no reason first,

Page 10: Peer-to-peer Query Answering with Inconsistent Knowledgesheila/publications/bin-mci... · 2009. 4. 30. · Peer-to-peer Query Answering with Inconsistent Knowledge Arnold Binas and

0 500 1000 1500 2000 2500 30000

2000

4000

6000

8000

10000

Problem Instances

Num

ber

Messages

Pruning vs. No Pruning

0 500 1000 1500 2000 2500 30000

2000

4000

6000

8000

10000

Problem Instances

Num

ber

Messages

Pruning vs. Pruning with Ordering Heuristic

no pruning

pruning

pruning

pruning + ordering heuristic

Figure 2: Total number of messages sent

0 500 1000 1500 2000 2500 3000−2000

0

2000

4000

6000

8000

10000

Problem Instances

Messages S

aved

Message Savings for Pruning vs. No Pruning

0 500 1000 1500 2000 2500 3000−2000

0

2000

4000

6000

8000

10000

Problem Instances

Messages S

aved

Message Savings for Ordering vs. Pruning

Figure 3: Number of messages saved

potentially generating messages of bad priority before thepruning cutoff variable best is updated in Algorithm 6.

5 Discussion and Related Work

In this paper we provided a formal characterization of a P2Pquery answering system and a distributed entailment relationwhich extends established work on argumentation frame-works to the distributed and prioritized case. Our systemincludes a priority ordering over the peers (which may re-flect trust or reputation) and allows for inconsistent infor-mation among them. The priority ordering is employed bothto find the best-rank argument (for or against a query) froma preferred extension, and to improve the efficiency of thiscomputation. While a priority specification as simple as theone presented in this paper is sufficient for many applica-tions, our framework can be easily extended to richer speci-fications of priorities and aggregation schemes.

We provided an extension to Adjiman et al.’s message-passing algorithm that computes query answers in the pres-

ence of inconsistent knowledge and, optionally, a preferenceordering over the peers’ knowledge. In this paper we usedthis preference ordering solely to determine whether or nota query was entailed by the system. However, the algo-rithm can be trivially extended to report the best priority withwhich a formula is entailed. We proved our algorithm soundand complete with respect to our specification. The com-plexity of the problem requires the algorithms to be doublyrecursive, and thus doubly exponential in runtime. We pro-vided a safe pruning technique and ordering heuristic to al-leviate some of this cost. Empirical analysis demonstrated adramatic improvement in the algorithm’s performance usingthese techniques. This work provides a foundation for manyinteresting extensions, including one to rule languages forthe Semantic Web such as SWRL and RuleML, informa-tion integration by ontology translation between peers, andricher priority specification and aggregation schemes. Oneespecially promising direction is our current work on globalclosed-world reasoning from local closed-world reasoners.

The work presented in this paper touches upon issuesfrom a variety of other areas of research including but notlimited to distributed reasoning in AI and databases, reason-ing with inconsistent knowledge, and reasoning with priori-ties and about trust.

In Section 1 we discussed the most significant relatedwork within the area of distributed logical reasoning (i.e.,[1, 3, 12]). Also related is the work on distributed logicalreasoning with distributed first-order logics [9], and on dis-tributed description logics [19].

Much of the related work on reasoning with inconsistentknowledge concerns inconsistency within a lone KB. Ar-gumentation frameworks (used in this paper to define dis-tributed entailment) provide one of the most plausible se-mantics for inconsistent knowledge [13]. Preference-basedargumentation frameworks only consider preference order-ings over arguments, not individual formulas, and computethe best preference ordering over arguments as a function ofan ordering over conclusions, and not the other way around[2, 16]. Benferhat et al. compare several entailment rela-tions for prioritized inconsistent KBs [5]. (We exploitedtheir notion of prioritized argued entailment to define dis-tributed reasons in this paper.) Previously, Baral et al. pre-sented an approach to merge multiple inconsistent KBs withassociated integrity constraints [4]. In related work, Nebelexplored syntax-based approaches to belief revision [17].Grosof provided a framework for prioritized, inconsistentlogic programs with unique consistent answer sets [15]. Pri-orities have also been used in [8] to achieve extended defaultreasoning in centralized theories.

There is also significant work on distributed inconsistentdatabases. For example, Bertossi and Bravo study queryanswering in P2P data exchange systems that must adhereto data exchange constraints and trust relationships betweenpeers. They develop a semantics of consistency within thissetting [6, 7]. Calvanese et al. develop global semantics infirst-order and epistemic logics for P2P systems of databasesthat may be globally inconsistent, and present a correspond-ing query answering algorithm [11, 10]. Other related workis on P2P information management systems, in which the

Page 11: Peer-to-peer Query Answering with Inconsistent Knowledgesheila/publications/bin-mci... · 2009. 4. 30. · Peer-to-peer Query Answering with Inconsistent Knowledge Arnold Binas and

peers are again regular databases (e.g., [20, 14]). Again, inall of these approaches, information sources are databases,i.e. they contain facts but no rules. This means that, whilethe work in this paper is restricted to propositional logic, inmany cases their knowledge is expressed in first-order lan-guages. However, distributed database systems are restrictedto exchanging facts, whereas we enable a diversity of propo-sitional reasoning in our PQAS.

Lastly, there has been a diversity of work on trust repre-sentation and aggregation for distributed systems. Much ofthe work that is most closely related to this paper concernsthe role of trust in peer data exchange systems, as discussedabove [7]. In this work, trust relationships exist betweenpeers expressed in binary relations of the form “peer A trustsits own data less/equally much as that of peer B”. This dif-fers significantly from the priorities used in our work as suchbinary relations do not necessarily give rise to a partial ortotal ordering of the peers by priority or trust. For a moreextensive review of trust models see [18].

Acknowledgments

We are very grateful to Gerhard Brewka for feedback on thiswork in its various stages of development. We also thankChristian Fritz for helpful discussions on technical details ofthe paper. Michael Gruninger provided his insights on anearlier version of this work. We furthermore thank PhilippeAdjiman for providing an early version of his P2PIS codewhich we did not end up using. Support from the Natu-ral Sciences and Engineering Research Council of Canada(NSERC) is also gratefully acknowledged.

References

[1] P. Adjiman, P. Chatalic, F. Goasdoue, M.-C. Rousset,and L. Simon. Distributed Reasoning in a Peer-to-PeerSetting: Application to the Semantic Web. Journal ofArtificial Intelligence Research, 25:269–314, 2006.

[2] L. Amgoud and C. Cayrol. A Reasoning Model Basedon the Production of Acceptable Arguments. AMAIJournal, 34:197–216, 2002.

[3] E. Amir and S. McIlraith. Partition-Based LogicalReasoning for First-Order and Propositional Theories.Artificial Intelligence, 162(1-2):49–88, 2005.

[4] C. Baral, S. Kraus, and J. Minker. Combining MultipleKnowledge Bases. IEEE Transactions on Knowledgeand Data Engineering, 3(2):208–220, 1991.

[5] S. Benferhat, D. Dubois, and H. Prade. Some Syntac-tic Approaches to the Handling of Inconsistent Knowl-edge Bases: a Comparative Study Part 2: the Prior-itized Case. In E. Orłowska, editor, Logic at Work:Essays Dedicated to the Memory of Helen Rasiowa,volume 24, pages 437–511. Physica-Verlag, 1999.

[6] Leopoldo E. Bertossi and Loreto Bravo. QueryAnswering in Peer-to-Peer Data Exchange Systems.In Proceedings of the EDBT workshops, Heraklion,Greece, pages 476–485, 2004.

[7] Leopoldo E. Bertossi and Loreto Bravo. The Seman-tics of Consistency and Trust in Peer Data ExchangeSystems. In Proceedings of LPAR 2007, pages 107–122, 2007.

[8] G. Brewka and T. Eiter. Prioritizing Default Logic.In Intellectics and Computational Logic, pages 27–45.Kluwer Academic Publishers, 2000.

[9] Ghidini C. and Serafini L. Distributed First Order Log-ics. In Frontiers of Combining Systems 2, pages 121–139. Research Studies Press Ltd. Baldock, 2000.

[10] Diego Calvanese, Giuseppe De Giacomo, DomenicoLembo, Maurizio Lenzerini, and Riccardo Rosati. In-consistency Tolerance in P2P Data Integration: AnEpistemic Logic Approach. Information Systems,33(4-5):360–384, 2008.

[11] Diego Calvanese, Giuseppe De Giacomo, MaurizioLenzerini, and Riccardo Rosati. Logical Foundationsof Peer-to-peer Data Integration. In PODS ’04: Pro-ceedings of the twenty-third ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems,pages 241–251, New York, NY, USA, 2004. ACM.

[12] P. Chatalic, G.-H. Nguyen, and M.-C. Rousset. Rea-soning with Inconsistencies in Propositional Peer-to-Peer Inference Systems. In ECAI 2006, pages 352–357, 2006.

[13] P. M. Dung. On the Acceptability of Arguments and itsFundamental Role in Nonmonotonic Reasoning, LogicProgramming and n-Person Games. Artificial Intelli-gence, 77(2):321–358, 1995.

[14] Enrico Franconi, Gabriel Kuper, Andrei Lopatenko,and Ilya Zaihrayeu. A Distributed Algorithm for Ro-bust Data Sharing and Updates in P2P Database Net-works. In Proceedings of the EDBT P2P&DB work-shop, Heraklion, Greece, pages 446–455, 2004.

[15] B. N. Grosof. Courteous Logic Programs: PrioritizedConflict Handling for Rules. Technical Report RC20836, IBM, 1997.

[16] S. Kaci and L. van der Torre. Preference Reason-ing for Argumentation: Non-monotonicity and Algo-rithms. In Proceedings of the International Workshopon Non-Monotonic Reasoning (NMR’06), pages 237–243, 2006.

[17] B. Nebel. Syntax-based Approaches to Belief Revi-sion. In P. Gardenfors, editor, Belief Revision, vol-ume 29, pages 52–88. Cambridge University Press,1992.

[18] Jordi Sabater and Carles Sierra. Review on Computa-tional Trust and Reputation Models. Artificial Intelli-gence Review, 24(1):33–60, 2005.

[19] L. Serafini and A. Tamilin. DRAGO: Distributed Rea-soning Architecture for the Semantic Web. TechnicalReport T04-12-05, ITC-irst, 2004.

[20] I. Zaihrayeu. Towards Peer-to-Peer Information Man-agement Systems. PhD thesis, University of Trento,2006.