ELSEVIER Information Processing Letters 67 (1998) 75-79
Learning nearly monotone k-term DNF *
Jorge Castro *, David Guijarro ¹, Víctor Lavín ²
Received 15 May 1997; received in revised form 15 April 1998
Communicated by P.M.B. Vitányi
Abstract
This note studies the learnability of the class of k-term DNF with a bounded number of negations per term, using either membership queries only or equivalence queries only. We give tight upper and lower bounds on the number of terms and negations per term for learning with a polynomial number of membership queries. We prove that a polynomial number of equivalence queries will not suffice. Finally, we give an algorithm for the simple-PAC model. © 1998 Published by Elsevier Science B.V. All rights reserved.
Keywords: Algorithms; Concept learning; Query learning; Kolmogorov complexity
1. Introduction
Among the different models of learning proposed in Computational Learning Theory, one of the most
widely studied is the exact learning via queries model, introduced by Angluin in [2,3]. In this model the learner's goal is to identify an unknown target function f taken from some representation class C. In order to get information about the target, the learner has two types of queries available: membership and equivalence queries. In a membership query, the learner supplies an instance x from the domain and gets f(x) as answer. The input to an equivalence query is a hypothesis h, and the answer is either "yes" if h = f or a
* Supported by the ESPRIT Long Term Research Project ALCOM-IT (nr. 20244), the Working Group 8556 (NeuroCOLT), and the Spanish DGICYT (project PB92-0709).
* Corresponding author. Email: [email protected].
¹ Email: [email protected].
² Supported by FP93 13717942 grant from the Spanish Government. Email: [email protected].
counterexample from the symmetric difference of f and h.
Many interesting results have been obtained within
this framework using different combinations of que-
ries; however, a great many problems still remain open. In particular, one of the most
challenging open problems is whether the class of
DNF formulas is learnable. Although it is known that this class is not learnable using either membership or equivalence queries alone [3,4], not much can be said
by now about learning using both types of queries. Several subclasses of the class DNF have been
studied from the point of view of learnability. We will focus our attention on the class of k-term DNF, i.e.,
the class of DNF formulas whose number of terms is
at most k. We now review some of the results concerning
this class. In [1], Angluin gives a polynomial time algorithm that learns k-term DNF with membership and equivalence queries. The algorithm is proper in the sense that it uses k-term DNF formulas as hypotheses.
0020-0190/98/$19.00 © 1998 Published by Elsevier Science B.V. All rights reserved. PII: S0020-0190(98)00090-8
Improvements in the running time can be found
in [9,5]. On the other hand, it is known that the class of
k-term DNF cannot be learned with a polynomial
number of membership queries; however, if we restrict that class to be monotone then learning can be achieved in polynomial time. We study how much non-monotonicity can be tolerated so that learning is
still feasible. We give tight upper and lower bounds on the number of negations per term that make learning feasible. We prove that if the number of negations per
term is bounded by a constant then we can learn in polynomial time using membership queries, but any
bound that grows faster than a constant makes the learning task impossible in polynomial time. We also show that the bound on the number of terms is tight, i.e., if the number of terms grows faster than a constant then learning becomes infeasible. Both non-learnability proofs are information-theoretic, which implies that the negative results hold even if we allow the learner
to output a representation from any arbitrary class of hypotheses.
We also prove that any learning algorithm for the considered class requires a superpolynomial number of equivalence queries if this is the only type of query allowed. Our results, negative and positive, for membership queries also apply to k-term DNF formulas with a bounded number of positive literals per term and to k-clause CNF formulas where the number of negative or positive literals per clause is bounded. From [7], we can easily obtain a learning
algorithm for these classes, but it generates decision trees as hypotheses. In this work we only consider proper learning algorithms.
Finally, we consider the simple-PAC model introduced by Li and Vitányi in [10], and extend their result for monotone k-term DNF to the class considered in our paper. This model is a distribution dependent version of PAC where examples are drawn according to the universal distribution. This distribution assigns high probability to examples with low Kolmogorov complexity. Learning under the universal distribution has a nice property: if the output hypothesis is "good" under the universal distribution then it also performs well under any computable distribution.
The paper is organized as follows. In Section 2 we introduce notation and definitions needed in the
paper. In Section 3 we describe our algorithm to learn the class we consider, prove its correctness and time
bounds, and describe the simple-PAC result. Section 4 is devoted to negative results.
2. Definitions
As we have already mentioned, we use the model of exact learning via queries as it was proposed in [3].
We deal with Boolean functions represented as dis- junctions of conjunctions of literals (DNF). A literal
is a variable or the negation of a variable, and a term
is a conjunction of literals. A k-term DNF formula is a
disjunction of at most k terms. A (k, j)-term DNF formula is a k-term DNF formula with at most j negated variables per term. Variables are indexed by natural numbers from 1 to n (where n is the number of variables over which the target class is defined). We use A_s to denote the set of all assignments with at most s zeroes. For an assignment a, ones(a) denotes the set of indices of variables where a has a 1 and zeros(a) denotes the set of indices of variables where a has a 0.
We direct the reader to [10] for definitions related to simple-PAC.
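As an illustration of these definitions (not part of the original paper), the following Python sketch implements ones, zeros, and the enumeration of A_s; assignments are tuples of 0/1, and variable indices are 1-based as in the text.

```python
from itertools import combinations

def ones(a):
    """Indices (1-based, as in the paper) where assignment a has a 1."""
    return {i + 1 for i, bit in enumerate(a) if bit == 1}

def zeros(a):
    """Indices (1-based) where assignment a has a 0."""
    return {i + 1 for i, bit in enumerate(a) if bit == 0}

def A(n, s):
    """Enumerate A_s: all assignments over n variables with at most s zeroes."""
    for r in range(s + 1):
        for zero_positions in combinations(range(n), r):
            a = [1] * n
            for p in zero_positions:
                a[p] = 0
            yield tuple(a)
```

Note that |A_s| is the binomial sum C(n, 0) + C(n, 1) + ... + C(n, s), which the later complexity analysis bounds from above.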
3. Positive results
First, we note that we can solve the equivalence test of two (k, j)-term DNF formulas using the assignments in A_{k+j} only.
Lemma 1. Let f be a (k, j)-term DNF formula and g be a k-term DNF formula such that f does not imply g. Then there exists an assignment with at most k + j zeros that satisfies f but not g.
Proof. Let x be an assignment that satisfies some term t of f and falsifies some clause c of the CNF representation of g. Note that j zeros are enough to satisfy t. On the other hand, the clauses in the CNF representation of g have at most k literals, so k zeros suffice to falsify c. The assignment that coincides with x in those k + j literals and has ones in the rest also satisfies t and falsifies c. □
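Applying Lemma 1 in both directions gives a finite equivalence test: two (k, j)-term DNF formulas agree everywhere iff they agree on A_{k+j}. A minimal Python sketch (not from the paper; terms are represented as pairs of 0-based index sets, a choice made for code convenience):

```python
from itertools import combinations

# A term is a pair (pos, neg) of variable-index sets; a DNF is a list of terms.
def eval_term(term, a):
    pos, neg = term
    return all(a[i] == 1 for i in pos) and all(a[i] == 0 for i in neg)

def eval_dnf(f, a):
    return any(eval_term(t, a) for t in f)

def few_zero_assignments(n, s):
    """A_s: assignments over n variables (0-based indices) with at most s zeroes."""
    for r in range(s + 1):
        for zs in combinations(range(n), r):
            a = [1] * n
            for i in zs:
                a[i] = 0
            yield tuple(a)

def equivalent(f, g, n, k, j):
    """Equivalence test for two (k, j)-term DNFs over n variables:
    by Lemma 1 (applied to f vs g and g vs f) it suffices to
    compare the two formulas on A_{k+j}."""
    return all(eval_dnf(f, a) == eval_dnf(g, a)
               for a in few_zero_assignments(n, k + j))
```

For example, with f = (x0 ∧ ¬x1) ∨ (x2 ∧ x3), the test distinguishes f from the single-term formula x0 using an assignment with a single zero.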
We present an algorithm that learns the class of (k, j)-term DNF formulas in time

2^{O(k^2)} (en/(k+j))^{k+j+1} log^k n
using membership queries only. This algorithm is based on Blum and Rudich’s [6] algorithm for k-term DNF that uses membership and equivalence queries.
Our strategy will be to replace the equivalence queries by membership queries using Lemma 1.
Blum and Rudich’s algorithm keeps a hypothesis that is a DNF (not necessarily a k-term DNF) and makes an equivalence query with it. If the answer
is a negative counterexample then the terms of the
hypothesis that are satisfied by the counterexample are deleted. If the counterexample is positive the deterministic procedure Produce-terms is called
and its output is added to the hypothesis. Their algorithm works because of Fact 2.
Fact 2 [6]. Let f be a k-term DNF formula and x an assignment such that f(x) = 1. The function Produce-terms outputs a set T of at most 2^{O(k)} log n terms satisfying the following properties:
- For any irreducible k-term DNF representation f' of f, T contains at least one of the terms of f' satisfied by x.
- Produce-terms uses membership queries only and runs in time 2^{O(k)} n log n.
We cannot apply their algorithm directly because the hypotheses they use are not necessarily (k, j)-term DNF and we do not know how to replace the equivalence queries. We use a technique from [9] in order to make equivalence queries with (k, j)-term DNF. This technique consists of simulating the construction of a tree of depth at most k that, in each node, contains a (k, j)-term DNF formula that implies the target. The root node contains the empty DNF. At every node an equivalence test is done using Lemma 1. If a counterexample is found then the algorithm calls Produce-terms and the node branches into at most 2^{O(k)} log n new nodes that contain a formula with one more term. Our algorithm traverses the tree by levels keeping a set of nodes alive. A node survives if it implies the target, all its terms have at most j negations, and it has no more than k terms. The process is terminated when we find a node equivalent to the target.
Correctness and termination. These follow from Fact 2 and the tree construction, which allows us to answer equivalence queries correctly.
Query complexity. First we ask membership queries on A_{k+j}, and we ask 2^{O(k)} n log n membership queries in each call to Produce-terms. Note that Produce-terms is only called when the node contains a DNF with at most k - 1 terms. The resulting query complexity is (en/(k+j))^{k+j} + 2^{O(k^2)} n log^k n, using (en/(k+j))^{k+j} as an upper bound for |A_{k+j}|.
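The bound |A_{k+j}| ≤ (en/(k+j))^{k+j} is the standard estimate for a sum of binomial coefficients, sum_{i<=s} C(n, i) ≤ (en/s)^s for 1 ≤ s ≤ n. A quick numerical check (illustrative only, not part of the paper):

```python
import math

def size_A(n, s):
    """|A_s| = number of assignments with at most s zeroes = sum_{i<=s} C(n, i)."""
    return sum(math.comb(n, i) for i in range(s + 1))

def entropy_bound(n, s):
    """The standard estimate sum_{i<=s} C(n, i) <= (e*n/s)^s, valid for 1 <= s <= n."""
    return (math.e * n / s) ** s
```

For constant s = k + j this bound is polynomial in n, which is what the query complexity statement relies on.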
Time complexity. The cost of the process at each internal node is at most |A_{k+j}| + 2^{O(k)} n log n and only |A_{k+j}| at the leaves. So the total cost is at most 2^{O(k^2)} n (en/(k+j))^{k+j} log^k n, where the extra n factor comes from a rough upper bound on the time used to build a membership query. This factor can be improved in our particular case but we do not deal with these low-level implementation details.
All previous considerations lead to our main result stated as Theorem 3.
Theorem 3. The class of (k, j)-term DNF formulas is exactly learnable in time 2^{O(k^2)} (en/(k+j))^{k+j+1} log^k n using membership queries only.
In [10], Li and Vitányi prove that the class of monotone k-term DNF is simple-PAC learnable. In fact, they prove a stronger result since their algorithm outputs, with high probability, an exact representation of the target function instead of producing a "good" approximation. The key observation in their proof is that when sampling according to the universal distribution, all the assignments with at most l zeros are in a polynomial-sized sample with high probability if l is a constant.
We will proceed in the same manner for the class of (k, j)-term DNF, proving simple-PAC learnability for this class. We ask for a polynomial-sized sample and get, with high probability, all the assignments with at most k + j zeros. If the sample contains all those assignments we run the algorithm above; otherwise, we do nothing. Whenever the algorithm asks a membership query for an assignment not in the sample (this might occur while running Produce-terms) we answer it using Lemmas 1 and 4.
Lemma 4. Let f be a DNF with at most j negations
per term. For any assignment a, f(a) = 1 if and only if
there exists a term t with at most j negations such that
t(a) = 1, t implies f, and the set of positive variables
in t is ones(a).
Proof. The if part is trivial. For the only if part, let t' be a term in f that accepts a. Clearly the positive variables of t' are a subset of ones(a). We define t to be the term whose negated variables are the negated variables of t' and whose positive variables are ones(a). It is trivial to see that t(a) = 1. It is also easy to show that t implies f, since t implies t' and t' implies f. Moreover, ones(a) are the positive variables of t by construction. □
Let b be the input assignment to a membership query. Now, we construct all the terms that have as positive literals exactly the variables in ones(b) and at most j negations. Then, we use Lemma 1 to test if any of those terms implies the target function. If none of them implies the target we answer NO; otherwise, we answer YES.
Note that this procedure of simulating membership queries could be used to improve the query complexity of the algorithm described above, since the initial (en/(k+j))^{k+j} membership queries would suffice. This would increase the time complexity by a multiplicative factor of (en/j)^j (en/(k+j))^{k+j}.
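The Lemma 4 simulation of a membership query can be sketched in Python (illustrative only, not the paper's implementation). For simplicity, the implication test here is brute force over all 2^n assignments rather than the Lemma 1 test on the sample that the paper uses; terms are pairs of 0-based index sets.

```python
from itertools import combinations, product

# A term is a pair (pos, neg) of variable-index sets; a DNF is a list of terms.
def eval_term(term, a):
    pos, neg = term
    return all(a[i] == 1 for i in pos) and all(a[i] == 0 for i in neg)

def eval_dnf(f, a):
    return any(eval_term(t, a) for t in f)

def term_implies_dnf(term, f, n):
    # Brute-force implication check for this sketch only; the paper
    # instead tests implication via Lemma 1 on the sampled assignments.
    return all(eval_dnf(f, a)
               for a in product([0, 1], repeat=n) if eval_term(term, a))

def simulated_membership(f, b, j):
    """Answer a membership query on b using Lemma 4: for a DNF f with at
    most j negations per term, f(b) = 1 iff some term t whose positive
    variables are exactly ones(b), with at most j negations chosen from
    zeros(b), implies f."""
    n = len(b)
    pos = frozenset(i for i in range(n) if b[i] == 1)
    zero_idx = [i for i in range(n) if b[i] == 0]
    for r in range(min(j, len(zero_idx)) + 1):
        for neg in combinations(zero_idx, r):
            if term_implies_dnf((pos, frozenset(neg)), f, n):
                return 1
    return 0
```

For any DNF with at most j negations per term, this procedure returns exactly f(b), which is the content of Lemma 4.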
4. Lower bounds
In this section we prove superpolynomial lower bounds on the number of queries needed to learn the class of (k, j)-term DNF when k or j is not constant. In the case of membership queries we prove that our positive result is tight in two directions, that is, we cannot afford either more than a constant number of terms or more than a constant number of negations per term. For the first case we use a target class from [8], and for the second we simply count the number of singleton sets that can be represented with a number of negations per term that is bigger than any constant. For equivalence queries we describe an adversary that forces a superpolynomial number of queries; unlike the membership query bounds, this does not imply non-learnability using an arbitrary hypothesis class.
Theorem 5. There is no polynomial time algorithm that learns (f(n), 0)-term DNF formulas using membership queries if f grows faster than any constant.
Proof. Suppose that the class is learnable with n^c membership queries. We use the class T_n of part (1) of Theorem 22 in [8]. Let m be c + 1, let l be n/m, and let T_n be {f : f = t ∨ ⋁_{i=1}^{m} t_i}, where each t_i = x_{(i-1)l+1} ∧ ... ∧ x_{il} and t is a monotone term that contains all variables except one from each t_i. Observe that T_n is a subset of (f(n), 0)-term DNF for n such that f(n) > c + 1. We choose n large enough such that n^c < l^m. Following the same argument as in [8] one can show that l^m queries are necessary to learn T_n. □
Using a standard argument with singletons it can be
proven that the number of negations we allow per term
is also optimal.
Theorem 6. There is no polynomial time algorithm that learns (1, f(n))-term DNF formulas using membership queries if f grows faster than any constant.
Theorem 7. For any k > 1 and j ≥ 0, the class of
(k, j)-term DNF is not learnable using a polynomial
number of equivalence queries that are k-term DNF (without the j restriction).
Proof. Let T_{n,k} be the target class of functions where each function is the disjunction of k pairwise disjoint monotone terms of length n/k. Here, disjoint means that no pair of terms have a variable in common. It can be shown that

|T_{n,k}| = n! / ((n/k)!^k k!).
We will prove that for any k-term DNF formula h there exists an assignment a such that the number of functions in T_{n,k} that agree with h in the classification of a is a "small" fraction of |T_{n,k}|. Therefore, providing that assignment as a counterexample to an equivalence query on input h will force any learner to make a "large" number of equivalence queries before finding
the target. To see that some a satisfying these requirements exists, we consider the following cases:
(i) |h| < k. Then h is falsified by some assignment containing less than k 0's. Clearly, such an assignment does not falsify any of the functions in T_{n,k}.
(ii) |h| = k and some term r of h has at most n/k positive literals. The assignment that satisfies those positive literals and has 0's in the remaining positions is a positive example for h. However, the fraction of formulas in T_{n,k} satisfied by such an assignment is at most

|T_{n-n/k, k-1}| / |T_{n,k}| = (n - n/k)! (n/k)! k / n!,

which is smaller than 1/n^c for any constant c and n large enough.
(iii) Otherwise, since all the terms in h have more than n/k positive literals, at least two of them must share one positive literal. Therefore h can be falsified by some assignment with less than k 0's as in case (i). □
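The cardinality |T_{n,k}| = n!/((n/k)!^k k!) used in this proof counts the unordered partitions of the n variables into k blocks of size n/k; it can be verified by direct enumeration for small n (an illustrative Python check, not part of the paper):

```python
import math
from itertools import combinations

def count_Tnk(n, k):
    """Count functions in T_{n,k}: disjunctions of k pairwise disjoint monotone
    terms of length n/k, i.e. partitions of n variables into k equal blocks."""
    m = n // k
    def rec(remaining):
        if not remaining:
            return 1
        first = min(remaining)
        rest = remaining - {first}
        # The block containing `first` is determined by its m - 1 other
        # members; fixing `first` avoids counting block orderings twice.
        return sum(rec(rest - set(others))
                   for others in combinations(sorted(rest), m - 1))
    return rec(frozenset(range(n)))

def closed_form(n, k):
    """The closed form n! / ((n/k)!^k * k!)."""
    m = n // k
    return math.factorial(n) // (math.factorial(m) ** k * math.factorial(k))
```

For instance, for n = 4 and k = 2 both computations give 3 partitions: {01|23, 02|13, 03|12}.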
5. Conclusions
We have studied the learnability of the class of
(k, j)-term DNF formulas with either equivalence queries only or membership queries only. In the case
of equivalence queries we have shown that even using queries that are general k-term DNF formulas does not
suffice to learn the class of (k, j)-term DNF formulas
in polynomial time. In the case of membership queries we have shown that k and j must be constants if we
require polynomial time learning. Our first approach leads to an algorithm that asks adaptive membership queries, and we show how to transform it into an algorithm with non-adaptive membership queries at the price of a multiplicative polynomial factor in the time complexity. This non-adaptiveness yields a simple-PAC learning algorithm.
Our results generalize to the subclass of k-term DNF formulas where each term has either at most j negated literals or j unnegated literals and also, by duality, to k-clause CNF formulas with the corresponding bound in the number of negated/unnegated literals per clause.
Acknowledgements
We thank José L. Balcázar, Ricard Gavaldà, Carlos Domingo, and two anonymous referees for helpful comments. This work was presented at the EuroCOLT'97 meeting in Jerusalem.
References
[1] D. Angluin, Learning k-term DNF formulas using queries and counterexamples, Technical Report YALEU/DCS/RR-559, Yale University, 1987.
[2] D. Angluin, Learning regular sets from queries and counterexamples, Inform. and Comput. 75 (1987) 87-106.
[3] D. Angluin, Queries and concept learning, Mach. Learning 2 (4) (1988) 319-342.
[4] D. Angluin, Negative results for equivalence queries, Mach. Learning 5 (1990) 121-150.
[5] U. Berggren, Linear time deterministic learning of k-term DNF, in: Proc. 6th Annual Workshop on Comput. Learning Theory, ACM Press, New York, 1993, pp. 37-40.
[6] A. Blum, S. Rudich, Fast learning of k-term DNF formulas with queries, J. Comput. System Sci. 51 (1995) 367-373.
[7] N. Bshouty, Simple learning algorithms using divide and conquer, in: Proc. 8th Annual ACM Workshop on Comput. Learning Theory, 1995, pp. 447-453.
[8] N. Bshouty, R. Cleve, R. Gavaldà, S. Kannan, C. Tamon, Oracles and queries that are sufficient for exact learning, J. Comput. System Sci. 52 (1996) 421-433.
[9] N. Bshouty, S. Goldman, T. Hancock, S. Matar, Asking queries to minimize errors, in: Proc. 6th Annual Workshop on Comput. Learning Theory, ACM Press, New York, 1993, pp. 41-50.
[10] M. Li, P. Vitányi, Learning simple concepts under simple distributions, SIAM J. Comput. 20 (1991) 911-935.