ELSEVIER Information Processing Letters 67 (1998) 75-79
Learning nearly monotone k-term DNF *
Jorge Castro *, David Guijarro ¹, Víctor Lavín ²
Received 15 May 1997; received in revised form 15 April 1998
Communicated by P.M.B. Vitányi
Abstract
This note studies the learnability of the class of k-term DNF with a bounded number of negations per term, using either membership queries only or equivalence queries only. We give tight upper and lower bounds on the number of terms and negations per term for learning with a polynomial number of membership queries. We prove that a polynomial number of equivalence queries will not suffice. Finally, we give an algorithm for the simple-PAC model. © 1998 Published by Elsevier Science B.V. All rights reserved.
Keywords: Algorithms; Concept learning; Query learning; Kolmogorov complexity
1. Introduction
Among the different models of learning proposed in Computational Learning Theory, one of the most
widely studied is the exact learning via queries model, introduced by Angluin in [2,3]. In this model the learner's goal is to identify an unknown target function f taken from some representation class C. In order to get information about the target, the learner has two types of queries available: membership and equivalence queries. In a membership query, the learner supplies an instance x from the domain and gets f(x) as answer. The input to an equivalence query is a hypothesis h, and the answer is either "yes" if h = f or a
* Supported by the ESPRIT Long Term Research Project ALCOM-IT (nr. 20244), the Working Group 8556 (NeuroCOLT), and the Spanish DGICYT (project PB92-0709).
* Corresponding author. Email: [email protected].
¹ Email: [email protected].
² Supported by FP93 13717942 grant from the Spanish Government. Email: [email protected].
counterexample from the symmetric difference of f and h.
Many interesting results have been obtained within
this framework using different combinations of que-
ries; however, a great many problems still remain open. In particular, one of the most
challenging open problems is whether the class of
DNF formulas is learnable. Although it is known that this class is not learnable using either membership or equivalence queries alone [3,4], not much can be said
by now about learning using both types of queries. Several subclasses of the class DNF have been
studied from the point of view of learnability. We will focus our attention on the class of k-term DNF, i.e.,
the class of DNF formulas whose number of terms is
at most k. We now review some of the results concerning
this class. In [1], Angluin gives a polynomial time algorithm that learns k-term DNF with membership and equivalence queries. The algorithm is proper in the sense that it uses k-term DNF formulas as hypotheses.
0020-0190/98/$19.00 © 1998 Published by Elsevier Science B.V. All rights reserved. PII: S0020-0190(98)00090-8
Improvements in the running time can be found
in [9,5]. On the other hand, it is known that the class of
k-term DNF cannot be learned with a polynomial
number of membership queries; however, if we restrict that class to be monotone then learning can be achieved in polynomial time. We study how much non-monotonicity can be tolerated so that learning is
still feasible. We give tight upper and lower bounds on the number of negations per term that make learning feasible. We prove that if the number of negations per
term is bounded by a constant then we can learn in polynomial time using membership queries, but any
bound that grows faster than a constant makes the learning task impossible in polynomial time. We also show that the bound on the number of terms is tight, i.e., if the number of terms grows faster than a constant then learning becomes infeasible. Both non-learnability proofs are information-theoretic, which implies that the negative results hold even if we allow the learner
to output a representation from any arbitrary class of hypotheses.
We also prove that any learning algorithm for the considered class requires a superpolynomial number of equivalence queries if this is the only type of query allowed. Our results, negative and positive, for membership queries also apply to k-term DNF formulas with a bounded number of positive literals per term and to k-clause CNF formulas where the number of negative or positive literals per clause is bounded. From [7], we can easily obtain a learning
algorithm for these classes, but it generates decision trees as hypotheses. In this work we only consider proper learning algorithms.
Finally, we consider the simple-PAC model introduced by Li and Vitányi in [10], and extend their result for monotone k-term DNF to the class considered in our paper. This model is a distribution dependent version of PAC where examples are drawn according to the universal distribution. This distribution assigns high probability to examples with low Kolmogorov complexity. Learning under the universal distribution has a nice property: if the output hypothesis is "good" under the universal distribution then it also performs well under any computable distribution.
The paper is organized as follows. In Section 2 we introduce notation and definitions needed in the
paper. In Section 3 we describe our algorithm to learn the class we consider, prove its correctness and time
bounds, and describe the simple-PAC result. Section 4 is devoted to negative results.
2. Definitions
As we have already mentioned, we use the model of exact learning via queries as it was proposed in [3].
We deal with Boolean functions represented as dis- junctions of conjunctions of literals (DNF). A literal
is a variable or the negation of a variable, and a term
is a conjunction of literals. A k-term DNF formula is a
disjunction of at most k terms. A (k, j)-term DNF formula is a k-term DNF formula with at most j negated variables per term. Variables are indexed by natural numbers from 1 to n (where n is the number of variables over which the target class is defined). We use A_s to denote the set of all assignments with at most s zeroes. For an assignment a, ones(a) denotes the set of indices of variables where a has a 1 and zeros(a) denotes the set of indices of variables where a has a 0.
We direct the reader to [10] for definitions related to simple-PAC.
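As an illustration of these definitions (not part of the original paper), the following Python sketch implements ones, zeros, and the enumeration of A_s; assignments are tuples of 0/1, and variable indices are 1-based as in the text.

```python
from itertools import combinations

def ones(a):
    """Indices (1-based, as in the paper) where assignment a has a 1."""
    return {i + 1 for i, bit in enumerate(a) if bit == 1}

def zeros(a):
    """Indices (1-based) where assignment a has a 0."""
    return {i + 1 for i, bit in enumerate(a) if bit == 0}

def A(n, s):
    """Enumerate A_s: all assignments over n variables with at most s zeroes."""
    for r in range(s + 1):
        for zero_positions in combinations(range(n), r):
            a = [1] * n
            for p in zero_positions:
                a[p] = 0
            yield tuple(a)
```

Note that |A_s| is the binomial sum C(n, 0) + C(n, 1) + ... + C(n, s), which the later complexity analysis bounds from above.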
3. Positive results
First, we note that we can solve the equivalence test of two (k, j)-term DNF formulas using the assignments in A_{k+j} only.
Lemma 1. Let f be a (k, j)-term DNF formula and g be a k-term DNF formula such that f does not imply g. Then there exists an assignment with at most k + j zeros that satisfies f but not g.
Proof. Let x be an assignment that satisfies some term t of f and falsifies some clause c of the CNF representation of g. Note that j zeros are enough to satisfy t. On the other hand, the clauses in the CNF representation of g have at most k literals, so k zeros suffice to falsify c. The assignment that coincides with x in those k + j literals and has ones in the rest also satisfies t and falsifies c. □
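Applying Lemma 1 in both directions gives a finite equivalence test: two (k, j)-term DNF formulas agree everywhere iff they agree on A_{k+j}. A minimal Python sketch (not from the paper; terms are represented as pairs of 0-based index sets, a choice made for code convenience):

```python
from itertools import combinations

# A term is a pair (pos, neg) of variable-index sets; a DNF is a list of terms.
def eval_term(term, a):
    pos, neg = term
    return all(a[i] == 1 for i in pos) and all(a[i] == 0 for i in neg)

def eval_dnf(f, a):
    return any(eval_term(t, a) for t in f)

def few_zero_assignments(n, s):
    """A_s: assignments over n variables (0-based indices) with at most s zeroes."""
    for r in range(s + 1):
        for zs in combinations(range(n), r):
            a = [1] * n
            for i in zs:
                a[i] = 0
            yield tuple(a)

def equivalent(f, g, n, k, j):
    """Equivalence test for two (k, j)-term DNFs over n variables:
    by Lemma 1 (applied to f vs g and g vs f) it suffices to
    compare the two formulas on A_{k+j}."""
    return all(eval_dnf(f, a) == eval_dnf(g, a)
               for a in few_zero_assignments(n, k + j))
```

For example, with f = (x0 ∧ ¬x1) ∨ (x2 ∧ x3), the test distinguishes f from the single-term formula x0 using an assignment with a single zero.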
We present an algorithm that learns the class of (k, j)-term DNF formulas in time

2^{O(k^2)} (en/(k+j))^{k+j+1} log^k n
using membership queries only. This algorithm is based on Blum and Rudich’s [6] algorithm for k-term DNF that uses membership and equivalence queries.
Our strategy will be to replace the equivalence queries by membership queries using Lemma 1.
Blum and Rudich’s algorithm keeps a hypothesis that is a DNF (not necessarily a k-term DNF) and makes an equivalence query with it. If the answer
is a negative counterexample then the terms of the
hypothesis that are satisfied by the counterexample are deleted. If the counterexample is positive the deterministic procedure Produce-terms is called
and its output is added to the hypothesis. Their algorithm works because of Fact 2.
Fact 2 [6]. Let f be a k-term DNF formula and x an assignment such that f(x) = 1. The function Produce-terms outputs a set T of at most 2^{O(k)} log n terms satisfying the following properties:
- For any irreducible k-term DNF representation f' of f, T contains at least one of the terms of f' satisfied by x.
- Produce-terms uses membership queries only and runs in time 2^{O(k)} n log n.
We cannot apply their algorithm directly because the hypotheses they use are not necessarily (k, j)-term DNF and we do not know how to replace the equivalence queries. We use a technique from [9] in order to make equivalence queries with (k, j)-term DNF. This technique consists of simulating the construction of a tree of depth at most k that, in each node, contains a (k, j)-term DNF formula that implies the target. The root node contains the empty DNF. At every node an equivalence test is done using Lemma 1. If a counterexample is found then the algorithm calls Produce-terms and the node branches into at most 2^{O(k)} log n new nodes that contain a formula with one more term. Our algorithm traverses the tree by levels keeping a set of nodes alive. A node survives if it implies the target, all its terms have at most j negations, and it has no more than k terms. The process is terminated when we find a node equivalent to the target.
Correctness and termination. These follow from Fact 2 and the tree construction, which allows us to answer equivalence queries correctly.
Query complexity. First we ask membership queries on A_{k+j}, and we ask 2^{O(k)} n log n membership queries in each call to Produce-terms. Note that Produce-terms is only called when the node contains a DNF with at most k - 1 terms. The resulting query complexity is (en/(k+j))^{k+j} + 2^{O(k^2)} n log^k n, using (en/(k+j))^{k+j} as an upper bound for |A_{k+j}|.
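The bound |A_{k+j}| ≤ (en/(k+j))^{k+j} is the standard estimate for a sum of binomial coefficients, sum_{i<=s} C(n, i) ≤ (en/s)^s for 1 ≤ s ≤ n. A quick numerical check (illustrative only, not part of the paper):

```python
import math

def size_A(n, s):
    """|A_s| = number of assignments with at most s zeroes = sum_{i<=s} C(n, i)."""
    return sum(math.comb(n, i) for i in range(s + 1))

def entropy_bound(n, s):
    """The standard estimate sum_{i<=s} C(n, i) <= (e*n/s)^s, valid for 1 <= s <= n."""
    return (math.e * n / s) ** s
```

For constant s = k + j this bound is polynomial in n, which is what the query complexity statement relies on.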
Time complexity. The cost of the process at each internal node is at most |A_{k+j}| + 2^{O(k)} n log n and only |A_{k+j}| at the leaves. So the total cost is at most 2^{O(k^2)} n (en/(k+j))^{k+j} log^k n, where the extra n factor comes from a rough upper bound on the time used to build a membership query. This factor can be improved in our particular case but we do not deal with these low-level implementation details.
All previous considerations lead to our main result stated as Theorem 3.
Theorem 3. The class of (k, j)-term DNF formulas is exactly learnable in time 2^{O(k^2)} (en/(k+j))^{k+j+1} log^k n using membership queries only.
In [10], Li and Vitányi prove that the class of monotone k-term DNF is simple-PAC learnable. In fact, they prove a stronger result since their algorithm outputs, with high probability, an exact representation of the target function instead of producing a "good" approximation. The key observation in their proof is that when sampling according to the universal distribution, all the assignments with at most l zeros are in a polynomial-sized sample with high probability if l is a constant.
We will proceed in the same manner for the class of (k, j)-term DNF, proving simple-PAC learnability for this class. We ask for a polynomial-sized sample and get, with high probability, all the assignments with at most k + j zeros. If the sample contains all those assignments we run the algorithm above; otherwise, we do nothing. Whenever the algorithm asks a membership query for an assignment not in the sample (this might occur while running Produce-terms) we answer it using Lemmas 1 and 4.
Lemma 4. Let f be a DNF with at most j negations
per term. For any assignment a, f(a) = 1 if and only if
there exists a term t with at most j negations such that
t(a) = 1, t implies f, and the set of positive variables
in t is ones(a).
Proof. The if part is trivial. For the only if part, let t' be a term in f that accepts a. Clearly the positive variables of t' are a subset of ones(a). We define t to be the term whose negated variables are the negated variables of t' and whose positive variables are ones(a). It is trivial to see that t(a) = 1. It is also easy to show that t implies f, since t implies t' and t' implies f. Moreover, ones(a) are the positive variables of t by construction. □
Let b be the input assignment to a membership query. Now, we construct all the terms that have as positive literals exactly the variables in ones(b) and at most j negations. Then, we use Lemma 1 to test if any of those terms implies the target function. If none of them implies the target we answer NO; otherwise, we answer YES.
Note that this procedure of simulating membership queries could be used to improve the query complexity of the algorithm described above, since the initial (en/(k+j))^{k+j} membership queries would suffice. This would increase the time complexity by a multiplicative factor of (en/j)^j (en/(k+j))^{k+j}.
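The Lemma 4 simulation of a membership query can be sketched in Python (illustrative only, not the paper's implementation). For simplicity, the implication test here is brute force over all 2^n assignments rather than the Lemma 1 test on the sample that the paper uses; terms are pairs of 0-based index sets.

```python
from itertools import combinations, product

# A term is a pair (pos, neg) of variable-index sets; a DNF is a list of terms.
def eval_term(term, a):
    pos, neg = term
    return all(a[i] == 1 for i in pos) and all(a[i] == 0 for i in neg)

def eval_dnf(f, a):
    return any(eval_term(t, a) for t in f)

def term_implies_dnf(term, f, n):
    # Brute-force implication check for this sketch only; the paper
    # instead tests implication via Lemma 1 on the sampled assignments.
    return all(eval_dnf(f, a)
               for a in product([0, 1], repeat=n) if eval_term(term, a))

def simulated_membership(f, b, j):
    """Answer a membership query on b using Lemma 4: for a DNF f with at
    most j negations per term, f(b) = 1 iff some term t whose positive
    variables are exactly ones(b), with at most j negations chosen from
    zeros(b), implies f."""
    n = len(b)
    pos = frozenset(i for i in range(n) if b[i] == 1)
    zero_idx = [i for i in range(n) if b[i] == 0]
    for r in range(min(j, len(zero_idx)) + 1):
        for neg in combinations(zero_idx, r):
            if term_implies_dnf((pos, frozenset(neg)), f, n):
                return 1
    return 0
```

For any DNF with at most j negations per term, this procedure returns exactly f(b), which is the content of Lemma 4.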
4. Lower bounds
In this section we prove superpolynomial lower bounds on the number of queries needed to learn the class of (k, j)-term DNF when k or j is not constant. In the case of membership queries we prove that our positive result is tight in two directions, that is, we cannot afford either more than a constant number of terms or more than a constant number of negations per term. For the first case we use a target class from [8], and for the second we simply count the number of singleton sets that can be represented with a number of negations per term that is bigger than any constant. For equivalence queries we describe an adversary that forces a superpolynomial number of queries; unlike the membership query bounds, this does not imply non-learnability using an arbitrary hypothesis class.
Theorem 5. There is no polynomial time algorithm that learns (f(n), 0)-term DNF formulas using membership queries if f grows faster than any constant.
Proof. Suppose that the class is learnable with n^c membership queries. We use the class T_n of part (1) of Theorem 22 in [8]. Let m be c + 1, let l be n/m, and let T_n be {f : f = t ∨ ⋁_{i=1}^{m} t_i}, where each t_i = x_{(i-1)l+1} ∧ ... ∧ x_{il} and t is a monotone term that contains all variables except one from each t_i. Observe that T_n is a subset of (f(n), 0)-term DNF for n such that f(n) > c + 1. We choose n large enough such that n^c < l^m. Following the same argument as in [8] one can show that l^m queries are necessary to learn T_n. □
Using a standard argument with singletons it can be
proven that the number of negations we allow per term
is also optimal.
Theorem 6. There is no polynomial time algorithm that learns (1, f(n))-term DNF formulas using membership queries if f grows faster than any constant.
Theorem 7. For any k > 1 and j ≥ 0, the class of
(k, j)-term DNF is not learnable using a polynomial
number of equivalence queries that are k-term DNF (without the j restriction).
Proof. Let T_{n,k} be the target class of functions where each function is the disjunction of k pairwise disjoint monotone terms of length n/k. Here, disjoint means that no pair of terms have a variable in common. It can be shown that

|T_{n,k}| = n! / ((n/k)!^k k!).
We will prove that for any k-term DNF formula h there exists an assignment a such that the number of functions in T_{n,k} that agree with h in the classification of a is a "small" fraction of |T_{n,k}|. Therefore, providing that assignment as a counterexample to an equivalence query on input h will force any learner to make a "large" number of equivalence queries before finding
the target. To see that some a satisfying these requirements exists, we consider the following cases:
(i) |h| < k. Then h is falsified by some assignment containing less than k 0's. Clearly, such an assignment does not falsify any of the functions in T_{n,k}.
(ii) |h| = k and some term r of h has at most n/k positive literals. The assignment that satisfies those positive literals and has 0's in the remaining positions is a positive example for h. However, the fraction of formulas in T_{n,k} satisfied by such an assignment is at most

|T_{n-n/k, k-1}| / |T_{n,k}| = (n - n/k)! (n/k)! k / n!,

which is smaller than 1/n^c for any constant c and n large enough.
(iii) Otherwise, since all the terms in h have more than n/k positive literals, at least two of them must share one positive literal. Therefore h can be falsified by some assignment with less than k 0's as in case (i). □
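The cardinality |T_{n,k}| = n!/((n/k)!^k k!) used in this proof counts the unordered partitions of the n variables into k blocks of size n/k; it can be verified by direct enumeration for small n (an illustrative Python check, not part of the paper):

```python
import math
from itertools import combinations

def count_Tnk(n, k):
    """Count functions in T_{n,k}: disjunctions of k pairwise disjoint monotone
    terms of length n/k, i.e. partitions of n variables into k equal blocks."""
    m = n // k
    def rec(remaining):
        if not remaining:
            return 1
        first = min(remaining)
        rest = remaining - {first}
        # The block containing `first` is determined by its m - 1 other
        # members; fixing `first` avoids counting block orderings twice.
        return sum(rec(rest - set(others))
                   for others in combinations(sorted(rest), m - 1))
    return rec(frozenset(range(n)))

def closed_form(n, k):
    """The closed form n! / ((n/k)!^k * k!)."""
    m = n // k
    return math.factorial(n) // (math.factorial(m) ** k * math.factorial(k))
```

For instance, for n = 4 and k = 2 both computations give 3 partitions: {01|23, 02|13, 03|12}.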
5. Conclusions
We have studied the learnability of the class of
(k, j)-term DNF formulas with either equivalence queries only or membership queries only. In the case
of equivalence queries we have shown that even using queries that are general k-term DNF formulas does not
suffice to learn the class of (k, j)-term DNF formulas
in polynomial time. In the case of membership queries we have shown that k and j must be constants if we
require polynomial time learning. Our first approach leads to an algorithm that asks adaptive membership queries, and we show how to transform it into an algorithm with non-adaptive membership queries at the price of a multiplicative polynomial factor in the time complexity. This non-adaptiveness yields a simple-PAC learning algorithm.
Our results generalize to the subclass of k-term DNF formulas where each term has either at most j negated literals or j unnegated literals and also, by duality, to k-clause CNF formulas with the corresponding bound in the number of negated/unnegated literals per clause.
Acknowledgements
We thank José L. Balcázar, Ricard Gavaldà, Carlos Domingo, and two anonymous referees for helpful comments. This work was presented at the EuroCOLT'97 meeting in Jerusalem.
References
[1] D. Angluin, Learning k-term DNF formulas using queries and counterexamples, Technical Report YALEU/DCS/RR-559, Yale University, 1987.
[2] D. Angluin, Learning regular sets from queries and counterexamples, Inform. and Comput. 75 (1987) 87-106.
[3] D. Angluin, Queries and concept learning, Mach. Learning 2 (4) (1988) 319-342.
[4] D. Angluin, Negative results for equivalence queries, Mach. Learning 5 (1990) 121-150.
[5] U. Berggren, Linear time deterministic learning of k-term DNF, in: Proc. 6th Annual Workshop on Comput. Learning Theory, ACM Press, New York, 1993, pp. 37-40.
[6] A. Blum, S. Rudich, Fast learning of k-term DNF formulas with queries, J. Comput. System Sci. 51 (1995) 367-373.
[7] N. Bshouty, Simple learning algorithms using divide and conquer, in: Proc. 8th Annual ACM Workshop on Comput. Learning Theory, 1995, pp. 447-453.
[8] N. Bshouty, R. Cleve, R. Gavaldà, S. Kannan, C. Tamon, Oracles and queries that are sufficient for exact learning, J. Comput. System Sci. 52 (1996) 421-433.
[9] N. Bshouty, S. Goldman, T. Hancock, S. Matar, Asking queries to minimize errors, in: Proc. 6th Annual Workshop on Comput. Learning Theory, ACM Press, New York, 1993, pp. 41-50.
[10] M. Li, P. Vitányi, Learning simple concepts under simple distributions, SIAM J. Comput. 20 (1991) 911-935.