Exact Learning of Boolean Functions with Queries
Lisa Hellerstein
Polytechnic University
Brooklyn, NY
AMS Short Course on Statistical Learning Theory, 2007
I. Introduction
Learning Boolean formula f
Problem: Boolean formula f hidden in a black box. Told that f is from class C of formulas.
Task: “Learn” f
f(x1, x2, x3) = x1 Λ x3
Learning representation f of Boolean function
Problem: f hidden in a black box. Told that f is from class C of representations of Boolean functions.
Task: “Learn” f
f = x1 + x2 (mod 2)
Boolean functions can represent
• Whether a person is good or bad
• Whether an email message is spam
• Whether a tumor is malignant
• Whether a book is a romance novel
• etc.
How hard is it to learn target f?
Need to specify:
• Type of information available
• What’s meant by “learning”
II. Learning Models
Valiant’s PAC Model (1984)
• PAC = Probably Approximately Correct
• Type of info available:
– Random examples: value of f on “random” points in its domain
• Success criterion:
– Approximate learning: output h that is approximately functionally equivalent to f
Query Models (this talk)
• Type of info available:
– Oracles that answer questions about f
• Success criterion:
– Exact learning: output h where h ≡ f
• Want to learn f within “polynomial” number of queries, in “polynomial” time
– Polynomial in n and size of f
Types of queries
Membership queries (point evaluation)
Question: What is f(x)?
Answer: f(x)
Equivalence queries
Question: Is h ≡ f? (h is the hypothesis)
Answer: Yes if so, else x such that f(x) ≠ h(x) (x is a counterexample)
Definition: A membership and equivalence query algorithm learns a class C of representations if, given
1. Oracles to answer membership and equivalence queries for some f in C
2. The number n of variables of f
the algorithm outputs a representation h s.t. f ≡ h
Say the algorithm runs in polynomial time if its running time is poly(n, size of f)
About membership and equivalence queries
• Assume queries answered perfectly
• Membership queries
– Black-box interpolation
– Perfect answers often not available in practice
• Equivalence queries
– Can be simulated in PAC model
• Test whether f(x) = h(x) on random examples x
– Relation to mistake-bound learning in on-line model
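The idea of simulating an equivalence query by testing on random examples can be sketched as follows. This is a minimal illustration, not the construction from the talk: it assumes examples are drawn uniformly from {0,1}^n, and the names `sampled_equivalence`, `num_samples`, and `rng` are hypothetical. A return value of None only means that no disagreement was found in the sample, so h is approximately (not exactly) equivalent to f.

```python
import random

def sampled_equivalence(h, target, n, num_samples, rng):
    """Approximate an equivalence query with random examples.

    Assumption: examples are uniform over {0,1}^n. Returns a point where
    h and target disagree, or None if none was found in the sample.
    """
    for _ in range(num_samples):
        x = tuple(rng.randint(0, 1) for _ in range(n))
        if h(x) != target(x):
            return x  # a counterexample, as an equivalence oracle would give
    return None  # no disagreement seen; h is probably close to target

# Hypothetical usage: target is x1 + x2 (mod 2), hypothesis is its negation.
f = lambda x: x[0] ^ x[1]
g = lambda x: 1 - f(x)
print(sampled_equivalence(g, f, 2, 50, random.Random(0)))
```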
III. Example Query Algorithm
E.g. C = Boolean monomials
Boolean monomial = conjunction of literals
Examples:
f(x1, x2, x3) = ¬x1 Λ x3 = ¬x1x3
f(x1, x2, x3) = x1 ¬x2 x3
f(x1, x2, x3) = ¬x2
Learning monomial f(x1, x2, x3)
1. Ask equiv. query: Is f ≡ 0? Suppose we get counterexample x = (1,0,1), so f(1,0,1) = 1
2. For each xi, determine if it appears in monomial f with negation, without negation, or not at all
Since x1 = 1 in the counterexample, x1 appears in f without negation, or not at all.
Ask membership query: What is f(0, 0, 1)?
If answer is 0, x1 without negation is in monomial f
If answer is 1, x1 does not appear in f at all
Do similarly for x2 and x3
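The two steps above can be sketched in Python. This is an illustrative sketch under stated assumptions: the oracles are simulated by brute force over all 2^n points (only sensible for small n), variables are 0-indexed, and the names `make_oracles` and `learn_monomial` are hypothetical. The learner itself uses one equivalence query and n membership queries.

```python
from itertools import product

def make_oracles(target, n):
    """Simulated oracles for a hidden function on n bits (brute force, small n)."""
    def membership(x):
        return target(x)

    def equivalence(h):
        for x in product((0, 1), repeat=n):
            if h(x) != target(x):
                return x      # counterexample
        return None           # None means "yes, h is equivalent to f"

    return membership, equivalence

def learn_monomial(n, membership, equivalence):
    """Return {variable index: required bit}, or None if f is identically 0."""
    x = equivalence(lambda _: 0)   # Step 1: is f identically 0?
    if x is None:
        return None
    literals = {}
    for i in range(n):             # Step 2: one membership query per variable
        flipped = list(x)
        flipped[i] = 1 - flipped[i]
        if membership(tuple(flipped)) == 0:
            # Flipping x_i falsified f, so x_i occurs in f with the polarity
            # it has in the counterexample (1 = unnegated, 0 = negated).
            literals[i] = x[i]
    return literals

# Hidden target from the slides: f = (not x1) and x3, with 0-based indices.
target = lambda x: int(x[0] == 0 and x[2] == 1)
mq, eq = make_oracles(target, 3)
print(learn_monomial(3, mq, eq))
```

An entry `i: 1` in the result means variable i appears unnegated, and `i: 0` means it appears negated; variables absent from the result do not occur in the monomial.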
Learning Boolean monomials
• Previous approach learns Boolean monomials with 1 equivalence query and n membership queries (n+1 queries total), in polynomial time
• Can also learn Boolean monomials with equivalence queries alone
• Need exponentially many queries (worst case) with membership queries alone.
If monomial includes all n variables, f = 1 for only one of the 2^n points in its domain
IV. Four interesting representation classes
1. DNF Formulas
OR of ANDs
f = ¬x1 x2 x3 V ¬x1 x4 V x1 x2 x3
• Natural way of describing classification rule
• Not known whether DNF learnable in polynomial time with membership and equivalence queries (or in PAC model)
• Best known algorithm runs in time 2^{O(n^{1/3} log n log s)}
2. Boolean linear threshold functions
f = 1 if x1 + x2 + x3 > 2
  = 0 otherwise
Learnable in polynomial time, equivalence q’s
3. Polynomials over GF[2] (integers mod 2)
f = x2 x3 + x1 x3 + x1 x2 x3
Learnable in polynomial time, memb+equiv q’s
4. Boolean decision trees
Learnable in polynomial time, memb+equiv q’s
[Figure: example decision tree with internal nodes x1, x3, x2, edges labeled =0 and =1, and leaves labeled 0 and 1]
Representation and size
• Can represent every Boolean function as DNF formula, GF[2] polynomial, or decision tree
– But sizes of representations can be very different
– e.g. Parity function
Representation as GF[2] polynomial is small
f(x1,…,xn) = x1 + x2 + … + xn (mod 2)
Requires DNF formula of size exponential in n
Requires decision tree of size exponential in n
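A small check makes the DNF blow-up for parity concrete. The sketch below (function names are hypothetical) counts the points where parity is 1 and verifies that flipping any single bit of such a point gives value 0. A DNF term that omitted a variable would cover two points differing in one bit, so every term must mention all n variables and covers exactly one positive point. This gives 2^(n-1) terms, against n terms for the GF[2] polynomial.

```python
from itertools import product

def parity(x):
    return sum(x) % 2

def count_isolated_positives(n):
    """Count positive points of parity, checking that each one is isolated."""
    positives = [x for x in product((0, 1), repeat=n) if parity(x) == 1]
    for x in positives:
        for i in range(n):
            neighbor = x[:i] + (1 - x[i],) + x[i + 1:]
            # Every neighbor of a positive point is negative, so no DNF term
            # for parity can leave out a variable.
            assert parity(neighbor) == 0
    return len(positives)

for n in range(1, 6):
    print(n, count_isolated_positives(n), 2 ** (n - 1))
```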
V. Learning with Polynomial Number of Queries
Halving Algorithm
Generic algorithm for learning with poly number of queries
• Assume (for simplicity) know size s of target f
• Keep set V of all possible f
Initially, V contains all representations in C (on n variables) of size s
• Repeat until success:
– Use V to construct Majority Hypothesis h
– Ask equivalence query with h
– Either “yes” (success), or receive counterexample.
– If the latter, update V (remove every fi that agrees with h on the counterexample)
[Figure: set V containing candidate functions f1, f3, f5, f6, f8]
Majority Hypothesis h
For each x in domain of f
h(x) = 1 if majority of fi’s in V have fi(x) = 1
     = 0 if majority of fi’s in V have fi(x) = 0
Counterexample to majority hypothesis eliminates at least half of fi’s in V
Number of equivalence queries of Halving Algorithm is at most log2(original size of V) + 1
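The Halving Algorithm can be sketched as below. This is an illustration under stated assumptions rather than an efficient implementation: candidates are plain Python functions, the equivalence oracle is simulated by brute force, ties in the majority vote are broken toward 0, and the names `majority`, `halving`, and `brute_force_equivalence` are hypothetical. Evaluating the majority hypothesis costs time linear in the size of V, which is why the algorithm bounds queries but not running time.

```python
from itertools import product

def brute_force_equivalence(target, n):
    """Simulated equivalence oracle over all n-bit points (small n only)."""
    def oracle(h):
        for x in product((0, 1), repeat=n):
            if h(x) != target(x):
                return x
        return None
    return oracle

def majority(V):
    """Majority hypothesis of the candidate set V (ties go to 0)."""
    def h(x):
        ones = sum(f(x) for f in V)
        return 1 if 2 * ones > len(V) else 0
    return h

def halving(candidates, equivalence):
    """Return (final hypothesis, number of counterexamples received)."""
    V = list(candidates)
    counterexamples = 0
    while V:
        h = majority(tuple(V))
        cex = equivalence(h)
        if cex is None:
            return h, counterexamples
        counterexamples += 1
        wrong = h(cex)
        # Keep only candidates that disagree with h on the counterexample;
        # at least half of V voted for the wrong value and is removed.
        V = [f for f in V if f(cex) != wrong]
    raise ValueError("target is not among the candidates")

# Hypothetical usage: V starts as all 16 Boolean functions on 2 variables.
points = list(product((0, 1), repeat=2))
tables = [dict(zip(points, bits)) for bits in product((0, 1), repeat=4)]
candidates = [lambda x, t=t: t[x] for t in tables]
target = lambda x: x[0] ^ x[1]
h, k = halving(candidates, brute_force_equivalence(target, 2))
print([h(x) for x in points], k)
```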
VI. Challenge: Learn in polynomial time
If restrict hypotheses to be in C
• May be NP-hard to learn: computational hardness of learning
– Tools to prove: complexity theory, NP-completeness reductions, non-approximability
• May require exponential number of queries to learn: informational hardness of learning
– Tools to prove: structural properties of C, combinatorial arguments
Example
• Suppose C is class of 2-term DNF formulas and want to learn C with equivalence queries alone
• NP-hard to learn 2-term DNF formulas with equivalence queries alone if hypotheses must be 2-term DNF formulas
f = ¬x1 x3 V x1x2
• 2-term DNF formulas can be factored
f = ¬x1 x3 V x1x2
= (¬x1 V x2)(x3 V x1) (x3 V x2)
• Result is 2-CNF formula
– AND of ORs in which each OR has at most 2 literals
– Size of 2-CNF formula O(n^2)
– 2-CNF formulas can be learned in poly-time with equivalence queries alone (how?)
• Learn 2-term DNF formula using algorithm for learning 2-CNF formulas.
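The factoring step is just the distributive law: T1 V T2 equals the AND of (a V b) over all literals a in T1 and b in T2. A minimal sketch follows, assuming a literal is encoded as a pair (0-based variable index, True for unnegated) and using hypothetical function names. Clauses of the form (xi V ¬xi) are always true and are dropped, which is how the four products for the slide's example collapse to three clauses.

```python
from itertools import product

def eval_lit(lit, x):
    i, positive = lit
    return x[i] if positive else 1 - x[i]

def eval_dnf(terms, x):
    return int(any(all(eval_lit(l, x) for l in t) for t in terms))

def eval_cnf(clauses, x):
    return int(all(any(eval_lit(l, x) for l in c) for c in clauses))

def dnf2_to_cnf2(t1, t2):
    """Factor the 2-term DNF (t1 OR t2) into a set of clauses of <= 2 literals."""
    clauses = set()
    for a, b in product(t1, t2):
        if a[0] == b[0] and a[1] != b[1]:
            continue  # (xi OR not xi) is a tautology
        clauses.add(frozenset((a, b)))  # a == b collapses to a unit clause
    return clauses

# Example from the slides: f = (not x1)(x3) OR (x1)(x2), 0-based indices.
t1 = [(0, False), (2, True)]
t2 = [(0, True), (1, True)]
cnf = dnf2_to_cnf2(t1, t2)
print(len(cnf))
print(all(eval_cnf(cnf, x) == eval_dnf([t1, t2], x)
          for x in product((0, 1), repeat=3)))
```

Each term has at most n literals (for a satisfiable term), so the result has at most n^2 clauses, matching the O(n^2) size bound above.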
Learning 2-CNF
2-CNF formula
f = (¬x1 V x2)(x3 V x1) (x3 V x2)
Can be viewed as monomial over new variable set {y1, y2, …}
y1 = (¬x1 V x2)
y2 = (x1 V x2)
y3 = (x2 V ¬x3)
etc.
Learn 2-CNF formulas using algorithm for learning monomials by translating between original vars and new vars
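One standard way to answer the "(how?)" is the elimination algorithm for monomials, applied to the clause variables y_j. The sketch below is an assumption about the intended approach, not code from the talk, and its names are hypothetical. The hypothesis starts as the AND of every clause with at most 2 literals. Since it always contains all clauses of the target, every counterexample x has f(x) = 1, and each clause falsified by x is deleted. At least one of the O(n^2) clauses is deleted per counterexample, so O(n^2) equivalence queries suffice.

```python
from itertools import combinations, product

def eval_lit(lit, x):
    i, positive = lit
    return x[i] if positive else 1 - x[i]

def eval_cnf(clauses, x):
    return int(all(any(eval_lit(l, x) for l in c) for c in clauses))

def all_clauses(n):
    """Every non-tautological clause with 1 or 2 literals over n variables."""
    lits = [(i, p) for i in range(n) for p in (False, True)]
    one = {frozenset([l]) for l in lits}
    two = {frozenset([a, b]) for a, b in combinations(lits, 2) if a[0] != b[0]}
    return one | two

def brute_force_equivalence(target, n):
    """Simulated equivalence oracle (small n only)."""
    def oracle(h):
        for x in product((0, 1), repeat=n):
            if h(x) != target(x):
                return x
        return None
    return oracle

def learn_2cnf(n, equivalence):
    clauses = all_clauses(n)
    while True:
        current = frozenset(clauses)
        cex = equivalence(lambda x: eval_cnf(current, x))
        if cex is None:
            return clauses
        # cex satisfies the target, so any clause it falsifies is not in f.
        clauses = {c for c in clauses if any(eval_lit(l, cex) for l in c)}

# Target from the slides: (not x1 OR x2)(x3 OR x1)(x3 OR x2), 0-based indices.
target = lambda x: int((not x[0] or x[1]) and (x[2] or x[0]) and (x[2] or x[1]))
learned = learn_2cnf(3, brute_force_equivalence(target, 3))
print(all(eval_cnf(learned, x) == target(x) for x in product((0, 1), repeat=3)))
```

The learned clause set may contain extra clauses implied by the target, so it is equivalent to f without necessarily being syntactically identical.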
Two Useful Techniques
1. To show C learnable, find C' s.t.
– C' poly-time learnable
– each f in C has equivalent f' in C' of size at most polynomially larger
– Learn C using algorithm for C'
2. Use existing algorithm with new variable set
BUT…
Even if allow hypotheses not from C, can still be hard to learn C in polynomial time
If C sufficiently rich class of Boolean circuits/formulas
– Can show that C can represent cryptographic primitives
– Learning C as hard as breaking cryptographic primitives
VII. Learning GF[2] Polynomials and Decision Trees
GF[2] Polynomials and Decision Trees
• Poly-time learnable with membership and equivalence queries using algorithm for learning Hankel matrix representations (multiplicity automata)
– Useful Technique 1
• Hankel matrix representations learnable using variant of Deterministic Finite Automaton learning algorithm
Hankel Matrix H of f(x1,…,xn)
View f as function on binary strings
Rows/columns of H indexed by all binary strings.
H[x,y] = f(x◦y) if |x|+|y| = n
       = 0 otherwise
Hankel matrix of f(x1,x2) = x1 V x2
      ε    0    1    00   01   10   11   111  ...
ε     0    0    0    0    1    1    1    0
0     0    0    1
1     0    1    1
00    0
01    1
10    1
11    1
…
(entries left blank are 0, since |x|+|y| ≠ n there)
Learning Hankel matrices of Boolean functions
• Can represent Hankel matrix compactly
– Suffices to specify particular O(r^2) entries, where r is rank of matrix
– Running time of Hankel matrix algorithm polynomial in r, n
• Lemma: If f(x1,…,xn) is a GF[2] polynomial with s terms, then rank of its Hankel matrix is poly(n,s)
• Lemma: If f(x1,…,xn) is a decision tree with s nodes, then rank of its Hankel matrix is poly(n,s)
• Use Hankel matrix algorithm to learn GF[2] polynomials and decision trees
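To make the rank in the lemmas concrete, the sketch below builds the Hankel matrix from the definition H[x,y] = f(x◦y) if |x|+|y| = n, else 0, and computes its rank. Two assumptions are made here that the slides do not state. First, the rank is taken over GF(2). Second, rows and columns are restricted to strings of length at most n, which loses nothing because any longer string indexes an all-zero row or column. Function names are hypothetical, and this only computes the rank; it is not the learning algorithm.

```python
from itertools import product

def strings_up_to(n):
    """All binary strings of length 0..n as tuples, shortest first."""
    return [bits for k in range(n + 1) for bits in product((0, 1), repeat=k)]

def hankel(f, n):
    """Hankel matrix of f restricted to index strings of length <= n."""
    idx = strings_up_to(n)
    return [[f(x + y) if len(x) + len(y) == n else 0 for y in idx] for x in idx]

def rank_gf2(matrix):
    """Rank over GF(2) by Gaussian elimination (XOR of rows)."""
    rows = [list(r) for r in matrix]
    ncols = len(rows[0]) if rows else 0
    rank = 0
    for c in range(ncols):
        pivot = next((r for r in range(rank, len(rows)) if rows[r][c]), None)
        if pivot is None:
            continue
        rows[rank], rows[pivot] = rows[pivot], rows[rank]
        for r in range(len(rows)):
            if r != rank and rows[r][c]:
                rows[r] = [a ^ b for a, b in zip(rows[r], rows[rank])]
        rank += 1
    return rank

# Example from the slides: f(x1, x2) = x1 OR x2.
f_or = lambda x: x[0] | x[1]
H = hankel(f_or, 2)
print(H[0])         # row indexed by the empty string
print(rank_gf2(H))
```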
VIII. Summary
• Definition of Query Learning Models
• Halving algorithm for learning with polynomial number of equivalence queries
• Techniques for polynomial-time learning
• Examples of classes learnable in polynomial-time
• Barriers to polynomial-time learning
Selected References
• Learning Models
– Valiant, L. G. A theory of the learnable. Communications of the ACM, 1984
– Angluin, D. Queries and concept learning. Machine Learning 2(4), 1988
• Learning Algorithms
– Beimel, A., Bergadano, F., Bshouty, N. H., Kushilevitz, E., and Varricchio, S. Learning functions represented as multiplicity automata. Journal of the ACM (3), 2000
– Maass, W. and Turan, G. On the complexity of learning with counterexamples. Proc. of the 30th IEEE Symposium on Foundations of Computer Science (FOCS), 1989
– Klivans, A. and Servedio, R. Learning DNF in Time 2^{O(n^{1/3})}. Journal of Computer and System Sciences 68(2), 2004
• Hardness of Learning
– Kearns, M. J. and Valiant, L. G. Cryptographic limitations on learning Boolean formulae and finite automata. J. ACM (1), 1994
– Angluin, D. Negative results for equivalence queries. Machine Learning (5), 1990
– Hellerstein, L., Pillaipakkamnatt, K., Raghavan, V., and Wilkins, D. How many queries are needed to learn? J. ACM (43), 1996