
Page 1:

Exact Learning of Boolean Functions with Queries

Lisa Hellerstein

Polytechnic University

Brooklyn, NY

AMS Short Course on Statistical Learning Theory, 2007

Page 2:

I. Introduction

Page 3:

Learning Boolean formula f

Problem: Boolean formula f hidden in a black box. Told that f is from class C of formulas.

Task: “Learn” f

f(x1, x2, x3) = x1 Λ x3

Page 4:

Learning representation f of Boolean function

Problem: f hidden in a black box. Told that f is from class C of representations of Boolean functions.

Task: “Learn” f

f = x1 + x2 (mod 2)

Page 5:

Boolean functions can represent

• Whether a person is good or bad

• Whether an email message is spam

• Whether a tumor is malignant

• Whether a book is a romance novel

• etc.

Page 6:

How hard is it to learn target f?

Need to specify:
  • the type of information available
  • what's meant by "learning"

→ Learning models

Page 7:

II. Learning Models

Page 8:

Valiant’s PAC Model (1984)

• PAC = Probably Approximately Correct
• Type of info available:
  – Random examples: value of f on "random" points in its domain
• Success criterion: approximate learning
  – Output h that is approximately functionally equivalent to f

Page 9:

Query Models (this talk)

• Type of info available:
  – Oracles that answer questions about f
• Success criterion: exact learning
  – Output h where h ≡ f
• Want to learn f with a "polynomial" number of queries, in "polynomial" time
  – polynomial in n and the size of f

Page 10:

Types of queries

Membership queries (point evaluation)
  Question: What is f(x)?
  Answer: f(x)

Equivalence queries (h is a hypothesis)
  Question: Is h ≡ f?
  Answer: Yes if so; otherwise some x such that f(x) ≠ h(x) (x is a counterexample)
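
As a concrete illustration of these two oracle types, here is a minimal Python sketch. The class names and the brute-force equivalence check are illustrative assumptions, not part of the talk; a real equivalence oracle would not enumerate the domain.

    from itertools import product

    class MembershipOracle:
        """Answers membership queries for a hidden target f given as a
        Python predicate over n-bit tuples."""
        def __init__(self, f):
            self.f = f

        def query(self, x):
            # Point evaluation: return f(x).
            return self.f(x)

    class EquivalenceOracle:
        """Answers equivalence queries by exhaustive comparison -- feasible
        only for small n; a real oracle would not enumerate the domain."""
        def __init__(self, f, n):
            self.f, self.n = f, n

        def query(self, h):
            # Return (True, None) if h agrees with f on every point,
            # otherwise (False, x) for some counterexample x.
            for x in product((0, 1), repeat=self.n):
                if self.f(x) != h(x):
                    return False, x
            return True, None

    # Example: a hidden target f(x1, x2, x3) = x1 AND x3.
    target = lambda x: int(x[0] and x[2])
    member, equiv = MembershipOracle(target), EquivalenceOracle(target, 3)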

Page 11:

Definition: A membership and equivalence query algorithm learns a class C of representations if, given
  1. oracles answering membership and equivalence queries for some f in C, and
  2. the number n of variables of f,
the algorithm outputs a representation h s.t. f ≡ h.

Say the algorithm runs in polynomial time if its running time is poly(n, size of f).

Page 12:

About membership and equivalence queries

• Assume queries are answered perfectly
• Membership queries
  – Black-box interpolation
  – Perfect answers often not available in practice
• Equivalence queries
  – Can be simulated in the PAC model: test whether f(x) = h(x) on random examples x (a sketch follows below)
  – Related to mistake-bound learning in the on-line model
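
A hedged sketch of that simulation: draw random examples and compare f and h on each; a hypothesis that survives every sample is accepted, and is then only *probably approximately* equal to f, which is exactly the PAC guarantee. The function name and sample count below are illustrative.

    import random

    def simulated_equivalence_query(f, h, n, num_samples=1000):
        """Approximate an equivalence query using random examples only:
        any disagreement found is returned as a counterexample; if h
        survives all samples it is accepted."""
        for _ in range(num_samples):
            x = tuple(random.randint(0, 1) for _ in range(n))
            if f(x) != h(x):
                return False, x   # counterexample
        return True, None         # no disagreement observed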

Page 13:

III. Example Query Algorithm

Page 14:

E.g. C = Boolean monomials

Boolean monomial = conjunction of literals

Example: f(x1, x2, x3) = ¬x1 Λ x3 = ¬x1x3

Other monomials: x1¬x2x3, ¬x2

Page 15:

Learning monomial f(x1, x2, x3)

1. Ask an equivalence query: Is f ≡ 0? Suppose we get the counterexample f(1,0,1) = 1.

2. For each xi, determine whether it appears in the monomial f with negation, without negation, or not at all.

Since x1 = 1 in the counterexample, x1 appears in f without negation or not at all. Ask a membership query: What is f(0,0,1)?

If the answer is 0, x1 (without negation) is in the monomial f.

If the answer is 1, x1 does not appear in f at all.

Do the same for x2 and x3 (a code sketch follows below).
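
A sketch of this algorithm in Python, assuming oracle interfaces like the ones sketched earlier (member(x) returns f(x); equiv(h) returns either success or a counterexample). The dict-of-literals output format is an illustrative choice, not part of the talk.

    def learn_monomial(n, member, equiv):
        """Learn a Boolean monomial over x1..xn with one equivalence query
        and n membership queries."""
        # Step 1: equivalence query with the constant-0 hypothesis.
        ok, a = equiv(lambda x: 0)
        if ok:
            return None                 # f is identically 0
        # a is a positive example: f(a) = 1, so a satisfies every literal of f.
        literals = {}                   # i -> True for x_{i+1}, False for ¬x_{i+1}
        for i in range(n):
            flipped = list(a)
            flipped[i] ^= 1             # flip the i-th bit of the positive example
            if member(tuple(flipped)) == 0:
                # Flipping bit i falsifies f, so the literal fixed by a[i] is in f.
                literals[i] = bool(a[i])
            # Otherwise x_{i+1} does not appear in f at all.
        return literals

    # e.g. learn_monomial(3, member.query, equiv.query) with the oracles
    # sketched earlier returns {0: True, 2: True} for the target x1 AND x3.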

Page 16:

Learning Boolean monomials

• Previous approach learns Boolean monomials in n+1 equivalence and membership queries, polynomial time

• Can also learn Boolean monomials with equivalence queries alone

• Need exponentially many queries (worst case) with membership queries alone:
  if the monomial includes all n variables, f = 1 on only one of the 2^n points in its domain

Page 17:

IV. Four interesting representation classes

Page 18:

1. DNF Formulas

OR of ANDs

f = ¬x1x2x3 V ¬x1x4 V x1x2x3

• Natural way of describing a classification rule
• Not known whether DNF is learnable in polynomial time with membership and equivalence queries (or in the PAC model)
• Best known algorithm runs in time 2^{O(n^{1/3} log n log s)}, where s is the size of f

Page 19:

2. Boolean linear threshold functions
   f = 1 if x1 + x2 + x3 > 2
     = 0 otherwise
   Learnable in polynomial time with equivalence queries

3. Polynomials over GF[2] (integers mod 2)
   f = x2x3 + x1x3 + x1x2x3
   Learnable in polynomial time with membership and equivalence queries

Page 20:

4. Boolean decision trees

Learnable in polynomial time with membership and equivalence queries

[Figure: a Boolean decision tree branching on x1, x2, and x3, with leaves labeled 0 and 1]

Page 21:

Representation and size

• Every Boolean function can be represented as a DNF formula, a GF[2] polynomial, or a decision tree
  – But the sizes of the representations can be very different
  – e.g. the parity function f(x1,…,xn) = x1 + x2 + … + xn (mod 2):
    its representation as a GF[2] polynomial is small, but it requires a DNF formula of size exponential in n and a decision tree of size exponential in n (see the sketch below)
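
A small illustration of the size gap, using the standard fact that every implicant of parity must fix all n variables, so a DNF for parity needs one full-length term per odd-weight assignment. The code below just writes parity this way and counts terms; the function name is illustrative.

    from itertools import product

    def parity_dnf_terms(n):
        """Parity of x1..xn as a DNF with one full-length term per
        odd-weight assignment. The GF[2] polynomial x1 + ... + xn has only
        n terms, but this DNF has 2^(n-1) terms, an exponential gap."""
        terms = []
        for a in product((0, 1), repeat=n):
            if sum(a) % 2 == 1:                      # parity is 1 exactly here
                terms.append(tuple(('x%d' if bit else '¬x%d') % (i + 1)
                                   for i, bit in enumerate(a)))
        return terms

    print(len(parity_dnf_terms(4)))   # prints 8, i.e. 2^(4-1) terms of 4 literals each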

Page 22:

V. Learning with Polynomial Number of Queries

Page 23:

Halving Algorithm
A generic algorithm for learning with a polynomial number of queries

• Assume (for simplicity) that the size s of the target f is known
• Keep a set V of all possible targets f
  – Initially, V contains all representations in C (on n variables) of size s
• Repeat until success:
  – Use V to construct the Majority Hypothesis h
  – Ask an equivalence query with h
  – Either "yes" (success), or receive a counterexample; if the latter, update V

Page 24:

[Figure: the set V of remaining candidate functions, here f1, f3, f5, f6, f8]

Majority Hypothesis h: for each x in the domain of f,
  h(x) = 1 if the majority of fi's in V have fi(x) = 1
       = 0 if the majority of fi's in V have fi(x) = 0

A counterexample to the majority hypothesis eliminates at least half of the fi's in V, so the number of equivalence queries of the Halving Algorithm is at most log2(original size of V). A sketch follows below.
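
A brute-force sketch of the Halving Algorithm, under the simplifying assumption that V can be held explicitly as a list of candidate functions (feasible only for tiny classes); the function name is illustrative. The update step uses the fact that f(x) must differ from the majority vote at the counterexample x.

    def halving_algorithm(candidates, equiv):
        """Generic Halving Algorithm. `candidates` is an explicit list of
        every possible target; `equiv(h)` answers an equivalence query.
        Each counterexample eliminates at least half of V, so at most
        log2(len(candidates)) equivalence queries are made."""
        V = list(candidates)
        while True:
            def majority(x, V=tuple(V)):
                # Majority hypothesis: the value most candidates in V take on x.
                return 1 if 2 * sum(g(x) for g in V) > len(V) else 0
            ok, x = equiv(majority)
            if ok:
                return majority
            truth = 1 - majority(x)              # f(x) differs from h(x) at the counterexample
            V = [g for g in V if g(x) == truth]  # drop every candidate that is wrong on x

With the brute-force EquivalenceOracle sketched earlier and a small explicit candidate list containing the target, this terminates after at most log2 of the original size of V equivalence queries.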

Page 25:

VI. Challenge: Learn in polynomial time

Page 26:

If we restrict hypotheses to be in C:

• It may be NP-hard to learn (computational hardness of learning)
  – Tools to prove this: complexity theory, NP-completeness reductions, non-approximability
• It may require an exponential number of queries to learn (informational hardness of learning)
  – Tools to prove this: structural properties of C, combinatorial arguments

Page 27:

Example

• Suppose C is the class of 2-term DNF formulas, and we want to learn C with equivalence queries alone

• NP-hard to learn 2-term DNF formulas with equivalence queries alone if hypotheses must be 2-term DNF formulas

f = ¬x1 x3 V x1x2

Page 28:

• 2-term DNF formulas can be factored

f = ¬x1 x3 V x1x2

  = (¬x1 V x2)(x3 V x1)(x3 V x2)

• The result is a 2-CNF formula
  – AND of ORs in which each OR has at most 2 literals
  – Size of the 2-CNF formula is O(n^2)
  – 2-CNF formulas can be learned in poly-time with equivalence queries alone (how?)

• Learn 2-term DNF formula using algorithm for learning 2-CNF formulas.

Page 29:

Learning 2-CNF

A 2-CNF formula

  f = (¬x1 V x2)(x3 V x1)(x3 V x2)

can be viewed as a monomial over a new variable set {y1, y2, …}, with one variable per possible clause:

  y1 = (¬x1 V x2)
  y2 = (x1 V x2)
  y3 = (x2 V ¬x3)
  etc.

Learn 2-CNF formulas using the algorithm for learning monomials, translating between the original variables and the new variables (a sketch follows below).
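
A sketch of that translation, assuming a monomial learner such as the one in Section III. The helper names and the encoding of a clause as a set of signed literals are illustrative; the point is that there are only O(n^2) clause variables and that any assignment to the x's determines an assignment to the y's.

    from itertools import combinations

    def clause_variables(n):
        """All clauses with at most 2 literals over x1..xn. Each clause is a
        frozenset of literals (i, sign): sign=True means x_{i+1}, sign=False
        means ¬x_{i+1}. There are only O(n^2) such clauses, so the new
        variable set {y_C} stays polynomial in n."""
        lits = [(i, s) for i in range(n) for s in (True, False)]
        singles = [frozenset([l]) for l in lits]
        pairs = [frozenset(p) for p in combinations(lits, 2)
                 if len({i for i, _ in p}) == 2]   # skip trivial clauses (x_i V ¬x_i)
        return singles + pairs

    def translate_assignment(x, clauses):
        """Map an assignment to the x's to the induced assignment to the
        y's: y_C is the truth value of clause C under x."""
        return tuple(int(any(x[i] == int(s) for i, s in C)) for C in clauses)

    # A 2-CNF formula is exactly a monomial over the y's (the AND of the y_C
    # for the clauses C it contains), so running a monomial learner on the
    # translated examples learns the 2-CNF formula, and its hypothesis
    # translates back to a 2-CNF over the original variables.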

Page 30:

Two Useful Techniques

1. To show C is learnable, find C' s.t.
   – C' is poly-time learnable
   – each f in C has an equivalent f' in C' of size at most polynomially larger
   – Learn C using the algorithm for C'

2. Use existing algorithm with new variable set

Page 31:

BUT…

Even if we allow hypotheses not from C, it can still be hard to learn C in polynomial time.

If C is a sufficiently rich class of Boolean circuits/formulas:
  – Can show that C can represent cryptographic primitives
  – Learning C is then as hard as breaking those cryptographic primitives

Page 32:

VII. Learning GF[2] Polynomials and Decision Trees

Page 33:

GF[2] Polynomials and Decision Trees

• Poly-time learnable with membership and equivalence queries using an algorithm for learning Hankel matrix representations (multiplicity automata)
  – Useful Technique 1

• Hankel matrix representations learnable using variant of Deterministic Finite Automaton learning algorithm

Page 34:

Hankel Matrix H of f(x1,…,xn)

View f as function on binary strings

Rows/columns of H indexed by all binary strings.

H[x, y] = f(x◦y) if |x| + |y| = n
        = 0 otherwise

Page 35:

Hankel matrix of f(x1,x2) = x1 V x2

        ε   0   1   00  01  10  11  111 …
  ε     0   0   0   0   1   1   1   0
  0     0   0   1   0   0   0   0   0
  1     0   1   1   0   0   0   0   0
  00    0   0   0   0   0   0   0   0
  01    1   0   0   0   0   0   0   0
  10    1   0   0   0   0   0   0   0
  11    1   0   0   0   0   0   0   0
  …
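
A small sketch that computes Hankel matrix entries directly from the definition on the previous slide, restricted to index strings of bounded length (the full matrix is infinite). The function name and the dictionary representation are illustrative assumptions.

    from itertools import product

    def hankel_matrix(f, n, max_len=None):
        """Entries of the Hankel matrix of f(x1..xn), viewed as a function
        on binary strings: H[x, y] = f(x◦y) if |x| + |y| = n, else 0."""
        max_len = n if max_len is None else max_len
        strings = [''.join(s) for L in range(max_len + 1)
                   for s in product('01', repeat=L)]
        H = {}
        for x in strings:
            for y in strings:
                w = x + y
                H[x, y] = f(tuple(int(b) for b in w)) if len(w) == n else 0
        return strings, H

    # Reproduce a few entries of the table above for f(x1, x2) = x1 V x2:
    f = lambda x: int(x[0] or x[1])
    _, H = hankel_matrix(f, 2)
    print(H['', '01'], H['0', '1'], H['11', ''])   # prints: 1 1 1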

Page 36:

Learning Hankel matrices of Boolean functions

• Can represent the Hankel matrix compactly
  – Suffices to specify particular O(r^2) entries, where r is the rank of the matrix
  – Running time of the Hankel matrix algorithm is polynomial in r and n

• Lemma: If f(x1,…,xn) is a GF[2] polynomial with s terms, then rank of its Hankel matrix is poly(n,s)

• Lemma: If f(x1,…,xn) is a decision tree with s nodes, then rank of its Hankel matrix is poly(n,s)

• Use Hankel matrix algorithm to learn GF[2] polynomials and decision trees

Page 37:

VIII. Summary

• Definition of Query Learning Models

• Halving algorithm for learning with polynomial number of equivalence queries

• Techniques for polynomial-time learning

• Examples of classes learnable in polynomial-time

• Barriers to polynomial-time learning

Page 38:

Selected References

• Learning Models
  – Valiant, L. G. A Theory of the Learnable. Communications of the ACM, 1984.
  – Angluin, D. Queries and concept learning. Machine Learning 2(4), 1988.

• Learning Algorithms
  – Beimel, A., Bergadano, F., Bshouty, N. H., Kushilevitz, E., and Varricchio, S. Learning functions represented as multiplicity automata. Journal of the ACM (3), 2000.
  – Maass, W. and Turan, G. On the complexity of learning with counterexamples. Proc. of the 30th IEEE Symposium on Foundations of Computer Science (FOCS), 1989.
  – Klivans, A. and Servedio, R. Learning DNF in Time 2^{O(n^{1/3})}. Journal of Computer and System Sciences 68(2), 2004.

Page 39:

• Hardness of Learning
  – Kearns, M. J. and Valiant, L. G. Cryptographic limitations on learning Boolean formulae and finite automata. J. ACM (1), 1994.
  – Angluin, D. Negative results for equivalence queries. Machine Learning (5), 1990.
  – Hellerstein, L., Pillaipakkamnatt, K., Raghavan, V., and Wilkins, D. How many queries are needed to learn? J. ACM (43), 1996.