Learning From Satisfying Assignments


Page 1: Learning From  Satisfying Assignments

1

Learning From Satisfying Assignments

Anindya De (UC Berkeley/IAS)    Ilias Diakonikolas (U. Edinburgh)    Rocco A. Servedio (Columbia University)

Brown University, December 2013

Page 2: Learning From  Satisfying Assignments

2

Learning Probability Distributions

• Big topic in statistics literature (“density estimation”) for decades

• Exciting work in the last decade+ in TCS, largely on learning continuous distributions (mixtures of Gaussians & more)

• This talk: distribution learning from a complexity theoretic perspective

– What about distributions over the hypercube?
– Can we formalize the intuition that “simple distributions are easy to learn”?

Page 3: Learning From  Satisfying Assignments

3

What do we mean by “learn a distribution”?

• Unknown target distribution D

• Algorithm gets i.i.d. draws from D

• With probability 9/10, must output (a sampler for a) distribution D' such that the statistical distance between D and D' is small: d_TV(D, D') ≤ ε

(Natural analogue of Boolean function learning.)
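For reference, the statistical (total variation) distance used throughout is the standard one:

$$ d_{\mathrm{TV}}(D, D') \;=\; \frac{1}{2}\sum_{x \in \{0,1\}^n} \bigl|D(x) - D'(x)\bigr|, \qquad \text{requirement: } d_{\mathrm{TV}}(D, D') \le \epsilon. $$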

Page 4: Learning From  Satisfying Assignments

4

Previous work: [KRRSS94]

• Looked at learning distributions over {0,1}^n in terms of n-output circuits that generate distributions:

• [AIK04] showed it’s hard to learn even very simple distributions from this perspective: already hard even if each output bit is a 4-junta of input bits.

[Figure: a circuit maps input bits z_1, …, z_m (uniform over {0,1}^m) to output bits x_1, …, x_n distributed according to D]

Page 5: Learning From  Satisfying Assignments

5

This work: A different perspective

Our notion of a “simple” distribution over {0,1}^n: the uniform distribution over satisfying assignments of a “simple” Boolean function.

What kinds of Boolean functions can we learn from their satisfying assignments?

Want algorithms with polynomial runtime and polynomial sample complexity.

Page 6: Learning From  Satisfying Assignments

6

What are “simple” functions?

DNF formulas: an OR of ANDs of literals over x_1, …, x_n  [figure: an OR gate fed by three AND gates of literals, some negated]

Halfspaces: sign(w·x − θ)  [figure: a line separating + points from − points]

Page 7: Learning From  Satisfying Assignments

7

Simple functions, cont.

3-CNF formulas: an AND of ORs, each OR over at most 3 literals  [figure: an AND gate fed by OR gates of literals, some negated]

Monotone 2-CNF: an AND of ORs of two unnegated variables  [figure: an AND gate fed by ORs of pairs of variables]

Page 8: Learning From  Satisfying Assignments

8

Yet more simple functions

Intersections of k halfspaces  [figure: + points inside a region bounded by several lines, − points outside]

Low-degree polynomial threshold functions  [figure: + and − points separated by a curved, degree-2 boundary]

Page 9: Learning From  Satisfying Assignments

9

The model, more precisely

• Let C be a fixed class of Boolean functions over {0,1}^n

• There is some unknown f ∈ C. The learning algorithm sees samples drawn uniformly from f^{-1}(1). Target distribution: U_{f^{-1}(1)}.

• Goal: With probability 9/10, output a sampler for a hypothesis distribution D' such that d_TV(D', U_{f^{-1}(1)}) ≤ ε.

We'll call this a distribution learning algorithm for C.

Page 10: Learning From  Satisfying Assignments

10

Relation to other learning problems

Q: How is this different from learning (function learning) under the uniform distribution?

A: Only get positive examples. At least two other ways:

• (not so major) Want to output a hypothesis distribution rather than a hypothesis function

• (really major) Much more demanding guarantee than usual uniform-distribution learning.

Page 11: Learning From  Satisfying Assignments

11

Example: Halfspaces

[Figure: the hypercube between 0^n and 1^n, with a halfspace cutting off a small region of satisfying points]

Usual uniform-distribution model for learning functions: the hypothesis is allowed to be wrong on an ε fraction of points in {0,1}^n.

For a highly biased target function f (say, a halfspace with |f^{-1}(1)| ≤ ε·2^n), the constant-0 function is a fine hypothesis for any such f.

Page 12: Learning From  Satisfying Assignments

12

A stronger requirement

[Figure: the same hypercube picture as before]

Our distribution-learning model: the “constant-0 hypothesis” is meaningless!

For D' to be a good hypothesis distribution, its error must be only an ε fraction of |f^{-1}(1)|.

Essentially, we require a hypothesis with multiplicative rather than additive ε-accuracy relative to |f^{-1}(1)|.

Page 13: Learning From  Satisfying Assignments

13

Our setting: Given draws from U_{f^{-1}(1)}, must output a hypothesis distribution D' with the following guarantee: d_TV(D', U_{f^{-1}(1)}) ≤ ε.

Usual function-learning setting: Given random labeled examples (x, f(x)) with x uniform over {0,1}^n, must output a hypothesis h such that Pr_x[h(x) ≠ f(x)] ≤ ε. If both error regions (false positives and false negatives) are small relative to 2^n, this is fine!
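One way to make the contrast concrete (a restatement of the two guarantees in symbols, with ε the accuracy parameter):

$$ \text{function learning:}\quad \Pr_{x \sim U_{\{0,1\}^n}}\bigl[h(x) \ne f(x)\bigr] \le \epsilon \;\Longleftrightarrow\; \bigl|h^{-1}(1)\,\triangle\, f^{-1}(1)\bigr| \le \epsilon \cdot 2^n, $$

$$ \text{distribution learning:}\quad d_{\mathrm{TV}}\bigl(D',\, U_{f^{-1}(1)}\bigr) \le \epsilon, $$

which forces the mass D' places outside f^{-1}(1) to be at most ε. When |f^{-1}(1)| ≪ ε·2^n the first guarantee is vacuous (the constant-0 function satisfies it), while the second is not.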

Page 14: Learning From  Satisfying Assignments

14

Brief motivational digression into the real world: language learning

People typically learn new languages by being exposed to correct utterances (positive examples), which are a sparse subset of all possible vocalizations (all examples).

Goal is to be able to generate new correct utterances (generate draws from a distribution similar to the one the samples came from).

Page 15: Learning From  Satisfying Assignments

15

Our positive results

Theorem 1: We give an efficient distribution learning algorithm for C = { halfspaces }.

Runtime is poly(n, 1/ε).

Both results are obtained via a general approach, plus class-specific work.

Theorem 2: We give a (pretty) efficient distribution learning algorithm for C = { poly(n)-term DNFs }.

Runtime is quasi-polynomial in n and 1/ε.

[Figures: the halfspace and DNF pictures from earlier slides]

Page 16: Learning From  Satisfying Assignments

16

Our negative results

Assuming crypto-hardness (essentially RSA), there are no efficient distribution learning algorithms for:

o Intersections of two halfspaces

o Degree-2 polynomial threshold functions

o 3-CNFs, or even

o Monotone 2-CNFs

[Figures: intersection of two halfspaces, degree-2 PTF, 3-CNF, and monotone 2-CNF, as illustrated on earlier slides]

Page 17: Learning From  Satisfying Assignments

17

Rest of talk

• Mostly positive results

• Mostly halfspaces (and general approach)

• Touch on DNFs, negative results

Page 18: Learning From  Satisfying Assignments

18

Learning halfspace distributions

Given positive examples drawn uniformly from f^{-1}(1) for some unknown halfspace f,

We need to (whp) output a sampler for a distribution that's close to U_{f^{-1}(1)}.

[Figure: the hypercube between 0^n and 1^n; + points are samples from the satisfying region of the unknown halfspace f]

Page 19: Learning From  Satisfying Assignments

19

Let’s fantasize

Suppose somebody gave us f.

Even then, we need to output a sampler for a distribution close to uniform over f^{-1}(1).

Is this doable? Yes.

[Figure: the same hypercube picture, with the halfspace f now known]

Page 20: Learning From  Satisfying Assignments

20

Approximate sampling for halfspaces

Theorem: Given a halfspace f over {0,1}^n, can return an (approximately) uniform point from f^{-1}(1) in poly(n) time (with small failure probability δ)

• [MorrisSinclair99]: sophisticated MCMC analysis

• [Dyer03]: elementary randomized algorithm & analysis using “dart throwing”

Of course, in our setting we are not given f.

But, we should expect to use (at least) this machinery for our general problem.

Page 21: Learning From  Satisfying Assignments

21

A potentially easier case…?

For the approximate sampling problem (where we're given f), the problem is much easier if Pr_{x~U}[f(x) = 1] is large: sample x uniformly from {0,1}^n & do rejection sampling (see the sketch below).

Maybe our problem is easier too in this case?

In fact, yes. Let’s consider this case first.
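A minimal sketch of the dense-case rejection sampler mentioned above (illustrative only; it assumes we are handed an evaluation oracle f for the halfspace):

```python
import random

def rejection_sample(f, n, max_tries=100000):
    """Draw a uniform point of f^{-1}(1) by sampling uniform x in {0,1}^n
    and keeping the first one with f(x) = 1. Efficient only when
    Pr[f(x) = 1] is not too small (the dense case)."""
    for _ in range(max_tries):
        x = tuple(random.randint(0, 1) for _ in range(n))
        if f(x):
            return x
    raise RuntimeError("f^{-1}(1) appears too sparse for rejection sampling")

# Example: the (dense) halfspace f(x) = 1 iff x_1 + ... + x_n >= n/2.
n = 20
f = lambda x: sum(x) >= n / 2
print(rejection_sample(f, n))
```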

Page 22: Learning From  Satisfying Assignments

22

Halfspaces: the high-density case

• Let p = Pr_{x~U}[f(x) = 1] = |f^{-1}(1)| / 2^n.

• We will first consider the case that p is not too small (say, p ≥ 1/poly(n)).

• We’ll solve this case using Statistical Query learning & hypothesis testing for distributions.

Page 23: Learning From  Satisfying Assignments

23

First Ingredient for the high-density case: SQ

Statistical Query (SQ) learning model:

o SQ oracle STAT(f, D): given a poly-time computable query χ : {0,1}^n × {0,1} → [0,1] and a tolerance τ, outputs a value v where |v − E_{x~D}[χ(x, f(x))]| ≤ τ.

o An algorithm A is said to be an SQ learner for C (under distribution D) if A can learn any f ∈ C given access to STAT(f, D).
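As a concrete (illustrative) example of a statistical query: to estimate the correlation of f with the i-th coordinate under D, one asks the query χ(x, y) = x_i · y with tolerance τ, and the oracle returns a value v with

$$ \bigl| v - \mathbf{E}_{x \sim D}\bigl[\, x_i \cdot f(x) \,\bigr] \bigr| \;\le\; \tau. $$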

Page 24: Learning From  Satisfying Assignments

24

SQ learning for halfspaces

Good news: [BlumFriezeKannanVempala97] gave an efficient SQ learning algorithm for halfspaces.

Of course, to run it, we need access to the oracle STAT(f, U) for the unknown halfspace f.

So, we need to simulate this given our examples from U_{f^{-1}(1)}.

Outputs halfspace hypotheses!

Page 25: Learning From  Satisfying Assignments

25

The high-density case: first step

Lemma: Given access to uniform random samples from f^{-1}(1) and an estimate p̂ such that p̂ ≈ p = Pr_{x~U}[f(x) = 1], queries to STAT(f, U) can be simulated up to error τ in time poly(n, 1/τ).

Proof sketch:

Estimate one piece using samples from U_{f^{-1}(1)} (the positive samples).

Estimate the other piece using uniform random samples from {0,1}^n (which we can generate ourselves).
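One natural way to carry out this simulation (a reconstruction consistent with the proof sketch, not a quote from the slides): for any query χ, write

$$ \mathbf{E}_{x\sim U}\bigl[\chi(x, f(x))\bigr] \;=\; \mathbf{E}_{x\sim U}\bigl[\chi(x,0)\bigr] \;+\; \Pr_{x\sim U}[f(x)=1]\cdot \mathbf{E}_{x\sim U_{f^{-1}(1)}}\bigl[\chi(x,1)-\chi(x,0)\bigr]. $$

The first term needs only uniform random x (no access to f); the second is estimated from the positive samples together with the estimate p̂ of Pr[f(x) = 1]. The error in p̂ and the sampling error translate into the tolerance τ.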

Page 26: Learning From  Satisfying Assignments

26

The high-density case: first step

Lemma: Given access to uniform random samples from f^{-1}(1) and an estimate p̂ such that p̂ ≈ p = Pr_{x~U}[f(x) = 1], queries to STAT(f, U) can be simulated up to error τ in time poly(n, 1/τ).

Recall promise: p is not too small (p ≥ 1/poly(n), say).

Additionally, we assume that we have an estimate p̂ ≈ p.

The Lemma lets us use the halfspace SQ-learner to get a hypothesis h such that Pr_{x~U}[h(x) ≠ f(x)] is small, even relative to p.

A halfspace!

Page 27: Learning From  Satisfying Assignments

27

Handling the high-density case

• Since Pr_{x~U}[h(x) ≠ f(x)] is small relative to p, we have that:
  o h^{-1}(1) covers almost all of f^{-1}(1), and
  o f^{-1}(1) makes up almost all of h^{-1}(1).

• Hence using rejection sampling, we can easily sample U_{h^{-1}(1)}, which is close to U_{f^{-1}(1)}.

Caveat: We don't actually have an estimate p̂ for p.

Page 28: Learning From  Satisfying Assignments

28

Ingredient #2: Hypothesis testing

• Try all possible values of p̂ in a sufficiently fine multiplicative grid

• We will get a list of candidate distributions D_1, …, D_N such that at least one of them is ε-close to U_{f^{-1}(1)}.

• Run a “distribution hypothesis tester” to return some D_i which is O(ε)-close to U_{f^{-1}(1)}.
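A minimal sketch of one plausible multiplicative grid of guesses (assuming the true p = Pr[f(x) = 1] lies in [2^{-n}, 1]; the exact grid used in the paper may differ):

```python
def multiplicative_grid(n, eps):
    """Guesses 1, 1/(1+eps), 1/(1+eps)^2, ... down to 2^{-n}.
    Some guess is within a (1+eps) factor of the true p."""
    guesses = []
    g = 1.0
    while g >= 2.0 ** (-n):
        guesses.append(g)
        g /= (1.0 + eps)
    return guesses

print(len(multiplicative_grid(20, 0.1)))  # about n/eps guesses
```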

Page 29: Learning From  Satisfying Assignments

29

Distribution hypothesis testing

Theorem: Given

• a sampler for the target distribution D,

• approximate samplers for hypothesis distributions D_1, …, D_N,

• approximate evaluation oracles for D_1, …, D_N, and

• the promise that some D_i satisfies d_TV(D_i, D) ≤ ε,

the hypothesis tester outputs some D_j such that d_TV(D_j, D) ≤ O(ε), in time poly(N, 1/ε).

Having samplers & evaluators for hypotheses is crucial for this.

Page 30: Learning From  Satisfying Assignments

30

Distribution hypothesis testing, cont.

We need samplers & evaluators for our hypothesis distributions

All our hypotheses are dense, so we can do approximate counting easily (via rejection sampling) to estimate |h^{-1}(1)|.

Note that the hypothesis distribution U_{h^{-1}(1)} puts weight 1/|h^{-1}(1)| on each x with h(x) = 1 (and 0 elsewhere).

So we get the required (approximate) evaluators. Similarly, (approximate) samples are easy via rejection sampling.

Page 31: Learning From  Satisfying Assignments

31

Recap

So we handled the high-density case using

• SQ learning (for halfspaces)
• Hypothesis testing (generic).

(Also used approximate sampling & counting, but they were trivial because we were in the dense case.)

Now let’s consider the low-density case (the interesting case).

Page 32: Learning From  Satisfying Assignments

32

Low density case: A new ingredient

New ingredient for the low-density case: A new kind of algorithm called a densifier.

• Input: a parameter γ̂ such that γ̂ ≤ Pr_{x~U}[f(x) = 1] (roughly), and samples from U_{f^{-1}(1)}

• Output: A function g such that:
  – g(x) = 1 for almost every x ∈ f^{-1}(1), and
  – f^{-1}(1) is a non-negligible fraction of g^{-1}(1) (f is “dense” inside g).

– For simplicity, assume that f^{-1}(1) ⊆ g^{-1}(1) (like in the illustration on the next slide).
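Spelled out (a reconstruction of the two conditions, with γ the density parameter; the exact quantifiers in the paper may differ slightly): the densifier's output g should satisfy

$$ \Pr_{x \sim U_{f^{-1}(1)}}\bigl[g(x) = 1\bigr] \;\ge\; 1 - \epsilon \qquad\text{and}\qquad \frac{|f^{-1}(1)|}{|g^{-1}(1)|} \;\ge\; \gamma, $$

i.e. g captures almost all of f^{-1}(1), and f^{-1}(1) is at least a γ fraction of g^{-1}(1).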

Page 33: Learning From  Satisfying Assignments

33

Densifier illustration

[Figure: the region g^{-1}(1) containing f^{-1}(1); dots show samples from U_{f^{-1}(1)}; γ̂ is a good estimate]

Page 34: Learning From  Satisfying Assignments

34

Low-density case (cont.)

To solve the low-density case, we need approximate sampling and approximate counting algorithms for the class that the densifier outputs into.

This, plus the previous ingredients (SQ learning, hypothesis testing, & densifier), suffices: given all these ingredients, we get a distribution learning algorithm for C.

Page 35: Learning From  Satisfying Assignments

35

How does it work?

The overall algorithm (recall that the target distribution is U_{f^{-1}(1)}):

1. Run the densifier to get g.
2. Use the approximate sampling algorithm for g to get samples from U_{g^{-1}(1)}.
3. Run the SQ-learner for C under distribution U_{g^{-1}(1)} to get a hypothesis h for f.
4. Sample from U_{g^{-1}(1)} till we get x such that h(x) = 1; output this x.

Repeat with different guesses for Pr[f = 1], & use hypothesis testing to choose a hypothesis distribution that's close to U_{f^{-1}(1)}.

(Step 1 needs a good estimate of Pr[f = 1].)
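A schematic sketch of the four steps above for one guess of Pr[f = 1]. The helpers densifier, approx_sampler, and sq_learner are placeholders for the ingredients named on these slides, not real library calls:

```python
def learn_distribution_one_stage(positive_samples, gamma_hat, eps,
                                 densifier, approx_sampler, sq_learner):
    """One stage of the general algorithm, for one guess gamma_hat of Pr[f=1].
    Returns a sampler for a candidate hypothesis distribution."""
    # 1. Run the densifier on the positive samples to get g with f "dense" inside it.
    g = densifier(positive_samples, gamma_hat, eps)

    # 2. Approximate sampling gives (near-)uniform samples from g^{-1}(1).
    def sample_from_g():
        return approx_sampler(g)

    # 3. SQ-learn a hypothesis h for f under the uniform distribution on g^{-1}(1).
    h = sq_learner(sample_from_g, eps)

    # 4. The candidate sampler: draw from g^{-1}(1) until h says "positive".
    def candidate_sampler():
        while True:
            x = sample_from_g()
            if h(x):
                return x
    return candidate_sampler

# The full algorithm repeats this over a grid of guesses gamma_hat and uses the
# hypothesis tester to pick a candidate close to the target distribution.
```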

Page 36: Learning From  Satisfying Assignments

36

A picture of one stage

[Figure: f^{-1}(1) inside g^{-1}(1), with the hypothesis h overlapping f]

1. Using samples from U_{f^{-1}(1)}, run the densifier to get g.

2. Run the approximate uniform generation algorithm to get uniform positive examples of g.

3. Run the SQ-learner on distribution U_{g^{-1}(1)} to get a high-accuracy hypothesis h for f (under U_{g^{-1}(1)}).

4. Sample from U_{g^{-1}(1)} till we get a point x where h(x) = 1, and output it.

Note: This all assumed we have a good estimate of Pr[f = 1].

Page 37: Learning From  Satisfying Assignments

37

How it works, cont.

Recall that to carry out hypothesis testing, we need samplers & evaluators for our hypothesis distributions

Now some hypotheses may be very sparse…

• Use approximate counting to estimate |g^{-1}(1) ∩ h^{-1}(1)|. As before, the hypothesis distribution puts weight 1/|g^{-1}(1) ∩ h^{-1}(1)| on each point it can output, so we get an (approximate) evaluator.

• Use approximate sampling to get samples from it (uniform over g^{-1}(1) ∩ h^{-1}(1)).

Page 38: Learning From  Satisfying Assignments

38

Recap: a general method

Theorem: Let C be a class of Boolean functions such that:
(i) C is efficiently SQ-learnable;
(ii) C has a densifier with output in a class C'; and
(iii) C' has efficient approximate counting and sampling algorithms.

Then there is an efficient distribution learning algorithm for C.

Page 39: Learning From  Satisfying Assignments

39

Back to halfspaces: what have we got?

• Saw earlier we have SQ learning [BlumFriezeKannanVempala97]
• [MorrisSinclair99, Dyer03] give approximate counting and sampling.

So we have all the necessary ingredients… except a densifier.

Reminiscent of the [Dyer03] “dart throwing” approach to approximate counting – but in that setting, we are given f.

Approximate counting setting: given f, come up with g.
Densifier setting: can we come up with a suitable g given only samples from U_{f^{-1}(1)}?

[Figure: f^{-1}(1) inside g^{-1}(1), in both settings]

Page 40: Learning From  Satisfying Assignments

40

A densifier for halfspaces

Theorem: There is an algorithm running in polynomial time such that for any halfspace f, if the algorithm gets as input a value γ̂ with γ̂ ≤ Pr_{x~U}[f(x) = 1] and access to samples from U_{f^{-1}(1)}, it outputs a halfspace g with the following properties:

1. Pr_{x~U_{f^{-1}(1)}}[g(x) = 1] ≥ 1 − ε, and

2. f^{-1}(1) is a non-negligible fraction of g^{-1}(1) (f is dense inside g).

Page 41: Learning From  Satisfying Assignments

41

Getting a densifier for halfspaces

Key ingredients:

o Online learner of [MaassTuran90]
o Approximate sampling for halfspaces [MorrisSinclair99, Dyer03]

Page 42: Learning From  Satisfying Assignments

42

Towards a densifier for halfspaces

Recall our goals:
1. Pr_{x~U_{f^{-1}(1)}}[g(x) = 1] ≥ 1 − ε
2. f^{-1}(1) is a non-negligible fraction of g^{-1}(1)

Fact: Let S be a set of samples from U_{f^{-1}(1)} of size poly(n, 1/ε). Then, with probability 1 − δ, condition (1) holds for every halfspace g that satisfies g(x) = 1 for all x ∈ S.

Proof: If (1) fails for a halfspace g, then the probability that every x ∈ S satisfies g is at most (1 − ε)^{|S|}. The Fact follows from a union bound over all (at most 2^{O(n^2)} many) halfspaces g.

So ensuring (1) is easy – choose S and ensure g is consistent with S (i.e. g(x) = 1 for all x ∈ S). How to ensure (2)?
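Filling in the arithmetic of the union bound (using 2^{O(n^2)} as a standard upper bound on the number of distinct halfspaces over {0,1}^n): if condition (1) fails for g, each sample independently satisfies g with probability at most 1 − ε, so

$$ \Pr\bigl[\text{all } m \text{ samples satisfy } g\bigr] \;\le\; (1-\epsilon)^m, \qquad 2^{O(n^2)}\cdot (1-\epsilon)^m \;\le\; \delta \quad\text{for } m = O\!\Bigl(\tfrac{n^2 + \log(1/\delta)}{\epsilon}\Bigr). $$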

Page 43: Learning From  Satisfying Assignments

Online learning as a two-player game

43

Imagine a two-player game in which Alice has a halfspace f and Bob wants to learn f:

i. Bob initializes the constraint set to the empty set.
ii. Bob runs a (specific poly-time) algorithm on the constraint set and returns a halfspace g consistent with it.
iii. Alice either says “yes, g = f” or else returns an x such that g(x) ≠ f(x).
iv. Bob adds (x, f(x)) to the constraint set and returns to step (ii).

Page 44: Learning From  Satisfying Assignments

44

Guarantee of the game

Theorem [MaassTuran90]: There is a specific algorithm that Bob can run so that the game terminates in at most poly(n) rounds. At the end, either g = f, or Bob can certify that there is no halfspace meeting all the constraints.

(The algorithm is essentially the ellipsoid algorithm.)

Q: How is this helpful for us? A: Bob has a powerful strategy – we will exploit it.

Page 45: Learning From  Satisfying Assignments

45

Using the online learner

• Choose the sample set S as defined earlier. Start with an empty constraint set.

• “Bob” simulation: at each stage, run Bob's strategy on the current constraint set and return a halfspace g consistent with it.

• “Alice” simulation:
  – If g(x) = 0 for some x ∈ S, then return that x (a positive counterexample).
  – Else, if |g^{-1}(1)| is not too much larger than γ̂·2^n (approx counting), then we are done and return g.
  – Else, use approx sampling to randomly choose a point x from g^{-1}(1) and return it (whp a negative counterexample).
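A schematic sketch of this simulation. All helpers (bob_step for the [MaassTuran90] learner, approx_count and approx_sample for the halfspace counting/sampling routines) are placeholders, not real APIs:

```python
def densifier_simulation(S, gamma_hat, n, bob_step, approx_count, approx_sample,
                         slack=2.0, max_rounds=10000):
    """Simulate 'Alice' for Bob's online halfspace learner.
    S: positive samples from f^{-1}(1); gamma_hat: guessed lower bound on Pr[f=1]."""
    constraints = []                            # labeled counterexamples fed to Bob
    for _ in range(max_rounds):
        g = bob_step(constraints)               # Bob: a halfspace consistent with constraints
        violated = [x for x in S if not g(x)]
        if violated:                            # a sample g gets wrong: a true counterexample
            constraints.append((violated[0], 1))
            continue
        if approx_count(g) <= slack * gamma_hat * 2 ** n:
            return g                            # g^{-1}(1) not much bigger than f^{-1}(1): done
        x = approx_sample(g)                    # g too big: a random point of g^{-1}(1)
        constraints.append((x, 0))              # is whp a negative example of f
    raise RuntimeError("simulation did not terminate within max_rounds")
```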

Page 46: Learning From  Satisfying Assignments

46

Why is the simulation correct?

• If g(x) = 0 for some x ∈ S, then that x really is a counterexample (every x ∈ S satisfies f), so the simulation step is indeed correct.

• The other case in which Alice returns a point is that |g^{-1}(1)| is much larger than γ̂·2^n ≈ |f^{-1}(1)|. Then a random point of g^{-1}(1) has f(x) = 0 with high probability, so the simulation at every step is correct with high probability.

• Since the simulation lasts poly(n) steps, by a union bound all the steps are correct with high probability.

Page 47: Learning From  Satisfying Assignments

47

Finishing the algorithm

• Provided the simulation is correct, the halfspace g which gets returned always satisfies the conditions:

1. g(x) = 1 for every x ∈ S, which (by the earlier Fact) gives Pr_{x~U_{f^{-1}(1)}}[g(x) = 1] ≥ 1 − ε, and

2. |g^{-1}(1)| is not too much larger than γ̂·2^n, so f^{-1}(1) is a non-negligible fraction of g^{-1}(1).

So, we have a densifier – and a distribution learning algorithm – for halfspaces.

Page 48: Learning From  Satisfying Assignments

48

DNFs

Recall general result:

Theorem: Let C be a class of Boolean functions such that:
(i) C is efficiently SQ-learnable;
(ii) C has a densifier with output in a class C'; and
(iii) C' has efficient approximate counting and sampling algorithms.
Then there is an efficient distribution learning algorithm for C.

For DNFs, we get (iii) from [KarpLubyMadras89]. What about the densifier and SQ learning?

Page 49: Learning From  Satisfying Assignments

49

Sketch of the densifier for DNFs

• Consider a DNF f = T_1 ∨ T_2 ∨ … ∨ T_t with t = poly(n) terms. For concreteness, suppose each term carries roughly a 1/t fraction of the mass of U_{f^{-1}(1)}.

• Key observation: for each i, Pr_{x~U_{f^{-1}(1)}}[x satisfies T_i] is roughly 1/t. So Pr[r consecutive samples from U_{f^{-1}(1)} all satisfy the same T_i] is roughly t · (1/t)^r.

• If this happens, whp these samples completely identify T_i (the literals common to all of them pin down the term).

• The densifier finds candidate terms in this way, and outputs the OR of all candidate terms.
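A rough version of the calculation behind the key observation (an illustrative reconstruction, not the paper's exact bound): writing p_i = Pr_{x~U_{f^{-1}(1)}}[x satisfies T_i], every satisfying assignment satisfies some term, so

$$ \sum_{i=1}^{t} p_i \ge 1 \;\Rightarrow\; \max_i p_i \ge \tfrac{1}{t}, \qquad \Pr\bigl[x_1,\dots,x_r \text{ all satisfy that } T_i\bigr] = p_i^{\,r} \ge t^{-r}, $$

and under the equal-weight simplification the probability that a batch of r consecutive samples all land in the same term is about t·t^{-r} = t^{-(r-1)}. So on the order of t^{r-1} batches suffice in expectation to observe such an event; with t = poly(n) and r = O(log n) this is quasi-polynomially many samples.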

Page 50: Learning From  Satisfying Assignments

50

SQ learning for DNFs

• Unlike halfspaces, no efficient SQ algorithm for learning DNFs under arbitrary distributions is known; the best known runtime is superpolynomial in n.

• But: our densifier identifies “candidate terms” such that f is (essentially) an OR of at most poly(n) of them.

• Can use noise-tolerant SQ learner for sparse disjunctions, applied over “metavariables” (the candidate terms).

• Running time is poly(# metavariables).

Page 51: Learning From  Satisfying Assignments

51

Hardness results

Page 52: Learning From  Satisfying Assignments

52

Secure signature schemes

• Gen: (randomized) key generation algorithm; produces key pairs (pk, sk)

• Sign: signing algorithm; Sign(sk, m) = σ is a signature for message m using secret key sk.

• Verify: verification algorithm; Verify(pk, m, σ) = 1 if σ is a valid signature for m under pk.

Security guarantee: Given signed messages (m_1, σ_1), …, (m_k, σ_k), no poly-time algorithm can produce a pair (m, σ) with Verify(pk, m, σ) = 1 for a new message m.

Page 53: Learning From  Satisfying Assignments

53

Connection with our problem

Intuition: View the target distribution as the uniform distribution over signed messages (m, Sign(sk, m)).

If, given signed messages, you can (approximately) sample from , this means you can generate new signed messages – contradicts security guarantee!

Need to work with a refinement of signature schemes – unique signature schemes [MicaliRabinVadhan99] – for intuition to go through. Unique signature schemes known to exist under various crypto assumptions (RSA’, Diffie-Hellman’, etc.)

Page 54: Learning From  Satisfying Assignments

54

Signature schemes + Cook-Levin

Lemma: For any secure signature scheme, there is a secure signature scheme with the same security where the verification algorithm is a 3-CNF.

The set of valid signed messages corresponds to the satisfying assignments of this 3-CNF, so security of the signature scheme ⟹ no efficient distribution learning algorithm for 3-CNFs.

Page 55: Learning From  Satisfying Assignments

55

More hardness

Same approach yields hardness for intersections of 2 halfspaces & degree-2 PTFs. (These require parsimonious reductions: efficiently computable/invertible maps between sat. assignments of the target function and sat. assignments of a 3-CNF.)

For monotone 2-CNFs: use the “blow-up” reduction used in proving hardness of approximate counting for monotone 2-SAT. Roughly, most sat. assignments of the monotone 2-CNF correspond to sat. assignments of the 3-CNF.

Page 56: Learning From  Satisfying Assignments

56

Summary of talk

• New model: learning the uniform distribution over satisfying assignments of an unknown f ∈ C

• “Multiplicative accuracy” learning

• Positive results: halfspaces and poly(n)-term DNFs

• Negative results: intersections of 2 halfspaces, degree-2 PTFs, 3-CNFs, monotone 2-CNFs

[Figures: the halfspace, DNF, intersection-of-2-halfspaces, 3-CNF, monotone 2-CNF, and degree-2 PTF pictures from earlier slides]

Page 57: Learning From  Satisfying Assignments

57

Thank you!