
CS761 Spring 2015 Homework 2

Assigned Feb. 11, Due Feb. 18, 2015 before class

Instructions:

• Homeworks are to be done individually.

• Typeset your homework in latex using this file as template (e.g. use pdflatex). We do not accept hand-written homeworks. Show your derivations.

• Hand in a single-sided printed copy of your homework before class. Homework will no longer be accepted once the lecture starts.

• Unless explicitly specified in the questions, you do not need to hand in any code.

• Fill in your name and email below. This will produce a page separated from the rest of your homework. We will do “double blind review.” Do not include identifying information (e.g. your name) in the rest of your homework.

Name: Rahul Chatterjee
Email: [email protected]


1. Let X = Z be the set of integers. Let H = {hz : z ∈ X} ∪ {h0}, where for each z ∈ X, hz(x) = 1 if x = z and hz(x) = 0 if x ≠ z. h0 is the hypothesis that classifies everything 0. We make the realizable assumption, namely the true hypothesis f classifies everything 0, except perhaps one item. For this problem, do not use a VC argument.

(a) Define an algorithm that implements ERM.

(b) Prove that H is PAC learnable. Give an upper bound on the sample complexity.

Ans. (a) The ERM is simple. Given a sample S drawn i.i.d. from the distribution D over X, if S contains any point z with label f(z) = 1, output hz; otherwise output h0. In either case the training error is zero, so this algorithm implements ERM.
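As a sketch, this ERM fits in a few lines of Python (the function name and the (x, label)-pair representation of the sample are mine, not part of the assignment):

```python
def erm_singletons(sample):
    """ERM for H = {h_z : z in X} ∪ {h_0}.
    `sample` is a list of (x, label) pairs drawn i.i.d., labeled by the true f."""
    positives = [x for x, y in sample if y == 1]
    if positives:
        z = positives[0]  # by realizability, all positive examples share one z
        return lambda x: 1 if x == z else 0   # output h_z
    return lambda x: 0                        # output h_0

h = erm_singletons([(3, 0), (7, 1), (2, 0)])
print(h(7), h(3))   # 1 0
```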

(b) H is PAC learnable if there exists a function n(ε, δ) such that for every ε, δ ∈ (0, 1) and every distribution D over X, given an i.i.d. sample of size ≥ n(ε, δ), the ERM outputs with probability ≥ 1 − δ a hypothesis h with generalization error LD(h) ≤ ε. Let z∗ be the point (if any) where f is 1, and let p = P_{x∼D}[x = z∗]. The ERM above can only err when f = hz∗ and z∗ does not appear in the sample S, in which case it outputs h0 with error LD(h0) = p; it never produces a false positive, since it outputs hz only after seeing z with a positive label.

If p ≤ ε, the error is at most ε regardless of the sample. If p > ε, the probability that a sample of size n misses z∗ is (1 − p)^n < (1 − ε)^n ≤ e^{−εn}, which is at most δ once n ≥ (1/ε) ln(1/δ). Hence H is PAC learnable with sample complexity n(ε, δ) ≤ ⌈(1/ε) ln(1/δ)⌉.
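Numerically, the probability of missing z∗ shrinks below δ once n ≥ (1/ε) ln(1/δ); a quick sketch with illustrative values of ε and δ:

```python
import math

eps, delta = 0.1, 0.05                      # illustrative accuracy/confidence
n = math.ceil(math.log(1 / delta) / eps)    # n(eps, delta) from the bound

# If p = P[x = z*] > eps, the ERM fails only when the sample misses z*;
# that probability is (1 - p)**n, largest (over p > eps) as p -> eps:
worst_miss = (1 - eps) ** n
print(n, worst_miss <= delta)   # 30 True
```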

2. (a) Find a class H of binary-valued functions over the real interval X = [0, 1] such that H is infinite and VCdim(H) = 1.

(b) Find a class H of binary-valued functions over the real interval X = [0, 1] such that H is finite and VCdim(H) = log2(|H|).

Ans. (a) H = {h≥θ : θ ∈ X}, where h≥θ(x) = 1 if x ≥ θ, 0 otherwise. H is infinite, and any single point is shattered (choose θ below or above it). No two points x1 < x2 can be shattered: the labeling (1, 0) is impossible, since h≥θ(x1) = 1 implies θ ≤ x1 ≤ x2 and hence h≥θ(x2) = 1. Hence VCdim(H) = 1.
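A brute-force shattering check for this threshold class, as a sketch (the helper `shatters` is mine, for this write-up only):

```python
from itertools import product

def shatters(points):
    """True iff the class {h_theta(x) = 1 iff x >= theta} realizes
    every binary labeling of `points`."""
    # Enough candidate thresholds: below everything, at each point,
    # just above each point, and above everything.
    thetas = ([min(points) - 1.0] + list(points) +
              [p + 1e-9 for p in points] + [max(points) + 1.0])
    def h(theta, x):
        return 1 if x >= theta else 0
    return all(any(all(h(t, x) == y for x, y in zip(points, labels))
                   for t in thetas)
               for labels in product([0, 1], repeat=len(points)))

print(shatters([0.5]))        # True: a single point is shattered
print(shatters([0.3, 0.7]))   # False: the labeling (1, 0) is impossible
```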

(b) Fix n points X̄ = {x1, x2, . . . , xn} ⊂ X and let H be the set of all 2^n binary labelings of X̄, i.e. H = {h : X̄ → {0, 1}}, with each h extended to all of X by setting h(x) = 0 for x outside X̄. Clearly |H| = 2^n. H shatters X̄, since every labeling of the n points is realized by some h, so VCdim(H) ≥ n. Conversely, no set of n + 1 points can be shattered: at least one of them lies outside X̄, every h ∈ H labels that point 0, and so the all-ones labeling is unachievable. Hence VCdim(H) = n = log2(|H|).
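This finite class can be checked exhaustively for small n; in this sketch (names mine) each hypothesis is represented by its labeling of the n base points, with 0 elsewhere:

```python
from itertools import product

n = 3
base = list(range(n))       # stand-ins for the fixed points x_1 .. x_n
# H: one hypothesis per binary labeling of `base`, zero outside it.
H = [dict(zip(base, bits)) for bits in product([0, 1], repeat=n)]

def shattered(points):
    """True iff every labeling of `points` is realized by some h in H."""
    return all(any(tuple(h.get(x, 0) for x in points) == labels for h in H)
               for labels in product([0, 1], repeat=len(points)))

print(len(H))                  # 8 = 2**n
print(shattered(base))         # True: the n base points are shattered
print(shattered(base + [99]))  # False: the extra point is always labeled 0
```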

3. Let X be a finite domain. Let 1 ≤ k ≤ |X| be a fixed integer. Find the VC dimension of the following hypothesis spaces of binary classifiers h : X → {0, 1}. Prove your claim.


(a) H = {h : ∑_{x∈X} h(x) = k}, i.e. all hypotheses that classify exactly k items positive.

(b) H = {h : ∑_{x∈X} h(x) ≤ k}, i.e. all hypotheses that classify at most k items positive.

Ans. (a) To shatter a set C ⊆ X of size d, H must realize every labeling of C. The all-ones labeling requires d ≤ k. The all-zeros labeling requires all k positive points to lie outside C, i.e. |X| − d ≥ k, so d ≤ |X| − k. Conversely, if d ≤ min(k, |X| − k), then any labeling of C with j ones is realized by some h ∈ H: label those j points of C positive together with k − j points outside C, which exist because |X| − d ≥ k ≥ k − j. Hence VCdim(H) = min(k, |X| − k).

(b) Any labeling of a set of d ≤ k points is itself realized by a hypothesis in H: label those points as given and everything else 0, which uses at most k positives. So any k points are shattered. No set of k + 1 points can be shattered, since its all-ones labeling would require k + 1 positives. Hence VCdim(H) = k.
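A brute-force sketch (helper names mine) confirming on small domains that the exactly-k class shatters some set of size min(k, |X| − k) but no larger set:

```python
from itertools import combinations, product

def shatters(H, C):
    """H: hypotheses given as their sets of positive points. Does H shatter C?"""
    C = list(C)
    return all(any(set(x for x, y in zip(C, lab) if y) == h & set(C) for h in H)
               for lab in product([0, 1], repeat=len(C)))

def vc_matches_claim(m, k):
    X = range(m)
    H = [set(c) for c in combinations(X, k)]   # all exactly-k classifiers
    d = min(k, m - k)                          # claimed VC dimension
    return (any(shatters(H, C) for C in combinations(X, d))
            and not any(shatters(H, C) for C in combinations(X, d + 1)))

print(vc_matches_claim(6, 2), vc_matches_claim(6, 4), vc_matches_claim(8, 3))
# True True True
```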

4. Let hθ(x) = sgn(sin(θx))

where sgn(z) = 1 if z ≥ 0, and 0 otherwise. Consider

H = {hθ : θ ∈ R}.

Let xi = 2^{−i} for i = 1 . . . n. Prove that, for any y1, . . . , yn ∈ {0, 1}, ∃θ ∈ R such that hθ(xi) = yi for i = 1 . . . n.

You have just shown that VCdim(H) = ∞, even though H is parametrized by a single parameter θ.
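Reading the points as xi = 2^{−i}, the standard construction θ = π(1 + ∑_i (1 − yi) 2^i) realizes any labeling; a numeric sketch checking all 2^n labelings for a small n:

```python
import math
from itertools import product

def h(theta, x):
    return 1 if math.sin(theta * x) >= 0 else 0   # sgn(sin(theta * x))

n = 5
xs = [2.0 ** -(i + 1) for i in range(n)]          # x_i = 2^{-i}, i = 1..n
ok = True
for ys in product([0, 1], repeat=n):
    # Standard choice: theta = pi * (1 + sum_i (1 - y_i) * 2^i)
    theta = math.pi * (1 + sum((1 - y) * 2 ** (i + 1) for i, y in enumerate(ys)))
    ok = ok and all(h(theta, x) == y for x, y in zip(xs, ys))
print(ok)   # True: every labeling of x_1 .. x_n is realized by some theta
```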
