CS761 Spring 2015 Homework 2
Assigned Feb. 11, Due Feb. 18, 2015 before class
Instructions:
• Homeworks are to be done individually.
• Typeset your homework in LaTeX using this file as a template (e.g. use pdflatex). We do not accept hand-written homeworks. Show your derivations.
• Hand in a single-sided printed copy of your homework before class. Homework will no longer be accepted once the lecture starts.
• Unless explicitly specified in the questions, you do not need to hand in any code.
• Fill in your name and email below. This will produce a page separated from the rest of your homework. We will do “double blind review.” Do not include identifying information (e.g. your name) in the rest of your homework.
Name: Rahul Chatterjee
Email: [email protected]
1. Let X = Z be the set of integers. Let H = {hz : z ∈ X} ∪ {h0}, where for each z ∈ X, hz(x) = 1 if x = z and hz(x) = 0 if x ≠ z. h0 is the hypothesis that classifies everything as 0. We make the realizable assumption, namely that the true hypothesis f classifies everything as 0, except perhaps one item. For this problem, do not use a VC argument.
(a) Define an algorithm that implements ERM.
(b) Prove that H is PAC learnable. Give an upper bound on the sample complexity.
Ans. (a) The ERM is simple. Given a sample S of labeled examples drawn i.i.d. (with replacement) from the distribution over X, if S contains a point z labeled f(z) = 1, output hz; otherwise output h0. In either case the training error is zero: under the realizable assumption at most one point is labeled 1, so no sample point contradicts the output hypothesis. Hence this implements ERM.
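The rule above can be sketched in a few lines of Python (an illustration only; the function names are mine, and the sample is represented as a list of (x, f(x)) pairs):

```python
# Sketch of the ERM from part (a): scan the sample for a positive example.

def erm(sample):
    """sample: list of (x, y) pairs with y in {0, 1}, drawn i.i.d. from D."""
    for x, y in sample:
        if y == 1:
            # A positive example was seen: output h_z with z = x.
            return lambda t, z=x: 1 if t == z else 0
    # No positive example seen: output h_0, the all-zero hypothesis.
    return lambda t: 0

h = erm([(3, 0), (7, 1), (-2, 0)])
print(h(7), h(3), h(100))  # 1 0 0
```

By construction the returned hypothesis has zero training error, as claimed.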
(b) H is PAC learnable if for every ε, δ ∈ (0, 1) there is a sample size n(ε, δ) such that, for every distribution D over X, given n ≥ n(ε, δ) i.i.d. examples the learner outputs, with probability at least 1 − δ over the sample, a hypothesis h with generalization error LD(h) ≤ ε. Let z∗ be the (at most one) point where f is 1; f is zero everywhere else. The ERM of part (a) can never produce a false positive, since it outputs hz only after seeing z labeled 1. So the only bad event is that z∗ has probability p = D({z∗}) > ε but does not appear in the sample S, in which case the ERM outputs h0 and LD(h0) = p.
Now we analyze the probability that a sample of size n misses z∗. If p ≤ ε the output is ε-accurate no matter what. If p > ε, the probability that none of the n i.i.d. draws equals z∗ is (1 − p)^n ≤ (1 − ε)^n ≤ e^{−εn}. Requiring e^{−εn} ≤ δ gives n ≥ (1/ε) ln(1/δ). Hence H is PAC learnable with sample complexity n(ε, δ) = ⌈(1/ε) ln(1/δ)⌉.
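This bound can be sanity-checked by simulation (illustrative only; the hardest distribution puts probability exactly ε on z∗, and all parameter values below are my own choices):

```python
# Monte Carlo check of n(eps, delta) = ceil(ln(1/delta)/eps): put mass
# p = eps on z* and measure how often a sample of size n misses it.
import math
import random

random.seed(0)  # reproducibility
eps, delta = 0.1, 0.05
n = math.ceil(math.log(1 / delta) / eps)  # n = 30 here

trials, fails = 20000, 0
for _ in range(trials):
    # The sample misses z* iff all n draws land off z* (prob. (1-eps)^n).
    if not any(random.random() < eps for _ in range(n)):
        fails += 1  # ERM outputs h0, whose error is eps: a failure

print(n, fails / trials)  # empirical failure rate stays below delta
```

Here (1 − ε)^n = 0.9^30 ≈ 0.042 ≤ δ = 0.05, and the empirical rate agrees.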
2. (a) Find a class H of binary-valued functions over the real interval X = [0, 1] such that H is infinite and VCdim(H) = 1.
(b) Find a class H of binary-valued functions over the real interval X = [0, 1] such that H is finite and VCdim(H) = log2(|H|).
Ans. (a) H = {h≥θ : θ ∈ X}, where h≥θ(x) = 1 if x ≥ θ, 0 otherwise. H is infinite, since distinct thresholds give distinct hypotheses. Any single point x is shattered: taking θ ≤ x labels it 1, and taking θ > x labels it 0. But no pair x1 < x2 can be shattered, because the labeling (1, 0) is unachievable: h≥θ(x1) = 1 forces θ ≤ x1 ≤ x2, hence h≥θ(x2) = 1. Therefore VCdim(H) = 1.
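A quick numerical illustration (not a proof) of why two points cannot be shattered; the two points are arbitrary choices of mine:

```python
# Sweep the threshold theta over [0, 1] and record the labelings that
# h_theta induces on two fixed points x1 < x2; (1, 0) never appears.
x1, x2 = 0.3, 0.8
labelings = set()
for k in range(1001):
    theta = k / 1000
    labelings.add((int(x1 >= theta), int(x2 >= theta)))
print(sorted(labelings))  # [(0, 0), (0, 1), (1, 1)]
```

Only 3 of the 4 possible labelings occur, so the pair is not shattered.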
(b) Fix n points X = {x1, x2, . . . , xn} ⊆ [0, 1] and let H be the set of all functions h : X → {0, 1}, extended to the whole domain [0, 1] by setting h(x) = 0 for every x outside X. Clearly |H| = 2^n. The set X itself is shattered, since every labeling of the n points is realized by some h, so VCdim(H) ≥ n. No set of n + 1 points can be shattered: such a set must contain a point outside X, which every h ∈ H labels 0, so the all-ones labeling is unachievable. Hence VCdim(H) = n = log2(|H|).
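The argument can be checked by brute force for small n (a sketch; the particular points and helper names are mine):

```python
# n = 3 fixed points; H realizes every labeling on them and is 0 elsewhere.
from itertools import product

base = [0.1, 0.5, 0.9]   # the fixed points x1..x3 in [0, 1]
extra = 0.7              # any domain point outside the fixed set

def h_of(labels, x):
    # Hypothesis given by `labels` on the fixed points, 0 elsewhere.
    return labels[base.index(x)] if x in base else 0

H = list(product([0, 1], repeat=len(base)))  # |H| = 2^3 = 8
S = base + [extra]
realized = {tuple(h_of(lab, x) for x in S) for lab in H}

# The 3 base points are shattered (8 labelings), but S of size 4 is not:
# the extra point is always labeled 0, so only 8 < 16 labelings occur.
print(len(realized), 2 ** len(S))  # 8 16
```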
3. Let X be a finite domain. Let 1 ≤ k ≤ |X| be a fixed integer. Find the VC dimension of the following hypothesis spaces of binary classifiers h : X → {0, 1}. Prove your claim.
(a) H = {h : ∑x∈X h(x) = k}, i.e. all hypotheses that classify exactly k items positive.
(b) H = {h : ∑x∈X h(x) ≤ k}, i.e. all hypotheses that classify at most k items positive.
Ans. (a) H is the subset of all binary mappings from X to {0, 1} such that ∑x∈X h(x) = k; write m = |X|. First, VCdim(H) ≤ k: if we take any k + 1 points and label them all 1, no h ∈ H can realize this labeling, since every h has exactly k positives. Symmetrically, VCdim(H) ≤ m − k: if we take m − k + 1 points and label them all 0, no h can realize it, since every h has exactly m − k negatives. Conversely, any set S of d = min(k, m − k) points is shattered: given a labeling of S with j ones (0 ≤ j ≤ d ≤ k), complete it to exactly k ones by placing the remaining k − j ones among the m − d points outside S, which is possible because d ≤ m − k gives m − d ≥ k ≥ k − j. Hence VCdim(H) = min(k, |X| − k). In particular, when |X| > 2k this is simply k.
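The formula min(k, |X| − k) can be sanity-checked by exhaustive search on small domains (an illustration, not a proof; the function name and test cases are mine):

```python
# Brute-force VC dimension of "exactly k positives" over X = {0..m-1}.
from itertools import combinations

def vc_dim_exactly_k(m, k):
    H = [set(c) for c in combinations(range(m), k)]  # k-subsets as hypotheses

    def shattered(S):
        labelings = {tuple(int(x in h) for x in S) for h in H}
        return len(labelings) == 2 ** len(S)

    d = 0
    for size in range(1, m + 1):
        if any(shattered(S) for S in combinations(range(m), size)):
            d = size
        else:
            break  # shattering is monotone, so no larger set works
    return d

for m, k in [(6, 2), (6, 4), (5, 1), (8, 3)]:
    print(m, k, vc_dim_exactly_k(m, k), min(k, m - k))
```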
4. Let
hθ(x) = sgn(sin(θx))
where sgn(z) = 1 if z ≥ 0, and 0 otherwise. Consider
H = {hθ : θ ∈ R}.
Let xi = 2^i for i = 1 . . . n. Prove that, for any y1, . . . , yn ∈ {0, 1}, there exists θ ∈ R such that hθ(xi) = yi for i = 1 . . . n.
You have just shown that VCdim(H) = ∞, even though H is parametrized by a single parameter θ.
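Though not a proof, the claim can be verified numerically for small n; a sketch for n = 3 with the points xi = 2^i (so 2, 4, 8), sweeping θ over a fine grid:

```python
import math

def sgn(z):
    # sgn as defined in the problem: 1 if z >= 0, else 0
    return 1 if z >= 0 else 0

xs = [2 ** i for i in range(1, 4)]  # x_i = 2^i for i = 1, 2, 3

patterns = set()
steps = 100000
for k in range(1, 2 * steps):
    theta = k * math.pi / steps  # theta sweeps (0, 2*pi)
    patterns.add(tuple(sgn(math.sin(theta * x)) for x in xs))

print(len(patterns))  # 8: every labeling in {0,1}^3 is realized
```

The grid is fine enough because the sign pattern only changes at multiples of π/8 in θ, so all 2^3 labelings are found.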