[Fall 2012] CS-402 DM - Midterm Exam
BSCS Honors Program CS-402 GIFT University Gujranwala
Course: Data Mining 14-December-2012 (Fall 2012)
Resource Person: Nadeem Qaisar Mehmood MIDTERM EXAMINATION
Max. Marks: 50 Time Allowed: 70 Minutes
Question 01: Short Questions (Marks 5x5)
1. In the context of rule-based classification, discuss the direct and indirect methods of building rules. Which method gives you robust and exhaustive rules, and why?
2. Discuss the hypothesis space of ID3 algorithm based Decision Tree learning.
3. Which proximity measure would you prefer to use if you are finding distances between class instances, but the record data matrix to be mined for clusters is very sparse?
4. Find the following proximity measures for the two binary vectors p = (0 1 1 0 0 0 0 0 0 1) and q = (0 1 0 0 0 0 1 0 0 1): Jaccard, Cosine, Hamming.
5. In rule-based classification, what are the three different methods to evaluate a rule?
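The three measures asked for in Question 1(4) can be checked numerically. A minimal sketch (the variable names are mine, not part of the exam):

```python
import math

# Binary vectors from Question 1(4)
p = [0, 1, 1, 0, 0, 0, 0, 0, 0, 1]
q = [0, 1, 0, 0, 0, 0, 1, 0, 0, 1]

# Jaccard coefficient: 1-1 matches over positions where at least one vector is 1
m11 = sum(1 for a, b in zip(p, q) if a == 1 and b == 1)
mismatches = sum(1 for a, b in zip(p, q) if a != b)
jaccard = m11 / (m11 + mismatches)

# Cosine similarity: dot product over the product of Euclidean norms
dot = sum(a * b for a, b in zip(p, q))
cosine = dot / (math.sqrt(sum(a * a for a in p)) * math.sqrt(sum(b * b for b in q)))

# Hamming distance: number of positions where the vectors differ
hamming = sum(1 for a, b in zip(p, q) if a != b)

print(jaccard, cosine, hamming)  # 0.5, ~0.667, 2
```

With these vectors, m11 = 2 and there are 2 mismatched positions, so Jaccard = 2/4 = 0.5, cosine = 2/3, and the Hamming distance is 2.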
Question 02: (Marks 15)
AutoRash Traders has a broad automobile business with millions of customers. To market their new product in Texas City, they plan to target only a limited number of likely customers and advertise to them directly. To achieve this goal, they collected their previous advertising data about different customers, shown in the dataset below. Suggest a predictive modeling data mining technique that AutoRash Traders could use to predict whether a new customer will purchase a car or not.
Serial No | Marital Status | Home Owner | Annual Income | Bought Car
----------|----------------|------------|---------------|-----------
1         | Single         | Y          | Normal        | Yes
2         | Married        | Y          | Normal        | Yes
3         | Divorced       | N          | Low           | No
4         | Divorced       | Y          | High          | Yes
5         | Single         | N          | Low           | No
6         | Single         | N          | Normal        | No
Some Entropy Measurements:
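The entropy figures for a dataset like the one above can be reproduced directly. A sketch (the function and variable names are mine; the choice of Home Owner as the candidate split is my own illustration) computing the class entropy of the Bought Car column and the information gain of one split:

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy of a list of class labels, in bits."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

# (Home Owner, Bought Car) pairs from the 6-row dataset
rows = [("Y", "Yes"), ("Y", "Yes"), ("N", "No"),
        ("Y", "Yes"), ("N", "No"), ("N", "No")]
labels = [bought for _, bought in rows]

# Class entropy: 3 Yes / 3 No -> 1 bit
e_parent = entropy(labels)

# Information gain of splitting on Home Owner:
# GAIN(S) = Entropy(S) - sum over partitions i of (n_i / n) * Entropy(i)
gain = e_parent
for value in ("Y", "N"):
    subset = [bought for owner, bought in rows if owner == value]
    gain -= (len(subset) / len(rows)) * entropy(subset)

print(e_parent, gain)  # 1.0, 1.0 (Home Owner separates the classes perfectly)
```

Here the Y partition is all Yes and the N partition is all No, so both child entropies are 0 and the gain equals the full parent entropy of 1 bit.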
Question 03: (Marks 15)
Consider a training set that contains 100 positive examples and 400 negative examples. For each of the following candidate rules:
R1: A → + (covers 4 positive and 1 negative examples),
R2: B → + (covers 30 positive and 10 negative examples),
R3: C → + (covers 100 positive and 90 negative examples),
the rule accuracy measures are 80% (for R1), 75% (for R2), and 52.6% (for R3), respectively. Therefore R1 is the best candidate and R3 is the worst candidate according to rule accuracy. However, this is not always the obvious conclusion. Why? Explain the situation with the help of FOIL's information gain.
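FOIL's information gain for the three rules can be computed from the formula p1 * (log2(p1 / (p1 + n1)) - log2(p0 / (p0 + n0))), where p0, n0 are the positive and negative counts in the full training set and p1, n1 are the counts covered by the rule. A sketch (the helper name is mine):

```python
import math

def foil_gain(p0, n0, p1, n1):
    """FOIL's information gain: p1 * (log2(p1/(p1+n1)) - log2(p0/(p0+n0)))."""
    return p1 * (math.log2(p1 / (p1 + n1)) - math.log2(p0 / (p0 + n0)))

# Training set from Question 3: 100 positive, 400 negative examples
p0, n0 = 100, 400

rules = {
    "R1": (4, 1),     # accuracy 4/5    = 80.0%
    "R2": (30, 10),   # accuracy 30/40  = 75.0%
    "R3": (100, 90),  # accuracy 100/190 = 52.6%
}

for name, (p1, n1) in rules.items():
    print(name, round(foil_gain(p0, n0, p1, n1), 1))
# R1   8.0  -> best by accuracy, worst by FOIL's gain
# R2  57.2
# R3 139.6  -> worst by accuracy, best by FOIL's gain
```

Because FOIL's gain multiplies the precision improvement by the rule's positive coverage, R3 (which covers all 100 positives) scores highest even though its accuracy is lowest, which is exactly the tension the question asks you to explain.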
END OF EXAM
Formula sheet (provided with the exam):

FOIL's Information Gain = p1 * ( log2( p1 / (p1 + n1) ) - log2( p0 / (p0 + n0) ) )

GAIN(S) = Entropy(S) - SUM over i = 1..k of (n_i / n) * Entropy(i)

Some entropy values: Entropy[2+, 2-] = 1, Entropy[2+, 0-] = 0, Entropy[2+, 3-] = 0.971, Entropy[3+, 2-] = 0.971, Entropy[3+, 1-] = 0.811, Entropy[5+, 1-] = 0.650

Gini(t) = 1 - SUM over i = 0..c-1 of [ p(i | t) ]^2

Entropy = E(S) = - P(+) log2 P(+) - P(-) log2 P(-)