[Fall 2012] CS-402 DM - Midterm Exam


    BSCS Honors Program CS-402 GIFT University Gujranwala


    Course: Data Mining (Fall 2012)    Date: 14-December-2012

    Resource Person: Nadeem Qaisar Mehmood

    MIDTERM EXAMINATION

    Max. Marks: 50    Time Allowed: 70 Minutes

    Question 01: Short Questions (Marks 5x5)

    1. In the context of rule-based classification, discuss the direct and indirect methods of building rules. Which method gives robust and exhaustive rules, and why?

    2. Discuss the hypothesis space of decision tree learning based on the ID3 algorithm.

    3. Which proximity measure would you prefer to use if you are finding distances between class instances but the record data matrix used to mine the clusters is very sparse?

    4. Find the following proximity measures for the two binary vectors p = (0 1 1 0 0 0 0 0 0 1) and q = (0 1 0 0 0 0 1 0 0 1): Jaccard, Cosine, Hamming. (A computation sketch follows this question list.)

    5. In rule-based classification, what are the three different methods to evaluate a rule?
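
    A minimal computation sketch for short question 4 (not part of the original paper; the variable names are illustrative):

        # Sketch: Jaccard, cosine and Hamming measures for the two binary vectors.
        import math

        p = [0, 1, 1, 0, 0, 0, 0, 0, 0, 1]
        q = [0, 1, 0, 0, 0, 0, 1, 0, 0, 1]

        m11 = sum(1 for a, b in zip(p, q) if a == 1 and b == 1)  # 1-1 matches
        m10 = sum(1 for a, b in zip(p, q) if a == 1 and b == 0)  # 1 in p only
        m01 = sum(1 for a, b in zip(p, q) if a == 0 and b == 1)  # 1 in q only

        jaccard = m11 / (m11 + m10 + m01)                 # 0-0 matches are ignored
        cosine  = m11 / math.sqrt(sum(p) * sum(q))        # dot product over vector norms
        hamming = sum(1 for a, b in zip(p, q) if a != b)  # number of differing positions

        print(jaccard, cosine, hamming)  # 0.5, ~0.667, 2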

    Question 02: (Marks 15)

    AutoRash Traders has a broad automobile business with millions of customers. To market their new product in Texas City they plan to target only a limited number of likely customers and advertise to them directly. To achieve this goal they have gathered their previous advertising data about different customers, shown in the dataset below. What data mining technique would you suggest to AutoRash Traders, i.e. what predictive modeling technique could predict whether or not a new customer will purchase a car?

    Serial No | Marital Status | Home Owner | Annual Income | Bought Car
    ----------|----------------|------------|---------------|-----------
    1         | Single         | Y          | Normal        | Yes
    2         | Married        | Y          | Normal        | Yes
    3         | Divorced       | N          | Low           | No
    4         | Divorced       | Y          | High          | Yes
    5         | Single         | N          | Low           | No
    6         | Single         | N          | Normal        | No

    Some Entropy Measurements:

    [2+, 2-] = 1    [2+, 0-] = 0    [2+, 3-] = 0.971    [3+, 2-] = 0.971
    [3+, 1-] = 0.8115    [3+, 2-] = 0.971    [5+, 1-] = 0.650
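
    The sketch below is not part of the exam; assuming a decision-tree style answer, it shows how entropy values such as those above give the information gain of a split (here, on Home Owner):

        # Sketch: entropy and information gain for one split of the Question 02 dataset.
        import math

        def entropy(pos, neg):
            """Entropy of a two-class node with pos/neg examples."""
            total = pos + neg
            result = 0.0
            for count in (pos, neg):
                if count:
                    frac = count / total
                    result -= frac * math.log2(frac)
            return result

        # Bought Car labels for rows 1-6: Yes, Yes, No, Yes, No, No  ->  [3+, 3-]
        parent = entropy(3, 3)  # = 1.0

        # Home Owner = Y covers rows 1, 2, 4 -> [3+, 0-]; Home Owner = N covers rows 3, 5, 6 -> [0+, 3-]
        children = (3 / 6) * entropy(3, 0) + (3 / 6) * entropy(0, 3)  # = 0.0
        print(parent - children)  # information gain of the split = 1.0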

    Question 03: (Marks 15)

    Consider a training set that contains 100 positive examples and 400 negative examples. For each of the following candidate rules:

    R1: A → + (covers 4 positive and 1 negative example),

    R2: B → + (covers 30 positive and 10 negative examples),

    R3: C → + (covers 100 positive and 90 negative examples),

    The rule accuracies are 80% (for R1), 75% (for R2), and 52.6% (for R3), respectively. Therefore R1 is the best candidate and R3 is the worst candidate according to rule accuracy. However, this is not always the obvious conclusion. Why? Explain the situation with the help of FOIL's information gain.
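
    The following sketch is not from the exam; assuming the starting point is the empty rule that covers all 100 positive and 400 negative examples, it evaluates FOIL's information gain for the three candidate rules:

        # Sketch: FOIL's information gain for R1-R3.
        import math

        def foil_gain(p0, n0, p1, n1):
            """Gain when a rule covering p0 positives / n0 negatives is refined
            into one covering p1 positives / n1 negatives."""
            return p1 * (math.log2(p1 / (p1 + n1)) - math.log2(p0 / (p0 + n0)))

        p0, n0 = 100, 400  # assumed coverage of the initial (empty) rule
        for name, (p1, n1) in {"R1": (4, 1), "R2": (30, 10), "R3": (100, 90)}.items():
            print(name, round(foil_gain(p0, n0, p1, n1), 1))
        # R1 -> 8.0, R2 -> 57.2, R3 -> 139.6: the ranking differs from rule accuracy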

    END OF EXAM

    Formulas provided with the exam:

    FOIL's Information Gain $= p_1 \left( \log_2 \frac{p_1}{p_1 + n_1} - \log_2 \frac{p_0}{p_0 + n_0} \right)$

    $\mathrm{GAIN} = \mathrm{Entropy}(S) - \sum_{i=1}^{k} \frac{n_i}{n} \, \mathrm{Entropy}(i)$

    $\mathrm{Gini}(t) = 1 - \sum_{i=0}^{c-1} \left[ p(i \mid t) \right]^2$

    $\mathrm{Entropy}\ E(S) = -P(+)\log_2 P(+) - P(-)\log_2 P(-)$
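
    A short illustration (not from the original paper) of the Gini index defined above, evaluated on a maximally mixed node and on a pure node:

        # Sketch: Gini(t) = 1 - sum_i p(i|t)^2 for the class counts at a node.
        def gini(counts):
            total = sum(counts)
            return 1 - sum((c / total) ** 2 for c in counts)

        print(gini([3, 3]))  # 0.5 -- maximum impurity for two classes
        print(gini([6, 0]))  # 0.0 -- pure node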