[cs-402] assignment-2 proximity and impurity measurement_v00

Upload: taaloos

Post on 14-Apr-2018


  • 7/30/2019 [CS-402] Assignment-2 Proximity and Impurity Measurement_v00

    1/3

    BSCS Honors Program, CS-402, GIFT University Gujranwala

    Page 1 of 3

    Course: Data Mining 03-December-2012 (Fall 2012)

    Resource Person: Nadeem Qaisar Mehmood

    ASSIGNMENT 2 (Proximity & Classification)

    Total Points: 50

    Submission Due: Saturday, 8th December, 2012

    Instructions: Please Read Carefully!

    This is a group assignment. A group may have at most 3 members.

    Each member must individually pass the viva to receive any marks for this assignment. The viva will be conducted after the assignment has been submitted.

    Please do not copy the assignment; all copied submissions will be awarded a straightforward ZERO. However, you are allowed to share ideas and help each other in discussion.

    You are expected to submit this assignment as:

    a. A single .zip file containing all the source files of your implementation. This zip file must be named CS402-AS02-(ROLLNUMBER1)(ROLLNUMBER2).zip and nothing else!

    The assignment is to be submitted electronically via email to [email protected] before the deadline.

    a. The subject of the email should be: CS402-AS02-(ROLLNUMBER1)(ROLLNUMBER2).

    b. Attach the zip file to the email.

    c. Leave the body of the email empty.

    d. Send a copy of your email to your other group member(s).

    Send this email to the above address on or before the due date and time.

    There will be a 25% penalty for late submissions.

    No assignment will be accepted after Sunday, 9th December 2012. You must strictly follow the above deadlines.

    NOTE: You must pass the subsequent viva of this assignment to receive any marks for this assignment.


    1. For each of the following pairs of vectors x and y, write a program that accepts the x and y vectors and calculates the indicated similarity or distance measures. [25]

    1) x = (1,1,1,1), y = (2,2,2,2): cosine, correlation, Hamming, Euclidean

    2) x = (0,1,0,1), y = (1,0,1,0): cosine, correlation, Hamming, Euclidean, Jaccard, SMC

    3) x = (1,1,0,1,0,1), y = (1,1,1,0,0,1): cosine, correlation, Jaccard, SMC
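    For reference, these are the standard definitions behind the measures listed above, where x̄ and ȳ are the component means and f₁₁, f₁₀, f₀₁, f₀₀ count the positions where (x_k, y_k) equals (1,1), (1,0), (0,1) and (0,0). The worked values for sub-part 3 are a hand check that assumes x = (1,1,0,1,0,1), i.e. six components to match y.

```latex
% Definitions
\begin{aligned}
\cos(\mathbf{x},\mathbf{y}) &= \frac{\mathbf{x}\cdot\mathbf{y}}{\lVert\mathbf{x}\rVert\,\lVert\mathbf{y}\rVert}, &
\operatorname{corr}(\mathbf{x},\mathbf{y}) &= \frac{\sum_k (x_k-\bar{x})(y_k-\bar{y})}{\sqrt{\sum_k (x_k-\bar{x})^2}\,\sqrt{\sum_k (y_k-\bar{y})^2}}, \\
d(\mathbf{x},\mathbf{y}) &= \sqrt{\textstyle\sum_k (x_k-y_k)^2}, &
\mathrm{Hamming}(\mathbf{x},\mathbf{y}) &= \bigl|\{k : x_k \neq y_k\}\bigr|, \\
J &= \frac{f_{11}}{f_{01}+f_{10}+f_{11}}, &
\mathrm{SMC} &= \frac{f_{11}+f_{00}}{f_{00}+f_{01}+f_{10}+f_{11}}.
\end{aligned}

% Sub-part 3: x = (1,1,0,1,0,1), y = (1,1,1,0,0,1)
\begin{aligned}
&\mathbf{x}\cdot\mathbf{y} = 3,\quad \lVert\mathbf{x}\rVert = \lVert\mathbf{y}\rVert = 2
  &&\Rightarrow\ \cos(\mathbf{x},\mathbf{y}) = \tfrac{3}{4} \\
&\bar{x} = \bar{y} = \tfrac{2}{3},\quad \textstyle\sum_k (x_k-\bar{x})(y_k-\bar{y}) = \tfrac{1}{3},\quad
  \textstyle\sum_k (x_k-\bar{x})^2 = \textstyle\sum_k (y_k-\bar{y})^2 = \tfrac{4}{3}
  &&\Rightarrow\ \operatorname{corr}(\mathbf{x},\mathbf{y}) = \tfrac{1}{4} \\
&f_{11} = 3,\ f_{10} = 1,\ f_{01} = 1,\ f_{00} = 1
  &&\Rightarrow\ J = \tfrac{3}{5},\quad \mathrm{SMC} = \tfrac{4}{6} = \tfrac{2}{3}
\end{aligned}
```

    In sub-part 1 both vectors are constant, so both sums of squared deviations are zero and the correlation is undefined there.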

    Note: Write only one function for each requested measure; that same function should work for the vector pairs in all three sub-parts above.
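    A minimal C++ sketch of the Question 1 functions, one per measure as the note asks. The names (`cosine`, `correlation`, `euclidean`, `hamming`, `jaccard`, `smc`, and the helpers `dot`, `BinaryCounts`, `countMatches`) are illustrative choices, not part of the assignment. It assumes both vectors have the same length and, for Jaccard and SMC, treats any non-zero component as 1. Correlation returns NaN when either vector has zero variance, which is the situation in sub-part 1.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

using Vec = std::vector<double>;

// Dot product x . y of two equal-length vectors.
double dot(const Vec& x, const Vec& y) {
    assert(x.size() == y.size());
    double s = 0.0;
    for (std::size_t k = 0; k < x.size(); ++k) s += x[k] * y[k];
    return s;
}

// Cosine similarity: (x . y) / (||x|| * ||y||); NaN if either vector is all zeros.
double cosine(const Vec& x, const Vec& y) {
    return dot(x, y) / (std::sqrt(dot(x, x)) * std::sqrt(dot(y, y)));
}

// Pearson correlation; NaN when either vector has zero variance.
double correlation(const Vec& x, const Vec& y) {
    assert(x.size() == y.size() && !x.empty());
    const double n = static_cast<double>(x.size());
    double mx = 0.0, my = 0.0;
    for (std::size_t k = 0; k < x.size(); ++k) { mx += x[k]; my += y[k]; }
    mx /= n;
    my /= n;
    double sxy = 0.0, sxx = 0.0, syy = 0.0;
    for (std::size_t k = 0; k < x.size(); ++k) {
        const double dx = x[k] - mx, dy = y[k] - my;
        sxy += dx * dy;
        sxx += dx * dx;
        syy += dy * dy;
    }
    if (sxx == 0.0 || syy == 0.0) return std::numeric_limits<double>::quiet_NaN();
    return sxy / std::sqrt(sxx * syy);
}

// Euclidean distance: sqrt(sum_k (x_k - y_k)^2).
double euclidean(const Vec& x, const Vec& y) {
    assert(x.size() == y.size());
    double s = 0.0;
    for (std::size_t k = 0; k < x.size(); ++k) {
        const double d = x[k] - y[k];
        s += d * d;
    }
    return std::sqrt(s);
}

// Hamming distance: number of positions where x and y differ.
int hamming(const Vec& x, const Vec& y) {
    assert(x.size() == y.size());
    int count = 0;
    for (std::size_t k = 0; k < x.size(); ++k)
        if (x[k] != y[k]) ++count;
    return count;
}

// Match counts for binary vectors (any non-zero component is treated as 1).
struct BinaryCounts { int f11 = 0, f10 = 0, f01 = 0, f00 = 0; };

BinaryCounts countMatches(const Vec& x, const Vec& y) {
    assert(x.size() == y.size());
    BinaryCounts c;
    for (std::size_t k = 0; k < x.size(); ++k) {
        const bool a = x[k] != 0.0, b = y[k] != 0.0;
        if (a && b) ++c.f11;
        else if (a) ++c.f10;
        else if (b) ++c.f01;
        else ++c.f00;
    }
    return c;
}

// Jaccard coefficient: f11 / (f01 + f10 + f11); NaN if both vectors are all zeros.
double jaccard(const Vec& x, const Vec& y) {
    const BinaryCounts c = countMatches(x, y);
    const int denom = c.f01 + c.f10 + c.f11;
    if (denom == 0) return std::numeric_limits<double>::quiet_NaN();
    return static_cast<double>(c.f11) / denom;
}

// Simple matching coefficient: (f11 + f00) / number of attributes.
double smc(const Vec& x, const Vec& y) {
    const BinaryCounts c = countMatches(x, y);
    return static_cast<double>(c.f11 + c.f00) / static_cast<double>(x.size());
}
```

    Called from your own `main`, `cosine({1, 1, 0, 1, 0, 1}, {1, 1, 1, 0, 0, 1})` returns 0.75 and `jaccard` on the same pair returns 0.6. Returning NaN for the undefined cases, rather than 0, keeps "undefined" distinguishable from "no similarity".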

    2. Proximity calculation between the following document vectors [25]

    In a document vector, each attribute (term) is a component of the vector, and the value of each component is the number of times the corresponding term occurs in the document. The term vectors for three documents are given below. Write a program that computes the complete cosine-based proximity matrix for these document vectors.

    Note: For details of cosine-based proximity measurement, refer to Chapter 2 of Tan's book (page 75). You can reuse the cosine function programmed in Question 1.
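    A minimal C++ sketch for Question 2, assuming the three documents' term counts are read row by row from the document-vector table (10 terms each), with Documents 2 and 3 taken as (0,7,0,2,1,0,0,3,0,0) and (0,1,0,0,1,2,2,0,3,0). The names `cosineSim`, `proximityMatrix` and `assignmentDocuments` are hypothetical; `cosineSim` re-declares the cosine function only so this block compiles on its own, whereas the assignment asks you to reuse the Question 1 function.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

using Vec = std::vector<double>;
using Matrix = std::vector<Vec>;

// Cosine similarity of two term-count vectors (same role as the Question 1 function).
double cosineSim(const Vec& x, const Vec& y) {
    assert(x.size() == y.size());
    double xy = 0.0, xx = 0.0, yy = 0.0;
    for (std::size_t k = 0; k < x.size(); ++k) {
        xy += x[k] * y[k];
        xx += x[k] * x[k];
        yy += y[k] * y[k];
    }
    return xy / (std::sqrt(xx) * std::sqrt(yy));
}

// Complete proximity matrix: entry (i, j) holds cos(d_i, d_j). Cosine is
// symmetric, so each pair is computed once and mirrored across the diagonal.
Matrix proximityMatrix(const std::vector<Vec>& docs) {
    const std::size_t n = docs.size();
    Matrix m(n, Vec(n, 0.0));
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t j = i; j < n; ++j)
            m[i][j] = m[j][i] = cosineSim(docs[i], docs[j]);
    return m;
}

// Term counts for Documents 1-3, one row per document, in the table's column order.
std::vector<Vec> assignmentDocuments() {
    return {
        {3, 0, 5, 0, 2, 6, 0, 2, 0, 2},
        {0, 7, 0, 2, 1, 0, 0, 3, 0, 0},
        {0, 1, 0, 0, 1, 2, 2, 0, 3, 0},
    };
}
```

    Printing `proximityMatrix(assignmentDocuments())` from your own `main` should show (up to rounding) 1 on the diagonal and, off the diagonal, cos(d1, d2) = 8/√(82·63) ≈ 0.111, cos(d1, d3) = 14/√(82·19) ≈ 0.355 and cos(d2, d3) = 8/√(63·19) ≈ 0.231.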

    Document vectors for Question 2 (term counts per document):

                season  timeout  lost  win  game  score  ball  play  coach  team
    Document 1  3       0        5     0    2     6      0     2     0      2
    Document 2  0       7        0     2    1     0      0     3     0      0
    Document 3  0       1        0     0    1     2      2     0     3      0

    3. Program for Impurity Measurement Calculation (50 Marks)

    The following data describes eye patients who were prescribed contact lenses based on their age, astigmatism, and spectacle prescription. Age is one of young, pre-presbyopic, or presbyopic; the spectacle prescription is either myope or hypermetrope; and a patient either has astigmatism or does not. Use this data as a binary classification problem (soft vs. hard contact lenses) to find the following measures:

    #  AGE             ASTIGMA  SPECTACLE     CONTACT LENSES
    1  Young           No       Myope         Soft
    2  Young           Yes      Myope         Hard
    3  Young           No       Hypermetrope  Soft
    4  Young           Yes      Hypermetrope  Hard
    5  Pre-presbyopic  No       Myope         Soft
    6  Pre-presbyopic  Yes      Myope         Hard
    7  Pre-presbyopic  No       Hypermetrope  Soft
    8  Presbyopic      Yes      Myope         Hard
    9  Presbyopic      No       Hypermetrope  Soft

    I. Compute the GINI and Entropy for the overall collection of the training examples.

    II. Compute the GINI index for the Age attribute.

    III. Compute the GINI index for the Astigma attribute.

    IV. Compute the GINI index for the Spectacle attribute.

    V. Which attribute is the best: Age, Astigma, or Spectacle?
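    A hand calculation of parts I to V that a program's output can be checked against. It assumes the standard definitions GINI(t) = 1 − Σᵢ p(i|t)² and Entropy(t) = −Σᵢ p(i|t) log₂ p(i|t), and scores each attribute by the weighted GINI of its multiway split, Σₖ (nₖ/n)·GINI(k). Class counts are read from the table: 5 Soft and 4 Hard overall.

```latex
\begin{aligned}
\text{I.}\quad & \mathrm{GINI} = 1 - \left(\tfrac{5}{9}\right)^2 - \left(\tfrac{4}{9}\right)^2 = \tfrac{40}{81} \approx 0.494, \qquad
  \mathrm{Entropy} = -\tfrac{5}{9}\log_2\tfrac{5}{9} - \tfrac{4}{9}\log_2\tfrac{4}{9} \approx 0.991 \\[4pt]
\text{II.}\quad & \mathrm{GINI}_{\text{Age}} = \tfrac{4}{9}\cdot\tfrac{1}{2} + \tfrac{3}{9}\cdot\tfrac{4}{9} + \tfrac{2}{9}\cdot\tfrac{1}{2} = \tfrac{13}{27} \approx 0.481 \\
& \quad (\text{Young: 2 Soft, 2 Hard;\ Pre-presbyopic: 2 Soft, 1 Hard;\ Presbyopic: 1 Soft, 1 Hard}) \\[4pt]
\text{III.}\quad & \mathrm{GINI}_{\text{Astigma}} = \tfrac{5}{9}\cdot 0 + \tfrac{4}{9}\cdot 0 = 0 \\
& \quad (\text{No: 5 Soft, 0 Hard;\ Yes: 0 Soft, 4 Hard}) \\[4pt]
\text{IV.}\quad & \mathrm{GINI}_{\text{Spectacle}} = \tfrac{5}{9}\cdot\tfrac{12}{25} + \tfrac{4}{9}\cdot\tfrac{3}{8} = \tfrac{13}{30} \approx 0.433 \\
& \quad (\text{Myope: 2 Soft, 3 Hard;\ Hypermetrope: 3 Soft, 1 Hard}) \\[4pt]
\text{V.}\quad & 0 < \tfrac{13}{30} < \tfrac{13}{27} \;\Rightarrow\; \text{Astigma gives the purest split.}
\end{aligned}
```

    Astigma separates the classes perfectly in this data (every non-astigmatic patient has soft lenses and every astigmatic one hard lenses), which is why its weighted GINI is zero.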

    Note: Optionally, you may extend the program to evaluate the attributes with entropy as well.

    Note: You may use either C++ or Java for this assignment; no other programming language is allowed.
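    A minimal C++ sketch for Question 3, with entropy included for part I. The names `Patient`, `lensData`, `classCounts`, `gini`, `entropy`, `splitGini` and `bestAttribute` are illustrative, not prescribed by the assignment. Each attribute is scored by the weighted GINI of a multiway split (one branch per attribute value), which is one common choice; grouping Age's three values into a binary split would give different numbers. The data is hard-coded from the table, with row 9's "hypermetrope" capitalised so it lands in the same partition as the other Hypermetrope rows.

```cpp
#include <cassert>
#include <cmath>
#include <limits>
#include <map>
#include <string>
#include <utility>
#include <vector>

// One training example: three attributes plus the class label (contact lenses).
struct Patient {
    std::string age, astigma, spectacle, lenses;
};

// The nine examples from the table; row 9's "hypermetrope" is capitalised
// to match the other rows so that it falls into the same partition.
std::vector<Patient> lensData() {
    return {
        {"Young", "No", "Myope", "Soft"},
        {"Young", "Yes", "Myope", "Hard"},
        {"Young", "No", "Hypermetrope", "Soft"},
        {"Young", "Yes", "Hypermetrope", "Hard"},
        {"Pre-presbyopic", "No", "Myope", "Soft"},
        {"Pre-presbyopic", "Yes", "Myope", "Hard"},
        {"Pre-presbyopic", "No", "Hypermetrope", "Soft"},
        {"Presbyopic", "Yes", "Myope", "Hard"},
        {"Presbyopic", "No", "Hypermetrope", "Soft"},
    };
}

// Number of examples per class label.
std::map<std::string, int> classCounts(const std::vector<Patient>& rows) {
    std::map<std::string, int> counts;
    for (const Patient& r : rows) ++counts[r.lenses];
    return counts;
}

// GINI(t) = 1 - sum_i p(i|t)^2 (0 for an empty node).
double gini(const std::vector<Patient>& rows) {
    double g = rows.empty() ? 0.0 : 1.0;
    for (const auto& kv : classCounts(rows)) {
        const double p = static_cast<double>(kv.second) / rows.size();
        g -= p * p;
    }
    return g;
}

// Entropy(t) = -sum_i p(i|t) log2 p(i|t); only classes present are counted, so p > 0.
double entropy(const std::vector<Patient>& rows) {
    double h = 0.0;
    for (const auto& kv : classCounts(rows)) {
        const double p = static_cast<double>(kv.second) / rows.size();
        h -= p * std::log2(p);
    }
    return h;
}

// Weighted GINI of a multiway split on one attribute: sum_k (n_k / n) * GINI(k).
double splitGini(const std::vector<Patient>& rows, std::string Patient::*attr) {
    assert(!rows.empty());
    std::map<std::string, std::vector<Patient>> partitions;
    for (const Patient& r : rows) partitions[r.*attr].push_back(r);
    double total = 0.0;
    for (const auto& kv : partitions)
        total += static_cast<double>(kv.second.size()) / rows.size() * gini(kv.second);
    return total;
}

// The best attribute is the one whose split has the lowest weighted GINI.
std::string bestAttribute(const std::vector<Patient>& rows) {
    const std::vector<std::pair<std::string, std::string Patient::*>> attrs = {
        {"Age", &Patient::age},
        {"Astigma", &Patient::astigma},
        {"Spectacle", &Patient::spectacle},
    };
    std::string best;
    double bestGini = std::numeric_limits<double>::infinity();
    for (const auto& a : attrs) {
        const double g = splitGini(rows, a.second);
        if (g < bestGini) {
            bestGini = g;
            best = a.first;
        }
    }
    return best;
}
```

    From your own `main`, `bestAttribute(lensData())` returns "Astigma", and `splitGini(lensData(), &Patient::spectacle)` returns 13/30 (about 0.433). Replacing the call to `gini` inside `splitGini` with `entropy` would give the optional entropy-based comparison.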

    END OF ASSIGNMENT