Machine Learning in Practice

Midterm Review

Carolyn Penstein Rosé

Kishore Prahallad

Language Technologies Institute

Error Analysis

Error Analysis from Assgn4

=== Confusion Matrix ===

    a    b    c    d    e    f    g    h   <-- classified as
 1443  105   98  127  141   27  396   73 | a = irf03
  934  190  125  264   86   58  604  163 | b = irf04
  985   95  350  219  108   69  863  134 | c = irf06
  841  152  177  774  127   80  524  161 | d = irf07
  269   98   25   78  111  180 1208  294 | e = irm02
  369   94   27   61   95  438 1062  235 | f = irm05
  241   70   43   38  123  216 1457   88 | g = irm06
  470   66   38   55   73  211 1188  422 | h = irm07


Diagonal elements are non-zero.

Non-diagonal elements should be zero.

Try to find an explanation for the large error cells in the confusion matrix.
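One way to start is to list the largest off-diagonal cells. The sketch below is plain Python over the Assgn4 matrix shown earlier; `largest_errors` is a hypothetical helper written for this review, not part of Weka.

```python
# Confusion matrix from Assgn4: rows are actual classes, columns are predicted.
labels = ["irf03", "irf04", "irf06", "irf07", "irm02", "irm05", "irm06", "irm07"]
matrix = [
    [1443, 105,  98, 127, 141,  27,  396,  73],
    [ 934, 190, 125, 264,  86,  58,  604, 163],
    [ 985,  95, 350, 219, 108,  69,  863, 134],
    [ 841, 152, 177, 774, 127,  80,  524, 161],
    [ 269,  98,  25,  78, 111, 180, 1208, 294],
    [ 369,  94,  27,  61,  95, 438, 1062, 235],
    [ 241,  70,  43,  38, 123, 216, 1457,  88],
    [ 470,  66,  38,  55,  73, 211, 1188, 422],
]

def largest_errors(matrix, labels, top=5):
    """Return the `top` largest off-diagonal cells as (count, actual, predicted)."""
    cells = [
        (count, labels[i], labels[j])
        for i, row in enumerate(matrix)
        for j, count in enumerate(row)
        if i != j  # skip the diagonal, which holds the correct predictions
    ]
    return sorted(cells, reverse=True)[:top]

for count, actual, predicted in largest_errors(matrix, labels):
    print(f"{count:5d}  actual={actual}  predicted={predicted}")
```

On this matrix the five largest error cells all fall in the irm06 and irf03 columns, so a reasonable first question is why those two classes attract so many instances from the others.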

From Assgn6

=== Stratified cross-validation ===
=== Summary ===

Correctly Classified Instances      77    51.3333 %
Incorrectly Classified Instances    73    48.6667 %
Kappa statistic                      0.0235

=== Confusion Matrix ===

  a  b   <-- classified as
 33 40 | a = negative
 33 44 | b = positive
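The accuracy and kappa in the summary can be recomputed from the confusion matrix alone. The following is a minimal sketch, assuming the reported statistic is Cohen's kappa (observed agreement corrected for the agreement expected by chance); `accuracy_and_kappa` is a hypothetical helper name.

```python
def accuracy_and_kappa(matrix):
    """Compute accuracy and Cohen's kappa from a square confusion matrix.

    Rows are actual classes, columns are predicted classes.
    """
    total = sum(sum(row) for row in matrix)
    # Observed agreement: fraction of instances on the diagonal.
    observed = sum(matrix[i][i] for i in range(len(matrix))) / total
    # Chance agreement: product of matching row and column marginals.
    row_totals = [sum(row) for row in matrix]
    col_totals = [sum(col) for col in zip(*matrix)]
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / total ** 2
    kappa = (observed - expected) / (1 - expected)
    return observed, kappa

acc, kappa = accuracy_and_kappa([[33, 40], [33, 44]])
print(f"accuracy={acc:.4f} kappa={kappa:.4f}")
```

A kappa this close to zero says that 51% accuracy is almost exactly what the class marginals would produce by chance.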

Ranked Attributes

Ranked attributes:
 16.6146   6465 life
 15.3272    996 bad
 14.3417   7565 nothing
 12.3659   2625 created
 12.24    12337 world
 11.7684   7798 others
 10.8115  10654 stupid
  9.6538  11050 terrible
  9.5345   2552 could
  9.0771   3388 dream
  8.86    11285 top
  8.4936   1992 children

Add Bigrams (only) and select Top 5 Attributes

Correctly Classified Instances      87    58 %
Incorrectly Classified Instances    63    42 %
Kappa statistic                      0.1414

=== Confusion Matrix ===

  a  b   <-- classified as
 12 61 | a = negative
  2 75 | b = positive

What are these top 5 attributes?

Ranked attributes:
 7.745   22 entir_movi
 5.456   23 fall_flat
 5.456   42 million_dollar
 4.904   56 support_role
 4.904   59 visual_effect

Add All Features and Run Naïve Bayes

Correctly Classified Instances     107    71.3 %
Incorrectly Classified Instances    43    28.6 %
Kappa statistic                      0.4252

=== Confusion Matrix ===

  a  b   <-- classified as
 49 24 | a = negative
 19 58 | b = positive
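Accuracy alone hides where each of the three Assgn6 runs goes wrong. The sketch below computes per-class recall (the diagonal cell divided by its row total) from the three confusion matrices above; the run names in `runs` and the helper `recall_per_class` are labels chosen for this illustration, not Weka output.

```python
def recall_per_class(matrix):
    """Recall for each class: correct predictions divided by the row total."""
    return [row[i] / sum(row) for i, row in enumerate(matrix)]

# Confusion matrices copied from the three Assgn6 runs (rows: negative, positive).
runs = {
    "initial run": [[33, 40], [33, 44]],
    "bigrams only, top 5 attributes": [[12, 61], [2, 75]],
    "all features, Naive Bayes": [[49, 24], [19, 58]],
}

for name, matrix in runs.items():
    negative, positive = recall_per_class(matrix)
    print(f"{name}: negative recall={negative:.3f}, positive recall={positive:.3f}")
```

The bigram run recovers only 12 of the 73 negative instances: its 58% accuracy comes mostly from labeling nearly everything positive, which is consistent with its low kappa.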

Methods of Analyzing Error

Confusion amongst classes

Check the confusion matrix.

Check to see what is common across the two confused classes.

Find ways to remove these commonalities by feature extraction or selection.

Algorithms

Often errors are also due to the nature of the ML algorithm used.

Experiment with different algorithms.
