Using Error-Correcting Codes For Text Classification
Rayid Ghani (rayid@cs.cmu.edu)
This presentation can be accessed at http://www.cs.cmu.edu/~rayid/talks/
Outline
- Introduction to ECOC
- Intuition & Motivation
- Some Questions
- Experimental Results
- Semi-Theoretical Model
- Types of Codes
- Drawbacks
- Conclusions
Introduction
Decompose a multiclass classification problem into multiple binary problems:
- One-Per-Class approach (moderately expensive)
- All-Pairs (very expensive)
- Distributed Output Code (efficient, but what about performance?)
- Error-Correcting Output Codes (?)
Is it a good idea?
- Larger margin for error, since errors can now be "corrected"
- One-per-class is a code with minimum Hamming distance (HD) = 2
- Distributed codes have low HD
- The individual binary problems can be harder than before
- Useless unless the number of classes > 5
Training ECOC
Given m distinct classes:
- Create an m x n binary matrix M.
- Each class is assigned ONE row of M.
- Each column of the matrix divides the classes into TWO groups.
- Train n base classifiers to learn the n binary problems.
Testing ECOC
To test a new instance:
- Apply each of the n classifiers to the new instance.
- Combine the predictions to obtain a binary string (codeword) for the new point.
- Classify to the class with the nearest codeword (usually Hamming distance is used as the distance measure); a combined sketch of training and testing follows below.
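As a concrete illustration of the two slides above, here is a minimal Python sketch of ECOC training and nearest-codeword decoding. It assumes scikit-learn-style base learners; the ECOC class name is mine, and MultinomialNB stands in for the Naive Bayes text classifier used in the talk (which pairs it with BCH codes rather than an arbitrary matrix).

```python
# Minimal ECOC sketch (names are illustrative; assumes numpy + scikit-learn).
import numpy as np
from sklearn.base import clone
from sklearn.naive_bayes import MultinomialNB

class ECOC:
    def __init__(self, code_matrix, base_learner=None):
        self.M = np.asarray(code_matrix)          # m classes x n bits
        self.base = base_learner or MultinomialNB()
        self.classifiers = []

    def fit(self, X, y):
        # y holds row indices into M; each column of M defines one
        # binary relabeling of the training data.
        y = np.asarray(y)
        for j in range(self.M.shape[1]):
            clf = clone(self.base).fit(X, self.M[y, j])
            self.classifiers.append(clf)
        return self

    def predict(self, X):
        # Stack the n binary predictions into one codeword per instance,
        # then pick the class whose row of M is nearest in Hamming distance.
        bits = np.column_stack([clf.predict(X) for clf in self.classifiers])
        dists = (bits[:, None, :] != self.M[None, :, :]).sum(axis=2)
        return dists.argmin(axis=1)
```

Any m x n 0/1 matrix can be supplied as code_matrix, so the same harness works for the BCH, random, and hybrid codes compared later in the talk.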
ECOC - Picture

     f1  f2  f3  f4  f5
A     0   0   1   1   0
B     1   0   1   0   0
C     0   1   1   1   0
D     0   1   0   0   1
X     1   1   1   1   0

X's codeword 11110 is at Hamming distance 2 from A (00110), 2 from B (10100), 1 from C (01110), and 4 from D (01001), so X is classified to the nearest codeword, C.
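The same decoding step, worked out in code for the toy matrix above (labels and variable names are illustrative):

```python
# Nearest-codeword decoding for the toy example above.
codewords = {"A": "00110", "B": "10100", "C": "01110", "D": "01001"}
x = "11110"  # the five binary predictions f1..f5 on the new point X

hamming = lambda u, v: sum(a != b for a, b in zip(u, v))
distances = {c: hamming(w, x) for c, w in codewords.items()}
print(distances)                          # {'A': 2, 'B': 2, 'C': 1, 'D': 4}
print(min(distances, key=distances.get))  # C
```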
Questions
- How well does it work?
- How long should the code be?
- Do we need a lot of training data?
- What kind of codes can we use?
- Are there intelligent ways of creating the code?
Previous Work
- Combined with boosting: ADABOOST.OC (Schapire, 1997), (Guruswami & Sahai, 1999)
- Local learners (Ricci & Aha, 1997)
- Text classification (Berger, 1999)
Experimental Setup
- Generate the code: BCH codes
- Choose a base learner: the Naive Bayes classifier as used in text classification tasks (McCallum & Nigam, 1998)
- Dataset: the Industry Sector dataset, consisting of company web pages classified into 105 economic sectors
- Preprocessing: standard stoplist, no stemming, skip all MIME headers and HTML tags
- Experimental approach similar to McCallum et al. (1998) for comparison purposes
Results

Industry Sector Data Set

Classifier       Accuracy
Naive Bayes      66.1%
Shrinkage [1]    76%
ME [2]           79%
ME w/ Prior [3]  81.1%
ECOC 63-bit      88.5%

ECOC reduces the error of the Naive Bayes classifier by 66%.

[1] (McCallum et al. 1998)   [2], [3] (Nigam et al. 1999)
The Longer the Better!

Classifier     Naive Bayes   15-bit ECOC   31-bit ECOC   63-bit ECOC
Accuracy (%)   65.3          77.4          83.6          88.1

Table 2: Average classification accuracy on 5 random 50-50 train-test splits of the Industry Sector dataset, with a vocabulary size of 10,000 words selected using Information Gain.
Longer codes mean larger codeword separation.
- The minimum Hamming distance of a code C is the smallest distance between any pair of distinct codewords in C.
- If the minimum Hamming distance is h, then the code can correct up to floor((h - 1) / 2) errors, as the sketch below illustrates.
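A small sketch of these two definitions, with the helper name being mine; the one-per-class code from earlier serves as a check that its minimum Hamming distance of 2 corrects zero errors:

```python
from itertools import combinations

def min_hamming_distance(codewords):
    """Smallest Hamming distance over all pairs of distinct codewords."""
    return min(sum(a != b for a, b in zip(u, v))
               for u, v in combinations(codewords, 2))

# One-per-class code for 4 classes: any two rows differ in exactly 2 bits,
# so h = 2 and (h - 1) // 2 = 0 errors can be corrected, i.e. no margin.
one_per_class = ["1000", "0100", "0010", "0001"]
h = min_hamming_distance(one_per_class)
print(h, (h - 1) // 2)  # prints: 2 0
```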
Size Matters?

[Figure: variation of accuracy (%) with code length and training size per class, for SBC and for 15-bit, 31-bit, and 63-bit ECOC.]
Size does NOT matter!

[Figure: percent decrease in error vs. training size, for 15-bit, 31-bit, and 63-bit codes.]
Semi-Theoretical Model
Model ECOC by a binomial distribution B(n, p):
- n = length of the code
- p = probability of each bit being classified correctly

If the code has minimum Hamming distance H_min, it can correct up to E_max = \lfloor (H_{min} - 1)/2 \rfloor bit errors, so the modeled accuracy is

    \mathrm{Accuracy} = \sum_{i=0}^{E_{max}} \binom{n}{i} (1 - p)^{i} \, p^{\,n-i}
Predicted accuracies from the model for various code lengths and average bit accuracies P_ave:

# of Bits   H_min   E_max   P_ave   Accuracy
15          5       2       .85     .59
15          5       2       .89     .80
15          5       2       .91     .84
31          11      5       .85     .67
31          11      5       .89     .91
31          11      5       .91     .94
63          31      15      .89     .99
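A sketch of the model as read here, with P_ave taken to be the average per-bit accuracy; math.comb requires Python 3.8+. The computed values land within about .01 of the corresponding table rows:

```python
from math import comb

def ecoc_accuracy(n, e_max, p_ave):
    """P(at most e_max of the n bits are wrong), per-bit accuracy p_ave."""
    q = 1.0 - p_ave  # per-bit error probability
    return sum(comb(n, i) * q**i * p_ave**(n - i) for i in range(e_max + 1))

# Reproduce a few rows of the table; e_max = (h_min - 1) // 2.
for n, h_min, p in [(15, 5, .85), (31, 11, .85), (63, 31, .89)]:
    print(n, p, round(ecoc_accuracy(n, (h_min - 1) // 2, p), 2))
# Comes out within about .01 of the table's .59, .67, and .99.
```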
Theoretical vs. Experimental Accuracy (vocabulary size = 10,000)

[Figure: theoretical vs. experimental accuracy (%) for 15-bit, 31-bit, and 63-bit codes.]
Types of Codes
- Data-independent: Algebraic, Random, Hand-Constructed
- Data-dependent: Adaptive
What is a Good Code?
- Row separation
- Column separation (independence of errors for each binary classifier)
- Efficiency (for long codes)
Choosing Codes

             Random                       Algebraic
Row Sep      On average, for long codes   Guaranteed
Col Sep      On average, for long codes   Can be guaranteed
Efficiency   No                           Yes
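A sketch of the "on average, for long codes" behavior: draw a random code matrix and measure its row and column separation (helper names are mine):

```python
import numpy as np
from itertools import combinations

def separation(M):
    """Min/max Hamming distance over pairs of rows and pairs of columns."""
    def pair_dists(vectors):
        return [int((u != v).sum()) for u, v in combinations(vectors, 2)]
    rows, cols = pair_dists(M), pair_dists(M.T)
    return (min(rows), max(rows)), (min(cols), max(cols))

rng = np.random.default_rng(0)
M = rng.integers(0, 2, size=(105, 15))  # random 15-bit code for 105 classes
print(separation(M))
# Row separation is only good on average: with 105 classes and just 15 bits,
# some pair of rows is likely to be close (cf. the min row HD of 2 for the
# 15-bit random code in the table that follows).
```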
Experimental Results

Code            Min Row HD   Max Row HD   Min Col HD   Max Col HD   Error Rate
15-bit BCH      5            15           49           64           20.6%
19-bit Hybrid   5            18           15           69           22.3%
15-bit Random   2 (1.5)      13           42           60           24.1%
Drawbacks
- Can be computationally expensive
- Random codes throw away the real-world nature of the data by picking random partitions to create artificial binary problems
Future Work
- Combine ECOC with Co-Training
- Automatically construct optimal / adaptive codes
Conclusion
- Improves classification accuracy considerably!
- Can be used when training data is sparse
- Algebraic codes perform better than random codes for a given code length
- Hand-constructed codes are not the answer