PEBL: Web Page Classification without Negative Examples

Hwanjo Yu, Jiawei Han, and Kevin Chen-Chuan Chang

IEEE Transactions on Knowledge and Data Engineering, Vol. 16, No. 1, 2004

Presented by Chirayu Wongchokprasitti

Introduction

Web page classification is one of the main techniques for Web mining

Constructing a classifier requires positive and negative training examples

Collecting negative training examples is laborious and must be done carefully to avoid bias

Typical Learning Framework

Positive Example Based Learning (PEBL) Framework

Learn from positive data and unlabeled data

Unlabeled data are random samples of the universal set

Apply the Mapping-Convergence (M-C) Algorithm

Mapping-Convergence (M-C) Algorithm

Divided into two stages

Mapping stage: use any classifier that does not generate false negatives

They chose 1-DNF (monotone Disjunctive Normal Form)

Convergence stage: maximize the margin

They chose SVM (Support Vector Machine)

Mapping Stage

Use a weak classifier to draw an initial approximation of “strong” negative data.

First, identify strong positive features from the positive and unlabeled data by comparing the frequency of each feature in the two sets.

If a feature's frequency in the positive data is higher than its frequency in the universal data, it is a strong positive feature.

Filter out every unlabeled example that is possibly positive, leaving only strong negatives.
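The mapping stage can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: it assumes each page is represented as a set of feature strings, treats an unlabeled page as "possibly positive" if it contains at least one strong positive feature, and uses made-up example data. The function names `strong_positive_features` and `mapping_stage` are hypothetical.

```python
from collections import Counter


def strong_positive_features(positives, unlabeled):
    """Features that occur relatively more often in the positive set
    than in the unlabeled (universal) sample."""
    pos_counts = Counter(f for doc in positives for f in set(doc))
    unl_counts = Counter(f for doc in unlabeled for f in set(doc))
    strong = set()
    for feature, count in pos_counts.items():
        pos_freq = count / len(positives)
        unl_freq = unl_counts[feature] / len(unlabeled)
        if pos_freq > unl_freq:
            strong.add(feature)
    return strong


def mapping_stage(positives, unlabeled):
    """Split the unlabeled data into strong negatives and the rest.

    Assumption: a page with no strong positive feature is a strong negative.
    """
    strong = strong_positive_features(positives, unlabeled)
    strong_negatives = [doc for doc in unlabeled if not (set(doc) & strong)]
    remaining = [doc for doc in unlabeled if set(doc) & strong]
    return strong, strong_negatives, remaining


# Made-up toy data: each page is a set of features (e.g. words).
positives = [{"faculty", "research", "phd"}, {"faculty", "teaching"}]
unlabeled = [
    {"faculty", "research"},
    {"price", "cart"},
    {"news", "sports"},
    {"research", "price"},
]

strong, strong_negatives, remaining = mapping_stage(positives, unlabeled)
print(sorted(strong))
print(len(strong_negatives), "strong negatives,", len(remaining), "remaining")
```

In this toy data "research" is not a strong positive feature because it is equally frequent in both sets, so the page {"research", "price"} ends up among the strong negatives.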

Convergence Stage

Use SVM to narrow down the class boundary

Iterate SVM repeatedly to extract negative data from the unlabeled data

The boundary converges toward the true boundary
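The iteration in the convergence stage can be sketched as a loop around any trainable classifier. The sketch below is an assumption-laden illustration rather than the authors' code: to stay self-contained it swaps in a simple nearest-centroid classifier where the slides use an SVM, it stops when an iteration extracts no new negatives, and the 2-D points are made up. The names `convergence_stage` and `train_nearest_centroid` are hypothetical.

```python
def centroid(points):
    """Component-wise mean of a non-empty list of equal-length tuples."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(len(points[0])))


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def train_nearest_centroid(positives, negatives):
    """Stand-in for SVM training: returns a function that labels a point
    positive (True) if it is closer to the positive centroid."""
    pos_c, neg_c = centroid(positives), centroid(negatives)

    def classify(x):
        return squared_distance(x, pos_c) < squared_distance(x, neg_c)

    return classify


def convergence_stage(positives, strong_negatives, remaining, train):
    """Retrain repeatedly, moving points classified as negative from the
    unlabeled pool into the negative set until nothing more moves."""
    negatives = list(strong_negatives)
    pool = list(remaining)
    while True:
        classify = train(positives, negatives)
        newly_negative = [x for x in pool if not classify(x)]
        if not newly_negative:
            return classify, negatives, pool
        negatives.extend(newly_negative)
        pool = [x for x in pool if classify(x)]


# Made-up toy data: 2-D points instead of Web page feature vectors.
positives = [(4.0, 4.0), (5.0, 5.0)]
strong_negatives = [(0.0, 0.0), (1.0, 0.0)]
remaining = [(2.0, 2.0), (3.0, 3.0), (4.5, 4.0)]

classify, negatives, pool = convergence_stage(
    positives, strong_negatives, remaining, train_nearest_centroid
)
print("negatives:", negatives)
print("still unlabeled, classified positive:", pool)
```

Passing the training function as a parameter keeps the loop independent of the classifier, so an SVM trainer with the same interface could be substituted for the stand-in.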

Support Vector Machines

Visualization of a Support Vector Machine

Convergence of SVM

Data Flow Diagram

Experimental Results

Report the result with precision-recall breakeven point (P-R)

Experiment 1: the Internet, using DMOZ as the universal set

Experiment 2: university CS departments, using the WebKB data set
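The precision-recall breakeven point is the value at which precision equals recall. The slides do not say how it was computed, so the following is only one common way to obtain it, assuming the classifier gives each page a real-valued score: rank pages by score and cut the ranking at the number of true positive pages, where precision and recall coincide by construction. The function name `breakeven_point` and the scores are made up for illustration.

```python
def breakeven_point(scores, labels):
    """Precision (= recall) when exactly as many pages are predicted
    positive as there are truly positive pages.

    scores: classifier scores, higher means more likely positive.
    labels: 1 for a truly positive page, 0 otherwise.
    """
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("at least one positive example is required")
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    true_pos = sum(label for _, label in ranked[:n_pos])
    return true_pos / n_pos


# Made-up scores for six pages, three of which are truly positive.
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.1]
labels = [1, 1, 0, 1, 0, 0]
bep = breakeven_point(scores, labels)
print(round(bep, 3))
```

With three true positives, the top three ranked pages contain two of them, so precision and recall are both 2/3 at the cutoff.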

Mixture Models

Experiment 1

Experiment 2

Mixture Models

Summary and Conclusions

The PEBL framework eliminates the need for manually collecting negative training examples

The Mapping-Convergence (M-C) algorithm achieves classification accuracy as high as that of a traditional SVM

PEBL still needs a faster training procedure
