PEBL: Web Page Classification without Negative Examples

PEBL: Web Page Classification without Negative Examples. Hwanjo Yu, Jiawei Han, and Kevin Chen-Chuan Chang. IEEE Transactions on Knowledge and Data Engineering, Vol. 16, No. 1, 2004. Presented by Chirayu Wongchokprasitti.

PEBL: Web Page Classification without Negative Examples

Hwanjo Yu, Jiawei Han, and Kevin Chen-Chuan Chang

IEEE Transactions on Knowledge and Data Engineering, Vol. 16, No. 1, 2004

Presented by Chirayu Wongchokprasitti

Introduction

Web page classification is one of the main techniques for Web mining

Constructing a classifier requires positive and negative training examples

Collecting negative training examples is laborious and must be done carefully to avoid bias

Typical Learning Framework

Positive Example Based Learning (PEBL) Framework

Learn from positive data and unlabeled data

Unlabeled data are random samples of the universal set

Apply the Mapping-Convergence (M-C) Algorithm

Mapping-Convergence (M-C) Algorithm

The algorithm is divided into two stages

Mapping stage: use any classifier that does not generate false negatives. The authors chose 1-DNF (monotone Disjunctive Normal Form)

Convergence stage: use a classifier that maximizes the margin. The authors chose SVM (Support Vector Machine)

Mapping Stage

Use a weak classifier to draw an initial approximation of the “strong” negative data.

First, identify strong positive features by comparing how frequently each feature occurs in the positive data and in the unlabeled data.

If a feature's frequency in the positive data is higher than its frequency in the universal (unlabeled) data, it is a strong positive feature.

Filter out every unlabeled example that could be positive, that is, every example containing a strong positive feature, leaving only strong negatives.
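The mapping stage described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: it assumes each page is represented as a set of feature strings, and the function names `strong_positive_features` and `map_strong_negatives` as well as the toy pages are hypothetical.

```python
from collections import Counter


def strong_positive_features(positives, unlabeled):
    """Features whose relative frequency is higher in positives than in unlabeled data."""
    # Count in how many documents each feature appears (document frequency).
    pos_counts = Counter(f for doc in positives for f in set(doc))
    unl_counts = Counter(f for doc in unlabeled for f in set(doc))
    strong = set()
    for feature, count in pos_counts.items():
        pos_freq = count / len(positives)
        unl_freq = unl_counts[feature] / len(unlabeled)  # 0 if never seen
        if pos_freq > unl_freq:
            strong.add(feature)
    return strong


def map_strong_negatives(positives, unlabeled):
    """Split unlabeled data into strong negatives and the remaining candidates."""
    strong = strong_positive_features(positives, unlabeled)
    # A document with no strong positive feature is treated as a strong negative.
    negatives = [doc for doc in unlabeled if not (set(doc) & strong)]
    remaining = [doc for doc in unlabeled if set(doc) & strong]
    return strong, negatives, remaining


# Toy data (assumed for illustration): "faculty home page" as the positive class.
positives = [{"professor", "course"}, {"professor", "research"}, {"course", "homework"}]
unlabeled = [
    {"professor", "course"},
    {"football", "score"},
    {"recipe", "sugar"},
    {"research", "homework"},
    {"news", "score"},
]
strong, negatives, remaining = map_strong_negatives(positives, unlabeled)
print(sorted(strong))
print(len(negatives), len(remaining))
```

Because any page containing even one strong positive feature is kept out of the negative set, the filter is deliberately conservative: it may leave many true negatives in the remaining pool, but it is unlikely to mislabel a positive page as negative.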

Convergence Stage

Use SVM to narrow down the class boundary

Iterate the SVM a number of times to extract negative data from the unlabeled data

The boundary converges toward the true boundary.
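The iteration in the convergence stage can be sketched as a loop that retrains a classifier and moves newly rejected examples into the negative set until nothing changes. To keep the sketch runnable with the standard library only, a nearest-centroid classifier is swapped in where the slides use an SVM; the function names, the `max_iter` parameter, and the 2-D toy points are all assumptions made for illustration.

```python
def centroid(points):
    """Component-wise mean of a list of equal-length tuples."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(len(points[0])))


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def train_centroid_classifier(positives, negatives):
    """Stand-in for the SVM: predicts True when a point is nearer the positive centroid."""
    pos_c, neg_c = centroid(positives), centroid(negatives)
    return lambda x: squared_distance(x, pos_c) < squared_distance(x, neg_c)


def converge(positives, strong_negatives, remaining,
             train=train_centroid_classifier, max_iter=20):
    """Grow the negative set from the unlabeled pool until no new negatives appear."""
    negatives = list(strong_negatives)
    pool = list(remaining)
    classifier = train(positives, negatives)
    for _ in range(max_iter):
        new_negatives = [x for x in pool if not classifier(x)]
        if not new_negatives:
            break  # boundary has stopped moving
        pool = [x for x in pool if classifier(x)]
        negatives.extend(new_negatives)
        classifier = train(positives, negatives)  # retrain on the enlarged negative set
    return classifier, negatives, pool


# Toy 2-D data (assumed): positives near the origin, strong negatives far away.
positives = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
strong_negatives = [(10.0, 10.0), (11.0, 9.0)]
remaining = [(0.5, 0.5), (6.0, 6.0), (5.0, 5.0), (9.0, 9.0)]
classifier, negatives, pool = converge(positives, strong_negatives, remaining)
print(len(negatives), pool)
```

In this toy run the point `(5.0, 5.0)` is accepted as positive in the first pass and only rejected after the negative set has grown, which is the boundary-tightening behaviour the slides describe. Any trainer with the same `train(positives, negatives)` interface, such as a real SVM, could be passed in place of the stand-in.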

Support Vector Machines

Visualization of a Support Vector Machine

Convergence of SVM

Data Flow Diagram

Experimental Results

Results are reported using the precision-recall breakeven point (P-R)

Experiment 1: the Internet. DMOZ is used as the universal set

Experiment 2: university CS departments. The WebKB data set is used
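The breakeven point is the value at which precision equals recall. The slides do not say how it was computed, so the following is only a sketch of one common approach: rank examples by classifier score and predict as positive exactly as many examples as there are true positives, at which point precision and recall coincide. The function name `breakeven_point` and the sample scores are assumptions.

```python
def breakeven_point(scores, labels):
    """Precision (= recall) when the top-k scored examples are predicted positive,
    where k is the number of truly positive examples."""
    k = sum(labels)
    if k == 0:
        raise ValueError("at least one positive label is required")
    # Sort examples by descending score and keep the labels of the top k.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    true_positives = sum(label for _, label in ranked[:k])
    return true_positives / k


# Hypothetical scores and 0/1 labels for five pages.
scores = [0.9, 0.8, 0.7, 0.4, 0.2]
labels = [1, 0, 1, 1, 0]
print(breakeven_point(scores, labels))
```

With k predicted positives and k actual positives, precision and recall share the same numerator and denominator, which is why a single number summarizes both.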

Mixture Models

Experiment 1

Experiment 2

Mixture Models

Summary and Conclusions

PEBL framework eliminates the need for manually collecting negative training examples

The Mapping-Convergence (M-C) algorithm achieves classification accuracy as high as that of traditional SVM

PEBL still needs a faster training procedure
