
Uncertain One-Class Learning and Concept Summarization Learning on Uncertain Data Streams

Bo Liu, Yanshan Xiao, Philip S. Yu, Fellow, IEEE, Longbing Cao, Senior Member, IEEE, Yun Zhang, and Zhifeng Hao

. B. Liu is with the Department of Automation, Guangdong University of Technology, Guangzhou, Guangdong, China 510006, and with the Department of Computer Science, University of Illinois at Chicago, Chicago, IL 60607. E-mail: [email protected].
. Y. Xiao and Z. Hao are with the Department of Computer Science, Guangdong University of Technology, Guangzhou, Guangdong, China 510006. E-mail: [email protected], [email protected].
. P.S. Yu is with the Department of Computer Science, University of Illinois at Chicago, Chicago, IL 60607, and with the Department of Computer Science, King Abdulaziz University, Jeddah, Saudi Arabia. E-mail: [email protected].
. L. Cao is with the Faculty of Engineering and Information Technology, University of Technology, PO Box 123 Broadway, New South Wales 2007, Sydney, Australia. E-mail: [email protected].
. Y. Zhang is with the Department of Automation, Guangdong University of Technology, Guangzhou, Guangdong, China 510006. E-mail: [email protected].

Manuscript received 17 Mar. 2012; revised 13 Aug. 2012; accepted 12 Nov. 2012; published online 28 Nov. 2012.
Recommended for acceptance by B.C. Ooi.
For information on obtaining reprints of this article, please send e-mail to: [email protected], and reference IEEECS Log Number TKDE-2012-03-0175.
Digital Object Identifier no. 10.1109/TKDE.2012.235.

Abstract—This paper presents a novel framework for uncertain one-class learning and concept summarization learning on uncertain data streams. Our proposed framework consists of two parts. First, we put forward uncertain one-class learning to cope with data uncertainty. We propose a local kernel-density-based method to generate a bound score for each instance, which refines the location of the corresponding instance, and then construct an uncertain one-class classifier (UOCC) by incorporating the generated bound score into a one-class SVM-based learning phase. Second, we propose a support vectors (SVs)-based clustering technique to summarize the concept of the user from the history chunks: the chunk data are represented by the support vectors of the uncertain one-class classifier developed on each chunk, and the k-means clustering method is then extended to cluster the history chunks so that the concepts of the user can be summarized. Our proposed framework explicitly addresses the problem of one-class learning and concept summarization learning on uncertain one-class data streams. Extensive experiments on uncertain data streams demonstrate that our proposed uncertain one-class learning method outperforms existing methods, and our concept summarization method can summarize the evolving interests of the user from the history chunks.

Index Terms—Data streams, uncertain data


1 INTRODUCTION

In one-class learning, only one class of samples is labeled in the training phase [51]. The labeled class is typically called the target/positive class, while all samples not in this class are called the nontarget class. In some real-world applications, such as anomaly detection [43], [21], [52], it is easy to obtain one class of normal data, whereas collecting and labeling abnormal instances may be expensive or impossible. In such cases, one-class learning has been studied to learn a distinctive classifier from the labeled target class, and then utilize the learned one-class classifier to decide whether a test instance belongs to the target class or not. To date, one-class learning has found a large variety of applications, from anomaly detection [43], [21], [52], document classification [39], automatic image annotation [26], [32], authorship verification [28], transcription factor binding site recognition [24], and change detection [10] to sensor data drift detection [50].

Depending on the principles of the learning models, the previous work on one-class learning can be classified into two broad categories: 1) Static-data-based methods [45], [51], [37], [36], [57], [17], in which a one-class classifier is built only from the labeled target class [45], [51], or negative examples are first extracted from the unlabeled data, if such data are available, and a binary classifier is then constructed from the labeled target class and the extracted negative class. For example, one-class SVM [45] builds a one-class classifier by mapping the target data into a feature space and constructing a hyperplane that separates the target class from the origin of the feature space with maximum margin. The learned classifier is then utilized to classify a test sample into the target class or the nontarget class. Another group of methods [37], [36], [33], [57], [17], developed for document-based one-class learning, consists of two steps: in the first step, a set of reliable negative documents is identified from the unlabeled documents; in the second step, a binary classifier is built from the target class and the extracted negative documents. For example, in step one, the S-EM method [37] uses a Spy technique, PEBL [57] uses a 1-DNF technique, and Roc-SVM [33] uses the Rocchio method to extract negative documents. The method in [17] iteratively clusters the data into microclusters and extracts representative documents from the unlabeled data. In step two, S-EM uses the EM algorithm, while PEBL and Roc-SVM use SVM


to iteratively build a binary classifier. 2) Stream-data-based methods [34], [60], [38], [59], in which the static-data-based methods are extended to the data stream environment, since one-class data, such as sensor network and intrusion detection data, are always collected in a data stream environment [55], [48], [19]. For example, the one-class SVM method has been used in data streams for one-class learning with a small number of labeled target data and a large amount of unlabeled data [60], [59]. In addition, the document-based one-class learning method has been extended to text-based data streams by extracting reliable negative documents from the unlabeled data and constructing an SVM-based classifier in the data stream environment [34].

Despite much progress in this area, most of the existing work on one-class data stream learning has not explicitly dealt with the uncertainty of the input data. It is based on the underlying assumption that the training data set does not contain any uncertainty information. However, data in many real-world applications are uncertain in nature [7], [5], [3], [35], [12]. This is because data collection methodologies are only able to capture a certain level of information, making the extracted data incomplete or inaccurate [7]. For example, physical measurements can never be precise in theory (due to Heisenberg's Uncertainty Principle), and limitations of measuring devices thus induce uncertainty into the measured values in practice [27]. Another example is that, in environmental monitoring applications, sensor networks typically generate a large amount of uncertain data because of instrument errors, limited accuracy, or noise-prone wireless transmission [7]. This kind of uncertain information, typically ignored in most one-class data stream learning, is important and should be considered in the learning phase to build a more accurate one-class classifier. Another important observation, in comparison with binary or multiclass data streams, is that in one-class data streams the user only labels samples of interest as the positive/target class and does not need to label, or care about, what they do not prefer. In this case, when the user's interest/concept in the one-class data stream changes as time goes on, we can investigate the positive/target class of the one-class stream to summarize the interest of the user. For example, in textual data streams, a user may be interested in Entertainment News, while the interest of the user drifts to Military News after a while; we can determine the concept drift of the user by examining the change of the positive/target classes. Therefore, after the data streams have been labeled for a certain amount of time, there is a need to produce a concept summarization of the user's interest from the positive/target class of the data streams. More generally, we might need to dynamically summarize the user's concept at any particular time, without referring to historical stream data. For example, in Fig. 1, we summarize the discrete concept drift of a user over the data streams, in which the horizontal axis denotes the chunk number and the vertical axis denotes the concept of the user. Thus, concept summarization can capture the user's interests and restore the concept drift relationships over the data streams.

In this paper, we address the problem of one-class learning on uncertain data streams and concept summarization learning of the user from history data streams. We propose a novel framework, called the uncertain one-class learning and concept summarization learning framework (UOLCS) on uncertain data streams, which copes with data uncertainty and concept summarization learning in uncertain one-class data streams. UOLCS consists of two parts. In the first part, we construct an uncertain one-class classifier (UOCC) by incorporating the uncertainty information into the one-class SVM learning phase to build a more accurate classifier. In the second part, we summarize the user's concept drift from the data streams by developing a support vectors (SVs)-based clustering technique over the history chunks. The main contributions of our work can be summarized as follows:

1. We propose a local kernel-density-based method to generate a bound score for each instance by investigating its local nearest neighbors in the feature space. By using the generated bound score, we estimate the range of uncertainty so that we can refine the location of the instance in the subsequent uncertain one-class learning phase.

2. To cope with data uncertainty, the generated bound score is thereafter incorporated into the one-class SVM learning phase to build an uncertain one-class classifier. In this phase, we advocate the use of an iterative framework to mitigate the effect of noise on the one-class classifier. To the best of our knowledge, this is the first work to explicitly handle data uncertainty in one-class uncertain data streams.

3. We propose a support vectors-based clustering (SVBC) method to summarize concepts by extending the k-means clustering method on the support vectors of each uncertain one-class classifier derived from the history chunks. In the first step, we collect the support vectors of the uncertain one-class classifier on each data chunk, exploiting the characteristic of the one-class SVM that support vectors can describe the structure of the data distribution [45], [51]. In the second step, we extend the k-means clustering method to cluster the history chunks, represented by the support vectors, into several clusters, where each cluster denotes one concept of the user. Based on this, we can summarize the concept of the user over the data streams.

4. We conduct extensive experiments to evaluate the performance of our proposed UOLCS framework. The statistical results show that the uncertain one-class classifier outperforms the state-of-the-art one-class learning method in terms of performance and sensitivity to noise added to the data. In addition, SVBC can summarize the evolving interests of the user better than the existing method.


Fig. 1. Illustration of discrete concept drift of the user in a one-class data stream. The horizontal axis represents the chunk number; the vertical axis denotes the concepts of the user.


The remainder of the paper is organized as follows: Section 2 discusses previous work related to this paper. Section 3 introduces the preliminaries of our study. Section 4 presents our proposed approach to uncertain one-class learning and concept summarization learning. Section 5 reports substantial experimental results on mining uncertain data streams. Section 6 concludes the paper and discusses possible directions for future work. The Appendix is given in Section 8.

2 RELATED WORK

In this section, we briefly review previous work related to our study. We first review the previous methods on uncertain data in Section 2.1, and then introduce the strategies for building predictive classifiers for data streams in Section 2.2.

2.1 Mining Uncertain Data

In recent years, many advanced techniques have been developed to collect and store large quantities of data; however, some records in the data might be degraded due to noise, limited equipment precision, or other factors, which leads to missing or partially complete data [7]. Thus, the data objects may be only vaguely specified and are considered uncertain in their representation [7]. To date, many algorithms have been developed to deal with uncertain data [7], [44], [9], [2]. We briefly review the previous work from the clustering, classification, and other categories [7].

The clustering methods for uncertain data usually extend the original clustering methods to cope with data uncertainty. We briefly review some of them as follows: FDBSCAN [29], developed on DBSCAN [16], probabilistically specifies the uncertain distances between objects. FOPTICS [30] introduces a fuzzy distance function to measure the similarity between uncertain data on top of the hierarchical density-based clustering algorithm. UK-means [42] assigns an object to the cluster whose representative has the smallest expected distance from the object. Aggarwal [1] uses a density-based approach to handle error-prone and missing data. The method in [27] studies the problem of clustering uncertain objects whose locations are described by probability density functions and uses Voronoi diagrams and an R-Tree index to cluster uncertain data.

For the classification methods on uncertain data, the standard binary SVM has been extended to handle uncertain data [47], [8], [23]; these methods provide a geometric algorithm that optimizes the probabilistic separation between the two classes on both sides of the boundary. In addition, Gao and Wang [18] mine discriminative patterns from uncertain data and construct a distinctive classifier for uncertain data. Tsang et al. [53] propose a series of pruning techniques for decision trees to build classifiers for uncertain data.

In addition, frequent pattern mining on uncertain data is investigated in [13], in which the probability of an item belonging to a particular transaction is typically modeled. The work in [49] studies the discovery of frequent patterns

and association rules from probabilistic data under the possible world semantics. In addition, outlier detection on uncertain data has been studied in [6], which draws multiple samples from the data and computes the fraction of the samples. Algorithms for mining uncertain graph data have also been developed [25], [61]. Murthy et al. [41] propose aggregation methods for probabilistic databases, while Yuen et al. [58] study nearest neighbor search on uncertain spatial databases.

The above methods are developed for static uncertain data. Since data are often obtained in a stream environment, uncertain data streams have also been investigated. The clustering of uncertain data streams is discussed in [5], which incorporates error statistics and the micro-clustering concept into the learning phase [3]. Similarity join processing has been developed for uncertain data streams [35]. In addition, continuous subgraph pattern search over certain and uncertain graph streams has also been developed [12].

Despite much progress on uncertain data mining, most of the previous work has not explicitly dealt with one-class learning on uncertain data. This paper proposes an uncertain one-class learning and concept summarization learning framework to cope with data uncertainty and concept summarization learning. In this framework, the uncertain one-class classifier extends the standard one-class SVM to uncertain data. Although UOCC is a support vector method, it is different from the uncertain binary SVM [47], [8], [23]. First, we propose a local kernel-density-based method to generate a bound score for each instance. Second, we simplify our optimization problem (8) into a standard QP (quadratic programming) optimization problem (9) by considering the characteristics of one-class learning, which differs from the uncertain binary SVM [47], [8], [23]. Third, we put forward concept summarization learning in one-class data streams to summarize the concept of the user.

2.2 Building Predictive Classifiers for Data Streams

One characteristic of data streams is concept drift. There are two main categories of solutions for building a predictive classifier from data streams: incremental learning [14] and ensemble learning [19], [55]. Incremental learning aims at making use of new data to update the models trained from historical streaming data, so that the learning process can adapt to the changing concepts. Ensemble learning, on the other hand, divides the data stream into a sequence of data chunks and combines the base classifiers learned on individual data chunks to form an ensemble classifier for prediction. As a result, various ensemble methods, such as weighted voting, dynamic voting, and dynamic selection, have been proposed to determine the weight of each base classifier when constructing the ensemble classifier. By combining a group of individual classifiers, ensemble learning has demonstrated the capability of yielding higher prediction accuracy than using a single classifier.

In this paper, we first build uncertain one-class classifiers and then ensemble the classifiers for prediction. For the SVM, [11], [31], [20] introduce incremental and adaptive SVMs. In the experiments, we will compare the


performance of the ensemble classifier and the incremental one-class SVM.

3 PRELIMINARY

We introduce the one-class SVM, which is used in one-class data stream learning, in Section 3.1, and then introduce the one-class uncertain stream problem definition in Section 3.2.

3.1 One-Class SVM

Since one-class data stream learning always extends the one-class methods developed for static data, the previous work can be classified into two categories. The methods in the first category [37], [36], [33], [57], [17] are developed for document-related one-class problems. They extract negative samples from the unlabeled examples and then construct a binary classifier from the target samples and the extracted negative samples.

In the other category, one-class SVM and support vector data description (SVDD) are representative methods [45], [51]. Both methods construct a one-class classifier using only the target class. Their advantage is that they can cope with any one-class classification problem. In this paper, we study one-class learning in this category. Because one-class SVM and SVDD have been proved to yield the same decision classifier [46], [51], we briefly introduce one-class SVM [46], [45] as follows.

Suppose the training target class is $S = \{x_1, x_2, \ldots, x_{|S|}\}$, where $x_i \in R^n$. In one-class SVM, the input data are mapped from the input space into a feature space via a kernel function, in which the inner product of two vectors $\Phi(x)$ and $\Phi(x_i)$ can be calculated by a kernel function $K(x, x_i) = \Phi(x) \cdot \Phi(x_i)$. Among a variety of kernel functions, the RBF kernel is a typical one:

$$
K(x, x_i) = \exp\left(-\frac{\|x - x_i\|^2}{2\sigma^2}\right), \qquad (1)
$$

where $\sigma$ is a parameter. In the feature space, one-class SVM aims to determine a hyperplane that separates the target class from the origin of the feature space with maximum margin:

$$
\begin{aligned}
\min \quad & \frac{1}{2}\|w\|^2 - \rho + \frac{1}{v \cdot |S|}\sum_{i=1}^{|S|}\xi_i \\
\text{s.t.} \quad & w \cdot \Phi(x_i) \geq \rho - \xi_i, \quad \xi_i \geq 0, \quad i = 1, 2, \ldots, |S|,
\end{aligned} \qquad (2)
$$

where $w$ is the weight vector and the parameter $v$ trades off the sphere volume against the errors $\sum_{i=1}^{|S|}\xi_i$. By introducing the Lagrangian function [54], the optimization problem (2) is converted into

$$
\begin{aligned}
F(\alpha) = \min \quad & \frac{1}{2}\sum_{i=1}^{|S|}\sum_{j=1}^{|S|}\alpha_i \alpha_j K(x_i, x_j) \\
\text{s.t.} \quad & 0 \leq \alpha_i \leq \frac{1}{v \cdot |S|}, \quad \sum_{i=1}^{|S|}\alpha_i = 1, \quad i = 1, 2, \ldots, |S|.
\end{aligned} \qquad (3)
$$

After resolving problem (3), we obtain the Lagrange multipliers $\alpha_i$. A sample $x_i$ with $0 < \alpha_i < \frac{1}{v \cdot |S|}$ resides on the surface of the hyperplane and is called a support vector. A sample $x_j$ with $\alpha_j = \frac{1}{v \cdot |S|}$ is misclassified and is called a bounded support vector (BSV). In addition, $w$ and the hyperplane are represented in terms of the Lagrange multipliers $\alpha_i$: $w = \sum_{j=1}^{|S|}\alpha_j \Phi(x_j)$, and the decision hyperplane is $w \cdot \Phi(x) = \rho$. For a test sample $x_t$, if $w \cdot \Phi(x_t) > \rho$, it is classified into the target class; otherwise, it belongs to the nontarget class.

In addition, the support vectors of one-class SVM describe the data distribution of the target class, as in the support vector data description method [51], [46]. As illustrated in Fig. 2a, the support vectors, i.e., the samples covered by circles, support the one-class classifier and describe the data distribution of the target class. In concept summarization learning, we will utilize this characteristic of one-class SVM to summarize the concept of the user over the data streams.
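The following is a minimal sketch, not the authors' implementation, of how a standard one-class SVM with the RBF kernel (1)-(3) can be trained and how its support vectors are recovered; scikit-learn's OneClassSVM (an assumed tool here) solves a dual of the form (3), with nu playing the role of v and gamma corresponding to 1/(2σ²).

```python
# Minimal sketch of a standard one-class SVM as in (1)-(3), using scikit-learn.
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.RandomState(0)
target = rng.normal(loc=0.0, scale=1.0, size=(200, 2))   # labeled target class S
test = np.vstack([rng.normal(0, 1, (10, 2)),              # likely target samples
                  rng.normal(6, 1, (10, 2))])              # likely nontarget samples

# nu plays the role of v in (2)-(3); gamma = 1/(2*sigma^2) in the RBF kernel (1)
clf = OneClassSVM(kernel="rbf", nu=0.1, gamma=0.5).fit(target)

pred = clf.predict(test)        # +1: target class, -1: nontarget class
svs = clf.support_vectors_      # support vectors describing the target distribution
print(pred, svs.shape)
```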

3.2 Uncertain Data Stream Problem Definition

Suppose we have a series of data chunks $D_1, D_2, \ldots, D_m$, in which each $D_t$ ($t = 1, 2, \ldots, m$), called a "chunk," contains the data that arrived between time $t_{t-1}$ and $t_t$. Here, $D_c$ is called the current chunk, and the yet-to-come data chunk (denoted as $D_{c+1}$) is designated as the target chunk.

In the current chunk $D_c$, the labeled examples are put into the set $PD_c$. The other examples, including unlabeled target-class examples and nontarget-class examples, are put into the set $UD_c$. We have the following two objectives.

1. Build an uncertain one-class SVM classifier around the labeled target-class examples $PD_c$, and predict the labels of the data arriving in the yet-to-come chunk.

2. Summarize the concept of the user from the history chunk data.

For one-class data streams, we make the following assumption: only instances in the current chunk $D_c$ are accessible, and once the algorithm moves from chunk $D_c$ to chunk $D_{c+1}$, all instances in chunks $D_t$, $1 \leq t \leq c$, become inaccessible; we can only use the models trained from the historical chunks. This is because aggregating historical data always requires extra storage, and most data stream mining algorithms are required to make predictions based on a single scan of the data streams without referring to historical data.
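Below is a minimal sketch of this one-pass assumption; the chunking scheme and the helper name stream_chunks are our own illustration, not from the paper. Only the current chunk's instances are visible, and only the models fitted on earlier chunks are retained.

```python
# Minimal sketch of chunk-by-chunk processing with no access to historical instances.
import numpy as np
from sklearn.svm import OneClassSVM

def stream_chunks(X, chunk_size):
    """Yield consecutive chunks D_1, D_2, ... of a stream X."""
    for start in range(0, len(X), chunk_size):
        yield X[start:start + chunk_size]

rng = np.random.RandomState(1)
stream = rng.normal(size=(3000, 4))

history_models = []                       # models trained on chunks D_1 .. D_{c-1}
for chunk in stream_chunks(stream, chunk_size=300):
    model = OneClassSVM(kernel="rbf", nu=0.1, gamma=0.25).fit(chunk)
    history_models.append(model)          # the raw chunk data is discarded afterwards
print(len(history_models), "chunk models kept, no historical instances stored")
```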

For the labeled class in $PD_c$, we assume each sample $x_i$ is subject to an additive noise vector $\Delta x_i$. Thus, the original uncorrupted input $x_i^s$ is denoted $x_i^s = x_i + \Delta x_i$. In theory, we can assume $\Delta x_i$ follows a certain distribution. In practice, we may not have any prior knowledge of the noise distribution. Alternatively, the method of bounded and ellipsoidal uncertainties has been investigated in [8] and [22]. In this situation, we consider a simple bound score $\delta_i$ for each instance $x_i$ such that the norm of $\Delta x_i$ is less than or equal to $\delta_i$, that is,

$$
\|\Delta x_i\| \leq \delta_i. \qquad (4)
$$

Actually, this setting has a similar effect to assuming $\Delta x_i$ has a certain distribution. For example, if we assume $\Delta x_i$ follows a Gaussian noise model, that is,

$$
p\big(x_i - x_i^s\big) \propto \exp\left(-\frac{\|x_i - x_i^s\|^2}{2\sigma^2}\right),
$$

then the bound $\delta_i$ has a similar influence to the standard deviation $\sigma$ in the Gaussian noise model. In addition, the squared penalty term $\frac{\|x_i - x_i^s\|^2}{2\sigma^2}$ is replaced by the constraint $\|\Delta x_i\| \leq \delta_i$.

We let $x_i + \Delta x_i$ $(\|\Delta x_i\| \leq \delta_i)$ denote the reachability area of instance $x_i$, as illustrated in Fig. 2b.

Fig. 2. (a) Illustration of one-class SVM. (b) Illustration of the reachability area of instance $x_i$.

We then have

$$
\|x_i^s\| = \|x_i + \Delta x_i\| \leq \|x_i\| + \|\Delta x_i\| \leq \|x_i\| + \delta_i. \qquad (5)
$$

In this way, $x_i^s$ falls in the reachability area of $x_i$. By using the bound score for each input sample, we can convert uncertain one-class learning into standard one-class learning with constraints.

4 PROPOSED APPROACH

In one-class-based data streams, subject to sampling errors or device imperfections, an instance might be corrupted and is thereafter considered uncertain in its representation. Another observation is that we may want to summarize the concept drift of a user over the data streams. To deal with one-class learning and concept summarization learning on uncertain data streams, we propose the uncertain one-class learning and concept summarization framework, as illustrated in Fig. 3. The proposed UOLCS framework consists of two parts: the first part constructs an uncertain one-class classifier from uncertain data streams, and the second part performs concept summarization learning over the history data streams.

In the following, we introduce uncertain one-class learning in Section 4.1 and concept summarization learning in Section 4.2.

4.1 Uncertain One-Class Learning

In all, the uncertain one-class learning for uncertain data streams consists of three steps, as illustrated in part one of Fig. 3.

. In the first step, we generate a bound score for each instance in $PD_c$ based on its local data behavior.

. In the second step, we incorporate this generated bound score into the learning phase to iteratively build an uncertain one-class classifier.

. In the third step, we integrate the uncertain one-class classifiers derived from the current and historical chunks to predict the data in the target chunk.

In the following, we exhibit the three steps in detail. For simplicity, we detail the bound score determination and uncertain one-class classifier learning for the current chunk $D_c$; they generalize to the other chunks in the same way.

4.1.1 Local Kernel-Density-Based Method for Bound Score Generation

In this section, we put forward a local kernel-density method to estimate the bound score in the feature space. In practice, we are unlikely to know the distribution of $\Delta x_i$, and it is difficult to determine the bound score for each instance exactly. We therefore propose a local kernel-density method to generate a bound score for each instance by examining its local nearest neighbors in a kernel feature space. More specifically, we first identify the $D_c^k$ nearest neighbors of instance $x_i$ in the feature space. The distance between $x_i$ and $x_j$ in the feature space is computed as

$$
\|\Phi(x_i) - \Phi(x_j)\| = \sqrt{K(x_i, x_i) + K(x_j, x_j) - 2K(x_i, x_j)}. \qquad (6)
$$

We then calculate the average distance of $x_i$ to its $D_c^k$ nearest neighbors as follows:

$$
D_{av}(x_i) = \frac{1}{|S_k(x_i)|}\sum_{j=1}^{|S_k(x_i)|}\|\Phi(x_j) - \Phi(x_i)\|, \qquad (7)
$$

where $S_k(x_i)$ is the set of the $D_c^k$ nearest neighbors of $x_i$. In general, the smaller $D_{av}(x_i)$ is, the greater the density around instance $x_i$ in the feature space.

We then let $\delta_i = D_{av}(x_i)$. The motivation behind this setting is that, if a normal instance is corrupted by noise such that it resides far from the other normal examples, this setting ensures that the reachability area of $x_i$ can reach the region formed by the other normal examples. As illustrated in Fig. 4, assuming $x_i$ is corrupted by noise, the inner rectangle in the left subfigure covers the four nearest neighbors of $x_i$; by calculating the average distance of $x_i$ to its four nearest neighbors based on (7), the corresponding reachability area of $x_i$ can reach the area that its four nearest neighbors form. As illustrated in the right subfigure of Fig. 4, the inner circle, which is calculated using the four nearest neighbors, can reach the regions that the four nearest neighbors cover.


Fig. 3. Uncertain one-class learning and concept summarization framework.


Intuitively, a large $D_c^k$ will result in a large reachability area of $x_i$. As illustrated in Fig. 4, the reachability area of $x_i$ calculated using 12 nearest neighbors (outer circle in the right subfigure) covers a much greater area than that computed using four nearest neighbors (inner circle in the right subfigure). In general, a small value of $D_c^k$ may restrict the reachability area of the sample, while a too large value of $D_c^k$ requires much time to calculate the average distance of the sample to its $D_c^k$ nearest neighbors. Thus, we can set $D_c^k$ to a relatively large value in practice.
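The following is a minimal sketch of the bound score generation in (6)-(7); the helper names rbf_kernel and bound_scores are ours. Each $\delta_i$ is taken as the average feature-space distance from $x_i$ to its $D_c^k$ nearest neighbors, computed with the RBF kernel (1).

```python
# Minimal sketch of the local kernel-density bound score of Section 4.1.1.
import numpy as np

def rbf_kernel(a, b, sigma=1.0):
    return np.exp(-np.sum((a - b) ** 2) / (2.0 * sigma ** 2))

def bound_scores(X, k, sigma=1.0):
    n = len(X)
    # feature-space distance from (6): sqrt(K(xi,xi) + K(xj,xj) - 2K(xi,xj))
    D = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            kij = rbf_kernel(X[i], X[j], sigma)
            D[i, j] = np.sqrt(max(2.0 - 2.0 * kij, 0.0))   # K(x,x) = 1 for the RBF kernel
    delta = np.zeros(n)
    for i in range(n):
        nn = np.argsort(D[i])[1:k + 1]     # k nearest neighbours, excluding x_i itself
        delta[i] = D[i, nn].mean()          # average distance (7), used as delta_i
    return delta

X = np.random.RandomState(2).normal(size=(50, 3))
print(bound_scores(X, k=5)[:5])
```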

4.1.2 Uncertain One-Class Classifier Construction

After generating a bound score for each instance in $PD_c$, the next step is to build an uncertain one-class classifier to cope with uncertain data. Below, we put forward an extended formulation of the standard one-class SVM in the feature space.

Formulation of uncertain one-class classifier. Based on the bound score generated for each sample, we extend the standard one-class SVM (optimization problem (2)) as follows:

$$
\begin{aligned}
\min \quad & \frac{1}{2}\|w\|^2 - \rho + \frac{1}{v \cdot |PD_c|}\sum_{i=1}^{|PD_c|}\xi_i \\
\text{s.t.} \quad & w \cdot \Phi(x_i + \Delta x_i) \geq \rho - \xi_i, \quad \xi_i \geq 0, \quad \|\Delta x_i\| \leq \delta_i, \\
& i = 1, 2, \ldots, |PD_c|,
\end{aligned} \qquad (8)
$$

where $w$ is the weight vector and the parameter $v$ trades off the sphere volume against the errors $\sum_{i=1}^{|PD_c|}\xi_i$. We explain the above formulation as follows. In problem (8), the way uncertainty is modeled renders UOCC less sensitive to samples corrupted by noise, since we can always determine a choice of $\Delta x_i$ that places $x_i + \Delta x_i$ far from the decision boundary learned by the standard one-class SVM. As illustrated in Fig. 5, $x_1$ and $x_2$ are corrupted samples: assume the dashed line is the classifier learned by the standard one-class SVM; after incorporating $x_1 + \Delta x_1$ and $x_2 + \Delta x_2$ into the learning phase, we obtain the refined boundary denoted by the solid line. In this case, the hyperplane learned from problem (8), denoted as the refined boundary in Fig. 5, is less sensitive to uncertain data than that derived from the original problem (2), denoted as the original boundary in Fig. 5. For problem (8), we iteratively optimize $\alpha$ and $\Delta x_i$ to obtain the classifier in the following section.

Solution to uncertain one-class classifier. To solve problem (8), we use an iterative approach to calculate $\alpha$ and $\Delta x_i$ and obtain the uncertain one-class classifier: fix each $\Delta x_i$ and solve problem (8) to obtain $\alpha$, and then fix $\alpha$ to calculate $\Delta x_i$, iteratively. We detail the two alternating steps as follows (please refer to the Appendix for the detailed derivations of the lemmas and theorems in this section).

First, we fix each $\Delta x_i$ at a small value such that $\|\Delta x_i\| < \delta_i$; we then solve optimization problem (8) and have the following lemma.

Lemma 1. If each $\Delta x_i$ is fixed at a small value such that $\|\Delta x_i\| < \delta_i$, the solution of problem (8) over $w$ and $\rho$ is equivalent to

$$
\begin{aligned}
\min \quad & \frac{1}{2}\|w\|^2 - \rho + \frac{1}{v \cdot |PD_c|}\sum_{i=1}^{|PD_c|}\xi_i \\
\text{s.t.} \quad & w \cdot \Phi(x_i + \Delta x_i) \geq \rho - \xi_i, \quad \xi_i \geq 0, \\
& i = 1, 2, \ldots, |PD_c|.
\end{aligned} \qquad (9)
$$

This lemma indicates that we can relax the constraint $\|\Delta x_i\| \leq \delta_i$ in problem (8) to simplify the optimization if $\|\Delta x_i\| < \delta_i$ holds. In the iterative approach to obtaining the uncertain classifier, we initialize each $\Delta x_i$ as the zero vector.

Second, we need to solve optimization problem (9), for which we have Lemma 2.

Lemma 2. By using the Lagrangian function [54], the solution of optimization problem (9) is obtained by solving the following dual problem:

$$
\begin{aligned}
F(\alpha) = \min \quad & \frac{1}{2}\sum_{i=1}^{|PD_c|}\sum_{j=1}^{|PD_c|}\alpha_i \alpha_j K(x_i + \Delta x_i, x_j + \Delta x_j) \\
\text{s.t.} \quad & 0 \leq \alpha_i \leq \frac{1}{v \cdot |PD_c|}, \quad \sum_{i=1}^{|PD_c|}\alpha_i = 1, \quad i = 1, 2, \ldots, |PD_c|,
\end{aligned} \qquad (10)
$$

and

$$
w = \sum_{j=1}^{|PD_c|}\alpha_j \Phi(x_j + \Delta x_j), \qquad (11)
$$

$$
w \cdot \Phi(x) = \rho, \qquad (12)
$$

where the $\alpha_j$ are the Lagrange multipliers and (12) denotes the resulting hyperplane. This lemma indicates that problem (10) is a standard dual problem obtained by treating $x_i + \Delta x_i$ as a new instance, i.e., $x_i' = x_i + \Delta x_i$.


Fig. 4. (a) Illustration of the four nearest neighbors' range (inner rectangle) and the 12 nearest neighbors' range (outer rectangle) of instance $x_i$. (b) Reachability area of instance $x_i$ in terms of four nearest neighbors (inner circle) and 12 nearest neighbors (outer circle).

Fig. 5. Decision boundaries of the standard OC-SVM and UOCC methods.


After setting each $\Delta x_i$ to a small value and solving optimization problem (10), we obtain $\alpha$ and $w$. The next step is to fix the obtained $\alpha$ and $w$ and calculate a new $\Delta x_i$. We have Lemma 3 as follows.

Lemma 3. If a hyperplane $w \cdot \Phi(x) = \rho$ is given, the solution of problem (8) over $\Delta x_i$ is

$$
\Delta x_i = \delta_i \frac{v}{\|v\|}, \qquad (13)
$$

where $v = \sum_{j=1}^{|PD_c|}\alpha_j K'(x_i, x_j + \Delta x_j)$. This lemma indicates that, for a given $w$, the minimization of problem (8) over $\Delta x_i$ is quite straightforward.

After that, we have completed one round of the alternation and continue to update $\alpha$ and $\Delta x_i$ iteratively. We then have Theorem 1 as follows.

Theorem 1. If the optimal $\Delta x_i = \delta_i v / \|v\|$ $(i = 1, 2, \ldots, |PD_c|)$ is fixed, the solution of problem (8) over $w$ and $\rho$ is equivalent to optimization problem (9).

This theorem indicates that we can relax the constraint $\|\Delta x_i\| \leq \delta_i$ in problem (8) to simplify the optimization.

So far, we have introduced the alternating approach that updates $\alpha$ and $\Delta x_i$ in one round; applying the above steps repeatedly yields an uncertain one-class classifier. By referring to the alternating optimization method in [22], we propose the iterative approach to solving problem (8) in Algorithm 1.

Algorithm 1. Uncertain one-class classifier construction.
Input: $PD_c$: positive set in the current chunk $D_c$;
       $v$: parameter of the one-class SVM;
       $\delta_i$: bound value for each sample.
Output: the hyperplane $\rho = w \cdot \Phi(x)$.
1: Initialize each $\Delta x_i = 0$;
2: $t = 0$;
3: Initialize $F_{val}(t) = \infty$;
4: repeat
5:   $t = t + 1$;
6:   Fix $\Delta x_i$ for $i = 1, 2, \ldots, |PD_c|$ and solve problem (10);
7:   Let $F_{val}(t) = F(\alpha)$;
8:   Obtain $\alpha_i$, $i = 1, 2, \ldots, |PD_c|$;
9:   Obtain $w = \sum_{i=1}^{|PD_c|}\alpha_i \Phi(x_i + \Delta x_i)$ and the hyperplane $\rho = w \cdot \Phi(x)$;
10:  Fix $w$ and solve optimization problem (8) to update each $\Delta x_i$ according to (13);
11: until $|F_{val}(t) - F_{val}(t-1)| < \varepsilon |F_{val}(t-1)|$
12: Return $\rho = w \cdot \Phi(x)$.

In Algorithm 1, $\varepsilon$ is a threshold. Since the value of $F_{val}(t)$ is nonnegative, as $F_{val}(t)$ decreases, $|F_{val}(t) - F_{val}(t-1)| / |F_{val}(t-1)|$ will become smaller than the threshold. Thus, Algorithm 1 converges in a finite number of steps.

After that, we obtain the learned uncertain one-class classifier $f_c = w \cdot \Phi(x)$ for prediction and the subsequent concept summarization learning.

Computation complexity analysis. The computation complexity of problem (10) is $O(|PD_c|^2)$. The update of $\Delta x_i$ in (13) needs only linear time, that is, $O(|PD_c|)$. Suppose the iterative approach stops after $m$ iterations. The computation complexity of solving problem (8) is then $m \cdot O(|PD_c|^2) + m \cdot O(|PD_c|)$.
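Below is a minimal sketch of the alternating scheme of Algorithm 1 under a simplifying assumption: a linear kernel is used so that w is explicit in the input space, and each $\Delta x_i$ is moved by $\delta_i$ along w. This is not the paper's exact update (13); it only illustrates the fix-α / fix-Δx alternation, and the names uocc_fit and the objective proxy are ours.

```python
# Minimal sketch (linear-kernel simplification) of the alternating optimization.
import numpy as np
from sklearn.svm import OneClassSVM

def uocc_fit(X, delta, nu=0.1, max_iter=10, tol=1e-3):
    dx = np.zeros_like(X)                  # Delta x_i initialised to the zero vector
    prev_obj = np.inf
    clf = None
    for _ in range(max_iter):
        clf = OneClassSVM(kernel="linear", nu=nu).fit(X + dx)   # solve (9)/(10) with fixed dx
        w = clf.coef_.ravel()
        obj = 0.5 * w @ w                  # proxy for F_val(t)
        # shift each x_i by delta_i along w, so x_i + dx_i best satisfies w.phi(x) >= rho
        dx = delta[:, None] * (w / np.linalg.norm(w))[None, :]
        if abs(prev_obj - obj) < tol * abs(prev_obj):
            break
        prev_obj = obj
    return clf

rng = np.random.RandomState(3)
X = rng.normal(size=(100, 2))
delta = np.full(len(X), 0.2)               # bound scores from Section 4.1.1
model = uocc_fit(X, delta)
print(model.predict(X[:5]))
```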

4.1.3 Ensemble Classifiers

After building uncertain one-class classifiers on the current and historical chunks, we obtain $f_{c-l+1}, f_{c-l+2}, \ldots, f_c$. To cope with concept drift in data streams, we combine these $l$ classifiers to form an ensemble classifier $f_E$ to predict the instances in the target chunk. By referring to the method in [55], we assign a weight value $g_i$ to each individual classifier when forming the ensemble classifier $f_E$.
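Below is a minimal sketch of combining the l most recent chunk classifiers into a weighted ensemble; the weighting rule shown (fixed illustrative weights) is assumed, not the exact scheme of [55].

```python
# Minimal sketch of a weighted ensemble of chunk classifiers f_{c-l+1}, ..., f_c.
import numpy as np
from sklearn.svm import OneClassSVM

def ensemble_predict(classifiers, weights, X):
    """Weighted vote of one-class decisions (+1 target / -1 nontarget)."""
    votes = np.array([g * clf.predict(X) for clf, g in zip(classifiers, weights)])
    return np.where(votes.sum(axis=0) >= 0, 1, -1)

rng = np.random.RandomState(4)
chunks = [rng.normal(size=(200, 2)) for _ in range(3)]
models = [OneClassSVM(kernel="rbf", nu=0.1, gamma=0.5).fit(c) for c in chunks]
weights = [0.2, 0.3, 0.5]      # g_i, e.g. derived from accuracy on the current chunk
print(ensemble_predict(models, weights, rng.normal(size=(5, 2))))
```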

4.1.4 Discussion

To handle uncertain data, we propose a kernel k-nearest-neighbors method to obtain the bound score and formulate the optimization problem (8). We then simplify (8) into an alternating framework: fix $\Delta x_i$ to solve the standard one-class SVM (9), and update $\Delta x_i$ based on (13), alternately. In this way, we deliver a solution that lets the one-class SVM handle uncertain data.

In the standard one-class SVM (2), although the choice of the parameter $v$ can refine the decision boundary of the one-class classifier, there is no explicit way to select $v$ for uncertain data. The modification of $v$ in (2) and the solution of (8) are different ways to refine the decision boundary. However, the optimization problem (8) is particularly designed for uncertain data.

4.2 Concept Summarization Learning

In data stream learning, it is necessary to know the concepts of the user and their relations from the history chunks. In this section, we put forward our support vector-based clustering method for concept summarization learning from data streams.

Intuitively, we can regard the data streams as a whole and conduct clustering algorithms on the stream, where each cluster denotes one concept of the user. After that, we can summarize the concept of the user by investigating which chunks share the same concept. However, this may take too much time, since it learns on the whole data stream, and data stream learning always requires a single scan of the data streams without referring to historical data. Another approach [59] uses the feature-based clustering (FBC) technique to summarize the concept of the user. It first extracts features from a data chunk and considers this chunk as a virtual sample represented by the extracted features; thus, the whole data stream is represented by a virtual sample set, in which each virtual sample represents one data chunk. In this process, it transforms the original feature values in each chunk into a histogram format, such that each chunk can be represented using histogram features.


In the second step, it conducts a clustering-based technique on the generated virtual sample set to group the virtual samples into several clusters, and each cluster denotes one concept. By referring to the chunk information, we can summarize the concept of the user.

However, representing each chunk as a virtual sample using histogram features discards much of the data information in the chunk, which may lead to low performance of the subsequent clustering. To cope with this challenge, we propose a support vectors-based clustering for concept summarization learning, in which each data chunk is represented by the support vectors of its uncertain one-class classifier. This operation is based on the characteristic of the one-class SVM that support vectors describe the structure of the data distribution. In all, our approach works in two steps, as illustrated in part two of Fig. 3.

1. In the first step, we represent each data chunk using the support vectors of the uncertain one-class classifier built on the corresponding data chunk.

2. In the second step, we cluster the history chunks, represented by support vectors, into clusters, where each cluster denotes one concept, and summarize the concept of the user.

In the following, we exhibit the two steps in detail.

4.2.1 Represent Chunk Data Using Support Vectors from Uncertain One-Class Classifiers

As discussed in Section 3.1, the one-class classifier is decided only by the support vectors, and the SVs can describe the data distribution of the target class, as illustrated in Fig. 2a. Stream learning always requires a single scan of the data streams without referring to historical data. In this case, we can utilize the support vectors of the one-class classifier to represent the corresponding data chunk. Since the support vectors are collected from the one-class classifier, which has to be stored for prediction anyway, we do not need additional storage for the chunk data.

For chunk $D_c$, the uncertain one-class classifier constructed in the second step of uncertain one-class learning is $f_c$; we collect the support vectors of $f_c$ and put them into the set $S_c$. Thus, we represent the chunk data by the support vectors derived from the uncertain one-class classifier.
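Below is a minimal sketch of representing each history chunk $D_i$ by the support vector set $S_i$ of the classifier fitted on it; the chunk data and parameter values are illustrative only.

```python
# Minimal sketch: each chunk is summarised by the support vectors of its classifier.
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.RandomState(5)
chunks = [rng.normal(loc=mu, size=(300, 2)) for mu in (0.0, 0.0, 5.0)]

sv_sets = []                                   # S_m, S_{m+1}, ..., S_n
for chunk in chunks:
    clf = OneClassSVM(kernel="rbf", nu=0.1, gamma=0.5).fit(chunk)
    sv_sets.append(clf.support_vectors_)       # chunk represented by its SVs only
print([s.shape for s in sv_sets])
```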

4.2.2 Extended K-Means Clustering Technique to Determine Concepts over the History Chunks

In this step, we extend the k-means clustering method to cluster the history chunks, represented by support vectors, into clusters, where each cluster denotes one concept of the user.

After collecting the support vectors of the uncertain one-class classifiers, we obtain

$$
S_m, S_{m+1}, \ldots, S_n, \qquad (14)
$$

where $m$ and $n$ are chunk numbers, which means we want to summarize the concepts from chunk $m$ to chunk $n$. We extend the original k-means clustering technique to the chunk data represented by support vectors. We consider the support vectors in each $S_i$ as one entity and extend k-means clustering to separate the $S_i$, $i = m, \ldots, n$, into several clusters. Assume there are $k$ clusters, each denoted by $\pi_t$; the extended k-means minimizes the following objective function:

$$
\min \; D\big(\{\pi_t\}_{t=1}^{k}\big) = \sum_{t=1}^{k}\sum_{S_i \in \pi_t}\sum_{x_j \in S_i}\|x_j - o_t\|^2, \qquad (15)
$$

and

$$
o_t = \frac{\sum_{S_i \in \pi_t}\sum_{x_j \in S_i} x_j}{|\pi_t|}, \qquad (16)
$$

where $o_t$ denotes the center of cluster $\pi_t$. Note that the solution of optimization problem (15) is similar to the original k-means method, as illustrated in Algorithm 2. After obtaining $k$ clusters, we regard them as $k$ concepts of the user, numbered from one to $k$. We then investigate which chunks share the same cluster to summarize the concept of the user from the data streams.

Algorithm 2. Extended k-means clustering method.
Input: $S_i$, $i = 1, 2, \ldots, n-m+1$: support vector sets;
       $k$: number of clusters;
       $\{\pi_t^{(0)}\}_{t=1}^{k}$: initial clusters.
Output: $\{\pi_t\}_{t=1}^{k}$: final clusters.
1: Initialize the $k$ clusters $\pi_1^{(0)}, \ldots, \pi_k^{(0)}$;
2: For each cluster $t$, compute its center according to (16);
3: For each entity $S_i$ and each cluster $t$, compute

$$
d(S_i, o_t) = \frac{1}{|S_i|}\sum_{x_j \in S_i}\|x_j - o_t\|; \qquad (17)
$$

4: Assign $S_i$ to the cluster $c$ that satisfies

$$
d(S_i, o_c) \leq d(S_i, o_t), \quad t = 1, \ldots, k; \qquad (18)
$$

5: Repeat steps 2 to 4 until the algorithm stops, i.e., the clusters do not change or a fixed number of rounds has been run, and output the final clusters $\{\pi_t\}_{t=1}^{k}$.

Discussion. In the FBC method [59], each chunk is represented as a virtual sample by transforming the original feature values in the chunk into a histogram format. In our method, we utilize the characteristic of the one-class SVM that the support vectors can describe the data distribution of the target class, and we use the support vectors to represent the chunk data. In addition, our method does not need additional storage for converted virtual samples. In the experiments, we will see that SVBC outperforms the FBC method in concept summarization learning.

5 PERFORMANCE EVALUATION

5.1 Baseline and Metrics

In this section, we investigate the performance of our proposed UOLCS framework in the data stream environment, i.e., uncertain one-class learning (UOCL) and the support vectors-based clustering method for concept summarization learning.

. For the investigation of uncertain one-class learning, two other methods are used as baselines. The first baseline is the original one-class SVM; since it was developed for static data, we embed it in the data stream environment and use an ensemble classifier to predict the yet-to-come data.


Since UOCL is developed on top of the standard one-class SVM, this first baseline is used to show the improvement of our uncertain one-class SVM over the original one-class SVM in handling uncertain data. The second baseline is the incremental one-class SVM (IOC-SVM) [31], which builds an online classifier for prediction. This baseline is used to compare the performance of the ensemble classifier and the incremental one-class classifier.

. For the comparison of concept summarization learning, feature-based clustering [59] is used as the baseline; it is a state-of-the-art technique for summarizing concepts from the history chunks. It transforms each chunk into a virtual sample and conducts the k-means technique on the virtual sample set for concept summarization.

For uncertain one-class learning, since the performance of classification systems is typically evaluated in terms of the F-measure [56], we use it as the metric. The F-measure trades off precision $p$ and recall $r$. Based on the confusion matrix shown in Table 1, precision and recall are defined as follows:

$$
Precision = \frac{TP}{TP + FP}, \qquad (19)
$$

$$
Recall = \frac{TP}{TP + FN}. \qquad (20)
$$

The F-measure is defined as

$$
F = \frac{2 \cdot Precision \cdot Recall}{Precision + Recall}. \qquad (21)
$$

From the definition, we know that only when both precision and recall are large will the F-measure exhibit a large value. For the theoretical basis of the F-measure, please refer to [56].
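Below is a minimal sketch computing precision, recall, and the F-measure, (19)-(21), from the confusion-matrix counts of Table 1; the counts shown are illustrative only.

```python
# Minimal sketch of the F-measure computation from confusion-matrix counts.
def f_measure(tp, fp, fn):
    precision = tp / (tp + fp)     # (19)
    recall = tp / (tp + fn)        # (20)
    return 2 * precision * recall / (precision + recall)   # (21)

print(f_measure(tp=80, fp=10, fn=20))   # ~0.842 for these illustrative counts
```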

For concept summarization learning, the same as the feature-based clustering method [59], we use accuracy as the metric for evaluating the summarized concepts of the user.

All the experiments are conducted on a laptop with a 2.8-GHz processor and 3-GB DRAM. For both UOCL and one-class SVM, we use LibSVM¹ to solve the standard QP problem. The RBF kernel function (1) is used in the experiments since it is the most common kernel function.

5.2 Stream Data Description

In the experiments, we use four real-world data stream data sets which have been previously studied by other researchers for data stream learning [5], [60], [59], [4]. The basic information of the data stream data sets is introduced as follows:

1. KDD-99²: This data set was collected from the KDD CUP challenge in 1999, and the task is to build predictive models capable of distinguishing between possible intrusions and normal connections. The original data (10 percent sampling) contain 41 features, 494,020 training and 311,029 test samples, and over 24 intrusion types. The KDD data has timestamp information associated with each record, so converting it into a data stream is rather straightforward. Among these types, there exist three large classes, namely the Normal, Neptune, and Smurf classes, which dominate the whole data set. We use these three large classes to generate a one-class data stream.

2. Sensor³: This data stream contains information (temperature, humidity, light, and sensor voltage) collected from 54 sensors deployed in the Intel Berkeley Research Lab, as shown in Fig. 6. The whole stream contains consecutive information recorded over a two-month period. The same as in previous work [60], we use the sensors from the four regions covered by ellipses in Fig. 6. The stream has four classes, in which the data from each region denotes one class, with 1,051,229 samples and three features.

3. Forest CoverType⁴: This data set contains 581,012 observations, and each observation consists of 54 attributes, including 10 quantitative variables, four binary wilderness areas, and 40 binary soil type variables. Seven classes exist: "Spruce-Fir," "Lodgepole Pine," "Ponderosa Pine," "Cottonwood/Willow," "Aspen," "Douglas-fir," and "Krummholz." We use the seven classes to generate one-class data streams.

4. Powersupply⁵: This data set contains the hourly power supply of an electricity company from two sources: power supplied from the main grid and power transformed from other grids. The learning task of


TABLE 1
Confusion Matrix

Fig. 6. The sensor distribution map of the Intel Berkeley Research Lab, where each number denotes a sensor location. The four ellipses covering sensors denote the sensor regions used.

1. Available at http://www.csie.ntu.edu.tw/~cjlin/libsvm/.

2. Available at http://kdd.ics.uci.edu/databases/kddcup99/kddcup99.html.

3. Available at http://db.csail.mit.edu/labdata/labdata.html.
4. Available at http://archive.ics.uci.edu/ml/datasets/Covertype.
5. Available at http://www.cse.fau.edu/xqzhu/stream.html.


this stream is to predict which hour (one of the 12 periods (0, 2], (2, 4], ..., (22, 24]) the current power supply belongs to. The whole stream contains 29,928 samples, each of which has four dimensions.

5.3 Experiment Setting

5.3.1 One-Class Data Streams

For each data set, we first randomly choose one class, regard it as the target class, and treat the other categories as the nontarget class. Take the CoverType data set for example: if we choose the "Spruce-Fir" category as the target class, then the nontarget class consists of "Lodgepole Pine," "Ponderosa Pine," "Cottonwood/Willow," "Aspen," "Douglas-fir," and "Krummholz." This kind of operation has been widely used in one-class learning on static data [45], [51], [37], [36], [57], [17] and stream data [34], [60], [59].

One characteristic of data streams is concept drift. Two scenarios, i.e., regular concept shifting (RCS) [60], [59] and probability concept shifting (PCS) [60], [59], are introduced into the data streams.

. In the regular shifting model (RSM) [59], a user regularly shifts the interest from one class to another class after a fixed number of chunks. We set the fixed number of chunks to 5 in the experiments.

. In the probability shifting model (PSM) [59], a user changes the interest with a probability; if the probability is larger than a threshold, the user changes the target class from one class to another. We set the threshold to 0.5 in the experiments.

Using the above operation, we obtain two data streams for each data set, called the RCS-based and PCS-based data streams (a small sketch of both shifting schemes is given after this paragraph). For fair comparison, both UOCL and one-class SVM use the same number of classifiers to form the ensembles. We omit the details of the algorithm performance with respect to the number of ensemble classifiers ($l$), because the impact of this factor has been addressed more or less in the stream data mining literature [55], [48], [19]. Instead, we use ensemble size $l = 10$ for all streams.
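Below is a minimal sketch of the two concept-shifting schemes; the class pool and chunk count are illustrative assumptions. The regular model switches the target class every five chunks, and the probability model switches whenever a random draw exceeds the threshold 0.5.

```python
# Minimal sketch of regular (RCS) and probability (PCS) concept shifting.
import numpy as np

def regular_shift(n_chunks, classes, period=5):
    return [classes[(c // period) % len(classes)] for c in range(n_chunks)]

def probability_shift(n_chunks, classes, threshold=0.5, seed=0):
    rng, cur, out = np.random.RandomState(seed), 0, []
    for _ in range(n_chunks):
        if rng.rand() > threshold:           # switch target class with this probability
            cur = (cur + 1) % len(classes)
        out.append(classes[cur])
    return out

classes = ["Normal", "Neptune", "Smurf"]      # e.g. the KDD-99 target classes
print(regular_shift(12, classes))
print(probability_shift(12, classes))
```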

For each chunk Dc, we randomly label a majority of the target class examples, i.e., approximately 90 percent of the target examples, and consider the remaining target class examples and the nontarget class examples as unlabeled examples. The number of D_c^k in the bound-score determination step is set as |PDc|/10, and ε is set as 0.15. The kernel parameter in the RBF kernel function (1) is varied from 2^-10 to 2^10 to obtain the optimized classifier.
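For reference, a minimal baseline corresponding to the standard one-class SVM under this setting might look as follows. This is a sketch using scikit-learn, not the authors' implementation; the nu value and the use of F-measure to pick the kernel parameter from the 2^-10 to 2^10 grid are our assumptions.

```python
import numpy as np
from sklearn.svm import OneClassSVM
from sklearn.metrics import f1_score

def fit_chunk_ocsvm(X_labeled_target, X_eval, y_eval, nu=0.1):
    """Grid-search the RBF kernel parameter on one chunk and keep the
    classifier with the best F-measure on the evaluation data."""
    best_f1, best_clf = -1.0, None
    for gamma in 2.0 ** np.arange(-10, 11):      # 2^-10, ..., 2^10
        clf = OneClassSVM(kernel="rbf", nu=nu, gamma=gamma)
        clf.fit(X_labeled_target)                # trained on labeled target data only
        pred = clf.predict(X_eval)               # +1 target, -1 nontarget
        score = f1_score(y_eval, pred, pos_label=1)
        if score > best_f1:
            best_f1, best_clf = score, clf
    return best_clf, best_f1
```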

5.3.2 Uncertain Information Generation

We note that the above data streams are deterministic, so we need to model and introduce uncertainty into these data sets. In the following, we describe the noise addition to PDc; the same operation can be applied to the other chunks.

Following the method in the previous work [5], we generate the noise using a Gaussian distribution with zero mean and a standard deviation chosen as follows: for each chunk of the data streams, we first compute the standard deviation $\sigma_i'$ of the entire data in PDc along the $i$th dimension, and then draw the standard deviation of the Gaussian noise $\sigma_i$ randomly from the range $[0,\, 2\cdot\eta\cdot\sigma_i']$, where $\eta$ denotes the noise level.

Specifically, the standard deviation $\sigma_i'$ of the entire data along the $i$th dimension is first obtained. To model the difference in noise on different dimensions, we define the standard deviation $\sigma_i$ along the $i$th dimension, whose value is randomly drawn from the range $[0,\, 2\cdot\eta\cdot\sigma_i']$. Then, for the $i$th dimension, we add noise from a random distribution with standard deviation $\sigma_i$. In this way, a data example $x_j$ in the target class is perturbed by noise, which can be presented as a vector

$$\Delta x_j = \big(\Delta x_{j1},\, \Delta x_{j2},\, \ldots,\, \Delta x_{j,r-1},\, \Delta x_{jr}\big). \qquad (22)$$

Here, $r$ denotes the number of dimensions of a data example $x_j$, and $\Delta x_{ji}$, $i = 1, \ldots, r$, represents the noise added into the $i$th dimension of the data example.
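The noise model above can be sketched as follows (our own illustration of the per-dimension Gaussian perturbation; the parameter eta stands for the noise level used in the experiments):

```python
import numpy as np

def add_uncertainty(X, eta=0.5, seed=0):
    """Perturb each dimension with zero-mean Gaussian noise whose standard
    deviation is drawn uniformly from [0, 2 * eta * std of that dimension]."""
    rng = np.random.default_rng(seed)
    sigma_prime = X.std(axis=0)                        # per-dimension std of the chunk
    sigma = rng.uniform(0.0, 2.0 * eta * sigma_prime)  # one sigma_i per dimension
    noise = rng.normal(0.0, sigma, size=X.shape)       # the noise vector in (22)
    return X + noise, noise
```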

5.4 Comparison of Uncertain One-Class Learning and One-Class SVM

We investigate the performance, performance sensitivity, and efficiency of uncertain one-class learning, one-class SVM, and incremental one-class SVM as follows.

5.4.1 Performance Comparison with Different Chunk Sizes

We first investigate the performance of UOCL, one-class SVM, and incremental one-class SVM on the RCS-based and PCS-based uncertain data streams with different chunk sizes from 300 to 1,400, in which the noise level is set to 0.5. The average F-measure value over forty chunks of the data streams is reported in Fig. 7, in which the x-axis illustrates the chunk size of the data streams, and the y-axis denotes the average F-measure value for each method.

It is clear that in this case, UOCL provides superior performance compared with one-class SVM and incremental one-class SVM on the RCS-based and PCS-based uncertain data streams. This occurs because UOCL, developed on top of the standard one-class SVM, can reduce the effect of the noise on the decision boundary, so that we can build a more accurate classifier on the uncertain data streams; since one-class SVM and incremental one-class SVM do not handle the uncertain information in the data, they perform worse than UOCL. In addition, we find that the ensemble one-class SVM performs slightly better than the incremental one-class SVM, since an ensemble classifier can handle concept drift well, as has been investigated in previous work [19], [55]. On the other hand, incremental one-class SVM adds the recently arrived data into the model to modify the decision boundary. However, the streaming data are not generated by a stationary stochastic process; indeed, the future examples we need to classify may have a very different distribution from the historical data [55]. As a result, the ensemble classifier performs better than the incremental classifier. In addition, we discover that, as the chunk size increases, the performance of the three methods improves simultaneously. This is because a larger chunk size offers more information about the data streams and therefore delivers a more accurate classifier.

Above, we have illustrated that the average F-measure value of UOCL over 40 chunks is higher than those of one-class SVM and incremental one-class SVM. Taking the KDD-RCS and KDD-PCS data streams as examples, we illustrate the F-measure on each data chunk in Fig. 8, in which the chunk size is fixed at 1,000. We find that UOCL performs better than one-class SVM and incremental one-class SVM on each data chunk, since UOCL can reduce the effect of the noise on the decision boundary. In addition, we observe the same phenomenon for the KDD, Sensor, CoverType, and Powersupply data streams.
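For completeness, the chunk-wise F-measure reported in Figs. 7 and 8 can be computed from a confusion matrix like Table 1 as follows (a minimal sketch of the evaluation we assume is used; the classifier and chunk variables are placeholders):

```python
import numpy as np

def f_measure(y_true, y_pred):
    """F-measure on the target (+1) class: harmonic mean of precision and recall."""
    tp = np.sum((y_pred == 1) & (y_true == 1))
    fp = np.sum((y_pred == 1) & (y_true == -1))
    fn = np.sum((y_pred == -1) & (y_true == 1))
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def average_f_measure(chunks, classifiers):
    """Average of the chunk-wise F-measure values over a stream."""
    scores = [f_measure(y, clf.predict(X)) for (X, y), clf in zip(chunks, classifiers)]
    return float(np.mean(scores))
```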

5.4.2 Performance Sensitivity on Different Noise Levels

We investigate the performance sensitivity of the three methods at different noise levels, in which the chunk size is set to 1,000. In the RCS-based and PCS-based uncertain data streams, we increase the noise level from 0.5 to 2. In Fig. 9, we illustrate the variation in effectiveness with increasing noise. On the x-axis, we illustrate the noise level. On the y-axis, we illustrate the average F-measure value over forty chunks. It is clear that in each case, the F-measure value decreases with the increasing noise level of the underlying data streams. This occurs because, when the level of noise increases, the target class potentially becomes less distinguishable from the nontarget class. However, we can clearly see that our UOCL approach can still consistently yield a higher F-measure value than one-class SVM and incremental one-class SVM. This indicates that the UOCL method can reduce the effect of noise.

We further discover that, as the noise level increases, the performance of UOCL decreases more slowly than that of one-class SVM and incremental one-class SVM. This is because UOCL has the ability to reduce the effect of noise on the decision boundary by incorporating the uncertainty information into the learning phase. Consequently, UOCL is demonstrated to be robust to noise compared with one-class SVM and incremental one-class SVM.

5.4.3 Efficiency Comparison

So far we have investigated the performance of the UOCL, one-class SVM, and incremental one-class SVM methods and their sensitivity to the noise level. However, it is still interesting to know the efficiency of the three methods.

Fig. 7. The performance of the UOCL and one-class SVM methods at different chunk sizes.

Fig. 8. The F-measure performance of the three methods on each data chunk.

To test the efficiency of the UOCL method, we refer to the method in [5] and perform UOCL, one-class SVM, and incremental one-class SVM on the PCS-based uncertain data streams, in which the noise level is set to 0.5 and the chunk size is set to 1,000. Fig. 10 illustrates the efficiency of UOCL and one-class SVM on the four data sets. On the x-axis, we illustrate the progression of the data streams in terms of the number of samples, whereas on the y-axis, we illustrate the number of samples processed per second at particular points of the data stream progression. This number is computed as the average number of samples processed per second. From these subfigures, we find that the original one-class SVM always obtains a higher processing speed because it does not mitigate the effect of uncertain data on the classifier construction. Consequently, one-class SVM offers higher processing speed yet inferior performance to UOCL. We further discover that, although UOCL uses an iterative framework to address data uncertainty, the UOCL method is able to process thousands of samples per second in each case. Thus, the UOCL method is not only effective, but is also an efficient one-class method for uncertain data streams. In addition, incremental one-class SVM has a slower processing speed than the standard one-class SVM, since the standard one-class SVM only solves a one-class SVM classifier on each data chunk, while incremental one-class SVM adds the chunk data to the classifier model; consequently, the ensemble one-class SVM performs much faster than the incremental one-class SVM.
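The throughput measurement can be sketched as follows (our own illustration of how samples processed per second might be measured; the chunk structure and model interface are assumptions):

```python
import time

def samples_per_second(fit_on_chunk, chunks):
    """Average number of samples processed per second over the chunks,
    as plotted on the y-axis of Fig. 10."""
    rates = []
    for X, _ in chunks:
        start = time.perf_counter()
        fit_on_chunk(X)                        # train/update the model on this chunk
        elapsed = time.perf_counter() - start
        rates.append(len(X) / max(elapsed, 1e-9))
    return sum(rates) / len(rates)
```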

5.5 Comparison of Concept Summarization Learning

In this section, we first investigate the performance of the modified k-means clustering method, and then compare the performance of the support vectors-based clustering technique and the feature-based clustering method for concept summarization.

5.5.1 Performance of Modified K-Means Clustering Method

In SVBC, we put forward the modified k-means clustering method to assign the data into clusters. We first investigate the performance of the modified k-means clustering by using five UCI data sets. The basic information of the UCI data sets is listed in Table 2.

The modified k-means clustering method assigns a group of data to a cluster as a whole. We set up the experiments as follows: for each class of a data set, we randomly split the class data into ten groups; we then obtain 10 × C groups of data, in which C is the number of classes. We then perform the modified k-means clustering method to assign each group of data into one of several clusters, in which we set the number of clusters as the number of classes. Fig. 11 illustrates the accuracy of the proposed modified k-means clustering method. We find that the proposed method can correctly assign a group of data into a cluster and obtains a comparably high accuracy.
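The exact modified k-means procedure is given earlier in the paper; as a rough sketch of the group-as-a-whole idea (our own simplified variant, not necessarily the authors' formulation), each group can be assigned to the cluster whose center minimizes the summed distance of the group's members, with centers recomputed from all assigned points:

```python
import numpy as np

def group_kmeans(groups, k, n_iter=50, seed=0):
    """k-means-style clustering in which every group of points is assigned
    to a single cluster as a unit."""
    rng = np.random.default_rng(seed)
    all_points = np.vstack(groups)
    centers = all_points[rng.choice(len(all_points), size=k, replace=False)]
    assign = np.zeros(len(groups), dtype=int)
    for _ in range(n_iter):
        for g, pts in enumerate(groups):
            # summed distance of the group's members to each cluster center
            cost = [np.linalg.norm(pts - c, axis=1).sum() for c in centers]
            assign[g] = int(np.argmin(cost))
        for j in range(k):
            members = [groups[g] for g in range(len(groups)) if assign[g] == j]
            if members:                          # keep the old center if a cluster is empty
                centers[j] = np.vstack(members).mean(axis=0)
    return assign, centers
```

A usage example on the UCI setting above would pass the 10 × C randomly formed groups and set k to the number of classes.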


Fig. 10. Efficiency comparison of the UOCL and one-class SVM methods, in which the noise level is 0.5.

Fig. 9. The performance of the UOCL and one-class SVM methods as the noise level increases.

TABLE 2
Information of UCI Data Sets

Fig. 11. Accuracy of modified k-means clustering method.


5.5.2 Performance Comparison for Concept Summarization

So far, we have investigated the performance of the modified k-means clustering method; we now compare the performance of the support vectors-based clustering technique and the feature-based clustering method for concept summarization. Although SVBC and FBC both perform the k-means method to cluster chunk data into clusters, the FBC method converts each data chunk into a virtual sample, while our SVBC utilizes the support vectors of the one-class classifier to represent the chunk data. We use the PCS-based data streams in this set of experiments, and summarize the concepts of the user among one hundred data chunks. In terms of the number k in k-means clustering, the same as FBC, we assume the number of user concepts is known. The comparison of concept summarization is illustrated in Fig. 12.

In Fig. 12, we discover that support vectors-based clustering performs significantly better than feature-based clustering. This is because the FBC method converts each data chunk into a virtual sample by transforming the original feature values in each chunk into a histogram format, and thus loses much of the information in the chunk data. In contrast, SVBC utilizes the support vectors of the one-class classifier, which describe the data distribution of the target class, to represent the chunk data. As a result, SVBC significantly outperforms the FBC method in concept summarization learning. For each data stream, we also illustrate the prediction accuracy for each concept in Fig. 14. It is noted that the SVBC method always outperforms the FBC method for each concept prediction.
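Putting the pieces together, a compact sketch of the SVBC pipeline might look as follows (our own illustration that reuses the group_kmeans sketch from Section 5.5.1; the nu and kernel settings are assumptions):

```python
from sklearn.svm import OneClassSVM

def svbc_summarize(chunk_targets, n_concepts, nu=0.1, gamma="scale"):
    """Represent each history chunk by the support vectors of its one-class
    classifier, then cluster the chunks into concepts as whole groups."""
    sv_sets = []
    for X_target in chunk_targets:               # labeled target data of each chunk
        clf = OneClassSVM(kernel="rbf", nu=nu, gamma=gamma).fit(X_target)
        sv_sets.append(clf.support_vectors_)     # compact description of the chunk
    assign, _ = group_kmeans(sv_sets, k=n_concepts)   # from the Section 5.5.1 sketch
    return assign
```

Chunks assigned to the same cluster are then summarized as carrying the same user concept.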

In addition, taking the KDD-based one-class streams as an example, we utilize Fig. 13 to illustrate the concept drift of the user over 30 chunks out of the one hundred history chunks using the SVBC method. In the figure, if a concept is mis-summarized, we illustrate the real concept using the pink square.

6 CONCLUSION AND FUTURE WORK

In this paper, we propose a new framework for one-class learning and concept summarization learning on one-class-based uncertain data streams. Our proposed framework consists of two parts. First, we generate a bound score to capture the local uncertainty based on each example's local data behavior, and then build an uncertain one-class classifier by incorporating the uncertainty information into a one-class SVM-based learning framework. Second, we develop a support vectors-based clustering method to summarize the concept of the user over the history chunks. Extensive experiments have shown that our uncertain one-class learning can obtain better performance and is less sensitive to noise in comparison with the standard one-class SVM. The experiments also indicate that the support vectors-based clustering method can summarize the concept of the user well, in comparison with the feature-based clustering technique for concept summarization learning.

In this paper, we summarize the discrete jumps of concepts in the data stream environment. In the future, we would like to summarize regular concept drift in data streams and show the transition region of concepts. In addition, we would like to investigate how to design better methods to generate bound scores based on the data characteristics in a given application domain.

Fig. 12. Comparison of concept summarization learning between the feature-based clustering and support vectors-based clustering methods.

Fig. 13. Illustration of the concept drift of the user over 30 chunks of the data streams.

Fig. 14. Prediction accuracy for each concept in the data streams.

APPENDIX

In this section, we present the detailed derivations of the lemmas and theorems in this paper.

Proof of Lemma 1

If we fix each $\Delta x_i$ at a small value with $\|\Delta x_i\| < \delta_i$, the constraint $\|\Delta x_i\| \le \delta_i$ in problem (8) will not have any impact on this optimization problem, since $\|\Delta x_i\|$ is already less than $\delta_i$. Thus, we can delete the constraint $\|\Delta x_i\| \le \delta_i$ from problem (8), and then use the optimization problem (9) to replace optimization problem (8). □

Proof of Lemma 2

Let $x_i' = x_i + \Delta x_i$, and introduce multipliers $\alpha_i$ and $\beta_i$ into optimization problem (9); we then have the following Lagrangian function:

$$L = \frac{1}{2}\|w\|^2 - \rho + \frac{1}{\nu\cdot|PD_c|}\sum_{i=1}^{|PD_c|}\xi_i - \sum_{i=1}^{|PD_c|}\alpha_i\big(w\cdot\phi(x_i') - \rho + \xi_i\big) - \sum_{i=1}^{|PD_c|}\beta_i\xi_i. \qquad (23)$$

Letting $\frac{dL}{dw} = 0$, $\frac{dL}{d\rho} = 0$, and $\frac{dL}{d\xi_i} = 0$, we have

$$w = \sum_{i=1}^{|PD_c|}\alpha_i\,\phi(x_i'), \qquad (24)$$

$$\sum_{i=1}^{|PD_c|}\alpha_i = 1, \qquad (25)$$

$$\alpha_i + \beta_i = \frac{1}{\nu\cdot|PD_c|}. \qquad (26)$$

Since $\beta_i \ge 0$, we have

$$0 \le \alpha_i \le \frac{1}{\nu\cdot|PD_c|}. \qquad (27)$$

By substituting (24) and (25) into optimization problem (9), we have

$$\min \; \frac{1}{2}\sum_{i=1}^{|PD_c|}\sum_{j=1}^{|PD_c|}\alpha_i\,\alpha_j\,K(x_i',\,x_j'),$$

subject to (25) and (27). Substituting $x_i'$ with $x_i + \Delta x_i$ in the above formulas, we have Lemma 2. □

Proof of Lemma 3

If the hyperplane is fixed in problem (8), that is,

$$w\cdot\phi(x) = \sum_{j=1}^{|PD_c|}\alpha_j\,K(x_j + \Delta x_j,\,x), \qquad (29)$$

where $w = \sum_{j=1}^{|PD_c|}\alpha_j\,\phi(x_j + \Delta x_j)$, then the optimization of problem (8) over $\Delta x_i$ is equivalent to the minimization of $\sum_{i=1}^{|PD_c|}\xi_i$ over each $\Delta x_i$. We assume that each noise vector $\Delta x_i$ only corrupts sample $x_i$ and does not affect other instances. Consequently, $\Delta x_i$ only has an impact on $\xi_i$. The optimization of $\sum_{i=1}^{|PD_c|}\xi_i$ can therefore be divided into the minimization of each $\xi_i$, $i = 1, 2, \ldots, |PD_c|$.

For the minimization of each $\xi_i$, from the first constraint of problem (8), we have

$$\xi_i = \max\!\Big(0,\; \rho - \sum_{j=1}^{|PD_c|}\alpha_j\,K(x_i + \Delta x_i,\,x_j + \Delta x_j)\Big). \qquad (30)$$

Using a Taylor expansion [40], we have

$$\sum_{j=1}^{|PD_c|}\alpha_j\,K(x_i + \Delta x_i,\,x_j + \Delta x_j) = \sum_{j=1}^{|PD_c|}\alpha_j\,K(x_i,\,x_j + \Delta x_j) + \Delta x_i\sum_{j=1}^{|PD_c|}\alpha_j\,K'(x_i,\,x_j + \Delta x_j). \qquad (31)$$

Let $v = \sum_{j=1}^{|PD_c|}\alpha_j\,K'(x_i,\,x_j + \Delta x_j)$; thus (30) equals

$$\xi_i = \max\big(0,\; \rho - w\cdot\phi(x_i) - v\cdot\Delta x_i\big). \qquad (32)$$

By using the Cauchy-Schwarz inequality [15], we have

$$|v\cdot\Delta x_i| \le \|v\|\cdot\|\Delta x_i\|. \qquad (33)$$

The equality holds only if $\Delta x_i = c\,v$, where $c$ is a scalar. We then have

$$-\|v\|\cdot\|\Delta x_i\| \le v\cdot\Delta x_i \le \|v\|\cdot\|\Delta x_i\|. \qquad (34)$$

Since

$$\|\Delta x_i\| \le \delta_i, \qquad (35)$$

we then have the optimal

$$\Delta x_i = \frac{\delta_i\,v}{\|v\|}. \qquad (36)$$

□
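The optimality argument in (33)-(36) can also be checked numerically with a small script (our own sanity check, not part of the original derivation):

```python
import numpy as np

# Among all perturbations with norm at most delta_i, the one aligned with v
# maximizes v . dx (equations (33)-(36)); hence it minimizes xi_i in (32).
rng = np.random.default_rng(0)
v = rng.normal(size=5)
delta_i = 0.3
dx_star = delta_i * v / np.linalg.norm(v)            # the optimal perturbation (36)
random_dx = rng.normal(size=(1000, 5))
random_dx *= delta_i / np.linalg.norm(random_dx, axis=1, keepdims=True)
assert np.all(random_dx @ v <= v @ dx_star + 1e-9)   # no feasible dx does better
print(np.linalg.norm(dx_star))                       # equals delta_i, as in (38)
```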

Proof of Theorem 1

In the optimization problem (8), the constraint $\|\Delta x_i\| \le \delta_i$ is used to bound the range of the noise vector $\Delta x_i$. In other words, for a given noise vector $\Delta x_i$, if $\|\Delta x_i\|$ is already less than or equal to $\delta_i$, i.e., $\|\Delta x_i\| \le \delta_i$ holds, the constraint $\|\Delta x_i\| \le \delta_i$ in problem (8) is inactive and has no impact on the optimization problem (8).

For an optimal $\Delta x_i$,

$$\Delta x_i = \frac{\delta_i\,v}{\|v\|}. \qquad (37)$$

Since

$$\|\Delta x_i\| = \frac{\delta_i\,\|v\|}{\|v\|} = \delta_i, \qquad (38)$$

the constraint $\|\Delta x_i\| \le \delta_i$ in problem (8) will not have any impact on this optimization problem. We can therefore delete the constraint $\|\Delta x_i\| \le \delta_i$ from problem (8), and then use the optimization problem (9) to calculate the updated $\alpha$. □

ACKNOWLEDGMENTS

Yanshan Xiao is the corresponding author for this paper. The authors would like to thank the anonymous reviewers for their very useful comments and suggestions. This work is supported in part by the US National Science Foundation through grants IIS-0905215, CNS-1115234, IIS-0914934, DBI-0960443, and OISE-1129076, US Department of Army through grant W911NF-12-1-0066, Google Mobile 2014 Program and KAU grant, Natural Science Foundation of China (61070033, 61203280, 61202270), Guangdong Natural Science Funds for Distinguished Young Scholar (S2013050014133), Natural Science Foundation of Guangdong province (9251009001000005, S2011040004187, S2012040007078), Specialized Research Fund for the Doctoral Program of Higher Education (20124420120004), Scientific Research Foundation for the Returned Overseas Chinese Scholars, State Education Ministry, Overseas Outstanding Doctoral Fund (405120095), Australian Research Council Discovery Grant (DP1096218, DP130102691), and ARC Linkage Grant (LP100200774, LP120100566).

REFERENCES

[1] C.C. Aggarwal, "On Density Based Transforms for Uncertain Data Mining," Proc. IEEE Int'l Conf. Data Eng., pp. 866-875, 2007.
[2] C.C. Aggarwal, Managing and Mining Uncertain Data. Springer, 2009.
[3] C.C. Aggarwal, J. Han, J. Wang, and P.S. Yu, "A Framework for Clustering Evolving Data Streams," Proc. Int'l Conf. Very Large Data Bases, pp. 81-92, 2003.
[4] C.C. Aggarwal, Y. Xie, and P.S. Yu, "On Dynamic Data-Driven Selection of Sensor Streams," Proc. 17th ACM SIGKDD Int'l Conf. Knowledge Discovery and Data Mining (KDD), pp. 1226-1234, 2011.
[5] C.C. Aggarwal and P.S. Yu, "A Framework for Clustering Uncertain Data Streams," Proc. IEEE Int'l Conf. Data Eng., pp. 150-159, 2008.
[6] C.C. Aggarwal and P.S. Yu, "Outlier Detection with Uncertain Data," Proc. SIAM Conf. Data Mining, pp. 483-493, 2008.
[7] C.C. Aggarwal and P.S. Yu, "A Survey of Uncertain Data Algorithms and Applications," IEEE Trans. Knowledge and Data Eng., vol. 21, no. 5, pp. 609-623, May 2009.
[8] J. Bi and T. Zhang, "Support Vector Machines with Input Data Uncertainty," Proc. Conf. Neural Information Processing Systems (NIPS), 2004.
[9] F. Bonchi, M.V. Leeuwen, and A. Ukkonen, "Characterizing Uncertain Data Using Compression," Proc. SIAM Conf. Data Mining, pp. 534-545, 2011.
[10] F. Bovolo, G. Camps-Valls, and L. Bruzzone, "A Support Vector Domain Method for Change Detection in Multitemporal Images," Pattern Recognition Letters, vol. 31, no. 10, pp. 1148-1154, 2010.
[11] G. Cauwenberghs and T. Poggio, "Incremental and Decremental Support Vector Machine Learning," Proc. Conf. Neural Information Processing Systems (NIPS), pp. 409-415, 2001.
[12] L. Chen and C. Wang, "Continuous Subgraph Pattern Search over Certain and Uncertain Graph Streams," IEEE Trans. Knowledge and Data Eng., vol. 22, no. 8, pp. 1093-1109, Aug. 2010.
[13] C.K. Chui and B. Kao, "A Decremental Approach for Mining Frequent Itemsets from Uncertain Data," Proc. Pacific-Asia Conf. Knowledge Discovery and Data Mining, pp. 64-75, 2008.
[14] P. Domingos and G. Hulten, "Mining High-Speed Data Streams," Proc. Sixth ACM SIGKDD Int'l Conf. Knowledge Discovery and Data Mining, pp. 71-80, 2000.
[15] S.S. Dragomir, "A Survey on Cauchy-Bunyakovsky-Schwarz Type Discrete Inequalities," J. Inequalities in Pure and Applied Math., vol. 4, no. 3, p. 142, 2003.
[16] M. Ester, H.P. Kriegel, J. Sander, and X. Xu, "A Density-Based Algorithm for Discovering Clusters in Large Spatial Databases with Noise," Proc. ACM SIGKDD Int'l Conf. Knowledge Discovery and Data Mining, pp. 226-231, 1996.
[17] G.P.C. Fung, J.X. Yu, H. Lu, and P.S. Yu, "Text Classification without Negative Examples Revisit," IEEE Trans. Knowledge and Data Eng., vol. 18, no. 6, pp. 6-20, Jan. 2006.
[18] C. Gao and J. Wang, "Direct Mining of Discriminative Patterns for Classifying Uncertain Data," Proc. Sixth ACM SIGKDD Int'l Conf. Knowledge Discovery and Data Mining, pp. 861-870, 2010.
[19] J. Gao, W. Fan, J. Han, and P.S. Yu, "A General Framework for Mining Concept-Drifting Data Streams with Skewed Distributions," Proc. SIAM Conf. Data Mining, 2007.
[20] B. Geng, L. Yang, C. Xu, and X. Hua, "Ranking Model Adaptation for Domain-Specific Search," IEEE Trans. Knowledge and Data Eng., vol. 24, no. 4, pp. 745-758, Apr. 2012.
[21] S. Hido, Y. Tsuboi, H. Kashima, M. Sugiyama, and T. Kanamori, "Statistical Outlier Detection Using Direct Density Ratio Estimation," Knowledge and Information Systems, vol. 26, no. 2, pp. 309-336, 2011.
[22] S.V. Huffel and J. Vandewalle, The Total Least Squares Problem: Computational Aspects and Analysis. SIAM Press, 1991.
[23] S.R. Gunn and J. Yang, "Exploiting Uncertain Data in Support Vector Classification," Proc. 14th Int'l Conf. Knowledge-Based and Intelligent Information and Eng. Systems, pp. 148-155, 2007.
[24] B. Jiang, M. Zhang, and X. Zhang, "OSCAR: One-Class SVM for Accurate Recognition of CIS-Elements," Bioinformatics, vol. 23, no. 21, pp. 2823-2828, 2007.
[25] R. Jin, L. Liu, and C.C. Aggarwal, "Discovering Highly Reliable Subgraphs in Uncertain Graphs," Proc. ACM SIGKDD Int'l Conf. Knowledge Discovery and Data Mining, pp. 992-1000, 2011.
[26] B. Li, K. Goh, and E. Chang, "Using One-Class and Two-Class SVMs for Multiclass Image Annotation," IEEE Trans. Knowledge and Data Eng., vol. 17, no. 10, pp. 1333-1346, Oct. 2005.
[27] B. Kao, S.D. Lee, F.K.F. Lee, D.W. Cheung, and W. Ho, "Clustering Uncertain Data Using Voronoi Diagrams and R-Tree Index," IEEE Trans. Knowledge and Data Eng., vol. 22, no. 9, pp. 1219-1233, Sept. 2010.
[28] M. Koppel and J. Schler, "Authorship Verification as a One-Class Classification Problem," Proc. 21st Int'l Conf. Machine Learning (ICML), pp. 62-68, 2004.
[29] H.P. Kriegel and M. Pfeifle, "Density-Based Clustering of Uncertain Data," Proc. ACM SIGKDD Int'l Conf. Knowledge Discovery and Data Mining, pp. 672-677, 2005.
[30] H.P. Kriegel and M. Pfeifle, "Hierarchical Density Based Clustering of Uncertain Data," Proc. IEEE Int'l Conf. Data Eng., pp. 689-692, 2005.
[31] P. Laskov, C. Gehl, S. Kruger, and K. Muller, "Incremental Support Vector Learning: Analysis, Implementation and Applications," J. Machine Learning Research, vol. 7, no. 12, pp. 1909-1936, 2006.
[32] J. Li, L. Su, and C. Cheng, "Finding Pre-Images via Evolution Strategies," Applied Soft Computing, vol. 11, no. 6, pp. 4183-4194, 2011.
[33] X. Li and B. Liu, "Learning to Classify Texts Using Positive and Unlabeled Data," Proc. Int'l Joint Conf. Artificial Intelligence (IJCAI), pp. 587-592, 2003.
[34] X. Li, P.S. Yu, B. Liu, and S. Ng, "Positive Unlabeled Learning for Data Stream Classification," Proc. SIAM Conf. Data Mining, pp. 257-268, 2009.
[35] X. Lian and L. Chen, "Similarity Join Processing on Uncertain Data Streams," IEEE Trans. Knowledge and Data Eng., vol. 23, no. 11, pp. 1718-1734, Nov. 2011.
[36] B. Liu, Y. Dai, X. Li, W.S. Lee, and P.S. Yu, "Building Text Classifiers Using Positive and Unlabeled Examples," Proc. IEEE Conf. Data Mining, pp. 179-186, 2003.
[37] B. Liu, W.S. Lee, P.S. Yu, and X. Li, "Partially Supervised Classification of Text Documents," Proc. Int'l Conf. Machine Learning (ICML), pp. 387-394, 2002.
[38] B. Liu, Y. Xiao, L. Cao, and P.S. Yu, "Vote-Based LELC for Positive and Unlabeled Textual Data Streams," Proc. IEEE Int'l Conf. Data Mining Workshops (ICDM), pp. 951-958, 2010.
[39] L. Manevitz and M. Yousef, "One-Class SVMs for Document Classification," J. Machine Learning Research, vol. 2, pp. 139-154, 2002.
[40] M. Kline, Calculus: An Intuitive and Physical Approach. Dover, 1998.
[41] R. Murthy, R. Ikeda, and J. Widom, "Making Aggregation Work in Uncertain and Probabilistic Databases," IEEE Trans. Knowledge and Data Eng., vol. 22, no. 8, pp. 1261-1273, Aug. 2011.
[42] W. Ngai, B. Kao, C. Chui, R. Cheng, M. Chau, and K.Y. Yip, "Efficient Clustering of Uncertain Data," Proc. IEEE Int'l Conf. Data Mining, pp. 436-445, 2006.
[43] R. Perdisci, G. Gu, and W. Lee, "Using an Ensemble of One-Class SVM Classifiers to Harden Payload-Based Anomaly Detection Systems," Proc. IEEE Conf. Data Mining, pp. 488-498, 2006.
[44] A.D. Sarma, O. Benjelloun, A. Halevy, and J. Widom, "Working Models for Uncertain Data," Proc. IEEE Int'l Conf. Data Eng., pp. 163-174, 2006.
[45] B. Scholkopf, J. Platt, J.S. Taylor, A.J. Smola, and R. Williamson, "Estimating the Support of a High-Dimensional Distribution," Neural Computation, vol. 13, pp. 1443-1471, 2001.
[46] B. Scholkopf, R.C. Williamson, A. Smola, and J.S. Taylor, "SV Estimation of a Distribution's Support," Proc. Conf. Neural Information Processing Systems (NIPS), 1999.
[47] P.K. Shivaswamy, C. Bhattacharyya, and A.J. Smola, "Second Order Cone Programming Approaches for Handling Missing and Uncertain Data," J. Machine Learning Research, vol. 7, pp. 1283-1314, 2006.
[48] W. Street and Y. Kim, "A Streaming Ensemble Algorithm (SEA) for Large-Scale Classification," Proc. ACM SIGKDD Int'l Conf. Knowledge Discovery and Data Mining, pp. 377-382, 2001.
[49] L. Sun, R. Cheng, D.W. Cheung, and J. Cheng, "Mining Uncertain Data with Probabilistic Guarantees," Proc. ACM SIGKDD Int'l Conf. Knowledge Discovery and Data Mining, pp. 273-282, 2010.
[50] M. Takruri, S. Rajasegarar, S. Challa, C. Leckie, and M. Palaniswami, "Spatio-Temporal Modelling-Based Drift-Aware Wireless Sensor Networks," Wireless Sensor Systems, vol. 1, no. 2, pp. 110-122, 2011.
[51] D.M.J. Tax and R.P.W. Duin, "Support Vector Data Description," Machine Learning, vol. 54, no. 1, pp. 45-66, 2004.
[52] L. Trung, T. Dat, N. Phuoc, M. Wanli, and D. Sharma, "Multiple Distribution Data Description Learning Method for Novelty Detection," Proc. Int'l Joint Conf. Neural Networks (IJCNN), pp. 2321-2326, 2011.
[53] S. Tsang, B. Kao, K.Y. Yip, W.S. Ho, and S.D. Lee, "Decision Trees for Uncertain Data," IEEE Trans. Knowledge and Data Eng., vol. 23, no. 1, pp. 64-78, Jan. 2011.
[54] V. Vapnik, Statistical Learning Theory. Springer-Verlag, 1998.
[55] H. Wang, W. Fan, P.S. Yu, and J. Han, "Mining Concept-Drifting Data Streams Using Ensemble Classifiers," Proc. ACM SIGKDD Int'l Conf. Knowledge Discovery and Data Mining, pp. 226-235, 2003.
[56] J. William and M. Shaw, "On the Foundation of Evaluation," Am. Soc. for Information Science, vol. 37, no. 5, pp. 346-348, 1986.
[57] H. Yu, J. Han, and K.C.C. Chang, "PEBL: Web Page Classification without Negative Examples," IEEE Trans. Knowledge and Data Eng., vol. 16, no. 1, pp. 70-81, Jan. 2004.
[58] S.M. Yuen, Y. Tao, X. Xiao, J. Pei, and D. Zhang, "Superseding Nearest Neighbor Search on Uncertain Spatial Databases," IEEE Trans. Knowledge and Data Eng., vol. 22, no. 7, pp. 1041-1055, July 2010.
[59] X. Zhu, W. Ding, P.S. Yu, and C. Zhang, "One-Class Learning and Concept Summarization for Data Streams," Knowledge and Information Systems, vol. 28, no. 3, pp. 523-553, 2011.
[60] X. Zhu, X. Wu, and C. Zhang, "Vague One-Class Learning for Data Streams," Proc. IEEE Conf. Data Mining, pp. 657-666, 2009.
[61] Z. Zou, H. Gao, and J. Li, "Discovering Frequent Subgraphs over Uncertain Graph Databases under Probabilistic Semantics," Proc. ACM SIGKDD Int'l Conf. Knowledge Discovery and Data Mining, pp. 633-642, 2010.

Bo Liu is with the Department of Automation, Guangdong University of Technology, and the Department of Computer Science, University of Illinois at Chicago. His research interests include machine learning and data mining. He has published papers in IEEE Transactions on Neural Networks, IEEE Transactions on Knowledge and Data Engineering, Knowledge and Information Systems, IJCAI, ICDM, SDM, and CIKM.

Yanshan Xiao is with the Department of Computer Science, Guangdong University of Technology. Her research interests include multi-instance learning and data mining.

Philip S. Yu received the BS degree in electrical engineering from National Taiwan University, the MS and PhD degrees in electrical engineering from Stanford University, and the MBA degree from New York University. He is a professor in the Department of Computer Science at the University of Illinois at Chicago and also holds the Wexler chair in information technology. He spent most of his career at IBM Thomas J. Watson Research Center and was a manager of the Software Tools and Techniques Group. His research interests include data mining, privacy preserving data publishing, data stream, Internet applications and technologies, and database systems. He has published more than 710 papers in refereed journals and conferences. He holds or has applied for more than 300 US patents. He is the editor-in-chief of the ACM Transactions on Knowledge Discovery from Data. He is on the steering committee of the IEEE Conference on Data Mining and ACM Conference on Information and Knowledge Management and was a member of the IEEE Data Engineering steering committee. He was the editor-in-chief of the IEEE Transactions on Knowledge and Data Engineering (2001-2004). He had also served as an associate editor of ACM Transactions on the Internet Technology (2000-2010) and Knowledge and Information Systems (1998-2004). In addition to serving as program committee member on various conferences, he was the program chair or cochairs of the 2009 IEEE International Conference on Service-Oriented Computing and Applications, the IEEE Workshop of Scalable Stream Processing Systems (SSPS'07), the IEEE Workshop on Mining Evolving and Streaming Data (2006), the 2006 joint conferences of the Eighth IEEE Conference on E-Commerce Technology (CEC '06) and the Third IEEE Conference on Enterprise Computing, E-Commerce and E-Services (EEE '06), the 11th IEEE International Conference on Data Engineering, the Sixth Pacific Area Conference on Knowledge Discovery and Data Mining, the Ninth ACM SIGMOD Workshop on Research Issues in Data Mining and Knowledge Discovery, the Second IEEE International Workshop on Research Issues on Data Engineering: Transaction and Query Processing, the PAKDD Workshop on Knowledge Discovery from Advanced Databases, and the Second IEEE International Workshop on Advanced Issues of E-Commerce and Web-based Information Systems. He served as the general chair or cochairs of the 2009 IEEE International Conference on Data Mining, the 2009 IEEE International Conference on Data Engineering, the 2006 ACM Conference on Information and Knowledge Management, the 1998 IEEE International Conference on Data Engineering, and the Second IEEE International Conference on Data Mining. He had received several IBM honors including two IBM Outstanding Innovation Awards, an Outstanding Technical Achievement Award, two Research Division Awards, and the 94th plateau of Invention Achievement Awards. He was an IBM Master Inventor. He received a Research Contributions Award from the IEEE International Conference on Data Mining in 2003 and also an IEEE Region 1 Award for "promoting and perpetuating numerous new electrical engineering concepts" in 1999. He is a fellow of the IEEE and the ACM.

LIU ET AL.: UNCERTAIN ONE-CLASS LEARNING AND CONCEPT SUMMARIZATION LEARNING ON UNCERTAIN DATA STREAMS 483

Page 17: 468 IEEE TRANSACTIONS ON KNOWLEDGE AND DATA …203.170.84.89/~idawis33/DataScienceLab/.../TKDE-Liu14-uncertain.fi… · Uncertain One-Class Learning and Concept Summarization Learning

Longbing Cao is a professor in the Faculty of Information Technology, University of Technology, Sydney. His research interests include data mining, multiagent technology, and agent and data mining integration. He is a senior member of the IEEE.

Yun Zhang is with the Department of Automation, Guangdong University of Technology. His research interests include control theory and systems engineering, nonlinear systems, and pattern recognition.

Zhifeng Hao is with the Faculty of Computer, Guangdong University of Technology. His current research interests include design and analysis of algorithms, mathematical modeling, and combinatorial optimization.

