Positive Unlabeled Learning for Time Series Classification
Nguyen Minh Nhut, Xiao-Li Li, See-Kiong Ng
Institute for Infocomm Research, Singapore
Outline
Introduction and Related Work
The Proposed Technique: Learning from Common Local Clusters (LCLC)
Evaluation Experiments
Conclusions
Discussion and Questions
1. Introduction
Traditional Supervised Learning
– Given a set of labeled training examples of n classes, the system uses this set to build a classifier.
– The classifier is then used to classify new examples into the n classes.
Supervised learning typically requires a large number of labeled examples:
– Human labeling can be expensive, time-consuming, and sometimes even impossible.
Unlabeled Data
Unlabeled data are usually plentiful.
Can we label only a small number of examples and make use of a large number of unlabeled examples to learn?
Unlabeled data contain information which can be used to improve the classification accuracy.
Positive-Unlabeled (PU) Learning
Positive set: a set P of examples of the class of interest.
Unlabeled set: a set U of unlabeled (mixed) examples, containing instances both from P's class and not from it (negative examples).
Build a classifier: use P and U to classify the data in U as well as future test data (illustrated below).
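As a concrete illustration, here is a minimal Python sketch (with entirely hypothetical, synthetic data) of how a PU problem arises from a fully labeled pool: only a handful of positives keep their labels, and everything else becomes U.

```python
import numpy as np

# Hypothetical setup: 100 synthetic "time series" of length 50 with
# hidden ground-truth labels y; in a real PU task y is never observed.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 50))
y = rng.integers(0, 2, size=100)

pos_idx = np.flatnonzero(y == 1)
labeled = rng.choice(pos_idx, size=5, replace=False)  # known positives
P = X[labeled]                          # positive set P
U = np.delete(X, labeled, axis=0)       # unlabeled mix of pos and neg
```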
PU Learning for Time Series Classification
PU learning is applicable in a wide range of domains, such as text classification, biomedical informatics, pattern recognition, and recommendation systems.
However, the application of PU learning to time series data has been relatively unexplored because of:
– High correlation between features.
– The lack of the joint probability distribution over words and collocations that text-oriented PU methods rely on.
Existing PU Learning for Time Series Classification
Wei, L. and E. Keogh (2006). "Semi-supervised time series classification." ACM SIGKDD.
The idea: let the classifier teach itself with its own predictions (sketched below).
Unfortunately, without a good stopping criterion, the method often stops too early, resulting in high precision but low recall.
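A minimal sketch of this self-training loop, assuming a Euclidean nearest-neighbour base learner; the fixed iteration budget below merely stands in for the missing stopping criterion and is not Wei and Keogh's actual rule.

```python
import numpy as np

def self_train(P, U, n_iterations):
    """Self-training for PU time series: repeatedly move the unlabeled
    series closest to the current positive set into P."""
    P, U = [p for p in P], [u for u in U]
    for _ in range(n_iterations):       # no principled stopping rule
        if not U:
            break
        # nearest-positive distance for every unlabeled series
        dists = [min(np.linalg.norm(u - p) for p in P) for u in U]
        # the classifier "teaches itself": its most confident
        # prediction joins the positive set
        P.append(U.pop(int(np.argmin(dists))))
    return np.array(P), np.array(U)
```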
The Proposed Technique: LCLC (Learning from Common Local Clusters)
The proposed LCLC algorithm addresses two specific issues in PU learning for time series classification:
– Selecting independent and relevant features from the time series data using a cluster-based approach.
– Accurately extracting reliable positives and negatives from the given unlabeled data.
Local Clustering and Feature Selection
[Figure: the positive set P surrounded by the unlabeled local clusters U.]
Algorithm 1. Local clustering and feature selection
Input: one initial seed positive example s, unlabelled dataset U, number of clusters K
1. Use Wei's method to get an initial positive set P;
2. K-ULCs = partition U into K local clusters using K-means;
3. Select the K common principal features from the raw feature set using CLeVer-Cluster(P, K-ULCs);
Yoon, H., K. Yang, et al. (2005). "Feature subset selection and feature ranking for multivariate time series." IEEE Transactions on Knowledge and Data Engineering
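A Python sketch of Algorithm 1, assuming scikit-learn's KMeans; the pooled-variance feature ranking is only a crude stand-in for the CLeVer-Cluster selection of common principal features cited above.

```python
import numpy as np
from sklearn.cluster import KMeans

def local_cluster_and_select(P, U, K, n_features):
    """Sketch of Algorithm 1: partition U into K local clusters (the
    K-ULCs) and pick a common feature subset shared by P and the
    clusters. CLeVer ranks features via descriptive common principal
    components; pooled variance is a simple proxy here."""
    labels = KMeans(n_clusters=K, n_init=10).fit_predict(U)
    ulcs = [U[labels == k] for k in range(K)]        # the K-ULCs
    pooled = np.vstack([P] + ulcs)                   # P plus all ULCs
    # keep the n_features columns with the highest pooled variance
    selected = np.argsort(pooled.var(axis=0))[::-1][:n_features]
    return ulcs, np.sort(selected)
```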
The cluster-based approach of the proposed LCLC method offers the following advantages:
– It is much more robust than instance-based methods for extracting the likely positives and negatives from U.
– The similarity between two time series can be effectively measured using a well-selected subset of the common principal features, which captures the underlying characteristics of both the positive and unlabeled clusters.
Extracting Reliable Negative Set
Algorithm 2. Extracting Reliable Negative Examples
Input: positive set P, K unlabeled local clusters ULCi
1. RN = ∅; AMBI = ∅;
2. For i = 1 to K
3.   Compute the distance d(ULCi, P) between local cluster ULCi and P;
4. Sort d(ULCi, P) (i = 1, 2, …, K) in decreasing order;
5. dMedian = the median of d(ULCi, P) (i = 1, 2, …, K);
6. For i = 1 to K
7.   If (d(ULCi, P) > dMedian)
8.     RN = RN ∪ ULCi;
9.   Else
10.    AMBI = AMBI ∪ ULCi;

[Figure: the local clusters around P split into reliable negative (RN) and ambiguous (AMBI) clusters.]
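A Python sketch of Algorithm 2, taking each cluster-to-P distance between centroids (one plausible choice; the slides do not fix the distance measure):

```python
import numpy as np

def extract_reliable_negatives(P, ulcs):
    """Sketch of Algorithm 2: clusters farther from P than the median
    cluster-to-P distance become reliable negatives (RN); the rest are
    kept as ambiguous clusters (AMBI)."""
    p_centroid = P.mean(axis=0)
    d = np.array([np.linalg.norm(ulc.mean(axis=0) - p_centroid)
                  for ulc in ulcs])
    d_median = np.median(d)
    RN   = [ulc for ulc, di in zip(ulcs, d) if di > d_median]
    AMBI = [ulc for ulc, di in zip(ulcs, d) if di <= d_median]
    return RN, AMBI
```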
Boundary Decision Using the Cluster Chaining Approach
Algorithm 3. Identifying likely positive clusters LP and likely negative clusters LN
Input: ambiguous clusters AMBI, positive cluster P, reliable negative cluster set RN
1. LP = ∅; LN = ∅;
2. While (AMBI ≠ ∅)
3.   Find the nearest AMBI cluster CAMBI,A to P and add CAMBI,A to cluster-chain_i;
4.   While CAMBI,A ∉ RN
5.     Find the nearest cluster CAMBI,B (from AMBI ∪ RN) to CAMBI,A and add CAMBI,B to cluster-chain_i;
6.     CAMBI,A = CAMBI,B;
7. Loop for all the cluster-chains
8.   breaking-link_i (Cm, Cm+1) = the link with maximal distance between adjacent clusters in the cluster-chain;
9.   LP = LP ∪ {AMBI clusters from the chain segment containing P};
10.  LN = LN ∪ {AMBI clusters from the chain segment not containing P};

[Figure: the decision boundary cuts each cluster chain at its longest link, separating P-side AMBI clusters from the RN side.]
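A Python sketch of Algorithm 3 over cluster centroids (again an assumption about how cluster distances are taken); the returned index lists refer to positions in the AMBI list.

```python
import numpy as np

def chain_and_label(p_cen, ambi_cens, rn_cens):
    """Sketch of Algorithm 3: grow a chain from P through
    nearest-centroid hops over the remaining AMBI and the RN clusters
    until it ends at an RN cluster, then break the chain at its longest
    link. AMBI clusters on the P side become likely positives (LP); the
    rest become likely negatives (LN). Assumes rn_cens is non-empty."""
    remaining = set(range(len(ambi_cens)))
    lp, ln = [], []
    while remaining:
        # step 3: the chain starts at the AMBI cluster nearest to P
        first = min(remaining,
                    key=lambda i: np.linalg.norm(p_cen - ambi_cens[i]))
        remaining.discard(first)
        chain = [('P', None, p_cen), ('AMBI', first, ambi_cens[first])]
        # steps 4-6: hop to the nearest unused cluster until RN is hit
        while chain[-1][0] != 'RN':
            cur = chain[-1][2]
            cands = [('AMBI', i, ambi_cens[i]) for i in remaining]
            cands += [('RN', j, c) for j, c in enumerate(rn_cens)]
            kind, idx, cen = min(
                cands, key=lambda t: np.linalg.norm(cur - t[2]))
            chain.append((kind, idx, cen))
            if kind == 'AMBI':
                remaining.discard(idx)
        # step 8: the breaking link is the longest hop along the chain
        gaps = [np.linalg.norm(a[2] - b[2])
                for a, b in zip(chain, chain[1:])]
        cut = int(np.argmax(gaps))
        # steps 9-10: AMBI clusters before the cut stay with P
        for pos, (kind, idx, _) in enumerate(chain):
            if kind == 'AMBI':
                (lp if pos <= cut else ln).append(idx)
    return lp, ln
```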
Boundary Decision Using the Cluster Chaining Approach
The cluster chaining approach:
– Minimizes the effect of possible noisy examples.
– Offers a robust solution when positive and negative examples are severely unbalanced in the unlabeled dataset U.
EMPIRICAL EVALUATION
Datasets:

Name           Training set           Testing set            Num of
               Positive   Negative    Positive   Negative    Features
ECG            208        602         312        904         86
Word Spotting  109        796         109        796         272
Wafer          381        3201        381        3201        152
Yoga           156        150         156        150         428
CBF            155        310         155        310         128
Wei, L. (2007). "Self Training dataset." http://alumni.cs.ucr.edu/~wli/selfTraining/.
Keogh, E. (2008). "The UCR Time Series Classification/Clustering Homepage." http://www.cs.ucr.edu/~eamonn/time_series_data/.
Experiment Setting
We randomly select just one seed instance from the positive class for the learning phase; the rest of the training data are treated as unlabeled data.
We build a 1-NN classifier using P together with LP as the positive training set, and RN together with LN as the negative training set (see the sketch below).
We repeat our experiments 30 times and report the average values of the 30 results.
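A minimal sketch of the resulting classifier, assuming Euclidean 1-NN and that LP, RN, and LN are lists of cluster arrays (with at least one negative cluster):

```python
import numpy as np

def build_1nn(P, LP, RN, LN):
    """Assemble the final 1-NN training set: positives are P plus the
    likely-positive clusters LP; negatives are the reliable-negative
    clusters RN plus the likely-negative clusters LN."""
    pos = np.vstack([P] + list(LP))
    neg = np.vstack(list(RN) + list(LN))
    X = np.vstack([pos, neg])
    y = np.concatenate([np.ones(len(pos)), np.zeros(len(neg))])
    def predict(x):
        # label of the single nearest training series (Euclidean)
        return int(y[np.argmin(np.linalg.norm(X - x, axis=1))])
    return predict
```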
Overall Performance
Method               ECG     Word Spotting   Wafer   Yoga    CBF
Wei's method         0.405   0.279           0.433   0.466   0.201
Ratana's method      0.840   0.637           0.080   0.626   0.309
LCLC w/o FS          0.631   0.608           0.637   0.808   0.599
LCLC w/o CC          0.781   0.520           0.320   0.699   0.586
LCLC                 0.867   0.727           0.724   0.854   0.701

All values are F-measures (w/o FS = without feature selection; w/o CC = without cluster chaining).
Wei, L. and E. Keogh (2006).
Ratanamahatana, C. and D. Wanichsan (2008).
Sensitivity to the Size of Local Clusters
We set the number of clusters to K = Size(U)/ULC_size, where ULC_size is the desired size of each unlabeled local cluster (for example, with |U| = 800 and ULC_size = 20, K = 40).
Conclusions
There are three key approaches that underlie LCLC's improved classification performance over existing methods:
– First, LCLC adopts a cluster-based method that is much more robust than instance-based PU learning methods.
– Second, we adopted a feature selection strategy that takes into account the characteristics of both the positive and unlabeled clusters.
– Finally, we devised a novel cluster chaining approach to extract the boundary positive and negative clusters.
Discussion and Questions